Custom Search Engine XML Specification

  1. Sample Custom Search Engine
  2. External Annotations
  3. Inline Annotations
  4. Look and Feel
  5. Refinements
  6. Using Google's Labels
  7. Combining Labels
  8. Subscribed Links
  9. Synonyms
  10. Labels

A Custom Search Engine consists of an XML specification of the engine and a set of annotations. Annotations specify the webpages your search engine searches over. You can create a Custom Search Engine with the wizard and not bother with any XML. However, some features are not available through the form-based interface. To access these features, you need to specify your Custom Search Engine using XML.

If you have a single CSE, we recommend specifying it in XML only if you want to use some of the more advanced features. While XML gives you a great deal of flexibility, it is also very easy to make mistakes. Further, using XML may sometimes interact poorly with the Google Marker.

We strongly recommend that you download the XML specification that the system generates for your search engine. If something goes wrong, you can always upload this XML and revert to your previous state.

You will need to use XML if you want to host the CSE specification on your own site. You might want to do this if you have many CSEs, if you generate CSEs programmatically, or if you use third-party tools to generate CSEs.

We will illustrate the XML specification by using a sequence of examples.

Example 1. Sample Custom Search Engine

We begin with an example that corresponds to a simple Custom Search Engine. The following is the XML specification of the RealClimate Custom Search Engine:
<?xml version="1.0"?>

<GoogleCustomizations version="1.0">
  <CustomSearchEngine volunteers="false"
        keywords="climate &quot;global warming&quot; &quot;greenhouse gases&quot;">
    <Title>RealClimate</Title>
    <Description>"..."</Description>
    <Context>
      <BackgroundLabels>
        <Label name="_cse_hwbuiarvsbo" mode="FILTER"/>
        <Label name="_cse_exclude_hwbuiarvsbo" mode="ELIMINATE"/>
      </BackgroundLabels>
    </Context>
  </CustomSearchEngine>
</GoogleCustomizations>
Here are the definitions of elements and attributes used in the above example.
  • CustomSearchEngine: This element specifies the custom search engine.
  • volunteers: (optional) This attribute on the CustomSearchEngine element specifies whether the search engine accepts volunteer applications. The value is true or false. Note that this field has an effect only for Google Stored CSEs. A Linked CSE does not have collaboration features, so this field is ignored.
  • keywords: (optional) This attribute on the CustomSearchEngine element specifies a list of keywords associated with the search engine. The keywords should be words that are likely to appear on pages in the CSE. They are used to tune the search results. The keywords are in the same format as Google search queries; see the search tips for more information.
  • Title: (optional) Title of your custom search engine.
  • Description: (optional) Description of your custom search engine. This will appear on the Co-op home page of your search engine.
  • Context: The CustomSearchEngine element currently consists of a single Context sub-element.
  • BackgroundLabels: (optional) This element specifies the labels that are to be implicitly included in every query. BackgroundLabels can include any number of labels.
  • Simple Custom Search Engines have two background labels: one to boost or filter the preferred sites and one to remove the excluded sites. Search results are controlled by annotating sites or URL patterns with either of these two labels.

    To prevent accidental label name collisions, the form-based interface automatically generates background label names for each Custom Search Engine. You can change these, but when you do so, remember to update any annotations associated with the old label names. If you have more than two background labels, you may not be able to use the Google Marker with this Custom Search Engine.

    If the background labels are in FILTER mode, all searches are restricted to the sites annotated with that label.

  • Label: This specifies the actual label that is to be included as part of every request to the Custom Search Engine. The various attributes and subelements of Label are described later.
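
As a sketch of the difference between filtering and boosting, the following variant of the context above (an illustration, not the actual RealClimate specification) puts the first background label in BOOST mode, so that annotated sites are preferred rather than required. The weight of 0.8 is an arbitrary value chosen for illustration; the mode and weight attributes are described in the Labels section.

```xml
<Context>
  <BackgroundLabels>
    <!-- Prefer the annotated sites instead of restricting results to them.
         The weight of 0.8 is an illustrative value, not a recommendation. -->
    <Label name="_cse_hwbuiarvsbo" mode="BOOST" weight="0.8"/>
    <!-- Excluded sites are still removed from the results. -->
    <Label name="_cse_exclude_hwbuiarvsbo" mode="ELIMINATE"/>
  </BackgroundLabels>
</Context>
```
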
Note that the specification does not make any reference to which annotations to use. In the case of Google Stored CSEs, the annotations used are those created by the author of the CSE and the collaborators on the CSE. For Linked CSEs, you need to specify which annotations should be used. There are two ways of doing this: by using external annotations or by using inline annotations.

Example 2. External Annotations

You can specify external annotation files in your CSE definition. These files can be hosted on your site or on some other site. Read the XML specification for annotations for information on the annotation file format. The following example illustrates how to reference external annotation files. External annotations are typically used for Linked CSEs, though they may also be used for Google Stored CSEs.
<?xml version="1.0"?>

<GoogleCustomizations version="1.0">
  <CustomSearchEngine ... >
    ...
  </CustomSearchEngine>

  <!-- An XML annotation file hosted on some site -->
  <Include type="Annotations" href="http://mysite.com/myannofile.xml" />
 
  <!-- The output of a third party script that takes a given user as a parameter 
       and spits out the user's bookmarks in XML annotation format -->
  <Include type="Annotations" href="http://mybookmarks.com/createannofile?user=foo" />
</GoogleCustomizations>

As shown in the example, you can either host XML files on your server or output the XML through a script that converts your data into an XML annotation file. This script could get data from anywhere: your database, RSS or Atom feeds, iCal feeds, Open Directory, etc. The files you include can also use <Include> tags themselves to include more files. We follow nested <Include>s up to a limit, currently set to five levels and a total of 50 annotation files.
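
The following is a minimal sketch of a nested include. It assumes that an included annotation file uses the same GoogleCustomizations root element and <Include> syntax as the CSE specification; the XML specification for annotations remains the authority on the file format. The file name morefiles.xml, the label my_label, and the example.com pattern are hypothetical.

```xml
<?xml version="1.0"?>

<!-- Hypothetical contents of http://mysite.com/myannofile.xml -->
<GoogleCustomizations version="1.0">
  <!-- One more level of nesting; this counts toward the five-level limit -->
  <Include type="Annotations" href="http://mysite.com/morefiles.xml" />

  <Annotations>
    <Annotation about="http://www.example.com/*">
      <Label name="my_label"/>
    </Annotation>
  </Annotations>
</GoogleCustomizations>
```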

Note that the <Include> tag only specifies where to fetch annotations from. It does not say anything about how those annotations should be used - whether they should be included in or excluded from your CSE. An annotation file is simply an association of sites or url patterns to labels. You specify how to use these labels in your CSE definition. This enables you to use a single annotation file for different purposes in multiple CSEs. For example, one CSE could restrict its search to those sites, another CSE could eliminate those sites, and yet another CSE could simply promote those sites. Read more about labels.
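
For instance, suppose the annotation file at http://mysite.com/myannofile.xml associates a set of sites with a label named solar_sites (a hypothetical name chosen for this sketch, as is the title). One CSE could restrict its search to those sites like this:

```xml
<?xml version="1.0"?>

<GoogleCustomizations version="1.0">
  <CustomSearchEngine>
    <Title>Solar Sites Only</Title>
    <Context>
      <BackgroundLabels>
        <!-- solar_sites is a hypothetical label defined in the included file -->
        <Label name="solar_sites" mode="FILTER"/>
      </BackgroundLabels>
    </Context>
  </CustomSearchEngine>

  <Include type="Annotations" href="http://mysite.com/myannofile.xml" />
</GoogleCustomizations>
```

A second CSE could include the same file and change only the mode: ELIMINATE to remove those sites, or BOOST with a weight to promote them.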

Example 3. Inline Annotations

If you have only a few annotations that you don't expect to change often, you could also embed the annotations in the CSE specification as shown in the following example. Inline annotations are typically used for Linked CSEs. Make sure that the annotations have at least one label and that the label corresponds to at least one of the background labels.
<?xml version="1.0"?>

<GoogleCustomizations version="1.0">
  <CustomSearchEngine ... >
    <Context>
      <BackgroundLabels>
        <Label name="_cse_solar_example" mode="FILTER"/>
        <Label name="_cse_exclude_solar_example" mode="ELIMINATE"/>
      </BackgroundLabels>
    </Context>
  </CustomSearchEngine>

  <Annotations>
    <Annotation about="http://www.solarenergy.org/*">
      <Label name="_cse_solar_example"/>
    </Annotation>
    <Annotation about="http://www.solarfacts.net/*">
      <Label name="_cse_solar_example"/>
    </Annotation>
  </Annotations>
</GoogleCustomizations>
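
The exclude label can be used in the same way. The following sketch extends the Annotations element above with one hypothetical pattern (the /forum/ path is invented for illustration) that carries the ELIMINATE-mode background label, so we would expect matching pages to be removed from the results:

```xml
<Annotations>
  <Annotation about="http://www.solarenergy.org/*">
    <Label name="_cse_solar_example"/>
  </Annotation>
  <!-- Hypothetical pattern: drop one subsection of an otherwise included site -->
  <Annotation about="http://www.solarenergy.org/forum/*">
    <Label name="_cse_exclude_solar_example"/>
  </Annotation>
</Annotations>
```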

Example 4. Look and Feel

You can associate a logo with your Custom Search Engine and change the colors of the background, border, title, text, URLs, visited links, and other information associated with each search result. This is an example of the XML specification:

<?xml version="1.0"?>

<GoogleCustomizations version="1.0">
  <CustomSearchEngine ... >
    <Context>
      ...
    </Context>

    <LookAndFeel>
      <Logo url="http://www.ascii.com/logo.gif" destination="http://www.ascii.com/" height="90"/>
      <Colors url="#3366cc" background="#FFFFFF" border="#336699" title="#0000CC" text="#000000" visited="#ffbd10" light="#000000"/>
    </LookAndFeel>

  </CustomSearchEngine>
</GoogleCustomizations>

This Custom Search Engine is associated with the logo found at http://www.ascii.com/logo.gif. The destination indicates where users should be taken if they click on the logo. Please specify the height of the logo as it will help us scale it appropriately.

This Custom Search Engine color specification defines the colors for the search results page:

  • background: the background should be #FFFFFF
  • border: the border should be #336699 (the border shows around ads and above search results)
  • title: the title of the search result should be #0000CC
  • text: the text beneath the title should be #000000
  • url: the url beneath the text should be #3366cc
  • visited: when a url is visited it should turn #FFBD10
  • light: other information associated with the url, (e.g. Cached) should be #000000

Example 5. Refinements

The following example shows the specification for the CS Curriculum Search engine:

<?xml version="1.0"?>

<GoogleCustomizations version="1.0">
  <CustomSearchEngine>
    <Title>Curriculum</Title>

    <Context refinementsTitle="Refine results for $q:">

      <Facet>
        <FacetItem title="Lectures">
          <Label name="lectures" mode="BOOST" weight="0.8">
            <Rewrite>lecture OR lectures</Rewrite>
          </Label>
        </FacetItem>
      </Facet>

      <Facet>
        <FacetItem title="Assignments">
          <Label name="assignments" mode="BOOST" weight="0.8">
            <Rewrite>homework OR assignment OR assignments</Rewrite>
          </Label>
        </FacetItem>
      </Facet>

      <Facet>
        <FacetItem title="Reference">
          <Label name="reference" mode="FILTER">
            <IgnoreBackgroundLabels>true</IgnoreBackgroundLabels>
          </Label>
        </FacetItem>
      </Facet>

      <Facet>
        <FacetItem title="Papers">
          <Redirect url="http://scholar.google.com/scholar?q=$q"/>
        </FacetItem>
      </Facet>

      <BackgroundLabels>
        <Label name="_cse_omuauf_lfve" mode="FILTER"/>
        <Label name="_cse_exclude_omuauf_lfve" mode="ELIMINATE"/>
      </BackgroundLabels>

    </Context>

  </CustomSearchEngine>
</GoogleCustomizations>
Here are the definitions of the elements and attributes used in the above example. All elements and attributes are necessary unless marked optional.
  • refinementsTitle: (optional) The refinements title appears above the refinements. It may include the term '$q', which will be replaced by the user's query. If not specified, the default message "Refine results for $q:" will be displayed.
  • Facet: (optional) Facets are used to organize refinements.
  • title: The anchor text that will be used for the refinement. It may include the term '$q', which will be replaced by the user's query.
  • FacetItem: (optional) Each facet item is a refinement. A Facet item should contain either a Label or a Redirect.
  • Redirect: (optional) Redirect can be used to send the user's query to some other page. The redirect URL may include the term '$q', which will be replaced by the user's query. Note that if your URL has CGI arguments, they should be separated by '&amp;' and not '&'.

    One interesting use of Redirect is to send the user to a different custom search engine, thereby allowing nested contexts. The machine learning search engine uses this.

  • Label: Each refinement must specify a label.
A Context can have up to four facets, each with up to four facet items. These refinements appear at the top of the search results.
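
To illustrate the escaping rule for Redirect, here is a sketch of a facet item whose redirect URL carries two CGI arguments. The host www.example.com and the parameter names q and lang are hypothetical; only the '&amp;' separator and the '$q' placeholder come from the rules above.

```xml
<Facet>
  <FacetItem title="Search elsewhere for $q">
    <!-- The two CGI arguments are separated by &amp; rather than a bare & -->
    <Redirect url="http://www.example.com/search?q=$q&amp;lang=en"/>
  </FacetItem>
</Facet>
```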

Example 6. Using Google's Labels

The Custom Search Engine platform comes with a number of prebuilt labels that you can use, either in your refinements or in your background labels. The following is a simple example that eliminates shopping comparison sites from search results.
<?xml version="1.0"?>

<GoogleCustomizations version="1.0">
  <CustomSearchEngine version="1.0" volunteers="false">
    <Title>No Shopping Comparisons</Title>
    <Description>...</Description>

    <Context>
      <BackgroundLabels>
        <Label name="shopping_comparisons" mode="ELIMINATE"/>
      </BackgroundLabels>
    </Context>
  </CustomSearchEngine>
</GoogleCustomizations>
You can not only use Google-provided labels, but also enhance them by adding sites to them and removing sites from them. For example, you might decide that some of the pages that Google has labeled 'shopping_comparisons' are not shopping comparison pages, or that Google has missed some shopping comparison sites. In either case, you can enhance shopping_comparisons for your Custom Search Engine by adding the appropriate annotations yourself or by collaborating with someone who has added them.
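
As a sketch, the following inline annotation adds a site to shopping_comparisons for this search engine, so that the ELIMINATE label above would also remove it. The site www.example.com is a placeholder, and the sketch assumes that inline annotations suit your kind of CSE (they are typically used for Linked CSEs).

```xml
<Annotations>
  <!-- Placeholder site that we want treated as a shopping comparison site -->
  <Annotation about="http://www.example.com/*">
    <Label name="shopping_comparisons"/>
  </Annotation>
</Annotations>
```

As in Example 3, this element would sit inside GoogleCustomizations, next to the CustomSearchEngine element.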

Example 7. Combining Labels

You can use any number of background labels, combining them in interesting ways. The following example from the Tech stuff search engine combines its own label together with Google's to obtain a slice of the index that contains just forums/message-boards about Tech stuff.
<?xml version="1.0"?>

<GoogleCustomizations version="1.0">
  <CustomSearchEngine version="1.0" volunteers="true">
    <Title>Tech Forums Context</Title>

    <Context>
      <BackgroundLabels>
        <Label name="_cse_mvlbkhd6eso" mode="FILTER"/>
        <Label name="forums" mode="FILTER"/>
      </BackgroundLabels>
    </Context>
  </CustomSearchEngine>
</GoogleCustomizations>
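
Labels with different modes can be combined in the same way. The following sketch, which is not taken from an existing search engine, restricts results to the sites carrying the engine's own label and then removes any of them that carry Google's shopping_comparisons label. The label name _cse_example is a placeholder.

```xml
<Context>
  <BackgroundLabels>
    <!-- Placeholder for the engine's own auto-generated label -->
    <Label name="_cse_example" mode="FILTER"/>
    <!-- Google's prebuilt label from Example 6 -->
    <Label name="shopping_comparisons" mode="ELIMINATE"/>
  </BackgroundLabels>
</Context>
```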

Example 8. Subscribed Links

You can add Subscribed Links to your Custom Search Engine in order to display custom results on relevant queries. You might wish to create your own Subscribed Links to give users of your Custom Search Engine information tailored to the area in which you specialize. You can also use Subscribed Links created by others; to see some of what's available, browse the directory.

The following example creates a Custom Search Engine focused on news sites, using its own label together with Google's, and adds the Digg and Search Engine Watch Subscribed Links to provide news headlines from Digg.com and SearchEngineWatch.com:

<GoogleCustomizations version="1.0">
  <CustomSearchEngine version="1.0" volunteers="false">
    <Title>NewsWithDigg</Title> 
    <Description>...</Description>

    <Context>
      <BackgroundLabels>
        <Label name="_cse_mvlbkhd6e324" mode="BOOST" weight="0.9"/>
        <Label name="news" mode="FILTER"/>
      </BackgroundLabels>
    </Context>
    <SubscribedLinks>
      <SubscribedLink creator="017771777217723414381"/>
      <SubscribedLink creator="000455946290257563395"/>
    </SubscribedLinks>
  </CustomSearchEngine>
</GoogleCustomizations>

Observe that the "creator" attribute for each Subscribed Link you want to add must be the same as the value of the user= parameter for the Subscribed Link provider's profile page.

Please note that Subscribed Links do not currently work with Linked CSEs, but we expect this to change soon.

Example 9. Synonyms

Synonyms enable visitors to find documents with related terms without issuing multiple queries. For example, if a user enters the query "cd", the search engine will automatically search for all documents that contain the words "cd" and "certificate of deposit". This gives users the most relevant results without requiring them to enter multiple queries. The synonym dictionary can include alternate words or phrases for commonly entered search queries. Synonyms, like all query terms, are case insensitive.

<GoogleCustomizations version="1.0">
  <CustomSearchEngine ...>
    <Title>...</Title>
    <Description>...</Description>

    <Context>
      <Synonyms>
        <SynonymEntry word="cd">
          <Synonym>certificate of deposit</Synonym>
        </SynonymEntry>
        <SynonymEntry word="aaa">
          <Synonym>automobile association of america</Synonym>
        </SynonymEntry>
        <SynonymEntry word=...>
          <Synonym>...</Synonym>
        </SynonymEntry>
      </Synonyms>
    </Context>
  </CustomSearchEngine>
</GoogleCustomizations>

Notes:

  • A maximum of 100 SynonymEntry elements is allowed for one Custom Search Engine.
  • A maximum of 10 Synonym elements is allowed for each word.
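
Since each word can carry several synonyms, an entry may list more than one Synonym element. In the following sketch, the second synonym for "cd" is our own illustrative addition:

```xml
<Synonyms>
  <SynonymEntry word="cd">
    <Synonym>certificate of deposit</Synonym>
    <!-- Illustrative second synonym; up to 10 are allowed per word -->
    <Synonym>compact disc</Synonym>
  </SynonymEntry>
</Synonyms>
```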

Labels

Labels are the primitives for building Custom Search Engines. They can be explicit (as with refinements) or implicit (as with background labels).

They are used to modify the Google ranking function. The modifications can range from a hard restriction to a specified set of documents to a subtle reordering of results. The modifications take effect when the query includes that label. A label can be implemented by a combination of query rewrites and promotion or demotion of results matching that label. At this point, background labels cannot use query rewrites.

Labels can have the following elements and attributes:
  • Rewrite

    If you would like to narrow the scope of pages that are returned when clicking on a specific label, you can specify a rewrite for that label. Rewrites are just additional search terms that are appended to a user's query when they click on a label. The format for a rewrite is as follows:

     <Label name="assignments" mode="BOOST" weight="0.8">
         <Rewrite>homework OR assignment OR assignments</Rewrite>
     </Label>
    

    In this example, when users click on the assignments label, we want to retrieve documents that (in addition to matching the user's query) contain one of the words 'homework', 'assignment', or 'assignments'. Rewrites are limited to 100 characters. They should be all lowercase, with the exception of uppercase operators like OR.

  • mode & weight

    The label can also specify promotion/demotion of results matching that label. The direction and magnitude of promotion/demotion is specified by the following attributes:

    • mode: Mode can have one of the following values:
      • FILTER Only allow results that have been annotated with this label
      • ELIMINATE Remove results that have been annotated with this label
      • BOOST Promote/demote results that have been annotated with this label. How much they get promoted or demoted depends on the 'weight' attribute.
    • weight: A number between -1 and 1, that indicates how much results annotated with that label should be demoted or promoted.
    Weights less than zero indicate that results matching that label should be demoted. Weights greater than zero indicate that results matching that label should be promoted.

    Note that using a weight of -1 or 1 is not the same as setting the mode to ELIMINATE or FILTER respectively. When the mode is set to FILTER, the only results shown are those that match the label, so if no results match the label, nothing will be shown. On the other hand, when the mode is BOOST and the weight is 1, results matching the label are just strongly preferred over those that don't, so results may be shown even when none of them match the label. Similarly, if the mode is ELIMINATE and all the results match the label, none of them will be shown, whereas if the mode is BOOST and the weight is -1, the results matching the label may still be shown.

    The net boost applied to a result is a function of many parameters, including the weight on the label, the score, if any, on the annotation, and the number of people collaborating on the Custom Search Engine who have annotated the result with that label.

  • IgnoreBackgroundLabels

    A refinement label can specify that when it is selected, the background labels in the context should be ignored. This is useful if the refinement wants to present a point of view that is different from that of the rest of the Custom Search Engine.

  • top

    Boost labels can have an additional attribute, top, which takes an integer value N. When set, the top N search results come from this label if the patterns associated with that label are sufficiently relevant to the user's query.

    As an example, if you wanted your forums to show up as the top 3 results with little regard for ranking, your label would look something like:

     <Label name="my_forums" mode="BOOST" top="3"/>
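
Demotion works the same way as promotion, with a negative weight. As a sketch, the following label (the name old_versions and the weight of -0.5 are hypothetical) pushes matching results down without removing them, unlike ELIMINATE:

```xml
<!-- Hypothetical label: demote, but do not remove, matching results -->
<Label name="old_versions" mode="BOOST" weight="-0.5"/>
```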