
Annotations label file formats

Why would I create a file of annotations?

You can add sites (or URL patterns) to your Custom Search Engine one at a time, either in the control panel or with the Google Marker. If you would like to add a large group of websites to a search engine all at once, you can instead upload a file containing a list of annotations in one of the file formats described in this document.

Another reason for uploading annotations in one of these file formats is that you can associate scores with URL patterns, a feature not currently available in the control panel or the Google Marker. These scores can be used to control the ranking of search results.

Finally, if you already have a list of sites in a format such as OPML, you may prefer to upload the OPML file rather than type in each URL.

When a result matches multiple URL patterns, the most specific pattern is the one that applies. This is especially useful when you use patterns with scores.
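
The matching rule can be illustrated with a small Python sketch. This is not Google's implementation: it assumes that "*" matches any run of characters and that "most specific" means the matching pattern with the most non-wildcard characters. Both are assumptions made for illustration, and the helper name and the scores are hypothetical.

```python
from fnmatch import fnmatchcase


def most_specific_match(url, scored_patterns):
    """Return the (pattern, score) pair that best matches url, or None.

    Assumption: specificity is approximated by the number of
    non-wildcard characters in the pattern. fnmatchcase also treats
    '?' and '[' specially, so this sketch only suits simple '*' patterns.
    """
    matches = [
        (pattern, score)
        for pattern, score in scored_patterns
        if fnmatchcase(url, pattern)
    ]
    if not matches:
        return None
    return max(matches, key=lambda item: len(item[0].replace("*", "")))


# Illustrative scores only; they do not come from a real search engine.
patterns = [
    ("www.webmd.com/hw/*", 0.2),
    ("www.webmd.com/hw/cancer/*", 1.0),
]
result = most_specific_match("www.webmd.com/hw/cancer/liver.html", patterns)
print(result)
```

Under these assumptions, a page below www.webmd.com/hw/cancer/ matches both patterns, and the longer pattern (with its own score) is the one selected.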

Where do I upload my file?

You can upload your annotations file (XML, OPML, or tab-delimited) to your search engine using the "Advanced" tab in the control panel of your Custom Search Engine. The upload page lists all of the annotation and advanced search engine files that you have uploaded, and the upload process identifies any errors in the file.

OPML file format

We currently accept OPML files such as those generated by Bloglines. The value of the 'htmlUrl' attribute of each 'outline' element is added to the list of sites used by the custom search engine.
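
To preview which sites an OPML file would contribute, you can list its 'htmlUrl' values yourself before uploading. The following Python sketch does this with the standard library; the sample OPML content and the function name are invented for illustration, and the sketch does not claim to reproduce exactly how the upload process reads the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML content; real exports may carry more attributes.
SAMPLE_OPML = """<opml version="1.1">
  <head><title>Example subscriptions</title></head>
  <body>
    <outline text="Health">
      <outline text="Example blog" htmlUrl="http://blog.example.com/"
               xmlUrl="http://blog.example.com/feed"/>
    </outline>
    <outline text="Feed without a site link" xmlUrl="http://feeds.example.org/rss"/>
  </body>
</opml>"""


def html_urls(opml_text):
    """Collect the htmlUrl attribute of every outline element, in document order."""
    root = ET.fromstring(opml_text)
    return [
        outline.attrib["htmlUrl"]
        for outline in root.iter("outline")  # includes nested outlines
        if outline.attrib.get("htmlUrl")
    ]


sites = html_urls(SAMPLE_OPML)
print(sites)
```

Outline elements without an 'htmlUrl' attribute (folders, or feeds with only an 'xmlUrl') are skipped in this sketch.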

Tab delimited file format

You can upload annotations for your Custom Search Engine in tab-delimited format. Suppose your Custom Search Engine has the identifier 004084431507867063159:Ansi-stoubiq (the identifier can be found on the advanced tab or the code tab of the control panel). To include a URL pattern in that search engine, label it with "_cse_Ansi-stoubiq"; to exclude it, label it with "_cse_exclude_Ansi-stoubiq".

For example, the following file includes the URL pattern "www.webmd.com/hw/*" but excludes "www.webmd.com/hw/cancer/*".

URL        Label
www.webmd.com/hw/*        _cse_Ansi-stoubiq
www.webmd.com/hw/cancer/*        _cse_exclude_Ansi-stoubiq
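
A file like this one can also be generated with a few lines of Python. The sketch below derives the two labels from the identifier by taking the part after the colon, which is an assumption inferred from the example identifier in this document; the labels shown in your control panel are authoritative. The function name is hypothetical.

```python
def cse_labels(engine_id):
    """Return (include_label, exclude_label) for a search engine identifier."""
    # Assumption: the label suffix is the part of the identifier after the colon.
    _, sep, suffix = engine_id.partition(":")
    if not sep or not suffix:
        raise ValueError("expected an identifier of the form '<number>:<name>'")
    return f"_cse_{suffix}", f"_cse_exclude_{suffix}"


include_label, exclude_label = cse_labels("004084431507867063159:Ansi-stoubiq")

rows = [
    ("URL", "Label"),                               # header row
    ("www.webmd.com/hw/*", include_label),          # included pattern
    ("www.webmd.com/hw/cancer/*", exclude_label),   # excluded pattern
]

# Fields are joined with a real tab character, one annotation per line.
tsv_text = "\n".join("\t".join(row) for row in rows) + "\n"
print(tsv_text, end="")
```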

You can also specify a score for the annotation, a description, and any number of additional attributes you want to give to the annotation. The following is a tab-delimited example annotations file that includes URLs and URL patterns for some disease-related webpages.

URL                                        Label              Label      Label         Score  Comment                             A=Date
www.cancer.gov/cancertopics/types/liver/*  _cse_Ansi-stoubiq  symptoms                        This labels this url as symptoms.   20060504
www.medicinenet.com/liver_cancer/*         _cse_Ansi-stoubiq  symptoms                 1.0    This labels this url as symptoms.   20060504
www.webmd.com/hw/cancer/*                  _cse_Ansi-stoubiq  symptoms   for_patients  1.0    This is a great site for patients!  20060504
www.oncologychannel.com/*/treatment        _cse_Ansi-stoubiq  treatment                                                           20060504
www.sirweb.org/*Treatments                 _cse_Ansi-stoubiq  treatment                0.7                                        20060504

The first line of the file must be a header row, as in the snippet above, that names the fields used in your file. Each subsequent line corresponds to one annotation. Within an annotation you can label a URL or URL pattern with multiple labels. Each label must have its own column in your file.

Each annotation can also specify a score between -1 and 1. A positive score indicates that this URL is more relevant to this label. A negative score indicates that this URL is less relevant to this label. To add a score to an annotation, add a column to your file with the heading "Score" and place the score (on a scale of -1.0 to 1.0) in that column.

You can also associate comments with labels. Comments cannot contain tabs. The sample file above includes some comments.

You can also add your own attributes. For example, the sample file above defines a Date attribute. Each attribute name must begin with "A=", as in the "A=Date" heading.

The order of the columns in your file doesn't matter. Headings are case-insensitive.
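
The rules above (a header row, one column per label, an optional score between -1.0 and 1.0, tab-free comments, and "A=" attributes) can be sketched as a small Python helper that builds rows for such a file. The column layout and the function name are choices made for this example, not requirements of the format.

```python
HEADER = ["URL", "Label", "Label", "Label", "Score", "Comment", "A=Date"]


def annotation_row(url, labels, score=None, comment="", date=""):
    """Build one row matching HEADER; unused fields are left empty."""
    if len(labels) > 3:
        raise ValueError("HEADER declares only three Label columns")
    if score is not None and not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if "\t" in comment:
        raise ValueError("comments cannot contain tabs")
    padded = list(labels) + [""] * (3 - len(labels))
    return [url, *padded, "" if score is None else str(score), comment, date]


rows = [
    HEADER,
    annotation_row(
        "www.medicinenet.com/liver_cancer/*",
        ["_cse_Ansi-stoubiq", "symptoms"],
        score=1.0,
        comment="This labels this url as symptoms.",
        date="20060504",
    ),
    annotation_row(
        "www.oncologychannel.com/*/treatment",
        ["_cse_Ansi-stoubiq", "treatment"],
        date="20060504",
    ),
]

# Every row keeps all seven fields, so empty cells appear as adjacent tabs.
tsv_text = "\n".join("\t".join(row) for row in rows) + "\n"
print(tsv_text, end="")
```

Keeping every row the same width as the header means an empty score or comment still occupies its column, so the date always lands under "A=Date".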

If you created your custom search engine using the wizard (as opposed to specifying it using XML) and want to use the tab-delimited file format to upload annotations, make sure that the labels you specify in your annotations are the ones used by your custom search engine.

To get these labels, consult the 'Advanced' section of the control panel of your custom search engine, which gives you two labels: one for adding sites to your custom search engine and one for excluding sites from it.

XML file format

You can upload annotations for your Custom Search Engine in XML format. Suppose your Custom Search Engine has the identifier 004084431507867063159:Ansi-stoubiq (the identifier can be found on the advanced tab or the code tab of the control panel). To include a URL pattern in that search engine, label it with "_cse_Ansi-stoubiq"; to exclude it, label it with "_cse_exclude_Ansi-stoubiq".

For example, the following file includes the URL pattern "www.webmd.com/hw/*" but excludes "www.webmd.com/hw/cancer/*".

<GoogleCustomizations>
  <Annotations file="my-health-annotations.xml">
    <Annotation about="www.webmd.com/hw/*">
      <Label name="_cse_Ansi-stoubiq"/>
    </Annotation>
    <Annotation about="www.webmd.com/hw/cancer/*">
      <Label name="_cse_exclude_Ansi-stoubiq"/>
    </Annotation>
  </Annotations>
</GoogleCustomizations>

The following illustrates the format of an annotations XML file. The XML file format has the same features as the tab-delimited format, except that you are not allowed to add your own attributes.

<GoogleCustomizations>
  <Annotations file="livercancer-annotations.xml">
    <Annotation about="www.cancer.gov/cancertopics/types/liver/*">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="symptoms"/>
      <Comment>This labels this url as symptoms.</Comment>
    </Annotation>
  
    <Annotation about="www.medicinenet.com/liver_cancer/*" score="1.0">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="symptoms"/>
      <Comment>This labels this url as symptoms.</Comment>
    </Annotation>
  
    <Annotation about="www.webmd.com/hw/cancer/*" score="0.7">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="symptoms"/>
      <Label name="for_patients"/>
      <Comment>This labels this url as symptoms and for_patients.</Comment>
    </Annotation>
  
    <Annotation about="www.oncologychannel.com/*/treatment">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="treatment"/>
    </Annotation>
  
    <Annotation about="www.sirweb.org/*Treatments" score="0.7">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="treatment"/>
    </Annotation>
  </Annotations>
</GoogleCustomizations>
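
Because the two formats carry the same information apart from custom attributes, a tab-delimited file can be converted to the XML layout shown above. The following Python sketch is one way to do it with the standard library; the function name and the output file name are hypothetical, and the sketch simply drops "A=" columns since the XML format does not allow them.

```python
import xml.etree.ElementTree as ET


def tsv_to_annotations_xml(tsv_text, file_name):
    """Convert tab-delimited annotations to a GoogleCustomizations XML string."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    # Headings are case-insensitive, so compare them in lower case.
    header = [name.strip().lower() for name in lines[0].split("\t")]
    root = ET.Element("GoogleCustomizations")
    annotations = ET.SubElement(root, "Annotations", {"file": file_name})
    for line in lines[1:]:
        annotation = ET.SubElement(annotations, "Annotation")
        comment = ""
        for name, value in zip(header, line.split("\t")):
            value = value.strip()
            if not value or name.startswith("a="):
                continue  # empty cell, or a custom attribute (not allowed in XML)
            if name == "url":
                annotation.set("about", value)
            elif name == "label":
                ET.SubElement(annotation, "Label", {"name": value})
            elif name == "score":
                annotation.set("score", value)
            elif name == "comment":
                comment = value
        if comment:  # emitted after the labels, as in the sample file
            ET.SubElement(annotation, "Comment").text = comment
    return ET.tostring(root, encoding="unicode")


sample_tsv = (
    "URL\tLabel\tLabel\tScore\tComment\tA=Date\n"
    "www.sirweb.org/*Treatments\t_cse_Ansi-stoubiq\ttreatment\t0.7\t\t20060504\n"
)
xml_text = tsv_to_annotations_xml(sample_tsv, "my-annotations.xml")
print(xml_text)
```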

In some cases you might want to manage the annotations files yourself by making them available to your Custom Search Engine from your website. If you manage them yourself, you are not subject to the global per-user annotation limit, and you can update your annotations without pushing data to Google. Suppose you have an XML-formatted annotations file at http://mysite.com/myannofile.xml. To include these annotations in your search engine, refer to http://mysite.com/myannofile.xml in the XML annotation file you upload through the "Advanced" tab, as shown below.

<GoogleCustomizations>
  <Annotations file="livercancer-annotations.xml">
    <Annotation about="www.cancer.gov/cancertopics/types/liver/*">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="symptoms"/>
      <Comment>This labels this url as symptoms.</Comment>
    </Annotation>
         ...
  </Annotations>
  <Include type="Annotations" href="http://mysite.com/myannofile.xml" />
</GoogleCustomizations>
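
If you maintain the uploaded file with a script, the Include element can be appended programmatically. The Python sketch below adds it to an existing GoogleCustomizations document unless an Include with the same href is already present; the function name is hypothetical and the input is a trimmed-down example.

```python
import xml.etree.ElementTree as ET


def add_include(xml_text, href):
    """Return xml_text with an Annotations Include for href appended once."""
    root = ET.fromstring(xml_text)
    if root.tag != "GoogleCustomizations":
        raise ValueError("expected a GoogleCustomizations root element")
    already_present = any(
        include.get("href") == href for include in root.findall("Include")
    )
    if not already_present:
        ET.SubElement(root, "Include", {"type": "Annotations", "href": href})
    return ET.tostring(root, encoding="unicode")


base_xml = (
    '<GoogleCustomizations>'
    '<Annotations file="livercancer-annotations.xml"/>'
    '</GoogleCustomizations>'
)
with_include = add_include(base_xml, "http://mysite.com/myannofile.xml")
print(with_include)
```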

As with the tab-delimited format, if you created your custom search engine using the wizard (as opposed to specifying it using XML) and want to use the XML file format to upload annotations, make sure that the labels you specify in your annotations are the ones used by your custom search engine.

To get these labels, consult the 'Advanced' section of the control panel of your custom search engine, which gives you two labels: one for adding sites to your custom search engine and one for excluding sites from it.