Annotations label file formats
Why would I create a file of annotations?
You can add sites (or URL patterns) to your Custom Search Engine one at a time, either in the
control panel or with the Google Marker. If you would like to add a large group of websites
to a search engine all at once, you can instead upload a file containing a list of annotations,
using one of the file formats described in this document.
Another reason to upload annotations in one of these file formats is that you can
associate scores with URL patterns, a feature not currently available in the control panel
or the Google Marker. These scores can be used to control the ranking of
search results.
Finally, if you already have a list of sites in a format such as OPML,
you might want to upload the OPML file rather than type in each URL.
When a result matches multiple URL patterns, the most specific pattern is the one that is used. This
is especially useful when you are using patterns with scores.
Where do I upload my file?
You can upload your annotations file (XML, OPML, or tab-delimited) to your search engine using the
"Advanced" tab in the control panel of your Custom Search Engine. This upload page shows you all of
the annotation and advanced search engine files that you have uploaded. The upload process
reports any errors it finds in the file.
OPML file format
We currently accept OPML files such as those generated by Bloglines. The value of the 'htmlUrl' attribute
of each 'outline' element is added to the list of sites used by the custom search engine.
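As a rough illustration, a minimal OPML file of this kind might look like the sketch below. The title, feed names, and example.com/example.org URLs are made-up placeholders, not taken from a real export; per the description above, only the 'htmlUrl' value of each 'outline' element would be added to the list of sites.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical feed-reader export; all names and URLs are placeholders. -->
<opml version="1.1">
  <head>
    <title>My health subscriptions</title>
  </head>
  <body>
    <!-- htmlUrl is the site address; xmlUrl is the feed address. -->
    <outline text="Example Health Blog" type="rss"
             xmlUrl="http://www.example.com/health/feed.xml"
             htmlUrl="http://www.example.com/health/"/>
    <outline text="Example Clinic News" type="rss"
             xmlUrl="http://www.example.org/news/rss"
             htmlUrl="http://www.example.org/news/"/>
  </body>
</opml>
```

With a file like this, the two 'htmlUrl' values would be the sites picked up for the search engine.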
Tab delimited file format
You can upload annotations for your Custom Search Engine in tab-delimited format. Suppose you want to
upload a file for a Custom Search Engine with the identifier
004084431507867063159:Ansi-stoubiq
(the identifier can be found on the Advanced tab or the Code tab of the control panel). You must
label your URL patterns with "_cse_Ansi-stoubiq" to include them in your Custom Search Engine, or with
"_cse_exclude_Ansi-stoubiq" to exclude them from it.
For example, this file includes the URL pattern "www.webmd.com/hw/*" but excludes "www.webmd.com/hw/cancer/*".
URL	Label
www.webmd.com/hw/*	_cse_Ansi-stoubiq
www.webmd.com/hw/cancer/*	_cse_exclude_Ansi-stoubiq
You can also specify a score for the annotation, a description, and any number of additional
attributes you want to give to the annotation. The following is a tab-delimited example
annotations file that includes URLs and URL patterns for some disease-related webpages.
URL	Label	Label	Label	Score	Comment	A=Date
www.cancer.gov/cancertopics/types/liver/*	_cse_Ansi-stoubiq	symptoms			This labels this url as symptoms.	20060504
www.medicinenet.com/liver_cancer/*	_cse_Ansi-stoubiq	symptoms		1.0	This labels this url as symptoms.	20060504
www.webmd.com/hw/cancer/*	_cse_Ansi-stoubiq	symptoms	for_patients	1.0	This is a great site for patients!	20060504
www.oncologychannel.com/*/treatment	_cse_Ansi-stoubiq	treatment				20060504
www.sirweb.org/*Treatments	_cse_Ansi-stoubiq	treatment		0.7		20060504
The first line of the file must be a header row like the one in the example above, naming the fields
used in your file. Each subsequent line corresponds to one annotation. Within an annotation you
can label a URL or URL pattern with multiple labels. Each label must have its
own column within your file.
Each annotation can also specify a score between -1.0 and 1.0.
A positive score indicates that this URL is more relevant to
this label; a negative score indicates that it is less relevant.
To add a score to an annotation, add a column to your file with "Score" as the heading
and place the score in that column.
You can also have comments associated with labels. The comments cannot contain
tabs. The sample file above includes some comments.
You can also add your own attributes. For example, the sample file above defines a Date
attribute. The heading of each attribute column must begin with "A=".
The order of the columns in your file doesn't matter. Headings are case-insensitive.
If you created your custom search engine using the wizard
(as opposed to specifying it using XML) and want to use
the tab-delimited file format to upload annotations, make sure that the
labels in your annotations file are the same as those used by your custom search engine.
To find these labels, consult the 'Advanced' section of the control panel of
your custom search engine, which gives you two labels: one for adding sites
to your custom search engine and one for excluding sites from it.
XML file format
You can upload annotations for your Custom Search Engine in XML format. Suppose you want to upload a file for a Custom Search Engine with the identifier 004084431507867063159:Ansi-stoubiq (the identifier can be found on the Advanced tab or the Code tab of the control panel). You must label your URL patterns with "_cse_Ansi-stoubiq" to include them in your Custom Search Engine, or with "_cse_exclude_Ansi-stoubiq" to exclude them from it.
For example, this file includes the URL pattern "www.webmd.com/hw/*" but excludes "www.webmd.com/hw/cancer/*".
<GoogleCustomizations>
  <Annotations file="my-health-annotations.xml">
    <Annotation about="www.webmd.com/hw/*">
      <Label name="_cse_Ansi-stoubiq"/>
    </Annotation>
    <Annotation about="www.webmd.com/hw/cancer/*">
      <Label name="_cse_exclude_Ansi-stoubiq"/>
    </Annotation>
  </Annotations>
</GoogleCustomizations>
The following illustrates the format of an annotations XML file. The XML file format has
the same features as the tab-delimited format, except that you are not allowed to add
your own attributes.
<GoogleCustomizations>
  <Annotations file="livercancer-annotations.xml">
    <Annotation about="www.cancer.gov/cancertopics/types/liver/*">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="symptoms"/>
      <Comment>This labels this url as symptoms.</Comment>
    </Annotation>
    <Annotation about="www.medicinenet.com/liver_cancer/*" score="1.0">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="symptoms"/>
      <Comment>This labels this url as symptoms.</Comment>
    </Annotation>
    <Annotation about="www.webmd.com/hw/cancer/*" score="0.7">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="symptoms"/>
      <Label name="for_patients"/>
      <Comment>This labels this url as symptoms and for_patients.</Comment>
    </Annotation>
    <Annotation about="www.oncologychannel.com/*/treatment">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="treatment"/>
    </Annotation>
    <Annotation about="www.sirweb.org/*Treatments" score="0.7">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="treatment"/>
    </Annotation>
  </Annotations>
</GoogleCustomizations>
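To illustrate how scores might interact with overlapping patterns, here is a hedged sketch that combines a broad pattern with a narrower one. The www.example.com patterns and the file name are hypothetical. The sketch assumes that negative scores are written in the score attribute the same way as the positive ones above, and that a longer path prefix counts as the more specific pattern.

```xml
<GoogleCustomizations>
  <Annotations file="example-scores-annotations.xml">
    <!-- Broad pattern: pages under /health/ are treated as more relevant to "symptoms". -->
    <Annotation about="www.example.com/health/*" score="0.5">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="symptoms"/>
    </Annotation>
    <!-- Narrower pattern: pages under /health/forum/ are treated as less relevant. -->
    <Annotation about="www.example.com/health/forum/*" score="-0.5">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="symptoms"/>
      <Comment>Forum threads are less reliable for symptoms.</Comment>
    </Annotation>
  </Annotations>
</GoogleCustomizations>
```

Under those assumptions, a result under www.example.com/health/forum/ matches both patterns, and the narrower annotation with the negative score would be the one applied.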
In some cases you might want to manage the annotations files yourself by making
them available to your Custom Search Engine from your website. If you manage them yourself, you
are not subject to the global per-user annotation limit, and you can easily update your annotations
without pushing data to Google. Suppose you have an XML-formatted annotations file
at http://mysite.com/myannofile.xml. To include these annotations
in your search engine, refer to http://mysite.com/myannofile.xml
in the XML annotation file you upload through the "Advanced" tab, as shown below.
<GoogleCustomizations>
  <Annotations file="livercancer-annotations.xml">
    <Annotation about="www.cancer.gov/cancertopics/types/liver/*">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="symptoms"/>
      <Comment>This labels this url as symptoms.</Comment>
    </Annotation>
    ...
  </Annotations>
  <Include type="Annotations" href="http://mysite.com/myannofile.xml" />
</GoogleCustomizations>
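For completeness, the self-hosted file at http://mysite.com/myannofile.xml might look something like the sketch below. This assumes the hosted file uses the same structure as the uploaded examples above, which the text implies but does not spell out. The www.example.org pattern is a hypothetical placeholder.

```xml
<!-- Hypothetical contents of http://mysite.com/myannofile.xml -->
<GoogleCustomizations>
  <Annotations file="myannofile.xml">
    <Annotation about="www.example.org/liver-cancer/treatment/*" score="0.7">
      <Label name="_cse_Ansi-stoubiq"/>
      <Label name="treatment"/>
    </Annotation>
  </Annotations>
</GoogleCustomizations>
```

The hosted file carries the same "_cse_Ansi-stoubiq" label as the uploaded one, so its annotations would apply to the same search engine.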
As with the tab-delimited format, if you created your custom search engine using the wizard
(as opposed to specifying it using XML) and want to use
the XML file format to upload annotations, make sure that the
labels in your annotations file are the same as those used by your custom search engine.
To find these labels, consult the 'Advanced' section of the control panel of
your custom search engine, which gives you two labels: one for adding sites
to your custom search engine and one for excluding sites from it.