Indexable File Formats
This document lists the file formats that the Google Search Appliance can crawl, index, and search.
Overview
Crawled with empty body: Conversion error
How the Google Search Appliance Determines the Document Title
The search appliance ignores the title tag in a web page if it has only one character.
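The rule above can be illustrated with a minimal Python sketch. It is not the search appliance's implementation: the function name `usable_title` is hypothetical, and the sketch assumes that surrounding whitespace is trimmed before the characters are counted. What the appliance uses in place of an ignored title is not covered here, so the sketch simply returns `None`.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def usable_title(html):
    """Return the <title> text, or None if it is missing or one character long.

    Assumption: whitespace is trimmed before counting characters.
    """
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.parts).strip()
    if len(title) <= 1:
        return None
    return title
```

For example, `usable_title("<title>A</title>")` yields `None`, while a page titled "Annual Report" keeps its title.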
PDF Documents
XLS Documents
The search appliance uses the Properties > Title property as the title in the search index. If the search appliance is unable to do this, it uses the name of the first worksheet.
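The fallback order described above can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the sketch assumes that "unable" means the Title property is missing or blank; the appliance may treat other cases as failures too.

```python
def choose_xls_title(properties_title, sheet_names):
    """Pick a title for a spreadsheet using the fallback order described above.

    properties_title: value of the Properties > Title field, or None.
    sheet_names: worksheet names in workbook order.
    """
    # Prefer the Title property when it holds real text (assumption:
    # a blank or whitespace-only value counts as unavailable).
    if properties_title and properties_title.strip():
        return properties_title.strip()
    # Otherwise fall back to the name of the first worksheet.
    if sheet_names:
        return sheet_names[0]
    return None
```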
<meta http-equiv="Content-Type" content="text/html; charset=Latin1">
<meta name="Producer" content="Acrobat Distiller 4.05 for Windows">
<meta name="ModDate" content="D:20011129112148-06'00'">
<meta name="Author" content="Charles Dickens">
<meta name="CreationDate" content="D:20011129112114">
<meta name="Creator" content="Microsoft Word 9.0">
Text Documents
Indexable Word Processing Formats
The following table lists supported word processing formats.
Indexable Spreadsheet Formats
The following table lists supported spreadsheet formats.
Versions 3.0 through 2010 (2007 with extensions xlsx and xlsm)
StarOffice versions 5.2–9.0 and OpenOffice version 1.1 (text only)
Indexable Database Formats
The following table lists supported database formats.
Indexable Graphics Formats
Versions 1.0–1.7 (Acrobat Versions 1–9, including Japanese PDF)
Indexable Presentation Formats
The following table lists supported presentation formats.
StarOffice versions 5.2–9.0 and OpenOffice version 1.1 (text only)
Indexable Email Formats
The following table lists supported email formats.
mht, multipart (alternative, digest, mixed, newsgroup, signed), tnef
Indexable Multimedia Formats
The following table lists supported multimedia formats.
Indexable Archive Formats
The following table lists supported archive formats.
Note that the search appliance indexes only the file names and the plain text files inside an archive.
7z Self Extracting .exe (BZIP2 and split archives not supported)
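To make the scope of archive indexing concrete, the following Python sketch collects what the note describes: every file name in an archive, plus the contents of the plain text files only. It is an illustration, not the appliance's code. It uses a ZIP archive as a stand-in, the name `index_archive` is hypothetical, and it assumes that "plain text" means files ending in `.txt`.

```python
import io
import zipfile


def index_archive(data, text_suffixes=(".txt",)):
    """Return (file_names, text_by_name) for a ZIP archive held in memory.

    Every member's name is recorded, but content is read only for members
    whose name ends in one of text_suffixes (an assumed definition of
    "plain text").
    """
    names = []
    texts = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            names.append(info.filename)
            if info.filename.lower().endswith(text_suffixes):
                raw = archive.read(info)
                texts[info.filename] = raw.decode("utf-8", errors="replace")
    return names, texts
```

With an archive that holds `notes.txt` and `report.pdf`, both names are returned, but only the text of `notes.txt` is captured.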
To enable the search appliance to crawl these types of compressed files, comment out these file types under Do Not Follow Patterns on the Content Sources > Web Crawl > Start and Block URLs page.
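As a rough illustration of the edit described above, the Python sketch below comments out matching entries in a block of pattern text before it is pasted back into the page. It rests on assumptions that should be checked against the Admin Console: that patterns are listed one per line, that a leading `#` marks a line as a comment, and that the sample pattern strings (such as `.zip$`) resemble the real entries. The function name `comment_out` is hypothetical.

```python
def comment_out(pattern_text, suffixes):
    """Prefix '#' to each active pattern line that mentions one of the suffixes.

    Assumptions: one pattern per line, and a leading '#' marks a comment.
    Lines that are blank or already commented are left unchanged.
    """
    result = []
    for line in pattern_text.splitlines():
        stripped = line.strip()
        is_active = bool(stripped) and not stripped.startswith("#")
        if is_active and any(suffix in stripped for suffix in suffixes):
            result.append("#" + line)
        else:
            result.append(line)
    return "\n".join(result)
```

Reviewing the returned text by hand before saving it is advisable, since a substring match can catch more patterns than intended.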
Other Indexable Formats
The following table lists other supported formats.