Admin Console Help
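Content Sources > Web Crawl > Freshness Tuning
Use the Content Sources > Web Crawl > Freshness Tuning page to fine-tune the timing of crawls for different URLs. You can fine-tune crawling by specifying URL patterns to crawl frequently, URL patterns to crawl infrequently, URL patterns that are always recrawled, and URL patterns to recrawl soon.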
Before Starting this Task
Before fine-tuning the timing of crawls on different URLs, complete the tasks listed in the following table.
Specifying URL Patterns to Crawl Frequently
Use Crawl Frequently for URL patterns that match content that changes frequently, as often as once an hour or even every few minutes. Crawling these URLs frequently keeps your serving index fresh. Overloading the frequently changing content section can slow the system down, so keep the number of URLs fairly small to avoid reduced performance. To set options for crawling frequently changing content:
Specifying URL Patterns to Crawl Infrequently
Use Crawl Infrequently to index documents that are never updated or modified, such as a stable database, or that are only added to incrementally, such as a mail or news archive. With this option, you can set the crawler to crawl them once a week, once a month, or no more than once every 3 months. This reduces the load on your web servers. To set options for crawling archival servers:
Specifying Always Force Recrawl of URL Patterns
The first time URLs are crawled, the data is indexed and stored on disk. Subsequently, to allow for faster crawls and less load on the servers, only files modified after the date in the appliance's If-Modified-Since request header are recrawled. These updates are added to the index. Type URL patterns in the Always Force Recrawl section only if out-of-date pages are displayed in your index. The crawler attempts to determine which servers contain content with incorrect dates and attempts to adjust automatically, but other types of errors may be present. Make sure that your servers maintain the correct time. If you think one or more of your web servers does not support the If-Modified-Since option or is misconfigured, use this section to type URL patterns to recrawl. Refer problems with your web servers to your webmaster. To force recrawling certain URL patterns, regardless of your web server's response to If-Modified-Since:
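Separately from the console steps, it can help to check how a web server actually answers an If-Modified-Since request before adding its URLs to this section. The Python sketch below is a hypothetical diagnostic, not part of the search appliance and not a description of how the crawler itself decides. The function names `classify_response` and `check_url` and the result strings are assumptions made for illustration. It sends a conditional GET and interprets the reply: a 304 status suggests the server honors the header, while a 200 status with a Last-Modified date no later than the date sent suggests the header is being ignored.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def classify_response(status: int, last_modified: Optional[str], since: datetime) -> str:
    """Interpret a reply to a conditional GET; `since` must be timezone-aware."""
    if status == 304:
        return "honors If-Modified-Since"
    if status != 200:
        return f"unexpected status {status}"
    if last_modified is None:
        return "no Last-Modified header"
    modified = parsedate_to_datetime(last_modified)
    if modified.tzinfo is None:
        # Treat a date without a usable zone as UTC so the comparison is valid.
        modified = modified.replace(tzinfo=timezone.utc)
    if modified <= since:
        # Full content came back even though nothing changed after `since`.
        return "ignores If-Modified-Since"
    return "content changed since the given date"


def check_url(url: str, since: datetime) -> str:
    """Send a GET with If-Modified-Since and classify the server's answer."""
    headers = {"If-Modified-Since": format_datetime(since, usegmt=True)}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return classify_response(
                response.status, response.headers.get("Last-Modified"), since
            )
    except urllib.error.HTTPError as err:
        # urllib reports non-2xx replies, including 304, as HTTPError.
        return classify_response(err.code, err.headers.get("Last-Modified"), since)
```

For example, `check_url("http://www.example.com/", datetime(2021, 1, 1, tzinfo=timezone.utc))` would query a placeholder URL. The `since` value must use `timezone.utc`, because `format_datetime` with `usegmt=True` rejects other time zones.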
Specifying Recrawl of URL Patterns
If you discover that a set of URLs has not been recrawled recently (usually because changes were made to the web pages or because a temporary error or misconfiguration was present when the crawler last tried to crawl the URL), you can type the pattern in the Recrawl these URL Patterns box to inject it quickly into the queue of URLs the search appliance is crawling. The URL is crawled soon, unless there are higher priority URLs in the queue. To have the search appliance recrawl a URL pattern:
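As a conceptual aside, the statement that an injected URL is crawled soon unless higher priority URLs are queued can be pictured with an ordinary priority queue. The Python sketch below is only an illustration under assumed names and values: the `CrawlQueue` class, the numeric priorities, and the example URLs are hypothetical, and the search appliance's actual queue is not documented here.

```python
import heapq
import itertools


class CrawlQueue:
    """Toy priority queue: a lower priority number is crawled sooner."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so equal priorities keep insertion order.
        self._counter = itertools.count()

    def add(self, url, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def next_url(self):
        return heapq.heappop(self._heap)[2]


queue = CrawlQueue()
queue.add("http://example.com/archive/2001.html", priority=5)
queue.add("http://example.com/news/today.html", priority=0)
# An injected recrawl request jumps ahead of routine work,
# but still waits behind the higher priority news page.
queue.add("http://example.com/products/list.html", priority=1)

order = [queue.next_url() for _ in range(3)]
```

In this sketch the injected products page is returned second, after the news page and before the archive page.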
For More Information
For detailed information about freshness tuning, see "Administering Crawl: Advanced Topics," which is linked to the Google Search Appliance help center.
© Google Inc.