Content Sources > Web Crawl > Coverage Tuning

Use the Content Sources > Web Crawl > Coverage Tuning page to control the number of URLs the search appliance crawls for a site. For example, you might want to limit the number of URLs the search appliance crawls in one site and allocate more space for other sites.

You tune crawl coverage by entering a URL pattern and setting the maximum number of URLs to crawl for it. The URL patterns you provide must conform to the "Rules for Valid URL Patterns" in "Administering Crawl: Constructing URL Patterns," which is linked from the Google Search Appliance help center.

Note that coverage tuning limits behave differently from the license limit. When a coverage tuning limit is reached, the search appliance crawls no new URLs that match the coverage tuning pattern. In contrast, when the license limit is reached, the search appliance discards crawled URLs with a lower priority (PageRank) in favor of crawling new URLs with a higher priority.
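
The difference between the two limits can be illustrated with a small Python sketch. This is a toy model only, not the search appliance's implementation: the function names are hypothetical, URL patterns are simplified to plain prefixes, and priority is represented by an arbitrary number standing in for PageRank.

```python
import heapq


def crawl_with_coverage_limit(urls, prefix, max_urls):
    """Toy model of a coverage tuning limit.

    URLs are admitted in discovery order. Once max_urls URLs matching
    the prefix have been admitted, further matching URLs are skipped.
    URLs that do not match the prefix are unaffected.
    (Assumption: a plain prefix stands in for a real URL pattern.)
    """
    kept = []
    matched = 0
    for url in urls:
        if url.startswith(prefix):
            if matched >= max_urls:
                continue  # limit reached: no new matching URLs
            matched += 1
        kept.append(url)
    return kept


def crawl_with_license_limit(scored_urls, license_limit):
    """Toy model of the license limit.

    At most license_limit URLs are kept. When the limit is reached,
    a new URL with a higher priority replaces the kept URL with the
    lowest priority. (Assumption: priority is a plain number.)
    """
    heap = []  # min-heap of (priority, url); lowest priority on top
    for url, priority in scored_urls:
        if len(heap) < license_limit:
            heapq.heappush(heap, (priority, url))
        elif priority > heap[0][0]:
            heapq.heapreplace(heap, (priority, url))
    return sorted(url for _, url in heap)
```

In this model, a coverage limit of 2 on the prefix http://a.example.com/ keeps the first two matching URLs and ignores later ones, while a license limit of 2 keeps whichever two URLs have the highest priority, regardless of the order in which they were found.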

Tuning Crawl Coverage

To tune crawl coverage, create a crawler coverage configuration, and then recrawl the URL pattern if you increased its limit. To create the configuration:

  1. Under For URLs Matching Pattern, type the URL patterns to be limited.
  2. Under Specify the maximum number of URLs you wish to crawl, enter a number that is less than your license limit.
  3. Optionally, to add more lines, click Add More Rows.
  4. Click Save.
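
Before entering many rows, it can help to check a planned list of patterns and limits against the rule in step 2. The following Python sketch is one hypothetical way to do that offline. The function name and row format are assumptions made for illustration, as is the requirement that a limit be at least 1; the sketch does not check patterns against the "Rules for Valid URL Patterns" and does not interact with the search appliance.

```python
def validate_coverage_rows(rows, license_limit):
    """Check planned (pattern, max_urls) rows before entering them.

    Returns a list of problem descriptions; the list is empty when
    every row has a non-empty pattern and a limit below license_limit.
    (Assumption: a limit must also be at least 1.)
    """
    problems = []
    for pattern, max_urls in rows:
        if not pattern.strip():
            problems.append(f"empty pattern for limit {max_urls}")
        elif not 0 < max_urls < license_limit:
            problems.append(
                f"{pattern}: limit {max_urls} must be between 1 "
                f"and {license_limit - 1}"
            )
    return problems


# Example rows; the URLs and the license limit are placeholders.
planned_rows = [
    ("http://a.example.com/", 500),
    ("http://b.example.com/", 2000),
]
for problem in validate_coverage_rows(planned_rows, license_limit=1000):
    print(problem)
```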

When you increase the coverage tuning value for a URL pattern to add more documents, you must recrawl the URL pattern manually. To recrawl a URL pattern, either click Recrawl this Pattern on the Index > Diagnostics > Index Diagnostics page or follow these steps:

  1. Click Content Sources > Web Crawl > Freshness Tuning.
  2. Under Recrawl these URL Patterns, type the URL patterns that you entered for coverage tuning.
  3. Click Recrawl.

