Admin Console Help

Content Sources > Web Crawl > Proxy Servers

Use the Content Sources > Web Crawl > Proxy Servers page to configure a proxy server through which the search appliance crawls content outside your internal network, so that the crawled data can be included in your index.

Before Starting this Task

Before configuring a proxy server, complete the tasks shown in the following table.

Task | Description
Identify URL patterns to crawl | Identify the URL patterns that need to be crawled through a proxy server. The patterns must conform to the section "Rules for Valid URL Patterns" in "Administering Crawl: Constructing URL Patterns," which is linked to the Google Search Appliance help center.
Locate the proxy server address | Locate the IP address or fully-qualified domain name of the proxy server.
Determine the proxy server port | Determine the port at which the proxy server listens for requests.
Add to host load exceptions | Add the proxy server to the Exceptions to Web Server Host Load.
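
Before entering the proxy address and port in the Admin Console, it can help to confirm from a machine on the same network that the proxy accepts TCP connections on that port. The following Python sketch is one way to do that. The function name proxy_port_is_open is an illustrative choice and is not part of the search appliance. A successful connection only shows that something is listening on the port, not that it is a working proxy.

```python
import socket


def proxy_port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, proxy_port_is_open("proxy.example.com", 3128) returns True only if the host resolves and accepts a connection on that port. Both values are placeholders; substitute the address and port of your own proxy server.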

Configuring a Proxy Server

To configure a proxy server:

  1. Under Proxy Servers, in the For URLs Matching Pattern text box, specify a URL pattern that you want the search appliance to crawl through a proxy server.
  2. In the Use This Proxy Server text box, specify the IP address or fully-qualified domain name of the proxy server to use for crawling URLs that match the pattern.
  3. In the On Port text box, specify the port on which the proxy server listens.
  4. If you need more rows for additional URL patterns or proxy servers, click the Add More Rows button.
  5. Click Save.

Authenticating to a Proxy Server

When the search appliance is crawling content, it can authenticate to a proxy server that supports Basic authentication. To enable authenticating to a proxy server, add a Proxy-Authorization header for the crawler in the Additional HTTP Headers for Crawler box on the Content Sources > Web Crawl > HTTP Headers page.

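Note that headers entered in the Additional HTTP Headers for Crawler box are sent with every crawl request, regardless of destination. The Proxy-Authorization header, and the credentials it carries, is therefore also sent to servers and proxies that it is not intended for.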

A Proxy-Authorization header uses the following format:

Proxy-Authorization:credentials

For example, suppose that you want the search appliance to authenticate to a proxy server with the username "username" and the password "password". In this instance, add the following Proxy-Authorization header:

Proxy-Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=

Where the encoded string is "username:password" base64 encoded, with no trailing newline.

To encode the username and password in base64 on Linux or Unix, enter the following commands. Use printf rather than echo, because echo appends a newline that would be encoded as part of the credentials:

$ printf '%s' 'username:password' > /tmp/foo
$ uuencode -m /tmp/foo /tmp/bar
begin-base64 666 /tmp/bar
dXNlcm5hbWU6cGFzc3dvcmQ=
====
$ rm /tmp/foo
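
The same header value can also be produced without a temporary file. The sketch below uses Python's standard base64 module. The function name proxy_authorization_header is an illustrative choice and is not part of the search appliance. It encodes exactly the bytes of username:password, with no trailing newline, which is the form that Basic authentication expects.

```python
import base64


def proxy_authorization_header(username: str, password: str) -> str:
    """Build a Proxy-Authorization header line for Basic authentication."""
    # Basic credentials are "username:password", base64 encoded.
    # No newline is appended, so none ends up inside the encoded value.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Proxy-Authorization: Basic {token}"


print(proxy_authorization_header("username", "password"))
# Proxy-Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

Base64 is an encoding, not encryption: any server that receives this header can decode it and recover the username and password.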

For More Information

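For more information about URL patterns used for crawling, see "Administering Crawl: Constructing URL Patterns," which is linked to the Google Search Appliance help center.
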
© Google Inc.