How do I block viewing of pages in the Google cache?


This article applies to:

  • WebMarshal

Question:

How do I block viewing of pages in the Google cache?

Symptoms:

  • Web pages in the Google cache are not blocked.

  • Cached pages may have banned content.

  • The previous WebMarshal TextCensor solution for this issue no longer works (see the Note section).

Procedure:

Because Google is a search engine, by default WebMarshal excludes pages on Google sites from TextCensoring. 

A policy enforcement problem can arise because Google allows users to view cached copies of many of the pages it indexes. The content of these pages may violate your Acceptable Use Policy, but because pages on Google sites are excluded from TextCensoring, WebMarshal allows users to view them.

Because the Google cache is served from the same URL path as Google's search pages, a site blocking rule cannot block the cache without also affecting Google search.

To block browsing to pages from the Google cache, use a WebMarshal TextCensor Script and rule.

Add a TextCensor Script

  1. Create a new TextCensor script with the default trigger level of 5.
  2. Add the following phrase as an item with the default weighting of 5:
    It is a snapshot of the page as it appeared
  3. Save the TextCensor Script with the name "Google Cache".
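
The relationship between the item weighting and the trigger level can be sketched in a few lines of Python. This is only an illustration of weighted phrase scoring, not WebMarshal's actual TextCensor engine: the names ScriptItem and script_triggers are hypothetical, and the sketch assumes case-insensitive substring matching with each phrase counted once.

```python
from dataclasses import dataclass


@dataclass
class ScriptItem:
    """One phrase in a script, with its weighting (hypothetical model)."""
    phrase: str
    weight: int


def script_triggers(page_text: str, items: list[ScriptItem], trigger_level: int) -> bool:
    """Return True when the summed weights of matched phrases reach the trigger level.

    Assumption: case-insensitive substring match, each phrase scored at most once.
    """
    text = page_text.lower()
    score = sum(item.weight for item in items if item.phrase.lower() in text)
    return score >= trigger_level


# The "Google Cache" script from the steps above: one phrase, weighting 5.
GOOGLE_CACHE = [ScriptItem("It is a snapshot of the page as it appeared", 5)]
TRIGGER_LEVEL = 5
```

With a single item whose weighting equals the trigger level, one occurrence of the phrase is enough to trigger the script, which is the behavior the steps above rely on.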

Add a Rule

  1. Create a new Content Analysis rule (also known as a Content Download rule).
  2. If prompted, choose the rule type "Web Page Text Content".
  3. On the Rule Conditions page of the new rule wizard, click the hyperlink "Script" and select the "Google Cache" script.
  4. On the Rule Actions page of the wizard, make sure the action "Block request" is selected.
  5. Clear the checkbox for the action "add URL to category." This rule should not add the URL to a category.
  6. Optionally choose a classification for this request.
  7. Save the Rule with the name "Deny Google Cache".
  8. Move the new Rule to the top of the order of evaluation for TextCensor rules. It must be evaluated before any other TextCensor rule that could trigger on the page content.
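
The reason for step 8 can be illustrated with a small Python sketch of ordered rule evaluation. It assumes that the first rule whose condition triggers decides the action, which is an assumption made for illustration and not a description of WebMarshal internals. The evaluate function and the second rule are hypothetical.

```python
from typing import Callable, Optional

# (rule name, condition on page text, action); a hypothetical representation.
Rule = tuple[str, Callable[[str], bool], str]


def evaluate(rules: list[Rule], page_text: str) -> Optional[tuple[str, str]]:
    """Return (rule name, action) for the first rule that triggers, else None."""
    for name, condition, action in rules:
        if condition(page_text):
            return name, action
    return None


NOTICE = "it is a snapshot of the page as it appeared"

rules: list[Rule] = [
    ("Deny Google Cache", lambda text: NOTICE in text.lower(), "block"),
    # Hypothetical later rule that would also trigger on a cached page.
    ("Hypothetical Other Rule", lambda text: "snapshot" in text.lower(), "permit"),
]
```

Under this assumption, placing "Deny Google Cache" after the other rule would let the other rule's action apply to cached pages, so the block would never take effect.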

Reload the WebMarshal configuration.

Note:

  • The default configuration in WebMarshal 6.0 and above includes a TextCensor script and rule similar to those described above.
  • Google occasionally changes the text of its cache notice. The text suggested in this article matches the text Google used when the article was last updated. If Google changes the notice, adjust the phrase in your TextCensor script to match.
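
If the rule stops triggering, one way to check whether the notice text has changed is to save the HTML of a cached page and test it for the phrase outside WebMarshal. The following Python sketch uses only the standard library. The function names are hypothetical, and how WebMarshal itself extracts text from a page is not covered by this article, so treat the result as a rough indication only.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the character data of an HTML document, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        # Note: this also receives the contents of <script> and <style> elements.
        self.chunks.append(data)


def visible_text(html: str) -> str:
    """Strip tags and collapse runs of whitespace into single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.chunks)).strip()


def notice_present(html: str, phrase: str) -> bool:
    """Return True if the phrase occurs in the page text (case-insensitive)."""
    return phrase.lower() in visible_text(html).lower()
```

For example, notice_present(saved_html, "It is a snapshot of the page as it appeared") returning False for a freshly saved cached page would suggest that the phrase in the "Google Cache" script needs updating.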

This article was previously published as:
NETIQKB35671

Last Modified 11/14/2008.
https://support.trustwave.com/kb/KnowledgebaseArticle10242.aspx