How are Regular Expressions used with Firewall Suite?


This article applies to:

  • Firewall Suite 4.X

Question:

  • What are Regular Expressions?
  • How are Regular Expressions used with Firewall Suite?

Note: For help with Regular Expressions in MailMarshal SMTP, see the following Knowledge Base article:
  Q10520: How do I use Header Rewrite / Regular Expressions?

Information:

Regular expressions (REs) provide a powerful means of matching patterns of characters. They are understood by a number of commands, including ed, ex, sed, awk, grep, egrep, expr, and even vi.

Refer to the following sections of this article to learn more about Regular Expressions and how to use them with Firewall Suite.

Building Regular Expressions (Examples)

Most regular expressions you will ever need to use are very simple, often consisting of a few basic elements.

Example 1:
If you wanted to match all of the values that begin with "couch", your regular expression would be as follows:

^couch

Example 2:
If you wanted to match all of the values that end with "couch", your regular expression would be as follows:

couch$

Example 3:
In some cases, you may have an either/or situation. In this case, use the pipe symbol (|) to combine two regular expressions. For example, couch|chair would match a value containing either couch or chair, e.g. blue_chair, chair_55, or big_couch_55.
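
Examples 1 through 3 can be tried out with a short script. The sketch below uses Python's re module purely as a stand-in; Firewall Suite's own engine may differ in details. The sample values and the matching helper are made up for illustration.

```python
import re

# Hypothetical sample values, loosely based on the ones named in Example 3.
values = ["couch_55", "big_couch", "blue_chair", "chair_55", "big_couch_55", "table"]

def matching(pattern, items):
    """Return the items in which the pattern is found anywhere in the value."""
    return [item for item in items if re.search(pattern, item)]

begins_with_couch = matching(r"^couch", values)   # Example 1
ends_with_couch = matching(r"couch$", values)     # Example 2
couch_or_chair = matching(r"couch|chair", values) # Example 3

print(begins_with_couch)  # ['couch_55']
print(ends_with_couch)    # ['big_couch']
print(couch_or_chair)     # every value except 'table'
```

Note that re.search looks for the pattern anywhere in the value, which mirrors the "contains" behavior described in this article; the ^ and $ anchors are what restrict a match to the beginning or end.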

Example 4:
In this example, you are trying to match three months of your product news. You might use the following regular expression to define a qualifying page URL that contains any product news HTML pages from January, February or March:

/product/news/(jan|feb|mar)/.+\.htm

Literally, this reads:
Match any item (most likely a URL), containing the following:
/product/news/ , followed by either jan, feb, or mar , followed by / and one or more of any character (.+), followed by .htm.

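This would match the following URLs (the last one qualifies because the expression does not end with $, so additional characters may follow .htm):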

/product/news/jan/chair.htm
/product/news/feb/mirror.htm
/product/news/mar/couch.htm
/product/news/jan/table.htm
/product/news/jan/table.html

but not:

/product/news/jan/chair.asp
/product/news/jan/chair.gif
/product/news/jan/.htm
/product/news/apr/chair.htm
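
The two URL lists above can be checked mechanically. This is a minimal sketch that assumes Python's re module behaves like the Firewall Suite engine for this expression; the variable names are illustrative only.

```python
import re

# The expression from Example 4, unchanged.
pattern = re.compile(r"/product/news/(jan|feb|mar)/.+\.htm")

urls = [
    "/product/news/jan/chair.htm",
    "/product/news/feb/mirror.htm",
    "/product/news/mar/couch.htm",
    "/product/news/jan/table.htm",
    "/product/news/jan/table.html",
    "/product/news/jan/chair.asp",
    "/product/news/jan/chair.gif",
    "/product/news/jan/.htm",
    "/product/news/apr/chair.htm",
]

# search() finds the pattern anywhere in the string (an unanchored match).
matched = [url for url in urls if pattern.search(url)]
rejected = [url for url in urls if not pattern.search(url)]

# Variation (not part of the original example): a trailing $ anchors the end,
# so table.html no longer qualifies.
strict = re.compile(r"/product/news/(jan|feb|mar)/.+\.htm$")
strict_matched = [url for url in urls if strict.search(url)]

print(len(matched), len(rejected), len(strict_matched))  # 5 4 4
```

If .html pages should be excluded, ending the expression with $ as in the strict variant is one way to do it.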

Example 5:
In this example, we want to match all URLs that indicate that an individual product in the furniture category has been registered. We would use the following regular expression to define our qualifying page URL:

^/product/furniture/.+/register.htm

Literally, this reads:
Match all URLs that begin with /product/furniture/, followed by one or more occurrences of any character, followed by /register.htm.

The following URLs would be matched:

/product/furniture/couch/register.htm
/product/furniture/chair/register.htm
/product/furniture/bedroom/armoire/register.htm

but not:

/product/furniture/index.htm
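
Example 5 can be verified the same way. The sketch below again assumes Python's re module as a stand-in for the product's engine. It also shows a side effect of the expression as written: the "." in register.htm is not escaped, so it matches any character. The register_htm URL is a made-up test value.

```python
import re

# The expression from Example 5, unchanged.
pattern = re.compile(r"^/product/furniture/.+/register.htm")

urls = [
    "/product/furniture/couch/register.htm",
    "/product/furniture/chair/register.htm",
    "/product/furniture/bedroom/armoire/register.htm",
    "/product/furniture/index.htm",
]
matched = [url for url in urls if pattern.search(url)]

# Hypothetical URL with "_" where the period should be.
odd_url = "/product/furniture/couch/register_htm"
loose_hit = pattern.search(odd_url) is not None

# Escaping the period (\.) makes it match a literal "." only.
escaped = re.compile(r"^/product/furniture/.+/register\.htm")
strict_hit = escaped.search(odd_url) is not None

print(matched == urls[:3], loose_hit, strict_hit)  # True True False
```

In practice the unescaped period rarely causes false matches in URLs, but writing register\.htm states the intent more precisely.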

Comparing Regular Expressions with Wildcards

Refer to the table below to see how you might use a wildcard or regular expression to accomplish the same thing.

Wildcard (*)          Regular Expression    Meaning
*chair*               chair                 contains chair
*chair                chair$                ends with chair
chair*                ^chair                begins with chair
chair (no wildcard)   ^chair$               is exactly chair
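
The correspondence in the table can be expressed as a small conversion routine. The function below, wildcard_to_regex, is a hypothetical helper written for this illustration (it is not part of Firewall Suite), and it assumes Python's re module for escaping.

```python
import re

def wildcard_to_regex(wildcard):
    """Translate a '*' wildcard into an equivalent unanchored-search regex."""
    starts_open = wildcard.startswith("*")  # leading * means "anything before"
    ends_open = wildcard.endswith("*")      # trailing * means "anything after"
    core = wildcard.strip("*")
    # Escape literal text; a * in the middle becomes "any run of characters".
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return ("" if starts_open else "^") + body + ("" if ends_open else "$")

table = {w: wildcard_to_regex(w) for w in ["*chair*", "*chair", "chair*", "chair"]}
print(table)
# {'*chair*': 'chair', '*chair': 'chair$', 'chair*': '^chair', 'chair': '^chair$'}
```

The design choice here is to add an anchor only on a side that has no wildcard, which reproduces the four rows of the table exactly.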

Matching Order Rules

Several rules govern how regular expression matching occurs:

  1. The first match found takes priority over other matches found if there are two matching input strings.
  2. The left-most match takes priority in a list of concatenated expressions.
  3. The matches found using *, +, and ? are considered longest first.
  4. Nested constructs are evaluated from the outside in.
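
Some of these rules can be observed directly. The sketch below uses Python's re module, whose behavior is assumed (not confirmed) to agree with Firewall Suite's engine for these simple cases; the input strings are invented for the demonstration.

```python
import re

# Rule 1: the first match found in the input wins.
first_start = re.search(r"couch", "couch_and_couch").start()

# Rule 2: among alternatives, the match that begins left-most in the input wins.
leftmost = re.search(r"chair|couch", "big_couch_and_chair").group()

# Rule 3: *, + and ? try the longest possibility first.
longest_plus = re.search(r"co+", "coooch").group()
longest_optional = re.search(r"chairs?", "chairs").group()

print(first_start)       # 0
print(leftmost)          # couch
print(longest_plus)      # cooo
print(longest_optional)  # chairs
```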

Regular Expression Syntax

The basic element of a regular expression can be any of the following:

Basic element
  Example

a single character
  Matches anything containing that character. For example, a would match cause, bat, fan, and ant.

  You can also combine several characters, in which case a match is anything containing those characters in that order. For instance, ball would match basketball, ballerina, and ballroom.

\ followed by a single character
  Allows a special character to be treated as an ordinary character. For example, "." has a special meaning. The only way to use it to mean just "period" is to precede it with \. This is especially useful when describing paths, e.g. \.html$ (anything ending in .html). Other characters that need to be preceded by \ if they are to be used without special meaning are the following: \ . $ * ? + [ ] ( ) |

$
  Matches anything that ends with the value; e.g., cause$ would match cause and because.

^
  Matches anything that begins with the value; e.g., ^couch would match couches and couch.

.
  Matches any single character; e.g., cou.h would match couth, couch, and cough.

range [ ]
  Matches any single character from the set enclosed in brackets "[ ]"; e.g., [0-9] would match any decimal digit.

  If the set is preceded by a caret (^) inside the brackets, it matches any single character not in the set. For example, [^a-z] would match any character that is not in the range a through z.

|
  Joins multiple expressions; a value matches if it matches any one of them. For example, couch|chair would match a value containing either couch or chair.
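
The basic elements above can be exercised one at a time. This is a minimal sketch using Python's re module as an assumed stand-in for the product's engine; the found helper and the test strings are illustrative only.

```python
import re

def found(pattern, text):
    """True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None

cases = [
    (r"a", "bat"),                  # single character
    (r"ball", "ballroom"),          # several characters in sequence
    (r"\.html$", "index.html"),     # escaped period matches a literal "."
    (r"\.html$", "index_html"),     # ...and nothing else
    (r"cause$", "because"),         # $ anchors the end
    (r"^couch", "couches"),         # ^ anchors the beginning
    (r"cou.h", "cough"),            # . matches any single character
    (r"[0-9]", "chair_55"),         # range: any decimal digit
    (r"[^a-z]", "chair"),           # negated range: no such character here
    (r"couch|chair", "blue_chair"), # | joins alternatives
]
results = [found(pattern, text) for pattern, text in cases]
print(results)
# [True, True, True, False, True, True, True, True, False, True]
```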

This article was previously published as:
NETIQKB677

Last Modified 4/13/2006.
https://support.trustwave.com/kb/KnowledgebaseArticle10811.aspx