INFO: Changes in TextCensor behavior (TextCensor 2)


This article applies to:

Trustwave SEG 7.2 and above
MailMarshal SPE 3.6 and above
WebMarshal 6.9.6 and above
TextCensor functionality

Question:

What are the changes in behavior in TextCensor 2?
What differences in behavior could be caused by upgrading to TextCensor 2?
Error message: Script too big, exceeded maximum number of DFA states

Information:

The product versions named above use TextCensor 2.

When you upgrade to the new versions, existing TextCensor scripts and expressions are migrated to the new syntax.

For SEG, to validate the changes and list any scripts that require manual update, use the TextCensor Upgrade Preview tool. You can download this tool from the product upgrade page (requires customer login).
For MailMarshal SPE upgrade, see the upgrade document on the SPE upgrade page.

Unicode and wide characters

In WebMarshal 6.9.6 and above, TextCensor 2 works with double-byte text, including non-Roman alphabets (such as Hebrew and Arabic), and non-alphabetic languages (such as Chinese).

SEG and SPE do not currently support searching for double-byte text. Adding this functionality to SEG is under investigation for a later release, to be announced.

Upgrade Considerations

Customized scripts that contain thousands of items can cause an error on upgrade ("Script too big, exceeded maximum number of DFA states").
- If you encounter this issue, revise the script to use fewer expressions. Eliminate old and irrelevant entries. Trustwave strongly recommends against TextCensor scripts with large numbers of items, since they can cause significant performance issues.
- Alternatively, split the script into two or more segments.
- Be aware that wildcard matching provides additional options in TextCensor 2.
"Increasing" and "decreasing" score options are no longer supported. Scripts are automatically updated to use the "maximum matches" option.
In rare cases, an item contains a series of keywords that cannot be upgraded automatically. You must change these items manually. You can contact Support for assistance.
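
As an illustration of the "split the script" advice above, the following Python sketch removes blank and duplicate entries from a list of expressions and divides the rest into smaller segments. It is not a Trustwave tool. The function name split_script and the max_items parameter are hypothetical, and the article does not state how many items fit within the DFA state limit, so the segment size is something to find by testing.

```python
from typing import List


def split_script(expressions: List[str], max_items: int) -> List[List[str]]:
    """Split expressions into segments of at most max_items entries each.

    Blank entries and exact duplicates are dropped first, keeping the
    original order. The comparison is case sensitive on purpose, because
    quoted TextCensor text is case sensitive.
    """
    if max_items < 1:
        raise ValueError("max_items must be at least 1")

    seen = set()
    cleaned = []
    for expr in expressions:
        expr = expr.strip()
        if expr and expr not in seen:
            seen.add(expr)
            cleaned.append(expr)

    # Slice the cleaned list into consecutive chunks.
    return [cleaned[i:i + max_items] for i in range(0, len(cleaned), max_items)]


if __name__ == "__main__":
    # Hypothetical expressions, for illustration only.
    for segment in split_script(["alpha", "beta", "alpha", "", "gamma"], 2):
        print(segment)
```

Each resulting segment would then be entered as a separate TextCensor script.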

Syntax Differences

There are minor differences in the treatment of word boundaries and quoted text between the two versions of TextCensor. These differences affect the upgrade of TextCensor "items" (now called "expressions").

In the earlier version of TextCensor, all non-word characters (such as punctuation) were treated as word breaks by default, unless added to a list of special characters. The new version of TextCensor does not treat most symbols as word breaks (for details see the User Guide).
Double quotes are used in both versions to mark text as case sensitive.
In the new version, double quotes MUST be preceded by or followed by white space (or the beginning or ending of the expression).
- You can "escape" the quote by preceding it with a \ character. It will then be treated as a literal.
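
The quote placement rule above can be checked before an upgrade with a short Python sketch such as the one below. It applies the rule exactly as worded in this article: an unescaped double quote is accepted if it has white space (or the start or end of the expression) on at least one side. The function name misplaced_quotes is hypothetical, and the sketch assumes that a single preceding \ is all that is needed to escape a quote. The User Guide remains the authority on the full syntax.

```python
from typing import List


def misplaced_quotes(expression: str) -> List[int]:
    """Return the positions of unescaped double quotes that have
    non-space text on both sides."""
    bad = []
    last = len(expression) - 1
    for i, ch in enumerate(expression):
        if ch != '"':
            continue
        if i > 0 and expression[i - 1] == "\\":
            continue  # escaped quote, treated as a literal character
        before_ok = i == 0 or expression[i - 1].isspace()
        after_ok = i == last or expression[i + 1].isspace()
        if not (before_ok or after_ok):
            bad.append(i)
    return bad


if __name__ == "__main__":
    for expr in ['a "b" c', 'a"b"c', 'a\\"b']:
        print(repr(expr), misplaced_quotes(expr))
```

An empty result means that no quote in the expression breaks the rule as stated here.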

Following from the above considerations, when a TextCensor item is migrated, space is added as necessary to make a legal TextCensor 2 expression. In some cases matching behavior is changed.

Original item	Expression after migration	Comments
a"b"c	a "b" c	Closely matches the old behavior
a"&@#"c	a&@#c	Case sensitivity and boundaries lost
a"&@#" c	a&@# c	Case sensitivity and first boundary lost
a "&@#"c	a &@#c	Case sensitivity and second boundary lost

The original TextCensor would match continuous sequences or split sequences. The migration/upgrade process assumes that where symbols are combined in the absence of spaces, the intent was to match the sequence literally. Case sensitivity is lost on sequences where symbols combine with alphanumeric strings.

Case sensitivity could be lost in some other cases where non-alphanumeric characters are used in literal sequences.

However, an expression consisting entirely of a single quoted string will preserve case sensitivity across the entire expression.
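
To make the migration examples easier to experiment with, the Python sketch below reproduces the four rows of the table above. It is a heuristic inferred only from those rows and the surrounding text, not the actual migration logic used by the product: a quoted run of letters and digits keeps its quotes and gains spaces where needed, a quoted run that contains symbols loses its quotes, and an expression that is entirely one quoted string is left unchanged. The name migrate_item is hypothetical, and escaped quotes are not handled.

```python
import re

# A quoted run with no embedded double quotes.
QUOTED = re.compile(r'"([^"]*)"')


def migrate_item(item: str) -> str:
    """Approximate the migration of an old TextCensor item (assumed rules)."""
    # A single quoted string covering the whole item is kept as-is.
    if QUOTED.fullmatch(item):
        return item

    def repl(match):
        content = match.group(1)
        # Assumption: any symbol inside the quotes means "match literally",
        # so the quotes (and the case sensitivity) are dropped.
        if not all(c.isalnum() or c.isspace() for c in content):
            return content
        # Otherwise keep the quotes and pad with spaces where they touch text.
        before = item[match.start() - 1] if match.start() > 0 else " "
        after = item[match.end()] if match.end() < len(item) else " "
        left = "" if before.isspace() else " "
        right = "" if after.isspace() else " "
        return f'{left}"{content}"{right}'

    return QUOTED.sub(repl, item)


if __name__ == "__main__":
    # Rows taken from the migration table in this article.
    rows = [
        ('a"b"c', 'a "b" c'),
        ('a"&@#"c', 'a&@#c'),
        ('a"&@#" c', 'a&@# c'),
        ('a "&@#"c', 'a &@#c'),
    ]
    for original, expected in rows:
        assert migrate_item(original) == expected
        print(original, "->", migrate_item(original))
```

Items outside these patterns may migrate differently in the product. For SEG, the TextCensor Upgrade Preview tool mentioned earlier is the way to see the real result.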

Note:

These syntax differences are unlikely to affect most uses of TextCensor to look for phrases in ordinary language. They are more likely to affect scripts aimed specifically at JavaScript or other code.
