INFO: Changes in TextCensor behavior (TextCensor 2)

This article applies to:

  • Trustwave SEG 7.2 and above
  • MailMarshal SPE 3.6 and above
  • WebMarshal 6.9.6 and above
  • TextCensor functionality

Question:

  • What are the changes in behavior in TextCensor 2?
  • What differences in behavior could be caused by upgrading to TextCensor 2?
  • Error message: Script too big, exceeded maximum number of DFA states

Information:

The product versions named above use TextCensor 2.

When you upgrade to the new versions, existing TextCensor scripts and expressions are migrated to the new syntax.

  • For SEG, to validate the changes and list any scripts that require manual update, use the TextCensor Upgrade Preview tool. You can download this tool from the product upgrade page (requires customer login).
  • For MailMarshal SPE upgrade, see the upgrade document on the SPE upgrade page.

Unicode and wide characters

In WebMarshal 6.9.6 and above, TextCensor 2 works with double-byte text, including non-Roman alphabets (such as Hebrew and Arabic), and non-alphabetic languages (such as Chinese).

SEG and SPE do not currently support searching for double-byte text. This functionality is under investigation for addition to SEG in a later release to be announced.

 

    Upgrade Considerations

    • Customized scripts that contain thousands of items can cause an error on upgrade ("Script too big, exceeded maximum number of DFA states").
      • If you encounter this issue, revise the script to use fewer expressions. Eliminate old and irrelevant entries. Trustwave strongly recommends against TextCensor scripts with large numbers of items, since they can cause significant performance issues.
      • Alternatively, split the script into two or more segments. 
      • Be aware that wildcard matching provides additional options in TextCensor 2.
    • "Increasing" and "decreasing" score options are no longer supported. Scripts are automatically updated to use the "maximum matches" option.
    • In rare cases, an item contains a series of keywords that cannot be upgraded automatically. You must change these items manually. You can contact Support for assistance.
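One way to act on the advice above is to clean up and split a long expression list outside the product before re-entering it. The following Python sketch is illustrative only: `split_script` and `max_per_segment` are hypothetical names, and the segment size is an arbitrary choice, because this article does not state how many expressions trigger the DFA state limit.

```python
def split_script(expressions, max_per_segment):
    """Drop blank and duplicate entries, then split the rest into segments.

    Entries are compared exactly (no case folding), because quoted
    TextCensor text is case sensitive.
    """
    if max_per_segment < 1:
        raise ValueError("max_per_segment must be at least 1")

    seen = set()
    unique = []
    for expression in expressions:
        cleaned = expression.strip()
        # Skip empty lines and entries that were already kept.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)

    # Slice the cleaned list into consecutive segments of the chosen size.
    return [
        unique[start:start + max_per_segment]
        for start in range(0, len(unique), max_per_segment)
    ]


if __name__ == "__main__":
    items = ["invoice", "payment", "invoice", " urgent ", "", "wire transfer"]
    for number, segment in enumerate(split_script(items, 2), start=1):
        print(f"Segment {number}: {segment}")
```

Each resulting segment could then be entered as a separate script. Whether a given segment stays under the limit still has to be confirmed in the product itself.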

    Syntax Differences

    There are minor differences in the treatment of word boundaries and quoted text between the two versions of TextCensor. These differences affect the upgrade of TextCensor "items" (now called "expressions").

    • In the earlier version of TextCensor, all non-word characters (such as punctuation) were treated as word breaks by default, unless added to a list of special characters. The new version of TextCensor does not treat most symbols as word breaks (for details see the User Guide).
    • Double quotes are used in both versions to mark text as case sensitive.
    • In the new version, each double quote MUST be either preceded or followed by white space (or by the beginning or end of the expression).
      • You can "escape" the quote by preceding it with a \ character. It will then be treated as a literal.
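The quoting rule above can be checked mechanically before an expression is entered. The Python sketch below applies a literal reading of that rule: every unescaped double quote needs white space, or the start or end of the expression, on at least one side. The function name `misplaced_quotes` is hypothetical, and the check is a simplification that does not cover the rest of the TextCensor 2 syntax.

```python
def misplaced_quotes(expression):
    """Return the positions of unescaped double quotes that break the rule.

    Assumption: a backslash escapes the character that follows it, so an
    escaped quote is treated as a literal and never reported.
    """
    problems = []
    index = 0
    length = len(expression)
    while index < length:
        char = expression[index]
        if char == "\\":
            # Skip the backslash and the character it escapes.
            index += 2
            continue
        if char == '"':
            before_ok = index == 0 or expression[index - 1].isspace()
            after_ok = index == length - 1 or expression[index + 1].isspace()
            if not (before_ok or after_ok):
                problems.append(index)
        index += 1
    return problems


if __name__ == "__main__":
    for sample in ['a"b"c', 'a "b" c', 'a\\"b', '"Hello"']:
        print(repr(sample), misplaced_quotes(sample))
```

An empty result means that no quote in the expression violates this reading of the rule.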

    Following from the above considerations, when a TextCensor item is migrated, space is added as necessary to make a legal TextCensor 2 expression. In some cases matching behavior is changed.

    Original item    Expression after migration    Comments
    a"b"c            a "b" c                       Closely matches the old behavior
    a"&@#"c          a&@#c                         Case sensitivity and boundaries lost
    a"&@#" c         a&@# c                        Case sensitivity and first boundary lost
    a "&@#"c         a &@#c                        Case sensitivity and second boundary lost

    The original TextCensor would match continuous sequences or split sequences. The migration/upgrade process assumes that where symbols are combined in the absence of spaces, the intent was to match the sequence literally. Case sensitivity is lost on sequences where symbols combine with alphanumeric strings.

    Case sensitivity could be lost in some other cases where non-alphanumeric characters are used in literal sequences.

    However, an expression consisting entirely of a single quoted string will preserve case sensitivity across the entire expression.
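To make the migration examples easier to reason about, the Python sketch below reproduces the four documented rows with a toy rule. This rule is inferred only from those rows and from the statement about single quoted strings. It is not the product's migration logic, and `migrate_item_toy` is a hypothetical name. It assumes that a quoted alphanumeric run keeps its quotes and gains surrounding spaces, while a quoted run containing symbols loses its quotes and is matched literally.

```python
import re

# A double-quoted run with no embedded quotes (escapes are not handled here).
QUOTED = re.compile(r'"([^"]*)"')


def migrate_item_toy(item):
    """Toy approximation of the documented migration examples."""
    stripped = item.strip()
    # An item that is entirely one quoted string is left unchanged,
    # so its case sensitivity is preserved.
    if QUOTED.fullmatch(stripped):
        return stripped

    def rewrite(match):
        content = match.group(1)
        if content.isalnum():
            # Keep the quotes and pad them with spaces to form a legal expression.
            return f' "{content}" '
        # Symbols: drop the quotes, so case sensitivity is lost.
        return content

    migrated = QUOTED.sub(rewrite, stripped)
    # Collapse any doubled spaces introduced by the padding.
    return " ".join(migrated.split())


if __name__ == "__main__":
    for original in ['a"b"c', 'a"&@#"c', 'a"&@#" c', 'a "&@#"c']:
        print(f"{original!r} -> {migrate_item_toy(original)!r}")
```

For real scripts, the TextCensor Upgrade Preview tool remains the authoritative way to see how each item is migrated.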

    Note:

    These syntax differences do not affect most uses of TextCensor to look for phrases in ordinary language. They are more likely to affect scripts aimed specifically at JavaScript or other code.

