Why does MailMarshal detect some PDF files as PDFInvalid or Binary Unknown?


This article applies to:

  • Trustwave MailMarshal/SEG

Question:

  • Why does MailMarshal detect some PDF files as PDFInvalid or Binary Unknown? 

Reply:

The PDF standard (ISO 32000) requires that a PDF file start with the header string %PDF- followed by a version number.

Some PDF files contain extra characters before the %PDF- declaration.

Because MailMarshal SEG is a security product, it follows the PDF standard strictly. If any characters precede the string %PDF-, MailMarshal does not recognize the file as a valid PDF document.

Past versions of Acrobat and other Adobe software will open files that have characters before the declaration. However, this is now considered a possible security risk. Adobe states that new versions of Adobe Acrobat or Reader might not open these files.

Workaround:

In currently supported MailMarshal versions, files that have up to 100 characters before the PDF header are recognized as a separate type, PDFInvalid. You can set up rule conditions that use this type.

  • These files are unpacked and scanned in the same way as valid PDFs.

Files with more than 100 characters before the PDF header are typed as Binary Unknown (BIN).

  • If you want to pass these files through MailMarshal, you can create an exception to the "Block Unknown Attachment" rule. For instance, you could create a rule that allows files with the extension ".PDF" from specific senders.
  • Note that this solution will allow these files to pass through MailMarshal without scanning of the text or image content. Virus scanning would still be performed.
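
To check how a particular file is likely to be typed, the thresholds described above can be sketched in a few lines of Python. This is an illustration only, not MailMarshal's actual detection logic: the function name classify_pdf_header is hypothetical, "characters" are assumed to mean bytes, and a file with no %PDF- string at all is assumed to fall into the Binary Unknown case.

```python
PDF_HEADER = b"%PDF-"
MAX_LEADING_BYTES = 100  # threshold stated in this article


def classify_pdf_header(data: bytes) -> str:
    """Classify file content by the offset of the first %PDF- header.

    Hypothetical helper that mirrors the thresholds in this article;
    it is not MailMarshal's implementation.
    """
    offset = data.find(PDF_HEADER)  # -1 if the header is absent
    if offset == 0:
        return "PDF"  # header is at the very start, as the standard requires
    if 0 < offset <= MAX_LEADING_BYTES:
        return "PDFInvalid"  # up to 100 bytes precede the header
    # More than 100 leading bytes, or (by assumption) no header at all
    return "Binary Unknown"
```

For example, classify_pdf_header(b"\r\n%PDF-1.7") returns "PDFInvalid" because two bytes precede the header. To test a file on disk, read it in binary mode and pass the contents to the function; the first few hundred bytes are enough.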

The recommended and permanent solution is to inform the creator (or sender) of these PDF files about the formatting issues.

Last Modified 3/1/2020.
https://support.trustwave.com/kb/KnowledgebaseArticle16040.aspx