
Malicious PDF Document Analysis Cheatsheet

1. General Flags:
- /AA (Additional Actions): Specifies additional actions triggered by events.
- /OpenAction (Open Action): Defines the action performed upon document opening.
- /Filter: Names the filter or filters (encoding or compression algorithms, such as /FlateDecode) that must be applied to decode a stream's data.
- /JS (JavaScript): Indicates the presence of JavaScript code.
- /S (Action Type): In an action dictionary, specifies the type of action (for example /JavaScript, /URI, or /Launch).
- /Type: Defines the type of object.
- /URI (Uniform Resource Identifier): Indicates a URL or URI.
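
As a rough illustration of why these flags matter for triage, the sketch below counts a few of them in raw PDF bytes. It is a minimal sketch and not how pdfid.py is implemented: the `KEYWORDS` list, the function names, and the sample bytes are assumptions made for this example. It also undoes `#xx` hex escapes in names (for example `/J#61vaScript`), an encoding the PDF format permits and that can hide a keyword from a naive string search.

```python
import re

# Keywords to count; an illustrative subset of the flags listed above.
KEYWORDS = [b"/AA", b"/OpenAction", b"/JS", b"/JavaScript", b"/URI", b"/Filter"]

def normalize_names(data: bytes) -> bytes:
    """Replace #xx hex escapes (e.g. /J#61vaScript) with the byte they encode."""
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)

def count_keywords(data: bytes) -> dict:
    """Count whole-name occurrences of each keyword in raw PDF bytes."""
    data = normalize_names(data)
    counts = {}
    for kw in KEYWORDS:
        # Require a non-alphanumeric byte after the name so /JS does not match /JSFoo.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts

# Hypothetical sample: a catalog with /OpenAction and an obfuscated /JavaScript action.
sample = (
    b"<< /Type /Catalog /OpenAction 5 0 R >>\n"
    b"<< /S /J#61vaScript /JS (app.alert(1)) >>"
)
print(count_keywords(sample))
```

Because the scan works on raw bytes, it also counts strings that merely look like keywords and misses anything inside compressed streams, so treat the numbers as a starting point only.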

2. Action-related Flags:
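- /OpenAction: Executes an action when the document is opened (set in the document catalog, not under /AA).
- /AA /WC /WS /DS /WP /DP: Performs actions before or after document events (will close, will save, did save, will print, did print).
- /AA /K: Defines actions triggered by user keystrokes in a form field.
- /AA /C: Executes an action when a page is closed (on a page object) or when a form field is recalculated.
- /AA /D: Executes an action when the mouse button is pressed inside an annotation's active area.
- /JS: Holds the script that a JavaScript action (/S /JavaScript) executes.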

3. Security-related Flags:
- /Encrypt: Indicates document encryption is present.
- /Perms: Specifies permissions granted to the document.
- /Sig: Defines the presence of digital signatures.
- /F (File Specification): Names a file in a file specification dictionary; attached files themselves are stored as /EmbeddedFile streams.
- /Cert: References digital certificates.

4. Object-specific Flags:
- /Length: Indicates the length in bytes of a stream's encoded data.
- /Width /Height: Specifies the dimensions of an image object in samples.
- /BitsPerComponent: Defines the number of bits per color component.
- /ColorSpace: Specifies the color space used by objects.
- /XObject: Identifies external objects (XObjects), such as images and forms, that page content can reference.
- /MediaBox /CropBox: Defines the page boundaries (the full media size and the visible crop region).

5. Exploitation Techniques:
- Obfuscation: Hides malicious code using encoding or encryption.
- Exploitation of Vulnerabilities: Targets software vulnerabilities.
- Code Injection: Inserts malicious code into legitimate objects.
- URL Redirection: Redirects users to malicious websites.
- Social Engineering: Manipulates user behavior to execute actions.

6. Analysis Techniques:
- JavaScript Analysis: Inspect embedded JavaScript for malicious intent.
- Content Extraction: Extract and analyze embedded files or objects.
- Hex and Text Editors: Analyze PDF structure and encoded content.
- Malware Sandboxing: Execute PDFs in controlled environments.
- Static and Dynamic Analysis: Use specialized tools for in-depth examination.

Detection Evasion Techniques in Malicious PDFs Cheatsheet

1. Content Encoding Techniques:


- Stream Encoding: Encodes content as a stream, utilizing techniques like FlateDecode,
ASCIIHexDecode, or LZWDecode.
- Base64 Encoding: Converts binary data into a printable ASCII format using Base64 encoding.
- Hexadecimal Encoding: Represents binary data as a sequence of hexadecimal characters.
- Unicode Encoding: Represents characters as Unicode escapes (for example %uXXXX or \uXXXX in JavaScript) to bypass filters.
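
To see what undoing stream encoding looks like in practice, here is a minimal sketch that decodes a stream protected by a chain of two of the filters named above. The function names and the `DECODERS` table are assumptions for this example; /LZWDecode is left out because the Python standard library has no decoder for it. Filters are applied in the order in which they appear in the /Filter array.

```python
import binascii
import zlib

def ascii_hex_decode(data: bytes) -> bytes:
    """Decode /ASCIIHexDecode data: hex digits, optional whitespace, '>' as end marker."""
    hex_part = data.split(b">", 1)[0]
    digits = b"".join(hex_part.split())   # drop all ASCII whitespace
    if len(digits) % 2:                   # an odd final digit is padded with 0
        digits += b"0"
    return binascii.unhexlify(digits)

# Filter name -> decoder function (illustrative subset).
DECODERS = {
    "/FlateDecode": zlib.decompress,
    "/ASCIIHexDecode": ascii_hex_decode,
}

def decode_stream(data: bytes, filters: list) -> bytes:
    """Apply each decoder in the order the filters are listed in /Filter."""
    for name in filters:
        data = DECODERS[name](data)
    return data

# Build a test stream the way an author might: compress first, then hex-encode.
payload = b"app.alert('hello');"
encoded = binascii.hexlify(zlib.compress(payload)) + b">"

print(decode_stream(encoded, ["/ASCIIHexDecode", "/FlateDecode"]))
```

Layering filters like this is cheap for the author, which is why decoding every stream before searching for keywords is usually worthwhile.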

2. Polymorphism and Obfuscation:


- Polymorphic Malware: Generates unique variations of malicious PDFs to evade
signature-based detection.
- Code Obfuscation: Applies various techniques to obscure and hide the intent and
functionality of the malicious code.
- Dynamic Code Generation: Generates malicious code dynamically during runtime, making
static analysis more challenging.
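
One common form of code obfuscation in embedded JavaScript is building strings from character codes so that words such as "eval" never appear literally. The sketch below statically resolves `String.fromCharCode(...)` calls that contain only numeric literals. It is a simplified illustration, assuming decimal or 0x-prefixed hex arguments; the function name and the sample script are hypothetical, and real samples often need a JavaScript engine or debugger.

```python
import json
import re

_NUM = r"(?:0[xX][0-9a-fA-F]+|\d+)"
_CALL = re.compile(r"String\.fromCharCode\(\s*(" + _NUM + r"(?:\s*,\s*" + _NUM + r")*)\s*\)")

def _to_int(token: str) -> int:
    token = token.strip()
    return int(token, 16) if token.lower().startswith("0x") else int(token)

def resolve_from_char_code(js: str) -> str:
    """Replace literal-only String.fromCharCode(...) calls with the quoted string they build."""
    def repl(match):
        text = "".join(chr(_to_int(tok)) for tok in match.group(1).split(","))
        return json.dumps(text)   # quote and escape the result for readability
    return _CALL.sub(repl, js)

# Hypothetical obfuscated snippet: the function name is assembled at runtime.
obfuscated = "var f = String.fromCharCode(101, 118, 0x61, 108); this[f](payload);"
print(resolve_from_char_code(obfuscated))
```

Calls whose arguments are variables or expressions are left untouched, which is where dynamic analysis takes over.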

3. Exploit Techniques:
- Zero-Day Exploits: Targets unknown vulnerabilities in PDF software, evading signature-based
detection.
- File Format Vulnerabilities: Exploits vulnerabilities in the PDF file format or specific PDF
viewer applications.
- Heap Spraying: Fills the PDF viewer's heap with many copies of malicious code (typically via
JavaScript) so that a memory-corruption vulnerability is likely to land execution in it.
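
When reviewing JavaScript pulled out of a PDF, spray payloads often appear as long `%uXXXX` strings passed to `unescape()`. The following sketch converts such a string to the bytes it represents and measures the longest run of 0x90 (the x86 NOP opcode), a classic though not conclusive sign of a NOP sled. The function names and the sample string are assumptions for illustration.

```python
import re
import struct

def unescape_u(text: str) -> bytes:
    """Convert %uXXXX escapes to bytes; each escape is one little-endian 16-bit unit."""
    units = re.findall(r"%u([0-9A-Fa-f]{4})", text)
    return b"".join(struct.pack("<H", int(u, 16)) for u in units)

def longest_run(data: bytes, value: int = 0x90) -> int:
    """Return the length of the longest run of a given byte value."""
    best = current = 0
    for b in data:
        current = current + 1 if b == value else 0
        best = max(best, current)
    return best

# Hypothetical fragment from an extracted script.
fragment = "%u9090%u9090%u4141%u4242"
raw = unescape_u(fragment)
print(raw.hex(), longest_run(raw))
```

The decoded bytes can then be written to a file and examined with a disassembler or shellcode emulator.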

4. Anti-Analysis and Anti-Debugging:


- Anti-Debugging Techniques: Detects if the PDF is being analyzed in a debugging environment
and modifies behavior accordingly.
- Anti-Analysis Checks: Implements checks to identify virtualized or sandboxed environments,
altering behavior to avoid detection.
- Conditional Execution: Delays execution or alters behavior based on specific environmental
conditions or system properties.

5. Exploiting Trusted Features:


- Document-Level JavaScript: Leverages JavaScript functionality allowed by PDF specifications
for malicious purposes.
- Trusted File Formats: Embeds or references trusted file formats like images or fonts to deliver
malicious payloads.
- Exploiting Embedded Objects: Utilizes embedded objects such as Flash, media files, or forms
to execute malicious actions.

6. File Manipulation and Hidden Content:


- File Append/Prepend: Appends or prepends malicious content to an otherwise legitimate
PDF file.
- Hidden Layers: Utilizes hidden layers or invisible objects within a PDF to hide malicious
content or actions.
- Content Splitting: Splits malicious content across different objects or files to evade
signature-based detection.
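
Appended or prepended content can often be spotted with a simple structural check. The sketch below reports where the `%PDF-` header starts, how many `%%EOF` markers exist, and how many non-whitespace bytes follow the last one. The function name, the dictionary keys, and the sample bytes are assumptions. More than one `%%EOF` is not suspicious in itself, since incremental updates legitimately add them.

```python
def eof_report(data: bytes) -> dict:
    """Summarize header position and data around the final %%EOF marker."""
    marker = b"%%EOF"
    last = data.rfind(marker)
    trailing = data[last + len(marker):].strip() if last != -1 else b""
    return {
        "header_offset": data.find(b"%PDF-"),   # > 0 means something was prepended; -1 means no header
        "eof_markers": data.count(marker),
        "bytes_after_last_eof": len(trailing),
    }

# Hypothetical file: junk before the header and an executable-like blob after %%EOF.
sample_pdf = b"junk%PDF-1.7\n1 0 obj\n<<>>\nendobj\n%%EOF\nMZ\x90\x00extra"
print(eof_report(sample_pdf))
```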

Workflow for Analyzing PDF Files:


- Use pdfid.py to get a quick summary of the PDF file: counts of objects, streams, pages, /URI, /JS, /JavaScript, and similar keywords.
Command: pdfid.py malicious.pdf
- The above command also lists keywords with a count of 0. To hide them, use the -n flag.
Command: pdfid.py -n malicious.pdf
- Next, use pdf-parser.py for a more detailed look at the document. It parses the PDF file and examines its objects. Running it with -a shows statistics for all objects, which objects contain which keywords, and any URIs.
Command: pdf-parser.py malicious.pdf -a
- General note: if pdfid.py reports, for example, 26 for /URI while pdf-parser.py shows only 13, then pdf-parser.py found 13 clickable links. pdfid.py counts every occurrence of the /URI string in the file, including non-clickable, generic strings.
- If the pdf-parser.py output shows /XObject, the PDF contains an embedded image (or other external object), which may have a hyperlink associated with it.
- Look at the contents of the objects:
Command: pdf-parser.py malicious.pdf -s /URI | more
Command: pdf-parser.py malicious.pdf -s /XObject | more
- To see just the URIs, or the values of a certain key, use the -k flag:
Command: pdf-parser.py malicious.pdf -k /URI
- Examine a specific object, for example object 6:
Command: pdf-parser.py malicious.pdf -o 6
- To extract the stream content of an object, specify the object number and dump the stream to a file with -d:
Command: pdf-parser.py malicious.pdf -o 6 -d objects6.jpg
Command: feh objects6.jpg &
- To find out which URL or link is associated with the image extracted above, use the -r (reference) flag, which shows the objects that reference the given object:
Command: pdf-parser.py malicious.pdf -r 6
Note: the results list the referencing objects. Keep using the same -r command and follow the trail until you reach the content you are after, in this case the URI. Along the way you may see unfamiliar keywords. In this case it was "/Annots", which associates annotations, including links, with a page and so offers a way to embed links. To see the content of /Annots, follow the object referenced there with the -o parameter, for example pdf-parser.py malicious.pdf -o 14, then keep following objects with -o to see their content. If you find further references, use -r as the situation permits.
- If you come across a hyperlink that leads to a website that downloads or pulls more files, use tools like wget or curl to download them. Only do this from a laboratory environment, and always use a proxy, VPN, or similar approach to obfuscate your origin. Interactive tools such as Pinpoint, Scout, or Thug are also available. Thug is a honeyclient; another option is a real browser on a purposefully vulnerable Windows system. Thug can record everything it does in a log file and saves the files it retrieves for examination.
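- Dump a suspicious object's stream to a file using the -f (apply filters), -w (raw output), and -d (dump to file) flags, together with -o to select the object:
Command: pdf-parser.py malicious.pdf -o <object number> -f -w -d <output file>
- Use pdf-parser.py to look for malicious JavaScript via the /JS, /JavaScript, /AcroForm, and /XFA keywords.
- Password-protected PDF files: determine the password, then decrypt the file using tools such as qpdf or pdftk.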
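
The "follow the trail" step in the workflow above can be sketched in a few lines of Python. This is a simplified illustration of the idea behind the -r and -o options, not pdf-parser.py's implementation: it uses regular expressions on raw bytes, ignores object streams and compressed content, and all function names and the sample document are assumptions.

```python
import re

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.S)

def parse_objects(data: bytes) -> dict:
    """Map object number -> raw body for each 'N G obj ... endobj' block."""
    return {int(m.group(1)): m.group(3) for m in OBJ_RE.finditer(data)}

def referencing(objects: dict, target: int) -> list:
    """Return numbers of objects whose body contains an indirect reference 'target G R'."""
    ref = re.compile(rb"(?<!\d)%d\s+\d+\s+R\b" % target)
    return sorted(n for n, body in objects.items() if ref.search(body))

def uris(body: bytes) -> list:
    """Extract literal-string values of /URI keys from an object body."""
    return re.findall(rb"/URI\s*\((.*?)\)", body)

# Hypothetical document: page 3 uses image 6 and link annotation 14, which points to action 15.
doc = b"""
1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [3 0 R] >> endobj
3 0 obj << /Type /Page /Annots [14 0 R] /Resources << /XObject << /Im0 6 0 R >> >> >> endobj
6 0 obj << /Type /XObject /Subtype /Image >> endobj
14 0 obj << /Type /Annot /Subtype /Link /A 15 0 R >> endobj
15 0 obj << /S /URI /URI (http://example.test/) >> endobj
"""

objects = parse_objects(doc)
print(referencing(objects, 6))    # which objects use the image
print(referencing(objects, 15))   # which objects point at the URI action
print(uris(objects[15]))
```

Starting from the image (object 6), the page that references it also lists /Annots, and following that annotation leads to the action object holding the URI, which mirrors the manual -r and -o steps.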
