
Lab: Investigating Malicious PDF: Analysis

Goals
- Learn how to profile and examine a PDF file and be able to tell
whether it’s malicious or not.
- Use various tools to statically analyze a PDF file and determine
the PDF’s file content
- Extract and analyze suspicious objects

Scenario

- You are a forensic examiner at a large firm. John Kim, a colleague
from the HR department, received two resumes for an open position
within the firm.
- John Kim viewed the resumes and listed the senders as possible
candidates: Linda.pdf and Lucy.pdf.
- A few days later, the firewall administrator noticed a strange
outbound connection from John Kim’s machine to an address outside
the network.
- Examine the two resumes and determine whether either file is
malicious.

Requirements
- Windows VM – CDTH Analyst
- pdfid.py
- pdf-parser.py
- exiftool
- Linda.pdf | Lucy2.pdf

GUIDEM I.T. TRAINING CENTER INVESTIGATING MALICIOUS PDF: ANALYSIS 1 | PAGE


Walkthrough

1. Setting up the Lab Environment

- Check that the files are all in the same folder:
C:\Users\IEUser\Desktop\Lab Files\Day3\PDF
- Make sure you have the tools needed to investigate PDF files. Check
the Flare folders; they should contain the tools listed under
Requirements.

2. Obtain General Overview of the Suspicious PDF Files

- We start by running the pdfid.py script on each of the suspected
PDFs.
- Make sure your PDF files and tools all reside in one folder.
- The results will be similar to the following:



Linda.pdf

Lucy2.pdf



- Notice how running the tool on the two files returns different
results. The large difference in the number of objects is the
first thing we notice.
- The second file (Lucy2.pdf) contains a JavaScript object!
- This is both interesting and suspicious, since a resume has very
little use for JavaScript. Even though the two files are similar
in structure and format, Linda.pdf does not contain any JavaScript
objects.
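
To make the keyword counts above less of a black box, here is a minimal Python sketch of the same idea: count selected PDF names and object headers in the raw bytes of a file. It is not pdfid.py itself. The keyword list is an illustrative subset chosen for this example, and the sketch does not handle hex-escaped names, which real samples sometimes use to hide keywords.

```python
import re

# Illustrative subset of names worth flagging; this list is an assumption,
# not the full set reported by pdfid.py.
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]


def count_keywords(data: bytes) -> dict:
    """Count occurrences of each PDF name and of object headers in raw bytes."""
    counts = {}
    for kw in KEYWORDS:
        # The lookahead stops b"/AA" from also matching a longer name such as b"/AAPL".
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    # Indirect objects are introduced by "<number> <generation> obj".
    counts["obj"] = len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data))
    return counts
```

For a file on disk, the call would look like `count_keywords(open("Lucy2.pdf", "rb").read())`. A non-zero /JS or /JavaScript count in a resume is the kind of result that justifies a closer look.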

3. Extract the Files Metadata

- It is worth extracting both files’ metadata to see if we can find
anything useful in it. We can use exiftool for this, as follows.
- Linda.pdf



Lucy2.pdf

- One thing we notice here is that Lucy2.pdf contains less metadata.
- Linda.pdf, on the other hand, looks like a template that was
downloaded from a website.
- This raises another question mark over Lucy2.pdf, in addition to
the presence of a JavaScript object.
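
The metadata comparison can also be scripted. The sketch below assumes exiftool is on the PATH and uses its -j option for JSON output. The helper names read_metadata and fields_only_in are made up for this example, and which tags appear depends entirely on the files being examined.

```python
import json
import subprocess


def read_metadata(path: str) -> dict:
    """Return exiftool's tags for one file as a dict (requires exiftool on PATH)."""
    result = subprocess.run(
        ["exiftool", "-j", path], capture_output=True, text=True, check=True
    )
    # exiftool -j prints a JSON array with one entry per input file.
    return json.loads(result.stdout)[0]


def fields_only_in(reference: dict, suspect: dict) -> list:
    """List tag names present in `reference` but absent from `suspect`."""
    return sorted(set(reference) - set(suspect))
```

Calling `fields_only_in(read_metadata("Linda.pdf"), read_metadata("Lucy2.pdf"))` would list the tags that Linda.pdf carries and Lucy2.pdf lacks. Sparse metadata is not proof of anything on its own, but it is a useful data point next to the other findings.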

4. List the Objects in the Malicious File

- We can use the pdf-parser.py script to perform a more in-depth
analysis of the PDF file.
- By now we have good reason to suspect Lucy2.pdf, so we’ll continue
our in-depth analysis against it.
- We’ll start with a general examination using the --stats option.



pdf-parser.py Lucy2.pdf

- The output may be too long for the console to show, so it may be
better to redirect it to a text file using the “>” symbol.
- The most distinguishable difference between the two files is that
Lucy2.pdf contains a JavaScript object, which attackers typically
use to deliver malicious payloads.
- It is a good idea to search for that specific object and extract
it for further analysis.
- We can search for the JavaScript reference within the file using
the --search Javascript option.

- pdf-parser.py --search Javascript Lucy2.pdf

- It produces the same kind of output, but limited to the objects
that match the search term.
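
As a rough illustration of what such a search does, the following sketch splits the raw file into indirect objects and reports the numbers of those whose body contains a search term. It is a simplification under stated assumptions: it is not how pdf-parser.py is implemented, it makes no attempt to skip stream data, and it ignores objects stored inside compressed object streams. The function name find_objects is made up for this example.

```python
import re

# "<number> <generation> obj ... endobj", matched lazily so each object is separate.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)


def find_objects(data: bytes, needle: bytes) -> list:
    """Return the object numbers whose body contains `needle`, ignoring case."""
    hits = []
    for match in OBJ_RE.finditer(data):
        if needle.lower() in match.group(3).lower():
            hits.append(int(match.group(1)))
    return hits
```

The object numbers returned this way are the ones to pass to the --object option in the next step.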



5. Preparing the Extracted Object for Analysis

- Sometimes an attacker tries to make your life harder by
compressing or obfuscating the hidden payload.
- To read it and fully analyze the malicious code, we may need to
decompress the JavaScript content.
- We can do that using the --filter and --raw options.
- We will use object 148, since we noticed in the previous task
that the PDF calls cmd.exe for some reason, and that is what we
need to investigate.

pdf-parser.py --object=148 --filter --raw Lucy2.pdf

- Even without in-depth knowledge of JavaScript, and before starting
the malicious code analysis, we can see that something is not
right.
- Why would JavaScript code within a PDF file want to call cmd.exe?
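
The decompression step can be sketched in a few lines of Python. In many PDFs the stream is compressed with the /FlateDecode filter, which is zlib deflate. The sketch assumes the object uses /FlateDecode or no filter at all, and does not handle filter chains or the other filter types. The function name inflate_stream is made up for this example.

```python
import re
import zlib

# Capture everything between the "stream" keyword and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_stream(obj_body: bytes) -> bytes:
    """Return the first stream of an object, inflated if /FlateDecode is declared."""
    match = STREAM_RE.search(obj_body)
    if match is None:
        raise ValueError("object has no stream")
    raw = match.group(1)
    # The filter is declared in the dictionary that precedes the stream.
    if b"/FlateDecode" in obj_body[: match.start()]:
        # decompressobj tolerates the end-of-line bytes left after the deflate data.
        return zlib.decompressobj().decompress(raw)
    return raw
```

Seeing readable script text come out of a compressed stream is the point at which the content can be judged on its merits, which is what the remaining steps do.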



6. Extracting the evil code from the object

- Now that we have displayed the code in plain text, it is better
to extract it to a separate file to make the analysis easier. We
can do that as before, using the “>” symbol after the previous
command.

pdf-parser.py --object=148 --filter --raw Lucy2.pdf > object148.js

pdf-parser.py --object=147 --filter --raw Lucy2.pdf > object147.js



7. Analyzing the Evil Code from the Object

Object147.js

- The first script seems to save a file called Lucy to the victim’s
hard drive. The nlaunch:0 setting suggests that no program is
launched at this point.
- The second script is even more interesting: the PDF seems to
launch cmd.exe on the victim’s machine.



Object148.js

- This is definitely not something a normal resume would do.
- This behavior is typical of a phishing campaign that calls
cmd.exe to run shell commands. The goal could be to download the
full payload from a C2 server, or to delete files and damage the
system.
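
Once the scripts are saved to disk, a quick first pass can be automated. The sketch below checks a script for a few indicators of the kind discussed in this lab. The indicator list is an assumption made for illustration, not a complete rule set, and a match only means the script deserves manual review.

```python
import re

# Illustrative indicators only; the labels and patterns are assumptions.
INDICATORS = {
    "references cmd.exe": re.compile(r"\bcmd\.exe\b", re.IGNORECASE),
    "exports an embedded file": re.compile(r"exportDataObject", re.IGNORECASE),
    "contains a URL": re.compile(r"https?://", re.IGNORECASE),
}


def triage_script(source: str) -> list:
    """Return the labels of all indicators found in the script text."""
    return [label for label, pattern in INDICATORS.items() if pattern.search(source)]
```

A typical use would be `triage_script(open("object148.js", errors="replace").read())`, followed by reading the flagged script by hand to confirm what it actually does.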
