Basic PDF Word Document Analysis

Lab: Investigating Malicious PDF: Analysis
Goals
- Learn how to profile and examine a PDF file and tell whether it
is malicious or not.
- Use various tools to statically analyze a PDF file and determine
its contents.
- Extract and analyze suspicious objects.
Scenario
- You are a forensic examiner in a large firm. John Kim, a
colleague of yours from the HR department, received two resumes
for an open position within the firm.
- John Kim viewed the resumes (Linda.pdf and Lucy2.pdf) and listed
the senders as possible candidates.
- A few days later, the firewall administrator noticed a strange
connection going from John Kim’s machine to outside the network.
- Check the two resumes and determine whether either file is
malicious.
Requirements
- Windows VM – CDTH Analyst
- pdfid.py
- pdf-parser.py
- exiftool
- Linda.pdf | Lucy2.pdf
GUIDEM I.T. TRAINING CENTER INVESTIGATING MALICIOUS PDF: ANALYSIS 1 | PAGE

Walkthrough
1. Setting up the Lab Environment
- Check if the files are all in the same folder:
C:\Users\IEUser\Desktop\Lab Files\Day3\PDF
- Make sure you have the tools needed to investigate PDF files.
Check the Flare folders; you should find the tools listed under
Requirements.
2. Obtain General Overview of the Suspicious PDF Files
- We start off by running the pdfid.py script on each of the
suspected PDF files.
- Make sure your PDF files and tools all reside in one folder.
- The results will be similar to the following:

Linda.pdf
Lucy2.pdf

- Notice how running the tool on different files returns different
results. The first thing we notice is the big difference in the
number of objects.
- The second file (Lucy2.pdf) contains a JavaScript object!
- This is interesting and suspicious at the same time, since a
resume has very little use for JavaScript. Even though the two
files are similar in structure and format, Linda.pdf doesn’t
contain any JavaScript objects.
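The kind of triage described above can be approximated with a
short Python sketch. This is not how pdfid.py is implemented; it
is a naive byte-level count, and the keyword list and function
names are assumptions chosen for illustration. It does not handle
obfuscated names or keywords hidden inside compressed streams.

```python
import re

# Illustrative, partial list of PDF names that are worth a second look.
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]


def count_keywords(data: bytes) -> dict:
    """Count occurrences of each PDF name in raw bytes (naive, no de-obfuscation)."""
    counts = {}
    for kw in KEYWORDS:
        # A name must not be followed by another name character,
        # so /AA does not match a longer name such as /AAPL.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts


def count_keywords_in_file(path: str) -> dict:
    """Read a file from disk and count the keywords in it."""
    with open(path, "rb") as fh:
        return count_keywords(fh.read())
```

Running count_keywords_in_file on both resumes and comparing the
two dictionaries gives a rough, tool-independent cross-check of
the JavaScript finding.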
3. Extract the Files Metadata
- It is worth trying to extract both files’ metadata to see if we
can find anything useful within. We can use exiftool once again
as follows.

Linda.pdf
Lucy2.pdf

- One thing we notice here is that Lucy2.pdf contains less
metadata.
- On the other hand, Linda.pdf seems to be based on a template
that has been downloaded from a website.
- This adds another question mark on Lucy2.pdf, in addition to
the existence of the JavaScript object.
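To make the metadata comparison repeatable, the two exiftool
results can be compared in Python. The sketch below assumes
exiftool is on the PATH and uses its -json output option; the
function names are hypothetical and not part of the lab.

```python
import json
import subprocess


def read_metadata(path: str) -> dict:
    """Return exiftool's metadata for one file (assumes exiftool is on PATH)."""
    out = subprocess.run(
        ["exiftool", "-json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    # exiftool prints a JSON array with one object per input file.
    return json.loads(out)[0]


def missing_fields(reference: dict, suspect: dict) -> list:
    """Tags present in the reference file's metadata but absent from the suspect's."""
    return sorted(set(reference) - set(suspect))
```

For example, missing_fields(read_metadata("Linda.pdf"),
read_metadata("Lucy2.pdf")) would list the tags that only
Linda.pdf carries. A shorter tag list is not proof of malice by
itself, only one more data point.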
4. List the Objects in the Malicious File
- We can use the pdf-parser.py script to perform a more in-depth
analysis of the PDF file.
- By now we have good reason to suspect Lucy2.pdf, so we’ll
continue our in-depth analysis against it.
- We’ll start with a general examination using the --stats
option.

pdf-parser.py --stats Lucy2.pdf
- The output may be too large for the console to show, so it may
be better to redirect the output to a text file using the “>”
symbol.
- The most distinguishable difference between the two files is that
Lucy2.pdf contains a JavaScript object, which is often used by
attackers to deliver malicious payloads.
- It would be a good idea to search for that specific object and
extract it for further analysis.
- We can search for the JavaScript reference within the file using
the --search Javascript option.
pdf-parser.py --search Javascript Lucy2.pdf
- This produces the same kind of output, but limited to the objects
that match the search filter.
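The idea behind the search can be sketched in Python as a scan
of the raw file for objects whose body mentions a keyword. This
is a simplified illustration, not pdf-parser.py's implementation:
the regular expression and function name are assumptions, and
objects stored inside compressed object streams will not be
found this way.

```python
import re

# Matches "<id> <generation> obj ... endobj" spans in the raw file.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def find_objects(data: bytes, needle: bytes) -> list:
    """Return the IDs of objects whose body contains needle (case-insensitive)."""
    needle = needle.lower()
    return [
        int(match.group(1))
        for match in OBJ_RE.finditer(data)
        if needle in match.group(3).lower()
    ]
```

The returned object IDs are the ones to inspect next with the
--object option.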

5. Preparing the Extracted Object for Analysis
- Sometimes an attacker tries to make your life harder by
compressing or obfuscating the hidden payload.
- To read it and fully analyze the malicious code, we may need to
decompress the JavaScript content.
- We can do that using the --filter and --raw options.
- We will use object 148, since we noticed in the previous task
that the PDF calls cmd.exe for some reason, and that’s what we
need to find out.
pdf-parser.py --object=148 --filter --raw Lucy2.pdf
- Even without in-depth knowledge of JavaScript, and before
starting the malicious code analysis, we can see that something
is not right.
- Why would JavaScript code within a PDF file want to call
cmd.exe?
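For streams compressed with the common FlateDecode filter, the
decompression step can be sketched with Python's zlib module.
This is a minimal illustration that assumes a single
FlateDecode filter and no encryption; the function name is
hypothetical, and other filters or filter chains are not
handled.

```python
import re
import zlib

# Captures everything between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_stream(obj_body: bytes) -> bytes:
    """Extract the first stream in an object body and zlib-decompress it."""
    match = STREAM_RE.search(obj_body)
    if match is None:
        raise ValueError("no stream found in object")
    # decompressobj tolerates the end-of-line bytes that trail the
    # compressed data; they end up in unused_data and are ignored.
    inflater = zlib.decompressobj()
    return inflater.decompress(match.group(1)) + inflater.flush()
```

Given the raw bytes of one object (from "N 0 obj" to "endobj"),
the function returns the stream content in readable form.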

6. Extracting the Evil Code from the Object
- Now that we have displayed the code in plain text, it is better
to extract it to a separate file to make the analysis easier. We
can do that as before, using the “>” symbol after the previous
command.
pdf-parser.py --object=148 --filter --raw Lucy2.pdf > object148.js
pdf-parser.py --object=147 --filter --raw Lucy2.pdf > object147.js

7. Analyzing the Evil Code from the Object
object147.js
- The first script seems to be saving a file called Lucy on the
victim’s HDD. The nlaunch:0 suggests that no programs are being
launched for now.

object148.js
- The second script is even more interesting: the PDF seems to be
launching cmd.exe on the victim’s machine.
- This is definitely not something a normal resume would do.
- This behavior is typical of a phishing campaign that calls
cmd.exe to run shell commands, for example to download the full
payload from a C2 server or to delete files and damage the
system.
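Once the scripts are saved to disk, a quick indicator check can
help prioritize which one to read first. The sketch below is an
assumption-laden illustration: the indicator list is neither
exhaustive nor authoritative, the function name is hypothetical,
and exportDataObject is included on the assumption that the
script uses that Acrobat JavaScript call to write the file.

```python
# Illustrative substrings only; matching is case-insensitive.
INDICATORS = ["cmd.exe", "powershell", "exportdataobject", "nlaunch", "http://", "https://"]


def find_indicators(script: str) -> list:
    """Return the indicators that appear anywhere in the script text."""
    lowered = script.lower()
    return [indicator for indicator in INDICATORS if indicator in lowered]


def find_indicators_in_file(path: str) -> list:
    """Read an extracted script from disk and check it for indicators."""
    with open(path, "r", errors="replace") as fh:
        return find_indicators(fh.read())
```

For example, find_indicators_in_file("object148.js") would be
expected to report cmd.exe if the script launches it as
described above. A hit is only a lead for manual review, not a
verdict.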