Malicious PDF Documents Explained

Attack Trends
Editors: Marcus Sachs, marcus.sachs@verizon.com

David Ahmad, drma@mac.com
Didier Stevens, Contraste Europe NV
COPUBLISHED BY THE IEEE COMPUTER AND RELIABILITY SOCIETIES 1540-7993/11/$26.00 © 2011 IEEE JANUARY/FEBRUARY 2011

What makes a PDF file malicious? PDF designers and the PDF reader software architects never intended for files to be able to modify the operating system running the PDF reader. But security researchers and malware authors found ways to exploit PDF readers’ software bugs and to creatively use the PDF language, enabling them to produce PDF documents that execute arbitrary code.

Embedded files are a good example of this design philosophy. The PDF language allows files to be embedded inside PDF documents. For example, you could use this feature to include a spreadsheet in a PDF document. Although embedding executable files is technically possible, PDF reader software won’t let you extract and launch embedded executable files. It blacklists several types of executable files based on file extension, but blacklisting has its limitations. For example, Python code (extension .py) isn’t blacklisted and can be extracted and launched from a PDF document, provided the user accepts the warning dialog box.

This is a simple example of how an attacker can create a PDF document with a malicious payload. However, most malware authors won’t use this technique because it requires some user cooperation and because most Windows machines don’t have a Python interpreter installed.

The PDF language supports embedded JavaScript. The embedded JavaScript interpreter is limited in its interaction with the operating system; for instance, the JavaScript interpreter can’t read from and write to arbitrary files on the file system.

JavaScript is often used for form validation and calculation. When designing a PDF form, users can add extended input validation with JavaScript. For example, when a form requires a credit-card number, users can add JavaScript to check the number with the Luhn formula (a checksum algorithm used for a variety of identification numbers). On PDF order forms, JavaScript calculates totals and sales tax.

Attackers can create malware based on bugs in the PDF and JavaScript languages’ implementation. Some bugs can be triggered in such a way that they provide control over the execution of machine language by the microprocessor (so-called extended instruction pointer [EIP] control). These bugs, called vulnerabilities, let the attacker take control over the attacked program. Such attack code, or exploit, is often specially crafted data that causes the program to malfunction.

PDF Vulnerabilities

The JBIG2Decode bug is a well-known vulnerability in Adobe Reader’s implementation of the PDF language. The PDF language supports the inclusion of many types of image formats, so the reader must be able to decode and decompress many formats. Several bugs in the JBIG2 decoding algorithm’s implementation give attackers execution control when they embed a malformed image into the PDF document. When users open a PDF document with a vulnerable PDF reader, the reader tries to render the image by decompressing the malformed image data, causing the decompression algorithm to malfunction and giving attackers execution control. What happens next depends on the payload code the attacker includes in the PDF document. Adobe has fixed this JBIG2Decode bug; it can no longer be exploited on recent versions of Adobe Reader.

A well-known vulnerability in the PDF JavaScript language is a bug in util.printf. Programmers use this printf function to format data in particular forms, for example, converting a floating-point number to a string representation of the floating-point number with two digits after the decimal point. The programmer provides a format string parameter to instruct the printf function on the format to use. The bug is in the implementation of the algorithm that interprets this format string. When attackers call util.printf with a particular format string and numerical argument, the printf
function malfunctions in such a way that it provides attackers execution control. To exploit this vulnerability, attackers created a PDF document with embedded JavaScript that calls the function util.printf. In the PDF language, JavaScript execution is linked to actions. In this case, attackers linked the JavaScript execution to the action of opening the PDF document: when the document opens, the JavaScript executes and exploits the vulnerability. Adobe has fixed this bug as well.

Execution Control

When attackers obtain execution control, a program’s normal flow is disrupted. When a program executes, the microprocessor executes a sequence of instructions. This isn’t a linear sequence; the program flow will frequently jump from one set of instructions to another owing to conditional statements, subroutine calls, interrupts, and so forth.

When a function (a set of instructions) contains bugs, its program flow won’t execute the way the programmer intended. When a buggy function’s program flow can be forced to jump outside of the function to a completely different set of instructions, the bug qualifies as a vulnerability. This means the attacker has obtained control over the EIP. The instruction pointer is a critical register in the microprocessor that points to the instruction the microprocessor is executing. Once the instruction executes, the EIP will point to the next sequential instruction, unless the executed instruction has changed the EIP and thereby changed the program flow. A successful exploitation of the function’s vulnerability will let attackers change the EIP to point to an instruction they provide. They can then force the attacked program to execute a set of malicious instructions, or shellcode.

Attack Methods

Skilled attackers craft their attacks to obtain complete and precise control over the EIP. They include their shellcode in the PDF document loaded into the running PDF reader’s memory. This shellcode will thus be loaded into memory at a known location, which usually depends on the targeted Windows operating system’s version and configuration. To execute the shellcode, attackers must craft the data passed to the vulnerable function (for example, malformed image data for JBIG2Decode) so that the EIP will point to the start of the shellcode.

Less-skilled attackers obtain only partial control over the EIP. They can make the EIP point outside of the normal set of instructions, but they can’t make it point to a specific location. For example, they can only make the EIP point to location 0x30303030. Because they aren’t skilled enough or don’t have the resources to make the EIP point to a precise shellcode location, these attackers plant their shellcode at the location the EIP is pointing to after exploitation.

Because the PDF and JavaScript languages don’t provide attackers with a means to precisely plant shellcode at an arbitrary address, attackers developed a workaround, heap spraying, in which shellcode is stored inside a JavaScript string that is copied many times (10,000 to 100,000 times). The result is that the JavaScript interpreter’s memory for strings and other data (called the heap) is filled with strings containing shellcode. When the EIP points to the heap (for example, 0x30303030), there’s a good chance that it’ll point to the “sprayed” shellcode (depending on how much the attacker sprayed the heap).

So, less-skilled attackers will combine an exploit with a JavaScript heap spray to attack Adobe Reader. However, there’s one caveat: if the EIP points somewhere in the shellcode, but not exactly at the start of the shellcode, the shellcode will fail to execute properly.
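
The instruction-pointer behavior described under Execution Control can be illustrated with a toy interpreter. This is a minimal sketch, not a model of x86 or of any PDF reader: the four-opcode machine, the `run` function, and the two sample programs are all hypothetical names introduced for illustration. The pointer moves to the next sequential instruction after each step unless the executed instruction overwrites it.

```python
from typing import List, Tuple

# Hypothetical toy machine, for illustration only: (opcode, operand) pairs.
Instruction = Tuple[str, int]


def run(program: List[Instruction], max_steps: int = 100) -> List[int]:
    """Run a toy program and return every instruction-pointer value visited."""
    ip = 0    # toy instruction pointer (stands in for the EIP)
    acc = 0   # single accumulator register
    trace: List[int] = []
    for _ in range(max_steps):
        if not 0 <= ip < len(program):
            break  # pointer left the program: nothing valid to execute
        trace.append(ip)
        opcode, operand = program[ip]
        if opcode == "ADD":
            acc += operand
            ip += 1          # default: next sequential instruction
        elif opcode == "JMP":
            ip = operand     # the instruction itself rewrites the pointer
        elif opcode == "JZ":
            ip = operand if acc == 0 else ip + 1  # conditional jump
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return trace


sequential = [("ADD", 1), ("ADD", 2), ("HALT", 0)]
redirected = [("JMP", 2), ("ADD", 1), ("HALT", 0)]

if __name__ == "__main__":
    print(run(sequential))  # visits every instruction in order
    print(run(redirected))  # skips index 1 because JMP changed the pointer
```

In this picture, an exploited vulnerability corresponds to the `JMP` case with an operand chosen by the attacker rather than by the programmer.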
The heap spray programmer works around this problem by prefixing the shellcode with a long sequence of single-byte instructions that essentially do nothing: NOP (no operation) instructions, each of which simply lets the EIP advance to the next NOP instruction. A NOP sled is a long sequence of NOP instructions (easily 50,000 bytes long). In a classic heap spray, the shellcode (approximately 100 bytes) is prefixed with a long NOP sled, and attackers repeatedly spray this into the heap.

Because the heap is filled with large NOP sleds and small shellcode, the EIP will likely point somewhere into a NOP sled, rather than into the shellcode. On execution of the NOP sled, the EIP slides all the way up the NOP sled, until it encounters the shellcode’s first instruction and executes it.

Most of the PDF malware found in the wild exploits a PDF language or JavaScript vulnerability and uses JavaScript heap spray to execute shellcode. This can be partially explained by the use of fuzzing to find bugs. PDF fuzzing is an automatic method in which a program creates a very large set of PDF documents that are all different from each other by making small changes to the PDF document’s parameters and data structures. Each document is automatically opened with the PDF reader under investigation; the hope is to find PDF documents that are malformed in such a way that they crash the PDF reader. If the crash analysis reveals that the crash was caused by the EIP pointing to a place where there was no memory segment, and this location is inside the heap, then attackers can easily transform this malformed PDF document into an exploit by adding JavaScript to heap spray shellcode.

Depending on the heap size, JavaScript heap spraying can take several minutes. If after opening a malicious PDF document the PDF reader becomes unresponsive for a long period, the JavaScript heap spray is running. When the PDF reader becomes responsive again, the heap spray is complete and the exploit has executed, or is still executing. The best thing to do in that case is to kill the PDF reader process while it’s still in the heap-spraying phase.

PDF Shellcode Types

Attackers can design shellcode to do almost anything, but because developing complex shellcode requires specialized skills, most malicious PDF shellcode can be classified in two categories (excluding malicious PDF documents used in targeted attacks).

The first downloads an executable via HTTP from a website, writes it to disk, executes the downloaded executable (creating a new process), and terminates its execution. This often results in a PDF reader crash, unless the malicious author has taken special precautions not to crash the reader program.

The second extracts an embedded executable from the PDF document, writes it to disk, and then executes the embedded executable. The shellcode uses its own method to extract embedded executables; it’s not hindered by the restrictions the PDF reader software imposes on embedded files.

These are the principles underlying the recommendations to disable JavaScript in the PDF reader to mitigate 0-day vulnerabilities. If the vulnerability is in a JavaScript function, then disabling the JavaScript execution will prevent it from calling and exploiting the vulnerable function.

If the vulnerability is in the PDF language’s implementation, then disabling JavaScript will mitigate the attack by preventing the JavaScript heap spray from executing. In most cases, the PDF reader will just crash. The exploit will run, but because the JavaScript heap spray didn’t execute, there’s no shellcode in memory to take control over the machine. Of course, when dealing with skilled attackers that don’t use heap sprays, disabling JavaScript won’t mitigate a PDF language bug’s exploitation.

PDF reader software designers have begun using Windows security features such as data execution prevention (DEP) and address space layout randomization (ASLR) to prevent exploits from executing. In the case of Adobe, only the latest versions of Adobe Reader (versions 9 and X) use such Windows security features when they execute on recent versions of Windows (Windows Vista and Windows 7). DEP thwarts exploits by preventing execution of code located in memory dedicated to data. ASLR changes the entry point of Windows API functions and application functions every time Windows boots. This way, shellcode can’t use hard-coded Windows API function addresses. Although there are techniques to bypass DEP and ASLR, they make the attacker’s job harder.

In November 2010, Adobe released Adobe Reader X. This version comes with a new feature known as Protected Mode Adobe Reader (PMAR). PMAR uses a sandbox to isolate the rendering engine of Adobe Reader from the rest of the operating system. Even if malware manages to bypass DEP and ASLR, the sandbox will prevent it from altering the operating system, hence stopping the malware infection.

Didier Stevens is an IT security consultant at Contraste Europe NV, an IT consulting services company. In 2011, Didier was awarded Microsoft MVP Consumer Security. Contact him via his IT security-related blog at blog.DidierStevens.com.
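
The effect of the NOP sled on a partially controlled EIP can be checked with a little arithmetic. The sketch below is not taken from the article. It assumes a heap region filled end to end with identical blocks (a sled followed by shellcode) and a hijacked pointer that lands on a uniformly random byte of that region. A landing counts as usable if it hits any sled byte or the first shellcode byte. The function name `landing_success_probability` is hypothetical; the sizes are the figures quoted in the text (a 50,000-byte sled and roughly 100 bytes of shellcode).

```python
from fractions import Fraction


def landing_success_probability(sled_bytes: int, shellcode_bytes: int) -> Fraction:
    """Chance that a uniformly random byte offset in one sprayed block is usable.

    Simplifying assumption: usable offsets are every sled byte plus the first
    byte of the shellcode; any other offset lands mid-shellcode and fails.
    """
    if sled_bytes < 0 or shellcode_bytes < 1:
        raise ValueError("sled must be >= 0 bytes and shellcode >= 1 byte")
    usable_offsets = sled_bytes + 1
    block_size = sled_bytes + shellcode_bytes
    return Fraction(usable_offsets, block_size)


if __name__ == "__main__":
    without_sled = landing_success_probability(0, 100)
    with_sled = landing_success_probability(50_000, 100)
    print(f"no sled:   {float(without_sled):.4f}")
    print(f"with sled: {float(with_sled):.4f}")
```

Under these assumptions the sled turns a landing that almost always misses the start of the shellcode into one that almost always reaches it, which is consistent with the article's observation that most in-the-wild PDF malware pairs an exploit with a heap spray.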
82 IEEE SECURITY & PRIVACY JANUARY/FEBRUARY 2011

