
TEAM
Editor:
Joanna Kretowicz
joanna.kretowicz@eforensicsmag.com
Betatesters/Proofreaders:
Olivier Caleff, Kishore P.V., JohanScholtz,
Mark Dearlove, Massa Danilo, Andrew
J. Levandoski, Robert E. Vanaman, Tom
Urquhart, M1ndl3ss, Henrik Becker,
JAMES FLEIT, Richard C Leitz Jr
Senior Consultant/Publisher:
Paweł Marciniak
CEO: Joanna Kretowicz
jaonna.kretowicz@eforensicsmag.com
Marketing Director: Joanna Kretowicz
jaonna.kretowicz@eforensicsmag.com
Art Director: Ireneusz Pogroszewski
ireneusz.pogroszewski@software.com.pl
DTP: Ireneusz Pogroszewski
Publisher: Software Press Sp. z o.o.
02-676 Warszawa, ul. Postępu 17D
Phone: 1 917 338 3631
www.eforensicsmag.com

DISCLAIMER!
The techniques described in our articles
may only be used in private, local networks. The editors hold no responsibility
for misuse of the presented techniques or
consequent data loss.

CONTENTS

MODULE 1
CASE OF STUDY
MALWARE EVOLUTION AND NEW THREATS
TOOLSET
CREATING OUR LAB
REMnux: A Linux Distribution for Reverse-Engineering Malware
Downloading REMnux
WINDOWS SYSTEM SANDBOXES
Wireshark
CaptureBAT
Procmon
Regshot

MODULE 2
Overview of the PE File Format
PE File Sections
Section Table
Relative Virtual Addresses
The Data Directory
Importing Functions
PE File Structure
The MS-DOS Header
The IMAGE_NT_HEADERS Header
The Section Table
PE Files Structure Walkthrough
Executable Compression
PE file Identification: First Contact Analysis
Creating our own Repository

MODULE 3
Introduction
CaptureBAT
RegShot
Process Monitor & Process Explorer
FakeNet
Real Malware Sample Analysis
Exercise Proposal #1
Resolution: Exercises #1, #2, #3 and #4

MODULE 4
Use Cases
Architecture
Requirements
Installing Python libraries
Virtualization Software
Installing Tcpdump
Installing Volatility
Installing Cuckoo
Install Cuckoo
Configuration
PREPARING GUEST REQUIREMENTS
Install Python
Additional Software
Saving the Virtual Machine
VirtualBox
KVM
VMware Workstation
JavaScript Malware Analysis
JSDetox Malware Javascript Analysis Framework
Android Malware Analysis
Androwarn

MALWARE ANALYSIS STARTER KIT
by Anderson Tamborim

This workshop covers the main initial topics that everyone entering the world of malicious artifacts must know. We will provide 101-level guidance on the following points:
- The first steps to create a virtual laboratory to retrieve, manage and analyze pieces of malware and create intelligence from them
- Cutting-edge tools that you must master to fight back against malware outbreaks
- The basic principles of dynamic malware analysis: how to observe and learn from malware while it is executing on a target host
- How to identify the main aspects of malware, such as anti-virtualization tricks, keylogging, backdoors, infected documents, malicious JavaScript and other tricks that malware creators use very often.


MODULE 1
In this module, we will learn a little about the main motivations behind malware creation and start to create our virtual lab.

What you will learn:

The initial tools needed to create the virtual analysis lab

What you should know:

Basic knowledge of networking and Windows operating systems
Knowledge of network protocols

First, we have to ask ourselves: Why analyze a piece of malware? Why invest time and effort in such an activity? What kind of benefit can we gain from it?

In this class, we will try to answer some of these questions and raise new ones. This material is intended to be your first step into the giant world of cryptovirology artifacts. This part of the security field is really challenging and needs a huge amount of study, dedication and focus, especially because, as we will see during the material, the most interesting information can pass before your eyes in a blink.
So let's start with a simple question: After all, what is malware?
Malware, short for malicious software, is any software used to disrupt computer operation, gather sensitive information, or gain access to private computer systems. It can appear in the form of executables, scripts, active content, and other software. Malware is a general term used to refer to a variety of forms of hostile or intrusive software. The term badware is sometimes used, and applied to both true (malicious) malware and unintentionally harmful software.
Malware includes computer viruses, ransomware, spyware, adware, scareware, and other malicious programs. As of 2011, the majority of active malware threats were worms or trojans rather than viruses. In law, malware is sometimes known as a computer contaminant, as in the legal codes of several U.S. states. Malware is often disguised as, or embedded in, non-malicious files.
Spyware or other malware is sometimes found embedded in programs supplied officially by companies, e.g., downloadable from websites, that appear useful or attractive, but may have, for example, additional hidden tracking functionality that gathers marketing statistics. An example of such software, which was described as illegitimate, is the Sony rootkit, a Trojan embedded into CDs sold by Sony, which silently installed and concealed itself on purchasers' computers with the intention of preventing illicit copying; it also reported on users' listening habits, and created vulnerabilities that were exploited by unrelated malware.
The term malware only applies to software that intentionally causes harm. Software that causes harm due to bugs or poor design is not classified as malware; for example, some legitimate software written before the year 2000 had errors that caused serious malfunctions, yet these programs are not considered malware. So basically, malware is an umbrella term.


CASE OF STUDY

Before Internet access became widespread, viruses spread on personal computers by infecting executable programs or the boot sectors of floppy disks. By inserting a copy of itself into the machine code instructions in these executables, a virus causes itself to be run whenever a program is run or the disk is booted. Early computer viruses were written for the Apple II and Macintosh, but they became more widespread with the dominance of the IBM PC and MS-DOS system. Executable-infecting viruses depend on users exchanging software or bootable floppies and thumb drives, so they spread rapidly in computer hobbyist circles.
The first worms, network-borne infectious programs, originated not on personal computers, but on multitasking Unix systems. The first well-known worm was the Internet Worm of 1988, which infected SunOS and VAX BSD systems. Unlike a virus, this worm did not insert itself into other programs. Instead, it exploited security holes (vulnerabilities) in network server programs and started itself running as a separate process. Today's worms use this same behavior as well.
With the rise of the Microsoft Windows platform in the 1990s, and the flexible macros of its applications, it became possible to write infectious code in the macro language of Microsoft Word and similar programs. These macro viruses infect documents and templates rather than applications (executables), but rely on the fact that macros in a Word document are a form of executable code.
Today, worms are most commonly written for the Windows OS, although a few, like Mare-D and the L10n worm, are also written for Linux and UNIX systems. Worms today work in the same basic way as 1988's Internet Worm: they scan the network and use vulnerable computers to replicate. Because they need no human intervention, worms can spread with incredible speed. The SQL Slammer infected thousands of computers in a few minutes in 2003.

MALWARE EVOLUTION AND NEW THREATS

Setting aside the initial motivations of the first malware creators, we now face a new generation of technologies, techniques and motivations that puts the cybercriminal landscape in a new light. The majority of the malware released and found in the wild is focused on stealing information, especially banking information: credit card numbers, Internet banking credentials, PayPal credentials and so on. Botnet creators harvest massive amounts of banking data by spreading their malware campaigns in the most innovative and creative ways you can think of.
The time when malware phishing attempts were crude and easy to spot is behind us. Every day, new ways to disguise malware and spread it, especially through social networks, grow bigger and stronger.
If you search the website http://www.phishtank.com you can see what I am talking about. Let's take a closer look at one case reported to them:

Figure 1. Phishing scam reported at phishtank.com


This is a case of a site pretending to be a PayPal login page. For unsuspecting users, the difference can be hard to detect.
During the course, I will present you with some real-world malware. Most of the samples were delivered by e-mail phishing campaigns just like the one above. We will dissect them and try to obtain as much information as we can. To do this, let's start building our very first toolset.

TOOLSET

The toolset for malware analysis is one of the most important things you have to keep in mind when starting the job. These will be your bread-and-butter tools for understanding and observing how a malicious sample behaves. In this course, I will introduce you to the basic toolset you will need. This will be your malware analysis starter kit. For each chapter of the workshop I will show you the tool used, its basic features, the benefits you can get by using it and where you can find more information. All the tools will be stored in a cloud drive to make it easier to get all of them.
The same will happen with the malware samples. Most of them will be pieces of live malware, and you have to handle them with extreme care, since we don't want you to get infected and compromise your machines. For this reason, the first job we will do is to create the lab you will use during the course.
Pay attention to the configurations and never execute any of these samples on your own computer. Most of them, even the old ones, can still harm your system. We will focus on malware for Microsoft Windows systems.

CREATING OUR LAB

For our malware analysis laboratory I prefer to use VirtualBox. VirtualBox is a powerful x86 and AMD64/Intel64 virtualization product for enterprise as well as home use. Not only is VirtualBox an extremely feature-rich, high-performance product for enterprise customers, it is also the only professional solution that is freely available as Open Source Software under the terms of the GNU General Public License (GPL) version 2.
Presently, VirtualBox runs on Windows, Linux, Macintosh, and Solaris hosts and supports a large number of guest operating systems, including but not limited to Windows (NT 4.0, 2000, XP, Server 2003, Vista, Windows 7, Windows 8), DOS/Windows 3.x, Linux (2.4, 2.6 and 3.x), Solaris and OpenSolaris, OS/2, and OpenBSD.
VirtualBox is being actively developed with frequent releases and has an ever-growing list of features, supported guest operating systems and platforms it runs on. VirtualBox is a community effort backed by a dedicated company: everyone is encouraged to contribute while Oracle ensures the product always meets professional quality criteria.
I recommend that you create the machine that will be infected using a host-only network interface. You need to ensure that the machine infected for study has no contact with your real infrastructure. This is important because we don't know what kind of actions the malware could take. If you caught something in the wild (a new piece of malware), you have no clue what kind of activity this little guy will attempt once you execute it. It could send information about your network to the malware creator, for example. We do not want this, right?
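As a rough sketch of the host-only setup, the VirtualBox command-line tool VBoxManage can attach the first network card of a VM to a host-only adapter. The VM name "WinXP-Analysis" and the adapter name "vboxnet0" below are assumptions for illustration; on a Windows host the adapter usually has a longer name, so check what your installation reports.

```python
import shlex


def host_only_command(vm_name, adapter="vboxnet0"):
    """Build the VBoxManage arguments that put NIC 1 of a VM on a host-only network.

    vm_name and adapter are placeholders; use the names from your own setup.
    The VM must be powered off when modifyvm is applied.
    """
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--nic1", "hostonly",
        "--hostonlyadapter1", adapter,
    ]


# Print the command line instead of executing it, so nothing changes by accident.
print(shlex.join(host_only_command("WinXP-Analysis")))
```

Once you are happy with the printed command, you can paste it into a terminal or pass the list to subprocess.run.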
Figure 2 shows a diagram of the suggested laboratory. We will use one Windows XP machine as our main analysis machine. The other machine can run Windows XP or Windows 7; it's up to you. On this one, we'll install and execute some tools to watch the malware during behavioral analysis (while it is executing). REMnux is a Linux distribution created specially with tools for malware analysis. Most of the tools we'll use are already included in this system.


Figure 2. Virtual lab diagram

Tails is a live system that aims to preserve your privacy and anonymity. It helps you to use the Internet anonymously and circumvent censorship almost anywhere you go and on any computer, leaving no trace unless you explicitly ask it to.
It is a complete operating system designed to be used from a DVD, USB stick, or SD card independently of the computer's original operating system. It is Free Software and based on Debian GNU/Linux. Tails comes with several built-in applications pre-configured with security in mind: web browser, instant messaging client, email client, office suite, image and sound editor, etc.
The main purpose of using Tails is to prevent malware creators and malware C&C (command and control) servers from noticing who we are and what our source IP address is. When we want to access a remote resource hosted on the web, we will access it through Tails. You can download Tails from here: http://dl.amnesia.boum.org/tails/stable/tails-i386-1.1.1/tails-i386-1.1.1.iso.

Figure 3. Tails boot screen


Figure 4. Welcome screen

Figure 5. Configuration screen


Figure 6. Tor network settings

Figure 7. Tor network sockets ready for use

Figure 8. Vidalia Control Panel


REMNUX: A LINUX DISTRIBUTION FOR REVERSE-ENGINEERING MALWARE

REMnux is a lightweight Linux distribution for assisting malware analysts with reverse-engineering malicious software. The distribution is based on Ubuntu and is maintained by Lenny Zeltser.
REMnux incorporates a number of tools for analyzing malicious executables that run on Microsoft Windows, as well as browser-based malware, such as Flash programs and obfuscated JavaScript. This popular toolkit includes programs for analyzing malicious documents, such as PDF files, and utilities for reverse-engineering malware through memory forensics.
REMnux can also be used for emulating network services within an isolated lab environment when performing behavioral malware analysis. As part of this process, the analyst typically infects another laboratory system with the malware sample and redirects the connections to the REMnux system listening on the appropriate ports.

DOWNLOADING REMNUX

You can download the REMnux distribution as a virtual appliance archive and as an ISO image of a Live CD:
- OVF/OVA virtual appliance: remnux-5.0-ovf-public.ova, for most virtualization tools, including VMware and VirtualBox (MD5 hash e5ab6981d1a4d5956b05ed525130d41f).
- VMware virtual appliance: remnux-5.0-vm-public.zip, only for VMware virtualization software; includes VMware Tools (MD5 hash 77ec0701661caceaa1a5eef90c0bacd1).
- ISO image of a Live CD: remnux-5.0-live-cd.iso, for ephemeral malware analysis sessions (MD5 hash a06b2603a13fba97f50818c2ab12bbe6).
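Before importing the appliance it is worth checking that the download matches the published MD5 hash. The following is a minimal sketch using only the Python standard library; the function names are my own, and the file is read in chunks because the images are large.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_md5):
    """Compare a file's MD5 digest with a published value, ignoring case and spaces."""
    return md5_of_file(path) == published_md5.strip().lower()
```

For the OVF/OVA appliance you would call matches_published_hash("remnux-5.0-ovf-public.ova", "e5ab6981d1a4d5956b05ed525130d41f") from the folder where the file was saved. A mismatch usually means a corrupted or incomplete download.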
For our lab, I recommend that you download the OVF/OVA virtual appliance. The main features of REMnux are:
- Excellent for running services when performing behavioral malware analysis in a lab.
- Useful for performing static analysis of malicious executables and web pages.
- Includes tools for examining malicious documents, such as Microsoft Office and Adobe PDF files.
- Includes many utilities for memory forensics and reverse-engineering malware.
- Used by many beginner and experienced malware analysts worldwide.
- Available as a virtual appliance archive for VMware, VirtualBox, etc. and as a Live CD ISO file.

When the REMnux machine finishes booting, you will arrive at a screen with a terminal open. The user name and password are remnux and malware. To elevate to a root shell, use sudo su.

Figure 9. Remnux Linux fully loaded


Figure 10. Mind map on the desktop showing all the tools in the REMnux distro

Figure 11. GUI of Bokken. Free software for static analysis



Figure 12. MalwareCrawler for multiple samples download

Figure 13. Folder with many tools we will use in next modules

You can use the next few days to explore the REMnux system and become familiar with its structure. We will use many of its tools, and knowing this Linux distribution will make that much easier.


WINDOWS SYSTEM SANDBOXES

For most of the dynamic analysis work in the coming modules, we will need some Windows virtual machines prepared with tools that help us identify the malicious behavior of the samples. I personally recommend using a Windows XP SP3 machine, because most malware is created to target this system. It is rare to see malware directed exclusively at Windows 8 or 7, and since Windows XP needs fewer resources, it is a great fit for a small farm of virtual machines.
The creation procedure is quite simple. Just install Windows XP and leave it with the default configuration. Don't install any updates, turn the Windows Firewall off and don't install any antivirus software. What we want is to let the malware run freely, so we can uncover everything it does.
There are some tools that we will have to install on the machine to help us in the analysis process. Install all the software, remove the installer files and then take a snapshot. Every time we manipulate malware code inside the virtual machine, you will have to revert the VM to the snapshot state afterwards. This is important because you don't want the remains of malware Y, which you handled earlier, to interfere with the results when you analyze malware X.
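If you use VirtualBox, the snapshot cycle described above can be scripted with VBoxManage. This is only a sketch: the VM name "WinXP-Sandbox" and the snapshot name "clean-tools" are hypothetical, and the helper functions are mine, not part of any tool.

```python
import subprocess


def snapshot_take_command(vm_name, snapshot_name):
    """Arguments that record the clean state once the tools are installed."""
    return ["VBoxManage", "snapshot", vm_name, "take", snapshot_name]


def snapshot_revert_commands(vm_name, snapshot_name):
    """Arguments that power the VM off, restore the clean snapshot and boot again."""
    return [
        ["VBoxManage", "controlvm", vm_name, "poweroff"],
        ["VBoxManage", "snapshot", vm_name, "restore", snapshot_name],
        ["VBoxManage", "startvm", vm_name, "--type", "headless"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

After each analysis session you would call run_all(snapshot_revert_commands("WinXP-Sandbox", "clean-tools")). Note that the poweroff step fails if the VM is already stopped, so in that case skip the first command.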
Here is the initial list of tools for the sandbox:

WIRESHARK

Wireshark is a free and open-source packet analyzer. It is used for network troubleshooting, analysis,
software and communications protocol development, and education. https://www.wireshark.org/download.html.

CAPTUREBAT

CaptureBAT is a free Windows tool for capturing local information about the system's processing. It allows you to observe process, registry, file system and network-level activities on the host. This tool is useful for system troubleshooting as well as behavioral malware analysis. http://zeltser.com/reverse-malware/capturebat.html.

PROCMON

Process Monitor is a free tool from Windows Sysinternals, part of the Microsoft TechNet website. The tool monitors and displays in real time file system, registry and process/thread activity on a Microsoft Windows operating system. http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx.

REGSHOT

Regshot is an open-source (LGPL) registry compare utility that allows you to quickly take a snapshot of your registry and then compare it with a second one taken after making system changes or installing a new software product. http://sourceforge.net/projects/regshot/.
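The idea behind a snapshot-and-compare tool like Regshot can be illustrated in a few lines. This sketch is not Regshot's implementation; it simply treats each snapshot as a dictionary mapping a key path to its value, and the registry paths in the usage example are made up.

```python
def diff_snapshots(before, after):
    """Compare two {key_path: value} snapshots.

    Returns three dictionaries: entries that were added, entries that were
    removed, and entries whose value changed (as (old, new) tuples).
    """
    added = {key: after[key] for key in after.keys() - before.keys()}
    removed = {key: before[key] for key in before.keys() - after.keys()}
    changed = {
        key: (before[key], after[key])
        for key in before.keys() & after.keys()
        if before[key] != after[key]
    }
    return added, removed, changed


# Hypothetical snapshots taken before and after running a sample.
before = {r"HKLM\Software\Example\Version": "1.0"}
after = {
    r"HKLM\Software\Example\Version": "1.1",
    r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\updater": r"C:\evil.exe",
}
added, removed, changed = diff_snapshots(before, after)
print(added, removed, changed)
```

A new value under a Run key, as in the example, is the kind of change that deserves a closer look because it is a common persistence location.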
Once these tools are installed and you have taken the snapshot used to reset the virtual machine, we are ready to start the analysis process.
In the coming modules, we will work extensively on dynamic analysis. We will use the above tools and generate reports for real malware samples that I will host in an encrypted virtual drive for download.


MODULE 2
In this module, we will learn about PE files and their main aspects, talk about binary compression, see how to create our own malware repository, and begin classifying samples using identifiers.

What you will learn:

The internal structure of Windows PE files, binary compression, and how to create our mini malware zoo.

What you should know:

Basic knowledge of networking and GNU/Linux commands;
Knowledge of network protocols.

Now that we have created the initial analysis lab, let's start to understand the main type of file that we will manipulate in this course: Windows PE files, the familiar executables. To best understand the capabilities of a malicious executable and what kind of information can be obtained when analyzing these files, we need to know how their internal structure works. Let's look into the internals of the PE file format.

OVERVIEW OF THE PE FILE FORMAT

Microsoft introduced the Portable Executable file format, more commonly known as the PE format, as part of the original Win32 specifications. However, PE files are derived from the earlier Common Object File Format (COFF) found on VAX/VMS. This makes sense since much of the original Windows NT team came from Digital Equipment Corporation. It was natural for these developers to use existing code to quickly bootstrap the new Windows NT platform.
The term "Portable Executable" was chosen because the intent was to have a common file format for all flavors of Windows, on all supported CPUs. To a large extent, this goal has been achieved with the same format used on Windows NT and descendants, Windows 95 and descendants, and Windows CE.
OBJ files emitted by Microsoft compilers use the COFF format. You can get an idea of how old the COFF format is by looking at some of its fields, which use octal encoding! COFF OBJ files have many data structures and enumerations in common with PE files, and I'll mention some of them as I go along.
The addition of 64-bit Windows required just a few modifications to the PE format. This new format is called PE32+. No new fields were added, and only one field in the PE format was deleted. The remaining changes are simply the widening of certain fields from 32 bits to 64 bits. In most of these cases, you can write code that simply works with both 32-bit and 64-bit PE files. The Windows header files have the magic pixie dust to make the differences invisible to most C++-based code.
The distinction between EXE and DLL files is entirely one of semantics. They both use the exact same PE format. The only difference is a single bit that indicates if the file should be treated as an EXE or as a DLL. Even the DLL file extension is artificial. You can have DLLs with entirely different extensions; for instance, .OCX controls and Control Panel applets (.CPL files) are DLLs.
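To make the "single bit" concrete, here is a small sketch that reads the two header fields involved. It assumes the standard layout: the offset of the PE signature is stored at 0x3C in the MS-DOS header, the Characteristics word sits 18 bytes into the file header, and the optional header starts with a magic value (0x10B for PE32, 0x20B for PE32+). The function name describe_pe is my own.

```python
import struct

IMAGE_FILE_DLL = 0x2000   # Characteristics bit that marks a DLL
PE32_MAGIC = 0x10B        # optional header magic for 32-bit images
PE32_PLUS_MAGIC = 0x20B   # optional header magic for PE32+ (64-bit) images


def describe_pe(data):
    """Report whether raw PE bytes describe an EXE or a DLL, and PE32 or PE32+."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte file header follows the 4-byte signature.
    (characteristics,) = struct.unpack_from("<H", data, e_lfanew + 4 + 18)
    # The optional header follows the file header and starts with the magic.
    (magic,) = struct.unpack_from("<H", data, e_lfanew + 4 + 20)
    formats = {PE32_MAGIC: "PE32", PE32_PLUS_MAGIC: "PE32+"}
    return {
        "kind": "DLL" if characteristics & IMAGE_FILE_DLL else "EXE",
        "format": formats.get(magic, "unknown"),
    }
```

To try it inside the lab, read a file with open(path, "rb").read() and pass the bytes to describe_pe; an .OCX or .CPL file should be reported as a DLL.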

A very handy aspect of PE files is that the data structures on disk are the same data structures used in memory. Loading an executable into memory (for example, by calling LoadLibrary) is primarily a matter of mapping certain ranges of a PE file into the address space. Thus, a data structure like the IMAGE_NT_HEADERS (which I'll examine later) is identical on disk and in memory. The key point is that if you know how to find something in a PE file, you can almost certainly find the same information when the file is loaded in memory.
It's important to note that PE files are not just mapped into memory as a single memory-mapped file. Instead, the Windows loader looks at the PE file and decides what portions of the file to map in. This mapping is consistent in that higher offsets in the file correspond to higher memory addresses when mapped into memory. The offset of an item in the disk file may differ from its offset once loaded into memory. However, all the information is present to allow you to make the translation from disk offset to memory offset (see Figure 1).

Figure 1. Offsets

When PE files are loaded into memory via the Windows loader, the in-memory version is known as a module. The starting address where the file mapping begins is called an HMODULE. This is a point worth remembering: given an HMODULE, you know what data structure to expect at that address, and you can use that knowledge to find all the other data structures in memory. This powerful capability can be exploited for other purposes such as API interception. (To be completely accurate, an HMODULE isn't the same as the load address under Windows CE, but that's a story for yet another day.)
A module in memory represents all the code, data, and resources from an executable file that are needed by a process. Other parts of a PE file may be read, but not mapped in (for instance, relocations). Some parts may not be mapped in at all, for example, when debug information is placed at the end of the file. A field in the PE header tells the system how much memory needs to be set aside for mapping the executable into memory. Data that won't be mapped in is placed at the end of the file, past any parts that will be mapped in.
The central location where the PE format (as well as COFF files) is described is WINNT.H. Within this header file, you'll find nearly every structure definition, enumeration, and #define needed to work with PE files or the equivalent structures in memory. Sure, there is documentation elsewhere. MSDN has the "Microsoft Portable Executable and Common Object File Format Specification," for instance (see the October 2001 MSDN CD under Specifications). But WINNT.H is the final word on what PE files look like.
There are many tools for examining PE files. Among them are Dumpbin from Visual Studio, and Depends from the Platform SDK. I particularly like Depends because it has a very succinct way of examining a file's imports and exports. A great free PE viewer is PEBrowse Professional, from Smidgeonsoft (http://www.smidgeonsoft.com).
From an API standpoint, the primary mechanism provided by Microsoft for reading and modifying PE files is IMAGEHLP.DLL.

PE FILE SECTIONS

A PE file section represents code or data of some sort. While code is just code, there are multiple types of data. Besides read/write program data (such as global variables), other types of data in sections include API import and export tables, resources, and relocations. Each section has its own set of in-memory attributes, including whether the section contains code, whether it's read-only or read/write, and whether the data in the section is shared between all processes using the executable.
Generally speaking, all the code or data in a section is logically related in some way. At a minimum, there are usually at least two sections in a PE file: one for code, the other for data. Commonly, there's at least one other type of data section in a PE file. I'll look at the various kinds of sections in Part 2 of this article next month.
Each section has a distinct name. This name is intended to convey the purpose of the section. For example, a section called .rdata indicates a read-only data section. Section names are used solely for the benefit of humans, and are insignificant to the operating system. A section named FOOBAR is just as valid as a section called .text. Microsoft typically prefixes their section names with a period, but it's not a requirement. For years, the Borland linker used section names like CODE and DATA.
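The name and attributes described above live in a 40-byte section header. The sketch below decodes one such header with the struct module; the field order follows the IMAGE_SECTION_HEADER layout, while the function name and the short attribute labels are my own choices.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # 40 bytes in total

ATTRIBUTE_FLAGS = {
    0x00000020: "code",     # IMAGE_SCN_CNT_CODE
    0x10000000: "shared",   # IMAGE_SCN_MEM_SHARED
    0x20000000: "execute",  # IMAGE_SCN_MEM_EXECUTE
    0x40000000: "read",     # IMAGE_SCN_MEM_READ
    0x80000000: "write",    # IMAGE_SCN_MEM_WRITE
}


def parse_section_header(raw):
    """Decode one 40-byte section header into a dictionary."""
    (name, virt_size, virt_addr, raw_size, raw_offset,
     _relocs, _lines, _nrelocs, _nlines, characteristics) = SECTION_HEADER.unpack(raw)
    return {
        # The name is just padded bytes; the loader attaches no meaning to it.
        "name": name.rstrip(b"\0").decode("ascii", errors="replace"),
        "virtual_size": virt_size,
        "virtual_address": virt_addr,
        "raw_size": raw_size,
        "raw_offset": raw_offset,
        "attributes": [label for bit, label in ATTRIBUTE_FLAGS.items()
                       if characteristics & bit],
    }
```

A header whose name is FOOBAR decodes exactly like one named .text, which is a quick way to convince yourself that only the attribute bits matter to the system.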
While compilers have a standard set of sections that they generate, there's nothing magical about them. You can create and name your own sections, and the linker happily includes them in the executable. In Visual C++, you can tell the compiler to insert code or data into a section that you name with #pragma statements. For instance, the statement
#pragma data_seg( "MY_DATA" )

causes all data emitted by Visual C++ to go into a section called MY_DATA, rather than the default .data section. Most programs are fine using the default sections emitted by the compiler, but occasionally you may have funky requirements which necessitate putting code or data into a separate section.
Sections don't spring fully formed from the linker; rather, they start out in OBJ files, usually placed there by the compiler. The linker's job is to combine all the required sections from OBJ files and libraries into the appropriate final section in the PE file. For example, each OBJ file in your project probably has at least a .text section, which contains code. The linker takes all the sections named .text from the various OBJ files and combines them into a single .text section in the PE file.
Sections have two alignment values, one within the disk file and the other in memory. The PE file header specifies both of these values, which can differ. Each section starts at an offset that's some multiple of the alignment value. For instance, in the PE file, a typical alignment would be 0x200. Thus, every section begins at a file offset that's a multiple of 0x200.
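Rounding a size or offset up to the next multiple of the alignment is simple arithmetic. The helper below is a minimal sketch; the function name is mine, 0x200 is the typical file alignment mentioned above, and 0x1000 is assumed as the in-memory alignment (one 4KB page).

```python
def align_up(value, alignment):
    """Round value up to the nearest multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


# 0x74658 bytes of code occupy 0x74800 bytes on disk with 0x200 file alignment.
print(hex(align_up(0x74658, 0x200)))
# In memory, a section that ends at 0x75658 is followed by one starting at 0x76000.
print(hex(align_up(0x1000 + 0x74658, 0x1000)))
```

The gap between the real size and the aligned size is padding, which is one reason a PE file is usually a little larger than the sum of its code and data.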
Once mapped into memory, sections always start on at least a page boundary. That is, when a PE section is mapped into memory, the first byte of each section corresponds to the start of a memory page. On x86 CPUs, pages are 4KB aligned, while on the IA-64, they're 8KB aligned. The following shows a snippet of PEDUMP output for the .text and .data sections of the Windows XP KERNEL32.DLL.

SECTION TABLE
  01 .text     VirtSize: 00074658  VirtAddr:  00001000
    raw data offs:   00000400  raw data size: 00074800
  02 .data     VirtSize: 000028CA  VirtAddr:  00076000
    raw data offs:   00074C00  raw data size: 00002400


The .text section is at offset 0x400 in the PE file and will be 0x1000 bytes above the load address of KERNEL32 in memory. Likewise, the .data section is at file offset 0x74C00 and will be 0x76000 bytes above KERNEL32's load address in memory.
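Using the two KERNEL32 sections from the PEDUMP output, the translation between an offset in memory and an offset in the file can be sketched as follows. The table layout and the function name are my own; a real tool would read the values from the section headers instead of hardcoding them.

```python
# Values copied from the PEDUMP output for the Windows XP KERNEL32.DLL.
SECTIONS = [
    {"name": ".text", "virtual_address": 0x1000, "virtual_size": 0x74658,
     "raw_offset": 0x400, "raw_size": 0x74800},
    {"name": ".data", "virtual_address": 0x76000, "virtual_size": 0x28CA,
     "raw_offset": 0x74C00, "raw_size": 0x2400},
]


def rva_to_file_offset(rva, sections):
    """Translate an offset from the load address into an offset in the file.

    Returns None when the address is not backed by bytes on disk, for
    example zero-filled data beyond the raw size of a section.
    """
    for section in sections:
        delta = rva - section["virtual_address"]
        backed = min(section["virtual_size"], section["raw_size"])
        if 0 <= delta < backed:
            return section["raw_offset"] + delta
    return None


print(hex(rva_to_file_offset(0x1234, SECTIONS)))   # inside .text
print(hex(rva_to_file_offset(0x76010, SECTIONS)))  # inside .data
```

The .data section shows why the check matters: its virtual size (0x28CA) is larger than its raw size (0x2400), so the tail of the section exists only in memory.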
It's possible to create PE files in which the sections start at the same offset in the file as they start from the load address in memory. This makes for larger executables, but can speed loading under Windows 9x or Windows Me. The default /OPT:WIN98 linker option (introduced in Visual Studio 6.0) causes PE files to be created this way. In Visual Studio .NET, the linker may or may not use /OPT:NOWIN98, depending on whether the file is small enough.
An interesting linker feature is the ability to merge sections. If two sections have similar, compatible attributes, they can usually be combined into a single section at link time. This is done via the linker /MERGE switch. For instance, the following linker option combines the .rdata and .text sections into a single section called .text:
/MERGE:.rdata=.text

The advantage to merging sections is that it saves space, both on disk and in memory. At a minimum, each section occupies one page in memory. If you can reduce the number of sections in an executable from four to three, there's a decent chance you'll use one less page of memory. Of course, this depends on whether the unused space at the end of the two merged sections adds up to a page.
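A quick calculation shows when merging actually saves a page. The section sizes below are invented for illustration and the page size is assumed to be 4KB, as on x86.

```python
PAGE_SIZE = 0x1000  # 4KB pages, as on x86


def pages_needed(size):
    """Number of whole pages required to hold size bytes."""
    return -(-size // PAGE_SIZE)  # ceiling division


def pages_for_sections(sizes):
    """Total pages when every section starts on its own page boundary."""
    return sum(pages_needed(size) for size in sizes)


# Two separate sections of 0x1800 and 0x600 bytes need 2 + 1 pages.
separate = pages_for_sections([0x1800, 0x600])
# Merged into one section of 0x1E00 bytes they fit in 2 pages.
merged = pages_for_sections([0x1800 + 0x600])
print(separate, merged)
```

Here the unused tails of the two sections (0x800 and 0xA00 bytes) add up to more than a page, so one page is saved. Two sections of exactly 0x1000 bytes each would need two pages either way.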
Things can get interesting when you're merging sections, as there are no hard and fast rules as to what's allowed. For example, it's OK to merge .rdata into .text, but you shouldn't merge .rsrc, .reloc, or .pdata into other sections. Prior to Visual Studio .NET, you could merge .idata into other sections. In Visual Studio .NET, this is not allowed, but the linker often merges parts of the .idata into other sections, such as .rdata, when doing a release build.
Since portions of the imports data are written to by the Windows loader when they are loaded into memory, you might wonder how they can be put in a read-only section. This situation works because at load time the system can temporarily set the attributes of the pages containing the imports data to read/write. Once the imports table is initialized, the pages are then set back to their original protection attributes.

RELATIVE VIRTUAL ADDRESSES

In an executable file, there are many places where an in-memory address needs to be specified. For instance, the address of a global variable is needed when referencing it. PE files can load just about anywhere in the process address space. While they do have a preferred load address, you can't rely on the
executable file actually loading there. For this reason, it's important to have some way of specifying addresses that are independent of where the executable file loads.
To avoid having hardcoded memory addresses in PE files, RVAs are used. An RVA is simply an offset
in memory, relative to where the PE file was loaded. For instance, consider an EXE file loaded at address
0x400000, with its code section at address 0x401000. The RVA of the code section would be:
(target address) 0x401000 - (load address) 0x400000 = (RVA) 0x1000

To convert an RVA to an actual address, simply reverse the process: add the RVA to the actual load address to find the actual memory address. Incidentally, the actual memory address is called a Virtual Address (VA) in PE parlance. Another way to think of a VA is that it's an RVA with the preferred load address
added in. Don't forget the earlier point I made that a load address is the same as the HMODULE.
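The arithmetic in both directions is small enough to write down as a sketch. The function names are mine, and the second load address (0x10000000) is just an assumed example of a module that did not load at its preferred address.

```python
def va_to_rva(virtual_address: int, load_address: int) -> int:
    """RVA = target address minus the address where the PE file was loaded."""
    return virtual_address - load_address


def rva_to_va(rva: int, load_address: int) -> int:
    """VA = load address plus the RVA."""
    return load_address + rva


# The example from the text: EXE loaded at 0x400000, code section at 0x401000.
code_rva = va_to_rva(0x401000, 0x400000)
print(hex(code_rva))

# The same RVA stays valid if the file ends up at another (assumed) address.
print(hex(rva_to_va(code_rva, 0x10000000)))
```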
Want to go spelunking through some arbitrary DLL's data structures in memory? Here's how. Call GetModuleHandle with the name of the DLL. The HMODULE that's returned is just a load address; you can
apply your knowledge of the PE file structures to find anything you want within the module.

THE DATA DIRECTORY

There are many data structures within executable files that need to be quickly located. Some obvious examples are the imports, exports, resources, and base relocations. All of these well-known data structures
are found in a consistent manner, and the location is known as the DataDirectory.

The DataDirectory is an array of 16 structures. Each array entry has a predefined meaning for what
it refers to. The IMAGE_DIRECTORY_ENTRY_xxx #defines are array indexes into the DataDirectory
(from 0 to 15).
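As a quick reference, the indexes can be written out as a Python enum. The member names below mirror the IMAGE_DIRECTORY_ENTRY_xxx names from WINNT.H with the common prefix dropped; the DirectoryEntry class itself is only a convenience for this sketch, and index 15 is reserved, so it has no name here.

```python
from enum import IntEnum


class DirectoryEntry(IntEnum):
    """Indexes into the DataDirectory array (prefix IMAGE_DIRECTORY_ENTRY_)."""
    EXPORT = 0
    IMPORT = 1
    RESOURCE = 2
    EXCEPTION = 3
    SECURITY = 4
    BASERELOC = 5
    DEBUG = 6
    ARCHITECTURE = 7
    GLOBALPTR = 8
    TLS = 9
    LOAD_CONFIG = 10
    BOUND_IMPORT = 11
    IAT = 12
    DELAY_IMPORT = 13
    COM_DESCRIPTOR = 14


# With the directory held as a 16-element list, entries are looked up by name.
data_directory = [(0, 0)] * 16  # placeholder (RVA, size) pairs
import_rva, import_size = data_directory[DirectoryEntry.IMPORT]
print(DirectoryEntry.IMPORT.name, int(DirectoryEntry.IMPORT))
```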

IMPORTING FUNCTIONS

When you use code or data from another DLL, you're importing it. When any PE file loads, one of the
jobs of the Windows loader is to locate all the imported functions and data and make those addresses
available to the file being loaded.
When you link directly against the code and data of another DLL, you're implicitly linking against the
DLL. You don't have to do anything to make the addresses of the imported APIs available to your code.
The loader takes care of it all. The alternative is explicit linking. This means explicitly making sure that
the target DLL is loaded and then looking up the address of the APIs. This is almost always done via the
LoadLibrary and GetProcAddress APIs.
When implicitly linking, the resolution process for the main EXE file and all its dependent DLLs occurs
when the program first starts. If there are any problems (for example, a referenced DLL that can't be
found), the process is aborted.
Within a PE file, there's an array of data structures, one per imported DLL. Each of these structures gives
the name of the imported DLL and points to an array of function pointers. The array of function pointers is
known as the import address table (IAT). Each imported API has its own reserved spot in the IAT where the
address of the imported function is written by the Windows loader. This last point is particularly important:
once a module is loaded, the IAT contains the address that is invoked when calling imported APIs.
The beauty of the IAT is that there's just one place in a PE file where an imported API's address is
stored. No matter how many source files you scatter calls to a given API through, all the calls go through
the same function pointer in the IAT.
Let's examine what the call to an imported API looks like. There are two cases to consider: the efficient
way and inefficient way. In the best case, a call to an imported API looks like this:
CALL DWORD PTR [0x00405030]

If you're not familiar with x86 assembly language, this is a call through a function pointer. Whatever
DWORD-sized value is at 0x405030 is where the CALL instruction will send control. In the previous example, address 0x405030 lies within the IAT.
The less efficient call to an imported API looks like this:
CALL 0x0040100C

0x0040100C:
JMP DWORD PTR [0x00405030]

In this situation, the CALL transfers control to a small stub. The stub is a JMP to the address whose value
is at 0x405030. Again, remember that 0x405030 is an entry within the IAT. In a nutshell, the less efficient
imported API call uses five bytes of additional code, and takes longer to execute because of the extra JMP.
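To see where the five extra bytes come from, the three instructions can be encoded by hand. This is a sketch assuming the standard 32-bit x86 encodings (FF 15 for an indirect CALL through memory, FF 25 for an indirect JMP through memory, E8 for a relative CALL); the CALL_SITE address is a made-up value, since the text does not say where the calling instruction lives.

```python
import struct

IAT_SLOT = 0x00405030   # IAT entry used in the text
STUB = 0x0040100C       # address of the JMP stub used in the text
CALL_SITE = 0x00401000  # hypothetical address of the CALL instruction


def call_indirect(slot: int) -> bytes:
    """CALL DWORD PTR [slot]: opcode FF 15 followed by a 32-bit address."""
    return b"\xFF\x15" + struct.pack("<I", slot)


def jmp_indirect(slot: int) -> bytes:
    """JMP DWORD PTR [slot]: opcode FF 25 followed by a 32-bit address."""
    return b"\xFF\x25" + struct.pack("<I", slot)


def call_relative(site: int, target: int) -> bytes:
    """CALL target: opcode E8 plus the distance from the next instruction."""
    return b"\xE8" + struct.pack("<i", target - (site + 5))


efficient = call_indirect(IAT_SLOT)
inefficient = call_relative(CALL_SITE, STUB) + jmp_indirect(IAT_SLOT)
extra_bytes = len(inefficient) - len(efficient)
print(len(efficient), len(inefficient), extra_bytes)
```

The efficient form is one six-byte instruction, while the stub form needs a five-byte CALL plus a six-byte JMP, which accounts for the difference quoted above.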
You're probably wondering why the less efficient method would ever be used. There's a good explanation. Left to its own devices, the compiler can't distinguish between imported API calls and ordinary functions within the same module. As such, the compiler emits a CALL instruction of the form
CALL XXXXXXXX

where XXXXXXXX is an actual code address that will be filled in by the linker later. Note that this last
CALL instruction isn't through a function pointer. Rather, it's an actual code address. To keep the cosmic
karma in balance, the linker needs to have a chunk of code to substitute for XXXXXXXX. The simplest
way to do this is to make the call point to a JMP stub, like you just saw.

Where does the JMP stub come from? Surprisingly, it comes from the import library for the imported
function. If you were to examine an import library, and examine the code associated with the imported
API name, you'd see that it's a JMP stub like the one just shown. What this means is that by default, in
the absence of any intervention, imported API calls will use the less efficient form.
Logically, the next question to ask is how to get the optimized form. The answer comes in the form of
a hint you give to the compiler. The __declspec(dllimport) function modifier tells the compiler that the
function resides in another DLL and that the compiler should generate this instruction
CALL DWORD PTR [XXXXXXXX]

rather than this one:


CALL XXXXXXXX

In addition, the compiler emits information telling the linker to resolve the function pointer portion of the
instruction to a symbol named __imp_functionname. For instance, if you were calling MyFunction, the
symbol name would be __imp_MyFunction. Looking in an import library, you'll see that in addition to the
regular symbol name, there's also a symbol with the __imp__ prefix on it. This __imp__ symbol resolves
directly to the IAT entry, rather than to the JMP stub.
So what does this mean in your everyday life? If you're writing exported functions and providing a .H
file for them, remember to use the __declspec(dllimport) modifier with the function:
__declspec(dllimport) void Foo(void);

If you look at the Windows system header files, you'll find that they use __declspec(dllimport) for the
Windows APIs. It's not easy to see this, but if you search for the DECLSPEC_IMPORT macro defined
in WINNT.H, and which is used in files such as WinBase.H, you'll see how __declspec(dllimport) is prepended to the system API declarations.

PE FILE STRUCTURE

Now let's dig into the actual format of PE files. I'll start from the beginning of the file, and describe the
data structures that are present in every PE file. Afterwards, I'll describe the more specialized data structures (such as imports or resources) that reside within a PE's sections. All of the data structures that I'll
discuss below are defined in WINNT.H, unless otherwise noted.
In many cases, there are matching 32 and 64-bit data structures, for example, IMAGE_NT_HEADERS32 and IMAGE_NT_HEADERS64. These structures are almost always identical, except for some
widened fields in the 64-bit versions. If you're trying to write portable code, there are #defines in WINNT.H
which select the appropriate 32 or 64-bit structures and alias them to a size-agnostic name (in the previous example, it would be IMAGE_NT_HEADERS). The structure selected depends on which mode
you're compiling for (specifically, whether _WIN64 is defined or not). You should only need to use the 32
or 64-bit specific versions of the structures if you're working with a PE file with size characteristics that
are different from those of the platform you're compiling for.

THE MS-DOS HEADER

Every PE file begins with a small MS-DOS executable. The need for this stub executable arose in the
early days of Windows, before a significant number of consumers were running it. When executed on
a machine without Windows, the program could at least print out a message saying that Windows was
required to run the executable.
The first bytes of a PE file begin with the traditional MS-DOS header, called an IMAGE_DOS_HEADER. The only two values of any importance are e_magic and e_lfanew. The e_lfanew field contains the
file offset of the PE header. The e_magic field (a WORD) needs to be set to the value 0x5A4D. There's
a #define for this value, named IMAGE_DOS_SIGNATURE. In ASCII representation, 0x5A4D is "MZ", the
initials of Mark Zbikowski, one of the original architects of MS-DOS.
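Reading these two fields takes only a few lines. The sketch below assumes the usual IMAGE_DOS_HEADER layout from WINNT.H (a 64-byte structure with e_magic at offset 0 and e_lfanew at offset 0x3C); parse_dos_header is a name I chose for the example, and the header it is fed is synthetic rather than taken from a real file.

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D  # "MZ"
DOS_HEADER_SIZE = 64          # size of IMAGE_DOS_HEADER
E_LFANEW_OFFSET = 0x3C        # position of e_lfanew inside the header


def parse_dos_header(data: bytes) -> int:
    """Validate e_magic and return e_lfanew, the file offset of the PE header."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("too short to hold an IMAGE_DOS_HEADER")
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != IMAGE_DOS_SIGNATURE:
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<i", data, E_LFANEW_OFFSET)
    return e_lfanew


# A synthetic header: "MZ", zero padding, then e_lfanew = 0x80.
fake_header = b"MZ" + bytes(58) + struct.pack("<i", 0x80)
print(hex(parse_dos_header(fake_header)))
```

On a real sample you would pass the first bytes read from the file instead of fake_header.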


THE IMAGE_NT_HEADERS HEADER

The IMAGE_NT_HEADERS structure is the primary location where specifics of the PE file are stored. Its
offset is given by the e_lfanew field in the IMAGE_DOS_HEADER at the beginning of the file. There are
actually two versions of the IMAGE_NT_HEADERS structure, one for 32-bit executables and the other for
64-bit versions. The differences are so minor that I'll consider them to be the same for the purposes of
this discussion. The only correct, Microsoft-approved way of differentiating between the two formats is
via the value of the Magic field in the IMAGE_OPTIONAL_HEADER (described shortly).
An IMAGE_NT_HEADERS structure is comprised of three fields:
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

In a valid PE file, the Signature field is set to the value 0x00004550, which in ASCII is "PE\0\0". A #define, IMAGE_NT_SIGNATURE, is defined for this value. The second field, a struct of type IMAGE_FILE_HEADER, predates PE files. It contains some basic information about the file; most importantly, a field
describing the size of the optional data that follows it. In PE files, this optional data is very much required,
but is still called the IMAGE_OPTIONAL_HEADER. The DataDirectory array at the end of the IMAGE_OPTIONAL_HEADER is the address book for important locations within the executable. Each DataDirectory entry looks like this:
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;   // RVA of the data
    DWORD Size;             // Size of the data
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
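Putting the pieces together, here is a hedged sketch that follows e_lfanew to the NT headers, checks the signature, reads the Magic field and unpacks the DataDirectory entries. It assumes the standard layout: a 20-byte IMAGE_FILE_HEADER, Magic values 0x10B (32-bit) and 0x20B (64-bit), and a DataDirectory that begins 96 bytes into the 32-bit optional header or 112 bytes into the 64-bit one. The read_data_directory name and the test buffer are invented for the example.

```python
import struct

IMAGE_NT_SIGNATURE = 0x00004550  # "PE\0\0"
FILE_HEADER_SIZE = 20            # sizeof(IMAGE_FILE_HEADER)
PE32_MAGIC = 0x10B               # 32-bit optional header
PE32PLUS_MAGIC = 0x20B           # 64-bit optional header
DIRECTORY_OFFSET = {PE32_MAGIC: 96, PE32PLUS_MAGIC: 112}


def read_data_directory(data: bytes, e_lfanew: int):
    """Return (magic, [(VirtualAddress, Size), ...]) for a PE image."""
    (signature,) = struct.unpack_from("<I", data, e_lfanew)
    if signature != IMAGE_NT_SIGNATURE:
        raise ValueError("missing PE signature")
    optional = e_lfanew + 4 + FILE_HEADER_SIZE  # start of the optional header
    (magic,) = struct.unpack_from("<H", data, optional)
    if magic not in DIRECTORY_OFFSET:
        raise ValueError("unknown optional header Magic")
    first_entry = optional + DIRECTORY_OFFSET[magic]
    # NumberOfRvaAndSizes is the DWORD just before the first entry.
    (count,) = struct.unpack_from("<I", data, first_entry - 4)
    entries = [struct.unpack_from("<II", data, first_entry + 8 * i)
               for i in range(min(count, 16))]
    return magic, entries


# Synthetic 32-bit image: NT headers at 0x80, one non-empty entry (imports).
image = bytearray(0x80 + 4 + FILE_HEADER_SIZE + 224)
struct.pack_into("<I", image, 0x80, IMAGE_NT_SIGNATURE)
struct.pack_into("<H", image, 0x98, PE32_MAGIC)
struct.pack_into("<I", image, 0x98 + 92, 16)
struct.pack_into("<II", image, 0x98 + 96 + 8, 0x5000, 0x64)
magic, directory = read_data_directory(bytes(image), 0x80)
print(hex(magic), [hex(v) for v in directory[1]])
```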

THE SECTION TABLE

Immediately following the IMAGE_NT_HEADERS is the section table. The section table is an array of
IMAGE_SECTION_HEADER structures. An IMAGE_SECTION_HEADER provides information about
its associated section, including location, length, and characteristics.
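A section header is a fixed 40-byte record, so the whole table can be unpacked in a loop. The sketch below assumes the standard IMAGE_SECTION_HEADER field order from WINNT.H; the SectionHeader tuple keeps only the fields discussed here, and the Characteristics value in the sample record (0xC0000040) is an assumed example for a readable and writable data section.

```python
import struct
from typing import List, NamedTuple

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


class SectionHeader(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int
    characteristics: int


def parse_section_table(data: bytes, offset: int, count: int) -> List[SectionHeader]:
    """Unpack `count` section headers starting at file offset `offset`."""
    sections = []
    for i in range(count):
        (name, vsize, vaddr, raw_size, raw_ptr,
         _relocs, _lines, _nrelocs, _nlines, flags) = SECTION_HEADER.unpack_from(
            data, offset + i * SECTION_HEADER.size)
        text_name = name.rstrip(b"\0").decode("ascii", "replace")
        sections.append(
            SectionHeader(text_name, vsize, vaddr, raw_size, raw_ptr, flags))
    return sections


# One synthetic record using the KERNEL32 .data values quoted earlier.
record = SECTION_HEADER.pack(b".data", 0x28CA, 0x76000, 0x2400, 0x74C00,
                             0, 0, 0, 0, 0xC0000040)
table = parse_section_table(record, 0, 1)
print(table[0].name, hex(table[0].virtual_address))
```

In a real file, the offset of the table is e_lfanew plus the signature, the file header and the optional header size stored in the file header.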
The file alignment of sections in the executable file can have a significant impact on the resulting file
size. In Visual Studio 6.0, the linker defaulted to a section alignment of 4KB, unless /OPT:NOWIN98 or
the /ALIGN switch was used. The Visual Studio .NET linker, while still defaulting to /OPT:WIN98, determines if the executable is below a certain size and if that is the case uses 0x200-byte alignment.
Another interesting alignment comes from the .NET file specification. It says that .NET executables
should have an in-memory alignment of 8KB, rather than the expected 4KB for x86 binaries. This is to
ensure that .NET executables built with x86 entry point code can still run under IA-64. If the in-memory
section alignment were 4KB, the IA-64 loader wouldn't be able to load the file, since pages are 8KB on
IA-64 Windows.
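The effect of alignment on size is easy to check with a little arithmetic. In this sketch, align_up is my own helper (it assumes the alignment is a power of two), and 0x28CA is simply reused from the KERNEL32 example as a convenient amount of data.

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (size + alignment - 1) & ~(alignment - 1)


data_size = 0x28CA
for alignment in (0x200, 0x1000, 0x2000):  # 512 bytes, 4KB, 8KB
    padded = align_up(data_size, alignment)
    print(hex(alignment), hex(padded), padded - data_size)
```

The padding grows with the alignment, which is why 4KB file alignment produces noticeably larger executables than 0x200-byte alignment when a file has several small sections.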

PE FILES STRUCTURE WALKTHROUGH

One interesting resource that can help you better understand the structure of a PE file is this walkthrough image, which shows a dissected view of a Portable Executable file.


Figure 2.

You can find the author's page at: https://code.google.com/p/corkami/wiki/PE101?show=content.

EXECUTABLE COMPRESSION

Executable compression is any means of compressing an executable file and combining the compressed
data with decompression code into a single executable. When this compressed executable is executed, the decompression code recreates the original code from the compressed code before executing
it. In most cases this happens transparently so the compressed executable can be used in exactly the
same way as the original. Executable compressors are often referred to as "runtime packers", "software
packers", "software protectors" (or even "polymorphic tools" and "obfuscation tools").
A compressed executable can be considered a self-extracting package, where compressed data is
packaged along with the relevant decompression code in an executable file. Some compressed executables can be decompressed to reconstruct the original program file without being directly executed.
Two programs that can be used to do this are CUP386 and UNP.
Most compressed executables decompress the original code in memory and most require slightly more
memory to run (because they need to store the decompressor code, the compressed data and the decompressed code). Moreover, some compressed executables have additional requirements, such as
those that write the decompressed executable to the file system before executing it.
Executable compression is not limited to binary executables, but can also be applied to scripts, such
as JavaScript. Because most scripting languages are designed to work on human-readable code, which
has a high redundancy, compression can be very effective and as simple as replacing long names used
to identify variables and functions with shorter versions and/or removing white-space.
Software distributors use executable compression for a variety of reasons, primarily to reduce the secondary storage requirements of their software; as executable compressors are specifically designed to
compress executable code, they often achieve a better compression ratio than standard data compression facilities such as gzip, zip or bzip2. This allows software distributors to stay within the constraints of
their chosen distribution media (such as CD-ROM, DVD-ROM, or floppy disk), or to reduce the time and
bandwidth customers require to access software distributed via the Internet.

Executable compression is also frequently used to deter reverse engineering or to obfuscate the contents of the executable (for example, to hide the presence of malware from antivirus scanners) by proprietary methods of compression and/or added encryption. Executable compression can be used to prevent direct disassembly, mask string literals and modify signatures. Although this does not eliminate the
chance of reverse engineering, it can make the process more costly.
A compressed executable requires less storage space in the file system, thus less time to transfer data
from the file system into memory. On the other hand, it requires some time to decompress the data before
execution begins. However, the speed of various storage media has not kept up with average processor
speeds, so the storage is very often the bottleneck. Thus the compressed executable will load faster on
most common systems. On modern desktop computers, this is rarely noticeable unless the executable is
unusually big, so loading speed is not a primary reason for or against compressing an executable.
On operating systems which read executable images on demand from the disk (see virtual memory),
compressed executables make this process less efficient. The decompressor stub allocates a block of
memory to hold the decompressed data, which stays allocated as long as the executable stays loaded,
whether it is used or not, competing for memory resources with other applications all along. If the operating system uses a swap file, the decompressed data has to be written to it to free up the memory instead
of simply discarding unused data blocks and reloading them from the executable image if needed again.
This is usually not noticeable, but it becomes a problem when an executable is loaded more than once
at the same time: the operating system cannot reuse data blocks it has already loaded, the data has to
be decompressed into a new memory block, and will be swapped out independently if not used. The additional storage and time requirements mean that it has to be weighed carefully whether to compress
executables which are typically run more than once at the same time.
Another disadvantage is that some utilities can no longer identify run-time library dependencies, as only the statically linked extractor stub is visible.
Also, some older virus scanners simply report all compressed executables as viruses because the
decompressor stubs share some characteristics with those. Most modern virus scanners can unpack
several different executable compression layers to check the actual executable inside, but some popular anti-virus and anti-malware scanners have had troubles with false positive alarms on compressed
executables. In an attempt to solve the problem of malware obfuscated with the help of runtime packers, the IEEE Industry Connections Security Group has introduced a software taggant system.
Executable compression used to be more popular when computers were limited to the storage capacity of floppy disks and small hard drives; it allowed the computer to store more software in the same
amount of space, without the inconvenience of having to manually unpack an archive file every time the
user wanted to use the software. However, executable compression has become less popular because
of increased storage capacity on computers.
Let's take a look at a practical example of binary compression. For this example we will use a binary sample built with Delphi 7 that you can download here: https://dl.dropboxusercontent.com/u/41203508/sample_demo.exe.
I downloaded it to my REMnux Linux machine.

Figure 3. Original sample.exe size


Now we will compress sample_demo.exe using the UPX packer.

UPX achieves an excellent compression ratio and offers very fast decompression. Your executables
suffer no memory overhead or other drawbacks for most of the formats supported, because of in-place
decompression. UPX strengths in a nutshell:
excellent compression ratio: typically compresses better than WinZip/zip/gzip, use UPX to decrease the size of your distribution!
very fast decompression: ~10 MB/sec on an ancient Pentium 133, ~200 MB/sec on an Athlon XP 2000+.
no memory overhead for your compressed executables because of in-place decompression.
safe: you can list, test and unpack your executables. Also, a checksum of both the compressed and uncompressed file is maintained internally.
universal: UPX can pack a number of executable formats.
portable: UPX is written in portable endian-neutral C++.
extendable: because of the class layout it's very easy to add new executable formats or new compression algorithms.
free: UPX is distributed with full source code under the GNU General Public License v2+, with special exceptions granting the free usage for commercial programs as stated in the UPX License Agreement.
UPX ships with REMnux Linux, but if you want the Windows version you can download it here: http://upx.sourceforge.net/download/upx391w.zip.

Figure 3. UPX options

As you can see, you can control the level of compression using the options -1 to -9. The lower the
value, the faster the compression; the higher the value, the better the compression. Let's pack our
sample and see the difference.


Figure 4. Compressing binary using UPX and difference between file sizes

Our sample got a compression ratio of 43.29%! This is a very good result. The new file is just
164k, compared with the 380k of the original file. Another thing the compression process helps with is hiding
hardcoded strings in the binary. When you compile the source code and generate the executable,
some hardcoded strings are visible in clear text when you open the executable
in a hexadecimal editor, for example. Compressing the file modifies its original structure, so those strings
end up modified and scrambled.
Using the Linux program strings we can extract the visible strings inside an executable for reading.
Let's compare the number of printable/visible strings before and after:
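If you want to reproduce the count from a script, the core idea behind strings can be sketched in Python. This is a simplified stand-in, not the real tool: it only looks for runs of at least four printable ASCII characters, and the printable_strings name and the sample bytes are made up for the illustration.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII characters of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Synthetic bytes standing in for the contents of an executable.
blob = b"\x00\x01hello world\x00ab\x00PASSWORD\xff"
found = printable_strings(blob)
print(len(found), found)
```

To compare the original and packed samples, read each file with open(path, "rb").read() and compare the lengths of the two lists.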

Figure 5. Difference in the number of strings in our samples

Since binary compression shows up in almost every malware sample you will come across, it is necessary to know it very well. Download as many packers as you can and try them on sample_demo.exe.
See the difference in terms of file size, antivirus detection rate and so on.
In the next part we will talk about PE file identification. You can also use the tools discussed there to analyze
the sample packed with different packers.
Try to combine some of them. Is it possible? Go ahead and do your lab!
Here is a small cheat sheet of the best-known binary packers.


Name              Latest stable                Software license  x86-64 support
.netshrink        2.6 (April 3, 2014)          Proprietary       Yes
Armadillo         9.62 (June 7, 2013)          Proprietary       Yes
ASPack            2.29 (August 3, 2011)        Proprietary
ASPR (ASProtect)  1.64 (September 1, 2011)     Proprietary
BoxedApp Packer   2.2 (June 16, 2009)          Proprietary       Yes
CExe              1.0b (July 20, 2001)         GPL               No
dotBundle         1.3 (April 4, 2013)          Proprietary       Yes
Enigma Protector  3.80 (August 2, 2012)        Proprietary       Yes
EXE Bundle        3.11 (January 7, 2011)       Proprietary
EXE Stealth       4.14 (June 29, 2011)         Proprietary
eXPressor         1.8.0.1 (January 14, 2010)   Proprietary
FSG               2.0 (Unknown)                Freeware          No
kkrunchy          0.23a4 (Unknown)             BSD               No
MEW               1.1 (Unknown)                Freeware          No
MPRESS            2.19 (January 2, 2012)       Freeware          Yes
Obsidium          1.5 (March 19, 2014)         Proprietary       Yes
PELock            1.0.694 (January 23, 2012)   Proprietary       No
PESpin            1.33 (May 3, 2011)           Freeware          Yes
RLPack Basic      1.21 (October 31, 2008)      GPL               No
Smart Packer Pro  1.9.2 (July 14, 2013)        Proprietary       Yes
Themida           2.2.8.0 (March 18, 2014)     Proprietary       Yes
UPX               3.09 (February 18, 2013)     GPL               experimental
VMProtect         2.1 (September 26, 2011)     Proprietary       Yes
XComp/XPack       0.98 (February 18, 2007)     Freeware          No

PE FILE IDENTIFICATION: FIRST CONTACT ANALYSIS

First, we will make a simple attempt to identify what kind of binary we are dealing with. Is it an executable, a
document or a DLL? We need this knowledge to define the best approach for the next steps of the analysis.
If the sample was obfuscated or encrypted, it will be much harder to obtain important information without
applying some reversing methods to it. Identifying and knowing what we are dealing with lets us
pick a better toolkit and carry out a more satisfying analysis.
For the initial identification of our sample, we will use PEiD. PEiD detects most common packers,
cryptors and compilers for PE files, and it can currently detect more than 470 different signatures. It
seems that the official website (www.peid.info) has been discontinued. Hence, the tool is no longer available from the official website, but it is still hosted on other sites. You can download it from here: http://www.softpedia.com/get/Programming/Packers-Crypters-Protectors/PEiD-updated.shtml.


After you unpack the file, you will have a folder tree similar to the one below:

Figure 6. PEiD directory tree

Now you need to update the signature database in that folder. You can use one of the repositories below:
http://reverse-engineering-scripts.googlecode.com/files/UserDB.TXT
http://research.pandasecurity.com/blogs/images/userdb.txt
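To get a feel for what these signature databases contain, here is a rough Python sketch of how such an entry could be parsed and matched. It assumes the commonly seen userdb.txt layout (a name in square brackets, a "signature" line of hex bytes with ?? wildcards, and an "ep_only" flag). The sample entry and all function names are invented for illustration, it is not PEiD's actual matching code, and it does not handle half-byte wildcards.

```python
SAMPLE_DB = """
[Example Packer 1.0]
signature = 60 BE ?? ?? ?? ?? 8D BE
ep_only = true
"""


def parse_signature(text):
    """'60 BE ??' -> [0x60, 0xBE, None]; None marks a wildcard byte."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def parse_userdb(text):
    """Return a list of (name, signature, ep_only) tuples."""
    entries, name, fields = [], None, {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            name, fields = line[1:-1], {}
        elif name and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            fields[key.lower()] = value
            if "signature" in fields and "ep_only" in fields:
                entries.append((name, parse_signature(fields["signature"]),
                                fields["ep_only"].lower() == "true"))
                name = None
    return entries


def matches_at(data, offset, signature):
    """True if every non-wildcard byte equals the data at this offset."""
    window = data[offset:offset + len(signature)]
    return len(window) == len(signature) and all(
        want is None or want == got for want, got in zip(signature, window))


def identify(data, entry_point_offset, entries):
    """Names of all entries whose signature matches the given bytes."""
    hits = []
    for name, signature, ep_only in entries:
        offsets = [entry_point_offset] if ep_only else range(len(data))
        if any(matches_at(data, off, signature) for off in offsets):
            hits.append(name)
    return hits


entries = parse_userdb(SAMPLE_DB)
code = bytes.fromhex("60be001040008dbe") + b"\x90" * 4
print(identify(code, 0, entries))
```

In a real check, data would be the file contents and entry_point_offset the file offset of the entry point taken from the PE headers.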
The PEiD interface is quite simple: you just load the file to be identified and it shows you the
information about it. Let's see what PEiD can tell us about our sample.

Figure 7. Checking original sample

As you can see, it identifies that our sample was created using Borland Delphi. How is this useful? Module 3 will cover Static Analysis of the sample, in this case disassembly and other techniques, and
knowing what programming language was used will help you choose the right toolkit for the analysis.
Now let's load the sample compressed with UPX.


Figure 8. Checking UPX Packed sample

A very nice point is that, besides telling us that the sample is compressed with UPX, it also says that
a Delphi stub was used. So we didn't need to uncompress the file to go any further. Of course, this kind of
identification is never fully reliable. The best option is to go deeper and unpack the sample. Let's try another packer and see if PEiD generates an accurate report.

Figure 9. Checking PESpin packed sample

Very good, it detected the packing with PESpin. PESpin is a very well-known binary packer/crypter. Unlike UPX, which only compresses the binary, PESpin encrypts the sections, and it is very difficult to recover
the original file without the right tools. However, its stub is well known to antivirus systems, and
it is rare for malware encrypted with PESpin to go completely undetected.
Another very interesting tool that I recommend for PE identification is EXEinfo PE. While PEiD
has been discontinued, EXEinfo PE is still being updated, and it shows much more information in the program interface. You can obtain a free copy at http://exeinfo.atwebpages.com/.


Now let's examine the same samples using EXEinfo PE.

Figure 10. Checking original sample

Figure 11. UPX Packed sample

Figure 12. PESpin packed sample

A nice feature is that EXEinfo PE has a "Lamer Info - Help Hint - Unpack Info" field above the information
about the detected signature. It also gives us some hints on how to unpack the binary and move on with the
analysis. In many cases, following these hints will really help you in the unpacking process.

Malware creators use packing to avoid detection and make analysis harder, but it can also backfire if the developer does not pay attention to the final result. As we will
see, the mere fact that a binary is packed can itself be flagged as malicious activity.
Let's submit the three files to an online multi-engine antivirus platform and see the results. First we will send
the clean file.

Figure 13. Original sample

For our clean file, there are two antivirus detections. This is probably because our sample
uses some common strings for file access and shell execution features, which likely triggered these engines. Now let's check the UPX packed file.

Figure 14. UPX Packed sample

As you can see, the detection signature has changed. In the first case, the antivirus flags the
Delphi package. In the second, it flags the use of the UPX packer. However, we benefited from using
UPX: the number of detections went down from two to one. And for the PESpin packed sample:


Figure 15. PESpin packed sample

Using PESpin triggered far too many antivirus signatures: 15 in total. So it is not a good
idea to use it to stay FUD (Fully UnDetected).
Another tool that can help you figure out some binary features is PeStudio. PeStudio is a tool
that performs the static investigation of any Windows executable file. Executable files analyzed with
PeStudio are never started. Unknown executable files and even malware can be inspected with no risk.
PeStudio runs on any Windows platform and is fully portable; no installation is required. PeStudio has
a zero footprint: it does not change the system it is running on, nor does it leave anything behind.
Among very famous security tools, PeStudio has proudly obtained Rank 4 on the Best 2013 Security
Tools. You can get a free copy at http://www.winitor.com.

Figure 16. PeStudio



As the image says, just drop the file you want to analyze into the program window and it will be analyzed automatically.

Figure 17. Sample analyzed on PeStudio

The most important thing you will notice using PeStudio is its ability to identify blacklisted
strings in the sample. What does this mean? In most cases, malware creators use
hardcoded strings such as USER, PORT, SERVER and PASSWORD, so identifying these items inside
your sample will help you move on quickly to the next steps that we will cover in Module 3.
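The same idea can be tried by hand on a list of extracted strings. The sketch below is only an approximation of the concept, not PeStudio's real blacklist or matching logic: the keyword list is just the four words mentioned above, and flag_strings is a name made up for the example.

```python
BLACKLIST = ("USER", "PORT", "SERVER", "PASSWORD")  # keywords named in the text


def flag_strings(strings, blacklist=BLACKLIST):
    """Group the strings that contain a blacklisted keyword (case-insensitive)."""
    hits = {}
    for text in strings:
        upper = text.upper()
        for word in blacklist:
            if word in upper:
                hits.setdefault(word, []).append(text)
    return hits


# Made-up strings standing in for the output of a strings-like tool.
sample = ["smtp_server=mail.example.org", "hello", "Password: %s"]
flagged = flag_strings(sample)
print(flagged)
```

Plain substring matching like this produces false positives (for example, "IMPORT" contains "PORT"), so treat the hits as leads to review rather than as verdicts.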
PeStudio will also search for your sample on http://www.virustotal.com; the last analysis will be displayed, so you can see whether someone else submitted it before you came across the executable.
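Lookups of this kind are normally keyed by a hash of the file, so it is handy to be able to compute one yourself and paste it into the site's search box. The sketch below uses SHA-256 from Python's standard hashlib module; that PeStudio uses this particular hash is my assumption, and sha256_of_file is a name chosen for the example.

```python
import hashlib
import sys


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large samples do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage: python hash_sample.py sample_demo.exe [more files...]
    for sample_path in sys.argv[1:]:
        print(sha256_of_file(sample_path), sample_path)
```

Hashing the sample never executes it, so this is safe to run on live malware.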

CREATING OUR OWN REPOSITORY

Since in the next modules you will need some real malware to practice on, you may ask me, "Whoa, where
will I get these little monsters?" Catching malware samples in the wild can be very rough work.
However, I will help you build your own malware zoo. There are many malware repositories on
the internet, and to automate gathering the samples and downloading them to our lab we will use Maltrieve.
Maltrieve originated as a fork of mwcrawler. It retrieves malware directly from the sources as listed at
a number of sites, including:
Malc0de
Malware Black List
Malware Domain List
VX Vault
URLquery
CleanMX

These lists will be implemented if/when they return to activity:
NovCon Minotaur


Other improvements include:
Proxy support
Multithreading for improved performance
Logging of source URLs
Multiple user agent support
Better error handling
VxCage and Cuckoo Sandbox support

Since in Module 4 (Dynamic Analysis) we will use Cuckoo and VxCage, this tool will fit like a glove!
REMnux Linux already has Maltrieve installed, but since the virtual machine's disk space could become
a problem, I installed a new Linux machine just to act as a malware repository. If you are able to do the same, I recommend it; if you can't, don't hesitate to start with the default REMnux
disk space.

Figure 18. Directory tree of maltrieve

Once you have downloaded and extracted the files into the /opt directory, you have to configure the maltrieve.cfg
file with the correct parameters. I strongly recommend creating a new folder in the root of the system and
redirecting the downloaded malware there. In my lab I created the folder /malware.

Figure 19. maltrieve.cfg

Once you have set up the configuration, you just need to execute the maltrieve.py file with -h to see the valid options:


Figure 20. Usage syntax

Since the process will take some time and bandwidth, you can run it in the background by appending
& to the end of the command. You can follow the progress of the process in the maltrieve.log file.

Figure 21. Tail on log entries showing the activity

After a few hours, you can check your output directory and you will be amazed by the number of files
downloaded. Be careful: all these files are real, live malware, including botnets, APTs, exploits and much more.

Figure 22. Our brand new zoo, full of malware for analysis!

You will use this repository throughout the course. Every example you read in the course can
be tried with another type of malware, and you will gain practice this way. Update your zoo every week and you will always have plenty of new material to play with.


MODULE 3
In this module we will learn about the dynamic analysis process of malware. We will get to know the main tools used in this process and their possible limitations.

What you will learn:

How to handle the most used toolkit for malware dynamic analysis.

What you should know:

Basic knowledge of Windows systems, basic knowledge of Linux systems.

Dynamic analysis consists of observing the behavior of the application at runtime. Unlike static analysis, where we examine the sample without executing it, in dynamic analysis we set up an environment that simulates the real world, applying some restrictions and using software that can track everything that happens inside this little jail. We try to see the sample executing and acting as it would in real life, so we can learn from its behavior: which files it modifies, creates or deletes, whether it contacts external hosts, and whether it tries to send or download data. From the results of this observation we can generate the best mitigation plan.
Dynamic analysis is very good but has a few limitations. Some malware authors add protections to their executables to detect whether the malware is running on a real machine or inside a sandbox or virtual machine specially crafted to analyze it. Keep this in mind: if the sample simply exits early, some such protection is probably in place.
In the previous chapters I gave you an initial recipe for creating your very first malware laboratory. Since I presume that you have already done so, it's now time to practice with this toolkit and start analyzing some malware samples.

INTRODUCTION

For the next videos, where we'll navigate through the set of tools, you will need to download the first demo sample.
You can download it from here: http://tinyurl.com/lfgov78.

CAPTUREBAT

CaptureBAT is a free Windows tool for capturing local information about the system's processing. It allows you to observe process, registry, file system and network-level activities on the host. This tool is useful for system troubleshooting as well as behavioral malware analysis.
Just in case, you can obtain it from here: http://tinyurl.com/lkvvc34.
Keep in mind that CaptureBAT will not analyze the sample itself; it will generate a report of all the activity that takes place between the start and the end of the malware's execution.

REGSHOT

Regshot is an open-source (LGPL) registry compare utility that allows you to quickly take a snapshot of your registry and then compare it with a second one taken after making system changes or installing a new software product.
Download Link: http://tinyurl.com/lkvvc34.
It's very important to remember to reset the virtual machine to a clean snapshot every time you use this tool. This avoids confusion caused by registry hives that are already infected.
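The idea behind a snapshot comparison can be sketched in a few lines. This is only an illustration of the before/after technique, not Regshot's own code; the registry keys and values are made up, and the function name is hypothetical.

```python
def diff_snapshots(before, after):
    """Compare two {key: value} snapshots and report what changed."""
    added = {k: after[k] for k in after if k not in before}
    deleted = {k: before[k] for k in before if k not in after}
    modified = {
        k: (before[k], after[k])
        for k in before
        if k in after and before[k] != after[k]
    }
    return added, deleted, modified


# Hypothetical registry values captured before and after running a sample.
before = {
    r"HKLM\Software\Example\Version": "1.0",
    r"HKCU\Software\Example\Theme": "light",
}
after = {
    r"HKLM\Software\Example\Version": "1.0",
    r"HKCU\Software\Example\Theme": "dark",
    r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\updater": r"C:\Temp\updater.exe",
}

added, deleted, modified = diff_snapshots(before, after)
print("added:", added)
print("deleted:", deleted)
print("modified:", modified)
```

A new value under a Run key, as in this made-up data, is the kind of change worth writing down in your notes, since it usually points to a persistence mechanism.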

PROCESS MONITOR & PROCESS EXPLORER

Process Monitor is a free tool from Windows Sysinternals, part of the Microsoft TechNet website. The tool monitors and displays in real time all file system activity on a Microsoft Windows operating system.
Download Link: http://tinyurl.com/2okfn9.
Process Explorer is a freeware task manager and system monitor for Microsoft Windows created by Sysinternals, which has been acquired by Microsoft. It provides the functionality of Windows Task Manager along with a rich set of features for collecting information about processes running on the user's system.
Download Link: http://tinyurl.com/ys2zq2.
When using ProcMon, you will see that processes within the Windows system generate a lot of noise in the capture. I strongly recommend that you take some time to create filters that limit the information displayed by the application. Remember to save these filter modifications in the production version of your virtual machine snapshot.
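The same kind of filtering can also be applied afterwards to a capture saved as CSV. The sketch below is an assumption-laden illustration: the column names ("Process Name", "Operation", "Path") and the sample rows are assumed, so check them against the header of your own export.

```python
import csv
import io

# Hypothetical excerpt of a Process Monitor CSV export (assumed column names).
SAMPLE_CSV = r'''"Time of Day","Process Name","PID","Operation","Path","Result"
"10:00:01","svchost.exe","800","RegOpenKey","HKLM\System\Example","SUCCESS"
"10:00:02","sample.exe","1234","WriteFile","C:\Temp\dropped.dll","SUCCESS"
"10:00:03","explorer.exe","1500","ReadFile","C:\Windows\explorer.exe","SUCCESS"
"10:00:04","sample.exe","1234","RegSetValue","HKCU\Software\Example\Run","SUCCESS"
'''


def filter_events(csv_text, process_name, operations=None):
    """Keep only rows for one process, optionally limited to some operations."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in rows
        if row["Process Name"].lower() == process_name.lower()
        and (operations is None or row["Operation"] in operations)
    ]


for event in filter_events(SAMPLE_CSV, "sample.exe"):
    print(event["Operation"], event["Path"])
```

Filtering by process name first and by operation second mirrors the way you would narrow down the live view: start with the sample's own process, then look only at writes.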

FAKENET

FakeNet is a tool that aids in the dynamic analysis of malicious software. The tool simulates a network so that malware interacting with a remote host continues to run, allowing the analyst to observe the malware's network activity from within a safe environment.
Download Link: https://sourceforge.net/projects/fakenet/.

REAL MALWARE SAMPLE ANALYSIS

Now that we are familiar with all the tools in our virtual machine, it's time to start the first analysis of a real malware sample.
For this practice exercise you can download the sample from here: http://tinyurl.com/mlcx7cg.
Remember! Even though this sample is old and no longer really active, it can still harm your system, so watch out and don't unpack it outside your virtual machine.
The password to uncompress the file is malware. Quite obvious, hehe.
Follow me into this analysis!

EXERCISE PROPOSAL #1

Consider this scenario: a user in your organization reports that her desktop is misbehaving. Your initial assessment reveals an unfamiliar program named hanuman.exe running on the user's workstation. Antivirus tools don't recognize this program as malware, and you cannot locate any relevant information about it on the Web. You grab a copy of hanuman.exe, roll up your sleeves, and begin the dynamic malware analysis process.
Grab your copy over here: http://tinyurl.com/psyhyrn.
In the next video you will find the resolution of the analysis. Please do not watch it before you have finished the analysis on your own!

RESOLUTION: EXERCISE #1 #2, #3 AND #4

For this second exercise proposal, consider the same scenario, but now you will need to analyze some different samples. The complexity of these new samples is higher, since the first exercise was just a warm-up.
Exercise 2 sample: http://tinyurl.com/nryknn5
Exercise 3 sample: http://tinyurl.com/lodb7th
Exercise 4 sample: http://tinyurl.com/mdlsmsk
Tips for the resolution: keep notes on each discovery you make for each sample. Writing them down will help you go further in the investigation.
Reset your snapshot if you think you need to restart the analysis. Don't worry: in the beginning the amount of information generated will be tough to keep track of, but with practice it will become easier to locate the information you are looking for.
The video with the solutions to these 3 exercises will be in module 4!
Now that you are getting more practice with the tools and the overall concepts, I recommend that you grab as many malware samples as you can from your malware zoo and try to analyze them.
If you want to share with me the result of a specific sample that you are trying to analyze, feel free to contact me by e-mail! Just put Malware Analysis Starterkit in the subject.
In module 4 we will move on to code analysis using static malware analysis tools. We will deal not only with executables (PE files) but also with Java applet malware (JAR) and JavaScript files. In the last session of module 4 I will also present some frameworks that automate malware analysis (both dynamic and static); they are very interesting and save a lot of time when it comes to analyzing a huge number of samples.


MODULE 4
In this module you will get to know more analysis tools for different targets such as JavaScript artifacts, Android malicious programs and infected Office files. We will also deal with some automated analysis frameworks.

What you will learn:

How to use automated malware analysis frameworks like Cuckoo and ZeroWine Tryouts, basic concepts of Android malware, and different methods of infection using JavaScript, MS Office files and many more.

What you should know:

Basic knowledge of networking and GNU/Linux commands;
Knowledge of network protocols;

The main goal of this module is to show you some automated frameworks that will make your life easier (and faster). You have learned how to use the tools in your Windows XP sandbox and how to carry out the dynamic analysis of malware samples on your own.
Now you have to know how to do this quickly. Using these frameworks you can create scripts to generate a huge amount of malware analysis data from your malware repository.
First we will use Cuckoo Sandbox and then Zerowine Tryouts.

AUTOMATED ANALYSIS: CUCKOO SANDBOX

Cuckoo is an open source automated malware analysis system.
It's used to automatically run and analyze files and collect comprehensive analysis results that outline what the malware does while running inside an isolated Windows operating system.
It can retrieve the following types of results:
Traces of win32 API calls performed by all processes spawned by the malware.
Files created, deleted or downloaded by the malware during its execution.
Memory dumps of the malware processes.
Network traffic trace in PCAP format.
Screenshots of the Windows desktop taken during the execution of the malware.
Full memory dumps of the machines.

USE CASES

Cuckoo is designed to be used both as a standalone application and as a component integrated into larger frameworks, thanks to its extremely modular design.
It can be used to analyze:

Generic Windows executables
DLL files
PDF documents
Microsoft Office documents
URLs and HTML files
PHP scripts
CPL files
Visual Basic (VB) scripts
ZIP files
Java JAR
Python files
Almost anything else

Thanks to its modularity and powerful scripting capabilities, there's no limit to what you can achieve with Cuckoo.

ARCHITECTURE

Cuckoo Sandbox consists of central management software which handles sample execution and analysis.
Each analysis is launched in a fresh and isolated virtual machine. Cuckoo's infrastructure is composed of a Host machine (the management software) and a number of Guest machines (virtual machines for analysis).
The Host runs the core component of the sandbox that manages the whole analysis process, while the Guests are the isolated environments where the malware samples are actually executed and analyzed safely.
The following picture explains Cuckoo's main architecture:


Although the recommended setup is GNU/Linux (Ubuntu preferably) as host and Windows XP Service Pack 3 as guest, Cuckoo has proved to work smoothly also on Mac OS X as host and Windows Vista and Windows 7 as guests.
Cuckoo can be downloaded from the official website, where the stable and packaged releases are distributed, or can be cloned from the official git repository.

REQUIREMENTS

Before proceeding to configure Cuckoo, you'll need to install some required software and libraries.

INSTALLING PYTHON LIBRARIES

Cuckoo host components are completely written in Python, therefore make sure to have an appropriate version installed. For the current release Python 2.7 is preferred.
Install Python on Ubuntu:
$ sudo apt-get install python

If you want to use the Django-based web interface, you'll have to install MongoDB too:
$ sudo apt-get install mongodb

In order to properly function, Cuckoo requires SQLAlchemy and Python BSON to be installed.
Install with apt-get:
$ sudo apt-get install python-sqlalchemy python-bson

Install with pip:
$ sudo pip install sqlalchemy bson

There are other optional dependencies that are mostly used by modules and utilities. The following libraries are not strictly required, but their installation is recommended:

Dpkt (Highly Recommended): for extracting relevant information from PCAP files.
Jinja2 (Highly Recommended): for rendering the HTML reports and the web interface.
Magic (Optional): for identifying file formats (otherwise the file command line utility is used).
Pydeep (Optional): for calculating the ssdeep fuzzy hash of files.
Pymongo (Optional): for storing the results in a MongoDB database.
Yara and Yara Python (Optional): for matching Yara signatures (use release 1.7.2 or above or the svn version).
Libvirt (Optional): for using the KVM machine manager.
Bottlepy (Optional): for using the api.py or web.py utility (use release 0.10 or above).
Django (Optional): for using the web interface (use release 1.5 or above).
Pefile (Optional): used for static analysis of PE32 binaries.
Volatility (Optional): used for forensic analysis on memory.
MAEC Python bindings (Optional): used for MAEC reporting (use a release >=4.0, but <4.1).
Chardet (Optional): used for detecting string encoding.

Some of them are already packaged in Debian/Ubuntu and can be installed with the following command:
$ sudo apt-get install python-dpkt python-jinja2 python-magic python-pymongo python-gridfs python-libvirt python-bottle python-pefile python-chardet

Except for python-magic, python-dpkt and python-libvirt, the others can be installed through pip too:
$ sudo pip install jinja2 pymongo bottle pefile cybox==2.0.1.4 maec==4.0.1.0 django chardet
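After installing, it can be useful to check which of the optional libraries the interpreter can actually see. The following is a small sketch, not part of Cuckoo; the import names in the list are assumptions based on the usual package names, and the script should be run with the same Python interpreter that will run Cuckoo.

```python
# Import names are assumptions based on each project's usual Python package name.
OPTIONAL_MODULES = [
    "dpkt", "jinja2", "magic", "pydeep", "pymongo", "yara",
    "libvirt", "bottle", "django", "pefile", "chardet",
]


def check_modules(names):
    """Return a {module name: True/False} map of what can be imported."""
    status = {}
    for name in names:
        try:
            __import__(name)
            status[name] = True
        except Exception:
            # Any failure to import is treated as "not usable".
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in sorted(check_modules(OPTIONAL_MODULES).items()):
        print("%-10s %s" % (name, "ok" if found else "MISSING"))
```

A missing optional module is not an error by itself; it only means the corresponding feature in the list above will be unavailable.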


Yara and Pydeep will have to be installed manually, so please refer to their websites.
If you want to use KVM, it's packaged too and you can install it with the following command:
$ sudo apt-get install qemu-kvm libvirt-bin ubuntu-vm-builder bridge-utils

VIRTUALIZATION SOFTWARE

Despite heavily relying on VirtualBox in the past, Cuckoo has moved on to being architecturally independent from the virtualization software. As you will see throughout this documentation, you'll be able to define and write modules to support any software of your choice.
For the sake of this guide we will assume that you have VirtualBox installed (which is still the default option), but this does not affect in any way the execution and general configuration of the sandbox.
You are completely responsible for the choice, configuration and execution of your virtualization software, therefore please refrain from asking for help on it in the Cuckoo channels and lists: refer to the software's official documentation and support.
Assuming you decide to go for VirtualBox, you can get the proper package for your distribution at the official download page. The installation of VirtualBox is outside the scope of this documentation; if you are not familiar with it, please refer to the official documentation.

INSTALLING TCPDUMP

In order to dump the network activity performed by the malware during execution, you'll need a network sniffer properly configured to capture the traffic and dump it to a file.
By default Cuckoo adopts tcpdump, the prominent open source solution.
Install it on Ubuntu:
$ sudo apt-get install tcpdump

Tcpdump requires root privileges, but since you don't want Cuckoo to run as root you'll have to set specific Linux capabilities on the binary:
$ sudo setcap cap_net_raw,cap_net_admin=eip /usr/sbin/tcpdump

You can verify the results of the last command with:
$ getcap /usr/sbin/tcpdump
/usr/sbin/tcpdump = cap_net_admin,cap_net_raw+eip
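If you script your lab setup, the getcap check can be automated. The sketch below is a hypothetical helper that only inspects one line of getcap output as text: it looks for the two capability names and does not validate the flag set (eip) that follows them.

```python
import re

REQUIRED_CAPS = {"cap_net_raw", "cap_net_admin"}


def has_required_caps(getcap_line, required=REQUIRED_CAPS):
    """Check one line of getcap output for the capabilities tcpdump needs."""
    granted = set(re.findall(r"cap_[a-z_]+", getcap_line))
    return required.issubset(granted)


print(has_required_caps("/usr/sbin/tcpdump = cap_net_admin,cap_net_raw+eip"))  # True
```

An empty line, which is what getcap prints when no capability is set, makes the function return False.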

If you don't have setcap installed you can get it with:
$ sudo apt-get install libcap2-bin

Or otherwise (not recommended) do:


$ sudo chmod +s /usr/sbin/tcpdump

INSTALLING VOLATILITY

Volatility is an optional tool to do forensic analysis on memory dumps. In combination with Cuckoo, it can automatically provide additional visibility into deep modifications in the operating system as well as detect the presence of rootkit technology that escaped the monitoring domain of Cuckoo's analyzer.
In order to function properly, Cuckoo requires at least version 2.3 of Volatility. You can get it from the official repository.
See the Volatility documentation for detailed instructions on how to install it.


INSTALLING CUCKOO

You can either run Cuckoo as your own user or create a new one dedicated just to your sandbox setup.
Make sure that the user that runs Cuckoo is the same user that you will use to create and run the virtual machines, otherwise Cuckoo won't be able to identify and launch them.
Create a new user:
$ sudo adduser cuckoo

If you're using VirtualBox, make sure the new user belongs to the vboxusers group (or the group you used to run VirtualBox). The -a flag appends the group instead of replacing the user's existing supplementary groups:
$ sudo usermod -a -G vboxusers cuckoo

If you're using KVM or any other libvirt based module, make sure the new user belongs to the libvirtd group (or the group your Linux distribution uses to run libvirt):
$ sudo usermod -a -G libvirtd cuckoo

INSTALL CUCKOO

Extract or check out your copy of Cuckoo to a path of your choice and you're ready to go ;-).

CONFIGURATION

Cuckoo relies on six main configuration files:

cuckoo.conf: for configuring general behavior and analysis options.
auxiliary.conf: for enabling and configuring auxiliary modules.
<machinery>.conf: for defining the options for your virtualization software (the file has the same name as the machinery module you choose in cuckoo.conf).
memory.conf: Volatility configuration.
processing.conf: for enabling and configuring processing modules.
reporting.conf: for enabling or disabling report formats.

To get Cuckoo working you have to edit at least auxiliary.conf, cuckoo.conf and <machinery>.conf.
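These files use the INI format, so the relationship between cuckoo.conf and <machinery>.conf can be illustrated with Python's configparser. This is only a sketch: the [cuckoo] section and the machinery option used in the sample text are assumptions, so compare them with the cuckoo.conf shipped in your copy.

```python
import configparser

# Illustrative cuckoo.conf fragment; the section and option names are assumptions.
SAMPLE_CUCKOO_CONF = """
[cuckoo]
machinery = virtualbox
"""


def machinery_conf_name(conf_text):
    """Return the file name of the machinery config that cuckoo.conf selects."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.get("cuckoo", "machinery") + ".conf"


print(machinery_conf_name(SAMPLE_CUCKOO_CONF))  # prints virtualbox.conf
```

With KVM selected instead, the same lookup would point you to the corresponding machinery file rather than virtualbox.conf.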

PREPARING GUEST REQUIREMENTS

In order to make Cuckoo run properly in your virtualized Windows system, you will have to install some
required software and libraries.

INSTALL PYTHON

Python is a strict requirement for the Cuckoo guest component (analyzer) in order to run properly.
You can download the proper Windows installer from the official website. Also in this case Python 2.7 is preferred.
Some Python libraries are optional and provide additional features to the Cuckoo guest component. They include:
Python Image Library: it's used for taking screenshots of the Windows desktop during the analysis.
They are not strictly required by Cuckoo to work properly, but you are encouraged to install them if you want access to all available features. Make sure to download and install the proper packages according to your Python version.

ADDITIONAL SOFTWARE

At this point you should have installed everything needed by Cuckoo to run properly.
Depending on what kind of files you want to analyze and what kind of sandboxed Windows environment you want to run the malware samples in, you might want to install additional software such as browsers, PDF readers, office suites etc. Remember to disable the auto update or check for updates feature of any additional software.
From release 0.4 Cuckoo adopts a custom agent that runs inside the Guest and handles the communication and the exchange of data with the Host. This agent is designed to be cross-platform, therefore you should be able to use it on Windows as well as on Linux and OS X. In order to make Cuckoo work properly, you'll have to install and start this agent.
It's very simple.
In the agent/ directory you will find an agent.py file. Just copy it to the Guest operating system (in whatever way you want, perhaps a temporary shared folder or by downloading it from a Host webserver) and run it. This will launch the XMLRPC server, which will listen for connections.
On Windows, simply launching the script will also spawn a Python window. If you want to hide it you can rename the file from agent.py to agent.pyw, which will prevent the window from spawning.
If you want the script to be launched at Windows boot, just place the file in the Startup folder.
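Before taking the snapshot, you may want to confirm from the Host that the agent is really listening. The sketch below only tests whether a TCP port accepts connections. The guest address and the port number in it are assumptions: use the address of your own guest and the port your copy of agent.py binds to.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Both values are assumptions: set your guest's IP and the agent's port.
    guest_ip, agent_port = "192.168.56.101", 8000
    print("agent reachable:", port_is_open(guest_ip, agent_port, timeout=1.0))
```

A False result can also mean that the guest firewall is blocking the connection, so check that before assuming the agent failed to start.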

SAVING THE VIRTUAL MACHINE

Now you should be ready to save the virtual machine to a snapshot state.
Before doing this, make sure you rebooted it softly and that it's currently running, with Cuckoo's agent running and with Windows fully booted.
Now you can proceed with saving the machine. The way to do it obviously depends on the virtualization software you decided to use.
If you follow all the steps below properly, your virtual machine should be ready to be used by Cuckoo.

VIRTUALBOX

If you are going for VirtualBox you can take the snapshot from the graphical user interface or from the
command line:
$ VBoxManage snapshot "<Name of VM>" take "<Name of snapshot>" --pause

After the snapshot creation is completed, you can power off the machine and restore it:
$ VBoxManage controlvm "<Name of VM>" poweroff
$ VBoxManage snapshot "<Name of VM>" restorecurrent
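If you prepare several guests, the three VBoxManage calls above can be wrapped in a small script. This is a sketch under assumptions: the VM and snapshot names are hypothetical, and by default the commands are only printed (dry run), so nothing is executed until you switch that off.

```python
import subprocess


def snapshot_commands(vm_name, snapshot_name):
    """Build the three VBoxManage invocations described above as argv lists."""
    return [
        ["VBoxManage", "snapshot", vm_name, "take", snapshot_name, "--pause"],
        ["VBoxManage", "controlvm", vm_name, "poweroff"],
        ["VBoxManage", "snapshot", vm_name, "restorecurrent"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.check_call(argv)


# "cuckoo1" and "clean" are hypothetical VM and snapshot names.
run_all(snapshot_commands("cuckoo1", "clean"))
```

Passing the arguments as a list avoids quoting problems when a VM name contains spaces.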

KVM

If you decided to adopt KVM, you must first of all be sure to use a disk format for your virtual machines which supports snapshots. By default libvirt tools create RAW virtual disks, and since we need snapshots you'll either have to use QCOW2 or LVM. For the scope of this guide we adopt QCOW2, which is easier to set up than LVM.
The easiest way to create such a virtual disk correctly is using the tools provided by the libvirt suite. You can either use virsh if you prefer command-line interfaces or virt-manager for a nice GUI. You should be able to directly create it in QCOW2 format, but in case you have a RAW disk you can convert it like this:
$ cd /your/disk/image/path
$ qemu-img convert -O qcow2 your_disk.raw your_disk.qcow2

Now you have to edit your VM definition as follows:
$ virsh edit "<Name of VM>"

Find the disk section, it looks like this:
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/your/disk/image/path/your_disk.raw'/>
  <target dev='hda' bus='ide'/>
  <address type='drive' controller='0' bus='0' unit='0'/>
</disk>

And change type to qcow2 and source file to your qcow2 disk image, like this:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/your/disk/image/path/your_disk.qcow2'/>
  <target dev='hda' bus='ide'/>
  <address type='drive' controller='0' bus='0' unit='0'/>
</disk>
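The same two attribute changes can be made programmatically, which is handy when you convert several guests. The sketch below works on a copy of the disk section as text and uses Python's standard XML library; it is an illustration only and does not talk to libvirt, so the edited XML still has to be applied with virsh edit.

```python
import xml.etree.ElementTree as ET

# Disk section as described above, pointing to a RAW image.
DISK_XML = """
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/your/disk/image/path/your_disk.raw'/>
  <target dev='hda' bus='ide'/>
</disk>
"""


def switch_to_qcow2(disk_xml, qcow2_path):
    """Return the disk XML with the driver type and source file updated."""
    disk = ET.fromstring(disk_xml.strip())
    disk.find("driver").set("type", "qcow2")
    disk.find("source").set("file", qcow2_path)
    return ET.tostring(disk, encoding="unicode")


print(switch_to_qcow2(DISK_XML, "/your/disk/image/path/your_disk.qcow2"))
```

Only the driver type and the source file change; the target and address elements are left exactly as they were.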

Now test your virtual machine. If everything works, prepare it for snapshotting while running Cuckoo's agent. This means the virtual machine needs to be running while you are taking the snapshot. Then you can shut it down. You can finally take a snapshot with the following command:
$ virsh snapshot-create "<Name of VM>"

Having multiple snapshots can cause errors:
ERROR: No snapshot found for virtual machine VM-Name

VM snapshots can be managed using the following commands:
$ virsh snapshot-list "VM-Name"
$ virsh snapshot-delete "VM-Name" 1234567890

VMWARE WORKSTATION

If you decided to adopt VMware Workstation, you can take the snapshot from the graphical user interface or from the command line:
$ vmrun snapshot /your/disk/image/path/wmware_image_name.vmx your_snapshot_name

Where your_snapshot_name is the name you choose for the snapshot. After that power off the machine
from the GUI or from the command line:
$ vmrun stop /your/disk/image/path/wmware_image_name.vmx hard

In case you planned to use more than one virtual machine, there's no need to repeat all the steps done so far: you can clone it. In this way you'll have a copy of the original virtualized Windows with all requirements already installed.
Now, if you have followed all the instructions correctly, we can go ahead and start our Cuckoo instance.
The main features and the usage are shown in the video. Let's go see it working!
You can download phpVirtualBox from here: http://sourceforge.net/projects/phpvirtualbox/.
The full Cuckoo Sandbox documentation is available here: http://docs.cuckoosandbox.org/en/latest/.
The next automated analysis framework that we will work with is Zerowine Tryouts. Like Cuckoo, Zerowine Tryouts is an open-source malware analysis framework whose main goal is to let you just upload the malicious sample (either a Windows executable or a PDF file) and wait until the job is done.
You can download your copy over here: http://downloads.sourceforge.net/zerowine-tryout/zerowinetryout-alpha5-image.7z.

JAVASCRIPT MALWARE ANALYSIS

JavaScript gives web page authors the ability to run any code they want when your browser visits or is steered to their page. Although the various JavaScript implementations have some security functions to try to keep JS code from doing anything overtly hostile to your computer, two problems emerge. The first is that this code has bugs, which allow for attack or exploitation of user browsers.
Current JavaScript-related attack techniques that are quite effective use, for example, hidden iframes to load JS malware from other compromised sites, which then tries to execute in the browser. This is seen in advertisements included in big popular sites as well as in less well-trafficked ones. If successful, the malware may then continue on to exploit local system software. In this manner the various versions of the Black Hole Exploit Kit attack vulnerable versions of PDF and Flash software to infect the host machine with botnet clients.
The second problem is that it has been difficult for browser and system makers to make their legitimate messages hard to counterfeit. Windows User Account Control is one of the best techniques because of how it interrupts every other program when it needs privileges to complete a task. Most browser and software pop-up messages are easily faked and you should be wary of them.
Since JavaScript must be downloaded to run on the client, its source is easily accessible. The code can be captured either during transport, from within the browser, or on disk from the cache. For this and other reasons, JavaScript malware writers must resort to all sorts of dirty tricks to hide their real malicious functions.
Obfuscation not only discourages casual reverse-engineering of the exploit used and its inner workings, it also makes it more difficult for internet security tools and virus scanners to correctly identify the malware and prevent it from running. If your code appears to be constructing a very large string with hex-encoded data (i.e., attempting a buffer overflow condition with shellcode to execute arbitrary commands), then you're due to get flagged. If on the other hand you have some innocent-looking strings compressed or encrypted so as not to reveal their evil nature at first glance, your dirty work may in fact fly under the radar, undetected.

JSDETOX MALWARE JAVASCRIPT ANALYSIS FRAMEWORK

JSDetox is a tool to support the manual analysis of malicious JavaScript code.
While it does use the browser as its user interface, the whole analysis/execution of the JavaScript code is done in the backend. As with any tool that handles malicious, unknown code, you should consider installing JSDetox into an isolated environment. It is quite easy to install on most Linux distributions, so it should be easy to set up JSDetox inside a virtual machine.

STATIC ANALYSIS / DEOBFUSCATION
JSDetox does not only reformat/beautify code but is able to analyze it and precompute static code.
A simple example:
ORIGINAL CODE
var x = 10 * 3 + 100 - 70 / 10;

ANALYSIS RESULT
var x = 123;
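Precomputing static code of this kind is usually called constant folding. The following sketch shows the idea on a plain arithmetic expression; it is not JSDetox's implementation, it uses Python's own expression parser as a stand-in for a JavaScript parser, and it supports only the four basic operators.

```python
import ast
import operator

# Only plain arithmetic is supported in this sketch.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def fold(node):
    """Recursively evaluate a tree made of numbers and + - * / operators."""
    if isinstance(node, ast.Expression):
        return fold(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        return OPERATORS[type(node.op)](fold(node.left), fold(node.right))
    raise ValueError("not a constant arithmetic expression")


def precompute(expression):
    """Fold a constant arithmetic expression into a single value."""
    return fold(ast.parse(expression, mode="eval"))


print(precompute("10 * 3 + 100 - 70 / 10"))  # 123.0 (Python's / yields a float)
```

Anything that is not a number or one of the four operators raises an error, which is the safe default when the input comes from untrusted code.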

HTML DOM EMULATION


Besides the usual obfuscation techniques, the latest JavaScript malware makes use of objects and functions that are only available in browsers, e.g. the document object.
JSDetox emulates parts of a browser, especially the document object (you can even import an HTML document that will be used for the emulation).


This feature makes it possible to handle code like this:
document.write('<div id="AU4Ae">212</div>');
var OoF2wUnZ = parseInt(document.getElementById("AU4Ae").innerHTML);
if(OoF2wUnZ == 212) {
...

DATA ANALYSIS
JSDetox can be used to analyze shellcode embedded in Javascript malware. Most shellcode is stored
in unicode sequences like this:
%u4141%u4141%u8366%ufce4%uebfc ...

The data analysis part of JSDetox can be used to parse strings like these and extract the shellcode. The obtained shellcode can be viewed as a classic hexdump or as disassembled code.
Many shellcodes contain data (in most cases a URL to download the real malware) that is encrypted with a small XOR loop; the analysis function scans for these and shows possible matches.
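Both steps can be sketched in a few lines of Python. This is a simplified illustration, not JSDetox's code: it assumes that each %uXXXX unit is stored low byte first (little-endian), and it only tries single-byte XOR keys, looking for the marker http.

```python
import re


def decode_percent_u(text):
    """Turn %uXXXX sequences into bytes, low byte first (little-endian)."""
    out = bytearray()
    for word in re.findall(r"%u([0-9a-fA-F]{4})", text):
        value = int(word, 16)
        out.append(value & 0xFF)         # low byte
        out.append((value >> 8) & 0xFF)  # high byte
    return bytes(out)


def xor_scan(data, marker=b"http"):
    """Try every single-byte XOR key and report those that reveal the marker."""
    hits = []
    for key in range(1, 256):
        decoded = bytes(b ^ key for b in data)
        if marker in decoded:
            hits.append((key, decoded))
    return hits


shellcode = decode_percent_u("%u4141%u4141%u8366%ufce4%uebfc")
print(shellcode.hex())
```

Running xor_scan on the decoded bytes returns every key that exposes the marker, so a real sample may produce a few false candidates that you have to review by eye.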
Let's take a closer look at JSDetox usage. You can find some screencasts on the JSDetox website, including a Black Hole exploit kit screencast.
The JSDetox website is http://relentless-coding.org/projects/jsdetox.

MALZILLA
Web pages that contain exploits often use a series of redirects and obfuscated code to make them more difficult for somebody to follow. MalZilla is a useful program for exploring malicious pages. It allows you to choose your own user agent and referrer, and has the ability to use proxies. It shows you the full source of webpages and all the HTTP headers. It also gives you various decoders to try to deobfuscate JavaScript.
This tutorial is part of the Malzilla website and covers the analysis of malicious JavaScript.
Let's take a look at the following picture:

This is the content from a link that I've received in an eCard spam mail.


To get the direct link to the malware, we have to deal just with the unescape JavaScript function.
That's not a big deal, but not all the data supplied to the unescape function needs to be unescaped, so if we were doing it manually, we would need to take care which parts need to be unescaped and which parts should remain as they are.
Clicking on Send script to Decoder, and then Run script on the Decoder tab, brings the following results:

In the lower box on the Decoder tab one can see the results: a VBScript used to download and run the malicious file.
In the next example we are dealing with a script that writes directly to a binary file, with no downloading.


As the script is written in VBScript, which can't be interpreted by the SpiderMonkey engine, we will use some of Malzilla's other functions.
First, we will copy the data from the script to the box on the Misc Decoders tab:

If you take a look at the first picture from this example, you can see that the MZ signature is written to the file in one step, and the rest of the data in a second step.
We will do it in one single step. In the previous picture I have added \u4D5A at the beginning of the code sequence; 4D and 5A are the hexadecimal ASCII codes of the letters MZ.
Don't get fooled by the \u marks in the sequence: this data sequence has nothing to do with Unicode, as the data is not text, but just data that will be written to an EXE file.
Now we need to enter \u in the Override default delimiter box, as the next function we use would otherwise expect the delimiter to be %u, and not \u.
After clicking on the UCS2 To Hex button, we will have the following situation:


Now we will click on Hex To File and save the result as a binary file on the HDD.
Checking the resulting file on Virustotal.com reveals the following:

The next example uses more complicated transformations and math functions for decoding the data.


The eval() function is used to run the result of the decoded data, as the result of decoding is also a script:

We will change the eval() function to document.write() (this step is no longer needed as of Malzilla 0.9.2, where eval() is intercepted automatically), as we actually want to see the script, not run it.
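When you work on a script outside Malzilla, the same substitution can be done with a regular expression. The sketch below is a hypothetical helper; the sample string is made up, and decode stands in for whatever decoding routine the real sample uses.

```python
import re


def neutralize_eval(script, replacement="document.write"):
    """Replace calls to eval( so the decoded payload is displayed, not run."""
    return re.sub(r"\beval\s*\(", replacement + "(", script)


# Hypothetical obfuscated snippet; "decode" stands in for the sample's own routine.
sample = "var s = decode(data); eval(s);"
print(neutralize_eval(sample))  # var s = decode(data); document.write(s);
```

This simple pattern will not catch an eval that is hidden behind an alias or built from string pieces, so treat the rewritten script as still dangerous and keep it inside the lab.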


The result of running the script is a VBScript:

As you can see, we have a sequence of Unicode codes that needs to be unescaped.
We will copy/paste the sequence to the box on the Misc Decoders tab, and use Decode UCS2 to see what is hiding there:


The result of unescaping is shellcode, and the download address of the malware file is visible.
The shellcode looks broken because I didn't bother to remove the quotation and addition marks before clicking on Decode UCS2.
As all I want is the link, I do not care about the shellcode.
The next example is a bit more complicated than the previous ones.
It uses a script known as dF (after the name of the variable often used in this script, which is changed to zX in our sample):


After clicking on Send Script To Decoder and running the script, we will have the following situation:

As you can see, just the first part of the script is decoded (the text is selected in the picture just to show what is decoded and what isn't).
Now we will select the decoded script (just the script, without the <script> tags):


and copy it over the original part (which is now decoded):

Now we will run the whole script again. As can be seen in the next picture, the result is also a script:


Clear the upper box, then in the lower box select the script between the <script> tags and copy it to the upper box:

After running the script again, we have the following results:

Scrolling down a bit reveals a sequence of Unicode codes that need to be unescaped:

As in the previous example, we copy the data to the Misc Decoders tab and click the Decode UCS2 button, which gives us the following:

As can be seen, it is another shellcode with a plaintext link to the malicious file.

And one more example of usage:

Decoding the data manually would take a lot of time because of the transformations used.
In Malzilla you can just click on Send Script To Decoder, then Run script on the Decoder tab, and you have the following:

The link you can see in the picture is a direct link for downloading the malicious file.
More examples of Malzilla's usage can be found here: http://malzilla.sourceforge.net/documents.html.
You can grab your copy of Malzilla here: http://malzilla.sourceforge.net/downloads.html.
I also recommend this article: https://isc.sans.edu/diary/Advanced+obfuscated+JavaScript+analysis/4246.

ANDROID MALWARE ANALYSIS

Android is an open-source mobile operating system. It is now developed by Google and is based on the Linux kernel. Applications are written in Java and are compiled into a slightly different bytecode format known as Dalvik.
The apps are then run in the Dalvik virtual machine, which provides a layer of abstraction over the real hardware. This way, most applications can run on any hardware as long as the API of the operating system meets the requirements of the app. Besides the Java part, native code can be used. It needs to be provided along with the application and must be compiled for all target platforms. Native code should mainly be used for computation-intensive tasks like graphics rendering.
Below the Dalvik VM lies the Linux kernel, which provides hardware abstraction and rights management. The permissions requested by an application are enforced through Linux users and groups; so far, every known piece of malware has had to acquire the access rights it needs the official way.
Android applications are packed in the APK format, which is a ZIP archive containing the AndroidManifest.xml, resources like media files, the actual code as classes.dex and some other optional files. The XML file provides the Android system with important information, such as which class to use when starting the app and what permissions are needed. Only permissions listed in this file will be granted to the application; if it tries to use any other, the call will either fail or return an empty result.
When an application is installed, these permissions are shown to the user, who must review them carefully to prevent malicious apps from accessing important data or being installed in the first place.
The code is contained in classes.dex, which is a collection of all compiled classes. Instead of the regular format used in .jar files, all classes are packed into one file, which saves some space on the mobile device.
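Since an APK is just a ZIP archive, you can take a first look inside one with nothing more than Python's standard zipfile module. The sketch below is a minimal illustration, not part of any tool from this module: the function name is hypothetical, and it assumes native libraries live under lib/ as .so files, which is the usual APK layout.

```python
import zipfile

def summarize_apk(path):
    """Report whether the expected files are present in an APK.

    `path` may be a file name or a file-like object opened in binary mode.
    """
    with zipfile.ZipFile(path) as apk:
        names = apk.namelist()
    return {
        "has_manifest": "AndroidManifest.xml" in names,
        "has_dex": "classes.dex" in names,
        # Assumed layout: native code shipped as lib/<platform>/<name>.so
        "native_libs": sorted(
            n for n in names if n.startswith("lib/") and n.endswith(".so")
        ),
        "file_count": len(names),
    }

# Example (hypothetical file name):
# print(summarize_apk("my_application_to_be_analyzed.apk"))
```

Keep in mind that the AndroidManifest.xml inside an APK is stored in a compiled binary form, so listing it is easy but reading the permissions from it needs a dedicated tool.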
As 97% of all mobile malware development is focused on the Android platform, it's important to know some tricks for analyzing potentially malicious software.

ANDROWARN

Yet another static code analyzer for malicious Android applications.


Androwarn is a tool whose main aim is to detect and warn the user about potentially malicious behaviours exhibited by an Android application.
The detection is performed through static analysis of the application's Dalvik bytecode, represented as Smali.
This analysis leads to the generation of a report, according to a technical detail level chosen by the user.
FEATURES
Structural and data flow analysis of the bytecode targeting different malicious behaviour categories
  Telephony identifiers exfiltration: IMEI, IMSI, MCC, MNC, LAC, CID, operator's name...
  Device settings exfiltration: software version, usage statistics, system settings, logs...
  Geolocation information leakage: GPS/WiFi geolocation...
  Connection interfaces information exfiltration: WiFi credentials, Bluetooth MAC address...
  Telephony services abuse: premium SMS sending, phone call composition...
  Audio/video flow interception: call recording, video capture...
  Remote connection establishment: socket open call, Bluetooth pairing, APN settings edit...
  PIM data leakage: contacts, calendar, SMS, mails...
  External memory operations: file access on SD card...
  PIM data modification: add/delete contacts, calendar events...
  Arbitrary code execution: native code using JNI, UNIX command, privilege escalation...
  Denial of Service: event notification deactivation, file deletion, process killing, virtual keyboard disable, terminal shutdown/reboot...
Report generation according to several detail levels
  Essential (-v 1) for newbies
  Advanced (-v 2)
  Expert (-v 3)
Report generation according to several formats
  plaintext (TXT)
  formatted text (HTML) from a Bootstrap template
USAGE
python androwarn.py -i my_application_to_be_analyzed.apk -r html -v 3

Run python androwarn.py -h to see the full list of options.

By default, the report is generated in the Report folder.
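When you have a whole folder of samples, typing the command for each one gets tedious. The following sketch wraps the invocation shown above in a small helper. It uses only the -i, -r and -v options from the usage line; the helper names, the default script location and the idea of looping over a folder are my own assumptions, not something Androwarn provides.

```python
import subprocess
from pathlib import Path

def androwarn_command(apk, report_format="html", verbosity=3, script="androwarn.py"):
    """Build the argument list for the Androwarn invocation shown above."""
    if verbosity not in (1, 2, 3):
        raise ValueError("verbosity must be 1, 2 or 3")
    return ["python", script, "-i", str(apk), "-r", report_format, "-v", str(verbosity)]

def analyze_folder(folder, **options):
    """Run Androwarn once for every .apk file in `folder` (hypothetical helper)."""
    for apk in sorted(Path(folder).glob("*.apk")):
        # check=False: keep going even if one sample makes the tool fail
        subprocess.run(androwarn_command(apk, **options), check=False)

# Example (hypothetical folder name), run from the Androwarn directory:
# analyze_folder("samples", verbosity=2)
```

Run it only inside your isolated lab, as with every other step that touches live samples.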
Let's take a look at a usage example of Androwarn!
Here is the link to the malware for your exercise: http://download1843.mediafire.com/68764t6fezcg/uacc7zcfn4c7x5z/com.Beauty.Girl-1.zip.
The password is "infected".
For your exercise, I recommend downloading some malicious Android samples from the Contagio site: http://contagiodump.blogspot.com.br/2011/03/take-sample-leave-sample-mobile-malware.html.

There you will find a huge list of malware to download. Grab some of the samples and throw them into Androwarn to see the results.
As I said in the video, you can also use the online Android Sandbox.
http://androidsandbox.net/index.html
Android Sandbox is a service that allows the dynamic and static analysis of mobile applications. You may use this platform free of charge to see all activities of applications, comment on them and add your own analysis. The service lets you use its own database of more than 200,000 malware signatures together with the online anti-virus scanning services of major providers. Also, the R&D team at Balich IT discovers malware signatures and publishes analytical reports on them.

ON THE UPLOAD PAGE YOU CAN CHOOSE THE ANALYSIS CRITERIA OF THE MOBILE APPLICATION
Deobfuscate *.classes: Allows you to de-obfuscate obfuscated code.
URL Scan: Allows you to analyse all URL information contained within the application on the fly and checks for blacklisting via Virustotal.
Network Dump: Allows you to download all network activity of the uploaded application captured on Android Sandbox as a *.pcap file.
Emu-Script: Allows you to set key combinations to be executed during the analysis.
Password: Allows you to set a password for your analysis report output in order to limit access.
YOU CAN UPLOAD YOUR FILE TO ANDROID SANDBOX FOR ANALYSIS BY USING THE +APKS BUTTON.

YOU CAN ACCESS THE ANALYSIS REPORTS THROUGH THE REPORTS PAGE. YOU CAN EITHER READ YOUR REPORT ONLINE BY CLICKING ON THE READ REPORT BUTTON OR DOWNLOAD YOUR REPORT IN WORD FORMAT BY CLICKING ON THE DOWNLOAD REPORT BUTTON.

ANALYSIS OF PDF / MICROSOFT OFFICE MALWARE


PDF files have become very common in everyday work. It's hard to imagine business proposals without PDFs. The PDF format is used in almost all companies to share business deals, company brochures, and even invitations.
Previous years were not good for PDF users, as several vulnerabilities were published, such as a buffer overflow vulnerability in versions prior to version 9. Many attacks were observed trying to abuse the bug through social engineering or by hosting malicious PDF files on the Internet. The simple act of opening the PDF file could exploit a vulnerability to automatically download malicious code from the Internet and display a decoy PDF file to trick you into believing that nothing wrong had happened.
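Before reaching for a full analysis tool, you can do a rough first triage of a suspicious PDF yourself. The sketch below simply counts a few PDF names that are associated with automatic actions and embedded scripts. The watch list and the function name are my own assumptions for illustration; a hit does not prove the file is malicious, and a clean result does not prove it is safe, because names can be obfuscated or hidden inside compressed streams.

```python
import re

# Assumed watch list: PDF names tied to automatic actions and active content.
SUSPICIOUS_NAMES = [
    b"/OpenAction", b"/AA", b"/JavaScript", b"/JS", b"/Launch", b"/EmbeddedFile",
]

def count_pdf_names(data):
    """Count occurrences of each watched name in the raw bytes of a PDF."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops /JS or /AA from matching inside a longer name.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts

# Example (hypothetical file name):
# with open("suspicious.pdf", "rb") as handle:
#     print(count_pdf_names(handle.read()))
```

A document that combines /OpenAction with /JavaScript is worth a closer look with the tools described next.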
WEPAWET
Wepawet is a service for detecting and analyzing web-based malware. It currently handles Flash, JavaScript, and PDF files. To use Wepawet, just go to http://wepawet.iseclab.org. Upload a sample or specify a URL, and the resource will be analyzed and a report will be generated.
PDF EXAMINER
PDF Examiner by Malware Tracker is able to scan the uploaded PDF for several known exploits, and it allows the user to explore the structure of the file, as well as to examine, decode, and dump PDF object contents. This tool lends itself well to manual PDF analysis tasks. Go to www.malwaretracker.com, click the +Pdf examiner scan tab and select the PDF to scan.

Balbuzard is a package of malware analysis tools in Python for extracting patterns of interest from suspicious files (IP addresses, domain names, known file headers, interesting strings, etc.). It can also crack malware obfuscation such as XOR, ROL, etc. by brute-forcing the key and checking for those patterns.
https://bitbucket.org/decalage/balbuzard/wiki/Home
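To get a feeling for how this kind of brute forcing works, here is a minimal sketch of the idea for the simplest case, a single-byte XOR key. This is not Balbuzard's code and does not use its API: the pattern list, the function names and the test data are assumptions made for illustration, and ROL and multi-byte keys are not covered.

```python
import re

# Assumed patterns of interest: a URL and the text found in a PE file's DOS stub.
PATTERNS = {
    "url": re.compile(rb"https?://[\x21-\x7e]{4,}"),
    "dos_stub": re.compile(rb"This program cannot be run in DOS mode"),
}

def xor_bytes(data, key):
    """XOR every byte of `data` with the single-byte `key`."""
    return bytes(b ^ key for b in data)

def brute_force_xor(data):
    """Try keys 1..255 and return (key, matches) for keys that expose a pattern.

    Key 0 is skipped because it leaves the data unchanged.
    """
    hits = []
    for key in range(1, 256):
        candidate = xor_bytes(data, key)
        found = {}
        for name, regex in PATTERNS.items():
            matches = regex.findall(candidate)
            if matches:
                found[name] = matches
        if found:
            hits.append((key, found))
    return hits

if __name__ == "__main__":
    # Hypothetical obfuscated blob: a made-up URL XORed with 0x5A.
    blob = xor_bytes(b"junk http://example.test/payload.exe junk", 0x5A)
    for key, found in brute_force_xor(blob):
        print(hex(key), found)
```

The important point is that you never need to know the key in advance: you only need to know what the decoded data should contain.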

OfficeMalScanner v0.5 is an MS Office forensic tool that scans for malicious traces such as shellcode heuristics, PE files or embedded OLE streams. Files that are found are extracted to disk. It supports disassembly and a hex view, as well as an easy brute-force mode to detect encrypted files. In addition, an Office file is scanned for VB macro code, which, if found, is extracted for further analysis. The inflate feature extracts MS Office 2007 documents into a directory and marks potentially malicious files. Also included in this package is a tool called MalHost-Setup, a kind of MS Office runtime emulation environment for debugging shellcode in malicious documents in real time.
http://www.reconstructer.org/code.html
Here is the file for the PDF analysis exercise: http://tinyurl.com/mgq9gtg.
Here is the file for the Office analysis exercise: http://tinyurl.com/mnh77n3.
The password to decompress the files is "infected".
EXTRA: SOME USEFUL AND NICE STUFF
Besides the main tools that we covered in this module, REMnux has many more interesting scripts and small programs that you can enjoy and that could be really helpful. Let's take a closer look at some of them.
FINAL WORDS
I hope you all enjoyed this starter kit course. As the name says, this is only the beginning of the study of malware analysis. Now you can go on to study new processes and tools that will expand your capabilities and help you grow in this area.
Here is some material that I can recommend:

http://www.amazon.com/Malware-Analysts-Cookbook-DVD-Techniques/dp/0470613033

http://www.amazon.com/Practical-Malware-Analysis-Dissecting-Malicious/dp/1593272901
In my humble opinion, those are the best books for learning malware analysis available on Amazon.
Keep going and keep seeking knowledge!

Best regards.
ANDERSON TAMBORIM
Anderson Tamborim is an Information Security Specialist with more than 12 years of experience in the field. He has extensive expertise in penetration testing in corporate environments, developing advanced techniques to bypass security devices like IDS/IPS, firewalls, content filters and endpoint security systems (antivirus, antimalware, HIDS, etc.). Today Anderson works as a Security Researcher and Lead Penetration Tester at NextLayer Security Solutions.
