Professional Documents
Culture Documents
Malware analysis
Pavel Laskov
Wilhelm Schickard Institute for Computer Science
What do we want to know about malware?
Experiment:
Current instances of malware were collected from a
Nepenthes honeypot.
Files were scanned with Avira AntiVir.
Results:
First scan: Second scan:
undetected undetected
24% 15%
detected detected
76% 85%
Sandboxes
Behavior analysers
Clustering and classification methods
Signature generation tools
CWSandbox: key ideas
Malware
first
Kernel32.dll-CreateFileA (*without* Hook):
instr
API hooking: intercept Windows 77E8C1F7
77E8C1F8
PUSH ebp
MOV ebp, esp
can
them
API calls by overwriting API 77E8C1FA
77E8C1FD
77E8C202
PUSH SS:[ebp+8]
CALL +$0000d265
TEST eax, eax
over
the
77E8C1FD
functions’ code loaded in the …
77E8C226
JNZ
…
RET
with
orig
whe
process memory. (a)
hoo
nece
Kernel32.dll-CreateFileA (*with* Hook):
deta
DLL injection: insertion of hook 77E8C1F7
77E8C1FD
JMP [CreatFileA-Hook]
CALL +$0000d265
offer
resea
77E8C202
functions into the running 77E8C1FD
…
77E8C226
TEST eax, eax
JNZ +$05
… 1 tion
T
RET
processes. Application.CreatFileA-Hook:
level
sider
exist
2005EDB7 -custom hook code-
• suspend one of the target application’s running threads, Figure 2. CWSandbox overview. CWSandbox.exe creates a new
copy the to-be-executed code into the target’s address process image for the to-be-analyzed malware binary and then injects
During the initialization of a malware binary cwmonitor.dll
space, set the resumed thread’s instruction pointer to the
copied code’s location, and then resume the thread; or
the cwmonitor.dll into the target application’s address space. With
the help of the DLL, we perform API hooking and send all observed
is injected into its memory to carry out API hooking.
• copy the to-be-executed code into the target’s address behavior via the communication channel back to cwsandbox.exe. We
space and create a new thread in the target process with use the same procedure for child or infected processes.
the code location as the start address. DLL intercepts all API calls and reports them to CWSandbox.
With these building blocks in place, it’s now possible to
The samethe
inject and execute code into another process.
monitored API calls. The report provides data for
procedure is repeated for any child or infected
each process and its associated actions—one subsection
process. work operations,
The most popular technique is DLL injection, in
which the CWSandbox puts all custom code into a DLL
for all accesses to the file system and another for all net-
for example. One of our focuses is on
and the hook function directs the target process to load bot analysis, so we spent considerable effort on extracting
this DLL into its memory. Thus, DLL injection fulfills and evaluating the network connection data.
both requirements for API hooking: the custom hook After it analyzes the API calls’ parameters, the sand-
functions are loaded into the target’s address space, and box routes them back to their original API functions.
the API hooks are installed in the DLL’s initialization rou- Therefore, it doesn’t block the malware from integrating
Case study: Malware classification
Goals:
Detect variations of known malware families.
Detect previusly unknown families.
Determine essential features of specific families.
Main ideas:
Use AV scanners to assign labels to malware binaries.
Use machine learning to classify binaries among known
malware families.
model
label
AV scanner Learning
Malware classification: an overview
Report corpus
Acquisition of training data: creation of
balanced data sets for multi-class Acquisition of
classification. training data
Report corpus
label
Malware classification: feature extraction
0.8 3 0.8
4
5
0.6 6 0.6
7
0.5
8
0.4 9 0.4
10 0.3
11
0.2 12 0.2
13 0.1
14
0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Malware families Predicted malware families
4
0.6 7 0.6
8 0.5
0.4 9 0.4
10 0.3
11
0.2 0.2
13
0.1
14
0
1 2 3 4 7 8 9 10 11 13 14 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Malware families Predicted malware families
Operational goals:
Generation of signatures for malware communicatioin
Detection of “drop-zones” and compromized machines
Internet
Signature
Bot master generation
Regular traffic
Repetitve execution
(Sandnet)
Execution of malware in a sandnet
Procedure:
Signatures are generated for 43 classes of malware collected
from a honeypot.
Test traffic is generated by running the same malware again
and merging it with normal traffic from two sources (DARPA
and University of Erlangen).
Results:
DARPA Erlangen
Average detection rate 0.81 0.72
Average false alarm rate 0 0.51 · 10−6
Online evaluation of Botzilla
Procedure:
Generated signatures were deployed on a large network at the
University of Erlangen (ca. 50,000 hosts) using a Vermont
network monitorin sensor.
Traffic statistics: ca. 840M flows in 4 days.
Alerts were manually evaluated.
Results:
# malicious flows detected 219
# false alarms 295
Detection rate 0.43
False alarm rate 3.51 · 10−7
Signature examples
Banload keylogger:
Storm worm:
Lessons learned
Konrad Rieck, Thorsten Holz, Carsten Willems, Patrick Düssel, and Pavel
Laskov.
Learning and classification of malware behavior.
In Detection of Intrusions and Malware, and Vulnerability Assessment, Proc.
of 5th DIMVA Conference, pages 108–125, 2008.
Konrad Rieck, Guido Schwenk, Tobias Limmer, Thorsten Holz, and Pavel
Laskov.
Botzilla: Detecting the phoning homeöf malicious software.
In Proc. of 25th ACM Symposium on Applied Computing (SAC), March
2010.
(to appear).