
MARD: A Framework for Metamorphic Malware Analysis and Real-Time Detection

Shahid Alam and R. Nigel Horspool
Department of CS, University of Victoria
Victoria, BC, V8P 5C2, Canada
{salam, nigelh}@cs.uvic.ca

Issa Traore
Department of ECE, University of Victoria
Victoria, BC, V8P 5C2, Canada
itraore@ece.uvic.ca

Abstract—Because of the financial and other gains attached to the growing malware industry, there is a need to automate the process of malware analysis and provide real-time malware detection. To hide a malware, obfuscation techniques are used. One such technique is metamorphism, which mutates the dynamic binary code and changes the opcode with every run to avoid detection. This makes malware difficult to detect in real-time and generally requires a behavioral signature for detection. In this paper we present a new framework called MARD for Metamorphic Malware Analysis and Real-Time Detection, to protect the end points that are often the last defense against metamorphic malware. MARD provides: (1) automation, (2) platform independence, (3) optimizations for real-time performance and (4) modularity. We also present a comparison of MARD with other such recent efforts. Experimental evaluation of MARD achieves a detection rate of 99.6% and a false positive rate of 4%.

Keywords-End Point Security; Metamorphism; Malware Analysis and Detection; Control Flow Analysis; Automation;

I. INTRODUCTION

End point protection is often the last defense against a security threat. An end point can be a desktop, a server, a laptop, a kiosk or a mobile device that connects to a network (the Internet). Recent statistics by the ITU (International Telecommunications Union) [14] show that the number of Internet users in the world increased from 20% of the population in 2006 to 35% (almost 2 billion in total) in 2011. According to the 2011 Symantec Internet security threat report [8], there was an 81% increase in malware attacks over 2010, and 403 million new malware were created, a 41% increase over 2010. In 2012 there was a 42% increase in malware attacks over 2011. With these increases and the anticipated future increases, these end points pose a new security challenge [23] to the security professionals and researchers in industry and academia, to devise new methods and techniques for malware detection and protection.

A broad definition of malware, also called malicious code, is used in the literature and includes viruses, worms, spyware and trojans. Here we use one of the earliest definitions, by Gary McGraw and Greg Morrisett [19]: malicious code is any code added, changed, or removed from a software system in order to intentionally cause harm or subvert the intended function of the system. A malware carries out activities such as setting up a back door for a bot, setting up a keyboard logger and stealing personal information.

Antimalware software detects and neutralizes the effects of a malware. There are two basic detection techniques [13]: anomaly-based and signature-based. Anomaly-based detection uses knowledge of the behavior of a normal program to decide if the program under inspection is malicious or not. Signature-based detection uses the characteristics of a malicious program to decide if the program under inspection is malicious or not. Each of these techniques can be performed statically (before the program executes), dynamically (during or after the program execution) or both statically and dynamically (hybrid).

Detecting whether a given program is a malware is an undecidable problem [6], [16]. Antimalware detection techniques are limited by this theoretical result, and malware writers exploit this limitation to avoid detection. In the early days the malware writers were hobbyists, but now professionals have become part of this group because of the financial gains [5] attached to it.

One of the basic techniques used by malware writers is obfuscation [15]. Such a technique obscures a code to make it difficult to understand, analyze and detect malware embedded in the code. Initial obfuscators were simple and were detected by simple signature-based detectors. To counter these detectors, the obfuscation techniques have evolved in sophistication and diversity. Currently there are three categories of obfuscation techniques [21]: packing, polymorphism and metamorphism.

Packing is a technique where a malware is packed (compressed) to avoid detection. Unpacking needs to be done before the malware can be detected. Current antimalware tools normally use entropy analysis [21] to detect packing, but to unpack a program they must know the packing algorithm that was used to pack the program. Packing is also used by legitimate software companies to distribute and deploy their software products.

Polymorphism is an encryption technique that mutates the static binary code to avoid detection. When an infected program executes, the malware is decrypted and written to memory for execution. With each run of the infected program, a new version of the malware is encrypted and stored for the next run. This results in a different malware signature with each new run of the program. The changed malware keeps the same functionality (i.e., the opcode is semantically the same for each instance). It is possible for a signature-based technique to detect this similarity of signatures at runtime.

Metamorphism is a technique that mutates the dynamic binary code to avoid detection. It changes the opcode with each run of the infected program and does not use any encryption or decryption. The malware never keeps the same sequence of opcodes in memory. This is also called dynamic code obfuscation. There are two kinds of metamorphic malware defined in [21], based on the channel of communication used: closed-world malware, which does not rely on external communication and can generate the newly mutated code using either a binary transformer or a metalanguage; and open-world malware, which can communicate with other sites on the Internet and update itself with new features.

As is clear from the above discussion, out of the three categories of obfuscation, metamorphic malware is becoming more complex and poses a special threat and new challenges to end point security. We propose in this paper a new framework for real-time detection of metamorphic malware using subgraph isomorphism. The proposed framework is named MARD, for Metamorphic Malware Analysis and Real-Time Detection.

To provide continuous protection to an end point, a security software needs to operate, and threats need to be detected, in real-time. Antimalware provides protection from malware in two ways:

1) It can provide protection by detecting a malware before the software is installed. All the incoming network traffic is monitored and scanned for malware. Depending on the methods used, this continuous monitoring and scanning slows down a computer considerably, which is neither practical nor desirable. This is one of the main reasons this type of protection is not very popular.

2) It can provide protection by detecting a malware during or after the software installation. A user can scan different files and parts of the computer as and when he/she desires. This type of protection is much easier to use and is more popular.

In this paper our emphasis is on the second option. The general problem of detecting metamorphic malware is NP-complete [26]; that is, there exist some metamorphic malware whose detection is NP-complete.

The remainder of the paper is organized as follows. Section II describes MARD in detail. In Section III we show how parallelization and CFG reduction reduce the runtime of MARD. In Section IV we evaluate the performance of MARD using an existing malware dataset. Section V describes the most recent efforts at detecting metamorphic malware and compares them to MARD. We conclude in Section VI.

II. THE MARD FRAMEWORK

Our primary goal is to detect automatically, in real-time, both known and unknown malware. We present in this section the design principles underlying the MARD framework and describe the proposed malware detection approach.

A. MARD Design

Most existing malware use binaries to infiltrate a computer system. Binary analysis is the process of automatically analysing the structure and behavior of a binary program. We use binary analysis for malware detection.

To achieve platform independence, automation and optimization, a binary program is first disassembled and translated to an intermediate language. We introduce in [3] a new intermediate language for malware analysis named MAIL, an acronym for Malware Analysis Intermediate Language. In [2], we explain in detail, with examples, how to translate an x86 and an ARM assembly program into a MAIL program. Indirect jumps and calls (branches whose target is unknown or cannot be determined by static analysis) can only be changed by a change in the source code, so it is safe to ignore these branches for malware analysis, where the change is only carried out in the machine (binary) code. We therefore ignore indirect jumps and mark them as UNKNOWN when translating them to MAIL. The MAIL program is then annotated with patterns, and we build a CFG (control flow graph) of the annotated MAIL program.

Some of the major characteristics and advantages of the MARD framework are as follows:

Platform independence: To make modern compilers [1], [20] platform independent, they are divided into two major components: a front end and a back end. The design of the MARD framework follows the same principle. In compilers the same C++ front end can be used with different back ends to generate code for different platforms such as x86, ARM and PowerPC. In the case of MARD, the same back end can be used with different front ends to detect malware for different platforms (Windows, Linux, etc.). For example, we can implement a front end for PE (Windows executable) files and another front end for ELF (Linux executable) files. Both these front ends generate their output in our intermediate language (i.e., MAIL), which is used by the back end. Programs compiled for different architectures such as Intel x86 and ARM (the two most popular architectures) can be translated to MAIL. Examples of such translations are described in [2]. So we only need to implement one back end that is able to perform analysis and detect malware using MAIL. The use of MAIL in our design keeps the front end completely separate from the back end and therefore provides an opportunity for platform independence.
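To make this separation concrete, the following minimal sketch shows one way the front end/back end split could be expressed. The class names, the MAILProgram stand-in and the detect() entry point are illustrative assumptions for this sketch, not MARD's actual interfaces.

```cpp
#include <string>
#include <vector>

// Stand-in for a translated, pattern-annotated MAIL program.
using MAILProgram = std::vector<std::string>;

// One front end per executable format/architecture (e.g., PE/x86, ELF/ARM);
// every front end produces the same intermediate representation.
class FrontEnd {
public:
    virtual ~FrontEnd() = default;
    virtual MAILProgram translate(const std::string& binaryPath) const = 0;
};

class PEFrontEnd : public FrontEnd {      // Windows PE binaries
public:
    MAILProgram translate(const std::string& binaryPath) const override {
        (void)binaryPath;                 // placeholder: disassemble, remove junk,
        return {};                        // emit and annotate MAIL statements
    }
};

class ELFFrontEnd : public FrontEnd {     // Linux ELF binaries
public:
    MAILProgram translate(const std::string& binaryPath) const override {
        (void)binaryPath;                 // placeholder for the ELF pipeline
        return {};
    }
};

// A single back end builds ACFGs from MAIL and runs the similarity detector,
// independently of the platform the binary came from.
class BackEnd {
public:
    bool detect(const MAILProgram& program) const {
        (void)program;                    // placeholder: ACFG construction + matching
        return false;
    }
};
```

Supporting a new platform then only requires a new FrontEnd implementation; the back end and the detection logic stay unchanged.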
Optimization: The main purpose of the optimizations in the front end is to reduce the number and complexity of assembly instructions for malware analysis performed by the back end. We achieve this by: (1) removing the unnecessary instructions that are not required for malware analysis, such as NOP instructions; and (2) generating an optimized intermediate representation using MAIL, which provides a high level representation of the disassembled binary program. MAIL includes specific information such as control flow information, function/API calls and patterns, for easier and optimized analysis and detection of malware. We also use parallelization and CFG reduction to optimize the runtime of MARD.
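As a small illustration of step (1), a front end could filter junk instructions before translation roughly as follows; the Instruction record and the particular set of junk mnemonics shown are assumptions made for this sketch.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

struct Instruction { std::string mnemonic, operands; };  // illustrative disassembler output

// Drop instructions that carry no information for malware analysis before
// translating to MAIL; the list of "junk" mnemonics here is only an example.
std::vector<Instruction> removeJunk(const std::vector<Instruction>& in) {
    static const std::unordered_set<std::string> junk = {"nop", "fnop", "pause"};
    std::vector<Instruction> out;
    out.reserve(in.size());
    for (const Instruction& i : in)
        if (junk.count(i.mnemonic) == 0)
            out.push_back(i);
    return out;
}
```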


Figure 1. High Level Overview of the MARD Framework. (MAIL = Malware Analysis Intermediate Language; ACFG = Annotated Control Flow Graph. NOTE: The component "Unpacker" is not implemented in this version of the Malware Detector.)

B. Malware Detection

After a program sample is translated to MAIL, an annotated control flow graph (CFG) for each function in the program is built. Instead of using one large CFG as the signature, we divide a program into smaller CFGs, with one CFG per function. A program signature is then represented by the set of corresponding (smaller) CFGs. A program that contains part of the control flow of a training malware sample is classified as a malware, i.e., if the percentage of the CFGs involved in a malware signature that match the signature of a program exceeds a predefined threshold, then the program is classified as a malware. Using a simple threshold-based classifier also considerably reduces the runtime of MARD and facilitates real-time detection.

Figure 1 gives an overview of the MARD framework. First, a training dataset, also called Malware Templates in Figure 1, is built using the malware training samples. After a program (sample) is translated to MAIL and to ACFGs (annotated CFGs), the Similarity Detector (Figure 1) detects the presence of malware in the program, using the Malware Templates as described above. All the steps shown in Figure 1 are completely automated.

The Similarity Detector in MARD comprises two sub-components: a subgraph matching component and a pattern matching component. We describe these two components in the following.

1) Subgraph Matching: In our malware detection approach, graph matching is defined in terms of subgraph isomorphism. Given two graphs as input, subgraph isomorphism determines if one of the graphs contains a subgraph that is isomorphic to the other graph. Generally, subgraph isomorphism is an NP-Complete problem [7]. However, a CFG of a program is usually a sparse graph, so it is possible to compute the isomorphism of two CFGs in a reasonable amount of time. An example of subgraph matching is shown in Figure 2.

After the binary analysis we obtain a set of CFGs (each corresponding to a separate function) of a program. To detect if a program contains a malware we compare the CFGs of the program with the CFGs of known malware samples from our training database. If a percentage of the CFGs of the program, greater than a predefined threshold, match one or several of the CFGs of a malware sample (from the database), then the program is classified as a malware.

2) Pattern Matching: Very small graphs, when matched against a large graph, can produce a false positive. Therefore, to alleviate the impact of small graphs on detection accuracy, we integrate a Pattern Matching sub-component within the Subgraph Matching component. Every statement in MAIL is assigned a pattern, as explained in [3]. If a CFG of a malware sample matches a CFG of a program (i.e., the two CFGs are isomorphic), then we further use the patterns assigned to MAIL statements to match each statement in the matching nodes of the two CFGs. A successful match requires all the statements in the matching nodes to have the same (exact) patterns, although the corresponding statement blocks themselves may differ.
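The interplay of the two sub-components with the threshold-based classifier can be summarized by the following sketch. The ACFG type, the matches predicate (standing in for subgraph isomorphism refined by the per-node pattern comparison) and the 25% default threshold are illustrative assumptions; here the fraction is measured over the malware template's CFGs, but it could equally be measured over the program's CFGs, as done in the threshold computation reported later.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct ACFG { /* nodes, edges and per-node MAIL patterns (illustrative) */ };
using Signature = std::vector<ACFG>;    // one ACFG per function of a program

// 'matches' stands for: malware ACFG m is subgraph-isomorphic to program ACFG p
// AND every statement in the matched nodes carries the same MAIL pattern.
bool isMalware(const Signature& program,
               const std::vector<Signature>& malwareTemplates,
               const std::function<bool(const ACFG&, const ACFG&)>& matches,
               double threshold = 0.25) {
    for (const Signature& tmpl : malwareTemplates) {
        if (tmpl.empty()) continue;
        std::size_t matched = 0;
        for (const ACFG& m : tmpl)
            for (const ACFG& p : program)
                if (matches(m, p)) { ++matched; break; }
        if (static_cast<double>(matched) / tmpl.size() >= threshold)
            return true;                 // enough of this template was found
    }
    return false;
}
```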
Figure 2. An example of subgraph matching: (a) a malware sample; (b) the malware embedded inside a benign program. The graph in (a) is matched as a subgraph of the graph in (b).

III. RUNTIME OPTIMIZATION

There are two components in the MARD framework that consume most of its runtime: one is the Translator (part of the MAIL Generator in Figure 1) and the other is the Subgraph Matching component (part of the Similarity Detector in Figure 1). The Translator reads each assembly instruction one by one and translates it to the MAIL language. The translation time increases with the size of the binary and provides limited avenue for optimization. The Subgraph Matching component matches the CFGs against all the malware sample graphs. As the number of nodes in a graph increases, the Subgraph Matching runtime increases. The runtime also increases with the number of malware samples, but this does provide some options for optimization. We use two techniques to improve the runtime of MARD, namely parallelization and CFG reduction, described in the following.

A. Parallelization

Multicore processors are becoming popular nowadays. All the current desktops, laptops and even the energy efficient small mobile devices contain a multicore processor. Intel has recently announced its Single-Chip Cloud Computer [18], a processor with 48 cores on a chip. Keeping in view this ubiquitousness of multicores in the host machines (also called the end points), and to optimize the runtime, we decided to use Windows threads to parallelize the Subgraph Matching component.

We carried out an experiment using different numbers of threads, ranging from 2 to 250. We used 250 malware samples and 30 benign applications, each with a different size of CFG. The main reason for this experiment was to develop an empirical model to compute the number of threads to be used in the Subgraph Matching component. The experiment was run on machines with 2 and 4 cores. Details of the machines used and the results of this experiment are shown in Table I. The percentage of CPU utilization on average was 3 times higher with threads than without threads. This is also confirmed by the improvement in the runtime.

Table I
RUNTIME OPTIMIZATION BY PARALLELIZING THE Subgraph Matching COMPONENT USING DIFFERENT NUMBERS OF THREADS

2 Cores                                    | 4 Cores
Number of Threads | Runtime Reduced By     | Number of Threads | Runtime Reduced By
2                 | 3.20 times             | 4                 | 3.98 times
4                 | 5.92 times             | 8                 | 4.11 times
8                 | 7.20 times             | 32                | 5.95 times
16                | 5.72 times             | 64                | 7.76 times
32                | 5.64 times             | 128               | 7.53 times
250               | 5.41 times             | 250               | 6.37 times
8                 | 14.87 times            | 64                | 14.53 times

Machines used:
Intel Core i5 CPU M 430 (2 Cores) @ 2.27 GHz with 4GB of RAM and Windows 8 Professional installed.
Intel Core 2 Quad CPU Q6700 (4 Cores) @ 2.67 GHz with 4GB of RAM and Windows 7 Professional installed.

As we increased the number of threads, the reduction in the runtime also increased up to a certain number of threads (8 on 2 cores and 64 on 4 cores), after which the reduction in the runtime started decreasing. As we increased the number of benign samples from 30 to 1387 (the last row in Table I), the runtime only reduced by almost 2 times. Besides other factors, such as the number of CPUs/cores and the memory available to the application, the other reasons for this reduction in the runtime are that more time was spent on thread management and that the sharing of information among threads increased.

Based on this experiment we developed Equation 1 to compute the maximum number of threads to be used by the Subgraph Matching component. We also give the user an option to choose the maximum number of threads used by the tool for the Subgraph Matching component.
Threads = (NC)^3    (1)

where NC = Number of CPUs/Cores.
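A minimal sketch of how Equation 1 could drive the parallel matcher is shown below, using portable std::thread in place of the Windows threads used by MARD; chooseThreadCount, the Template type and the chunking scheme are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Template { /* ACFGs of one malware training sample (illustrative) */ };

// Equation 1: Threads = (NC)^3, capped by a user-supplied maximum.
std::size_t chooseThreadCount(std::size_t userMax) {
    std::size_t nc = std::thread::hardware_concurrency();
    if (nc == 0) nc = 1;                          // hardware_concurrency() may report 0
    return std::min(nc * nc * nc, userMax);
}

// Split the malware templates into roughly equal chunks, one per worker thread.
void parallelMatch(const std::vector<Template>& templates, std::size_t userMax) {
    std::size_t n = std::min(chooseThreadCount(userMax),
                             std::max<std::size_t>(templates.size(), 1));
    if (n == 0) n = 1;
    std::size_t chunk = (templates.size() + n - 1) / n;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < n; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(begin + chunk, templates.size());
        if (begin >= end) break;
        workers.emplace_back([&templates, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                (void)templates[i];               // subgraph matching of this template
            }                                     // against the program's ACFGs
        });
    }
    for (std::thread& w : workers) w.join();
}
```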

B. CFG Reduction

One of the advantages of using MAIL is that it provides patterns for malware detection. Our detection method uses both subgraph matching and pattern matching techniques for metamorphic malware detection. Even if we reduce the number of blocks in a CFG (it is possible for a CFG of some binaries to be reduced to a very small number of blocks), we still get a good detection rate because of the combination of the two techniques.

To reduce the number of blocks (nodes) in a CFG for runtime optimization we carried out CFG reduction, also called CFG shrinking. We reduce the number of blocks in a CFG by merging them together. Two blocks are merged only if the merging does not change the control flow of a program. Given two blocks A and B in a CFG, if all the paths that reach node B pass through block A, and all the children of A are reachable through B, then A and B will be merged.
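A conservative special case of this rule is the merging of single-entry/single-exit chains: when block A's only successor is B and A is B's only predecessor, every path reaching B passes through A and all children of A are reachable through B, so the two blocks can be fused without changing the control flow. A minimal sketch, with illustrative types rather than MARD's actual data structures, is:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Block {
    std::vector<std::string> stmts;       // MAIL statements / patterns (illustrative)
    std::vector<std::size_t> succ, pred;  // successor / predecessor block indices
    bool merged = false;                  // true once absorbed into a predecessor
};

// Repeatedly fuse A and its unique successor B when B has A as its only
// predecessor; the resulting CFG has the same control flow with fewer nodes.
void shrinkCFG(std::vector<Block>& cfg) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t a = 0; a < cfg.size(); ++a) {
            if (cfg[a].merged || cfg[a].succ.size() != 1) continue;
            std::size_t b = cfg[a].succ[0];
            if (b == a || cfg[b].merged || cfg[b].pred.size() != 1) continue;
            cfg[a].stmts.insert(cfg[a].stmts.end(),
                                cfg[b].stmts.begin(), cfg[b].stmts.end());
            cfg[a].succ = cfg[b].succ;    // A inherits B's outgoing edges
            for (std::size_t s : cfg[a].succ)
                std::replace(cfg[s].pred.begin(), cfg[s].pred.end(), b, a);
            cfg[b].merged = true;
            changed = true;
        }
    }
}
```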


Figures 3, 4 and 5 show examples of CFGs from the dataset described in this paper, before and after shrinking. Figures 3 and 4 are CFGs of functions of two different malware samples and Figure 5 is a CFG of a function of a benign sample. The shrinking does not change the shape of a graph; as can be seen in the figures, the shapes of the graphs before and after shrinking are the same. More of these examples are available at http://www.cs.uvic.ca/~salam/PhD/cfgs.html.

Figure 3. Example of a CFG of one of the functions of one of the samples of the MWOR class of malware, before (a) and after (b) shrinking. The CFG has been reduced from 92 nodes to 47 nodes.

Figure 4. Example of a CFG of one of the functions of one of the samples of the MWOR class of malware, before (a) and after (b) shrinking. The CFG has been reduced from 484 nodes to 145 nodes.

Figure 5. Example of a CFG of one of the functions of the Windows disk free space utility program df.exe, before (a) and after (b) shrinking. The CFG has been reduced from 894 nodes to 283 nodes.

We were able to substantially reduce the number of nodes per CFG (in total a 90.6% reduction), as shown in Table III. This reduced the runtime of MARD on average by 2.7 times (for the smaller dataset) and 100 times (for the larger dataset), while still achieving a detection rate of 99.6% with a false positive rate of 4%, as later shown in Table VI.

IV. MARD PERFORMANCE

We carried out an empirical study to analyse the correctness and efficiency of MARD. We present, in this section, the evaluation metrics, the empirical study, the obtained results and the analysis.

A. Performance Metrics

Before evaluating the performance of MARD, we first define the following performance metrics:
True positive (TP) is the number of malware that are classified as malware. True negative (TN) is the number of benign programs that are classified as benign. False positive (FP) is the number of benign programs that are classified as malware. False negative (FN) is the number of malware that are classified as benign.

Precision is the fraction of the samples detected as malware that are actually malware. Accuracy is the fraction of all samples, malware and benign, that are correctly classified as either malware or benign. These two metrics are defined as follows:

Precision = TP / (TP + FP),   Accuracy = (TP + TN) / (P + N)    (2)

where P and N are the total number of malware and benign programs respectively. We further define the mean maximum precision (MMP) and mean maximum accuracy (MMA) for n-fold cross-validation as follows:

MMP = (1/n) \sum_{i=1}^{n} Precision_i    (3)

MMA = (1/n) \sum_{i=1}^{n} Accuracy_i    (4)

We also use two other metrics, the TP rate and the FP rate. The TP rate (TPR), also called the detection rate (DR), corresponds to the percentage of samples correctly recognized as malware out of the total malware dataset. The FP rate (FPR) corresponds to the percentage of samples incorrectly recognized as malware out of the total benign dataset. These two metrics are defined as follows:

TPR = TP / P    (5)

FPR = FP / N    (6)
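A minimal sketch of these metrics, computed from the raw confusion-matrix counts (the struct and function names are illustrative):

```cpp
struct Counts { double tp = 0, tn = 0, fp = 0, fn = 0; };   // confusion-matrix counts

double precision(const Counts& c) { return c.tp / (c.tp + c.fp); }   // Eq. (2)
double accuracy(const Counts& c) {                                   // Eq. (2)
    double p = c.tp + c.fn;   // P: total number of malware
    double n = c.tn + c.fp;   // N: total number of benign programs
    return (c.tp + c.tn) / (p + n);
}
double tpr(const Counts& c) { return c.tp / (c.tp + c.fn); }          // Eq. (5): TP / P
double fpr(const Counts& c) { return c.fp / (c.fp + c.tn); }          // Eq. (6): FP / N
// MMP and MMA (Eqs. 3 and 4) are simply the means of precision() and
// accuracy() over the n folds of the cross-validation.
```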
The Receiver Operating Characteristic (ROC) curve is a graphical plot depicting the performance of a binary classifier. It is a two-dimensional plot, where TPR is plotted on the Y-axis and FPR is plotted on the X-axis, and hence it depicts the trade-off between benefits (TP) and costs (FP). We want a higher TPR and a lower FPR, so a point in ROC space towards the top left corner is desirable.
B. Dataset

The dataset used for the experiments consists of a total of 3350 sample Windows programs. Of the 3350 programs, 1020 are metamorphic malware samples collected from three different resources [22], [24], [17], and the other 2330 are benign programs. The dataset distribution based on the number of CFGs for each sample, and on the size of each CFG after normalization and shrinking, is shown in Tables II and III respectively. The normalizations carried out are the removal of NOP and other junk instructions.

Table II
DATASET DISTRIBUTION BASED ON THE NUMBER OF CONTROL FLOW GRAPHS (CFGS) FOR EACH PROGRAM SAMPLE

Malware samples (1020)             | Benign program samples (2330)
Number of CFGs | Number of Samples | Number of CFGs | Number of Samples
2              | 250               | 0 – 50         | 1277
5 – 32         | 204               | 51 – 100       | 317
33 – 57        | 222               | 101 – 199      | 310
58 – 84        | 133               | 201 – 399      | 282
85 – 133       | 105               | 400 – 598      | 111
140 – 249      | 94                | 606 – 987      | 30
133 – 1272     | 12                | 1001 – 1148    | 3

Table III
DATASET DISTRIBUTION BASED ON THE SIZE (NUMBER OF NODES) FOR EACH CFG AFTER NORMALIZATION AND SHRINKING

Malware samples (1020)           | Benign program samples (2330)
Number of Nodes | Number of CFGs | Number of Nodes | Number of CFGs
1 – 10          | 60284          | 1 – 10          | 242240
11 – 20         | 1652           | 11 – 20         | 3466
21 – 39         | 2177           | 21 – 39         | 1086
41 – 69         | 288            | 40 – 69         | 368
70 – 96         | 207            | 70 – 99         | 61
104 – 183       | 254            | 100 – 194       | 60
221 – 301       | 2              | 245 – 521       | 15

Total number of nodes before CFG shrinking = 4908422. Total number of nodes after CFG shrinking = 462974 (a 90.6% reduction). The runtime was on average reduced by 2.7 times (for the smaller dataset) and 100 times (for the larger dataset).

The dataset contains a variety of programs, with CFGs ranging from simple to complex, for testing. As shown in Table II, the number of CFGs per malware sample ranges from 2 to 1272 and the number of CFGs per benign program ranges from 0 to 1148. Some of the Windows DLLs (dynamic link libraries) that were used in the experiment do not have code but only data (i.e., they cannot be executed), and as a result they have 0-node CFGs. The sizes of these CFGs are shown in Table III. The sizes of the CFGs of the malware samples range from 1 to 301 nodes, and the sizes of the CFGs of the benign programs range from 1 to 521 nodes.

The 1020 malware samples belong to the following three metamorphic families of viruses:
Next Generation Virus Generation Kit (NGVCK) (http://vxheaven.org/vx.php?id=tn02), Second Generation Virus Generator (G2) (http://vxheaven.org/vx.php?id=tg00) and the Metamorphic Worm (MWOR) [17] generated by a metamorphic generator. The NGVCK and MWOR families of viruses are further divided into two and seven Classes respectively. This Class distribution is shown in Table IV.

Table IV
Class DISTRIBUTION OF THE 1020 METAMORPHIC MALWARE SAMPLES

Class    | Number of Samples | Comments
NGVCK 1  | 70                | Generated by NGVCK with a simple set of obfuscations, such as dead code insertion and instruction reordering.
NGVCK 2  | 200               | Generated by NGVCK with a complex set of obfuscations, such as an indirect jump (e.g., a push followed by a ret instruction) to one of the data sections.
G2       | 50                | Generated by G2.
MWOR 1   | 100               | Generated by MWOR with a padding ratio of 0.5
MWOR 2   | 100               | Generated by MWOR with a padding ratio of 1.0
MWOR 3   | 100               | Generated by MWOR with a padding ratio of 1.5
MWOR 4   | 100               | Generated by MWOR with a padding ratio of 2.0
MWOR 5   | 100               | Generated by MWOR with a padding ratio of 2.5
MWOR 6   | 100               | Generated by MWOR with a padding ratio of 3.0
MWOR 7   | 100               | Generated by MWOR with a padding ratio of 4.0

MWOR uses two morphing techniques: dead code insertion and equivalent instruction substitution. The padding ratio is the ratio of the number of dead code instructions to the core instructions of the malware. A padding ratio of 0.5 means that the malware has half as many dead code instructions as core instructions [17].

This variety of CFGs and malware Classes in the samples provides a good testing platform for the proposed malware detection framework.

C. Threshold Computation

As explained above, to classify a program as benign or malware we compare the percentage of its CFGs that match malware CFGs from the training set against a predefined threshold. To compute this threshold value we carried out an experiment with MARD, selecting 1387 sample Windows programs for malware analysis and detection. Of these 1387 programs, 250 are metamorphic malware samples and the other 1137 are benign programs. We ran the experiment with different values of the threshold, ranging from 20% to 90%. Table V lists the obtained results. As can be noted, the best results are obtained for threshold values of 20%–25%. We decided to use 25% as the threshold value for malware detection.

Table V
MALWARE DETECTION PERFORMANCE BY VARYING THE DETECTION THRESHOLD VALUE

Threshold (%) | DR (%)
20            | 99.2
25            | 99.2
30            | 93.2
40            | 86.4
50            | 82.8
60            | 76
70            | 76
80            | 76
90            | 76

D. Empirical Study

In this Section we discuss the experiments and present the performance results obtained. The experiments were run on the following machine: Intel Core i5 CPU M 430 (2 Cores) @ 2.27 GHz with 4GB RAM, operating system Windows 8 Professional.

We carried out two experiments: one with a smaller dataset using 10-fold cross validation and the other with a larger dataset using 5-fold cross validation.

To find the runtime improvement from CFG reduction, both these experiments were carried out with and without CFG reduction. We were able to complete all the runs (10 times) with and without CFG reduction for the smaller dataset. We completed all the runs (5 times) with CFG reduction on the larger dataset, but completed only one run (which took over 34 hours) without CFG reduction. This is another reason to use a smaller dataset alongside the larger dataset: to get accurate results for the runtime improvement. In the following two Sections we present and describe these two experiments with their results.

1) Experiment with Smaller Dataset Using 10-fold Cross Validation: Out of the 3350 Windows programs we randomly selected 1351 programs. Of these 1351 programs, 250 are metamorphic malware samples belonging to Classes NGVCK 1 and NGVCK 2, and the other 1101 are benign programs.

The 10-fold cross validation was conducted by selecting 25 malware samples out of the 250 malware to train our detector. The remaining 225 malware samples, along with the 1101 benign programs, were then used to test the detector. These two steps were repeated 10 times, and each time
a different set of 25 malware samples was selected for training and the remaining samples for testing. The overall performance results were obtained by averaging the results obtained in the 10 different runs.
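This protocol can be sketched as follows; the Detector interface, the fixed seed and the placeholder bodies are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

struct Sample { /* ACFG signature of one program (illustrative) */ };

struct Detector {
    std::vector<Sample> templates;
    void train(const std::vector<Sample>& malware) { templates = malware; }  // build templates
    double detectionRate(const std::vector<Sample>& test) const {            // TP / P
        (void)test; return 0.0;                                              // placeholder
    }
};

// Repeatedly pick a fresh training split, test on the rest, and average DR.
double crossValidate(std::vector<Sample> malware, int runs = 10,
                     std::size_t trainSize = 25) {
    if (malware.size() <= trainSize) return 0.0;
    std::mt19937 rng(42);
    double sumDR = 0.0;
    for (int r = 0; r < runs; ++r) {
        std::shuffle(malware.begin(), malware.end(), rng);
        std::vector<Sample> train(malware.begin(), malware.begin() + trainSize);
        std::vector<Sample> test(malware.begin() + trainSize, malware.end());
        Detector d;
        d.train(train);
        sumDR += d.detectionRate(test);   // the benign set would be scored for FPR similarly
    }
    return sumDR / runs;
}
```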
Using a threshold value of 25%, we conducted further evaluation by increasing the size of the training set from 25 samples to 125 malware samples (50% of the malware samples). The obtained results are listed in Table VI. The DR improved from 94% when the size of the training set is 25 to 99.6% when we used a training dataset of 125 samples.

Table VI
MALWARE DETECTION RESULTS FOR SMALLER DATASET

Training set size | DR    | FPR  | MMP  | MMA  | Real-Time
25                | 94%   | 3.1% | 0.86 | 0.96 | Yes
125               | 99.6% | 4%   | 0.85 | 0.97 | Yes

Real-time here means the detection is fully automatic and finishes in a reasonable amount of time. On average it took MARD 15.2037 seconds with CFG shrinking and 40.4288 seconds without CFG shrinking to complete the malware analysis and detection for 1351 samples, including the 25 training malware samples. This time excludes the time for training. MARD achieved the same values for all the other performance metrics (DR, FPR, MMP and MMA) with and without CFG shrinking.

2) Experiment with Larger Dataset Using 5-fold Cross Validation: The 5-fold cross validation was conducted by selecting 204 malware samples out of the 1020 malware to train our detector. The remaining 816 malware samples, along with the 2330 benign programs in our dataset, were then used to test the detector. These two steps were repeated 5 times, and each time a different set of 204 malware samples was selected for training and the remaining samples for testing. The overall performance results were obtained by averaging the results obtained in the 5 different runs.

Table VII
MALWARE DETECTION RESULTS FOR LARGER DATASET

Training set size | DR    | FPR  | MMP  | MMA  | Real-Time
204               | 97%   | 4.3% | 0.91 | 0.96 | Yes
510               | 98.9% | 4.5% | 0.91 | 0.97 | Yes

Real-time here means the detection is fully automatic and finishes in a reasonable amount of time. On average it took MARD 946.5824 seconds with CFG shrinking and over 125400 seconds (over 34 hours) without CFG shrinking to complete the malware analysis and detection for 3350 samples, including the 204 training malware samples. This time excludes the time for training. Because of the time constraints we did not perform 5-fold cross validation without CFG shrinking; the time (over 34 hours) reported is just for one run of the experiment without CFG shrinking.

Using a threshold value of 25%, we conducted further evaluation by increasing the size of the training set from 204 samples to 510 malware samples (50% of the malware samples). The obtained results are listed in Table VII. The DR improved from 97% when the size of the training set is 204 to 98.9% when we used a training dataset of 510 samples.

3) ROC Graphs: ROC graph plots for both of the above experiments are shown in Figure 6. As expected, both of them are in the top left corner of the graph. The ROC for the larger dataset has more precision, i.e., more of the ROC points are closer to each other; this is also confirmed by the MMP listed in Table VII.

Figure 6. ROC graph plots: (a) smaller dataset using 10-fold cross validation; (b) larger dataset using 5-fold cross validation.

V. RELATED WORK

This section discusses previous research efforts on detecting metamorphic malware.

A recent effort presented in [29] uses dynamic taint analysis (DTA) to automatically detect if an unknown sample exhibits malicious behavior or not. A so-called taint engine
tracks the flow of information of the whole system, which is compared against a set of defined policies. A proof of concept was implemented as a plugin of an emulator, and evaluated using 46 malware and 56 benign samples for testing the system. The evaluation yielded DR=100% and FPR=3%. However, while the proposed detector provides a fine-grained whole-system analysis, it is not completely automatic; human input is needed to make more accurate detections and to specify the policies. Furthermore, as the detector runs in an emulator, it probably takes a considerable amount of time for detection. Therefore it cannot be used for real-time malware detection.

In [25] the authors use model-checking to detect metamorphic malware. Model-checking techniques check if a given model meets a given specification. A program is modeled and a malware's behaviors are specified using a formal language; therefore the behavior of a program can be checked without executing the program. The proposed approach was evaluated with 200 malware and 8 benign Windows programs, yielding DR=100% with FPR=12.5%. Model-checking is time consuming and sometimes it can run out of memory. Consequently the proposed approach is not suitable for real-time detection.

A technique described in [12] uses value set analysis (VSA) [4] for detecting metamorphic malware, where malware binaries are run and traced inside a controlled environment to collect register values. Experimental evaluation using a dataset consisting of 826 files contaminated with 7 specimens of malware and 385 benign files achieved DR=98% and FPR=2.9%.

In [11], a technique is presented that uses API call-grams to detect malware. An API call-gram captures the sequence in which API calls are made in a program. First a call graph is generated from the disassembled instructions of a binary program. This call graph is converted to a call-gram, and the call-gram becomes the input to a pattern matching engine. The evaluation of the approach was conducted using a dataset consisting of 2550 benign and 3639 malware files, and achieved in the best case DR=98.4% and FPR=2.7%. The proposed system is not fully automated and cannot be used as a real-time detector.

Table VIII gives a summary of all the research efforts discussed above and of a few additional detectors not covered here due to space limitations. None of the prototype systems implemented can be used as a real-time detector. Furthermore, a few systems that claim a perfect detection rate were validated using small datasets.

VI. CONCLUSION AND FUTURE WORK

In this paper, we have presented a new metamorphic malware detection framework named MARD and shown through experimental evaluation its effectiveness for metamorphic malware analysis and real-time detection. We have also compared MARD with other such detection systems. Table VIII reports the best DR results achieved by these detectors. Out of the 10 systems, MARD clearly shows results at the top and, unlike the others, is fully automatic, supports malware detection for 64-bit Windows (PE binaries) and Linux (ELF binaries) platforms, and has the potential to be used as a real-time detector. Currently we are carrying out further research into using similar techniques for web malware analysis and detection.
Table VIII
SUMMARY AND COMPARISON WITH MARD OF THE METAMORPHIC MALWARE ANALYSIS AND DETECTION SYSTEMS DISCUSSED IN SECTION V

System               | Analysis Type | DR     | FPR   | Data Set Size (Benign/Malware) | Real-Time | Platform
MARD                 | Static        | 98.9%  | 4.5%  | 2330 / 1020                    | Yes       | Win & Linux 64
API-CFG [9], [10]    | Static        | 97.53% | 1.97% | 2140 / 2305                    | No        | Win 32
Call-Gram [11]       | Static        | 98.4%  | 2.7%  | 3234 / 3256                    | No        | Win 32
Chi-Squared [27]     | Static        | 89.74% | ~10%  | 40 / 200                       | No        | Win 32
DTA [29]             | Dynamic       | 100%   | 3%    | 56 / 42                        | No        | Win 64
Histogram [22]       | Static        | 100%   | 0%    | 40 / 60                        | No        | Win 32
Model-Checking [25]  | Static        | 100%   | 1%    | 8 / 200                        | No        | Win 32
MSA [28]             | Static        | 91%    | 52%   | 150 / 1209                     | No        | Win 32
Opcode-Graph [24]    | Static        | 100%   | 1%    | 41 / 200                       | No        | Win 32
VSA [12]             | Dynamic       | 98%    | 2.9%  | 385 / 826                      | No        | Win 64

The perfect results of 100% and 0% should be validated with a larger number of samples than tested in the respective papers. The DR and FPR values for Opcode-Graph are not directly stated in that paper; we computed these values by picking a threshold of 0.5 on the similarity score reported in the paper.

REFERENCES

[1] Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools (2nd Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2006.

[2] Shahid Alam. MAIL: Malware Analysis Intermediate Language. http://www.cs.uvic.ca/~salam/PhD/TR-MAIL.pdf, last accessed: February 3, 2014.

[3] Shahid Alam, R. Nigel Horspool, and Issa Traore. MAIL: Malware Analysis Intermediate Language - A Step Towards Automating and Optimizing Malware Detection. In Security of Information and Networks, SIN '13, New York, NY, USA, 2013. ACM SIGSAC.

[4] G. Balakrishnan, T. Reps, D. Melski, and T. Teitelbaum. WYSINWYX: What You See Is Not What You eXecute. PhD thesis, University of Wisconsin, 2005.

[5] J.M. Bauer, M.J.G. Eeten, and Y. Wu. ITU Study on the Financial Aspects of Network Security: Malware and Spam. © International Telecommunications Union, 2008.

[6] F. Cohen. Computer Viruses: Theory and Experiments. Computer Security, 6(1):22-35, February 1987.

[7] Stephen A. Cook. The Complexity of Theorem-Proving Procedures. In Proceedings of the third annual ACM symposium on Theory of computing, STOC '71, pages 151-158, New York, NY, USA, 1971. ACM.

[8] Symantec Corporation. Internet Security Threat Report - 2011 Trends. © Symantec Corporation, Volume 17, April 2012.

[9] Mojtaba Eskandari and Sattar Hashemi. A Graph Mining Approach for Detecting Unknown Malwares. Journal of Visual Languages and Computing, 23(3):154-162, June 2012.

[10] Mojtaba Eskandari and Sattar Hashemi. ECFGM: Enriched Control Flow Graph Miner for Unknown Vicious Infected Code Detection. Journal in Computer Virology, 8(3):99-108, August 2012.

[11] Parvez Faruki, Vijay Laxmi, M. S. Gaur, and P. Vinod. Mining Control Flow Graph as API Call-Grams to Detect Portable Executable Malware. In Security of Information and Networks, SIN '12, New York, NY, USA, 2012. ACM SIGSAC.

[12] Mahboobe Ghiasi, Ashkan Sami, and Zahra Salehi. Dynamic Malware Detection Using Registers Values Set Analysis. In Information Security and Cryptology, pages 54-59, 2012.

[13] Nwokedi Idika and Aditya P. Mathur. A Survey of Malware Detection Techniques. Purdue University, 2007.

[14] International Telecommunications Union (ITU). The World in 2011: ICT Facts and Figures. © ITU, 2011.

[15] Cullen Linn and Saumya Debray. Obfuscation of Executable Code to Improve Resistance to Static Disassembly. In ACM CCS, pages 290-299, New York, NY, USA, 2003. ACM.

[16] David M. Chess and Steve R. White. An Undetectable Computer Virus. Virus Bulletin Conference, September 2000.

[17] Sudarshan Madenur Sridhara and Mark Stamp. Metamorphic worm that carries its own morphing engine. Journal of Computer Virology and Hacking Techniques, 9(2):49-58, 2013.

[18] Timothy G. Mattson, Michael Riepen, Thomas Lehnig, Paul Brett, Werner Haas, Patrick Kennedy, Jason Howard, Sriram Vangal, Nitin Borkar, Greg Ruhl, and Saurabh Dighe. The 48-core SCC Processor: the Programmer's View. In SC'10, pages 1-11, Washington, DC, USA, 2010. IEEE Computer Society.

[19] Gary McGraw and Greg Morrisett. Attacking Malicious Code: A Report to the Infosec Research Council. IEEE Software, 17(5):33-41, September 2000.

[20] Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997.

[21] Philip OKane, Sakir Sezer, and Kieran McLaughlin. Obfuscation: The Hidden Malware. IEEE Security and Privacy, 9(5):41-47, September 2011.

[22] B.B. Rad, M. Masrom, and S. Ibrahim. Opcodes Histogram for Classifying Metamorphic Portable Executables Malware. In ICEEE, pages 209-213, September 2012.

[23] Thomas Raschke. The New Security Challenge: Endpoints. © International Data Corporation, 2005.

[24] Neha Runwal, Richard M. Low, and Mark Stamp. Opcode Graph Similarity and Metamorphic Detection. Journal in Computer Virology, 8(1-2):37-52, May 2012.

[25] Fu Song and Tayssir Touili. Efficient Malware Detection Using Model-Checking. In Dimitra Giannakopoulou and Dominique Méry, editors, FM: Formal Methods, volume 7436 of Lecture Notes in Computer Science, pages 418-433. Springer Berlin Heidelberg, 2012.

[26] D. Spinellis. Reliable Identification of Bounded-Length Viruses is NP-Complete. Information Theory, IEEE Transactions on, 49(1):280-284, January 2003.

[27] Annie H. Toderici and Mark Stamp. Chi-squared Distance and Metamorphic Virus Detection. Journal in Computer Virology, pages 1-14, 2012.

[28] P. Vinod, V. Laxmi, M.S. Gaur, and G. Chauhan. MOMENTUM: Metamorphic Malware Exploration Techniques Using MSA Signatures. In IIT, pages 232-237, March 2012.

[29] Heng Yin and Dawn Song. Privacy-Breaching Behavior Analysis. In Automatic Malware Analysis, SpringerBriefs in Computer Science, pages 27-42. Springer New York, 2013.
