
A Comparison of Dynamic Malware Analysis Systems

and Security Information and Event Management
Systems for Malware Analysis

Nikolaos Katsamakis

Submitted in partial fulfilment of the requirements of
Edinburgh Napier University for the degree of
Master of Science in Advanced Security and Digital Forensics

School of Computing
August 2014

Authorship Declaration
I, Nikolaos Katsamakis, confirm that this dissertation and the work presented in it are my
own achievement.

Where I have consulted the published work of others this is always clearly attributed;

Where I have quoted from the work of others the source is always given. With the
exception of such quotations this dissertation is entirely my own work;

I have acknowledged all main sources of help;

If my research follows on from previous work or is part of a larger collaborative research


project I have made clear exactly what was done by others and what I have contributed
myself;

I have read and understand the penalties associated with Academic Misconduct.

I also confirm that I have obtained informed consent from all people I have involved in the
work in this dissertation, following the School's ethical guidelines.

Signed:

Date:

Matriculation no: 40132614



Data Protection Declaration


Under the 1998 Data Protection Act, The University cannot disclose your grade to an
unauthorized person. However, other students benefit from studying dissertations that
have their grades attached.

Please sign your name below one of the following options to state your preference.

The University may make this dissertation, with indicative grade, available to others.

The University may make this dissertation available to others, but the grade may not be
disclosed.

The University may not make this dissertation available to others.



Acknowledgements
I would like to take this opportunity to express my gratitude to the academic staff of the School of
Computing at Edinburgh Napier University for their inspiring guidance, invaluable
constructive criticism and friendly advice and attitude during the MSc course in Advanced
Security and Digital Forensics.

I want to express my warm thanks to my supervisor, Richard MacFarlane, for his guidance,
encouragement, insightful questions and constructive criticism during this project and,
above all, for his support during the entire MSc course.

I also wish to thank Professor Bill Buchanan, who helped me enrich my work by
planting ideas for this project, for his inspiration and example, and for being my internal
supervisor.

Table of Contents
Authorship Declaration 2
Data Protection Declaration 3
Acknowledgements 4
List of Tables 7
List of Figures 8
Abstract 12
1.0 Introduction 13
1.1 Problem Outline 13
1.2 Aim and objectives 14
1.3 Background Research 14
1.4 Dynamic Malware Analysis Systems Technical Review 17
2.0 Literature Review 22
2.1 Introduction 22
2.2 Dynamic malware analysis techniques 22
2.3 Issues and Limitations of Dynamic Malware Analysis systems 27
2.3 Dynamic Malware Analysis Systems Detection Evaluation 30
2.4 SIEM systems and Dynamic Malware Analysis 33
2.5 Conclusions 34
3.0 Experimental System Design 36
3.1 Introduction 36
3.2 Sandbox Design 36
3.3 Comparison metrics 39
3.4 Malware samples 41
3.5 Comparison procedure 42
3.6 Conclusions 44
4.0 Implementation 46
4.1 Introduction 46
4.2 Objective 46
4.3 Malware Samples 46
4.4 Virtual Environment 48
4.4 FakeNet 48
4.5 Sandboxie and Buster Sandbox Analyzer 50
4.6 Cuckoo 52
4.7 Experimental Splunk Sandbox environment 56
4.8 Conclusions 61
5.0 Evaluation 62
5.1 Introduction 62
5.2 Sample Reports on the Analysis of Malware Samples 62
5.3 Systems implementation comparison 75
5.4 Dynamic Malware Analysis systems reporting comparison 78
5.5 DMA systems report comparison to Splunk Experimental Sandbox findings 92
5.6 Conclusions 96
6.0 Conclusion 98
6.1 Overall Conclusion 98
6.2 Appraisal of Achievements 100
6.3 Future work 103
Splunk sandbox 104
6.4 Personal Reflection 105
References 108
Appendix A: Project Proposal 112
Appendix B: Project Plan 117
Appendix C: Project Diary 118
Appendix D: DMA Analysis Report samples 125

List of Tables
Table 1: Implementation Techniques of DMAs (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011) 31
Table 2: System implementation techniques and monitoring mechanisms 76
Table 3: Network Activity Interpretation Discrepancies 79

List of Figures
Figure 1: Cuckoo's Architecture 18
Figure 2: Sandboxie's architecture 19
Figure 3: TRUMAN half proxy architecture 37
Figure 4: Experimental SIEM sandbox topology design for malware detection and analysis 38
Figure 5: Analysis Procedure 40
Figure 6: Samples analysis 43
Figure 7: Report analysis and comparison 44
Figure 8: VMware Machines used for the DMAs and the SIEM experimental sandbox 48
Figure 9: FakeNet network taunt Services initiating 49
Figure 10: FakeNet monitoring example 49
Figure 11: Sandboxie Chrome instance 50
Figure 12: Buster injection in Sandboxie configuration file 50
Figure 13: Buster pointed to the appropriate Sandbox 51
Figure 14: Sample Buster Analysis 51
Figure 15: VirtualBox Guest VM to be used for analysis by Cuckoo 52
Figure 16: Cuckoo service initiated, waiting for analysis tasks 53
Figure 17: Cuckoo utilities and web interface initiation 53
Figure 18: Cuckoo web interface for submission of executables to be analyzed 54
Figure 19: Cuckoo sample file details report 55
Figure 20: Activity summary as Signatures and Screenshots during execution 55
Figure 21: Cuckoo Network analysis, behaviour summary and mutexes created report 56
Figure 22: Splunk Universal Forwarder data inputs 56
Figure 23: Splunk's local Registry monitoring feature 57
Figure 24: Implemented topology of the Splunk sandbox due to the limitations 58
Figure 25: Malware process identified in Splunk monitoring for malware Cryptolocker sample SHA256:
d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9 59
Figure 26: DNS requests triggered by Cryptolocker malware sample SHA256:
d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9 59
Figure 27: Visualization of port activity for Cryptolocker malware sample SHA256:
d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9 60
Figure 28: Filtering by Request count for malware services deployed at 0.0.0.0. by Cryptolocker malware
sample SHA256: d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9 60
Figure 29: Visualization of network traffic for the Malware machine and the services initiated by the
malware sample at 0.0.0.0. Sample SHA256:
d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9 61
Figure 30: FakeNet Reporting on malware network activity 62
Figure 31: FakeNet sample DNS requests reported for the Cryptolocker sample 63
Figure 32: Buster report on DLL, file system and registry activity 64
Figure 33: Buster report on Domains contacted and Process information 64
Figure 34: Cuckoo file details report for the sample 65
Figure 35: Cuckoo signature and activity summary report for the sample 65
Figure 36: Cuckoo domains involved report 66
Figure 37: Cuckoo processes report 66
Figure 38: Cuckoo network and behaviour report 67
Figure 39: Cuckoo Registry report 67
Figure 40: Anubis report on the original executable and its network activity 68
Figure 41: Anubis report on the original executable registry and file system activities 69
Figure 42: Anubis report on the original executable process creation activities 69
Figure 43: Anubis report on the processes created by the original executable 70
Figure 44: Anubis report on network activity of the processes created by the original executable 70
Figure 45: Processes reported by Splunk parsing the Windows Security event log. 71
Figure 46: Splunk reports suspicious activity event by a strange process which starts Services at
0.0.0.0:56124 72
Figure 47: Splunk Report on Network activity Source Address 72
Figure 48: Search for Domain requests reported back all domains reported by the DMA systems 73
Figure 49: Splunk has detected a few more hosts in the DNS traffic initiated by the malware 73
Figure 50: Token elevation by the malware initiated process 73
Figure 51: Splunk also reported token elevations for cmd.exe which was the application that launched
the malware sample 74
Figure 52: Registry key query by injected code in explorer.exe 74
Figure 53: IDS log Parsed in Splunk to monitor network activity 75
Figure 54: DMA systems network report discrepancies 80
Figure 55: System activity reported by the DMAs for Cryptolocker C&C 82
Figure 56: Network activity reported by the DMAs for Cryptolocker C&C 82
Figure 57: System activity reported by the DMAs for Zeus Bot Key logger 83
Figure 58: Network activity reported by the DMAs for Zeus Bot Key logger 83
Figure 59: System activity reported by the DMAs for Cryptolocker Crypt 84
Figure 60: Network activity reported by the DMAs for Cryptolocker Crypt 84
Figure 61: System activity reported by the DMAs for Compressed Botnet 85
Figure 62: Network activity reported by the DMAs for Compressed Botnet 85
Figure 63: System activity reported by the DMAs for Zeus Bot 86
Figure 64: Network activity reported by the DMAs for Zeus Bot 86
Figure 65: System activity reported by the DMAs for Zeus banking key logger Bot 87
Figure 66: Network activity reported by the DMAs for Zeus banking key logger Bot 87
Figure 67: System activity reported by the DMAs for Alphx family Worm 88
Figure 68: Network activity reported by the DMAs for Alphx family Worm 88
Figure 69: System activity reported by the DMAs for Nasser Family Worm 89
Figure 70: Network activity reported by the DMAs for Nasser Family Worm 89
Figure 71: System activity reported by the DMAs for Email Bomber Mailnuke 90
Figure 72: Network activity reported by the DMAs for Email Bomber Mailnuke 90
Figure 73: System activity reported by the DMAs for Email Bomber 91
Figure 74: Network activity reported by the DMAs for Email Bomber 91
Figure 75: Splunk retrieved a registry Query initiated by a code injection in explorer.exe 93
Figure 76: Splunk retrieves a spawned process log which initiates services at 0.0.0.0:56124 94
Figure 77: Splunk retrieves token elevation information initiated from code injection in cmd.exe 94
Figure 78: Splunk's automated report on traffic Source Address for Crypto C&C malware sample. 96
Figure 79: Project plan Gant chart 117
Figure 80: Diary week 4 118
Figure 81: Diary week 5 120
Figure 82: Diary week 6 122
Figure 83: Diary week 7 122
Figure 84: Diary week 9 124
Figure 85 FakeNet HTTP request reported 125
Figure 86: DNS requests and socket creation 126
Figure 87: FakeNet Socket creation (continued) 127
Figure 88: Buster Analysis report 128
Figure 89: Buster File system and registry report 129
Figure 90: Buster Processes and Mutexes report 130
Figure 91: Cuckoo Binary report 131
Figure 92: Cuckoo Summary and screenshots 131
Figure 93: Cuckoo registry report 132
Figure 94: Cuckoo registry report (continued) 132
Figure 95: Cuckoo execution screenshot 133
Figure 96: Anubis binary report 134
Figure 97: Anubis process summary 135
Figure 98: Anubis process analysis 135
Figure 99: Anubis process analysis 2 136
Figure 100: Anubis process analysis 3 136
Figure 101: FakeNet Adobe execution error 137
Figure 102: Buster summary report 138
Figure 103: Buster registry analysis 139
Figure 104: Buster Process and Mutex analysis 140
Figure 105: Cuckoo binary analysis 141
Figure 106: Cuckoo summary and screenshots 141
Figure 107: Cuckoo Adobe execution error 142
Figure 108: Cuckoo behavior summary 142
Figure 109: Cuckoo Mutexes and Registry 143
Figure 110: Cuckoo Process activity 143
Figure 111: Cuckoo registry activity 144
Figure 112: Cuckoo Service initiation 144
Figure 113: Anubis process report 145
Figure 114: Anubis network activity report 145
Figure 115: Anubis spawned process report 146


Figure 116: Anubis spawned process report (continued) 146
Figure 117: Anubis spawned process report 147
Figure 118: Anubis Mutexes report 147

Abstract
Dynamic Malware Analysis (DMA) systems are automated systems that can analyse malware and
produce reports on its malicious activity based on the monitoring mechanisms they implement.
The literature review carried out has shown that the detection abilities of these systems
depend on the analysis and monitoring techniques implemented in each system, and that they
are prone to issues such as interpretation discrepancies in their reports
(Massicotte, Couture, Normandin, & Michaud, 2012). SIEM systems show similarities with
DMA systems, since they also employ monitoring mechanisms to analyse data, and could
therefore overcome some of the issues that DMA systems face (Gabriel, Hoppe, Pastwa, & Sowa, 2009).
A comparison experiment of common DMA systems has been designed, based on the
literature, to identify issues of DMA systems. A SIEM sandbox has also been designed for
dynamic malware analysis. The SIEM sandbox, implemented with Splunk, was able to
detect malware activity by processing data from an IDS implementation for network
activity together with Splunk's own security data collection mechanisms. The comparison between DMA
systems highlighted issues regarding their implementations, discrepancies in their reports
and detection issues, through a quantitative approach on the finding categories of the
reports, to assess the effectiveness of the systems. Splunk's detection abilities were also
compared to the findings of the DMA systems, concluding that it introduced more
comprehensive application monitoring by reporting on token elevation, which is connected to
privilege escalation. Splunk's visualization abilities in reporting the activity caused by the malware
samples gave a more comprehensive view for understanding the true purpose of malware.

1.0 Introduction
One of the most significant threats nowadays is malicious code. Malicious code can be
found in executable files or other common file types, such as documents, and can spread very
fast even in large organizations that spend enormous amounts of money securing their
systems (Sikorski & Honig, 2012). This type of malicious code is commonly known as malware,
and the threats it poses are increasingly severe, since its authors most often aim at financial
gain by damaging or ransoming organizations, or by tricking individuals and stealing their data.

In order to defend against such threats, organizations invest in security and
monitoring infrastructures, but these threats are hard to identify and it is difficult to create
countermeasures to mitigate their effect. While traditional antivirus mitigation relies on
signature detection to identify potentially harmful software and files and to alert users and
administrators, today's more sophisticated and evolved malware poses a much graver
threat, since its authors employ obfuscation and anti-detection techniques to help
them achieve their goals (Bayer, Moser, Krugel, & Kirda, 2006).

Dynamic malware analysis systems are used to analyse malware by executing a
possibly malicious piece of code or executable and detecting its effects upon execution
using specialized monitoring mechanisms. Dynamic malware analysis systems
can therefore address the issue of successfully analysing malicious software where other
techniques may fail (Bayer, Moser, Krugel, & Kirda, 2006).

Dynamic Malware Analysis systems are very widely used nowadays to assist in the analysis
of unknown executables and potentially malicious code which could be hidden in different
types of files. The detection accuracy of such systems depends on their implementation
and detection abilities, as well as on how they report the analysis performed. These
systems are not developed following a common framework, which could otherwise help in
avoiding the discrepancies that appear between different systems.

1.1 Problem Outline


Malware threats are increasing day by day, and the detection mechanisms that security
analysts use tend to evolve more slowly than the threats they face. Dynamic Malware Analysis
systems are automated systems that help security analysts assess the potential
maliciousness of unknown files and executables, but they depend on their
implementation to successfully detect threats.

Several Dynamic Malware Analysis systems have been implemented and used by analysts
around the world, but many of these systems may not be able to detect malicious code for
various reasons that may derive from their core implementation, the complexity of current
malware or even bugs in the reporting mechanisms that the systems use.

1.2 Aim and objectives


The aim of this dissertation is to compare current Dynamic Malware Analysis systems,
evaluate them and identify issues, as well as to compare such systems to a novel Security
Information and Event Management system sandbox for Dynamic Malware Analysis to
explore the possibility of overcoming the issues that DMA systems face.

Objectives:

1. Review the literature around the subject of Dynamic Malware Analysis systems
including implementation techniques, issues and limitations and perform research
into SIEM technology and how it might be used for a similar purpose.
2. Design a comparison system and experiments for common Dynamic Malware
Analysis systems to assess their detection capabilities and possible interpretation
discrepancies in their analysis of malware samples, and to compare between
systems. Also create a novel SIEM sandbox which can be compared with the DMA
system results.
3. Deploy DMA systems and implement a SIEM based dynamic malware analysis
sandbox, and run experiments with a range of malware samples to gather data for
analysis.
4. Evaluate DMA systems using appropriate metrics and compare the results to the
experimental SIEM sandbox findings.

1.3 Background Research


Malware analysis is the procedure of identifying and analysing software with potentially
malicious purpose against users, systems and networks. When a piece of potential
malware is being analysed, the goal is to determine what actions a potentially malicious
executable can perform, how it can be detected and how it could be contained (Xie, Lu, Su,
Wang, & Li, 2013). The purpose of this analysis is to identify malicious executables and
to understand how the malware works and what problems it can create when executed in a
normal corporate environment, since these are common questions raised by senior
management that need to be answered.

Malware executables are usually not readable by humans and therefore require strong
reverse engineering skills, which can reveal large amounts of information
through complex procedures and the use of specific tools. There are two main approaches
when analysing potential malware: static analysis and dynamic analysis.

Static analysis techniques involve examining executable files to confirm whether a file is
malicious without necessarily looking at the actual instructions. They provide information
about the file's functions and sometimes the ability to create basic signatures that can be
used to detect such malware (Moser, Kruegel, & Kirda, 2007). Basic static analysis can be
quick and easy to perform, but it can be ineffective against more complex malware and miss
important behaviours. More advanced static analysis techniques include reverse-engineering
executables by disassembling the malware executable and analysing the instructions to
determine what actions the program performs. These are the instructions executed by the
CPU, so this type of analysis can shed light on the exact actions of a program. However, this
deep analysis of an executable requires detailed knowledge of disassembly and of Windows
operating system concepts such as the Windows API.
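As an illustrative sketch of the kind of information basic static analysis can yield, the following Python snippet (written for this discussion, not part of any of the systems reviewed) computes cryptographic hashes of a sample and extracts printable strings, which often hint at URLs, file paths or registry keys used by the malware.

```python
import hashlib
import re
import sys

def hashes(path):
    """Return MD5 and SHA-256 digests of the file, read in chunks."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()

def strings(path, min_len=6):
    """Extract runs of printable ASCII characters, similar to the Unix strings tool."""
    data = open(path, "rb").read()
    return [s.decode("ascii") for s in re.findall(rb"[ -~]{%d,}" % min_len, data)]

if __name__ == "__main__":
    sample = sys.argv[1]            # path to the suspicious executable
    md5, sha256 = hashes(sample)
    print("MD5:    ", md5)
    print("SHA256: ", sha256)
    for s in strings(sample)[:50]:  # show the first 50 strings only
        print(s)
```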

Dynamic analysis techniques derive from observing an executable's behaviour on a system
while the executable is being run (Yee, Chuan, Ismail, & Zainal, 2012). This procedure can
lead to ways of disinfection and to effective signatures for detection. In order to run
malware safely, an environment which allows studying the running executable needs to
be established, so that infection of actual systems or networks is avoided. These techniques
are easy to use, but they may not be effective against every single malware sample and may
miss some important features. A more advanced dynamic malware analysis approach
involves running the malware under a debugger to examine the internal state of the
executable. This technique can provide information that may be difficult to gather
using the previously mentioned techniques.
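A very simple way to picture what a dynamic monitoring mechanism does is to diff the state of the system before and after a sample runs. The sketch below is an illustration written for this discussion, not one of the reviewed systems; it uses the third-party psutil package and a hypothetical sample path, and it assumes the code is already running inside an isolated environment.

```python
import subprocess
import psutil  # third-party package: pip install psutil

def process_snapshot():
    """Return {pid: name} for all currently running processes."""
    snap = {}
    for proc in psutil.process_iter(attrs=["pid", "name"]):
        snap[proc.info["pid"]] = proc.info["name"]
    return snap

def run_and_diff(sample_path, timeout=60):
    """Execute the sample (inside an isolated VM/sandbox!) and report new processes."""
    before = process_snapshot()
    try:
        subprocess.run([sample_path], timeout=timeout)
    except subprocess.TimeoutExpired:
        pass  # many samples never exit on their own
    after = process_snapshot()
    return {pid: name for pid, name in after.items() if pid not in before}

if __name__ == "__main__":
    for pid, name in run_and_diff(r"C:\analysis\sample.exe").items():
        print(f"new process: {pid} {name}")
```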

The type of analysis required for a piece of malware varies and depends on
the type of malware being analysed, the environment the malware would infect
and the malware code itself, which could, for example, be obfuscated or polymorphic.

Malware types
When performing malware analysis, the procedure can be sped up by correctly guessing
what the malware is trying to do and confirming that hypothesis with findings. Such guesses
are improved by a good knowledge of malware types and their usual behaviour (Sikorski
& Honig, 2012).

Backdoor: Malicious code that, when installed on a system, provides access to an
attacker. Backdoors allow attackers to connect to a system with minimal or no
authentication and execute commands on that system (Agrawal, et al., 2010).

Botnets: Botnets are similar to backdoors but on a larger scale, since they infect multiple
systems and the attacker can send instructions to all of these systems to perform further
actions (Li, Duan, Liu, & Wu, 2010).

Downloader: This type refers to programs whose sole intention is to download other
malicious code. They are commonly used by attackers during the early stages of an attack
since the downloader program will download and install or execute further malicious code
(Peng, Qingping, Huijuan, & Xiaoyi, 2010).

Information stealing malware: Programs used to steal information from a system and send
it to the attacker. The information could be data, password hashes, keystrokes or other
information useful to the attacker (Wang, Mao, Wei, & Lee, 2013).

Rootkit: Rootkits are programs used to hide other malicious code and are usually
paired with other types of malware, such as backdoors, to hinder the detection of an
attack (Baliga, Ganapathy, & Iftode, 2011).

Ransomware or Scareware: Malware designed to frighten people or
corporations. Its purpose is to force the victim to comply with the attacker's demands
and make a payment to remove the malware and reverse its effects. The most common
action of ransomware is to encrypt data and demand that the victims pay to restore their
data (Garber, 2013).

Spam-sending malware: Malware that uses infected systems to send spam to other
systems in order to generate income for the attacker by selling spam services or to perform
social engineering attacks (Wang, Mao, Wei, & Lee, 2013).

Worms and Viruses: Both consist of malicious code which can copy itself in order to
infect other systems. Worms replicate through network connections and
services, while traditional viruses infect a single system (Wang, Mao, Wei, & Lee, 2013).

These general categories cover most, if not all, types of known malware
nowadays, but in order to analyse them and extract information on the actions that the
malware performs when launched on a system, it is important to set up a safe
environment to avoid infection of other machines and networks.

Sandboxes
A safe environment in which to perform the analysis of malware is very important, because it
mitigates the risk of exposing production machines and networks. Virtual machines
can be very helpful in malware analysis since they provide a safe and
controlled environment (M.Damshenas, A.Dehghantanha, & R.Mahmoud, 2013). An air-
gapped network is a network of machines disconnected from all other networks, and it can
replicate a real environment; its disadvantage is the lack of an internet connection, which
can be essential for a piece of malware to show its purpose by receiving commands,
updating or performing other actions.

Virtual machines, and more complex structures or networks of virtual machines, can be used to
create sandboxes, as they provide a safe environment in which to analyse malicious software or
actions (Sikorski & Honig, 2012). Sandboxes are usually used to perform dynamic malware
analysis, since the malicious code is then executed in a safe environment
with minimal risk of infecting other machines.

Dynamic Malware Analysis Systems


Dynamic Malware Analysis Systems (DMAs) are widely used since they are automated
platforms that can provide information about unknown executables or potential malware.
These systems are set up by their creators to work in a very specific way. This means that each
sandbox may have a different architecture in which potential malware is executed and
different mechanisms in place to detect the activity of such executables. This can create
detection issues, since not all DMAs will be able to detect all types of malware, depending
on their architecture and the initial design goals of the system (Massicotte, Couture,
Normandin, & Michaud, 2012). A good example of the less automated end of the spectrum is the
Microsoft Sysinternals suite, which is popular since it provides tools for manual dynamic malware
analysis, such as ProcDump, which can isolate and reproduce CPU spikes; Process Explorer, which
can show open files as well as the registry keys and DLLs that are loaded; and Process Monitor,
which can monitor file system, registry, process, thread and DLL activity in real time (Microsoft, 2014).
Sysinternals, however, does not offer any automation of the process of monitoring malware activity.
Some of the commonly used, automated DMA systems nowadays include Cuckoo, FakeNet,
Comodo, Anubis, Sandboxie and Buster Sandbox Analyzer (M.Egele, T.Scholte, E.Kirda, &
C.Kruegel, 2011).

1.4 Dynamic Malware Analysis Systems Technical Review


Cuckoo
Cuckoo is a fairly new DMA system. It has been regularly updated since its first beta release in 2011,
and in January 2012 the development team added an online service through the website
Malwr.com, which allows users to use Cuckoo as an online platform instead of installing it
on a system or a virtual machine to perform analysis tasks. Cuckoo is a system with many
capabilities, since it is designed to retrieve as much information as
possible when an unknown executable or unknown malware is executed in it.

It is able to retrieve:

 Traces of win32 API calls by processes spawned by malware


 Files that are downloaded, created or deleted by the malware during execution
 Memory Dumps of processes initiated by the malware
 Screenshots of the Windows machine (sandboxed VM) during execution of the malware
 Complete memory dumps of machines (Cuckoo Foundation, 2014)

Its design goals were not restricted to executables; they included other common forms of
files, and it has also been designed so that it can be integrated into larger frameworks. It is
able to analyse common file types like ZIP, JAR, VB scripts, PHP scripts, Microsoft Office
documents, PDF documents, DLL files and generic Windows executables, as well as other
types of files (Cuckoo Foundation, 2014). Cuckoo is designed around a main host machine
which runs the management software that controls the guest virtual
machines and manages the analysis, while the analysis procedure itself takes place in
those virtual machines, which report back to the main host. A big advantage of this system is
that every analysis instance is run on an isolated machine, avoiding collisions between different
analysis procedures (Cuckoo Foundation, 2014).
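Cuckoo also exposes a small REST API (served by its bundled api.py utility) through which samples can be submitted programmatically. The sketch below is a minimal illustration of such a submission, assuming the API is running on its default address of localhost:8090 and using the third-party requests package; the endpoint names follow the Cuckoo documentation of that period and should be checked against the installed version.

```python
import requests  # third-party package: pip install requests

API = "http://localhost:8090"   # default address of Cuckoo's api.py utility

def submit_sample(path):
    """Submit a file to Cuckoo for analysis and return the assigned task id."""
    with open(path, "rb") as sample:
        response = requests.post(
            API + "/tasks/create/file",
            files={"file": (path, sample)},
        )
    response.raise_for_status()
    return response.json()["task_id"]

def fetch_report(task_id):
    """Retrieve the JSON report once the analysis has finished."""
    response = requests.get(API + "/tasks/report/%d" % task_id)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    task = submit_sample(r"C:\samples\suspicious.exe")  # hypothetical sample path
    print("submitted as task", task)
```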

Figure 1: Cuckoo's Architecture



Sandboxie and Buster Sandbox Analyzer


Sandboxie is another automated system that can be used for dynamic malware analysis. It
is designed to protect systems by creating a sandbox on the hard drive, which is
intended to prevent rogue software or the installation of programs that the user has not
given permission to install, as well as different types of malware such as viruses,
worms, spyware and ransomware (Sandboxie Holdings, 2014). It is
configurable to meet the needs of the user. It allows programs such as web
browsers to be executed in a sandboxed environment, to avoid infection from
malicious software or code and to prevent attacks, by giving the user the option to force
an application to run in the sandbox; at the same time it allows multiple sandboxes to run
concurrently.

It is designed so that a sandbox is created on the hard disk of a machine, and any changes or
data created by applications, or stored for future use, are contained within the
sandbox. In this isolated area of the hard drive, the changes that applications or
executables may cause do not affect the rest of the system. Instead of applications
writing data to multiple locations spread across the hard drive, the data is
stored inside the sandboxed area.

Figure 2: Sandboxie's architecture

Buster Sandbox Analyzer is a freeware tool designed to work with Sandboxie in order to
analyse how processes behave in a sandboxed environment, taking into account the
changes they make to a system by monitoring the Windows API, and to assist analysts in
evaluating whether these processes have malicious intentions or perform suspicious actions. It is
able to detect file system, registry and port changes. From the changes made, Buster
collects information to evaluate whether there is suspicious activity in the actions
taken by applications executed in the sandboxed environment. Buster is a useful tool
in combination with Sandboxie, since its main goal is to detect any suspicious
activity from applications and monitor their behaviour while in the sandboxed environment
(Buster, 2013). It can also take into account other actions performed by programs, such as
key logging, ending a Windows session, loading drivers and the installation paths used by any
kind of installation.

FakeNet
FakeNet is another automated malware analysis tool. It simulates a network to trigger
malware that tries to connect to a remote host over the internet. It is able to track
the activity of executables that try to contact another machine or request
instructions through a network connection, by virtualizing a network and detecting
the network activity of the file being executed through FakeNet. Although the tool supports
network detection, it performs no system detection, so it is designed specifically for detecting
network propagation and the activity of malware such as worms. FakeNet uses a variety of
Windows and third-party libraries to support DNS, HTTP and SSL (Siko, 2014). The most
important feature of this dynamic malware analysis system is that the HTTP service is able
to serve meaningful files, such as web pages or images, upon request; it can therefore
trigger a piece of malware to show its true intentions when it contacts a
well-known web page to check for an internet connection before performing the malicious
actions it is intended for.
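The idea behind such an HTTP service can be sketched in a few lines of Python. The snippet below is an illustration written for this discussion, not FakeNet's actual implementation: it answers every GET request with a small, harmless HTML page, so that a sample probing for internet connectivity receives a plausible response.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

FAKE_PAGE = b"<html><body>OK</body></html>"  # harmless content returned to every request

class FakeHTTPHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Log which URL the sample asked for, then serve the fake page.
        print("request for", self.path, "from", self.client_address[0])
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(FAKE_PAGE)))
        self.end_headers()
        self.wfile.write(FAKE_PAGE)

if __name__ == "__main__":
    # Traffic must be redirected here (e.g. via DNS or routing) for the sample to reach it;
    # binding to port 80 normally requires elevated privileges.
    HTTPServer(("0.0.0.0", 80), FakeHTTPHandler).serve_forever()
```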

Anubis
Anubis is an acronym for Analysis of Unknown Binaries and is based on TTAnalyze.
Anubis executes samples in an emulated Windows XP guest running in QEMU (Bayer, Moser,
Krugel, & Kirda, 2006). The analysis takes place online: the executable is uploaded, an
analysis instance is initiated and the malware is executed in a cloud sandbox.
Analysis is performed by detecting and analysing Windows API functions and the
parameters passed to these functions (Bayer, Moser, Krugel, & Kirda, 2006). Function calls
are monitored by comparing the instruction pointer of the CPU with the known entry points of
these functions in shared libraries. This procedure is complemented by DNS monitoring of
the analysis system. Anubis also passes function arguments to call-back routines that
are designed to perform analysis and monitoring steps, such as file and registry handling
operations.

The above dynamic malware analysis systems and tools can prove very helpful in the
analysis of malicious executables, and each of them faces its own limitations due to
design issues which may not be easily detected. It is therefore important to review
the academic work and advancements in dynamic malware analysis in order to assess the
tools and their capabilities.

2.0 Literature Review

2.1 Introduction
This chapter reviews topics relevant to the aims and objectives defined in the
introduction of the thesis. The automation of dynamic malware analysis has been studied by
academics over the past years, which has led to the increasing popularity of DMAs.

Several studies have discussed and designed techniques that could be used to detect
malware activity. DMA systems also face limitations and issues due to the obfuscation and
detection evasion techniques employed by malware authors, who try to develop stealthy
malware that achieves its goals without leaving traces. Another issue is that sophisticated
malware may detect analysis environments. To counter these issues, several ways to avoid the
detection of analysis systems have been proposed by researchers.

Studies have used different approaches to evaluate the detection abilities of DMA systems,
such as interpretation discrepancy studies, implementation comparisons and quantitative
approaches based on the results produced by the analyses performed by the systems.

Finally, SIEM systems have been used lately for security monitoring of large infrastructures
due to their ability to process large amounts of data. Such systems are similar to DMA
systems and can possibly be used for dynamic malware analysis.

2.2 Dynamic malware analysis techniques


Even though most available systems nowadays have been studied and assessed, it is more
common to find these studies in university libraries than on a chief security officer's
desk, due to distribution issues (M.Damshenas, A.Dehghantanha, & R.Mahmoud, 2013). In
order to assist in malware detection and analysis, several techniques have been introduced
and studied. Anomaly-based detection is a method based on detecting
suspicious activity that deviates from a pattern of operation which is considered normal.
Another technique involves malware honeypot infrastructures, which are intended to
attract malware by presenting an environment that appears easy to infect and compromise,
while in fact it tricks the malware into showing what it is intended for in a non-production,
safe environment. The most common technique is sandboxing. Sandboxing usually requires
virtual machine emulation (VME), which can cause detection issues, since malware authors
are aware of this technique and its popularity and try to find ways to detect VME and therefore
avoid detection, for example by inserting mechanisms into the malware that prevent it
performing any actions if a virtual machine is detected
(M.Damshenas, A.Dehghantanha, & R.Mahmoud, 2013).

Function call monitoring


Functions are parts of code that perform specific tasks. They are interesting in program
analysis because they abstract implementation details and help in quickly acquiring an
overview of the functionality of a program. It is possible to intercept a function call, and this
process is called hooking (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011). Function hooking
is essentially a monitoring process which can, for example, record the function invocation to a
log file or analyse the parameters that were passed in. This process can be very helpful in an
analysis procedure, since most malware uses the operating system API of the infected
machine. APIs consist of functions that expose certain functionality, which is very
helpful when, for example, a piece of malware is manipulating the Windows API. System
calls are well-defined instructions that give applications the ability to have the operating
system perform actions on their behalf, because applications are not allowed
to interact directly with the system environment: they run in user space, while
the operating system, which runs in kernel space, is able to perform such actions
(Massicotte, Couture, Normandin, & Michaud, 2012). Hooking API functions gives an
analysis tool the ability to monitor a program at a more abstracted level, since the
monitoring involves the function being called.

Windows API functions can even be used to detect network propagation of malware. By
monitoring and tainting file operation, network operation and process operation functions,
combined with multi-process tracing, it has been shown that the propagation of most worms
can be detected successfully (Wu, Zhang, Lai, & Su, 2012). Specifically, by monitoring the
functions of Ws2_32.dll, which is the networking Dynamic Link Library of Windows
systems, self-propagation behaviour can be monitored and reported.

The fact that user space applications need to use system calls to perform actions is very
important. During a dynamic malware analysis procedure, monitoring mechanisms that
focus on the user space can produce rich results. Such results can aid in a better
understanding of the functionality of malware samples.
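As a language-agnostic illustration of the idea (not how DMA systems hook the Windows API in practice, which is done at the binary level), the following Python sketch wraps a function so that every invocation and its parameters are written to a log before the real function runs.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

def hook(func):
    """Wrap a function so every call and its arguments are logged before execution."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.info("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logging.info("ret  %s -> %r", func.__name__, result)
        return result
    return wrapper

# Example: "hook" the built-in open() so file accesses made through it are recorded.
monitored_open = hook(open)

if __name__ == "__main__":
    with monitored_open("hooked_demo.txt", "w") as f:
        f.write("hello")
```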

Function parameter analysis


Dynamic function parameter analysis focuses on the analysis of the values that are used
when a function is called. This helps in correlating the different function calls made on a given
object and can provide useful information about a program's behaviour, since it offers a
different point of view in the analysis procedure. An object-oriented analysis of this kind
can provide valuable information which can be correlated with function calls and give a
deeper understanding of the functionality of a potentially malicious program (M.Damshenas,
A.Dehghantanha, & R.Mahmoud, 2013).

A good example of such analysis and its importance as part of dynamic malware analysis
was shown in a study of worm self-propagation, where it was reported that function
parameters can provide a lot of information during such procedures, since they can contain
port numbers, destination addresses, socket IDs and so on (Wu, Zhang, Lai, & Su, 2012).
Setting up monitoring mechanisms to capture function arguments can therefore prove essential
in understanding how malware operates.

Information Flow Tracking


Information flow tracking focuses on analysing how a program processes data. The aim of
this technique is to highlight how data that may relate to potential malware propagates
through a system. Data of interest that requires monitoring is marked, or
tainted, using various techniques in order to keep track of its taint level, which
increases as the data is processed. The implementation of
such monitoring can be hard to achieve, since the mechanism depends on the
application to be analysed and the language it was coded in (Shan & Shuangzhou,
2011). Information flow tracking is, however, not widely used for
malware analysis, and if such techniques are implemented in malware analysis tools, it is
possible that malware authors will find mechanisms to circumvent them.
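The core idea can be sketched with a small wrapper type that carries a taint label along with a value and propagates it through operations. This is purely illustrative (real taint tracking is done at the instruction or byte level, not in Python):

```python
class Tainted:
    """A value carrying a set of taint labels that propagates through operations."""

    def __init__(self, value, labels=None):
        self.value = value
        self.labels = set(labels or [])

    def _combine(self, other):
        # Taint from both operands is merged into the result.
        other_labels = other.labels if isinstance(other, Tainted) else set()
        other_value = other.value if isinstance(other, Tainted) else other
        return other_value, self.labels | other_labels

    def __add__(self, other):
        value, labels = self._combine(other)
        return Tainted(self.value + value, labels)

    def __repr__(self):
        return f"Tainted({self.value!r}, labels={sorted(self.labels)})"

# A value read from an untrusted source is tainted at the point of entry...
password = Tainted("s3cret", labels={"keyboard-input"})
# ...and any value derived from it inherits the taint, so a later network send
# of a tainted value can be flagged by the monitoring mechanism.
payload = Tainted("POST /exfil?data=") + password
print(payload)
```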

Autostart Extensibility Points


Autostart Extensibility Points are mechanisms that allow programs to be run when an
operating system boots or when a specific application is launched. Much malware
nowadays tries to persist across machine reboots by adding instructions for
its execution to available autostart extensibility points (Dai & Kuo, 2007).
It is therefore important for an analysis procedure to monitor such functions of
applications and operating systems when analysing a malware sample.
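On Windows, the Run keys in the registry are among the most commonly abused autostart extensibility points. The sketch below is illustrative only and limited to these two well-known keys out of the many ASEPs that exist; it lists their entries so that a before/after comparison can reveal persistence added by a sample.

```python
import winreg  # standard library, Windows only

# Two well-known autostart locations; real ASEP monitoring covers many more.
RUN_KEYS = [
    (winreg.HKEY_LOCAL_MACHINE, r"Software\Microsoft\Windows\CurrentVersion\Run"),
    (winreg.HKEY_CURRENT_USER, r"Software\Microsoft\Windows\CurrentVersion\Run"),
]

def autostart_entries():
    """Return a list of (hive, key, value_name, command) tuples for the Run keys."""
    entries = []
    for hive, path in RUN_KEYS:
        try:
            key = winreg.OpenKey(hive, path)
        except OSError:
            continue
        with key:
            index = 0
            while True:
                try:
                    name, command, _type = winreg.EnumValue(key, index)
                except OSError:
                    break  # no more values under this key
                entries.append((hive, path, name, command))
                index += 1
    return entries

if __name__ == "__main__":
    for _hive, path, name, command in autostart_entries():
        print(f"{path}\\{name} -> {command}")
```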

Implementations of Dynamic Malware Analysis systems


Each system created for dynamic malware analysis is defined by the decisions of its
creators, and these decisions can have various consequences. Systems that utilize the CPU
at higher privilege levels than the program being analysed cannot be affected by the
program. If a system runs with the same privileges as the program being analysed,
then it requires stealth mechanisms to avoid detection by the program under analysis.
Malware that executes at a higher privilege level than the analysis system can hide and avoid
detection (Sikorski & Honig, 2012). It has therefore been shown that there can be drawbacks in
the various methods of implementation used to date.

Analysis in Kernel or user Space


This method of analysis allows gathering information such as API calls and invoked
functions, and can access memory structures and high-level information from the operating
system. It is difficult to hide an analysis component that runs in user space, since from
user space alone it is usually not possible to hide a process from all other processes running
on the system (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011). Additionally,
when running an analysis component in kernel mode, it is complicated to perform
information flow tracking effectively.

Analysis in an Emulator
Executing a program in an emulated environment gives the analysis system the ability to
control every aspect of the program under analysis. This technique relies on emulating
different components, such as memory and the CPU, to execute the program. This is a major
drawback, since the program could easily detect an emulation environment by
requesting an operation that is not sufficiently implemented (Yin & Song, 2013). Full
system emulation can alleviate this issue, since a full system is being emulated and all
components would be in place, but there is still a chance that sophisticated malware may
detect imperfections in the CPU emulation, realize it is running in an emulated
environment, and stop its execution or not reveal its true purpose. If, however, the emulation
is not noticeable, then the analysis component will not be detected by the program being
analysed.

Analysis in Virtual Machines


The main advantage of VM technology is that the privileged state of the physical machine is
not directly accessible from any VM (Stone-Gross, Cova, Gilbert, Kemmerer, Kruegel, & Vigna,
2011). This means that the privileged state of a VM is managed by the Virtual Machine
Management (VMM) component. In contrast to emulators, the host and guest share the same
architecture, which allows the guest system to execute non-privileged instructions directly
on the physical CPU, resulting in improved performance (M.Egele, T.Scholte, E.Kirda, &
C.Kruegel, 2011). At the same time, virtual machines provide strong isolation of resources,
as emulators do. Therefore an analysis component can potentially be stealthy and avoid
detection.

Resetting the analysis environment


Achieving a fresh state for the system can raise performance issues, since such
a procedure can be very time consuming. Software imaging solutions are the slowest method
for resetting an analysis environment, since the original clean state needs to be imaged before
any use and, after each analysis, the system has to be restored from that image to its original
state (Sikorski & Honig, 2012).

Virtual machine snapshots are very efficient for this procedure, since the machines are saved
as regular files on the host machine's storage (VMware, 2014). By creating snapshots, the
VMM is able to restore the machine to the state it was in before the analysis procedure
started. This mechanism can be used to reduce the time that would otherwise be needed for
a full restore of a clean instance.

Resets of the analysis environment are of crucial importance, since malicious code is being
executed and changes may have been made to the analysis system. These changes may
prevent later analysis procedures from producing accurate results when detecting malware
activity. It is therefore vital that each analysis instance is run on a system in a clean state.
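Restoring a clean snapshot can be automated around the hypervisor's command-line tools. The sketch below wraps VirtualBox's VBoxManage commands (the VM and snapshot names are placeholders) to power off a guest, roll it back to a clean snapshot and start it again before the next analysis run.

```python
import subprocess

VM_NAME = "analysis-guest"        # placeholder VirtualBox VM name
SNAPSHOT = "clean-baseline"       # placeholder snapshot taken of the clean state

def vboxmanage(*args):
    """Run a VBoxManage command and raise if it fails."""
    subprocess.run(["VBoxManage", *args], check=True)

def reset_guest():
    """Power off the guest, restore the clean snapshot and boot it headless again."""
    # Poweroff may fail if the VM is already off, which is fine for our purposes.
    subprocess.run(["VBoxManage", "controlvm", VM_NAME, "poweroff"])
    vboxmanage("snapshot", VM_NAME, "restore", SNAPSHOT)
    vboxmanage("startvm", VM_NAME, "--type", "headless")

if __name__ == "__main__":
    reset_guest()
```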

Network simulation in DMAs


Network simulation is of high importance, since modern types of malware rely on a network
connection to receive instructions, download other malicious code or contact the attacker;
without network simulation such activities would not be detected by an
analysis system. In addition to providing internet access to a piece of malware, it is also
important to provide easy targets, such as honeypots, for infection, in order to study its
propagation methods (Gorecki, Freiling, Kührer, & Holz, 2011). There are two approaches
to restricting network access.

A simulated network is one method of providing a program with a network to access, but one
that has no connection to the real Internet. Common services can be simulated and traffic
redirected to them; such services include DNS, IRC, SMTP and FTP servers
(M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011). If the simulation is
sophisticated, it is highly likely that the malware will exhibit malicious behaviour, but if the
malware is designed to request updates or information from the real Internet, then its
actions may not be shown.
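A minimal example of such a simulated service is a DNS responder that answers every A-record query with the address of a local machine hosting the fake services, so that any domain the malware looks up resolves into the simulated network. The sketch below is illustrative only: it ignores EDNS and non-A queries, the fake address is a placeholder, and binding to port 53 normally requires elevated privileges.

```python
import socket

FAKE_IP = "10.0.0.2"   # placeholder address of the machine hosting the simulated services
BIND = ("0.0.0.0", 53)

def build_response(query):
    """Build a DNS response answering the first question with a single A record."""
    tid = query[:2]                              # transaction ID copied from the query
    flags = b"\x81\x80"                          # standard response, recursion available
    counts = b"\x00\x01\x00\x01\x00\x00\x00\x00" # 1 question, 1 answer
    qname_end = query.index(b"\x00", 12)         # end of the QNAME in the question
    question = query[12:qname_end + 5]           # QNAME + QTYPE + QCLASS
    answer = (b"\xc0\x0c"                        # pointer back to the QNAME
              b"\x00\x01\x00\x01"                # TYPE A, CLASS IN
              b"\x00\x00\x00\x3c"                # TTL 60 seconds
              b"\x00\x04" + socket.inet_aton(FAKE_IP))
    return tid + flags + counts + question + answer

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(BIND)
    while True:
        query, client = sock.recvfrom(512)
        print("DNS query from", client[0])
        sock.sendto(build_response(query), client)
```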

Filtered Internet access allows the malware to access the Internet, but the
communication channel is limited and strictly monitored. Filtering outgoing traffic through
an IDPS can achieve such results.

Network simulation techniques have been implemented in different platforms. An
interesting approach using both methods, in an attempt to study possible improvements
using malware samples taken from Anubis, has been reported. The system implemented was
able to use both methods, full emulation and half proxy, as mechanisms for network
simulation and filtered access. The proposed system design used a target machine
in which malware was executed while all traffic was monitored through an interceptor
with parallel logging, and a dispatcher forwarded traffic either to locally emulated services
or to the internet through the half proxy. Popular local services were used, such as FTP,
IRC, SMTP and web servers as well as DNS, and the study showed that more than 50% of
the malware that was executed initiated traffic in full emulation mode (Gorecki, Freiling,
Kührer, & Holz, 2011). The half proxy method reported wide activity as well, and it was more
successful in terms of logging information about the malware activity, such as
fingerprinting and cracking attempts, due to the responses from the emulated services.

Internet emulation is very important in dynamic malware analysis for identifying traffic
generated by malware and analysing the actions it performs. Proxy emulation can be very
successful in detecting activity when the targeted services are unreachable from the
system in which the malware is executed. DNS emulation can be crucial in detecting more
activity originating from malware that is trying to access the Internet.

2.3 Issues and Limitations of Dynamic Malware Analysis systems


Security analysts are faced with more and more sophisticated and evolved malware. It is
hard to mitigate the risks that derive from such malicious programs, since the procedure of
analysing a malware's behaviour and creating efficient countermeasures can be very time
consuming and requires deep knowledge of the different features of systems that malware
may exploit. Malware authors use various techniques to avoid detection and to get
their malicious software to propagate as much as possible, because these programs can be
used to produce profit (Sikorski & Honig, 2012).

Self-modifying code is one of these techniques. Malware authors have been using this
technique because it can make static analysis of malware more difficult and disguise the
malicious intent of the code (Kawakoya, Iwamura, & Itoh, 2010). These modifications
used to be included within the program itself, but recently packers have been used for such
tasks. A packer can automatically transform an executable into a different syntax but
an equivalent representation. This is achieved by encrypting or obfuscating the original binary,
which is stored as a new executable. An unpacker routine can then be used to restore the data
to its original state; this procedure takes place in memory, which prevents any
unpacked information about the executable leaking to disk, making it a robust
procedure that renders traditional malware analysis even more difficult. Polymorphic
variants of the same program can be created by choosing different or random encryption
keys (Kawakoya, Iwamura, & Itoh, 2010). On the bright side, unpacking routines are
fairly similar to each other and can be detected. Metamorphic variants are much better
protected, since they can mutate their unpacking routines as well. In addition, most malware
nowadays appears in packed form, and malware instances tend to apply recursive layers of
packers, creating more problems for security analysts.

A big advantage of dynamic malware analysis is that, since the program is actually run, the
unpacking routines are executed and the original binary then runs, so it can be
monitored for malicious behaviour (Sikorski & Honig, 2012). The unpacking sequence can be
detected by enforcing a “write xor execute” (W ⊕ X) policy on the
packed binary's memory pages. Memory pages are initially restricted so that they cannot be
written; once the binary tries to write to a page, a fault occurs and the page's
permission is changed to read/write but not execute. Therefore, once the original binary is
ready to execute from such a page, it raises another fault due to the page protection settings
(M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011). By catching execute page faults due to
permission restrictions, the analysis system can identify the end of the unpacking procedure,
and the instruction that raised the fault indicates the entry point of the original binary.
However, this technique can be detected by malware if the program queries the page
protection settings.
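The page-permission logic described above can be pictured as a small state machine. The following sketch is a simulation of the bookkeeping only, not of real page tables: it tracks which pages have been written to and flags the first attempt to execute from such a page as the end of unpacking.

```python
class WxorXTracker:
    """Track per-page W xor X state and spot execution from freshly written pages."""

    def __init__(self):
        self.written = set()   # pages the program has written to (treated as non-executable)

    def on_write(self, page):
        # A first write to a protected page would raise a write fault;
        # the monitor reacts by making the page writable but non-executable.
        self.written.add(page)

    def on_execute(self, page):
        # Executing from a page that was written to raises an execute fault:
        # this is taken as the entry point of the unpacked (original) binary.
        if page in self.written:
            return f"unpacking finished, original entry point in page {page:#x}"
        return None

if __name__ == "__main__":
    tracker = WxorXTracker()
    events = [("exec", 0x1000), ("write", 0x4000), ("write", 0x4001), ("exec", 0x4000)]
    for op, page in events:
        hit = tracker.on_write(page) if op == "write" else tracker.on_execute(page)
        if hit:
            print(hit)
```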

Detection of analysis systems and vulnerabilities


One of the issues with dynamic analysis derives from the fact that not all possible
execution flows can be covered: information is collected while the program is executed, so
only the executed paths are considered. This can also leave room for the detection of
analysis platforms (Sikorski & Honig, 2012).

In addition, there are other techniques that malware authors use to identify an instrumented
environment and so protect their malicious software from showing its true nature
(a minimal sketch of such checks follows the list):

 Hardware: Virtual machine devices can be identified easily (e.g. the pcnet32 device
used by VMware's network interface, or QEMU HARD-DISK virtual hard disks)
 Environment of Execution: Artefacts of tools used to monitor processes, such as a
debugger, can be recovered from the execution environment through Windows API calls
 Other Applications: The presence of popular monitoring applications, such as
registry monitors, can also be detected, indicating an analysis system
 Behavioural: Execution times can differ between real and virtualized environments,
and it can be easy for malware to detect these timing differences
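A sketch of the kind of checks malware performs against such indicators is shown below. The specific registry value and driver file names are commonly cited VirtualBox/VMware artefacts and are given here only as illustrative examples; the sketch is written from the defender's perspective, to understand what malware looks for.

```python
import os
import winreg  # standard library, Windows only

# Commonly cited artefacts of virtualized environments (illustrative, not exhaustive).
SUSPECT_DRIVERS = [
    r"C:\Windows\System32\drivers\vboxguest.sys",   # VirtualBox guest additions
    r"C:\Windows\System32\drivers\vmmouse.sys",     # VMware mouse driver
]
BIOS_KEY = r"HARDWARE\DESCRIPTION\System"
BIOS_MARKERS = ("VBOX", "VMWARE", "QEMU")

def bios_strings():
    """Read the SystemBiosVersion value, which often embeds the hypervisor name."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, BIOS_KEY) as key:
            value, _type = winreg.QueryValueEx(key, "SystemBiosVersion")
    except OSError:
        return []
    return value if isinstance(value, list) else [value]

def looks_virtualized():
    if any(os.path.exists(path) for path in SUSPECT_DRIVERS):
        return True
    return any(marker in s.upper() for s in bios_strings() for marker in BIOS_MARKERS)

if __name__ == "__main__":
    print("virtual machine indicators found:", looks_virtualized())
```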

Dynamic Malware Analysis systems, like all systems, may come with design flaws and
therefore vulnerabilities that may be exploited for malicious purposes. A study has shown
that submitting a decoy executable to online dynamic malware analysis systems for analysis
can provide the attacker with information about the analysis system (Yoshioka, Hosobuchi,
Orii, & Matsumoto, 2010). The decoy created for this purpose was a binary that would
contact a server with a unique identifier to report the IP address of the sandbox, and this
behaviour was triggered only if the sandbox was connected to the internet. The study
measured the requests the decoy managed to pass to the control server and whether each
system was isolated or filtered and therefore secured. The study was performed on nine
systems, and the IP addresses of six of them were identified by the decoy and reported to
the control server through the Internet connection, thereby revealing information about the
systems (Yoshioka, Hosobuchi, Orii, & Matsumoto, 2010).

This study has shown that it is possible for authors of malicious programs to create decoys,
identify dynamic malware analysis systems, and then insert new controls into their malware
to detect those systems, for example by checking for specific IP ranges. This is another
important issue, because it is essential for dynamic malware analysis systems to be
protected from decoys, which aim at exposing information about the systems, while at the
same time an internet connection is vital to reveal the behaviour of much malware.

Countering Detection of Analysis Systems


Since much malware is designed to stop exhibiting its malicious activity when a
virtualized system is detected, this can be considered a safety mechanism for cloud
services and virtualized infrastructures. The fact that many companies and cloud services
are moving towards virtualized infrastructures becomes a problem for malware that aims to
attack as many targets as possible, since it would stop its execution on those systems
(M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011).

Additionally, one way to mitigate malware risks is to deploy fake artifacts that imply the
existence of a virtual machine, so that the malware stops its execution (Sikorski
& Honig, 2012). The growing acceptance of system virtualization technology, on the other
hand, may render this technique ineffective, since malware will want to attack
virtual environments to maximize its target spectrum, possibly at the risk of detection. In
order to avoid detection, attackers would then have to find a way to detect either
analysis systems specifically, or virtualized environments that do not include monitoring
mechanisms.

One proposed way to mitigate the detection of VM emulation is to use virtual machines
configured through undocumented options. This relies on virtual machine
configurations and configuration files: attackers depend on documented
configurations for the detection of virtual environments, while undocumented
configurations may not be detected by malware, making it show its intended
activities (M.Damshenas, A.Dehghantanha, & R.Mahmoud, 2013). This may be an option,
but it would most likely result in issues in the virtual environment, such as the loss of control
over some functions of the virtual machines.

Another method that has been proposed is partial instead of full virtualization. Full
virtualization provides a high level of containment for vulnerabilities without changing the
requirements of an application; if an application is compromised by malware, the attacker
gains only limited access to the system and cannot affect the host. In partial virtualization,
which can be found in many applications nowadays such as Chrome, the host system is not
separated from the application and is therefore at risk (Stone-Gross, Cova, Gilbert,
Kemmerer, Kruegel, & Vigna, 2011). Partial virtualization is therefore not a risk-free option,
since there is a good chance that the malware may affect the host machine.

In implementations that perform analysis tasks in the same operating system instance as
the analysed malware, it is possible to hide the analysis components by introducing rootkit
techniques. Analysis processes can be hidden from a malware sample by a rootkit that
filters the results of API calls which list running processes.
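The sketch below illustrates the effect of such a filter conceptually; the analysis component names are hypothetical and no real API hooking is performed here.

# Conceptual sketch: the effect a rootkit-style filter would have on the
# results of a process-listing API. The analysis component names are
# hypothetical, and a real implementation would hook the relevant Windows
# API calls rather than post-process a Python list.
ANALYSIS_COMPONENTS = {"sandbox_agent.exe", "apimonitor.exe"}  # hypothetical names

def filtered_process_list(real_process_list):
    """Return the process list with known analysis components removed."""
    return [name for name in real_process_list
            if name.lower() not in ANALYSIS_COMPONENTS]

print(filtered_process_list(["explorer.exe", "ApiMonitor.exe", "svchost.exe"]))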

Analysis systems that operate outside the operating system in which the malware is
executed can also create shadow copies of memory pages that are not accessible to the
analysed malware at all. This helps if the malware monitors the memory page tables to
detect an analysis system.

2.3 Dynamic Malware Analysis Systems Detection Evaluation


As discussed previously, different dynamic malware analysis systems rely on their
implementation and on the monitoring mechanisms they introduce to detect the malicious
actions performed by the programs under analysis. Therefore, when testing different
dynamic malware analysis systems, it is likely that there will be discrepancies in the reports
for specific malware samples, and these discrepancies may be produced for various
reasons. Studies assessing the accuracy of widely used dynamic malware analysis systems
have used different methodologies.

One type of accuracy assessment studied in the past has been based on the
implementation of such systems and their capabilities with respect to the design of each
system (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011). Each system, and therefore its
monitoring abilities, was studied separately to produce a report which reveals the
differences between the systems studied and, in many cases, the lack of monitoring
techniques implemented in each system.

This study has shown that each system's ability to detect malicious executables depends
largely on its author's point of view regarding the implementation and the monitoring
techniques built into the system. If the author has not implemented a particular monitoring
technique, activity related to it may go unnoticed by the analysis system. Additionally,
malware have evolved in recent years towards more network-based types that require an
Internet or network connection to proceed with their malicious actions. This is a major
issue for such systems: if the network monitoring techniques implemented do not match
the malware's activity, the system will be unable to detect that network activity.

Table 1, reproduced from the survey, compares the following systems and approaches:
Anubis (5.1), multipath exploration (5.2), CWSandbox (5.3), Norman Sandbox (5.4), Joebox
(5.5), Ether (5.6), WiLDCat (5.7), Hookfinder (5.8), Justin (5.9.1), Renovo (5.9.2), PolyUnpack
(5.9.3), OmniUnpack (5.9.4), behavioural spyware analysis (5.10.1), TQana (5.10.2),
Panorama (5.10.3), clustering (6.1), behavioural classification (6.2), and learning and
classification (6.3). Each system is marked against four groups of features: analysis
implementation (user-mode component, kernel-mode component, virtual machine
monitor, full-system emulation, full-system simulation); analysis targets (single process,
spawned processes, all processes on a system, complete operating system); analysis
support (API calls, system calls, function parameters, file-system operations, process/thread
creation, registry operations, COM components, detecting dynamically generated code,
restoring packed binaries, W⊕X page protection, multiple layers of packers, signature
matching after unpacking, instruction trace, information-flow tracking, multiple-path
exploration, ASEP monitoring); and networking support (simulated network services,
filtered Internet access).

Table 1: Implementation Techniques of DMAs (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011)

The results of the study highlight which monitoring techniques were used in each of the
systems included in the study. The lack of a given monitoring technique can cause an
inability to detect certain types of malicious activity. It is therefore safe to assume that
these systems could produce different reports for different types of malware, and possibly
even for malware of the same family, since they may not be able to monitor all the actions
that a malware sample performs when executed in a system. The differences in the reports
would appear in different detection categories, such as registry activity, file system activity
or network activity, depending on the implementation of each system.

Another evaluation of DMA systems was conducted using 74 different malware samples on
8 different analysis systems, monitoring 46 different actions that a malware sample may
perform, ranging from process actions to the network protocols and ports used
(Massicotte, Couture, Normandin, & Michaud, 2012). The measurements taken in this study
consist of reports abstracted into a Boolean vector after a single execution of each malware
sample in each analysis system. The causes of discrepancies were grouped into four
manually verified sets: plain bugs, environment, post-analysis and filtering, and semantics.
The discrepancies themselves were also grouped into four classes: one where at least one
analysis system disagrees with the findings of the rest about an action performed by a
malware sample, one where all the analysis systems agree that an action was performed,
one where all analysis systems agree that an action was not performed by a given sample,
and one where none or only one analysis system reports an action.

The result was a 33.9% rate of discrepancies between the systems studied. The study
concluded that the discrepancies recorded were due to several reasons, categorised
according to their cause. The categories included bugs related to the submission interfaces
or reporting mechanisms; environment issues depending on analysis system configurations,
operating systems, or the lack of software that the samples try to manipulate; network
issues caused by systems not allowing the malware to access the Internet; issues with the
detection of infected documents; and disagreements between the analysis systems on the
values reported and on what the actions that are performed and logged actually mean
(Massicotte, Couture, Normandin, & Michaud, 2012).

All the studies reviewed so far, however, agree that there are issues concerning the
accuracy of dynamic malware analysis systems in analyzing a malware sample. The issues
discussed depend mainly on the implementation of each analysis system and manifest as
reporting issues or differences between the systems. It is therefore important that dynamic
malware analysis systems that emerge in the future are implemented with care and with as
many monitoring mechanisms as possible, both on the system where the malware is
executed and for the network monitoring of that system. Additionally, developers need to
introduce the ability to fix bugs and reporting issues through updates, and a unified
framework is needed for interpreting the actions that malware samples perform and how
they are reported.

Due to these issues in Dynamic Malware Analysis systems, other types of analysis systems
could be eligible to completely or partially replace them. Since the implementation of DMAs
is based on monitoring actions happening in a system or a network, these monitoring
techniques and mechanisms can be very similar to logging mechanisms. Important
information could therefore be retrieved by deploying logging mechanisms in a system or
network and analyzing the logs produced in a centralized environment similar to that
provided by SIEM systems (Gabriel, Hoppe, Pastwa, & Sowa, 2009).

2.4 SIEM systems and Dynamic Malware Analysis


Security Information and Event Management systems are widely used by organizations
around the world today due to their data mining and centralized log processing abilities.
These systems have become popular since they can provide mechanisms to process
enormous amounts of data or security logs, in a centralized platform with comprehensive
and customizable interfaces that help in searching, detecting and interpreting such
information (Aguirre & Alonso, 2012). Organizations nowadays and their Security
Operations Centres (SOCs) come up against enormous amounts of security data originating
from a vast number and various types of monitoring systems that need to be analyzed fast
and accurately to detect attacks and deploy countermeasures to mitigate the risks that
derive from them (Madani, Rezayi, & Gharaee, 2011). In order to improve global cyber
security researchers have proposed better uses of big data analytics to mitigate past,
present and future security issues (Patel, 2012).

Even though big data analytics is changing the face of security technologies, forensics and
network monitoring, this progress cannot be considered a solution and research in security
must go on (Cardenas, Manadhata, & Rajan, 2013). Additionally, two more issues are
discussed by researchers concerning privacy, in terms of information sharing between large
corporations and law enforcement agencies, as well as, authenticity and integrity of data
collection related to the collection mechanisms.

SIEM systems can address the short-term handling of events and at the same time help the
long-term improvement of security architectures in organizations. This is possible because
the data analyzed by SIEMs derive from logged data within organizations and can therefore
shed light on security issues and assist analysts in quickly identifying and mitigating threats.
Data mining focuses on identifying interesting and relevant patterns in data, which can be
measured and clustered in the form of rules. These rules can be used to assist security
analysts in detecting patterns of malware and threats to support SIEM procedures (Gabriel,
Hoppe, Pastwa, & Sowa, Analyzing Malware Log Data to Support Security Information and
Event Management: Some Research Results, 2009). The research conducted on rule
creation involved malware factor analysis, performing association analysis to bring
relationships of interest between attributes of the given data sets to light. This was
accompanied by permanence and propagation analysis, which showed that no usual data
mining methods could be applied to discover how long malware reside in a system. This
problem was addressed by clustering algorithms that grouped the recorded malware
incidents by identifying similarities between these events. By doing so, important
information about malware behaviour could be retrieved and analyzed.

Dynamic Malware Analysis systems share a basic common feature with SIEM systems since
both types of systems are based on the analysis of logs produced by monitoring
mechanisms, to derive their security reports. Dynamic Malware Analysis systems use
monitoring mechanisms to log the activity of malware samples when executed in a
virtualized or emulated environment. SIEMs are designed to analyze security logs that are
produced by devices or detection and monitoring systems, strategically placed in networks
and systems, to detect attacks on these systems and help analysts in dealing with
enormous amounts of data (Gabriel, Hoppe, Pastwa, & Sowa, Analyzing Malware Log Data
to Support Security Information and Event Management: Some Research Results, 2009).
The similar way these systems process security information can be explored to identify if
SIEM systems can overcome the issues that Dynamic Malware Analysis systems face. It is
therefore interesting to explore the ability of SIEM systems to assist in the detection of
malware actions in a system.

The main difference between the two types of systems is that Dynamic Malware Analysis
systems generate reports for analysts after executing a malware sample, by keeping
records of the actions the malware performs. SIEMs do not provide an automated report,
but they can gather the information from the monitoring mechanisms and allow the analyst
to explore it in a single environment and interpret it. The fact that SIEMs do not produce
automated reports can be considered a drawback for malware analysis, but because of this
lack of automated reporting they do not suffer from interpretation or reporting issues in
the way DMAs do.

2.5 Conclusions
This chapter has shown, through recent studies, that even though Dynamic Malware
Analysis systems may automatically produce in-depth analysis reports, such systems may
also suffer from design flaws. Design flaws can derive from the lack of a holistic approach
in the design of such systems. Because of the variety of system monitoring techniques that
need to be implemented for the systems to detect malware activity, several issues arise.
Some of these issues discussed in the literature include plain bugs or reporting system bugs,
which can play an important part in a system's ability to detect and analyze certain types of
malware, or can even lead to inaccurate reporting (Massicotte, Couture, Normandin, &
Michaud, 2012).

Another problem is that malware authors have noticed the popularity of these systems and
are employing evasion techniques to make their malicious code more robust against
detection, by exploiting environmental attributes or other features of the systems
(M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011). Also, DMA systems have not been assessed
thoroughly: only a few studies have evaluated their implementations and reporting, and
these did not include more recent systems.

Therefore, further research is required in the field of such systems and a unified framework
needs to be established, to avoid discrepancies in reports of Dynamic Malware Analysis
systems.

SIEMs use similar system monitoring mechanisms, and this similarity between the two
types of systems suggests that SIEMs could be used for dynamic malware analysis (Gabriel,
Hoppe, Pastwa, & Sowa, 2009). The main difference is that SIEMs do not produce fully
automated reports and therefore do not suffer from issues in the reporting system like
some Dynamic Malware Analysis systems do.

The points discussed in this literature review lead to a need for a comparative assessment
between DMA systems, and for their findings to be compared to a SIEM-monitored sandbox,
in order to examine whether SIEM systems can overcome some of the issues that DMA
systems face.

3.0 Experimental System Design

3.1 Introduction
This chapter describes the design of an experiment to compare common Dynamic Malware
Analysis systems as well as a comparison between DMA systems and SIEM systems in
detecting malware activity. The literature in the field reviewed in the previous chapter
showed that DMA systems are facing several issues while SIEM systems may be able to
overcome them.

In order to assess issues of Dynamic Malware Analysis systems and compare their analysis
results to SIEMs, a SIEM based sandbox needs to be designed. The SIEM sandbox has to be
comparable with current Dynamic Malware Analysis systems, hence it must be wrapped
with common services and monitoring mechanisms that resemble the ones implemented in
the systems it will be compared with.

Malware types that will be used in the experiment are justified based on the literature
review and current research, in order to drive the selection of specific malware samples
from online malware databases which will be used in the comparison experiments.

Finally, the comparison variables must be established in order to assess the effectiveness
and issues of DMA systems, and to compare the two types of systems and their advantages
or disadvantages in each case.

3.2 Sandbox Design


A study of an Internet emulation approach, carried out in comparison to the Anubis system,
has shown improved malware analysis results: with full emulation of Internet access, and
particularly in half proxy mode, additional information was gathered compared to
previously performed analyses. The study was based on the creation of a transparent
bridge that can redirect or bypass traffic to itself and thereby avoid outgoing malicious
traffic. The client did not interact directly with the Internet and DNS was not affected. When
working in half proxy mode, the system redirected all traffic to local services (Gorecki,
Freiling, Kührer, & Holz, 2011). The half proxy mode provided more results than other
systems in many cases and, in some special cases, revealed information such as downtimes
of control servers or invalid credentials sent by malware.

Figure 3: TRUMAN half proxy architecture

The architecture behind Trumanbox in Figure 3 shows that, in half proxy mode, all traffic is
intercepted and not allowed to access the Internet, since the traffic is redirected to local
services within the sandboxed environment. The additional feature is that the dispatcher
tries to forward the traffic in order to log any additional activity by sending out the
malware's request packets. For a sandbox analysis system to be able to detect more
activity, a similar architecture would be required.

Similar features have also been implemented in research systems such as Osiris (Cao, Liu,
Miao, & Li, 2012). The infrastructure of Osiris includes a controlling platform that plays an
important role in guest communications, since the malware is executed on the guest
machine. Based on a QEMU implementation, data pipes are created for services to allow
better monitoring of process activity.

Another research DMA system, called iPanda, is implemented using a control channel
between the main analyzer and the guest machine, whose purpose is to extract
information about network behaviour, complemented by information flow tracking,
system call monitoring, function code identification and the monitoring of detection-
evading techniques (Xie, Lu, Su, Wang, & Li, 2013). Since malware authors increasingly use
the Internet to control their malware, recently designed DMA systems include more
complex network monitoring mechanisms to trick malware into showing their true
purpose.

To achieve successful monitoring and to provide targets for the malware samples to use, a
sandboxed environment can be created by executing a malware sample in a network
environment or on a host running common services, which aim to taunt the malware
sample into showing its intended activity, together with an Intrusion Detection System to
log interesting network traffic. In addition, a firewall must be set up to prevent any
unwanted traffic leaving the sandboxed environment, since such traffic could cause
propagation of the malware sample (M.Damshenas, A.Dehghantanha, & R.Mahmoud,
2013). It is also possible to filter Internet access by introducing a proxy server that
produces logs of the domains requested and possibly of replies or remote commands from
a malware control server. DNS requests and their logs can prove very important, since
many malware nowadays are designed to contact servers on the Internet that respond
with further malicious code to be downloaded, with updates, or even with commands to
be executed in order to serve the malware author's purposes.
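As an illustration of what such a taunting service could look like, the sketch below accepts connections on a stand-in service port, logs whatever the sample sends and replies with a generic banner so that the sample keeps interacting; the port, banner and logging method are assumptions made for illustration and are not taken from any of the systems discussed.

# Minimal sketch of a taunting service for a sandbox (assumed design, not a
# cited implementation): accept connections on a stand-in port, log what the
# sample sends, and answer with a generic banner to keep it talking.
import socket

def fake_service(port=2121, banner=b"220 FTP service ready\r\n"):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("0.0.0.0", port))
        server.listen(5)
        while True:
            client, address = server.accept()
            with client:
                client.sendall(banner)
                data = client.recv(4096)
                # In the designed sandbox this would be written to a log file
                # collected by the SIEM; printing stands in for that here.
                print(address, data)

if __name__ == "__main__":
    fake_service()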

The importance of resetting the analysis environment has been discussed in the literature,
and since the analysis environment requires several machines, the analysis system should
be implemented in a virtual environment in order to provide a quick and safe way to reset
the machines to a clean state (Kangarlou, Xu, Ruth, & Eugster, 2007). Virtual machine
snapshots can produce the desired result in a time-efficient way. VMware Workstation
could provide such an infrastructure for the implementation (VMware, 2014).

Figure 4: Experimental SIEM sandbox topology design for malware detection and analysis

It is very important for the sandboxed environment to be able to control traffic leaving the
sandbox and block any malicious traffic, since there is a high probability that the malware
samples may use the network for propagation (Yu, Gu, Barnawi, Guo, & Stojmenovic,
2014). Therefore, the sandbox environment must be secured with as many mechanisms as
possible to avoid any unwanted infections during the comparison experiments. A firewall
dropping traffic at the edge of the sandbox, or a proxy server that intercepts and does not
forward the traffic, could be very beneficial in providing security logs for analysis.

In order to successfully evaluate such systems, well known malware samples need to be
executed in the analysis systems to produce their reports. The samples need to be complex
but thoroughly analyzed, in order to compare the systems on their detection effectiveness
and identify discrepancies in the reports of the systems under evaluation. Three of the
most common malware types will be used in the testing phase to achieve the evaluation
and three samples from each type will be analyzed.

3.3 Comparison metrics


To perform an objective comparison between different types of systems several
characteristics need to be considered:

 The dynamic malware analysis systems may differ greatly in the implementation
methodology used by their authors (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011).
Therefore the monitoring techniques implemented in each system need to be taken
into consideration.
 The sandboxed environment in which the malware is executed to retrieve the
actions it performs may be different (emulated or virtualized). This feature may also
have an effect on the detection capability of the system, depending on the
sophistication of the malware sample being analyzed (M.Egele, T.Scholte, E.Kirda, &
C.Kruegel, 2011).
 The one feature that all dynamic malware analysis systems share is that they
generate a report through various platforms (browser interface, platform interface,
text report) as shown in Figure 5 (Massicotte, Couture, Normandin, & Michaud,
2012).
 SIEM systems are solely based on the monitoring mechanisms that are feeding data
to the system, in order to detect any anomalies or malicious activity and report on
them (Gabriel, Hoppe, Pastwa, & Sowa, 2009).

The above characteristics lead to the conclusion that an objective comparison between
SIEM systems and Dynamic Malware Analysis systems can be made in two areas. The first is
implementation: since a SIEM depends on its ability to process data from as many
monitoring mechanisms as possible, and on the compatibility of the output of those
monitoring mechanisms with the SIEM processing engine, SIEM systems could possibly
produce better results in detecting malware activity through a dynamically adjustable
sandbox setup. Dynamic Malware Analysis systems, in contrast, use a fixed setup as
designed by the system's author, which cannot easily be redesigned to catch types of
activity that might be detectable with slight changes to the way the sandbox works. An
implementation study and comparison between dynamic malware analysis systems has
been conducted, concluding with a table of the monitoring mechanisms implemented in
each system (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011). The second area is reporting:
Dynamic Malware Analysis systems are based on producing easily readable reports, and
this process again comes at a cost, such as reporting bugs or interpretations of activity
based on the system author's point of view, while SIEM systems provide all the available
information to analysts, who can then interpret it as needed.

Additionally, a recent study on the accuracy of system call-based malware detection used a
quantitative approach to measure the accuracy of malware detection systems (Canali,
Lanzi, Balzarotti, Christoderescu, Kruegel, & Kirda, 2012). The study identified problems in
malware detection based on behavioural models, concluding that the most accurate
models are the ones that rely on fewer details about malware activity, and showing how
the parameters of behavioural models affect accuracy and the reduction of false alarms. To
achieve this, several malware samples were used and detected by different systems, each
detection based on a specific behavioural model.

The study was broadly similar to a comparison between DMA systems, since the systems
studied performed behavioural malware detection, which is a feature of DMA systems.
Therefore another metric that can be used to compare DMA systems is a quantitative one,
based on the number of events detected in each event category, such as registry activity,
file system activity or DNS requests, for each malware sample.
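A small sketch of this metric is shown below; the dictionary format used for the abstracted reports is an assumption made for illustration, since each DMA system exports its results in its own format.

# Sketch of the quantitative metric: count events per category in two
# abstracted reports and show where the systems differ. The report format
# ({category: [events]}) is assumed for illustration only.
from collections import Counter

def event_counts(report):
    return Counter({category: len(events) for category, events in report.items()})

def count_differences(report_a, report_b):
    counts_a, counts_b = event_counts(report_a), event_counts(report_b)
    categories = set(counts_a) | set(counts_b)
    return {c: counts_a.get(c, 0) - counts_b.get(c, 0) for c in sorted(categories)}

# Hypothetical abstracted reports from two analysis systems.
system_a = {"registry": ["key1", "key2", "key3"], "dns": ["a.example", "b.example"]}
system_b = {"registry": ["key1"], "dns": ["a.example"], "filesystem": ["file1"]}
print(count_differences(system_a, system_b))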

Figure 5: Analysis Procedure



These reports usually describe the activity of the malware sample by listing the functions
called or actions performed by the malware, the processes it has tried to hook itself onto,
the sockets created, the services it has started, and other useful information regarding its
behaviour. As shown in a recent study, the comparison of reports can reveal discrepancies
in the results reported, for various reasons (Massicotte, Couture, Normandin, & Michaud,
2012). Therefore, in order to compare these reports and identify detection issues, the
results must be compared with verified malware databases. Such databases store hash
values of known malware samples, in order to identify them even if a packer or other
obfuscation techniques have been used to hide the malware, while the malware samples
found in such databases have been thoroughly analyzed and all the actions that the
malware performs are well documented. Malware are better identified by hash values
because each antivirus or security software vendor may use a different name for the very
same malware sample. Examples of such databases are Virustotal (VirusTotal, 2014),
Virussign (VirusSign, 2014) and Malwr (Shadowserver, 2014), which provide analysts with
information on malware, including hashes and reports from different sources such as well-
known antivirus applications and security organizations.

3.4 Malware samples


The malware samples used in such an experiment need to come from categories of
malware that exercise as many features of a real environment as possible, in order to assist
in the detection assessment of the systems. Malware nowadays are increasingly
sophisticated, infiltrating large networks and propagating in a stealthy manner to infect as
many targets as they can find before performing their malicious actions (Yu, Gu, Barnawi,
Guo, & Stojmenovic, 2014). Since most types of malicious software have very specific aims
on target systems, different types of malware will produce richer results. Recent types of
malware are increasingly Internet or network based and need a network connection in
order to propagate (Jordan, Chang, & Luo, 2009).

According to a recent detection accuracy study based on behavioural models (Canali, Lanzi,
Balzarotti, Christoderescu, Kruegel, & Kirda, 2012), the challenge for such comparisons is to
find an optimal representation for an enormously large set of malware. The latest types of
malware rely on a network or Internet connection to perform their malicious activity,
because they usually require instructions from their command and control centre (Stone-
Gross, Cova, Gilbert, Kemmerer, Kruegel, & Vigna, 2011). Therefore, the network detection
mechanisms of the DMA systems need to be assessed, since they are of vital importance.
According to research in the field, the most obfuscated and complex types of malware
nowadays include botnets and ransom-spyware, while worms and email spammers are
older, but nevertheless network-based, malware types (M.Damshenas, A.Dehghantanha, &
R.Mahmoud, 2013).

For the above reasons the malware samples that will be used in the comparison
experiments should be of these types:

 Botnets
 Worms
 Email bombers
 Ransom-Spyware

The samples included in the experiment should be identified by hashing algorithms so that
the experiment can be replicated in the future. A hash identifies each malware sample from
its binary rather than from a file name, since a malware sample can be distributed under
different filenames.
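As an example, a sample can be fingerprinted as sketched below; the file name used is hypothetical.

# Sketch of identifying a sample by its SHA-256 digest, so that the binary is
# referenced independently of whatever filename it is distributed under.
import hashlib

def sha256_of_file(path, chunk_size=65536):
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage: print(sha256_of_file("sample.exe"))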

3.5 Comparison procedure


In order to utilize as many features as possible for each Dynamic Malware Analysis system
and at the same time observe inabilities to detect malware activity, or discrepancies in
reporting malware behaviour, different types of malware need to be analyzed since the
different types of malware samples will perform various actions in the analysis systems.

For example, a worm-type malware sample is likely to try to propagate through a network
connection by creating or utilizing sockets to reach another machine on the network, or by
hooking into a service already running on the system (Peng, Qingping, Huijuan, & Xiaoyi,
2010). A spam-sending malware will most likely try to find specific programs or services
running on the machine, such as MS Outlook, or detect an e-mail server or service and try
to use it to send spam to specific contact lists or to known user or server e-mail addresses.
In a similar fashion, a downloader-type malware will most likely try to download malicious
code to the machine on which it is executed by contacting an external command centre or
server to request information on its next actions. This could result in downloading
malicious code, which could be executed directly or take the form of a new executable, or
in creating a connection to request commands from its author. Scareware or ransomware
will try to make changes to files, most commonly document file types, usually by encrypting
them and forcing the owners to comply with the author's demands to release the
encrypted files. Such malware may also try to contact the author to report a successful
attack (Garber, 2013).

For the above reasons, the systems to be implemented need to be as similar as possible in
terms of installed software and software versions. Since outdated and well-detected
malware samples will be used to perform the comparison between the systems, the
software deployed should include older versions of known applications in order to taunt
the malware samples into showing their intended purpose. Common software such as
Adobe Reader 9 and Microsoft Office 2003 will be installed on the systems used by the
DMA systems to perform the analysis tasks.

In addition to the comparison of reports for discrepancies in the interpretation of malware
activity and a quantitative approach to the results, the DMA systems need to be compared
on their implementation of monitoring modules for each type of monitoring mechanism
(M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011). Since not all systems implement every
possible monitoring mechanism, the differences between the systems will be highlighted.
The monitoring mechanisms are vital: if a monitoring mechanism is not implemented in a
system, the system will not be able to detect activity related to the corresponding features
of the system performing the analysis of the malware samples.

Figure 6: Samples analysis

It is also important to establish a timeframe of execution for each sample and to keep to
that timeframe for each system under comparison. The reason for the timeframe is that
some sophisticated malware may not perform all their malicious actions immediately on
execution, as an attempt to avoid detection. Therefore all malware will be executed in the
analysis systems for a limited timeframe of 30 seconds (Massicotte, Couture, Normandin, &
Michaud, 2012).
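The sketch below illustrates how such a fixed window could be enforced when a sample is launched directly; the sample path and the decision to terminate the process at the deadline are illustrative assumptions, not the behaviour of any of the compared systems.

# Sketch of enforcing the 30-second analysis window chosen above when a
# sample is launched directly; the handling of the deadline is illustrative.
import subprocess

ANALYSIS_TIMEOUT_SECONDS = 30

def run_sample(sample_path):
    process = subprocess.Popen([sample_path])
    try:
        process.wait(timeout=ANALYSIS_TIMEOUT_SECONDS)
    except subprocess.TimeoutExpired:
        process.kill()  # stop the sample once the analysis window has elapsed

# Hypothetical usage: run_sample(r"C:\samples\sample.exe")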

The report comparison can then be achieved by comparing the reports of the different
analysis systems for each specific malware sample analyzed as shown in Figure 6.

Figure 7: Report analysis and comparison

Although Dynamic Malware Analysis systems can generate automated reports, SIEM
systems require human interaction to produce results. Though SIEM systems do not
provide an automated analysis, alert triggers can be configured in SIEM systems to notify
analysts of interesting events based on searches in the data. These alerts may produce
much more focused results but they can be hard to configure accurately.

3.6 Conclusions
This chapter described the design of an experimental SIEM sandbox, based on a well-
designed DMA sandbox implementation such as Trumanbox and on other recent research
DMA implementations that employ as many monitoring mechanisms as possible.

The metrics for the comparison experiments were chosen based on the literature. They
include an implementation-based comparison, an interpretation-based comparison to
identify discrepancies in the interpretation of identified malware activity, and a quantitative
approach to assess the DMA systems on the amount of activity reported for each category
of malware activity, such as file system activity, registry activity and DNS or HTTP requests.
Together these metrics can provide a holistic assessment of the systems studied, since each
metric on its own examines the systems from a single point of view.

Additionally, specific and different types of malware have been chosen to be analyzed by
the systems being compared to identify issues in their reports. The malware types were
chosen based on the literature to simulate common threats that apply to a present
production infrastructure.

The comparison procedure used to identify issues and limitations of the systems has been
described. To compare the DMA systems, the reports produced after analyzing the malware
samples must be examined for detection issues and discrepancies in the interpretation of
the actions the malware performed. By performing this comparison, the ability of SIEM
systems to detect malware as well as DMA systems do can be assessed.

4.0 Implementation

4.1 Introduction
This chapter describes the malware samples chosen based on the design of the comparison
experiments, their categories and hashes for identification. The malware samples were retrieved
from online databases.

The use of a virtual environment is described, based on the design and the literature, due to the
need to reset the analysis environments to a clean state. Snapshots were used to restore the virtual
machines to a fresh condition, with all the analysis and taunting software installed, before the
malware samples were executed.

The deployments of three common Dynamic Malware Analysis systems are also described:
FakeNet and Buster Sandbox Analyzer as Windows-based Dynamic Malware Analysis systems, and
Cuckoo as a Linux-based system.

Finally, the implementation of the designed SIEM sandbox system is included in this chapter. The
designed system requires the implementation of the monitoring server using Splunk as the
monitoring system, a firewall to block malicious traffic from leaving the system, a network intrusion
detection system to monitor network traffic, and the common services server to taunt possible
malicious network activity within the sandbox.

4.2 Objective
The objective is to implement a custom sandbox environment with equivalent monitoring
mechanisms that will be able to provide their data to the SIEM server in order to successfully detect
malware activity. The monitoring server needs to be able to collect data from the monitoring
mechanisms, while at the same time the sandboxed environment should be isolated from direct
internet connection.

4.3 Malware Samples


The malware samples collected for the experiment originated from various online sources
and were identified by their SHA-256 hashes through Virustotal or VirusSign, which are
online databases of known malware (VirusSign, 2014).

The categories of malware introduced in the experiment are listed below:

 Botnets
 Worms
 Email bombers
 Ransom-Spyware

Botnet hashes and descriptions:

Compressed Botnet sample

e0f2b10182db6e124e539341eb7e896f1a35c19bbfa2ed67b4e40fc591f3bd57

Zeus botnet Sample

69e966e730557fde8fd84317cdef1ece00a8bb3470c0b58f3231e170168af169

Zeus botnet Sample

28520ba137f8872b2256205f37e56c0aa7f96b5b16c8a805aa591022dc940638

Worm hashes and descriptions:

Worm of the Alphx family sample

025ca97d6098bf44d7288013008bda9d30886b6d423e46969c0cc370c8896089

Worm of the Sasser family sample

09398d3f5cc102f7d932b765036e1ac1ff5dc27405d7357b81eaf48ca8ec71b8

Email Bombers hashes and descriptions:

Email bomber sample 1

8f5c6060c8b0a72ad3b0939acfd398acefe6c356bba0139e048250999ce2e448

Email Bomber sample 2

a93339617710234962471b7e9635c5765de9dc405541045fae119f7d45946578

Ransom-Spyware hashes and descriptions:

Cryptolocker sample fingerprints system and starts services

0dd7f3dffe8c6e69df6137cb413ad25c474d73a86f1d46d52846990aa66e6f43

Zeus Banking Spyware that contacts C&C



3ff49706e78067613aa1dcf0174968963b17f15e9a6bc54396a9f233d382d0e6

Cryptolocker sample that starts services and contacts C&C

d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9

4.4 Virtual Environment


Each Dynamic Malware Analysis system was deployed on a different virtual machine using VMware
Workstation. FakeNet and Buster Sandbox Analyzer (Sandboxie) were deployed on Windows Server
2008 R2 machines, while Cuckoo was deployed on a Kali Linux machine since the software is
designed for Linux systems.

Figure 8: VMware Machines used for the DMAs and the SIEM experimental sandbox

For the experimental sandbox, the monitoring machine was implemented on Windows Server
2008 R2. Splunk was installed and configured to collect data from the monitoring mechanisms over
the network from the malware machine through Windows Management Instrumentation (WMI). A
Vyatta firewall was also set up, with the Splunk server subnet as a DMZ interface, the virtual
network's Internet connection through the host as an internal network interface, and the machine
used to execute the malware as an untrusted interface. Finally, the malware execution and services
machine was also implemented on Windows Server 2008 R2, running common services (FTP server,
Telnet server and DNS server).

The use of virtual machines was of great importance in order to provide a way to restore the
machines to a fresh state before each malware sample execution during the comparison
experiments.

4.4 FakeNet
FakeNet is provided as a Windows installer package (MSI), which was deployed on a fresh
Windows Server 2008 R2 virtual machine. FakeNet virtualizes common services in order to taunt
malware into performing network actions, which can then be monitored through a command
prompt interface. Using VMware snapshots, a clean state of the environment was assured by
resetting the environment before each sample was executed.

Figure 9: FakeNet network taunt Services initiating

Figure 10: FakeNet monitoring example

Figure 9 shows the initiation of FakeNet's taunting services, used to trigger malware activity, and
Figure 10 shows sample activity for DNS requests and an HTTP connection as reported by FakeNet
through the CMD interface.

FakeNet is only a network monitoring tool; it therefore cannot monitor the Windows API, the
registry or the file system for changes that a malware sample may cause to the system.

4.5 Sandboxie and Buster Sandbox Analyzer


Sandboxie comes as a Windows installation executable which, when installed, creates a separate
area on the hard drive in which untrusted applications can be run to avoid malware infection. For
example, a normal user could use Sandboxie to run web browsers in sandboxed mode to avoid
browser code hijacking from malicious websites, as can be seen in Figure 11.

Figure 11: Sandboxie Chrome instance

Buster Sandbox Analyzer is an application that can use Sandboxie to analyze malicious activity from
unknown binaries. Buster is based on Windows API monitoring and can be injected into Sandboxie
by adding a few lines to the Sandboxie configuration file, under the sandbox to be analyzed by
Buster, as shown in Figure 12.

Figure 12: Buster injection in Sandboxie configuration file



Figure 13: Buster pointed to the appropriate Sandbox

After adding the instructions to use Buster's Windows API monitoring library, the sandbox is ready
to analyze malware activity, as shown in Figure 13. To do so, Buster needs to be pointed to the
appropriate sandbox where the monitoring will take place. Once the sandbox has been located for
Buster, the analysis can start.

Figure 14: Sample Buster Analysis

Buster captures malware behaviour data during execution, such as spawned processes, as shown
in Figure 14.

Once the analysis is terminated, Buster can produce analysis reports in respect of the registry, the
file system, network activity and user activity.

4.6 Cuckoo
Cuckoo is based on Python and has several dependencies on Python libraries, such as python-
sqlalchemy and python-bson. Several features will not work if the appropriate software is not
installed on the system: for example, Jinja2 is required to render the HTML reports and the web
interface, Dpkt is needed for analyzing pcap files, and Volatility for memory analysis. Once all the
libraries required by Cuckoo are installed on the system, the virtual environment must be
configured.

Figure 15: VirtualBox Guest VM to be used for analysis by Cuckoo

Cuckoo requires a virtual machine to be ready as the malware execution machine to be monitored.
This is a case of a virtual machine running within a virtual machine. Cuckoo automatically restores
the machine to a fresh state after analyzing malware samples. The guest machine must have Python
installed, along with the Python Imaging Library, in order for Cuckoo to retrieve screenshots from
the guest machine. Any additional software, such as Microsoft Office, web browsers or PDF readers,
also needs to be installed before using the analysis environment. Once the guest is ready for
analysis, Cuckoo can be started as a service.

Figure 16: Cuckoo service initiated, waiting for analysis tasks

Cuckoo comes with several utilities and different malware submission platforms. The submit.py
module can be used to submit malware samples through the terminal, and web.py can be used to
submit malware samples through a web interface, which is connected to the reporting web
interface.
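Submission can also be scripted, as sketched below; the path to submit.py and the --timeout option are assumptions based on a typical Cuckoo checkout and should be verified against the installed version.

# Hedged sketch of scripting sample submission to Cuckoo by driving the
# bundled submit.py utility; the utility path and the --timeout option are
# assumptions based on a typical Cuckoo checkout and may differ per version.
import subprocess

def submit_to_cuckoo(sample_path, timeout_seconds=30):
    subprocess.run(
        ["python", "utils/submit.py", "--timeout", str(timeout_seconds), sample_path],
        check=True,
    )

# Hypothetical usage: submit_to_cuckoo("samples/cryptolocker_sample.exe")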

Figure 17: Cuckoo utilities and web interface initiation

A web browser can then be used to submit samples and define settings for the analysis, such as
timeout, virtual machine to be used and memory acquisition.

Figure 18: Cuckoo web interface for submission of executables to be analyzed

Once a file is submitted for analysis, Cuckoo executes the file on the virtual machine specified at
the submission platform and monitors the system. Once the timeout has been reached, Cuckoo
ends the analysis of the binary and reports back through the web interface, providing information
about the binary such as its type, hashes, YARA signatures and more. In addition, it provides a
summary of malware activity highlighted by colours that correspond to risk, red being the highest
risk indicator, together with network activity and registry and process activity. Finally, the system
reports on the mutexes created, the processes spawned and the registry changes caused by the
sample.

Figure 19: Cuckoo sample file details report

Figure 20: Activity summary as Signatures and Screenshots during execution



Figure 21: Cuckoo Network analysis, behaviour summary and mutexes created report

4.7 Experimental Splunk Sandbox environment


The ideal scenario for a sandbox analysis would be to create a monitoring system outside the
sandbox which receives data from the malware execution machine and is able to monitor all
activity on that machine. Splunk can achieve this through a universal forwarder application, which
is configured to send data about the monitored machine to Splunk. The data that can be
transferred in such a setup include Application, Security, System and Setup data. According to
discussions on Splunk's community portal, it is possible to set up registry monitoring through a
much more complex configuration.

Figure 22: Splunk Universal Forwarder data inputs



The universal forwarder does not include a module to forward registry data or file system data,
which would be needed to monitor registry hives and identify the activity of malware samples in
the registry and file system. The forwarder could possibly use logs from third-party programs to
forward such data to the Splunk monitor. These features are not implemented in this scenario due
to implementation time restrictions.

Registry changes could include important activity in the system such as token elevations, which
imply privilege escalation from a malware sample. They may also include device and system
fingerprinting information which could prove vital in understanding what a malware sample's
purpose is.

Splunk was deployed on a Windows Server 2008 R2 virtual machine. The virtual environment
provides the ability to revert the machine to a fresh state through snapshots before any malware
samples are executed, as discussed in the design and literature review. As the analysis machine, it
retrieves the data collected during malware execution, as well as network activity logs from a Snort
IDS deployed locally on the monitored machine.

A second Windows Server 2008 R2 virtual machine was deployed as the malware-taunting machine,
on which common services were installed and running to taunt malware into attacking or utilizing
them. After each sample execution, the malware monitoring machine and the services machine are
reverted to fresh snapshots. The communication channel between the services machine and the
malware machine is provided by the Splunk universal forwarder, which was installed on the services
machine to forward data to the Splunk monitor. The forwarder sends Application, Security, System
and Setup data originating from the services machine to the monitor machine on port 9997, the
forwarder's default port.

Figure 23: Splunk's local Registry monitoring feature

For the Splunk monitor to be able to monitor the registry in this simple sandbox setup, Splunk was
deployed on the malware execution machine itself to monitor for malware activity. The fact that
the malware execution machine is also the machine monitoring malware activity does not create
an analysis conflict, since the malware samples chosen for the comparison do not interfere with the
local Splunk services, and the machine is reverted to a fresh state, using a snapshot, before each
sample is executed in the sandbox.

Figure 24: Implemented topology of the Splunk sandbox due to the limitations

As shown in Figure 24, the monitoring machine is the machine on which the malware is executed.
It receives data from an IDS implementation monitoring the traffic in the sandbox, as well as the
security logs of the taunting services machine, to assess possible malware activity against the
common services.

Splunk acquires data from the local machine relating to Windows security events, parsing the
security logs shown in Event Viewer in real time, and also monitors the registry hives. It is also able
to retrieve DNS and other common service logs from the services machine. The DNS server
deployed on the services machine is set to log incoming and outgoing requests to a text file at
C:\DNSlogs\DNSlogs.txt. Splunk has been configured to retrieve that file through the universal
forwarder, which was deployed on the services machine to forward DNS logs and Windows security
event logs so that the services provided to the malware machine can be monitored.

Splunk's ability to process enormous amounts of data related to the systems in the sandbox allows
it to monitor common services and environment features, such as the Windows registry on the
malware machine, for changes that may be related to malware activity.

Figure 25: Malware process identified in Splunk monitoring for malware Cryptolocker sample SHA256:
d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9

Figure 26: DNS requests triggered by Cryptolocker malware sample SHA256:


d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9

The analytical power of Splunk comes from its ability to filter data using specific strings that relate
to malware activity and to the data source. For example, UDP requests on port 53 imply DNS
requests, which can reveal the domains a malware sample tries to contact when executed, as
shown in Figure 26. Additionally, the filters that Splunk creates automatically can be used to
identify important data such as process names and locations, as shown in the figure above, which
can provide insight into the malware sample's activity.
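The sketch below mirrors this kind of filtering outside Splunk, keeping only UDP events on port 53 from exported log lines; the log line format is an assumption for illustration and does not represent Splunk's or Snort's actual output.

# Illustrative sketch of filtering exported log lines for DNS-related events
# (UDP on port 53); the line format is assumed, not Splunk's or Snort's own.
def dns_related(lines):
    return [line for line in lines if "UDP" in line and ":53 " in line]

sample_log = [
    "10:01:02 UDP 192.168.1.20:49152 -> 192.168.1.10:53 query hxaqmhgivuwrcjm.ru",
    "10:01:03 TCP 192.168.1.20:49153 -> 212.71.250.4:80 GET /",
]
print(dns_related(sample_log))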

Figure 27: Visualization of port activity for Cryptolocker malware sample SHA256:
d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9

Splunk's report visualization is one of its most important features since it automatically creates a
more comprehensive presentation of the data collected as shown in Figure 27.

Figure 28: Filtering by Request count for malware services deployed at 0.0.0.0. by Cryptolocker malware sample
SHA256: d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9

Splunk's filtering abilities can provide important information for an analyst, since the filters can
focus the search on values of interest within the data. In the examples above, port activity is
visualized in Figure 28 and the services deployed by the malware can be identified in Figure 29.

Figure 29: Visualization of network traffic for the Malware machine and the services initiated by the malware sample at
0.0.0.0. Sample SHA256: d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9

4.8 Conclusions
This chapter has described the deployment of the DMA systems and the implementation of the
experimental sandbox for dynamic malware analysis through Splunk. The samples used in the
comparison experiments are specified by hash values and types.

All DMA systems were successfully deployed, with as many features enabled as their dependencies
allowed, to assure the best possible outcome from the analysis of malware samples. The
experimental sandbox was designed to conduct safe analysis of malware samples by isolating the
environment, as proposed in the design chapter.

Implementation limitations forced changes to the original design, due to issues with acquiring data
from the systems, while still aiming to monitor as many system parameters as possible; had this
been neglected, the detection abilities of the experimental sandbox would have been much more
limited. The analysis results of the implemented system provided deep insight into malware sample
activity, complemented by visualization of the malware activity in the system, which can support a
deeper understanding of the actions the malware takes than a text report can. Overall, the
experimental sandbox performed well in the detection of malware activity, and Splunk performed
as expected in detecting malware activity on the monitored systems.

5.0 Evaluation

5.1 Introduction
This chapter describes and evaluates the results of the experiments conducted, presenting a
sample analysis from each system. The results fall into three categories, starting with detection
issues based on specific monitoring mechanisms, arising from the implementation and detection
abilities of each system. The second category is a comparison of the reports of the Dynamic
Malware Analysis systems to identify discrepancies in the interpretation of the analysis. Finally, a
comparison is conducted between the reports of the DMA systems and the experimental SIEM
sandbox environment.

5.2 Sample Reports on the Analysis of Malware Samples


Nine malware samples were used to conduct the comparison between the DMA systems and the
experimental SIEM sandbox. The malware samples came from a variety of sources and each sample
performed different actions during execution. In order to compare the systems, complete
documentation of the malware activity must be compared with the findings of each system for
each sample analyzed. The activity of the samples is well documented by Virustotal, one of the
online malware databases from which the samples were taken.

Since presenting the findings for every sample would be very lengthy, a sample analysis from each system under comparison is presented for one of the Cryptolocker samples, uniquely identified by its SHA256 hash.
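For illustration only (this is not part of the experimental procedure), the SHA256 value used to identify a sample can be computed with Python's standard hashlib module; the file path below is hypothetical.

```python
import hashlib

def sha256_of_file(path, chunk_size=8192):
    """Compute the SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical sample path; prints the hex digest used to identify the sample.
print(sha256_of_file("samples/cryptolocker.exe"))
```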

Sample: d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9

FakeNet:

Figure 30: FakeNet Reporting on malware network activity



Figure 31: FakeNet sample DNS requests reported for the Cryptolocker sample

FakeNet identified all network traffic initiated by the malware sample and listed the domains contacted, as well as samples of the HTTP traffic for the requests.

IP addresses contacted by the malware: 182.164.136.134, 212.71.250.4

FakeNet also reported a large domain list which the malware attempted to contact. Some examples
of the domains:

hxaqmhgivuwrcjm.ru

iabqnyalehbecqn.org

jsxwlxbqwokqsls.co.uk

kuywmputfbodcfl.info

jpfukmotkpwoufm.com

Buster Sandbox Analyzer:


Buster reported much less network activity for the same sample, but it also reports on system activity, which is very important for a malware sample that operates as ransomware. Buster was able to detect detailed system activity such as the DLLs utilized by the malware, changes to the file system and registry, domains contacted by the malware and mutexes created.

Figure 32: Buster report on DLL, file system and registry activity

Figure 33: Buster report on Domains contacted and Process information

Cuckoo:
Cuckoo reported on the activity of the malware sample far more thoroughly than Buster and FakeNet. Apart from analysing the file details, Cuckoo detected considerably more activity initiated by the malware sample than Buster did, particularly the behaviour affecting the registry and the processes created.

Figure 34: Cuckoo file details report for the sample

Figure 35: Cuckoo signature and activity summary report for the sample

Figure 36: Cuckoo domains involved report

Figure 37: Cuckoo processes report



Figure 38: Cuckoo network and behaviour report

Figure 39: Cuckoo Registry report



Cuckoo clearly reported more detailed activity on the file system locations, the registry keys affected by the malware sample and the mutexes created. At the same time its network report is much more accurate than Buster's and matches the results produced by FakeNet.

Anubis:
Anubis, being an online submission platform for unknown binaries, returned a previously generated report. Once more, Anubis's report on the malware sample was much richer than Buster's in respect of the domains contacted and the file system and registry activity. Surprisingly, the report did not include the second host, 212.71.250.4, reported as contacted by Cuckoo and FakeNet. Another issue was that the report showed differences in the processes initiated by the malware sample: the process names (number sequences) of the executable files reported as created by the original malware executable differ from the ones reported by Cuckoo, while Buster reported the same names as Anubis, but simply as mutexes created. Finally, Anubis categorizes its report into sections under each procedure reported as detected.

Figure 40: Anubis report on the original executable and its network activity

Figure 41: Anubis report on the original executable registry and file system activities

Figure 42: Anubis report on the original executable process creation activities

Figure 43: Anubis report on the processes created by the original executable

Figure 44: Anubis report on network activity of the processes created by the original executable

Anubis produced a very detailed report on the malware sample. The report is very similar to the one produced by Cuckoo, but several interpretation discrepancies can be derived from it. The most obvious is the names of the processes created by the original executable: Cuckoo reports the processes with different names, and Buster includes them in its report as mutexes initiated by the original executable, without any further detail.

The network activity is reported by Anubis as the activity of the original executable and the
additional processes. Cuckoo reported the network traffic in total while providing more details that
align with Anubis's report in the process analysis section. Finally, Anubis failed to report all the
hosts contacted by the malware sample compared to Cuckoo and FakeNet.

Experimental Splunk Sandbox:


The experimental Splunk sandbox performed very well in detecting the network traffic produced by the malware sample, and it also detected the processes initiated by the original executable. The file system changes were only partially documented, and the registry activity is not clearly attributable to the malware sample, since it is reported as registry queries initiated from svchost.exe and explorer.exe. These are Windows processes most likely hooked by the malware sample, since the registry keys they queried are similar to the ones reported by Cuckoo, Anubis and Buster.

Figure 45: Processes reported by Splunk parsing the Windows Security event log.

When reporting on the applications found in events registered in the Security event log, Splunk was able to detect the activity of the original malware executable and of the processes it created in the system to perform malicious activities.

Figure 46: Splunk reports suspicious activity event by a strange process which starts Services at 0.0.0.0:56124

Splunk identified the activity of the processes initiated by the malware sample and, by selecting the Source Address filter, a report was automatically produced by Splunk to visualize the activity on the system. The report is a graph showing that the source address of most of the traffic was 0.0.0.0 and therefore related to the malware activity.

Figure 47: Splunk Report on Network activity Source Address
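As a rough sketch of how such a source-address summary could be requested programmatically, the snippet below uses the Splunk SDK for Python; the index, sourcetype, field names and credentials are illustrative assumptions rather than the actual configuration used in the experiment.

```python
import splunklib.client as client
import splunklib.results as results

# Connect to the Splunk management port (host and credentials are placeholders).
service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

# Assumed index, sourcetype and field names; the real deployment may differ.
query = ('search index=sandbox sourcetype="WinEventLog:Security" '
         '| stats count AS requests BY Source_Address '
         '| sort - requests')

stream = service.jobs.oneshot(query, earliest_time="-24h", latest_time="now")
for row in results.ResultsReader(stream):
    if isinstance(row, dict):  # skip diagnostic messages returned by Splunk
        print(row["Source_Address"], row["requests"])
```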



Figure 48: Search for Domain requests reported back all domains reported by the DMA systems

Figure 49: Splunk has detected a few more hosts in the DNS traffic initiated by the malware

Figure 50: Token elevation by the malware initiated process



Figure 51: Splunk also reported token elevations for cmd.exe which was the application that launched the malware
sample

Figure 52: Registry key query by injected code in explorer.exe

Registry keys were queried by explorer.exe. Splunk did not clarify whether this is related to the malware activity, but since several keys were queried by explorer.exe and svchost.exe within an extremely small time window, it is reasonable to presume that the queries may have been initiated by the malware. Cross-referencing with the reports of the DMA systems confirms that these registry keys are reported as used by the malware.
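The reasoning above (several registry queries from the same process within a very small time window) can be expressed as a simple heuristic. The sketch below is illustrative only and assumes the events have already been exported from Splunk as (timestamp, process, registry key) tuples; the keys and timestamps shown are invented.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical events exported from Splunk: (timestamp, process name, registry key queried).
events = [
    (datetime(2014, 7, 1, 10, 0, 0), "explorer.exe", r"HKLM\Software\Example\Run"),
    (datetime(2014, 7, 1, 10, 0, 1), "explorer.exe", r"HKLM\Software\Example\Settings"),
    (datetime(2014, 7, 1, 10, 0, 1), "svchost.exe", r"HKCU\Software\Example\Keys"),
]

WINDOW = timedelta(seconds=5)  # the "extremely small time window"
THRESHOLD = 2                  # queries within the window needed to flag a process

def registry_bursts(events, window=WINDOW, threshold=THRESHOLD):
    """Flag processes whose registry queries cluster within a short window of their first query."""
    per_process = defaultdict(list)
    for ts, proc, key in sorted(events):
        per_process[proc].append((ts, key))
    flagged = {}
    for proc, queries in per_process.items():
        first = queries[0][0]
        burst = [key for ts, key in queries if ts - first <= window]
        if len(burst) >= threshold:
            flagged[proc] = burst
    return flagged

print(registry_bursts(events))  # explorer.exe is flagged with two keys in the example data
```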

Finally, by performing searches in the IDS logs, the network activity is found and the malicious hosts contacted by the malware are identified. Host 184.164.136.134 is reported in the log file and is easily identified when performing a destination search.
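As a minimal sketch of such a destination search performed outside Splunk, the snippet below tallies destination addresses from an IDS alert log; it assumes Snort-style "fast" alert lines ending in "src:port -> dst:port", which may not match the exact log format used in the experiment.

```python
import re
from collections import Counter

# Matches the "src:port -> dst:port" tail of a Snort-style fast alert line (assumed format).
PAIR = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):\d+\s+->\s+(\d{1,3}(?:\.\d{1,3}){3}):\d+")

def destination_counts(alert_file):
    """Count how often each destination IP appears in the IDS alert log."""
    counts = Counter()
    with open(alert_file) as f:
        for line in f:
            match = PAIR.search(line)
            if match:
                counts[match.group(2)] += 1
    return counts

# Hypothetical log path; a destination such as 184.164.136.134 would stand out here.
for dst, hits in destination_counts("logs/ids_alerts.log").most_common(10):
    print(dst, hits)
```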

Figure 53: IDS log Parsed in Splunk to monitor network activity

As the reports of each system for this specific malware sample show, several differences in interpretation or detection have been identified. Even more important than the missed detection of malware features or activity is the fact that the systems show discrepancies in interpretation when the analysis is automated: Buster, Cuckoo and Anubis disagreed in several areas when interpreting the data to present it in a more understandable form. In addition, the inability of some systems to detect malware actions was highlighted mostly by Cuckoo and Anubis, owing to their thorough reporting on the behaviour of the malware sample affecting the registry, file system, mutexes and memory.

5.3 Systems implementation comparison


The implementation of a DMA system is very important to the analysis of potentially malicious executables, since the mechanisms implemented to monitor the system analyzing the executable are the ones that produce the results and therefore expose the limitations of each system. The systems may vary in their implementation techniques in different areas, such as the virtualization or emulation of the host machine on which the executable is run. The techniques and mechanisms introduced in each system have been documented in the following table to illustrate the limitations and detection abilities of each system.

Monitoring mechanisms | FakeNet | Buster Sandbox Analyzer | Cuckoo Sandbox | Anubis | Experimental Splunk sandbox
Analysis implementation
Virtual system monitor | | | ✓ | |
System emulation | | | | ✓ |
System simulation | | ✓ | | | ✓
Analysis support for
API calls | | ✓ | ✓ | ✓ |
System calls | | ✓ | ✓ | ✓ | ✓
Single process | | ✓ | ✓ | ✓ | ✓
Spawned processes | | ✓ | ✓ | ✓ | ✓
All processes | | | ✓ | ✓ | ✓
Function parameters | | ✓ | ✓ | ✓ |
File system operations | | ✓ | ✓ | ✓ |
Registry operations | | ✓ | ✓ | ✓ | ✓
Signature matching | | | ✓ | |
Windows Event Logs | | | | | ✓
Networking support
Internet access | | | ✓ | ✓ |
Network monitoring | ✓ | ✓ | ✓ | ✓ | ✓
Services simulation | ✓ | ✓ | | ✓ | ✓
Reporting
Automated reports | | ✓ | ✓ | ✓ |
Table 3: System implementation techniques and monitoring mechanisms

The above table highlights important implementation features of each system studied, which are vital for the dynamic detection of malware activity. The implemented mechanisms are responsible for capturing and providing data for the system's report; therefore, if a monitoring mechanism is not implemented, the system cannot report on the related activity. In comparison to a study on the implementation of such systems (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011), the systems included in the experiment show a more holistic approach in their implementation, since the monitoring mechanisms they include cover more activity detection categories than the earlier systems that were studied.

As shown in the first column of the table, FakeNet can be considered a good tool to record and report the network activity of malware samples, since it provides a convincing network simulation, including common services intended to provoke malware network activity. Even though its network simulation is very effective at provoking malware into revealing their network behaviour, the system lacks system monitoring mechanisms and therefore cannot report on system activity or provide details of the malware's actions on the host.

Buster is a more complete system, as shown in the second column, since it utilizes an API monitoring library to detect the activity of malware and can provide information about system attributes such as the file system, the registry and process information, complemented by network monitoring. It reported details of the malware samples' activity categorized into groups such as network activity and file system activity. Although its implementation allows it to report on details of the system it is monitoring, the network simulation module utilized by Buster did not produce the results expected in most cases.

As shown in the third column Cuckoo is a powerful system since its implementation
includes modules to accurately and efficiently gather information about malware samples
activity regarding all aspects of the monitored system. Its unique features in comparison to
the other systems include signature matching through the yara module and virtualization
of a system using VirtualBox or other virtualization platforms. Cuckoo performed much
better than FakeNet and Buster since it was able to report in deeper detail on the malware
sample's activities and automatically produce screenshots of the target machine. One of
the important features in Cuckoo's reporting platform is a signature summary which can
help in the quick production of security countermeasures to defend against new unknown
malware.

Anubis's implementation, as described in the fourth column of the table, is unique in respect of the analysis implementation, since it emulates the analysis system through Qemu. It is a system that includes a variety of monitoring mechanisms, giving it the potential to log malware activity. The fact that the system is offered as an online platform, though, can make it vulnerable to decoys and a target for malware authors (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011).

As discussed in a recent study, Qemu emulation can possibly be detected by sophisticated malware samples and may therefore produce inaccurate reports (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011). In comparison to Cuckoo's implementation, Anubis lacks summarization of the activity and signature matching, but the analysis reports it produced appeared richer in respect of system monitoring.

The experimental sandbox implementation, as described in the last column, includes a unique feature that monitors the Windows Event logs, since it is based on Splunk and Splunk's ability to parse them. The system is simulated, since the monitoring occurs on a live system that is reverted to a fresh instance after each malware sample execution. The system lacks detection of API calls, function parameters and file system operations, features that could be introduced using third-party software able to produce reports for each monitoring mechanism. This lack of monitoring mechanisms provided less data during the analysis compared to Anubis, Cuckoo and Buster, but the system was able to report even better than the other systems in respect of the network activity and privilege escalation produced by the malware samples, as shown in the example analysis in 5.2. Even though Splunk is not a system intended for dynamic malware analysis, it was able to detect malware activity accurately, considering the mechanisms that were introduced in the system to detect it.

The modules that were not implemented in this experimental sandbox for dynamic malware analysis could possibly be introduced with the use of other software that can produce reports regarding API calls and the file system; these reports could then be forwarded to, and parsed by, Splunk to provide much better monitoring of the system in which the malware is executed.

5.4 Dynamic Malware Analysis systems reporting comparison


In this section the evaluation of the Dynamic Malware Analysis systems is reported. For each malware sample analyzed by the Dynamic Malware Analysis systems, the reports were documented as produced to provide a comparison between the results. The results were grouped into system detection categories, including mutexes, file system activity, registry activity, spawned and hooked processes, and network activity covering HTTP requests, DNS requests, SNMP requests, Telnet requests and services initiated by the samples, as reported by the analysis systems. The evaluation of the systems' reports is categorized into interpretation discrepancies and quantitative activity reported.

The discrepancies identified refer to the same action performed by the malware sample being interpreted differently by the analysis systems in their reports (Massicotte, Couture, Normandin, & Michaud, 2012). Since not all of the systems are able to report on every monitoring mechanism, as shown in the implementation comparison, the network activity reports are examined here, because all systems were able to report on the network activity of the samples.

Two groups need to be defined to highlight the differences in the reports in respect of each monitoring mechanism for the network activity. Group A refers to the number of samples for which all systems agree on the interpretation of the collected data, and Group B to the number of samples for which at least one of the systems disagrees on the interpretation of at least one action performed by the malware. The actions performed by the malware samples include HTTP requests, DNS requests, SNMP requests, Telnet requests and services initiated.
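As a hypothetical illustration of this grouping rule (the system names are real, but the sample identifiers and interpretation labels below are invented), a short sketch:

```python
# For each sample and action category, each system's interpretation is recorded as a label.
# A sample falls into Group A for a category when all systems produced the same label,
# and into Group B when at least one system disagreed.
interpretations = {
    # (sample, category): {system: interpretation label}  -- illustrative values only
    ("sample1", "HTTP requests"): {"FakeNet": "http", "Buster": "http",
                                   "Cuckoo": "http", "Anubis": "http"},
    ("sample2", "HTTP requests"): {"FakeNet": "http", "Buster": "host contact",
                                   "Cuckoo": "http", "Anubis": "service at 0.0.0.0"},
}

def group_counts(interpretations):
    """Return {category: (group_a, group_b)} based on agreement between systems."""
    counts = {}
    for (sample, category), per_system in interpretations.items():
        a, b = counts.get(category, (0, 0))
        if len(set(per_system.values())) == 1:
            counts[category] = (a + 1, b)
        else:
            counts[category] = (a, b + 1)
    return counts

print(group_counts(interpretations))  # e.g. {'HTTP requests': (1, 1)}
```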

Actions \ Groups | Group A | Group B
HTTP requests | 7 | 3
DNS requests | 4 | 6
SNMP | 10 | 0
TELNET | 10 | 0
Services initiated | 9 | 3
Table 4: Network Activity Interpretation Discrepancies

An example of such a discrepancy is the Cryptolocker sample with SHA256 0dd7f3dffe8c6e69df6137cb413ad25c474d73a86f1d46d52846990aa66e6f43: some of the systems reported HTTP requests performed by the malware sample, while other systems reported the traffic simply as a contacted host or as part of the service the malware initiated at 0.0.0.0. The reports showed a variety of interpretation discrepancies in the system monitoring modules as well, but since these modules are not implemented in all systems a comparison would not be accurate.

The discrepancies identified illustrate the fact that the systems show differences in the interpretation of the data, even though they were able to detect the network activity of the malware samples. The above table is visualised in Figure 54.

[Bar chart: number of samples in Group A and Group B for HTTP requests, DNS requests, SNMP, TELNET and services initiated]
Figure 54: DMA systems network report discrepancies
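For reference, a chart equivalent to Figure 54 can be regenerated directly from the values in Table 4; the matplotlib sketch below is only an approximation of the original figure's layout.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ["HTTP requests", "DNS requests", "SNMP", "TELNET", "Services initiated"]
group_a = [7, 4, 10, 10, 9]   # samples where all systems agreed (Table 4)
group_b = [3, 6, 0, 0, 3]     # samples with at least one disagreement (Table 4)

x = np.arange(len(categories))
width = 0.35

plt.bar(x - width / 2, group_a, width, label="A")
plt.bar(x + width / 2, group_b, width, label="B")
plt.xticks(x, categories, rotation=20)
plt.ylabel("Number of samples")
plt.legend()
plt.tight_layout()
plt.show()
```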

The graph in Figure 54 shows that all systems agreed on the protocols that were not utilized by the samples during execution (SNMP and Telnet). For the other categories, the Group B bars identify discrepancies in HTTP requests, DNS requests and services initiated, which were discrepancies in findings common to at least two systems. The systems may have provided the information necessary for an experienced analyst to identify and interpret the activity of the malware, but particularly in the summaries, the part of the report where the system is "guessing" the purpose of the malware sample, the systems showed different interpretations. Additionally, some interpretation differences were identified in DNS and HTTP requests. Another good example of these interpretation issues is that Cuckoo and Buster reported one of the Cryptolocker samples as key logger malware, while Anubis reported it as banking malware, even though it was able to find the mutexes, process hooking and registry keys related to key input monitoring.

A recent study that designed a testing model for dynamic malware analysis systems grouped its results in a similar way (Massicotte, Couture, Normandin, & Michaud, 2012). That study, which included an automated oracle, covered a much larger variety of malware samples. The discrepancies identified in this experiment fall into the same pattern that the testing model identified, and the comparison is sound since the grouping of the systems was performed in a similar way. The study assessed 74 malware samples and 8 systems, resulting in interpretation discrepancies for 33.9% of cases. In this comparison the discrepancy percentage was limited to 16.06%, but if more systems were assessed the percentage would most likely rise.

The quantity of activity reported by the systems is also an important metric for Dynamic Malware Analysis systems (Canali, Lanzi, Balzarotti, Christoderescu, Kruegel, & Kirda, 2012). Even though the reports produced by the systems may look similar, some of the systems reported more activity than others, despite the monitoring mechanisms needed to identify such activity being implemented in each system.

For example, for the Cryptolocker sample that is identified from SHA256:
d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9, Cuckoo and
Anubis reported 6 file system activities while Buster reported only two. Another example
for the same malware sample is that Cuckoo and Anubis reported 35 DNS requests while
FakeNet reported 30 and Buster only 19 as shown in Figure 55 and Figure 56.

The difference in the quantity of the results shows that the systems may create insufficient reports, or these differences may be produced by the environment of the system, plain bugs, or reporting platform issues, as shown in a recent study that introduced a testing model for such systems (Massicotte, Couture, Normandin, & Michaud, 2012). To identify such differences between the systems studied, the reports were quantified in respect of the reported results in each category. Network activity covers HTTP requests, DNS requests, SMTP and Telnet activity, and services initiated; system activity covers file system activity, registry activity, spawned processes and processes injected. The categorization is based on the categorization found in the reports of each system, for each of the samples used in the comparison experiments, as shown in the charts below.

Cryptolocker sample SHA256:


d765e722e295969c0a5c2d90f549db8b89ab617900bf4698db41c7cdad993bb9

[Bar chart: system activity events reported by FakeNet, Buster, Cuckoo and Anubis per activity category]

Figure 55: System activity reported by the DMAs for Cryptolocker C&C

[Bar chart: network activity events (HTTP requests, DNS requests, SNMP, TELNET, services initiated) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 56: Network activity reported by the DMAs for Cryptolocker C&C

Figures 55 and 56 show that Cuckoo and Anubis were able to detect much more activity regarding the registry and DNS requests than the other systems did. FakeNet shows a peak in HTTP requests because it reports each packet sent as a separate request event, while Cuckoo and Anubis reported each contacted host only once. Buster failed to detect the HTTP requests and detected only part of the DNS activity produced by the malware.
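The counting effect described above (FakeNet logging every request packet, Cuckoo and Anubis logging each contacted host once) can be illustrated with a small sketch; the request list below is invented purely to show how the two conventions diverge, and is not data from the experiments.

```python
from collections import Counter

# Hypothetical HTTP requests observed during one execution (contacted host per request).
requests = ["212.71.250.4", "212.71.250.4", "212.71.250.4",
            "182.164.136.134", "182.164.136.134"]

per_request_count = len(requests)       # FakeNet-style: one entry per request/packet
per_host_count = len(set(requests))     # Cuckoo/Anubis-style: one entry per contacted host

print("per-request:", per_request_count)   # 5
print("per-host:", per_host_count)         # 2
print(Counter(requests))                   # breakdown per host
```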

Zeus Bot Key logger malware sample SHA256:


3ff49706e78067613aa1dcf0174968963b17f15e9a6bc54396a9f233d382d0e6

[Bar chart: system activity events (mutexes, file system activity, registry activity, spawned processes, processes hooked) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 57: System activity reported by the DMAs for Zeus Bot Key logger

[Bar chart: network activity events (HTTP requests, DNS requests, SNMP, TELNET, services initiated) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 58: Network activity reported by the DMAs for Zeus Bot Key logger

Buster reported fewer events regarding system activity compared to Anubis and Cuckoo and was unable to detect network activity. Anubis also reported fewer DNS requests, possibly because the execution took place in a different timeframe, since it is an online platform whose execution timeframe cannot be controlled.

Cryptolocker sample SHA256:

0dd7f3dffe8c6e69df6137cb413ad25c474d73a86f1d46d52846990aa66e6f43

[Bar chart: system activity events (mutexes, file system activity, registry activity, spawned processes, processes hooked) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 59: System activity reported by the DMAs for Cryptolocker Crypt

[Bar chart: network activity events (HTTP requests, DNS requests, SNMP, TELNET, services initiated) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 60: Network activity reported by the DMAs for Cryptolocker Crypt

Once more, Buster shows great variation compared to the results of Cuckoo and Anubis in system activity in Figure 59 and poor DNS detection in Figure 60. FakeNet reports each HTTP request separately and therefore reports more HTTP activity, while Cuckoo and Anubis reported the host with a single entry in the report.

Compressed Botnet sample SHA256:

e0f2b10182db6e124e539341eb7e896f1a35c19bbfa2ed67b4e40fc591f3bd57

[Bar chart: system activity events (mutexes, file system activity, registry activity, spawned processes, processes hooked) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 61: System activity reported by the DMAs for Compressed Botnet

[Bar chart: network activity events (HTTP requests, DNS requests, SNMP, TELNET, services initiated) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 62: Network activity reported by the DMAs for Compressed Botnet

The Compressed Botnet sample was able to escape detection by Cuckoo and Anubis. The reason is possibly related to their implementations or virtualization techniques, since the malware may include checks that halt its activity if a virtualized environment is identified. Buster, on the other hand, was able to monitor the malware effectively, even though once again it reported poorly on DNS activity compared to FakeNet, as can be seen in Figure 62.

Zeus Bot sample SHA256:


28520ba137f8872b2256205f37e56c0aa7f96b5b16c8a805aa591022dc940638

[Bar chart: system activity events reported by FakeNet, Buster, Cuckoo and Anubis per activity category]

Figure 63: System activity reported by the DMAs for Zeus Bot

[Bar chart: network activity events (HTTP requests, DNS requests, SNMP, TELNET, services initiated) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 64: Network activity reported by the DMAs for Zeus Bot

Once more, Buster reported fewer events on system activity in Figure 63. FakeNet creates
more report entries than Cuckoo and Anubis for the same host and Buster reports fewer
services spawned than the rest of the systems in Figure 64.

Zeus Banking key logger Bot SHA256:

69e966e730557fde8fd84317cdef1ece00a8bb3470c0b58f3231e170168af169

[Bar chart: system activity events (mutexes, file system activity, registry activity, spawned processes, processes hooked) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 65: System activity reported by the DMAs for Zeus banking key logger Bot

[Bar chart: network activity events (HTTP requests, DNS requests, SNMP, TELNET, services initiated) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 66: Network activity reported by the DMAs for Zeus banking key logger Bot

Buster was able to get results from the Zeus banking Bot sample, which probably evaded detection in Cuckoo and Anubis due to virtualization or implementation issues.

Alphx family Worm SHA256:


025ca97d6098bf44d7288013008bda9d30886b6d423e46969c0cc370c8896089

[Bar chart: system activity events (mutexes, file system activity, registry activity, spawned processes, processes hooked) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 67: System activity reported by the DMAs for Alphx family Worm

[Bar chart: network activity events (HTTP requests, DNS requests, SNMP, TELNET, services initiated) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 68: Network activity reported by the DMAs for Alphx family Worm

Buster managed to report more system activity for the Alphx family worm, as seen in Figure 67, while all systems agreed on the network activity results in Figure 68.

Nasser Family Worm SHA256:


09398d3f5cc102f7d932b765036e1ac1ff5dc27405d7357b81eaf48ca8ec71b8

[Bar chart: system activity events (mutexes, file system activity, registry activity, spawned processes, processes hooked) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 69: System activity reported by the DMAs for Nasser Family Worm

[Bar chart: network activity events (HTTP requests, DNS requests, SNMP, TELNET, services initiated) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 70: Network activity reported by the DMAs for Nasser Family Worm

Buster reported a much larger number of mutexes than Cuckoo and Anubis, while it reported much less file system and registry activity, as seen in Figure 69. FakeNet once more reported several entries for the same targets, while Cuckoo and Anubis reported each entry just once, as seen in Figure 70.

Email Bomber Mailnuke SHA256:


8f5c6060c8b0a72ad3b0939acfd398acefe6c356bba0139e048250999ce2e448

[Bar chart: system activity events (mutexes, file system activity, registry activity, spawned processes, processes hooked) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 71: System activity reported by the DMAs for Email Bomber Mailnuke

[Bar chart: network activity events (HTTP requests, DNS requests, SNMP, TELNET, services initiated) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 72: Network activity reported by the DMAs for Email Bomber Mailnuke

Buster once more reports much more activity than the rest of the systems regarding
mutexes created by the malware in Figure 71. FakeNet was the only system that managed
to detect the mail bombing initiated by the malware as seen in Figure 72.

Email Bomber SHA256:


a93339617710234962471b7e9635c5765de9dc405541045fae119f7d45946578

[Bar chart: system activity events (mutexes, file system activity, registry activity, spawned processes, processes hooked) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 73: System activity reported by the DMAs for Email Bomber

[Bar chart: network activity events (HTTP requests, DNS requests, SNMP, TELNET, services initiated) reported by FakeNet, Buster, Cuckoo and Anubis]

Figure 74: Network activity reported by the DMAs for Email Bomber

Buster failed to detect all file system activity as shown in Figure 73, while all systems
agreed in reporting zero network activity for this sample.

The results illustrated in the above figures show that the systems reported different quantities of actions for each sample. This is an issue for dynamic malware analysis systems, since each system is implemented from the specific point of view of its authors. In addition, Buster's reports were surprisingly accurate in summarizing each malware's purpose, in most cases correctly characterizing the malware as a worm or a key logger or highlighting banking monitoring. This showed that Buster is able to categorize a malware sample with fewer results than the other systems, since in most cases Buster produced far fewer entries in its reports for each activity category.

These results lead to the conclusion that there are detection issues in all the systems tested, since not all malware will perform the actions that the authors intended to monitor, in which case the system is not able to report accurately on the malware activity. It is therefore important to establish a framework for the design of such systems, for future implementations and the improvement of existing systems, in order to avoid this type of issue.

5.5 DMA systems report comparison to Splunk Experimental Sandbox findings
The experimental SIEM sandbox that was designed showed several advantages and
disadvantages against the DMAs that were included in the experiment.

One of the disadvantages was that the system was unable to detect mutexes and most of the file system changes for each malware sample executed, as shown in the implementation table in 5.3. This is an implementation and design flaw, since no monitoring mechanism was introduced for these categories. If third-party software were introduced in the system to monitor the Windows API and the file system for changes, and its output directed to Splunk for analysis, the system would perhaps be able to detect these two categories of malware actions.
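As a hedged sketch of what such third-party monitoring could look like (not part of the original design), the following Python script uses the watchdog package to write one JSON line per file system event to a log file that a Splunk forwarder could be configured to ingest; the paths and field names are assumptions.

```python
import json
import time
from datetime import datetime, timezone

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

LOG_PATH = "C:/sandbox-monitoring/filesystem_events.log"  # assumed location read by Splunk
WATCH_PATH = "C:/"                                        # watch the whole system drive

class JsonLogHandler(FileSystemEventHandler):
    """Write each file system event as a JSON line so Splunk can parse it as structured data."""
    def on_any_event(self, event):
        # Avoid re-logging writes to the monitoring log itself.
        if event.src_path.startswith("C:/sandbox-monitoring"):
            return
        record = {
            "time": datetime.now(timezone.utc).isoformat(),
            "event_type": event.event_type,   # created / modified / deleted / moved
            "path": event.src_path,
            "is_directory": event.is_directory,
        }
        with open(LOG_PATH, "a") as log:
            log.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    observer = Observer()
    observer.schedule(JsonLogHandler(), WATCH_PATH, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```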

In addition, the system was surprisingly unable to detect many interesting registry alterations and key creations, even though the policy on the server was set to log events of interest. Despite the lack of quantity in the registry results, the registry findings that Splunk was able to retrieve were directly connected to the malware activity and to the alterations that the processes hooked or spawned by the malware made in the system's registry.

These registry findings do justify the deployment of Splunk inside the sandbox, since it was able to retrieve and parse activity that would certainly have been missed if the monitoring system had been outside the sandbox and unable to retrieve the registry data through the universal forwarder.

Figure 75: Splunk retrieved a registry Query initiated by a code injection in explorer.exe

Despite not being able to detect every single change in the registry and all the process creations, Splunk was able to identify vital information regarding spawned processes, as well as token elevations that indicated privilege escalations caused by the malware, which is something that none of the DMA systems included in their reports.

These findings are of high importance when an unknown binary is being examined, since they show the hidden intentions of the malicious code against the system, and, because the sample is actually executed, the analysis can bypass obfuscation techniques such as the use of packers.

Figure 76: Splunk retrieves a spawned process log which initiates services at 0.0.0.0:56124

Figure 77: Splunk retrieves token elevation information initiated from code injection in cmd.exe

The main advantage of the Splunk sandbox introduced is that it does not create an automated report and therefore cannot cause interpretation discrepancies when malware activity is reported. On the other hand, the lack of automation in the analysis procedure requires manual work by analysts. The discrepancies can be caused by several issues that Splunk does not suffer from; these issues could be plain bugs in the systems, environmental causes, post-analysis filtering and semantics, as described in a recent study (Massicotte, Couture, Normandin, & Michaud, 2012). Splunk is certainly free of post-analysis filtering and semantics issues, since it does not automate the reporting procedure, leaving only two categories of possible issues to be taken into consideration. The environment used in the experiment was virtualized, which may lead to issues with very sophisticated malware samples; fortunately, the samples used did not highlight such issues. Bugs comprise the final category that may cause discrepancies, but since the system is sophisticated and robust, with internal error reporting, it can be checked to rule this category out.

Another major advantage of the experimental sandbox was visualization. The text reports produced by the DMA systems may be well categorized, and in some cases produce summaries of the activity giving a quick behaviour analysis, but if a system fails to identify all attack vectors of a malware sample these reports can be misleading. Splunk's ability to visualize the data under analysis can provide the analyst with deep insight into what the malware sample is doing in a system.

A great example of this feature can be seen in Figure 78. Splunk provided a visualized report on the source IP addresses of traffic leaving the host. The first value in the chart, which was automatically produced by Splunk, shows the traffic initiated from 0.0.0.0 and the number of packets leaving the host. The malware that was executed is reported by all DMA systems to initiate services at 0.0.0.0, which means that it is either listening for instructions or updates from its command and control centre, or advertising that the system has been compromised so that the malware author can perform further malicious activity on the system.

Splunk performed as expected by detecting the network activity of every malware sample. The DNS logs of the server provided all the information required to detect the malicious activity and the remote domains that the malware samples were trying to contact. In addition, the IDS logs provided great detail regarding the rest of the network traffic, and Splunk was able to process and analyze all types of traffic.

Figure 78: Splunk's automated report on traffic Source Address for Crypto C&C malware sample.

5.6 Conclusions
This chapter has described and analyzed the results of the comparison of the
implementations and reports of DMA systems along with an overall comparison with the
findings of the experimental sandbox environment designed.

The DMA systems were compared in respect of their implementations to identify limitations in the systems and their detection abilities. A table was created categorizing the detection techniques implemented in each system, allowing comparison with similar previous research (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011).

Interpretation discrepancies identified during the comparison procedure were also highlighted in the reports of the systems and categorized into two groups: the first group being the number of samples for which all systems agreed on the interpretation of the findings, and the second group being the number of malware samples for which at least one system disagreed with the others in the interpretation of activity detected by all the systems.

All systems were used to analyze 10 malware samples to produce a quantitative comparison of the findings. The differences in the reports pointed out several detection issues that could be related to the monitoring techniques implemented in the systems, as well as graver issues of misdetection due to possible evasion techniques employed by the malware samples. The variations in the results were obvious and arose for various reasons.

In the last part of the chapter, a summary of Splunk's detection abilities compared to the findings of the DMA systems is presented. Even though the Splunk monitored sandbox failed to introduce a file system monitoring mechanism, it detected network activity successfully and introduced more comprehensive application monitoring in the form of token elevation, which is connected to privilege escalation. Finally, Splunk's visualization abilities in reporting on the activity caused by the malware samples give a more comprehensive view of the true purpose of the malware.

6.0 Conclusion

6.1 Overall Conclusion


Through research of the literature in the field of Dynamic Malware Analysis, this thesis showed that there are numerous techniques with which Dynamic Malware Analysis systems can be implemented. These techniques vary in the way the analysis system monitors the target environment in which a potential malware sample is executed. The literature also highlighted the importance of the implementation of an analysis system, since if certain modules or techniques are not implemented, the system is not able to detect the activity related to that monitoring feature.

The research also pointed out the advantages of DMA systems compared to other malware analysis techniques, but at the same time discussed several issues regarding DMA systems. The advantages of executing malware and monitoring a system are mainly related to behavioural analysis and to overcoming the obfuscation techniques that malware authors currently use to make malware more resistant to detection and analysis. On the other hand, DMA systems are implemented from the point of view of their authors rather than a standardized framework, an issue that creates gaps in detection; due to several system issues, interpretation discrepancies can also be identified in their reports in many cases.

SIEM systems could provide solutions to such problems, since they can be used in custom sandboxes which are not implemented from the strict point of view of a single system author. SIEM systems are based on the monitoring mechanisms that provide data for analysis: if the data can be retrieved from mechanisms in place to monitor every aspect of a system, then the SIEM system will be able to process the data and produce reports. The main disadvantage of SIEM systems in this case is that the processing of the data must be done manually, in which case the more data sources there are, the more analysis time is required. One redeeming quality, though, is the visualization and partial automation of reports that SIEM systems can produce, which makes them a worthy competitor to DMA systems.

The design chapter showed that it is possible to create a sandboxed environment for a
SIEM system to monitor and perform Dynamic Malware Analysis. The experimental
Sandbox was designed based on a well-designed and secured implementation and the
metrics of the assessment were explained based on the literature to provide an objective
comparison and highlight issues of the systems. The comparison procedures were also
described based on the literature review, to assess the systems based on their reports of
malware activity.

All DMA systems were successfully deployed with as many features enabled as their various dependencies allowed, to ensure the best possible outcome from the analysis of malware samples. The experimental sandbox was implemented according to the design to conduct safe analysis of malware samples by isolating the environment.

Implementation limitations forced changes to the original design of the experimental system, due to issues with acquiring data from the systems; monitoring as many system parameters as possible was essential, since neglecting this would have left the detection abilities of the experimental sandbox much more limited. The analysis results of the implemented system provided deep insight into malware sample activity, complemented by visualization of that activity, which supports a deeper understanding of the actions the malware is taking than a text report does. Overall the experimental sandbox performed well in the detection of malware activity and Splunk performed as expected in detecting malware activity in the systems monitored.

The results of the malware sample executions showed that the chosen metrics exposed the issues discussed in the literature. The results showed implementation differences and interpretation discrepancies, which were identified for malware activities detected by all the systems in order to provide objective results. Additionally, a quantitative comparison of the systems' reports showed that the systems reported in different depths for each category of malware activity, which can be attributed to the implementation and the system monitoring used in each of the analysis systems.

Finally, the experimental sandbox system was able to detect most of the malware activity, lacking only in some activity categories due to design flaws. Even though the Splunk monitored sandbox lacked a file system monitoring mechanism, it managed to detect network activity successfully and introduced more comprehensive application monitoring in the form of token elevation logs, which were connected to privilege escalation attempts by the malware. The most redeeming conclusion was that Splunk's visualization abilities in reporting on the activity caused by the malware samples give a much more comprehensive view of the true purpose of malware. Therefore, further research on the abilities of SIEM systems to dynamically analyze malware, and more specifically on Splunk's detection abilities when more detection mechanisms are introduced through third-party monitoring software, would be beneficial to the Dynamic Malware Analysis field.

6.2 Appraisal of Achievements


The aim of this thesis is to compare current Dynamic Malware Analysis systems, evaluate
them and identify issues, as well as to compare such systems to a Security Information and
Event Management system sandbox for Dynamic Malware Analysis. The objectives to
support the aim were as follows:

1. Review the literature around the subject of Dynamic Malware Analysis systems
including implementation techniques, issues and limitations and perform
research into SIEM technology and how it might be used for a similar purpose.
2. Design a comparison system and experiments for common Dynamic Malware
Analysis systems to assess their detection capabilities and possible
interpretation discrepancies in their analysis of malware samples, and to
compare between systems. Also create a novel SIEM sandbox which can be
compared with the DMA system results.
3. Deploy DMA systems and implement a SIEM based dynamic malware analysis
sandbox, and run experiments with a range of malware samples to gather data
for analysis.
4. Evaluate DMA systems using appropriate metrics and compare the results to
the experimental SIEM sandbox findings.

To support these objectives the following hypotheses have been specified for testing and
evaluation:

• A custom sandbox environment can be created using Splunk to monitor the sandbox and detect malware activity.
• Splunk will be able to detect all activity related to the malware, if the appropriate monitoring mechanisms, to produce data for processing, are in place in the sandbox.
• The DMA systems will show interpretation discrepancies in their reports.
• The results of the systems will in most cases be different in respect of the quantity of actions reported due to their different implementation approaches.

Objective 1
Review the literature around the subject of Dynamic Malware Analysis systems including
implementation techniques, issues and limitations and perform research into SIEM
technology and how it might be used for a similar purpose.

The literature review highlighted the advantages of Dynamic Malware Analysis compared to other techniques. It also established the different implementation techniques that have been used in the creation of DMA systems. The differences highlighted that, due to different implementations, the detection effectiveness of DMA systems may vary, and that a more sophisticated framework needs to be followed in such implementations in the future. Several issues regarding DMA systems were also identified and discussed through the literature review, such as interpretation discrepancies of malware activity, possible vulnerabilities of the systems and their detection abilities. Limitations were also discussed, including that malware authors have noticed the popularity of these systems and are employing evasion techniques to make their malicious code more robust against detection, by exploiting environmental attributes or other features of the systems (M.Egele, T.Scholte, E.Kirda, & C.Kruegel, 2011). Finally, the similarities between DMA systems and SIEM technology were identified, which led to the design of a prototype sandbox environment for dynamic malware analysis using Splunk. The main advantages of SIEM systems are that they do not produce fully automated reports, and therefore do not suffer from reporting system issues in the way some Dynamic Malware Analysis systems do, and that they provide visualization of activity reports, making the interpretation of malware activity easier.

Objective 2
Design a comparison system and experiments for common Dynamic Malware Analysis
systems to assess their detection capabilities and possible interpretation discrepancies in
their analysis of malware samples, and to compare between systems. Also create a novel SIEM sandbox which can be compared with the DMA system results.

The experiment design was based on the literature to define the course of the comparison experiment. The experimental sandbox was designed based on the design of the TRUMAN sandbox, which is considered a robust infrastructure for successfully detecting malware activity in a dynamic environment. The comparison metrics were identified as implementation variances, discrepancies in the interpretation of malware activity, and a quantitative approach to reporting, assessing the DMA systems on the amount of activity reported for each category of malware activity, based on research in the field. Specific and different types of malware samples were chosen for analysis by the systems being compared, to identify issues in their reports and to simulate present threats.

Finally, the comparison procedure was described in order to illustrate how the reports of the DMA systems were compared. The reports produced after analyzing the malware samples were compared to identify detection issues and discrepancies in the interpretation of the actions that the malware performed, and to assess the ability of SIEM systems to detect malware activity.

Objective 3

Deploy DMA systems and implement a SIEM based dynamic malware analysis sandbox, and
run experiments with a range of malware samples to gather data for analysis.

The deployment of the DMA systems was not an easy task since the operating systems and
the dependencies of the systems varied. All systems were successfully deployed and tested
with as many features as possible to maximize their detection and reporting abilities.

The original design was incompatible with the monitoring mechanisms to be established in the experimental sandbox. Limitations in forwarding activity data outside the sandbox forced the changes needed to maximize the activity categories that the experimental sandbox would be able to detect. This issue was caused by the connectivity complexity of having Splunk remotely monitor the Windows registry effectively from outside the sandbox.

To solve this issue Splunk was deployed inside the sandbox. The malware samples used did not interfere with Splunk, and the environment was reset to a fresh state after each sample execution to ensure that the monitoring system was not affected in any way. The analysis results of the implemented experimental sandbox system provided deep insight into malware sample activity, complemented by visualization of the malware activity in the system. Visualization of the activity can help in a deeper understanding of the actions the malware is taking in comparison to a text report.

Objective 4

Evaluate DMA systems using appropriate metrics and compare the results to the
experimental SIEM sandbox findings.

The DMA systems that were included in the experiment were compared based on their
implementation, discrepancies in the reports produced by malware actions that were
detected and reported by all systems, and a quantitative approach in respect of the
findings in the reports.

The implementation comparison highlighted the different implementation techniques and monitoring mechanisms, as well as the differences between the systems in detection abilities. The implementation comparison verified the research in the field, since the findings were similar.

Discrepancies in the reports of the systems included in the experiment were identified for
malware activity that was detected by all the analysis systems. Once again, these results
verified the research in the field and brought to light issues regarding the analysis systems
implementations and interpretation of activities from the point of view of the authors of
the systems.

The quantitative approach showed that the systems' detection does not depend only on their implementations, since systems with the relevant monitoring mechanisms implemented did not produce the same quantity of results as others. Once again this issue derives from the fact that each system is implemented from the point of view of its authors, and its interpretation, logging and reporting abilities may be limited, since the authors may decide to omit information that seems unimportant.

Finally, the experimental sandbox was mostly successful in detecting the malware samples' activity, even though there were a few drawbacks due to the fact that Splunk relied on Windows Security event logs and Windows registry monitoring. This resulted in some spawned processes and much registry activity not being detected, even though the important events of each execution were identified in every case. Splunk's main advantages over the DMA systems were its ability to visualize reports, providing a much more understandable presentation of the activity in the system, and the absence of interpretation errors or discrepancies, since the system did not produce a fully automated report on the malware sample but required manual search and interpretation.

6.3 Future work


Standardized DMA system design framework
The research of the literature in the field of dynamic malware analysis showed that most DMA systems are introduced at research level but later take part in practical analysis environments. The research point of view in the implementation of such systems comes with gaps in the implementation, in respect of the monitoring mechanisms used for the analysis. There is therefore a need for a standardized framework for the implementation of DMA systems to avoid such issues. A standard framework will not eliminate malware evasion, which can also be due to implementation factors such as the detection of an emulated or virtualized environment by the malware, but it would solve issues regarding interpretation discrepancies and missing monitoring mechanisms in analysis systems, leading to the implementation of better analysis systems.

Monitoring mechanisms
The monitoring mechanisms play a vital role in the detection of malware activity. The issue is
that, in many cases, analysis systems are implemented with some mechanisms missing or
only partially implemented, leading to detection issues. In addition, malware authors create
increasingly sophisticated malware that employs evasion or obfuscation techniques, making
it harder to detect. There is therefore a need for monitoring mechanisms that observe the
system under analysis more completely and that are more robust against the evasion
techniques employed by malware authors.

DMA Discrepancies and results quantity

Research into the accuracy and efficiency of DMA systems has focused on specific metrics,
which has produced rich results in identifying the problems such systems face. Although the
contribution of such comparisons is beneficial, no single metric can by itself show whether a
system is sophisticated enough to be used for dynamic malware analysis. A good example is
Buster Sandbox Analyzer: although it did not produce the volume of logs in every malware
action category that other systems did, it was able to "guess" the type of malware and
summarize the actions it performed in a comprehensive way, based on common malware
terms and types, and accurately in most cases. There is therefore a need for further study
that compares the systems under as many metrics as possible, and that includes as many
systems and online platforms as possible, in order to provide a full assessment of DMA
systems.

Splunk sandbox
The experimental sandbox introduced in this thesis performed as expected, identifying much
of the malware activity on the system and the full network activity of the samples. Despite
some detection issues caused by the inability to introduce additional data sources,
specifically a file system monitoring mechanism, Splunk was able to find and present most
actions performed within the system. One of the most important findings was the
privilege-escalation tokens requested by the samples, in most cases through injection into a
system process such as explorer.exe or svchost.exe, or even into spawned processes. Splunk
was also able to detect the full spectrum of the samples' network activity by parsing the IDS
logs produced, without any issues.
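
As a rough illustration of how such events could be pulled from the sandbox's Splunk
instance for further processing, the following Python sketch queries the Splunk REST API
export endpoint for Windows Security log events. The host, credentials, index name,
sourcetype and event codes (4688 for process creation, 4673 for sensitive privilege use) are
assumptions made for illustration, not a record of the exact configuration used in the
experiment.

# Sketch: pull privilege-related Windows Security events from a Splunk
# instance over its REST API. Host, credentials, index and event codes
# are illustrative assumptions, not the experiment's actual settings.
import requests

SPLUNK = "https://localhost:8089"          # Splunk management port (assumed local instance)
AUTH = ("admin", "changeme")               # placeholder credentials
SEARCH = (
    "search index=main sourcetype=WinEventLog:Security "
    "(EventCode=4688 OR EventCode=4673) "  # 4688 = process creation, 4673 = sensitive privilege use
    "| table _time, host, EventCode, Message"
)

resp = requests.post(
    f"{SPLUNK}/services/search/jobs/export",   # streaming export endpoint
    data={"search": SEARCH, "output_mode": "csv", "earliest_time": "-1h"},
    auth=AUTH,
    verify=False,                              # lab instance with a self-signed certificate
)
resp.raise_for_status()

for line in resp.text.splitlines():
    print(line)

Since the thesis relied on manual searching and interpretation in Splunk, a scripted query of
this kind would mainly be useful for repeating the same search across many samples.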

The evaluation of the experimental sandbox showed that the system was able to detect
important malware activity but fell short in Registry monitoring, where the findings relied
solely on Splunk's own monitoring module, and in file system monitoring, where no
monitoring module was used at all. The sandbox also introduced another monitoring
mechanism, the Security event log, which produced very important information about the
activity of the malware samples on the system and its services. The Application log monitor
was a further feature that enriched detection, since it provided information about tokens
created while the malware was being executed. The system could therefore be further
improved by introducing new mechanisms to monitor the file system and the registry.

Future work on the Splunk sandbox could usefully evaluate Splunk's file system monitoring
capability, whether by building a more complex configuration or by using third-party
software that produces logs for Splunk to parse.
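
As a minimal sketch of what such third-party file system monitoring could look like, assuming
the Python watchdog package were installed on the malware machine, the watcher below
writes one JSON line per file event to a log file that a Splunk file monitor input could then
tail. The watched path, log path and record fields are illustrative assumptions, not a tested
component of the experiment.

# Sketch of a third-party file system monitor whose output Splunk could
# ingest as an ordinary log file. Paths and log format are illustrative.
import json
import time
from datetime import datetime, timezone

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

WATCH_PATH = r"C:\Users"                 # assumed area of interest in the sandbox
LOG_PATH = r"C:\fsmon\fs_events.log"     # file a Splunk monitor input would tail

class LoggingHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        # One JSON object per line is easy for Splunk to parse at index time.
        record = {
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event.event_type,    # created / modified / deleted / moved
            "path": event.src_path,
            "is_directory": event.is_directory,
        }
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    observer = Observer()
    observer.schedule(LoggingHandler(), WATCH_PATH, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()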

6.4 Personal Reflection


Dynamic Malware Analysis is one of the two main categories of malware analysis. Its main
advantage over static analysis is that the code is executed rather than inspected statically,
which means that dynamic analysis can produce results despite the obfuscation techniques
increasingly employed by malware authors to hide the true intentions of their malicious
software (Bayer, Moser, Krugel, & Kirda, 2006).

Dynamic Malware Analysis systems, known as DMA systems, are automated systems that
have been developed over the last decade thanks to advances in computing power and the
rise of virtualization technologies. To detect malware activity, their developers employ
various techniques, developed through research, which implement different types of
monitoring in a system and in different variations (M.Egele, T.Scholte, E.Kirda, & C.Kruegel,
2011). The fact that each system is implemented from its authors' point of view causes
detection issues, since not every monitoring technique and mechanism is implemented in
every system. In addition, each system is built on programming languages, scripts, libraries
and presentation filters, which may suffer from bugs. The authors of such systems may also
use presentation filters when creating a system's reports, in which they may introduce their
own point of view in interpreting the data collected from the system (Massicotte, Couture,
Normandin, & Michaud, 2012).

The above issues can lead to discrepancies in the interpretation of malware activity, since
each system is implemented differently. The experimental comparison conducted on the
basis of the research into DMA systems showed patterns of discrepancies similar to those
reported in a study proposing a testing model for such systems (Massicotte, Couture,
Normandin, & Michaud, 2012). Furthermore, a quantitative comparison, used as a different
approach in the comparison experiments and introduced by another study aiming to assess
such systems (Canali, Lanzi, Balzarotti, Christoderescu, Kruegel, & Kirda, 2012), showed that
the systems' reports varied in the quantity of actions reported when categorized into
mutexes, spawned processes, file system changes, registry changes and process injections.

The results of the comparisons showed that current DMA systems respond differently to
different types of malware samples, since some systems were able to detect malware
activity where others could not. The level of detail in the reports also varied between
systems, with some reporting more activity than others for several samples. These results
lead to the conclusion that the systems cannot be fully trusted on their own for their
detection abilities; analysing an unknown binary on several different systems may make it
possible to assess its purpose more fully, but even that is not certain.

SIEM technologies have become very popular in recent years, since they give analysts the
ability to process enormous amounts of security data efficiently in a single platform. These
systems rely on the monitoring mechanisms employed by the infrastructure they monitor,
which is similar to what DMA systems do for sandboxed environments when analyzing
malware.

Splunk was deployed in a sandboxed environment and used to analyze security data to
assess its ability to detect malware dynamically. The results of the experiment were more
than rewarding since Splunk was able to utilize the monitoring mechanisms of Windows
Server and an IDS implementation to detect malware activity.

Compared with the results of the DMA systems, Splunk was able to detect most of the
malware activity, but not every single action, since it relied on the Windows Server Security,
Application, Registry and System monitoring mechanisms and no file system monitoring was
employed for the experiment. Nevertheless, Splunk detected the important events caused by
the malware samples by parsing the logs. Its findings included application requests for token
elevation, which is directly connected to the privilege escalation caused by the malware in
the system, whereas the DMA systems reported such activities as mutexes or registry
changes without being able to interpret them. Splunk was also able to detect all network
activity caused by the samples, since the IDS implementation monitored all network traffic in
the sandbox. Finally, Splunk's ability to visualize events and activity in the monitored systems
gives analysts insight into what the malware caused without producing an automated
interpretation; it is left to the analyst to determine what the malware's purpose is. This
avoids presentation and interpretation bugs, making Splunk robust against discrepancies in
the interpretation of the samples' activity.
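
To give a concrete, hedged impression of what parsing that kind of IDS output involves, the
sketch below extracts the basic fields from Snort-style "fast" alert lines. The log path and the
exact alert format are assumptions about a typical Snort deployment rather than a record of
the thesis configuration; in the experiment itself, Splunk performed this parsing.

# Sketch of a parser for Snort-style "fast" alert lines, the kind of IDS
# output the sandbox forwarded to Splunk. The path and line format are
# assumptions about a typical Snort setup, not the thesis configuration.
import re

ALERT_LOG = "/var/log/snort/alert"   # assumed location of the fast alert file

ALERT_RE = re.compile(
    r"(?P<ts>\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d+)\s+"   # timestamp, e.g. 08/14-10:23:45.123456
    r"\[\*\*\]\s+\[(?P<sid>[\d:]+)\]\s+(?P<msg>.*?)\s+\[\*\*\].*?"
    r"\{(?P<proto>\w+)\}\s+(?P<src>\S+)\s+->\s+(?P<dst>\S+)"
)

def parse_alerts(path):
    """Yield one dict per alert line that matches the expected format."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = ALERT_RE.search(line)
            if match:
                yield match.groupdict()

if __name__ == "__main__":
    for alert in parse_alerts(ALERT_LOG):
        print(f"{alert['ts']} {alert['proto']} {alert['src']} -> {alert['dst']}: {alert['msg']}")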

One of the two limitations of the experimental sandbox monitored by Splunk was the
inability to forward registry activity data from the malware machine to a Splunk monitoring
server outside the sandbox. Registry data forwarding is not supported by the universal
forwarder; according to the Splunk community and documentation, this could be solved
either through a very complex Splunk configuration or by introducing third-party software
that would monitor the registry of the malware machine and produce logs for the forwarder
to supply to the Splunk monitor outside the sandbox. This inability was a drawback even
though the virtual infrastructure used a Vyatta firewall that allowed only port 9997, the port
the forwarder uses, for data to leave the sandbox: because of it, the final topology of the
experimental system could not guarantee with absolute certainty that the malware did not
interfere with Splunk.

Another issue that may arise is that malware authors could, in the future, develop malware
that interferes with the forwarder's port (9997), creating problems for infrastructures
monitored by Splunk deployments.
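
A simple way to watch for that kind of interference, sketched here under the assumptions
that the psutil package is available on the monitored host and that the forwarder runs as a
process named splunkd, is to flag any other process that uses TCP port 9997.

# Sketch: flag processes other than the Splunk forwarder that use TCP 9997.
# The expected process names are assumptions; adjust to the local deployment.
import psutil

FORWARDER_PORT = 9997
EXPECTED = {"splunkd", "splunkd.exe"}   # assumed forwarder process names

def unexpected_port_users(port=FORWARDER_PORT):
    """Return (pid, name, status) tuples for non-forwarder users of the port."""
    findings = []
    for conn in psutil.net_connections(kind="tcp"):
        local_hit = conn.laddr and conn.laddr.port == port
        remote_hit = conn.raddr and conn.raddr.port == port
        if not (local_hit or remote_hit) or conn.pid is None:
            continue
        try:
            name = psutil.Process(conn.pid).name()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
        if name.lower() not in EXPECTED:
            findings.append((conn.pid, name, conn.status))
    return findings

if __name__ == "__main__":
    for pid, name, status in unexpected_port_users():
        print(f"Unexpected process on port {FORWARDER_PORT}: pid={pid} name={name} status={status}")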

The second limitation of the experimental sandbox was that there was no monitoring
mechanism for file system changes. As a result, Splunk could not detect file system changes
other than some spawned processes and secondary applications installed by the malware
samples in commonly accessed areas of the Windows file system, such as the Roaming folder
under the user's profile. This issue could also be addressed in a future, similar experiment by
introducing third-party software to monitor the file system and produce logs that Splunk can
parse.

Splunk's ability to detect malware activity depends entirely on the monitoring mechanisms
introduced into the system being monitored. This means that, if sufficient monitoring
mechanisms are in place, Splunk could in principle process the data and detect the full range
of malware activity.

Despite the advances in Dynamic Malware Analysis, and even if the automated systems are
improved in the future and the issues discussed in this thesis are solved, human
interpretation of the data produced by such systems will remain vital, since a wrong
interpretation could have devastating consequences for large infrastructures and
corporations.

Finally, although Dynamic Malware Analysis can overcome some of the difficulties that static
analysis faces with increasingly sophisticated and obfuscated malware, a combination of
both techniques would be the best approach for analysts to determine with certainty the full
spectrum of malicious activities that an unknown binary can cause in a system.

References
Agrawal, H., Alberi, J., Bahler, L., Conner, W., Micallef, J., Virodov, A., et al. (2010). Preventing
insider malware threats using program analysis techniques. MILITARY COMMUNICATIONS
CONFERENCE, 2010 - MILCOM 2010 (pp. 936-941). San Jose, CA: IEEE CONFERENCE PUBLICATIONS.

Aguirre, I., & Alonso, S. (2012). Improving the Automation of Security Information Management: A
Collaborative Approach. Security & Privacy, IEEE Volume:12 , Issue: 1 , 55-59.

Baliga, A., Ganapathy, V., & Iftode, L. (2011). Detecting Kernel-Level Rootkits Using Data Structure
Invariants. Dependable and Secure Computing, IEEE Transactions on Volume:8 , Issue: 5 , 670-684.

Bayer, U., Moser, A., Krugel, C., & Kirda, E. (2006). Dynamic analysis of malicious code. Journal in
Computer Virology 2 , 67-77.

Buster. (2013). Buster Sandbox Analyzer. Retrieved 6 10, 2014, from http://bsa.isoftware.nl/:
http://bsa.isoftware.nl/

Canali, D., Lanzi, A., Balzarotti, D., Christoderescu, M., Kruegel, C., & Kirda, E. (2012). A quantitative
study of accuracy in system call-based malware detection. ISSTA 2012, International Symposium on
Software Testing and Analysis. Minneapolis, MN, USA: ACM.

Cao, Y., Liu, J., Miao, Q., & Li, W. (2012). Osiris: A Malware Behavior Capturing System Implemented
at Virtual Machine Monitor Layer. Computational Intelligence and Security (CIS), 2012 Eighth
International Conference on (pp. 534-538). Guangzhou: IEEE CONFERENCE PUBLICATIONS.

Cardenas, A., Manadhata, P., & Rajan, S. (2013). Big Data Analytics for Security. Security & Privacy,
IEEE, Volume:11 , Issue: 6 , 74-76.

Choi, Y. H., Han, B. J., Bae, B. C., Oh, H. G., & Sohn, K. W. (2012). Toward extracting malware
features for classification using static and dynamic analysis. Computing and Networking Technology
(ICCNT), 2012 8th International Conference on (pp. 126- 129). Gueongju: IEEE CONFERENCE
PUBLICATIONS.

Cuckoo Foundation. (2014). What is Cuckoo? Retrieved from readthedocs.org:
http://cuckoo.readthedocs.org/en/latest/introduction/what/

Dai, S.-Y., & Kuo, S.-Y. (2007). MAPMon: A Host-Based Malware Detection Tool. International
Symposium on Dependable Computing, 2007. PRDC 2007. 13th Pacific Rim (pp. 349-356).
Melbourne, Qld.: IEEE CONFERENCE PUBLICATIONS.

Gabriel, R., Hoppe, T., Pastwa, A., & Sowa, S. (2009). Analyzing Malware Log Data to Support
Security Information and Event Management Some Research Results. Advances in Databases,
Knowledge, and Data Applications, 2009. DBKDA '09. First International Conference on (pp. 108 -
103). Gosier: IEEE CONFERENCE PUBLICATIONS.

Garber, L. (2013). News Briefs. Computer Volume:46 , Issue: 8 , 18-20.

Gorecki, C., Freiling, F. C., Kührer, M., & Holz, T. (2011). TrumanBox: Improving Dynamic Malware
Analysis by Emulating the Internet. Stabilization, Safety, and Security of Distributed Systems , 208-
222.

Jordan, C., Chang, A., & Luo, K. (2009). Network Malware Capture. Conference For Homeland
Security, 2009. CATCH '09. Cybersecurity Applications & Technology (pp. 293- 296). Washington, DC:
IEEE CONFERENCE PUBLICATIONS.

Kangarlou, A., Xu, D., Ruth, P., & Eugster, P. (2007). Taking Snapshots of Virtual Networked
Environments. Virtualization Technology in Distributed Computing (VTDC), 2007 Second
International Workshop on (pp. 1-8). Reno, NV: IEEE CONFERENCE PUBLICATIONS.

Kawakoya, Y., Iwamura, M., & Itoh, M. (2010). Memory behavior-based automatic malware
unpacking in stealth debugging environment. 5th International Conference on Malicious and
Unwanted Software (MALWARE), (pp. 39-45). Nancy, Lorraine: IEEE CONFERENCE PUBLICATIONS.

Li, X., Duan, H., Liu, W., & Wu, J. (2010). The growing model of Botnets. Green Circuits and Systems
(ICGCS), 2010 International Conference on (pp. 414-419). Shanghai: IEEE CONFERENCE
PUBLICATIONS.

M.Damshenas, A.Dehghantanha, & R.Mahmoud. (2013). A survey on malware propagation,
analysis, and detection. International Journal of Cyber-Security and Digital Forensics , 10-29.

M.Egele, T.Scholte, E.Kirda, & C.Kruegel. (2011). A survey on Automated Dynamic Malware Analysis
Techniques and Tools. ACM Computing Surveys , 1-49.

Madani, A., Rezayi, S., & Gharaee, H. (2011). Log management comprehensive architecture in
Security Operation Center (SOC). Computational Aspects of Social Networks (CASoN), 2011
International Conference on (pp. 284-289). Salamanca: IEEE CONFERENCE PUBLICATIONS.

Massicotte, F., Couture, M., Normandin, H., & Michaud, F. (2012). A Testing Model for Dynamic
Malware Analysis Systems. Software Testing, Verification and Validation (ICST), 2012 IEEE Fifth
International Conference on (pp. 826-833). Montreal, QC: IEEE CONFERENCE PUBLICATIONS.

Microsoft. (2014). Sysinternals System Information Utilities. Retrieved 6 10, 2014, from
Microsoft.com: http://technet.microsoft.com/en-us/sysinternals/bb795535

Moser, A., Kruegel, C., & Kirda, E. (2007). Limits of Static Analysis for Malware Detection. Computer
Security Applications Conference, 2007. ACSAC 2007. Twenty-Third Annual (pp. 421-430). Miami
Beach, FL: IEEE CONFERENCE PUBLICATIONS.

Patel, V. (2012). A practical solution to improve cyber security on a global scale. Cybersecurity
Summit (WCS), Third Worldwide (pp. 1-5). New Delhi: IEEE CONFERENCE PUBLICATIONS.

Peng, W., Qingping, G., Huijuan, S., & Xiaoyi, T. (2010). A Guess to Detect the Downloader-like
Programs. Distributed Computing and Applications to Business Engineering and Science (DCABES),
2010 Ninth International Symposium on (pp. 458-461). Hong Kong: IEEE CONFERENCE
PUBLICATIONS.

Sandboxie Holdings. (2014). Sandboxie. Retrieved 6 10, 2014, from www.sandboxie.com:
http://www.sandboxie.com/

Shadowserver. (2014). Malwr-About. Retrieved 6 28, 2014, from Malwr: https://malwr.com/about/

Shan, A., & Shuangzhou, G. (2011). A enhancement technology about system security based on
dynamic information flow tracking. Artificial Intelligence, Management Science and Electronic
Commerce (AIMSEC), 2011 2nd International Conference on (pp. 6108-6111). Deng Leng: IEEE
CONFERENCE PUBLICATIONS.

Siko, T. a. (2014). FakeNet. Retrieved 6 12, 2014, from Running the Gauntlet:
http://practicalmalwareanalysis.com/fakenet/

Sikorski, M., & Honig, A. (2012). Practical Malware analysis. San Francisco: No Starch Press.

Stone-Gross, B., Cova, M., Gilbert, B., Kemmerer, R., Kruegel, C., & Vigna, G. (2011). Analysis of a
Botnet Takeover. Security & Privacy, IEEE Volume: 9 Issue: 1 , pp. 64-72.

VirusSign. (2014). Virussign -About. Retrieved 6 28, 2014, from VirusSign:
http://www.virusign.com/about.php

Virustotal. (2014, 6 27). About VirusTotal. Retrieved 6 27, 2014, from virustotal.com:
https://www.virustotal.com/el/about/

VMware. (2014, July 11). Best practices for virtual machine snapshots in the VMware environment
(1025279). Retrieved July 13, 2014, from VMware:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1025279

Wang, H.-T., Mao, C.-H., Wei, T.-E., & Lee, H.-M. (2013). Clustering of Similar Malware Behavior via
Structural Host-Sequence Comparison. Computer Software and Applications Conference
(COMPSAC), 2013 IEEE 37th Annual (pp. 349-358). Kyoto: IEEE CONFERENCE PUBLICATIONS.

Wu, Y., Zhang, B., Lai, Z., & Su, J. (2012). Malware network behavior extraction based on dynamic
binary analysis. IEEE 3rd International Conference on Software Engineering and Service Science
(ICSESS) (pp. 316-320). Beijing: IEEE CONFERENCE PUBLICATIONS.

Xie, P., Lu, X., Su, J., Wang, Y., & Li, M. (2013). iPanda: A comprehensive malware analysis tool.
Information Networking (ICOIN), 2013 International Conference on (pp. 481-486). Bangkok: IEEE
CONFERENCE PUBLICATIONS.

Yee, C. L., Chuan, L. L., Ismail, M., & Zainal, N. (2012). A static and dynamic visual debugger for
malware analysis. Communications (APCC), 2012 18th Asia-Pacific Conference on (pp. 765-769). Jeju
Island: IEEE CONFERENCE PUBLICATIONS.

Yin, H., & Song, D. (2013). Automatic Malware Analysis, An Emulator Based Approach. New York:
Springer.

Yoshioka, K., Hosobuchi, Y., Orii, T., & Matsumoto, T. (2010). Vulnerability in Public Malware
Sandbox Analysis Systems. Applications and the Internet (SAINT), 2010 10th IEEE/IPSJ International
Symposium on (pp. 256-268). Seoul: IEEE CONFERENCE PUBLICATIONS.

Yu, S., Gu, G., Barnawi, A., Guo, S., & Stojmenovic, I. (2014). Malware Propagation in Large-Scale
Networks. Knowledge and Data Engineering, IEEE Transactions on , PP ( 99 ), 1.

Appendix A: Project Proposal

EDINBURGH NAPIER UNIVERSITY SCHOOL OF COMPUTING

MSc RESEARCH PROPOSAL

1.0 Student details

Last (family) name Katsamakis

First name Nikolaos

Napier matriculation number 40132614

2.0 Details of your programme of study

MSc Programme title: Advanced Security and Digital Forensics

Year that you started your diploma modules: 2013

Month that you started your diploma modules: September

Mode of study of diploma modules: Full-time

Date that you completed/will complete your diploma modules at Napier: August 2014

3.0 Project outline details

Please suggest a title for your proposed project. If you have worked with a supervisor on
this proposal, please provide the name. NB you are strongly advised to work with a member
of staff when putting your proposal together.

Title of the proposed project: Dynamic Malware Analysis and zero Day detection through Splunk

Name of supervisor: Richard Macfarlane

I do not have a member of staff lined up to supervise my work

4.0 Brief description of the research area - background

Please provide background information on the broad research area of your project in the box
below. You should write in narrative (not bullet points). The academic/theoretical basis of your
description of the research area should be evident through the use of references. Your
description should be between half and one page in length.

Dynamic Malware Analysis and Zero day detection through Splunk

Dynamic Malware Analysis is becoming increasingly popular in malware analysis, since it provides
automatic detection of the actions that malware can perform when executed in a system or network.
The malware is analyzed by the actions it performs in a secure and secluded system or network, also
known as a sandboxed or emulated environment. When malware is tested in different dynamic
malware analysis systems, it is likely that there will be discrepancies in the reports for specific
samples, and these discrepancies arise from implementation and various other issues. In addition,
not all systems produce the same results in terms of categorization and quantity. This points to
problems concerning the efficiency of the systems and their detection abilities, and highlights the
need for a unified approach to implementing such systems. Organizations nowadays widely use
Security Information and Event Management systems in their operations centres to help them
process enormous amounts of security data, originating from a vast number and variety of
monitoring systems, which must be analyzed quickly and accurately to detect attacks and deploy
countermeasures that mitigate the resulting risks. Splunk is a tool that can analyze enormous
amounts of logs and network traffic; it also provides modules that analyze DNS logs extensively to
identify activity that does not comply with normal network behaviour, while offering deep analysis
and visualization of logs. Its monitoring abilities resemble the monitoring techniques implemented in
common DMA systems. It is therefore interesting to explore Splunk's ability to dynamically detect
malware and to compare it with Dynamic Malware Analysis systems.

5.0 Project outline for the work that you propose to complete

Please complete the project outline in the box below. You should use the emboldened text as
a framework. Your project outline should be between half and one page in length.

The idea for this research arose from: CSN11123 Advanced Network and Cloud
Forensics

The aims of the project are as follows:

1) Review the literature on malware analysis and point out the benefits of Dynamic Malware Analysis

2) Connect the literature of Dynamic Malware Analysis with SIEMs

3) Design a setup to be used to detect malware by integrating SIEM technology (Splunk) analysis

4) Deploy common DMA systems and a Splunk-based sandbox to perform analysis on 10 malware
samples of different types

5) Evaluate DMA systems and compare their findings to Splunk's ability to assist in the detection of
threats and malware activity on a system

The main research questions that this work will address include:

What are the benefits and issues of dynamic malware analysis platforms?

Are SIEM systems able to perform Dynamic Malware Analysis?

Is Splunk able to dynamically detect malware in a sandboxed system?

Is Splunk better or worse than other Dynamic Malware analysis platforms with multiple
functions like Cuckoo?

The software development/design work/other deliverable of the project will be:

The design of a comparison procedure to assess DMA systems and the ability of SIEM
technology to perform Dynamic Malware Analysis in a sandboxed environment.

The project will involve the following research/field work/experimentation/evaluation:



Deployment of common malware analysis platforms and execution of known malware samples, in
order to determine how effective each system is in detecting the actions the malware is performing,
and comparison with the results from Splunk for discrepancies.

This work will require the use of specialist software:

VMware Workstation, Splunk, Cuckoo, Sandboxie, FakeNet, Windows Server 2008, and malware
repositories.

6.0 References

Please supply details of all the material that you have referenced in sections 6 and 7 above.
You should include at least three references, and these should be to high quality sources
such as refereed journal and conference papers, standards or white papers. Please ensure
that you use a standardized referencing style for the presentation of your references, e.g.
APA, as outlined in the yellow booklet available from the School of Computing office and
http://www.soc.napier.ac.uk/~cs104/mscdiss/moodlemirror/d2/2005_hall_referencing.pdf

Aguirre, I., & Alonso, S. (2012). Improving the Automation of Security Information Management:
A Collaborative Approach. Security & Privacy, IEEE Volume:12 , Issue: 1 , 55-59.

Bayer, U., Moser, A., Krugel, C., & Kirda, E. (2006). Dynamic analysis of malicious code. Journal in
Computer Virology 2 , 67-77.

Buster. (2013). Buster Sandbox Analyzer. Retrieved 6 10, 2014, from http://bsa.isoftware.nl/:
http://bsa.isoftware.nl/

Choi, Y. H., Han, B. J., Bae, B. C., Oh, H. G., & Sohn, K. W. (2012). Toward extracting malware
features for classification using static and dynamic analysis. Computing and Networking
Technology (ICCNT), 2012 8th International Conference on (pp. 126- 129). Gueongju: IEEE
CONFERENCE PUBLICATIONS.

Cuckoo Foundation. (2014). What is Cuckoo? Retrieved from readthedocs.org:
http://cuckoo.readthedocs.org/en/latest/introduction/what/

Dai, S.-Y., & Kuo, S.-Y. (2007). MAPMon: A Host-Based Malware Detection Tool. International
Symposium on Dependable Computing, 2007. PRDC 2007. 13th Pacific Rim (pp. 349-356).
Melbourne, Qld.: IEEE CONFERENCE PUBLICATIONS.

7.0 Ethics

If your research involves other people, privacy or controversial research there may be ethical issues to
consider (please see the information on the module website). If the answer below is YES then you
need to complete a research Ethics and Governance Approval form (available on the website:
http://www.ethics.napier.ac.uk).

Does this project have any ethical or governance issues related to working with, studying or
observing other people? (YES/NO): NO

8.0 Supervision timescale

Please indicate the mode of supervision that you are anticipating. If you expect to be away
from the university during the supervision period and may need remote supervision please
indicate.

Weekly meetings over 1 trimester ✓

Meetings every other week over 2 trimesters

Other

Appendix B: Project Plan

Figure 79: Project plan Gantt chart



Appendix C: Project Diary

Figure 80: Diary week 4



Figure 81: Diary week 5



Figure 82: Diary week 6

Figure 83: Diary week 7



Figure 84: Diary week 9



Appendix D: DMA Analysis Report samples


Zeus Botnet sample analysis

28520ba137f8872b2256205f37e56c0aa7f96b5b16c8a805aa591022dc940638

FakeNet:

Figure 85: FakeNet HTTP request reported



Figure 86: DNS requests and socket creation



Figure 87: FakeNet Socket creation (continue)



Buster Sandbox Analyzer:

Figure 88: Buster Analysis report



Figure 89: Buster File system and registry report



Figure 90: Buster Processes and Mutexes report



Cuckoo:

Figure 91: Cuckoo Binary report

Figure 92: Cuckoo Summary and screenshots



Figure 93: Cuckoo registry report

Figure 94: Cuckoo registry report (continued)



Figure 95: Cuckoo execution screenshot



Anubis:

Figure 96: Anubis binary report



Figure 97: Anubis process summary

Figure 98: Anubis process analysis



Figure 99: Anubis process analysis 2

Figure 100: Anubis process analysis 3



Zeus Banking Spyware that contacts C&C

3ff49706e78067613aa1dcf0174968963b17f15e9a6bc54396a9f233d382d0e6

FakeNet:

Figure 101: FakeNet Adobe execution error



Buster Sandbox Analyzer:

Figure 102: Buster summary report



Figure 103: Buster registry analysis



Figure 104: Buster Process and Mutex analysis



Cuckoo:

Figure 105: Cuckoo binary analysis

Figure 106: Cuckoo summary and screenshots



Figure 107: Cuckoo Adobe execution error

Figure 108: Cuckoo behavior summary



Figure 109: Cuckoo Mutexes and Registry

Figure 110: Cuckoo Process activity



Figure 111: Cuckoo registry activity

Figure 112: Cuckoo Service initiation



Anubis:

Figure 113: Anubis process report

Figure 114: Anubis network activity report



Figure 115: Anubis spawned process report

Figure 116: Anubis spawned process report (continued)



Figure 117: Anubis spawned process report

Figure 118: Anubis Mutexes report
