
Accepted Manuscript

A Lightweight Live Memory Forensic Approach Based on Hardware Virtualization
Yingxin Cheng, Xiao Fu, Xiaojiang Du, Bin Luo, Mohsen Guizani
PII: S0020-0255(16)30501-1
DOI: 10.1016/j.ins.2016.07.019
Reference: INS 12345

To appear in: Information Sciences

Received date: 14 November 2015
Revised date: 29 June 2016
Accepted date: 6 July 2016

Please cite this article as: Yingxin Cheng, Xiao Fu, Xiaojiang Du, Bin Luo, Mohsen Guizani, A
Lightweight Live Memory Forensic Approach Based on Hardware Virtualization, Information Sciences
(2016), doi: 10.1016/j.ins.2016.07.019

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.

A Lightweight Live Memory Forensic Approach Based on Hardware Virtualization


Yingxin Cheng a, Xiao Fu a,*, Xiaojiang Du b, Bin Luo a, Mohsen Guizani c
a

State Key Laboratory for Novel Software Technology, Software Institute at Nanjing University, China.
b

Dept. of Electrical and Computer Engineering, University of Idaho, USA.

Dept. of Computer and Information Sciences, Temple University, USA.

Abstract

The results of memory forensics can not only be used as evidence in court but are also beneficial for analyzing
vulnerabilities and improving security. Thus, memory forensics has been widely used in many fields, including cloud
security. Traditional memory forensics, usually an after-the-fact method, is time-consuming and often loses important
transient information. Thus, live methods, which investigate memory directly, have been proposed. However, most of them
are kernel based and easy to detect or confuse. Although virtualization technology can overcome these shortcomings, it
must be preinstalled and has a high cost. To solve these problems, we propose a lightweight live memory forensic
framework based on hardware virtualization. It can build a virtualization environment on-the-fly. The operating system
is migrated to the virtual machine without termination or modification. Then, the forensic methods can acquire and
analyze evidence at the hypervisor level. Two novel forensic methods are proposed to verify the effectiveness of the
framework. They focus on acquiring accurate data and system behavior, respectively. The main ideas are guaranteeing
data accuracy in multi-view extraction and analyzing memory behavior in a para-synchronous style. Experiments have
shown that these methods are able to obtain reliable and integrated evidence at an acceptable cost.

KeyWords: Hardware Virtualization, Live Forensics, Memory Forensics, Lightweight Forensic Framework

* Corresponding author at: State Key Laboratory for Novel Software Technology, Software Institute at Nanjing University,
China. E-mail: fuxiao@nju.edu.cn.
E-mail addresses: yingxincheng@gmail.com (Yingxin Cheng), fuxiao@nju.edu.cn (Xiao Fu), xjdu@temple.edu (Xiaojiang Du),
luobin@nju.edu.cn (Bin Luo), mguizani@ieee.org (Mohsen Guizani).

1. Introduction

Memory forensics investigates illegal behaviors by acquiring and analyzing volatile memory data. This is done
because memory can provide information that disks do not contain. Moreover, although binary code can be encrypted
or obfuscated, all illegal processes still must be executed in memory and will inevitably leave some footprint. Thus,
evidence from memory is more reliable. The results of forensics can not only be used as evidence in court but are also
beneficial for analyzing vulnerabilities and improving the security of systems. Due to these advantages, memory
forensics has been widely used in many fields, including cloud security and network security.

Traditional memory forensic methods are after-the-fact methods. They obtain memory dumps after illegal events have
occurred and then analyze the dumps to obtain evidence. However, memory data is transient; therefore, a single dump
cannot show all of the states of a system. In addition, with the development of anti-forensics, illegal processes
usually hide themselves and only attack when necessary. Thus, memory dumps seem quite normal in most cases, which makes
it very difficult for forensic methods to identify illegal behaviors. Consequently, many researchers have turned to
analyzing live systems directly for evidence.
The analysis of live systems faces several challenges. First, to be undetectable to criminals, live forensic methods
should try not to interfere with or modify the target system. Moreover, some machines being investigated (e.g.,
e-business servers) cannot be restarted or stopped for a long period of time. Thus, the live forensic method should be
installable and uninstallable whenever necessary, and the entire forensic procedure should not affect normal services.
Second, it is difficult to obtain reliable evidence and keep it easy to understand. It is easy to obtain intelligible
data based on the system interfaces. However, these data are not reliable because the interfaces may be tampered with
by the criminal. The instructions and events executing on hardware cannot be tampered with, but they are difficult to
understand without sufficient knowledge. This is the so-called semantic gap, and it is an important challenge that live
memory forensic methods should consider. Third, memory data is transient, with no fixed addresses in memory. Thus, it
is hard for forensic methods to track. For example, the behaviors of illegal processes are usually instantaneous.
Forensic tools can find that the kernel has been modified via comparison, but they cannot catch the modifying behavior
itself, so they obtain no information about who performed it. Thus, catching these transient illegal behaviors is a
major challenge that live memory forensic methods must address.
In this paper, a lightweight live memory forensic framework is proposed. Its main idea is to build a virtualization
environment on-the-fly and to intercept and analyze hardware events through hardware virtualization. The operating
system being investigated can be migrated into the virtual machine without termination or modification. Then, the
forensic methods based on this framework can acquire and analyze evidence at the hypervisor level. The forensic
procedure is quick, and the evidence it produces is consistent. After investigation, the target OS continues running
automatically. Hardware virtualization is now widely supported by popular processors. In addition, this technique has
been proven applicable to not only Windows but also Linux [5]. Thus, our framework can be applied on most current
popular hardware platforms and operating systems without installing any additional hardware. The platform can also be
installed on demand and uninstalled after forensics.


To verify the effectiveness of our framework, two novel forensic methods are proposed. Unlike current
systems (HyperSleuth [17], Vis [34] and VAIL [35]), which mainly focus on dumping memory, our methods provide
solutions to two difficult challenges that live memory forensics faces.

First, to obtain evidence that is both reliable and intelligible, a multi-view evidence extraction method is proposed.
The basic idea is to obtain accurate information from both inside and outside the operating system. For example, a
process must be switched onto a processor to run; therefore, the currently running processes can be detected from the
related context-switch hardware events. At the same time, detailed process information can be easily acquired from
system interfaces. By combining the information obtained from the two sides, reliable multi-view evidence that
contains a list of running processes and their details, and even a list of processes that have recently run, can be
generated. Moreover, by cross-checking the information from the two sides, hidden processes and other tampered
information can be identified.

Second, to catch transient illegal behaviors, a para-synchronous memory behavior analysis method is proposed. It
can trace memory modifications by controlling page-grained permissions and intercepting hardware events. In
addition, it can create a complete evidence chain that includes not only specific memory modifications but also the
related processes and their EXE file paths. It can also capture the corresponding running states of the process for
further detailed analysis. These pieces of evidence are very important for proving where memory modifications come
from. Compared with the widely used instruction-level memory behavior analysis, which requires software emulators such
as QEMU [6] and complex P2V migration, our method can reach the same accuracy in revealing the processes responsible
for memory writes and the precise memory addresses they modify. Moreover, the overhead is much lower because our method
can select the traced memory range and optimize the performance, and the target memory is accessed by the same process.
We also consider supporting typical Symmetric Multi-Processor (SMP) systems. Various experiments have shown that the
memory usage of our method and its performance impact on the investigated system are acceptable.

2. Related Work
According to where the methods are run, current memory forensic methods can be divided into the following
categories.
Methods running in the OS Kernel. These types of methods are easy to understand and have been widely used.
They are based on certain system interfaces and are able to obtain a partial or complete dump of memory and then
obtain the system state from this dump. EnCase [13] and Win32dd [31] are two examples. Unfortunately, malware that
also runs in the kernel can easily reduce their effectiveness, for example, by using the anti-forensic tools provided
by Metasploit [18].

Methods based on hardware. Moon presented a forensic method based on a PCI device. It can locate malicious
system transactions by monitoring the bus traffic and can access physical memory by DMA. The forensic procedure does
not require the cooperation of the CPU or any software. However, it does require the installation of a specific piece of
hardware or interface. Thus, it is not very convenient for live forensics. Wang [14] used a similar method to obtain
reliable memory dumps. However, the research in [24] shows that these types of methods are also vulnerable to attack.
Methods based on an Independent OS Kernel. BodySnatcher [27] can inject an independent acquisition-specific
OS into the potentially subverted host OS kernel, snatching full control of the host's hardware, and then obtain the
memory dump. The method is more resistant to subversion due to its reduced attack surface. However, it is very
complex to implement, especially in a multi-processor environment, because it has to control the switch between two
kernels to intercept the key forensic-related events and analyze them. This control will break the integrity of the
system and make the forensic mechanism vulnerable.


Methods based on SMM. SMMDumper, implemented as x86 firmware, leverages the System Management Mode
(SMM) of Intel CPUs to create a complete and reliable snapshot of the state of the system that, with minimal hardware
support, is resilient to malware attacks. It is the first technique that is able to atomically acquire the entirety of the
volatile memory, overcoming the SMM-imposed 4-GB barrier while providing integrity guarantees and running on
commodity systems. However, to support live forensics, this type of method has to modify the control logic of the
system to let it generate SMI interrupts. This makes it easy for the criminal to detect.

Methods based on Software VMMs. It is convenient to use software VMMs (Virtual Machine Monitors), such as
QEMU, to implement synchronous memory monitoring by interfering with the binary translation process. Various
works have applied this method to record a process's memory behavior for dynamic analysis. For example, PoKeR [22]
is a kernel rootkit profiler designed to produce multi-aspect profiles, including hooking behavior. SigGENE [29] and
DataGene [20] profile malware from its memory access patterns related to kernel objects. KernelGuard [21] prevents
malicious memory modifications using a similar approach. Although these methods are powerful, they are slow and not
suitable for monitoring systems running on bare hardware. Thus, they are not practical for live memory forensics.
Methods based on Hardware Virtualization. In hardware-assisted virtualization, the hardware provides
architectural support that facilitates building a virtual machine monitor and allows guest OSes to be run in isolation.
Thus, it is more efficient and secure than software VMMs. It is already used in forensics. For
example, Spider [11] can provide stealth binary instrumentation and debugging capabilities. DRAKVUF [16] is able to
track the behavior of both user- and kernel-level malware. Ether [12] can trace all system calls by monitoring the value
of SYSENTER_EIP_MSR; based on these system calls, it can analyze the behavior of malware. Experiments on 25,000
malware samples show that Ether remains transparent and defeats obfuscation tools that evade existing approaches.
Similar works are SPEMS [28] and MalGene [15]. However, none of them consider the migration issue, i.e., how to
build a virtualization environment on a target machine. This greatly affects the practicability of these methods.
Very little research has used a lightweight hypervisor for live forensics, and its potential to migrate a live
system is underestimated. Rutkowska [23] first implemented a prototype that supports live migration; this prototype is
not for forensics but for hiding Trojans. Nevertheless, it demonstrates the ability of this approach to subvert a live
system from the hypervisor level. HyperSleuth [17] is the first lightweight forensic virtual machine that supports live
migration. It can build a virtualization environment on-the-fly. The operating system is migrated to the virtual machine
without termination or modification. Based on this VM, a dump of memory and the records of system calls can be obtained.
However, its cost is high (greater than 400%). Vis [34] also implemented a memory forensic framework supporting live
migration, and its cost is lower. However, it is only able to obtain memory dumps. VAIL [35] follows Vis and can
monitor network activities through a PCI device. However, there is no solution that focuses on monitoring transient
memory behavior based on hardware virtualization, which is important in profiling illegal events, not to mention
keeping the evidence both reliable and intelligible. Thus, we carefully studied these two issues and present
solutions to them in this paper.

3. System Overview
Our system supports 64-bit Windows and is implemented based on Intel VT with EPT (Extended Page Table), which
are Intel's implementations of hardware virtualization and SLAT (second-level address translation), respectively. Our
system is applicable to most popular Intel platforms.


The Intel VT-x technique separates CPU execution into two modes: VMX root mode and VMX non-root mode.
Our system runs in VMX root mode; therefore, it can monitor the target OS running in non-root mode and obtain
related evidence at a higher privilege level. Moreover, because of the isolation of the two modes, this forensic
procedure is transparent to the target OS.

3.1. Overall Design of the Lightweight Live-forensic Framework

Our framework is composed of several components. Each of them encapsulates data structures and code to provide
basic functions for forensics (see Fig. 1). The main part of the framework is the VM monitor, which contains four
components: the event processor, evidence exporter, controller, and memory virtualization module. The other part of
the framework is the initiation driver, whose main tasks include enabling hardware virtualization, allocating memory,
and migrating the live OS.

Fig. 1. Architecture of the Lightweight Live-forensic Framework

Event Processor. The event processor has full control of the VMCS (virtual machine control structure), which determines
which events trap the target OS into the forensic logic. Intel VT provides a series of hardware events. To use the same
forensic logic on different Intel platforms, the event processor abstracts these events into forensic events. For
example, a modification of the CR3 register is defined as a process context switch event, and a write to the
SYSENTER_EIP_MSR register is defined as a system call event. After catching these events, the event processor delivers
them to other components or forensic methods. Events are delivered first to components (such as the controller and
memory virtualization module) and then to forensic methods.
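
The abstraction and delivery order described above can be illustrated with a minimal sketch. It is written in Python purely for illustration; the class, the exit-reason strings and the handler signature are assumptions and not part of the framework, which runs inside a hypervisor.

```python
from enum import Enum, auto


class ForensicEvent(Enum):
    PROCESS_SWITCH = auto()
    SYSTEM_CALL = auto()


# Hypothetical (exit reason, register) pairs mapped to forensic events.
ABSTRACTION = {
    ("CR_WRITE", "CR3"): ForensicEvent.PROCESS_SWITCH,
    ("MSR_WRITE", "SYSENTER_EIP_MSR"): ForensicEvent.SYSTEM_CALL,
}


class EventProcessor:
    def __init__(self):
        self._components = []  # e.g. controller, memory virtualization module
        self._methods = []     # forensic methods built on the framework

    def register_component(self, handler):
        self._components.append(handler)

    def register_method(self, handler):
        self._methods.append(handler)

    def dispatch(self, exit_reason, register, payload):
        """Translate a hardware event and deliver it, components first."""
        event = ABSTRACTION.get((exit_reason, register))
        if event is None:
            return None  # not a forensic event; nothing is delivered
        for handler in self._components + self._methods:
            handler(event, payload)
        return event
```

Keeping the mapping in one table means that a forensic method only ever sees `ForensicEvent` values, which is one way to read the claim that the same forensic logic can be reused across platforms.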

Memory Virtualization Module. The memory virtualization module is based on EPT. It can control the translation
process between the physical address in the target OS and the real physical address in the host machine. It can also
isolate the memory used by the hypervisor and that used by the target OS. Moreover, it provides the memory access
controls. These controls are the basis of the live behavior analysis.

The memory virtualization module provides three modes in total. The first is disable mode, i.e., SLAT is disabled.
This mode gives high efficiency to the forensic methods that do not need memory virtualization. In this mode, the
memory of the guest OS is the same as the memory of the host. Thus, the forensic framework cannot control the guest
memory. The second is single mode, i.e., all processors use a single set of second-level paging structures. In this
mode, the control of memory mapping and memory access affects all processors. It is applicable to the forensic
methods that only need basic memory virtualization. The third is precise mode, i.e., each processor has its own
second-level paging structures. This mode is powerful but complex. It can control the memory mapping and memory
access for each processor separately. Our para-synchronous behavior analysis method uses this mode.
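
The difference between the three modes can be made concrete with a small model. This is only a sketch under simplifying assumptions: each set of second-level paging structures is reduced to a dictionary from page number to a write-permission flag, and all names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    DISABLE = "disable"  # SLAT off: guest memory equals host memory
    SINGLE = "single"    # one set of paging structures shared by all CPUs
    PRECISE = "precise"  # one set of paging structures per CPU


class MemoryVirtualization:
    def __init__(self, mode, cpu_count):
        self.mode = mode
        if mode is Mode.DISABLE:
            self._tables = []
        elif mode is Mode.SINGLE:
            shared = {}
            self._tables = [shared] * cpu_count  # every CPU sees the same table
        else:
            self._tables = [{} for _ in range(cpu_count)]

    def set_writable(self, cpu, page, writable):
        if self.mode is Mode.DISABLE:
            raise RuntimeError("guest memory cannot be controlled in disable mode")
        self._tables[cpu][page] = writable

    def is_writable(self, cpu, page):
        if self.mode is Mode.DISABLE:
            return True
        return self._tables[cpu].get(page, True)  # pages are writable by default
```

In single mode, revoking write permission through one processor is visible to all of them; in precise mode it stays local to that processor, which is the property the per-processor behavior analysis relies on.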

Controller. The main goal of the controller is to help investigators efficiently and safely control the forensic method.
It provides four functions to the forensic methods to meet their requirements for command control. The first function is
the ability to send and receive the command type and content. The second is notification of the result of a command.
The third is helping various commands choose processors. The last is maintaining the safety and transparency of the
transmission of commands. Based on these functions, forensic methods need not consider the differences between
platforms. They can use unified software interfaces.
The implementation of the controller is based on the CPUID event. In Intel VT, CPUID is an instruction that
causes a VM exit unconditionally, so the hypervisor has to handle this event. The traditional function of CPUID is to
help applications query CPU characteristics. An application puts the query type in the RAX register and executes
the CPUID instruction at any privilege level, and then the CPU puts the result into this register. Thus, an application
can identify characteristics of the CPU according to this result. In our system, we use this instruction to control the
forensic hypervisor. The investigator puts the command type into RAX, the parameters into RBX and the password into
RCX, and then executes CPUID. The event processor verifies the password. If it is correct, the corresponding
forensic method or basic function (such as the uninstall function) is invoked. The result, return value and
verification code are filled into RAX, RBX and RCX, respectively, and then the investigator is notified.
Otherwise, if anything is wrong, the normal CPUID procedure is executed. Thus, the OS and malware cannot tell
the difference.
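
The register protocol can be sketched as a plain function over a register dictionary. This is an illustrative model only: the password, the verification code, the command table and the placeholder CPUID result are all assumptions, and a real handler would run inside the hypervisor's VM-exit path.

```python
SECRET = 0x5EC12E7             # hypothetical password known to the investigator
VERIFICATION_CODE = 0xF02E51C  # hypothetical marker returned in RCX


def native_cpuid(regs):
    """Stand-in for the ordinary CPUID result; the values are placeholders."""
    return {"rax": 0x000306A9, "rbx": 0, "rcx": 0}


def handle_cpuid_exit(regs, commands):
    """Run a forensic command if the password matches, else behave normally."""
    handler = commands.get(regs["rax"])
    if regs["rcx"] != SECRET or handler is None:
        # Wrong password or unknown command: indistinguishable from plain CPUID.
        return native_cpuid(regs)
    result, value = handler(regs["rbx"])
    return {"rax": result, "rbx": value, "rcx": VERIFICATION_CODE}
```

A caller that does not know the password always receives the ordinary result, which is the property the text relies on for transparency.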

Some commands must choose a CPU; this is implemented by filling the CPU id into RBX. Each CPU has its own
data structure that contains the commands waiting to be executed. Whenever a CPU traps into root mode, it
checks whether there are commands waiting to be executed and executes them immediately.
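
One possible shape for the per-CPU data structure is a queue per processor that is drained each time that processor enters the hypervisor. The sketch below assumes commands are plain callables; the class and method names are hypothetical.

```python
from collections import deque


class PerCpuCommands:
    def __init__(self, cpu_count):
        # One pending-command queue per CPU.
        self._queues = [deque() for _ in range(cpu_count)]

    def post(self, cpu_id, command):
        """Queue a command for a specific CPU (the CPU id arrives in RBX)."""
        self._queues[cpu_id].append(command)

    def on_vm_exit(self, cpu_id):
        """Run every command waiting for this CPU, in arrival order."""
        results = []
        queue = self._queues[cpu_id]
        while queue:
            results.append(queue.popleft()())
        return results
```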

Evidence Exporter. This module provides a fast and reliable evidence export mechanism. Forensic evidence is
usually large in size. Thus, its transmission is time-consuming and at risk of being tampered with by malware. To
address these issues, our evidence exporter acquires a block of memory when it is initialized. This block is mapped into
the address space of both the target OS and the forensic monitor. Thus, the forensic framework can write acquired data
into this block and then transmit them out by a certain mechanism (such as an application running on the target OS). To
maintain the integrity of the data, encryption and MD5 signatures are used in transmission. Our framework provides a
unified export interface to the forensic methods. This allows them to concentrate on forensics. Moreover, this interface
is easy to extend.
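
The integrity check on the shared block can be sketched as follows. The paper does not specify how the MD5 signature is computed, so this sketch assumes a keyed MD5 (HMAC-MD5) tag prepended to the data; the function names and the key handling are hypothetical, and the encryption step is omitted.

```python
import hashlib
import hmac

TAG_LEN = hashlib.md5().digest_size  # MD5 digests are 16 bytes long


def seal(evidence: bytes, key: bytes) -> bytes:
    """Prefix the evidence with a keyed MD5 tag before it is written out."""
    tag = hmac.new(key, evidence, hashlib.md5).digest()
    return tag + evidence


def unseal(block: bytes, key: bytes) -> bytes:
    """Verify the tag on the receiving side and return the evidence."""
    tag, evidence = block[:TAG_LEN], block[TAG_LEN:]
    expected = hmac.new(key, evidence, hashlib.md5).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("evidence block was modified after it was written")
    return evidence
```

A keyed tag is used here because an unkeyed digest could simply be recomputed by malware that alters the block while it passes through the target OS.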

The above modules abstract the hardware, certain key forensic functions and command control. They can hide the
differences between platforms and provide unified software interfaces to forensic mechanisms. The basic functions
they provide make the implementation of forensic mechanisms much easier. Moreover, the component-based
design makes the framework easy to extend.

3.2. Design and Implementation of Key Features

Live migration and prevention of anti-forensics are two key features of our framework. This section will introduce
the design and implementation of these two features.
3.2.1. Design and Implementation of Live migration
Live migration is the precondition of live memory forensics. It means building a hypervisor on-the-fly and
migrating the target OS onto the hypervisor without modifying or terminating it. In our framework, an initiation driver
running in the kernel enables the Intel VT of each CPU and injects the forensic VMM under the target OS. When
uninstalled, the forensic VMM disables the Intel VT of all CPUs and frees the occupied memory.
The detailed process of initialization can be described as follows. First, the initiation driver requests non-paged
memory from the target OS. This memory is not returned during the entire lifecycle of the forensic framework.
Then, the forensic framework initializes its own memory management system and records where each memory page is
retrieved from or allocated to. Based on these records, the forensic framework builds its own page tables and isolates
its memory from the target OS. Then, the MMU (Memory Management Unit) of the CPU switches the page tables from
those of the target OS to those of the forensic framework. After that, the framework invokes the initialization
interface of each module or forensic method. The next step of the initiation driver is checking the state of each CPU,
enabling Intel VT, configuring the VMCS and copying the current state of the OS into the virtual machine state in the
VMCS. Now, the control of the CPU has been taken over by the forensic VMM. The running state that is migrated includes
instruction registers, segment registers, tag registers, and the descriptor table. The VMM executes the VMLAUNCH
instruction to resume the running of the target OS. However, this OS is now running in the VM. Then, the forensic
framework virtualizes the next CPU following the same steps. After all of the CPUs have been virtualized, the
initialization phase is finished. The uninstall process is the reverse of initialization.
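
The order of the steps above can be summarized in a small model. It is only a sketch: the `Cpu` class, its fields and the log strings are hypothetical, and the hardware operations (enabling Intel VT, VMLAUNCH) are reduced to flag changes.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    index: int
    state: dict                      # registers and descriptor tables of the live OS
    in_vm: bool = False
    vmcs_guest_state: dict = field(default_factory=dict)


def virtualize_cpu(cpu):
    """Per-CPU step: copy the live state into the VMCS, then resume the OS as a guest."""
    cpu.vmcs_guest_state = dict(cpu.state)  # the guest starts exactly where the OS was
    cpu.in_vm = True                        # stands in for VMLAUNCH


def migrate(cpus):
    """One-time setup followed by virtualizing the CPUs one after another."""
    log = [
        "request non-paged memory",
        "build framework page tables",
        "initialize modules and forensic methods",
    ]
    for cpu in cpus:
        virtualize_cpu(cpu)
        log.append(f"cpu {cpu.index} virtualized")
    return log
```

Because the guest state is a copy of the state at the moment of migration, the OS resumes at the same point and needs no modification.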

The initialization of the forensic framework needs the help of the driver, but the main tasks of this driver are
requesting memory, changing CPU affinity and locking the required area to maintain the consistency of memory data during
the initialization. These are the normal behaviors of any driver. The enabling of Intel VT is executed directly on the
CPU hardware. It does not need the intervention of the OS. Moreover, the entire switch procedure finishes in a
very short amount of time. If any malware wants to detect this procedure, it has to check the state of the CPU
continuously. However, this will reduce the performance of the system and alert the user.
3.2.2. Design and Implementation of features for preventing anti-forensics
As a forensic method, remaining isolated from the target OS and hiding the entire forensic procedure are necessary to
obtain reliable evidence. With the development of anti-forensics, malware can keep silent or uninstall itself as soon as
it detects the forensic mechanism. Some will even destroy the system and hardware. Thus, remaining isolated and
hidden is not only important for reliable evidence but also for protecting the target systems.

To hide the forensic framework, the following mechanisms are used. (1) The forensic events are intercepted based only
on hardware virtualization. There is no modification of the target OS. Thus, malware cannot detect the existence
of the forensic framework by checking for modifications. (2) The memory of the forensic framework is isolated from
the target OS's; therefore, malware cannot detect the existence of the forensic framework by checking memory data.
During initialization, the forensic framework records the address and size of each piece of memory allocated from the
target OS. Then, it modifies the OS's page tables and remaps those pages, which contain the framework's code and
data, to blank pages. Thus, when the OS is resumed, in its eyes, the memory occupied by the forensic framework is
mapped to a blank page. However, when the CPU switches from non-root mode to root mode (at that time, the page
tables also switch to the host's page tables), the VMM can see all of the code and data of the forensic framework.
(3) Malware can detect the existence of the forensic VMM by finding the extra CPU time based on RDTSC
instructions. To prevent this type of detection, the forensic VMM subtracts its execution time when switching back to
non-root mode. This is implemented by modifying the TSC offset field in the VMCS. (4) The VMXE bit of CR4 can
show whether a VMM is running. Thus, our forensic framework sets the corresponding bit of the shadow CR4 in the
VMCS to a value that indicates that no VMM is running. The target OS reads the value from this shadow CR4. (5)
The controller in the forensic framework can ensure that the control commands are safe from anti-forensics because
malware cannot intercept and interpret these commands. (6) The encryption and signing of the shared memory
maintain the safety and transparency of the evidence extraction procedure.
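
Mechanism (3) can be illustrated numerically. With TSC offsetting enabled, a guest RDTSC returns the host counter plus the TSC offset from the VMCS, so subtracting the cycles spent in the hypervisor from the offset hides them from the guest. The class below is a hypothetical model of that bookkeeping, not the framework's code.

```python
MASK64 = (1 << 64) - 1  # the TSC and its offset are 64-bit values


class TscHider:
    def __init__(self):
        self.tsc_offset = 0  # mirrors the TSC offset field of the VMCS

    def on_vm_exit_handled(self, exit_tsc, resume_tsc):
        """Subtract the cycles spent in root mode just before resuming the guest."""
        spent = resume_tsc - exit_tsc
        self.tsc_offset = (self.tsc_offset - spent) & MASK64

    def guest_rdtsc(self, host_tsc):
        """Value the guest observes: host TSC plus offset, modulo 2**64."""
        return (host_tsc + self.tsc_offset) & MASK64
```

For example, if the guest exits at host cycle 1000 and is resumed at cycle 1500, a read taken at host cycle 1600 appears to the guest as cycle 1100, as if the 500 hypervisor cycles had never elapsed.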

Based on the above mechanisms, the forensic framework is almost isolated from and transparent to the target OS.
However, to keep the entire forensic procedure hidden and transparent, forensic methods also need to obey the
following rules. First, they can intercept OS behaviors based only on the events supported by hardware virtualization.
They also cannot modify the target OS. Second, they must reduce the suspended time of the OS by increasing the
frequency of analysis. Finally, these methods should be implemented based on the modules provided by the forensic
framework.

However, research has shown that the VMM can be detected based on the cache of the TLB [23]. This is the
shortcoming of hardware virtualization, and it cannot be overcome by software solutions alone. However, even in this
situation, malware still cannot know what the function of this VMM is.

4. Collecting Process Information Based on Multi-view Extraction


It is very difficult to extract both reliable and intelligible evidence from memory. In this section, we present a
solution: multi-view extraction. Specifically, it is a forensic technique to extract accurate process information from a
running system, because this type of information is very important in memory forensics. Our technique is designed and
implemented based on our forensic framework. Thus, its results can demonstrate the effectiveness of our framework.
4.1. Why choose multi-view extraction

Forensic tools can easily read system state from memory or system APIs, but those data can be easily modified,
erased, or faked by malware. An operating system is so complicated that it maintains copies of its states with intricate
logic. Any modification in the control flow will cause an incorrect system state, leading to malicious data hiding and
attacking. The battle between malware and forensics is endless in the kernel.

The best example lies in process information extraction. An operating system preserves process information in
multiple locations for different purposes. Sometimes a modification to some locations will not crash the running
system. In Windows, for example, malware can unlink the ActiveProcessLink to hide itself without harming the
system. There are already many methods to detect such hidden processes [7].

The most common way is to use the interface NtQuerySystemInformation. As with the Task Manager,
NtQuerySystemInformation traverses ActiveProcessLink to obtain process information. Windows maintains all
of the living processes in this linked list, which can also be accessed from PsActiveProcessHead. However,
malware can hide itself by modifying this list.

The handle table PspCidTable contains information about processes and threads. The system API provides
PsLookupProcessByProcessID to look up that information by PID. However, malware can hide itself by erasing
the related entries in PspCidTable.

Windows maintains a schedule table to switch processes onto processors. Processes can be enumerated by using
KiWaitInListHead and KiWaitOutListHead [25]. The operating system relies on this table to schedule
processes, and there are tools that obtain process information based on this idea. However, the tool named phide
[1] can defeat this type of detection by modifying the schedule logic of the OS.

The function SwapContext performs the switch between processes. Every switch event can be detected if this
function is hooked. It is powerful, but because the modification is located in the kernel, malware can detect it and
defend against it.

Every process has its separate memory space with a set of paging structures. Thus, a process can be detected by
the pattern of paging structures and the EPROCESS structure. Pattern search is used to detect hidden processes
[26]. However, this method is very slow, and malware is able to hide itself in memory using Shadow Walker
[30] technology.

The above examples indicate that it is nearly impossible to extract accurate process information from the kernel.
Every detection attempt has a countermeasure. It is an endless battlefield between malware and forensic technologies.
An effective solution is to monitor the system from a higher privilege level. Unfortunately, there is a semantic gap
between different levels. Information expressed in one view is difficult to translate into and understand from another,
which impedes further analysis.

Our multi-view extraction method provides an ideal solution to obtain both intelligible and accurate information from memory. It analyzes the system from both the hardware and user views simultaneously. The information from those two endpoints guarantees reliable evidence. Any malicious modification, erasure, or fabrication can be detected by comparing the different views. In addition, information that is elusive in one view can be supplemented from the other view.

4.2. The design of multi-view extraction

The basic idea of multi-view extraction is to obtain accurate process information from both inside and outside the operating system. Every process is required to be switched into a processor to run; therefore, the processes can be detected from related hardware events. At the same time, detailed process information can be easily acquired from system interfaces. Thus, multi-view extraction can generate detailed results about processes from the two sides.

Fig. 2 shows that our method can intercept context switch events from hardware with the help of the forensic framework. The hypervisor monitors the hardware continuously and generates a process queue during monitoring. Once the method receives the investigator's commands, it copies the latest process queue, acquires process information from the system API, and then generates results for the investigator.

Fig. 2. Design of Multi-view Extraction.

4.3. The implementation of multi-view extraction

We have implemented the method based on our forensic framework. The current prototype works for the 64-bit version of Windows 7 and supports hardware with Intel VT technology.

4.3.1. The method's implementation

The method's implementation consists of three parts, as follows:


The hypervisor part to monitor the hardware;

The OS part to obtain information from the user view;

The analysis process of the two views.

The hypervisor part is based on the forensic framework and has two purposes shown in Fig. 3.

Fig. 3. Multi-view Extraction Logic.

First, it monitors context switch events and updates the process queue. Whenever a context switch event occurs, the hypervisor first checks whether the new process is already recorded in the process queue. If it is, the hypervisor moves the related entry to the head of the queue; otherwise, a new entry is allocated from the free entry list to record the new process. If there is no free entry, the hypervisor overwrites the last inactive entry of the queue. The queue is hashed by CR3 value to speed up the search.

Second, the hypervisor receives commands from the investigator, updates the states of the processes in the process queue, and then sends them to the analysis process. When the hypervisor copies the process queue, it updates every process's status to check whether the process is still running. With the help of the recorded physical address of EPROCESS, the status can be updated quickly by checking whether the value of DISPATCHER_HEADER is valid.

Each entry contains a process's information, including its PID, CR3 value (the base physical address of the paging structure), the physical address of EPROCESS, and a process switch counter. The hypervisor maintains a global switch counter, which is increased by every context switch event. The process switch counter records the latest value of the global switch counter whenever the process is switched into the processor. Thus, the exact time of every process's most recent run can be reported to investigators.
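
The queue maintenance described above can be sketched as follows. This is a minimal illustration rather than the prototype's code: the names `ProcessEntry` and `ProcessQueue` are hypothetical, an ordered dictionary stands in for the CR3 hash plus queue, and eviction of the least recently run entry is a simplifying assumption standing in for "the last inactive entry".

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class ProcessEntry:
    cr3: int               # base physical address of the paging structure
    pid: int = 0           # assumed to be filled in later by introspection
    eprocess_pa: int = 0   # physical address of EPROCESS (assumed lookup)
    last_switch: int = 0   # global counter value at the latest switch-in


class ProcessQueue:
    """Recency-ordered process queue, keyed (hashed) by CR3."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.global_switch_counter = 0
        # Insertion order doubles as recency order: the last item is the head.
        self.entries: "OrderedDict[int, ProcessEntry]" = OrderedDict()

    def on_context_switch(self, new_cr3: int) -> ProcessEntry:
        self.global_switch_counter += 1
        entry = self.entries.get(new_cr3)
        if entry is None:
            if len(self.entries) >= self.capacity:
                # No free entry: reuse the least recently run one (assumption).
                self.entries.popitem(last=False)
            entry = ProcessEntry(cr3=new_cr3)
            self.entries[new_cr3] = entry
        # Move the entry to the head of the queue and stamp it.
        self.entries.move_to_end(new_cr3)
        entry.last_switch = self.global_switch_counter
        return entry

    def snapshot(self):
        """Return entries ordered from most to least recently run."""
        return list(reversed(self.entries.values()))
```

Keying the structure by CR3 keeps the per-event cost constant, which matters because the handler runs on every context switch.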

In the OS part, user-space software records process information from the SYSTEM_PROCESSES structures returned by the interface NtQuerySystemInformation, including the process name, creation time, and execution time in both user mode and kernel mode.

The result will be inaccurate if a process is created or destroyed during analysis. To solve this problem, there are at least three acquisitions in one analysis round. The first and the third are from the hypervisor. If the two results do not match, our method launches another round of analysis until the result is valid. A single analysis round takes only a few milliseconds, but the method still has limitations if the OS incessantly creates and kills processes at a very high frequency.
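
The retry logic of an analysis round can be sketched as below. This is an illustrative sketch only: `acquire_consistent`, the two callables, and the `max_rounds` bound are hypothetical names and parameters, and the hypervisor view is reduced to a set of process identifiers.

```python
from typing import Callable, Dict, FrozenSet, Tuple


def acquire_consistent(
    hypervisor_view: Callable[[], FrozenSet[int]],
    system_view: Callable[[], Dict[int, str]],
    max_rounds: int = 10,
) -> Tuple[FrozenSet[int], Dict[int, str]]:
    """Repeat analysis rounds until the two hypervisor acquisitions agree."""
    for _ in range(max_rounds):
        first = hypervisor_view()    # acquisition 1: hypervisor
        system = system_view()       # acquisition 2: system API
        third = hypervisor_view()    # acquisition 3: hypervisor again
        if first == third:
            # No process was created or destroyed during this round.
            return first, system
    raise RuntimeError("process set kept changing; no stable round found")
```

The `max_rounds` bound is an added safeguard for the case noted above, in which the OS creates and kills processes faster than a round can complete.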

During analysis, the process information from the hypervisor and system views is combined. A process record in the result has one of the following four states:

0x0: The recorded process is running;

0x1: The recorded process only appears in the hypervisor view, which means it is hidden in the kernel;

0x2: The recorded process has been recorded in the hypervisor view but has been destroyed by the OS and is no longer valid;

0x3: The recorded process only appears in the system view, which means it may be faked by malware.
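
The four states can be derived by comparing the two views, as in the following sketch. The function `classify` and its input shapes are assumptions made for illustration: the hypervisor view maps each PID to whether its entry is still valid, and the system view is the set of PIDs returned by the system API.

```python
STATE_RUNNING = 0x0   # present and valid in both views
STATE_HIDDEN = 0x1    # only in the hypervisor view
STATE_DEAD = 0x2      # recorded by the hypervisor but no longer valid
STATE_FAKED = 0x3     # only in the system view


def classify(hypervisor_view, system_view):
    """Combine both views into a PID -> state mapping.

    hypervisor_view: dict mapping PID -> True if the entry is still valid
    system_view: set of PIDs reported by the system API
    """
    result = {}
    for pid, still_valid in hypervisor_view.items():
        if not still_valid:
            result[pid] = STATE_DEAD
        elif pid in system_view:
            result[pid] = STATE_RUNNING
        else:
            result[pid] = STATE_HIDDEN
    for pid in system_view:
        if pid not in hypervisor_view:
            result[pid] = STATE_FAKED
    return result
```

For example, `classify({936: True, 1200: True}, {936})` marks PID 936 as running and PID 1200 as hidden; the second PID is a made-up value.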

4.3.2. Framework-related implementation


The forensic framework provides live migration and four modules to guarantee cross-platform features and transparency. It simplifies the method implementation, and the code base is only 441 SLOC.

Fig. 4 shows that our method uses the control module, the evidence export module, and the event-handling module. The event-handling module configures the VMCS to intercept control register modifications. A VMExit is thus triggered, and CR3 modifications are converted to context switch events sent to the method. The control module forwards investigator commands to the method, including the start monitoring, end monitoring, and acquire process information commands. The evidence export module provides a safe tunnel to transfer analysis results, and more features, such as signature and encryption, can be added without any modification to the method implementation.

Fig. 4. Framework Related Implementation.

The operating system is frozen when trapped in hypervisor mode. Thus, it is safe to dereference pointers without any concern of inconsistency. The method implementation only uses hardware events to intercept system behavior without any modification to the kernel. Thus, the transparency is maintained, and this method is isolated from the target system.

4.4. The experiment and performance of multi-view extraction

In this section, we present a case study and evaluate the system. The experiments are conducted on a Lenovo E47A laptop with 4 GB of RAM and an Intel Core i5-2450M CPU running at 2.50 GHz. The operating system is 64-bit Windows 7, Build 7601.


4.4.1. Case study

To verify our method's capabilities, we provide a sample Windows program that can conceal itself from the Windows Task Manager. The result demonstrates our method's validity.
The sample software is named ProcessA.exe. Windows uses a doubly linked list called ActiveProcessLink to store active processes. The symbol of the list header is called PsActiveProcessHead in kernel space. The Task Manager enumerates all of the processes by iterating this list. Thus, ProcessA.exe can hide itself by removing the related EPROCESS entry from ActiveProcessLink.
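
The effect of unlinking an entry can be illustrated with a small simulation of a circular doubly linked list. This is only a sketch of the list manipulation: the `Eprocess` class and the helper functions are hypothetical stand-ins, not Windows kernel structures or APIs.

```python
class Eprocess:
    """Toy stand-in for an EPROCESS entry with forward and backward links."""

    def __init__(self, name):
        self.name = name
        self.flink = self   # forward link
        self.blink = self   # backward link


def insert_tail(head, entry):
    """Append an entry at the tail of the circular list anchored at head."""
    last = head.blink
    last.flink = entry
    entry.blink = last
    entry.flink = head
    head.blink = entry


def unlink(entry):
    """Remove an entry from the list; the object itself keeps existing."""
    entry.blink.flink = entry.flink
    entry.flink.blink = entry.blink
    entry.flink = entry.blink = entry


def enumerate_processes(head):
    """Walk the list the way a list-based enumerator would."""
    names = []
    node = head.flink
    while node is not head:
        names.append(node.name)
        node = node.flink
    return names
```

After `unlink` is applied to the entry for ProcessA.exe, a walk from the list head no longer reaches it, although the object still exists and can still be scheduled. This is why the hypervisor view, which is driven by context switches, can still observe the process.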


The experiment contains the following steps:

Run the sample ProcessA.exe and verify that it does not appear in the Task Manager;

Run the control group ProcessB.exe, which is like ProcessA but without the hiding part;

Install the forensic platform and start monitoring with the helper software Caller.exe, which can issue CPUID instructions;

Wait until the hypervisor's process queue is stable;

Run mspaint.exe and stop it during monitoring;

Acquire evidence about process information using our forensic platform and store it in a file.

The content of the Windows Task Manager in the last step of the experiment shows that there are 34 running
processes but not the hidden ProcessA.exe. The PID of ProcessB.exe can be seen in the Task Manager, which is 936
(0x3a8).

Fig. 5 shows the evidence acquired by our method. Every record entry contains a process's PID, its last switch tick, its CR3 value, its process name, its running state, and its memory usage. There are 34 normal processes recorded with state 0x0, which is exactly the same as the result in Task Manager. The memory usage is available only with the state 0x0 because this information can only be acquired from the system API. The latest running process is Caller.exe with tick value 0x0, which is reasonable because Caller.exe is forensic helper software for generating CPUID commands with arguments and communicating with the hypervisor. ProcessA.exe is recorded with state 0x1, meaning that it is a hidden process. There are five dead processes with the state 0x2, including mspaint.exe and various other system helper processes.

Fig. 5. Acquired Evidences.

In conclusion, all of the evidence in the experiment is consistent with the facts, which proves that our method is
practical for extracting true process information and their latest switch time.

4.4.2. Performance
To measure the monitoring performance, we select PCMark 7 [2] to benchmark the target system. PCMark 7 is a professional Windows benchmark solution to measure the computing, entertainment, storage, and production performance of a system. In this experiment, only the computing, storage, and overall performance are measured. There are three groups of performance tests. The control group does not run any forensic-related software; the silent group installs the forensic platform but does not monitor; the experiment group installs the forensic platform and monitors hardware events.

Fig. 6 shows that the overall forensic performance overhead is 4.7%, with 1.5% coming from the forensic framework and hardware virtualization. Our method's overhead is 3.2%, which mainly comes from the computing cost (approximately 6.5%), because the monitoring uses the CPU and no other devices. Thus, the storage overhead is low, only approximately 2%.

Fig. 6. Performance Benchmarking.

In conclusion, our method has a low performance impact on the target system, which mainly comes from computing.

5. Investigating Process Behavior based on Para-synchronous memory monitoring

How to catch transient illegal behaviors in memory, such as hooking and kernel object manipulation, is also an important but difficult issue for live forensic methods. In this section, a para-synchronous memory monitoring method is presented. It can analyze a process's behavior in memory based on hardware virtualization. This work has been published in [9].

5.1. The new taxonomy and why to choose para-synchronous memory monitoring

The behavior of a process is transient. For example, hooking an illegal program into a certain OS data structure is difficult to catch or identify based only on the OS state or an OS dump. A more effective way is to monitor or intercept the process's instructions. Many existing works focus on memory monitoring techniques. We classify them into two categories: asynchronous and synchronous memory monitoring.
Asynchronous memory monitoring methods do not keep pace with the system states. The cross-view approach is the best example. This approach detects inconsistent states of the memory and can easily find hidden processes and hooks. It is applied by various rootkit detection tools [10] and integrity checkers [8]. Because rootkits can stay silent most of the time and interfere with the system whenever needed, some forensic systems [19] run simultaneously with the target system and monitor the memory continuously. However, if the tool monitors memory asynchronously, transient behaviors such as hooking or modifying kernel objects cannot be intercepted. Moreover, this approach cannot discover behavior-related processes simply by making comparisons. For example, the detection subsystem of HookScout [33] can periodically check for attacks based on policies generated by dynamic analysis, but it cannot identify the processes that perform these attacks.

Synchronous memory monitoring methods monitor every memory activity made by instructions. Thus, they are able to introspect a process's state after any instruction. It is convenient to use software VMMs, such as QEMU, to implement synchronous memory monitoring by interfering with the binary translation process. As described in Section 2, some works have applied this method to record a process's memory behavior for dynamic analysis. Although these methods are powerful, they are slow and not suitable for monitoring systems running on bare hardware.

We classify our method into a third category, para-synchronous methods, because it does not need to analyze every instruction. Instead, it monitors memory pages according to events triggered by hardware for speed and accuracy. HookSafe [32] introduces the protection granularity gap between the byte-level granularity of memory protection and the page-level granularity of hardware protection. HookSafe partially solves the problem by relocating hooks to a dedicated memory space. We propose a generic way to implement page-grained monitoring. Compared to synchronous methods, this method is quicker and is not limited by software emulators; unlike asynchronous methods, it is able to capture transient behaviors and collect consistent evidence. With the help of live migration from our platform, this method can be applied to live systems, which brings dynamic analysis to the area of live forensics.

5.2. The design of Para-synchronous memory monitoring

Our Para-synchronous memory monitoring method is implemented by controlling page-grained permissions. It can
acquire the accurate modifications that each process makes in the target area. It can also capture the corresponding
running states of the process for further detailed analysis. This method is composed of two main parts: tracing units and
recorders.

Commercial operating systems use the paging technique provided by hardware to isolate memory between processes; therefore, the memory is naturally divided into 4-KB pages from both hardware and software perspectives. Based on page permission control, it is natural to use the Second-Level Address Translation (SLAT) mechanism to control guest physical memory. We design tracing units as basic components to monitor 4-KB pages, and each of them can respond to hardware events, control page permissions, and record evidence about a page automatically.

The recorders are responsible for merging information from tracing units according to the monitored page and process. Multiple writes from a process to the same monitored page are merged into one record block. The block size is 512 bytes. Each bit corresponds to a byte address in the monitored page and indicates whether this address has been modified by the process.
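
A record block of this kind can be sketched as a 512-byte bitmap with one bit per byte of a 4-KB page. The helper names and the bit ordering within each byte are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096
BLOCK_SIZE = PAGE_SIZE // 8   # 512 bytes: one bit per byte of the page


def new_record_block() -> bytearray:
    """Create an empty record block (no byte marked as modified)."""
    return bytearray(BLOCK_SIZE)


def mark_modified(block: bytearray, page_offset: int) -> None:
    """Set the bit for the byte at page_offset (least significant bit first)."""
    block[page_offset // 8] |= 1 << (page_offset % 8)


def is_modified(block: bytearray, page_offset: int) -> bool:
    """Report whether the byte at page_offset has been marked as modified."""
    return bool(block[page_offset // 8] & (1 << (page_offset % 8)))


def merge(block: bytearray, other: bytes) -> None:
    """Merge a later modification map into an existing record block."""
    for i, bits in enumerate(other):
        block[i] |= bits
```

Because merging is a bitwise OR, repeated writes to the same address cost no extra space, which is why multiple writes from one process collapse into a single block.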

The key algorithm of our method is shown in Fig. 7. Our method leverages a copy-on-write-like operation and makes it transparent to the target OS using SLAT. When a guest process tries to modify the read-only memory, a write violation happens. The violation-related tracing unit dumps the original memory page and obtains the process information using Virtual Machine Introspection (VMI). After a series of steps in hypervisor mode, the page resumes being writable so that the process can modify the memory as it wants. The accurate modifications can be recorded simply by making a quick comparison between the original and the modified image. When the unit detects a different process coming, the former modifications of the page should be submitted to the hypervisor, and the page image should be dumped again for the new process. This is called the submit-and-copy operation; the idea comes from the copy-on-write operation, which guarantees the accuracy of the modification records between processes in a core.

Fig. 7. Trace Memory Modifications in a Single Page.

The tracing unit should identify the process whenever a different one writes into the same page. This happens only if the page is marked read-only so that the write violation can be triggered. That is to say, the write permissions of the pages should be cancelled when the process is changed in the current processor. In uniprocessor systems, the pages are marked read-only whenever a context switch event occurs. The related units can then recheck the process and decide whether to perform the submit-and-copy operation. In SMP systems, things get complicated because the memory can be accessed concurrently from different cores. Another mechanism called write-barrier is introduced during the submit-and-copy operation to intercept and identify processes between cores.

According to our design, a guest process can run without much interference whenever it accesses a page it has recently accessed. This is very efficient compared with instruction-level analysis because access patterns of typical computer applications have locality of reference [3]. The implementation of important steps, such as write-barrier, submit-and-copy, and write access cancelling, is explained in the following.

5.3. The implementation of Para-synchronous memory monitoring

We have implemented the method based on our framework. It targets 64-bit Windows, and the hardware platform should support hardware virtualization and SLAT.


5.3.1. Tracing unit implementation

Our method is implemented to trace memory modifications in page granularity using tracing units. That is to say, a
tracing unit is related to:

A physical address of the monitored target page.

A dump area of the page to monitor modifications.

Guest process information.

Each guest process runs in its own dedicated address space; therefore, the process can be identified by the Control Register 3 (CR3) value. When a write violation event occurs, if the tracing unit related to the violated address detects that the CR3 value is different from the value in the guest process information field, it performs the submit-and-copy operation and updates the process information field. Otherwise, if the CR3 value matches, the tracing unit simply allows the write attempt and the guest resumes control. During the submit-and-copy operation, the tracing unit first compares the current page with the old dump and generates a modification map. The map, labeled with the page's base physical address, is then submitted to the record module with the guest process's information. The record module finally merges the records according to the guest process's CR3 and the target page's physical address. If the specific record does not exist, the record module establishes one during the submission process. After submission, the tracing unit reinitiates itself by getting information from the new process and dumps the page before any changes are made.

Multiple tracing units can be set up separately because all of the tracing activities are triggered by hardware events, which occur sequentially in a single processor. To optimize the performance, physical addresses of the monitored pages are hashed to find the corresponding tracing unit among thousands of them. Additionally, the CR3 value and physical address of the records in the record module are also hashed so that they can be quickly selected during the submission operation.
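
The uniprocessor behavior of a tracing unit can be sketched with a small simulation. This is a simplified model under stated assumptions, not the prototype: `RecordModule`, `TracingUnit`, and `guest_write` are hypothetical names, a `bytearray` models the guest physical page, a `writable` flag models the SLAT write permission, and the CR3 and address values used in any example are made up.

```python
from collections import defaultdict


class RecordModule:
    """Merges modification maps, keyed (hashed) by CR3 and page address."""

    def __init__(self):
        self.records = defaultdict(set)   # (cr3, page_pa) -> modified offsets

    def submit(self, cr3, page_pa, modified_offsets):
        self.records[(cr3, page_pa)] |= modified_offsets


class TracingUnit:
    def __init__(self, page_pa, memory, recorder):
        self.page_pa = page_pa        # physical address of the monitored page
        self.memory = memory          # bytearray modeling the page contents
        self.recorder = recorder
        self.owner_cr3 = None         # guest process information field
        self.dump = bytes(memory)     # dump area used for comparison
        self.writable = False         # models the page's write permission

    def _diff(self):
        """Offsets whose content differs from the dump."""
        return {i for i, (old, new) in enumerate(zip(self.dump, self.memory))
                if old != new}

    def submit_and_copy(self, new_cr3):
        if self.owner_cr3 is not None:
            self.recorder.submit(self.owner_cr3, self.page_pa, self._diff())
        self.dump = bytes(self.memory)   # fresh dump for the new process
        self.owner_cr3 = new_cr3

    def on_write_violation(self, cr3):
        if cr3 != self.owner_cr3:
            self.submit_and_copy(cr3)
        self.writable = True             # let the guest write proceed

    def on_context_switch(self):
        self.writable = False            # cancel write permission


def guest_write(unit, cr3, offset, value):
    """Model a guest write: a read-only page first traps to the unit."""
    if not unit.writable:
        unit.on_write_violation(cr3)
    unit.memory[offset] = value
```

One limitation of comparing page images, visible in `_diff`, is that a write storing the value already present leaves no trace in the modification map.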

5.3.2. SMP system support

Implementations are much more elaborate in SMP systems because of two special cases, as shown in Fig. 8. The first one occurs when the tracing unit is recording write operations to page K from one guest process (process A) and another process (process B) tries to write into the same area from a different core. This happens concurrently in SMP systems. The modification record will be contaminated if there is no write-barrier. The second case is similar, but the attempt is made by another thread of the same process (process A). Contaminations in the first case will not happen in the second case; therefore, the tracing unit should allow the write attempt for best performance. A write-barrier is required to control the write permissions of all of the cores separately. When the write-barrier is built, the tracing unit cancels write permissions of the target page in all cores. The following submit-and-copy operation is protected by the critical zone built by the spinlock. Thus, if any processes try to write into the same page at the same time, the write violation is immediately triggered, and they will be trapped in hypervisor mode and forced to wait outside the critical zone until the submit-and-copy process is done. After that, the tracing unit checks the upcoming process's CR3 from the waiting cores. If it matches the current CR3 value, the unit grants write access to that core directly. If not, the write barrier will be built. Then, the submit-and-copy operation follows again.
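
The per-core permission logic can be sketched as follows. This is a sequential model only, built on assumptions: `SmpTracingUnit` is a hypothetical name, a list of booleans models each core's write permission for the page, a `threading.Lock` stands in for the spinlock-protected critical zone, and the page comparison itself is reduced to recording which process was submitted.

```python
import threading


class SmpTracingUnit:
    def __init__(self, n_cores: int) -> None:
        self.write_allowed = [False] * n_cores   # per-core write permission
        self.owner_cr3 = None                    # process currently recorded
        self.submitted = []                      # CR3 values already submitted
        self.lock = threading.Lock()             # models the critical zone

    def build_write_barrier(self) -> None:
        """Cancel write permission for the page on every core."""
        for core in range(len(self.write_allowed)):
            self.write_allowed[core] = False

    def on_write_violation(self, core: int, cr3: int) -> str:
        with self.lock:
            if cr3 == self.owner_cr3:
                # Another thread of the same process: grant access directly.
                self.write_allowed[core] = True
                return "granted"
            # A different process: barrier first, then submit-and-copy.
            self.build_write_barrier()
            if self.owner_cr3 is not None:
                self.submitted.append(self.owner_cr3)
            self.owner_cr3 = cr3
            self.write_allowed[core] = True
            return "submit-and-copy"
```

In this model, a second thread of process A on another core is granted access without a submission, while a write from process B revokes the permission on every core before the records of process A are submitted.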

Fig. 8. Timeline of a Tracing Unit Monitoring Page K in SMP Systems.

In the current prototype, the barrier cannot be built immediately because the Inter Processor Interrupt (IPI) has not
been implemented yet. The TLB cache will not be synchronized in time until we do it in the next context change event.
The false positives caused by the lack of TLB-shootdown will be discussed in case study II.

During context switch event handling, the hypervisor cancels write permissions of all of the tracing pages in the current processor so that every unit can check again after the process is changed in the core. Our tracing method guarantees the accuracy of records by assuring sequential writes among processes. The performance is improved because the writes are concurrent within the same process. Different tracing units are independent of each other. The current prototype supports up to 512 pages, covering 2 MB of memory space.


5.3.3. Evidence collection

The evidence concerns WHO, WHAT, WHERE, and WHEN. WHEN is the system time when the modification happens. WHERE is the set of places related to modifications. These places can be described by physical addresses, virtual addresses via the CR3 value, or the symbols in the program. The record module provides modification maps regarding processes. Each map contains a set of modification blocks according to modified pages. Every block records the modified bytes of the monitored page in a bitmap. These bitmaps indicate all of the places modified by the process in memory. WHO is extracted from the CR3 value of the process when the memory-write behavior is intercepted. This value can be further evaluated by the introspection module to indicate the process ID (PID), process name, and image path of the program. WHAT includes detailed information about the behavior, and the hypervisor makes a quick snapshot to record the process's running state. The above evidence can be integrated in a chain that links from memory behavior to a process and finally to an EXE image.
5.3.4. Framework-related implementation
Similar to the method in the previous section, para-synchronous memory monitoring is implemented based on our forensic framework. Similar to Fig. 4, our method's implementation uses all four modules of the framework. The controller passes start, configure, and reset commands safely. The evidence exporter provides a general interface to transmit records generated by recorders. The event processor passes context switch events and EPT violations with related addresses to the method. The para-synchronous method applies a precise mode of the memory virtualization module to subtly change the memory access control of each processor.

5.4. The experiment and performance

In this section, we present two case studies and evaluate the system. The experiments are conducted on a Lenovo E47A laptop with 4 GB of RAM and an Intel Core i5-2450M CPU running at 2.50 GHz. The target OS is 64-bit Windows 7, Build 7601. The current implementation supports up to 512 tracing units to monitor pages, 256 record maps for per-process records, and 16384 record blocks to store modification bitmaps. The overall initiation process takes 2.5 s on average. Case study I is presented to show our method's ability in live forensics. Case study II presents the performance analysis and the capability in an SMP system.


5.4.1. Case study I

To simplify our experiment and focus on hooking behavior monitoring in real-world cases, we choose kis14.0.0.4651.exe as our sample because cross-view-based tools reported that there are 25 new hooks in shadow SSDT after running kis14.0.0.4651.exe. However, these hooks cannot be proven to have any relevance to their installer because there are 64 different processes running simultaneously in the system. It could be that some other process deliberately or coincidentally installed hooks when the sample was running. This is the same scenario as when an investigator is facing a system filled with unknown software and wants to find out the hooking process. To investigate these hooks in shadow SSDT, we first install our prototype on the target system. We then use the CPUID operation to set tracing units to monitor physical pages that contain shadow SSDT structures. After that, we start monitoring and run the sample program. Those tracing units then automatically cancel the write permission of related pages, and they continuously intercept the write attempts in those areas. After the hooks are installed, we stop monitoring and output the records from the hypervisor to disk. The abbreviated result in Fig. 9 shows that two tracing units have records of a process. The process information includes the CR3 value, process name, PID, and image path. Each tracing unit has information about the modifications made by the process and a related running snapshot, including a stack page dump and a code page dump.

Fig. 9. Abbreviated Records Made by Tracing Units.

Surprisingly, the process's image path does not match kis14.0.0.4651.exe, which should be on the desktop. Moreover, we do not see any file named MsiExec.exe in the folder C:/Windows/syswow64/ or any process named msiexec.exe after the experiment. We may speculate that MsiExec.exe is created to install hooks and is deleted during the installation of the hooks. That is to say, if a memory dump is made after or before the hooks are installed, an investigator cannot obtain enough information simply from after-the-fact analysis.


A code page dump with an instruction pointer value is recorded by the tracing units. This information is collected at the first write violation event of the monitored process. The binary image is disassembled to assembly code, as shown in Fig. 10, which proves that msiexec.exe is performing an atomic write operation at that moment. More static analysis can be made to evaluate the code page to reconstruct the related behavior.

Fig. 10. Disassembled Code of the Running Snapshot.

Stack information is also available in the records of Fig. 9. The analysis shown in Fig. 11 presents the stack frame at that moment. It proves that this record is accurate because the violated address can be seen in local variables of the stack frame. More detailed analysis can be made according to Arasteh's work [4]. The memory contents are compared byte by byte in the page. Each byte is marked by a bit to indicate whether it is modified. There are 4096 bits to record the 4 KB of content of a page, which means the size of a record block is 512 B. Analysis in Fig. 12 shows the modifications made by msiexec.exe. The range of shadow SSDT starts from the physical address of 0x86940f00. The two pages are not continuous in virtual address space because msiexec.exe remaps them separately. Each digit represents four bits in the record and thus four bytes in real memory, which is an entry of shadow SSDT. A 4-byte entry represents an entry point of a handling procedure, and shadow SSDT has 827 of those entries in Windows 7. The result shows that 25 hooks are installed by msiexec.exe and that their indexes accurately match the results of the integrity checker PCHunter V1.313. The total number of submit-and-copy operations performed is two because the target pages are modified by only one process. The overhead is much lower than that of instruction-level analysis, which needs to translate every related write instruction.
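
The mapping from modified bytes to shadow SSDT entry indexes can be sketched as below. The 4-byte entry size and the in-page start offset 0xf00 come from the text; the function name, its parameters, and the offsets used in any example are illustrative assumptions.

```python
ENTRY_SIZE = 4   # one shadow SSDT entry is 4 bytes, as stated above


def hooked_entries(modified_offsets, table_offset_in_page=0, first_index=0):
    """Map modified byte offsets within one page to table entry indexes.

    table_offset_in_page: where the table (or its continuation) starts in
        this page, e.g. 0xF00 for the page holding the start of the table.
    first_index: index of the first table entry stored in this page.
    """
    indexes = set()
    for offset in modified_offsets:
        if offset >= table_offset_in_page:
            relative = offset - table_offset_in_page
            indexes.add(first_index + relative // ENTRY_SIZE)
    return sorted(indexes)
```

With the table starting at offset 0xf00, the first page holds (4096 - 0xf00) / 4 = 64 entries, so a continuation page would be decoded with `first_index=64`.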

Fig. 11. The Stack Frame of the Running Snapshot.

Fig. 12. Memory Modifications Made by Msiexec.exe.

The information in the records is not available in after-the-fact analysis because the hooking behavior is transient and important evidence could be lost in nanoseconds. Our method can keep up with those important events by using a hardware memory protection mechanism to intercept processes' write attempts. A complete evidence chain is created for forensics from memory modifications to hooking behavior snapshots to process information, including its file image path.
5.4.2. Case study II
In case study II, we test the performance of our method in two situations. The sequential situation is that the traced page is accessed by only one process. This usually happens when a process installs hooks or accesses local variables, and case study I belongs to the first situation. The parallel situation occurs when a traced page is accessed by two or more processes simultaneously. This happens when the traced page contains data shared by processes. The guest OS will be trapped in the hypervisor very frequently in the second situation because, to generate an accurate record, the tracing unit should perform the submit-and-copy operation every time a different process arrives.

We implement a test program. It continuously accesses the memory of two predefined pages at high speed. We set up
two tracing units for these pages in the hypervisor. The performance evaluation uses the test program in two situations,
and the four environments in each situation are:
Env.1, the OS with the test program(s);

Env.2, the OS, the test program(s) and a silent hypervisor;

Env.3, the OS, the test program(s) and the hypervisor, which enables SLAT;

Env.4, the OS with the test program(s) traced by the hypervisor.

Env.2 measures the overhead introduced by a simple Intel VT VMM. Env.3 measures the overhead introduced by the EPT mechanism. Env.4 measures the overhead coming from our monitoring method.
The performance is measured by PCMark 7, which is a complete PC benchmark for Windows 7. It can conduct comprehensive performance tests of a system. The results shown in Fig. 13 indicate that the largest impact (24%) comes from the SLAT mechanism, compared to the only 5% impact caused by our tracing method. The overhead mainly comes from computation because of the additional VM-exit handling logic. The hypervisor does not interfere with external device usage; therefore, the storage score does not change much. For comparison with instruction-level analysis, PoKeR [22] introduces a 200%-500% performance impact while profiling, and KernelGuard [21] introduces a 19.4% impact while preventing NIC manipulation and process hiding. The QEMU emulator is slower than the VMM, which leverages hardware-assisted virtualization to optimize performance. To measure the overhead to the traced processes, we add a counter inside the test program. The counter prints the number of memory accesses every second. The average counts are calculated when the counts are stable. The result in Table 1 indicates that 56% overhead is caused by the SLAT mechanism, and the overhead from our method depends strongly on the access pattern of the program. The results conform to the fact that the concurrent accesses from different processes are forced to be sequential to guarantee accuracy.

Fig. 13. The Performance Evaluation.

TABLE 1
PERFORMANCE OF THE TRACED PROCESSES
(Accesses per second)

          Sequential    Parallel
Env. 1    240601        200921
Env. 2    240193        197639
Env. 3    106970        86416
Env. 4    105516        17000-4500
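
The relative overheads implied by Table 1 can be recomputed with a few lines of arithmetic. The sketch below assumes that the cost of each layer is measured against the preceding environment (Env. 3 against Env. 2 for SLAT, Env. 4 against Env. 3 for tracing); the paper does not state which baseline it uses, and the helper name slowdown_percent is hypothetical.

```python
# Accesses per second from Table 1: (sequential, parallel).
TABLE_1 = {
    "Env. 2": (240193, 197639),
    "Env. 3": (106970, 86416),
}
ENV4_SEQUENTIAL = 105516
ENV4_PARALLEL_RANGE = (17000, 4500)  # reported as a range, best to worst


def slowdown_percent(before, after):
    """Throughput lost when going from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# SLAT overhead: Env. 3 relative to Env. 2 (assumed baseline).
slat_seq = slowdown_percent(TABLE_1["Env. 2"][0], TABLE_1["Env. 3"][0])
slat_par = slowdown_percent(TABLE_1["Env. 2"][1], TABLE_1["Env. 3"][1])

# Tracing overhead: Env. 4 relative to Env. 3 (assumed baseline).
trace_seq = slowdown_percent(TABLE_1["Env. 3"][0], ENV4_SEQUENTIAL)
trace_par = tuple(
    slowdown_percent(TABLE_1["Env. 3"][1], v) for v in ENV4_PARALLEL_RANGE
)

print(f"SLAT: sequential {slat_seq:.1f}%, parallel {slat_par:.1f}%")
print(f"Tracing: sequential {trace_seq:.1f}%, "
      f"parallel {trace_par[0]:.1f}%-{trace_par[1]:.1f}%")
```

Under this assumption, the SLAT figures come out close to the 56% quoted in the text. Tracing costs little in the sequential case but most of the remaining throughput in the parallel case, which matches the observation that the overhead depends on the access pattern.
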

The effective write-barrier mentioned in section V.C is not completely built in the current prototype because TLB-shootdown has not yet been implemented. The write operation cannot be blocked immediately because the CPU uses permissions cached in the TLB. Thus, false positives are introduced in the parallel situation. We measure them by letting two processes write into different addresses on the same page. The records of the two processes show that 208 of 8192 bytes are contaminated in 2 s, that is, a contamination rate of 2.54% every 2 s. Due to the performance impact and the false positives in the current prototype, we recommend using our hypervisor to intercept memory modifications in static memory areas, as we did in case study I.
During the experiment, the system consumes 13065 4-KB pages, only 1.24% of the 4 GB of memory. There are 29
pages for hypervisor code, 9824 pages for paging structures, and 3212 pages for dynamic allocations.
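
The figures in the two paragraphs above can be cross-checked with simple arithmetic. The sketch below assumes that "4 GB" means 4 GiB (2^32 bytes) and that pages are 4096 bytes; the variable names are illustrative only.

```python
PAGE_BYTES = 4096
TOTAL_MEMORY_BYTES = 4 * 2**30  # assumption: "4 GB" read as 4 GiB

# Page counts reported for the hypervisor.
pages = {
    "hypervisor code": 29,
    "paging structures": 9824,
    "dynamic allocations": 3212,
}
total_pages = sum(pages.values())
memory_share = total_pages * PAGE_BYTES / TOTAL_MEMORY_BYTES * 100.0

# Contaminated bytes in the two-process write test.
contaminated, recorded = 208, 8192
contamination_rate = contaminated / recorded * 100.0

print(f"total pages: {total_pages}")
print(f"share of memory: {memory_share:.3f}%")
print(f"contamination per 2 s: {contamination_rate:.2f}%")
```

The three page counts sum to the reported 13065 pages, and the share of memory works out to just under 1.25%, in line with the roughly 1.24% stated in the text.
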
5.4.3. Limitation
The current prototype still has many limitations. First, we do not handle page faults in the current prototype. Thus, if the tracing target is paged out to disk, the tracing method will no longer work. Second, target traced pages need to be manually set according to their physical addresses, and the lack of an automatic scheme is inconvenient for forensics. Additionally, we have not implemented interrupt control of the target OS; this will introduce some false positives because the TLB cannot be invalidated immediately in the current version. We are in the process of improving our implementation so that it can trace virtual memory page contents and support interrupt control. Our current prototype is meant to prove the feasibility of our method for application in live forensic scenarios.

6. Conclusion

In this paper, a lightweight live memory forensic framework is proposed. Its main idea is to build a virtualization environment on the fly and to intercept and analyze the live system from below ring 0. The operating system being investigated can be migrated into a virtual machine without termination or modification. This ability is underestimated in the area of live forensics because there is still very little related research. This paper aims to emphasize that we can now subvert a running OS to do live forensics, not only for memory acquisition followed by after-the-fact analysis, but also for live memory analysis and direct evidence chain extraction.

This paper presents how to do transparent live analysis effectively in this new way and provides a framework that simplifies the implementation of further analyses. Two novel forensic methods are proposed in this framework. First, a multi-view extraction method is proposed to collect reliable and intelligible process information. From both hardware and user views, any malicious modifications can be detected, including hidden malware and terminated processes that are unavailable in other tools. With the help of the forensic framework, the implementation includes only the core monitoring logic with much less source code. Meanwhile, the hypervisor's monitoring method works from a higher privilege level, making it undetectable and isolated without any modifications to the target system. Second, a para-synchronous method is proposed to investigate transient memory behaviors. In our experiment, the prototype shows its ability to monitor hooking behaviors in the shadow SSDT table. It has collected rich information, including a process's modification map, detailed information about the process, and related running snapshots. A complete evidence chain can be created from the hook to an EXE file image after a quick analysis. The performance overhead and memory usage are also acceptable. The result demonstrates that this method is promising for usage in live forensics.
In the future, more features are going to be supported in this framework. Page fault handling is essential to virtual memory space tracking. Interrupt handling will provide more hypervisor events for live forensic analysis and bring TLB invalidation for accurate multi-process memory tracking. Also, attacks beyond memory modifications, such as ROP attacks and direct hardware manipulation, should be taken into consideration.

Acknowledgments
This work is supported by the National Natural Science Foundation of China (61100198/F0207, 61100197/F0207).

References
[1] Bypassing Klister 0.4 with no hooks or running a controlled thread scheduler, 29a Magazine, 2007. [Online]. Available: http://vx.netlux.org/29a/magazines/29a-8.rar.
[2] PCMark 7, 2013. [Online]. Available: http://www.futuremark.com/benchmarks/pcmark.
[3] Wikipedia: Cache, 2014. [Online]. Available: http://en.wikipedia.org/wiki/Cache_(computing)
[4] A.R. Arasteh, M. Debbabi, Forensic memory analysis: From stack and code to execution history, Digital Investigation. 4 (2007) 114-125. doi:10.1016/j.diin.2007.06.010.
[5] M. B. Athreya, Subverting linux on-the-fly using hardware virtualization technology, M.S. thesis, Department of Elect. and Comp. Eng., Georgia Institute of Technology, Atlanta, GA, 2010.
[6] F. Bellard, QEMU, a Fast and Portable Dynamic Translator, In: Proceedings of the 2005 USENIX Annual Technical Conference, USENIX'05, 2005, pp. 41-46.
[7] M. Burdach, Physical memory forensics, In: Proc. of BlackHat, 2006.
[8] M. Carbone, W. Cui, L. Lu, W. Lee, M. Peinado, X. Jiang, Mapping kernel objects to enable systematic integrity checking, In: Proc. of the 16th ACM conference on Computer and communications security, CCS'09, 2009, pp. 555-565. doi:10.1145/1653662.1653729.
[9] Y. Cheng, X. Fu, et al., Investigating the hooking behavior: a page-level memory monitoring method for live forensics, In: Proceedings of the 2014 Information Security Conference, ISC 2014, 2014, pp. 255-272.
[10] B. Cogswell, M. Russinovich, Rootkitrevealer, 2006. [Online]. Available: http://technet.microsoft.com/en-us/Sysinternals/bb897445.aspx.
[11] Z. Deng, X. Zhang, D. Xu, SPIDER: stealthy binary program instrumentation and debugging via hardware virtualization, In: Proceedings of the 29th Annual Computer Security Applications Conference, ACSAC 2013, 2013, pp. 289-298.
[12] A. Dinaburg, P. Royal, M. Sharif, W. Lee, Ether: malware analysis via hardware virtualization extensions, In: Proceedings of the 15th ACM conference on Computer and communications security, CCS'08, 2008, pp. 11-24. doi:10.1145/1455770.1455779.
[13] L. Garber, EnCase: a case study in computer-forensic technology, IEEE Computer Magazine, January 2001.
[14] W. Jiang, Z. Fengwei, S. Kun, A. Stavrou, Firmware-assisted Memory Acquisition and Analysis tools for Digital Forensics, In: Proceedings of IEEE Sixth International Workshop on System Approaches to Digital Forensic Engineering, SADFE'11, 2011, pp. 1-5. doi:10.1109/SADFE.2011.7.
[15] D. Kirat, G. Vigna, MalGene: Automatic Extraction of Malware Analysis Evasion Signature, Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. (2015) 769-780. doi:10.1145/2810103.2813642.
[16] T.K. Lengyel, S. Maresca, B.D. Payne, G.D. Webster, S. Vogl, A. Kiayias, Scalability, fidelity and stealth in the drakvuf dynamic malware analysis system, In: Proceedings of the 30th Annual Computer Security Applications Conference, ACSAC 2014, 2014, pp. 386-395.
[17] L. Martignoni, A. Fattori, R. Paleari, L. Cavallaro, Live and Trustworthy Forensic Analysis of Commodity Production Systems, Recent Advances in Intrusion Detection, 2010: pp. 297-316. doi:10.1007/978-3-642-15512-3_16.
[18] D. Maynor, Metasploit toolkit for penetration testing, exploit development, and vulnerability research, 2011. [Online]. Available: http://store.elsevier.com/product.jsp?isbn=9780080549255
[19] H. Moon, H. Lee, J. Lee, K. Kim, Y. Paek, B.B. Kang, Vigilare: toward snoop-based kernel integrity monitor, In: Proc. of the 2012 ACM conference on Computer and communications security, CCS'12, 2012, pp. 28-37. doi:10.1145/2382196.2382202.
[20] J. Rhee, R. Riley, Z. Lin, X. Jiang, Data-centric OS kernel malware characterization, IEEE Transactions on Information Forensics and Security. 9 (1) (2014) 72-87. doi:10.1109/TIFS.2013.2291964.
[21] J. Rhee, R. Riley, D. Xu, X. Jiang, Defeating dynamic data kernel rootkit attacks via VMM-based guest-transparent monitoring, In: Proceedings of International Conference on Availability, Reliability and Security, ARES'09, 2009, pp. 74-81. doi:10.1109/ARES.2009.116.
[22] R. Riley, X. Jiang, D. Xu, Multi-aspect profiling of kernel rootkit behavior, In: Proceedings of the 4th ACM European conference on Computer systems, EuroSys'09, 2009, pp. 47-58. doi:10.1145/1519065.1519072.
[23] J. Rutkowska, A. Tereshkin, IsGameOver() anyone, In: Proc. of BlackHat, 2007.
[24] J. Rutkowska, Beyond The CPU: Defeating Hardware Based RAM Acquisition, In: Proceedings of Black Hat DC, 2007, pp. 1-49.
[25] J. Rutkowska, Klister Project, 2003. [Online]. Available: http://www.rootldt.com/vaul-joanna/klister-0.4.zip.
[26] K. Saur, J.B. Grizzard, Locating x86 paging structures in memory images, Digital Investigation. 7 (2010) 28-37. doi:10.1016/j.diin.2010.08.002.
[27] B. Schatz, BodySnatcher: Towards reliable volatile memory acquisition by software, Digital Investigation. 4, Supplement (2007) 126-134. doi:10.1016/j.diin.2007.06.009.
[28] J. Shi, Y. Yang, C. Li, X. Wang, SPEMS: A Stealthy and Practical Execution Monitoring System Based on VMI, Cloud Computing and Security, Springer International Publishing, 2015: pp. 380-389. doi:10.1007/978-3-319-27051-7_32.
[29] A.F. Shosha, C.-C. Liu, P. Gladyshev, M. Matten, Evasion-resistant malware signature based on profiling kernel data structure objects, In: Proceedings of the 7th International Conference on Risk and Security of Internet and Systems, CRiSIS 2012, 2012, pp. 1-8. doi:10.1109/CRISIS.2012.6378949.
[30] S. Sparks, J. Butler, Shadow Walker: Raising The Bar For Windows Rootkit Detection, Phrack. (2005).
[31] M. Suiche, Win32dd, 2009. [Online]. Available: http://www.msuiche.net/tools/win32dd-v1.2.1.20090106.zip.
[32] Z. Wang, X. Jiang, W. Cui, P. Ning, Countering kernel rootkits with lightweight hook protection, Proc. 16th ACM Conf. Comput. Commun. Secur. - CCS '09. (2009) 545. doi:10.1145/1653662.1653728.
[33] H. Yin, P. Poosankam, S. Hanna, D. Song, HookScout: Proactive binary-centric hook detection, Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics). 6201 LNCS (2010) 1-20. doi:10.1007/978-3-642-14215-4_1.
[34] M. Yu, Z. Qi, Q. Lin, X. Zhong, B. Li, H. Guan, Vis: Virtualization enhanced live forensics acquisition for native system, Digital Investigation. 9 (2012) 22-33. doi:10.1016/j.diin.2012.04.002.
[35] X. Zhong, C. Xiang, M. Yu, Z. Qi, H. Guan, A Virtualization Based Monitoring System for Mini-intrusive Live Forensics, International Journal of Parallel Programming. 43 (2015) 455-471. doi:10.1007/s10766-013-0285-2.

Yingxin Cheng is a Ph.D. candidate at Nanjing University. He received his B.E. degree from the Software Institute at Nanjing University in 2012. His research interests are cloud and big data security. He is a member of the IEEE.

Xiao Fu is an assistant professor at Nanjing University. Dr. Fu received her B.E., M.S. and Ph.D. degrees from Nanjing University in 2002, 2005 and 2010, respectively, all in the Department of Computer Science and Technology. Her research interests are security and cloud computing. She is a member of the IEEE.

Xiaojiang Du is an associate professor at Temple University. His research interests are security, cloud computing, wireless networks, networks and systems. He has published over 100 journal and conference papers. He has been awarded more than $3 million in research grants from the U.S. NSF and Army Research Office. He is a Senior Member of the IEEE.

Bin Luo is a professor at Nanjing University. Dr. Luo received his B.E., M.S. and Ph.D. degrees from Nanjing University in 1989, 1992 and 2000, respectively, all in the Department of Computer Science and Technology. His research interests are systems and networks. He is a member of the IEEE.

Mohsen Guizani is a professor at the University of Idaho. He has published six books and about 200 articles in the areas of wireless networking and communications, mobile computing, optical networking and network security. He is the Founder and Editor-in-Chief of the Wiley Wireless Communications and Mobile Computing Journal. He serves or has served on the editorial boards of more than 20 international journals, such as IEEE Trans. on Wireless Communications. He is the Founder of two IEEE/ACM international conferences and the General/Program/Symposium Chair of many international conferences. He was an IEEE Computer Society Distinguished National Speaker. He is an IEEE Fellow.