Collection of Volatile Data (Linux)

1. Introduction
During the incident response process we often come across a situation where a compromised system
has not been powered off by a user or administrator. This is a great opportunity to acquire valuable
information that is irretrievably lost once the system is powered off. I'm referring to things such as running
processes, open TCP/UDP ports, images of programs that have been deleted but are still running in main
memory, the contents of buffers, queues of connection requests, established connections, and modules loaded
into the part of virtual memory that is reserved for the Linux kernel. All of this data can help the
investigator find forensic evidence during offline examination. Moreover, when an incident is still relatively
recent we can recover almost all of the data used by an intruder and the activities the intruder performed.
Sometimes the live procedure described here is the only way to acquire incident data, because certain
types of malicious code, such as LKM-based rootkits, are loaded only into memory and don't modify any file
or directory. A similar situation exists in Windows operating systems -- the Code Red worm is a good
example of this, where the malicious code was not saved as a file, but was inserted into and then run
directly from memory.
On the other hand, the methods presented below also have serious limitations and violate the primary
requirement of a collection procedure for digital investigation -- a requirement which cannot be easily
fulfilled. That is: every user and kernel space tool used to collect data by its nature changes the state of
the target system. By running any tool on a live system we load it into memory and create at least one
process, which can overwrite possible evidence. When a new process is created, the memory management
system of the operating system allocates memory for it and may thereby overwrite other, unallocated
data in main memory or in swap space.
Other problems arise when we plan to take legal action and need to comply with local laws. The signs of
intrusion found in images of main memory may be considered untrustworthy, because they could have been
created by our acquisition tools. So before taking any action we must decide whether or not to acquire data
from a live compromised system. Very often it is worth collecting such information. In the main memory image
we can find passwords or decrypted files. Using the /proc pseudo file system we can also recover programs
that have been deleted but are still allocated in memory.
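As a rough illustration of the /proc point above, the following sketch lists processes whose program file has been deleted. It assumes a Linux /proc in which /proc/<pid>/exe is a symbolic link whose target ends in " (deleted)" once the file has been unlinked. The function names are hypothetical, and readlink stands in for a trusted, statically linked copy on the toolkit media.

```shell
#!/bin/sh
# Sketch only: list running processes whose program file has been deleted.
# Assumption: /proc/<pid>/exe is a symbolic link whose target ends in
# " (deleted)" once the file on disk has been unlinked.

# Succeed if a link target string carries the " (deleted)" suffix.
is_deleted_target() {
    case "$1" in
        *" (deleted)") return 0 ;;
        *) return 1 ;;
    esac
}

# Print "pid target" for every process that still runs a deleted program.
list_deleted_exes() {
    for link in /proc/[0-9]*/exe; do
        # Links we are not allowed to read (or that vanished) are skipped.
        target=$(readlink "$link" 2>/dev/null) || continue
        if is_deleted_target "$target"; then
            pid=${link#/proc/}
            pid=${pid%/exe}
            printf '%s %s\n' "$pid" "$target"
        fi
    done
}

list_deleted_exes
```

On a live system the output would have to be piped to the remote host rather than printed or stored locally.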
In an ideal world, I could imagine a hardware-based solution for Intel-based computers that
would allow us to dump the whole memory to an external storage device without the assistance of the
operating system. Such a solution exists on Sparc machines, where we can dump the whole physical memory by
using the OpenBoot firmware. Unfortunately, no similar solution exists for Intel- or AMD-based computers.
Despite the above problems, software-based methods also have advantages for forensic purposes, and I'll
try to show them in this paper. The main goal of this article is to present the methods used during an
evidence collection procedure. All collected data can be used later to perform offline forensic analysis.
Some of the presented tasks can also be performed in the preparation and identification phases of the
incident response cycle -- these are two of the six phases defined in a guide called "Incident Handling
Step by step", published by the SANS Institute.
2. Forensic Analysis
This article is divided into four related sections:

2.1 Fitting to the environment
2.2 Preparing the forensic toolkit media
2.3 Data collecting from a live system - step by step procedure
2.4 Initial data analysis and keyword searching
Sections 2.1, 2.2, and part of 2.3 will be discussed in this article; the remaining steps and some offline
procedures will be discussed next month in part two of this article series.
2.1 Fitting to the environment
Before gathering data from a live system we have to fit ourselves into the environment. First of all we
have to run a network sniffer, and it must "see" communication flows to and from the compromised system.
This condition is mandatory. We can detect some types of malicious activity just by recording and
analyzing this communication in real time. The tcpdump utility is an excellent tool for this purpose. My
advice is to record packets in raw format, because decoding and printing them during capture costs
performance and may lead to dropped packets.
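A raw capture of this kind might look like the sketch below. The interface name, the address of the compromised host and the output file name are assumptions made for illustration. The function prints the command when DRY_RUN is set, so the example can be tried without root privileges.

```shell
#!/bin/sh
# Sketch only: record raw traffic to and from the compromised system.
# Assumptions (not from the article): the sniffer listens on eth0, the
# compromised host is 192.168.1.50, and the capture file name is arbitrary.

IFACE=eth0
TARGET=192.168.1.50
OUTFILE=compromised_traffic.pcap

# Options used:
#   -n    do not resolve addresses to names
#   -s 0  capture whole packets instead of truncating them
#   -w    write raw packets to a file instead of decoding them
# With DRY_RUN set, the command is printed instead of executed.
start_capture() {
    ${DRY_RUN:+echo} tcpdump -n -i "$IFACE" -s 0 -w "$OUTFILE" host "$TARGET"
}

DRY_RUN=1 start_capture
```

The capture runs on the sniffing host, never on the compromised system, and the resulting file can be decoded later with tcpdump -r.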
Before taking any action on the compromised system we have to create a paper copy of our data
collection procedure. An example procedure can be found in section 2.3 of this article. This procedure
helps us avoid mistakes while handling the incident. We must make additional notes after
every finished step, as well as whenever something goes wrong. Documentation is important, and is something
to keep in mind if we plan to take our forensic case to court.
Our next step is to prepare a place to record the results of the commands run during the data gathering
phase. To do so, we connect a destination host to the same local area network as the compromised host;
the compromised host will send its information to it. Remember, we are not allowed to write any results on
the compromised system. Recording data locally on the compromised host can destroy signs of an intrusion.
To minimize the impact on the compromised system we have to send all our digital data to a remote, or
destination, host. This is one of the most important rules in the forensic analysis process. And once again,
as described earlier, this is a requirement that is not always easy to fulfill.
If we don't yet have a forensic toolkit installed on removable media, now is a good time to prepare
one for our compromised system. Using the tools from this toolkit we will collect all important data,
working from the most volatile to the least volatile.
The following section describes how to prepare our media as a forensic toolkit.
2.2 Preparing the forensic toolkit media
It is important to remember that during a data collection process we have to fulfill the following criteria:

- Try not to run programs stored on the compromised system. Why? An intruder could have modified system
  commands (such as netstat) or system libraries (such as libproc), rendering the results
  unreliable. To fulfill this criterion we have to prepare statically compiled versions of the tools.
- Try not to run programs which can modify the meta-data of files and directories.
- All results from the investigation must be written to a remote location. To fulfill this criterion we will
  use the remote host as our destination location. The netcat tool will be used to transfer digital data.
- We have to use tools to calculate the hash values of the digital data. This is a kind of assurance
  that the digital data has not been altered. A best practice is to make sure that data is not altered and
  is properly saved on the destination host, so we will also compare hash values calculated on both
  the source and the destination. Sometimes it is impossible to calculate a reproducible hash value on the
  compromised host -- a good example of this is main memory. When we run md5sum on
  the /dev/mem device twice in a row, the hash value will be different each time. This happens
  because every time we load the program into memory (and thus create a new process which needs
  memory to operate) we change the state of the memory. In our procedure we calculate hash values
  of digital data immediately after collection is completed and, when possible, on both the
  source and the destination host. To maintain the integrity of all results we will use the md5sum tool.
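The comparison of hash values from the source and the destination, described in the last criterion, could be sketched as follows. The function names are hypothetical and a scratch file stands in for real evidence. The only assumption about md5sum is the GNU coreutils output format, in which the hash is the first field.

```shell
#!/bin/sh
# Sketch only: compare the MD5 of a file as hashed on the source with the
# MD5 of the copy that arrived on the destination host.
# Assumption: md5sum prints "<hash>  <name>" (GNU coreutils format).

# Print only the hash field for a file.
md5_of() {
    md5sum "$1" | cut -d ' ' -f 1
}

# Succeed when the hash reported by the source equals the hash of the copy.
# $1 = hash string sent by the source, $2 = file received on the destination
verify_copy() {
    [ "$1" = "$(md5_of "$2")" ]
}

# Demonstration with a scratch file standing in for collected evidence.
workdir=$(mktemp -d)
printf 'sample evidence\n' > "$workdir/original"
cp "$workdir/original" "$workdir/received"

source_hash=$(md5_of "$workdir/original")
if verify_copy "$source_hash" "$workdir/received"; then
    echo "hashes match"
else
    echo "hashes differ"
fi
```

For data such as main memory, where the source cannot be hashed reproducibly, only the hash taken on the destination right after collection is meaningful.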
The criterion that our tools must not write data to the memory, or even the swap space, of the
compromised system cannot be fulfilled for some steps. This will be discussed in
greater detail in section 2.3. For now, let's ensure we have a proper forensic toolkit on removable
media, as shown in Table 1.

Table 1: Requirements for a forensic toolkit on removable media.

No.  Program              Source and method of creation
1    nc                   http://www.atstake.com/research/tools/network_utilities/nc110.tgz
                          How to build: $ tar zxvf nc110.tgz; make linux
                          How to verify: file nc or ldd nc
2    dd                   http://www.gnu.org/software/fileutils/fileutils.html
                          (now part of the GNU core utilities)
3    date, cat            http://www.gnu.org/software/coreutils/
                          How to build: $ tar zxvf coreutils-5.0.tar.gz; cd coreutils-5.0;
                          ./configure CC="gcc -static"; make
                          How to verify: file date cat or ldd date cat
4    pcat                 http://www.porcupine.org/forensics/tct
                          How to build: $ tar zxvf tct-1.14.tgz; cd tct-1.14; make CC="gcc -static"
                          How to verify: file pcat or ldd pcat
5    Hunter.o             http://www.phrack.org/phrack/61/p61-0x03_Linenoise.txt
                          To make the module more "independent" we have to delete the following
                          lines from the source code:
                          #ifdef CONFIG_MODVERSIONS
                          #define MODVERSIONS
                          #include <linux/modversions.h>
                          #endif
                          With the MODVERSIONS block removed, the module can be loaded into
                          other kernels.
                          How to build: $ gcc -c hunter.c -I/usr/src/linux/include/
6    insmod               http://www.kernel.org/pub/linux/utils/kernel/modutils/ (for kernel 2.4)
                          How to build: $ ./configure --enable-insmod_static; make
                          How to verify: file insmod.static or ldd insmod.static
7    netstat, arp, route  http://freshmeat.net/projects/net-tools/
                          How to build: $ bzip2 -d net-tools-1.60.tar.bz2; tar xvf net-tools-1.60.tar;
                          cd net-tools-1.60; make config; make CC="gcc -static"
                          How to verify: file netstat arp route or ldd netstat arp route
8    dmesg                http://ftp.cwi.nl/aeb/util-linux/util-linux-2.12.tar.gz
                          How to build: $ ./configure; make CC="gcc -static"
                          How to verify: file dmesg or ldd dmesg
Once all of the above tools have been built successfully, we can copy them to our removable media (such
as a CD-RW disc).
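Before burning the media, the "How to verify" checks from Table 1 can be run over the whole toolkit in one pass. The sketch below assumes the built tools were copied to a directory called ./toolkit (a hypothetical path) and that the file utility is available on the trusted build host. It relies on file reporting "statically linked" for static binaries.

```shell
#!/bin/sh
# Sketch only: flag toolkit binaries that are not statically linked.
# Assumptions: the built tools were copied to ./toolkit (hypothetical path)
# and the file(1) utility is available on the trusted build host.

TOOLKIT_DIR=./toolkit

# Succeed when a line of file(1) output describes a statically linked binary.
describes_static() {
    case "$1" in
        *"statically linked"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print one verdict line per regular file in the given directory.
check_toolkit() {
    for tool in "$1"/*; do
        [ -f "$tool" ] || continue
        if describes_static "$(file -L "$tool")"; then
            echo "STATIC      $tool"
        else
            echo "NOT STATIC  $tool"
        fi
    done
}

check_toolkit "$TOOLKIT_DIR"
```

Any tool reported as NOT STATIC would still depend on the possibly modified libraries of the compromised system and should be rebuilt.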
2.3 Data collecting from a live system - a step by step procedure
The next requirement, and a very important one, is that we collect data in the proper order,
from the most volatile to the least volatile. We have to keep this in mind throughout data gathering.
Step 1: Take a photograph of a compromised system's screen
This is a kind of screenshot, and of course we have to use a digital camera to do this task. This is a
simple step.
Before moving on to step two, mounting our media, let's first think about the impact this next step will
have on a compromised system. What will be the effect of our activity? For the moment let's ignore the
impact it will have on the compromised system's memory.
It is clear that we have to mount external media on the compromised system. We must use the untrusted
mount command to perform this task. This will probably be the sole situation in which an untrusted system
command is used. If everything goes according to plan, we will run the rest of the commands from the
mounted media using tools that we trust.
We also have to check what impact the mount command will have on the system. I have
done some research on a computer, and Table 2 lists the relevant files and directories that are modified.
# strace /bin/mount /mnt/cdrom

Table 2: Files accessed by the mount command.
File                             Meta-data modified by the mount command
/etc/ld.so.cache                 atime
/lib/tls/libc.so.6               atime
/usr/lib/locale/locale-archive   atime
/etc/fstab                       atime
/etc/mtab*                       atime, mtime, ctime
/dev/cdrom                       atime
/bin/mount                       atime
*We can avoid access to this file by using the "-n" switch.
We can imagine a situation in which an intruder has modified the mount command. When someone tries to run
this command, perhaps a special process which removes all evidence from the compromised system is
initiated instead of the media being mounted. Such a process is called a "deadman switch". But
let's assume this is not the case, and go back to the process of data collection.
I suggest verifying every command that is going to be put on the forensic toolkit media and later
used on the compromised system to collect evidence.
We also have to stop and think about potential problems met during the mounting process:

- After putting the media into a drive, the Volume Manager process will mount the media
  automatically. Which files and directories will be modified? Are these files listed in Table 2?
- Suppose unknown media is currently mounted on the compromised system. Then the first
  task is to unmount that media. How should we safely unmount it? I can suggest two solutions.
  We can use the untrusted umount command, or we can put a trusted (statically linked) umount
  command on a floppy disc. In the latter case we use the untrusted mount command to mount the
  floppy and then run the trusted umount command. It is a little bit complicated but effective:
  we still use only one untrusted command.
- The administrator is logged off or, even worse, the administrator password has been changed by an
  intruder. When the administrator is logged off we have to log in to the system. What files will
  be accessed or modified during the login process? How many additional processes will be
  created? If the administrator password was changed, what other accounts exist on the
  system? What volatile data can be collected without access to a shell? Open TCP/UDP ports,
  current connections, what else?
- Are there other unpredictable problems?
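Questions such as "which files will be modified?" can be answered in advance on a test machine by comparing file timestamps before and after a command runs, in the spirit of Table 2. The sketch below assumes GNU stat with its -c format option. The function names and the scratch file are hypothetical, and none of it is meant to be run on the compromised system.

```shell
#!/bin/sh
# Sketch only: show which timestamps of a file a command changes.
# Assumptions: GNU stat (the -c format option) and a scratch file on a
# test machine; this is not meant to be run on the compromised system.

# Print access, modification and change times as seconds since the epoch.
times_of() {
    stat -c 'atime=%X mtime=%Y ctime=%Z' "$1"
}

# Run a command and report the file's timestamps before and after it.
# $1 = file to watch, remaining arguments = command to run
watch_times() {
    watched=$1
    shift
    echo "before: $(times_of "$watched")"
    "$@"
    echo "after:  $(times_of "$watched")"
}

# Demonstration: give a scratch file old timestamps, then append to it.
scratch=$(mktemp)
touch -d '2004-01-01 00:00:00' "$scratch"
watch_times "$scratch" sh -c "echo data >> '$scratch'"
```

Replacing the demonstration command with the tool under study, and the scratch file with each file reported by strace, gives a table like Table 2 for that tool.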

Step 2: Media mounting
Let's go ahead and mount our media, in this case a CD-ROM with our toolkit.

# mount -n /mnt/cdrom
If the mounting process is successful we can start the most important phase of data collection.
Remember, all results generated by trusted commands have to be sent to the remote host. I use the
netcat tool and a pipe to do this. To better differentiate which tasks are performed on which
host, all commands run on the compromised host are prefixed with the word (compromised) in brackets.
Commands run on the remote host are prefixed with the word (remote) in brackets. Consider the
following example.
To send the current date of the compromised system to the remote location (the IP
address of the remote host in this case is 192.168.1.100) we have to open a TCP port on the remote host
as follows:
(remote)# nc -l -p 8888 > date_compromised
Next, on the compromised host we do the following:
(compromised)# /mnt/cdrom/date | /mnt/cdrom/nc 192.168.1.100 8888 -w 3
To maintain the integrity of digital evidence we calculate the hash value of the collected file and clearly
document every step on our paper copy of the procedure.
(remote)# md5sum date_compromised > date_compromised.md5
Sometimes we can generate checksums on the compromised system and send the result to the remote
host. Some of the problems this can cause were discussed in section 2.2.
(compromised)# /mnt/cdrom/md5sum /etc/fstab | /mnt/cdrom/nc 192.168.1.100 8888 -w 3
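Since every step on the remote host follows the same listen, save and hash pattern, it could be wrapped in small helper functions, sketched below. The function names and the log file are hypothetical, and the log only complements the paper notes. The nc options are those of the traditional netcat used in this article.

```shell
#!/bin/sh
# Sketch only: what the remote host might do for each collection step.
# Assumptions: a traditional netcat that accepts "-l -p <port>", and the
# hypothetical log file name and layout below.

LOGFILE=collection.log

# Hash a received file and append a UTC-stamped note to the collection log.
seal_result() {
    md5sum "$1" > "$1.md5"
    printf '%s %s %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S')" "$1" \
        "$(cut -d ' ' -f 1 "$1.md5")" >> "$LOGFILE"
}

# Listen on a port, store whatever arrives under the given name, seal it.
# $1 = TCP port, $2 = output file name
receive_result() {
    nc -l -p "$1" > "$2" && seal_result "$2"
}

# Usage on the remote host, one call per step of the procedure:
#   receive_result 8888 date_compromised
```

Each log line then records when a result arrived, under which name it was stored, and its hash at that moment.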

Step 3: Current date
The result is presented in UTC (Coordinated Universal Time).
(remote)# nc -l -p port > date_compromised
(compromised)# /mnt/cdrom/date -u | /mnt/cdrom/nc (remote) port
(remote)# md5sum date_compromised > date_compromised.md5

Step 4: Cache tables
First, we have to collect information from the cache tables, because the lifetime of the data held in
these tables is very short. I will collect data from the ARP and routing tables.
MAC address cache table:
(remote)# nc -l -p port > arp_compromised
(compromised)# /mnt/cdrom/arp -an | /mnt/cdrom/nc (remote) port
(remote)# md5sum arp_compromised > arp_compromised.md5
Kernel route cache table:
(remote)# nc -l -p port > route_compromised
(compromised)# /mnt/cdrom/route -Cn | /mnt/cdrom/nc (remote) port
(remote)# md5sum route_compromised > route_compromised.md5

Step 5: Current, pending connections and open TCP/UDP ports
Now, we start collecting information about current connections and open TCP/UDP ports.
Information about all active raw sockets will be gathered in step eight.
(remote)# nc -l -p port > connections_compromised
(compromised)# /mnt/cdrom/netstat -an | /mnt/cdrom/nc (remote) port
(remote)# md5sum connections_compromised > connections_compromised.md5
We can use the cat command instead of netstat in this case. Information about open ports and current
connections is kept in the /proc pseudo file system, in the /proc/net/tcp and /proc/net/udp files (the
/proc/net/netstat file holds only protocol statistics). Addresses and ports in those files are represented
in hex format. For example: 0100007F:0401 in decimal is 127.0.0.1:1025.
As mentioned before, current connections can also be detected by analyzing the recorded traffic. It is
important to note that a rootkit loaded into kernel memory is easy to detect when one of its
tasks is hiding an open port: we scan the compromised host remotely and compare the detected
open ports with our result from the netstat command. But this does a lot of harm, because we once again
change the state of the compromised system. In step seven I will present an alternative method of
detecting hidden LKM-based rootkits.
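The hex notation used in /proc/net/tcp can be decoded with a few lines of shell, shown here for the example address given in this step. This is a sketch that assumes IPv4 entries on a little-endian machine such as x86, where the four address bytes appear in reverse order. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: decode an address:port pair as found in /proc/net/tcp.
# Assumption: IPv4 on a little-endian machine such as x86, where the four
# address bytes appear in reverse order (0100007F is 127.0.0.1).

decode_proc_addr() {
    hex_ip=${1%%:*}      # part before the colon, e.g. 0100007F
    hex_port=${1##*:}    # part after the colon, e.g. 0401
    # Split the address into byte pairs; the last pair is the first octet.
    o4=$(printf '%s' "$hex_ip" | cut -c1-2)
    o3=$(printf '%s' "$hex_ip" | cut -c3-4)
    o2=$(printf '%s' "$hex_ip" | cut -c5-6)
    o1=$(printf '%s' "$hex_ip" | cut -c7-8)
    # printf converts the 0x-prefixed strings to decimal.
    printf '%d.%d.%d.%d:%d\n' "0x$o1" "0x$o2" "0x$o3" "0x$o4" "0x$hex_port"
}

decode_proc_addr 0100007F:0401    # prints 127.0.0.1:1025
```

The decoding is best done on the remote host, on the copy of /proc/net/tcp that was sent over with cat and nc, so that no extra processes are started on the compromised system.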
Concluding part one
Now that we have the date and the network status logged, we're ready to take some additional steps on
the compromised machine before we power it off. Next month, in part two of this article series, we will
focus on the search for malicious code by collecting more data to be sent to our remote host. We'll also
discuss some of the searching that can be done with the data once we're able to go through it in a safe
environment.

References
Alessandro Rubini, Jonathan Corbet. Linux Device Drivers, 2nd Edition. O'Reilly; 2001.
Dan Farmer, Wietse Venema. Column series for the Dr. Dobb's Journal.
http://www.porcupine.org/forensics/column.html.
Daniel P. Bovet, Marco Cesati. Understanding the Linux Kernel, 2nd Edition. O'Reilly; 2002.
Kernel source code. http://www.kernel.org
Linux manual pages.
National Institute of Standards and Technology. Computer Security Incident Handling Guide.
http://csrc.nist.gov.
PHRACK #61. Finding hidden kernel modules (the extrem way) by madsys. http://www.phrack.org.
RFC 3227. Guidelines for Evidence Collection and Archiving.
Fred Smith, Rebecca Bace. A Guide to Forensic Testimony. Addison Wesley; 2003.
Symantec Corporation. CodeRed Worm. http://securityresponse.symantec.com.
The Honeynet Project. Scan 29. http://www.honeynet.org
The SANS Institute. Incident Handling step by step. http://www.sans.org
