
Kernel Protection through Immunological Algorithms

Brennon York
boyork@indiana.edu

Abstract. There have been many different studies on the effects of system calls and their relation to the security of the operating system. In this paper I look at previous methods and build upon those ideas with current immune algorithms to construct a new model. The algorithm leverages what can easily be deduced as insecure, along with heuristic methodologies, in an attempt to determine whether a given program is safe or not. The input is a program's system call trace and the output is a heuristic estimate of the security of the application being run. The motivation behind this idea is the ability to detect and halt the propagation of malware within distributed networks through the aid of immune systems. With this algorithm in place, the user gains more control and is better informed of oddities and suspicious behaviors within running applications.
1 Introduction
There is a great need to secure networked systems. With the recent attacks on distributed networks, such as self-propagating viruses like Stuxnet, there is an even greater need to identify current security techniques and flaws. Many approaches to network intrusion have been discussed and are present today, with a multitude of algorithms within the realm of Intrusion Detection Systems. I feel that these applications and algorithms are not the correct way to solve the issue, as they do not dynamically evolve to new situations. They are based on virus signatures, which are only known after the discovery and analysis of the virus.

What I propose is a reanalysis of previous work pertaining to system call tracing. The most important part of the computer is the kernel, and the only way for any application to gain control of the computer is through the kernel. The kernel only accepts input in the form of system calls, and knowing this gives a place to begin studying. With a deep analysis of what calls are made, in what order, and with what variables, an algorithm can be built to determine whether the calling application is acting in a malicious manner or not. Since all system calls are static, only their calling order is dynamic and unknown. Understanding how to analyze this dynamic data will result in a dynamic algorithm that can identify malicious activity whether the virus has a definition or not. This should help stop attacks by determining if an attack is in effect, and if so, by forking control flow back to the user. This algorithm would be built from current immunological design in the hope of achieving the evolution factor, allowing immunological responses to be created before the virus has been seen.

With an algorithm of this design on a machine, it would not only protect the individual kernel of that machine but also the kernels of other machines. By giving control of what happens back to the user, it can help prevent the propagation of worms. This control might have been able to aid in stopping the Stuxnet virus, and others, from propagating. The end goal of this application would be to build up from the kernel level, leveraging the same algorithm, to defend distributed networks. The algorithm would abstract up the network hierarchy, where each computer on the network would act as the kernel does in this project.

2 Previous Works
Stephanie Forrest et al. [1] approach this topic from a network flow viewpoint, although the main contributions remain in the same category. She used an N-Gram model over a simplified system trace, scanning only for specified calls such as read, write, open, and close operations. This severely limits the model because it does not help in differentiating any other calls. This work also created security holes through the assumption that a malicious program must call one of those four system commands. It can be seen now that there are many ways an application can act maliciously without the use of those basic calls. Forrest also uses a limited application set to test her algorithm on. A small set of 'typical' applications are run and then, as stated previously, scanned only for the given system calls. This raises an issue with the validity of the resulting data set as a representation of the whole. Additionally, Forrest's algorithm only utilizes an N-Gram model where n equals six; the paper states that this is done because it gives the best result when testing.

J. Timmis et al. [2] present a systematic approach to immunological algorithms and their placement within different categories. The major benefits of this paper are the abstractions from which
Timmis shows immunological algorithms are derived.
He defines Epitope, Paratope, Idiotype, and Idiotope as key factors within the immune system that must therefore be represented within any application modeling it. In this work these correspond to the system calls, the defending algorithm, the N-Gram model, and a given N-Gram set, respectively. Timmis also briefly covers previous work relating network intrusion to the immune system, as well as current algorithms and implementations of artificial immune systems such as negative selection and danger theory. He presents many different ways to approach immune systems, although within the timeline of this project I was unable to explore all of the options.

3 Outline
The objective for this project was to define a 'self' for any individual computer. This 'self' also acts as a negation to provide a 'non-self.' With this 'self' and 'non-self', applications can be built to quickly determine if a given process is acting erroneously. This could trigger additional auditing capabilities or a quarantine of the process until a user could verify its legitimacy. The goal then is to provide an additional level of protection against malicious commands travelling into kernel space. The 'self' and 'non-self' are necessary to satisfy the immunological idea of what is safe and what needs to be protected.

The idea stems from the fact that the kernel within a computer is essentially the brain. Once instructions are executed by the kernel they will take place whether they were malicious or not. The only way for an instruction to execute is through a system call to the operating system. This system call, which is usually triggered by a running application, is considered an elevation of privilege from User space to Kernel space, as seen in Figure 1. Understanding this scenario demonstrates the need to secure system calls, since they are the only way to manipulate the kernel.

Figure 1: Shows the elevation of privilege from User space to Kernel Space and what it affects.

A system call can easily be traced through a call to strace. This tool in Linux will output the series of system calls that an application makes to the kernel, along with each variable associated with the call. The objective is to determine, with this information alone, whether a given process should be considered malicious through its calls to the kernel. The system calls and variables are what the algorithm would 'bind' to in determining if a call is safe. From there it would be able to see what the typical variables are for each call and could build immunological responses to any atypical system call.

The problem is to determine what to consider as the 'self' for the computer. Since 'self' can be considered a signature of the computer, it needs to be clarified what to include within that signature. For this project we are assuming that 'self' is defined as the culmination of the processes running and their system calls. As stated previously, the system calls will be the defining information used in determining whether or not an activity is considered malicious. The challenge stems from identifying what information is available and how to leverage it for the best results. For the timeline of this project the 'self' for the computer will comprise only a small subset of applications instead of a sample from the entire running process space.

4 Algorithm
To begin, an N-Gram model will be used to determine the validity of the next system call in the sequence. This will be performed on a per-application basis and not based purely on what call hits the kernel next. While the N-Gram model works with the actual system call, there will also be a dynamic analysis of each variable within the call. For a call to be considered secure it must not only pass a percentage threshold for the N-Gram model, but each variable within the call must also be considered safe. If all is valid within the call then it will be allowed to execute by the kernel.
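As a rough illustration of this decision, a minimal C sketch (hypothetical names and an illustrative threshold, not the actual strace-prime code) of how the N-Gram score and the per-variable check might be combined for a single call is shown below.

```c
#include <stdbool.h>

/* Hypothetical representation of one decoded system call. */
struct syscall_event {
    const char *name;        /* e.g. "mmap", "munmap", "open" */
    const char *args[8];     /* argument strings as printed by strace */
    int argc;
};

/* Stub: probability of seeing this call given the recent history,
 * looked up in the N-Gram tables described later in the paper. */
static double ngram_probability(const struct syscall_event *ev)
{
    (void)ev;
    return 1.0;              /* placeholder so the sketch compiles */
}

/* Stub: per-variable check against the qualifier/de-qualifier lists. */
static bool variable_is_safe(const struct syscall_event *ev, int i)
{
    (void)ev; (void)i;
    return true;             /* placeholder */
}

#define NGRAM_THRESHOLD 0.05 /* illustrative cutoff, not from the paper */

/* A call is allowed only when it is plausible under the N-Gram model
 * and every variable within the call passes its own check. */
bool call_is_secure(const struct syscall_event *ev)
{
    if (ngram_probability(ev) < NGRAM_THRESHOLD)
        return false;
    for (int i = 0; i < ev->argc; i++)
        if (!variable_is_safe(ev, i))
            return false;
    return true;
}
```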
One concession needed for this algorithm to work within the timeline of the project was that each variable within each system call would automatically be considered secure unless it fell under a specific case. This was contrary to the original plan, in which every variable would be scanned under a secure measure. It also reduced the viability of the evolutionary feature intended to allow new bindings to be produced without malicious calls being sent. The variables this algorithm looks at could be considered 'good coding practice' variables. What this means is that it looks for any variable that is known to be needed again and, if found, retains that information until the variable is found again within another system call. To elaborate, given a system call such as mmap, there will be a variable pointing to the address in memory at which to map. This memory location must be passed again to munmap to properly release the mapping. Since this concept is known, the algorithm can retain that memory pointer to ensure it is seen again under munmap. The same can be done with other pairings as well, such as open and close.
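For instance, the well-behaved pairing the algorithm expects to see in a trace corresponds to ordinary application code such as the following sketch (shown only to make the pairing concrete; it is not code from the project):

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <stddef.h>

/* A well-behaved pairing: the address returned by the qualifier (mmap)
 * is later handed back to the de-qualifier (munmap), so the traced
 * variable is resolved and the algorithm can drop it from its list. */
int paired_mapping(size_t len)
{
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return -1;

    /* ... use the mapping ... */

    return munmap(addr, len);   /* releases exactly what mmap returned */
}
```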
What was needed first was the system call trace of a given program. To receive this data the strace call was used. Originally strace was run against an application with the results piped to standard out and then to a file; from there another program was called to parse the data and begin the N-Gram analysis. On further consideration the algorithm went into version two, and the parsing was pushed inside the strace program itself, since its source code is publicly available. Moving the analysis inside strace became an issue because the N-Gram code was not written in C. As it stands, the N-Gram code is executed from a popen command and the returned output is then parsed afterwards in the original C function.
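A minimal sketch of that glue code is shown below; the external command line and the assumption that the scorer prints a single number are illustrative only and are not taken from the paper.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>

/* Run the external N-Gram scorer over a saved trace and return its
 * score, or a negative value on failure.  The command line is a
 * placeholder; the real project embeds this inside its modified strace. */
double score_trace(const char *trace_path)
{
    char cmd[512];
    snprintf(cmd, sizeof cmd, "scheme --script ngram.scm %s", trace_path);

    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1.0;

    double score = -1.0;
    if (fscanf(pipe, "%lf", &score) != 1)
        score = -1.0;                  /* scorer produced no number */

    pclose(pipe);
    return score;
}
```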
To quickly implement the N-Gram model, Scheme was used. This functional language made it extremely easy to pull out individual calls and variables when needed while also setting up the N-Gram model. Calls that specify variables which will be needed again are considered qualifiers. Whenever such a call was found, the variable would be pushed into a specific function to determine its uniqueness. Unique here means that the variable had not previously been seen unless released by a de-qualifier, such as close or munmap. If it is considered unique then it is added to the list of unique variables recorded for that given call. If it is named by a de-qualifier then a check is made to ensure the variable is part of the unique list for that given call, and it is then removed. With that, there are two main conditions that cause an error, each considered a breach in security:

• A variable is named by a de-qualifier and is not within its respective unique call list.
• A variable is named by a qualifier and is already held within the unique listing.

If either of these rules is violated it is considered a security breach, because it implies one of two possibilities: the application is trying to access something it already has access to, or it is trying to remove access from something it knows nothing about. Both cases imply that a third party is operating, because of the unknown conditions stated.
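A minimal sketch of how these two rules might be enforced is shown below; the names are hypothetical, a single table is used for brevity, and the paper's actual implementation lives in the Scheme code and keeps one unique list per call type.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 1024

/* Identifiers recorded by a qualifier (e.g. the address returned by mmap
 * or the descriptor returned by open) and not yet released by the
 * matching de-qualifier (munmap, close). */
static unsigned long tracked[MAX_TRACKED];
static size_t tracked_count;

static bool find_tracked(unsigned long id, size_t *pos)
{
    for (size_t i = 0; i < tracked_count; i++)
        if (tracked[i] == id) { *pos = i; return true; }
    return false;
}

/* Qualifier seen: the identifier must be new.  Acquiring something the
 * application already holds violates the second rule above. */
bool on_qualifier(unsigned long id)
{
    size_t pos;
    if (find_tracked(id, &pos))
        return false;                 /* breach: already held */
    if (tracked_count < MAX_TRACKED)
        tracked[tracked_count++] = id;
    return true;
}

/* De-qualifier seen: the identifier must already be tracked.  Releasing
 * something unknown violates the first rule above. */
bool on_dequalifier(unsigned long id)
{
    size_t pos;
    if (!find_tracked(id, &pos))
        return false;                 /* breach: unknown identifier */
    tracked[pos] = tracked[--tracked_count];
    return true;
}
```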
For the N-Gram model it was decided to use a three-tiered system. This allows for multiple possibilities as well as enough leverage to form a concrete answer on whether the application is malicious or not. One-, three-, and five-gram windows are used. The one-gram gives the percent chance of each call given the previous one; this shows only immediate changes in the system relative to the calls already seen from this application. Not every system call is taken into account, only the number of each unique call within a given call sequence. The three-gram gives a small amount of pattern recognition, which can be used to determine the likelihood of a given call after a short sequence. The five-gram is no different from the three-gram except for being larger; it picks up larger call patterns that a three-gram could not. The reason for the one-, three-, and five-gram set stems from the idea that any given set of system calls will fall into a pattern of two, four, or six. Knowing this, and the percent chance of each call, it becomes easier to see whether a given call at the end of a pattern will be malicious.
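One possible reading of this tiered check is sketched below; the scoring, threshold, and names are illustrative guesses rather than the project's Scheme implementation.

```c
#include <stdbool.h>
#include <string.h>

/* Relative frequency of the window calls[start .. start+n-1] among all
 * length-n windows in the trace: a rough stand-in for the per-tier
 * "percent chance" described above (n = 1, 3 or 5). */
static double window_frequency(const char *calls[], int ncalls,
                               int start, int n)
{
    int matches = 0, windows = 0;
    for (int i = 0; i + n <= ncalls; i++) {
        windows++;
        int same = 1;
        for (int j = 0; j < n; j++)
            if (strcmp(calls[i + j], calls[start + j]) != 0) { same = 0; break; }
        matches += same;
    }
    return windows > 0 ? (double)matches / windows : 0.0;
}

/* The three tiers combined: the sequence ending at position `pos` is
 * scored under the 1-, 3- and 5-gram windows, and every tier with
 * enough history must clear an (illustrative) threshold. */
bool sequence_is_typical(const char *calls[], int ncalls, int pos)
{
    const int tiers[3] = { 1, 3, 5 };
    for (int t = 0; t < 3; t++) {
        int n = tiers[t];
        if (pos + 1 < n)              /* not enough history for this tier */
            continue;
        if (window_frequency(calls, ncalls, pos + 1 - n, n) < 0.01)
            return false;             /* placeholder threshold */
    }
    return true;
}
```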
5 Evaluation
To test this new version of strace, called strace-prime, three programs were chosen: grep, cat, and ls. These were chosen because, like strace, their source code is freely available to download and compile. Like strace-prime, a grep-prime, cat-prime, and ls-prime were created. These prime versions were injected with malicious code, such as unneeded read operations or allocation of heap memory, so that they would issue system calls that would not show up under normal operation. Unfortunately, for testing purposes no actual malware was injected into any of these applications; the evaluation was performed only on the lab-constructed prime versions of the code.

The algorithm defined all prime applications secure under all injected conditions save two.
Whenever an application was injected with a call to a qualifier without its respective de-qualifier, the algorithm would consider it unsecure at that point. Also, all three prime applications were injected with an execve command to launch a shell from /bin/sh. In both cases the algorithm would determine the code to be unsafe.
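For illustration, the kind of injected fragment described above might look something like the following; this is hypothetical code, not the actual modification made to the prime binaries.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical tampering of the sort described above: a mapping that is
 * never released (a qualifier with no matching de-qualifier) and a shell
 * launched via execve.  Under strace-prime both the unmatched mmap and
 * the out-of-place execve are flagged. */
void injected_behaviour(void)
{
    /* mmap with no munmap: the recorded address is never released. */
    void *leak = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    (void)leak;

    /* Launch a shell; execve replaces the process image on success. */
    char *const argv[] = { (char *)"/bin/sh", NULL };
    char *const envp[] = { NULL };
    execve("/bin/sh", argv, envp);
}
```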
It seems the error is in the N-Gram model. In the attempt to create an N-Gram model that evolves without training, it became almost impossible to find an application it did not deem secure. The N-Gram model does not seem to be the best algorithm for this design. The reasons the two previous cases still worked are twofold. One, the first case does not involve the N-Gram model at all; it merely utilizes the internal qualifier and de-qualifier lists to make its decision. Two, the execve command is not used under any normal condition and is considered an outlier system call. These results were not expected, but upon further analysis of what has been done, they make sense. There are possibilities to improve this work and they are discussed below.

6 Conclusion
This application does not work as originally intended, although with more time and energy it might still become a comparable algorithm. The main problem turned out to be the rate at which tampered programs are still accepted: the current version of this project classifies almost every program as secure, even if it has been tampered with. One could argue that the tampering done for testing purposes was not of the same caliber as an actual attack, but it seems the same results would come forth.

Currently, the major area where this application shines is its ability to pick up on unresolved qualifiers and de-qualifiers. This was not the main intent of the project; it was merely a feature added in the attempt to quantify the security of the running program. Additionally, this work does not rest on the idea that one must dynamically trace full system call sequences to determine security; it showed that identifying specific variables within system calls would be good enough to improve the measure of security. This could also be used when developing applications, as it would show programmers and software developers where they are leaving unpatched holes, such as missing de-allocation of memory.

If continuing research were to happen in this area, the focus would be on each system call and its individual variables to determine whether any other such links could be made. If they could, there might be a possibility of returning to the N-Gram model given specified patterns seen within system calls. Even so, this still seems difficult, as many calls could arrive, and not necessarily in the expected order, making pattern recognition hard. An abstraction might work given a symbolic link table of what calls could, or must, be placed after others. This work would then not look at fixed patterns, but rather at flow patterns as the program executed. With all this stated, an advanced algorithm might not be able to determine security through individual calls, but rather through correct flow. This is all speculation for further research though.

REFERENCES
[1] Forrest, S. and Beauchemin, C. (2007), Computer immunology. Immunological Reviews, 216: 176–197. doi: 10.1111/j.1600-065X.2007.00499.x
[2] Timmis, J., Andrews, P., Owens, N., and Clark, E. C. (2008), An interdisciplinary perspective on artificial immune systems. Evolutionary Intelligence, 1: 5-26. doi: 10.1007/s12065-007-0004-2