You are on page 1of 107

Artificial Immune System and Its Applications

Prof. Ying TAN


National Laboratory on Machine Perception Department of Intelligence Science Peking University, Beijing 100871, P.R.China

2005-12-13

Y. Tan---Artificial Immune Sys.

Contents
Biological Immune System Artificial Immune System Basic Algorithms of AIS AIS design procedure Case Studies
Malicious Executable Detection Film Recommender

New Immuneocomputing IC Danger Theory Future


2005-12-13 Y. Tan---Artificial Immune Sys. 2

The Immune System is


Immune system: a system that protects the body from foreign substances and pathogenic organisms by producing the immune response Immunity: state or quality of being resistant (immune), either by virtue of previous exposure (adaptive immunity) or as an inherited trait (innate immunity)
2005-12-13 Y. Tan---Artificial Immune Sys. 3

Why is the Immune System?


Immune system has following appealing features: Recognition
Anomaly detection Noise tolerance

Robustness Feature extraction Diversity Reinforcement learning Memory; Dynamically changing coverage Distributed Multi-layered Adaptive
Y. Tan---Artificial Immune Sys. 4

2005-12-13

Role of Biological Immune System


Protect our bodies from pathogen and viruses Primary immune response
Launch a response to invading pathogens

Secondary immune response


Remember past encounters Faster response the second time around

2005-12-13

Y. Tan---Artificial Immune Sys.

Immune cells
There are two primarily types of lymphocytes:
B-lymphocytes (B cells) T-lymphocytes (T cells)

Others types include macrophages, phagocytic cells, cytokines, etc.

2005-12-13

Y. Tan---Artificial Immune Sys.

Where is it?
P r im a r y l y m p h o i d o r g a n s S e c o n d a r y lym p h o id o r g a T o n s ils a n d a d e n o id s

T hym us S p le e n

P e y e r s p a t c h e s A p p e n d ix B o n e m a rro w Lym ph nodes L y m p h a tic v e s s e ls

2005-12-13

Y. Tan---Artificial Immune Sys.

Multiple layers of the immune system


Pathogens

Skin Biochem ical barriers Phagocyte Innate im m une response

Lym phocytes

Adaptive im m une response


2005-12-13 Y. Tan---Artificial Immune Sys. 8

Antigen
Substances capable of starting a specific immune response commonly are referred to as antigens This includes some pathogens such as viruses, bacteria, fungi etc .

2005-12-13

Y. Tan---Artificial Immune Sys.

Biological Immune System


Innate

vs

Acquired

Cell Mediated

vs

Humoral

T Cell (Helper) B Cell T Cell (Killer)


Secretes

Antibody
Y. Tan---Artificial Immune Sys. 10

2005-12-13

How does IS work: A simplistic view


M H C A P C p r o te in A n tig e n ( I ) P e p tid e ( II ) T - c e ll ( II I ) B - c e ll ( V )

( IV A c tiv a te d T - c e ll

L y m p h o k in e s ( V I )

A c t iv a t e d ( p la s m a

B - c e ll c e ll)

( V II )

2005-12-13

Y. Tan---Artificial Immune Sys.

11

Self/Non-Self Recognition
Immune system needs to be able to differentiate between self and non-self cells Antigenic encounters may result in cell death, therefore
Some kind of positive selection Some element of negative selection

2005-12-13

Y. Tan---Artificial Immune Sys.

12

Immune Pattern Recognition


BCR or Antibody

B-cell Receptors (Ab) Epitopes Antigen


B-cell

2005-12-13

The immune recognition is based on the complementarity between the binding region of the receptor and a portion of the antigen called epitope. Antibodies present a single type of receptor, antigens might present several epitopes. This means that each antibody can recognize a single antigen
Y. Tan---Artificial Immune Sys.

13

Clonal Selection
Clonal deletion (negative selection) Self-antigen Proliferation (Cloning)
M

Antibody Selection Differentiation

Memory cells

Plasma cells

Foreign antigens

Self-antigen Clonal deletion (negative selection)

2005-12-13

Y. Tan---Artificial Immune Sys.

14

Main Properties of Clonal Selection (Burnet, 1978)


Elimination of self antigens Proliferation and differentiation on contact of mature lymphocytes with antigen Restriction of one pattern to one differentiated cell and retention of that pattern by clonal descendants; Generation of new random genetic changes, subsequently expressed as diverse antibody patterns by a form of accelerated somatic mutation
2005-12-13 Y. Tan---Artificial Immune Sys. 15

Immune Network Theory


Idiotypic network (Jerne, 1974) B cells co-stimulate each other Treat each other a bit like antigens Creates an immunological memory
Paratope Suppression Negative response A g 1 Idiotope Antibody Activation Positive response 2 3

2005-12-13

Y. Tan---Artificial Immune Sys.

16

Reinforcement Learning and Immune Memory


Repeated exposure to an antigen throughout a lifetime Primary, secondary immune responses Remembers encounters
No need to start from scratch Memory cells

Continuous learning
2005-12-13 Y. Tan---Artificial Immune Sys. 17

Learning (2)
Primary Response Antibody Concentration Secondary Response Cross-Reactive Response

Lag Lag

Lag

Response to Ag1 Response to Ag1

...
Response to Ag2

Response to Ag1=Ag1 + Ag3

... ...
Antigen Ag1 Antigens Ag1, Ag2

...
Antigen Ag1 + Ag3 Time

2005-12-13

Y. Tan---Artificial Immune Sys.

18

Back

Immune System: Summary

Define host (body cells) from external entities. When an entity is recognized as foreign (or dangerous)- activate several defense mechanisms leading to its destruction (or neutralization). Subsequent exposure to similar entity results in rapid immune response. Overall behavior of the immune system is an emergent property of many local interactions.

2005-12-13

Y. Tan---Artificial Immune Sys.

19

Back

Immune metaphors
Other areas Idea! Idea

Immune System Artificial Immune Systems


2005-12-13 Y. Tan---Artificial Immune Sys. 20

What is an Artificial Immune System? Definition


Dasgupta99: Artificial immune systems (AIS) are intelligent and adaptive systems inspired by the immune system toward real-world problem solving
de Castro and Timmis: Artificial Immune Systems (AIS) are adaptive systems, inspired by theoretical immunology and observed immune functions, principles and models, which are applied to problem solving http://www.cs.kent.ac.uk/people/staff/jt6/aisbook/
Using natural immune system as a metaphor for solving complex computational problems. Not modelling the immune system
2005-12-13 Y. Tan---Artificial Immune Sys. 21

AI models and their corresponding natural prototypes


Natural prototype Biological level Natural language Brain nervous net Biological cells Molecules of proteins Genetic code
2005-12-13

AI model Formal logic Formal linguistic Neural computing (NC) Neural networks (NN) Cellular automata (CA) Artificial immune systems (AIS) Genetic Algorithms (GA)
22

Left hemisphere of brain Cells Cells Molecular Molecular

Y. Tan---Artificial Immune Sys.

Some History
Developed from the field of theoretical immunology in the mid 1980s.
Suggested we might look at the IS

1990 Bersini first use of immune algorithms to solve problems Forrest et al Computer Security mid 1990s Hunt et al, mid 1990s Machine learning More
2005-12-13 Y. Tan---Artificial Immune Sys. 23

AIS Scope
Pattern recognition; Fault and anomaly detection; Data analysis; Data mining (classification/clustering) Agent-based systems; Scheduling; Machine-learning; Autonomous navigation and control; Search and optimization methods; Artificial life; Security of information systems; Optimization; Just to name a few.
Y. Tan---Artificial Immune Sys. 24

2005-12-13

Back

Typical Applications of AIS

Computer Security(Forrest949698, Kephart94, Lamont9801,02, Dasgupta9901, Bentley0001,02) Anomaly Detection (Dasgupta960102) Fault Diagnosis (Ishida9293, Ishiguro94) Data Mining & Retrieval (Hunt9596, Timmis9901, 02) Pattern Recognition (Forrest93, Gibert94, de Castro 02) Adaptive Control (Bersini91) Job shop Scheduling (Hart98, 01, 02) Chemical Pattern Recognition (Dasgupta99) Robotics (Ishiguro9697,Singh01) Optimization (DeCastro99,Endo98, de Castro 02) Web Mining (Nasaroui02,Secker05) Fault Tolerance (Tyrrell, 01, 02, Timmis 02) Autonomous Systems (Varela92,Ishiguro96) Engineering Design Optimization (Hajela96 98, Nunes00)
2005-12-13 Y. Tan---Artificial Immune Sys. 25

Basic Immune Models and Algorithms


Bone Marrow Models Negative Selection Algorithms Clonal Selection Algorithm Immune Network Models Somatic Hypermutation

2005-12-13

Y. Tan---Artificial Immune Sys.

26

Bone Marrow Models


Gene libraries are used to create antibodies from the bone marrow Antibody production through a random concatenation from gene libraries Simple or complex libraries
An individual genome corresponds to four libraries: Library 1 A1 A2 A3 A4 A5 A6 A7 A8 A3 Library 2 B1 B2 B3 B4 B5 B6 B7 B8 B2 Library 3 C1 C2 C3 C4 C5 C6 C7 C8 C8 Library 4 D1 D2 D3 D4 D5 D6 D7 D8 D5

A3

B2

C8

D5

= four 16 bit segments = a 64 bit chain

A3 B2 C8 D5 Expressed Ab molecule

2005-12-13

Y. Tan---Artificial Immune Sys.

27

Negative Selection (NS) Algorithms


Forrest 1994: Idea taken from the negative selection of T-cells in the thymus Applied initially to computer security Split into two parts:
Censoring Monitoring
Self strings (S)
D e te c to r S e t (R )

Generate random strings (R0)

Match Yes Reject

No

Detector Set (R)

P ro te c te d S trin g s (S )

M a tc h Yes N o n -s e lf D e te c te d

No

Censoring 2005-12-13 Y. Tan---Artificial Immune Sys.

Monitoring 28

Clonal Selection Algorithm (de Castro & von Zuben, 2001)


1. Initialisation: Randomly initialise a population (P) 2. Antigenic Presentation: for each pattern in Ag, do:

2.1 Antigenic binding: determine affinity to each P 2.2 Affinity maturation: select n highest affinity from P and clone and mutate prop. to affinity with Ag, then add new mutants to P
3. Metadynamics: 3.1 select highest affinity P to form part of M 3.2 replace n number of random new ones 4. Cycle: repeat 2 and 3 until stopping criteria (e.g. Max Generation)
2005-12-13 Y. Tan---Artificial Immune Sys. 29

CLONALG for PR, Learning, Optimization

Agj Ab{d} Abj* Ab {r} Ab {m} fj Select Select Fj* Ab {n} Cj* Clone

L.N. de Castro, et.al., Learning and optimization using the clonal selection principle, IEEE Trans. Evolutionary computation, vol.6, no.3, June 2002, pp.239251

Select
Y. Tan---Artificial Immune Sys.

Cj
30

2005-12-13

Discrete Immune Network Models (Timmis & Neal, 2001)


Initialisation: create an initial network from a sub-section of the antigens Antigenic presentation: for each antigenic pattern, do: 2.1 Clonal selection and network interactions: for each network cell, determine its stimulation level (based on antigenic and network interaction) 2.2 Metadynamics: eliminate network cells with a low stimulation 2.3 Clonal Expansion: select the most stimulated network cells and reproduce them proportionally to their stimulation 2.4 Somatic hypermutation: mutate each clone 2.5 Network construction: select mutated clones and integrate 3. Cycle: Repeat step 2 until termination condition is met 1. 2.

2005-12-13

Y. Tan---Artificial Immune Sys.

31

Immune Network Models


Timmis & Neal, 2000 Used immune network theory as a basis, proposed the AINE algorithm
Initialize AIN For each antigen Present antigen to each ARB in the AIN Calculate ARB stimulation level Allocate B cells to ARBs, based on stimulation level Remove weakest ARBs (ones that do not hold any B cells) If termination condition met exit else Clone and mutate remaining ARBs Integrate new ARBs into AIN
2005-12-13 Y. Tan---Artificial Immune Sys. 32

Immune Network Models


De Castro & Von Zuben (2000c) aiNET, based in similar principles
At each iteration step do For each antigen do Determine affinity to all network cells Select n highest affinity network cells Clone these n selected cells Increase the affinity of the cells to antigen by reducing the distance between them (greedy search) Calculate improved affinity of these n cells Re-select a number of improved cells and place into matrix M Remove cells from M whose affinity is below a set threshold Calculate cell-cell affinity within the network Remove cells from network whose affinity is below a certain threshold Concatenate original network and M to form new network Determine whole network inter-cell affinities and remove all those below the set threshold Replace r% of worst individuals by novel randomly generated ones Test stopping criterion 2005-12-13 Y. Tan---Artificial Immune Sys. 33

Back

Somatic Hypermutation

Mutation rate in proportion to affinity Very controlled mutation in the natural immune system Trade-off between the normalized antibody affinity D* and its mutation rate ,
1 0 .9 0 .8 0 .7 0 .6

0 .5 0 .4 0 .3 0 .2 0 .1 0 0 0 .1 0 .2 0 .3 0 .4 0 .5 0 .6 0 .7 0 .8 0 .9 1

1 0 = 2 0

2005-12-13

Y. Tan---Artificial Immune Sys.

D *

34

General Framework of AIS


Solution Immune Algorithms Affinity Measures Representation

Problem
2005-12-13

Application Domain
Y. Tan---Artificial Immune Sys. 35

Representation Shape Space


Describe the general shape of a molecule
A n t ig e n

A n t ib o d y

Describe interactions between molecules Degree of binding between molecules

2005-12-13

Y. Tan---Artificial Immune Sys.

36

Representation
Vectors Ab = Ab1, Ab2, ..., AbL Ag = Ag1, Ag2, ..., AgL Real-valued shape-space Integer shape-space Binary shape-space Symbolic shape-space
Y. Tan---Artificial Immune Sys. 37

2005-12-13

Define their Interaction


Define the term Affinity Affinity is related to distance
Euclidian

D=

2 ( Ab Ag ) i i i =1

Other distance measures such as Hamming, Manhattan etc. etc. Affinity Threshold

2005-12-13

Y. Tan---Artificial Immune Sys.

38

Shape Space Formalism


Repertoire of the immune system is complete (Perelson,
1989)
V

Extensive regions of complementarity Some threshold of recognition


2005-12-13 Y. Tan---Artificial Immune Sys.

39

Back

AIS Design

Problem description Deciding the immune principles used for problem solving Engineering the AIS
Defining the types of immune components used Defining the representation for the elements of the AIS Applying immune principle to problem solving The meta-dynamics of an AIS

Reverse mapping from AIS to the real problem


2005-12-13 Y. Tan---Artificial Immune Sys. 40

Back

Case Studies of AIS


From Z.H. Guo, Z.K. Liu, and Y. Tan, An NNbased Malicious Executables Detection Algorithm based on Immune Principles, F.Yin, J.Wang, C. Guo (Eds.): ISNN 2004, Springer, Lecture Notes in Computer Science 3174, pp. 675-680, 2004. (http://dblp.uni-trier.de)

Malicious Executables Detection ---

Film Recommender --- From Dr. Dr Uwe


Aickelin (http://www.aickelin.com), University of Nottingham, U.K. 2004
2005-12-13 Y. Tan---Artificial Immune Sys. 41

New!

Immuneocomputing -- IC

By Tarakanov, A. 2001. Aims of A proper mathematical framework; A new kind of computing; A new kind of hardware. New concepts of formal protein (FP) ------- vs. neuron formal immune networks (FIN)------- vs. NN
Refer to
2005-12-13

A.O. Tarakanov, V.A. skormin, and S.P. Sokolova, Immunocomputing: Principles and Applications, Springer, 2003.
Y. Tan---Artificial Immune Sys. 42

Problems of Traditional Self/Non-self View


No reaction to foreign bacteria in gut (friendly bacteria). No reaction to food / air / etc. The human body changes over its life. Auto-immune diseases. How do we produce antibodies that react against antigens and yet avoid self? Is it necessary to attack all non-self or a specific self?

2005-12-13

Y. Tan---Artificial Immune Sys.

43

New!

The Danger Theory

In the danger model, the idea is to recognise danger rather than non self. The screening is accomplished post production through an external danger signal. Thus the production of autoreactive antibodies (which react to self) is allowed. If an (e.g. autoreactive) antibody matches a stimulus in the absence of danger, it is removed. Thus harmless antigens are tolerated, and changing self accommodated.
Matzinger (2002). The Danger Model: A renewed sense of self , Science 296: 301-304.
2005-12-13 Y. Tan---Artificial Immune Sys. 44

Danger Theory (cont)


Danger Theory
Not self/non-self but Danger/Non-Danger Immune response is initiated in the tissues. Danger Zone. This makes it context dependant
Matzinger (2002) The Danger Model: A renewed sense of self Science 296: 301-304 Aickelin & Cayzer (2002) The Danger Theory and Its Application to Artificial Immune Systems, Proc. International Conference on AIS (ICARIS 2002)
2005-12-13 Y. Tan---Artificial Immune Sys. 45

Danger Zone
Stimulation Danger Zone Antibodies Antigens Cells Damaged Cell Danger Signal
2005-12-13 Y. Tan---Artificial Immune Sys. 46

Match, but too far No match away

Towards a dangerous IDS


The danger theory suggests that the immune system reacts to threats based on the correlation of various (danger) signals, providing a method of grounding the immune response, i.e. linking it directly to the attacker.
Aickelin U, Bentley P, Cayzer S, Kim J and McLeod J (2003): 'Danger Theory: The Link between AIS and IDS?', Proceedings ICARIS-2003, 2nd International Conference on Artificial Immune Systems, LNCS 2787, pp 147-155
2005-12-13 Y. Tan---Artificial Immune Sys. 47

Other ways of using danger


Danger = Crime, Antigen = Suspect
or...

Danger = Context ?
It could also be useful for data mining, where the danger signal is a proxy measure of interest Danger Zone can be spatial or temporal
Andrew Secker, Alex Freitas, and Jon Timmis (2005) Towards a danger theory inspired artificial immune system for web mining in A Scime, editor, Web Mining: applications and techniques, pages 145-168 (Idea Group)
2005-12-13 Y. Tan---Artificial Immune Sys. 48

Back

Some Recent Applications of Danger Theory

Anjum Iqbal, Mohd Aizaini Maarof, Danger Theory and Intelligent Data Processing, International Journal of Information Technology, Vol.1, No.1, 2004. Andrew Secker, Alex A. Freitas, and Jon Timmis, A Danger Thory Inspired Approach to Web Mining, Computing Lab. University of Kent, Canterbury, Kent, UK.2005 So on.
2005-12-13 Y. Tan---Artificial Immune Sys. 49

The Future
More formal approach required? Wide possible application domains. What makes the immune system unique? More work with immunologists:
Danger theory. Idiotypic Networks. Self-Assertion.
2005-12-13 Y. Tan---Artificial Immune Sys. 50

Reference for further reading


Books Artificial Immune Systems and Their Applications by Dipankar Dasgupta (Editor) Springer Verlag, January 1999. L.N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach, Springer, 2002. A.O. Tarakanov, V.A. skormin, and S.P. Sokolova, Immunocomputing: Principles and Applications, Springer, 2003.
Related academic papers J. Timmis, P.Bentley, and Emma Hart (Eds.): Artificial Immune Systems, Proceedings of Second International Conference, ICARIS 2003, Edinburgh, UK, September 2003. LNCS 2787, Springer.
2005-12-13 Y. Tan---Artificial Immune Sys. 51

New Events:
Special Session on Artificial Immune Systems at the Congress on Evolutionary Computation (CEC), December 8-12, 2003, Canberra, Australia. Special Session on Immunity-Based Systems at Seventh International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES), September 3-5, 2003, University of Oxford, UK. Second International Conference on Artificial Immune Systems (ICARIS), September 1-3, 2003, Napier University, Edinburgh, UK. Tutorial on Artificial Immune Systems at 1st Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA), 12 August 2003, The University of Nottingham, UK. Tutorial on Immunological Computation at International Joint Conference on Artificial Intelligence (IJCAI), August 10, 2003, Acapulco, Mexico. Special Track on Artificial Immune Systems at Genetic and Evolutionary Computation Conference (GECCO), Chicago, USA, July 12-16, 2003
2005-12-13 Y. Tan---Artificial Immune Sys. 52

AIS Resources
Artificial Immune Systems and Their Applications by D Dasgupta (Editor), Springer Verlag, 1999. Artificial Immune Systems: A New Computational Intelligence Approach by L de Castro, J Timmis, Springer Verlag, 2002. Immunocomputing: Principles and Applications by A Tarakanov et al, Springer Verlag, 2003. Third International Conference on Artificial Immune Systems (ICARIS), September 13-16, 2004, University of Catania, Italy. 4th International Conference on Artificial Immune Systems(ICARIS), 14th-17th August, 2005 in Banff, Alberta, Canada

2005-12-13

Y. Tan---Artificial Immune Sys.

53

First Page

Thats all

2005-12-13

Y. Tan---Artificial Immune Sys.

54

Case Study 1:

Malicious Executables Detection based on Artificial Immune Principles*


From Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based Malicious Executables Detection Algorithm based on Immune Principles, F.Yin, J.Wang, C. Guo (Eds.): ISNN 2004, Springer Lecture Notes on Computer Science 3174, pp. 675-680, 2004. (http://dblp.uni-

trier.de)

* This work was supported by Natural Science Foundation


of China with Grant No. 60273100.
2005-12-13 Y. Tan---Artificial Immune Sys. 55

Outline
Definition of Terms Goal and Motivation Previous Research works Immune Principle for Malicious Executable Detection Malicious Executable Detection Algorithm Experiments and Discussion Concluding Remarks
2005-12-13 Y. Tan---Artificial Immune Sys. 56

Back

Definition of Terms
Malicious Executable is generally defined as a program that has some malicious functions, such as compromising a systems security, damaging a system or obtaining sensitive information without the permission of users. It includes virus, trojan horse, worm etc. Benign Executable is a normal program without any malicious function.

2005-12-13

Y. Tan---Artificial Immune Sys.

57

tens of thousands of new viruses / year Appear!

But: Current antivirus systems

attempt to detect these new malicious programs with heuristics by hand (costly and ineffective)

Dos/Win32 viruses

Trojan horses

Computers / Information Systems

Worms Current Task: Devise new methods for detecting new ME


58

eMail attached viruses

Malicious executables
2005-12-13 Y. Tan---Artificial Immune Sys.

Back

Definition of Symbols and Structures


B: binary code alphabet, B={0,1}. Seq(s,k,l): short sequence cutting operation. Supposing s is binary sequence, and sb(0)b(1)b(n-1), b(i)B, then Seq(s,k,l)=b(k)b(k+1)b(k+l-1). E(k): executable set, k{m,b} m denotes malicious executable, b benign executable. E: whole set of executables, i.e., E= E(m)E(b). e(fj,n): executable as binary sequence of length n, and fj is executable identifier. ld: detector code length. lstep: step size of detector generation. dl: detector, dl = Seq(s,k,l). Dl: set of detector with code length l, i.e., Dl ={ dl (0), dl (1),, dl (nd-1)}, |Dl|= nd.

2005-12-13

Y. Tan---Artificial Immune Sys.

59

Back

Goal and Motivation

Aiming at developing an automatic detection approach of new malicious executables. Aiming at trying to use artificial immune system (AIS) and artificial neural networks (ANN), to detect malicious executable with a high Detection Rate (DR) with low False Positive Rate (FPR) over others.
2005-12-13 Y. Tan---Artificial Immune Sys. 60

Back

Previous Related Works


Signature-based Methods Expert Knowledge-based Methods Machine Learning Methods

2005-12-13

Y. Tan---Artificial Immune Sys.

61

Back

Signature-based Methods
It creates a unique tag for each malicious program so that future examples of it can be correctly classified with a small error rate. And relies on signatures of known malicious executable to generate detection models. Drawbacks: Can not detect unknown and mutated viruses. As increase of the number and type of viruses, its detection speed become slow dramatically. At the same time, the analysis of the signatures of viruses become very difficult, in particular, for the encrypted signatures. (refer to IBM Anti-virus Groups report: R.W. Lo, K.N. Levitt, and R.A. Olsson. MCF: a Malicious Code Filter. Computers & Security, 14(6):541566., 1995.)
2005-12-13 Y. Tan---Artificial Immune Sys. 62

Back

Expert Knowledge-based Methods


Using the knowledge of a group of virus experts to construct heuristic classifiers for detection of unknown viruses.

Drawbacks: Time-consuming analysis method. Only discover some unknown viruses, but its false detection rate is very high.
For detecting unknown virus based on ANN, IBM Anti-virus Group also proposes one method to detect Boot Sector viruses only.
(refer to W. Arnold and G. Tesauro. Automatically Generated Win32 Heuristic Virus Detection. Proceedings of the 2000 International Virus Bulletin Conference, 2000.)
2005-12-13 Y. Tan---Artificial Immune Sys. 63

Back

Machine Learning Methods

M.G. Schultz developed a framework that used data mining algorithms, i.e., Multi-Nave Bayes method, to train multiple classifiers on a set of malicious and benign executables to detect new examples (unknown ME).
(refer to M.G. Schultz.,E. Eskin and E. Zadok . Data Mining Methods for Detection of New Malicious Executables. IEEE Symposium on Security and Privacy, May 2001.)

2005-12-13

Y. Tan---Artificial Immune Sys.

64

Biologically-motivated Information Processing Systems


Brain-nervous systems Neural Networks (NN) Genetic systems Genetic Algorithms(GA) Immune systems Artificial Immune Systems(AIS) or immunological computation. NN and GA have extensively studied with wide applications but AIS has relative few applications

2005-12-13

Y. Tan---Artificial Immune Sys.

65

Natural prototypes vs. their models


Natural Biological prototype level Natural language Left hemisphere of brain Brain nervous Cells net Biological cells Cells Molecules of proteins Genetic code
2005-12-13

Computing model Formal logic Formal linguistic Artificial Neural networks (ANN) Cellular automata (CA) Artificial immune systems (AIS) Genetic Algorithms (GA)
66

Molecular Molecular

Y. Tan---Artificial Immune Sys.

Comparison of Three Algorithms


GA (Optimisation) Components Location of Components Structure Chromosome Strings Dynamic Discrete Components Chromosome Strings Evolution Recruitment / Elimination of Components Crossover Fitness Function NN (Classification) Artificial Neurons Pre-Defined Networked Components Connection Strengths Learning Construction / Pruning of Connections Network Connections External Stimuli AIS Attribute Strings Dynamic Discrete components / Networked Components Component Concentration / Network Connections Evolution / Learning Recruitment / Elimination of Components Recognition / Network Connections Recognition / Objective Function

Knowledge Storage

Dynamics Meta-Dynamics Interaction between Components Interaction with Environment

2005-12-13

Y. Tan---Artificial Immune Sys.

67

Back

Immune Principles for Malicious Executable Detection


Non-self Detection Principle Anomaly Detection Based on Thickness The Diversity of Detector Representation vs. Anomaly Detection Hole
2005-12-13 Y. Tan---Artificial Immune Sys. 68

Non-self Detection Principle


For natural immune system, all cells of body are categorized as two types of self and non-self. The immune process is to detect non-self from cells. To realize the non-self detection, the maturation process of lymphocytes T cell undergoes two selection stages of Positive Selection and Negative Selection since antigenic encounters may result in cell death. Some computer scientists inspired by these two stages had proposed some algorithms used to detect anomaly information. Here, we will use the Positive Selection Algorithm (PSA) to perform the non-self detection for recognizing the malicious executable.
2005-12-13 Y. Tan---Artificial Immune Sys. 69

Back

Non-self Detection by PSA


Detector Set Dl N Match ?

Short sequence to be detected Its length is l

Y
self non-self

Process of anomaly detection with PSA

2005-12-13

Y. Tan---Artificial Immune Sys.

70

Back

Anomaly Detection Based on Thickness


Anomaly recognition process is one process that immune cells detect antigens and are activated. The activated threshold of immune cells is decided by the thickness of immune cells matching antigens.

2005-12-13

Y. Tan---Artificial Immune Sys.

71

The Diversity of Detector Representation vs. Anomaly Detection Hole


The main difficulty of anomaly detection is utmost decreasing the anomaly detection hole. The natural immune system resolves this problem well by use of the diversity of MHC (Major Histocompatibility Complex) cell representations, which decides the diversity of anti-body touched in surface of T cells. This property is very useful in increasing the power of detecting mutated antigens, and decreasing the anomaly detection hole. According to the principle, we can use the diversity of detector representation to decrease the anomaly detection hole. As was illustrated by following schematic drawings.
2005-12-13 Y. Tan---Artificial Immune Sys. 72

Schematic diagram of abnormal detection holes (cont)


Self Space Abnormal detection holes Nonself Space

Detectors

2005-12-13

Y. Tan---Artificial Immune Sys.

73

back

Reduction of abnormal detection holes by use of the diversity of detector representations

Detector Representation 1

Detector Representation 2

Detector Representation 3

Combination of detectors
2005-12-13 Y. Tan---Artificial Immune Sys. 74

Malicious Executable Detection Algorithm (MEDA)


MEDA based on AIS includes three parts, Detector generation, Anomaly information extraction , and Classification.
2005-12-13 Y. Tan---Artificial Immune Sys. 75

Back

Flow Chart of Malicious Executable Detection Algorithm (MEDA)


Gene (01101001) Generating detector set MEDA

Extracting property

anomaly

Classifier

Update Gene (10101101)

Executable to be detected (00111101)

Output

2005-12-13

Y. Tan---Artificial Immune Sys.

76

Generation of Detector Set


Detector generation algorithm:
Begin initialize lstepldk=0 Do cutting e(fk,n) from Eg(b) i=0 While i <= n-ld-1 do Begin d = Seq(e(fk,n),i, ld) if d Dld then Dldd i=i+lstep End k=k+1 Until Eg(b) is empty Return Dld End
2005-12-13 Y. Tan---Artificial Immune Sys. 77

Back

Illustration of Detector Generating Process

File Hex Sequence: 56 32 12 0A 34 ED FF 00 2D. . 00 0A 34 ED FF FA 11 00 Extracting Detector: 56 32 12 32 12 0A 12 0A 34 FF FA 11 FA 11 00

Generating Process of 24-bit Detectors with 8-bit stepsize (ld=24, lstep=8)


2005-12-13 Y. Tan---Artificial Immune Sys. 78

Extraction of Anomaly Characteristics -Non-self Thickness (NST) Non-self Detection NST, as Anomaly Property, is defined as the ratio of number of non-self units to file binary sequence, pl=nn/(nn+ns). If there are m kinds of detectors, the file has a NST Vector P=(pl1, pl2, , plm)T.

2005-12-13

Y. Tan---Artificial Immune Sys.

79

NST Extraction Diagram


Initializationchoose lstepld , Dl Nonself Detection File to be detected 00111101 Y

Is Nonself ? N ns add 1

nn add 1

Completing Y detection Compute pl=nn/(nn+ns)

End
2005-12-13 Y. Tan---Artificial Immune Sys. 80

Back

NST Extraction Algorithm


Begin open e(fk,n); Select lstep, ld; Set ns=0, nn=0, i=0; While i <= n-ld-1 do Begin s = Seq(e(fk,n),i, ld) if s Dld then nn = nn+1 else ns = ns + 1 i = i + lstep End pld = nn / ( nsnn ) Return pld End
2005-12-13 Y. Tan---Artificial Immune Sys. 81

BP Network Classifier
We use Anomaly Property Vector (APV), i.e., NST vector P, as input variable of two-layer BP network classifier. The number of nodes of input layer equals to APVs dimension. The Sigmoid transfer function is chosen for the hidden layer and Linear function for the output layer.
2005-12-13 Y. Tan---Artificial Immune Sys. 82

Back

BP Network Classifier Structure


Non-Self Thickness (NST) Vector

pl1 pl2

P
plm

Out (1-ME, 0-BE)

2005-12-13

Y. Tan---Artificial Immune Sys.

83

Back

Experiments and Discussion


Experimental Data Set Generation of Detector Set Experimental Result Using Single Detector Set Experimental Result Using Multi-Detector Set

2005-12-13

Y. Tan---Artificial Immune Sys.

84

Back

Experimental Data Set


Type BE ME Total Files 915 Remarks
Win 2K OS and some application programs. Worm, etc. from Internet. All Justified by Antivirus cleaner Tools

3566 DOS virus, Win32 virus, Trojan, 4481

BEBenign Executable MEMalicious Executable


2005-12-13 Y. Tan---Artificial Immune Sys. 85

Back

Generation of Detector Set

Eg(b) is Gene of generating detector, ld {16 24326496}, and lstep=8bits. By using the detector generating algorithm, we can get D16, D24, D32, D64, and D96, separately.
Table1: Detectors generation result
Code Length ld |Dld| store structure
2005-12-13

16 65536

24

32

64

96

10,931,62 8,938,35 7 2 Tree

12,768,36 21,294,85 1 7 Tree Tree

Bitmap Bitmap Index Index

Y. Tan---Artificial Immune Sys.

86

Detection Result of Malicious Executables by D24


(Detection Rate)%
P24

NST p24 File No. (a) NST of files, where symbol x represents benign program (Red), malicious program (Blue)
2005-12-13

(False Positive Rate)%

(b) ROC Curve

Y. Tan---Artificial Immune Sys.

87

Detection Result of Malicious Executables by D32


(Detection Rate)%

P32

NST p32

(a) NST of files, where symbol x represents benign program, malicious program
2005-12-13 Y. Tan---Artificial Immune Sys.

(False Positive Rate)%

(b) ROC Curve

88

Detection Result of Malicious Executables by D64


(Detection Rate)%

P64

2005-12-13

NST p64

(a) NST of files, where symbol x represents benign program (Red), malicious program (Blue)

(False Positive Rate)%

(b) ROC Curve

Y. Tan---Artificial Immune Sys.

89

Experimental Result Using Single Detector Set


100 80 60 40 20 0 0 20 40 60 80 100 1 6 b its 2 4 b its 3 2 b its 6 4 b its 9 6 b its D a ta D a ta D a ta D a ta D a ta Set Set Set Set Set

Detection Rate (%)

False Positive Rate (%)


2005-12-13 Y. Tan---Artificial Immune Sys. 90

Back

When FPR is fixed, relationship curves of DR versus Code Length ld


Detection Rate (%)

C o d e le n g th l d b its

2005-12-13

Note: from the bottom to up, the FPR Y. Tan---Artificial Immune Sys. is 0%, 0.5%, 1%, 2%, 4%, 8%, and 16%, in sequence.

91

Experimental Result Using MultiDetector Set


This experiment selects multi-detector set to detect benign and malicious executables. We dont use D16 because of its zero DR and also set D96 as upper limit because almost same DR values when ld 96. Here we selects D24, D32, D64 and D96 four detector sets as anomaly detection data set, and uses them to extract Non-self thickness (NST) vector, and finally a BP network is exploited as classifier. For the process of classification, we randomly selects 30% files of E(b) as Eg(b) to train a BP network, and use the remaining data to illustrate the anomaly detection performance.
2005-12-13 Y. Tan---Artificial Immune Sys. 92

NST Distribution and ROC Curve of Multi-Detector Set Method


64bits

24

bit

32bits

Detection Rate (%)

False Positive Rate (%)

(a) NST of files for mixture of D24, D32 and D64. x benign program (in Red), malicious program (in Blue).
2005-12-13

(b) ROC Curve of mixed detector set of D24, D32, D64 and D96
93

Y. Tan---Artificial Immune Sys.

Comparisons With Bayes Methods and Signature-based Method


100 80

Detection Rate (%)

60 40 20 0 0 2

M E D A w ith B P N e tw o r k N a ive B a ye s w ith S tr in g s M u lti- N a ive B a ye s w ith B yte s S ig n a tu r e M e th o d

10

12

False Positive Rate (%)


2005-12-13 Y. Tan---Artificial Immune Sys. 94

Back

Algorithm Complexities
Operation type 1 Algorithm Name detectors Prob. Info. Amount ltrain >>ltrain Operation type 2 Name detector matching Searching P(Fi/C) Amount 80ltes
t

Operation type 3 Name Computing NST Amount 4lf additions

Store Space

MEDA Bayes

0.4Gb 1Gb

Depend Computing lf float on P(Fi/C) Joint Probs. multiplicaP (C ) P ( F / C ) tions


n i =1 i

2005-12-13

Y. Tan---Artificial Immune Sys.

95

Remarks

Back

For short binary sequence and single detector set for the detection of malicious executables, the performance of D24 is the best, giving out DR 80.6% with FPR 3%. For long code length of detector and multidetector set, our method obtains the best performance of DR 97.46% with FPR 2%, over current methods. This result verifies
diversity of detector representation can decrease anomaly detection holes. non-self thickness detection.
2005-12-13 Y. Tan---Artificial Immune Sys.

Back

96

Case Study 2:

Film Recommender
From Dr. Dr Uwe Aickelin (http://www.aickelin.com) University of Nottingham, U.K.,
z

Prediction:
What rating would I give a specific film?

Recommendation:
Give me a top 10 list of films I might like.

2005-12-13

Y. Tan---Artificial Immune Sys.

97

Film Recommender (cont 1)


EachMovie database (70k users). User Profile: set of tuples {movie, rating}. Me: My user profile. Neighbour: User profile of others. Similarity metric: Correlation score. Neighbourhood: Group of similar users. Recommendations: From neighbourhood.
Y. Tan---Artificial Immune Sys. 98

2005-12-13

Film Recommender (cont 2)


Antigen Antibody

at io

User Profile: set of tuples {movie, rating} Me: My user profile. Neighbour: User profile of others. Affinity metric: Correlation score.
Su pp re ss io

ul im

Antibody Antigen Binding Antibody Antibody Binding

St

Neighbourhood: Group of similar users.

Group of antibodies similar to antigen and dissimilar to other antibodies Weighted Score based on Similarities.
2005-12-13 Y. Tan---Artificial Immune Sys.

Recommendations: From neighbourhood


99

Film Recommender (cont 3)


Start with empty AIS. Encode target user as an antigen Ag. WHILE (AIS not full) && (More Users):
Add next user as antibody Ab. IF (AIS at full size) Iterate AIS.

Generate recommendations from AIS.

2005-12-13

Y. Tan---Artificial Immune Sys.

100

Film Recommender (cont 4)


Suppose we have 5 users and 4 movies:
u1={(m1,v11),(m2,v12),(m3,v13)}. u2={(m1,v21),(m2,v22),(m3,v23),(m4,v24)}. u3={(m1,v31),(m2,v32),(m4,v34)}. u4={(m1,v41),(m4,v44)}. u5={(m1,v51),(m2,v52),(m3,v53), (m4,v54)}.

We do not have users votes for every film. We want to predict the vote of user u4 on movie m3.
2005-12-13 Y. Tan---Artificial Immune Sys. 101

Algorithm walkthrough (1)


Start with empty AIS:
DATABASE u1, u2, u3, u4, u5 AIS

User for whom to predict becomes antigen:


DATABASE u1, u2, u3, u5
2005-12-13 Y. Tan---Artificial Immune Sys.

AIS

u4

Ag
102

Algorithm walkthrough (2)


Add antibodies until AIS is full
AIS DATABASE u2, u3, u5 u1 Ag Ab1 AIS DATABASE u4
2005-12-13 Y. Tan---Artificial Immune Sys.

u2,u3 Ab1

Ag Ab2 Ab3
103

Algorithm walkthrough (3)


Table of Correlation between Ab and Ag:
Ab3 Ab1 Ag Ab2

MS14, MS24, MS34.

Table of Correlation between Antibodies:


MS12 = CorrelCoef(Ab1, Ab2) MS13 = CorrelCoef(Ab1, Ab3) MS23 = CorrelCoef(Ab2, Ab3)
Y. Tan---Artificial Immune Sys. 104

2005-12-13

Algorithm walkthrough (4)


Calculate Concentration of each Ab:
Interaction with Ag (Stimulation). Interaction with other Ab (Suppression).
AIS Ag Ab1 Ab2 Ab3 Ab1 Ag Ab2 AIS

Ab2 Ab 2 Ab1 Ab2 Ab2

2005-12-13

Y. Tan---Artificial Immune Sys.

105

Algorithm walkthrough (5)


Generate Recommendation based on Antibody Concentration.
AIS Ag Ab1 Ab2 Ab2 Ab 2 Ab1 Ab2 Ab2

Recommendation for user u4 on movie m3 will be highly based on vote on m3 of user u2

2005-12-13

Y. Tan---Artificial Immune Sys.

106

Film Recommender Results


Tested against standard method (Pearson k-nearest neighbours). Prediction:
Results of same quality.

Recommendation:
4 out of 5 films correct (AIS). 3 out of 5 films correct (Pearson). Back
2005-12-13 Y. Tan---Artificial Immune Sys. 107