Summer 2011 Company Meeting Narus

Published by: StopSpyingOnMe on Aug 21, 2013
Copyright: Attribution Non-commercial

09/15/2013

Narus Company Confidential 1

Summer 2011 Company Meeting
CyberEagle: Automated Discovery, Attribution, Analysis and Risk
Assessment of Information Security Threats
Saby Saha, Narus
Lei Liu, Michigan State University
Prakash Mandayam, Michigan State University
CyberEagle
• Motivation and Challenges
• Project Layout
• Architecture
• Statistical Machine Learning / Data Mining
• Results
• Conclusion & Future Work
Increasing Security Threats
• Continuous and increased attacks on infrastructure
• Threats to business, national security
• Huge financial stakes (Conficker: 10 million machines infected, $9.1 billion in losses)
• Attacks are becoming more advanced and sophisticated
• Honeypots, IDS/IPS, Email/IP Reputation Systems are
inadequate
Zeus: 3.6 million machines [HTML injection]
Koobface: 2.9 million machines [social networking sites]
TidServ: 1.5 million machines [email spam attachments]

More Sophisticated Attacks
Host Based Security
• Complete monitoring of end-host behavior and system state
• Often analyzes a malware program in a controlled environment to build a model of its behavior
• Pros
– Information-rich view: high detection rate with a low false-positive rate
– Can reverse-engineer the properties of the threat
• Cons
– After-the-fact approach
• Requires malicious code for analysis
– Fails to identify evolved threats
– Not effective at identifying zero-day threats


Network Security
• Firewall systems
• IDS/IPS
• Network behavior anomaly detection (NBAD)
• Pros:
– Complete macro view of the network
– With knowledge of good traffic, it can identify anomalies
– Able to identify new threats as anomalies
• Cons
– Generates a large number of false positives
– Unsupervised approach that lacks ground truth
Bringing Them Together
• Leverage the advantages of both approaches
• Host security tags flows with threat signatures
– Generates ground truth associated with the flows
• Network security can learn a rich statistical model for all threats using the flow data tagged with ground truth
• Develop a comprehensive end-to-end data
security system for real-time discovery, analysis,
and risk assessment of security threats
Enhanced Comprehensive Security System
• Discover common and persistent behavioral patterns for all security threats
– Even when sessions are encrypted (where IDS/IPS fails)
• Generate precise threat alerts in real time
– Reduce the false-positive rate
• Identify new threats that share similarities with previous ones
– Newly evolved version of a threat
– New threat with a similar behavioral pattern
• Inform host security about newly identified threats
System Overview
Model Generation
• Extract a set of transport-layer features
• Generate statistical models
Classification
• Flush out the model to the streaming classification path
• Redirect packets matching the model to the binary analysis module
Validation
Assessment
• Extract the executable and execute it
• Analyze the information touched
• Assess the risk
• Increase confidence and alert
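The slides do not list which transport-layer features are extracted during model generation. As a minimal sketch, assuming a flow arrives as a list of packet records (arrival time, payload size) and using a few common per-flow statistics that are our own illustrative choice (packet count, total bytes, mean packet size, duration, mean inter-arrival time), feature extraction could look like this. The `Packet` record and `flow_features` function are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    """Hypothetical packet record: arrival time in seconds, payload size in bytes."""
    timestamp: float
    size: int


def flow_features(packets: list[Packet]) -> dict[str, float]:
    """Illustrative per-flow statistics (not the feature set used by CyberEagle)."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    pkts = sorted(packets, key=lambda p: p.timestamp)
    sizes = [p.size for p in pkts]
    gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    return {
        "packet_count": float(len(pkts)),
        "total_bytes": float(sum(sizes)),
        "mean_packet_size": float(mean(sizes)),
        "duration": pkts[-1].timestamp - pkts[0].timestamp,
        # A single-packet flow has no inter-arrival gaps, so report 0.0 for it.
        "mean_inter_arrival": float(mean(gaps)) if gaps else 0.0,
    }
```

Each flow then becomes a fixed-length numeric vector that the statistical models can consume.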
Information Flow
Supervised Threat Classification
• Data
– Network flow features
• Kernel
– Define similarity between different flows
• Classifier
– Binary to separate good from bad
– Multiclass to further separate bad flows
• Scalability issues
– Hierarchy
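The binary-then-multiclass hierarchy can be sketched as a two-stage pipeline: the binary stage decides good vs. bad, and only flows judged bad reach the multiclass stage. As a stand-in for the SVM classifiers used in the deck, this sketch assumes a simple nearest-centroid rule for both stages; the class `TwoStageClassifier`, its helpers and the threat labels in the example are hypothetical.

```python
from math import dist


def centroids(X: list[list[float]], y: list[str]) -> dict[str, list[float]]:
    """Mean feature vector of each label."""
    groups: dict[str, list[list[float]]] = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def nearest(cents: dict[str, list[float]], x: list[float]) -> str:
    """Label whose centroid is closest to x in Euclidean distance."""
    return min(cents, key=lambda label: dist(cents[label], x))


class TwoStageClassifier:
    """Stage 1: good vs. bad. Stage 2: threat class, for bad flows only."""

    def fit(self, X: list[list[float]], y: list[str]) -> "TwoStageClassifier":
        # Stage 1 sees every flow, collapsed to a binary good/bad label.
        binary = ["good" if label == "good" else "bad" for label in y]
        self.stage1 = centroids(X, binary)
        # Stage 2 is trained only on the bad flows and their threat classes.
        bad = [(x, label) for x, label in zip(X, y) if label != "good"]
        self.stage2 = centroids([x for x, _ in bad], [label for _, label in bad])
        return self

    def predict(self, x: list[float]) -> str:
        if nearest(self.stage1, x) == "good":
            return "good"
        return nearest(self.stage2, x)
```

Because most traffic is good, the cheap first stage keeps the more expensive multiclass stage off the majority of flows, which is one reason a hierarchy helps with scalability.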
Challenges
• Irregular data
– Missing values
– Imbalanced data
– Heterogeneous data
– Non-applicable features
• Large number of classes (the number of threats reaches hundreds of thousands)
• New classes
• Noise in the data
• Not all threat classes may be captured
• Minimize false positives
Preprocessing
• Normalization
• Deal with missing values
– Case deletion
– Mean imputation
• Over all classes
• Within each individual class
– Median imputation
• Over all classes
• Within each individual class
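The missing-value strategies above differ only in which rows are dropped and which statistic fills a gap. A minimal sketch, assuming rows are lists of floats with `None` marking a missing value, and assuming min-max scaling for the normalization step (the slide does not say which normalization is used); the function names are hypothetical.

```python
from statistics import mean, median
from typing import Callable, Optional

Row = list[Optional[float]]


def case_deletion(X: list[Row], y: list[str]) -> tuple[list[Row], list[str]]:
    """Drop every row that has at least one missing value."""
    kept = [(row, label) for row, label in zip(X, y) if None not in row]
    return [row for row, _ in kept], [label for _, label in kept]


def impute(X: list[Row], y: list[str], stat: Callable = mean,
           per_class: bool = False) -> list[list[float]]:
    """Fill each gap with a column statistic over all classes or within the row's class."""
    def fill_value(col: int, label: str) -> float:
        # Raises StatisticsError if a class has no observed value in this column.
        values = [row[col] for row, lab in zip(X, y)
                  if row[col] is not None and (not per_class or lab == label)]
        return stat(values)

    return [[v if v is not None else fill_value(j, label) for j, v in enumerate(row)]
            for row, label in zip(X, y)]


def min_max_normalize(X: list[list[float]]) -> list[list[float]]:
    """Scale every column to [0, 1]; constant columns map to 0."""
    lows = [min(col) for col in zip(*X)]
    highs = [max(col) for col in zip(*X)]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in X]
```

Per-class imputation uses the labels, so it only applies to training data; at test time the class is unknown and the over-all-classes statistic has to be used.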


Classifier Framework

[Figure: Supervised classifier framework. Flows pass through SNORT, which splits them into Bad Flows (76 different classes, 13,935 flows; Class 1, Class 2, ..., Class 76, e.g. Shellcode, Spambot_Proxy_Control_Channel, Exploit_Suspected_PHP_Injection_Attack) and Unknown Flows (44,427 flows). A macro-level classifier separates Unknown from Bad flows; a micro-level classifier assigns bad flows to classes CL_A, CL_B, ..., CL_N. Both classifiers are built by learning/training.]
Binary Classifier Results
• Biased SVM performance comparison with different kernels
Metric (%)        Linear Kernel   RBF Kernel   Poly Kernel
Precision good    79.75           87.46        78.70
Recall good       87.07           90.42        97.79
F1 good           83.25           88.9347      87.2126
Precision bad     79.75           69.33        79.78
Recall bad        37.17           62.55        24.81
F1 bad            42.74           65.7657      34.8495
Accuracy          74.08           83.26        78.79
G-mean            56.89           75.21        49.25
• Kernel Learning
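The G-mean row matches the geometric mean of the good and bad recalls (for the RBF kernel, √(90.42 × 62.55) ≈ 75.2), a usual choice for imbalanced data because it stays low whenever either class is poorly recalled. A minimal sketch of all the metrics in the table, computed from binary confusion-matrix counts and treating "bad" as the positive class; the function name and the counts in the check are hypothetical.

```python
from math import sqrt


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Per-class precision/recall/F1, accuracy and G-mean; 'bad' is the positive class.

    tp: bad flows predicted bad    fn: bad flows predicted good
    fp: good flows predicted bad   tn: good flows predicted good
    """
    def prf(correct: int, predicted: int, actual: int) -> tuple[float, float, float]:
        p = correct / predicted if predicted else 0.0
        r = correct / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    p_bad, r_bad, f1_bad = prf(tp, tp + fp, tp + fn)
    p_good, r_good, f1_good = prf(tn, tn + fn, tn + fp)
    return {
        "precision_bad": p_bad, "recall_bad": r_bad, "f1_bad": f1_bad,
        "precision_good": p_good, "recall_good": r_good, "f1_good": f1_good,
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        # Geometric mean of the two per-class recalls.
        "g_mean": sqrt(r_bad * r_good),
    }
```

With 60 of 100 bad flows caught and 20 of 200 good flows misflagged, accuracy is 0.8 while recall on the bad class is only 0.6, which is why F1 bad and G-mean are tracked alongside accuracy.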
Binary Classifier Results
• Parameter selection for Biased SVM with RBF kernel
– With gamma = 10 and C+/C- = 0.5: best F1_bad = 0.6494
– With gamma = 10 and C+/C- = 0.55: best F1_bad = 0.657657
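For reference, a Biased SVM is commonly written as a soft-margin SVM with separate error penalties $C_+$ and $C_-$ for the two classes, so the ratio tuned above controls how much more a misclassified example of one class costs than one of the other; $\gamma$ is the width parameter of the RBF kernel. The slides do not spell out the formulation, so the standard one below is an assumption.

```latex
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2}
  + C_{+}\sum_{i:\,y_i=+1}\xi_i
  + C_{-}\sum_{i:\,y_i=-1}\xi_i
\quad\text{s.t.}\quad
y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ge 1-\xi_i,\qquad \xi_i\ge 0,
\\[4pt]
k(x,x') = \phi(x)^{\top}\phi(x') = \exp\!\bigl(-\gamma\lVert x-x'\rVert^{2}\bigr),\qquad \gamma = 10.
```

A ratio $C_+/C_- < 1$ penalizes errors on the $+1$ class less than errors on the $-1$ class; which of good or bad is labeled $+1$ is not stated in the slides.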
Binary Classifier Results
• F1 bad comparison of the methods for Binary classifier
F1 best performance with/without noise: 79.07% / 88.7%

F1 bad (%)               With noise   Without noise
Bagging SMO              45.57        51.7
Adaboost                 45.57        51.7
SMO                      46.41        53.4
KNN                      63.74        79.43
Biased SVM               65.7657      67.55
Decision Tree            76.01        86.8
Bagging Decision Tree    79.07        88.7
Preprocessing (Multiclass)
• Tree-based generated features
– For each class k, do
• Repeat c times
– Collect samples from class k, label them +1
– Collect samples from the complement class $k^c$ (all other classes), label them -1
– Build a regression tree on the above binary data
– Store the tree as $T_{ik}$
• End
– End
• Example:

Original features:
Home owner  Marital status  Annual income  Number of children  Age
-           married         125K           -                   41
No          Not married     70K            N/A                 22
No          -               59K            1                   55
yes         Not married     -              N/A                 23
yes         married         100K           1                   -

Tree-based features (after transformation):
Tree 1    Tree 2     Tree 3   Tree 4     Tree 5
-0.25     -1         -0.5     -1         -0.14286
-0.25     -1         -0.5     -0.33333   -0.14286
-1        0.2        1        1          0.142857
0.5       0.714286   0.5      0.25       -1
-0.25     -0.33333   -0.5     0.777778   -0.14286
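The tree-based feature generation above can be sketched as follows. To stay self-contained, the sketch uses depth-1 regression trees (stumps) fitted by least squares, draws a balanced sample with replacement from class k and from its complement on each repetition, and assumes numeric features with no missing values. The deck uses full regression trees; the tree depth, sampling scheme and function names here are our assumptions.

```python
import random


def fit_stump(X, targets):
    """Least-squares regression stump, returned as (feature, threshold, left_value, right_value)."""
    best = None
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate split halfway between neighbouring values
            left = [y for x, y in zip(X, targets) if x[j] <= t]
            right = [y for x, y in zip(X, targets) if x[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((y - lv) ** 2 for y in left) + sum((y - rv) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every feature is constant: predict the overall mean everywhere
        m = sum(targets) / len(targets)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value


def tree_features(X, labels, c=3, samples_per_side=10, seed=0):
    """Fit c stumps per class k (class k = +1, complement k^c = -1); return them and a transform."""
    rng = random.Random(seed)
    trees = []
    for k in sorted(set(labels)):
        pos = [x for x, lab in zip(X, labels) if lab == k]
        neg = [x for x, lab in zip(X, labels) if lab != k]
        for _ in range(c):
            # Balanced sample drawn with replacement from class k and from the rest.
            sample = ([rng.choice(pos) for _ in range(samples_per_side)]
                      + [rng.choice(neg) for _ in range(samples_per_side)])
            targets = [1.0] * samples_per_side + [-1.0] * samples_per_side
            trees.append(fit_stump(sample, targets))

    def transform(x):
        # New representation: one output per stored tree T_ik.
        return [predict_stump(tree, x) for tree in trees]

    return trees, transform
```

Every sample is mapped to a vector of tree outputs, like the rows of the "Tree-based features" table, and the multiclass classifier is then trained on those vectors instead of the raw, irregular features.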
Preprocessing
• Multiclass results comparison with
– Original features
– Tree-based generated features

Performance of 6 majority classes (%)
            Original features            Tree-based features
Class ID    Precision  Recall  F1        Precision  Recall  F1
24          77.65      78.30   77.97     86.12      88      87.05
25          63.62      70.02   66.67     79.3       82      80.63
28          99.36      99.70   99.53     100        100     100
48          82.16      73.95   77.84     79.68      77.9    78.78
68          69.05      71.38   70.20     67.7       76      71.61
76          66.58      71.23   68.83     68.45      66.6    67.51

Average performance of 6 majority classes (original → tree-based features): precision 76.40 → 80.21, recall 77.43 → 81.75, F1 76.84 → 80.93
Multi-class Classification
• Identify individual threats
• Identify new classes and provide their properties
• Classifiers
– K-Nearest Neighbor
• No training involved
• Computationally intensive at test time
– Ensemble methods
• Fail to scale to a huge number of classes
– Sphere-based SVM
• Encapsulate each class in a hypersphere
• Transform the data into an appropriate space so that each class clusters into a single cohesive unit
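The sphere idea can be sketched without the SVM machinery: summarize each class by a center and a radius, so that classifying a flow takes one comparison per class instead of one per training flow as in KNN, and a flow that falls outside every sphere can be reported as a candidate new class. This sketch assumes the class mean as center and the largest training distance as radius, in the original feature space; a sphere-based SVM would instead learn each sphere by optimization, typically in a kernel-induced space. `SphereClassifier` is a hypothetical name.

```python
from math import dist


class SphereClassifier:
    """One hypersphere per class: center = class mean, radius = max training distance."""

    def fit(self, X: list[list[float]], y: list[str]) -> "SphereClassifier":
        self.spheres: dict[str, tuple[list[float], float]] = {}
        for label in set(y):
            members = [x for x, lab in zip(X, y) if lab == label]
            center = [sum(col) / len(members) for col in zip(*members)]
            radius = max(dist(center, x) for x in members)
            self.spheres[label] = (center, radius)
        return self

    def predict(self, x: list[float]) -> str:
        # One comparison per class: signed distance from x to the sphere surface.
        def margin(label: str) -> float:
            center, radius = self.spheres[label]
            return dist(center, x) - radius  # negative means x is inside

        best = min(self.spheres, key=margin)
        return best if margin(best) <= 0 else "unknown"  # outside every sphere
```

Test cost grows with the number of classes K rather than the number of training flows N, which is the trade-off the results slide highlights.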
Building Kernel
• Let $(X_i, Y_i)$ be the data points, where $Y_i \in \{+1, -1\}$
• Construct the ground-truth kernel $K$
– $K_{ij} = Y_i Y_j$
• Now learn a parametric kernel as follows
– $K_{ij} \approx f_\theta(X_i, X_j)$
• Once θ is learned, it can be applied to the test set.

Example:
Home owner  Marital status  Annual income  Number of children  Age  Y
-           married         125K           -                   41   +1
No          Not married     70K            N/A                 22   +1
No          -               59K            1                   55   +1
yes         Not married     -              N/A                 23   -1
yes         married         100K           1                   -    -1
-           Married         -              2                   32   -1

Ground-truth kernel for the example, $K = Y Y^\top$ with $Y = (Y_1, \dots, Y_6)^\top$:
     1   2   3   4   5   6
1   +1  +1  +1  -1  -1  -1
2   +1  +1  +1  -1  -1  -1
3   +1  +1  +1  -1  -1  -1
4   -1  -1  -1  +1  +1  +1
5   -1  -1  -1  +1  +1  +1
6   -1  -1  -1  +1  +1  +1
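The deck does not specify the parametric family $f_\theta$ or how θ is fitted. One standard way to make "learn a kernel that matches $K_{ij} = Y_i Y_j$" concrete is kernel-target alignment: choose θ to maximize the normalized Frobenius inner product between the kernel matrix $K_\theta$ and the ground-truth kernel. The sketch below assumes an RBF family $f_\theta(x, x') = \exp(-\theta\lVert x - x'\rVert^2)$ and a grid search over θ; both are illustrative choices, not the method used in the deck.

```python
from math import dist, exp, sqrt


def rbf_kernel(X: list[list[float]], theta: float) -> list[list[float]]:
    """Parametric kernel f_theta(x_i, x_j) = exp(-theta * ||x_i - x_j||^2)."""
    return [[exp(-theta * dist(a, b) ** 2) for b in X] for a in X]


def alignment(K: list[list[float]], y: list[int]) -> float:
    """Kernel-target alignment <K, y y^T>_F / (||K||_F * ||y y^T||_F)."""
    n = len(y)
    inner = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = sqrt(sum(v * v for row in K for v in row))
    return inner / (norm_k * n)  # ||y y^T||_F = n when every y_i is +1 or -1


def fit_theta(X: list[list[float]], y: list[int],
              grid: tuple[float, ...] = (0.01, 0.1, 1.0, 10.0)) -> float:
    """Grid-search theta for the best alignment with the ground-truth kernel."""
    return max(grid, key=lambda theta: alignment(rbf_kernel(X, theta), y))
```

The ground-truth kernel itself has alignment exactly 1, so the score measures how close $K_\theta$ comes to the ideal block structure; once θ is chosen, `rbf_kernel` can be evaluated between test and training points.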
Kernel for Multi Class
• For each class, do the following
– Collect samples belonging to the class and label them +1
– Collect samples from the rest of the data and label them -1
– Build a separate kernel for each class

Ground-truth kernel for one class:
     1   2   3   4   5   6
1   +1  +1  +1  -1  -1  -1
2   +1  +1  +1  -1  -1  -1
3   +1  +1  +1  -1  -1  -1
4   -1  -1  -1  +1  +1  +1
5   -1  -1  -1  +1  +1  +1
6   -1  -1  -1  +1  +1  +1

$K_{ij} \approx f_\theta(X_i, X_j)$
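Building one ground-truth kernel per class only needs the one-vs-rest relabeling described above: for class c, $y^{(c)}_i = +1$ if sample i belongs to c and -1 otherwise, and the target kernel is $K^{(c)}_{ij} = y^{(c)}_i y^{(c)}_j$. A minimal sketch (the function name is hypothetical); each target would then be approximated by its own parametric kernel $f_\theta$.

```python
def one_vs_rest_kernels(labels: list[str]) -> dict[str, list[list[int]]]:
    """Ground-truth kernel K^(c)_ij = y^(c)_i * y^(c)_j for every class c."""
    kernels = {}
    for c in sorted(set(labels)):
        # Relabel: +1 for members of class c, -1 for the rest of the data.
        y = [1 if lab == c else -1 for lab in labels]
        kernels[c] = [[yi * yj for yj in y] for yi in y]
    return kernels
```

With K classes this produces K separate targets, one per class, which matches the "separate kernel for each class" step.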
Boosted Trees for Kernel Learning
Output of tree 1: $y_1 = (+1, -1, -1, +1, -1, +1)^\top$
Kernel matrix for tree 1, $K_1 = y_1 y_1^\top$:
     1   2   3   4   5   6
1   +1  -1  -1  +1  -1  +1
2   -1  +1  +1  -1  +1  -1
3   -1  +1  +1  -1  +1  -1
4   +1  -1  -1  +1  -1  +1
5   -1  +1  +1  -1  +1  -1
6   +1  -1  -1  +1  -1  +1

Ground-truth kernel:
     1   2   3   4   5   6
1   +1  +1  +1  -1  -1  -1
2   +1  +1  +1  -1  -1  -1
3   +1  +1  +1  -1  -1  -1
4   -1  -1  -1  +1  +1  +1
5   -1  -1  -1  +1  +1  +1
6   -1  -1  -1  +1  +1  +1

Output of tree 2: $y_2 = (+1, -1, +1, -1, -1, +1)^\top$
Kernel matrix for tree 2: $K_2 = y_2 y_2^\top$
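The slide shows each tree's ±1 output vector $y_t$ inducing a rank-one kernel $y_t y_t^\top$, but not how the trees are combined. As an illustrative assumption, the sketch below combines them stagewise: each new kernel is added with the weight $\alpha_t$ that most reduces the squared Frobenius error to the ground-truth kernel, $\alpha_t = y_t^\top R\, y_t / n^2$, where $R$ is the current residual. The function names are hypothetical.

```python
def outer(u: list[float], v: list[float]) -> list[list[float]]:
    return [[a * b for b in v] for a in u]


def stagewise_kernel_fit(target: list[list[float]],
                         tree_outputs: list[list[int]]) -> tuple[list[float], float]:
    """Fit K ~ sum_t alpha_t * h_t h_t^T one tree at a time (least squares per stage)."""
    n = len(target)
    residual = [row[:] for row in target]
    alphas = []
    for h in tree_outputs:
        # Optimal step for a +/-1 vector h: alpha = h^T R h / ||h h^T||_F^2 = h^T R h / n^2.
        alpha = sum(h[i] * residual[i][j] * h[j] for i in range(n) for j in range(n)) / n ** 2
        alphas.append(alpha)
        step = outer(h, h)
        residual = [[residual[i][j] - alpha * step[i][j] for j in range(n)] for i in range(n)]
    error = sum(v * v for row in residual for v in row)  # remaining squared Frobenius error
    return alphas, error
```

With the target $y y^\top$ for $y = (+1, +1, +1, -1, -1, -1)^\top$ and the tree outputs $(+1, -1, -1, +1, -1, +1)^\top$ and $(+1, -1, +1, -1, -1, +1)^\top$, the weights come out as $\alpha_1 = 1/9$ and $\alpha_2 = 8/81$, and the residual error drops below its initial value of 36. Weak trees receive small weights, so many trees are needed before the sum approaches the block structure of the ground-truth kernel.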
Multi class Results
Spheres require only K = 6 comparisons (the number of classes), whereas KNN requires N comparisons (one per training sample).

Classification + New Class Detection
• Build a separate kernel for each class
– Find a transformation to separate class + from the rest of the data
– Find a transformation to separate class x from the rest of the data
– Find a transformation to separate class -- from the rest of the data
– Find a transformation to separate class ^ from the rest of the data
New Class Generation
Conclusion
• CyberEagle: An enhanced comprehensive security
system
– Bringing host and network security together to fight security threats
• Identify threats that IDS/IPS fails to detect (encrypted, evolved)
• Identify new threats at the earliest stage
• Generate signatures for the new threats and alert the host security system in an automated way


Future Work
• Improve classification accuracy
• Scale up to a huge number of classes
• Reduce computation during classification
– Learn the class hierarchy
– Increase speed without sacrificing accuracy
• Validate with diverse data
• Reputation analysis of IP addresses
• Online update of the classifier
• MapReduce implementations

Summer 2011 Company Meeting
Thank You
Prakash, Lei, Saby
