
Anomaly Detection and Analysis from Heterogeneous Data

Auroop R. Ganguly
Olufemi A. Omitaomu
GIST Group
CSE Division
Oak Ridge National Laboratory

Managed by UT-Battelle
for the Department of Energy
Offline Analysis of Disparate Data
Leads to Faster and More Reliable
Real-Time Decisions
[Figure: workflow diagram linking offline analysis to online decisions. Labels (grouping approximate): Consequences; Action; Additional Data; Hypothesis Generation; Process Modeling; End-User; Analyst; Decision Optimization; Risks; Offline Predictive Analysis; Body of Actionable Knowledge; HIT; Decision Support; Pattern Detection; Online Decision Making; Data Integration; Stream of New Data]
Two Distinct Case Studies

ANOMALY ANALYSIS AND DETECTION


Case Studies

• Remote Sensing
• Transportation Security

Online Detection of Anomaly, Change
and Change Point from Space-Time Data
Problem Statement: Develop approaches that can detect anomalies, changes, and change points in time series and spatial data in an online mode, for application in threat cognizance and remote sensing.
Technical Approach: Methods motivated by statistical process control detect large outliers and sustained anomalies or changes in space and time; methods motivated by simulated annealing detect change points.
Benefit: Real-time change and anomaly analysis in distributed applications, with examples in threat cognizance and remote sensing.

Remote Sensing Change Detection

ORNL Staff
• Auroop Ganguly, CSED (PI)

ORAU Students
• Yi Fang (1)

Student Collaborators
• Nagendra Singh (1), ORAU
• Veeraraghavan Vijayaraj (1), ORAU
• Neil Feierabend (1), ORAU
• David T Potere (1), ORAU

(1) CSED Post-Master

State of the Art in Online KD
Alarm Generation via Adaptive Metrics

• Lambert and Liu, 2006
  – Reference Distribution: Large Deviation → Alarm
  – Monitors Streams of Network Counts
  – Leading Indicator of Network Performance Change

• Agarwal et al., 2006
  – Predictive Model: Large Error → Alarm
  – Monitors Count of Known Word Pattern in Websites
  – Leading Indicators of Social Disruptive Events
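
The "reference distribution: large deviation → alarm" idea can be illustrated with a minimal sketch. It is not the procedure of Lambert and Liu; it assumes an empirical reference sample, an upper-tail test, and a hypothetical cutoff `alpha`, and the counts are made up for illustration.

```python
from bisect import bisect_left

def tail_probability(reference, value):
    """Fraction of reference observations that are >= value (upper tail)."""
    ordered = sorted(reference)
    # bisect_left returns how many reference values are strictly below `value`.
    below = bisect_left(ordered, value)
    return (len(ordered) - below) / len(ordered)

def is_alarm(reference, value, alpha=0.01):
    """Alarm when the new count sits in the extreme upper tail (alpha is assumed)."""
    return tail_probability(reference, value) <= alpha

# Hypothetical per-interval counts observed under normal conditions.
reference_counts = [98, 101, 103, 99, 100, 102, 97, 104, 100, 101] * 20

print(is_alarm(reference_counts, 101))  # a typical count
print(is_alarm(reference_counts, 140))  # far above anything in the reference
```

A predictive-model variant would replace the reference sample with forecast errors and alarm on a large error instead of a large count.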

Ganguly and Fang, 2006. Special Session: Sensor-Cyber Networks for Homeland Defense

Domain
Remotely Sensed Land Cover and Wal-Mart

• Land cover change in real time
  – Change in vegetation index (NDVI)
  – “Real-time” → data update at data acquisition

• Data sets
  – 16-day NDVI Composites (UMD)
  – Wal-Mart data for validation (Potere et al., 2006)

• Case study
  – 3 Wal-Mart Stores (CA, ME, NC)
  – Space: “Construction”, “bordering”, “background”
  – Time: “Groundbreaking”; “Store Opening”
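
For readers unfamiliar with the vegetation index, NDVI is the normalized difference of near-infrared and red reflectance, (NIR - Red) / (NIR + Red). The sketch below computes it for a single pixel; the reflectance values and the zero-denominator convention are assumptions for illustration, not values from the UMD composites.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    nir and red are reflectances in the near-infrared and red bands.
    """
    total = nir + red
    if total == 0:
        return 0.0  # convention chosen here for empty pixels (an assumption)
    return (nir - red) / total

# Hypothetical reflectances: dense vegetation vs. ground cleared for construction.
vegetated = ndvi(nir=0.50, red=0.08)
cleared = ndvi(nir=0.30, red=0.25)
print(round(vegetated, 3), round(cleared, 3))
```

Clearing vegetation lowers NDVI, which is why a construction site separates from its background pixels over time.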
Wal-Mart’s Spread
1962 - 2004

Courtesy: Feierabend et al., 2006 (AAG)


Apple Valley Distribution Center
Imagery: May 1994 (Terraserver); May 2003 (Google Earth/DigitalGlobe)
Timeline: apparent groundbreaking, early 2003; distribution center opens March 2004

Courtesy: Feierabend et al., 2006 (AAG)
All 3 Sites
Courtesy: Feierabend et al., 2006 (AAG)
[Figure: plots for all three sites; a marker indicates the opening date]

Method
Online Change Detection

• Reference Model
  – Difference of the Time Series (“Wal-Mart” vs. “Background”)
  – Cube Root Transform for Variance Stabilization
  – Transformed, Difference Time Series: IID and Gaussian

• Online Alarms for Change → O(1)
  – Sustained changes (even if small) of interest
  – Statistical process control methods (as in Lambert & Liu, 2006)

• Online Change-Point Detection → O(1) to O(n)
  – Heuristic and stochastic
  – Backward (downhill) search
  – Similar in spirit to simulated annealing

• Updating Parameters → O(1)
  – Process in Control: Mean held to zero and variance updated
  – Alarm Generation: Mean converges to new state
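
The reference model and alarm step above can be sketched as a textbook EWMA control chart. This is an illustration under stated assumptions, not the authors' implementation: the cube root is applied to each series before differencing (the slide does not fix the order), and `lam`, `limit`, `warmup`, and the NDVI values are hypothetical.

```python
import math

def cube_root(x):
    """Signed cube root, used here as a variance-stabilizing transform."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

class EwmaChangeAlarm:
    """EWMA control chart on the transformed site-minus-background series."""

    def __init__(self, lam=0.2, limit=3.0, warmup=10):
        self.lam = lam          # EWMA weight on the newest value (assumed)
        self.limit = limit      # control limit in standard deviations (assumed)
        self.warmup = warmup    # samples used only to learn the variance (assumed)
        self.n = 0              # in-control samples seen so far
        self.sum_sq = 0.0       # running sum of squares; the mean is held at zero
        self.ewma = 0.0         # smoothed statistic; drifts to the new level after a change

    def update(self, site_value, background_value):
        """Process one observation pair in constant time; return True on alarm."""
        z = cube_root(site_value) - cube_root(background_value)
        self.ewma = (1.0 - self.lam) * self.ewma + self.lam * z
        if self.n >= self.warmup:
            sigma = math.sqrt(self.sum_sq / self.n)
            threshold = self.limit * sigma * math.sqrt(self.lam / (2.0 - self.lam))
            if abs(self.ewma) > threshold:
                return True     # out of control: the variance estimate is frozen
        # In control: keep the mean at zero and update the variance estimate.
        self.n += 1
        self.sum_sq += z * z
        return False

# Hypothetical NDVI values: the site tracks the background, then vegetation is cleared.
background = [0.60] * 40
site = [0.60 + 0.02 * (-1) ** t for t in range(30)]
site += [0.50 + 0.02 * (-1) ** t for t in range(30, 40)]

detector = EwmaChangeAlarm()
alarms = [t for t, (s, b) in enumerate(zip(site, background)) if detector.update(s, b)]
print(alarms[0] if alarms else None)
```

Because the statistic is smoothed, a modest but sustained shift accumulates until it crosses the limit, while isolated fluctuations of the same size do not.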
Results
Change Alarms and Change-Point Detection

• Validation of change detection / alarms
  – Alarm generated prior to store opening
  – Alarm generated after groundbreaking (if available)

• Validation of detected change points
  – Change point detected prior to alarm (by design)
  – Change point detected prior to store opening
  – Change point detected during groundbreaking

• Results
  – CA store: Groundbreaking available → Near perfect validation
  – ME, NC Stores: Groundbreaking not available → Approximate match
  – Experiments consistent with expectations
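
Why a change point precedes its alarm "by design" can be seen in a simplified backward search. The slides describe the actual search as heuristic and stochastic, similar in spirit to simulated annealing; the sketch below is a deterministic stand-in, and `band`, `window`, and the series are hypothetical.

```python
def backward_change_point(z, alarm_index, band, window=3):
    """Step back from the alarm until the preceding window looks in control.

    z            transformed difference series (in-control mean assumed zero)
    alarm_index  index at which the online alarm fired
    band         largest |window mean| still treated as in control (assumed)
    Returns the estimated first index of the changed regime.
    """
    t = alarm_index
    while t >= window:
        recent = z[t - window:t]
        if abs(sum(recent) / window) <= band:
            break               # the window before t is back near zero
        t -= 1                  # still in the changed regime; keep walking back
    return t

# Hypothetical series: in control for 30 steps, then a sustained negative shift.
z = [0.01 * (-1) ** t for t in range(30)]
z += [-0.05 + 0.01 * (-1) ** t for t in range(30, 40)]

estimate = backward_change_point(z, alarm_index=33, band=0.01)
print(estimate)  # 30
```

The search starts at the alarm and only moves backward, so the estimate can never be later than the alarm. Its cost is proportional to the gap between change and alarm, which is consistent with the O(1) to O(n) range quoted on the slides.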

Conclusion: Online Performance

• False Alarm Rate
  – Controlled by an EWMA parameter
  – For our experiments, Expected FAR = 1 in 500

• Computational Complexity
  – Alarm Generation: O(1) or constant time
  – Change-Point Detection: O(1) to O(n)
  – Parameter Updates: O(1)

• Online Updates
  – New Severity Metric requires the following:
    ▸ Current Severity metric
    ▸ New Data
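
As a rough illustration of these two points, the sketch below shows a constant-time severity update and one common way to turn a target false alarm rate into a control limit. It assumes the monitored statistic is independent and standard normal while in control; the slides only say the rate is controlled by an EWMA parameter, and smoothing changes the realized rate, so the mapping is indicative only. The function names and `lam` are hypothetical.

```python
from statistics import NormalDist

def control_limit_for_far(false_alarm_rate):
    """Two-sided limit, in standard deviations, for a standard normal statistic."""
    return NormalDist().inv_cdf(1.0 - false_alarm_rate / 2.0)

def update_severity(current_severity, new_value, lam=0.2):
    """Constant-time EWMA update: needs only the current metric and the new datum."""
    return (1.0 - lam) * current_severity + lam * new_value

limit = control_limit_for_far(1.0 / 500.0)
print(round(limit, 2))  # 3.09
```

No history is stored: each update consumes the previous severity and one new observation, which is what makes the monitoring step O(1).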

Anomaly analysis from heterogeneous
data for transportation security
Problem Statement: Provide an end user with
the ability to make fast and reliable decisions
on whether a truck at a weigh station
represents a plausible security threat, for
example, owing to camouflaged illicit
radioactive materials, by using historical truck
data and new truck information from disparate
sensors.
Technical Approach: A multivariate statistical characterization of trucks, based on offline analysis of archived historical information, combined with an online analysis of new truck data, helps detect potentially anomalous behavior from heterogeneous sensor data.
Benefit: Reduces false alarms without compromising on the probability of detection, leading to greater potential for ensuring transportation security without disrupting commerce.
[Figure caption: Offline analysis of normal behavior informs online anomaly analysis, which is presented in a usable form to end-users]

Transportation Security Team

ORNL Staff
• Auroop Ganguly, CSED (PI)
• Vladimir Protopopescu, CSED (Co-PI)

ORAU Students
• Yi Fang (1)
• Olufemi Omitaomu (2)

External Collaborators
• Amrudin Agovic (3), UMN
• Arindam Banerjee, UMN

ORNL SensorNet®
• Bruce Patton
• Steven Saavedra
• Randy Walker

(1) CSED Post-Master
(2) CSED Post-Doc
(3) Univ. PhD Student
Case Study: Weigh Station Inspection Process

Courtesy: SensorNet Program


Problem Challenges

• Multiple Heterogeneous Data
  – Static Scale Data (eight features)
    ▸ Truck Length, Number of Axles, Axle Weights, Length, Speed near Sensors, Distance from Sensors
  – Radiation Data
    ▸ Gross Counting Data (gross counts along truck length, or during the time the truck passes in front of the sensor)
    ▸ Spectroscopy Data (radiation count versus energy, with peaks and profiles indicating cargo type)

• Other Data/Metadata
  – Image Data (e.g., truck image and license plate)
  – Text Data (e.g., cargo manifest)
  – Combined Image and Text Data (e.g., driver's license)

Offline Anomaly in Static Scale Data

• Three consistent and distinct groups of trucks are observed among datasets from month to month
• Three features and their mutual relationships are found to be relevant
• Unexpectedly, truck weight is not among the features
  – Weight can vary significantly when other variables remain constant
• Unexpectedly, truck distance from sensor is among the features

Online Anomaly in Static Scale Data

• The results from offline anomaly analysis are used for anomaly detection
  – To test the consistency of these data points
• Trucks with anomalous features were consistently identified
• This analysis is found more useful from a safety perspective
• Could enhance other analyses from a security perspective
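
One way to picture the offline-to-online handoff is a Mahalanobis-distance check against groups fitted on historical data. This is a sketch under assumptions, not the method used in the study: the slides do not name the three features or the scoring rule, so the feature columns, the two example groups, the data, and the threshold are all hypothetical.

```python
def fit_group(rows):
    """Offline step: mean vector and sample covariance of one truck group."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance; accounts for correlations between features."""
    dev = [xi - mi for xi, mi in zip(x, mean)]
    return sum(d * y for d, y in zip(dev, solve(cov, dev)))

def score_truck(x, groups):
    """Online step: squared distance to the closest historical group."""
    return min(mahalanobis_sq(x, mean, cov) for mean, cov in groups)

# Hypothetical history; columns are three illustrative features
# (length in m, speed in km/h, distance from sensor in m).
group_a = [(20.0, 60.0, 3.0), (21.0, 62.0, 3.1), (19.0, 58.0, 2.9), (20.5, 59.0, 3.2),
           (19.5, 61.0, 2.8), (20.0, 60.5, 3.0), (20.2, 59.5, 3.1), (19.8, 60.0, 2.9)]
group_b = [(l - 8.0, s + 10.0, d + 1.0) for (l, s, d) in group_a]
groups = [fit_group(group_a), fit_group(group_b)]

THRESHOLD = 16.27  # about the 99.9% chi-square quantile for 3 features (assumed cutoff)
typical = (21.0, 62.0, 3.1)
suspicious = (20.0, 60.0, 9.0)
print(score_truck(typical, groups) > THRESHOLD)
print(score_truck(suspicious, groups) > THRESHOLD)
```

The expensive part, fitting the groups, runs offline; scoring a new truck needs only the stored means and covariances.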

Novel Approach for Denoising Signals

• Remove noise using an approach based on empirical modes rather than an a priori fixed function basis
• Adaptive approach
  – Reduces uncertainties in removing noise
  – Applicable for real-time analyses
  – Could enhance anomaly detection objectives
  – Suitable for non-linear processes
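
To make "empirical modes" concrete, here is a deliberately simplified sketch of the sifting idea behind empirical mode decomposition. It extracts the fastest oscillation from the data and subtracts it. It is not the authors' denoising method: the envelopes are piecewise linear rather than cubic splines, only the first mode is removed, and the number of sifting passes and the test signal are assumptions.

```python
import math

def _envelope(signal, idx):
    """Piecewise-linear envelope through signal[idx], held flat beyond the ends."""
    n = len(signal)
    env = [0.0] * n
    for i in range(n):
        if i <= idx[0]:
            env[i] = signal[idx[0]]
        elif i >= idx[-1]:
            env[i] = signal[idx[-1]]
    for a, b in zip(idx, idx[1:]):
        for i in range(a, b + 1):
            w = (i - a) / (b - a)
            env[i] = (1.0 - w) * signal[a] + w * signal[b]
    return env

def first_empirical_mode(signal, sifts=5):
    """Isolate the fastest oscillation by repeatedly subtracting the mean envelope."""
    h = list(signal)
    for _ in range(sifts):
        maxima = [i for i in range(1, len(h) - 1) if h[i - 1] < h[i] >= h[i + 1]]
        minima = [i for i in range(1, len(h) - 1) if h[i - 1] > h[i] <= h[i + 1]]
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build envelopes
        upper = _envelope(h, maxima)
        lower = _envelope(h, minima)
        h = [x - 0.5 * (u + l) for x, u, l in zip(h, upper, lower)]
    return h

def denoise(signal, sifts=5):
    """Remove the first (highest-frequency) empirical mode from the signal."""
    mode = first_empirical_mode(signal, sifts)
    return [x - m for x, m in zip(signal, mode)]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical signal: a slow oscillation plus a fast alternating disturbance.
clean = [math.sin(2.0 * math.pi * i / 100.0) for i in range(200)]
noisy = [c + 0.3 * (-1) ** i for i, c in enumerate(clean)]
denoised = denoise(noisy)

rms_before = rms([a - b for a, b in zip(noisy, clean)])
rms_after = rms([a - b for a, b in zip(denoised, clean)])
print(rms_after < rms_before)
```

The basis (the extracted mode) comes from the signal's own extrema, which is the sense in which the approach is adaptive rather than tied to a fixed function basis.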

Anomaly in Radiation Signals

• Transform the datasets onto another domain in which the small differences are magnified, using a technique that
  – Retains the time dependence of the data
  – Does not smooth the data

• This approach reduces the number of false alarms, which in turn:
  – Could decrease additional costs of operating weigh stations
  – Shortens truck processing time
  – Reduces the impacts of truck inspections on supply chain networks

[Figure: “Anomaly Detection Approach” flowchart with boxes Read Radiation Datasets, Wavelet Transformation, Binary Representation of Wavelet Coefficients, and Anomaly Detection Method; further labels: DATA, DECISION, SAMPLING, PROJECTS, ANALYSES]
[Figure: six panels (Case 1 through Case 5, and All Cases) plotting Gross Counts against Energy (MeV) from 0 to 0.2]
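
The wavelet-then-binary idea from the figure can be sketched with a Haar transform. The slides do not specify the wavelet, the binarization rule, or the comparison metric, so the Haar basis, the sign-based bits, the Hamming distance, and the count profiles below are all assumptions for illustration.

```python
import math

def haar_detail_coefficients(signal):
    """All Haar detail coefficients, finest level first.

    The signal length is assumed to be a power of two.
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficients stay tied to positions in the original profile.
        details.extend((a - b) / math.sqrt(2.0) for a, b in pairs)
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return details

def binary_representation(signal):
    """Keep only the sign of each detail coefficient: 1 if positive, else 0."""
    return [1 if c > 0 else 0 for c in haar_detail_coefficients(signal)]

def hamming_distance(bits_a, bits_b):
    """Number of positions at which two bit patterns differ."""
    return sum(x != y for x, y in zip(bits_a, bits_b))

# Hypothetical gross-count profiles along a truck: a reference and a profile with a local bump.
reference = [10, 10, 11, 11, 12, 12, 11, 11]
suspect = [10, 10, 16, 11, 12, 12, 11, 11]

ref_bits = binary_representation(reference)
sus_bits = binary_representation(suspect)
print(ref_bits)                              # [0, 0, 0, 0, 0, 1, 0]
print(sus_bits)                              # [0, 1, 0, 0, 0, 1, 1]
print(hamming_distance(ref_bits, sus_bits))  # 2
```

The detail coefficients are local differences rather than averages, so a brief bump flips bits at the affected positions instead of being smoothed away.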
Offline Analysis of Disparate Data
Leads to Faster and More Reliable
Real-Time Decisions
[Figure: workflow diagram linking offline analysis to online decisions. Labels (grouping approximate): Consequences; Action; Additional Data; Hypothesis Generation; Process Modeling; End-User; Analyst; Decision Optimization; Risks; Offline Predictive Analysis; Body of Actionable Knowledge; HIT; Decision Support; Pattern Detection; Online Decision Making; Data Integration; Stream of New Data]
Demonstration: ADRAT
