You are on page 1of 25

MI HANN ET UN EI OLE LUNA AMANAT AL MATTER

US 20180103052A1
( 19) United States
(12 ) Choudhury
Patent Application
et al.
Publication ( 10) Pub . No.: US 2018 /0103052 A1
(43) Pub . Date: Apr. 12 , 2018
(54 ) SYSTEM AND METHODS FOR AUTOMATED Publication Classification
DETECTION , REASONING AND (51) Int . CI.
RECOMMENDATIONS FOR RESILIENT H04L 29 /06 (2006 .01)
CYBER SYSTEMS GOON 3 /04 (2006 .01)
(71) Applicant: BATTELLE MEMORIAL G06N 3 / 08 (2006 .01)
INSTITUTE , Richland , WA (US) G06F 17/ 30 (2006 .01)
(52) U .S . Cl.
(72 ) Inventors : Sutanay Choudhury , Kennewick , WA CPC ...... H04L 63/ 1425 (2013 .01 ); H04L 63/ 1416
(US ); Kushbu Agarwal, Kennewick , ( 2013 .01) ; H04L 63/ 1441 (2013 .01) ; G06F
WA (US); Pin - Yu Chen , Yorktown 17 /30958 ( 2013 .01) ; GO6N 3 /0427 ( 2013 .01) ;
Heights, NY (US) ; Indrajit Ray, Fort G06N 3/ 08 (2013.01 ); G06N 3/ 0445 ( 2013. 01 )
Collins, CO (US)
(73 ) Assignee : BATTELLE MEMORIAL (57) ABSTRACT
INSTITUTE , Richland , WA (US )
(21) Appl. No.: 15/730,028 A method for securing an IT ( information technology )
system using a set of methods for knowledge extraction ,
(22) Filed : Oct. 11 , 2017 event detection , risk estimation and explanation for ranking
Related U .S . Application Data cyber- alerts which includes a method to explain the rela
tionship (or an attack pathway ) from an entity (user or host )
(60 ) Provisional application No. 62/406 ,699, filed on Oct. and an event context to another entity ( a high - value
11 , 2016 . resource) and an event context (attack or service failure ).

ATTACKER GROUP ) ow ( USERX


www

USES AT ACK TYPELUSESATTACK TYPE _


USES ATTACK TYPE USES ATTACK TYPE SENDE
SENDS
www
CLICKED
W

DDOS EXFILTRATION Y HAS EVENT PHISHING EMAIL WITH SIGNATUREX ) w


w
HAS EVENT
M . * * MO no
* **

*
NETWORK EVENT N3 NETWORK EVENT N1 NETWORK EVENT N2
TOYOTE naux
Patent Application Publication Apr . 12 , 2018 Sheet l of 14 US 2018/0103052 A1

USERS HOSTS APPLICATIONS


-
E

nmentcomen …

ALICE
?-mmcurrenyou vw

CDICE ??PP
# pute

Airinatre
" #-???wwwMHwwwHTTP
CHARLIE
#IRTHwig

on er
Patent Application Publication Apr . 12, 2018 Sheet 2 of 14 US 2018 /0103052 A1

ER TE -

ATACK TECHNQIQUES
ATTACK TECHNIQUES
Q
wwwwwwwwwwwwwwwwwwwwwwwwww ULLAR

EVENTS O o o o
X

OVO

FIG . 2
W
Patent Application Publication Apr . 12 , 2018 Sheet 3 of 14 US 2018/0103052 A1

SOFTMAX

ATTENTION OVER ATTACK


TECHNIQUES , mm.www ???

| …

w
=
? wtw :
W
-

gum * * *

sty?

ENCODING OF New
N
M
?

TALtEsEL vwmteorwm
EVENTS INTO
max EL
ABSTRACTATTACK
SIGNATURES
ow
?M

[ ,

,
,

,
?

[] ]
EVENT LEVELATTENTION NN
A
wow……
M ?wwww

? w www
xt

.
EMOTING
ENCODINGOFOF OSS, |
NETWORK AND wwmwwwwewomm
tis aut#seagt
isintus
-
mm
we
Www
MMM
>

Peria
APP EVENTS pto 172

?? ?? :

FIG. 3
Patent Application Publication Apr . 12, 2018 Sheet 4 of 14 US 2018 /0103052 A1

larry_ page
google www _google _com _about_ corporate_company_execs_htm

webkit eric_ schmidt)


apple _inc wikicategory _apple_inc_employees
0.53413241360 .218912565625
Steve jobs
Shradbrad_pittpit
the_mexican Wikicategory_ got_rights_ activists _ from _the_united states

(alan _ silvestri) ( susan_ sarandon

saturn _ award wikicategory_ armerican _ antiiraq _war_ activists


0.586340464
0.2208605504
george _clooney)

FIG . 4
Patent Application Publication Apr . 12, 2018 Sheet 5 of 14 US 2018 /0103052 A1

lebron james

akron ohio
wikicategory _ basketball_players _from _chio
( ara_parseghian ) red blair

eddie_ robinson _coach _of_the_year wikicategory _ chio _ state _buckeyes_ football_coaches


0.5788374384
0.30532084
urban_meyer

FIG . 5
Patent Application Publication Apr. 12 , 2018 Sheet 6 of 14 US 2018 /0103052 A1

IhNaFsESRkiElD
PUBS
FROM
_
E

NVID
te
4
ordom
w
WrIeKlIaPEtDeIdA
FROM
LEARNT
topic
_E
11

DATA
PROJ
FROM
LEARNT
role
_
has
wmoshema '
,
REPS
TECH
MINING
FROM
LEARNT
on
work
_
can Scientist
Stat wy )
GraphicalModels
_
Prob
xhemos
John
Database
HR
pe
) 6
.
FIG

CHART
_
ORG
LEARNT
FROM
role
has
Scients2Math
.

Scients2 4
NR

ang
a
REPS
TECH
MINING
FROM
LEARNT
on
work
_
can
ID Bad
LEARN
FROM
WRESA
WFROM
Ltopic
_
rIEKeIlAPaREtDNIeTAd

MScienatsDchret SetTheory
Patent Application Publication Apr . 12 , 2018 Sheet 7 of 14 US 2018 /0103052 A1

STRM
TFROM
_
Ccproc
fjob
ELarAtEeSMqIEguFToeIRrnEYyDt

SolarWinds
ap Name
Sceiunrtisy mwana

FLAG
RED
_
BIG
MM

NetworkMape
Job25128 weA

SE SION skill
_
has S(cientisSta jobfrequent
7
.
FIG

fjobrequent
USER Hado p
Sesion12845 W ww
John

hasAcount
resource
proj
_
has
)
PROJ
ASCR
_
DOE
HOST
ClusterX
Patent Application Publication Apr . 12 , 2018 Sheet 8 of 14 US 2018 /0103052 A1

HASEVENT
the
wut
one
the
so

*puroown

X
USER CLICKED N2ENVTEWONRTK Maw
SWEPIHGMNSATHUIRHNELGX
w

a
an

SENDS HASEVENT
DE NAENVTEWONRTK 8
.
FIG
w

were

GATROCUKEPR TYPEUSESAT ACK


w

w
EXFILTRAION
iatna
N3ENVTEWONRTK
E

EN

D OS
Patent Application Publication Apr . 12, 2018 Sheet 9 0f14 US 2018/ 0103052 Al

PersonX

MEETS
Persony

worksAt

_ Company2
hasEvent
DebtDefault

ISA ?????hasEvent
PersonalEventStress
s PersonZ

ThasEvent

EventEmploy Term

? ? ??
ti? ? ? -

?? ???
.. ??????
released patch to fix
Industryz MaliciousAct
* ?? ?

FIG, 9
Patent Application Publication Apr. 12 , 2018 Sheet 10 of 14 US 2018 /0103052 A1

9. 9
1998 DEV3
EXTERNAL R3

R2
DEVI
R1 INTERN
DB2
de 253 ane
10
.
FIG
-UFRSEQEU RNT

-USERFREQUENT
CEO
CEO
e
w
w

IS
-
RISKsonwe w
no

IS
-
RISK
RISK
-
IS
MSTACTHEMIENET
S
'
CEO
TO
RISK
A
IS
DEV1 TOCFDEXORPNLAQVEUCTINOS I
ounon

EASXTPLNEMATPDILONE
{- 22
FDB2.CEO'sMSRAECQUHVEINTCLEY
TODB2.CON ECTS DEV1

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Patent Application Publication Apr. 12 , 2018 Sheet 11 of 14 US 2018 /0103052 A1

HASHASHTAG COLD

USERU3 HASHTAG
HAS
/
LHOCATISON
HASHASAHSTHAIG
*
*
*
*

C1
CITY
LOCATION
HAS
|
USERU1 SHASHI
HASHTAG
HAS
/ 11
.
FIG

LHOCATISON HASHATSAHTAG HEADCHE


U2
USER

HASHASHTAG www

66
TIRED
HASHASHTAG
VIRUS
Patent Application Publication Apr. 12 , 2018 Sheet 12 of 14 US 2018 /0103052 A1

* * # mmm

PersonOflnterest
FRIENDSWITA
WITH DIA i FRIENDS WITH
PERSON P2
*
N

on
Wnow FRIENDSWITH SA
PERSON P3 PERSONP1 )
*
*

PERSON P4 LIVES IN / LIVES IN LIVESE IN


Vro
FRIENDS WITHLIVESIN
FRIENDS WITH
ALI
LIVEO

PERSON P5
IS MEMBER OF
ILL
I ! RIA
AUNCH LAAM A T
Terroristorg ,
FIG . 12
Patent Application Publication Apr. 12 , 2018 Sheet 13 of 14 US 2018 /0103052 A1

RUS IA MEMBER
IS
OF
) OFSISMUAPJLOIER

MEMBER
IS
OF
B

E
TEDNATIONS
KERE *

HEADMQULRASTED USA
(

FORD
OFTYPEIS
MMPARNUOFDCTRE ANUFCTRE PRODUCT
AKANT USES
M(RAREEIANRETAHL
TYPE
IS
'
T
OF
)

PAL ADIUM 13
.
FIG
OF
TYPE
IS
USES
E

ACTURE
WEKA )CARMANUFACTURE
CONTAIN
MANUF
R
WANA

CARS
CARS w NW

COANTVELRYTOICS
Patent Application Publication Apr. 12 , 2018 Sheet 14 of 14 US 2018 /0103052 A1

ww w w

0
.
2
:
Score
Pattern
Importance
High
is
940
_
Biz
-
Server
Client
. w 0
.
2
:
Score
Patter
iCmlpoirteancte
High
is
10
_
Moo
-
and 0
.
2
:
Score
Pattern S-Biz940isHigh
Imeprovteanrce
M:es age Vulnerabil ty
High
is
0
_
PC :Mes age VHulnieragbilhty Mes age
: 0
OO
00000000 . 0 $
0

TIKRAKKKKKKKKKK . . LARRRRR
000000 .SO0 $ O
#
1

Dee
$
0

OO
OQO OQO BADOO
0

w
.
OOOJOO
OO

SkypeEmail :

90* O $ * $ *$
OO
14
.
FIG
OOOO
ON V
06

Infra: HR
OO 0 0 OOO OOO
#

lo o o trenuto
0000
SOOO 00
OOR

Computing KAadnmaidn 080


0090 0 Q
oo
0

CloudServics com #
0

STREAMWOKS
US 2018/0103052 A1 Apr. 12 , 2018

SYSTEM AND METHODS FOR AUTOMATED tasks related to 1 ) knowledge extraction , 2 ) event detection ,
DETECTION , REASONING AND 3 ) estimation and ranking for cyber-alerts, and 4 ) risk
RECOMMENDATIONS FOR RESILIENT estimation and recommendation . In some embodiments ,
CYBER SYSTEMS these methods for performing knowledge extraction include
modeling a set of specifications to build a knowledge graph
CLAIM TO PRIORITY using a data - driven approach to capture behavioral informa
[0001 ] This application claims priority to and incorporates tion of each entity . The event detection portion can be
by reference in its entirety U .S . Patent Application No . accomplished by methods that include augmenting knowl
62/406 ,699 filed by the same applicant on 11 Oct. 2016 . edge graph with information in the data stream to detect an
abstract event” or a “ kill- chain " event. Explanation and
STATEMENT AS TO RIGHTS TO INVENTIONS ranking techniques include explaining the relationship (or an
MADE UNDER FEDERALLY -SPONSORED attack pathway ) from an entity (user or host) and an event
RESEARCH AND DEVELOPMENT context to another entity (a high - value resource ) and an
10002] This invention was made with Government support event context (attack or service failure ). Risk estimation and
under Contract DE -AC0576RL01830 awarded by the U .S . Recommendation methodologies estimate risk or interest
Department of Energy . The Government has certain rights in ingness measures to rank “ alerts ” on an entity by studying
the invention. the behavior within the knowledge graph .
BACKGROUND OF THE INVENTION [0007 ] KNOWLEDGE EXTRACTION : In one arrange
ment modeling the data streams from multiple cyber data
[ 0003] Cyber security is one of the most critical problems sources as a property graph constitutes a basic step in the
of our time.Notwithstanding enormous strides that research process . An information technology infrastructure collects
ers and practitioners have made in modeling and analyzing multiple types of data such as network traffic (also referred
network traffic , attackers find newer and more effective to as netflow ), and event logs . The network traffic data
methods for launching attacks , which means defenders must stream provides information about the communication
revisit the problem with a new perspective . While it is between machines , while the event logs provides informa
acknowledged that not all attacks can be detected and tion about user -level activities through events reported by
prevented , a system should be able to continue operating in operating system software or other applications such as a
the face of such an attack and provide its core services or
missions even in a degraded manner. To build such a word processor or web browsing software . Wedescribe a set
resilient system , it is important to be proactive in detecting of methods to transform these multiple data sources into a
and reasoning about emerging threats in an enterprise net single , unified representation referred to hereafter as a cyber
work and their potential effects on the organization , and then knowledge graph or a knowledge graph . In some examples
identify optimal actions to best defend against these risks . the cyberknowledge graph is an instance of a property graph
[0004] To help in understanding the problem an example concept, which is a variant of a semantic graph concept. A
scenario is provided . Assume that a cyber analyst is tasked semantic graph is a data structure that captures the interac
to protect an information technology infrastructure of a tion or connection (often referred as an edge ) between a set
financial institution or a university . The list of resources to of entities ( or nodes ). An example of a semantic graph is
protect includes databases containing sensitive data such as representation of a social network that captures friendship or
payroll or personally identifiable information , and other professional relationships between a set of people. Property
critical services, such as email, Internet access, or business Property graphs are a specialization of semantic graphs ,
systems. The goal of the analyst is to maximize the avail where each of the nodes and edges can capture additional
ability of these network resources while protecting the attributes or details in the form of arbitrary sets ofkey - value
system integrity from malicious attacks . A typical system pairs . In one of the embodiments the cyber knowledge graph
has hundreds of thousands of devices or workstations that is a semantic graph model capturing the association between
connect to the infrastructure daily . We also have a similar three types of entities : users , machines and applications and
number of users who employ one ormore devices to connect their properties. This method and proposed representation is
to multiple network resources and perform tasks. Hence a agnostic to the applications or software platforms that gen
need exists in such an environment to continuously monitor erate the network traffic or event logs datasets thatmodel the
the vast amountofdata flowing through the network , detect behavior and form the basis for the tripartite cyber knowl
and reason potential problems, and prioritize about how and edge graph . Such a representation supports heterogeneous
where to respond . The present disclosure provides a series of computing environments observed in most internally man
advancements toward addressing these needs. aged or cloud -based information technology infrastructures .
100051 Additional advantages and novel features of the [0008 ] In this exemplary embodiment the nodes in the
present disclosure will be set forth as follows and will be cyber knowledge graph represent software systems such as
readily apparent from the descriptions and demonstrations a workstation or virtualmachine ( subsequently referred to as
set forth herein . Accordingly, the following descriptions of machine ) and capture the vulnerability of these systems as
should be seen as illustrative of the invention and not as an attribute . Likewise the edges in this graph represent, for
limiting in any way. example , the following relationships: user-machine edges
SUMMARY capture access control configurations that describe which
user has access to a specific workstation or mobile device .
[0006 ] The present disclosure provides descriptions of Attributes of this edge indicate whether the user has admin
systems and methods for securing an IT (information tech - istrative access to the relevant resource. Similarly , commu
nology ) system using a set of techniques for accomplishing nication between two networked software entities (such as a
US 2018/0103052 A1 Apr. 12 , 2018

workstation or a virtual machine) can be represented as [0013 ] We extract a multi - scale graph model from the
edges between nodes representing machines and applica - original cyber knowledge graph to represent such hierarchi
tions . cal structure for aggregation of information , from high
10009] EVENT DETECTION : One set of methods that resolution , narrower context to coarser-resolution, broader
operate on this cyber knowledge graph model detect behav context. For example , we may first aggregate information
ioral anomalies around real-word entities represented by such as allmachine -to -machine communication for a given
graph nodes, map the anomalous behavior and recommend IP address, and then aggregate activities (discovered from
alterations to the cyber infrastructure configuration to the communication sequence ) across multiple workstations
improve its robustness. As an example of the resulting or devices to infer behavioral signatures for a single user
capability provided by the implementation of these methods account, which can be further aggregated to infer group
an organization will be able to a ) detect when a user (who level behavior of users or machines within an organization .
is a potential insider threat) connects to multiple network Detailed examples of such hierarchical aggregation from the
services that are inconsistent with his profile , b ) evaluate risk cyber knowledge graph are described hereafter.
and explain why detected activity should be investigated by [0014 ] As described earlier , we train a LSTM Neural
a human expert and , c) recommend altering network security Network model to detect important behavioral shifts at each
configurations such as restricting user ' s access to specific layer of the multi-scale graph model. Graphs are discrete
resources or applications to minimize the risk to the larger data structures , whereas neural networks operate on con
infrastructure . Today, none of the state - of-the -art systems tinuous, differentiable variables. Therefore , we first trans
provide an end -to - end analytics and decision support capa form the graph models at each scale to a continuous valued
bility outlined as above . representation in high - dimensional vector space . Effectively,
[0010 ] In one embodiment an event detection component the graph model is transformed into a tabular representation
that is robust to varying contexts or scenarios that emerge where the features learnt on every node preserve the topo
around an entity of interest, where the entity can be a user logical properties associated with its neighborhood . Typi
or a device represented as a node in the cyber knowledge cally, each node in the graph model becomes a point in 50 ,
graph . The state of the art anomaly detection systems 100 or 200 - dimensional vector space .
typically focuses anomaly detection to a single data stream , [0015 ] LSTMs are a specialization of Recurrent Neural
such as either network traffic or event logs. On the contrary , Networks (RNN ),which has the form of a chain of repeating
our developed method first unifies multiple data sources into neural network modules . The RNNs have emerged as the
a unified property graph representation . preferred neural network architecture for data that is mod
[ 0011 ]. In the described embodiment, anomalies can eled as a sequence . In the context of the Cyber Knowledge
appear through the formation of different subgraph struc Graph , a sequence can represent all communication records
tures. Our primary contribution is a neural network based associated with a given machine , or the sequence of activi
method that detects anomalies by analyzing the changes in ties that are learnt from the communication records) asso
the Cyber Knowledge Graph at multiple scales. It first ciated with a user account on multiple workstations. Given
monitors subgraph changes at a fine grained resolution in a a sequence , predicting the next element is often a frequent
very specific context. Next, potential changes detected in task for sequence -based data such as a stream of events
such specific , fine grained resolution are propagated occurring on a workstation . Sometimes , we can accurately
upwards to a broader context and a decision is emitted when predict an event that will occur on a workstation just based
multiple signals in the broader context suggest a definitive on the previous event. But in many cases, accurate predic
shift in behavior. We build a multi -scale graph model from tion requires preserving a longer context or history . LSTMs
the original property graph, and then train Long -Short Term are a variant of RNNs that are specially designed to track
Memory (LSTM ) Neural Network to process information at longer- term dependencies .
each scale and emit a decision . [0016 ] However, as we remember a longer history and
[ 0012 ] In one context the event detection approach func seek to predict a specific class of anomaly , it is important to
tions similarly to the process of human reading. As humans identify which subset of machine- to -machine communica
read through a long news or research article, we read a single tions or events on a specific machine has a larger impact on
sentence at a time, while understanding of each sentence is the anomaly . We do not want to put equal emphasis on
accomplished through reading the words which make up the
?? everything that is held in the history or themodel' s memory .
sentence . Our reading and interpretation of the content is Also known as Attention Mechanism , we account for such
robust to the order and selection ofwords, as the same topic variable influence via training another layer of a Neural
can be described using a different combination ofwords , an Network for each of the LSTM models involved in the
obviously in different order. Also , starting from the word overall Neural Network architecture .
level, we constantly aggregate information to a higher level [0017] REASONING AND EXPLAINING RISK : While
of context. Thus as we process a stream of words and detection of an anomalous behavior is an important first step ,
sentences, we are able to draw multiple conclusions while automated verification and articulation of its criticality is
being robust to extreme variation in input. Our method for also important. Any large information technology infrastruc
detecting events in the graph model functions similarly . We ture generates a large number of anomalous event signals
treat individual communications between a pair of every day. Typically , such number overwhelmsthe ability of
machines , or an event such as a user starting a web browser the human workforce to respond to each of them in a timely
like words, the session in which such interactions or events fashion . Also , even when a human expert responds to an
occur like sentences. Therefore , detecting a large- scale event alert, the first step is to reverse engineer the cause of the
such as a new pattern of behavior for a user or a network anomaly , followed by looking ahead and determining the
device is similar to the process of extracting meaning from criticality of the event . Given the current system state and an
sentences in a document. event, the criticality of the event is typically determined by
US 2018/0103052 A1 Apr. 12 , 2018

looking for a high -probability sequence of potential future Knowledge Graph further enhances the applicability . In one
events that can lead to compromise of critical resources in embodiment we compute a set ofmetrics ( defined later) for
the infrastructure . Our second embodiment is aimed at every human -specified question -answer pair . Next, we gen
discovering a chain of potential events connecting the cur erate a large volume of question - answer pairs by first
rent system state to a future " breached ” system state where sampling entity pairs from different parts of the Cyber
an adversary has ownership of critical resources in the Knowledge Graph . Next, we generate candidate paths that
infrastructure . The human expert may not be notified if such connect each of the entities in a query . Since two nodes in
" event sequence " or " attack pathways " are low - probability . the graph can be connected via a large number of paths, we
We refer to this entire reasoning process involving identi only restrict ourselves to selecting discriminative paths
fication of event sequences to a system breach and ascer between an entity pair and then rank them using the afore
taining its likelihood as an “ Explanation Task ” . mentioned metrics.
[0018 ] An explanation task may be executed following [0022 ] Bootstrapping with a small dataset collected from
any anomalous event. If a high -probability " attack pathway ” human user studies , this mechanism enables us to automati
is found, it will be presented to the human expert as an cally expand the training dataset and learn a model with high
" explanation ” for him to investigate the anomaly. Before capacity.
proceeding further, it needs to be recognized that such
reasoning happens over an information representation that is [0023 ] Similar to our event detection method , using the
much abstract than raw events ( such as a user opening a web Deep Learning approach on the Cyber Knowledge Graph
browser or a mobile device communicating with a server ) . model requires us to transform relevant nodes and paths into
Therefore, we introduce " event nodes” in the Cyber Knowl- a continuous vector space representation . Our proposed
edge Graph , and seek to find potential sequences (or paths ) metrics to rank the user-specified question -answer pairs are
of event nodes in Knowledge Graph while leveraging other derived from this continuous vector space information rep
contextual information available in the graph . resentation . One proposed metric is “ Coherence ” . Given a
[0019 ] Finding paths in a graph structure is a well -defined path composed of Cyber Knowledge Graph nodes, its coher
problem in literature. However, it is important to recognize ence is measured in terms of tightness of the nodes in the
that the problem of finding path connecting two events ( from high - dimensional vector space. Another proposed metric is
an anomaly to a potential breach state ) is vastly different “ Specificity ” . Given a path composed of Cyber Knowledge
than earlier formulations . This is because most of the prior Graph nodes, its specificity is measured in terms of the
studies are done in the context of finding shortest paths relative importance of the nodes, and the frequency (or
connecting two nodes in a graph , or finding the shortest path rarity ) of the relationships that connects them .
between two nodes that satisfies user specified constraints . [0024 ] Our user studies indicate the answers with highest
Finding a route on a map with minimum distance or the coherence and specificity are most preferred . Next, we build
shortest route with specific constraints (such as having a gas a Deep Neural Network using two layers of Bi-Directional
station on the way ) are examples of such problem . Unlike Long Short - Term Memory Networks, an Attention Layer ,
the route -finding problem in maps, the best answer to a and a Dense Network layer. Given an event sequence and a
question in the cyber security context (e .g . why is an event pair of query entities, we develop an algorithm to perform a
a potential threat? ) is not completely derivable from a graph walk as guided by the model. Given a graph and two
data -driven model. In most circumstances , human experts query nodes , a graph walk is a process that involves stepping
use “ background knowledge ” to determine risk , where through intermediate nodes and finding a path that connects
" background knowledge” refers to information obtained the query nodes . While visiting a node during the walk , it
from prior experience , or information that is not explicitly involves exploring all the neighbors of the node to determine
available in the data . The lack of explicit information is where to visit next. We deviate from traditional methods in
driven by practical limitations in data integration , the bottle that our graph walk is steered by the machine learning
neck of human knowledge acquisition . The fast pace of model described as above . Given the query nodes , and a
evolution of software and hardware systems is another partial trajectory the machine -learning model predicts a
reason behind this challenge . Security properties associated point in high -dimensional vector space to visit next. We
with the software and hardware systems in a cyber infra implement a membership operator that provides a reverse
structure evolve faster than we can systematically collect mapping of the vector space point to a node in the graph and
information about them . select it as the next node for visiting .
[0020 ] This motivates us to embrace a human - in - the - loop [0025 ] Finally, we provide a commonplace example of
approach , where we collect examples of question- answer for such “ explanatory questions” and “ answering strategies” .
relevant analytics questions. The questions are posed in the
context of Cyber Knowledge Graph entity pairs and answers Assume that a financial analyst is tasked with discovering
are collected as pathways involving event nodes in the emerging business trends. He finds that Ford Motors have
Cyber Knowledge Graphs. We also prompt human users to recently imported a high volume of Palladium , which is
annotate the answers with a " rank ” and provide justification unusual given its past history of import- export records.
for ranking. Thus, we pose an “ explanation query ” with the pair (Ford
[0021] Next,we train a Deep Neural Network architecture Motors , Palladium ) and seek to find a meaningful path
through a Knowledge Graph that explains their relationship .
using the Long Short- Term Memory (LSTM ) network as a Given the a semantic graph representation of Wikipedia , we
building block . Robust training of the deep neural network find the three following answers:
models requires a large amount of data, which often out
weighs the amount of question -answer pairs we can solicit [0026 ] Ford is a Car Manufacturer, Car Manufacturer
from human experts . The introduction of question - answer make Cars , Cars contain Catalytic Convertors, Cata
metrics in terms of properties derived from the Cyber lytic Convertors uses Palladium .
US 2018/0103052 A1 Apr. 12 , 2018

[0027 ] Ford is a Manufacturer, Manufacturer makes readily apparent to those skilled in this art from the follow
Products, Products use Rare Earth Mineral, Palladium ing detailed description . In the preceding and following
is a Rare Earth Mineral. descriptions I have shown and described only the preferred
[0028 ] Ford has headquarters USA , USA was involved embodiment of the invention , by way of illustration of the
in cold war with Russia , Russia is major supplier of best mode contemplated for carrying out the invention . As
Palladium . will be realized , the invention is capable of modification in
Our ranking metrics indicate that the first answer has highest various respects without departing from the invention .
coherence and specificity . The second answer scores lower Accordingly , the drawings and description of the preferred
with specificity, while the third answer has lowest coher embodiment set forth hereafter are to be regarded as illus
ence . trative in nature , and not as restrictive .
[0029] RISK ESTIMATION AND RECOMMENDA BRIEF DESCRIPTION OF THE DRAWINGS
TION : Another capability of the present disclosure focuses
on recommendations for altering the cyber infrastructure [0033 ] FIG . 1 shows examples of various relationships
configuration to lower the possibility of a security breach . In between entities in a typical system .
one particular arrangement , the invention exists in a com [0034] FIG . 2 shows the formation of a multiscale struc
puter- implemented method of administering a resilient ture or neural network for modeling the attack .
cybersystem and a system for implementing such a method . [0035 ] FIG . 3 shows an application of the method of the
In particular the method includes the steps of creating a present disclosure in one application .
heterogeneous network of networks model of the cybersys [0036 ] FIGS. 4 and 5 shows additional examples of imple
tem using at least one tripartite arrangement containing at mentations of the present disclosure .
least two subgraphs ; each subgraph representing a distinct [0037 ] FIG . 6 shows an example of how graph -based
activity within the cybersystem , analyzing user traffic within models can be effective in determining a profile for a user.
the cyber system utilizing the model to ascertain anomalies [0038 ] FIG . 7 shows the interaction of models of various
within the cyber system and restricting activities such as a types of data can be arranged and presented .
user' s access to a portion of the cyber system based upon [0039 ] FIG . 8 shows an example of event detection in
identification of anomalies using the model. In some cyber network using hierarchical attention model .
instances , these subgraphs could include a user -host graph [0040] FIG . 9 demonstrates the embodiment of explana
and a host - application graph and in some cases the tripartite tion mechanism in a cyber / social graph and demonstrates
networks model could model the interaction between the how a connection between Person X and Malicious Act can
user, the host and the application . In some cases the analysis take place .
of user traffic includes determining the presence of a lateral
attack by finding a graph -walk problem . [0041] FIG . 10 shows a simple network infrastructure .
( 0030 ) The creation of a heterogeneous network of net [0042 ] FIG . 11 shows an embodiment of event detection
works model could include developing an abstraction of a for “ Flu ” using a twitter graph from a known location .
mission (such as an IT network or a node within the IT 10043 ] FIG . 12 shows explanation embodiment in social
network , or a device or account within the IT network ) in network like Facebook .
terms of concrete entities whose behavior can be monitored [0044] FIG . 13 shows an example of an explanation for an
and controlled . The restriction of access within the system alternative application such as in a stock market scenario
could be instigated based upon the identification of calcu 10045 ] FIG . 14 shows examples of a screenshot from on
lated reachability score above a predefined threshold . The instantiation of the invention .
predefined threshold in some instances can be coordinated
with various graded levels of access or activity restriction DETAILED DESCRIPTION OF THE
within the system . In some application the step of restricting EMBODIMENTS
further restricting access within the system includes calcu [0046 ] The description provided on the following pages
lating a reachability score reflective of the scope of a includes various examples and embodiments . It will be clear
potential attack , and tailoring the scope ofrestrictions to the from this description of the disclosure that the invention is
areas within the scope . This calculation can be performed not limited to these illustrated embodiments but that it also
with a greedy algorithm or other process appropriate for includes a variety of modifications and embodiments
performing such a calculation . thereto . Therefore the present description should be seen as
[0031 ] A resulting implementation of the method could illustrative and not limiting . While the descriptions are
result in a cybersecurity system having a security feature susceptible of various modifications and alternative con
that includes a heterogeneous network of networks model of structions, it should be understood, that there is no intention
the cybersystem , the model comprising at least one tripartite to limit the invention to the specific form disclosed.
arrangement having at least two subgraphs, that monitors 10047 ] Building resilient systems for use both today and
and reviews activity in the system and identifies and assess tomorrow requires algorithms to continuously answer ques
threats to the system and provides information to trigger tions akin to those posed to human experts : 1 ) is there a
compensatory measures to restrict access or activity within problem ? 2 ) What kind of problem is it ? 3 ) How can we
the system or otherwise mollify potential dangers to the change the system to dealwith the problem ? 4 ) What are the
system or portion of the system . In some instances the implications of these changes ? These questions naturally
tripartite model includes subgraphs representing a user -host map into decision making processes such as OODA loop
graph and a host- application graph or a user-host-application that involves observe ( collecting information ), orient (pro
model. cessing the in - formation ), decide (determining a course of
[0032 ] Various advantages and novel features of the pres action ) and act implementing the action ) phases. In this
ent invention are described herein and will become further disclosure these simple , critical questions are used to pro
US 2018/0103052 A1 Apr. 12 , 2018

pose solutions based on machine learning techniques valid user identities by compromising various non -descript
applied to graph theoretic models . Graphs have generally systems those users had used or logged in . By augmenting
been popular for modeling the connectivity across multiple the anomaly detection process to attribution of attack tech
entities, such as a network ofmachines . However, defending niques and intent, we dramatically expands the monitoring
today' s complex cyber infrastructures requires thinking in scope in a security infrastructure .
terms of graphs not only at the network traffic level, but [0051 ] Our data model and methodology is based on the
understanding security dependencies that are created by concept of graphs . A graph is defined as a data structure
relations between users, machines , and applications. describing the relationship (also referred to as edges )
[ 0048 ] In one set of embodiments these security depen between a set of entities ( also referred to as nodes ). For
dencies and relations can be categorized into various types, example , we can model network traffic as a graph with every
for example : Relations induced by Access Control: Typi unique internet protocol ( IP ) address represented as a node
cally , a userhasmultiple registered devices that interact with in the graph . Communications between the IP addresses are
the system . Therefore, infiltration of one device can poten summarized and represented as edges between the nodes. In
tially compromise other devices registered to the same other words, if two IP addresses exchange 10 , 000 packets
account. It is possible that no direct traffic between two during the observed period , we will summarize all of the
devices is discovered , but both may be controlled by an communications as a single edge and use the total
attacker from outside of the enterprise . Relations between exchanged bytes or packet duration as edge weight. The set
Applications: A content management system (CMS) may of entities and their relationships can belong to multiple
pull information from a database hosted in a different part of classes . In one embodiment we utilized a tri-partite cyber
the network . If the CMS is tied to a mission critical network model that captures relationships between three
application, its availability and integrity now are tied to the classes of entities (hence the name tri - partite ): In this
vulnerability of the database server , the network host where example this included Network hosts : a network enabled
the database service is running, and the user' s security device such as a personal computer, a mobile phone or a
configuration running the service ( e . g ., password strength , computer server. The host can have attributes such as
etc .). Many attacks begin with access obtained through resource type , unique ID . Users: an account used to control
social engineering techniques , such as sending phishing or perform actions on the machines, and Applications:
emails . Once a user clicks on such an email, the attackers software performing various local or cross -network actions
find a foothold within the enterprise and can traverse across the hosts. The relationships across these features can
through the network exploiting security dependencies . As also be modeled and correlated . As shown in FIG . 1 rela
the attackers navigate the complex web of security depen tionships across these entities could include 1 . Has -access
dencies, the defenders and the algorithms assisting them ( user, host ): Indicating user has access to host. Such rela
need to think in the same realm . tionship can also include an is - admin attribute and an
[0049] In one embodimentwe utilize three techniques to application- set attribute capturing the services or applica
address these issues. First, we describe a way to detect tions it can access on the host. 2 . Low -level event (host, host,
anomalies using a combination of graph -based algorithms application ): Indicating a network flow from one host to
and neural network . Second , we determine if an anomalous another through a specific application protocol. It may also
event presents a risk to an enterprise and automatically indicate an event that happens locally on a single host such
explain the risk to a user via historical example . Third , we as starting a browser without connecting to any server.
model complex interactions between users, machines, and [0052 ] From this framework a series of deep learning
applications and recommend actions to exploit this structure approaches were applied to the information in the tripartite
to minimize risk around impacted resources or users . The network model. Decisions by human analysts are not
following descriptions provide various examples of the directly driven by low -level information such as network
implementation of such techniques. traffic or event logs . They are correlated with knowledge
[0050 ] Our first technique has a dual focus. We seek to such as network topology, access control configurations , and
detect anomalous behaviors in a cyber system and map the behavioral patterns of different applications, services and
anomalous event to a specific attack tactics such as privilege users . Therefore , wemodel complex concepts such as events
escalation , defense evasion , command and control etc . Fur as a combination of entities drawn from the above set and
ther, we seek to map the attack tactics to known groups of their interactions. Insights derived from these data are fur
attackers . Such mapping allows an organization to deter ther distilled to higher level concepts such as attack tech
mine attacker ' s intent and reason about which resources to niques , and even potential actors and their intents . Formally ,
defend with higher priority. In the rest of the discussion , we such hierarchies can be represented by tree data structures as
refer to this task as dual-task of Anomaly Detection and shown below . A principal idea behind Deep Learning is to
Attribution (ADA ). Detecting an anomalous event alone is learn complex representations by composing simpler repre
not enough or useful for a cyber defender. The information sentations in a hierarchical or multi- layer fashion . Thus , we
technology infrastructure of any organization generates a define the ADA task as a tree- classification problem where
large volume of anomalous events every day . Also , the state the leaves of the tree are drawn from events described over
of the art systems typically detects the anomaly and leaves the tri -partite network model.
the determination of attack techniques and further projection [0053] FIG . 2 shows the formation of a multiscale struc
of intent to the human expert. The volume of events often ture or neural network for modeling the attack . We use a
outgrows the available resources in a security team . There multi - scale neural network approach to perform this task .
fore , most defense strategies focus on responding to anoma The first neural network operates on events described in
lies around high -valued or well-known resources , which is terms of entities and relationships from the tri-partite net
a limiting approach to begin with . Post-mortem analysis work model. Low -level events are aggregated for entities of
from most security breaches reveal that attackers assumed interest as drawn from the tri-partite network to form a
US 2018/0103052 A1 Apr. 12 , 2018

sequence . The second is then input into a neural network that (unordered sequence ) or a sequence of entities , an attention
generates a probability of the sequence corresponding to an function needs to compute the importance of that entity in
attack activity. The second neural network aggregates out that context. We compute the attention functions by extract
puts from the first network . At the second layer, the aggre ing node -level graph metrics (such as degree , page rank and
gation need not be restricted to single - entity levels . We distributional statistics derived from label and motif distri
process the tri - partite network model to generate multiple butions ) from the tri-partite network model and the derived
groups of interest . For example , a group of interest refers to multi -scale graph model.
a set of users with same role , where role is determined by [0057 ] Detection of an anomaly is followed by attribution.
behavioral patterns observed in the host-host interactions as Attribution is a critical step for explaining the evolution of
well as use of applications. Similarly , a group is defined to an attack or confirming the presence of a security breach . We
represent service clusters. A service cluster represents a set describe the task of attribution as returning a subset of data
of hosts providing application services that are related by the that provides a concrete example of the hypothesized attack
same dependency . or breach . Most of the state of the art anomaly detection
[ 0054 ] Generally, we extract such groups using role -min techniques focus on computing a point-wise or entity -level
ing techniques, where a role is aimed to group nodes in a divergence score , where the score indicates the degree to
graph with similar features , whereas the features may be which the entity deviates from its normative pattern of
both structural (machines that connect to many other behavior . Some systems also report pairwise anomalies,
machines primarily through specific applications are whereas a score is computed to measure the strength of
denoted as servers ) as well as attributes of the nodes and association between two entities. Typically , such scores are
edge . Such attributes include: the total duration of all computed via Link Prediction techniques . However, such
interactions a node participated in over the course of a day, computation of scores are limiting for attribution , or effi
the number of said interactions , the number of unique ciently explaining the event that caused the anomaly . For
applications in communications where the given node is the example , consider a software engineer who frequently logs
source IP , the number of protocols used in interactions into a computational cluster from his desktop . Assume that
involving the given node, and the total numbers of packets he also logs into company 's email server from his mobile
and bytes exchanged in communications involving the given using a virtual private network ( VPN ) gateway. However , if
node. we observe a login from his mobile to the computational
[ 0055 ] The neural networks classify sequences that repre system through the gateway, it will need to be flagged or
sent nodes in a graph . The graph may represent the tri- partite reported . Using pointwise scores ( that employees action
network model itself, or another graph such as the multi deviated from normal) or pairwise scores (that unusual login
scale tree - structure derived from the tri- partite network occurred from mobile to the VPN gateway ) is limiting as it
model. Processing of the graph data structures using neural does not fully explain the context. It leads to loss of
networks require a transformation of the graph data , in information or important contextual information.
which the nodes of the graph are mapped to a high [0058 ] This motivates us to consider approaches to use
dimensional vector space embeddings. In one example we data representations that are more descriptive than an ( entity ,
used the well - known node2vec algorithm for learning these score ) pair, or (entity -pair , score ) pair. Instead , we use a
embeddings . FIG . 3 shows an application of the method of subgraph -based representation to capture the event with
the present disclosure in one application . In this example the complete context. Next, we translate the event graph into a
composition ofmultiple neural networks that operate on the sequence using a canonical ordering defined by a domain
multi- scale structure are derived from a graph model repre based ontology . The canonical ordering ensures that events
senting collective behavior of users,machines and applica of similar types are translated into the same sequence even
tions in multi -source cyber applications . if their data elements appear in the data in differing order.
[0056 ] Attention mechanisms have been shown to be a Not all events are reported with the same level of informa
critical performance driver for deep learning algorithms. tion . In fact, it is rather expected to have different data
Attention mechanisms allow us to specify the importance of collection modules represent the same event in different
an entity in a given context. They allow the model to learn ways depending on their version and underlying software
its relative importance in the given context, without treating platforms. Finally , we use a neural network based algorithm
every entity with equal importance . Such mechanisms are to process the sequence generated from the event subgraph
continuously reflected in human conversations, where the and rank them .
relevance or value of a word , whether it is a name of an [0059] In one example we specifically used a variant of
entity, a verb or adjective , strongly depends on the context. Recurrent Neural Network such as Long -Short Term
In the cyber parlance , the communication with a specific Memory Network (LSTM ) to learn the vector space embed
class of server can be used to establish a baseline behavior dings for each node in the graph . Given a sequence repre
of a group of users with same role . However, communica sentation of an event subgraph , we compute the following
tion with the same server can represent an anomaly for a user scores: Coherence : Wedefine the coherence of a sequence as
in a different role . The attention mechanism also changes the variance of sequence elements ’ vector space embed
depending on the target context. Some attack techniques dings . Specificity : Given a sequence of elements represent
( such as the variants of a ping sweep ) are employed by ing graph nodes , we measure specificity as the median of
multiple attacker groups, and is often a first step for explor degree (or other graph - theoretic metric such as PageRank or
ing a breached infrastructure. Therefore, while observing a Centrality ) values extracted from the sequence . Assuming
" ping sweep ” event is useful to conclude an attack activity the graph node embeddings are learned with sufficiently
from low - level event observations , it is not very useful to high accuracy, our experiments and user studies reveal high
determine the specific intent behind the attack . Specifically, effectiveness of these metrics to revealmeaningful relation
given an entity and a specific context represent as a set ships between a set of entities . Most importantly, these
US 2018/0103052 A1 Apr. 12 , 2018

metrics provide with a means to evaluate the likelihood of a (shown in green ), where another person “ Person Z ” under
subgraph occurring with a collection of entities in the “ Personal Stress " performed a malicious act.
absence of any labeled data . In an ideal condition , when [0064] FIG . 10 shows a simple network infrastructure.
labels are available for each such sequence, we can proceed Now , imagine we get some system alerts on a developer' s
further to train a model and classify the subgraphs as belong system . An analyst does not solely act on the alert. He
to specific categories. Examples of such arrangements are accounts for recent observed behavior in actual, high - vol
found in FIGS. 4 and 5 . ume observations (such as network traffic , event logs ) and
[0060 ] We illustrate the impact of the disclosure in mul determines if the involved resource or user account is
tiple cyber application scenarios as follows: Consider John , compromised . He also maps the event to a phase in the
an employee working on a Department of Energy project kill- chain , or tries to infer about potential attacker types and
who launches a job in his organizational cloud - based envi their intents. We refer to this as phase 1 and denote as
ronment. Our goal is to categorize that activity as normal or Abstract Event Detection task . In short , phase 1 will aim to
unusual. FIG . 6 shows how graph -based models can be connect an event reported by the analytics to an attacker .
effective in determining a profile for John and establish Next, the analyst tries to reason about risk . If a host or a user
normative behavior. We first develop a behavioral profile for is compromised , does it present a risk to the enterprise ? This
John , which is obtained by aggregating multiple data requires finding a potential path from the compromised
sources into a single knowledge graph . As FIG . 6 shows, resource to a compromised state of a high - value target. We
each edge between nodes may indicate a different relation , denote this as phase 2 and denote as Explanation (or
potentially learned from a different data source . FIG . 6 is a reasoning ) task .
knowledge graph capturing an employee 's profile . The text [0065 ] FIG . 11 shows an embodiment of event detection
in uppercase points to a potential data source from which the for “ Flu ” using a twitter graph from a known location . A
relation was discovered . location consists of group of users and users can be mapped
[0061] The following illustrates a graph -based model of an to set of keywords . The hierarchical attention model will be
event in which John launched an unusual job on a cluster. used to detect event of interest at a given location using
Starting with a basic job description , we use the graph keyword that are relevant to a given event. FIG . 12 shows
structure to infer the type of software or systems he is likely explanation embodiment in social network like Facebook .
to use for daily work . For example , researchers working on As indicated , a person of interest may be connected to
machine learning techniques are more likely to use graphi terrorist organization through meaningless paths like “ living
cal processing unit based systems. We extend such inference in USA ” . Explanation generation using coherence and speci
abilities by studying the context of John 's projects that ficity captures the relevant path as the chain of “ friendship ”
allows us to reason about the network resources he should relations shown in green . FIG . 13 shows an example of an
use . At this point, we know that John is a statistician working explanation for an alternative application such as in a stock
on high - energy physics projects . Finally, we ingest high market scenario attempting to answer the question : Why
speed telemetry and log data streams and observe that John “ Ford imported Palladium ” Explanation model using coher
ence and specificity clearly distinguishes answer at right
launched a " SolarWinds” process, a well-known network (Ford - > US - > UN - > Russia - > Palladium ) as being unrelated to
mapping tool (see FIG . 7 ). This is clearly inconsistent with the question posed . Further between answer at left , vs one in
the behavioral patterns we have learned about statisticians middle , a human may prefer a more “ informative” answer
and high -energy physicists . By continuously running queries
that aim to detect behavior that is inconsistent with historic rather than a generic answer.
patterns, we conclude that John initiated a computing task [0066 ] Defining lateral movements as graph -walks allows
that can be risky for the overall enterprise . FIG . 7 shows an us to determine which nodes in the tripartite graph can be
example of how multiple data sources (high -speed telemetry reached starting at a given node. From an attacker' s per
data and behavioral patterns obtained from knowledge spective , these nodes that can be “ reached ” are exactly those
graph ) can be unified in a single model for decision -making . mission components that can be attacked and compromised
via exploits. The greater the number of nodes that can be
[0062 ] FIG . 8 shows an example of event detection in reached by the attacker , the more “ damage” the attacker can
cyber network using hierarchical attention model. FIG . 8 render to the mission or resource . Given a system snapshot
shows embodiment of event detection using hierarchical and a compromised workstation or mobile device, we define
model in a cyber event graph . The green nodes indicate low the “ Attacker 's Reachability ” as a measure that estimates the
level events observed in network or machines, blue nodes number of hosts at risk through a given number of system
indicate an attack type and red node indicates the attacker exploits . From a defender ' s perspective , putting some defen
group . An attacker group can be mapped to its known attack sive control on one of these nodes (or edges) allows the walk
signatures, while each attack type can be mapped to to be broken at that point and the mission to be protected .
sequence of low level events. The hierarchical model learns Such walk breaks can then be used to identify mission
these dependencies, enabling the identification of low level hardening strategies that reduce risk .
event to higher level attack types; and subsequently to a set [0067] Given the heterogeneity of a cyber system we
of set of attacker groups . consider effective hardening strategies from different per
[0063 ] FIG . 9 demonstrates the embodiment of explana spectives . Rather than focusing on manipulating the network
tion mechanism in a cyber/ social graph and demonstrates topology under the assumption that the graph is homoge
how a connection between Person X and Malicious Act can neous (all nodes have an identical role in a cyber system ), we
take place . As indicated multiple paths may exist in the instead present a framework formodeling lateral movement
graph connecting “ Person X ” to “Malicious Act ” . Using attacks and propose a methodology for designing mission
path coherence and specificity metrics for attention , we critical cyber systems that are resilient against such attacks.
identify a reasonable explanation using historical example By modeling the enterprise as a tripartite network reflective
US 2018/0103052 A1 Apr. 12 , 2018

of the interaction between users, machines and applications nodes (or edges ) allows the walk to be broken at that point.
and propose a setof procedures to harden and strengthen the The heterogeneity of a cyber system entails a network of
network by increasing the cost of lateral movement and networks (NoN ) representation of entities in the system
preventing intrusions and predefined locations. allowing us to devise effective hardening strategies from
[0068 ] In one example we model lateral movement attack different perspectives. Examples of a screenshot from such
on a mission as a graph -walk problem on a tripartite user a configuration is found in FIG . 14 .
host application network that logically comprises two sub 10072 ] This differs from other models which focus on
graphs : a user-host graph and a host- application graph . The manipulating the network topology under the assumption
tripartite network consisting of a set of users, a set of hosts that the graph is homogeneous , that is, all nodes have an
and a set of applications is shown and modeled (examples of identical role in a cyber system . Ourmodel commences from
modeling tools for such activities could be a Markov Ran a distinct premise namely that the cyber system is hetero
dom Field . In addition , interaction between the network and geneous and thus incorporates several defensive actions for
a user, their device, and a particular application are then also enhancing the resilience to lateral movement attacks. Fur
modeled . (While these items are described it is to be ther examples and details of the proposed method are
understood that a variety of other items including load outlined in the additional information .
characteristics, access control, user role and network flow or [0073] By modeling lateral movement attacks as graph
other features. Preferably this is done with learned data but walks on a user -host - application tripartite graph , we can
it could also be done based upon a preprogrammed profile specify the dominant factors affecting attacker ' s reachability
instead . and propose greedy algorithms for network configuration
[0069 ] From these inputs and against these models , a changes to reduce the attacker 's reachability of various parts
series of comparisons can be made . This can be done for of the system . Wedescribe and characterize the effectiveness
example , using a Gibbs sampling to draw from these dis of three types of defensive controls against lateralmovement
tributions and to reflect various policies so as to protect the attacks, each of which can be abstracted via a node or edge
mission . With the understanding of these distributions, vari operation on the tripartite graph . We provide quantifiable
ous policies can be put in place to protect the network when demonstrations on the performance loss of the proposed
an out of place element is detected . The present methodol greedy algorithms relative to the optimal batch algorithms
ogy enables a System to be Protected Against Lateral that have combinatorial computation complexity. We apply
Attacks with Significantly Fewer (Approximately 1/3 ) our algorithms to a real tripartite network dataset collected
changes to the system and with less disruption to the from a large enterprise and demonstrate that the proposed
activities of the legitimate user. If a user (Charlie ) accesses approaches can significantly constrain attacker ' s reachabil
a network at a certain time each day (8 am ) from a particular ity and hence provide effective configuration recommenda
machine or device (his tablet) and accesses a select number tions to secure the system .
of applications ( email) a profile can be established and [0074 ] While described in a particular paradigm this
modeled mathematically as a distribution of such a combi model and its associated techniques have applicability in
nation of such factors by segmentation on a user -host access many other security domains. Two such areas are security in
graph . Against this model then activities from this user can online social networks (OSN ) and in mobile networks . The
then be compared and deviations from the model identified problem space in OSNs that we have in mind is that of
and analyzed and as necessary actions can be taken with attackers creating fake accounts to target the trust underpin
regard to the particular user on account of their out ofmodel nings of the OSN . The attacker using fake accounts estab
behavior so as to prevent lateral movement within the lishes trust relationships with unsuspecting users . This can
system and to protect mission components . then be used to launch different types of attacks on the OSN
[ 0070 ] For example , if user Charlie modifies his access such as spam distribution , wasting advertising clients '
configuration by disabling the access of the existing account resources by making the clients pay for spurious ad clicks or
( Charlie -2 ) to host H3 and by creating a new user account privacy breaches for users of OSN . Preventing lateral move
(Charlie - 1 ) for accessing H3 . An attacker cannot reach the ment in mobile networks is even more challenging , and one
data server H5 though the printer H3 if Charlie - 2 is com can easily see the necessity for constructing a heterogeneous
promised . In addition to this measure , othermeasures such graph model as we have done .
as edge hardening in host- application graph via additional 10075 ] In another aspect segmentation on user- host graph
firewall rules on all network flows to H5 through HTTP, can be used as a countermeasure for suppressing lateral
Node hardening in host-application graph via system update movement attacks . Segmentation works by creating new
or security patch installation on H5 and defining lateral user accounts to separate user -host in order to alleviate the
movements as graph walks allows us to determine which reachability of lateralmovement. Segmentation can remove
nodes in the tripartite graph can be reached starting at a some edges from the access graph GC and then merge these
given node. removed edges to create new user accounts . In doing so the
[0071] From an attacker 's perspective , the nodes that can sameaccess functionality is retained while the constraints on
be “ reached ” are the mission components that can be lateral movement attacks at the price of additional user
attacked and compromised via exploits . The greater the accounts are described . Various segmentation strategies are
number of nodes that can be reached by the attacker , the described hereafter.
more “ damage” he/she can render to the mission . Given a 10076 ]. In one embodiment; a graph containing 5863 users ,
system snapshot and a compromised workstation or mobile 4474 hosts , 3 applications, 8413 user -host access records
device , we can define the “ Attacker's Reachability ” as a and 6230 host -application -host network flows was accumu
measure that estimates the number of hosts at risk through lated . All experiments assume that the defender has no
a given number of system exploits. From a defender 's knowledge of which nodes are compromised and the
perspective, putting some defensive control on one of these defender only uses the given tripartite network configuration
US 2018/0103052 A1 Apr. 12 , 2018

for segmentation and hardening . To simulate a lateral move ior (authentication mechanisms, social profile of users etc.)
ment attack we randomly select 5 hosts (approximates 0 . 1 % are also envisioned for further extension .
of total host number ) as the initially compromised hosts and [0081 ] While various preferred embodiments of the dis
evaluate the reachability , which is defined as the fraction of closure are shown and described , it is to be distinctly
reachable hosts by propagating on the tripartite graph from understood that this disclosure is not limited thereto butmay
the initially compromised hosts . be variously embodied to practice within the scope of the
[0077 ] The initial node hardening level of each host is following claims. From the foregoing description , it will be
independently and uniformly drawn from the unit interval apparent that various changes may be made without depart
between 0 and 1. The compromise probability matrix P is a ing from the spirit and scope of the disclosure as defined by
random matrix where the fraction of nonzero entries is set to the following claims.
be 10 % and each nonzero entry is independently and uni What is claimed is :
formly drawn from the unit internal between 0 and 1. The 1 . A computer-implemented method of administering a
compromise probability after handing, kj, is set to be 10 2 5 resilient cyber system the method comprising the steps of:
for all k and j. All experimental results are averaged over 10 creating a heterogeneous network of networks model of
trials. Various segmentation strategies proposed on the user the cyber system using a knowledge graph comprising
host graph . In particular a greedy host -first segmentation at least one tripartite arrangement containing at least
strategy has been shown to be an effective approach to two subgraphs, each subgraph representing a distinct
constrain reachability given the same number of segmented activity within the cyber system ;
edges because accesses to high connectivity hosts (i.e., analyzing user traffic within said cyber system utilizing
hubs ) are segmented . For example , segmenting 15 % of the model to ascertain anomalies within the cyber
user -host accesses can reduce the reachability to nearly one system ; and
third of its initial value . Greedy segmentation with score suggesting modifications to the cyber system to restrict
recalculation is shown to be more effective than that without
score recalculation since it is adaptive to user-host access activity within the cyber-system based upon the iden
modification during segmentation . Greedy user - first seg tified anomalies using the model .
mentation strategy is not as effective as the other strategies 2 . The method of claim 1 wherein the sub - graphs include
since segmentation does not enforce any user -host access a user -host graph and a host- application graph .
reduction and therefore after segmentation a user can still 3 . The method of claim 1 wherein the tripartite networks
access the hosts but with different accounts. model is a user-host-application model.
10078 ]. In practice a user might be reluctant to use many 4 . The method of claim 1 wherein the step of analyzing
accounts to pursue his /her daily jobs even though doing so user traffic includes determining the presence of a lateral
can greatly mitigate the risk from lateralmovement attacks. attack by finding a graph -walk problem .
We also investigate the impact of user -host access informa 5 . The method of claim 1 wherein the step of suggestion
tion asymmetry on lateral movement attacks. Information further comprises the step of calculating a reachability score
asymmetry means that the defender uses complete user -host reflective of the scope of a potential attack , and tailoring the
access information for segmentation whereas the attacker scope of restrictions to the areas within the scope.
launching a lateral movement attack can only leverage the 6 . The method of claim 5 wherein the step of restricting
known user-host- access information . Lateral movement further comprises calculating a scope of restriction using a
attacks can be constrained when sufficient segmentation is greedy algorithm .
implemented and the user -host access information is limited 7 . The method of claim 1 further comprising the step of
to an attacker, otherwise a surge in reachability is expected . augmenting knowledge graph with information in a data
10079 ] Various hardening strategies can be proposed . stream to detect an abstract event” or a “ kill - chain ” event
Greedy edge hardening strategies with and without score based upon the modification of the changes in the modeled
recalculation have similar performance in reachability knowledge graph .
reduction , and they outperform the greedy heuristic strategy 8 . Themethod of claim 1 further comprising the step of
that hardens edges of highest compromise probability . The training the modeled knowledge graph using a Long -Short
proposed edge hardening strategies indeed appear to find the Term Memory (LSTM ) Neural Network to process infor
nontrivial edges affecting lateral movement. Node hardening mation at each scale and emit a decision .
strategies using the node score function _ and _ J lead to 9 . The method of claim 1 wherein the modeled knowledge
similar performance in reachability reduction , and they graph represents a hierarchical aggregation of information ,
outperform the greedy heuristic strategy that hardens nodes from high -resolution , narrower context to coarser -resolu
of lowest hardening level. tion , broader context.
[ 0080 ] Our experiments performed on a real dataset dem 10 . The method of claim 1 wherein the graph model is
onstrate the value and powerful insights from this unified transformed into a tabular representation where the features
model with respect to analysis performed on a single dimen learnt on every node preserve the topological properties
sional dataset. Through the tripartite graph model, the domi associated with its neighborhood .
nant factors affecting lateral movement are identified and 11 . The method of claim 10 wherein each node in the
effective algorithms are proposed to constrain the reachabil graph model becomes a point in multidimensional vector
ity with theoretical performance guarantees . The results space .
showed that our proposed approach can effectively contain 12 . The method of claim 11 wherein variable influence on
lateral movements by incorporating the heterogeneity of the the model is accounted for by training another layer of a
system . Generalizing the described applications into work neuralnetwork for each of the models involved in the overall
for k -partite networks to model other dimensions of behav neural network architecture .
US 2018/0103052 A1 Apr. 12 , 2018
10
13. Themethod of claim 12 further comprising the step of
creating additional layers in the model based upon succes
sively higher levels of coherence and specificity between
nodes .
14 . The method of claim 13 further comprising the step of
performing a graph walk to find a path that connects nodes .
15 . A cybersecurity system comprising :
a security feature that includes a series of circuitry that
operates a heterogeneous network of networks model
of the cyber system , the model comprising at least one
tripartite knowledge graph having at least two sub
graphs that perform at least one task selected from the
group consisting of modeling behavioral information of
each entity in the system , detecting anomalous events
in the system , providing an explanation ranking for
cyber -alerts , estimating risks and making recommen
dations by utilizing node trained modifications to the
cyber knowledge graph and perform graph walks
between nodes located therein .
* * * * *

You might also like