
IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. 18, NO. 2, JUNE 2021

From TTP to IoC: Advanced Persistent Graphs for Threat Hunting

Aimad Berady, Mathieu Jaume, Valérie Viet Triem Tong, and Gilles Guette

Abstract—Defenders fighting against Advanced Persistent Threats need to discover the propagation area of an adversary as quickly as possible. This discovery takes place through a phase of an incident response operation called Threat Hunting, where defenders track down attackers within the compromised network. In this article, we propose a formal model that dissects and abstracts elements of an attack, from both attacker and defender perspectives. This model leads to the construction of two persistent graphs on a common set of objects and components allowing for (1) an omniscient actor to compare, for both defender and attacker, the gap in knowledge and perceptions; (2) the attacker to become aware of the traces left on the targeted network; (3) the defender to improve the quality of Threat Hunting by identifying false-positives and adapting logging policy to be oriented for investigations. In this article, we challenge this model using an attack campaign mimicking APT29, a real-world threat, in a scenario designed by the MITRE Corporation. We measure the quality of the defensive architecture experimentally and then determine the most effective strategy to exploit data collected by the defender in order to extract actionable Cyber Threat Intelligence, and finally unveil the attacker.

Index Terms—Advanced persistent threat, tactics techniques procedures, threat hunting, IOC, SIEM.

Manuscript received October 14, 2020; revised January 5, 2021 and January 27, 2021; accepted February 1, 2021. Date of publication February 3, 2021; date of current version June 10, 2021. The associate editor coordinating the review of this article and approving it for publication was S. Scott-Hayward. (Corresponding author: Aimad Berady.) Aimad Berady, Valérie Viet Triem Tong, and Gilles Guette are with CentraleSupélec, Inria, University Rennes, CNRS, IRISA, 35042 Rennes, France (e-mail: aimad.berady@irisa.fr; valerie.viet_triem_tong@irisa.fr; gilles.guette@univ-rennes1.fr). Mathieu Jaume is with Sorbonne Université, CNRS, LIP6, 75005 Paris, France (e-mail: mathieu.jaume@lip6.fr). Digital Object Identifier 10.1109/TNSM.2021.3056999

I. INTRODUCTION

THE RISE of collective awareness of Advanced Persistent Threats (APT) has required companies to reconsider their approach to cybersecurity. The resilience level of a company's information system can be challenged today through a Red versus Blue exercise that simulates a realistic attack campaign, aimed at reproducing the behavior of the adversary. In these exercises, the Blue Team seeks the Red Team in a Threat Hunting operation [1], while the Red Team tries to carry out their attack as stealthily as possible. These exercises allow for testing the security controls, sensors, Security Information and Event Management (SIEM), and incident response processes. In such a game an attacker, the Red Team, intends to break into the infrastructure defended by the Blue Team. Such exercises of attacking and defending are inspired by similar military maneuvers whose objectives are to test the soldier readiness and attack effectiveness through simulations. In cybersecurity, these exercises help organizations keep their assets safe. The Red Team is composed of highly trained individuals playing the role of potential attackers motivated by a strategic objective (e.g., stealing sensitive information, using organizations' capabilities for malicious purposes, defeating the availability of victim's services). The Blue Team defends the company, and has to ensure that its assets are not compromised, in the event of the Red Team finding a vulnerability and exploiting it. The Blue Team thus needs to rapidly remediate the incident to control the Red Team's network propagation and contain the threat. To estimate the effectiveness of their respective games, we can naively measure the time it took the Red Team to dominate the target and the time it took the Blue Team to detect and respond to the attack. We believe that this measure would be greatly improved with knowledge of the compromised components, by the Red Team, from the victim's network (i.e., its propagation area) and how aware the Blue Team was of this.

In this article, we formalize both defensive processes and the attacker's offensive approaches, allowing for confronting their respective perceptions during the same attack campaign. The attacker's perception of the campaign is built from (1) the execution of his procedures chosen from among his Tactics, Techniques and Procedures (TTP) [2]; (2) his exposed resources during these executions; and (3) the victim's components he compromised. The defender's perception of the attack is built from (1) the collected traces on the targeted information system; and (2) the exploitation of these traces through his defensive procedures. The benefits of the proposed model are twofold. First of all, it provides a high-level representation of the attack campaign, allowing to quickly assess the attacker and defender progressions. This representation can also be used by an omniscient third party to measure the success of a Red versus Blue exercise. Second, the model highlights how the defender can improve the efficiency of his detection process by tweaking the input (configurations and rules) of some of his defensive procedures. Here, we conduct an experiment with our model, using an attack campaign issued from the public project Mordor [3]. This attack campaign mimics the real-world threat APT29 in a scenario designed by MITRE for the purposes of ATT&CK Evaluations [4]. The confrontation of these two perceptions allows us to define a metric to estimate the deployed detection chain efficiency.

This article is structured as follows. Section II provides an overview of the concepts manipulated in the model.

Sections III and IV present respectively the attacker's and the defender's perspectives. Section V details the experiment architecture and specifies requirements for integration in existing infrastructures. Section VI discusses how to evaluate the efficiency of the detection chain and how to enhance it.

II. OVERVIEW

This article formalizes the Threat Hunting process conducted by an incident response team, towards ultimately evaluating the efficiency of a detection chain. We start our process with the attacker's point of view; the attacker has initiative and sets the tempo.

We begin by specifying the scope of this study and explaining the terms we will use in the rest of this article. Thus, this section successively details the infrastructure under consideration, the attacker's scope, and then the defender's scope, before outlining how we will be able to compare their two points of view with the help of two graphs.

A. The Infrastructure

The targeted network is the information system hosting the attacker's final objective. In this network, we distinguish the components from the objects. The components are assets (i.e., machines) with logging capabilities over which the defender has full control, such as computers, servers, or appliances. The objects are measurable events or stateful properties relative to malware characterization, intrusion detection, incident response or digital forensics. These objects correspond to the observable objects defined in the STIX standard [5]. Each object plays a precise role in the context of an event. This role specifies the function that the object holds in the event. The set of possible roles is here denoted by R. Each role r is associated with a unique type that specifies the nature of any object playing this role. By denoting T the set of possible types, the type associated with a role is defined through the function τ : R → T.

For our implementation, we used the MITRE Cyber Analytics Repository (CAR) [6] data model to name objects' roles. An object can play different roles associated with different types. For example, the object maliciousfile.com can hold the role destination hostname with the type domain, the role file name, or even the role executable with the type file. The three columns on the left of Table I detail some examples of objects and their relative types and roles. IP addresses, domain names, or files are examples of frequently observed types.

Although the attacker also has his own machines, which could be components, in this model we only consider the victim's components, since those of the attacker cannot be reached by the defender in a strictly defensive posture. Nevertheless, the defender may have an insight into some of the attacker's objects (e.g., a domain name, a hash value, an IP) because the attacker would have exposed them. The attacker will, however, be able to discover and gain access to part of the victim's components through objects. For these reasons, this article highlights only the set of components of the targeted network, referring to it as C. We denote by OD the set of objects relative to the victim (some of them related to components of C), by OA the set of objects relative to the attacker, and by O the set OD ∪ OA.

B. The Attacker Scope

The attacker can be an individual, a group, or a Red Team, but for the sake of clarity, we simply refer here to the attacker. In the same way, the presence of multiple attackers in the victim's network is not an obstacle, because the ambition of this model is to provide a more exhaustive view of the compromised components. The attacker is at the initiative of the attack campaign and has his own components at his disposal, which are part of his infrastructure. He also owns a collection of attack procedures (denoted TTPA), often related to techniques described in the MITRE ATT&CK matrix. These procedures may be parameterized by objects from OD as well as objects from OA. These procedures are executed on components of the targeted network (in C) only if the attacker has already discovered these components. The attacker scope is completely formalized in Section III.

C. The Defender Scope

The defender can be the victim itself, the security team of a company, an external security team, or a Blue Team. For the same reasons, we simply refer here to the defender. The defender has defensive procedures to cope with an attack campaign. The defender only observes the events occurring on components in C that he has chosen to monitor. The relevant events to be monitored by sensors are specified in their configuration S. The defensive procedure plogs allows him to generate traces from events that occur on these components. The defensive procedure phunting exploits these traces in an attempt to identify in C the components compromised by the attacker. The defender scope is formalized in Section IV.

D. Confronting Perspectives

We propose here to represent the attacker's network propagation during an attack campaign by a persistent graph GA between objects and components. This graph will be computed from the sequence of attack procedures and the involved objects. These objects represent characteristic information that the attacker cannot conceal and is aware of exposing.

Similarly, we represent the defender's perception of the attacker's propagation by a second persistent graph GD. This graph computation relies on the defensive procedures executed by the defender. Figure 1 gives a global overview of this process, including the attacker and the defender scopes.

The knowledge of the attacker and the defender of the targeted network is enriched by their own directory. For both the attacker and the defender, their directory determines how an object with a given type is relative to a component from his own perspective. In the following, DA will be the attacker directory and DD the defender directory. Both DA and DD are possibly imperfect or incomplete; they are here represented by two functions from T × OD to C ∪ {None}.
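As an illustration of the preceding definitions, the following Python sketch (ours, not the authors' implementation) encodes roles, the typing function τ : R → T, and a directory as a partial function from (type, object) pairs to components, with None standing for the absence of a known component. The directory entries shown are hypothetical.

    from typing import Dict, Optional, Tuple

    # tau: each role is associated with a unique type.
    TAU: Dict[str, str] = {
        "destination_hostname": "domain",
        "executable": "file",
        "dest_ip": "ip_address",
        "user": "user",
    }

    # A directory maps (type, object) to a component of C; missing entries resolve to None.
    Directory = Dict[Tuple[str, str], str]

    def resolve(directory: Directory, role: str, obj: str) -> Optional[str]:
        """Return D(tau(role), obj), i.e., the component the object refers to, or None."""
        return directory.get((TAU[role], obj))

    # Hypothetical defender directory: one known mapping, everything else unknown.
    d_defender: Directory = {("ip_address", "10.0.1.4"): "NASHUA.dmevals.local"}
    print(resolve(d_defender, "dest_ip", "10.0.1.4"))                       # NASHUA.dmevals.local
    print(resolve(d_defender, "destination_hostname", "maliciousfile.com"))  # None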


Fig. 1. Model overview.

TABLE I. Examples of Objects, Their Relative Types, Possible Roles, and Existence in Both Directories.

When the attacker (resp. the defender) has no information concerning an object o with a type t, or if the object o with type t does not correspond to a component, then DA(t, o) is equal to None (resp. DD(t, o) = None). The representation of a component c from the attacker's point of view may be different from the representation of the same component c from the defender's point of view. For example, the defender may know that a machine has a given IP address, is a Web server, owns files, and has a name, while the attacker only knows the IP address of this machine. In the Table I example, the two right columns show whether an object with a specific type is related to a component using the attacker and the defender directory.

In the following, Section V explains how the comparison of GA and GD allows confronting the perceptions of the attack campaign from both points of view: that of the attacker and that of the defender. We also show how the Threat Hunting operation can be greatly improved by increasing the quality of the Indicators of Compromise (IoC) and Events of Interest (EoI) used by the defender.

E. Experiment: APT29 Simulated Campaign

In the context of cybersecurity product evaluations, MITRE creates attack scenarios inspired by real-world threats. Two of them concern the so-called APT29, a state-sponsored attacker group that has been active since 2008. During these attacks, the attacker used different procedures to collect and exfiltrate sensitive files from the targeted network after exploring it. These two scenarios detail the steps of an APT campaign using the threat actor's TTPs as they were observed by the infosec community. Subsequently, Roberto Rodriguez [3] was at the initiative of a dataset of logs recorded on an infrastructure allowing the execution of the procedures from these scenarios. We decided to merge these two scenarios because of their similarities in targeting the same infrastructure and the fact that they mimic the same APT actor. Among the traces in the dataset, we focused on those coming from the Sysmon sensor, which is the one recommended according to the state of the art [7]. The experimentation infrastructure is detailed in Section V, and the results of this experiment are discussed in Section VI.

III. ATTACKER'S PERSPECTIVE

The attacker is modeled through a graph of objects and components. The components belong to the targeted network, with which the attacker interacts while executing an offensive procedure. The objects represent the traces that the attacker is aware of leaving on the target infrastructure while executing these procedures. This section details the construction of this graph by taking into account both the actions, which define the whole attack campaign, and the progression of the attacker's knowledge about the targeted network.

A. Actions of the Attacker

An attack campaign A is composed of a sequence e1, . . . , eN of executions of N attack procedures p1, . . . , pN on components in C of the targeted network.


We assume here that the attacker knows at least one initial component of the targeted network. This component could have been discovered by the attacker through an external reconnaissance of the exposed services provided by the targeted network, or even by a social engineering attack. When the attacker has compromised at least one component of the targeted network, he is able to apply attack procedures inside it. Here begins the Network Propagation phase [8].

The execution e of an attack procedure p ∈ TTPA requires the knowledge of:
• a Machine M(e): a component c ∈ C, named through a relative object o and a type t occurring in the attacker's directory DA (thus c = DA(t, o)), on which the procedure will be executed. This machine is typically the host where the procedure will be executed, satisfying a technical intention (also known as a Tactic) such as Privilege Escalation, Discovery or even Persistence;
• some Parameters ((o1, r1), . . . , (on, rn)): objects oi in O with their roles ri that configure the procedure;
• some Sub-Executions (e1, . . . , em): corresponding to the invocation of sub-procedures orchestrated by p and such that, for each ei, the host component M(ei) is accessible through a relative object appearing in ((o1, r1), . . . , (on, rn)), the parameters of p.

An execution e (designated by a unique identifier ide) of an attack procedure p ∈ TTPA on a component c = DA(t, o) ≠ None can thus be formalized by:

e = (ide, p, (o, t), ((o1, r1), . . . , (on, rn)), (e1, . . . , em))

Listing 1 gives the example of an attack procedure commonly used by the attacker. This procedure performs a lateral movement using psexec. It allows the attacker to execute a command-line process on a remote machine and redirect console application output to his local system. In this example, the host component for the main procedure is SCRANTON.dmevals.local, and the component for the sub-procedure is NASHUA.dmevals.local. The attacker launches the malware file named python.exe remotely, with the privileges of the user pbeesly on both sides.

Listing 1. Attack Procedure: Lateral Movement Using psexec.

B. Attacker Propagation in the Targeted Network From the Attacker Point of View

We represent here the attacker propagation by a graph whose nodes are the objects involved in the attack campaign and their potential relative components from the attacker perspective. Each execution of an attack procedure requires the attacker to use objects and components either from his own resources or objects and components he has already discovered in the targeted network. In this model, we consider that the attacker's knowledge about the targeted network is frozen and already fully described through his directory DA. If we wanted to make this directory dynamic, it would be necessary to formalize a Discovery procedure that allows the attacker to improve his knowledge of the targeted network infrastructure.

1) Small Step Propagation: From each execution e of an attack procedure p, we build an oriented graph G(e), whose nodes are the objects and their relative components involved in this execution, referenced by the attacker's directory. This graph represents one step of his attack campaign. Formally, G(e) is defined for e = (ide, p, (o, t), ((o1, r1), . . . , (on, rn)), (e1, . . . , em)) by:

G(e) = (Ve, →e) ⊕ ⊕i=1..m G(ei)

and computed from:
• the nodes Ve, which are the host component, all the objects used in the procedure execution, and their potential relative components known by the attacker:

{M(e), o1, . . . , on} ∪ {c = DA(τ(ri), oi) ≠ None}

• the edges →e connecting the host component with the parameters (objects) and the components linked to these objects according to the attacker's directory:

⋃i=1..n ( { M(e) −(ide, ri)→ oi } ∪ { oi −ref→ c | c = DA(τ(ri), oi) ≠ None } )

• and the union of the graphs ⊕i=1..m G(ei) issued from the sub-procedures called during e. The operator ⊕ denotes here the classical union between two graphs.

Figure 2 presents the graph G(psexec) computed from the specific psexec attack procedure execution previously described in Listing 1. In this graph, dark blue nodes are components and light blue nodes are objects. The labels on thin black edges define the object's role in the procedure. The thick green edges connect objects that are related to another component according to the attacker directory. Given that a Lateral Movement procedure, by definition, involves two components, both are represented on the graph, connected by "ref" edges.
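The construction of G(e) lends itself to a direct prototype. The sketch below is our own reading of the definition above: one labeled edge from the host component M(e) to every parameter object, a ref edge whenever the attacker directory resolves an object to a component, and a plain union with the graphs of the sub-executions; the same union gives GA(A) for a whole campaign. The psexec record at the end is a hypothetical illustration assembled from the values quoted in the text (SCRANTON, NASHUA, python.exe, pbeesly), not a copy of Listing 1.

    def small_step_graph(e, directory, tau):
        """Compute G(e) = (V_e, ->_e) unioned with the graphs of the sub-executions."""
        ide, procedure, (host_obj, host_type), params, sub_execs = e
        host = directory.get((host_type, host_obj))   # M(e): assumed already known to the attacker
        nodes, edges = {host}, set()
        for obj, role in params:
            nodes.add(obj)
            edges.add((host, obj, (ide, role)))        # host --(id_e, r_i)--> o_i
            comp = directory.get((tau[role], obj))
            if comp is not None:                       # o_i --ref--> c when D_A(tau(r_i), o_i) != None
                nodes.add(comp)
                edges.add((obj, comp, "ref"))
        for sub in sub_execs:                          # the operator ⊕ is a plain union of graphs
            sub_nodes, sub_edges = small_step_graph(sub, directory, tau)
            nodes |= sub_nodes
            edges |= sub_edges
        return nodes, edges

    def campaign_graph(executions, directory, tau):
        """G_A(A): union of the small-step graphs of all executions e_1 ... e_N."""
        nodes, edges = set(), set()
        for e in executions:
            n, ed = small_step_graph(e, directory, tau)
            nodes |= n
            edges |= ed
        return nodes, edges

    # Hypothetical psexec execution in the spirit of Listing 1 (objects and components are
    # distinguished here only by how they are used, which is a simplification).
    TAU = {"destination_hostname": "domain", "executable": "file", "user": "user"}
    D_A = {("domain", "SCRANTON.dmevals.local"): "SCRANTON.dmevals.local",
           ("domain", "NASHUA.dmevals.local"): "NASHUA.dmevals.local"}
    e_remote = ("e2", "execute", ("NASHUA.dmevals.local", "domain"),
                (("python.exe", "executable"), ("pbeesly", "user")), ())
    e_psexec = ("e1", "psexec", ("SCRANTON.dmevals.local", "domain"),
                (("NASHUA.dmevals.local", "destination_hostname"), ("pbeesly", "user")), (e_remote,))
    nodes, edges = small_step_graph(e_psexec, D_A, TAU)
    print(len(nodes), len(edges))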


2) Big Step Propagation: Finally, an attack campaign A composed of executions e1, . . . , eN can be modeled by the graph GA(A) resulting from the union of all the graphs associated with each execution: GA(A) = ⊕i=1..N G(ei). The computation of this graph allows the attacker to represent all the objects he exposes during his attack campaign. This graph can be seen as an attack footprint exposed to a defender. Moreover, in a Red versus Blue exercise, this graph allows an omniscient team (i.e., the White Team) to measure the distance between what the Blue Team has actually found and the attacker's actual footprint.

Figure 3 presents the graph GA computed by the attacker during the APT29 simulated campaign in 49 steps corresponding to procedure executions, with 4 compromised components, SCRANTON, NASHUA, NEWYORK and UTICA, which are the four on which the attacker actually executed procedures during the attack campaign. Those 4 components are also defined in his directory.

Fig. 2. G(psexec).

Fig. 3. GA computed by the attacker during APT29 simulated campaign.

IV. DEFENDER'S PERSPECTIVE

We consider the same attack campaign, but now from the defender's perspective. The defender is never considered to have the initiative. The aim of the defender is to compute a representation of the attacker's propagation in the targeted network from his (the defender's) point of view. In other words, his goal is to compute a graph of objects and components that is as similar as possible to the one produced by the attacker on his side. To this end, the defender uses his own defensive procedures. The semantic framework proposed in this article allows for generalizing offensive or defensive procedures. We specify that we have already identified several other tactics that would be generalizable. These include, for example, pdiscovery, which consists, for the attacker, of browsing a namespace to discover new components.

In the following, we describe two main defensive procedures, plogs and phunting, which must be implemented by the defender to conduct incident response operations. Attack campaigns against the targeted network generate events on its components. We represent an event ev by a tuple (ℓ, c, O) where ℓ ∈ E denotes the type of ev (with E the set of event types that can be observed), c ∈ C is the component over which the event is observed, and O = {(o1, r1), . . . , (om, rm)} ⊆ O × R contains all the objects involved in this event, associated with their role, from the defender's point of view.

Depending on the configuration of the sensors, events can generate traces. A trace x is a tuple (idx, t, ℓ, c, O) where idx is the trace identifier, t is a timestamp, ℓ ∈ E is the type of the event causing this trace, c ∈ C is the component where the event causing this trace has been observed, and O contains all the objects, together with their roles, involved in this trace. A single object can assume several roles in a single event and therefore appear several times in the set O. In contrast, each role is unique within the same trace.

In a Threat Hunting process, the defender can influence the quality of his results on three aspects: sensor configuration, detection rules, and his IoC database.

First, the defender decides, through his defensive procedure plogs, which components of the targeted network are monitored by sensors and which events produce traces that will be forwarded to the Security Information and Event Management (SIEM) system. This procedure, detailed in Section IV-A, will generate a huge number of traces, among which the defender has to identify those relative to the attacker's activity. The second defensive procedure, called phunting and detailed in Section IV-B, exploits those traces.

A. Targeted Network Monitoring

Technically, the defender is able to monitor almost any event occurring on the targeted network's components. Nevertheless, many of these events tend to be irrelevant from a security point of view and may generate too many false-positives in a SIEM alerting system.

As the traces raised by the sensors are the only way for the defender to perceive a part of the attacker's activity, the defender has to pay very close attention to the configuration S of the sensors. The more meaningful a trace is, the more valuable it is to the defender.
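Transcribed directly, an event is a tuple (ℓ, c, O) and a trace adds an identifier and a timestamp. The following sketch of plogs is ours: the sensor configuration maps each monitored component to entries (event type, condition, relevant roles), and an event is turned into a trace when an entry of its component matches. The condition is modeled here as an arbitrary predicate over O, standing in for the φ syntax given in the Appendix, and the sample entry (logging network connections made by binaries run from a temporary folder) is hypothetical.

    import itertools, time

    # event: (event_type, component, objects)       objects = frozenset of (object, role) pairs
    # trace: (trace_id, timestamp, event_type, component, objects)
    _ids = itertools.count(1)

    def p_logs(sensor_config, events):
        """Turn the events occurring on monitored components into traces, per configuration S."""
        traces = []
        for event_type, component, objects in events:
            for etype, condition, _relevant_roles in sensor_config.get(component, []):
                if etype == event_type and condition(objects):
                    traces.append(("x%d" % next(_ids), time.time(), event_type, component, objects))
                    break
        return traces

    # Hypothetical configuration entry for one component.
    S = {"NASHUA.dmevals.local": [
            ("NetworkConnect",
             lambda objs: any(r == "image" and o.lower().startswith("c:\\windows\\temp\\")
                              for o, r in objs),
             {"image", "dest_ip", "user"})]}
    ev = ("NetworkConnect", "NASHUA.dmevals.local",
          frozenset({("C:\\Windows\\Temp\\python.exe", "image"), ("10.0.1.4", "dest_ip")}))
    print(p_logs(S, [ev]))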


We assume here that all components can be observed, and that their monitoring is configured by a sensor configuration S. In this configuration, the defender defines the relevant event types for each component c ∈ C, which need to be traced and filtered on objects according to their roles. The configuration S(c) makes it possible to monitor the component c. The configuration is defined by a set of tuples (ℓ, φ, Rℓ), where ℓ ∈ E is the event type raised by the sensor when the condition φ holds true. φ expresses properties on the objects involved in the observed event (ℓ, c, O) and their corresponding roles. We write O |= φ when the objects in O satisfy the condition φ. The syntax of φ and the satisfaction relation |= are defined in the Appendix. Rℓ ⊆ R contains the roles considered to be relevant to this type of event. Objects with one of these roles can be exploited in a Threat Hunting approach because of their searchability. For example, the string 10.0.1.4 (the object) with the role dest_ip and the observable type IP address can be searched in the logs.

Listing 2 presents part of a Sysmon configuration. Sysmon is a Windows service that monitors and logs system activity. In this example, the monitored component is the machine running Windows; the condition φ expresses that Sysmon will generate ProcessCreate traces for all Process creation events, except if the command line is exactly \SystemRoot\System32\smss.exe (false-positives are frequently generated, possibly due to this legitimate command line). Moreover, this Sysmon instance will generate NetworkConnect traces for all images (i.e., PE binary files) stored in the C:\Windows\Temp filesystem folder. Table II gives examples of relevant roles Rℓ for these two event types. As a model of system-level events, we use the MITRE Cyber Analytics Repository (CAR) [6].

Listing 2. Sysmon Sensor Configuration Example, Part of a plogs Procedure.

TABLE II. Relevant Roles in a Sysmon Trace and Their CAR Name.

Finding an adapted sensor configuration is very tricky because the defender has to find an optimal position between logging every single observed event and defining highly restrictive conditions (including specific objects or excluding generic objects). The first option will generate too many traces and risks producing many false-positives. The second option will rarely be positively satisfied and will thus produce very few traces, which will lead the defender to misjudge the threat.

Finally, the defender's defensive procedure plogs determines the monitored events and the reported traces. This procedure is parameterized by S, which specifies the configurations associated with the components, and by E, a set of events that occurred on the components. plogs generates a set of traces Xc for each component c.

Listing 3 is an example of a trace raised by the Sysmon sensor. It was produced because the configuration in Listing 2 observed a Network Connection event for which the object with the role Image (i.e., executable file name) matched an expected location in the filesystem (i.e., the Windows Temp folder). The sensor thus logged it. The produced trace indicates that this event occurred on the component NASHUA.dmevals.local and carries some objects that give more information about this event, such as DestinationIp, User, or the full path of the Image.

B. Attacker Propagation From the Defender's Point of View

We assume here that all traces produced by the sensors are reported to a SIEM. The second defensive procedure used by the defender is the procedure phunting. This procedure helps the defender to construct a graph GD from the observed traces. The graph GD is built on the same model as the graph GA constructed by the attacker: nodes are objects or components, and edges between two nodes indicate that these objects or components are related in this attack campaign from the defender's point of view. The graph GD is not built directly from the raw set of traces reported to the SIEM, because these traces cover a lot of objects and components irrelevant to the hunting process. It is for this reason that the defender has to filter the traces reported to the SIEM to focus only on traces dealing with an Event of Interest (EoI).
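Before it can feed phunting, the raw record produced by the sensor has to be normalized into the trace tuple (idx, t, ℓ, c, O). The sketch below is ours; the Sysmon field names Image, DestinationIp and User are the ones quoted from Listing 3, while the CAR-style role names and the sample values are an illustrative assumption.

    # Map raw Sysmon-style fields onto CAR-style roles; unmapped fields are simply dropped.
    CAR_ROLE = {"Image": "image", "DestinationIp": "dest_ip", "User": "user"}

    def normalize(trace_id, timestamp, event_type, component, raw_fields):
        objects = frozenset((value, CAR_ROLE[field])
                            for field, value in raw_fields.items() if field in CAR_ROLE)
        return (trace_id, timestamp, event_type, component, objects)

    # Hypothetical record in the spirit of Listing 3.
    x = normalize("x42", "2020-01-01T00:00:00Z", "NetworkConnect", "NASHUA.dmevals.local",
                  {"Image": "C:\\Windows\\Temp\\python.exe",
                   "DestinationIp": "10.0.1.4",
                   "User": "pbeesly"})
    print(x)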


Listing 3. Extract of a "Sysmon/Network Connection" Trace.

TABLE III. Extract of the IoC Database Used by the Defender During the APT29 Threat Hunting Process.

Listing 4. Extract of Sigma Detection Rule¹ for psexec.

1) Highlighting Events of Interest: The defender relies on a database of Indicators of Compromise IoC and a set of detection rules R to define the traces that have to be considered as Events of Interest. An Indicator of Compromise (IoC) is an object o with a type t that indicates, with high confidence, malicious activity on a network. An IoC is similar to an artifact generated along with a malicious activity. Table III gives an extract of the IoC database used by the defender during the APT29 Threat Hunting process. In this example, objects like toby or m.exe, with the observable types user and file respectively, are searchable in a local scope. Those are local IoC. The defender decides on this classification because the user has been created in the target information system, so it makes little sense to look for it globally, in other victims' networks, due to the high risk of false-positives caused by its detection. However, cod.3aka3.scr and 9d1c5ef38e6073661c74660b3a71a76e, with the observable types file and hash respectively, are searchable in a larger scope and may make sense in other networks: those are global IoC.

A database IoC = {(o1, t1), . . . , (on, tn)} is maintained by the defender, who can update it by sharing his knowledge with other security teams (global IoCs) or through his own investigations on his network (local IoCs). A detection rule r ∈ R expresses a defender-specific condition specifying that a trace has to be considered as an Event of Interest. We write x |= r when the condition specified by r is satisfied by the trace x. The syntax of rules and the satisfaction relation |= are inductively defined in the Appendix.

Finally, the set of detection rules R and the IoC database allow the defender to specify the function EoIR,IoC, which filters the traces in order to highlight exclusively the traces that are Events of Interest. More precisely, EoIR,IoC(x) provides two sets: the subset Rx of R containing the rules satisfied by x, and the subset Ox of IoC containing the objects occurring in x. Hence, a given trace x is an Event of Interest if at least one of these subsets is not empty. Formally, given a trace x = (idx, t, ℓ, c, O), this function is defined by EoIR,IoC(x) = (Rx, Ox) where:

Rx = {r ∈ R | x |= r}
Ox = {(o, t) | (o, t) ∈ IoC and (o, r) ∈ O and τ(r) = t}

Listing 4 details a rule commonly used by a defender to detect the execution of psexec by analyzing process_creation event types observed by sensors such as Sysmon, with an object matching *\PsExec64.exe. The object has the role Image (i.e., executable file name). Following this detection, the EoI function returns, in particular, the trace (see Listing 3) that satisfied the condition set out in the rule.

2) Small Step: Each observation of a trace x = (idx, t, ℓ, c, O) considered as an Event of Interest leads to the creation of a graph G(x, Rx, Ox) (where EoIR,IoC(x) = (Rx, Ox)) representing this trace. The nodes of this graph are:
• the component c where the trace has been collected;
• the objects o occurring in O;
• the components referred to by an object from the trace in the defender's directory: {c = DD(τ(r), o) ≠ None | (o, r) ∈ O}.

The edges in G(x, Rx, Ox) connect:
• all the objects o ∈ O to the component c:

⋃(o,r)∈O { c −(r, R(o,r), is_ioc)→ o }

Each edge is labeled by the role of o, by the subset R(o,r) of Rx defined by:

R(o,r) = {r′ ∈ Rx | (idx, t, ℓ, c, O \ {(o, r)}) ⊭ r′}

and by a boolean is_ioc, which is true iff there exists (o, t) ∈ IoC such that τ(r) = t. Hence, to remove the edge c −(r, R(o,r), is_ioc)→ o from G(x, Rx, Ox), it suffices to disable the rules in R(o,r) and to remove (o, t) from IoC when is_ioc is true.
• and all the components c referenced in the defender directory (c = DD(τ(r), o) ≠ None); such edges are simply labeled by ref.

Figure 4 gives an example of a graph computed by the defender from the trace described in Listing 3. This graph is thus the perception of the execution of the psexec attack procedure on the network. At the center of the graph in Figure 4 is the component on which the trace has been observed. Around this component are the relevant objects involved in the trace. This graph can contain objects similar to those of the small-step attacker's graph presented in Figure 2.

¹ https://github.com/Neo23x0/sigma/blob/master/rules/windows/other/win_tool_psexec.yml
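The function EoIR,IoC and the per-trace graph G(x, Rx, Ox) can be prototyped as follows. This is our sketch: rules are modeled as named predicates over a trace, the IoC database as a set of (object, type) pairs, and each edge from the component to an object is labeled with the satisfied rules that depend on that object together with the is_ioc flag; ref edges follow the defender directory. Nothing here reproduces the Sigma rule of Listing 4.

    def eoi(trace, rules, ioc_db, tau):
        """Return (R_x, O_x): the satisfied rules and the objects of the trace known as IoC."""
        _idx, _t, _etype, _component, objects = trace
        r_x = {name for name, predicate in rules.items() if predicate(trace)}
        o_x = {(o, tau[r]) for o, r in objects if (o, tau[r]) in ioc_db}
        return r_x, o_x

    def trace_graph(trace, rules, ioc_db, directory, tau):
        """Build G(x, R_x, O_x) for a trace considered as an Event of Interest."""
        r_x, o_x = eoi(trace, rules, ioc_db, tau)
        if not r_x and not o_x:
            return set(), set()                      # not an Event of Interest
        idx, t, etype, component, objects = trace
        nodes, edges = {component}, set()
        for o, r in objects:
            # rules of R_x that are no longer satisfied once (o, r) is removed from the trace
            reduced = (idx, t, etype, component, frozenset(objects - {(o, r)}))
            depends = frozenset(name for name in r_x if not rules[name](reduced))
            is_ioc = (o, tau[r]) in ioc_db
            nodes.add(o)
            edges.add((component, o, (r, depends, is_ioc)))
            comp = directory.get((tau[r], o))
            if comp is not None:                      # ref edge towards a known component
                nodes.add(comp)
                edges.add((o, comp, "ref"))
        return nodes, edges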


Fig. 4. Computed graph from a single trace which is considered by the defender as an EoI.

Fig. 5. GD computed by the defender during APT29 simulated campaign.

3) Big Step: Finally, an attack campaign A observed by a defender through a collection of traces X can be modeled by the graph G(X) = ⋃x∈X G(x, Rx, Ox), resulting from the union of all the graphs associated with each Event of Interest computed from the set of traces X.

Figure 5 is the graph computed by the defender during the APT29 simulated campaign with rules from the public project Sigma. Section V gives all the details on its computation and discusses how to deal with the objects present. The graph in Figure 5 has to be compared with the graph representing this same attack campaign from the attacker's perspective in Figure 3.

4) Quality of the Defender Perspective: The quality of the graph G(X) constructed by the defender is influenced by three parameters: the configuration of the sensors S, the set of detection rules R, and the IoC database. In a Threat Hunting process, updating the sensor configuration or the set of detection rules is too long and too impactful to be done straightaway. On the other hand, the IoC database can and must be updated each time a trace is considered to be of interest, by the function generate_IoC((Rx, Ox), x, IoC). All objects that appear in a trace considered as an Event of Interest are candidates to become IoC and, in particular, objects with roles corresponding to types of observables such as IP addresses, hashes or domains. Although Kurogome et al. [9] have already proposed to automate this function generate_IoC, the intervention of an expert can be considered.

The defender therefore has two main defensive procedures: plogs, which he uses to designate the components to monitor and the information to report, and phunting, which completes this first defensive procedure by computing the defender's graph and updating the IoC database for each trace considered as an Event of Interest. phunting can be formalized as follows. Starting from scratch, GD = (∅, ∅).

During a Threat Hunting operation, the defender builds from GD a restricted graph that highlights the objects shared between several components. This means that these objects, with relevant roles and involved in Events of Interest, have been observed on at least two components. This graph is useful for orientation during the hunt.

V. MODEL EXPERIMENTATION

Integrating our approach in a production environment would allow both the attacker and the defender, each on his own side, to become aware of the traces left on the victim's network. The attacker could use this information to improve the stealthiness of his procedures, and the defender could use it to improve his Threat Hunting process. In practice, its deployment in existing architectures requires:
• On the Attacker's Side: a logging system for executed procedures formatted according to Listing 1; a directory that corresponds to the attacker's current knowledge of the victim's network;
• On the Defender's Side: sensors installed on defended components, configured as presented in Listing 2 and whose relevant roles are specified as presented in Table II; a directory that corresponds to the defender's current knowledge of the victim's network; an indexer enriched with a set of detection rules as in Listing 4; an IoC database, such as in Table III; and an analyst for tasks that are not automatable to date, such as IoC generation.

In this article, we take the point of view of an omniscient actor, which allows us to answer the following questions.

How to measure the quality of the defensive architecture? The comparison of the graphs GA and GD allows calculating the coverage rate of the objects coming from the attacker into the defender's graph. This allows estimating the relevance of the detection chain.
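One possible reading of the coverage rate, used in the rest of the experiment: the fraction of the objects exposed in GA that also appear in GD. The helper below is ours and assumes the object nodes of each graph have already been collected into a set.

    def coverage_rate(attacker_objects, defender_objects):
        """Share of the attacker's exposed objects that the defender actually captured."""
        if not attacker_objects:
            return 0.0
        return len(set(attacker_objects) & set(defender_objects)) / len(set(attacker_objects))

    # With the figures reported in Section VI, 180 attacker objects and a 27.78% coverage
    # correspond to about 50 attacker objects present in the defender's graph.
    print(coverage_rate({"a", "b", "c", "d"}, {"b", "d", "x"}))   # 0.5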


How to reduce the defender's graph to unveil the attacker? Too many objects in the defender's graph GD can make it unusable. We are therefore looking for ways to reduce the number of objects while maintaining a sufficient coverage rate to provide potential IoC to a Threat Hunting team.

For this, we exploited the traces of a realistic attack scenario on a defensive infrastructure representative of what we find in modern companies.

Listing 5. Part of All Procedures From the Mordor APT29 Attack Campaign.

Fig. 6. APT29 Evaluation Environment (source: MITRE³).

A. Attack Scenario

We chose to experiment with our model on an independent and representative attack scenario. We rely on the cybersecurity project Mordor [3], maintained by Roberto Rodriguez. The Mordor project provides pre-recorded security events generated by simulating adversarial techniques. It was updated in 2020 with a new dataset called APT29. The dataset provides the logs built by replaying both parts of an attack scenario designed by MITRE in the context of their ATT&CK Evaluations [4]. The attack scenario emulates a two-part attack led by the threat group APT29. The attack aims to collect and exfiltrate sensitive data. The first part is a rapid "smash and grab" collection and exfiltration of specific file types after an initial infection due to a widespread phishing campaign. Then the attacker drops a toolkit used to further explore and compromise the network. The second part is a targeted and methodical breach: a low and slow takeover of the target. A part of the attack procedures is described in Listing 5. The complete list of attack procedures is described on the MITRE-Engenuity website.²

Two videos were also recorded by the author, which give an informal understanding of the attacker's perspective during this attack and of the objects that he was aware of exposing to the defender. We wrote all the attack procedures of this campaign according to the format presented in Listing 1. This allows us to build the attacker graph GA. Thus, for each procedure, a central node that corresponds to the component is created. It is then connected by edges to each of the objects that the attacker presumes to have exposed during the execution of the procedure. Each of these edges is tagged with the role of the object in the procedure and also with an identifier. Where an involved procedure exists, an object designating the third component is created. The object is linked to the appropriate node with an edge annotated "Ref". Figure 3 presents this graph, characterized by 4 components, 180 unique objects, and 359 edges.

B. Targeted Infrastructure and Defensive Architecture

The targeted infrastructure, on which the scenario that produced the Mordor APT29 dataset took place, has been reproduced according to the environment described by MITRE as part of their ATT&CK Evaluations, as shown in Figure 6. The victim's network consists of three workstations, one file server, and one domain controller, all running Microsoft Windows operating systems. The targeted infrastructure's systems are monitored by Microsoft Windows Sysmon, which provides detailed information about processes, network connections, and file manipulations. Microsoft Windows Sysmon produces the traces used to compute the defender view. These traces form a dataset of 783,367 log entries corresponding to two days of observation (the attack duration).

We have injected all the traces from the Mordor dataset into a Splunk⁴ indexer. In this raw dataset, we now have to reveal the Events of Interest (EoI). In this experiment, the set of rules R is formed by the 565 commonly used rules from the public detection project Sigma.⁵ Sigma is an open community project which aims to capitalize on detection rules sharing the same formalism, which are thus convertible to a large number of SIEMs or directly integrated into malware analysis platforms such as VirusTotal.⁶ At the beginning of the experiment, our Indicators of Compromise database IoC is empty. The first pass made it possible to detect 6100 EoI, thanks to the matching of 22 detection rules among the 565 enabled rules. We can now build the defender's graph with these EoI.⁷ The computed graph has 4 components and 1758 unique objects. Table IV presents this graph's specifications.

Through this experimentation, we evaluated the relevance of two rule-disabling strategies in order to determine the most efficient approach to reduce the number of false-positives and make the defender graph GD exploitable.

² https://attackevals.mitre-engenuity.org/APT29/operational-flow.html
³ © 2018-2020 The MITRE Corporation. This work is reproduced and distributed with the permission of The MITRE Corporation.
⁴ https://www.splunk.com
⁵ https://github.com/Neo23x0/sigma
⁶ https://developers.virustotal.com/v3.0/reference#sigma-analyses
⁷ Datasets, code, and full graphs are publicly available at https://gitlab.inria.fr/cidre-public/from-ttp-to-ioc-dataset.


TABLE IV. Defender Point of View: Big Graph Specifications.

VI. RESULTS

As we previously discussed, the attacker graph misses certain objects that the attacker is not aware of exposing. In our experiment, the components present in the attacker's graph are all present in the defender's graph, which is not surprising because they have proper monitoring and all rules are enabled. However, the number of objects in the attacker's graph (180) is significantly lower than the number of objects (1758) in the defender's graph. This suggests that the defender's graph contains many false-positives. In this context, false-positives are generally legitimate or native objects of the system whose normal behavior triggers inappropriate or lax detection rules R, or even a verbose sensor configuration S. In other words, a false-positive is an object that can never become an IoC, since it would not make sense to look for it in a wider scope or because it is not directly related to the malicious event that occurs on the victim's system.

In order to reduce these false-positives, the intervention of a cognitive agent is often necessary. However, it would be possible to automatically mark objects which correspond to legitimate behaviors of the system, while paying attention to malicious actions falling within the Living-off-the-Land [10] paradigm. These actions have the particularity of staying under the radar since they rely on native objects of the system in order to satisfy the attacker's technical intentions. The next step is to find a method to remove these false-positives. It has to be done because their preponderance in the graph may reduce the importance of interesting objects. The defender will, therefore, have to find a more efficient way to identify them instead of manually qualifying each of them.

A. How to Measure the Quality of the Defensive Architecture

From an omniscient perspective, the model allows for comparing data from both sides, attacker and defender. The number of objects from GA that also exist in GD allows for estimating the efficiency of the entire defender detection chain. We compute that 27.78% of all attacker objects are effectively considered by the defender as Events of Interest (EoI). While this percentage is high enough to allow the defender to initiate a Threat Hunting operation, the large number of objects present in the defender's graph may disturb him. Consequently, the defender may not pay attention to particular objects in GD that could become new IoCs. Some of them are objects that have very discriminating characteristics, like those presented in Table III, and it would be productive to search for them in a wider scope. Those could be qualified by a cognitive agent as new IoC and be added to the IoC database.

B. How to Reduce the Defender's Graph to Unveil the Attacker

Threat Hunting is a cyclical discipline where the defender identifies new IoCs, modifies detection rules to search for them in a wider scope, analyzes the collected objects, and extracts new IoCs. In this process, not all objects can become IoCs. For example, if an attacker uses the attack procedure psexec to perform lateral movements, and if the victim's administration team performs remote tasks with this tool, then considering psexec.exe or one of its hashes as an IoC will cause a large number of false-positives. It is then necessary to write a detection rule which specifies the legitimate context of these administrative actions (e.g., by specifying the source IP addresses and user accounts involved). The disabling of overly verbose rules can be done in post-processing to clean up GD and to make it more exploitable. We have experimented with two rule-disabling strategies, called Top-objects and Top-events. The Top-objects strategy consists of a sequence of rounds, each disabling the rule that created the largest number of new unique objects in the defender's graph. It stops upon reaching the highest coverage rate, without having too many objects in the graph and with the fewest rules disabled. The Top-events strategy is similar but first disables the rules that are at the origin of the largest number of EoI.
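The two strategies can be prototyped as a greedy loop (our sketch, not the algorithm of Listing 6 given by the authors): at each round, score every still-enabled rule either by the number of unique objects it alone brought into GD (Top-objects) or by the number of Events of Interest it matched (Top-events), disable the highest-scoring rule, rebuild the graph, and stop when the coverage rate would drop below a chosen threshold. The score, rebuild and coverage_of helpers are assumptions.

    def disable_rules(enabled_rules, score, rebuild, coverage_of, min_coverage):
        """Greedy rule-disabling loop; 'score' implements Top-objects or Top-events."""
        disabled = []                        # kept as a list so the graph can be reverted in order
        graph = rebuild(enabled_rules)
        while enabled_rules:
            candidate = max(enabled_rules, key=lambda rule: score(rule, graph))
            trial = rebuild(enabled_rules - {candidate})
            if coverage_of(trial) < min_coverage:
                break                        # disabling this rule would hide too much of the attacker
            disabled.append(candidate)
            enabled_rules = enabled_rules - {candidate}
            graph = trial
        return disabled, graph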


Fig. 7. Evolutions of coverage rates and count of objects present in the graph GD according to the number of disabled rules.

Given a defender's graph GD = (VD, →D), disabling a set of rules RD leads to a new defender's graph, written Update(GD, RD), defined by removing from GD all the edges

c −(r, R(o,r), is_ioc)→ o

such that R(o,r) \ RD = ∅ and is_ioc = false, and by removing (if it exists) the edge labeled by ref starting from o, together with the component at the end of such an edge. Starting from:
• the attacker's graph GA = (VA, →A), containing objects in OA = VA ∩ O;
• the defender's graph GD = (VD, →D), containing objects in OD = VD ∩ O;
• an empty set RD of disabled rules;
• and an initial value coverage = v > 0 for the coverage;
Listing 6 defines an algorithm to compute RD according to the Top-events strategy, where:
• most_used(→D) is a function providing the rule appearing the most times in GD;
• Update(GD, RD) is a function that updates the graph GD by excluding the events resulting from the rules in RD.

Listing 6. Algorithm to Compute Coverage Rates.

Implementing RD as a list allows us to keep the order in which the rules were disabled and therefore to revert the graph GD to an earlier state.

Figure 7 shows two 3D curves which correspond respectively to the Top-objects and Top-events disabling strategies. We can observe that the Top-objects strategy seems to be the most effective because it allows maintaining a high coverage rate while considerably reducing the number of objects. When there are only 63 objects left in the graph and only 4 rules have been deactivated, the coverage rate is 24.4%.

C. Discussion

By applying a rule-disabling strategy, the defender will thus considerably reduce the number of objects to be investigated while maintaining a sufficiently high coverage rate. However, it should be emphasized that, unlike a detection system which aims to automatically contain a technical threat (e.g., anti-malware, Endpoint Detection and Response), the defensive infrastructure described in this article aims to collect as much Cyber Threat Intelligence as possible related to the adversary, in order to be able to better hunt it in the victim's network. Thus, an exhaustive detection is not necessary, since the objective is not to block unknown threats but simply to ensure a sufficient number of IoC that will allow the adversary to be tracked in the victim's network. This approach of measuring the coverage rate is possible only in the context of Red versus Blue exercises. However, the disabling of overly verbose rules can be done in post-processing without observing the evolution of the coverage rate, simply in order to clean up the graph generated by the defender and to make it more exploitable.

VII. RELATED WORKS

Threat Hunting is an agile and iterative process of searching, characterizing, and later identifying attackers who may have compromised the victim's network. In 2021, it is still a widespread focus of cyber defense research. We believe that this area still requires further formalization efforts in order to allow a good understanding of the different layers and their interactions. We observe two main lines of research that should converge: those that formalize the relationships between real-life components, and those focusing on the improvement of actors' strategies, for instance based on Game Theory.

A. Need for Unified Views

Gianvecchio et al. [11] point out the semantic gap between the defender and the attacker. Indeed, the attacker operates at the level of strategy and tactics: he focuses on target discovery and can deploy various kill chain tactics [8], [12]. However, the defender spends significant time processing low-level, rule-generated alerts, and single-log analysis can hardly reveal the complete attack story for complex, multi-stage attacks. Gianvecchio et al. reduce this gap by proposing an explicit model of the attacker strategy using machine-readable data structures and clustering of security events around TTP annotations from a well-known behavioral taxonomy. This model enables the defender to operate similarly to the attacker at the strategic level without sacrificing the ability to drill into evidential details. For their implementation, they used CALDERA, previously introduced by Applebaum et al. [13] in order to automate Red Teaming while retaining the concept of TTP.

B. From Events of Interest to Objects

In [14], Najafi et al. are among the first to introduce the intuition behind global features for threat detection. They define a SIEM-based knowledge graph highlighting the most important entities (what we call objects) and relationships observed in Event Logs of Interest extracted from DNS and proxy logs. These relations are enriched with information gathered from publicly available sources of threat intelligence. Over this knowledge graph, Najafi et al. design MalRank, a graph-based inference algorithm designed to infer a node's maliciousness score based on its associations with other entities present in the graph. MalRank has successfully been helpful in identifying previously unknown malicious entities such as malicious domain names and IP addresses.


C. From Traces to Indicator of Compromise graph analysis subserves the attack correlations by identify-
In [9], Kurogome et al. propose to enhance the Threat ing event patterns. During this study, we better understood the
Hunting process by automatically generating accurate and origin of objects which then become IoC. We also emphasize
interpretable IoCs from malware traces. They design EIGER that some objects, which have a meaning only in the con-
that takes a dataset of traces computed from malware as input. text of the information system where they were found, can
EIGER then computes IoCs of different abstraction levels be very interesting to exploit for Threat Hunting. Finally, our
using an enumerate-then-optimize algorithm. Kurogome et al. model explains the mutual inference necessary for attacker
demonstrate that their generated IoCs bear comparison with and defender to understand each other. The most valuable
manually generated ones, which indicates that EIGER is an intelligence is the understanding of attacker’s procedures.
appealing complement to endpoint malware detection in real- In future work, we plan to focus on graph similarity com-
world security operations. This is an example of a concrete putation in order to design metrics that could testify to the
implementation for the function generate_IoC formalized quality of attacker and defender points of view. Such exper-
here. iments require implementing different comparison algorithms
D. From Logs to Defender's Perspective

Pei et al. describe in [15] HERCULE, a log-based intrusion analysis system. It models the relationships between multiple logs in the system and automatically generates a multidimensional weighted graph in which potentially valuable information for the defender is embedded. This graph provides a panoramic view of the logs generated by different system components and helps the defender understand the whole attack trace. More recently, Burr et al. published a study [16] that focuses on community detection in graphs constructed from Intrusion Detection System (IDS) alerts.
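To give a concrete flavor of these log-graph approaches, the sketch below correlates log entries that share field values into a weighted graph and then extracts communities, in the spirit of HERCULE [15] and the alert-graph study [16]; the log schema, weights, and records are illustrative assumptions, not those systems' actual inputs.

```python
# Toy sketch: correlate log entries into a weighted graph and extract communities.
# Requires networkx; field names and weights are illustrative assumptions only.
import itertools
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical, simplified log records (not HERCULE's real log schema).
logs = [
    {"id": "fw-1",  "ip": "10.0.0.5", "process": None},
    {"id": "dns-7", "ip": "10.0.0.5", "process": None},
    {"id": "edr-3", "ip": None,       "process": "powershell.exe"},
    {"id": "edr-9", "ip": "10.0.0.5", "process": "powershell.exe"},
]

# Shared-field weights: two entries sharing an IP (or a process) get an edge.
WEIGHTS = {"ip": 1.0, "process": 2.0}

G = nx.Graph()
G.add_nodes_from(entry["id"] for entry in logs)
for a, b in itertools.combinations(logs, 2):
    weight = sum(w for field, w in WEIGHTS.items()
                 if a[field] is not None and a[field] == b[field])
    if weight > 0:
        G.add_edge(a["id"], b["id"], weight=weight)

# Communities of tightly correlated log entries approximate "attack stories".
for community in greedy_modularity_communities(G, weight="weight"):
    print(sorted(community))
```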
This article is in line with these works and proposes a richer model that offers a dual view of the Threat Hunting process by also taking into account the perception, knowledge, and actions of the campaign from the attacker's perspective.

We believe that, in the near future, this work will allow us to join the research conducted on Game Theory and Security Games [17], [18], where the Threat Hunting process requires more accurate models [19]. In these games, the defender, who is monitoring some collection of resources, has to decide how to deploy a certain number of sensors or honeypots [20] at some predetermined cost. His goal is to properly protect the network at a minimal cost. The attacker's goal is to deceive the defender in order to reach his objectives.
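As a rough, invented illustration of the deployment-cost trade-off that drives these games, the sketch below greedily selects sensor placements under a budget so as to cover as many components as possible; it is not one of the cited game-theoretic models, and all components, costs, and coverage sets are made up for the example.

```python
# Toy sketch: budget-constrained sensor placement by greedy coverage.
# All components, costs, and coverage sets below are invented for illustration.

# Candidate sensor locations: cost and the components they can observe.
CANDIDATES = {
    "dmz-tap":      {"cost": 3, "covers": {"web-srv", "mail-srv"}},
    "core-switch":  {"cost": 5, "covers": {"web-srv", "db-srv", "file-srv"}},
    "edr-endpoint": {"cost": 2, "covers": {"workstation-17"}},
}
BUDGET = 7

def place_sensors(candidates, budget):
    """Greedy: repeatedly pick the sensor with the best new-coverage/cost ratio."""
    chosen, covered, remaining = [], set(), dict(candidates)
    while remaining:
        name, spec = max(
            remaining.items(),
            key=lambda kv: len(kv[1]["covers"] - covered) / kv[1]["cost"],
        )
        del remaining[name]
        if spec["cost"] > budget or not (spec["covers"] - covered):
            continue  # too expensive or brings no new visibility
        chosen.append(name)
        covered |= spec["covers"]
        budget -= spec["cost"]
    return chosen, covered

print(place_sensors(CANDIDATES, BUDGET))
# (['dmz-tap', 'edr-endpoint'], {'web-srv', 'mail-srv', 'workstation-17'})
```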
VIII. CONCLUSION

Threat Hunting is a fundamental step of an incident response operation that allows for spotting the components of an information system compromised by an attacker. In this process, the defender's ambition is to shed light on the attacker's propagation area in order to best prepare the operation for his eradication.

In this article, we have proposed a model to analyze both the attacker's propagation and the defender's knowledge of that propagation. All the steps involved in a Threat Hunting approach have been carefully formalized here. Our approach allows enhancing the knowledge base of the defender with new Indicators of Compromise, which can subsequently enable proactive threat detection. Furthermore, our model and its experimentation highlight the existence of false-positives, in particular because of lax detection rules. This observation is valuable for the defender since it allows him to gain efficiency and to improve his detection tools, in particular because the graph analysis supports attack correlation by identifying event patterns. During this study, we better understood the origin of the objects which then become IoCs. We also emphasize that some objects, which have a meaning only in the context of the information system where they were found, can be very interesting to exploit for Threat Hunting. Finally, our model explains the mutual inference necessary for attacker and defender to understand each other: the most valuable intelligence is the understanding of the attacker's procedures.

In future work, we plan to focus on graph similarity computation in order to design metrics that could testify to the quality of the attacker's and defender's points of view. Such experiments require implementing different comparison algorithms and benchmarking them. However, beyond a metric confrontation between two graphs, we hope the semantics introduced in this article allows for interpreting the differences between attacker and defender perceptions at a deeper level, towards designing other defensive procedures.
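As a deliberately simple starting point for such graph comparisons, the sketch below computes Jaccard similarities between the node and edge sets of an attacker propagation graph GA and a defender perception graph GD; this particular metric is an assumption of ours, not one of the metrics the article commits to, and the example graphs are invented.

```python
# Minimal sketch: compare attacker and defender propagation graphs with
# Jaccard similarity over nodes and edges. The graphs below are toy examples.

def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, with the convention that two empty sets are identical."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

def graph_similarity(ga_edges: set, gd_edges: set) -> dict:
    ga_nodes = {n for edge in ga_edges for n in edge}
    gd_nodes = {n for edge in gd_edges for n in edge}
    return {
        "node_similarity": jaccard(ga_nodes, gd_nodes),
        "edge_similarity": jaccard(ga_edges, gd_edges),
    }

# Hypothetical propagation edges (component -> component).
GA = {("workstation-17", "file-srv"), ("file-srv", "db-srv")}
GD = {("workstation-17", "file-srv")}

print(graph_similarity(GA, GD))
# {'node_similarity': 0.666..., 'edge_similarity': 0.5}
```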
In the long run, our goal is to adjust defender levers defined in this article dynamically: sensor configuration, detection rules, and IoC database, to better reveal the presence of an attacker. To achieve this goal, we might need to define a defensive strategy that takes deployment costs into account. Thus, a Red versus Blue exercise would have concrete and immediate technical repercussions on the company's cybersecurity to fight Advanced Persistent Threats.

APPENDIX
SATISFACTION RELATIONS

a) Sensor Configuration: Expressions φ of sensor configurations are built from the logical constants true, false and the operators and, or, not applied over pairs (r, c), where r is a role and c is a property over the object playing this role (we write c(o) to express that an object o satisfies the property c):

    φ ::= true | false | (r, c) | not φ | φ and φ | φ or φ

Then, given a set O = {(o1, r1), . . . , (on, rn)} ⊆ O × R, the satisfaction relation |= of a condition is defined by:

    O |= true
    O |= (ri, c)      iff  c(oi)
    O |= not φ        iff  O ⊭ φ
    O |= φ1 and φ2    iff  O |= φ1 and O |= φ2
    O |= φ1 or φ2     iff  O |= φ1 or O |= φ2

We assume here that for each role in φ there exists an object in O playing it and that a role occurs at most one time in O.
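To make the sensor-configuration semantics above concrete, here is a minimal sketch assuming a tuple-based encoding of conditions φ (tags such as "role", "and"); the encoding is our own illustrative choice, not notation from the article, but the evaluator follows the satisfaction relation O |= φ defined above.

```python
# Minimal sketch of the sensor-configuration satisfaction relation O |= φ.
# Conditions are encoded as nested tuples; this encoding is an assumption.
#   ("true",) / ("false",)                     constants
#   ("role", r, c)                             pair (r, c): property c on the object with role r
#   ("not", phi), ("and", p1, p2), ("or", p1, p2)

def satisfies(objects, phi):
    """objects: set of (object, role) pairs; phi: condition as a nested tuple."""
    tag = phi[0]
    if tag == "true":
        return True
    if tag == "false":
        return False
    if tag == "role":
        _, role, prop = phi
        # A role occurs at most once in O, so at most one object can match.
        return any(prop(obj) for obj, r in objects if r == role)
    if tag == "not":
        return not satisfies(objects, phi[1])
    if tag == "and":
        return satisfies(objects, phi[1]) and satisfies(objects, phi[2])
    if tag == "or":
        return satisfies(objects, phi[1]) or satisfies(objects, phi[2])
    raise ValueError(f"unknown condition: {phi!r}")

# Example: the object playing the "image" role ends with powershell.exe.
O = {("C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe", "image")}
phi = ("role", "image", lambda o: o.lower().endswith("powershell.exe"))
print(satisfies(O, phi))  # True
```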
b) Detection Rule: Detection rules r are built from the logical constants true, false and the operators and, or, not applied over pairs (α, c), where α is a timestamp t, or an event type ε ∈ E, or a component c ∈ C, or a role r ∈ R, and c is a property over α (we write c(α) to express that α satisfies the property c):

    r ::= true | false | (α, c) | not r | r and r | r or r

Then, given a trace x = (idx, t, ε, c, O), the satisfaction relation |= of a detection rule r is inductively defined by:

    x |= true
    x |= (t′, c)     iff  t′ = t and c(t)
    x |= (ε′, c)     iff  ε′ = ε and c(ε)
    x |= (c′, c)     iff  c′ = c and c(c)
    x |= (ri, c)     iff  (o, ri) ∈ O and c(o)
    x |= not r       iff  x ⊭ r
    x |= r1 and r2   iff  x |= r1 and x |= r2
    x |= r1 or r2    iff  x |= r1 or x |= r2
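The detection-rule semantics can be prototyped in the same spirit. The sketch below evaluates a rule against a trace (idx, t, ε, c, O); the tuple encoding of rules and the example trace are assumptions made for illustration, not the article's implementation.

```python
# Minimal sketch of the detection-rule satisfaction relation x |= r.
# A trace is (trace_id, timestamp, event_type, component, objects), where
# objects is a set of (object, role) pairs. Rule encoding is an assumption:
#   ("true",) / ("false",)
#   ("ts", t, prop), ("event", e, prop), ("component", comp, prop), ("role", role, prop)
#   ("not", r), ("and", r1, r2), ("or", r1, r2)

def matches(trace, rule):
    _, t, event, component, objects = trace
    tag = rule[0]
    if tag == "true":
        return True
    if tag == "false":
        return False
    if tag == "ts":
        return rule[1] == t and rule[2](t)
    if tag == "event":
        return rule[1] == event and rule[2](event)
    if tag == "component":
        return rule[1] == component and rule[2](component)
    if tag == "role":
        return any(r == rule[1] and rule[2](o) for o, r in objects)
    if tag == "not":
        return not matches(trace, rule[1])
    if tag == "and":
        return matches(trace, rule[1]) and matches(trace, rule[2])
    if tag == "or":
        return matches(trace, rule[1]) or matches(trace, rule[2])
    raise ValueError(f"unknown rule: {rule!r}")

# Hypothetical trace and rule: a process-creation event spawning cmd.exe.
x = ("x42", 1622540000, "process_creation", "workstation-17",
     {("cmd.exe", "child_image"), ("winword.exe", "parent_image")})
r = ("and",
     ("event", "process_creation", lambda e: True),
     ("role", "child_image", lambda o: o.lower() == "cmd.exe"))
print(matches(x, r))  # True
```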
c) Main Notations:
– C is a set of components of the targeted network; c is a component in C
– O = OD ∪ OA, where OD (resp. OA) is the set of objects relative to the victim (resp. to the attacker); o is an object in O
– R is a set of roles associated with objects
– T is a set of types associated with roles; t is a type in T
– τ : R → T is the function from roles to types
– DA (resp. D) : (T × OD) → (C ∪ {None}) is the attacker (resp. defender) directory
– E is a set of types of events observable (by the defender) on components; ε is an event type in E
– ev = (ε, c, O) is an observable event on a component
– O ⊆ O × R is a (sub)set of objects with their role
– x = (idx, t, ε, c, O) is a trace observed on c at date t
– S is a set of sensor configurations
– S(c) is a sensor configuration on the component c
– TTP is a set of procedures; p is a procedure in TTP
– A = e1, . . . , eN is an attack campaign
– e is a procedure execution
– M(e) is the machine (component) on which e is executed
– φ is a condition over some objects and their roles
– Rε ⊆ R is the set of relevant roles for an event type ε
– (ε, φ, Rε) is an element of a sensor configuration
– IoC ⊆ O × T is an Indicators of Compromise database
– R is a set of detection rules; r is a rule in R
– EoIR,IoC(x) = (Rx, Ox) is the function providing the relevant rules (Rx ⊆ R) and objects (Ox ⊆ O) from a trace x generated by an Event of Interest (sketched in code after this list)
– G(e) = (Ve, →e) is an attacker's graph relating to an execution e
– GA(A) = (VA, →A) is the attacker's network propagation graph resulting from the attack campaign A
– G(x, Rx, Ox) is a defender's graph relating to a trace x
– GD(X) = (VD, →D) is the defender's perception graph of the attacker's propagation, associated with each Event of Interest computed from a set of traces X.
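For readers who prefer code to notation, the following sketch renders some of the structures above as Python types; every class, field name, and the shape of the eoi function are illustrative assumptions rather than the article's implementation.

```python
# Illustrative (assumed) data-structure sketch for the main notations.
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set, Tuple

Object = str          # o ∈ O (IP address, file path, domain name, ...)
Role = str            # r ∈ R (e.g., "parent_image", "destination_ip")
EventType = str       # ε ∈ E (e.g., "process_creation")
Component = str       # c ∈ C (a machine of the targeted network)

@dataclass(frozen=True)
class Trace:
    """x = (idx, t, ε, c, O): what a sensor reports to the defender."""
    trace_id: str
    timestamp: float
    event_type: EventType
    component: Component
    objects: FrozenSet[Tuple[Object, Role]]

@dataclass
class SensorConfigElement:
    """(ε, φ, Rε): which events are logged, under which condition, with which roles."""
    event_type: EventType
    condition: Callable[[FrozenSet[Tuple[Object, Role]]], bool]
    relevant_roles: Set[Role]

def eoi(trace: Trace, rules, ioc_db) -> Tuple[set, set]:
    """Placeholder shape of EoI_{R,IoC}(x) = (Rx, Ox): relevant rules and objects.

    `rules` is assumed to expose a matches(trace) predicate, and `ioc_db` to be
    a set of (object, type) pairs, mirroring IoC ⊆ O × T.
    """
    matching_rules = {r for r in rules if r.matches(trace)}                 # Rx ⊆ R
    known_objects = {(o, role) for (o, role) in trace.objects
                     if any(o == ioc_obj for ioc_obj, _ioc_type in ioc_db)}  # Ox ⊆ O
    return matching_rules, known_objects
```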
REFERENCES

[1] E. C. Thompson, "Threat hunting," in Designing a HIPAA-Compliant Security Operations Center. Berkeley, CA, USA: Apress, 2020.
[2] F. Maymi, R. Bixler, R. M. Jones, and S. D. Lathrop, "Towards a definition of cyberspace tactics, techniques and procedures," in Proc. IEEE Int. Conf. Big Data, 2017, pp. 4674–4679.
[3] R. Rodriguez. (2020). APT29 Activity From the ATT&CK Evaluations. [Online]. Available: https://github.com/hunters-forge/mordor/tree/master/datasets/large/apt29
[4] MITRE Corporation. (2019). ATT&CK Evaluation. [Online]. Available: https://attackevals.mitre-engenuity.org/
[5] OASIS Cyber Threat Intelligence. (2017). STIX: A Structured Language for Cyber Threat Intelligence. [Online]. Available: https://oasis-open.github.io/cti-documentation/stix/intro
[6] MITRE Corporation. (2018). The MITRE Cyber Analytics Repository (CAR). [Online]. Available: https://car.mitre.org/
[7] V. Mavroeidis and A. Jøsang, "Data-driven threat hunting using Sysmon," in Proc. 2nd Int. Conf. Cryptogr. Security Privacy, 2018, pp. 82–88.
[8] A. Berady, V. V. T. Tong, G. Guette, C. Bidan, and G. Carat, "Modeling the operational phases of APT campaigns," in Proc. 6th Annu. Conf. Comput. Sci. Comput. Intell., 2019, pp. 96–101.
[9] Y. Kurogome et al., "EIGER: Automated IOC generation for accurate and interpretable endpoint malware detection," in Proc. 35th Annu. Comput. Security Appl. Conf., 2019, pp. 687–701.
[10] Sudhakar and S. Kumar, "An emerging threat fileless malware: A survey and research challenges," Cybersecurity, vol. 32, p. 1, Jan. 2020.
[11] S. Gianvecchio, C. Burkhalter, H. Lan, A. Sillers, and K. Smith, "Closing the gap with APTs through semantic clusters and automated cybergames," in Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Cham, Switzerland: Springer, 2019.
[12] P. N. Bahrami, A. Dehghantanha, T. Dargahi, R. M. Parizi, K.-K. R. Choo, and H. H. S. Javadi, "Cyber kill chain-based taxonomy of advanced persistent threat actors: Analogy of tactics, techniques, and procedures," J. Inf. Process. Syst., vol. 15, no. 4, pp. 865–889, 2019.
[13] A. Applebaum, D. Miller, B. Strom, C. Korban, and R. Wolf, "Intelligent, automated red team emulation," in Proc. 32nd Annu. Conf. Comput. Security Appl., 2016, pp. 363–373.
[14] P. Najafi, A. Mühle, W. Pünter, F. Cheng, and C. Meinel, "MalRank: A measure of maliciousness in SIEM-based knowledge graphs," in Proc. 35th Annu. Comput. Security Appl. Conf., 2019, pp. 417–429.
[15] K. Pei et al., "HERCULE: Attack story reconstruction via community discovery on correlated log graph," in Proc. 32nd Annu. Conf. Comput. Security Appl., 2016, pp. 583–595.
[16] B. Burr, S. Wang, G. Salmon, and H. Soliman, "On the detection of persistent attacks using alert graphs and event feature embeddings," in Proc. NOMS, 2020, pp. 1–4.
[17] A. H. Anwar and C. Kamhoua, "Game theory on attack graph for cyber deception," in Decision and Game Theory for Security. Cham, Switzerland: Springer, 2020.
[18] T. E. Carroll and D. Grosu, "A game theoretic investigation of deception in network security," in Proc. 18th Int. Conf. Comput. Commun. Netw., 2009, pp. 1–6.
[19] M. Bilinski et al., Lie Another Day: Demonstrating Bias in a Multi-round Cyber Deception Game of Questionable Veracity. Cham, Switzerland: Springer, 2020.
[20] R. Píbil, V. Lisý, C. Kiekintveld, B. Bošanský, and M. Pěchouček, "Game theoretic model of strategic honeypot selection in computer networks," in Decision and Game Theory for Security. Berlin, Germany: Springer, 2012.