
3. Our Approach

3.1 Basic Notations

Large organisations deal with tremendous amounts of data whose security is of prime interest. The data in databases comprise attributes describing real-life objects called entities. The attributes have varying levels of sensitivity, i.e. not all attributes are equally important to the integrity of the database. For example, signatures and other biometric data are highly sensitive attributes for a financial organisation such as a bank, in comparison to others like name or gender. Unauthorised access to these crucial attributes is therefore of greater concern. Only certain employees may have access to such data elements, and access by all others must be blocked instantaneously to ensure confidentiality and consistency of data.

Our proposed model QPAFCS (Query Pattern Access and Fuzzy Clustering System) pays special attention to sensitive data attributes, which are referred to as CDEs (Critical Data Elements) in the text. The attributes that can be used to indirectly infer CDEs are also critical to the functioning of the organisation. For instance, the account number of a user may be used to access the signatures and other crucial details about him. Such attributes are referred to as DAEs (Directly Associated Elements) in the text.

We propose a two-phase detection and prevention model that clusters users based on the similarity of their attribute access patterns and the types of queries they perform, i.e. our model tracks the access pattern of each user and classifies it as normal or malicious. The strength of our model lies in its ability to prevent unauthorised retrieval and modification of the most sensitive data elements (CDEs). Our model also makes sure that the query pattern for access of CDEs is specific and fixed for a particular user to avoid data breaches, i.e. the user is associated with his regular access behaviour. Any deviation from the regular arrangement may lower the confidence placed in the user and may indicate malicious intent. The following terminology is used:

Definition 1 (Transaction) A set of queries executed by a user. Each transaction is represented by a unique transaction ID and also carries the user's ID. Hence <Uid, Tid> acts as a unique identification key for each set of query patterns. Each transaction T is denoted as

<Uid, Tid, <q1, q2, … qn>>

where qi denotes the ith query, i ∈ [1 … n].

For example, suppose a user has id 1001. He/she then executes the following set of SQL queries:

    q1: SELECT a, b, c
        FROM R1, R2
        WHERE R1.A > R2.B

    q2: SELECT P
        FROM R5
        WHERE R5.P = 10

Then this is said to be a transaction of the form:

t = <1001, 67, <q1, q2>>

Definition 2 (Query) A query is a standard database management system request for inserting and retrieving data or information from a database table or combination of tables. We define a query as a read or write request on an attribute of the relation. A query is represented as

<O(D1), O(D2), … O(Dn)>

where D1, D2, … Dn ∈ Rs, Rs is the relation schema and the Di are the attributes. O represents the operation, i.e. read or write: O ∈ {R, W}.

For example, examine the following transaction:

    start transaction
    select balance from Account where Account_Number='9001';
    select balance from Account where Account_Number='9002';
    update Account set balance=balance-900 where Account_Number='9001';
    update Account set balance=balance+900 where Account_Number='9002';
    commit;   -- if all SQL queries succeed
    rollback; -- if any SQL query fails or raises an error

The query corresponding to this transaction is:

<<R(Account_Number), R(balance)>,
 <R(Account_Number), R(balance)>,
 <R(Account_Number), R(balance), W(balance)>,
 <R(Account_Number), R(balance), W(balance)>>

Definition 3 (Read Sequence) A read sequence is defined as

{R(x1), R(x2), … O(xn)}

where O represents the operation, i.e. read or write: O ∈ {R, W}. The read sequence represents that the transaction may need to read all data items x1, x2, …, xn-1 before it performs operation O on data item xn.

For example, consider the following update statement in a transaction.

    Update Table1 set x = a + b + c where d = 90;

In this statement, before updating x, the values of a, b, c and d must be read, and then the new value of x is calculated. So <R(a), R(b), R(c), R(d), W(x)> ∈ RS(x), where RS(x) denotes the read sequence set of x.

Definition 4 (Write Sequence) A write sequence is defined as

{O(x1), W(x2), … W(xn)}

where O represents the operation, i.e. read or write: O ∈ {R, W}. The write sequence represents that the transaction may need to write data items x2, …, xn in this order after it operates on data item x1.

For example, consider the following update statements in one transaction.

    Update Table1 set x = a + b + c where a=50;
    Update Table1 set y = x + u where x=60;
    Update Table1 set z = x + w + v where w=80;

In this example, <W(x), W(y), W(z)> is one write sequence of data item x, that is <W(x), W(y), W(z)> ∈ WS(x), where WS(x) denotes the write sequence set of x.

Definition 5 (Read Rules (RR)) Read rules are the association rules generated from read sequences whose confidence is greater than the user-defined threshold (Ψconf). A read rule is represented as

{R(x1), R(x2), …} ⇒ O(x).

For every sequential pattern <R(x1), R(x2), …, R(xn-1), O(xn)> in the read sequence set, generate a read rule of the form {R(x1), R(x2), …, R(xn-1)} ⇒ O(xn). If the confidence of the rule is larger than the minimum confidence (Ψconf), it is added to the answer set of read rules. The rule implies that before operating on xn, we need to read x1, x2, …, xn-1.

For example, the read rule corresponding to the read sequence <R(a), R(b), R(c), R(d), W(x)> is:

{R(a), R(b), R(c), R(d)} ⇒ W(x)

Definition 6 (Write Rules (WR)) Write rules are the association rules generated from write sequences whose confidence is greater than the user-defined threshold (Ψconf). A write rule is represented as

O(x) ⇒ {W(x1), W(x2), …}

For every sequential pattern <O(x), W(x1), W(x2), …, W(xk)> in the write sequence set, generate a write rule of the form O(x) ⇒ {W(x1), W(x2), …, W(xk)}. If the confidence of the rule is larger than the minimum confidence (Ψconf), it is added to the set of write rules. The rule states that after operating on x, data items x1, x2, …, xk must be updated by the same transaction.

For example, the write rule corresponding to the write sequence <W(x), W(y), W(z)> is W(x) ⇒ {W(y), W(z)}.

Definition 7 (Critical Data Elements (CDE)) These are semantically defined data elements crucial to the functioning of the system. They are the data attributes of prime significance, having direct correlation to the integrity of the system. In a vertically hierarchical organisation, these are the attributes accessed only by the top-level management, and access by lower levels of the hierarchy is strictly protected.
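Definitions 3 to 6 can be illustrated with a short sketch. The paper does not spell out how rule confidence is computed, so the `confidence` function below assumes the usual association-rule definition (support of the full pattern divided by support of its antecedent). The tuple encoding of operations, the helper names, and the small transaction log are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List, Sequence, Tuple

Op = Tuple[str, str]  # ("R", "a") stands for R(a); ("W", "x") stands for W(x)


def read_rule(seq: Sequence[Op]) -> Tuple[FrozenSet[Op], Op]:
    """<R(x1), ..., R(xn-1), O(xn)> becomes {R(x1), ..., R(xn-1)} => O(xn)."""
    *reads, last = seq
    if any(op != "R" for op, _ in reads):
        raise ValueError("every item before the last must be a read")
    return frozenset(reads), last


def write_rule(seq: Sequence[Op]) -> Tuple[Op, FrozenSet[Op]]:
    """<O(x), W(x1), ..., W(xk)> becomes O(x) => {W(x1), ..., W(xk)}."""
    first, *writes = seq
    if any(op != "W" for op, _ in writes):
        raise ValueError("every item after the first must be a write")
    return first, frozenset(writes)


def contains(transaction: Sequence[Op], pattern: Sequence[Op]) -> bool:
    """True if `pattern` occurs in `transaction` as an ordered subsequence."""
    it = iter(transaction)
    return all(item in it for item in pattern)


def confidence(antecedent: Sequence[Op], full: Sequence[Op],
               log: List[Sequence[Op]]) -> float:
    """ASSUMPTION: support(full pattern) / support(antecedent)."""
    n_antecedent = sum(contains(t, antecedent) for t in log)
    n_full = sum(contains(t, full) for t in log)
    return n_full / n_antecedent if n_antecedent else 0.0


# Read sequence of "Update Table1 set x = a + b + c where d = 90"
seq = [("R", "a"), ("R", "b"), ("R", "c"), ("R", "d"), ("W", "x")]
# Made-up log: two transactions follow the pattern, one reads y instead.
log = [seq, seq, seq[:-1] + [("R", "y")]]

print(read_rule(seq))
print(write_rule([("W", "x"), ("W", "y"), ("W", "z")]))
print(confidence(seq[:-1], seq, log))
```

A rule would be kept only if the value returned by `confidence` exceeds Ψconf. Under this assumed definition, the rule {R(a), R(b), R(c), R(d)} ⇒ W(x) holds in two of the three logged transactions that contain its antecedent.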
Type of Attribute               Sensitivity Level
Critical Data Elements          Highest
Directly Associated Elements    Medium
Normal Attributes               Low

CDEs are tokens of behaviour that our model uses to recognise malicious activity by users of the system.

Definition 8 (Critical Rules (CR)) The set of rules that contain a Critical Data Element in their antecedent or consequent.

CR = {ζ | (ζ ∈ RR ∨ ζ ∈ WR) ∩ (x ∈ CDE ∩ ({R(x1), R(x2), …} ⇒ O(x) ∪ O(x) ⇒ {W(x1), W(x2), …}))}

We propose a method of user access pattern recognition using the Critical Rules. The CRs recognise the actions and goals of users from a series of observations on the users' actions and the environmental conditions, i.e. the user query pattern associated with the Critical Data Elements.

Definition 9 (Directly Associated Elements (DAE)) The attributes, except those present in CDE, which are part of either the antecedents or the consequents of Critical Rules.

DAE = {μi | μi ∈ CR ∩ μi ∉ CDE}.

The query patterns as perceived by our model QPAFCS are explored using DAEs, which represent the first level of access to the CDEs. A user's behaviour is represented by a set of first-order statements (derived from queries) called the attribute hierarchy, encoded in first-order logic, which defines abstraction, decomposition and functional relationships between types of access arrangements. The unit transactions accessing CDEs are decomposed into an attribute hierarchy comprising DAEs, which in turn represents the user's most sensitive retrieval pattern.

Example:

• R(b) → R(a)
• R(b), R(c) → R(a)

If a is a CDE, then the set {b, c} represents the DAEs.

Definition 10 (Dubiety Score (φ)) A measure of the anomaly exhibited by a user in the past, based on his historic transactional data. This score summarises the user's historic malicious access attempts. The Dubiety Score attempts to quantify the personnel vulnerability that the organisation faces because of a particular user. The Dubiety Score is indicative of the amount of deviation between the user's access pattern and his designated role. The Dubiety Score, combined with the deviation of the user's present query from his normal behaviour pattern, yields the output of the proposed IDS.

For our paper:

0 <= φ <= 1.

The higher the Dubiety Score, the more evidence there is that the user is not following the assigned role, i.e. the greater the malicious intent (rogue behaviour).

Definition 11 (Dubiety Table) A table maintaining the record of the dubiety scores of each user. It contains two attributes: UserID and Dubiety Score.

The initial dubiety scores are set to 1.

Uid     φ
1001    1
1002    1
1003    1
1004    1
1005    1

The dubiety table is updated each time a user performs a query.

For example, let user 1001's deviation from the normal query be quantified as 0.81. Then the updated dubiety table is as shown, where:

ds = deviation from normal query
φi = initial dubiety score

Uid     √(ds ∗ φi)
1001    0.9
1002    1
1003    1
1004    1
1005    1

3.2 Learning Phase

We start our learning phase by reading the training dataset into memory and extracting useful patterns from it. Our system requires a non-malicious training dataset composed of transactions executed by trusted users. The model aims at generating user profiles from the transaction logs, and quantifies
deviation from normal behaviour, i.e. this phase aims to recognise and characterise the user activity pattern on the basis of the arrangement of their queries. The following are the various components of the architecture of the proposed model:

COMPONENTS OF ARCHITECTURE:

Training data: A transaction log is a sequential record of all changes made to the database, while the actual data is contained in a separate file. The transaction log contains enough information to undo all changes made to the data file as part of any individual transaction. The log records the start of a transaction, all the changes considered to be a part of it, and then the final commit or rollback of the transaction. Each database has at least one physical transaction log and one data file that is exclusive to the database for which it was created. Our initial input to the learning phase algorithm is the transaction log, with only authorised and consistent transactions. This data is free of any unauthorised activity and is used to form user profiles, role profiles etc. based on normal user transactions. The logs are scanned, and the following elements are extracted:

a. SQL queries
b. The user executing a given query

SQL query parser: This is a tool that takes SQL queries as input, parses them and produces the sequences (read and write) corresponding to each SQL query as output. The query parser also assigns a unique Transaction ID. The final output consists of three columns: TID (Transaction ID), UID (User ID), and the read and write sequence generated by the parsing algorithm.

As an example, suppose the following transaction performed by user U1001 is examined:

    start transaction
    select balance from Account where Account_Number='9001';
    commit;   -- if all SQL queries succeed
    rollback; -- if any SQL query fails or raises an error

The parser generates a unique Transaction ID, say T1234, and then parses the transaction. The parser finally yields:

<T1234, U1001, <R(Account_number), R(balance)>>

Frequent sequences generator: After the SQL query parser generates the sequences, the generated sequences are pre-processed. Weights are then assigned to data items; for instance, the CDEs are given greater weight than DAEs and other normal attributes. Finally, these pre-processed sequences are given as input to the frequent sequences generator. It uses the PrefixSpan algorithm to generate frequent sequences from the input sequences corresponding to each UID.

Rule generator: The frequent sequences are given as input to the rule generator module, which uses association rule mining to generate read rules and write rules from the frequent sequences.

As an example, suppose the input frequent sequences are:

1. <R(m),R(n),R(o),W(a)>
2. <R(m),R(n),W(o),W(a)>
3. <R(m),W(n),W(o),W(a)>
4. <W(a),R(b),W(o)>
5. <R(a),R(b),R(m),W(a)>
6. <R(a),R(b),W(m),W(b)>

S.No.   Frequent Sequences        Associated Rules
1       <R(m),R(n),R(o),W(a)>     R(m),R(n),R(o) → W(a)
2       <R(m),R(n),W(o),W(a)>     R(m),R(n),W(o) → W(a)
3       <R(m),W(n),W(o),W(a)>     R(m),W(n),W(o) → W(a)
4       <W(a),R(b),W(o)>          W(a),R(b) → W(o)
5       <R(a),R(b),R(m),W(a)>     R(a),R(b),R(m) → W(a)
6       <R(a),R(b),W(m),W(b)>     R(a),R(b),W(m) → W(b)

DAE generator: In our approach, we semantically define a class of data items known as Critical Data Elements or CDEs. These CDEs and rules are given as
input to our DAE (Directly Associated Element) generator, which marks as DAEs all those elements that are present in either the antecedent or the consequent of those rules that involve at least one of the CDEs.

Algorithm 1: DAE Generator
Data: CDE, Set DAE = {}, RR = Set of Read Rules, WR = Set of Write Rules
Result: The set of Directly Associated Elements DAE
Function: DAE Generator (CDE, RR, WR)
    for Ω ∈ RR ∪ WR do
        for α ∈ Ω do
            if α ∈ CDE then
                for β ∈ Ω with β ∉ CDE do
                    DAE ← DAE ∪ {β}
                end
            end
        end
    end

User vector generator: Using the frequent sequences for the given audit period, it generates the user vectors. A user vector is of the form

BID = <UID, w1, w2, w3, ... wn>

where wi = |O(ai)|.

|O(ai)| represents the total number of times the user with the given Uid performs operation O (O ∈ {R, W}) on attribute ai in the pre-decided audit period. An audit period τ refers to a period of time such as one year, a time window τ = [t1, t2] or the most recent 10 months. The user vector is representative of the user's activity.

Each wi represents how frequently a user performs the operation on the particular data item. The vector can also be used in a normalised form, as is done in our proposed model QPAFCS:

UVID = <UID, <p(a1), p(a2), p(a3), … p(an)>>

where

\[ p(a_k) = \frac{w_k}{\sum_{w_j \in B_i} w_j} \]

p(ak) is defined as the probability of accessing the attribute ak. A value of p(ak) close to 1 means that the user accesses the given attribute frequently.

Cluster generator: It takes user vectors and rules as input and generates fuzzy clusters. Users are clustered into different fuzzy clusters based on the similarity of their user vectors. A cluster profile includes

Ci = <CID, {R}>

where CID represents the cluster centroid, and {R} is the set of rules formed by taking the union of all the rules that the members of the given fuzzy cluster abide by.

We have used fuzzy c-means [Dunn, J.C., 1973][32] clustering to create the clusters. Each user belongs to a cluster to a certain degree wij, where wij represents the membership coefficient of the ith user (ui) with the jth cluster.

The centre of a cluster (α) is the mean of all points, weighted by their membership coefficients. Mathematically,

\[ w_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert u_i - \alpha_j \rVert}{\lVert u_i - \alpha_k \rVert} \right)^{\frac{2}{m-1}}} \]

\[ \alpha_k = \frac{\sum_{u} w(u)^m \, u}{\sum_{u} w(u)^m} \]

The objective function that is minimised to create the clusters is defined as [A. Mangalampalli and V. Pudi (2009)][33]:

\[ \arg\min \sum_{i=1}^{n} \sum_{j=1}^{C} w_{ij}^{m} \, \lVert u_i - \alpha_j \rVert^2 \]

where
n is the total number of users,
C is the number of clusters, and
m is the fuzzifier.

The dissimilarity/distance function used in the formation of the fuzzy clusters is the modified Jensen-Shannon distance [Fuglede, Bent; Topsøe, 2004][31], which is defined as follows.

Given two user vectors
UVx = <Ux, <px(a1), px(a2), px(a3), … px(an)>> and
UVy = <Uy, <py(a1), py(a2), py(a3), … py(an)>>
of equal length n, the modified Jensen-Shannon distance is computed as

\[ D(UV_x \,\|\, UV_y) = \sum_{i=1}^{n} \frac{1}{2} \left[ (1 + p_x(a_i)\, w(a_i)) \log_2 \frac{1 + p_x(a_i)\, w(a_i)}{1 + p_y(a_i)\, w(a_i)} + (1 + p_y(a_i)\, w(a_i)) \log_2 \frac{1 + p_y(a_i)\, w(a_i)}{1 + p_x(a_i)\, w(a_i)} \right] \]

where w(ai) is the semantic weight associated with the ith attribute ai.

User profile generator: This module takes user vectors and the cluster profiles as input and generates user profiles. A user profile is of the form

Ui = <UID, <p(a1), p(a2), p(a3), … p(an)>, <c1, c2, … cC>>

where

UID is a unique ID given to each user,

<p(a1), p(a2), p(a3), … p(an)> is a 1-D matrix containing the probability of the user accessing a particular attribute, and

<c1, c2, … cC> is a vector representing the membership coefficients of the given user for the C different clusters.

As an example, consider a system with 4 fuzzy clusters and 4 attributes. The following table illustrates the profile of user U1001.

Inputs                                                   Output
C1    C2    C3    C4    User Vector                      User Profile
0.2   0.2   0.2   0.4   <U1001, 0.2, 0.109, 0.9, 0.6>    <U1001, <0.2, 0.1, 0.9, 0.6>, <0.2, 0.2, 0.2, 0.4>>

3.3 Testing Phase

Section 3.2 described the learning phase, in which the system is trained using non-malicious (benign) transactions. The trained model can now be used to detect malicious transactions. In this phase, a test query is obtained as input and compared with the model's perception of the user's access pattern, and the model continually evaluates whether the test transaction is malicious. It is first checked whether the user is trying to access a CDE. If yes, the transaction is allowed only if the given user has accessed that CDE before. Next, it is checked whether any DAE is being accessed. A user can perform a write operation on a DAE iff it was previously written by the same user; otherwise the transaction is termed malicious. Next, we check whether the transaction abides by the rules that are generally followed by similar users.

MODULES OF THE TESTING PHASE:

Rule generator: This module takes the sequence generated by the SQL query parser and gives the rule that the input transaction follows. This can be a read rule or a write rule, and it indicates the operations done by the user, the data attributes accessed by the user and the order in which they are accessed. This rule can now be checked for maliciousness.

Fig. Architecture of Testing Phase.

CDE Detector: The semantically critical elements, referred to in our approach as CDEs, are detected in this module. The read/write rule corresponding to the incoming transaction is checked for the presence of CDEs. If the rule being checked for maliciousness contains a CDE, then it is dealt with using the following policy:

a. If a read operation has been performed on any CDE, i.e. r(CDE) is present in the rule, and UV[i][r(CDE)] = 0 and UV[i][w(CDE)] = 0 for the given user, then the transaction is termed malicious.
b. If a write operation has been performed on any CDE, i.e. w(CDE) is encountered, and
UV[i][w(CDE)] = 0 for the given user, then the transaction is termed malicious.

Algorithm 2: CDE Detector
Data: Set of rules (ϒ) from test transaction, Set χCDE, UID, User Profile (ϴ)
Result: Checks whether the test transaction is malicious or normal with respect to CDE
    for Ѓ ∈ ϒ do
        for ϱ ∈ Ѓ do
            if ϱ ∈ χCDE then
                if w(ϱ) ∈ Ѓ & ϴ[UID][w(ϱ)] == 0 then
                    Raise Alarm;
                end
                if r(ϱ) ∈ Ѓ & ϴ[UID][r(ϱ)] == 0 & ϴ[UID][w(ϱ)] == 0 then
                    Raise Alarm;
                end
            end
        end
    end

DAE Detector: This module addresses the issue of inference attacks on CDEs. As discussed earlier, certain data elements can be used to access the CDEs, i.e. first-order inference. This module uses the rules mined in the learning phase to determine which elements can be used to directly infer the DAEs.

Algorithm 3: DAE Detector
Data: Set of rules (ϒ) from test transaction, Set χDAE, UID, User Profile (ϴ)
Result: Checks whether the test transaction is malicious or normal with respect to DAE
    for Ѓ ∈ ϒ do
        for ϱ ∈ Ѓ do
            if ϱ ∈ χDAE then
                if w(ϱ) ∈ Ѓ & ϴ[UID][w(ϱ)] == 0 then
                    Raise Alarm;
                end
            end
        end
    end

Our system seeks to prevent inference attacks by paying special attention to the DAEs. We lay emphasis on write operations on DAEs. If a write operation has been performed on any DAE, i.e. w(DAE) is present in the rule to be checked, and UV[i][w(DAE)] = 0 for the given user, then the transaction is termed malicious.

Dubiety Score Calculator and Analyser: If the transaction has not been found malicious in the previous two modules, we check whether the transaction is malicious based on the previous history of the user and the behaviour pattern of all similar users (modified Jensen-Shannon distance). To do so, we maintain a record of the actions of all users by keeping the measure of the Dubiety Score (φi).

The deviation of a user's new transaction from his normal access pattern is referred to as Dubiety, and the relative measure of Dubiety is the Dubiety Score. Our IDS keeps a log of the DS (Dubiety Score) in a separate table. A user who is a potential threat tends to have a high dubiety score. Another intuition that our system follows is that any transaction a user makes matches significantly with the transactions that either the same user or similar users have made in the past.

We use a measure ds to keep track of the maximum similarity of the given rule. We combine ds with φi to get the final measure of the dubiety score φf for the given user. We define two thresholds, ФLT and ФUT. ФUT represents the upper limit for the dubiety score of a non-malicious user, whereas ФLT denotes the lower limit. This means that if φf for a user comes out to be greater than ФUT, the user is malicious. On the other hand, a φf value less than ФLT denotes a benign user.

• If the incoming rule (R1) is a write rule, then the consequent of the incoming rule is matched with the corresponding rules in the clusters of which the user is a part. A user is said to be part of the ith cluster iff:
  μi > 𝛿
  where μi is the fuzzy membership coefficient of the given user for the ith cluster and 𝛿 is a user-defined threshold.
• If the incoming rule (R1) is a read rule, then the antecedent of the incoming rule is matched with the corresponding rules in the clusters of which the user is a part.
• In order to quantitatively measure the similarity between two rules, we use the modified Jaccard distance [34]:
\[ JD = 1 - \frac{\delta_1 \, |R_1 \cap R_2| - \delta_2 \, (|R_1 \cup R_2| - |R_1 \cap R_2|)}{|R_1 \cup R_2|} \]

∀ R2 | μi > 𝛿 and i ∈ [1, k]

Algorithm 4: Modified Jaccard Distance
Data: Rules R1, R2; 𝛿1, 𝛿2; Set χR1, χR2
Result: Distance between the two rules (Ԏ)
Function jcDistance (R1, R2)
    for Ω ∈ R1 do
        χR1 ← χR1 ∪ {Ω};
    end
    for Ω' ∈ R2 do
        χR2 ← χR2 ∪ {Ω'};
    end
    Ԏ = 1 - (𝛿1 ∗ |χR1 ∩ χR2| - 𝛿2 ∗ (|χR1 ∪ χR2| - |χR1 ∩ χR2|)) / |χR1 ∪ χR2|;
    return Ԏ;

• The minimum value of JD is regarded as ds. φi is fetched directly from the dubiety table. The final dubiety score for the given user is calculated as:
  φf = √(ds ∗ φi)
• If φf < ФLT, the transaction is termed non-malicious. In this case, the current dubiety score in the dubiety table for the given user is reduced by a factor known as the "amelioration factor (Å)". Thus, φi is updated as
  φi = Å φi
• If ФUT > φf ≥ ФLT, the transaction is termed non-malicious and the dubiety table entry for the given user is updated with φf.
• If φf ≥ ФUT, the transaction is termed malicious.
• As an example, let the initial dubiety table be:

Uid     φ
1001    0.9
1002    0.8
1003    0.2
1004    0.6
1005    0.7

Let the minimum value of ds corresponding to each user be:

Uid     ds
1001    0.2
1002    0.3
1003    0.2
1004    0.6
1005    0.3

The calculated dubiety score table:

Uid     φf = √(ds ∗ φi)
1001    0.42
1002    0.49
1003    0.2
1004    0.6
1005    0.46

Taking ФLT = 0.3 and ФUT = 0.6:

Uid     φf      Nature of Transaction   Updated φf
1001    0.42    Non-malicious           0.42
1002    0.49    Non-malicious           0.49
1003    0.2     Non-malicious           0.198
1004    0.6     Malicious               0.6
1005    0.46    Non-malicious           0.46

4. Discussion

With regard to a typical credit card company dataset, some examples of critical data elements (CDEs) are:

1. CVV (denoted by a)

Card verification value (CVV) is a combination of features used in credit, debit and automated teller machine (ATM) cards for the purpose of establishing the owner's identity and minimizing the risk of fraud. The CVV is also known as the card verification code (CVC) or card security code (CSC).

When properly used, the CVV is highly effective against some forms of fraud. For example, if the data
in the magnetic stripe is changed, a stripe reader will indicate a "damaged card" error. The flat-printed CVV is (or should be) routinely required for telephone or Internet-based purchases because it implies that the person placing the order has physical possession of the card. Some merchants check the flat-printed CVV even when transactions are conducted in person.

CVV technology cannot protect against all forms of fraud. If a card is stolen or the legitimate user is tricked into divulging vital account information to a fraudulent merchant, unauthorized charges against the account can result. A common method of stealing credit card data is phishing, in which a criminal sends out legitimate-looking email in an attempt to gather personal and financial information from recipients. Once the criminal has possession of the CVV in addition to personal data from a victim, widespread fraud against that victim, including identity theft, can occur.

The following are directly associated elements (DAEs) of the CVV:

a. Credit card number (denoted by b)
b. Name of card holder (denoted by c)
c. Card expiry date (denoted by d)

Credit card number, name of card holder and card expiry date are elements that are read before the CVV and are hence used to validate the CVV entered by the user. The above-mentioned attributes have therefore been classified as DAEs by our system.

Some normal data attributes are:

1. Gender of customer (denoted by e)
2. Credit limit (denoted by f)
3. Customer's phone number (denoted by g)

These are attributes that have been collected for fraud detection and are not directly used to access the CDE, but are crucial for the process.

Some examples of transactions for our proposed approach:

• R(b) → R(a)
• R(b), R(c) → R(a)

5. Example to our Approach

1. JC Distance

R1: R(c), R(b) → R(a)

R2: R(d), R(b) → R(a)

The modified JC Distance between R1 and R2, where the hyperparameters are 𝛿1 = 0.70 and 𝛿2 = 0.20, is calculated as

\[ \text{JC Distance} = 1 - \frac{\delta_1 \, |R_1 \cap R_2| - \delta_2 \, (|R_1 \cup R_2| - |R_1 \cap R_2|)}{|R_1 \cup R_2|} \]

|R1 ∩ R2| = 2

|R1 ∪ R2| = 4

JC Distance = 1 - (0.70 ∗ 2 - 0.20 ∗ (4 - 2)) / 4 = 0.75

2. User Profile Vector

B1 = <U1, <0.7, 0.1, 0.6, 0.2, 0.4, 0.0, 0.2, 0.0>, <0.2, 0.3, 0.1, 0.2, 0.167, 0.033>>

Here the values in the second tuple <0.7, … 0.0> represent the probability of user U1 accessing particular attributes; for instance, 0.7 denotes that there is a 70% probability that U1 accesses the first attribute. The values in the third tuple represent the membership of user U1 in the various (k) fuzzy clusters, of which there are 6 in our case.

3. Dubiety Score

Suppose the Dubiety Score φi for user U1 is 0.8 and the JC Distance of the test transaction with its cluster is 0.6. Then,

φf = √(ds ∗ φi)
φf = √(0.6 ∗ 0.8) = 0.69

Setting our hyperparameter ФUT to 0.65, we observe that φf > ФUT. Hence the test transaction is malicious, and an alarm is raised.
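The worked example in Section 5 can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, rules are flattened into plain sets of operation strings, and two values are assumptions because Section 5 does not state them, namely ФLT = 0.3 (borrowed from the example in Section 3.3) and an amelioration factor Å = 0.99 (consistent with the 0.2 to 0.198 update shown in that example's table).

```python
import math


def rule_items(antecedent, consequent):
    """Flatten a rule such as R(c), R(b) -> R(a) into a set of operations."""
    return set(antecedent) | {consequent}


def modified_jaccard_distance(r1, r2, delta1, delta2):
    """JD = 1 - (d1*|R1 n R2| - d2*(|R1 u R2| - |R1 n R2|)) / |R1 u R2|."""
    inter = len(r1 & r2)
    union = len(r1 | r2)
    return 1 - (delta1 * inter - delta2 * (union - inter)) / union


def final_dubiety(ds, phi_i):
    """phi_f = sqrt(ds * phi_i)."""
    return math.sqrt(ds * phi_i)


def classify(phi_f, phi_i, phi_lt, phi_ut, amelioration):
    """Return (label, new dubiety-table entry) for the three threshold cases."""
    if phi_f >= phi_ut:
        # ASSUMPTION: the text does not say how the table entry changes for
        # a malicious transaction, so this sketch leaves it untouched.
        return "malicious", phi_i
    if phi_f >= phi_lt:
        return "non-malicious", phi_f
    return "non-malicious", amelioration * phi_i


# 1. JC Distance between R1: R(c), R(b) -> R(a) and R2: R(d), R(b) -> R(a)
r1 = rule_items(["R(c)", "R(b)"], "R(a)")
r2 = rule_items(["R(d)", "R(b)"], "R(a)")
print(round(modified_jaccard_distance(r1, r2, 0.70, 0.20), 2))  # 0.75

# 3. Dubiety Score with ds = 0.6 and phi_i = 0.8
phi_f = final_dubiety(ds=0.6, phi_i=0.8)
print(round(phi_f, 2))  # 0.69

# phi_lt and amelioration are assumed values, see the note above.
label, _ = classify(phi_f, 0.8, phi_lt=0.3, phi_ut=0.65, amelioration=0.99)
print(label)  # malicious
```

Keeping the three threshold cases in a single function makes it easy to replay the dubiety table from Section 3.3 row by row and compare the results with the published values.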
