Large organisations deal with tremendous amounts of data whose security is of prime interest. The data in databases comprises attributes describing real-life objects called entities. The attributes have varying levels of sensitivity, i.e. not all attributes are equally important to the integrity of the database. For example, signatures and other biometric data are highly sensitive attributes for a financial organisation such as a bank, in comparison to others like name and gender. Unauthorised access to these crucial attributes is therefore of greater concern. Only certain employees may have access to such data elements, and access by all others must be blocked instantaneously to ensure confidentiality and consistency of data.

Our proposed model QPAFCS (Query Pattern Access and Fuzzy Clustering System) pays special attention to sensitive data attributes, which are referred to as CDE (Critical Data Elements) in the text. The attributes that can be used to indirectly infer CDEs are also critical to the functioning of the organisation. For instance, the account number of a user may be used to access the signatures and other crucial details about that user. Such attributes are referred to as DAE (Directly Associated Elements) in the text.

We propose a two-phase detection and prevention model that clusters users based on the similarity of their attribute access patterns and the types of queries they perform, i.e. our model tracks the access pattern of each user and classifies it as normal or malicious. The strength of our model lies in its ability to prevent unauthorised retrieval and modification of the most sensitive data elements (CDEs). Our model also ensures that the query pattern for access to CDEs is specific and fixed for a particular user in order to avoid data breaches, i.e. the user is associated with his or her regular access behaviour. Any deviation from the regular pattern may lower the confidence in the user and may indicate malicious intent. The following terminology is used:

Definition 1 (Transaction) A set of queries executed by a user. Each transaction is represented by a unique transaction ID and also carries the user's ID. Hence <Uid,Tid> acts as a unique identification key for each set of query patterns. Each transaction T is denoted as

<Uid, Tid, <q1, q2, … qn>>

where qi denotes the ith query, i ∈ [1 … n].

For example, suppose a user has id 1001 and executes the following set of SQL queries:

q1: SELECT a,b,c
    FROM R1,R2
    WHERE R1.A>R2.B

q2: SELECT P
    FROM R5
    WHERE R5.P=10

Then this is said to be a transaction of the form:

t=<1001,67,<q1,q2>>

Definition 2 (Query) A query is a standard database management system token/request for inserting and retrieving data or information from a database table or combination of tables. We define a query as a read or write request on an attribute of the relation. A query is represented as

<O(D1), O(D2), … O(Dn)>

where

D1, D2, … Dn ∈ Rs

Rs is the relation schema and Di are the attributes. O represents the operation, i.e. read or write: O ∈ {R, W}.

For example, examine the following transaction:

start transaction
select balance from Account where Account_Number='9001';
select balance from Account where Account_Number='9002';
update Account set balance=balance-900 where Account_Number='9001';
update Account set balance=balance+900 where Account_Number='9002';
commit;   -- if all SQL queries succeed
rollback; -- if any SQL query fails or raises an error

The queries corresponding to this transaction are represented as:
<<R(Account_Number),R(balance)>,
<R(Account_Number),R(balance)>,
<R(Account_Number),R(balance),W(balance)>,
<R(Account_Number),R(balance),W(balance)>>

Definition 3 (Read Sequence) A read sequence is defined as

{R(x1), R(x2), … O(xn)}

where O represents the operation, i.e. read or write: O ∈ {R, W}. The read sequence represents that the transaction may need to read all data items x1, x2, …, xn-1 before it performs operation O on data item xn.

For example, consider the following update statement in a transaction.

Update Table1 set x = a + b + c where d = 90;

In this statement, before updating x, the values of a, b, c and d must be read, and then the new value of x is calculated. So <R(a), R(b), R(c), R(d), W(x)> is a read sequence of data item x.

Definition 4 (Write Sequence) A write sequence is defined as

{O(x1), W(x2), … W(xn)}

where O represents the operation, i.e. read or write: O ∈ {R, W}. The write sequence represents that the transaction may need to write data items x2, …, xn in this order after it operates on data item x1.

For example, consider the following update statements in one transaction.

Update Table1 set x = a + b + c where a=50;
Update Table1 set y = x + u where x=60;
Update Table1 set z = x + w + v where w=80;

Using the above example, it can be noted that <W(x), W(y), W(z)> is one write sequence of data item x, that is, <W(x), W(y), W(z)> ∈ WS(x), where WS(x) denotes the write sequence set of x.

Definition 5 (Read Rules (RR)) Read rules are the association rules generated from read sequences whose confidence is greater than the user-defined threshold (Ψconf). A read rule is represented as

{R(x1), R(x2) ...} ⇒ O(x).

For all sequential patterns <R(x1), R(x2), …, R(xn-1), O(xn)> in the read sequence set, generate the read rules with the format {R(x1), R(x2) ...} ⇒ O(xn). If the confidence of the rule is larger than the minimum confidence (Ψconf), it is added to the answer set of read rules, which implies that before operating on xn, we need to read x1, x2, …, xn-1.

For example, the read rule corresponding to the read sequence <R(a), R(b), R(c), R(d), W(x)> is:

{R(a), R(b), R(c), R(d)} ⇒ W(x)

Definition 6 (Write Rules (WR)) Write rules are the association rules generated from write sequences whose confidence is greater than the user-defined threshold (Ψconf). A write rule is represented as

O(x) ⇒ {W(x1), W(x2) ...}.

For all sequential patterns <O(x), W(x1), W(x2), …, W(xk)> in the write sequence set, generate the write rules with the format O(x) ⇒ {W(x1), W(x2), …, W(xk)}. If the confidence of the rule is larger than the minimum confidence (Ψconf), it is added to the set of write rules, which implies that after updating x, data items x1, x2, …, xk must be updated by the same transaction.

For example, the write rule corresponding to the write sequence <W(x), W(y), W(z)> is W(x) ⇒ {W(y), W(z)}.

Definition 7 (Critical Data Elements (CDE)) These are semantically defined data elements crucial to the functioning of the system. They are the data attributes of prime significance, having a direct correlation to the integrity of the system. In a vertically hierarchical organisation, these are the attributes accessed only by the top-level management, and access by lower levels of the hierarchy is strictly protected.
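As a rough illustration of how read rules (Definition 5) could be filtered by the threshold Ψconf, the sketch below counts read sequences and keeps the rules whose confidence exceeds a minimum value. The text does not spell out how confidence is computed, so the sketch assumes the usual association-rule ratio: the number of sequences with a given read set that end in the consequent, divided by the number of sequences with that read set. The function name `mine_read_rules`, the string encoding of operations, and the sample audit data are all hypothetical.

```python
from collections import Counter

def mine_read_rules(read_sequences, min_conf):
    """Return {(antecedent, consequent): confidence} for rules above min_conf.

    Each read sequence is a tuple such as ("R(a)", "R(b)", "W(x)"):
    every element but the last is a read, the last is the operation O(xn).
    ASSUMPTION: confidence = count(antecedent with consequent) / count(antecedent).
    """
    antecedent_counts = Counter()
    rule_counts = Counter()
    for seq in read_sequences:
        *reads, consequent = seq
        antecedent = frozenset(reads)
        antecedent_counts[antecedent] += 1
        rule_counts[(antecedent, consequent)] += 1

    rules = {}
    for (antecedent, consequent), count in rule_counts.items():
        confidence = count / antecedent_counts[antecedent]
        if confidence > min_conf:
            rules[(antecedent, consequent)] = confidence
    return rules

# Hypothetical audit data: three sequences follow the example rule, one does not.
sequences = [
    ("R(a)", "R(b)", "R(c)", "R(d)", "W(x)"),
    ("R(a)", "R(b)", "R(c)", "R(d)", "W(x)"),
    ("R(a)", "R(b)", "R(c)", "R(d)", "W(x)"),
    ("R(a)", "R(b)", "R(c)", "R(d)", "W(y)"),
]
rules = mine_read_rules(sequences, min_conf=0.5)
for (antecedent, consequent), confidence in rules.items():
    print(sorted(antecedent), "=>", consequent, confidence)
```

With a minimum confidence of 0.5, only {R(a), R(b), R(c), R(d)} ⇒ W(x) is kept, with confidence 0.75. Write rules could be filtered the same way with the roles of antecedent and consequent swapped.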
<T1234,U1001,<R(Account_number),R(balance)>>

where wi = |O(ai)|.

|O(ai)| represents the total number of times the user with the given Uid performs operation O (O ∈ {R, W}) on attribute ai in the pre-decided audit period. An audit period τ refers to a period of time such as one year, a time window τ = [t1, t2], or the most recent 10 months. The user vector is representative of the user's activity.

Each wi represents how frequently a user performs the operation on the particular data item. It can also be used in a normalized form, as is done in our proposed model QPAFCS:

UVID = <UID, <p(a1), p(a2), p(a3), … p(an)>>

where

$$p(a_k) = \frac{w_k}{\sum_{w_j \in B_i} w_j}$$

p(ak) is defined as the probability of accessing the attribute ak. A value of p(ak) close to 1 means that the user accesses the given attribute frequently.

Cluster generator: It takes user vectors and rules as input and generates fuzzy clusters. Users are …

$$\alpha_k = \frac{\sum_u w(u)^m \, u}{\sum_u w(u)^m}$$

The objective function that is minimized to create clusters is defined as [A. Mangalampalli and V. Pudi (2009)][33]:

$$\arg\min \sum_{i=1}^{n} \sum_{j=1}^{C} w_{ij}^{m} \, \lVert u_i - \alpha_j \rVert^2$$

where
n is the total number of users,
C is the number of clusters, and
m is the fuzzifier.

The dissimilarity/distance function used in the formation of fuzzy clusters is the modified Jensen-Shannon distance [Fuglede, Bent; Topsøe, 2004][31], which is defined as follows. Given two user vectors

UVx = <Ux, <px(a1), px(a2), px(a3), … px(an)>> and
UVy = <Uy, <py(a1), py(a2), py(a3), … py(an)>>
of equal length n, the modified Jensen-Shannon distance is computed as

$$D(UV_x \,\|\, UV_y) = \sum_{i=1}^{n} \frac{(1 + p_x(a_i)\,w(a_i)) \log_2 \frac{1 + p_x(a_i)\,w(a_i)}{1 + p_y(a_i)\,w(a_i)} + (1 + p_y(a_i)\,w(a_i)) \log_2 \frac{1 + p_y(a_i)\,w(a_i)}{1 + p_x(a_i)\,w(a_i)}}{2}$$

where w(ai) is the semantic weight associated with the ith attribute ai.

Ui = <UID, <p(a1), p(a2), p(a3) … p(ak)>, <c1, c2, … cC>>

where …

Inputs                                           Outputs
C1   C2   C3   C4   User Vector                  User profile
0.2  0.2  0.2  0.4  <U1001,0.2,0.109,0.9,0.6>    <U1001,<0.2,0.1,0.9,0.6>,<0.2,0.2,0.2,0.4>>

Consider a system with 4 fuzzy clusters and 4 attributes. The table above illustrates the profile of user U1001.

3.3 Testing Phase

Section 3.2 describes the learning phase, in which the system is trained using non-malicious (benign) transactions. The trained model can now be used to detect malicious transactions. In this phase, a test query is obtained as input and compared with the model's perception of the user's access pattern, and the model evaluates whether the test transaction is malicious. It is first checked whether the user is trying to access a CDE. If yes, the transaction is allowed only if the given user has accessed that CDE before. Next, it is checked whether any DAE is being accessed. A user can perform a write operation on a DAE iff it was previously written by the same user; otherwise the transaction is termed malicious. Next, we check whether the transaction abides by the rules that are generally followed by similar users.

MODULES OF THE TESTING PHASE:

Rule generator: This module takes the sequence generated by the SQL query parser and gives the rule that the input transaction follows. This can be a read rule or a write rule, and it indicates the operations done by the user, the data attributes accessed by the user, and the order in which they are accessed. This rule can now be checked for maliciousness.

CDE Detector: The semantically critical elements, referred to in our approach as CDEs, are detected in this module. The read/write rule corresponding to the incoming transaction is checked for the presence of CDEs. If the rule being checked for maliciousness contains a CDE, it is dealt with using the following policy:
a. If a read operation has been performed on any CDE, i.e. r(CDE) is present in the rule, and UV[i][r(CDE)] = 0 and UV[i][w(CDE)] = 0 for the given user, then the transaction is termed malicious.
b. If a write operation has been performed on any CDE, i.e. w(CDE) is encountered and
UV[i][w(CDE)] = 0 for the given user, then the transaction is termed malicious.

Our system seeks to prevent inference attacks by specifically monitoring the DAEs, with emphasis on write operations. If a write operation has been performed on any DAE, i.e. w(DAE) is present in the rule to be checked, and UV[i][w(DAE)] = 0 for the given user, then the transaction is termed malicious.

In order to quantitatively measure the similarity between two rules, we use the modified Jaccard distance [34]:
$$JD = 1 - \frac{\delta_1 (R_1 \cap R_2) - \delta_2 (R_1 \cup R_2 - R_1 \cap R_2)}{R_1 \cup R_2}$$

Let the minimum value of ds corresponding to each user be:
R2 | μi > δ and i ∈ [1, k]
R(b) → R(a)
R(b), R(c) → R(a)
1. JC Distance
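The CDE and DAE policy of the testing phase can be sketched as a small check over the operations in an incoming rule. This is a minimal sketch under stated assumptions, not the authors' implementation: the user vector UV[i] is modelled as a dictionary mapping (operation, attribute) pairs to access counts, and the names `violates_policy`, `cdes`, `daes`, and the sample attributes are hypothetical.

```python
def violates_policy(rule_ops, user_vector, cdes, daes):
    """Return True if the rule breaks the CDE/DAE policy for this user.

    rule_ops:    list of (op, attribute) pairs, op is "R" or "W"
    user_vector: dict mapping (op, attribute) -> access count from the audit
                 period (ASSUMED layout for UV[i]); missing keys count as 0
    cdes, daes:  sets of attribute names treated as CDEs and DAEs
    """
    for op, attr in rule_ops:
        reads = user_vector.get(("R", attr), 0)
        writes = user_vector.get(("W", attr), 0)
        if attr in cdes:
            # Policy a: reading a CDE the user has never read or written.
            if op == "R" and reads == 0 and writes == 0:
                return True
            # Policy b: writing a CDE the user has never written.
            if op == "W" and writes == 0:
                return True
        elif attr in daes:
            # DAE policy: writing a DAE the user has never written.
            if op == "W" and writes == 0:
                return True
    return False

# Hypothetical profile: the user has read balance and written signature before.
uv = {("R", "balance"): 5, ("W", "signature"): 1}
cdes = {"signature"}
daes = {"Account_Number"}

print(violates_policy([("R", "balance")], uv, cdes, daes))         # False
print(violates_policy([("R", "signature")], uv, cdes, daes))       # False
print(violates_policy([("W", "Account_Number")], uv, cdes, daes))  # True
print(violates_policy([("R", "signature")], {}, cdes, daes))       # True
```

A rule that passes this check would still need to be compared against the rules followed by similar users, which this sketch does not cover.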