
(IJCNS) International Journal of Computer and Network Security, Vol. 2, No. 6, June 2010

Clustering Based Machine Learning Approach for Detecting Database Intrusions in RBAC Enabled Databases

Udai Pratap Rao (1), G. J. Sahani (2), Dhiren R. Patel (3)

(1) Dept. of Computer Engineering, S.V. National Institute of Technology Surat, Gujarat, INDIA
upr@coed.svnit.ac.in
(2) Dept. of Computer Engineering, SVIT, Vadodara, Gujarat, INDIA
gurcharan_sahani@yahoo.com
(3) Dept. of Computer Science & Engineering, Indian Institute of Technology Gandhinagar, Ahmedabad, Gujarat, INDIA
dhiren@iitgn.ac.in

Abstract: Database security is an important issue for any organization. Data stored in databases is very sensitive and hence must be protected from unauthorized access and manipulation. Database management systems provide a number of mechanisms to stop unauthorized access to the database, but intelligent hackers are able to break the security of database systems. Most database systems are vulnerable, or the environment in which the database system resides may be vulnerable, and people who know such vulnerabilities can easily gain access to the database. Unauthorized suspicious activities can be trapped by database management systems, but some authorized users can violate the security constraints. Traditional database mechanisms are not sufficient to handle such attacks. Early detection of any authorized or unauthorized access to the database is very important for database recovery and for limiting the loss that can occur due to manipulation of data. There are a number of intrusion detection systems (IDSs) that detect intrusions in network systems, but these IDSs cannot detect database intrusions. Very few IDS mechanisms for databases have been proposed. Here we propose an unsupervised machine learning approach for database intrusion detection in databases enabled with a Role Based Access Control (RBAC) mechanism.

Keywords: Database Security, Clustering Technique, Malicious Transactions

1. Introduction

Databases not only allow the efficient management and retrieval of huge amounts of data, but also provide mechanisms that can be employed to ensure the integrity of the stored data. Data in these databases may range from credit card numbers to personal information such as medical records. Unauthorized access to or modification of such data results in big losses to customers, so database security has become an important issue for most organizations. Recently a number of database attack incidents have occurred and a number of customer records were stolen. Most of the attacks occurred because of bad coding of database applications or exploitation of database system vulnerabilities. Web applications are the main sources of database attacks. Attackers may attack databases for several reasons, and they may devise newer techniques of database attack over time. In today's network environment it is necessary to protect our data from attackers. Database attacks are mainly of two types: 1) intentional unauthorized attempts to access or destroy private data; 2) malicious actions executed by authorized users to cause loss or corruption of critical data.

Although a number of approaches are available to detect unauthorized attempts to access data, attackers succeed in attacking the system because of its vulnerabilities. As database security mechanisms are not designed primarily to detect intrusions, there are many cases where the execution of malicious sequences of SQL commands (transactions) cannot be detected. It therefore becomes necessary to employ an intrusion detection system [1]. In case a computer system is compromised, early detection is the key to recovering lost or damaged data without much complexity. When an attacker or a malicious user updates the database, the resulting damage can spread very quickly to other parts of the database.

An Intrusion Detection System (IDS) provides good protection from attacks aimed at taking down access to the network, such as Distributed Denial of Service attacks and TCP SYN Flood attacks. But such systems cannot detect malicious database activity done by users.

In recent years, researchers have proposed a variety of approaches for increasing intrusion detection efficiency and accuracy [2]-[5]. Most of these efforts concentrated on detecting intrusions at the network or operating system level; very few ID mechanisms have been specifically tailored to database systems. The network- and OS-level mechanisms are not capable of detecting malicious data corruption. So, reasonable effort is required in the area of database intrusion detection systems. Intrusion detection systems determine the normal behavior of users accessing the database; any deviation from such behavior is treated as an intrusion. There are mainly two models of intrusion detection system, namely anomaly detection and misuse detection. The anomaly detection model bases its decision on the profile of a user's normal behavior. It analyzes a user's current session and compares it with the profile representing his normal behavior. An alarm is raised if significant deviation is found
during the comparison of session data and the user's profile. This type of system is well suited to the detection of previously unknown attacks. Its main disadvantage is that it may not be able to describe what the attack is and may sometimes have a high false positive rate. In contrast, a misuse detection model makes its decision by comparing the user's session or commands with the rules or signatures of attacks previously used by attackers.

We present an unsupervised machine learning approach for database intrusion detection in databases enabled with a role based access control (RBAC) mechanism. This means that a number of roles have been defined and assigned to the users of the database system. Keeping database security in view, proper privileges are assigned to these roles.

The rest of this paper is organized as follows. In section 2, we discuss related background. In section 3, a detailed overview of our approach is given. In section 4, the analysis and results of our approach are presented. Finally, in section 5 we conclude, with the references at the end.

2. Related Work

Application of machine learning techniques to database security is an emerging area of research. There are various approaches that use machine learning/data mining techniques to enhance the traditional security mechanisms of databases.

Bertino et al. [6] have proposed a framework based on anomaly detection techniques to detect malicious behavior of database application programs. Association rule mining techniques are used to determine the normal behavior of application programs, and query traces from database logs are used for this purpose. This scheme may suffer from high detection overhead when there is a large number of distinct template queries, i.e. the number of association rules to be maintained will be large.

DEMIDS is a misuse-detection system tailored for relational database systems [7]. It uses audit log data to derive profiles describing typical patterns of access by database users. The main drawback of the approach presented in [7] is a lack of implementation and experimentation. The approach has only been described theoretically, and no empirical evidence has been presented of its performance as a detection mechanism.

Yi Hu and Brajendra Panda proposed a data mining approach [8] for intrusion detection in database systems. This approach determines the data dependencies among the data items in the database system. Read and write dependency rules are generated to detect intrusion. The approach is novel, but its scope is limited to detecting malicious behavior in user transactions. Within that as well, it is limited to user transactions that conform to the read-write patterns assumed by the authors. Also, the system is not able to detect malicious behavior in individual read-write commands. The false alarm rate may also be higher, and the approach does not hold up well for different access roles.

Sural et al. [9] have presented an approach for extracting dependencies among attributes of a database using weighted sequence mining. They take the sensitivity of data items into consideration in the form of weights. The advantage of this approach is that more rules are generated compared to the approach presented in [8], and the additional rules reduce false alarms. But it is also not well suited to role based database access.

Kamra et al. [10] have proposed a role based approach for detecting malicious behavior in RBAC (role based access control) administered databases. A classification technique is used to deduce role profiles of normal user behavior. An alarm is raised if the role estimated by classification for a given user is different from the actual role of that user. The approach is well suited to databases that employ a role based access control mechanism. It also addresses the insider threat scenario directly. The limitation of this approach is that it is query-based and cannot extract correlations among the queries in a transaction.

3. Our Approach

The approach we present is a transaction level approach. Attributes referred to together for read and write operations in transactions play an important role in defining the normal behavior of a user's activities.

For example, consider the following transaction:

    Begin transaction
    select a1, a2, a3 from t1 where a1 = 25;
    update t2 set a4 = a2 + 1.2 * a3;
    End transaction

Where t1 and t2 are tables of the database, a1, a2, a3 are the attributes of table t1, and a4, a5 are the attributes of table t2.

This example shows the correlation between the two queries of the transaction. It states that after issuing the select query, the update query should also be issued by the same user and in the same transaction. The approach presented in [10] can easily detect the attributes which are to be referred to together, but it cannot detect the queries which are to be executed together. Our approach extracts this correlation among the queries of the transaction. In this approach the database log is read to extract the list of tables accessed by the transaction and the list of attributes read and written by the transaction. The extracted information is represented in the following structure format:

    (Read, TB-Acc[ ], Attr-Acc[ ][ ], Write, TB-Acc[ ], Attr-Acc[ ][ ])

Where Read and Write are binary fields, TB-Acc[ ] is a binary vector of size equal to the number of relations in the database, and Attr-Acc[ ][ ] is a vector of N vectors, where N is equal to the number of relations in the database. If the transaction contains a select query then Read is equal to 1, otherwise it is 0. Similarly, if the transaction contains an update or insert query then Write is equal to 1, otherwise it is 0. Element TB-Acc[i] = 1 if the SQL command at hand accesses the i-th table and 0 otherwise. Element Attr-Acc[i][j] = 1 if the SQL command at hand accesses the j-th attribute of the i-th table and 0 otherwise. Table 1 shows the representation of the example transaction given above using this format.
Table 1: Representation of example transaction

    Rd  t1  t2  a1  a2  a3  a4  a5
    1   1   0   1   1   1   0   0

Table 1: (Continued)

    Wt  t1  t2  a1  a2  a3  a4  a5
    1   0   1   0   0   0   1   0

Where Rd = Read and Wt = Write.

The values of the fields of the above structure form the normal behavior of the transaction to be issued by the user. Violation of such behavior is detected as anomalous. The overall approach is depicted in Figure 1.

[Figure 1 diagram not reproduced. Box labels: Database Log (History Transactions); Preprocess (Read Items, Write Items); Clustering (Learning Phase); Clusters (Role Profiles); Current Session; User transaction; Comparison (Detection Phase); Outlier; Raise Alarm; Update; New DB Log]

Figure 1. Overview of the proposed approach

Information about the role of the users who issued the transactions and the data items read and written through these transactions is gathered from the database log. After the history transactions are gathered from the database log, they are preprocessed and stored as binary bits representing the items read and the items written by the transactions, in the form of the structure presented above. Data generated in this form makes up the dataset for clustering. Clustering forms groups of similar transactions. These groups represent the normal behavior of the users who have issued such transactions, that is, the role profile of the users who are authorized to issue such transactions. Once the role profiles are generated, the next goal is to predict the group of each new incoming transaction. If the incoming transaction is found to be a member of any of the clusters, then the transaction is considered valid. If the incoming transaction is detected as an outlier, then it is considered invalid and an alarm is generated. Valid transactions are fired on the database and are added to the database log.

As history transactions are to be partitioned into a number of groups, we have used the k-means clustering algorithm. K-means is among the fastest of the partitioning clustering algorithms. Training tuples generated from the database log have binary data fields. Therefore similarity measures for binary variables can be used for clustering such tuples. The similarity measure between two tuples t1 and t2 used by the clustering algorithm of our approach is as follows.

$$\mathrm{simm}(t1, t2) = \frac{\mathrm{ncount11}}{\mathrm{ncount11} + \mathrm{ncount10} + \mathrm{ncount01}}$$

Where
ncount11: the number of corresponding binary fields in which both tuples t1 and t2 have value 1.
ncount10: the number of corresponding binary fields in which tuple t1 has value 1 and tuple t2 has value 0.
ncount01: the number of corresponding binary fields in which tuple t1 has value 0 and tuple t2 has value 1.

For example, consider the following transactions:

Transaction tr1

    Begin Transaction
    select a1,a2 from t1;
    update t2 set a4;
    End Transaction

Corresponding bit pattern (grouped as Read, TB-Acc, Attr-Acc, Write, TB-Acc, Attr-Acc):

    1 10 11000 1 01 00010

Transaction tr2

    Begin Transaction
    select a1,a3 from t1;
    update t2 set a5;
    End Transaction

Corresponding bit pattern:

    1 10 10100 1 01 00001

    ncount11 = 5
    ncount10 = 2
    ncount01 = 2

The similarity measure of tr1 and tr2 is therefore

$$\mathrm{simm}(tr1, tr2) = \frac{5}{5 + 2 + 2} = \frac{5}{9} \approx 55.6\,\%$$

An advantage of our unsupervised approach is that the role information of the transactions need not be logged in the database log. Behaviors of the users belonging to the same
role are grouped into the same cluster. The approach is also well suited to users with more than one role; only the detection phase needs to be generalized.

4. Result and Analysis

To verify our approach, we generated a number of database tables with a number of attributes. We defined a number of roles and generated a number of transactions for these roles. Based on these transactions, we also generated a large number of tuples as a training dataset. For detection, we generated a number of valid as well as invalid transactions. We tested our approach by supplying valid as well as invalid transactions, and it detected these transactions with full accuracy. We considered all the possible ways of generating valid and invalid transactions and obtained the proper result in all cases. Our approach correctly detects correlations among the commands of the transactions. We tested the approach by issuing valid transactions with one of the SQL commands eliminated from the transaction, and each was detected as an invalid transaction. When we issued the transactions with all the desired SQL commands, they were detected as valid transactions. Training time also varied linearly with the number of training tuples, as expected. Figure 2 shows the nature of training time vs number of training tuples.

Figure 2. Training Time Vs Training Data

5. Conclusion

In this paper we have proposed a new unsupervised machine learning approach to database intrusion detection for databases in which a role based access control (RBAC) mechanism is enabled. It considers the correlations among the queries of a transaction and detects malicious transactions accordingly. It does not require role information to be logged in the database log. The clusters of transactions generated can also provide guidelines to the database administrator for role definitions.

References

[1] Fredrik Valeur, Darren Mutz, and Giovanni Vigna, “A learning-based approach to the detection of sql attacks,” In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), pages 123-140, 2003.
[2] Lee, V. C.S., Stankovic, J. A., Son, S. H., “Intrusion Detection in Real-time Database Systems Via Time Signatures,” In Proceedings of the Sixth IEEE Real Time Technology and Applications Symposium, pages 121-128, 2000.
[3] Marco Vieira and Henrique Madeira, “Detection of Malicious Transactions in DBMS,” IEEE Proceedings - 11th Pacific Rim International Symposium on Dependable Computing, PP: 8, Dec 12-14, 2005.
[4] Ashish Kamra, Elisa Bertino, and Evimaria Terzi, “Detecting anomalous access patterns in relational databases,” The International Journal on Very Large Data Bases (VLDB), 2008.
[5] Wai Lup LOW, Joseph LEE, Peter TEOH, “DIDAFIT: Detecting intrusions in databases through fingerprinting transactions,” ICEIS 2002 - Databases and Information Systems Integration, pages 121-127, 2002.
[6] Elisa Bertino, Ashish Kamra, and James Early, “Profiling database application to detect sql injection attacks,” IEEE International Performance, Computing, and Communications Conference (IPCCC) 2007, pages 449–458, April 2007.
[7] C.Y. Chung, M. Gertz, and K. Levitt, “DEMIDS: a misuse detection system for database systems,” In Integrity and Internal Control in Information Systems: Strategic Views on the Need for Control. IFIP TC11 WG11.5 Third Working Conference, pages 159-178, 2000.
[8] Yi Hu and Brajendra Panda, “A data mining approach for database intrusion detection,” In SAC ’04: Proceedings of the 2004 ACM symposium on applied computing, pages 711–716, New York, NY, USA, 2004.
[9] Abhinav Srivastava, Shamik Sural and A. K. Majumdar, “Database intrusion detection using weighted sequence mining,” Journal of Computers, Vol. 1, No. 4, pages 8-12, July 2006.
[10] Elisa Bertino, Ashish Kamra and Evimaria Terzi, “Intrusion detection in rbac-administered databases,” In Proceedings of the Applied Computer Security Applications Conference (ACSAC), 2005.
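As a supplementary check on the worked example of Section 3, the similarity measure simm can be computed directly from the two bit patterns given for tr1 and tr2. The ratio is what is usually called the Jaccard coefficient for binary vectors. The sketch below is illustrative only: the helper `bits`, the function `is_outlier`, its `profiles` argument and its `threshold` value are assumptions made here, since the paper does not state how cluster membership is decided in the detection phase.

```python
from typing import List, Sequence


def simm(t1: Sequence[int], t2: Sequence[int]) -> float:
    """Similarity of two binary tuples: ncount11 / (ncount11 + ncount10 + ncount01)."""
    if len(t1) != len(t2):
        raise ValueError("tuples must have the same number of binary fields")
    n11 = sum(1 for x, y in zip(t1, t2) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(t1, t2) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(t1, t2) if x == 0 and y == 1)
    denom = n11 + n10 + n01
    # Assumption: two all-zero tuples are given similarity 0.0 to avoid 0/0.
    return n11 / denom if denom else 0.0


def bits(pattern: str) -> List[int]:
    """Turn a printed bit pattern into a list of ints, ignoring spaces."""
    return [int(ch) for ch in pattern if ch in "01"]


def is_outlier(tx: Sequence[int], profiles: Sequence[Sequence[int]],
               threshold: float = 0.8) -> bool:
    """Hypothetical detection rule (not from the paper): flag a transaction
    when no stored profile tuple reaches the similarity threshold."""
    return all(simm(tx, p) < threshold for p in profiles)


tr1 = bits("110 1100010100010")  # bit pattern given for transaction tr1
tr2 = bits("110 1010010100001")  # bit pattern given for transaction tr2

print(round(simm(tr1, tr2), 4))  # 5 / (5 + 2 + 2) = 5/9, prints 0.5556
```

With these inputs ncount11 = 5, ncount10 = 2 and ncount01 = 2, matching the counts in the worked example. The threshold in `is_outlier` is a placeholder; in practice it would have to be tuned against the clusters actually produced by k-means.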
