Professional Documents
Culture Documents
net/publication/272295945
CITATIONS READS
20 2,352
3 authors, including:
Debabrata Kar
Silicon Institute of Technology
6 PUBLICATIONS 122 CITATIONS
SEE PROFILE
All content following this page was uploaded by Debabrata Kar on 16 February 2015.
1 Introduction
Web applications are exposed to various types of security threats among which
SQL Injection attack is predominantly used against web databases. The Open
Web Application Security Project (OWASP) ranks it on top among the Top-10
security threats [1]. According to TrustWave 2012 Global Security Report [2],
SQL injection was the number one attack method for four consecutive years.
Nowadays, attackers use sophisticated Botnets [3] which automatically discover
vulnerable web pages from search engines like Google and launch mass SQL
injection attacks from distributed sources. About 97% of data breaches across
the world occur due to SQL injection alone [4].
2 D. Kar, S. Panigrahi, S. Sundararajan
the parameter pid has been used directly for constructing the dynamic query
assuming that an integer value (e.g., 24) will be always received. However, if
an attacker adds “ OR 1 = 1” to the URL then the string value “24 OR 1 =
1” will be used in the code which will generate the query as “SELECT * FROM
products WHERE prod id = 24 OR 1 = 1”. Upon execution, it will return the
entire products table as the query evaluates to true for all rows. This is an
example of the simplest type of SQL injection known as tautological attack. There
are several other types of SQL injection attacks [16], using which an attacker
can steal sensitive information from other database tables.
Commercial Intrusion Detection Systems (IDS) and Web Application Fire-
walls (WAF) mostly rely on regular expression based filters created from known
attack signatures. However, signature-based filters can be circumvented using
a number of bypassing techniques [17, 18]. For example, consider the following
attacks, showing only the injection code added by an attacker:
OR 'A' = 'A'
OR -5.24 = 17.43 - 22.67
OR 419320 = 0x84B22 - 0x1E52A
OR 'ABC' = CONCAT('A', 'B', 'C')
OR 'ABC' = CoNcAt(cHaR(0x28 + 25), cHaR(0x42), cHAr(80 - 0x0d))
OR 'XYZ' = sUbStRiNg(cOncAt('AB', 'CX', 'YZ', 'EF'), 4, 3)
OR 'XYZ' = /*!SuBsTrInG(*/CoNcAt('aB',/*!'cX','YZ',*/'eF')/*!,4,3)*/
All of these expressions evaluate to true, hence they are tautological attacks.
Similar bypassing techniques can also be used in other types of SQL injection
attacks. SQL injection attack expressions can be formed in a number of ways
to the extent that constructing regular expression based filters becomes almost
impossible. Signature based systems can be eluded by the attacker with creative
changes to the injected expressions with a little effort.
3 Proposed Approach
Two key ideas, the query transformation scheme and application of document
similarity measure, constitute the core of our approach. The system begins with
an initial set of SQL injection attacks collected from honeypot web applications.
These are converted into text form using a transformation scheme and then
grouped into clusters by their document similarity. Attack vectors in each cluster
are merged into a document. Each document thus contains a set of highly similar
attack vectors. At run-time, SQL injection attacks are detected by comparing
the document similarity of an incoming query with these documents. Rest of
this section describes our approach and architecture of SQLiDDS in detail.
4 D. Kar, S. Panigrahi, S. Sundararajan
insensitive manner. All symbols and special characters are transformed in Step–9
as per the scheme given in Table 2. The last two steps convert the transformed
query to uppercase and reduce multi-spaces to single spaces. To visualize how the
transformation scheme works, consider the following SQL queries, intentionally
written in mixed-case to show the effect of transformation.
sEleCt * fRoM products wHeRe price > 10.00 aNd discount < 8
SeLeCt email FrOm customers WhErE fname LIKE 'john%'
SELECT STAR FROM USRTBL WHERE USRCOL GT DEC AND USRCOL LT INT
SELECT USRCOL FROM USRTBL WHERE USRCOL LIKE SQUT STR PRCNT SQUT
This query contains a number of symbols and operators, which is not suitable
for application of a document similarity measure. The transformation scheme
converts it completely into text form as:
SELECT STAR FROM USRTBL WHERE USRCOL EQ INT OR SQUT STR SQUT EQ CONCAT
LPRN CHAR LPRN HEX PLUS INT RPRN CMMA CHAR LPRN HEX RPRN CMMA CHAR LPRN
INT MINUS HEX RPRN RPRN
Pn
d~i · d~j wt ,d wt ,d
Sim(di , dj ) = cos(θ) = = qP k=1 kqiP k j (1)
kd~i kkd~j k n
w2
n
k=1 w2 tk ,di k=1 tk ,dj
Sentinel (Java), etc., and applied on the three websites. Each tool was applied
multiple times with different settings to ensure that all possible types of SQL
injection attack patterns are captured. We also applied injection attacks on the
three websites using several tips and tricks collected from tutorials, forums, and
black hat sites. During the entire process, all SQL queries generated by the web
applications were recorded by MySQL server in the general query log.
Over 4.52 million lines were written into the log files from which 3.35 million
SQL queries were extracted. After removing duplicates, INSERT queries, and
queries without a WHERE clause, total 245,356 nos. of unique SELECT, UPDATE,
and DELETE queries were obtained, containing genuine as well as injected queries.
Each query was manually examined and the injected queries were separated. Out
of the 49,273 injected queries identified, 16,702 unique queries were obtained,
containing a natural mix of all types of SQL injection attacks. The portion after
the first WHERE keyword of these queries were extracted and normalized using the
transformation scheme producing 3,896 distinct injection attack patterns. The
patterns were stored in the injected structures database and their MD5 hashes
were stored in the reference hash table.
Web # Queries
Application Intercepted TP FN FP TN TPR FPR Accuracy
Bookstore 4477 3862 52 6 557 98.67% 1.07% 98.70%
Forum 3580 2986 97 11 486 96.85% 2.21% 96.98%
Classifieds 1987 1587 21 7 372 98.69% 1.85% 98.59%
NewsPortal 1544 1223 54 2 265 95.77% 0.75% 96.37%
JobPortal 3114 2644 33 4 433 98.77% 0.92% 98.81%
Out of the false positives and negatives, total 162 queries (56.4%) were logged as
suspicious queries for the DBA to examine. The overall accuracy of the system
comes out as 98.05%. Interestingly, the accuracy for the Forum and NewsPortal
are 96.85% and 95.77% respectively in the first run, even if they were not used
as honeypots for collecting injection patterns (see Sect. 4.1). This is very encour-
aging and proves that injected queries collected from few honeypot applications
can be used to protect multiple web applications, because input parameters are
used in the WHERE clause of queries in similar manner across web applications.
800
Bookstore
700 Classifieds
Forum
Delay in Milliseconds
600 Job Portal
News Portal
500
400
300
200
100
0 20 40 60 80 100
Number of Concurrent Users
5 Related Work
To the best of our knowledge, document similarity measures have not been so far
proposed specifically for detection of injection attacks at the SQL query level in
the literature. However, a few studies have used similarity measures on HTTP
requests or payloads for classification of web attacks including SQL injection,
which may be considered as related to our work to some extent.
Small et al. [23] used similarity measure and string alignment techniques for
detecting malicious HTTP payloads. At the same time, Gallagher et al. [24] also
used term frequency based similarity measure for classification of HTTP requests
to identify and classify attacks. Ulmer et al. [25] used similarity on HTTP data
for a configurable hardware classifier for web attacks. Later in [26], they proposed
parallel acceleration for the document-similarity classifier on multi-core proces-
sor architecture, focusing mainly on hardware implementation. Choi et al. [27]
used N-gram splitting and classification using Support Vector Machine (SVM)
to detect SQL injection and XSS attacks. However, their approach to extract
N-grams ignores the symbols and operators completely, which are important
syntactic elements in a query. Application of document similarity measures in
conjunction with query transformation for detecting SQL injection attacks is
proposed for the first time in this paper to the best of our belief.
This paper presented a novel approach to detect all types of SQL injection
attacks by applying document similarity measure on normalized queries, imple-
mented in a tool named SQLiDDS. We adopted the strategy to examine only
the WHERE clause part of run-time queries and ignore INSERT queries, which was
based on two interesting observations made during the course of research. The
SQL injection Detection using Document Similarity 13
References
1. OWASP: Top 10 Security Threats 2013. https://www.owasp.org/index.php/
Top_10_2013-A1-Injection (2013) Accessed: 2013-11-15.
2. TrustWave: Executive Summary: Trustwave 2012 Global Security Report. https:
//www.trustwave.com/global-security-report (2012) Accessed: 2013-06-24.
3. Maciejak, D., Lovet, G.: Botnet-Powered Sql Injection Attacks: A Deeper Look
Within. In: Virus Bulletin Conference. (09 2009) 286–288
4. Curtis, S.: Barclays: 97 percent of data breaches still due to SQL injec-
tion. http://news.techworld.com/security/3331283/barclays-97-percent-
of-data-breaches-still-due-to-sql-injection/ (01 2012)
5. Livshits, B., Erlingsson, Ú.: Using Web Application Construction Frameworks to
Protect Against Code Injection Attacks. In: Proceedings of the 2007 Workshop on
Programming Languages and Analysis for Security, ACM (2007) 95–104
6. Boyd, S., Keromytis, A.: SQLrand: Preventing SQL injection attacks. In: Applied
Cryptography and Network Security, Springer (2004) 292–302
7. Johns, M., Beyerlein, C., Giesecke, R., Posegga, J.: Secure Code Generation for
Web Applications. Engineering Secure Software and Systems (2010) 96–113
8. Benedikt, M., Freire, J., Godefroid, P.: VeriWeb: Automatically Testing Dynamic
Web Sites. In: In Proceedings of 11th International World Wide Web Conference
(WWW’2002), Citeseer (2002)
9. Kals, S., Kirda, E., Kruegel, C., Jovanovic, N.: Secubat: A Web Vulnerability
Scanner. In: Proceedings of the 15th International Conference on World Wide
Web, ACM (2006) 247–256
10. Wassermann, G., Yu, D., Chander, A., Dhurjati, D., Inamura, H., Su, Z.: Dy-
namic Test Input Generation for Web Applications. In: Proceedings of the 2008
International Symposium on Software Testing and Analysis, ACM (2008) 249–260
11. Halfond, W., Orso, A.: AMNESIA: Analysis and Monitoring for NEutralizing SQL-
Injection Attacks. In: Proceedings of the 20th IEEE/ACM International Confer-
ence on Automated Software Engineering, ACM (2005) 174–183
14 D. Kar, S. Panigrahi, S. Sundararajan
12. Buehrer, G., Weide, B., Sivilotti, P.: Using Parse Tree Validation to Prevent SQL
Injection Attacks. In: Proceedings of the 5th International Workshop on Software
Engineering and Middleware, ACM (2005) 106–113
13. Bisht, P., Madhusudan, P., Venkatakrishnan, V.: CANDID: Dynamic Candidate
Evaluations for Automatic Prevention of SQL Injection Attacks. ACM Transac-
tions on Information and System Security (TISSEC) 13(2) (2010) 14
14. Lee, I., Jeong, S., Yeo, S., Moon, J.: A Novel Method for SQL Injection Attack
Detection based on Removing SQL Query Attribute Values. Mathematical and
Computer Modelling 55 (2011) 58–68
15. Wang, Y., Li, Z.: SQL Injection Detection via Program Tracing and Machine
Learning. In: Internet and Distributed Computing Systems. Springer (2012) 264–
274
16. Halfond, W., Viegas, J., Orso, A.: A Classification of SQL-injection Attacks and
Countermeasures. In: International Symposium on Secure Software Engineering
(ISSSE). (2006) 12–23
17. Maor, O., Shulman, A.: SQL Injection Signatures Evasion (White paper). Im-
perva, Inc. (04 2004) Online: http://www.issa-sac.org/info_resources/ISSA_
20050519_iMperva_SQLInjection.pdf.
18. Dahse, J.: Exploiting hard filtered SQL Injections. http://websec.wordpress.
com/2010/03/19/exploiting-hard-filtered-sql-injections/ (03 2010)
19. Kar, D., Panigrahi, S.: Prevention of SQL Injection Attack Using Query Trans-
formation and Hashing. In: Proceedings of the 3rd IEEE International Advance
Computing Conference (IACC), IEEE (2013) 1317–1323
20. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval.
Volume 1. Cambridge University Press, Cambridge (2008) Online: http://nlp.
stanford.edu/IR-book/pdf/irbookonlinereading.pdf.
21. Garcı́a Adeva, J.J., Pikatza Atxa, J.M.: Intrusion Detection in Web Applications
using Text Mining. Engg. Appl. of Artificial Intelligence 20(4) (2007) 555–566
22. Liao, Y., Vemuri, V.R.: Using Text Categorization Techniques for Intrusion De-
tection. In: USENIX Security Symposium. Volume 12. (2002) 51–59
23. Small, S., Mason, J., Monrose, F., Provos, N., Stubblefield, A.: To Catch a Preda-
tor: A Natural Language Approach for Eliciting Malicious Payloads. In: USENIX
Security Symposium. (2008) 171–184
24. Gallagher, B., Eliassi-Rad, T.: Classification of HTTP Attacks: A Study on the
ECML/PKDD 2007 Discovery Challenge. In: Center for Advanced Signal and
Image Sciences (CASIS) Workshop. (2008)
25. Ulmer, C., Gokhale, M.: A Configurable-Hardware Document-Similarity Classifier
to Detect Web Attacks. In: Parallel & Distributed Processing, Workshops and Phd
Forum (IPDPSW), 2010 IEEE International Symposium on, IEEE (04 2010) 1–8
26. Ulmer, C., Gokhale, M., Gallagher, B., Top, P., Eliassi-Rad, T.: Massively parallel
acceleration of a document-similarity classifier to detect web attacks. Journal of
Parallel and Distributed Computing 71(2) (2011) 225–235
27. Choi, J., Kim, H., Choi, C., Kim, P.: Efficient Malicious Code Detection Using
N-Gram Analysis and SVM. In: Network-Based Information Systems (NBiS), 2011
14th International Conference on, IEEE (2011) 618–621