COSE 2205.qxd 01/07/2003 11:51 Page 436
Gonzalo Álvarez
Gonzalo Álvarez received his M.S. degree in telecommunications engineering from the University of the Basque Country, Spain, in 1995, and the Ph.D. degree in computer science from the Polytechnic University of Madrid, Spain, in 2000. He joined the Scientific Research Council (CSIC), Spain, in 1995 and has worked since then in cryptography, Internet security, and chaotic systems. He also teaches courses on applied web hacking and defence to private companies and public organizations, and audits web applications.

Slobodan Petrović
Slobodan Petrović received his Ph.D. degree in 1994, from the University of Belgrade. The title of his thesis was 'Algorithms for the computation of edit-distance between discrete sequences - analysis, synthesis and applications'. His research interests include coding theory, cryptography, pattern recognition, and combinatorial optimisation. From 1986 to 2000, he participated in various projects at the Institute of Mathematics in Belgrade, concerning fundamentals of computer science, and pattern recognition. Since 2000 he has been at the Scientific Research Council (CSIC), Spain, working on the projects 'Cryptographic Protection of Copyright in Digital Networks' and 'Application of Intelligent Mobile Agents in Intrusion Detection Systems'.

Once the taxonomy has been defined and the role and importance of its categories have been explained, we define a semantic-dependent encoding scheme to encode all relevant information contained in Web attacks. The encoding scheme removes local redundancy in the description of the attacks, thus enabling time and memory savings in the processing of the attack information. The vectors (generally of different lengths) obtained in the encoding process can be used in a number of applications, such as intrusion detection systems or application firewalls, where the classification of attacks is needed. With such an encoding scheme, classification techniques that employ special distance measures (such as edit distance [10]) can be used, which reduces both memory consumption (by omitting redundancy) and computational effort (by simplifying the compression/decompression process).

The paper is organized as follows. In Section 2, the Web attack properties are thoroughly discussed and the taxonomy of Web attacks is defined. In Section 3 the encoding scheme of the attack descriptions is given and the advantages of such a scheme over other possible encoding schemes are explained. Section 4 gives some examples of attack encodings using the proposed taxonomy and the encoding scheme. In Section 5, we estimate the coverage of the real attack space by the proposed taxonomy. Section 6 provides some ideas about which applications might benefit from this taxonomy and encoding. Section 7 concludes the paper.

2 Web attack properties
A taxonomy is a classification scheme that partitions a body of knowledge and defines the relationship of the objects. Classification is the process of using a taxonomy for separating and ordering [4]. According to [11], satisfactory taxonomies have classification categories with the following characteristics:

1. Mutually exclusive: the categories do not overlap.
2. Exhaustive: taken together, the categories include all the possibilities.
3. Unambiguous: clear and precise so that classification is not uncertain, regardless of who is classifying.
4. Repeatable: repeated applications result in the same classification, regardless of who is classifying.
5. Accepted: logical and intuitive so that categories could become generally approved.
6. Useful: could be used to gain insight into the field of inquiry.

First, we introduce a novel model of Web attacks based on the concept of attack life cycle. By attack life cycle we understand a succession of steps followed by an attacker to carry out some malicious activity on the Web server, as depicted in Figure 1. The attacker gets through an entry point, searching for a vulnerability in the Web server or Web application, which might be exploited to defeat some security service. The vulnerability is realized by an action, using some HTTP verb and headers of certain length, directed against a given target and with a given scope. The attacker might obtain some privileges that depend on the type of attack. Our taxonomy of Web attacks is based on the attack life cycle defined in this way.

At every stage of the life cycle we define the following classification criteria or classifiers:

1. Entry point: where the attack gets through.
2. Vulnerability: a weakness in a system allowing unauthorized action.
3. Service (under threat): security service threatened by the attack.
4. Action: actual attack against the Web server exploiting the vulnerability.
5. Length: the length of the arguments passed to the HTTP request.
6. HTTP element: verbs and headers needed to perform the attack.
7. Target: the aim of the attack.
8. Scope: impact of the attack on the Web server.
9. Privileges: privileges obtained by the attacker after the successful completion of the attack.

In the next subsections each of these criteria is covered in detail and its relevance explained.

2.1 Entry point
The fact that a Web application is successfully attacked usually means that there is a vulnerability that is exploited by the attacker. This vulnerability might be found in the Web server software or in the Web application code itself. Thus, according to the entry point of the attack, we distinguish between Web server software attacks and Web application attacks.

2.1.1 Web server software attacks
All Web server software, regardless of platform or manufacturer, unintentionally hides a number of vulnerabilities, which allow the application to be used in a different way than intended. Many of these vulnerabilities are disclosed to the public, for example published in security forums and bulletins. Upon notification of the vulnerability, the manufacturer usually releases a patch or service pack which should correct the error. Between the release of the patch and the moment when all servers are correctly patched, many vulnerable servers exist.

2.1.2 Web application attacks
Web application-level attacks refer to the vulnerabilities inherent in the code of a Web application itself, regardless of the technology in which it is implemented or the security of the Web server/back end database on which it is built [12]. Attacks against Web-based mail are also included in this category.

The origin of these vulnerabilities may be errors in HTML forms, client-side scripts, server-side scripts (.asp, .jsp, .php, .pl, etc.), business logic objects (COM, COM+, CORBA, etc.), SQL sentences processing, etc.
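
As a rough illustration of the classification criteria listed at the start of Section 2, an attack can be thought of as a record with one field per classifier. The sketch below is only an assumption about how such a record might be held in Python: the class name `AttackRecord`, the field names and the string labels are hypothetical and do not come from the paper. Following the encoded examples of Section 4, the HTTP element is split into a verb field and a header field, and `None` stands for a classifier that does not apply.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class AttackRecord:
    """One field per classifier of the attack life cycle (hypothetical names)."""
    entry_point: str                 # "web server software" or "web application"
    vulnerability: Optional[str]     # None stands for "not applicable"
    service: Optional[str]
    action: Optional[str]
    length: Optional[str]
    http_verb: Optional[str]
    http_header: Optional[str]
    target: Optional[str]
    scope: Optional[str]
    privileges: Optional[str]

    def applicable(self) -> Dict[str, str]:
        """Return only the classifiers that apply to this attack."""
        return {f.name: getattr(self, f.name)
                for f in fields(self)
                if getattr(self, f.name) is not None}


# The cross-site scripting example of Section 4, written with labels
# instead of numeric codes.
xss = AttackRecord(
    entry_point="web application",
    vulnerability="code injection",
    service="confidentiality",
    action="read",
    length="normal",
    http_verb="GET",
    http_header=None,
    target="web application",
    scope="local",
    privileges=None,
)
print(xss.applicable())
```

Keeping the non-applicable classifiers as `None` rather than dropping the fields makes it easy to see, per attack, which parts of the life cycle carry information.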
using non-default paths, etc.) many vulnerabilities can arise.

Moreover, there are Web servers whose default configuration exposes a number of known directories, sample applications, user accounts, etc. Many vulnerabilities have been discovered and exploited over time in these elements too.

2.3 Service under threat

Section 2.2.4), server platform errors, and SQL injection techniques (see Section 2.2.1). Their goal is to execute database or operating system commands and access resources not allowed to the unprivileged user. Eventually, a well-implemented privilege escalation attack will result in the unauthorized increase in the domain of access, even granting the attacker administrator or root privileges.
machines connected to the Internet. In the second case, the attack will exploit vulnerabilities in the Web server or the Web application to stop normal service.

2.3.6 Auditing
The auditing services provide the system administrator with the means to record security-relevant information, which can be analyzed to detect potential and actual violations of the system security policy. Auditing, or accountability, has three functions: event detection, information collection, and information processing.

Some attacks manage to pass undetected by preventing their being logged by the auditing system.

2.4 Action
Most real-world attacks suffered by Web servers are variants or concatenations of a few basic actions or attack classes. In this subsection we try to reference those primary attack classes which account for almost all everyday attacks.

We distinguish among actions aimed at three different objectives: server data, user authentication, and Web server. Actions directed against data include read, modify, delete, and fabricate. Actions directed against authentication include impersonate, bypass, and search. Actions directed against the Web server include interrupt, probe, and other.

2.4.1 Read
Read is an action to obtain the content of the data contained within a file, database record, or other data medium stored in the Web server. Reading does not alter the integrity of the data read. Examples include viewing source code files, and illicitly copying database tables.

2.4.2 Modify
Modify is an action to alter data. We limit our definition to tampering with data on the server side. Examples include changing a database record, or changing the contents of a file. The data continue to be available, but in altered form. Modifying data on the client side, such as the value of the URL, a cookie, or a hidden field in a form, does not fall under this category because the vast majority of Web attacks imply this manipulation in some way or another.

2.4.3 Delete
Delete is an action to remove an asset on the server or render it irretrievable by legitimate users. Examples include deleting database objects and server files.

2.4.4 Fabricate
Fabricate is an action to insert counterfeit objects into the system. Examples include adding hacker toolkits to the server file system, creating new user accounts, or inserting records into a database table.

2.4.5 Impersonate
Impersonate is an action to masquerade an illegal user as a legitimate one. Examples include the reuse or theft of authentication tickets.

2.4.6 Bypass
Bypass is an action to avoid a control mechanism by using an alternative method to access a target. Examples include defeating a forms-based Web authorization mechanism in order to access protected Web resources, such as multimedia files, by simply following a link.

2.4.7 Search
Search is an action to find valid user authentication information. Examples include brute-force attacks, which try different combinations of login/password pairs, or repeatedly forging possible authentication tickets or session IDs to simulate an already validated user.

2.4.8 Interrupt
Interrupt is an action to cause a server to stop operating or offering a service. The most common interruption-of-service attacks are denial-of-service (DoS) attacks. Their primary goal is to deny the legitimate users access to a particular resource or service.
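
The grouping of actions by objective given in Section 2.4 can be sketched as a small lookup. This is only an illustration, and the names `Action`, `Objective` and `objective_of` are hypothetical. Codes 0 to 4 are the ones listed in the encoding table of Section 3, and code 7 for interrupt is the one used in the last example of Section 4; the remaining codes are assumed to follow the order in which the actions are introduced.

```python
from enum import Enum, IntEnum


class Objective(Enum):
    DATA = "server data"
    AUTHENTICATION = "user authentication"
    WEB_SERVER = "Web server"


class Action(IntEnum):
    READ = 0
    MODIFY = 1
    DELETE = 2
    FABRICATE = 3
    IMPERSONATE = 4
    BYPASS = 5       # assumed code, inferred from the ordering
    SEARCH = 6       # assumed code, inferred from the ordering
    INTERRUPT = 7
    PROBE = 8        # assumed code, inferred from the ordering
    OTHER = 9        # assumed code, inferred from the ordering


# Which objective each action is directed against, as stated in Section 2.4.
_OBJECTIVE = {
    Action.READ: Objective.DATA,
    Action.MODIFY: Objective.DATA,
    Action.DELETE: Objective.DATA,
    Action.FABRICATE: Objective.DATA,
    Action.IMPERSONATE: Objective.AUTHENTICATION,
    Action.BYPASS: Objective.AUTHENTICATION,
    Action.SEARCH: Objective.AUTHENTICATION,
    Action.INTERRUPT: Objective.WEB_SERVER,
    Action.PROBE: Objective.WEB_SERVER,
    Action.OTHER: Objective.WEB_SERVER,
}


def objective_of(action: Action) -> Objective:
    """Return the objective that the given action is aimed at."""
    return _OBJECTIVE[action]


print(objective_of(Action.SEARCH).value)
```

Ten actions fit in the four bits that the encoding scheme reserves for this property.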
2.8.1 Web application
If the attack succeeds, only the application data and functionality will be affected, but not the operating system resources. These attacks are typically aimed at Web pages (e.g., obtaining and/or modifying source code), Web users (e.g., stealing cookies or passwords using Cross-Site Scripting attacks, Web-based mail attacks), and Web data (e.g., viewing, changing and/or deleting information in database records).

2.8.2 Platform
Under this attack, the target is beyond the Web application, aimed at the platform. The attacker usually seeks arbitrary command execution, manipulation of machine accounts, tampering with the host’s services, obtaining network information, etc. The Web server is used as a mere portal to gain access to the internal network.

2.9 Scope

privileges. Most Web attacks do not allow the attacker to escalate privileges. They run under a restricted account in the server or database. However, misconfiguration of the operating system access control lists, Web server permissions, and database users could enable an attacker to reach administrative privileges. There are three categories of users involved in an attack:

• The Web application user.
• The database user.
• The operating system level user.

Hence, with regard to the privilege obtained by the attack, we can distinguish between unprivileged and administrative attacks. This category is only applicable when the objective of the attack is obtaining access as a certain user, i.e., attacks against the authentication service (see Section 2.3.1).
information about attacks, allowing us to perform an analysis in order to decide on their severity.

To reduce the amount of the information recorded on the media, the data can be compressed, using some of the source coding techniques. But the use of general compression techniques and algorithms in this case has serious drawbacks. These drawbacks depend on the class of the method. Here we enumerate some of them.

For static defined word schemes to be implemented (e.g. [27-31]) the knowledge of the probabilities of message classes is needed in advance. The problem is that if we treat the descriptions of the attacks as messages, the probability of their appearance varies with time, as new attacks are invented and the remedies are published. Other definitions of messages in this case would be too general and would not lead to a sufficient compression ratio.

To adapt to the changes of the message probabilities, adaptive Huffman coding can be used (e.g. the FGK algorithm [32] or the Vitter algorithm [33]). But there is no guarantee that the compression ratio achieved by these methods is satisfactory, since these encodings are often outperformed by the static methods [33].

The main drawback of the free-parse methods, such as Ziv-Lempel [34], is that they perform very badly when short messages are encoded. As in the case of the static word schemes, other definitions of messages would not lead to an efficient compression either.

Bearing in mind the specific type of local redundancy in the descriptions of the attacks (i.e. the descriptions of some of the attacks include some of the attack properties, whereas the descriptions of the others do not), we propose a semantic-dependent data compression method that makes use of different-length vectors. The vectors have different lengths because only the information significant for the particular attack is retained. Besides its advantage over the general data compression schemes regarding the efficiency of encoding short messages, the use of this type of semantic-dependent encoding makes possible the direct use of classification and clustering techniques that are often needed in the implementations, since the classification and clustering can be performed without decompressing [10].

To encode the descriptions of the attacks using the semantic-dependent method introduced above, a range of non-negative integers is assigned to each of the attack properties discussed, in the following way.

(1) Entry point (1 bit of information)
    0 - Web server software (ISAPI filters, Perl modules, etc.)
    1 - Web application (HTML, server-side and client-side scripts, server components, SQL sentences, etc.)
(2) Vulnerability (3 bits of information)
    0 - Code Injection (SQL, JavaScript, cross-site scripting, etc.)
    1 - Canonicalization
    2 - HTML manipulation
    3 - Overflows
    4 - Misconfiguration (default directories, sample applications, guest accounts, etc.)
    X - Not applicable
(3) Service (under threat) (3 bits of information)
    0 - Authentication
    1 - Authorization
    2 - Confidentiality
    3 - Integrity
    4 - Availability
    5 - Auditing
(4) Action (4 bits of information)
    0 - Read
    1 - Modify
    2 - Delete
    3 - Fabricate
    4 - Impersonate
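
To make the semantic-dependent encoding more concrete, the sketch below turns a description of an attack into a vector of integers and leaves out the properties that do not apply, which is what produces vectors of different lengths. It then compares two such vectors with a plain Levenshtein edit distance, standing in for the special distance measures of [10]. This is a minimal illustration and not the authors' implementation: the names `CODES`, `encode` and `edit_distance`, the dictionary layout and the use of `None` for "not applicable" are assumptions, and only the first three properties, whose code tables appear in full above, are included.

```python
from typing import Dict, List, Optional, Sequence

# Code tables for the first three properties, copied from the list above.
CODES: Dict[str, Dict[str, int]] = {
    "entry_point": {"web server software": 0, "web application": 1},
    "vulnerability": {
        "code injection": 0,
        "canonicalization": 1,
        "html manipulation": 2,
        "overflows": 3,
        "misconfiguration": 4,
    },
    "service": {
        "authentication": 0,
        "authorization": 1,
        "confidentiality": 2,
        "integrity": 3,
        "availability": 4,
        "auditing": 5,
    },
}


def encode(description: Dict[str, Optional[str]]) -> List[int]:
    """Map each applicable property to its code; skip properties set to None."""
    vector = []
    for prop, table in CODES.items():
        value = description.get(prop)
        if value is not None:          # None plays the role of "X"
            vector.append(table[value])
    return vector


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two vectors of possibly different length."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]


sql_injection = {"entry_point": "web application",
                 "vulnerability": "code injection",
                 "service": "authorization"}
no_vulnerability = {"entry_point": "web server software",
                    "vulnerability": None,
                    "service": "availability"}

v1 = encode(sql_injection)       # [1, 0, 1]
v2 = encode(no_vulnerability)    # [0, 4]
print(v1, v2, edit_distance(v1, v2))
```

Because the distance is computed directly on the shortened vectors, no decompression step is needed before classification or clustering.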
Each property requires a certain number of bits to encode its information. When only one bit is required, it means that the property can take one of two possible values, but not both. However, when the property can take several different values simultaneously, as many bits as the number of possible values are required. This only happens with property (7).

4. The attack deletes information from the server, thus the action is delete (2).
5. The HTTP request has normal length (0).
6. The HTTP verb used is GET (0).
7. The HTTP Header used is irrelevant for the attack (X).
8. The target of attack is the Web application (0) because the attacker is deleting the Web pages stored on the server.
9. The scope of the attack is global (1), because every user of the Web service will be affected by it.
10. The attacker gains access to the server’s file system under the IUSR_MachineName identity, which corresponds to an unprivileged though dangerous user account in Windows machines. Thus, it is marked as 0.

As a result of the encoding process, the following vector is obtained:

Vector: {0, 1, 3, 4, 0, 0, X, 0, 1, 0}

As the second example, let us consider the following attack:

    GET /product.jsp?id=10&title=<script>w=window.open('http://www.attacker.com/read.cgi?PAN='+document.forms[0].PAN.value); w.close();</script>

The page blindly displays as its title the argument of a parameter passed in the URL. Hence, JavaScript code can be injected, opening the opportunity for a Cross-Site Scripting attack. In this case, the credit card number entered by the victim is sent to the attacker’s server.

1. The entry point is the Web application (1) because the attack exploits a lack of input validation to inject a script. This is a defect in the application, not in the Web server itself.
2. It is a cross-site scripting vulnerability, one of the possible code injection vulnerabilities (0).
3. It is an attack against confidentiality (2) because the attacker is stealing some other user’s credit card number.
4. The attacker is reading information (0).
5. The HTTP request has normal length (0).
6. The HTTP verb used is GET (0).
7. The HTTP Header used is irrelevant for the attack (X).
8. The target is the Web application (0) because the attacker is obtaining data at the application level, not at the platform level. In this example, he is reading another user’s personal information.
9. The scope of this attack is local (0), because it affects only one user at a time.
10. The attacker is not gaining access to the application, database or server, and, as a consequence, this category does not apply (X).

As a result of the encoding process, the following vector is obtained:

Vector: {1, 0, 2, 0, 0, 0, X, 0, 0, X}

As the third example, let us consider the following attack:

    GET /prod.asp?id=1;exec xp_cmdshell 'net user bob h6q2 /add'--

The Web application reads the input (the value of the parameter id) and passes it to the database engine, allowing for SQL injection attacks. The attacker exploits this vulnerability by tricking the application into executing a SQL Server extended procedure which executes any command passed as argument. In this example, the attacker adds himself to the OS users.

1. The entry point is the Web application (1) because the attack exploits poor input validation to inject SQL commands. This is a defect in the application, not in the Web server itself.
2. It is a SQL injection vulnerability, one of the possible code injection vulnerabilities (0).
3. It is an attack against authorization (1)
because the attacker is bypassing the administrator’s configuration to add himself to the system’s users list (net user bob h6q2 /add).
4. The attacker is fabricating information because he is creating a new user account (3).
5. The HTTP request has normal length (0).
6. The HTTP verb used is GET (0).
7. The HTTP Header used is irrelevant for the attack (X).
8. The target is the server platform (1) because the attacker has added himself to the operating system’s users list. The attack is not directed against the Web application. Instead, it exploits a vulnerability in the application to gain access to the underlying operating system.
9. This attack has no direct effect on the Web users. As a consequence it is marked as X.
10. The extended stored procedure xp_cmdshell is running under the administrator account’s identity, and as a consequence the attacker gains administrative access (1).

As a result of the encoding process, the following vector is obtained:

Vector: {1, 0, 1, 3, 0, 0, X, 1, X, 1}

As the last example, let us consider the following attack:

    GET /dir/[../](repeated approx 1344 times)

According to CVE-2001-0252, iPlanet Enterprise Server 4.1 allows remote attackers to cause a denial-of-service via a long HTTP GET request that contains many '/../' (dot dot) sequences.

1. The entry point is the Web server software (0) because the attack exploits a problem in the Web server itself. This problem is shared by all servers of the same version, regardless of the Web application running on top.
2. It is an overflow attack, because a very long sequence is used, much longer than expected by the server software (3).
3. The attack is directed against availability (4) because the server stops functioning after the attack.
4. The attacker is interrupting the normal operation of the service (7).
5. The HTTP request is unusually long (1).
6. The HTTP verb used is GET (0).
7. The HTTP Header used is irrelevant for the attack (X).
8. The target is the Web application (0) because the attacker is not obtaining any access over the underlying server platform, but merely disrupts the normal operation of the Web application.
9. This attack affects all users (1), since they will not be able to access the service.
10. The attacker is not gaining access to the application, database or server, and, as a consequence, this category does not apply (X).

As a result of the encoding process, the following vector is obtained:

Vector: {0, 3, 4, 7, 1, 0, X, 0, 1, X}

In the ensuing process of encoding, the letters X are omitted, leaving different-length vectors. In such a way, memory is saved.

5 Coverage of real attack space
The space of real attacks is unlimited. On the one hand, new vulnerabilities are discovered every day. On the other hand, there exist infinite variations of some attacks such as SQL
injection, buffer overflow or cookie manipulation.

In order to quantitatively test how large a fraction of new vulnerabilities the taxonomy covers, all CVE reported Web attacks against Microsoft’s Internet Information Server (one of the most popular Web servers nowadays) have been successfully encoded and classified. For the sake of completeness, some other vulnerabilities affecting other Web servers (iPlanet, Apache, Oracle, BEA) have been randomly chosen from the CVE database, encoded and classified.

Let us now consider the types of attacks with an infinite number of instances. We explain in detail the SQL injection attack, but similar reasoning can be applied to other types of attacks with an infinite number of instances, such as buffer overflow, cross-site scripting, cookie manipulation, etc. When the SQL sentence varies, some categories in the encoding of the SQL injection attack remain the same (Entry point, Vulnerability, Length, HTTP Verb, HTTP Headers). The categories that change (Service under threat, Action, Target, Scope, Privileges) can sustain the variability of the SQL language, since it is possible to semantically recognize SQL sentences, and the category Action of our taxonomy is exhaustive. Thus, any different SQL sentence embedded in the SQL injection attack would be covered by our taxonomy, affecting only the categories already mentioned, which means that the proposed taxonomy is exhaustive in this case.

6 Possible applications
This taxonomy and the corresponding attack encoding vectors are useful in a number of applications, especially in intrusion detection systems and in application-level firewalls.

6.1 Intrusion Detection Systems (IDS)
An intrusion detection system (IDS) detects and reports attempts to misuse or break into networked computer systems in real time [36]. A traditional, non-heuristically based IDS consists of three functional components:

• A monitoring component, such as a packet capturer, which collects traffic data.
• An inference component, which analyzes the captured data to determine whether it corresponds to normal activity or malicious activity.
• An alerting component, which generates a response when an attack has been detected. This response can be passive (such as writing an entry in an event log) or active (such as changing configuration rules in the firewall to block the attacker’s IP address).

One of the biggest problems faced by these systems is the huge number of alerts that might be generated in a heavily attacked environment in a matter of hours. It is impossible for a human operator to analyze so many reports and decide on the severity of the detected attacks to determine the action to take. This taxonomy can be used in the following way: first, the attacks are encoded by means of the proposed encoding scheme. Next, the vectors produced by the encoder are processed using pattern recognition or information extraction techniques (clustering algorithms [37], supervised learning [38], etc.) in order to pinpoint the most dangerous attacks, and analyze attack trends over time.

6.2 Application-level firewall
Another approach to detect and prevent Web attacks consists of using an application-level firewall or, more specifically, Web application firewalls [39, 40]. A traditional firewall provides protection only at the network level, with minimal or no application awareness [41]. On the other hand, application-level firewalls are capable of processing data at the application level as well as decrypting SSL connections. An application-layer solution works within the application that it is protecting, inspecting
[24] wget. http://www.gnu.org/software/wget/wget.html.
[25] Whisker. http://www.wiretrip.net/rfp/p/doc.asp/i2/d21.htm.
[34] Ziv, J. and Lempel, A., 1977. A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory, Vol. 23(3), May 1977.