
COSE 2205.

qxd 01/07/2003 11:51 Page 435

A new taxonomy of Web attacks suitable for efficient encoding

Gonzalo Álvarez (a), Slobodan Petrović
Instituto de Física Aplicada, Consejo Superior de Investigaciones Científicas, Serrano 144 - 28006 Madrid, Spain
(a) Corresponding author: email: gonzalo@iec.csic.es

Computers & Security, Vol 22, No 5, pp 435-449, 2003
Copyright ©2003 Elsevier Ltd. Printed in Great Britain. All rights reserved. 0167-4048/03

Abstract

Web attacks, i.e. attacks exclusively using the HTTP/HTTPS protocol, are rapidly becoming one of the fundamental threats for information systems connected to the Internet. When the attacks suffered by Web servers through the years are analyzed, it is observed that most of them are very similar, using a reduced number of attacking techniques. It is generally agreed that classification can help designers and programmers to better understand attacks and build more secure applications. As an effort in this direction, a new taxonomy of Web attacks is proposed in this paper, with the objective of obtaining a useful reference framework for security applications. The use of the taxonomy is illustrated by means of multiplatform real world Web attack examples. Along with this taxonomy, important features of each attack category are discussed. A semantic-dependent Web attack encoding scheme is also defined that, together with the taxonomy, can be used to process the attack information with low time and memory consumption. Applications of the taxonomy and the encoding scheme are described, such as intrusion detection systems and application firewalls.

Keywords: Web attacks; Taxonomy; Source coding; Intrusion detection; Application firewalls

1 Introduction

With the increasing use of the Internet as a commercial channel, there is a growing number of websites deployed to share information, offer online services, sell all sorts of goods, digital or physical, distribute news and articles, etc. On the other hand, the number of attacks is increasing in parallel: theft of private information, defacing of homepages, denial-of-service, worm spreading, and fraud are but a few of the most common and frequent attacks in cyberspace [1, 2].

By Web attacks, we understand network attacks exclusively using the HTTP/HTTPS protocol. When the Web attacks recorded through the years are analyzed, it is observed that most of them are recurrent attacks. In fact, they correspond to a limited number of types of attacks. Thus, it is generally agreed that classification can help designers, programmers, and security analysts to better understand attacks and build more secure applications.

In an effort to create a common reference language for security analysts, a number of taxonomies of computer attacks and vulnerabilities have appeared in recent years [3-9]. The shortcoming of such taxonomies is that they often include categories unsuitable for the classification of Web attacks. Even when their categories can be used in a Web context, they fail to cover all the subtleties of Web attacks. For example, entry point, target, HTTP Verb, and HTTP Header are Web-specific categories that we consider important for a more accurate classification of Web attacks, and these are not covered by general taxonomies. In addition, some categories that are also found in general taxonomies, such as vulnerability, need to take Web-specific values (e.g. Code injection, HTML manipulation).

In this paper we propose a taxonomy of Web attacks, taking into account a number of important features of each attack category. The role of these features as well as their importance for each of the attack categories is discussed thoroughly. The proposed taxonomy represents an effort to cover known attacks and also some attacks that might appear in the future.


Once the taxonomy has been defined and the role and importance of its categories have been explained, we define a semantic-dependent encoding scheme to encode all relevant information contained in Web attacks. The encoding scheme removes local redundancy in the description of the attacks, thus enabling time and memory savings in the processing of the attack information. The vectors (generally of different lengths) obtained in the encoding process can be used in a number of applications, such as intrusion detection systems or application firewalls, where the classification of attacks is needed. With such an encoding scheme, the classification techniques that employ special distance measures (such as, for example, edit distance [10]) can be used, which reduces both memory consumption (by omitting redundancy) and computational effort (by simplifying the compression/decompression process).

The paper is organized as follows. In Section 2, the Web attack properties are thoroughly discussed and the taxonomy of Web attacks is defined. In Section 3 the encoding scheme of the attack descriptions is given and the advantages of such a scheme over other possible encoding schemes are explained. Section 4 gives some examples of attack encodings using the proposed taxonomy and the encoding scheme. In Section 5, we estimate the coverage of real attack space by the proposed taxonomy. Section 6 provides some ideas about which applications might benefit from this taxonomy and encoding. Section 7 concludes the paper.

Gonzalo Álvarez
Gonzalo Álvarez received his M.S. degree in telecommunications engineering from the University of the Basque Country, Spain, in 1995, and the Ph.D. degree in computer science from the Polytechnic University of Madrid, Spain, in 2000. He joined the Scientific Research Council (CSIC), Spain, in 1995 and has worked since then in cryptography, Internet security, and chaotic systems. He also teaches courses on applied web hacking and defence to private companies and public bodies, and audits web applications.

Slobodan Petrović
Slobodan Petrović received his Ph.D. degree in 1994, from the University of Belgrade. The title of his thesis was 'Algorithms for the computation of edit-distance between discrete sequences - analysis, synthesis and applications'. His research interests include coding theory, cryptography, pattern recognition, and combinatorial optimisation. From 1986 to 2000, he participated in various projects at the Institute of Mathematics in Belgrade, concerning fundamentals of computer science, and pattern recognition. Since 2000 he has been at the Scientific Research Council (CSIC), Spain, working on the projects 'Cryptographic Protection of Copyright in Digital Networks' and 'Application of Intelligent Mobile Agents in Intrusion Detection Systems'.

2 Web attack properties

A taxonomy is a classification scheme that partitions a body of knowledge and defines the relationship of the objects. Classification is the process of using a taxonomy for separating and ordering [4]. According to [11], satisfactory taxonomies have classification categories with the following characteristics:

1. Mutually exclusive: the categories do not overlap.

2. Exhaustive: taken together, the categories include all the possibilities.

3. Unambiguous: clear and precise so that classification is not uncertain, regardless of who is classifying.

4. Repeatable: repeated applications result in the same classification, regardless of who is classifying.

5. Accepted: logical and intuitive so that categories could become generally approved.

6. Useful: could be used to gain insight into the field of inquiry.

First, we introduce a novel model of Web attacks based on the concept of attack life cycle. By attack life cycle we understand a succession of steps followed by an attacker to carry out some malicious activity on the Web server, as depicted in Figure 1. The attacker gets through an entry point, searching for a vulnerability in the Web server or Web application, which might be exploited to defeat some security service. The vulnerability is realized by an action, using some HTTP verb and headers of certain length, directed against a given target and with a given scope. The attacker might obtain some privileges that depend on the type of attack. Our taxonomy of Web attacks is based on the attack life cycle defined in this way.

At every stage of the life cycle we define the following classification criteria or classifiers:

1. Entry point: where the attack gets through.

2. Vulnerability: a weakness in a system allowing unauthorized action.

3. Service (under threat): security service threatened by the attack.

4. Action: actual attack against the Web server exploiting the vulnerability.


5. Length: the length of the arguments passed to the HTTP request.

6. HTTP element: verbs and headers needed to perform the attack.

7. Target: the aim of the attack.

8. Scope: impact of the attack on the Web server.

9. Privileges: privileges obtained by the attacker after the successful completion of the attack.

Figure 1: Taxonomy of Web attacks.

In the next subsections each of these criteria is covered in detail and its relevance explained.

2.1 Entry point

The fact that a Web application is successfully attacked usually means that there is a vulnerability that is exploited by the attacker. This vulnerability might be found in the Web server software or in the Web application code itself. Thus, according to the entry point of the attack, we distinguish between Web server software attacks and Web application attacks.

2.1.1 Web server software attacks
All Web server software, regardless of platform or manufacturer, unintentionally hides a number of vulnerabilities, which allow the application to be used in a different way than intended. Many of these vulnerabilities are disclosed to the public, for example published in security forums and bulletins. Upon notification of the vulnerability, the manufacturer usually releases a patch or service pack which should correct the error. From the time the patch is released until all servers are correctly patched, many vulnerable servers remain.

2.1.2 Web application attacks
Web application-level attacks refer to the vulnerabilities inherent in the code of a Web application itself, regardless of the technology in which it is implemented or the security of the Web server/back end database on which it is built [12]. Attacks against Web-based mail are also included in this category.

The origin of these vulnerabilities may be errors in HTML forms, client-side scripts, server-side scripts (.asp, .jsp, .php, .pl, etc.), business logic objects (COM, COM+, CORBA, etc.), SQL statement processing, etc.
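The nine classifiers listed at the start of Section 2 can be pictured as the fields of a single record describing one attack. The following is only a minimal sketch under stated assumptions, not something taken from the paper: the class name `WebAttack`, the field names, and the sample values are hypothetical and were chosen simply to mirror the classifier names.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class WebAttack:
    """One attack described by the nine classifiers (hypothetical names)."""
    entry_point: str               # "web server software" or "web application"
    vulnerability: Optional[str]   # weakness exploited, if one applies
    service: str                   # security service under threat
    action: str                    # what the attack does on the server
    length: str                    # length class of the request arguments
    http_element: str              # verb and headers needed by the attack
    target: str                    # aim of the attack
    scope: Optional[str]           # impact on the users of the Web server
    privileges: Optional[str]      # privileges obtained, if applicable


# Illustrative values only; they do not describe a real recorded attack.
example = WebAttack(
    entry_point="web application",
    vulnerability="code injection",
    service="confidentiality",
    action="read",
    length="common",
    http_element="GET",
    target="web application",
    scope="local",
    privileges=None,
)

# The field order follows the order of the classifier list.
classifier_names = [f.name for f in fields(WebAttack)]
```

Keeping the record immutable (`frozen=True`) is a design choice of this sketch: a classified attack is a description, so there is no reason to modify it after it has been created.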


2.2 Vulnerability

In order to reach the desired result, an attacker must take advantage of a computer or network vulnerability. A vulnerability is a weakness in a system allowing unauthorized action. We define the following vulnerabilities in Web applications.

2.2.1 Code injection
Code injection vulnerabilities allow for injecting arbitrary user-chosen code into a page. These vulnerabilities arise from nonexistent or poorly designed input validation routines on the server side. The main categories of code injection are:

• Script injection: The attack involves Web servers that dynamically generate HTML pages. If these servers embed browser input in the dynamic pages that they send back to the browser, these servers can be manipulated to include content in the dynamic pages that will allow malicious scripts to be executed. This attack does not modify website content; rather, it inserts new, malicious script that can execute in the victim’s browser in the information context associated with a trusted server. Cross-Site Scripting is the most common form of script injection [13, 14].

• SQL injection: An attacker creates or alters existing SQL commands to gain access to unintended data or even the ability to execute system level commands on the host. See for example [15-17].

• XPath injection: XPath is a language for addressing parts of an XML document. An attacker can modify search strings to access unauthorized data in XML documents.

2.2.2 Canonicalization
Canonicalization vulnerabilities occur when an application makes a security decision based on a name (a filename, a folder name, a Web address), without taking into account the fact that the name may be expressed in more than one way [18].

The most common way of exploiting canonicalization issues is in the form of path traversal attacks, which allow a malicious user to execute commands or view data outside of the intended target path. Path traversal vulnerabilities normally arise from unchecked URL input parameters, cookies, or HTTP request headers. In most cases, a path traversal attack inherits the permissions of the application being executed, which may then access any file allowed by those permissions.

2.2.3 HTML manipulation
HTML manipulation is a vulnerability which allows a malicious user to modify data sent between the client (Web browser) and server (Web application), to which the user was not intended to have direct access. Parameter manipulation is often accomplished through:

• URL Query Strings

• Form Fields

• Cookies

Although it is neglected too often, parameter manipulation can be prevented with good input validation techniques on the server side.

2.2.4 Overflows
Buffer overflows have been causing serious security problems for decades [19]. When a program writes past the bounds of a fixed-length buffer previously allocated in memory, this is called a buffer overflow. Reading or writing past the end of a buffer can cause a number of diverse behaviors: programs can act in strange ways or fail completely.

At best, a buffer overflow can stop a server or service where it happens. At worst, it can allow for arbitrary code execution with the same privileges as the program where it is present.

2.2.5 Misconfiguration
If the platform and Web server are not correctly configured (carefully assigning file permissions,


using non-default paths, etc.), many vulnerabilities can arise.

Moreover, there are Web servers whose default configuration exposes a number of known directories, sample applications, user accounts, etc. Many vulnerabilities have been discovered and exploited over time in these elements too.

2.3 Service under threat

There is a widely accepted set of services or requisites that all systems must satisfy in order to be considered secure [20, 21, 9]. These services include authentication, confidentiality, integrity, availability, access control, and auditing.

According to the security property under threat, we distinguish between the following attacks.

2.3.1 Authentication
The goal of authentication is to determine whether someone or something is, in fact, who or what it is declared to be. Authentication is commonly performed through the use of passwords. Knowledge of the password is assumed to guarantee that the user is authentic.

In a Web context, authentication attacks bypass identification controls by using a number of different techniques, such as session hijacking, session replay, session fixation [22], identity spoofing, valid credentials theft, authentication module subversion, and brute-forcing. As a result of a successful attack, an illegitimate user will be identified as a legitimate one.

2.3.2 Authorization
The goal of authorization is to permit authenticated users to do or have something. In multi-user computer systems, a system administrator defines for the system which users are allowed access to the system and what privileges of use they have (such as access to which file directories, hours of access, amount of allocated storage space, and so forth).

The most common authorization attacks bypass access control by means of buffer overflows (see Section 2.2.4), server platform errors, and SQL injection techniques (see Section 2.2.1). Their goal is to execute database or operating system commands and access resources not allowed to the unprivileged user. Ultimately, a well-implemented privilege escalation attack will result in an unauthorized increase in the domain of access, possibly even granting the attacker administrator or root privileges.

2.3.3 Confidentiality
The goal of confidentiality is to protect the information from being disclosed or revealed to entities not authorized to have that information.

Confidentiality attacks expose private information contained in files (source code, other users’ information, etc.), or in database tables (credit card numbers, personal information, etc.). Depending on the nature of the information disclosed, the attacker can penetrate further into the attacked system or get private information about the targeted company or its customers.

2.3.4 Integrity
Data integrity services act as safeguards against accidental or malicious tampering or modification of data. Changing the value of a data item includes inserting additional data or deleting, modifying, or reordering parts of the existing data.

Common attacks against data integrity include database record manipulation, cookie poisoning, Web page alteration (defacing), and form field modification.

2.3.5 Availability
The goal of service availability is to ensure that users are not unduly denied access to information and resources.

Denial of service (DoS) is the most frequent attack against availability (see Section 2.4.8). DoS attacks can be conducted against the machine hosting the Web server or against the Web server itself. In the first case, the attack will use techniques equally valid for all


machines connected to the Internet. In the second case, the attack will exploit vulnerabilities in the Web server or the Web application to stop normal service.

2.3.6 Auditing
The auditing services provide the system administrator with the means to record security-relevant information, which can be analyzed to detect potential and actual violations of the system security policy. Auditing, or accountability, has three functions: event detection, information collection, and information processing.

Some attacks manage to pass undetected by preventing their being logged by the auditing system.

2.4 Action

Most real-world attacks suffered by Web servers are variants or concatenations of a few basic actions or attack classes. In this subsection we try to reference those primary attack classes which account for almost all everyday attacks.

We distinguish among actions aimed at three different objectives: server data, user authentication, and Web server. Actions directed against data include read, modify, delete, and fabricate. Actions directed against authentication include impersonate, bypass, and search. Actions directed against the Web server include interrupt, probe, and other.

2.4.1 Read
Read is an action to obtain the content of the data contained within a file, database record, or other data medium stored in the Web server. Reading does not alter the integrity of the data read. Examples include viewing source code files, and illicitly copying database tables.

2.4.2 Modify
Modify is an action to alter data. We limit our definition to tampering with data on the server side. Examples include changing a database record, or changing the contents of a file. The data continue to be available, but in altered form. Modifying data on the client side, such as the value of the URL, a cookie, or a hidden field in a form, does not fall under this category because the vast majority of Web attacks imply this manipulation in some way or another.

2.4.3 Delete
Delete is an action to remove an asset in the server or render it irretrievable by other legitimate users. Examples include deleting database objects and server files.

2.4.4 Fabricate
Fabricate is an action to insert counterfeit objects into the system. Examples include adding hacker toolkits to the server file system, creating new user accounts, or inserting records into a database table.

2.4.5 Impersonate
Impersonate is an action to masquerade an illegal user as a legitimate one. Examples include authentication ticket reuse or theft.

2.4.6 Bypass
Bypass is an action to avoid a control mechanism by using an alternative method to access a target. Examples include defeating a forms-based Web authorization mechanism in order to access protected Web resources, such as multimedia files, by simply following a link.

2.4.7 Search
Search is an action to find valid user authentication information. Examples include brute-force attacks, which try different combinations of login/password pairs, or repeatedly forging possible authentication tickets or session IDs to simulate an already validated user.

2.4.8 Interrupt
Interrupt is an action to cause a server to stop operating or offering a service. The most common form of interruption of service attack is the denial-of-service (DoS) attack. Its primary goal is to deny legitimate users access to a particular resource or service.


2.4.9 Probe
Probe is an attempt to determine the characteristics or vulnerabilities of a specific target. Attackers usually begin by gathering information about the target, before stepping into more invasive activities. For instance, the use of uncommon HTTP verbs, such as HEAD, or website mirroring tools, such as Teleport Pro [23] or wget [24], are indicators that an attack might be under way. More aggressive tools, such as Whisker [25] or Nikto [26], are more than probing tools, since they scan the Web server in search of vulnerabilities.

2.4.10 Other
When an attacker executes code, but it is impossible to ascertain what the command executed is, we consider the action carried out as other. Examples include a buffer overflow with a command execution payload or .bat or .cmd files executed via URL name. Usually, these actions are considered a threat against authorization, because although the outcome of such execution cannot be known, we assume it is unauthorized.

2.5 Length

Buffer overflows are the most common form of security attack today, not only in Web applications but also in all Internet applications and services. They are the easiest to exploit with the most devastating consequences, usually resulting in complete takeover of the attacked host.

Buffer overflows often need a very long data input to work. Hence, based on the length of the attack string, we distinguish between common-length and unusually long attacks.

2.5.1 Common length
Attacks which are not based on exploiting a buffer overflow seldom submit arguments of a length greater than a certain threshold that can be experimentally defined for a particular Web server. Thus, most attacks fall under this category.

2.5.2 Unusually long
Buffer overflows require passing arguments of sufficient length to fill a memory buffer allocated to a particular input, along with some additional data that are written into memory outside the buffer. When these additional data are missing, the buffer overflow usually results in a Denial-of-Service (DoS) attack (see Section 2.4.8), whereas when they are present and accurately crafted, the overflow might result in system command execution.

Unexpectedly long input arguments very likely imply a buffer overflow attack attempt.

2.6 HTTP verb

In order to be described using our taxonomy, an attack must use the HTTP/HTTPS protocol. An HTTP request consists of a verb and a group of headers. The verb can be any of GET, POST, HEAD, SEARCH, PROPFIND, PUT, DELETE, etc.

2.7 HTTP header

Different attacks use different headers. This category is clearly not mutually exclusive, but it is unambiguous, and we consider it valid for inclusion in our global taxonomy, since it provides useful information. The most common headers seen in attacks are Host, Cookie, Referer, Translate, etc. This value is included in the vector only when it is relevant to the attack. Otherwise, when irrelevant, it is listed as X.

2.8 Target

Not all Web attacks aim at the same target. Some attackers might be interested in taking control over the Web server machine and extending their attack further into the server’s network. Others might be interested only in obtaining database records, modifying some pages or misleading other users. Based on the target of the attack, we distinguish between application and platform attacks.
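The distinction drawn in Section 2.5 between common-length and unusually long attacks amounts to a threshold test on the request arguments. The following is a minimal sketch, assuming the arguments arrive as a URL query string. The function name `classify_length` and the threshold of 256 characters are illustrative assumptions: the paper only says that the threshold is determined experimentally for a particular Web server.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical value; the paper says the threshold must be found
# experimentally for each Web server.
ASSUMED_THRESHOLD = 256


def classify_length(url: str, threshold: int = ASSUMED_THRESHOLD) -> str:
    """Return the length class of the longest query argument in `url`."""
    query = urlsplit(url).query
    # keep_blank_values=True so that empty arguments are still inspected.
    args = parse_qsl(query, keep_blank_values=True)
    longest = max((len(value) for _, value in args), default=0)
    return "unusually long" if longest > threshold else "common length"


short_request = classify_length("/search?q=shoes")
long_request = classify_length("/search?q=" + "A" * 1000)
```

A real deployment would probably also inspect POST bodies and header values; this sketch looks only at the query string to keep the threshold idea visible.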


2.8.1 Web application
If the attack succeeds, only the application data and functionality will be affected, but not the operating system resources. These attacks are typically aimed at Web pages (e.g., obtaining and/or modifying source code), Web users (e.g., stealing cookies or passwords using Cross-Site Scripting attacks, Web-based mail attacks), and Web data (e.g., viewing, changing and/or deleting information in database records).

2.8.2 Platform
Under this attack, the target is beyond the Web application, aimed at the platform. The attacker usually seeks arbitrary command execution, manipulation of machine accounts, tampering with the host’s services, obtaining network information, etc. The Web server is used as a mere portal to gain access to the internal network.

2.9 Scope

Different attacks affect the Web server in different ways. Many attacks only affect one user or a group of users, whereas others have an impact on all users of the service.

Regarding the scope of the attack, we can distinguish between local and universal attacks.

2.9.1 Local
In these attacks, only one user or a small group of users is affected. An example is stealing one user’s personal data by means of a cross-site scripting attack.

2.9.2 Universal
All users will be affected by the attack. Typically, universal attacks include Web defacing, database record manipulation, and DoS. If a local attack can be automated and extended to any other user, then it belongs to the category of universal attacks. An example is data harvesting attacks.

2.10 Privileges

When an attacker compromises a Web server, the ultimate goal is to gain administrator/root privileges. Most Web attacks do not allow the attacker to escalate privileges. They run under a restricted account in the server or database. However, misconfiguration of the operating system access control lists, Web server permissions, and database users could enable an attacker to reach administrative privileges. There are three categories of users involved in an attack:

• The Web application user.

• The database user.

• The operating system level user.

Hence, with regard to the privilege obtained by the attack, we can distinguish between unprivileged and administrative attacks. This category is only applicable when the objective of the attack is obtaining access as a certain user, i.e., attacks against the authentication service (see Section 2.3.1).

2.10.1 Unprivileged user
Attacks are run under the identity of a Web user, a database user, and/or a system-level user. The attacker can access resources only accessible to that user and thus the impact of the attack is limited.

2.10.2 Administrator/root
If the attacker succeeds in gaining administrative access, the machine is completely compromised. This is the highest level of attack realization.

3 Encoding of the attacks

Now that we have characterized attacks by their fundamental properties, we face the problem of how to encode the attack information in a compact and useful way. Given that the number of attacks suffered by a website with medium traffic can grow very quickly, it is advisable not to record unnecessary information, in order to avoid running out of storage space prematurely. By using the taxonomy defined in this paper, it is possible to capture the relevant


information about attacks, allowing us to perform an analysis in order to decide on their severity.

To reduce the amount of the information recorded on the media, the data can be compressed, using some of the source coding techniques. But the use of general compression techniques and algorithms in this case has serious drawbacks. These drawbacks depend on the class of the method. Here we enumerate some of them.

To implement static defined-word schemes (e.g. [27-31]), the probabilities of message classes must be known in advance. The problem is that if we treat the descriptions of the attacks as messages, the probability of their appearance varies with time, as new attacks are invented and the remedies are published. Other definitions of messages in this case would be too general and would not lead to a sufficient compression ratio.

To adapt to the changes of the message probabilities, adaptive Huffman coding can be used (e.g. the FGK algorithm [32] or the Vitter algorithm [33]). But there is no guarantee that the compression ratio achieved by these methods is satisfactory, since these encodings are often outperformed by the static methods [33].

The main drawback of the free-parse methods, such as Ziv-Lempel [34], is that they perform very badly when short messages are encoded. As in the case of the static word schemes, other definitions of messages would not lead to an efficient compression either.

Bearing in mind the specific type of local redundancy in the descriptions of the attacks (i.e. the descriptions of some of the attacks include some of the attack properties, whereas the descriptions of the others do not), we propose a semantic-dependent data compression method that makes use of different-length vectors. The vectors have different lengths because only the information significant for the particular attack is retained. Besides its advantage over general data compression schemes in the efficiency of encoding short messages, this type of semantic-dependent encoding allows the direct use of the classification and clustering techniques that are often needed in implementations, since classification and clustering can be performed without decompressing [10].

To encode the descriptions of the attacks using the semantic-dependent method introduced above, a range of non-negative integers is assigned to each of the attack properties discussed, in the following way.

(1) Entry point (1 bit of information)
    0 - Web server software (ISAPI filters, Perl modules, etc.)
    1 - Web application (HTML, server-side and client-side scripts, server components, SQL sentences, etc.)
(2) Vulnerability (3 bits of information)
    0 - Code Injection (SQL, JavaScript, cross-site scripting, etc.)
    1 - Canonicalization
    2 - HTML manipulation
    3 - Overflows
    4 - Misconfiguration (default directories, sample applications, guest accounts, etc.)
    X - Not applicable
(3) Service (under threat) (3 bits of information)
    0 - Authentication
    1 - Authorization
    2 - Confidentiality
    3 - Integrity
    4 - Availability
    5 - Auditing
(4) Action (4 bits of information)
    0 - Read
    1 - Modify
    2 - Delete
    3 - Fabricate
    4 - Impersonate


    5 - Bypass
    6 - Search
    7 - Interrupt
    8 - Probe
    9 - Other
(5) Length (1 bit of information)
    0 - Expected
    1 - Unexpected (unusually long)
(6) HTTP Verb (4 bits of information)
    0 - GET
    1 - POST
    2 - SEARCH
    3 - PUT
    4 - DELETE
    5 - PROPFIND
    6 - TRACE
    7 - HEAD
    8 - OPTIONS
(7) HTTP Header (4 bits of information)
    1 - Host
    2 - Cookie
    4 - Referer
    8 - Translate
    X - Irrelevant
(8) Target (1 bit of information)
    0 - Web application (source files, customers’ data, etc.)
    1 - Platform (OS command execution, system accounts, network, etc.)
(9) Scope (1 bit of information)
    0 - Local (one user affected)
    1 - Universal (all users affected)
    X - Not applicable
(10) Privileges (1 bit of information)
    0 - Unprivileged user
    1 - Administrator/root
    X - Not applicable

4 Encoding examples

In this section, we give some examples that illustrate how the methodology works. Let us consider the following common attacks directed against different types of Web servers and platforms. When applicable, problem descriptions are taken from the Common Vulnerabilities and Exposures (CVE), a list of standardized names for vulnerabilities and other information security exposures [35].

As the first example, let us consider the following attack:

GET /scripts/..%255c../winnt/system32/cmd.exe?/c+delete+/Q+.

According to CVE-2000-0884, IIS 4.0 and 5.0 allow remote attackers to read documents outside of the Web root, and possibly execute arbitrary commands, via malformed URLs that contain UNICODE encoded characters, also known as the ‘Web Server Folder Traversal’ vulnerability.

1. The entry point is the Web server’s software (0), not the Web application, since this vulnerability was present in all IIS machines, and can be exploited regardless of the Web application on top.

2. The attack exploits a canonicalization vulnerability in the Web server software (1).

3. The attack is deleting all files in the current directory in silent mode (delete /Q.), thus attacking the integrity of the file system (3).

Each property requires a certain number of bits 4. The attack deletes information from the
to encode its information. When only one bit is server, thus the action is delete (2).
required, it means that the property can take
5. The HTTP request has normal length (0).
one of two possible values, but not both.
However, when the property can take some 6. The HTTP verb used is GET (0).
different values simultaneously, as many bits as
7. The HTTP Header used is irrelevant for the
the number of possible values are required. This
attack (X).
only happens with the property (7).

8. The target of the attack is the Web application (0) because the attacker is deleting the Web pages stored on the server.
9. The scope of the attack is universal (1), because every user of the Web service will be affected by it.
10. The attacker gains access to the server’s file system under the IUSR_MachineName identity, which corresponds to an unprivileged though dangerous user account in Windows machines. Thus, it is marked as 0.

As a result of the encoding process, the following vector is obtained:

Vector: {0, 1, 3, 2, 0, 0, X, 0, 1, 0}

As the second example, let us consider the following attack:

GET /product.jsp?id=10&title=<script>w=window.open('http://www.attacker.com/read.cgi?PAN='+document.forms[0].PAN.value); w.close();</script>

The page blindly displays as its title the argument of a parameter passed in the URL. Hence, JavaScript code can be injected, opening the opportunity for a Cross-Site Scripting attack. In this case, the credit card number entered by the victim is sent to the attacker’s server.

1. The entry point is the Web application (1) because the attack exploits a lack of input validation to inject a script. This is a defect in the application, not in the Web server itself.
2. It is a cross-site scripting vulnerability, one of the possible code injection vulnerabilities (0).
3. It is an attack against confidentiality (2) because the attacker is stealing some other user’s credit card number.
4. The attacker is reading information (0).
5. The HTTP request has normal length (0).
6. The HTTP verb used is GET (0).
7. The HTTP Header used is irrelevant for the attack (X).
8. The target is the Web application (0) because the attacker is obtaining data at the application level, not at the platform level. In this example, he is reading another user’s personal information.
9. The scope of this attack is local (0), because it affects only one user at a time.
10. The attacker is not gaining access to the application, database or server, and, as a consequence, this category does not apply (X).

As a result of the encoding process, the following vector is obtained:

Vector: {1, 0, 2, 0, 0, 0, X, 0, 0, X}

As the third example, let us consider the following attack:

GET /prod.asp?id=1;exec xp_cmdshell 'net user bob h6q2 /add'--

The Web application reads the input (the value of the parameter id) and passes it to the database engine, allowing for SQL injection attacks. The attacker exploits this vulnerability by tricking the application into executing a SQL Server extended procedure which executes any command passed as argument. In this example, the attacker adds himself to the OS users.

1. The entry point is the Web application (1) because the attack exploits poor input validation to inject SQL commands. This is a defect in the application, not in the Web server itself.
2. It is a SQL injection vulnerability, one of the possible code injection vulnerabilities (0).
3. It is an attack against authorization (1)
because the attacker is bypassing the administrator’s configuration to add himself to the system’s users list (net user bob h6q2 /add).
4. The attacker is fabricating information because he is creating a new user account (3).
5. The HTTP request has normal length (0).
6. The HTTP verb used is GET (0).
7. The HTTP Header used is irrelevant for the attack (X).
8. The target is the server platform (1) because the attacker has added himself to the operating system’s users list. The attack is not directed against the Web application. Instead, it exploits a vulnerability in the application to gain access to the underlying operating system.
9. This attack has no direct effect on the Web users. As a consequence it is marked as X.
10. The extended stored procedure xp_cmdshell is running under the administrator account’s identity, and as a consequence the attacker gains administrative access (1).

As a result of the encoding process, the following vector is obtained:

Vector: {1, 0, 1, 3, 0, 0, X, 1, X, 1}

As the last example, let us consider the following attack:

GET /dir/[../] (repeated approx. 1344 times)

According to CVE-2001-0252, iPlanet Enterprise Server 4.1 allows remote attackers to cause a denial-of-service via a long HTTP GET request that contains many ‘/../’ (dot dot) sequences.

1. The entry point is the Web server software (0) because the attack exploits a problem in the Web server itself. This problem is shared by all servers of the same version, regardless of the Web application running on top.
2. It is an overflow attack, because a very long sequence is used, much longer than expected by the server software (3).
3. The attack is directed against availability (4) because the server stops functioning after the attack.
4. The attacker is interrupting the normal operation of the service (7).
5. The HTTP request is unusually long (1).
6. The HTTP verb used is GET (0).
7. The HTTP Header used is irrelevant for the attack (X).
8. The target is the Web application (0) because the attacker is not obtaining any access over the underlying server platform, but only disrupts the normal operation of the Web application.
9. This attack affects all users (1), since they will not be able to access the service.
10. The attacker is not gaining access to the application, database or server, and, as a consequence, this category does not apply (X).

As a result of the encoding process, the following vector is obtained:

Vector: {0, 3, 4, 7, 1, 0, X, 0, 1, X}

In the ensuing process of encoding, the letters X are omitted, leaving different-length vectors. In such a way, memory is saved.

5 Coverage of real attacks space

The space of real attacks is unlimited. On the one hand, new vulnerabilities are discovered every day. On the other hand, there exist infinite variations of some attacks such as SQL
injection, buffer overflow or cookie manipulation.

In order to quantitatively test how large a fraction of new vulnerabilities the taxonomy covers, all CVE reported Web attacks against Microsoft’s Internet Information Server (one of the most popular Web servers nowadays) have been successfully encoded and classified. For the sake of completeness, some other vulnerabilities affecting other Web servers (iPlanet, Apache, Oracle, BEA) have been randomly chosen from the CVE database, encoded and classified.

Let us now consider the type of attacks with an infinite number of instances. We explain in detail the SQL injection attack, but similar reasoning can be applied to other types of attacks with an infinite number of instances, such as buffer overflow, cross-site scripting, cookie manipulation, etc. When the SQL sentence varies, some categories in the encoding of the SQL injection attack remain the same (Entry point, Vulnerability, Length, HTTP Verb, HTTP Header). The categories that change (Service under threat, Action, Target, Scope, Privileges) can sustain the variability of the SQL language, since it is possible to semantically recognize SQL sentences, and the category Action of our taxonomy is exhaustive. Thus, any different SQL sentence embedded in the SQL injection attack would be covered by our taxonomy, affecting only the already mentioned categories, which means that the proposed taxonomy is exhaustive in this case.

6 Possible applications

This taxonomy and the corresponding attack encoding vectors are useful in a number of applications, especially in intrusion detection systems and in application-level firewalls.

6.1 Intrusion Detection Systems (IDS)

An intrusion detection system (IDS) detects and reports attempts to misuse or break into networked computer systems in real time [36]. A traditional, non-heuristically based IDS consists of three functional components:

• A monitoring component, such as a packet capturer, which collects traffic data.
• An inference component, which analyzes the captured data to determine whether it corresponds to normal activity or malicious activity.
• An alerting component, which generates a response when an attack has been detected. This response can be passive (such as writing an entry in an event log) or active (such as changing configuration rules in the firewall to block the attacker’s IP address).

One of the biggest problems faced by these systems is the huge number of alerts that might be generated in a heavily attacked environment in a matter of hours. It is impossible for a human operator to analyze so many reports and decide on the severity of the detected attacks to determine the action to take. This taxonomy can be used in the following way: first, the attacks are encoded by means of the proposed encoding scheme. Next, the vectors produced by the encoder are processed using pattern recognition or information extraction techniques (clustering algorithms [37], supervised learning [38], etc.) in order to pinpoint the most dangerous attacks, and analyze attack trends over time.

6.2 Application-level firewall

Another approach to detect and prevent Web attacks consists of using an application-level firewall or, more specifically, Web application firewalls [39, 40]. A traditional firewall provides protection only at the network level, with minimal or no application awareness [41]. On the other hand, application-level firewalls are capable of processing data at the application level as well as decrypting SSL connections. An application-layer solution works within the application that it is protecting, inspecting
requests as they come in from the network level. If at any point a possible attack is detected, it can take over and prevent unauthorized access and/or damage to the Web server by simply blocking and logging the offending request.

This approach might work better than the IDS one, blocking many more attacks. But again the problem of how to postprocess the alerts log file arises. Classifying the attacks once they have been blocked by the firewall and deciding on their severity is crucial for a prompt and effective response. This taxonomy helps in this task, providing an exhaustive group of mutually exclusive categories under which the attacks can be unambiguously classified using methods that employ the encoding scheme proposed in this paper [37]. Using encoder data as input, a decision system can determine the priority of various attacks.

7 Conclusion

In this paper, a taxonomy of Web attacks is proposed that intends to represent a forward step towards a more precise reference framework. A Web attack life cycle is defined as its base, to make it structured and logical. The properties of the most common Web attacks are described. A new semantic-dependent encoding scheme for these attacks is defined that can be exploited by the known pattern recognition algorithms employing special distance measures, which reduces the time and space complexity of attack data processing. Real world examples of attacks against different platforms, Web servers, and applications are given to illustrate how this taxonomy and the encoding scheme can be applied. Finally, possible applications are described, such as intrusion detection systems and application firewalls.

Acknowledgements

This research was supported by Ministerio de Ciencia y Tecnología, Proyecto TIC2001-0586.

References

[1] McClure, S., Shah, S. and Shah, S., 2002. Web Hacking. Addison Wesley Professional, 2002.
[2] Scambray, J. and Shema, M., 2002. Hacking Exposed Web Applications. McGraw-Hill Osborne Media, 2002.
[3] Cohen, F.B., 1997. Information system attacks: A preliminary classification scheme. Computers & Security, Vol. 16(1), 1997, pp. 29-46.
[4] Howard, J.D. and Longstaff, T.A., 1998. A common language for computer security incidents. Technical Report SAND98-8667, Sandia National Laboratories, October 1998.
[5] Lindqvist, U. and Jonsson, E., 1997. How to systematically classify computer security intrusions. Proceedings of the 1997 IEEE Symposium on Security & Privacy, 1997.
[6] Lough, D.L., 2001. A Taxonomy of Computer Attacks with Applications to Wireless Networks. PhD thesis, Virginia Polytechnic Institute and State University, 2001.
[7] Richardson, T.W., 2001. The Development of a Database Taxonomy of Vulnerabilities to Support the Study of Denial of Service Attacks. PhD thesis, Iowa State University, 2001.
[8] Schneier, B., 1999. Attack trees: Modeling security threats. Dr. Dobb’s Journal, Vol. 12(24), 1999, pp. 21-29.
[9] Stallings, W., 1995. Network and Internetwork Security, chapter 1. Prentice Hall, IEEE Press, 1995.
[10] Petrović, S., 1997. Clustering unequal length binary data using graph-theoretic techniques. Proc. 4th Balkan Conference on Operational Research, 1997.
[11] Amoroso, E.G., 1994. Fundamentals of Computer Security Technology. Prentice-Hall PTR, 1994.
[12] Scott, D. and Sharp, R., 2002. Abstracting application-level Web security. WWW2002, May 2002.
[13] Microsoft. Cross-site scripting security exposure executive summary. http://www.microsoft.com/technet/security/topics/ExSumCS.asp, 2000.
[14] Owasp. http://www.owasp.org.
[15] Anley, C., 2002. Advanced sql injection in sql server applications. Technical report, Next Generation Security Software, January 2002.
[16] Anley, C., 2002. (more) advanced sql injection in sql server applications. Technical report, Next Generation Security Software, June 2002.
[17] Cerrudo, C., 2002. Manipulating Microsoft sql server using sql injection. Technical report, Application Security, Inc., 2002.
[18] Howard, M. and LeBlanc, D., 2001. Writing Secure Code, chapter 12. Microsoft Press, 2001.
[19] Cowan, C., Wagle, P., Pu, C., Beattie, S. and Walpole, J., 2000. Buffer overflows: Attacks and defenses for the vulnerability of the decade. DARPA Information Survivability Conference and Exposition, 2:1119-1129, January 2000.
[20] Ford, W., 1994. Computer Communications Security, chapter 2. Prentice Hall PTR, 1994.
[21] Purser, M., 1993. Secure Data Networking, chapter 1. Artech House, 1993.
[22] Kolsek, M., 2002. Session fixation vulnerability in web-based applications. Technical report, Acros Security, 2002.
[23] Teleport pro. http://www.tenmax.com/teleport/.
[24] wget. http://www.gnu.org/software/wget/wget.html.
[25] Whisker. http://www.wiretrip.net/rfp/p/doc.asp/i2/d21.htm.
[26] Nikto. http://www.cirt.net/code/nikto.shtml.
[27] Abramson, N., 1963. Information Theory and Coding. McGraw-Hill, 1963.
[28] Elias, P., 1975. Universal codeword sets and representation of the integers. IEEE Trans. Inform. Theory 21,2, March 1975.
[29] Fano, R.M., 1949. Transmission of Information. MIT Press, 1949.
[30] Huffman, D.A., 1952. A method for the construction of minimum-redundancy codes. Proc. IRE 40,9, September 1952.
[31] Shannon, C.E. and Weaver, W., 1949. The Mathematical Theory of Communication. University of Illinois Press, 1949.
[32] Faller, N., 1973. An adaptive system for data compression. Record of the 7th Asilomar Conf. on Circuits, Systems and Computers, 1973.
[33] Vitter, J.S., 1987. Design and analysis of dynamic Huffman codes. J. ACM, Vol. 34(4), 1987.
[34] Ziv, J. and Lempel, A., 1977. A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory, Vol. 23(3), May 1977.
[35] Common vulnerabilities and exposures. http://cve.mitre.org.
[36] Northcutt, S., 2002. Network Intrusion Detection, Third Edition. New Riders Publishing, 2002.
[37] Petrović, S. and Álvarez, G., 2003. A method for clustering different length vectors using edit distance. http://arXiv.org/abs/cs.IR/0304007, 2003.
[38] Frank, J., 1994. Artificial Intelligence and intrusion detection: Current and future directions. Proceedings of the 17th National Computer Security Conference, Baltimore, MD, 1994.
[39] Álvarez, G. and Petrović, S., 2003. Anomaly-based Web attack detection system. Submitted to 6th Information Security Conference (ISC’03), 2003.
[40] Lindstrom, P., 2002. Guide to intrusion prevention. Information Security, October 2002.
[41] Zwicky, E.D., Cooper, S., Chapman, D.B. and Russell, D., 2000. Building Internet Firewalls (2nd Edition). O’Reilly & Associates, 2000.
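
As an illustration of the encoding used in the examples of Section 4, the following Python sketch shows how a ten-property attack vector could be turned into a different-length vector by omitting the entries marked X, and how many bits the remaining entries need. It is a minimal sketch under stated assumptions, not the authors' implementation: the property names, the use of None for X, and the helpers check, drop_x and bit_cost are all hypothetical. Only the order of the properties, their bit widths, and the two example vectors are taken from the text.

```python
# Illustrative sketch only; every name below is an assumption, not the authors' code.
X = None  # plays the role of the letter X (not applicable / irrelevant)

# (property, bits of information), in the order used by the taxonomy
PROPERTIES = [
    ("entry_point", 1),
    ("vulnerability", 3),
    ("service", 3),
    ("action", 4),
    ("length", 1),
    ("http_verb", 4),
    ("http_header", 4),
    ("target", 1),
    ("scope", 1),
    ("privileges", 1),
]


def check(vector):
    """Reject vectors of the wrong size or with a value too large for its bit width."""
    if len(vector) != len(PROPERTIES):
        raise ValueError("a vector needs %d entries" % len(PROPERTIES))
    for (name, bits), value in zip(PROPERTIES, vector):
        if value is not X and not 0 <= value < 2 ** bits:
            raise ValueError("%s=%r does not fit in %d bit(s)" % (name, value, bits))


def drop_x(vector):
    """Return the different-length vector: (property, value) pairs with X entries omitted."""
    check(vector)
    return [(name, v) for (name, _), v in zip(PROPERTIES, vector) if v is not X]


def bit_cost(vector):
    """Sum the bit widths of the properties that remain once X entries are omitted."""
    check(vector)
    return sum(bits for (_, bits), v in zip(PROPERTIES, vector) if v is not X)


# Vectors of the second (cross-site scripting) and last (iPlanet) examples.
xss = [1, 0, 2, 0, 0, 0, X, 0, 0, X]
dos = [0, 3, 4, 7, 1, 0, X, 0, 1, X]

for label, vec in (("xss", xss), ("dos", dos)):
    print(label, drop_x(vec), bit_cost(vec), "bits")
```

Under these assumptions both vectors keep eight of their ten entries and need 18 of the 23 bits of a full vector. How a decoder would learn which properties were omitted is not covered by the sketch.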
