• Embed Doc
  • Readcast
  • Collections
  • 1
    CommentGo Back
Download
 
Denial of Service via Algorithmic Complexity Attacks
Scott A. Crosby Dan S. Wallach
scrosby@cs.rice.edu dwallach@cs.rice.edu
 Department of Computer Science, Rice University
Abstract
We present a new class of low-bandwidth denial of service attacks that exploit algorithmic deficienciesin many common applications’ data structures. Fre-quently used data structures have “average-caseexpected running time that’s far more efficient thanthe worst case. For example, both binary trees andhash tables can degenerate to linked lists with care-fully chosen input. We show how an attacker caneffectivelycomputesuchinput, andwedemonstrateattacks against the hash table implementations intwo versions of Perl, the Squid web proxy, and theBro intrusion detection system. Using bandwidthless than a typical dialup modem, we can bring adedicated Bro server to its knees; after six min-utes of carefully chosen packets, our Bro server wasdropping as much as 71% of its traffic and consum-ing all of its CPU. We show how modern universalhashing techniques can yield performance compa-rable to commonplace hash functions while beingprovably secure against these attacks.
1 Introduction
When analyzing the running time of algorithms,a common technique is to differentiate best-case,common-case, and worst-cast performance. For ex-ample, an unbalanced binary tree will be expectedto consume
O
(
n
log
n
)
time to insert
n
elements,but if the elements happen to be sorted beforehand,then the tree would degenerate to a linked list, andit would take
O
(
n
2
)
time to insert all
n
elements.Similarly, a hash table would be expected to con-sume
O
(
n
)
time to insert
n
elements. However, if each element hashes to the same bucket, the hashtable will also degenerate to a linked list, and it willtake
O
(
n
2
)
time to insert
n
elements.While balanced tree algorithms, such as red-black trees [11], AVL trees [1], and treaps [17] can avoidpredictable input which causes worst-case behav-ior, and universal hash functions [5] can be usedto make hash functions that are not predictable byan attacker, many common applications use simpleralgorithms. If an attacker can control and predictthe inputs being used by these algorithms, then theattacker may be able to induce the worst-case exe-cution time, effectively causing a denial-of-service(DoS) attack.Such algorithmic DoS attacks have much in com-monwithotherlow-bandwidthDoSattacks, suchasstack smashing [2] or the ping-of-death
1
, wherein arelatively short message causes an Internet server tocrash or misbehave. While a variety of techniques
can
be used to address these DoS attacks, com-mon industrial practice still allows bugs like theseto appear in commercial products. However, unlikestacksmashing, attacksthattargetpoorlychosenal-gorithms can function even against code written insafe languages. One early example was discoveredby Garfinkel [10], who described nested HTMLtables that induced the browser to perform super-linear work to derive the table’s on-screen layout.More recently, Stubblefield and Dean [8] describedattacks against SSL servers, where a maliciousweb client can coerce a web server into perform-ing expensive RSA decryption operations. They
1
http://www.insecure.org/sploits/ping-o-death.html
has a nice summary.
 
 Bucket 012345
Figure 1: Normal operation of a hash table.suggested the use of 
crypto puzzles
[9] to forceclients to perform more work before the server doesits work. Provably requiring the client to con-sume CPU time may make sense for fundamen-tally expensive operations like RSA decryption, butit seems out of place when the expensive opera-tion (e.g., HTML table layout) is only expensivebecause a poor algorithm was used in the system.Another recent paper [16] is a toolkit that allowsprogrammers to inject sensors and actuators into aprogram. When a resource abuse is detected an ap-propriate action is taken.This paper focuses on DoS attacks that may bemounted from across a network, targeting serverswith the data that they might observe and store ina hash table as part of their normal operation. Sec-tion 2 details how hash tables work and how theycan be vulnerable to malicious attacks. Section 3describes vulnerabilities in the Squid web cache,the DJB DNS server, and Perl’s built-in hash ta-bles. Section 4 describes vulnerabilities in the Brointrusion detection system. Section 5 presents somepossible solutions to our attack. Finally, Section 6gives our conclusions and discusses future work.
2 Attacking hash tables
Hash tables are widely used throughout computersystems. They are used internally in compilers totrack symbol tables. They are used internally in op-erating systems for everything from IP fragment re-assembly to filesystem directory lookup. Hash ta-
 Bucket 012345
Figure 2: Worst-case hash table collisions.bles are so common that programming languageslike Perl provide syntactic sugar to represent hashtables as “associative arrays,” making them easy forprogrammers to use. Programmers clearly preferhash tables for their constant-time expected behav-ior, despitetheirworst-case
O
(
n
)
per-operationrun-ning time. After all, what are the odds that a hashtable will degenerate to its worst case behavior?In typical usage, objects to be inserted into ahashtable are first reduced to a 32-bit
hash value
.Strings might be hashed using a checksum oper-ator like CRC32 or MD5, but are usually hashedby much simpler algorithms. More complex ob- jects might have custom-written hash-value oper-ators. The hash table then takes this hash value,modulo the
bucket count 
, the size of the array of pointers to data being stored in the hash table, de-termining the bucket that will hold the reference tothe object being inserted. When two inputs mapthe the same bucket, a
collision
has occurred. Todeal with this case, each
hash bucket 
holds a linkedlist of all inserted objects whose hash value, mod-ulo the bucket count, maps to that particular bucket(see Figure 1). These linked lists are referred toas
hash chains
. When the total number of objectsin the hash table grows too large, resulting in longaverage chain length, the size of the array of hashbuckets is typically increased, perhaps multipliedby a constant factor, and the entries in the table arereinserted, taking their hash values modulo the newbucket count.There are other methods of implementing hash ta-bles, including
open addressing
, where collisionsare not resolved using hash chains. Instead, the
 
system follows a deterministic strategy to probe foran empty hash bucket, where the object is then in-serted. Although this paper focuses on hash chain-ing, the attacks described here will be at least aseffective on open addressing hash tables.Theworsecase(seeFigure2)canoccurfortworea-sons: either the 32-bit hash values are identical, orthe hash values modulo the bucket count becomesidentical. Of course, for randomly chosen input, theodds of every object hashing to the same bucket isvanishingly small —
1
b
n
1
for
b
buckets and
n
ob- jects. For maliciously chosen input, however, it be-comes entirely feasible. If the hash table is check-ing for duplicates when a new object is inserted,perhaps to guarantee that it acts as a mapping fromobject keys to values, then it will need to scan ev-ery entry in the hash bucket. This will induce theworst-case
O
(
n
)
behavior for each insert.There are only a few requirements in order to en-gage in such an attack. First, the hash function be-ing used must be deterministic and known to the at-tacker. Second, the attacker needs the ability to pre-dict or supply all of the input being used by the hashfunction. Third, the attacker needs to ensure that asufficient volume of attack input gets to the victimsuch that they experience a performance degrada-tion.The attacker must understand how raw data, ini-tially read by the application from the network,is processed before it is inserted into the hash ta-ble. Knowing this, the attacker must compute ob- jects that will eventually collide, either in the 32-bithash-value space, or only in the eventual hash buck-ets. Section 2.1 will describe how these collisionscan be efficiently computed for some hash func-tions. At worst, computing hash collisions requiresan attacker to exhaustively search within the spaceof possible inputs. While expensive, the attackercan do this work ahead of time. Ultimately, thequestion is whether the victim will accept enoughattack-input for the
O
(
n
2
)
worst-case behavior tomanifest itself. Furthermore, some victims may en-force various limits on the growth of their hash ta-bles, making them robust against this class of at-tack. We describe such limits in Section 2.2.
2.1 Constructing a specific attack
The first step in analyzing a program’s vulnerabil-ities to this attack is to determine where hash ta-bles are being used and identifying whether exter-nal, untrusted input can be fed directly into the ta-ble. This can be time consuming. As an example,the Bind DNS server places some four different ab-straction layers between the network and the ulti-mate hash table storing DNS bindings. Tracing thiscan be tedious work for an attacker unfamiliar withthe source code.
2.1.1 Hash collision versus bucket collision
An attacker may not know the bucket count ex-actly; many implementations change the bucketcount based on the number of objects stored in thehash table. However, given the application’s sourcecode, an attacker may be able to guess possible val-ues for the bucket count. This leads to two avenuesof attack: those where you don’t care about thebucket count and those where you know or guessthe bucket count.Ifcollisionscanbecomputedinthefull32-bithash-value space, then the bucket count is irrelevant;the hash table will exhibit worst-case behavior re-gardless of how many buckets it has. More for-mally, wewishtoderiveinputs
1
,
2
,...
i
suchthat
 Hash
(
1
) =
Hash
(
2
) =
...
=
Hash
(
i
)
. We referto these as
hash collisions
. If the inputs have dif-ferent hash values, but still collide into the samebucket (e.g., after a modulo operation has takenplace), we refer to these as
bucket collisions
. For-mally, a bucket collision is when we derive inputs
1
,
2
,...
i
such that
(
1
) =
(
2
) =
...
=
(
i
)
where
is the the function mapping from inputs tobuckets. In many cases,
(
) =
Hash
(
) (
mod
n
)
,with
n
being the number of buckets.While hash collisions would seem straightforward,they do not always result in a feasible attack. For
of 00

Leave a Comment

You must be to leave a comment.
Submit
Characters: ...
11 / 24 / 2010This doucment made it onto the Rising List!
You must be to leave a comment.
Submit
Characters: ...