You are on page 1of 22

Chapter 5 Hash Functions

A hash function usually means a function that compresses, meaning the output is shorter than the input. Often, such a function takes an input of arbitrary or almost arbitrary length to one whose length is a fixed number, like 160 bits. Hash functions are used in many parts of cryptography, and there are many different types of hash functions, with differing security properties. We will consider them in this chapter.


The hash function SHA1

The hash function known as SHA1 is a simple but strange function from strings of almost arbitrary length to strings of 160 bits. The function was finalized in 1995, when a FIPS (Federal Information Processing Standard) came out from the US National Institute of Standards that specified SHA1. Let {0, 1}< denote the set of all strings of length strictly less than . The 64 function SHA1: {0, 1}<2 → {0, 1}160 is shown in Fig. 5.1. (Since 264 is a very large length, we think of SHA1 as taking inputs of almost arbitrary length.) It begins by padding the message via the function shapad, and then iterates the compression function sha1 to get its output. The operations used in the algorithms of Fig. 5.1 are described in Fig. 5.2. (The first input in the call to SHF1 in code for SHA1 is a 128 bit string written as a sequence of four 32-bit words, each word being consisting of 8 hexadecimal characters. The same convention holds for the initialization of the variable V in the code of SHF1.) SHA1 is derived from a function called MD4 that was proposed by Ron Rivest in 1990, and the key ideas behind SHA1 are already in MD4. Besides SHA1, another well-known “child” of MD4 is MD5, which was likewise proposed by Rivest. The MD4, MD5, and SHA11 algorithms are all quite similar in structure. The first two produce a 128-bit output, and work by “chaining” a compression function that goes from 512 + 128 bits to 128 bits, while SHA1 produces a 160 bit output and works by chaining a compression function from 512 + 160 bits to 160 bits.




algorithm SHA1(M ) // |M | < 264 V ← SHF1( 5A827999 6ED9EBA1 8F1BBCDC CA62C1D6 , M ) return V algorithm SHF1(K, M ) // |K | = 128 and |M | < 264 y ← shapad(M ) Parse y as M1 M2 · · · Mn where |Mi | = 512 (1 ≤ i ≤ n) V ← 67452301 EFCDAB89 98BADCFE 10325476 C3D2E1F0 for i = 1, . . . , n do V ← shf1(K, Mi V ) return V algorithm shapad(M ) // |M | < 264 d ← (447 − |M |) mod 512 Let be the 64-bit binary representation of |M | y ← M 1 0d // |y | is a multiple of 512 return y algorithm shf1(K, B V ) // |K | = 128, |B | = 512 and |V | = 160 Parse B as W0 W1 · · · W15 where |Wi | = 32 (0 ≤ i ≤ 15) Parse V as V0 V1 · · · V4 where |Vi | = 32 (0 ≤ i ≤ 4) Parse K as K0 K1 K2 K3 where |Ki | = 32 (0 ≤ i ≤ 3) for t = 16 to 79 do Wt ← ROTL1 (Wt−3 ⊕ Wt−8 ⊕ Wt−14 ⊕ Wt−16 ) A ← V 0 ; B ← V1 ; C ← V2 ; D ← V3 ; E ← V4 for t = 0 to 19 do Lt ← K0 ; Lt+20 ← K1 ; Lt+40 ← K2 ; Lt+60 ← K3 for t = 0 to 79 do if (0 ≤ t ≤ 19) then f ← (B ∧ C ) ∨ ((¬B ) ∧ D) if (20 ≤ t ≤ 39 OR 60 ≤ t ≤ 79) then f ← B ⊕ C ⊕ D if (40 ≤ t ≤ 59) then f ← (B ∧ C ) ∨ (B ∧ D) ∨ (C ∧ D) temp ← ROTL5 (A) + f + E + Wt + Lt E ← D ; D ← C ; C ← ROTL30 (B ) ; B ← A ; A ← temp V0 ← V 0 + A ; V 1 ← V 1 + B ; V 2 ← V 2 + C ; V 3 ← V 3 + D ; V 4 ← V 4 + E V ← V0 V1 V2 V3 V4 return V Figure 5.1: The SHA1 hash function and the underlying SHF1 family.

Namely. This is unfortunate in some ways.2 Collision-resistant hash functions A hash function for us is a family of functions H : K × D → R. having just two “print” statements—whose output specifies a collision. it will be a family of functions. This is just by the pigeonhole principle: if 2256 pigeons (the 256-bit messages) roost in 2160 holes (the 160-bit hash values) then some two pigeons (two distinct strings) roost in the same hole (have the same hash). it is supposed to be the case that nobody can find distinct strings M and M such that SHA1(M ) = SHA1(M ). So even if you restricted the domain of SHA1 just to “short” strings—let us say strings of length 256 bits—then there must be an enormous number of pairs of strings M and M that hash to the same value. 5. The only difficulty is our human problem of not knowing what this program is. But no alternative is known. It’s not computationally hard to output a collision.2: Operations on 32-bit words used in sha1. We would like to say that it is computationally infeasible to output a pair of distinct strings M and M that collide under SHA1. if K ∈ K is a particular key then . As usual. explicitly provided) even two such pigeons (strings). it can’t be. In trying to define this collision-resistance property of SHA1 we immediately run into “foundational” problems. It seems very hard to make a mathematical definition that captures the idea that human beings can’t find collisions in SHA1. Here D is the domain of H and R is the range of H . But in what sense could it be infeasible? There is a program—indeed a very short an simple one. The function SHA1 maps strings of (almost) any length to strings of 160 bits. Stop for a moment and think about the collision-resistance requirement. This property is called collision resistance. In order to reach a mathematically precise definition we are going to have to change the very nature of what we conceive to be a hash function. The difficult is only that nobody has as yet identified (meaning. So what is SHA1 supposed to do? First and foremost. because it distances us from concrete hash functions like SHA1. for it is really quite amazing to think that such a thing could be possible. rather than it being a single function.Bellare and Rogaway 3 X ∧Y X ∨Y X ⊕Y ¬X X +Y ROTLl (X ) bitwise AND of X and Y bitwise OR of X and Y bitwise XOR of X and Y bitwise complement of X integer sum modulo 232 of X and Y circular left shift of bits of X by l positions (0 ≤ l ≤ 31) Figure 5. Indeed countless pigeons must share the same hole.

and will choose the remaining s points as a function of the key. 1. This hash function takes a 128-bit key and an input M of at most 264 bits and returns a 160-bit output. 5. For any key K and y ∈ R we let denote the pre-image set of y under HK . Let −1 HK (y ) = { x ∈ D : HK (x) = y } denote the image of HK . The three choices of s ∈ {0. namely the one whose associated key is 5A827999 6ED9EBA1 8F1BBCDC CA62C1D6 .) Once the adversary has specified these points and the key has been selected. 1. It consists of a pre-key attack phase. which measures the ability of an adversary to find a collision for an instance of a family H . varying in restrictions put on the adversary in its quest for a collision. M ). Here is some notation we use in this chapter. and involving an adversary A. The function SHA1 is an instance of this family. the adversary is given the key. 2} give rise to three notions of security. Fig. This is the instance of H defined by key K . 1}128 × {0. A collision for a function h: D → R is a pair x1 .3: Framework for security notions for collision-resistant hash functions. The most basic security property of a hash function is collision-resistance. There are different notions of collisionresistance. 1}160 . Recall that a collision consists of a pair x 1 . in the post-key attack phase. followed by a key-selection phase. parameterized by an integer s ∈ {0. 5. (The latter has yet to be selected. before it has any information about the key. we imagine a game. HK : D → R is defined for all M ∈ D by HK (M ) = H (K.1. followed by a post-key attack phase. The adversary is required to specify 2 − s points in the pre-key attack phase. The adversary is attempting to find a collision for HK .4 HASH FUNCTIONS Pre-key attack phase Key selection phase Post-key attack phase Winning condition A selects 2 − s points A key K is selected at random from K A is given K and returns s points The 2 points selected by A form a collision for HK Figure 5. To introduce the different notions. 2}. 1}<2 → {0. where key K is selected at random from K in the key-selection phase. x2 of (distinct) points in D.3 summarizes the framework. It wins if the 2 = (2 − s) + s points it has selected form a collision for HK . 64 An example is SHF1: {0. as described in Fig. The three choices of the parameter s give Image(HK ) = { HK (x) : x ∈ D } . x2 ∈ D of points such that (1) HK (x1 ) = HK (x2 ) and (2) x1 = x2 . Let H : K × D → R be a hash function.

4 provides in more detail the experiments underlying the three attacks arising from the above framework. The notation in the experiments of Fig. K ← K . x2 ) ← A() . We represent by st information that the adversary wishes to maintain across its attack phases.1 reflects this via the use of “kk”. x2 ∈ D ) then return 1 else return 0 Expcr0 H (A) $ $ (x1 .1 Let H : K × D → R be a hash function and let A be an algorithm.8 as capturing collision-resistance under hidden-key attack. (x1 . Definition 5. rise to three notions of security. Fig. 5. we refer to our current notions as capturing collisionresistance under known-key attack. It will output this information in the pre-key attack phase. 5. In a variant of this model that we consider in Section 5. The three types of hash functions we are considering are known by other names in the literature. To disambiguate. known and hidden key attacks coincide. but instead is given an oracle for HK (·). and hence we just say cr0.4: Experiments defining security notions for three kinds of collisionresistant hash functions under known-key attack. and the notions of Section 5. x2 ∈ D ) then return 1 else return 0 Figure 5.4 and Definition 5. except that for CR0. x2 ) ← A(K ) if ( HK (x1 ) = HK (x2 ) and x1 = x2 and x1 .Bellare and Rogaway 5 -kk (A) Expcr2 H $ $ K← K . The higher the value of s the more power the adversary has. st) ← A() . st) if ( HK (x1 ) = HK (x2 ) and x1 = x2 and x1 . and hence the more stringent is the corresponding notion of security. the adversary is not given the key K in the post-key attack phase. x2 ← A(K. 5. and be provided it at the start of the post-key attack phase.8. as indicated in Fig.5. We let -kk (A) = Pr Expcr2-kk (A) = 1 Advcr2 H H -kk (A) = Pr Expcr1-kk (A) = 1 Advcr1 H H . x2 ∈ D ) then return 1 else return 0 -kk (A) Expcr1 H $ $ $ (x1 . K ← K if ( HK (x1 ) = HK (x2 ) and x1 = x2 and x1 .

because we can always imagine that hardwired into its code is a “best” choice of distinct points x1 .5: Types of hash functions. This is. cr0 Advcr0 H (A) = Pr ExpH (A) = 1 . Clearly.6 HASH FUNCTIONS Type CR2-KK CR1-KK CR0 Name(s) in literature collision-free. H 0 H 1 Also for any adversary A1 there exists an adversary A2 having the same running time as A1 and Advcr1-kk (A ) ≤ Advcr2-kk (A ) . a CR2 hash function is also CR1 and a CR1 hash function is also CR0. almost universal Figure 5. Then for any adversary A0 there exists an adversary A1 having the same running time as A0 and Advcr0 (A ) ≤ Advcr1-kk (A ) . The above value equals Advcr0 H (A) and is the maximum advantage attainable. with names in our framework and corresponding names found in the literature.2 Let H : K × D → R be a hash function. x2 . later. Although there is formally no definition of a “secure” hash function. Note that the running time of the adversary is not really relevant for CR0. target-collision resistant [1]) universal. for any integer n. meaning a choice for which $ Pr K ← K : HK (x1 ) = HK (x2 ) = y1 =y2 $ max Pr K ← K : HK (y1 ) = HK (y2 ) . 1}n → {0. such an algorithm will emerge. The following states the corresponding relations formally. 1}160 denote the restriction of SHF1 to the domain {0. H 1 H 2 We believe that SHF1 is CR2. to get SHF1n : {0. we will talk of a hash function being CR2. however. CR1 or CR0 with the intended meaning that its associated advantage function is small for all adversaries of practical running time. It is useful. Proposition 5. Note that a collision for SHF1n K is also a . 1}n . Perhaps. based on which Advcr2 H the current inability to find such an algorithm. The proof is trivial and is omitted. collision-intractable universal one-way [8] (aka. meaning that there is no practical algorithm A for -kk (A) is appreciably large. collision-resistant. In measuring resource usage of an adversary we use our usual conventions. purely a belief.

. To succeed. so that D = {D1 . q do $ x2 ← D if (HK (x2 ) = y and x1 = x2 ) then return x1 . Let us enumerate the elements of D in some way. the domain has size 2672 . We consider different types of CR2-type collision-finding attacks on a family H : K × D → R where D. . which happens at least half the time if |D| ≥ 2|R|. . However. Collision-resistance does not mean it is impossible to find a collision. the compression function of SHF1. y ← HK (x1 ) for i = 1. Each trial involves one computation of HK . for F = shf1. say |D| ≥ 2|R|. there is an obvious collision-finding strategy. we would choose n as small as possible.Bellare and Rogaway 7 collision for SHF1K . so we expect to find a collision in about q = |R| trials. which is prohibitive. hoping that their image under HK equals the image under HK of an initial target point. . and it is often convenient to think of attacking SHF1n for some fixed n rather than SHF1 itself. the attack −1 requires that HK (y ) has size at least two. . In particular. for D is usually large. q do if (HK (Di ) = y and x1 = Di ) then return x1 . It is implemented like this: Adversary A(K ) $ x1 ← D . . Analogous to the case of one-wayness. Dd } where d = |D|. R are finite sets. . D2 . x2 return FAIL A particular trial finds a collision with probability (about) 1 in |R|. Canonical example families to keep in mind are H = SHF1n for n ≥ 161 and shf1. . so the attack uses 2161 computations of HK . For example. but need n ≥ 161 to ensure collisions exist. y ← HK (x1 ) for i = 1. so the number of trials is a measure of the time taken by the attack. For SHF1n . We pick points at random. . The following adversary A implements an exhaustive search collisionfinding attack: Adversary A(K ) $ x1 ← D . far too large. 5. a collision for shf1 would be found in time .3 Collision-finding attacks Let us focus on CR2. Call this the random-input collision-finding attack. . which is not practical. which is the most important property for the applications we will see later. Now here’s another idea. We assume the family performs some reasonable compression. . we would still expect that it would take about q = |D| trials to find a collision. . Di return FAIL We call q the number of trials. This is much better than the |D| trials used by our first attempt.

1) 2N √ √ for 1 ≤ q ≤ 2N . We imagine each person has a birthday that is a random one of the 365 days in a year. Thus C (N. It is illustrated in Fig. this attack is not a threat. We now consider another strategy. But this is still far from practical. The answer may seem surprising at first. . The birthday attack on the other hand takes around 2160 = 280 trials. 1. yi ← HK (xi ) if (there exists j < i such that yi = yj but xi = xj ) then return xi . 5. . q = O( |R|) trials suffices. Our conclusion is that as long as the range size of the hash function is large enough. To see why the birthday attack performs as well as we claimed. . We throw the balls at random into the bins. q ) ≈ . This is MUCH less than 2160 . it has found a collision for HK . where N ≥ q . A collision is said to occur if some bin ends up containing at least two balls. xj return FAIL // No collision found Figure 5. The attack is successful in finding a collision if it does not return FAIL. one by one. . Suppose we have q balls. and applies HK to each of them. q ) ≈ 1 for q ≈ 2N . the random-input collision-finding attack takes about 160 2 √ trials to find a collision. Similarly. and the probabilities for all the balls are independent. The question is how large q need be to find a collision. q ). . At random means that each ball is equally likely to land in any of the N bins. called a birthday attack. where the i-th bin represents having birthday the i-th day of the year. If it finds two distinct points yielding the same output. So we can apply the Proposition from the Appendix with N = 365 C (N. q2 (5. but first let us note the impact. we recall the following game. beginning with ball 1. As shown in the Appendix. View them as numbered. q do $ // q is the number of trials // collision found xi ← D . This means we can think of a person as a ball being thrown at random into one of 365 bins. We are interested in C (N. Consider SHA1n with n ≥ 161. As we indicated. . . the probability of a collision. around 2160 rather than 2672 . The relation to birthdays arises from the question of how many people need be in a room before the probability of there being two people with the same birthday is close to one.6: Birthday attack on a hash function H : K × D → R.8 HASH FUNCTIONS for i = 1. We also have N bins. We will justify this later. Namely. that turns out to be much better than the above. q .6. the birthday attack finds a collision in shf1 in around 280 trials while while random-input collision-finding takes about 2160 trials. It picks at random q points from the domain. .

P (y ) = |D| In order for P (R1 ) = P (R2 ) = · · · = P (RN ) to be true. Let P (Rj ) denote the probability that a ball lands in bin Rj . for which there is no evidence so far. a collision of balls (two balls in the same bin) occurs precisely when two values xi . We view xi as a ball. (We ignore the probability that xi = xj . telling us that we ought to design hash functions to be as “close” to regular as possible [2]. q ). 5. then the probability that the attack succeeds is roughly C (N. This leads designers to choose l large enough that 2 l/2 is prohibitive. Then −1 |HK (Rj )| . the probability that xi = xj is small enough to neglect. For any K and any x. The Proposition says that when the room contains q ≈ 2 · 365 ≈ 27 people. namely the probability that HK (x) = Rj taken over a random choice of x from D. This number (27) is quite small and may be hard to believe at first hearing. where N = |R|. that a function is not vulnerable to a birthday attack does not. . counting a collision only when HK (xi ) = HK (xj ). In the case of SHF1 and shf1. let us enumerate the points in the range as R1 . Namely. So the above says that in this case we √ need about q ≈ 2N = 2 · |R| trials to find a collision with probability close to one. . If H is not regular. (Unless they turn out to be extremely non-regular. there is a 2l/2 or better time attack to find collisions in any hash function outputting l bits. where yi = HK (xi ). To see how this applies to the birthday attack of Fig. 1}160 defined as follows. The output length is 160. by appropriate choice of output length. Each such point defines a bin. 1}161 → {0. we cannot apply the birthday analysis directly. which is why this is sometimes called the birthday paradox. RN . because the latter assumes that each ball is equally likely to land in each bin. guarantee it is collision resistant. so a birthday attack takes 280 time and is not feasible. This is not.) However. These functions cannot thus be considered vulnerable to birthday attack. an adversary can just pick some 160-bit y and output −1 −1 −1 (RN )| . . (R2 )| = · · · = |HK (R1 )| = |HK |HK . . Thus.) Ensuring. and imagine that it is thrown into bin yi . and H is called regular if HK is regular for every K .6. on input K . In summary. We are interested in the probability that this happens as a function of q . as required to apply the birthday analysis. true for our attack. Our conclusion is that if H is regular. Consider the family H : K × {0. but it is still easy to find collisions. in general. It can be argued that since D is larger than R. it turns out the attack succeeds even faster.Bellare and Rogaway 9 and q the number √ of people in the room. it must be the case that A function HK with this property is called regular. xj have the same output under HK . function HK (x) returns the first 160 bits of x. the choice is l = 160 because 2 80 is indeed a prohibitive number of trials. of course. the probability that there are two people with the same birthday is close to one.

x ← A(K. the answer is “no. but needs to be carefully quantified. This turns out to be true. Even if a somewhat better adversary. y ) If (HK (x ) = y and x ∈ D) then return 1 else return 0 We let -kk (A) = Pr Expow-kk (A) = 1 . It is easy to see. Then H is highly collision-resistant (the advantage of an adversary in finding a collision is zero regardless of its time-complexity since collisions simply don’t exist) but is not one-way. Definition 5. To understand the issues. the size of the range is more than the size of the domain) are one-way.10 HASH FUNCTIONS y 0. Since this definition too has a hidden-key version. however. were found. say one finding a collision for shf1 in 265 trials. let H be a family of functions every instance of which is the identity function. Nobody has yet found an adversary that finds a collision in shf1 using less than 280 trials. and we would still consider shf1 to be collision-resistant. This tells us that to ensure collision-resistance it is not only important to have a long enough output but also design the hash function so that there no clever “shortcuts” to finding a collision. that. since this is still a very large number of trials. we would expect that “genuine” hash functions. as well as SHF1n . We consider the following experiment: Expow-kk (A) H $ $ $ K← K. y ← HK (x) .8 tells us that SHF1. it would not be devastating. where x was chosen at random from the domain. We believe that shf1 is well-designed in this regard. we indicate the known-key in the notation below. 5. meaning ones that perform some non-trivial compression of their data (ie. to find a pre-image of y (whether x or some other) under HK . y 1. in the absence of additional assumptions about the hash function than collision-resistance. a family H is one-way if it is computationally infeasible. meaning no attacks that exploit some weakness in the structure of the function to quickly find collisions.” For example. for all n. However. If we believe shf1 is collision-resistant. can also be considered collision-resistant.4 One-wayness of collision-resistant hash functions Intuitively. x← D . Advow H H We now ask ourselves whether collision-resistance implies one-wayness. it may help to begin by considering the natural argument one would attempt to use to show that collision-resistance implies one-wayness. Suppose we have an adversary A that has a significant advantage in attacking the one-wayness of hash function H . given HK and a range point y = HK (x).3 Let H : K × D → R be a family of functions and let A be an algorithm. Theorem 5. We could try to use A to find a collision via .

we compute y = HK (x1 ) and give K.4. having received the key K . Then for any A there exists a B such that -kk (A) ≤ 2 · Advcr1-kk (B ) + Advow H H |R| . The probability that this happens turns out to depend on the quantity: $ $ −1 PreImH (1) = Pr K ← K. So HK (x2 ) = HK (x1 ) and we have a collision. then it is also one-way. However Proposition 5. In the post-key phase. and performs enough compression that |R| is much smaller than |D|.4 Let H : K × D → R be a hash function. y ← HK (x) : |HK (y )| = 1 .2 implies that the same is true for CR2. This implies that PreImH (1) ≤ The corollary follows from Proposition 5.Bellare and Rogaway 11 the following strategy. y to A. Why? Let A be a practical . Advow H H H Furthermore the running time of B is that of A plus the time to sample a domain point and compute H once. In the pre-key phase (we consider a type-1 attack) we pick and return a random point x1 from D. Corollary 5. The result is about the CR1 type of collision-resistance. taken over y generated as shown. |D| Furthermore the running time of B is that of A plus the time to sample a domain point and compute H once. x← D . and. Then for any A there exists a B such that -kk (A) ≤ 2 · Advcr1-kk (B ) + PreIm (1) . |R| . the number of points in the range of HK that have exactly one pre-image certainly cannot exceed |R|. A general and widely-applicable corollary of the above Proposition is that collisionresistance implies one-wayness as long as the domain of the hash function is significantly larger than its range. The latter returns some x2 . |D| Corollary 5.5 Let H : K × D → R be a hash function. The catch is that we only have a collision if x2 = x1 . we know that HK (x2 ) = y .5: For any key K .5 says that if H is collision-resistant. This is the probability that the size of the pre-image set of y is exactly 1. The following quantifies this. Not quite. Proposition 5. Proof of Corollary 5. if it was successful. The following Proposition says that a collision-resistant function H is one-way as long as PreImH (1) is small.

5 does not apply.6: The assumption d > r implies that PreImH (1) = 0. Proof of Proposition 5. 1} 160 and D = {0.4: Here’s how B works: Pre-key phase Adversary B () $ x1 ← D . We now turn to the proof of Proposition 5. st ← x1 return (x1 .6 Suppose 1 ≤ r < d and let H : K × {0. We believe shf1 is collisionresistant.4. Proof of Corollary 5. meaning the bound is vacuous. Now apply Proposition 5. st) Retrieve x1 from st $ y ← HK (x1 ) . however.3) (5. meaning H is one-way. let H be the compression function shf1. The ratio of range size to domain size is 1/2. 1}d → {0. However. Corollary 5.2) then tells us H is collision-resistant we know Advcr1 H ow-kk that as long as |R|/|D| is small. Then for any A there exists a B such that -kk (A) ≤ 2 · Advcr1-kk (B ) . 1}r and every K ∈ K. As an example. which is tiny.2) (5. so the right hand side of the equation of Corollary 5. such a function is a special case of the one considered in the following Proposition. Then B is also practical. In that case R = {0.12 HASH FUNCTIONS adversary that attacks the one-wayness of H . For any Let Pr [·] denote the probability of event “·” in experiment Expcr1 H K ∈ K let −1 SK = { x ∈ D : |HK (HK (x))| = 1 } . st) Post-key phase Adversary B (K. and the above thus says it is also one-way. for which Corollary 5. 1}r be a hash −1 function which is regular. 1}672 so |R|/|D| = 2−512 . y ) return x2 -kk (B ). Equation (5. Advow H H Furthermore the running time of B is that of A plus the time to sample a domain point and compute H once.5) . AdvH (A) is low. x2 ← B (K. There are some natural hash functions.4. meaning |HK (y )| = 2d−r for every y ∈ {0. Consider a hash function H every instance of which is two-to-one.5 is 1. and since -kk (B ) is low. -kk (B ) Advcr1 H = Pr [HK (x2 ) = y ∧ x1 = x2 ] ≥ Pr [HK (x2 ) = y ∧ x1 = x2 ∧ x1 ∈ SK ] (5.4) = Pr [x1 = x2 | HK (x2 ) = y ∧ x1 ∈ SK ] · Pr [HK (x2 ) = y ∧ x1 ∈ SK ] (5.

It turns out that if shf1 is collision-resistant.7) applies another standard probabilistic inequality.5 The MD transform We saw above that SHF1 worked by iterating applications of its compression function shf1.7) (5. -kk (B ) and B . and be assured it is collision-resistant. then SHF1 is guaranteed to be collision-resistant. 5. We need no longer seek attacks on SHF1. Adversary A has no information about x1 other than that it is a random point in the set −1 −1 HK (y ). 1}v be a family of functions that we call the compression function. SHF1 works by compressing its input 512 bits at a time using shf1. F . Equation (5.5) uses the standard formula Pr[E ∧ F ] = P r[E |F ] · Pr[F ]. compresses 672 bits to 160 bits.Bellare and Rogaway 13 ≥ ≥ = 1 · Pr [HK (x2 ) = y ∧ x1 ∈ SK ] 2 1 · (Pr [HK (x2 ) = y ] − Pr [x1 ∈ SK ]) 2 1 -kk (A) − PreIm (1) . and v another integer parameter called the chaining-variable length. Equation (5. Let B denote the set of all strings whose length is a positive multiple of b bits. M1 .7 A function pad: D → B is called a MD-compliant padding function if it has the following properties for all M.3) is by definition of Advcr1 H cause Pr[E ] ≥ Pr[E ∧ F ] for any events E. namely that Pr[E ∧ F ] ≥ Pr[E ] − Pr[F ].2).8) Re-arranging terms yields Equation (5. under any key. Definition 5. the harder task of designing a collision-resistant hash function taking long and variable-length inputs has been reduced to the easier task of designing a collisionresistant compression function that only takes inputs of some fixed length. Let b be an integer parameter called the block length. In other words. We assume it is collisionresistant. We are now going to take a closer look at this paradigm. So the probability that x2 = x1 is at least 1/2 in this case. However if x1 ∈ SK then |HK (y )| ≥ 2. Let h: K × {0.6) is justified as follows. Equation (5. Equation (5. we need only concentrate on validating shf1 and showing the latter is collision-resistant. Let us now justify the steps above. The latter. b and let D be some subset of {0. The iteration method has been chosen carefully. 1}b+v → {0. 3]. 1}<2 . · Advow H H 2 (5.6) (5. This paradigm shows how to transform a compression function into a hash function in such a way that collision-resistance of the former implies collision-resistance of the latter.4) is true beEquation (5. M2 ∈ D: . To validate it. Equation (5. This has clear benefits.8) uses the definitions of the quantities involved. This is one case of an important hash-function design principle called the MD paradigm [10.

. However. meaning is a sequence of b-bit blocks. we end up with strings that differ in their final blocks. M ) y ← pad(M ) Parse y as M1 M2 · · · V ← IV for i = 1. V2. The main fact about this method is the following. 5.i V2. above. .i−1 ) Figure 5. Remember that the output of pad is in B . . n[1] do V1. We build a family H : K × D → {0. x2 ) y1 ← pad(x1 ) .i V2.n[2]−1 ) n ← n[1] // n = n[1] = n[2] since |x1 | = |x2 | for i = n downto 1 do if M1. n[2] do V2.i ← h(K. n do V ← h(K.0 ← IV .i−1 ) if (V1. .1 M2.n[2] V2.i V2. An example of a MD-compliant padding function is shapad.n[2] where |M2. .7: Hash function H defined from compression function h via the MD paradigm.0 ← IV for i = 1.i V1. and adversary Ah for the proof of Theorem 5. Now let IV be a v -bit value called the initial vector. Condition (3) of the definition is saying that if two messages are different then. .1 M1.i−1 then return (M1. (1) M is a prefix of pad(M ) (2) If |M1 | = |M2 | then |pad(M1 )| = |pad(M2 )| (3) If M1 = M2 then the last block of pad(M1 ) is different from the last block of pad(M2 ).i−1 . consists of b bits.8. . . M2. there are other examples as well. 1}v from h and pad as illustrated in Fig.i | = b (1 ≤ i ≤ n[1]) Parse y2 as M2.i ← h(K.n[2] OR x1 = x2 ) return FAIL if |x1 | = |x2 | then return (M1. y2 ← pad(x2 ) Parse y1 as M1. .14 HASH FUNCTIONS H (K. M1. M2. when we apply pad to them.n[1] = V2. built from h = shf1 and pad = shapad.2 · · · M2.n[1] where |M1. .i−1 = M2.7. Notice that SHF1 is such a family. . M2.n[1]−1 . A block. Mi V ) Return V Mn where |Mi | = b (1 ≤ i ≤ n) Adversary Ah (K ) Run AH (K ) to get its output (x1 .i−1 ) for i = 1.i V1.i V1.n[1] V1.i | = b (1 ≤ i ≤ n[2]) V1. . .2 · · · M1.

Let us assume this. Equation (5. Let Vn denote the value V1.9) H H h h Furthermore.7. The second case is that x1 . let us look at the inputs to the application of hK that yielded these outputs. But since h -kk (A ) is low. because its running time is that of AH plus the time to do some extra computations of h.n−1 and M2. If these inputs are different. Adversary Ah computes V1. they form a collision for hK and can be returned by Ah . The inputs in question are M1. x2 ) is the collision output by AH . This means that M1.n V1. If x1 . so we let n be the number of b-bit blocks that comprise y1 and y2 . But item (1) of Definition 5. then we know that V1. taking input a key K ∈ K.n[2] .n[2]−1 . because y1 = y2 . The first case is that x1 . Item (2) of Definition 5. Why is the latter true? We know that x1 = x2 . adversary Ah finds a collision in hK exactly .n−1 V2. they are the same.n[2] V2. Why? Let AH be a practical adversary attacking H . 1}v be built from h as described above. y2 have the same length as well. however.Bellare and Rogaway 15 Theorem 5. and Advcr2-kk (A ) ≤ Advcr2-kk (A ) . x2 ) of messages in D.n .n[1] = M2.n−1 that under hK yielded Vn .n[1] V1. x2 is a collision for HK then we know that V1. Then Ah is also practical.n[1]−1 and M2.n . This theorem says that if h is collision-resistant then so is H . x2 have different lengths. we now consider the inputs M1. We now consider two cases.8: Adversary Ah . We claim that if x1 .9) then tells is collision-resistant we know that Adv cr2 h h cr2-kk (AH ) is low. Suppose we are given an adversary AH that attempts to find collisions in H .n−2 and M2.n[2] = HK (x2 ). Denoting this value by Vn−1 .n[1] = V2.7 tells us that y1 . is depicted in Fig. Then we can construct an adversary Ah that attempts to find collisions in h.n[2] . (5.n[1] = HK (x1 ) and V2. continually stepping back and not finding our collision? No. the running time of Ah is that of AH plus the time to perform (|pad(x1 )| + |pad(x2 )|)/b computations of h where (x1 . So y1 = y2 .n[1] V1. If. x2 is a collision for HK then Ah will return a collision for hK .n[1]−1 = M2. 1}b+v → {0. meaning H is collision-resistant as well.7 says that x1 is a prefix of y1 and x2 is a prefix of y2 . which by assumption equals V2.n V2.n[2]−1 . 1}v be a family of functions and let H : K × D → {0.n−2 that under hK yield Vn−1 . 5. x2 have the same length. We know this length is a positive multiple of b since the range of pad is the set B . Can we get stuck.n−1 = V2.7 tells us that M1. us that AdvH Proof of Theorem 5. and thus these two points form a collision for hK that can be output by Ah . We have argued that on any input K .n−1 V1. Now.n−1 . It runs AH on input K to get a pair (x1 . they form a collision for hK . If they are different. The argument repeats itself: if these inputs are different we have a collision for hK . else we can step back one more time.n[2] V2. We compare the inputs M1.8 Let h: K × {0. Item (3) of Definition 5.

This game realizes one way to compute Ym = CBCρ (M ) and Ym = CBCρ (M ) for a random ρ ∈ Func(n). .9). Intuitively. and it is. . consider the X1 . Also depicted in Fig. in game C1. . 1}n )m be distinct strings. By definition. This justifies Equation (5. Xm values that get fed to ρ as we process M = M1 · · · Mm .8. . 5. M ∈ ({0. 5. . This can happen due either to an internal collision between Xm and Xm . Then write M = M1 · · · Mm and M = M1 · · · Mm where each Mi and each Mj is n-bits long and m ≥ m. let M name the shorter of the two strings and let M name the longer. and so we would like to bound the probability. if M1 = M1 then k = 0.9 [CBC Collision Bound] Fix n. We are interested in the probability that Ym = Ym . The main component of the running time of Ah is the time to run AH . Lemma 5. m CollCBC[ Func(n)] ≤ mm max{m. without loss of generality. . Then m. m } + 2n 2n Proof: If the lengths of M and M differ then.16 HASH FUNCTIONS when AH finds a collision in HK . it performs a number of computations of h equal to the number of blocks in y1 plus the number of blocks in y2 .7 Polynomial evaluation is an almost-universal hash function The CBC MAC is an almost-universal hash function $ We show that if M. 5. The former is unlikely too. but to prove this we generalize and consider a broader set of internal collisions than just Xm coinciding with Xm . where Ym was produced as ρ(Xm ) and Ym was produced as ρ(Xm ).8 is game C2. We use another game-playing argument. Note that k = m if M is a prefix of M and otherwise k is the largest nonnegative integer such that M1 · · · Mk = M1 · · · Mk but Mk+1 = Mk+1 . and consider the X1 . . 1}n )+ are distinct then Pr[ρ ← Func(n) : CBCρ (M ) = CBCρ (M )] is small. That is. we give a bit of intuition about what they aim to capture. that Ym = Ym . There is some more overhead. Consider game C1. By “small” we mean a slowly growing function of m = M n and m = M n .6 5. . but small enough to neglect. We now justify the claim about the running time of Ah . 1}n )m and M ∈ ({0. Further note that k < m . . In addition. the latter seems unlikely. obtained by eliminating the shaded statements. as defined in Fig. Let k = N n . Before taking up the analysis of the games in earnest. Xm values that get fed to ρ as we process M = M1 · · · Mm . or Ym might equal Ym even in the absence of this collsion between Xm and Xm . m ≥ 1 and let M ∈ ({0. Write M as M = N I and write M as M = N I where N is the longest common prefix of M and M whose length is divisible by n. m.

.. Accounting for the fact that Xi = Xi for all i ∈ [1 . as written. bad ← true else η (Xi ) ← Yi for i ← k + 1 to m do $ Yi ← {0. we don’t really expect to see collisions among X1 . bad ← true Y k ← Yk for j ← k + 1 to m do $ Yj ← {0. but not all. . . . Game C1 provides a way to compute the CBC MAC of distinct messages M = M1 · · · Mm and M = M1 · · · Mm that are identical up to block k . Xm coincides with one of Xk+1 . . . Xk and one of Xk+1 . . (c) collisions that occur between N and I (one of X1 . . . The computed MACs are Ym and Ym . . There is nothing “magical” about giving up exactly on these internal collisions—it is simply that the more internal collisions one gives up . . . . Xm . . we will “give up” in the analysis for some of these possibilities. To get a better bound. . . . the Xi -values that occur as we process the common prefix N of M and M ). . . bad ← true Figure 5. .8: Game C1. bad ← true if Xj ∈ Domain(η ) then Yj ← η (Xj ). Xm . Xm ). Xi+1 . . the internal collisions that we focus on are (a) collisions that occur while we process N (meaning collisions among any of X1 . 1}n do η (X ) ← ι(X ) ← ι (X ) ← undef Y0 ← 0n for i ← 1 to k do $ Yi ← {0. k ] (that is. . after eliminating the shaded statements. and game C2. . . . (b) collisions that occur between N and I (those between one of X1 . . . .Bellare and Rogaway 17 01 02 10 11 12 13 14 20 21 22 23 24 30 40 41 42 43 44 45 46 bad ← false for X ∈ {0. . . Xk coincides with one of Xk+1 . . Xm ). . Xk ). and (d) collisions that occur between I and I (one of Xk+1 . . Specifically. . . Xm ). 1}n Xi ← Yi−1 ⊕ Mi if Xi ∈ Domain(ι) then Yi ← ι(Xi ) else ι(Xi ) ← Yi if Xi ∈ Domain(η ) then Yi ← η (Xi ). . 1}n Xi ← Yi−1 ⊕ Mi if Xi ∈ Domain(η ) then Yi ← η (Xi ). 1}n if Yj = Ym then bad ← true Xj ← Yj −1 ⊕ Mj if Xj ∈ Domain(ι ) then Yj ← ι (Xj ) else ι (Xj ) ← Yj if Xj ∈ Domain(ι) then Yj ← ι(Xj ).

24.5 k (k − 1)/2n .18 HASH FUNCTIONS on the worse the bound but the easier the analysis. We first claim that. This is because every point placed into the domain of η is of the form Xi = Yi−1 ⊕ Mi where each Yi−1 is randomly selected from {0. m − 1]. 45. Since m ≥ k + 1 we know that the loop covering lines 40–46 will be executed at least once and Ym will take its value as a result of these statements. When a preliminary Ym value gets chosen at line 41 for iteration j = m we see that if Ym = Ym then line 42 will set the flag bad . We now proceed to bound Pr2 [B]. and as the value secondarily by ι otherwise. m }/2n . and we seek to bound the probability. So for any i and j with 1 ≤ i < j ≤ k we have that Pr[Yi−1 ⊕ Mi = Yj −1 ⊕ Mj ] = 2−n and there are 0. If the value gets overwritten at either of line 45 or line 46 then bad will get set to true. Also note that games C1 and C2 are identical until the flag bad gets set to true (meaning that any execution in which a highlighted statement is executed it is also the case that bad is set to true in that execution) and so. But observe that the initially chosen Ym value is not necessarily the final Ym value returned by game C1—the Ym value chosen at line 42 could get overwritten at any of lines line 44. Now look at game C1. We thus have that Prρ [CBCρ (M ) = CBCρ (M )] ≤ Pr2 [B]. If Ym gets overwritten by Ym at line 44 then Ym had earlier been placed in the range of ι and it could only have gotten there (since ι grows only by the else -clause of line 44) by Ym being equal to Yj for an already selected j ∈ [k + 1 . or 46. Prρ [CBCρ (M ) = CBCρ (M )] ≤ Pr1 [B]. The proof we give formalizes the intuition of this paragraph and does all the necessary accounting. in particular. In that case the flag bad would have already been set to true by an earlier execution of line 42: when the Yj value was randomly selected at line 41 and found to coincide with Ym the flag bad is set. any time that Ym = Ym it is also the case that flag bad gets set to true. Game C1 can thus be seen to provide a perfect simulation of the CBC algorithm over a random function ρ ∈ Func(n). and 46.. that Ym = Ym . Let Pr1 [·] denote the probability of an event in game C1 and let Pr2 [·] denote the probability of an event in game C2. and the function ι keeps track of the association of points that arise during the processing of I . in game C1. the function ι keeps track of the association of points that arise during the processing of I . Pr1 [B] = Pr2 [B]. which is a constant. showing that Pr2 [B] ≤ mm /2n +max{m. The function η keeps track of the association of points that arise during the processing of N . Focus on line 42. This is done by summing the probabilities that bad gets set to true at lines 14. ι. We thus have that every execution of game C1 that results in Ym = Ym also results in bad being true. in game C1. 42. The probability that bad gets set at line 14 of game C2 is at most 0. By the contents of the last paragraph. 1}n (at line 12) with the single exception of Y0 . It works not by growing a single random function ρ but by growing three random functions: η .5 k (k − 1)/2n such . Let B be the event (whether in game C1 or game C2) that the flag bad gets set to true. 45. If a value X should get placed in the domain of two or more of these three functions then we regard the corresponding range value as that specified first by η if such a value has been specified. and ι .

The probability that bad gets set at line 42 of game C2 is at most (m − k )/2n . this value is at most (mm + m )/2n . As for Xk+1 = Yk ⊕ Mk+1 . the proof is complete. the value Yk was just selected uniformly at random back at line 12 and is independent of the domain of ι apart from the domain point Yk ⊕ Mk+1 . Recalling that m is the block length of the longer of M and M . Each point Xi whose presence in the domain of η we test is of the form Xi = Yi−1 ⊕ Mi . a cr0-adversary has already output both its points. the adversary does not get the key K in the post-key attack phase.9. the value Yk was just selected uniformly at random (at the last execution of line 12) and is independent of the domain of η . as defined at line 22. 5. independent of the now-determined domain of ι. We conclude by the sum bound that the probability that bad gets set somewhere in game C2 is at most (k (k −1)/2+k (m−k )+(m −k )+(m−k )(m −k )+k (m −k ))/2n = mm + m − k (k − 3)/2. Each of these Yi−1 values with the exception of Yk was uniformly selected at line 41. Thus each time line 45 executes there is at most a (m − k )/2 n chance that bad will get set. . Formal experiments defining these two notions are given in Fig.3 except that. in the post-key attack phase. which is guaranteed to be distinct from Yk ⊕ Mk+1 by our criterion for choosing k . and the line is executed m − k times. so we get only two new notions. but instead gets an oracle for HK (·). The CR0 notion however coincides with the one for known-key attacks since by the time the post-key attack phase is reached. There are again three possible notions of security. Each of these Yi−1 values with the exception of Yk is uniformly selected at line 21. as defined at line 43. analogous to those in Fig. Since k can be zero. j ) pairs possible. A is not given K but is instead given an oracle for HK (·).8 Collision-resistance under hidden-key attack In a hidden-key attack. This is because the line is executed m − k times and each time the chance that bad gets set is 1/2n since Yj was just selected at random from {0. Thus each time line 24 executes there is at most a k/2n chance that bad will get set. Each point Xj whose presence in the domain of ι is being tested is of the form Xj = Yj −1 ⊕ Mj . 1}n . The probability that bad gets set at line 46 of game C2 is at most k (m − k ) for reasons exactly analogous to that argument used at line 24. The probability that bad gets set at line 45 of game C2 is at most (m − k )(m − k ). The probability that bad gets set at line 24 of game C2 is at most k (m − k )/2 n . independent of the now-determined domain of η . 5. and the line is executed m − k times. 5.Bellare and Rogaway 19 (i. As for Xk+1 = Yk ⊕ Mk+1 .

Construct from H an -AU hash-function family H : K × {0. 1}n )+ → {0. Advcr1 H H the hash function. x2 ∈ D – Oracle queries x1 .2 Let H : K × {0. 1}n → {0. Construct from H an 2 -AU hash-function family H : K2 × {0. Let E : K×{0. 1}a → {0. 5. 1}2a → {0. Problem 5.10 Let H : K × D → R be a hash function and let A be an algorithm. and often this has not gone well.20 HASH FUNCTIONS -hk (A) Expcr2 H $ K← K . 1}n be an -AU hash-function family. 1}2n . Here we select K to be some public constant. 1}n be an -AU hash-function family. K ← K . Problem 5. Run AHK (·) () If there exist x1 . 1}2n . x2 were made by A – The answers returned by the oracle were the same then return 1 else return 0 -hk (A) Expcr1 H $ $ (x1 . 1}a → {0.1 Hash functions have sometimes been constructed using a block cipher. x2 such that – x1 = x2 and x1 . 1}n be a block cipher and consider constructing H : K× ({0. Run AHK (·) (st) If there exists x2 such that – x1 = x2 and x1 . st) ← A() . Show that this hash function is not collision-resistant (no matter how good is the block cipher E ). Definition 5. x2 were made by A – The answers returned by the oracle were the same then return 1 else return 0 Figure 5. 1}n by way of the CBC construction: let the hash of M1 · · · Mm be Ym where Y0 = 0n and Yi = EK (Hi−1 ⊕ Mi ) for i ≥ 1. x2 ∈ D – Oracle queries x1 .9: Experiments defining security notions for two kinds of collision-resistant hash functions under hidden-key attack. 1}a → {0. .3 Let H : K × {0.9 Problems Problem 5. We let Advcr2-hk (A) = Pr Expcr2-hk (A) = 1 H H -hk (A) = Pr Expcr1-hk (A) = 1 .

Hash function balance and its impact on birthday attacks. vol. Lecture Notes in Computer Science Vol. Cryptanalysis of MD4. RIPEMD-160. Collisions for the compression function of MD5. [9] National Institute of Standards. http://csrc. C.. Advances in Cryptology – CRYPTO ’97. May 1996. Springer-Verlag. 1989. [5] H.iacr. Rump Session of Eurocrypt 96. August 2000.. Damg˚ ard. D.html. [4] B. Lecture Notes in Computer Science Vol. Kohno.. a strengthened version of RIPEMD. 765. Fast Software Encryption—Cambridge Workshop. Springer-Verlag. G. 1989. Springer-Verlag. Bosselaers. Yung.nist. Dobbertin. 1294. [2] M. [6] H. FIPS 180-2. 21 .. Bellare and T. Naor and M. Springer-Verlag.. D. Rogaway. Cachin and J. 1039. Secure hash standard. Gollmann ed. Lecture Notes in Computer Science Vol. T. Universal one-way hash functions and their cryptographic applications. Brassard ed. Lecture Notes in Computer A Design Principle for Hash Functions. 1993. [7] H. B. Camenisch ed. Kaliski ed. [3] I. Bosselaers. http://www. Springer-Verlag.Bibliography [1] M. den Boer and A. Lecture Notes in Computer Science Vol. Helleseth ed. Preneel.. Proceedings of the 21st Annual Symposium on the Theory of Computing. 1996. Dobbertin. 3027. ed. 1997. ACM. 2004. Dobbertin. Lecture Notes in Computer Science Vol. B. Advances in Cryptology – EUROCRYPT ’93. 435. Advances in Cryptology – CRYPTO ’89. 1039. Springer-Verlag. 1996. Bellare and P. [8] M. Advances in Cryptology – EUROCRYPT ’04. Gollman. Collision-Resistant Hashing: Towards Making UOWHFs Practical. Fast Software Encryption ’ A. pdf. Cryptanalysis of MD5.

Advances in Cryptology – CRYPTO ’89. Rivest. Brassard ed. [12] R. G. 435. [11] R. Also IETF RFC 1320 (April 1992). 537. 1990.22 BIBLIOGRAPHY [10] R. . Advances in Cryptology – CRYPTO ’90. 1989. Merkle. IETF RFC 1321 (April 1992).. Rivest. The MD5 message-digest algorithm. J. One way hash functions and DES. Menezes and S. pp. Lecture Notes in Computer Science Vol. The MD4 message-digest algorithm. Springer-Verlag. Lecture Notes in Computer Science Vol.. Vanstone ed. 303–311. Springer-Verlag. A.