= y_L;
return H;
Algorithm 1: SMD iteration technique
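As an illustration of the strengthened Merkle-Damgård (SMD) iteration named in Algorithm 1, the following Python sketch iterates a toy compression function over padded message blocks. The block size, the all-zero initial value, the 10...0 padding followed by a length block, and the SHA-256-based `compress` are assumptions made for this example; they are not taken from the algorithm listing.

```python
import hashlib

BLOCK = 8    # bytes per message block (toy value, assumed)
DIGEST = 8   # bytes of chaining value and final hash (toy value, assumed)

def compress(key: bytes, block: bytes, chain: bytes) -> bytes:
    """Stand-in for the keyed compression function h_K(block || chain)."""
    return hashlib.sha256(key + block + chain).digest()[:DIGEST]

def smd_pad(msg: bytes) -> bytes:
    """Append 0x80, zero-fill to a block boundary, then append the length."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    return padded + len(msg).to_bytes(BLOCK, "big")

def smd_hash(key: bytes, msg: bytes, iv: bytes = bytes(DIGEST)) -> bytes:
    """Chain the compression function over the blocks; return the last value."""
    padded = smd_pad(msg)
    y = iv
    for i in range(0, len(padded), BLOCK):
        y = compress(key, padded[i:i + BLOCK], y)
    return y  # H = y_L
```

The length block is what makes the padding suffix-free, which is the property the collision-preservation argument for SMD relies on.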
Any hash iteration technique must preserve the security properties of the function it iterates over. We review the seven existing security notions [3]: the standard three of collision resistance (Coll), preimage resistance (Pre) and second-preimage resistance (Sec), together with the always and everywhere variants of the latter two.
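Definition 1. Let $H : \mathcal{K} \times \mathcal{M} \to \mathcal{Y}$ be a hash-function family and let $m$ be a number such that $\{0,1\}^m \subseteq \mathcal{M}$. Let $A$ be an adversary. Then:

$$\mathbf{Adv}^{\mathrm{Pre}[m]}_{H}(A) = \Pr\left[K \stackrel{\$}{\leftarrow} \mathcal{K};\ M \stackrel{\$}{\leftarrow} \{0,1\}^m;\ Y \leftarrow H_K(M);\ M' \stackrel{\$}{\leftarrow} A(K, Y) : H_K(M') = Y\right] \tag{1}$$

$$\mathbf{Adv}^{\mathrm{ePre}}_{H}(A) = \max_{Y \in \mathcal{Y}} \Pr\left[K \stackrel{\$}{\leftarrow} \mathcal{K};\ M \stackrel{\$}{\leftarrow} A(K) : H_K(M) = Y\right] \tag{2}$$

$$\mathbf{Adv}^{\mathrm{aPre}[m]}_{H}(A) = \max_{K \in \mathcal{K}} \Pr\left[M \stackrel{\$}{\leftarrow} \{0,1\}^m;\ Y \leftarrow H_K(M);\ M' \stackrel{\$}{\leftarrow} A(Y) : H_K(M') = Y\right] \tag{3}$$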
While preimage resistance defines the usual one-way functions, everywhere preimage resistance states that, whatever range point is selected, it is computationally hard to find a preimage of it. Always preimage resistance consolidates Pre in the way needed to say that a function like SHA-1 is one-way: one can regard SHA-1 as a member of a family of hash functions (keyed, for example, by the initial chaining value) and ask whether it remains hard to find a preimage of a random point for it.
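The Pre experiment of equation (1) can be estimated empirically. The sketch below is only an illustration: the toy keyed hash `H` (SHA-256 cut down to a 4-bit range), the 8-bit message length, and the budgeted brute-force adversary are assumptions chosen so that the experiment runs quickly.

```python
import hashlib
import random

M_BITS = 8  # message length m (toy value, assumed)

def H(key: bytes, msg: int) -> int:
    """Toy keyed hash with a 4-bit range (an assumption for the demo)."""
    data = key + msg.to_bytes(2, "big")
    return hashlib.sha256(data).digest()[0] & 0x0F

def brute_force_adversary(budget: int):
    """Adversary A(K, Y): try the first `budget` messages of {0,1}^m."""
    def adversary(key, y):
        for cand in range(min(budget, 2 ** M_BITS)):
            if H(key, cand) == y:
                return cand
        return None
    return adversary

def estimate_pre_advantage(adversary, trials=200, seed=0):
    """Estimate Adv^Pre[m] by repeating the experiment of equation (1)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        key = rng.getrandbits(32).to_bytes(4, "big")  # K <-$ key space
        msg = rng.randrange(2 ** M_BITS)              # M <-$ {0,1}^m
        y = H(key, msg)                               # Y <- H_K(M)
        guess = adversary(key, y)                     # M' <-$ A(K, Y)
        if guess is not None and H(key, guess) == y:
            wins += 1
    return wins / trials
```

An adversary with a budget of `2 ** M_BITS` evaluations always wins, because the sampled message itself lies in its search space; smaller budgets give a smaller estimate. For ePre the target `Y` would be fixed in advance instead of derived from a random message, and for aPre the key would be fixed and hidden from the adversary's input.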
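Definition 2. Let $H : \mathcal{K} \times \mathcal{M} \to \mathcal{Y}$ be a hash-function family and let $m$ be a number such that $\{0,1\}^m \subseteq \mathcal{M}$. Let $A$ be an adversary. Then:

$$\mathbf{Adv}^{\mathrm{Sec}[m]}_{H}(A) = \Pr\left[K \stackrel{\$}{\leftarrow} \mathcal{K};\ M \stackrel{\$}{\leftarrow} \{0,1\}^m;\ M' \stackrel{\$}{\leftarrow} A(K, M) : (M \neq M') \wedge (H_K(M) = H_K(M'))\right] \tag{4}$$

$$\mathbf{Adv}^{\mathrm{eSec}[m]}_{H}(A) = \max_{M \in \{0,1\}^m} \Pr\left[K \stackrel{\$}{\leftarrow} \mathcal{K};\ M' \stackrel{\$}{\leftarrow} A(K) : (M \neq M') \wedge (H_K(M) = H_K(M'))\right] \tag{5}$$

$$\mathbf{Adv}^{\mathrm{aSec}[m]}_{H}(A) = \max_{K \in \mathcal{K}} \Pr\left[M \stackrel{\$}{\leftarrow} \{0,1\}^m;\ M' \stackrel{\$}{\leftarrow} A(M) : (M \neq M') \wedge (H_K(M) = H_K(M'))\right] \tag{6}$$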
Briefly, whereas second-preimage resistance claims that it is hard to find a partner for a known domain point chosen at random, everywhere second-preimage resistance (eSec) states that it is hard for an adversary to find a partner for any particular domain point. Always second-preimage resistance says that, for a function like SHA-1, it remains hard to find a partner for a random point (again regarding the function as a member of a family of hash functions).
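The three second-preimage experiments differ only in who fixes the key and the message. The following sketch makes that difference explicit; the one-byte keys, the 4-bit toy hash `H`, and the exhaustive `partner` search are assumptions for illustration, and the maximisation over `M` (for eSec) or over `K` (for aSec) is left to the caller.

```python
import hashlib
import random

M_BITS = 8  # message length m (toy value, assumed)

def H(key: int, msg: int) -> int:
    """Toy keyed hash with a 4-bit range; keys and messages are single bytes."""
    return hashlib.sha256(bytes([key, msg])).digest()[0] & 0x0F

def partner(key: int, msg: int):
    """Exhaustive search for M' != M with H_K(M') = H_K(M)."""
    target = H(key, msg)
    return next((c for c in range(2 ** M_BITS)
                 if c != msg and H(key, c) == target), None)

def wins(key: int, msg: int, guess) -> bool:
    """Winning condition shared by equations (4), (5) and (6)."""
    return guess is not None and guess != msg and H(key, guess) == H(key, msg)

def sec_trial(rng) -> bool:
    key, msg = rng.randrange(256), rng.randrange(2 ** M_BITS)
    return wins(key, msg, partner(key, msg))              # A(K, M)

def esec_trial(rng, fixed_msg: int) -> bool:
    key = rng.randrange(256)                              # only K is random
    return wins(key, fixed_msg, partner(key, fixed_msg))  # A(K), M hardwired

def asec_trial(rng, fixed_key: int) -> bool:
    msg = rng.randrange(2 ** M_BITS)                      # only M is random
    return wins(fixed_key, msg, partner(fixed_key, msg))  # A(M), K hardwired

def estimate(trial, trials=200, seed=0, **fixed) -> float:
    rng = random.Random(seed)
    return sum(trial(rng, **fixed) for _ in range(trials)) / trials
```

For example, `estimate(esec_trial, fixed_msg=7)` approximates the probability inside the maximum of equation (5) for the single message 7.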
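Definition 3. Let $H : \mathcal{K} \times \mathcal{M} \to \mathcal{Y}$ be a hash-function family and let $A$ be an adversary. Then:

$$\mathbf{Adv}^{\mathrm{Coll}}_{H}(A) = \Pr\left[K \stackrel{\$}{\leftarrow} \mathcal{K};\ (M, M') \stackrel{\$}{\leftarrow} A(K) : (M \neq M') \wedge (H_K(M) = H_K(M'))\right] \tag{7}$$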
The collision resistance property captures the difficulty an adversary faces in finding two distinct points in the domain of a hash function that hash to the same range point.
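A generic collision-finding adversary for equation (7) is the birthday search. The sketch below is a hedged illustration: the hash `H` is SHA-256 truncated to `n_bits` bits (an assumption that keeps the search short), and the message enumeration is arbitrary.

```python
import hashlib
from itertools import count

def H(key: bytes, msg: bytes, n_bits: int = 16) -> int:
    """Toy keyed hash: the top n_bits of SHA-256(key || msg)."""
    digest = hashlib.sha256(key + msg).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - n_bits)

def birthday_collision(key: bytes, n_bits: int = 16):
    """Adversary A(K): hash distinct messages until two of them collide."""
    seen = {}
    for i in count():
        msg = i.to_bytes(4, "big")
        y = H(key, msg, n_bits)
        if y in seen:
            return seen[y], msg, i + 1  # the pair and the number of evaluations
        seen[y] = msg
```

By the pigeonhole principle the search stops after at most `2 ** n_bits + 1` evaluations, and by the birthday bound it is expected to stop after roughly `2 ** (n_bits / 2)`.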
The relationships between the security properties presented above are depicted in Figure 1. The basis for understanding them is the difference between a conventional implication and a provisional implication. Briefly, whereas a conventional implication is a regular one, a provisional implication depends on some technical condition (namely the amount by which the hash function compresses: the implication grows stronger as the compression rate of the hash function increases). The formal definition of the implications, taken from [2], is as follows:
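Definition 4. Fix $\mathcal{K}$, $\mathcal{M}$, $m$, and $n$ where $\{0,1\}^m \subseteq \mathcal{M}$. Suppose that xxx and yyy are labels for which $\mathbf{Adv}^{\mathrm{xxx}}_{H}$ and $\mathbf{Adv}^{\mathrm{yyy}}_{H}$ have been defined for any $H : \mathcal{K} \times \mathcal{M} \to \{0,1\}^n$.
Conventional implication. We say that xxx implies yyy, written xxx $\to$ yyy, if $\mathbf{Adv}^{\mathrm{yyy}}_{H}(t) \le c\,\mathbf{Adv}^{\mathrm{xxx}}_{H}(t')$ for all hash functions $H : \mathcal{K} \times \mathcal{M} \to \{0,1\}^n$, where $c$ is an absolute constant and $t' = t + c\,\mathrm{Time}_{H,m}$;
Provisional implication. We say that xxx implies yyy to $\epsilon$, written xxx $\to$ yyy to $\epsilon$, if $\mathbf{Adv}^{\mathrm{yyy}}_{H}(t) \le c\,\mathbf{Adv}^{\mathrm{xxx}}_{H}(t') + \epsilon$ for all hash functions $H : \mathcal{K} \times \mathcal{M} \to \{0,1\}^n$, where $c$ is an absolute constant and $t' = t + c\,\mathrm{Time}_{H,m}$.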
In the definition above, the bracketed suffix of a label is a placeholder which is either [m] (for Pre, aPre, Sec, aSec, eSec) or empty (for ePre, Coll).
[Figure 1: diagram over the notions Pre, ePre, aPre, Sec, aSec, eSec, and Coll]
Figure 1. Summary of the relationships among notions of hash-function security. Solid arrows represent conventional implications, dotted arrows represent provisional implications, and the lack of an arrow represents a separation.
III. TOKEN-FREE BOUNDED DELAY CODES AND HASH ITERATION
A. Token-free bounded delay codes
In this section we define token-free codes with bounded delay. After that we describe how to use them in a hash iteration technique. Informally, a code is a set of words such that any product of these words can be uniquely decoded. Among the special classes of codes investigated in the literature, codes with bounded delay are of interest for many problems in language theory [4].
Definition 5. A code $C$ has bounded delay $k \ge 1$ from left to right if, whenever $x_{i_1} x_{i_2} \dots x_{i_k}$ is a prefix of $x_{j_1} x_{j_2} \dots x_{j_n}$, then $x_{i_1} = x_{j_1}$. Note that the $x$'s are code words of $C$.
Bounded delay from right to left is defined analogously by using suffixes. Of course, a code can have both left-to-right and right-to-left $k$-bounded delay.
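For a small finite set of words, the left-to-right condition of Definition 5 can be checked by exhaustive search. The sketch below is an illustration under stated assumptions: words are Python strings, no word is empty, and the function name `has_bounded_delay` is hypothetical. It tests only the delay condition and does not verify that the set is a code.

```python
from itertools import product

def has_bounded_delay(code, k):
    """Check Definition 5 (left to right) for a finite set of non-empty words."""
    words = sorted(code)

    def violates(u, first):
        # Search the products of code words whose first word differs from `first`
        # and look for one that has u as a prefix.
        stack = [w for w in words if w != first]
        while stack:
            w = stack.pop()
            if len(w) >= len(u):
                if w.startswith(u):
                    return True          # u is a prefix, yet the first words differ
            elif u.startswith(w):
                stack.extend(w + x for x in words)  # still compatible with u: extend
        return False

    # u ranges over every product of exactly k code words.
    return not any(violates("".join(t), t[0]) for t in product(words, repeat=k))
```

With the set `{"0", "01"}` the check fails for `k = 1` (the word `0` is a prefix of `01`) and succeeds for `k = 2`. The prefix code `{"0", "10", "11"}` already has delay 1, while `{"0", "01", "11"}` fails for every `k`, because `0` followed by a run of `11` stays a prefix of `01` followed by a longer run of `11`.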
The definition states that we need to read at most $k$ code words to identify how to decode the first word (out of the $n$ code words forming the whole product over $C$ which we are attempting to decode). One consequence of that would be that $x_{i_1} x_{i_2} \dots x_{i_t} = x_{j_1} x_{j_2} \dots x_{j_t}$, $t < k$. The definition stated above is A. Salomaa's definition of codes with bounded delay [4]. Also, note that if a code $C$ has bounded delay $k$ then, for every $k' \ge k$, $C$ is a $k'$-bounded delay code as well [4]. An important fact to mention is that the notion of a bounded delay code $C$ is satisfactory only if $C$ is a code. As a preliminary notation, for $u, v \in \Sigma^*$, $u \preceq v$ denotes that $u$ is a subword of $v$.
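Definition 6. A code $C$ over the alphabet $\Sigma$ is $m$-token-free if and only if there exists $\tau \in \Sigma^m$ such that, for every $w = c_{i_1} c_{i_2} \dots c_{i_k}$ with $c_{i_j} \in C$, $1 \le j \le k$, we have $\tau \not\preceq w$ (that is, $\tau$ is not a subword of $w$).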
Definition 7. A bounded delay code which is also $m$-token-free is called an $m$-token-free bounded delay code.
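Token-freeness of a small code can also be tested by brute force. The sketch below assumes non-empty code words given as strings; the helper names are hypothetical. It uses the observation that a window of length m inside a product of code words touches at most m consecutive words, so it is enough to inspect products of exactly m words.

```python
from itertools import product

def is_token_for(code, token):
    """True if `token` is not a subword of any product of words from `code`."""
    m = len(token)
    # A subword of length m spans at most m consecutive (non-empty) code words,
    # and any shorter product can be extended, so products of m words suffice.
    return all(token not in "".join(ws) for ws in product(sorted(code), repeat=m))

def find_tokens(code, alphabet, m):
    """List every word of length m over `alphabet` that can serve as a token."""
    candidates = ("".join(t) for t in product(alphabet, repeat=m))
    return [t for t in candidates if is_token_for(code, t)]
```

For the code `{"01", "10"}` over the binary alphabet, `find_tokens` returns no token of length 2 and returns `["000", "111"]` for length 3, so this code is 3-token-free but not 2-token-free. Longer tokens such as `0000` also exist.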
Lemma 1. If a code $C$ is $m$-token-free then, for any $m' > m$, $C$ is $m'$-token-free as well.
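Proof: Let $C$ be an $m$-token-free code (over the alphabet $\Sigma$). Let $\tau \in \Sigma^m$, with $|\tau| = m$, be such that $\tau$ is not a subword of any $c \in C^*$. Consider the word $w = \tau x$, where $x \in \Sigma^*$, thus $w \in \Sigma^{m+|x|}$. Such a $w$ is not a subword of any $c \in C^*$ either, because every word that contains $w$ also contains $\tau$. Taking $|x| = m' - m$ gives a token of length $m'$, so $C$ is $m'$-token-free.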
c C (M);
prefix c
> C
= y_L;
return H;
Algorithm 2: Token-free iteration technique
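Lemma 2. If $x \neq y$, with $x, y$ two inputs, then token-free-proc($x$) $\neq$ token-free-proc($y$).
Proof: Assume the contrary. Then $c(x) = c(y)$, for $x \neq y$. Contradiction with the fact that $C$ is a code.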
Lemma 3. If $x \neq y$, with $x, y$ two inputs, then token-free-proc($x$) cannot be a suffix of token-free-proc($y$), and vice versa.
Proof: Assume the contrary and consider the case when the former is a suffix of the latter. Therefore, for some word $\alpha$ and with $\tau$ denoting the token, we have the following equation:
$$\text{token-free-proc}(y) = \alpha \,\|\, \text{token-free-proc}(x) = \alpha \,\|\, \tau \,\|\, c(x) = \tau \,\|\, c(y).$$
Which is impossible, because:
1) $\tau$ is chosen so that this could not happen;
2) $c(x) = c$ [...]
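The injectivity and suffix-freeness properties of Lemmas 2 and 3 can be checked exhaustively on a small instance. The sketch below assumes, following the equation in the proof, that token-free-proc prepends the token to the encoding of its input; the concrete code (0 maps to `01`, 1 maps to `10`) and the token `000` are choices made for the example only.

```python
from itertools import product

TOKEN = "000"                     # assumed token (never occurs inside an encoding)
ENCODE = {"0": "01", "1": "10"}   # assumed code, one code word per input bit

def token_free_proc(x: str) -> str:
    """Prepend the token to the code-word encoding of the bit string x."""
    return TOKEN + "".join(ENCODE[b] for b in x)

def all_inputs(max_len: int):
    """Every bit string of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def find_counterexample(max_len: int = 5):
    """Return (x, y) with x != y violating Lemma 2 or Lemma 3, or None."""
    inputs = list(all_inputs(max_len))
    for x in inputs:
        for y in inputs:
            if x == y:
                continue
            px, py = token_free_proc(x), token_free_proc(y)
            if px == py or py.endswith(px):
                return x, y
    return None
```

On this instance no counterexample exists for inputs of up to five bits. Dropping the token breaks the suffix property: the bare encoding of `1` is `10`, which is a suffix of `0110`, the bare encoding of `01`.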
[...] $(k, M)$;
3) obtain $M'$ [...] $M'_l \,\|\, t_p) = y$ and $l$ is the number of blocks in which $M'$ is split;
5) return $M'_l \,\|\, t_p$, where $M'_l$ is the last chunk obtained from encoding and splitting $M'$.
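Therefore, because $A$ [...] $= \mathbf{Adv}^{\mathrm{Pre}[m]}_{H}(A)$.
The fact that the adversary $A$ [...] $(A') = \mathbf{Adv}^{\mathrm{Pre}[m]}_{H}(A')$, $m \ge m'$, with $m$ [...]; when $m < m'$ [...]-sized input.
Because adversary $A$ is the best algorithm to find a preimage for the iterated hash function and $\mathbf{Adv}^{\mathrm{Pre}[m]}_{H}(A') \le \mathbf{Adv}^{\mathrm{Pre}[m]}_{H}(1)$, we then have:
$$\mathbf{Adv}^{\mathrm{Pre}[m]}_{H^{*}}(1) \le \mathbf{Adv}^{\mathrm{Pre}[m]}_{H}(1)$$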
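The idea behind the reduction, as far as the surviving steps show it, is that a preimage for the iterated function yields a preimage for the compression function: one recomputes the iteration and returns the input of the last compression call. The sketch below illustrates this with a plain SMD-style encoding standing in for the token-free encoding, which is an assumption; `compress`, the block size and the helper names are hypothetical as well.

```python
import hashlib

BLOCK, DIGEST = 4, 4  # toy sizes in bytes (assumed)

def compress(key: bytes, block: bytes, chain: bytes) -> bytes:
    """Stand-in for the keyed compression function h_K(block || chain)."""
    return hashlib.sha256(key + block + chain).digest()[:DIGEST]

def encode_and_split(msg: bytes):
    """Stand-in for encoding and splitting: SMD padding cut into blocks."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += len(msg).to_bytes(BLOCK, "big")
    return [padded[i:i + BLOCK] for i in range(0, len(padded), BLOCK)]

def iterated_hash(key: bytes, msg: bytes) -> bytes:
    chain = bytes(DIGEST)
    for block in encode_and_split(msg):
        chain = compress(key, block, chain)
    return chain

def compression_preimage(key: bytes, msg: bytes):
    """Given a preimage `msg` of y under the iterated hash, return the input
    (last block, last chaining value) of the final compression call."""
    blocks = encode_and_split(msg)
    chain = bytes(DIGEST)
    for block in blocks[:-1]:
        chain = compress(key, block, chain)
    return blocks[-1], chain
```

Whatever message the adversary against the iterated hash returns, the pair produced by `compression_preimage` maps to the same target under `compress`, so the derived adversary succeeds whenever the original one does.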
Theorem 2. Let $H$ be the compression hash function family with which the token-free with bounded delay iteration technique is instantiated. Let $H$ [...] $h^{*}(k, M) = h^{*}(k, M')$, where $M' \in \{0,1\}^{m}$. Thus the adversary has $M$ and $M'$, with $M \neq M'$, such that $h^{*}(k, M) = h^{*}(k, M')$.
Let $t_i, m_i$ (respectively $t'_i, m'_i$) be some intermediate values in the computation of $h^{*}(k, M)$ (respectively $h^{*}(k, M')$), and $l$ (resp. $l'$) [...]. Considering that $h^{*}(k, M) = h^{*}(k, M')$, we have the following equation:
$$h^{*}(k, M) = h_K(m_l \,\|\, t_p) = h_K(m'_{l'} \,\|\, t'_{p'}) = h^{*}(k, M').$$
Two cases arise:
1) $m_l \,\|\, t_p \neq m'_{l'} \,\|\, t'_{p'}$: it follows that $h$ is vulnerable as well to a collision attack, since we are able to find $m'_{l'} \,\|\, t'_{p'}$ hashing into the same value as $m_l \,\|\, t_p$ for $h$;
2) $m_l \,\|\, t_p = m'_{l'} \,\|\, t'_{p'}$.
When $m_l \,\|\, t_p = m'_{l'} \,\|\, t'_{p'}$ we obtain that:
$m_l = m'_{l'}$;
$t_p = t'_{p'}$.
So, $t_p = t'_{p'}$ will lead us to: $h_K(z_p \,\|\, t_{p-1}) = h_K(z'_{p'} \,\|\, t'_{p'-1})$.
As before, without loss of generality let us suppose that $p \ge p'$. Again two cases arise:
1) $z_p \,\|\, t_{p-1} \neq z'_{p'} \,\|\, t'_{p'-1}$: it follows that $h$ is vulnerable as well to a collision attack, since we are able to find $z'_{p'} \,\|\, t'_{p'-1}$ hashing into the same value as $z_p \,\|\, t_{p-1}$ for $h$;
2) $z_p \,\|\, t_{p-1} = z'_{p'} \,\|\, t'_{p'-1}$.
The second case tells us that we can repeat the process for $t_{p-1} = t'_{p'-1}$.
So, further we obtain that:
$z_{p-1} = z'_{p'-1}$;
$t_{p-2} = t'_{p'-2}$.
Continuing like this we either have:
1) two different inputs hash into the same value (for the compression function, making it vulnerable as well);
2) in the assumption that $p = p'$, we have $t_j = t'_j$; $z_j = z'_j$, $j = 1, \dots, p$. It follows that token-free-proc($m_l \,\|\, y_{l-1}$) = token-free-proc($m'_{l'} \,\|\, y'_{l'-1}$), and hence that we must have $y_{l-1} = y'_{l'-1}$ (since token-free-proc is using the token-free code $c$);
3) in the assumption that $p > p'$, there is an $x > 1$ such that $t_u = t'_j$; $z_u = z'_j$, $u = x, \dots, p$, $j = 1, \dots, p'$. It follows that token-free-proc($m'_{l'} \,\|\, y'_{l'-1}$) is a suffix of token-free-proc($m_l \,\|\, y_{l-1}$), in contradiction with Lemma 3.
Thus, since all the other cases show that the compression function is vulnerable as well, we need to repeat the process for the previous iteration step: $y_{l-1} = y'_{l'-1}$. After the repetition, similarly, we obtain that:
$m_{l-1} = m'_{l'-1}$;
$y_{l-2} = y'_{l'-2}$.
Otherwise, the compression function $h$ is vulnerable as well.
Clearly, this repetitive process can continue until:
1) two different chunks hash into the same value (the compression function being vulnerable as well);
2) in the assumption that $l = l'$, we have $y_i = y'_i$; $m_i = m'_i$, $i = 1, \dots, l$. It follows that token-free-proc($M$) = token-free-proc($M'$), in contradiction with Lemma 2;
3) in the assumption that $l > l'$, there is a $j > 1$ such that $y_k = y'_i$; $m_k = m'_i$, $i = 1, \dots, l'$, $k = j, \dots, l$. It follows that token-free-proc($M'$) is a suffix of token-free-proc($M$), in contradiction with Lemma 3.
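The backward walk used in this argument can be sketched in code: starting from the last compression call and moving towards the first, the first position where the two inputs differ is a collision for the compression function. The example below is a sketch under assumptions: it uses a plain SMD-style suffix-free padding in place of the token-free encoding, a deliberately tiny 1-byte chaining value so that a collision is found quickly, and hypothetical helper names.

```python
import hashlib
from itertools import count

BLOCK, DIGEST = 4, 1  # 1-byte chaining value so collisions are easy to find

def compress(block: bytes, chain: bytes) -> bytes:
    """Stand-in compression function h(block || chain)."""
    return hashlib.sha256(block + chain).digest()[:DIGEST]

def pad(msg: bytes) -> bytes:
    p = msg + b"\x80"
    p += b"\x00" * (-len(p) % BLOCK)
    return p + len(msg).to_bytes(BLOCK, "big")

def trace(msg: bytes):
    """List of (block, chain_in, chain_out) for every compression call."""
    padded, chain, steps = pad(msg), bytes(DIGEST), []
    for i in range(0, len(padded), BLOCK):
        block = padded[i:i + BLOCK]
        out = compress(block, chain)
        steps.append((block, chain, out))
        chain = out
    return steps

def find_iterated_collision():
    """Birthday search for two messages with the same iterated hash."""
    seen = {}
    for i in count():
        msg = i.to_bytes(3, "big")
        digest = trace(msg)[-1][2]
        if digest in seen:
            return seen[digest], msg
        seen[digest] = msg

def extract_compression_collision(m1: bytes, m2: bytes):
    """Walk both computations backwards until the compression inputs differ."""
    for (b1, c1, o1), (b2, c2, o2) in zip(reversed(trace(m1)), reversed(trace(m2))):
        assert o1 == o2  # equal outputs at this step (invariant of the walk)
        if (b1, c1) != (b2, c2):
            return (b1, c1), (b2, c2)
    raise ValueError("no collision found: the messages must be identical")
```

Because the padding encodes the message length in the last block, two distinct messages can never agree on every compression input, so the walk always ends with a pair of distinct inputs that `compress` maps to the same value.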
Theorem 4. Let $H$ be the compression hash function family with which the token-free with bounded delay iteration technique is instantiated. Let $H$ [...] is split;
4) return $m_{1j} \,\|\, t_{1k}$ and $m_{2n} \,\|\, t_{2o}$.
Note that $m_{1j} \,\|\, t_{1k}$ and $m_{2n} \,\|\, t_{2o}$ are the intermediate values when digesting message $M$, respectively $M'$. Also note that, when constructing this algorithm, we have used the inference proven in the above lemma: any successful collision attack on the iterated hash function can be traced back to the compression function on which the iteration relies. Therefore, because $A$ [...] $= \mathbf{Adv}^{\mathrm{Coll}}_{H}(A)$.
The fact that the adversary $A$ [...] $= \mathbf{Adv}^{\mathrm{Coll}}_{H}(A')$, $m \ge m'$, with $m$ [...]; when $m < m'$ [...]-sized input.
Because adversary $A$ is the best algorithm to find a collision for the iterated hash function and $\mathbf{Adv}^{\mathrm{Coll}}_{H}(A') \le \mathbf{Adv}^{\mathrm{Coll}}_{H}(1)$, we then have:
$$\mathbf{Adv}^{\mathrm{Coll}}_{H^{*}}(1) \le \mathbf{Adv}^{\mathrm{Coll}}_{H}(1)$$
D. Comparison with other existing iteration techniques
As shown in Table I, and according to [2], the SMD construction preserves Coll and ePre security, but fails to preserve any of the other notions. All of the existing schemes preserve ePre; intuitively, if every point in the range of the (randomly keyed) compression function is hard to invert, then iterating produces a function whose range is similarly hard to invert. Apart from ePre, most schemes preserve only collision resistance. These schemes include SMD, EMD, HAIFA, and Randomized hash. Of the twelve schemes in the table, none besides ROX preserves all seven notions. In fact, the best-performing existing constructions in terms of property preservation are the XOR Linear hash and Shoup's hash, which still preserve only three of the seven notions (Coll, eSec, and ePre). The XOR Tree hash is the only iteration to preserve Pre, and none of the schemes preserves Sec, aSec or aPre. Recall that the latter two are particularly relevant for the security of practical hash functions because they do not rely on the compression function being chosen at random from a family.
In relation to the existing iteration techniques, our token-free scheme is proven to preserve Coll, Pre, ePre and aPre, and has neither proofs nor counter-examples regarding Sec, eSec or aSec. That makes it the best performer after ROX. However, the difference between ROX and the proposed token-free technique is that ROX is a theoretically proven seven-property-preserving technique that is hard to implement in practice. The reason is that ROX makes use of two random oracles, which are a challenge for an actual implementation. Its designers themselves say [2]: "It is quite standard in cryptography for new primitives to first find instantiations in the random oracle model, only much later to be replaced with constructions in the standard model." So, without entering into implementation details, the ROX designers only sketch an implementation: either reuse the compression function with about three times as many rounds as normal (and with different values of the constants), admitting that this violates good cryptographic hygiene, or use calls to a block cipher like AES, which is designed independently of the compression function. In this respect our technique is much easier to implement, since finding token-free codes with bounded delay seems to be far easier than implementing a specific random oracle [1].
Regarding generic attacks (like multi-collisions, second-preimage search with expandable messages [5] or herding [6]), to which the original MD construction is proven not to be resistant, neither the token-free nor the ROX technique has been proven secure. Moreover, we are aware that our currently proposed technique is clearly susceptible to these types of attacks, as we expose our whole inner state (through the hash digest returned). However, the Sponge approach seems to have proven security against length-extension attacks; therefore, proposing an iteration technique that is resilient to those attack types as well is part of our future work. In [7], the Sponge authors propose working in a squeeze mode (truncating the final result), but for now this is not a solution, as our proof of the preservation of Coll would no longer be valid (we cannot guarantee that a collision of the iterated hash function can be traced back to a collision of the compression function). Still, regarding a possible squeeze mode for MD-like construction techniques, the Sponge designers advise that MD functions should work with a larger inner state and in the end just truncate the final chaining value to the desired hash length. The Sponge authors also say that this should be done even though the reduction proof for collision resistance will no longer be valid, because there is no evidence that designing a fixed-length compression function would be easier in the first place. That being said, the resistance of the resulting function would be limited by the size of the inner state (an approach successfully applied in the new SHA-3: Keccak).
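The risk of returning the whole inner state can be illustrated with the classic length-extension attack on a plain MD-style iteration. The following is a sketch under assumptions: the compression function, block size, padding rule, and the optional `out_len` truncation parameter are all hypothetical choices for this example and do not describe the token-free technique itself.

```python
import hashlib

BLOCK = 8   # bytes per block (toy value, assumed)
STATE = 8   # bytes of inner state (toy value, assumed)

def compress(block: bytes, chain: bytes) -> bytes:
    return hashlib.sha256(block + chain).digest()[:STATE]

def pad(n: int) -> bytes:
    """Padding for an n-byte message: 0x80, zeros, then the length block."""
    return b"\x80" + b"\x00" * (-(n + 1) % BLOCK) + n.to_bytes(BLOCK, "big")

def iterate(data: bytes, chain: bytes) -> bytes:
    for i in range(0, len(data), BLOCK):
        chain = compress(data[i:i + BLOCK], chain)
    return chain

def md_hash(msg: bytes, out_len: int = STATE) -> bytes:
    """MD-style hash; out_len < STATE truncates the final chaining value."""
    return iterate(msg + pad(len(msg)), bytes(STATE))[:out_len]

def extend(digest: bytes, orig_len: int, extension: bytes):
    """Forge the hash of msg || glue || extension from the digest alone."""
    glue = pad(orig_len)
    total = orig_len + len(glue) + len(extension)
    forged = iterate(extension + pad(total), digest)  # resume from exposed state
    return glue, forged
```

With the full state exposed, `extend` needs only the digest and the message length, never the message itself. When `md_hash` is called with `out_len` smaller than `STATE`, the digest no longer determines the inner state, so this direct continuation is not available; that is the intuition behind the advice to use a larger inner state and truncate at the end.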
In relation to SMD, which provides security preservation only for Coll and ePre, our technique brings a security improvement by allowing the preservation of two further security properties (but with a performance trade-off). Also, we learned that using token-free encoding only in the preprocessing phase of the plain SMD technique will not enhance its security, since the CE1 counter-example from [2] will apply to the resulting iteration technique as well.
In conclusion, even though we do not have proven security for Sec, eSec, and aSec, our technique seems to be more practical than ROX. However, since our token-free approach follows the MD construction model, it is susceptible to multi-collision attacks. Resisting those attacks by benefiting from the Sponge approach and token-free codes at the same time, as well as trying to model the ROX random oracles or the padding technique of the Sponge functions with token-free codes, are the investigation tracks of our future work.
IV. CONCLUSIONS
We contributed to the field first by introducing the notion of token-free bounded delay codes, after which we showed that the iteration technique based on token-free codes with bounded delay can support the hash security of a compression hash function. We are able to preserve all of the following security properties: Pre, aPre, ePre, Coll. Moreover, regarding the preservation of the seven notions of hash security, the token-free iteration technique seems to be the best performer after ROX. Even though we do not have proven security for Sec, aSec, and eSec, our technique seems to be more practical than ROX. The main reason is that ROX works in the random oracle model, as opposed to the standard model in which our technique works.
REFERENCES
[1] S. C. Diţu. Can we use codes with bounded delay on hash iteration techniques? Bachelor Thesis, 2011, coordinated by Prof. Dr. F. L. Ţiplea.
[2] P. Rogaway, T. Shrimpton. Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance. Fast Software Encryption 2004, LNCS No. 3017, pages 371-388.
[3] E. Andreeva, G. Neven, B. Preneel, T. Shrimpton. Seven-Property-Preserving Iterated Hashing: ROX. Asiacrypt 2007.
[4] A. Salomaa. Jewels of Formal Language Theory. Computer Science Press, 1981. University of Turku, Finland.
[5] J. Kelsey, B. Schneier. Second preimages on n-bit hash functions for much less than $2^n$ work. Eurocrypt 2005, LNCS No. 3494, pages 474-490.
[6] J. Kelsey, T. Kohno. Herding hash functions and the Nostradamus attack. Eurocrypt 2006, LNCS No. 4004, pages 183-200.
[7] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche. Sponge Functions. Ecrypt Hash Workshop 2007.
[8] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche. The KECCAK reference. NIST SHA-3 Competition, 2007-2012.