
KECCAK Verification

Angus Gruen
July 2022

KECCAK is the cryptographic hash function behind SHA-3. The KECCAK algorithm works by repeating the following sequence of operations 24 times.

Algorithm 1 KECCAK ROUND_k

Input: 5 × 5 × 64 cube of bits A[x, y, z] (x, y indices are taken mod 5, z indices mod 64)
procedure θ Step
    C[x, z] = A[x, 0, z] ⊕ A[x, 1, z] ⊕ A[x, 2, z] ⊕ A[x, 3, z] ⊕ A[x, 4, z]
    D[x, z] = C[x − 1, z] ⊕ C[x + 1, z − 1]
    A′[x, y, z] = A[x, y, z] ⊕ D[x, z]
procedure ρ and π Steps
    B[x, y, z] = A′[3y + x, x, z − r[x, y]]
procedure χ Step
    A′′[x, y, z] = B[x, y, z] ⊕ ((∼ B[x + 1, y, z]) & B[x + 2, y, z])
procedure ι Step
    A′′′[0, 0, z] = A′′[0, 0, z] ⊕ RC[k, z]
Output: 5 × 5 × 64 cube of bits A′′′[x, y, z]

Included in this algorithm are 2 constants: r, a 5 × 5 array of integers between 0 and 63, and RC, a 24 × 64 array of bits, with one 64-bit vector used for each round. The exact values of these constants will not matter for us here but can be found in the implementation paper [1].
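As a rough illustration of Algorithm 1, the following Python sketch runs one round with each 64-bit lane A[x, y, ·] held as an integer. The function names and the choice to pass the constants `r` and `rc` as arguments are assumptions made for this example; the real constant tables are in [1] and are not reproduced here.

```python
MASK = (1 << 64) - 1

def rotl(lane, n):
    """Rotate a 64-bit lane left by n, so bit z of the result is bit z - n of the input."""
    n %= 64
    return ((lane << n) | (lane >> (64 - n))) & MASK

def keccak_round(A, r, rc):
    """One round following Algorithm 1.

    A and r are 5x5 lists indexed [x][y]; A holds 64-bit lanes, r holds
    rotation offsets, rc is the 64-bit round constant (all caller-supplied).
    """
    # theta step
    C = [A[x][0] ^ A[x][1] ^ A[x][2] ^ A[x][3] ^ A[x][4] for x in range(5)]
    D = [C[(x - 1) % 5] ^ rotl(C[(x + 1) % 5], 1) for x in range(5)]
    A1 = [[A[x][y] ^ D[x] for y in range(5)] for x in range(5)]
    # rho and pi steps: B[x, y, z] = A'[3y + x, x, z - r[x, y]]
    B = [[rotl(A1[(3 * y + x) % 5][x], r[x][y]) for y in range(5)]
         for x in range(5)]
    # chi step
    A2 = [[B[x][y] ^ (~B[(x + 1) % 5][y] & B[(x + 2) % 5][y] & MASK)
           for y in range(5)] for x in range(5)]
    # iota step only touches the lane at (0, 0)
    A2[0][0] ^= rc
    return A2
```

With an all-zero state every step before ι leaves the state at zero, so the output is just the round constant in lane (0, 0); this makes a convenient sanity check.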
The problem we investigate in this document is how to verify that the output of the round was correct given the input, using a collection of low degree polynomials. In particular we would like to do this while saving only a minimal amount of information. The reason we want to express these operations via polynomial conditions is that it allows us to produce zero-knowledge proofs that the computations were done correctly. For a rough sketch of how this is done, see Appendix A.

1 Polynomification of Bitwise Operations


We start with a discussion of how bitwise operations can be computed using low degree polynomials. The basic observation which makes this all possible is the following set of well known algebraic identities for simple unary and binary operations.

x AND y = x & y = xy
x XOR y = x ⊕ y = x + y − 2xy
NOT x = ∼ x = 1 − x

What ‘=’ means here is that if we interpret 1 as TRUE and 0 as FALSE, the left and right hand sides are equal for all $(x, y) \in \{0, 1\}^2$ regardless of the field we work over. While in general we will be working over $F_q$ for $q = p^n$ with p a large prime, one nice well known observation which follows immediately from the above relations is that XOR and AND are exactly + and × over $F_2$.
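These identities are easy to confirm exhaustively. The sketch below checks them over an example prime field; the particular prime `P` and the helper names are assumptions chosen for illustration, and any other field would do.

```python
from itertools import product

P = 2**64 - 2**32 + 1  # example large prime (an assumption, not fixed by the text)

def and_poly(x, y):
    return (x * y) % P

def xor_poly(x, y):
    return (x + y - 2 * x * y) % P

def not_poly(x):
    return (1 - x) % P

def identities_hold():
    """Compare each polynomial with the matching bitwise operator on all bit inputs."""
    for x, y in product((0, 1), repeat=2):
        if and_poly(x, y) != (x & y):
            return False
        if xor_poly(x, y) != (x ^ y):
            return False
        if not_poly(x) != 1 - x:
            return False
        # Over F_2 the correction term 2xy vanishes, so XOR is plain addition.
        if (x + y) % 2 != (x ^ y):
            return False
    return True
```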
Say we have 3 columns x, y, z of length 32 with each entry a 0, 1 element of $F_p$ and we claim that for each 0 ≤ i ≤ 31, $z_i = x_i \oplus y_i$. Looking above, this is equivalent to 32 simple polynomial conditions each of the form:

$$x_i + y_i - 2 x_i y_i - z_i = 0.$$

We expand on this example in Appendix A and show how this can be transformed into a zero-knowledge proof of correctness.
Currently, to test these polynomial conditions, the prover needs to save 96 field elements, but there is a slight reformulation which allows us to save only 65. The main idea is that while $x_i$ and $y_i$ appear with degree 2 in the verification condition, $z_i$ only appears with degree 1. Hence instead of saving each $z_i$, simply save¹ the u32
$$z = z_0 + 2 z_1 + \cdots + 2^{31} z_{31}.$$
Then we can combine our 32 polynomial conditions linearly into 1 condition which is still only degree 2 in the inputs
$$z - \sum_{i=0}^{31} 2^i (x_i + y_i - 2 x_i y_i) = 0.$$
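The column-collapsing trick can be sketched as follows: the prover keeps the 32 bits of x, the 32 bits of y and a single packed field element for z (65 elements in total), and one degree 2 constraint replaces the 32 per-bit ones. The prime `P` and the function names are assumptions for this example; the only requirement taken from the text is that p exceeds 2^32.

```python
import random

P = 2**64 - 2**32 + 1  # example prime with P > 2**32 (an assumption)

def bits_to_u32(bits):
    """Pack 32 bits, least significant first, into one integer."""
    return sum(b << i for i, b in enumerate(bits))

def combined_xor_constraint(x_bits, y_bits, z_u32):
    """Value of z - sum_i 2^i (x_i + y_i - 2 x_i y_i) in F_P; zero means satisfied."""
    total = sum((1 << i) * (x + y - 2 * x * y)
                for i, (x, y) in enumerate(zip(x_bits, y_bits)))
    return (z_u32 - total) % P

rng = random.Random(0)
x_bits = [rng.randint(0, 1) for _ in range(32)]
y_bits = [rng.randint(0, 1) for _ in range(32)]
z_u32 = bits_to_u32([x ^ y for x, y in zip(x_bits, y_bits)])
saved_elements = len(x_bits) + len(y_bits) + 1
```

Because the packed value is below 2^32 < P, two different bit patterns for z can never collide modulo P, which is why the single constraint loses nothing.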

Note that we could rearrange our constraint as

$$z_i = x_i \oplus y_i \iff y_i = x_i \oplus z_i \iff x_i = y_i \oplus z_i$$

and so we could pick which of the 3 columns to collapse into a u32. While, for this single operation, there is no memory difference between the different options, for longer sequences of operations and/or more complicated operations there may be an optimal choice.
Finally, as we will use these later, we give explicit polynomial expressions for a couple of ternary operations.

xor3(x, y, z) = x ⊕ y ⊕ z = x + y + z − 2(xy + xz + yz) + 4xyz (1)

xornand(x, y, z) = x ⊕ ((∼ y) & z) = x + z − yz − 2xz + 2xyz (2)
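Both ternary polynomials can be checked against the bitwise definitions on all 8 inputs. In this sketch xornand is built by composing the binary identities (first w = (1 − y)z, then x ⊕ w), which expands to a degree 3 polynomial; the helper names are illustrative.

```python
from itertools import product

def xor(x, y):
    return x + y - 2 * x * y

def xor3(x, y, z):
    return x + y + z - 2 * (x * y + x * z + y * z) + 4 * x * y * z

def xornand(x, y, z):
    w = (1 - y) * z          # (NOT y) AND z
    return xor(x, w)         # x XOR w

def ternary_polynomials_agree():
    """Compare both polynomials with the bit operations on every input."""
    for x, y, z in product((0, 1), repeat=3):
        if xor3(x, y, z) != (x ^ y ^ z):
            return False
        if xornand(x, y, z) != (x ^ ((1 - y) & z)):
            return False
    return True
```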
¹ This requires $p > 2^{32}$ but we are free to assume this.

1.1 Polynomification of Larger Bitwise Operations
Whilst the resulting polynomials are relatively simple when our operation only
involves a few input bits, the situation gets more complicated as the number of
input bits is increased. In particular to compute a generic n−ary operation, we
need to use a polynomial of degree n as described by the following lemma.
Lemma 1.1. Any n-ary operation can always be represented as a polynomial of total degree at most n of the form
$$z = p(x) = \sum_{i \in \{0,1\}^n} a_i \, x_1^{i_1} \cdots x_n^{i_n}.$$

Proof. We briefly sketch the proof. Let X denote the set of all monomials of the form $x_1^{i_1} \cdots x_n^{i_n}$ with $i_j \in \{0, 1\}$ for all j. We do not include monomials with higher powers such as $x_1^2$ or $x_1 x_2^5$ as our inputs are all in {0, 1}, on which $x^n = x$ for n ≠ 0, and so these will not be independent.
Now observe that the map $\{0, 1\}^n \to X$ sending each tuple to the nonzero monomial of highest total degree is both well defined and bijective. This implies that the image of the $2^n$ evaluation maps
$$\{0, 1\}^n \to (X \to R^{2^n})$$
forms a basis of $R^{2^n}$ and so the $a_i$ can always be uniquely solved for via the usual linear algebra techniques.
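For a concrete feel for Lemma 1.1, the coefficients $a_i$ can be computed directly. The sketch below does not solve a linear system as in the proof; it swaps in the equivalent inclusion-exclusion (Möbius inversion) formula, where $a_i$ is the signed sum of the operation's values over all bit tuples lying below i componentwise. Function names are assumptions for this example.

```python
from itertools import product

def multilinear_coefficients(op, n):
    """Map each exponent tuple i in {0,1}^n to the coefficient a_i of op."""
    coeffs = {}
    for i in product((0, 1), repeat=n):
        total = 0
        # Sum over all tuples t with t_j <= i_j for every j.
        for t in product(*[(0, 1) if ij else (0,) for ij in i]):
            sign = (-1) ** (sum(i) - sum(t))
            total += sign * op(*t)
        coeffs[i] = total
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the multilinear polynomial at a bit tuple x."""
    value = 0
    for i, a in coeffs.items():
        # The monomial is 1 exactly when every selected variable is 1.
        if all(xj for xj, ij in zip(x, i) if ij):
            value += a
    return value

xor3_coeffs = multilinear_coefficients(lambda x, y, z: x ^ y ^ z, 3)
```

For three-input XOR this recovers the coefficients of Equation (1): 1 on each single variable, −2 on each pair and 4 on the triple product.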
This would suggest that if we demand the overall degree be ≤ 3, the largest bitwise operation we can use has 3 inputs. Surprisingly however this is not the case and if we are allowed to increase the degree in z, we can sometimes find a polynomial P(z, x) of lower degree such that, given any $x_i, z \in \{0, 1\}$, P(z, x) = 0 if and only if z = p(x). The key point is that, while for fixed x, the equation P(z, x) = 0 may have more than one solution for z, exactly 1 solution will lie in {0, 1}. To differentiate between these two types, we will call an equation of the form z = p(x) a computation polynomial whereas one of the form P(z, x) = 0 will be a verification polynomial.
For a simple example which will appear later, imagine we wish to find a polynomial for
$$z = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus x_5.$$
Any computation polynomial will have degree 5 but there are 2 different approaches we can take, which allow us to produce a verification polynomial of degree 3.
• By definition, z is equal to the parity bit of $x_1 + x_2 + x_3 + x_4 + x_5$. Hence
$$x_1 + x_2 + x_3 + x_4 + x_5 - z \in \{0, 2, 4\}$$
and so, writing $s = x_1 + x_2 + x_3 + x_4 + x_5 - z$, the equality
$$s (s - 2)(s - 4) = 0$$
holds if and only if z is correct.

• Taking a completely different approach, recall that
x = y if and only if x − y = 0
and
x = y ⊕ z if and only if x ⊕ z = y.
Hence
$$z = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus x_5$$
holds if and only if
$$z \oplus x_1 \oplus x_2 = x_3 \oplus x_4 \oplus x_5,$$
which in turn holds if and only if
$$\mathrm{xor3}(x_1, x_2, z) - \mathrm{xor3}(x_3, x_4, x_5) = 0,$$
where xor3 is the degree 3 polynomial defined in Equation (1).
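Both degree 3 verification polynomials for the five-input XOR can be tested on all 64 combinations of input bits and claimed output. The sketch below works over the integers, which matches the field computation whenever the characteristic is larger than 5 (an assumption consistent with p being a large prime); the function names are illustrative.

```python
from itertools import product

def xor3(x, y, z):
    return x + y + z - 2 * (x * y + x * z + y * z) + 4 * x * y * z

def parity_check(xs, z):
    """First approach: s(s - 2)(s - 4) with s = x1 + ... + x5 - z."""
    s = sum(xs) - z
    return s * (s - 2) * (s - 4)

def split_check(xs, z):
    """Second approach: xor3(x1, x2, z) - xor3(x3, x4, x5)."""
    x1, x2, x3, x4, x5 = xs
    return xor3(x1, x2, z) - xor3(x3, x4, x5)

def verification_polynomials_agree():
    """Each polynomial must vanish exactly when z is the XOR of the inputs."""
    for *xs, z in product((0, 1), repeat=6):
        correct = z == (xs[0] ^ xs[1] ^ xs[2] ^ xs[3] ^ xs[4])
        if (parity_check(xs, z) == 0) != correct:
            return False
        if (split_check(xs, z) == 0) != correct:
            return False
    return True
```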
Given this, a natural question to ask is when a lower degree verification polynomial can be found and whether there is a lower bound on the degree. A simple check we can do is to count how many independent monomials there are of degree ≤ i. Making use of z, we now have n + 1 variables and so in principle, for degree ≤ i, the number of monomials will be
$$\sum_{j \le i} \binom{n+1}{j}.$$
We need this to be at least $2^n + 1$ which, by symmetry of binomials, immediately implies that $i \ge \frac{n+1}{2}$. Thus verification with degree ≤ 3 will only be possible when n ≤ 5. Note that this condition is necessary but not sufficient. For example there is no degree 3 verification polynomial for
$$z = x_1 \,\&\, x_2 \,\&\, x_3 \,\&\, x_4 \,\&\, x_5.$$
This is because on our binary input space $z x_i = z$ for all i and so we do not have enough independent monomials. In general the sufficient condition is that we have $2^n + 1$ independent monomials and this needs to be checked on individual examples. That being said, note that the second approach shown above will generalise² to any situation where our bitwise operation can be broken down as $z = y_1 \oplus y_2$ where $y_1$ is a binary operation applied to any 2 of the inputs and $y_2$ is a ternary operation applied to the other 3.
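The counting bound above can be tabulated directly. This small sketch (names are illustrative) counts squarefree monomials of degree at most i in n + 1 variables and finds the smallest i for which there are at least 2^n + 1 of them; it only reproduces the necessary condition, not the sufficiency check that the text says must be done case by case.

```python
from math import comb

def monomial_count(n, i):
    """Number of multilinear monomials of degree <= i in n + 1 variables."""
    return sum(comb(n + 1, j) for j in range(i + 1))

def min_degree_bound(n):
    """Smallest i with at least 2^n + 1 monomials available."""
    i = 0
    while monomial_count(n, i) < 2**n + 1:
        i += 1
    return i

# Largest n (searched up to 16) for which degree 3 is not ruled out by counting.
largest_n_for_degree_3 = max(n for n in range(1, 17) if min_degree_bound(n) <= 3)
```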
The other key difference between computation and verification polynomials is that the method of collapsing columns described in the previous section only works with computation polynomials. This presents a clear trade-off between using verification polynomials for a larger expression versus breaking down the expression into intermediaries and using computation polynomials to collapse some columns.
One interesting question which requires further work is whether verification polynomials of even lower degree can be found in situations where more bits are output.

² The approach similarly generalises to situations where the bound on our polynomial degree is something other than 3.

2 Verification of KECCAK
In order to verify a round of KECCAK, we need to save the following 2406 field elements and verify that they were computed correctly. While this is not strictly optimal (see Section 3.1 for a couple of tiny improvements that reduce the number to 2402), the final improvements would make the code more complicated for little gain. Additionally, a brief notational point: there are a couple of variables which will be saved as u32's but we will need to discuss their individual bits. To distinguish the two settings we use bold font for the u32's. So A[x, y, z] will denote an individual bit with 0 ≤ z ≤ 63 whereas $\mathbf{A}[x, y, j]$ will denote a u32, with j ∈ {0, 1}.
1. Size 5 × 5 × 2: The input, $\mathbf{A}[x, y, j]$, stored as pairs of u32's.

2. Size 5 × 64: All of

C[x, z] = A[x, 0, z] ⊕ A[x, 1, z] ⊕ A[x, 2, z] ⊕ A[x, 3, z] ⊕ A[x, 4, z].

3. Size 5 × 5 × 64: All of

A′[x, y, z] = A[x, y, z] ⊕ C[x − 1, z] ⊕ C[x + 1, z − 1].

4. Size 5 × 64: All of

C′[x, z] = C[x, z] ⊕ C[x − 1, z] ⊕ C[x + 1, z − 1].

5. Size 5 × 5 × 2: All of

A′′[x, y, z] = B[x, y, z] ⊕ ((∼ B[x + 1, y, z]) & B[x + 2, y, z])

saved as u32's, $\mathbf{A}''[x, y, j]$. Here B is the alias

B[x, y, z] = A′[3y + x, x, z − r[x, y]]

which does not need to be separately saved.

6. Size 64: All of the individual bits of A′′[0, 0, z].

7. Size 2: All of

A′′′[0, 0, z] = A′′[0, 0, z] ⊕ RC[k, z]

saved as a pair of u32's, $\mathbf{A}'''[0, 0, j]$.
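The total of 2406 follows from adding up the sizes of the seven items in the list. A short tally, with dictionary labels chosen only for this example:

```python
# Sizes of the seven saved items, in field elements.
saved = {
    "A as u32 pairs": 5 * 5 * 2,
    "C bits": 5 * 64,
    "A' bits": 5 * 5 * 64,
    "C' bits": 5 * 64,
    "A'' as u32 pairs": 5 * 5 * 2,
    "A''[0, 0] bits": 64,
    "A'''[0, 0] as a u32 pair": 2,
}
total = sum(saved.values())
print(total)  # 2406
```

The two 5 × 5 × 64 and 5 × 64 bit arrays dominate the count, which is what motivates the packing ideas discussed later in the document.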

We will go through in detail the verification checks which need to be completed. These can be separated into 2 groups. We first verify that A′ was correctly computed from A and then verify that A′′ and A′′′ were correctly computed from A′.

2.1 Verifying the computation of A′
As A is saved as u32’s we can only use computation polynomials on it, namely
all polynomials involving it need to be of the form

A[x, y, z] = p(A′ , C, C ′ ).

In particular this means that we can’t verify C in the obvious way so we will
use a slightly roundabout method. First we check the relation between C and
C ′ using xor3 described in Equation (1). Explicitly this means that we have

xor3(C[x, z], C[x − 1, z], C[x + 1, z − 1]) = C ′ [x, z]. (3)

Next, observe that we can re-write the definition of A′ to get

A[x, y, z] = A′ [x, y, z] ⊕ C[x, z] ⊕ C ′ [x, z].

This can be verified using xor3 and our u32 trick. Explicitly
$$\sum_{i=0}^{31} 2^i \, \mathrm{xor3}(A'[x, y, i + 32j],\; C[x, i + 32j],\; C'[x, i + 32j]) = \mathbf{A}[x, y, j] \tag{4}$$

Next we make use of the following lemma.

Lemma 2.1. Assuming that

A[x, y, z] = A′[x, y, z] ⊕ C[x, z] ⊕ C′[x, z]

the following statements are equivalent:

1. $C[x, z] = \bigoplus_{i=0}^{4} A[x, i, z]$
2. $C'[x, z] = \bigoplus_{i=0}^{4} A'[x, i, z]$

Proof. See Appendix B.
This means that instead of verifying the computation for C, we can instead verify it for C′, which we are able to do as we have the bits of A′. Hence, letting
$$D[x, z] = \Big(\sum_{i=0}^{4} A'[x, i, z]\Big) - C'[x, z]$$
(this D is a new quantity, unrelated to the D of the θ step in Algorithm 1) we verify
$$D[x, z]\,(D[x, z] - 2)(D[x, z] - 4) = 0. \tag{5}$$
Note that this equation is only degree 3 in our saved variables as D is linear in A′ and C′.
Putting this all together, if all constraints are satisfied then Lemma 2.1 and
Equations (4) and (5) show that C was computed correctly. From this, (3)
shows that C ′ is also correct and so using (4) again we conclude that A′ was
correctly computed from A.
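To make the three checks of this section concrete, the sketch below builds an honest witness (C, A′, C′) from random input bits and then evaluates constraints (3), (4) and (5) over an example prime field. The prime `P`, the nested-list layout `[x][y][z]` and all function names are assumptions for this example, and the sketch does not separately enforce that saved bits lie in {0, 1}.

```python
import random

P = 2**64 - 2**32 + 1  # example prime with P > 2**32 (an assumption)

def xor3(x, y, z):
    return x + y + z - 2 * (x * y + x * z + y * z) + 4 * x * y * z

def pack(bits, j):
    """The u32 holding bits 32j .. 32j + 31 of a 64-bit lane."""
    return sum(bits[i + 32 * j] << i for i in range(32))

def theta_witness(A):
    """Honest C, A', C' (items 2 to 4) from bits A[x][y][z]."""
    C = [[A[x][0][z] ^ A[x][1][z] ^ A[x][2][z] ^ A[x][3][z] ^ A[x][4][z]
          for z in range(64)] for x in range(5)]
    T = [[C[(x - 1) % 5][z] ^ C[(x + 1) % 5][(z - 1) % 64]
          for z in range(64)] for x in range(5)]
    A1 = [[[A[x][y][z] ^ T[x][z] for z in range(64)] for y in range(5)]
          for x in range(5)]
    C1 = [[C[x][z] ^ T[x][z] for z in range(64)] for x in range(5)]
    return C, A1, C1

def check_theta(A_u32, C, A1, C1):
    """Evaluate constraints (3), (4), (5); True when all vanish mod P."""
    for x in range(5):
        for z in range(64):
            lhs = xor3(C[x][z], C[(x - 1) % 5][z], C[(x + 1) % 5][(z - 1) % 64])
            if (lhs - C1[x][z]) % P:                      # Equation (3)
                return False
            d = sum(A1[x][i][z] for i in range(5)) - C1[x][z]
            if (d * (d - 2) * (d - 4)) % P:               # Equation (5)
                return False
        for y in range(5):
            for j in range(2):
                lhs = sum((1 << i) * xor3(A1[x][y][i + 32 * j],
                                          C[x][i + 32 * j],
                                          C1[x][i + 32 * j])
                          for i in range(32))
                if (lhs - A_u32[x][y][j]) % P:            # Equation (4)
                    return False
    return True

rng = random.Random(1)
A = [[[rng.randint(0, 1) for _ in range(64)] for _ in range(5)] for _ in range(5)]
A_u32 = [[[pack(A[x][y], j) for j in range(2)] for y in range(5)] for x in range(5)]
C, A1, C1 = theta_witness(A)
honest_ok = check_theta(A_u32, C, A1, C1)
```

Flipping any single saved bit of C, A′ or C′ in this witness makes at least one of the three constraint families fail, which is the behaviour the argument above relies on.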

2.2 Verifying the computation of A′′ and A′′′
This part of the procedure will be simpler to verify. As
B[x, y, z] = A′[3y + x, x, z − r[x, y]]
is simply a permutation of the bits of A′, nothing extra needs to be saved and there is nothing to verify. Hence we immediately move to verifying A′′ which, like A, is saved as an array of pairs of u32's. We can verify it using xornand as described in Equation (2) along with the u32 trick again, giving us
$$\sum_{i=0}^{31} 2^i \, \mathrm{xornand}(B[x, y, i + 32j],\; B[x + 1, y, i + 32j],\; B[x + 2, y, i + 32j]) = \mathbf{A}''[x, y, j]. \tag{6}$$
Next, we need to check that the saved individual bits A′′[0, 0, z] match up with the u32's $\mathbf{A}''[0, 0, j]$. This corresponds to the conditions
$$\sum_{i=0}^{31} 2^i A''[0, 0, i + 32j] = \mathbf{A}''[0, 0, j]. \tag{7}$$

Finally, to verify $\mathbf{A}'''[0, 0, j]$ we simply need to use xor, the usual degree 2 polynomial describing XOR. This gives
$$\sum_{i=0}^{31} 2^i \, \mathrm{xor}(A''[0, 0, i + 32j],\; RC[k, i + 32j]) = \mathbf{A}'''[0, 0, j]. \tag{8}$$

If Equations (3), (4), (5), (6), (7), (8) all hold for all choices of x, y, z and j, then we can conclude that the output was correct given the input.
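Constraints (6), (7) and (8) can be exercised in the same style. In this sketch B is taken as given random bits rather than derived from A′ (so the constant r is not needed), the round constant is a random 64-bit vector standing in for RC[k, ·], and the prime `P` and all names are assumptions for this example.

```python
import random

P = 2**64 - 2**32 + 1  # example prime with P > 2**32 (an assumption)

def xor(x, y):
    return x + y - 2 * x * y

def xornand(x, y, z):
    return xor(x, (1 - y) * z)   # x XOR ((NOT y) AND z)

def pack(bits, j):
    return sum(bits[i + 32 * j] << i for i in range(32))

def chi_iota_witness(B, rc_bits):
    """Honest saved values for items 5 to 7 from bits B[x][y][z]."""
    A2 = [[[B[x][y][z] ^ ((1 - B[(x + 1) % 5][y][z]) & B[(x + 2) % 5][y][z])
            for z in range(64)] for y in range(5)] for x in range(5)]
    A2_u32 = [[[pack(A2[x][y], j) for j in range(2)] for y in range(5)]
              for x in range(5)]
    A2_00_bits = list(A2[0][0])
    A3_00_bits = [a ^ r for a, r in zip(A2_00_bits, rc_bits)]
    A3_00_u32 = [pack(A3_00_bits, j) for j in range(2)]
    return A2_u32, A2_00_bits, A3_00_u32

def check_chi_iota(B, rc_bits, A2_u32, A2_00_bits, A3_00_u32):
    for x in range(5):
        for y in range(5):
            for j in range(2):
                lhs = sum((1 << i) * xornand(B[x][y][i + 32 * j],
                                             B[(x + 1) % 5][y][i + 32 * j],
                                             B[(x + 2) % 5][y][i + 32 * j])
                          for i in range(32))
                if (lhs - A2_u32[x][y][j]) % P:           # Equation (6)
                    return False
    for j in range(2):
        if (pack(A2_00_bits, j) - A2_u32[0][0][j]) % P:   # Equation (7)
            return False
        lhs = sum((1 << i) * xor(A2_00_bits[i + 32 * j], rc_bits[i + 32 * j])
                  for i in range(32))
        if (lhs - A3_00_u32[j]) % P:                      # Equation (8)
            return False
    return True

rng = random.Random(2)
B = [[[rng.randint(0, 1) for _ in range(64)] for _ in range(5)] for _ in range(5)]
rc_bits = [rng.randint(0, 1) for _ in range(64)]
A2_u32, A2_00_bits, A3_00_u32 = chi_iota_witness(B, rc_bits)
honest_ok = check_chi_iota(B, rc_bits, A2_u32, A2_00_bits, A3_00_u32)
```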

3 Improving the Algorithm


In this section we briefly discuss some ideas which could be used to reduce the memory cost of the algorithm even further. Note that some of these improvements may be mutually exclusive.

3.1 Minor Improvements


There is a small improvement which can immediately be made to reduce the number of saved field elements by 4. The downside is that it is slightly annoying from a technical standpoint to implement and only gives a tiny performance boost. The improvement is simply to not store the u32's for either A′′[0, 0] or A′′′[0, 0] and, instead of storing the 64 bits of A′′[0, 0], store the 64 bits of A′′′[0, 0]. Then we simply need to check that

A′′′[0, 0, z] = RC[k, z] ⊕ B[0, 0, z] ⊕ ((∼ B[1, 0, z]) & B[2, 0, z]).

This can be easily checked by moving RC[k, z] onto the other side and then verifying

xor(A′′′[0, 0, z], RC[k, z]) − xornand(B[0, 0, z], B[1, 0, z], B[2, 0, z]) = 0.

3.2 More Radical Improvement Ideas
The following are a couple of possible ideas which, with further investigation and work, may be able to reduce the number of field elements which need to be saved. Note that the ideas are a little unformed and may or may not work out.

• Clearly, the main cost in storage comes from the fact that for most variables (A′, C, C′) we need to use 64 field elements, each in {0, 1}. It would be a massive saving if it was possible to combine these somewhat, even if we cannot pack 32 together as we do for A and A′′.
The issue with combining is that it breaks our polynomial expressions, but could there be a way around this by 'spacing' variables appropriately or finding different polynomial expressions which can verify 'un' bit operations for some small n (e.g. u2 or u4)?

• Another idea to investigate is restructuring the procedure so as to perform the ι step first. This might allow us to move the direct summand with RC into the definition for C′, which would save 66 columns overall as we would not need to save A′′[0, 0, z] or A′′′[0, 0, j]. The cost of this is that we would have to add an extra final mini round where we direct sum in the last RC term. Note that this last direct summation has no cryptographic value so we could also simply remove it and end up with an equally secure cryptographic protocol which is essentially KECCAK but not exactly it.

A Zero Knowledge Verification Proofs


Say we have 3 columns x, y, z of length 32 with each entry a 0, 1 element of $F_p$ and we claim that for each 0 ≤ i ≤ 31, $z_i = x_i \oplus y_i$. As in Section 1, this is equivalent to 32 simple polynomial conditions each of the form:

$$x_i + y_i - 2 x_i y_i - z_i = 0.$$

We want to turn these polynomial constraints into a zero-knowledge proof that the computation was performed correctly. There are a variety of ways to do this but one possible method is as follows. The prover finds an order 128 subgroup $\langle g \rangle \subset F_p^\times$ and fixes a domain $D \subset F_p^\times$ disjoint from $\langle g \rangle$ with |D| large (say $2^{20}$) and closed under multiplication by g (so gD = D). Next the prover computes the degree 96 polynomial f(X) satisfying

$$f(g^j) = \begin{cases} x_{j/4} & j \equiv 0 \bmod 4 \\ y_{(j-1)/4} & j \equiv 1 \bmod 4 \\ z_{(j-2)/4} & j \equiv 2 \bmod 4 \end{cases}$$
which they commit to over D via a Merkle tree. From this polynomial, the prover computes the polynomial Q(X) defined by
$$Q(X) = \frac{f(X) + f(gX) - 2 f(X) f(gX) - f(g^2 X)}{\prod_{i=0}^{31} (X - g^{4i})} = \frac{f_x(X) + f_y(X) - 2 f_x(X) f_y(X) - f_z(X)}{X^{32} - 1},$$
where $f_x(X) = f(X)$, $f_y(X) = f(gX)$ and $f_z(X) = f(g^2 X)$, and similarly commits to it via another Merkle tree. The prover is telling the truth if and only if this polynomial Q has degree less than or equal to 160. This can be checked by the verifier sampling more than 161 points from D (any 161 points already lie on some polynomial of degree at most 160), where for each d they receive Q(d), f(d), f(gd), f(g²d). Then they can test that for each d, $(d^{32} - 1) Q(d) = f(d) + f(gd) - 2 f(d) f(gd) - f(g^2 d)$ and that the sampled points Q(d) indeed lie on a polynomial of degree 160 or less.³
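The prover's side of this construction can be sketched in a few dozen lines. The example below assumes the prime `P` shown (chosen because 128 divides P − 1), finds g by search, interpolates f through the 96 prescribed values with plain Lagrange interpolation (so f has degree at most 95 here), and divides the numerator by X^32 − 1. Commitments, the domain D, sampling and any blinding needed for zero knowledge are left out, and all names are illustrative.

```python
import random

P = 2**64 - 2**32 + 1  # example prime with 128 | P - 1 (an assumption)

def subgroup_generator(order):
    """Search for an element of exact order `order` (a power of two dividing P - 1)."""
    a = 2
    while True:
        g = pow(a, (P - 1) // order, P)
        if pow(g, order // 2, P) != 1:
            return g
        a += 1

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def poly_eval(f, x):
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % P
    return acc

def interpolate(xs, ys):
    """Coefficients (low to high) of the polynomial of degree < len(xs) through the points."""
    n = len(xs)
    master = [1]
    for x in xs:
        master = poly_mul(master, [-x % P, 1])
    coeffs = [0] * n
    for xk, yk in zip(xs, ys):
        basis, carry = [0] * n, 0
        for i in range(n, 0, -1):            # synthetic division by (X - xk)
            carry = (master[i] + carry * xk) % P
            basis[i - 1] = carry
        scale = yk * pow(poly_eval(basis, xk), -1, P) % P
        for i in range(n):
            coeffs[i] = (coeffs[i] + scale * basis[i]) % P
    return coeffs

def shift_argument(f, s):
    """Coefficients of f(sX)."""
    out, power = [], 1
    for c in f:
        out.append(c * power % P)
        power = power * s % P
    return out

def quotient(x_bits, y_bits, z_bits):
    """Return (Q, remainder) of the constraint numerator divided by X^32 - 1."""
    g = subgroup_generator(128)
    xs, ys = [], []
    for i in range(32):
        for offset, column in enumerate((x_bits, y_bits, z_bits)):
            xs.append(pow(g, 4 * i + offset, P))
            ys.append(column[i])
    f = interpolate(xs, ys)
    f1, f2 = shift_argument(f, g), shift_argument(f, g * g % P)
    num = [(-2 * c) % P for c in poly_mul(f, f1)]
    for k in range(len(f)):
        num[k] = (num[k] + f[k] + f1[k] - f2[k]) % P
    quot = [0] * (len(num) - 32)
    for k in range(len(num) - 1, 31, -1):    # long division by X^32 - 1
        quot[k - 32] = num[k]
        num[k - 32] = (num[k - 32] + num[k]) % P
    return quot, num[:32]

rng = random.Random(3)
x_bits = [rng.randint(0, 1) for _ in range(32)]
y_bits = [rng.randint(0, 1) for _ in range(32)]
z_bits = [x ^ y for x, y in zip(x_bits, y_bits)]
Q, remainder = quotient(x_bits, y_bits, z_bits)
```

When every $z_i$ is correct the numerator vanishes on all 32 roots of X^32 − 1, so the remainder is zero and Q is a genuine polynomial; a single wrong bit leaves a nonzero remainder, which is what the degree test on Q is designed to catch.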

B Proof of XOR Equivalence


We restate the lemma here for convenience.
Lemma B.1. Assuming that

A[x, y, z] = A′[x, y, z] ⊕ C[x, z] ⊕ C′[x, z]

the following statements are equivalent:

1. $C[x, z] = \bigoplus_{i=0}^{4} A[x, i, z]$
2. $C'[x, z] = \bigoplus_{i=0}^{4} A'[x, i, z]$
Proof. First assume that $C'[x, z] = \bigoplus_{i=0}^{4} A'[x, i, z]$. Then, using the basic properties of ⊕, namely that x ⊕ 0 = x, x ⊕ y = y ⊕ x and
$$\bigoplus_{i=0}^{n} x = \begin{cases} x & \text{if } n \text{ is even} \\ 0 & \text{if } n \text{ is odd} \end{cases}$$
we compute
$$\begin{aligned} \bigoplus_{i=0}^{4} A[x, i, z] &= \bigoplus_{i=0}^{4} \big(A'[x, i, z] \oplus C[x, z] \oplus C'[x, z]\big) \\ &= C[x, z] \oplus C'[x, z] \oplus \bigoplus_{i=0}^{4} A'[x, i, z] \\ &= C[x, z], \end{aligned}$$
proving one direction. The other direction follows immediately by symmetry.
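Since the lemma concerns a single column (fixed x and z), it involves only 7 bits: the five bits A′[x, i, z] together with C[x, z] and C′[x, z]. That makes it small enough to confirm by brute force, as in this sketch (the function name is illustrative):

```python
from functools import reduce
from itertools import product
from operator import xor

def lemma_holds():
    """Check statement 1 <=> statement 2 for every choice of A' bits, C and C'."""
    for *a1, c, c1 in product((0, 1), repeat=7):
        # The assumption of the lemma defines A from A', C and C'.
        a = [bit ^ c ^ c1 for bit in a1]
        statement_1 = c == reduce(xor, a)
        statement_2 = c1 == reduce(xor, a1)
        if statement_1 != statement_2:
            return False
    return True
```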

³ You would usually also use the FRI protocol here to further reduce the degree of Q and/or combine different Q's, which becomes vastly more efficient as the polynomials grow larger.

References
[1] Bertoni, G., Daemen, J., Peeters, M., Van Assche, G., and Van Keer, R. Keccak implementation overview. https://keccak.team/files/Keccak-implementation-3.2.pdf
