Authenticated encryption GCM - CCM

Lorenzo Peraldo, Vittorio Picco December 20, 2007

Contents

1 Introduction
  1.1 Authenticated Encryption
  1.2 Generic composition
  1.3 Single-Pass combined modes
  1.4 Two-pass combined modes
2 CCM, Counter with Cipher Block Chaining-Message Authentication Code
  2.1 Introduction
  2.2 Description
  2.3 Formatting function
  2.4 Length of the MAC
  2.5 Efficiency and performances of CCM
  2.6 Criticism of CCM
  2.7 A possible attack
3 GCM, Galois/Counter Mode
  3.1 Introduction
  3.2 Description
  3.3 IV and Keys
    3.3.1 Keys
    3.3.2 IV
  3.4 Implementations
  3.5 Security
4 Conclusions
Bibliography


Chapter 1

Introduction
1.1 Authenticated Encryption

Authenticated Encryption (AE) is a term used to describe encryption systems which simultaneously protect the confidentiality and the authenticity (integrity) of communications. These goals have long been studied, but only recently have they attracted a high level of interest from cryptographers, because of the complexity of implementing separate systems for privacy and authentication in a single application. For decades the solution to this problem has been to combine privacy and authentication in a straightforward manner using the so-called "generic composition", but recently a number of new constructions have appeared which achieve these two goals simultaneously, often much faster than generic composition solutions.

Our analysis concerns authenticated encryption in the symmetric-key model. A single key K is chosen at random and shared between the sender and the receiver. Once the two parties have the key, they need an AE algorithm such that the sender can process a selected message M with the AE algorithm along with the key K (and possibly a nonce N), and then send the resulting output to the receiver. The output of this processing is a ciphertext C, the nonce N and a short message authentication tag T. The receiver should then be able to recover M using C, N and his copy of the key K, and to verify the authenticity of the received message using the same parameters along with the tag T.

A good AE algorithm must meet many requirements: performance, portability, simplicity, parallelizability, freedom from patents and, of course, security. This last requirement is arguably the most important one, since an AE scheme has two goals, privacy and authenticity, and it will not serve our needs if it is not secure. Privacy means that a passive attacker who sees the ciphertext C and the nonce N cannot read the content of the message M. One way to achieve this is to make C indistinguishable from a random bit string. Authenticity, instead, means that an active attacker cannot easily generate a ciphertext C, a nonce N and a tag T such that the receiver will believe they were generated by the authorized sender.

In many applications we do not only encrypt and authenticate our message, but we also need to include some additional information A which must be authenticated too. For example, in a network packet we should encrypt the payload and authenticate both the header and the payload. For this reason associated data needs to be included as input to the AE scheme. Schemes that allow associated data are called AEAD (Authenticated Encryption with Associated Data).

One unfortunate aspect of most cryptographic schemes is that we cannot prove that a scheme meets the formal goals required of it. We can prove only some security-related properties, depending on the type of cryptographic object we analyze. If it is a primitive such as a block cipher, no proof of security is possible, so we can only hope for security after showing that none of the known attacks (for example differential cryptanalysis) works. For algorithms that are built on top of these primitives we can prove at best that they are as secure as the underlying primitives.


1.2 Generic composition

The traditional way to obtain both authenticity and confidentiality was, until recently, to find two well designed protocols, one for encryption and one for authentication, and then use them in sequence. This is really straightforward and, at least at first glance, completely safe: if both algorithms are safe, their combination in sequence should be safe as well. In general the approach was to choose a strong mode of operation for block ciphers, like CBC, and then to use it with authentication protocols that do not use keyed-hash functions. This kind of approach has been proved wrong, and the best example to show it is the WEP protocol. The Wired Equivalent Privacy protocol has been a common choice to protect WiFi networks. It provided authentication with a simple CRC checksum and then used a stream cipher to encrypt the data. The mechanism is very simple, and it is also very simple to circumvent. Another common mistake is to use the same key for both the authentication and the encryption operations. This is a weaker requirement, though, and a smart implementation can greatly reduce the related security risks.

There are three available choices when using generic composition:
• MAC then Encrypt (MtE): we compute a MAC of the plaintext, append it to the plaintext and then encrypt the whole;
• Encrypt then MAC (EtM): we first encrypt the plaintext and then authenticate the resulting ciphertext;
• Encrypt and MAC (E&M): we encrypt the plaintext and, separately, compute a MAC of the plaintext itself.

Some studies have been done on which of these three strategies achieves authenticated encryption in the safest way, and the result was that in general Encrypt then MAC is the best choice. This composition is secure when the MAC has a property called "strongly unforgeable", while with the other two methods there could be confidentiality problems. The conclusion of these studies was that EtM can be considered safe if it is provided with a secure encryption algorithm and a secure MAC, each using independent keys. MtE and E&M can be considered secure only if attention is paid to the choice of the combination of encryption algorithm and MAC computation.

In addition to their extreme simplicity, the generic composition methods have the interesting characteristic that, since the two operations are completely independent, it is possible to encrypt only a subset of the total transmitted data. In this way we can add to a message some additional data that will not be encrypted and is therefore covered by authentication only. On the other hand, the obvious drawback of generic composition is the long time needed to process the message twice, with two different keys.
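The Encrypt then MAC composition described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `keystream_xor`, `encrypt_then_mac` and `verify_then_decrypt` are hypothetical, HMAC-SHA256 is chosen here as the MAC, and the "cipher" is a toy keystream built from HMAC-SHA256 that stands in for a real encryption mode so that the example needs only the standard library. It shows the two independent keys, the unencrypted but authenticated header, and the rule that the tag is checked before anything is decrypted.

```python
import hashlib
import hmac
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT a real cipher): XOR data with
    HMAC-SHA256(key, nonce || counter) blocks. Stand-in for CBC/CTR."""
    out = bytearray()
    for i in range(0, len(data), 32):
        counter = (i // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(x ^ y for x, y in zip(data[i:i + 32], pad))
    return bytes(out)


def _mac(mac_key: bytes, header: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    # The header length is prefixed so that (header, nonce, ciphertext)
    # cannot be re-split into a different valid triple.
    msg = len(header).to_bytes(8, "big") + header + nonce + ciphertext
    return hmac.new(mac_key, msg, hashlib.sha256).digest()


def encrypt_then_mac(enc_key: bytes, mac_key: bytes,
                     plaintext: bytes, header: bytes = b""):
    """EtM: encrypt first, then authenticate header, nonce and ciphertext."""
    nonce = os.urandom(16)
    ciphertext = keystream_xor(enc_key, nonce, plaintext)
    return nonce, ciphertext, _mac(mac_key, header, nonce, ciphertext)


def verify_then_decrypt(enc_key: bytes, mac_key: bytes, nonce: bytes,
                        ciphertext: bytes, tag: bytes, header: bytes = b""):
    """Return the plaintext, or None if the tag does not verify."""
    expected = _mac(mac_key, header, nonce, ciphertext)
    if not hmac.compare_digest(tag, expected):
        return None  # reject before touching the ciphertext
    return keystream_xor(enc_key, nonce, ciphertext)
```

Note that the message is processed twice (once by the keystream, once by the MAC) with two different keys, which is exactly the cost that the combined modes try to reduce.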

1.3 Single-Pass combined modes

Until 2000 there was no way to obtain authenticated encryption with only a single pass over the message: generic composition was the only way to provide AE. In 2000 IAPM was developed. While generic composition needs 2 · m invocations of the block cipher (where m is the number of blocks into which the plaintext has been split), IAPM needs just m + log(m) invocations. This is because IAPM computes certain values, called the seeds, before the encryption; the seed calculation is used to achieve authentication and needs only log(m) invocations of the encryption algorithm.

After the release of IAPM many researchers started to work on their own single-pass AE solutions, generally modifying the original structure of IAPM. The researchers understood the power of the modes they were developing, so they all patented their discoveries. This was also their biggest mistake. Because of the hype around single-pass AE in that period, it has never been verified whether these patents overlap each other, even though some of them probably do. So far nobody has asked a court to verify the possible overlaps, but the potential users of the proposed methods are quite afraid that if they choose to implement one of these solutions there could be legal action against them.

In conclusion, single-pass combined modes have never been used as a standard in any application, and furthermore other teams of researchers have quit their work in this field because they could have been accused of violating some Intellectual Property. The interest of researchers has now moved in another direction, the two-pass combined modes.

1.4 Two-pass combined modes

The Intellectual Property problems raised by the single-pass combined modes clearly showed that it was important to find good solutions to the authenticated encryption problem, and that these solutions should be patent-free. The two-pass combined modes represent a class of algorithms whose performance is not far from that of the single-pass ones, but all with no intellectual property restrictions. The first to be developed was CCM (which will be explained in detail later), then EAX tried to solve some of the CCM problems, then CWC was developed to improve on EAX and finally GCM was created. GCM is probably the best algorithm available now and will also be explained in all its aspects later.

CCM is not much better than plain generic composition, because in fact it uses a standard MAC generation algorithm (CBC-MAC) followed by standard CTR encryption, but it offers some advantages. The biggest is the use of a single key both to encrypt and to generate the MAC. We said that this could be a security problem, but the CCM designers took great care over this point and CCM has now been proved secure. CCM has also become the mandatory mode for 802.11 wireless networks.

EAX solved some of the CCM problems, in particular the need to know the message length in advance and its complicated definition, which used some unnatural parametrization. EAX uses a modified CBC-MAC called OMAC, followed by the CTR mode of operation.

CWC was developed because neither CCM nor EAX can be parallelized in hardware: CBC-MAC (like its variants) is inherently serial and so cannot achieve high throughput. CWC can operate at up to 10 Gbit per second. The features of CWC, though, can be exploited in a parallel computing environment only; on a normal computer its performance is no better than that of the previous techniques.

GCM is the most recently developed algorithm and is the one with the highest performance. It was developed by one of the researchers behind CWC and was therefore entirely devoted to improving the performance of CWC and to filling its gaps. GCM was created starting from a modification of the mathematical construct that lies at the basis of CWC. While CWC operates on integers modulo $2^{127} - 1$, GCM makes use of Galois field mathematics: this apparently abstract choice allows implementers to build a much simpler hardware circuit for the hashing function. This allows GCM to reach a throughput of more than 10 Gbit per second. The only possible competitor for GCM is OCB (which is a single-pass mode), but many reasons (other than intellectual property) make GCM the better choice. For example, OCB needs two different schemes, one for encryption and one for decryption, while GCM needs only one.


Chapter 2

CCM, Counter with Cipher Block Chaining-Message Authentication Code
2.1 Introduction

CCM is a mode of operation for a symmetric key block cipher algorithm. It combines the techniques of the Counter (CTR) mode and the Cipher Block Chaining-Message Authentication Code (CBC-MAC) algorithm to provide confidentiality and authenticity of the data. The Counter with CBC-MAC mode is designed to use the Advanced Encryption Standard (AES) block cipher, or any other block cipher with a block size of 128 bits, to provide authentication and encryption using a single key. Since this is a symmetric key algorithm there is only one secret key, which must be established beforehand and be known only by the two parties involved in the transmission of the data. For this purpose CCM requires a well-designed key management structure.

CCM is intended for use in a packet environment and thus it cannot be used with stream data. The plaintext input includes a header, which is authenticated but not encrypted, and a payload, which is both authenticated and encrypted. Each packet must be an integral number of bytes and must be assigned a unique value, called the nonce. The maximum number of packets that can be authenticated with the same key is determined by the size of the nonce, which is one of the parameters that must be decided when designing the algorithm.

CCM processing expands the packet size by appending an encrypted authentication tag. Successful verification of this authentication tag provides assurance that the packet originated from a source with access to the block cipher key, and also that the packet was not altered after the generation of the authentication tag. Failed verification of the tag is designed to reveal both accidental modifications and intentional, unauthorized modifications of the packet.

CCM allows pre-computation of the key stream if the nonce value is known, allowing half of the computational load to be pre-processed. This property can be used to improve the efficiency of an implementation. The size of the implementation can be minimized as well, as CCM uses only the forward encryption function of the block cipher and not the inverse function.

CCM mode was designed by Russ Housley, Doug Whiting and Niels Ferguson. At the time CCM mode was developed, Russ Housley was employed by RSA Laboratories. A minor variation of CCM, called CCM*, is used in the ZigBee standard. CCM* includes all of the features of CCM and additionally offers encryption-only and integrity-only capabilities.

2.2 Description

CCM consists of two main processes: generation-encryption and decryption-verification. Both processes combine counter mode encryption, which provides confidentiality, with the cipher block chaining-message authentication code, which computes a MAC to provide authentication.

Before CCM is implemented it is important to have valid key establishment and key management in place, to ensure the effectiveness of the block cipher algorithm used for encryption. The secret key for this algorithm must be generated randomly and be shared only by the parties to the information, or the whole cipher algorithm would be useless. Moreover, the same key can be used only for a maximum number of invocations of the block cipher algorithm, and this limit should be set to $2^{61}$.

As we said, CCM combines two cryptographic mechanisms based on this block cipher algorithm. The first one is the Counter mode, used for confidentiality, which requires the generation of a sufficiently long sequence of blocks (counter blocks) that will then be used to encrypt the message. These blocks do not need to be secret, but they must be distinct within a single invocation and across all other invocations under the same secret key. The other mechanism used in CCM is CBC-MAC. This method is basically an adaptation of the cipher block chaining mode used for authentication. Starting from a zero initialization vector, CBC is applied to the data to be authenticated, and the last block generated, truncated to an established length, is used as an authentication tag called the MAC. Note that the same key is used both for CTR and for CBC-MAC.

For the generic CCM mode there are two parameter choices. The first one is the size of the authentication tag M, which involves a trade-off between message expansion and the probability that an attacker can undetectably modify a message. Valid values for M are 4, 6, 8, 10, 12, 14 and 16 bytes. The second choice is the parameter L, the size of the length field, which requires a trade-off between the maximum message size and the size of the nonce. Therefore the maximum length of the messages we want to encrypt and authenticate must be decided beforehand.

Generation-encryption
To authenticate and encrypt the message we need the following input information:
• an encryption key K for the block cipher;
• a nonce N;
• the message, also called payload P, of length determined by the choice of the parameter L;
• additional authenticated associated data A, used to authenticate plaintext packet headers, or other information about the message.

CCM produces as output the ciphertext C. The first step is to generate the authentication tag T. This is done using CBC-MAC. First a formatting function is applied to three of the inputs, the payload, the associated data and the nonce, to produce the blocks $B_0, B_1, ..., B_r$. These blocks provide the input for the CBC-MAC function, which generates the MAC we need using the key K for the cipher block chaining. Then we need to perform encryption. Once we have the authentication tag T, the Counter mode is applied to generate the counter blocks $Ctr_0, Ctr_1, ..., Ctr_m$. The message is encrypted by XORing the octets of the payload with the encryptions of the blocks $Ctr_1, ..., Ctr_m$. The encryption of $Ctr_0$ is instead XORed with the authentication tag T to generate an encrypted authentication value. The final output of the generation-encryption process is the ciphertext C, which consists of the encrypted payload followed by the encrypted authentication value.

These are the detailed steps needed for generation-encryption:
1. Apply the formatting function to (N, A, P) to produce the blocks $B_0, B_1, ..., B_r$;
2. Set $Y_0 = \mathrm{CIPH}_K(B_0)$;
3. For i = 1 to r, do $Y_i = \mathrm{CIPH}_K(B_i \oplus Y_{i-1})$;
4. Set $T = \mathrm{MSB}_{Tlen}(Y_r)$;
5. Apply the counter generation function to generate the counter blocks $Ctr_0, Ctr_1, ..., Ctr_m$;
6. For j = 0 to m, do $S_j = \mathrm{CIPH}_K(Ctr_j)$;
7. Set $S = S_1 \| S_2 \| ... \| S_m$;
8. Return $C = (P \oplus \mathrm{MSB}_{Plen}(S)) \| (T \oplus \mathrm{MSB}_{Tlen}(S_0))$.
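The eight steps above can be sketched as follows. This is a minimal illustration, not a usable CCM implementation: the function names are hypothetical, the formatted blocks and counter blocks are taken as ready-made inputs, lengths are expressed in bytes rather than bits, and `ciph` is a toy pseudo-random function built from HMAC-SHA256 that merely stands in for $\mathrm{CIPH}_K$ (AES is not available in the Python standard library). The stand-in is workable for a sketch only because CCM never uses the inverse of the block cipher.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes (128 bits)


def ciph(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for CIPH_K (NOT AES): a keyed 16-byte PRF."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def generation_encryption(key, b_blocks, ctr_blocks, payload, tlen):
    """b_blocks: formatted B_0..B_r; ctr_blocks: Ctr_0..Ctr_m;
    tlen: tag length in bytes. Returns the ciphertext C."""
    # Steps 2-4: CBC-MAC with zero IV over B_0..B_r, truncated to tlen.
    y = ciph(key, b_blocks[0])
    for b in b_blocks[1:]:
        y = ciph(key, xor(b, y))
    tag = y[:tlen]
    # Steps 5-7: S_j = CIPH_K(Ctr_j) and S = S_1 || ... || S_m.
    s = [ciph(key, ctr) for ctr in ctr_blocks]
    stream = b"".join(s[1:])
    if len(stream) < len(payload):
        raise ValueError("not enough counter blocks for this payload")
    # Step 8: C = (P xor MSB_Plen(S)) || (T xor MSB_Tlen(S_0)).
    return xor(payload, stream[:len(payload)]) + xor(tag, s[0][:tlen])
```

The key stream `s` depends only on the key and the counter blocks, which is why it can be pre-computed as soon as the nonce is known.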

Decryption-verification
For the process of decryption and the verification of authenticity and integrity we need the following information:
• the received ciphertext C;
• the associated data A;
• the nonce N;
• the cipher key K.

As output this process produces our message in plaintext, or INVALID if the verification fails. First of all, counter mode decryption is applied to the received ciphertext with the key K to produce the payload and the associated authentication tag (MAC). Then the nonce, the associated data and the computed payload are formatted according to the formatting function in order to produce blocks for the CBC-MAC mechanism. This is applied to recompute the MAC, which is compared with the received one in order to verify it. If the two do not match, the decryption-verification function returns the error message INVALID; otherwise it outputs the payload. To provide higher security, when INVALID is returned the payload P and the MAC should not be revealed, and the implementation should ensure that a third party cannot distinguish which step produced the error message.

These are the detailed steps needed for decryption-verification:
1. If $Clen \le Tlen$, then return INVALID;
2. Apply the counter generation function to generate the counter blocks $Ctr_0, Ctr_1, ..., Ctr_m$;
3. For j = 0 to m, do $S_j = \mathrm{CIPH}_K(Ctr_j)$;
4. Set $S = S_1 \| S_2 \| ... \| S_m$;
5. Set $P = \mathrm{MSB}_{Clen-Tlen}(C) \oplus \mathrm{MSB}_{Clen-Tlen}(S)$;
6. Set $T = \mathrm{LSB}_{Tlen}(C) \oplus \mathrm{MSB}_{Tlen}(S_0)$;
7. If N, A, or P is not valid, then return INVALID, else apply the formatting function to (N, A, P) to produce the blocks $B_0, B_1, ..., B_r$;
8. Set $Y_0 = \mathrm{CIPH}_K(B_0)$;
9. For i = 1 to r, do $Y_i = \mathrm{CIPH}_K(B_i \oplus Y_{i-1})$;
10. If $T \ne \mathrm{MSB}_{Tlen}(Y_r)$, then return INVALID, else return P.
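The decryption-verification steps can be sketched in the same spirit. Again this is only an illustration under assumptions: `ciph` is a toy HMAC-SHA256 stand-in for $\mathrm{CIPH}_K$ (not AES), `format_blocks` is a hypothetical callable that plays the role of the formatting function for a fixed nonce and associated data, lengths are in bytes, and `None` stands for INVALID. The sketch returns the same value for every failure and compares tags with `hmac.compare_digest`, so that neither the failing step nor the position of the first mismatching byte is revealed.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes (128 bits)


def ciph(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for CIPH_K (NOT AES): a keyed 16-byte PRF."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def decryption_verification(key, c, ctr_blocks, format_blocks, tlen):
    """Return the payload P, or None (standing for INVALID)."""
    # Step 1: the ciphertext must be longer than the tag.
    if len(c) <= tlen:
        return None
    plen = len(c) - tlen
    # Steps 2-4: key stream from the counter blocks Ctr_0..Ctr_m.
    s = [ciph(key, ctr) for ctr in ctr_blocks]
    stream = b"".join(s[1:])
    if len(stream) < plen:
        return None
    # Steps 5-6: recover the payload and the received tag.
    payload = xor(c[:plen], stream[:plen])
    tag = xor(c[plen:], s[0][:tlen])
    # Steps 7-9: re-format (N, A, P) and recompute the CBC-MAC.
    b_blocks = format_blocks(payload)
    y = ciph(key, b_blocks[0])
    for b in b_blocks[1:]:
        y = ciph(key, xor(b, y))
    # Step 10: constant-time comparison; the payload is released
    # only when the tags match.
    if not hmac.compare_digest(tag, y[:tlen]):
        return None
    return payload
```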

2.3 Formatting function

The blocks $B_0, B_1, ..., B_n$ used by the CBC-MAC mechanism are generated by a formatting function that acts on the nonce, the payload and the associated data. The value of n depends on this formatting function. The formatting function must satisfy the following properties for any key used:
• the first block $B_0$ uniquely determines the nonce N;
• the formatted data uniquely determines the payload P and the associated data A;
• the first block $B_0$ is distinct from any counter block used across all the invocations of CCM under a given key; this means that the formatting function and the counter generation function should not be constructed independently.

The formatting function also defines which values (bit lengths) of the payload, associated data, nonce and authentication tag are valid. In fact the formatting function imposes some restrictions on these parameters that must be respected. The bit lengths of N, A and P must be multiples of 8 bits, and the same holds for the authentication tag. The first block of the formatted data contains the binary representation of the length of the payload. The length in bytes of this field is called q and is a parameter of the formatting function that we have to define. Therefore q determines the maximum length p of the payload, in bytes, so that $p < 2^{8q}$. The value of q also determines the length n of the nonce, because the sum q + n must be constant (15 bytes, since one byte of the 16-byte block is used by the flags). Thus we have a trade-off between the maximum number of invocations of CCM under a given key and the maximum length of the payload for these invocations.

Formatted input data
The formatted data, in the form of blocks $B_0, B_1, ..., B_n$, must be well defined. The first block $B_0$ contains a byte dedicated to the flags, the nonce and the binary representation of the message length, as we said before. The flags byte has four fields: the first (most significant) bit is Reserved and the second is for Adata; then follow two strings of 3 bits each, which contain the encoded values of t and q.

Byte number    0        1 ... 15-q    16-q ... 15
Contents       Flags    N             Q

Table 2.1: Formatting of $B_0$

If the Adata field is 0 then there is no associated data; otherwise the associated data is formatted in this way: the associated data length a is encoded and the encoding is concatenated with the associated data A, followed by the minimum number of 0 bits needed so that the resulting string can be partitioned into 16-byte blocks $B_1, ..., B_m$, where m depends on the associated data length a. Depending on the value of a, it is encoded into 2, 6 or 10 bytes. The last n − m blocks $B_{m+1}, B_{m+2}, ..., B_n$ represent the payload, followed by the minimum number of 0 bits such that this string can be partitioned into 16-byte blocks.

Not only the input data must be formatted: the counter blocks used in the CTR mode also need to be formatted, in the following way:

Byte number    0        1 ... 15-q    16-q ... 15
Contents       Flags    N             $[i]_{8q}$

Table 2.2: Formatting of $Ctr_i$

Each block $Ctr_i$ contains the nonce N, the encoding of the index i on 8q bits and a field with flags. The first 2 bits of these flags are reserved for future use; these are followed by 3 bits set to 0, to ensure that all the counter blocks are different from $B_0$, and the last 3 bits contain the encoding of q as in $B_0$.
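The layouts of Tables 2.1 and 2.2 can be made concrete with a short sketch. The function names are hypothetical. The text above does not spell out how t and q are encoded in the flags, nor the thresholds for the 2, 6 or 10 byte length encoding; the values used here (t encoded as (t − 2)/2, q encoded as q − 1, thresholds $2^{16} - 2^8$ and $2^{32}$) are taken from NIST SP 800-38C and should be read as an assumption with respect to this document. Here t is the tag length in bytes and q is derived from the nonce length through q + n = 15.

```python
VALID_TAG_BYTES = (4, 6, 8, 10, 12, 14, 16)


def _q_for(nonce: bytes) -> int:
    q = 15 - len(nonce)  # q + n = 15
    if not 2 <= q <= 8:
        raise ValueError("nonce must be 7 to 13 bytes long")
    return q


def format_b0(nonce: bytes, t: int, plen: int, has_adata: bool) -> bytes:
    """Build B_0 = Flags || N || Q (Table 2.1)."""
    q = _q_for(nonce)
    if t not in VALID_TAG_BYTES:
        raise ValueError("invalid tag length")
    if plen >= 1 << (8 * q):
        raise ValueError("payload too long for this nonce length")
    # Flags: Reserved (0) | Adata | 3 bits for t | 3 bits for q.
    flags = (0x40 if has_adata else 0) | (((t - 2) // 2) << 3) | (q - 1)
    return bytes([flags]) + nonce + plen.to_bytes(q, "big")


def format_ctr(nonce: bytes, i: int) -> bytes:
    """Build Ctr_i = Flags || N || [i]_8q (Table 2.2)."""
    q = _q_for(nonce)
    # Flags: 2 reserved bits, 3 zero bits, then q - 1. The zero bits make
    # every counter block differ from B_0, whose t field is never zero.
    return bytes([q - 1]) + nonce + i.to_bytes(q, "big")


def encode_adata_length(a: int) -> bytes:
    """Encode the associated data length on 2, 6 or 10 bytes."""
    if a < 2**16 - 2**8:
        return a.to_bytes(2, "big")
    if a < 2**32:
        return b"\xff\xfe" + a.to_bytes(4, "big")
    return b"\xff\xff" + a.to_bytes(8, "big")
```

For example, with a 7-byte nonce (so q = 8), a 4-byte tag and associated data present, the flags byte is 0x40 + 0x08 + 0x07 = 0x4f, while every counter block for the same nonce starts with 0x07.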

2.4 Length of the MAC

The length of the MAC is one of the most important security parameters within CCM. During the decryption-verification process we determine whether or not the purported ciphertext is a valid ciphertext, which means that it has been generated by a generation-encryption process with access to the secret key, the nonce and the associated data. The assurance of authentication of CCM is based on the scarcity of valid ciphertexts. This means that an attacker without the key, or with no access to the generation-encryption process, cannot easily generate a valid ciphertext, and therefore if a ciphertext passes the decryption-verification process it is very likely to be a valid and legitimately generated ciphertext.

The first thing we verify in a purported ciphertext is that its length is at least equal to the length of the authentication tag (MAC), which we will call Tlen. The decryption-verification process compares the MAC decrypted from the ciphertext with the MAC computed for the received payload, the nonce and the associated data. If the MACs are equal then the result is positive and the process outputs the payload; otherwise the output is the error message INVALID. In this case at least one of the payload and the associated data is not authentic. If the result is positive and we get the payload, then both the payload and the associated data are authentic, but this assurance cannot be absolute, as an attacker still has a small probability of generating a valid ciphertext. This probability depends on Tlen, and in particular it is less than $2^{-Tlen}$. As an attacker could present many ciphertexts to increase this probability, or intercept a valid ciphertext and replay it, the receiver should have proper controlling protocols in place.

So we can state that the larger the Tlen we choose, the greater the authentication assurance, but we must be aware of the trade-off that the choice of Tlen implies. In fact larger values of Tlen require more bandwidth for the ciphertext, and this may not always be available on some connections. To ensure good security and low risk we should always choose a value of Tlen of at least 64 bits. A smaller value for Tlen could be chosen, for example, for low bandwidth connections where there is no possibility of attempting many trials. We can say that Tlen should satisfy the following inequality:

$$Tlen > \lg\left(\frac{MaxErrs}{Risk}\right)$$

where Risk is the highest acceptable probability for an inauthentic message to pass the decryption-verification process, and MaxErrs is the number of times that the output can be the error message INVALID before the key is retired.

To preserve security, implementations need to limit the total amount of data that is encrypted with a single key; the total number of block cipher encryption operations in the CBC-MAC and encryption together cannot exceed $2^{61}$. (This allows nearly $2^{64}$ octets to be encrypted and authenticated using CCM. This is roughly 16 million terabytes, which should be more than enough for most applications.) In an environment where this limit might be reached, the sender must ensure that the total number of block cipher encryption operations in the CBC-MAC and encryption together does not exceed $2^{61}$. Receivers that do not expect to decrypt the same message twice may also check this limit.
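As a worked example of the inequality above, the following sketch picks the shortest CCM tag length that satisfies it for given targets. The function name `min_tag_bits` and the sample figures are assumptions made for illustration; the candidate lengths are the valid tag sizes of 4 to 16 bytes listed in section 2.2, expressed in bits.

```python
import math

VALID_TAG_BITS = (32, 48, 64, 80, 96, 112, 128)


def min_tag_bits(max_errs: int, risk: float) -> int:
    """Smallest valid Tlen (in bits) with Tlen > lg(MaxErrs / Risk)."""
    needed = math.log2(max_errs / risk)
    for tlen in VALID_TAG_BITS:
        if tlen > needed:
            return tlen
    raise ValueError("no CCM tag length meets this target")


# Tolerating 2**20 INVALID results before the key is retired, with an
# acceptable forgery probability of 2**-32, gives lg(2**52) = 52,
# so the shortest suitable tag is 64 bits.
print(min_tag_bits(2**20, 2**-32))
```

The same arithmetic explains the data limit: $2^{61}$ block cipher operations, two per 16-byte payload block, correspond to $2^{60} \cdot 16 = 2^{64}$ octets.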

2.5 Efficiency and performances of CCM

Performance depends on the speed of the block cipher implementation. In hardware, for large packets, the speed achievable for CCM is roughly the same as that achievable with the CBC encryption mode. Encrypting and authenticating an empty message, without any additional authentication data, requires two block cipher encryption operations. For each block of additional authenticated data one additional block cipher operation is required. Each message block requires two block cipher encryption operations. The worst-case situation is when both the message and the additional authentication data are a single octet. In this case, CCM requires five block cipher encryption operations.

Both CCM encryption and CCM decryption require only the block cipher encryption function. In AES, the encryption and decryption algorithms have some significant differences. Thus, using only the forward encrypt operation can lead to a significant saving in code size and in hardware size. In hardware, CCM can compute the message authentication code and perform encryption in a single pass. This means that the implementation does not have to wait for the calculation of the MAC to be completed before starting the encryption, which gives this algorithm a good speed advantage.

CCM was designed for use in a packet processing environment. The authentication processing requires the message length to be known in advance, which makes one-pass processing difficult in some environments. However, in almost all environments message or packet lengths are known in advance, so this is rarely a problem.
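The operation counts quoted in this section can be checked with a small sketch. The function name is hypothetical, and the 2, 6 or 10 byte length prefix of the associated data follows the encoding described in section 2.3 (with thresholds assumed from NIST SP 800-38C).

```python
import math


def ccm_block_cipher_calls(payload_len: int, adata_len: int) -> int:
    """Number of block cipher encryptions for one CCM invocation.
    Lengths are in bytes; blocks are 16 bytes."""
    calls = 2  # B_0 in the CBC-MAC, plus S_0 to encrypt the tag
    if adata_len > 0:
        # The encoded length of A is prepended before padding to blocks.
        if adata_len < 2**16 - 2**8:
            prefix = 2
        elif adata_len < 2**32:
            prefix = 6
        else:
            prefix = 10
        calls += math.ceil((prefix + adata_len) / 16)
    # Each payload block costs one CBC-MAC call and one CTR call.
    calls += 2 * math.ceil(payload_len / 16)
    return calls


print(ccm_block_cipher_calls(0, 0))  # empty message, no associated data
print(ccm_block_cipher_calls(1, 1))  # single-octet message and header
```

The two calls print 2 and 5, matching the figures given above; the second case is the worst one in relative terms because five block cipher operations protect only two octets.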

2.6 Criticism of CCM

Several problems regarding different aspects of CCM have been analyzed. In terms of efficiency, the first problem with CCM is that it does not work on-line. This means that it cannot work on a stream of data, as we have already said, but must have the whole input available, because it needs to know the length of the message before starting the process. It is true that CCM is often used in environments where packet lengths are known in advance, but in many contexts we cannot know the length of the message we are handling until it is finished.

The length-prepend annotation also causes another problem for the associated data: CCM disrupts its word-alignment. This problem may cause significant performance losses, as modern machines perform operations much more efficiently when pointers into memory fall along word boundaries, and this alignment is lost when we prepend the length annotation to the associated data. The problem becomes more relevant when the associated data is long, but we usually expect the associated data to be just a few bytes.

Another problem related to the associated data comes from the fact that CCM cannot pre-process static associated data. This would be very useful in contexts where the associated data is the same during a whole communication session, so that we could process it once and for all in order to reduce the time needed for encryption and decryption. This cannot be done because the algorithm encodes the nonce and the message length before the associated data rather than after it.

The parametrization of CCM is another aspect that is often criticized. The main point of this criticism is that the trade-off between the length of the nonce and the message length, induced by the choice the user has to make before using CCM, apparently makes no sense, as the two parameters have nothing to do with each other. Furthermore the byte orientation of CCM, which is defined only on octet strings, could be seen as a limit for this mode of operation.

2.7 A possible attack

A common slogan in the design of Internet protocols is "be conservative in what you send, and liberal in what you accept". Imagine a CCM implementation respecting this slogan literally: the sender always sends messages with 16-byte tags, but the receiver accepts messages with valid tags of any permitted length. An attacker could choose to create 4-byte tags and generate a valid ciphertext after $2^{32}$ tries. However this attack is of limited value, as it is a blind forgery and the attacker cannot control which message is accepted.

A second scenario is more serious: a smarter attacker can fully control what message the recipient will accept. This happens because the transmitted ciphertext has the form C || T, where T is the encrypted authentication tag (MAC), and the received message M is computed as a function of C alone. The direct forgery attack can be performed as follows once the attacker has intercepted a valid ciphertext C || T for a message M. The attacker chooses the bit positions of M that he wants to flip and generates $2^{32}$ ciphertexts of the form C′ || T′, where C′ is obtained by XORing C with the chosen difference (because of the counter mode this flips exactly the same bits in M, giving M′) and T′ varies over all 4-byte values. One of these ciphertexts will be accepted as a valid encryption of M′. Thus an attacker can forge any message with $2^{32}$ trials, given a single ciphertext that was authenticated with a 128-bit tag.

One possible countermeasure to this kind of attack would be to fix the tag length parameter at key-negotiation time so that only sender and receiver know it. In this way the recipient will accept only one value of the tag length, which prevents the direct forgery attack, and will not accept a new tag length until the end of the session.

Chapter 3

GCM, Galois/Counter Mode
3.1 Introduction

The GCM is a mode of operation for block ciphers that provides authenticated encryption. It makes use of finite field (Galois field) mathematics to provide authentication and uses the CTR mode of operation of the AES cipher to provide encryption.

GCM has been developed to meet the growing need for fast algorithms, capable of keeping up with ever faster network speeds. In the era of Gigabit networks, a reliable and fast authenticated encryption algorithm is desired. Encryption is usually a fast operation that can be realized in many ways, and many protocols provide efficient encryption techniques; many of them can be implemented in software and in hardware and make the most of pipelining and parallelization. The real bottleneck is the authentication part. Although many algorithms provide authentication, almost none can keep pace with Gigabit links, and in fact a standard doesn’t exist. GCM is an authenticated encryption mode of operation that can be realized both in software and in hardware, can be pipelined and can work in a multiprocessor environment, and is free of intellectual property restrictions; it is thus a strong candidate to fill the emerging need.

The Galois Counter Mode is based on CTR, but adds a MAC computed with operations in a Galois field. This choice has been made because multiplication is extremely easy to perform within such a field: it only involves basic operations that can be implemented in hardware. The function that computes the MAC is called GHASH and it produces a tag; this tag is sent with the ciphertext and must be verified by the recipient in order to authenticate the message. One of the most interesting features of GCM is that the GHASH function doesn’t need to be applied to an encrypted text: it can be used alone, only to provide authentication, and in this case the algorithm is called GMAC. What is remarkable is that if one changes a few bits of the plaintext and then computes the MAC again, the computational effort needed is proportional to the number of bits changed.

Another useful characteristic of GCM, which makes it particularly attractive, is that it needs an Initialization Vector (IV), but this vector does not have a fixed length: it can be arbitrary. Since the IV is often a nonce, any available nonce can be used, which widens the field of application of GCM. GCM has been designed to be used with AES, in particular AES-128, which is a common choice for many applications. In any case the 128-bit size is just a suggestion; the Galois Counter Mode can be used with other lengths. The key used for encryption and to generate the MAC is the same: this choice simplifies key distribution.

3.2 Description

GCM has two main functions, called authenticated encryption and authenticated decryption. We’re going to analyze them separately, even though they are almost identical.

Authenticated encryption
This function needs 4 inputs:

• the plaintext, called $P$;
• the secret key, called $K$;
• the initialization vector, $IV$;
• the additional authenticated data, AAD for short, indicated with $A$ in the formulas.

The outputs are only 2:

• the ciphertext, called $C$;
• an authentication tag, called $T$.

We will now describe how this algorithm works, providing also other information on the input and output data. We assume AES-128 as the underlying cipher but, as we’ve already said, the size of the cipher is not important. The authenticated encryption function acts on two different levels: one for encryption and one for authentication.

Let’s consider encryption first. The plaintext length can be up to $2^{39} - 256$ bits, that is about 64 gigabytes of data. The plaintext is encrypted with the key $K$, whose length is the one appropriate to the cipher, in our case 128 bits. To start the encryption the initialization vector $IV$ must be provided; the IV can be of any length, but the best choice is 12 bytes (96 bits), because the algorithm is optimized for this case; for applications where efficiency is a must, this length should be chosen.

The input data are organized in this way. The plaintext is divided into blocks of 128 bits (or of the given cipher block size). At the end there are $n - 1$ blocks of 128 bits, $P_1, \ldots, P_{n-1}$, and a last block $P_n^*$ composed of the remaining $u$ bits, which may not be enough to form a full 128-bit block. Here is the encryption algorithm:

1. $H = E(K, 0^{128})$;
2. $Y_0 = IV \,\|\, 0^{31}1$ if $\mathrm{len}(IV) = 96$, and $Y_0 = \mathrm{GHASH}(H, \{\}, IV)$ otherwise;
3. $Y_i = \mathrm{incr}(Y_{i-1})$ for $i = 1, \ldots, n$;
4. $C_i = P_i \oplus E(K, Y_i)$ for $i = 1, \ldots, n-1$;
5. $C_n^* = P_n^* \oplus \mathrm{MSB}_u(E(K, Y_n))$;
6. $T = \mathrm{MSB}_t(\mathrm{GHASH}(H, A, C) \oplus E(K, Y_0))$.

$Y$ is the counter of the CTR mode, which is initialized to the IV padded with 31 zeros and 1 one. If the IV is not 96 bits long, a GHASH operation is performed to reduce (or expand) it to the standard length of 128 bits. This is the reason why a 12-byte IV is suggested in efficiency-bound applications: in this way the IV is used as it is and no GHASH operation must be performed. We will return later to how GHASH works.

The counter is increased by one, its new value is encrypted with the key, and the result is XORed with the first block of data. The increment is done modulo $2^{32}$ on the rightmost 32 bits of the counter: when this part would reach the value $2^{32}$ it is set back to zero. The counter is then incremented again, encrypted and XORed with the next data block, and so on. For the last data block the operations are the same, but only the $u$ most significant bits of the encrypted counter are used. At the end we have $n$ ciphertext blocks, each of them corresponding to a plaintext block. The blocks can be assembled and sent, or sent separately, possibly with a sequence number to allow the recipient to reconstruct the original message. The choice of the CTR mode of operation makes it possible to treat each ciphertext block separately, so that the decryption operations can be pipelined in hardware to maximize the throughput.
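The counter handling of steps 2 to 5 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a full GCM: it covers only the 96-bit IV case, works at byte granularity, and takes the block cipher as a parameter `E`. The function names are ours, and `toy_E` is a SHA-256 based stand-in used only to make the sketch runnable; it is not AES.

```python
import hashlib


def initial_counter_block(iv: bytes) -> bytes:
    """Y_0 for the recommended 96-bit IV: IV followed by 31 zero bits and a one."""
    if len(iv) != 12:
        raise ValueError("this sketch only covers 96-bit IVs; "
                         "other lengths go through GHASH")
    return iv + b"\x00\x00\x00\x01"


def incr(block: bytes) -> bytes:
    """Increment the rightmost 32 bits modulo 2^32; the other 96 bits are untouched."""
    counter = (int.from_bytes(block[12:], "big") + 1) % 2**32
    return block[:12] + counter.to_bytes(4, "big")


def ctr_encrypt(E, iv: bytes, plaintext: bytes) -> bytes:
    """Steps 3-5: C_i = P_i xor E(Y_i), with a possibly short last block.

    E maps a 16-byte block to a 16-byte block under a fixed key.
    """
    y = initial_counter_block(iv)
    out = bytearray()
    for offset in range(0, len(plaintext), 16):
        y = incr(y)                      # Y_0 itself is reserved for the tag
        chunk = plaintext[offset:offset + 16]
        pad = E(y)
        # zip stops at the end of chunk, so a short last block only uses
        # the leading bytes of the pad (the MSB_u of step 5).
        out += bytes(p ^ k for p, k in zip(chunk, pad))
    return bytes(out)


def toy_E(block: bytes) -> bytes:
    """Stand-in for E(K, .): NOT AES, only here so the example runs."""
    return hashlib.sha256(b"demo key" + block).digest()[:16]


iv = bytes(12)
msg = b"twenty-one byte input"
ct = ctr_encrypt(toy_E, iv, msg)
print(ct.hex())
print(ctr_encrypt(toy_E, iv, ct))   # applying the same function again decrypts
```

Because each block depends only on its own counter value, the loop body could run for all blocks in parallel, which is the property the hardware pipelines exploit.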

Now the authentication part. To authenticate the data, a MAC is used. This Message Authentication Code is computed using the GHASH function. We’re not going to describe the definition of this function in detail; we will only outline its main features. GHASH is based on two basic operations that are easy to implement in hardware: the XOR and the multiplication in a Galois field. GHASH needs 3 inputs:

• the encrypted all-zeros string, also known as the hash sub-key, called $H$;
• the authenticated data $A$, up to $2^{64}$ bits;
• the ciphertext $C$.

The hashing function computes a sequence of XORs and multiplications in $GF(2^{128})$. The output is a 128-bit string, but this string is not directly the authentication tag $T$; it is the base to compute it. $T$ is obtained by XORing this string with the encrypted initial counter block $E(K, Y_0)$ and keeping only the $t$ most significant bits of the result, which lets the user choose the level of security of the tag.

Since the output of GHASH is a 128-bit string, it’s natural to use it to resize an IV that is not 96 bits long. In this case the inputs of the function differ from the previous case: the authenticated data is replaced by an empty string and the ciphertext is replaced by the IV. The specification allows the IV to be between 1 and $2^{64}$ bits long.
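Since the text only outlines GHASH, a compact sketch may help to see how little machinery it needs. The code below follows the bit-by-bit multiplication and the chained hash as we understand them from the GCM specification; it is an unoptimized illustration that handles whole bytes only, and the names `gf_mult` and `ghash` are ours.

```python
# Reduction constant for the field polynomial x^128 + x^7 + x^2 + x + 1,
# written in GCM's bit order (leftmost bit = coefficient of x^0).
R = 0xE1 << 120


def gf_mult(x: int, y: int) -> int:
    """Multiply two elements of GF(2^128) given as 128-bit integers."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:      # bit i of x, counting from the left
            z ^= v
        # Multiply v by the field variable: shift right, reduce if a bit fell off.
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def ghash(h: bytes, a: bytes, c: bytes) -> bytes:
    """GHASH(H, A, C): zero-pad A and C to full blocks, append both bit lengths,
    then fold every block X into Y = (Y xor X) * H."""
    def padded(data: bytes) -> bytes:
        return data + b"\x00" * (-len(data) % 16)

    lengths = (8 * len(a)).to_bytes(8, "big") + (8 * len(c)).to_bytes(8, "big")
    stream = padded(a) + padded(c) + lengths
    h_int = int.from_bytes(h, "big")
    y = 0
    for offset in range(0, len(stream), 16):
        block = int.from_bytes(stream[offset:offset + 16], "big")
        y = gf_mult(y ^ block, h_int)
    return y.to_bytes(16, "big")


# Hypothetical inputs, only to show the call shape.
h = bytes.fromhex("66e94bd4ef8a2c3b884cfa59ca342b2e")
print(ghash(h, b"header", b"some ciphertext bytes").hex())
```

The only operations involved are shifts, XORs and a conditional reduction, which is why the function maps so directly onto hardware.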

Authenticated decryption
This function needs 5 inputs:

• the secret key, called $K$;
• the initialization vector, $IV$;
• the ciphertext, called $C$;
• the additional authenticated data, AAD for short, indicated with $A$;
• the authentication tag, called $T$.

At the output we obtain only 1 item:

• the plaintext $P$, or the special symbol FAIL if anything in the authenticated decryption goes wrong.

How does it work? It performs exactly the same operations as the encryption, with the exception that the hash is computed before the CTR step. The hash can be computed first because GHASH takes the ciphertext as input, and the receiver already has it. Decryption itself works because the ciphertext is obtained by XORing the encrypted counter value with the plaintext: the XOR is its own inverse, so after receiving the encrypted data it is sufficient to encrypt the local counter (which must start from the same value used by the encryption algorithm, derived from the IV) and XOR it with the received ciphertext. This is the procedure:

1. $H = E(K, 0^{128})$;
2. $Y_0 = IV \,\|\, 0^{31}1$ if $\mathrm{len}(IV) = 96$, and $Y_0 = \mathrm{GHASH}(H, \{\}, IV)$ otherwise;
3. $T' = \mathrm{MSB}_t(\mathrm{GHASH}(H, A, C) \oplus E(K, Y_0))$;
4. $Y_i = \mathrm{incr}(Y_{i-1})$ for $i = 1, \ldots, n$;
5. $P_i = C_i \oplus E(K, Y_i)$ for $i = 1, \ldots, n-1$;
6. $P_n^* = C_n^* \oplus \mathrm{MSB}_u(E(K, Y_n))$.

$T'$ is compared to the $T$ received with the ciphertext; if the two values match then the message is authentic and decryption is performed, otherwise the FAIL symbol is produced and the procedure is aborted.
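The verify-then-decrypt order of the procedure can be sketched as a small control-flow skeleton. The two callables `compute_tag` and `ctr_xor` are hypothetical placeholders for the tag computation and for the counter-mode XOR; the toy versions below use HMAC and SHA-256 only so that the sketch runs, and are neither GHASH nor AES. The constant `TAG_LEN` and the use of `None` for the FAIL symbol are our assumptions.

```python
import hashlib
import hmac

TAG_LEN = 16   # bytes; fixed in advance, never taken from the incoming message


def authenticated_decrypt(compute_tag, ctr_xor, iv, ciphertext, aad, tag):
    """Return the plaintext, or None (the FAIL symbol) if the tag does not verify."""
    if len(tag) != TAG_LEN:
        return None
    expected = compute_tag(iv, aad, ciphertext)[:TAG_LEN]   # T' = MSB_t(...)
    # Constant-time comparison, so the check does not leak where tags differ.
    if not hmac.compare_digest(expected, tag):
        return None                      # FAIL: nothing is decrypted or released
    return ctr_xor(iv, ciphertext)       # XOR with the same keystream inverts it


# Toy stand-ins so the sketch runs; these are NOT GHASH or AES.
KEY = b"demo key"


def toy_tag(iv, aad, ciphertext):
    data = iv + len(aad).to_bytes(8, "big") + aad + ciphertext
    return hmac.new(KEY, data, hashlib.sha256).digest()


def toy_ctr_xor(iv, data):
    stream = b""
    counter = 1
    while len(stream) < len(data):
        stream += hashlib.sha256(KEY + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


iv, aad = bytes(12), b"header"
ciphertext = toy_ctr_xor(iv, b"hello world")
tag = toy_tag(iv, aad, ciphertext)[:TAG_LEN]

print(authenticated_decrypt(toy_tag, toy_ctr_xor, iv, ciphertext, aad, tag))
tampered = bytes([ciphertext[0] ^ 1]) + ciphertext[1:]
print(authenticated_decrypt(toy_tag, toy_ctr_xor, iv, tampered, aad, tag))
```

The first call prints the recovered plaintext and the second prints `None`, because a single flipped ciphertext bit changes the expected tag.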

GMAC
GMAC is the name given to the Galois Counter Mode of operation when only authentication is needed. This can happen for many reasons. The simplest is that there may be no interest in encrypting the data, but only in authenticating them; another is that we only want to take advantage of the speed of the GMAC algorithm with small plaintexts. To explain the latter we have to make a few considerations on the context where GCM could be applied.

We have seen that the authenticated encryption function encrypts only the plaintext $P$, and not the additional data $A$, which is passed to the GHASH function to compute a hash but is never encrypted by the encryption function $E$. In conclusion we have two kinds of data: the plaintext (encrypted and authenticated) and the additional data (only authenticated). A lot of applications can take advantage of this characteristic, and one of the most important is packet routing over a network. The header of a packet, for example an IP packet, carries a lot of information that routers need in order to forward the packet in one direction rather than another. If these data are encrypted, routing can’t be done easily. With GCM, on the contrary, it is possible to encrypt the payload of the packets and leave the header in clear text. Moreover, if we just want to authenticate the packets but not encrypt them, we can apply the GMAC function only.

It is very interesting to know what Internet traffic is made of. Some studies have been done on this topic and they all lead to the same, quite surprising, conclusion. First of all, TCP packets represent almost 90% of Internet traffic. The surprising part is that almost 60% of worldwide Internet traffic is made of packets of 44 bytes or less. This is due to the very frequent use of ACK and SYN packets, which are very small. An analysis of GCM/GMAC performance compared to other similar authenticated encryption techniques shows that it is the best performing mode of operation in the Internet environment.
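Written with the symbols of the authenticated encryption procedure, GMAC can be read as the special case in which the plaintext is empty, so that all the data travel as additional authenticated data. As a sketch:

```latex
P = \{\} \;\Longrightarrow\; C = \{\}, \qquad
T = \mathrm{MSB}_t\bigl(\mathrm{GHASH}(H, A, \{\}) \oplus E(K, Y_0)\bigr)
```

In this reading the block cipher is invoked only to obtain $H$ and $E(K, Y_0)$, whatever the length of $A$, which is consistent with the speed advantage on small packets described above.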

3.3 IV and Keys

The initialization vector and the symmetric key are two of the most critical elements of this mode of operation. If not used properly they can compromise the security offered by GCM. We will analyze them separately. There is a very important constraint in choosing the IV and the key, which is strongly stated in the official NIST publication. The document refers to the following principle as the "uniqueness" requirement: the probability that the authenticated encryption function ever will be invoked with the same IV and the same key on two (or more) distinct sets of input data shall be no greater than $2^{-32}$. This limitation is imposed to achieve a high security level, and must be obeyed in all GCM implementations.
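To get a feeling for the $2^{-32}$ threshold, one can ask how many IVs may be drawn before a repetition becomes too likely. The sketch below assumes, purely for illustration, that IVs are 96-bit strings chosen uniformly at random under a single key, and uses the standard birthday (union) bound; the function name is ours.

```python
from fractions import Fraction


def collision_bound(num_ivs: int, iv_bits: int = 96) -> Fraction:
    """Upper bound on the chance that any two of num_ivs random IVs coincide."""
    pairs = num_ivs * (num_ivs - 1) // 2       # number of unordered pairs
    return Fraction(pairs, 2 ** iv_bits)       # each pair collides w.p. 2^-iv_bits


limit = Fraction(1, 2 ** 32)
for exponent in (30, 32, 33):
    bound = collision_bound(2 ** exponent)
    print(f"2^{exponent} IVs: bound within 2^-32? {bound <= limit}")
```

Under this assumption the bound stays below the threshold for $2^{32}$ IVs (it is about $2^{-33}$) and exceeds it for $2^{33}$.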

3.3.1 Keys

First of all, the key. In symmetric ciphers the key is the most important element, as stated by Kerckhoffs’s principle. Keeping the algorithm secret is of no use if the keys are not handled safely; moreover, the security of the cipher relies entirely on the key. The official Recommendation, though, gives no indication on how to create and distribute the keys; it only says that a given key should be "fresh" and that the mechanism used to create the keys should resist attacks and tampering. In addition, no method is specified for key distribution, so an implementation of GCM should carefully take these aspects into account and choose a proper way to create and distribute the keys, otherwise the security of the system would be seriously impaired.

3.3.2 IV

About the IVs, the Recommendation is stricter. It specifies two frameworks to create the initialization vectors, one deterministic and the other based on a random number generator. We’re not going to enter into the details of these two methods; we will only describe them briefly, to highlight the main differences. We will refer to the two frameworks respectively as the deterministic construction and the RBG-based construction (RBG stands for Random Bit Generator). Neither framework fixes the length of the IV, which is arbitrary, and both treat the IV as the composition of two fields, but the logical meaning of the fields is different in the two cases.

In the deterministic construction the first part is called the fixed field and the second one the invocation field. Each device in the network has a unique fixed field, and every time the authenticated encryption function is called the invocation field is incremented, to guarantee that no two ciphertexts are created using the same IV in a reasonable amount of time.

In the RBG-based construction the two fields are called the random field and the free field. The Recommendation is that the random field is at least 96 bits long, while the free field may be 0 bits long. The random field can be either a true random number generated in a secure way, or the increment of a previous random number. The free field can take any value we like, but the suggestion is to leave it empty, so that the IV is a completely random number.

No matter which method is used, we cannot generate an unlimited number of IVs with the same key, otherwise we won’t meet the requirement expressed in the Recommendation. No more than $2^{32}$ IVs can be used with the same key, but in general the number of IVs that can be used depends on the length of the key and on the number of devices implementing GCM. The value $2^{32}$ is valid only if there are only 2 devices using a 128-bit key. In other cases, with shorter keys or with a higher number of devices, fewer initialization vectors can be used before changing the key. In any case the Recommendation states clearly that the probability of using the algorithm with the same key and IV must not exceed $2^{-32}$, otherwise a high security level is not guaranteed.
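The deterministic construction can be illustrated with a small generator. The field sizes below (a 32-bit fixed field followed by a 64-bit invocation field, giving a 96-bit IV) are an assumption chosen for the example, and the class name `DeterministicIV` is hypothetical; a real deployment would also have to persist the counter across restarts.

```python
class DeterministicIV:
    """96-bit IVs: 32-bit fixed field (device identity) || 64-bit invocation field."""

    def __init__(self, device_id: int):
        if not 0 <= device_id < 2 ** 32:
            raise ValueError("device_id must fit in the 32-bit fixed field")
        self._fixed = device_id.to_bytes(4, "big")
        self._invocations = 0

    def next_iv(self) -> bytes:
        """Return a fresh IV and advance the invocation field."""
        if self._invocations >= 2 ** 64:
            raise OverflowError("invocation field exhausted; change the key")
        iv = self._fixed + self._invocations.to_bytes(8, "big")
        self._invocations += 1
        return iv


device_a = DeterministicIV(device_id=1)
device_b = DeterministicIV(device_id=2)
print(device_a.next_iv().hex())
print(device_a.next_iv().hex())
print(device_b.next_iv().hex())
```

Two devices never collide because their fixed fields differ, and one device never repeats an IV because its invocation field only moves forward.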

3.4 Implementations

We have seen that GCM can be implemented both in hardware and in software, so we’re going to take a brief look at these implementations, in particular at the hardware one.

Software
As far as software implementations are concerned, we are just going to show that there is a trade-off between two variables: memory occupation and amount of computation. The more memory we devote to precomputed tables, the less computation is needed per byte. We consider only the GHASH function, because it is the only operation that really needs to be taken into account: the other operations are XORs and increments (both performed in 1 clock cycle) and the encryption (which depends on the underlying block cipher, AES, and not on the GCM structure). The GHASH operation that costs the most in terms of time (apart from the encryption) is the multiplication over the Galois field. This operation, though, has an interesting property: it is linear in the bits of one of the factors. For example, if we want to compute $H \cdot X$, where $H$ is the encrypted null vector and $X$ an arbitrary bit string, the operation is linear in the bits of $X$. This means that, given a value of $H$, it is possible to build a table with the results of the multiplication for certain values and use them as a basis to compute the final result. It can be shown that, considering the 128-bit string $X$ split into 16 parts of 8 bits each, the operation $H \cdot X$ can be performed by storing a table of intermediate results of 65,536 bytes. This table must be computed every time a new key is going to be used, by computing the value $H$ and then all the table entries. If we want to save memory we can consider $X$ split into 32 parts of 4 bits each, and in this case the table needed is only 8,192 bytes large. Table 3.1 shows the amount of memory needed and the corresponding throughput, expressed in cycles per byte (lower is faster).

Method                  Storage requirement            Throughput (cycles/byte)
Simple, 8-bit tables    65,536 bytes/key               13.1
Simple, 4-bit tables    8,192 bytes/key                17.3
Shoup’s, 8-bit tables   1024 bytes + 4096 bytes/key    32.1
Shoup’s, 4-bit tables   64 bytes + 256 bytes/key       69.3
No tables               16 bytes/key                   119

Table 3.1: Throughput of GHASH on a Motorola G4 processor
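The "simple, 8-bit tables" idea can be sketched as follows. The code relies on the linearity of the multiplication in the bits of $X$: it precomputes, for each of the 16 byte positions, the product of every possible byte value with $H$, and then obtains $H \cdot X$ as the XOR of 16 lookups. The bit-by-bit `gf_mult` is included only to build and check the tables; all names and the sample value of `h` are ours, and the sketch makes no claim about the speeds in Table 3.1.

```python
R = 0xE1 << 120   # reduction constant of GF(2^128) in GCM's bit order


def gf_mult(x: int, y: int) -> int:
    """Reference bit-by-bit multiplication in GF(2^128)."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def build_tables(h: int) -> list:
    """tables[i][b] = (byte value b placed at byte position i) * H.

    16 positions x 256 values x 16 bytes per entry = 65,536 bytes per key.
    """
    return [[gf_mult(b << (8 * (15 - i)), h) for b in range(256)]
            for i in range(16)]


def mult_by_h(tables: list, x: int) -> int:
    """H * X as the XOR of one table lookup per byte of X."""
    result = 0
    for i in range(16):
        byte = (x >> (8 * (15 - i))) & 0xFF
        result ^= tables[i][byte]
    return result


h = 0x66E94BD4EF8A2C3B884CFA59CA342B2E    # sample hash sub-key, for illustration
tables = build_tables(h)                   # done once per key
x = 0x0388DACE60B6A392F328C2B971B2FE78
print(mult_by_h(tables, x) == gf_mult(x, h))
print(sum(len(t) for t in tables) * 16, "bytes of table entries")
```

Using 4-bit chunks instead would need 32 tables of 16 entries each, that is 8,192 bytes, at the cost of twice as many lookups per multiplication.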

Hardware
One of the design goals of GCM was efficiency, together with the possibility of a simple hardware implementation, to allow GCM to deal with authenticated encryption over Gigabit networks. A hardware implementation is the only possible choice in such an environment, but it is also a very good choice in other cases where speed is not so relevant. A possible and straightforward implementation is the one represented in figure 3.1.

Figure 3.1: GCM basic hardware implementation scheme

The function requires 4 inputs: the IV, the additional data AAD, the plaintext and the key, which is not explicitly represented in the figure. The rhomboids represent the points where data are switched. The left part of the diagram is devoted to authentication, while the right part performs encryption. At the very first cycle the IV is incremented and then sent to the encryption block (this block itself can be realized in hardware in many ways, but here we treat it as a black box, since the explanation must be architecture independent). The result is sent to the left part of the diagram, to the XOR operation, where it waits for the other data to be computed. The left part takes the AAD as input, performs the multiplication by $H$ (remember, $H$ is the encrypted null vector) and sends the result to the XOR block where the data just computed are waiting. Note that this operation can be done in parallel with the first one. Now the first cycle is done and the switches are toggled. The IV is incremented and encrypted again, but this time the result is sent to the right part and XORed with the first block of plaintext, which is waiting there. This time the encrypted data block is sent to the multiplication block instead of the AAD, which will no longer be used. The operations go on until the plaintext is finished, and at the output we receive the ciphertext and the authentication tag.

There are some very interesting considerations to make on this scheme. The first is that, looking at figure 3.1, the two parts of the scheme, the two pipelines, are independent except for the tag creation part. If we can build two distinct encryption blocks performing the AES operations, we can completely split the two pipelines, so that they run fully in parallel. The second observation is about the multiplying block. This part must perform the operation in $GF(2^{128})$ or, generally speaking, in $GF(2^q)$ (remember that to achieve the top speed, the 128-bit cipher should be chosen). The multiplication in such a field can be realized in hardware in many ways, each with different requirements in terms of area occupation and time to compute the result. A parallel multiplier can do the operation in just 1 clock cycle, with an area occupation proportional to $q^2$. In any case the area of this multiplier would be at most about 30% of the area needed by the AES encryption block, so it doesn’t affect the total occupation much. Only in applications where area is very strongly limited should we consider other implementations of the multiplier, noting that all the other solutions need both an area and a time to provide the result proportional to $q$.

3.5 Security

To analyze GCM security, it’s necessary to introduce a lot of mathematics. This is not the goal of this text, so we just briefly report the main lines of the argument. Consider this experiment. We have what is called a permutation oracle, which acts in this way: it is a black box that gives a bit string as output; sometimes the bit string is a completely random sequence and sometimes it is the output of a pseudo-random function (PRF), that is, the result of an encryption with a given key. In the general case each of the two behaviours has probability 0.5. An attacker can query the permutation oracle and obtain a result; he then has to determine whether the output string is a random sequence or the result of an encryption. How much better than a blind guess he can do is called the distinguishing advantage.

In another experiment there are two oracles: the tag-generation oracle and the tag-verification oracle. The first receives a string from the attacker as input and generates an authentication tag for that string; the second receives a message and a tag, and tells whether the tag actually verifies that message or not. The attacker can use the tag-generation oracle to construct message-tag pairs, try to understand the way it works and then provide a pair to the tag-verification oracle. Obviously the pairs built by the first oracle will be accepted by the second one, but we call forgery advantage the probability that an attacker can build on his own a message-tag pair accepted by the tag-verification oracle.

The distinguishing advantage is related to encryption security and the forgery advantage is related to authentication security. The goal of securing an authenticated encryption operation is to reduce these two advantages to the minimum possible, so that an attacker can’t exploit them to build ad hoc messages or, in the worst case, recover the key.
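One common way to write the two quantities down, given here only as a sketch in notation of our own choosing (the adversary $\mathcal{A}$, the random source $\$$ and the oracle names are not taken from the Recommendation), is:

```latex
\mathrm{Adv}^{\mathrm{dist}}(\mathcal{A}) =
  \Bigl|\, \Pr\bigl[\mathcal{A}^{E_K} = 1\bigr] - \Pr\bigl[\mathcal{A}^{\$} = 1\bigr] \Bigr|,
\qquad
\mathrm{Adv}^{\mathrm{forge}}(\mathcal{A}) =
  \Pr\bigl[\mathcal{A}^{\mathrm{Gen},\,\mathrm{Ver}} \text{ outputs a fresh pair } (M, T) \text{ accepted by } \mathrm{Ver}\bigr]
```

Here $\mathcal{A}^{E_K} = 1$ means that the attacker answers "encryption" when the oracle really encrypts, $\mathcal{A}^{\$} = 1$ that he gives the same answer when the oracle is random, and "fresh" means that the pair was not obtained from the tag-generation oracle.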
The proof of GCM security is not particularly complicated, but it requires some mathematics and therefore we are not going to go through all the steps. The results are quite intuitive and tell us that to achieve very high security we need a long tag (128 bits is advised), and we should avoid both very long IVs and very long messages. In particular we observe that the distinguishing advantage is quadratic in the length of the plaintext and linear both in the length of the plaintext and in the length of the IV; the forgery advantage is quadratic in the length of $P$, and linear in the length of the pair $C$, $A$. If we use a long IV it will be hashed by the algorithm, so the longer the IV, the higher the probability of collisions; in addition, if we encrypt very long messages we increase the probability of collisions in the authentication tag $T$. Obviously the security of a mode of operation relies on the security of the underlying block cipher: we can’t expect a high security level if we choose the right IVs and avoid very long messages, but use a key of only 32 bits.

The official Recommendation gives us some advice to increase the security provided by GCM. The first piece of advice is about the keys and is quite obvious: keys should be kept strictly secret and changed whenever needed, distributing them in a safe way.

The second is about the IV. The repetition of an IV in the authenticated encryption function causes serious problems because of how the CTR mode of operation is built. Remember that GCM is based on the CTR technique: if we induce a bit flip in the ciphertext, a corresponding bit flip is produced in the plaintext upon decryption. An attacker with an authenticated decryption oracle could induce strategic bit flips to see the results in the plaintext. Normally such an attack on GCM is useless, because the MAC detects the modification. There is a problem, though, when IVs are repeated. In this case the computed tag is only a function of the hash sub-key $H$, so it could be theoretically possible to recover the sub-key with specific attacks. An attacker with $H$ at his disposal could modify a ciphertext and then compute a valid authentication tag. The receiver would notice that the IV used is the same, but could think that the sender was in good faith and would try to decrypt the message; since the tag would verify, he could conclude that the message is authentic. In conclusion, IVs must be changed every time a transmission is performed, and it must be guaranteed that the same IV is not used twice even after a power down.

The third is about the tags. We know that, given a tag of length $t$, the probability of obtaining a collision (i.e. the same tag with different ciphertexts) is $2^{-t}$. With GCM an attacker could use techniques that increase this probability. Each of these forgery attacks that succeeds increases the probability that other attacks of the same kind succeed, and finally the hash sub-key $H$ could be compromised, canceling any authentication assurance.

The fourth and last piece of advice is related to the protection against replay attacks. To avoid this kind of attack it is sufficient to follow the general principle of the second one, instructing the recipient to discard messages with duplicated IVs for a given key, and, furthermore, to use timestamps in the additional authenticated data.
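The two properties of CTR mode that the advice on IVs relies on can be checked on a toy cipher. The keystream below is built from SHA-256 and is only a stand-in for AES in counter mode; function names and sample messages are ours. The sketch shows that flipping ciphertext bits flips the same plaintext bits, and that two messages encrypted with the same key and IV leak the XOR of their plaintexts.

```python
import hashlib


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_ctr(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy counter mode (NOT AES): the same call encrypts and decrypts."""
    stream = b""
    counter = 1
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return xor(data, stream)


key, iv = b"demo key", bytes(12)

# 1. Bit flipping: without a MAC, chosen changes go straight through.
p1 = b"amount=0100"
c1 = toy_ctr(key, iv, p1)
delta = xor(p1, b"amount=9100")
print(toy_ctr(key, iv, xor(c1, delta)))

# 2. IV reuse: the keystream cancels out and only the plaintexts remain.
p2 = b"amount=0250"
c2 = toy_ctr(key, iv, p2)          # same key, same IV: forbidden
print(xor(c1, c2) == xor(p1, p2))
```

The first property is what the tag is there to stop; the second is one reason why a repeated IV is already harmful before any attack on $H$ is considered.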

Chapter 4

Conclusions
Authenticated Encryption is probably one of the best available examples of what a wrong use of Intellectual Property can do. The problem of AE is quite important, especially in a high speed network environment, and a reliable, fast and easy to implement AE protocol is needed. We’ve seen, though, that the best technical solution has actually been blocked by a dull use of the Intellectual Property concept. The two solutions described are good ways to solve the problem and are free of Intellectual Property restrictions, but they are not optimal. They both use the two-pass combined mode strategy; CCM does it in a very simple way, because it was one of the first solutions of this type ever proposed; GCM is better in many ways, but in any case the approach is only slightly better than generic composition.

So why choose an authenticated encryption protocol? One of the reasons is that it is often good to have a single solution for two problems, for instance because it could be cheaper to implement. The advantage in terms of time is unfortunately quite limited, since the single-pass solutions are in fact unavailable. It is useful, though, to have an algorithm like GCM that can implement authentication alone, and AE only when needed.

Let’s have a look at the main properties CCM offers. The main security function offered is of course authenticated encryption. There is no error propagation during the generation process. Sender and recipient must be synchronized, as they both need to use the same nonce, based for example on a counter. The encryption process can be parallelized if needed, but this is not true for the authentication process, so the CCM algorithm as a whole can’t be parallelized. The process needs a single key, shared by sender and receiver and used both for the counter mode and for the cipher block chaining, plus a nonce and a counter, which are part of the counter block. In terms of memory, CCM requires space for the encrypt operation of the underlying block cipher, for the plaintext, for the ciphertext and for a packet counter. One important feature of CCM is that the encryption key stream can be precomputed, saving time and increasing speed. Unluckily the same cannot be said for authentication.

As for GCM, we can briefly summarize its main features. As with CCM there is no error propagation, because it is based on a mode that operates on independent blocks. GCM can be efficiently parallelized, which improves performance considerably. It needs only one key: the structure of the algorithm is designed to eliminate the security issues that could arise from the use of the same key for the authentication and encryption parts. The IVs can have arbitrary length, although the suggested one is 96 bits, and should never be reused with the same key. GCM is on-line, in the sense that the recipient does not need to know the length of the incoming message, but can process the data blocks as they arrive. The authentication tag has a variable length, from 0 to 128 bits, and the ciphertext has the same length as the plaintext. Probably the most important feature of GCM is that it can easily be implemented in hardware, allowing a throughput of more than 10 Gbps.

Bibliography
[1] NIST Special Publication 800-38C.
[2] NIST Special Publication 800-38D.
[3] www.wikipedia.org
[4] J. Black. Authenticated encryption.
[5] P. Rogaway, D. Wagner. A Critique of CCM.
[6] J. Jonsson. On The Security of CTR + CBC-MAC.
[7] D. Whiting, R. Housley, N. Ferguson. Counter with CBC-MAC (CCM). IETF Internet Draft.
[8] D. A. McGrew, J. Viega. The Security and Performance of the Galois/Counter Mode (GCM) of Operation.
[9] D. A. McGrew, J. Viega. The Galois/Counter Mode of Operation (GCM).
[10] D. A. McGrew, J. Viega. Flexible and Efficient Message Authentication in Hardware and Software.
[11] K. Claffy, G. Miller, K. Thompson. The nature of the beast: recent traffic measurements from an Internet backbone.
