Comp128 Thesis

Student Research Project
A performance oriented implementation of
COMP128
Philipp Sdmeyer Matr.-Nr.: 108 000 21 85 23 Last changes: April 15, 2006
Ruhr-University Bochum, Germany Chair for Communication Security http://www.crypto.rub.de Supervisor: Dipl.-Ing. Kai Schramm
This student research project deals with the secret algorithm COMP128, which is used in GSM networks for authentication. The algorithm was derived from leaked pages of the secret standard (refer to [4]) and with the help of reverse engineering by Marc Briceno, Ian Goldberg, and David Wagner in 1998. In this work the published source code serves as the basis for implementing the algorithm in a Hardware Security Module (HSM) with maximum performance. To this end the original source code is changed step by step and the resulting changes in performance are measured. Moreover, the same algorithm will be designed in the hardware description language VHDL to show the differences in performance between hardware and software implementations.
Contents

1. Introduction
   1.1. GSM-Network
        1.1.1. Infrastructure
        1.1.2. Link Connection
   1.2. The Environment
        1.2.1. Hardware Security Modules
        1.2.2. PKCS #11
2. Description of COMP128
   2.1. The Algorithm
   2.2. Security Vulnerabilities
        2.2.1. The Narrow Pipe Attack
        2.2.2. The Partitioning Attack
3. Implementing the Algorithm
   3.1. The Host
   3.2. The Functionality Module
        3.2.1. The Skeletal Structure
        3.2.2. The Core Function
   3.3. Chronometry
4. Code Optimization
   4.1. Single Improvements
   4.2. Results
5. Conclusion
A. C-Sourcecode
   A.1. Reference Code
   A.2. Optimized Code

List of Figures

1.1. The logical structure of GSM-networks
2.1. Basic functionality of COMP128
3.1. Client-Server principle
1. Introduction
This chapter places the topic in context. It briefly describes the infrastructure of a GSM network to show where COMP128 is used, and it introduces the technical equipment with which the authentication can be carried out and which is used for the performance tests that follow.
1.1. GSM-Network
1.1.1. Infrastructure
The infrastructure consists of Mobile Stations (MS), which are used by the customers, and the network, which is operated by the provider. To communicate with the network, every MS needs a Subscriber Identity Module (SIM), which is usually located inside the MS. The SIM holds all data required to identify the customer, in particular the secret key Ki and the International Mobile Subscriber Identity (IMSI), which is a unique ID. The network is divided into three areas:

- Base Station Subsystem (BSS)
- Network and Switching Subsystem (NSS)
- Operation Subsystem (OSS)

The BSS combines multiple Base Transceiver Stations (BTS) and one Base Station Controller (BSC). The BSC controls the BTSs in its area and establishes the connection to the NSS. The communication within a BSS is usually not encrypted. An NSS is responsible for all network management; for example, the Mobile Switching Centers (MSC) provide the interconnection between different networks.
Figure 1.1.: The logical structure of GSM-networks
To do its job, an NSS uses the following databases:

- Home Location Register (HLR)
- Visited Location Register (VLR)
- Equipment Identity Register (EIR)
- Authentication Center (AuC)

The HLR is the core of the whole network. It can be compared to a TLD DNS server on the Internet, since it provides all information about one provider's customers. The VLR (footnote 1) requests all information about the MSs in the MSC's area from the HLR and stores it in its local database. The EIR stores MS-specific data, in particular the International Mobile Equipment Identity (IMEI), which is a serial number that is unique throughout the world. The AuC stores all confidential data for authentication and encryption. All of this is coordinated by the OSS, which consists primarily of the Operation and Maintenance Center (OMC).

Footnote 1: In keeping with the HLR comparison, the VLR can be compared to a normal (authoritative) DNS server.
1.1.2. Link Connection

Each time an MS wants to establish a connection, the following steps are executed:

1. The MS sends its TMSI (footnote 2) or IMSI to the nearest BTS.
2. The BTS relays the information to the AuC.
3. The AuC sends back a triple consisting of the random value RAND (footnote 3), the associated Signed Response SRES (footnote 4), and the secret session key Kc (footnote 5).
4. The BTS sends RAND to the MS.
5. The MS calculates its own Signed Response (denoted SRES' here) from RAND and Ki, and sends it to the BTS.
6. The BTS compares SRES and SRES'.

If both SRES values are identical, the authentication was successful and the connection can be established. During the authentication process a session key is also derived from RAND, which is used to initialize the encryption algorithm A5. Thus the subsequent communication can be encrypted without the customer's private key Ki ever having been transmitted. Only the customer's SIM and the AuC know Ki, and they never disclose it. Performing all calculations needed during the authentication process is therefore a typical task for hardware security modules (HSM).

Three algorithms are used during this process, whose interfaces are defined in the GSM standard. Moreover, the encryption algorithm A5 is defined (footnote 6), and there is an example of a combined algorithm for deriving SRES and Kc, called COMP128. In general the algorithms A3 (authentication) and A8 (key generation) are not custom-made by the provider but substituted by COMP128. The algorithm has never been officially published, but Marc Briceno et al. reconstructed it by reverse engineering and published the source code. That source code provides the basis for the following implementation in an HSM. The algorithm is insecure in many respects but is still used by the majority of GSM providers.

Footnote 2: A temporary ID, assigned by the competent VLR. It merely substitutes for the IMSI in order to protect the customer's privacy.
Footnote 3: 16 bytes
Footnote 4: 4 bytes
Footnote 5: 8 bytes
Footnote 6: But it is not available to the public.
1.2. The Environment

1.2.1. Hardware Security Modules
Hardware security modules are used in most cases where sensitive data is processed, especially in connection with encryption, since they usually provide reasonable performance. Above all they provide the most secure housing for sensitive data. Today's HSMs are equipped with various kinds of tamper protection, e.g. light and shock sensors and permanent monitoring of the power supply. If any anomalous condition appears, the secure memory is erased. Eracom has specialized in developing HSMs. The Orange series is Eracom's most recent series, which includes Protect Host, Server and Server external. It is compatible with PKCS #11 and provides most of the functions mentioned in that standard. One specialty of the Orange products is the ability to load a Functionality Module (FM). This allows the customer to implement their own cryptographic algorithms or other sensitive functions to be executed within the HSM's firmware. Thus it can be used to provide new functions, which can also fall back on existing functionality. It can even be used to replace existing functions with new ones. To work with the HSM over a network or the PCI bus, the so-called HSM Access Provider software is always needed, which provides an interface for the communication. Moreover, the ProtectProcessing Orange software is needed for the development of custom functions.
1.2.2. PKCS #11

Since worldwide networking is no longer a vision and security requirements have consequently become more and more important over the last years, there is also a need to standardize security functions. RSA Laboratories started developing such common standards in 1991 in cooperation with other companies and institutions such as Apple, DEC, Lotus, Microsoft, MIT and Sun. Today the Public-Key Cryptography Standards (PKCS) are the de facto standards not only for secure e-mail and SSL (PKCS #7, #10, #12) but also for a wide variety of other applications. One of them is PKCS #11, which is also called Cryptoki (footnote 7). Cryptoki defines a programming interface for cryptographic devices, e.g. smartcards. Moreover it is used as the standard interface for recent HSMs, since it covers essentially all common cryptographic algorithms.

Footnote 7: Cryptographic Token Interface
2. Description of COMP128
2.1. The Algorithm
The algorithm is used in GSM networks to authenticate the Mobile Station to the Base Transceiver Station and to initialize the encryption standard A5, which is used to encrypt the over-the-air connection. The GSM specification only mentions COMP128 as an example of how to implement the A3 (authentication) and A8 (key derivation) algorithms using a single function. The algorithm has still not been officially published, but it was reverse engineered by Marc Briceno, Ian Goldberg, and David Wagner in 1998. In the following, their source code is used to explain the algorithm.
Basic functionality

Input:
- 16 byte random value (challenge)
- 16 byte secret key of the mobile station

Output: 12 byte
- 32 bit used for authentication
- 54 + 10 bit used for A5 initialization

Order of events

1. The random value and the key are concatenated to form the input x.
2. The input is hashed (8 times), which reduces it from 32 bytes to 16 bytes.
3. After each hashing except the last, the result value x is permuted.
4. The result of the permutation is used as the random input for the next round.
5. After 8 passes the hash value is used as the algorithm's output without permuting it.
Figure 2.1.: Basic functionality of COMP128
The hash function

The hash function itself consists of 5 rounds, in each of which 16 pairs of values are substituted (refer to Table 2.1). Each round uses a different S-box. The first one substitutes each pair of octets by another pair of octets, the second reduces each pair of octets to a pair of 7-bit values, the third reduces 16 pairs of 7-bit values to 16 pairs of 6-bit values, and so on. The basic principle of the hash function is a butterfly structure (footnote 1). In round j (j = 0, ..., 4), two interim values are computed for each pair of indices (m, n):

$$y = (x[m] + 2\,x[n]) \bmod 2^{9-j}$$
$$z = (2\,x[m] + x[n]) \bmod 2^{9-j}$$

After that the compression is done as follows:

$$x[m] = \mathrm{table}_j[y]$$
$$x[n] = \mathrm{table}_j[z]$$

Footnote 1: This means that the positions of two values in the array swap during the reduction.
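The formulas above can be sketched as one butterfly level in C. This is only an illustration: the function name `butterfly_level` is hypothetical, the S-box is supplied by the caller (the real COMP128 tables are not reproduced here), and the index arithmetic for m and n is taken from the loops of the reference code quoted in Chapter 4.

```c
/* One butterfly level of the compression, following the formulas above.
 * x     : the 32 working values
 * level : j in 0..4
 * table : caller-supplied S-box for this level with 2^(9-j) entries
 *         (hypothetical parameter; the real tables are not shown here) */
static void butterfly_level(unsigned char x[32], int level,
                            const unsigned char *table)
{
    int k, l;
    unsigned mask = (1u << (9 - level)) - 1u;   /* same as mod 2^(9-j) */

    for (k = 0; k < (1 << level); k++)
        for (l = 0; l < (1 << (4 - level)); l++) {
            int m = l + k * (1 << (5 - level));
            int n = m + (1 << (4 - level));
            unsigned y = (x[m] + 2u * x[n]) & mask;
            unsigned z = (2u * x[m] + x[n]) & mask;
            x[m] = table[y];
            x[n] = table[z];
        }
}
```

Calling the function for level 0 to 4 in turn, each time with the matching table, would correspond to one pass of the hash function. Because the pairs of one level are disjoint, the in-place update does not disturb values that are still to be read on that level.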
Table 2.1.: Pairs of variables, built in loops (refer source code)
At this point the x[ ]-values consist of 4 bits, not 8!

The permutation

After each round of COMP128 except the last one, the permutation is performed as follows: the low-order nibbles in the array x[ ] are interpreted as a bit array bit[ ] of 128 bits (after the reduction, the four MSBs of each value are 0). The new position of every bit is then defined in this way:

$$\mathrm{bit}'[i] = \mathrm{bit}[(17 \cdot i) \bmod 128]$$

After that the permuted bit array is used to reinitialize the upper 16 bytes of the input x[ ]. Each group of eight consecutive bits forms one byte, with the first bit of the group as the MSB of that byte.
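A minimal sketch of this permutation in C is given below. The function name `permute` is hypothetical, and the sketch assumes that the most significant bit of each nibble comes first in the bit array; the index expression is the one quoted from the reference code in Chapter 4.

```c
/* Permutation between two rounds: the 32 nibbles in x[] are expanded to
 * 128 bits, bit i of the result is taken from position (17*i) mod 128,
 * and the result is packed MSB-first into x[16..31].
 * Assumption: within a nibble the most significant bit comes first. */
static void permute(unsigned char x[32])
{
    unsigned char bit[128];
    int j, k;

    /* form bits from the low-order nibbles */
    for (j = 0; j < 32; j++)
        for (k = 0; k < 4; k++)
            bit[4 * j + k] = (x[j] >> (3 - k)) & 1;

    /* gather the permuted bits into the upper 16 bytes */
    for (j = 0; j < 16; j++) {
        x[j + 16] = 0;
        for (k = 0; k < 8; k++)
            x[j + 16] |= bit[((8 * j + k) * 17) & 127] << (7 - k);
    }
}
```

The lower 16 bytes are left untouched here, since they are overwritten with the key at the start of the next round anyway.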
Output of COMP128

The algorithm provides a string of 128 bits, but only 86 of them are used:

- The first 32 bits are used as the response for authentication.
- The last 54 bits are used to initialize the LFSRs of the A5 algorithm, which needs 64 bits for the initialization. Therefore the last 10 bits of the A5 initialization value are zeros.
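How the 12 output bytes could be assembled from the final 32 nibbles is sketched below. The function name `extract_output` and the layout of `out` (4 bytes SRES followed by 8 bytes Kc) are assumptions made for illustration; the bit positions follow the description above (first 32 bits, last 54 bits, 10 zero bits).

```c
#include <string.h>

/* x   : 32 nibbles (128 bits) left after the last round
 * out : 12 bytes, assumed layout SRES[4] || Kc[8] */
static void extract_output(const unsigned char x[32], unsigned char out[12])
{
    unsigned char bit[128];
    int i, k;

    for (i = 0; i < 32; i++)
        for (k = 0; k < 4; k++)
            bit[4 * i + k] = (x[i] >> (3 - k)) & 1;

    memset(out, 0, 12);

    /* SRES: the first 32 bits */
    for (i = 0; i < 32; i++)
        out[i / 8] |= bit[i] << (7 - i % 8);

    /* Kc: the last 54 bits (positions 74..127); the remaining
     * 10 bits of the 64-bit value stay zero */
    for (i = 0; i < 54; i++)
        out[4 + i / 8] |= bit[74 + i] << (7 - i % 8);
}
```

With all nibbles set to 0xF, the last two bytes come out as 0xFC and 0x00, which makes the ten zero bits visible.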
2.2. Security Vulnerabilities

On April 13, 1998 Marc Briceno, Ian Goldberg and David Wagner published an attack on COMP128 with which it is possible to find out the secret key Ki. Because the whole GSM specification (footnote 2) is publicly available, one can then clone a SIM card and spy on the owner's phone calls or misuse the SIM for free calls. Briceno et al. described an attack that requires physical access to the card for about 8 hours, but they assumed that the attack could also be carried out over the air.

The GSM consortium reacted to this and worked on new versions of COMP128. The original function was renamed COMP128-1, and two other versions were developed which are unpublished to this day. The second version seems to be a new way of calculating the response, whereas COMP128-3 merely extends the length of the response used to initialize A5. Most providers, however, presumably still use COMP128-1 with only one change to the SIM card: they set a maximum number of requests the SIM answers before it becomes useless, which is smaller than the roughly 164,000 challenges needed on average to perform the attack successfully.

Two years after the first published attack, a research group at IBM in cooperation with the Swiss Federal Institute of Technology published the so-called Partitioning Attack, which is based on differential power analysis (DPA). The result is an attack which needs just 8 requests to be sent to the SIM. With that attack it is possible to copy every SIM (footnote 3) to which one has physical access for only a few seconds. In particular it is no longer possible to protect the SIM by just setting a maximum number of requests, since a SIM usually has to respond to many more than 8 requests within a few days.

Meanwhile a fourth version is available, which is again completely new. The greatest improvement is that GSM changed from its former security model, security by obscurity, to a public standard based on AES. The COMP128-4 algorithm is used in UMTS networks.

Footnote 2: excluding the cryptographic algorithms
Footnote 3: using COMP128-1
2.2.1. The Narrow Pipe Attack

The following information is primarily based on chapter 10 of the diploma thesis [6] by Eric Zenner, in which the original publication has been worked out in detail. The attack is a collision attack, which is possible first of all because 95% of the collisions in the output originate in the second round (footnote 4), whereas a collision in the first round is impossible since it is a one-to-one mapping (footnote 5). With their analysis Briceno et al. found out that "in particular, bytes i,i+8,i+16,i+24 at the output of the second round depend only on bytes i,i+8,i+16,i+24 of the input to COMP128." (refer to [5]). This fact is called the Narrow Pipe. Due to the birthday paradox one can expect a collision after about $2^{14.326}$ different challenges. From this it follows that one can derive 2 bytes of the key after every 20,538 challenges. Since the key has a length of 16 bytes, it takes about $8 \cdot 2^{14.326} \approx 164{,}300$ challenges to learn the whole key. A normal SIM card can respond to about 6 challenges per second. Thus it takes about 7.5 h to collect the required data. To find the key fragments one can simply try every possibility (footnote 6), since this takes only a few seconds on an average personal computer. In detail, one takes the pair of colliding challenges and combines each of them with the same key guess. Then one calculates the first two rounds of COMP128 and checks the result for a collision in the bytes that depend on the guessed key bytes and the corresponding challenge bytes. If all four pairs match, the guessed key fragments are correct.
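The figures quoted above can be retraced with the usual birthday estimate. The derivation below is a sketch under one assumption that is not spelled out in the text: the narrow pipe consists of four 7-bit values after the second round, so that there are N = 2^28 possible states.

```latex
% Expected number of challenges until the first collision among N = 2^{28} states
\[
  E \approx \sqrt{\frac{\pi N}{2}}
    = \sqrt{\frac{\pi}{2}} \cdot 2^{14}
    \approx 1.2533 \cdot 16384
    \approx 2^{14.326}
\]
% Eight independent key fragments of 2 bytes each, 6 challenges per second
\[
  8 \cdot 2^{14.326} \approx 164{,}300
  \qquad
  \frac{164{,}300}{6\,\mathrm{s}^{-1}} \approx 27{,}400\,\mathrm{s}
  \approx 7.6\,\mathrm{h}
\]
```

The result agrees with the 7.5 h stated above up to rounding.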
2.2.2. The Partitioning Attack

The partitioning attack generally exploits the low speed of smart card processors, which usually makes it indispensable to avoid some calculations and look up the results in tables instead. These table lookups can have characteristics which help the attacker. In the case of COMP128 the important characteristic is that the processor can only address 8 bits, whereas the first S-box consists of 512 values, which need an address space of 9 bits. Thus the table needs to be split into at least two pieces. The IBM engineers assumed that the table is simply split in the middle. Based on that assumption they carried out detailed measurements, with which they could finally design their partitioning attack. The resulting procedure can be completed with a maximum of 1000 random challenges or 255 chosen challenges. Finally the attack was reduced to 8 adaptively chosen challenges, which makes it possible to clone a SIM within just 2 seconds. In any case, this attack can only be carried out with physical contact and not over the air. For detailed information refer to the published paper [7], where one can also find suggestions for protecting the SIM.

Footnote 4: "By round, I refer to one layer of butterflies and S-boxes; there are a total of 5*8 rounds in COMP128." (refer to [5])
Footnote 5: refer to the S-boxes
Footnote 6: 65,536 possibilities
3. Implementing the Algorithm

The program developed in the following consists of two main parts. On the one hand there is the HSM, into which we download the Functionality Module; on the other hand there is a host from which we want to send requests to the HSM. The communication between the host and the HSM can be seen as a client/server architecture, as illustrated in Figure 3.1. That figure is independent of the actual connection between client and server. We do not make use of the PKCS #11 interface but write a so-called custom function. Thus a user-defined communication protocol is needed, defining the packets sent to and received from the HSM. Since the reference implementation calls A3A8 with the challenge and the key as arguments, the FM will receive both values as well. Another possibility would be to let the HSM generate a challenge with its random number generator and send it to the host. Moreover, in a secure environment the key would be passed as encrypted data. There is no security associated with this implementation, since the key is passed into the HSM in clear text! All source code in this paper makes use of variable types such as uint8 and uint32. These types denote unsigned variables of the specified length (e.g. 8 bits) in the environment used.
Figure 3.1.: Client-Server principle
3.1. The Host

On the host side we need two functions in particular. First we need the main function, which communicates with the user and receives the challenge to be processed. In addition a function is needed which takes the user's input, converts it into the defined packet format and sends the packet to the HSM, using the HSM Access Provider. The main function could be changed, for example to receive good random values from a special source, but the host application is static.

Listing 3.1: The main-function
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    /* The header of the HSM Access Provider (MD_* definitions) is required as well. */

    int hextoint(char);
    void comp128(unsigned char[16], unsigned char[16], unsigned char[12]);

    int main(int argc, char **argv)
    {
        unsigned char challenge[16] = {0}, key[16] = {0}, response[12] = {0}, i = 0;
        MD_RV rv = MDR_UNSUCCESSFUL;

        /* the first two characters of each argument are skipped */
        for (; i < 16; i++)
            challenge[i] = (hextoint(argv[1][2*i+2]) << 4) | hextoint(argv[1][2*i+3]);
        for (i = 0; i < 16; i++)
            key[i] = (hextoint(argv[2][2*i+2]) << 4) | hextoint(argv[2][2*i+3]);

        rv = MD_Initialize();
        if (rv != MDR_OK)
            printf("\nAn error has occurred: %d\n", rv);
        comp128(challenge, key, response);
        return 0;
    }

    /* convert one hex digit to its value; abort on invalid input */
    int hextoint(char x)
    {
        x = toupper(x);
        if (x >= 'A' && x <= 'F')
            return x - 'A' + 10;
        else if (x >= '0' && x <= '9')
            return x - '0';
        printf("bad input.\n");
        exit(1);
    }
The main function reads two 16-byte hex values from the command line. After converting them to integers, with a validity check on every digit, it calls the host application, which builds the packets and sends them to the HSM.

Listing 3.2: The host-application
    void comp128(unsigned char challenge[16], unsigned char key[16],
                 unsigned char response[12])
    {
        unsigned long pReceivedLen = 0, pFmStatus = 0, i = 0;
        MD_RV status = MDR_UNSUCCESSFUL;
        MD_Buffer_t Request[3], Reply[2];

        /* request: key followed by challenge, terminated by {NULL, 0} */
        Request[0].pData = key;
        Request[0].length = 16 * sizeof(unsigned char);
        Request[1].pData = challenge;
        Request[1].length = 16 * sizeof(unsigned char);
        Request[2].pData = NULL;
        Request[2].length = 0;

        /* reply: 12 byte response, terminated by {NULL, 0} */
        Reply[0].pData = response;
        Reply[0].length = 12 * sizeof(unsigned char);
        Reply[1].pData = NULL;
        Reply[1].length = 0;

        status = MD_SendReceive(0, 0, FM_NUMBER_CUSTOM_FM, Request, RESERVED,
                                Reply, &pReceivedLen, &pFmStatus);
        if (status != MDR_OK)
            printf("\nAn error has occurred: %d\n", status);
    }
The packet definition is done with an array of structures. Each struct contains a pointer to the data block, followed by its length:
    typedef struct {
        uint8 *pData;
        uint32 length;
    } MD_Buffer_t;
MD_RV is a type definition covering all of the HSM's possible return codes. The function MD_SendReceive finally calls the Access Provider:
    MD_RV MD_SendReceive(
        uint32       hsmIndex,      /* indicates HSM to communicate with */
        uint32       originatorId,  /* request originator's ID. Usually 0 */
        uint16       fmNumber,      /* indicates the demanded FM */
        MD_Buffer_t *pReq,          /* send buffer, last element has to be {NULL, 0} */
        uint32       reserved,      /* not used, has to be 0 */
        MD_Buffer_t *pResp,         /* receive buffer, last element has to be {NULL, 0} */
        uint32      *pReceivedLen,  /* pointer to variable holding size of received buffer */
        uint32      *pFmStatus);    /* pointer to variable containing the FM's status message */
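To make the packet format more tangible, the following sketch concatenates a {NULL, 0}-terminated descriptor list into one contiguous block. It is not part of the Access Provider: `demo_buffer_t` and `flatten_request` are hypothetical names, and the back-to-back layout is an assumption that matches how the message handler in Section 3.2.1 reads the key at offset 0 and the challenge at offset 16.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the buffer descriptor described above. */
typedef struct {
    unsigned char *pData;
    unsigned long  length;
} demo_buffer_t;

/* Concatenate a {NULL, 0}-terminated descriptor list into dst.
 * Returns the number of bytes written, or 0 if dst is too small. */
static size_t flatten_request(const demo_buffer_t *req,
                              unsigned char *dst, size_t cap)
{
    size_t used = 0;

    for (; req->pData != NULL; req++) {
        if (req->length > cap - used)
            return 0;                       /* would overflow dst */
        memcpy(dst + used, req->pData, req->length);
        used += req->length;
    }
    return used;
}
```

For a request consisting of a 16-byte key followed by a 16-byte challenge, the function returns 32 and the challenge starts at offset 16 of the flattened block.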
3.2. The Functionality Module

3.2.1. The Skeletal Structure
A Functionality Module usually consists of at least two functions.

The startup function

Every FM includes a startup function, which is called after the FM has been downloaded to the HSM. The purpose of this function is to attach the FM to the firmware, so that function calls with the ID 0x0100 are forwarded to the FM. The FM usually offers an interface (footnote 1) which prepares the received data, calls the core function and sends the result back to the firmware's message processor.
    #define FM_NUMBER_CUSTOM_FM 0x0100

    FM_RV Startup(void)
    {
        FM_RV rv;
        rv = FMSW_RegisterDispatch(FM_NUMBER_CUSTOM_FM, MessageHandler);
        return rv;
    }
The function FMSW_RegisterDispatch() is defined as follows:

    FMSW_STATUS FMSW_RegisterDispatch(
        FMSW_FmNumber_t   fmNumber,   /* FM number */
        FMSW_DispatchFn_t dispatch);  /* dispatch function pointer */
Here FM_NUMBER_CUSTOM_FM is the ID defined for the user's individual FM (footnote 2). The framework of the dispatch function is defined as follows:
Footnote 1: the so-called dispatch function
Footnote 2: there are other functionality modules which represent basic HSM functionalities and cannot be changed
    typedef void (*FMSW_DispatchFn_t)(
        unsigned long token,      /* A token used to allocate reply buffers,
                                     and send the reply back to host. */
        void         *reqBuffer,  /* Pointer to the request buffer. */
        unsigned long reqLength); /* Length of the request buffer. */
The dispatch function

The function MessageHandler receives a pointer to the beginning of the memory area in which all received information can be found. The arrangement of the data within that memory is defined by the host application; hence it is up to the developer to extract the information:
    static void MessageHandler(unsigned long token, void *reqBuffer,
                               unsigned long reqLength)
    {
        unsigned char *pRand, *pKey, *pSimoutput;

        pSimoutput = SVC_GetReplyBuffer(token, 12 * sizeof(unsigned char));
        pKey = (unsigned char *)reqBuffer;
        pRand = (unsigned char *)(reqBuffer + 16 * sizeof(unsigned char));
        comp128(pRand, pKey, pSimoutput);
        SVC_SendReply(token, 0);
    }
In this case the message handler prepares the received data, allocates memory for the response, calls the core function comp128 and sends the result back to the host, using SVC_SendReply(). That function is defined as follows:
    void SVC_SendReply(
        HI_MsgHandle token,              /* The token identifying the request */
        uint32       applicationStatus); /* A status code for the execution of the
                                            request, which will be reported to the
                                            host application. The value of this
                                            parameter does not affect the reply
                                            delivery in any way */
The function SVC_GetReplyBuffer returns a pointer to a memory block of the requested size and associates it with the token, which is later sent back to the calling host application.
3.2.2. The Core Function

The core function is akin to the reference function which is also used to describe the algorithm COMP128. The only difference is that in the reference code by Briceno et al. the parameters are passed as arrays, whereas pointers to these arrays are passed in the reference version for the FM. Moreover, the order of the parameters has been corrected, since they appear in the wrong order in the original version. Refer to the appendix for the reference version as well as the optimized version of the core function. The next chapter discusses the changes to the reference code and their impact.
3.3. Chronometry
The performance is derived from the round trip time (RTT), not only from the runtime on the HSM. But since the measurement is done with an HSM plugged into the host's PCI bus, there will be no great difference between the RTT and the FM's runtime. The following function can be used for any timing purpose with precise results, assuming that the measured function is much more time consuming than the loop itself:
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int measure(long rounds)
    {
        long i;
        double total;
        struct timespec time1, time2;

        if (clock_gettime(CLOCK_REALTIME, &time1))
            exit(1);
        for (i = 0; i < rounds; i++)
            ; /* put in function to be measured. */
        if (clock_gettime(CLOCK_REALTIME, &time2))
            exit(1);

        /* elapsed time in seconds */
        total = (time2.tv_sec - time1.tv_sec)
              + (double)(time2.tv_nsec - time1.tv_nsec) / 1000000000.;
        printf("\ntotal time: %f\niterations: %ld\ntime per pass: %f\ncalls per second: %f\n",
               total, rounds, total / rounds, rounds / total);
        return 0;
    }
4. Code Optimization
In this chapter the reader can retrace the individual ideas for increasing the performance and their impact on the overall result. For this reason each change is based on the reference code, as long as it does not depend on prior changes. The source code is compiled with and without the optimization option provided by gcc. This way one can see whether optimized C code can be compiled more efficiently or whether the impact of the manual changes decreases with an increasing optimization level. Moreover, the equivalent IBM-compatible code is tested. That source code can be found in the appendix, so that the reader has a chance to retrace the changes and compile the code on a standard PC. New code, e.g. a new definition, is printed separately, whereas changes are shown side by side with the pristine code and its line numbers.
4.1. Single Improvements

1. MODIFICATION

Instead of passing three parameters to the core function, a single structure is passed, containing one array each for the key and the challenge and one pointer for the output. That way we get rid of one address: the struct indicates the start of the first array and implicitly that of the second one. Since we cannot define our own array for the output but need to call SVC_GetReplyBuffer (refer to Section 3.2.1), it would be impractical not to use that pointer.
Modified code:

    struct parameter {
        uint8 rand[16];
        uint8 key[16];
        uint8 *simoutput;
    };
    struct parameter par;

    memcpy(par.key, reqBuffer, 16);
    memcpy(par.rand, (reqBuffer + 16 * sizeof(uint8)), 16);
    par.simoutput = SVC_GetReplyBuffer(token, 12 * sizeof(uint8));
    A3A8(par);

Reference code (lines 21-25):

    uint8 *pKey, *pRand, *pSimoutput;
    pKey = (uint8 *)reqBuffer;
    pRand = (uint8 *)(reqBuffer + 16 * sizeof(uint8));
    pSimoutput = SVC_GetReplyBuffer(token, ...);
    A3A8(pRand, pKey, pSimoutput);
2. MODIFICATION

In this modification, multiplications by powers of two are replaced by bit shifts. The implementation is shown by way of example in the following snippet:
Modified code:

    y = (x[m] + (x[n] << 1)) % (1 << (9-j));
    z = ((x[m] << 1) + x[n]) % (1 << (9-j));

Reference code (lines 50-51):

    y = (x[m] + 2*x[n]) % (1 << (9-j));
    z = (2*x[m] + x[n]) % (1 << (9-j));
3. MODIFICATION

The modulo operation is computationally expensive. Since every modulus that appears is a power of two, the operation can be replaced by a bitwise AND. Every non-negative integer b with k binary digits can be written as follows:

$$b = \sum_{i=0}^{k-1} a_i 2^i = \sum_{i=0}^{n-1} a_i 2^i + \sum_{i=n}^{k-1} a_i 2^i, \qquad a_i \in \{0; 1\},\; k \geq n+1$$

The second part of this expression is divisible by $2^n$:

$$\frac{1}{2^n} \sum_{i=n}^{k-1} a_i 2^i = \sum_{i=n}^{k-1} a_i \frac{2^i}{2^n} = \sum_{i=n}^{k-1} a_i 2^{i-n}, \qquad i \geq n \Rightarrow i - n \geq 0$$

The first part of the expression is smaller than $2^n$. Since the second part is divisible by $2^n$, the first part is the remainder:

$$\sum_{i=0}^{n-1} a_i 2^i < 2^n \;\Rightarrow\; b \bmod 2^n = \Bigl(\sum_{i=0}^{k-1} a_i 2^i\Bigr) \bmod 2^n = \sum_{i=0}^{n-1} a_i 2^i$$

For this reason we can replace the modulo operation by simply extracting the low-order bits:
Modified code:

    y = (x[m] + 2*x[n]) & ((1 << (9-j)) - 1);
    z = (2*x[m] + x[n]) & ((1 << (9-j)) - 1);
    ...
    next_bit = ((8*j + k)*17) & 127;

Reference code (lines 50, 51 and 64):

    y = (x[m] + 2*x[n]) % (1 << (9-j));
    z = (2*x[m] + x[n]) % (1 << (9-j));
    ...
    next_bit = ((8*j + k)*17) % 128;
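The equivalence can also be checked exhaustively for the values that occur here. The helper below is a hypothetical test function, not part of the thesis code; it assumes that every x[ ] value fits into one byte, so that the sums in y and z never exceed 3 * 255.

```c
/* Returns 1 if v % 2^n equals v & (2^n - 1) for every modulus used in
 * the compression (n = 5..9) and every sum that can occur in y and z. */
static int mask_equals_modulo(void)
{
    unsigned n, v;

    for (n = 5; n <= 9; n++)
        for (v = 0; v <= 3u * 255u; v++)
            if (v % (1u << n) != (v & ((1u << n) - 1u)))
                return 0;
    return 1;
}
```

Note that the replacement is only valid for non-negative operands; for negative signed values the `%` operator in C yields a different result than the bit mask.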
4. MODIFICATION

Now we try to get rid of the loops which initialize the key and the challenge, because a loop is in general computationally expensive. There are two neat ways to replace the loop by single commands; the second one is less common but applicable nonetheless. First we set the values by using memcpy:
Modified code:

    /* load RAND into last 16 bytes of input */
    memcpy(&(x[16]), rand, 16);
    /* loop eight times */
    for (i = 1; i < 9; i++) {
        /* load KEY into first 16 bytes of input */
        memcpy(&(x[0]), key, 16);

Reference code (lines 16-24):

    /* load RAND into last 16 bytes of input */
    for (i = 16; i < 32; i++)
        x[i] = *(rand + i - 16);
    /* loop eight times */
    for (i = 1; i < 9; i++) {
        /* load KEY into first 16 bytes of input */
        for (j = 0; j < 16; j++)
            x[j] = *(key + j);
The other way is to set the arrays with the help of 32-bit integers. As mentioned above, this is not a common method, but it is feasible:
Modified:

    /* load RAND into last 16 bytes of input */
    ((uint32 *)x)[4] = ((uint32 *)(rand))[0];
    ((uint32 *)x)[5] = ((uint32 *)(rand))[1];
    ((uint32 *)x)[6] = ((uint32 *)(rand))[2];
    ((uint32 *)x)[7] = ((uint32 *)(rand))[3];
    /* loop eight times */
    for (i=1; i<9; i++) {
        /* load KEY into first 16 bytes of input */
        ((uint32 *)x)[0] = ((uint32 *)(key))[0];
        ((uint32 *)x)[1] = ((uint32 *)(key))[1];
        ((uint32 *)x)[2] = ((uint32 *)(key))[2];
        ((uint32 *)x)[3] = ((uint32 *)(key))[3];

Original:

    /* load RAND into last 16 bytes of input */
    for (i=16; i<32; i++)
        x[i] = *(rand + i - 16);
    /* loop eight times */
    for (i=1; i<9; i++) {
        /* load KEY into first 16 bytes of input */
        for (j=0; j<16; j++)
            x[j] = *(key + j);
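As a sanity check for this modification, a sketch along the following lines confirms that loading with memcpy fills x[] exactly as the two byte loops do. The function names load_with_loop, load_with_memcpy and loads_agree are hypothetical, and the key and challenge bytes are arbitrary test data.

```c
#include <string.h>

/* Original approach: copy challenge and key byte by byte. */
static void load_with_loop(unsigned char x[32],
                           const unsigned char key[16],
                           const unsigned char chal[16])
{
    int i;
    for (i = 16; i < 32; i++)
        x[i] = chal[i - 16];
    for (i = 0; i < 16; i++)
        x[i] = key[i];
}

/* Modified approach: one memcpy call per 16-byte half. */
static void load_with_memcpy(unsigned char x[32],
                             const unsigned char key[16],
                             const unsigned char chal[16])
{
    memcpy(&x[16], chal, 16);
    memcpy(&x[0], key, 16);
}

/* Returns 1 if both approaches produce the same 32-byte input block. */
int loads_agree(void)
{
    unsigned char key[16], chal[16], a[32], b[32];
    int i;
    for (i = 0; i < 16; i++) {
        key[i] = (unsigned char)(i * 7 + 1);     /* arbitrary test data */
        chal[i] = (unsigned char)(255 - i * 3);  /* arbitrary test data */
    }
    load_with_loop(a, key, chal);
    load_with_memcpy(b, key, chal);
    return memcmp(a, b, sizeof a) == 0;
}
```

Unlike the variant with word-sized casts, memcpy makes no assumption about the alignment of the buffers or about the width of the integer type.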
The two variants make no great difference. Which of them performs better depends on the individual architecture and compiler. Since the first version is more usual and the performance is very similar in this case, we use memcpy in our final version.

5. MODIFICATION
In this modification we get rid of the calculations for m and n. Whether it is better to do a table lookup or to calculate the values depends on the individual case. Since we have enough memory and want to reach the best performance, a table lookup is the better choice here. Along with that we can get rid of one loop and no longer need the variable l.
Modified:

    const unsigned char
        j_0[32] = {...},
        j_1[32] = {...},
        j_2[32] = {...},
        j_3[32] = {...},
        j_4[32] = {...},
        *spairs[5] = {j_0, j_1, j_2, j_3, j_4};

    for (j=0; j<5; j++)
        for (k=0; k<16; k++) {
            m = spairs[j][k];
            n = spairs[j][k+16];
            y = (x[m]+2*x[n]) % (1<<(9-j));
            z = (2*x[m]+x[n]) % (1<<(9-j));
            x[m] = table[j][y];
            x[n] = table[j][z];
        }

Original:

    for (j=0; j<5; j++)
        for (k=0; k<(1<<j); k++)
            for (l=0; l<(1<<(4-j)); l++) {
                m = l + k*(1<<(5-j));
                n = m + (1<<(4-j));
                y = (x[m]+2*x[n]) % (1<<(9-j));
                z = (2*x[m]+x[n]) % (1<<(9-j));
                x[m] = table[j][y];
                x[n] = table[j][z];
            }
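The contents of the lookup tables follow directly from the original index calculation. The following sketch shows one way to generate them, assuming the layout used in the listing above (entries 0..15 hold m, entries 16..31 hold n). The function name build_spairs is hypothetical.

```c
/* Fills pairs[j][0..15] with the m indices and pairs[j][16..31] with the
 * n indices of round j, in the order in which the original triple loop
 * over j, k and l visits them. */
void build_spairs(unsigned char pairs[5][32])
{
    int j, k, l, idx;
    for (j = 0; j < 5; j++) {
        idx = 0;
        for (k = 0; k < (1 << j); k++)
            for (l = 0; l < (1 << (4 - j)); l++) {
                int m = l + k * (1 << (5 - j));
                int n = m + (1 << (4 - j));
                pairs[j][idx] = (unsigned char)m;
                pairs[j][idx + 16] = (unsigned char)n;
                idx++;
            }
    }
}
```

For round j = 1, for example, this yields m = 0..7, 16..23 and n = 8..15, 24..31, which agrees with spairs_1 in the optimized code of appendix A.2.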
6. MODIFICATION
6.1 Now the j-loop is replaced by manual coding. For this purpose the commands within the loop are defined as the macro loop(j). This way we get rid of the repeated counting and comparisons, which are computationally expensive.
Modified:

    #define loop(j) for (k=0; k<16; k++) { \
        m=spairs[j][k]; \
        n=spairs[j][k+16]; \
        y=(par.x[m]+(par.x[n]<<1))&((1<<(9-j))-1); \
        z=((par.x[m]<<1)+par.x[n])&((1<<(9-j))-1); \
        par.x[m]=table[j][y]; \
        par.x[n]=table[j][z]; \
    }

    loop(0);
    loop(1);
    loop(2);
    loop(3);
    loop(4);

Original:

    for (j=0; j<5; j++)
        for (k=0; k<16; k++) {
            m = spairs[j][k];
            n = spairs[j][k+16];
            y = (x[m]+2*x[n]) % (1<<(9-j));
            z = (2*x[m]+x[n]) % (1<<(9-j));
            x[m] = table[j][y];
            x[n] = table[j][z];
        }
6.2 The two-dimensional arrays table (the S-boxes) and spairs are reduced to one dimension. This is quite easy once modification 6.1 is in place: the macro argument j is simply pasted into the array name with the preprocessor operator ##:
Modified:

    #define loop(j) for (k=0; k<16; k++) { \
        m=spairs_##j[k]; \
        n=spairs_##j[k+16]; \
        y=(x[m]+(x[n]<<1))&((1<<(9-j))-1); \
        z=((x[m]<<1)+x[n])&((1<<(9-j))-1); \
        x[m]=table_##j[y]; \
        x[n]=table_##j[z]; \
    }

Original:

    #define loop(j) for (k=0; k<16; k++) { \
        m=spairs[j][k]; \
        n=spairs[j][k+16]; \
        y=(x[m]+(x[n]<<1))&((1<<(9-j))-1); \
        z=((x[m]<<1)+x[n])&((1<<(9-j))-1); \
        x[m]=table[j][y]; \
        x[n]=table[j][z]; \
    }
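The effect of the ## operator can be seen in isolation in the following small sketch. The arrays demo_0 and demo_1, the macro DEMO_AT and the function demo_lookup_sum are hypothetical stand-ins for table_0, table_1 and the loop macro; they are not part of the thesis code.

```c
/* Hypothetical one-dimensional tables standing in for table_0 and table_1. */
static const unsigned char demo_0[4] = {10, 11, 12, 13};
static const unsigned char demo_1[4] = {20, 21, 22, 23};

/* DEMO_AT(1, i) expands to demo_1[(i)]: the preprocessor glues the literal
 * round number onto the array name, so no table of pointers is needed. */
#define DEMO_AT(j, i) demo_##j[(i)]

/* Reads demo_0[2] and demo_1[3] through the macro and adds them. */
int demo_lookup_sum(void)
{
    return DEMO_AT(0, 2) + DEMO_AT(1, 3);
}
```

Since the pasting happens at compile time, the first argument has to be a literal number. This is why the step only becomes possible after the j-loop has been unrolled in modification 6.1.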
7. MODIFICATION
In this modification we again try to increase the performance by substituting C code by macro code. This time we substitute the k-loop which forms bits from bytes.
Modified:

    #define loop3(k) bit[4*j+k] = (x[j]>>(3-k)) & 1;

    /* form bits from bytes */
    for (j=0; j<32; j++) {
        loop3(0);
        loop3(1);
        loop3(2);
        loop3(3);
    }

Original:

    /* form bits from bytes */
    for (j=0; j<32; j++)
        for (k=0; k<4; k++)
            bit[4*j+k] = (x[j]>>(3-k)) & 1;
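To illustrate what this step computes, the following sketch places the loop version and an unrolled version side by side and compares their results. The names form_bits_loop, form_bits_unrolled, form_bits_agree and FORM_BIT are hypothetical; FORM_BIT plays the role of loop3 in the listing above.

```c
#include <string.h>

/* Loop version, as in the reference code: bit[4*j+k] is bit (3-k) of x[j],
 * so each nibble is written out most significant bit first. */
void form_bits_loop(const unsigned char x[32], unsigned char bit[128])
{
    int j, k;
    for (j = 0; j < 32; j++)
        for (k = 0; k < 4; k++)
            bit[4 * j + k] = (x[j] >> (3 - k)) & 1;
}

/* Unrolled version: the macro body is pasted four times per nibble. */
#define FORM_BIT(k) bit[4 * j + (k)] = (x[j] >> (3 - (k))) & 1

void form_bits_unrolled(const unsigned char x[32], unsigned char bit[128])
{
    int j;
    for (j = 0; j < 32; j++) {
        FORM_BIT(0);
        FORM_BIT(1);
        FORM_BIT(2);
        FORM_BIT(3);
    }
}

/* Returns 1 if both versions agree for every possible nibble value. */
int form_bits_agree(void)
{
    unsigned char x[32], a[128], b[128];
    int j;
    for (j = 0; j < 32; j++)
        x[j] = (unsigned char)(j & 15);
    form_bits_loop(x, a);
    form_bits_unrolled(x, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

A nibble with the value 0xA, for example, is split into the four bits 1, 0, 1, 0.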
8. MODIFICATION
8.1 Now we try to improve the performance by changing the code for the permutation. To this end the inner loop is replaced by manual code:
Modified:

    /* permutation but not on the last loop */
    if (i < 8)
        for (j=0; j<16; j++) {
            x[j+16] = 0;
            next_bit = (8*j) & 127;
            x[j+16] |= bit[next_bit] << 7;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 6;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 5;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 4;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 3;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 2;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 1;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit];
        }

Original:

    /* permutation but not on the last loop */
    if (i < 8)
        for (j=0; j<16; j++) {
            x[j+16] = 0;
            for (k=0; k<8; k++) {
                next_bit = ((8*j + k)*17) % 128;
                x[j+16] |= bit[next_bit] << (7-k);
            }
        }
8.2 If we manage not to use the array x[] in every step of the permutation, we can save indexing overhead. For this reason we introduce the new variable x_tmp, assemble the byte in it and write each byte to the appropriate position of the array in one step.
Modified:

    /* permutation but not on the last loop */
    if (i < 8)
        for (j=0; j<16; j++) {
            x_tmp = 0;
            next_bit = (8*j) & 127;
            x_tmp |= bit[next_bit] << 7;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 6;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 5;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 4;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 3;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 2;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 1;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit];
            x[j+16] = x_tmp;
        }

Original:

    /* permutation but not on the last loop */
    if (i < 8)
        for (j=0; j<16; j++) {
            x[j+16] = 0;
            next_bit = (8*j) & 127;
            x[j+16] |= bit[next_bit] << 7;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 6;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 5;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 4;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 3;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 2;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit] << 1;
            next_bit = (next_bit+17) & 127;
            x[j+16] |= bit[next_bit];
        }
8.3 Now the j-loop of the permutation gets substituted by macro code.
Modified:

    #define loop2(j) \
        x_tmp = 0; \
        next_bit = (8*j) & 127; \
        x_tmp |= bit[next_bit] << 7; \
        next_bit = (next_bit+17) & 127; \
        x_tmp |= bit[next_bit] << 6; \
        next_bit = (next_bit+17) & 127; \
        x_tmp |= bit[next_bit] << 5; \
        next_bit = (next_bit+17) & 127; \
        x_tmp |= bit[next_bit] << 4; \
        next_bit = (next_bit+17) & 127; \
        x_tmp |= bit[next_bit] << 3; \
        next_bit = (next_bit+17) & 127; \
        x_tmp |= bit[next_bit] << 2; \
        next_bit = (next_bit+17) & 127; \
        x_tmp |= bit[next_bit] << 1; \
        next_bit = (next_bit+17) & 127; \
        x_tmp |= bit[next_bit]; \
        x[j+16] = x_tmp;

    /* permutation but not on the last loop */
    if (i < 8) {
        loop2(0);
        loop2(1);
        loop2(2);
        loop2(3);
        loop2(4);
        loop2(5);
        loop2(6);
        loop2(7);
        loop2(8);
        loop2(9);
        loop2(10);
        loop2(11);
        loop2(12);
        loop2(13);
        loop2(14);
        loop2(15);
    }

Original:

    /* permutation but not on the last loop */
    if (i < 8)
        for (j=0; j<16; j++) {
            x_tmp = 0;
            next_bit = (8*j) & 127;
            x_tmp |= bit[next_bit] << 7;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 6;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 5;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 4;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 3;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 2;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit] << 1;
            next_bit = (next_bit+17) & 127;
            x_tmp |= bit[next_bit];
            x[j+16] = x_tmp;
        }
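The unrolled permutation in modification 8 no longer evaluates ((8*j + k)*17) % 128 for every bit. It starts at (8*j) & 127 and adds 17 in each step. This works because 17 * 8j = 136j, and 136 leaves the remainder 8 when divided by 128. The following sketch checks the equivalence for all 128 index values; the function name incremental_index_matches is hypothetical.

```c
/* Returns 1 if the incrementally updated index used in the unrolled code
 * equals the closed form ((8*j + k) * 17) % 128 of the reference code for
 * all j = 0..15 and k = 0..7. Returns 0 on the first mismatch. */
int incremental_index_matches(void)
{
    int j, k;
    for (j = 0; j < 16; j++) {
        int next_bit = (8 * j) & 127;           /* start value for k = 0 */
        for (k = 0; k < 8; k++) {
            if (next_bit != ((8 * j + k) * 17) % 128)
                return 0;
            next_bit = (next_bit + 17) & 127;   /* step to k + 1 */
        }
    }
    return 1;
}
```

The multiplication and the modulo operation per bit are thus replaced by one addition and one AND.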
4.2. Results
Rank        Standard  Increase    Size  Optimized  Increase    Size
REFERENCE        903         -    3204       1639         -    2336
MOD1             924     3.33%    3256       1704     3.97%    2608
MOD2             916     1.44%    2988       1639        0%    2336
MOD3            1266    40.20%    2984       3653   122.88%    2296
MOD4             926     2.55%    2916       1703     3.90%    2292
MOD5             933     3.32%    3160       1781     8.66%    2456
MOD6.1          1436    59.03%    3996       3947   140.82%    2684
MOD6.2          1510    67.22%    3752       3961   141.67%    2568
MOD7             979     8.42%    3292       1719     4.88%    2556
MOD8.1          1005    11.30%    3456       1753     6.96%    2484
MOD8.2          1036    14.73%    3244       1788     9.09%    2412
MOD8.3          1046    15.84%    9060       1810     1.04%    4204
FINAL           2399   165.67%    9920       7066   331.12%    4628

Table 4.1.: PSO 450, Intel StrongARM SA-110, 266 MHz (RISC Processor)
Rank        Standard  Increase    Size  Optimized  Increase    Size
REFERENCE       6436         -    6960      10051         -    6604
MOD1            6536     1.55%    7028      11179     1.16%    6672
MOD2            6438     0.03%    6960      11052        0%    6604
MOD3           11518    78.96%    6864      25318   129.10%    6540
MOD4            6536     1.55%    6964      11290     2.16%    6636
MOD5            6467     0.48%    6896      12317    11.46%    6412
MOD6.1         10501    63.16%    7632      38168   245.38%    6636
MOD6.2         10620    65.01%    7600      41947   279.58%    6540
MOD7            6900     7.21%    6992      11796     6.74%    6604
MOD8.1          8178    27.07%    7248      12620    14.20%    6700
MOD8.2          8342    29.61%    7152      12622    14.22%    6700
MOD8.3          8454    31.35%   12688      13064    18.22%    7692
FINAL          27478   326.94%   13396      90893   722.49%    7824

Table 4.2.: Intel Pentium III (Coppermine), 864.214 MHz (CISC Processor)
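The Increase columns can be read as the relative gain of a modification over the REFERENCE row of the same column group. This reading is an assumption inferred from the figures: it reproduces, for example, the MOD3 entries of the Standard columns in both tables. A minimal sketch of the calculation, with the hypothetical function name increase_percent:

```c
/* Relative gain of a modified build over the reference build, in percent.
 * Both arguments are performance figures taken from the same column of
 * one of the result tables. */
double increase_percent(double reference, double modified)
{
    return (modified / reference - 1.0) * 100.0;
}
```

With the Standard figures of Table 4.1, increase_percent(903, 1266) gives about 40.20, and with those of Table 4.2, increase_percent(6436, 11518) gives about 78.96.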
5. Conclusion
In this work one could see that the way of programming has much influence on the performance of the binary code. To get an idea of how to increase the performance, one should always try to imagine how the processor works. It is, for example, well known that computers work on the basis of binary values, and processors are indeed very fast at binary operations. A modulo calculation, by contrast, requires a division, which on a processor without a divide instruction has to be realized in software, in the simplest case by repeated subtraction until the result is smaller than the modulus. This can be very time consuming. Thus one should try to find another way to calculate the result, which is what we did in the third modification.

Which modification is considered the best one depends on the point of view. One can take just the absolute increase in performance into account, but one can also try to normalise the results. To normalise the results one can take the number of affected lines of C code, but one can also hold that the C code is not representative because it does not reflect the resulting binary code. Thus it could be more realistic to normalise the performance by the pure code size (1). Since there are different ways to appraise the results, the author does not want to give a ranking. But from any point of view the sixth modification is the most effective one.

Finally, one can say that performance tuning should include the following considerations:

1. Try to get rid of loops. Try to substitute them by macro code or hard coding.
2. Try to use as few addresses as possible, e.g. handle multiple parameters in one structure.
3. Try to get rid of computationally expensive operations if they appear several times, e.g. substitute the modulo operator by its equivalent bit operation.

In any case, the performance of a program is always a function of the algorithm, the skill of the developer and finally the compiler. Moreover, bear in mind that no security is associated with the FM as actually implemented. Sending the encrypted key enc(Ki) to the HSM would greatly decrease the performance (2) because of the additional decryption that is needed.

(1) The binary code usually includes some comments and should be cleaned up with a program like strip.
(2) Maybe by about 50%.
A. C Source Code
A.1. Reference Code
/* An implementation of the GSM A3A8 algorithm. (Specifically, COMP128.) */

/* Copyright 1998, Marc Briceno, Ian Goldberg, and David Wagner.
 * All rights reserved.
 */

/*
 * For expository purposes only. Coded in C merely because C is a much
 * more precise, concise form of expression for these purposes. See Judge
 * Patel if you have any problems with this...
 * Of course, it's only authentication, so it should be exportable for the
 * usual boring reasons.
 */

typedef unsigned char Byte;

#include <stdio.h>
/* #define TEST */

/*
 * rand[0..15]: the challenge from the base station
 * key[0..15]: the SIM's A3/A8 long-term key Ki
 * simoutput[0..11]: what you'd get back if you fed rand and key to a real
 * SIM.
 *
 * The GSM spec states that simoutput[0..3] is SRES,
 * and simoutput[4..11] is Kc (the A5 session key).
 * (See GSM 11.11, Section 8.16. See also the leaked document
 * referenced below.)
 * Note that Kc is bits 74..127 of the COMP128 output, followed by 10
 * zeros.
 * In other words, A5 is keyed with only 54 bits of entropy. This
 * represents a deliberate weakening of the key used for voice privacy
 * by a factor of over 1000.
 *
 * Verified with a Pacific Bell Schlumberger SIM. Your mileage may vary.
 *
 * Marc Briceno <marc@scard.org>, Ian Goldberg <iang@cs.berkeley.edu>,
 * and David Wagner <daw@cs.berkeley.edu>
 */

void A3A8(/* in */ Byte rand[16], /* in */ Byte key[16],
          /* out */ Byte simoutput[12]);

/* The compression tables. */
static const Byte table_0[512] = {
    102,177,186,162,  2,156,112, 75, 55, 25,  8, 12,251,193,246,188,
    109,213,151, 53, 42, 79,191,115,233,242,164,223,209,148,108,161,
    252, 37,244, 47, 64,211,  6,237,185,160,139,113, 76,138, 59, 70,
     67, 26, 13,157, 63,179,221, 30,214, 36,166, 69,152,124,207,116,
    247,194, 41, 84, 71,  1, 49, 14, 95, 35,169, 21, 96, 78,215,225,
    182,243, 28, 92,201,118,  4, 74,248,128, 17, 11,146,132,245, 48,
    149, 90,120, 39, 87,230,106,232,175, 19,126,190,202,141,137,176,
    250, 27,101, 40,219,227, 58, 20, 51,178, 98,216,140, 22, 32,121,
     61,103,203, 72, 29,110, 85,212,180,204,150,183, 15, 66,172,196,
     56,197,158,  0,100, 45,153,  7,144,222,163,167, 60,135,210,231,
    174,165, 38,249,224, 34,220,229,217,208,241, 68,206,189,125,255,
    239, 54,168, 89,123,122, 73,145,117,234,143, 99,129,200,192, 82,
    104,170,136,235, 93, 81,205,173,236, 94,105, 52, 46,228,198,  5,
     57,254, 97,155,142,133,199,171,187, 50, 65,181,127,107,147,226,
    184,218,131, 33, 77, 86, 31, 44, 88, 62,238, 18, 24, 43,154, 23,
     80,159,134,111,  9,114,  3, 91, 16,130, 83, 10,195,240,253,119,
    177,102,162,186,156,  2, 75,112, 25, 55, 12,  8,193,251,188,246,
    213,109, 53,151, 79, 42,115,191,242,233,223,164,148,209,161,108,
     37,252, 47,244,211, 64,237,  6,160,185,113,139,138, 76, 70, 59,
     26, 67,157, 13,179, 63, 30,221, 36,214, 69,166,124,152,116,207,
    194,247, 84, 41,  1, 71, 14, 49, 35, 95, 21,169, 78, 96,225,215,
    243,182, 92, 28,118,201, 74,  4,128,248, 11, 17,132,146, 48,245,
     90,149, 39,120,230, 87,232,106, 19,175,190,126,141,202,176,137,
     27,250, 40,101,227,219, 20, 58,178, 51,216, 98, 22,140,121, 32,
    103, 61, 72,203,110, 29,212, 85,204,180,183,150, 66, 15,196,172,
    197, 56,  0,158, 45,100,  7,153,222,144,167,163,135, 60,231,210,
    165,174,249, 38, 34,224,229,220,208,217, 68,241,189,206,255,125,
     54,239, 89,168,122,123,145, 73,234,117, 99,143,200,129, 82,192,
    170,104,235,136, 81, 93,173,205, 94,236, 52,105,228, 46,  5,198,
    254, 57,155, 97,133,142,171,199, 50,187,181, 65,107,127,226,147,
    218,184, 33,131, 86, 77, 44, 31, 62, 88, 18,238, 43, 24, 23,154,
    159, 80,111,134,114,  9, 91,  3,130, 16, 10, 83,240,195,119,253
}, table_1[256] = {
     19, 11, 80,114, 43,  1, 69, 94, 39, 18,127,117, 97,  3, 85, 43,
     27,124, 70, 83, 47, 71, 63, 10, 47, 89, 79,  4, 14, 59, 11,  5,
     35,107,103, 68, 21, 86, 36, 91, 85,126, 32, 50,109, 94,120,  6,
     53, 79, 28, 45, 99, 95, 41, 34, 88, 68, 93, 55,110,125,105, 20,
     90, 80, 76, 96, 23, 60, 89, 64,121, 56, 14, 74,101,  8, 19, 78,
     76, 66,104, 46,111, 50, 32,  3, 39,  0, 58, 25, 92, 22, 18, 51,
     57, 65,119,116, 22,109,  7, 86, 59, 93, 62,110, 78, 99, 77, 67,
     12,113, 87, 98,102,  5, 88, 33, 38, 56, 23,  8, 75, 45, 13, 75,
     95, 63, 28, 49,123,120, 20,112, 44, 30, 15, 98,106,  2,103, 29,
     82,107, 42,124, 24, 30, 41, 16,108,100,117, 40, 73, 40,  7,114,
     82,115, 36,112, 12,102,100, 84, 92, 48, 72, 97,  9, 54, 55, 74,
    113,123, 17, 26, 53, 58,  4,  9, 69,122, 21,118, 42, 60, 27, 73,
    118,125, 34, 15, 65,115, 84, 64, 62, 81, 70,  1, 24,111,121, 83,
    104, 81, 49,127, 48,105, 31, 10,  6, 91, 87, 37, 16, 54,116,126,
     31, 38, 13,  0, 72,106, 77, 61, 26, 67, 46, 29, 96, 37, 61, 52,
    101, 17, 44,108, 71, 52, 66, 57, 33, 51, 25, 90,  2,119,122, 35
}, table_2[128] = {
     52, 50, 44,  6, 21, 49, 41, 59, 39, 51, 25, 32, 51, 47, 52, 43,
     37,  4, 40, 34, 61, 12, 28,  4, 58, 23,  8, 15, 12, 22,  9, 18,
     55, 10, 33, 35, 50,  1, 43,  3, 57, 13, 62, 14,  7, 42, 44, 59,
     62, 57, 27,  6,  8, 31, 26, 54, 41, 22, 45, 20, 39,  3, 16, 56,
     48,  2, 21, 28, 36, 42, 60, 33, 34, 18,  0, 11, 24, 10, 17, 61,
     29, 14, 45, 26, 55, 46, 11, 17, 54, 46,  9, 24, 30, 60, 32,  0,
     20, 38,  2, 30, 58, 35,  1, 16, 56, 40, 23, 48, 13, 19, 19, 27,
     31, 53, 47, 38, 63, 15, 49,  5, 37, 53, 25, 36, 63, 29,  5,  7
}, table_3[64] = {
      1,  5, 29,  6, 25,  1, 18, 23, 17, 19,  0,  9, 24, 25,  6, 31,
     28, 20, 24, 30,  4, 27,  3, 13, 15, 16, 14, 18,  4,  3,  8,  9,
     20,  0, 12, 26, 21,  8, 28,  2, 29,  2, 15,  7, 11, 22, 14, 10,
     17, 21, 12, 30, 26, 27, 16, 31, 11,  7, 13, 23, 10,  5, 22, 19
}, table_4[32] = {
     15, 12, 10,  4,  1, 14, 11,  7,  5,  0, 14,  7,  1,  2, 13,  8,
     10,  3,  4,  9,  6,  0,  3,  2,  5,  6,  8,  9, 11, 13, 15, 12
}, *table[5] = { table_0, table_1, table_2, table_3, table_4 };
/*
 * This code derived from a leaked document from the GSM standards.
 * Some missing pieces were filled in by reverse-engineering a working SIM.
 * We have verified that this is the correct COMP128 algorithm.
 *
 * The first page of the document identifies it as
 *     _Technical Information: GSM System Security Study_.
 *     10-1617-01, 10th June 1988.
 * The bottom of the title page is marked
 *     Racal Research Ltd.
 *     Worton Drive, Worton Grange Industrial Estate,
 *     Reading, Berks. RG2 0SB, England.
 *     Telephone: Reading (0734) 868601   Telex: 847152
 * The relevant bits are in Part I, Section 20 (pages 66-67). Enjoy!
 *
 * Note: There are three typos in the spec (discovered by reverse-engineering).
 * First, "z = (2 * x[n] + x[n]) mod 2^(9-j)" should clearly read
 * "z = (2 * x[m] + x[n]) mod 2^(9-j)".
 * Second, the "k" loop in the "Form bits from bytes" section is severely
 * botched: the k index should run only from 0 to 3, and clearly the range
 * on "the (8-k)th bit of byte j" is also off (should be 0..7, not 1..8,
 * to be consistent with the subsequent section).
 * Third, SRES is taken from the first 8 nibbles of x[], not the last 8 as
 * claimed in the document. (And the document doesn't specify how Kc is
 * derived, but that was also easily discovered with reverse engineering.)
 * All of these typos have been corrected in the following code.
 */

void A3A8(/* in */ Byte rand[16], /* in */ Byte key[16],
          /* out */ Byte simoutput[12])
{
    Byte x[32], bit[128];
    int i, j, k, l, m, n, y, z, next_bit;

    /* ( Load RAND into last 16 bytes of input ) */
    for (i=16; i<32; i++)
        x[i] = rand[i-16];

    /* ( Loop eight times ) */
    for (i=1; i<9; i++) {
        /* ( Load key into first 16 bytes of input ) */
        for (j=0; j<16; j++)
            x[j] = key[j];
        /* ( Perform substitutions ) */
        for (j=0; j<5; j++)
            for (k=0; k<(1<<j); k++)
                for (l=0; l<(1<<(4-j)); l++) {
                    m = l + k*(1<<(5-j));
                    n = m + (1<<(4-j));
                    y = (x[m]+2*x[n]) % (1<<(9-j));
                    z = (2*x[m]+x[n]) % (1<<(9-j));
                    x[m] = table[j][y];
                    x[n] = table[j][z];
                }
        /* ( Form bits from bytes ) */
        for (j=0; j<32; j++)
            for (k=0; k<4; k++)
                bit[4*j+k] = (x[j]>>(3-k)) & 1;
        /* ( Permutation but not on the last loop ) */
        if (i < 8)
            for (j=0; j<16; j++) {
                x[j+16] = 0;
                for (k=0; k<8; k++) {
                    next_bit = ((8*j + k)*17) % 128;
                    x[j+16] |= bit[next_bit] << (7-k);
                }
            }
    }

    /*
     * ( At this stage the vector x[] consists of 32 nibbles.
     *   The first 8 of these are taken as the output SRES. )
     */

    /* The remainder of the code is not given explicitly in the
     * standard, but was derived by reverse-engineering.
     */

    for (i=0; i<4; i++)
        simoutput[i] = (x[2*i]<<4) | x[2*i+1];
    for (i=0; i<6; i++)
        simoutput[4+i] = (x[2*i+18]<<6) | (x[2*i+18+1]<<2)
                       | (x[2*i+18+2]>>2);
    simoutput[4+6] = (x[2*6+18]<<6) | (x[2*6+18+1]<<2);
    simoutput[4+7] = 0;
}

#ifdef TEST
int hextoint(char x)
{
    x = toupper(x);
    if (x >= 'A' && x <= 'F')
        return x-'A'+10;
    else if (x >= '0' && x <= '9')
        return x-'0';
    fprintf(stderr, "bad input.\n");
    exit(1);
}

int main(int argc, char **argv)
{
    Byte key[16], rand[16], simoutput[12];
    int i;

    if (argc != 3 || strlen(argv[1]) != 34 || strlen(argv[2]) != 34
            || strncmp(argv[1], "0x", 2) != 0
            || strncmp(argv[2], "0x", 2) != 0) {
        fprintf(stderr, "Usage: %s 0x<key> 0x<rand>\n", argv[0]);
        exit(1);
    }

    for (i=0; i<16; i++)
        key[i] = (hextoint(argv[1][2*i+2])<<4)
               | hextoint(argv[1][2*i+3]);
    for (i=0; i<16; i++)
        rand[i] = (hextoint(argv[2][2*i+2])<<4)
                | hextoint(argv[2][2*i+3]);
    A3A8(key, rand, simoutput);
    printf("simoutput: ");
    for (i=0; i<12; i++)
        printf("%02X", simoutput[i]);
    printf("\n");
    return 0;
}
#endif
A.2. Optimized Code

#include <stdio.h>
#include <stdlib.h>
#include <csa8hiface.h>
#include <serial.h>
#include <fmsw.h>
#include <fm.h>
#include <string.h>

#define loop(j) for (k=0; k<16; k++) { \
    m=spairs_##j[k]; \
    n=spairs_##j[k+16]; \
    y=(x[m]+(x[n]<<1))&((1<<(9-j))-1); \
    z=((x[m]<<1)+x[n])&((1<<(9-j))-1); \
    x[m]=table_##j[y]; \
    x[n]=table_##j[z]; \
}

/* sboxes */
const unsigned char table_0[512] = {
    102,177,186,162,  2,156,112, 75, 55, 25,  8, 12,251,193,246,188,
    109,213,151, 53, 42, 79,191,115,233,242,164,223,209,148,108,161,
    252, 37,244, 47, 64,211,  6,237,185,160,139,113, 76,138, 59, 70,
     67, 26, 13,157, 63,179,221, 30,214, 36,166, 69,152,124,207,116,
    247,194, 41, 84, 71,  1, 49, 14, 95, 35,169, 21, 96, 78,215,225,
    182,243, 28, 92,201,118,  4, 74,248,128, 17, 11,146,132,245, 48,
    149, 90,120, 39, 87,230,106,232,175, 19,126,190,202,141,137,176,
    250, 27,101, 40,219,227, 58, 20, 51,178, 98,216,140, 22, 32,121,
     61,103,203, 72, 29,110, 85,212,180,204,150,183, 15, 66,172,196,
     56,197,158,  0,100, 45,153,  7,144,222,163,167, 60,135,210,231,
    174,165, 38,249,224, 34,220,229,217,208,241, 68,206,189,125,255,
    239, 54,168, 89,123,122, 73,145,117,234,143, 99,129,200,192, 82,
    104,170,136,235, 93, 81,205,173,236, 94,105, 52, 46,228,198,  5,
     57,254, 97,155,142,133,199,171,187, 50, 65,181,127,107,147,226,
    184,218,131, 33, 77, 86, 31, 44, 88, 62,238, 18, 24, 43,154, 23,
     80,159,134,111,  9,114,  3, 91, 16,130, 83, 10,195,240,253,119,
    177,102,162,186,156,  2, 75,112, 25, 55, 12,  8,193,251,188,246,
    213,109, 53,151, 79, 42,115,191,242,233,223,164,148,209,161,108,
     37,252, 47,244,211, 64,237,  6,160,185,113,139,138, 76, 70, 59,
     26, 67,157, 13,179, 63, 30,221, 36,214, 69,166,124,152,116,207,
    194,247, 84, 41,  1, 71, 14, 49, 35, 95, 21,169, 78, 96,225,215,
    243,182, 92, 28,118,201, 74,  4,128,248, 11, 17,132,146, 48,245,
     90,149, 39,120,230, 87,232,106, 19,175,190,126,141,202,176,137,
     27,250, 40,101,227,219, 20, 58,178, 51,216, 98, 22,140,121, 32,
    103, 61, 72,203,110, 29,212, 85,204,180,183,150, 66, 15,196,172,
    197, 56,  0,158, 45,100,  7,153,222,144,167,163,135, 60,231,210,
    165,174,249, 38, 34,224,229,220,208,217, 68,241,189,206,255,125,
     54,239, 89,168,122,123,145, 73,234,117, 99,143,200,129, 82,192,
    170,104,235,136, 81, 93,173,205, 94,236, 52,105,228, 46,  5,198,
    254, 57,155, 97,133,142,171,199, 50,187,181, 65,107,127,226,147,
    218,184, 33,131, 86, 77, 44, 31, 62, 88, 18,238, 43, 24, 23,154,
    159, 80,111,134,114,  9, 91,  3,130, 16, 10, 83,240,195,119,253
}, table_1[256] = {
     19, 11, 80,114, 43,  1, 69, 94, 39, 18,127,117, 97,  3, 85, 43,
     27,124, 70, 83, 47, 71, 63, 10, 47, 89, 79,  4, 14, 59, 11,  5,
     35,107,103, 68, 21, 86, 36, 91, 85,126, 32, 50,109, 94,120,  6,
     53, 79, 28, 45, 99, 95, 41, 34, 88, 68, 93, 55,110,125,105, 20,
     90, 80, 76, 96, 23, 60, 89, 64,121, 56, 14, 74,101,  8, 19, 78,
     76, 66,104, 46,111, 50, 32,  3, 39,  0, 58, 25, 92, 22, 18, 51,
     57, 65,119,116, 22,109,  7, 86, 59, 93, 62,110, 78, 99, 77, 67,
     12,113, 87, 98,102,  5, 88, 33, 38, 56, 23,  8, 75, 45, 13, 75,
     95, 63, 28, 49,123,120, 20,112, 44, 30, 15, 98,106,  2,103, 29,
     82,107, 42,124, 24, 30, 41, 16,108,100,117, 40, 73, 40,  7,114,
     82,115, 36,112, 12,102,100, 84, 92, 48, 72, 97,  9, 54, 55, 74,
    113,123, 17, 26, 53, 58,  4,  9, 69,122, 21,118, 42, 60, 27, 73,
    118,125, 34, 15, 65,115, 84, 64, 62, 81, 70,  1, 24,111,121, 83,
    104, 81, 49,127, 48,105, 31, 10,  6, 91, 87, 37, 16, 54,116,126,
     31, 38, 13,  0, 72,106, 77, 61, 26, 67, 46, 29, 96, 37, 61, 52,
    101, 17, 44,108, 71, 52, 66, 57, 33, 51, 25, 90,  2,119,122, 35
}, table_2[128] = {
     52, 50, 44,  6, 21, 49, 41, 59, 39, 51, 25, 32, 51, 47, 52, 43,
     37,  4, 40, 34, 61, 12, 28,  4, 58, 23,  8, 15, 12, 22,  9, 18,
     55, 10, 33, 35, 50,  1, 43,  3, 57, 13, 62, 14,  7, 42, 44, 59,
     62, 57, 27,  6,  8, 31, 26, 54, 41, 22, 45, 20, 39,  3, 16, 56,
     48,  2, 21, 28, 36, 42, 60, 33, 34, 18,  0, 11, 24, 10, 17, 61,
     29, 14, 45, 26, 55, 46, 11, 17, 54, 46,  9, 24, 30, 60, 32,  0,
     20, 38,  2, 30, 58, 35,  1, 16, 56, 40, 23, 48, 13, 19, 19, 27,
     31, 53, 47, 38, 63, 15, 49,  5, 37, 53, 25, 36, 63, 29,  5,  7
}, table_3[64] = {
      1,  5, 29,  6, 25,  1, 18, 23, 17, 19,  0,  9, 24, 25,  6, 31,
     28, 20, 24, 30,  4, 27,  3, 13, 15, 16, 14, 18,  4,  3,  8,  9,
     20,  0, 12, 26, 21,  8, 28,  2, 29,  2, 15,  7, 11, 22, 14, 10,
     17, 21, 12, 30, 26, 27, 16, 31, 11,  7, 13, 23, 10,  5, 22, 19
}, table_4[32] = {
     15, 12, 10,  4,  1, 14, 11,  7,  5,  0, 14,  7,  1,  2, 13,  8,
     10,  3,  4,  9,  6,  0,  3,  2,  5,  6,  8,  9, 11, 13, 15, 12
};
/* list of corresponding table entries, sorted by rounds of the hash function.
   spairs_?[0-15] = m, spairs_?[16-31] = n */
const unsigned char
spairs_0[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
                16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31},
spairs_1[32] = {0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,
                8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31},
spairs_2[32] = {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27,
                4,5,6,7,12,13,14,15,20,21,22,23,28,29,30,31},
spairs_3[32] = {0,1,4,5,8,9,12,13,16,17,20,21,24,25,28,29,
                2,3,6,7,10,11,14,15,18,19,22,23,26,27,30,31},
spairs_4[32] = {0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,
                1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31};

struct parameter {
    unsigned char *challenge;
    unsigned char *key;
    unsigned char *response;
};

static void MessageHandler(unsigned long token, void *reqBuffer,
                           unsigned long reqLength);
void comp128(struct parameter par);

FM_RV Startup(void) {
    FM_RV rv;
    rv = FMSW_RegisterDispatch(FM_NUMBER_CUSTOM_FM, MessageHandler);
    return rv;
}

void MessageHandler(unsigned long token, void *reqBuffer,
                    unsigned long reqLength) {
    struct parameter par;
    par.response = SVC_GetReplyBuffer(token, 12*sizeof(unsigned char));
    par.key = (unsigned char *)reqBuffer;
    par.challenge = (unsigned char *)(reqBuffer + 16*sizeof(unsigned char));
    comp128(par);
    SVC_SendReply(token, 1);
}
void comp128(struct parameter par)
{
    unsigned char x[32], bit[128], x_tmp;
    int32 i, j, k, m, n, y, z, next_bit;

    /* set last 16 bytes of input to par.challenge
       (word-wise copy, assumes a 4-byte unsigned long) */
    ((unsigned long *)x)[4] = ((unsigned long *)(par.challenge))[0];
    ((unsigned long *)x)[5] = ((unsigned long *)(par.challenge))[1];
    ((unsigned long *)x)[6] = ((unsigned long *)(par.challenge))[2];
    ((unsigned long *)x)[7] = ((unsigned long *)(par.challenge))[3];

    /* loop eight times */
    for (i = 1; i < 9; i++) {
        /* set first 16 bytes of input to par.key */
        ((unsigned long *)x)[0] = ((unsigned long *)(par.key))[0];
        ((unsigned long *)x)[1] = ((unsigned long *)(par.key))[1];
        ((unsigned long *)x)[2] = ((unsigned long *)(par.key))[2];
        ((unsigned long *)x)[3] = ((unsigned long *)(par.key))[3];

        /* 5 rounds of the hash function, refer to definition */
        loop(0);
        loop(1);
        loop(2);
        loop(3);
        loop(4);

        /* form bits from bytes */
        for (j = 0; j < 32; j++)
            for (k = 0; k < 4; k++)
                bit[(j << 2) + k] = (x[j] >> (3 - k)) & 1;

        /* permutation, but not on the last loop */
        if (i < 8)
            for (j = 0; j < 16; j++) {
                x_tmp = 0;
                next_bit = (j << 3) & 127;          /* equals mod 128 */
                x_tmp |= bit[next_bit] << 7;
                next_bit = (next_bit + 17) & 127;
                x_tmp |= bit[next_bit] << 6;
                next_bit = (next_bit + 17) & 127;
                x_tmp |= bit[next_bit] << 5;
                next_bit = (next_bit + 17) & 127;
                x_tmp |= bit[next_bit] << 4;
                next_bit = (next_bit + 17) & 127;
                x_tmp |= bit[next_bit] << 3;
                next_bit = (next_bit + 17) & 127;
                x_tmp |= bit[next_bit] << 2;
                next_bit = (next_bit + 17) & 127;
                x_tmp |= bit[next_bit] << 1;
                next_bit = (next_bit + 17) & 127;
                x_tmp |= bit[next_bit];
                x[j + 16] = x_tmp;
            }
    }

    /* At this point the x-array consists of 32 low-order nibbles.
       The first 32 bits of that array, i.e. 8 elements, are used as
       the response SRES. Marc Briceno et al. found out by reverse
       engineering that, furthermore, the last 54 bits are used as
       part of the A5 initialisation, while the remaining 10 bits
       for A5 are usually set to zero. */
    for (i = 0; i < 4; i++)
        *(par.response + i) = (x[i << 1] << 4) | x[(i << 1) + 1];
    for (i = 0; i < 6; i++)
        *(par.response + 4 + i) = (x[(i << 1) + 18] << 6)
                                | (x[(i << 1) + 19] << 2)
                                | (x[(i << 1) + 20] >> 2);
    *(par.response + 10) = (x[30] << 6) | (x[31] << 2);
    *(par.response + 11) = 0;
}
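The closing comment of comp128() describes how the 32 nibbles of the final state are packed into the 12-byte reply. As a minimal sketch of that packing step in isolation, the helper below writes SRES and the A5 key material into two separate buffers. The function name pack_output, the constants and the split into two output buffers are assumptions made for illustration; they are not part of the listing above.

```c
#include <assert.h>
#include <stddef.h>

#define COMP128_NIBBLES 32  /* final state: 32 four-bit values        */
#define SRES_LEN         4  /* 32-bit signed response                  */
#define KC_LEN           8  /* 54 key bits followed by 10 zero bits    */

/* Hypothetical helper (not from the thesis): packs the final nibble
   array into SRES and Kc, using the same shifts as the listing. */
void pack_output(const unsigned char x[COMP128_NIBBLES],
                 unsigned char sres[SRES_LEN],
                 unsigned char kc[KC_LEN])
{
    size_t i;

    /* every entry must already be reduced to its low-order nibble */
    for (i = 0; i < COMP128_NIBBLES; i++)
        assert((x[i] & 0xF0) == 0);

    /* SRES: nibbles 0..7, two per byte, high nibble first */
    for (i = 0; i < SRES_LEN; i++)
        sres[i] = (unsigned char)((x[2 * i] << 4) | x[2 * i + 1]);

    /* Kc bytes 0..5: the key starts at the low two bits of nibble 18,
       so each byte takes 2 + 4 + 2 bits from three adjacent nibbles */
    for (i = 0; i < 6; i++)
        kc[i] = (unsigned char)((x[2 * i + 18] << 6) |
                                (x[2 * i + 19] << 2) |
                                (x[2 * i + 20] >> 2));

    /* Kc byte 6: the remaining 6 key bits, low two bits left at zero */
    kc[6] = (unsigned char)((x[30] << 6) | (x[31] << 2));

    /* Kc byte 7: zero padding, giving 10 zero bits in total */
    kc[7] = 0;
}
```

Calling pack_output() with every nibble set to 0xF makes the layout visible: the 54 key bits are ones and the trailing 10 bits stay zero.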
Bibliography
[1] Philipp Suedmeyer: Die Stromchiffre A5, http://www.suedmeyer.net/paper/a5/A5.pdf
[2] RSA Security, Homepage: http://www.rsasecurity.com
[3] Marc Briceno, Ian Goldberg, and David Wagner: An Implementation Of The GSM A3A8 Algorithm, 1998, http://www.iol.ie/kooltek/a3a8.txt
[4] Anonymous: TECHNICAL INFORMATION: GSM System Security Study, http://jya.com/gsm061088.htm
[5] Marc Briceno, Ian Goldberg, and David Wagner: GSM Cloning, http://www.isaac.cs.berkeley.edu/isaac/gsm-faq.html
[6] Erik Zenner: Kryptographische Protokolle Im GSM-Standard, 1999, http://th.informatik.uni-mannheim.de/people/zenner/pub/thesis.ps.gz
[7] J.R. Rao, P. Rohatgi, H. Scherzer, S. Tinguely: Partitioning Attacks: Or How To Rapidly Clone GSM Cards, 2002, http://csdl.computer.org/comp/proceedings/sp/2002/1543/00/15430031abs.htm
[8] Jürgen Wolf: C Von A Bis Z, Galileo Computing
[9] Gunther Lehmann, Bernhard Wunder, Manfred Selz: Schaltungsdesign mit VHDL, http://www.itiv.uni-karlsruhe.de/opencms/opencms/de/study/vhdl/book
[10] David Pellerin, Douglas Taylor: VHDL Made Easy, Prentice Hall
[11] Paul Molitor, Jörg Ritter: Eine Einführung in VHDL, Pearson Studium
[12] Peter J. Ashenden: The Designer's Guide To VHDL, 2nd Edition, Morgan Kaufmann Publishers
[13] Andreas Mäder: VHDL Kompakt, http://tams-www.informatik.uni-hamburg.de
Index
A3 (authentication), 8
A8 (key generation), 8
Base Station Controller (BSC), 6
Base Transceiver Stations (BTS), 6
butterfly structure, 11
chronometry, 20
client/server-architecture, 16
COMP128, 8
Cryptoki, 9
custom function, 16
Functionality Module (FM), 9
Hardware Security Module (HSM), 2
HSM, 8
HSM Access Provider, 9, 17
International Mobile Equipment Identity (IMEI), 7
International Mobile Subscriber Identity (IMSI), 6
Kc, 8
Mobile Stations (MS), 6
Mobile Switching Centers (MSC), 6
MS, 6
Narrow Pipe, 14
Operation And Maintenance Center (OMC), 7
Partitioning Attack, 13
PKCS, 9
PKCS #11, 9
ProtectProcessing Orange, 9
Public Key Cryptographic Standard, 9
Random Value RAND, 8
session key Kc, 8
Signed Response (SRES), 8
SRES, 8
startup function, 18
Subscriber Identity Module (SIM), 6
uint32, 16
uint8, 16
