BULETINUL INSTITUTULUI POLITEHNIC DIN IAŞI
Publicat de Universitatea Tehnică Gheorghe Asachi din Iaşi
Tomul LV (LIX), Fasc. 1, 2009
Secţia AUTOMATICĂ şi CALCULATOARE

DISTRIBUTED DIFFERENTIAL CRYPTANALYSIS OF FEAL-8

BY MIHAI HORIA ZAHARIA and *EUGEN CAZACU

Abstract. In this paper a distributed approach to the differential cryptanalysis of the Fast Data Encipherment Algorithm (FEAL) 8 is presented. Because the algorithm is computationally intensive, a mesh mapping with hypercube routing is used. A centralized client-server implementation was chosen. A simple partition of the problem is used to generate the client jobs. An example of MPI code is also presented in order to illustrate the application of the method.

Key words: distributed computing, differential attack, Feistel cipher.

2000 Mathematics Subject Classification: 68P25, 68N19.

1. Introduction

Differential cryptanalysis is a theoretical method developed to reduce the solution space that must be searched when attacking Feistel-based ciphers or stream ciphers. The official history of differential cryptanalysis began in the late 1980s, when Eli Biham and Adi Shamir published an analysis of an attack on the Data Encryption Standard (DES) using this technique. However, in a presentation given in 2006 on the history of cryptanalysis, Eli Biham claimed that IBM had known about the technique since 1974 and that the NSA knew about it as well. This claim is based on the paper of Don Coppersmith [4].

Not all modern algorithms are sensitive to this type of attack (e.g. the Advanced Encryption Standard, AES). There are also many variants of the technique, such as higher-order, truncated and impossible differential cryptanalysis, and the more recent boomerang attack. This variety shows the importance of the approach and motivates continued work on speeding up the process.

After DES, the next targets were the FEAL-4, FEAL-8 and FEAL-NX versions, which cryptanalysts studied using this method. The classic approach using
known plaintext was the most efficient one. For newer classes of improved ciphers, however, the running time becomes infeasible if the algorithm is executed on a single machine. A supercomputer may of course be used, but the costs become too high. In this context, a distributed approach provides a cheap method of increasing the efficiency of this type of attack.

In this paper a way of using distributed computing to decrease the execution time of the method is presented. The solution is very scalable and can be implemented on any type of cluster, under any type of operating system.

2. The FEAL 8 Algorithm

Feistel-based ciphers are still used because of their simplicity, which lets designers use them, at both the hardware and the software level, in applications that need medium security and have low or medium computing power available. The Fast Data Encipherment Algorithm, or FEAL, is a Feistel-based cipher similar to DES, but it uses a much simpler f function, presented in Fig. 1 [6].

Fig. 1 FEAL-8 base function.

This algorithm class was designed especially for 8-bit microcontrollers. As a result, the algorithm uses only byte-oriented operations and avoids bit permutations and table look-ups. Unfortunately, this comes at a price: as expected, FEAL-4 and FEAL-8 were proven to be more sensitive to attacks than the older DES. Even increasing the key length to 16 or 32 did not offer security greater than that of DES [7].
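To make the byte-oriented structure concrete, the following is a minimal C++ sketch of the S-boxes and of the function f, written from the definitions given in Table 1 and Eq. (1). The names `Byte`, `rot2`, `sbox` and `feal_f` are chosen here for illustration only and do not come from the paper; the rotation is assumed to be a left rotation by two bits, as in the FEAL description in [6].

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Byte = std::uint8_t;  // illustrative alias, not from the paper

// Rotate an 8-bit value left by two bit positions.
inline Byte rot2(Byte v) {
    return static_cast<Byte>((v << 2) | (v >> 6));
}

// S-box S_d(x, y) = ROT2((x + y + d) mod 256), with d in {0, 1}.
inline Byte sbox(int d, Byte x, Byte y) {
    assert(d == 0 || d == 1);
    // The cast to Byte discards the carry, i.e. reduces the sum mod 256.
    return rot2(static_cast<Byte>(x + y + d));
}

// Function f(A, Y) of Table 1: A has 32 bits (4 bytes), Y has 16 bits
// (2 bytes); the result U = (U0, U1, U2, U3) has 32 bits.
inline std::array<Byte, 4> feal_f(const std::array<Byte, 4>& a,
                                  const std::array<Byte, 2>& y) {
    const Byte t1 = static_cast<Byte>((a[0] ^ a[1]) ^ y[0]);
    const Byte t2 = static_cast<Byte>((a[2] ^ a[3]) ^ y[1]);
    const Byte u1 = sbox(1, t1, t2);
    const Byte u2 = sbox(0, t2, u1);
    const Byte u0 = sbox(0, a[0], u1);
    const Byte u3 = sbox(1, a[3], u2);
    return {u0, u1, u2, u3};
}
```

Only additions modulo 256, XORs and two-bit rotations appear, which is why the function maps well onto 8-bit hardware.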
The function f(A, Y) depicted in Fig. 1 maps a pair consisting of a 32-bit and a 16-bit input into a 32-bit output value. Only two substitution tables, also known as S-boxes, are used. Each of them maps a pair of 8-bit entries into an 8-bit output, as presented in Table 1.

Table 1
Returned Value U = (U_0, U_1, U_2, U_3) by the Functions f and f_K

U       | f(A, Y)               | f_K(A, B)
t_1 =   | (A_0 ⊕ A_1) ⊕ Y_0     | A_0 ⊕ A_1
t_2 =   | (A_2 ⊕ A_3) ⊕ Y_1     | A_2 ⊕ A_3
U_1 =   | S_1(t_1, t_2)         | S_1(t_1, t_2 ⊕ B_0)
U_2 =   | S_0(t_2, U_1)         | S_0(t_2, U_1 ⊕ B_1)
U_0 =   | S_0(A_0, U_1)         | S_0(A_0, U_1 ⊕ B_2)
U_3 =   | S_1(A_3, U_2)         | S_1(A_3, U_2 ⊕ B_3)

Each S-box adds one bit $d \in \{0, 1\}$ to its arguments x and y, without taking the resulting carry into account, and rotates the result left by two bits, as in Eq. (1).

(1)    $S_d(x, y) = \mathrm{ROT2}\big((x + y + d) \bmod 256\big)$.

The key generator uses a function f_K(A, B), similar to the function f presented in Table 1, where A_i, B_i, Y_i, t_i and U_i are represented on 8 bits; it maps two 32-bit entries into a 32-bit output.

Most of the fundamental operators used are linear, the exception being the sum modulo 256. This allows a fast implementation of the algorithm with small memory requirements.

3. FEAL 8 Cryptanalysis

First of all, some notations used in the following are introduced:
a) $n_x$ denotes a hexadecimal number, marked by the subscript x;
b) $\Omega_P$ and $\Omega_T$ denote the XOR differences of a pair of real inputs and of the related encrypted outputs.

The chosen-text differential cryptanalysis attack on FEAL 8 uses around 1000 pairs of input data [2]. The input data are chosen at random, subject to the constraint $\Omega_P$ = A2 00 80 00 22 80 80 00$_x$. This choice is used because there is a six-round characteristic with probability 1/128
(Fig. 2), in which not all the $\Omega_T$ bits are established. Five shorter characteristics are derived from the first rounds of Fig. 2 as follows: for the first round the probability is 1, for the first two and the first three rounds the probability is 1/4, and using the fifth and sixth rounds a probability of 1/16 is obtained. Using the same techniques as those presented by Eli Biham and Adi Shamir for the differential cryptanalysis of the full 16-round DES [3], FEAL 8 may be reduced round by round, beginning with seven rounds and finishing with only one.

Fig. 2 Six-round characteristic.

The process is simpler than in the DES case [3], owing to the inherent simplicity of FEAL 8 by comparison with DES. Even so, the process needs enough computing resources to justify a parallel or distributed approach.

4. FEAL 8 Distribution of Cryptanalysis

The main idea in implementing differential cryptanalysis is given in Fig. 3.
Fig. 3 Differential cryptanalysis application (blocks: data sets, pairs, filter, compute keys, the key).

The flow of input pairs is filtered, and the key is computed statistically from the keys that result from the filtered pairs.

Fig. 4 Differential cryptanalysis distributed application architecture.

Filtering the pairs needs less computing power than the key-counting algorithm. For this reason, in some cases only the key-computing algorithm is parallelized. This approach decreases the network communication, because the rejected keys are not transmitted. In the case of DES, more than 99% of the keys are bad. The architecture of the distributed application used for differential cryptanalysis is presented in Fig. 4. The high-level pseudocode of this approach is as follows:
a) The server sends the pairs, filtered or not, to the clients;
b) The server filters the keys if needed;
c) The selected keys are used to count the possible keys;
d) The vectors with the possible keys are synchronized among the clients;
e) The key with the maximum probability is sent back to the server.

The parallel approach is possible because the processes involved in computing any needed key are independent of one another. In order to minimize the cost of synchronizing the key vectors, a proper communication topology must be chosen. A mesh topology is usually suitable for Feistel-based algorithms.

5. FEAL-8 Approach

In this case the topology used is a mesh. Much synchronization is needed to compute the subkeys of each round. The pairs emitted by the server are unfiltered, because a pair that is incorrect for one round can be suitable for another. The pairs are used successively to compute the subkeys applied in each round. The algorithm used by the server is as follows:
1. The server assigns a different port to each client in the cluster. This is necessary in order to handle clients from subnetworks that share the same IP address.
2. The server uploads to each client the complete list of all active clients. In the case of a sequential run this list is empty.
3. The communications are made by broadcast over a hypercube. This has log(n) complexity, where n is the number of active clients. If there are not enough clients to fill a hypercube of dimension $d = \lceil \log_2 n \rceil$, i.e. $2^d > n$, then clients $0 \ldots (2^d - n - 1)$ emulate clients $n \ldots (2^d - 1)$.
4. The communications are made in d steps, one for each of the dimensions $0 \ldots (d-1)$. In step j, client i communicates with client $i \oplus 2^j$; the client whose bit j is 1 acts as server and the client whose bit j is 0 acts as client. Fig. 5 presents the connections for 5 to 8 clients, with d = 3, where clients 4 to 7 can be emulated.
5. The server then counts the pairs assigned to each client in the ClientLoad list.
6. The first split is in five parts to all clients and the process is repeated.
7.
Each client sends a message to the server each time it finishes processing a pair, and the server updates the ClientLoad list.
8. The client with the minimal value in the ClientLoad list is elected and given new work, until the job is finished.
9. The server waits until the clients finish their jobs and then receives the following information from each client:
   a) The initial key computed, if the algorithm succeeded, or an error message;
   b) The input and output traffic of the client;
   c) The sum of all the wait times spent when the client communicated with the others.
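Steps 5, 7 and 8 of the server algorithm can be sketched as a small bookkeeping structure. This is only an illustration under stated assumptions: the paper does not give the layout of the ClientLoad list, so it is assumed here to hold, for each client, the number of pairs handed out and not yet reported as processed; the member names `elect`, `assignPair` and `pairDone` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical server-side bookkeeping: load[i] counts the pairs given to
// client i that it has not yet reported as processed (an assumption).
struct ClientLoad {
    std::vector<int> load;

    explicit ClientLoad(std::size_t clients) : load(clients, 0) {
        assert(clients > 0);
    }

    // Step 8: elect the client with the minimal load (lowest index on ties).
    std::size_t elect() const {
        return static_cast<std::size_t>(
            std::min_element(load.begin(), load.end()) - load.begin());
    }

    // Give one more pair to the elected client and return its index.
    std::size_t assignPair() {
        const std::size_t c = elect();
        ++load[c];
        return c;
    }

    // Step 7: client c reported that it finished processing a pair.
    void pairDone(std::size_t c) {
        assert(c < load.size() && load[c] > 0);
        --load[c];
    }
};
```

With this policy a fast client reports completions more often, keeps a low count, and is therefore elected more often, which balances the work without any explicit speed measurement.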
Fig. 5 Hypercube broadcast: a) initial distribution of messages; b) distribution before the second step; c) distribution before step 3; d) final distribution.

In the following, an example implementation of the method using MPI is presented:

    #include <mpi.h>        // needed libraries
    #include <algorithm>    // fill, copy, transform
    #include <cmath>        // ceil, log2
    #include <functional>   // plus
    #include <iostream>
    #include <iterator>     // ostream_iterator
    #include <numeric>      // iota
    using namespace std;

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        int size;
        int vect[50];
        int tmpvect[50];
        int virtualvect[50];
        int virtualtmpvect[50];
        // rank and dimension computing
        MPI_Status status;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        // vector setup with the needed information for other clients
        fill(vect, vect + size, 0);
        fill(tmpvect, tmpvect + size, 0);
        fill(virtualvect, virtualvect + size, 0);
        fill(virtualtmpvect, virtualtmpvect + size, 0);
        vect[rank] = rank;
        int d = static_cast<int>(ceil(log2((double)size)));
        // virtual clients init: rank k emulates the missing node size + k
        int hasvirtual[50];
        int virtualpartener[50];
        fill(virtualpartener, virtualpartener + size, -1);
        fill(hasvirtual, hasvirtual + size, -1);
        iota(virtualpartener + size, virtualpartener + (1 << d), 0);
        int hasvirtualsize = (1 << d) - size;
        iota(hasvirtual, hasvirtual + hasvirtualsize, size);
        // print virtual clients vector
        if (0 == rank) {
            copy(virtualpartener, virtualpartener + (1 << d),
                 ostream_iterator<int>(cout, " "));
            cout << endl;
            copy(hasvirtual, hasvirtual + size, ostream_iterator<int>(cout, " "));
            cout << endl;
        }
        MPI_Barrier(MPI_COMM_WORLD);
        // all to all communication; the blocking send followed by a receive
        // relies on MPI buffering these small messages
        for (int i = 0; i < d; ++i) {
            if (hasvirtual[rank] != -1) {
                // exchange made on behalf of the emulated (virtual) node
                int partener = hasvirtual[rank] ^ (1 << i);
                if (partener >= size) partener = virtualpartener[partener];
                MPI_Send(virtualvect, size, MPI_INT, partener, 13, MPI_COMM_WORLD);
                MPI_Recv(virtualtmpvect, size, MPI_INT, partener, 13,
                         MPI_COMM_WORLD, &status);
                transform(virtualvect, virtualvect + size, virtualtmpvect,
                          virtualvect, plus<int>());
            }
            // exchange made for the real node
            int partener = rank ^ (1 << i);
            if (partener >= size) partener = virtualpartener[partener];
            MPI_Send(vect, size, MPI_INT, partener, 13, MPI_COMM_WORLD);
            MPI_Recv(tmpvect, size, MPI_INT, partener, 13, MPI_COMM_WORLD, &status);
            transform(vect, vect + size, tmpvect, vect, plus<int>());
        }
        // print results, one rank at a time
        for (int i = 0; i < size; ++i) {
            if (rank == i) {
                cout << rank << ": ";
                copy(vect, vect + size, ostream_iterator<int>(cout, " "));
                cout << endl;
            }
            MPI_Barrier(MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }
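The exchange pattern of the listing can be checked without a cluster by simulating all the hypercube nodes serially. The sketch below is an illustration and not part of the original implementation: the function names `hypercubeDimension`, `hostOf` and `simulateExchange` are hypothetical, and virtual nodes are assumed to start with a null vector, as in the listing.

```cpp
#include <cassert>
#include <vector>

// Smallest d with 2^d >= n, i.e. d = ceil(log2(n)) for n >= 1.
int hypercubeDimension(int n) {
    assert(n >= 1);
    int d = 0;
    while ((1 << d) < n) ++d;
    return d;
}

// Real client that runs node v: nodes v >= n are virtual and are
// emulated by client v - n, as in the MPI listing.
int hostOf(int v, int n) {
    return v < n ? v : v - n;
}

// Serial simulation of the d-step exchange: returns the vector held by
// every node (real and virtual) after the last step.
std::vector<std::vector<int>> simulateExchange(int n) {
    const int d = hypercubeDimension(n);
    const int nodes = 1 << d;
    std::vector<std::vector<int>> vect(nodes, std::vector<int>(n, 0));
    for (int r = 0; r < n; ++r) vect[r][r] = r;  // vect[rank] = rank
    for (int step = 0; step < d; ++step) {
        // All nodes exchange simultaneously, so read from the old state.
        std::vector<std::vector<int>> next = vect;
        for (int v = 0; v < nodes; ++v) {
            const int partner = v ^ (1 << step);  // neighbour on dimension 'step'
            for (int k = 0; k < n; ++k) next[v][k] += vect[partner][k];
        }
        vect = next;
    }
    return vect;
}
```

After d steps every node, real or virtual, holds the element-wise sum of all the initial vectors, so with n clients each vector should contain 0, 1, ..., n-1. The same check can be applied to the key-count vectors that the clients synchronize.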
The client should follow these steps:
1. It waits for the server to initiate the computing process;
2. It receives its listening port for synchronizing with the other clients;
3. It receives the list of the other clients;
4. It begins to process the I/O pairs in accordance with the differential cryptanalysis algorithm. When the processing of a pair is finished, the server is notified so that it sends the next pair, and so on;
5. When the computation of the key vector is finished, the client connects to another client and receives that client's key vector. It then uses the received values to update its own vector;
6. If the connection is simulated, the client initially sends a null vector, which is then initialized with the received values. If the receiver is the same as the sender, no communication takes place, only a local update of the vector virtualvect[];
7. The key computation continues and, before the maximum of the vector is elected, the previous synchronization step is performed again;
8. The process stops when a good key is found or the algorithm fails.

6. Conclusions

In this paper a method of using cluster computing to speed up the implementation of cryptanalysis-specific techniques is presented. This is needed because differential cryptanalysis is a very complex method, although it has the inherent advantage of greatly decreasing the solution space. Even so, breaking an algorithm designed especially to be resistant to nearly brute-force attacks is computationally intensive. The use of a computing cluster makes it possible to obtain the solution of the problem in reasonable time.

One of the results is that the chosen communication model is essential for an efficient implementation of the distributed approach to differential cryptanalysis. The method used to parallelize the problem gives this approach good scalability.
Received: January 12, 2009
Gheorghe Asachi Technical University of Iaşi, Department of Computer Science and Engineering
e-mail: mike@cs.tuiasi.ro
*Continental Automotive System
e-mail: the_e57@yahoo.com

REFERENCES

1. Biham E., Dunkelman O., Keller N., Enhancing Differential-Linear Cryptanalysis. In LNCS, Springer-Verlag London, UK, 2002, Vol. 2501, 254-266.
2. Biham E., Shamir A., Differential Cryptanalysis of Feal and N-Hash. In LNCS, Springer-Verlag London, UK, 1995, Vol. 547, 1-16.
3. Biham E., Shamir A., Differential Cryptanalysis of the Full 16-Round DES. In LNCS, Springer-Verlag London, UK, 1992, Vol. 740, 487-496.
4. Coppersmith D., The Data Encryption Standard (DES) and its Strength Against Attacks. IBM Journal of Research and Development 38, 3, 243, 1994.
5. Lipmaa H., Moriai S., Efficient Algorithms for Computing Differential Properties of Addition. In LNCS, Springer-Verlag London, UK, 2001, Vol. 2355, 336-350.
6. Menezes A., Oorschot P., Vanstone S., Handbook of Applied Cryptography, Boca Raton, FL, US, CRC Press, 1996.
7. * * * http://info.isl.ntt.co.jp/crypt/eng/archive/feal_specifications.html.

DISTRIBUTED DIFFERENTIAL CRYPTANALYSIS APPLIED TO THE FEAL-8 ALGORITHM

(Summary)

It is well known that an encryption technique is good for protecting a given piece of information only if the costs involved in attacking the method exceed the value of that information. Starting from this statement, ever since the beginnings of modern cryptography the field that analyses the resistance of cryptographic methods to theoretical and practical attacks, usually called cryptanalysis, has developed in parallel. Although it is generally the preserve of a class of dedicated specialists, the current capabilities of computing technology allow the breaking speed to be increased without leading to large additional costs. As a result, this paper presents the technique of using distributed computing for this purpose, taking as an example the differential cryptanalysis of a simple Feistel algorithm such as FEAL 8. It must be mentioned that this technique, although it cannot be applied successfully to all latest-generation block ciphers such as AES, still has a number of applications and still has research potential. This is also shown by the variants that appeared later, such as generalized (high order) differential cryptanalysis, truncated differential cryptanalysis and, last but not least, the boomerang attack. The paper also presents an illustration of the proposed approach using an implementation based on an MPI (Message Passing Interface) library.