Implementation and Performance analysis of Skipjack & Rijndael Algorithms by Viswnadham Sanku ECE646 Project Fall-2001
TABLE OF CONTENTS
1. OBJECTIVE
2. SKIPJACK CIPHER
2.1 CIPHER OPERATIONS
2.2 DESIGN PRINCIPLES
2.3 CRYPTANALYSIS
2.4 IMPLEMENTATION & TESTING
PKCS#5 PADDING
2.5 OPTIMIZATION TECHNIQUES
2.6 RESULTS
3. RIJNDAEL CIPHER
3.1 CIPHER OPERATIONS
3.2 DESIGN PRINCIPLES
3.3 CRYPTANALYSIS
3.4 IMPLEMENTATION & TESTING
PKCS#5 PADDING
3.5 OPTIMIZATION TECHNIQUES
3.6 RESULTS
4. COMPARISON OF SKIPJACK VS RIJNDAEL
5. PROBLEMS ENCOUNTERED
6. CONCLUSIONS
1. OBJECTIVE
The main objective of this project is to implement two efficient and secure secret-key block ciphers, Skipjack and Rijndael, and to analyze their performance. The purpose is to gain insight into how secret-key ciphers work and to explore better implementations with the help of different optimization techniques. The ciphers are implemented on two platforms, MS Visual C (Windows) and GNU C (Linux). The speed of operation on a single block of data is measured in CPU clock cycles, and the speeds achieved by my implementation are compared against other implementations available in the public domain. Diffusion is measured after each round to see how good it is at different stages of the cipher. Finally, the two ciphers are compared against each other and the reasons for the differences in their performance are listed.

2. SKIPJACK CIPHER
Skipjack is the secret-key encryption algorithm designed by the NSA and used in the Clipper chip and the Fortezza PC card. It was implemented in tamper-resistant hardware, and its structure was classified from its introduction in 1993 until it was declassified on June 24, 1998.

2.1 CIPHER OPERATIONS
Skipjack is a block cipher with a 64-bit block and an 80-bit key. Encryption and decryption each consist of 32 rounds. Skipjack uses two types of round function, RuleA and RuleB, for encryption, and their inverses RuleA⁻¹ and RuleB⁻¹ for decryption. Encryption consists of 8 RuleA, 8 RuleB, 8 RuleA and 8 RuleB rounds; decryption consists of 8 RuleB⁻¹, 8 RuleA⁻¹, 8 RuleB⁻¹ and 8 RuleA⁻¹ rounds. The 64-bit block is divided internally into four 16-bit words, and in each round a keyed nonlinear permutation is applied to one word of the block. Each of the 32 rounds uses a 4-byte subkey, so 32 × 4 = 128 key bytes are needed; the 10-byte key is repeated cyclically to supply them. The G permutation (G⁻¹ in decryption) is applied to one word in each Rule.
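The round arrangement and the cyclic use of the 10-byte key described above can be sketched in C as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, steps are numbered 0 to 31, and the subkey of step k is taken to be bytes 4k to 4k+3 of the repeated key.

```c
#include <stdint.h>

/* Rule used by encryption step k (0..31): steps 0-7 and 16-23 use RuleA,
   steps 8-15 and 24-31 use RuleB. */
static char skipjack_rule(int k)
{
    return ((k / 8) % 2 == 0) ? 'A' : 'B';
}

/* The 10-byte key cv repeated to 128 bytes: byte j is cv[j mod 10]. */
static void skipjack_expand_key(const uint8_t cv[10], uint8_t ks[128])
{
    for (int j = 0; j < 128; j++)
        ks[j] = cv[j % 10];
}

/* The 4-byte subkey of step k, read straight from the 10-byte key
   (equivalent to bytes 4k..4k+3 of the 128-byte repeated key). */
static void skipjack_subkey(const uint8_t cv[10], int k, uint8_t sub[4])
{
    for (int i = 0; i < 4; i++)
        sub[i] = cv[(4 * k + i) % 10];
}
```

Because 4 × 5 = 20 is a multiple of 10, the subkey pattern repeats every five steps, so an implementation never needs to store the full 128-byte schedule.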
The G permutation uses the F table, a byte-substitution table. G combines a linear operation (mixing the bytes) with a nonlinear operation (substitution through F).
[Figure: Skipjack encryption of a 64-bit block by 8 RuleA, 8 RuleB, 8 RuleA and 8 RuleB rounds, and decryption by 8 RuleB⁻¹, 8 RuleA⁻¹, 8 RuleB⁻¹ and 8 RuleA⁻¹ rounds.]
[Figure: RuleA, RuleB, RuleA⁻¹ and RuleB⁻¹ acting on the words W1, W2, W3, W4 with the G (or G⁻¹) permutation and the round counter; the G and G⁻¹ permutations split a word into g1 (high byte) and g2 (low byte) and apply four F-table lookups keyed with CV[4k], CV[4k+1], CV[4k+2] and CV[4k+3], in reverse key order for G⁻¹.]
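To make RuleA, RuleB and the G permutation concrete, here is a minimal C sketch of one Skipjack block encryption and decryption, following the step equations of the published Skipjack specification. The function names are hypothetical, the block is held as four 16-bit words w[0..3] = W1..W4, and the 256-byte F table is passed in as a parameter rather than reproduced here, so this sketch by itself cannot reproduce the official test vectors.

```c
#include <stdint.h>

/* Keyed G permutation for step k (0..31). F is the 256-byte F table from
   the Skipjack specification (not reproduced here); cv is the 10-byte key. */
static uint16_t skipjack_g(const uint8_t F[256], const uint8_t cv[10],
                           int k, uint16_t w)
{
    uint8_t g1 = (uint8_t)(w >> 8), g2 = (uint8_t)w;   /* high, low byte */
    uint8_t g3 = F[g2 ^ cv[(4 * k) % 10]] ^ g1;
    uint8_t g4 = F[g3 ^ cv[(4 * k + 1) % 10]] ^ g2;
    uint8_t g5 = F[g4 ^ cv[(4 * k + 2) % 10]] ^ g3;
    uint8_t g6 = F[g5 ^ cv[(4 * k + 3) % 10]] ^ g4;
    return (uint16_t)((g5 << 8) | g6);
}

/* G^-1: the same four steps in reverse order. F is still used forwards,
   so G is invertible whatever the contents of F. */
static uint16_t skipjack_g_inv(const uint8_t F[256], const uint8_t cv[10],
                               int k, uint16_t w)
{
    uint8_t g5 = (uint8_t)(w >> 8), g6 = (uint8_t)w;
    uint8_t g4 = F[g5 ^ cv[(4 * k + 3) % 10]] ^ g6;
    uint8_t g3 = F[g4 ^ cv[(4 * k + 2) % 10]] ^ g5;
    uint8_t g2 = F[g3 ^ cv[(4 * k + 1) % 10]] ^ g4;
    uint8_t g1 = F[g2 ^ cv[(4 * k) % 10]] ^ g3;
    return (uint16_t)((g1 << 8) | g2);
}

/* Encrypt w[0..3] in place: steps 0-7 and 16-23 apply RuleA, steps 8-15
   and 24-31 apply RuleB; the counter runs from 1 to 32. */
static void skipjack_encrypt(const uint8_t F[256], const uint8_t cv[10],
                             uint16_t w[4])
{
    for (int k = 0; k < 32; k++) {
        uint16_t counter = (uint16_t)(k + 1);
        uint16_t g = skipjack_g(F, cv, k, w[0]);
        uint16_t w1 = w[0], w2 = w[1], w3 = w[2], w4 = w[3];
        if ((k / 8) % 2 == 0) {                /* RuleA */
            w[0] = g ^ w4 ^ counter;
            w[1] = g;
            w[2] = w2;
            w[3] = w3;
        } else {                               /* RuleB */
            w[0] = w4;
            w[1] = g;
            w[2] = w1 ^ w2 ^ counter;
            w[3] = w3;
        }
    }
}

/* Decrypt in place: steps 31..0 apply RuleB^-1 / RuleA^-1, counter 32..1. */
static void skipjack_decrypt(const uint8_t F[256], const uint8_t cv[10],
                             uint16_t w[4])
{
    for (int k = 31; k >= 0; k--) {
        uint16_t counter = (uint16_t)(k + 1);
        uint16_t o1 = w[0], o2 = w[1], o3 = w[2], o4 = w[3];
        uint16_t w1 = skipjack_g_inv(F, cv, k, o2);   /* undo G on W1 */
        w[0] = w1;
        if ((k / 8) % 2 == 0) {                /* RuleA^-1 */
            w[1] = o3;
            w[2] = o4;
            w[3] = o1 ^ o2 ^ counter;
        } else {                               /* RuleB^-1 */
            w[1] = o3 ^ w1 ^ counter;
            w[2] = o4;
            w[3] = o1;
        }
    }
}
```

Since G is a small four-step Feistel network on two bytes, G⁻¹ needs only F and never its inverse, which is why one table serves both directions.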
2.2 DESIGN PRINCIPLES
Symmetry. Skipjack encryption and decryption have the same symmetric structure. This simplifies implementation and ensures identical security against chosen-plaintext and chosen-ciphertext attacks.
Not too much symmetry. Symmetry sometimes allows clever cryptanalytic attacks. In Skipjack, symmetry is broken with round counters. The round counters appear to prevent most attacks that try to exploit the symmetry, while retaining the equivalence between chosen-plaintext and chosen-ciphertext security against a very large class of attacks.
A rounds before B rounds. A rounds exhibit better diffusion than B rounds, which makes it harder to look deep inside the cipher, so A rounds are used before B rounds in the encryption direction. A⁻¹ rounds exhibit weaker diffusion than B⁻¹ rounds, so B⁻¹ rounds are used before A⁻¹ rounds in the decryption direction.
8 A and 8 B rounds. Using fewer than 8 rounds of a type leaves the cipher vulnerable to cryptanalytic attacks, especially truncated differential cryptanalysis.
80-bit key. With a longer key, differential-style attacks would have lower complexity than exhaustive key search, so the extra key bits would add no security; shorter keys are too weak to resist exhaustive key search. An 80-bit key is therefore an effective compromise.

2.3 CRYPTANALYSIS
Bad interactions between round types. A and B rounds interact poorly where they are applied in consecutive rounds, so the transitions between round types appear to reduce security.
Less diffusion in B rounds and A⁻¹ rounds. RuleA mixes the output of a G permutation into the input of the next G permutation, while RuleB mixes the input of a G permutation into the output of the previous one. Thus during encryption the RuleB rounds add little to the avalanche effect, and during decryption the RuleA⁻¹ rounds add little to it.
F table. A one-bit difference in the input of the F table may cause only a one-bit difference in its output.
Lars R. Knudsen, M.J.B. Robshaw and David Wagner have shown that there are 24-round truncated differentials with probability 1, which can be extended to attacks on up to 28 rounds of Skipjack. Eli Biham has shown an attack on 31-round Skipjack using impossible differentials.

2.4 IMPLEMENTATION & TESTING
This cipher is implemented on two platforms: MS VC (Windows, 350 MHz Pentium II) and GNU C (Linux, 500 MHz Pentium III). The implementation covers five modes of operation, ECB, CBC, CFB, OFB and CTR, all in 64-bit mode. Functions that measure speed in CPU clock cycles and functions that calculate diffusion are implemented. Known Answer Tests, which check the correctness of the implementation, and Monte Carlo Tests, which check the modes of operation, are also incorporated into the implementation.
PKCS#5 PADDING
Since Skipjack operates on 64-bit blocks, the input must be padded to a multiple of 64 bits before encryption, and the padding bytes must be removed after decryption. PKCS#5 is the padding technique recommended for Skipjack. Let n be the length of the input in bytes. The input is padded by appending 8 - (n mod 8) bytes, each having the value 8 - (n mod 8), i.e. the number of bytes added. Padding is therefore always added: an input that is already a multiple of 8 bytes receives a full block of eight bytes of value 8, so the padding can always be removed unambiguously.

2.5 OPTIMIZATION TECHNIQUES
Three techniques were used to optimize speed:
Operations on integers (int) everywhere except in the G function, since integer operations are faster than operations on other data types such as short or long.
Unrolling the rounds instead of using loops.
#define macro substitutions, which remove the run-time overhead of function calls.
Other possible techniques that I have not used in my implementation:
With prior knowledge of the key, precompute subkey tables for the G permutation, so that each lookup ftable[inbyte ^ keybyte] becomes a single access to a key-specific table.
Minimize data movement by rotating the names of the variables w1, w2, w3, w4 instead of moving the contents of the words.

2.6 RESULTS
The following graph shows the speed of operation on a single block of data. Relative to the other implementations, mine performed better on VC than on GNU C, because the other implementations are optimized for GNU C while mine is optimized for the VC platform.
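The PKCS#5 padding rule described in 2.4 can be sketched in C as follows. The helper names are hypothetical, and the block size is a parameter so that the same code covers the 8-byte Skipjack case and the 16-byte Rijndael case (strict PKCS#5 is defined for 8-byte blocks only).

```c
#include <stddef.h>
#include <string.h>

/* Append padding: p = bs - (n mod bs) bytes, each equal to p (1..bs, never 0).
   bs is the block size in bytes; out must hold at least n + bs bytes.
   Returns the padded length. */
static size_t pkcs_pad(const unsigned char *in, size_t n, size_t bs,
                       unsigned char *out)
{
    size_t p = bs - (n % bs);
    memcpy(out, in, n);
    memset(out + n, (int)p, p);
    return n + p;
}

/* Check and strip the padding after decryption. Returns the unpadded
   length, or (size_t)-1 if the padding is malformed. */
static size_t pkcs_unpad(const unsigned char *buf, size_t len, size_t bs)
{
    if (len == 0 || len % bs != 0)
        return (size_t)-1;
    size_t p = buf[len - 1];
    if (p == 0 || p > bs)
        return (size_t)-1;
    for (size_t i = len - p; i < len; i++)
        if (buf[i] != p)
            return (size_t)-1;
    return len - p;
}
```

For example, a 5-byte message padded to an 8-byte block gets three bytes of value 03, while an 8-byte message gets a whole extra block of eight 08 bytes.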
The following graph shows the encryption and decryption speeds on a 51 MB file, measured both in clock cycles and in CPU time. The speed dropped by 40% because, with a file this large, a large part of the time is spent on file operations rather than cipher operations. The difference between the clock-cycle and CPU-time measurements is very small.
The following graph shows the results of the Monte Carlo Tests in two modes (ECB, MCT). This test carries out 400 × 10,000 iterations of encryption and decryption on a single block of data.
The following graph shows the diffusion for encryption and decryption. The diffusion is around 25% after the first round and 50% after 32 rounds.
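The diffusion figures above compare two encryptions whose plaintexts differ in a single bit. A minimal C sketch of the measurement, assuming the intermediate states of both encryptions are available after each round; the helper names are hypothetical, and diffusion is taken to be the percentage of state bits that differ, so a well-mixed state is expected to sit near 50%.

```c
#include <stddef.h>
#include <stdint.h>

/* Flip bit number `bit` of a block (bit 0 = most significant bit of byte 0),
   used to derive the second plaintext from the first. */
static void flip_bit(uint8_t *block, int bit)
{
    block[bit / 8] ^= (uint8_t)(0x80 >> (bit % 8));
}

/* Number of bits that differ between two equal-length byte strings. */
static int hamming_distance(const uint8_t *a, const uint8_t *b, size_t len)
{
    int bits = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t x = a[i] ^ b[i];
        while (x) {                    /* clear the lowest set bit each pass */
            x &= (uint8_t)(x - 1);
            bits++;
        }
    }
    return bits;
}

/* Diffusion after a round, as a percentage of the block size in bits. */
static double diffusion_percent(const uint8_t *s1, const uint8_t *s2,
                                size_t len)
{
    return 100.0 * hamming_distance(s1, s2, len) / (8.0 * len);
}
```

With a 64-bit Skipjack block, a single differing bit is 1.5625%, so the measured 25% after the first round corresponds to about 16 changed bits.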
3. RIJNDAEL CIPHER
Rijndael is a block cipher designed by Joan Daemen and Vincent Rijmen. It has been selected by NIST as the proposed AES and will probably become official sometime in spring 2001. It is a cipher with flexible key and block sizes.

3.1 CIPHER OPERATIONS
Key lengths of 128, 192 or 256 bits and block lengths of 128, 192 or 256 bits are supported, and the design allows both lengths to be extended in multiples of 32 bits. The number of rounds depends on the block and key lengths. Each round transformation is composed of four transformations, ByteSub, ShiftRow, MixColumn and AddRoundKey, except the final round, which does not have MixColumn. The intermediate cipher result is called the State. The number of rounds NR is calculated as $\mathrm{NR} = \max(\mathrm{NB}, \mathrm{NK}) + 6$, where NB is the number of 32-bit words in the input block and NK is the number of 32-bit words in the key. Rijndael operations use Galois field arithmetic in GF(2^8). InvByteSub uses the inverse S-box, and InvShiftRow shifts the rows in the opposite direction. MixColumn multiplies each column by $c(x) = \mathtt{03}\,x^3 + \mathtt{01}\,x^2 + \mathtt{01}\,x + \mathtt{02}$ and InvMixColumn by $d(x) = \mathtt{0B}\,x^3 + \mathtt{0D}\,x^2 + \mathtt{09}\,x + \mathtt{0E}$ (hexadecimal coefficients, products taken modulo $x^4 + 1$).
[Figure: Rijndael encryption. A 128/192/256-bit block undergoes a key addition with the first round key, then NR - 1 round transformations (ByteSub, ShiftRow, MixColumn, AddRoundKey), then a final transformation (ByteSub, ShiftRow, AddRoundKey) that yields the 128/192/256-bit ciphertext.]
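The round-count rule from 3.1 can be written as a few small C functions. The names are hypothetical; NB and NK are counted in 32-bit words, and the expanded-key size uses NB × (NR + 1) words, one NB-word round key for the initial key addition plus one for each round.

```c
/* NR = max(NB, NK) + 6, with NB and NK counted in 32-bit words. */
static int rijndael_rounds(int nb, int nk)
{
    return (nb > nk ? nb : nk) + 6;
}

/* The same rule starting from lengths in bits (128, 192 or 256). */
static int rijndael_rounds_bits(int block_bits, int key_bits)
{
    return rijndael_rounds(block_bits / 32, key_bits / 32);
}

/* Expanded key length in 32-bit words: NB * (NR + 1). */
static int rijndael_expanded_words(int nb, int nk)
{
    return nb * (rijndael_rounds(nb, nk) + 1);
}
```

For the 128-bit block used in this project, 128-, 192- and 256-bit keys give 10, 12 and 14 rounds and expanded keys of 44, 52 and 60 words.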
[Figure: Rijndael decryption. A 128/192/256-bit block undergoes a key addition with an inverse round key, then NR - 1 inverse round transformations (InvByteSub, InvShiftRow, InvMixColumn, AddRoundKey), then an inverse final transformation (InvByteSub, InvShiftRow, AddRoundKey) with the last inverse round key.]
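MixColumn and InvMixColumn multiply each state column by c(x) and d(x), with byte products computed in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. A straightforward, unoptimized C sketch with hypothetical function names; it spells out the matrix form of the two polynomials for one 4-byte column.

```c
#include <stdint.h>

/* Multiply by 02 (i.e. by x) in GF(2^8), reducing by 0x11b. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiplication by shift-and-add. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return r;
}

/* MixColumn on one column: multiply by c(x) = 03x^3 + 01x^2 + 01x + 02. */
static void mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3;
    col[1] = a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3;
    col[2] = a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3);
    col[3] = gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2);
}

/* InvMixColumn: multiply by d(x) = 0Bx^3 + 0Dx^2 + 09x + 0E. */
static void inv_mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = gf_mul(a0, 0x0e) ^ gf_mul(a1, 0x0b) ^ gf_mul(a2, 0x0d) ^ gf_mul(a3, 0x09);
    col[1] = gf_mul(a0, 0x09) ^ gf_mul(a1, 0x0e) ^ gf_mul(a2, 0x0b) ^ gf_mul(a3, 0x0d);
    col[2] = gf_mul(a0, 0x0d) ^ gf_mul(a1, 0x09) ^ gf_mul(a2, 0x0e) ^ gf_mul(a3, 0x0b);
    col[3] = gf_mul(a0, 0x0b) ^ gf_mul(a1, 0x0d) ^ gf_mul(a2, 0x09) ^ gf_mul(a3, 0x0e);
}
```

As a check, mix_column maps the column db 13 53 45 to 8e 4d a1 bc, and inv_mix_column maps it back, since c(x)·d(x) = 1 modulo x^4 + 1.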
Key Expansion
The key is expanded and used according to the following rules. The expanded key contains NB × (NR + 1) 32-bit words, and each round key consists of the next NB words taken from the expanded key. The inverse round keys are obtained by applying InvMixColumn to all round keys except the first and the last one.
One round transformation is depicted as follows.

3.2 DESIGN PRINCIPLES
Symmetry between encryption and decryption protects against chosen-plaintext and chosen-ciphertext attacks.
A linear mixing layer (ShiftRow and MixColumn) guarantees high diffusion, and together with the nonlinear S-boxes it protects against linear and differential cryptanalysis.
Non-Feistel round transformation.

3.3 CRYPTANALYSIS
Rijndael has an adequate security margin against linear and differential cryptanalysis, as well as other types of attack. The Square attack can be used to break 7 rounds of the cipher, and Niels Ferguson showed attacks on 8 rounds of the cipher for 192-bit and 256-bit keys.
3.4 IMPLEMENTATION & TESTING
This cipher is implemented on two platforms: MS VC (Windows, 350 MHz Pentium II) and GNU C (Linux, 500 MHz Pentium III). The implementation supports a 128-bit block with 128-, 192- and 256-bit keys, in five modes of operation, ECB, CBC, CFB, OFB and CTR, all in 128-bit mode. Functions that measure speed in CPU clock cycles and functions that calculate diffusion are implemented. Known Answer Tests, which check the correctness of the implementation, and Monte Carlo Tests, which check the modes of operation, are also incorporated into the implementation.
PKCS#5 PADDING
Since this implementation of Rijndael operates on 128-bit blocks, the input must be padded to a multiple of 128 bits before encryption, and the padding bytes must be removed after decryption. The same PKCS#5 technique used for Skipjack is applied here, extended to 16-byte blocks (this generalization is the padding scheme of PKCS#7). Let n be the length of the input in bytes. The input is padded by appending 16 - (n mod 16) bytes, each having the value 16 - (n mod 16), i.e. the number of bytes added.

3.5 OPTIMIZATION TECHNIQUES
The following techniques were used to optimize speed:
#define macro substitutions, which remove the run-time overhead of function calls.
Precalculated tables in place of the combined ByteSub, ShiftRow and MixColumn transformations. These tables can be calculated as shown in the diagram below, where A0 = S(a0), B1 = S(b1), and so on.
State:             After S-box:       After ShiftRow:
a0 b0 c0 d0        A0 B0 C0 D0        A0 B0 C0 D0
a1 b1 c1 d1        A1 B1 C1 D1        B1 C1 D1 A1
a2 b2 c2 d2        A2 B2 C2 D2        C2 D2 A2 B2
a3 b3 c3 d3        A3 B3 C3 D3        D3 A3 B3 C3
After MixColumn, the first column is:
2A0 ^ 3B1 ^ C2 ^ D3
A0 ^ 2B1 ^ 3C2 ^ D3
A0 ^ B1 ^ 2C2 ^ 3D3
3A0 ^ B1 ^ C2 ^ 2D3
Hence Table1[x] = (2·S(x), S(x), S(x), 3·S(x)); the tables for the B, C and D positions are byte rotations of Table1.
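The table construction in the diagram can be sketched in C. As an illustration only, the S-box is computed here from its definition (multiplicative inverse in GF(2^8) followed by the affine map with constant 63) rather than listed; all names are hypothetical, each 32-bit word stores row 0 in its most significant byte, and the round-key addition is left out.

```c
#include <stdint.h>

static uint8_t xtime(uint8_t a)                 /* multiply by 02 */
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

static uint8_t gf_mul(uint8_t a, uint8_t b)     /* shift-and-add product */
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return r;
}

/* S-box: multiplicative inverse (0 maps to 0), then the affine map
   s = inv ^ rotl(inv,1) ^ rotl(inv,2) ^ rotl(inv,3) ^ rotl(inv,4) ^ 0x63. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0;
    for (int c = 1; c < 256 && x != 0; c++)     /* brute-force inverse */
        if (gf_mul(x, (uint8_t)c) == 1) {
            inv = (uint8_t)c;
            break;
        }
    uint8_t s = inv;
    for (int i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
    return (uint8_t)(s ^ 0x63);
}

/* Table1 of the report: T0[x] = (2*S(x), S(x), S(x), 3*S(x)). */
static uint32_t T0[256];

static void build_t0(void)
{
    for (int x = 0; x < 256; x++) {
        uint8_t s = sbox((uint8_t)x);
        T0[x] = ((uint32_t)gf_mul(s, 2) << 24) | ((uint32_t)s << 16) |
                ((uint32_t)s << 8) | (uint32_t)gf_mul(s, 3);
    }
}

static uint32_t rot8(uint32_t v)                /* rotate right by one byte */
{
    return (v >> 8) | (v << 24);
}

/* One output column of ByteSub + ShiftRow + MixColumn; a0, b1, c2, d3 are
   the input bytes that ShiftRow moves into this column. Call build_t0 first. */
static uint32_t t_column(uint8_t a0, uint8_t b1, uint8_t c2, uint8_t d3)
{
    return T0[a0] ^ rot8(T0[b1]) ^ rot8(rot8(T0[c2])) ^ rot8(rot8(rot8(T0[d3])));
}
```

With this layout the first entries are T0[0] = c66363a5 and T0[1] = f87c7c84, and the input bytes 19, f4, 8d, 08 give the column 046681e5, the same result as applying ByteSub, ShiftRow and MixColumn one after another. A fast implementation would store four precomputed tables instead of rotating at run time.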
3.6 RESULTS
The following graph shows the speed of operation on a single block of data. On VC, my implementation performed better than Rijmen's implementation but worse than Gladman's. On GNU C, Rijmen's implementation performed better than mine, because Rijmen's implementation is optimized for GNU C while mine is optimized for the VC platform.
The following graphs show the key setup cycles. On VC, the cycle counts for my implementation and for Rijmen's include the setup of both the encryption and the decryption keys, whereas Gladman's uses only one key setup. On GNU C, since I could not obtain clock cycle counts, I used CPU time to measure the key setup speed for the different key sizes.
The following graph shows the encryption and decryption speeds on a 51 MB file, measured both in clock cycles and in CPU time. The speed dropped by 40% because, with a file this large, a large part of the time is spent on file operations rather than cipher operations. The difference between the clock-cycle and CPU-time measurements is very small.
The following graph shows the results of the Monte Carlo Tests in two modes (ECB, MCT). This test carries out 400 × 10,000 iterations of encryption and decryption on a single block of data.
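The iteration structure of the Monte Carlo Test can be sketched in C under simplifying assumptions: ECB only, each output block fed back as the next input, and no key changes between outer iterations (the official test procedure may also modify the key, which this sketch omits). The callback type and the toy counter "cipher" are hypothetical stand-ins, included only to make the sketch runnable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cipher callback: transforms one block in place. */
typedef void (*block_fn)(uint8_t *block, const void *key_ctx);

/* Simplified ECB Monte Carlo loop: outer x inner calls, each output
   block becoming the next input. */
static void mct_ecb(block_fn cipher, const void *key_ctx, uint8_t *block,
                    int outer, int inner)
{
    for (int i = 0; i < outer; i++)
        for (int j = 0; j < inner; j++)
            cipher(block, key_ctx);        /* PT(j+1) = CT(j) */
}

/* Toy stand-ins for illustration only: treat an 8-byte block as a
   big-endian counter. A real test would call the Skipjack or Rijndael
   block routine instead. */
static void toy_encrypt(uint8_t *block, const void *key_ctx)
{
    (void)key_ctx;
    for (int i = 7; i >= 0; i--)
        if (++block[i] != 0)               /* stop unless a carry occurs */
            break;
}

static void toy_decrypt(uint8_t *block, const void *key_ctx)
{
    (void)key_ctx;
    for (int i = 7; i >= 0; i--)
        if (block[i]-- != 0)               /* stop unless a borrow occurs */
            break;
}
```

Running mct_ecb(toy_encrypt, NULL, blk, 400, 10000) on an all-zero block leaves it holding 4,000,000 (hex 3D0900), and the same loop with toy_decrypt returns it to zero, which is the encrypt-then-decrypt consistency the real test checks.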
The following graph shows the diffusion for encryption and decryption. With a 128-bit key, only 4 of the 128 bits differ after the first round of encryption; with the other key sizes the diffusion is already around 50% after the first round. In the later rounds the diffusion stays around 50%, which indicates very good diffusion.

4. COMPARISON OF SKIPJACK VS RIJNDAEL
The best speeds I could achieve were 9 MHz for Skipjack and 27 MHz for Rijndael (128-bit block and key), so Rijndael is about three times as fast as Skipjack. The main contributions to the difference are the number of rounds (Skipjack 32, Rijndael 10) and the round-transformation tables used in Rijndael. The Skipjack G permutation is applied to only one 16-bit word in each round, while the Rijndael round transformation (ByteSub, ShiftRow, MixColumn) is applied to every byte of the block, which guarantees very good diffusion. Rijndael also has a much better security margin than Skipjack.

5. PROBLEMS ENCOUNTERED
The clock-cycle counter code would not compile with MSVC, because it used the CPUID and RDTSC instructions, which the compiler's inline assembler does not recognize. Instead, their opcodes have to be fed to the compiler directly with the _emit directive:
    double cycles(void)
    {
        unsigned long hi, lo;
        __asm {
            _emit 0x60        // PUSHAD: save all registers
            _emit 0x0f
            _emit 0xa2        // CPUID: serialize instruction execution
            _emit 0x0f
            _emit 0x31        // RDTSC: read clock cycle count into EDX:EAX
            mov lo, eax
            mov hi, edx       // copy the count out of EAX and EDX
            _emit 0x61        // POPAD: restore the registers
        }
        return 4294967296.0 * hi + lo;   // 2^32 * hi + lo
    }
Although the GNU C compiler accepts the CPUID and RDTSC instructions, the counter did not return proper cycle counts there, so I had to rely on absolute time functions. I have not yet been able to find the cause.

6. CONCLUSIONS
Rijndael is a more flexible cipher than Skipjack, since it supports different block sizes and key sizes, while Skipjack supports only a 64-bit block and an 80-bit key.
Rijndael is more secure than Skipjack, since the minimum key size in Rijndael is 128 bits while Skipjack's key is 80 bits.
With properly optimized implementations, Rijndael achieves higher speeds than Skipjack.
To implement a cipher well, it is very important to analyze the cipher first, so that the implementation can be optimized to perform far better than a straightforward implementation.
The speed of a cipher also depends on the compiler; the same implementation may yield widely varying performance on different platforms.
Although diffusion and security margin alone do not establish how secure a cipher is, they are important in its analysis.
- End of Report -