Design Space Exploration of the Lightweight Stream Cipher WG-8 for FPGAs and ASICs
Gangqiang Yang, Xinxin Fan, Mark Aagaard and Guang Gong
University of Waterloo
g37yang@uwaterloo.ca
Sept 29, 2013

Smart devices are everywhere.
Lightweight symmetric ciphers:
- Block ciphers: HIGHT, PRESENT, CLEFIA, LED, PRINCE, SIMON/SPECK.
- Stream ciphers: Grain, Trivium, MICKEY, and lightweight instances of the WG stream cipher family.

[Figure 1: The high-level hardware architecture of the lightweight stream cipher WG-8. Blocks shown: FSM, multiplication by $\omega$ module, data input DIN, LFSR stages $S_{19}$ down to $S_0$, and the WG-8 transformation module WGT-8($x^{19}$), which consists of WGP-8($x^{19}$) followed by Tr($\cdot$) and outputs a 1-bit keystream.]

Overview
1 Introduction to WG-8
2 Hardware architecture of WG-8
    Preliminaries
    Behaviour and hardware architecture of WG-8
3 The WG-8 transformation module WGT-8($x^{19}$)
    Implementation of the WGT-8 module using look-up tables
    Implementation of the WGT-8 module using Tower Field 1
    Implementation of the WGT-8 module using Tower Field 2
    Implementation of the WGT-8 module using Tower Field 3
4 The multiplication by $\omega$ module
5 Results
    Results for FPGAs
    Results for ASICs
    Discussions
6 Conclusions

Introduction to WG-8
WG-8:
- WG-8 is a recently proposed lightweight instance of the WG stream cipher family.
- Good randomness properties, including period, balance, ideal two-level autocorrelation, ideal tuple distribution, and exact linear complexity.
- Resists time/memory/data trade-off, differential, algebraic, correlation, cube, and discrete Fourier transform (DFT) attacks.
- Recent work has demonstrated the advantages of tower field constructions for finite field arithmetic in the AES S-box and the WG-16 cipher.
Purpose:
- Investigate three different tower field constructions of $\mathbb{F}_{2^8}$ and analyze their effect on hardware architectures for WG-8.
- Explore the design space and hardware performance of WG-8 on low-cost FPGAs and CMOS 65nm ASICs in terms of area, speed, and power consumption.

Introduction to WG-8: Main contribution
We propose four different hardware architectures for the WG-8 transformation module (WGT-8):
1. Directly employs a look-up table over $\mathbb{F}_{2^8}$.
2. Based on the tower construction $\mathbb{F}_{(2^4)^2}$ together with small 4 x 4 look-up tables for arithmetic in $\mathbb{F}_{2^4}$.
3. Based on the tower construction $\mathbb{F}_{(2^4)^2}$, but with a type-I optimal normal basis (ONB) for efficient computation in $\mathbb{F}_{2^4}$.
4. Based on the construction $\mathbb{F}_{((2^2)^2)^2}$, which simplifies computing the trace of the product of two finite field elements.

Hardware architecture of WG-8: Preliminaries
- $p(x) = x^8 + x^4 + x^3 + x^2 + 1$, a primitive polynomial of degree 8 over $\mathbb{F}_2$.
- $\mathbb{F}_{2^8}$, the extension field of $\mathbb{F}_2$ defined by the primitive polynomial $p(x)$, with $2^8$ elements.
- $\omega$ is a primitive element of $\mathbb{F}_{2^8}$ and $p(\omega) = 0$.
- $\mathrm{Tr}(x) = x + x^{2} + x^{2^2} + x^{2^3} + x^{2^4} + x^{2^5} + x^{2^6} + x^{2^7}$, the trace function from $\mathbb{F}_{2^8}$ to $\mathbb{F}_2$.
- $l(x) = x^{20} + x^9 + x^8 + x^7 + x^4 + x^3 + x^2 + x + \omega$, the feedback polynomial of the LFSR (which is also a primitive polynomial over $\mathbb{F}_{2^8}$).
- $q(x) = x + x^{2^3+1} + x^{2^6+2^3+1} + x^{2^6-2^3+1} + x^{2^6+2^3-1}$, a permutation polynomial over $\mathbb{F}_{2^8}$.
- $\mathrm{WGP\text{-}8}(x^d) = q(x^d + 1) + 1$, the WG-8 permutation with decimation $d = 19$ from $\mathbb{F}_{2^8}$ to $\mathbb{F}_{2^8}$, where $d$ is coprime to $2^8 - 1$.
- $\mathrm{WGT\text{-}8}(x^d) = \mathrm{Tr}(\mathrm{WGP\text{-}8}(x^d)) = \mathrm{Tr}(q(x^d + 1))$, the WG-8 transformation with decimation $d = 19$ from $\mathbb{F}_{2^8}$ to $\mathbb{F}_2$. The constant term drops out because $\mathrm{Tr}(1) = 0$ in $\mathbb{F}_{2^8}$.
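
The definitions above can be exercised with a small software model. The following is a minimal sketch, assuming elements of $\mathbb{F}_{2^8}$ are stored as integers in the polynomial basis of $p(x)$ (bit $i$ is the coefficient of $\omega^i$); the function names are hypothetical and this is not the hardware described in the talk.

```python
# Sketch: F_{2^8} arithmetic for p(x) = x^8 + x^4 + x^3 + x^2 + 1 (0x11D),
# plus the trace, WGP-8 and WGT-8 maps as defined on the slide.
# Assumption: polynomial-basis integers, bit i = coefficient of omega^i.
P_POLY = 0x11D

def gf_mul(a, b):
    """Shift-and-add multiplication in F_{2^8} modulo p(x)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in F_{2^8}."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def trace(x):
    """Tr(x) = x + x^2 + x^(2^2) + ... + x^(2^7); the result is 0 or 1."""
    t = 0
    for _ in range(8):
        t ^= x
        x = gf_mul(x, x)
    return t

def q(y):
    """q(y) with exponents 2^3+1, 2^6+2^3+1, 2^6-2^3+1, 2^6+2^3-1."""
    return y ^ gf_pow(y, 9) ^ gf_pow(y, 73) ^ gf_pow(y, 57) ^ gf_pow(y, 71)

def wgp8(x):
    """WGP-8(x^19) = q(x^19 + 1) + 1."""
    return q(gf_pow(x, 19) ^ 1) ^ 1

def wgt8(x):
    """WGT-8(x^19) = Tr(WGP-8(x^19))."""
    return trace(wgp8(x))

def is_permutation():
    """Numerically checks the permutation property stated on the slide."""
    return sorted(wgp8(x) for x in range(256)) == list(range(256))
```

For example, `wgp8(0)` evaluates $q(1) + 1$, and `wgt8(x)` yields one keystream-style bit for any 8-bit input `x`.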

Hardware architecture of WG-8: Behaviour of WG-8
WG-8 consists of a 20-stage LFSR with feedback polynomial $l(x)$, followed by the WG-8 transformation module WGT-8($x^{19}$). Operation is split into a loading and initialization phase and a running phase.
Loading and initialization phase:
- The loading phase takes 20 clock cycles to load the initial state $S_i$.
- The initialization phase then runs for 40 clock cycles, using the recursive relation shown in the figure.
[Figure: LFSR stages $S_{19}$ down to $S_0$ with the multiplication by $\omega$ on the feedback path; $S_{19}$ feeds the WG-8 permutation module WGP-8($x^{19}$), whose output is added into the feedback.]

Hardware architecture of WG-8: Running phase
[Figure: LFSR stages $S_{19}$ down to $S_0$ with the multiplication by $\omega$ on the feedback path; $S_{19}$ feeds WGT-8($x^{19}$), the WG-8 transformation module, which consists of the WG-8 permutation module WGP-8($x^{19}$) followed by the trace computation module Tr($\cdot$) and outputs a 1-bit keystream.]
Running phase:
- The recursive relation for updating the LFSR in the running phase differs from the one used in the loading and initialization phase.
- One keystream bit is generated by the trace module after each clock cycle.
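
The two phases can be sketched in software as follows. This is only a sketch under stated assumptions: the linear taps are read off the feedback polynomial $l(x)$ (stages $S_0$, $S_1$, $S_2$, $S_3$, $S_4$, $S_7$, $S_8$, $S_9$, with $S_0$ multiplied by $\omega$), the WGP-8 output is added to the feedback only during the 40 initialization cycles, and the caller supplies an already loaded 20-byte state. The key/IV loading rule is not given on the slides and is not modelled; all names are hypothetical.

```python
# Sketch of the WG-8 initialization and running phases (assumptions above).
P_POLY = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1
OMEGA = 0x02    # omega in the polynomial basis

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def trace(x):
    t = 0
    for _ in range(8):
        t ^= x
        x = gf_mul(x, x)
    return t

def wgp8(x):
    y = gf_pow(x, 19) ^ 1
    return y ^ gf_pow(y, 9) ^ gf_pow(y, 73) ^ gf_pow(y, 57) ^ gf_pow(y, 71) ^ 1

def linear_feedback(s):
    """Linear part of the recurrence derived from l(x); s[i] holds S_i."""
    return gf_mul(OMEGA, s[0]) ^ s[1] ^ s[2] ^ s[3] ^ s[4] ^ s[7] ^ s[8] ^ s[9]

def wg8_keystream(state, nbits):
    """Run 40 initialization cycles, then emit nbits keystream bits."""
    s = list(state)
    if len(s) != 20:
        raise ValueError("expected 20 state bytes, S_0 .. S_19")
    for _ in range(40):                      # initialization: nonlinear feedback
        s = s[1:] + [linear_feedback(s) ^ wgp8(s[19])]
    out = []
    for _ in range(nbits):                   # running: linear feedback only
        out.append(trace(wgp8(s[19])))       # WGT-8(S_19^19), one bit per cycle
        s = s[1:] + [linear_feedback(s)]
    return out
```

A call such as `wg8_keystream(list(range(1, 21)), 32)` returns 32 bits; the state used here is an arbitrary illustrative value, not a test vector.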

Hardware architecture of WG-8
[Figure: high-level architecture with FSM, multiplication by $\omega$ module, DIN, LFSR stages $S_{19}$ down to $S_0$, and the WG-8 transformation module WGT-8($x^{19}$) producing a 1-bit keystream.]
Four main components:
- 20-stage LFSR.
- WGT-8($x^{19}$) (the most important component).
- FSM.
- Multiplication by $\omega$ module.
WGT-8($x^{19}$) is further split into the WGP-8($x^{19}$) and trace modules.

The WG-8 transformation module WGT-8($x^{19}$): implementation using LUT and Tower Field arithmetic
[Figure 2: Look-up table in $\mathbb{F}_{2^8}$.]
[Figure 3: Tower Field 1 construction for $\mathbb{F}_{(2^4)^2}$, built with $f_1(x)$ over $\mathbb{F}_{2^4}$ and $e_1(x)$ over $\mathbb{F}_2$.]
[Figure 4: Tower Field 2 construction for $\mathbb{F}_{(2^4)^2}$, built with $f_2(x)$ (the same as $f_1(x)$) over $\mathbb{F}_{2^4}$ and $e_2(x)$ over $\mathbb{F}_2$.]
[Figure 5: Tower Field 3 construction for $\mathbb{F}_{((2^2)^2)^2}$, built with $g_3(x)$, $f_3(x)$ and $e_3(x)$.]

The WG-8 transformation module WGT-8($x^{19}$): implementation using look-up tables
- Pre-compute WGP-8($x^{19}$) and store the results in a $2^8 \times 8$ look-up table, Table(WGP-8).
- Pre-compute WGT-8($x^{19}$) = Tr(WGP-8($x^{19}$)) and store the results in a $2^8 \times 1$ look-up table, Table(WGT-8).
[Figure: the high-level architecture with the transformation module replaced by Table(WGT-8), which produces the 1-bit keystream, and Table(WGP-8), which provides the initialization feedback.]
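
As an illustration of the look-up table method, the two tables can be generated offline from the field definitions. This is a sketch, assuming polynomial-basis integers for $\mathbb{F}_{2^8}$ with $p(x) = x^8 + x^4 + x^3 + x^2 + 1$; the names `TABLE_WGP` and `TABLE_WGT` are hypothetical stand-ins for Table(WGP-8) and Table(WGT-8).

```python
# Sketch: pre-computing Table(WGP-8) (256 x 8 bits) and Table(WGT-8) (256 x 1 bit).
P_POLY = 0x11D

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def trace(x):
    t = 0
    for _ in range(8):
        t ^= x
        x = gf_mul(x, x)
    return t

def wgp8(x):
    y = gf_pow(x, 19) ^ 1
    return y ^ gf_pow(y, 9) ^ gf_pow(y, 73) ^ gf_pow(y, 57) ^ gf_pow(y, 71) ^ 1

# One 8-bit entry per input value, used for the initialization feedback.
TABLE_WGP = [wgp8(x) for x in range(256)]
# One bit per input value, used for the keystream in the running phase.
TABLE_WGT = [trace(v) for v in TABLE_WGP]

def keystream_bit(s19):
    """At run time the whole transformation is a single table read."""
    return TABLE_WGT[s19]
```

The design choice mirrors the slide: all finite field arithmetic moves to table generation, so the run-time cost is one memory access per keystream bit.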

The WG-8 transformation module WGT-8($x^{19}$): implementation using Tower Field 1 (Tower Field arithmetic)
[Figure 6: Tower Field 1 construction for $\mathbb{F}_{(2^4)^2}$, built with $f_1(x)$ over $\mathbb{F}_{2^4}$ and $e_1(x)$ over $\mathbb{F}_2$.]
- The tower construction is used to calculate multiplications in $\mathbb{F}_{2^8}$ efficiently.
- Powers of the form $x^{2^t}$ are calculated by cyclic shift operations in a normal basis (NB).
- Conversion matrices between the Tower Field 1 and NB representations are needed.
- $e_1(x)$ is a primitive polynomial.
- Arithmetic in $\mathbb{F}_{2^4}$ is carried out with the aid of a 4 x 4 exponentiation table $T_{exp}$ and a 4 x 4 logarithm table $T_{log}$.
- The table $T_{exp}$ stores the powers $\alpha^i$, $i = 0, 1, \ldots, 14$.
- The table $T_{log}$ stores the exponent $i$ for each $\alpha^i$, $i = 0, 1, \ldots, 14$.
- $AB = T_{exp}(T_{log}A + T_{log}B)$, with the exponents added modulo 15 and $A$, $B$ nonzero.
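
The table-based multiplication in $\mathbb{F}_{2^4}$ can be sketched as below. The slide only says that $e_1(x)$ is primitive, so the concrete choice $e_1(x) = x^4 + x + 1$ is an assumption made for illustration, as are the function names.

```python
# Sketch: multiplication in F_{2^4} through exponentiation and logarithm tables.
# Assumption: e_1(x) = x^4 + x + 1 (a primitive polynomial; the slides do not
# state which primitive polynomial is used).
E1_POLY = 0b10011

def build_tables():
    """T_exp[i] = alpha^i and T_log[alpha^i] = i for i = 0 .. 14."""
    t_exp = [0] * 15
    t_log = [0] * 16          # t_log[0] is unused: 0 has no logarithm
    a = 1
    for i in range(15):
        t_exp[i] = a
        t_log[a] = i
        a <<= 1               # multiply by alpha
        if a & 0x10:
            a ^= E1_POLY
    return t_exp, t_log

T_EXP, T_LOG = build_tables()

def mul16_tables(a, b):
    """AB = T_exp[(T_log[A] + T_log[B]) mod 15], with zero handled separately."""
    if a == 0 or b == 0:
        return 0
    return T_EXP[(T_LOG[a] + T_LOG[b]) % 15]

def mul16_reference(a, b):
    """Shift-and-add reference multiplier used to cross-check the tables."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x10:
            a ^= E1_POLY
        b >>= 1
    return r
```

Each table has 4-bit inputs and 4-bit outputs, which matches the 4 x 4 size quoted on the slide; the modulo-15 addition is the only arithmetic left outside the tables.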

The WG-8 transformation module WGT-8($x^{19}$): implementation of the WGT-8($x^{19}$) module using Tower Field 1
WGP-8 and WGT-8 modules:
$\mathrm{WGP\text{-}8}(x^{19}) = y + y^{2^3+1} + y^{2^6}\left(y^{2^3+1} + y^{2^3-1}\right) + y^{2^3(2^3-1)+1} + 1$, where $y = x^{19} + 1$.
$\mathrm{WGT\text{-}8}(x^{19}) = \mathrm{Tr}\left(y + y^{2^3+1} + y^{2^6}\left(y^{2^3+1} + y^{2^3-1}\right) + y^{2^3(2^3-1)+1}\right)$.
A simple WGT-8($x^{19}$) module described in normal basis (NB):
[Figure: datapath built from 8-bit multipliers (M) and 8-bit squarers (S). It computes $x^{19}$ from $x$, forms $y$, and derives $y^{2^3}$, $y^{2^3-1}$, $y^{2^3+1}$, $y^{2^6}$, $y^{2^3(2^3-1)}$ and $y^{2^6}(y^{2^3+1} + y^{2^3-1})$. The sum gives WGP-8($x^{19}$) for the initialization feedback, and Tr($\cdot$) gives the 1-bit WGT-8($x^{19}$).]

The WG-8 transformation module WGT-8($x^{19}$): implementation using Tower Field 2 (Tower Field arithmetic)
[Figure 7: Tower Field 2 construction for $\mathbb{F}_{(2^4)^2}$, built with $f_2(x)$ over $\mathbb{F}_{2^4}$ and $e_2(x)$ over $\mathbb{F}_2$.]
- $f_2(x)$ is the same as $f_1(x)$.
- $e_2(x)$ is only an irreducible polynomial, not a primitive one.
- A type-I optimal normal basis (ONB) exists, so arithmetic in $\mathbb{F}_{2^4}$ is very efficient.
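
One reason a normal basis is attractive in hardware is that squaring becomes a cyclic shift of the coordinates. The sketch below checks this in $\mathbb{F}_{2^4}$. It assumes $e_2(x) = x^4 + x^3 + x^2 + x + 1$, the all-one polynomial whose roots form the type-I ONB of $\mathbb{F}_{2^4}$; the slides do not spell out $e_2(x)$, and the helper names are hypothetical.

```python
# Sketch: squaring in a normal basis of F_{2^4} is a cyclic shift.
# Assumption: e_2(x) = x^4 + x^3 + x^2 + x + 1 (irreducible, not primitive).
E2_POLY = 0b11111

def mul16(a, b):
    """Polynomial-basis multiplication modulo e_2(x)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x10:
            a ^= E2_POLY
        b >>= 1
    return r

def normal_basis():
    """Return [beta, beta^2, beta^4, beta^8] for beta a root of e_2(x)."""
    beta = 0b0010
    basis = []
    for _ in range(4):
        basis.append(beta)
        beta = mul16(beta, beta)
    return basis

BASIS = normal_basis()

def from_nb(c):
    """Map normal-basis coordinates (bit i <-> beta^(2^i)) to the polynomial basis."""
    r = 0
    for i in range(4):
        if (c >> i) & 1:
            r ^= BASIS[i]
    return r

# Inverse conversion, built by enumerating all 16 coordinate vectors.
TO_NB = {from_nb(c): c for c in range(16)}

def rotl4(c):
    """Cyclic left shift of a 4-bit coordinate vector."""
    return ((c << 1) | (c >> 3)) & 0xF

def square_nb(c):
    """Squaring expressed purely on normal-basis coordinates."""
    return rotl4(c)
```

In this basis the field element 1 has the all-ones coordinate vector, which is consistent with the constant 1 being written as 0xFF for the second tower field method later in the deck.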

[Figure 8: Simple description of the WGT-8($x^{19}$) module in normal basis (NB). M: 8-bit multiplier; S: 8-bit squarer. The datapath produces WGP-8($x^{19}$) for the initialization feedback and WGT-8($x^{19}$) through Tr($\cdot$).]

[Figure 9: The hardware architecture of the WGT-8($x^{19}$) module for Tower Field 1 and Tower Field 2. M: 8-bit multiplier; S: 8-bit squarer. The datapath converts between the tower field and NB representations with conversion matrices around each multiplier, and uses a different trace function and a different constant for 1 in each method (0xFF for Tower Field 2).]
Differences between Tower Field 1 and Tower Field 2:
- Multiplication M.
- Conversion matrices.
- Trace function.
- Representation of 1 in the different tower fields.

The WG-8 transformation module WGT-8($x^{19}$): implementation using Tower Field 3 (Tower Field arithmetic)
[Figure 10: Tower Field 3 construction for $\mathbb{F}_{((2^2)^2)^2}$, built with $g_3(x)$, $f_3(x)$ and $e_3(x)$.]
- The arithmetic operations in $\mathbb{F}_{2^2}$, $\mathbb{F}_{(2^2)^2}$ and $\mathbb{F}_{((2^2)^2)^2}$ are all computed directly in logic.
- A special trace property can be obtained from this tower construction.
- The trace of the product of $U$ and $V$ in $\mathbb{F}_{((2^2)^2)^2}$ can be computed as the inner product of $U$ and $V$, i.e., $\mathrm{Tr}(UV) = \bigoplus_{i=0}^{7} (u_i \wedge v_i)$.
[Figure: a multiplier M followed by Tr($\cdot$) is replaced by a bitwise inner product of $U$ and $V$.]
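
The trace property rests on the fact that $(U, V) \mapsto \mathrm{Tr}(UV)$ is a bilinear form over $\mathbb{F}_2$, so it is fully described by the 8 x 8 matrix $G_{ij} = \mathrm{Tr}(b_i b_j)$ of a basis $b_0, \ldots, b_7$. When the basis makes $G$ the identity, the form collapses to the bitwise inner product quoted on the slide. The sketch below illustrates the general statement only: it uses the polynomial basis of $p(x) = x^8 + x^4 + x^3 + x^2 + 1$ as an assumption, not the Tower Field 3 basis from the talk, and the names are hypothetical.

```python
# Sketch: Tr(UV) evaluated with AND/XOR only, through the Gram matrix of a basis.
# Assumption: polynomial basis of p(x); for a basis with an identity Gram
# matrix the double loop below reduces to XOR_i (u_i AND v_i).
P_POLY = 0x11D

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r

def trace(x):
    t = 0
    for _ in range(8):
        t ^= x
        x = gf_mul(x, x)
    return t

# GRAM[i][j] = Tr(omega^i * omega^j), computed once.
GRAM = [[trace(gf_mul(1 << i, 1 << j)) for j in range(8)] for i in range(8)]

def trace_of_product(u, v):
    """Tr(UV) without a field multiplier: only bit selection, AND and XOR."""
    t = 0
    for i in range(8):
        if (u >> i) & 1:
            for j in range(8):
                if (v >> j) & 1:
                    t ^= GRAM[i][j]
    return t
```

The point of the construction in the talk is that the multiplier in front of the trace disappears; here the same effect shows up as `trace_of_product` never calling `gf_mul` at evaluation time.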
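
The WG-8 transformation module WGT-8($x^{19}$): hardware architecture of the WGT-8($x^{19}$) module for the running phase using the trace property (Tower Field 3)
WGT-8($x^{19}$) can be computed as follows, with $x \in \mathbb{F}_{((2^2)^2)^2}$:
$\mathrm{WGT\text{-}8}(x^{19}) = \mathrm{Tr}(\mathrm{WGP\text{-}8}(x^{19}))$
$= \mathrm{Tr}\left(y \oplus y^{2^3+1} \oplus y^{2^6}\left(y^{2^3+1} \oplus y^{2^3-1}\right) \oplus y^{2^3(2^3-1)}\,y\right)$
$= \mathrm{Tr}\left(y \oplus y^{2^3+1}\right) \oplus \mathrm{Tr}\left(y^{2^6}\left(y^{2^3+1} \oplus y^{2^3-1}\right)\right) \oplus \mathrm{Tr}\left(y^{2(2^3-1)}\,y^{2^6}\right)$
$= \mathrm{Tr}\left(y \oplus y^{2^3+1}\right) \oplus \mathrm{Tr}\left(y^{2^6}\left(y^{2^3+1} \oplus y^{2^3-1} \oplus y^{2(2^3-1)}\right)\right)$.
The third term uses $\mathrm{Tr}(z) = \mathrm{Tr}(z^{2^6})$: raising $y^{2^3(2^3-1)+1}$ to the power $2^6$ gives $y^{2^6 + 2(2^3-1)}$, since exponents are taken modulo $2^8 - 1$.
- The two multiplications $y^{2^6}(y^{2^3+1} \oplus y^{2^3-1})$ and $y^{2^3(2^3-1)}\,y$ inside the trace function have been replaced by bitwise AND and XOR operations.
- Problem: the values of these two products are therefore never computed explicitly.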
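
The regrouping of the trace can be checked numerically. The sketch below compares the direct form $\mathrm{Tr}(q(y))$ with the regrouped form $\mathrm{Tr}(y \oplus y^{9}) \oplus \mathrm{Tr}(y^{64}(y^{9} \oplus y^{7} \oplus y^{14}))$, where $9 = 2^3+1$, $7 = 2^3-1$, $14 = 2(2^3-1)$ and $64 = 2^6$. It assumes a polynomial-basis model of $\mathbb{F}_{2^8}$ with $p(x) = x^8 + x^4 + x^3 + x^2 + 1$; the identity itself does not depend on the basis, and the function names are hypothetical.

```python
# Sketch: the trace of WGP-8 before and after regrouping under y^(2^6).
P_POLY = 0x11D

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def trace(x):
    t = 0
    for _ in range(8):
        t ^= x
        x = gf_mul(x, x)
    return t

def wgt8_direct(y):
    """Tr(y + y^9 + y^73 + y^57 + y^71), i.e. Tr(q(y))."""
    return trace(y ^ gf_pow(y, 9) ^ gf_pow(y, 73) ^ gf_pow(y, 57) ^ gf_pow(y, 71))

def wgt8_regrouped(y):
    """Tr(y + y^9) + Tr(y^64 * (y^9 + y^7 + y^14))."""
    first = trace(y ^ gf_pow(y, 9))
    inner = gf_pow(y, 9) ^ gf_pow(y, 7) ^ gf_pow(y, 14)
    second = trace(gf_mul(gf_pow(y, 64), inner))
    return first ^ second
```

In the regrouped form only one product sits inside a trace, which is the product that the inner-product property then removes.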

The WG-8 transformation module WGT-8($x^{19}$): hardware architecture of the WGP-8($x^{19}$) module for the initialization phase (Tower Field 3)
WGP-8 can be computed as follows, with $x \in \mathbb{F}_{2^8}$:
$\mathrm{WGP\text{-}8}(x^{19}) = q(x^{19} + 1) + 1 = \left(1 \oplus y \oplus y^{2^3+1}\right) \oplus \left(y^{2^6}\left(y^{2^3+1} \oplus y^{2^3-1}\right)\right) \oplus \left(y^{2^3(2^3-1)}\,y\right)$.
- $y^{2^3+1}$ and $y^{2^3-1}$ can be obtained from the WGT-8($x^{19}$) module.
- Question: how can we recover the values of the two products $y^{2^6}(y^{2^3+1} \oplus y^{2^3-1})$ and $y^{2^3(2^3-1)}\,y$?
- The WGP-8 module is only used in the initialization phase.
- We can keep the throughput of the running phase at 1 bit/cycle while decreasing the throughput of the initialization phase (i.e., < 1 bit/cycle).
- This is done by lengthening the initialization phase and reusing the existing multipliers.

[Figure 11: Reuse the multiplier in two consecutive clock cycles. Left: two multipliers M in series, computing $c$ from $a$ and $b$ and then $e$ from $c$ and $d$. Right: a single multiplier M whose inputs are selected by multiplexers controlled by sel, choosing between $a$ and $c$ and between $b$ and $d$.]
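
The multiplier reuse idea can be modelled at cycle level. The following is a rough sketch of one reading of Figure 11: a single multiplier with two input multiplexers and a result register computes the chained product $e = (a \cdot b) \cdot d$ in two clock cycles instead of using two multipliers in one cycle. The class name, the register and the exact multiplexer wiring are assumptions made for illustration.

```python
# Sketch: time-multiplexing one F_{2^8} multiplier over two clock cycles.
# Assumptions: e = (a*b)*d, a result register, and sel = 0 in the first cycle.
P_POLY = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r

class SharedMultiplier:
    """One multiplier, two 2:1 multiplexers and one 8-bit register."""

    def __init__(self):
        self.reg = 0  # holds c = a*b between the two cycles

    def clock(self, a, b, d, sel):
        left = self.reg if sel else a    # multiplexer choosing between a and c
        right = d if sel else b          # multiplexer choosing between b and d
        out = gf_mul(left, right)
        self.reg = out                   # register updates on the clock edge
        return out

def chained_product(a, b, d):
    """Two cycles on the shared multiplier: first c = a*b, then e = c*d."""
    m = SharedMultiplier()
    m.clock(a, b, d, sel=0)
    return m.clock(a, b, d, sel=1)
```

The trade-off matches the slide text: one multiplier and a few multiplexers replace a second multiplier, at the cost of an extra clock cycle, which is acceptable in the initialization phase only.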

[Figure 12: The integrated hardware architecture for computing WGP-8($x^{19}$) and WGT-8($x^{19}$). The datapath combines multipliers (M), squarers (S), basis conversions to and from NB, six multiplexers (MUX 1 to MUX 6) driven by sel, and three 8-bit registers (Reg 1 to Reg 3). The keystream is produced by bitwise sums over $i = 0, \ldots, 7$, and the WGP-8($x^{19}$) output provides the initialization feedback.]

The multiplication by $\omega$ module
[Figure: the high-level architecture, highlighting the multiplication by $\omega$ module on the LFSR feedback path.]
- Using finite field arithmetic, with $X \in \mathbb{F}_{2^8}$ under the polynomial basis (PB):
  $X \otimes \omega = x_7 + x_0\,\omega + (x_1 \oplus x_7)\,\omega^2 + (x_2 \oplus x_7)\,\omega^3 + (x_3 \oplus x_7)\,\omega^4 + x_4\,\omega^5 + x_5\,\omega^6 + x_6\,\omega^7$.
- Using a matrix multiplication under the normal basis (NB): $X \otimes \omega = M \cdot (x_0, x_1, \ldots, x_6, x_7)^T$.
- Using the look-up table.
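
The polynomial-basis formula follows from $\omega^8 = \omega^4 + \omega^3 + \omega^2 + 1$, and it can be written either bit by bit or as a shift with a conditional XOR. The sketch below shows both, together with a matrix form whose columns are the images of the basis vectors. It assumes bit $i$ of an integer holds $x_i$; the matrix is built here for the polynomial basis only as an illustration of the $M \cdot (x_0, \ldots, x_7)^T$ form, and all names are hypothetical.

```python
# Sketch: three equivalent ways to multiply X by omega in the polynomial basis
# of p(x) = x^8 + x^4 + x^3 + x^2 + 1. Bit i of an integer holds x_i.

def mul_omega_bits(x):
    """Bit-level formula from the slide: only wiring and three XOR gates."""
    b = [(x >> i) & 1 for i in range(8)]
    y = [b[7], b[0], b[1] ^ b[7], b[2] ^ b[7], b[3] ^ b[7], b[4], b[5], b[6]]
    return sum(bit << i for i, bit in enumerate(y))

def mul_omega_shift(x):
    """Shift left by one and reduce with omega^8 = omega^4 + omega^3 + omega^2 + 1."""
    y = (x << 1) & 0xFF
    return y ^ 0x1D if x & 0x80 else y

# Column j of the matrix is omega * omega^j, stored as an 8-bit integer.
M_COLUMNS = [mul_omega_shift(1 << j) for j in range(8)]

def mul_omega_matrix(x):
    """Matrix-vector product over F_2: XOR the columns selected by the bits of x."""
    y = 0
    for j in range(8):
        if (x >> j) & 1:
            y ^= M_COLUMNS[j]
    return y
```

For example, `mul_omega_shift(0x80)` reduces $\omega^8$ and returns `0x1D`, the bit pattern of $\omega^4 + \omega^3 + \omega^2 + 1$.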

Results for FPGAs
Flip-flops, area, speed, and power consumption of the FPGA implementations on Spartan-3 (xc3s1000).
Key size and IV size (bits): WG-8 80 and 80; Grain 80 and 64; Trivium 80 and 80.
Columns for the WG-8 rows, in order: data rate (bits/cycle), #FFs, area (slices), maximum frequency (MHz), dynamic power (W), maximum throughput (Mbps), T/A (Mbps/slice), T/P (Mbps/W), T/(A*P) (Mbps/(slice*W)).
WG-8 (LUT):            1 5 137 190 0.005 190 1.39 3000 77.4
WG-8 (LUT):            11 07 39 19 0.016 11 5.31 13000 331.7
WG-8 (Tower Field 1):  1 3 67 19 0.671 19 0.03 .3 0.04
WG-8 (Tower Field 1):  11 79 5106 19 4. 09 0.04 4. 0.010
WG-8 (Tower Field 2):  1 3 343 4 0.339 4 0.1 13.9 0.36
WG-8 (Tower Field 2):  11 306 369 4 1.66 46 0.0 74 0.1
WG-8 (Tower Field 3):  1 114 436 49 0.67 49 0.11 13.5 0.4
WG-8 (Tower Field 3):  11 470 795 44 .399 44 0.17 01.7 0.07
Columns for the Grain and Trivium rows, in order: data rate (bits/cycle), area (slices), maximum frequency (MHz), maximum throughput (Mbps), T/A (Mbps/slice).
Grain:                 1 44 196 196 4.45
Trivium:               1 50 40 40 4.0
Notes:
- We use Synopsys Synplify for synthesis, Xilinx ISE for implementation, and Modelsim for simulation.
- All results are obtained after place-and-route, and the dynamic power consumption is recorded at a frequency of 33.3 MHz, except for Tower Field 1 (16.7 MHz).
- For the serial design, we take advantage of the SRL16 primitive in the Spartan-3 device, which reduces the number of flip-flops and the area significantly.
- A parallel LFSR is used to raise the data rate from one to eleven bits/cycle.

Results for ASICs
Area, speed, and power consumption of the ASIC implementations in CMOS 65nm.
Key size and IV size (bits): WG-8 80 and 80; Grain 80 and 64; Trivium 80 and 80.
Columns, in order: data rate (bits/cycle), area (GE), optimum frequency (MHz), total power (mW), maximum throughput (Mbps), T/A (Mbps/GE), T/P (Mbps/mW), T/(A*P) (Mbps/(GE*mW)).
WG-8 (LUT):            1 176 500 0.93 500 0. 50.6 0.5
WG-8 (LUT):            11 394 610 1.344 6710 1.70 499.6 1.7
WG-8 (Tower Field 1):  1 753 9 35. 9 0.03 6.40 0.001
WG-8 (Tower Field 1):  11 476 1 100.1 134 0.03 13.4 0.0003
WG-8 (Tower Field 2):  1 316 60 6.97 60 0.0 37.3 0.01
WG-8 (Tower Field 2):  11 66 05 44.6 55 0.10 50.6 0.00
WG-8 (Tower Field 3):  1 91 54 7.9 54 0.0 31.3 0.011
WG-8 (Tower Field 3):  11 19 05 53.3 55 0.11 4.3 0.00
Grain:                 1 116 100 .04 100 0.91 500 0.44
Grain:                 11 116 109.5 107 10.73 536 4.77
Trivium:               1 196 96 3. 96 0.4 47.9 0.1
Trivium:               11 0 990 4 1090 5.37 7.5 1.34
Notes:
- We use Synopsys Design Compiler for synthesis and Cadence SoC Encounter for place and route.
- We use the TSMC CMOS 65nm LPLV technology.
- All results are obtained after place-and-route, and the dynamic power consumption of each design is recorded at its optimum frequency.
- The Modelsim simulation is run for 000 clock cycles, including the initialization and running phases.

Discussions
- The WG-8 (LUT) method is the best hardware solution for the WG-8 stream cipher.
- Among the three tower field methods, the best one depends on the data rate and on the metric.

Table 1: The best Tower Field method for different metrics
Columns, in order: data rate; FPGA T/A, T/P, T/(A*P); ASIC T/A, T/P, T/(A*P).
1:   3 3 /3
11:  3 /3

Table 2: The number of multipliers and multiplexers, and the multiplier area
Implementation   Multipliers   Single multiplier   Total multiplier   Multiplexers
                 (number)      area (slices)       area (slices)      (number)
Tower Field 1    7             53                  371                0
Tower Field 2    7             29                  203                0
Tower Field 3    5             26                  130                6

Discussions
Comparison with Grain and Trivium:
- For the FPGA results, Grain and Trivium are better than the WG-8 (LUT) method in T/A for both data rates, one and eleven.
- For the ASIC results, WG-8 (LUT) is 105% better than Trivium in T/P and 13% better in T/(A*P) at data rate one. At data rate eleven, it is 3% better in T/P.
- For the ASIC results, WG-8 (LUT) is also better than Grain in T/P at data rate one.
Guaranteed mathematical properties of WG-8:
- WG-8 has the desired randomness properties: period, balance, ideal two-level autocorrelation, ideal tuple distribution, and exact linear complexity.

Conclusions and future work
Conclusions:
- Four implementation methods for the WG-8 transformation module (WGT-8) are proposed:
  - LUT: look-up tables.
  - Tower Field 1: tower construction $\mathbb{F}_{(2^4)^2}$ with small 4 x 4 look-up tables in $\mathbb{F}_{2^4}$.
  - Tower Field 2: tower construction $\mathbb{F}_{(2^4)^2}$ with a type-I optimal normal basis (ONB) in $\mathbb{F}_{2^4}$.
  - Tower Field 3: tower construction $\mathbb{F}_{((2^2)^2)^2}$ with the special trace property.
- The tower field construction affects the hardware architecture of WG-8 and also the area of a single multiplier.
- The methods offer trade-offs between area, speed, and power consumption.
- Among the three tower field constructions, Tower Field 2 with the type-I optimal normal basis is the best choice for WG-8 in most cases.
Future work:
- Explore more efficient tower field constructions for larger WG stream ciphers, i.e., WG-10, WG-11, WG-14, and for other lightweight stream ciphers.
- Apply further architecture optimizations, such as pipelining and reuse techniques, to improve speed, area, and power consumption.

Q & A?