Design Space Exploration of the Lightweight Stream Cipher WG-8 for FPGAs and ASICs

Design Space Exploration of the Lightweight Stream Cipher WG-8 for FPGAs and ASICs
Gangqiang Yang, Xinxin Fan, Mark Aagaard and Guang Gong
University of Waterloo
g37yang@uwaterloo.ca
Sept 29, 2013

Smart devices are everywhere.
Lightweight symmetric ciphers:
- Block ciphers: HIGHT, PRESENT, CLEFIA, LED, PRINCE, SIMON/SPECK.
- Stream ciphers: Grain, Trivium, MICKEY, and lightweight instances of the WG stream cipher family.

[Figure 1: The high-level hardware architecture of the lightweight stream cipher WG-8: FSM, multiplication by $\omega$ module, LFSR stages $S_{19}, \dots, S_0$ with input DIN, and the WG-8 transformation module WGT-8($x^{19}$), made of WGP-8($x^{19}$) and Tr($\cdot$), which outputs the 1-bit keystream.]

Overview
1. Introduction to WG-8
2. Hardware architecture of WG-8
   - Preliminaries
   - Behaviour and hardware architecture of WG-8
3. The WG-8 transformation module WGT-8($x^{19}$)
   - Implementation of WGT-8 module using look-up tables
   - Implementation of WGT-8 module using Tower Field 1
   - Implementation of WGT-8 module using Tower Field 2
   - Implementation of WGT-8 module using Tower Field 3
4. The multiplication by $\omega$ module
5. Results
   - Results for FPGAs
   - Results for ASICs
   - Discussions
6. Conclusions

WG-8
- WG-8 is a recently proposed lightweight instance of the WG stream cipher family.
- It has good randomness properties, including period, balance, ideal two-level autocorrelation, ideal tuple distribution, and exact linear complexity.
- It resists time/memory/data trade-off, differential, algebraic, correlation, cube, and discrete Fourier transform (DFT) attacks.
- Recent work has demonstrated the advantages of tower field constructions for finite field arithmetic in the AES S-box and the WG-16 cipher.
Purpose
- Investigate three different tower field constructions of $\mathbb{F}_{2^8}$ and analyze their effect on hardware architectures for WG-8.
- Explore the design space and hardware performance of WG-8 on low-cost FPGAs and CMOS 65 nm ASICs in terms of area, speed, and power consumption.

Main contribution
We propose four different hardware architectures for the WG-8 transformation module (WGT-8):
1. The first directly employs a look-up table over $\mathbb{F}_{2^8}$.
2. The second is based on the tower construction $\mathbb{F}_{(2^4)^2}$ together with small 4 × 4 look-up tables for arithmetic in $\mathbb{F}_{2^4}$.
3. The third is based on the tower construction $\mathbb{F}_{(2^4)^2}$, but with a type-I optimal normal basis (ONB) for efficient computations in $\mathbb{F}_{2^4}$.
4. The fourth benefits from the construction $\mathbb{F}_{((2^2)^2)^2}$, which simplifies the trace of the product of two finite field elements.

Preliminaries
- $p(x) = x^8 + x^4 + x^3 + x^2 + 1$, a primitive polynomial of degree 8 over $\mathbb{F}_2$.
- $\mathbb{F}_{2^8}$, the extension field of $\mathbb{F}_2$ defined by the primitive polynomial $p(x)$, with $2^8$ elements. $\omega$ is a primitive element of $\mathbb{F}_{2^8}$ and $p(\omega) = 0$.
- $\mathrm{Tr}(x) = x + x^2 + x^{2^2} + x^{2^3} + x^{2^4} + x^{2^5} + x^{2^6} + x^{2^7}$, the trace function from $\mathbb{F}_{2^8} \to \mathbb{F}_2$.
- $l(x) = x^{20} + x^9 + x^8 + x^7 + x^4 + x^3 + x^2 + x + \omega$, the feedback polynomial of the LFSR (which is also a primitive polynomial over $\mathbb{F}_{2^8}$).
- $q(x) = x + x^{2^3+1} + x^{2^6+2^3+1} + x^{2^6-2^3+1} + x^{2^6+2^3-1}$, a permutation polynomial over $\mathbb{F}_{2^8}$.
- WGP-8($x^d$) $= q(x^d + 1) + 1$, the WG-8 permutation with decimation $d = 19$ from $\mathbb{F}_{2^8} \to \mathbb{F}_{2^8}$, where $d$ is coprime to $2^8 - 1$.
- WGT-8($x^d$) $= \mathrm{Tr}(\text{WGP-8}(x^d)) = \mathrm{Tr}(q(x^d + 1))$, the WG-8 transformation with decimation $d = 19$ from $\mathbb{F}_{2^8} \to \mathbb{F}_2$ (the constant 1 drops out because $\mathrm{Tr}(1) = 0$ in $\mathbb{F}_{2^8}$).
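The definitions above can be checked with a small software model. The following is a minimal sketch, not the hardware design from the talk: it assumes field elements are stored as integers 0..255 in polynomial basis (bit i is the coefficient of $\omega^i$), and the function names `gf_mul`, `gf_pow`, `trace`, `q`, `wgp8` and `wgt8` are chosen here for illustration only.

```python
# Software model of the WG-8 preliminaries (illustrative, not the hardware).
# Assumption: elements of F_{2^8} are ints 0..255 in polynomial basis.

P_POLY = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1
OMEGA = 0x02    # omega, a root of p(x), in this encoding
D = 19          # decimation


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements modulo p(x) (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r


def gf_pow(a: int, e: int) -> int:
    """Square-and-multiply exponentiation in F_{2^8}."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def trace(x: int) -> int:
    """Tr(x) = x + x^2 + ... + x^(2^7); the result is 0 or 1."""
    t = 0
    for _ in range(8):
        t ^= x
        x = gf_mul(x, x)
    return t


def q(x: int) -> int:
    """q(x) with the five exponents listed in the preliminaries."""
    exps = (1, 2**3 + 1, 2**6 + 2**3 + 1, 2**6 - 2**3 + 1, 2**6 + 2**3 - 1)
    r = 0
    for e in exps:
        r ^= gf_pow(x, e)
    return r


def wgp8(x: int) -> int:
    """WGP-8(x^19) = q(x^19 + 1) + 1."""
    return q(gf_pow(x, D) ^ 1) ^ 1


def wgt8(x: int) -> int:
    """WGT-8(x^19) = Tr(WGP-8(x^19))."""
    return trace(wgp8(x))
```

Enumerating all 256 inputs of `wgp8` and `wgt8` in this model is one way to reproduce the contents of the look-up tables used by the LUT-based architecture.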

Behaviour of WG-8
WG-8 consists of a 20-stage LFSR with feedback polynomial $l(x)$ followed by the WG-8 transformation module WGT-8($x^{19}$). Its operation has a loading and initialization phase, and a running phase.
Loading and initialization phase:
- The loading phase takes 20 clock cycles to load the initial state $S_i$.
- The initialization phase then runs for 40 clock cycles, based on the recursive relation shown in the figure.
[Figure: 20-stage LFSR ($S_{19}, \dots, S_0$) with the multiplication by $\omega$ and the WG-8 permutation module WGP-8($x^{19}$) in the feedback path.]

Running phase
[Figure: 20-stage LFSR ($S_{19}, \dots, S_0$) with the multiplication by $\omega$ in the feedback path. WGT-8($x^{19}$), the WG-8 transformation module, consists of WGP-8($x^{19}$), the WG-8 permutation module, followed by Tr($\cdot$), the trace computation module, which outputs the 1-bit keystream.]
- The recursive relation for updating the LFSR in the running phase differs from the one used in the loading and initialization phase.
- One keystream bit is generated by the trace module in each clock cycle.
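The two phases can be sketched in software as follows. This is a behavioural model under stated assumptions, not the authors' implementation: the recurrence is read off the feedback polynomial $l(x)$ (taps at $S_1, S_2, S_3, S_4, S_7, S_8, S_9$ and $\omega \cdot S_0$), the WGP-8 output of $S_{19}$ is assumed to be added to the feedback during initialization, and the keystream bit is assumed to be taken from $S_{19}$ before the register shifts. The mapping of key and IV bits to the initial state is not given in the slides, so `initial_state` is simply a hypothetical list of 20 bytes.

```python
# Behavioural sketch of the WG-8 LFSR phases (assumptions noted above).
P_POLY = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1
OMEGA = 0x02    # omega in the polynomial-basis encoding assumed here


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r


def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def trace(x):
    t = 0
    for _ in range(8):
        t ^= x
        x = gf_mul(x, x)
    return t


def wgp8(x):
    # WGP-8(x^19) = q(y) + 1 with y = x^19 + 1
    y = gf_pow(x, 19) ^ 1
    return y ^ gf_pow(y, 9) ^ gf_pow(y, 73) ^ gf_pow(y, 57) ^ gf_pow(y, 71) ^ 1


def lfsr_feedback(s):
    # s[0] = S_0, ..., s[19] = S_19; linear part derived from l(x)
    fb = gf_mul(OMEGA, s[0])
    for i in (1, 2, 3, 4, 7, 8, 9):
        fb ^= s[i]
    return fb


def keystream(initial_state, nbits, init_cycles=40):
    s = list(initial_state)
    assert len(s) == 20
    # Initialization phase: WGP-8 output is fed back into the LFSR.
    for _ in range(init_cycles):
        s = s[1:] + [lfsr_feedback(s) ^ wgp8(s[19])]
    # Running phase: purely linear update, one keystream bit per cycle.
    bits = []
    for _ in range(nbits):
        bits.append(trace(wgp8(s[19])))
        s = s[1:] + [lfsr_feedback(s)]
    return bits
```

For example, `keystream(range(1, 21), 64)` produces 64 bits from an arbitrary test state. The loading phase (20 cycles) is represented here only by passing the already loaded state.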

Hardware architecture of WG-8
[Figure: high-level hardware architecture of WG-8, with the FSM, the multiplication by $\omega$ module, the LFSR with input DIN, and WGT-8($x^{19}$) producing the 1-bit keystream.]
Four main components:
- 20-stage LFSR.
- WGT-8($x^{19}$) (the most important).
- FSM.
- Multiplication by $\omega$ module.
WGT-8($x^{19}$) is further split into the WGP-8($x^{19}$) and trace modules.

Implementation of WGT-8 module using LUT and tower field arithmetic
[Figure 2: Look-up table in $\mathbb{F}_{2^8}$.]
[Figure 3: Tower Field 1 construction for $\mathbb{F}_{2^8}$, via $\mathbb{F}_{2^4}$ and $\mathbb{F}_{(2^4)^2}$, with polynomials $e_1(x)$ and $f_1(x)$.]
[Figure 4: Tower Field 2 construction for $\mathbb{F}_{2^8}$, via $\mathbb{F}_{2^4}$ and $\mathbb{F}_{(2^4)^2}$, with polynomials $e_2(x)$ and $f_2(x)$.]
[Figure 5: Tower Field 3 construction for $\mathbb{F}_{2^8}$, via $\mathbb{F}_{2^2}$, $\mathbb{F}_{(2^2)^2}$ and $\mathbb{F}_{((2^2)^2)^2}$, with polynomials $e_3(x)$, $f_3(x)$ and $g_3(x)$.]

Implementation of the WG-8 transformation module using look-up tables
- Pre-compute WGP-8($x^{19}$) and store the results in a $2^8 \times 8$ look-up table, Table(WGP-8).
- Pre-compute WGT-8($x^{19}$) $= \mathrm{Tr}(\text{WGP-8}(x^{19}))$ and store the results in a $2^8 \times 1$ look-up table, Table(WGT-8).
[Figure: high-level architecture with Table(WGP-8) in the initialization feedback path and Table(WGT-8) producing the 1-bit keystream.]

Implementation of WGT-8 module using Tower Field 1 (tower field arithmetic)
[Figure 6: Tower Field 1 construction for $\mathbb{F}_{2^8}$, with polynomials $e_1(x)$ and $f_1(x)$.]
- The tower construction is used to compute multiplications in $\mathbb{F}_{2^8}$ efficiently.
- Raising to a power of two, $(\cdot)^{2^t}$, is computed by a cyclic shift in normal basis (NB).
- Conversion matrices between the tower field and NB representations are needed.
- $e_1(x)$ is a primitive polynomial.
- Arithmetic in $\mathbb{F}_{2^4}$ is carried out with the aid of a 4 × 4 exponentiation table $T_{\exp}$ and a 4 × 4 logarithm table $T_{\log}$.
- The table $T_{\exp}$ stores the powers $\alpha^i$, $i = 0, 1, \dots, 14$. The table $T_{\log}$ keeps the exponent $i$ for each $\alpha^i$, $i = 0, 1, \dots, 14$.
- $AB = T_{\exp}(T_{\log} A + T_{\log} B)$ for nonzero $A$ and $B$, with the sum of exponents taken modulo 15.
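The table-based multiplication in $\mathbb{F}_{2^4}$ can be sketched as below. The slides do not give $e_1(x)$, so this example assumes the primitive polynomial $x^4 + x + 1$ purely for illustration. The names `T_EXP`, `T_LOG`, `mul16` and `mul16_ref` are hypothetical.

```python
# Exp/log table multiplication in F_{2^4} (illustrative sketch).
# Assumption: e_1(x) = x^4 + x + 1; the polynomial used in the talk may differ.
E1 = 0b10011

# T_EXP[i] = alpha^i for i = 0..14, where alpha is a root of e_1(x).
T_EXP = []
_a = 1
for _ in range(15):
    T_EXP.append(_a)
    _a <<= 1
    if _a & 0x10:
        _a ^= E1

# T_LOG maps each nonzero element alpha^i back to its exponent i.
T_LOG = {v: i for i, v in enumerate(T_EXP)}


def mul16(a: int, b: int) -> int:
    """A*B = T_exp[(T_log[A] + T_log[B]) mod 15]; zero is handled separately."""
    if a == 0 or b == 0:
        return 0
    return T_EXP[(T_LOG[a] + T_LOG[b]) % 15]


def mul16_ref(a: int, b: int) -> int:
    """Reference shift-and-add multiplication, used only for comparison."""
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i
    for i in (6, 5, 4):  # reduce degrees 6, 5, 4 modulo e_1(x)
        if (r >> i) & 1:
            r ^= E1 << (i - 4)
    return r
```

The zero check is needed because 0 has no logarithm. In hardware this shows up as extra logic around the two tables.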

Implementation of the WGT-8($x^{19}$) module
WGP-8 and WGT-8 modules, with $y = x^{19} + 1$:
$$\text{WGP-8}(x^{19}) = y + y^{2^3+1} + y^{2^6}\,(y^{2^3+1} + y^{2^3-1}) + y^{2^3(2^3-1)+1} + 1$$
$$\text{WGT-8}(x^{19}) = \mathrm{Tr}\big(y + y^{2^3+1} + y^{2^6}\,(y^{2^3+1} + y^{2^3-1}) + y^{2^3(2^3-1)+1}\big)$$
A simple WGT-8($x^{19}$) module described in normal basis (NB):
[Figure: datapath built from 8-bit multipliers (M) and 8-bit squarers (S). It first forms $x^{19}$ from $x^{2^4}$, $x^2$ and $x$, then $y$, $y^{2^3}$, $y^{2^3+1}$, $y^{2^3-1}$, $y^{2^3(2^3-1)}$ and $y^{2^6}$. The WGP-8($x^{19}$) output is the initialization feedback, and Tr($\cdot$) produces the 1-bit WGT-8($x^{19}$) output.]
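As a sanity check of the factored expression, the sketch below compares it against the direct evaluation of $q(y) + 1$ for every input. It is a software model in polynomial basis (an assumption of this example), so the power-of-two exponentiations are done by repeated squaring in the hypothetical helper `frob`, whereas the normal-basis hardware obtains them as cyclic shifts. How the hardware forms $y^{2^3-1}$ is not recoverable from the slide, so `gf_pow` is used for that term.

```python
# Factored WGP-8 versus direct evaluation (illustrative sketch).
P_POLY = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r


def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def frob(a, i):
    """a^(2^i) by i squarings (a cyclic shift in a normal basis)."""
    for _ in range(i):
        a = gf_mul(a, a)
    return a


def wgp8_direct(x):
    y = gf_pow(x, 19) ^ 1
    return y ^ gf_pow(y, 9) ^ gf_pow(y, 73) ^ gf_pow(y, 57) ^ gf_pow(y, 71) ^ 1


def wgp8_factored(x):
    x19 = gf_mul(gf_mul(frob(x, 4), frob(x, 1)), x)  # x^16 * x^2 * x
    y = x19 ^ 1
    y9 = gf_mul(frob(y, 3), y)       # y^(2^3+1)
    y7 = gf_pow(y, 7)                # y^(2^3-1), placeholder computation
    y56 = frob(y7, 3)                # y^(2^3(2^3-1))
    t1 = gf_mul(frob(y, 6), y9 ^ y7)  # y^(2^6) (y^(2^3+1) + y^(2^3-1))
    t2 = gf_mul(y56, y)              # y^(2^3(2^3-1)+1)
    return y ^ y9 ^ t1 ^ t2 ^ 1
```

The factored form needs only a handful of general multiplications once squarings are free, which is the motivation for describing the module in normal basis.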

Implementation of WGT-8 module using Tower Field 2 (tower field arithmetic)
[Figure 7: Tower Field 2 construction for $\mathbb{F}_{2^8}$, with polynomials $e_2(x)$ and $f_2(x)$.]
- $f_2(x)$ is the same as $f_1(x)$.
- $e_2(x)$ is only an irreducible polynomial.
- A type-I optimal normal basis (ONB) exists, and arithmetic in $\mathbb{F}_{2^4}$ is very efficient.

[Figure 8: Simple description of the WGT-8($x^{19}$) module in normal basis (NB), built from 8-bit multipliers (M) and 8-bit squarers (S), with the WGP-8($x^{19}$) output used as initialization feedback and Tr($\cdot$) producing the 1-bit output.]

[Figure 9: The hardware architecture of the WGT-8($x^{19}$) module for Tower Field 1 and Tower Field 2. The datapath uses 8-bit multipliers (M), 8-bit squarers (S), and conversion matrices between the tower field and NB representations. The trace function and the constant representing 1 differ between Method 1 and Method 2 (0xFF for Method 2).]
Differences between Tower Field 1 and Tower Field 2:
- Multiplication M.
- Conversion matrices.
- Trace function.
- Representation of 1 in the different tower fields.

Implementation of WGT-8 module using Tower Field 3 (tower field arithmetic)
[Figure 10: Tower Field 3 construction for $\mathbb{F}_{2^8}$, with polynomials $e_3(x)$, $f_3(x)$ and $g_3(x)$.]
- The arithmetic operations in $\mathbb{F}_{2^2}$, $\mathbb{F}_{(2^2)^2}$ and $\mathbb{F}_{((2^2)^2)^2}$ are all computed directly in logic.
- A special trace property can be obtained from this tower construction.
- The trace of the product of $U$ and $V$ in $\mathbb{F}_{((2^2)^2)^2}$ can be computed as the inner product of $U$ and $V$, i.e. $\mathrm{Tr}(UV) = \bigoplus_{i=0}^{7} (u_i \wedge v_i)$.
[Figure: a multiplier M computing $Z = UV$ followed by Tr($Z$) is replaced by a single inner-product block taking $U$ and $V$.]
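The reason such a shortcut can exist is that $\mathrm{Tr}(UV)$ is bilinear over $\mathbb{F}_2$: in any basis it equals $u^T G v$ for a fixed 8 × 8 binary matrix $G$. The slide's property corresponds to a representation in which $G$ is the identity, so only AND and XOR gates remain. The sketch below illustrates the general fact in the polynomial basis of $p(x)$, used here as a stand-in because the tower field polynomials are not given in the slides. `G`, `trace_of_product` and `inner_product` are names chosen for this example.

```python
# Tr(UV) as a bilinear form over F_2 (illustrative sketch).
# Assumption: polynomial basis of p(x) instead of the talk's tower basis.
P_POLY = 0x11D


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r


def trace(x):
    t = 0
    for _ in range(8):
        t ^= x
        x = gf_mul(x, x)
    return t


# G[i][j] = Tr(omega^i * omega^j), the matrix of the trace form in this basis.
G = [[trace(gf_mul(1 << i, 1 << j)) for j in range(8)] for i in range(8)]


def trace_of_product(u, v):
    """Tr(UV) computed as u^T G v, with no field multiplication at run time."""
    acc = 0
    for i in range(8):
        if (u >> i) & 1:
            for j in range(8):
                if (v >> j) & 1:
                    acc ^= G[i][j]
    return acc


def inner_product(u, v):
    """XOR of (u_i AND v_i): the form Tr(UV) takes when G is the identity."""
    return bin(u & v).count("1") & 1
```

In hardware terms, `inner_product` costs eight AND gates and a seven-gate XOR tree, compared with a full 8-bit multiplier followed by a trace circuit.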

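Hardware architecture of the WGT-8($x^{19}$) module for the running phase using the trace property
WGT-8($x^{19}$) can be computed as follows, for $x \in \mathbb{F}_{2^8}$ and $y = x^{19} + 1$:
$$\begin{aligned}
\text{WGT-8}(x^{19}) &= \mathrm{Tr}(\text{WGP-8}(x^{19})) \\
&= \mathrm{Tr}\big(y \oplus y^{2^3+1} \oplus y^{2^6}(y^{2^3+1} \oplus y^{2^3-1}) \oplus y^{2^3(2^3-1)}\, y\big) \\
&= \mathrm{Tr}(y \oplus y^{2^3+1}) \oplus \mathrm{Tr}\big(y^{2^6}(y^{2^3+1} \oplus y^{2^3-1})\big) \oplus \mathrm{Tr}\big(y^{2(2^3-1)}\, y^{2^6}\big) \\
&= \mathrm{Tr}(y \oplus y^{2^3+1}) \oplus \mathrm{Tr}\big(y^{2^6}(y^{2^3+1} \oplus y^{2^3-1} \oplus y^{2(2^3-1)})\big).
\end{aligned}$$
The third line uses $\mathrm{Tr}(z) = \mathrm{Tr}(z^{2^6})$: raising $y^{2^3(2^3-1)}\, y$ to the power $2^6$ gives $y^{2(2^3-1)}\, y^{2^6}$, since exponents are reduced modulo $2^8 - 1$.
- The two multiplications $y^{2^6}(y^{2^3+1} \oplus y^{2^3-1})$ and $y^{2^3(2^3-1)}\, y$ inside the trace function have been replaced by bitwise AND and XOR operations.
- Problem: the values of these two products, $y^{2^6}(y^{2^3+1} \oplus y^{2^3-1})$ and $y^{2^3(2^3-1)}\, y$, are missing.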
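The regrouping of the trace can be verified exhaustively. The following sketch, again a polynomial-basis software model with names chosen for illustration, compares the trace of the full WGP-8 expression with the two-term form that shares the factor $y^{2^6}$.

```python
# Check of the regrouped trace expression for WGT-8 (illustrative sketch).
P_POLY = 0x11D


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r


def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def trace(x):
    t = 0
    for _ in range(8):
        t ^= x
        x = gf_mul(x, x)
    return t


def wgt8_direct(y):
    """Tr(y + y^9 + y^64 (y^9 + y^7) + y^56 y), evaluated term by term."""
    y9, y7 = gf_pow(y, 9), gf_pow(y, 7)
    t1 = gf_mul(gf_pow(y, 64), y9 ^ y7)
    t2 = gf_mul(gf_pow(y, 56), y)
    return trace(y ^ y9 ^ t1 ^ t2)


def wgt8_regrouped(y):
    """Tr(y + y^9) + Tr(y^64 (y^9 + y^7 + y^14)), the last line of the slide."""
    y9, y7, y14 = gf_pow(y, 9), gf_pow(y, 7), gf_pow(y, 14)
    return trace(y ^ y9) ^ trace(gf_mul(gf_pow(y, 64), y9 ^ y7 ^ y14))
```

With the inner-product property of Tower Field 3, the second trace in `wgt8_regrouped` becomes a bitwise AND of $y^{2^6}$ with the XOR of three terms, followed by a parity computation.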

Hardware architecture of the WGP-8($x^{19}$) module for the initialization phase
WGP-8 can be computed as follows, for $x \in \mathbb{F}_{2^8}$ and $y = x^{19} + 1$:
$$\text{WGP-8}(x^{19}) = q(x^{19} + 1) + 1 = (1 \oplus y \oplus y^{2^3+1}) \oplus \big(y^{2^6}(y^{2^3+1} \oplus y^{2^3-1})\big) \oplus \big(y^{2^3(2^3-1)}\, y\big).$$
- $y^{2^3+1}$ and $y^{2^3-1}$ can be obtained from the WGT-8($x^{19}$) module.
- Question: how can we recover the values of the two products $y^{2^6}(y^{2^3+1} \oplus y^{2^3-1})$ and $y^{2^3(2^3-1)}\, y$?
- The WGP-8 module is only used in the initialization phase. We can therefore keep the throughput of the running phase at 1 bit/cycle while decreasing the throughput of the initialization phase (i.e., < 1 bit/cycle).
- This is done by increasing the length of the initialization phase and reusing the existing multipliers.

[Figure 11: Reuse of the multiplier in two consecutive clock cycles. Two chained multipliers (inputs $a$, $b$ producing $c$, then $c$, $d$ producing $e$) are replaced by a single multiplier M whose operands are chosen by multiplexers controlled by sel.]

[Figure 12: The integrated hardware architecture for computing WGP-8($x^{19}$) and WGT-8($x^{19}$), built from multipliers (M), squarers (S), basis conversion matrices, multiplexers MUX 1 to MUX 6 controlled by sel, and three 8-bit registers (Reg 1 to Reg 3). It outputs the keystream bit and the initialization feedback.]

The multiplication by $\omega$ module
- Using finite field arithmetic, for $X \in \mathbb{F}_{2^8}$ under polynomial basis (PB):
$$X\omega = x_7 + x_0\,\omega + (x_1 \oplus x_7)\,\omega^2 + (x_2 \oplus x_7)\,\omega^3 + (x_3 \oplus x_7)\,\omega^4 + x_4\,\omega^5 + x_5\,\omega^6 + x_6\,\omega^7.$$
- Using a matrix multiplication under normal basis (NB): $X\omega = M \cdot (x_0, x_1, \dots, x_6, x_7)^T$.
- Using the look-up table.
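The polynomial-basis formula amounts to a one-bit shift plus three XOR gates. A minimal sketch, assuming the same integer encoding as in the other examples (bit i is the coefficient of $\omega^i$), with `mul_omega_bits` and `gf_mul` as illustrative names:

```python
# Multiplication by omega in polynomial basis (illustrative sketch).
P_POLY = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """General field multiplication, used here only as a reference."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= P_POLY
        b >>= 1
    return r


def mul_omega_bits(x: int) -> int:
    """X * omega, wired bit by bit as in the formula above."""
    b = [(x >> i) & 1 for i in range(8)]
    out = [
        b[7],         # coefficient of 1
        b[0],         # coefficient of omega
        b[1] ^ b[7],  # coefficient of omega^2
        b[2] ^ b[7],  # coefficient of omega^3
        b[3] ^ b[7],  # coefficient of omega^4
        b[4],         # coefficient of omega^5
        b[5],         # coefficient of omega^6
        b[6],         # coefficient of omega^7
    ]
    return sum(bit << i for i, bit in enumerate(out))
```

The XOR terms come from reducing $x_7\,\omega^8$ with $\omega^8 = \omega^4 + \omega^3 + \omega^2 + 1$.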

Results for FPGAs
Flip-flops, area, speed, and power consumption of FPGA implementations on Spartan-3 (xc3s1000).
Key/IV sizes: 80/80 bits for WG-8 and Trivium, 80/64 bits for Grain.
Columns after the implementation name: data rate (bits/cycle), #FFs, area (slices), maximum frequency (MHz), dynamic power (W), maximum throughput (Mbps), and the optimality metrics T/A (Mbps/#slices), T/P (Mbps/W), T/(A*P) (Mbps/(slices*W)). The Grain and Trivium rows list data rate, area, maximum frequency, maximum throughput, and T/A.
WG-8 (LUT) 1 5 137 190 0.005 190 1.39 3000 77.4
WG-8 (LUT) 11 07 39 19 0.016 11 5.31 13000 331.7
WG-8 (Tower Field 1) 1 3 67 19 0.671 19 0.03 .3 0.04
WG-8 (Tower Field 1) 11 79 5106 19 4. 09 0.04 4. 0.010
WG-8 (Tower Field 2) 1 3 343 4 0.339 4 0.1 13.9 0.36
WG-8 (Tower Field 2) 11 306 369 4 1.66 46 0.0 74 0.1
WG-8 (Tower Field 3) 1 114 436 49 0.67 49 0.11 13.5 0.4
WG-8 (Tower Field 3) 11 470 795 44.399 44 0.17 01.7 0.07
Grain 1 44 196 196 4.45
Trivium 1 50 40 40 4.0
- We use Synopsys Synplify for synthesis, Xilinx ISE for implementation, and ModelSim for simulation.
- All results are obtained after place-and-route, and the dynamic power consumption is recorded at a frequency of 33.3 MHz, except for Tower Field 1 (16.7 MHz).
- For the serial design, we take advantage of the SRL16 primitive in the Spartan-3 device, which reduces the number of flip-flops and the area significantly.
- A parallel LFSR is used to achieve data rates from one to eleven bits/cycle.
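The optimality columns are derived quantities. The sketch below shows how they follow from the measured values, assuming that the maximum throughput is the data rate multiplied by the maximum frequency. The function name `optimality` and the numbers in the usage example are hypothetical and are not taken from the table.

```python
# Derived optimality metrics (illustrative sketch with hypothetical inputs).

def optimality(data_rate: int, freq_mhz: float, area: float, power: float) -> dict:
    """Return throughput (Mbps) and the T/A, T/P, T/(A*P) metrics.

    data_rate is in bits/cycle, area in slices (or GE), power in W (or mW).
    """
    throughput = data_rate * freq_mhz  # Mbps, assuming one output word per cycle
    return {
        "throughput": throughput,
        "T/A": throughput / area,
        "T/P": throughput / power,
        "T/(A*P)": throughput / (area * power),
    }


# Hypothetical design: 1 bit/cycle at 100 MHz, 200 slices, 0.01 W.
example = optimality(data_rate=1, freq_mhz=100.0, area=200.0, power=0.01)
```

Higher values are better for all three metrics. T/(A*P) rewards designs that are simultaneously small and low-power for a given throughput.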

Results for ASICs
Area, speed, and power consumption results for ASIC implementations in CMOS 65 nm.
Key/IV sizes: 80/80 bits for WG-8 and Trivium, 80/64 bits for Grain.
Columns after the implementation name: data rate (bits/cycle), area (GE), optimum frequency (MHz), total power (mW), maximum throughput (Mbps), and the optimality metrics T/A (Mbps/#GE), T/P (Mbps/mW), T/(A*P) (Mbps/(GE*mW)).
WG-8 (LUT) 1 176 500 0.93 500 0. 50.6 0.5
WG-8 (LUT) 11 394 610 1.344 6710 1.70 499.6 1.7
WG-8 (Tower Field 1) 1 753 9 35. 9 0.03 6.40 0.001
WG-8 (Tower Field 1) 11 476 1 100.1 134 0.03 13.4 0.0003
WG-8 (Tower Field 2) 1 316 60 6.97 60 0.0 37.3 0.01
WG-8 (Tower Field 2) 11 66 05 44.6 55 0.10 50.6 0.00
WG-8 (Tower Field 3) 1 91 54 7.9 54 0.0 31.3 0.011
WG-8 (Tower Field 3) 11 19 05 53.3 55 0.11 4.3 0.00
Grain 1 116 100.04 100 0.91 500 0.44
Grain 11 116 109.5 107 10.73 536 4.77
Trivium 1 196 96 3. 96 0.4 47.9 0.1
Trivium 11 0 990 4 1090 5.37 7.5 1.34
- We use Synopsys Design Compiler for synthesis and Cadence SoC Encounter for place and route.
- We use TSMC CMOS 65 nm LPLV technology.
- All results are obtained after place-and-route, and the dynamic power consumption is recorded at the optimum frequency of each design.
- The ModelSim simulation is run for 000 clock cycles, including the initialization and running phases.

Discussions
- The WG-8 (LUT) method is the best hardware solution for the WG-8 stream cipher.
- For the three tower field methods, the best one depends on the data rate and on the metric.
Table 1: The best Tower Field method for different metrics
Columns: data rate, then T/A, T/P, T/(A*P) for FPGA, then T/A, T/P, T/(A*P) for ASIC.
1 3 3 /3
11 3 /3
Table 2: The number of multipliers and multiplexers and their area
Columns: implementation, number of multipliers, area of a single multiplier (slices), total multiplier area (slices), number of multiplexers.
Tower Field 1: 7, 53, 371, 0
Tower Field 2: 7, 29, 203, 0
Tower Field 3: 5, 26, 130, 6

Discussions
Comparison with Grain and Trivium:
- For the FPGA results, Grain and Trivium are better than the WG-8 (LUT) method in the T/A metric at both data rates, one and eleven.
- For the ASIC results, WG-8 (LUT) is 105% better than Trivium in the T/P metric and 13% better in T/(A*P) at a data rate of one. At a data rate of eleven, it is 3% better in T/P.
- For the ASIC results, WG-8 (LUT) is also better than Grain in the T/P metric at a data rate of one.
Guaranteed mathematical properties of WG-8:
- WG-8 has desirable randomness properties such as period, balance, ideal two-level autocorrelation, ideal tuple distribution, and exact linear complexity.

Conclusions and future work
Conclusions:
- Four implementation methods for the WG-8 transformation module (WGT-8) are proposed:
  - LUT: look-up tables.
  - Tower Field 1: tower construction $\mathbb{F}_{(2^4)^2}$ with small 4 × 4 look-up tables in $\mathbb{F}_{2^4}$.
  - Tower Field 2: tower construction $\mathbb{F}_{(2^4)^2}$ with a type-I optimal normal basis (ONB) in $\mathbb{F}_{2^4}$.
  - Tower Field 3: tower construction $\mathbb{F}_{((2^2)^2)^2}$ with the special trace property.
- The tower field construction affects the hardware architecture of WG-8 and also the area of a single multiplier.
- The designs allow trade-offs among area, speed, and power consumption.
- Among the three tower field constructions, Tower Field 2 with the type-I optimal normal basis is the best choice in most cases for WG-8.
Future work:
- Explore more efficient tower field constructions for larger WG stream ciphers, e.g., WG-10, WG-11, WG-14, and for other lightweight stream ciphers.
- Apply further architecture optimizations, such as pipelining and reuse techniques, to improve speed, area, and power consumption.

Q & A