P V Sriniwas Shastry et al., Int. J. Computer Technology & Applications, Vol 5 (1),

On-The-Fly AES Key Expansion For All Key Sizes on ASIC

P. V. Sriniwas Shastry 1, M. S. Sutaone 2
1 Cummins College of Engineering for Women, Pune; 2 College of Engineering, Pune
pvs.shastry@cumminscollege.in

Abstract

This paper proposes the design and implementation of On-The-Fly (OTF) computation of the round keys of the Advanced Encryption Standard (AES) for all key sizes. The OTF architecture generates 128-bit round keys for input cipher keys of 128, 192 and 256 bits. The implementation targets 180nm CMOS technology using standard cell libraries. The key expansion unit is designed so that it can be used for both AES encryption and decryption. The design was clocked at 179MHz and generates 128-bit round keys at a throughput of 22.912Gbps.

Key words: On-The-Fly Key Expansion, AES, Very Large Scale Integration (VLSI), All key sizes.

1. Introduction

The Advanced Encryption Standard (AES) is a symmetric-key block cipher [1]. The rapidly growing need for secure data communication on mobile computing platforms and portable devices has increased the demand for hardware implementations of strong encryption standards such as AES. Hardware implementations of AES are more reliable and offer more security against attacks. The need for higher operating speed and higher security has led many researchers to implement cryptographic algorithms on FPGA and ASIC platforms. Researchers have implemented AES using rolled, pipelined and sub-pipelined architectures. Several published AES implementations target very low area, while others target high throughput. Rolled architectures have achieved minimal silicon area and low power, whereas pipelined architectures have achieved throughputs of several tens of Gbps.
Further improvements were obtained in these architectures by optimizing the substitute byte (S-box) and MixColumns operations of AES. In OTF computation, the round keys required by the encryption or decryption block are generated in the key expansion unit without any memory to store them [2]. Instead of dedicated key expansion units for each key length, an architecture that supports all key lengths, combined with a key generation process serving both encryption and decryption, can significantly reduce the hardware cost of a full-key-length AES [3]. Computing the substitute byte on the fly with composite field arithmetic, which reduces the complexity of computing the multiplicative inverse in GF(2^8), has further reduced power consumption and increased speed [2][3]. Implementing the substitute byte function involves handling the nonlinearity of the multiplicative inverse of the input byte. Substitute byte is a byte-wise function, so an AES implementation with a 128-bit data path requires sixteen such functions operating concurrently; the substitute byte operation is also needed concurrently during key expansion.

In this paper we present an OTF architecture for round key generation for all cipher key sizes. The substitute byte operation is also implemented with combinational logic and hence does not require memory elements. The design uses limited resources, with only one 256-bit register for all key sizes. The rest of the paper is organized as follows: Section 2 describes the key expansion unit, Section 3 presents our proposed architecture, Section 4 gives the results and compares them with those of others, and Section 5 concludes the work.

2. Key Expansion for AES

The key expansion unit of AES takes a cipher key and runs a key expansion routine that generates the round keys required, based on the size of the original cipher key.
The key expansion routine generates the 128-bit round keys required by the AddRoundKey operation of encryption, or the InvAddRoundKey operation of decryption, from a 128-bit, 192-bit or 256-bit input cipher key. The number of rounds (Nr) depends on the key size, as listed in Table I. Nb is the number of 32-bit words in each round key. The key expansion unit performs the RotWord, SubWord and XOR-with-RCON operations, which are explained below.

RotWord is a cyclic left rotation of the bytes within a word. It is applied only to the last word of each Nk-word group. Let the 4-byte words be denoted w[i], with 0 ≤ i < Nb(Nr+1); RotWord is then applied to the word w[i-1] whenever i mod Nk = 0. The value of Nk is 4, 6 or 8 for 128-bit, 192-bit or 256-bit cipher keys respectively.

SubWord is the SubstituteByte transformation applied independently to each byte of the word w[i-1] after RotWord has been performed. For 256-bit cipher keys there is one additional case: SubWord is applied to w[i-1] both when i mod Nk = 0 (after RotWord) and, without RotWord, when i mod Nk = 4.

RCON is the round constant word that is XORed with the substituted word after SubWord. The RCON array is Rcon[i] = [x^(i-1), {00}, {00}, {00}], where i starts from 1 rather than 0 and the values x^(i-1) are powers of x, denoted {02}, in GF(2^8). Each word w[i] is the XOR of the word Nk positions earlier, w[i-Nk], with the previous word w[i-1], transformed as above where required. Refer to Figure 1.

The key expansion may process 128, 192 or 256 bits in each iteration, but the round key supplied to AddRoundKey in encryption, or to InvAddRoundKey in decryption, is always 128 bits. This is because the encryption and decryption data path is always 128 bits deep, while the width of the key expansion path differs with the key size.

TABLE I. KEY EXPANSION COMBINATIONS

Cipher Key Size | Rounds (Nr) | Words per expansion (Nk) | Words per round key (Nb)
128-bit | 10 | 4 | 4
192-bit | 12 | 6 | 4
256-bit | 14 | 8 | 4

Figure 2 illustrates the derivation of the round keys from the expanded keys. In all there are Ne key expansion iterations, depending on the key size, where Ne is given by equation (1):

$$N_e = \frac{N_r \, N_b}{N_k} \tag{1}$$

Substituting Nr, Nb and Nk from Table I gives Ne = 10, 8 and 7 for 128-bit, 192-bit and 256-bit keys respectively.

The decryption data path requires the round keys in reverse order. Hence the round keys expanded during encryption are normally stored in memory so that they can be retrieved in reverse order during decryption.

For 128-bit and 192-bit keys:

$$
w[i] = \begin{cases}
\mathrm{SubWord}(\mathrm{RotWord}(w[i-1])) \oplus \mathrm{Rcon}[i/N_k] \oplus w[i-N_k], & i \bmod N_k = 0 \\
w[i-1] \oplus w[i-N_k], & \text{other values of } i
\end{cases}
$$

For 256-bit keys:

$$
w[i] = \begin{cases}
\mathrm{SubWord}(\mathrm{RotWord}(w[i-1])) \oplus \mathrm{Rcon}[i/N_k] \oplus w[i-N_k], & i \bmod N_k = 0 \\
\mathrm{SubWord}(w[i-1]) \oplus w[i-N_k], & i \bmod N_k = 4 \\
w[i-1] \oplus w[i-N_k], & \text{other values of } i
\end{cases}
$$

Figure 1. Computations of key expansions

Figure 2. Key expansion for different key sizes: (a) 128-bit key, (b) 192-bit key, (c) 256-bit key

3. Proposed Architecture for OTF Key Expansion

Our proposed architecture uses a 256-bit register, which temporarily holds the round keys. The size of the register is chosen to accommodate the expanded keys of all three key sizes. The round keys are generated over multiple iterations, and after every iteration the 128-bit round key needed for AddRoundKey is placed in the upper half of the register. As shown in Figure 3, multiplexers swap the key expanded in the previous iteration so as to place the round key in the upper half of the 256-bit register. A common architecture is designed for all three key sizes. The most critical part of the architecture is managing the different number of expansion iterations for each key size while keeping the round key size at 128 bits.

We assume that the encryption and decryption data path uses a rolled architecture in which every clock event completes one round of encryption (or decryption). The key expansion unit therefore has to generate one round key per clock cycle, and this requirement applies to all three key sizes. As shown in Figure 1, specific words undergo RotWord and SubWord and are then XORed with RCON. Round key generation per clock cycle is based on the 128-bit key expansion procedure. To match the timing for the different key sizes, the original key and the subsequent round keys are shuffled after every clock cycle. The advantage of this data shuffling is that only four data processing elements are required to complete key expansion for all three key sizes [7]. Figure 3 shows this arrangement and the key expansion architecture. In our architecture, the control signals that select the multiplexer data lines are generated by a sequential machine; no processor is employed, unlike in [7].

A round counter is maintained to generate the select lines of the multiplexers. For 128-bit keys, each clock cycle generates one round key through one expansion iteration, so a total of 10 clock cycles is needed. For 192-bit keys, every three clock cycles generate three round keys through two expansion iterations, so 12 clock cycles are required. For 256-bit keys, every two consecutive clock cycles generate two round keys through one expansion iteration, so a total of 14 clock cycles is used. These iteration counts and clock cycles match exactly those of the encryption and decryption data paths.

Figure 3(a) shows the swapping of words for 192-bit and 256-bit key expansion. For 128-bit key expansion no swapping is needed, so the data lines connect straight down to the corresponding word. In 128-bit key expansion, the computation of the words w[4], w[8], w[12], w[16], etc. involves the extra operations RotWord, SubWord and XOR with RCON. Similarly, in 192-bit expansion the words w[6], w[12], w[18], w[24], etc. involve the same extra operations. In 256-bit expansion, the words w[8], w[16], w[24], w[32], ..., w[56] involve RotWord, SubWord and XOR with RCON, while the words w[12], w[20], w[28], w[36], ..., w[52] involve only SubWord. The word multiplexers in Figure 3(b) select either the input cipher key or the swapped data word from the previous key expansion iteration, based on the swapping strategy shown in Figure 3(a). The architecture also performs reverse expansion of the round keys for the decryption data path.

Figure 3. (a) Data swapping strategy (b) All-key-size key expansion architecture
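The recurrence of Figure 1, the word indices listed above, and the reverse expansion used for decryption can be cross-checked against a small software model. The Python sketch below is such a behavioral model, written under stated assumptions: it follows FIPS-197 [1], computes the S-box by brute-force inversion in GF(2^8) followed by the affine transform (not the composite-field logic used in hardware), and does not model the 256-bit register, the word swapping or the multiplexers. All function names (`expand_key`, `reverse_expand`, `extra_op_words`, and so on) are hypothetical.

```python
# Behavioral sketch of FIPS-197 key expansion (Nk = 4, 6, 8) and its reverse.
# Hypothetical helper names; no model of the register, swapping or multiplexers.

def xtime(a: int) -> int:
    """Multiply by x ({02}) in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def sbox_byte(b: int) -> int:
    """S-box: multiplicative inverse (0 maps to 0), then the affine transform.
    The inverse is found by brute force here, not by composite-field logic."""
    inv = next((c for c in range(1, 256) if gf_mul(b, c) == 1), 0)
    s = inv
    for k in range(1, 5):
        s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF  # XOR of left rotations by 1..4
    return s ^ 0x63


SBOX = [sbox_byte(b) for b in range(256)]
RCON = [0x01]
for _ in range(9):
    RCON.append(xtime(RCON[-1]))  # powers of x: 01, 02, 04, ..., 1b, 36
NR = {4: 10, 6: 12, 8: 14}  # Nk -> Nr, as in Table I (Nb = 4)


def rot_word(w: int) -> int:
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF  # cyclic left rotation by one byte


def sub_word(w: int) -> int:
    return int.from_bytes(bytes(SBOX[x] for x in w.to_bytes(4, "big")), "big")


def transform(i: int, prev: int, nk: int) -> int:
    """Value XORed with w[i-Nk] to form w[i] (the cases of Figure 1)."""
    if i % nk == 0:
        return sub_word(rot_word(prev)) ^ (RCON[i // nk - 1] << 24)
    if nk == 8 and i % nk == 4:
        return sub_word(prev)
    return prev


def expand_key(key: bytes) -> list[int]:
    """Forward expansion: returns all Nb(Nr+1) 32-bit words."""
    nk = len(key) // 4
    if len(key) % 4 or nk not in NR:
        raise ValueError("key must be 16, 24 or 32 bytes")
    w = [int.from_bytes(key[4 * j:4 * j + 4], "big") for j in range(nk)]
    for i in range(nk, 4 * (NR[nk] + 1)):
        w.append(w[i - nk] ^ transform(i, w[i - 1], nk))
    return w


def reverse_expand(tail: list[int], nk: int) -> list[int]:
    """Rebuild every word from the last Nk words, running the recurrence
    backwards: w[i-Nk] = w[i] ^ transform(i, w[i-1])."""
    total = 4 * (NR[nk] + 1)
    w = [0] * (total - nk) + list(tail)
    for i in range(total - 1, nk - 1, -1):
        w[i - nk] = w[i] ^ transform(i, w[i - 1], nk)
    return w


def round_key(w: list[int], r: int) -> bytes:
    """128-bit round key r is made of words 4r .. 4r+3."""
    return b"".join(x.to_bytes(4, "big") for x in w[4 * r:4 * r + 4])


def extra_op_words(nk: int) -> tuple[list[int], list[int]]:
    """Indices i whose w[i] needs RotWord+SubWord+RCON, and those needing SubWord only."""
    idx = range(nk, 4 * (NR[nk] + 1))
    return [i for i in idx if i % nk == 0], [i for i in idx if nk == 8 and i % nk == 4]


w = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(round_key(w, 10).hex())                             # d014f9a8c9ee2589e13f0cc8b6630ca6
print(reverse_expand(w[-4:], 4) == w)                     # True
print([len(extra_op_words(nk)[0]) for nk in (4, 6, 8)])   # [10, 8, 7], i.e. Ne
```

Because the recurrence can be run backwards, the last Nk words are enough to regenerate every earlier round key in reverse order without storing the schedule. For 192-bit and 256-bit keys those last Nk words span more than the final 128-bit round key, so a reverse expansion has to start from the last 192 or 256 bits of the schedule.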

In our proposed architecture, splitting the 256-bit data shuffling multiplexers of [7] into word multiplexers reduces power consumption, because the unselected multiplexers remain inactive, resulting in lower dynamic power consumption. Input port 0 of each multiplexer receives the cipher key supplied by the user; port 1 is used for 128-bit expansion, port 2 for 192-bit expansion and port 3 for 256-bit expansion.

The design was synthesized using Cadence RTL Compiler with 180nm standard cell libraries. The design was successfully clocked at 179MHz with a worst-case slack of 495ps. Irrespective of the key size, every clock cycle generates one 128-bit round key, giving a throughput of 22.91Gbps, calculated using equation (2):

$$\text{Throughput} = 128 \times \text{Clock Frequency} \tag{2}$$

The synthesis results are presented in Table II. The physical layout on 180nm was performed using Cadence SoC Encounter; the complete design fits in an area of 61153 µm² with a core density of 70%.

TABLE II. SYNTHESIS RESULTS

Particulars | Values
Standard cell instances | 3231
Standard cell area | 16273
Power dissipation | 1.79mW
Slack | 495ps
Clock frequency | 179MHz
Physical area (physical layout) | 61153 µm²

4. Results and Comparison

We implemented the OTF key expansion for all key sizes using TSMC 180nm cell libraries, and we compare our implementation results in Table III. The design in [7] is a similar implementation, clocked at 102MHz and achieving approximately 13.056Gbps. The design in [8] also implemented an OTF key expansion unit, but only for the 128-bit key size. Another similar implementation for different key sizes was proposed in [4], but it was implemented in 250nm technology and has a gate count of 26,639, considerably higher than ours.
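The key-size-independent rate of one 128-bit round key per clock cycle, and the throughput values compared here and in Table III, can be reproduced with simple arithmetic. The Python sketch below assumes (an assumption for illustration, not a detail given in the paper) that round key r, i.e. words 4r to 4r+3, must be complete at clock cycle r; `iterations_needed` and `throughput_gbps` are illustrative names. It prints the cumulative number of expansion iterations required at each cycle and applies equation (2) to the 66, 102 and 179 MHz clock frequencies of [4], [7] and this design.

```python
# Arithmetic check of the per-cycle schedule and of equation (2).
# Assumption: round key r (words 4r..4r+3, Nb = 4) must be ready at cycle r, r = 1..Nr.
NB = 4
PARAMS = {128: (4, 10), 192: (6, 12), 256: (8, 14)}  # key bits -> (Nk, Nr), Table I


def iterations_needed(r: int, nk: int) -> int:
    """Cumulative expansion iterations (Nk new words each) needed before round
    key r is complete; the cipher key itself supplies words 0..Nk-1."""
    missing = NB * (r + 1) - nk          # words required beyond the cipher key
    return max(0, -(-missing // nk))     # integer ceiling division


def throughput_gbps(freq_mhz: float) -> float:
    """Equation (2): 128 bits per clock cycle, converted from Mbps to Gbps."""
    return 128 * freq_mhz / 1000


for bits, (nk, nr) in PARAMS.items():
    schedule = [iterations_needed(r, nk) for r in range(1, nr + 1)]
    print(f"{bits}-bit: {nr} cycles, cumulative iterations {schedule}")

for design, mhz in {"[4]": 66, "[7]": 102, "Ours": 179}.items():
    print(f"{design}: {throughput_gbps(mhz):.3f} Gbps")
```

Under this model the count advances by two over every three cycles for 192-bit keys and by one every two cycles for 256-bit keys, in line with the schedule described in Section 3, and the final count for each key size equals Ne from equation (1). The computed throughputs are 8.448, 13.056 and 22.912 Gbps.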
The design proposed in [6] was also implemented in 180nm technology, but used a pipelined architecture for 128-bit OTF key expansion; it also used a 32-bit data path and achieved 10.656Gbps.

TABLE III. IMPLEMENTATION COMPARISON

Particulars | [4] | [7] | Ours
CMOS technology | 250nm | 180nm | 180nm
Frequency (MHz) | 66 | 102 | 179
Throughput (Gbps) | 8.448 | 13.056 | 22.912
Gates | 26,639 | 26,639 | 16,284
Key sizes | 128, 192 and 256-bit | 128, 192 and 256-bit | 128, 192 and 256-bit
Data path depth | 128 bits | 128 bits | 128 bits

5. Conclusion

We have presented a new optimization method for implementing On-The-Fly key expansion for all key sizes in 180nm technology: the multiplexers are split into word multiplexers that are kept inactive when not in use, particularly while 128-bit and 192-bit key expansion is performed. This has reduced not only the number of gates required but also the dynamic power consumption.

6. References

[1] "Advanced Encryption Standard (AES)", Federal Information Processing Standards Publications (FIPS PUBS) Publication 197, November 2001.
[2] Qingfu Cao, Shuguo Li, "A high throughput cost-effective ASIC implementation of the AES algorithm", Proc. IEEE 8th International Conference on ASIC (ASICON) 2009, pp. 805-808.
[3] Po-Chun Lie, Chang Hsie-Chia, Chen-Yi Lee, "A 1.69Gbps area-efficient AES crypto core with compact on-the-fly key expansion unit", Proc. ESSCIRC 2009, pp. 404-407.
[4] Chih-Pin Su, Chia-Lung Horng, Chih-Tsun Huang and Cheng-Wen Wu, "A configurable AES processor for enhanced security", Proc. ASP-DAC 2005, pp. 361-366.
[5] Shen-Fu Hsiao, Ming-Chih Chen, Chia-Shin Tu, "Memory-free low-cost designs of advanced encryption standard using common subexpression elimination for subfunctions in transformations", IEEE Trans. Circuits and Systems I: Regular Papers, Vol. 53, No. 3, March 2006, pp. 615-626.
[6] P. Saravanan, N. Renukadevi, G. Swathi, P. Kalpana, "A high-throughput ASIC implementation of configurable advanced encryption standard (AES)", IJCA Special Issue on Network Security and Cryptography (NSC), 2011.
[7] Mao-Yin Wang, Chih-Pin Su, Chia-Lung Horng, Chen-Wen Wu, Chih-Tsun Huang, "Single- and multi-core configurable AES architectures for flexible security", IEEE Trans. on Very Large Scale Integration (VLSI) Systems, Vol. 18, No. 4, April 2010, pp. 541-551.
[8] A. Alma'aitah, Zine-Eddine Abid, "Area efficient-high throughput sub-pipelined design of the AES in CMOS 180nm", Proc. 5th International Design and Test Workshop (IDT), 2010, pp. 31-36.