Comparison of the Hardware Performance of the AES Candidates Using Reconfigurable Hardware


Master's Thesis
Pawel Chodowiec, MS CpE Candidate, ECE, George Mason University
Advisor: Dr. Kris Gaj, ECE, George Mason University

Outline
- Introduction to AES contest
- Introduction to FPGAs and Hardware Implementations
- Modes of Operation
- Implementations in Basic Iterative Architecture
- Implementations in Pipelined Architecture
- Summary

Most Popular Secret-Key Ciphers
American standards:
- 1977: DES, 56-bit key
- 1999: Triple DES, 112- and 168-bit keys
- 2001: AES (Rijndael), 128-, 192-, and 256-bit keys, selected in the AES contest
Other popular algorithms: IDEA, Blowfish, RC5, CAST, Serpent, Twofish, RC6, Mars

Deep Crack
Electronic Frontier Foundation, 1998
- Total cost: $220,000
- Average time of search: 4.5 days/key
- 1800 ASIC chips, 40 MHz clock
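The figures on this slide imply a key-search rate that can be worked out directly. The sketch below assumes that an average search covers half of the 56-bit DES key space; the variable names are illustrative only.

```python
# Implied key-search rate of Deep Crack, from the figures on the slide.
KEY_BITS = 56            # DES key length
AVG_SEARCH_DAYS = 4.5    # average time of search per key
CHIPS = 1800             # number of ASIC chips
CLOCK_HZ = 40e6          # 40 MHz clock

# Assumption: an average search tests half of all possible keys.
avg_keys_tested = 2 ** (KEY_BITS - 1)
seconds = AVG_SEARCH_DAYS * 24 * 3600

keys_per_second = avg_keys_tested / seconds
keys_per_chip_per_second = keys_per_second / CHIPS
keys_per_chip_per_cycle = keys_per_chip_per_second / CLOCK_HZ

print(f"{keys_per_second:.2e} keys/s in total")
print(f"{keys_per_chip_per_second:.2e} keys/s per chip")
print(f"{keys_per_chip_per_cycle:.2f} keys per chip per clock cycle")
```

Under that assumption each chip has to test slightly more than one key per clock cycle on average, which suggests several search units working in parallel on every chip.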

AES Contest - NIST Evaluation Criteria
- Security
- Software Efficiency
- Hardware Efficiency
- Flexibility

AES Contest Effort
- June 1998: 15 candidates from USA, Canada, Belgium, France, Germany, Norway, UK, Israel, Korea, Japan, Australia, Costa Rica
- Round 1: security, software efficiency
- August 1999: 5 final candidates (Mars, RC6, Rijndael, Serpent, Twofish)
- Round 2: security, hardware efficiency
- October 2000: 1 winner, Rijndael (Belgium)

Hardware Efficiency Comparisons
- Government and large companies: ASIC
- Academia and small business: FPGA
Groups: NSA, Mitsubishi, USC, WPI, GMU, UC Berkeley, IBM, MICRONIC

Which way to go?
ASICs:
- High performance
- Low power
- Low cost (but only in high volumes)
FPGAs:
- Off-the-shelf
- Low development costs
- Short time to the market
- Reconfigurability

Reconfigurability
External ROM and microprocessor enable changing an FPGA function in several milliseconds
- Encryption vs. decryption vs. key scheduling: one FPGA is reconfigured between key scheduling, encryption, and decryption, 5-15 ms per change
- Various algorithms: one FPGA is reconfigured between AES, Triple DES, and IDEA, 5-15 ms per change

Target FPGA devices
Xilinx Virtex - XCV 1000
- 0.22 µm CMOS process
- 12 288 CLB slices
- 32 4-kbit block RAMs
- 1 million equivalent logic gates
- Up to 200 MHz clock
[Figure: Configurable Logic Block slices (CLB slices), programmable interconnects, block RAMs]

Methodology and Tools
Starting point: code in VHDL
1. Functional simulation (verification): Aldec, Active-HDL
2. Synthesis and implementation: Xilinx, Foundation Series v. 2.1; produces a netlist with timing and a bitstream
3. Timing simulation (verification) of the netlist with timing: Aldec, Active-HDL
4. Experimental testing (verification) of the bitstream: USC-ISI, SLAAC-1V FPGA board

Top level block diagram
[Block diagram. Signals: control, input/key, output. Blocks: control unit, input interface, key scheduling, encryption/decryption, memory of internal keys, output interface.]

Primary factor in choosing the encryption/decryption unit architecture
Symmetric-key cipher mode of operation:
1. Non-feedback cipher modes: ECB, counter mode
2. Feedback cipher modes: CBC, CFB, OFB

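Non-feedback Counter Mode - CTR
[Figure: counter values IV, IV+1, IV+2, ..., IV+N are each encrypted by E and combined with message blocks M_0, M_1, ..., M_N to give ciphertext blocks C_0, C_1, ..., C_N.]
$C_i = M_i \oplus \mathrm{AES}(IV + i)$ for $i = 0..N$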
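The counter-mode relation can be sketched in a few lines. The block function below is a hypothetical stand-in built from SHA-256, not AES, and the helper names are invented for illustration; the point is only that each ciphertext block depends on its own index and message block, so blocks can be processed in any order.

```python
import hashlib

BLOCK_BYTES = 16  # 128-bit blocks, as in AES


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for AES: a keyed 128-bit function (not a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_block(key: bytes, iv: int, i: int, m_i: bytes) -> bytes:
    """C_i = M_i XOR E(IV + i); no other block is needed to compute it."""
    counter = ((iv + i) % (1 << 128)).to_bytes(BLOCK_BYTES, "big")
    return xor(m_i, toy_block_encrypt(key, counter))


def ctr_encrypt(key: bytes, iv: int, blocks: list[bytes]) -> list[bytes]:
    return [ctr_block(key, iv, i, m) for i, m in enumerate(blocks)]


key = b"k" * 16
iv = 2024
message = [bytes([i]) * BLOCK_BYTES for i in range(4)]

ciphertext = ctr_encrypt(key, iv, message)
# Blocks can be processed in any order (or in parallel) with the same result.
reordered = {i: ctr_block(key, iv, i, message[i]) for i in (3, 1, 0, 2)}
# In counter mode, decryption applies the same operation to the ciphertext.
recovered = ctr_encrypt(key, iv, ciphertext)
```

This independence between blocks is what allows a pipelined encryption unit to keep several blocks in flight at once.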

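Feedback cipher modes - CBC
[Figure: message blocks M_1, M_2, ..., M_N; each block is combined with the previous ciphertext block (IV for the first block) and encrypted by E to give C_1, C_2, ..., C_N.]
$C_1 = \mathrm{AES}(M_1 \oplus IV)$
$C_i = \mathrm{AES}(M_i \oplus C_{i-1})$ for $i = 2..N$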
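The feedback dependency in CBC can be illustrated with a small sketch. The block cipher here is a hypothetical 4-round Feistel construction built from SHA-256, used only because it is invertible and needs no external library; it is not AES, and all names are illustrative.

```python
import hashlib

HALF = 8  # the toy cipher works on 16-byte blocks split into two 8-byte halves


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def round_fn(key: bytes, r: int, half: bytes) -> bytes:
    return hashlib.sha256(key + bytes([r]) + half).digest()[:HALF]


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical 4-round Feistel cipher standing in for AES (illustration only)."""
    left, right = block[:HALF], block[HALF:]
    for r in range(4):
        left, right = right, xor(left, round_fn(key, r, right))
    return left + right


def toy_decrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for r in reversed(range(4)):
        left, right = xor(right, round_fn(key, r, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, blocks: list[bytes]) -> list[bytes]:
    out, prev = [], iv
    for m in blocks:
        prev = toy_encrypt(key, xor(m, prev))  # C_i = E(M_i XOR C_{i-1}), first block uses IV
        out.append(prev)
    return out


def cbc_decrypt(key: bytes, iv: bytes, blocks: list[bytes]) -> list[bytes]:
    out, prev = [], iv
    for c in blocks:
        out.append(xor(toy_decrypt(key, c), prev))
        prev = c
    return out


key, iv = b"k" * 16, bytes(16)
message = [bytes([i]) * 16 for i in range(4)]
ciphertext = cbc_encrypt(key, iv, message)
recovered = cbc_decrypt(key, iv, ciphertext)
# Changing the first message block changes every later ciphertext block too.
altered = cbc_encrypt(key, iv, [b"\xff" * 16] + message[1:])
```

Because the encryption of block i cannot start until C_{i-1} is available, a pipelined unit cannot be kept full with blocks from a single CBC stream.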

Feedback cipher modes CBC, CFB, OFB

Typical Structure of a Block Cipher
1. Initial transformation, using Round Key[0]
2. i := 1
3. Cipher round, using Round Key[i]; then i := i+1
4. Repeat step 3 while i <= #rounds (#rounds times in total)
5. Final transformation, using Round Key[#rounds+1]
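The flow on this slide maps onto a short generic routine. The sketch below is only a skeleton: the transformations passed in are hypothetical toy operations on 32-bit words, chosen to make the example runnable, and do not correspond to any of the AES candidates.

```python
from typing import Callable, Sequence

Transform = Callable[[int, int], int]  # (state, round_key) -> new state


def encrypt_block(
    state: int,
    round_keys: Sequence[int],
    n_rounds: int,
    initial: Transform,
    cipher_round: Transform,
    final: Transform,
) -> int:
    """Initial transformation, n_rounds cipher rounds, then final transformation."""
    assert len(round_keys) == n_rounds + 2
    state = initial(state, round_keys[0])
    for i in range(1, n_rounds + 1):
        state = cipher_round(state, round_keys[i])
    return final(state, round_keys[n_rounds + 1])


# Hypothetical toy transformations on 32-bit words, for illustration only.
MASK = 0xFFFFFFFF


def add_key(state: int, key: int) -> int:
    return state ^ key


def toy_round(state: int, key: int) -> int:
    rotated = ((state << 5) | (state >> 27)) & MASK  # rotate left by 5 bits
    return (rotated + key) & MASK


keys = [(0x11111111 * (i + 1)) & MASK for i in range(6)]  # 4 rounds need 6 round keys
ciphertext = encrypt_block(0x01234567, keys, 4, add_key, toy_round, add_key)
```

In the basic iterative architecture the body of the loop corresponds to the one round of combinational logic, and the loop itself to the register and multiplexer that feed the result back.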

Basic iterative architecture
[Figure: multiplexer, register, and combinational logic implementing one round.]

Architectures suitable for feedback modes
[Figure, three panels: (1) MUX, register, and combinational logic for one round; (2) MUX and combinational logic for rounds 1 to K; (3) combinational logic for rounds 1 to #rounds.]

Partial Loop Unrolling
[Figure: multiplexer, register, and combinational logic implementing K rounds (round 1, round 2, ..., round K).]

Loop Unrolling: Speed vs. Area
[Plot: throughput versus area for the basic architecture, loop unrolling (k = 2, 3, 4, 5), and resource sharing.]
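The shape of the speed versus area trade-off can be approximated with a simple timing model. The sketch below assumes that the clock period of an architecture with k unrolled rounds is k round delays plus a fixed register and multiplexer overhead, and that area grows linearly with k. All delay and area numbers are hypothetical placeholders, not measurements from this work.

```python
BLOCK_BITS = 128


def unrolled(k, n_rounds=16, t_round_ns=20.0, t_overhead_ns=4.0,
             round_area=1000, overhead_area=200):
    """Throughput (Mbit/s) and area of an architecture with k unrolled rounds.

    All delay and area figures are hypothetical placeholders.
    """
    if n_rounds % k:
        raise ValueError("k must divide the number of rounds")
    clock_ns = k * t_round_ns + t_overhead_ns  # k rounds of logic between registers
    cycles = n_rounds // k                     # clock cycles needed per block
    throughput = BLOCK_BITS / (cycles * clock_ns) * 1000  # bits/ns to Mbit/s
    area = k * round_area + overhead_area
    return throughput, area


for k in (1, 2, 4, 8, 16):
    t, a = unrolled(k)
    print(f"k={k:2d}  throughput={t:6.1f} Mbit/s  area={a:6d}  ratio={t / a:.3f}")
```

Under these assumptions unrolling only removes the per-cycle overhead, so throughput rises slowly while area grows almost in proportion to k and the throughput/area ratio falls.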

MARS - General Structure
[Block diagram: plaintext, addition of subkey, forward mixing, keyed forward transformation (with subkeys), keyed backwards transformation (with subkeys), backwards mixing, subtraction of subkey, ciphertext.]

MARS - Keyed Transformation
[Block diagram. Inputs: D3 to D0 from the mixing transformation and D3 to D0 from the loop. Components: optional swap, optional rotation to the right, 128-bit register, optional swap, keyed transformation core. Outputs: D3 to D0 fed back to the loop.]

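MARS - Keyed Transformation Core
[Block diagram: four 32-bit words D3, D2, D1, D0; the E-function with its three outputs (1, 2, 3); additions/subtractions; rotations by 13 bits (>>>13, <<<13).]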

MARS E-function
[Block diagram: S-box lookup (S), subkeys K[4+2i] and K[5+2i], multiplication (*), rotations (<<<, <<< 13, <<< 5), and the three outputs 1, 2, 3.]

RC6 Round
[Block diagram: four 32-bit registers R0, R1, R2, R3; two F blocks; rotations (<<<); additions of subkeys S[2i] and S[2i+1]; selection points labeled d and e; feedback to R0, R1, R2, and R3.]

Rijndael - Different Circuits for Encryption and Decryption
[Block diagram. Encryption path, plaintext to ciphertext: ByteSub, ShiftRow, MixColumn, subkey. Decryption path, ciphertext to plaintext: InvMixColumn, InvShiftRow, InvByteSub, subkey.]

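Rijndael - Resource Sharing Between Encryption and Decryption
[Block diagram: the inversion of an element in the Galois field is shared between both directions; decryption applies the inverse affine transformation before it, and encryption applies the affine transformation after it. ShiftRow and MixColumn serve encryption; InvShiftRow and InvMixColumn serve decryption; each path uses a subkey.]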

First basic architecture of Serpent - Serpent I1
[Block diagram: 128-bit register; eight banks of 32 S-boxes (32 x S-box 0 to 32 x S-box 7) feeding an 8-to-1 128-bit multiplexer; linear transformation; keys Ki and K32; 128-bit data paths; output. The logic implements one regular Serpent round.]

Alternative basic architecture of Serpent: Serpent I8
[Block diagram: 128-bit register followed by round 0 to round 7, each consisting of 32 copies of its S-box (32 x S-box 0 to 32 x S-box 7) and a linear transformation; keys K0 to K7 and K32; 128-bit output.]
One implementation round of Serpent = 8 regular cipher rounds

Twofish Encryption/Decryption Round
[Block diagram: F-function and 1-bit rotations (<<<1, >>>1).]

Twofish F-function
[Block diagram: two h functions, each a network of q0 and q1 permutations followed by an MDS matrix, one keyed with M2 and M0 and the other with M3 and M1; rotations <<< 8 and <<< 9; PHT; additions of subkeys K 2i and K 2i+1.]

My Results: Basic architecture - Speed
Throughput [Mbit/s]
Serpent   431
Rijndael  414
Twofish   177
RC6       142
Mars       61
3DES       59

My Results: Basic architecture - Area
Area [CLB slices]
Twofish   1076
RC6       1137
Rijndael  2507
Mars      2744
Serpent   4507
3DES       356
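The two basic-architecture charts can be combined into a throughput/area ratio. The sketch below uses the values as read from the speed and area slides, assuming the bars pair with the cipher names in the order shown on each chart.

```python
# Values read from the two "basic architecture" result slides.
throughput_mbit_s = {"Serpent": 431, "Rijndael": 414, "Twofish": 177,
                     "RC6": 142, "Mars": 61, "3DES": 59}
area_clb_slices = {"Twofish": 1076, "RC6": 1137, "Rijndael": 2507,
                   "Mars": 2744, "Serpent": 4507, "3DES": 356}

# Throughput per unit of area, in Mbit/s per CLB slice.
ratio = {name: throughput_mbit_s[name] / area_clb_slices[name]
         for name in throughput_mbit_s}

for name, value in sorted(ratio.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:8s} {value:.3f} Mbit/s per CLB slice")
```

With these numbers Rijndael has the highest ratio among the five AES finalists, Twofish is close behind, and Serpent pays for its speed with the largest area.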

Comparison with results of other groups: Speed
[Bar chart: throughput in Mbit/s for Serpent I8, Rijndael, Twofish, Serpent I1, RC6, and Mars, as reported in my results, by the University of Southern California, and by Worcester Polytechnic Institute.]

Comparison with results of other groups: Area
[Bar chart: area in CLB slices for Twofish, RC6, Serpent I1, Rijndael, Mars, and Serpent I8, as reported in our results, by the University of Southern California, and by Worcester Polytechnic Institute.]

My Results: Encryption in cipher feedback modes (CBC, CFB, OFB) - Virtex FPGA
[Scatter plot: throughput in Mbit/s versus area in CLB slices for Rijndael, Serpent I8, Twofish, Serpent I1, RC6, and Mars.]

NSA Results: Encryption in cipher feedback modes (CBC, CFB, OFB) - ASIC, 0.5 µm CMOS
[Scatter plot: throughput in Mbit/s versus area in mm² for Rijndael, Serpent I1, RC6, Twofish, and Mars.]

Conclusions for feedback cipher modes (1) (CBC, CFB, OFB)
- Speed (throughput) should be the primary criterion of comparison
- Basic iterative architecture is the most appropriate for comparison and future implementations
- Serpent and Rijndael are over twice as fast as the next best candidate for all implementations

Conclusions for feedback cipher modes (2) (CBC, CFB, OFB)
- Results confirmed by
  - three independent university groups for FPGAs, and
  - NSA group for ASICs
- Results of comparison independent of implementation technology (FPGAs vs. ASICs)

Non-Feedback Cipher Modes ECB, counter

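Comparison for non-feedback cipher modes, e.g. Counter Mode - CTR
[Figure: counter values IV, IV+1, IV+2, ..., IV+N are each encrypted by E and combined with message blocks M_0, M_1, ..., M_N to give ciphertext blocks C_0, C_1, ..., C_N.]
$C_i = M_i \oplus \mathrm{AES}(IV + i)$ for $i = 0..N$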

NSA approach: Traditional methodology
[Figure, three panels:
- basic architecture: register, MUX, combinational logic for one round, no pipelining
- partial outer-round pipelining: K registers, MUX; rounds 1 to K, each round = one pipeline stage
- full outer-round pipelining: #rounds registers; rounds 1 to #rounds, each round = one pipeline stage]

My approach: New methodology
[Figure, four panels:
a) basic architecture: register, MUX, combinational logic for one round, no pipelining
b) inner-round pipelining: k registers, MUX; one round = k pipeline stages
c) partial mixed inner- and outer-round pipelining: K × k registers, MUX; rounds 1 to K, each round = k pipeline stages
d) full mixed inner- and outer-round pipelining: #rounds × k registers; rounds 1 to #rounds, each round = k pipeline stages]

My approach: Inner-Round Pipelining
[Figure: multiplexer followed by one round split into k pipeline stages: register 1, pipeline stage 1, register 2, pipeline stage 2, ..., register k, pipeline stage k.]
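A rough timing model shows why inner-round pipelining helps non-feedback modes but not feedback modes. The sketch assumes that one round splits into k equal stages and that each register adds a fixed delay; the delay values and the function name are hypothetical, not taken from the measurements in this work.

```python
BLOCK_BITS = 128


def inner_round_pipelining(k, n_rounds=10, t_round_ns=20.0, t_reg_ns=2.0):
    """Return (non-feedback, feedback) throughput in Mbit/s for k stages per round.

    Delays are hypothetical; the round is assumed to split into k equal stages.
    """
    clock_ns = t_round_ns / k + t_reg_ns
    latency_ns = n_rounds * k * clock_ns  # time for one block to pass all rounds
    # Non-feedback modes: k independent blocks share the pipeline.
    non_feedback = k * BLOCK_BITS / latency_ns * 1000
    # Feedback modes: the next block cannot start before the previous one ends.
    feedback = BLOCK_BITS / latency_ns * 1000
    return non_feedback, feedback


for k in (1, 2, 4, 8):
    nf, fb = inner_round_pipelining(k)
    print(f"k={k}  non-feedback={nf:7.1f} Mbit/s  feedback={fb:6.1f} Mbit/s")
```

In this model the non-feedback throughput grows with k until the register delay dominates, while the feedback throughput only gets worse, which is consistent with using the basic iterative architecture for CBC, CFB, and OFB.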

Comparison of the traditional and new design methodologies (1)
[Plot: throughput versus area for the basic architecture, outer-round pipelining (K = 2, 3, 4), inner-round pipelining (k = 2 up to k opt), and mixed inner- and outer-round pipelining (K = 2, 3).]

Comparison of the traditional and new design methodologies (2)
[Plot: latency versus area for the basic architecture, outer-round pipelining (K = 2, 3, 4, 5), inner-round pipelining (k = 2 up to k opt), and mixed inner- and outer-round pipelining (K = 2, 3).]

NSA architecture: Full outer-round pipelining
[Figure: #rounds registers; round 1 = one pipeline stage, round 2 = one pipeline stage, ..., round #rounds = one pipeline stage.]
Total # of pipeline stages = #rounds

NSA Results: Full outer-round pipelining - CMOS ASIC, 0.5 µm
Throughput [Gbit/s]
Serpent   8.0
Rijndael  5.7
Twofish   2.3
RC6       2.2
Mars      2.2

My approach: Full mixed inner- and outer-round pipelining
[Figure: k registers per round; round 1 = k pipeline stages, round 2 = k pipeline stages, ..., round #rounds = k pipeline stages.]
Total # of pipeline stages = #rounds × k

My Results: Full mixed pipelining - Virtex FPGA
Throughput [Gbit/s]
Serpent   16.8
Twofish   15.2
RC6       13.1
Rijndael  12.2

Speed-up compared to the basic architecture
Cipher       Our results   NSA
Rijndael     29.5          9.5
Serpent I8   39            -
Serpent I1   -             40
Twofish      86            21.5
RC6          91.5          21
Mars         -             38
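The FPGA speed-ups can be cross-checked against the throughput figures on the earlier result slides. The sketch below divides the full mixed pipelining throughput by the basic architecture throughput; it assumes the Serpent basic-architecture figure refers to the I8 variant, and small differences from the slide are expected because the inputs are rounded.

```python
# Throughput values as read from the basic and full mixed pipelining slides.
basic_mbit_s = {"Rijndael": 414, "Serpent": 431, "Twofish": 177, "RC6": 142}
full_mixed_gbit_s = {"Rijndael": 12.2, "Serpent": 16.8, "Twofish": 15.2, "RC6": 13.1}

speed_up = {name: full_mixed_gbit_s[name] * 1000 / basic_mbit_s[name]
            for name in basic_mbit_s}

for name, value in speed_up.items():
    print(f"{name:8s} speed-up of about {value:5.1f}")
```

The results land close to the speed-up values shown on this slide, with RC6 differing by less than one unit.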

My approach: Full Mixed Pipelining
[Figure: Cipher 1 with rounds 1 to 10 and Cipher 2 with rounds 1 to 16, both fully pipelined, with the minimum clock period marked.]
$\text{Speed} = \dfrac{\text{block size}}{\text{min\_clock\_period}}$
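Since a fully pipelined unit outputs one block per clock cycle, the relation on this slide can be inverted to estimate the clock period behind each reported throughput. The sketch assumes the 128-bit AES block size and uses the full mixed pipelining throughputs from the results slide; the function name is illustrative.

```python
BLOCK_BITS = 128  # AES block size in bits


def implied_clock(gbit_s: float) -> tuple[float, float]:
    """Return (minimum clock period in ns, clock frequency in MHz)."""
    period_ns = BLOCK_BITS / gbit_s  # bits / (Gbit/s) gives nanoseconds
    return period_ns, 1000 / period_ns


full_mixed_gbit_s = {"Serpent": 16.8, "Twofish": 15.2, "RC6": 13.1, "Rijndael": 12.2}

for name, gbit_s in full_mixed_gbit_s.items():
    period_ns, clock_mhz = implied_clock(gbit_s)
    print(f"{name:8s} period={period_ns:5.2f} ns  clock={clock_mhz:6.1f} MHz")
```

All four implied clock frequencies fall between roughly 95 and 131 MHz, below the 200 MHz limit quoted for the target device, and the number of rounds does not enter the calculation at all.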

My Results: Full mixed pipelining
Area [CLB slices]
Serpent   19,700
Twofish   21,000
RC6       46,900
Rijndael  12,600, plus 80 RAMs (dedicated memory blocks)
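Dividing the full mixed pipelining throughput by the area gives a cost-efficiency figure for non-feedback modes. The sketch below uses the values as read from the throughput and area slides; it counts CLB slices only, so the 80 block RAMs used by Rijndael are not reflected in its ratio.

```python
# Values as read from the full mixed pipelining throughput and area slides.
throughput_mbit_s = {"Serpent": 16800, "Twofish": 15200, "RC6": 13100, "Rijndael": 12200}
area_clb_slices = {"Serpent": 19700, "Twofish": 21000, "RC6": 46900, "Rijndael": 12600}

# Mbit/s per CLB slice; block RAMs are deliberately left out of the area.
ratio = {name: throughput_mbit_s[name] / area_clb_slices[name]
         for name in throughput_mbit_s}

for name, value in sorted(ratio.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:8s} {value:.2f} Mbit/s per CLB slice")
```

On this measure RC6 is clearly the least cost-efficient of the four, while Serpent, Twofish, and Rijndael are within a factor of about 1.3 of each other.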

Conclusions for non-feedback cipher modes (1) (ECB, counter)
- All ciphers can achieve approximately the same speed
- Area should be the primary criterion of comparison
- Architecture with inner-round pipelining combined with full outer-round pipelining is the most appropriate for comparison and future implementations
- Serpent, Twofish and Rijndael are the most cost-efficient and take approximately the same amount of area

Conclusions for non-feedback cipher modes (2) (ECB, counter)
- No agreement regarding the methodology and architecture used for comparison
- NSA methodology favored ciphers with simple (fast) cipher round (Serpent and Rijndael)
- My methodology:
  - fair
  - practical (superior throughput/area ratio)

Summary (1)
- All five AES finalists implemented in basic architecture
  - The best throughput for four of them
  - The best throughput/area ratio
- Four AES finalists implemented in mixed architecture
  - The highest throughput ever reported

Summary (2)
- Basic iterative architecture is the most appropriate for comparison and future implementations of ciphers in feedback modes of operation
  - Gives best throughput/area ratio
  - Throughput is the comparison criterion
- Mixed architecture is the most appropriate for comparison and future implementations of ciphers in non-feedback modes of operation
  - Area is the comparison criterion
  - Independent of number and complexity of rounds

Summary (3) - My Ranking of Ciphers
1. Serpent, Rijndael: Rijndael offers better throughput/area ratio
2. RC6, Twofish: both ciphers slower in basic architecture
3. MARS: slow and large

Summary (4) - Survey filled by AES3 conference participants
[Bar chart: number of votes for Rijndael, Serpent, Twofish, RC6, and Mars.]

Possible Extensions of this Work
- Study implementations of key schedules
  - Forward and backward key schedules
- Study new modes of operation
  - New contest started by NIST
- Comparison of Stream and Block Ciphers

Questions.