GMU SHA Core Interface & Hash Function Performance Metrics


Interface

Why the Interface Matters

Pin limit:
  Total number of i/o ports <= Total number of FPGA i/o pins

Support for the maximum throughput:
  Time to load the next message block <= Time to process the current block
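The throughput requirement (loading the next message block should take no longer than processing the current one) can be checked with simple arithmetic. The following is a minimal sketch, assuming the block arrives over a w-bit data bus at one word per clock cycle. The function names, parameter names, and the numbers in the usage lines are hypothetical and only illustrate the comparison.

```python
import math


def load_cycles(block_size_bits: int, bus_width_bits: int) -> int:
    """Cycles needed to load one message block (assumption: one bus word per cycle)."""
    return math.ceil(block_size_bits / bus_width_bits)


def interface_supports_max_throughput(block_size_bits: int,
                                      bus_width_bits: int,
                                      block_processing_cycles: int) -> bool:
    """True when loading the next block takes no longer than processing the current one."""
    return load_cycles(block_size_bits, bus_width_bits) <= block_processing_cycles


# Illustrative numbers only: a 512-bit block over a 32-bit bus needs 16 cycles.
print(load_cycles(512, 32))
print(interface_supports_max_throughput(512, 32, 20))
print(interface_supports_max_throughput(1024, 16, 20))
```

A wider bus shortens the load time but consumes more i/o pins, so the two conditions above pull in opposite directions.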

Interface: Two Possible Solutions

[Figure: SHA core input, with labels msg_bitlen, message, zero_word, and end_of_msg]

Solution 1: Length of the message communicated at the beginning
  + easy to implement passive source circuit
  - area overhead for the counter of message bits

Solution 2: Dedicated end-of-message port
  - more intelligent source circuit required
  + no need for internal message bit counter

SHA Core: Interface & Typical Configuration

[Figure: Input FIFO, SHA core, and Output FIFO connected in series, each with clk and rst.
  Input FIFO: din, dout, full, empty, write, read (signals ext_idata, idata, fifoin_full, fifoin_empty, fifoin_write, fifoin_read)
  SHA core: din, dout, src_ready, src_read, dst_ready, dst_write
  Output FIFO: din, dout, full, empty, write, read (signals odata, ext_odata, fifoout_full, fifoout_empty, fifoout_write, fifoout_read)]

- The SHA core is an active component; the surrounding FIFOs are passive and widely available.
- The input interface is separate from the output interface.
- Processing the current block, reading the next block, and storing the result for the previous message can all be done in parallel.

SHA Core Interface

[Figure: SHA core with ports clk, rst, din, dout, src_ready, src_read, dst_ready, dst_write]

SHA Core Interface + Surrounding FIFOs

[Figure: Input FIFO, SHA core, and Output FIFO, all driven by clk and rst. External signals: ext_idata, fifoin_full, fifoin_write on the input side; ext_odata, fifoout_empty, fifoout_read on the output side. Internal signals: idata, fifoin_empty, fifoin_read, src_ready, src_read, odata, fifoout_full, fifoout_write, dst_ready, dst_write]

Operation of FIFO

Communication Protocol for Unpadded Messages

[Figure: two input formats, each a stream of w-bit words]
a) msg_bitlen, message, zero_word
b) seg_0_bitlen, seg_0, seg_1_bitlen, seg_1, ..., seg_n-1_bitlen, seg_n-1, zero_word

SHA Core Interface with Additional Faster I/O Clock

[Figure: SHA core with ports io_clk, clk, rst, din, dout, src_ready, src_read, dst_ready, dst_write]

SHA Core Interface with Two Clocks + Surrounding FIFOs

[Figure: Input FIFO, SHA core, and Output FIFO, driven by io_clk, clk, and rst. External signals: ext_idata, fifoin_full, fifoin_write on the input side; ext_odata, fifoout_empty, fifoout_read on the output side. Internal signals: idata, fifoin_empty, fifoin_read, src_ready, src_read, odata, fifoout_full, fifoout_write, dst_ready, dst_write]

Communication Protocol for Padded Messages Without Message Splitting

[Figure: input stream of w-bit words: msg_len_ap, last = 1, msg_len_bp, message]

msg_len_ap: message length after padding [bits]
msg_len_bp: message length before padding [bits]

Communication Protocol for Padded Messages With Message Splitting

[Figure: input stream of w-bit words: seg_0_len_ap, last = 0, seg_0, seg_1_len_ap, last = 0, seg_1, ..., seg_n-1_len_ap, last = 1, seg_n-1_len_bp, seg_n-1]

seg_i_len_ap: segment i length after padding* [bits]
seg_i_len_bp: segment i length before padding [bits]

* For all i < n-1, segment i length after padding is assumed to be a multiple of the message block size, b (characteristic of each function), and thus also of the word size, w. The last segment cannot consist of only padding bits; it must include at least one message bit.
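The two rules stated for split, padded messages can be expressed as a small check on the segment lengths. This is only a sketch: the function name and the way the lengths are passed in are hypothetical, and the final sanity check (padding never shortens a segment) is an assumption added here rather than a rule taken from the protocol description.

```python
def check_split_message(seg_lens_ap, last_seg_len_bp, block_size_bits):
    """Return a list of rule violations; an empty list means the lengths are acceptable.

    seg_lens_ap      -- seg_i_len_ap for every segment, in order [bits]
    last_seg_len_bp  -- seg_n-1_len_bp, sent only for the last segment [bits]
    block_size_bits  -- message block size b of the hash function [bits]
    """
    if not seg_lens_ap:
        return ["at least one segment is required"]
    problems = []
    # Rule: every segment except the last is a multiple of b after padding.
    for i, len_ap in enumerate(seg_lens_ap[:-1]):
        if len_ap % block_size_bits != 0:
            problems.append(f"seg_{i}_len_ap={len_ap} is not a multiple of b={block_size_bits}")
    # Rule: the last segment must include at least one message bit.
    if last_seg_len_bp < 1:
        problems.append("last segment contains only padding bits")
    # Sanity check (assumption of this sketch): padding never shortens a segment.
    if last_seg_len_bp > seg_lens_ap[-1]:
        problems.append("seg_len_bp exceeds seg_len_ap for the last segment")
    return problems


# Illustrative lengths for a function with b = 512.
print(check_split_message([1024, 512, 512], 100, 512))
print(check_split_message([1000, 512], 100, 512))
```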

Performance Metrics

Performance Metrics - Speed

Throughput for Long Messages [Mbit/s]
Throughput for Short Messages [Mbit/s]
Execution Time for Short Messages [ns]

These metrics allow for easy cross-comparison among implementations in software (microprocessors), FPGAs (various vendors), and ASICs (various libraries).

Performance Metrics - Speed

Time to hash N blocks of message [cycles] = Htime(N)
  The exact formula comes from analysis of the block diagram, confirmed by functional simulation.

Minimum Clock Period [ns] = T
  Taken from a place & route and/or static timing analysis report file.

Time to Hash N Blocks of the Message [clock cycles]

Performance Metrics - Speed

$$\text{Minimum time to hash } N \text{ blocks of message [ns]} = \text{Htime}(N) \cdot T$$

$$\text{Maximum Throughput (for long messages)} = \frac{\text{block\_size}}{T \cdot (\text{Htime}(N+1) - \text{Htime}(N))} = \frac{\text{block\_size}}{T \cdot \text{block\_processing\_time}}$$

Effective maximum throughput for short messages:

Performance Metrics - Speed

$$\text{Maximum Throughput (for long messages)} = \frac{\text{block\_size}}{T \cdot \text{block\_processing\_time}}$$

block_size: from the specification
T: from the place & route report and/or static timing analysis report
block_processing_time: from analysis of the block diagram and/or functional simulation
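The long-message throughput formula can be evaluated numerically once Htime(N) and T are known. The sketch below assumes a purely hypothetical cycle-count formula, Htime(N) = 3 + 20 * N, and an illustrative clock period; neither value comes from a real design, and the function names are invented for this example.

```python
def htime_example(n_blocks: int) -> int:
    """Hypothetical Htime(N): 3 cycles of overhead plus 20 cycles per block."""
    return 3 + 20 * n_blocks


def min_hash_time_ns(n_blocks: int, clock_period_ns: float, htime=htime_example) -> float:
    """Minimum time to hash N blocks [ns] = Htime(N) * T."""
    return htime(n_blocks) * clock_period_ns


def max_throughput_mbit_s(block_size_bits: int, clock_period_ns: float,
                          htime=htime_example, n_blocks: int = 1) -> float:
    """block_size / (T * (Htime(N+1) - Htime(N))), converted to Mbit/s."""
    block_processing_time = htime(n_blocks + 1) - htime(n_blocks)  # cycles per block
    bits_per_ns = block_size_bits / (clock_period_ns * block_processing_time)
    return bits_per_ns * 1000.0  # 1 bit/ns equals 1000 Mbit/s


# Illustrative values: 512-bit block, T = 5 ns.
print(min_hash_time_ns(4, 5.0))
print(max_throughput_mbit_s(512, 5.0))
```

Because Htime(N+1) - Htime(N) cancels any fixed overhead, the long-message throughput depends only on the per-block cycle count, while the overhead still affects the hash time for short messages.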

Performance Metrics - Area

For the basic, folded, and unrolled architectures, we force these vectors to look as follows through the synthesis and implementation options: a single nonzero Area component, with all remaining components forced to 0.

Choice of Optimization Target

Primary Optimization Target: Throughput to Area Ratio

Features:
- practical: good balance between speed and cost
- very reliable guide through the entire design process, facilitating
  - the choice of high-level architecture
  - the implementation of basic components
  - the choice of tool options
- leads to high-speed, close-to-maximum-throughput designs
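As an illustration of using the throughput-to-area ratio as a selection criterion, the sketch below ranks a few candidate architectures. All design names and numbers are made up for the example, and the area unit is assumed to be a single resource count (for instance slices); none of the values are measured results.

```python
def throughput_to_area(throughput_mbit_s: float, area: float) -> float:
    """Throughput to Area Ratio (higher is better)."""
    return throughput_mbit_s / area


def best_design(designs):
    """Return the name of the design with the highest throughput-to-area ratio."""
    return max(designs, key=lambda d: throughput_to_area(d["throughput"], d["area"]))["name"]


# Hypothetical candidates, not real benchmark data.
candidates = [
    {"name": "basic", "throughput": 5120.0, "area": 1500},
    {"name": "folded", "throughput": 2600.0, "area": 900},
    {"name": "unrolled", "throughput": 6000.0, "area": 3200},
]
print(best_design(candidates))
```

In this made-up data set the unrolled design is the fastest and the folded design is the smallest, yet the ratio favors the basic one, which is the kind of speed/cost balance the criterion is meant to capture.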

Our Design Flow

[Figure: design flow diagram. Elements shown: Specification; Interface; Datapath (Block diagram); Controller (ASM Chart); Controller Template; VHDL Code; Library of Basic Components; Formulas for Throughput & Hash time; Max. Clock Freq.; Resource Utilization; and the final results: Throughput, Area, Throughput/Area, Hash Time for Short Messages]

How to compare hardware speed vs. software speed?

eBASH reports (http://bench.cr.yp.to/results-hash.html)

In graphs:
  Time(n) = time in clock cycles vs. message size in bytes for n-byte messages, with n = 0, 1, 2, 3, ..., 2048, 4096

In tables:
  Performance in cycles/byte for n = 8, 64, 576, 1536, 4096, long msg

$$\text{Performance for long message} = \frac{\text{Time}(4096) - \text{Time}(2048)}{2048}$$
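The long-message figure is simply the slope of Time(n) between n = 2048 and n = 4096 bytes. A minimal sketch, with a hypothetical function name and made-up cycle counts rather than actual eBASH measurements:

```python
def long_message_cycles_per_byte(time_4096: float, time_2048: float) -> float:
    """(Time(4096) - Time(2048)) / 2048, in cycles/byte."""
    return (time_4096 - time_2048) / 2048


# Made-up cycle counts for illustration only.
print(long_message_cycles_per_byte(41480, 21000))
```

Taking the difference removes the fixed per-message cost (initialization, padding, finalization), in the same way that Htime(N+1) - Htime(N) does for the hardware cycle counts.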

How to compare hardware speed vs. software speed?

$$\text{Throughput [Gbit/s]} = \frac{8\ \text{bits/byte} \cdot \text{clock frequency [GHz]}}{\text{Performance for long message [cycles/byte]}}$$
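This conversion puts a software result in cycles/byte on the same scale as a hardware throughput. A minimal sketch, with a hypothetical function name and illustrative inputs (a 3 GHz processor and 10 cycles/byte are assumed values, not reported results):

```python
def software_throughput_gbit_s(clock_frequency_ghz: float, cycles_per_byte: float) -> float:
    """Throughput [Gbit/s] = 8 bits/byte * clock frequency [GHz] / performance [cycles/byte]."""
    return 8.0 * clock_frequency_ghz / cycles_per_byte


# Illustrative values only.
print(software_throughput_gbit_s(3.0, 10.0))
```

Multiplying the result by 1000 gives Mbit/s, the unit used for the hardware throughput metrics.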