GMU SHA Core Interface & Hash Function Performance Metrics

Interface

Why Does the Interface Matter?
- Pin limit: the total number of i/o ports of the core must not exceed the total number of i/o pins of the FPGA.
- Support for the maximum throughput: the time to load the next message block must not exceed the time to process the current block.
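The two constraints above can be checked with a few lines of arithmetic. This minimal sketch assumes a single clock and that the core loads one din word per clock cycle. The function names and example figures are hypothetical: a 512-bit block, a 65-cycle block processing time, and 300 user i/o pins. The port count of 70 assumes 32-bit din and dout, plus clk, rst, and the four handshake signals.

```python
import math


def block_load_cycles(block_size_bits: int, din_width_bits: int) -> int:
    """Cycles needed to load one message block, assuming one din word per clock."""
    return math.ceil(block_size_bits / din_width_bits)


def interface_ok(io_ports: int, fpga_io_pins: int, block_size_bits: int,
                 din_width_bits: int, block_processing_cycles: int) -> bool:
    """Check the pin limit and the maximum-throughput condition (single clock)."""
    pin_limit_ok = io_ports <= fpga_io_pins
    # Loading the next block must be hidden behind processing the current one.
    load_hidden = block_load_cycles(block_size_bits, din_width_bits) <= block_processing_cycles
    return pin_limit_ok and load_hidden


# Hypothetical example values, not measured results
print(block_load_cycles(512, 32))          # 16
print(interface_ok(70, 300, 512, 32, 65))  # True
print(interface_ok(70, 300, 512, 4, 65))   # False: 128 load cycles > 65
```

With a narrow 4-bit din, the load time exceeds the processing time, so the core would stall between blocks.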

Interface: Two Possible Solutions
[Figure: SHA core input options; labels msg_bitlen, message, zero_word, end_of_msg]
1. Length of the message (msg_bitlen) communicated at the beginning
   + easy-to-implement, passive source circuit
   - area overhead for the counter of message bits
2. Dedicated end-of-message port (end_of_msg)
   - more intelligent source circuit required
   + no need for an internal message bit counter

SHA Core: Interface & Typical Configuration
[Figure: Input FIFO → SHA core → Output FIFO, each with clk and rst. SHA core ports: din, src_ready, src_read, dout, dst_ready, dst_write. Input FIFO side: ext_idata, fifoin_full, fifoin_write, idata, fifoin_empty, fifoin_read. Output FIFO side: odata, fifoout_full, fifoout_write, ext_odata, fifoout_empty, fifoout_read.]
- The SHA core is an active component; the surrounding FIFOs are passive and widely available.
- The input interface is separate from the output interface.
- Processing the current block, reading the next block, and storing the result for the previous message can all be done in parallel.

SHA Core Interface
[Figure: SHA core with inputs clk, rst, din, src_ready, dst_ready and outputs dout, src_read, dst_write]

SHA Core Interface + Surrounding FIFOs
[Figure: Input FIFO → SHA core → Output FIFO, as in the typical configuration; signals clk, rst, ext_idata, fifoin_full, fifoin_write, idata, fifoin_empty, fifoin_read, din, src_ready, src_read, dout, dst_ready, dst_write, odata, fifoout_full, fifoout_write, ext_odata, fifoout_empty, fifoout_read]

Operation of FIFO
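This is a behavioral sketch of FIFO operation, assuming a synchronous FIFO. A write is ignored when the FIFO is full, and a read is ignored when it is empty. The hypothetical class below models the full and empty flags that the SHA core and the external source or destination observe. It is not a model of any particular vendor's FIFO core.

```python
from collections import deque


class Fifo:
    """Hypothetical behavioral model of a synchronous FIFO with full/empty flags."""

    def __init__(self, depth: int):
        self.depth = depth
        self._words = deque()

    @property
    def full(self) -> bool:
        return len(self._words) == self.depth

    @property
    def empty(self) -> bool:
        return not self._words

    def clock(self, write: bool = False, din=None, read: bool = False):
        """Model one rising clock edge and return the word read (or None).

        Flags are sampled before the edge: a write while full and a read
        while empty are ignored. Reads return the oldest stored word.
        """
        was_full, was_empty = self.full, self.empty
        dout = self._words.popleft() if read and not was_empty else None
        if write and not was_full:
            self._words.append(din)
        return dout


fifo = Fifo(depth=2)
fifo.clock(write=True, din=0xA)
fifo.clock(write=True, din=0xB)
print(fifo.full)                        # True
print(fifo.clock(write=True, din=0xC))  # None: write ignored, FIFO full
print(fifo.clock(read=True))            # 10 (0xA, oldest word first)
print(fifo.clock(read=True))            # 11
print(fifo.empty)                       # True
```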

Communication Protocol for Unpadded Messages
[Figure: input word sequences. a) msg_bitlen, message, zero_word. b) seg_0_bitlen, seg_0, seg_1_bitlen, seg_1, ..., seg_n-1_bitlen, seg_n-1, zero_word]
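Here is a minimal sketch of building format b) as a stream of words. It assumes 32-bit words and byte-aligned segments packed big-endian, with the last word of each segment zero-filled. It also assumes that the zero_word marks the end of the message. The function names are hypothetical.

```python
def pack_words(data: bytes, word_bits: int = 32) -> list[int]:
    """Split data into big-endian words, zero-filling the last word."""
    word_bytes = word_bits // 8
    data = data + bytes(-len(data) % word_bytes)
    return [int.from_bytes(data[i:i + word_bytes], "big")
            for i in range(0, len(data), word_bytes)]


def format_segmented(segments: list[bytes], word_bits: int = 32) -> list[int]:
    """Word stream for format b): seg_i_bitlen, seg_i, ..., then zero_word."""
    stream = []
    for seg in segments:
        if not seg:
            # A zero length would be indistinguishable from the zero_word.
            raise ValueError("segments must be non-empty")
        stream.append(8 * len(seg))                # seg_i_bitlen
        stream.extend(pack_words(seg, word_bits))  # seg_i
    stream.append(0)                               # zero_word ends the message
    return stream


print([hex(x) for x in format_segmented([b"abc", b"defgh"])])
# ['0x18', '0x61626300', '0x28', '0x64656667', '0x68000000', '0x0']
```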

SHA Core Interface with Additional Faster I/O Clock
[Figure: SHA core with clock inputs io_clk and clk, plus rst; ports din, dout, src_ready, src_read, dst_ready, dst_write]

SHA Core Interface with Two Clocks + Surrounding FIFOs
[Figure: Input FIFO → SHA core → Output FIFO with clocks io_clk and clk and reset rst; data and handshake signals as in the single-clock configuration]
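One way to read the benefit of the faster I/O clock is as a timing condition. The following are assumptions introduced for illustration, not taken from the slides:

- a din port of hypothetical width w that transfers one word per io_clk cycle (period $T_{io}$);
- a block size b;
- a block processing time of P cycles of clk (period T).

Under those assumptions, loading the next block stays hidden behind processing the current one when:

```latex
\left\lceil \frac{b}{w} \right\rceil T_{io} \le P \, T
\quad\Longleftrightarrow\quad
\frac{f_{io}}{f} \ge \frac{\lceil b/w \rceil}{P},
\qquad f_{io} = \frac{1}{T_{io}},\; f = \frac{1}{T}
```

Take the hypothetical values b = 512, w = 4, and P = 65. The ratio must then be at least 128/65 ≈ 1.97, so io_clk would need to run roughly twice as fast as clk.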

Communication Protocol for Padded Messages Without Message Splitting
[Figure: input word sequence: msg_len_ap (with last = 1), msg_len_bp, message]
msg_len_ap: message length after padding [bits]
msg_len_bp: message length before padding [bits]

Communication Protocol for Padded Messages With Message Splitting
[Figure: input word sequence: seg_0_len_ap (last = 0), seg_0, seg_1_len_ap (last = 0), seg_1, ..., seg_n-1_len_ap (last = 1), seg_n-1_len_bp, seg_n-1]
seg_i_len_ap: segment i length after padding* [bits]
seg_i_len_bp: segment i length before padding [bits]
* For all i < n-1, the length of segment i after padding is assumed to be a multiple of the message block size, b (characteristic of each function), and thus also of the word size, w.
The last segment cannot consist of only padding bits; it must include at least one message bit.
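The header rules above can be sketched for SHA-256 (b = 512 bits), whose padding appends a '1' bit, zeros, and a 64-bit length field. The sketch assumes 32-bit header words with the last flag in the most significant bit. That bit position and the function names are hypothetical. A single segment reproduces the no-splitting format (msg_len_ap with last = 1, then msg_len_bp).

```python
BLOCK_SIZE = 512     # SHA-256 message block size b [bits]
LAST_FLAG = 1 << 31  # hypothetical: 'last' bit in the MSB of a 32-bit header word


def sha256_padded_bitlen(len_bp: int) -> int:
    """Length after SHA-256 padding: '1' bit + zeros + 64-bit length field,
    rounded up to a multiple of 512 bits."""
    return -(-(len_bp + 1 + 64) // BLOCK_SIZE) * BLOCK_SIZE


def segment_headers(seg_lens_bp: list[int]) -> list[list[int]]:
    """Header words sent before each segment of a padded SHA-256 message.

    seg_lens_bp[i] is the number of message bits in segment i. Segments
    0..n-2 carry no padding, so their length after padding equals their
    length before padding and must be a multiple of BLOCK_SIZE. Because those
    lengths are block multiples, the last segment's padded length is
    sha256_padded_bitlen of its own message bits.
    """
    if not seg_lens_bp:
        raise ValueError("at least one segment is required")
    headers = []
    *body, last_bp = seg_lens_bp
    for i, len_bp in enumerate(body):
        if len_bp <= 0 or len_bp % BLOCK_SIZE:
            raise ValueError(f"segment {i}: length must be a positive multiple of {BLOCK_SIZE}")
        headers.append([len_bp])                   # seg_i_len_ap, last = 0
    if last_bp < 1:
        raise ValueError("last segment must include at least one message bit")
    len_ap = sha256_padded_bitlen(last_bp)
    headers.append([LAST_FLAG | len_ap, last_bp])  # seg_n-1_len_ap (last = 1), seg_n-1_len_bp
    return headers


print([[hex(w) for w in h] for h in segment_headers([24])])
# [['0x80000200', '0x18']]
print(segment_headers([1024, 100]))
# [[1024], [2147484160, 100]]
```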

Performance Metrics

Performance Metrics - Speed
- Throughput for Long Messages [Mbit/s]
- Throughput for Short Messages [Mbit/s]
- Execution Time for Short Messages [ns]
These metrics allow easy cross-comparison among implementations in software (microprocessors), FPGAs (various vendors), and ASICs (various libraries).

Performance Metrics - Speed
- Time to hash N blocks of message [cycles] = Htime(N): the exact formula is obtained from analysis of a block diagram and confirmed by functional simulation.
- Minimum Clock Period [ns] = T: obtained from a place & route and/or static timing analysis report file.

Time to Hash N Blocks of the Message [clock cycles]

Performance Metrics - Speed
Minimum time to hash N blocks of message [ns] = $\mathrm{Htime}(N) \cdot T$
$$\text{Maximum Throughput (for long messages)} = \frac{\mathit{block\_size}}{T \cdot (\mathrm{Htime}(N+1) - \mathrm{Htime}(N))} = \frac{\mathit{block\_size}}{T \cdot \mathit{block\_processing\_time}}$$
Effective maximum throughput for short messages:

Performance Metrics - Speed
$$\text{Maximum Throughput (for long messages)} = \frac{\mathit{block\_size}}{T \cdot \mathit{block\_processing\_time}}$$
- block_size: from the specification
- T: from the place & route report and/or static timing analysis report
- block_processing_time: from analysis of the block diagram and/or functional simulation
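The formulas above can be evaluated directly once Htime(N) and T are known. This sketch uses a hypothetical hash-time formula, Htime(N) = 65·N + 2, and T = 5 ns. The short-message function assumes effective throughput = N·block_size / (Htime(N)·T). That formula is an assumption, not one given here.

```python
from typing import Callable


def hash_time_ns(htime: Callable[[int], int], n_blocks: int, t_ns: float) -> float:
    """Minimum time to hash N blocks [ns] = Htime(N) * T."""
    return htime(n_blocks) * t_ns


def max_throughput_mbps(htime: Callable[[int], int], block_size: int,
                        t_ns: float, n: int = 1000) -> float:
    """block_size / (T * (Htime(N+1) - Htime(N))), converted from bit/ns to Mbit/s.

    For a large N the difference Htime(N+1) - Htime(N) is the block processing time.
    """
    block_processing_time = htime(n + 1) - htime(n)
    return block_size / (t_ns * block_processing_time) * 1000.0


def effective_throughput_mbps(htime: Callable[[int], int], block_size: int,
                              t_ns: float, n_blocks: int) -> float:
    """ASSUMED short-message metric: N * block_size / (Htime(N) * T), in Mbit/s."""
    return n_blocks * block_size / hash_time_ns(htime, n_blocks, t_ns) * 1000.0


def htime(n: int) -> int:
    """Hypothetical hash-time formula [clock cycles]."""
    return 65 * n + 2


print(hash_time_ns(htime, 1, 5.0))                              # 335.0
print(round(max_throughput_mbps(htime, 512, 5.0), 1))           # 1575.4
print(round(effective_throughput_mbps(htime, 512, 5.0, 1), 1))  # 1528.4
```

Because 1 bit/ns equals 1000 Mbit/s, the conversion is a single multiplication.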

Performance Metrics - Area
For the basic, folded, and unrolled architectures, we force these vectors to look as follows through the synthesis and implementation options:
[Figure: resource-utilization vectors: Area, with the remaining entries forced to 0]

Choice of Optimization Target
Primary Optimization Target: Throughput to Area Ratio
Features:
- practical: good balance between speed and cost
- very reliable guide through the entire design process, facilitating the choice of
  - high-level architecture
  - implementation of basic components
  - tool options
- leads to high-speed, close-to-maximum-throughput designs
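Ranking candidate architectures by the primary optimization target comes down to one comparison. The design names and figures below are hypothetical. They were chosen only to show that the fastest design need not have the best throughput to area ratio.

```python
def best_by_throughput_per_area(designs: dict[str, tuple[float, float]]) -> str:
    """Return the design name with the highest throughput/area ratio.

    designs maps name -> (throughput [Mbit/s], area [e.g. CLB slices]).
    """
    return max(designs, key=lambda name: designs[name][0] / designs[name][1])


# Hypothetical results, not measurements
designs = {
    "basic": (1500.0, 1000.0),     # ratio 1.50
    "unrolled": (2600.0, 2000.0),  # ratio 1.30, highest throughput
    "folded": (800.0, 600.0),      # ratio 1.33
}
print(best_by_throughput_per_area(designs))  # basic
```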

Our Design Flow
[Figure: design flow; stages: Specification, Interface, Datapath (Block diagram), Controller (ASM Chart), Controller Template, Library of Basic Components, VHDL Code, Formulas for Throughput & Hash time, Max. Clock Freq., Resource Utilization; results: Throughput, Area, Throughput/Area, Hash Time for Short Messages]

How to compare hardware speed vs. software speed?
eBASH reports (http://bench.cr.yp.to/results-hash.html)
- In graphs: Time(n) = time in clock cycles vs. message size in bytes, for n-byte messages with n = 0, 1, 2, 3, ..., 2048, 4096
- In tables: performance in cycles/byte for n = 8, 64, 576, 1536, 4096, and long messages
$$\text{Performance for long message} = \frac{\mathrm{Time}(4096) - \mathrm{Time}(2048)}{2048}$$

How to compare hardware speed vs. software speed?
$$\text{Throughput [Gbit/s]} = \frac{8\ \text{bits/byte} \cdot \text{clock frequency [GHz]}}{\text{Performance for long message [cycles/byte]}}$$
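Combining the two eBASH formulas gives a software throughput that can be compared directly with hardware figures. The cycle counts and the 3 GHz clock in this sketch are hypothetical.

```python
def long_message_cycles_per_byte(time_2048: float, time_4096: float) -> float:
    """eBASH long-message performance [cycles/byte] = (Time(4096) - Time(2048)) / 2048."""
    return (time_4096 - time_2048) / 2048


def throughput_gbps(clock_ghz: float, cycles_per_byte: float) -> float:
    """Throughput [Gbit/s] = 8 bits/byte * clock frequency [GHz] / performance [cycles/byte]."""
    return 8 * clock_ghz / cycles_per_byte


# Hypothetical cycle counts for one CPU
perf = long_message_cycles_per_byte(time_2048=31000, time_4096=61720)
print(perf)                        # 15.0
print(throughput_gbps(3.0, perf))  # 1.6
```

Taking the difference of two long-message timings cancels the fixed per-message overhead, so the result reflects the steady-state cost per byte.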