Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs

Size: px
Start display at page:

Download "Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs"

Transcription

1 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates Using FPGAs Ekaat Homsirikamol Marcin Rogaski Kris Gaj George Mason Universit ehomsiri, mrogask, This ork has een supported in part NIST through the Recover Act Measurement Science and Engineering Research Grant Program, under contract no. 6NANBD4.

2 Contents Introduction and Motivation 2 2 Methodolog 4 2. Choice of a Language, FPGA Devices, and Tools Performance Metrics for FPGAs Uniform Interface Assumptions and Simplifications Optimization Target Design Methodolog Comprehensive Designs of SHA-3 Candidates 3 3. Notations and Smols Basic Component Description Multiplication 2 in the Galois Field GF(2 8 ) Multiplication n in the Galois Field GF(2 8 ) AES BLAKE Block Diagram Description vs. Variant Differences Blue Midnight Wish (BMW) Block Diagram Description vs. Variant Differences CueHash Block Diagram Description vs. Variant Differences ECHO Block Diagram Description vs. Variant Differences Fugue Block Diagram Description

3 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates vs. Variant Differences Groestl Block Diagram Description vs. Variant Differences Hamsi Block Diagram Description vs. Variant Differences JH Block Diagram Description vs. Variant Differences Keccak Block Diagram Description vs. Variant Differences Luffa Block Diagram Description vs. Variant Differences SHA Block Diagram Description vs. Variant Differences Shaal Block Diagram Description vs. Variant Differences SHAvite Block Diagram Description vs. Variant Differences SIMD Block Diagram Description vs. Variant Differences Skein Block Diagram Description vs. Variant Differences Design Summar and Results Design Summar Relative Performance of the and 256-it Variants of the SHA-3 Candidates Results Results from Other Groups 5. Best Results from Other Groups Best Results

4 4 E. Homsirikamol, M. Rogaski, and K. Gaj 6 Conclusions and Future Work 4

5 Astract Performance in hardare has een demonstrated to e an important factor in the evaluation of candidates for crptographic standards. Up to no, no consensus exists on ho such an evaluation should e performed in order to make it fair, transparent, practical, and acceptale for the majorit of the crptographic communit. In this report, e formulate a proposal for a fair and comprehensive evaluation methodolog, and appl it to the comparison of hardare performance of 4 Round 2 SHA-3 candidates. The most important aspects of our methodolog include the definition of clear performance metrics, the development of a uniform and practical interface, generation of multiple sets of results for several representative FPGA families from to major vendors, and the application of a simple procedure to convert multiple sets of results into a single ranking. The VHDL codes for 256 and -it variants of all 4 SHA-3 Round 2 candidates and the old standard SHA-2 have een developed and thoroughl verified. These codes have een then used to evaluate the relative performance of all aforementioned algorithms using seven modern families of Field Programmale Gate Arras (FPGAs) from to major vendors, Xilinx and Altera. All algorithms have een evaluated using four performance measures: the throughput to area ratio, throughput, area, and the execution time for short messages. Based on these results, the 4 Round 2 SHA-3 candidates have een divided into several groups depending on their overall performance in FPGAs.

6 Chapter Introduction and Motivation Starting from the Advanced Encrption Standard (AES) contest organized NIST in [], open contests have ecome a method of choice for selecting crptographic standards in the U.S. and over the orld. The AES contest in the U.S. as folloed the NESSIE competition in Europe [2], CRYPTREC in Japan, and estream in Europe [3]. Four tpical criteria taken into account in the evaluation of candidates are: securit, performance in softare, performance in hardare, and flexiilit. While securit is commonl recognized as the most important evaluation criterion, it is also a measure that is most difficult to evaluate and quantif, especiall during a relativel short period of time reserved for the majorit of contests. A tpical outcome is that, after eliminating a fraction of candidates ased on securit flas, a significant numer of remaining candidates fail to demonstrate an eas to identif securit eaknesses, and as a result are judged to have adequate securit. Performance in softare and hardare are next in line to clearl differentiate among the candidates for a crptographic standard. Interestingl, the differences among the crptographic algorithms in terms of hardare performance seem to e particularl large, and often serve as a tiereaker hen other criteria fail to identif a clear inner. For example, in the AES contest, the difference in hardare speed eteen the to fastest final candidates (Serpent and Rijndael) and the sloest one (Mars) as a factor of seven [][4]; in the estream competition the spread of results among the eight top candidates qualified to the final round as a factor of 5 in terms of speed (Trivium x vs. Pomaranch), and a factor of 3 in terms of area (Grain v vs. Edon8) [5][6]. At this point, the focus of the attention of the entire crptographic communit is on the SHA-3 contest for a ne hash function standard, organized NIST [7][8]. The contest is no in its second round, ith 4 candidates remaining in the competition. The evaluation is scheduled to continue until the second quarter of 22. In spite of the progress made during previous competitions, no clear and commonl 2

7 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates 3 accepted methodolog exists for comparing hardare performance of crptographic algorithms [9]. The majorit of the reported evaluations have een performed on an ad-hoc asis, and focused on one particular technolog and one particular famil of hardare devices. Other pitfalls included the lack of a uniform interface, performance metrics, and optimization criteria. These pitfalls are compounded different skills of designers, using to different hardare description languages, and no clear a of compressing multiple results to a single ranking. In this paper, e address all the aforementioned issues, and propose a clear, fair, and comprehensive methodolog for comparing hardare performance of SHA-3 candidates and an future algorithms competing to ecome a ne crptographic standard. Our methodolog is ased on the use of FPGA devices from various vendors. The advantages of using FPGAs for comparison include short development time, ide availailit of tools, and a limited numer of vendors dominating the market. The hardare evaluation of SHA-3 candidates started shortl after announcing the specifications and reference softare implementations of 5 algorithms sumitted to the contest [7][8][]. The majorit of initial comparisons ere limited to less than five candidates, and their results have een pulished at []. The more comprehensive efforts ecame feasile onl after NISTs announcement of 4 candidates qualified to the second round of the competition in Jul 29. Since then, to comprehensive studies have een reported in the Crptolog eprint Archive [][2]. The first, from the Universit of Graz, has focused on ASIC technolog, the second from to institutions in Japan, has focused on the use of the FPGA-ased SASEBO-GII oard from AIST, Japan. Although oth studies generated quite comprehensive results for their respective technologies, the did not quite address the issues of the uniform methodolog, hich could e accepted and used a larger numer of research teams. Our stud is intended to fill this gap, and put forard the proposal that could e evaluated and commented on a larger crptographic communit.

8 Chapter 2 Methodolog 2. Choice of a Language, FPGA Devices, and Tools Out of to major hardare description languages used in industr, VHDL and Verilog HDL, e choose VHDL. We elieve that either of the to languages is perfectl suited for the implementation and comparison of SHA-3 candidates, as long as all candidates are descried in the same language. Using to different languages to descrie different candidates ma introduce an undesired ias to the evaluation. FPGA devices from to major vendors, Xilinx and Altera, dominate the market ith aout 9% of the market share. We therefore feel that it is appropriate to focus on FPGA devices from these to companies. In this stud, e have chosen to use seven families of FPGA devices from Xilinx and Altera. These families include to major groups, those optimized for minimum cost (Spartan 3 from Xilinx, and Cclone II and III from Altera) and those optimized for high performance (Virtex 4 and 5 from Xilinx, and Stratix II and III from Altera). Within each famil, e use devices ith the highest speed grade, and the largest numer of pins. As CAD tools, e have selected tools developed FPGA vendors themselves: Xilinx ISE Design Suite v.. (including Xilinx XST, used for snthesis) and Altera Quartus II v. 9. Suscription Edition Softare. 2.2 Performance Metrics for FPGAs Choosing proper performance metrics for the implementation of hash functions (or an other crptographic transformations) using FPGAs is a non-trivial task, and no clear consensus exists so far on ho these metrics should e defined. Belo e summarize our proposed approach, hich e applied in our stud. 4

9 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates 5 Speed. In order to characterize the speed of the hardare implementation of a hash function, e suggest using Throughput, understood as a throughput (numer of input its processed per unit of time) for long messages. To e exact, e define Throughput using the folloing formula: lock size T hroughput = (2.) T (HT ime(n + ) HT ime(n)) here lock size is a message lock size, characteristic for each hash function, HT ime(n) is a total numer of clock ccles necessar to hash an N-lock message, T is a clock period, different and characteristic for each hardare implementation of a specific hash function. Throughput defined this a is tpicall independent of N (and thus the size of the message), as in all hash function architectures e investigated so far, the expression HT ime(n + ) HT ime(n) is a constant that corresponds to the numer of clock ccles eteen processing of to susequent input locks. The effective throughput for short messages is alas smaller, and is expressed the formula N lock size T hroughput eff = (2.2) T HT ime(n) In this paper, e provide the exact formulas for HT ime(n) for each SHA-3 candidate (see Tale 4.2), and values of f = /T for each algorithm FPGA device pair (see Tales 4.8 and 4.9). Therefore, e provide sufficient information to calculate and compare values of the effective throughputs for each specific message size, hich ma e of interest in a given application. For short messages, it is more important to evaluate the total time required to process a message of a given size (rather than throughput). The size of the message can e chosen depending on the requirements of an application. For example, in the ebash stud of softare implementations of hash functions, execution times for all sizes of messages, from -tes (empt message) to 496 tes, are reported, and five specific sizes 8,, 576, 536, and 496 are featured in the tales [3]. The generic formulas e include in this paper (see Tale 4.2) allo the calculation of the execution times for an message size. In order to characterize the capailit of a given hash function implementation for processing short messages, e present in this stud the comparison of execution times for an empt message (one lock of data after padding) and a -te (8-its) message efore padding (hich ecomes equivalent for majorit, ut not all, of the investigated functions to 24 its after padding). To e exact our parameters are defined as follos T empt = T HT ime() (2.3) ( ) padlen(8) T B = T HT ime, lock size (2.4)

10 6 E. Homsirikamol, M. Rogaski, and K. Gaj here padlan(8) denotes the size of an 8-it message after padding. Resource Utilization/Area. Resource utilization is particularl difficult to compare fairl in FPGAs, and is often a source of various evaluation pitfalls. First, the asic programmale lock (such as CLB slice in Xilinx FPGAs) has a different structure and different capailities for various FPGA families from different vendors. For example, in Virtex 5, a CLB slice includes four 6-input Look-Up-Tales (LUTs); in Spartan 3 and Virtex 4, a CLB slice includes to 4-input LUTs. In Cclone II and Cclone III, the asic programmale lock is called Logic Element (LE); in Stratix II and III, the asic programmale component has a different structure and is called ALUT (Adaptive Look-Up Tale). Taking this issue into account, e suggest avoiding an comparisons across famil lines. Secondl, all modern FPGAs include multiple dedicated resources, hich can e used to implement specific functionalit. These resources include Block RAMs (BRAMs), multipliers (MULs), and DSP units in Xilinx FPGAs, and memor locks, multipliers, and DSP units in Altera FPGAs. In order to implement a specific operation, some of these resources ma e interchangale, ut there is no clear conversion factor to express one resource in terms of the other. Therefore, e suggest in the general case, treating resource utilization as a vector, ith coordinates specific to a given FPGA famil. For example, Resource Utilization Spartan3 = (#CLBslices, #BRAMs, #MULs) (2.5) Resource Utilization CcloneIII = (#LE, #memor its, #MULs) (2.6) Taking into account that vectors cannot e easil compared to each other, e have decided to opt out of using an dedicated resources in the hash function implementations used for our comparison. Thus, all coordinates of our vectors, other than the first one have een forced ( choosing appropriate options of the snthesis and implementation tools) to e zero. This a, our resource utilization (further referred to as Area) is characterized using a single numer, specific to the given famil of FPGAs, namel the numer of CLB slices (#CLBslices) for Xilinx FPGAs, the numer of Logic Elements (#LE) for Cclone II and Cclone III, and the numer of Adaptive Look-Up Tales (#ALUT ) in Stratix II and Stratix III. The resource utilization vector in FPGAs (or even its simplified one-coordinate form, referred to as Area aove) cannot e easil translated to an equivalent area or the numer of transistors in ASICs. An attempts to define a resource utilization unit that ould appl to oth technologies (such as an equivalent logic gate) have een mostl unsuccessful, and of limited value in practice. The onl common denominator is cost, ut unfortunatel the prices of integrated circuits, and FPGAs in particular, are not commonl availale, and are affected multiple non-technical factors (including the numer of units ordered, the relationship eteen companies, etc.)

11 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates Uniform Interface In order to remove an amiguit in the definition of our hardare cores for SHA-3 candidates, and in order to make our implementations as practical as possile, e have developed an interface shon in Fig. 2.a, and descried elo. In a tpical scenario, the SHA core is assumed to e surrounded to standard FIFO modules: Input FIFO and Output FIFO, as shon in Fig. 2.. In this configuration, SHA core is an active module, hile a surrounding logic (FIFOs) is passive. Passive logic is much easier to implement, and in our case is composed of standard logic components, FIFOs, availale in an major lirar of IP cores. Each FIFO module generates signals empt and full, hich indicate that the FIFO is empt and/or full, respectivel. Each FIFO accepts control signals rite and read, indicating that the FIFO is eing ritten to and/or read from, respectivel. The aforementioned assumptions aout the use of FIFOs as surrounding modules are ver natural and eas to meet. For example, if a SHA core implemented on an FPGA communicates ith an outside orld using PCI, PCI-X, or PCIe interface, the implementations of these interfaces most likel alread include Input and Output FIFOs, hich can e directl connected to a SHA core. If a SHA core communicates ith another core implemented on the same FPGA, then FIFOs are often used on the oundar eteen the to cores in order to accommodate for an differences eteen the rate of generating data one core and the rate of accepting data another core. Additionall, the inputs and outputs of our proposed SHA core interface do not need to e necessaril generated/consumed FIFOs. An circuit that can support control signals src read and src read can e used as a source of data. An circuit that can support control signals dst read and dst rite can e used as a destination for data. The exact format of an input to the SHA core, for the case of pre-padded messages, is shon in Fig To scenarios of operation are supported. In the first scenario, the message itlength after padding is knon in advance and is smaller than 2. In this scenario, shon in Fig. 2.2a, the first ord of input represents message length after padding, a) clk rst ) clk rst clk rst clk rst clk rst clk rst clk rst clk rst SHA core din dout src_read dst_read src_read dst_rite Input FIFO ext_idata idata din dout fifoin_full fifoin_empt full empt fifoin_rite rite read fifoin_read din SHA core src_read src_read dout dst_read dst_rite odata fifoout_full fifoout_rite rite Output FIFO ext_odata din dout fifoout_empt full empt fifoout_read read Figure 2.: a) Input/output interface of a SHA core. ) A tpical configuration of a SHA core connected to to surrounding FIFOs.

12 8 E. Homsirikamol, M. Rogaski, and K. Gaj a) its ) its msg_len last = msg_len_p seg len last= seg_ seg len last= message seg_... seg_n-_len last= seg_n-_len_p seg_n- Figure 2.2: Format of input data for to different operation scenarios: a) ith message itlength knon in advance, and ) ith message itlength unknon in advance. Notation: msg len message length after padding, msg len p message length efore padding, seg i len segment i length after padding, seg i len p segment i length efore padding, last a one-it flag denoting the last segment of the message (or one-segment message), itise OR. expressed in its. This ord has the least significant it, representing a flag called last, set to one. This ord is folloed the message length efore padding. This value is required several SHA-3 algorithms using internal counters (such as BLAKE, ECHO, Shavite-3, and Skein), even if padding is done outside of the SHA core. These to control ords are folloed all ords of the message. The second format, shon in Fig. 2.2, is used hen either message length is not knon in advance, or it is greater than 2. In this case, the message is processed in segments of data denoted as seg, seg,...,seg n-. For the ease of processing data the hash core, the size of the segments, from seg to seg n-2 is required to e alas an integer multiple of the lock size, and thus also of the ord size. The least significant it of the segment length expressed in its is thus naturall zero, and this it, treated as a flag called last, can e used to differentiate eteen the last segment and all previous segments of the message. The last segment efore padding can e of aritrar length < 2. This segment is processed in the same a as the entire message in scenario a). This a there is no need for an additional signal to distinguish eteen these to scenarios. Scenario a) is a special case of scenario ). In case the SHA core supports padding, the protocol can e even simpler, as explained in [4]. Please note that scenario ) is ver similar to the a data is processed a tpical softare API for hash functions, such as [5]. The Update function of the softare API corresponds to processing segments from seg to seg n-2. The function Final corresponds to the processing of the last segment of data, seg n-.

13 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates Assumptions and Simplifications Our stud is performed using the folloing assumptions. Onl the SHA-3 candidate variants ith the 256-it and the -it outputs have een implemented and compared at this point. Other variants, treated either independentl or as cominations of multiple variants (all-in-one hash cores) ma e sujects of future comparisons. Padding is assumed to e done outside of the hash cores (e.g., in softare). All investigated hash functions have ver similar padding schemes, hich ould lead to similar asolute area overhead if implemented as a part of the hardare core. The relative area penalt ill e higher for cores ith smaller area used for main processing. The complexit of the padding circuit ill also depend on the assumptions regarding the smallest unit of a message (it, te, or ord), hich ma e different for specific applications. Onl the primar mode of operation is supported for all functions. Special modes, such as tree hashing or MAC mode are not implemented (their implementation ould actuall ork against the respective candidates, ecause of the area and speed penalt introduced these extra features). There is also no support for providing salt specific to each message. The salt values are fixed to all zeros in all SHA-3 candidates supporting this special input (namel BLAKE, ECHO, SHAvite-3, and Skein). 2.5 Optimization Target We elieve that the choice of the primar optimization target is one of the most important decisions that needs to e made efore the start of the comparison. The optimization target should drive the design process of ever SHA-3 candidate, and it should also e used as a primar factor in ranking the otained SHA-3 cores. The most common choices are: Maximum Throughput, Minimum Latenc, Minimum Area, Throughput to Area Ratio, Product of Latenc times Area, etc. Our choice is the Throughput to Area Ratio, here Throughput is defined as Throughput for long messages, and Area is expressed in terms of the numer of asic programmale logic locks specific to a given FPGA famil. This choice has multiple advantages. First, it is practical, as hardare cores are tpicall applied in situations, here the size of the processed data is significant and the speed of processing is essential. Otherise, the input/output latenc overhead associated ith using a hardare accelerator dominates the total processing time, and the cost of using dedicated hardare (FPGA) is not justified. Optimizing for the est ratio provides a good alance eteen the speed and the cost of the solution. Secondl, this optimization criterion is a ver reliale guide throughout the entire design process. At ever junction here the decisions must e made, starting from the choice of a high-level hardare architecture don to the choice of the particular FPGA tool options, this criterion facilitates the decision process, leaving ver fe possile paths for further investigation.

14 E. Homsirikamol, M. Rogaski, and K. Gaj On the contrar, optimizing for Throughput alone, leads to highl unrolled hash function architectures, in hich a relativel minor improvement in speed is associated ith a major increase in the circuit area. In hash function cores, latenc, defined as a dela eteen providing an input and otaining the corresponding output, is a function of the input size. Since various sizes ma e most common in specific applications, this parameter is not a ell-defined optimization target. Finall, optimizing for area leads to highl sequential designs, resemling small general-purpose microprocessors, and the final product depends highl on the maximum amount of area (e.g., a particular FPGA device) assumed to e availale. 2.6 Design Methodolog Our design of all 4 SHA-3 candidates folloed an identical design methodolog. Each SHA core is composed of the Datapath and the Controller. The Controller is implemented using three main Finite State Machines, orking in parallel, and responsile for the Input, Main Processing, and the Output, respectivel. As a result, each circuit can simultaneousl perform the folloing three tasks: output hash value for the previous message, process a current lock of data, and read the next lock of data. The parameters of the interface are selected in such a a that the time necessar to process one lock of data is alas larger or equal to the time necessar to read the next lock of data. This a, the processing of long streams of data can happen at full speed, ithout an visile input interface overhead. The finite state machines responsile for input and output are almost identical for all hash function candidates; the third state machine, responsile for main data processing, is ased on a similar template. The similarit of all designs and reuse of common uilding locks assures a high fairness of the comparison. The design of the Datapath starts from the high level architecture. At this point, the most complex task that can e executed in an iterative fashion, ith the minimum overhead associated ith multiplexing inputs specific to a given iteration round, is identified. The proper choice of such a task is ver important, as it determines oth the numer of clock ccles per lock of the message and the circuit critical path (minimum clock period). It should e stressed that the choice of the most complex task that can e executed in an iterative fashion should not follo lindl the specification of a function. In particular, quite often one round (or one step) from the description of the algorithm is not the most suitale component to e iterated in hardare. Either multiple rounds (steps) or fractions thereof ma e more appropriate. In Tale 2. e summarize our choices of the main iterative tasks of SHA-3 candidates. Each such task is implemented as cominational logic, surrounded registers. The next step is an efficient implementation of each cominational lock ithin the DataPath. In Tale 2.2, e summarize major operations of all SHA-3 candidates that require logic resources in hardare implementations. Fixed shifts, fixed rotations, and

15 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates Tale 2.: Main iterative tasks of the hardare architectures of SHA-3 candidates optimized for the maximum Throughput to Area ratio Function Main Iterative Task Function Main Iterative Task BLAKE G i..g i+3 JH Round function R 8 BMW entire function Keccak Round R CueHash one round Luffa The Step Function, Step ECHO AES round/aes round/ Shaal To iterations BIG.SHIFTROWS, BIG.MIXCOLUMNS of the main loop Fugue 2 surounds SHAvite-3 AES round (ROR3, CMIX, SMIX) Groestl Modified AES round SIMD 4 steps of the compression function Hamsi Truncated Non-Linear Skein 4 rounds of Permutation P Threefish- Tale 2.2: Major operations of SHA-3 candidates (other than permutations, fixed shifts and fixed rotations). maddn denotes a multioperand addition ith n operands. Function NTT Linear S-ox GF MUL MUL madd ADD Boolean code /SUB BLAKE madd3 ADD XOR BMW madd7 ADD,SUB XOR CueHash ADD XOR ECHO AES 8x8 x2, x3 XOR Fugue AES 8x8 x4..x7 XOR Groestl AES 8x8 x2..x5, x7 XOR Hamsi LC Serpent XOR 4x4 JH 4x4 x2, x5 XOR Keccak NOT,AND,XOR Luffa 4x4 x2 XOR Shaal x3, x5 ADD,SUB NOT,AND,XOR SHAvite-3 AES 8x8 x2, x3 NOT,XOR SIMD NTT x85, x233 madd3 ADD NOT,AND,OR Skein ADD XOR SHA-256 madd5 NOT,AND,XOR

16 2 E. Homsirikamol, M. Rogaski, and K. Gaj other more complex permutations are omitted ecause the appear in all candidates and require onl routing resources (programmale interconnects). The most complex out of logic operations are the Numer Theoretic Transform (NTT) [6] in SIMD, linear code (LC) [7] in Hamsi, asic operations of AES (8x8 AES S-ox and multiplication a constant in the Galois Field GF(2 8 )) in ECHO, Fugue, Groestl, and SHAvite-3; and multioperand additions in BLAKE, BMW, SIMD, and SHA-256. For each of these operations e have implemented at least to alternative architectures. NTT as optimized using a Fast Fourier Transform (FFT) [6]. In Hamsi, the linear code as implemented using oth logic (matrix vector multiplications in GF(4)), and using look-up tales. AES 8x8 S-oxes (SuBtes) ere implemented using oth look-up tales (stored in distriuted memories), and using logic onl (folloing method descried in [8], Section.6..3). Multi-operand additions ere implemented using the folloing four methods: carr save adders (CSA), tree of to operand adders, parallel counter, and a + in VHDL [9]). Finall, integer multiplications 3 and 5 in Shaal have een replaced a fixed shift and addition. All optimized implementations of asic operations have een applied uniforml to all SHA-3 candidates. In case the initial testing did not provide a strong indication of superiorit of one of the alternative methods, the entire hash function unit as implemented using to alternative versions of the asic operation code, and the results for a version ith the etter throughput to area ratio have een listed in the result tales. All VHDL codes have een thoroughl verified using a universal testench, capale of testing an aritrar hash function core that follos interface descried in Section 2.3 [2]. A special padding script as developed in Perl in order to pad messages included in the Knon Anser Test (KAT) files distriuted as a part of each candidates sumission package. An output from the script follos a similar format as its input, ut includes apart from padding its also the lengths of the message segments, defined in Section 2.3, and shon schematicall in Fig The generation of a large numer of results as facilitated an open source tool ATHENa (Automated Tool for Hardare EvaluatioN) [2]. This enchmarking environment as also used to optimize requested snthesis and implementation frequencies and other tool options.

17 Chapter 3 Comprehensive Designs of SHA-3 Candidates The designs of all 4 SHA-3 candidates folloed the same asic design principle ith the core separated into to main units, the Datapath and the Controller. Onl the Datapath diagrams are provided in this chapter as the Controller can e derived from the Datapath and the specification of the function, and descried using ASM charts. The full specification of each of the algorithms can e found in [2 35]. 3. Notations and Smols Tale 3. provides the notation and smols that are eing used throughout this chapter. 3

18 4 E. Homsirikamol, M. Rogaski, and K. Gaj Tale 3.: Notations and Smols Word A group of its used in arithmetic and logic operations, tpicall of the size of or its. Block A group of ords. X[i] Refers to an arra position i in X. X i Refers to a it position i in X. salt Salt values are alas assumed to e zero and as a result the are omitted from the diagrams. Block size in its. h Hash value size in its. Word size in its. IV Initialization vector SEXT Sign extension. ZEXT Zero extension. <<<R Rotation left R positions. If R is a constant: fixed rotation; if R is a variale: variale rotation implemented using a arrel rotator. >>>R Rotation right R positions. If R is a constant: fixed rotation; if R is a variale: variale rotation implemented using a arrel rotator. <<S Shift left S positions. If S is a constant: fixed shift; if S is a variale: variale shift implemented using a arrel shifter. >>S Shift right S positions. If S is a constant: fixed shift; if S is a variale: variale shift implemented using a arrel shifter. Concatenation. B default, the uses concatenate ack to the same arrangement as efore the separation (split) occurs. SIPO PISO sitch endian ord sitch endian te Serial-in-parallel-out unit. Parallel-in-serial-out unit. Wordise endianness sitching. Bteise endianness sitching.

19 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates Basic Component Description This section descries implementations of selected asic components used in more than one algorithm. These components include multiplication 2 in GF(2 8 ), SuBtes, Mix- Columns, and AES Round Multiplication 2 in the Galois Field GF(2 8 ) X[7] X[6] Y[7] X[5] Y[6] X[4] Y[5] X[3] Y[4] X[2] Y[3] Y[2] X[] Y[] Y[] X[] Figure 3.: Basic : x Multiplication n in the Galois Field GF(2 8 ) Galois Field multiplication n other than 2 is summarized in Tale 3.2. Tale 3.2: Galois Field Multiplication n x3(x) = x2(x) X x4(x) = x2(x2(x)) x5(x) = x4(x) X x6(x) = x4(x) x2(x) x7(x) = x4(x) x3(x)

20 6 E. Homsirikamol, M. Rogaski, and K. Gaj AES AES is a asic uilding lock of man SHA-3 candidates. An AES round consists of three asic operations, SuBtes, MixColumns and ShiftRos shon in Figure 3.2. SuBte operation, shon in Figure 3.3, performs direct sustitution on all tes of its input. Mix- Columns performs matrix multiplication on each ord of its input. A ord of AES contains its. Hence, four instances of MixColumns are required to process the entire AES lock of its. The SBOX and ShiftBtes operations of AES and their full specifications can e found in [36] and [37]. x SuBtes MixColumns ShiftRos ke Figure 3.2: Basic : AES Round

21 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates 7 x x[] x[5] AES SBOX AES SBOX [] [5] Figure 3.3: Basic : AES SuBtes Note : All uses are 8 it ide 2 3 x2 x2 x2 x2 2 3 Figure 3.4: Basic : AES MixColumns

22 8 E. Homsirikamol, M. Rogaski, and K. Gaj 3.3 BLAKE 3.3. Block Diagram Description Figure 3.5 shos the datapath of BLAKE. In this design, the cominational CORE implements one half of the BLAKE s round [2]. Thus, to clock ccles are necessar to implement the full round. First, a message lock is loaded into SIPO. Once done, the lock is stored in a temporar register, used to hold the message lock until this lock is full processed the CORE. This temporar register allos the next message lock to e loaded simultaneousl into SIPO. The message lock msg and the constant c are then applied as inputs to the function PERMUTE and the otained output is passed to the design s CORE. Simultaneousl, the chaining value, CV, is initialized ith the the Initialization Vector, IV, and an input to the CORE, V, is initialized ith the value dependent on the chaining value, the counter, t, and a loer half of the constant c. The initial value of V is mixed the CORE ith an output of the lock PERMUTE, CM, for tent clock ccles ( rounds). Once this operation is completed, an additional clock ccle is required for finalization. The output of Finalization is used as the next chaining value, for intermediate message locks, or as the final hash value for the last message lock. The Initialization unit performs the folloing function: v[] v[] v[2] v[3] h[] h[] h[2] h[3] v[4] v[5] v[6] v[7] v[8] v[9] v[] v[] h[4] h[5] h[6] h[7] c[] c[] c[2] c[3] v[2] v[3] v[4] v[5] t[] c[4] t[] c[5] c[6] c[7] The Finalization unit performs the folloing operation: h [] h[] v[] v[8] h [] h[] v[] v[9] h [2] h[2] v[2] v[] h [3] h[3] v[3] v[] h [4] h[4] v[4] v[2] h [5] h[5] v[5] v[3] h [6] h[6] v[6] v[4] h [7] h[7] v[7] v[5] In Figure 3.6, an operation of the BLAKE s PERMUTE module is presented. A ne value of the variale m is selected depending on the round numer using a ide multiplexer preceded constant permutations. A permutation tale is shon in Tale 3.3. The selection signal of the multiplexer ccles from to 9 (and then again ack to for BLAKE- ) until all BLAKE s rounds are executed. Each output of the multiplexer is then mixed ith the respective constant using the transformation XOR W CROSS (defined in the note to Fig. 3.6) and registered afterards.

23 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates 9 BLAKE : =, h=256 BLAKE : =24, h= /2 IV /2 /2 din SIPO c c[..7] /2 t /8 /2 t h Initialization c v /2 CV /2 V /2 msg constant V PERMUTE CM /2 /2 CM CORE VP v Finalization h h /2 /2 /2 PISO dout Figure 3.5: BLAKE : Datapath

24 2 E. Homsirikamol, M. Rogaski, and K. Gaj Tale 3.3: BLAKE : Permutation Constants hi lo σ σ σ σ σ σ σ σ σ σ The CORE unit is shon in Figure 3.7 and represents one half of the BLAKE s round. As specified in [2], there are to levels of G functions and therefore a permutation eteen the first and the second half-round is required. This permutation is performed ordise and is shon in Tale 3.4. LVL2 permute transforms the state matrix (output of 4 parallel G functions) into a ne matrix appropriate for the second half-round. LVL permute is a permutation inverse to LVL2 permute. Tale 3.4: BLAKE: Half Round s Permutation LVL2 (forard) LVL (revert) The G-function in the CORE unit is shon in Figure 3.8. Note that the XOR operations used to calculate input values CM 2i and CM 2i+, hich are normall depicted as a part of the G-function, are omitted in our design. These operations ere placed as a part of the PERMUTE unit and therefore skipped here. R, R2, R3 and R4 are rotating constants. The values of these constants are shon in Tale vs. Variant Differences BLAKE- doules the ord size of BLAKE-, there increasing the lock size as ell. Hence, the IV and the constant are changed from its to 24 its. These values can Tale 3.5: BLAKE : Rotation Constants of BLAKE- R R2 R3 R

25 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates 2 msg BLAKE : =,= BLAKE : =24,= constant small_permute Pσ hi Pσ hi Pσ9 P σ lo P σ8lo P σ9 lo hi i 5 small_permute /2 /2 /2 /2 /2 / i /2 Note: hi and lo denotes top and ottom half of the permutation tale m XOR_W CROSS /2 c /2 m = m[..5] c = c[..5] cm 2i = m 2i c 2i+ CM cm 2i+ = m 2i+ c 2i Figure 3.6: BLAKE : PERMUTE Tale 3.6: BLAKE: Rotating Constants of BLAKE- R R2 R3 R e found in Section 2.2. of [2]. BLAKE- introduces also an increase in the numer of mixing rounds from to 4. As a result, the numer of clock ccles required in our design for processing a single lock of message increases from 2 to 29. The multiplexer selection signal in the PERMUTE unit loops ack hen the round numer reaches. Hence, after reaching 9, this selection signal goes ack to. Finall, the rotation constants are adjusted to reflect the increased ord size. These values are descried in Tale 3.6.

26 22 E. Homsirikamol, M. Rogaski, and K. Gaj V[] V[8] V[4] V[2] V[] V[9] V[5] V[3] V[2] V[] V[6] V[4] V[3] V[] V[7] V[5] A B C D A B C D A B C D A B C D CM[,] cm G_function CM[2,3] G_function CM[4,5] G_function CM[6,7] G_function 2 cm cm cm F F F F /4 /4 /4 /4 BLAKE : =, = BLAKE : =24, = =/6 LVL2 permute LVL permute vp Figure 3.7: BLAKE : CORE A CM2i CM2i+ BLAKE : = BLAKE : = R2 R4 B <<< <<< /4 F C D R <<< R3 <<< Figure 3.8: BLAKE : G-function

27 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates Blue Midnight Wish (BMW) 3.4. Block Diagram Description Our design for Blue Midnight Wish (BMW) hashes a lock of data ithin one clock ccle. Since the numer of clock ccles necessar to read a lock of a message is greater than the numer of clock ccles required to hash it, an additional clock signal is used in the circuit as shon in Figure 3.9. This faster clock (io clk) is used to drive the SIPO and PISO units, alloing them to read and rite data at a faster rate than the operation of other units in the circuit. The rate of reading and riting is determined the lock size and the numer of ccles required to process a lock. Since onl one clock ccle is used to process a message lock, the frequenc of io clk is lock size/ord size times higher than the main clock. This ratio is equal to 8 for BMW-256. BMW requires each message lock to go through the endianness sitching efore the start of processing. A message lock is then mixed ith the chaining value in order to otain the next chaining value. Once all locks of the message are processed, a finalization round is initiated. Since there is no incoming message lock, the chaining value and the input message lock are replaced the constant and the chaining value, respectivel. The descriptions of F, F, F2 and AddElements and its associated logical operations can e found in Tale.3 and Tale in [22] vs. Variant Differences BMW- increases the ord size of BMW-256 from to its. As a result, the lock size is douled as ell. Since the lock size increases, the numer of clock ccles required to load a message lock also increases for io clk from 8 ccles to 6 ccles. Furthermore, logic functions, specificall shifts and rotations, are adjusted to accommodate the increased ord size. These changes are shon in Tale.3 of [22]. All other operations remain the same.

28 24 E. Homsirikamol, M. Rogaski, and K. Gaj din BMW 256 : =, h=256, = BMW : =24, h=, = io_clk SIPO IV sitch endian ord constant CV H F M qa H M AddElement A A qa F q q F2 hp qa M hp h hp.. h sitch endian ord h io_clk PISO dout Figure 3.9: BMW : Datapath

29 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates CueHash 3.5. Block Diagram Description A straightforard iterative architecture is used in our design. The datapath of CueHash is shon in Figure 3.. Due to endianness issue, the input message is required to go through endianness sitching tice. First, the teise endianness sitching is applied, hich is then folloed the ordise endianness sitching. A ord of CueHash consists of its. For each message, the chaining value A is initialized to IV. The 256 leftmost its of the chaining value are xored ith an input message lock. The state is then transformed for 6 rounds. A round is descried in Figure 3.. Saps used inside of the round are descried in Figure 3.2. All operations inside the round are performed ordise. This process repeats until all message locks are processed. In the last round of the last message lock, an integer one is xored ith the position zero of the chaining value, rp, activating the control signal f inal efore the chaining value is inserted ack into the state register. Then, the chaining value is transformed for 6 rounds to get the final hash value. The hash value is required to go through the endianness sitching process again to reach the correct hash output vs. Variant Differences Everthing is the same for oth variants ith the exception of truncation size. CueHash6/- 256 truncates the state to 256 its to otain the hash value, as opposed to CueHash6/- hich truncates the state to its.

30 26 E. Homsirikamol, M. Rogaski, and K. Gaj sitch endian te din SIPO 256 sitch endian ord zeros CueHash6/ 256 : h=256 CueHash6/ : h= IV B A=B C B C rp = rp 23.. rp rp dout 24 D=B C 23 PISO 24 rp final h r sitch endian te ROUND 24 rp h sitch endian ord 24 h rp h.. rp Figure 3.: CueHash : Datapath

31 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates 27 Note : All operations are performed ordise, ith = 7 <<< SapA r 5.. rp 5.. <<< SapC r rp r 23.. SapB SapD rp 23.. Figure 3.: CueHash : Round SapA SapB SapC SapD Note : Each index is it ide Figure 3.2: CueHash : Saps

32 28 E. Homsirikamol, M. Rogaski, and K. Gaj 3.6 ECHO 3.6. Block Diagram Description ECHO s top level datapath is shon in Figure 3.3. A message lock is first concatenated ith the chaining value to produce the state matrix. The state matrix is vieed as an arra of 6 ords ith each ord representing its. The state then goes through rounds of iteration for ECHO-256. Note that c represents the numer of its hashed so far. This value also includes its of the currentl processed lock. Once the state matrix is thoroughl mixed, a ne chaining value is computed from the state matrix the BIG.Final unit. This operation is descried as follos: v [] v[] m[] m[4] m[8] [] [4] [8] [2] v [] v[] m[] m[5] m[9] [] [5] [9] [3] v [2] v[2] m[2] m[6] m[] [2] [6] [] [4] v [3] v[3] m[3] m[7] m[] [3] [7] [] [5] din SIPO ECHO 256 : p = 536 q = M p W=V M p 248 m BIG.Final c v v c q q q IV PISO q CV dout ECHO : p = 24 q = x ECHO ROUND q Figure 3.3: ECHO : Datapath In Figure 3.4, operations inside of the ECHO round are shon. In our design, each ECHO round is executed in three clock ccles. BIG.SuBtes is performed in the first to clock ccles and BIG.ShiftRos and BIG.MixColumns in the third ccle. BigSuBtes operation is shon in Figure 3.5. The unit takes in the state matrix and the message length counter, C, and produces the next state. In the first clock ccle of the round

33 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates 29 c x c x BIG.SuBtes x BIG.ShiftRos BIG.MixColumns x Figure 3.4: ECHO : Round operation, the ke is chosen to e the length counter plus the numers eteen and 5. These added values follo the ord numer. Hence, the fourteenth ord gets the ke as C + 4. In the next ccle, salt is selected as the ke. Since in our implementation, salt is not used, zero is selected instead. x[] x 248 x[5] c zeros c zeros ke=c ke x AES round AES round x ke ke=c zeros c 5 zeros c [] 248 [5] Figure 3.5: ECHO : BIG.SuBtes To operations are performed in the third ccle of a round. First, BIG.ShiftRos is performed. This operation is equivalent to the ord permutation given in Tale 3.7. Next,

34 3 E. Homsirikamol, M. Rogaski, and K. Gaj BIG.MixColumns transforms the permuted state to otain the final value of a round. In Figure 3.6, a diagram of BIG.MixColumns is shon. BIG.MixColumns separates the state into 4 locks, each lock containing 4 -it ords. A te of data from each ord is selected to go through the AES MixColumn. All data is then comined together to produce the final state. Tale 3.7: ECHO : BIG.ShiftRos Word Permuted Word MC Note: Buses size are 8 it ide unless specified otherise x [] x x [] x [] x [2] x [3] x 248 x[] x[2] x [3] 2 3 AES MixColumn AES MixColumn 2 3 MC MC MC [] [] [2] [3] [] [] [2] [3] 248 Figure 3.6: ECHO : BIG.MixColumns vs. Variant Differences ECHO- differs from ECHO-256 in its message lock and chaining value sizes. The message lock is reduced from 536 its to 24. On the other hand, the chaining value is increased from to 24 its. This change increases the securit of ECHO and therefore

35 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates 3 a smaller numer of rounds is used. Onl 8 rounds are used in ECHO-. Finall, onl BIG.Final is altered. ECHO- s BIG.Final is descried as follos: v [] v[] m[] [] [8] v [] v[] m[] [] [9] v [2] v[2] m[2] [2] [] v [3] v[3] m[3] [3] [] v [4] v[4] m[4] [4] [2] v [5] v[5] m[5] [5] [3] v [6] v[6] m[6] [6] [4] v [7] v[7] m[7] [7] [5]

36 E. Homsirikamol, M. Rogaski, and K. Gaj 3.7 Fugue 3.7. Block Diagram Description Fugue 256 : =96, h=256 Fugue : =52, h= din s Finalization s r ROUND din h PISO rp IV dout S Figure 3.7: Fugue : Datapath In Figure 3.7, the datapath of Fugue is shon. For ever message, the state register is initialized to IV. The state is vieed as a matrix of 4 X tes, here X is the column length dependent on the lock size of Fugue. For Fugue-, the lock size is equal to 96 its. Hence, the matrix have dimensions 4 x 3. The state is mixed ith input message locks through the ROUND unit. Once all message locks are processed, the state goes through Finalization. For Fugue-, Finalization is descried elo: S = S[..3] (S[4] S[]) (S[5] S[]) S[6..8] A round of Fugue is shon in Figure 3.8. The path through the ROUND unit is selected ased on the sequence of operations as descried in Section of F-256 in [25]. TIX operates in parallel as follos: S [] = din S [] = S[] S[24] S [8] = S[8] din S [] = S[] S[]

37 Comparing Hardare Performance of Fourteen Round To SHA-3 Candidates 33 din r din TIX s s ROR3 s CMIX s s ADDFn RORFn s i SMIX rp Figure 3.8: Fugue : Round

38 34 E. Homsirikamol, M. Rogaski, and K. Gaj Tale 3.8: Fugue: F-256 ADDFn and RORFn Operation i S [4] = S[4] S[] S [5] = S[5] S[] ROR5 S [4] = S[4] S[] S [6] = S[6] S[] ROR4 ROR3 and CMIX are performed consecutivel. All RORn operations are teise rotations n tes. This is equivalent to >>> (n 8). As such, ROR3 can e considered as >>> 24. CMIX operates as follos: S [] = S [] = S [2] = S [5] = S [6] = S [7] = S[] S[4] S[] S[5] S[2] S[6] S[5] S[4] S[6] S[5] S[7] S[6] The ADDFn and RORFn operations are selected the i control signal. The selection process is descried in Tale 3.8. Finall, the SMIX operation is descried in Figure 3.9. The SMIX operation first splits an input into an arra of -it locks. Then, each lock is further splitted into 6 tes. These tes are transformed using AES SBOX and the resulting vector of 6 tes is used as an input to the Matrix Multiplier. The Matrix Multiplier performs multiplication of a constant matrix an input vector. The value of the constant matrix is shon in Tale 3.9. Empt positions in this tale correspond to the values zero. All multiplications are defined as multiplications in GF (2 8 ) vs. Variant Differences Fugue- increases the state size to 4 x 36 hich is equivalent to 52 its. Additionall, TIX, CMIX, ADDFn, RORFn and Finalization have een modified. TIX is no performed in parallel as follos: S [] = din S [] = S[] S[24] S [4] = S[4] S[27] S [7] = S[7] S[3] S [8] = S[8] din S [22] = S[22] S[]

Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs

Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs Ekawat Homsirikamol Marcin Rogawski Kris Gaj George Mason University ehomsiri, mrogawsk, kgaj@gmu.edu Last revised: Decemer

More information

Fair and Comprehensive Methodology for Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates using FPGAs

Fair and Comprehensive Methodology for Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates using FPGAs Fair and Comprehensive Methodology for Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates using FPGAs Kris Gaj, Ekawat Homsirikamol, and Marcin Rogawski ECE Department, George Mason

More information

GMU SHA Core Interface & Hash Function Performance Metrics

GMU SHA Core Interface & Hash Function Performance Metrics GMU SHA Core Interface & Hash Function Performance Metrics Interface Why Interface Matters? Pin limit Total number of i/o ports Total number of an FPGA i/o pins Support for the maximum throughput Time

More information

Groestl Tweaks and their Effect on FPGA Results

Groestl Tweaks and their Effect on FPGA Results Groestl Tweaks and their Effect on FPGA Results Marcin Rogawski and Kris Gaj George Mason University {kgaj, mrogawsk}@gmu.edu Abstract. In January 2011, Groestl team published tweaks to their specification

More information

Use of Embedded FPGA Resources in Implementations of Five Round Three SHA-3 Candidates

Use of Embedded FPGA Resources in Implementations of Five Round Three SHA-3 Candidates Use of Embedded FPGA Resources in Implementations of Five Round Three SHA-3 Candidates Malik Umar Sharif, Rabia Shahid, Marcin Rogawski and Kris Gaj George Mason University, USA Agenda SHA-3 High Speed

More information

GMU SHA Core Interface & Hash Function Performance Metrics Interface

GMU SHA Core Interface & Hash Function Performance Metrics Interface GMU SHA Core Interface & Hash Function Performance Metrics Interface 1 Why Interface Matters? Pin limit Total number of i/o ports Total number of an FPGA i/o pins Support for the maximum throughput Time

More information

Implementation & Benchmarking of Padding Units & HMAC for SHA-3 candidates in FPGAs & ASICs

Implementation & Benchmarking of Padding Units & HMAC for SHA-3 candidates in FPGAs & ASICs Implementation & Benchmarking of Padding Units & HMAC for SHA-3 candidates in FPGAs & ASICs Ambarish Vyas Cryptographic Engineering Research Group (CERG) http://cryptography.gmu.edu Department of ECE,

More information

Vivado HLS Implementation of Round-2 SHA-3 Candidates

Vivado HLS Implementation of Round-2 SHA-3 Candidates Farnoud Farahmand ECE 646 Fall 2015 Vivado HLS Implementation of Round-2 SHA-3 Candidates Introduction NIST announced a public competition on November 2007 to develop a new cryptographic hash algorithm,

More information

Use of Embedded FPGA Resources in Implementa:ons of 14 Round 2 SHA- 3 Candidates

Use of Embedded FPGA Resources in Implementa:ons of 14 Round 2 SHA- 3 Candidates Use of Embedded FPGA Resources in Implementa:ons of 14 Round 2 SHA- 3 Candidates Kris Gaj, Rabia Shahid, Malik Umar Sharif, and Marcin Rogawski George Mason University U.S.A. Co-Authors Rabia Shahid Malik

More information

Use of Embedded FPGA Resources in Implementations of Five Round Three SHA-3 Candidates

Use of Embedded FPGA Resources in Implementations of Five Round Three SHA-3 Candidates Use of Embedded FPGA Resources in Implementations of Five Round Three SHA-3 Candidates Malik Umar Sharif, Rabia Shahid, Marcin Rogawski, Kris Gaj Abstract In this paper, we present results of the comprehensive

More information

Low-Area Implementations of SHA-3 Candidates

Low-Area Implementations of SHA-3 Candidates Jens-Peter Cryptographic Engineering Research Group (CERG) http://cryptography.gmu.edu Department of ECE, Volgenau School of IT&E, George Mason University, Fairfax, VA, USA SHA-3 Project Review Meeting

More information

A High-Speed Unified Hardware Architecture for AES and the SHA-3 Candidate Grøstl

A High-Speed Unified Hardware Architecture for AES and the SHA-3 Candidate Grøstl A High-Speed Unified Hardware Architecture for AES and the SHA-3 Candidate Grøstl Marcin Rogawski Kris Gaj Cryptographic Engineering Research Group (CERG) http://cryptography.gmu.edu Department of ECE,

More information

Lightweight Implementations of SHA-3 Candidates on FPGAs

Lightweight Implementations of SHA-3 Candidates on FPGAs Lightweight of SHA-3 Candidates on FPGAs Jens-Peter Kaps Panasayya Yalla Kishore Kumar Surapathi Bilal Habib Susheel Vadlamudi Smriti Gurung John Pham Cryptographic Engineering Research Group (CERG) http://cryptography.gmu.edu

More information

Version 2.0, November 11, Stefan Tillich, Martin Feldhofer, Mario Kirschbaum, Thomas Plos, Jörn-Marc Schmidt, and Alexander Szekely

Version 2.0, November 11, Stefan Tillich, Martin Feldhofer, Mario Kirschbaum, Thomas Plos, Jörn-Marc Schmidt, and Alexander Szekely High-Speed Hardware Implementations of BLAKE, Blue Midnight Wish, CubeHash, ECHO, Fugue, Grøstl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, and Skein Version 2.0, November 11, 2009 Stefan Tillich,

More information

ECE 545 Lecture 8b. Hardware Architectures of Secret-Key Block Ciphers and Hash Functions. George Mason University

ECE 545 Lecture 8b. Hardware Architectures of Secret-Key Block Ciphers and Hash Functions. George Mason University ECE 545 Lecture 8b Hardware Architectures of Secret-Key Block Ciphers and Hash Functions George Mason University Recommended reading K. Gaj and P. Chodowiec, FPGA and ASIC Implementations of AES, Chapter

More information

Benchmarking of Cryptographic Algorithms in Hardware. Ekawat Homsirikamol & Kris Gaj George Mason University USA

Benchmarking of Cryptographic Algorithms in Hardware. Ekawat Homsirikamol & Kris Gaj George Mason University USA Benchmarking of Cryptographic Algorithms in Hardware Ekawat Homsirikamol & Kris Gaj George Mason University USA 1 Co-Author Ekawat Homsirikamol a.k.a Ice Working on the PhD Thesis entitled A New Approach

More information

C vs. VHDL: Benchmarking CAESAR Candidates Using High- Level Synthesis and Register- Transfer Level Methodologies

C vs. VHDL: Benchmarking CAESAR Candidates Using High- Level Synthesis and Register- Transfer Level Methodologies C vs. VHDL: Benchmarking CAESAR Candidates Using High- Level Synthesis and Register- Transfer Level Methodologies Ekawat Homsirikamol, William Diehl, Ahmed Ferozpuri, Farnoud Farahmand, and Kris Gaj George

More information

ECE 545. Digital System Design with VHDL

ECE 545. Digital System Design with VHDL ECE 545 Digital System Design with VHDL Course web page: ECE web page Courses Course web pages ECE 545 http://ece.gmu.edu/coursewebpages/ece/ece545/f10/ Kris Gaj Research and teaching interests: Contact:

More information

Two Hardware Designs of BLAKE-256 Based on Final Round Tweak

Two Hardware Designs of BLAKE-256 Based on Final Round Tweak Two Hardware Designs of BLAKE-256 Based on Final Round Tweak Muh Syafiq Irsyadi and Shuichi Ichikawa Dept. Knowledge-based Information Engineering Toyohashi University of Technology, Hibarigaoka, Tempaku,

More information

Implementation and Comparative Analysis of AES as a Stream Cipher

Implementation and Comparative Analysis of AES as a Stream Cipher Implementation and Comparative Analysis of AES as a Stream Cipher Bin ZHOU, Yingning Peng Dept. of Electronic Engineering, Tsinghua University, Beijing, China, 100084 e-mail: zhoubin06@mails.tsinghua.edu.cn

More information

Evaluation of Hardware Performance for the SHA-3 Candidates Using SASEBO-GII

Evaluation of Hardware Performance for the SHA-3 Candidates Using SASEBO-GII Evaluation of Hardware Performance for the SHA-3 Candidates Using SASEBO-GII Kazuyuki Kobayashi 1, Jun Ikegami 1, Shin ichiro Matsuo 2, Kazuo Sakiyama 1 and Kazuo Ohta 1 1 The University of Electro-Communications,

More information

Sharing Resources Between AES and the SHA-3 Second Round Candidates Fugue and Grøstl

Sharing Resources Between AES and the SHA-3 Second Round Candidates Fugue and Grøstl Sharing Resources Between AES and the SHA-3 Second Round Candidates Fugue and Grøstl Kimmo Järvinen Department of Information and Computer Science Aalto University, School of Science and Technology Espoo,

More information

Comparison of the Hardware Performance of the AES Candidates Using Reconfigurable Hardware

Comparison of the Hardware Performance of the AES Candidates Using Reconfigurable Hardware Comparison of the Hardware Performance of the AES Candidates Using Reconfigurable Hardware Master s Thesis Pawel Chodowiec MS CpE Candidate, ECE George Mason University Advisor: Dr. Kris Gaj, ECE George

More information

AES as A Stream Cipher

AES as A Stream Cipher > AES as A Stream Cipher < AES as A Stream Cipher Bin ZHOU, Kris Gaj, Department of ECE, George Mason University Abstract This paper presents implementation of advanced encryption standard (AES) as a stream

More information

Fast implementation and fair comparison of the final candidates for Advanced Encryption Standard using Field Programmable Gate Arrays

Fast implementation and fair comparison of the final candidates for Advanced Encryption Standard using Field Programmable Gate Arrays Kris Gaj and Pawel Chodowiec Electrical and Computer Engineering George Mason University Fast implementation and fair comparison of the final candidates for Advanced Encryption Standard using Field Programmable

More information

Can High-Level Synthesis Compete Against a Hand-Written Code in the Cryptographic Domain? A Case Study

Can High-Level Synthesis Compete Against a Hand-Written Code in the Cryptographic Domain? A Case Study Can High-Level Synthesis Compete Against a Hand-Written Code in the Cryptographic Domain? A Case Study Ekawat Homsirikamol & Kris Gaj George Mason University USA Project supported by NSF Grant #1314540

More information

SHA3 Core Specification. Author: Homer Hsing

SHA3 Core Specification. Author: Homer Hsing SHA3 Core Specification Author: Homer Hsing homer.hsing@gmail.com Rev. 0.1 January 29, 2013 This page has been intentionally left blank. www.opencores.org Rev 0.1 ii Rev. Date Author Description 0.1 01/29/2013

More information

Hardware Benchmarking of Cryptographic Algorithms Using High-Level Synthesis Tools: The SHA-3 Contest Case Study

Hardware Benchmarking of Cryptographic Algorithms Using High-Level Synthesis Tools: The SHA-3 Contest Case Study Hardware Benchmarking of Cryptographic Algorithms Using High-Level Synthesis Tools: The SHA-3 Contest Case Study Ekawat Homsirikamol and Kris Gaj Volgenau School of Engineering George Mason University

More information

A Zynq-based Testbed for the Experimental Benchmarking of Algorithms Competing in Cryptographic Contests

A Zynq-based Testbed for the Experimental Benchmarking of Algorithms Competing in Cryptographic Contests A Zynq-based Testbed for the Experimental Benchmarking of Algorithms Competing in Cryptographic Contests Farnoud Farahmand, Ekawat Homsirikamol, and Kris Gaj George Mason University Fairfax, Virginia 22030

More information

Environment for Fair and Comprehensive Performance Evalua7on of Cryptographic Hardware and So=ware. ASIC Status Update

Environment for Fair and Comprehensive Performance Evalua7on of Cryptographic Hardware and So=ware. ASIC Status Update Environment for Fair and Comprehensive Performance Evalua7on of Cryptographic Hardware and So=ware ASIC Status Update ECE Department, Virginia Tech Faculty - Patrick Schaumont, Leyla Nazhandali Students

More information

Hardware Architectures

Hardware Architectures Hardware Architectures Secret-key Cryptography Public-key Cryptography Cryptanalysis AES & AES candidates estream candidates Hash Functions SHA-3 Montgomery Multipliers ECC cryptosystems Pairing-based

More information

Fast implementation and fair comparison of the final candidates for Advanced Encryption Standard using Field Programmable Gate Arrays

Fast implementation and fair comparison of the final candidates for Advanced Encryption Standard using Field Programmable Gate Arrays Fast implementation and fair comparison of the final candidates for Advanced Encryption Standard using Field Programmable Gate Arrays Kris Gaj and Pawel Chodowiec George Mason University, Electrical and

More information

Efficient Hardware Implementations of High Throughput SHA-3 Candidates Keccak, Luffa and Blue Midnight Wish for Single- and Multi-Message Hashing

Efficient Hardware Implementations of High Throughput SHA-3 Candidates Keccak, Luffa and Blue Midnight Wish for Single- and Multi-Message Hashing Efficient Hardware Implementations of High Throughput SHA-3 Candidates Keccak, Luffa and Blue Midnight Wish for Single- and Multi-Message Hashing Abdulkadir Akın, Aydın Aysu, Onur Can Ulusel, and Erkay

More information

High-Speed Hardware for NTRUEncrypt-SVES: Lessons Learned Malik Umar Sharif, and Kris Gaj George Mason University USA

High-Speed Hardware for NTRUEncrypt-SVES: Lessons Learned Malik Umar Sharif, and Kris Gaj George Mason University USA High-Speed Hardware for NTRUEncrypt-SVES: Lessons Learned Malik Umar Sharif, and Kris Gaj George Mason University USA Partially supported by NIST under grant no. 60NANB15D058 1 Co-Author Malik Umar Sharif

More information

A High-Speed Unified Hardware Architecture for the AES and SHA-3 Candidate Grøstl

A High-Speed Unified Hardware Architecture for the AES and SHA-3 Candidate Grøstl A High-Speed Unified Hardware Architecture for the AES and SHA-3 Candidate Grøstl Marcin Rogawski and Kris Gaj Volgenau School of Engineering George Mason University Fairfax, Virginia 22030 email: {mrogawsk,

More information

Rubik's Shells.

Rubik's Shells. Ruik's Shells Ruik's Shells is a puzzle that consists of 4 intersecting rings, coloured heels ith 8 alls each, hich can rotat The heels are in to pairs; to axes ith a pair of heels on each, and the to

More information

CAESAR Hardware API. Ekawat Homsirikamol, William Diehl, Ahmed Ferozpuri, Farnoud Farahmand, Panasayya Yalla, Jens-Peter Kaps, and Kris Gaj

CAESAR Hardware API. Ekawat Homsirikamol, William Diehl, Ahmed Ferozpuri, Farnoud Farahmand, Panasayya Yalla, Jens-Peter Kaps, and Kris Gaj CAESAR Hardware API Ekawat Homsirikamol, William Diehl, Ahmed Ferozpuri, Farnoud Farahmand, Panasayya Yalla, Jens-Peter Kaps, and Kris Gaj Cryptographic Engineering Research Group George Mason University

More information

ECE 646 Lecture 12. Cryptographic Standards. Secret-key cryptography standards

ECE 646 Lecture 12. Cryptographic Standards. Secret-key cryptography standards ECE 646 Lecture 12 Cryptographic Standards Secret-key cryptography Federal Banking International NIST FIPS 46-1 DES FIPS 46-2 DES FIPS 81 Modes of operation FIPS 46-3 Triple DES FIPS 197 AES X3.92 DES

More information

Compact Hardware Implementations of ChaCha, BLAKE, Threefish, and Skein on FPGA

Compact Hardware Implementations of ChaCha, BLAKE, Threefish, and Skein on FPGA Compact Hardware Implementations of ChaCha, BLAKE, Threefish, and Skein on FPGA Nuray At, Jean-Luc Beuchat, Eiji Okamoto, İsmail San, and Teppei Yamazaki Department of Electrical and Electronics Engineering,

More information

FPGA Matrix Multiplier

FPGA Matrix Multiplier FPGA Matrix Multiplier In Hwan Baek Henri Samueli School of Engineering and Applied Science University of California Los Angeles Los Angeles, California Email: chris.inhwan.baek@gmail.com David Boeck Henri

More information

A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT

A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 7, Issue 2 (Jul. - Aug. 2013), PP 13-18 A Novel Approach of Area-Efficient FIR Filter

More information

Implementation of the block cipher Rijndael using Altera FPGA

Implementation of the block cipher Rijndael using Altera FPGA Regular paper Implementation of the block cipher Rijndael using Altera FPGA Piotr Mroczkowski Abstract A short description of the block cipher Rijndael is presented. Hardware implementation by means of

More information

GMU Hardware API for Authen4cated Ciphers

GMU Hardware API for Authen4cated Ciphers GMU Hardware API for Authen4cated Ciphers Ekawat Homsirikamol, William Diehl, Ahmed Ferozpuri, Farnoud Farahmand, Malik Umar Sharif, and Kris Gaj George Mason University USA http:/cryptography.gmu.edu

More information

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011 FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level

More information

Structural Features for Recognizing Degraded Printed Gurmukhi Script

Structural Features for Recognizing Degraded Printed Gurmukhi Script Fifth International Conference on Information Technology: Ne Generations Structural Features for Recognizing Degraded Printed Gurmukhi Script M. K. Jindal Department of Computer Applications, P. U. Regional

More information

Federal standards NIST FIPS 46-1 DES FIPS 46-2 DES. FIPS 81 Modes of. operation. FIPS 46-3 Triple DES FIPS 197 AES. industry.

Federal standards NIST FIPS 46-1 DES FIPS 46-2 DES. FIPS 81 Modes of. operation. FIPS 46-3 Triple DES FIPS 197 AES. industry. ECE 646 Lecture 12 Federal Secret- cryptography Banking International Cryptographic Standards NIST FIPS 46-1 DES FIPS 46-2 DES FIPS 81 Modes of operation FIPS 46-3 Triple DES FIPS 197 AES X3.92 DES ANSI

More information

Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer

Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Sashisu Bajracharya 1, Chang Shu 1, Kris Gaj 1, Tarek El-Ghazawi 2 1 ECE Department, George

More information

Fast implementations of secret-key block ciphers using mixed inner- and outer-round pipelining

Fast implementations of secret-key block ciphers using mixed inner- and outer-round pipelining Pawel Chodowiec, Po Khuon, Kris Gaj Electrical and Computer Engineering George Mason University Fast implementations of secret-key block ciphers using mixed inner- and outer-round pipelining http://ece.gmu.edu/crypto-text.htm

More information

NIST SHA-3 ASIC Datasheet

NIST SHA-3 ASIC Datasheet NIST SHA-3 ASIC Datasheet -- NIST SHA-3 Competition Five Finalists on a Chip (Version 1.1) Technology: IBM MOSIS 0.13µm CMR8SF-RVT Standard-Cell Library: ARM s Artisan SAGE-X V2.0 Area: 5mm 2 (Core: 1.656mm

More information

Design and Efficient FPGA Implementation of an RGB to YCrCb Color Space Converter Using Distributed Arithmetic

Design and Efficient FPGA Implementation of an RGB to YCrCb Color Space Converter Using Distributed Arithmetic Design and Efficient FPGA Implementation of an RGB to YCrC Color Space Converter Using Distriuted Arithmetic Faycal Bensaali and Aes Amira School of Computer Science, Queen s University of Belfast, Belfast

More information

The Codesign Challenge

The Codesign Challenge ECE 45: Hardware/Software Codesign Codesign Challenge Fall 21 The Codesign Challenge Ojectives In the codesign challenge, your task is to accelerate a given software reference implementation as much as

More information

Compact implementations of Grøstl, JH and Skein for FPGAs

Compact implementations of Grøstl, JH and Skein for FPGAs Compact implementations of Grøstl, JH and Skein for FPGAs Bernhard Jungk Hochschule RheinMain University of Applied Sciences Wiesbaden Rüsselsheim Geisenheim bernhard.jungk@hs-rm.de Abstract. This work

More information

INTRODUCTION TO FPGA ARCHITECTURE

INTRODUCTION TO FPGA ARCHITECTURE 3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)

More information

Benchmarking of Round 3 CAESAR Candidates in Hardware: Methodology, Designs & Results

Benchmarking of Round 3 CAESAR Candidates in Hardware: Methodology, Designs & Results Benchmarking of Round 3 CAESAR Candidates in Hardware: Methodology, Designs & Results Ekawat Homsirikamol, Farnoud Farahmand, William Diehl, and Kris Gaj George Mason University USA http://cryptography.gmu.edu

More information

On Optimized FPGA Implementations of the SHA-3 Candidate Grøstl

On Optimized FPGA Implementations of the SHA-3 Candidate Grøstl On Optimized FPGA Implementations of the SHA-3 Candidate Grøstl Bernhard Jungk, Steffen Reith, and Jürgen Apfelbeck Fachhochschule Wiesbaden University of Applied Sciences {jungk reith}@informatik.fh-wiesbaden.de

More information

ECE 545 Fall 2013 Final Exam

ECE 545 Fall 2013 Final Exam ECE 545 Fall 2013 Final Exam Problem 1 Develop an ASM chart for the circuit EXAM from your Midterm Exam, described below using its A. pseudocode B. table of input/output ports C. block diagram D. interface

More information

Dept. of Computing Science & Math

Dept. of Computing Science & Math Lecture 4: Multi-Laer Perceptrons 1 Revie of Gradient Descent Learning 1. The purpose of neural netor training is to minimize the output errors on a particular set of training data b adusting the netor

More information

Parallelized Very High Radix Scalable Montgomery Multipliers

Parallelized Very High Radix Scalable Montgomery Multipliers Parallelized Very High Radix Scalable Montgomery Multipliers Kyle Kelley and Daid Harris Harey Mudd College 301 E. Telfth St. Claremont, CA 91711 {Kyle_Kelley, Daid_Harris}@hmc.edu Abstract This paper

More information

HOST Cryptography III ECE 525 ECE UNM 1 (1/18/18)

HOST Cryptography III ECE 525 ECE UNM 1 (1/18/18) AES Block Cipher Blockciphers are central tool in the design of protocols for shared-key cryptography What is a blockcipher? It is a function E of parameters k and n that maps { 0, 1} k { 0, 1} n { 0,

More information

ECE 645: Lecture 1. Basic Adders and Counters. Implementation of Adders in FPGAs

ECE 645: Lecture 1. Basic Adders and Counters. Implementation of Adders in FPGAs ECE 645: Lecture Basic Adders and Counters Implementation of Adders in FPGAs Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 5, Basic Addition and Counting,

More information

4.5 Routing Algorithms

4.5 Routing Algorithms 4.5 ROUTING ALGORITHMS 363 to hosts enjo the securit services provided b IPsec. On the sending side, the transport laer passes a segment to IPsec. IPsec then encrpts the segment, appends additional securit

More information

High-Performance and Area-Efficient Hardware Design for Radix-2 k Montgomery Multipliers

High-Performance and Area-Efficient Hardware Design for Radix-2 k Montgomery Multipliers High-Performance and Area-Efficient Hardare Design for Radix- k Montgomery Multipliers Liang Zhou, Miaoqing Huang, Scott C. Smith University of Arkansas, Fayetteville, Arkansas 771, USA Abstract Montgomery

More information

A unified architecture of MD5 and RIPEMD-160 hash algorithms

A unified architecture of MD5 and RIPEMD-160 hash algorithms Title A unified architecture of MD5 and RIPMD-160 hash algorithms Author(s) Ng, CW; Ng, TS; Yip, KW Citation The 2004 I International Symposium on Cirquits and Systems, Vancouver, BC., 23-26 May 2004.

More information

Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering. Winter/Summer Training

Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering. Winter/Summer Training Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering Winter/Summer Training Level 2 continues. 3 rd Year 4 th Year FIG-3 Level 1 (Basic & Mandatory) & Level 1.1 and

More information

Kris Gaj. Research and teaching interests: cryptography computer arithmetic FPGA design and verification

Kris Gaj. Research and teaching interests: cryptography computer arithmetic FPGA design and verification Kris Gaj Research and teaching interests: Contact: cryptography computer arithmetic FPGA design and verification Engineering Bldg., room 3225 kgaj@gmu.edu (703) 993-1575 Office hours: Monday, 3:00-4:00

More information

Parallel FIR Filters. Chapter 5

Parallel FIR Filters. Chapter 5 Chapter 5 Parallel FIR Filters This chapter describes the implementation of high-performance, parallel, full-precision FIR filters using the DSP48 slice in a Virtex-4 device. ecause the Virtex-4 architecture

More information

FPGA architecture and design technology

FPGA architecture and design technology CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA

More information

SINCE the introduction of the RSA algorithm [1] in

SINCE the introduction of the RSA algorithm [1] in I TRANSACTIONS ON COMPUTRS, VOL. N, NO. N, MM X Ne Hardare Architectures for Montgomery Modular Multiplication Algorithm Miaoqing Huang, Member, Kris Gaj, Tarek l-ghazai, Senior Member Abstract Montgomery

More information

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items (FFT_MIXED) November 26, 2008 Product Specification Dillon Engineering, Inc. 4974 Lincoln Drive Edina, MN USA, 55436 Phone: 952.836.2413 Fax: 952.927.6514 E mail: info@dilloneng.com URL: www.dilloneng.com

More information

ECE241 - Digital Systems. University of Toronto. Lab #2 - Fall Introduction Computer-Aided Design Software, the DE2 Board and Simple Logic

ECE241 - Digital Systems. University of Toronto. Lab #2 - Fall Introduction Computer-Aided Design Software, the DE2 Board and Simple Logic ECE24 - Digital Sstems Universit of Toronto Lab #2 - Fall 28 Introduction Computer-Aided Design Software, the DE2 Board and Simple Logic. Introduction The purpose of this eercise is to introduce ou to

More information

Chapter 3 Symmetric Cryptography

Chapter 3 Symmetric Cryptography Symmetric ion General description: The same key A,B is used for enciphering and deciphering of messages: Chapter 3 Symmetric Cryptography Decrypt Plaintext Ciphertext Modes of ion Feistel Network Data

More information

RC6 Implementation including key scheduling using FPGA

RC6 Implementation including key scheduling using FPGA ECE 646, HI-3 1 RC6 Implementation including key scheduling using FPGA (ECE 646 Project, December 2006) Fouad Ramia, Hunar Qadir, GMU Abstract with today's great demand for secure communications systems,

More information

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier CHAPTER 3 METHODOLOGY 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier The design analysis starts with the analysis of the elementary algorithm for multiplication by

More information

Chapter 3 Symmetric Cryptography

Chapter 3 Symmetric Cryptography Chapter 3 Symmetric Cryptography Modes of ion Feistel Network Data ion Standard (DES) Advanced ion Standard (AES) Stream Cipher RC4 [NetSec/SysSec], WS 2007/2008 3.1 Symmetric ion General description:

More information

The Hash Function Fugue 2.0

The Hash Function Fugue 2.0 The Hash Function Fugue 2.0 Shai Halevi William E. Hall Charanjit S. Jutla IBM T.J. Watson Research Center April 2, 2012 1 Specification of Fugue 2.0 Fugue 2.0 is a modification of the original Fugue hash

More information

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses Today Comments about assignment 3-43 Comments about assignment 3 ASICs and Programmable logic Others courses octor Per should show up in the end of the lecture Mealy machines can not be coded in a single

More information

Hardware Performance Evaluation of SHA-3 Candidate Algorithms

Hardware Performance Evaluation of SHA-3 Candidate Algorithms Journal of Information Security, 2012, 3, 69-76 http://dx.doi.org/10.4236/jis.2012.32008 Published Online April 2012 (http://www.scirp.org/journal/jis) Hardware Performance Evaluation of SHA-3 Candidate

More information

Multi-Stage Fault Attacks on Block Ciphers

Multi-Stage Fault Attacks on Block Ciphers Multi-Stage Fault Attacks on Block Ciphers Philipp Jovanovic, Martin Kreuer, Ilia Polian Fakultät für Informatik und Mathematik Universität Passau 94032 Passau, German {Philipp.Jovanovic, Martin.Kreuer,

More information

Cryptographic Properties and Implementation Complexity of Different Permutation Operations. Abstract

Cryptographic Properties and Implementation Complexity of Different Permutation Operations. Abstract Cryptographic Properties and Implementation Complexity of Different Permutation Operations Abstract The orkload of computers has changed dramatically. Ne user requirements include the increasing need to

More information

CONSIDERATIONS ON HARDWARE IMPLEMENTATIONS OF ENCRYPTION ALGORITHMS

CONSIDERATIONS ON HARDWARE IMPLEMENTATIONS OF ENCRYPTION ALGORITHMS CONSIDERATIONS ON HARDWARE IMPLEMENTATIONS OF ENCRYPTION ALGORITHMS Ioan Mang University of Oradea, Faculty of Electrotechnics and Informatics, Computer Science Department, 3, Armatei Romane Str., 3700

More information

January 1996, ver. 1 Functional Specification 1

January 1996, ver. 1 Functional Specification 1 FIR Filters January 1996, ver. 1 Functional Specification 1 Features High-speed operation: up to 105 million samples per second (MSPS) -, 16-, 24-, 32-, and 64-tap finite impulse response (FIR) filters

More information

Documentation. Design File Formats. Constraints Files. Verification. Slices 1 IOB 2 GCLK BRAM

Documentation. Design File Formats. Constraints Files. Verification. Slices 1 IOB 2 GCLK BRAM DES and DES3 Encryption Engine (MC-XIL-DES) May 19, 2008 Product Specification AllianceCORE Facts 10805 Rancho Bernardo Road Suite 110 San Diego, California 92127 USA Phone: (858) 385-7652 Fax: (858) 385-7770

More information

Spiral 2-8. Cell Layout

Spiral 2-8. Cell Layout 2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric

More information

Industrial Data Communications - Fundamentals

Industrial Data Communications - Fundamentals Industrial Data Communications - Fundamentals Tutorial 1 This tutorial on the fundamentals of communications is broken don into the folloing sections: Communication Modes Synchronous versus Asynchronous

More information

Implementation of a Turbo Encoder and Turbo Decoder on DSP Processor-TMS320C6713

Implementation of a Turbo Encoder and Turbo Decoder on DSP Processor-TMS320C6713 International Journal of Engineering Research and Development e-issn : 2278-067X, p-issn : 2278-800X,.ijerd.com Volume 2, Issue 5 (July 2012), PP. 37-41 Implementation of a Turbo Encoder and Turbo Decoder

More information

Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer

Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Sashisu Bajracharya, Chang Shu, Kris Gaj George Mason University Tarek El-Ghazawi The George

More information

Stratix II vs. Virtex-4 Performance Comparison

Stratix II vs. Virtex-4 Performance Comparison White Paper Stratix II vs. Virtex-4 Performance Comparison Altera Stratix II devices use a new and innovative logic structure called the adaptive logic module () to make Stratix II devices the industry

More information

Field Programmable Gate Array (FPGA)

Field Programmable Gate Array (FPGA) Field Programmable Gate Array (FPGA) Lecturer: Krébesz, Tamas 1 FPGA in general Reprogrammable Si chip Invented in 1985 by Ross Freeman (Xilinx inc.) Combines the advantages of ASIC and uc-based systems

More information

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items (FFT_PIPE) Product Specification Dillon Engineering, Inc. 4974 Lincoln Drive Edina, MN USA, 55436 Phone: 952.836.2413 Fax: 952.927.6514 E mail: info@dilloneng.com URL: www.dilloneng.com Core Facts Documentation

More information

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices 3 Digital Systems Implementation Programmable Logic Devices Basic FPGA Architectures Why Programmable Logic Devices (PLDs)? Low cost, low risk way of implementing digital circuits as application specific

More information

SHA-3 interoperability

SHA-3 interoperability SHA-3 interoperability Daniel J. Bernstein Department of Computer Science (MC 152) The University of Illinois at Chicago Chicago, IL 60607 7053 djb@cr.yp.to 1 Introduction: You thought software upgrades

More information

Binary Decision Diagrams (BDDs) Pingqiang Zhou ShanghaiTech University

Binary Decision Diagrams (BDDs) Pingqiang Zhou ShanghaiTech University Binary Decision Diagrams (BDDs) Pingqiang Zhou ShanghaiTech University Computational Boolean Algera Representations Applying unate recursive paradigm (URP) in solving tautology is a great warm up example.

More information

Effect of Replica Placement on the Reliability of Large-Scale Data Storage Systems

Effect of Replica Placement on the Reliability of Large-Scale Data Storage Systems Effect of Replica Placement on the Reliaility of Large-Scale Data Storage Systems Vinodh Venkatesan, Ilias Iliadis, Xiao-Yu Hu, Roert Haas IBM Research - Zurich {ven, ili, xhu, rha}@zurich.im.com Christina

More information

Digital Design using HDLs EE 4755 Final Examination

Digital Design using HDLs EE 4755 Final Examination Name Digital Design using HDLs EE 4755 Final Examination Thursday, 8 December 26 2:3-4:3 CST Alias Problem Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Exam Total (3 pts) (2 pts) (5 pts) (5 pts) (

More information

An 80Gbps FPGA Implementation of a Universal Hash Function based Message Authentication Code

An 80Gbps FPGA Implementation of a Universal Hash Function based Message Authentication Code An 8Gbps FPGA Implementation of a Universal Hash Function based Message Authentication Code Abstract We developed an architecture optimization technique called divide-and-concatenate and applied it to

More information

Winter 2011 Josh Benaloh Brian LaMacchia

Winter 2011 Josh Benaloh Brian LaMacchia Winter 2011 Josh Benaloh Brian LaMacchia Symmetric Cryptography January 20, 2011 Practical Aspects of Modern Cryptography 2 Agenda Symmetric key ciphers Stream ciphers Block ciphers Cryptographic hash

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

On the parallelization of slice-based Keccak implementations on Xilinx FPGAs

On the parallelization of slice-based Keccak implementations on Xilinx FPGAs On the parallelization of slice-based Keccak implementations on Xilinx FPGAs Jori Winderickx, Joan Daemen and Nele Mentens KU Leuven, ESAT/COSIC & iminds, Leuven, Belgium STMicroelectronics Belgium & Radboud

More information

Hybrid LUT/Multiplexer FPGA Logic Architectures

Hybrid LUT/Multiplexer FPGA Logic Architectures Hybrid LUT/Multiplexer FPGA Logic Architectures Abstract: Hybrid configurable logic block architectures for field-programmable gate arrays that contain a mixture of lookup tables and hardened multiplexers

More information