A Super-Serial Galois Fields Multiplier for FPGAs and its Application to Public-Key Algorithms

Gerardo Orlando, GTE Government Systems, 77 A. St., Needham, MA
Christof Paar, ECE Department, Worcester Polytechnic Institute, Institute Road, Worcester, MA

Presented at: FCCM '99, April 21-23, 1999, Napa Valley, California

Abstract

This contribution introduces a scalable multiplier architecture for Galois fields $GF(2^k)$ that is amenable to field-programmable gate array (FPGA) implementations. The architecture is well suited for the implementation of public-key cryptosystems, which require programmable multipliers in large Galois fields. It trades a reduction in resources for an increase in the number of clock cycles. It is also fine-grain scalable in both the time and the area (or logic) dimensions, thus facilitating implementations that maximize their use of finite FPGA resources while achieving fast computational speed. The result is an architecture that requires fewer resources than traditional bit-serial multipliers, which we demonstrate with implementations of multipliers in the field $GF(2^{167})$. Our results show that for this field one can realize super-serial multipliers that use 2.76 times fewer function generators and 6.84 times fewer flip-flops than their serial multiplier counterparts. We also extrapolate the performance of these multipliers in an elliptic curve cryptosystem.

1 Introduction

Public-key cryptography is an indispensable technology of the new digital age. This technology enables identification, authentication, secure communications, intellectual property protection, exchanges of goods and services, and many other services. Many popular public-key algorithms require arithmetic in Galois fields $GF(2^k)$, including in particular schemes based on the discrete logarithm problem in finite fields (see [1] for an overview), the elliptic curve discrete logarithm problem [2, 3], or hyperelliptic curves [4].
Galois field multiplication is usually considered the most crucial operation for the performance of these cryptosystems. All practical public-key schemes require operations in relatively large finite fields, e.g., about 150-250 bits for elliptic curve systems and about 1024 bits for systems based on the discrete logarithm problem in finite fields. For physical security as well as for performance reasons, implementations of Galois field arithmetic in hardware are very attractive. At the same time, the algorithm-independent design paradigm of modern cryptographic protocols and flexible security levels require alterable implementations that are difficult to provide with traditional (non-reconfigurable) hardware. Our architecture addresses these issues by offering fine-grained scalability together with the inherent reprogrammability and speed of FPGAs.

Previous work in this area includes traditional bit-serial multipliers (referred to here as serial multipliers), which compute a $GF(2^k)$ multiplication in $k$ cycles using $O(k)$ logic elements, and parallel multipliers, which compute a field multiplication in one cycle using $O(k^2)$ logic elements. More recently, hybrid multipliers for composite fields $GF((2^n)^m)$ were introduced in [5, 6] and digit-serial architectures were introduced in [7]. These multipliers perform a multiplication in $m$ cycles using $O(mk)$ logic elements. Most of the documented studies focus on ASIC implementations; only a few, including [8, 9, 10], focus on reprogrammable logic.

This work introduces what is, to our knowledge, a new sliced, polynomial basis multiplier based on a traditional serial multiplier [5]. We refer to the new architecture as super-serial. This multiplier emulates the operation of the serial multiplier using a smaller number of processors but requires more clock cycles, thus realizing smaller and proportionally slower implementations by exploring a time-space trade-off.

This paper begins with an introduction of a serial multiplier, continues with a detailed description of the super-serial multiplier, and concludes with an analysis of the proof-of-concept implementations and their estimated performance in an elliptic curve cryptosystem. Throughout this document the acronyms SM and SSM refer to the serial and the super-serial multipliers, respectively.

2 Preliminaries

2.1 Galois Field Multiplication

We consider arithmetic in an extension field of $GF(2)$. The extension degree is denoted by $k$, so that the field can be denoted by $GF(2^k)$. This field is isomorphic to $GF(2)[x]/(P(x))$, where $P(x) = x^k + \sum_{i=0}^{k-1} p_i x^i$ is an irreducible polynomial of degree $k$ with $p_i \in GF(2)$. In the following, a residue class will be identified with the polynomial of least degree in its class. The product of two field elements, $W = UV$, where $W(x) = \sum_{i=0}^{k-1} w_i x^i$, $U(x) = \sum_{i=0}^{k-1} u_i x^i$, and $V(x) = \sum_{i=0}^{k-1} v_i x^i$, is defined as follows:

$$W(x) = U(x)V(x) \bmod P(x) \qquad (1)$$

$GF(2^k)$ multiplication can be carried out by multiplying $U(x)$ and $V(x)$ and then performing reduction modulo $P(x)$, or alternatively by interleaving multiplication and reduction according to Equation (2). In this equation $W'^{(i)}$ corresponds to the partial result generated at step $i$ of the recursion.

$$W'^{(i)} = \left( x W'^{(i-1)} \bmod P(x) \right) + u_{k-i} V(x) \qquad (2)$$

for $i = 1, 2, \dots, k$; $W'^{(0)} = 0$; $W(x) = W'^{(k)}$.

In (2), the products $x W'^{(i-1)}$ are polynomials of degree up to $k$, which must be reduced modulo $P(x)$. These reductions are done using the following identity.

$$x^k \equiv p_{k-1} x^{k-1} + \dots + p_1 x + p_0 \pmod{P(x)} \qquad (3)$$

Each field multiplication requires a total of $k^2$ bit multiplications and $k - 1$ polynomial reductions.

2.2 Serial Multiplier Architecture

The serial multiplier, sometimes referred to as the "MSB-first multiplier," is a polynomial basis multiplier that uses $k$ processors (or slices) and computes a $GF(2^k)$ multiplication in $k$ cycles using Algorithm 1 [11, 12]. This algorithm is an implementation of the multiplication in (2) that uses identity (3) for polynomial reductions. The computation is carried out in $k$ cycles, each of which realizes the operation in Equation (2).

Algorithm 1: SM multiplication algorithm
  w[-1] = 0
  w[0 to k-1] = 0
  For i = k-1 down to 0 Do
    Parallel For j = 0 to k-1 Do
      w[j] = w[j-1] + (u[i] * v[j]) + (w[k-1] * p[j])
    End Parallel For
  End For

A hardware realization of the multiplier is shown in Figure 1 (note that each connection is one bit wide). The serial multiplier is particularly attractive for FPGA implementations because it uses very simple processors (or slices) and a highly distributed interconnect network. Each processor consists of three registers ($v$, $w$, $p$), two AND gates ($GF(2)$ multipliers), and an XOR gate ($GF(2)$ adder). The interconnect network consists mainly of connections between neighboring processors along with two global interconnects ($u$ and $w_{k-1}$). The complexity of this multiplier can be further reduced for implementations that use irreducible polynomials with few coefficients, such as trinomials or pentanomials. On average, these implementations save one register, one AND gate, and one input of the XOR gate per processor. They also require only one global connection from the $U$ operand shift register to all the processors.

Figure 1: Serial multiplier in $GF(2^k)$ (slices $0$ to $k-1$ with registers $V_j$, $W_j$, $P_j$, fed by the $U$ operand shift register $U_0, \dots, U_{k-1}$)

The main building blocks of modern FPGAs are flip-flops (FF), 4-input function generators (FG) that implement any logical function of their inputs, and memory elements (MM). Using these logic elements as a metric, we estimated the logic complexity of this multiplier for configurations that implement programmable $P(x)$ polynomials and those that implement fixed $P(x)$ polynomials. The complexity of the latter is significantly lower, especially for low complexity polynomials such as trinomials and pentanomials. We also estimated the complexity of configurations that implement the $U$ operand shift register with memory elements as well as with flip-flops. These estimates are summarized in Table 1.

In general, the complexity of the serial multiplier is proportional to $k$. For fixed polynomial implementations, it also depends on $p$, the number of non-zero coefficients of $P(x)$ minus one. For cryptographic applications, where $k$ is large, $p$ can be neglected for low complexity polynomials such as trinomials and pentanomials, for which $p = 2$ and $p = 4$, respectively.

Because memory implementations are not uniform across FPGAs, Table 1 summarizes the complexity of implementations of the $U$ operand shift register with memory elements as $c$, where $c$ represents the number of resources utilized; that is, $c$ under the #FG column represents the number of function generators utilized and $c$ under the #FF column represents the number of flip-flops utilized. These $U$ shift register implementations require counters that generate memory addresses and miscellaneous multiplexing and decoding logic. For cryptographic applications, the complexity of this circuitry can also be ignored as $c \ll k$.

3 A Super-Serial Multiplier

3.1 Principal Idea

As can be seen from the SM entries of Table 1, the classical serial multiplier requires $ak$ resources for multiplication in $GF(2^k)$, where $a \geq 1$ is a constant.
Since the values of $k$ required for practical public-key algorithms are relatively large, at least 150 bits for elliptic curve systems, it is attractive to provide multiplier architectures that require fewer resources, especially if FPGA implementations are intended. In the following, such an architecture is developed; it still requires $bk$ resources, but this time with $b < 1$. We name this architecture the super-serial multiplier because it takes several clock cycles to compute a partial product $W'^{(i)}$.

The super-serial multiplier is also a polynomial basis multiplier. It uses $m < k$ processors and computes a $GF(2^k)$ multiplication in $k \lceil k/m \rceil$ cycles (see Figure 2). It implements Algorithm 2, which is a partially parallel realization of the multiplication (2) that uses identity (3) for polynomial reduction. Like the serial multiplier, it performs $GF(2^k)$ multiplication by recursively computing $W'^{(i)}$. Each partial result is computed by its $m < k$ processors in $\lceil k/m \rceil$ cycles, thus realizing a multiplication in $k \lceil k/m \rceil$ cycles. In contrast, the serial multiplier computes the same partial result in one cycle using a larger number of processors, thus computing an entire multiplication in $k$ cycles.

It should be noted that, contrary to a serial multiplier, which for a particular field $GF(2^k)$ requires a fixed number of processors ($k$) and computes a multiplication in a fixed number of cycles ($k$), the super-serial multiplier allows implementations to choose the number of processors ($m$) that best meets their target processing time ($k \lceil k/m \rceil$ cycles per multiplication) and area requirements. This flexible design option is particularly relevant for implementations in reconfigurable hardware.

Algorithm 2: SSM multiplication algorithm
  w'[-1] = 0
  w'[0 to k-1] = 0
  w[0 to k-1] = 0
  For i = k-1 down to 0 Do
    For j = 0 to ceil(k/m)-1 Do
      Parallel For l = 0 to m-1 Do
        w[jm+l] = w'[jm+l-1] + (u[i] * v[jm+l]) + (w'[k-1] * p[jm+l])
      End Parallel For
    End For
    w'[0 to k-1] = w[0 to k-1]
  End For

Indices $jm + l \geq k$, which can occur in the last pass when $m$ does not divide $k$, are skipped.

The super-serial multiplier trades speed for area. It performs essentially the same algorithm as the serial multiplier but with a lower degree of parallelism. The reduced parallelism is accompanied by a reduced number of processing elements and an increased number of storage elements. Discrete implementations of the super-serial multiplier using gates and flip-flops would result in designs that are larger than their serial multiplier counterparts. On the other hand, implementations in FPGAs that provide memory elements result in smaller designs because it is generally "cheaper" to store data in memory elements than in flip-flops. As an example, the logic complexity of a 16x1-bit memory element in a Xilinx XC4000 FPGA is comparable to that of a single flip-flop.
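The relationship between Equation (2), Algorithm 1, and Algorithm 2 can be illustrated with a small software model. The following is a minimal Python sketch, not the hardware design: the function names (`mul_reference`, `sm_multiply`, `ssm_multiply`, `check`) are hypothetical, coefficient lists are indexed so that entry `j` is the coefficient of $x^j$, and the integer `p` is assumed to hold only $p_{k-1}, \dots, p_0$ (the $x^k$ term of $P(x)$ is implicit). The inner loops that hardware would execute in parallel are run sequentially here, reading only the previous state.

```python
import random

def mul_reference(u, v, p, k):
    """Equation (2) on Python ints; p holds p_{k-1}..p_0 (x^k term implicit)."""
    mask = (1 << k) - 1
    w = 0
    for i in range(k - 1, -1, -1):          # most significant bit of U first
        carry = (w >> (k - 1)) & 1          # coefficient that would become x^k
        w = (w << 1) & mask                 # multiply the partial result by x
        if carry:
            w ^= p                          # reduce with identity (3)
        if (u >> i) & 1:
            w ^= v                          # add u_i * V(x)
    return w

def to_bits(x, k):
    return [(x >> j) & 1 for j in range(k)]

def from_bits(bits):
    return sum(b << j for j, b in enumerate(bits))

def sm_multiply(u, v, p, k):
    """Algorithm 1 on coefficient lists: k slices, one step per bit of U."""
    w = [0] * k
    for i in range(k - 1, -1, -1):
        old = w[:]                          # every slice reads the previous state
        for j in range(k):
            left = old[j - 1] if j > 0 else 0           # w[-1] = 0
            w[j] = left ^ (u[i] & v[j]) ^ (old[k - 1] & p[j])
    return w

def ssm_multiply(u, v, p, k, m):
    """Algorithm 2: m slices reused over ceil(k/m) passes per bit of U."""
    passes = -(-k // m)                     # ceil(k / m)
    w_prev = [0] * k                        # w' in Algorithm 2
    w = [0] * k
    for i in range(k - 1, -1, -1):
        for j in range(passes):
            for l in range(m):
                idx = j * m + l
                if idx >= k:                # the last pass may be partial
                    break
                left = w_prev[idx - 1] if idx > 0 else 0
                w[idx] = left ^ (u[i] & v[idx]) ^ (w_prev[k - 1] & p[idx])
        w_prev = w[:]
    return w

def check(k, p, m, trials=20, seed=0):
    """Compare both algorithms against the integer reference on random operands."""
    rng = random.Random(seed)
    for _ in range(trials):
        u, v = rng.getrandbits(k), rng.getrandbits(k)
        ref = mul_reference(u, v, p, k)
        ub, vb, pb = to_bits(u, k), to_bits(v, k), to_bits(p, k)
        assert from_bits(sm_multiply(ub, vb, pb, k)) == ref
        assert from_bits(ssm_multiply(ub, vb, pb, k, m)) == ref
    return True
```

As a small usage example, in $GF(2^3)$ with $P(x) = x^3 + x + 1$ the call `mul_reference(0b010, 0b100, 0b011, 3)` returns `0b011`, since $x \cdot x^2 = x^3 \equiv x + 1$. Calling `check(7, 0b11, 3)` exercises both algorithms for the trinomial $x^7 + x + 1$ with three slices. The sketch only shows that the two algorithms compute the same product; it says nothing about area or timing.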

3.2 Serial Multiplier Emulation

Functionally, the super-serial multiplier performs $GF(2^k)$ multiplication by emulating the operation of a serial multiplier. The emulation is done by logically mapping the functions performed by the $k$ processors of the serial multiplier onto the $m$ processors of the super-serial multiplier. This mapping assigns the processing functions of processor $x$ of the serial multiplier to processor $y$ of the super-serial multiplier, where the relationship between $x$ and $y$ is given by $y = x \bmod m$. An example of such a mapping is illustrated in Figure 2 for $m = 5$. It should be stressed that only one row with $m$ processing elements is actually realized in hardware.

Figure 2: Mapping of functions from the serial multiplier to the super-serial multiplier with $m = 5$ ($s_x$: serial multiplier processor; $ss_x$: super-serial multiplier processor). The logical mapping is $ss_0 \leftarrow s_0, s_5, \dots, s_{k-4}$; $ss_1 \leftarrow s_1, s_6, \dots, s_{k-3}$; $ss_2 \leftarrow s_2, s_7, \dots, s_{k-2}$; $ss_3 \leftarrow s_3, s_8, \dots, s_{k-1}$; $ss_4 \leftarrow s_4, s_9, \dots$

The super-serial multiplier emulates the serial multiplier by first emulating processors $S_0$ to $S_{m-1}$, then processors $S_m$ to $S_{2m-1}$, and so on until all $k$ processors are emulated. This process is repeated $k$ times for a full multiplication, which results in a complete multiplication after a total of $k \lceil k/m \rceil$ cycles.

To carry out this emulation, the super-serial multiplier incorporates processors that realize the same mathematical function implemented by the processors of the serial multiplier. Each of these processors also incorporates $3 \lceil k/m \rceil$ bits of storage for the $V$ operand, the multiplication results ($W$), and the coefficients of the irreducible polynomial ($P$). This storage is used to save and restore the state ($v$, $w$, $p$) of the processors they emulate. In addition, the super-serial multiplier incorporates a mechanism for the propagation of results from one cycle to another, as the emulation of a serial multiplier cycle spans multiple cycles.
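The mapping $y = x \bmod m$ and the order in which the serial processors are emulated can be written out explicitly. The sketch below is only an illustration of that schedule; the function names `emulation_schedule` and `cycles_per_multiplication` are hypothetical and do not come from the paper.

```python
def emulation_schedule(k, m):
    """For each pass j, list (ss_index, s_index) pairs.

    Slice SS_y emulates serial processor S_x with y = x mod m; pass j covers
    S_{jm} .. S_{jm+m-1}, truncated at S_{k-1} in the last pass.
    """
    passes = -(-k // m)                     # ceil(k / m)
    return [[(x % m, x) for x in range(j * m, min((j + 1) * m, k))]
            for j in range(passes)]

def cycles_per_multiplication(k, m):
    """One pass per cycle, repeated for each of the k bits of U."""
    return k * len(emulation_schedule(k, m))
```

For instance, with $k = 14$ and $m = 5$ (a value of $k$ chosen here so that $S_{k-1}$ lands on $SS_3$, as in Figure 2), the last pass is `[(0, 10), (1, 11), (2, 12), (3, 13)]` and a multiplication takes $14 \cdot 3 = 42$ cycles. For $k = 167$ and $m = 11$ the same function gives $167 \cdot 16 = 2672$ cycles.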
When a processor of the super-serial multiplier emulates a processor of the serial multiplier, it restores the state of the emulated processor, computes the next partial result, and then saves the new state. For these processors, the first multiplication cycle is a special one. This cycle initializes $W$ with the product $u_{k-1} V(x)$ (the first product of Equation (2)). The processors support this cycle with a special reset emulation circuit that forces their output to zero when active (shown in Figure 3 with multiplexers).

The super-serial multiplier incorporates a data transfer mechanism from processor $SS_{m-1}$ to processor $SS_0$. This mechanism is used to emulate the connections between processors $S_{m-1}$ and $S_m$, $S_{2m-1}$ and $S_{2m}$, ..., $S_{m \lfloor k/m \rfloor - 1}$ and $S_{m \lfloor k/m \rfloor}$ of the serial multiplier. Although not entirely obvious, this mechanism incorporates $\lceil k/m \rceil - 1$ bits of storage because the results from the emulated processors $S_{m-1}$, $S_{2m-1}$, ..., $S_{m \lfloor k/m \rfloor - 1}$ are not used immediately in the following cycle. These results are used in the computation of the next partial result $W'^{(i)}$, not the current one. This detail is easier to grasp by noticing that at the beginning of a multiplication all the $w$ registers of the serial multiplier are reset, and thus all the processors receive a zero from their neighboring processors. The result of this cycle is the initialization of the multiplier with the product $u_{k-1} V(x)$. This partial result is then used in the following multiplication cycle.

This multiplier also propagates the partial results of the emulated $S_{k-1}$ processor. This result is latched so that it is available throughout the $\lceil k/m \rceil$ cycles in which it is used by various processors. As an example, in Figure 3 we assume that processor $S_{k-1}$ (the last processor of the serial multiplier) is realized by slice $m-2$, so that the register for the result of $S_{k-1}$ is placed after slice $m-2$.
In the example shown in Figure 2, this result is propagated to processor $SS_0$, where it is used in the first cycle, and to processor $SS_1$, where it is used in the second cycle.

It should be pointed out that the serial multiplier requires the loading of the $V$ operand and one bit of the $U$ operand before it can start computing a product. The multiplication result also becomes available all at once on the last clock cycle. For most practical implementations, the loading and unloading of data must span multiple clock cycles because the data is typically carried over busses that are much narrower than the field elements. Field elements are commonly more than 150 bits wide in public-key applications. Therefore, these multipliers must idle while I/O takes place or must store the results in temporary registers. Implementations of the super-serial multiplier can overcome this limitation by using a number of processors equal to the width of the interfacing busses. These multipliers can start computing a product as soon as the $m$ least significant bits of the $V$ operand and the most significant bit of the $U$ operand are available. They can also make their results available piecewise, in successive groups of $m$ bits, starting with the least significant ones and ending with the most significant ones.

3.3 Architecture

The architecture of the super-serial multiplier, shown in Figure 3, is similar to that of the serial multiplier in its processor architecture, processor communications, and storage of the $U$ operand, but it is substantially more complex in its control structure. Whereas the serial multiplier requires minimal control, the super-serial multiplier requires a more sophisticated controller that guides the emulation of the processors of a serial multiplier and controls the transfer of partial results from one intermediate cycle to another.

Because the architecture of the super-serial multiplier is memory independent, it can be efficiently implemented in FPGAs that implement either centralized or distributed memory. Its partially distributed and partially centralized interconnect network is also well suited for FPGA implementations. Its distributed interconnects are analogous to those of the serial multiplier, and its centralized ones are very regular and in practice link a moderate number of processors.

Using the FPGA metric previously noted, we estimated the logic complexity of this multiplier for configurations that implement programmable $P(x)$ polynomials and those that implement fixed polynomials. These estimates are summarized in Table 1.
Note that these estimates exclude the complexity of the multiplier's controller, as it is tightly coupled to the FPGA technology used and to the system details. As a reference, the complexity of the controller used in the implementation documented in Subsection 4.2 is 64 function generators and 26 flip-flops. As for the serial multiplier, these estimates represent the complexity of the $U$ operand shift register with $c$.

The super-serial multiplier core uses mainly function generators and memory elements. It uses only one flip-flop, which stores the result from the emulated processor $S_{k-1}$. Its controller consumes the bulk of the flip-flops along with some function generators. In general, the complexity in terms of function generators is proportional to $m$. For fixed polynomial implementations, it also depends on $p'$, the number of non-overlapping sets of non-zero coefficients of $P(x)$. These non-overlapping sets need to be supported with independent hardware. For example, the multiplier shown in Figure 2 requires the propagation of the result from the emulated processor $S_{k-1}$ to the emulated processors $S_0$ and $S_6$, which requires the propagation of this result to two different processors, $SS_0$ and $SS_1$. For this case $p' = 2$. If this result is used by a single processor, then $p'$ equals one. This would have been the case in Figure 2 if the result were propagated to the emulated processor $S_5$ instead of $S_6$.

The multiplier's memory complexity depends on both $k$ and $m$. The term $k$ corresponds to the $U$ operand memory and the terms proportional to $m$ correspond to the processors' memory. Unless $m$ is large, accurate estimates should not ignore the complexity of the controller and the $U$ operand shift register.

Table 1: Multipliers' estimated complexity

  Mult.   U(1)  P(x)(2)  #FG(3,4,5)       #FF(3)   MMb(5,6)
  SM      FF    P        2k               4k       0
  SM      FF    F        k + p            3k       0
  SM      MM    P        2k + c           3k + c   k
  SM      MM    F        k + p + c        2k + c   k
  SSM(7)  MM    P        2m + 1 + c       c + 1    k + (3m + 1) * ceil(k/m)
  SSM(7)  MM    F        m + 1 + p' + c   c + 1    k + (2m + 1 + p') * ceil(k/m)

  (1) U: shift register implemented with memory (MM) or flip-flops (FF)
  (2) P(x): programmable polynomial (P) or fixed polynomial (F)
  (3) c refers to the U shift register logic
  (4) p = number of non-zero coefficients of P(x) minus 1
  (5) p' = number of slices requiring the w_{k-1} input
  (6) MMb: storage bits
  (7) Excludes the complexity of its controller.

Figure 3: Super-serial multiplier (slices $0$ to $m-1$ with $V$, $W$, and $P$ memories addressed by "proc addr" and with reset multiplexers; the $U$ operand memory addressed by "u addr"; the $W$ path memory addressed by "path addr"; the result register; and the controller. Legend: single port memory, dual port memory, register.)

4 Proof-of-Concept Implementation

4.1 Rationale and Parameter Choice

The theoretical concepts were verified through implementations of a serial and a super-serial multiplier in modern FPGAs. To assure practical relevance, we designed a multiplier for the finite field $GF(2^{167})$ with the field polynomial $P(x) = x^{167} + x^{6} + 1$ over $GF(2)$. This field is highly interesting as an underlying algebraic structure for public-key cryptosystems based on elliptic curves [13]. Elliptic curves form the most recent family of public-key algorithms with practical relevance. Due to their small field orders (as opposed to RSA or schemes based on the discrete logarithm problem in finite fields), they are very attractive for implementation in reconfigurable logic.

We concentrated on compact implementations and thus minimized the I/O and the control logic and implemented the $U$ operand shift register using memory elements. We also implemented a common coprocessor interface for both multipliers consisting of an 11-bit data bus, a 6-bit address bus, a device select control signal, and an interrupt signal that signals the end of a multiplication.

The selection of an 11-bit data bus resulted in an area-efficient implementation for $GF(2^{167})$. It led to a high utilization of the memory elements available in the FPGAs used and minimized the I/O complexity. Since for these FPGAs the function generators can be configured as 16x1-bit memories, very efficient implementations are achieved when $k/m$ approximates a multiple of sixteen. For our implementation, this figure is $167/11 \approx 15.2$, which represents a high utilization of memory elements.

4.2 FPGA Implementations

The prototype implementations were done using Xilinx XC4000X FPGAs of speed grade -09. These devices incorporate distributed synchronous single and dual port memories that can be used in place of 4-input function generators.
For these parts the logic complexity of a synchronous single port 16x1-bit memory element is the same as that of a 4-input function generator. This FPGA family also defines a number of parts with different logic densities, some of which were used to verify the scalability of both architectures with respect to FPGA resources.

To achieve fast and area-efficient implementations, we used logic generated by Xilinx's LogiBLOX version M1.5.19, which we incorporated into our VHDL code. For design compilation and synthesis, we used Synopsys' FPGA Analyzer and Xilinx's Design Manager.

Our proof-of-concept approach focused on prototyping practical multipliers with common interfaces. To realize these prototypes, logic was added to the basic multipliers described in Sections 2.2 and 3. The additions consisted of interface circuitry and, for the super-serial multiplier, the use of pipelined processors. The processors were pipelined by adding registers at the outputs of the $U$ operand shift register and at the outputs of the $v$ and the $w$ memory elements. Minor modifications to the controller were also necessary. Pipelining was used because it was possible at practically no logic cost. As Table 1 shows, the super-serial multiplier uses far more function generators than flip-flops, which could otherwise result in a large number of wasted flip-flops due to the unavailability of routing resources.

For the I/O interface, we used an 11-bit data bus. This bus width was chosen because it matched the number of processors in the implementation of the super-serial multiplier ($m = 11$). This implementation achieved a high utilization of the 16x1-bit memory elements for the $v$'s and $w$'s. The overall result is a very compact super-serial multiplier for the field $GF(2^{167})$. This parameter selection did not affect the overall area of the serial multiplier, as the complexity of its core circuitry is much larger than the complexity of its interface circuitry.
The implementation results are summarized in Tables 2 and 3. Table 2 summarizes the logic complexity. To facilitate comparisons, this table also includes normalized results that use the complexity of the super-serial multiplier as a reference. Table 3 summarizes the operational frequency and the percentage of FPGA resources used by each implementation for different devices.
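The parameter choice $k = 167$, $m = 11$ discussed in Sections 4.1 and 4.2 can be checked with a few lines of arithmetic. The sketch below is an illustration only; the function name `ssm_memory_fit` is hypothetical, and the utilization figure is defined here (as an assumption, not as the paper's metric) as the fraction of bits in the 16x1-bit memories of one stored operand that actually hold coefficients.

```python
def ssm_memory_fit(k, m, depth=16):
    """Cycle count and memory fit for m slices backed by depth x 1-bit memories."""
    passes = -(-k // m)                     # ceil(k / m): coefficients stored per slice
    blocks = -(-passes // depth)            # depth x 1 memories per operand per slice
    capacity = m * blocks * depth           # bits available for one operand
    return {
        "passes": passes,
        "cycles": k * passes,               # k * ceil(k / m) cycles per multiplication
        "blocks_per_slice": blocks,
        "utilization": k / capacity,        # assumed definition, see lead-in
    }
```

For $k = 167$ and $m = 11$ this gives 16 passes, 2672 cycles per multiplication, one 16x1-bit memory per operand per slice, and a utilization of $167/176$, about 0.95. Under the same assumed definition, a choice such as $m = 8$ would need two memories per operand per slice because $\lceil 167/8 \rceil = 21 > 16$.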

Table 2: Area (or logic) results for $GF(2^{167})$. Columns: Mult.; #FG (abs, norm); #FF (abs, norm); #CLB(8) (abs, norm). Rows: SSM ($m = 11$); SM.

Table 3: Timings and resource usage for $GF(2^{167})$. Columns: Mult.; FPGA; % FG; % FF; % CLB; Freq. (MHz). Rows: SSM ($m = 11$), three XC devices; SM(9), two XC devices.

5 Elliptic Curve Performance

As described above, it is very interesting to implement elliptic curve cryptosystems on reconfigurable logic. For that reason, we extrapolated the performance of these multipliers for an elliptic curve cryptosystem. Our timing estimates, summarized in Table 4, are based on the computation of elliptic curve point multiplication using projective coordinates as documented in [10]. We considered projective coordinates because they eliminate field inversions from the computation of elliptic curve point multiplications at the cost of additional field multiplications [13].

The projected results assume the use of the double-and-add algorithm, which on average requires $k - 1$ point doubles and $(k - 1)/2$ point additions per point multiplication. In these projective coordinates, the computation of a point double requires 12 field multiplications and the computation of a point addition requires 14 field multiplications, leading to the number of clock cycles shown in Table 4. Note that the results in Table 5 ignore the transformation from projective to affine coordinates. This transformation requires an inversion and two field multiplications. These results also ignore additions, which are realized by a single bitwise XOR operation and are thus very inexpensive compared to multiplications.

The expressions in Table 4 were used along with the performance numbers recorded in Table 3 to estimate the time to perform a $GF(2^k)$ multiplication and an elliptic curve point multiplication. These results are summarized in Table 5.

Table 4: Number of clock cycles for elliptic curve point multiplication

  SM    19k(k-1)
  SSM   19k(k-1) * ceil(k/m)

Table 5: Estimated timing for $GF(2^{167})$ field and elliptic curve point multiplication. Columns: Mult.; FPGA; GF Mult. (µsec); Elliptic Curve Point Mult. (msec). Rows: SSM ($m = 11$), three XC devices; SM(9), two XC devices.
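The factor 19 in Table 4 follows from the operation counts stated above: $12(k-1) + 14 \cdot (k-1)/2 = 19(k-1)$ field multiplications per point multiplication, each costing $k$ cycles on the SM and $k \lceil k/m \rceil$ cycles on the SSM. The Python sketch below reproduces this arithmetic; the function names are hypothetical, and the clock frequency passed to `estimated_time_ms` is a caller-supplied placeholder, not a measured value from Table 3.

```python
from fractions import Fraction

def field_mults_per_point_mult(k, double_cost=12, add_cost=14):
    """Average field multiplications for double-and-add in projective coordinates."""
    doubles = k - 1
    adds = Fraction(k - 1, 2)               # on average (k - 1) / 2 point additions
    total = doubles * double_cost + adds * add_cost
    assert total.denominator == 1           # 12(k-1) + 7(k-1) is always an integer
    return int(total)                       # equals 19 * (k - 1) for costs 12 and 14

def cycles_sm(k):
    """Serial multiplier: k cycles per field multiplication."""
    return k * field_mults_per_point_mult(k)

def cycles_ssm(k, m):
    """Super-serial multiplier: k * ceil(k/m) cycles per field multiplication."""
    return k * (-(-k // m)) * field_mults_per_point_mult(k)

def estimated_time_ms(cycles, freq_mhz):
    """Convert a cycle count to milliseconds for a given clock frequency in MHz."""
    return cycles / (freq_mhz * 1e3)
```

For $k = 167$ this gives $19 \cdot 166 = 3154$ field multiplications, 526,718 cycles for the SM, and 8,427,488 cycles for the SSM with $m = 11$. As in the text, the conversion to affine coordinates and the field additions are not counted.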
6 Conclusions

In this work we introduced a super-serial multiplier architecture which is, to our knowledge, a new $GF(2^k)$ multiplier. This architecture is particularly attractive for FPGA implementations because of its regularity; its mainly distributed network with few global interconnects; its simple processors, which are efficiently implementable with function generators, flip-flops, and memory; and, to a large extent, its fine-grained time and area scalability. This latter point is of great practical importance, as it allows implementations to balance performance and area over a wide range, which could translate into cost reductions and product growth.

We verified the theoretical concepts with proof-of-concept implementations, which demonstrated that substantial area savings are achievable for multipliers suitable for secure elliptic curve cryptosystems. More precisely, our implementation of the serial multiplier uses 2.76 times more function generators, 6.84 times more flip-flops, and 5.78 times more CLBs than our implementation of the super-serial multiplier. Our results also demonstrate the scalability of both serial and super-serial multipliers with respect to FPGA resources, as their overall performance remained virtually constant across parts with different logic densities.

(8) Configurable logic blocks consisting of one 3-input and two 4-input function generators, and two flip-flops.
(9) Design did not fit in the XC4005 part.

References

[1] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography. CRC Press.
[2] N. Koblitz, "Elliptic curve cryptosystems," Mathematics of Computation, vol. 48, pp. 203-209.
[3] V. Miller, "Uses of elliptic curves in cryptography," in Advances in Cryptology CRYPTO '85, pp. 417-426, Springer-Verlag.
[4] N. Koblitz, "Hyperelliptic cryptosystems," Journal of Cryptology, vol. 1, no. 3, pp. 129-150.
[5] E. Mastrovito, VLSI Architectures for Computation in Galois Fields. PhD thesis, Linkoping University, Dept. Electr. Eng., Linkoping, Sweden.
[6] C. Paar and P. S. Rodriguez, "Fast Arithmetic Architectures for Public-Key Algorithms over Galois Fields GF((2^n)^m)," in Advances in Cryptology EUROCRYPT '97, pp. 363-378, Springer-Verlag, LNCS.
[7] L. Song and K. Parhi, "Efficient finite field serial/parallel multiplication," in Proc. Int. Conf. Application Specific System Architectures and Processors, pp. 72-82, Chicago, IL, August.
[8] A. Klindworth, "FPLD-Implementation of computations over finite fields GF(2^m) with applications to error control coding," in 5th Intern. Workshop on Field-Programmable Logic and Applications, (Oxford, UK), pp. 261-271, LNCS 975, Springer-Verlag, September.
[9] C. Paar and M. Rosner, "Comparison of arithmetic architectures for Reed-Solomon decoders in reconfigurable hardware," in Fifth Annual IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM '97, (Napa Valley, USA), April.
[10] M. Rosner, "Elliptic curve cryptosystems on reconfigurable hardware," Master's thesis, ECE Dept., Worcester Polytechnic Institute, Worcester, USA, May.
[11] T. Beth and D. Gollmann, "Algorithm engineering for public key algorithms," IEEE Journal on Selected Areas in Communications, vol. 7, no. 4, pp. 458-466, 1989.
[12] S. Lin and D. Costello, Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall.
[13] A. Menezes, Elliptic Curve Public Key Cryptosystems. Kluwer Academic Publishers.


More information

Optimal Finite Field Multipliers for FPGAs

Optimal Finite Field Multipliers for FPGAs Optimal Finite Field Multipliers for FPGAs Captain Gregory C. Ahlquist, Brent Nelson, and Michael Rice 59 Clyde Building, Brigham Young University, Provo UT 8602 USA ahlquist@ee.byu.edu, nelson@ee.byu.edu,

More information

Serial-Out Bit-level Mastrovito Multipliers for High Speed Hybrid-Double Multiplication Architectures

Serial-Out Bit-level Mastrovito Multipliers for High Speed Hybrid-Double Multiplication Architectures Serial-Out Bit-level Mastrovito Multipliers for High Speed Hybrid-Double Multiplication Architectures Mrs Ramya K 1, Anupama H M 2, Anusha M K 3, Manjunath K N 4, Kishore Kumar 5 1 2,3,4,5 -----------------------------------------------------------------------------***---------------------------------------------------------------------------------

More information

WORD LEVEL FINITE FIELD MULTIPLIERS USING NORMAL BASIS

WORD LEVEL FINITE FIELD MULTIPLIERS USING NORMAL BASIS WORD LEVEL FINITE FIELD MULTIPLIERS USING NORMAL BASIS 1 B.SARGUNAM, 2 Dr.R.DHANASEKARAN 1 Assistant Professor, Department of ECE, Avinashilingam University, Coimbatore 2 Professor & Director-Research,

More information

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier Design and Implementation of VLSI 8 Bit Systolic Array Multiplier Khumanthem Devjit Singh, K. Jyothi MTech student (VLSI & ES), GIET, Rajahmundry, AP, India Associate Professor, Dept. of ECE, GIET, Rajahmundry,

More information

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields Santosh Ghosh, Dipanwita Roy Chowdhury, and Abhijit Das Computer Science and Engineering

More information

Daniel V. Bailey 1 and Christof Paar 2. 1 Computer Science Department, Worcester Polytechnic Institute, Worcester, MA.

Daniel V. Bailey 1 and Christof Paar 2. 1 Computer Science Department, Worcester Polytechnic Institute, Worcester, MA. Optimal Extension Fields for Fast Arithmetic in Public-Key Algorithms Daniel V. Bailey 1 and Christof Paar 2 1 Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609 USA. Email:

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VI /Issue 3 / JUNE 2016

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VI /Issue 3 / JUNE 2016 VLSI DESIGN OF HIGH THROUGHPUT FINITE FIELD MULTIPLIER USING REDUNDANT BASIS TECHNIQUE YANATI.BHARGAVI, A.ANASUYAMMA Department of Electronics and communication Engineering Audisankara College of Engineering

More information

A New Architecture of High Performance WG Stream Cipher

A New Architecture of High Performance WG Stream Cipher A New Architecture of High Performance WG Stream Cipher Grace Mary S. 1, Abhila R. Krishna 2 1 P G Scholar, VLSI and Embedded Systems, Department of ECE T K M Institute of Technology, Kollam, India 2 Assistant

More information

HIGH-THROUGHPUT FINITE FIELD MULTIPLIERS USING REDUNDANT BASIS FOR FPGA AND ASIC IMPLEMENTATIONS

HIGH-THROUGHPUT FINITE FIELD MULTIPLIERS USING REDUNDANT BASIS FOR FPGA AND ASIC IMPLEMENTATIONS HIGH-THROUGHPUT FINITE FIELD MULTIPLIERS USING REDUNDANT BASIS FOR FPGA AND ASIC IMPLEMENTATIONS Shaik.Sooraj, Jabeena shaik,m.tech Department of Electronics and communication Engineering, Quba College

More information

Digit-Level Semi-Systolic and Systolic Structures for the Shifted Polynomial Basis Multiplication Over Binary Extension Fields

Digit-Level Semi-Systolic and Systolic Structures for the Shifted Polynomial Basis Multiplication Over Binary Extension Fields IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 11, NOVEMBER 211 2125 [1] B. Calhoun and A. Chandrakasan, Static noise margin variation for sub-threshold SRAM in 65-nm CMOS,

More information

A High-Speed FPGA Implementation of an RSD-Based ECC Processor

A High-Speed FPGA Implementation of an RSD-Based ECC Processor RESEARCH ARTICLE International Journal of Engineering and Techniques - Volume 4 Issue 1, Jan Feb 2018 A High-Speed FPGA Implementation of an RSD-Based ECC Processor 1 K Durga Prasad, 2 M.Suresh kumar 1

More information

An Efficient Pipelined Multiplicative Inverse Architecture for the AES Cryptosystem

An Efficient Pipelined Multiplicative Inverse Architecture for the AES Cryptosystem An Efficient Pipelined Multiplicative Inverse Architecture for the AES Cryptosystem Mostafa Abd-El-Barr and Amro Khattab Abstract In this paper, we introduce an architecture for performing a recursive

More information

ISSN Vol.08,Issue.12, September-2016, Pages:

ISSN Vol.08,Issue.12, September-2016, Pages: ISSN 2348 2370 Vol.08,Issue.12, September-2016, Pages:2273-2277 www.ijatir.org G. DIVYA JYOTHI REDDY 1, V. ROOPA REDDY 2 1 PG Scholar, Dept of ECE, TKR Engineering College, Hyderabad, TS, India, E-mail:

More information

Implementation of Galois Field Arithmetic Unit on FPGA

Implementation of Galois Field Arithmetic Unit on FPGA Implementation of Galois Field Arithmetic Unit on FPGA 1 LakhendraKumar, 2 Dr. K. L. Sudha 1 B.E project scholar, VIII SEM, Dept. of E&C, DSCE, Bangalore, India 2 Professor, Dept. of E&C, DSCE, Bangalore,

More information

Implementation and Analysis of an Error Detection and Correction System on FPGA

Implementation and Analysis of an Error Detection and Correction System on FPGA Implementation and Analysis of an Error Detection and Correction System on FPGA Constantin Anton, Laurenţiu Mihai Ionescu, Ion Tutănescu, Alin Mazăre, Gheorghe Şerban University of Piteşti, Romania Abstract

More information

ARITHMETIC operations based on residue number systems

ARITHMETIC operations based on residue number systems IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133 Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues A. B. Premkumar, Senior Member,

More information

PARALLEL ANALYSIS OF THE RIJNDAEL BLOCK CIPHER

PARALLEL ANALYSIS OF THE RIJNDAEL BLOCK CIPHER PARALLEL ANALYSIS OF THE RIJNDAEL BLOCK CIPHER Philip Brisk, Adam Kaplan, Majid Sarrafzadeh Computer Science Department, University of California Los Angeles 3532C Boelter Hall, Los Angeles, CA 90095-1596

More information

Channel Coding and Cryptography Part II: Introduction to Cryptography

Channel Coding and Cryptography Part II: Introduction to Cryptography Channel Coding and Cryptography Part II: Introduction to Cryptography Prof. Dr.-Ing. habil. Andreas Ahrens Communications Signal Processing Group, University of Technology, Business and Design Email: andreas.ahrens@hs-wismar.de

More information

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier U.V.N.S.Suhitha Student Department of ECE, BVC College of Engineering, AP, India. Abstract: The ever growing need for improved

More information

Dual-Field Arithmetic Unit for GF (p) and GF (2 m )

Dual-Field Arithmetic Unit for GF (p) and GF (2 m ) Dual-Field Arithmetic Unit for GF (p) and GF (2 m ) Johannes Wolkerstorfer Institute for Applied Information Processing and Communications, Graz University of Technology, Inffeldgasse 16a, 8010 Graz, Austria

More information

Elliptic Curves over Prime and Binary Fields in Cryptography

Elliptic Curves over Prime and Binary Fields in Cryptography Elliptic Curves over Prime and Binary Fields in Cryptography Authors Dana Neustadter (danan@ellipticsemi.com) Tom St Denis (tstdenis@ellipticsemi.com) Copyright 2008 Elliptic Semiconductor Inc. Elliptic

More information

A Scalable and High Performance Elliptic Curve Processor with Resistance to Timing Attacks

A Scalable and High Performance Elliptic Curve Processor with Resistance to Timing Attacks A Scalable and High Performance Elliptic Curve Processor with Resistance to Timing Attacks Alireza Hodjat, David D. Hwang, Ingrid Verbauwhede, University of California, Los Angeles Katholieke Universiteit

More information

FPGA BASED CRYPTOGRAPHY FOR INTERNET SECURITY

FPGA BASED CRYPTOGRAPHY FOR INTERNET SECURITY Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 10, October 2015,

More information

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices 3 Digital Systems Implementation Programmable Logic Devices Basic FPGA Architectures Why Programmable Logic Devices (PLDs)? Low cost, low risk way of implementing digital circuits as application specific

More information

Fast Multiplication on Elliptic Curves over GF (2 m ) without Precomputation

Fast Multiplication on Elliptic Curves over GF (2 m ) without Precomputation Fast Multiplication on Elliptic Curves over GF (2 m ) without Precomputation Julio López 1 and Ricardo Dahab 2 1 Department of Combinatorics & Optimization University of Waterloo, Waterloo, Ontario N2L

More information

DESIGNING OF STREAM CIPHER ARCHITECTURE USING THE CELLULAR AUTOMATA

DESIGNING OF STREAM CIPHER ARCHITECTURE USING THE CELLULAR AUTOMATA DESIGNING OF STREAM CIPHER ARCHITECTURE USING THE CELLULAR AUTOMATA 1 Brundha K A MTech Email: 1 brundha1905@gmail.com Abstract Pseudo-random number generators (PRNGs) are a key component of stream ciphers

More information

Final Project Report: Cryptoprocessor for Elliptic Curve Digital Signature Algorithm (ECDSA)

Final Project Report: Cryptoprocessor for Elliptic Curve Digital Signature Algorithm (ECDSA) Final Project Report: Cryptoprocessor for Elliptic Curve Digital Signature Algorithm (ECDSA) Team ID: IN00000026 Team member: Kimmo Järvinen tel. +358-9-4512429, email. kimmo.jarvinen@tkk.fi Instructor:

More information

Using Multiple FPGA Architectures for Real-time Processing of Low-level Machine Vision Functions

Using Multiple FPGA Architectures for Real-time Processing of Low-level Machine Vision Functions Using Multiple FPGA Architectures for Real-time Processing of Low-level Machine Vision Functions Thomas H. Drayer, William E. King IV, Joeseph G. Tront, Richard W. Conners Philip A. Araman Bradley Department

More information

FPGA Accelerated Tate Pairing Cryptosystems over Binary Fields

FPGA Accelerated Tate Pairing Cryptosystems over Binary Fields FPGA Accelerated ate Pairing Cryptosystems over Binary Fields Chang Shu, Soonhak Kwon, and Kris Gaj Dept. of ECE, George Mason University Fairfax VA, USA Dept. of Mathematics, Sungkyukwan University Suwon,

More information

Low-Power Elliptic Curve Cryptography Using Scaled Modular Arithmetic

Low-Power Elliptic Curve Cryptography Using Scaled Modular Arithmetic Low-Power Elliptic Curve Cryptography Using Scaled Modular Arithmetic E. Öztürk1, B. Sunar 1, and E. Savaş 2 1 Department of Electrical & Computer Engineering, Worcester Polytechnic Institute, Worcester

More information

This is a repository copy of High Speed and Low Latency ECC Implementation over GF(2m) on FPGA.

This is a repository copy of High Speed and Low Latency ECC Implementation over GF(2m) on FPGA. This is a repository copy of High Speed and Low Latency ECC Implementation over GF(2m) on FPGA. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/99476/ Version: Accepted Version

More information

VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH THE EFFICIENT MULTIPLICATIVE INVERSE UNIT

VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH THE EFFICIENT MULTIPLICATIVE INVERSE UNIT VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH THE EFFICIENT MULTIPLICATIVE INVERSE UNIT K.Sandyarani 1 and P. Nirmal Kumar 2 1 Research Scholar, Department of ECE, Sathyabama

More information

Towards an FPGA Architecture Optimized for Public-Key Algorithms

Towards an FPGA Architecture Optimized for Public-Key Algorithms Towards an FPGA Architecture Optimized for Public-Key Algorithms AJ Elbirt *, C Paar ** Cryptography and Information Security Laboratory, Worcester, MA 01609 Electrical and Computer Engineering epartment,

More information

Algorithms and arithmetic for the implementation of cryptographic pairings

Algorithms and arithmetic for the implementation of cryptographic pairings Cairn seminar November 29th, 2013 Algorithms and arithmetic for the implementation of cryptographic pairings Nicolas Estibals CAIRN project-team, IRISA Nicolas.Estibals@irisa.fr What is an elliptic curve?

More information

Design Space Exploration of the Lightweight Stream Cipher WG-8 for FPGAs and ASICs

Design Space Exploration of the Lightweight Stream Cipher WG-8 for FPGAs and ASICs Design Space Exploration of the Lightweight Stream Cipher WG- for FPGAs and ASICs Gangqiang Yang, Xinxin Fan, Mark Aagaard and Guang Gong University of Waterloo g37yang@uwaterloo.ca Sept 9, 013 Gangqiang

More information

Bipartite Modular Multiplication

Bipartite Modular Multiplication Bipartite Modular Multiplication Marcelo E. Kaihara and Naofumi Takagi Department of Information Engineering, Nagoya University, Nagoya, 464-8603, Japan {mkaihara, ntakagi}@takagi.nuie.nagoya-u.ac.jp Abstract.

More information

Design and Implementation of Low-Complexity Redundant Multiplier Architecture for Finite Field

Design and Implementation of Low-Complexity Redundant Multiplier Architecture for Finite Field Design and Implementation of Low-Complexity Redundant Multiplier Architecture for Finite Field Veerraju kaki Electronics and Communication Engineering, India Abstract- In the present work, a low-complexity

More information

Field Programmable Gate Array (FPGA)

Field Programmable Gate Array (FPGA) Field Programmable Gate Array (FPGA) Lecturer: Krébesz, Tamas 1 FPGA in general Reprogrammable Si chip Invented in 1985 by Ross Freeman (Xilinx inc.) Combines the advantages of ASIC and uc-based systems

More information

Low area implementation of AES ECB on FPGA

Low area implementation of AES ECB on FPGA Total AddRoundkey_3 MixCollumns AddRoundkey_ ShiftRows SubBytes 1 Low area implementation of AES ECB on FPGA Abstract This project aimed to create a low area implementation of the Rajindael cipher (AES)

More information

Built-In Self-Test for Programmable I/O Buffers in FPGAs and SoCs

Built-In Self-Test for Programmable I/O Buffers in FPGAs and SoCs Built-In Self-Test for Programmable I/O Buffers in FPGAs and SoCs Sudheer Vemula, Student Member, IEEE, and Charles Stroud, Fellow, IEEE Abstract The first Built-In Self-Test (BIST) approach for the programmable

More information

The Xilinx XC6200 chip, the software tools and the board development tools

The Xilinx XC6200 chip, the software tools and the board development tools The Xilinx XC6200 chip, the software tools and the board development tools What is an FPGA? Field Programmable Gate Array Fully programmable alternative to a customized chip Used to implement functions

More information

Coupon Recalculation for the GPS Authentication Scheme

Coupon Recalculation for the GPS Authentication Scheme Coupon Recalculation for the GPS Authentication Scheme Georg Hofferek and Johannes Wolkerstorfer Institute for Applied Information Processing and Communications (IAIK), Graz University of Technology, Inffeldgasse

More information

Prime Field over Elliptic Curve Cryptography for Secured Message Transaction

Prime Field over Elliptic Curve Cryptography for Secured Message Transaction Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

INTRODUCTION TO FPGA ARCHITECTURE

INTRODUCTION TO FPGA ARCHITECTURE 3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)

More information

Bus Matrix Synthesis Based On Steiner Graphs for Power Efficient System on Chip Communications

Bus Matrix Synthesis Based On Steiner Graphs for Power Efficient System on Chip Communications Bus Matrix Synthesis Based On Steiner Graphs for Power Efficient System on Chip Communications M.Jasmin Assistant Professor, Department Of ECE, Bharath University, Chennai,India ABSTRACT: Power consumption

More information

High-Performance Integer Factoring with Reconfigurable Devices

High-Performance Integer Factoring with Reconfigurable Devices FPL 2010, Milan, August 31st September 2nd, 2010 High-Performance Integer Factoring with Reconfigurable Devices Ralf Zimmermann, Tim Güneysu, Christof Paar Horst Görtz Institute for IT-Security Ruhr-University

More information

Elliptic Curve Cryptography

Elliptic Curve Cryptography Elliptic Curve Cryptography Dimitri Dimoulakis, Steve Jones, and Lee Haughton May 05 2000 Abstract. Elliptic curves can provide methods of encryption that, in some cases, are faster and use smaller keys

More information

Design of Flash Controller for Single Level Cell NAND Flash Memory

Design of Flash Controller for Single Level Cell NAND Flash Memory Design of Flash Controller for Single Level Cell NAND Flash Memory Ashwin Bijoor 1, Sudharshana 2 P.G Student, Department of Electronics and Communication, NMAMIT, Nitte, Karnataka, India 1 Assistant Professor,

More information

Elliptic Curve Cryptography on a Palm OS Device

Elliptic Curve Cryptography on a Palm OS Device Elliptic Curve Cryptography on a Palm OS Device André Weimerskirch 1, Christof Paar 2, and Sheueling Chang Shantz 3 1 CS Department, Worcester Polytechnic Institute, USA weika@wpi.edu 2 ECE and CS Department,

More information

Low-Power FIR Digital Filters Using Residue Arithmetic

Low-Power FIR Digital Filters Using Residue Arithmetic Low-Power FIR Digital Filters Using Residue Arithmetic William L. Freking and Keshab K. Parhi Department of Electrical and Computer Engineering University of Minnesota 200 Union St. S.E. Minneapolis, MN

More information

Reduction of Latency and Resource Usage in Bit-Level Pipelined Data Paths for FPGAs

Reduction of Latency and Resource Usage in Bit-Level Pipelined Data Paths for FPGAs Reduction of Latency and Resource Usage in Bit-Level Pipelined Data Paths for FPGAs P. Kollig B. M. Al-Hashimi School of Engineering and Advanced echnology Staffordshire University Beaconside, Stafford

More information

Multifunction Residue Architectures for Cryptography 1

Multifunction Residue Architectures for Cryptography 1 Multifunction Residue Architectures for Cryptography 1 LAXMI TRIVENI.D, M.TECH., EMBEDDED SYSTEMS & VLSI 2 P.V.VARAPRASAD,RAO ASSOCIATE PROFESSOR., SLC S INSTITUTE OF ENGINEERING AND TECHNOLOGY Abstract

More information

Advanced Encryption Standard Implementation on Field Programmable Gate Arrays. Maryam Behrouzinekoo. B.Eng., University of Guilan, 2011

Advanced Encryption Standard Implementation on Field Programmable Gate Arrays. Maryam Behrouzinekoo. B.Eng., University of Guilan, 2011 Advanced Encryption Standard Implementation on Field Programmable Gate Arrays by Maryam Behrouzinekoo B.Eng., University of Guilan, 2011 A Report Submitted in Partial Fulfillment of the Requirements for

More information

Faster Interleaved Modular Multiplier Based on Sign Detection

Faster Interleaved Modular Multiplier Based on Sign Detection Faster Interleaved Modular Multiplier Based on Sign Detection Mohamed A. Nassar, and Layla A. A. El-Sayed Department of Computer and Systems Engineering, Alexandria University, Alexandria, Egypt eng.mohamedatif@gmail.com,

More information

High Performance Architecture for Elliptic. Curve Scalar Multiplication over GF(2 m )

High Performance Architecture for Elliptic. Curve Scalar Multiplication over GF(2 m ) High Performance Architecture for Elliptic 1 Curve Scalar Multiplication over GF(2 m ) Junjie Jiang, Jing Chen, Jian Wang, Duncan S. Wong, and Xiaotie Deng Abstract We propose a new architecture for performing

More information

Advanced WG and MOWG Stream Cipher with Secured Initial vector

Advanced WG and MOWG Stream Cipher with Secured Initial vector International Journal of Scientific and Research Publications, Volume 5, Issue 12, December 2015 471 Advanced WG and MOWG Stream Cipher with Secured Initial vector Dijomol Alias Pursuing M.Tech in VLSI

More information

An Efficient FPGA Implementation of the Advanced Encryption Standard (AES) Algorithm Using S-Box

An Efficient FPGA Implementation of the Advanced Encryption Standard (AES) Algorithm Using S-Box Volume 5 Issue 2 June 2017 ISSN: 2320-9984 (Online) International Journal of Modern Engineering & Management Research Website: www.ijmemr.org An Efficient FPGA Implementation of the Advanced Encryption

More information

Efficient Hardware Design and Implementation of AES Cryptosystem

Efficient Hardware Design and Implementation of AES Cryptosystem Efficient Hardware Design and Implementation of AES Cryptosystem PRAVIN B. GHEWARI 1 MRS. JAYMALA K. PATIL 1 AMIT B. CHOUGULE 2 1 Department of Electronics & Telecommunication 2 Department of Computer

More information

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE Reiner W. Hartenstein, Rainer Kress, Helmut Reinig University of Kaiserslautern Erwin-Schrödinger-Straße, D-67663 Kaiserslautern, Germany

More information

ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES FACULTY OF TECHNOLOGY

ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES FACULTY OF TECHNOLOGY ADDIS ABABA UNIVERSIY SCHOOL OF GRADUAE SUDIES FACULY OF ECHNOLOGY Hardware Acceleration of Elliptic Curve Based Cryptographic Algorithms: Design and Simulation BY Mubarek Kedir April, 008 ADDIS ABABA

More information

Digital Design with FPGAs. By Neeraj Kulkarni

Digital Design with FPGAs. By Neeraj Kulkarni Digital Design with FPGAs By Neeraj Kulkarni Some Basic Electronics Basic Elements: Gates: And, Or, Nor, Nand, Xor.. Memory elements: Flip Flops, Registers.. Techniques to design a circuit using basic

More information

Parallelized Radix-4 Scalable Montgomery Multipliers

Parallelized Radix-4 Scalable Montgomery Multipliers Parallelized Radix-4 Scalable Montgomery Multipliers Nathaniel Pinckney and David Money Harris 1 1 Harvey Mudd College, 301 Platt. Blvd., Claremont, CA, USA e-mail: npinckney@hmc.edu ABSTRACT This paper

More information

DESIGN AND IMPLEMENTATION OF VLSI SYSTOLIC ARRAY MULTIPLIER FOR DSP APPLICATIONS

DESIGN AND IMPLEMENTATION OF VLSI SYSTOLIC ARRAY MULTIPLIER FOR DSP APPLICATIONS International Journal of Computing Academic Research (IJCAR) ISSN 2305-9184 Volume 2, Number 4 (August 2013), pp. 140-146 MEACSE Publications http://www.meacse.org/ijcar DESIGN AND IMPLEMENTATION OF VLSI

More information

Hardware Implementation of a Montgomery Modular Multiplier in a Systolic Array

Hardware Implementation of a Montgomery Modular Multiplier in a Systolic Array Hardware Implementation of a Montgomery Modular Multiplier in a Systolic Array Sıddıka Berna Örs 1 Lejla Batina 1,2 Bart Preneel 1 Joos Vandewalle 1 1 Katholieke Universiteit Leuven, ESAT/SCD-COSIC Kasteelpark

More information

ECC1 Core. Elliptic Curve Point Multiply and Verify Core. General Description. Key Features. Applications. Symbol

ECC1 Core. Elliptic Curve Point Multiply and Verify Core. General Description. Key Features. Applications. Symbol General Description Key Features Elliptic Curve Cryptography (ECC) is a public-key cryptographic technology that uses the mathematics of so called elliptic curves and it is a part of the Suite B of cryptographic

More information

FPGA Implementation of High Speed AES Algorithm for Improving The System Computing Speed

FPGA Implementation of High Speed AES Algorithm for Improving The System Computing Speed FPGA Implementation of High Speed AES Algorithm for Improving The System Computing Speed Vijaya Kumar. B.1 #1, T. Thammi Reddy.2 #2 #1. Dept of Electronics and Communication, G.P.R.Engineering College,

More information

Hardware/Software Co-Design of Elliptic Curve Cryptography on an 8051 Microcontroller

Hardware/Software Co-Design of Elliptic Curve Cryptography on an 8051 Microcontroller Hardware/Software Co-Design of Elliptic Curve Cryptography on an 8051 Microcontroller Manuel Koschuch, Joachim Lechner, Andreas Weitzer, Johann Großschädl, Alexander Szekely, Stefan Tillich, and Johannes

More information

IMPLEMENTATION OF ELLIPTIC CURVE POINT MULTIPLICATION ALGORITHM USING DSP PROCESSOR 1Prof. Renuka H. Korti, 2Dr. Vijaya C.

IMPLEMENTATION OF ELLIPTIC CURVE POINT MULTIPLICATION ALGORITHM USING DSP PROCESSOR 1Prof. Renuka H. Korti, 2Dr. Vijaya C. ISSN 2320-9194 13 International Journal of Advance Research, IJOAR.org Volume 1, Issue 7, July 2013, Online: ISSN 2320-9194 IMPLEMENTATION OF ELLIPTIC CURVE POINT MULTIPLICATION ALGORITHM USING DSP PROCESSOR

More information

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1 FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1 Anurag Dwivedi Digital Design : Bottom Up Approach Basic Block - Gates Digital Design : Bottom Up Approach Gates -> Flip Flops Digital

More information

FPGA architecture and design technology

FPGA architecture and design technology CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA

More information

Studying Software Implementations of Elliptic Curve Cryptography

Studying Software Implementations of Elliptic Curve Cryptography Studying Software Implementations of Elliptic Curve Cryptography Hai Yan and Zhijie Jerry Shi Department of Computer Science and Engineering, University of Connecticut Abstract Elliptic Curve Cryptography

More information

An Algorithm and Hardware Architecture for Integrated Modular Division and Multiplication in GF (p) and GF (2 n )

An Algorithm and Hardware Architecture for Integrated Modular Division and Multiplication in GF (p) and GF (2 n ) An Algorithm and Hardware Architecture for Integrated Modular Division and Multiplication in GF (p) and GF (2 n ) Lo ai A. Tawalbeh and Alexandre F. Tenca School of Electrical Engineering and Computer

More information

Number Theory and Cryptography

Number Theory and Cryptography Volume 114 No. 11 2017, 211-220 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Number Theory and Cryptography 1 S. Vasundhara 1 G.Narayanamma Institute

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 Advance Encryption Standard (AES) Rijndael algorithm is symmetric block cipher that can process data blocks of 128 bits, using cipher keys with lengths of 128, 192, and 256

More information

Tiny Tate Bilinear Pairing Core Specification. Author: Homer Hsing

Tiny Tate Bilinear Pairing Core Specification. Author: Homer Hsing Tiny Tate Bilinear Pairing Core Specification Author: Homer Hsing homer.hsing@gmail.com Rev. 0.1 May 3, 2012 This page has been intentionally left blank. www.opencores.org Rev 0.1 ii Rev. Date Author Description

More information

Virtual Reconfigurable Circuits for Real-World Applications of Evolvable Hardware

Virtual Reconfigurable Circuits for Real-World Applications of Evolvable Hardware Virtual Reconfigurable Circuits for Real-World Applications of Evolvable Hardware Lukáš Sekanina Faculty of Information Technology, Brno University of Technology Božetěchova 2, 612 66 Brno, Czech Republic

More information

Evaluation of FPGA Resources for Built-In Self-Test of Programmable Logic Blocks

Evaluation of FPGA Resources for Built-In Self-Test of Programmable Logic Blocks Evaluation of FPGA Resources for Built-In Self-Test of Programmable Logic Blocks Charles Stroud, Ping Chen, Srinivasa Konala, Dept. of Electrical Engineering University of Kentucky and Miron Abramovici

More information

Applications of The Montgomery Exponent

Applications of The Montgomery Exponent Applications of The Montgomery Exponent Shay Gueron 1,3 1 Dept. of Mathematics, University of Haifa, Israel (shay@math.haifa.ac.il) Or Zuk 2,3 2 Dept. of Physics of Complex Systems, Weizmann Institute

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

Implementation of Full -Parallelism AES Encryption and Decryption

Implementation of Full -Parallelism AES Encryption and Decryption Implementation of Full -Parallelism AES Encryption and Decryption M.Anto Merline M.E-Commuication Systems, ECE Department K.Ramakrishnan College of Engineering-Samayapuram, Trichy. Abstract-Advanced Encryption

RC6 Implementation including key scheduling using FPGA

ECE 646 Project, December 2006. Fouad Ramia, Hunar Qadir, GMU. Abstract: With today's great demand for secure communications systems,

Henry Lin, Department of Electrical and Computer Engineering, California State University, Bakersfield. Lecture 7 (Digital Logic), July 24th, 2012

Digital vs Analog. Digital signals are binary; analog

Encryption and Decryption by AES algorithm using FPGA

Sayali S. Kshirsagar, Department of Electronics, SPPU MITAOE, Alandi(D), Pune, India, sayali.kshirsagar17@gmail.com; Savita Pawar, Department of Electronics

ECE 297:11 Reconfigurable Architectures for Computer Security

Course web page: http://mason.gmu.edu/~kgaj/ece297. Instructors: Kris Gaj (GMU), Tarek El-Ghazawi (GWU). TA: Pawel Chodowiec (GMU). Kris Gaj George

NEW MODIFIED LEFT-TO-RIGHT RADIX-R REPRESENTATION FOR INTEGERS. Arash Eghdamian 1*, Azman Samsudin 1

International Journal of Technology (2017) 3: 519-527, ISSN 2086-9614, IJTech 2017. Arash Eghdamian 1*, Azman Samsudin 1. 1 School of Computer

HIGH PERFORMANCE ELLIPTIC CURVE CRYPTO-PROCESSOR FOR FPGA PLATFORMS

Debdeep Mukhopadhyay, Dept. of Computer Science and Engg., IIT Kharagpur. 3/6/2010, NTT Labs, Japan. Outline: Elliptic Curve Cryptography

High Speed Special Function Unit for Graphics Processing Unit

Abd-Elrahman G. Qoutb 1, Abdullah M. El-Gunidy 1, Mohammed F. Tolba 1, and Magdy A. El-Moursy 2. 1 Electrical Engineering Department, Fayoum

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

September 12, 2002. John Wawrzynek, Fall 2002. Outline: What are FPGAs? Why use FPGAs (a short history

Mastrovito Multipliers Based New High Speed Hybrid Double Multiplication Architectures Based On Verilog

Sangoju Janardhana Chary & Rajesh Kanuganti. 1 M-Tech, Dept. of ECE, Khammam Institute of Technology

IMPLEMENTATION OF LOW-COMPLEXITY REDUNDANT MULTIPLIER ARCHITECTURE FOR FINITE FIELD

Jyothi Leonore Dake 1, Sudheer Kumar Terlapu 2 and K. Lakshmi Divya 3. 1 M.Tech-VLSID, ECE Department, SVECW (Autonomous), Bhimavaram,

Hardware Architectures

Secret-key Cryptography, Public-key Cryptography, Cryptanalysis, AES & AES candidates, eSTREAM candidates, Hash Functions, SHA-3, Montgomery Multipliers, ECC cryptosystems, Pairing-based

A Scalable Architecture for Montgomery Multiplication

Alexandre F. Tenca and Çetin K. Koç, Electrical & Computer Engineering, Oregon State University, Corvallis, Oregon 97331, {tenca,koc}@ece.orst.edu. Abstract.
