Power Optimization for Universal Hash Function Data Path Using Divide-and-Concatenate Technique

Bo Yang and Ramesh Karri
Dept. of Electrical and Computer Engineering, Polytechnic University, Brooklyn, NY, USA
yangbo@photon.poly.edu, ramesh@india.poly.edu

ABSTRACT

We present an architecture level low power design technique called divide-and-concatenate for universal hash functions. It is based on the following observations: (i) the power consumption of a w-bit array multiplier and the associated universal hash data path decreases as O(w^4) if its clock rate remains constant; (ii) two universal hash functions are equivalent if they have the same collision probability. In the proposed approach we divide a w-bit data path (with collision probability 2^{-w}) into two or four w/2-bit data paths (each with collision probability 2^{-w/2}) and concatenate their results to construct an equivalent w-bit data path (with collision probability 2^{-w}). A popular low power technique that uses parallel data paths saves 62.10% of the dynamic power consumption while incurring 102% area overhead. In contrast, the divide-and-concatenate technique saves 55.44% of the dynamic power consumption with only 16% area overhead.

Categories and Subject Descriptors: B.2.2 [Arithmetic and Logic Structures]: Performance Analysis and Design Aids

General Terms: Design, Performance

Keywords: Universal Hash Function, Power Optimization, Divide-and-Concatenate

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CODES+ISSS '05, Sept. 2005, Jersey City, New Jersey, USA. Copyright 2005 ACM.

1. INTRODUCTION

Hash functions have a wide range of applications in computer and network related areas, including databases, web search engines and, most importantly, network security. In databases and web search engines, record lookup can be sped up by using hash values as indexes. In network security, keyed hash functions are used as message authentication codes to assure the integrity of a message. MD5 and SHA-1 are two popular hash algorithms. Since they are iterative algorithms in which the current computation step depends on the result of the previous step, they are not parallelizable and cannot be pipelined. Moreover, these hash functions are not scalable: the message integrity they offer cannot be tailored to application requirements.

Recently, universal hash functions such as MMH [1], NH [2] and TMMH [3] have been proposed as an attractive alternative. They use additions and multiplications, and their collision probabilities are determined by the size of these additions and multiplications. They do not have an iterative internal structure and are parallelizable. Most importantly, they are scalable: the message integrity they offer can be tailored to application requirements. They are ideal for implementation in hardware.

Hardware implementations of universal hash functions have been used in several applications, including high speed routers and wireless cards. In virus detection and content classification applications for high speed routers, every packet is hashed and compared with pre-computed signatures [4]. Integrating a large number of universal hash functions improves the accuracy of content classification. For these high performance systems, low power universal hash implementations can improve integration density, reduce packaging cost and improve reliability. More and more wireless cards, PDAs and cell phones support authentication in hardware. For these portable devices, low power universal hash implementations can prolong battery life.

Hardware implementations of universal hash functions are data path dominated, with only a small amount of control logic. Arithmetic operations such as addition and multiplication are at the core of these algorithms. Existing low power design techniques can be applied to universal hash function hardware designs. One straightforward approach to reducing the power consumption of universal hash functions is to use low power implementations of adders and multipliers [5]. Orthogonal to this and other circuit level and logic level low power approaches are architecture level low power design techniques such as glitch reduction [6], dynamic voltage scaling [7], data representation [8] [9], pipelining and parallel data paths [10] [11].

In the parallel data paths technique, the original data path is replicated N times, with each replicated data path operating at 1/N of the original clock frequency. Since each of the data paths operates at a lower clock frequency, its supply voltage can be reduced. The dynamic power consumed by the parallel data paths architecture is only 1/N^2 of the power consumption of the original data path. However, it incurs N times the hardware overhead [10] [11].

In this paper we develop an architecture level power optimization technique for universal hash functions. This technique can yield savings in power consumption comparable to those of the parallel data paths technique, but with significantly less area overhead. Instead of replicating the original hash data path directly, the divide-and-concatenate technique divides the w-bit data path into four w/2-bit data paths and concatenates their outputs to construct an equivalent w-bit data path [12].

Obviously, a straightforward hash data path and its corresponding divide-and-concatenate hash data path are not equivalent in terms of the results that they output. We define two hash data paths to be equivalent if the results that they output satisfy a pre-defined property. For one way universal hash functions and the associated message authentication codes, the actual result is not important. Rather, it is the collision probability of the result that is important. Hence, we propose that two hash data paths be considered equivalent if they have the same collision probability. When we discuss equivalent hash data paths and architectures in this paper, we mean that 1) they can process the same size input every clock cycle and generate the same size output; 2) they have the same collision probability.

In the rest of this paper we describe the divide-and-concatenate low power design technique for universal hash functions. Specifically, we introduce universal hash functions and the Linear Congruential Hash (LCH) universal hash function in section 2. The motivation for the proposed low power optimization technique is presented in section 3. We apply the divide-and-concatenate technique to design low power LCH universal hash hardware in section 4. The experimental results of the proposed low power LCH hash data paths using an IBM 0.18 µm ASIC library are reported in section 5.
We summarize our contributions in section 6.

2. LINEAR CONGRUENTIAL HASH (LCH)

A hash function h converts an input x from a large domain into an output in a small range (the hash value y = h(x), often a subset of the integers). A hash function can have three features:

Pre-image resistance: For a given hash value y, it is computationally infeasible to find a message x such that h(x) = y.

Second pre-image resistance: For a given message x, it is computationally infeasible to find a message x' ≠ x such that h(x') = h(x).

Collision resistance: It is computationally infeasible to find messages x and x' such that x' ≠ x and h(x') = h(x).

A hash function that meets pre-image resistance and second pre-image resistance is a one way hash function. Further, if a one way hash function meets the third feature, it is collision resistant. Collision resistance implies second pre-image resistance. It fails to guarantee pre-image resistance only if one insists on allowing the degenerate case of hash functions that do not actually compress [13]. So for a one way hash function, the collision probability, which measures its collision resistance, is the most important parameter.

2.1 LCH Algorithm

Carter and Wegman [14] defined a universal hash function as follows: Let A and B be two sets, and let H be a family of functions from A to B. If, for every pair x_1, x_2 ∈ A with x_1 ≠ x_2 and for h chosen from H, the probability of the collision h(x_1) = h(x_2) equals 1/|B|, then H is a universal family of hash functions. Here |B| is the size of the set B, and 1/|B| is the smallest possible value of the collision probability. In many cases, a hash function family can only achieve a collision probability that is slightly larger than 1/|B|. Such universal hash functions are called almost universal [14].

In this paper we focus on LCH, a widely used universal hash function family which is defined as:

    h_{m,x}(m) = \left( \sum_{i=1}^{k} (m_i x_i + t) \right) \bmod p        (1)

where m_i is the i-th word in a message block m, x_i is the i-th word in the key x, and t ∈ Z_p.
p = 2^w + s is a prime number that is close to 2^w. Modular reduction of the accumulated result using p generates a w-bit hash value. Since p is close to 2^w, the collision probability of LCH (1/p) is close to 2^{-w}. The key block and every message block consist of k w-bit words. Since the modular reduction step can be amortized over k multiply-and-accumulate operations, increasing k can increase the speed of hash computation. However, a large k results in a longer key.

2.2 Modular Reduction

Reducing the 2w-bit result of the multiply-and-accumulate step of the w-bit LCH function modulo p = 2^w + s yields a w-bit hash value. A division-less modular reduction algorithm from [1] that uses 2 multiplications and 3 subtractions can be used to perform the modular reduction. Let x = 2^w a + b be the 2w-bit input, where a and b are unsigned w-bit integers:

    2^w a + b \equiv (2^w a + b) - a (2^w + s) = b - s a \pmod{2^w + s}        (2)

Step 1: Since a, b ∈ [0, 2^w - 1], we have b - s a ∈ [-s (2^w - 1), 2^w - 1]. Let y = b - s a. Then y ≡ x (mod 2^w + s), and y can be represented as a signed 2w-bit integer y = 2^w c + d, where c ∈ {-s, ..., 0} and d is an unsigned w-bit integer.

Step 2: Repeat step 1 to compute z = d - s c. Then z ≡ y ≡ x (mod 2^w + s) and z ∈ {0, ..., 2^w + s^2 - 1}.

Step 3: If z ≥ 2^w + s, return z - (2^w + s); otherwise return z.

3. MOTIVATION

To motivate the proposed divide-and-concatenate low power technique, consider the w-bit LCH universal hash data path shown in Figure 1. It operates on w-bit input message words and w-bit subkey words and has a collision probability of 2^{-w}. This data path can be implemented as a two-stage pipeline using a combinational adder and a combinational array multiplier that perform addition and multiplication, respectively, in a single clock cycle. It uses four w-bit registers (R1, R2, H and L), two 2w-bit registers (R3, R4), two w-bit 2-to-1 multiplexers (mux1 and mux2) and two 2w-bit 2-to-1 multiplexers (mux3 and mux4). The control signals are generated by control logic.
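The division-less reduction of Section 2.2 and the accumulate form of the LCH hash can be sketched in software as follows. This is a minimal illustration, not the hardware design: the word size w = 16, the offset s = 1 (so that p = 2^16 + 1 = 65537, a prime) and the function names are assumptions chosen for the example. Unlike the data path described here, which reduces once per message block, the sketch reduces after every multiply-and-accumulate step so that the accumulator always fits in 2w bits.

```python
W = 16                    # word size w in bits (assumed for illustration)
S = 1                     # offset s (assumed); 2**16 + 1 = 65537 is prime
P = (1 << W) + S          # modulus p = 2**w + s
MASK = (1 << W) - 1


def reduce_mod_p(x: int) -> int:
    """Reduce an unsigned 2w-bit integer modulo p = 2**w + s without division."""
    assert 0 <= x < (1 << (2 * W))
    a, b = x >> W, x & MASK            # x = 2**w * a + b
    y = b - S * a                      # step 1: y is congruent to x, may be negative
    c, d = y >> W, y & MASK            # y = 2**w * c + d with c in {-s, ..., 0}
    z = d - S * c                      # step 2: 0 <= z <= 2**w + s**2 - 1
    return z - P if z >= P else z      # step 3: one conditional subtraction


def lch_hash(message, key, t):
    """((sum of m_i * x_i) + k * t) mod p over w-bit words m_i and x_i."""
    assert len(message) == len(key)
    acc = reduce_mod_p(len(message) * t)   # start from the constant k * t
    for m_i, x_i in zip(message, key):
        # acc < p and m_i * x_i <= (2**w - 1)**2, so the sum stays below 2**(2w)
        acc = reduce_mod_p(acc + m_i * x_i)
    return acc
```

Python's right shift and bitwise AND on negative integers behave like floor division and non-negative remainder, which is what the signed decomposition y = 2^w c + d in step 1 requires.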
Equation (1) can be rewritten as follows:

    h_{m,x}(m) = \left( \left( \sum_{i=1}^{k} m_i x_i \right) + k t \right) \bmod p        (3)

[Figure 1: w-bit hardware architecture for the LCH]

At the beginning of every message block, when newblock is set to active, registers R1, R2, R3, L and H and the counter in the control logic are all set to zero, while register R4 is initialized to k·t (this is a constant for a specific instance of the LCH function). This is followed by k multiply-and-accumulate steps of the k w-bit message words with the k w-bit key words. In each step, m_i in register R1 is multiplied with x_i in register R2 and the result is stored in register R3. For this purpose, the control signal mulsrc of multiplexers mux1 and mux2 is set to select the lower input. The product is then accumulated into register R4 using the adder/subtractor unit. The 2w-bit accumulated result at the end of the k + 1 clock cycles is stored in the (H, L) register pair.

After the accumulation, the modular reduction is applied to the 2w-bit accumulated result in (H, L) to get a w-bit hash. Step 1 of the reduction algorithm consumes two clock cycles. By setting mulsrc to select the upper inputs of mux1 and mux2, we get s·a at the end of cycle 1. By setting the comp signal to select the upper input of mux4 and setting addsub to perform subtraction, we get y = b - s a at the end of cycle 2. By repeating these operations in clock cycles 3 and 4 we get z (step 2 in section 2.2). In the last cycle, we compute z - p. If the result is less than zero, the value in the register pair (H, L) is the hash value. Otherwise the output of the adder/subtractor is the hash. This is loaded into the output registers (H, L) by setting the load signal to 1. Overall, this architecture generates a w-bit hash for a k × w-bit message block in k + 6 cycles.

Let the maximum clock rate of a w-bit LCH data path be f MHz; this is determined by the w-bit combinational multiplier.
The throughput of the straightforward LCH data path shown in Figure 1 is w × f Mbps. (The coefficient k/(k + 6) is omitted, because every throughput figure is scaled by this factor.)

3.1 Analysis of Power Consumption

The dynamic power of CMOS circuits can be computed as:

    P_{dynamic} = \alpha f C_L V_{dd}^2        (4)

α is the average switching probability of the inputs to the circuit, f is the clock frequency, C_L is the equivalent load capacitance of the circuit (for a target technology library, it is proportional to the area of the circuit) and V_dd is the supply voltage of the circuit. α is a function of the statistics of the input signals, the circuit style, the implemented function and the circuit topology. Dynamic power can be reduced by reducing one or more of the factors in equation (4). Reducing V_dd obviously has the most impact. Any low power technique that reduces the supply voltage has to deploy additional techniques to maintain the throughput, because reducing the supply voltage increases the circuit delay as follows [11]:

    delay = l \, C_{criticalpath} \frac{V_{dd}}{(V_{dd} - V_t)^2}        (5)

l is a technology parameter that can be omitted when the optimized circuits are targeted to the same technology. V_t is the threshold voltage, which is much smaller than V_dd. The lower the supply voltage V_dd, the longer the circuit delay. Finally, it is the capacitance of the critical path, and not the capacitance of the whole circuit, that determines the delay.

[Figure 2: Parallel data path power optimization technique]

The parallel data paths technique shown in Figure 2 reduces the dynamic power by first replicating the original data path N times, with each replica working at 1/N of the original clock frequency. The supply voltage of each data path is then scaled down. The original throughput is maintained by the replicated data paths. Since the replicated data paths process the same kind of input, the average switching probabilities of all data paths are the same. Ignoring the capacitance of the multiplexer and demultiplexer, the capacitance of the parallel data path is N·C_L.
The dynamic power consumed by these parallel data paths is:

    P_{parallel} \approx \alpha \left( \frac{f}{N} \right) N C_L \left( \frac{V_{dd}}{N} \right)^2 = \frac{\alpha f C_L V_{dd}^2}{N^2} = \frac{P_{original}}{N^2}        (6)

We can apply this technique to the LCH data path shown in Figure 1. If two LCH data paths are used, the power consumption can be reduced to 1/4 of the original value, but the area doubles.

Let us look at the power consumed by LCH and other universal hash functions from a different angle: as a function of their input word length w when the clock frequency is fixed. The hardware complexity of a w-bit array multiplier, and hence that of a w-bit LCH data path, increases as O(w^2). The average capacitance of a circuit is proportional to its area, so the capacitance of a multiplier and of the LCH data path also increases as O(w^2). The delay of a w-bit array multiplier scales as O(w) [15]. When the same supply voltage is applied to two multipliers of different sizes, the smaller multiplier has the smaller delay. By reducing the supply voltage of the small multiplier as described by equation (5), its delay can be made equal to that of the large multiplier. According to equation (5), the supply voltage can be reduced as O(w) to maintain a constant delay when the operand size is reduced. The two multipliers have the same average switching probability α. Overall, the dynamic power consumption of a w-bit multiplier decreases as:

    P_{multiplier} = \alpha f C_L V_{dd}^2 = \alpha f \, O(w^2) \, O(w)^2 = O(w^4)        (7)

Since the delay of a 2w-bit adder is generally smaller than that of a w-bit combinational array multiplier [15], the critical path of the LCH data path shown in Figure 1 is determined by the multiplier. Overall, the dynamic power consumption of a w-bit LCH data path decreases as O(w^4) if the critical path delay remains unchanged, so using small data paths can reduce the power consumption greatly. The proposed divide-and-concatenate technique uses several small data paths to construct an equivalent, power efficient LCH data path. The concept of equivalence is crucial. As discussed in the introduction, we propose that two data paths implementing a hash function be considered equivalent if they have the same collision probability.

3.2 Collision Probability of Universal Hash Functions

Proposition 1: Given two universal hash functions h1 from A to B and h2 from A to C with collision probabilities p1 and p2 respectively, a new universal hash function h3 can be defined from A to B × C as the ordered pair h3(x) = (h1(x), h2(x)), with collision probability p1·p2 [16].

When h1 and h2 are two universal hash functions from the same universal hash family with two different keys, the above proposition simplifies to:

Proposition 2: For a universal hash function, its collision probability of 2^{-w} can be reduced to 2^{-nw} by hashing the same message n times using n different keys and concatenating the results.

However, this solution requires n times the key material. The Toeplitz extension described in [1] reduces the amount of key material, making this approach practical.
In the Toeplitz extension, the keys for the n hashes are not necessarily different: the other n - 1 keys can be obtained by rotating the first key. For example, when we use two copies of the w-bit LCH data path to construct a 2w-bit data path, the keys for the second w-bit LCH data path can be obtained by rotating the original key by one bit. The proof that the Toeplitz extension does not compromise the collision probability of the result is given in [2]. Using two w-bit LCH data paths (each with a collision probability of 2^{-w}) and concatenating their outputs provides a collision probability of 2^{-2w}; this is the same as the collision probability provided by a 2w-bit data path. Since the area of an LCH data path decreases as O(w^2) and the dynamic power consumption of an LCH data path decreases as O(w^4), we can reduce both area and power consumption.

4. OPTIMIZING POWER BY DIVIDE AND CONCATENATE

Let us now construct an equivalent divide-and-concatenate LCH universal hash architecture that reduces dynamic power consumption with very low area overhead. Consider the straightforward w-bit LCH universal hash data path shown in Figure 1. It takes one w-bit message word every clock cycle and generates a w-bit hash value after the entire message is processed.

[Figure 3: The divide-and-concatenate architecture using two w/2-bit LCH data paths]

[Figure 4: The equivalent divide-and-concatenate architecture composed of four w/2-bit LCH data paths]

Using the divide-and-concatenate technique, a w-bit LCH universal hash data path with a collision probability of 2^{-w} can be constructed from two w/2-bit LCH universal hash data paths, each with a collision probability of 2^{-w/2}, by concatenating their w/2-bit results to generate a w-bit hash value with a collision probability of 2^{-w}, as shown in Figure 3.
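The two-path construction of Figure 3 can be sketched in software as follows. This is a hedged illustration only: the half-word size of 8 bits, the prime 2^8 + 1 = 257, and all function names are assumptions for the example. The sketch also assumes that the rotation is a one-bit left rotation applied to each w/2-bit key word, and it returns the two results as an ordered pair, as in Proposition 1, rather than packing them into a single word.

```python
HALF = 8                       # half-word size w/2 in bits (assumed)
P_HALF = (1 << HALF) + 1       # 2**8 + 1 = 257 is prime (assumed modulus)
HALF_MASK = (1 << HALF) - 1


def rotl(word: int, r: int = 1, width: int = HALF) -> int:
    """Rotate a width-bit word left by r bits (models the fixed rotate)."""
    r %= width
    return ((word << r) | (word >> (width - r))) & ((1 << width) - 1)


def split_words(words):
    """Split each w-bit word into its upper and lower w/2-bit halves."""
    halves = []
    for m in words:
        halves.append(m >> HALF)
        halves.append(m & HALF_MASK)
    return halves


def lch(message, key, t, p):
    """Equation (1) applied to a list of words: sum of (m_i * x_i + t) mod p."""
    assert len(message) == len(key)
    return sum(m * x + t for m, x in zip(message, key)) % p


def divide_and_concatenate(words, key_halves, t):
    """Hash the half-words on two small paths; the second uses rotated keys."""
    halves = split_words(words)
    h1 = lch(halves, key_halves, t, P_HALF)
    h2 = lch(halves, [rotl(x) for x in key_halves], t, P_HALF)
    return (h1, h2)            # ordered pair, as in Proposition 1
```

For instance, divide_and_concatenate([0x1234, 0xABCD], [3, 0x81, 7, 0xFF], 5) feeds the half-words 0x12, 0x34, 0xAB, 0xCD to both paths, with the second path using the key words 6, 0x03, 14, 0xFF.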
The fixed rotate operation R implements the Toeplitz extension technique to generate additional key material for the replicated data paths. R and the concatenation operation C in Figure 3 do not entail any area overhead, as they are implemented as a mere renaming of wires in the circuit. The upper w/2 bits and lower w/2 bits of the w-bit word need to be applied one after the other to this divide-and-concatenate data path.

Since the power consumed by the w-bit LCH data path is O(w^4), a w/2-bit LCH data path consumes about 1/16 of the power consumed by a w-bit LCH data path when they run at the same speed. The divide-and-concatenate LCH data path shown in Figure 3 therefore consumes about 1/8 of the power consumed by a w-bit LCH data path when they run at the same speed. Since the area of a w-bit LCH data path is O(w^2), the area of the divide-and-concatenate architecture shown in Figure 3 is about half of that of the w-bit LCH hash data path shown in Figure 1. However, this divide-and-concatenate LCH data path can only process a w/2-bit input every clock cycle, resulting in a throughput of (w × f)/2 Mbps, about half of that of a w-bit LCH data path.

Let us duplicate this LCH divide-and-concatenate architecture once more to yield the LCH data path shown in Figure 4. This divide-and-concatenate data path uses four w/2-bit LCH data paths and processes a w-bit input every cycle. The upper w/2 bits and lower w/2 bits of the w-bit word can be applied in parallel in this divide-and-concatenate data path. This divide-and-concatenate LCH data path yields the same hash value as the divide-and-concatenate LCH data path shown in Figure 3.
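The first-order estimates above follow directly from the O(w^4) power and O(w^2) area scaling rules of Section 3.1. A small sketch of that arithmetic is given below; the function names are illustrative, and the model deliberately ignores constant and linear components such as adders, registers and control logic.

```python
from fractions import Fraction


def dc_relative_power(divisor: int, n_paths: int) -> Fraction:
    """Power of n_paths (w/divisor)-bit paths relative to one w-bit path,
    at the same clock rate, under the O(w^4) rule."""
    return Fraction(n_paths, divisor ** 4)


def dc_relative_area(divisor: int, n_paths: int) -> Fraction:
    """Area of n_paths (w/divisor)-bit paths relative to one w-bit path,
    under the O(w^2) rule."""
    return Fraction(n_paths, divisor ** 2)


def parallel_relative_power(n: int) -> Fraction:
    """Equation (6): N replicas at 1/N clock and scaled supply voltage."""
    return Fraction(1, n ** 2)


def parallel_relative_area(n: int) -> Fraction:
    """N replicas of the original data path (mux/demux ignored)."""
    return Fraction(n)


if __name__ == "__main__":
    print("one w/2-bit path  :", dc_relative_power(2, 1), dc_relative_area(2, 1))
    print("two w/2-bit paths :", dc_relative_power(2, 2), dc_relative_area(2, 2))
    print("four w/2-bit paths:", dc_relative_power(2, 4), dc_relative_area(2, 4))
    print("two parallel paths:", parallel_relative_power(2), parallel_relative_area(2))
```

Under this simplified model, four w/2-bit paths reach the same 1/4 power ratio as two parallel w-bit paths while occupying the area of one w-bit path instead of two.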

In the divide-and-concatenate LCH data paths shown in Figure 3 and Figure 4, the message word m_i and the key word x_i are w/2-bit. So the original k × w message block is reorganized as (2k) × (w/2). Modular reduction has the following property:

    (a + b) \bmod p = (a \bmod p + b \bmod p) \bmod p        (8)

Equation (3) can be rewritten as follows:

    h_{m,x}(m) = \left[ \left( \sum_{i=1,\, i\ \mathrm{odd}}^{2k} m_i x_i + \frac{k t}{2} \right) \bmod p + \left( \sum_{i=1,\, i\ \mathrm{even}}^{2k} m_i x_i + \frac{k t}{2} \right) \bmod p \right] \bmod p        (9)

The divide-and-concatenate LCH data path in Figure 3 accumulates the temporary results for 2k cycles and then performs the modular reduction. The divide-and-concatenate LCH data path in Figure 4 accumulates the temporary results for k cycles (the upper and lower pairs of w/2-bit data paths calculate the first and second terms inside the bracket of equation (9), respectively) and then performs the modular reduction step.

This divide-and-concatenate LCH data path using four w/2-bit data paths is equivalent to the straightforward w-bit LCH data path in two aspects: 1) they process the same size input every clock cycle (and hence have the same throughput) and generate the same size hash value; 2) they have a collision probability of 2^{-w}. We call the divide-and-concatenate architecture using four w/2-bit data paths the equivalent divide-and-concatenate architecture. The area of this equivalent divide-and-concatenate LCH data path is approximately equal to the area of the straightforward w-bit LCH data path. The dynamic power consumed by this equivalent divide-and-concatenate LCH data path is about 1/4 of that consumed by the straightforward w-bit LCH data path. Let us compare this with the parallel data paths approach using two w-bit LCH data paths. This duplicated data paths approach incurs about 100% area overhead and consumes 1/4 of the dynamic power consumed by the straightforward w-bit LCH data path.

The divide-and-concatenate approach can be further extended as follows: each w/2-bit data path in the equivalent divide-and-concatenate architecture can be replaced by four w/4-bit LCH data paths.
This translates into sixteen w/4-bit LCH data paths to construct an equivalent w-bit LCH data path. While the parallel data paths technique using four w-bit LCH data paths incurs 400% area overhead, the equivalent divide-and-concatenate LCH data path using sixteen w/4-bit LCH data paths can reduce the dynamic power consumption to 1/16 without incurring any area overhead. The smallest data path size that can be used to construct an equivalent divide-and-concatenate data path is 4 bits. Below this value, the keys are so small that the Toeplitz extension does not work (a 2-bit key word cannot be rotated four times).

Another advantage of the divide-and-concatenate technique over the parallel data paths technique is that it uses a single clock: the divide-and-concatenate technique does not reduce the clock rate of the duplicated data paths. In contrast, a higher speed clock is needed for the multiplexer and demultiplexer in the parallel data paths architecture shown in Figure 2.

5. EXPERIMENTAL RESULTS

The analysis of the proposed divide-and-concatenate technique in section 4 used simplified models for the area and delay of multipliers and LCH data paths. In this section, we present a detailed implementation based validation of our claims. In the divide-and-concatenate technique, the number of small data paths increases quadratically as the size of the data paths decreases. The linear and constant components of each data path therefore incur area overhead. For example, the 32-bit LCH data path consumes 9071 gates, while its equivalent divide-and-concatenate architecture composed of four 16-bit LCH data paths consumes 9917 gates when both are targeted to the IBM 0.18 µm ASIC library.

We implemented the straightforward 64-bit LCH universal hash data path and the divide-and-concatenate LCH data paths using 32-bit, 16-bit, 8-bit and 4-bit data paths. An LCH data path using two parallel 64-bit LCH data paths was also implemented.
They were modeled in VHDL, simulated using ModelTech ModelSim and synthesized using Synopsys Design Compiler. The power consumption was reported by Synopsys PrimePower based on the netlist and the simulation results from ModelSim. The supply voltage of the targeted library is 1.62 V. Since the divide-and-concatenate architectures built from data paths of different sizes use different supply voltages, we modified the voltage parameter in the technology library file. Each divide-and-concatenate architecture was synthesized on the library with the corresponding voltage parameter. The delay and dynamic power consumption are also reported based on the modified library.

We summarize the data path width, number of data paths, area, clock frequency, voltage, power consumption, power saving, and the ratio of power saving percentage to area overhead percentage in Table 1. The straightforward 64-bit LCH data path uses the original 1.62 V power supply provided by the targeted library. Its maximum clock rate is 142 MHz. For the divide-and-concatenate LCH architecture using 32-bit data paths, the supply voltage can be scaled down to 1.32 V and the clock rate of the design can still reach 142 MHz.

The area overhead compared to the straightforward 64-bit LCH data path is given in the fifth row. For example, the area overhead of the equivalent divide-and-concatenate architecture using 32-bit LCH data paths is 16%, while the area overhead of the parallel data path technique that uses two 64-bit LCH data paths is 102%. The dynamic power savings are listed in the second to last row. For example, the equivalent divide-and-concatenate architecture using 16-bit data paths saves 75.29% of the power consumption, while the parallel data path technique that uses two 64-bit LCH data paths saves 62.10%. We use the ratio of power saving to area overhead to evaluate the efficiency of the proposed architectures.
For example, the equivalent divide-and-concatenate architecture using 32-bit LCH data paths saves 55.44% of the power consumption with 16% area overhead, so its ratio is 3.465 (55.44/16). The ratios are shown in the last row. Except for the divide-and-concatenate architecture using 4-bit data paths, all other divide-and-concatenate architectures are superior to the parallel data path technique. In particular, the divide-and-concatenate architecture using 32-bit data paths has the best ratio.

Table 1: Implementation results for the straightforward 64-bit LCH data path, equivalent divide-and-concatenate data paths and parallel data paths with collision probability of 2^{-64}

    Architecture                    Straightforward   Divide-and-Concatenate                 Parallel data paths
    Data path width (bits)          64                32       16       8        4           64
    # of data paths                 1                 4        16       64       256         2
    Area (gates)                    --                --       --       --       --          --
    Area overhead ratio             --                16%      34%      57%      92%         102%
    Clock rate (MHz)                142               142      --       --       --          --
    Voltage (V)                     1.62              1.32     --       --       --          --
    Power consumption (µW)          --                --       --       --       --          --
    Power saving ratio              --                55.44%   75.29%   75.49%   74.34%      62.10%
    Ratio (power ratio/area ratio)  --                3.465    --       --       --          --

    (-- : value not legible in this transcription)

As the width of the data paths in the divide-and-concatenate architecture becomes small, the percentage of the total hardware area occupied by multipliers also becomes small. The quadratic duplication of linear components incurs more area overhead and power consumption, resulting in smaller ratios. In particular, in 4-bit data path based designs the LCH area is dominated by adders. This is because, at this width, the area of an adder is comparable to that of a multiplier and the number of adders grows quadratically, resulting in an inefficient design.

We found that the power consumption reported by PrimePower using the default average switching probability is almost the same as the power consumption based on the simulation results from ModelSim. This is because of the inherent randomness of hash functions. The average switching probability reported by ModelSim is almost 50%, which is the same as the default value used by PrimePower.

6. CONCLUSIONS

Applying general low power design techniques to universal hash functions yields only moderate improvements. We defined collision probability equivalent data paths and combined this notion with the divide-and-concatenate technique to design low power architectures with very small area overhead for LCH calculations, using multiple small multipliers in place of one larger multiplier. When applied to the design of an LCH universal hash with a collision probability of 2^{-64}, the proposed technique reduces the power consumption by 55% and 75% relative to the straightforward 64-bit implementation, with no performance loss and an area overhead of just 16% and 34%, by using 32-bit and 16-bit data paths respectively.

7. REFERENCES

[1] S. Halevi and H. Krawczyk. MMH: Software message authentication in the Gbit/second rates. In Workshop on Fast Software Encryption.
[2] J. Black, S. Halevi, H. Krawczyk, T. Krovetz, and P. Rogaway. UMAC: Fast and secure message authentication. In Cryptology Conference on Advances in Cryptology.
[3] D. A. McGrew. The truncated multi-modular hash function (TMMH). IETF Internet Draft.
[4] S. Dharmapurikar, P. Krishnamurthy, T. Sproull, and J. Lockwood. Deep packet inspection using parallel Bloom filters. In High Performance Interconnects.
[5] Z. Huang. High-Level Optimization Techniques for Low-Power Multiplier Design. Ph.D. Thesis, University of California at Los Angeles.
[6] A. Raghunathan, S. Dey, and N. K. Jha. Glitch analysis and reduction in register transfer level power optimization. In Design Automation Conference.
[7] J. Yu, W. Wu, X. Chen, H. Hsieh, J. Yang, and F. Balarin. Assertion-based design exploration of DVS in network processor architectures. In Design Automation and Test in Europe, pages 92-97.
[8] P. Petrov and A. Orailoglu. Low-power instruction bus encoding for embedded processors. IEEE Trans. Very Large Scale Integr. Syst., 12(8).
[9] M. Srivastava and M. Potkonjak. Power optimization in programmable processors and ASIC implementations of linear systems: Transformation-based approach. In Design Automation Conference.
[10] A. Chandrakasan, S. Sheng, and R. Brodersen. Low-power CMOS digital design. IEEE Journal of Solid-State Circuits, 27(4).
[11] J. M. Rabaey. Digital Integrated Circuits: A Design Perspective. Prentice-Hall, Englewood Cliffs, NJ.
[12] B. Yang, R. Karri, and D. A. McGrew. Divide-and-concatenate: An architecture level optimization technique for universal hash functions. In Design Automation Conference, pages 44-52.
[13] P. Rogaway and T. Shrimpton. Cryptographic hash function basics: Definitions, implications, and separations for preimage resistance, second-preimage resistance, and collision resistance. In Fast Software Encryption.
[14] L. Carter and M. Wegman. Universal hash functions. Journal of Computer and System Sciences, 18.
[15] I. Koren. Computer Arithmetic Algorithms. A. K. Peters, Natick, Massachusetts, 2nd Edition.
[16] J. R. Black. Message Authentication Codes. Ph.D. Thesis, University of California at Davis.


More information

DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY

DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY Saroja pasumarti, Asst.professor, Department Of Electronics and Communication Engineering, Chaitanya Engineering

More information

DSP Design Flow User Guide

DSP Design Flow User Guide DSP Design Flow User Guide 101 Innovation Drive San Jose, CA 95134 www.altera.com Document Date: June 2009 Copyright 2009 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company,

More information

Algorithms (III) Yu Yu. Shanghai Jiaotong University

Algorithms (III) Yu Yu. Shanghai Jiaotong University Algorithms (III) Yu Yu Shanghai Jiaotong University Review of the Previous Lecture Factoring: Given a number N, express it as a product of its prime factors. Many security protocols are based on the assumed

More information

CHAPTER 4 BLOOM FILTER

CHAPTER 4 BLOOM FILTER 54 CHAPTER 4 BLOOM FILTER 4.1 INTRODUCTION Bloom filter was formulated by Bloom (1970) and is used widely today for different purposes including web caching, intrusion detection, content based routing,

More information

Abi Farsoni, Department of Nuclear Engineering and Radiation Health Physics, Oregon State University

Abi Farsoni, Department of Nuclear Engineering and Radiation Health Physics, Oregon State University Hardware description language (HDL) Intended to describe circuits textually, for a computer to read Evolved starting in the 1970s and 1980s Popular languages today include: VHDL Defined in 1980s by U.S.

More information

ITU - Telecommunication Standardization Sector. G.fast: Far-end crosstalk in twisted pair cabling; measurements and modelling ABSTRACT

ITU - Telecommunication Standardization Sector. G.fast: Far-end crosstalk in twisted pair cabling; measurements and modelling ABSTRACT ITU - Telecommunication Standardization Sector STUDY GROUP 15 Temporary Document 11RV-22 Original: English Richmond, VA. - 3-1 Nov. 211 Question: 4/15 SOURCE 1 : TNO TITLE: G.ast: Far-end crosstalk in

More information

Using VCS with the Quartus II Software

Using VCS with the Quartus II Software Using VCS with the Quartus II Sotware December 2002, ver. 1.0 Application Note 239 Introduction As the design complexity o FPGAs continues to rise, veriication engineers are inding it increasingly diicult

More information