A Reconfigurable Network Architecture For Parallel Prefix Counting

Size: px
Start display at page:

Download "A Reconfigurable Network Architecture For Parallel Prefix Counting"

Transcription

1 A Reconfigurable Network Architecture For Parallel Prefix Counting R. Lin S. Olariu Department of Computer Science Department of Computer Science SUNY at Geneseo Old Dominion University Geneseo, NY Norfolk, Va Abstract We propose an efficient reconfigurable parallel prefix counting network based on the recently-proposed technique of shift switching with domino logic, where the discharging signals can propagate along the switch chain asynchronously and produce a semaphore to indicate the end of each domino process. This results in a network that is fast and highly hardwarecompact. The proposed architecture for prefix counting N-1 bits involves a total of N+ cascaded simple switches and achieves a delay of (logn)( T d + ) + T add where T d is the time for discharging 4 cascaded precharged pass-transistors, T add is a full adder delay, and < T add. This is significantly faster than any design known to us for N Another important feature of the proposed architecture is that it requires a very simple control structure, driven solely by the semaphores. This significantly reduces the hardware complexity and fully utilizes the inherent speed of the computation. Index Terms Special purpose architectures, reconfigurable architectures, digital signal processing, VLSI design, domino logic, computer arithmetic. The work was supported, in part, by National Science Foundation under grants MIP , CCR and...

2 1. Introduction Due to the high degree of miniaturization possible today in VLSI technology, the size and complexity of designs that can be implemented in hardware has increased dramatically. This has made it technologically feasible and econnomically viable to develop high-speed, applications-specific, architectures featuring a spectacular performance increase over their general-purpose counterparts. The main goal of this work is to present such a special-purpose architecture for fast and hardware compact computation of parallel binary prefix counting. Our design is based on the recently-proposed technique of shift switching with domino logic [7]. Reconfigurable bus systems [1, 4, 5, 13, 15] with shift switches have been recently proposed to solve a number of fundamental computation problems [7-11]. In particular, it has been shown that shift switching on enhanced reconfigurable buses, that is, on buses containing shift switches endowed with buffers (inverters), is an elegant and powerful arithmetic design technique. It possesses the following unique features: (1) When special digital signals called state signals are propagating through a shift switch array, or enhanced reconfigurable buses, the desired modulo operations can be performed directly, simplifying the traditional approach to basic arithmetic computation; and (2) during the process the state signals are inverted, after passing through each shift switch, alternatively in two mutually inverted forms (n and p), minimizing the loads of transistors and maximizing the speeds of circuits. By apply precharged CMOS NP domino logic technique on pass-transistor-based shift switching buses, we can obtain a domino discharging chain with a new feature, that is, the discharging signals can propagate along the chain asynchronously (in the absence of a clocking scheme) and always produces a semaphore to indicate the end of the process[7]. This makes it possible to construct a network of processing elements to compute parallel binary prefix sums fast and in a highly hardware-compact fashion. The problem of parallel binary prefix counting is fundamental in parallel processing, for its solution is the principle ingredient in arithmetic expression evaluation, storage and data compaction, processor assignment, and routing, among many others [3, 8]. In this paper we propose a parallel prefix counting network of size N-1 (that is, with N-1 input bits, for N=2 2k =n*n). It is a two levels construction containing a total of N+ cascaded basic shift switches, with N pass-transistor-based shift switch plus trans-gate-based shift switches. Each pass-transistor-based switch is associated with a simple processing element (PE): for each row of switch units there is a row processing element (PE_r). In fact, both PEs and PE_rs, are simple control units, receiving 1

3 corresponding semaphores, and either sending control signals to tri-state drivers, or sending select signal to the MUX, or sending precharge/evaluation enable signals to start precharge/evaluation process. Thus, the entire network can be perceived as an application-specific circuit. The proposed network achieves a total delay of (logn)( T d + ) + T add where T d is the delay of discharging 4 cascaded precharged pass-transistors, and T add is an full adder delay, with <T add (see Section 5). Since it is not realistic to compute the prefix sums of a large array of binary numbers, say N 2 10, we assume that N<2 10. Under this assumption, the proposed architecture is significantly (at least 50%) faster than any known architectures, including tree of adders [3], or a processor with the same structure as ours but with each shift switch substituted by a half adder (half-adder-based processor for short), or our previously proposed architectures [8]. Another important feature of the proposed design is that it has a compact VLSI area, requiring an area about 0.5(N+ ) A h (almost linear in the input size), where A h is the area equivalent of a half adder. This area is significantly smaller than both the half adder-based processors (about 50%), and the tree of half adders processors, which require area of about (NlogN -1.5N+1)A h (note that the area for the control devices necessary in any such architecture is omitted). Another important feature of our architecture is that the N-prefix sums are computed and output row by row, with a simple PE (or control unit) per row. The control units runs asynchronously, driven solely by enable signals, i.e. semaphores produced at the end of the row s domino discharging. This greatly simplifies the hardware requirement, and the full inherent speed of the computation can be utilized. 2. State signals and shift switches Shift switches and state signals have been introduced in [6-11]. To make the paper self-contained, we now briefly review some of the relevant concepts. Definition 1. A state signal with integer value I, (0 I w-1; w 2 ), is represented by bit sequence b0, b1,..bw-1 (note that the order of bits here is either from left to right or bottom to top (Figure 1), with the unique bit u (either 0 or 1) in the I-th position. A state signal is said to be of n- (p-) type denoted by I (w) (I ) if u=0 (1). An S-signal may also be denoted by I (w) regardless of type or simply I (value only). In this paper we use only state signals with w=2, the discussion of general state signals can be found in [7]. The type of a state signals with w>2 is implied by its representation. However, for w=2 it must be defined 2

4 explicitly. The conversion between I (2) and I can be done easily through exchanging the positions of its two bits. Definition 2. A degenerate state signal with value I, (0 I w-1; w 2), is state signal I with the 0-th bit removed, for w=2 it may be denoted by I ( 2) (I ) or I (2), or simple I if no confusion arises (Figure 1). Clearly, the degenerate state signals I( ) and I(2) can also be seen as 1-bit regular signals I and respectively. (a) (b) (c) (d) Figure 1.State (and degenerate) signals (a) (or 1 (2)) (b) (0 (2) ) (c) (1 (2) ) (d) ( 0 (2) ). Definition 3. The inversion of a w-bit state signal I is represented by I with each of its bits inverted. The inversion of I( ) (or I (2) ) can be interpreted either as the same type state signal with value inverted, i.e. ( ) (or (2) ), or as a different type state signal with the same value, i.e. I (2) (I ( ) ), and this can be explicitly indicated if needed. In this paper, we alway interpret state signal inversions as the inversions of their types without changes in their values (which may be changed by shift switches as shown below). And, if necessary, we convert I( ) to I, I(2) to (see Figure 2). Figure 2 The conversions of regular and state signals and the inversions of state signals. Note that the bit sequence inside parentheses represents the corresponding state signal. It will soon be clear that the primary reason for employing state signals is that we can add small integers through shifting, instead of doing logic operations. This is particularly useful when a sequence of such additions must be performed. A shift switch processing 2-bit state signals is defined below: 3

5 Definition 4. Given state signals X (2), Y (2) called shift_in and control, respectively, a shift switch S<2, 1> is a trans-gate-based digital filter which realizes the function F (X (2), Y (2) )=(R (2), C (2) ), where outputs R (2) and C (2) are called shift_out and rotation signals, respectively, and the following hold: (a) X+Y = 2Q+R for some integers Q, R of 0 or 1; (b) C( ) =Q; Note that the shift switch definition is independent of implementation details. However, once a shift switch is implemented, the types of its inputs/outputs signals must be fixed. For example, the output C (2) can be implemented as C( ) or C(2) i.e. Q or.we may associate the quadruple, (X( ),Y( ),R(2), ), called specification, with a shift switch to describe its inputs/outputs signal types if necessary. The Shift switch S<2, 1> is also called basic (shift) switch. Figure 3. A shift switch S<2, 1> specified as: (X( ),Y( ),R(2), ) and sample inputs/outputs. Figure 3 shows such a switch. Since R=(X+Y) mod 2; and Q= =, the shift_out signal can be seen as the shift_in signal with Y bits cyclically shifted, i.e. the shifting is controlled by Y. We refer to R as the remainder of (X+Y) over 2, to Q as the quotient of (X+Y) over 2, and to Y as the state of the switch. When a signal propagates through an array of cascaded shift switches, we also call it a shifting signal. Definition 5. (extension of definition 4) (1) If C (2) is not produced, i.e. the output of the function consists of R only, the shift switch is denoted by S <2,1>. It is functionally identical to the well-known 2 2 barrel shifter (processing state signals) [19]; (2) if R (2) instead of R (2) is produced, the shift switch is denoted by <2, 1>; Note that, in general, a shift switch which processes w-bit state signals with a d-bit control signal is denoted by S<w, d>. The significance of a shift switch lies in the fact that it can sum two small integers Y and X by essentially letting X pass through one transistor that is preset by the control signal Y. Also since both shifting and control signals are state signals, we can use the shift-out of a switch to control another switch, thus we may 4

6 easily organize a network of shift switches (or enhanced reconfigurable buses) for the desired arithmetic computation. 3. The trans-gate based CMOS implementation of shift switches Figure 4 illustrates some trans-gate based schematics of basic shift switches: S<2,1>, S <2,1> and <2,1>. It is clear that if the trans-gates are preset by state signal Y, i.e. Y is received properly (about T inv ) earlier than X then for all of them the delay from X to R (or Q) can be specified as T trans.+ T inv, T trans is a trans-gate delay, T inv is the delay of an inverter with a small standard load (we assume that 4T inv + 2T trans =T add or 4 (T inv + T trans ) =T add [12]). For the purpose of our special-purpose application in this paper, the trans-gates of a switch may be preset by two zero bits (or state signal 0 ), in which case, all trans-gates of the switch are open, and the shift in X and shift out R are disconnected. In section 5 we will use the trans-gate based switches S <2,1> and <2,1> to construct a switch array for the computation of prefix row-parities. (a) (b) (c) Figure 4. The trans-gate-based schematics (a) S<2,1>: (X(2),Y( ),R(2),Q); (b) S <2,1>:(X(2),Y( ),R(2));. (c) <2,1>:(X(2),Y( ),R( ) ); 4. The Precharged CMOS implementation of shift switches and switch units Figures 5 illustrates two pass-transistor-based schematics of basic shift switches: S<2,1> (in the doted boxes). The corresponding modified switches with a state signal generator of Y added, are called n-switch 5

7 (Figures 5.a and 5.c) and p-switch (Figure 5.b and 5.d), respectively. Once the switches are precharged (high for n-switch, low for p-switch) and preset, the discharging signal X(2) (X( )) from the shift-in port will pull-down (pull-up) the data path to yield shift-out R(2) (R( )) and Q ( ). To improve the efficiency of discharging, we cascade a small number (four in the proposed scheme) of switches of the same type to form an n- (or p-) switch unit, as illustrated in Figure 6. The discharging process now yields the following (modulo 2) results (refer to Figure 6): u=(x+a) mod 2, v=(x+a+b) mod 2, w=(x+a+b+c) mod 2, z=(x+a+b+c+d) mod 2=R, and a =(X+a) DIV 2 (i.e. the quotient of (X+a) over 2), b =(X+a+b ) DIV 2, c =(X+a+b+c) DIV 2, z =(X+a+b+c+d) DIV 2. The complete process is composed of two phases that we describe next: A. Precharge phase, in this phase the following should occur (we use n-unit as example, see Figure 6.a), (1) E (tri-state enable signal) is set to 0, i.e. the output of each tri-state internal bus driver is in Hi-Z; (2) a bit data item is loaded in the corresponding register (at the beginning the item is the input bit of the PE associated with the switch), which preset each switch to the corresponding state; (3) prec/eval (precharge and evaluation signal) is set to 0, which starts recharging all switches of the unit in parallel. When the precharge is done, the semaphore q=0 and R=11 are produced. If a row contains more than one switch unit, then the units are cascaded in an alternative (n-p) chain form. The precharge phase for p-unit can be described similarly (refer to Figure 6.b), with the results of q=1, and R=00, which may become the input X of the following n-unit. Evaluation phase, in this phase the following should occur (see Figure 6.a), (1) prec/eval is set to 1; (2) when the discharging state signal X =01 (or 10) arrives, its bit 0 pull-down the preset data path along the unit to produce the (modulo 2) results (i.e. u, v, w, z and a, b, c, d ) for each unit as described above, and to produce a semaphore q=1 and R=01 (or 10); (3) if the signal E from PE_r is 1, read output bits (i.e.u, v, w, z ), otherwise no outputs; (4) if signal E from PE_r is 1, each PE triggers register-load operation (loading the corresponding a, b, c, d ) otherwise there is no loading (5) set prec/eval back to 0, i.e. start precharge again (and keep in precharge phase until next evaluation begin). If a row contains more than one switch unit, the discharging process can propagate from one switch unit to another automatically, i.e. the whole process will be a domino discharging process. Note that the PEs of each switch unit here can be simply driven by the semaphore of the unit, in other words, all operations here can be driven by the semaphore after initialization. 6

8 (a) (b) (c) (d) Figure 5. The pass-transistor-based schematics of (a) n-switch S<2,1>: (X( ),Y( ),R(2), ) and (b) p-switch (c and d) the alternative representations of n-switch S<2, 1> and p-switch. (a) 7

9 (a) Figure 6. (a) The n-switch unit and (b) the p-switch unit. 5. The parallel prefix counting network architecture We are now in a position to present our prefix counting network architecture and the prefix counting algorithm. Figure 7 shows the block diagram of a complete prefix counting network with input size of N-1=63. The mesh consists of n=8 rows, each featuring two cascaded (an n- and a p-) switch units as described in section 4. To simplify the network control, we use an additional column of trans-gate-based shift switch array on the left part of the mesh (see Figure 7). Let the input of the column switch array be a sequence of (p-type) state signals, b1, b2,..b7, and the output be a sequence of regular signals, p1, p2,.. p7. Clearly the pi=(b1+..+bi) mod 2. for all i (Note that it is slower than the precharged switch array and generates no semaphore, however, the computation does not require two phases). The beginning of each row consists of a row processing element (PE_r) which receives semaphore from the previous row, and controls a 2-input multiplexer, and an input state signal generator (consisting of two tri-state buffers). The computation algorithm is spelled out as follows: 8

10 1. Initial stage {to compute and output the least significant bits of the prefix counts} Step 1. all PE load their input bits into the registers; Step 2. each unit starts the initial precharge phase; Step 3. PE_r sets select signal to 0, such that, the right input (i.e. 0) of each MUX is selected; Step 4. PE_r sets Er=1, each row begin domino discharging; Step 5. PE_r sets E=0 (i.e. no output and register loading, refer to evaluation phase (4) in the last ection); Step 6. i *(T inv +T trans ) time after the semaphore value of 1 is received, the i-th PE_r set select signal to 1 Step 7. the i-th PE_r sets E=1 (i.e. to output and load register); {note after this step the precharge is automatically driven by semaphores and is done in parallel to the operations for charging controls} 2. Main stage, consists of logn-1 iterations of the following {note: different rows get into the main stage at different time, the first row gets into the stage earliest, the last row, the latest} Step 8. PE_r sets select signal to 0, such that, the right input (i.e. 0) of each MUX is selected; Step 9. PE_r sets Er=1, each row begin domino discharging; Step 10. PE_r sets E=0 (i.e. no output and register loading); Step 11. PE_r sets select signal to 1; Step 12. PE_r sets Er=1, each row begin domino discharging; Step 13. PE_r sets E=1 (i.e. output and register loading). The algorithm can be interpreted as follows: In the initial stage each row computes the least significant bit of the sum of the row (steps 1 to 5). The results (called the parity-bits of the rows) then are prefix summed by the column switch array, which takes about i * (T inv +T trans ) time to get the i-th prefix sum computed, so i * (T inv +T trans ) time after the first semaphore value of 1 is received, the i-the row start to compute the LSBs of the global prefix sums for the row (Steps 6 and 7); Once the inital stage is passed, each row gets into the main stage, and in the stage, the process is similar to the inital stage except that there is no waiting for the prefix sum of the parity-bits of the lower rows (the waiting has been made in the intial stage), the data are always available and the computations are for the remaining bits of the prefix sums. In fact, the column switch array takes a pipeline process to produce the bits of the global prefix counts of rows. From the implementation point of view the algorithm is very simple, in fact, every step can be easily driven by the value of the corresponding semaphore, with little additional logic devices needed to realize the controls. 9

11 Now the running time and VLSI area can be estimated as follows: the initial stage takes about 2 T d (two domino discharging processes along a row) + T add (waiting time, i.e. column switch array delay time); the main stage takes time (logn-1)*2 T d (two domino discharging processes along a row) + logn, here is the time from a switch of the column array receives an input to the beginning of domino discharging of the row next to the switch, which is about T trans + T inv + 2T inv (or 3T inv ) < T add So the total delay can be specified as (logn)( T d + )+ T add here T d is the delay of discharging 4 cascaded precharged pass-transistors, T add is an full adder delay =4(T inv +T trans ). Since to prefix a large array of binary number, say N 2 10 is not realistic, we assume that N<2 10. Under this assumption, it is easy to verify that our processor is significantly (at least 50%) faster than any known processors, say, tree of adders [5], or a processor with the same structure as ours but each shift switch is substituted by a half adder (half-adder-based processor for short). Since each pass-transistor-based switch is about half size of a half adder, the total area can be specified as 0.5(N+ ) A h, where A h is a half adder area equivalent (note registers and basic controls devices are not counted because they are necessary for any scheme to accomplish the computation). This area is significantly smaller than both half adder-based processor (about 50% less), and the tree of half adders processor (which requires an area (NlogN -1.5N+1)A h ). Note that the half adder-based processor requires significantly larger number of control devices because that it does not generate semaphore (even it uses the traditional domino logic technique). 10

12 Figure 7. Parallel prefix counting network. 11

13 7 Concluding remarks The main contribution of this paper was to propose a parallel prefix counting network of size N-1 (i.e.with N-1 input bits, for N=2 2k =n*n). Our design is a two level architecture involving a total of N+ cascaded simple shift switches, with N pass-transistor-based shift switches along with trans-gate-based shift switches. The resulting network achieves a delay of (logn)( T d + ) + T add where T d is the delay of discharging 4 cascaded precharged pass-transistors, and T add is an full adder delay. It is easily seen that this delay is at least 50% faster than any design known to us for N Our architecture is also area-compact and is constructed by using CMOS, NP domino logic technique on buses with shift switches. The processing elements require a very simple asynchronous control, being driven by semaphores produced at the end of each row s domino discharging process. This greatly simplifies the hardware requirement, and allows the full inherent speed of the computation to be utilized. 12

14 REFERENCES [1] J. Jang, H. Park and V. K. Prasanna, A bit model of the reconfigurable mesh, Proc. of the Workshop on Reconfigurable Architectures, the 8th International Parallel Processing Symposium, Cancun, Maxico, [2] R. H. Krambeck, C. M. Lee, and H.S. Law, High-Speed Compact Circuits with CMOS, IEEE Journal of Solid-State Circuits, Vol SC-17, No. 3, June, [3] F. T. Leighton, Parallel algorithms and architectures: Arrays.trees.hypercubes, Morgan kaufmann publishers, 1992, 49. [4] H. Li and M Maresca Polymorphic-torus architecture for computer vision, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 3, 1989, [5] H. Li and Q. Stout, Reconfigurable massively parallel computers, Prentice-Hall, [6] R. Lin, Reconfigurable Buses with Shift Switching - VLSI Radix Sort, Proc. of International Conference on Parallel Processing, St. Charles, Illinois, 1992, Vol III, pp [7] R. Lin, Shift switching and novel arithmetic schemes, in Proc. of 29th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov Vol [8] R. Lin, and S. Olariu, Reconfigurable buses with shift switching --concept and applications, in IEEE Trans. on Parallel And Distributed System.Vol. 6, No 1, January 1995, [9] R. Lin and Olariu, Reconfigurable network architecture for parallel prefix counting, Proc. of the Workshop on Reconfigurable Architectures, the 10th International Parallel Processing Symposium, Hawaii, [10] R. Lin, and S. Olariu, Reconfigurable shift switching parallel comparators, VLSI Design: An International Journal of Custom-Chip Design, Simulation, and Testing, VOL. 9, No. 1. March, [11] R. Lin, K. Nakano, S. Olariu, M.C. Pintoti, J.L. Schwing, A.Y. Zomaya, Scalable hardware-algorithms for binary prefix sums, in IEEE Transactions on Parallel and Distributed Systems, [12] LSI logic 1.0 micron cell-based products data book, LSI logic corporation, Milpitas, Califoria, [13] R. Miller, V. K. P. Kumar, D. Reisis, and Q. F. Stout, Parallel Computations on Reconfigurable Meshes, IEEE Transactions on Computers, 42 (1993), [14] A. Mukherjee, Introduction to nmos & CMOS VLSI System Design, Prentice-Hall, Englewood, NJ, [15] D. B. Shu, L. W. Chow, and J. G. Nash, A constant addressable, bit serial associate processor, Proc. of the IEEE workshop on VLSI signal processing, Montery CA, Nov [16] E. E. Swartzlander, Jr., Computer Arithmetic Vol. 1 and 2 (IEEE CSP, CA, 1990). [17] J. D. Ullman, Computational Aspect of VLSI (Computer Science Press, MD, 1983). [18] J. Wang, Z. Wang, G. A. Jullien, S. S. Bizzan, W. Luo and W. C. Miller, Circuit Driven Delay Optimization of EMODL Carry Lookahead Adders, Proc. of 28th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov [19] N. Weste and K. Eshraghian, PRINCIPLES OF CMOS VLSI DESIGN, A systems Perspective (Second Edition), A,ddison-Wesley Publishing Company,

An Efficient List-Ranking Algorithm on a Reconfigurable Mesh with Shift Switching

An Efficient List-Ranking Algorithm on a Reconfigurable Mesh with Shift Switching IJCSNS International Journal of Computer Science and Network Security, VOL.7 No.6, June 2007 209 An Efficient List-Ranking Algorithm on a Reconfigurable Mesh with Shift Switching Young-Hak Kim Kumoh National

More information

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry High Performance Memory Read Using Cross-Coupled Pull-up Circuitry Katie Blomster and José G. Delgado-Frias School of Electrical Engineering and Computer Science Washington State University Pullman, WA

More information

Bit Summation on the Recongurable Mesh. Martin Middendorf? Institut fur Angewandte Informatik

Bit Summation on the Recongurable Mesh. Martin Middendorf? Institut fur Angewandte Informatik Bit Summation on the Recongurable Mesh Martin Middendorf? Institut fur Angewandte Informatik und Formale Beschreibungsverfahren, Universitat Karlsruhe, D-76128 Karlsruhe, Germany mmi@aifb.uni-karlsruhe.de

More information

ARITHMETIC operations based on residue number systems

ARITHMETIC operations based on residue number systems IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133 Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues A. B. Premkumar, Senior Member,

More information

A 65nm Parallel Prefix High Speed Tree based 64-Bit Binary Comparator

A 65nm Parallel Prefix High Speed Tree based 64-Bit Binary Comparator A 65nm Parallel Prefix High Speed Tree based 64-Bit Binary Comparator Er. Deepak sharma Student M-TECH, Yadavindra College of Engineering, Punjabi University, Guru Kashi Campus,Talwandi Sabo. Er. Parminder

More information

Partitioned Branch Condition Resolution Logic

Partitioned Branch Condition Resolution Logic 1 Synopsys Inc. Synopsys Module Compiler Group 700 Middlefield Road, Mountain View CA 94043-4033 (650) 584-5689 (650) 584-1227 FAX aamirf@synopsys.com http://aamir.homepage.com Partitioned Branch Condition

More information

A Parallel Algorithm for Minimum Cost Path Computation on Polymorphic Processor Array

A Parallel Algorithm for Minimum Cost Path Computation on Polymorphic Processor Array A Parallel Algorithm for Minimum Cost Path Computation on Polymorphic Processor Array P. Baglietto, M. Maresca and M. Migliardi DIST - University of Genoa via Opera Pia 13-16145 Genova, Italy email baglietto@dist.unige.it

More information

On-Line Error Detecting Constant Delay Adder

On-Line Error Detecting Constant Delay Adder On-Line Error Detecting Constant Delay Adder Whitney J. Townsend and Jacob A. Abraham Computer Engineering Research Center The University of Texas at Austin whitney and jaa @cerc.utexas.edu Parag K. Lala

More information

Column decoder using PTL for memory

Column decoder using PTL for memory IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 5, Issue 4 (Mar. - Apr. 2013), PP 07-14 Column decoder using PTL for memory M.Manimaraboopathy

More information

Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor

Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998 707 Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor James A. Farrell and Timothy C. Fischer Abstract The logic and circuits

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017 Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of

More information

Increasing interconnection network connectivity for reducing operator complexity in asynchronous vision systems

Increasing interconnection network connectivity for reducing operator complexity in asynchronous vision systems Increasing interconnection network connectivity for reducing operator complexity in asynchronous vision systems Valentin Gies and Thierry M. Bernard ENSTA, 32 Bd Victor 75015, Paris, FRANCE, contact@vgies.com,

More information

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Design and Implementation of CVNS Based Low Power 64-Bit Adder Design and Implementation of CVNS Based Low Power 64-Bit Adder Ch.Vijay Kumar Department of ECE Embedded Systems & VLSI Design Vishakhapatnam, India Sri.Sagara Pandu Department of ECE Embedded Systems

More information

IMPLEMENTATION OF DIGITAL CMOS COMPARATOR USING PARALLEL PREFIX TREE

IMPLEMENTATION OF DIGITAL CMOS COMPARATOR USING PARALLEL PREFIX TREE Int. J. Engg. Res. & Sci. & Tech. 2014 Nagaswapna Manukonda and H Raghunadh Rao, 2014 Research Paper ISSN 2319-5991 www.ijerst.com Vol. 3, No. 4, November 2014 2014 IJERST. All Rights Reserved IMPLEMENTATION

More information

Madhusudan Nigam and Sartaj Sahni. University of Florida. Gainesville, FL <Revised July 1992> Technical Report 92-5 ABSTRACT

Madhusudan Nigam and Sartaj Sahni. University of Florida. Gainesville, FL <Revised July 1992> Technical Report 92-5 ABSTRACT Sorting n Numbers On n n Reconfigurable Meshes With Buses* Madhusudan Nigam and Sartaj Sahni University of Florida Gainesville, FL 32611 Technical Report 92-5 ABSTRACT We show how column

More information

Combinational Circuits

Combinational Circuits Combinational Circuits Q. What is a combinational circuit? A. Digital: signals are or. A. No feedback: no loops. analog circuits: signals vary continuously sequential circuits: loops allowed (stay tuned)

More information

Tailoring the 32-Bit ALU to MIPS

Tailoring the 32-Bit ALU to MIPS Tailoring the 32-Bit ALU to MIPS MIPS ALU extensions Overflow detection: Carry into MSB XOR Carry out of MSB Branch instructions Shift instructions Slt instruction Immediate instructions ALU performance

More information

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,

More information

6. Combinational Circuits. Building Blocks. Digital Circuits. Wires. Q. What is a digital system? A. Digital: signals are 0 or 1.

6. Combinational Circuits. Building Blocks. Digital Circuits. Wires. Q. What is a digital system? A. Digital: signals are 0 or 1. Digital Circuits 6 Combinational Circuits Q What is a digital system? A Digital: signals are or analog: signals vary continuously Q Why digital systems? A Accurate, reliable, fast, cheap Basic abstractions

More information

Performance Evaluation of Digital CMOS Comparator Using Parallel Prefix Tree.

Performance Evaluation of Digital CMOS Comparator Using Parallel Prefix Tree. Performance Evaluation of Digital CMOS Comparator Using Parallel Prefix Tree. 1 D. Swathi 2 T. Arun Prasad 1 PG Scholar, Sri Sarada Institute of science and Technology, Anantharam (v), Bhongir(m), Nalgonda

More information

FLOATING POINT ADDERS AND MULTIPLIERS

FLOATING POINT ADDERS AND MULTIPLIERS Concordia University FLOATING POINT ADDERS AND MULTIPLIERS 1 Concordia University Lecture #4 In this lecture we will go over the following concepts: 1) Floating Point Number representation 2) Accuracy

More information

Design of Low Power Digital CMOS Comparator

Design of Low Power Digital CMOS Comparator Design of Low Power Digital CMOS Comparator 1 A. Ramesh, 2 A.N.P.S Gupta, 3 D.Raghava Reddy 1 Student of LSI&ES, 2 Assistant Professor, 3 Associate Professor E.C.E Department, Narasaraopeta Institute of

More information

VLSI for Multi-Technology Systems (Spring 2003)

VLSI for Multi-Technology Systems (Spring 2003) VLSI for Multi-Technology Systems (Spring 2003) Digital Project Due in Lecture Tuesday May 6th Fei Lu Ping Chen Electrical Engineering University of Cincinnati Abstract In this project, we realized the

More information

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain Massively Parallel Computing on Silicon: SIMD Implementations V.M.. Brea Univ. of Santiago de Compostela Spain GOAL Give an overview on the state-of of-the- art of Digital on-chip CMOS SIMD Solutions,

More information

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) K.Prasad Babu 2 M.tech (Ph.d) hanumanthurao19@gmail.com 1 kprasadbabuece433@gmail.com 2 1 PG scholar, VLSI, St.JOHNS

More information

A Novel Design of 32 Bit Unsigned Multiplier Using Modified CSLA

A Novel Design of 32 Bit Unsigned Multiplier Using Modified CSLA A Novel Design of 32 Bit Unsigned Multiplier Using Modified CSLA Chandana Pittala 1, Devadas Matta 2 PG Scholar.VLSI System Design 1, Asst. Prof. ECE Dept. 2, Vaagdevi College of Engineering,Warangal,India.

More information

VERY large scale integration (VLSI) design for power

VERY large scale integration (VLSI) design for power IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 7, NO. 1, MARCH 1999 25 Short Papers Segmented Bus Design for Low-Power Systems J. Y. Chen, W. B. Jone, Member, IEEE, J. S. Wang,

More information

VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH THE EFFICIENT MULTIPLICATIVE INVERSE UNIT

VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH THE EFFICIENT MULTIPLICATIVE INVERSE UNIT VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH THE EFFICIENT MULTIPLICATIVE INVERSE UNIT K.Sandyarani 1 and P. Nirmal Kumar 2 1 Research Scholar, Department of ECE, Sathyabama

More information

On the Implementation of a Three-operand Multiplier

On the Implementation of a Three-operand Multiplier On the Implementation of a Three-operand Multiplier Robert McIlhenny rmcilhen@cs.ucla.edu Computer Science Department University of California Los Angeles, CA 9002 Miloš D. Ercegovac milos@cs.ucla.edu

More information

on Mesh Connected Computers and Meshes with Multiple Broadcasting Ion Stoica Abstract

on Mesh Connected Computers and Meshes with Multiple Broadcasting Ion Stoica Abstract Time-Optimal Algorithms for Generalized Dominance Computation and Related Problems on Mesh Connected Computers and Meshes with Multiple Broadcasting Ion Stoica Abstract The generalized dominance computation

More information

A novel technique for fast multiplication

A novel technique for fast multiplication INT. J. ELECTRONICS, 1999, VOL. 86, NO. 1, 67± 77 A novel technique for fast multiplication SADIQ M. SAIT², AAMIR A. FAROOQUI GERHARD F. BECKHOFF and In this paper we present the design of a new high-speed

More information

VARUN AGGARWAL

VARUN AGGARWAL ECE 645 PROJECT SPECIFICATION -------------- Design A Microprocessor Functional Unit Able To Perform Multiplication & Division Professor: Students: KRIS GAJ LUU PHAM VARUN AGGARWAL GMU Mar. 2002 CONTENTS

More information

THE latest generation of microprocessors uses a combination

THE latest generation of microprocessors uses a combination 1254 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 30, NO. 11, NOVEMBER 1995 A 14-Port 3.8-ns 116-Word 64-b Read-Renaming Register File Creigton Asato Abstract A 116-word by 64-b register file for a 154 MHz

More information

Many ways to build logic out of MOSFETs

Many ways to build logic out of MOSFETs Many ways to build logic out of MOSFETs pass transistor logic (most similar to the first switch logic we saw) static CMOS logic (what we saw last time) dynamic CMOS logic Clock=0 precharges X through the

More information

Implementation of ALU Using Asynchronous Design

Implementation of ALU Using Asynchronous Design IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) ISSN: 2278-2834, ISBN: 2278-8735. Volume 3, Issue 6 (Nov. - Dec. 2012), PP 07-12 Implementation of ALU Using Asynchronous Design P.

More information

Low Power using Match-Line Sensing in Content Addressable Memory S. Nachimuthu, S. Ramesh 1 Department of Electrical and Electronics Engineering,

Low Power using Match-Line Sensing in Content Addressable Memory S. Nachimuthu, S. Ramesh 1 Department of Electrical and Electronics Engineering, Low Power using Match-Line Sensing in Content Addressable Memory S. Nachimuthu, S. Ramesh 1 Department of Electrical and Electronics Engineering, K.S.R College of Engineering, Tiruchengode, Tamilnadu,

More information

Design of Parallel Self-Timed Adder

Design of Parallel Self-Timed Adder Design of Parallel Self-Timed Adder P.S.PAWAR 1, K.N.KASAT 2 1PG, Dept of EEE, PRMCEAM, Badnera, Amravati, MS, India. 2Assistant Professor, Dept of EXTC, PRMCEAM, Badnera, Amravati, MS, India. ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Fast Arithmetic. Philipp Koehn. 19 October 2016

Fast Arithmetic. Philipp Koehn. 19 October 2016 Fast Arithmetic Philipp Koehn 19 October 2016 1 arithmetic Addition (Immediate) 2 Load immediately one number (s0 = 2) li $s0, 2 Add 4 ($s1 = $s0 + 4 = 6) addi $s1, $s0, 4 Subtract 3 ($s2 = $s1-3 = 3)

More information

Chapter 4. Combinational Logic

Chapter 4. Combinational Logic Chapter 4. Combinational Logic Tong In Oh 1 4.1 Introduction Combinational logic: Logic gates Output determined from only the present combination of inputs Specified by a set of Boolean functions Sequential

More information

Review of Last lecture. Review ALU Design. Designing a Multiplier Shifter Design Review. Booth s algorithm. Today s Outline

Review of Last lecture. Review ALU Design. Designing a Multiplier Shifter Design Review. Booth s algorithm. Today s Outline Today s Outline San Jose State University EE176-SJSU Computer Architecture and Organization Lecture 5 HDL, ALU, Shifter, Booth Algorithm Multiplier & Divider Instructor: Christopher H. Pham Review of Last

More information

END-TERM EXAMINATION

END-TERM EXAMINATION (Please Write your Exam Roll No. immediately) END-TERM EXAMINATION DECEMBER 2006 Exam. Roll No... Exam Series code: 100919DEC06200963 Paper Code: MCA-103 Subject: Digital Electronics Time: 3 Hours Maximum

More information

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver E.Kanniga 1, N. Imocha Singh 2,K.Selva Rama Rathnam 3 Professor Department of Electronics and Telecommunication, Bharath

More information

Designing NULL Convention Combinational Circuits to Fully Utilize Gate-Level Pipelining for Maximum Throughput

Designing NULL Convention Combinational Circuits to Fully Utilize Gate-Level Pipelining for Maximum Throughput Designing NULL Convention Combinational Circuits to Fully Utilize Gate-Level Pipelining for Maximum Throughput Scott C. Smith University of Missouri Rolla, Department of Electrical and Computer Engineering

More information

Low-Power FIR Digital Filters Using Residue Arithmetic

Low-Power FIR Digital Filters Using Residue Arithmetic Low-Power FIR Digital Filters Using Residue Arithmetic William L. Freking and Keshab K. Parhi Department of Electrical and Computer Engineering University of Minnesota 200 Union St. S.E. Minneapolis, MN

More information

A Scalable Architecture for Montgomery Multiplication

A Scalable Architecture for Montgomery Multiplication A Scalable Architecture for Montgomery Multiplication Alexandre F. Tenca and Çetin K. Koç Electrical & Computer Engineering Oregon State University, Corvallis, Oregon 97331 {tenca,koc}@ece.orst.edu Abstract.

More information

Computer Architecture: Part III. First Semester 2013 Department of Computer Science Faculty of Science Chiang Mai University

Computer Architecture: Part III. First Semester 2013 Department of Computer Science Faculty of Science Chiang Mai University Computer Architecture: Part III First Semester 2013 Department of Computer Science Faculty of Science Chiang Mai University Outline Decoders Multiplexers Registers Shift Registers Binary Counters Memory

More information

Area And Power Optimized One-Dimensional Median Filter

Area And Power Optimized One-Dimensional Median Filter Area And Power Optimized One-Dimensional Median Filter P. Premalatha, Ms. P. Karthika Rani, M.E., PG Scholar, Assistant Professor, PA College of Engineering and Technology, PA College of Engineering and

More information

Logic and Computer Design Fundamentals. Chapter 2 Combinational Logic Circuits. Part 3 Additional Gates and Circuits

Logic and Computer Design Fundamentals. Chapter 2 Combinational Logic Circuits. Part 3 Additional Gates and Circuits Logic and Computer Design Fundamentals Chapter 2 Combinational Logic Circuits Part 3 Additional Gates and Circuits Charles Kime & Thomas Kaminski 28 Pearson Education, Inc. (Hyperlinks are active in View

More information

Dynamic CMOS Logic Gate

Dynamic CMOS Logic Gate Dynamic CMOS Logic Gate In dynamic CMOS logic a single clock can be used to accomplish both the precharge and evaluation operations When is low, PMOS pre-charge transistor Mp charges Vout to Vdd, since

More information

Henry Lin, Department of Electrical and Computer Engineering, California State University, Bakersfield Lecture 7 (Digital Logic) July 24 th, 2012

Henry Lin, Department of Electrical and Computer Engineering, California State University, Bakersfield Lecture 7 (Digital Logic) July 24 th, 2012 Henry Lin, Department of Electrical and Computer Engineering, California State University, Bakersfield Lecture 7 (Digital Logic) July 24 th, 2012 1 Digital vs Analog Digital signals are binary; analog

More information

On Designs of Radix Converters Using Arithmetic Decompositions

On Designs of Radix Converters Using Arithmetic Decompositions On Designs of Radix Converters Using Arithmetic Decompositions Yukihiro Iguchi 1 Tsutomu Sasao Munehiro Matsuura 1 Dept. of Computer Science, Meiji University, Kawasaki 1-51, Japan Dept. of Computer Science

More information

A Universal Test Pattern Generator for DDR SDRAM *

A Universal Test Pattern Generator for DDR SDRAM * A Universal Test Pattern Generator for DDR SDRAM * Wei-Lun Wang ( ) Department of Electronic Engineering Cheng Shiu Institute of Technology Kaohsiung, Taiwan, R.O.C. wlwang@cc.csit.edu.tw used to detect

More information

Efficient Self-Reconfigurable Implementations Using On-Chip Memory

Efficient Self-Reconfigurable Implementations Using On-Chip Memory 10th International Conference on Field Programmable Logic and Applications, August 2000. Efficient Self-Reconfigurable Implementations Using On-Chip Memory Sameer Wadhwa and Andreas Dandalis University

More information

DUE to the high computational complexity and real-time

DUE to the high computational complexity and real-time IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 3, MARCH 2005 445 A Memory-Efficient Realization of Cyclic Convolution and Its Application to Discrete Cosine Transform Hun-Chen

More information

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors Brent Bohnenstiehl and Bevan Baas Department of Electrical and Computer Engineering University of California, Davis {bvbohnen,

More information

A Quadruple Precision and Dual Double Precision Floating-Point Multiplier

A Quadruple Precision and Dual Double Precision Floating-Point Multiplier A Quadruple Precision and Dual Double Precision Floating-Point Multiplier Ahmet Akkaş Computer Engineering Department Koç University 3445 Sarıyer, İstanbul, Turkey ahakkas@kuedutr Michael J Schulte Department

More information

Design of Low Power Asynchronous Parallel Adder Benedicta Roseline. R 1 Kamatchi. S 2

Design of Low Power Asynchronous Parallel Adder Benedicta Roseline. R 1 Kamatchi. S 2 IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 04, 2015 ISSN (online): 2321-0613 Design of Low Power Asynchronous Parallel Adder Benedicta Roseline. R 1 Kamatchi. S 2

More information

Arithmetic Logic Unit. Digital Computer Design

Arithmetic Logic Unit. Digital Computer Design Arithmetic Logic Unit Digital Computer Design Arithmetic Circuits Arithmetic circuits are the central building blocks of computers. Computers and digital logic perform many arithmetic functions: addition,

More information

COMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital

COMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital hardware modules that accomplish a specific information-processing task. Digital systems vary in

More information

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE 754-2008 Standard M. Shyamsi, M. I. Ibrahimy, S. M. A. Motakabber and M. R. Ahsan Dept. of Electrical and Computer Engineering

More information

Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology

Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.15, NO.1, FEBRUARY, 2015 http://dx.doi.org/10.5573/jsts.2015.15.1.077 Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network

More information

Model EXAM Question Bank

Model EXAM Question Bank VELAMMAL COLLEGE OF ENGINEERING AND TECHNOLOGY, MADURAI Department of Information Technology Model Exam -1 1. List the main difference between PLA and PAL. PLA: Both AND and OR arrays are programmable

More information

A Single/Double Precision Floating-Point Reciprocal Unit Design for Multimedia Applications

A Single/Double Precision Floating-Point Reciprocal Unit Design for Multimedia Applications A Single/Double Precision Floating-Point Reciprocal Unit Design for Multimedia Applications Metin Mete Özbilen 1 and Mustafa Gök 2 1 Mersin University, Engineering Faculty, Department of Computer Science,

More information

A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset

A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset M.Santhi, Arun Kumar S, G S Praveen Kalish, Siddharth Sarangan, G Lakshminarayanan Dept of ECE, National Institute

More information

of Soft Core Processor Clock Synchronization DDR Controller and SDRAM by Using RISC Architecture

of Soft Core Processor Clock Synchronization DDR Controller and SDRAM by Using RISC Architecture Enhancement of Soft Core Processor Clock Synchronization DDR Controller and SDRAM by Using RISC Architecture Sushmita Bilani Department of Electronics and Communication (Embedded System & VLSI Design),

More information

Fundamentals of Computer Systems

Fundamentals of Computer Systems Fundamentals of Computer Systems Combinational Logic Martha. Kim Columbia University Spring 6 / Combinational Circuits Combinational circuits are stateless. Their output is a function only of the current

More information

Floating Point Square Root under HUB Format

Floating Point Square Root under HUB Format Floating Point Square Root under HUB Format Julio Villalba-Moreno Dept. of Computer Architecture University of Malaga Malaga, SPAIN jvillalba@uma.es Javier Hormigo Dept. of Computer Architecture University

More information

FPGA IMPLEMENTATION OF EFFCIENT MODIFIED BOOTH ENCODER MULTIPLIER FOR SIGNED AND UNSIGNED NUMBERS

FPGA IMPLEMENTATION OF EFFCIENT MODIFIED BOOTH ENCODER MULTIPLIER FOR SIGNED AND UNSIGNED NUMBERS FPGA IMPLEMENTATION OF EFFCIENT MODIFIED BOOTH ENCODER MULTIPLIER FOR SIGNED AND UNSIGNED NUMBERS NUNAVATH.VENNELA (1), A.VIKAS (2) P.G.Scholor (VLSI SYSTEM DESIGN),TKR COLLEGE OF ENGINEERING (1) M.TECHASSISTANT

More information

Parallel, Single-Rail Self-Timed Adder. Formulation for Performing Multi Bit Binary Addition. Without Any Carry Chain Propagation

Parallel, Single-Rail Self-Timed Adder. Formulation for Performing Multi Bit Binary Addition. Without Any Carry Chain Propagation Parallel, Single-Rail Self-Timed Adder. Formulation for Performing Multi Bit Binary Addition. Without Any Carry Chain Propagation Y.Gowthami PG Scholar, Dept of ECE, MJR College of Engineering & Technology,

More information

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

Reference Sheet for C112 Hardware

Reference Sheet for C112 Hardware Reference Sheet for C112 Hardware 1 Boolean Algebra, Gates and Circuits Autumn 2016 Basic Operators Precedence : (strongest),, + (weakest). AND A B R 0 0 0 0 1 0 1 0 0 1 1 1 OR + A B R 0 0 0 0 1 1 1 0

More information

Combinational Logic. Prof. Wangrok Oh. Dept. of Information Communications Eng. Chungnam National University. Prof. Wangrok Oh(CNU) 1 / 93

Combinational Logic. Prof. Wangrok Oh. Dept. of Information Communications Eng. Chungnam National University. Prof. Wangrok Oh(CNU) 1 / 93 Combinational Logic Prof. Wangrok Oh Dept. of Information Communications Eng. Chungnam National University Prof. Wangrok Oh(CNU) / 93 Overview Introduction 2 Combinational Circuits 3 Analysis Procedure

More information

Parallelized Radix-4 Scalable Montgomery Multipliers

Parallelized Radix-4 Scalable Montgomery Multipliers Parallelized Radix-4 Scalable Montgomery Multipliers Nathaniel Pinckney and David Money Harris 1 1 Harvey Mudd College, 301 Platt. Blvd., Claremont, CA, USA e-mail: npinckney@hmc.edu ABSTRACT This paper

More information

High-Performance Full Adders Using an Alternative Logic Structure

High-Performance Full Adders Using an Alternative Logic Structure Term Project EE619 High-Performance Full Adders Using an Alternative Logic Structure by Atulya Shivam Shree (10327172) Raghav Gupta (10327553) Department of Electrical Engineering, Indian Institure Technology,

More information

Performance of Constant Addition Using Enhanced Flagged Binary Adder

Performance of Constant Addition Using Enhanced Flagged Binary Adder Performance of Constant Addition Using Enhanced Flagged Binary Adder Sangeetha A UG Student, Department of Electronics and Communication Engineering Bannari Amman Institute of Technology, Sathyamangalam,

More information

Design of a Multiplier for Similar Base Numbers Without Converting Base Using a Data Oriented Memory

Design of a Multiplier for Similar Base Numbers Without Converting Base Using a Data Oriented Memory Journal of Computer & Robotics 7 (1), 2014 37-50 37 Design of a Multiplier for Similar Base Numbers Without Converting Base Using a Data Oriented Memory Majid Jafari *, Ali Broumandnia, Navid Habibi, Shahab

More information

A COMPARISON OF MESHES WITH STATIC BUSES AND HALF-DUPLEX WRAP-AROUNDS. and. and

A COMPARISON OF MESHES WITH STATIC BUSES AND HALF-DUPLEX WRAP-AROUNDS. and. and Parallel Processing Letters c World Scientific Publishing Company A COMPARISON OF MESHES WITH STATIC BUSES AND HALF-DUPLEX WRAP-AROUNDS DANNY KRIZANC Department of Computer Science, University of Rochester

More information

EECS150 - Digital Design Lecture 13 - Combinational Logic & Arithmetic Circuits Part 3

EECS150 - Digital Design Lecture 13 - Combinational Logic & Arithmetic Circuits Part 3 EECS15 - Digital Design Lecture 13 - Combinational Logic & Arithmetic Circuits Part 3 October 8, 22 John Wawrzynek Fall 22 EECS15 - Lec13-cla3 Page 1 Multiplication a 3 a 2 a 1 a Multiplicand b 3 b 2 b

More information

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra Binary Representation Computer Systems Information is represented as a sequence of binary digits: Bits What the actual bits represent depends on the context: Seminar 3 Numerical value (integer, floating

More information

ECE 331: N0. Professor Andrew Mason Michigan State University. Opening Remarks

ECE 331: N0. Professor Andrew Mason Michigan State University. Opening Remarks ECE 331: N0 ECE230 Review Professor Andrew Mason Michigan State University Spring 2013 1.1 Announcements Opening Remarks HW1 due next Mon Labs begin in week 4 No class next-next Mon MLK Day ECE230 Review

More information

Computer Arithmetic Multiplication & Shift Chapter 3.4 EEC170 FQ 2005

Computer Arithmetic Multiplication & Shift Chapter 3.4 EEC170 FQ 2005 Computer Arithmetic Multiplication & Shift Chapter 3.4 EEC170 FQ 200 Multiply We will start with unsigned multiply and contrast how humans and computers multiply Layout 8-bit 8 Pipelined Multiplier 1 2

More information

Extension and VLSI Implementation of the Majority-Gate Algorithm for Gray-Scale Morphological Operations

Extension and VLSI Implementation of the Majority-Gate Algorithm for Gray-Scale Morphological Operations Extension and VLSI Implementation of the Majority-Gate Algorithm for Gray-Scale Morphological Operations A Gasteratos, I Andreadis and Ph Tsalides Laboratory of Electronics Section of Electronics and Information

More information

COMP 303 Computer Architecture Lecture 6

COMP 303 Computer Architecture Lecture 6 COMP 303 Computer Architecture Lecture 6 MULTIPLY (unsigned) Paper and pencil example (unsigned): Multiplicand 1000 = 8 Multiplier x 1001 = 9 1000 0000 0000 1000 Product 01001000 = 72 n bits x n bits =

More information

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation. ISSN 2319-8885 Vol.03,Issue.32 October-2014, Pages:6436-6440 www.ijsetr.com Design and Modeling of Arithmetic and Logical Unit with the Platform of VLSI N. AMRUTHA BINDU 1, M. SAILAJA 2 1 Dept of ECE,

More information

CHAPTER 12 ARRAY SUBSYSTEMS [ ] MANJARI S. KULKARNI

CHAPTER 12 ARRAY SUBSYSTEMS [ ] MANJARI S. KULKARNI CHAPTER 2 ARRAY SUBSYSTEMS [2.4-2.9] MANJARI S. KULKARNI OVERVIEW Array classification Non volatile memory Design and Layout Read-Only Memory (ROM) Pseudo nmos and NAND ROMs Programmable ROMS PROMS, EPROMs,

More information

Jan Rabaey Homework # 7 Solutions EECS141

Jan Rabaey Homework # 7 Solutions EECS141 UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Last modified on March 30, 2004 by Gang Zhou (zgang@eecs.berkeley.edu) Jan Rabaey Homework # 7

More information

Digital Logic Design Exercises. Assignment 1

Digital Logic Design Exercises. Assignment 1 Assignment 1 For Exercises 1-5, match the following numbers with their definition A Number Natural number C Integer number D Negative number E Rational number 1 A unit of an abstract mathematical system

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK DESIGN OF QUATERNARY ADDER FOR HIGH SPEED APPLICATIONS MS. PRITI S. KAPSE 1, DR.

More information

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders Vol. 3, Issue. 4, July-august. 2013 pp-2266-2270 ISSN: 2249-6645 Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders V.Krishna Kumari (1), Y.Sri Chakrapani

More information

Design of an Efficient Architecture for Advanced Encryption Standard Algorithm Using Systolic Structures

Design of an Efficient Architecture for Advanced Encryption Standard Algorithm Using Systolic Structures Design of an Efficient Architecture for Advanced Encryption Standard Algorithm Using Systolic Structures 1 Suresh Sharma, 2 T S B Sudarshan 1 Student, Computer Science & Engineering, IIT, Khragpur 2 Assistant

More information

ADDERS AND MULTIPLIERS

ADDERS AND MULTIPLIERS Concordia University FLOATING POINT ADDERS AND MULTIPLIERS 1 Concordia University Lecture #4 In this lecture we will go over the following concepts: 1) Floating Point Number representation 2) Accuracy

More information

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3 Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number Chapter 3 Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number Chapter 3 3.1 Introduction The various sections

More information

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic A Novel Design of High Speed and Area Efficient De-Multiplexer Using Pass Transistor Logic K.Ravi PG Scholar(VLSI), P.Vijaya Kumari, M.Tech Assistant Professor T.Ravichandra Babu, Ph.D Associate Professor

More information

An Optimized Montgomery Modular Multiplication Algorithm for Cryptography

An Optimized Montgomery Modular Multiplication Algorithm for Cryptography 118 IJCSNS International Journal of Computer Science and Network Security, VOL.13 No.1, January 2013 An Optimized Montgomery Modular Multiplication Algorithm for Cryptography G.Narmadha 1 Asst.Prof /ECE,

More information

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 6T- SRAM for Low Power Consumption Mrs. J.N.Ingole 1, Ms.P.A.Mirge 2 Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 PG Student [Digital Electronics], Dept. of ExTC, PRMIT&R,

More information

1. Introduction. Raj Kishore Kumar 1, Vikram Kumar 2

1. Introduction. Raj Kishore Kumar 1, Vikram Kumar 2 ASIC Implementation and Comparison of Diminished-one Modulo 2 n +1 Adder Raj Kishore Kumar 1, Vikram Kumar 2 1 Shivalik Institute of Engineering & Technology 2 Assistant Professor, Shivalik Institute of

More information

A Comparative Study of Power Efficient SRAM Designs

A Comparative Study of Power Efficient SRAM Designs A Comparative tudy of Power Efficient RAM Designs Jeyran Hezavei, N. Vijaykrishnan, M. J. Irwin Pond Laboratory, Department of Computer cience & Engineering, Pennsylvania tate University {hezavei, vijay,

More information

Fixed-Width Recursive Multipliers

Fixed-Width Recursive Multipliers Fixed-Width Recursive Multipliers Presented by: Kevin Biswas Supervisors: Dr. M. Ahmadi Dr. H. Wu Department of Electrical and Computer Engineering University of Windsor Motivation & Objectives Outline

More information

Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques

Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques B.Bharathi 1, C.V.Subhaskar Reddy 2 1 DEPARTMENT OF ECE, S.R.E.C, NANDYAL 2 ASSOCIATE PROFESSOR, S.R.E.C, NANDYAL.

More information

Microcomputers. Outline. Number Systems and Digital Logic Review

Microcomputers. Outline. Number Systems and Digital Logic Review Microcomputers Number Systems and Digital Logic Review Lecture 1-1 Outline Number systems and formats Common number systems Base Conversion Integer representation Signed integer representation Binary coded

More information