This article was originally published in a journal published by Elsevier, and the attached copy is provided by Elsevier for the author's benefit and for the benefit of the author's institution, for non-commercial research and educational use including without limitation use in instruction at your institution, sending it to specific colleagues that you know, and providing a copy to your institution's administrator. All other uses, reproduction and distribution, including without limitation commercial reprints, selling or licensing copies or access, or posting on open internet sites, your personal or institution's website or repository, are prohibited. For exceptions, permission may be sought for such use through Elsevier's permissions site at:

Microelectronic Engineering 84 (2007)

Design and characterization of NULL convention arithmetic logic units

Satish K. Bandapati, Scott C. Smith *

University of Missouri-Rolla, Department of Electrical and Computer Engineering, 33 Emerson Electric Co. Hall, 870 Miner Circle, Rolla, MO 6409, United States

Available online 20 March 2006

* Corresponding author. E-mail address: smithsco@umr.edu (S.C. Smith).

© 2006 Elsevier B.V. All rights reserved. doi:0.06/j.mee

Abstract

In this paper, a number of 4-bit, 8-operation arithmetic logic units (ALUs) are designed using the delay-insensitive NULL convention logic paradigm, and are characterized in terms of speed and area. Both dual-rail and quad-rail, pipelined and non-pipelined versions are developed, and the tradeoffs and design considerations for each are discussed. Comparing the various architectures shows that the fastest dual-rail and quad-rail ALUs achieve average speedups of .72 and .9, respectively, over their non-pipelined counterparts, while requiring 33% and 9% more area, respectively. Overall, the dual-rail designs are both faster and require less area than their respective quad-rail counterparts; however, the quad-rail versions are expected to consume less power.

Keywords: Asynchronous logic design; Delay-insensitive circuits; Self-timed circuits; Computer arithmetic; Arithmetic logic unit

1. Introduction

For the last two decades, digital design has focused primarily on synchronous, clocked architectures. However, because clock rates have significantly increased while feature size has decreased, clock skew has become a major problem. To achieve acceptable skew, high-performance chips must dedicate increasingly larger portions of their area to clock drivers, thus dissipating increasingly higher power, especially at the clock edge, when switching is the most prevalent. As this trend continues, the clock is becoming more difficult to manage, causing renewed interest in asynchronous digital design.

Researchers have demonstrated that correct-by-construction asynchronous paradigms, particularly NULL convention logic (NCL), require less power, generate less noise, produce less electromagnetic interference, and allow for easier reuse of components than their synchronous counterparts, without compromising performance [1]. Furthermore, these paradigms should allow much greater flexibility in the design of complex circuits such as systems-on-a-chip (SoCs). Because these circuits are delay-insensitive, they should drastically reduce the effort required to ensure correct operation under all timing scenarios, compared to equivalent synchronous designs. Also, the self-timed nature of correct-by-construction SoCs should allow designers to reuse previously designed and verified functional blocks in subsequent designs, without significant modifications or retiming effort within a reused functional block. Such SoCs may also provide for simpler interfacing between the digital core and non-traditional functional blocks.

NCL technology is currently being utilized to design low-power, low-EMI ASICs for use in mobile electronics and smart cards, and is being used in the design of soft processor cores for use in SoCs. The initial version of the Motorola STAR08 processor using NCL technology shows a 40% reduction in power and a 0 dB reduction in noise over its clocked Boolean counterpart, while operating at a comparable frequency [1].

One of the first tasks necessary to help integrate NCL into the semiconductor design industry is to develop and characterize the key components of a reusable design library. Of fundamental importance are arithmetic circuits, including the ALUs described in this paper, as well as the multipliers [2], MACs [3], divider [4], and counters [5] described elsewhere. An overview of the NULL Convention Logic (NCL) paradigm is provided in the following section.

2. Overview of NCL

NCL offers a self-timed logic paradigm where control is inherent with each datum. NCL follows the so-called weak conditions of Seitz's delay-insensitive signaling scheme [6]. As with other self-timed logic methods discussed herein, the NCL paradigm assumes that forks in wires are isochronic [7], meaning that the wire delays are much less than the logic element delays within a component, which is a valid assumption even in future nanometer technologies. Wires connecting components do not have to adhere to the isochronic fork assumption. The origins of various aspects of the paradigm, including the NULL (or spacer) logic state from which NCL derives its name, can be traced back to Muller's work on speed-independent circuits in the 1950s and 1960s [8].

2.1. Delay-insensitivity

NCL uses symbolic completeness of expression [9] to achieve delay-insensitive behavior. A symbolically complete expression is defined as an expression that only depends on the relationships of the symbols present in the expression without a reference to their time of evaluation. In particular, dual-rail signals, quad-rail signals, or other Mutually Exclusive Assertion Groups (MEAGs) can be used to incorporate data and control information into one mixed signal path to eliminate time reference [10].

A dual-rail signal, D, consists of two wires, D0 and D1, which may assume any value from the set {DATA0, DATA1, NULL}. The DATA0 state (D0 = 1, D1 = 0) corresponds to a Boolean logic 0, the DATA1 state (D0 = 0, D1 = 1) corresponds to a Boolean logic 1, and the NULL state (D0 = 0, D1 = 0) corresponds to the empty set, meaning that the value of D is not yet available. The two rails are mutually exclusive, such that both rails can never be asserted simultaneously; this state is defined as an illegal state.
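The dual-rail encoding described above can be illustrated with a few lines of Python. This is only a minimal sketch: it assumes the rails are ordered as the pair (D0, D1), and the function names `encode` and `decode` are hypothetical, introduced here purely for illustration.

```python
def encode(bit):
    """Map a Boolean value to dual-rail rails (D0, D1).

    None stands for 'value not yet available' and maps to NULL.
    The (D0, D1) ordering is an assumption of this sketch.
    """
    if bit is None:
        return (0, 0)                      # NULL: neither rail asserted
    return (0, 1) if bit else (1, 0)       # DATA1 or DATA0


def decode(rails):
    """Map rails (D0, D1) back to 0, 1, or None (NULL)."""
    d0, d1 = rails
    if d0 and d1:
        # Both rails asserted is the illegal state of the code.
        raise ValueError("illegal dual-rail state: both rails asserted")
    if not d0 and not d1:
        return None                        # NULL
    return 1 if d1 else 0
```

Because exactly one rail is asserted for DATA and none for NULL, a receiver can tell from the rails alone whether a value has arrived, which is what removes the need for a separate timing reference.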
A quad-rail signal, Q, consists of four wires, Q0, Q1, Q2, and Q3, which may assume any value from the set {DATA0, DATA1, DATA2, DATA3, NULL}. The DATA0 state (Q0 = 1, Q1 = 0, Q2 = 0, Q3 = 0) corresponds to two Boolean logic signals, X and Y, where X = 0 and Y = 0. The DATA1 state (Q0 = 0, Q1 = 1, Q2 = 0, Q3 = 0) corresponds to X = 0 and Y = 1. The DATA2 state (Q0 = 0, Q1 = 0, Q2 = 1, Q3 = 0) corresponds to X = 1 and Y = 0. The DATA3 state (Q0 = 0, Q1 = 0, Q2 = 0, Q3 = 1) corresponds to X = 1 and Y = 1, and the NULL state (Q0 = 0, Q1 = 0, Q2 = 0, Q3 = 0) corresponds to the empty set, meaning that the result is not yet available. The four rails of a quad-rail NCL signal are mutually exclusive, such that no two rails can ever be asserted simultaneously; these states are defined as illegal states. Both dual-rail and quad-rail signals are space optimal 1-out-of-n delay-insensitive codes, requiring two wires per bit. Other higher order MEAGs are not typically wire count optimal; however, they can be more power efficient due to the decreased number of transitions per cycle.

Most multi-rail delay-insensitive systems, including NCL, have at least two register stages, one at both the input and at the output. Two adjacent register stages interact through their request and acknowledge lines, Ki and Ko, respectively, to prevent the current DATA wavefront from overwriting the previous DATA wavefront, by ensuring that the two DATA wavefronts are always separated by a NULL wavefront.

2.2. Logic gates

NCL differs from many other delay-insensitive paradigms in that these other paradigms only utilize one type of state-holding gate, the C-element [8]. A C-element behaves as follows: when all inputs assume the same value then the output assumes this value, otherwise the output does not change. On the other hand, all NCL gates are state-holding. Thus, NCL optimization methods can be considered as a subclass of the techniques for developing delay-insensitive circuits using a pre-defined set of more complex components, with built-in hysteresis behavior.
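The state-holding behavior of the C-element can be made concrete with a small behavioral model. The sketch below is an illustration only, not a circuit description from the paper; the class name `CElement` and its `update` method are hypothetical, and the gate is assumed to start with its output at 0.

```python
class CElement:
    """Behavioral model of an n-input C-element (illustrative sketch)."""

    def __init__(self, n_inputs, initial=0):
        self.n_inputs = n_inputs
        self.out = initial          # assumed reset value of the output

    def update(self, inputs):
        """Apply a tuple of 0/1 inputs and return the (possibly held) output."""
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        if all(v == 1 for v in inputs):
            self.out = 1            # all inputs agree on 1: output follows
        elif all(v == 0 for v in inputs):
            self.out = 0            # all inputs agree on 0: output follows
        # otherwise the inputs disagree and the previous output is held
        return self.out
```

Feeding a 2-input instance the sequence (1, 0), (1, 1), (0, 1), (0, 0) shows the output staying at 0, rising to 1, holding 1, and only then returning to 0.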
NCL uses threshold gates for its basic logic elements [11]. The primary type of threshold gate is the THmn gate, where 1 ≤ m ≤ n, as depicted in Fig. 1. THmn gates have n inputs. At least m of the n inputs must be asserted before the output will become asserted. Because NCL threshold gates are designed with hysteresis, all asserted inputs must be de-asserted before the output will be de-asserted. Hysteresis ensures a complete transition of inputs back to NULL before asserting the output associated with the next wavefront of input data. Therefore, a THnn gate is equivalent to an n-input C-element and a TH1n gate is equivalent to an n-input OR gate. In a THmn gate, each of the n inputs is connected to the rounded portion of the gate; the output emanates from the pointed end of the gate; and the gate's threshold value, m, is written inside of the gate. NCL threshold gates may also include a reset input to initialize the output. Resetable gates are denoted by either a D or an N appearing inside the gate, along with the gate's threshold, referring to the gate being reset to logic 1 or logic 0, respectively.

Fig. 1. THmn threshold gate.

By employing threshold gates for each logic rail, NCL is able to determine the output status without referencing time. Inputs are partitioned into two separate wavefronts, the NULL wavefront and the DATA wavefront. The NULL wavefront consists of all inputs to a circuit being NULL, while the DATA wavefront refers to all inputs being DATA, some combination of DATA0 and DATA1. Initially, all circuit elements are reset to the NULL state. First, a DATA wavefront is presented to the circuit. Once all of the outputs of the circuit transition to DATA, the NULL wavefront is presented to the circuit. Once all of the outputs of the circuit transition to NULL, the next DATA wavefront is presented to the circuit. This DATA/NULL cycle continues repeatedly. As soon as all outputs of the circuit are DATA, the circuit's result is valid. The NULL wavefront then transitions all of these DATA outputs back to NULL. When they transition back to DATA again, the next output is available. This period is referred to as the DATA-to-DATA cycle time, denoted as TDD, and has an analogous role to the clock period in a synchronous system.

2.3. Completeness of input

The completeness of input criterion [9], which NCL combinational circuits and circuits developed from other delay-insensitive paradigms must maintain in order to be delay-insensitive, requires that:

1. all the outputs of a combinational circuit may not transition from NULL to DATA until all inputs have transitioned from NULL to DATA, and
2. all the outputs of a combinational circuit may not transition from DATA to NULL until all inputs have transitioned from DATA to NULL.

In circuits with multiple outputs, it is acceptable, according to Seitz's weak conditions [6], for some of the outputs to transition without having a complete input set present, as long as all outputs cannot transition before all inputs arrive.

2.4. Observability

There is one more condition that must be met to ensure delay-insensitivity for NCL circuits and other delay-insensitive circuits. No orphans may propagate through a gate [12].
An orphan is defined as a wire that transitions during the current DATA wavefront, but is not used in the determination of the output. Orphans are caused by wire forks and can be neglected through the isochronic fork assumption [7], as long as they are not allowed to cross a gate boundary. This observability condition, also referred to as indicatability or stability, ensures that every gate transition is observable at the output, which means that every gate that transitions is necessary to transition at least one of the outputs.

3. Dual-rail ALU

A block diagram of the dual-rail 4-bit ALU is shown in Fig. 2. S selects which operation is to be performed on the 4-bit inputs, A and B, and the carry/borrow input, Cin/Bin, determined by the function table in Fig. 3. F is the 4-bit output and Cout/Bout is the carry/borrow output. This circuit, as do all NCL systems, contains a complete request/acknowledge interface, including Ko for requesting the inputs, and Ki for acknowledging the outputs, and a reset input to initialize the NCL registers to NULL. Notice that Cin/Bin is not used in operations 0-3, and that B is not used in operations 3-5. However, to ensure delay-insensitivity the ALU must still be input-complete with respect to these inputs, even for the operations where the Cin/Bin and/or B input(s) are not used. This ensures that the unused inputs are received before the output can transition.

Fig. 2. Dual-rail ALU block diagram.

3.1. Non-pipelined version

The logic diagram of the non-pipelined dual-rail ALU is shown in Fig. 4. It consists of dual-rail registers [9], completion components [9], a Convert to MEAG function, a Demultiplexer, NCL OR, AND, XOR, invert, shift right, and shift left functions, a ripple-carry subtractor and adder, two Multiplexers, and Carry Logic. The Convert to MEAG function is comprised of eight TH33 gates that convert S, which consists of three dual-rail signals, to an 8-rail MEAG.
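The Convert to MEAG function lends itself to a short behavioral illustration. The sketch below models eight TH33 gates with hysteresis, each fed by one rail from each of the three dual-rail select signals. It is an assumption of this sketch, not something stated in the text, that MEAG rail k corresponds to the binary value k of S2 S1 S0 and that each signal is given as a pair (rail 0, rail 1); the class names `TH33` and `ConvertToMEAG` are hypothetical.

```python
NULL, DATA0, DATA1 = (0, 0), (1, 0), (0, 1)   # dual-rail states as (rail0, rail1)


class TH33:
    """3-input threshold gate with threshold 3 and hysteresis."""

    def __init__(self):
        self.out = 0                 # assumed reset to 0 (NULL)

    def update(self, a, b, c):
        if a and b and c:
            self.out = 1             # threshold met: assert the output
        elif not (a or b or c):
            self.out = 0             # every input de-asserted: de-assert
        return self.out              # otherwise hold the previous value


class ConvertToMEAG:
    """Eight TH33 gates turning three dual-rail signals into an 8-rail MEAG."""

    def __init__(self):
        self.gates = [TH33() for _ in range(8)]

    def update(self, s2, s1, s0):
        rails = []
        for k, gate in enumerate(self.gates):
            # Gate k watches the rail of each select bit that matches
            # the corresponding bit of k (assumed rail assignment).
            b2, b1, b0 = (k >> 2) & 1, (k >> 1) & 1, k & 1
            rails.append(gate.update(s2[b2], s1[b1], s0[b0]))
        return tuple(rails)
```

With all three select signals at DATA, exactly one of the eight rails is asserted; with any select signal still NULL after a NULL wavefront, none is asserted, so the MEAG is itself input-complete with respect to S.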
The OR, AND, and XOR functions are all input-complete versions [], consisting of two gates and one gate delay. The invert, shift right, and shift left functions are performed by renaming signals, hence they have no logic delay. The ripple-carry subtracter and adder consist of four full-adders [13,14], while the Multiplexers consist of TH14 and TH12 gates, which OR each rail of the demultiplexed results together to form a single result. The Carry Logic is required to generate a Cout of logic 0, when operation 0-3 is chosen, and to ensure input-completeness with respect to Cin in these cases, in order to maintain delay-insensitivity, as shown in Fig. 5. The Demultiplexer consists of TH22 gates, which pass the A input, and the B and Cin/Bin inputs, when necessary, to the correct function, determined by the select MEAG. For the functions in which B is not required, the Demultiplexer consists of TH34 gates to pass A, while maintaining input-completeness with respect to B. Input-completeness of Cin/Bin is ensured in the carry logic. The alternative is to always pass B to every function, whether it is needed or not, and then ensure input-completeness of B within functions 3-5. However, this would require an extra gate delay for these three functions, which reduced average throughput by 3% and required an additional 24 gates; hence this alternative was not chosen.

Fig. 3. ALU function table. Operations 0-7, selected by S2 S1 S0, are: A OR B, A AND B, A XOR B, NOT A, SHR, SHL, subtract, and add. The Cout/Bout column is 0 for operations 0-3, A0 for SHR, A3 for SHL, Bout for subtract, and Cout for add.

Fig. 4. Logic diagram of dual-rail non-pipelined ALU.

Fig. 5. Carry logic.

3.2. Pipelined version

The interface for the pipelined dual-rail ALU is the same as for the non-pipelined version, shown in Fig. 2; the logic diagram is shown in Fig. 6. To pipeline the ALU, registration stages must be added within the combinational logic of the non-pipelined version, without violating the completeness of input criterion [9]. To minimize both delay and area, many embedded registration stages were used, where the combinational logic and registration are combined together (i.e. the Convert to MEAG Register, the Carry MEAG Register, the Demultiplexer Register, the Select (Sel) Registers, and the Multiplexer Register). Fig. 7 shows a 2-to-1 multiplexer with registered output; while Fig. 8 shows a multiplexer having the same functionality but utilizing embedded registration, where the registration functionality becomes part of the multiplexer's combinational logic. Note that utilizing embedded registration in this case results in a decrease in gate delay for the generation of the output, F, and a reduction of 2 gate delays for the generation of the completion signal, Ko. It also requires 2 fewer gates, but because of the larger gate sizes, it requires 4 additional transistors (100 vs. 96).

Fig. 6. Logic diagram of dual-rail pipelined ALU.

Fig. 7. Multiplexer without embedded registration.

Fig. 8. Multiplexer with embedded registration.

The major differences between the pipelined and non-pipelined designs are the input-completeness of the B and Cin/Bin inputs, and the subtractor and adder circuits. Since a registration stage was to be added between the Demultiplexer and the functions, an extra level of logic would have been required for this registration if input-completeness of B was performed within the demultiplexer, as was done for the non-pipelined design, since this required maximum 4-input, TH34 gates; thus excluding the possibility of embedded registration due to the needed additional request input and the 4-input limitation. Therefore, the alternative to pass B to functions 3-5 and then ensure input-completeness of B within these functions was chosen, since this option only required 2-input, TH22 gates, for the demultiplexer, thus enabling the extra request input to be added, changing the TH22 gates to TH33 gates, in order to embed the registration stage within the combinational logic of the demultiplexer. Also, both the Demultiplexer Register and the Select Register require special completion components, since the inputs are only sent to one out of the eight output sets, whereas normally all outputs transition every DATA/NULL cycle.

Input-completeness of the Cin/Bin input was ensured within the Carry MEAG register, instead of at the end of the design, like the non-pipelined version, since this would have required unnecessarily pipelining Cin/Bin throughout the design. This allowed for the carry output of zero to be generated from the F0 output of functions 0-3; hence only one multiplexer component is shown in the diagram. Also, the adder and subtractor circuits have themselves been pipelined [13,14], by inserting two registration stages between the four full-adders and utilizing bitwise completion [13,14], where the completion signal from bit b in register i is only sent back to the bits in register i-1 that took part in the calculation of bit b. A third registration stage could have been inserted, however this did not increase throughput. Finally, the select MEAG was pipelined by adding two additional registration stages, after the Carry MEAG Register, such that the ALU could store the select signals associated with the concurrent DATA/NULL wavefronts being processed within the ALU. Both more and less MEAG pipelining stages decreased throughput.
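Since the pipelined and non-pipelined versions share one interface and one function table, a plain Boolean reference model can be handy for checking the output of any version once its rails are decoded. The sketch below is such a model under stated assumptions, not the authors' implementation: the function name `alu_reference` is hypothetical, operations are assumed to be numbered 0 to 7 in the order OR, AND, XOR, NOT, SHR, SHL, subtract, add, SHR is assumed to shift Cin into the top bit and A0 out, SHL is assumed to shift Cin into the bottom bit and A3 out, and subtraction is assumed to compute A - B - Bin with Bout = 1 on underflow, which may differ from the borrow convention used in Fig. 3.

```python
def alu_reference(s, a, b, c_in):
    """Return (F, Cout_or_Bout) for a 4-bit ALU; all conventions are assumptions."""
    a &= 0xF
    b &= 0xF
    c_in &= 1
    if s == 0:
        return a | b, 0                             # OR, carry output forced to 0
    if s == 1:
        return a & b, 0                             # AND
    if s == 2:
        return a ^ b, 0                             # XOR
    if s == 3:
        return (~a) & 0xF, 0                        # NOT A (B unused)
    if s == 4:
        return (c_in << 3) | (a >> 1), a & 1        # SHR: Cin enters bit 3, A0 leaves
    if s == 5:
        return ((a << 1) & 0xF) | c_in, (a >> 3) & 1  # SHL: Cin enters bit 0, A3 leaves
    if s == 6:
        diff = a - b - c_in                         # assumed borrow convention
        return diff & 0xF, 1 if diff < 0 else 0
    if s == 7:
        total = a + b + c_in
        return total & 0xF, total >> 4              # sum and carry out
    raise ValueError("select must be in the range 0..7")
```

Iterating over every value of S (3 bits), A and B (4 bits each), and Cin (1 bit) visits 2^12 = 4096 input combinations, so exhaustive comparison against such a model is inexpensive.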
These pipelining optimizations resulted in a 32% increase in average throughput, with a % increase in area.

3.3. NULL cycle reduced version

Since pipelining the ALU resulted in more than twice the area of the non-pipelined design, with a speedup of only 1.32, a viable alternative was to use the NULL cycle reduction (NCR) technique [13,15] on the non-pipelined design to increase its throughput. As shown in Fig. 9, NCR demultiplexes successive input wavefronts such that one circuit processes a DATA wavefront, while its duplicate processes a NULL wavefront. The first DATA/NULL cycle flows through the original circuit, while the next DATA/NULL cycle flows through the duplicate circuit. The outputs of the two circuits are then multiplexed to form a single output stream. The application of NCR to the non-pipelined design resulted in a 63% increase in average throughput, with only a 36% increase in area, thus the NCR version was both faster than the pipelined version and it required less area.

Fig. 9. NCR architecture.

To further increase throughput and decrease area, embedded registration was applied to the non-pipelined design. This resulted in the Convert to MEAG function becoming an embedded register, replacing the 3-bit input register, and the multiplexer for F and the Carry Logic function becoming embedded registers, replacing the 5-bit output register. This optimization resulted in a speedup of .8 over the original non-pipelined design, and a 2% decrease in area. Now NCR could be applied to the non-pipelined design utilizing embedded registration, resulting in a speedup of .72 with only a 33% increase in area, versus the original non-pipelined design.

4. Quad-rail ALU

The interface for the quad-rail ALU is similar to that of the dual-rail ALU, shown in Fig. 2, except that A, B, and F now consist of two quad-rail signals, instead of four dual-rail signals, and the select input, S, consists of one dual-rail signal, S2, and one quad-rail signal, which represents S(1:0). The logic diagram for the quad-rail ALU is also very similar to that of the dual-rail version, shown in Fig. 4. It consists of both quad-rail and dual-rail registers [3], completion components, a Convert to MEAG function, a Demultiplexer, NCL OR, AND, XOR, invert, shift right, and shift left functions, a ripple-carry subtractor and adder, two Multiplexers, and Carry Logic. The main difference is that each component and each function was designed using quad-rail logic instead of dual-rail logic. This results in ripple-carry subtractor and adder circuits consisting of only two quad-rail adders (two quad-rail signals plus a dual-rail carry input, yielding a quad-rail sum and a dual-rail carry output) [2]. The other major difference was that B had to be passed to all functions through the demultiplexer and input-completeness of B had to be ensured within functions 3-5, due to the quad-rail logic and 4-input limitation for gates. Input-completeness of Cin/Bin was ensured in the same manner as for the dual-rail design.

Pipelining the quad-rail design followed in much the same manner as the dual-rail design, resulting in a 34% increase in average throughput, with a 6% increase in area. Again, NCR was applied to the non-pipelined design, resulting in a throughput increase of 3% with a 9% increase in area. Like the dual-rail design, embedded registration was applied to the non-pipelined design, resulting in a 0% increase in throughput with a minimal increase in area; and NCR was applied to the non-pipelined design with embedded registration, resulting in a speedup of .9 with an area increase of 9%, versus the original non-pipelined design.

5. Simulation results

To assess the performance of alternate designs, Mentor Graphics' ModelSim tool was used to simulate structural VHDL implementations of the circuits to generate their timing characteristics. The average rising and falling time for each NCL gate was determined through Cadence simulation of a 0. µm CMOS process operating at 3.3 V; and this information was then stored in a VHDL library to be used in the simulations. Table 1 summarizes the characterization of the various ALUs discussed herein, in terms of both speed and area. Gate count can be used as one measure of area for comparison purposes; however, since the NCL gates vary greatly in size (i.e. from 2 transistors for an inverter to 26 transistors for a TH24 gate), transistor count provides a better means of comparison, especially when embedded registration is used, since this increases transistor count without increasing gate count. Also, since NCL circuits are delay-insensitive, speed is data dependent; therefore average cycle time, TDD, is calculated and used for comparison. TDD is calculated as the arithmetic mean of the cycle times corresponding to all 4096 possible pairs of input operands.

Table 1. ALU characterization
Columns: ALU architecture; Gate count; Transistor count; TDD (ns)
Rows:
Dual-rail non-pipelined
Dual-rail pipelined
Dual-rail NCR
Dual-rail non-pipelined with embedded registration
Dual-rail NCR with embedded registration
Quad-rail non-pipelined
Quad-rail pipelined
Quad-rail NCR
Quad-rail non-pipelined with embedded registration
Quad-rail NCR with embedded registration

6. Conclusions

Comparing the various architectures shows that the dual-rail versions of all designs outperform their quad-rail counterparts in terms of both speed and area. However, the quad-rail designs are expected to consume less power, due to the fact that only one quad-rail signal transitions for every two dual-rail signals that transition. The reason that pipelining the designs did not further increase throughput is due to the long completion delays in the special completion components required for both the Demultiplexer Register and Select Registers, described in Section 3.2. However, the application of embedded registration to the non-pipelined design, followed by applying NCR, yielded a significant additional increase in throughput over the original non-pipelined design, versus the throughput increase achieved by pipelining the design for both dual-rail and quad-rail architectures. Furthermore, this NCR approach required less area than pipelining for the dual-rail architecture, and only slightly more area than pipelining for the quad-rail architecture.

References

[1] J. McCardle, D. Chester, Measuring an asynchronous processor's power and noise, in: Proceedings of the Synopsys User Group Conference, Boston, 200.
[2] S.K. Bandapati, S.C. Smith, M. Choi, Design and characterization of NULL convention self-timed multipliers, IEEE Design and Test of Computers 30/6 (November-December) (2003) (Special Issue on Clockless VLSI Design).
[3] S.C. Smith, R.F. DeMara, J.S. Yuan, M. Hagedorn, D. Ferguson, NULL convention multiply and accumulate unit with conditional rounding, scaling, and saturation, Journal of Systems Architecture 47 (2) (2002).
[4] S.C. Smith, Design of a NULL convention self-timed divider, in: Proceedings of the 2004 International Conference on VLSI, June 2004, pp.
[5] S.C. Smith, R.F. DeMara, J.S. Yuan, D. Ferguson, D. Lamb, Optimization of NULL convention self-timed circuits, Integration, The VLSI Journal 37 (3) (2004) 3 6.
[6] C.L. Seitz, System timing, in: Introduction to VLSI Systems, Addison-Wesley, Reading, MA, 1980, pp.
[7] C.H. (Kees) van Berkel, M. Rem, R. Saeijs, VLSI programming, in: Proceedings of the 1988 IEEE International Conference on Computer Design: VLSI in Computers and Processors, 1998, pp.
[8] D.E. Muller, Asynchronous logics and application to information processing, in: Switching Theory in Space Technology, Stanford University Press, Stanford, CA, 1963, pp.
[9] K.M. Fant, S.A. Brandt, NULL convention logic: a complete and consistent logic for asynchronous digital circuit synthesis, in: Proceedings of the International Conference on Application Specific Systems, Architectures, and Processors, 1996, pp.
[10] T. Verhoff, Delay-insensitive codes: an overview, Distributed Computing 3 (1988) 8.
[11] G.E. Sobelman, K.M. Fant, CMOS circuit design of threshold gates with hysteresis, in: Proceedings of the IEEE International Symposium on Circuits and Systems (II), 1998, pp.
[12] A. Kondratyev, L. Neukom, O. Roig, A. Taubin, K. Fant, Checking delay-insensitivity: 10^4 gates and beyond, in: Proceedings of the Eighth International Symposium on Asynchronous Circuits and Systems, 2002, pp.
[13] S.C. Smith, Gate and Throughput Optimizations for NULL Convention Self-Timed Digital Circuits, Ph.D. Dissertation, School of Electrical Engineering and Computer Science, University of Central Florida, May 200.
[14] S.C. Smith, R.F. DeMara, J.S. Yuan, M. Hagedorn, D. Ferguson, Delay-insensitive gate-level pipelining, Integration, The VLSI Journal 30 (2) (200).
[15] S.C. Smith, R.F. DeMara, J.S. Yuan, M. Hagedorn, D. Ferguson, Speedup of delay-insensitive digital systems using NULL cycle reduction, in: Proceedings of the 10th International Workshop on Logic and Synthesis, June 200, pp.
