Feedback Techniques for Dual-rail Self-timed Circuits

Size: px

Start display at page:

Download "Feedback Techniques for Dual-rail Self-timed Circuits"

Sharleen Sparks
6 years ago
Views:

1 This document is an author-formatted work. The definitive version for citation appears as: R. F. DeMara, A. Kejriwal, and J. R. Seeber, Feedback Techniques for Dual-Rail Self-Timed Circuits, in Proceedings of the 4 International Conference on VLSI (VLSI-4), pp , Las Vegas, Nevada, U.S.A., June 4, 4. Feedback Techniques for Dual-rail Self-timed Circuits Ronald F. DeMara Amit Kejriwal Jude Seeber Department of Electrical and Computer Engineering University of Central Florida Box 645, Orlando, FL Tel: demara@mail.ucf.edu Keywords: Asynchronous Circuit, Dual-rail Logic, NCL, Timer Register, Toggle Element Abstract Design techniques for time and space optimization of NCL feedback circuits are developed and compared. First, a -Register Stage method is employed to circulate DATA and wavefronts required to realize feedback in 4-bit binary timer case-study. While modular and adaptable, this approach requires a significant gate count to realize the feedback circuit that comprises 75% of the total gates required. The second methodology aims at reducing the feedback overhead by using the statemaintaining capacity inherent with each threshold logic gate s built-in hysteresis behavior. This methodology employs two characteristics: the circuit preserves its present state; and it keeps track of the number of requests for DATA so that it can determine the appropriate next state. This embedded approach can reduce the feedback gate count by nearly 5% and also DATA-to-DATA cycle time by % depending on the feedback scheme used. Finally, the above-explained methodologies are assessed in terms of their design tradeoffs.. Introduction Convention Logic (NCL) [] offers a self-timed logic paradigm where control is inherent with each datum. NCL follows the so-called weak conditions of Seitz s delay-insensitive signaling scheme []. As with other self-timed logic methods discussed herein, the NCL paradigm assumes that forks in wires are isochronic [, 4]. The origins of various aspects of the paradigm, including the (or spacer) logic state from which NCL derives its name, can be traced back to Muller s work on speed-independent circuits in the 95s and 96s [5]. NCL differs from the above-mentioned methods in that they only utilize one type of state-holding gate, the C-element [5]. On the other hand, all NCL gates are state-holding as described below. As shown in Figure, a dual-rail signal in NCL consists of two wires, Rail_ and Rail_, which may assume any value from the set {DATA, DATA, }. The DATA state (Rail_ =, Rail_ = ) corresponds to a Boolean logic, the DATA state (Rail_ =, Rail_ = ) corresponds to a Boolean logic, and the state (Rail_ =, Rail_ = ) meaning that the value is not yet available. The two rails are mutually exclusive, so that both rails can never be asserted simultaneously; this state is defined as an illegal state. DATA Rail Rail DATA Rail Rail Rail Rail Rail Rail Figure : NCL DATA and states. Undefined NCL circuits can be composed using a methodology that utilizes NCL threshold gates with hysteresis [6,7]. One type of threshold gate is the THmn gate, where m n, as depicted in Figure. A THmn gate corresponds to an operator with at least m signals asserted as its set condition and all signals de-asserted as its reset condition. THmn gates have n inputs. At least m of the n inputs must be asserted before the output will become asserted. Because threshold gates are designed with hysteresis, all asserted inputs must be de-asserted before the output will

2 be de-asserted. Hysteresis is used to provide a means for monotonic transitions and a complete transition of multirail inputs back to a state before asserting the output associated with the next wavefront of input data. In a THmn gate, each of the n inputs is connected to the rounded portion of the gate. The output emanates from the pointed end of the gate. The gate s threshold value, m, is written inside of the gate. Inputs n m output Figure : Threshold Gate with Hysteresis Karl Fant [] demonstrated an NCL-based binaryencoded up-counter design based on Ring Oscillator circuits. For instance, Figure depicts an arrangement of w Ring Oscillators to generate the required bit sequences for a w-bit-wide count value C where w=. The calculation of each bit C i is computed independently upon assertion of up-count control. Specifically, whenever the value of C is incremented, the required bit values for the corresponding sequence of C i for i w are recirculated within their respective ring. This approach generates the output sequence for each bit C i independently without requiring input from any elements that generate C j for j i. Although Figure realizes the bit sequence for a binary-encoded up-counter, this approach is readily generalized to arbitrary sequences using the corresponding arrangement of feedback shift registers. However, when such circuits employing feedback are designed in NCL, a primary concern is maintaining the proper flow of DATA and wavefronts such that neither wavefront is prematurely overwritten. A straightforward solution is to insert register stages that delineate the completion boundaries of the asynchronous subcircuits. However, the inclusion of additional register stages also increases the transistor count and the DATA-to-DATA cycle time. Thus, this paper develops alternative methods to provide NCL space and time optimizations in the presence of feedback while maintaining delay-insensitivity. This paper is organized into 6 sections. In Section, a baseline NCL 4-bit timer design using a -Register Stage feedback circuit is presented. Section defines the characteristics to be optimized over the baseline design. The consideration of these characteristics leads to development of a self-contained toggle element using NCL threshold gates in Section 4. Section 5 utilizes the toggle element in the alternative design methods which are simulated and compared using a case study where w=4. Applications requiring w>4 can be accommodated by extending the proposed techniques or cascading 4-bit stages. Section 6 identifies conclusions and potential implementation considerations.. Three-Register Stage Feedback. Design Approach The conventional -Register stage approach for controlling feedback includes the required up-counter combinational logic and register stages. This feedback network is shown in Figure 4. The use of three stages accommodates re-circulation of the DATA and wavefronts while prohibiting improper overwriting or deadlock scenarios possible with less than register stages []. Output C (MSB) Output C Output C (LSB) Figure : Ring Oscillator -bit Counter/Timer The output of the combinational circuit is fed to the input of the first register while the output of the third register is fedback to the combinational circuit. Initially, the register stages R, R, and R are initialized to DATA,, and, respectively. R will request for, R will request for DATA, and R will also request for DATA. This does not create the possibility of a lockup scenario since the required inputs are available for R, R, and R. Table lists the flow of wavefronts and signaling sequences for this configuration. Initial Operands Condition (if Input == Request) Register Operands Condition (if Input == Request) Register Operands Condition (if Input == Request) Register Operands Input DATA utput DATA Hold state DATA State State Request DATA R Input DATA DATA Output State DATA DATA Hold state DATA State Request DATA DATA DATA Input DATA DATA DATA R Output State Hold state DATA State Request DATA DATA

3 Table : -Register Signaling Sequence Combinational Logic Request for NCL Register Stage R DATA Request for DATA NCL Register Stage R Request for DATA NCL Register Stage Figure 4: -Register Stage Feedback Initialization. Block Diagram As shown in Figure 5, an NCL 4-bit timer/counter has the functionality of incrementing the present value of the Timer upon the arrival of each DATA cycle. The Timer Increment synchronization signal for the request for DATA or is controlled externally by asserting the request input, K i, of the register []. The convention used is that when K i is asserted then it denotes a request for DATA. When it K i is de-asserted then it denotes a request for. The w-bit count value C indicates the current timer value and K o is required to determine whether the DATA should be requested or should be requested. Reset sets the timer to the initial condition and zeroes the count value. Reset Ko Figure 5: NCL 4-bit Timer 4-bit Timer block diagram R C(:) Ki NCL Register NCL Register NCL Register Count (:) Figure 6: -Register Stage 4-Bit Timer. NCL Timer with -Register Stage Feedback As shown in Figure 6, the Reset signal sets the registers to the initial values of DATA,, for register stage, register stage, register stage respectively along with interconnections to the completion logic labeled COMP. The request, K i, for DATA or, is provided as the input to the circuit to regulate the timer operation. An AND gate is used to combine K i with the request from register stage. Thus, unless they match, the request to register stage will not change. This synchronizes the timer with the external control signal. The COMP logic detects the completion of the DATA or cycle for all bits in the corresponding stage of the circuit. The output is taken from the first register, though it could be similarly obtained from the output of any register stage..4 4-Bit Increment Circuit Using a K-Map for both Rail and Rail individually, the increment-by- circuit can be reduced to the following equations: Y().rail= X.rail Y().rail= X.rail Y().rail= X().rail. X().rail + X().rail. X().rail Y().rail= X().rail. X().rail + X().rail. X().rail Y().rail= X.rail ( X.rail + X.rail) + X.rail. X.rail. X.rail Y().rail= X.rail (X.rail + X.rail) + X.rail. X.rail. X.rail Y().rail= X().rail(X().rail + X().rail + X().rail) + X().rail. X().rail. X().rail. X().rail Y().rail= X().rail (X().rail + X().rail + X().rail) + X().rail. X().rail. X().rail. X().rail Increment Circuitry (:) (:) Stage Stage Stage (:) (:) The equations are realized using the circuit in Figure 7. Ko Ki Ko Ki Ko Ki Reset Reset to DATA Reset to Reset to COMP Ki COMP Ko

4 X X X X X X X Figure 7: X 4-Bit Timer Increment Circuit Table : Simulation Results for -Register Timer Gate Count Transistor Count Increment Register Total Increment Total Table shows design parameters and simulation results for the -Register stage 4-bit timer. Results are based on simulations performed using SPICE in a Cadence design environment with a.8µ technology library at. Volts. T DD denotes the DATA-to-DATA cycle time, which for this configuration was 5.5 ns. Observe that the gate count of the registers is.4-fold larger than the gate count of increment circuit itself. Thus, just to achieve the feedback regulating the DATA and wavefronts, 4 overhead gates are required in addition to the gates needed by the increment circuit. The overhead not only increases the cost but also increases T DD. These are the primary motivation for the alternative designs developed in the remainder of the paper. 4 4 T DD ns x () Threshold Gates THx () Used for the Circuit THwx (4) TH4wx () wx () Y Y Y Y Y Y Y Y threshold gates themselves. Embedded registration within the increment circuit requires that the increment circuit:. retains its present state, and. keeps the track of the number of request for DATA to determine the appropriate next state. These two objectives can both be achieved if each bit C i is generated by an appropriate state machine. The state machine that generates C i has i distinct states of which half will output a single bit with a value of and the other half will output a single bit with a value of. In particular, the state machine outputs a string s that is i long and then a string s that is i long. The state machine then resets and repeats its cycle, just as in the case of the Ring Oscillator design in Figure. Without using Feedback Registers, additional gates are used to detect when the output from C i transitions from to perform the reset of the state machine generating C i. It simultaneously triggers the transition to the next state in the state machine that generates C i+. By cascading these state machines, the complete sequence for all bits of C are generated while maintaining the overall integrity of the DATA and wavefronts. 4. Toggle Elements for Embedded Registration The basic building block for embedded registration in the NCL timer is the state machine that generates C. This circuit generates the sequence and thus acts as a Toggle Element. The NCL toggle element must consider both the wavefront and the DATA wavefronts. In particular, it will generate a DATA DATA cycle. Instead of feedback using registers, if the present state is retained internally along with the count of requests for DATA that are received, then the required behavior for a binary toggle element is achieved. 4. Toggle Element with Reset The circuit shown in Figure 8 can store the present state and also provide a mechanism to maintain the number of requests for DATA. The sequence of timing signals is listed in Table. After the cycle is complete, the circuit is required to begin a new cycle. Since the last output is a DATA output, the next is the request for, which indicates K i =. The reset circuitry in Figure 8 will be asserted since R = and K i =.. NCL Timer Without Feedback Registers Instead of inserting register stages for controlling feedback, an increment circuit is developed with embedded feedback using the hysteresis in the NCL Table : Sequence of Toggle Element with Reset initial state state state state state4 T Null reset T Null reset T Null reset T5(Reset) Null R Null R Null Ki DATA DATA DATA

5 A T A4 R T5 A A T A5 R T T4 CIRCUIT Figure 8: Toggle Element with Reset A7 A6 After each cycle is complete the circuit is required to start a new cycle. Since last output was a DATA output, next is the request for, which has K i =. At this time both the inputs to gate A are de-asserted, as R = so its compliment is equal to and K i =, which caused the gate A to be de-asserted and the output is achieved. However, this circuit is not entirely delay-insensitive because NCL circuits utilizing feedback without intervening asynchronous registers are susceptible to a feedback-lagging race condition. In this case, the time at which the feedback to the A threshold gate might arrive being later than the request for should be considered. This circuit has an exposure to the case that the reset signal to the A gate is delayed while the cycle is complete and the DATA wavefront is arriving. Thus, this circuit should be designed as a module and needs to take into consideration the wire and gate propagation delays associated with the reset circuitry. 4. Toggle Element without Reset Instead of using the above-explained methodology for the Toggle element, an alternative methodology shown in Figure 9 can be used. Instead of always keeping one of the inputs asserted, the compliment of one of the output rails is feedback using the bubble inverted input on the upper gate labeled A in Figure 9. K i A R T H T A T H T A A4 R O Figure 9: Toggle Element without Reset This methodology eliminates the race condition involved with generating the reset signal to set the circuit back to its initial value once it completes each cycle. The gates A and A are initialized to and will follow the sequence listed in Table 4. Table 4: Sequence of Toggle Element without Reset initial state state state state state4 T Null reset to T Null reset to R Null R Null Ki DATA DATA DATA

6 5. NCL Timer with Embedded Registration 5. Version A: Reset-equipped Gates with Combinational Rollover-Detect Bit C is generated by the Toggle Element directly. Bit Bit C is generated in Figure according to Table 5. The information that C is starting a new cycle is stored in the threshold gate with one input always asserted. Once the reset signal of C is asserted the output of is also asserted otherwise it remains de-asserted. K i T5 T7 G T8 G TH T9 T4 R R T6 Reset Circuit Figure : C circuit for Version A Timer Table 5: C sequence for Version A Timer initial state state state state state4 T4 reset to T7(Reset) Null A Null reset to R Null R Null R Null Once the circuit has completed its cycle of and DATA wavefronts, it is necessary to reset the C state machine to its initial condition. Reset will be asserted when RAIL of C and C, i.e. R and R respectively, and complement of K i are asserted. This is only possible when both C and C start a new cycle. Succeeding bits are generated similarly as described below. 5. Version A: Reset-equipped Embedded Register The version A circuit can be developed using more threshold gates to replace all the non-hysteresis gates used in the reset circuitry except the non-hysteresis AND gate. This reduces the gate count from 44 gates to 8 gates, although the transistor count remains unchanged since the weighted gates used each require more transistors individually. The complete circuit for Version A is shown in Figure for an NCL timer with w=4.

7 TH TH TH R R O R TH arises before RAIL of C can be asserted. Thus, the Self-Reset circuit remembers the status of the previous bit and upon every other cycle resets the circuit. For C and C, the same logic is used except in C, the S gate in the Self-Reset circuit is asserted only when RAIL for bit and RAIL for bit are asserted. For generating K o the same logic is required as for the completion circuit in Version A. The complete circuit for Version B is shown in Figure. TH R K O R R TH x_ R S TH R S R S Self Reset R TH R R Figure : Version A Timer 5. Version B: Without Reset-equipped Gates The timer without embedded registration and without reset uses the complement of RAIL as a controlling input. Observe in Figure that the threshold gate G always has one input asserted so there is a need to generate a reset signal to de-assert it when needed. As shown in Figure, the S threshold gate in the chain of the Self-Reset circuitry is a Th gate to which one input is the output of RAIL of C, the second input is the complement of the output of gate S. The third input to first threshold gate, x_, to the S gate of the Self-Reset circuit is provided so that the first threshold gate can assert at the proper wavefront. If this input is not present then RAIL will start following K i prematurely. The S gate will assert when RAIL of C is asserted which will allow RAIL of C to follow K i as it has in Versions A and A. The output of the S gate will remain de-asserted until the time C completes its cycle twice. This is possible because the S gate in the Self- Reset circuit is asserted when the S gate and RAIL for C is asserted. In the second cycle, RAIL of C will be asserted and the third threshold gate will assert its output. At that time, the complement of the output of the S gate, which is the input to S gate, will be de-asserted. The S gate introduced in the Self-Reset circuit has the purpose of becoming asserted when a unique sequence Figure : C Circuit Diagram for Version B 5.4 Simulation Results Table 6 lists the Gate Count, Transistor Count and Data-to-Data cycle time for all the methodologies discussed in the paper. Results are based on simulations performed using SPICE in a Cadence design environment with a.8µ technology library at. Volts. Storage Mechanism Design Gate CountTransistor Count T DD Approach : -Stage Asynchronous Conventional design ns registration Approach : -Stage Asynchronous Collapsed REQUEST ns registration control Approach : Embedded registration Version A: Reset-equipped gates ns with combinational rollover-detect gates Version A: Reset-equipped gates ns with TH rollover-detect gates Version B: Without 74.5 ns Reset-equipped Gates Table 6: Simulation Results for Alternative Designs

8 x_ R TH equipped gates. Version B also produced the most space and time optimized circuits among these alternatives. It was also shown how dual-rail designs without explicit registration can suffer from the drawback of feedbacklagging race conditions that can only be mitigated at the physical design level. x_ R TH R R R All the methodologies developed herein will reduce overhead in terms of gate count and cycle time more significantly as the input width w of the circuit increases. In particular, the increases in the transistor count are almost linear with respect to w using methodologies for versions A, A, and B. A minor implementation limitation when w becomes large is the input width of the gates available, which is limited to 4 in current NCL libraries. TH R TH R R TH R R TH R K O Acknowledgements: The authors would like to acknowledge Heng Tan and Kening Zhang for their review and enhancements that have improved this paper. R R R R R R R Figure : Version B Without Reset-Equipped Gates 6. Conclusion Feedback in dual-rail self-timed circuits using the conventional method of -register stages has overhead three times larger than the Increment circuitry itself. The methodology developed herein reduces this overhead by embedding registration within state machines that realize an increment sequence on a bit-by-bit basis. For the test case of a 4-bit timer, the embedded approaches reduced the feedback gate count by nearly 5% and also DATAto-DATA cycle time by % over -stage feedback. Version A and A used feedback with embedded registration with reset-equipped gates. Version B further optimized the feedback process without using reset- TH TH References: [] Karl M. Fant and Scott A. Brandt, Convention Logic: A Complete and Consistent Logic for Asynchronous Digital Circuit Synthesis, 996 International Conference on Application Specific Systems, Architectures, and Processors, pp [] C. L. Seitz, System Timing, in Introduction to VLSI Systems, Addison-Wesley, 98, pp 8-6 [] A. J. Martin, Programming in VLSI, in Development in Concurrency and Communication, Addison-Wesley, 99, pp 64. [4] K. Van Berkel, Beware the Isochronic Fork, Integration, The VLSI Journal, Vol., No., 99, pp -8. [5] D. E. Muller, Asynchronous Logics and Application to Information Processing, in Switching Theory in Space Technology, Stanford University Press, 96, pp [6] S. C. Smith, R. F. DeMara, J. S. Yuan, D. Ferguson, and D. Lamb, Optimization of Convention Selftimed Circuits, Integration, The VLSI Journal, 4, in press. [7] S. C. Smith, R. F. DeMara, J. S. Yuan, M. Hagedorn and D.Ferguson, Delay-Insensitive Gate-Level Pipelining, Integration, The VLSI Journal, Vol., No., November, pp -.

9 TH TH TH TH R TH R R K O TH R R TH R R O

Designing NULL Convention Combinational Circuits to Fully Utilize Gate-Level Pipelining for Maximum Throughput

Designing NULL Convention Combinational Circuits to Fully Utilize Gate-Level Pipelining for Maximum Throughput Scott C. Smith University of Missouri Rolla, Department of Electrical and Computer Engineering