Asynchronous Logic: A Computer Systems Perspective. Rajit Manohar Computer Systems Lab, Cornell

Size: px

Start display at page:

Download "Asynchronous Logic: A Computer Systems Perspective. Rajit Manohar Computer Systems Lab, Cornell"

Prosper Dean
6 years ago
Views:

1 Asynchronous Logic: A Computer Systems Perspective Rajit Manohar Computer Systems Lab, Cornell

2 Approaches to computation Discrete Value Continuous Value Discrete Time Digital, synchronous logic Switched-capacitor analog Continuous Time Digital, asynchronous logic General analog computation

3 Asynchronous logic sender clock data receiver signal value time sampling window sender data 0 data 1 ack receiver sender req data ack receiver data 1 data 0 X X X X data request acknowledge acknowledge

4 Explicit versus implicit synchronization A B C A B C A1 B1 C1 A1 B1 C1 A2 B2 C2 A2 B2 C2 A3 B3 C3 time A3 B3 C3 C4 time A4 B4 C4 A4 B4 Synchronous computation Asynchronous computation

5 Differences at a glance Asynchronous Continuous time computation. Observed delay is the average over possible circuit paths. Local switching from data-driven computation. Can lead to low-power operation. Throughput determined by device speed. (Margins needed for some circuit families.) Data must be encoded, requiring additional wires for signals. Synchronous Discrete time computation. Observed delay is the maximum over possible circuit paths. Global activity from clock-driven computation. Low power requires careful clock gating. Throughput determined by device speed and additional operating margins. Unencoded data acceptable due to global clock network.

6 A little bit of (biased) history Binary addition (von Neumann) Switching theory (Miller) Illiac II (UIUC/Muller) Macro modular systems (Clarke, Molnar, Stucki, ) Atlas, MU5 (Manchester) Flow tables (Huffman) System Timing (Seitz, in Mead/Conway) Trace theory (Snepscheut) Syntactic compilation (Martin, Rem) ARM with caches (Furber) Pipelined FPGAs (Manohar) HiAER (Cauwenberghs) FACETS (Heidelberg) TrueNorth (IBM/Cornell) Micro pipelines (Sutherland) Asynchronous CPU (Martin) General hazard-free synthesis AER (Mead lab) 17FO4 MIPS CPU (Martin) Commercial 80C51 (Phillips) UltraSparc IIIi mem controller (Sun) Sensory systems (retina, cochlea, etc.) Event-driven CPU (Manohar) 10G Ethernet switch (Fulcrum) SpiNNaker (Furber) Neurogrid (Boahen)

7 I. Exploiting the gap between average and max delay Slow part: computing carries but only in the worst-case! Theorem [von Neumann, 1946]. The average-case latency for a ripple carry binary adder is O(log N) for i.i.d. inputs A.W. Burks, H.H. Goldstein, J. von Neumann. Preliminary discussion of the logical design of an electronic computing instrument. (1946)

8 I. Exploiting the gap between average and max delay Standard adder tree O(log N) latency Hybrid adder tree Tree adder + ripple adder Theorem [Winograd, 1965]. The worst-case latency for binary addition is Ω(log N) Theorem. The average-case latency of the hybrid asynchronous adder is O(log log N) for i.i.d. inputs Theorem. The average-case latency of the hybrid asynchronous adder is optimal for any input distribution S.O. Winograd. On the time required to perform addition. JACM (1965) R. Manohar and J. Tierno. Asynchronous parallel prefix computation. IEEE Transactions on Computers (1998)

9 II. Exploiting data-driven power management Low power FIR filter External, synchronous I/O Monitor occupancy of FIFOs Speed up/slow down using adaptive power supply Break-point add/multiply Detect when top half of operand is required Dynamically switch between full width and half width arithmetic State DC/DC converter Vdd FIR filter L. Nielsen and J. Sparso. A Low-power Asynchronous Datapath for a FIR filter bank. Proc. ASYNC (1996)

II. Exploiting data-driven power management Width-adaptive numbers Compress leading 0 s and 1 s 8 bits per VDW block 0 0 0 0 0 0 1 + 0 0

30 20 10 0 Average: 8.4 Average: 8.8 5 10 SPEC: 124.m88ksim Average: 10.

10 II. Exploiting data-driven power management Width-adaptive numbers Compress leading 0 s and 1 s 8 bits per VDW block v/s sender receiver 128 bit datapath Cumulative Distribution Function (%) Average: 8.4 Average: SPEC: 124.m88ksim Average: 10.9 no compaction simple compaction full compaction Operand Widths Number: receiver sender Number: R. Manohar. Width-Adaptive Data Word Architectures. Proc. ARVLSI (2001)

11 III. Exploiting elasticity of pipelines f g h Asynchronous computation is* robust to changes in circuitlevel pipelining! f g h *R. Manohar, A. J. Martin. Slack Elasticity in Concurrent Computing. Proc. Mathematics of Program Construction (1998)

12 III. Exploiting elasticity of pipelines Asynchronous FIR filter Data arrives at variable rates Data rates are predictable Fixed amount of time for async computation Variable number of clock cycles for the fixed time budget M. Singh, J.A. Tierno, A. Rylyakov, S. Rylov, S.M. Nowick. An adaptively pipelined mixed synchronous-asynchronous digital FIR filter chip operating at 1.3 GHz. Proc. ASYNC (2002)

13 III. Exploiting elasticity of pipelines FPGA throughput low due to overhead of programmable interconnect LB CB LB CB CB SB CB SB ~3x improvement in peak throughput by transparent interconnect pipelining LB CB LB CB CB SB CB SB J. Teifel and R. Manohar. Highly pipelined asynchronous FPGAs. Proc. FPGA (2004)

14 IV. Exploiting timing robustness Asynchronous FPGA Test Data Throughput (MHz) Process: TSMC 0.18um Nominal Vdd: 1.8V 77K 12K K K 200 Xilinx Virtex Voltage (V) D. Fang, J. Teifel, and R. Manohar. A High-Performance Asynchronous FPGA: Test Results. Proc. FCCM (2005)

15 IV. Exploiting timing robustness GasP logic family 6-10 FO4 Track reset Self-reset w Single-track control wire No margins on control signals RA(i) RA(i+1) Standard datapath Wait for L set, R reset Track set Latest 40nm chip operates >6 GHz Datapath driver w I. Sutherland, S. Fairbanks. GasP: a minimal FIFO control. Proc. ASYNC (2001)

V. Exploiting low noise and low EMI time domain frequency domain 0 100 200 300 400 MHz 0 100 200 300 400 MHz Synchronous 80C51 (Phillips) Asynchronous

16 V. Exploiting low noise and low EMI time domain frequency domain MHz MHz Synchronous 80C51 (Phillips) Asynchronous 80C51 (Phillips) J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous Circuits in Contactless Smart Cards. Proc. ASYNC (2000)

VI. Exploiting continuous-time information processing Continuous-time digital signal processing Automatic adaptation to input bandwidth No aliasing from sampling process

17 VI. Exploiting continuous-time information processing Continuous-time digital signal processing Automatic adaptation to input bandwidth No aliasing from sampling process Nyquist sampling Level-crossing sampling B. Schell, Y. Tsividis. A clockless adc/dsp/dac system with activitydependent power dissipation and no aliasing. Proc. ISSCC (2008)

18 VI. Exploiting continuous-time information processing Address-event representation in neuromorphic systems Time represents itself Spike timing and coincidence for information processing Asynchronous logic Temporal resolution is decoupled from throughput Automatic power management Projects Silicon retinas, cochleas, More recent: FACETS, HiAER, Neurogrid, neurop, SpiNNaker, TrueNorth

19 A note on oscillations in asynchronous logic Well-known result in asynchronous circuit theory Signal transitions: look at the i th occurrence of transition t Then: time(t, i) = the actual time of this event time 0 (t, i) =f(t)+p i time(t, i) time 0 (t, i) <B Bounded, independent of i, and t If the circuit is strongly connected, this result holds More recently: stronger periodicity under more general conditions J. Magott. Performance evaluation of concurrent systems using petri nets. IPL (1984) S.M. Burns, A.J. Martin. Performance Analysis and Optimization of Asynchronous Circuits. Proc. ARVLSI (1990) W. Hua and R. Manohar. Exact timing analysis for concurrent systems. To appear, work-in-progress session, DAC (2016)

Asynchronous logic in TrueNorth Spikes trigger activity so they should be asynchronous Neurons leak periodically so add a periodic event to the system (the clock) At the external timing tick : Update

20 Asynchronous logic in TrueNorth Spikes trigger activity so they should be asynchronous Neurons leak periodically so add a periodic event to the system (the clock) At the external timing tick : Update each neuron, delivering spikes plus any local state update If the neuron spikes, send output spike to destinations, with specified delay Spikes travel through asynchronous network Receiver buffers spikes until it is time to deliver them Digital, asynchronous logic Scheduler Router Token Control Neuron SRAM

21 Asynchronous logic in TrueNorth Spike communication network Links are a shared resource Links might be contended Spike delivery times can differ based on traffic Different runs of the same network might produce different spike patterns Instead, we use a spike scheduler Guarantees deterministic computation and hence, repeatability

22 Summary Asynchronous logic has a long history Some key properties exploited in projects Gap between average and maximum delay for a computation Automatic data-driven power management Elastic pipelining Robustness to timing uncertainty/variation Reduced EMI and noise Continuous-time information representation

23 Acknowledgments The async community Students: Ned Bingham, Wenmian Hua, Sean Ogden, Praful Purohit, Tayyar Rzayev, Nitish Srivastava John Teifel, Clinton Kelly, Virantha Ekanayake, Song Peng, David Biermann, David Fang, Chris LaFrieda, Filipp Akopyan, Basit Riaz Sheikh, Benjamin Tang, Nabil Imam, Sandra Jackson, Carlos Otero, Benjamin Hill, Stephen Longfield, Jonathan Tse, Robert Karmazin Collaborators: General: David Albonesi, Sunil Bhave, Francois Guimbretiere, Amit Lal, Yoram Moses, Ivan Sutherland, Yannis Tsividis Neuromorphic: John Arthur, Kwabena Boahen, Tobi Delbruck, Chris Eliasmith, Giacomo Indiveri, Paul Merolla, Dharmendra Modha Support: AFRL, DARPA, IARPA, NSF, ONR, IBM, Lockheed, Qualcomm, Samsung, SRC

Advances in Designing Clockless Digital Systems

Advances in Designing Clockless Digital Systems Prof. Steven M. Nowick nowick@cs.columbia.edu Department of Computer Science (and Elect. Eng.) Columbia University New York, NY, USA Introduction l Synchronous