Asynchronous Circuit Design

Similar documents
Introduction to asynchronous circuit design. Motivation

TIMA Lab. Research Reports

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER

Arbiter for an Asynchronous Counterflow Pipeline James Copus

AMULET3i an Asynchronous System-on-Chip

Advances in Designing Clockless Digital Systems

TEMPLATE BASED ASYNCHRONOUS DESIGN

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş

Synthesis of Asynchronous Logic Design: A Study of Current Testing Practices. EXTRA CREDIT PROJECT EE552 Advanced Logic Design and Switching Theory

Globally Asynchronous Locally Synchronous FPGA Architectures

The design of a simple asynchronous processor

Implementation of ALU Using Asynchronous Design

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Lecture 23. Finish-up buses Storage

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

Lecture 25: Busses. A Typical Computer Organization

ECE 341 Final Exam Solution

Lecture 1 Introduction to Microprocessors

An Asynchronous NoC Router in a 14nm FinFET Library: Comparison to an Industrial Synchronous Counterpart

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

Embedded Systems Ch 15 ARM Organization and Implementation

Computer Systems Organization

Chapter 4. The Processor

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

DYNAMIC ASYNCHRONOUS LOGIC FOR HIGH SPEED CMOS SYSTEMS. I. Introduction

The Nios II Family of Configurable Soft-core Processors

A Comparative Power Analysis of an Asynchronous Processor

One and a half hours. Section A is COMPULSORY UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE

Homeschool Enrichment. The System Unit: Processing & Memory

ECE 485/585 Microprocessor System Design

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Embedded Computing Platform. Architecture and Instruction Set

CpE 442. Memory System

The D igital Digital Logic Level Chapter 3 1

CPE300: Digital System Architecture and Design

CS310 Embedded Computer Systems. Maeng

The University of Manchester Department of Computer Science

EEL 4783: HDL in Digital System Design

Chapter 9: A Closer Look at System Hardware

Chapter 9: A Closer Look at System Hardware 4

The Memory Component

1.3 Data processing; data storage; data movement; and control.

L2: Design Representations

CS152 Computer Architecture and Engineering Lecture 16: Memory System

EECS 579: Built-in Self-Test 3. Regular Circuits

Chapter 1: Introduction to the Microprocessor and Computer 1 1 A HISTORICAL BACKGROUND

CHAPTER 4 MARIE: An Introduction to a Simple Computer

Chapter Seven Morgan Kaufmann Publishers

CS146 Computer Architecture. Fall Midterm Exam

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

Hardware Design with VHDL PLDs IV ECE 443

Lecture #1: Introduction

COMP2121: Microprocessors and Interfacing. Introduction to Microprocessors

Advances in Designing Clockless Digital Systems

AMULET2e. AMULET group. What is it?

University of Alexandria Faculty of Engineering Division of Communications & Electronics

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China

Write only as much as necessary. Be brief!

EECS Dept., University of California at Berkeley. Berkeley Wireless Research Center Tel: (510)

VHDL for Synthesis. Course Description. Course Duration. Goals

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

COMPUTER ORGANIZATION AND DESIGN

von Neumann Architecture Basic Computer System Early Computers Microprocessor Reading Assignment An Introduction to Computer Architecture

Basic Computer System. von Neumann Architecture. Reading Assignment. An Introduction to Computer Architecture. EEL 4744C: Microprocessor Applications

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

CHAPTER 1 INTRODUCTION

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

Low Power and Energy Efficient Asynchronous Design

Computer Architecture!

EITF20: Computer Architecture Part4.1.1: Cache - 2

Introduction to ICs and Transistor Fundamentals


Memory Systems IRAM. Principle of IRAM

EE 3170 Microcontroller Applications

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM

Pad Ring and Floor Planning

Advanced Computer Architecture

Computer Architecture!

One and a half hours. Section A is COMPULSORY UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE

Computer Architecture

Microcomputers. Outline. Number Systems and Digital Logic Review

Random Access Memory (RAM)

CSE 140L Final Exam. Prof. Tajana Simunic Rosing. Spring 2008

A 167-processor 65 nm Computational Platform with Per-Processor Dynamic Supply Voltage and Dynamic Clock Frequency Scaling

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor

Unleashing the Power of Embedded DRAM

Chapter 2 Logic Gates and Introduction to Computer Architecture

CS2204 DIGITAL LOGIC & STATE MACHINE DESIGN FALL 2005

Chapter 4. MARIE: An Introduction to a Simple Computer

Lecture 8: Synthesis, Implementation Constraints and High-Level Planning

Chapter 4. The Processor

How What When Why CSC3501 FALL07 CSC3501 FALL07. Louisiana State University 1- Introduction - 1. Louisiana State University 1- Introduction - 2

Digital Semiconductor. StrongARMARM

Overview ECE 753: FAULT-TOLERANT COMPUTING 1/21/2014. Recap. Fault Modeling. Fault Modeling (contd.) Fault Modeling (contd.)

Chapter 4. Chapter 4 Objectives

Introduction to Asynchronous Circuits and Systems

Transcription:

Asynchronous Circuit Design Chris J. Myers Lecture 9: Applications Chapter 9 Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 1 / 60

Overview A brief history of asynchronous circuit design Intel s RAPPID Performance analysis Testing asynchronous circuits The synchronization problem Arbitration The future of asynchronous circuit design Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 2 / 60

Early Mainframes In 50s and 60s, asynchronous design used in many mainframe computers, including ILLIAC and ILLIAC II designed at U. of Illinois and the Atlas and MU-5 designed at U. of Manchester. ILLIAC and ILLIAC II designed using speed-independent design by Muller and his colleagues. ILLIAC, completed in 1952, was 10 feet long, 2 feet wide, 8 1 feet high, 2 contained 2800 vacuum tubes, and weighed 5 tons. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 3 / 60

ILLIAC Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 4 / 60

ILLIAC II ILLIAC II completed in 1962 was 100 times faster than its predecessor. Had 55,000 transistors and performed a FP multiply in 6.3 µs. Used three concurrently operating controls: an arithmetic control, interplay control, and Advanced Control. Three controls are largely asynchronous and speed-independent. SI design used to increase reliability and ease of maintenance. Controls collected reply signals to indicate that all operations for current step are complete before going on to next step. Arithmetic unit was not speed-independent, as it would increase its complexity and cost while decreasing its speed. Electromechanical peripheral devices were also not speed-independent, since they were inherently synchronous. Used until 1967, and found prime, 2 11213 1 (over 3000 digits). Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 5 / 60

ILLIAC II Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 6 / 60

ILLIAC II Control Panel Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 7 / 60

Macromodules Developed in 60s and 70s at Washington University in St. Louis. Macromodules are building blocks... from which it is possible for the electronically-naive to construct arbitrarily large and complex computers that work. Asynchronous to allow easy interconnection and to allow designers to worry only about logical and not electrical problems. Used to build macromodular computer systems by placing in a rack and interconnecting with wires. Wires carry data signals as well as a bundled data control signal. Wires also for control to sequence operations. Developed directly from a flowchart description of an algorithm. Many computing engines designed using macromodules. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 8 / 60

Macromodules Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 9 / 60

Macromodules Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 10 / 60

Macromodules Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 11 / 60

Macromodules Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 12 / 60

Macromodules Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 13 / 60

Data Driven Machines Rather than single PC, flow of data controls operation speed. In 70s, first operational dataflow computer, DDM-1, designed at U. of Utah and first commercial graphics system designed at Evans and Sutherland. In the 80s, Matsushita, Sanyo, Sharp, and Mitsubishi designed several data-driven processors. Most recently, a data-driven media processor (DDMP) has been designed at Sharp capable of 2400 million signal processing operations per second while consuming only 1.32 W. Videonics has utilized DDMP design in a high-speed video DSP. Asynchronous design of DDMP is cited to have simplified the board layout and reduced RF interference. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 14 / 60

Caltech s Asynchronous Microprocessors In 1989, designed first fully asynchronous microprocessor. Processor has a 16-bit datapath with 16- and 32-bit instructions. Has twelve 16-bit registers, 4 buses, an ALU, and 2 adders. Consists of 20,000 transistors, fabricated in 2 µm and 1.6 µm, and took five people only five months to design. 2 µm version could perform 12 million ALU instructions per second, while the 1.6 µm version could perform 18 million. Chips operate with VDD from 0.35 to 7 V, and achieve almost double the performance when cooled in liquid nitrogen. Design is entirely quasi-delay insensitive (QDI). Design was derived from a high-level channel description using program transformations to introduce ideas like pipelining. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 15 / 60

Caltech s Asynchronous Microprocessors Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 16 / 60

Caltech s Asynchronous Microprocessors (cont) Also designed first asynchronous GaAs microprocessor. This processor ran at 100 MIPS while consuming 2 W. Designed an asynchronous MIPS R3000 microprocessor. Design fabricated in 0.6 µm CMOS, and it uses 2 million transistors, of which 1.25 million are in its caches. Measured performance ranged from 60 MIPS and 220 mw at 1.5 V and 25 C to 180 MIPS and 4 W at 3.3 V and 25 C. Running Dhrystone, the chip achieved about 185 MHz at 3.3 V. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 17 / 60

AMULET AMULET1 completed in 1994 at U. of Manchester was the first asynchronous processor to be code-compatible with an existing synchronous processor, the ARM. Design style followed Sutherland s two-phase micropipeline idea. Chip fabricated in a 1 µm and a 0.7 µm process. Performance was measured for the 1 µm part from 3.5 to 6 V completing between 15 and 25 thousand Dhrystones per second. MIPS/watt value was also measured to be from 175 down to 50. Chip operates correctly between 50 C and 120 C. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 18 / 60

AMULET (cont) AMULET2e, targeting embedded systems, contained an AMULET2 core with a cache/ram, a memory interface, and other control functions on chip. This design used four-phase bundled-data style. Design was fabricated in 0.5 µm CMOS, and its measured performance at 3.3 V was 74 kdhrystones, which is roughly equivalent to 42 MIPS and consumed 150 mw. The radio-frequency emission spectrum or EMC was shown to be significantly spread as compared with a clocked system. When processor enters halt mode, power is under 0.1 mw. AMULET3 has incorporated ARM thumb code and new architectural features to improve performance. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 19 / 60

AMULET 3 Microprocessor Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 20 / 60

SUN Pipelines In 1994, group at SUN suggested replicating synchronous architecture may not be the best for asynchronous design. Suggested a radically different architecture, the counterflow pipeline in which instructions are injected up the pipeline while contents of registers are injected down. When instruction meets register values, computes a result. Key to the performance of such a design are circuits to move data very quickly. This group has designed several very fast FIFO circuits. Test chips fabricated in 0.6 µm have a maximum throughput of between 1.1 and 1.7 Giga data items per second. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 21 / 60

Designs at Philips Research Laboratories Designed many asynchronous designs targeting low power. Developed design procedure from specification in TANGRAM to a chip, and applied to several commercially interesting designs. In 1994, produced an error corrector chip for a DCC player that consumed only 10 mw at 5 V, 1 of synchronous counterpart. 5 Design also required only 20 percent more area. In 1997, designed standby circuits for a pager decoder which uses 4 times less power while 40% larger than synchronous design. Results in a 37 percent decrease in power for entire pager. In 1998, this group designed an 80C51 microcontroller. Chip in 0.5 µm is 3 to 4 times more power efficient than synchronous counterpart, consuming only 9 mw at 4 MIPS. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 22 / 60

Designs at Philips Research Laboratories Most notable accomplishment is a fully asynchronous pager sold by Philips using standby circuits and 80C51 microcontroller. Major reason Philips uses asynchronous pager is it has a more evenly spread emission spectrum over the frequency range. Interference at clock frequency and harmonics requires digital circuitry in synchronous design to be shut off as message arrives. Spread-spectrum emission for asynchronous design allows digital circuitry to remain active as the message is received. Permits pager to be capable of being universal in that it can accept all three of the international pager standards. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 23 / 60

Epson s Flexible 8-bit Asynchronous Microcontroller Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 24 / 60

Fulcrum s 1GHz Processor using Asynchronous Circuits Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 25 / 60

Cornel s Asynchronous FPGAs (Achronix) Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 26 / 60

An Asynchronous Instruction-Length Decoder RAPPID (Revolving Asynchronous Pentium Processor Instruction Decoder) is a fully asynchronous instruction-length decoder for the complete PentiumII 32-bit MMX instruction set. RAPPID project conducted at Intel between 1995 and 1999. Achieved 3 fold improvement in speed and 2 fold improvement in power compared with existing synchronous design. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 27 / 60

x86 Instruction-Length Decoding Each instruction can be from 1 to 15 bytes long. To allow concurrent execution of instructions, necessary to rapidly determine positions of each instruction in a cache line. It was the critical bottleneck in this architecture. A partial list of rules to determine instruction length: Opcode can be 1 or 2 bytes. Opcode determines presence of the ModR/M byte. ModR/M determines presence of the SIB byte. ModR/M and SIB set length of displacement field. Opcode determines length of immediate field. Instructions may be preceded by as many as 15 prefix bytes. A prefix may change the length of an instruction. The maximum instruction length is 15 bytes. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 28 / 60

RAPPID Data Instruction Length Statistics 100% 80% 60% Frequency Cummulative 40% 33.2% 14.5% 24.4% 20% 0% 9.8% 6.7% 8.1% 2.8% 0.4% 0.0% 0.1% 0.0% 1 2 3 4 5 6 7 8 9 10 11 100% Opcode Type Statistics 80% 60% Frequency Cummulative 40% 20% 0% Instruction Type Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 29 / 60

RAPPID Microarchitecture Byte Unit (BU) Input FIFO (IF) Column 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Byte Byte Latch Ctrl (BC) Length Decode (LD) Decode and Steer Unit (DU) Row 0 Tag Unit (TU) Steering Switch (SS) Output Buffer Row 1 Tag Unit (TU) Steering Switch (SS) Output Buffer Row 2 Tag Unit (TU) Steering Switch (SS) Output Buffer Row 3 Tag Unit (TU) Steering Switch (SS) Output Buffer Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 30 / 60

Balanced Versus Unbalanced Logic A A B C B C Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 31 / 60

Tag Unit Circuit InstRdy SSRdy Length 1 TagOut 1 Length 2 TagOut 2 Tagln 1 Tagln 2... Tagln 7 TagArrived Length 7...... TagOut 7 Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 32 / 60

RAPPID Test Results Test chip fabricated in May 1998 using a 0.25 µm process. Capable of decoding and steering instructions at a rate of 2.5 to 4.5 instructions per ns. Fastest synchronous 3-issue product in same fabrication process clocked at 400 MHz is capable of only 1.2 instructions per ns. Chip operated correctly between 1.0 and 2.5 V while synchronous design could only tolerate about 1.9 to 2.1 V. Consumes only 1 of energy of clocked design. 2 Found to achieve these gains with only a 22 percent area penalty. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 33 / 60

Performance Analysis For asynchronous design, cannot simply find critical path delay or count number of clock cycles per operation. Worst-case analysis may be quite pessimistic, as goal is to achieve high rates of performance on average. Must take a probabilistic approach to performance analysis. Need to extend model to include a distribution function for each delay. Simple approach is assume delay is uniform in its range. Another possibility is to use a truncated Gaussian. Most direct approach to determine performance analysis is a Monte Carlo simulation. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 34 / 60

Wine Shop Example: Timed Circuit ack_wine req_patron C CSC0 CSC0 ack_wine ack_patron C req_patron req_wine ack_patron req_patron ack_wine Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 35 / 60

Timing Assumptions Assumption Winery delays Patron responds Patron resets Inverter delay AND gate delay OR gate delay C-element delay Delay 2 to 3 minutes 5 to minutes 2 to 3 minutes 0 to 6 seconds 6 to 12 seconds 6 to 12 seconds 12 to 18 seconds Synchronous worst-case cycle time is 18.3 minutes Asynchronous average-case cycle time is 14.2 minutes Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 36 / 60

Testing Asynchronous Circuits (Bad News) Once chip is fabricated, must test it to determine presence of manufacturing defects, or faults, before delivering to consumer. In asynchronous circuits, there is no global clock which can be used to single step the design through a sequence of steps. Asynchronous circuits have more state holding elements, which increases overhead to apply and examine test vectors. Huffman circuits use redundant circuitry to remove hazards which tends to hide some faults, making them untestable. Asynchronous circuits may fail due to glitches caused by delay faults, which are difficult to detect. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 37 / 60

Testing Asynchronous Circuits (Good News) Since many asynchronous styles use handshakes, for many faults, a defective circuit simply halts. In the stuck-at fault model, a defect is assumed to cause a wire to become permanently stuck-at-0 or stuck-at-1. If acknowledge wire is stuck-at-0, the request is never acknowledged, causing the circuit to stop and wait forever. Easy to detect with a timeout. The circuit is said to be self-checking. Delay-insensitive circuits halt in the presence of any stuck-at fault on any wire in the design. Muller circuits halt for any output stuck-at fault. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 38 / 60

Fault Locations Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 39 / 60

Testing Example 1 req_wine1 ack_wine1 ack_wine2 C req_patron1 ack_patron1 ack_patron2 req_wine2 C req_patron2 Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 40 / 60

Testing Example 2 r1 x req_patron req_wine r2 C a1 ack_patron ack_wine x a2 Consider r 1 stuck-at-0 req_wine+, x+, req_patron+ Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 41 / 60

Other Testing Issues In the isochronic fork fault model, faults can be detected on all branches of a nonisochronic fault by the circuit halting, but only on the input to forks that are isochronic. More complex delay fault and bridging fault models have been successfully adapted to asynchronous circuits. Once one selects a fault model, next step is to add circuitry to apply and analyze test vectors, such as scan paths. Must also generate sufficient set of test vectors to guarantee a high degree of coverage of all possible faults in our model. While traditional synchronous test methods don t work off the shelf, many popular methods adapted to asynchronous problem. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 42 / 60

The Synchronization Problem It is difficult to reliably communicate between asynchronous and synchronous modules without substantial latency penalties. Synchronous circuit sampling asynchronous signal changing too close to the clock edge may end up in a metastable state. If this state persists so long that something bad happens, this is called a synchronization failure. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 43 / 60

Metastability Asynchronous Input D Q Synchronized Input CLK Q Voltage D CLK Q Time Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 44 / 60

Flip-Flop Response Time t r t su 0 t h t pd t d Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 45 / 60

Probability of Synchronization Failure Data arrive at time uniformly distributed within clock cycle, T. P(t d [t su,t h ]) = t h t su T Assume flip-flop is given bounded amount of time, t b : (1) P(t r > t b t d [t su,t h ]) = 1 k +(1 k)e (t b t pd )/τ (2) k is a positive fraction less than 1 and τ is a time constant with values on the order of a few picoseconds for modern technologies. Combine Equations 1 and 2 using Bayes rule: P(t r > t b ) = P(t d [t su,t h ]) P(t r > t b t d [t su,t h ]) (3) = t h t su 1 T k +(1 k)e (t b t pd )/τ (4) Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 46 / 60

Probability of Synchronization Failure (cont) If t b t pd 5τ, Equation 4 can be simplified as follows: P(t r > t b ) t h t su T e (t b t pd )/τ 1 k (5) By combining constants, Equation 5 can be changed to P(t r > t b ) T 0 T e t b/τ (6) T 0 and τ scale linearly with feature size. Equation 6 has been verified experimentally and found to be a good estimate as long as t b is not too close to t pd. There is no finite value of t b such that P(t r > t b ) = 0. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 47 / 60

Synchronization Error A synchronization error occurs when t r is greater than time available to respond, t a. A synchronization failure occurs when there is an inconsistency caused by the error. Expected number of errors is E e (t a ) = P(t r > t a ) λ t (7) where λ is the average rate of change of the signal being sampled and t is the time over which the errors are counted. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 48 / 60

Mean Time Between Failure If we set E e (t a ) to 1, change t to MTBF (mean time between failure), substitute Equation 6 for P(t r > t a ), and rearrange Equation 7, we get MTBF = T et a/τ T 0 λ (8) This equation increases rapidly as t a is increased. Though no absolute bound in which no failure can ever occur, there does exist an engineering bound. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 49 / 60

Double Latch Solution Asynchronous Input D Q D Q Synchronized Input CLK Q Q Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 50 / 60

Reducing the Probability of Failure If n extra latches are added in series with an asynchronous input, the new value of t a is given by t a = t a + n(t t pd ) (9) where T is the clock period and t pd is the propagation delay through the added flip-flops. IMPORTANT: INCORRECT IN THE BOOK. Cost is extra n cycles of delay when communicating data from an asynchronous module to a synchronous module. This scheme only minimizes probability and does not eliminate the possibility of synchronization failure. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 51 / 60

Stoppable Ring Oscillator Clock RUN CLK Odd number of inverters Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 52 / 60

Stoppable Ring Oscillator Clock with ME ACK REQ R1 ME A1 R2 A2 CLK Odd number of inverters Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 53 / 60

Circuit for Mutual Exclusion R1 V1 T1 A2 R2 V2 T2 A1 Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 54 / 60

Basic Module of a GALS Architecture Data in Synchronous Module Data out ReqIn AckIn Local Clock Generator CLK ReqOut AckOut Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 55 / 60

LAGS Architecture Reg. Reg. Reg. Reg. Sync Module Async Module Sync Module Async Module REQ ACK REQ ACK Handshake Controller Handshake Controller Async / Sync Interface Controller Stoppable Clock RUN CLK Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 56 / 60

Arbitration Arbitration necessary when two or more modules want mutually exclusive access to a shared resource. If modules are asynchronous, arbitration also cannot be guaranteed in a bounded amount of time. If all modules are asynchronous, it is possible to build an arbiter with zero probability of failure. Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 57 / 60

Circuit for Arbitration A1 R1 C A3 R2 ME R3 A2 C Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 58 / 60

Four-Way Arbiters R1 A1 R2 A2 R3 A3 R4 A4 R A R1 A1 R2 A2 R3 A3 R4 A4 R A Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 59 / 60

Future of Asynchronous Design Papers in 60s and 70s cite same advantages of asynchronous design used in papers today. Synchronous design has been so simple to understand and use. Asynchronous revolution may be quite gradual. Even if it never happens asynchronous research has been useful. Hopefully when you reach industry, you will consider using asynchronous design! Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 60 / 60