Asynchronous Logic: A Computer Systems Perspective. Rajit Manohar Computer Systems Lab, Cornell

Size: px
Start display at page:

Download "Asynchronous Logic: A Computer Systems Perspective. Rajit Manohar Computer Systems Lab, Cornell"

Transcription

1 Asynchronous Logic: A Computer Systems Perspective Rajit Manohar Computer Systems Lab, Cornell

2 Approaches to computation Discrete Value Continuous Value Discrete Time Digital, synchronous logic Switched-capacitor analog Continuous Time Digital, asynchronous logic General analog computation

3 Asynchronous logic sender clock data receiver signal value time sampling window sender data 0 data 1 ack receiver sender req data ack receiver data 1 data 0 X X X X data request acknowledge acknowledge

4 Explicit versus implicit synchronization A B C A B C A1 B1 C1 A1 B1 C1 A2 B2 C2 A2 B2 C2 A3 B3 C3 time A3 B3 C3 C4 time A4 B4 C4 A4 B4 Synchronous computation Asynchronous computation

5 Differences at a glance Asynchronous Continuous time computation. Observed delay is the average over possible circuit paths. Local switching from data-driven computation. Can lead to low-power operation. Throughput determined by device speed. (Margins needed for some circuit families.) Data must be encoded, requiring additional wires for signals. Synchronous Discrete time computation. Observed delay is the maximum over possible circuit paths. Global activity from clock-driven computation. Low power requires careful clock gating. Throughput determined by device speed and additional operating margins. Unencoded data acceptable due to global clock network.

6 A little bit of (biased) history Binary addition (von Neumann) Switching theory (Miller) Illiac II (UIUC/Muller) Macro modular systems (Clarke, Molnar, Stucki, ) Atlas, MU5 (Manchester) Flow tables (Huffman) System Timing (Seitz, in Mead/Conway) Trace theory (Snepscheut) Syntactic compilation (Martin, Rem) ARM with caches (Furber) Pipelined FPGAs (Manohar) HiAER (Cauwenberghs) FACETS (Heidelberg) TrueNorth (IBM/Cornell) Micro pipelines (Sutherland) Asynchronous CPU (Martin) General hazard-free synthesis AER (Mead lab) 17FO4 MIPS CPU (Martin) Commercial 80C51 (Phillips) UltraSparc IIIi mem controller (Sun) Sensory systems (retina, cochlea, etc.) Event-driven CPU (Manohar) 10G Ethernet switch (Fulcrum) SpiNNaker (Furber) Neurogrid (Boahen)

7 I. Exploiting the gap between average and max delay Slow part: computing carries but only in the worst-case! Theorem [von Neumann, 1946]. The average-case latency for a ripple carry binary adder is O(log N) for i.i.d. inputs A.W. Burks, H.H. Goldstein, J. von Neumann. Preliminary discussion of the logical design of an electronic computing instrument. (1946)

8 I. Exploiting the gap between average and max delay Standard adder tree O(log N) latency Hybrid adder tree Tree adder + ripple adder Theorem [Winograd, 1965]. The worst-case latency for binary addition is Ω(log N) Theorem. The average-case latency of the hybrid asynchronous adder is O(log log N) for i.i.d. inputs Theorem. The average-case latency of the hybrid asynchronous adder is optimal for any input distribution S.O. Winograd. On the time required to perform addition. JACM (1965) R. Manohar and J. Tierno. Asynchronous parallel prefix computation. IEEE Transactions on Computers (1998)

9 II. Exploiting data-driven power management Low power FIR filter External, synchronous I/O Monitor occupancy of FIFOs Speed up/slow down using adaptive power supply Break-point add/multiply Detect when top half of operand is required Dynamically switch between full width and half width arithmetic State DC/DC converter Vdd FIR filter L. Nielsen and J. Sparso. A Low-power Asynchronous Datapath for a FIR filter bank. Proc. ASYNC (1996)

10 II. Exploiting data-driven power management Width-adaptive numbers Compress leading 0 s and 1 s 8 bits per VDW block v/s sender receiver 128 bit datapath Cumulative Distribution Function (%) Average: 8.4 Average: SPEC: 124.m88ksim Average: 10.9 no compaction simple compaction full compaction Operand Widths Number: receiver sender Number: R. Manohar. Width-Adaptive Data Word Architectures. Proc. ARVLSI (2001)

11 III. Exploiting elasticity of pipelines f g h Asynchronous computation is* robust to changes in circuitlevel pipelining! f g h *R. Manohar, A. J. Martin. Slack Elasticity in Concurrent Computing. Proc. Mathematics of Program Construction (1998)

12 III. Exploiting elasticity of pipelines Asynchronous FIR filter Data arrives at variable rates Data rates are predictable Fixed amount of time for async computation Variable number of clock cycles for the fixed time budget M. Singh, J.A. Tierno, A. Rylyakov, S. Rylov, S.M. Nowick. An adaptively pipelined mixed synchronous-asynchronous digital FIR filter chip operating at 1.3 GHz. Proc. ASYNC (2002)

13 III. Exploiting elasticity of pipelines FPGA throughput low due to overhead of programmable interconnect LB CB LB CB CB SB CB SB ~3x improvement in peak throughput by transparent interconnect pipelining LB CB LB CB CB SB CB SB J. Teifel and R. Manohar. Highly pipelined asynchronous FPGAs. Proc. FPGA (2004)

14 IV. Exploiting timing robustness Asynchronous FPGA Test Data Throughput (MHz) Process: TSMC 0.18um Nominal Vdd: 1.8V 77K 12K K K 200 Xilinx Virtex Voltage (V) D. Fang, J. Teifel, and R. Manohar. A High-Performance Asynchronous FPGA: Test Results. Proc. FCCM (2005)

15 IV. Exploiting timing robustness GasP logic family 6-10 FO4 Track reset Self-reset w Single-track control wire No margins on control signals RA(i) RA(i+1) Standard datapath Wait for L set, R reset Track set Latest 40nm chip operates >6 GHz Datapath driver w I. Sutherland, S. Fairbanks. GasP: a minimal FIFO control. Proc. ASYNC (2001)

16 V. Exploiting low noise and low EMI time domain frequency domain MHz MHz Synchronous 80C51 (Phillips) Asynchronous 80C51 (Phillips) J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous Circuits in Contactless Smart Cards. Proc. ASYNC (2000)

17 VI. Exploiting continuous-time information processing Continuous-time digital signal processing Automatic adaptation to input bandwidth No aliasing from sampling process Nyquist sampling Level-crossing sampling B. Schell, Y. Tsividis. A clockless adc/dsp/dac system with activitydependent power dissipation and no aliasing. Proc. ISSCC (2008)

18 VI. Exploiting continuous-time information processing Address-event representation in neuromorphic systems Time represents itself Spike timing and coincidence for information processing Asynchronous logic Temporal resolution is decoupled from throughput Automatic power management Projects Silicon retinas, cochleas, More recent: FACETS, HiAER, Neurogrid, neurop, SpiNNaker, TrueNorth

19 A note on oscillations in asynchronous logic Well-known result in asynchronous circuit theory Signal transitions: look at the i th occurrence of transition t Then: time(t, i) = the actual time of this event time 0 (t, i) =f(t)+p i time(t, i) time 0 (t, i) <B Bounded, independent of i, and t If the circuit is strongly connected, this result holds More recently: stronger periodicity under more general conditions J. Magott. Performance evaluation of concurrent systems using petri nets. IPL (1984) S.M. Burns, A.J. Martin. Performance Analysis and Optimization of Asynchronous Circuits. Proc. ARVLSI (1990) W. Hua and R. Manohar. Exact timing analysis for concurrent systems. To appear, work-in-progress session, DAC (2016)

20 Asynchronous logic in TrueNorth Spikes trigger activity so they should be asynchronous Neurons leak periodically so add a periodic event to the system (the clock) At the external timing tick : Update each neuron, delivering spikes plus any local state update If the neuron spikes, send output spike to destinations, with specified delay Spikes travel through asynchronous network Receiver buffers spikes until it is time to deliver them Digital, asynchronous logic Scheduler Router Token Control Neuron SRAM

21 Asynchronous logic in TrueNorth Spike communication network Links are a shared resource Links might be contended Spike delivery times can differ based on traffic Different runs of the same network might produce different spike patterns Instead, we use a spike scheduler Guarantees deterministic computation and hence, repeatability

22 Summary Asynchronous logic has a long history Some key properties exploited in projects Gap between average and maximum delay for a computation Automatic data-driven power management Elastic pipelining Robustness to timing uncertainty/variation Reduced EMI and noise Continuous-time information representation

23 Acknowledgments The async community Students: Ned Bingham, Wenmian Hua, Sean Ogden, Praful Purohit, Tayyar Rzayev, Nitish Srivastava John Teifel, Clinton Kelly, Virantha Ekanayake, Song Peng, David Biermann, David Fang, Chris LaFrieda, Filipp Akopyan, Basit Riaz Sheikh, Benjamin Tang, Nabil Imam, Sandra Jackson, Carlos Otero, Benjamin Hill, Stephen Longfield, Jonathan Tse, Robert Karmazin Collaborators: General: David Albonesi, Sunil Bhave, Francois Guimbretiere, Amit Lal, Yoram Moses, Ivan Sutherland, Yannis Tsividis Neuromorphic: John Arthur, Kwabena Boahen, Tobi Delbruck, Chris Eliasmith, Giacomo Indiveri, Paul Merolla, Dharmendra Modha Support: AFRL, DARPA, IARPA, NSF, ONR, IBM, Lockheed, Qualcomm, Samsung, SRC

Advances in Designing Clockless Digital Systems

Advances in Designing Clockless Digital Systems Advances in Designing Clockless Digital Systems Prof. Steven M. Nowick nowick@cs.columbia.edu Department of Computer Science (and Elect. Eng.) Columbia University New York, NY, USA Introduction l Synchronous

More information

POWER consumption has become one of the most important

POWER consumption has become one of the most important 704 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 4, APRIL 2004 Brief Papers High-Throughput Asynchronous Datapath With Software-Controlled Voltage Scaling Yee William Li, Student Member, IEEE, George

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

A Energy-Efficient Pipeline Templates for High-Performance Asynchronous Circuits

A Energy-Efficient Pipeline Templates for High-Performance Asynchronous Circuits A Energy-Efficient Pipeline Templates for High-Performance Asynchronous Circuits Basit Riaz Sheikh and Rajit Manohar, Cornell University We present two novel energy-efficient pipeline templates for high

More information

An Asynchronous Floating-Point Multiplier

An Asynchronous Floating-Point Multiplier An Asynchronous Floating-Point Multiplier Basit Riaz Sheikh and Rajit Manohar Computer Systems Lab Cornell University http://vlsi.cornell.edu/ The person that did the work! Motivation Fast floating-point

More information

Reconfigurable Asynchronous Logic

Reconfigurable Asynchronous Logic Reconfigurable Asynchronous Logic Rajit Manohar Computer Systems Laboratory Cornell University, Ithaca, NY 14853. Abstract Challenges in mapping asynchronous logic to a flexible substrate include developing

More information

SpiNNaker - a million core ARM-powered neural HPC

SpiNNaker - a million core ARM-powered neural HPC The Advanced Processor Technologies Group SpiNNaker - a million core ARM-powered neural HPC Cameron Patterson cameron.patterson@cs.man.ac.uk School of Computer Science, The University of Manchester, UK

More information

Advances in Designing Clockless Digital Systems

Advances in Designing Clockless Digital Systems Advances in Designing Clockless Digital Systems Prof. Steven M. Nowick nowick@cs.columbia columbia.edu Department of Computer Science (and Elect. Eng.) Columbia University New York, NY, USA Introduction

More information

A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM

A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM Mansi Jhamb, Sugam Kapoor USIT, GGSIPU Sector 16-C, Dwarka, New Delhi-110078, India Abstract This paper demonstrates an asynchronous

More information

Single-Track Asynchronous Pipeline Templates Using 1-of-N Encoding

Single-Track Asynchronous Pipeline Templates Using 1-of-N Encoding Single-Track Asynchronous Pipeline Templates Using 1-of-N Encoding Marcos Ferretti, Peter A. Beerel Department of Electrical Engineering Systems University of Southern California Los Angeles, CA 90089

More information

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER 84 CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER 3.1 INTRODUCTION The introduction of several new asynchronous designs which provides high throughput and low latency is the significance of this chapter. The

More information

High Performance Asynchronous Circuit Design Method and Application

High Performance Asynchronous Circuit Design Method and Application High Performance Asynchronous Circuit Design Method and Application Charlie Brej School of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK. cbrej@cs.man.ac.uk Abstract

More information

Advances in Designing Clockless Digital Systems

Advances in Designing Clockless Digital Systems Advances in Designing Clockless Digital Systems Prof. Steven M. Nowick nowick@cs.columbia columbia.edu Department of Computer Science (and Elect. Eng.) Columbia University New York, NY, USA Introduction

More information

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism

More information

GRE Architecture Session

GRE Architecture Session GRE Architecture Session Session 2: Saturday 23, 1995 Young H. Cho e-mail: youngc@cs.berkeley.edu www: http://http.cs.berkeley/~youngc Y. H. Cho Page 1 Review n Homework n Basic Gate Arithmetics n Bubble

More information

Modified Micropipline Architecture for Synthesizable Asynchronous FIR Filter Design

Modified Micropipline Architecture for Synthesizable Asynchronous FIR Filter Design Modified Micropipline Architecture for Synthesizable Asynchronous FIR Filter Design Basel Halak and Hsien-Chih Chiu, ECS, Southampton University, Southampton, SO17 1BJ, United Kingdom Email: {bh9, hc13g09}

More information

Natalie Enright Jerger, Jason Anderson, University of Toronto November 5, 2010

Natalie Enright Jerger, Jason Anderson, University of Toronto November 5, 2010 Next Generation FPGA Research Natalie Enright Jerger, Jason Anderson, and Ali Sheikholeslami l i University of Toronto November 5, 2010 Outline Part (I): Next Generation FPGA Architectures Asynchronous

More information

TIMA Lab. Research Reports

TIMA Lab. Research Reports ISSN 1292-862 TIMA Lab. Research Reports TIMA Laboratory, 46 avenue Félix Viallet, 38000 Grenoble France Session 1.2 - Hop Topics for SoC Design Asynchronous System Design Prof. Marc RENAUDIN TIMA, Grenoble,

More information

A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing

A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing Dean Truong, Wayne Cheng, Tinoosh Mohsenin, Zhiyi Yu, Toney Jacobson, Gouri Landge, Michael Meeuwsen, Christine

More information

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 The Design of High-Performance Dynamic Asynchronous Pipelines: Lookahead Style Montek Singh and Steven M. Nowick Abstract A new class of asynchronous

More information

Implementation of ALU Using Asynchronous Design

Implementation of ALU Using Asynchronous Design IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) ISSN: 2278-2834, ISBN: 2278-8735. Volume 3, Issue 6 (Nov. - Dec. 2012), PP 07-12 Implementation of ALU Using Asynchronous Design P.

More information

TEMPLATE BASED ASYNCHRONOUS DESIGN

TEMPLATE BASED ASYNCHRONOUS DESIGN TEMPLATE BASED ASYNCHRONOUS DESIGN By Recep Ozgur Ozdag A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

FPGA architecture and design technology

FPGA architecture and design technology CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA

More information

16x16 Multiplier Design Using Asynchronous Pipeline Based On Constructed Critical Data Path

16x16 Multiplier Design Using Asynchronous Pipeline Based On Constructed Critical Data Path Volume 4 Issue 01 Pages-4786-4792 January-2016 ISSN (e): 2321-7545 Website: http://ijsae.in 16x16 Multiplier Design Using Asynchronous Pipeline Based On Constructed Critical Data Path Authors Channa.sravya

More information

Introduction to asynchronous circuit design. Motivation

Introduction to asynchronous circuit design. Motivation Introduction to asynchronous circuit design Using slides from: Jordi Cortadella, Universitat Politècnica de Catalunya, Spain Michael Kishinevsky, Intel Corporation, USA Alex Kondratyev, Theseus Logic,

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit

Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit P Ajith Kumar 1, M Vijaya Lakshmi 2 P.G. Student, Department of Electronics and Communication Engineering, St.Martin s Engineering College,

More information

Introduction to Asynchronous Circuits and Systems

Introduction to Asynchronous Circuits and Systems RCIM Presentation Introduction to Asynchronous Circuits and Systems Kristofer Perta April 02 / 2004 University of Windsor Computer and Electrical Engineering Dept. Presentation Outline Section - Introduction

More information

Asynchronous Circuit Design

Asynchronous Circuit Design Asynchronous Circuit Design Chris J. Myers Lecture 9: Applications Chapter 9 Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 1 / 60 Overview A brief history of asynchronous circuit

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

ISSN Vol.08,Issue.07, July-2016, Pages:

ISSN Vol.08,Issue.07, July-2016, Pages: ISSN 2348 2370 Vol.08,Issue.07, July-2016, Pages:1312-1317 www.ijatir.org Low Power Asynchronous Domino Logic Pipeline Strategy Using Synchronization Logic Gates H. NASEEMA BEGUM PG Scholar, Dept of ECE,

More information

The Design and Implementation of a Low-Latency On-Chip Network

The Design and Implementation of a Low-Latency On-Chip Network The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current

More information

Architecture or Parallel Computers CSC / ECE 506

Architecture or Parallel Computers CSC / ECE 506 Architecture or Parallel Computers CSC / ECE 506 Summer 2006 Scalable Programming Models 6/19/2006 Dr Steve Hunter Back to Basics Parallel Architecture = Computer Architecture + Communication Architecture

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

An Asynchronous Array of Simple Processors for DSP Applications

An Asynchronous Array of Simple Processors for DSP Applications An Asynchronous Array of Simple Processors for DSP Applications Zhiyi Yu, Michael Meeuwsen, Ryan Apperson, Omar Sattari, Michael Lai, Jeremy Webb, Eric Work, Tinoosh Mohsenin, Mandeep Singh, Bevan Baas

More information

Embedded Systems: Hardware Components (part I) Todor Stefanov

Embedded Systems: Hardware Components (part I) Todor Stefanov Embedded Systems: Hardware Components (part I) Todor Stefanov Leiden Embedded Research Center Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded System

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 5, Sep-Oct 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 5, Sep-Oct 2014 RESEARCH ARTICLE OPEN ACCESS A Survey on Efficient Low Power Asynchronous Pipeline Design Based on the Data Path Logic D. Nandhini 1, K. Kalirajan 2 ME 1 VLSI Design, Assistant Professor 2 Department of

More information

Recent Advances in Designing Clockless Digital Systems

Recent Advances in Designing Clockless Digital Systems Recent Advances in Designing Clockless Digital Systems Prof. Steven M. Nowick nowick@cs.columbia columbia.edu Chair, Computer Engineering Program Department of Computer Science (and Elect. Eng.) Columbia

More information

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra Binary Representation Computer Systems Information is represented as a sequence of binary digits: Bits What the actual bits represent depends on the context: Seminar 3 Numerical value (integer, floating

More information

POWER ANALYSIS OF CRITICAL PATH DELAY DESIGN USING DOMINO LOGIC

POWER ANALYSIS OF CRITICAL PATH DELAY DESIGN USING DOMINO LOGIC 181 POWER ANALYSIS OF CRITICAL PATH DELAY DESIGN USING DOMINO LOGIC R.Yamini, V.Kavitha, S.Sarmila, Anila Ramachandran,, Assistant Professor, ECE Dept, M.E Student, M.E. Student, M.E. Student Sri Eshwar

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

A Cache Hierarchy in a Computer System

A Cache Hierarchy in a Computer System A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the

More information

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor The A High Performance Out-of-Order Processor Hot Chips VIII IEEE Computer Society Stanford University August 19, 1996 Hewlett-Packard Company Engineering Systems Lab - Fort Collins, CO - Cupertino, CA

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

An Overview of Standard Cell Based Digital VLSI Design

An Overview of Standard Cell Based Digital VLSI Design An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Mapping AER Events to SpiNNaker Multicast Packets

Mapping AER Events to SpiNNaker Multicast Packets SpiNNaker AppNote 8 SpiNNaker - AER interface Page 1 AppNote 8 - Interfacing AER devices to SpiNNaker using an FPGA SpiNNaker Group, School of Computer Science, University of Manchester Luis A. Plana -

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Design of 8 bit Pipelined Adder using Xilinx ISE

Design of 8 bit Pipelined Adder using Xilinx ISE Design of 8 bit Pipelined Adder using Xilinx ISE 1 Jayesh Diwan, 2 Rutul Patel Assistant Professor EEE Department, Indus University, Ahmedabad, India Abstract An asynchronous circuit, or self-timed circuit,

More information

Chapter 03. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Chapter 03. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Chapter 03 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 3.3 Comparison of 2-bit predictors. A noncorrelating predictor for 4096 bits is first, followed

More information

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 3, 2015

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 3, 2015 CS 31: Intro to Systems Digital Logic Kevin Webb Swarthmore College February 3, 2015 Reading Quiz Today Hardware basics Machine memory models Digital signals Logic gates Circuits: Borrow some paper if

More information

A Scalable Counterflow-Pipelined Asynchronous Radix-4 Booth Multiplier Λ

A Scalable Counterflow-Pipelined Asynchronous Radix-4 Booth Multiplier Λ A Scalable Counterflow-Pipelined Asynchronous Radix-4 Booth Multiplier Λ Justin Hensley, Anselmo Lastra and Montek Singh Department of Computer Science University of North Carolina, Chapel Hill, NC 27514,

More information

Verilog Sequential Logic. Verilog for Synthesis Rev C (module 3 and 4)

Verilog Sequential Logic. Verilog for Synthesis Rev C (module 3 and 4) Verilog Sequential Logic Verilog for Synthesis Rev C (module 3 and 4) Jim Duckworth, WPI 1 Sequential Logic Module 3 Latches and Flip-Flops Implemented by using signals in always statements with edge-triggered

More information

More Course Information

More Course Information More Course Information Labs and lectures are both important Labs: cover more on hands-on design/tool/flow issues Lectures: important in terms of basic concepts and fundamentals Do well in labs Do well

More information

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 2, 2016

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 2, 2016 CS 31: Intro to Systems Digital Logic Kevin Webb Swarthmore College February 2, 2016 Reading Quiz Today Hardware basics Machine memory models Digital signals Logic gates Circuits: Borrow some paper if

More information

Emerging computing paradigms: The case of neuromorphic platforms

Emerging computing paradigms: The case of neuromorphic platforms Emerging computing paradigms: The case of neuromorphic platforms Andrea Acquaviva DAUIN Computer and control Eng. Dept. 1 Neuromorphic platforms Neuromorphic platforms objective is to enable simulation

More information

The CoreConnect Bus Architecture

The CoreConnect Bus Architecture The CoreConnect Bus Architecture Recent advances in silicon densities now allow for the integration of numerous functions onto a single silicon chip. With this increased density, peripherals formerly attached

More information

AFRL-RI-RS-TR

AFRL-RI-RS-TR AFRL-RI-RS-TR-2015-047 EXPERIMENTAL 3D ASYNCHRONOUS FIELD PROGRAMMABLE GATE ARRAY (FPGA) CORNELL UNIVERSITY MARCH 2015 FINAL TECHNICAL REPORT APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED STINFO

More information

EECS150 - Digital Design Lecture 17 Memory 2

EECS150 - Digital Design Lecture 17 Memory 2 EECS150 - Digital Design Lecture 17 Memory 2 October 22, 2002 John Wawrzynek Fall 2002 EECS150 Lec17-mem2 Page 1 SDRAM Recap General Characteristics Optimized for high density and therefore low cost/bit

More information

Overview. Implementing Gigabit Routers with NetFPGA. Basic Architectural Components of an IP Router. Per-packet processing in an IP Router

Overview. Implementing Gigabit Routers with NetFPGA. Basic Architectural Components of an IP Router. Per-packet processing in an IP Router Overview Implementing Gigabit Routers with NetFPGA Prof. Sasu Tarkoma The NetFPGA is a low-cost platform for teaching networking hardware and router design, and a tool for networking researchers. The NetFPGA

More information

Multicycle-Path Challenges in Multi-Synchronous Systems

Multicycle-Path Challenges in Multi-Synchronous Systems Multicycle-Path Challenges in Multi-Synchronous Systems G. Engel 1, J. Ziebold 1, J. Cox 2, T. Chaney 2, M. Burke 2, and Mike Gulotta 3 1 Department of Electrical and Computer Engineering, IC Design Research

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory Embedded Systems 8. Hardware Components Lothar Thiele Computer Engineering and Networks Laboratory Do you Remember? 8 2 8 3 High Level Physical View 8 4 High Level Physical View 8 5 Implementation Alternatives

More information

Lecture 3: Modulation & Layering"

Lecture 3: Modulation & Layering Lecture 3: Modulation & Layering" CSE 123: Computer Networks Alex C. Snoeren HW 1 out Today, due 10/09! Lecture 3 Overview" Encoding schemes Shannon s Law and Nyquist Limit Clock recovery Manchester, NRZ,

More information

Interconnecting Components

Interconnecting Components Interconnecting Components Need interconnections between CPU, memory, controllers Bus: shared communication channel Parallel set of wires for data and synchronization of data transfer Can become a bottleneck

More information

Computer Architecture: Dataflow/Systolic Arrays

Computer Architecture: Dataflow/Systolic Arrays Data Flow Computer Architecture: Dataflow/Systolic Arrays he models we have examined all assumed Instructions are fetched and retired in sequential, control flow order his is part of the Von-Neumann model

More information

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding N.Rajagopala krishnan, k.sivasuparamanyan, G.Ramadoss Abstract Field Programmable Gate Arrays (FPGAs) are widely

More information

Structured Datapaths. Preclass 1. Throughput Yield. Preclass 1

Structured Datapaths. Preclass 1. Throughput Yield. Preclass 1 ESE534: Computer Organization Day 23: November 21, 2016 Time Multiplexing Tabula March 1, 2010 Announced new architecture We would say w=1, c=8 arch. March, 2015 Tabula closed doors 1 [src: www.tabula.com]

More information

The design of a simple asynchronous processor

The design of a simple asynchronous processor The design of a simple asynchronous processor SUN-YEN TAN 1, WEN-TZENG HUANG 2 1 Department of Electronic Engineering National Taipei University of Technology No. 1, Sec. 3, Chung-hsiao E. Rd., Taipei,10608,

More information

KiloCore: A 32 nm 1000-Processor Array

KiloCore: A 32 nm 1000-Processor Array KiloCore: A 32 nm 1000-Processor Array Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, Bevan Baas University of California, Davis VLSI Computation

More information

ANEW asynchronous pipeline style, called MOUSETRAP,

ANEW asynchronous pipeline style, called MOUSETRAP, 684 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 6, JUNE 2007 MOUSETRAP: High-Speed Transition-Signaling Asynchronous Pipelines Montek Singh and Steven M. Nowick Abstract

More information

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation Introduction to Electronic Design Automation Model of Computation Jie-Hong Roland Jiang 江介宏 Department of Electrical Engineering National Taiwan University Spring 03 Model of Computation In system design,

More information

ISSN: [Bilani* et al.,7(2): February, 2018] Impact Factor: 5.164

ISSN: [Bilani* et al.,7(2): February, 2018] Impact Factor: 5.164 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY A REVIEWARTICLE OF SDRAM DESIGN WITH NECESSARY CRITERIA OF DDR CONTROLLER Sushmita Bilani *1 & Mr. Sujeet Mishra 2 *1 M.Tech Student

More information

Silicon Auditory Processors as Computer Peripherals

Silicon Auditory Processors as Computer Peripherals Silicon uditory Processors as Computer Peripherals John Lazzaro, John Wawrzynek CS Division UC Berkeley Evans Hall Berkeley, C 94720 lazzaro@cs.berkeley.edu, johnw@cs.berkeley.edu M. Mahowald, Massimo

More information

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema [1] Laila A, [2] Ajeesh R V [1] PG Student [VLSI & ES] [2] Assistant professor, Department of ECE, TKM Institute of Technology, Kollam

More information

Write only as much as necessary. Be brief!

Write only as much as necessary. Be brief! 1 CIS371 Computer Organization and Design Midterm Exam Prof. Martin Thursday, March 15th, 2012 This exam is an individual-work exam. Write your answers on these pages. Additional pages may be attached

More information

Creating a Scalable Microprocessor:

Creating a Scalable Microprocessor: Creating a Scalable Microprocessor: A 16-issue Multiple-Program-Counter Microprocessor With Point-to-Point Scalar Operand Network Michael Bedford Taylor J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B.

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Power dissipation! The VLSI Interconnect Challenge. Interconnect is the crux of the problem. Interconnect is the crux of the problem.

Power dissipation! The VLSI Interconnect Challenge. Interconnect is the crux of the problem. Interconnect is the crux of the problem. The VLSI Interconnect Challenge Avinoam Kolodny Electrical Engineering Department Technion Israel Institute of Technology VLSI Challenges System complexity Performance Tolerance to digital noise and faults

More information

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor

More information

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub.

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub. es > 100 MB/sec Pentium 4 Processor L1 and L2 caches Some slides adapted from lecture by David Culler 3.2 GB/sec Display Memory Controller Hub RDRAM RDRAM Dual Ultra ATA/100 24 Mbit/sec Disks LAN I/O Controller

More information

Asynchronous Circuits: An Increasingly Practical Design Solution

Asynchronous Circuits: An Increasingly Practical Design Solution Asynchronous Circuits: An Increasingly Practical Design Solution Peter A. Beerel Fulcrum Microsystems Calabasas Hills, CA 91301 and Electrical Engineering, Systems Division University of Southern California

More information

AMULET2e. S. B. Furber, J. D. Garside, S. Temple, Department of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK.

AMULET2e. S. B. Furber, J. D. Garside, S. Temple, Department of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK. AMULET2e S. B. Furber, J. D. Garside, S. Temple, Department of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK. P. Day, N. C. Paver, Cogency Technology Inc., Bruntwood

More information

Embedded processors. Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto.

Embedded processors. Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto. Embedded processors Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto.fi Comparing processors Evaluating processors Taxonomy of processors

More information

A 1-GHz Configurable Processor Core MeP-h1

A 1-GHz Configurable Processor Core MeP-h1 A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface

More information

ESE532: System-on-a-Chip Architecture. Today. Programmable SoC. Message. Process. Reminder

ESE532: System-on-a-Chip Architecture. Today. Programmable SoC. Message. Process. Reminder ESE532: System-on-a-Chip Architecture Day 5: September 18, 2017 Dataflow Process Model Today Dataflow Process Model Motivation Issues Abstraction Basic Approach Dataflow variants Motivations/demands for

More information

Final Lecture. A few minutes to wrap up and add some perspective

Final Lecture. A few minutes to wrap up and add some perspective Final Lecture A few minutes to wrap up and add some perspective 1 2 Instant replay The quarter was split into roughly three parts and a coda. The 1st part covered instruction set architectures the connection

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics Computer and Hardware Architecture II Benny Thörnberg Associate Professor in Electronics Parallelism Microscopic vs Macroscopic Microscopic parallelism hardware solutions inside system components providing

More information

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on on-chip Donghyun Kim, Kangmin Lee, Se-joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory, Dept. of EECS, Korea Advanced

More information

EE241 - Spring 2004 Advanced Digital Integrated Circuits

EE241 - Spring 2004 Advanced Digital Integrated Circuits EE24 - Spring 2004 Advanced Digital Integrated Circuits Borivoje Nikolić Lecture 2 Impact of Scaling Class Material Last lecture Class scope, organization Today s lecture Impact of scaling 2 Major Roadblocks.

More information

Field Programmable Gate Array (FPGA)

Field Programmable Gate Array (FPGA) Field Programmable Gate Array (FPGA) Lecturer: Krébesz, Tamas 1 FPGA in general Reprogrammable Si chip Invented in 1985 by Ross Freeman (Xilinx inc.) Combines the advantages of ASIC and uc-based systems

More information

Memory Systems IRAM. Principle of IRAM

Memory Systems IRAM. Principle of IRAM Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several

More information

Babbage Analytical Machine

Babbage Analytical Machine Von Neumann Machine Babbage Analytical Machine The basis of modern computers is proposed by a professor of mathematics at Cambridge University named Charles Babbage (1972-1871). He has invented a mechanical

More information

IMPROVES. Initial Investment is Low Compared to SoC Performance and Cost Benefits

IMPROVES. Initial Investment is Low Compared to SoC Performance and Cost Benefits NOC INTERCONNECT IMPROVES SOC ECONO CONOMICS Initial Investment is Low Compared to SoC Performance and Cost Benefits A s systems on chip (SoCs) have interconnect, along with its configuration, verification,

More information

CS250 VLSI Systems Design Lecture 8: Introduction to Hardware Design Patterns

CS250 VLSI Systems Design Lecture 8: Introduction to Hardware Design Patterns CS250 VLSI Systems Design Lecture 8: Introduction to Hardware Design Patterns John Wawrzynek, Jonathan Bachrach, with Krste Asanovic, John Lazzaro and Rimas Avizienis (TA) UC Berkeley Fall 2012 Lecture

More information

Computer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics

Computer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics Computer and Hardware Architecture I Benny Thörnberg Associate Professor in Electronics Hardware architecture Computer architecture The functionality of a modern computer is so complex that no human can

More information

Computer Architecture Dr. Charles Kim Howard University

Computer Architecture Dr. Charles Kim Howard University EECE416 Microcomputer Fundamentals & Design Computer Architecture Dr. Charles Kim Howard University 1 Computer Architecture Computer Architecture Art of selecting and interconnecting hardware components

More information