An Asynchronous NoC Router in a 14nm FinFET Library: Comparison to an Industrial Synchronous Counterpart

Size: px
Start display at page:

Download "An Asynchronous NoC Router in a 14nm FinFET Library: Comparison to an Industrial Synchronous Counterpart"

Transcription

1 An Asynchronous NoC Router in a 14nm FinFET Library: Comparison to an Industrial Synchronous Counterpart Weiwei Jiang Columbia University, USA Gabriele Miorandi University of Ferrara, Italy Wayne Burleson Advanced Micro Devices, USA Davide Bertozzi University of Ferrara, Italy Steven M. Nowick Columbia University, USA Greg Sadowski Advanced Micro Devices, USA ACM/IEEE Design, Automation and Test in Europe (DATE-17)

2 Motivation for Networks-on-Chip Future of computing is multi-core CPU: 8 to 24 cores widely available - AMD 16-core Opteron 6000 series - AMD Ryzen 4,6,8,+ cores - Intel 24-core Xeon-E7 - Intel Xeon Phi 80+ core AMD Ryzen 8-core Processor (March 2017) GPU: up to graphics cores - AMD FirePro series: up to 2560 GCN Stream Processors - NVIDIA Titan X: 3584 CUDA Cores 1

3 Motivation for Networks-on-Chip (Cont.) NoC separates computation and communication Improves scalability - global interconnects have high latency and power consumption (e.g. buses and point-to-point wiring) Increases performance/energy efficiency - share wiring resources between parallel data flows Facilitates design reuse - optimized IPs can simply plug in largely decrease design efforts 2

4 Potential Advantages of Asynchronous Design No global clock No clock power less overall power than deeply clock-gated sync designs No clock design overhead no clock generation, distribution, skew analysis, etc. - [Gebhardt/Stevens et al., Comparing energy and latency of asynchronous and synchronous NoCs for embedded SoCs, NOCS-10] Greater flexibility/modularity Easily integrates multiple timing domains Supports reusable components - [Bainbridge/Furber, CHAIN: a delay-insensitive chip area interconnect, IEEE Micro-02] Lower system latency No per-router clock synchronization no waiting for clock - [Sheibanyrad/Greiner et al., Multisynchronous and fully asynchronous NoCs for GALS architectures, IEEE Design & Test of Computers-08] 3

5 Recent Commercial Asynchronous NoC Chips Intel s FM5000/6000 Ethernet switches [IEEE Design & Test 2015] - high performance: 640 Gbps max. bandwidth ns cut-through latency - support up to 176 ports IBM s TrueNorth neuromorphic chip [Science 2014] - a 5.4-billion-transitor chip with 4096 neurosynaptic cores - models 1M neurons and 256M synapses - ultra-low power: only 63 milliwatts with 400x240 video input at 30 frames/sec. STMicroelectronics STHORM processor [DAC-12] - A GALS computing accelerator for embedded SoCs - connect 4 clusters, each with 16 sync processors - improved performance efficiency over several Quadro and Nvidia GPUs 4

6 Contributions (1) First comparison for: async vs. commercial sync router in advanced technology Sync baseline is for high-end processors and graphics products - NoC handles system config and power/performance control Sync baseline uses aggressive clock optimization and finegrain clock gating Comparison in a 14nm FinFET library - not textbook academic technology library - state of the art CMOS technology used in commercial products Dominating results for asynchronous - in key metrics: area, latency and idle/active power 5

7 Contributions (2) Implementation and validation at pre- and post-layout results presented only for pre-layout (confidentiality reasons) Industrial tools used in async design and validation Functional validation tool (using Synopsys environment) - wrapper added for async design for sync environment re-use - used for both pre- and post-layout implementations Place & Route tool (using AMD s internal tool environment) - largely manual synthesis + automated P&R - expect automated logic synthesis can be included with reasonable efforts (e.g.,an existing solution is proposed in [Ghiribaldi/Bertozzi/Nowick DATE-13]) 6

8 Contributions (3) A novel async end-to-end credit-based Virtual Channel control scheme Key idea = lazy credit-update approach - credit-increments are queued and no immediate update - credit updated only with a credit-decrement - fewer backward credit synchronization to upstream router Potential increased throughput VC is required for practical industrial usage - many existing async NoCs do not include VCs Not the focus of this presentation (see paper for details) 7

9 Proposed Asynchronous Node Structure Local Terminal North Channel North Channel Router Local Interface North Interface Router West Channel West Interface Switch 1 Switch 0 East Interface East Channel West Channel Router for South Interface East Channel Router for South Channel South Channel Two identical and uncorrelated planes Follows AMD sync baseline router architecture 8

10 Proposed Asynchronous Node Structure (Cont.) Switch replication inside each plane - as many times as the number of VCs Local Terminal West Channel West Interface North Channel Local Interface North Interface Switch 1 Switch 0 North Channel East Interface East Channel For VC #0 traffic For VC #1 traffic West Channel Router for South Interface East Channel Router for South Channel South Channel 9

11 Node Operation 1 Example: data from west input -> east output De-mux data to a switch 3 Merge data from 2 VCs Local Terminal North Channel North Channel Local Interface North Interface Datain West Channel West Interface Switch 1 Switch 0 East Interface East Channel Dataout West Channel Router for South Interface East Channel Router for 2 South Channel Data traverses the switch { South Channel Header sets up the path Body/tail flits follow the pre-set up path 10

12 New Components in the Async Router Two new components added on previous DATE-13 async router Local Terminal North Channel Local Interface North Interface North Channel [Ghiribaldi/Bertozzi/Nowick DATE-13] Input interface: New high-performance Input buffer West Channel West Interface Switch 1 Switch 0 East Interface East Channel Switch 1 Switch 0 Input Interface Output Interface East Channel West Channel Router for South Channel South Interface Router for South Channel East Channel Output interface: New VC control Identical switches; new components in router interfaces 11

13 Input Buffer Circular FIFO: Forward Latency Default-open single D-latch register Default-open single D-latch register + XOR2 Forward latency: 2 x D Q latch delay + XOR2 + XOR4 Written-in data can be immediately read out (not aligned to clk cycle: much faster than a sync circular FIFO) 12

14 Input Buffer Circular FIFO: Storage Element 13 Each async storage element = single level-sensitive D-latch register - Each latch register has full storage capacity - Half area/power cost as a typical Flip-Flop storage in sync key source for performance/area/power benefits 13

15 Output Interface Design: Proposed VC Control Two d a ta inp ut c ha nnels: ea c h from a d ifferent VC a nd c orresp ond ing switc h (OPM) Ackout1 Ackout0 Reqin0 Reqin1 Datain0 Datain1 Blocks or allows output traffic for a particular VC L4 D Q E L3 D Q E full0 full0_valid Full Detector0 Timer 0 forced _clk0 Mutex Input Ctl0 zerowins mutex _req0 Mutex E D Q L1 Mutex Input Ctl1 mutex _req1 E D Q L2 onewins R_ Q Timer 1 forced _clk1 S Q DataMux sel full1 full1_valid Full Detector1 E D Q L5 E D Q L6 E D Q L7 D E Q Data Reg VC c ontrols: from the outp ut link Credit_increment0 Credit_increment1 Ackin Reqout Dataout Da ta outp ut c ha nnel: to the outp ut link Updates downstream credits only every time a flit is sent out (See details in the paper) 14

16 Design Validation Tool Async Router Design Wrapper Pre- or Post-layout netlist Synchronize async I/O data to a given clock (Ideal wrapper, not considering metastability) Standard Sync Simulator Re-used standard sync I/Os and benchmarks 15

17 Design Flow and Place & Route Tool Manually derive gate netlist Manually add inverter chains Manual Synthesis Manual Timing Correction Automated P&R Timing violations? Yes No Final Layout Standard sync P&R with don t touch everything Expect further synthesis automation can be included with reasonable effort - An async logic synthesis solution was proposed in [Ghiribaldi/Bertozzi/Nowick DATE-13] 16

18 Actual Layout for Asynchronous Router West channel pins East channel pins Local channel pins North channels pins Router config.: - double-plane router - 5 port + 2 VCs South channel pins 17

19 Experimental Results: Overview AMD commercial sync router vs. proposed async router Identical router configuration for both routers - 5-port + 2 VCs - buffer depth = 7 for each VC Pre-layout results only (for confidentiality reasons) - post-layout comparisons expected to be similar for small designs One testing benchmark: activating all switch ports - evenly distributed traffic from all inputs to all outputs - sufficient for initial router-level results Testing corner: 14nm FinFET library (0.65V, TT) Additional projected results for more complex routers 7-port router with 2 VCs 5-port router with 8 VCs for 3D stacking more realistic VC configuration 18

20 Basic comparison: 5-port router with 2 VCs Asynchronous router dominates in area, latency and power Comparison for 5-port router with 2 VCs Sync router Async router 55% lower 28% lower 88% lower 58% lower 19

21 Projected Results for More Complex Routers Absolute area and power costs are noticeably increased - due to higher radix or more VCs Relative asynchronous benefits are largely maintained Comparison for 5-port router with 8 VCs Sync 5-port 2 VCs Async 5-port 2 VCs Sync 7-port 2 VCs Async 7-port 2 VCs Sync 5-port 8 VCs Async 5-port 8 VCs 47% lower 28% lower 85% lower 51% lower 47% lower 16% lower 85% lower 51% lower Comparison for 7-port router with 2 VCs 20

22 Conclusions First async vs. commercial sync router in advanced library Sync router optimized for high-end products with fine-grain clock-gating Comparison in 14nm FinFET library Industrial tools for async design and validation Design validation tool: sync testing environments are largely re-used Manual synthesis + automated P&R - synthesis automation can be further included with some effort Shows opportunity for industrial asynchronous designs Some remaining tool challenges for full automation A novel async end-to-end credit-based VC control approach Lazy credit-update approach potential higher throughput Results: async router shows significant benefits In key metrics: area, latency and power 21

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel Hyoukjun Kwon and Tushar Krishna Georgia Institute of Technology Synergy Lab (http://synergy.ece.gatech.edu) hyoukjun@gatech.edu April

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture

Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture 1 Physical Implementation of the DSPI etwork-on-chip in the FAUST Architecture Ivan Miro-Panades 1,2,3, Fabien Clermidy 3, Pascal Vivet 3, Alain Greiner 1 1 The University of Pierre et Marie Curie, Paris,

More information

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and

More information

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Young Hoon Kang, Taek-Jun Kwon, and Jeff Draper {youngkan, tjkwon, draper}@isi.edu University of Southern California

More information

Low-Power Interconnection Networks

Low-Power Interconnection Networks Low-Power Interconnection Networks Li-Shiuan Peh Associate Professor EECS, CSAIL & MTL MIT 1 Moore s Law: Double the number of transistors on chip every 2 years 1970: Clock speed: 108kHz No. transistors:

More information

The Design of the KiloCore Chip

The Design of the KiloCore Chip The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory

More information

NoC Round Table / ESA Sep Asynchronous Three Dimensional Networks on. on Chip. Abbas Sheibanyrad

NoC Round Table / ESA Sep Asynchronous Three Dimensional Networks on. on Chip. Abbas Sheibanyrad NoC Round Table / ESA Sep. 2009 Asynchronous Three Dimensional Networks on on Chip Frédéric ric PétrotP Outline Three Dimensional Integration Clock Distribution and GALS Paradigm Contribution of the Third

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

Parallel Architectures

Parallel Architectures Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36

More information

Lecture 3: Flow-Control

Lecture 3: Flow-Control High-Performance On-Chip Interconnects for Emerging SoCs http://tusharkrishna.ece.gatech.edu/teaching/nocs_acaces17/ ACACES Summer School 2017 Lecture 3: Flow-Control Tushar Krishna Assistant Professor

More information

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels Lecture: Interconnection Networks Topics: TM wrap-up, routing, deadlock, flow control, virtual channels 1 TM wrap-up Eager versioning: create a log of old values Handling problematic situations with a

More information

OpenSMART: An Opensource Singlecycle Multi-hop NoC Generator

OpenSMART: An Opensource Singlecycle Multi-hop NoC Generator OpenSMART: An Opensource Singlecycle Multi-hop NoC Generator Hyoukjun Kwon and Tushar Krishna Georgia Institute of Technology Synergy Lab (http://synergy.ece.gatech.edu) OpenSMART (https://tinyurl.com/get-opensmart)

More information

Advances in Designing Clockless Digital Systems

Advances in Designing Clockless Digital Systems Advances in Designing Clockless Digital Systems Prof. Steven M. Nowick nowick@cs.columbia.edu Department of Computer Science (and Elect. Eng.) Columbia University New York, NY, USA Introduction l Synchronous

More information

High Performance Interconnect and NoC Router Design

High Performance Interconnect and NoC Router Design High Performance Interconnect and NoC Router Design Brinda M M.E Student, Dept. of ECE (VLSI Design) K.Ramakrishnan College of Technology Samayapuram, Trichy 621 112 brinda18th@gmail.com Devipoonguzhali

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

Design-for-Test Approach of an Asynchronous etwork-on-chip Architecture and its Associated Test Pattern Generation and Application

Design-for-Test Approach of an Asynchronous etwork-on-chip Architecture and its Associated Test Pattern Generation and Application Design-for-Test Approach of an Asynchronous etwork-on-chip Architecture and its Associated Test Pattern Generation and Application Xuan-Tu Tran 1, 3, Yvain Thonnart 1, Jean Durupt 1, Vincent Beroulle 2,

More information

The Xilinx XC6200 chip, the software tools and the board development tools

The Xilinx XC6200 chip, the software tools and the board development tools The Xilinx XC6200 chip, the software tools and the board development tools What is an FPGA? Field Programmable Gate Array Fully programmable alternative to a customized chip Used to implement functions

More information

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER A Thesis by SUNGHO PARK Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema [1] Laila A, [2] Ajeesh R V [1] PG Student [VLSI & ES] [2] Assistant professor, Department of ECE, TKM Institute of Technology, Kollam

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

The Design of Low-Latency Interfaces for Mixed-Timing Systems

The Design of Low-Latency Interfaces for Mixed-Timing Systems The Design of Low-Latency Interfaces for Mixed-Timing Systems Tiberiu Chelcea and Steven M. Nowick Department of Computer Science Columbia University Keynote Invited Talk: GALS Session IEEE Workshop on

More information

Interconnection Networks

Interconnection Networks Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact

More information

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology Outline SoC Interconnect NoC Introduction NoC layers Typical NoC Router NoC Issues Switching

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 02, 2014 ISSN (online): 2321-0613 Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek

More information

International Journal of Research and Innovation in Applied Science (IJRIAS) Volume I, Issue IX, December 2016 ISSN

International Journal of Research and Innovation in Applied Science (IJRIAS) Volume I, Issue IX, December 2016 ISSN Comparative Analysis of Latency, Throughput and Network Power for West First, North Last and West First North Last Routing For 2D 4 X 4 Mesh Topology NoC Architecture Bhupendra Kumar Soni 1, Dr. Girish

More information

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation

More information

Clocked and Asynchronous FIFO Characterization and Comparison

Clocked and Asynchronous FIFO Characterization and Comparison Clocked and Asynchronous FIFO Characterization and Comparison HoSuk Han Kenneth S. Stevens Electrical and Computer Engineering University of Utah Abstract Heterogeneous blocks, IP reuse, network-on-chip

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

Asynchronous on-chip Communication: Explorations on the Intel PXA27x Peripheral Bus

Asynchronous on-chip Communication: Explorations on the Intel PXA27x Peripheral Bus Asynchronous on-chip Communication: Explorations on the Intel PXA27x Peripheral Bus Andrew M. Scott, Mark E. Schuelein, Marly Roncken, Jin-Jer Hwan John Bainbridge, John R. Mawer, David L. Jackson, Andrew

More information

FABRICATION TECHNOLOGIES

FABRICATION TECHNOLOGIES FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general

More information

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER 84 CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER 3.1 INTRODUCTION The introduction of several new asynchronous designs which provides high throughput and low latency is the significance of this chapter. The

More information

Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles

Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California, Davis, USA Outline Introduction Timing issues

More information

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks Lecture: Transactional Memory, Networks Topics: TM implementations, on-chip networks 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

Design of Asynchronous Interconnect Network for SoC

Design of Asynchronous Interconnect Network for SoC Final Report for ECE 6770 Project Design of Asynchronous Interconnect Network for SoC Hosuk Han 1 han@ece.utah.edu Junbok You jyou@ece.utah.edu May 12, 2007 1 Team leader Contents 1 Introduction 1 2 Project

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information

A full asynchronous serial transmission converter for network-on-chips

A full asynchronous serial transmission converter for network-on-chips Vol. 31, No. 4 Journal of Semiconductors April 2010 A full asynchronous serial transmission converter for network-on-chips Yang Yintang( 杨银堂 ), Guan Xuguang( 管旭光 ), Zhou Duan( 周端 ), and Zhu Zhangming(

More information

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect Chris Fallin, Greg Nazario, Xiangyao Yu*, Kevin Chang, Rachata Ausavarungnirun, Onur Mutlu Carnegie Mellon University *CMU

More information

TIMA Lab. Research Reports

TIMA Lab. Research Reports ISSN 1292-862 TIMA Lab. Research Reports TIMA Laboratory, 46 avenue Félix Viallet, 38000 Grenoble France Session 1.2 - Hop Topics for SoC Design Asynchronous System Design Prof. Marc RENAUDIN TIMA, Grenoble,

More information

CS250 VLSI Systems Design Lecture 9: Memory

CS250 VLSI Systems Design Lecture 9: Memory CS250 VLSI Systems esign Lecture 9: Memory John Wawrzynek, Jonathan Bachrach, with Krste Asanovic, John Lazzaro and Rimas Avizienis (TA) UC Berkeley Fall 2012 CMOS Bistable Flip State 1 0 0 1 Cross-coupled

More information

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP 1 M.DEIVAKANI, 2 D.SHANTHI 1 Associate Professor, Department of Electronics and Communication Engineering PSNA College

More information

Architecture without explicit locks for logic simulation on SIMD machines

Architecture without explicit locks for logic simulation on SIMD machines Architecture without explicit locks for logic on machines M. Chimeh Department of Computer Science University of Glasgow UKMAC, 2016 Contents 1 2 3 4 5 6 The Using models to replicate the behaviour of

More information

III. RELATED WORK SWITCH ARCHITECTURES UNDER TEST

III. RELATED WORK SWITCH ARCHITECTURES UNDER TEST Accurate Assessment of Bundled-Data Asynchronous NoCs abled by a Predictable and Efficient Hierarchical Synthesis Flow Gabriele Miorandi, Marco Balboni, Steven M. Nowick, Davide Bertozzi Department of

More information

Quality-of-Service for a High-Radix Switch

Quality-of-Service for a High-Radix Switch Quality-of-Service for a High-Radix Switch Nilmini Abeyratne, Supreet Jeloka, Yiping Kang, David Blaauw, Ronald G. Dreslinski, Reetuparna Das, and Trevor Mudge University of Michigan 51 st DAC 06/05/2014

More information

Conquering Memory Bandwidth Challenges in High-Performance SoCs

Conquering Memory Bandwidth Challenges in High-Performance SoCs Conquering Memory Bandwidth Challenges in High-Performance SoCs ABSTRACT High end System on Chip (SoC) architectures consist of tens of processing engines. In SoCs targeted at high performance computing

More information

DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER

DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER G MAHESH BABU, et al, Volume 2, Issue 7, PP:, SEPTEMBER 2014. DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER G.Mahesh Babu 1*, Prof. Ch.Srinivasa Kumar 2* 1. II. M.Tech (VLSI), Dept of ECE,

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

WITH THE CONTINUED advance of Moore s law, ever

WITH THE CONTINUED advance of Moore s law, ever IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 11, NOVEMBER 2011 1663 Asynchronous Bypass Channels for Multi-Synchronous NoCs: A Router Microarchitecture, Topology,

More information

Design and Analysis of Networks-on-Chip in Heterogeneous Multicore Systems. Young Jin Yoon

Design and Analysis of Networks-on-Chip in Heterogeneous Multicore Systems. Young Jin Yoon Design and Analysis of Networks-on-Chip in Heterogeneous Multicore Systems Young Jin Yoon Contents Motivation and Applications System Drivers On-Chip Communication and Networks-on-Chip

More information

OCB-Based SoC Integration

OCB-Based SoC Integration The Present and The Future 黃俊達助理教授 Juinn-Dar Huang, Assistant Professor March 11, 2005 jdhuang@mail.nctu.edu.tw Department of Electronics Engineering National Chiao Tung University 1 Outlines Present Why

More information

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China CMOS Crossbar Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China OUTLINE Motivations Problems of Designing Large Crossbar Our Approach - Pipelined MUX

More information

CRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar

CRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar CRAY XK6 REDEFINING SUPERCOMPUTING - Sanjana Rakhecha - Nishad Nerurkar CONTENTS Introduction History Specifications Cray XK6 Architecture Performance Industry acceptance and applications Summary INTRODUCTION

More information

Network-on-Chip Architecture

Network-on-Chip Architecture Multiple Processor Systems(CMPE-655) Network-on-Chip Architecture Performance aspect and Firefly network architecture By Siva Shankar Chandrasekaran and SreeGowri Shankar Agenda (Enhancing performance)

More information

On Packet Switched Networks for On-Chip Communication

On Packet Switched Networks for On-Chip Communication On Packet Switched Networks for On-Chip Communication Embedded Systems Group Department of Electronics and Computer Engineering School of Engineering, Jönköping University Jönköping 1 Outline : Part 1

More information

A Novel Energy Efficient Source Routing for Mesh NoCs

A Novel Energy Efficient Source Routing for Mesh NoCs 2014 Fourth International Conference on Advances in Computing and Communications A ovel Energy Efficient Source Routing for Mesh ocs Meril Rani John, Reenu James, John Jose, Elizabeth Isaac, Jobin K. Antony

More information

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism

More information

The Nostrum Network on Chip

The Nostrum Network on Chip The Nostrum Network on Chip 10 processors 10 processors Mikael Millberg, Erland Nilsson, Richard Thid, Johnny Öberg, Zhonghai Lu, Axel Jantsch Royal Institute of Technology, Stockholm November 24, 2004

More information

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom ISCA 2018 Session 8B: Interconnection Networks Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia Tech (aniruddh@gatech.edu) Tushar

More information

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Philipp Gorski, Tim Wegner, Dirk Timmermann University

More information

THERE are few published methods to aid in automating

THERE are few published methods to aid in automating IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 9, SEPTEMBER 2011 1387 Design of an Energy-Efficient Asynchronous NoC and Its Optimization Tools for Heterogeneous

More information

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013 NetSpeed ORION: A New Approach to Design On-chip Interconnects August 26 th, 2013 INTERCONNECTS BECOMING INCREASINGLY IMPORTANT Growing number of IP cores Average SoCs today have 100+ IPs Mixing and matching

More information

Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks

Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks Department of Computer Science and Engineering, Texas A&M University Technical eport #2010-3-1 seudo-circuit: Accelerating Communication for On-Chip Interconnection Networks Minseon Ahn, Eun Jung Kim Department

More information

ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology

ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology 1 ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology Mikkel B. Stensgaard and Jens Sparsø Technical University of Denmark Technical University of Denmark Outline 2 Motivation ReNoC Basic

More information

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees, butterflies,

More information

ECE/CS 757: Advanced Computer Architecture II Interconnects

ECE/CS 757: Advanced Computer Architecture II Interconnects ECE/CS 757: Advanced Computer Architecture II Interconnects Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes created by Natalie Enright Jerger Lecture Outline Introduction

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

ScienceDirect. Packet-based Adaptive Virtual Channel Configuration for NoC Systems

ScienceDirect. Packet-based Adaptive Virtual Channel Configuration for NoC Systems Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 34 (2014 ) 552 558 2014 International Workshop on the Design and Performance of Network on Chip (DPNoC 2014) Packet-based

More information

Place Your Logo Here. K. Charles Janac

Place Your Logo Here. K. Charles Janac Place Your Logo Here K. Charles Janac President and CEO Arteris is the Leading Network on Chip IP Provider Multiple Traffic Classes Low Low cost cost Control Control CPU DSP DMA Multiple Interconnect Types

More information

On-chip Networks Enable the Dark Silicon Advantage. Drew Wingard CTO & Co-founder Sonics, Inc.

On-chip Networks Enable the Dark Silicon Advantage. Drew Wingard CTO & Co-founder Sonics, Inc. On-chip Networks Enable the Dark Silicon Advantage Drew Wingard CTO & Co-founder Sonics, Inc. Agenda Sonics history and corporate summary Power challenges in advanced SoCs General power management techniques

More information

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011 FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level

More information

The Design and Implementation of a Low-Latency On-Chip Network

The Design and Implementation of a Low-Latency On-Chip Network The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current

More information

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on on-chip Donghyun Kim, Kangmin Lee, Se-joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory, Dept. of EECS, Korea Advanced

More information

KiloCore: A 32 nm 1000-Processor Array

KiloCore: A 32 nm 1000-Processor Array KiloCore: A 32 nm 1000-Processor Array Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, Bevan Baas University of California, Davis VLSI Computation

More information

Ultra-Fast NoC Emulation on a Single FPGA

Ultra-Fast NoC Emulation on a Single FPGA The 25 th International Conference on Field-Programmable Logic and Applications (FPL 2015) September 3, 2015 Ultra-Fast NoC Emulation on a Single FPGA Thiem Van Chu, Shimpei Sato, and Kenji Kise Tokyo

More information

Digital Integrated Circuits

Digital Integrated Circuits Digital Integrated Circuits Lecture 4 Jaeyong Chung System-on-Chips (SoC) Laboratory Incheon National University BCD TO EXCESS-3 CODE CONVERTER 0100 0101 +0011 +0011 0111 1000 LSB received first Chung

More information

Built-In Self-Test for Regular Structure Embedded Cores in System-on-Chip

Built-In Self-Test for Regular Structure Embedded Cores in System-on-Chip Built-In Self-Test for Regular Structure Embedded Cores in System-on-Chip Srinivas Murthy Garimella Master s Thesis Defense Thesis Advisor: Dr. Charles E. Stroud Committee Members: Dr. Victor P. Nelson

More information

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU Thomas Moscibroda Microsoft Research Onur Mutlu CMU CPU+L1 CPU+L1 CPU+L1 CPU+L1 Multi-core Chip Cache -Bank Cache -Bank Cache -Bank Cache -Bank CPU+L1 CPU+L1 CPU+L1 CPU+L1 Accelerator, etc Cache -Bank

More information

Reconfigurable Cell Array for DSP Applications

Reconfigurable Cell Array for DSP Applications Outline econfigurable Cell Array for DSP Applications Chenxin Zhang Department of Electrical and Information Technology Lund University, Sweden econfigurable computing Coarse-grained reconfigurable cell

More information

A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology

A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology http://dx.doi.org/10.5573/jsts.014.14.6.760 JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.14, NO.6, DECEMBER, 014 A 56-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology Sung-Joon Lee

More information

PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University

PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO IRIS Lab National Chiao Tung University Outline Introduction Problem Formulation Algorithm -

More information

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University Hybrid On-chip Data Networks Gilbert Hendry Keren Bergman Lightwave Research Lab Columbia University Chip-Scale Interconnection Networks Chip multi-processors create need for high performance interconnects

More information

Performance Evaluation of Elastic GALS Interfaces and Network Fabric

Performance Evaluation of Elastic GALS Interfaces and Network Fabric FMGALS 2007 Performance Evaluation of Elastic GALS Interfaces and Network Fabric Junbok You Yang Xu Hosuk Han Kenneth S. Stevens Electrical and Computer Engineering University of Utah Salt Lake City, U.S.A

More information

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor

More information

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub.

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub. es > 100 MB/sec Pentium 4 Processor L1 and L2 caches Some slides adapted from lecture by David Culler 3.2 GB/sec Display Memory Controller Hub RDRAM RDRAM Dual Ultra ATA/100 24 Mbit/sec Disks LAN I/O Controller

More information

Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs. Chethan Kumar H B and Nachiket Kapre

Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs. Chethan Kumar H B and Nachiket Kapre -DSP Harnessing the Xilinx DSP Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org FPL 201 paper Jan Gray co-author Specs 60 s+100 FFs 2.9ns clock Smallest

More information

A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset

A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset M.Santhi, Arun Kumar S, G S Praveen Kalish, Siddharth Sarangan, G Lakshminarayanan Dept of ECE, National Institute

More information

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger Interconnection Networks: Flow Control Prof. Natalie Enright Jerger Switching/Flow Control Overview Topology: determines connectivity of network Routing: determines paths through network Flow Control:

More information

EECS 570 Final Exam - SOLUTIONS Winter 2015

EECS 570 Final Exam - SOLUTIONS Winter 2015 EECS 570 Final Exam - SOLUTIONS Winter 2015 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: # Points 1 / 21 2 / 32

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

Kyoung Hwan Lim and Taewhan Kim Seoul National University

Kyoung Hwan Lim and Taewhan Kim Seoul National University Kyoung Hwan Lim and Taewhan Kim Seoul National University Table of Contents Introduction Motivational Example The Proposed Algorithm Experimental Results Conclusion In synchronous circuit design, all sequential

More information

Bandwidth Optimization in Asynchronous NoCs by Customizing Link Wire Length

Bandwidth Optimization in Asynchronous NoCs by Customizing Link Wire Length Bandwidth Optimization in Asynchronous NoCs by Customizing Wire Length Junbok You Electrical and Computer Engineering, University of Utah jyou@ece.utah.edu Daniel Gebhardt School of Computing, University

More information

Asynchronous Circuit Design

Asynchronous Circuit Design Asynchronous Circuit Design Chris J. Myers Lecture 9: Applications Chapter 9 Chris J. Myers (Lecture 9: Applications) Asynchronous Circuit Design 1 / 60 Overview A brief history of asynchronous circuit

More information

ISSN Vol.03, Issue.02, March-2015, Pages:

ISSN Vol.03, Issue.02, March-2015, Pages: ISSN 2322-0929 Vol.03, Issue.02, March-2015, Pages:0122-0126 www.ijvdcs.org Design and Simulation Five Port Router using Verilog HDL CH.KARTHIK 1, R.S.UMA SUSEELA 2 1 PG Scholar, Dept of VLSI, Gokaraju

More information

SERVICE ORIENTED REAL-TIME BUFFER MANAGEMENT FOR QOS ON ADAPTIVE ROUTERS

SERVICE ORIENTED REAL-TIME BUFFER MANAGEMENT FOR QOS ON ADAPTIVE ROUTERS SERVICE ORIENTED REAL-TIME BUFFER MANAGEMENT FOR QOS ON ADAPTIVE ROUTERS 1 SARAVANAN.K, 2 R.M.SURESH 1 Asst.Professor,Department of Information Technology, Velammal Engineering College, Chennai, Tamilnadu,

More information

Martin Dubois, ing. Contents

Martin Dubois, ing. Contents Martin Dubois, ing Contents Without OpenNet vs With OpenNet Technical information Possible applications Artificial Intelligence Deep Packet Inspection Image and Video processing Network equipment development

More information

Asynchronous Bypass Channel Routers

Asynchronous Bypass Channel Routers 1 Asynchronous Bypass Channel Routers Tushar N. K. Jain, Paul V. Gratz, Alex Sprintson, Gwan Choi Department of Electrical and Computer Engineering, Texas A&M University {tnj07,pgratz,spalex,gchoi}@tamu.edu

More information