FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs

Size: px
Start display at page:

Download "FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs"

Transcription

1 1/29 FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs Nachiket Kapre + Tushar Krishna nachiket@uwaterloo.ca, tushar@ece.gatech.edu

2 2/29 Claim FPGA overlay NoCs designed to exploit interconnect properties of the FPGA fabric can surpass existing state-of-the-art NoCs by: throughput 2.2 energy at 2.5 LUT cost Xilinx Virtex-7 485T FPGA, 8 8 system size, synthetic+real-world traffic.

3 3/29 Context FPGAs finding comfortable home in datacenters Offloading compute intensive workloads to the FPGA Energy-efficiency, fast coupling to networking Common Infrastructure: NoCs for apps + system IO

4 3/29 Context FPGAs finding comfortable home in datacenters Offloading compute intensive workloads to the FPGA Energy-efficiency, fast coupling to networking Common Infrastructure: NoCs for apps + system IO

5 Landscape of contemporary FPGA NoC Routers 4/29

6 5/29 Landscape of contemporary FPGA NoC Routers Peak S/W Bandwidth (packets/ns) Split Merge CONNECT BLESS OpenSMART Cost per Switch max(luts,ffs) ASIC clones transplanted onto FPGAs fare poorly! expensive buffers, virtual channels, multi-ported itches Even contemporary FPGA routers are expensive and slow FastTrack: Deflection-routing + Bufferless + Torus

7 5/29 Landscape of contemporary FPGA NoC Routers Peak S/W Bandwidth (packets/ns) Split Merge CONNECT BLESS OpenSMART Cost per Switch max(luts,ffs) ASIC clones transplanted onto FPGAs fare poorly! expensive buffers, virtual channels, multi-ported itches Even contemporary FPGA routers are expensive and slow FastTrack: Deflection-routing + Bufferless + Torus

8 5/29 Landscape of contemporary FPGA NoC Routers Peak S/W Bandwidth (packets/ns) Split Merge CONNECT BLESS OpenSMART Cost per Switch max(luts,ffs) ASIC clones transplanted onto FPGAs fare poorly! expensive buffers, virtual channels, multi-ported itches Even contemporary FPGA routers are expensive and slow FastTrack: Deflection-routing + Bufferless + Torus

9 5/29 Landscape of contemporary FPGA NoC Routers Peak S/W Bandwidth (packets/ns) Split Merge CONNECT BLESS OpenSMART Cost per Switch max(luts,ffs) Hoplite FastTrack ASIC clones transplanted onto FPGAs fare poorly! expensive buffers, virtual channels, multi-ported itches Even contemporary FPGA routers are expensive and slow FastTrack: Deflection-routing + Bufferless + Torus

10 6/29 Qualitative Comparison of FPGA NoC Routers Router Cost Xbar+Arb Buffers VCs OpenSMART BLESS CONNECT Split-Merge Hoplite

11 Quick Tutorial on Hoplite 0,0 1,0 2,0 3,0 0,1 1,1 2,1 3,1 0,2 1,2 2,2 3,2 0,3 1,3 2,3 3,3 Hoplite: A Deflection-Routed Directional Torus NoC for FPGAs, TRETS 2017 Hoplite: Building Austere Overlay NoCs for FPGAs, FPL /29

12 8/29 Quick Tutorial on HopliteRT 0,0 1,0 2,0 3,0 0,1 1,1 2,1 3,1 0,2 1,2 2,2 3,2 0,3 1,3 2,3 3,3 HopliteRT: An Efficient FPGA NoC for Real-Time Applications, FPT 2017

13 9/29 Qualitative Comparison of FPGA NoC Routers Router Cost Xbar+Arb Buffers VCs OpenSMART BLESS CONNECT Split-Merge Hoplite

14 9/29 Qualitative Comparison of FPGA NoC Routers Router Cost Perf Xbar+Arb Buffers VCs Tput Latency OpenSMART BLESS CONNECT Split-Merge Hoplite

15 10/29 Challenge Deflection routing inefficient use of wiring resources Deflected packets stay in network for longer latency Steal bandwidth from other traffic throughput Can we allow improve NoC performance under deflection routing? Are there unique opportunities provided by the FPGA fabric? Hoplite cheap in LUT cost... FastTrack inspect FPGA interconnect

16 11/29 Outline Introduction and Motivation FastTrack NoC Organization FastTrack Router Operation Evaluation

17 12/29 Outline Introduction and Motivation FastTrack NoC Organization FastTrack Router Operation Evaluation

18 13/29 FPGA Wire Speeds distances not to scale

19 14/29 FastTrack NoC Organization

20 15/29 Depopulated Topology Generation

21 16/29 Parametric Topology generation FPGA NoC parameterized by three terms: N System size D Distance of express link R Depopulation parameter controls how many routers are FastTrack vs. vanilla Hoplite Fully populated 4 4 NoC FT(16,2,1) Half population 4 4 NoC FT(16,2,2)

22 17/29 Outline Introduction and Motivation FastTrack NoC Organization FastTrack Router Operation Evaluation

23 18/29 FastTrack Switch Organization N Sh N Ex N W Ex 5:1 E Ex W 3:1 E W Sh 4:1 E Sh 3:1 4:1 4:1 PE S (a) Base HopliteRT PE S Sh S Ex (b) FastTrack

24 19/29 Switch Operation W Ex W Sh PE N Sh N Ex 4:1 4:1 S Sh S Ex 5:1 4:1 E Ex E Sh Packets can start in either short or express links DOR routing function: travel in X first, then Y Packets can upgrade to fast links if they can Packets can downgrade to slow links only on turn! Livelock avoidance: W S > N S Express links=higher priority, deflected packets acquire higher priority progress

25 19/29 Switch Operation W Ex W Sh PE N Sh N Ex 4:1 4:1 S Sh S Ex 5:1 4:1 E Ex E Sh Packets can start in either short or express links DOR routing function: travel in X first, then Y Packets can upgrade to fast links if they can Packets can downgrade to slow links only on turn! Livelock avoidance: W S > N S Express links=higher priority, deflected packets acquire higher priority progress

26 19/29 Switch Operation W Ex W Sh PE N Sh N Ex 4:1 4:1 S Sh S Ex 5:1 4:1 E Ex E Sh Packets can start in either short or express links DOR routing function: travel in X first, then Y Packets can upgrade to fast links if they can Packets can downgrade to slow links only on turn! Livelock avoidance: W S > N S Express links=higher priority, deflected packets acquire higher priority progress

27 19/29 Switch Operation W Ex W Sh PE N Sh N Ex 4:1 4:1 S Sh S Ex 5:1 4:1 E Ex E Sh Packets can start in either short or express links DOR routing function: travel in X first, then Y Packets can upgrade to fast links if they can Packets can downgrade to slow links only on turn! Livelock avoidance: W S > N S Express links=higher priority, deflected packets acquire higher priority progress

28 19/29 Switch Operation W Ex W Sh PE N Sh N Ex 4:1 4:1 S Sh S Ex 5:1 4:1 E Ex E Sh Packets can start in either short or express links DOR routing function: travel in X first, then Y Packets can upgrade to fast links if they can Packets can downgrade to slow links only on turn! Livelock avoidance: W S > N S Express links=higher priority, deflected packets acquire higher priority progress

29 19/29 Switch Operation W Ex W Sh PE N Sh N Ex 4:1 4:1 S Sh S Ex 5:1 4:1 E Ex E Sh Packets can start in either short or express links DOR routing function: travel in X first, then Y Packets can upgrade to fast links if they can Packets can downgrade to slow links only on turn! Livelock avoidance: W S > N S Express links=higher priority, deflected packets acquire higher priority progress

30 19/29 Switch Operation W Ex W Sh PE N Sh N Ex 4:1 4:1 S Sh S Ex 5:1 4:1 E Ex E Sh Packets can start in either short or express links DOR routing function: travel in X first, then Y Packets can upgrade to fast links if they can Packets can downgrade to slow links only on turn! Livelock avoidance: W S > N S Express links=higher priority, deflected packets acquire higher priority progress

31 19/29 Switch Operation W Ex W Sh PE N Sh N Ex 4:1 4:1 S Sh S Ex 5:1 4:1 E Ex E Sh Packets can start in either short or express links DOR routing function: travel in X first, then Y Packets can upgrade to fast links if they can Packets can downgrade to slow links only on turn! Livelock avoidance: W S > N S Express links=higher priority, deflected packets acquire higher priority progress

32 20/29 Outline Introduction and Motivation FastTrack NoC Organization FastTrack Router Operation Evaluation

33 21/29 Experimental Setup RTL implementation of Routers parameterized D, R parameters control cost Cycle-accurate simulations Verilator FPGA synthesis + out-of-context place-and-route + XDC floorplanning constraints Vivado Benchmarking: Synthetic traffic patterns at various injection rates Traces from real workloads SpMV, Graph Analytics, Multi-processing Measure sustained throughput, average latency, power model

34 22/29 Avg. Latency RANDOM traffic 8 8 NoC Avg Latency (cyc) Hoplite FT(N,2,2) FT(N,2,1) 0.1 Injection Rate 0.2

35 22/29 Avg. Latency RANDOM traffic 8 8 NoC Avg Latency (cyc) Hoplite 3x Hoplite FT(N,2,2) FT(N,2,1) 0.1 Injection Rate 0.2

36 23/29 Avg. Latency RANDOM traffic Avg Latency (cyc) Avg Latency (cyc) Hoplite FT(N,2,2) FT(N,2,1) 0.1 Injection Rate Injection Rate 0.2 Hoplite 3x Hoplite FT(N,2,2) FT(N,2,1) 0.2 FastTrack saturates at 4 5 higher injection rate than Hoplite vs Replicated Hoplite, still better but by smaller margin Replicated Hoplite has a new kind of livelock possibility (delivery)

37 24/29 Results LUT vs Throughput 8 8 NoC 200 Sustained Rate (Million Packets/s) Hoplite Area (LUTs)

38 24/29 Results LUT vs Throughput 8 8 NoC 200 Sustained Rate (Million Packets/s) Hoplite 2x Hoplite Hoplite 3x Area (LUTs)

39 24/29 Results LUT vs Throughput 8 8 NoC Sustained Rate (Million Packets/s) FT R=1 Hoplite 2x Hoplite FT R=2 Hoplite 3x Area (LUTs)

40 25/29 Results Wiring vs. Throughput 8 8 NoC 200 Sustained Rate (Million Packets/s) Hoplite Wire Count

41 25/29 Results Wiring vs. Throughput 8 8 NoC 200 Sustained Rate (Million Packets/s) Hoplite Hoplite 2x Hoplite 3x Wire Count

42 25/29 Results Wiring vs. Throughput 8 8 NoC 200 Sustained Rate (Million Packets/s) Hoplite 2x Hoplite FT R=2 FT R=1 Hoplite 3x Wire Count

43 26/29 Results Cost vs. Throughput 8 8 NoC Sustained Rate (Million Packets/s) Sustained Rate (Million Packets/s) Hoplite 2x Hoplite FT R=2 FT R=1 Hoplite 3x Area (LUTs) Hoplite 2x Hoplite FT R=2 FT R=1 Hoplite 3x Wire Count FastTrack makes better use of FPGA resources (LUTs, and wires) Packets are allowed to leave the NoC faster, freeing up resources Must pick proper combination of FT design parameters

44 27/29 Qualitative Comparison of FPGA NoC Routers Router Cost Xbar+Arb Buffers VCs OpenSMART BLESS CONNECT Split-Merge Hoplite

45 27/29 Qualitative Comparison of FPGA NoC Routers Router Cost Perf Xbar+Arb Buffers VCs Tput Latency OpenSMART BLESS CONNECT Split-Merge Hoplite

46 27/29 Qualitative Comparison of FPGA NoC Routers Router Cost Perf Xbar+Arb Buffers VCs Tput Latency OpenSMART BLESS CONNECT Split-Merge Hoplite FastTrack

47 28/29 FPGA Mapping Frequency 8 8 NoC NoC Frequency (MHz) FastTrack (64,2) FastTrack (64,4) Hoplite Calibration studies showed express links can travel quickly on chip NoC Datawidth Fmax for 2-hop FastTrack keeps up with original Hoplite 4-hop express link distance too large, some noticeable slowdown

48 29/29 Conclusions FastTrack outperforms state-of-the-art Hoplite FPGA NoC by 2.5 for synthetic traffic, 2.8 for real-world traces 2.2 on energy efficiency 2.5 more LUTs required FastTrack better at larger system sizes Ideal hop distance is 2 4 (4 256 PEs) Fmax gap between FastTrack and Hoplite is small

FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs

FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs Nachiket Kapre University of Waterloo Ontario, Canada nachiket@uwaterloo.ca Tushar Krishna Georgia Institute

More information

Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs. Chethan Kumar H B and Nachiket Kapre

Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs. Chethan Kumar H B and Nachiket Kapre -DSP Harnessing the Xilinx DSP Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org FPL 201 paper Jan Gray co-author Specs 60 s+100 FFs 2.9ns clock Smallest

More information

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel Hyoukjun Kwon and Tushar Krishna Georgia Institute of Technology Synergy Lab (http://synergy.ece.gatech.edu) hyoukjun@gatech.edu April

More information

Hoplite-Q: Priority-Aware Routing in FPGA Overlay NoCs

Hoplite-Q: Priority-Aware Routing in FPGA Overlay NoCs Hoplite-Q: Priority-Aware Routing in FPGA Overlay NoCs Siddhartha Nanyang Technological University siddhart00@e.ntu.edu.sg Nachiket Kapre University of Waterloo nachiket@uwaterloo.ca Abstract The Hoplite

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom ISCA 2018 Session 8B: Interconnection Networks Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia Tech (aniruddh@gatech.edu) Tushar

More information

Implementing FPGA overlay NoCs using the Xilinx UltraScale memory cascades

Implementing FPGA overlay NoCs using the Xilinx UltraScale memory cascades Implementing FPGA overlay NoCs using the Xilinx UltraScale memory cascades Nachiket Kapre University of Waterloo Waterloo, Ontario, Canada Email: nachiket@uwaterloo.ca Abstract We can enhance the performance

More information

Evaluating Bufferless Flow Control for On-Chip Networks

Evaluating Bufferless Flow Control for On-Chip Networks Evaluating Bufferless Flow Control for On-Chip Networks George Michelogiannakis, Daniel Sanchez, William J. Dally, Christos Kozyrakis Stanford University In a nutshell Many researchers report high buffer

More information

Lecture 2: Topology - I

Lecture 2: Topology - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and

More information

FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator for NoC Modeling in Full-System Simulations

FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator for NoC Modeling in Full-System Simulations FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator for oc Modeling in Full-System Simulations Michael K. Papamichael, James C. Hoe, Onur Mutlu papamix@cs.cmu.edu, jhoe@ece.cmu.edu, onur@cmu.edu

More information

Ultra-Fast NoC Emulation on a Single FPGA

Ultra-Fast NoC Emulation on a Single FPGA The 25 th International Conference on Field-Programmable Logic and Applications (FPL 2015) September 3, 2015 Ultra-Fast NoC Emulation on a Single FPGA Thiem Van Chu, Shimpei Sato, and Kenji Kise Tokyo

More information

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect Chris Fallin, Greg Nazario, Xiangyao Yu*, Kevin Chang, Rachata Ausavarungnirun, Onur Mutlu Carnegie Mellon University *CMU

More information

Marathon: Statically-Scheduled Conflict-Free Routing on FPGA Overlay NoCs

Marathon: Statically-Scheduled Conflict-Free Routing on FPGA Overlay NoCs Marathon: Statically-Scheduled Conflict-Free Routing on FPGA Overlay NoCs Nachiket Kapre Nanyang Technological University 5 Nanyang Avenue, Singapore 639798 Email: nachiket@ieee.org Abstract We can improve

More information

Fast Flexible FPGA-Tuned Networks-on-Chip

Fast Flexible FPGA-Tuned Networks-on-Chip This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Fast Flexible FPGA-Tuned Networks-on-Chip Michael K. Papamichael, James C. Hoe

More information

GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray

GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens? Seymour Cray GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens Jan Gray jan@fpga.org http://fpga.org

More information

Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Low-Power Interconnection Networks

Low-Power Interconnection Networks Low-Power Interconnection Networks Li-Shiuan Peh Associate Professor EECS, CSAIL & MTL MIT 1 Moore s Law: Double the number of transistors on chip every 2 years 1970: Clock speed: 108kHz No. transistors:

More information

GRVI Phalanx. A Massively Parallel RISC-V FPGA Accelerator Accelerator. Jan Gray

GRVI Phalanx. A Massively Parallel RISC-V FPGA Accelerator Accelerator. Jan Gray GRVI Phalanx A Massively Parallel RISC-V FPGA Accelerator Accelerator Jan Gray jan@fpga.org Introduction FPGA accelerators are hot MSR Catapult. Intel += Altera. OpenPOWER + Xilinx FPGAs as computers Massively

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Lecture 3: Topology - II

Lecture 3: Topology - II ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 3: Topology - II Tushar Krishna Assistant Professor School of Electrical and

More information

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU Thomas Moscibroda Microsoft Research Onur Mutlu CMU CPU+L1 CPU+L1 CPU+L1 CPU+L1 Multi-core Chip Cache -Bank Cache -Bank Cache -Bank Cache -Bank CPU+L1 CPU+L1 CPU+L1 CPU+L1 Accelerator, etc Cache -Bank

More information

Flow Control can be viewed as a problem of

Flow Control can be viewed as a problem of NOC Flow Control 1 Flow Control Flow Control determines how the resources of a network, such as channel bandwidth and buffer capacity are allocated to packets traversing a network Goal is to use resources

More information

GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray

GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens? Seymour Cray GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens Jan Gray jan@fpga.org http://fpga.org

More information

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based

More information

OpenSMART: An Opensource Singlecycle Multi-hop NoC Generator

OpenSMART: An Opensource Singlecycle Multi-hop NoC Generator OpenSMART: An Opensource Singlecycle Multi-hop NoC Generator Hyoukjun Kwon and Tushar Krishna Georgia Institute of Technology Synergy Lab (http://synergy.ece.gatech.edu) OpenSMART (https://tinyurl.com/get-opensmart)

More information

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based

More information

UNIVERSITY OF CASTILLA-LA MANCHA. Computing Systems Department

UNIVERSITY OF CASTILLA-LA MANCHA. Computing Systems Department UNIVERSITY OF CASTILLA-LA MANCHA Computing Systems Department A case study on implementing virtual 5D torus networks using network components of lower dimensionality HiPINEB 2017 Francisco José Andújar

More information

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013 NetSpeed ORION: A New Approach to Design On-chip Interconnects August 26 th, 2013 INTERCONNECTS BECOMING INCREASINGLY IMPORTANT Growing number of IP cores Average SoCs today have 100+ IPs Mixing and matching

More information

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089

More information

Fast Scalable FPGA-Based Network-on-Chip Simulation Models

Fast Scalable FPGA-Based Network-on-Chip Simulation Models We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations and support. Computer Architecture Lab at Carnegie Mellon Fast Scalable FPGA-Based Network-on-Chip Simulation

More information

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Éricles Sousa 1, Frank Hannig 1, Jürgen Teich 1, Qingqing Chen 2, and Ulf Schlichtmann

More information

HopliteRT: An Efficient FPGA NoC for Real-time Applications

HopliteRT: An Efficient FPGA NoC for Real-time Applications HopliteRT: An Efficient FPGA NoC for Real-time Applications 1,2 Saud Wasly, 1 Rodolfo Pellizzoni, 1 Nachiket Kapre 1 University of Waterloo, Canada, {asly, rpellizz, nachiket}@uwaterloo.ca 2 King Abdulaziz

More information

HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing

HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing Mingyu Gao and Christos Kozyrakis Stanford University http://mast.stanford.edu HPCA March 14, 2016 PIM is Coming Back End of Dennard

More information

BeiHang Short Course, Part 5: Pandora Smart IP Generators

BeiHang Short Course, Part 5: Pandora Smart IP Generators BeiHang Short Course, Part 5: Pandora Smart IP Generators James C. Hoe Department of ECE Carnegie Mellon University Collaborator: Michael Papamichael J. C. Hoe, CMU/ECE/CALCM, 0, BHSC L5 s CONNECT NoC

More information

3D WiNoC Architectures

3D WiNoC Architectures Interconnect Enhances Architecture: Evolution of Wireless NoC from Planar to 3D 3D WiNoC Architectures Hiroki Matsutani Keio University, Japan Sep 18th, 2014 Hiroki Matsutani, "3D WiNoC Architectures",

More information

3 Hoplite: A Deflection-Routed Directional Torus NoC for FPGAs

3 Hoplite: A Deflection-Routed Directional Torus NoC for FPGAs 3 Hoplite: A Deflection-Routed Directional Torus NoC for FPGAs NACHKET KAPRE, University of Waterloo JAN GRA, Gray Research LLC We can design an FPGA-optimized lightweight network-on-chip (NoC) router

More information

Smart Port Allocation in Adaptive NoC Routers

Smart Port Allocation in Adaptive NoC Routers 205 28th International Conference 205 on 28th VLSI International Design and Conference 205 4th International VLSI Design Conference on Embedded Systems Smart Port Allocation in Adaptive NoC Routers Reenu

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Lecture 3: Flow-Control

Lecture 3: Flow-Control High-Performance On-Chip Interconnects for Emerging SoCs http://tusharkrishna.ece.gatech.edu/teaching/nocs_acaces17/ ACACES Summer School 2017 Lecture 3: Flow-Control Tushar Krishna Assistant Professor

More information

Part IV: 3D WiNoC Architectures

Part IV: 3D WiNoC Architectures Wireless NoC as Interconnection Backbone for Multicore Chips: Promises, Challenges, and Recent Developments Part IV: 3D WiNoC Architectures Hiroki Matsutani Keio University, Japan 1 Outline: 3D WiNoC Architectures

More information

Design and Test Solutions for Networks-on-Chip. Jin-Ho Ahn Hoseo University

Design and Test Solutions for Networks-on-Chip. Jin-Ho Ahn Hoseo University Design and Test Solutions for Networks-on-Chip Jin-Ho Ahn Hoseo University Topics Introduction NoC Basics NoC-elated esearch Topics NoC Design Procedure Case Studies of eal Applications NoC-Based SoC Testing

More information

Dual Split-Merge: A High Throughput Router Architecture for FPGAs

Dual Split-Merge: A High Throughput Router Architecture for FPGAs Dual plit-erge: A High Throughput Router Architecture for FPGAs Khaled Helal a,, ameh Attia a,, Hossam Fahmy a, Tawfik Ismail a, Yehea Ismail b, Hassan ostafa a,b, Abstract a Department of Electronics

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Lecture 7: Flow Control - I

Lecture 7: Flow Control - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 7: Flow Control - I Tushar Krishna Assistant Professor School of Electrical

More information

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on on-chip Donghyun Kim, Kangmin Lee, Se-joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory, Dept. of EECS, Korea Advanced

More information

CONNECT: Fast Flexible FPGA-Tuned Networks-on-Chip

CONNECT: Fast Flexible FPGA-Tuned Networks-on-Chip Workshop on the Intersections of Computer Architecture and Reconfigurable Logic (CARL 212): Category 2 CONNECT: Fast Flexible FPGA-Tuned Networks-on-Chip Michael K. Papamichael Computer Science Department

More information

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design Zhi-Liang Qian and Chi-Ying Tsui VLSI Research Laboratory Department of Electronic and Computer Engineering The Hong Kong

More information

GRVI Phalanx Update: A Massively Parallel RISC-V FPGA Accelerator Framework. Jan Gray CARRV2017: 2017/10/14

GRVI Phalanx Update: A Massively Parallel RISC-V FPGA Accelerator Framework. Jan Gray   CARRV2017: 2017/10/14 GRVI halanx Update: A Massively arallel RISC-V FGA Accelerator Framework Jan Gray jan@fpga.org http://fpga.org CARRV2017: 2017/10/14 FGA Datacenter Accelerators Are Almost Mainstream Catapult v2. Intel

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

IQ for DNA. Interactive Query for Dynamic Network Analytics. Haoyu Song. HUAWEI TECHNOLOGIES Co., Ltd.

IQ for DNA. Interactive Query for Dynamic Network Analytics. Haoyu Song.   HUAWEI TECHNOLOGIES Co., Ltd. IQ for DNA Interactive Query for Dynamic Network Analytics Haoyu Song www.huawei.com Motivation Service Provider s pain point Lack of real-time and full visibility of networks, so the network monitoring

More information

DESIGN, IMPLEMENTATION AND EVALUATION OF A CONFIGURABLE. NoC FOR AcENoCS FPGA ACCELERATED EMULATION PLATFORM. A Thesis SWAPNIL SUBHASH LOTLIKAR

DESIGN, IMPLEMENTATION AND EVALUATION OF A CONFIGURABLE. NoC FOR AcENoCS FPGA ACCELERATED EMULATION PLATFORM. A Thesis SWAPNIL SUBHASH LOTLIKAR DESIGN, IMPLEMENTATION AND EVALUATION OF A CONFIGURABLE NoC FOR AcENoCS FPGA ACCELERATED EMULATION PLATFORM A Thesis by SWAPNIL SUBHASH LOTLIKAR Submitted to the Office of Graduate Studies of Texas A&M

More information

Network-on-Chip Micro-Benchmarks

Network-on-Chip Micro-Benchmarks Network-on-Chip Micro-Benchmarks Zhonghai Lu *, Axel Jantsch *, Erno Salminen and Cristian Grecu * Royal Institute of Technology, Sweden Tampere University of Technology, Finland Abstract University of

More information

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 02, 2014 ISSN (online): 2321-0613 Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek

More information

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Young Hoon Kang, Taek-Jun Kwon, and Jeff Draper {youngkan, tjkwon, draper}@isi.edu University of Southern California

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee

More information

Trading hardware overhead for communication performance in mesh-type topologies

Trading hardware overhead for communication performance in mesh-type topologies Trading hardware overhead for communication performance in mesh-type topologies Claas Cornelius, Philipp Gorski, Stephan Kubisch, Dirk Timmermann 13th EUROMICRO Conference on Digital System Design (DSD)

More information

A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems

A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems Martin Schoeberl, Florian Brandner, Jens Sparsø, Evangelia Kasapaki Technical University of Denamrk 1 Real-Time Systems

More information

An Effective Architecture for Trace-Driven Emulation of Networks-on-Chip on FPGAs

An Effective Architecture for Trace-Driven Emulation of Networks-on-Chip on FPGAs An Effective Architecture for Trace-Driven Emulation of Networks-on-Chip on FPGAs Thiem Van Chu and Kenji Kise Tokyo Institute of Technology, JSPS Research Fellow thiem@arch.cs.titech.ac.jp, kise@c.titech.ac.jp

More information

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Interconnection Networks: Routing. Prof. Natalie Enright Jerger Interconnection Networks: Routing Prof. Natalie Enright Jerger Routing Overview Discussion of topologies assumed ideal routing In practice Routing algorithms are not ideal Goal: distribute traffic evenly

More information

AUGMENTING FPGAS WITH EMBEDDED NETWORKS-ON-CHIP

AUGMENTING FPGAS WITH EMBEDDED NETWORKS-ON-CHIP AUGMENTING FPGAS WITH EMBEDDED NETWORKS-ON-CHIP Mohamed S. Abdelfattah and Vaughn Betz Department of Electrical and Computer Engineering University of Toronto, Toronto, ON, Canada {mohamed,vaughn}@eecg.utoronto.ca

More information

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs IBM Research AI Systems Day DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs Xiaofan Zhang 1, Junsong Wang 2, Chao Zhu 2, Yonghua Lin 2, Jinjun Xiong 3, Wen-mei

More information

MULTIPATH ROUTER ARCHITECTURES TO REDUCE LATENCY IN NETWORK-ON-CHIPS. A Thesis HRISHIKESH NANDKISHOR DESHPANDE

MULTIPATH ROUTER ARCHITECTURES TO REDUCE LATENCY IN NETWORK-ON-CHIPS. A Thesis HRISHIKESH NANDKISHOR DESHPANDE MULTIPATH ROUTER ARCHITECTURES TO REDUCE LATENCY IN NETWORK-ON-CHIPS A Thesis by HRISHIKESH NANDKISHOR DESHPANDE Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment

More information

RiceNIC. A Reconfigurable Network Interface for Experimental Research and Education. Jeffrey Shafer Scott Rixner

RiceNIC. A Reconfigurable Network Interface for Experimental Research and Education. Jeffrey Shafer Scott Rixner RiceNIC A Reconfigurable Network Interface for Experimental Research and Education Jeffrey Shafer Scott Rixner Introduction Networking is critical to modern computer systems Role of the network interface

More information

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Bhavya K. Daya, Li-Shiuan Peh, Anantha P. Chandrakasan Dept. of Electrical Engineering and Computer

More information

Interconnection Networks

Interconnection Networks Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact

More information

A thesis presented to. the faculty of. In partial fulfillment. of the requirements for the degree. Master of Science. Yixuan Zhang.

A thesis presented to. the faculty of. In partial fulfillment. of the requirements for the degree. Master of Science. Yixuan Zhang. High-Performance Crossbar Designs for Network-on-Chips (NoCs) A thesis presented to the faculty of the Russ College of Engineering and Technology of Ohio University In partial fulfillment of the requirements

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

Address InterLeaving for Low- Cost NoCs

Address InterLeaving for Low- Cost NoCs Address InterLeaving for Low- Cost NoCs Miltos D. Grammatikakis, Kyprianos Papadimitriou, Polydoros Petrakis, Marcello Coppola, and Michael Soulie Technological Educational Institute of Crete, GR STMicroelectronics,

More information

ProtoFlex: FPGA Accelerated Full System MP Simulation

ProtoFlex: FPGA Accelerated Full System MP Simulation ProtoFlex: FPGA Accelerated Full System MP Simulation Eric S. Chung, Eriko Nurvitadhi, James C. Hoe, Babak Falsafi, Ken Mai Computer Architecture Lab at Our work in this area has been supported in part

More information

Structured Datapaths. Preclass 1. Throughput Yield. Preclass 1

Structured Datapaths. Preclass 1. Throughput Yield. Preclass 1 ESE534: Computer Organization Day 23: November 21, 2016 Time Multiplexing Tabula March 1, 2010 Announced new architecture We would say w=1, c=8 arch. March, 2015 Tabula closed doors 1 [src: www.tabula.com]

More information

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers Stavros Volos, Ciprian Seiculescu, Boris Grot, Naser Khosro Pour, Babak Falsafi, and Giovanni De Micheli Toward

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques Nandini Sultanpure M.Tech (VLSI Design and Embedded System), Dept of Electronics and Communication Engineering, Lingaraj

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

Mapping and Physical Planning of Networks-on-Chip Architectures with Quality-of-Service Guarantees

Mapping and Physical Planning of Networks-on-Chip Architectures with Quality-of-Service Guarantees Mapping and Physical Planning of Networks-on-Chip Architectures with Quality-of-Service Guarantees Srinivasan Murali 1, Prof. Luca Benini 2, Prof. Giovanni De Micheli 1 1 Computer Systems Lab, Stanford

More information

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect SAFARI Technical Report No. 211-8 (September 13, 211) MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect Chris Fallin Gregory Nazario Xiangyao Yu cfallin@cmu.edu gnazario@cmu.edu

More information

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design white paper Intel FPGA Applying the Benefits of on a Chip Architecture to FPGA System Design Authors Kent Orthner Senior Manager, Software and IP Intel Corporation Table of Contents Abstract...1 Introduction...1

More information

An Efficient Network-on-Chip (NoC) based Multicore Platform for Hierarchical Parallel Genetic Algorithms

An Efficient Network-on-Chip (NoC) based Multicore Platform for Hierarchical Parallel Genetic Algorithms An Efficient Network-on-Chip (NoC) based Multicore Platform for Hierarchical Parallel Genetic Algorithms Yuankun Xue 1, Zhiliang Qian 2, Guopeng Wei 3, Paul Bogdan 1, Chi-Ying Tsui 2, Radu Marculescu 3

More information

INTRODUCTION TO FPGA ARCHITECTURE

INTRODUCTION TO FPGA ARCHITECTURE 3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)

More information

Mesh. Mesh Channels. Straight-forward Switching Requirements. Switch Delay. Total Switches. Total Switches. Total Switches?

Mesh. Mesh Channels. Straight-forward Switching Requirements. Switch Delay. Total Switches. Total Switches. Total Switches? ESE534: Computer Organization Previously Saw need to exploit locality/structure in interconnect Day 20: April 7, 2010 Interconnect 5: Meshes (and MoT) a mesh might be useful Question: how does w grow?

More information

FCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDAto-FPGA

FCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDAto-FPGA 1 FCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDAto-FPGA Compiler Tan Nguyen 1, Swathi Gurumani 1, Kyle Rupnow 1, Deming Chen 2 1 Advanced Digital Sciences Center, Singapore {tan.nguyen,

More information

NoC Simulation in Heterogeneous Architectures for PGAS Programming Model

NoC Simulation in Heterogeneous Architectures for PGAS Programming Model NoC Simulation in Heterogeneous Architectures for PGAS Programming Model Sascha Roloff, Andreas Weichslgartner, Frank Hannig, Jürgen Teich University of Erlangen-Nuremberg, Germany Jan Heißwolf Karlsruhe

More information

Dynamic Flow Regulation for IP Integration on Network-on-Chip

Dynamic Flow Regulation for IP Integration on Network-on-Chip Dynamic Flow Regulation for IP Integration on Network-on-Chip Zhonghai Lu and Yi Wang Dept. of Electronic Systems KTH Royal Institute of Technology Stockholm, Sweden Agenda The IP integration problem Why

More information

STLAC: A Spatial and Temporal Locality-Aware Cache and Networkon-Chip

STLAC: A Spatial and Temporal Locality-Aware Cache and Networkon-Chip STLAC: A Spatial and Temporal Locality-Aware Cache and Networkon-Chip Codesign for Tiled Manycore Systems Mingyu Wang and Zhaolin Li Institute of Microelectronics, Tsinghua University, Beijing 100084,

More information

Router Construction. Workstation-Based. Switching Hardware Design Goals throughput (depends on traffic model) scalability (a function of n) Outline

Router Construction. Workstation-Based. Switching Hardware Design Goals throughput (depends on traffic model) scalability (a function of n) Outline Router Construction Outline Switched Fabrics IP Routers Tag Switching Spring 2002 CS 461 1 Workstation-Based Aggregate bandwidth 1/2 of the I/O bus bandwidth capacity shared among all hosts connected to

More information

Mapping of Real-time Applications on

Mapping of Real-time Applications on Mapping of Real-time Applications on Network-on-Chip based MPSOCS Paris Mesidis Submitted for the degree of Master of Science (By Research) The University of York, December 2011 Abstract Mapping of real

More information

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration Implementing Flexible Interconnect for Machine Learning Acceleration A R M T E C H S Y M P O S I A O C T 2 0 1 8 WILLIAM TSENG Mem Controller 20 mm Mem Controller Machine Learning / AI SoC New Challenges

More information

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and

More information

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology Outline SoC Interconnect NoC Introduction NoC layers Typical NoC Router NoC Issues Switching

More information

ESE534: Computer Organization. Tabula. Previously. Today. How often is reuse of the same operation applicable?

ESE534: Computer Organization. Tabula. Previously. Today. How often is reuse of the same operation applicable? ESE534: Computer Organization Day 22: April 9, 2012 Time Multiplexing Tabula March 1, 2010 Announced new architecture We would say w=1, c=8 arch. 1 [src: www.tabula.com] 2 Previously Today Saw how to pipeline

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, 2006 Sr. Principal Engineer Panel Questions How do we build scalable networks that balance power, reliability and performance

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

NoC Test-Chip Project: Working Document

NoC Test-Chip Project: Working Document NoC Test-Chip Project: Working Document Michele Petracca, Omar Ahmad, Young Jin Yoon, Frank Zovko, Luca Carloni and Kenneth Shepard I. INTRODUCTION This document describes the low-power high-performance

More information

Embedded Systems: Hardware Components (part II) Todor Stefanov

Embedded Systems: Hardware Components (part II) Todor Stefanov Embedded Systems: Hardware Components (part II) Todor Stefanov Leiden Embedded Research Center, Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded

More information

CONGA: Distributed Congestion-Aware Load Balancing for Datacenters

CONGA: Distributed Congestion-Aware Load Balancing for Datacenters CONGA: Distributed Congestion-Aware Load Balancing for Datacenters By Alizadeh,M et al. Motivation Distributed datacenter applications require large bisection bandwidth Spine Presented by Andrew and Jack

More information