FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs
1 1/29 FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs Nachiket Kapre + Tushar Krishna nachiket@uwaterloo.ca, tushar@ece.gatech.edu
2 2/29 Claim: FPGA overlay NoCs designed to exploit interconnect properties of the FPGA fabric can surpass existing state-of-the-art NoCs on throughput and by 2.2× on energy, at 2.5× LUT cost. Xilinx Virtex-7 485T FPGA, 8×8 system size, synthetic + real-world traffic.
3 3/29 Context: FPGAs finding a comfortable home in datacenters. Offloading compute-intensive workloads to the FPGA. Energy efficiency, fast coupling to networking. Common infrastructure: NoCs for apps + system IO.
5 4/29 Landscape of contemporary FPGA NoC Routers
9 5/29 Landscape of contemporary FPGA NoC Routers [Figure: Peak S/W Bandwidth (packets/ns) plotted against Cost per Switch, max(LUTs, FFs), for Split-Merge, CONNECT, BLESS, OpenSMART, Hoplite, and FastTrack.] ASIC clones transplanted onto FPGAs fare poorly: expensive buffers, virtual channels, multi-ported switches. Even contemporary FPGA routers are expensive and slow. FastTrack: deflection routing + bufferless + torus.
10 6/29 Qualitative Comparison of FPGA NoC Routers [Table: rows OpenSMART, BLESS, CONNECT, Split-Merge, Hoplite; Cost columns Xbar+Arb, Buffers, VCs.]
11 7/29 Quick Tutorial on Hoplite [Figure: 4×4 grid of routers labelled (0,0) through (3,3).] References: Hoplite: A Deflection-Routed Directional Torus NoC for FPGAs, TRETS 2017; Hoplite: Building Austere Overlay NoCs for FPGAs, FPL.
12 8/29 Quick Tutorial on HopliteRT [Figure: 4×4 grid of routers labelled (0,0) through (3,3).] Reference: HopliteRT: An Efficient FPGA NoC for Real-Time Applications, FPT 2017.
14 9/29 Qualitative Comparison of FPGA NoC Routers [Table: rows OpenSMART, BLESS, CONNECT, Split-Merge, Hoplite; Cost columns Xbar+Arb, Buffers, VCs; Perf columns Tput, Latency.]
15 10/29 Challenge: Deflection routing makes inefficient use of wiring resources. Deflected packets stay in the network for longer, which raises latency. They steal bandwidth from other traffic, which lowers throughput. Can we improve NoC performance under deflection routing? Are there unique opportunities provided by the FPGA fabric? Hoplite: cheap in LUT cost... FastTrack: inspect the FPGA interconnect.
16 11/29 Outline Introduction and Motivation FastTrack NoC Organization FastTrack Router Operation Evaluation
18 13/29 FPGA Wire Speeds [Figure: distances not to scale.]
19 14/29 FastTrack NoC Organization
20 15/29 Depopulated Topology Generation
21 16/29 Parametric Topology Generation. FPGA NoC parameterized by three terms: N = system size; D = distance of express link; R = depopulation parameter, which controls how many routers are FastTrack vs. vanilla Hoplite. Fully populated 4×4 NoC: FT(16,2,1). Half-populated 4×4 NoC: FT(16,2,2).
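The three parameters can be made concrete with a small topology generator. This is a minimal sketch, not the authors' generator: the function names, the port labels (`E_sh`, `E_ex`, and so on), and in particular the depopulation rule (express ports only at every R-th router along each dimension) are assumptions chosen so that R=2 halves the number of express links. The slide itself does not say which routers keep their express ports.

```python
import math


def fasttrack_links(n_pes, d, r):
    """Build the outgoing links of a hypothetical FT(N, D, R) directional torus.

    Returns {(x, y): {port_name: (x2, y2)}}. Short links hop one router east
    or south; express links hop `d` routers east or south.
    """
    side = math.isqrt(n_pes)
    if side * side != n_pes:
        raise ValueError("N must be a perfect square for a square torus")
    links = {}
    for y in range(side):
        for x in range(side):
            # Every router keeps the vanilla Hoplite short links.
            ports = {
                "E_sh": ((x + 1) % side, y),
                "S_sh": (x, (y + 1) % side),
            }
            # ASSUMPTION: depopulation keeps express ports only at every
            # r-th router along the corresponding dimension.
            if x % r == 0:
                ports["E_ex"] = ((x + d) % side, y)
            if y % r == 0:
                ports["S_ex"] = (x, (y + d) % side)
            links[(x, y)] = ports
    return links


def count_express(links):
    """Count express links across the whole NoC."""
    return sum(1 for ports in links.values() for p in ports if p.endswith("_ex"))


full = fasttrack_links(16, 2, 1)   # FT(16,2,1)
half = fasttrack_links(16, 2, 2)   # FT(16,2,2)
print(count_express(full), count_express(half))
```

Under this assumed rule the half-populated 4×4 NoC carries half as many express links as the fully populated one, which is the cost knob the R parameter is described as providing.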
22 17/29 Outline Introduction and Motivation FastTrack NoC Organization FastTrack Router Operation Evaluation
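23 18/29 FastTrack Switch Organization [Figure: (a) Base HopliteRT switch with N, W, PE inputs and E, S outputs, built from 3:1 muxes; (b) FastTrack switch with short (Sh) and express (Ex) versions of the N, W, E, S ports plus PE, built from 4:1 and 5:1 muxes.]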
24 19/29 Switch Operation [Figure: FastTrack switch with inputs W Ex, W Sh, N Sh, N Ex, PE and outputs E Ex, E Sh, S Sh, S Ex, using 4:1 and 5:1 muxes.] Packets can start on either short or express links. DOR routing function: travel in X first, then Y. Packets can upgrade to fast links if they can. Packets can downgrade to slow links only on a turn! Livelock avoidance: W→S > N→S. Express links = higher priority; deflected packets acquire higher priority, which guarantees progress.
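The routing rules on this slide can be illustrated with a best-case hop count. This is a sketch under stated assumptions rather than the router's actual logic: it assumes a fully populated NoC (R=1), no contention and therefore no deflections, and that in each dimension a packet takes its leftover short hops first and then upgrades to express links. That ordering is one way to respect "upgrade if you can, downgrade only on a turn". The function names are hypothetical.

```python
def ring_distance(src, dst, side):
    """Hops from src to dst on a unidirectional ring of `side` routers."""
    return (dst - src) % side


def hops_short_only(src, dst, side):
    """Contention-free DOR hop count using short links only (X first, then Y)."""
    return (ring_distance(src[0], dst[0], side)
            + ring_distance(src[1], dst[1], side))


def hops_with_express(src, dst, side, d):
    """Contention-free DOR hop count when express links of length `d` exist.

    ASSUMPTION: per dimension, take (dist % d) short hops, then upgrade and
    take (dist // d) express hops; the lane may change again at the X-to-Y turn.
    """
    total = 0
    for axis in (0, 1):  # axis 0 is X, axis 1 is Y, matching DOR order
        dist = ring_distance(src[axis], dst[axis], side)
        total += dist % d + dist // d
    return total


# 8x8 torus, packet from (0,0) to (5,3)
print(hops_short_only((0, 0), (5, 3), 8))       # 8
print(hops_with_express((0, 0), (5, 3), 8, 2))  # 5
```

Fewer hops means a packet occupies fewer link-cycles, which is the intuition behind express links leaving more bandwidth for other traffic. Deflections, which this sketch ignores, change the numbers in a loaded network.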
32 20/29 Outline Introduction and Motivation FastTrack NoC Organization FastTrack Router Operation Evaluation
33 21/29 Experimental Setup. RTL implementation of routers, parameterized: D, R parameters control cost. Cycle-accurate simulations: Verilator. FPGA synthesis + out-of-context place-and-route + XDC floorplanning constraints: Vivado. Benchmarking: synthetic traffic patterns at various injection rates; traces from real workloads (SpMV, graph analytics, multi-processing). Measure sustained throughput, average latency, power model.
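To make "synthetic traffic at various injection rates" concrete, the sketch below generates a RANDOM traffic trace. It is illustrative only and is not the authors' benchmark harness: it assumes RANDOM means uniformly random destinations, and that injection rate means the probability that each PE offers one packet in a given cycle. The function name and the trace format are hypothetical.

```python
import random


def random_traffic(side, injection_rate, cycles, seed=0):
    """Generate (cycle, src, dst) tuples for uniform random traffic.

    ASSUMPTION: each PE independently offers a packet with probability
    `injection_rate` per cycle, to a uniformly chosen PE other than itself.
    Requires side >= 2 so that another PE exists.
    """
    rng = random.Random(seed)  # seeded for repeatable experiments
    nodes = [(x, y) for y in range(side) for x in range(side)]
    packets = []
    for cycle in range(cycles):
        for src in nodes:
            if rng.random() < injection_rate:
                dst = rng.choice(nodes)
                while dst == src:  # re-draw until the destination differs
                    dst = rng.choice(nodes)
                packets.append((cycle, src, dst))
    return packets


trace = random_traffic(side=8, injection_rate=0.1, cycles=1000)
print(len(trace) / (64 * 1000))  # offered load, close to 0.1
```

Sweeping `injection_rate` and replaying each trace through a cycle-accurate model is the usual way to obtain latency-versus-injection-rate curves of the kind reported in the results.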
35 22/29 Avg. Latency, RANDOM traffic, 8×8 NoC [Figure: average latency (cycles) vs. injection rate for Hoplite, Hoplite 3x, FT(N,2,2), and FT(N,2,1).]
36 23/29 Avg. Latency, RANDOM traffic [Figure: two panels of average latency (cycles) vs. injection rate; curves for Hoplite, Hoplite 3x, FT(N,2,2), and FT(N,2,1).] FastTrack saturates at a 4-5× higher injection rate than Hoplite. Against replicated Hoplite it is still better, but by a smaller margin. Replicated Hoplite has a new kind of livelock possibility (delivery).
39 24/29 Results: LUT vs. Throughput, 8×8 NoC [Figure: sustained rate (million packets/s) vs. area (LUTs) for Hoplite, Hoplite 2x, Hoplite 3x, FT R=1, and FT R=2.]
42 25/29 Results: Wiring vs. Throughput, 8×8 NoC [Figure: sustained rate (million packets/s) vs. wire count for Hoplite, Hoplite 2x, Hoplite 3x, FT R=1, and FT R=2.]
43 26/29 Results: Cost vs. Throughput, 8×8 NoC [Figure: two panels of sustained rate (million packets/s), one vs. area (LUTs) and one vs. wire count; series Hoplite, Hoplite 2x, Hoplite 3x, FT R=1, FT R=2.] FastTrack makes better use of FPGA resources (LUTs and wires). Packets are allowed to leave the NoC faster, freeing up resources. Must pick a proper combination of FT design parameters.
46 27/29 Qualitative Comparison of FPGA NoC Routers [Table: rows OpenSMART, BLESS, CONNECT, Split-Merge, Hoplite, FastTrack; Cost columns Xbar+Arb, Buffers, VCs; Perf columns Tput, Latency.]
47 28/29 FPGA Mapping: Frequency, 8×8 NoC [Figure: NoC frequency (MHz) vs. NoC datawidth for FastTrack (64,2), FastTrack (64,4), and Hoplite.] Calibration studies showed express links can travel quickly on chip. Fmax for 2-hop FastTrack keeps up with original Hoplite. The 4-hop express link distance is too large: some noticeable slowdown.
48 29/29 Conclusions. FastTrack outperforms the state-of-the-art Hoplite FPGA NoC by 2.5× for synthetic traffic and 2.8× for real-world traces; 2.2× on energy efficiency; 2.5× more LUTs required. FastTrack is better at larger system sizes. Ideal hop distance is 2-4 (4-256 PEs). Fmax gap between FastTrack and Hoplite is small.
More informationVersal: AI Engine & Programming Environment
Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY
More informationApplying the Benefits of Network on a Chip Architecture to FPGA System Design
white paper Intel FPGA Applying the Benefits of on a Chip Architecture to FPGA System Design Authors Kent Orthner Senior Manager, Software and IP Intel Corporation Table of Contents Abstract...1 Introduction...1
More informationAn Efficient Network-on-Chip (NoC) based Multicore Platform for Hierarchical Parallel Genetic Algorithms
An Efficient Network-on-Chip (NoC) based Multicore Platform for Hierarchical Parallel Genetic Algorithms Yuankun Xue 1, Zhiliang Qian 2, Guopeng Wei 3, Paul Bogdan 1, Chi-Ying Tsui 2, Radu Marculescu 3
More informationINTRODUCTION TO FPGA ARCHITECTURE
3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)
More informationMesh. Mesh Channels. Straight-forward Switching Requirements. Switch Delay. Total Switches. Total Switches. Total Switches?
ESE534: Computer Organization Previously Saw need to exploit locality/structure in interconnect Day 20: April 7, 2010 Interconnect 5: Meshes (and MoT) a mesh might be useful Question: how does w grow?
More informationFCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDAto-FPGA
1 FCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDAto-FPGA Compiler Tan Nguyen 1, Swathi Gurumani 1, Kyle Rupnow 1, Deming Chen 2 1 Advanced Digital Sciences Center, Singapore {tan.nguyen,
More informationNoC Simulation in Heterogeneous Architectures for PGAS Programming Model
NoC Simulation in Heterogeneous Architectures for PGAS Programming Model Sascha Roloff, Andreas Weichslgartner, Frank Hannig, Jürgen Teich University of Erlangen-Nuremberg, Germany Jan Heißwolf Karlsruhe
More informationDynamic Flow Regulation for IP Integration on Network-on-Chip
Dynamic Flow Regulation for IP Integration on Network-on-Chip Zhonghai Lu and Yi Wang Dept. of Electronic Systems KTH Royal Institute of Technology Stockholm, Sweden Agenda The IP integration problem Why
More informationSTLAC: A Spatial and Temporal Locality-Aware Cache and Networkon-Chip
STLAC: A Spatial and Temporal Locality-Aware Cache and Networkon-Chip Codesign for Tiled Manycore Systems Mingyu Wang and Zhaolin Li Institute of Microelectronics, Tsinghua University, Beijing 100084,
More informationRouter Construction. Workstation-Based. Switching Hardware Design Goals throughput (depends on traffic model) scalability (a function of n) Outline
Router Construction Outline Switched Fabrics IP Routers Tag Switching Spring 2002 CS 461 1 Workstation-Based Aggregate bandwidth 1/2 of the I/O bus bandwidth capacity shared among all hosts connected to
More informationMapping of Real-time Applications on
Mapping of Real-time Applications on Network-on-Chip based MPSOCS Paris Mesidis Submitted for the degree of Master of Science (By Research) The University of York, December 2011 Abstract Mapping of real
More informationImplementing Flexible Interconnect Topologies for Machine Learning Acceleration
Implementing Flexible Interconnect for Machine Learning Acceleration A R M T E C H S Y M P O S I A O C T 2 0 1 8 WILLIAM TSENG Mem Controller 20 mm Mem Controller Machine Learning / AI SoC New Challenges
More informationDesign of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture
Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and
More informationSoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology
SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology Outline SoC Interconnect NoC Introduction NoC layers Typical NoC Router NoC Issues Switching
More informationESE534: Computer Organization. Tabula. Previously. Today. How often is reuse of the same operation applicable?
ESE534: Computer Organization Day 22: April 9, 2012 Time Multiplexing Tabula March 1, 2010 Announced new architecture We would say w=1, c=8 arch. 1 [src: www.tabula.com] 2 Previously Today Saw how to pipeline
More informationTDT Appendix E Interconnection Networks
TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages
More informationChallenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer
Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, 2006 Sr. Principal Engineer Panel Questions How do we build scalable networks that balance power, reliability and performance
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationNoC Test-Chip Project: Working Document
NoC Test-Chip Project: Working Document Michele Petracca, Omar Ahmad, Young Jin Yoon, Frank Zovko, Luca Carloni and Kenneth Shepard I. INTRODUCTION This document describes the low-power high-performance
More informationEmbedded Systems: Hardware Components (part II) Todor Stefanov
Embedded Systems: Hardware Components (part II) Todor Stefanov Leiden Embedded Research Center, Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded
More informationCONGA: Distributed Congestion-Aware Load Balancing for Datacenters
CONGA: Distributed Congestion-Aware Load Balancing for Datacenters By Alizadeh,M et al. Motivation Distributed datacenter applications require large bisection bandwidth Spine Presented by Andrew and Jack
More information