Fast Flexible FPGA-Tuned Networks-on-Chip

Similar documents
Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

CONNECT: Fast Flexible FPGA-Tuned Networks-on-Chip

CONNECT: Re-Examining Conventional Wisdom for Designing NoCs in the Context of FPGAs

Modern integrated circuits

Fast Scalable FPGA-Based Network-on-Chip Simulation Models

BeiHang Short Course, Part 5: Pandora Smart IP Generators

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator for NoC Modeling in Full-System Simulations

Interconnection Networks

ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology

AUGMENTING FPGAS WITH EMBEDDED NETWORKS-ON-CHIP

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013

Interconnection Networks

Ultra-Fast NoC Emulation on a Single FPGA

Small Virtual Channel Routers on FPGAs Through Block RAM Sharing

DESIGN GUIDELINES FOR THE IMPLEMENTATION OF EMBEDDED NETWORK ON CHIP (NOC) IN FPGAS. Noha Gamal Mohamed

Lecture 6: Hard vs Soft Logic. James C. Hoe Department of ECE Carnegie Mellon University

OASIS Network-on-Chip Prototyping on FPGA

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Interconnection Networks

Lecture 3: Topology - II

FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration

POLYMORPHIC ON-CHIP NETWORKS

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

Evaluating Bufferless Flow Control for On-Chip Networks

Lecture: Interconnection Networks

ECE/CS 757: Advanced Computer Architecture II Interconnects

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

OpenSMART: An Opensource Singlecycle Multi-hop NoC Generator

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs. Chethan Kumar H B and Nachiket Kapre

Network on Chip Architecture: An Overview

Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization

Design Methodologies. Full-Custom Design

Low-Power Interconnection Networks

Networks-on-Chip Router: Configuration and Implementation

Lecture 22: Router Design

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

Introduction to the Qsys System Integration Tool

NETWORK ON CHIP TO IMPLEMENT THE SYSTEM-LEVEL COMMUNICATION SIMPLIFIES THE DISTRIBUTION OF I/O DATA THROUGHOUT THE CHIP, AND IS ALWAYS

Networks-on-Chip for FPGAs: Hard, Soft or Mixed?

Basic Low Level Concepts

Network-on-chip (NOC) Topologies

Design of a System-on-Chip Switched Network and its Design Support Λ

TDT Appendix E Interconnection Networks

PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing

NoC Test-Chip Project: Working Document

Design and Implementation of A Reconfigurable Arbiter

4DM4 Lab. #1 A: Introduction to VHDL and FPGAs B: An Unbuffered Crossbar Switch (posted Thursday, Sept 19, 2013)

Flow Control can be viewed as a problem of

Lecture 3: Flow-Control

Real-Time Mixed-Criticality Wormhole Networks

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques

FPGA Based Digital Design Using Verilog HDL

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

3D WiNoC Architectures

An Efficient Network-on-Chip (NoC) based Multicore Platform for Hierarchical Parallel Genetic Algorithms

4. Networks. in parallel computers. Advances in Computer Architecture

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect

Lecture 7: Flow Control - I

A closer look at network structure:

Design Methodologies and Tools. Full-Custom Design

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal

INTRODUCTION TO FPGA ARCHITECTURE

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

Design and Test Solutions for Networks-on-Chip. Jin-Ho Ahn Hoseo University

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA

Part IV: 3D WiNoC Architectures

Employing Multi-FPGA Debug Techniques

Embedded Systems: Hardware Components (part II) Todor Stefanov

Time-Multiplexed FPGA Overlay Networks on Chip

HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing

Topologies. Maurizio Palesi. Maurizio Palesi 1

Packet Switch Architecture

Packet Switch Architecture

Multi processor systems with configurable hardware acceleration

Address InterLeaving for Low- Cost NoCs

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger

Quality-of-Service for a High-Radix Switch

INT G bit TCP Offload Engine SOC

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip

A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology

Lecture 2: Topology - I

INTERCONNECTION NETWORKS LECTURE 4

A Single Chip Shared Memory Switch with Twelve 10Gb Ethernet Ports

Prediction Router: Yet another low-latency on-chip router architecture

The Design and Implementation of a Low-Latency On-Chip Network

Transcription:

This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Fast Flexible FPGA-Tuned Networks-on-Chip Michael K. Papamichael, James C. Hoe papamix@cs.cmu.edu, jhoe@ece.cmu.edu www.ece.cmu.edu/~mpapamic/connect Computer Architecture Lab at Portland, OR, June 2012

FPGAs and Networks-on-Chip (NoCs) Rapid growth of FPGA capacity and features Extended SoC and full-system prototyping FPGA-based high-performance computing Need for flexible NoCs to support communication Map existing ASIC-oriented NoC designs on FPGAs? 2

FPGAs and Networks-on-Chip (NoCs) Rapid growth of FPGA capacity and features Extended SoC and full-system prototyping FPGA-based high-performance computing Need for flexible NoCs to support communication Map existing ASIC-oriented NoC designs on FPGAs? Inefficient use of FPGA resources ASIC-driven NoC architecture not optimal for FPGA 3

FPGAs and Networks-on-Chip (NoCs) Rapid growth of FPGA capacity and features Extended SoC and full-system prototyping FPGA-based high-performance computing Need for flexible NoCs to support communication Map existing ASIC-oriented NoC designs on FPGAs? Inefficient use of FPGA resources ASIC-driven NoC architecture not optimal for FPGA FPGA-tuned NoC Architecture Embodies FPGA-motivated design principles Very lightweight, minimizes resource usage ~50% resource reduction vs. ASIC-oriented NoC Publicly released flexible NoC generator (demo) Often goes against ASIC-driven NoC conventional wisdom 4

Outline NoC Terminology (single-slide review) Approach Tailoring NoCs to FPGAs Results Related Work & Conclusion Public Release & Demo! 5

Outline NoC Terminology (single-slide review) Approach Tailoring NoCs to FPGAs Results Related Work & Conclusion Public Release & Demo! 6

NoC Terminology Overview Router A Virtual Channels Tail Flit Flits Packet T F F F F F Data Channel/Link Flow Control Link Head Flit H F Router B Virtual Channels Packets Basic logical unit of transmission Flits Packets broken into into multiple flits unit of flow control Virtual Channels Multiple logical channels over single physical link Flow Control Management of buffer space in the network 7

Outline NoC Terminology (single-slide review) Approach Tailoring NoCs to FPGAs Results Related Work & Conclusion Public Release & Demo! 8

How FPGAs are Different from ASICs FPGAs peculiar HW realization substrate in terms of Relative cost of speed vs. logic vs. wires vs. memory Unique mapping and operating characteristics focuses on 4 FPGA characteristics: 1 2 3 4 Abundance of Wires Storage Shortage & Peculiarities Frequency Challenged Reconfigurable Nature FPGA characteristics uniquely influence key NoC design decisions 9

Tailoring NoCs to FPGAs 1 Abundance of of Wires Densely connected wiring substrate (Over)provisioned to handle worst case Wires are free compared to other resources Make datapaths and channels as wide as possible Adjust packet format --CONNECT-- NoC Implications E.g. carry control info on the side through dedicated links Adapt traditional credit-based flow control 10

Tailoring NoCs to FPGAs 2 Storage Abundance Shortage & of Peculiarities Wires Modern FPGAs offer storage in two forms Block RAMs and LUT RAMs (use logic resources) Only come in specific aspect ratios and sizes Typically in high demand, especially Block RAMs Minimize usage and optimize for aspect ratios and sizes Implement multiple logical flit buffers in each physical buffer Use LUT RAM for flit buffers --CONNECT-- NoC Implications Block RAM much larger than typically NoC flit buffer sizes Allow rest of design to use scarce Block RAM resources 11

Tailoring NoCs to FPGAs 3 Frequency Abundance Challenged of Wires Much lower frequencies compared to ASICs LUTs inherently slower that ASIC standard cells Large wire delays when chaining LUTs Rapidly diminishing returns when pipelining Deep pipelining hard due to quantization effects --CONNECT-- NoC Implications Design router as single-stage pipeline Also dramatically reduces network latency Make up for lower frequency by adjusting network E.g. increase width of datapath and links or change topology 12

Tailoring NoCs to FPGAs 4 Reconfigurable Abundance of Nature Wires Reconfigurable nature of FPGAs Sets them apart from ASICs Support diverse range of applications --CONNECT-- NoC Implications Support extensive application-specific customization Flexible parameterized NoC architecture Automated NoC design generator (demo!) Adhere to standard common interface NoC appears as plug-and-play black box from user-perspective 13

Routing Arbitration CONNECT Architecture Topology-Agnostic Parameterized Architecture # in/out ports, # virtual channels, flit width, buffer depths Flexible user-specified routing Four allocation algorithms and two flow-control mechanisms CONNECT Router Architecture Input Ports In0 (flits) In0 (credits) In15 (flits) In15 (credits) Flit Buffers VC 0 VC 7 VC 0 VC 1 Router Arbitration & Flow Control State Switch Output Ports Out0 (flits) Out0 (credits) Out15 (flits) Out15 (credits)

Outline NoC Terminology (single-slide review) Approach Tailoring NoCs to FPGAs Results Related Work & Conclusion Public Release & Demo! 15

LUTs w/ FPGA RTL opts. LUTs CONNECT vs. ASIC-Oriented RTL 16-node 4x4 Mesh Network-on-Chip (NoC) SOTA: state-of-the-art high-quality ASIC-oriented RTL* CONNECT: identically configured -generated RTL FPGA Resource Usage (same router/noc configuration) 9K 8K 7K 60K 50K 6K 5K 4K 40K 30K 50% 3K 2K 1K 50% 20K 10K Single Router 4x4 Mesh Noc *NoC RTL from http://nocs.stanford.edu/cgi-bin/trac.cgi/wiki/resources/router 16

LUTs w/ FPGA RTL opts. LUTs Avg. Packet Latency (in ns) CONNECT vs. ASIC-Oriented RTL 16-node 4x4 Mesh Network-on-Chip (NoC) SOTA: state-of-the-art high-quality ASIC-oriented RTL* CONNECT: identically configured -generated RTL 9K 8K 7K FPGA Resource Usage (same router/noc configuration) 6K 40K 50% 5K ~50% lower LUT 30K Usage 4K 3K 2K 1K 50% Single Router 60K 50K 20K 10K 4x4 Mesh Noc 500 400 300 200 100 Network Performance (uniform random traffic @ 100MHz) similar bandwidth same LUT bugdet 4x BW 2x lower idle latency 2 4 6 8 Load (in Gbps) *NoC RTL from http://nocs.stanford.edu/cgi-bin/trac.cgi/wiki/resources/router 17

Latency (in cycles) Latency (in cycles) CONNECT Sample Networks Four sample CONNECT Networks ( router, endpoint) 16 endpoints, 2/4 virtual channels, 128-bit datapath Ring Fat Tree Mesh High Radix All above networks are interchangeable from user perspective 40 Uniform Random Traffic 40 90% Neighbor Traffic 20 20 0.25 0.50 0.75 1 0.25 0.50 0.75 1 please Load see (in paper flits/cycle) for more synthesis & performance Load results (in flits/cycle) 18

Latency (in cycles) Latency (in cycles) CONNECT Sample Networks Four sample CONNECT Networks ( router, endpoint) 16 endpoints, 2/4 virtual channels, 128-bit datapath Ring Fat Tree Mesh High Radix All above networks are interchangeable from user perspective Uniform Random Traffic 90% Neighbor Traffic 40 40 There 20 is no one-size-fits-all NoC! Tune 20 NoC to application. 0.25 0.50 0.75 1 0.25 0.50 0.75 1 please Load see (in paper flits/cycle) for more synthesis & performance Load results (in flits/cycle) 19

Outline NoC Terminology (single-slide review) Approach Tailoring NoCs to FPGAs Results Related Work & Conclusion Public Release & Demo! 20

Related Work FPGA-oriented NoC Architectures PNoC: lightweight circuit-switched NoC [Hilton 06] NoCem: simple router block, no virtual channels [Schelle 08] FPGA-related NoC Studies Analytical models for predicting NoC perf. on FPGAs [Lee 10] Effect of FPGA NoC params on multiproccesor system [Lee 09] Modify FPGA configuration circuitry to build NoC Metawire: use configuration circuitry as NoC [Shelburne 08] Time-division multiplexed wiring to enable new NoC [Francis 08] Commercial Interconnect Approaches ARM AMBA, STNoC, CoreConnect PLB/OPB, Altera Qsys, etc. 21

Conclusions Significant gains from tuning for FPGA FPGAs and ASICs have different design sweet spot CONNECT flexible, efficient, lightweight NoC Compared to ASIC-driven NoC, CONNECT offers Significantly lower network latency and ~50% lower LUT usage or 3-4x higher network performance Take advantage of reconfigurable nature of FPGA Tailor NoC to specific communication needs of application 22

Outline NoC Terminology (single-slide review) Approach Tailoring NoCs to FPGAs Results Related Work & Conclusion Public Release & Demo! 23

Public Release http://www.ece.cmu.edu/~mpapamic/connect/ NoC Generator with web-based interface Supports multiple pre-configured topologies Includes graphical editor for custom topologies FreeBSD-like license (limited to non-commercial research use) Acknowledgments Derek Chiou, Daniel Becker & Stanford CVA group NSF, Xilinx, Bluespec Demo! 24

Some Release Stats Released in March 2012 1500+ unique visitors 150+ network generation requests Most Popular Topologies User Breakdown 1 2 3 4 5 6 Mesh/Torus 51% Double Ring 14% Ring/Line 14% Fully Connected 10% Custom 6% Star 5% 35% Other 25% Industry 40% Academia 25

Thanks! Questions?