Lecture 15: NoC Innovations. Today: power and performance innovations for NoCs



Network Power
"Power-Driven Design of Router Microarchitectures in On-Chip Networks," MICRO '03, Princeton
Energy for a flit:
$E_{flit} = E_R \cdot H + E_{wire} \cdot D = (E_{buf} + E_{xbar} + E_{arb}) \cdot H + E_{wire} \cdot D$
$E_R$ = router energy
$E_{wire}$ = wire transmission energy
$E_{buf}$ = router buffer energy
$E_{xbar}$ = router crossbar energy
$E_{arb}$ = router arbiter energy
$H$ = number of hops
$D$ = physical Manhattan distance
This paper assumes that $E_{wire} \cdot D$ is the ideal network energy (assuming no change to the application and how it is mapped to physical nodes).
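As a minimal sketch of the flit energy expression above, the following Python function evaluates it term by term. The function names and the numeric values in the usage lines are hypothetical, chosen only to illustrate the arithmetic; they are not figures from the paper.

```python
def flit_energy(hops, distance, e_buf, e_xbar, e_arb, e_wire):
    """Energy for one flit: (E_buf + E_xbar + E_arb) * H + E_wire * D."""
    e_router = e_buf + e_xbar + e_arb  # E_R, the per-hop router energy
    return e_router * hops + e_wire * distance


def ideal_energy(distance, e_wire):
    """Ideal network energy as defined on the slide: wire energy only."""
    return e_wire * distance


# Hypothetical per-unit energies (arbitrary units), for illustration only.
total = flit_energy(hops=4, distance=4, e_buf=1.0, e_xbar=0.5, e_arb=0.25, e_wire=2.0)
ideal = ideal_energy(distance=4, e_wire=2.0)
router_overhead = total - ideal  # the part that router optimizations can attack
```

Under this model, every technique that lowers buffer, crossbar, or arbiter energy, or that reduces the hop count, moves the total closer to the wire-only ideal.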

Segmented Crossbar
Segmenting the row and column lines means that parts of these lines need not switch, which lowers the switching capacitance (especially if the input and output ports are close to the bottom-left of the crossbar in the slide's figure).
A few additional control signals are needed to activate the tri-state buffers.
Overall crossbar power savings: ~15-30%.

Cut-Through Crossbar
Attempts to optimize the common case: in dimension-order routing, flits make at most one turn and usually travel straight.
Uses 2/3 the number of tri-state buffers and 1/2 the number of data wires.
Straight traffic does not go through tri-state buffers.
Some combinations of turns are not allowed, such as E→N together with N→W (such a combination cannot happen with dimension-order routing).
Crossbar energy savings: 39-52%.

Write-Through Input Buffer
Input flits must be buffered in case there is a conflict in a later pipeline stage.
If the queue is empty, the input flit can move straight to the next stage, which avoids the buffer read.
To reduce the datapaths, the write bitlines can serve as the bypass path.
Power savings are a function of the read/write energy ratio and the probability of finding an empty queue.
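The dependence of the savings on the read/write energy ratio and the empty-queue probability can be sketched with a simple expected-value model. This is an assumption made for illustration, not the paper's model: it supposes every flit is still written (the write bitlines double as the bypass path) and that the buffer read is skipped whenever the queue is empty. The function names are hypothetical.

```python
def buffer_energy_per_flit(e_write, e_read, p_empty, bypass=True):
    """Expected buffer energy per flit under the assumed model."""
    if not bypass:
        return e_write + e_read  # conventional buffer: always write, then read
    # Write-through: the read is only needed when the queue was not empty.
    return e_write + (1.0 - p_empty) * e_read


def fractional_savings(e_write, e_read, p_empty):
    """Fraction of buffer energy saved by the bypass, relative to no bypass."""
    base = buffer_energy_per_flit(e_write, e_read, p_empty, bypass=False)
    improved = buffer_energy_per_flit(e_write, e_read, p_empty, bypass=True)
    return (base - improved) / base


# Hypothetical values: equal read and write energy, queue empty half the time.
savings = fractional_savings(e_write=1.0, e_read=1.0, p_empty=0.5)
```

In this model the savings reduce to p_empty * e_read / (e_write + e_read), so they grow both with the probability of an empty queue and with the share of buffer energy spent on reads.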

Express Channels
Express channels connect non-adjacent nodes: flits traveling a long distance can use express channels for most of the way and navigate on local channels near the source and destination (like taking the freeway).
This helps reduce the number of hops.
The router in each express node is now much bigger.

Express Channels
Routing: in a ring, there are 5 possible routes and the best is chosen; in a torus, there are 17 possible routes.
A large express interval yields smaller savings because fewer messages exercise the express channels.
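The hop-count effect of express channels can be illustrated with a small sketch. It assumes a simplified 1-D linear array (not the ring or torus from the slide) in which express links join nodes that are `interval` positions apart, and it picks the best route with a breadth-first search. The topology, function names, and node counts are all assumptions made for illustration.

```python
from collections import deque


def build_line(n, interval=None):
    """Linear array of n nodes; optional express links every `interval` nodes."""
    links = {i: set() for i in range(n)}
    for i in range(n - 1):  # local channels between adjacent nodes
        links[i].add(i + 1)
        links[i + 1].add(i)
    if interval is not None and interval > 1:
        for i in range(0, n - interval, interval):  # express channels
            links[i].add(i + interval)
            links[i + interval].add(i)
    return links


def min_hops(links, src, dst):
    """Fewest hops from src to dst, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in links[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("destination unreachable")


plain = build_line(17)
express = build_line(17, interval=4)
long_trip = (min_hops(plain, 0, 16), min_hops(express, 0, 16))   # (16, 4)
short_trip = (min_hops(plain, 5, 7), min_hops(express, 5, 7))    # (2, 2)
```

The long trip benefits from the express links while the short trip does not use them at all, which matches the point that only messages able to exercise the express channels see savings.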

Express Virtual Channels
To a large extent, the physical structure stays the same as in a conventional network (changes to be explained shortly).
Some virtual channels are treated differently: they go through a different router pipeline and can effectively avoid most router overheads.

Router Pipelines
If Normal VC (NVC):
  at every router, must compete for the next VC and for the switch
  will get buffered in case there is a conflict for VA/SA
If EVC (at intermediate bypass router):
  need not compete for a VC (an EVC is a VC reserved across multiple routers)
  similarly, the EVC is also guaranteed the switch (only 1 EVC can compete for an output physical channel)
  since VA/SA are guaranteed to succeed, there is no need for buffering
  simple router pipeline: the incoming flit moves directly to the ST stage
If EVC (at EVC source/sink router):
  must compete for VC/SA as in a conventional pipeline
  before moving on, must confirm a free buffer at the next EVC router

Hierarchical Buses
Udipi et al., HPCA '10
Use buses to reduce the number of routers.
Use Bloom filters to stifle broadcasts.
Use page coloring to minimize travel beyond a bus.

MC Placement
Abts et al., ISCA 2009
Tilera

MC Placement
Diamond placement has the least link contention because it distributes traffic.
XY routing for requests combined with YX routing for replies is also effective in distributing traffic.
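To make the XY and YX routing orders concrete, here is a minimal sketch of dimension-order routing on a 2-D mesh. Nodes are assumed to be (x, y) coordinate pairs, and the function name and example coordinates are hypothetical. It only shows which nodes each order visits; it does not model contention or memory-controller placement.

```python
def dimension_order_route(src, dst, order="XY"):
    """Return the list of (x, y) nodes visited from src to dst.

    order="XY" finishes all X movement before any Y movement;
    order="YX" does the opposite. Either way there is at most one turn.
    """
    if order not in ("XY", "YX"):
        raise ValueError("order must be 'XY' or 'YX'")
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    for dim in order:
        if dim == "X":
            while x != dst_x:
                x += 1 if dst_x > x else -1
                path.append((x, y))
        else:
            while y != dst_y:
                y += 1 if dst_y > y else -1
                path.append((x, y))
    return path


# Hypothetical core at (0, 0) and memory controller at (2, 1).
request = dimension_order_route((0, 0), (2, 1), order="XY")
reply_yx = dimension_order_route((2, 1), (0, 0), order="YX")
reply_xy = dimension_order_route((2, 1), (0, 0), order="XY")
```

In this example the YX reply visits (2, 0) and (1, 0), while an XY reply would instead visit (1, 1) and (0, 1), so the choice of order for replies changes which links around the memory controller carry the traffic.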

NoX Router
Hayenga and Lipasti, MICRO 2011
In speculative routers, two packets may be sent out on the link at the same time, and the receiver gets garbage.
Instead, send the XOR of the colliding packets.
In the following cycles, send the XOR of the colliding packets, minus one of the colliding packets.
The receiver decodes all of the packets from the XORs.
This saves one wasted cycle on every collision or mis-speculation.
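The XOR encode and decode steps described above can be sketched as follows. This is a simplified model under stated assumptions, not the NoX hardware: packets are represented as plain integers, one word is sent per cycle, and each cycle drops exactly one packet from the XOR. The function names are hypothetical.

```python
def nox_link_words(packets):
    """Words placed on the link, one per cycle, for a set of colliding packets.

    Cycle 0 carries the XOR of all colliding packets; each later cycle
    carries the XOR of the packets that remain after dropping one.
    """
    words = []
    for i in range(len(packets)):
        word = 0
        for packet in packets[i:]:
            word ^= packet
        words.append(word)
    return words


def nox_decode(words):
    """Recover the packets: each one is the XOR of two consecutive link words."""
    packets = []
    for i, word in enumerate(words):
        following = words[i + 1] if i + 1 < len(words) else 0
        packets.append(word ^ following)
    return packets


colliding = [0b1010, 0b0110, 0b1111]
on_link = nox_link_words(colliding)   # [0b0011, 0b1001, 0b1111]
recovered = nox_decode(on_link)       # equals `colliding`
```

Note that the colliding packets still occupy one link cycle each in this model; the benefit is that the first cycle carries decodable data rather than garbage.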
