Lecture 7: Flow Control - I

Similar documents
Lecture 3: Flow-Control

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal

Lecture 2: Topology - I

Flow Control can be viewed as a problem of

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching.

Lecture 13: System Interface

Lecture 3: Topology - II

Interconnection Networks

Lecture: Interconnection Networks

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Advanced Computer Networks. Flow Control

Topologies. Maurizio Palesi. Maurizio Palesi 1

Lecture 23: Router Design

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Interconnection Networks

Lecture 12: SMART NoC

Packet Switch Architecture

Topologies. Maurizio Palesi. Maurizio Palesi 1

Packet Switch Architecture

Network-on-chip (NOC) Topologies

Multicomputer distributed system LECTURE 8

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

ES1 An Introduction to On-chip Networks

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Basic Low Level Concepts

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Advanced Computer Networks. Flow Control

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Interconnection Networks

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

Parallel Computing Interconnection Networks

Evaluation of NOC Using Tightly Coupled Router Architecture

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

ECE/CS 757: Advanced Computer Architecture II Interconnects

Lecture 18: Communication Models and Architectures: Interconnection Networks

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

Towards A Formally Verified Network-on-Chip

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

CS 3516: Computer Networks

OASIS NoC Architecture Design in Verilog HDL Technical Report: TR OASIS

NoC Test-Chip Project: Working Document

TDT Appendix E Interconnection Networks

Lecture 22: Router Design

4. Networks. in parallel computers. Advances in Computer Architecture

Communication Cost in Parallel Computing

[ ] In earlier lectures, we have seen that switches in an interconnection network connect inputs to outputs, usually with some kind buffering.

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus)

Input Buffering (IB): Message data is received into the input buffer.

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

Routing Algorithms. Review

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip

ECE 669 Parallel Computer Architecture

Outline. Computer Communication and Networks. The Network Core. Components of the Internet. The Network Core Packet Switching Circuit Switching

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

ECEN Final Exam Fall Instructor: Srinivas Shakkottai

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

NOC Deadlock and Livelock

Network on Chip Architecture: An Overview

Deadlock and Livelock. Maurizio Palesi

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Message Passing Models and Multicomputer distributed system LECTURE 7

EE/CSCI 451: Parallel and Distributed Computation

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

Architecture and Design of Efficient 3D Network-on-Chip for Custom Multi-Core SoC

Toward a Reliable Data Transport Architecture for Optical Burst-Switched Networks

Lecture 2: Internet Structure

Performance Evaluation of Different Routing Algorithms in Network on Chip

Lecture 1: Introduction

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

ECE 650 Systems Programming & Engineering. Spring 2018

Switching / Forwarding

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms

Internet II. CS10 : Beauty and Joy of Computing. cs10.berkeley.edu. !!Senior Lecturer SOE Dan Garcia!!! Garcia UCB!

On Packet Switched Networks for On-Chip Communication

UNIVERSITY OF TORONTO FACULTY OF APPLIED SCIENCE AND ENGINEERING

Data Link Layer. Our goals: understand principles behind data link layer services: instantiation and implementation of various link layer technologies

Packet Switching - Asynchronous Transfer Mode. Introduction. Areas for Discussion. 3.3 Cell Switching (ATM) ATM - Introduction

Interconnect Technology and Computational Speed

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Switching and Forwarding Reading: Chapter 3 1/30/14 1

ECE 650 Systems Programming & Engineering. Spring 2018

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 21: Network Protocols (and 2 Phase Commit)

Network Superhighway CSCD 330. Network Programming Winter Lecture 13 Network Layer. Reading: Chapter 4

Duke University CompSci 356 Midterm Spring 2016

Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization

CS 5520/ECE 5590NA: Network Architecture I Spring Lecture 13: UDP and TCP

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University

LS Example 5 3 C 5 A 1 D

Lecture 9: Bridging & Switching"

EC 513 Computer Architecture

HWP2 Application level query routing HWP1 Each peer knows about every other beacon B1 B3

Transcription:

ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 7: Flow Control - I Tushar Krishna Assistant Professor School of Electrical and Computer Engineering Georgia Institute of Technology tushar@ece.gatech.edu Acknowledgment: Slides adapted from Univ of Toronto ECE 1749 H (N Jerger)

Network Architecture Topology How to connect the nodes ~Road Network Routing Which path should a message take ~Series of road segments from source to destination Flow Control When does the message have to stop/proceed ~Traffic signals at end of each road segment Router Microarchitecture How to build the routers ~Design of traffic intersection (number of lanes, algorithm for turning red/green) 2

3 Flow Control Once the topology and route are fixed, flow control determines the allocation of network resources (channel bandwidth, buffer capacity, and control state) to packets as they traverse the network == resolution of contention between packets requesting the same resource ~Traffic Signals / Stop signs at end of each road segment

Why Flow Control matters? 4 Flow control can single-handedly determine performance, however efficient the topology or routing algorithm might be B Topology Routing (XY) Latency (hops) (AàB) 1 2 Throughput (msg/cycle) (AàB) 1 1 A C D Flow Control Case I: One buffer at C Case II: Four DàB msgs 3 (R A +) L AC + R C + L CB (+ R B ) 1/2 1/5 Suppose Router Delay = 1, Link Delay = 1

Allocation Granularity: Messages, Packets, and Flits 5 Message Header Payload Packet Route Seq# Sequence in the message Flit Head Flit Body Flit Tail Flit Route* Type VCID *Only in Head / Head_Tail Flit Head, Body, Tail, Head_Tail [1-flit packet) Phit Off-chip (SANs) Messages could be B/KB/MB of data Flits have to be sent serially as multiple phits (limited by pins) On-chip (NoC) Message = Packet Flit = Phit (abundant on-chip wires)

Packet Sizes in NoCs 6 Cache line (Data) Route Type VCid Addr Bytes 0-15 Bytes 16-31 Bytes 32-47 Bytes 48-63 Head Flit Body Flits Tail Flit Cache Line Request Route Type VCid Addr Cmd Head_Tail Flit 64B Cache Line ~128-bit flits (i.e., link width) 1 control flit (cache line req) 5 data flits (cache line data) All flits of a packet take same route and have the same VCid

Flow Control based on Allocation Granularity 7 Message-based Flow Control E.g., Circuit Switching Packet-based Flow Control E.g., Store and Forward, Virtual Cut-Through Flit-based Flow Control E.g., Wormhole, Virtual Channel

8 Message-based Flow Control Coarsest Granularity Circuit-switching Setup entire path before sending message Reserve all channels from source to destination using a setup probe Once setup complete, send Data through the channels Buffers not needed at routers as no contention Tear down the circuit once transmission complete

Circuit Switching Example 9 0 Configuration Probe 5 Data Circuit Acknowledgement Significant latency overhead prior to data transfer Data transfer does not pay per-hop overhead for buffering, routing, and allocation

10 Handling Contention 0 Wait till Data transmission from 0 complete! 1 2 Configuration Probe 5 Data Circuit Acknowledgement When there is contention Significant wait time Message from 1 à 2 must wait

11 Challenges with Circuit-Switching Loss in bandwidth (throughput) Throughput can suffer due to setup and transfer time for circuits Links are idle until setup is complete No other message can use links until transfer is complete Latency overhead in setup if the amount of data being transferred is small

Circuit-Switching in NoCs? Cache Line = 64B Suppose Channel Width = 128b => 64x8/128 = 4 chunks 3-hop traversal with 1-cycle per hop Setup = 3 cycles ACK = 3 cycles Data Transfer Time = 3 (for first chunk) + 3 (remaining chunks) = 6 cycles Total Time = 12 cycles Half of this went in circuit setup! Hybrid Circuit-Packet Switching Jerger et. al, Circuit Switched Coherence, NOCS 2008 12

Time-Space Diagram: Circuit Switching 13

14 Packet-based Flow Control Packet Switching Break messages into packets Interleave packets on links Better utilization Requires per-node buffering to store packets inflight waiting for output channel Two techniques Store and Forward Virtual Cut-Through

Packet-based: Store and Forward Links and buffers are allocated to entire packet 15 Head flit waits at router until entire packet is received before being forwarded to the next hop

Store and Forward Example 16 0 5 Total delay = 4 cycles per hop x 3 hops = 12 cycles Not suitable on-chip. Why? High per-hop latency Serialization delay paid at each hop Larger buffering required

Time-Space Diagram: Store and Forward 17

Packet-based: Virtual Cut-Through Links and Buffers allocated to entire packets 18 Flits can proceed to next hop before tail flit has been received by current router But only if next router has enough buffer space for entire packet

Virtual Cut-Through Example 19 0 Allocate 4 flit-sized Allocate 4 flit-sized buffers before buffers head before head proceeds proceeds 5 Total delay = 1 cycle per hop x 3 hops + serialization = 6 cycles Lower per-hop latency Large buffering required

Time-Space Diagram: Virtual Cut-Through 20

Virtual Cut-Through Example (2) 21 Cannot proceed because only 2 flit buffers available Throughput suffers from inefficient buffer allocation

Time-Space Diagram: Virtual Cut-Through (2) 22

Flit-level Flow Control Like VCT, flit can proceed to next router before entire packet arrives Unlike VCT, flit can proceed as soon as there is sufficient buffering for that flit 23 Buffers allocated per flit rather than per packet Routers do not need to have packet-sized buffers Help routers meet tight area/power constraints Two techniques Wormhole link allocated per packet Virtual Channel link allocated per flit

Wormhole Flow Control Example 24 Red holds this channel: channel remains idle until red proceeds Channel idle but red packet blocked behind blue Dest for Red 6 flit buffers/input port Buffer full: blue cannot proceed Dest for Blue Blocked by other packets Head-of-Line Blocking

25 Wormhole Flow Control Pros More efficient buffer utilization (good for on-chip) Low latency Cons Poor link utilization: if head flit becomes blocked, all links spanning length of packet are idle Cannot be re-allocated to different packet Suffers from head of line (HOL) blocking

Time-Space Diagram: Wormhole 26