Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control



Topology Examples

Figure: example topologies (grid, torus, hypercube).

Criteria                 Bus   Ring   2D torus   6-cube   Fully connected
Performance
  Bisection bandwidth      1      2         16       32              1024
Cost
  Ports/switch             -      3          5        7                64
  Total links              1    128        192      256              2080

k-ary d-cube

- Consider a k-ary d-cube: a d-dimensional array with k elements in each dimension; there are links between elements that differ in one dimension by 1 (mod k)
- Number of nodes: $N = k^d$
- Quantities to determine: number of switches, switch degree, number of links, pins per node, average routing distance, diameter, bisection bandwidth, switch complexity
- Should we minimize or maximize dimension?

k-ary d-cube

- Consider a k-ary d-cube: a d-dimensional array with k elements in each dimension; there are links between elements that differ in one dimension by 1 (mod k)
- Number of nodes: $N = k^d$
  (with no wraparound)
- Number of switches: $N$
- Switch degree: $2d + 1$
- Number of links: $Nd$
- Pins per node: $2wd$
- Avg. routing distance: $d(k-1)/2$
- Diameter: $d(k-1)$
- Bisection bandwidth: $2wk^{d-1}$
- Switch complexity: $(2d + 1)^2$
- Should we minimize or maximize dimension?
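As a worked instance of the formulas above, the block below plugs in $k = 4$ and $d = 3$. These two values are chosen here purely for illustration, and $w$ is left symbolic because the slide does not define it (it is presumably the channel width).

```latex
\begin{align*}
N &= k^d = 4^3 = 64 \\
\text{switch degree} &= 2d + 1 = 7 \\
\text{number of links} &= Nd = 64 \cdot 3 = 192 \\
\text{pins per node} &= 2wd = 6w \\
\text{avg. routing distance} &= d(k-1)/2 = 3 \cdot 3 / 2 = 4.5 \\
\text{diameter} &= d(k-1) = 9 \\
\text{bisection bandwidth} &= 2wk^{d-1} = 2w \cdot 16 = 32w \\
\text{switch complexity} &= (2d+1)^2 = 49
\end{align*}
```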

Dimension

- For a fixed machine size N, low-dimension networks have significantly higher latencies for a packet, so scalable machines should employ high dimensionality (high cost!)
- For a fixed number of pins, message latency decreases at first, then increases (as we increase dimensionality)
- What if we keep constant bisection bandwidth?

Recap for a k-ary d-cube with $N = k^d$:

- Number of switches: $N$
- Switch degree: $2d + 1$
- Number of links: $Nd$
- Pins per node: $2wd$
- Avg. routing distance: $d(k-1)/2$
- Diameter: $d(k-1)$
- Bisection bandwidth: $2wk^{d-1}$
- Switch complexity: $(2d + 1)^2$
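To make the trade-off concrete, here is a minimal sketch that enumerates every (k, d) pair with k^d = N and evaluates the slide's formulas for diameter, average distance, and switch complexity. The function names `mesh_metrics` and `sweep` are hypothetical, and hop count is used only as a rough stand-in for latency.

```python
def mesh_metrics(k, d):
    """Evaluate the slide's k-ary d-cube formulas for one (k, d) pair."""
    return {
        "nodes": k ** d,
        "switch_degree": 2 * d + 1,
        "avg_distance": d * (k - 1) / 2,
        "diameter": d * (k - 1),
        "switch_complexity": (2 * d + 1) ** 2,
    }


def sweep(n):
    """Return (k, d, metrics) for every integer k >= 2 and d >= 1 with k**d == n."""
    results = []
    # d can be at most floor(log2(n)), which equals n.bit_length() - 1.
    for d in range(1, n.bit_length()):
        guess = round(n ** (1.0 / d))
        # Check neighbours of the floating-point root to avoid rounding errors.
        for k in (guess - 1, guess, guess + 1):
            if k >= 2 and k ** d == n:
                results.append((k, d, mesh_metrics(k, d)))
    return results


if __name__ == "__main__":
    for k, d, m in sweep(64):
        print(f"k={k:2d} d={d}: diameter={m['diameter']:2d} "
              f"avg={m['avg_distance']:.1f} complexity={m['switch_complexity']}")
```

For N = 64 the diameter shrinks as d grows while the switch complexity rises, which is the cost side of the argument above.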

Routing

- Deterministic routing: given the source and destination, there exists a unique route
- Adaptive routing: a switch may alter the route in order to deal with unexpected events (faults, congestion); more complexity in the router vs. potentially better performance
- Example of deterministic routing: dimension-order routing: send the packet along the first dimension until the destination coordinate (in that dimension) is reached, then the next dimension, etc.
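Dimension-order routing as described above can be sketched in a few lines. This is an illustrative version for a mesh without wraparound links; the function name `dimension_order_route` and the tuple-of-coordinates node representation are assumptions made here, not part of the lecture.

```python
def dimension_order_route(src, dst):
    """Return the list of nodes visited from src to dst, correcting one
    dimension at a time (first dimension first). Assumes a mesh with no
    wraparound links, so each step changes one coordinate by +1 or -1."""
    if len(src) != len(dst):
        raise ValueError("source and destination must have the same dimension")
    current = list(src)
    path = [tuple(current)]
    for dim in range(len(src)):
        step = 1 if dst[dim] > current[dim] else -1
        while current[dim] != dst[dim]:
            current[dim] += step
            path.append(tuple(current))
    return path


if __name__ == "__main__":
    print(dimension_order_route((0, 0), (2, 1)))
```

Because the route depends only on the source and destination, two calls with the same arguments always return the same path, which is what makes the scheme deterministic.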

Deadlock

- Deadlock happens when there is a cycle of resource dependencies: a process holds on to a resource (A) and attempts to acquire another resource (B); A is not relinquished until B is acquired
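The cycle condition can be checked mechanically. The sketch below assumes the dependencies are given as a dictionary mapping each held resource to the resources its holder is waiting for (a representation chosen here for illustration), and uses a depth-first search to look for a cycle.

```python
def has_cycle(waits_for):
    """Return True if the dependency graph contains a cycle.

    waits_for maps a held resource to the resources its holder is trying
    to acquire. A cycle means no holder in it can ever make progress."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in waits_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: we returned to the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(waits_for))


if __name__ == "__main__":
    # Four resources, each held while the next one is requested.
    print(has_cycle({"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}))
```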

Deadlock Example

Figure: a 4-way switch with input ports and output ports, showing packets of messages 1, 2, 3, and 4.

- Each message is attempting to make a left turn: it must acquire an output port while still holding on to a series of input and output ports

Deadlock-Free Proofs

- Number the edges and show that all routes will traverse edges in increasing (or decreasing) order; therefore, it will be impossible to have cyclic dependencies
- Example: k-ary 2-d array with dimension-order routing: first route along the x-dimension, then along y

Figure: 2-d array with numbered edges (x-dimension edges numbered 0 to 3, y-dimension edges numbered 16 to 19).
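The numbering argument can be checked exhaustively for a small array. The sketch below is one possible numbering, reconstructed from the figure labels for k = 4 (x edges 0 to 3, y edges 16 to 19); the helper names and the offset `4 * k` for y edges are assumptions made here. It routes x first, then y, for every source/destination pair and confirms that edge numbers strictly increase along each route.

```python
from itertools import product


def xy_route(src, dst):
    """Hops ((x, y), (x', y')) for x-then-y routing in a mesh without wraparound."""
    (x, y), (dx, dy) = src, dst
    hops = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        hops.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        hops.append(((x, y), (x, ny)))
        y = ny
    return hops


def edge_number(hop, k):
    """Assumed numbering: every y edge is numbered above every x edge."""
    (x0, y0), (x1, y1) = hop
    y_base = 4 * k  # hypothetical offset; gives 16 for k = 4
    if x1 == x0 + 1:
        return x0 + 1            # rightward edges: 1, 2, ..., k-1
    if x1 == x0 - 1:
        return (k - 1) - x0      # leftward edges: 0 at the right end, rising leftward
    if y1 == y0 + 1:
        return y_base + y0 + 1   # upward edges
    return y_base + (k - 1) - y0  # downward edges


def routes_are_increasing(k):
    """True if every x-then-y route visits edges in strictly increasing order."""
    nodes = list(product(range(k), repeat=2))
    for src, dst in product(nodes, repeat=2):
        numbers = [edge_number(h, k) for h in xy_route(src, dst)]
        if any(a >= b for a, b in zip(numbers, numbers[1:])):
            return False
    return True


if __name__ == "__main__":
    print(routes_are_increasing(4))
```

Since a dependency between two edges exists only when some route uses them consecutively, strictly increasing numbers along every route rule out a cycle of dependencies.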

Breaking Deadlock I

- The earlier proof does not apply to tori because of wraparound edges
- Partition resources across multiple virtual channels
- If a wraparound edge must be used in a torus, travel on virtual channel 1, else travel on virtual channel 0

Breaking Deadlock II

- Consider the eight possible turns in a 2-d array (note that turns lead to cycles)
- By preventing just two turns, cycles can be eliminated
- Dimension-order routing disallows four turns
- Helps avoid deadlock even in adaptive routing

Figure panels: West-First, North-Last, Negative-First, Can allow deadlocks.
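The turn counts above can be enumerated directly. In this sketch a turn is a pair (direction of travel before, direction of travel after); the prohibited sets follow the usual definitions of the named turn-model schemes, with north and east taken as the positive directions, and all identifiers are chosen here for illustration.

```python
VERTICAL = {"N", "S"}
HORIZONTAL = {"E", "W"}

# All 90-degree turns: (heading before the turn, heading after the turn).
ALL_TURNS = {(a, b) for a in "NESW" for b in "NESW"
             if (a in VERTICAL) != (b in VERTICAL)}

PROHIBITED = {
    # x first, then y: no turn from a y heading back into x.
    "dimension-order": {("N", "E"), ("N", "W"), ("S", "E"), ("S", "W")},
    # Any westward travel must come first: no turn into west.
    "west-first": {("N", "W"), ("S", "W")},
    # Northward travel must come last: no turn out of north.
    "north-last": {("N", "E"), ("N", "W")},
    # Negative directions first: no turn from a positive to a negative heading.
    "negative-first": {("N", "W"), ("E", "S")},
}

# The four turns that make up each abstract cycle.
CLOCKWISE = {("E", "S"), ("S", "W"), ("W", "N"), ("N", "E")}
COUNTERCLOCKWISE = {("E", "N"), ("N", "W"), ("W", "S"), ("S", "E")}


def allowed_turns(scheme):
    """Turns a packet may still take under the given scheme."""
    return ALL_TURNS - PROHIBITED[scheme]


def breaks_both_cycles(scheme):
    """Necessary (not sufficient) check: one turn removed from each cycle."""
    banned = PROHIBITED[scheme]
    return bool(banned & CLOCKWISE) and bool(banned & COUNTERCLOCKWISE)


if __name__ == "__main__":
    for name in PROHIBITED:
        print(name, len(allowed_turns(name)), breaks_both_cycles(name))
```

Removing one turn from each cycle is only a necessary condition, which is consistent with the figure panel labeled "Can allow deadlocks": some pairs of prohibited turns still leave a deadlock possible.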

Packets/Flits

- A message is broken into multiple packets (each packet has header information that allows the receiver to reconstruct the original message)
- A packet may itself be broken into flits; flits do not contain additional headers
- Two packets can follow different paths to the destination
- Flits are always ordered and follow the same path
- Such an architecture allows the use of a large packet size (low header overhead) and yet allows fine-grained resource allocation on a per-flit basis
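The message/packet/flit hierarchy can be illustrated with a small sketch. The header fields (`seq`, `total`), the dictionary layout, and the function names are hypothetical; a real network would use a fixed binary header format.

```python
def packetize(message, payload_per_packet, flit_size):
    """Split a bytes message into packets, and each packet into flits.

    Each packet carries a header (sequence number and packet count) so the
    receiver can reassemble the message; flits carry no headers of their own."""
    chunks = [message[i:i + payload_per_packet]
              for i in range(0, len(message), payload_per_packet)]
    packets = []
    for seq, chunk in enumerate(chunks):
        flits = [chunk[j:j + flit_size] for j in range(0, len(chunk), flit_size)]
        packets.append({"header": {"seq": seq, "total": len(chunks)},
                        "flits": flits})
    return packets


def reassemble(packets):
    """Rebuild the message. Packets may arrive in any order (they can take
    different paths), but flits within a packet are assumed to be in order."""
    ordered = sorted(packets, key=lambda p: p["header"]["seq"])
    return b"".join(b"".join(p["flits"]) for p in ordered)


if __name__ == "__main__":
    msg = bytes(range(20))
    pkts = packetize(msg, payload_per_packet=8, flit_size=2)
    print(len(pkts), [len(p["flits"]) for p in pkts])
    print(reassemble(list(reversed(pkts))) == msg)
```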

Flow Control

- The routing of a message requires allocation of various resources: the channel (or link), buffers, control state
- Bufferless: flits are dropped if there is contention for a link, NACKs are sent back, and the original sender has to re-transmit the packet
- Circuit switching: a request is first sent to reserve the channels; the request may be held at an intermediate router until the channel is available (hence, not truly bufferless), ACKs are sent back, and subsequent packets/flits are routed with little effort (good for bulk transfers)

Buffered Flow Control

- A buffer between two channels decouples the resource allocation for each channel; buffer storage is not as precious a resource as the channel (perhaps not so true for on-chip networks)
- Packet-buffer flow control: channels and buffers are allocated per packet
- Two variants: store-and-forward and cut-through

Figure: time-space diagrams (channel vs. cycle, cycles 0 to 14) for store-and-forward and cut-through, each showing a five-flit packet (H B B B T) crossing successive channels.
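The difference between the two time-space diagrams can be expressed as a pair of formulas. The sketch below assumes an idealized setting that the slide does not state explicitly: one flit crosses one channel per cycle, there is no contention, and routers add no extra delay. The function names are hypothetical.

```python
def store_and_forward_cycles(hops, flits):
    """Each channel forwards the whole packet before the next one starts."""
    return hops * flits


def cut_through_cycles(hops, flits):
    """The head advances one channel per cycle and the rest of the packet
    follows in a pipeline, so the tail arrives flits - 1 cycles after the head."""
    return hops + flits - 1


if __name__ == "__main__":
    # A five-flit packet (H B B B T) crossing three channels.
    print(store_and_forward_cycles(hops=3, flits=5))
    print(cut_through_cycles(hops=3, flits=5))
```

With three channels and five flits, store-and-forward occupies cycles 0 to 14, matching the cycle axis in the figure, while cut-through finishes in 7 cycles under the same assumptions.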

Flit-Buffer Flow Control (Wormhole)

- Wormhole flow control: just like cut-through, but with buffers allocated per flit rather than per packet
- A head flit must acquire three resources at the next switch before being forwarded:
  - channel control state (virtual channel, one per input port)
  - one flit buffer
  - one flit of channel bandwidth
- The other flits adopt the same virtual channel as the head and only compete for the buffer and physical channel
- Consumes much less buffer space than cut-through routing
- Does not improve channel utilization, as another packet cannot cut in (only one VC per input port)

Virtual Channel Flow Control

- Each switch has multiple virtual channels per physical channel
- Each virtual channel keeps track of the output channel assigned to the head, and pointers to buffered packets
- A head flit must allocate the same three resources in the next switch before being forwarded
- By having multiple virtual channels per physical channel, two different packets are allowed to utilize the channel and not waste the resource when one packet is idle
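The per-VC state described above can be sketched as a small data structure. This is an illustrative model only: the class and method names are hypothetical, channel bandwidth (the third resource) is not modeled, and releasing a VC when the tail flit departs is omitted. Setting the number of VCs to one gives the wormhole case.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VirtualChannel:
    buffer_depth: int                  # flit buffers owned by this VC
    output_port: Optional[int] = None  # output channel assigned to the head
    flits: deque = field(default_factory=deque)

    def is_free(self):
        return self.output_port is None and not self.flits

    def has_space(self):
        return len(self.flits) < self.buffer_depth


@dataclass
class InputPort:
    vcs: list  # VirtualChannel objects sharing one physical channel

    def accept_head(self, flit, output_port):
        """A head flit needs a free VC and one flit buffer.
        Returns the VC index, or None if the head must wait upstream."""
        for index, vc in enumerate(self.vcs):
            if vc.is_free() and vc.has_space():
                vc.output_port = output_port
                vc.flits.append(flit)
                return index
        return None

    def accept_body(self, vc_index, flit):
        """Body and tail flits reuse the head's VC and only need a buffer."""
        vc = self.vcs[vc_index]
        if not vc.has_space():
            return False
        vc.flits.append(flit)
        return True


if __name__ == "__main__":
    port = InputPort([VirtualChannel(buffer_depth=2) for _ in range(2)])
    print(port.accept_head("A-head", output_port=1))
    print(port.accept_head("B-head", output_port=2))
    print(port.accept_head("C-head", output_port=3))
```

With two VCs, packets A and B can both hold state at the same input port, so one can use the physical channel while the other is blocked; a third head has to wait.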

Example

- A is going from Node-1 to Node-4; B is going from Node-0 to Node-5
- Traffic analogy: B is trying to make a left turn; A is trying to go straight; there is no left-only lane with wormhole, but there is one with VC

Figure (wormhole): Node-0 through Node-5; B is blocked at Node-5 (no free VCs/buffers) and the channels along its path are marked "B idle".

Figure (virtual channel): Node-0 through Node-5; flits of A and B alternate on the shared channels (A B A B A) while B is blocked at Node-5 (no free VCs/buffers).
