Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Similar documents
Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus)

ECE 669 Parallel Computer Architecture

Interconnection Networks

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

[ ] In earlier lectures, we have seen that switches in an interconnection network connect inputs to outputs, usually with some kind buffering.

Lecture: Interconnection Networks

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Deadlock and Livelock. Maurizio Palesi

TDT Appendix E Interconnection Networks

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Communication Performance in Network-on-Chips

NOC Deadlock and Livelock

Interconnection Network

Routing Algorithms. Review

Lecture 18: Communication Models and Architectures: Interconnection Networks

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Basic Low Level Concepts

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

CMSC 611: Advanced. Interconnection Networks

Lecture 3: Flow-Control

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N.

Lecture 15: Networks & Interconnect Interface, Switches, Routing, Examples Professor David A. Patterson Computer Science 252 Fall 1996

Interconnection Networks

OASIS NoC Architecture Design in Verilog HDL Technical Report: TR OASIS

4. Networks. in parallel computers. Advances in Computer Architecture

Deadlock and Router Micro-Architecture

Interconnection Networks

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Interconnect Technology and Computational Speed

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

Lecture 7: Flow Control - I

Deadlock-free XY-YX router for on-chip interconnection network

Parallel Computer Architecture II

Boosting the Performance of Myrinet Networks

Flow Control can be viewed as a problem of

CS575 Parallel Processing

Performance Analysis of a Minimal Adaptive Router

CSC 4900 Computer Networks: Network Layer

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology

Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing

Network Properties, Scalability and Requirements For Parallel Processing. Communication assist (CA)

CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2

L14-15 Scalable Interconnection Networks. Scalable, High Performance Network

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

Routing in packet-switching networks

Network Properties, Scalability and Requirements For Parallel Processing. Communication assist (CA)

Routing, Routers, Switching Fabrics

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Network-on-chip (NOC) Topologies

Lecture 14: Large Cache Design III. Topics: Replacement policies, associativity, cache networks, networking basics

Deadlock- and Livelock-Free Routing Protocols for Wave Switching

DUE to the increasing computing power of microprocessors

SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS*

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Distributed Memory Machines. Distributed Memory Machines

CS250 VLSI Systems Design Lecture 9: Patterns for Processing Units and Communication Links

Real-Time Protocol (RTP)

EE382C Lecture 1. Bill Dally 3/29/11. EE 382C - S11 - Lecture 1 1

The Network Layer and Routers

Interconnection Networks

Advanced Computer Networks. Flow Control

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

Growth. Individual departments in a university buy LANs for their own machines and eventually want to interconnect with other campus LANs.

Topologies. Maurizio Palesi. Maurizio Palesi 1

Extended Junction Based Source Routing Technique for Large Mesh Topology Network on Chip Platforms

Lecture 2 Parallel Programming Platforms

Scalable Multiprocessors

Interconnection Network Project EE482 Advanced Computer Organization May 28, 1999

Chapter 6 Queuing Disciplines. Networking CS 3470, Section 1

LS Example 5 3 C 5 A 1 D

EE 6900: Interconnection Networks for HPC Systems Fall 2016

BlueGene/L. Computer Science, University of Warwick. Source: IBM

The IRAM Network Interface

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Chapter 4: network layer. Network service model. Two key network-layer functions. Network layer. Input port functions. Router architecture overview

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect

Architecture and Design of Efficient 3D Network-on-Chip for Custom Multi-Core SoC

6.9. Communicating to the Outside World: Cluster Networking

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?

Performance Evaluation of Different Routing Algorithms in Network on Chip

Quality of Service in the Internet

Understanding the Routing Requirements for FPGA Array Computing Platform. Hayden So EE228a Project Presentation Dec 2 nd, 2003

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms

Interprocessor Communication. Basics of Network Routing

Lecture 16: Network Layer Overview, Internet Protocol

Advanced Computer Networks. Flow Control

Transcription:

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1

Review: Terms Network characterized by Topology physical structure of the graph Routing Algorithm which paths through network can message flow Switching Strategy How data in message traverses its route Circuit Switched vs. Packet Switched Flow Control When does a packet (or portions of it) move along its route CPS 220 3 Review: Organization Given topology constructed by linking switches and network interfaces, must deliver packet from node A to node B Link: cable with connectors on each end connect switches to other switches or network interfaces Switch: N inputs N outputs (degree N) Phit: Minimum # of bits physically moved across link in one cycle (Can pipeline on single wire) Flit: Minimum # of bits move across link as a single unit Packet: unit that requires routing information, some number of flits CPS 220 4 2

Review: Switches At minimum, must route inputs to outputs Receiver Input Buffer Output Buffer Transmitter Input Ports Cross-bar Output Ports Control Routing, Scheduling VLSI makes it easier to create larger fully connected switches CPS 220 5 Store-and-forward Cut-through Virtual cut-through Wormhole Review: Routing CPS 220 6 3

Routing Algorithm How do I know where a packet should go? Arithmetic Source-Based Table Lookup Adaptive route based on network state (e.g., contention) CPS 220 7 Arithmetic Routing For regular topology, simple arithmetic to determine route 2D Mesh (Also called NEWS network) packet header contains signed offset to destination switch ++ or -- one field of header (x or y dimension) when x == 0 and y == 0, then at correct processor Requires ALU in switch Must recompute CRC CPS 220 8 4

Source Based and Table Lookup Routing Source Based Source specifies output port for each switch in route Very Simple Switches no control state strip output port off header Myrinet uses this Table Lookup Very Small Header, index into table for output port Big tables, must be kept up to date... CPS 220 9 Deterministic v.s. Adaptive Routing Deterministic follows a prespecified route mesh: dimension-order routing» (x1, y1) -> (x2, y2)» first Dx = x2 - x1,» then Dy = y2 - y1, hypercube: edge-cube routing» X = x0x1x2...xn -> Y = y0y1y2...yn» R = X xor Y» Traverse dimensions of differing address in order tree: common ancestor Adaptive route determined by contention for output port 010 110 111 011 100 000 001 101 CPS 220 10 5

Deadlock Key is to allow receiving even if you can t send. What is livelock? CPS 220 11 Deadlock Free Routing Virtual Channels Not virtual cut-through Add buffers so, flits of wormhole packets can be interleaved Up*-Down* Number switches: higher = farther away from processors route up, make one turn, route down Turn Model Routing Restrict order of turns» West First» North Last» Negative First Can increase number of hops Can handle faulty nodes CPS 220 12 6

A Generic Switch At minimum, must route inputs to outputs Receiver Input Buffer Output Buffer Transmitter Input Ports Cross-bar Output Ports Control Routing, Scheduling VLSI makes it easier to create larger fully connected switches CPS 220 13 Switch Design Issues ports, pin limited data path (cross bar designs) non-blocking crossbar routing logic per input ALU table Finite State Machine for cut-through CPS 220 14 7

Switch Buffering need to absorb some transient peaks shared buffer pool need high bandwidth one output could hog all buffer space input buffers CPS 220 15 Input Buffering buffer per port routing logic head of line (HOL) blocking subsequent packet may be routed to unused output port CPS 220 16 8

Output Buffering Buffers logically associated with output split on either side of cross bar Arbitration for physical link (output scheduling) static priority random round-robin oldest-first Effects of adaptive routing? Select output based on availability requires feedback from output port CPS 220 17 Stacked Dimension Switch Uses only 2x2 switch to build higher dimension switch Host out z in 2x2 z out y in 2x2 y out x in 2x2 x out Host in CPS 220 18 9

Congestion Control Packet switched networks do not reserve bandwidth; this leads to contention Solution: prevent packets from entering until contention is reduced (e.g., metering lights) Options: End-to-end Flow Control Link-level Flow Control CPS 220 19 Flow Control Packet discarding: If a packet arrives at a switch and there is no room in the buffer, the packet is discarded no communication between switches, requires higher level protocol Flow control: between pairs of receivers and senders; use feedback to tell the sender when it is allowed to send the next packet CPS 220 20 10

Link-Level Flow Control Ready Data Transfer single flit when receiver is ready Could have long links with many flits in flight CPS 220 21 Credit-based (Window) Flow Control Receiver gives N credits to sender sender decrements count stops sending if zero receiver sends back credit as it drains its buffer bundle credits to reduce overhead Must account for link latency CPS 220 22 11

Water Level high water, low water stop & go back to source switch (Myrinet) can send redundant stop/go Incoming phits Stop Go Outgoing phits CPS 220 23 Case Study Cray T3D 1024 switch nodes each connected to 2 processors 3D Torus, bidirectional, 300 MB/s Link: 16 bits, 8 control bits Variable size packet (multiple of 16 bits) Logical request & response networks 2 virtual channels each for deadlock Stacked dimension routing Wormhole for large packets, virtual cut-through for small packets CPS 220 24 12

IBM SP-2 (Vulcan) Switch has eight bidirectional 40 MB/s links Link: 8 data bits, 1 tag, 1 reverse flow-control Flit is 16 bits, phit is 8 input FIFO + output FIFO + central buffer 128 8-byte segments CPS 220 25 Next Time Multiprocessor Architectures CPS 220 26 13