Basic Low Level Concepts


Course Outline

- Basic Low Level Concepts
- Case Studies
- Operation through multiple switches: Topologies & Routing
  - Direct, indirect, regular, irregular
- Formal models and analysis for deadlock and livelock freedom
- Operation through a single switch: Router micro-architectures
  - Buffering, arbitration, scheduling, datapath
- Operation of a single link: switching and flow control
- Optimization: technology, congestion, reliability

Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated). ECE 8813a.

Overview

Main architectural issues for communication over a single link:
- Message units
- Flow control (lossless links)
- Switching (next)
- Buffer management (later)
- Arbitration & Scheduling (later)

Goals: high levels of link utilization and minimal impact on end-to-end latency.

Sources

Chapters 1 & 2
- Interconnection Networks: An Engineering Approach, J. Duato, S. Yalamanchili and L. Ni, Morgan Kaufmann (pubs.)

Papers
- Virtual Channel Flow Control
- Optimistic Flow Control; Illinois Fast Messages System

Message Passing Communication Protocol

Typical steps followed by the sender:
1. System call by application
   - Copies the data into OS and/or network interface memory
   - Packetizes the message (if needed)
   - Prepares headers and trailers of packets
2. Checksum is computed and added to header/trailer
3. Timer is started and the network interface sends the packets

Typical steps followed by the receiver:
1. NI allocates received packets into its memory or OS memory
2. Checksum is computed and compared for each packet
   - If checksum matches, NI sends back an ACK packet
3. Once all packets are correctly received
   - The message is reassembled and copied to the user's address space
   - The corresponding application is signalled (via polling or interrupt)

[Figure: processor, memory, and network interface (ni) at each end of the interconnection network. On each side the data path runs between the register file, user and system memory, and a FIFO feeding the network. Sending side: user writes data in memory; system call sends; 1 copy; pipelined transfer, e.g., DMA. Receiving side: pipelined reception, e.g., DMA; interrupt; data ready; 2 copy.]

Shared Memory

L2 miss
- Miss Status Handling Register (MSHR) allocation, address mapping, packet construction
- Message injection, flow control set-up, rate control

Message reception
- Packet ejection, update message status (return), and control processing (end-to-end flow control)
- Packet servicing, message injection

[Figure credit: thewere42.worldpress.com]

The Network Model

- Metrics (for now): latency and bandwidth
- Routing, switching, flow control, error control
- Message path: D hops, L-bit message, W-bit-wide channels

Basic Switch Microarchitecture

[Figure: physical channels attach to link control on the input and output sides; DEMUX/MUX stages sit on either side of a crossbar; route computation and switch & VC allocation control the datapath. Link traversal and switch traversal are marked.]

Off-Chip vs. On-Chip

On-Chip                      Off-Chip
Wide links                   Narrow links
Shallow pipelines            Deeper pipelines
Not pin limited              Pin limited
Low flow control latency     Larger flow control latency
Smaller buffers              Deeper buffers

The Hardware Message Stack

- Routing Layer: Where? Destination decisions, i.e., which output port
- Switching Layer: When? When is data forwarded
- Physical Layer: How? Synchronization of data transfer

- Largely responsible for deadlock and livelock properties
- Largely responsible for latency, bandwidth and energy properties
- Switching is tightly coupled with flow control & buffer management
- Relative timing is key to performance

Messaging Units

- Data/Message
- Packets
- Flits: flow control digits
- Phits: physical flow control digits

Field labels from the format figure: type, head, Dest Info, Seq #, misc, tail.

Data is transmitted based on a hierarchical data structuring mechanism:
- Messages → packets → flits → phits
- While flits and phits are fixed size, packets and data may be variable sized

Link Level Flow Control

Flow Control

- A synchronization protocol for the lossless transmission of bits
- Determines how network resources are allocated
  - Buffers
- Determines how conflicts are resolved
  - How (e.g., priorities) and when resources are assigned

For Synchronized Transfers

[Figure: acknowledge receipt]

Unit of synchronized communication:
- Smallest unit whose transfer is requested by the sender and acknowledged by the receiver
- No restriction on the relative timing of control vs. data transfers
- Is a form of backpressure
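The message → packet → flit → phit hierarchy can be illustrated with a minimal Python sketch. The function name `segment` and the byte sizes are hypothetical choices made for this example only; headers, trailers, and sequence numbers are omitted, and the last flit of a packet is zero-padded because flits are fixed size.

```python
def segment(message: bytes, packet_payload: int, flit_bytes: int, phit_bytes: int):
    """Split a message into packets, each packet into flits, each flit into phits.

    Illustrative only: sizes are assumptions, and no header/trailer is added.
    """
    assert flit_bytes % phit_bytes == 0, "a flit must be a whole number of phits"
    packets = []
    for p in range(0, len(message), packet_payload):
        payload = message[p:p + packet_payload]       # packets may be variable sized
        flits = []
        for f in range(0, len(payload), flit_bytes):
            # Flits are fixed size, so pad the final one with zero bytes.
            flit = payload[f:f + flit_bytes].ljust(flit_bytes, b"\x00")
            # Each flit crosses the physical channel as a sequence of phits.
            phits = [flit[i:i + phit_bytes] for i in range(0, flit_bytes, phit_bytes)]
            flits.append(phits)
        packets.append(flits)
    return packets


packets = segment(b"x" * 40, packet_payload=16, flit_bytes=8, phit_bytes=2)
print(len(packets), [len(p) for p in packets], len(packets[0][0]))
```

With these assumed sizes, a 40-byte message becomes three packets of two, two, and one flits, and every flit is carried as four 2-byte phits.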

For Buffer Management

[Figure: buffer availability information returned to the sender]

Flow control occurs at two levels:
- Level of buffer management (flits/packets)
- Level of physical transfers (phits)
- Relationship between flits and phits is machine & technology specific

What if there are no buffers?
- Bufferless switching/flow control (later)

Physical Channel Flow Control

- Asynchronous Flow Control: What is the limiting factor on link throughput?
- Synchronous Flow Control: How is buffer availability indicated?

Flow Control Mechanisms

- Credit-based flow control
- On/off flow control
- Optimistic/Reliable flow control
- Virtual Channel Flow Control

Credit-Based Flow Control

Basic Network Structure and Functions
- Credit-based flow control: the sender sends packets whenever the credit counter is not zero

[Figure: sender with a credit counter, pipelined transfer to the receiver; the receiver queue is not being serviced.]
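The credit rule above (send only while the credit counter is non-zero) can be sketched as follows. This is a minimal model under stated assumptions: the class name `CreditLink` and its methods are hypothetical, one credit corresponds to one receiver buffer slot, and credits return to the sender with zero latency, which a real link would not achieve.

```python
from collections import deque


class CreditLink:
    """Toy model of one link with credit-based flow control (illustrative only)."""

    def __init__(self, buffer_slots: int):
        self.capacity = buffer_slots
        self.credits = buffer_slots   # sender-side credit counter
        self.rx_queue = deque()       # receiver-side buffer

    def send(self, flit) -> bool:
        """Sender side: transmit only if at least one credit is available."""
        if self.credits == 0:
            return False              # backpressure: sender must wait
        self.credits -= 1
        self.rx_queue.append(flit)
        assert len(self.rx_queue) <= self.capacity  # lossless: never overflows
        return True

    def drain(self, n: int) -> int:
        """Receiver side: service up to n flits and return that many credits."""
        freed = 0
        while self.rx_queue and freed < n:
            self.rx_queue.popleft()
            freed += 1
        self.credits += freed         # assumption: credits arrive instantly
        return freed


link = CreditLink(buffer_slots=10)
accepted = sum(link.send(i) for i in range(12))   # receiver queue not serviced
print(accepted, link.credits)
link.drain(5)                                      # receiver returns 5 credits
print(link.credits, link.send("next"))
```

Because the counter never exceeds the number of free receiver slots, the receiver buffer cannot overflow, which is what makes the scheme lossless.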

Credit-Based Flow Control (cont.)

- Sender resumes injection
- Receiver sends credits after they become available (+5)

[Figure: sender with a credit counter, pipelined transfer to the receiver; the receiver queue is not being serviced.]

Credit-Based Flow Control Timeline*

[Figure: credits travelling between Node 1 and Node 2, alternating with processing time at each node.]

Round-trip credit time, equivalently expressed in number of flow control buffer units: $t_{rt}$

*From W. J. Dally & B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004

Performance of Credit-Based Schemes

- The control bandwidth can be reduced by submitting block credits
- Buffers must be sized to maximize link utilization
  - Large enough to host packets in transit

Number of buffers:

$F \ge \dfrac{t_{rt} \cdot b}{L_f}$

where $b$ is the link bandwidth and $L_f$ is the flit size.

On/Off Flow Control

Xon/Xoff flow control
- A packet is injected if the control bit is in Xon

[Figure: sender with a control bit (Xon/Xoff), pipelined transfer to the receiver.]
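The credit-based buffer sizing relation (number of buffers F at least the round-trip time multiplied by the link bandwidth, divided by the flit size) is easy to check numerically. The sketch below is an assumption-laden illustration: the function names are hypothetical, time is counted in cycles, bandwidth in bits per cycle, and the utilization bound simply restates the same relation for an undersized buffer.

```python
import math


def min_credit_buffers(t_rt: float, b: float, L_f: float) -> int:
    """Smallest F satisfying F >= t_rt * b / L_f.

    t_rt: round-trip credit time (cycles), b: link bandwidth (bits/cycle),
    L_f: flit size (bits). Units are assumptions chosen for this example.
    """
    return math.ceil(t_rt * b / L_f)


def max_link_utilization(F: int, t_rt: float, b: float, L_f: float) -> float:
    """Upper bound on utilization with F buffers: at most F flits per round trip."""
    return min(1.0, F * L_f / (t_rt * b))


# Example: 6-cycle round trip, 128-bit link moving one 128-bit flit per cycle.
print(min_credit_buffers(t_rt=6, b=128, L_f=128))
# With only 3 buffers the sender stalls half of the time.
print(max_link_utilization(F=3, t_rt=6, b=128, L_f=128))
```

The second call shows why undersized buffers cost throughput: with half the required buffers, the sender runs out of credits halfway through each round trip.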

On-Off Flow Control (cont.)

Xon/Xoff flow control
- When in Xoff, the sender cannot inject packets
- When the Xoff threshold is reached, an Xoff notification is sent
- When the Xon threshold is reached, an Xon notification is sent

[Figure: sender with a control bit (Xon/Xoff), pipelined transfer to the receiver; the receiver queue is not being serviced.]

On-Off FC Timeline*

[Figure: Node 1 and Node 2. The receiver hits the high water mark (stop) and sends "off"; after processing, it hits the low water mark (go) and sends "on".]

Performance of On-Off Schemes

- Buffer sizing and position of the Stop and Go watermarks
- To operate at full speed, the buffer size must be at least $2F$, where $F = \dfrac{t_{rt} \cdot b}{L_f}$

*From W. J. Dally & B. Towles, 2004
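The stop and go watermarks described above can be sketched as a small receiver model. The class `OnOffReceiver` and the watermark values are hypothetical, and notifications are assumed to reach the sender instantly; in a real link the space between the stop mark and the buffer capacity has to absorb the flits still in flight when Xoff is sent.

```python
class OnOffReceiver:
    """Toy receiver buffer with Xon/Xoff watermarks (illustrative only)."""

    def __init__(self, capacity: int, stop_mark: int, go_mark: int):
        assert 0 <= go_mark < stop_mark <= capacity
        self.capacity = capacity
        self.stop_mark = stop_mark    # high water mark: send Xoff
        self.go_mark = go_mark        # low water mark: send Xon
        self.occupancy = 0
        self.xon = True               # control bit as seen by the sender

    def accept(self):
        """A flit arrives; returns "Xoff" if the high water mark is reached."""
        assert self.occupancy < self.capacity, "buffer overflow"
        self.occupancy += 1
        if self.xon and self.occupancy >= self.stop_mark:
            self.xon = False
            return "Xoff"
        return None

    def service(self):
        """A flit is consumed; returns "Xon" if the low water mark is reached."""
        if self.occupancy == 0:
            return None
        self.occupancy -= 1
        if not self.xon and self.occupancy <= self.go_mark:
            self.xon = True
            return "Xon"
        return None


rx = OnOffReceiver(capacity=8, stop_mark=6, go_mark=2)
arrivals = [rx.accept() for _ in range(6)]
services = [rx.service() for _ in range(4)]
print(arrivals)
print(services)
```

Only two control messages are exchanged for ten buffer events here, which reflects the low control traffic of on/off schemes.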

Comparison of Flow Control Schemes

Comparison of Xon/Xoff vs. credit-based flow control

[Figure: timelines for the two schemes, showing the flow control latency observed by the receiver buffer.
Stop & Go: stop signal returned by receiver; sender stops transmission; last packet reaches receiver buffer; packets in buffer get processed; go signal returned to sender; sender resumes transmission; first packet reaches buffer.
Credit based: # credits returned to sender; sender uses last credit; last packet reaches receiver buffer; packets get processed and credits returned; sender transmits packets; first packet reaches buffer.]

Comparing Credit-Based & On/Off Flow Control

- Both schemes can fully utilize buffers
- Restart latency is lower for credit-based schemes and therefore
  - Credit-based flow control has higher average buffer occupancy at high loads
  - Credit-based flow control leads to higher throughput at high loads
  - Smaller inter-packet gap

Comparing Credit-Based & On/Off Flow Control (cont.)

- Control traffic is higher for credit schemes
  - Block credits can be used to tune link behavior
- Buffer sizes are independent of round trip latency for credit schemes (at the expense of performance)
  - Not true for On/Off without dropping packets
- Credit schemes have higher information content → useful for QoS schemes
- On-off schemes are better suited for many-to-one relationships

Optimistic Flow Control

[Figure: sending side with a reject queue and network interface; network carrying ACK/NACK; network interface on the receiving side.]

- Optimistically send messages
  - Allocate for returned messages
  - Deallocate on reception of Ack
  - Retransmit on reception of Nack
- Buffer sizes are proportional to the number of packets rather than the number of senders

Reliable Flow Control

[Figure: sending network interface, network carrying ACK/NACK, receiving network interface.]

- Transmit packets when available
  - De-allocate when reception is acknowledged
  - Re-transmit if packet is dropped (and negative ACK is received)
- Derived from traditional telecom networks
  - Employed over long and error prone links
  - Extended to operate over the network → end-to-end

Reliable Flow Control (cont.)

[Figure: sender and receiver windows over packets numbered 1 to 8, marking the last tx'd packet, the last ack'd packet, the last rcv'd packet, and the retransmission interval.]

- Packets are tagged with sequence numbers
- Need to recycle sequence numbers
- Receiver acknowledges (Ack) received packets
  - Detect out of sequence reception
  - Time-outs to detect lost packets

Reliable Flow Control (cont.)

- Data structures to hold transmitted packets and the order in which they were transmitted
  - Utilize the send buffers
    - Go-back-N strategy
  - Maintain a separate data structure
    - Maintain original order
  - Minimize redundant transmissions
  - Block acknowledgements to minimize flow control bandwidth used

Buffering

- Used for long and error prone links
- Also known as Ack/Nack flow control

Number of buffers:

$F \ge \dfrac{t_{rt} \cdot b}{L_f}$

where $b$ is the link bandwidth and $L_f$ is the flit size.
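The sender-side bookkeeping for the Go-back-N strategy with block acknowledgements can be sketched as below. This is a minimal illustration, not the scheme of any particular network: the class `GoBackNSender` is hypothetical, sequence numbers are unbounded integers (so recycling is not modelled), and timers are left to the caller.

```python
class GoBackNSender:
    """Toy Go-back-N sender: packets stay in the send buffer until acknowledged."""

    def __init__(self, window: int):
        self.window = window
        self.base = 0        # oldest unacknowledged sequence number
        self.next_seq = 0    # sequence number for the next new packet
        self.buffer = {}     # seq -> packet, held until acknowledged

    def send(self, packet):
        """Tag and buffer a packet; returns its sequence number or None if the window is full."""
        if self.next_seq - self.base >= self.window:
            return None
        seq = self.next_seq
        self.buffer[seq] = packet
        self.next_seq += 1
        return seq

    def on_ack(self, seq: int) -> None:
        """Block (cumulative) ACK: de-allocate everything up to and including seq."""
        while self.base <= seq and self.base < self.next_seq:
            del self.buffer[self.base]
            self.base += 1

    def on_nack_or_timeout(self):
        """Go back: retransmit all unacknowledged packets in their original order."""
        return [(s, self.buffer[s]) for s in range(self.base, self.next_seq)]


sender = GoBackNSender(window=4)
seqs = [sender.send(f"p{i}") for i in range(5)]   # fifth send is refused
sender.on_ack(1)                                   # packets 0 and 1 acknowledged
print(seqs, sender.on_nack_or_timeout(), sender.send("p4"))
```

Reusing the send buffer this way keeps the original transmission order for free, at the cost of possibly retransmitting packets that did arrive.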

Optimistic/Reliable Flow Control

- Optimism
  - Inefficient/increased buffer usage
    - Messages held at source
    - Re-ordering may be required due to out-of-order reception
- Reliable
  - Must deal with out-of-order reception
  - Need sophisticated buffer management schemes for multi-source control
- Generally give way to credit schemes or stop-and-go schemes
  - Small buffers → credit-based
  - Large buffers → stop-and-go

Virtual Channel Flow Control

[Figure: messages A and B sharing network resources.]

- Channels and buffers are dynamically allocated network resources
- Physical channels are idle when messages block

Virtual Channels

[Figure: output buffers, DEMUX/MUX and link control on each side of a unidirectional physical channel, feeding input buffers; per-VC state holds status, credits, and VC state for the output packet.]

- Each virtual channel is a unidirectional channel
  - Independently managed buffers multiplexed over the physical channel
- Each channel is independently flow controlled
- Improves performance through reduction of blocking delay
- Important in realizing deadlock freedom (later)

Virtual Channel Flow Control (cont.)

[Figure: flit format with type and VC fields.]

- As the number of virtual channels increases, the increased channel multiplexing has multiple effects (more later)
  - Overall performance
  - Router complexity and critical path
- Flits/phits must now record VC information
  - Or send VC information out of band
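The benefit of independently flow-controlled virtual channels can be illustrated with a small multiplexer sketch. Everything here is an assumption made for the example: the names `VirtualChannel` and `multiplex`, the round-robin arbitration, and the use of per-VC credits as the flow control mechanism. Each transmitted flit is tagged with its VC index, standing in for the VC information that flits must carry.

```python
from collections import deque


class VirtualChannel:
    """One virtual channel: its own flit queue and its own credit counter."""

    def __init__(self, credits: int, flits=()):
        self.queue = deque(flits)
        self.credits = credits


def multiplex(vcs, cycles: int):
    """Share one physical channel: each cycle, send one flit from the next
    VC (round-robin) that has both a flit waiting and a credit available."""
    sent = []
    start = 0
    n = len(vcs)
    for _ in range(cycles):
        for k in range(n):
            i = (start + k) % n
            vc = vcs[i]
            if vc.queue and vc.credits > 0:
                vc.credits -= 1
                sent.append((i, vc.queue.popleft()))   # flit tagged with VC id
                start = (i + 1) % n
                break
        # If no VC is eligible, the physical channel idles this cycle.
    return sent


# Message A is blocked downstream (no credits); message B can still advance.
vc_a = VirtualChannel(credits=0, flits=["A0", "A1"])
vc_b = VirtualChannel(credits=2, flits=["B0", "B1"])
print(multiplex([vc_a, vc_b], cycles=3))
```

With a single shared queue, the blocked message A would hold the physical channel idle; with two virtual channels, B's flits use the link while A waits.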

Intel Single Chip Cloud Computer (SCC)

[Figure: tile mesh with four memory controllers and voltage domains V1, V2.]

- 24 dual-core tiles
- 8 voltage and 28 frequency islands
- X-Y routed mesh: 144-bit physical channels

Intel SCC Message Format

Flit types: Null, Credit, Body/tail, control

J. Howard et al., A 48-Core IA-32 Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for Performance and Power Scaling. IEEE Journal of Solid-State Circuits, vol. 46, no. 1, January 2011.

Flow Control: Global View

- Flow control parameters are tuned based on link length, link width and processing overhead at the end-points
- Effective FC and buffer management is necessary for high link utilization → network throughput
  - In-band vs. out-of-band flow control

Flow Control: Global View (cont.)

- Latency: overlapping FC, buffer management and switching → impacts end-to-end latency
- In-band vs. out-of-band flow control
  - Use link bandwidth vs. additional side-band signals
- Links may be non-uniform, e.g., lengths/widths on chips
  - Buffer sizing for long links

Commercial Examples

Interconnect           Flow control
AMD HyperTransport     credit based
Intel QuickPath        credit based
Infiniband             credit based
Ethernet               On/Off
Myrinet                Stop-and-Go
PCI Express            credit based
IBM Blue Gene          token flow control
Cray T3E               credit based

Some Research Questions

- Reliable Flow Control
  - PVT effects for high speed links
  - Encoding schemes, e.g., for power efficiency
- Adaptive flow control
  - Buffer and congestion management
  - Quality of Service (QoS)
- End-to-End
  - Flow control for multicast
  - Multisource flow control (networks)
- Low power designs
  - Error rate vs. voltage scaling
  - Link and buffer widths and depths
  - On-off schemes

Summary

- Flow control, buffer management and switching are closely related and generally co-designed
  - Closest to the physical layer and directly impact utilization and latency
  - Object of significant tuning
- How are these schemes impacted by and integrated with switch designs?