Lecture 2: Topology - I

Similar documents
Lecture 3: Topology - II

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Lecture 7: Flow Control - I

Lecture 3: Flow-Control

4. Networks. in parallel computers. Advances in Computer Architecture

Physical Organization of Parallel Platforms. Alexandre David

Interconnection Networks

Network-on-chip (NOC) Topologies

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control

Interconnection Networks

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Topologies. Maurizio Palesi. Maurizio Palesi 1

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts

Interconnection Network

Topology basics. Constraints and measures. Butterfly networks.

Lecture 13: System Interface

CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2

Topologies. Maurizio Palesi. Maurizio Palesi 1

Interconnection networks

Interconnection Networks

ECE/CS 757: Advanced Computer Architecture II Interconnects

Homework Assignment #1: Topology Kelly Shaw

Interconnection Networks. Issues for Networks

Interconnection Network

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Lecture: Interconnection Networks

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Lecture 1: Introduction

Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996

Interconnection Network. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Packet Switch Architecture

Packet Switch Architecture

Multiprocessor Interconnection Networks

Chapter 9 Multiprocessors

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Parallel Computing Interconnection Networks

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Multiprocessor Interconnection Networks- Part Three

EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University

Outline. Distributed Shared Memory. Shared Memory. ECE574 Cluster Computing. Dichotomy of Parallel Computing Platforms (Continued)

CS575 Parallel Processing

Network on Chip Architecture: An Overview

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

TDT Appendix E Interconnection Networks

Parallel Computing Platforms

Lecture 2 Parallel Programming Platforms

Lecture 1: Introduction

EE/CSCI 451: Parallel and Distributed Computation

Chapter 3 : Topology basics

EE382 Processor Design. Illinois

Interconnection Networks

INTERCONNECTION NETWORKS LECTURE 4

Chapter 4 : Butterfly Networks

Communication Performance in Network-on-Chips

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng. Prof. Natalie Enright Jerger

GIAN Course on Distributed Network Algorithms. Network Topologies and Local Routing

Parallel Architectures

Parallel Architecture. Sathish Vadhiyar

Lecture 12: SMART NoC

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Network Dilation: A Strategy for Building Families of Parallel Processing Architectures Behrooz Parhami

ES1 An Introduction to On-chip Networks

ECE 551 System on Chip Design

Basic Switch Organization

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

The Impact of Optics on HPC System Interconnects

A Survey of Techniques for Power Aware On-Chip Networks.

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology

Advanced Parallel Architecture. Annalisa Massini /2017

CSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing

Overview. Processor organizations Types of parallel machines. Real machines

EE/CSCI 451: Parallel and Distributed Computation

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects

Network-on-Chip Architecture

Network Properties, Scalability and Requirements For Parallel Processing. Communication assist (CA)

CSC630/CSC730: Parallel Computing

CS 6143 COMPUTER ARCHITECTURE II SPRING 2014

Parallel Architectures

GIAN Course on Distributed Network Algorithms. Network Topologies and Interconnects

Chapter 7 Slicing and Dicing

Network Properties, Scalability and Requirements For Parallel Processing. Communication assist (CA)

CS4961 Parallel Programming. Lecture 4: Memory Systems and Interconnects 9/1/11. Administrative. Mary Hall September 1, Homework 2, cont.

The final publication is available at

Prediction Router: Yet another low-latency on-chip router architecture

FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs

18-740/640 Computer Architecture Lecture 16: Interconnection Networks. Prof. Onur Mutlu Carnegie Mellon University Fall 2015, 11/4/2015

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

Interconnect Technology and Computational Speed

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems

CS 770G - Parallel Algorithms in Scientific Computing Parallel Architectures. May 7, 2001 Lecture 2

Mapping and Physical Planning of Networks-on-Chip Architectures with Quality-of-Service Guarantees

A Case for Random Shortcut Topologies for HPC Interconnects

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

Transcription:

ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and Computer Engineering Georgia Institute of Technology tushar@ece.gatech.edu Acknowledgment: Slides adapted from Univ of Toronto ECE 1749 H (N Jerger) and MIT 6.883 (L-S Peh)

Recap 2 Processor Cache Processor Cache Processor Cache Processor Cache Memory Controller On-Chip Network / Network-on-Chip Chip Processor+ Cache Processor+ Cache Processor+ Cache Processor+ Cache Mem I/O Mem I/O Mem I/O Mem I/O System/Storage Area Network

Why NoCs are important 3

Network Architecture Topology How to connect the nodes ~Road Network Routing Which path should a message take ~Series of road segments from source to destination Flow Control When does the message have to stop/proceed ~Traffic signals at end of each road segment Router Microarchitecture How to build the routers ~Design of traffic intersection (number of lanes, algorithm for turning red/green) 4

Network Performance 5 Saturation Throughput: the injection rate at which latency ~3x(zero-load latency) Latency Zero load latency (topology+routing+f low control) Min latency given by routing algorithm Throughput given by flow control Throughput given by routing Throughput given by topology Min latency given by topology Offered Traffic (bits/sec)

Topology: How to connect the nodes with links 6 ~Road Network

7 Topology Overview Often the first step in network design Significant impact on network cost-performance Determines implementation complexity, i.e., cost number of routers and links router degree (i.e., ports) Ease of layout Determines application performance number of hops à latency and energy consumption maximum throughput

8 How to select a topology? A B Best topology? A B F D C D E C F E Application s Task Communication Graph Vertices - tasks Edges - communication Topology is fixed at design-time. Benefits to being regular and flexible Network Topology Graph Problems? Cannot change algorithm Cannot change mapping Vertices - cores Edges - links Cannot adapt to data-dependent load imbalance in application Layout/packaging issue with long wires and high-node degree

Let us define some design-time metrics Degree number of ports at a node Proxy for area/energy cost 9 Bisection Bandwidth - bandwidth crossing a minimal cut that divides the network in half (Min # channels crossing two halves) * (BW of each channel) Proxy for peak bandwidth Can be misleading as it does not account for routing and flow control efficiency At this stage, we assume ideal routing (perfect load balancing) and ideal flow control (no idle cycles on any channel) Diameter maximum routing distance (number of links in shortest route) Proxy for latency

Some run-time metrics Hop count (or routing distance) Number of hops between a communicating pair Depends on application and mapping Average hop count or Average distance: average hops across all valid routes Channel load Number of flows passing through a particular link Depends on application and mapping Maximum channel load determines throughput Path diversity Number of shortest paths between a communicating pair Can be exploited by routing algorithm Provides fault tolerance 10

Regular topologies 11 A B Can you suggest a regular topology (each router with same degree) with smallest possible diameter? D F E C Trick question :p One node. Degree = 0, Diameter = 0. Application s Task Communication Graph Vertices - tasks Edges - communication

Regular topologies 12 A B Can you suggest a regular topology (each router with same degree) with diameter = 1? D F E C Application s Task Communication Graph Vertices - tasks Edges - communication Bisection BW =? 9 Challenge? Not scalable!! Cannot layout more than 4-6 cores in this manner for area and power reasons Bisection Cut Fully Connected Degree =? 5

13 Bus Diameter =? 1 Degree =? 1 Bisection BW =? 1 Pros Cost-effective for small number of nodes Easy to implement snoopy coherence Most multicores with 4-6 cores use Buses Cons Bandwidth! à Not scalable

Popular Bus Protocols ARM AMBA Bus AHB AXI ACE CHI IBM Core Connect ST Microelectronics STBus 14 How to increase bus bandwidth? Hierarchical Buses Split-buses

Route Packets, Not Wires! 15 Dally and Towles, DAC 2001 The birth of on-chip switched networks

Topology Classification Direct Each router (switch) is associated with a terminal node All routers are sources and destinations of traffic Example: Ring, Mesh, Torus Most on-chip networks use direct topologies 16 Indirect Routers (switches) are distinct from terminal nodes Terminal nodes can source / sink traffic Intermediate nodes switch traffic Examples: Crossbar, Butterfly, Clos, Omega, Benes, Next lecture

17 Ring and Torus Formally: k-ary n-cube k n network nodes n-dimensional grid with k nodes in each dimension 8-ary 1-cube 4-ary 2-cube

18 Ring A B Pros Cheap: O(N) cost Used in most multicores today Diameter? Avg Distance? Bisection BW? Degree? N/2 N/4 2 2 Cons High latency Difficult to scale bisection bandwidth remains constant No path diversity 1 shortest path from A to B

19 Torus A Diameter? Bisection BW? Degree? B N 2 N 4 Pros O(N) cost Exploit locality for nearneighbor traffic High path diversity 6 shortest paths from A to B Edge symmetric good for load balancing Same router degree Cons Unequal link lengths Harder to layout

20 Mesh Diameter? A Bisection BW? Degree? B 2( N-1) N 4 Pros O(N) cost Easy to layout on-chip: regular and equal-length links Path diversity 3 shortest paths from A to B Cons Not symmetric on edges Performance sensitive to placement on edge vs. middle Different degrees for edge vs. middle routers Blocking, i.e., certain paths can block others (unlike crossbar)

Folded Torus 21 Easier to layout Is there any con compared to the mesh? All channels have double the length

Multi-dimensional topologies Used in Supercomputers, Datacenters, and other off-chip System Area Networks 22 Example: 2,3,4-ary 3 Mesh

Run-Time Metrics Hop Count Latency 23 Maximum Channel Load Throughput We will consider Uniform Random Traffic

24 Hop Count 9-ary 1 cube 3-ary 2 cube 3-ary 2 mesh Max = Avg = 4 2 4 2.22 1.33 1.77 k-ary n cube k-ary n mesh

Network Latency 25 T = H.t r + T w + T s + T c H = number of hops t r = router delay T w = wire delay T s = serialization delay T c = contention delay T = H.t r + D / v + L / b + T c D = wire distance v = propagation velocity L = packet length b = channel bandwidth

26 How to reduce hop count? Low-diameter topology Challenge? high-radix of each switch Some dedicated long-range links High-radix for few switches How to decide where to add long links?

27 ST Spidergon Proprietary NoC from ST Microelectronics Pseudo-regular topology All routers have 2 or 3 ports Depending on application BW needs, links can be added removed e.g, (a) vs. (b), and (c) vs. (d) Easy to layout

28 Small World Networks Milgram s Experiment and Six degrees of separation Common across neurons, WWW, electrical power grid, Add few long-distance links to a mesh randomly reduces average distance ~ logn `It s a Small World After All : NoC Performance Optimization Via Long-Range Link Insertion, VLSI 2006

Run-Time Metrics Hop Count Latency 29 Maximum Channel Load Throughput We will consider Uniform Random Traffic