Interconnection Network. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University


Interconnection Network Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu

Topics
- Taxonomy
- Metrics
- Topologies
- Characteristics
- Cost
- Performance

Interconnection Networks
- Carry data between processors and to memory
- Components
  - Switches
  - Links (wires, fiber)
- Classifications
  - Static networks (a.k.a. direct networks): point-to-point communication links among processing nodes
  - Dynamic networks (a.k.a. indirect networks): built from switches and communication links

Static and Dynamic Networks
[Figure: a static (direct) network and a dynamic (indirect) network]

Dynamic Network Switch
- Maps a fixed number of inputs to outputs
- Degree of the switch: the number of ports
- Switch cost
  - Grows as the square of the switch degree
  - Packaging cost grows linearly with the number of pins

Network Interfaces
- Link processors (or nodes) to the interconnect
- Functions
  - Packetizing communication data
  - Computing routing information
  - Buffering incoming/outgoing data
- Network interface connection
  - I/O bus: e.g., Peripheral Component Interconnect Express (PCIe)
  - Memory bus: e.g., AMD HyperTransport, Intel QuickPath
- Network performance
  - Depends on the relative speeds of the I/O and memory buses

Example: Intel QuickPath Interconnect

Network Topologies
- A variety of network topologies exist
- Topologies trade off performance for cost
- Commercial machines often implement hybrids of multiple topologies, due to packaging, cost, and available components

Metrics for Interconnection Networks
- Degree: # of links per node
- Diameter: longest shortest-path distance between any two nodes in the network
  - Determines the worst-case communication latency
- Bisection width: minimum # of wire cuts needed to divide the network into two equal parts
- Cost: # of links and switches
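The degree and diameter metrics can be checked on small examples with a breadth-first search. The following is only a minimal sketch: the adjacency-dictionary representation and the helper names (`bfs_distances`, `degree`, `diameter`, `ring`, `linear_array`) are assumptions made for illustration, not part of the lecture.

```python
from collections import deque


def bfs_distances(adj, src):
    """Hop count from src to every reachable node of an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree(adj):
    """Largest number of links attached to any single node."""
    return max(len(neighbors) for neighbors in adj.values())


def diameter(adj):
    """Longest shortest-path distance over all pairs of nodes."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


def ring(p):
    """Sample topology: p nodes in a cycle (assumes p >= 3)."""
    return {i: [(i - 1) % p, (i + 1) % p] for i in range(p)}


def linear_array(p):
    """Sample topology: p nodes in a row, no wraparound link."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < p] for i in range(p)}


print(degree(ring(8)), diameter(ring(8)))        # 2 4
print(diameter(linear_array(8)))                 # 7
```

Bisection width is left out on purpose: computing it exactly for an arbitrary graph is a much harder problem than a BFS sweep.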

Network Topologies: Buses
- All processors access a common bus for exchanging data
- Used in the simplest and earliest parallel machines
  - Ex. Sun Enterprise servers, Intel Pentium
- Advantages
  - Distance between any two nodes is O(1)
  - Provides a convenient broadcast medium
- Disadvantage
  - Bus bandwidth is a performance bottleneck
[Figure: processors (P) attached to a shared bus]

Network Topologies: Buses
[Figure: bus-based interconnects without local caches and with local caches]
[Figure: interconnects in the Intel Pentium Pro Quad. P-Pro modules (CPU, interrupt controller, 256-KB L2 cache, bus interface) share the P-Pro bus (64-bit data, 36-bit address, 66 MHz) with PCI bridges (PCI buses, PCI I/O cards) and a memory controller with MIU and 1-, 2-, or 4-way interleaved DRAM]

Network Topologies: Crossbars
- A crossbar network uses a p × m grid of switches to connect p inputs to m outputs in a non-blocking manner

Network Topologies: Crossbars
- Cost: O(p^2) for p processors (and memory banks)
- Difficult to scale for large values of p
- Ex. Sun Ultra HPC 10000 and the Fujitsu VPP500

Multistage Networks
- Buses
  - Excellent cost scalability
  - Poor performance scalability
- Crossbars
  - Excellent performance scalability
  - Poor cost scalability
- Multistage interconnects
  - Compromise between these extremes

Multistage Networks
[Figure: the schematic of a typical multistage interconnection network]

Multistage Omega Network
- Organization
  - log p stages
  - p inputs and outputs
- At each stage, input i is connected to output j if:
  - j = 2i, for 0 ≤ i ≤ p/2 − 1
  - j = 2i + 1 − p, for p/2 ≤ i ≤ p − 1

Multistage Omega Network
- Each stage of the Omega network implements a perfect shuffle
[Figure: a perfect shuffle interconnection for eight inputs and outputs]
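A perfect shuffle is equivalent to a one-bit left rotation of the binary input index, which in turn matches the usual closed form j = 2i for i < p/2 and j = 2i + 1 − p otherwise. The sketch below assumes p is a power of two; the function name `perfect_shuffle` is hypothetical.

```python
def perfect_shuffle(i, p):
    """Output index wired to input i: left-rotate the log2(p)-bit label of i.

    Assumes p is a power of two and 0 <= i < p.
    """
    n = p.bit_length() - 1                      # number of address bits
    return ((i << 1) | (i >> (n - 1))) & (p - 1)


# Wiring for eight inputs and outputs.
print([perfect_shuffle(i, 8) for i in range(8)])   # [0, 2, 4, 6, 1, 3, 5, 7]
```

Applying the shuffle log p times returns every index to its starting position, since each application rotates the label by one bit.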

Multistage Omega Network
- The perfect shuffle patterns are connected using 2 × 2 switches
- The switches operate in two modes
  - Pass-through
  - Cross-over
[Figure: a 2 × 2 switch in pass-through mode and in cross-over mode]

Multistage Omega Network
[Figure: a complete omega network connecting eight inputs and eight outputs]
- Cost: (p/2) log p switches → O(p log p)
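To get a feel for the gap between the two cost figures quoted on the slides (p^2 switching elements for a p × p crossbar, (p/2) log p two-by-two switches for an omega network), the counts can simply be tabulated. This is a rough sketch that assumes p is a power of two and treats every switching element as equally expensive; the function names are hypothetical.

```python
def crossbar_switches(p):
    """Switching elements in a p x p crossbar grid."""
    return p * p


def omega_switches(p):
    """2 x 2 switches in an omega network: p/2 per stage, log2(p) stages."""
    stages = p.bit_length() - 1     # log2(p) for a power of two
    return (p // 2) * stages


for p in (8, 64, 1024):
    print(p, crossbar_switches(p), omega_switches(p))
# 8 64 12
# 64 4096 192
# 1024 1048576 5120
```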

Multistage Omega Network Routing
- s is the source processor in binary representation
- d is the destination processor in binary representation
- At each stage
  - If the most significant bits of s and d are the same: pass-through
  - Otherwise: cross-over
  - Strip the most significant bits of s and d
- Repeat for each of the log p switching stages

Multistage Omega Network Routing
- Example: 001 → 100
  1. Stage 1: 0 != 1 → cross-over
  2. Stage 2: 0 == 0 → pass-through
  3. Stage 3: 1 != 0 → cross-over
[Figure: the route from 001 to 100 through the three switching stages]
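The bit-comparison rule can be written as a short loop that walks the address bits from most to least significant. This is a minimal sketch assuming p is a power of two; `omega_route` is a hypothetical name, and the function only reports switch settings without modelling the wiring.

```python
def omega_route(s, d, p):
    """Switch setting at each stage for a message from source s to destination d."""
    n = p.bit_length() - 1                      # log2(p) switching stages
    settings = []
    for stage in range(n):
        bit = n - 1 - stage                     # most significant remaining bit
        same = ((s >> bit) & 1) == ((d >> bit) & 1)
        settings.append("pass-through" if same else "cross-over")
    return settings


print(omega_route(0b001, 0b100, 8))
# ['cross-over', 'pass-through', 'cross-over']
```

Indexing the bit directly stands in for "stripping" the most significant bit after each stage.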

Blocking in Omega Network
- One of the messages (010 to 111 or 110 to 100) is blocked at link AB
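Blocking can be reproduced by tracking which output line each message occupies after every stage: two messages conflict when they need the same line at the same stage. The model below is an assumption-laden sketch (perfect shuffle before each stage, upper switch output numbered with low bit 0, lower with low bit 1); the names `omega_path` and `first_conflict` are hypothetical, and how the reported line maps to the label AB in the figure is not checked here.

```python
def omega_path(s, d, p):
    """Line number occupied by the message s -> d after each switching stage."""
    n = p.bit_length() - 1
    pos, path = s, []
    for stage in range(n):
        pos = ((pos << 1) | (pos >> (n - 1))) & (p - 1)   # perfect shuffle wiring
        dest_bit = (d >> (n - 1 - stage)) & 1             # next destination bit, MSB first
        pos = (pos & ~1) | dest_bit                       # switch picks upper (0) or lower (1) output
        path.append(pos)
    return path


def first_conflict(msg_a, msg_b, p):
    """Return (stage, line) of the first shared link, or None if the paths are disjoint."""
    pairs = zip(omega_path(*msg_a, p), omega_path(*msg_b, p))
    for stage, (a, b) in enumerate(pairs, start=1):
        if a == b:
            return stage, a
    return None


print(omega_path(0b010, 0b111, 8))                         # [5, 3, 7]
print(omega_path(0b110, 0b100, 8))                         # [5, 2, 4]
print(first_conflict((0b010, 0b111), (0b110, 0b100), 8))   # (1, 5)
```

Under this model both messages leave the first stage on line 101, so only one of them can proceed at a time.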

Completely Connected Network
- Each processor is connected to every other processor
- Cost: # of links is O(p^2)
- Performance scales very well
- Hardware complexity is not realizable for large values of p
- Static counterpart of crossbars

Star Connected Network
- Every node is connected only to a common node at the center
- Distance between any pair of nodes is O(1)
- But the central node becomes a bottleneck
- Static counterpart of buses

Linear Array & Ring
- Linear array
  - Each node has two neighbors, one to its left and one to its right (the two end nodes have only one)
- Ring (or 1-D torus)
  - A linear array whose end nodes are connected by a wraparound link

Meshes and k-dimensional Meshes
- Mesh
  - Generalization of the linear array to 2D
  - Each interior node has 4 neighbors (north, south, east, and west)
- k-dimensional mesh
  - Each interior node has 2k neighbors
[Figure: 2D mesh, 2D torus, 3D mesh]
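The neighbor counts for meshes and tori follow from stepping one position in each direction along each axis. The helper below is an illustrative sketch: the name `mesh_neighbors`, the coordinate-tuple representation, and the `wraparound` flag are assumptions, and the torus case assumes every dimension has at least three nodes so that the two wraparound neighbors are distinct.

```python
def mesh_neighbors(coord, dims, wraparound=False):
    """Neighbors of the node at `coord` in a mesh with side lengths `dims`.

    With wraparound=True the mesh becomes a torus.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            c = coord[axis] + step
            if wraparound:
                c %= size                       # wrap around the edge
            elif not 0 <= c < size:
                continue                        # no link past the boundary
            neighbor = list(coord)
            neighbor[axis] = c
            neighbors.append(tuple(neighbor))
    return neighbors


print(len(mesh_neighbors((1, 1), (4, 4))))                    # 4: interior node, 2D mesh
print(len(mesh_neighbors((0, 0), (4, 4))))                    # 2: corner node, 2D mesh
print(len(mesh_neighbors((0, 0), (4, 4), wraparound=True)))   # 4: any node, 2D torus
print(len(mesh_neighbors((1, 1, 1), (3, 3, 3))))              # 6: interior node, 3D mesh
```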

Hypercubes
[Figure: hypercubes of dimension 0 through 4 (0D, 1D, 2D, 3D, 4D)]

Hypercubes
- Distance between any two nodes is at most log p
- Each node has log p neighbors
- Distance between two nodes: # of bit positions at which the two node labels differ
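Because hypercube nodes are labelled with binary numbers, both properties reduce to bit operations: the distance is the Hamming distance of the labels, and the neighbors are obtained by flipping one bit at a time. A minimal sketch, with hypothetical function names:

```python
def hypercube_distance(a, b):
    """Number of bit positions in which labels a and b differ."""
    return bin(a ^ b).count("1")


def hypercube_neighbors(node, d):
    """Labels adjacent to `node` in a d-dimensional hypercube (p = 2**d nodes)."""
    return [node ^ (1 << k) for k in range(d)]


print(hypercube_distance(0b000, 0b111))      # 3, the diameter of a 3D hypercube
print(hypercube_neighbors(0b101, 3))         # [4, 7, 1]
```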

Tree-Based Networks
[Figure: static tree network and dynamic tree network]

Tree-Based Networks
- Distance between two nodes is at most 2 log p
- Easy to lay out as planar graphs
  - E.g., H-trees
- Root can become a bottleneck
  - Links closer to the root carry more traffic than those at lower levels
- Solution: fat tree
  - Fattens the links as we go up the tree
[Figure: H-tree]
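The 2 log p bound can be illustrated for one specific case: a complete binary tree in which the p processors sit at the leaves (numbered left to right from 0) and the internal nodes are switches. Under that assumption, which is an illustrative choice rather than something the slides specify, a message climbs to the lowest common ancestor and descends again. The function name `leaf_distance` is hypothetical.

```python
def leaf_distance(a, b):
    """Links traversed between leaves a and b of a complete binary tree.

    The highest differing bit of the two leaf numbers gives the height of
    their lowest common ancestor; the route goes up that far and back down.
    """
    height_of_common_ancestor = (a ^ b).bit_length()
    return 2 * height_of_common_ancestor


print(leaf_distance(0, 1))   # 2: siblings share a parent
print(leaf_distance(0, 7))   # 6: through the root when p = 8, i.e. 2 log p
```

Every pair of leaves in opposite halves of the tree routes through the root, which is why traffic concentrates there.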

Fat Tree
[Figure: a fat tree network of 16 processing nodes]

Evaluating Interconnection Networks
- Diameter
  - Longest distance between two nodes
  - Measures the longest latency of possible communications
- Bisection width
  - Minimum # of wire cuts to divide the network into two equal parts
  - Measures the # of concurrent communications
  [Figure: two concurrent communications vs. four concurrent communications]
- Cost
  - # of links or switches
  - Ability to lay out the network
  - Length of wires

Static Interconnection Networks
Network | Diameter | Bisection Width | Cost (# of links)
Completely-connected
Star
Complete binary tree
Linear array
2-D mesh, no wraparound
2-D wraparound mesh
Hypercube
Wraparound k-ary d-cube
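For several of these static networks the three metrics have simple closed forms. The sketch below uses the expressions commonly tabulated in parallel computing textbooks, including the one listed in the references; they are not recovered from the slide itself. The function name `static_metrics` is hypothetical, and p is assumed to be a power of 4 so that both sqrt(p) and log2(p) are integers.

```python
import math


def static_metrics(p):
    """(diameter, bisection width, # of links) for a few static networks on p nodes."""
    r = math.isqrt(p)            # side length of the 2-D mesh
    d = p.bit_length() - 1       # hypercube dimension, log2(p)
    return {
        "completely-connected": (1, p * p // 4, p * (p - 1) // 2),
        "star": (2, 1, p - 1),
        "linear array": (p - 1, 1, p - 1),
        "2-D mesh, no wraparound": (2 * (r - 1), r, 2 * (p - r)),
        "2-D wraparound mesh": (2 * (r // 2), 2 * r, 2 * p),
        "hypercube": (d, p // 2, p * d // 2),
    }


for name, (diam, bisect, links) in static_metrics(16).items():
    print(f"{name:26s} diameter={diam:2d} bisection={bisect:2d} links={links}")
```

For p = 16 the hypercube and the wraparound mesh come out identical (diameter 4, bisection width 8, 32 links), which is expected because a 4 × 4 torus and a 4-dimensional hypercube are the same graph.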

Dynamic Interconnection Networks
Network | Diameter | Bisection Width | Cost (# of switches)
Crossbar
Omega Network
Dynamic Tree

Summary
- Interconnection network
  - Performance (latency, bandwidth), cost (# of links, # of switches)
  - Once a central concern, later less so; likely to matter again for multi-core processors
- Topologies
  - Low-dimension networks
    - Bus, ring, mesh, torus: embed into 2D/3D
    - Direct networks (nodes are connected directly)
  - Logarithmic networks (multistage networks)
    - More switches between nodes (nodes are connected indirectly)
  - High-dimension networks
    - Hypercube (binary n-cube): theoretically good characteristics
    - But the node degree grows with the network size (log p links per node), which makes it impractical in the real world

References
- Sections 2.4.2-2.4.4 in Introduction to Parallel Computing by Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, Addison Wesley, 2003
- COMP422: Parallel Computing by Prof. John Mellor-Crummey at Rice Univ.