CSC630/CSC730: Parallel Computing

Parallel Computing Platforms
Chapter 2 (Sections 2.4.1-2.4.4)
Dr. Joe Zhang
PDC-4: Topology

Content
- Parallel computing platforms
  - Logical organization (a programmer's view)
    - Control structure
    - Communication model
  - Physical organization (actual hardware)
- Interconnection networks
- Network topologies
- Characteristics

Interconnection Networks
- There are two main types of interconnection networks: static networks and dynamic networks.

Static Networks
- Also called direct networks.
- Each vertex corresponds to a node.
- Nodes are joined by point-to-point communication links.
- There are no switches at the vertices.
- If two nodes have no direct connection, intermediate nodes must forward communication between them.
- Common static topologies include the linear array, ring, 2D mesh, 2D torus, 3D mesh, and hypercube.
- Examples: the Intel Paragon (a 2D mesh) and the Cray T3E (a 3D torus). Both scale to thousands of nodes.

Dynamic Networks
- Also called indirect networks.
- Some vertices correspond to switches that route communications.
- A crossbar switch would be optimal but is very expensive.
- Most switching networks are multistage; the omega network is an example.

Network Topology - Bus [figure]

Bus
- The cost of the network scales linearly with the number of nodes: O(p).
- The distance between any two nodes is O(1).
- Ideal for broadcasting information among nodes.
- The bounded bandwidth of the bus limits performance.
- To reduce demand on bus bandwidth:
  - Provide a cache for each node.
  - Cache private data.
  - Access only remote data through the bus.
- Scalable in terms of cost but not in terms of performance.

Network Topology - Crossbar [figure]

Crossbar
- A non-blocking network.
- Total number of switches: Θ(pb), for p processing nodes and b memory banks.
- Assume that b is at least p. (Is this reasonable?)
- As p increases, the complexity grows as Ω(p^2).
- Scalable in terms of performance but not in terms of cost.

Network Topology - Multistage [figure]

Multistage Network - Omega
- An intermediate class of networks:
  - More scalable than the bus in terms of performance.
  - More scalable than the crossbar in terms of cost.
- A commonly used multistage network is the omega network:
  - p processing nodes
  - b memory banks (b = p)
  - log p stages
- Within each stage, a link exists between input i and output j.

Interconnection Pattern (Omega)
- Output j is the left rotation of the binary representation of input i:

  j = \begin{cases} 2i, & 0 \le i \le p/2 - 1 \\ 2i + 1 - p, & p/2 \le i \le p - 1 \end{cases}
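The interconnection pattern can be checked numerically. The following is a minimal sketch, assuming p is a power of two; the function names `omega_link` and `rotate_left` are illustrative and do not come from the slides. It evaluates the piecewise formula and confirms that it matches a one-bit left rotation of i.

```python
def omega_link(i, p):
    """Output j that input i is wired to, using the piecewise formula."""
    if i < p // 2:
        return 2 * i
    return 2 * i + 1 - p


def rotate_left(i, p):
    """Rotate the log2(p)-bit representation of i left by one position."""
    bits = p.bit_length() - 1  # log2(p), assuming p is a power of two
    return ((i << 1) | (i >> (bits - 1))) & (p - 1)


p = 8
pattern = [omega_link(i, p) for i in range(p)]
print(pattern)  # [0, 2, 4, 6, 1, 3, 5, 7]

# The formula and the bit rotation agree for every input.
assert all(omega_link(i, p) == rotate_left(i, p) for i in range(p))
```

For p = 8, input 5 (binary 101) connects to output 3 (binary 011), which is 101 rotated left by one bit.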

Omega Network
- Switching nodes: (p/2) log p
- Cost of network: Θ(p log p)
- Routing data in an omega network:
  - Let s be the binary representation of a processor that needs to write data into memory bank t (also in binary).
  - First stage: if the most significant bits of s and t are the same, the data is routed in pass-through mode; if they differ, it is routed in cross-over mode.
  - This is repeated at each following stage using the next most significant bit.

Blocking in Omega Network [figure]
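The routing rule and the blocking behavior can be sketched in a few lines. This is an illustration under assumptions, not the slides' own code: it models each stage as a left-rotation (shuffle) followed by a 2x2 switch whose output replaces the lowest bit of the position, and the names `omega_route` and `conflicts` are hypothetical. Two messages block each other when they need the same switch output at the same stage.

```python
def rotate_left(x, k):
    """Rotate the k-bit value x left by one bit (the shuffle before each stage)."""
    return ((x << 1) | (x >> (k - 1))) & ((1 << k) - 1)


def omega_route(s, t, k):
    """Return one (mode, output_position) pair per stage for a message s -> t."""
    pos = s
    hops = []
    for stage in range(k):
        pos = rotate_left(pos, k)             # wiring into this stage
        want = (t >> (k - 1 - stage)) & 1     # destination bit, MSB first
        # After the shuffle, the lowest bit of pos is the source bit being compared.
        mode = "pass-through" if (pos & 1) == want else "cross-over"
        pos = (pos & ~1) | want               # switch output selected by the bit of t
        hops.append((mode, pos))
    return hops


def conflicts(route_a, route_b):
    """Stages at which two routes need the same switch output link."""
    return [i for i, (a, b) in enumerate(zip(route_a, route_b)) if a[1] == b[1]]


k = 3  # p = 8 nodes, log p = 3 stages
a = omega_route(0b010, 0b111, k)
b = omega_route(0b110, 0b100, k)
print([mode for mode, _ in a])  # ['cross-over', 'pass-through', 'cross-over']
print(conflicts(a, b))          # [0]
```

In this example both messages need the same output of a first-stage switch, so one of them is blocked, even though their sources and destinations are all different.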

Completely-Connected Network and Star-Connected Network [figure]
- Desirable but impractical.

Completely Connected Network
- Each node has a link to every other node.
- In an n-node network, each node has n - 1 links to the other n - 1 nodes, so there are n(n-1)/2 links in all.
- Practical only for small n.

Linear Array
- Line/ring: each node links only to its neighboring nodes (two links per node, except the two end nodes of a line, which have one).
- An n-node ring requires n links.
- In a line, the two end nodes are farthest apart, so the diameter is n - 1.

2D and 3D Meshes
- 2D mesh, N = 16 (4 x 4): 24 links, diameter 2*(sqrt(16) - 1) = 6.
- Second example, N = 16: 32 links, diameter 4.
- A regularly structured computation maps naturally onto a 2D or 3D mesh.
- A 3D cube is used in the Cray T3E.
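Link counts and diameters like these can be verified by building the graph and running a breadth-first search from every node. The sketch below is illustrative only: the helper names are hypothetical, and the wraparound variant is an assumption about what a 16-node example with 32 links and diameter 4 could look like (a 4 x 4 mesh with wraparound links has exactly those values).

```python
from collections import deque
from itertools import product


def mesh_2d(k, wraparound=False):
    """Adjacency sets for a k x k mesh; wraparound=True adds the torus links."""
    adj = {node: set() for node in product(range(k), repeat=2)}
    for r, c in adj:
        for dr, dc in ((0, 1), (1, 0)):       # right and down neighbors
            nr, nc = r + dr, c + dc
            if wraparound:
                nr, nc = nr % k, nc % k
            if nr < k and nc < k:
                adj[(r, c)].add((nr, nc))
                adj[(nr, nc)].add((r, c))
    return adj


def link_count(adj):
    """Each undirected link appears in two adjacency sets."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2


def diameter(adj):
    """Largest shortest-path length, found by BFS from every node."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


plain = mesh_2d(4)
wrapped = mesh_2d(4, wraparound=True)
print(link_count(plain), diameter(plain))      # 24 6
print(link_count(wrapped), diameter(wrapped))  # 32 4
```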

Hypercube [figure]

Hypercube
- Construct a hypercube with p nodes from two subcubes of p/2 nodes each.
- Numbering scheme for nodes in a hypercube:
  - Derived from the construction of the hypercube.
  - Prefix the labels of one subcube with 0 and the labels of the other subcube with 1.
- Useful property:
  - The minimum distance between two nodes is the number of bits that differ between their labels.
  - Nodes labeled 0110 and 0101 are two links apart.
  - This property is useful for deriving a number of parallel algorithms.
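The labeling scheme and the distance property translate directly into code. This is a small sketch with hypothetical helper names: labels are built recursively by prefixing the two subcubes with 0 and 1, and the distance between two nodes is the number of differing bits (the Hamming distance) of their labels.

```python
def hypercube_labels(d):
    """Labels of a d-dimensional hypercube, built from two (d-1)-dimensional subcubes."""
    if d == 0:
        return [""]
    sub = hypercube_labels(d - 1)
    return ["0" + label for label in sub] + ["1" + label for label in sub]


def hypercube_distance(a, b):
    """Minimum number of links between nodes a and b: count of differing bits."""
    return bin(a ^ b).count("1")


def hypercube_neighbors(node, d):
    """Direct neighbors of a node: flip each of its d label bits in turn."""
    return [node ^ (1 << bit) for bit in range(d)]


print(hypercube_labels(2))                 # ['00', '01', '10', '11']
print(hypercube_distance(0b0110, 0b0101))  # 2
print(hypercube_neighbors(0b0000, 4))      # [1, 2, 4, 8]
```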

Tree-Based Network [figure]

Tree-Based Network
- Tree network: a binary tree or hierarchical tree network; each node has links to two child nodes.
- Total nodes in a tree with levels 0 through j: 2^(j+1) - 1
  - Root level: one node
  - First level: two nodes
  - Second level: four nodes
  - jth level: 2^j nodes
- The CM-5 system uses this kind of architecture.
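The total node count follows from summing the per-level counts as a geometric series, assuming the root is counted as level 0 and the tree is complete:

```latex
\sum_{l=0}^{j} 2^{l} = \frac{2^{j+1} - 1}{2 - 1} = 2^{j+1} - 1
```

For example, a tree with levels 0 through 2 has 1 + 2 + 4 = 7 = 2^3 - 1 nodes.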

Cost and Performance of Static Networks

Network Criteria
- Diameter
  - The maximum distance between any two processing nodes in the network.
  - The distance between two processing nodes is the length of the shortest path (in number of links) between them.
- Connectivity
  - A measure of the multiplicity of paths between any two processing nodes.
  - High connectivity is desirable because it reduces contention.
- Arc connectivity
  - The minimum number of arcs that must be removed from the network to break it into two disconnected networks.
- Bisection width
  - The minimum number of communication links that must be removed to partition the network into two equal halves.
  - Bisection width of a completely connected network: p^2/4.
- Bisection bandwidth
  - The minimum volume of communication allowed between any two halves of the network.
- Cost
  - The number of communication links.
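For small networks, the bisection width can be found by brute force, which makes the definition concrete. The sketch below is an illustration with hypothetical names: it tries every split into two equal halves and counts the links that cross the split. It is only feasible for small p, since the number of splits grows very quickly.

```python
from itertools import combinations


def bisection_width(nodes, links):
    """Minimum number of links crossing any split of the nodes into two equal halves."""
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for group in combinations(nodes, half):
        side = set(group)
        crossing = sum(1 for u, v in links if (u in side) != (v in side))
        best = crossing if best is None else min(best, crossing)
    return best


p = 8
complete = list(combinations(range(p), 2))        # every pair of nodes is linked
ring = [(i, (i + 1) % p) for i in range(p)]       # each node linked to its successor

print(len(complete))                         # 28 links, n(n-1)/2 with n = 8
print(bisection_width(range(p), complete))   # 16, which is p^2/4
print(bisection_width(range(p), ring))       # 2
```

The completely connected network matches the p^2/4 value stated above, because each of the p/2 nodes in one half has a link to each of the p/2 nodes in the other half.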

Characteristics of Static Networks

Summary
- Interconnection networks
- Static and dynamic networks
- Network topologies
- Characteristics

Questions?