EE/CSCI 451: Parallel and Distributed Computation

Similar documents
EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation

Interconnection networks

Physical Organization of Parallel Platforms. Alexandre David

Lecture 3: Topology - II

EE/CSCI 451 Spring 2018 Homework 2 Assigned: February 7, 2018 Due: February 14, 2018, before 11:59 pm Total Points: 100

Interconnection Network. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Interconnection Network

EE/CSCI 451 Midterm 1

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

CS575 Parallel Processing

INTERCONNECTION NETWORKS LECTURE 4

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control

CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2

CSC630/CSC730: Parallel Computing

Interconnection Network

Lecture 2: Topology - I

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Network Properties, Scalability and Requirements For Parallel Processing. Communication assist (CA)

Interconnection Networks. Issues for Networks

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Network Properties, Scalability and Requirements For Parallel Processing. Communication assist (CA)

EE/CSCI 451: Parallel and Distributed Computation

Parallel Architecture. Sathish Vadhiyar

Lecture: Interconnection Networks

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011

Network-on-chip (NOC) Topologies

Overview. Processor organizations Types of parallel machines. Real machines

Outline. Distributed Shared Memory. Shared Memory. ECE574 Cluster Computing. Dichotomy of Parallel Computing Platforms (Continued)

Introduction to Parallel and Distributed Systems - INZ0277Wcl 5 ECTS. Teacher: Jan Kwiatkowski, Office 201/15, D-2

Parallel Architectures

High Performance Computing Programming Paradigms and Scalability

Advanced Parallel Architecture. Annalisa Massini /2017

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Multiprocessor Interconnection Networks- Part Three

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N.

Interconnect Technology and Computational Speed

4. Networks. in parallel computers. Advances in Computer Architecture

High Performance Computing Programming Paradigms and Scalability Part 2: High-Performance Networks

Fundamentals of. Parallel Computing. Sanjay Razdan. Alpha Science International Ltd. Oxford, U.K.

CS 770G - Parallel Algorithms in Scientific Computing Parallel Architectures. May 7, 2001 Lecture 2

EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University

Parallel Computing Platforms

Chapter 4 : Butterfly Networks

EE/CSCI 451: Parallel and Distributed Computation

EE382 Processor Design. Illinois

CS Parallel Algorithms in Scientific Computing

Topologies. Maurizio Palesi. Maurizio Palesi 1

SHARED MEMORY VS DISTRIBUTED MEMORY

What is Parallel Computing?

Scalability and Classifications

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Parallel Systems Prof. James L. Frankel Harvard University. Version of 6:50 PM 4-Dec-2018 Copyright 2018, 2017 James L. Frankel. All rights reserved.

EE 4683/5683: COMPUTER ARCHITECTURE

Lecture 2 Parallel Programming Platforms

Interconnection Networks

Communication Performance in Network-on-Chips

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts

COSC 6374 Parallel Computation. Parallel Computer Architectures

This chapter provides the background knowledge about Multistage. multistage interconnection networks are explained. The need, objectives, research

CSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing

Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996

COSC 6374 Parallel Computation. Parallel Computer Architectures

Model Questions and Answers on

Homework Assignment #1: Topology Kelly Shaw

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Simulation Analysis of Permutation Passibility behavior of Multi-stage Interconnection Networks

Advanced and parallel architectures. Part B. Prof. A. Massini. June 13, Exercise 1a (3 points) Exercise 1b (3 points) Exercise 2 (8 points)

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Network Dilation: A Strategy for Building Families of Parallel Processing Architectures Behrooz Parhami

The final publication is available at

Last Time. Intro to Parallel Algorithms. Parallel Search Parallel Sorting. Merge sort Sample sort

Chapter 9 Multiprocessors

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation

Interconnection Networks

Interconnection Networks

Chapter 2: Parallel Programming Platforms

Interconnection Networks

EE384Y: Packet Switch Architectures Part II Scaling Crossbar Switches

Routing Algorithms. Review

Parallel Computing Interconnection Networks

Topology basics. Constraints and measures. Butterfly networks.

COMP4300/8300: Overview of Parallel Hardware. Alistair Rendell. COMP4300/8300 Lecture 2-1 Copyright c 2015 The Australian National University

GIAN Course on Distributed Network Algorithms. Network Topologies and Interconnects

GIAN Course on Distributed Network Algorithms. Network Topologies and Local Routing

Parallel Architectures

Homework #4 Due Friday 10/27/06 at 5pm

CS 6143 COMPUTER ARCHITECTURE II SPRING 2014

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

COMPARISON OF OCTAGON-CELL NETWORK WITH OTHER INTERCONNECTED NETWORK TOPOLOGIES AND ITS APPLICATIONS

CS 614 COMPUTER ARCHITECTURE II FALL 2005

Introduction to Parallel Computing

Lecture 9: Group Communication Operations. Shantanu Dutt ECE Dept. UIC

Parallel Computing Platforms

The Impact of Optics on HPC System Interconnects

Transcription:

EE/CSCI 451: Parallel and Distributed Computation Lecture #5 1/29/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1

From last class Outline Crossbar network Shuffle exchange network Multistage network Today CLOS network Butterfly network Hypercube network Tree-based network Performance metrics 2

Announcements HW #2 out Today Programming HW #2 out last Friday (27 th Jan) 3

Large software project Scientific computing Graph analytics Big data Sample course projects Course Project (1) Multi-core implementation of data plane kernels for software defined networking (e.g., traffic classification, packet classification, etc.) Accelerating convolutional neural networks (CNN) using GPU Big data analytics using Spark (e.g., graph analysis, log processing, etc.) Students can work in teams of 3 4

Course Project (2) Project timeline Week 5: Project Discussion Week 5-8: Identify team members and project topic Week 9: Project proposal due (March 10 th ) Week 14-15: Presentation Week 15: Project report due (April 28 th ) Grading breakdown for the course project Proposal: 25% Final presentation: 25% Final report: 50% 5

CLOS network (1) Multistage network n input crossbar Cost: α n 2 Cost: α n n input crossbar 6

CLOS network (2) Structure of CLOS network Control n n 7

CLOS network (3) Stage i Stage i + 1 connections Any Box all boxes in the next stage 3 stage network Number of switches (boxes) = 3 n Cost of n n crossbar = O n - Total Cost = O n n = O(n / 0) Note: CLOS network can realize all n! permutations 8

CLOS network (4) Realizing a permutation Choose the control setting for each of the 3 n boxes so the desired connection is realized # of control bits/box = log( n!) A permutation can be specified by n log n bits 9

CLOS network (5) Example: 9 inputs / outputs 0 1 2 3 i i + 3 mod 9 1 0 1 2 3 4 5 2 3 4 5 6 7 8 3 6 7 8 10

CLOS network (6) Example: 4 input / output CLOS network Example permutation: Example Permutation: 0 2 1 3 2 1 3 0 Note: All 4! permutations can be realized 11

CLOS network (7) Another example: 4 input / output CLOS network Example permutation: Permutation: 0 2 1 3 2 1 3 0 12

Routing problem Given a permutation p CLOS network (8) Specify the switch settings such that all connections in p are realized Each 3 n box permutation on n inputs 13

CLOS network (9) Non blocking network Any connection request from input to output can be routed at any time without rearranging the existing set of connections at that time 14

CLOS network (10) Rearrangeable network (1) To route a connection, we may have to rearrange existing connections, i.e., change the control settings of some switches that are routing an existing connection Stage 0 Stage k 1 0 n 1... Permutation ; Switches -... Permutation ; Switches -... 15

CLOS network (11) Rearrangeable network (2) A (multistage) network is rearrangeable if any connection (i j) can be realized by (possibly) rearranging some existing connections. 16

n = 2 k for some k Butterfly network (1) 8-input butterfly network Stage 0 Stage 3 log - n + 1 stages Stage l, 0 l < log - n i i in Stage l + 1 i i complement bit l in Stage l + 1 l = 0 l = 1 l = 2 17

Butterfly network (2) n input butterfly network Total number of nodes = n E log - n + 1 Total number of edges = n E 2 E log - n # of edges/node # of stages 18

Mesh-connected Network 1-D mesh 2-D mesh (without wraparound) Without wraparound 1-D torus With wraparound ring 0 1 p-1 0 1 p-1 p p Number of connections per node 2k 19

Hypercube Network (1) 100 110 Existing edge Added edge 0 1 00 10 01 11 000 010 101 001 011 111 0-D hypercube 1-D hypercube 2-D hypercube 3-D hypercube 0100 0110 0000 0010 0101 0111 0001 0011 1100 1110 1000 1010 1101 1111 4-D hypercube 1001 1011 Construction of hypercube from hypercubes of lower dimension 20

Hypercube Network (2) In general, for k dimension hypercube: p = total number of nodes = 2 F Node i = i FGH i FG- i J k connections/node complement a bit of i Example: 001 000 010 100 000 100 101 010 110 111 001 011 21

Tree-based Network (1) Processing node Switching node Height: log p + 1 1 Static tree network Dynamic tree network p = total number of nodes 22

Tree-based Network (2) Fat tree (16 processing nodes) 23

Performance metrics (1) Diameter Diameter Maximum distance between any two processing nodes in the network diameter max{distance(i, j)} (T,V) distance(i, j) min path length between i and j length of the shortest path between i and j length of a path = # of edges in the path 24

Example: 0 1 2 Performance metrics (2) 3 Distance 0 1 1 0 2 1 0 3 1 1 0 1 1 2 2 1 3 1 2 0 1 2 1 2 2 3 1 3 0 1 3 1 1 3 2 1 Diameter = 2 25

Performance metrics (3) Diameter Example: 1-D mesh (no wraparound): p 1 { 0 p 1 } 2-D mesh (no wraparound): 2( p 1) { (0,0) ( p 1, p 1)} Hypercube : log - p { 0 p 1 } 26

Performance metrics (4) Diameter = d Routing can be performed in at most d steps (hops). 27

Performance metrics (5) Bisection width (1) Bisection width Minimum number of links to be removed to partition the network into two equal-sized (in number of nodes) subnetworks partition links g - nodes g - nodes Minimum over all possible partitions 28

Performance metrics (6) Example Bisection width (2) partition p p p for 2-D mesh (no wraparound) 29

Performance metrics (7) Bisection bandwidth (3) Bandwidth number of bits/sec Example: 8 bits Bandwidth = 8 clock rate Bisection bandwidth Minimum bandwidth between any two equal halves (number of bits/sec that can be exchanged between 2 equal halves of the network) g nodes g nodes - - i bandwidth of each link 30

Performance metrics (8) Cost of a static network Cost number of communication links in the network Example: Tree: p 1 1-D mesh (no wraparound): p 1 d-dimensional wraparound mesh: d E p Hypercube: gelmn 0g - k-ary d-cube A d-dimensional array with k elements in each dimension Number of nodes p = k o Cost: dp 31

Summary Network Diameter Bisection width Cost (No. of links) Completely connected 1 p - 4 p(p 1) 2 Star 2 1 p 1 Complete binary tree 2 log p + 1 2 1 p 1 1-D mesh, no wraparound p 1 1 p 1 2-D mesh, no wraparound 2( p 1) p 2(p p) 2-D wraparound mesh 2 p/2 2 p 2p Hypercube log p p/2 (p E log p)/2 Wraparound k-ary d-cube p = k o d k/2 2k ogh dp 32

Summary Static network Multistage network Cost Performance Routing Diameter Bisection bandwidth 33