Scalability and Classifications


MCS 572 Lecture 2: Introduction to Supercomputing. Jan Verschelde, 24 August 2016.

1 Types of Parallel Computers: MIMD and SIMD classifications; shared and distributed memory multicomputers; distributed shared memory computers.
2 Network Topologies: static connections; dynamic network topologies by switches; ethernet connections.

Types of Parallel Computers: MIMD and SIMD classifications

the classification of Flynn

In 1966, Flynn introduced the following classification:
SISD = Single Instruction Single Data stream: one single processor handles data sequentially, using pipelining (e.g., a car assembly line) to achieve parallelism.
MISD = Multiple Instruction Single Data stream: the so-called systolic arrays, which have been of little interest.
SIMD = Single Instruction Multiple Data stream: graphics computing, where the same command is issued for a matrix of pixels; vector and array processors for regular data structures.
MIMD = Multiple Instruction Multiple Data stream: the general purpose multiprocessor computer.

Single Program Multiple Data Stream

One model is SPMD: Single Program Multiple Data stream:
1 All processors execute the same program.
2 Branching in the code depends on the identification number of the processing node.
In the manager/worker paradigm, the manager (also called the root) has identification number zero, and the workers are labeled 1, 2, ..., p-1.
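To make the SPMD branching concrete, below is a minimal sketch in C with MPI (the message passing software named later in this lecture); the program and its print statements are an illustration of the model, not code from the slides.

    /* spmd.c: all processes run this same program; branching on the
       rank separates the manager (rank 0) from the workers. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, p;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* identification number of this node */
        MPI_Comm_size(MPI_COMM_WORLD, &p);    /* total number of processes */

        if (rank == 0)  /* the manager, also called the root */
            printf("Manager: expecting %d workers.\n", p - 1);
        else            /* workers are labeled 1, 2, ..., p-1 */
            printf("Worker %d of %d reporting.\n", rank, p - 1);

        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and launched with mpirun -np 4, all four processes execute the same binary; only the value of rank differs, and that value drives the branching.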

Types of Parallel Computers: shared and distributed memory multicomputers

types of parallel computers

One crude distinction concerns memory, shared or distributed:
A shared memory multicomputer has one single address space, accessible to every processor.
In a distributed memory multicomputer, every processor has its own private memory, and data is exchanged between processors via messages.
[Figure: four processors p1, p2, p3, p4; in the shared memory organization a network connects the processors to the memory modules m1, m2, m3, m4; in the distributed memory organization each processor owns its own memory module.]
Many parallel computers consist of multicore nodes.

clusters

Definition: A cluster is an independent set of computers combined into a unified system through software and networking. Beowulf clusters are scalable performance clusters based on commodity hardware, on a private network, with open source software.
What drove the clustering revolution in computing?
1 commodity hardware: a choice of many vendors for processors, memory, hard drives, etc.;
2 networking: Ethernet is the dominant commodity networking technology, while supercomputers have specialized networks;
3 open source software infrastructure: Linux and MPI.

Message Passing and Scalability

total time = computation time + communication time,
where the communication time counts as overhead. Because we want to reduce the overhead, the ratio
computation/communication ratio = computation time / communication time
determines the scalability of a problem: how well can we increase the problem size n, keeping p, the number of processors, fixed?
Desired: the order of the overhead should be less than the order of the computation, so that the ratio grows with n; examples of such orders: O(log2(n)) < O(n) < O(n^2).
Remedy: overlapping communication with computation.
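As a small worked illustration (the cost model is hypothetical, not from the slides): suppose computing on n data items takes c1 n^2 operations, while only the n items themselves are communicated, at cost c2 n. In LaTeX:

    \[
      \frac{\text{computation time}}{\text{communication time}}
      = \frac{c_1\, n^2}{c_2\, n}
      = \frac{c_1}{c_2}\, n .
    \]

The ratio grows linearly in n, so for a fixed number of processors the overhead becomes relatively less important as the problem size increases: such a problem scales well.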

Types of Parallel Computers: distributed shared memory computers

distributed shared memory computers

In a distributed shared memory computer, the memory is physically distributed with each processor, yet each processor has access to all memory in one single address space.
Benefits:
1 message passing is often not attractive to programmers;
2 shared memory computers allow only a limited number of processors, whereas distributed memory computers scale well.
Disadvantage: access to a remote memory location causes delays, and the programmer does not have control to remedy those delays.

Network Topologies: static connections

some terminology

bandwidth: the number of bits transmitted per second.
On latency, we distinguish three types:
message latency: the time to send a zero length message (also called the startup time);
network latency: the time for a message to make it through the network;
communication latency: the total time to send a message, including the software overhead and interface delays.
diameter of a network: the minimum number of links between the two nodes that are farthest apart.
On bisecting the network:
bisection width: the number of links needed to cut the network in two equal parts;
bisection bandwidth: the number of bits per second which can be sent from one half of the network to the other half.

arrays, rings, meshes, and tori

Connecting p nodes in a complete graph takes p(p-1)/2 links, which is too expensive.
[Figure: an array and a ring of 4 nodes; a mesh and a torus of 16 nodes.]
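As a quick application of the terminology of the previous slide (these values are standard facts, not stated on the slide), consider an array and a ring of p nodes, with p even:

    \[
      \text{array:}\quad \text{diameter} = p - 1, \qquad \text{bisection width} = 1;
    \]
    \[
      \text{ring:}\quad \text{diameter} = \lfloor p/2 \rfloor, \qquad \text{bisection width} = 2.
    \]

The ring halves the diameter of the array at the cost of one extra link, and cutting a ring into two equal parts always severs exactly two links.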

hypercube network

Two nodes are connected if and only if their labels differ in exactly one bit.
[Figure: a 2-dimensional hypercube with nodes 00, 01, 10, 11, and a 3-dimensional hypercube with nodes 000 through 111.]
e-cube or left-to-right routing: flip the bits from left to right, e.g.: going from node 000 to 101 passes through 100.
In a hypercube network with p nodes:
the maximum number of flips is log2(p);
the number of connections is...? (see the first exercise)
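The left-to-right routing rule is compact enough to express in code. Below is a small sketch in C (my own illustration; the function name and integer node labels are assumptions), where dim = log2(p) is the number of bits in a label:

    /* ecube.c: e-cube (left-to-right) routing on a hypercube.
       Bits in which the current node and the destination differ
       are flipped from the most significant bit down. */
    #include <stdio.h>

    void ecube_route(int src, int dest, int dim)
    {
        int node = src;
        printf("%d", node);
        for (int b = dim - 1; b >= 0; b--)   /* left to right over the bits */
            if (((node ^ dest) >> b) & 1) {  /* bit b still differs */
                node ^= (1 << b);            /* flip bit b: one hop */
                printf(" -> %d", node);
            }
        printf("\n");
    }

    int main(void)
    {
        ecube_route(0, 5, 3);  /* 000 -> 100 -> 101, as on the slide */
        return 0;
    }

For source 000 and destination 101 the program prints 0 -> 4 -> 5, that is, the route 000 -> 100 -> 101 of the slide.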

a tree network

Consider a binary tree:
the leaves in the tree are processors;
the interior nodes in the tree are switches.
Often the tree is fat, with an increasing number of links towards the root of the tree.
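A short aside on why the fat tree matters (standard facts, not on the slide): in a complete binary tree with p leaf processors,

    \[
      \#\text{switches} = p - 1, \qquad
      \text{diameter} = 2\log_2(p), \qquad
      \text{bisection width} = 1,
    \]

so all traffic between the two halves of the machine must cross a single link near the root; increasing the number of links towards the root removes this bottleneck.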

Network Topologies: dynamic network topologies by switches

a crossbar switch

In a shared memory multicomputer, the processors are usually connected to the memory modules by a crossbar switch.
[Figure: for p = 4, a crossbar connecting processors p1, p2, p3, p4 to memory modules m1, m2, m3, m4.]
A p-processor shared memory computer requires p^2 switches.

2-by-2 switches

A 2-by-2 switch has two configurations: pass through and cross over.
[Figure: a 2-by-2 switch on lines 1, 2, 3, 4, shown changing from the pass through to the cross over configuration.]

a multistage network

Rules in the routing algorithm:
1 if the bit is zero: select the upper output of the switch;
2 if the bit is one: select the lower output of the switch.
The first bit determines the output of the first switch, and the second bit determines the output of the second switch (see the code sketch below).
[Figure: a two-stage network of 2-by-2 switches connecting inputs 00, 10, 01, 11 (in shuffled order) to outputs 00, 01, 10, 11.]
The communication between 2 nodes using 2-by-2 switches causes blocking: other nodes are prevented from communicating.
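A sketch of this routing rule in C (my own illustration; I assume the bits examined are those of the destination label, taken from the most significant bit down, which is the usual destination-tag convention):

    /* omega_route.c: destination-tag routing through a multistage
       network of 2-by-2 switches: at each stage, one bit of the
       destination selects the upper (0) or lower (1) output. */
    #include <stdio.h>

    void route(int dest, int stages)
    {
        for (int s = 0; s < stages; s++) {
            int bit = (dest >> (stages - 1 - s)) & 1;
            printf("stage %d: bit %d -> %s output\n",
                   s + 1, bit, bit == 0 ? "upper" : "lower");
        }
    }

    int main(void)
    {
        route(3, 2);  /* route to output 11 in the two-stage network */
        return 0;
    }

For destination 11 the program selects the lower output at both stages; any other input that needs one of those switch outputs at the same time is blocked.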

a 3-stage Omega interconnection network

With circuit switching, the network connects inputs 000 through 111 to outputs 000 through 111.
[Figure: a 3-stage Omega network of 2-by-2 switches on eight inputs and eight outputs.]
The number of switches for p processors is log2(p) x p/2; for example, p = 8 gives 3 x 4 = 12 switches.

circuit and packet switching

If all circuits are occupied, communication is blocked. An alternative solution is packet switching: the message is broken into packets and sent through the network. Problems to avoid:
deadlock: packets are blocked by other packets waiting to be forwarded. This occurs when the buffers are full with packets. Solution: avoid cycles, using the e-cube routing algorithm.
livelock: a packet keeps circling the network and fails to find its destination.

Network Topologies: ethernet connections

ethernet connection network

Computers in a typical cluster are connected via ethernet.
[Figure: compute nodes and a manager node connected by an ethernet network.]

personal supercomputing

HP Z800 workstation running RedHat Linux: two 6-core Intel Xeon at 3.47GHz, 24GB of internal memory, and 2 NVIDIA Tesla C2050 general purpose graphics processing units.
Microway WhisperStation running RedHat Linux: two 8-core Intel Xeon at 2.60GHz, 128GB of internal memory, and 2 NVIDIA Tesla K20C general purpose graphics processing units.

The UIC Condo Cluster

The hardware specs, at http://rc.uic.edu/hardware-specs:
Two login nodes are for managing jobs and file system access.
160 nodes, each with 16 cores running at 2.60GHz, 20MB cache, 128GB RAM, and 1TB storage.
40 nodes, each with 20 cores running at 2.50GHz, 20MB cache, 128GB RAM, and 1TB storage.
3 large memory compute nodes, each with 32 cores and 1TB of RAM, giving 31.25GB per core; together these add up to 96 cores and 3TB of RAM.
In total: 3,456 cores, 28TB RAM, and 203TB storage.
288TB of fast scratch space, communicating with the nodes over QDR InfiniBand.
1.14PB of raw persistent storage.

summary and recommended reading

We classified computers and introduced network terminology; we have covered chapter 1 of Wilkinson-Allen almost entirely.
Available to UIC via the ACM digital library: Michael J. Flynn and Kevin W. Rudd: Parallel Architectures. ACM Computing Surveys 28(1): 67-69, 1996.
Available to UIC via IEEE Xplore: George K. Thiruvathukal: Cluster Computing. Guest Editor's Introduction. Computing in Science & Engineering 7(2): 11-13, 2005.
Visit www.beowulf.org.

Exercises

Homework will be collected at a date to be announced. Exercises:
1 Derive a formula for the number of links in a hypercube with p = 2^k processors, for some positive number k.
2 Consider a network of 16 nodes, organized in a 4-by-4 mesh with connecting loops to give it the topology of a torus (or doughnut). Can you find a mapping of the nodes which gives it the topology of a hypercube? If so, use 4 bits to assign labels to the nodes. If not, explain why.
3 We derived an Omega network for eight processors. Give an example of a configuration of the switches which is blocking, i.e.: a case for which the switch configurations prevent some nodes from communicating with each other.
4 Draw a multistage Omega interconnection network for p = 16.