Parallel Architectures

Parallel Architectures
CPS343 Parallel and High Performance Computing, Spring 2018

Outline

1 Parallel Computer Classification
  Flynn's Taxonomy
  Other taxonomies
2 Parallel Computer Architectures
  Switched Network Topologies
  Processor arrays
  Multiprocessors
  Multicomputers

Acknowledgements Some material used in creating these slides comes from Parallel Programming in C with MPI and OpenMP, Michael Quinn, McGraw-Hill, 2004.

Flynn's Taxonomy (1966)

                        Single Data                  Multiple Data
Single Instruction      SISD (uniprocessors)         SIMD (processor arrays, pipelined vector processors)
Multiple Instruction    MISD (systolic arrays)       MIMD (multiprocessors, multicomputers)

Other taxonomies

SPMD (Single Program, Multiple Data): similar to SIMD, but indicates that a single program is used for the parallel application, i.e. every node runs the same program.

MPMD (Multiple Program, Multiple Data): like MIMD, except that this explicitly uses more than one program for the parallel application. Typically one is a master or control program and the others are slave or compute programs.

Modern clusters typically follow one of these two models. It is often convenient to use the SPMD model but have the program behave as either a master or a slave based on some criterion determined at run time (often the node on which the program is running).
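
A minimal sketch of that SPMD pattern, using MPI (the library the acknowledged textbook also uses); the "work" each rank does here is invented purely for illustration:

    /* SPMD sketch: one program, with the role chosen at run time by rank. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* "master" role: collect one result from each worker */
            for (int src = 1; src < size; src++) {
                int value;
                MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("master received %d from rank %d\n", value, src);
            }
        } else {
            /* "worker" role: do some work and report the result */
            int value = rank * rank;      /* stand-in for real work */
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }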

Definitions

Diameter: the largest distance between two switch nodes. We want this to be as small as possible.

Bisection width: the smallest number of connections that must be severed to separate the network into two distinct subnetworks of the same size. We want this to be as large as possible.

Edges per switch node: best if this is constant and independent of network size.

Edge length: the physical length of wire or other media required for connections. Ideally this is constant (does not grow with network size).

Direct topology: every switch is connected to a processor node and to other switches.

Indirect topology: some switches are connected only to other switches.

2-D Meshes with n processors (figure: a 2-D mesh and a 2-D mesh with wrap-around connections, i.e. a torus; switches shown connected to processors)

2-D Meshes with n processors

Direct topology. A 2-D mesh with wrap-around connections is a torus. The diameter of a non-wrap-around mesh is minimized when the mesh is square: 2(√n - 1). For a square mesh with wrap-around the diameter is √n. Bisection width is also large: without wrap-around, the bisection width of a square mesh is √n. There is a constant number of edges per switch (4 for the wrap-around version) and the network can have constant edge length.
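
These diameter claims are easy to check numerically. The following sketch (mine, not from the slides) runs a breadth-first search from every node of a 4 x 4 mesh, with and without wrap-around, and reports the largest shortest-path distance found:

    /* Check mesh vs. torus diameter by brute-force BFS on a 4 x 4 grid. */
    #include <stdio.h>
    #include <string.h>

    #define SIDE 4                    /* 4 x 4 mesh, so n = 16 */
    #define N (SIDE * SIDE)

    /* Largest BFS distance from node s to any other node. */
    static int eccentricity(int s, int wrap)
    {
        int dist[N], queue[N], head = 0, tail = 0, far = 0;
        memset(dist, -1, sizeof dist);
        dist[s] = 0;
        queue[tail++] = s;
        while (head < tail) {
            int v = queue[head++], r = v / SIDE, c = v % SIDE;
            int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};
            for (int k = 0; k < 4; k++) {
                int nr = r + dr[k], nc = c + dc[k];
                if (wrap) {                    /* torus: edges wrap around  */
                    nr = (nr + SIDE) % SIDE;
                    nc = (nc + SIDE) % SIDE;
                } else if (nr < 0 || nr >= SIDE || nc < 0 || nc >= SIDE) {
                    continue;                  /* plain mesh: no such edge  */
                }
                int w = nr * SIDE + nc;
                if (dist[w] < 0) {
                    dist[w] = dist[v] + 1;
                    if (dist[w] > far) far = dist[w];
                    queue[tail++] = w;
                }
            }
        }
        return far;
    }

    int main(void)
    {
        int mesh = 0, torus = 0;
        for (int s = 0; s < N; s++) {
            int e = eccentricity(s, 0);
            if (e > mesh) mesh = e;
            e = eccentricity(s, 1);
            if (e > torus) torus = e;
        }
        /* Expect 2*(sqrt(n)-1) = 6 for the mesh and sqrt(n) = 4 for the torus. */
        printf("mesh diameter = %d, torus diameter = %d\n", mesh, torus);
        return 0;
    }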

Binary tree (figure: a binary tree of switches with processors attached to the leaf switches)

Binary tree

Indirect topology. Interior nodes are switches connected only to other switches; leaf switches are also connected to processors. n = 2^d for some d, and there are 2n - 1 switches. Diameter is 2 log n = 2d. Bisection width is 1; this is not good, as all traffic from one side of the tree must travel through a single switch on its way to the other side. The number of edges per switch is fixed: 3 for switches connected only to other switches and 2 for switches connected to processors. Edge length grows as nodes are added.

Hypertree network of degree k and depth d

Indirect topology. Keeps the diameter small (good) while making the bisection width larger (also good).

Butterfly network (figure: 8 nodes along the top and bottom, connected through 4 ranks of 8 switches labeled (0,0) through (3,7))

Butterfly network

Indirect topology. n = 2^d for some d, and there are n(log n + 1) = n(d + 1) switches arranged in d + 1 rows (called ranks) of length n. Often the first and last ranks of switches are physically the same, but for the purposes of this discussion they are counted separately. Let i, 0 ≤ i < d, be a switch's rank and let j, 0 ≤ j < n, be the switch's position within its rank. Switch (i,j) is connected to switch (i+1, j) and to switch (i+1, inv(i,j)), where inv(i,j) means that the i-th most-significant bit of j (counting from the left, starting at 0) is flipped. Diameter is log n = d and bisection width is n. The number of edges per switch is 4, but edge length grows as the depth of the network increases.

Butterfly network

For example, suppose d = 3 so n = 2^3 = 8, and consider switch (1,4):

(i + 1) = (1 + 1) = 2, so switch (1,4) is connected to switch (2,4).

The 0th most-significant bit (counting from the left) of j = 4 = 100 in binary is 1. The 1st most-significant bit is 0; flipping it gives 110 in binary, which is 6, so inv(1,4) = 6. Thus switch (1,4) is also connected to switch (2,6).

What switches is switch (2,5) connected to? (i + 1) = (2 + 1) = 3, and flipping the 2nd most-significant bit of 5 = 101 in binary gives 100 in binary, which is 4, so switch (2,5) is connected to switch (3,5) and switch (3,4).
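
Because inv(i,j) is a single bit flip, it is easy to express in code. A small sketch (mine, not from the slides) using the same zero-based bit numbering, which reproduces the two cases worked out above:

    /* Rank-(i+1) neighbors of butterfly switch (i,j), with n = 2^d columns. */
    #include <stdio.h>

    /* Flip the i-th most-significant bit (0-based, from the left) of the
       d-bit position j. */
    static int inv(int i, int j, int d)
    {
        return j ^ (1 << (d - 1 - i));
    }

    int main(void)
    {
        int d = 3;    /* n = 2^d = 8 */
        printf("switch(1,4) -> switch(2,4) and switch(2,%d)\n", inv(1, 4, d));
        printf("switch(2,5) -> switch(3,5) and switch(3,%d)\n", inv(2, 5, d));
        return 0;
    }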

Butterfly network

Routing turns out to be relatively easy.

1 The destination address is appended to the message.
2 Each switch that processes the message pops the most-significant bit from the destination address and routes the message left if the bit is 0, or right if the bit is 1.
3 The remaining address bits are sent along with the message.

Butterfly network (figure series: routing a message from node 2 to node 5 through the butterfly)

Send message: node 2 to node 5 (destination address 101).

Node 2 initiates the message, which enters the network at switch (0,2).

Dest: 101. Switch (0,2) pops the leftmost bit from the destination address. The bit is 1, so the message is routed down the right path, to switch (1,6). The remaining destination address is 01.

Dest: 01. Switch (1,6) pops the leftmost bit from the destination address. The bit is 0, so the message is routed down the left path, to switch (2,4). The remaining destination address is 1.

Dest: 1. Switch (2,4) pops the leftmost bit from the destination address. The bit is 1, so the message is routed down the right path, to switch (3,5).

Switch (3,5) makes the message available to node 5, and the message arrives at node 5.
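
The same pop-a-bit rule can be written without reference to the drawing: at rank i, the message moves to whichever rank-(i+1) switch has a column whose i-th most-significant bit matches the destination's. A small sketch of that reformulation (mine, not from the slides), which reproduces the path above:

    /* Simulate butterfly routing for n = 2^d nodes. */
    #include <stdio.h>

    static void route(int src, int dst, int d)
    {
        int col = src;                      /* message enters at switch (0, src) */
        printf("node %d -> switch(0,%d)", src, col);
        for (int i = 0; i < d; i++) {
            int mask = 1 << (d - 1 - i);    /* selects the i-th MSB              */
            int bit  = (dst & mask) != 0;   /* i-th MSB of the destination       */
            if (((col & mask) != 0) != bit)
                col ^= mask;                /* bits differ: take the cross edge  */
            printf(" -> switch(%d,%d)", i + 1, col);
        }
        printf(" -> node %d\n", col);
    }

    int main(void)
    {
        route(2, 5, 3);  /* prints the path (0,2) -> (1,6) -> (2,4) -> (3,5) */
        return 0;
    }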

Hypercube network (figure: a 3-D hypercube with nodes labeled 000 through 111)

Hypercube network

Direct topology with n = 2^d processors.

A 0-D hypercube is a single processor.
A 1-D hypercube is a line of 2^1 = 2 processors with a single edge connecting them.
A 2-D hypercube is a square of 2^2 = 4 processors with four edges.
A 3-D hypercube is a cube of 2^3 = 8 processors with 12 edges.
A 4-D hypercube is a hypercube of 2^4 = 16 processors with 32 edges.

Hypercube network

Rule: to construct a (d + 1)-D hypercube, take two d-dimensional hypercubes and connect the corresponding vertices.

Nodes (or switches for nodes) are numbered using a Gray code: if two nodes are adjacent then their numbers differ in only a single bit.

Diameter is log n = d and bisection width is n/2. There are log n = d edges per switch (not constant, which is not ideal), and the edge lengths grow as the number of processors grows (also not ideal).

Hypercube network

Routing in a hypercube network is fairly easy. To route from one processor to another, start with the address of the source node and flip any bit that differs from the corresponding bit in the destination address. Send the message to the node whose address was just generated. Repeat, always flipping a bit in a different position, until the destination is reached.

Example: consider a 5-D hypercube with 32 processors, so each node has a 5-bit address. Suppose node 9 (01001 in binary) wants to send a message to node 30 (11110 in binary). The differing bits are 10111. The order in which the bits are selected is immaterial, but it does lead to non-unique communication paths. In this case there are 4! = 24 possible paths; here are two of them:

1 01001 -> 11001 -> 11101 -> 11111 -> 11110
2 01001 -> 01000 -> 01010 -> 01110 -> 11110
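
A minimal sketch of this routing rule (mine, not from the slides) that flips the differing bits from most to least significant; for nodes 9 and 30 it reproduces path 1 above, printing the node numbers in decimal:

    /* Generate one route between two hypercube nodes by bit flipping. */
    #include <stdio.h>

    static void route(unsigned src, unsigned dst, int d)
    {
        unsigned cur = src;
        printf("%u", cur);
        for (int bit = d - 1; bit >= 0; bit--) {   /* most-significant first */
            unsigned mask = 1u << bit;
            if ((cur ^ dst) & mask) {              /* this bit still differs */
                cur ^= mask;                       /* hop to that neighbor   */
                printf(" -> %u", cur);
            }
        }
        printf("\n");
    }

    int main(void)
    {
        route(9, 30, 5);   /* 9 -> 25 -> 29 -> 31 -> 30 */
        return 0;
    }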

Hypercube network (figure series: building a 4-D hypercube with nodes labeled 0000 through 1111)

16 0-D hypercubes, then 8 1-D hypercubes, then 4 2-D hypercubes, then 2 3-D hypercubes, then 1 4-D hypercube.

Processor arrays

Multiple processors that carry out the same instruction in a given instruction cycle. Typically paired with a front-end processor that is handling all the other work of the system. Often called vector processors, since operations on vectors typically involve identical operations on different data.

Processors have local memory and some form of interconnection network (often a 2-D mesh) allowing for communication between processors.

Masks can be used to selectively enable or disable operations on individual processors (e.g. think of absolute value: we only want to change the sign if the value is less than zero).
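
To make the masking idea concrete, here is a small sketch (mine, not from the slides) of the absolute-value example written as an ordinary C loop; on a processor array each element position would be handled by its own processor, and the mask decides which processors actually store a result:

    /* Masked absolute value: flip the sign only where the mask is set. */
    #include <stdio.h>

    #define N 8

    int main(void)
    {
        double x[N] = {3.0, -1.5, 0.0, -7.25, 2.0, -0.5, 6.0, -9.0};
        int mask[N];

        for (int i = 0; i < N; i++)
            mask[i] = (x[i] < 0.0);   /* enable only where the sign must flip */

        for (int i = 0; i < N; i++)
            if (mask[i])              /* disabled positions leave x[i] alone  */
                x[i] = -x[i];

        for (int i = 0; i < N; i++)
            printf("%g ", x[i]);
        printf("\n");
        return 0;
    }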

Processor array disadvantages

As late as the mid-2000s processor arrays were viewed as old technology. Reasons for this at the time included:

many problems do not map well to a strictly data-parallel solution
conditionally executed parallel code does not perform well
not well suited for multi-user situations; requires dedicated access to the processor array for best performance
cost does not scale down well
hard or impossible to build with COTS (commodity off-the-shelf) parts
CPUs are relatively cheap (and getting faster)

Processor arrays: GPUs and accelerators

Recently, processor arrays in the form of GPUs and accelerators such as Intel's Phi have become quite popular. We'll talk more about these soon...

Multiprocessors

A computer with multiple CPUs, or one or more multicore CPUs, with shared memory. Common examples include dual-processor workstations and systems with multiple cores in a single processor. Past supercomputers include the Cray X-MP and Y-MP.

In a symmetric multiprocessor (SMP) all processors are the same and have equal (but shared) access to resources. This is currently the standard model of desktop or laptop computers: a CPU with multiple cores, various levels of cache, and common access to a single pool of RAM.

Multiprocessor cache issues

Typically each CPU (or core; henceforth we'll just say CPU) has at least one level of cache. Data in the cache should be consistent with the corresponding data in memory. When a CPU writes to its cache, that update must also be carried out in memory and in the caches of other CPUs where the updated data may also be stored. This is called the cache coherence problem.

Snooping: each CPU's cache controller monitors the bus so it is aware of what is stored in other caches. The system uses this information to avoid coherence problems.

One solution is the write-invalidate protocol: when one CPU writes to its cache, the corresponding data in the other CPUs' caches is marked as invalid. This causes a cache miss when any other CPU tries to read that data from its own cache.
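
A toy model of the write-invalidate idea (mine, not from the slides) for a single shared value; real protocols such as MESI track richer per-line state, while this only tracks valid/invalid and writes through to memory for simplicity:

    /* Write-invalidate sketch: one shared value, NCPU private caches. */
    #include <stdio.h>
    #include <stdbool.h>

    #define NCPU 4

    static int  memory = 0;          /* the "main memory" copy           */
    static int  cache[NCPU];         /* each CPU's cached copy           */
    static bool valid[NCPU];         /* is that cached copy usable?      */

    static int cpu_read(int cpu)
    {
        if (!valid[cpu]) {           /* cache miss: fetch from memory    */
            cache[cpu] = memory;
            valid[cpu] = true;
        }
        return cache[cpu];
    }

    static void cpu_write(int cpu, int value)
    {
        cache[cpu] = value;
        valid[cpu] = true;
        memory = value;              /* write through, for simplicity    */
        for (int other = 0; other < NCPU; other++)
            if (other != cpu)
                valid[other] = false;   /* snoopers invalidate their copy */
    }

    int main(void)
    {
        printf("CPU 1 reads %d\n", cpu_read(1));
        cpu_write(0, 42);            /* CPU 0 writes; others invalidated  */
        printf("CPU 1 reads %d (after a forced miss)\n", cpu_read(1));
        return 0;
    }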

Multiprocessor synchronization

Barriers: points in a parallel program where all processors must arrive before any of them proceeds.

Mutual exclusion: often there are critical sections in code where the program must guarantee that only a single process accesses certain memory for a period of time (e.g., we don't want one process trying to read a memory location at the same instant another is writing to it). Semaphores provide one way to do this.
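
Both mechanisms are available directly in OpenMP, which the acknowledged textbook also covers. A minimal sketch (mine, not from the slides) of a critical section followed by a barrier:

    /* Compile with: gcc -fopenmp sync.c */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int total = 0;

        #pragma omp parallel
        {
            int id = omp_get_thread_num();

            /* critical section: one thread at a time updates 'total' */
            #pragma omp critical
            total += id;

            /* barrier: no thread continues until every thread gets here */
            #pragma omp barrier

            if (id == 0)
                printf("sum of thread ids = %d\n", total);
        }
        return 0;
    }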

Multiprocessor Memory Access

UMA (Uniform Memory Access): every processor has the same view of, and access to, memory. Connections can be through a bus or a switched network. SMP machines typically belong to the UMA category. Designs using a bus become less practical as the number of processors increases.

NUMA (Non-Uniform Memory Access), or distributed multiprocessor: each processor has access to all memory, but some access is indirect. Often memory is distributed and associated with each processor, with a uniformly accessed virtual memory composed of all the distributed segments.

Multicomputers

Commonly called distributed-memory computers, these have multiple CPUs, each having exclusive access to a certain segment of memory. Multicomputers can be symmetrical or asymmetrical. Symmetrical multicomputers are typically a collection of identical compute nodes.

Asymmetric vs. symmetric multicomputers

An asymmetrical multicomputer is a cluster of nodes with differentiated tasks and resources. It is usually composed of one or more front-end computers, called head nodes or login nodes, and multiple back-end compute nodes. The head node typically handles program launching and supervision, I/O, and network connections, as well as user-interface activities such as software development.

Asymmetrical multicomputers are often programmed using either the SPMD or the MPMD model. In the SPMD case the program usually detects whether it is running on the head node or a compute node and behaves accordingly.

The supercomputer clusters that have dominated the supercomputer industry for the last 15-20 years are examples of asymmetrical multicomputers: many compute nodes and relatively few nodes for I/O, cluster supervision, and program development. Many small Beowulf clusters are symmetric multicomputers (or very nearly so).