Parallel Computers

Material based on B. Wilkinson and M. Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Prentice Hall, 1998. (c) 2002-2004 R. Leduc

Why Parallel Computing? Many areas require great computational speed, e.g. numerical modelling and simulation of scientific and engineering problems. These applications perform repetitive computations on large amounts of data and must complete in a reasonable time. In manufacturing, engineering calculations and simulations must take only seconds or minutes; a simulation that takes two weeks is too long, since a designer needs a quick answer in order to try different ideas and fix errors.

Some problems have a specific deadline, e.g. weather forecasting. Grand challenge problems, like global weather forecasting and modelling large DNA structures, are problems that cannot be handled in a reasonable time by today's computers. Such problems are always pushing the envelope.

N-body Problem: Each body is attracted to every other body by gravitational forces, so predicting the motion of astronomical bodies in space requires a large number of calculations. The forces can be calculated and the movement of each body predicted, which requires computing the total force acting on each body. For N bodies, there are N - 1 forces to calculate per body, or approximately N^2 calculations in total. A galaxy might have 10^11 stars; that is about 10^22 calculations!

Assuming each calculation takes 10^-6 seconds, even an efficient N log2 N approximate algorithm would take almost a year! Split the computation across 1000 processors, and that time could drop to about 9 hours. It is a lot easier to get 1000 processors than to build one processor 1000 times as fast.
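A minimal sketch of the arithmetic behind these figures, using the constants just given (10^22 force calculations for the direct method, 10^-6 s per calculation, roughly one year of serial time for the N log2 N method, 1000 processors); the variable names are illustrative only:

    #include <stdio.h>

    int main(void) {
        const double n_calcs_direct = 1e22;  /* ~N^2 force calculations for N = 10^11 bodies */
        const double t_per_calc     = 1e-6;  /* seconds per force calculation                */
        const double secs_per_year  = 3600.0 * 24.0 * 365.0;

        /* Direct all-pairs method: 10^22 calculations at 10^-6 s each. */
        double t_direct = n_calcs_direct * t_per_calc;
        printf("Direct N^2 method: %.1e s = %.1e years\n",
               t_direct, t_direct / secs_per_year);

        /* Efficient N log2 N method: roughly one year of serial time,
           divided ideally over 1000 processors.                        */
        double t_parallel = secs_per_year / 1000.0;
        printf("N log2 N method on 1000 processors: about %.0f hours\n",
               t_parallel / 3600.0);
        return 0;
    }

Running this prints roughly 3 x 10^8 years for the direct method and about 9 hours for the parallel case, matching the estimates above.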

Figure 1.1 Astrophysical N-body simulation by Scott Linssen (an undergraduate student at the University of North Carolina at Charlotte [UNCC]).

Parallel Computers: A parallel computer consists of multiple processors operating together to solve a single problem. This provides an effective and relatively inexpensive means of solving problems requiring large computational speed. To use a parallel computer, one must split the problem into parts, each performed on a separate processor in parallel. Parallel programming is the art of writing programs of this form. The idea is that n processors can provide up to n times the speed. This is the ideal situation, and it is rarely achieved in practice.
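To make "up to n times the speed" precise, the usual measures are speedup S(n) = t_serial / t_parallel and efficiency E = S(n) / n. The short sketch below computes both for a made-up timing, purely to illustrate the definitions:

    #include <stdio.h>

    int main(void) {
        /* Hypothetical timings, for illustration only. */
        double t_serial   = 120.0; /* execution time on one processor (s) */
        double t_parallel = 7.5;   /* execution time on n processors (s)  */
        int    n          = 20;    /* number of processors                */

        double speedup    = t_serial / t_parallel; /* ideally equals n    */
        double efficiency = speedup / n;           /* ideally equals 1.0  */

        printf("Speedup:    %.2f (ideal: %d)\n", speedup, n);
        printf("Efficiency: %.2f (ideal: 1.00)\n", efficiency);
        return 0;
    }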

Problems cannot always be divided perfectly into independent parts; interaction is required for data transfer and synchronization, which adds overhead. Parallel computers also offer the advantage of more memory: the aggregate memory is larger than the memory available to a single processor. Because of the increase in speed and memory, parallel computers often allow larger or more precise problems to be solved. Multiprocessor computers are becoming the norm: IBM, HP, AMD, and Intel are designing processors that can execute multiple threads/programs in parallel on a single chip.

Types of Parallel Computers: Parallel computers are either specially designed computer systems containing multiple processors, or multiple independent computers interconnected into a single system. We will discuss three types of parallel computers: shared memory multiprocessor systems, message-passing multicomputers, and distributed shared memory systems.

Shared Memory Multiprocessor Systems: A conventional computer consists of a single processor executing a program stored in memory. Each memory location has an address from 0 to 2^n - 1, where the address has n bits. See Figure 1.2.

Figure 1.2 Conventional computer having a single processor and memory. [Figure: main memory connected to the processor; instructions flow to the processor, data flows to or from the processor.]

A multiprocessor system extends this by having multiple processors and multiple memory modules connected through an interconnection network (see Figure 1.3). Each processor can access each memory module. This is called a shared memory configuration, and it employs a single address space: each memory location has a unique address, and all processors use the same addresses.

Figure 1.3 Traditional shared memory multiprocessor model. [Figure: processors connected through an interconnection network to memory modules that together form one address space.]

Programming Shared Memory Multiprocessor Systems: Each processor has its own executable code stored in memory to execute, and the data for each processor is stored in memory and is thus accessible to all. One option is to use a parallel programming language with special constructs and statements, e.g. Fortran 90 or High Performance Fortran (see chapter 13 of High Performance Computing); these rely on compilers. Another option is threads: a multi-threaded program has a regular code sequence for each processor, and the threads communicate through shared memory locations. We will examine the POSIX standard Pthreads.
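As a preview of the Pthreads style of shared memory programming, here is a minimal sketch: two threads update a shared counter, with a mutex synchronizing access to the shared location. The variable and function names are illustrative, not taken from the course material.

    #include <pthread.h>
    #include <stdio.h>

    /* Shared memory location, visible to all threads. */
    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);   /* synchronize access to shared data */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter); /* expect 200000 */
        return 0;
    }

Compile with cc -pthread. Without the mutex, the two threads could interleave their updates to the shared location and lose increments, which is exactly the kind of concurrent-access issue shared memory programming must manage.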

Types of Shared Memory Systems: There are two main types of shared memory multiprocessor systems: uniform memory access (UMA) and nonuniform memory access (NUMA) systems. In a UMA system, each processor can access each memory module in the same amount of time. A common example is a symmetric multiprocessing (SMP) system such as a dual-processor Pentium III computer. UMA does not scale well above about 64 processors: it is expensive to provide the same access time between many memory modules and processors due to physical distance and the number of interconnects.

NUMA Systems: NUMA systems solve this by having a hierarchical or distributed memory structure. Processors can access physically nearby memory locations faster than distant locations. Such systems can scale to hundreds and thousands of processors. (K. Dowd and C. Severance, High Performance Computing, 2nd ed., O'Reilly, 1998.)

Message-Passing Multicomputers: A shared memory multiprocessor is a specially designed computer system. Alternatively, one can create a multiprocessor by connecting complete computers through an interconnection network (see Figure 1.4). Each computer has a processor and local memory that is not accessible to the other processors; each computer has its own address space, and a processor can only access locations in its own memory.

Figure 1.4 Message-passing multiprocessor model (multicomputer). [Figure: complete computers, each with a processor and local memory, exchanging messages over an interconnection network.]

Message-Passing Multicomputers (cont.): The interconnection network is used to send messages between processors. Messages may contain instructions, synchronization information, and data that other processors need for their computations. Systems of this type are called message-passing multiprocessors or multicomputers; examples include networks of workstations (NOW) and Beowulf clusters. Message-passing multicomputers scale better than shared memory multiprocessor systems: they are cheaper and more flexible to construct, their design is more open, and they are easy to extend.

Programming Multicomputers: The problem is divided into parts intended to be executed simultaneously, one on each processor. Typically, we have multiple independent processes running in parallel to solve the problem; they may or may not be on the same processor. Messages carry data between processes as dictated by the program. We use message-passing library routines linked to sequential programs. We will examine the Message Passing Interface (MPI) libraries.
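As a preview of this style, here is a minimal MPI sketch: process 0 sends an integer to process 1, which prints it. Only standard MPI calls (MPI_Init, MPI_Comm_rank, MPI_Send, MPI_Recv, MPI_Finalize) are used; the message tag and variable names are illustrative.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Send one int to process 1, tag 0. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Receive one int from process 0. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Process 1 received %d from process 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Typically compiled with mpicc and launched with at least two processes, e.g. mpirun -np 2 ./a.out. Note that all data movement is explicit: nothing is shared, the value is copied from one address space to the other.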

Pros/Cons of the Message-Passing Model. Advantages: Universality: it can be used with multiple processors connected by a (fast or slow) communication network, i.e. either a multiprocessor or a network of workstations. Ease of debugging: the model prevents accidental overwriting of memory, since only one process can directly access a given memory location; the fact that no special mechanisms are required to control concurrent access to data can also greatly decrease execution time. Performance: associating data with a specific processor and its memory makes cache management and compilers work better, and applications can exhibit superlinear speedup.

Pros/Cons (cont.). Disadvantages: Programmers must use explicit program calls to pass messages; this is error prone and has been compared to low-level assembly language programming. Data cannot be shared; it must be copied, which is a problem if many tasks need to work on a lot of data.

Distributed Shared Memory: Gives the programming convenience of shared memory with the hardware flexibility of message-passing multicomputers. Each processor has access to the entire memory through a single common address space. Access to a memory location that is not local to a processor is done using message passing, in an automated fashion. This is called shared virtual memory. See Figure 1.5.

Figure 1.5 Shared memory multiprocessor implementation. [Figure: complete computers connected by an interconnection network carrying messages, with their memories presented as a single shared memory.]

Flynn's Computer Classifications. SISD: A single-processor computer has a single stream of instructions operating on a single stream of data; it is called a single instruction stream, single data stream (SISD) computer. MIMD: In a multiprocessor system, each processor has its own stream of instructions acting on a separate set of data; this is a multiple instruction stream, multiple data stream (MIMD) computer. See Figure 1.6. SIMD: A single program generates a single stream of instructions that are broadcast to multiple processors, which execute the same instruction in synchronism but on different data; this is a single instruction stream, multiple data stream (SIMD) computer.

Figure 1.6 MPMD structure. [Figure: two processors, each with its own program, instruction stream, and data.]
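To connect the taxonomy back to code, the sketch below contrasts the SIMD idea (one instruction stream applied in lockstep to many data elements, here a simple element-wise array operation) with the MIMD/MPMD structure of Figure 1.6, where each processor runs its own program on its own data. The function and variable names are illustrative only.

    #include <stdio.h>

    #define N 8

    /* SIMD flavour: one instruction stream (the loop body) applied to many
       data elements. On SIMD hardware, each processing element would handle
       one of the i's at the same time.                                      */
    static void simd_style_add(const float *a, const float *b, float *c) {
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];  /* same operation, different data */
    }

    int main(void) {
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        simd_style_add(a, b, c);
        for (int i = 0; i < N; i++)
            printf("%.0f ", c[i]);
        printf("\n");

        /* MIMD/MPMD flavour (Figure 1.6): each processor executes its own
           program on its own data; the MPI example earlier is the practical
           form of this, with behaviour selected by process rank.            */
        return 0;
    }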