Introduction to Parallel Programming

David Lifka, lifka@cac.cornell.edu
May 23, 2011, www.cac.cornell.edu

What is Parallel Programming?
Using more than one processor or computer to complete a task:
- Each processor works on its section of the problem (functional parallelism)
- Each processor works on its section of the data (data parallelism)
- Processors can exchange information
[Figure: an x-y grid of the problem to be solved, divided into four areas, with CPU #1 through CPU #4 each working on one area of the problem.]

Why Do Parallel Programming?
Limits of single-CPU computing:
- performance
- available memory
Parallel computing allows one to:
- solve problems that don't fit on a single CPU
- solve problems that can't be solved in a reasonable time
We can solve larger problems, solve them faster, and run more cases.

Terminology (1)
- serial code is a single thread of execution working on a single data item at any one time.
- parallel code has more than one thing happening at a time. This could be:
  - a single thread of execution operating on multiple data items simultaneously
  - multiple threads of execution in a single executable
  - multiple executables all working on the same problem
  - any combination of the above
- task is the name we use for an instance of an executable. Each task has its own virtual address space and may have multiple threads.

Terminology (2)
- node: a discrete unit of a computer system that typically runs its own instance of the operating system.
- core: a processing unit on a computer chip that is able to support a thread of execution; can refer either to a single core or to all of the cores on a particular chip.
- cluster: a collection of machines or nodes that function in some way as a single resource.
- grid: the software stack designed to handle the technical and social challenges of sharing resources across networking and institutional boundaries. The term grid also applies to the groups that have reached agreements to share their resources.

Types of Parallelism

Data Parallelism
Definition: when independent tasks can apply the same operation to different elements of the data set at the same time.
Examples:
- 2 brothers mow the lawn
- 8 farmers paint a barn
[Figure: the same operation B applied in parallel to different pieces of the data, between an initial step A and a final step C.]
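As a concrete illustration of the slide above, here is a minimal data-parallel sketch in C++ using std::thread; the array size, thread count, and the arithmetic applied to each element are illustrative assumptions, not from the slides. Every thread performs the same operation on its own chunk of the data.

// A minimal sketch of data parallelism: several threads apply the same
// operation to different chunks of one array (illustrative assumptions).
#include <thread>
#include <vector>
#include <cstdio>

int main() {
    const std::size_t n = 1'000'000;
    std::vector<double> data(n, 1.0);
    const unsigned nthreads = 4;

    auto work = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)
            data[i] = data[i] * data[i] + 1.0;   // same operation, different elements
    };

    std::vector<std::thread> threads;
    const std::size_t chunk = n / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t == nthreads - 1) ? n : begin + chunk;
        threads.emplace_back(work, begin, end);   // each thread gets its own chunk
    }
    for (auto& th : threads) th.join();

    std::printf("data[0] = %f\n", data[0]);
    return 0;
}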

Functional Parallelism
Definition: when independent tasks can apply different operations to different data elements at the same time.
Examples:
- 2 brothers do yard work (1 edges & 1 mows)
- 8 farmers build a barn
[Figure: different operations B, C, and D run in parallel between an initial step A and a final step E.]
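A minimal sketch of the same idea, again assuming C++ with std::thread: two threads apply different operations (a sum and a maximum, chosen only for illustration) to different arrays at the same time.

// A minimal sketch of functional parallelism: two threads run *different*
// operations on *different* data simultaneously (illustrative assumptions).
#include <thread>
#include <vector>
#include <numeric>
#include <algorithm>
#include <cstdio>

int main() {
    std::vector<int> a(1000, 2), b(1000, 5);
    long sum = 0;
    int maxval = 0;

    std::thread t1([&] { sum = std::accumulate(a.begin(), a.end(), 0L); });    // task 1: sum of a
    std::thread t2([&] { maxval = *std::max_element(b.begin(), b.end()); });   // task 2: max of b

    t1.join();
    t2.join();
    std::printf("sum = %ld, max = %d\n", sum, maxval);
    return 0;
}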

Task Parallelism
Definition: independent tasks perform functions but do not communicate with each other, only with a Master Process. These are often called embarrassingly parallel.
Examples:
- independent Monte Carlo simulations
- ATM transactions
[Figure: independent tasks A, B, C, and D with no connections between them.]
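The Monte Carlo example from the slide can be sketched as follows; the pi-estimation experiment, worker count, and seeds are illustrative assumptions. Each worker computes entirely on its own, and only the main thread (standing in for the master process) gathers the results.

// A minimal sketch of task parallelism: independent Monte Carlo workers
// that report only to the "master" (the main thread), never to each other.
#include <thread>
#include <vector>
#include <random>
#include <cstdio>

int main() {
    const int nworkers = 4;
    const long samples_per_worker = 1'000'000;
    std::vector<long> hits(nworkers, 0);

    auto worker = [&](int id) {
        std::mt19937_64 rng(12345 + id);                     // independent random stream per task
        std::uniform_real_distribution<double> u(0.0, 1.0);
        long count = 0;
        for (long s = 0; s < samples_per_worker; ++s) {
            double x = u(rng), y = u(rng);
            if (x * x + y * y <= 1.0) ++count;               // point landed inside the quarter circle
        }
        hits[id] = count;                                    // result goes back to the master only
    };

    std::vector<std::thread> pool;
    for (int id = 0; id < nworkers; ++id) pool.emplace_back(worker, id);
    for (auto& t : pool) t.join();

    long total = 0;
    for (long h : hits) total += h;
    double pi = 4.0 * double(total) / double(nworkers * samples_per_worker);
    std::printf("pi estimate: %f\n", pi);
    return 0;
}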

Pipeline Parallelism
Definition: each stage works on a part of the solution; the output of one stage is the input of the next. (Note: this works best when each stage takes the same amount of time to complete.)
Examples: assembly lines, computing partial sums
[Figure: a timing diagram over time steps T0-T9 in which stages A, B, and C each process item i at successive time steps, so items i, i+1, i+2, ... flow through the pipeline while the stages work concurrently.]
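A minimal two-stage pipeline sketch, assuming C++ threads connected by a small hand-written channel; the stages and the partial-sum computation are illustrative, not from the slides.

// A two-stage pipeline: stage A squares each incoming value, stage B keeps a
// running partial sum. The stages run concurrently, connected by a queue.
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <cstdio>

struct Channel {
    std::queue<int> q;
    std::mutex m;
    std::condition_variable cv;
    bool closed = false;

    void push(int v) {
        { std::lock_guard<std::mutex> lk(m); q.push(v); }
        cv.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m); closed = true; }
        cv.notify_one();
    }
    bool pop(int& v) {                       // returns false once the channel is drained
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !q.empty() || closed; });
        if (q.empty()) return false;
        v = q.front(); q.pop();
        return true;
    }
};

int main() {
    Channel a_to_b;

    std::thread stageA([&] {                 // stage A: produce squares of 1..10
        for (int i = 1; i <= 10; ++i) a_to_b.push(i * i);
        a_to_b.close();
    });

    std::thread stageB([&] {                 // stage B: consume and print partial sums
        long sum = 0; int v;
        while (a_to_b.pop(v)) {
            sum += v;
            std::printf("partial sum: %ld\n", sum);
        }
    });

    stageA.join();
    stageB.join();
    return 0;
}

In this toy version the stages keep up with each other because both are cheap; as the slide notes, a real pipeline performs best when the stages take about the same amount of time.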

Amdahl's Law
Amdahl's Law places a strict limit on the speedup that can be realized by using multiple processors.
Effect of multiple processors on run time:
  t_N = (f_p / N + f_s) * t_1
where
  f_s = serial fraction of the code
  f_p = parallel fraction of the code (f_p + f_s = 1)
  N   = number of processors
  t_1 = time to run on one processor
The corresponding speedup is S = t_1 / t_N = 1 / (f_p / N + f_s).
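The formula can be evaluated directly. The sketch below assumes a parallel fraction of 0.99 (the value used on the next slide) and a few example processor counts; both are just illustrative choices.

// Evaluate Amdahl's Law as given on the slide for a few processor counts.
#include <cstdio>

int main() {
    const double fp = 0.99;          // parallel fraction of the code (assumed)
    const double fs = 1.0 - fp;      // serial fraction of the code
    const int procs[] = {1, 2, 4, 8, 16, 64, 256};

    for (int N : procs) {
        double t_ratio = fp / N + fs;        // t_N / t_1 from the slide's formula
        double speedup = 1.0 / t_ratio;      // S = t_1 / t_N
        std::printf("N = %4d  speedup = %6.2f\n", N, speedup);
    }
    return 0;
}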

Practical Limits: Amdahl's Law vs. Reality
Amdahl's Law gives a theoretical upper limit on speedup. In reality, the situation is even worse than predicted by Amdahl's Law, due to:
- load balancing (waiting)
- scheduling (shared processors or memory)
- communications
- I/O
[Figure: speedup S versus number of processors (0 to 250) for f_p = 0.99; the measured "Reality" curve falls well below the "Amdahl's Law" curve.]

More Terminology
- synchronization: the temporal coordination of parallel tasks; it involves waiting until two or more tasks reach a specified point (a sync point) before continuing any of the tasks.
- parallel overhead: the amount of time required to coordinate parallel tasks, as opposed to doing useful work, including time to start and terminate tasks, communication, and moving data.
- granularity: a measure of the ratio of the amount of computation done in a parallel task to the amount of communication:
  - fine-grained (very little computation per communication byte)
  - coarse-grained (extensive computation per communication byte)

Performance Considerations
- Computationally or data-intensive applications have critical sections or hot spots where the majority of the application time is spent.
- Tightly coupled parallel approaches: parallel tasks must exchange data during the computation.
- Loosely coupled parallel approaches: parallel tasks can complete independently of each other.

for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
        // Perform calculation here
    } // for j
} // for i

[Figure: the iterations of the loop "for (int i = 0; i < n; i++)" (i, i+1, i+2, i+3, ..., i+n) distributed across a pool of workers.]
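One way to realize the loosely coupled case from the previous slide is to hand the outer-loop iterations to a pool of workers, as the figure suggests. The sketch below assumes C++ threads and an invented per-element calculation; the cyclic distribution of rows is just one possible assignment.

// Distribute the outer-loop iterations of the slide's nested loop across a
// small pool of worker threads (the "calculation" is an invented stand-in).
#include <thread>
#include <vector>
#include <cstdio>

int main() {
    const int n = 8, m = 4;
    const unsigned nworkers = 4;
    std::vector<double> result(n * m, 0.0);

    auto worker = [&](unsigned id) {
        for (int i = id; i < n; i += nworkers) {       // each worker takes every nworkers-th row
            for (int j = 0; j < m; j++) {
                result[i * m + j] = i * 10.0 + j;      // perform calculation here
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned id = 0; id < nworkers; ++id) pool.emplace_back(worker, id);
    for (auto& t : pool) t.join();

    std::printf("result[0] = %.1f, result[%d] = %.1f\n",
                result[0], n * m - 1, result[n * m - 1]);
    return 0;
}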

Is it really worth it to go Parallel?
Writing effective parallel applications is difficult!
- Load balance is important.
- Communication can limit parallel efficiency.
- Serial time can dominate.
Is it worth your time to rewrite your application?
- Do the CPU requirements justify parallelization?
- Is your problem really "large"?
- Is there a library that does what you need (parallel FFT, linear system solving)?
- Will the code be used more than once?

High Performance Computing Architectures

HPC Systems Continue to Evolve Over Time
[Figure: a timeline from 1970 to 2000 showing the shift from centralized big-iron systems (mainframes, mini computers, specialized parallel computers, MPPs) toward decentralized collections (RISC workstations, PCs, NOWs, clusters, grids + clusters).]

Cluster Computing Environment
[Figure: a typical cluster environment: access control, login node(s), batch schedulers, compute nodes, and file servers & scratch space.]

Flynn's Taxonomy
Classification scheme for parallel computers, by instruction stream and data stream:

                              Data Stream
                              Single    Multiple
Instruction    Single         SISD      SIMD
Stream         Multiple       MISD      MIMD

Types of Parallel Computers (Memory Model)
- Nearly all parallel machines these days are multiple instruction, multiple data (MIMD).
- A much more useful way to classify modern parallel computers is by their memory model:
  - shared memory
  - distributed memory

Shared and Distributed Memory Models
[Figure: left, several processors (P) share a single memory over a bus; right, processor-memory pairs (P, M) are connected by a network.]
Shared memory: single address space. All processors have access to a pool of shared memory; easy to build and program, good price-performance for small numbers of processors; predictable performance due to UMA. (Example: SGI Altix.) Methods of memory access: bus, crossbar.
Distributed memory: each processor has its own local memory, and message passing must be used to exchange data between processors. cc-NUMA enables a larger number of processors and a shared memory address space than SMPs; still easy to program, but harder and more expensive to build. (Example: clusters.) Methods of memory access: various topological interconnects.

Shared Memory vs. Distributed Memory
Tools can be developed to make any system appear to look like a different kind of system:
- distributed memory systems can be programmed as if they had shared memory, and vice versa
- such tools do not produce the most efficient code, but might enable portability
HOWEVER, the most natural way to program any machine is to use tools and languages that express the algorithm explicitly for the architecture.

Programming Parallel Computers
- Programming single-processor systems is (relatively) easy because they have a single thread of execution and a single address space.
- Programming shared memory systems can benefit from the single address space.
- Programming distributed memory systems is the most difficult, due to multiple address spaces and the need to access remote data.
- Both shared memory and distributed memory parallel computers can be programmed in a data parallel, SIMD fashion; they can also perform independent operations on different data (MIMD) and implement task parallelism.

Single Program, Multiple Data (SPMD)
SPMD: the dominant programming model for shared and distributed memory machines.
- One source code is written.
- The code can have conditional execution based on which processor is executing the copy.
- All copies of the code are started simultaneously and communicate and sync with each other periodically.

SPMD Programming Model
[Figure: a single source.c is replicated and runs as separate copies on Processor 0 through Processor 3.]
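The slides do not name a particular tool, but MPI is one common way to write SPMD programs for distributed memory machines; the reduction below is an illustrative assumption. Each rank runs the same source, and conditional execution on the rank decides what each copy does.

// A hedged SPMD sketch using MPI: every rank runs the same program, works on
// its own piece of data, and only rank 0 reports the combined result.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // which copy am I?
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // how many copies are running?

    int local = rank + 1;                   // each copy works on its own data
    int total = 0;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {                        // conditional execution: only rank 0 prints
        std::printf("sum over %d ranks: %d\n", size, total);
    }

    MPI_Finalize();
    return 0;
}

Such a program is typically built with an MPI wrapper compiler (for example mpicxx) and launched as several simultaneous copies, e.g. mpiexec -n 4 ./spmd_example.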

Questions?