Parallel Computing Why & How?


Xing Cai
Simula Research Laboratory / Dept. of Informatics, University of Oslo
Winter School on Parallel Computing, Geilo, January 20-25, 2008

Outline
1. Motivation
2. Parallel hardware
3. Parallel programming
4. Important concepts


Background (1)
There's an everlasting pursuit of realism in the computational sciences:
- More sophisticated mathematical models
- Smaller Δx, Δy, Δz, Δt
- Longer computation time
- Larger memory requirement

Background (2)
Traditional serial computing (single processor) has limits:
- Physical size of transistors
- Memory size and speed
- Instruction-level parallelism is limited
- Power usage, heat problems
- Moore's law will not continue forever

Background (3)
Parallel computing platforms are nowadays widely available:
- Access to HPC centers
- Local Linux clusters
- Multiple CPUs and multi-core chips in laptops
- GPUs (graphics processing units)

What is parallel computing?
Parallel computing: the simultaneous use of multiple processing units to solve one computational problem.
Plot obtained from https://computing.llnl.gov/tutorials/parallel_comp/

Why parallel computing?
- Saving time
- Solving larger problems: access to more memory, better memory performance (when programmed correctly)
- Providing concurrency
- Saving cost

List of Topics 1 Motivation 2 Parallel hardware 3 Parallel programming 4 Important concepts

Today's most powerful computer
- IBM BlueGene/L system (Lawrence Livermore Lab)
- 106,496 dual-processor nodes (PowerPC 440, 700 MHz)
- Peak performance: 596 teraflops (596 × 10^12 flops)
- Linpack benchmark: 478.2 teraflops
https://asc.llnl.gov/computing_resources/bluegenel/

Top500 list (Nov 2007) http://www.top500.org

Flynn's taxonomy
Classification of computer architectures:
- SISD (single instruction, single data): serial computers
- SIMD (single instruction, multiple data): array computers, vector computers, GPUs
- MISD (multiple instruction, single data): systolic arrays (very rare)
- MIMD (multiple instruction, multiple data): mainstream parallel computers

Classification of parallel computers
Classification from the memory perspective:
- Shared memory systems: a single global address space; SMP (symmetric multiprocessing); NUMA (non-uniform memory access); multi-core processors, i.e. CMP (chip multi-processing)
- Distributed memory systems: each node has its own physical memory; massively parallel systems; different types of clusters
- Hybrid distributed-shared memory systems

Shared memory
Advantages: user-friendly
Disadvantages: scalability
Plot obtained from https://computing.llnl.gov/tutorials/parallel_comp/

Distributed memory
Advantages: data locality (no interference), cost-effective
Disadvantages: explicit communication, explicit decomposition of data or tasks
Plot obtained from https://computing.llnl.gov/tutorials/parallel_comp/

List of Topics 1 Motivation 2 Parallel hardware 3 Parallel programming 4 Important concepts

Parallel programming models
- Threads model: easy to program (inserting a few OpenMP directives); parallelism happens behind the scenes (little user control); difficult to scale to many CPUs (NUMA, cache coherence)
- Message passing model: many programming details (MPI or PVM); better user control (data & work decomposition); larger systems and better performance
- Stream-based programming (for using GPUs)
- Some special parallel languages: Co-Array Fortran, Unified Parallel C, Titanium
- Hybrid parallel programming

OpenMP programming
OpenMP is a portable API for programming shared-memory computers:
- Existence of multiple threads
- Use of compiler directives
- Fork-join model
Plot obtained from https://computing.llnl.gov/tutorials/openmp/

OpenMP example
Inner product between two vectors: c = Σ_{i=1}^{n} a(i)·b(i)

chunk = 10;
c = 0.0;
#pragma omp parallel for \
        default(shared) private(i) \
        schedule(static,chunk) \
        reduction(+:c)
for (i=0; i < n; i++)
    c = c + (a[i] * b[i]);
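For completeness, here is a minimal self-contained program built around the same loop; the array length n, the chunk size, and the sample initialization are illustrative assumptions added here, not part of the original slide. Compile with an OpenMP-enabled compiler, e.g. gcc -fopenmp.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1000000;      /* illustrative problem size */
    int i, chunk = 10;
    double c = 0.0;
    double *a = malloc(n * sizeof(double));
    double *b = malloc(n * sizeof(double));

    for (i = 0; i < n; i++) {   /* sample data: a(i)=1, b(i)=2 */
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* parallel loop with a sum reduction on c, as on the slide */
    #pragma omp parallel for default(shared) private(i) \
            schedule(static,chunk) reduction(+:c)
    for (i = 0; i < n; i++)
        c = c + (a[i] * b[i]);

    printf("c = %g (expected %g)\n", c, 2.0 * n);
    free(a);
    free(b);
    return 0;
}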

MPI programming
- MPI (message passing interface) is a library standard
- Implementations of MPI are available on almost every major parallel platform
- Portability, good performance & functionality
- Each process has its own local memory
- Explicit message passing enables information exchange and collaboration between processes
More info: http://www-unix.mcs.anl.gov/mpi/

MPI example
Inner product between two vectors: c = Σ_{i=1}^{n} a(i)·b(i)

MPI_Comm_size (MPI_COMM_WORLD, &num_procs);
MPI_Comm_rank (MPI_COMM_WORLD, &my_rank);
/* simple block decomposition; note that it assumes n is divisible by num_procs */
my_start = n/num_procs*my_rank;
my_stop = n/num_procs*(my_rank+1);
my_c = 0.;
for (i=my_start; i<my_stop; i++)
    my_c = my_c + (a[i] * b[i]);
MPI_Allreduce (&my_c, &c, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
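Again for completeness, a minimal runnable version is sketched below; the array length, the sample initialization, and the handling of the leftover elements on the last rank are additions for illustration, not part of the original slide. Build and run with, e.g., mpicc and mpirun -np 4.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    const int n = 1000000;      /* illustrative problem size */
    int num_procs, my_rank, my_start, my_stop, i;
    double my_c = 0.0, c = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* every rank holds full copies of a and b, as in the slide's snippet */
    double *a = malloc(n * sizeof(double));
    double *b = malloc(n * sizeof(double));
    for (i = 0; i < n; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* block decomposition; the last rank also takes the remainder */
    my_start = n / num_procs * my_rank;
    my_stop  = (my_rank == num_procs - 1) ? n : n / num_procs * (my_rank + 1);

    for (i = my_start; i < my_stop; i++)
        my_c = my_c + (a[i] * b[i]);

    /* combine the partial sums from all processes */
    MPI_Allreduce(&my_c, &c, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (my_rank == 0)
        printf("c = %g (expected %g)\n", c, 2.0 * n);

    free(a);
    free(b);
    MPI_Finalize();
    return 0;
}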

Standard way of parallelization
- Identify the parts of a serial code that have concurrency
- Be aware of inhibitors to parallelism (e.g. data dependency)
- When using OpenMP: insert directives to create parallel regions
- When using MPI: decide on an explicit decomposition of tasks and/or data, and insert MPI calls
Parallel programming requires a new way of thinking.

List of Topics 1 Motivation 2 Parallel hardware 3 Parallel programming 4 Important concepts

Some useful concepts
Cost model of sending a message: t_C(L) = τ + β·L
Speed-up: S(P) = T(1) / T(P)
Parallel efficiency: η(P) = S(P) / P
Factors of parallel inefficiency:
- communication
- load imbalance
- additional calculations that are parallelization specific
- synchronization
- serial sections (Amdahl's law)
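As a rough illustration of the cost model (the numbers below are assumptions for the sake of example, not measurements from the slides): with latency τ = 10 μs and inverse bandwidth β = 1 ns per byte, a 1 MB message costs t_C ≈ 10×10^-6 + 10^-9 × 10^6 ≈ 1.01×10^-3 s. Short messages are dominated by the latency term τ, long messages by the bandwidth term β·L, which is one reason batching many small messages into fewer large ones usually pays off.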

Amdahl's law
The upper limit of speed-up:
S(P) = T(1) / T(P) ≤ T(1) / ((f_s + f_p/P)·T(1)) = 1 / (f_s + (1 - f_s)/P) < 1/f_s
where f_s is the fraction of the code that is serial (not parallelizable) and f_p = 1 - f_s is the parallelizable fraction.
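A quick worked example with illustrative numbers (not from the slides): if f_s = 0.1, then on P = 16 processors the bound gives S(16) ≤ 1 / (0.1 + 0.9/16) = 6.4, and no matter how many processors are added the speed-up can never exceed 1/f_s = 10.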

Gustafson's law
- Things are normally not as bad as Amdahl's law suggests
- Normalize the parallel execution time to be 1
- Scaled speed-up: S_s(P) = (f_s + P·f_p) / (f_s + f_p) = f_s + P·(1 - f_s) = P + (1 - P)·f_s
- f_s is normally small
- f_s normally decreases as the problem size grows
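With the same illustrative f_s = 0.1 (now interpreted as the serial fraction of the normalized parallel execution time), P = 16 processors give a scaled speed-up of S_s(16) = 16 + (1 - 16)·0.1 = 14.5, considerably better than the Amdahl bound above because the problem size is assumed to grow with P.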

Granularity
Granularity is a qualitative measure of the ratio of computation over communication.
- Fine-grain parallelism: small amounts of computation between communication; load imbalance may be a less important issue
- Coarse-grain parallelism: large amounts of computation between communication; high ratio of computation over communication
Objective: design coarse-grain parallel algorithms, if possible.

Summary
- We're already in the age of parallel computing
- Parallel computing relies on parallel hardware
- Parallel computing needs parallel software
- So parallel programming is very important: a new way of thinking; identification of parallelism; design of parallel algorithms; implementation can be a challenge