Programmation Concurrente (SE205)


Programmation Concurrente (SE205) CM1 - Introduction to Parallelism Florian Brandner & Laurent Pautet LTCI, Télécom ParisTech, Université Paris-Saclay

Outline

Course Outline
CM1: Introduction
- Forms of parallelism
- Computer architectures exploiting parallelism
- Parallel programming paradigms
- Amdahl's law and dependencies
CM2: The shared-memory model
CM3-5: Concurrent programming with POSIX/Java (L. Pautet)
CM6: Actor-based programming (me)
CM7: Patterns and Algorithms (L. Pautet)
CM8: Transactional memory (me)

Organization
Continuous assessment: 25%
- Two multiple-choice tests (QCMs), in the first 10 minutes of TD2 & TD3 (be on time!)
- Extended TP later on (probably April)
Exam: 75%
- All course material allowed
Course material: https://se205.wp.mines-telecom.fr/
I like group exercises...

Introduction

Group Exercise: Forms of Parallelism
What is parallelism?
- Form groups of 2-3 persons
- Discuss what forms of parallelism exist in computer science
- Think about:
  - Computer architectures/hardware
  - Programming languages/paradigms
  - Granularity/level of parallelism (what is parallelized?)
You have 5 minutes.

Forms of Parallelism: A View on Computer Architectures
- Computers perform computations
- Parallel computers perform (some) computations in parallel
- Capabilities vary between computer architectures (which, of course, evolved over time)

ld r1, (a)
ld r2, (b)
add r3, r1, r2
st (c), r3

A sequential, non-parallel processor (SISD): the instructions execute one after another over time.

Pipelining (80s)
- Computers execute instructions to perform a computation
- Decompose each instruction into steps:
  - Read the instruction from memory
  - Read the instruction's operands
  - Perform the computation
  - Write the result
- Perform these steps in parallel for different instructions

ld r1, (a)
ld r2, (b)
add r3, r1, r2
st (c), r3

A simple pipelined processor overlaps the steps of successive instructions (still SISD).

Instruction-Level Parallelism (ILP, 90s)
- Execute (independent) instructions in parallel (we will come back to dependencies later today)
- Implementations: super-scalar or Very Long Instruction Word (VLIW)
  - Hardware detects independence online (super-scalar) or software detects it offline (VLIW)
- Usually combined with pipelining

ld r1, (a)
ld r2, (b)
add r3, r1, r2
st (c), r3

A processor exploiting instruction-level parallelism (MIMD).

Data-Level Parallelism (70s and again 90s)
- Computers operate on data
- Data-parallel machines operate on many data items at once
- Implementations:
  - Vector machines (supercomputers of the 70s)
  - SIMD extensions (PCs since the 90s)
  - GPGPUs (still developing)

ldv r1:r4, (a:a+3)
ldv r5:r8, (b:b+3)
addv r9:r12, r1:r4, r5:r8
stv (c:c+3), r9:r12

A processor with vector support (SIMD): each vector instruction processes four data items at once.
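To connect this to source code, here is a minimal C sketch (not from the slides; the function name and the mention of GCC are illustrative assumptions) of a loop whose iterations are independent, so a vectorizing compiler can map it to SIMD instructions much like the ldv/addv/stv sequence above.

#include <stddef.h>

/* Element-wise addition: every iteration is independent of the others,
 * so the compiler may process several array elements per instruction
 * (e.g., with gcc -O3, which enables auto-vectorization). */
void vector_add(const int *a, const int *b, int *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

Whether vectorization actually happens depends on the compiler, its flags, and the target; the sketch only shows the kind of loop that SIMD hardware is designed for.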

Flynn's Taxonomy
[Diagram: the four classes SISD, SIMD, MISD, and MIMD, each drawn as an instruction pool and a data pool feeding one or more processing units (PU). Image source: Wikipedia]

Flynn's Taxonomy (2)

              Single instruction   Multiple instruction
Single data   SISD                 MISD
Multiple data SIMD                 MIMD

SISD (Single Instruction, Single Data): a non-parallel computer
MISD (Multiple Instruction, Single Data): rather exotic model, sometimes found in safety-critical systems (airplanes)
SIMD (Single Instruction, Multiple Data): a vector computer
MIMD (Multiple Instruction, Multiple Data): a computer exploiting instruction-level parallelism (ILP)

Most modern computers (PCs) are a mix of the SIMD and MIMD models.

Beyond Instructions

Thread-Level Parallelism (60s)
- Also called task-level parallelism
- Execution of several threads (or programs) in parallel
- Each thread represents its own stream of instructions (which may of course be executed in parallel themselves)
- Threads may have private data, may share data, and may need to coordinate with each other
- Requires much more involvement from the programmer
- Interacts with programming languages and models
- Still heavily researched today, after 50 years!
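As a concrete preview of CM3-5, here is a minimal POSIX-threads sketch (illustrative, not from the slides; all names are invented for the example). Two threads sum private halves of a shared array, and the main thread coordinates with them through pthread_join.

#include <pthread.h>
#include <stdio.h>

#define N 1000000

static int data[N];        /* shared data                       */
static long partial[2];    /* one result slot per thread        */

/* Each thread sums its own half of the array (private work);
 * the main thread later combines the partial results. */
static void *sum_half(void *arg) {
    long id = (long)arg;
    long s = 0;
    for (long i = id * (N / 2); i < (id + 1) * (N / 2); i++)
        s += data[i];
    partial[id] = s;
    return NULL;
}

int main(void) {
    pthread_t t[2];
    for (long i = 0; i < N; i++) data[i] = 1;
    for (long id = 0; id < 2; id++)
        pthread_create(&t[id], NULL, sum_half, (void *)id);
    for (long id = 0; id < 2; id++)
        pthread_join(t[id], NULL);   /* coordination point */
    printf("sum = %ld\n", partial[0] + partial[1]);
    return 0;
}

Compile with something like gcc -pthread. Here the join is the only coordination needed; real programs that update shared data concurrently also need synchronization, which is the subject of the shared-memory lectures.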

Implementations
Multiprocessor computers (60s): A computer with multiple processors interconnected by a bus or a simple network. The processors may or may not access the same main memory.
Multicore processors (2000s): Several processors on a single chip. Processors on the same chip usually share caches and the connection to main memory.
Clusters/grids/distributed computers (70s): Several (often thousands of) computers interconnected by a network. Each computer has its own memory.

Models of Parallel Programming
Shared-memory model: Typically multicore or multiprocessor computers where all processors access the same main memory. Threads coordinate by accessing shared data in the shared main memory space. Subject of CM2-5 & 8.
Message-passing model: Typically used for parallel systems using a network (e.g., clusters, grids). Threads coordinate by exchanging messages. Subject of CM6-7.
The programming model is somewhat implied by the computer architecture.
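For contrast with the threads sketch above, here is a minimal message-passing sketch (again illustrative; the course's message-passing material in CM6-7 uses actors rather than raw POSIX pipes). Two processes with separate address spaces coordinate only by sending a message through a pipe, never through shared variables.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];
    if (pipe(fd) == -1) return 1;

    if (fork() == 0) {               /* child: the "worker" process */
        close(fd[0]);
        const char *msg = "result=42";
        write(fd[1], msg, strlen(msg) + 1);
        close(fd[1]);
        return 0;
    }

    /* parent: waits for the worker's message */
    close(fd[1]);
    char buf[32];
    read(fd[0], buf, sizeof buf);
    close(fd[0]);
    wait(NULL);
    printf("received: %s\n", buf);
    return 0;
}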

Group Exercise: Why Parallel Programming?
Discuss in groups of 2-3:
- Why should we care about parallel programming? What does it bring us?
- Why should we teach it?
- What issues might be relevant with regard to teaching parallel programming?

Dependencies

Amdahl's Law
Has to be mentioned in every lecture on parallelism: the speedup S(n) obtained when parallelizing over n processors is governed by the fraction B of strictly sequential code:

S(n) = 1 / (B + (1 - B) / n)

[Plot: speedup (up to 20x) versus number of processors (1 to 65536) for parallel portions of 50%, 75%, 90%, and 95%; every curve levels off at 1/B.]
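A small C sketch (illustrative only) that evaluates this formula makes the levelling-off concrete: with B = 10% sequential code, the speedup can never exceed 1/B = 10, no matter how many processors are added.

#include <stdio.h>

/* Speedup predicted by Amdahl's law for a sequential fraction B
 * of the program and n processors. */
static double speedup(double B, double n) {
    return 1.0 / (B + (1.0 - B) / n);
}

int main(void) {
    /* B = 0.10: 10% of the code is strictly sequential. */
    for (int n = 1; n <= 65536; n *= 4)
        printf("n = %6d  S(n) = %5.2f\n", n, speedup(0.10, (double)n));
    return 0;
}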

Amdahl's Law: What Does It Mean?
- Nice speedups are possible (that's the good news)
- Throwing processors at a problem is not always helpful: even with a modest amount of sequential code, the speedup levels off quickly
- Algorithms have to be designed to be parallel
- Programs have to be written to be parallel
- Programmers need to know how to do this (and tools have to be available to help them)
- You have to know what dependencies are

Dependencies
A somewhat simplified definition:
Data dependence: The result obtained from one computation is needed in order to perform another computation.
Control dependence: The outcome of one computation determines whether another computation is performed or not.
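The next slide illustrates data dependencies on a small function; for control dependence, a tiny fragment (not from the slides, names invented for the example) could look like this:

/* Illustrative only: a carries a data dependence on x, and the
 * assignment inside the if carries a control dependence on the test. */
int bar(int x) {
    int a = x * 2;      /* data dependence: computing a needs x        */
    int b = 0;
    if (a > 10)         /* control dependence: this test decides       */
        b = a - 10;     /*   whether this assignment runs at all       */
    return b;
}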

Example: Dependencies

int foo(int x) {
    int a = x >> 3;
    int b = x & 7;
    int c = a + b;
    return c;
}

- c cannot be computed before a or b: true data dependence
- a and b are independent: no data dependence, so they can be computed in parallel

Dependence Graphs
Representation of dependencies as a graph G = (V, E):
- V: the nodes of the graph, each representing a computation step
- E: the edges, pairs of nodes (a, b) ∈ V × V
Example from before:
V = {x, a, b, c, ret}
E = {(x, a), (x, b), (a, c), (b, c), (c, ret)}

Drawing Dependence Graphs
[Graph: nodes "int x", "a = x >> 3", "b = x & 7", "c = a + b", and "return c"; edges from x to a and to b, from a and from b to c, and from c to return c.]

Dependencies and Parallelism
How do dependencies constrain parallelism? Stepping through the dependence graph level by level:
- Execute the two operations without dependencies in parallel
- The next three operations can be executed in parallel
- The next operations can only be executed sequentially (no parallelism)
- The next four operations can be executed in parallel

Forms of Data Dependencies
True dependence (aka read-after-write dependence, RAW): The value produced by the source a of the dependence is used by the sink b (information flows from a to b).
Anti dependence (aka write-after-read dependence, WAR): Arises from reusing names (e.g., variables or memory locations); a value read by the source a of the dependence is overwritten by the sink b (information does not flow from a to b).
Output dependence (aka write-after-write dependence, WAW): Also arises when names are reused; a value written by the source of the dependence is overwritten by the sink.
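A short C fragment (illustrative, not from the slides) showing all three forms at once on the variable a:

void deps(int *p) {
    int a, b;
    a = p[0] + 1;   /* (1) writes a                                  */
    b = a * 2;      /* (2) reads a: true dependence (RAW) on (1)     */
    a = p[1];       /* (3) writes a: anti dependence (WAR) on (2)    */
                    /*     and output dependence (WAW) on (1)        */
    p[2] = a + b;   /* (4) reads a and b: RAW on (2) and (3)         */
}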

Loop-Carried Dependencies
Dependencies across loop iterations (for, while, do):
- Distance vectors: attached to dependencies, expressing the relation between the loop iterations reading and/or writing a value
- Typically involve array accesses or pointers
Example:

void foo(int *v, int n) {
    for (int i = 1; i < n; i++) {
        int a = v[i];
        int b = v[i - 1]; // written by the previous iteration!
        v[i] = a + b;
    }
}

Dependence Graph with Distance Vectors
[Graph: nodes "a = v[i]", "b = v[i-1]", and "v[i] = a + b"; the edges show the dependencies within a single iteration of the loop (a and b flow into the assignment to v[i]).]

Dependence Graph with Distance Vectors
[Graph: the same nodes with loop-carried edges through v, annotated with distances +1 and -1; dependencies across loop iterations (focusing on v).]

Dependence Graph with Distance Vectors
[Graph: the same nodes with additional edges marked "??": dependencies you might imagine, but that are in fact not there.]

Group Exercise: Parallelism in Loops
Discuss in groups of 2-3:
- Can the loop from before be parallelized?
- Which forms of parallelism are available? (SIMD, MIMD, thread-level?)
- What role do the dependencies play?
[Graph: the dependence graph from before, with loop-carried distances +1 and -1.]

Iteration Space Graphs
Represent dependencies between loop iterations:
- Points represent iterations
- Arrows represent dependencies between them
[Graph: iterations i=1 through i=10 of the example from before, with arrows for the loop-carried dependencies between consecutive iterations.]

Iteration Space Graphs of Nested Loops
Iteration space graphs also work in higher dimensions:
[Graph: a grid of iterations for i=1..8 and j=1..8, with arrows for the dependencies between them.]
Lots of (data-level) parallelism is immediately visible.

Iteration Space Graphs of Nested Loops
[Graph: the same iteration space, rotated.]
The parallelism is also immediately visible when we rotate the matrix.

Iteration Space Graphs of Nested Loops (2)
A matching program for the iteration space graph from before:

void foo(int v[10][10]) {
    for (int i = 1; i < 9; i++) {
        for (int j = 1; j < 8; j++) {
            v[i+1][j+2] = v[i][j] + 1;
        }
    }
}

Summary
Forms of parallelism:
- Pipelining / instruction-level parallelism (MIMD)
- Data-level parallelism (SIMD)
- Thread-level parallelism
- Most modern computers can exploit all three forms
Amdahl's Law: the amount of sequential code limits the speedup.
Dependencies:
- Impose an ordering on the execution of operations
- Data dependencies: RAW, WAR, WAW
- Control dependencies
- Loop-carried dependencies and distance vectors
- Iteration space graphs

Today's TD
Explore data-level parallelism and its relation to dependencies:
- Reason about dependencies in programs (rather interactive)
- Play with vectorization, i.e., the compiler extracting SIMD-like data parallelism from loops
- Some practical considerations with caches...