Shared-Memory Programming Models


1 Shared-Memory Programming Models Parallel Programming Concepts Winter Term 2013 / 2014 Dr. Peter Tröger, M.Sc. Frank Feinbube

2 Cilk
C language combined with several new keywords
  Different approach to OpenMP pragmas
Developed at MIT since 1994 (!)
Initial commercial version Cilk++ with C / C++ support
Since 2010, offered by Intel as Cilk Plus
Official language specification to foster other implementations
Meanwhile maintained as GCC branch (similar to OpenMP)
Support for Windows, Linux, and MacOS X
Basic concept of serialization
  Any Cilk program compiled as concurrent code has the same execution semantics as the serial version

3 Intel Cilk Plus
Three keywords to express potential parallelism
  cilk_spawn: Asynchronous function call
    Runtime decides, spawning is not mandated
  cilk_for: Allows loop iterations to be performed in parallel
    Runtime decides, parallelization is not mandated
  cilk_sync: Wait until all calls spawned by the current function are completed
    Barrier for cilk_spawn activity
Runtime decides the level of parallelism, performs work stealing
Strand: Instruction sequence between two changes of parallelism (spawn or sync)
Reducers: Lock-free private views on variables
Notation for SIMD array operations and SIMD functions
Serialization: Cilk keywords become ordinary statements, code semantics are expected to remain the same

4 Intel Cilk Plus
Strand concept makes it possible to express every program as directed acyclic graph (DAG)
[Figure: DAG of a recursive fib() call. Node labels: cilk_spawn, fib(n-2), fib(n-1), strand, continuation / strand, cilk_sync, return x+y, implicit cilk_sync. Source: cilkplus.org]
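The fib() call graph named on this slide can be sketched in code. Standard C++ compilers do not know the Cilk keywords, so this sketch defines cilk_spawn and cilk_sync as empty macros. That is an assumption made here to keep the example compilable everywhere, and it yields exactly the serialization of the program: no parallelism takes place. Which of the two recursive calls is spawned is also a choice of this sketch, not something the slide states.

```cpp
#include <cassert>

// Stand-in definitions (assumption of this sketch): with the keywords elided,
// the program is its own serialization. A Cilk Plus compiler would supply the
// real keywords instead of these macros.
#define cilk_spawn
#define cilk_sync

// Recursive Fibonacci written in Cilk style.
long fib(int n) {
    assert(n >= 0);
    if (n < 2) return n;
    long x = cilk_spawn fib(n - 1);  // spawned strand, may run in parallel
    long y = fib(n - 2);             // continuation strand
    cilk_sync;                       // join: x must be complete before it is read
    return x + y;                    // final strand of this call
}
```

Calling fib(10) returns 55 in the serialized form, and a race-free Cilk program is expected to return the same value when the keywords are real.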

5 Intel Cilk Plus
[Figure only, not transcribed. Source: cilkplus.org]

6 Intel Cilk Plus
Accumulator / reduction algorithm
  Compute one result value by updating it with every computational step (that may be parallelized)
  Same reduction concept as with OpenMP and others
  Problem of avoiding data races
[software.intel.com]

7 Intel Cilk Plus
Express accumulated result as reducer pointer variable so that concurrent updates are synchronized automatically
Results of parallel reducer operations are promised to match the serial ordering
[software.intel.com]

9 Intel Cilk Plus
Parallel tree search
  Resulting list is always in-order: left subtree, root, right subtree
  Stable semantics regardless of parallelization
[software.intel.com]

10 Intel Cilk Plus
Predefined reducers for C and C++, custom reducers supported
Optimized internal operation based on strands concept
  Each strand gets a private view on the reducer variable
  No locking during update
  When strands join again, the reducer merges the operations
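The private-view idea can be imitated with plain C++ threads. The following is a minimal sketch, not the Cilk runtime's mechanism: it assumes one view per worker thread with a fixed block of the input, whereas Cilk creates views on demand. The function name concat_with_private_views is hypothetical. String concatenation is used because it is not commutative, so the result shows whether the serial ordering survives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Concatenate all words using one private view per worker thread.
std::string concat_with_private_views(const std::vector<std::string>& words,
                                      std::size_t workers) {
    assert(workers > 0);
    std::vector<std::string> views(workers);  // private views, one per worker
    std::vector<std::thread> threads;
    const std::size_t chunk = (words.size() + workers - 1) / workers;

    for (std::size_t w = 0; w < workers; ++w) {
        threads.emplace_back([&, w] {
            const std::size_t begin = std::min(w * chunk, words.size());
            const std::size_t end = std::min(begin + chunk, words.size());
            for (std::size_t i = begin; i < end; ++i)
                views[w] += words[i];  // only this worker touches views[w]: no lock
        });
    }
    for (auto& t : threads) t.join();  // the workers join again

    std::string result;
    for (const auto& view : views) result += view;  // merge views in serial order
    return result;
}
```

Because each worker owns a contiguous block and the views are merged left to right, the result equals the serial concatenation regardless of how the threads were scheduled.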

11 Intel Cilk Plus
Cilk supports the high-level expression of array operations
  Gives the runtime a chance to parallelize work
  Intended for data parallel element operations without any ordering constraints
New operator [:]
  Specify data parallelism on an array
  array-expression[lowerbound : length : stride]
  Multi-dimensional sections are supported: a[:][:]
Short-hand description for complex loops
    A[:] = 5;              for (i = 0; i < 10; i++) A[i] = 5;
    A[0:n] = 5;
    A[0:5:2] = 5;          for (i = 0; i < 10; i += 2) A[i] = 5;
    A[:] = B[:];
    A[:] = B[:] + 5;
    D[:] = A[:] + B[:];
    func (A[:]);
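For readers without a Cilk Plus compiler, the meaning of a section can be spelled out as ordinary loops. This is a sketch in standard C++ with hypothetical helper names (assign_section, add_sections). It only shows which elements a section touches; it makes no claim about how a Cilk Plus compiler vectorizes the operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop equivalent of: a[lower : length : stride] = value;
// The section covers `length` elements, starting at `lower`, `stride` apart.
void assign_section(std::vector<int>& a, std::size_t lower, std::size_t length,
                    std::size_t stride, int value) {
    assert(length == 0 || lower + (length - 1) * stride < a.size());
    for (std::size_t k = 0; k < length; ++k)
        a[lower + k * stride] = value;
}

// Loop equivalent of: d[:] = a[:] + b[:];
// Every element is computed independently, so no iteration order is required.
std::vector<int> add_sections(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> d(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        d[i] = a[i] + b[i];
    return d;
}
```

With a ten-element array, assign_section(a, 0, 5, 2, 5) writes the indices 0, 2, 4, 6 and 8, which matches the slide's loop with i += 2.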

12 Intel Cilk Plus
Array notation can be used inside conditions
    if (5 == a[:]) results[:] = "Matched"; else results[:] = "Not Matched";
Function mapping is executed in parallel with no specific order
    A[:] = pow(b[:], c);
In C++, this works with any defined operator
    A[:] = B[:] + C[:]; // A[:] = operator+(B[:], C[:]);
Several predefined reduction macros applicable to array sections
    __sec_reduce_add, __sec_reduce_mul, __sec_reduce_max, __sec_reduce_min, __sec_reduce_all_zero, __sec_reduce_any_zero
Array sections can be used as array indices for gather / scatter
    C[:] = A[B[:]] (gather), A[B[:]] = C[:] (scatter)
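Gather, scatter and the add reduction can likewise be written out as plain loops. The helper names gather, scatter and reduce_add below are hypothetical and are used only to label the three operations from this slide. It is a sketch in standard C++, not the Cilk Plus built-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Loop equivalent of the gather: c[:] = a[index[:]];
std::vector<int> gather(const std::vector<int>& a,
                        const std::vector<std::size_t>& index) {
    std::vector<int> c(index.size());
    for (std::size_t i = 0; i < index.size(); ++i) {
        assert(index[i] < a.size());
        c[i] = a[index[i]];
    }
    return c;
}

// Loop equivalent of the scatter: a[index[:]] = c[:];
// If an index occurs twice, the last write wins in this serial loop.
void scatter(std::vector<int>& a, const std::vector<std::size_t>& index,
             const std::vector<int>& c) {
    assert(index.size() == c.size());
    for (std::size_t i = 0; i < index.size(); ++i) {
        assert(index[i] < a.size());
        a[index[i]] = c[i];
    }
}

// Loop equivalent of the add reduction over a whole section.
int reduce_add(const std::vector<int>& a) {
    return std::accumulate(a.begin(), a.end(), 0);
}
```

For example, gathering from {10, 20, 30, 40} with the indices {3, 0, 2} yields {40, 10, 30}.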

13 Intel Threading Building Blocks (TBB)
Portable C++ library, toolkits for different operating systems
Also available as open source version
Complements basic OpenMP / Cilk features
  Loop parallelization, parallel reduction, synchronization, explicit tasks
High-level concurrent containers
  hash map, queue, vector, set
High-level parallel operations
  prefix scan, sorting, data-flow pipelining, deterministic reduce
Unfair scheduling approach, to favor threads having data in cache
Support for cache-aware memory allocation
Comparable: Microsoft C++ Concurrency Runtime

14 Intel Math Kernel Library (MKL)
Intel library with hand-optimized functions for...
  Highly vectorized and threaded linear algebra
    Basic Linear Algebra Subprograms (BLAS) API, conforms to de-facto standards in high-performance computing
    Vector-vector, matrix-vector, matrix-matrix operations
  Fast Fourier transforms (FFT)
    Single precision, double precision, complex, real,...
  Vector math and statistics functions
    Random number generators and probability distributions
    Spline-based data fitting
C or Fortran API calls
Hand-tuned code that typically outperforms automated compiler optimization

15 Easy Mappings [Dig]
                             Oracle Java               Intel TBB                  MS .NET TPL
Parallel For                 ParallelArray             parallel_for               Parallel.For
Concurrent Collections       ConcurrentHashMap,...     concurrent_hash_map,...
Atomic Classes               AtomicInteger,...         atomic<T>                  Interlocked
ForkJoin Task Parallelism    ForkJoinTask framework    task                       Task, ReplicableTask

16 Lock-Free Programming
Lock-free programming as a way of sharing data without maintaining locks
  Prevents deadlock and live-lock conditions
  Goal: Suspension of one thread never prevents another thread from making progress (e.g. synchronized shared queue)
  Blocking by design does not disqualify the lock-free realization
Algorithms rely on hardware support for atomic operations
  Read-Modify-Write (RMW) operations
  Compare-And-Swap (CAS) operations
These operations are typically made available through operating system or compiler APIs

17 Lock-Free Programming
    void LockFreeQueue::push(Node* newhead)
    {
        for (;;)
        {
            // Copy a shared variable (m_head) to a local.
            Node* oldhead = m_head;

            // Do some speculative work, not yet visible to other threads.
            newhead->next = oldhead;

            // Next, attempt to publish our changes to the shared variable.
            // If the shared variable hasn't changed, the CAS succeeds and we return.
            // Otherwise, repeat.
            if (_InterlockedCompareExchange(&m_head, newhead, oldhead) == oldhead)
                return;
        }
    }
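The same CAS retry loop can be written portably with std::atomic instead of a Windows intrinsic. This is a sketch under assumptions: the class name LockFreeList, the Node layout and the helper push_concurrently are hypothetical. Only push is implemented, since a lock-free pop raises further issues (such as ABA) that the slide does not cover.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Node {
    int value = 0;
    Node* next = nullptr;
};

class LockFreeList {
public:
    // Publish new_head as the new list head with a CAS retry loop.
    void push(Node* new_head) {
        assert(new_head != nullptr);
        Node* old_head = head_.load();  // copy the shared variable to a local
        do {
            new_head->next = old_head;  // speculative work, not yet visible
            // On failure, compare_exchange_weak stores the current head
            // into old_head, so the loop retries with fresh data.
        } while (!head_.compare_exchange_weak(old_head, new_head));
    }

    // Count the nodes; only meaningful once all pushing threads have finished.
    std::size_t size() const {
        std::size_t count = 0;
        for (const Node* n = head_.load(); n != nullptr; n = n->next) ++count;
        return count;
    }

private:
    std::atomic<Node*> head_{nullptr};
};

// Let several threads push preallocated nodes and return the final node count.
std::size_t push_concurrently(std::size_t thread_count, std::size_t nodes_per_thread) {
    std::vector<Node> storage(thread_count * nodes_per_thread);
    LockFreeList list;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < thread_count; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = 0; i < nodes_per_thread; ++i)
                list.push(&storage[t * nodes_per_thread + i]);
        });
    }
    for (auto& w : workers) w.join();
    return list.size();
}
```

If no push is lost, push_concurrently(4, 1000) returns 4000. A version without the CAS, using a plain load followed by a plain store, could drop nodes under contention.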

18 Sequential Consistency
Consistency model where the order of memory operations is consistent with the source code
  Important for lock-free algorithm semantics
  Not guaranteed by some processor architectures (e.g. PowerPC)
Java and C++ support the enforcement of sequential consistency

    std::atomic<int> X(0), Y(0);
    int r1, r2;
    void thread1() { X.store(1); r1 = Y.load(); }
    void thread2() { Y.store(1); r2 = X.load(); }

r1 and r2 never become zero at the same time
Compiler generates additional memory fences and RMW operations
Accesses to ordinary (non-atomic) variables are only ordered relative to the atomic operations and can otherwise still be re-ordered by the compiler or the processor
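The claim about r1 and r2 can be checked with a small harness. The sketch below runs the two-thread pattern repeatedly and counts how often both loads returned zero. The function names both_zero_once and count_both_zero are hypothetical. It relies on store() and load() of std::atomic defaulting to std::memory_order_seq_cst, which forbids the both-zero outcome.

```cpp
#include <atomic>
#include <thread>

// Run the store/load pattern once. Returns true if both threads read zero.
bool both_zero_once() {
    std::atomic<int> X(0), Y(0);
    int r1 = -1, r2 = -1;
    std::thread t1([&] { X.store(1); r1 = Y.load(); });  // seq_cst by default
    std::thread t2([&] { Y.store(1); r2 = X.load(); });  // seq_cst by default
    t1.join();  // join makes the writes to r1 and r2 visible here
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeat the experiment and count the forbidden outcome.
int count_both_zero(int runs) {
    int hits = 0;
    for (int i = 0; i < runs; ++i)
        if (both_zero_once()) ++hits;
    return hits;
}
```

With the default ordering the count must be 0. Passing std::memory_order_relaxed to all four operations would permit both loads to return zero, although whether it is observed depends on hardware and timing.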

19 Functional Programming
Programming paradigm that treats execution as function evaluation -> map some input to some output
Contrary to imperative programming
  No longer focus on statement execution for state modification
  Programmer no longer specifies control flow explicitly
Side-effect free computation through avoidance of local state -> referential transparency (no demand for some control flow)
Typically strong focus on immutable data as language default -> instead of altering values, return altered copy
One foundation: Alonzo Church's lambda calculus from the 1930s
First functional language was Lisp (late 50s)
Trend to add functional programming features into imperative languages

20 Imperative to Functional
    alert("i'd like some Spaghetti!");
    alert("i'd like some Chocolate Moose!");
Optimize:
    function SwedishChef( food )
    {
        alert("i'd like some " + food + "!");
    }
    SwedishChef("Spaghetti");
    SwedishChef("Chocolate Moose");

    alert("get the lobster");
    PutInPot("lobster");
    PutInPot("water");
    alert("get the chicken");
    BoomBoom("chicken");
    BoomBoom("coconut");
Optimize:
    function Cook( i1, i2, f ) {
        alert("get the " + i1);
        f(i1); f(i2);
    }

    Cook( "lobster", "water", PutInPot);
    Cook( "chicken", "coconut", BoomBoom);
Anonymous Function:
    function Cook( i1, i2, f ) {
        alert("get the " + i1);
        f(i1); f(i2);
    }

    Cook( "lobster", "water",
          function(x) { alert("pot " + x); } );
    Cook( "chicken", "coconut",
          function(x) { alert("boom " + x); } );

21 Imperative to Functional
map() does not demand particular operation ordering
    var a = [1,2,3];
    for (var i = 0; i < a.length; i++) {
        a[i] = a[i] * 2;
    }

    for (var i = 0; i < a.length; i++) {
        alert(a[i]);
    }

    function map(fn, a)
    {
        for (var i = 0; i < a.length; i++)
        {
            a[i] = fn(a[i]);
        }
    }
    map( function(x){return x*2;}, a );
    map( alert, a );
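The map() idea carries over to C++. The sketch below uses the hypothetical name map_values and, unlike the JavaScript version on the slide, returns a new vector instead of overwriting its argument. That is a deliberate deviation toward the immutable style described earlier in the deck.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Apply fn to every element and return the results as a new vector.
// Each element is transformed independently of the others.
std::vector<int> map_values(const std::function<int(int)>& fn,
                            const std::vector<int>& a) {
    assert(fn != nullptr);  // an empty std::function cannot be called
    std::vector<int> result(a.size());
    std::transform(a.begin(), a.end(), result.begin(), fn);
    return result;
}
```

Doubling {1, 2, 3} with map_values gives {2, 4, 6}. Since no element depends on another, the calls to fn could be distributed over threads without changing the result, provided fn is free of side effects.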

22 Imperative to Functional
    function sum(a) {
        var s = 0;
        for (var i = 0; i < a.length; i++)
            s += a[i];
        return s;
    }
    function join(a) {
        var s = "";
        for (var i = 0; i < a.length; i++)
            s += a[i];
        return s;
    }
    alert(sum([1,2,3]));
    alert(join(["a","b","c"]));
map() and reduce() functions do not demand particular operation ordering (for reduce(), provided the combining function is associative)
    function reduce(fn, a, init){
        var s = init;
        for (var i = 0; i < a.length; i++)
            s = fn( s, a[i] );
        return s;
    }

    function sum(a){
        return reduce( function(a, b){ return a + b; }, a, 0 );
    }

    function join(a){
        return reduce( function(a, b){ return a + b; }, a, "" );
    }
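A C++ counterpart of reduce(), with sum and join built on top of it, might look as follows. The name reduce_values is hypothetical and was chosen to avoid confusion with std::reduce. The sketch folds strictly from left to right, so it shows the serial semantics that any parallel version has to reproduce.

```cpp
#include <string>
#include <vector>

// Fold the elements of a into one value, starting from init, left to right.
template <typename T, typename Fn>
T reduce_values(Fn fn, const std::vector<T>& a, T init) {
    T s = init;
    for (const T& x : a)
        s = fn(s, x);
    return s;
}

// Sum of all elements, expressed through reduce_values.
int sum(const std::vector<int>& a) {
    return reduce_values([](int x, int y) { return x + y; }, a, 0);
}

// Concatenation of all strings, expressed through reduce_values.
std::string join(const std::vector<std::string>& a) {
    return reduce_values(
        [](const std::string& x, const std::string& y) { return x + y; },
        a, std::string());
}
```

sum({1, 2, 3}) gives 6 and join({"a", "b", "c"}) gives "abc". Integer addition is associative and commutative, so partial sums may be combined in any grouping. String concatenation is associative only, so a parallel join must keep the partial results in their original order.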

23 Imperative to Functional - Python
    # Nested loop procedural style for finding big products
    xs = (1,2,3,4)
    ys = (10,15,3,22)
    bigmuls = []
    for x in xs:
        for y in ys:
            if x*y > 25:
                bigmuls.append((x,y))
    print(bigmuls)

    # Functional style: the same result as a list comprehension
    print([(x,y) for x in (1,2,3,4) for y in (10,15,3,22) if x*y > 25])
[David Mertz]

24 Functional Programming
Higher order functions: Functions as argument or return value
Pure functions: No memory or I/O side effects
  If the result of a pure expression is not used, it can be removed
  A pure function called with side-effect free parameters has a constant result
  Without data dependencies, pure functions can run in parallel
  A language with only pure function semantic can change evaluation order
  Functions with side effects (e.g. printing) typically do not return results
Recursion as replacement for looping (e.g. factorial)
Lazy evaluation possible, e.g. to support infinite data structures
Perfect foundation for implicit parallelism...
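Two of these notions, recursion in place of a loop and a higher-order function that returns a function, can be illustrated briefly in C++. The names factorial and compose are illustrative choices of this sketch, and the argument limit of 20 is an assumption made to stay within 64-bit arithmetic.

```cpp
#include <cassert>
#include <functional>

// Pure function: the result depends only on n, and nothing else is modified.
// Recursion replaces the loop.
unsigned long long factorial(unsigned n) {
    assert(n <= 20);  // 21! no longer fits into 64 bits
    return n <= 1 ? 1ULL : n * factorial(n - 1);
}

// Higher-order function: takes two functions and returns their composition,
// i.e. a function that computes f(g(x)).
std::function<int(int)> compose(std::function<int(int)> f,
                                std::function<int(int)> g) {
    return [f, g](int x) { return f(g(x)); };
}
```

factorial(5) is 120 on every call. Composing a doubling function with an increment function gives a function that maps 3 to 8. As both inputs are pure, so is the composition, and independent calls to it could be evaluated in any order.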

Rationale for Map-Reduce

Rationale for Map-Reduce Rationale for Map-Reduce Map-reduce idea: An Example (part 1) This example uses JavaScript. // A trivial example: alert("i d like some Spaghetti!"); alert("i d like some Chocolate Moose!"); The above can

More information

Advanced Shared-Memory Programming

Advanced Shared-Memory Programming Advanced Shared-Memory Programming Parallel Programming Concepts Winter Term 2013 / 2014 Dr. Peter Tröger, M.Sc. Frank Feinbube Shared-Memory Parallelism 2 Libraries and language extensions for standard

More information

On the cost of managing data flow dependencies

On the cost of managing data flow dependencies On the cost of managing data flow dependencies - program scheduled by work stealing - Thierry Gautier, INRIA, EPI MOAIS, Grenoble France Workshop INRIA/UIUC/NCSA Outline Context - introduction of work

More information

Cilk Plus GETTING STARTED

Cilk Plus GETTING STARTED Cilk Plus GETTING STARTED Overview Fundamentals of Cilk Plus Hyperobjects Compiler Support Case Study 3/17/2015 CHRIS SZALWINSKI 2 Fundamentals of Cilk Plus Terminology Execution Model Language Extensions

More information

The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.

The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism. Cilk Plus The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.) Developed originally by Cilk Arts, an MIT spinoff,

More information

Shared-memory Parallel Programming with Cilk Plus

Shared-memory Parallel Programming with Cilk Plus Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 30 August 2018 Outline for Today Threaded programming

More information

Table of Contents. Cilk

Table of Contents. Cilk Table of Contents 212 Introduction to Parallelism Introduction to Programming Models Shared Memory Programming Message Passing Programming Shared Memory Models Cilk TBB HPF Chapel Fortress Stapl PGAS Languages

More information

MetaFork: A Metalanguage for Concurrency Platforms Targeting Multicores

MetaFork: A Metalanguage for Concurrency Platforms Targeting Multicores MetaFork: A Metalanguage for Concurrency Platforms Targeting Multicores Xiaohui Chen, Marc Moreno Maza & Sushek Shekar University of Western Ontario September 1, 2013 Document number: N1746 Date: 2013-09-01

More information

Multicore programming in CilkPlus

Multicore programming in CilkPlus Multicore programming in CilkPlus Marc Moreno Maza University of Western Ontario, Canada CS3350 March 16, 2015 CilkPlus From Cilk to Cilk++ and Cilk Plus Cilk has been developed since 1994 at the MIT Laboratory

More information

Cilk Plus: Multicore extensions for C and C++

Cilk Plus: Multicore extensions for C and C++ Cilk Plus: Multicore extensions for C and C++ Matteo Frigo 1 June 6, 2011 1 Some slides courtesy of Prof. Charles E. Leiserson of MIT. Intel R Cilk TM Plus What is it? C/C++ language extensions supporting

More information

MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores

MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores Presented by Xiaohui Chen Joint work with Marc Moreno Maza, Sushek Shekar & Priya Unnikrishnan University of Western Ontario,

More information

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1 Lecture 16: Recapitulations Lecture 16: Recapitulations p. 1 Parallel computing and programming in general Parallel computing a form of parallel processing by utilizing multiple computing units concurrently

More information

15-210: Parallelism in the Real World

15-210: Parallelism in the Real World : Parallelism in the Real World Types of paralellism Parallel Thinking Nested Parallelism Examples (Cilk, OpenMP, Java Fork/Join) Concurrency Page1 Cray-1 (1976): the world s most expensive love seat 2

More information

Intel Thread Building Blocks, Part II

Intel Thread Building Blocks, Part II Intel Thread Building Blocks, Part II SPD course 2013-14 Massimo Coppola 25/03, 16/05/2014 1 TBB Recap Portable environment Based on C++11 standard compilers Extensive use of templates No vectorization

More information

Shared-memory Parallel Programming with Cilk Plus

Shared-memory Parallel Programming with Cilk Plus Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 19 January 2017 Outline for Today Threaded programming

More information

15-853:Algorithms in the Real World. Outline. Parallelism: Lecture 1 Nested parallelism Cost model Parallel techniques and algorithms

15-853:Algorithms in the Real World. Outline. Parallelism: Lecture 1 Nested parallelism Cost model Parallel techniques and algorithms :Algorithms in the Real World Parallelism: Lecture 1 Nested parallelism Cost model Parallel techniques and algorithms Page1 Andrew Chien, 2008 2 Outline Concurrency vs. Parallelism Quicksort example Nested

More information

Plan. Introduction to Multicore Programming. Plan. University of Western Ontario, London, Ontario (Canada) Multi-core processor CPU Coherence

Plan. Introduction to Multicore Programming. Plan. University of Western Ontario, London, Ontario (Canada) Multi-core processor CPU Coherence Plan Introduction to Multicore Programming Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 1 Multi-core Architecture 2 Race Conditions and Cilkscreen (Moreno Maza) Introduction

More information

Effective Performance Measurement and Analysis of Multithreaded Applications

Effective Performance Measurement and Analysis of Multithreaded Applications Effective Performance Measurement and Analysis of Multithreaded Applications Nathan Tallent John Mellor-Crummey Rice University CSCaDS hpctoolkit.org Wanted: Multicore Programming Models Simple well-defined

More information

Compsci 590.3: Introduction to Parallel Computing

Compsci 590.3: Introduction to Parallel Computing Compsci 590.3: Introduction to Parallel Computing Alvin R. Lebeck Slides based on this from the University of Oregon Admin Logistics Homework #3 Use script Project Proposals Document: see web site» Due

More information

Structured Parallel Programming Patterns for Efficient Computation

Structured Parallel Programming Patterns for Efficient Computation Structured Parallel Programming Patterns for Efficient Computation Michael McCool Arch D. Robison James Reinders ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

CILK/CILK++ AND REDUCERS YUNMING ZHANG RICE UNIVERSITY

CILK/CILK++ AND REDUCERS YUNMING ZHANG RICE UNIVERSITY CILK/CILK++ AND REDUCERS YUNMING ZHANG RICE UNIVERSITY 1 OUTLINE CILK and CILK++ Language Features and Usages Work stealing runtime CILK++ Reducers Conclusions 2 IDEALIZED SHARED MEMORY ARCHITECTURE Hardware

More information

Under the Hood, Part 1: Implementing Message Passing

Under the Hood, Part 1: Implementing Message Passing Lecture 27: Under the Hood, Part 1: Implementing Message Passing Parallel Computer Architecture and Programming CMU 15-418/15-618, Today s Theme Message passing model (abstraction) Threads operate within

More information

Intel Software Development Products for High Performance Computing and Parallel Programming

Intel Software Development Products for High Performance Computing and Parallel Programming Intel Software Development Products for High Performance Computing and Parallel Programming Multicore development tools with extensions to many-core Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN

More information

CSE 613: Parallel Programming

CSE 613: Parallel Programming CSE 613: Parallel Programming Lecture 3 ( The Cilk++ Concurrency Platform ) ( inspiration for many slides comes from talks given by Charles Leiserson and Matteo Frigo ) Rezaul A. Chowdhury Department of

More information

9/21/17. Outline. Expression Evaluation and Control Flow. Arithmetic Expressions. Operators. Operators. Notation & Placement

9/21/17. Outline. Expression Evaluation and Control Flow. Arithmetic Expressions. Operators. Operators. Notation & Placement Outline Expression Evaluation and Control Flow In Text: Chapter 6 Notation Operator evaluation order Operand evaluation order Overloaded operators Type conversions Short-circuit evaluation of conditions

More information

G Programming Languages - Fall 2012

G Programming Languages - Fall 2012 G22.2110-003 Programming Languages - Fall 2012 Lecture 3 Thomas Wies New York University Review Last week Names and Bindings Lifetimes and Allocation Garbage Collection Scope Outline Control Flow Sequencing

More information

Intel Thread Building Blocks

Intel Thread Building Blocks Intel Thread Building Blocks SPD course 2015-16 Massimo Coppola 08/04/2015 1 Thread Building Blocks : History A library to simplify writing thread-parallel programs and debugging them Originated circa

More information

Moore s Law. Multicore Programming. Vendor Solution. Power Density. Parallelism and Performance MIT Lecture 11 1.

Moore s Law. Multicore Programming. Vendor Solution. Power Density. Parallelism and Performance MIT Lecture 11 1. Moore s Law 1000000 Intel CPU Introductions 6.172 Performance Engineering of Software Systems Lecture 11 Multicore Programming Charles E. Leiserson 100000 10000 1000 100 10 Clock Speed (MHz) Transistors

More information

COMP Parallel Computing. SMM (4) Nested Parallelism

COMP Parallel Computing. SMM (4) Nested Parallelism COMP 633 - Parallel Computing Lecture 9 September 19, 2017 Nested Parallelism Reading: The Implementation of the Cilk-5 Multithreaded Language sections 1 3 1 Topics Nested parallelism in OpenMP and other

More information

Overview: The OpenMP Programming Model

Overview: The OpenMP Programming Model Overview: The OpenMP Programming Model motivation and overview the parallel directive: clauses, equivalent pthread code, examples the for directive and scheduling of loop iterations Pi example in OpenMP

More information

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture) EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture) Dept. of Computer Science & Engineering Chentao Wu wuct@cs.sjtu.edu.cn Download lectures ftp://public.sjtu.edu.cn User:

More information

Intel Thread Building Blocks

Intel Thread Building Blocks Intel Thread Building Blocks SPD course 2017-18 Massimo Coppola 23/03/2018 1 Thread Building Blocks : History A library to simplify writing thread-parallel programs and debugging them Originated circa

More information

Beyond Threads: Scalable, Composable, Parallelism with Intel Cilk Plus and TBB

Beyond Threads: Scalable, Composable, Parallelism with Intel Cilk Plus and TBB Beyond Threads: Scalable, Composable, Parallelism with Intel Cilk Plus and TBB Jim Cownie Intel SSG/DPD/TCAR 1 Optimization Notice Optimization Notice Intel s compilers may or

More information

Concepts in. Programming. The Multicore- Software Challenge. MIT Professional Education 6.02s Lecture 1 June 8 9, 2009

Concepts in. Programming. The Multicore- Software Challenge. MIT Professional Education 6.02s Lecture 1 June 8 9, 2009 Concepts in Multicore Programming The Multicore- Software Challenge MIT Professional Education 6.02s Lecture 1 June 8 9, 2009 2009 Charles E. Leiserson 1 Cilk, Cilk++, and Cilkscreen, are trademarks of

More information

Parallel Programming with OpenMP. CS240A, T. Yang

Parallel Programming with OpenMP. CS240A, T. Yang Parallel Programming with OpenMP CS240A, T. Yang 1 A Programmer s View of OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for defining multi-threaded shared-memory programs

More information

Questions from last time

Questions from last time Questions from last time Pthreads vs regular thread? Pthreads are POSIX-standard threads (1995). There exist earlier and newer standards (C++11). Pthread is probably most common. Pthread API: about a 100

More information

Multithreaded Parallelism and Performance Measures

Multithreaded Parallelism and Performance Measures Multithreaded Parallelism and Performance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101

More information

Introduction to Multithreaded Algorithms

Introduction to Multithreaded Algorithms Introduction to Multithreaded Algorithms CCOM5050: Design and Analysis of Algorithms Chapter VII Selected Topics T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introduction to algorithms, 3 rd

More information

Parallel Computing. Prof. Marco Bertini

Parallel Computing. Prof. Marco Bertini Parallel Computing Prof. Marco Bertini Shared memory: OpenMP Implicit threads: motivations Implicit threading frameworks and libraries take care of much of the minutiae needed to create, manage, and (to

More information

Martin Kruliš, v

Martin Kruliš, v Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal

More information

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program Amdahl's Law About Data What is Data Race? Overview to OpenMP Components of OpenMP OpenMP Programming Model OpenMP Directives

More information

Presented By Andrew Butt

Presented By Andrew Butt Presented By Andrew Butt Overview A Brief History of Functional Programming Comparison Between OOP and Functional programming Paradigms and Concepts Functional Programming in Other Languages An Overview

More information

CMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)

CMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman) CMSC 714 Lecture 4 OpenMP and UPC Chau-Wen Tseng (from A. Sussman) Programming Model Overview Message passing (MPI, PVM) Separate address spaces Explicit messages to access shared data Send / receive (MPI

More information

Parallel Programming Principle and Practice. Lecture 7 Threads programming with TBB. Jin, Hai

Parallel Programming Principle and Practice. Lecture 7 Threads programming with TBB. Jin, Hai Parallel Programming Principle and Practice Lecture 7 Threads programming with TBB Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Outline Intel Threading

More information

MPI: A Message-Passing Interface Standard

MPI: A Message-Passing Interface Standard MPI: A Message-Passing Interface Standard Version 2.1 Message Passing Interface Forum June 23, 2008 Contents Acknowledgments xvl1 1 Introduction to MPI 1 1.1 Overview and Goals 1 1.2 Background of MPI-1.0

More information

Introduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines

Introduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi

More information

Intel MPI Library Conditional Reproducibility

Intel MPI Library Conditional Reproducibility 1 Intel MPI Library Conditional Reproducibility By Michael Steyer, Technical Consulting Engineer, Software and Services Group, Developer Products Division, Intel Corporation Introduction High performance

More information

C Language Constructs for Parallel Programming

C Language Constructs for Parallel Programming C Language Constructs for Parallel Programming Robert Geva 5/17/13 1 Cilk Plus Parallel tasks Easy to learn: 3 keywords Tasks, not threads Load balancing Hyper Objects Array notations Elemental Functions

More information

Overview Implicit Vectorisation Explicit Vectorisation Data Alignment Summary. Vectorisation. James Briggs. 1 COSMOS DiRAC.

Overview Implicit Vectorisation Explicit Vectorisation Data Alignment Summary. Vectorisation. James Briggs. 1 COSMOS DiRAC. Vectorisation James Briggs 1 COSMOS DiRAC April 28, 2015 Session Plan 1 Overview 2 Implicit Vectorisation 3 Explicit Vectorisation 4 Data Alignment 5 Summary Section 1 Overview What is SIMD? Scalar Processing:

More information

Weeks 6&7: Procedures and Parameter Passing

Weeks 6&7: Procedures and Parameter Passing CS320 Principles of Programming Languages Weeks 6&7: Procedures and Parameter Passing Jingke Li Portland State University Fall 2017 PSU CS320 Fall 17 Weeks 6&7: Procedures and Parameter Passing 1 / 45

More information

Shared-Memory Concurrency Programmierung Paralleler und Verteilter Systeme (PPV)

Shared-Memory Concurrency Programmierung Paralleler und Verteilter Systeme (PPV) Shared-Memory Concurrency Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Von Neumann Model 2 Processor executes

More information

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice lan Multithreaded arallelism and erformance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 1 2 cilk for Loops 3 4 Measuring arallelism in ractice 5 Announcements

More information

Introduction to Computer Systems /18-243, fall th Lecture, Dec 1

Introduction to Computer Systems /18-243, fall th Lecture, Dec 1 Introduction to Computer Systems 15-213/18-243, fall 2009 24 th Lecture, Dec 1 Instructors: Roger B. Dannenberg and Greg Ganger Today Multi-core Thread Level Parallelism (TLP) Simultaneous Multi -Threading

More information

Written Presentation: JoCaml, a Language for Concurrent Distributed and Mobile Programming

Written Presentation: JoCaml, a Language for Concurrent Distributed and Mobile Programming Written Presentation: JoCaml, a Language for Concurrent Distributed and Mobile Programming Nicolas Bettenburg 1 Universitaet des Saarlandes, D-66041 Saarbruecken, nicbet@studcs.uni-sb.de Abstract. As traditional

More information

CS 5220: Shared memory programming. David Bindel

CS 5220: Shared memory programming. David Bindel CS 5220: Shared memory programming David Bindel 2017-09-26 1 Message passing pain Common message passing pattern Logical global structure Local representation per processor Local data may have redundancy

More information

Cilk. Cilk In 2008, ACM SIGPLAN awarded Best influential paper of Decade. Cilk : Biggest principle

Cilk. Cilk In 2008, ACM SIGPLAN awarded Best influential paper of Decade. Cilk : Biggest principle CS528 Slides are adopted from http://supertech.csail.mit.edu/cilk/ Charles E. Leiserson A Sahu Dept of CSE, IIT Guwahati HPC Flow Plan: Before MID Processor + Super scalar+ Vector Unit Serial C/C++ Coding

More information
