Intel Threading Building Blocks (TBB)

1 Intel Threading Building Blocks (TBB)
SDSC Summer Institute 2012
Pietro Cicotti, Computational Scientist
Gordon Applications Team, Performance Modeling and Characterization Lab

2 Parallelism and Decomposition
- TBB vs. Pthreads
- TBB vs. OpenMP
- Data parallelism, task parallelism
- Mixed solutions
- Pipelining
TBB
- Algorithms, splittable ranges, containers
- Graph execution model
- Low level constructs

3 Generic Algorithms
Algorithms
- parallel_for
- parallel_reduce
- parallel_scan
- parallel_do
- parallel_for_each
- pipeline
- parallel_pipeline
- parallel_sort
- parallel_invoke

4 parallel_for
Parallel loop with no loop-carried dependencies

Serial loop:
    for (size_t i = 0; i < n; ++i)
        func(a[i]);

TBB version (pseudocode):
    class use_func {
        data* a;
        operator()(range) {
            for i in range
                func(a[i]);
        }
    };
    parallel_for(range(0, n), use_func(a));

5 Splittable ranges
Range
- Half-open set [b, e)
- blocked_range<T>
Partitioning
- Split constructor
- Divisibility
- Partitioner

    class use_func {
        data* a;
        void operator()(const blocked_range<int>& r) const {
            for (int i = r.begin(); i != r.end(); ++i)
                func(a[i]);
        }
    };
    parallel_for(blocked_range<int>(0, n, b), use_func(a));   // b: grain size

6 Partitioners
Recursive splitting
- simple_partitioner: always split (down to the grain size)
- auto_partitioner: load balance
- affinity_partitioner: try to preserve the mapping of data to threads

    parallel_for(blocked_range<int>(0, n), use_func(a), auto_partitioner());

7 More parallel_for
- parallel_for<Range,Func>(Range r, Func f)
- parallel_for<Idx,Func>(Idx b, Idx e, Func f)
- parallel_for<Idx,Func>(Idx b, Idx e, Idx s, Func f)
- Partitioner
- Task group

8 parallel_reduce
Reduce operations
- + operator
- reduce a[0:n-1]: $\sum_{i=0}^{n-1} a[i]$

Serial loop:
    for (size_t i = 0; i < n; ++i)
        s += a[i];

TBB version (pseudocode):
    class use_func {
        data* a;
        tmp s;
        operator()(range) {
            for i in range
                s += a[i];
        }
        join(use_func f) {
            s += f.s;
        }
    };
    use_func f(a);
    parallel_reduce(range(0, n), f);   // result in f.s

9 Split and join
- Split: divide the range for localized reductions
- Join: combine partial results
[Figure: split/join diagram with nodes labeled A, B, C, D]

10 More parallel_reduce
parallel_reduce(Range range, Value 0, Func f, Red r)
- Replaces the initial tmp values with the identity value 0
- Replaces += with the operator function f, applied as op= over a range
- Replaces join with the operator function r
- Also accepts partitioners and task groups

    class r {
        t operator()(t lo, t ro) { return lo + ro; }
    };
    parallel_reduce(range(0, n), 0, f(a), r());

11 parallel_scan
Parallel prefix operation
- E.g. $y[i] = \sum_{j=0}^{i} x[j]$
- y[i] depends on y[i-1]: y[0] = x[0], y[i] = y[i-1] + x[i]
Parallelization
- Phase 1: compute partial results
- Phase 2: compute final prefix

12 parallel_scan phases
Use tagging type parameter in operators

Serial loop:
    y[0] = x[0];
    for (size_t i = 1; i < n; ++i)
        y[i] = y[i-1] + x[i];

TBB version (pseudocode):
    class use_func {
        data* x;
        data* y;
        data tmp;   // running sum up to the current position
        operator()(range r, pre_scan_tag) {
            for (i = r.begin(); i != r.end(); ++i)
                tmp += x[i];
        }
        operator()(range r, final_scan_tag) {
            for (i = r.begin(); i != r.end(); ++i) {
                tmp += x[i];
                y[i] = tmp;
            }
        }
    };
    parallel_scan(range(0, n), use_func(x, y));

13 parallel_scan example
- Split: divide the range and do pre_scan
- reverse_join: combine partial results
- final_scan
[Figure: scan diagram with nodes labeled A, B, C, D]

14 parallel_do
parallel_do(Iterator first, Iterator last, Func f)
- Applies f to each item in [first, last)
- f can extend the range through parallel_do_feeder::add

    class use_func {
        void operator()(data item, parallel_do_feeder<data>& pdf) const {
            if (process(item))
                pdf.add(new_item);   // new_item: a newly generated work item
        }
    };

15 Other parallel_*
parallel_for_each
- Like parallel_do, but without feeder
parallel_invoke
- Invoke up to 10 functions
- parallel_invoke(f0, f1, ..., f9);
parallel_sort
- parallel_sort(Iterator begin, Iterator end)
- parallel_sort(Iterator begin, Iterator end, Comp c)
- Requires RandomAccessIterators

16 pipeline
pipeline class
- Concatenated filters
- add_filter()
- run(live tokens)
Filter class
- Phase of the pipeline
- Parallel, serial in order, or serial out of order
parallel_pipeline function
- Same functionality, different interface

17 Pipeline example
Word dictionary pipeline
- Read word filter
- Lower case filter
- Dictionary update filter

    class read_f : public filter {
        void* operator()(void*) {
            b = buffers[next];
            next = (next + 1) % buffers_count;
            read(file, b);
            return b;
        }
    };
    class lower_f : public filter {
        void* operator()(void* b) {
            tolower(b);
            return b;
        }
    };
    class dictionary_f : public filter {
        void* operator()(void* b) {
            dictionary.add(b);
            return NULL;
        }
    };

18 Concurrent containers
Generic containers
- Maps, sets, queues, vectors
- Concurrent operations (some)
- Transparent locking
Next
- concurrent_queue
- concurrent_vector
- concurrent_hash_map
Others
- concurrent_unordered_map/set
- concurrent_bounded/priority_queue

19 concurrent_queue
Concurrent operations
- push
- pop_if_present
- pop
size()
- Signed size type

20 concurrent_vector
Concurrent operations
- grow_by
- grow_to_at_least
- push_back
- operator[]
Range container
- Has range type member
- Can be used in conjunction with parallel_for, parallel_reduce, parallel_scan

21 concurrent_hash_map
Concurrent operations
- find
- insert
- erase
Hashing
- Hash function defined on keys
- Equality operator defined on keys
Range container
- Has range type member
- Can be used in conjunction with parallel_for, parallel_reduce, parallel_scan

22 Getting started
task_scheduler_init
- Constructor initializes the library
  - Library activated by creating a scheduler object
  - Number of threads is optionally specified
  - Activation can be deferred
- Termination is automatic when scheduler objects are destroyed
  - Can be invoked explicitly
Compiling on Gordon
- /opt/intel/composer_xe_2011_sp /tbb
- -ltbb

23 Examples
/oasis/scratch/pcicotti/temp_project/tbb_si.tgz
- Extract a copy to a work directory
- Read the example code
- Compile and run (interactive job!)
- Modify, rerun, experiment: number of threads, granularity of ranges, partitioners
More examples
- /opt/intel/composer_xe_2011_sp /tbb/examples

24 Task graph execution model
Graph
- Nodes
  - Functions
  - Data/communication managers
- Edges
  - Connectors
  - Dependencies
- Messages
  - continue_msg
  - Data
Supports construction of DAGs of tasks

25 Memory allocation
Thread local storage
- combinable
- enumerable_thread_specific
Allocators
- scalable_allocator
- aligned_allocator
- cache_aligned_allocator
- zero_allocator

26 Synchronization
Mutex
- (OS) mutex class, recursive_mutex
- spin_mutex
- queuing_mutex
- spin_rw_mutex
- queuing_rw_mutex
Atomic operations
- fetch_and_store
- fetch_and_add
- compare_and_swap
- Derived operations: +=, ++

27 Scheduler
Task groups
- Group of tasks executing concurrently
Tasks API
- Create and spawn tasks
- Define dependencies
- Wait for tasks
- Destroy and recycle tasks
- Set priorities
- Affinity
Native threads API

28 References
- n.php
Installing TBB
- Download library package
- Build library and examples
- Makefile + GNU compilers


More information

Multithreading in C with OpenMP

Multithreading in C with OpenMP Multithreading in C with OpenMP ICS432 - Spring 2017 Concurrent and High-Performance Programming Henri Casanova (henric@hawaii.edu) Pthreads are good and bad! Multi-threaded programming in C with Pthreads

More information

CS333 Intro to Operating Systems. Jonathan Walpole

CS333 Intro to Operating Systems. Jonathan Walpole CS333 Intro to Operating Systems Jonathan Walpole Threads & Concurrency 2 Threads Processes have the following components: - an address space - a collection of operating system state - a CPU context or

More information

War Stories : Graph Algorithms in GPUs

War Stories : Graph Algorithms in GPUs SAND2014-18323PE War Stories : Graph Algorithms in GPUs Siva Rajamanickam(SNL) George Slota, Kamesh Madduri (PSU) FASTMath Meeting Exceptional service in the national interest is a multi-program laboratory

More information

High Performance Computing: Tools and Applications

High Performance Computing: Tools and Applications High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 2 OpenMP Shared address space programming High-level

More information

MULTI-THREADED QUERIES

MULTI-THREADED QUERIES 15-721 Project 3 Final Presentation MULTI-THREADED QUERIES Wendong Li (wendongl) Lu Zhang (lzhang3) Rui Wang (ruiw1) Project Objective Intra-operator parallelism Use multiple threads in a single executor

More information

CS420: Operating Systems

CS420: Operating Systems Threads James Moscola Department of Physical Sciences York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Threads A thread is a basic unit of processing

More information

Intel Threading Building Blocks

Intel Threading Building Blocks Intel Threading Building Blocks Tutorial Document Number 319872-008US World Wide Web: http://www.intel.com Intel Threading Building Blocks Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN

More information

Introduction to OpenMP.

Introduction to OpenMP. Introduction to OpenMP www.openmp.org Motivation Parallelize the following code using threads: for (i=0; i

More information

C# 6.0 in a nutshell / Joseph Albahari & Ben Albahari. 6th ed. Beijin [etc.], cop Spis treści

C# 6.0 in a nutshell / Joseph Albahari & Ben Albahari. 6th ed. Beijin [etc.], cop Spis treści C# 6.0 in a nutshell / Joseph Albahari & Ben Albahari. 6th ed. Beijin [etc.], cop. 2016 Spis treści Preface xi 1. Introducing C# and the.net Framework 1 Object Orientation 1 Type Safety 2 Memory Management

More information

CMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)

CMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman) CMSC 714 Lecture 4 OpenMP and UPC Chau-Wen Tseng (from A. Sussman) Programming Model Overview Message passing (MPI, PVM) Separate address spaces Explicit messages to access shared data Send / receive (MPI

More information

Intel Threading Tools

Intel Threading Tools Intel Threading Tools Paul Petersen, Intel -1- INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS,

More information

Effective Performance Measurement and Analysis of Multithreaded Applications

Effective Performance Measurement and Analysis of Multithreaded Applications Effective Performance Measurement and Analysis of Multithreaded Applications Nathan Tallent John Mellor-Crummey Rice University CSCaDS hpctoolkit.org Wanted: Multicore Programming Models Simple well-defined

More information

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University CS 333 Introduction to Operating Systems Class 3 Threads & Concurrency Jonathan Walpole Computer Science Portland State University 1 The Process Concept 2 The Process Concept Process a program in execution

More information

MPI: A Message-Passing Interface Standard

MPI: A Message-Passing Interface Standard MPI: A Message-Passing Interface Standard Version 2.1 Message Passing Interface Forum June 23, 2008 Contents Acknowledgments xvl1 1 Introduction to MPI 1 1.1 Overview and Goals 1 1.2 Background of MPI-1.0

More information

Chapter 4: Threads. Operating System Concepts. Silberschatz, Galvin and Gagne

Chapter 4: Threads. Operating System Concepts. Silberschatz, Galvin and Gagne Chapter 4: Threads Silberschatz, Galvin and Gagne Chapter 4: Threads Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Linux Threads 4.2 Silberschatz, Galvin and

More information

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared

More information

Efficiently Introduce Threading using Intel TBB

Efficiently Introduce Threading using Intel TBB Introduction This guide will illustrate how to efficiently introduce threading using Intel Threading Building Blocks (Intel TBB), part of Intel Parallel Studio XE. It is a widely used, award-winning C++

More information

Lecture 7. OpenMP: Reduction, Synchronization, Scheduling & Applications

Lecture 7. OpenMP: Reduction, Synchronization, Scheduling & Applications Lecture 7 OpenMP: Reduction, Synchronization, Scheduling & Applications Announcements Section and Lecture will be switched on Thursday and Friday Thursday: section and Q2 Friday: Lecture 2010 Scott B.

More information

CME 213 S PRING Eric Darve

CME 213 S PRING Eric Darve CME 213 S PRING 2017 Eric Darve PTHREADS pthread_create, pthread_exit, pthread_join Mutex: locked/unlocked; used to protect access to shared variables (read/write) Condition variables: used to allow threads

More information