Intel Threading Building Blocks (TBB)
1 Intel Threading Building Blocks (TBB)
SDSC Summer Institute 2012
Pietro Cicotti, Computational Scientist
Gordon Applications Team, Performance Modeling and Characterization Lab
2 Parallelism and Decomposition
- TBB vs. Pthreads
- TBB vs. OpenMP
- Data parallelism, task parallelism
- Mixed solutions
- Pipelining
- TBB: algorithms, splittable ranges, containers
- Graph execution model
- Low-level constructs
3 Algorithms
Generic algorithms: parallel_for, parallel_reduce, parallel_scan, parallel_do, parallel_for_each, pipeline, parallel_pipeline, parallel_sort, parallel_invoke
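4 parallel_for
A parallel loop with no loop-carried dependencies. The serial loop

    for (size_t i = 0; i < n; ++i) func(a[i]);

becomes a body object that processes one subrange, passed to parallel_for:

    class use_func {
        data* a;
    public:
        use_func(data* a_) : a(a_) {}
        void operator()(const blocked_range<size_t>& r) const {
            for (size_t i = r.begin(); i != r.end(); ++i) func(a[i]);
        }
    };
    parallel_for(blocked_range<size_t>(0, n), use_func(a));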
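Without TBB installed, the same contract (independent iterations) can be illustrated with the C++ standard library alone. This is a stand-in built on stated assumptions, not TBB's implementation. TBB schedules range pieces as tasks on a work-stealing pool. `parallel_for_chunks` (a hypothetical name) simply starts one `std::thread` per contiguous chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: split [0, n) into one contiguous chunk per thread and
// call body(i) for every index. Only correct when iterations are independent,
// which is exactly the parallel_for contract.
template <typename Body>
void parallel_for_chunks(std::size_t n, unsigned nthreads, const Body& body) {
    assert(nthreads > 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + nthreads - 1) / nthreads;  // ceiling division
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // fewer non-empty chunks than threads
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& w : workers) w.join();  // wait for every chunk
}
```

Static chunking like this cannot rebalance work when some iterations are slower than others. That is one reason TBB splits ranges recursively instead.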
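5 Splittable ranges
- Range: a half-open set [b, e), e.g. blocked_range<T>
- Partitioning: splitting constructor, divisibility test (is_divisible()), partitioner
The optional third argument of blocked_range is the grain size g: a range can be split only while it holds more than g elements.

    class use_func {
        data* a;
    public:
        use_func(data* a_) : a(a_) {}
        void operator()(const blocked_range<int>& r) const {
            for (int i = r.begin(); i != r.end(); ++i) func(a[i]);
        }
    };
    parallel_for(blocked_range<int>(0, n, g), use_func(a));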
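Here is a minimal sketch of a splittable range. It assumes a hypothetical `BlockedRange` that follows the blocked_range rules described on the slide: the interval is half-open, it stays divisible while it holds more than the grain size, and a split cuts it in half. The recursion uses `std::async` threads. Real TBB turns each half into a task rather than a thread.

```cpp
#include <cassert>
#include <cstddef>
#include <future>

// Hypothetical stand-in for a splittable range: half-open [begin, end),
// divisible while it holds more than `grain` elements.
struct BlockedRange {
    std::size_t begin, end, grain;
    std::size_t size() const { return end - begin; }
    bool is_divisible() const { return size() > grain; }
    // Split in half: *this keeps the left part, the returned range is the right part.
    BlockedRange split() {
        const std::size_t mid = begin + size() / 2;
        BlockedRange right{mid, end, grain};
        end = mid;
        return right;
    }
};

// Divide and conquer: split while divisible, process the right half in another
// thread and the left half in this one; body sees only non-divisible leaves.
template <typename Body>
void parallel_for_range(BlockedRange r, const Body& body) {
    assert(r.grain > 0);
    if (!r.is_divisible()) {
        body(r);  // leaf: the body loops over [r.begin, r.end) serially
        return;
    }
    BlockedRange right = r.split();
    auto pending = std::async(std::launch::async, [right, &body] { parallel_for_range(right, body); });
    parallel_for_range(r, body);
    pending.get();  // wait for the right half
}
```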
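6 Partitioners
The partitioner controls the recursive splitting of the range:
- simple_partitioner: always split, until the range is no longer divisible
- auto_partitioner: split only as much as needed for load balance
- affinity_partitioner: try to preserve the data-to-thread mapping across repeated loops (pass the same affinity_partitioner object to each loop)

    parallel_for(blocked_range<int>(0, n), use_func(a), auto_partitioner());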
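To see what "always split" means for the number of body invocations, the hypothetical function below counts the leaf subranges produced by halving [0, n) while the size exceeds the grain size. This is the divisibility rule of blocked_range. It models simple_partitioner only. auto_partitioner may stop splitting earlier, so its leaf count depends on runtime conditions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of simple_partitioner: halve the range while its size
// exceeds the grain size, and count the resulting leaf subranges.
std::size_t leaves_when_always_splitting(std::size_t n, std::size_t grain) {
    assert(grain > 0);
    if (n <= grain) return 1;  // not divisible: one body call
    const std::size_t left = n / 2;
    return leaves_when_always_splitting(left, grain) +
           leaves_when_always_splitting(n - left, grain);
}
```

With n = 1000 and a grain size of 100 the range is halved four times, giving 16 leaves of 62 or 63 elements. A grain size of 1 produces one leaf per element.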
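7 More parallel_for

    parallel_for<Range, Func>(const Range& range, const Func& f)
    parallel_for<Index, Func>(Index first, Index last, const Func& f)
    parallel_for<Index, Func>(Index first, Index last, Index step, const Func& f)

In the index forms, f is called once per index i in [first, last), with stride step. Further overloads take a partitioner and a task group context.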
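8 parallel_reduce
Reduce operations, e.g. with the + operator: $s = \sum_{i=0}^{n-1} a[i]$, computed serially as

    for (size_t i = 0; i < n; ++i) s += a[i];

The body accumulates over a subrange, and join() merges another body's partial result:

    class use_func {
        data* a;
    public:
        data s;
        use_func(data* a_) : a(a_), s(0) {}
        use_func(use_func& other, split) : a(other.a), s(0) {}  // splitting constructor
        void operator()(const blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i) s += a[i];
        }
        void join(const use_func& f) { s += f.s; }
    };
    use_func f(a);
    parallel_reduce(blocked_range<size_t>(0, n), f);  // result in f.s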
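The split/join protocol of the imperative form can be sketched without TBB. In this sketch `split_tag` stands in for TBB's `split` type, and `SumBody` and `parallel_reduce_sketch` are hypothetical names. Unlike TBB, the driver always splits down to the grain size and uses one thread per right half.

```cpp
#include <cassert>
#include <cstddef>
#include <future>

struct split_tag {};  // stand-in for tbb::split

// Hypothetical sum body: operator() accumulates over [begin, end), the
// splitting constructor starts a fresh partial result at the identity (0),
// and join() adds a right-hand partial result into this one.
struct SumBody {
    const long* a;
    long sum = 0;
    explicit SumBody(const long* values) : a(values) {}
    SumBody(SumBody& other, split_tag) : a(other.a), sum(0) {}
    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) sum += a[i];
    }
    void join(const SumBody& rhs) { sum += rhs.sum; }
};

// Recursive driver: halve [begin, end) until it has at most `grain` elements,
// reduce the right half with a split body in another thread, then join.
template <typename Body>
void parallel_reduce_sketch(std::size_t begin, std::size_t end, std::size_t grain, Body& body) {
    assert(grain > 0 && begin <= end);
    if (end - begin <= grain) {
        body(begin, end);  // leaf: serial accumulation
        return;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    Body right(body, split_tag{});
    auto pending = std::async(std::launch::async,
                              [&] { parallel_reduce_sketch(mid, end, grain, right); });
    parallel_reduce_sketch(begin, mid, grain, body);
    pending.get();
    body.join(right);  // left.join(right): operand order is preserved
}
```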
9 Split and join
- Split: divide the range for localized reductions
- Join: combine partial results
[Figure: split and join tree of reduction bodies A, B, C, D]
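10 More parallel_reduce

    Value parallel_reduce(const Range& range, const Value& identity, const Func& f, const Reduction& r)

- The identity value (0 for +) replaces the initial tmp values
- f replaces +=: it applies op= over a subrange, starting from a given value, and returns the result
- r replaces join: it combines two partial results
- Partitioners and task groups can also be passed

    struct use_func {
        const data* a;
        data operator()(const blocked_range<size_t>& rg, data init) const {
            for (size_t i = rg.begin(); i != rg.end(); ++i) init += a[i];
            return init;
        }
    };
    struct r {
        data operator()(data lo, data ro) const { return lo + ro; }
    };
    data s = parallel_reduce(blocked_range<size_t>(0, n), data(0), use_func{a}, r());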
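A sketch of the functional form with the standard library. It assumes a hypothetical `parallel_reduce_fn` that takes (begin, end) instead of a Range object. The identity seeds each leaf, `f` folds a subrange, and `reduction` merges two partial results. In this sketch `reduction` always receives (left, right) in order, so it only needs to be associative, not commutative.

```cpp
#include <cassert>
#include <cstddef>
#include <future>

// Hypothetical functional reduce: f(begin, end, init) folds a subrange into
// init and returns it; reduction(left, right) combines partial results.
template <typename Value, typename Func, typename Reduction>
Value parallel_reduce_fn(std::size_t begin, std::size_t end, std::size_t grain,
                         const Value& identity, const Func& f, const Reduction& reduction) {
    assert(grain > 0 && begin <= end);
    if (end - begin <= grain) return f(begin, end, identity);  // leaf
    const std::size_t mid = begin + (end - begin) / 2;
    auto right = std::async(std::launch::async, [&] {
        return parallel_reduce_fn(mid, end, grain, identity, f, reduction);
    });
    Value left = parallel_reduce_fn(begin, mid, grain, identity, f, reduction);
    return reduction(left, right.get());
}
```

The same driver computes a sum or a maximum just by changing the identity, `f`, and `reduction`.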
11 parallel_scan
Parallel prefix operation, e.g. $y[i] = \sum_{j=0}^{i} x[j]$.
y[i] depends on y[i-1]: $y[0] = x[0]$, $y[i] = y[i-1] + x[i]$.
Parallelization:
- Phase 1: compute partial results
- Phase 2: compute the final prefix
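12 parallel_scan phases
A tag type parameter in the operators distinguishes the two phases. Serial version:

    y[0] = x[0];
    for (size_t i = 1; i < n; ++i) y[i] = y[i-1] + x[i];

Parallel body:

    class use_func {
        const data* x; data* y; data tmp;
    public:
        use_func(const data* x_, data* y_) : x(x_), y(y_), tmp(0) {}
        use_func(use_func& b, split) : x(b.x), y(b.y), tmp(0) {}
        void operator()(const blocked_range<size_t>& r, pre_scan_tag) {
            for (size_t i = r.begin(); i != r.end(); ++i) tmp += x[i];  // running sum only
        }
        void operator()(const blocked_range<size_t>& r, final_scan_tag) {
            for (size_t i = r.begin(); i != r.end(); ++i) { tmp += x[i]; y[i] = tmp; }
        }
        void reverse_join(use_func& left) { tmp = left.tmp + tmp; }  // prepend the left partial sum
        void assign(use_func& b) { tmp = b.tmp; }
    };
    use_func body(x, y);
    parallel_scan(blocked_range<size_t>(0, n), body);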
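13 parallel_scan example
- Split: divide the range and run pre_scan on the pieces
- reverse_join: combine partial results
- final_scan: compute the final prefix values
[Figure: split, reverse_join and final_scan of bodies A, B, C, D]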
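The two phases can be illustrated with plain threads over a fixed number of chunks. This is a simplified sketch with a hypothetical `two_phase_scan`, not TBB's algorithm. TBB splits recursively and combines partial sums with reverse_join. Here, a short serial loop between the two parallel phases turns the chunk totals into starting offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical inclusive prefix sum, y[i] = x[0] + ... + x[i], in two phases.
std::vector<long> two_phase_scan(const std::vector<long>& x, unsigned nchunks) {
    assert(nchunks > 0);
    const std::size_t n = x.size();
    const std::size_t chunk = (n + nchunks - 1) / nchunks;
    std::vector<long> offset(nchunks, 0), y(n);
    auto for_each_chunk = [&](auto work) {  // run work(c) for every chunk in parallel
        std::vector<std::thread> ts;
        for (unsigned c = 0; c < nchunks; ++c) ts.emplace_back(work, c);
        for (auto& t : ts) t.join();
    };
    // Phase 1 (pre-scan): each chunk computes only its total.
    for_each_chunk([&](unsigned c) {
        for (std::size_t i = c * chunk; i < std::min(n, (c + 1) * chunk); ++i) offset[c] += x[i];
    });
    // Serial step: convert chunk totals into exclusive starting offsets.
    long running = 0;
    for (unsigned c = 0; c < nchunks; ++c) { long s = offset[c]; offset[c] = running; running += s; }
    // Phase 2 (final scan): each chunk rescans its elements from its offset.
    for_each_chunk([&](unsigned c) {
        long acc = offset[c];
        for (std::size_t i = c * chunk; i < std::min(n, (c + 1) * chunk); ++i) { acc += x[i]; y[i] = acc; }
    });
    return y;
}
```

The input is read twice: once to find each chunk's total and once to write the final values. This mirrors the pre_scan and final_scan passes.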
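14 parallel_do

    parallel_do(InputIterator first, InputIterator last, Func f)

Applies f to each item in [first, last). f can extend the set of items to process with parallel_do_feeder::add:

    class use_func {
    public:
        void operator()(item_t item, parallel_do_feeder<item_t>& feeder) const {
            if (process(item)) feeder.add(new_item);  // new_item: a work item produced by process() (pseudocode)
        }
    };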
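The feeder idea can be sketched with the standard library. Worker threads take items from a shared pool, and the body may add new items through the `feeder` callable while it runs. The names are hypothetical, and a single mutex guards the pool. The sketch therefore shows only the semantics, not TBB's scalable implementation: work can grow while it is being processed, and processing ends only when no item is pending or running.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical parallel_do-like driver: body(item, feeder) may call feeder(x)
// to add work. Returns when the pool is empty and no worker is busy.
template <typename T, typename Body>
void parallel_do_sketch(std::vector<T> work, const Body& body, unsigned nthreads) {
    assert(nthreads > 0);
    std::mutex m;
    std::condition_variable cv;
    std::size_t busy = 0;  // items currently being processed
    auto feeder = [&](T item) {  // stand-in for parallel_do_feeder::add
        { std::lock_guard<std::mutex> lk(m); work.push_back(std::move(item)); }
        cv.notify_one();
    };
    auto worker = [&] {
        std::unique_lock<std::mutex> lk(m);
        for (;;) {
            // Wait until there is work, or until nobody can produce more.
            cv.wait(lk, [&] { return !work.empty() || busy == 0; });
            if (work.empty()) return;
            T item = std::move(work.back());
            work.pop_back();
            ++busy;
            lk.unlock();
            body(item, feeder);  // may call feeder(...) to extend the work
            lk.lock();
            if (--busy == 0 && work.empty()) cv.notify_all();  // let idle workers exit
        }
    };
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < nthreads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```

For example, consider a body that splits any k > 1 into k/2 and k - k/2 and counts the items equal to 1. It ends with exactly k leaves per initial item, even though the work list grows during processing.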
15 Other parallel_* algorithms
- parallel_for_each: like parallel_do, but without a feeder
- parallel_invoke: invoke up to 10 functions in parallel, e.g. parallel_invoke(f0, f1, ..., f9);
- parallel_sort(RandomAccessIterator begin, RandomAccessIterator end)
- parallel_sort(RandomAccessIterator begin, RandomAccessIterator end, const Compare& c)
parallel_sort requires random-access iterators.
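A sketch of parallel_invoke using `std::async`, under the simplifying assumption that every callable returns void. The hypothetical helper `sort_two_halves` uses it the way a parallel sort might: it sorts both halves concurrently, then merges them. This is only an illustration. TBB's parallel_sort has its own algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical parallel_invoke: start every callable, then wait for all.
template <typename... Fs>
void parallel_invoke_sketch(Fs&&... fs) {
    std::vector<std::future<void>> pending;
    (pending.push_back(std::async(std::launch::async, std::forward<Fs>(fs))), ...);
    for (auto& p : pending) p.get();  // get() also rethrows a task's exception
}

// Illustration only: sort two halves concurrently, then merge them.
template <typename T>
void sort_two_halves(std::vector<T>& v) {
    auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    parallel_invoke_sketch([&] { std::sort(v.begin(), mid); },
                           [&] { std::sort(mid, v.end()); });
    std::inplace_merge(v.begin(), mid, v.end());
}
```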
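16 pipeline
- pipeline class: a chain of filters; add_filter() appends a filter, and run(max_live_tokens) executes the pipeline with at most that many items in flight
- filter class: one stage of the pipeline, which is parallel, serial in-order, or serial out-of-order
- parallel_pipeline: same functionality, different interface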
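17 Pipeline example
Word dictionary pipeline: a read-word filter, a lower-case filter, and a dictionary update filter.

    class read_f : public filter {
    public:
        read_f() : filter(serial_in_order) {}     // one thread reads the file, in order
        void* operator()(void*) {
            auto b = buffers[next];
            next = (next + 1) % buffers_count;
            if (!read(file, b)) return NULL;      // assumed: read() reports end of input; NULL ends the pipeline
            return b;
        }
    };
    class lower_f : public filter {
    public:
        lower_f() : filter(parallel) {}            // items are independent
        void* operator()(void* b) { tolower(b); return b; }
    };
    class dictionary_f : public filter {
    public:
        dictionary_f() : filter(serial_out_of_order) {}  // one dictionary update at a time
        void* operator()(void* b) { dictionary.add(b); return NULL; }
    };

With a ring of buffers, run() must be given at most buffers_count live tokens, so that a buffer is not reused while still in flight.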
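The data flow of the word dictionary pipeline can be sketched with one thread per stage and blocking queues between stages. Everything here is hypothetical: `Channel`, `word_pipeline`, and an in-memory word list in place of a file. Unlike a TBB pipeline, the lower-case stage is not run by several threads at once, and there is no limit on the number of items in flight.

```cpp
#include <cassert>
#include <cctype>
#include <condition_variable>
#include <map>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking queue between stages; an empty optional marks end of stream.
template <typename T>
class Channel {
public:
    void push(std::optional<T> v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty(); });
        std::optional<T> v = std::move(q_.front());
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::optional<T>> q_;
};

// Three stages, one thread each: read -> lower-case -> dictionary update.
std::map<std::string, int> word_pipeline(const std::vector<std::string>& input) {
    Channel<std::string> read_to_lower, lower_to_dict;
    std::map<std::string, int> dictionary;  // touched only by the last stage
    std::thread reader([&] {
        for (const auto& w : input) read_to_lower.push(w);  // stands in for reading a file
        read_to_lower.push(std::nullopt);
    });
    std::thread lower([&] {
        while (auto w = read_to_lower.pop()) {
            for (char& c : *w) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
            lower_to_dict.push(std::move(w));
        }
        lower_to_dict.push(std::nullopt);
    });
    std::thread dict([&] {
        while (auto w = lower_to_dict.pop()) ++dictionary[*w];
    });
    reader.join();
    lower.join();
    dict.join();
    return dictionary;
}
```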
18 Concurrent containers
- Generic containers: maps, sets, queues, vectors
- Concurrent operations (on some members), with transparent locking
- Covered next: concurrent_queue, concurrent_vector, concurrent_hash_map
- Others: concurrent_unordered_map/set, concurrent_bounded_queue, concurrent_priority_queue
19 concurrent_queue
- Concurrent operations: push, pop_if_present, pop
- size() returns a signed size type
20 concurrent_vector
- Concurrent operations: grow_by, grow_to_at_least, push_back, operator[]
- Range container: it has a range type member, so it can be used with parallel_for, parallel_reduce and parallel_scan
21 concurrent_hash_map
- Concurrent operations: find, insert, erase
- Hashing: a hash function and an equality operator defined on keys
- Range container: it has a range type member, so it can be used with parallel_for, parallel_reduce and parallel_scan
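Concurrent maps typically avoid a single global lock. The sketch below shows lock striping with a hypothetical `StripedMap`: a key's hash selects one of N shards, and each shard has its own mutex. This is a generic technique shown for illustration, not how concurrent_hash_map is implemented. That class's API instead hands out accessor objects that lock individual elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical lock-striped map: threads touching different shards do not contend.
template <typename Key, typename Value, std::size_t Shards = 16>
class StripedMap {
public:
    // Add delta to key's value, inserting a value-initialized entry if absent.
    void add(const Key& key, const Value& delta) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lk(s.m);
        s.map[key] += delta;
    }
    // Copy key's value into out; returns false if the key is absent.
    bool find(const Key& key, Value& out) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lk(s.m);
        auto it = s.map.find(key);
        if (it == s.map.end()) return false;
        out = it->second;
        return true;
    }
private:
    struct Shard {
        std::mutex m;
        std::unordered_map<Key, Value> map;
    };
    Shard& shard_for(const Key& key) { return shards_[std::hash<Key>{}(key) % Shards]; }
    std::array<Shard, Shards> shards_;
};
```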
22 Getting started
- task_scheduler_init: its constructor initializes the library
  - The library is activated by creating a scheduler object
  - The number of threads can optionally be specified
  - Activation can be deferred
  - Termination is automatic when scheduler objects are destroyed, or can be invoked explicitly
- Compiling on Gordon: /opt/intel/composer_xe_2011_sp /tbb, link with -ltbb
23 Examples
- /oasis/scratch/pcicotti/temp_project/tbb_si.tgz
- Extract a copy to a work directory
- Read the example code
- Compile and run (in an interactive job!)
- Modify, rerun, and experiment with the number of threads, the granularity of ranges, and the partitioners
- More examples: /opt/intel/composer_xe_2011_sp /tbb/examples
24 Task graph execution model
- Graph
- Nodes: functions, data/communication managers
- Edges: connectors, dependencies
- Messages: continue_message, data
Supports the construction of DAGs of tasks.
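A minimal sketch of continue-style dependencies, in which a node may run only after all of its predecessors have finished. The hypothetical `TaskGraph` executes ready nodes wave by wave, following Kahn's topological ordering, and runs each wave concurrently with `std::async`. A TBB flow graph instead activates each node as soon as its own messages arrive, without global waves.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical DAG of void() tasks; an edge from -> to means "to" waits for "from".
struct TaskGraph {
    std::vector<std::function<void()>> bodies;
    std::vector<std::vector<std::size_t>> successors;

    std::size_t add_node(std::function<void()> f) {
        bodies.push_back(std::move(f));
        successors.emplace_back();
        return bodies.size() - 1;
    }
    void make_edge(std::size_t from, std::size_t to) { successors[from].push_back(to); }

    void run() {
        std::vector<std::size_t> pending(bodies.size(), 0);  // unfinished predecessors per node
        for (const auto& succ : successors)
            for (std::size_t s : succ) ++pending[s];
        std::vector<std::size_t> ready;
        for (std::size_t i = 0; i < bodies.size(); ++i)
            if (pending[i] == 0) ready.push_back(i);
        std::size_t executed = 0;
        while (!ready.empty()) {
            std::vector<std::future<void>> wave;  // run all ready nodes concurrently
            for (std::size_t i : ready) wave.push_back(std::async(std::launch::async, bodies[i]));
            for (auto& f : wave) f.get();
            executed += ready.size();
            std::vector<std::size_t> next;  // nodes whose last predecessor just finished
            for (std::size_t i : ready)
                for (std::size_t s : successors[i])
                    if (--pending[s] == 0) next.push_back(s);
            ready.swap(next);
        }
        assert(executed == bodies.size() && "graph must be acyclic");
    }
};
```

In a diamond A -> B, A -> C, B -> D, C -> D, nodes B and C run concurrently, and D starts only after both have finished.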
25 Thread local storage and memory allocation
- Thread local storage: combinable, enumerable_thread_specific
- Allocators: scalable_allocator, aligned_allocator, cache_aligned_allocator, zero_allocator
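The idea behind combinable can be sketched as follows. Each thread lazily gets a private copy of T through `local()` and updates it without further locking. At the end, `combine()` folds all the copies together. `CombinableSketch` is hypothetical and keeps the copies in a mutex-guarded `std::map` keyed by thread id. References to map elements stay valid when other threads insert, which is what makes unlocked updates of a thread's own copy safe here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-thread accumulator in the spirit of combinable<T>.
template <typename T>
class CombinableSketch {
public:
    explicit CombinableSketch(T identity) : identity_(identity) {}
    // Return this thread's copy, creating it from the identity on first use.
    T& local() {
        std::lock_guard<std::mutex> lk(m_);
        return slots_.try_emplace(std::this_thread::get_id(), identity_).first->second;
    }
    // Fold all per-thread copies; call only after the worker threads have finished.
    template <typename BinaryOp>
    T combine(BinaryOp op) {
        std::lock_guard<std::mutex> lk(m_);
        T result = identity_;
        for (const auto& kv : slots_) result = op(result, kv.second);
        return result;
    }
private:
    T identity_;
    std::mutex m_;
    std::map<std::thread::id, T> slots_;
};
```

If the system reuses the id of a finished thread, two threads share one slot. The result is still correct because their uses of that slot do not overlap in time.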
26 Synchronization
- Mutexes: mutex (OS mutex), recursive_mutex, spin_mutex, queuing_mutex, spin_rw_mutex, queuing_rw_mutex
- Atomic operations: fetch_and_store, fetch_and_add, compare_and_swap
- Derived operations: +=, ++
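Standard C++ offers analogous building blocks. The sketch below builds a hypothetical `SpinMutex` on `std::atomic_flag`. It also writes a compare-and-swap retry loop with `std::atomic::compare_exchange_weak`, which plays the role of compare_and_swap. `fetch_add` corresponds to fetch_and_add. These are standard-library analogues, not TBB's classes.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin mutex: lock() busy-waits instead of sleeping, which suits
// very short critical sections. It has lock()/unlock(), so std::lock_guard works.
class SpinMutex {
public:
    void lock() { while (flag_.test_and_set(std::memory_order_acquire)) { /* spin */ } }
    void unlock() { flag_.clear(std::memory_order_release); }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Compare-and-swap loop: install value unless a larger one is already stored.
void atomic_max(std::atomic<int>& target, int value) {
    int current = target.load();
    while (current < value && !target.compare_exchange_weak(current, value)) {
        // on failure, current now holds the latest stored value; retry
    }
}
```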
27 Scheduler
- Task groups: a group of tasks executing concurrently
- Tasks API: create and spawn tasks, define dependencies, wait for tasks, destroy and recycle tasks, set priorities, affinity
- Native threads API
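A task group needs only two operations: start a task, and wait for all started tasks. The hypothetical `TaskGroupSketch` below mirrors the run()/wait() shape of TBB's task_group, but it gives every task its own `std::async` thread. A cutoff therefore keeps small subproblems serial.

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Hypothetical task group; tasks must return void in this sketch.
class TaskGroupSketch {
public:
    template <typename F>
    void run(F&& f) { pending_.push_back(std::async(std::launch::async, std::forward<F>(f))); }
    void wait() {
        for (auto& p : pending_) p.get();  // rethrows a task's exception, if any
        pending_.clear();
    }
private:
    std::vector<std::future<void>> pending_;
};

// Recursive Fibonacci: below the cutoff, serial recursion is cheaper than new
// tasks (the same granularity concern as a range's grain size).
long fib(int n, int cutoff = 15) {
    assert(n >= 0);
    if (n < 2) return n;
    if (n < cutoff) return fib(n - 1, cutoff) + fib(n - 2, cutoff);
    long x = 0, y = 0;
    TaskGroupSketch g;
    g.run([&] { x = fib(n - 1, cutoff); });
    g.run([&] { y = fib(n - 2, cutoff); });
    g.wait();  // x and y are safe to read only after wait()
    return x + y;
}
```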
28 References
n.php
Installing TBB
- Download the library package
- Build the library and examples (Makefile + GNU compilers)