Bottom-Up Multicore Programming
|
|
- Barnaby Edwards
- 5 years ago
- Views:
Transcription
1 Bottom-Up Multicore Programming 1
2 Bottom-Up Approach Inspiration 2
3 Bottom-Up Approach Inspiration Robert Lee Moore Mathematician Penn ( ) U. of Texas ( ) Developed Moore Method or Texas Method A unique style of teaching mathematics Start with axioms, have class build up the framework during the semester Looking ahead forbidden 2
4 Bottom-Up Approach Inspiration Robert Lee Moore Mathematician Penn ( ) U. of Texas ( ) Developed Moore Method or Texas Method A unique style of teaching mathematics Start with axioms, have class build up the framework during the semester Looking ahead forbidden Yale Patt Computer Architect Renowned teacher and researcher Bits and Bytes to C and Beyond textbook 2
5 Bottom-up Multicore Programming 3
6 Bottom-up Multicore Programming We re going to follow a bottom-up approach We re going to start with the lowest-level primitives Build up high abstractions using these primitives 3
7 Bottom-up Multicore Programming We re going to follow a bottom-up approach We re going to start with the lowest-level primitives Build up high abstractions using these primitives Why? No-magic understanding Gain performance intuition 3
8 Bottom-up Multicore Programming We re going to follow a bottom-up approach We re going to start with the lowest-level primitives Build up high abstractions using these primitives Why? No-magic understanding Gain performance intuition Discarded alternative #1: top-down Start with high-level parallelism libraries and languages As black boxes Then try to explain how they work And their odd performance characteristics 3
9 Bottom-up Multicore Programming We re going to follow a bottom-up approach We re going to start with the lowest-level primitives Build up high abstractions using these primitives Why? No-magic understanding Gain performance intuition Discarded alternative #1: top-down Start with high-level parallelism libraries and languages As black boxes Then try to explain how they work And their odd performance characteristics Discard alternative #2: top-up 3
10 What Bottom-Up Means for You 4
11 What Bottom-Up Means for You Need to forget (for now) some things Put aside any thoughts of higher-level constructs Forget anything you might know of TBB, Cilk, OpenMP, Blocks, Grand Central Dispatch, Ct, Java JSR-166, ZPL, NESL, Fortress, X10, Chapel, StreamIT, Brook, CUDA, OpenCL, memory consistency, etc. 4
12 What Bottom-Up Means for You Need to forget (for now) some things Put aside any thoughts of higher-level constructs Forget anything you might know of TBB, Cilk, OpenMP, Blocks, Grand Central Dispatch, Ct, Java JSR-166, ZPL, NESL, Fortress, X10, Chapel, StreamIT, Brook, CUDA, OpenCL, memory consistency, etc. Need to actively engage with material Engage in class discussion, in-class exercises Spend time outside of class on material Embrace the homework assignments More gritty, more effort at the start 4
13 What Bottom-Up Means for You Need to forget (for now) some things Put aside any thoughts of higher-level constructs Forget anything you might know of TBB, Cilk, OpenMP, Blocks, Grand Central Dispatch, Ct, Java JSR-166, ZPL, NESL, Fortress, X10, Chapel, StreamIT, Brook, CUDA, OpenCL, memory consistency, etc. Need to actively engage with material Engage in class discussion, in-class exercises Spend time outside of class on material Embrace the homework assignments More gritty, more effort at the start Less spoon-feeding, more self-learning 4
14 What Bottom-Up Means for You Need to forget (for now) some things Put aside any thoughts of higher-level constructs Forget anything you might know of TBB, Cilk, OpenMP, Blocks, Grand Central Dispatch, Ct, Java JSR-166, ZPL, NESL, Fortress, X10, Chapel, StreamIT, Brook, CUDA, OpenCL, memory consistency, etc. Need to actively engage with material Engage in class discussion, in-class exercises Spend time outside of class on material Embrace the homework assignments More gritty, more effort at the start Less spoon-feeding, more self-learning Best way I know how to teach this course 4
15 Simple Parallel Work Decomposition 5
16 Static Work Distribution Sequential code for (int i=0; i<size; i++): Parallel code, for each thread: void each_thread(int thread_id): segment_size = SIZE / number_of_threads assert(size % number_of_threads == 0) my_start = thread_id * segment_size my_end = my_start + segment_size for (int i=my_start; i<my_end; i++) Hey, its a parallel program! 6
17 Static Work Distribution Sequential code for (int i=0; i<size; i++): Parallel code, for each thread: void each_thread(int thread_id): segment_size = SIZE / number_of_threads assert(size % number_of_threads == 0) my_start = thread_id * segment_size my_end = my_start + segment_size for (int i=my_start; i<my_end; i++) Hey, its a parallel program! 6
18 Static Work Distribution Sequential code for (int i=0; i<size; i++): Parallel code, for each thread: void each_thread(int thread_id): segment_size = SIZE / number_of_threads assert(size % number_of_threads == 0) my_start = thread_id * segment_size my_end = my_start + segment_size for (int i=my_start; i<my_end; i++) 6
19 Static Work Distribution Sequential code for (int i=0; i<size; i++): Parallel code, for each thread: void each_thread(int thread_id): segment_size = SIZE / number_of_threads assert(size % number_of_threads == 0) my_start = thread_id * segment_size my_end = my_start + segment_size for (int i=my_start; i<my_end; i++) Hey, its a parallel program! 6
20 Dynamic Work Distribution Sequential code for (int i=0; i<size; i++): Parallel code, for each thread: int counter = 0 // global variable void each_thread(int thread_id): while (true) int i = atomic_add(&counter, 1) if (i >= SIZE) return Dynamic load balancing, but high overhead 7
21 Dynamic Work Distribution Sequential code for (int i=0; i<size; i++): Parallel code, for each thread: int counter = 0 // global variable void each_thread(int thread_id): while (true) int i = atomic_add(&counter, 1) if (i >= SIZE) return Dynamic load balancing, but high overhead 7
22 Dynamic Work Distribution Sequential code for (int i=0; i<size; i++): Parallel code, for each thread: int counter = 0 // global variable void each_thread(int thread_id): while (true) int i = atomic_add(&counter, 1) if (i >= SIZE) return 7
23 Dynamic Work Distribution Sequential code for (int i=0; i<size; i++): Parallel code, for each thread: int counter = 0 // global variable void each_thread(int thread_id): while (true) int i = atomic_add(&counter, 1) if (i >= SIZE) return Dynamic load balancing, but high overhead 7
24 Coarse-Grain Dynamic Work Distribution 8
25 Coarse-Grain Dynamic Work Distribution 8
26 Coarse-Grain Dynamic Work Distribution Parallel code, for each thread: int num_segments = SIZE / GRAIN_SIZE int counter = 0 // global variable void each_thread(int thread_id): while (true) int i = atomic_add(&counter, 1) if (i >= num_segments) return my_start = i * GRAIN_SIZE my_end = my_start + GRAIN_SIZE for (int j=my_start; j<my_end; j++) calculate(j,,, ) 8
27 Coarse-Grain Dynamic Work Distribution Parallel code, for each thread: int num_segments = SIZE / GRAIN_SIZE int counter = 0 // global variable void each_thread(int thread_id): while (true) int i = atomic_add(&counter, 1) if (i >= num_segments) return my_start = i * GRAIN_SIZE my_end = my_start + GRAIN_SIZE for (int j=my_start; j<my_end; j++) calculate(j,,, ) Dynamic load balancing with lower (adjustable) overhead 8
CS758 Synchronization. CS758 Multicore Programming (Wood) Synchronization
CS758 Synchronization 1 Shared Memory Primitives CS758 Multicore Programming (Wood) Performance Tuning 2 Shared Memory Primitives Create thread Ask operating system to create a new thread Threads starts
More informationTrends and Challenges in Multicore Programming
Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores
More informationIntroduction to Parallel Programming For Real-Time Graphics (CPU + GPU)
Introduction to Parallel Programming For Real-Time Graphics (CPU + GPU) Aaron Lefohn, Intel / University of Washington Mike Houston, AMD / Stanford 1 What s In This Talk? Overview of parallel programming
More informationCS420: Operating Systems
Threads James Moscola Department of Physical Sciences York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Threads A thread is a basic unit of processing
More information1001ICT Introduction To Programming Lecture Notes
1001ICT Introduction To Programming Lecture Notes School of Information and Communication Technology Griffith University Semester 2, 2015 1 2 Elements of Java Java is a popular, modern, third generation
More informationLecture 1: Introduction and Basics
CS 515 Programming Language and Compilers I Lecture 1: Introduction and Basics Zheng (Eddy) Zhang Rutgers University Fall 2017, 9/5/2017 Class Information Instructor: Zheng (Eddy) Zhang Email: eddyzhengzhang@gmailcom
More informationProgramming Models for Multi- Threading. Brian Marshall, Advanced Research Computing
Programming Models for Multi- Threading Brian Marshall, Advanced Research Computing Why Do Parallel Computing? Limits of single CPU computing performance available memory I/O rates Parallel computing allows
More informationCSE 539S, Spring 2015 Concepts in Mul9core Compu9ng Lecture 1: Introduc9on
CSE 539S, Spring 2015 Concepts in Mul9core Compu9ng Lecture 1: Introduc9on I- Ting Angelina Lee Jan 13, 2015 Technology Scaling 10,000,000 1,000,000 100,000 10,000 1,000 100 10 1 u Transistors x 1000 Clock
More informationTowards a codelet-based runtime for exascale computing. Chris Lauderdale ET International, Inc.
Towards a codelet-based runtime for exascale computing Chris Lauderdale ET International, Inc. What will be covered Slide 2 of 24 Problems & motivation Codelet runtime overview Codelets & complexes Dealing
More informationProgramming at Scale: Concurrency
Programming at Scale: Concurrency 1 Goal: Building Fast, Scalable Software How do we speed up software? 2 What is scalability? A system is scalable if it can easily adapt to increased (or reduced) demand
More informationTest 1: CPS 100. Owen Astrachan. October 1, 2004
Test 1: CPS 100 Owen Astrachan October 1, 2004 Name: Login: Honor code acknowledgment (signature) Problem 1 value 20 pts. grade Problem 2 30 pts. Problem 3 20 pts. Problem 4 20 pts. TOTAL: 90 pts. This
More informationParallel Programming for Graphics
Beyond Programmable Shading Course ACM SIGGRAPH 2010 Parallel Programming for Graphics Aaron Lefohn Advanced Rendering Technology (ART) Intel What s In This Talk? Overview of parallel programming models
More informationParallel Hybrid Computing Stéphane Bihan, CAPS
Parallel Hybrid Computing Stéphane Bihan, CAPS Introduction Main stream applications will rely on new multicore / manycore architectures It is about performance not parallelism Various heterogeneous hardware
More informationJava Tutorial. Saarland University. Ashkan Taslimi. Tutorial 3 September 6, 2011
Java Tutorial Ashkan Taslimi Saarland University Tutorial 3 September 6, 2011 1 Outline Tutorial 2 Review Access Level Modifiers Methods Selection Statements 2 Review Programming Style and Documentation
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More informationCarlo Cavazzoni, HPC department, CINECA
Introduction to Shared memory architectures Carlo Cavazzoni, HPC department, CINECA Modern Parallel Architectures Two basic architectural scheme: Distributed Memory Shared Memory Now most computers have
More informationParallel Languages: Past, Present and Future
Parallel Languages: Past, Present and Future Katherine Yelick U.C. Berkeley and Lawrence Berkeley National Lab 1 Kathy Yelick Internal Outline Two components: control and data (communication/sharing) One
More informationTeaching Parallel Programming using Java
Teaching Parallel Programming using Java Aamir Shafi 1, Aleem Akhtar 1, Ansar Javed 1, and Bryan Carpenter 2 Presenter: Faizan Zahid 1 1 National University of Sciences & Technology (NUST), Pakistan 2
More informationChapel Introduction and
Lecture 24 Chapel Introduction and Overview of X10 and Fortress John Cavazos Dept of Computer & Information Sciences University of Delaware www.cis.udel.edu/~cavazos/cisc879 But before that Created a simple
More informationCS 112 Introduction to Programming
CS 112 Introduction to Programming Java Primitive Data Types; Arithmetic Expressions Yang (Richard) Yang Computer Science Department Yale University 308A Watson, Phone: 432-6400 Email: yry@cs.yale.edu
More informationAdmin. CS 112 Introduction to Programming. Recap: Java Static Methods. Recap: Decomposition Example. Recap: Static Method Example
Admin CS 112 Introduction to Programming q Programming assignment 2 to be posted tonight Java Primitive Data Types; Arithmetic Expressions Yang (Richard) Yang Computer Science Department Yale University
More informationExercise: OpenMP Programming
Exercise: OpenMP Programming Multicore programming with OpenMP 19.04.2016 A. Marongiu - amarongiu@iis.ee.ethz.ch D. Palossi dpalossi@iis.ee.ethz.ch ETH zürich Odroid Board Board Specs Exynos5 Octa Cortex
More informationCS 112 Introduction to Programming
CS 112 Introduction to Programming (Spring 2012) Lecture #5: Conditionals and Loops Zhong Shao Department of Computer Science Yale University Office: 314 Watson http://flint.cs.yale.edu/cs112 Acknowledgements:
More informationCSE 613: Parallel Programming Department of Computer Science SUNY Stony Brook Spring 2015
CSE 613: Parallel Programming Department of Computer Science SUNY Stony Brook Spring 2015 We used to joke that parallel computing is the future, and always will be, but the pessimists have been proven
More informationAdministrivia. Administrivia. Administrivia. CIS 565: GPU Programming and Architecture. Meeting
CIS 565: GPU Programming and Architecture Original Slides by: Suresh Venkatasubramanian Updates by Joseph Kider and Patrick Cozzi Meeting Monday and Wednesday 6:00 7:30pm Moore 212 Recorded lectures upon
More informationFull file at
Java Programming, Fifth Edition 2-1 Chapter 2 Using Data within a Program At a Glance Instructor s Manual Table of Contents Overview Objectives Teaching Tips Quick Quizzes Class Discussion Topics Additional
More informationIntroduction II. Overview
Introduction II Overview Today we will introduce multicore hardware (we will introduce many-core hardware prior to learning OpenCL) We will also consider the relationship between computer hardware and
More informationGPUs and Emerging Architectures
GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs
More informationChapter 2 Using Data. Instructor s Manual Table of Contents. At a Glance. A Guide to this Instructor s Manual:
Java Programming, Eighth Edition 2-1 Chapter 2 Using Data A Guide to this Instructor s Manual: We have designed this Instructor s Manual to supplement and enhance your teaching experience through classroom
More informationCompsci 590.3: Introduction to Parallel Computing
Compsci 590.3: Introduction to Parallel Computing Alvin R. Lebeck Slides based on this from the University of Oregon Admin Logistics Homework #3 Use script Project Proposals Document: see web site» Due
More informationChapter 2 Using Data. Instructor s Manual Table of Contents. At a Glance. Overview. Objectives. Teaching Tips. Quick Quizzes. Class Discussion Topics
Java Programming, Sixth Edition 2-1 Chapter 2 Using Data At a Glance Instructor s Manual Table of Contents Overview Objectives Teaching Tips Quick Quizzes Class Discussion Topics Additional Projects Additional
More informationCLICK TO EDIT MASTER TITLE STYLE. Click to edit Master text styles. Second level Third level Fourth level Fifth level
CLICK TO EDIT MASTER TITLE STYLE Second level THE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION
More informationProgramming Lecture 3
Programming Lecture 3 Expressions (Chapter 3) Primitive types Aside: Context Free Grammars Constants, variables Identifiers Variable declarations Arithmetic expressions Operator precedence Assignment statements
More informationArchitecture, Programming and Performance of MIC Phi Coprocessor
Architecture, Programming and Performance of MIC Phi Coprocessor JanuszKowalik, Piotr Arłukowicz Professor (ret), The Boeing Company, Washington, USA Assistant professor, Faculty of Mathematics, Physics
More informationTable of Contents. Cilk
Table of Contents 212 Introduction to Parallelism Introduction to Programming Models Shared Memory Programming Message Passing Programming Shared Memory Models Cilk TBB HPF Chapel Fortress Stapl PGAS Languages
More informationA General Discussion on! Parallelism!
Lecture 2! A General Discussion on! Parallelism! John Cavazos! Dept of Computer & Information Sciences! University of Delaware! www.cis.udel.edu/~cavazos/cisc879! Lecture 2: Overview Flynn s Taxonomy of
More informationparallel Parallel R ANF R Vincent Miele CNRS 07/10/2015
Parallel R ANF R Vincent Miele CNRS 07/10/2015 Thinking Plan Thinking Context Principles Traditional paradigms and languages Parallel R - the foundations embarrassingly computations in R the snow heritage
More informationParallel Programming Principle and Practice. Lecture 7 Threads programming with TBB. Jin, Hai
Parallel Programming Principle and Practice Lecture 7 Threads programming with TBB Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Outline Intel Threading
More informationParallel Programming Libraries and implementations
Parallel Programming Libraries and implementations Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.
More informationECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Parallel Programming
ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Parallel Programming Copyright 2010 Daniel J. Sorin Duke University Slides are derived from work by Sarita Adve (Illinois),
More informationReview. Primitive Data Types & Variables. String Mathematical operators: + - * / % Comparison: < > <= >= == int, long float, double boolean char
Review Primitive Data Types & Variables int, long float, double boolean char String Mathematical operators: + - * / % Comparison: < > = == 1 1.3 Conditionals and Loops Introduction to Programming in
More informationEI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)
EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture) Dept. of Computer Science & Engineering Chentao Wu wuct@cs.sjtu.edu.cn Download lectures ftp://public.sjtu.edu.cn User:
More informationLecture 14: Mixed MPI-OpenMP programming. Lecture 14: Mixed MPI-OpenMP programming p. 1
Lecture 14: Mixed MPI-OpenMP programming Lecture 14: Mixed MPI-OpenMP programming p. 1 Overview Motivations for mixed MPI-OpenMP programming Advantages and disadvantages The example of the Jacobi method
More informationParallel Computer Architecture and Programming Written Assignment 3
Parallel Computer Architecture and Programming Written Assignment 3 50 points total. Due Monday, July 17 at the start of class. Problem 1: Message Passing (6 pts) A. (3 pts) You and your friend liked the
More informationJoe Hummel, PhD. Microsoft MVP Visual C++ Technical Staff: Pluralsight, LLC Professor: U. of Illinois, Chicago.
Joe Hummel, PhD Microsoft MVP Visual C++ Technical Staff: Pluralsight, LLC Professor: U. of Illinois, Chicago email: joe@joehummel.net stuff: http://www.joehummel.net/downloads.html Async programming:
More informationParallel Programming Languages COMP360
Parallel Programming Languages COMP360 The way the processor industry is going, is to add more and more cores, but nobody knows how to program those things. I mean, two, yeah; four, not really; eight,
More informationMechanized Operational Semantics
Mechanized Operational Semantics J Strother Moore Department of Computer Sciences University of Texas at Austin Marktoberdorf Summer School 2008 (Lecture 2: An Operational Semantics) 1 M1 An M1 state consists
More informationParallella: A $99 Open Hardware Parallel Computing Platform
Inventing the Future of Computing Parallella: A $99 Open Hardware Parallel Computing Platform Andreas Olofsson andreas@adapteva.com IPDPS May 22th, Cambridge, MA Adapteva Achieves 3 World Firsts 1. First
More informationOverview of research activities Toward portability of performance
Overview of research activities Toward portability of performance Do dynamically what can t be done statically Understand evolution of architectures Enable new programming models Put intelligence into
More informationCase Study: Parallelizing a Recursive Problem with Intel Threading Building Blocks
4/8/2012 2:44:00 PM Case Study: Parallelizing a Recursive Problem with Intel Threading Building Blocks Recently I have been working closely with DreamWorks Animation engineers to improve the performance
More information15-210: Parallelism in the Real World
: Parallelism in the Real World Types of paralellism Parallel Thinking Nested Parallelism Examples (Cilk, OpenMP, Java Fork/Join) Concurrency Page1 Cray-1 (1976): the world s most expensive love seat 2
More informationCSE 230. Concurrency: STM. Slides due to: Kathleen Fisher, Simon Peyton Jones, Satnam Singh, Don Stewart
CSE 230 Concurrency: STM Slides due to: Kathleen Fisher, Simon Peyton Jones, Satnam Singh, Don Stewart The Grand Challenge How to properly use multi-cores? Need new programming models! Parallelism vs Concurrency
More informationConcurrency: what, why, how
Concurrency: what, why, how May 28, 2009 1 / 33 Lecture about everything and nothing Explain basic idea (pseudo) vs. Give reasons for using Present briefly different classifications approaches models and
More informationIntroduction to Multithreaded Algorithms
Introduction to Multithreaded Algorithms CCOM5050: Design and Analysis of Algorithms Chapter VII Selected Topics T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introduction to algorithms, 3 rd
More informationKampala August, Agner Fog
Advanced microprocessor optimization Kampala August, 2007 Agner Fog www.agner.org Agenda Intel and AMD microprocessors Out Of Order execution Branch prediction Platform, 32 or 64 bits Choice of compiler
More informationCS140 Final Project. Nathan Crandall, Dane Pitkin, Introduction:
Nathan Crandall, 3970001 Dane Pitkin, 4085726 CS140 Final Project Introduction: Our goal was to parallelize the Breadth-first search algorithm using Cilk++. This algorithm works by starting at an initial
More informationHomework #3. CS 318/418/618, Fall Handout: 10/09/2017
CS 318/418/618, Fall 2017 Homework #3 Handout: 10/09/2017 1. Microsoft.NET provides a synchronization primitive called a CountdownEvent. Programs use CountdownEvent to synchronize on the completion of
More informationCopperhead: Data Parallel Python Bryan Catanzaro, NVIDIA Research 1/19
Copperhead: Data Parallel Python Bryan Catanzaro, NVIDIA Research 1/19 Motivation Not everyone wants to write C++ or Fortran How can we broaden the use of parallel computing? All of us want higher programmer
More informationParallel Programming. Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops
Parallel Programming Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops Single computers nowadays Several CPUs (cores) 4 to 8 cores on a single chip Hyper-threading
More informationTHE AUSTRALIAN NATIONAL UNIVERSITY First Semester Examination June 2011 COMP4300/6430. Parallel Systems
THE AUSTRALIAN NATIONAL UNIVERSITY First Semester Examination June 2011 COMP4300/6430 Parallel Systems Study Period: 15 minutes Time Allowed: 3 hours Permitted Materials: Non-Programmable Calculator This
More informationCOSC 462. Parallel Algorithms. The Design Basics. Piotr Luszczek
COSC 462 Parallel Algorithms The Design Basics Piotr Luszczek September 18, 2017 1/16 Levels of Abstraction 2/16 Concepts Tools Algorithms: partitioning, communication, agglomeration, mapping Domain, channel,
More informationTHE PROGRAMMER S GUIDE TO THE APU GALAXY. Phil Rogers, Corporate Fellow AMD
THE PROGRAMMER S GUIDE TO THE APU GALAXY Phil Rogers, Corporate Fellow AMD THE OPPORTUNITY WE ARE SEIZING Make the unprecedented processing capability of the APU as accessible to programmers as the CPU
More informationProgramming Parallel Computers
ICS-E4020 Programming Parallel Computers Jukka Suomela Jaakko Lehtinen Samuli Laine Aalto University Spring 2016 users.ics.aalto.fi/suomela/ppc-2016/ New code must be parallel! otherwise a computer from
More informationRecap: Assignment as an Operator CS 112 Introduction to Programming
Recap: Assignment as an Operator CS 112 Introduction to Programming q You can consider assignment as an operator, with a (Spring 2012) lower precedence than the arithmetic operators First the expression
More informationShared-memory Parallel Programming with Cilk Plus
Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 30 August 2018 Outline for Today Threaded programming
More informationConcurrency and Parallel Computing. Ric Glassey
Concurrency and Parallel Computing Ric Glassey glassey@kth.se Outline Concurrency and Parallel Computing Aim: Introduce the paradigms of concurrency and parallel computing and distinguish between them
More informationPart 1 Fine-grained Operations
Part 1 Fine-grained Operations As we learned on Monday, CMPXCHG can be used to implement other primitives, such as TestAndSet. int CMPXCHG (int* loc, int oldval, int newval) { ATOMIC(); int old_reg_val
More informationEfficient Work Stealing for Fine-Grained Parallelism
Efficient Work Stealing for Fine-Grained Parallelism Karl-Filip Faxén Swedish Institute of Computer Science November 26, 2009 Task parallel fib in Wool TASK 1( int, fib, int, n ) { if( n
More informationPreface A Brief History Pilot Test Results
Preface A Brief History In Fall, 2005, Wanda Dann and Steve Cooper, originators of the Alice approach for introductory programming (in collaboration with Randy Pausch), met with Barb Ericson and Mark Guzdial,
More informationCMSC 132: Object-Oriented Programming II
CMSC 132: Object-Oriented Programming II Synchronization in Java Department of Computer Science University of Maryland, College Park Multithreading Overview Motivation & background Threads Creating Java
More informationThreaded Programming. Lecture 9: Alternatives to OpenMP
Threaded Programming Lecture 9: Alternatives to OpenMP What s wrong with OpenMP? OpenMP is designed for programs where you want a fixed number of threads, and you always want the threads to be consuming
More informationParallel Programming in Distributed Systems Or Distributed Systems in Parallel Programming
Parallel Programming in Distributed Systems Or Distributed Systems in Parallel Programming Philippas Tsigas Chalmers University of Technology Computer Science and Engineering Department Philippas Tsigas
More informationG52CON: Concepts of Concurrency
G52CON: Concepts of Concurrency Lecture 4: Atomic Actions Natasha Alechina School of Computer Science nza@cs.nott.ac.uk Outline of the lecture process execution fine-grained atomic actions using fine-grained
More informationIntel VTune Amplifier XE for Tuning of HPC Applications Intel Software Developer Conference Frankfurt, 2017 Klaus-Dieter Oertel, Intel
Intel VTune Amplifier XE for Tuning of HPC Applications Intel Software Developer Conference Frankfurt, 2017 Klaus-Dieter Oertel, Intel Agenda Which performance analysis tool should I use first? Intel Application
More informationIT101. File Input and Output
IT101 File Input and Output IO Streams A stream is a communication channel that a program has with the outside world. It is used to transfer data items in succession. An Input/Output (I/O) Stream represents
More informationCS560 Lecture Parallel Architecture 1
Parallel Architecture Announcements The RamCT merge is done! Please repost introductions. Manaf s office hours HW0 is due tomorrow night, please try RamCT submission HW1 has been posted Today Isoefficiency
More informationCS 112 Introduction to Programming
CS 112 Introduction to Programming (Spring 2012) Lecture #7: Variable Scope, Constants, and Loops Zhong Shao Department of Computer Science Yale University Office: 314 Watson http://flint.cs.yale.edu/cs112
More informationAt the End of the Last Decade The Next Decade in Compiler Technology
At the End of the Last Decade The Next Decade in Compiler Technology An Industry Perspective Marcel Beemster marcel@ace.nl ACE Associated Compiler Experts The Customer s Toolset: Much Like this Car In
More informationCUDA GPGPU Workshop 2012
CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline
More informationAgenda. Lecture. Next discussion papers. Bottom-up motivation Shared memory primitives Shared memory synchronization Barriers and locks
Agenda Lecture Bottom-up motivation Shared memory primitives Shared memory synchronization Barriers and locks Next discussion papers Selecting Locking Primitives for Parallel Programming Selecting Locking
More informationChapter 2: Data and Expressions
Chapter 2: Data and Expressions CS 121 Department of Computer Science College of Engineering Boise State University April 21, 2015 Chapter 2: Data and Expressions CS 121 1 / 53 Chapter 2 Part 1: Data Types
More informationA General Discussion on! Parallelism!
Lecture 2! A General Discussion on! Parallelism! John Cavazos! Dept of Computer & Information Sciences! University of Delaware!! www.cis.udel.edu/~cavazos/cisc879! Lecture 2: Overview Flynn s Taxonomy
More informationModern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design
Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant
More informationSummer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics
Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Moysey Brio & Paul Dostert July 4, 2009 1 / 18 Sparse Matrices In many areas of applied mathematics and modeling, one
More informationParallel Programming on Larrabee. Tim Foley Intel Corp
Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This
More informationChapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues
Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues 4.2 Silberschatz, Galvin
More informationMarwan Burelle. Parallel and Concurrent Programming
marwan.burelle@lse.epita.fr http://wiki-prog.infoprepa.epita.fr Outline 1 2 3 OpenMP Tell Me More (Go, OpenCL,... ) Overview 1 Sharing Data First, always try to apply the following mantra: Don t share
More informationThe Mother of All Chapel Talks
The Mother of All Chapel Talks Brad Chamberlain Cray Inc. CSEP 524 May 20, 2010 Lecture Structure 1. Programming Models Landscape 2. Chapel Motivating Themes 3. Chapel Language Features 4. Project Status
More informationThe Problem with Threads
The Problem with Threads Author Edward A Lee Presented by - Varun Notibala Dept of Computer & Information Sciences University of Delaware Threads Thread : single sequential flow of control Model for concurrent
More information! Readings! ! Room-level, on-chip! vs.!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads
More informationOperating Systems (234123) Spring (Homework 3 Wet) Homework 3 Wet
Due date: Monday, 4/06/2012 12:30 noon Teaching assistants in charge: Operating Systems (234123) Spring-2012 Homework 3 Wet Anastasia Braginsky All emails regarding this assignment should be sent only
More informationConcurrency: what, why, how
Concurrency: what, why, how Oleg Batrashev February 10, 2014 what this course is about? additional experience with try out some language concepts and techniques Grading java.util.concurrent.future? agents,
More informationCSE 590o: Chapel. Brad Chamberlain Steve Deitz Chapel Team. University of Washington September 26, 2007
CSE 590o: Chapel Brad Chamberlain Steve Deitz Chapel Team University of Washington September 26, 2007 Outline Context for Chapel This Seminar Chapel Compiler CSE 590o: Chapel (2) Chapel Chapel: a new parallel
More informationAccelerating Financial Applications on the GPU
Accelerating Financial Applications on the GPU Scott Grauer-Gray Robert Searles William Killian John Cavazos Department of Computer and Information Science University of Delaware Sixth Workshop on General
More informationParallel Computing Why & How?
Parallel Computing Why & How? Xing Cai Simula Research Laboratory Dept. of Informatics, University of Oslo Winter School on Parallel Computing Geilo January 20 25, 2008 Outline 1 Motivation 2 Parallel
More informationCS61C Machine Structures. Lecture 4 C Pointers and Arrays. 1/25/2006 John Wawrzynek. www-inst.eecs.berkeley.edu/~cs61c/
CS61C Machine Structures Lecture 4 C Pointers and Arrays 1/25/2006 John Wawrzynek (www.cs.berkeley.edu/~johnw) www-inst.eecs.berkeley.edu/~cs61c/ CS 61C L04 C Pointers (1) Common C Error There is a difference
More informationModern Processor Architectures. L25: Modern Compiler Design
Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions
More informationGambler s Ruin Lesson Plan
Gambler s Ruin Lesson Plan Ron Bannon August 11, 05 1 Gambler s Ruin Lesson Plan 1 Printed August 11, 05 Prof. Kim s assigned lesson plan based on the Gambler s Ruin Problem. Preamble: This lesson plan
More informationCS 153 Design of Operating Systems Winter 2016
CS 153 Design of Operating Systems Winter 2016 Lecture 7: Synchronization Administrivia Homework 1 Due today by the end of day Hopefully you have started on project 1 by now? Kernel-level threads (preemptable
More informationCSci 4061 Introduction to Operating Systems. (Thread-Basics)
CSci 4061 Introduction to Operating Systems (Thread-Basics) Threads Abstraction: for an executing instruction stream Threads exist within a process and share its resources (i.e. memory) But, thread has
More information