Bottom-Up Multicore Programming

Bottom-Up Approach Inspiration

Robert Lee Moore, mathematician (Penn, later the University of Texas)
- Developed the "Moore Method," also known as the "Texas Method," a unique style of teaching mathematics
- Start with the axioms and have the class build up the framework during the semester
- Looking ahead is forbidden

Yale Patt, computer architect
- Renowned teacher and researcher
- Author of the "From Bits & Gates to C & Beyond" textbook

Bottom-Up Multicore Programming

We're going to follow a bottom-up approach:
- Start with the lowest-level primitives
- Build up higher abstractions using these primitives

Why?
- No-magic understanding
- Gain performance intuition

Discarded alternative #1: top-down
- Start with high-level parallelism libraries and languages, treated as black boxes
- Then try to explain how they work, and their odd performance characteristics

Discarded alternative #2: top-up

What Bottom-Up Means for You

Need to forget (for now) some things:
- Put aside any thoughts of higher-level constructs
- Forget anything you might know of TBB, Cilk, OpenMP, Blocks, Grand Central Dispatch, Ct, Java JSR-166, ZPL, NESL, Fortress, X10, Chapel, StreamIt, Brook, CUDA, OpenCL, memory consistency, etc.

Need to actively engage with the material:
- Engage in class discussion and in-class exercises
- Spend time outside of class on the material
- Embrace the homework assignments

More gritty, more effort at the start. Less spoon-feeding, more self-learning. It is the best way I know how to teach this course.

Simple Parallel Work Decomposition

Static Work Distribution

Sequential code:

    for (int i = 0; i < SIZE; i++)
        calculate(i, ...)

Parallel code, for each thread:

    void each_thread(int thread_id):
        segment_size = SIZE / number_of_threads
        assert(SIZE % number_of_threads == 0)
        my_start = thread_id * segment_size
        my_end = my_start + segment_size
        for (int i = my_start; i < my_end; i++)
            calculate(i, ...)

Hey, it's a parallel program!
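
The pseudocode above can be made concrete with a minimal C sketch using POSIX threads. SIZE, NUM_THREADS, the data array, and calculate() are illustrative placeholders (they are not named in the slides); the real loop body would do whatever per-element work the application needs.

    /* Static work distribution: each thread owns one fixed, contiguous segment. */
    #include <assert.h>
    #include <pthread.h>
    #include <stdio.h>

    #define SIZE        1024
    #define NUM_THREADS 4

    static double data[SIZE];

    /* Stand-in for the real per-element work. */
    static void calculate(int i) { data[i] = 2.0 * i; }

    static void *each_thread(void *arg) {
        int thread_id    = (int)(long)arg;
        int segment_size = SIZE / NUM_THREADS;
        int my_start     = thread_id * segment_size;
        int my_end       = my_start + segment_size;
        for (int i = my_start; i < my_end; i++)
            calculate(i);
        return NULL;
    }

    int main(void) {
        assert(SIZE % NUM_THREADS == 0);   /* the static split assumes an even division */
        pthread_t tid[NUM_THREADS];
        for (long t = 0; t < NUM_THREADS; t++)
            pthread_create(&tid[t], NULL, each_thread, (void *)t);
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_join(tid[t], NULL);
        printf("data[%d] = %f\n", SIZE - 1, data[SIZE - 1]);
        return 0;
    }

There is no coordination overhead, but if the work per element is uneven, threads that finish their segment early simply sit idle.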

Dynamic Work Distribution

Sequential code:

    for (int i = 0; i < SIZE; i++)
        calculate(i, ...)

Parallel code, for each thread:

    int counter = 0   // global variable

    void each_thread(int thread_id):
        while (true)
            int i = atomic_add(&counter, 1)
            if (i >= SIZE) return
            calculate(i, ...)

Dynamic load balancing, but high overhead.
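
A minimal C sketch of the same idea, assuming C11 atomics and pthreads; atomic_fetch_add plays the role of the slide's atomic_add and returns the counter's previous value, so each index from 0 to SIZE-1 is claimed exactly once. SIZE, NUM_THREADS, and calculate() remain illustrative placeholders.

    /* Fine-grained dynamic work distribution: threads claim one element at a time. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define SIZE        1024
    #define NUM_THREADS 4

    static double data[SIZE];
    static atomic_int counter = 0;          /* next index to hand out */

    static void calculate(int i) { data[i] = 2.0 * i; }

    static void *each_thread(void *arg) {
        (void)arg;
        while (1) {
            int i = atomic_fetch_add(&counter, 1);   /* claim one element */
            if (i >= SIZE)
                return NULL;                         /* all work handed out */
            calculate(i);
        }
    }

    int main(void) {
        pthread_t tid[NUM_THREADS];
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_create(&tid[t], NULL, each_thread, NULL);
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_join(tid[t], NULL);
        printf("data[%d] = %f\n", SIZE - 1, data[SIZE - 1]);
        return 0;
    }

The load now balances itself, but every element costs one atomic update to a shared counter, which is where the high overhead comes from.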

Coarse-Grain Dynamic Work Distribution

Parallel code, for each thread:

    int num_segments = SIZE / GRAIN_SIZE
    int counter = 0   // global variable

    void each_thread(int thread_id):
        while (true)
            int i = atomic_add(&counter, 1)
            if (i >= num_segments) return
            my_start = i * GRAIN_SIZE
            my_end = my_start + GRAIN_SIZE
            for (int j = my_start; j < my_end; j++)
                calculate(j, ...)

Dynamic load balancing with lower (adjustable) overhead.
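
A minimal C sketch of the coarse-grained version under the same assumptions (C11 atomics, pthreads, placeholder SIZE, GRAIN_SIZE, NUM_THREADS, and calculate()). Threads claim a whole segment of GRAIN_SIZE elements per atomic operation, so GRAIN_SIZE is the knob that trades load-balancing granularity against counter contention.

    /* Coarse-grained dynamic work distribution: threads claim GRAIN_SIZE elements at a time. */
    #include <assert.h>
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define SIZE        1024
    #define GRAIN_SIZE  64
    #define NUM_THREADS 4

    static double data[SIZE];
    static atomic_int counter = 0;                /* next segment index */

    static void calculate(int j) { data[j] = 2.0 * j; }

    static void *each_thread(void *arg) {
        (void)arg;
        int num_segments = SIZE / GRAIN_SIZE;
        while (1) {
            int i = atomic_fetch_add(&counter, 1);    /* claim one segment */
            if (i >= num_segments)
                return NULL;
            int my_start = i * GRAIN_SIZE;
            int my_end   = my_start + GRAIN_SIZE;
            for (int j = my_start; j < my_end; j++)
                calculate(j);
        }
    }

    int main(void) {
        assert(SIZE % GRAIN_SIZE == 0);   /* this simple sketch assumes an even split */
        pthread_t tid[NUM_THREADS];
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_create(&tid[t], NULL, each_thread, NULL);
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_join(tid[t], NULL);
        printf("data[%d] = %f\n", SIZE - 1, data[SIZE - 1]);
        return 0;
    }

With GRAIN_SIZE equal to 1 this degenerates to the fine-grained scheme above; with GRAIN_SIZE equal to SIZE / NUM_THREADS it hands out one segment per thread, much like the static split.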
