Identifying Inter-task Communication in Shared Memory Programming Models. Per Larsen, Sven Karlsson and Jan Madsen

1 Identifying Inter-task Communication in Shared Memory Programming Models Per Larsen, Sven Karlsson and Jan Madsen

2 Motivation

Identifying dependencies between serial parts of OpenMP programs is necessary to derive task graphs:
- compiler analysis by itself is insufficient
- directives are an alternative

The example below (see note 1) demonstrates the limits of program analysis and shows the potential of programmer declarations.

    typedef struct list { int data; struct list *next; } list;
    list *createlist(void);
    void processlist(list *l);
    void splitclone(list *l, list **a, list **b);

    int main(void) {
        list *l, *a, *b;
        l = createlist();
        splitclone(l, &a, &b);
        #pragma omp parallel sections
        {
            #pragma omp section
            processlist(a);
            #pragma omp section
            processlist(b);
        }
    }

[Figure: task graph with nodes main, proc a, and proc b]

1. Example adapted from C. Lattner, et al. Data-Structure Analysis, PLDI '05?

DTU Informatics, Technical University of Denmark

3 Aliasing

If two symbols refer to the same memory location, they are said to be aliases.

There are two major sources of aliasing: pointers and arrays.
- Pointers (*p, *q): the focus of this talk
- Arrays (a[i], a[j]): future work
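The definition above can be made concrete with a minimal C sketch. The function names are hypothetical and chosen only for illustration; the point is that the value read through q depends on whether p and q alias, which is exactly what a compiler must determine before it can treat two accesses as independent.

```c
#include <assert.h>
#include <stddef.h>

/* Writes through q, then through p, then reads through q.
   Returns 2 if p and q alias (same location), 1 otherwise. */
int write_then_read(int *p, int *q) {
    assert(p != NULL && q != NULL);
    *q = 1;
    *p = 2;
    return *q;
}

/* Array aliasing: a[i] and a[j] name the same element exactly when i == j. */
int array_elements_alias(const int *a, size_t i, size_t j) {
    return &a[i] == &a[j];
}
```

Calling write_then_read(&x, &x) and write_then_read(&x, &y) exercises the aliased and non-aliased cases respectively.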

4 Aliasing in the SplitClone Example

    typedef struct list { int data; struct list *next; } list;
    list *createlist(void);
    void processlist(list *l);

    void splitclone(list *l, list **a, list **b) {
        if (l == NULL) { *a = *b = NULL; return; }
        if (pred) { *a = l; splitclone(l->next, &(*a)->next, b); }
        else      { *b = l; splitclone(l->next, a, &(*b)->next); }
    }

    int main(void) {
        list *l, *a, *b;
        l = createlist();
        splitclone(l, &a, &b);
        #pragma omp parallel sections
        {
            #pragma omp section
            processlist(a);
            #pragma omp section
            processlist(b);
        }
    }

1) Different aliases are created on different paths.
2) Different aliases are created on different call sites.

Even this simple example requires alias analysis more advanced than that used in production compilers, and alias results will always be approximate due to undecidability.
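The splitclone routine on this slide can be exercised directly. The sketch below is a self-contained version under stated assumptions: pred is left unspecified on the slide, so a hypothetical predicate (even values go to list a) stands in for it, and make_list, list_length, and shares_node are illustrative helpers that do not appear in the talk. It shows what a conservative analysis struggles to prove: the two result lists share no nodes.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct list { int data; struct list *next; } list;

/* Hypothetical predicate: the slide leaves pred unspecified. */
static int pred(const list *l) { return l->data % 2 == 0; }

/* Illustrative helper: builds the list 0 -> 1 -> ... -> n-1. */
list *make_list(int n) {
    list *head = NULL;
    for (int i = n - 1; i >= 0; i--) {
        list *node = malloc(sizeof *node);
        assert(node != NULL);
        node->data = i;
        node->next = head;
        head = node;
    }
    return head;
}

/* Relinks the nodes of l into two lists, *a and *b, without copying. */
void splitclone(list *l, list **a, list **b) {
    if (l == NULL) { *a = *b = NULL; return; }
    if (pred(l)) { *a = l; splitclone(l->next, &(*a)->next, b); }
    else         { *b = l; splitclone(l->next, a, &(*b)->next); }
}

int list_length(const list *l) {
    int n = 0;
    for (; l != NULL; l = l->next) n++;
    return n;
}

/* Returns 1 if some node is reachable from both a and b. */
int shares_node(const list *a, const list *b) {
    for (const list *p = a; p != NULL; p = p->next)
        for (const list *q = b; q != NULL; q = q->next)
            if (p == q) return 1;
    return 0;
}
```

With this predicate, splitting make_list(6) yields one list holding the even values and one holding the odd values, and shares_node reports no common node, so the two processlist calls would touch disjoint memory.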

5 Outline

- Motivation
- Aliasing
- Task graph extraction from OpenMP programs
- Using directives to specify dependencies
- Runtime checking: approach and cost
- Wrap-up

6 Task Graph Synthesis from OpenMP Programs

Goal: avoid over-approximating data dependencies when synthesizing task graphs.

Compiler analysis conservatively creates superfluous dependencies.

[Figure: serial code and the task graph derived from it, with tasks τ1 to τ8 connected by control-flow dependencies and data dependencies]

7 Superfluous dependencies lead to poor mapping choices

The two solutions seem equally good to the mapping tool (assuming all dependencies have equal weights), yet the leftmost solution clearly achieves better data locality.

[Figure: two mappings of tasks τ1 to τ8 onto processor 0 and processor 1]

8 SplitClone Example with Declarations

Goal: compute data dependencies between tasks at compile time.

Label each task and use the labels to describe dependencies among tasks. Variables may alias; labels may not.

    typedef struct list { int data; struct list *next; } list;
    list *createlist(void);
    void processlist(list *l);
    void splitclone(list *l, list **a, list **b);

    #pragma dep main out(a,proc_a) out(b,proc_b)
    int main(void) {
        list *l, *a, *b;
        l = createlist();
        splitclone(l, &a, &b);
        #pragma omp parallel sections
        {
            #pragma dep proc_a in(a,main)
            #pragma omp section
            processlist(a);
            #pragma dep proc_b in(b,main)
            #pragma omp section
            processlist(b);
        }
    }
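One way to picture how such declarations could yield task-graph edges is sketched below. This is not the authors' implementation: the dep_clause record, the decls table, and has_edge are hypothetical names, and the rule that an edge needs a matching out clause and in clause is an assumption made for illustration. The table simply transcribes the four clauses from the slide.

```c
#include <assert.h>
#include <string.h>

typedef enum { DEP_OUT, DEP_IN } dep_dir;

/* One clause of a "#pragma dep" declaration (hypothetical representation). */
typedef struct {
    const char *task;  /* label of the declaring task */
    dep_dir dir;       /* out(...) or in(...) */
    const char *var;   /* variable named in the clause */
    const char *peer;  /* label of the task at the other end */
} dep_clause;

/* The four clauses declared on the slide. */
static const dep_clause decls[] = {
    { "main",   DEP_OUT, "a", "proc_a" },
    { "main",   DEP_OUT, "b", "proc_b" },
    { "proc_a", DEP_IN,  "a", "main"   },
    { "proc_b", DEP_IN,  "b", "main"   },
};
enum { NDECLS = sizeof decls / sizeof decls[0] };

/* Returns 1 if an edge producer -> consumer carrying var is declared at
   both ends (assumed matching rule), 0 otherwise. */
int has_edge(const char *producer, const char *consumer, const char *var) {
    assert(producer != NULL && consumer != NULL && var != NULL);
    int out_seen = 0, in_seen = 0;
    for (int i = 0; i < NDECLS; i++) {
        const dep_clause *d = &decls[i];
        if (strcmp(d->var, var) != 0) continue;
        if (d->dir == DEP_OUT && strcmp(d->task, producer) == 0 &&
            strcmp(d->peer, consumer) == 0) out_seen = 1;
        if (d->dir == DEP_IN && strcmp(d->task, consumer) == 0 &&
            strcmp(d->peer, producer) == 0) in_seen = 1;
    }
    return out_seen && in_seen;
}
```

Because the lookup works on task labels rather than on pointer values, it needs no alias analysis: has_edge("main", "proc_a", "a") holds, while no edge between proc_a and proc_b is declared.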

9 Runtime Checks of Declarations

Goal: detect mismatch between declarations and observable program behavior.

Mark objects as:
1) output of task main
2) input of task proc_a

Check that objects are:
1) originating from task main
2) destined for proc_a

    typedef struct list { int data; struct list *next; } list;
    list *createlist(void);
    void processlist(list *l);

    void splitclone(list *l, list **a, list **b) {
        if (l == NULL) { *a = *b = NULL; return; }
        if (pred) { *a = l; splitclone(l->next, &(*a)->next, b); }
        else      { *b = l; splitclone(l->next, a, &(*b)->next); }
    }

    #pragma dep main out(a,proc_a) out(b,proc_b)
    int main(void) {
        list *l, *a, *b;
        l = createlist();
        splitclone(l, &a, &b);
        #pragma omp parallel sections
        {
            #pragma dep proc_a in(a,main)
            #pragma omp section
            processlist(a);
            #pragma dep proc_b in(b,main)
            #pragma omp section
            processlist(b);
        }
    }

10 Runtime Operations

Runtime operations all access a global hash table which associates pointers (addresses) with the tasks that read and write them. Synchronization is necessary since modifications may happen concurrently.

    operation         arguments                       read-lock / write-lock
    check_input       ptr, curr_task
    check_output      ptr, curr_task
    register_input    ptr, tasks_reading_ptr          ( )
    register_output   ptr, tasks_writing_ptr          ( )
    update_input      ptr, tasks_reading_other_ptr    ( )
    update_output     ptr, tasks_reading_other_ptr    ( )
    unregister        ptr

11 Micro-benchmark of Runtime Operations

Testbench: Intel Core i7 920, Ubuntu Linux 8.10, gcc 4.3.2, -O2

[Figure: execution time of the runtime operations, in microseconds]

Overhead can be small compared to the cost of entering a parallel section.

12 Overhead in NPB Integer Sort Benchmark [4]

- Benchmark kernel performing large integer sort on arrays
- Important operation in particle method codes [5]
- We conservatively overestimate dependencies between tasks working on the same array, as data decomposition is left for future work

[4] H. Jin, et al. NPB-OpenMP 3.0, 1999
[5] D. H. Bailey, et al. The NAS Parallel Benchmarks

13 Overhead in Integer Sort Benchmark Cont'd

Testbench: Intel Core i7 920, Ubuntu Linux 8.10, gcc 4.3.2, -O2

Code contains:
- 5 parallel sections
- 7 barriers
- Corresponds to 12 tasks
- Eight pointers to objects shared between tasks

Runtime calls:
- 19 to check_input
- 11 to check_output
- 19 for initialization

Result: no measurable overhead (used GNU time utility and averages from 30 runs).

14 Summary

- Applicability of compiler analysis in extracting task graphs has practical as well as theoretical limitations
- Programmer declarations can address the problem of pointer aliasing
- Programmers must think carefully about data sharing, so why not document this in the source code?
- Benchmarks indicate that the overhead of runtime checks can be acceptable
- We are currently extending our work to handle data decomposition

15 Questions

Thanks for your attention! Reach me at

16 Program Analysis is Inevitably Approximate

- Determining which statements are executable in a program is undecidable [1]
  - Control-flow assumption

[Figure: approximation by compile-time analysis versus actual behavior of programs without pointers and arrays]

[1] H. G. Rice, Classes of Recursively Enumerable Sets and Their Decision Problems, Trans. of AMS, 1953
[2] G. Ramalingam, The Undecidability of Aliasing, TOPLAS, 1994

17 Program Analysis is Inevitably Approximate

- Determining which statements are executable in a program is undecidable [1]
  - Control-flow assumption
- Determining the targets of pointers is undecidable [2], even under the control-flow assumption

[Figure: approximation by compile-time analysis versus actual behavior of programs with pointers and arrays]

[1] H. G. Rice, Classes of Recursively Enumerable Sets and Their Decision Problems, Trans. of AMS, 1953
[2] G. Ramalingam, The Undecidability of Aliasing, TOPLAS, 1994

18 Program Analysis is Inevitably Approximate

- Determining which statements are executable in a program is undecidable [1]
  - Control-flow assumption
- Determining the targets of pointers is undecidable [2], even under the control-flow assumption
- Arbitrary type conversions mean that pointers can point almost anywhere
  - Strong typing can be used at the cost of some efficiency

[Figure: approximation by compile-time analysis versus actual behavior of programs with weak typing, pointers and arrays]

[1] H. G. Rice, Classes of Recursively Enumerable Sets and Their Decision Problems, Trans. of AMS, 1953
[2] G. Ramalingam, The Undecidability of Aliasing, TOPLAS, 1994

19 Trade-offs in runtime checking pointer-based data structures

[Figure: four alternatives, labeled with *part, permissions, and C object, annotated with the costs below]
- check = O(n), split = O(1), update = O(n)
- check = O(n), split = O(1), update = O(1)
- check = O(1), split = O(n), update = O(1)
- check = O(1), split = O(1), update = O(1)

Part objects can only cross task boundaries as part of a non-part object.

20 Runtime Operations Pseudocode

    hashtable: ptr → ptr_permissions
    ptr_permissions: (readers: set, writers: set)

    # tasks are compile time constants so
    # the function below can be precomputed
    def all_writers(task)
        return decl_writers(task) ∪ post_writers(task)

    def check_input(ptr, curr_task)
        acquire_read_lock()
        (readers, writers) = hashtable[ptr]
        if not(readers.contains(curr_task)) or writers eq all_writers(curr_task)
            raise_error()
        release_read_lock()

    def get_or_insert(ptr) : ptr_permissions
        acquire_read_lock()
        if hashtable.contains(ptr)
            release_read_lock()
        else
            upgrade_to_write_lock()
            hashtable[ptr] = (∅, ∅)
            release_write_lock()
        return hashtable[ptr]

    def register_input(ptr, decl_writers)
        (readers, writers) = get_or_insert(ptr)
        hashtable[ptr].writers = decl_writers

    def update_input(ptr, decl_writers_of_ptr2)
        (readers, writers) = get_or_insert(ptr)
        hashtable[ptr].writers decl_writers_of_ptr2
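The pseudocode can be approximated in C as follows. This is a minimal single-threaded sketch, not the authors' runtime. Several things are assumptions made for illustration: task sets are bitmasks, the table is a fixed-size open-addressing array, register_ptr is a hypothetical call that records writers and readers together, and check_input reports a mismatch when the current task is not a declared reader or when the recorded writers differ from the expected set. Locking is left out; comments mark where the pseudocode takes the read or write lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t task_set;                 /* bit i set <=> task i is a member */
#define TASK(i) ((task_set)1u << (i))

typedef struct {
    const void *ptr;                       /* key; NULL marks an empty slot */
    task_set readers, writers;
} ptr_permissions;

enum { TABLE_SIZE = 64 };                  /* illustrative fixed capacity */
static ptr_permissions table[TABLE_SIZE];

/* Linear probing: returns the slot holding ptr, else the first empty slot,
   else NULL when the table is full. */
static ptr_permissions *probe(const void *ptr) {
    size_t start = (size_t)((uintptr_t)ptr >> 3) % TABLE_SIZE;
    for (size_t k = 0; k < TABLE_SIZE; k++) {
        ptr_permissions *e = &table[(start + k) % TABLE_SIZE];
        if (e->ptr == ptr || e->ptr == NULL) return e;
    }
    return NULL;
}

/* Mirrors get_or_insert: a concurrent version would hold the read lock and
   upgrade to the write lock only on the insertion path. */
ptr_permissions *get_or_insert(const void *ptr) {
    assert(ptr != NULL);
    ptr_permissions *e = probe(ptr);
    if (e != NULL && e->ptr == NULL) {
        e->ptr = ptr;
        e->readers = e->writers = 0;       /* empty reader and writer sets */
    }
    return e;
}

/* Hypothetical registration: records the declared writers and readers. */
int register_ptr(const void *ptr, task_set writers, task_set readers) {
    ptr_permissions *e = get_or_insert(ptr);
    if (e == NULL) return 0;               /* table full */
    e->writers = writers;
    e->readers = readers;
    return 1;
}

/* Returns 1 if curr_task is a declared reader of ptr and the recorded
   writers equal expected_writers; 0 signals a mismatch (read lock here). */
int check_input(const void *ptr, unsigned curr_task, task_set expected_writers) {
    assert(curr_task < 32);
    const ptr_permissions *e = probe(ptr);
    if (e == NULL || e->ptr != ptr) return 0;   /* never registered */
    return (e->readers & TASK(curr_task)) != 0 && e->writers == expected_writers;
}
```

Returning a status code instead of calling raise_error keeps the sketch easy to test; a real runtime would also need the unregister operation, which linear probing without tombstones does not support.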

21 Task Graph Synthesis from OpenMP Programs

Challenge: determine data dependencies between tasks in the presence of aliasing.

[Figure: task graph with tasks τ1 to τ10, showing serial fragments, a parallel region, barriers, control-flow dependencies, and data dependencies]

22 Points-to Analysis

Computes aliasing relations between pointers at compile time.

Points-to analysis is undecidable [1] if the language supports if statements, loops, dynamic storage, and recursive data structures.

[Figure: language popularity [2]]

[1] G. Ramalingam, The Undecidability of Aliasing, TOPLAS, 1994
[2] accessed 11/03/09


More information

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find CS1622 Lecture 15 Semantic Analysis CS 1622 Lecture 15 1 Semantic Analysis How to build symbol tables How to use them to find multiply-declared and undeclared variables. How to perform type checking CS

More information

C++ (Non for C Programmer) (BT307) 40 Hours

C++ (Non for C Programmer) (BT307) 40 Hours C++ (Non for C Programmer) (BT307) 40 Hours Overview C++ is undoubtedly one of the most widely used programming language for implementing object-oriented systems. The C++ language is based on the popular

More information

5.12 EXERCISES Exercises 263

5.12 EXERCISES Exercises 263 5.12 Exercises 263 5.12 EXERCISES 5.1. If it s defined, the OPENMP macro is a decimal int. Write a program that prints its value. What is the significance of the value? 5.2. Download omp trap 1.c from

More information

Programming refresher and intro to C programming

Programming refresher and intro to C programming Applied mechatronics Programming refresher and intro to C programming Sven Gestegård Robertz sven.robertz@cs.lth.se Department of Computer Science, Lund University 2018 Outline 1 C programming intro 2

More information

Lecture 27. Pros and Cons of Pointers. Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis

Lecture 27. Pros and Cons of Pointers. Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis Pros and Cons of Pointers Lecture 27 Pointer Analysis Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis Many procedural languages have pointers

More information

Lecture 20 Pointer Analysis

Lecture 20 Pointer Analysis Lecture 20 Pointer Analysis Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis (Slide content courtesy of Greg Steffan, U. of Toronto) 15-745:

More information

Some changes in snow and R

Some changes in snow and R Some changes in snow and R Luke Tierney Department of Statistics & Actuarial Science University of Iowa December 13, 2007 Luke Tierney (U. of Iowa) Some changes in snow and R December 13, 2007 1 / 22 Some

More information

Introduction to Computer Systems /18 243, fall th Lecture, Oct. 22 th

Introduction to Computer Systems /18 243, fall th Lecture, Oct. 22 th Introduction to Computer Systems 15 213/18 243, fall 2009 16 th Lecture, Oct. 22 th Instructors: Gregory Kesden and Markus Püschel Today Dynamic memory allocation Process Memory Image %esp kernel virtual

More information

Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University Scalable Tools Workshop 7 August 2017

Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University   Scalable Tools Workshop 7 August 2017 Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University http://hpctoolkit.org Scalable Tools Workshop 7 August 2017 HPCToolkit 1 HPCToolkit Workflow source code compile &

More information

Chapter 4: Threads. Chapter 4: Threads

Chapter 4: Threads. Chapter 4: Threads Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Operating System Examples

More information

Implementing Interfaces. Marwan Burelle. July 20, 2012

Implementing Interfaces. Marwan Burelle. July 20, 2012 Implementing marwan.burelle@lse.epita.fr http://www.lse.epita.fr/ July 20, 2012 Outline 1 2 3 4 Quick Overview of System oriented programming language Variant of C with a rationnalized syntax. Syntactic

More information

Loop Modifications to Enhance Data-Parallel Performance

Loop Modifications to Enhance Data-Parallel Performance Loop Modifications to Enhance Data-Parallel Performance Abstract In data-parallel applications, the same independent

More information

DawnCC : a Source-to-Source Automatic Parallelizer of C and C++ Programs

DawnCC : a Source-to-Source Automatic Parallelizer of C and C++ Programs DawnCC : a Source-to-Source Automatic Parallelizer of C and C++ Programs Breno Campos Ferreira Guimarães, Gleison Souza Diniz Mendonça, Fernando Magno Quintão Pereira 1 Departamento de Ciência da Computação

More information

COMP26120: Linked List in C (2018/19) Lucas Cordeiro

COMP26120: Linked List in C (2018/19) Lucas Cordeiro COMP26120: Linked List in C (2018/19) Lucas Cordeiro lucas.cordeiro@manchester.ac.uk Linked List Lucas Cordeiro (Formal Methods Group) lucas.cordeiro@manchester.ac.uk Office: 2.28 Office hours: 10-11 Tuesday,

More information

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program Amdahl's Law About Data What is Data Race? Overview to OpenMP Components of OpenMP OpenMP Programming Model OpenMP Directives

More information

Advanced Compiler Construction

Advanced Compiler Construction CS 526 Advanced Compiler Construction http://misailo.cs.illinois.edu/courses/cs526 INTERPROCEDURAL ANALYSIS The slides adapted from Vikram Adve So Far Control Flow Analysis Data Flow Analysis Dependence

More information

CS 470 Spring Mike Lam, Professor. Advanced OpenMP

CS 470 Spring Mike Lam, Professor. Advanced OpenMP CS 470 Spring 2017 Mike Lam, Professor Advanced OpenMP Atomics OpenMP provides access to highly-efficient hardware synchronization mechanisms Use the atomic pragma to annotate a single statement Statement

More information

OpenCL C. Matt Sellitto Dana Schaa Northeastern University NUCAR

OpenCL C. Matt Sellitto Dana Schaa Northeastern University NUCAR OpenCL C Matt Sellitto Dana Schaa Northeastern University NUCAR OpenCL C Is used to write kernels when working with OpenCL Used to code the part that runs on the device Based on C99 with some extensions

More information

M1-R4: Programing and Problem Solving using C (JAN 2019)

M1-R4: Programing and Problem Solving using C (JAN 2019) M1-R4: Programing and Problem Solving using C (JAN 2019) Max Marks: 100 M1-R4-07-18 DURATION: 03 Hrs 1. Each question below gives a multiple choice of answers. Choose the most appropriate one and enter

More information

CCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs

CCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs CCured Type-Safe Retrofitting of C Programs [Necula, McPeak,, Weimer, Condit, Harren] #1 One-Slide Summary CCured enforces memory safety and type safety in legacy C programs. CCured analyzes how you use

More information

Parallel design patterns ARCHER course. Vectorisation and active messaging

Parallel design patterns ARCHER course. Vectorisation and active messaging Parallel design patterns ARCHER course Vectorisation and active messaging Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.

More information

Chapter 4: Multithreaded Programming

Chapter 4: Multithreaded Programming Chapter 4: Multithreaded Programming Silberschatz, Galvin and Gagne 2013 Chapter 4: Multithreaded Programming Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading

More information

https://lambda.mines.edu Evaluating programming languages based on: Writability: How easy is it to write good code? Readability: How easy is it to read well written code? Is the language easy enough to

More information

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #33 Pointer Arithmetic

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #33 Pointer Arithmetic Introduction to Programming in C Department of Computer Science and Engineering Lecture No. #33 Pointer Arithmetic In this video let me, so some cool stuff which is pointer arithmetic which helps you to

More information

Chapter 5: Process Synchronization. Operating System Concepts 9 th Edition

Chapter 5: Process Synchronization. Operating System Concepts 9 th Edition Chapter 5: Process Synchronization Silberschatz, Galvin and Gagne 2013 Chapter 5: Process Synchronization Background The Critical-Section Problem Peterson s Solution Synchronization Hardware Mutex Locks

More information

STAPL Standard Template Adaptive Parallel Library

STAPL Standard Template Adaptive Parallel Library STAPL Standard Template Adaptive Parallel Library Lawrence Rauchwerger Antal Buss, Harshvardhan, Ioannis Papadopoulous, Olga Pearce, Timmie Smith, Gabriel Tanase, Nathan Thomas, Xiabing Xu, Mauro Bianco,

More information

8. Functions (II) Control Structures: Arguments passed by value and by reference int x=5, y=3, z; z = addition ( x, y );

8. Functions (II) Control Structures: Arguments passed by value and by reference int x=5, y=3, z; z = addition ( x, y ); - 50 - Control Structures: 8. Functions (II) Arguments passed by value and by reference. Until now, in all the functions we have seen, the arguments passed to the functions have been passed by value. This

More information

CPSC 213, Winter 2009, Term 2 Midterm Exam Date: March 12, 2010; Instructor: Mike Feeley

CPSC 213, Winter 2009, Term 2 Midterm Exam Date: March 12, 2010; Instructor: Mike Feeley CPSC 213, Winter 2009, Term 2 Midterm Exam Date: March 12, 2010; Instructor: Mike Feeley This is a closed book exam. No notes. Electronic calculators are permitted. Answer in the space provided. Show your

More information

CSE 307: Principles of Programming Languages

CSE 307: Principles of Programming Languages CSE 307: Principles of Programming Languages Variables and Constants R. Sekar 1 / 22 Topics 2 / 22 Variables and Constants Variables are stored in memory, whereas constants need not be. Value of variables

More information

System Software Assignment 1 Runtime Support for Procedures

System Software Assignment 1 Runtime Support for Procedures System Software Assignment 1 Runtime Support for Procedures Exercise 1: Nested procedures Some programming languages like Oberon and Pascal support nested procedures. 1. Find a run-time structure for such

More information

Run-time Environments - 3

Run-time Environments - 3 Run-time Environments - 3 Y.N. Srikant Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Principles of Compiler Design Outline of the Lecture n What is run-time

More information

Parallel Programming

Parallel Programming Parallel Programming OpenMP Nils Moschüring PhD Student (LMU) Nils Moschüring PhD Student (LMU), OpenMP 1 1 Overview What is parallel software development Why do we need parallel computation? Problems

More information