Identifying Inter-task Communication in Shared Memory Programming Models. Per Larsen, Sven Karlsson and Jan Madsen
1 Identifying Inter-task Communication in Shared Memory Programming Models Per Larsen, Sven Karlsson and Jan Madsen
2 Motivation
Identifying dependencies between the serial parts of OpenMP programs is necessary to derive task graphs.
Compiler analysis by itself is insufficient.
Directives are an alternative.
The example below (1) demonstrates the limits of program analysis and shows the potential of programmer declarations.

    typedef struct list { int data; struct list *next; } list;
    list *createlist(void);
    void processlist(list *l);
    void splitclone(list *l, list **a, list **b);

    int main(void) {
        list *l, *a, *b;
        l = createlist();
        splitclone(l, &a, &b);
        #pragma omp parallel sections
        {
            #pragma omp section
            processlist(a);
            #pragma omp section
            processlist(b);
        }
    }

[Figure: task graph with the nodes main, proc a and proc b]
1. Example adapted from C. Lattner, et al. Data-Structure Analysis, PLDI '05?
DTU Informatics, Technical University of Denmark
3 Aliasing
If two symbols refer to the same memory location, they are said to be aliases.
There are two major sources of aliasing: pointers and arrays.
Pointers (*p, *q): focus of this talk!
Arrays (a[i], a[j]): future work
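To make the two sources of aliasing concrete, here is a minimal C sketch. The function names are illustrative and do not come from the slides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Writes through q, then through p, then reads through q again.
 * If p and q are aliases, the read observes the write made through p. */
int write_then_read(int *p, int *q) {
    *q = 1;
    *p = 2;
    return *q; /* 2 when p == q, otherwise 1 */
}

/* Two elements of the same array alias exactly when their indices are equal. */
bool elements_alias(const int *a, size_t i, size_t j) {
    return &a[i] == &a[j];
}
```

A compiler that cannot prove p and q distinct has to assume the first case, which is how conservative dependencies arise.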
4 Aliasing in the SplitClone Example

    typedef struct list { int data; struct list *next; } list;
    list *createlist(void);
    void processlist(list *l);

    void splitclone(list *l, list **a, list **b) {
        if (l == NULL) { *a = *b = NULL; return; }
        if (pred) { *a = l; splitclone(l->next, &(*a)->next, b); }
        else      { *b = l; splitclone(l->next, a, &(*b)->next); }
    }

    int main(void) {
        list *l, *a, *b;
        l = createlist();
        splitclone(l, &a, &b);
        #pragma omp parallel sections
        {
            #pragma omp section
            processlist(a);
            #pragma omp section
            processlist(b);
        }
    }

1) Different aliases are created on different paths.
2) Different aliases are created at different call sites.
Even this simple example requires alias analysis more advanced than those used in production compilers, and alias results will always be approximate due to undecidability!
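The split can be tried out with a self-contained sketch. The slides leave `pred` and `createlist` unspecified, so the even/odd predicate, `make_list`, and the two helpers `list_length` and `lists_share_node` are assumptions added for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct list { int data; struct list *next; } list;

/* Hypothetical predicate: even values go to list a, odd values to list b. */
static bool pred(const list *l) { return l->data % 2 == 0; }

/* Hypothetical stand-in for createlist(): builds 0 -> 1 -> ... -> n-1. */
list *make_list(int n) {
    list *head = NULL;
    for (int i = n - 1; i >= 0; i--) {
        list *node = malloc(sizeof *node);
        if (node == NULL) abort();
        node->data = i;
        node->next = head;
        head = node;
    }
    return head;
}

/* Splits l in place into two lists; the tails are terminated in the base case. */
void splitclone(list *l, list **a, list **b) {
    if (l == NULL) { *a = *b = NULL; return; }
    if (pred(l)) { *a = l; splitclone(l->next, &(*a)->next, b); }
    else         { *b = l; splitclone(l->next, a, &(*b)->next); }
}

int list_length(const list *l) {
    int n = 0;
    for (; l != NULL; l = l->next) n++;
    return n;
}

/* True if some node is reachable from both x and y. */
bool lists_share_node(const list *x, const list *y) {
    for (; x != NULL; x = x->next)
        for (const list *p = y; p != NULL; p = p->next)
            if (x == p) return true;
    return false;
}
```

With this predicate the two resulting lists are disjoint, which is the fact a compiler would have to prove before running the two processlist sections in parallel.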
5 Outline
Motivation
Aliasing
Task graph extraction from OpenMP programs
Using directives to specify dependencies
Runtime checking: approach, cost
Wrap-up
6 Task Graph Synthesis from OpenMP Programs
Goal: avoid over-approximating data dependencies when synthesizing task graphs.
Compiler analysis conservatively creates superfluous dependencies.
[Figure: serial code and the task graph derived from it, with tasks τ1 to τ8; legend: task, control-flow dependencies, data dependencies]
7 Superfluous dependencies lead to poor mapping choices
The two solutions seem equally good to a mapping tool (assuming all dependencies have equal weights).
The leftmost solution clearly achieves better data locality.
[Figure: two mappings of tasks τ1 to τ8 onto processor 0 and processor 1]
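One way to see the effect is to count the dependencies that cross processors under a given mapping. The sketch below assumes a mapping tool that scores a mapping by this count with equal edge weights. The types and the function name are hypothetical, and the slides do not say how their mapping tool works.

```c
#include <stddef.h>

/* A dependency from task `from` to task `to` (tasks are numbered from 0). */
typedef struct { int from; int to; } dep_edge;

/* Counts the dependencies whose endpoints are mapped to different processors.
 * processor_of[t] is the processor assigned to task t. */
size_t cross_processor_edges(const dep_edge *edges, size_t n_edges,
                             const int *processor_of) {
    size_t cut = 0;
    for (size_t i = 0; i < n_edges; i++) {
        if (processor_of[edges[i].from] != processor_of[edges[i].to])
            cut++;
    }
    return cut;
}
```

With only the true dependencies, a mapping that keeps communicating tasks together scores lower. Once superfluous edges are added, a worse mapping can reach the same score, so the tool can no longer tell the two apart.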
8 SplitClone Example with Declarations
Goal: compute data dependencies between tasks at compile time.
Label each task and use the labels to describe dependencies among tasks: variables may alias, labels may not!

    typedef struct list { int data; struct list *next; } list;
    list *createlist(void);
    void processlist(list *l);
    void splitclone(list *l, list **a, list **b);

    #pragma dep main out(a,proc_a) out(b,proc_b)
    int main(void) {
        list *l, *a, *b;
        l = createlist();
        splitclone(l, &a, &b);
        #pragma omp parallel sections
        {
            #pragma dep proc_a in(a,main)
            #pragma omp section
            processlist(a);
            #pragma dep proc_b in(b,main)
            #pragma omp section
            processlist(b);
        }
    }
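The declarations can be read as a table from which task graph edges are derived. The sketch below assumes that an edge exists only when the producer's out declaration and the consumer's in declaration agree. The `dep_decl` type and the `edge_declared` function are hypothetical and not part of the directive syntax shown on the slide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { DEP_IN, DEP_OUT } dep_dir;

/* One clause of a "#pragma dep" line, e.g. main out(a,proc_a). */
typedef struct {
    const char *task; /* label of the task carrying the declaration */
    dep_dir dir;      /* in(...) or out(...) */
    const char *var;  /* variable named in the clause */
    const char *peer; /* label of the task at the other end */
} dep_decl;

static bool has_decl(const dep_decl *d, size_t n, const char *task,
                     dep_dir dir, const char *var, const char *peer) {
    for (size_t i = 0; i < n; i++) {
        if (d[i].dir == dir && strcmp(d[i].task, task) == 0 &&
            strcmp(d[i].var, var) == 0 && strcmp(d[i].peer, peer) == 0)
            return true;
    }
    return false;
}

/* An edge from -> to carrying var is declared when the producer's out clause
 * and the consumer's in clause both exist and name each other. */
bool edge_declared(const dep_decl *decls, size_t n, const char *from,
                   const char *to, const char *var) {
    return has_decl(decls, n, from, DEP_OUT, var, to) &&
           has_decl(decls, n, to, DEP_IN, var, from);
}
```

Because the labels are plain names rather than pointers, this lookup needs no alias analysis.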
9 Runtime Checks of Declarations
Goal: detect mismatches between declarations and observable program behavior.
Mark objects as: 1) output of task main, 2) input of task proc_a.
Check that objects are: 1) originating from task main, 2) destined for proc_a.

    typedef struct list { int data; struct list *next; } list;
    list *createlist(void);
    void processlist(list *l);

    void splitclone(list *l, list **a, list **b) {
        if (l == NULL) { *a = *b = NULL; return; }
        if (pred) { *a = l; splitclone(l->next, &(*a)->next, b); }
        else      { *b = l; splitclone(l->next, a, &(*b)->next); }
    }

    #pragma dep main out(a,proc_a) out(b,proc_b)
    int main(void) {
        list *l, *a, *b;
        l = createlist();
        splitclone(l, &a, &b);
        #pragma omp parallel sections
        {
            #pragma dep proc_a in(a,main)
            #pragma omp section
            processlist(a);
            #pragma dep proc_b in(b,main)
            #pragma omp section
            processlist(b);
        }
    }
10 Runtime Operations
Runtime operations all access a global hash table which associates pointers (addresses) with the tasks that read and write them.
Synchronization is necessary since modifications may happen concurrently.

    operation       | arguments                     | read-lock / write-lock
    check_input     | ptr, curr_task                |
    check_output    | ptr, curr_task                |
    register_input  | ptr, tasks_reading_ptr        | ( )
    register_output | ptr, tasks_writing_ptr        | ( )
    update_input    | ptr, tasks_reading_other_ptr  | ( )
    update_output   | ptr, tasks_reading_other_ptr  | ( )
    unregister      | ptr                           |
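A rough sketch of such a table follows. It is single-threaded: the read and write locks the slide calls for are left out and only marked in comments. The fixed-size open-addressing table, the bitmask task sets, and the extra `allowed_writers` argument of `check_input` are all assumptions made to keep the example short, not the authors' implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64u /* hypothetical capacity; must be a power of two */

typedef uint32_t task_set; /* bit t set <=> task t is a member */

typedef struct {
    const void *ptr; /* NULL marks an unused slot */
    task_set readers;
    task_set writers;
} permissions;

static permissions table[TABLE_SIZE];

/* Linear probing: returns the slot holding ptr, else the first unused slot,
 * else NULL when the table is full. */
static permissions *lookup(const void *ptr) {
    size_t start = (size_t)(((uintptr_t)ptr >> 3) & (TABLE_SIZE - 1u));
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        permissions *e = &table[(start + i) & (TABLE_SIZE - 1u)];
        if (e->ptr == ptr || e->ptr == NULL) return e;
    }
    return NULL;
}

/* A concurrent version would hold the write lock in the register calls. */
bool register_input(const void *ptr, task_set readers) {
    permissions *e = lookup(ptr);
    if (e == NULL) return false;
    e->ptr = ptr;
    e->readers |= readers;
    return true;
}

bool register_output(const void *ptr, task_set writers) {
    permissions *e = lookup(ptr);
    if (e == NULL) return false;
    e->ptr = ptr;
    e->writers |= writers;
    return true;
}

/* A concurrent version would hold the read lock here.
 * Passes when curr_task is a registered reader of ptr and every registered
 * writer of ptr is in allowed_writers. */
bool check_input(const void *ptr, unsigned curr_task, task_set allowed_writers) {
    const permissions *e = lookup(ptr);
    if (e == NULL || e->ptr != ptr) return false;
    bool is_reader = ((e->readers >> curr_task) & 1u) != 0;
    bool writers_ok = (e->writers & ~allowed_writers) == 0;
    return is_reader && writers_ok;
}

/* Clears the permissions but keeps the slot so that probing stays valid. */
void unregister(const void *ptr) {
    permissions *e = lookup(ptr);
    if (e != NULL && e->ptr == ptr) {
        e->readers = 0;
        e->writers = 0;
    }
}
```

Encoding task sets as bitmasks relies on the set of tasks being known at compile time. The sketch caps it at 32 tasks.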
11 Micro-benchmark of Runtime Operations
Testbench: Intel Core i7 920, Ubuntu Linux 8.10, gcc 4.3.2, -O2
[Figure: execution time of the runtime operations in microseconds]
Overhead can be small compared to the cost of entering a parallel section.
12 Overhead in NPB Integer Sort Benchmark [4]
Benchmark kernel performing large integer sort on arrays.
Important operation in particle method codes [5].
We conservatively overestimate dependencies between tasks working on the same array, as data-decomposition is left for future work.
[4] H. Jin, et al. NPB-OpenMP 3.0, 1999
[5] D. H. Bailey, et al. The NAS Parallel Benchmarks
13 Overhead in Integer Sort Benchmark Cont'd
Testbench: Intel Core i7 920, Ubuntu Linux 8.10, gcc 4.3.2, -O2
Code contains:
    5 parallel sections
    7 barriers
    Corresponds to 12 tasks
    Eight pointers to objects shared between tasks
Runtime calls:
    19 to check_input
    11 to check_output
    19 for initialization
...no measurable overhead! (used GNU time utility and averages from 30 runs)
14 Summary
Applicability of compiler analysis in extracting task graphs has practical as well as theoretical limitations.
Programmer declarations can address the problem of pointer aliasing.
The programmer must think carefully about data sharing, so why not document this in the source code?
Benchmarks indicate that the overhead of runtime checks can be acceptable.
We are currently extending our work to handle data-decomposition.
15 Questions
Thanks for your attention! Reach me at
16 Program Analysis is Inevitably Approximate
Determining which statements are executable in a program is undecidable [1].
Control-flow assumption
[Figure: approximation by compile time analysis around the actual behavior of programs w/o pointers and arrays]
[1] H. G. Rice, Classes of Recursively Enumerable Sets and Their Decision Problems, Trans. of AMS, 1953
[2] G. Ramalingam, The Undecidability of Aliasing, TOPLAS, 1994
17 Program Analysis is Inevitably Approximate
Determining which statements are executable in a program is undecidable [1].
Control-flow assumption
Determining the targets of pointers is undecidable [2], even under the control-flow assumption.
[Figure: approximation by compile time analysis around the actual behavior of programs with pointers and arrays]
[1] H. G. Rice, Classes of Recursively Enumerable Sets and Their Decision Problems, Trans. of AMS, 1953
[2] G. Ramalingam, The Undecidability of Aliasing, TOPLAS, 1994
18 Program Analysis is Inevitably Approximate
Determining which statements are executable in a program is undecidable [1].
Control-flow assumption
Determining the targets of pointers is undecidable [2], even under the control-flow assumption.
Arbitrary type conversions mean that pointers can point almost anywhere.
Strong typing can be used at the cost of some efficiency.
[Figure: approximation by compile time analysis around the actual behavior of programs with weak typing, pointers and arrays]
[1] H. G. Rice, Classes of Recursively Enumerable Sets and Their Decision Problems, Trans. of AMS, 1953
[2] G. Ramalingam, The Undecidability of Aliasing, TOPLAS, 1994
19 Trade-offs in runtime checking pointer-based data structures
Costs of the four alternatives shown, in slide order:

    check | split | update
    O(n)  | O(1)  | O(n)
    O(n)  | O(1)  | O(1)
    O(1)  | O(n)  | O(1)
    O(1)  | O(1)  | O(1)

[Figure labels: *part, *part, permissions, C object]
Part objects can only cross task boundaries as part of a non-part object.
20 Runtime Operations Pseudocode

    hashtable: ptr → ptr_permissions
    ptr_permissions: (readers: set, writers: set)

    # tasks are compile time constants so
    # the function below can be precomputed
    def all_writers(task)
        return decl_writers(task) ∪ post_writers(task)

    def check_input(ptr, curr_task)
        acquire_read_lock()
        (readers, writers) = hashtable[ptr]
        if not(readers.contains(curr_task)) or writers eq all_writers(curr_task)
            raise_error()
        release_read_lock()

    def get_or_insert(ptr) : ptr_permissions
        acquire_read_lock()
        if hashtable.contains(ptr)
            release_read_lock()
        else
            upgrade_to_write_lock()
            hashtable[ptr] = (∅, ∅)
            release_write_lock()
        return hashtable[ptr]

    def register_input(ptr, decl_writers)
        (readers, writers) = get_or_insert(ptr)
        hashtable[ptr].writers = decl_writers

    def update_input(ptr, decl_writers_of_ptr2)
        (readers, writers) = get_or_insert(ptr)
        hashtable[ptr].writers decl_writers_of_ptr2
21 Task Graph Synthesis from OpenMP Programs
Challenge: determine data dependencies between tasks in the presence of aliasing.
[Figure: task graph with tasks τ1 to τ10, showing a serial fragment, a parallel region and barriers; legend: task, control-flow dependencies, data dependencies]
22 Points-to Analysis
Computes aliasing relations between pointers at compile time.
Points-to analysis is undecidable [1] if the language supports: if statements, loops, dynamic storage and recursive data structures.
[Figure: language popularity [2]]
[1] G. Ramalingam, The Undecidability of Aliasing, TOPLAS, 1994
[2] accessed 11/03/09
Identifying Inter-task Communication in Shared Memory Programming Models
Identifying Inter-task Communication in Shared Memory Programming Models Per Larsen, Sven Karlsson, and Jan Madsen DTU Informatics Technical University of Denmark {pl,ska,jan}@imm.dtu.dk Abstract. Modern
More informationCS 470 Spring Mike Lam, Professor. Advanced OpenMP
CS 470 Spring 2017 Mike Lam, Professor Advanced OpenMP Atomics OpenMP provides access to highly-efficient hardware synchronization mechanisms Use the atomic pragma to annotate a single statement Statement
More informationOpenCL C. Matt Sellitto Dana Schaa Northeastern University NUCAR
OpenCL C Matt Sellitto Dana Schaa Northeastern University NUCAR OpenCL C Is used to write kernels when working with OpenCL Used to code the part that runs on the device Based on C99 with some extensions
More informationM1-R4: Programing and Problem Solving using C (JAN 2019)
M1-R4: Programing and Problem Solving using C (JAN 2019) Max Marks: 100 M1-R4-07-18 DURATION: 03 Hrs 1. Each question below gives a multiple choice of answers. Choose the most appropriate one and enter
More informationCCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs
CCured Type-Safe Retrofitting of C Programs [Necula, McPeak,, Weimer, Condit, Harren] #1 One-Slide Summary CCured enforces memory safety and type safety in legacy C programs. CCured analyzes how you use
More informationParallel design patterns ARCHER course. Vectorisation and active messaging
Parallel design patterns ARCHER course Vectorisation and active messaging Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.
More informationChapter 4: Multithreaded Programming
Chapter 4: Multithreaded Programming Silberschatz, Galvin and Gagne 2013 Chapter 4: Multithreaded Programming Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading
More informationhttps://lambda.mines.edu Evaluating programming languages based on: Writability: How easy is it to write good code? Readability: How easy is it to read well written code? Is the language easy enough to
More informationIntroduction to Programming in C Department of Computer Science and Engineering. Lecture No. #33 Pointer Arithmetic
Introduction to Programming in C Department of Computer Science and Engineering Lecture No. #33 Pointer Arithmetic In this video let me, so some cool stuff which is pointer arithmetic which helps you to
More informationChapter 5: Process Synchronization. Operating System Concepts 9 th Edition
Chapter 5: Process Synchronization Silberschatz, Galvin and Gagne 2013 Chapter 5: Process Synchronization Background The Critical-Section Problem Peterson s Solution Synchronization Hardware Mutex Locks
More informationSTAPL Standard Template Adaptive Parallel Library
STAPL Standard Template Adaptive Parallel Library Lawrence Rauchwerger Antal Buss, Harshvardhan, Ioannis Papadopoulous, Olga Pearce, Timmie Smith, Gabriel Tanase, Nathan Thomas, Xiabing Xu, Mauro Bianco,
More information8. Functions (II) Control Structures: Arguments passed by value and by reference int x=5, y=3, z; z = addition ( x, y );
- 50 - Control Structures: 8. Functions (II) Arguments passed by value and by reference. Until now, in all the functions we have seen, the arguments passed to the functions have been passed by value. This
More informationCPSC 213, Winter 2009, Term 2 Midterm Exam Date: March 12, 2010; Instructor: Mike Feeley
CPSC 213, Winter 2009, Term 2 Midterm Exam Date: March 12, 2010; Instructor: Mike Feeley This is a closed book exam. No notes. Electronic calculators are permitted. Answer in the space provided. Show your
More informationCSE 307: Principles of Programming Languages
CSE 307: Principles of Programming Languages Variables and Constants R. Sekar 1 / 22 Topics 2 / 22 Variables and Constants Variables are stored in memory, whereas constants need not be. Value of variables
More informationSystem Software Assignment 1 Runtime Support for Procedures
System Software Assignment 1 Runtime Support for Procedures Exercise 1: Nested procedures Some programming languages like Oberon and Pascal support nested procedures. 1. Find a run-time structure for such
More informationRun-time Environments - 3
Run-time Environments - 3 Y.N. Srikant Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Principles of Compiler Design Outline of the Lecture n What is run-time
More informationParallel Programming
Parallel Programming OpenMP Nils Moschüring PhD Student (LMU) Nils Moschüring PhD Student (LMU), OpenMP 1 1 Overview What is parallel software development Why do we need parallel computation? Problems
More information