The Implementation of Cilk-5 Multithreaded Language

1 The Implementation of Cilk-5 Multithreaded Language
By Matteo Frigo, Charles E. Leiserson, and Keith H. Randall
Presented by Martin Skou

2 The authors
Matteo Frigo
- Chief Scientist and founder of Cilk Arts
- Ph.D. in 1999 from the Department of Electrical Engineering and Computer Science at MIT
- Awards: George M. Sprowls award for outstanding doctoral dissertations in computer science at MIT
Charles E. Leiserson
- Chairman, Chief Technology Officer, and founder of Cilk Arts
- Professor of Computer Science and Engineering at MIT
- MacVicar Faculty Fellow at MIT
- ACM Fellow
Keith H. Randall
- Software engineer at Google
- Ph.D. from MIT

3 The Paper
ACM SIGPLAN Notices, 1998
- SIGPLAN is the Association for Computing Machinery's Special Interest Group on Programming Languages
Most Influential PLDI Paper Award, 2008
- Given to a paper presented at the PLDI held 10 years prior to the award year
- Includes a prize of $1,000
Programming Language Design and Implementation (PLDI)
- One of ACM SIGPLAN's most important conferences

4 Introduction to Cilk
- Language for multithreaded parallel programming based on C
- Faithful extension of C: on one processor, a Cilk program scales down to run nearly as fast as its serial C version, called the C elision
- Designed for general-purpose parallel programming
- Has a scheduler that allows the performance of programs to be estimated
- Developed since 1994 by the MIT CSAIL Supercomputing Technologies Group under the leadership of Prof. Charles Leiserson
- Funding: supported in part by NSF; previous support in part by DARPA
- Latest version is
- Designed on the work-first principle: reduce the overhead added to the work, even at the expense of the critical path

5 Introduction to Cilk
Performance of a Cilk computation
- Work: total execution time on one processor (serial execution)
- Critical-path length: execution time on an infinite number of processors
Uses a work-stealing scheduler
- In Cilk-1: the scheduler was a separate, identifiable piece of the runtime system
- In Cilk-5: the compiler and the runtime system share responsibility for scheduling
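The two measures above are what make performance estimates possible. A minimal sketch in C, assuming the usual work-stealing model in which the running time on P processors is roughly work/P plus the critical-path length (the constant factor on the critical-path term is dropped here). The function names estimate_tp and parallelism are illustrative and do not come from the slides.

```c
/* t1:   work, i.e. execution time on one processor
 * tinf: critical-path length, i.e. time on infinitely many processors
 * p:    number of processors actually used
 * Assumption: T_P is approximated by T_1/P + T_inf. */
double estimate_tp(double t1, double tinf, int p)
{
    return t1 / p + tinf;
}

/* Average parallelism: how many processors the computation can keep busy. */
double parallelism(double t1, double tinf)
{
    return t1 / tinf;
}
```

When the number of processors is much smaller than the parallelism, the work/P term dominates. That is the regime in which the work-first principle pays off.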

6 The Cilk language
Uses keywords
- cilk: identifies a function that is written in Cilk
- spawn: indicates that the procedure call it modifies can safely run in parallel with other executing code
- sync: indicates that execution of the current procedure cannot proceed until all previously spawned procedures have completed and returned their results to the parent frame
- inlet: identifies a function defined within the procedure as an inlet
- abort: can only be used inside an inlet; tells the scheduler that any other procedures spawned by the parent procedure can safely be aborted

7 Example in Cilk

    #include <stdlib.h>
    #include <stdio.h>
    #include <cilk.h>

    cilk int fib (int n)
    {
        if (n < 2) return n;
        else {
            int x, y;
            x = spawn fib (n-1);
            y = spawn fib (n-2);
            sync;
            return (x+y);
        }
    }

    cilk int main (int argc, char *argv[])
    {
        int n, result;
        n = atoi(argv[1]);
        result = spawn fib(n);
        sync;
        printf ("Result: %d\n", result);
        return 0;
    }

The same procedure written with an inlet:

    cilk int fib (int n)
    {
        int x = 0;
        inlet void summer (int result)
        {
            x += result;
            return;
        }

        if (n < 2) return n;
        else {
            summer(spawn fib (n-1));
            summer(spawn fib (n-2));
            sync;
            return (x);
        }
    }
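For comparison, the C elision of the first fib procedure is what remains when the cilk, spawn, and sync keywords are deleted. The sketch below is plain C written for illustration; it is not taken from the slides.

```c
/* C elision of the Cilk fib procedure: the keywords cilk, spawn and
 * sync are removed, leaving an ordinary serial C function. */
int fib(int n)
{
    if (n < 2) return n;
    else {
        int x, y;
        x = fib(n - 1);   /* was: x = spawn fib (n-1); */
        y = fib(n - 2);   /* was: y = spawn fib (n-2); */
        /* was: sync; */
        return (x + y);
    }
}
```

On one processor, the Cilk version is meant to run nearly as fast as this function.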

8 Cilk's compilation strategy
- The Cilk compiler is called cilk2c
- It generates two clones of each procedure
- Fast clone: little support for parallelism; operates much as the C elision does
- Slow clone: full support for parallelism, along with its attendant overhead
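A rough idea of what a fast clone adds to the C elision can be given in plain C. The sketch below is a single-threaded simulation under stated assumptions, not the code cilk2c emits: the fib_frame layout, the push and pop helpers, and the fixed-size array are all hypothetical. Before each spawn the clone records where it is and pushes its frame on the ready deque. After the call returns, it pops the frame and checks whether a thief took it in the meantime.

```c
/* Hypothetical frame: the state a thief would need to resume the
 * procedure as a slow clone. */
typedef struct {
    int entry;   /* which spawn point was reached */
    int n;       /* saved argument */
    int x, y;    /* saved results of the two spawns */
} fib_frame;

#define DEQUE_CAP 64
static fib_frame *deque[DEQUE_CAP];
static int head = 0, tail = 0;

/* Worker pushes at the tail of its own deque. */
static void push(fib_frame *f) { deque[tail++] = f; }

/* Returns 1 if the frame is still in the deque, 0 if it was stolen.
 * With no thieves in this simulation it always returns 1. */
static int pop(void)
{
    if (tail > head) { tail--; return 1; }
    return 0;
}

int fib_fast(int n)
{
    fib_frame f;
    if (n < 2) return n;

    f.n = n;
    f.entry = 1;            /* save state before the first spawn */
    push(&f);
    f.x = fib_fast(n - 1);  /* the spawn becomes an ordinary call */
    if (!pop()) return 0;   /* stolen: the thief's slow clone takes over */

    f.entry = 2;            /* save state before the second spawn */
    push(&f);
    f.y = fib_fast(n - 2);
    if (!pop()) return 0;

    /* sync needs no code here: a fast clone has no stolen children,
     * so every spawned child has already returned. */
    return f.x + f.y;
}
```

The extra cost over the C elision is the frame bookkeeping and one push and pop per spawn. That cost is the work overhead the work-first principle tries to keep small.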

9 Cilk's scheduling algorithm
Worker (processor)
- Maintains a ready deque: a double-ended queue of ready procedures, with a head and a tail
- Operates on the tail end of its own deque
- If its deque is empty (the worker is idle), it becomes a thief
Thief
- Steals procedures from other workers
- Takes from the head end of the other worker's deque
Victim
- The worker that loses a procedure to a thief

10 Cilk's scheduling algorithm
- When a procedure is spawned, it starts as a fast clone
- If the procedure is stolen, it is converted to a slow clone
- Thieves steal from the head of the deque and convert the stolen procedure to a slow clone
- Invariant: a fast clone has never been stolen, and no descendant of a fast clone has been stolen

11 Stealing procedures
Several procedures in the deque
- The thief can take a procedure and change the head to 3
- The victim can change the tail to 5 and take a procedure
One procedure left in the deque
- The victim changes the tail to 4; if no thief interferes, the victim gets the procedure
- If a thief does interfere: a thief that finds H > T stops trying; a victim that finds H > T restores the tail and tries again
Empty deque
- If a thief tries to steal, it fails
- If the victim tries to take a procedure, it fails and control returns to the runtime system
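The head and tail manipulation described above can be sketched in C. This is a simplified illustration, assuming C11 sequentially consistent atomics for H and T, a POSIX mutex as the lock, and a fixed-size array without wraparound. The names the_deque, push, pop, and steal are chosen for the example and are not the Cilk runtime's API. The victim pops at the tail without the lock in the common case and takes the lock only when it sees H > T. The thief always holds the lock.

```c
#include <pthread.h>
#include <stdatomic.h>

#define CAP 64

typedef struct {
    int items[CAP];      /* stand-in for procedure frames */
    atomic_int H, T;     /* head (thief end) and tail (victim end) */
    pthread_mutex_t L;   /* taken by thieves, and by the victim on conflict */
} the_deque;

void deque_init(the_deque *d)
{
    atomic_init(&d->H, 0);
    atomic_init(&d->T, 0);
    pthread_mutex_init(&d->L, NULL);
}

/* Victim: push at the tail. */
void push(the_deque *d, int v)
{
    int t = atomic_load(&d->T);
    d->items[t] = v;
    atomic_store(&d->T, t + 1);
}

/* Victim: pop at the tail. Returns 1 on success, 0 if the deque is
 * empty or a thief took the last item. */
int pop(the_deque *d, int *out)
{
    int t = atomic_fetch_sub(&d->T, 1) - 1;       /* T-- */
    if (atomic_load(&d->H) > t) {                 /* possible conflict */
        atomic_fetch_add(&d->T, 1);               /* restore T */
        pthread_mutex_lock(&d->L);
        t = atomic_fetch_sub(&d->T, 1) - 1;       /* try again under the lock */
        if (atomic_load(&d->H) > t) {
            atomic_fetch_add(&d->T, 1);
            pthread_mutex_unlock(&d->L);
            return 0;                             /* back to the runtime system */
        }
        pthread_mutex_unlock(&d->L);
    }
    *out = d->items[t];
    return 1;
}

/* Thief: steal at the head. Returns 1 on success, 0 on failure. */
int steal(the_deque *d, int *out)
{
    pthread_mutex_lock(&d->L);
    int h = atomic_fetch_add(&d->H, 1);           /* H++, h is the old head */
    if (h + 1 > atomic_load(&d->T)) {             /* H > T: nothing to steal */
        atomic_fetch_sub(&d->H, 1);
        pthread_mutex_unlock(&d->L);
        return 0;
    }
    *out = d->items[h];
    pthread_mutex_unlock(&d->L);
    return 1;
}
```

The design puts the locking cost on the thief, which is the rarer case, so the victim's common-case pop stays cheap.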

12 Conclusion
- Cilk-5 makes better use of the work-first principle
- Removing the Cilk keywords from a Cilk program leaves an ordinary C program (the C elision) that runs like any other C program
- It uses compiled clones and deques to improve parallel performance

13 My comments
- More difficult than first expected
- Gives a good overview of how some of the features in Cilk-5 are implemented
- The Cilk language does not seem so difficult to program in, if the user understands C

14 Your comments
Thank you

Cilk. Cilk In 2008, ACM SIGPLAN awarded Best influential paper of Decade. Cilk : Biggest principle

Cilk. Cilk In 2008, ACM SIGPLAN awarded Best influential paper of Decade. Cilk : Biggest principle CS528 Slides are adopted from http://supertech.csail.mit.edu/cilk/ Charles E. Leiserson A Sahu Dept of CSE, IIT Guwahati HPC Flow Plan: Before MID Processor + Super scalar+ Vector Unit Serial C/C++ Coding

More information

Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool

Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool Håkan Sundell School of Business and Informatics University of Borås, 50 90 Borås E-mail: Hakan.Sundell@hb.se Philippas

More information

Multithreaded Programming in Cilk. Matteo Frigo

Multithreaded Programming in Cilk. Matteo Frigo Multithreaded Programming in Cilk Matteo Frigo Multicore challanges Development time: Will you get your product out in time? Where will you find enough parallel-programming talent? Will you be forced to

More information

CILK/CILK++ AND REDUCERS YUNMING ZHANG RICE UNIVERSITY

CILK/CILK++ AND REDUCERS YUNMING ZHANG RICE UNIVERSITY CILK/CILK++ AND REDUCERS YUNMING ZHANG RICE UNIVERSITY 1 OUTLINE CILK and CILK++ Language Features and Usages Work stealing runtime CILK++ Reducers Conclusions 2 IDEALIZED SHARED MEMORY ARCHITECTURE Hardware

More information

Multithreaded Parallelism and Performance Measures

Multithreaded Parallelism and Performance Measures Multithreaded Parallelism and Performance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101

More information

Multicore programming in CilkPlus

Multicore programming in CilkPlus Multicore programming in CilkPlus Marc Moreno Maza University of Western Ontario, Canada CS3350 March 16, 2015 CilkPlus From Cilk to Cilk++ and Cilk Plus Cilk has been developed since 1994 at the MIT Laboratory

More information

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice lan Multithreaded arallelism and erformance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 1 2 cilk for Loops 3 4 Measuring arallelism in ractice 5 Announcements

More information

Multithreaded Programming in. Cilk LECTURE 1. Charles E. Leiserson

Multithreaded Programming in. Cilk LECTURE 1. Charles E. Leiserson Multithreaded Programming in Cilk LECTURE 1 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

More information

Håkan Sundell University College of Borås Parallel Scalable Solutions AB

Håkan Sundell University College of Borås Parallel Scalable Solutions AB Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool Håkan Sundell University College of Borås Parallel Scalable Solutions AB Philippas Tsigas Chalmers University of Technology

More information

CSE 260 Lecture 19. Parallel Programming Languages

CSE 260 Lecture 19. Parallel Programming Languages CSE 260 Lecture 19 Parallel Programming Languages Announcements Thursday s office hours are cancelled Office hours on Weds 2p to 4pm Jing will hold OH, too, see Moodle Scott B. Baden /CSE 260/ Winter 2014

More information

A Quick Introduction To The Intel Cilk Plus Runtime

A Quick Introduction To The Intel Cilk Plus Runtime A Quick Introduction To The Intel Cilk Plus Runtime 6.S898: Advanced Performance Engineering for Multicore Applications March 8, 2017 Adapted from slides by Charles E. Leiserson, Saman P. Amarasinghe,

More information

Plan. Introduction to Multicore Programming. Plan. University of Western Ontario, London, Ontario (Canada) Multi-core processor CPU Coherence

Plan. Introduction to Multicore Programming. Plan. University of Western Ontario, London, Ontario (Canada) Multi-core processor CPU Coherence Plan Introduction to Multicore Programming Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 1 Multi-core Architecture 2 Race Conditions and Cilkscreen (Moreno Maza) Introduction

More information

Cilk Plus: Multicore extensions for C and C++

Cilk Plus: Multicore extensions for C and C++ Cilk Plus: Multicore extensions for C and C++ Matteo Frigo 1 June 6, 2011 1 Some slides courtesy of Prof. Charles E. Leiserson of MIT. Intel R Cilk TM Plus What is it? C/C++ language extensions supporting

More information

Table of Contents. Cilk

Table of Contents. Cilk Table of Contents 212 Introduction to Parallelism Introduction to Programming Models Shared Memory Programming Message Passing Programming Shared Memory Models Cilk TBB HPF Chapel Fortress Stapl PGAS Languages

More information

Reducers and other Cilk++ hyperobjects

Reducers and other Cilk++ hyperobjects Reducers and other Cilk++ hyperobjects Matteo Frigo (Intel) ablo Halpern (Intel) Charles E. Leiserson (MIT) Stephen Lewin-Berlin (Intel) August 11, 2009 Collision detection Assembly: Represented as a tree

More information

Plan. Introduction to Multicore Programming. Plan. University of Western Ontario, London, Ontario (Canada) Marc Moreno Maza CS CS 9624

Plan. Introduction to Multicore Programming. Plan. University of Western Ontario, London, Ontario (Canada) Marc Moreno Maza CS CS 9624 Plan Introduction to Multicore Programming Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 4435 - CS 9624 1 Multi-core Architecture Multi-core processor CPU Cache CPU Coherence

More information

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice lan Multithreaded arallelism and erformance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 4435 - CS 9624 1 2 cilk for Loops 3 4 Measuring arallelism in ractice 5

More information

Atomic Transactions in Cilk Project Presentation 12/1/03

Atomic Transactions in Cilk Project Presentation 12/1/03 Atomic Transactions in Cilk 6.895 Project Presentation 12/1/03 Data Races and Nondeterminism int x = 0; 1: read x 1: write x time cilk void increment() { x = x + 1; cilk int main() { spawn increment();

More information

Efficient Work Stealing for Fine-Grained Parallelism

Efficient Work Stealing for Fine-Grained Parallelism Efficient Work Stealing for Fine-Grained Parallelism Karl-Filip Faxén Swedish Institute of Computer Science November 26, 2009 Task parallel fib in Wool TASK 1( int, fib, int, n ) { if( n

More information

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice lan Multithreaded arallelism and erformance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 02 - CS 9535 arallelism Complexity Measures 2 cilk for Loops 3 Measuring

More information

The Implementation of the Cilk-5 Multithreaded Language

The Implementation of the Cilk-5 Multithreaded Language The Implementation of the Cilk-5 Multithreaded Language Matte0 Frigo Charles E. Leiserson Keith H. Randall MIT Laboratory for Computer Science 545 Technology Square Cambridge, Massachusetts 02139 {athena,cel,randall}@lcs.mit.edu

More information

CS 240A: Shared Memory & Multicore Programming with Cilk++

CS 240A: Shared Memory & Multicore Programming with Cilk++ CS 240A: Shared Memory & Multicore rogramming with Cilk++ Multicore and NUMA architectures Multithreaded rogramming Cilk++ as a concurrency platform Work and Span Thanks to Charles E. Leiserson for some

More information

Cost Model: Work, Span and Parallelism

Cost Model: Work, Span and Parallelism CSE 539 01/15/2015 Cost Model: Work, Span and Parallelism Lecture 2 Scribe: Angelina Lee Outline of this lecture: 1. Overview of Cilk 2. The dag computation model 3. Performance measures 4. A simple greedy

More information

Shared-memory Parallel Programming with Cilk Plus

Shared-memory Parallel Programming with Cilk Plus Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 30 August 2018 Outline for Today Threaded programming

More information

Concepts in. Programming. The Multicore- Software Challenge. MIT Professional Education 6.02s Lecture 1 June 8 9, 2009

Concepts in. Programming. The Multicore- Software Challenge. MIT Professional Education 6.02s Lecture 1 June 8 9, 2009 Concepts in Multicore Programming The Multicore- Software Challenge MIT Professional Education 6.02s Lecture 1 June 8 9, 2009 2009 Charles E. Leiserson 1 Cilk, Cilk++, and Cilkscreen, are trademarks of

More information

An Overview of Parallel Computing

An Overview of Parallel Computing An Overview of Parallel Computing Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS2101 Plan 1 Hardware 2 Types of Parallelism 3 Concurrency Platforms: Three Examples Cilk CUDA

More information

The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.

The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism. Cilk Plus The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.) Developed originally by Cilk Arts, an MIT spinoff,

More information

Moore s Law. Multicore Programming. Vendor Solution. Power Density. Parallelism and Performance MIT Lecture 11 1.

Moore s Law. Multicore Programming. Vendor Solution. Power Density. Parallelism and Performance MIT Lecture 11 1. Moore s Law 1000000 Intel CPU Introductions 6.172 Performance Engineering of Software Systems Lecture 11 Multicore Programming Charles E. Leiserson 100000 10000 1000 100 10 Clock Speed (MHz) Transistors

More information

Parallelism and Performance

Parallelism and Performance 6.172 erformance Engineering of Software Systems LECTURE 13 arallelism and erformance Charles E. Leiserson October 26, 2010 2010 Charles E. Leiserson 1 Amdahl s Law If 50% of your application is parallel

More information

A Minicourse on Dynamic Multithreaded Algorithms

A Minicourse on Dynamic Multithreaded Algorithms Introduction to Algorithms December 5, 005 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik D. Demaine and Charles E. Leiserson Handout 9 A Minicourse on Dynamic Multithreaded Algorithms

More information

Lace: non-blocking split deque for work-stealing

Lace: non-blocking split deque for work-stealing Lace: non-blocking split deque for work-stealing Tom van Dijk and Jaco van de Pol Formal Methods and Tools, Dept. of EEMCS, University of Twente P.O.-box 217, 7500 AE Enschede, The Netherlands {t.vandijk,vdpol}@cs.utwente.nl

More information

Shared-memory Parallel Programming with Cilk Plus

Shared-memory Parallel Programming with Cilk Plus Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 19 January 2017 Outline for Today Threaded programming

More information

Work-Stealing by Stealing States from Live Stack Frames of a Running Application

Work-Stealing by Stealing States from Live Stack Frames of a Running Application Work-Stealing by Stealing States from Live Stack Frames of a Running Application Vivek Kumar Daniel Frampton David Grove Olivier Tardieu Stephen M. Blackburn Australian National University IBM T.J. Watson

More information

Cilk: Ecient Multithreaded Computing by Keith H. Randall Submitted to the Department of Electrical Engineering and Computer Science on May 21, 1998, i

Cilk: Ecient Multithreaded Computing by Keith H. Randall Submitted to the Department of Electrical Engineering and Computer Science on May 21, 1998, i Cilk: Ecient Multithreaded Computing by Keith H. Randall Submitted to the Department of Electrical Engineering and Computer Science in partial fulllment of the requirements for the degree of Doctor of

More information

Provably Efficient Non-Preemptive Task Scheduling with Cilk

Provably Efficient Non-Preemptive Task Scheduling with Cilk Provably Efficient Non-Preemptive Task Scheduling with Cilk V. -Y. Vee and W.-J. Hsu School of Applied Science, Nanyang Technological University Nanyang Avenue, Singapore 639798. Abstract We consider the

More information

Efficient Detection of Determinacy Race in Transactional Cilk Programs

Efficient Detection of Determinacy Race in Transactional Cilk Programs Efficient Detection of Determinacy Race in Transactional Cilk Programs Xie Yong Singapore-MIT Alliance Outline Definition determinacy race in transactional Cilk Algorithm T. E. R. D. Implementation Cilk

More information

An Implementation of Exception Handling with Collateral Task Abortion

An Implementation of Exception Handling with Collateral Task Abortion [DOI: 10.2197/ipsjjip.24.439] Regular Paper An Implementation of Exception Handling with Collateral Task Abortion Tasuku Hiraishi 1,a) Shingo Okuno 2 Masahiro Yasugi 3 Received: July 3, 2015, Accepted:

More information

Overview Parallel Algorithms. New parallel supports Interactive parallel computation? Any application is parallel :

Overview Parallel Algorithms. New parallel supports Interactive parallel computation? Any application is parallel : Overview Parallel Algorithms 2! Machine model and work-stealing!!work and depth! Design and Implementation! Fundamental theorem!! Parallel divide & conquer!! Examples!!Accumulate!!Monte Carlo simulations!!prefix/partial

More information

CS CS9535: An Overview of Parallel Computing

CS CS9535: An Overview of Parallel Computing CS4403 - CS9535: An Overview of Parallel Computing Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) January 10, 2017 Plan 1 Hardware 2 Types of Parallelism 3 Concurrency Platforms:

More information

IDENTIFYING PERFORMANCE BOTTLENECKS IN WORK-STEALING COMPUTATIONS

IDENTIFYING PERFORMANCE BOTTLENECKS IN WORK-STEALING COMPUTATIONS C OV ER F E AT U RE IDENTIFYING PERFORMANCE BOTTLENECKS IN WORK-STEALING COMPUTATIONS Nathan R. Tallent and John M. Mellor-Crummey, Rice University Work stealing is an effective load-balancing strategy

More information

Parallel Algorithms. Design and Implementation. Jean-Louis.Roch at imag.fr. MOAIS / Lab. Informatique Grenoble, INRIA, France

Parallel Algorithms. Design and Implementation. Jean-Louis.Roch at imag.fr. MOAIS / Lab. Informatique Grenoble, INRIA, France Parallel Algorithms Design and Implementation Jean-Louis.Roch at imag.fr MOAIS / Lab. Informatique Grenoble, INRIA, France 1 Overview 2 Machine model and work-stealing! Work and depth! Fundamental theorem

More information

Multithreaded Parallelism on Multicore Architectures

Multithreaded Parallelism on Multicore Architectures Multithreaded Parallelism on Multicore Architectures Marc Moreno Maza University of Western Ontario, Canada CS2101 March 2012 Plan 1 Multicore programming Multicore architectures 2 Cilk / Cilk++ / Cilk

More information

CellCilk: Extending Cilk for heterogeneous multicore platforms

CellCilk: Extending Cilk for heterogeneous multicore platforms CellCilk: Extending Cilk for heterogeneous multicore platforms Tobias Werth 1, Silvia Schreier 2, and Michael Philippsen 1 1 University of Erlangen-Nuremberg, Germany, Computer Science Department, Programming

More information

On the cost of managing data flow dependencies

On the cost of managing data flow dependencies On the cost of managing data flow dependencies - program scheduled by work stealing - Thierry Gautier, INRIA, EPI MOAIS, Grenoble France Workshop INRIA/UIUC/NCSA Outline Context - introduction of work

More information

An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware

An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware Tao Chen, Shreesha Srinath Christopher Batten, G. Edward Suh Computer Systems Laboratory School of Electrical

More information

Project 3. Building a parallelism profiler for Cilk computations CSE 539. Assigned: 03/17/2015 Due Date: 03/27/2015

Project 3. Building a parallelism profiler for Cilk computations CSE 539. Assigned: 03/17/2015 Due Date: 03/27/2015 CSE 539 Project 3 Assigned: 03/17/2015 Due Date: 03/27/2015 Building a parallelism profiler for Cilk computations In this project, you will implement a simple serial tool for Cilk programs a parallelism

More information

MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores

MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores Presented by Xiaohui Chen Joint work with Marc Moreno Maza, Sushek Shekar & Priya Unnikrishnan University of Western Ontario,

More information

Understanding Task Scheduling Algorithms. Kenjiro Taura

Understanding Task Scheduling Algorithms. Kenjiro Taura Understanding Task Scheduling Algorithms Kenjiro Taura 1 / 48 Contents 1 Introduction 2 Work stealing scheduler 3 Analyzing execution time of work stealing 4 Analyzing cache misses of work stealing 5 Summary

More information

TaskMan: Simple Task-Parallel Programming

TaskMan: Simple Task-Parallel Programming TaskMan: Simple Task-Parallel Programming Derek Hower University of Wisconsin-Madison Computer Sciences Dept. 1210 W. Dayton St. Madison, WI drh5@cs.wisc.edu Steve Jackson University of Wisconsin-Madison

More information

Multithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa

Multithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa CS4460 Advanced d Algorithms Batch 08, L4S2 Lecture 11 Multithreaded Algorithms Part 1 N. H. N. D. de Silva Dept. of Computer Science & Eng University of Moratuwa Announcements Last topic discussed is

More information

Effective Performance Measurement and Analysis of Multithreaded Applications

Effective Performance Measurement and Analysis of Multithreaded Applications Effective Performance Measurement and Analysis of Multithreaded Applications Nathan Tallent John Mellor-Crummey Rice University CSCaDS hpctoolkit.org Wanted: Multicore Programming Models Simple well-defined

More information

Cilk, Matrix Multiplication, and Sorting

Cilk, Matrix Multiplication, and Sorting 6.895 Theory of Parallel Systems Lecture 2 Lecturer: Charles Leiserson Cilk, Matrix Multiplication, and Sorting Lecture Summary 1. Parallel Processing With Cilk This section provides a brief introduction

More information

Ownership of a queue for practical lock-free scheduling

Ownership of a queue for practical lock-free scheduling Ownership of a queue for practical lock-free scheduling Lincoln Quirk May 4, 2008 Abstract We consider the problem of scheduling tasks in a multiprocessor. Tasks cannot always be scheduled independently

More information

COMP Parallel Computing. SMM (4) Nested Parallelism

COMP Parallel Computing. SMM (4) Nested Parallelism COMP 633 - Parallel Computing Lecture 9 September 19, 2017 Nested Parallelism Reading: The Implementation of the Cilk-5 Multithreaded Language sections 1 3 1 Topics Nested parallelism in OpenMP and other

More information

A Primer on Scheduling Fork-Join Parallelism with Work Stealing

A Primer on Scheduling Fork-Join Parallelism with Work Stealing Doc. No.: N3872 Date: 2014-01-15 Reply to: Arch Robison A Primer on Scheduling Fork-Join Parallelism with Work Stealing This paper is a primer, not a proposal, on some issues related to implementing fork-join

More information

Atomic Transactions in Cilk

Atomic Transactions in Cilk Atomic Transactions in Jim Sukha 12-13-03 Contents 1 Introduction 2 1.1 Determinacy Races in Multi-Threaded Programs......................... 2 1.2 Atomicity through Transactions...................................

More information

Multicore Programming Handout 1: Installing GCC Cilk Plus

Multicore Programming Handout 1: Installing GCC Cilk Plus Multicore Programming Handout 1: Installing GCC Cilk Plus Leo Ferres Department of Computer Science Universidad de Concepción Email: lferres@inf.udec.cl February 19, 2013 1 Introduction For our lab work,

More information

1 Optimizing parallel iterative graph computation

1 Optimizing parallel iterative graph computation May 15, 2012 1 Optimizing parallel iterative graph computation I propose to develop a deterministic parallel framework for performing iterative computation on a graph which schedules work on vertices based

More information

Cilk Reference Manual

Cilk Reference Manual Cilk 5.4.6 Reference Manual Supercomputing Technologies Group MIT Laboratory for Computer Science http://supertech.lcs.mit.edu/cilk Cilk is a trademark of the Massachusetts Institute of Technology. The

More information

Cilk: An Efficient Multithreaded Runtime System

Cilk: An Efficient Multithreaded Runtime System Cilk: An Efficient Multithreaded Runtime System ROBERT D. BLUMOFE, CHRISTOPHER F. JOERG, BRADLEY C. KUSZMAUL, CHARLES E. LEISERSON, KEITH H. RANDALL, AND YULI ZHOU MIT Laboratory for Computer Science,

More information

Cilk: An Efficient Multithreaded Runtime System

Cilk: An Efficient Multithreaded Runtime System Cilk: An Efficient Multithreaded Runtime System ROBERT D. BLUMOFE, CHRISTOPHER F. JOERG, BRADLEY C. KUSZMAUL, CHARLES E. LEISERSON, KEITH H. RANDALL, AND YULI ZHOU MIT Laboratory for Computer Science,

More information

Compsci 590.3: Introduction to Parallel Computing

Compsci 590.3: Introduction to Parallel Computing Compsci 590.3: Introduction to Parallel Computing Alvin R. Lebeck Slides based on this from the University of Oregon Admin Logistics Homework #3 Use script Project Proposals Document: see web site» Due

More information

Dynamic Processor Allocation for Adaptively Parallel Work-Stealing Jobs. Siddhartha Sen

Dynamic Processor Allocation for Adaptively Parallel Work-Stealing Jobs. Siddhartha Sen Dynamic Processor Allocation for Adaptively Parallel Work-Stealing Jobs by Siddhartha Sen Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements

More information

The JCilk Multithreaded Language. I-Ting Angelina Lee

The JCilk Multithreaded Language. I-Ting Angelina Lee The JCilk Multithreaded Language by I-Ting Angelina Lee Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of

More information

CSCE 313 Introduction to Computer Systems. Instructor: Dezhen Song

CSCE 313 Introduction to Computer Systems. Instructor: Dezhen Song CSCE 313 Introduction to Computer Systems Instructor: Dezhen Song Programs, Processes, and Threads Programs and Processes Threads Programs, Processes, and Threads Programs and Processes Threads Processes

More information

CSE 613: Parallel Programming

CSE 613: Parallel Programming CSE 613: Parallel Programming Lecture 3 ( The Cilk++ Concurrency Platform ) ( inspiration for many slides comes from talks given by Charles Leiserson and Matteo Frigo ) Rezaul A. Chowdhury Department of

More information

CSCE 313: Intro to Computer Systems

CSCE 313: Intro to Computer Systems CSCE 313 Introduction to Computer Systems Instructor: Dr. Guofei Gu http://courses.cse.tamu.edu/guofei/csce313/ Programs, Processes, and Threads Programs and Processes Threads 1 Programs, Processes, and

More information

Reading Assignment 4. n Chapter 4 Threads, due 2/7. 1/31/13 CSE325 - Processes 1

Reading Assignment 4. n Chapter 4 Threads, due 2/7. 1/31/13 CSE325 - Processes 1 Reading Assignment 4 Chapter 4 Threads, due 2/7 1/31/13 CSE325 - Processes 1 What s Next? 1. Process Concept 2. Process Manager Responsibilities 3. Operations on Processes 4. Process Scheduling 5. Cooperating

More information

Adaptively Parallel Processor Allocation for Cilk Jobs

Adaptively Parallel Processor Allocation for Cilk Jobs 6.895 Theory of Parallel Systems Kunal Agrawal, Siddhartha Sen Final Report Adaptively Parallel Processor Allocation for Cilk Jobs Abstract An adaptively parallel job is one in which the number of processors

More information

Speculative Parallelism in Cilk++

Speculative Parallelism in Cilk++ Speculative Parallelism in Cilk++ Ruben Perez MIT rmperez@mit.edu Gregory Malecha Harvard University SEAS gmalecha@cs.harvard.edu ABSTRACT Backtracking search algorithms are useful in many domains, from

More information

Computer Systems Assignment 2: Fork and Threads Package

Computer Systems Assignment 2: Fork and Threads Package Autumn Term 2018 Distributed Computing Computer Systems Assignment 2: Fork and Threads Package Assigned on: October 5, 2018 Due by: October 12, 2018 1 Understanding fork() and exec() Creating new processes

More information

Scheduling Parallel Programs by Work Stealing with Private Deques

Scheduling Parallel Programs by Work Stealing with Private Deques Scheduling Parallel Programs by Work Stealing with Private Deques Umut Acar Carnegie Mellon University Arthur Charguéraud INRIA Mike Rainey Max Planck Institute for Software Systems PPoPP 25.2.2013 1 Scheduling

More information

Shared-memory Parallel Programming with Cilk Plus (Parts 2-3)

Shared-memory Parallel Programming with Cilk Plus (Parts 2-3) Shared-memory Parallel Programming with Cilk Plus (Parts 2-3) John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 5-6 24,26 January 2017 Last Thursday

More information

Introduction to Multithreaded Algorithms

Introduction to Multithreaded Algorithms Introduction to Multithreaded Algorithms CCOM5050: Design and Analysis of Algorithms Chapter VII Selected Topics T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introduction to algorithms, 3 rd

More information
