DAG-Calculus: A Calculus for Parallel Computation

Umut Acar (Carnegie Mellon University), Arthur Charguéraud (Inria & LRI, Université Paris Sud, CNRS), Mike Rainey (Inria), Filip Sieczkowski (Inria)
Gallium Seminar, February 2016

Parallel Computation
- Crucial for efficiency in the age of multicores
- Many different modes of use and language constructs
- The stress is generally on efficiency; the semantics are not as well understood

Fibonacci with fork-join

function fib (n) =
  if n <= 1 then n
  else let (x, y) = forkjoin (fib (n-1), fib (n-2))
       in x + y

1. Perform the two recursive calls in parallel.
2. Evaluate the result.
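A minimal runnable counterpart in OCaml 5, assuming the domainslib library (the pool setup and the choice of 3 extra domains are illustrative); as is idiomatic there, one branch is forked with Task.async while the other runs inline, and Task.await plays the role of the join:

  let rec fib pool n =
    if n <= 1 then n
    else
      (* fork one recursive call, run the other inline, then join *)
      let x = Domainslib.Task.async pool (fun () -> fib pool (n - 1)) in
      let y = fib pool (n - 2) in
      Domainslib.Task.await pool x + y

  let () =
    let pool = Domainslib.Task.setup_pool ~num_domains:3 () in
    let r = Domainslib.Task.run pool (fun () -> fib pool 30) in
    Domainslib.Task.teardown_pool pool;
    Printf.printf "fib 30 = %d\n" r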

Fibonacci with async-finish

function fib (n) =
  if n <= 1 then n
  else let (x, y) = (ref 0, ref 0) in
       finish {
         async (x := fib (n-1));
         async (y := fib (n-2))
       };
       !x + !y

1. Perform the two recursive calls in parallel.
2. Synchronise on completion of the parallel calls.
3. Read the results of the calls and compute the final result.
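Async-finish has no primitive in domainslib, but a simplified approximation collects the promises spawned through a local async handle and awaits them all at the finish point. The helper names below (finish and its async argument) are mine, and nested asyncs are tracked only if they go through the same handle:

  (* finish runs body, handing it an async that records each spawned task,
     then synchronises on all of them before returning *)
  let finish pool body =
    let pending = ref [] in
    let async f = pending := Domainslib.Task.async pool f :: !pending in
    body async;
    List.iter (Domainslib.Task.await pool) !pending

  let rec fib pool n =
    if n <= 1 then n
    else begin
      let x = ref 0 and y = ref 0 in
      finish pool (fun async ->
        async (fun () -> x := fib pool (n - 1));
        async (fun () -> y := fib pool (n - 2)));
      !x + !y
    end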

Fibonacci with futures

function fib (n) =
  if n <= 1 then n
  else let (x, y) = (future fib (n-1), future fib (n-2)) in
       (force x) + (force y)

1. Perform the two recursive calls in parallel.
2 & 3. Demand the results of the parallel calls and evaluate the final result.
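Futures map directly onto domainslib promises: Task.async schedules the body eagerly and Task.await demands its value, so future and force are thin wrappers (the wrapper names are mine, not domainslib's):

  let future pool f = Domainslib.Task.async pool f
  let force pool p = Domainslib.Task.await pool p

  let rec fib pool n =
    if n <= 1 then n
    else
      (* create both futures, then demand their results *)
      let x = future pool (fun () -> fib pool (n - 1)) in
      let y = future pool (fun () -> fib pool (n - 2)) in
      force pool x + force pool y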

Motivating questions
- Is there a unifying model or calculus that can be used to express and study different forms of parallelism?
- Can such a calculus be realised efficiently in practice as a programming language, which could serve, for example, as a target for compiling different forms of parallelism?

Parallelism patterns: fork-join, async-finish
[Diagram: fib(n) spawns fib(n-1) and fib(n-2), whose edges join into the continuation K[ ].]

Parallelism patterns: futures
[Diagram: fib(n) creates futures x and y for fib(n-1) and fib(n-2); edges run from each future to the forcing continuations K[force(x)] and K′[force(y)].]

Core idea: reify the dependency edges.

Computing with DAGs

State of the computation: (V, E, σ)
- V: a set of DAG vertices, each with an associated program and status
- E: a set of DAG edges
- σ: shared mutable state

Side conditions enforce that the edges form a DAG, constrain the status of vertices, etc.
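To make the shape of the state concrete, here is one possible OCaml rendering; the concrete choices (programs as thunks, a hashtable for σ) are assumptions for illustration, not the paper's implementation:

  type status = N | R | X | F              (* New, Released, eXecuting, Finished *)
  type exp = unit -> unit                  (* stand-in for a vertex's program *)

  module IntMap = Map.Make (Int)

  type state = {
    vertices : (exp * status) IntMap.t;    (* V: vertex id -> (program, status) *)
    edges    : (int * int) list;           (* E: (source, target) dependency edges *)
    store    : (string, int) Hashtbl.t;    (* sigma: shared mutable state *)
  }

  (* A vertex may be scheduled once it is Released and has no incoming edges. *)
  let ready st t =
    match IntMap.find_opt t st.vertices with
    | Some (_, R) -> not (List.exists (fun (_, dst) -> dst = t) st.edges)
    | _ -> false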

Manipulating the DAG

Four commands dynamically modify the computation DAG:
- newtd(e) creates a new vertex in the DAG
- newedge(e, e′) creates an edge from e to e′
- release(e) marks that the vertex e is now set up and can be scheduled
- transfer(e) is a parallel control operator that transfers the outgoing edges of the calling thread to e
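Over the state sketched above, the four commands can be read as the following pure transformations; a sketch under the same assumptions, with side conditions such as cycle-freedom and status checks elided:

  let fresh = ref 0

  (* newtd: allocate a fresh vertex with program e and status New. *)
  let newtd st e =
    incr fresh;
    let t = !fresh in
    (t, { st with vertices = IntMap.add t (e, N) st.vertices })

  (* newedge: add a dependency edge from t1 to t2 (cycle check elided). *)
  let newedge st t1 t2 =
    { st with edges = (t1, t2) :: st.edges }

  (* release: mark t as fully set up, so the scheduler may start it. *)
  let release st t =
    { st with vertices =
        IntMap.update t (Option.map (fun (e, _) -> (e, R))) st.vertices }

  (* transfer: move the outgoing edges of the calling vertex t onto t'. *)
  let transfer st t t' =
    let mine, rest = List.partition (fun (src, _) -> src = t) st.edges in
    { st with edges = List.map (fun (_, dst) -> (t', dst)) mine @ rest }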

The life-cycle of a vertex

A vertex can have one of four status values: New, Released, eXecuting, or Finished (N, R, X, F).
- A vertex created by newtd has status N.
- After release is called on it, its status changes to R.
- At this point it can be scheduled for execution, which sets the status to X.
- After its execution terminates, the status is set to F.

Operational Semantics (1)

NewTd:
  V(t) = (K[newtd e], X)    t′ fresh
  ──────────────────────────────────────────────────────────
  (V, E, σ) ↦ (V[t ↦ (K[t′], X)][t′ ↦ (e, N)], E, σ)

Release:
  V(t) = (K[release t′], X)    V(t′) = (e, N)
  ──────────────────────────────────────────────────────────
  (V, E, σ) ↦ (V[t ↦ (K[()], X)][t′ ↦ (e, R)], E, σ)

Start:
  V(t) = (e, R)    { t′ | (t′, t) ∈ E } = ∅
  ──────────────────────────────────────────────────────────
  (V, E, σ) ↦ (V[t ↦ (e, X)], E, σ)

Step:
  V(t) = (e₁, X)    (e₁, σ₁) → (e₂, σ₂)
  ──────────────────────────────────────────────────────────
  (V, E, σ₁) ↦ (V[t ↦ (e₂, X)], E, σ₂)

Stop:
  V(t) = (v, X)    E′ = E \ { (t, t′) | t′ ∈ dom(V) }
  ──────────────────────────────────────────────────────────
  (V, E, σ) ↦ (V[t ↦ ((), F)], E′, σ)
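Read operationally over the sketched state, Start moves a ready vertex to eXecuting, while Stop finishes a vertex and deletes its outgoing edges, which may unblock its successors; a sketch only:

  let start st t =
    if ready st t then
      { st with vertices =
          IntMap.update t (Option.map (fun (e, _) -> (e, X))) st.vertices }
    else st

  let stop st t =
    { st with
      (* the finished program is replaced by the unit value, as in the rule *)
      vertices = IntMap.add t ((fun () -> ()), F) st.vertices;
      edges = List.filter (fun (src, _) -> src <> t) st.edges }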

Operational Semantics (2)

NewEdge:
  V(t) = (K[newedge t₁ t₂], X)    t₁, t₂ ∈ dom(V)    status(V(t₂)) ∈ {N, R}
  E′ = E ∪ (if status(V(t₁)) = F then ∅ else {(t₁, t₂)})    E′ cycle-free
  ──────────────────────────────────────────────────────────
  (V, E, σ) ↦ (V[t ↦ (K[()], X)], E′, σ)

Transfer:
  V(t) = (K[transfer t′], X)    status(V(t′)) = N    { t″ | (t′, t″) ∈ E } = ∅
  E′ = (E \ { (t, t″) | t″ ∈ dom(V) }) ∪ { (t′, t″) | (t, t″) ∈ E }    E′ cycle-free
  ──────────────────────────────────────────────────────────
  (V, E, σ) ↦ (V[t ↦ (K[()], X)], E′, σ)

Encoding fork-join

forkjoin (e1, e2) =
  capture (fn k =>
    let l1 = alloc
        l2 = alloc
        t1 = newtd (l1 := e1)
        t2 = newtd (l2 := e2)
        t  = newtd (k (!l1, !l2))
    in newedge(t1, t); newedge(t2, t);
       transfer(t); release(t);
       release(t1); release(t2))

Encoding async-finish

The translation ⟦e⟧_t is indexed by the vertex t of the innermost enclosing finish:

⟦async(e)⟧_t =
  let t′ = newtd(⟦e⟧_t)
  in newedge(t′, t); release(t′)

⟦finish(e)⟧_t =
  capture (fn k =>
    let t2 = newtd(k ())
        t1 = newtd(⟦e⟧_t2)
    in newedge(t1, t2); transfer(t2);
       release(t2); release(t1))

Encoding futures

future(e) =
  let l  = alloc
      t  = newtd(())
      t′ = newtd(l := e)
  in newedge(t′, t); release(t); release(t′);
     (t, l)

force(e) =
  let (t, l) = e in
  capture (fn k =>
    let t′ = newtd (k (!l))
    in newedge(t, t′); transfer(t′); release(t′))

Proving the encodings correct: the technique
- Compiler-style proof of simulation
- Need backward simulation due to nondeterminism
- Problem: partially evaluated encodings do not correspond to any source terms
- Solution: an intermediate, annotated language and a two-step proof
- Keep the structure and allow partial evaluation of the parallel primitives

Also in the paper
- Scheduling DAG-calculus computations using work stealing
- Data structures for the efficient implementation of DAG edges
- Experimental evaluation of the implementation

Conclusions
- A unifying calculus: a common framework for expressing different modes of parallelism
- A low-level calculus: useful as an intermediate language/mental model rather than directly
- An efficient implementation, using novel data structures to handle high-degree vertices