Safe Parallel Language Extensions for Ada 202X

Tucker Taft, AdaCore

Safe Parallel Language Extensions for Ada 202X
Tucker Taft, AdaCore
Co-authors of the HILT 2014 paper: Brad Moore, Luís Miguel Pinho, Stephen Michell
June 2014
www.adacore.com

HILT 2014: co-located with OOPSLA/SPLASH in Portland, OR, celebrating the 20th anniversary of the completion of Ada 9X. www.sigada.org/conf/hilt2014

Safe Parallel Ada 2

The Right Turn in Single-Processor Performance (courtesy IEEE Computer, January 2011, page 33)

Titan Supercomputer at Oak Ridge National Lab in the US: distributed computing with 18,688 nodes, each a multicore (16 cores) with a vector unit, plus a GPU with 64 warps of 32 lanes.

Example of a parallel programming language (ParaSail) with implicit parallelism for divide-and-conquer:

   func Word_Count (S : Univ_String;
                    Separators : Countable_Set<Univ_Character> := [' '])
     -> Univ_Integer is
     // Return count of words separated by given set of separators
     case |S| of
      [0] => return 0    // Empty string
      [1] =>
        if S[1] in Separators then
          return 0       // A single separator
        else
          return 1       // A single non-separator
        end if
      [..] =>            // Multi-character string; divide and conquer
        const Half_Len := |S| / 2
        const Sum := Word_Count (S[1 .. Half_Len], Separators) +
                     Word_Count (S[Half_Len <.. |S|], Separators)
        if S[Half_Len] in Separators
          or else S[Half_Len + 1] in Separators then
          return Sum     // At least one separator at border
        else
          return Sum - 1 // Combine words at border
        end if
     end case
   end func Word_Count

Count words in a string, given a set of separators, using divide-and-conquer (rather than a sequential scan):

   S: This is a test, but it's a bit boring.
               1111111111222222222233333333
      12345678901234567890123456789012345678

   Separators: [' ', ',', '.']

   Word_Count (S, Separators) == ?

   |S| == 38     // |...| means magnitude
   Half_Len == 19
   Word_Count (S[1 .. 19], Separators) == 5
   Word_Count (S[19 <.. 38], Separators) == 4
   Sum == 9      // X <.. Y means (X, Y]
   S[19] == 't'  // 't' is not in Separators
   S[19+1] == ' ' // ' ' is in Separators
   return 9
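The same divide-and-conquer can be sketched in executable form. The following Python `word_count` helper (a hypothetical name, not from the slides) mirrors the ParaSail algorithm; the two recursive calls are the independent pieces of work that the proposed Ada 202X tasklets would run in parallel:

```python
def word_count(s, separators):
    """Count words in s (maximal runs of non-separator characters),
    by divide-and-conquer, mirroring the ParaSail Word_Count example."""
    n = len(s)
    if n == 0:
        return 0                      # empty string
    if n == 1:
        return 0 if s in separators else 1
    half = n // 2
    # The two recursive calls touch disjoint slices, so they are
    # independent; here they run sequentially.
    total = (word_count(s[:half], separators) +
             word_count(s[half:], separators))
    # If a word spans the border, each half counted it once: subtract 1.
    if s[half - 1] in separators or s[half] in separators:
        return total                  # at least one separator at border
    return total - 1                  # combine words at border

print(word_count("This is a test, but it's a bit boring.", set(" ,.")))  # → 9
```

This matches the worked example above: 5 words in the left half, 4 in the right, and a separator at the border, so 9 total.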

Word_Count example in Ada 2012 (simple cases, plus starting the outer task):

   function Word_Count (S : String; Separators : String) return Natural is
      use Ada.Strings.Maps;
      Seps : constant Character_Set := To_Set (Separators);

      task type TT (First, Last : Natural; Count : access Natural);
      subtype WC_TT is TT;  -- So the type is usable inside the body of TT

      task body TT is
      begin
         if First > Last then       -- Empty string
            Count.all := 0;
         elsif First = Last then    -- A single character
            if Is_In (S (First), Seps) then
               Count.all := 0;      -- A single separator
            else
               Count.all := 1;      -- A single non-separator
            end if;
         else                       -- Divide and conquer
            ...                     -- See next slide
         end if;
      end TT;

      Result : aliased Natural := 0;
   begin
      declare  -- Spawn task to do the computation
         Tsk : TT (S'First, S'Last, Result'Access);
      begin
         null;
      end;  -- Wait for subtask
      return Result;
   end Word_Count;

Word_Count example in Ada 2012, continued (divide and conquer):

   function Word_Count (S : String; Separators : String) return Natural is
      use Ada.Strings.Maps;
      Seps : constant Character_Set := To_Set (Separators);

      task type TT (First, Last : Natural; Count : access Natural);
      subtype WC_TT is TT;  -- So the type is usable inside the body of TT

      task body TT is
      begin
         if ... then  -- Simple cases (see previous slide)
         else         -- Divide and conquer
            declare
               Midpoint : constant Positive := (First + Last) / 2;
               Left_Count, Right_Count : aliased Natural := 0;
            begin
               declare  -- Spawn two subtasks for distinct slices
                  Left  : WC_TT (First, Midpoint, Left_Count'Access);
                  Right : WC_TT (Midpoint + 1, Last, Right_Count'Access);
               begin
                  null;
               end;  -- Wait for subtasks to complete

               if Is_In (S (Midpoint), Seps) or else
                  Is_In (S (Midpoint + 1), Seps)
               then  -- At least one separator at border
                  Count.all := Left_Count + Right_Count;
               else  -- Combine words at border
                  Count.all := Left_Count + Right_Count - 1;
               end if;
            end;
         end if;
      end TT;

      ...  -- See previous slide
   end Word_Count;

Word_Count example in (hypothetical) Ada 202X, with the simple cases and divide-and-conquer in a single function:

   function Word_Count (S : String; Separators : String) return Natural
     with Global => null, Potentially_Blocking => False
   is
      use Ada.Strings.Maps;
      Seps : constant Character_Set := To_Set (Separators);
   begin
      case S'Length is
         when 0 => return 0;  -- Empty string
         when 1 =>            -- A single character
            if Is_In (S (S'First), Seps) then
               return 0;      -- A single separator
            else
               return 1;      -- A single non-separator
            end if;
         when others =>
            declare  -- Divide and conquer
               Midpoint : constant Positive := (S'First + S'Last) / 2;
               Left_Count, Right_Count : Natural;
            begin
               parallel  -- Spawn two tasklets for distinct slices
                  Left_Count := Word_Count (S (S'First .. Midpoint), Separators);
               and
                  Right_Count := Word_Count (S (Midpoint + 1 .. S'Last), Separators);
               end parallel;  -- Wait for tasklets to complete

               if Is_In (S (Midpoint), Seps) or else
                  Is_In (S (Midpoint + 1), Seps)
               then  -- At least one separator at border
                  return Left_Count + Right_Count;
               else  -- Combine words at border
                  return Left_Count + Right_Count - 1;
               end if;
            end;
      end case;
   end Word_Count;

Parallel Block

   parallel
      sequence_of_statements
   {and
      sequence_of_statements}
   end parallel;

Each alternative is an explicitly specified parallelism opportunity (POp), where the compiler may create a tasklet. A tasklet can be executed by an execution server while still running in the context of the enclosing task (same task identity, attributes, etc.). The compiler will complain if any data races or blocking are possible (using the Global and Potentially_Blocking aspect information).

cf. ARM 9, Note 1: "Whenever an implementation can determine that the required semantic effects can be achieved when parts of the execution of a given task are performed by different physical processors acting in parallel, it may choose to perform them in this way."
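The parallel block is a proposal with no direct Ada 2012 equivalent, but its fork-join shape can be sketched in Python. The `left`/`right` work functions below are hypothetical stand-ins for the two arms of a parallel block:

```python
from concurrent.futures import ThreadPoolExecutor

# Two independent "alternatives", analogous to the arms of
#    parallel ... and ... end parallel;
def left():
    return sum(x * x for x in range(1_000))

def right():
    return sum(x * x for x in range(1_000, 2_000))

with ThreadPoolExecutor(max_workers=2) as pool:
    f_left, f_right = pool.submit(left), pool.submit(right)
    # "end parallel" is the join point: both arms complete before we continue
    total = f_left.result() + f_right.result()

print(total)  # → 2664667000
```

Unlike the Ada proposal, nothing here statically checks the arms for data races or blocking; that checking is precisely what the Global and Potentially_Blocking aspects are meant to enable.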

Global (cf. SPARK) and Potentially_Blocking aspects

   Global => all  -- default within non-pure packages

   -- Explicitly identified globals with modes (SPARK 2014):
   Global => (Input  => (P1.A, P2.B),
              In_Out => P1.C,
              Output => (P1.D, P2.E))

   -- Package private data, access collections, task/protected/atomic data:
   Global => (In_Out => P3)            -- pkg P3 private data
   Global => (In_Out => P1.Acc_Type)   -- access collection of an access type
   Global => (In_Out => synchronized)  -- task/protected/atomic data

   Global => null  -- default within pure packages

   Potentially_Blocking [=> True | False]

Parallel Loop

   for I in parallel 1 .. 1_000 loop
      A (I) := B (I) + C (I);
   end loop;

   for Elem of parallel Arr loop
      Elem := Elem * 2;
   end loop;

A parallel loop is equivalent to a parallel block obtained by unrolling the loop, with each iteration as a separate alternative of the parallel block. The compiler will complain if the iterations are not independent or might block (again, using the Global/Potentially_Blocking aspects).
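A rough executable model of `for I in parallel 1 .. 1_000 loop`, using a hypothetical `parallel_for` helper (0-based here, and with a fixed chunk count; both choices are illustrative, not part of the proposal):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(n, body, chunks=4):
    """Rough analogue of a parallel loop: split the index range into
    chunks; each chunk runs sequentially inside one worker, and the
    call returns only after all chunks finish."""
    bounds = [(i * n // chunks, (i + 1) * n // chunks) for i in range(chunks)]
    def run(lo_hi):
        lo, hi = lo_hi
        for i in range(lo, hi):
            body(i)
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        list(pool.map(run, bounds))  # join: wait for every chunk

# A(I) := B(I) + C(I); the iterations are independent, since each index
# writes a distinct element of A.
B = list(range(1000))
C = list(range(1000))
A = [0] * 1000

def body(i):
    A[i] = B[i] + C[i]

parallel_for(1000, body)
```

The independence of iterations is asserted by the programmer here; in the proposal it would be verified by the compiler from the Global aspect information.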

Wonderfully simple and obvious, but what about...?

Exiting the block/loop, or a return statement?
   All other tasklets are aborted (need not be preemptive) and awaited; then, in the case of a return with an expression, the expression is evaluated, and finally the exit/return takes place. With multiple concurrent exits/returns, one is chosen arbitrarily and the others are aborted.

With a very big range or array to be looped over, wouldn't that create a huge number of tasklets?
   The compiler may choose to "chunk" the loop into subloops, where each subloop becomes a tasklet (the subloop runs sequentially within its tasklet).

What if the iterations are not completely independent, but could become so by creating multiple accumulators?
   We provide the notion of a parallel array of such accumulators (next slide).

Parallel arrays of accumulators; Map/Reduce

   declare
      Partial : array (parallel <>) of Float := (others => 0.0);
      Sum_Of_Squares : Float := 0.0;
   begin
      for E of parallel Arr loop  -- Map and partial reduction
         Partial (<>) := Partial (<>) + E ** 2;
      end loop;

      for I in Partial'Range loop  -- Final reduction step
         Sum_Of_Squares := Sum_Of_Squares + Partial (I);
      end loop;

      Put_Line ("Sum of squares of elements of Arr =" &
                Float'Image (Sum_Of_Squares));
   end;

The bounds of a parallel array declared with <> are set to match the number of chunks of the parallel loop in which it is used with (<>) indices; the bounds may also be specified explicitly.
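The accumulator-per-chunk idea can be sketched in Python: each chunk owns one partial sum, playing the role of `Partial(<>)`, followed by a sequential final reduction. The `sum_of_squares` helper name and the chunk count are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(arr, chunks=4):
    """Map and partial reduction per chunk, then a final reduction.
    Each chunk writes only its own accumulator, so there is no data
    race between chunks (the point of the parallel-array proposal)."""
    n = len(arr)
    bounds = [(i * n // chunks, (i + 1) * n // chunks) for i in range(chunks)]
    def partial(lo_hi):
        lo, hi = lo_hi
        return sum(e * e for e in arr[lo:hi])  # this chunk's accumulator
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        partials = list(pool.map(partial, bounds))
    return sum(partials)  # final reduction step

print(sum_of_squares([1.0, 2.0, 3.0, 4.0]))  # → 30.0
```

Note that, as on the slide, a naive version with one shared accumulator would be a data race; giving each chunk its own accumulator is what makes the loop iterations independent.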

Map/Reduce shorthand

The final reduction step will often look the same:

   Total := <identity>;
   for I in Partial'Range loop
      Total := <op> (Total, Partial (I));
   end loop;

So we provide an attribute function Reduced to do this:

   Total := Partial'Reduced (Reducer => "+", Identity => 0.0);
      or
   Total := Partial'Reduced;  -- Reducer and Identity defaulted

The Reduced attribute may be applied to any array when Reducer and Identity are specified explicitly, and it may be implemented using a tree of parallel reductions.
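A sketch of what an implementation of the proposed 'Reduced attribute might do: a pairwise tree reduction over the partials. In a real implementation the combines at each level could run in parallel (the tree halves the depth); this sketch runs them sequentially. The `reduced` function name is hypothetical:

```python
def reduced(parts, op, identity):
    """Combine a list of partial results with a binary operator,
    tree-style: pair up neighbours level by level until one remains."""
    parts = list(parts)
    if not parts:
        return identity          # empty array reduces to the identity
    while len(parts) > 1:
        # pair up neighbours; a trailing odd element passes through
        parts = [op(parts[i], parts[i + 1]) if i + 1 < len(parts) else parts[i]
                 for i in range(0, len(parts), 2)]
    return parts[0]

print(reduced([1.0, 2.0, 3.0, 4.0, 5.0], lambda a, b: a + b, 0.0))  # → 15.0
```

For an associative Reducer the tree and the sequential loop produce the same result, which is what licenses the parallel implementation.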


Summary

- Parallel programming constructs can simplify taking advantage of modern multi-/manycore hardware.
- Parallel block and parallel loop constructs are natural solutions for Ada.
- The Global (cf. SPARK 2014) and Potentially_Blocking aspects enable the compiler to check for data races and blocking.
- Parallel arrays and the Reduced attribute simplify map/reduce sorts of computations.

Please submit extended abstracts to HILT 2014 by July 5, and come to Portland, OR!