An Exascale Programming, Multi-objective Optimisation and Resilience Management Environment Based on Nested Recursive Parallelism


1 This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No
An Exascale Programming, Multi-objective Optimisation and Resilience Management Environment Based on Nested Recursive Parallelism
AllScale
Enable developers to be productive and to port their applications to any scale of system
Thomas Fahringer, University of Innsbruck, Austria
European Exascale Applications Workshop, Manchester, Oct 11-12, 2016

2 Parallel Architectures
Multicore: OpenMP/Cilk
Accelerators: OpenCL/CUDA
Clusters: MPI/PGAS

3 Real World Architectures
[Diagram: memory and GPUs in one system, programmed with OpenCL/CUDA, MPI/PGAS and OpenMP/Cilk]

4 Hybrid Codes
Issues:
  hard-coded problem decomposition
  lack of coordination among runtime systems
No built-in support for: portability, auto-tuning, load balancing, monitoring, or resilience

5 AllScale Vision
[Diagram labels: Application; Unified Parallel Programming Model; Parallel Algorithm; Toolchain; Portability, Tuning, and Resilience; Multicore; Accelerators; Clusters; Heterogeneous Clusters]

6 Conventional Flat Parallelism
How to map flat parallelism to a hierarchical parallel architecture?
Complex handling of errors
Global operations
[Figure: array A updated from t=0 to t=n; axes: time, parallelism; annotations: linear parallel growth, global barrier]

7 Recursively Nested Parallelism
[Figure: array A updated from t=0 via t=n/2 to t=n; axes: time, space; annotations: global synchronisation, local synchronisation, exponential parallel growth, recursive call]

8 Recursive Parallelism
[Figure: recursive task]

9 Recursive Parallelism
Adaptable task granularity
[Figure: recursive task]
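
As an illustration of recursive tasks with adaptable granularity, here is a minimal C++ sketch. It is not AllScale code: `recursive_sum` and its `grain` parameter are hypothetical names, and `std::async` stands in for a real task runtime. The range is split recursively until a task holds at most `grain` elements, and each task waits only for its own child.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of the AllScale API): sums data[begin, end)
// by recursively splitting the range in half.
// `grain` is the task granularity: ranges of at most `grain` elements are
// summed sequentially instead of being split further.
long long recursive_sum(const std::vector<int>& data, std::size_t begin,
                        std::size_t end, std::size_t grain) {
    assert(begin <= end && end <= data.size() && grain > 0);
    if (end - begin <= grain) {
        // Base case: the task is small enough, so run it sequentially.
        return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
    }
    // Recursive step: the left half becomes a new task, the right half is
    // processed by the current thread.
    const std::size_t mid = begin + (end - begin) / 2;
    auto left = std::async(std::launch::async, [&data, begin, mid, grain] {
        return recursive_sum(data, begin, mid, grain);
    });
    const long long right = recursive_sum(data, mid, end, grain);
    // Local synchronisation: wait only for the direct child task.
    return left.get() + right;
}
```

A larger `grain` yields fewer, coarser tasks; a smaller one exposes more parallelism at the cost of more task overhead. A runtime could adjust this value per level of the hardware hierarchy.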

10 Recursive Parallelism
Fine-grained dependencies
[Figure: recursive task]

11 Recursive Parallelism
Maps naturally to multiple levels of HW parallelism
[Figure: recursive task mapped to Node, Socket, Accelerator]

12 Recursive Parallelism
Multiversioning allows adaptation to hardware & system state
[Figure: task versions]
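
Multiversioning can be pictured as one task carrying several implementations, with a selector choosing among them from the current system state. The sketch below is an assumption-laden illustration, not the AllScale mechanism: `SystemState`, `TaskVersion`, `select_version` and `make_sum_versions` are hypothetical names, and all three versions run the same sequential code as a stand-in for real accelerator or shared-memory variants.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical snapshot of the system state seen by the selector.
struct SystemState {
    bool accelerator_available;
    int idle_cores;
};

// One implementation variant of a task, plus the condition under which it
// may be used.
struct TaskVersion {
    std::string name;
    std::function<bool(const SystemState&)> applicable;
    std::function<long long(const std::vector<int>&)> run;
};

// Returns the first applicable version. Versions are ordered from most to
// least specialised; the last one serves as the fallback.
const TaskVersion& select_version(const std::vector<TaskVersion>& versions,
                                  const SystemState& state) {
    assert(!versions.empty());
    for (const TaskVersion& v : versions) {
        if (v.applicable(state)) return v;
    }
    return versions.back();
}

// Builds three versions of a "sum" task. All share one body here; in a real
// system each would be a differently generated code variant.
std::vector<TaskVersion> make_sum_versions() {
    auto sum = [](const std::vector<int>& d) {
        return std::accumulate(d.begin(), d.end(), 0LL);
    };
    return {
        {"accelerator",
         [](const SystemState& s) { return s.accelerator_available; }, sum},
        {"shared-memory",
         [](const SystemState& s) { return s.idle_cores > 1; }, sum},
        {"sequential", [](const SystemState&) { return true; }, sum},
    };
}
```

Calling `select_version(make_sum_versions(), state)` at each task launch lets the choice change as the system state changes.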

13 Recursive Parallelism
Dynamic load balancing and data migration
[Figure: tasks distributed across Hardware Entity 1, Hardware Entity 2, Hardware Entity 3]

14 Recursive Parallelism
Automatic resilience management
[Figure: isolated restart of failed computation]
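
The idea of restarting only the failed computation can be sketched as follows, assuming that a fault surfaces as a C++ exception and that the task has no side effects before it fails. `run_with_restart` is a hypothetical helper, not an AllScale function.

```cpp
#include <cassert>
#include <stdexcept>

// Runs `task` and restarts it in isolation if it throws, up to
// `max_attempts` attempts in total. `restarts` counts how many restarts
// were needed. Assumes the task is side-effect free until it succeeds.
template <typename Task>
auto run_with_restart(Task&& task, int max_attempts, int& restarts)
    -> decltype(task()) {
    assert(max_attempts > 0);
    for (int attempt = 1;; ++attempt) {
        try {
            return task();  // success: hand the result to the parent task
        } catch (const std::exception&) {
            if (attempt == max_attempts) throw;  // give up, propagate fault
            ++restarts;  // only this task is re-run, siblings are untouched
        }
    }
}
```

Because each recursive task has a well-defined input and result, a parent can wrap every child call this way without involving unrelated parts of the computation.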

15 Objectives
Objective 1: Single Source to Any Scale (substantial improvement in productivity)
Objective 2: Exploit Recursive Parallelism (foundation of scalability)
Objective 3: Multi-Objective Optimization (time, energy, resource usage)

16 Objectives
Objective 4: Unified Runtime System (one to rule them all: objects and resources)
Objective 5: Mitigating Hardware Failures (let system manage recovery)
Objective 6: Integrated Monitoring (runtime system supported online/offline profiling)

17 Components
[Diagram labels, in extraction order:
  Generic Parallel Primitives (C++ Template API); Applications; User Level API; Core API; Pilot Applications; Single Source User Interface; Generic APIs for abstract Algorithm Descriptions; Identify & Express Parallelism
  Standard C++ Toolchain; Desktop Hardware
  API-aware high-level Compiler; Code Generation for Accelerators and Distributed Memory; Universal Abstract Machine Model
  Unified Runtime System; Scheduler; Online Monitoring and Analysis; Resilience Management; Dynamic Load, Data and Resource Management
  Small to Extreme Scale Parallel Architectures; Parallel Hardware
  Decomposition & Restructuring; Computation & Data Management; Development; Tuning & Deployment]

18 Interfaces
[Same diagram and labels as slide 17]

19 API
Application Groups: Applications; User Level API
  Hardware Oblivious Code; Abstract Domain Specific Primitives
Compiler Group: Core API; Compiler
  Compiler Supported Primitives; Realization of Primitives

20 API
Based on C++ templates
  Widely used industry standard
Two layers:
  User Level API (implemented based on the Core API)
    High-level abstractions (e.g. grids, meshes, stencils, channels)
    Familiar interfaces (e.g. parallel for loops, map-reduce)
  Core API
    Generic function template for recursive parallelism
    Set of recursive data structure templates
    Synchronization, control and data flow primitives
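
To make "generic function template for recursive parallelism" concrete, the sketch below shows one possible shape of such a template in standard C++. It is an assumption, not the AllScale Core API: `recursive_parallel` and `sum_below` are hypothetical names, the template takes a base-case test, a base case, a split step and a combine step, and `std::async` stands in for the runtime. `sum_below` shows how a familiar, loop-like interface could be layered on top of it.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>

// Hypothetical generic template for recursive parallelism:
//   is_base(in) -> bool           : is the problem small enough?
//   base(in)    -> Result         : solve a small problem sequentially
//   split(in)   -> pair<In, In>   : divide the problem into two halves
//   combine(a, b) -> Result       : merge the two partial results
template <typename In, typename IsBase, typename Base, typename Split,
          typename Combine>
auto recursive_parallel(const In& in, IsBase is_base, Base base, Split split,
                        Combine combine) -> decltype(base(in)) {
    if (is_base(in)) return base(in);
    auto parts = split(in);
    // First half runs as a separate task, second half on this thread.
    auto left = std::async(std::launch::async, [&] {
        return recursive_parallel(parts.first, is_base, base, split, combine);
    });
    auto right =
        recursive_parallel(parts.second, is_base, base, split, combine);
    return combine(left.get(), right);
}

// Example of a higher-level interface built on the template:
// the sum of all integers in [0, n), split into chunks of at most `grain`.
long long sum_below(std::size_t n, std::size_t grain) {
    assert(grain > 0);
    using Range = std::pair<std::size_t, std::size_t>;
    return recursive_parallel(
        Range{0, n},
        [grain](const Range& r) { return r.second - r.first <= grain; },
        [](const Range& r) {
            long long s = 0;
            for (std::size_t i = r.first; i < r.second; ++i)
                s += static_cast<long long>(i);
            return s;
        },
        [](const Range& r) {
            const std::size_t mid = r.first + (r.second - r.first) / 2;
            return std::make_pair(Range{r.first, mid}, Range{mid, r.second});
        },
        [](long long a, long long b) { return a + b; });
}
```

Keeping the four steps as separate callables is what would let a compiler or runtime analyse and re-schedule them independently of the algorithm that uses them.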

21 Compiler
Compiler Group: Core API; Compiler
  Compiler Supported Primitives; Frontend, Analysis and Backend Mods; Insieme Compiler Toolbox
Runtime Groups: Runtime
  Abstract Machine Model

22 Compiler
Analyzes rec primitive usage and data accesses
Generates multiple code versions for each step
  Sequential
  Shared memory parallel
  Distributed memory parallel
  Accelerator
Reports potential issues to programmer
  Data dependencies, race conditions
Provides additional information to runtime
  E.g. type of recursion and data dependencies
  Improves dynamic optimization potential

23 Compilation
[Diagram of the AllScale Compiler pipeline; labels, in extraction order:
  Code: User API, Core API, Input Codes
  C++ Frontend: C++ AST, Template unfolding
  Semantic Frontend: High Level parallel IR, Unified Parallel Representation
  Analysis: Data Requirement Analysis
  Modular Backend: Shared Memory, Distributed Memory, Accelerators, Resilience, C/C++, Versioning
  Outputs: Report, Multi-Versioned Target Code]

24 Runtime
Compiler Group: Insieme
  Target Code Generation
Runtime Groups: Runtime; HPX; Hardware
  Abstract Machine Model; Execution Infrastructure; Execution

25 Runtime System
Provides an abstract parallel machine as target for compiler-generated code
Manages distributed resources
  Data locality
  Communication & synchronization
  Accelerators
Dynamic load balancing
Selects from compiler-generated code versions
  Depending on hardware and execution context
Prof. Dietmar Fau Dr. Konstantinos Katrinis Ireland

26 Execution
[Diagram of the AllScale Runtime System as a continuous steering process; labels:
  Inputs: Optimisation Objectives, Input Code
  Multi-Objective Dynamic Optimiser and Scheduler, issuing Steering Cmds
  Actuators: Task Handling, Data Handling, Resource Handling
  Distributed Work & Data Entities processed by Resources
  Processing Architecture (utilised via: MPI / Infiniband / OpenCL / CUDA / )
  Sensors and Monitoring: Events & Data, Program Events
  Resilience Manager, issuing Resilience Instructions]
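
The slides do not say how the optimiser trades off its objectives. As a generic illustration of multi-objective selection over time, energy and resource usage, the sketch below first filters candidate configurations down to the Pareto front and then picks one by a weighted sum. All names (`Candidate`, `dominates`, `pareto_front`, `pick_weighted`) and the weighting scheme are assumptions made for this example, not the AllScale method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical measurement of one candidate configuration.
struct Candidate {
    double time_s;    // execution time
    double energy_j;  // energy consumption
    double cores;     // resource usage
};

// a dominates b if a is no worse in every objective and strictly better
// in at least one (all objectives are minimised).
bool dominates(const Candidate& a, const Candidate& b) {
    const bool no_worse = a.time_s <= b.time_s && a.energy_j <= b.energy_j &&
                          a.cores <= b.cores;
    const bool better = a.time_s < b.time_s || a.energy_j < b.energy_j ||
                        a.cores < b.cores;
    return no_worse && better;
}

// Returns the indices of all candidates not dominated by any other.
std::vector<std::size_t> pareto_front(const std::vector<Candidate>& c) {
    std::vector<std::size_t> front;
    for (std::size_t i = 0; i < c.size(); ++i) {
        bool dominated = false;
        for (std::size_t j = 0; j < c.size() && !dominated; ++j) {
            dominated = (j != i) && dominates(c[j], c[i]);
        }
        if (!dominated) front.push_back(i);
    }
    return front;
}

// Picks the front member with the lowest weighted cost. The weights express
// the current priority among the objectives.
std::size_t pick_weighted(const std::vector<Candidate>& c,
                          const std::vector<std::size_t>& front,
                          double w_time, double w_energy, double w_cores) {
    assert(!front.empty());
    auto cost = [&](std::size_t i) {
        return w_time * c[i].time_s + w_energy * c[i].energy_j +
               w_cores * c[i].cores;
    };
    std::size_t best = front.front();
    for (std::size_t i : front) {
        if (cost(i) < cost(best)) best = i;
    }
    return best;
}
```

In a steering loop, the candidates would come from monitoring data and the weights from the user's optimisation objectives, re-evaluated as the execution context changes.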

27 AllScale Products
AllScale API: Parallel C++ Data Structures and Algorithms
  <implemented by>
AllScale Toolchain
AllScale Environment: Compiler and Runtime System providing Portability, Tuning, and Resilience

28 Pilot Applications
iPIC3D: implicit particle-in-cell code for space weather applications (KTH)
AmDaDos: adaptive meshing and data assimilation for dispersion of oil spills (IBM Research)
Fine/Open: large industrial unsteady CFD simulations (NUMECA)

29 Summary
Challenge: explore recursive task parallelism for extreme-scale HPC
AllScale
  single programming model based on C++ templates
  main source of parallelism: recursive parallelism
  single compiler / single runtime system
  auto-tuning, code versioning, fault tolerance, on-line monitoring
First prototype release by March 2017
More information

30 AllScale Consortium Ireland
