EEDC. Scientific Programming Models. Execution Environments for Distributed Computing. Master in Computer Architecture, Networks and Systems - CANS



EEDC Execution Environments for Distributed Computing 34330


1 EEDC: Execution Environments for Distributed Computing
Master in Computer Architecture, Networks and Systems - CANS
Scientific Programming Models
Group members:
- Francesc Lordan, francesc.lordan@bsc.es
- Roger Rafanell, roger.rafanell@bsc.es

2 Outline
Scientific Programming Models
- Part 1: Introduction
- Part 2: Reference parallel programming models
- Part 3: Novel parallel programming models
- Part 4: Conclusions
- Part 5: Questions

3 Introduction
Scientific applications:
- Solve complex problems.
- Are usually long-running.
- Are implemented as a sequence of steps.
- Each step (task) can be hard to compute.
So...

4 Introduction
In terms of execution time, scientific applications can no longer be treated as sequential programs.

5 Introduction
We need solutions based on distributing and parallelizing the work.

6 Introduction: MPI
1980s to early 1990s: distributed-memory parallel computing started as a collection of incompatible software tools for writing programs.
In 1994, MPI (Message Passing Interface) became the new reference standard. It provides:
- Portability
- Performance
- Functionality
- Availability (many implementations)
Good for: parallelizing the processing by distributing the work among different machines/nodes.

7 Introduction: OpenMP
In the early 1990s, vendors of shared-memory machines supplied similar directive-based programming extensions for Fortran:
- The user extends a serial Fortran program with directives specifying which loops are to be parallelized.
- The compiler automatically parallelizes those loops across the SMP processors.
The implementations were all functionally similar, but were diverging (as usual).
Good for: parallelizing the computation among all the resources of a single machine.

8 Reference PM: OpenMP
Programming model:
- Computation is done by threads.
- Fork-join model: threads are dynamically created and destroyed.
- The programmer can specify which variables are shared among threads and which are private.

9 Reference PM: OpenMP
Example of sequential PI calculation
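The code on this slide did not survive as text. As a stand-in, here is a minimal Java sketch of a sequential PI calculation. It assumes the usual midpoint-rule integration of 4/(1+x^2) over [0,1], which is the formula the COMPSs sample later in the deck uses; the class and method names are hypothetical.

```java
// Sketch only: midpoint-rule approximation of pi as the integral
// of 4 / (1 + x^2) over [0, 1]. Names are illustrative assumptions.
class PiSequential {
    static double computePi(int numSteps) {
        double step = 1.0 / numSteps;       // width of each interval
        double sum = 0.0;
        for (int i = 0; i < numSteps; i++) {
            double x = (i + 0.5) * step;    // midpoint of interval i
            sum += 4.0 / (1.0 + x * x);
        }
        return sum * step;
    }

    public static void main(String[] args) {
        System.out.println(computePi(1_000_000));
    }
}
```

Every iteration is independent except for the accumulation into sum, which is what makes this loop a convenient target for the parallel models discussed next.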

10 Reference PM: OpenMP
Example of OpenMP PI calculation
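The OpenMP code on this slide did not survive as text, and Java has no OpenMP directives. The sketch below is therefore an analogue rather than OpenMP itself: it uses Java parallel streams to show the same fork-join idea on a midpoint-rule PI calculation, with a shared read-only variable (step), a per-iteration private variable (x), and a reduction of partial sums. The class and method names are hypothetical.

```java
import java.util.stream.IntStream;

// Sketch only: a fork-join analogue of an OpenMP "parallel for" with a
// sum reduction, written with Java parallel streams (not OpenMP).
class PiParallel {
    static double computePi(int numSteps) {
        final double step = 1.0 / numSteps;       // shared, read-only
        double sum = IntStream.range(0, numSteps)
                .parallel()                       // fork: range is split across worker threads
                .mapToDouble(i -> {
                    double x = (i + 0.5) * step;  // private to each iteration
                    return 4.0 / (1.0 + x * x);
                })
                .sum();                           // join: partial sums are reduced
        return sum * step;
    }

    public static void main(String[] args) {
        System.out.println(computePi(1_000_000));
    }
}
```

As with OpenMP, the loop body is unchanged from a sequential version and the communication between workers stays implicit.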

11 Reference PM: OpenMP
Strong points:
- Keeps the sequential version.
- Communication is implicit.
- Easy to program, debug and modify.
- Good performance and scalability.
Weaknesses:
- Communication is implicit (less control).
- Simple and flat memory model (does not run on clusters).
- No support for accelerators.

12 Reference PM: MPI
Programming model:
- Computation is done by several processes that execute the same program.
- Processes communicate by passing data (send/receive).
The programmer decides:
- Which role each process plays, by branching on its identity.
- The order in which communications are done.

13 Reference PM: MPI
Example of MPI PI calculation
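The MPI code on this slide did not survive as text. The sketch below is not MPI: it is an in-process Java simulation of the same pattern, assuming a midpoint-rule PI calculation. Every "process" (here a thread) runs the same program, branches on its rank to decide its role, and passes data explicitly; a BlockingQueue stands in for send/receive to rank 0. All names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: simulates the SPMD message-passing pattern with threads.
// A real MPI program would use separate processes and MPI send/receive calls.
class PiMessagePassing {
    static double run(int size, int numSteps) {
        BlockingQueue<Double> inbox = new LinkedBlockingQueue<>(); // mailbox of rank 0
        double[] result = new double[1];
        Thread[] procs = new Thread[size];
        for (int r = 0; r < size; r++) {
            final int rank = r;
            procs[r] = new Thread(() -> program(rank, size, numSteps, inbox, result));
            procs[r].start();
        }
        try {
            for (Thread p : procs) p.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    // The same program is executed by every rank.
    static void program(int rank, int size, int numSteps,
                        BlockingQueue<Double> inbox, double[] result) {
        double step = 1.0 / numSteps;
        double partial = 0.0;
        for (int i = rank; i < numSteps; i += size) {   // each rank takes a strided share
            double x = (i + 0.5) * step;
            partial += 4.0 / (1.0 + x * x);
        }
        try {
            if (rank != 0) {
                inbox.put(partial);                     // "send" partial sum to rank 0
            } else {
                double sum = partial;
                for (int src = 1; src < size; src++) {
                    sum += inbox.take();                // "receive" one partial sum
                }
                result[0] = sum * step;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1_000_000));
    }
}
```

Unlike the shared-memory version, the work distribution and the communication are both written out by hand, which is the trade-off the next slide summarizes.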

14 Reference PM: MPI
Strong points:
- Any parallel algorithm can be expressed in terms of the MPI paradigm.
- Data placement problems are rarely observed.
- Suitable for clusters/supercomputers (large number of processors).
- Excellent performance and scalability.
Weaknesses:
- Communication is explicit.
- Retrofitting serial code with MPI often requires refactoring.
- Dynamic load balancing is difficult to implement.

15 Reference PM: The best of both worlds
Hybrid (MPI + OpenMP):
- MPI is most effective for problems with coarse-grained parallelism.
- Fine-grained parallelization is handled well by OpenMP.
When to use hybrid programming?
- The code exhibits limited scaling with MPI.
- The code could make use of dynamic load balancing.
- The code exhibits fine-grained parallelism, or a combination of fine-grained and coarse-grained parallelism.
Some applications, such as computational fluid dynamics, benefit greatly from a hybrid approach.

16 Reference PM: Hybrid (MPI + OpenMP)
Example of MPI + OpenMP PI calculation

17 Reference PM: New reference approaches
Heterogeneous parallel computing:
- CUDA (from NVIDIA)
- OpenCL (Open Computing Language)
  - Cross-platform
  - Implementations for ATI GPUs, NVIDIA GPUs and x86 CPUs
  - API similar to OpenGL
  - Based on C

18 Novel PMs
Workflows:
- Based on processes
- Require planning and scheduling
- Need flow control
- In-transit visibility
Novel PMs: complex problems require simple solutions (not based on the reference PMs).

19 Microsoft Dryad
The Dryad Project is investigating programming models for writing parallel and distributed programs that scale from a small cluster to a large data center.
- Theoretical approach (not used).
- Last and only publication on [...]
The user defines:
- a set of methods;
- a task dependency graph, written in a specific language.


20 Microsoft Dryad
GraphBuilder XSet = moduleX^N;
GraphBuilder DSet = moduleD^N;
GraphBuilder MSet = moduleM^(N*4);
GraphBuilder SSet = moduleS^(N*4);
GraphBuilder YSet = moduleY^N;
GraphBuilder HSet = moduleH^1;
GraphBuilder XInputs = (ugriz1 >= XSet) || (neighbor >= XSet);
GraphBuilder YInputs = ugriz2 >= YSet;
GraphBuilder XToY = XSet >= DSet >> MSet >= SSet;
for (i = 0; i < N*4; ++i) {
    XToY = XToY || (SSet.GetVertex(i) >= YSet.GetVertex(i/4));
}
GraphBuilder YToH = YSet >= HSet;
GraphBuilder HOutputs = HSet >= output;
GraphBuilder final = XInputs || YInputs || XToY || YToH || HOutputs;

21 MapReduce
The programmer defines only two functions:
- Map(K_input, V_input) -> list(K_temp, V_temp)
- Reduce(K_temp, list(V_temp)) -> list(V_temp)
The library is in charge of everything else.
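To make the two functions concrete, here is a minimal single-machine word-count sketch in plain Java. It does not use any MapReduce library: the run method is a hypothetical stand-in for "the library", doing the grouping of intermediate pairs by key that a real framework would do across machines. As a simplification, reduce returns a single value instead of a list. All names are illustrative assumptions.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: word count expressed as a map function and a reduce function.
class WordCountMapReduce {
    // Map(K_input, V_input) -> list(K_temp, V_temp): emit (word, 1) per word.
    static List<Map.Entry<String, Integer>> map(String docName, String text) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        return out;
    }

    // Reduce(K_temp, list(V_temp)): sum the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }

    // Stand-in for the library: run map, group by key, run reduce.
    static Map<String, Integer> run(Map<String, String> docs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(doc.getKey(), doc.getValue())) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("d1", "to be or not to be");
        System.out.println(run(docs));
    }
}
```

The programmer's part is only map and reduce; partitioning, grouping and scheduling belong to the framework.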

22 MapReduce
Weaknesses:
- Requires a specific programming style.
- Suitable key-value pairs are not always easy to find.
Strong points:
- Efficiency.
- Simplicity of the model.
- Community and tools.

23 The COMP Superscalar (COMPSs)

24 COMPSs overview - Objective
Reduce the development complexity of Grid/Cluster/Cloud applications to the minimum:
- As easy as writing a sequential application.
Target applications:
- Composed of tasks, most of them repetitive.
- Task granularity at the level of simulations or programs.
- Data: files, objects, arrays, primitive types.

25 COMPSs overview - Main idea
Sequential code:
...
for (i = 0; i < n; i++) {
    T1(data1, data2);
    T2(data4, data5);
    T3(data2, data5, data6);
    T4(data7, data8);
    T5(data6, data8, data9);
}
...
Steps:
(a) Task selection + parameter directions (input, output, inout)
(b) Task graph creation based on data dependencies
(c) Scheduling, data transfer, task execution
(d) Task completion, synchronization
[Figure: task instances from successive loop iterations (T1 0 ... T5 0, T1 1 ... T5 1, T1 2, ...) form a task graph and are scheduled on parallel resources (Resource 1, Resource 2, ..., Resource N).]
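Step (b) can be illustrated with a small Java sketch. This is not the COMPSs runtime: it is a hypothetical, simplified dependency detector that works under two loud assumptions, since the slide does not give parameter directions. First, the last parameter of each task is written and all others are read. Second, only read-after-write dependencies are tracked, for a single loop iteration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: derives a task graph from data accesses.
// ASSUMPTION: in each call, the last parameter is an output and the
// remaining parameters are inputs. Only read-after-write is tracked.
class TaskGraphSketch {
    // Each row is {taskName, param1, ..., paramN}.
    static Map<String, Set<String>> dependencies(String[][] calls) {
        Map<String, String> lastWriter = new HashMap<>();      // datum -> task that last wrote it
        Map<String, Set<String>> deps = new LinkedHashMap<>(); // task -> tasks it must wait for
        for (String[] call : calls) {
            String task = call[0];
            Set<String> preds = new TreeSet<>();
            for (int p = 1; p < call.length - 1; p++) {        // inputs
                String writer = lastWriter.get(call[p]);
                if (writer != null) preds.add(writer);
            }
            deps.put(task, preds);
            lastWriter.put(call[call.length - 1], task);       // output
        }
        return deps;
    }

    public static void main(String[] args) {
        String[][] calls = {
            {"T1", "data1", "data2"},
            {"T2", "data4", "data5"},
            {"T3", "data2", "data5", "data6"},
            {"T4", "data7", "data8"},
            {"T5", "data6", "data8", "data9"},
        };
        // prints {T1=[], T2=[], T3=[T1, T2], T4=[], T5=[T3, T4]}
        System.out.println(dependencies(calls));
    }
}
```

Under these assumptions T1, T2 and T4 have no predecessors and can run in parallel, while T3 and T5 must wait for the tasks that produce their inputs.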

26 Programming model - Sample application
public void main() {
    Double sum = 0.0;
    double pi;
    double step = 1.0d / (double) num_steps;
    for (int i = 0; i < num_steps; i++) {
        computeinterval(i, step, sum);
    }
    pi = sum * step;
}

public static void computeinterval(int index, double step, Double acum) {
    double x = (index - 0.5) * step;
    acum = acum + 4.0 / (1.0 + x * x);
}

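27 Programming Model - Task Selection
public interface PiItf {
    @Method(declaringClass = "Pi")               // Implementation
    void computeinterval(
        @Parameter(direction = IN) int index,    // Parameter metadata
        @Parameter(direction = IN) double step,
        @Parameter(direction = INOUT) Double acum
    );
}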

28 Programming Model - Main code
public static void main(String[] args) {
    Double sum = 0.0;
    double pi;
    double step = 1.0d / (double) num_steps;
    for (int i = 0; i < num_steps; i++) {
        computeinterval(i, step, sum);
    }
    pi = sum * step;
}
[Figure: task graph with one Compute Interval task per iteration (0, 1, ..., N-1). Each task takes step and sum as inputs and produces an updated sum for the next task; a final synchronization (SYNCH) follows the last task.]

29 Programming Model - Real Example
HMMER
[Figure: HMMER takes a protein database and an amino acid sequence (IQKKSGKWHTLTDLRA VNAVIQPMGPLQPGLP SPAMIPKDWPLIIIDLK DCFFTIPLAEQDCEKFA FTIPAINNKEPATRF) and produces a table of matching models with columns Model, Score, E-value and N.]

30 Programming Model - Real Example
Amino acid sequence

31 Programming Model - Real Example
String[] outputs = new String[numDBFrags];

// Process: run hmmpfam on every database fragment
int dbnum = 0;
for (String dbfrag : dbfrags) {
    outputs[dbnum] = HMMPfamImpl.hmmpfam(sequence, dbfrag);
    dbnum++;
}

// Merge: pairwise, doubling the distance between partners each round
int neighbor = 1;
while (neighbor < numDBFrags) {
    for (int db = 0; db < numDBFrags; db += 2 * neighbor) {
        if (db + neighbor < numDBFrags) {
            HMMPfamImpl.merge(outputs[db], outputs[db + neighbor]);
        }
    }
    neighbor *= 2;
}
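The merge loop on this slide forms a binary reduction tree: in each round, result db absorbs result db + neighbor, and the distance between partners doubles. The Java sketch below only enumerates which index pairs are merged in each round, using the same loop bounds as the slide; it does not call HMMER, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: lists the (target, source) index pairs merged in each round
// of the pairwise merge loop, without performing any real merge.
class MergeTreeSketch {
    static List<List<int[]>> rounds(int numDBFrags) {
        List<List<int[]>> rounds = new ArrayList<>();
        for (int neighbor = 1; neighbor < numDBFrags; neighbor *= 2) {
            List<int[]> pairs = new ArrayList<>();
            for (int db = 0; db < numDBFrags; db += 2 * neighbor) {
                if (db + neighbor < numDBFrags) {
                    pairs.add(new int[] {db, db + neighbor}); // outputs[db] absorbs outputs[db + neighbor]
                }
            }
            rounds.add(pairs);
        }
        return rounds;
    }

    public static void main(String[] args) {
        List<List<int[]>> rounds = rounds(5);
        for (int r = 0; r < rounds.size(); r++) {
            StringBuilder line = new StringBuilder("round " + r + ":");
            for (int[] p : rounds.get(r)) {
                line.append(" (").append(p[0]).append(",").append(p[1]).append(")");
            }
            System.out.println(line);
        }
    }
}
```

For five fragments this gives the pairs (0,1) and (2,3) in the first round, (0,2) in the second and (0,4) in the third. Merges within a round touch disjoint results, so a task-based runtime can run them in parallel, and the final result ends up in index 0.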

32 Programming Model - Real Example
public interface HMMPfamItf {
    @Method(declaringClass = "worker.hmmerobj.hmmpfamimpl")
    String hmmpfam(
        @Parameter(type = Type.FILE, direction = Direction.IN) String ...,
        @Parameter(type = Type.STRING, direction = Direction.IN) String dbfile
    );

    @Method(declaringClass = "worker.hmmerobj.hmmpfamimpl")
    void merge(
        @Parameter(type = Type.OBJECT, direction = Direction.INOUT) String ...,
        @Parameter(type = Type.OBJECT, direction = Direction.IN) String resultfile2
    );
}

33 Programming Model Real Example 33

34 Programming Model Real Example 34

35 COMPSs
Strong points:
- Sequential programming approach.
- Parallelization at task level.
- Transparent data management and remote execution.
- Can operate on different infrastructures:
  - Cluster/Grid
  - Cloud (public/private): PaaS, IaaS
  - Web services
Weaknesses:
- Under continuous development.
- Does not offer bindings to other languages (currently).

36 Tutorial
Sample & Development Virtual Appliance Tutorial

37 Manjrasoft Aneka
.NET-based Platform-as-a-Service.
Allows the use of:
- Private clouds.
- Public clouds: Amazon EC2, Azure, GoGrid.
Offers mechanisms to control, reserve and monitor the resources, as well as autoscaling mechanisms.
Three programming models:
- Task-based: tasks are put in a bag of executable tasks.
- Thread-based: exposes the .NET thread API, but threads are created remotely.
- MapReduce.
No data dependency analysis!

38 Microsoft Azure
.NET-based Platform-as-a-Service.
Computing services:
- Web Role: web service frontend.
- Worker Role: backend.
Storage services.
Strong point: scalable architecture.
Weakness: platform-tied applications.

39 Conclusions
- Scientific problems are usually complex.
- Current reference PMs are usually unsuitable.
- Novel, flexible PMs have come into play.
- A gap remains between scientists and user-friendly, workflow-oriented programming models.
- A sea of available solutions (DSLs).

40 Questions
