EE/CSCI 451: Parallel and Distributed Computation

1 EE/CSCI 451: Parallel and Distributed Computation
Lecture #12, 2/21/2017
Xuehai Qian, University of Southern California

2 Outline
Last class: analytical modeling of parallel systems (scalability, speedup and efficiency).
Today: parallel algorithm design (Chapter 3.1): tasks and dependencies, critical path, task dependency graph.


3 Parallel Algorithm Design
Parallel algorithm = concurrency + coordination (communication, synchronization).
Design issues to be addressed:
Identifying tasks that can be performed concurrently
Synchronizing the various activities (processors)
Mapping work to processors
Data partitioning and distribution
Data storage (layout)
Data access management

4 Metrics and/or Constraints
Latency, throughput
Memory footprint
Speed-up
Scalability
# of processors used
Efficiency
Energy efficiency

5 Approaches
[Figure: flow from a problem (e.g., matrix multiplication) to a parallel platform (e.g., multi-core, cluster). Diagram labels: design parallel algorithm (computation model, programming model), parallel code; serial solution (e.g., 3-level loop), manual parallelization using directives (e.g., OpenMP), automatic parallelization; each route passes through compile and run time.]

6 Tasks and Dependencies
Computation = decomposed into tasks.
Task = set of instructions (program segment) with inputs ($x_1, x_2, x_3$) and outputs ($y_1, y_2$). A task begins execution once all of its inputs are available.
Task size = weight of the node (e.g., # of instructions), ranging from fine grain to coarse grain.
Note: tasks need not be of the same size.

7 Task Dependency Graph
Directed acyclic graph (DAG). An edge $i \to j$ means Task $j$ cannot start until Task $i$ completes.
Data $x_1$ on the edge $i \to j$: output of Task $i$, input to Task $j$.
Note: a task dependency graph need not be connected.

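8 Example (1)
Code:
A[2] = A[0] + 1
B[0] = A[2] + 1
Instructions and tasks:
Load  R0 <- A[0]      $T_0$
Add   R1 <- R0 + 1    $T_1$
Add   R2 <- R1 + 1    $T_2$
Store A[2] <- R1      $T_3$
Store B[0] <- R2      $T_4$
Task dependency graph (edges follow the register dependencies): $T_0 \to T_1$, $T_1 \to T_2$, $T_1 \to T_3$, $T_2 \to T_4$.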
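As a minimal sketch of how such a graph can be derived mechanically, the Python below scans the five instructions in program order and adds an edge from the most recent writer of a register to each later reader. It models read-after-write dependencies only, and the tuple encoding and the helper name `build_dependency_edges` are assumptions made for illustration, not part of the lecture.

```python
# Each instruction: (task name, location written, locations read).
# The encoding is hypothetical; the instructions follow the slide.
instructions = [
    ("T0", "R0", ["A[0]"]),    # Load  R0 <- A[0]
    ("T1", "R1", ["R0"]),      # Add   R1 <- R0 + 1
    ("T2", "R2", ["R1"]),      # Add   R2 <- R1 + 1
    ("T3", "A[2]", ["R1"]),    # Store A[2] <- R1
    ("T4", "B[0]", ["R2"]),    # Store B[0] <- R2
]


def build_dependency_edges(instrs):
    """Return read-after-write edges (producer, consumer) in program order."""
    last_writer = {}  # location -> task that most recently wrote it
    edges = []
    for task, dest, sources in instrs:
        for src in sources:
            if src in last_writer:
                edges.append((last_writer[src], task))
        last_writer[dest] = task
    return edges


edges = build_dependency_edges(instructions)
print(edges)
```

A[0] has no producer among these instructions, so $T_0$ is the only start node; $T_3$ and $T_4$ have no consumers and are the finish nodes.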

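9 Example (2)
Matrix-vector multiplication: $y = Ax$, where $A$ is $n \times n$.
Task $i$: compute $y_i$.
Program: do in parallel: compute $y_i$; end; output $y$.
Task dependency graph: $T_0, \dots, T_{n-1}$ are independent of each other, and each feeds the final task "Output $y$".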
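The decomposition into one task per output element can be sketched in Python as follows. This is an illustration under assumptions: the function names and the use of a thread pool are choices made here, and in standard CPython threads may not speed up CPU-bound work, so the code shows the task structure rather than a performance result.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_row(A, x, i):
    """Task i: compute y_i as the dot product of row i of A with x."""
    return sum(a_ij * x_j for a_ij, x_j in zip(A[i], x))


def matvec_parallel(A, x, max_workers=4):
    """Run the n independent row tasks, then assemble the output vector y."""
    n = len(A)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The n tasks share no dependencies; map returns results in index order.
        y = list(pool.map(lambda i: compute_row(A, x, i), range(n)))
    return y  # corresponds to the final "Output y" task


A = [[1, 2], [3, 4]]
x = [1, 1]
y = matvec_parallel(A, x)
```

Because no task reads another task's result, the only synchronization point is the final collection of $y$.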

10 Maximum Degree of Concurrency (1)
Maximum number of tasks that can be executed concurrently.
Note: the maximum degree of concurrency depends on the scheduling strategy.

11 Maximum Degree of Concurrency (2)
Example: level-by-level ordering.
Given a task dependency graph, order the tasks level by level: execute level $i$ tasks, and then level $i + 1$ tasks.
Note: level $i$ depends on level $i - 1$ and possibly on other lower levels; there are no dependencies within level $i$.

12 Maximum Degree of Concurrency (3)
Example: level-by-level ordering (cont.)
Find the number of tasks in each level.
Take the maximum over all levels = maximum degree of concurrency.
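The level-by-level procedure can be sketched in Python. Here a node's level is taken to be one more than the highest level among its predecessors, with start nodes at level 0; the seven-node graph is a hypothetical example, not the one drawn in the lecture.

```python
from collections import defaultdict, deque


def level_by_level(nodes, edges):
    """Assign level 0 to start nodes and 1 + max(predecessor levels) to the rest."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    level = {v: 0 for v in nodes}
    ready = deque(v for v in nodes if indeg[v] == 0)
    while ready:
        u = ready.popleft()
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:  # all predecessors of v have been processed
                ready.append(v)
    return level


def max_concurrency_level_order(nodes, edges):
    """Count the tasks in each level and take the maximum over all levels."""
    counts = defaultdict(int)
    for lvl in level_by_level(nodes, edges).values():
        counts[lvl] += 1
    return max(counts.values())


# Hypothetical graph: four start nodes, reduced pairwise to a single finish node.
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "e"), ("b", "e"), ("c", "f"), ("d", "f"), ("e", "g"), ("f", "g")]
levels = level_by_level(nodes, edges)
width = max_concurrency_level_order(nodes, edges)
```

For this graph the levels hold 4, 2, and 1 tasks, so the level-by-level ordering gives a maximum of 4 concurrent tasks.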

13 Maximum Degree of Concurrency (4) Example

14 Maximum Degree of Concurrency (5)
Example (cont.)
[Figure: the task dependency graph with its tasks arranged by level.]
Maximum degree of concurrency = 4

15 Critical Path (1)
Dependency graph: start nodes (indeg = 0), finish nodes (outdeg = 0).
Critical path = a longest path from a start node to a finish node (# of edges).
Critical path length = sum of the task weights of the nodes along the critical path.

16 Critical Path (2)
For a given number of tasks, a longer critical path implies a longer execution time (and may also mean less concurrency).

17 Critical Path (3)
Chain $T_0 \to \dots \to T_{n-1}$: total no. of tasks = $n$, critical path length = $n$.
Tree with $n/2, n/4, \dots, 1$ tasks per level: total no. of tasks = $n - 1$, critical path length = $\log_2 n$.
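The task and level counts for the tree-shaped graph can be checked with a small Python sketch. It assumes the tree represents a pairwise reduction (here, a sum of $n$ values), which the slide does not state; the function name `tree_reduce` is hypothetical.

```python
def tree_reduce(values):
    """Pairwise (tree) summation.

    Returns (total, number of add tasks, number of levels).
    """
    level = list(values)
    tasks = 0
    depth = 0
    while len(level) > 1:
        # One task per adjacent pair; all tasks in a level are independent.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        tasks += len(nxt)
        if len(level) % 2:  # an unpaired element is carried to the next level
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], tasks, depth


total, tasks, depth = tree_reduce(range(1, 9))  # n = 8
```

With $n = 8$ the levels contain 4, 2, and 1 tasks: 7 tasks in total ($n - 1$) over 3 levels ($\log_2 n$), whereas a chain that adds one value at a time would need as many sequential steps as it has tasks.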

18 Critical Path (4)
Task weights: Task 1 = 10, Task 2 = 10, Task 3 = 10, Task 4 = 10, Task 5 = 6, Task 6 = 11, Task 7 = 7.
Critical path length = 10 + 6 + 11 + 7 = 34 (one of the weight-10 tasks, followed by Tasks 5, 6, and 7).
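A weighted critical path length can be computed with a short dynamic program over the DAG, sketched below in Python. The weights follow the slide (10 for Tasks 1 to 4, and 6, 11, 7 read as belonging to Tasks 5, 6, 7). The edges are an assumption: the arrows did not survive in this transcript, so the list below is one hypothetical graph that is consistent with the stated length of 34.

```python
from functools import lru_cache

weights = {1: 10, 2: 10, 3: 10, 4: 10, 5: 6, 6: 11, 7: 7}

# Hypothetical edges (not recoverable from the transcript).
edges = [(1, 5), (2, 5), (3, 6), (5, 6), (4, 7), (6, 7)]


def critical_path_length(weights, edges):
    """Largest sum of node weights along any path in the DAG."""
    preds = {v: [] for v in weights}
    for u, v in edges:
        preds[v].append(u)

    @lru_cache(maxsize=None)
    def heaviest_ending_at(v):
        # Heaviest path that ends at v, counting v's own weight.
        best_pred = max((heaviest_ending_at(u) for u in preds[v]), default=0)
        return weights[v] + best_pred

    return max(heaviest_ending_at(v) for v in weights)


length = critical_path_length(weights, edges)
```

Under these assumed edges the heaviest path is Task 1 (or Task 2), Task 5, Task 6, Task 7, giving 10 + 6 + 11 + 7 = 34.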

19 DAG To Parallel Program (1)
Given a task dependency graph, assume:
Weight of each node = 1
Maximum degree of concurrency = $c$
Critical path length = $l$
Then the DAG can be executed on a PRAM: # of processors? Time?

20 DAG To Parallel Program (2)
Idea: organize the DAG into levels $0, 1, \dots, l$ (level-by-level ordering), giving $l + 1$ levels.
Execute level by level, from $0$ to $l$.
Total number of processors needed: $c$

21 DAG To Parallel Program (3)
Correct parallel program: all dependencies are satisfied.
Parallel time $T_p = O(l)$, using $p = c$ processors.
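The level-by-level execution can be sketched in Python with a thread pool standing in for the PRAM processors. This is an illustration under assumptions: the levels are given as precomputed lists, the helper name `run_level_by_level` is hypothetical, and waiting for each level to finish plays the role of the synchronization between levels.

```python
from concurrent.futures import ThreadPoolExecutor


def run_level_by_level(levels, run_task):
    """Execute tasks one level at a time.

    levels: list of lists of task ids, level 0 first.
    Returns (number of workers used, number of parallel steps).
    """
    c = max(len(level) for level in levels)  # widest level sets the worker count
    steps = 0
    with ThreadPoolExecutor(max_workers=c) as pool:
        for level in levels:
            # list(...) blocks until every task in the level is done,
            # so all dependencies on lower levels are satisfied.
            list(pool.map(run_task, level))
            steps += 1
    return c, steps


# Hypothetical levels and a task body that only records its execution.
log = []
example_levels = [["a", "b", "c", "d"], ["e", "f"], ["g"]]
workers, steps = run_level_by_level(example_levels, log.append)
```

With unit-weight tasks each level takes one step, so the number of steps equals the number of levels, which is the $O(l)$ bound on the slide.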

22 Tasks, Processes, and Mapping (1)
Parallel algorithm: tasks + interactions.
Mapping problem: map tasks + interactions to processes + interactions (abstract).
Physical hardware: processors + interconnection.
Further optimizations exploit hardware features (e.g., memory hierarchy, ...).

23 Tasks, Processes, and Mapping (2)
Mapping problem: assign each task to a process.

24 Tasks, Processes, and Mapping (3)
Scheduling problem: determine the execution order of the tasks.

25 Tasks, Processes, and Mapping (4)
[Figure: a scheduler pulls tasks from the list of executable tasks and maps them to processes $0, \dots, p - 1$. First-come-first-serve scheduler: list drawn as Task 3, Task 2, Task 1. Shortest-task-first scheduler: list drawn as Task 2, Task 1, Task 3.]
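The two scheduling policies can be compared with a small simulation, sketched below in Python. The task durations, the function name `schedule`, and the rule that the next idle process pulls the next task are assumptions for illustration; the slide gives no durations, and the tasks are treated as independent because they come from the list of executable tasks.

```python
import heapq


def schedule(tasks, p, shortest_first=False):
    """Simulate p processes pulling tasks from a list of executable tasks.

    tasks: list of (name, duration) in arrival order.
    Returns ({name: (process, start time)}, makespan).
    """
    # Shortest-task-first reorders the list; FCFS keeps arrival order.
    queue = sorted(tasks, key=lambda t: t[1]) if shortest_first else list(tasks)
    free_at = [(0, proc) for proc in range(p)]  # (time process is free, process id)
    heapq.heapify(free_at)
    assignment = {}
    for name, duration in queue:
        start, proc = heapq.heappop(free_at)  # earliest idle process pulls the task
        assignment[name] = (proc, start)
        heapq.heappush(free_at, (start + duration, proc))
    makespan = max(t for t, _ in free_at)
    return assignment, makespan


# Hypothetical durations.
tasks = [("Task 1", 3), ("Task 2", 1), ("Task 3", 5)]
fcfs, fcfs_makespan = schedule(tasks, p=2)
stf, stf_makespan = schedule(tasks, p=2, shortest_first=True)
```

The policy changes which process runs each task and when it starts; whether it also changes the overall finish time depends on the durations, and with the values chosen here both policies finish at time 6.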

26 Summary
Parallel algorithm design
Task and dependency
Critical path
DAG to parallel program


EE/CSCI 451: Parallel and Distributed Computation Lecture #15 3/7/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 From last class Outline