Sanjay Rajopadhye Colorado State University

Size: px

Start display at page:

Download "Sanjay Rajopadhye Colorado State University"

Mervyn May
5 years ago
Views:

1 Sanjay Rajopadhye Colorado State University

2 n Class objectives, goals, introduction n CUDA performance tuning (wrap up) n Equational Programming (intro) 2

3 n Parallel Programming is hard n End of the free lunch [Sut05] n Arrival of manycores signals the end of La-Z-Boy Programming [Pat06] Becoming a parallel programming expert will get you a good job But your skills may become obsolete new machines, new languages, Parallelism must return to La-Z-Boy programming [Sut05] Herb Sutter. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency, in Software. Dr. Dobb's Journal, vol. 30, no. 3, [Pat06] David Patterson, in keynote talk at the International Workshop on Languages and Compilers For Parallel Computers LCPC 2006, New Orleans, LA. 3

4 n Short term Become macho GPU programmer: write heroically tuned codes. n Medium term Do it systematically: tuning for GTX 280 vs tuning for GTX 465: learn principles, not skills n Long term Do it automatically: Learn the foundations of automatic compilation. Focus on a regular subset of programs n Polyhedral Equational Model 4

5 n Big picture n Polyhedral Equations as programs: I m loath to write C, despite the slogan C no evil n Equations vs (conventional) loop programs n Equations-to-code (compiling equations) n Schedule n (processor) allocation n (memory) allocation n But what about parallelism? 5

6 10 assignments (basic + advanced) + term project n CUDA performance tuning (2) n Equational programming: Alpha/AlphaZ (1) n Mathematical foundations: polyhedra, affine functions, and operations (2) n Alpha analysis/transformation (1) n Analysis: scheduling & allocation (2) n Code generation/tiling (2) 6

7 n Assignments (30%) n Midterm (take home) (30%) n Final project (30% = ) n Proposal n Advancement report n Final report n Quality of work n Final poster n Participation/Discussion/Quizzes (10%) 7

8 n What are polyhedra? n Why are they useful/important n What is the polyhedral model? 8

9 n What is a model? n A mathematical/computational/mechanical/ abstraction of some other (physical) entity n Objects in the model must emulate the natural operations of the modeled entities semantics 9

10 From Feautrier s keynote at LCPC 2009 Introduction Prehistory State of the Art What Next? Dependences Karp, Miller, Winograd Irigoin, PF 1988, Pugh, H. T. Kung, 1978 Systolic Array Design Quinton, Robert,, 1989 Rajopadhye, 1987 Rau Scheduling Quinton, Rajopadhye, Fortes,, PF Placement PF, Pingali, 1994 Cousot, Halbwachs 1977 The Polytope Model Pugh, 1991 LC Lu, 1991 Code Generation Irigoin, Lengauer, Rajopadhye Bastoul, PF, Boulet, Tiling Irigoin, JL Xue,, 1988 Bernstein 1966 L. Lamport, 1974 Automatic Parallelization Dependence tests, Banerjee, 1976 Kuck Allen, Kennedy, 1987 Lam Irigoin Array Shrinking PF, Rajopadhye, Darte, 2005 Locality Wolfe + Lam, 1991 Bastoul, 2003 HLS Quinton, Risset, / 39 10

11 n Physical entity: programs/computations n The Polyhedral Model is a very high level intermediate representation (IR) of regular computations n Polyhedral equational model: real=abstract n Amenable to: n Mathematical static analysis n Transformation within model: closure n Transformation outside model: (tiled) code generation 11

12 n Class objectives, goals, introduction n CUDA performance tuning (wrap up) n Equational Programming (intro)? 12

13 n Many resources on the web (NVIDIA webinars) n Coalescing (HW1a) n Challenge question: Achieve maximum bandwidth, with fewest threads-per-block n For a strided-by-block access pattern. n Arithmetic peak: warps and virtualization n Bank conflicts in shared memory 13

14 n MAXPYrep: n Repeatedly execute Y=A*X+Y n Where A, X and Y are matrices n Matrices are small enough to fit in shared memory (ignore global memory access coalescing) n Goal: achieve machine peak n Port all previous performance to GTX 480 n And beyond n Teach me 14

15 n Oxford CUDA conf (CUDA webinar online) n Identifying Performance Limiters, Micikevicius NVIDIA/UCF (CUDA webinar) n Roofline for Fast Math Sam Williams, LBL 15

16 n Wiki page for Pascal s Triangle n and also a non-standard way to compute Fibonacci numbers 16

9/5/15. n Class objectives, goals. n Why Fine grain parallelism? n Equational Programming (intro) Sanjay Rajopadhye Colorado State University

9/5/15. n Class objectives, goals. n Why Fine grain parallelism? n Equational Programming (intro) Sanjay Rajopadhye Colorado State University Sanjay Rajopadhye Colorado State University n Class objectives, goals n Why Fine grain parallelism? n Equational Programming (intro) 2 1 n Every problem is underspecified n Questions are ill posed n Finding