A loopy introduction to dependence analysis



1 A loopy introduction to dependence analysis
Lindsey Kuper, JP Verkamp

2 Pipelining is supposed to work like this
    c1   c2   c3   c4   c5   c6   c7
    IF   ID   EX   MEM  WB
         IF   ID   EX   MEM  WB
              IF   ID   EX   MEM  WB
source: Allen & Kennedy, chapter 1

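5 But sometimes it works like this
[Figure: pipeline diagram for a Load followed by three ALU instructions, each going through IF, ID, EX, MEM, WB; the last ALU row begins with a stall. One part of the diagram is annotated "this is fine" and another "this is not fine".]
source: Allen & Kennedy, chapter 1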

6 Keeping the pipeline full
source: nytimes.com/2008/03/06/us/06canyon.html

7 SIMD: even more need for vectorization
source: wikipedia.org/wiki/simd

9 Vectorization is about getting rid of constraints
Sequential languages introduce constraints that are not critical to preserving the meaning of a computation.
Allen & Kennedy, section
Goal: find the minimal constraints that are critical, so you can throw away the rest

11 A vectorizable loop
Fortran 77:
    DO I = 1, 64
      C(I) = A(I) + B(I)
    ENDDO
Transliterated to Fortran 90:
    C(1:64) = A(1:64) + B(1:64)
In the Fortran 90 statement, every input on the RHS is loaded from memory before any element of the result is stored.
If these are semantically equivalent, the loop is vectorizable!
source: Allen & Kennedy, chapter 1
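The equivalence check can be sketched in Python, assuming plain lists stand in for the Fortran arrays. The helper names `sequential_loop` and `vector_statement` are hypothetical and only model the two orderings of loads and stores; they are not taken from Allen & Kennedy.

```python
def sequential_loop(a, b):
    """Fortran 77 ordering: each iteration loads, adds, and stores one element."""
    c = [0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c


def vector_statement(a, b):
    """Fortran 90 ordering: load every RHS element first, then store the results."""
    rhs = [x + y for x, y in zip(a, b)]  # all loads happen here
    c = [0] * len(a)
    c[:] = rhs                           # all stores happen afterwards
    return c


a = list(range(1, 65))
b = [10 * x for x in a]
same = sequential_loop(a, b) == vector_statement(a, b)
print(same)
```

Because no iteration reads a location that another iteration stores, both orderings produce the same C.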

14 A non-vectorizable loop
Fortran 77 (indexed from 1):
    DO I = 1, 64
      A(I+1) = A(I) + B(I)
    ENDDO
Transliterated (not translated!) to Fortran 90 (indexed from 1):
    A(2:65) = A(1:64) + B(1:64)
[Figures: the input arrays A and B and the resulting array A under each version.]
source: Allen & Kennedy, chapter 1
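A small Python sketch of why the transliteration is not a translation, assuming 0-based lists model the 1-based Fortran arrays and a loop bound of 4 instead of 64. The function names and the sample data are made up for illustration.

```python
def recurrence_sequential(a, b, n):
    """DO I = 1, n: A(I+1) = A(I) + B(I); A(I) lives in a[I - 1]."""
    a = a[:]
    for i in range(1, n + 1):
        # Reads the value that the previous iteration just stored.
        a[i] = a[i - 1] + b[i - 1]
    return a


def recurrence_vector(a, b, n):
    """A(2:n+1) = A(1:n) + B(1:n): the old A is read in full before any store."""
    a = a[:]
    rhs = [a[i] + b[i] for i in range(n)]
    a[1:n + 1] = rhs
    return a


n = 4
a = [1, 0, 0, 0, 0]
b = [1, 1, 1, 1]
seq = recurrence_sequential(a, b, n)
vec = recurrence_vector(a, b, n)
print(seq)  # [1, 2, 3, 4, 5]
print(vec)  # [1, 2, 1, 1, 1]
```

The sequential loop carries each new value of A into the next iteration, while the array statement only ever sees the original contents of A.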

15 Data dependence defined
Binary relation R on statements of the program:
    (S1, S2) ∈ R if S2 must be executed after S1 in order to preserve the relative order of loads from and stores to each memory location in the program
S2 depends on S1 iff:
    S1 and S2 access the same memory location and at least one of them stores into it; and
    there is a feasible run-time execution path from S1 to S2
source: Allen & Kennedy, chapter 2

16 Three ways that a dependence can arise
True dependence (read-after-write):      S1  X = ...    S2  ... = X    S1 δ S2
Antidependence (write-after-read):       S1  ... = X    S2  X = ...    S1 δ^-1 S2
Output dependence (write-after-write):   S1  X = ...    S2  X = ...    S1 δ^o S2
source: Allen & Kennedy, chapter 2
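The three cases can be sketched as a set-intersection check in Python. This assumes straight-line code in which S1 always executes before S2, and a hypothetical representation of each statement as a `(reads, writes)` pair of variable-name sets.

```python
def classify(s1, s2):
    """Return the kinds of dependence of s2 on s1, where s1 executes first.

    Each statement is a (reads, writes) pair of sets of variable names.
    """
    reads1, writes1 = s1
    reads2, writes2 = s2
    kinds = []
    if writes1 & reads2:
        kinds.append("true")    # read-after-write
    if reads1 & writes2:
        kinds.append("anti")    # write-after-read
    if writes1 & writes2:
        kinds.append("output")  # write-after-write
    return kinds


# S1: X = Y      S2: Z = X
print(classify(({"Y"}, {"X"}), ({"X"}, {"Z"})))  # ['true']
# S1: Z = X      S2: X = Y
print(classify(({"X"}, {"Z"}), ({"Y"}, {"X"})))  # ['anti']
# S1: X = Y      S2: X = Z
print(classify(({"Y"}, {"X"}), ({"Z"}, {"X"})))  # ['output']
```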

19 Dependences in loops
    DO I = 1, N
S1    A(I+1) = A(I) + B(I)
    ENDDO
    S1 δ S1 ?
    DO I = 1, N
S1    A(I+2) = A(I) + B(I)
    ENDDO
    S1 δ S1 ?
Workaround: annotate statements in loops with an iteration number (for nested loops, it's an iteration vector)
source: Allen & Kennedy, chapter 2
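One way to see what the iteration number buys, sketched in Python: tag every store with the iteration that performed it, and compare that tag with the iteration that later loads the value. The function `dependence_distances` and its `offset` parameter are hypothetical names for the loop family A(I + offset) = A(I) + B(I).

```python
def dependence_distances(offset, n):
    """Distances of the true dependences of S1 on itself in
    DO I = 1, n: A(I + offset) = A(I) + B(I)."""
    last_writer = {}   # array index -> iteration number that stored it
    distances = set()
    for i in range(1, n + 1):
        if i in last_writer:                 # A(i) was stored by an earlier iteration
            distances.add(i - last_writer[i])
        last_writer[i + offset] = i          # iteration i stores A(i + offset)
    return distances


print(dependence_distances(1, 10))  # {1}
print(dependence_distances(2, 10))  # {2}
```

Without iteration numbers both loops look like "S1 depends on S1"; with them, the two loops are distinguished by how many iterations separate the store from the load.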

21 Dependence testing
    f1(α) = g1(β)
If this equation can be satisfied for iteration numbers α and β within the loop bounds, a dependence exists
source: Allen & Kennedy, chapter 2
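For small loop bounds the equation can simply be solved by enumeration. The Python sketch below is a brute-force check, not one of the dependence tests from Allen & Kennedy; it assumes `f` and `g` are the subscript expressions of the two array references, passed as ordinary functions, and `solve_dependence_equation` is a made-up name.

```python
from itertools import product


def solve_dependence_equation(f, g, lower, upper):
    """Return all (alpha, beta) with lower <= alpha, beta <= upper and f(alpha) == g(beta)."""
    iterations = range(lower, upper + 1)
    return [
        (alpha, beta)
        for alpha, beta in product(iterations, repeat=2)
        if f(alpha) == g(beta)
    ]


# A(I+1) = A(I) + B(I): f(alpha) = alpha + 1, g(beta) = beta
print(solve_dependence_equation(lambda i: i + 1, lambda i: i, 1, 4))
# [(1, 2), (2, 3), (3, 4)]

# A(2*I) = A(2*I+1) + B(I): even and odd subscripts never meet
print(solve_dependence_equation(lambda i: 2 * i, lambda i: 2 * i + 1, 1, 4))
# []
```

An empty result means the two references never touch the same element, so no dependence exists between them.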

22 Loop-carried and loop-independent dependences
Loop-carried:
    DO I = 1, N
S1    A(I+1) = F(I)
S2    F(I+1) = A(I)
    ENDDO
Loop-independent:
    DO I = 1, 10
S1    A(I) = ...
S2    ... = A(I)
    ENDDO
Some of each:
    DO I = 1, 9
S1    A(I) = ...
S2    ... = A(10-I)
    ENDDO
Every data dependence is loop-carried xor loop-independent (exercise for the reader: prove it)
source: Allen & Kennedy, chapter 2
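The three loops can be classified by enumerating iteration pairs, sketched here in Python. The sketch looks only at array A, assumes S1 stores A(write_sub(I)) and the later statement S2 loads A(read_sub(I)), and uses the hypothetical helper name `classify_loop`.

```python
def classify_loop(write_sub, read_sub, lower, upper):
    """Classify dependences on A between S1: A(write_sub(I)) = ... and a later
    statement S2: ... = A(read_sub(I)) in the body of DO I = lower, upper."""
    found = set()
    iterations = range(lower, upper + 1)
    for i in iterations:        # iteration in which S1 stores
        for j in iterations:    # iteration in which S2 loads
            if write_sub(i) != read_sub(j):
                continue
            if i == j:
                # Same iteration: S1 precedes S2 in the loop body.
                found.add(("true", "loop-independent"))
            elif i < j:
                # The store happens in an earlier iteration than the load.
                found.add(("true", "loop-carried"))
            else:
                # The load happens in an earlier iteration than the store.
                found.add(("anti", "loop-carried"))
    return found


first = classify_loop(lambda i: i + 1, lambda i: i, 1, 10)
second = classify_loop(lambda i: i, lambda i: i, 1, 10)
third = classify_loop(lambda i: i, lambda i: 10 - i, 1, 9)
print(first)   # {('true', 'loop-carried')}
print(second)  # {('true', 'loop-independent')}
print(sorted(third))
```

In the third loop the subscripts meet in the same iteration only at I = 5, and in different iterations everywhere else, which is why it has some of each.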

23 A few more definitions
A reordering transformation merely changes the order of execution of the code (no statements added/removed)
A reordering transformation preserves a dependence (S1, S2) if it preserves the relative execution order of S1 and S2
Fundamental Theorem of Dependence: any reordering transformation that preserves every dependence in a program preserves the meaning of that program
    Sound, but not complete
source: Allen & Kennedy, chapters
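A toy Python illustration of these definitions, assuming a program is a dictionary of named statements that update an environment. All names here (`preserves_dependences`, `run`, the sample programs) are invented for the sketch.

```python
def preserves_dependences(new_order, dependences):
    """True if every (source, sink) pair keeps source before sink in new_order."""
    position = {stmt: k for k, stmt in enumerate(new_order)}
    return all(position[src] < position[snk] for src, snk in dependences)


def run(order, program):
    """Execute the named statements in the given order and return the final state."""
    env = {}
    for name in order:
        program[name](env)
    return env


program = {
    "S1": lambda env: env.update(a=1),
    "S2": lambda env: env.update(b=env["a"] + 1),  # true dependence on S1
    "S3": lambda env: env.update(c=5),
}
dependences = [("S1", "S2")]
original = ["S1", "S2", "S3"]
reordered = ["S3", "S1", "S2"]
print(preserves_dependences(reordered, dependences))       # True
print(run(original, program) == run(reordered, program))   # True

# Not complete: swapping two identical stores breaks an output dependence,
# yet the final state is unchanged.
stores = {
    "S1": lambda env: env.update(x=0),
    "S2": lambda env: env.update(x=0),
}
print(preserves_dependences(["S2", "S1"], [("S1", "S2")]))   # False
print(run(["S1", "S2"], stores) == run(["S2", "S1"], stores))  # True
```

The first pair of checks shows the theorem at work; the second shows a reordering that the theorem cannot justify even though it happens to be safe.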

24 OK, so, what did we do?

27 What we wanted to do
Idea: add SIMD parallelism to the Rust compiler
    Right now it has task parallelism, but no data parallelism
Problem: Rust is not Fortran!
    Hard to map the concepts in Allen & Kennedy onto Rust
    Some theorems are actually stated in terms of Fortran 90
    Rust is probably the wrong level (cf. LLVM)
    Especially because Rust has no ILs (aaaaargh)

31 What we did instead
We got as far as adding a new vfor language form to Rust
    Still doesn't do anything that regular for doesn't do
Still wanted to try to put ideas from Allen & Kennedy into practice
    ...albeit in a much simpler setting than Rust
So we created a toy language of loops and tried to write a dependence analysis for it
    First version: syntax-rules and duct tape
    Second version (in progress): PLT Redex

32 Demo time!
Photo by vox_efx on Flickr. Thanks!
