Loop Transformations, Dependences, and Parallelization

Announcements
- Midterm is Friday from 3-4:15 in this room

Today
- Semester long project
- Data dependence recap
  - Parallelism and storage tradeoff
  - Scalar expansion example
- Skewing Smith-Waterman
- Automating transformations like skewing
  - Iteration space representation
  - Transformation representation
  - Applying the transformation to the iteration space
  - Generating code for the new iteration space

CS 553 Intro to Automating Loop Transformations

Semester Long Project Posted Online

Main idea (find a program analysis and/or transformation tool)
- Demonstrate usage of the tool to the rest of the class (10 minutes, 2-page tutorial)
- Find 10+ related papers and describe the research problem space
- Describe the space of solutions presented in the papers
- Evaluate the tool on a benchmark. How well does it solve the problem? What are some limitations?
- Present your findings to the rest of the class.

Requirements
- Project proposal due next Friday, October 17th
- In-class demos and 2-page tutorials due Monday, November 17th
- Final report due Friday, December 12th
- In-class presentations Wednesday, December 17th, 4:10-6:10pm

Parallelism and Storage Usage Tradeoff
- False dependences limit parallelism
- Removing false dependences requires more memory/storage
- Obtaining performance requires finding an effective tradeoff

Loop-Carried, Storage-Related Dependences

Problem
- Loop-carried dependences inhibit parallelism
- Scalar references result in loop-carried dependences

Example
    do i = 1,6
      t = A(i) + B(i)
      C(i) = t + 1/t
    enddo

Can this loop be parallelized? What kind of dependences are these?
No. Reusing t in every iteration creates loop-carried anti dependences (and output dependences). These are storage-related, not true dependences.

Convention for these slides: arrays start with upper case letters, scalars do not.

Removing False Dependences with Scalar Expansion

Idea
- Eliminate false dependences by introducing extra storage

Example
    do i = 1,6
      T(i) = A(i) + B(i)
      C(i) = T(i) + 1/T(i)
    enddo
    t = T(6)

Can this loop be parallelized? Disadvantages?

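The example above can be sketched in C as follows. This is a minimal illustration, assuming 0-based arrays of length N = 6; the function names `loop_scalar` and `loop_expanded` are hypothetical and not from the slides. Both versions compute the same C and the same final value of t, but only the expanded version gives each iteration its own storage.

```c
#include <stddef.h>

#define N 6 /* trip count of the example loop */

/* Original loop: every iteration writes and then reads the same scalar t,
   so successive iterations are tied together by anti and output
   dependences on t. Returns the value of t that is live after the loop. */
double loop_scalar(const double A[N], const double B[N], double C[N]) {
    double t = 0.0;
    for (size_t i = 0; i < N; i++) {
        t = A[i] + B[i];
        C[i] = t + 1.0 / t;
    }
    return t;
}

/* Scalar-expanded loop: t becomes the array T, one element per iteration.
   No iteration touches another iteration's T[i], so the iterations are
   independent. The cost is N extra elements of storage. */
double loop_expanded(const double A[N], const double B[N], double C[N]) {
    double T[N];
    for (size_t i = 0; i < N; i++) {
        T[i] = A[i] + B[i];
        C[i] = T[i] + 1.0 / T[i];
    }
    return T[N - 1]; /* copy-out: the last element plays the role of t */
}
```

The copy-out in `loop_expanded` corresponds to the `t = T(6)` statement on the slide.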
Scalar Expansion Details

Restrictions
- The loop must be a countable loop, i.e., the loop trip count must be independent of the body of the loop
- The expanded scalar must have no upward exposed uses in the loop

    do i = 1,6
      print(t)
      t = A(i) + B(i)
      C(i) = t + 1/t
    enddo

  (Here print(t) is an upward exposed use: it reads the value of t written by the previous iteration.)

- Nested loops may require much more storage
- When the scalar is live after the loop, we must move the correct array value into the scalar
- Privatization is another approach that is similar, one scalar per thread

Automating Loop Transformations with Frameworks

Currently
- Frameworks are used in the compiler to
  - abstract loops, memory accesses, and data dependences in loops
  - specify the effect of a sequence of loop transformations on the loop, its memory accesses, and its data dependences
  - generate code from the transformed loop
- Loop transformations affect the schedule of the loop

Future
- How can framework technology be exposed in the programming model?

Frameworks we will discuss this semester
- Unimodular
- Polyhedral
- Presburger
- Sparse Polyhedral

Protein String Matching Example (smithwaterman.c)

    for (i=1; i<=a[0]; i++) {
      for (j=1; j<=b[0]; j++) {
        diag  = h[i-1][j-1] + sim[a[i]][b[j]];
        down  = h[i-1][j] + DELTA;
        right = h[i][j-1] + DELTA;
        max = max3(diag,down,right);
        if (max <= 0) {
          h[i][j]=0; xtraceback[i][j]=-1; ytraceback[i][j]=-1;
        } else if (max == diag) {
          h[i][j]=diag; xtraceback[i][j]=i-1; ytraceback[i][j]=j-1;
        } else if (max == down) {
          h[i][j]=down; xtraceback[i][j]=i-1; ytraceback[i][j]=j;
        } else {
          h[i][j]=right; xtraceback[i][j]=i; ytraceback[i][j]=j-1;
        }
        if (max > Max) {
          Max=max; xmax=i; ymax=j;
        }
      }
    } // end for loops

Skewing (smithwaterman.c)

    // Let j' = i+j and i' = i.
    for (i'=1; i'<=a[0]; i'++) {
      for (j'=i'+1; j'<=i'+b[0]; j'++) {
        diag  = h[i'-1][j'-i'-1] + sim[a[i']][b[j'-i']];
        down  = h[i'-1][j'-i'] + DELTA;
        right = h[i'][j'-i'-1] + DELTA;
        max = max3(diag,down,right);
        if (max <= 0) {
          h[i'][j'-i']=0; xtraceback[i'][j'-i']=-1; ytraceback[i'][j'-i']=-1;
        } else if (max == diag) {
          h[i'][j'-i']=diag; xtraceback[i'][j'-i']=i'-1;
          ytraceback[i'][j'-i']=j'-i'-1;
        } else if (max == down) {
          h[i'][j'-i']=down; xtraceback[i'][j'-i']=i'-1;
          ytraceback[i'][j'-i']=j'-i';
        } else {
          h[i'][j'-i']=right; xtraceback[i'][j'-i']=i';
          ytraceback[i'][j'-i']=j'-i'-1;
        }
        if (max > Max) {
          Max=max; xmax=i'; ymax=j'-i';
        }
      }
    } // end for loops

Iteration Space Representation

Original code
    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

(Figure: iteration space, axes i and j.)

Represent the iteration space
- As an intersection of inequalities
- The iteration space is the integer tuples within the intersection

Bounds:
    1 <= i
    i <= 6
    1 <= j
    j <= 5

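As a small sketch of the representation above, the iteration space can be written as a predicate that is the conjunction of the four inequalities. The function names and the bounding-box scan are assumptions made for illustration; real frameworks manipulate the inequalities symbolically rather than enumerating points.

```c
#include <stdbool.h>

/* The iteration space of the example: integer points (i, j) that satisfy
   all four inequalities 1 <= i, i <= 6, 1 <= j, j <= 5. */
bool in_iteration_space(int i, int j) {
    return 1 <= i && i <= 6 && 1 <= j && j <= 5;
}

/* Count the integer points of the space by scanning the square
   [lo, hi] x [lo, hi]. The square is only a convenience for this demo. */
int count_points(int lo, int hi) {
    int count = 0;
    for (int i = lo; i <= hi; i++)
        for (int j = lo; j <= hi; j++)
            if (in_iteration_space(i, j))
                count++;
    return count;
}
```

Scanning any box that contains the space, for example `count_points(-10, 10)`, finds the 6 x 5 = 30 iterations of the loop nest.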
Lexicographical Order as Schedule

Iteration point
- Integer tuple with dimensionality d: (i_0, i_1, ..., i_{d-1})

Lexicographical order
- First order the iteration points by i_0, then i_1, and finally i_{d-1}.

\[
(i_0, i_1, \ldots, i_{d-1}) \prec (j_0, j_1, \ldots, j_{d-1}) \iff
(i_0 < j_0) \lor (i_0 = j_0 \land i_1 < j_1) \lor \cdots \lor
(i_0 = j_0 \land i_1 = j_1 \land \cdots \land i_{d-2} = j_{d-2} \land i_{d-1} < j_{d-1})
\]

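The ordering above can be sketched as a comparison function over integer tuples. The name `lex_compare` and the array-plus-length representation are assumptions for illustration only.

```c
#include <stddef.h>

/* Compare two iteration points a and b of dimensionality d.
   Returns -1 if a comes before b in lexicographical order,
   1 if b comes before a, and 0 if the points are equal. */
int lex_compare(const int *a, const int *b, size_t d) {
    for (size_t k = 0; k < d; k++) {
        /* The first position where the tuples differ decides the order. */
        if (a[k] < b[k]) return -1;
        if (a[k] > b[k]) return 1;
    }
    return 0;
}
```

For a two-deep loop nest this says that iteration (1,5) runs before (2,1): the outer index decides first, and the inner index only breaks ties.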
Frameworks for Loop Transformations

Loop transformations as functions
\[ \vec{i}\,' = f(\vec{i}) \]

Unimodular loop transformations [Banerjee 90], [Wolf & Lam 91]
- can represent loop permutation, loop reversal, and loop skewing
- unimodular linear mapping (determinant of the matrix is +1 or -1)
- \( \vec{i}\,' = T\,\vec{i} \), where T is a matrix and \( \vec{i} \) and \( \vec{i}\,' \) are iteration vectors
- example
  \[
  \begin{bmatrix} i' \\ j' \end{bmatrix} =
  \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}
  \begin{bmatrix} i \\ j \end{bmatrix}
  \]
- limitations
  - only perfectly nested loops
  - all statements are transformed the same

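A minimal sketch of the unimodular framework for two-deep loop nests follows. The types `Mat2` and `Vec2`, the function names, and the three named matrices are assumptions chosen for illustration. They encode the three transformations the slide lists: permutation (interchange), reversal, and skewing.

```c
#include <stdbool.h>

typedef struct { int m[2][2]; } Mat2; /* transformation matrix T */
typedef struct { int v[2]; } Vec2;    /* iteration vector (i, j) */

/* Determinant of a 2x2 integer matrix. */
int det2(Mat2 t) {
    return t.m[0][0] * t.m[1][1] - t.m[0][1] * t.m[1][0];
}

/* A matrix is unimodular when its determinant is +1 or -1. */
bool is_unimodular(Mat2 t) {
    int d = det2(t);
    return d == 1 || d == -1;
}

/* Apply the transformation: returns T * x. */
Vec2 apply(Mat2 t, Vec2 x) {
    Vec2 y;
    y.v[0] = t.m[0][0] * x.v[0] + t.m[0][1] * x.v[1];
    y.v[1] = t.m[1][0] * x.v[0] + t.m[1][1] * x.v[1];
    return y;
}

/* Illustrative transformation matrices. */
const Mat2 INTERCHANGE   = {{{0, 1}, {1, 0}}};  /* i' = j, j' = i     */
const Mat2 REVERSE_INNER = {{{1, 0}, {0, -1}}}; /* i' = i, j' = -j    */
const Mat2 SKEW_INNER    = {{{1, 0}, {1, 1}}};  /* i' = i, j' = i + j */
```

Because the determinant is +1 or -1, the mapping is a bijection on integer points: every iteration of the original nest maps to exactly one integer point of the new nest.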
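Loop Skewing

Original code
    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

(Figure: iteration space, axes i and j.)

Distance vector: (1, -1)

Skewing:
\[
\begin{bmatrix} i' \\ j' \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} i \\ j \end{bmatrix}
\]
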
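Transforming the Dependences and Array Accesses

Original code
    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

Dependence vector:
\[
\vec{d}\,' = T\,\vec{d} =
\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \end{bmatrix} =
\begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

New array accesses (substituting \( \vec{i} = T^{-1}\,\vec{i}\,' \)):
\[
A(i, j)
= A\!\left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} i \\ j \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right)
= A\!\left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} i' \\ j' \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right)
= A(i',\, j' - i')
\]
\[
A(i-1, j+1)
= A\!\left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} i \\ j \end{bmatrix} + \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right)
= A\!\left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} i' \\ j' \end{bmatrix} + \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right)
= A(i' - 1,\, j' - i' + 1)
\]
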
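The two calculations on this slide can be sketched in code: a dependence vector is transformed by multiplying with T, and an array access is transformed by substituting the old iteration vector with T^{-1} times the new one. All type and function names here are assumptions for illustration, and the inverse routine relies on T being unimodular.

```c
typedef struct { int m[2][2]; } Mat2;
typedef struct { int v[2]; } Vec2;

/* Matrix-vector product t * x. */
Vec2 mat_vec(Mat2 t, Vec2 x) {
    Vec2 y;
    y.v[0] = t.m[0][0] * x.v[0] + t.m[0][1] * x.v[1];
    y.v[1] = t.m[1][0] * x.v[0] + t.m[1][1] * x.v[1];
    return y;
}

/* Inverse of a unimodular 2x2 matrix. The inverse is the adjugate divided
   by the determinant; since the determinant is assumed to be +1 or -1,
   dividing by it is the same as multiplying by it, and the result stays
   an integer matrix. */
Mat2 unimodular_inverse(Mat2 t) {
    int det = t.m[0][0] * t.m[1][1] - t.m[0][1] * t.m[1][0];
    Mat2 inv = {{{ t.m[1][1] * det, -t.m[0][1] * det},
                 {-t.m[1][0] * det,  t.m[0][0] * det}}};
    return inv;
}

/* Subscripts of the access F * (i, j) + c, evaluated at a point (i', j')
   of the transformed iteration space: F * T^{-1} * (i', j') + c. */
Vec2 new_access(Mat2 f, Vec2 c, Mat2 t, Vec2 new_point) {
    Vec2 old_point = mat_vec(unimodular_inverse(t), new_point);
    Vec2 s = mat_vec(f, old_point);
    s.v[0] += c.v[0];
    s.v[1] += c.v[1];
    return s;
}
```

With the skewing matrix from the slide, the read access A(i-1, j+1) evaluated at (i', j') = (2, 5) gives subscripts (1, 4), which agrees with A(i'-1, j'-i'+1).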
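Transforming the Loop Bounds

Original code
    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

Bounds (substituting i = i' and j = j' - i'):
    1 <= i' <= 6
    1 <= j' - i' <= 5, i.e., 1 + i' <= j' <= 5 + i'

Transformed code
    do i' = 1,6
      do j' = 1+i',5+i'
        A(i',j'-i') = A(i'-1,j'-i'+1)+1
      enddo
    enddo

(Figure: transformed iteration space, axes i' and j'.)
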
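One way to convince yourself that the transformed code is equivalent is to run both loop nests on the same data. The sketch below assumes a C array with an extra row 0 and an extra column 6 so that the reads A(i-1, j+1) stay in bounds; the names `run_original`, `run_skewed`, `ip`, and `jp` (standing for i' and j') are illustrative.

```c
#define NI 6 /* upper bound of i */
#define NJ 5 /* upper bound of j */

/* Original loop nest. A needs rows 0..NI and columns 0..NJ+1 because
   the statement reads A[i-1][j+1]. */
void run_original(int A[NI + 1][NJ + 2]) {
    for (int i = 1; i <= NI; i++)
        for (int j = 1; j <= NJ; j++)
            A[i][j] = A[i - 1][j + 1] + 1;
}

/* Skewed loop nest with ip = i and jp = i + j. The inner bounds are the
   original bounds shifted by ip, and every use of j becomes jp - ip. */
void run_skewed(int A[NI + 1][NJ + 2]) {
    for (int ip = 1; ip <= NI; ip++)
        for (int jp = 1 + ip; jp <= NJ + ip; jp++)
            A[ip][jp - ip] = A[ip - 1][jp - ip + 1] + 1;
}
```

Skewing alone keeps the execution order of the iterations unchanged here; it only relabels them, which is why both versions produce identical arrays.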
Revisiting (smithwaterman.c)

    for (i=1; i<=a[0]; i++) {
      for (j=1; j<=b[0]; j++) {
        diag  = h[i-1][j-1] + sim[a[i]][b[j]];
        down  = h[i-1][j] + DELTA;
        right = h[i][j-1] + DELTA;
        ...

Let j' = i+j and i' = i.

    for (i'=1; i'<=a[0]; i'++) {
      for (j'=i'+1; j'<=i'+b[0]; j'++) {
        diag  = h[i'-1][j'-i'-1] + sim[a[i']][b[j'-i']];
        down  = h[i'-1][j'-i'] + DELTA;
        right = h[i'][j'-i'-1] + DELTA;
        ...

Transformation Legality

Recall
- A dependence vector is legal if it is lexicographically non-negative.
- Applying the transformation function to each dependence vector produces a dependence vector for the new iteration space.

When is a transformation legal assuming a lexicographical schedule?

What about parallelism?

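One possible reading of the two questions, sketched for two-deep nests: a transformation T is legal when every transformed dependence vector T*d is still lexicographically non-negative, and the inner loop can run in parallel when no transformed vector has the form (0, nonzero), meaning no dependence is carried by the inner loop. The function names and the array-based representation are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the first nonzero entry of d is positive, or d is all zeros. */
bool lex_nonnegative(const int *d, size_t n) {
    for (size_t k = 0; k < n; k++) {
        if (d[k] > 0) return true;
        if (d[k] < 0) return false;
    }
    return true;
}

/* Compute out = T * d for a 2x2 matrix and a 2-vector. */
void transform(int T[2][2], const int d[2], int out[2]) {
    out[0] = T[0][0] * d[0] + T[0][1] * d[1];
    out[1] = T[1][0] * d[0] + T[1][1] * d[1];
}

/* Legal if every transformed dependence is lexicographically non-negative. */
bool is_legal(int T[2][2], int deps[][2], size_t ndeps) {
    for (size_t k = 0; k < ndeps; k++) {
        int nd[2];
        transform(T, deps[k], nd);
        if (!lex_nonnegative(nd, 2)) return false;
    }
    return true;
}

/* The inner loop is parallel if no transformed dependence is carried by
   it, i.e. none has a zero outer entry and a nonzero inner entry. */
bool inner_loop_parallel(int T[2][2], int deps[][2], size_t ndeps) {
    for (size_t k = 0; k < ndeps; k++) {
        int nd[2];
        transform(T, deps[k], nd);
        if (nd[0] == 0 && nd[1] != 0) return false;
    }
    return true;
}
```

For the running example with distance vector (1, -1), skewing gives (1, 0) and is legal, while plain loop interchange gives (-1, 1) and is not. For the Smith-Waterman loop, whose reads of h give distance vectors (1,1), (1,0), and (0,1), skewing alone is legal but still leaves a dependence (0,1) on the inner loop; under this sketch the inner loop only becomes parallel once the skew is followed by an interchange (i' = i + j, j' = i).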
Converting C loops to iteration space representation

Analyses needed
- Loop analysis
  - Loop bounds from AST or control-flow graph
  - Induction variable detection
- Pointer analysis
  - Do pointers point at the same or overlapping memory?
  - Note that in C one can cast a pointer to an integer and back and can do pointer arithmetic. In general this requires whole program analysis.
- Dependence analysis

Is this even possible?
- Current tools make the optimistic pointer assumption
- We need programming models that simplify or remove the need for such analyses

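A small sketch of why pointer analysis matters: the loop below has independent iterations only if the caller passes non-overlapping buffers, and nothing in the function itself says so. The function name `add_one` is hypothetical.

```c
/* dst[k] = src[k] + 1 for k in [0, n).
   If dst and src do not overlap, every iteration is independent.
   If dst == src + 1, iteration k reads the value that iteration k-1
   wrote, which is a loop-carried true dependence. */
void add_one(int *dst, const int *src, int n) {
    for (int k = 0; k < n; k++)
        dst[k] = src[k] + 1;
}
```

Called with disjoint arrays, the loop fills `dst` with `src[k] + 1`. Called as `add_one(buf + 1, buf, 4)` on a zero-initialized `buf`, the sequential result is 0, 1, 2, 3, 4, which a parallel execution of the iterations would not be guaranteed to reproduce. A tool that assumes pointers never overlap would treat both calls the same way.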
Concepts
- Parallelism and memory usage tradeoff
- Transformation frameworks
  - Representing the iteration space
  - Representing transformations
  - Applying transformations to the iteration space, dependences, and array accesses
  - Testing the legality of a transformation
- Compiler analyses needed in C to obtain an iteration space representation

References
[Banerjee 90] Utpal Banerjee, Unimodular transformations of double loops, In Advances in Languages and Compilers for Parallel Computing, 1990.
[Wolf & Lam 91] Wolf and Lam, A Data Locality Optimizing Algorithm, In Programming Language Design and Implementation, 1991.

Next Time

Homework
- Study for the midterm by doing example problems.

Lecture
- Midterm review
- After midterm: using the unimodular framework to represent other loop transformations