Loop Transformations for Parallelism & Locality

Last week
  - Data dependences and loops
  - Loop transformations
      - Parallelization
      - Loop interchange

Today
  - Scalar expansion for removing false dependences
  - Loop interchange
  - Loop transformations and transformation frameworks
      - Loop permutation
      - Loop reversal
      - Loop skewing
      - Loop fusion

Review

Distance vectors
  - Concisely represent dependences in loops (i.e., in iteration spaces)
  - Dictate what transformations are legal, e.g., permutation and parallelization

Legality
  - A dependence vector is legal when it is lexicographically nonnegative

Loop-carried dependence
  - A dependence D = (d_1, ..., d_n) is carried at loop level i if d_i is the first nonzero element of D

CS553 Lecture: Loop Transformations

Scalar Expansion: Motivation

Problem
  - Loop-carried dependences inhibit parallelism
  - Scalar references result in loop-carried dependences

Loop body, executed for each iteration i:

    t = A(i) + B(i)
    C(i) = t + 1/t

Can this loop be parallelized? No.
What kind of dependences are these? Anti dependences.

Scalar Expansion

Eliminate false dependences by introducing extra storage

    T(i) = A(i) + B(i)
    C(i) = T(i) + 1/T(i)

Can this loop be parallelized?
Disadvantages?

Convention for these slides: arrays start with upper case letters, scalars do not.
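To make the effect of scalar expansion concrete, here is a minimal Python sketch. The function names, the 0-based lists, and the `order` parameter are assumptions made for illustration and do not come from the slides. It shows that once each iteration has its own T(i), the iterations can run in any order and still produce the same C.

```python
def scalar_loop(A, B):
    """Original loop: every iteration writes and reads the same scalar t."""
    C = [0.0] * len(A)
    for i in range(len(A)):
        t = A[i] + B[i]
        C[i] = t + 1 / t
    return C


def expanded_loop(A, B, order=None):
    """Scalar-expanded loop: iteration i uses its own slot T[i].

    `order` is a hypothetical parameter giving the order in which the
    iterations are executed; it stands in for a parallel schedule.
    """
    n = len(A)
    T = [0.0] * n  # the extra storage introduced by the expansion
    C = [0.0] * n
    for i in (order if order is not None else range(n)):
        T[i] = A[i] + B[i]
        C[i] = T[i] + 1 / T[i]
    return C


A = [1.0, 2.0, 3.0, 4.0]
B = [1.0, 1.0, 1.0, 1.0]
reference = scalar_loop(A, B)
shuffled = expanded_loop(A, B, order=[2, 0, 3, 1])
print(shuffled == reference)
```

The price is the array T, which needs one element per iteration where the original loop needed a single scalar.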
Scalar Expansion Details

Restrictions
  - The loop must be a countable loop, i.e., the loop trip count must be independent of the body of the loop
  - The expanded scalar must have no upward exposed uses in the loop:

        print(t)
        t = A(i) + B(i)
        C(i) = t + 1/t

    (here print(t) reads the value of t left by the previous iteration)
  - Nested loops may require much more storage
  - When the scalar is live after the loop, we must move the correct array value into the scalar

Loop Permutation

Swap the order of two loops to increase parallelism, to improve spatial locality, or to enable other transformations. Also known as loop interchange.

    do i = 1,n
      do j = 1,n
        x = A(2,j)

This access strides through a row of A.

    do j = 1,n
      do i = 1,n
        x = A(2,j)

This code is invariant with respect to the inner loop, yielding better locality.

Loop Interchange (cont)

    do i = 1,n
      do j = 1,n
        x = A(i,j)

This array has stride n access.

    do j = 1,n
      do i = 1,n
        x = A(i,j)

This array now has stride 1 access. (Assuming column-major order for Fortran)

Legality of Loop Interchange

Case analysis of the direction vectors
  (=,=)  The dependence is loop independent, so it is unaffected by interchange.
  (=,<)  The dependence is carried by the j loop. After interchange the dependence will be (<,=), so the dependence will still be carried by the j loop, and the dependence relations do not change.
  (<,=)  The dependence is carried by the i loop. After interchange the dependence will be (=,<), so the dependence will still be carried by the i loop, and the dependence relations do not change.
Legality of Loop Interchange (cont)

Case analysis of the direction vectors (cont.)
  (<,<)  The dependence distance is positive in both dimensions. After interchange it will still be positive in both dimensions, so the dependence relations do not change.
  (<,>)  The dependence is carried by the outer loop. After interchange the dependence will be (>,<), which changes the dependences and results in an illegal direction vector, so interchange is illegal.
  (>,*) and (=,>)  Such direction vectors are not possible for the original loop.

Loop Interchange

Consider the (<,>) case

Before:

    do i = 1,n
      do j = 1,n
        C(i,j) = C(i+1,j-1)

    (1,1)  C(1,1) = C(2,0)
    (1,2)  C(1,2) = C(2,1)
    ...
    (2,1)  C(2,1) = C(3,0)

    δa: iteration (1,2) reads C(2,1) before iteration (2,1) writes it.

After:

    do j = 1,n
      do i = 1,n
        C(i,j) = C(i+1,j-1)

    (1,1)  C(1,1) = C(2,0)
    (2,1)  C(2,1) = C(3,0)
    ...
    (1,2)  C(1,2) = C(2,1)

    δf: iteration (2,1) now writes C(2,1) before iteration (1,2) reads it.

Frameworks for Loop Transformations

Unimodular Loop Transformations [Banerjee 90], [Wolf & Lam 91]
  - Can represent loop permutation, loop reversal, and loop skewing
  - Unimodular linear mapping (the determinant of the matrix is +1 or -1)
  - T i = i', where T is a matrix and i and i' are iteration vectors
  - A transformation is legal if the transformed dependence vectors remain lexicographically positive
  - Limitations
      - Only perfectly nested loops
      - All statements are transformed the same

Legality of Loop Interchange, Reprise

Reduced case analysis of the direction vectors
  (=,=)  The dependence is loop independent, so it is unaffected by interchange.
  (=,<)  The dependence is carried by the j loop. After interchange the dependence will be (<,=), so the dependence will still be carried by the j loop, and the dependence relations do not change.
  (<,>)  The dependence is carried by the outer loop. After interchange the dependence will be (>,<), which changes the dependences and results in an illegal direction vector, so interchange is illegal.
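The legality test of the unimodular framework can be sketched in a few lines of Python. This is a sketch for 2-deep loop nests only. The helper names and the three example matrices are illustrative assumptions, and the check uses the lexicographically nonnegative convention from the review slide so that loop-independent (all-zero) vectors are accepted.

```python
def det2(T):
    """Determinant of a 2x2 matrix given as nested lists."""
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]


def transform(T, d):
    """Apply the matrix T to the distance vector d."""
    return tuple(sum(T[r][c] * d[c] for c in range(len(d)))
                 for r in range(len(T)))


def lex_nonnegative(d):
    """True if the first nonzero entry of d is positive, or d is all zeros."""
    for x in d:
        if x > 0:
            return True
        if x < 0:
            return False
    return True


def is_legal(T, distance_vectors):
    """A unimodular T is legal if every transformed vector stays legal."""
    if abs(det2(T)) != 1:
        raise ValueError("T is not unimodular")
    return all(lex_nonnegative(transform(T, d)) for d in distance_vectors)


# Example matrices for a 2-deep nest (outer loop first).
INTERCHANGE = [[0, 1], [1, 0]]     # swap the two loops
REVERSE_INNER = [[1, 0], [0, -1]]  # reverse the inner loop
SKEW_INNER = [[1, 0], [1, 1]]      # skew the inner loop by the outer one

print(is_legal(INTERCHANGE, [(1, 1)]))   # the (<,<) case
print(is_legal(INTERCHANGE, [(1, -1)]))  # the (<,>) case
```

Because each transformation is a matrix, a sequence of transformations can be tested the same way by multiplying the matrices first and checking the product.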
Loop Reversal

Change the direction of loop iteration (i.e., from low-to-high indices to high-to-low indices or vice versa)

Benefits
  - Improved cache performance
  - Enables other transformations (coming soon)

    do i = 1,6
      A(i) = B(i) + C(i)

becomes

    do i = 6,1,-1
      A(i) = B(i) + C(i)

Loop Reversal and Distance Vectors

Impact
  - Reversal of loop i negates the i-th entry of all distance vectors associated with the loop
  - What about direction vectors?

When is reversal legal?
  - When the loop being reversed does not carry a dependence (i.e., when the transformed distance vectors remain legal)

    do i = 1,5
      do j = 1,6
        A(i,j) = A(i-1,j-1)+1

    Dependence: Flow
    Distance vector: (1,1)
    Transformed distance vector (after reversing the j loop): (1,-1), legal

Loop Reversal

Legality
  - Loop reversal will change the direction of the dependence relation

Is the following legal?

    do i = 1,6
      A(i) = A(i-1)

    Dependence    Distance vector
    Flow          (1)

    do i = 6,1,-1
      A(i) = A(i-1)

    Dependence    Distance vector
    Anti          (1)
    Flow          (-1)

The original flow dependence would get distance vector (-1), which is not legal; the reversed loop has an anti dependence instead.

Loop Skewing

Original code

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1

Distance vector: (1,-1)

Can we permute the original loop?

Skewing: [figure not recovered]
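The question on the skewing slide can be explored by running the loop nest. The following Python sketch makes several assumptions: A starts as all zeros, the array is padded so boundary reads stay in range, and the function names are hypothetical. It compares the original nest with a plain interchange, and with an interchange applied after the skew i' = i, j' = i + j used in the transformed code. The max/min bounds of the last nest are derived here for this sketch and do not appear in the slides.

```python
N_I, N_J = 6, 5  # loop bounds from the slide: i = 1..6, j = 1..5


def fresh():
    # Rows 0..N_I and columns 0..N_J+1, so reads of A(i-1,j+1) stay in range.
    return [[0] * (N_J + 2) for _ in range(N_I + 1)]


def original():
    A = fresh()
    for i in range(1, N_I + 1):
        for j in range(1, N_J + 1):
            A[i][j] = A[i - 1][j + 1] + 1
    return A


def interchanged():
    # Plain interchange: distance vector (1,-1) would become (-1,1).
    A = fresh()
    for j in range(1, N_J + 1):
        for i in range(1, N_I + 1):
            A[i][j] = A[i - 1][j + 1] + 1
    return A


def skewed_then_interchanged():
    # After the skew the distance vector is (1,0); interchange gives (0,1).
    A = fresh()
    for jp in range(2, N_I + N_J + 1):       # j' = i + j
        lo = max(1, jp - N_J)                # keeps j = j' - i' <= N_J
        hi = min(N_I, jp - 1)                # keeps j = j' - i' >= 1
        for ip in range(lo, hi + 1):         # i' = i
            A[ip][jp - ip] = A[ip - 1][jp - ip + 1] + 1
    return A


print(original() == interchanged())
print(original() == skewed_then_interchanged())
```

Under these assumptions the plain interchange changes the result, while the skewed nest can be interchanged without changing it.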
Transforming the Dependences and Array Accesses

Original code

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1

Dependence vector: with the skew i' = i, j' = i + j (the mapping implied by the transformed code below), (1,-1) becomes (1,0)

New array accesses: A(i',j'-i') = A(i'-1,j'-i'+1)+1

Transforming the Loop Bounds

Original code: as above

Bounds: 1 <= i <= 6 and 1 <= j <= 5 become 1 <= i' <= 6 and 1+i' <= j' <= 5+i'

Transformed code

    do i' = 1,6
      do j' = 1+i',5+i'
        A(i',j'-i') = A(i'-1,j'-i'+1)+1

Loop Fusion

Combine multiple loop nests into one

    do i = 1,n
      A(i) = A(i-1)
    do i = 1,n
      B(i) = A(i)/2

becomes

    do i = 1,n
      A(i) = A(i-1)
      B(i) = A(i)/2

Pros
  - May improve data locality
  - Reduces loop overhead
  - Enables array contraction (opposite of scalar expansion)
  - May enable better instruction scheduling

Cons
  - May hurt data locality
  - May hurt icache performance

Legality of Loop Fusion

Basic conditions
  - Both loops must have the same structure
      - Same loop depth
      - Same loop bounds
      - Same iteration directions
    Can we relax any of these restrictions?
  - Dependences must be preserved, e.g., flow dependences must not become anti dependences

    do i = 1,n
      body1
    do i = 1,n
      body2

All cross-loop dependences flow from body1 to body2.

    do i = 1,n
      body1
      body2

Ensure that fusion does not introduce dependences from body2 to body1.
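The last condition can be checked on a small example. The Python sketch below is only an illustration: body1, body2, the `offset` parameter, and the function names are hypothetical choices and do not come from the slides. body1 writes A(i) and body2 reads A(i+offset). With offset 0 or -1 the fused loop reads values that body1 has already produced. With offset +1 the flow dependence of the separate loops turns into an anti dependence from body2 to body1, and the result changes.

```python
def run_separate(B, A_init, offset):
    """Original program: loop 1 runs to completion, then loop 2."""
    n = len(B) - 2               # indices 1..n are computed; 0 and n+1 are padding
    A = list(A_init)
    C = [0.0] * (n + 2)
    for i in range(1, n + 1):    # body1
        A[i] = B[i] + 1
    for i in range(1, n + 1):    # body2
        C[i] = A[i + offset] / 2
    return C[1:n + 1]


def run_fused(B, A_init, offset):
    """Fused program: body1 and body2 share one loop."""
    n = len(B) - 2
    A = list(A_init)
    C = [0.0] * (n + 2)
    for i in range(1, n + 1):
        A[i] = B[i] + 1          # body1
        C[i] = A[i + offset] / 2 # body2
    return C[1:n + 1]


B = [0.0, 10.0, 20.0, 30.0, 40.0, 0.0]
A_init = [0.0] * len(B)

for offset in (-1, 0, 1):
    same = run_fused(B, A_init, offset) == run_separate(B, A_init, offset)
    print(offset, same)
```

Comparing outputs on one input does not prove legality in general, but a mismatch is enough to show that a particular fusion is illegal.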
Loop Fusion

What are the dependences?

    do i = 1,n
      A(i) = B(i) + 1
    do i = 1,n
      C(i) = A(i)/2
    do i = 1,n
      D(i) = 1/C(i+1)

    δf: the second loop reads the A(i) written by the first loop.

What are the dependences?

    do i = 1,n
      A(i) = B(i) + 1
      C(i) = A(i)/2
      D(i) = 1/C(i+1)

    δf: from the A statement to the C statement.
    δa: the D statement reads C(i+1) before the C statement writes it.

Fusion changes the dependence between the C statement and the D statement, so fusion is illegal.

Is there some transformation that will enable fusion of these loops?

Loop Fusion (cont)

Loop reversal is legal for the original loops
  - Does not change the direction of any dependence in the original code
  - Will reverse the direction in the fused loop: δa will become δf

    do i = n,1,-1
      A(i) = B(i) + 1
    do i = n,1,-1
      C(i) = A(i)/2
    do i = n,1,-1
      D(i) = 1/C(i+1)

becomes

    do i = n,1,-1
      A(i) = B(i) + 1
      C(i) = A(i)/2
      D(i) = 1/C(i+1)

After reversal and fusion all original dependences are preserved.

Concepts

Using direction and distance vectors

Transformation legality (from previous)
  - Must respect data dependences
  - Scalar expansion as a technique to remove anti and output dependences

Transformations
  - What is the benefit?
  - What do they enable?
  - When are they legal?

Unimodular transformation framework
  - Represents loop permutation, loop reversal, and loop skewing
  - Provides a mathematical framework for testing transformation legality, transforming array accesses and loop bounds, and combining transformations

Next Time

Lecture
  - More loop transformations
  - An even cooler transformation framework