Static single assignment


1 Static single assignment
Control-flow graph / Loop-nesting forest / Static single assignment

SSA with dominance property
  Unique definition for each variable.
  Each definition dominates its uses.

Conversion into SSA
  Need to introduce φ-functions at the (iterated) dominance frontier.

Interests of SSA
  Links between uses and definitions are explicit.
  Code optimizations: efficient, easy to implement, fast.
  More accurate program analysis.

Example (an entry block followed by a loop that swaps a and b)
  Original code:
    a = ...
    b = ...
    loop: n = b
          b = a
          a = n
  SSA form:
    a1 = ...
    b1 = ...
    loop: a2 = φ(a1, a3)
          b2 = φ(b1, b3)
          n = b2
          b3 = a2
          a3 = n
  After copy propagation:
    a1 = ...
    b1 = ...
    loop: a2 = φ(a1, b2)
          b2 = φ(b1, a2)

7 Dominance frontier (elementary algorithm)

Dominance can be computed by fixed-point iteration:
  D(r) = {r} for the root r, and D(n) = {n} ∪ ⋂_{p ∈ pred[n]} D(p) for every other node n.
Many other more efficient algorithms are possible. Then:

  procedure computeDF(n)
    S := {}
    for each node y in succ[n] do
      if idom(y) ≠ n then S := S ∪ {y}    /* successor of n not strictly dominated by n */
    endfor
    for each child c of n in the dominator tree do
      computeDF(c)
      for each element w of DF[c] do
        if n does not strictly dominate w then S := S ∪ {w}
      endfor
    endfor
    DF[n] := S
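The fixed-point iteration and computeDF can be sketched in Python as follows. This is a minimal sketch, not a tuned implementation: the dictionary-based CFG encoding, the function names, and the seven-block loop CFG at the end are assumptions made for illustration, and every node is assumed reachable from the root.

```python
def dominators(succ, root):
    """D(root) = {root}; D(n) = {n} | intersection of D(p) over predecessors p."""
    nodes = set(succ)
    pred = {n: [] for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates"
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            sets = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*sets) if sets else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def dominance_frontiers(succ, root):
    dom = dominators(succ, root)
    # idom(n) is the strict dominator of n that itself has the most dominators.
    idom = {n: max(dom[n] - {n}, key=lambda d: len(dom[d]))
            for n in succ if n != root}
    children = {n: [] for n in succ}
    for n, d in idom.items():
        children[d].append(n)
    df = {}

    def compute_df(n):
        # Local part: successors whose immediate dominator is not n.
        s = {y for y in succ[n] if idom.get(y) != n}
        # Inherited part: frontier nodes of children not strictly dominated by n.
        for c in children[n]:
            compute_df(c)
            s |= {w for w in df[c] if n == w or n not in dom[w]}
        df[n] = s

    compute_df(root)
    return df


# Hypothetical CFG: 1 -> 2, 2 -> {3, 4}, 3 -> {5, 6}, 5 -> 7, 6 -> 7, 7 -> 2.
CFG = {1: [2], 2: [3, 4], 3: [5, 6], 4: [], 5: [7], 6: [7], 7: [2]}
DF = dominance_frontiers(CFG, 1)
```

The loop header 2 ends up in its own frontier, which is why the inherited test has to use strict dominance.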

8 Example
CMPUT Compiler Design and Optimization

What is the dominance frontier of node 5?
First we must find all nodes that node 5 strictly dominates.
A node w is in the dominance frontier of node 5 if 5 dominates a predecessor of w, but 5 does not strictly dominate w itself.
DF(5) = {4, 5, 12, 13}
[Figure: example control-flow graph, not recoverable from the extraction]

12 Placement of φ-functions

  procedure Place-φ-functions(G, DF, D)
    /* D[n] is the set of variables defined in n */
    /* D_φ[a] is the set of nodes that already hold a φ-function for a */
    for each node n in G do
      for each variable a in D[n] do
        defsites[a] := defsites[a] ∪ {n}
      endfor
    endfor
    for each variable a do
      W := defsites[a]
      while W not empty do
        remove some node n from W
        for each Y in DF[n] do
          if Y ∉ D_φ[a] then
            insert statement a = φ(a, ..., a) at the top of Y
            D_φ[a] := D_φ[a] ∪ {Y}
            if a ∉ D[Y] then W := W ∪ {Y}
          endif
        endfor
      endwhile
    endfor
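A minimal Python sketch of this worklist placement is given below. The encoding is an assumption: `df` maps each block to its dominance frontier, `defs` maps a block to the variables it defines, and the result maps each block to the variables that need a φ-function there. The sample data is a hypothetical seven-block loop in which i, j, k are defined in block 1 and j, k are redefined in blocks 5 and 6.

```python
def place_phis(df, defs):
    """Return phis[block] = set of variables that need a phi-function in block."""
    defsites = {}
    for n, variables in defs.items():
        for a in variables:
            defsites.setdefault(a, set()).add(n)

    phis = {n: set() for n in df}
    for a, sites in defsites.items():
        work = set(sites)
        while work:
            n = work.pop()
            for y in df[n]:
                if a not in phis[y]:
                    phis[y].add(a)
                    # The new phi is itself a definition of a in y.
                    if a not in defs.get(y, set()):
                        work.add(y)
    return phis


# Hypothetical dominance frontiers and definition sites.
DF = {1: set(), 2: {2}, 3: {2}, 4: set(), 5: {7}, 6: {7}, 7: {2}}
DEFS = {1: {"i", "j", "k"}, 5: {"j", "k"}, 6: {"j", "k"}}
PHIS = place_phis(DF, DEFS)
```

Because a φ-function counts as a new definition, block 7 re-enters the worklist and forces φ-functions at the loop header 2: this is the iterated dominance frontier at work.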

13 Renaming variables

  procedure Rename(n)
    for each statement S in block n do
      if S is not a φ-function then
        for each use of some variable x in S do
          i := top(Stack[x]); replace the use of x with x_i in S
        endfor
      endif
      for each definition of some variable a in S do
        Count[a]++; i := Count[a]; push i onto Stack[a]; replace the definition with a_i
      endfor
    endfor
    for each successor Y of block n and each φ-function in Y do
      i := top(Stack[a]), where a is the argument coming from n; replace it with a_i
    endfor
    for each child X of n in the dominator tree do
      Rename(X)
    endfor
    for each definition of some variable a in block n (in the original code) do
      pop Stack[a]
    endfor

  procedure RenameAll(G)
    for each variable a do
      Count[a] := 0; Stack[a] := {}; push 0 onto Stack[a]
    endfor
    Rename(r)    /* r is the root of the dominator tree */
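The renaming pass can be sketched in Python as below. Everything about the encoding is an assumption made for illustration: statements are dictionaries with an original variable `var`, an operation `op`, and `args`; φ-arguments are ordered like the predecessor lists in `preds`; `children` is the dominator tree. As in the pseudocode, the first definition of a variable gets index 1, since Count is incremented before use.

```python
def stmt(var, op, *args):
    return {"var": var, "op": op, "args": list(args)}


def rename_all(blocks, succ, preds, children, root, variables):
    count = {v: 0 for v in variables}
    stack = {v: [0] for v in variables}

    def name(v):
        return f"{v}_{stack[v][-1]}"

    def rename(n):
        pushed = []
        for st in blocks[n]:
            if st["op"] != "phi":
                # Constants are not in `stack` and are left unchanged.
                st["args"] = [name(x) if x in stack else x for x in st["args"]]
            a = st["var"]
            if a is not None:
                count[a] += 1
                stack[a].append(count[a])
                pushed.append(a)
                st["dest"] = name(a)
        for y in succ[n]:
            j = preds[y].index(n)          # which phi-argument comes from n
            for st in blocks[y]:
                if st["op"] == "phi":
                    st["args"][j] = name(st["var"])
        for x in children[n]:
            rename(x)
        for a in pushed:
            stack[a].pop()

    rename(root)


# Hypothetical seven-block loop with phi-functions already placed in 2 and 7.
BLOCKS = {
    1: [stmt("i", "const", "1"), stmt("j", "const", "1"), stmt("k", "const", "0")],
    2: [stmt("j", "phi", "j", "j"), stmt("k", "phi", "k", "k"),
        stmt(None, "br_lt", "k", "100")],
    3: [stmt(None, "br_lt", "j", "20")],
    4: [stmt(None, "ret", "j")],
    5: [stmt("j", "copy", "i"), stmt("k", "add", "k", "1")],
    6: [stmt("j", "copy", "k"), stmt("k", "add", "k", "2")],
    7: [stmt("j", "phi", "j", "j"), stmt("k", "phi", "k", "k")],
}
SUCC = {1: [2], 2: [3, 4], 3: [5, 6], 4: [], 5: [7], 6: [7], 7: [2]}
PREDS = {1: [], 2: [1, 7], 3: [2], 4: [2], 5: [3], 6: [3], 7: [5, 6]}
CHILDREN = {1: [2], 2: [3, 4], 3: [5, 6, 7], 4: [], 5: [], 6: [], 7: []}
rename_all(BLOCKS, SUCC, PREDS, CHILDREN, 1, ["i", "j", "k"])
```

The exact subscripts depend on the order in which dominator-tree children are visited, so they need not match the numbering used on other slides.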

14 SSA: A Complete Example

Source program:
  i = 1; j = 1; k = 0;
  while (k < 100) {
    if (j < 20) { j = i; k = k + 1; }
    else        { j = k; k = k + 2; }
  }
  return j;

Control-flow graph:
  B1: i ← 1; j ← 1; k ← 0
  B2: if k < 100
  B3: if j < 20
  B4: return j
  B5: j ← i; k ← k + 1
  B6: j ← k; k ← k + 2
  B7: join of B5 and B6, back to B2

Steps shown on the slides:
  1. Insert j ← φ(j, j) and k ← φ(k, k) in B7.
  2. Insert j ← φ(j, j) and k ← φ(k, k) in B2.
  3. Rename the variables block by block.

Result in SSA form:
  B1: i0 ← 1; j0 ← 1; k0 ← 0
  B2: j1 ← φ(j0, j3); k1 ← φ(k0, k3); if k1 < 100
  B3: if j1 < 20
  B4: return j1
  B5: j2 ← i0; k2 ← k1 + 1
  B6: j4 ← k1; k4 ← k1 + 2
  B7: j3 ← φ(j2, j4); k3 ← φ(k2, k4)

21 Example: Constant Propagation
  B2: j1 ← φ(j0, j3); k1 ← φ(k0, k3)   becomes   j1 ← φ(1, j3); k1 ← φ(0, k3)
  B5: j2 ← i0   becomes   j2 ← 1

22 Example: Dead-code Elimination
  The definitions in B1 (i0 ← 1; j0 ← 1; k0 ← 0) are no longer used: B1 is removed.

23 Constant Propagation and Dead Code Elimination
  B7: j3 ← φ(j2, j4)   becomes   j3 ← φ(1, j4)
  B5: the definition j2 ← 1 is then dead and is removed.

24 Example: Is this the end?
  B2: j1 ← φ(1, j3); k1 ← φ(0, k3); if k1 < 100
  B3: if j1 < 20
  B4: return j1
  B5: k2 ← k1 + 1
  B6: j4 ← k1; k4 ← k1 + 2
  B7: j3 ← φ(1, j4); k3 ← φ(k2, k4)
  But block 6 is never executed! How can we find this out, and simplify the program?
  SSA conditional constant propagation finds the least fixed point for the program and allows further elimination of dead code. See the algorithm in the Tiger book.

25 Example: Dead code elimination
  B6 and the test in B3 are removed.
  B7: j3 ← φ(1, j4); k3 ← φ(k2, k4)   becomes   j3 ← φ(1); k3 ← φ(k2)

26 Example: Single Argument φ-function Elimination
  B7: j3 ← φ(1); k3 ← φ(k2)   becomes   j3 ← 1; k3 ← k2

27 Example: Constant and Copy Propagation
  B2: j1 ← φ(1, j3); k1 ← φ(0, k3)   becomes   j1 ← φ(1, 1); k1 ← φ(0, k2)

28 Example: Dead Code Elimination
  B7 (j3 ← 1; k3 ← k2) is no longer used and is removed.

29 Example: φ-function Simplification
  B2: j1 ← φ(1, 1)   becomes   j1 ← 1

30 Example: Constant Propagation
  B4: return j1   becomes   return 1

31 Example: Dead Code Elimination
  Before: B2: j1 ← 1; k1 ← φ(0, k2); if k1 < 100    B5: k2 ← k1 + 1    B4: return 1
  After:  B4: return 1

32 More readings

References
  Cytron, Ferrante, Rosen, Wegman, Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4), 1991.
  Ramalingam. On loops, dominators, and dominance frontiers. ACM Transactions on Programming Languages and Systems, 24(5).

Recent advances in SSA
  SSA-based compilers & JIT compilation.
  Register allocation, out-of-SSA conversion, liveness analysis.
  SSA extensions: SSI, gated SSA, psi-SSA, value state dependence graph, array SSA, SafeTSA, etc.

33 Links between the different notions

A few important results:
  If S contains the entry node, J(S) = J⁺(S) = DF⁺(S).
  G is reducible
    iff it is simplifiable by the rules T1 and T2,
    iff each SCC has a unique entry node,
    iff removing all edges (u, v) where v dominates u makes G acyclic,
    ...
  Dominators and iterated dominance frontiers can be computed quickly from the loop-nesting forest, especially if G is reducible.
  Conversely, DJ-graphs can be used to build loop forests.
  Advanced algorithms use Tarjan's union-find with almost-linear complexity (see Ramalingam, Sreedhar, Havlak, Steensgaard).
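The last characterization of reducibility translates directly into a test: drop every edge (u, v) where v dominates u, then check that what is left is acyclic. The sketch below assumes a dictionary-based CFG in which every node is reachable from the root; the function names and the two sample graphs are hypothetical.

```python
def dominators(succ, root):
    """Fixed-point iteration over D(n) = {n} | intersection of D(p), p in pred[n]."""
    nodes = set(succ)
    pred = {n: [] for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            sets = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*sets) if sets else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def is_reducible(succ, root):
    dom = dominators(succ, root)
    # Keep only edges (u, v) where v does NOT dominate u.
    forward = {u: [v for v in vs if v not in dom[u]] for u, vs in succ.items()}
    state = {}  # 1 = on the DFS stack, 2 = finished

    def has_cycle(u):
        state[u] = 1
        for v in forward[u]:
            if state.get(v) == 1 or (v not in state and has_cycle(v)):
                return True
        state[u] = 2
        return False

    return not any(u not in state and has_cycle(u) for u in succ)


# A natural loop (reducible) and a two-entry cycle (irreducible).
LOOP = {1: [2], 2: [3, 4], 3: [5, 6], 4: [], 5: [7], 6: [7], 7: [2]}
TWO_ENTRY = {1: [2, 3], 2: [3], 3: [2]}
```

In TWO_ENTRY neither 2 nor 3 dominates the other, so no edge is removed and the cycle between them survives.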

34 Early attempts and pitfalls: the swap problem

Cytron et al. (1991): copies in predecessor basic blocks.

  Original code:
    a = ...; b = ...
    loop: n = b; b = a; a = n
  SSA form:
    a1 = ...; b1 = ...
    loop: a2 = φ(a1, a3); b2 = φ(b1, b3); n = b2; b3 = a2; a3 = n
  After copy propagation:
    a1 = ...; b1 = ...
    loop: a2 = φ(a1, b2); b2 = φ(b1, a2)
  φ-functions replaced by copies in the predecessor blocks:
    a1 = ...; b1 = ...; a2 = a1; b2 = b1
    loop: a2 = b2; b2 = a2

Incorrect! Bad understanding of parallel copies: the two copies in the loop must read their sources simultaneously, but executed in sequence, a2 = b2 overwrites the value that b2 = a2 still needs.
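The failure can be reproduced with a few lines of Python. This is only an illustration: the two helper functions and the sample values are assumptions, and each copy is written as a (destination, source) pair.

```python
def run_sequential(copies, env):
    """Execute the copies one after another, as naive φ replacement does."""
    env = dict(env)
    for dst, src in copies:
        env[dst] = env[src]
    return env


def run_parallel(copies, env):
    """Read all sources from the old state first, then write: φ semantics."""
    new = dict(env)
    for dst, src in copies:
        new[dst] = env[src]
    return new


# Copies placed on the back edge for a2 = φ(a1, b2) and b2 = φ(b1, a2).
BACK_EDGE = [("a2", "b2"), ("b2", "a2")]
STATE = {"a2": 1, "b2": 2}
SEQ = run_sequential(BACK_EDGE, STATE)   # both variables end up equal
PAR = run_parallel(BACK_EDGE, STATE)     # the values are swapped
```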

39 Early attempts and pitfalls: the lost copy problem

  Original code:
    x = ...
    loop: y = x; x = x + 1
    after the loop: ... = y
  SSA form:
    x1 = ...
    loop: x2 = φ(x1, x3); y = x2; x3 = x2 + 1
    after the loop: ... = y
  After copy propagation:
    x1 = ...
    loop: x2 = φ(x1, x3); x3 = x2 + 1
    after the loop: ... = x2
  φ-function replaced by copies in the predecessor blocks:
    x1 = ...; x2 = x1
    loop: x3 = x2 + 1; x2 = x3
    after the loop: ... = x2

Cytron et al. (1991): copies in predecessor basic blocks. Incorrect!
  Bad understanding of parallel copies.
  Bad understanding of critical edges and interferences.
Briggs et al. (1998): both problems identified. General correctness unclear.
Sreedhar et al. (1999): correct, but handling of complex branching instructions unclear; interplay with coalescing unclear; virtualization hard to implement.
Many SSA optimizations turned off in gcc and Jikes.
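The lost copy can be checked by running the three variants side by side. The sketch below is an assumption-laden illustration: the loop is modelled as a do-while that runs `iterations` times (at least once), and the third function places the copy only on the back edge, which is what splitting the critical edge achieves.

```python
def original(x, iterations):
    """x = ...; loop { y = x; x = x + 1 }; use y."""
    for _ in range(iterations):
        y = x
        x = x + 1
    return y


def naive_copies(x1, iterations):
    """Copy x2 = x3 at the end of the loop body: also executed on loop exit."""
    x2 = x1
    for _ in range(iterations):
        x3 = x2 + 1
        x2 = x3          # clobbers the x2 that is still live after the loop
    return x2


def copy_on_back_edge(x1, iterations):
    """Copy x2 = x3 only when the back edge is taken (critical edge split)."""
    x2 = x1
    for step in range(iterations):
        x3 = x2 + 1
        if step + 1 < iterations:
            x2 = x3
    return x2
```

With x = 0 and three iterations the original program uses the value 2, the naive translation yields 3, and the back-edge variant yields 2 again.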

47 Going to CSSA (conventional SSA): Sreedhar et al.

Definition (conventional SSA)
  A program is in CSSA if its variables can be renamed, without changing program semantics, so that, for every φ-function a0 = φ(a1, ..., an), the variables a0, ..., an have the same name.

From SSA to CSSA
  For a φ-function a0 = φ(a1, ..., an) whose arguments come from blocks B1, ..., Bn:
    insert the copy a'i = ai at the end of each Bi;
    replace the φ-function by a'0 = φ(a'1, ..., a'n);
    insert the copy a0 = a'0 after the φ-function.

Correctness
  After introduction of the variables a'i and the copies, the code is in CSSA.

Code quality
  Aggressive coalescing can remove useless copies. But it is better to use an accurate notion of interferences.

Liveness of φ is defined by the a'i.
Be careful with potential bugs due to conditional branches that use or define variables.
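The copy insertion for a single φ-function can be sketched as follows. The function name, the tuple encoding, and the `fresh` prefix used to build the new variable names are all assumptions for illustration; a real translation would also have to group the copies of one block into a parallel copy.

```python
def to_cssa(dest, args, preds, fresh):
    """Isolate dest = φ(args) with fresh variables fresh0, fresh1, ..., freshn.

    Returns (pred_copies, new_phi, after_copy) where
      pred_copies is a list of (block, new_var, old_arg) copies appended to
        the end of each predecessor block,
      new_phi is (new_dest, new_args),
      after_copy is (dest, new_dest), placed right after the φ-functions.
    """
    new_vars = [f"{fresh}{i}" for i in range(len(args) + 1)]
    pred_copies = [(block, new_vars[i], arg)
                   for i, (block, arg) in enumerate(zip(preds, args), start=1)]
    new_phi = (new_vars[0], new_vars[1:])
    after_copy = (dest, new_vars[0])
    return pred_copies, new_phi, after_copy


# a2 = φ(a1, b2) with arguments coming from hypothetical blocks B1 and B2.
COPIES, PHI, AFTER = to_cssa("a2", ["a1", "b2"], ["B1", "B2"], "u")
```

Every φ-operand now has a live range that starts and ends next to the φ-function, which is what makes renaming u0, u1, u2 to one name safe.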

50 Coalesced example: the swap problem

  SSA code (after copy propagation):
    a1 = ...; b1 = ...
    loop: a2 = φ(a1, b2); b2 = φ(b1, a2)
  After copy insertion, with parallel copies:
    a1 = ...; b1 = ...; (u1, v1) = (a1, b1)
    loop: u0 = φ(u1, u2); v0 = φ(v1, v2); (a2, b2) = (u0, v0); (u2, v2) = (b2, a2)
  Coalescing: u = (u0, u1, u2) and v = (v0, v1, v2) each become one variable.
  [Figure: graph relating a1, u, a2 and b1, v, b2]
  After coalescing, only the parallel copy (u2, v2) = (b2, a2) remains. Sequentialized, it gives:
    a1 = ...; b1 = ...
    loop: n = b1; b1 = a1; a1 = n

55 Coalesced example: the lost copy problem

  SSA code (after copy propagation):
    x1 = ...
    loop: x2 = φ(x1, x3); x3 = x2 + 1
    after the loop: ... = x2
  After copy insertion:
    x1 = ...; u1 = x1
    loop: u0 = φ(u1, u2); x2 = u0; x3 = x2 + 1; u2 = x3
    after the loop: ... = x2
  Coalescing: u = (u0, u1, u2) becomes one variable.
  [Figure: graph relating x1, u, x3 and x2]
  After coalescing:
    x1 = ...
    loop: x2 = x1; x1 = x2 + 1
    after the loop: ... = x2

59 Outline 1 Control-flow graph Loop-nesting forest Static single assignment 2 3 Dominance, liveness, interferences, and chordal graphs Construction of liveness sets in reducible CFGs for strict SSA Extensions to irreducible CFGs and for checking liveness

60 Exploiting SSA: value-based interferences

Definition (Chaitin interference)
  Two variables interfere if one is live at the definition of the other, which is not a copy of the first.
  Need to update the interference graph after coalescing.

Unique value V of an SSA variable
  For a copy b = a, V(b) = V(a) (traversal of the dominance tree).

Value-based interference
  a and b interfere if V(a) ≠ V(b) and Live-range(a) ∩ Live-range(b) ≠ ∅.

Example: b = ...; c = ...; a = φ(b, c); d = φ(b, a)
[Figure: the example after copy insertion and after coalescing, with the interference graph over a, b, c, d at each step; not recoverable from the extraction]
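The value-based test can be sketched as below. The encoding is an assumption: definitions are listed in an order compatible with the dominance tree as (variable, copied_from) pairs, with None when the definition is not a copy, and live ranges are given as plain sets of program points.

```python
def values(definitions):
    """V(b) = V(a) for a copy b = a; otherwise a variable is its own value."""
    v = {}
    for var, copied_from in definitions:
        v[var] = v[copied_from] if copied_from is not None else var
    return v


def interfere(a, b, v, live):
    """Value-based interference: different values and intersecting live ranges."""
    return v[a] != v[b] and bool(live[a] & live[b])


# Hypothetical program: a = ...; b = a; c = ...
DEFS = [("a", None), ("b", "a"), ("c", None)]
LIVE = {"a": {1, 2, 3}, "b": {2, 3}, "c": {3, 4}}
V = values(DEFS)
```

Here a and b overlap but carry the same value, so they can share a register; a and c overlap with different values, so they cannot.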

66 Using parallel copies instead of sequential copies

Parallel copy semantics
  In (a1, ..., an) = (b1, ..., bn), all copies ai = bi are simultaneous.
  Fewer interferences than with sequential copies.
  Easier insertion & liveness updates.
  But need to sequentialize.

Particular copy structure
  Directed graph with edges bi → ai.
  Directed trees with roots = circuits.
  Insert copies for the leaves first.
  Simple circuit: one more copy.

Example: (a, b, c, d) = (c, a, b, c)
  Copy the leaf d first: d = c, which leaves (a, b, c) = (d, a, b).
  Sequential code: d = c; c = b; b = a; a = d

71 Algorithm 1: Parallel copy sequentialization algorithm

  Data: Set P of parallel copies of the form a ↦ b, a ≠ b, one extra fresh variable n
  Output: List of copies in sequential order

  ready ← []; to_do ← []; pred(n) ← ⊥
  for all (a ↦ b) ∈ P do
    loc(b) ← ⊥; pred(a) ← ⊥                        /* initialization */
  for all (a ↦ b) ∈ P do
    loc(a) ← a; pred(b) ← a; to_do.push(b)          /* copy into b to be done */
  for all (a ↦ b) ∈ P do
    if loc(b) = ⊥ then ready.push(b)                /* b is not used and can be overwritten */
  while to_do ≠ [] do
    while ready ≠ [] do
      b ← ready.pop(); a ← pred(b)                  /* pick a free location */
      c ← loc(a); emit_copy(c ↦ b); loc(a) ← b      /* generate the copy */
      if a = c and pred(a) ≠ ⊥ then ready.push(a)   /* first time copied */
    b ← to_do.pop()                                 /* look for remaining copy */
    if b = loc(b) then
      emit_copy(b ↦ n); loc(b) ← n; ready.push(b)   /* break circuit */
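A direct Python transcription of Algorithm 1 might look like the sketch below. The encoding is an assumption: each copy a ↦ b is a (source, destination) pair, every destination is written at most once, ⊥ is represented by None, and the small `run` interpreter exists only to check the result.

```python
def sequentialize(parallel_copies, fresh):
    """Turn a parallel copy into an ordered list of (source, destination) copies."""
    loc, pred = {}, {fresh: None}
    ready, to_do, out = [], [], []
    for a, b in parallel_copies:
        loc[b] = None
        pred[a] = None
    for a, b in parallel_copies:
        loc[a] = a                # the value of a is still in a
        pred[b] = a               # b must receive the value of a
        to_do.append(b)
    for a, b in parallel_copies:
        if loc[b] is None:        # b's old value is not needed: free location
            ready.append(b)
    while to_do:
        while ready:
            b = ready.pop()
            a = pred[b]
            c = loc[a]
            out.append((c, b))
            loc[a] = b
            if a == c and pred[a] is not None:
                ready.append(a)   # a was just saved elsewhere, it can be overwritten
        b = to_do.pop()
        if b == loc[b]:           # b sits on a circuit that is still untouched
            out.append((b, fresh))
            loc[b] = fresh
            ready.append(b)
    return out


def run(copies, env):
    """Execute (source, destination) copies one after another."""
    env = dict(env)
    for src, dst in copies:
        env[dst] = env[src]
    return env


# (a, b, c, d) = (c, a, b, c), and a plain swap that needs the fresh variable n.
TREE = sequentialize([("c", "a"), ("a", "b"), ("b", "c"), ("c", "d")], "n")
SWAP = sequentialize([("a", "b"), ("b", "a")], "n")
```

For the first example the result is d = c; c = b; b = a; a = d with no use of n, because the leaf d provides the free location. The swap has no leaf, so one extra copy into n breaks the circuit.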

72 Qualitative experiments with SPEC CINT2000

Key points of the out-of-SSA translation
  Copy insertion (to go to CSSA and to handle register renaming constraints) followed by coalescing.
  Value-based interferences: coalescing is improved and independent of virtualization (i.e., as in Sreedhar III).
  Parallel copies followed by sequentialization.

[Chart: one group per benchmark (gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 253.perlbmk, 254.gap, 255.vortex, 256.bzip2, 300.twolf, sum); legend as extracted: Intersection, Sreedhar I, Chaitin, Value, Sreedhar III, Value IS, Sharing; values not recoverable from the extraction]

73 Bug tracking: RVM-254 of Jikes RVM

Problems with SSA form: lack of loop unrolling breaks VM

This problem is probably one of the most serious in the RVM currently. When loop unrolling is disabled and SSA enabled the created IR is corrupt. The error has in the past look like we were suffering from the lost copy problem, but implementing a naive solution to this didn't solve the problem. Their is sound logic behind the code so we need to identify a small test case where things are broken and then reason about what's wrong in leave SSA. This has been attempted once (with the code that removes an element from the live set) but the problem no longer appears to surface here. Currently these optimizations are disabled but by RVM 3.0 they should be re-enable and this bug cured.

74 Potential bugs with conditional branches

  Initial code:
    u = ...; v = ...
    B2: Br(u, B3, B4)
    below the branch (blocks B3, B4): w = φ(u, v); ... = w
  Blind Sreedhar III:
    u = ...; v = ...
    B2: v' = v; Br(u, B3, B4)
    below the branch (blocks B3, B4): w = φ(u, v'); ... = w
  Wrong output code:
    w = ...; v = ...
    B2: w = v; Br(w, B3, B4)
    below the branch (blocks B3, B4): ... = w

75 Unfeasible out-of-SSA translation example

[Figure: three code panels, partly lost in the extraction; "..." marks lost operands]
  Initial code:
    u1 = φ(u0, u2); u2 = u1 ... 1; t0 = u2; Br(u2, ..., B2)
    B2: t1 = φ(t0, t2); t2 = t...; Br(t2, ..., B2)
    B3: ... = u2
  After optimization:
    Br_dec(u, ..., B2)
    B2: t1 = φ(u, t2); t2 = t...; Br(t2, ..., B2)
    B3: ... = u
  Needs edge splitting:
    Br_dec(u, ..., B2); t0 = u
    B2: t1 = φ(t0, t2); t2 = t...; Br(t2, ..., B2)
    B3: ... = u
