Static single assignment

Size: px
Start display at page:

Download "Static single assignment"

Transcription

1 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses.

2 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. a =... b =... n = b b = a a = n

3 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. Conversion into SSA Need to introduce φ-functions at the (iterated dominance frontier. a =... b =... n = b b = a a = n

4 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. Conversion into SSA Need to introduce φ-functions at the (iterated dominance frontier. a 1 =... b 1 =... a 2 = φ(a 1, a 3 b 2 = φ(b 1, b 3 n = b 2 b 3 = a 2 a 3 = n

5 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. Conversion into SSA Need to introduce φ-functions at the (iterated dominance frontier. Interests of SSA Link uses/definitions explicit. Code optimizations: efficient, easy-to-implement, fast. More accurate program analysis. a 1 =... b 1 =... a 2 = φ(a 1, a 3 b 2 = φ(b 1, b 3 n = b 2 b 3 = a 2 a 3 = n

6 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. Conversion into SSA Need to introduce φ-functions at the (iterated dominance frontier. Interests of SSA Link uses/definitions explicit. Code optimizations: efficient, easy-to-implement, fast. More accurate program analysis. a 1 =... b 1 =... a 2 = φ(a 1, b 2 b 2 = φ(b 1, a 2

7 Control-flow graph Loop-nesting forest Static single assignment Dominance frontier (elementary algorithm Dominance can be computed by fixed-point iteration: D(r = {r} and D(n = {n} D[p] p pred[n] Many other more efficient algorithms are possible. Then: procedure computedf(n S := {} for each node y in succ[n] do if (idom(y n then S := S {y} /* successor of n not strictly dominated by n */ endfor for each child c of n in the dominator tree do computedf(c for each element w of DF [c] do if (n does not dominate w then S := S {w} endfor endfor DF [n] := S

8 Example What is the dominance frontier of node 5? CMPUT Compiler Design 129and Optimizatio

9 Example First we must find all nodes that node 5 strictly dominates. CMPUT Compiler Design 130and Optimizatio

10 Example A node w is in the dominance frontier of node 5 if 5 dominates a predecessor of w, but 5 does not strictly dominates w itself. What is the dominance frontier of 5? CMPUT Compiler Design 131and Optimizatio

11 Example DF(5 = {4, 5, 12, 13} A node w is in the dominance frontier of node 5 if 5 dominates a predecessor of w, but 5 does not strictly dominates w itself. What is the dominance frontier of 5? CMPUT Compiler Design 132and Optimizatio

12 Placement of φ-functions Control-flow graph Loop-nesting forest Static single assignment procedure Place-φ-functions(G, DF, D /* D[n] is the set of variables defined in n */ for each node n in G do for each variable a in D[n] do defsites[a] := defsites[a] {n} endfor endfor for each variable a do W := defsites[a] while (W not empty do remove some node n from W for each Y in DF [n] do if (Y / D φ [n] then insert statement a = φ(a,..., a at the top of Y D φ [n] := D φ [n] {Y } if (Y / D[n] then W := W {Y } endif endfor endwhile endfor

13 Renaming variables Control-flow graph Loop-nesting forest Static single assignment procedure Rename(n for each statement S in block n do if (S is not a φ-function then for each use of some variable x in S do i := top(stack[x]; replace the use of x with x i in S endfor endif for each definition of some variable a in S Count[a] + +; i := Count[a]; push i onto Stack[a]; replace definition with a i endfor endfor for each successor Y of block n and each φ-function in Y do i := top(stack[a] where a is the argument coming from n; replace it with a i endfor for each child (in the dominance tree X of n do Rename(X for each definition of some variable a (in the original code do pop Stack[a] procedure RenameAll(G for each variable a do Count[a] := 0; Stack[a] := {}; push 0 onto Stack[a] Rename(r /* root of the dominance tree */

14 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j i k k+1 B2 B1 if j<20 i 1 j 1 k 0 if k<100 j k k k+2 return j B6 B4 B7 CMPUT Compiler Design 144and Optimizatio

15 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j i k k+1 B2 B1 if j<20 i 1 j 1 k 0 if k<100 j k k k+2 return j B6 B4 B7 j φ(j,j k φ(k,k CMPUT Compiler Design 145and Optimizatio

16 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j i k k+1 B2 B1 if j<20 i 1 j 1 k 0 j φ(j,j k φ(k,k if k<100 j k k k+2 return j B6 B4 B7 j φ(j,j k φ(k,k CMPUT Compiler Design 146and Optimizatio

17 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j i 0 k k+1 B2 B1 if j<20 i 0 1 j 0 1 k 0 0 j φ(j 0,j k φ(k 0,k if k<100 j k k k+2 return j B6 B4 B7 j φ(j,j k φ(k,k CMPUT Compiler Design 147and Optimizatio

18 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j 2 i 0 k 2 +1 B2 B1 if j 1 <20 i 0 1 j 0 1 k 0 0 j 1 φ(j 0,j k 1 φ(k 0,k if k 1 <100 j k k k+2 return j 1 B6 B4 B7 j φ(j 2,j k φ(k 2,k CMPUT Compiler Design 148and Optimizatio

19 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j 2 i 0 k 2 +1 B2 B1 if j 1 <20 i 0 1 j 0 1 k 0 0 j 1 φ(j 0,j 3 k 1 φ(k 0,k 3 if k 1 <100 j k k k+2 return j 1 B6 B4 B7 j 3 φ(j 2,j k 3 φ(k 2,k CMPUT Compiler Design 149and Optimizatio

20 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j 2 i 0 k 2 +1 B2 B1 if j 1 <20 i 0 1 j 0 1 k 0 0 j 1 φ(j 0,j 3 k 1 φ(k 0,k 3 if k 1 <100 return j 1 j 4 k 4 +2 B6 B4 B7 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 CMPUT Compiler Design 150and Optimizatio

21 Example: Constant Propagation B1 i 0 1 j 0 1 k 0 0 B1 i 0 1 j 0 1 k 0 0 B2 j 1 φ(j 0,j 3 k 1 φ(k 0,k 3 if k 1 <100 B2 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 B5 B3 j 2 i 0 k 2 +1 if j 1 <20 return j 1 j 4 k 4 +2 B6 B4 B5 B3 j 2 1 k 2 +1 if j 1 <20 return j 1 j 4 k 4 +2 B6 B4 B7 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 B7 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 CMPUT Compiler Design 151and Optimizatio

22 Example: Dead-code Elimination B5 B3 j 2 1 k 2 +1 B7 B2 B1 if j 1 <20 i 0 1 j 0 1 k 0 0 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 j 4 k 4 +2 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 B6 B4 B5 B3 j 2 1 k 2 +1 B7 B2 if j 1 <20 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 j 4 k 4 +2 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 B6 B4 CMPUT Compiler Design 152and Optimizatio

23 Constant Propagation and Dead Code Elimination B5 B3 j 2 1 k 2 +1 B2 if j 1 <20 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 j 4 k 4 +2 B6 B4 B3 j 2 1 k 2 +1 B2 if j 1 <20 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 j 4 k 4 +2 B6 B4 B7 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 B7 j 3 φ(1,j 4 k 3 φ(k 2,k 4 CMPUT Compiler Design 153and Optimizatio

24 Example: Is this the end? B3 k 2 +1 j 4 k 4 +2 B7 B2 if j 1 <20 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 j 3 φ(1,j 4 k 3 φ(k 2,k 4 return j 1 B6 B4 But block 6 is never executed! How can we find this out, and simplify the program? SSA conditional constant propagation finds the least fixed point for the program and allows further elimination of dead code. See algorithm in Tiger book. CMPUT Compiler Design 154and Optimizatio

25 Example: Dead code elimination B3 B2 if j 1 <20 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 B4 B2 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 B4 k 2 +1 j 4 k 4 +2 B6 k 2 +1 B7 j 3 φ(1,j 4 k 3 φ(k 2,k 4 B7 j 3 φ(1 k 3 φ(k 2 CMPUT Compiler Design 155and Optimizatio

26 Example: Single Argument φ-function Elimination B2 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 B2 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 k 2 +1 return j 1 B4 k 2 +1 return j 1 B4 B7 j 3 φ(1 k 3 φ(k 2 B7 j 3 1 k 3 k 2 CMPUT Compiler Design 156and Optimizatio

27 Example: Constant and Copy Propagation B2 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 B2 j 1 φ(1,1 k 1 φ(0,k 2 if k 1 <100 k 2 +1 return j 1 B4 k 2 +1 return j 1 B4 B7 j 3 1 k 3 k 2 B7 j 3 1 k 3 k 2 CMPUT Compiler Design 157and Optimizatio

28 Example: Dead Code Elimination B2 j 1 φ(1,1 k 1 φ(0,k 2 if k 1 <100 B2 j 1 φ(1,1 k 1 φ(0,k 2 if k 1 <100 k 2 +1 return j 1 B4 k 2 +1 return j 1 B4 B7 j 3 1 k 3 k 2 CMPUT Compiler Design 158and Optimizatio

29 Example: φ-function Simplification B2 j 1 φ(1,1 k 1 φ(0,k 2 if k 1 <100 B2 j 1 1 k 1 φ(0,k 2 if k 1 <100 k 2 +1 return j 1 B4 k 2 +1 return j 1 B4 CMPUT Compiler Design 159and Optimizatio

30 Example: Constant Propagation B2 j 1 1 k 1 φ(0,k 2 if k 1 <100 B2 j 1 1 k 1 φ(0,k 2 if k 1 <100 k 2 +1 return j 1 B4 k 2 +1 return 1 B4 CMPUT Compiler Design 160and Optimizatio

31 Example: Dead Code Elimination B2 j 1 1 k 1 φ(0,k 2 if k 1 <100 return 1 B4 return 1 B4 k 2 +1 CMPUT Compiler Design 161and Optimizatio

32 More readings Control-flow graph Loop-nesting forest Static single assignment References Cytron, Ferrante, Rosen, Wegman, Zadek. Efficiently computing static single assignment form and the control dependence graph, ACM Transactions on Programming Languages and Systems, 13(4: , Ramalingam. On loops, dominators, and dominance frontiers. ACM Transactions on Programming Languages and Systems, 24(5: , Recent advances in SSA SSA-based compilers & JIT compilation. Register allocation, out-of-ssa conversion, liveness analysis. SSA extensions: SSI, gated SSA, psi-ssa, value state dependence graph, array SSA, safetsa, etc.

33 Links between the different notions Control-flow graph Loop-nesting forest Static single assignment A few important results: If S contains the entry node, J(S = J + (S = DF + (S. G is reducible iff simplifiable by the rules T 1 and T 2. iff each SCC has a unique entry node. iff removing all (u, v where v dominates u makes G acyclic.... Dominators and iterated dominance frontiers can be computed quickly from loop-nesting forest, especially if G is reducible. Conversely, DJ-graphs can be used to build loop forests. Advanced algorithms use Tarjan s union-find with almost-linear complexity (see Ramalingam, Sreedhar, Havlak, Steensgaard.

34 Early attempts and pitfalls Swap problem Cytron et al. (1991: copies in predecessor basic blocks. a =... b =... n = b b = a a = n

35 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Swap problem a 1 =... b 1 =... a 2 = φ(a 1, a 3 b 2 = φ(b 1, b 3 n = b 2 b 3 = a 2 a 3 = n

36 Early attempts and pitfalls Swap problem Cytron et al. (1991: copies in predecessor basic blocks. a 1 =... b 1 =... a 2 = φ(a 1, b 2 b 2 = φ(b 1, a 2

37 Early attempts and pitfalls Swap problem Cytron et al. (1991: copies in predecessor basic blocks. a 1 =... b 1 =... a 2 = a 1 b 2 = b 1 a 2 = φ(a 1, b 2 b 2 = φ(b 1, a 2 a 2 = b 2 b 2 = a 2

38 Early attempts and pitfalls Swap problem Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies. a 1 =... b 1 =... a 2 = a 1 b 2 = b 1 a 2 = b 2 b 2 = a 2

39 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies. Lost copy problem x =... y = x x = x + 1 y

40 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies. Lost copy problem x 1 =... x 2 = φ(x 1, x 3 y = x 2 x 3 = x y

41 x 2 = φ(x 1, x 3 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies. Lost copy problem x 1 =... x 3 = x x 2

42 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies. Lost copy problem x 1 =... x 2 = x 1 x 2 = φ(x 1, x 3 x 3 = x x 2 = x 3 x 2

43 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies; Bad understanding of critical edges and interferences. Lost copy problem x 1 =... x 2 = x 1 x 3 = x x 2 = x 3 x 2

44 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies; Bad understanding of critical edges and interferences. Briggs et al. (1998: both problems identified. General correctness unclear. Lost copy problem x 1 =... x 2 = x 1 x 3 = x x 2 = x 3 x 2

45 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies; Bad understanding of critical edges and interferences. Briggs et al. (1998: both problems identified. General correctness unclear. Sreedhar et al. (1999: correct but handling of complex branching instructions unclear; interplay with coalescing unclear; virtualization hard to implement. Lost copy problem x 2 x 1 =... x 2 = x 1 x 3 = x x 2 = x 3

46 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies; Bad understanding of critical edges and interferences. Briggs et al. (1998: both problems identified. General correctness unclear. Sreedhar et al. (1999: correct but handling of complex branching instructions unclear; interplay with coalescing unclear; virtualization hard to implement. Lost copy problem x 2 x 1 =... x 2 = x 1 x 3 = x x 2 = x 3 Many SSA optimizations turned off in gcc and Jikes.

47 Going to CSSA (conventional SSA: Sreedhar et al. Definition (conventional SSA CSSA: if variables can be renamed, without changing program semantics, so that, for all φ-function a 0 = φ(a 1,..., a n, a 0,..., a n have the same name. From SSA to CSSA B i B n a 0 = φ(a 1,..., a n

48 Going to CSSA (conventional SSA: Sreedhar et al. Definition (conventional SSA CSSA: if variables can be renamed, without changing program semantics, so that, for all φ-function a 0 = φ(a 1,..., a n, a 0,..., a n have the same name. From SSA to CSSA B i B n a 1 = a 1 a i = a i a n = a n Correctness After introduction of variables a i and copies, the code is in CSSA. a 0 = φ(a 1,..., a n a 0 = a 0

49 Going to CSSA (conventional SSA: Sreedhar et al. Definition (conventional SSA CSSA: if variables can be renamed, without changing program semantics, so that, for all φ-function a 0 = φ(a 1,..., a n, a 0,..., a n have the same name. Correctness After introduction of variables a i and copies, the code is in CSSA. Code quality Aggressive coalescing can remove useless copies. But better use accurate notion of interferences. From SSA to CSSA B i a 0 = φ(a 1,..., a n a 0 = a 0 B n a 1 = a 1 a i = a i a n = a n Liveness of φ defined by the a i. Be careful with potential bugs due to conditional branches that use or define variables.

50 Coalesced example: the swap problem a 1 =... b 1 =... a 1 =... b 1 =... (u 1, v 1 = (a 1, b 1 a 2 = φ(a 1, b 2 b 2 = φ(b 1, a 2 u 0 = φ(u 1, u 2 v 0 = φ(v 1, v 2 (a 2, b 2 = (u 0, v 0 (u 2, v 2 = (b 2, a 2

51 Coalesced example: the swap problem a 1 u = (u 0, u 1, u 2 a 2 a 1 =... b 1 =... (u 1, v 1 = (a 1, b 1 u 0 = φ(u 1, u 2 v 0 = φ(v 1, v 2 (a 2, b 2 = (u 0, v 0 b 1 v = (v 0, v 1, v 2 b 2 (u 2, v 2 = (b 2, a 2

52 Coalesced example: the swap problem a 1 u = (u 0, u 1, u 2 a 2 a 1 =... b 1 =... (u 1, v 1 = (a 1, b 1 u 0 = φ(u 1, u 2 v 0 = φ(v 1, v 2 (a 2, b 2 = (u 0, v 0 b 1 v = (v 0, v 1, v 2 b 2 (u 2, v 2 = (b 2, a 2

53 Coalesced example: the swap problem a 1 =... b 1 =... a 1 =... b 1 =... (u 1, v 1 = (a 1, b 1 u 0 = φ(u 1, u 2 v 0 = φ(v 1, v 2 (a 2, b 2 = (u 0, v 0 (u 2, v 2 = (b 2, a 2 (u 2, v 2 = (b 2, a 2

54 Coalesced example: the swap problem a 1 =... b 1 =... a 1 =... b 1 =... (u 2, v 2 = (b 2, a 2 n = b 1 b 1 = a 1 a 1 = n

55 Coalesced example: the lost copy problem x 1 =... x 1 =... u 1 = x 1 x 2 = φ(x 1, x 3 u 0 = φ(u 1, u 2 x 3 = x x 2 = u 0 x 3 = x u 2 = x 3 x 2 x 2

56 Coalesced example: the lost copy problem x 1 =... u 1 = x 1 x 1 u = (u 0, u 1, u 2 x 3 u 0 = φ(u 1, u 2 x 2 x 2 = u 0 x 3 = x u 2 = x 3 x 2

57 Coalesced example: the lost copy problem x 1 =... u 1 = x 1 x 1 u = (u 0, u 1, u 2 x 3 u 0 = φ(u 1, u 2 x 2 x 2 = u 0 x 3 = x u 2 = x 3 x 2

58 Coalesced example: the lost copy problem x 1 =... x 1 =... u 1 = x 1 u 0 = φ(u 1, u 2 x 2 = x 1 x 1 = x x 2 = u 0 x 3 = x u 2 = x 3 x 2 x 2

59 Outline 1 Control-flow graph Loop-nesting forest Static single assignment 2 3 Dominance, liveness, interferences, and chordal graphs Construction of liveness sets in reducible CFGs for strict SSA Extensions to irreducible CFGs and for checking liveness

60 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. b =... c =... a = φ(b, c d = φ(b, a

61 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. b =... c =... a = b a = c d = b a = a d = a d = d d b a c d a

62 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. b =... c =... a = b a = c d = b a = a d = a d = d d b a c d a

63 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. Need to update interference graph after coalescing. b =... c =... c = b b = c b b c d a d a

64 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. b =... c =... a = b a = c d = b Need to update interference graph after coalescing. Unique value V of a SSA variable a = a d = a For a copy b = a, V (b = V (a (traversal of dominance tree. Value-based interference a and b interfere if V (a V (b and Live-range(a Live-range(b. d = d d d b a a c

65 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. b =... b =... Need to update interference graph after coalescing. Unique value V of a SSA variable For a copy b = a, V (b = V (a (traversal of dominance tree. b Value-based interference a and b interfere if V (a V (b and Live-range(a Live-range(b. d d b a a c

66 Using parallel copies instead of sequential copies Parallel copy semantics In (a 1,..., a n = (b 1,..., b n, all copies a i = b i are simultaneous. Fewer interferences than with sequential copies. Easier insertion & liveness updates. But need to sequentialize.

67 Using parallel copies instead of sequential copies Parallel copy semantics In (a 1,..., a n = (b 1,..., b n, all copies a i = b i are simultaneous. Fewer interferences than with sequential copies. Easier insertion & liveness updates. But need to sequentialize. Particular copy structure Directed graph with edges b i a i. Directed trees with roots=circuits. Insert copies for the leaves first. a d (a, b, c, d = (c, a, b, c b c

68 Using parallel copies instead of sequential copies Parallel copy semantics In (a 1,..., a n = (b 1,..., b n, all copies a i = b i are simultaneous. Fewer interferences than with sequential copies. Easier insertion & liveness updates. But need to sequentialize. Particular copy structure Directed graph with edges b i a i. Directed trees with roots=circuits. Insert copies for the leaves first. a d d = c (a, b, c = (d, a, b b c

69 Using parallel copies instead of sequential copies Parallel copy semantics In (a 1,..., a n = (b 1,..., b n, all copies a i = b i are simultaneous. Fewer interferences than with sequential copies. Easier insertion & liveness updates. But need to sequentialize. Particular copy structure Directed graph with edges b i a i. Directed trees with roots=circuits. Insert copies for the leaves first. a d d = c c = b b = a a = d b c

70 Using parallel copies instead of sequential copies Parallel copy semantics In (a 1,..., a n = (b 1,..., b n, all copies a i = b i are simultaneous. Fewer interferences than with sequential copies. Easier insertion & liveness updates. But need to sequentialize. Particular copy structure Directed graph with edges b i a i. Directed trees with roots=circuits. Insert copies for the leaves first. Simple circuit: one more copy. a d d = c c = b b = a a = d b c

71 Algorithm 1: Parallel copy sequentialization algorithm Data: Set P of parallel copies a b, a b, one extra fresh variable n Output: List of copies in sequential order 1 ready [] ; to do [] ; pred(n ; 2 forall the (a b P do 3 loc(b ; pred(a ; /* initialization */ 4 forall the (a b P do 5 loc(a a ; pred(b a ; to do.push(b ; /* copy into b to be done */ 6 forall the (a b P do 7 if loc(b = then ready.push(b ; /* b is not used and can be overwritten */ 8 while to do [] do 9 while ready [] do 10 b ready.pop( ; a pred(b ; /* pick a free location */ 11 c loc(a ; emit copy(c b ; loc(a b ; /* generate the copy */ 12 if a = c and pred(a then ready.push(a ; /* first time copied */ 13 b to do.pop( ; /* look for remaining copy */ 14 if b = loc(b then 15 emit copy(b n ; loc(b n ; ready.push(b ; /* break circuit */

72 Qualitative experiments with SPEC CINT2000 Key points of the out-of-ssa translation Copy insertion (to go to CSSA and to handle register renaming constraints followed by coalescing. Value-based interferences coalescing is improved and independent of virtualization (i.e., as in Sreedhar III. Parallel copies followed by sequentialization Intersection Sreedhar I Chaitin Value Sreedhar III Value IS Sharing gzip 175.vpr 176.gcc 181.mcf 186.crafty 197.parser 253.perlbmk 254.gap 255.vortex 256.bzip2 300.twolf sum

73 Bug tracking RVM-254 of Jikes RVM Dominance, liveness, interferences, and chordal graphs Construction of liveness sets in reducible CFGs for strict SSA Extensions to irreducible CFGs and for checking liveness Problems with SSA form: lack of loop unrolling breaks VM This problem is probably one of the most serious in the RVM currently. When loop unrolling is disabled and SSA enabled the created IR is corrupt. The error has in the past look like we were suffering from the lost copy problem, but implementing a naive solution to this didn t solve the problem. Their is sound logic behind the code so we need to identify a small test case where things are broken and then reason about what s wrong in leave SSA. This has been attempted once (with the code that removes an element from the live set but the problem no longer appears to surface here. Currently these optimizations are disabled but by RVM 3.0 they should be re-enable and this bug cured.

74 Potential bugs with conditional branches Dominance, liveness, interferences, and chordal graphs Construction of liveness sets in reducible CFGs for strict SSA Extensions to irreducible CFGs and for checking liveness u =... v =... u =... v =... w =... v =... B 2 Br(u, B 3, B 4 B 2 v = v Br(u, B 3, B 4 B 2 w = v Br(w, B 3, B 4 B 3 B 4 B 3 B 4 B 3 B 4 w = φ(u, v... = w Initial code w = φ(u, v... = w Blind Sreedhar III... = w Wrong output code

75 Unfeasible out-of-ssa translation example Dominance, liveness, interferences, and chordal graphs Construction of liveness sets in reducible CFGs for strict SSA Extensions to irreducible CFGs and for checking liveness u 0 u u u 1 = φ(u 0, u 2 u 2 = u 1 1 t 0 = u 2 Br(u 2,, B 2 Br dec(u,, B 2 Br dec(u,, B 2 t 0 = u B 2 t 1 = φ(t 0, t 2 t 2 = t Br(t 2,, B 2 B 2 t 1 = φ(u, t 2 t 2 = t Br(t 2,, B 2 B 2 t 1 = φ(t 0, t 2 t 2 = t Br(t 2,, B 2 B 3... = u 2 B 3... = u B 3... = u Initial code After optimization Needs edge splitting

Compiler Construction 2009/2010 SSA Static Single Assignment Form

Compiler Construction 2009/2010 SSA Static Single Assignment Form Compiler Construction 2009/2010 SSA Static Single Assignment Form Peter Thiemann March 15, 2010 Outline 1 Static Single-Assignment Form 2 Converting to SSA Form 3 Optimization Algorithms Using SSA 4 Dependencies

More information

Towards an SSA based compiler back-end: some interesting properties of SSA and its extensions

Towards an SSA based compiler back-end: some interesting properties of SSA and its extensions Towards an SSA based compiler back-end: some interesting properties of SSA and its extensions Benoit Boissinot Équipe Compsys Laboratoire de l Informatique du Parallélisme (LIP) École normale supérieure

More information

Advanced Compilers CMPSCI 710 Spring 2003 Computing SSA

Advanced Compilers CMPSCI 710 Spring 2003 Computing SSA Advanced Compilers CMPSCI 70 Spring 00 Computing SSA Emery Berger University of Massachusetts, Amherst More SSA Last time dominance SSA form Today Computing SSA form Criteria for Inserting f Functions

More information

Simple and Efficient Construction of Static Single Assignment Form

Simple and Efficient Construction of Static Single Assignment Form Simple and Efficient Construction of Static Single Assignment Form saarland university Matthias Braun, Sebastian Buchwald, Sebastian Hack, Roland Leißa, Christoph Mallon and Andreas Zwinkau computer science

More information

Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency

Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency Benoit Boissinot, Alain Darte, Fabrice Rastello, Benoît Dupont de Dinechin, Christophe Guillon To cite this version: Benoit

More information

Computing Static Single Assignment (SSA) Form. Control Flow Analysis. Overview. Last Time. Constant propagation Dominator relationships

Computing Static Single Assignment (SSA) Form. Control Flow Analysis. Overview. Last Time. Constant propagation Dominator relationships Control Flow Analysis Last Time Constant propagation Dominator relationships Today Static Single Assignment (SSA) - a sparse program representation for data flow Dominance Frontier Computing Static Single

More information

8. Static Single Assignment Form. Marcus Denker

8. Static Single Assignment Form. Marcus Denker 8. Static Single Assignment Form Marcus Denker Roadmap > Static Single Assignment Form (SSA) > Converting to SSA Form > Examples > Transforming out of SSA 2 Static Single Assignment Form > Goal: simplify

More information

What If. Static Single Assignment Form. Phi Functions. What does φ(v i,v j ) Mean?

What If. Static Single Assignment Form. Phi Functions. What does φ(v i,v j ) Mean? Static Single Assignment Form What If Many of the complexities of optimization and code generation arise from the fact that a given variable may be assigned to in many different places. Thus reaching definition

More information

Static Single Assignment (SSA) Form

Static Single Assignment (SSA) Form Static Single Assignment (SSA) Form A sparse program representation for data-flow. CSL862 SSA form 1 Computing Static Single Assignment (SSA) Form Overview: What is SSA? Advantages of SSA over use-def

More information

6. Intermediate Representation

6. Intermediate Representation 6. Intermediate Representation Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/

More information

6. Intermediate Representation!

6. Intermediate Representation! 6. Intermediate Representation! Prof. O. Nierstrasz! Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes.! http://www.cs.ucla.edu/~palsberg/!

More information

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection An Overview of GCC Architecture (source: wikipedia) CS553 Lecture Control-Flow, Dominators, Loop Detection, and SSA Control-Flow Analysis and Loop Detection Last time Lattice-theoretic framework for data-flow

More information

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1 CSE P 0 Compilers SSA Hal Perkins Spring 0 UW CSE P 0 Spring 0 V- Agenda Overview of SSA IR Constructing SSA graphs Sample of SSA-based optimizations Converting back from SSA form Sources: Appel ch., also

More information

A Practical and Fast Iterative Algorithm for φ-function Computation Using DJ Graphs

A Practical and Fast Iterative Algorithm for φ-function Computation Using DJ Graphs A Practical and Fast Iterative Algorithm for φ-function Computation Using DJ Graphs Dibyendu Das U. Ramakrishna ACM Transactions on Programming Languages and Systems May 2005 Humayun Zafar Outline Introduction

More information

Intermediate representation

Intermediate representation Intermediate representation Goals: encode knowledge about the program facilitate analysis facilitate retargeting facilitate optimization scanning parsing HIR semantic analysis HIR intermediate code gen.

More information

SSA-Form Register Allocation

SSA-Form Register Allocation SSA-Form Register Allocation Foundations Sebastian Hack Compiler Construction Course Winter Term 2009/2010 saarland university computer science 2 Overview 1 Graph Theory Perfect Graphs Chordal Graphs 2

More information

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations Register Allocation Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class

More information

Example. Example. Constant Propagation in SSA

Example. Example. Constant Propagation in SSA Example x=1 a=x x=2 b=x x=1 x==10 c=x x++ print x Original CFG x 1 =1 a=x 1 x 2 =2 x 3 =φ (x 1,x 2 ) b=x 3 x 4 =1 x 5 = φ(x 4,x 6 ) x 5 ==10 c=x 5 x 6 =x 5 +1 print x 5 CFG in SSA Form In SSA form computing

More information

CSC D70: Compiler Optimization Register Allocation

CSC D70: Compiler Optimization Register Allocation CSC D70: Compiler Optimization Register Allocation Prof. Gennady Pekhimenko University of Toronto Winter 2018 The content of this lecture is adapted from the lectures of Todd Mowry and Phillip Gibbons

More information

SSA Construction. Daniel Grund & Sebastian Hack. CC Winter Term 09/10. Saarland University

SSA Construction. Daniel Grund & Sebastian Hack. CC Winter Term 09/10. Saarland University SSA Construction Daniel Grund & Sebastian Hack Saarland University CC Winter Term 09/10 Outline Overview Intermediate Representations Why? How? IR Concepts Static Single Assignment Form Introduction Theory

More information

Control-Flow Analysis

Control-Flow Analysis Control-Flow Analysis Dragon book [Ch. 8, Section 8.4; Ch. 9, Section 9.6] Compilers: Principles, Techniques, and Tools, 2 nd ed. by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jerey D. Ullman on reserve

More information

SSA. Stanford University CS243 Winter 2006 Wei Li 1

SSA. Stanford University CS243 Winter 2006 Wei Li 1 SSA Wei Li 1 Overview SSA Representation SSA Construction Converting out of SSA 2 Static Single Assignment Each variable has only one reaching definition. When two definitions merge, a Ф function is introduced

More information

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006 P3 / 2006 Register Allocation What is register allocation Spilling More Variations and Optimizations Kostis Sagonas 2 Spring 2006 Storing values between defs and uses Program computes with values value

More information

Transla'on Out of SSA Form

Transla'on Out of SSA Form COMP 506 Rice University Spring 2017 Transla'on Out of SSA Form Benoit Boissinot, Alain Darte, Benoit Dupont de Dinechin, Christophe Guillon, and Fabrice Rastello, Revisi;ng Out-of-SSA Transla;on for Correctness,

More information

Compiler Design. Register Allocation. Hwansoo Han

Compiler Design. Register Allocation. Hwansoo Han Compiler Design Register Allocation Hwansoo Han Big Picture of Code Generation Register allocation Decides which values will reside in registers Changes the storage mapping Concerns about placement of

More information

SSI Revisited. Benoit Boissinot, Philip Brisk, Alain Darte, Fabrice Rastello. HAL Id: inria https://hal.inria.

SSI Revisited. Benoit Boissinot, Philip Brisk, Alain Darte, Fabrice Rastello. HAL Id: inria https://hal.inria. SSI Revisited Benoit Boissinot, Philip Brisk, Alain Darte, Fabrice Rastello To cite this version: Benoit Boissinot, Philip Brisk, Alain Darte, Fabrice Rastello. SSI Revisited. [Research Report] LIP 2009-24,

More information

Code generation for modern processors

Code generation for modern processors Code generation for modern processors Definitions (1 of 2) What are the dominant performance issues for a superscalar RISC processor? Refs: AS&U, Chapter 9 + Notes. Optional: Muchnick, 16.3 & 17.1 Instruction

More information

SSI Properties Revisited

SSI Properties Revisited SSI Properties Revisited 21 BENOIT BOISSINOT, ENS-Lyon PHILIP BRISK, University of California, Riverside ALAIN DARTE, CNRS FABRICE RASTELLO, Inria The static single information (SSI) form is an extension

More information

Code generation for modern processors

Code generation for modern processors Code generation for modern processors What are the dominant performance issues for a superscalar RISC processor? Refs: AS&U, Chapter 9 + Notes. Optional: Muchnick, 16.3 & 17.1 Strategy il il il il asm

More information

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013 A Bad Name Optimization is the process by which we turn a program into a better one, for some definition of better. CS 2210: Optimization This is impossible in the general case. For instance, a fully optimizing

More information

The Development of Static Single Assignment Form

The Development of Static Single Assignment Form The Development of Static Single Assignment Form Kenneth Zadeck NaturalBridge, Inc. zadeck@naturalbridge.com Ken's Graduate Optimization Seminar We learned: 1.what kinds of problems could be addressed

More information

Studying Optimal Spilling in the light of SSA

Studying Optimal Spilling in the light of SSA Studying Optimal Spilling in the light of SSA Quentin Colombet, Florian Brandner and Alain Darte Compsys, LIP, UMR 5668 CNRS, INRIA, ENS-Lyon, UCB-Lyon Journées compilation, Rennes, France, June 18-20

More information

Compiler Construction 2016/2017 Loop Optimizations

Compiler Construction 2016/2017 Loop Optimizations Compiler Construction 2016/2017 Loop Optimizations Peter Thiemann January 16, 2017 Outline 1 Loops 2 Dominators 3 Loop-Invariant Computations 4 Induction Variables 5 Array-Bounds Checks 6 Loop Unrolling

More information

CS153: Compilers Lecture 23: Static Single Assignment Form

CS153: Compilers Lecture 23: Static Single Assignment Form CS153: Compilers Lecture 23: Static Single Assignment Form Stephen Chong https://www.seas.harvard.edu/courses/cs153 Pre-class Puzzle Suppose we want to compute an analysis over CFGs. We have two possible

More information

Lecture 23 CIS 341: COMPILERS

Lecture 23 CIS 341: COMPILERS Lecture 23 CIS 341: COMPILERS Announcements HW6: Analysis & Optimizations Alias analysis, constant propagation, dead code elimination, register allocation Due: Wednesday, April 25 th Zdancewic CIS 341:

More information

Combining Analyses, Combining Optimizations - Summary

Combining Analyses, Combining Optimizations - Summary Combining Analyses, Combining Optimizations - Summary 1. INTRODUCTION Cliff Click s thesis Combining Analysis, Combining Optimizations [Click and Cooper 1995] uses a structurally different intermediate

More information

Global Register Allocation via Graph Coloring

Global Register Allocation via Graph Coloring Global Register Allocation via Graph Coloring Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission

More information

Improvements to Linear Scan register allocation

Improvements to Linear Scan register allocation Improvements to Linear Scan register allocation Alkis Evlogimenos (alkis) April 1, 2004 1 Abstract Linear scan register allocation is a fast global register allocation first presented in [PS99] as an alternative

More information

CS 406/534 Compiler Construction Putting It All Together

CS 406/534 Compiler Construction Putting It All Together CS 406/534 Compiler Construction Putting It All Together Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy

More information

Register Allocation via Hierarchical Graph Coloring

Register Allocation via Hierarchical Graph Coloring Register Allocation via Hierarchical Graph Coloring by Qunyan Wu A THESIS Submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN COMPUTER SCIENCE MICHIGAN TECHNOLOGICAL

More information

Static Single Assignment Form in the COINS Compiler Infrastructure

Static Single Assignment Form in the COINS Compiler Infrastructure Static Single Assignment Form in the COINS Compiler Infrastructure Masataka Sassa (Tokyo Institute of Technology) Background Static single assignment (SSA) form facilitates compiler optimizations. Compiler

More information

ECE 5775 High-Level Digital Design Automation Fall More CFG Static Single Assignment

ECE 5775 High-Level Digital Design Automation Fall More CFG Static Single Assignment ECE 5775 High-Level Digital Design Automation Fall 2018 More CFG Static Single Assignment Announcements HW 1 due next Monday (9/17) Lab 2 will be released tonight (due 9/24) Instructor OH cancelled this

More information

Data Flow Analysis and Computation of SSA

Data Flow Analysis and Computation of SSA Compiler Design 1 Data Flow Analysis and Computation of SSA Compiler Design 2 Definitions A basic block is the longest sequence of three-address codes with the following properties. The control flows to

More information

Control Flow Analysis & Def-Use. Hwansoo Han

Control Flow Analysis & Def-Use. Hwansoo Han Control Flow Analysis & Def-Use Hwansoo Han Control Flow Graph What is CFG? Represents program structure for internal use of compilers Used in various program analyses Generated from AST or a sequential

More information

Lecture 21 CIS 341: COMPILERS

Lecture 21 CIS 341: COMPILERS Lecture 21 CIS 341: COMPILERS Announcements HW6: Analysis & Optimizations Alias analysis, constant propagation, dead code elimination, register allocation Available Soon Due: Wednesday, April 25 th Zdancewic

More information

Register Allocation 3/16/11. What a Smart Allocator Needs to Do. Global Register Allocation. Global Register Allocation. Outline.

Register Allocation 3/16/11. What a Smart Allocator Needs to Do. Global Register Allocation. Global Register Allocation. Outline. What a Smart Allocator Needs to Do Register Allocation Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations Determine ranges for each variable can benefit from using

More information

Compiler Construction 2010/2011 Loop Optimizations

Compiler Construction 2010/2011 Loop Optimizations Compiler Construction 2010/2011 Loop Optimizations Peter Thiemann January 25, 2011 Outline 1 Loop Optimizations 2 Dominators 3 Loop-Invariant Computations 4 Induction Variables 5 Array-Bounds Checks 6

More information

Topic I (d): Static Single Assignment Form (SSA)

Topic I (d): Static Single Assignment Form (SSA) Topic I (d): Static Single Assignment Form (SSA) 621-10F/Topic-1d-SSA 1 Reading List Slides: Topic Ix Other readings as assigned in class 621-10F/Topic-1d-SSA 2 ABET Outcome Ability to apply knowledge

More information

register allocation saves energy register allocation reduces memory accesses.

register allocation saves energy register allocation reduces memory accesses. Lesson 10 Register Allocation Full Compiler Structure Embedded systems need highly optimized code. This part of the course will focus on Back end code generation. Back end: generation of assembly instructions

More information

The Designing and Implementation of A SSA-based register allocator

The Designing and Implementation of A SSA-based register allocator The Designing and Implementation of A SSA-based register allocator Fernando Magno Quintao Pereira UCLA - University of California, Los Angeles Abstract. The register allocation problem has an exact polynomial

More information

CS577 Modern Language Processors. Spring 2018 Lecture Optimization

CS577 Modern Language Processors. Spring 2018 Lecture Optimization CS577 Modern Language Processors Spring 2018 Lecture Optimization 1 GENERATING BETTER CODE What does a conventional compiler do to improve quality of generated code? Eliminate redundant computation Move

More information

Intermediate Representation (IR)

Intermediate Representation (IR) Intermediate Representation (IR) Components and Design Goals for an IR IR encodes all knowledge the compiler has derived about source program. Simple compiler structure source code More typical compiler

More information

Register allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices

Register allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices Register allocation IR instruction selection register allocation machine code errors Register allocation: have value in a register when used limited resources changes instruction choices can move loads

More information

Module 14: Approaches to Control Flow Analysis Lecture 27: Algorithm and Interval. The Lecture Contains: Algorithm to Find Dominators.

Module 14: Approaches to Control Flow Analysis Lecture 27: Algorithm and Interval. The Lecture Contains: Algorithm to Find Dominators. The Lecture Contains: Algorithm to Find Dominators Loop Detection Algorithm to Detect Loops Extended Basic Block Pre-Header Loops With Common eaders Reducible Flow Graphs Node Splitting Interval Analysis

More information

Register Allocation (wrapup) & Code Scheduling. Constructing and Representing the Interference Graph. Adjacency List CS2210

Register Allocation (wrapup) & Code Scheduling. Constructing and Representing the Interference Graph. Adjacency List CS2210 Register Allocation (wrapup) & Code Scheduling CS2210 Lecture 22 Constructing and Representing the Interference Graph Construction alternatives: as side effect of live variables analysis (when variables

More information

Global Register Allocation - Part 2

Global Register Allocation - Part 2 Global Register Allocation - Part 2 Y N Srikant Computer Science and Automation Indian Institute of Science Bangalore 560012 NPTEL Course on Compiler Design Outline Issues in Global Register Allocation

More information

ECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment

ECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment ECE 5775 (Fall 17) High-Level Digital Design Automation Static Single Assignment Announcements HW 1 released (due Friday) Student-led discussions on Tuesday 9/26 Sign up on Piazza: 3 students / group Meet

More information

Introduction to Machine-Independent Optimizations - 4

Introduction to Machine-Independent Optimizations - 4 Introduction to Machine-Independent Optimizations - 4 Department of Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Principles of Compiler Design Outline of

More information

Control flow and loop detection. TDT4205 Lecture 29

Control flow and loop detection. TDT4205 Lecture 29 1 Control flow and loop detection TDT4205 Lecture 29 2 Where we are We have a handful of different analysis instances None of them are optimizations, in and of themselves The objective now is to Show how

More information

Register allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V.

Register allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V. Register allocation CS3300 - Compiler Design Liveness analysis and Register allocation V. Krishna Nandivada IIT Madras Copyright c 2014 by Antony L. Hosking. Permission to make digital or hard copies of

More information

Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators.

Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators. Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators Comp 412 COMP 412 FALL 2016 source code IR Front End Optimizer Back

More information

Fast Copy Coalescing and Live-Range Identification

Fast Copy Coalescing and Live-Range Identification Fast Copy Coalescing and Live-Range Identification Zoran Budimlić, Keith D. Cooper, Timothy J. Harvey, Ken Kennedy, Timothy S. Oberg, and Steven W. Reeves Department of Computer Science Rice University,

More information

Global Register Allocation

Global Register Allocation Global Register Allocation Y N Srikant Computer Science and Automation Indian Institute of Science Bangalore 560012 NPTEL Course on Compiler Design Outline n Issues in Global Register Allocation n The

More information

Mechanizing Conventional SSA for a Verified Destruction with Coalescing

Mechanizing Conventional SSA for a Verified Destruction with Coalescing Mechanizing Conventional SSA for a Verified Destruction with Coalescing Delphine Demange University of Rennes 1 / IRISA / Inria, France delphine.demange@irisa.fr Yon Fernandez de Retana University of Rennes

More information

Review; questions Basic Analyses (2) Assign (see Schedule for links)

Review; questions Basic Analyses (2) Assign (see Schedule for links) Class 2 Review; questions Basic Analyses (2) Assign (see Schedule for links) Representation and Analysis of Software (Sections -5) Additional reading: depth-first presentation, data-flow analysis, etc.

More information

Lecture 3 Local Optimizations, Intro to SSA

Lecture 3 Local Optimizations, Intro to SSA Lecture 3 Local Optimizations, Intro to SSA I. Basic blocks & Flow graphs II. Abstraction 1: DAG III. Abstraction 2: Value numbering IV. Intro to SSA ALSU 8.4-8.5, 6.2.4 Phillip B. Gibbons 15-745: Local

More information

The C2 Register Allocator. Niclas Adlertz

The C2 Register Allocator. Niclas Adlertz The C2 Register Allocator Niclas Adlertz 1 1 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated

More information

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code Traditional Three-pass Compiler Source Code Front End IR Middle End IR Back End Machine code Errors Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce

More information

Control Flow Analysis

Control Flow Analysis Control Flow Analysis Last time Undergraduate compilers in a day Today Assignment 0 due Control-flow analysis Building basic blocks Building control-flow graphs Loops January 28, 2015 Control Flow Analysis

More information

Shengyue Wang, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota

Shengyue Wang, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota Loop Selection for Thread-Level Speculation, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota Chip Multiprocessors (CMPs)

More information

Data structures for optimizing programs with explicit parallelism

Data structures for optimizing programs with explicit parallelism Oregon Health & Science University OHSU Digital Commons CSETech March 1991 Data structures for optimizing programs with explicit parallelism Michael Wolfe Harini Srinivasan Follow this and additional works

More information

Execution-based Prediction Using Speculative Slices

Execution-based Prediction Using Speculative Slices Execution-based Prediction Using Speculative Slices Craig Zilles and Guri Sohi University of Wisconsin - Madison International Symposium on Computer Architecture July, 2001 The Problem Two major barriers

More information

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210 Intermediate Representations CS2210 Lecture 11 Reading & Topics Muchnick: chapter 6 Topics today: Intermediate representations Automatic code generation with pattern matching Optimization Overview Control

More information

Control Flow Analysis

Control Flow Analysis COMP 6 Program Analysis and Transformations These slides have been adapted from http://cs.gmu.edu/~white/cs60/slides/cs60--0.ppt by Professor Liz White. How to represent the structure of the program? Based

More information

Phi-Predication for Light-Weight If-Conversion

Phi-Predication for Light-Weight If-Conversion Phi-Predication for Light-Weight If-Conversion Weihaw Chuang Brad Calder Jeanne Ferrante Benefits of If-Conversion Eliminates hard to predict branches Important for deep pipelines How? Executes all paths

More information

A Sparse Algorithm for Predicated Global Value Numbering

A Sparse Algorithm for Predicated Global Value Numbering Sparse Predicated Global Value Numbering A Sparse Algorithm for Predicated Global Value Numbering Karthik Gargi Hewlett-Packard India Software Operation PLDI 02 Monday 17 June 2002 1. Introduction 2. Brute

More information

Parameterized Construction of Program Representations for Sparse Dataflow Analyses

Parameterized Construction of Program Representations for Sparse Dataflow Analyses Parameterized Construction of Program Representations for Sparse Dataflow Analyses André L. C. Tavares 1, Mariza A. S. Bigonha 1, Roberto S. Bigonha 1, Benoit Boissinot 2, Fernando M. Q. Pereira 1 and

More information

Statements or Basic Blocks (Maximal sequence of code with branching only allowed at end) Possible transfer of control

Statements or Basic Blocks (Maximal sequence of code with branching only allowed at end) Possible transfer of control Control Flow Graphs Nodes Edges Statements or asic locks (Maximal sequence of code with branching only allowed at end) Possible transfer of control Example: if P then S1 else S2 S3 S1 P S3 S2 CFG P predecessor

More information

Bumyong Choi Leo Porter Dean Tullsen University of California, San Diego ASPLOS

Bumyong Choi Leo Porter Dean Tullsen University of California, San Diego ASPLOS Bumyong Choi Leo Porter Dean Tullsen University of California, San Diego ASPLOS 2008 1 Multi-core proposals take advantage of additional cores to improve performance, power, and/or reliability Moving towards

More information

Live Variable Analysis. Work List Iterative Algorithm Rehashed

Live Variable Analysis. Work List Iterative Algorithm Rehashed Putting Data Flow Analysis to Work Last Time Iterative Worklist Algorithm via Reaching Definitions Why it terminates. What it computes. Why it works. How fast it goes. Today Live Variable Analysis (backward

More information

Lecture Notes on Static Single Assignment Form

Lecture Notes on Static Single Assignment Form Lecture Notes on Static Single Assignment Form 15-411: Compiler Design Frank Pfenning Lecture 6 September 12, 2013 1 Introduction In abstract machine code of the kind we have discussed so far, a variable

More information

Loops. Lather, Rinse, Repeat. CS4410: Spring 2013

Loops. Lather, Rinse, Repeat. CS4410: Spring 2013 Loops or Lather, Rinse, Repeat CS4410: Spring 2013 Program Loops Reading: Appel Ch. 18 Loop = a computation repeatedly executed until a terminating condition is reached High-level loop constructs: While

More information

Intermediate Representations Part II

Intermediate Representations Part II Intermediate Representations Part II Types of Intermediate Representations Three major categories Structural Linear Hybrid Directed Acyclic Graph A directed acyclic graph (DAG) is an AST with a unique

More information

Calvin Lin The University of Texas at Austin

Calvin Lin The University of Texas at Austin Loop Invariant Code Motion Last Time Loop invariant code motion Value numbering Today Finish value numbering More reuse optimization Common subession elimination Partial redundancy elimination Next Time

More information

SSA Elimination after Register Allocation

SSA Elimination after Register Allocation SSA Elimination after Register Allocation Fernando Magno Quintão Pereira and Jens Palsberg UCLA University of California, Los Angeles Abstract. Compilers such as gcc use static-single-assignment (SSA)

More information

Punctual Coalescing. Fernando Magno Quintão Pereira

Punctual Coalescing. Fernando Magno Quintão Pereira Punctual Coalescing Fernando Magno Quintão Pereira Register Coalescing Register coalescing is an op7miza7on on top of register alloca7on. The objec7ve is to map both variables used in a copy instruc7on

More information

Why Global Dataflow Analysis?

Why Global Dataflow Analysis? Why Global Dataflow Analysis? Answer key questions at compile-time about the flow of values and other program properties over control-flow paths Compiler fundamentals What defs. of x reach a given use

More information

Programming Language Processor Theory

Programming Language Processor Theory Programming Language Processor Theory Munehiro Takimoto Course Descriptions Method of Evaluation: made through your technical reports Purposes: understanding various theories and implementations of modern

More information

Anticipation-based partial redundancy elimination for static single assignment form

Anticipation-based partial redundancy elimination for static single assignment form SOFTWARE PRACTICE AND EXPERIENCE Softw. Pract. Exper. 00; : 9 Published online 5 August 00 in Wiley InterScience (www.interscience.wiley.com). DOI: 0.00/spe.68 Anticipation-based partial redundancy elimination

More information

The Static Single Assignment Form:

The Static Single Assignment Form: The Static Single Assignment Form: Construction and Application to Program Optimizations - Part 3 Department of Computer Science Indian Institute of Science Bangalore 560 012 NPTEL Course on Compiler Design

More information

An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors

An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors Onur Mutlu Hyesoon Kim David N. Armstrong Yale N. Patt High Performance Systems Group

More information

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology exam Compiler Construction in4020 July 5, 2007 14.00-15.30 This exam (8 pages) consists of 60 True/False

More information

BYTECODE-LEVEL ANALYSIS AND OPTIMIZATION OF JAVA CLASSES. A Thesis. Submitted to the Faculty. Purdue University. Nathaniel John Nystrom

BYTECODE-LEVEL ANALYSIS AND OPTIMIZATION OF JAVA CLASSES. A Thesis. Submitted to the Faculty. Purdue University. Nathaniel John Nystrom BYTECODE-LEVEL ANALYSIS AND OPTIMIZATION OF JAVA CLASSES A Thesis Submitted to the Faculty of Purdue University by Nathaniel John Nystrom In Partial Fulfillment of the Requirements for the Degree of Master

More information

BYTECODE-LEVEL ANALYSIS AND OPTIMIZATION OF JAVA CLASSES. A Thesis. Submitted to the Faculty. Purdue University. Nathaniel John Nystrom

BYTECODE-LEVEL ANALYSIS AND OPTIMIZATION OF JAVA CLASSES. A Thesis. Submitted to the Faculty. Purdue University. Nathaniel John Nystrom BYTECODE-LEVEL ANALYSIS AND OPTIMIZATION OF JAVA CLASSES A Thesis Submitted to the Faculty of Purdue University by Nathaniel John Nystrom In Partial Fulfillment of the Requirements for the Degree of Master

More information

Compiler Optimisation

Compiler Optimisation Compiler Optimisation 4 Dataflow Analysis Hugh Leather IF 1.18a hleather@inf.ed.ac.uk Institute for Computing Systems Architecture School of Informatics University of Edinburgh 2018 Introduction This lecture:

More information

CSE P 501 Compilers. Register Allocation Hal Perkins Autumn /22/ Hal Perkins & UW CSE P-1

CSE P 501 Compilers. Register Allocation Hal Perkins Autumn /22/ Hal Perkins & UW CSE P-1 CSE P 501 Compilers Register Allocation Hal Perkins Autumn 2011 11/22/2011 2002-11 Hal Perkins & UW CSE P-1 Agenda Register allocation constraints Local methods Faster compile, slower code, but good enough

More information

Challenges in the back end. CS Compiler Design. Basic blocks. Compiler analysis

Challenges in the back end. CS Compiler Design. Basic blocks. Compiler analysis Challenges in the back end CS3300 - Compiler Design Basic Blocks and CFG V. Krishna Nandivada IIT Madras The input to the backend (What?). The target program instruction set, constraints, relocatable or

More information

Rematerialization. Graph Coloring Register Allocation. Some expressions are especially simple to recompute: Last Time

Rematerialization. Graph Coloring Register Allocation. Some expressions are especially simple to recompute: Last Time Graph Coloring Register Allocation Last Time Chaitin et al. Briggs et al. Today Finish Briggs et al. basics An improvement: rematerialization Rematerialization Some expressions are especially simple to

More information

POSH: A TLS Compiler that Exploits Program Structure

POSH: A TLS Compiler that Exploits Program Structure POSH: A TLS Compiler that Exploits Program Structure Wei Liu, James Tuck, Luis Ceze, Wonsun Ahn, Karin Strauss, Jose Renau and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign

More information

Register Allocation. Register Allocation. Local Register Allocation. Live range. Register Allocation for Loops

Register Allocation. Register Allocation. Local Register Allocation. Live range. Register Allocation for Loops DF00100 Advanced Compiler Construction Register Allocation Register Allocation: Determines values (variables, temporaries, constants) to be kept when in registers Register Assignment: Determine in which

More information