Static single assignment
|
|
- Erick Briggs
- 5 years ago
- Views:
Transcription
1 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses.
2 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. a =... b =... n = b b = a a = n
3 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. Conversion into SSA Need to introduce φ-functions at the (iterated dominance frontier. a =... b =... n = b b = a a = n
4 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. Conversion into SSA Need to introduce φ-functions at the (iterated dominance frontier. a 1 =... b 1 =... a 2 = φ(a 1, a 3 b 2 = φ(b 1, b 3 n = b 2 b 3 = a 2 a 3 = n
5 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. Conversion into SSA Need to introduce φ-functions at the (iterated dominance frontier. Interests of SSA Link uses/definitions explicit. Code optimizations: efficient, easy-to-implement, fast. More accurate program analysis. a 1 =... b 1 =... a 2 = φ(a 1, a 3 b 2 = φ(b 1, b 3 n = b 2 b 3 = a 2 a 3 = n
6 Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. Conversion into SSA Need to introduce φ-functions at the (iterated dominance frontier. Interests of SSA Link uses/definitions explicit. Code optimizations: efficient, easy-to-implement, fast. More accurate program analysis. a 1 =... b 1 =... a 2 = φ(a 1, b 2 b 2 = φ(b 1, a 2
7 Control-flow graph Loop-nesting forest Static single assignment Dominance frontier (elementary algorithm Dominance can be computed by fixed-point iteration: D(r = {r} and D(n = {n} D[p] p pred[n] Many other more efficient algorithms are possible. Then: procedure computedf(n S := {} for each node y in succ[n] do if (idom(y n then S := S {y} /* successor of n not strictly dominated by n */ endfor for each child c of n in the dominator tree do computedf(c for each element w of DF [c] do if (n does not dominate w then S := S {w} endfor endfor DF [n] := S
8 Example What is the dominance frontier of node 5? CMPUT Compiler Design 129and Optimizatio
9 Example First we must find all nodes that node 5 strictly dominates. CMPUT Compiler Design 130and Optimizatio
10 Example A node w is in the dominance frontier of node 5 if 5 dominates a predecessor of w, but 5 does not strictly dominates w itself. What is the dominance frontier of 5? CMPUT Compiler Design 131and Optimizatio
11 Example DF(5 = {4, 5, 12, 13} A node w is in the dominance frontier of node 5 if 5 dominates a predecessor of w, but 5 does not strictly dominates w itself. What is the dominance frontier of 5? CMPUT Compiler Design 132and Optimizatio
12 Placement of φ-functions Control-flow graph Loop-nesting forest Static single assignment procedure Place-φ-functions(G, DF, D /* D[n] is the set of variables defined in n */ for each node n in G do for each variable a in D[n] do defsites[a] := defsites[a] {n} endfor endfor for each variable a do W := defsites[a] while (W not empty do remove some node n from W for each Y in DF [n] do if (Y / D φ [n] then insert statement a = φ(a,..., a at the top of Y D φ [n] := D φ [n] {Y } if (Y / D[n] then W := W {Y } endif endfor endwhile endfor
13 Renaming variables Control-flow graph Loop-nesting forest Static single assignment procedure Rename(n for each statement S in block n do if (S is not a φ-function then for each use of some variable x in S do i := top(stack[x]; replace the use of x with x i in S endfor endif for each definition of some variable a in S Count[a] + +; i := Count[a]; push i onto Stack[a]; replace definition with a i endfor endfor for each successor Y of block n and each φ-function in Y do i := top(stack[a] where a is the argument coming from n; replace it with a i endfor for each child (in the dominance tree X of n do Rename(X for each definition of some variable a (in the original code do pop Stack[a] procedure RenameAll(G for each variable a do Count[a] := 0; Stack[a] := {}; push 0 onto Stack[a] Rename(r /* root of the dominance tree */
14 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j i k k+1 B2 B1 if j<20 i 1 j 1 k 0 if k<100 j k k k+2 return j B6 B4 B7 CMPUT Compiler Design 144and Optimizatio
15 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j i k k+1 B2 B1 if j<20 i 1 j 1 k 0 if k<100 j k k k+2 return j B6 B4 B7 j φ(j,j k φ(k,k CMPUT Compiler Design 145and Optimizatio
16 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j i k k+1 B2 B1 if j<20 i 1 j 1 k 0 j φ(j,j k φ(k,k if k<100 j k k k+2 return j B6 B4 B7 j φ(j,j k φ(k,k CMPUT Compiler Design 146and Optimizatio
17 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j i 0 k k+1 B2 B1 if j<20 i 0 1 j 0 1 k 0 0 j φ(j 0,j k φ(k 0,k if k<100 j k k k+2 return j B6 B4 B7 j φ(j,j k φ(k,k CMPUT Compiler Design 147and Optimizatio
18 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j 2 i 0 k 2 +1 B2 B1 if j 1 <20 i 0 1 j 0 1 k 0 0 j 1 φ(j 0,j k 1 φ(k 0,k if k 1 <100 j k k k+2 return j 1 B6 B4 B7 j φ(j 2,j k φ(k 2,k CMPUT Compiler Design 148and Optimizatio
19 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j 2 i 0 k 2 +1 B2 B1 if j 1 <20 i 0 1 j 0 1 k 0 0 j 1 φ(j 0,j 3 k 1 φ(k 0,k 3 if k 1 <100 j k k k+2 return j 1 B6 B4 B7 j 3 φ(j 2,j k 3 φ(k 2,k CMPUT Compiler Design 149and Optimizatio
20 SSA: A Complete Example. i=1; j=1; k=0; while(k<100 { if(j<20 { j=i; k=k+1; } else { j=k; k=k+2; } } return j; B5 B3 j 2 i 0 k 2 +1 B2 B1 if j 1 <20 i 0 1 j 0 1 k 0 0 j 1 φ(j 0,j 3 k 1 φ(k 0,k 3 if k 1 <100 return j 1 j 4 k 4 +2 B6 B4 B7 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 CMPUT Compiler Design 150and Optimizatio
21 Example: Constant Propagation B1 i 0 1 j 0 1 k 0 0 B1 i 0 1 j 0 1 k 0 0 B2 j 1 φ(j 0,j 3 k 1 φ(k 0,k 3 if k 1 <100 B2 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 B5 B3 j 2 i 0 k 2 +1 if j 1 <20 return j 1 j 4 k 4 +2 B6 B4 B5 B3 j 2 1 k 2 +1 if j 1 <20 return j 1 j 4 k 4 +2 B6 B4 B7 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 B7 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 CMPUT Compiler Design 151and Optimizatio
22 Example: Dead-code Elimination B5 B3 j 2 1 k 2 +1 B7 B2 B1 if j 1 <20 i 0 1 j 0 1 k 0 0 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 j 4 k 4 +2 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 B6 B4 B5 B3 j 2 1 k 2 +1 B7 B2 if j 1 <20 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 j 4 k 4 +2 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 B6 B4 CMPUT Compiler Design 152and Optimizatio
23 Constant Propagation and Dead Code Elimination B5 B3 j 2 1 k 2 +1 B2 if j 1 <20 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 j 4 k 4 +2 B6 B4 B3 j 2 1 k 2 +1 B2 if j 1 <20 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 j 4 k 4 +2 B6 B4 B7 j 3 φ(j 2,j 4 k 3 φ(k 2,k 4 B7 j 3 φ(1,j 4 k 3 φ(k 2,k 4 CMPUT Compiler Design 153and Optimizatio
24 Example: Is this the end? B3 k 2 +1 j 4 k 4 +2 B7 B2 if j 1 <20 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 j 3 φ(1,j 4 k 3 φ(k 2,k 4 return j 1 B6 B4 But block 6 is never executed! How can we find this out, and simplify the program? SSA conditional constant propagation finds the least fixed point for the program and allows further elimination of dead code. See algorithm in Tiger book. CMPUT Compiler Design 154and Optimizatio
25 Example: Dead code elimination B3 B2 if j 1 <20 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 B4 B2 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 return j 1 B4 k 2 +1 j 4 k 4 +2 B6 k 2 +1 B7 j 3 φ(1,j 4 k 3 φ(k 2,k 4 B7 j 3 φ(1 k 3 φ(k 2 CMPUT Compiler Design 155and Optimizatio
26 Example: Single Argument φ-function Elimination B2 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 B2 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 k 2 +1 return j 1 B4 k 2 +1 return j 1 B4 B7 j 3 φ(1 k 3 φ(k 2 B7 j 3 1 k 3 k 2 CMPUT Compiler Design 156and Optimizatio
27 Example: Constant and Copy Propagation B2 j 1 φ(1,j 3 k 1 φ(0,k 3 if k 1 <100 B2 j 1 φ(1,1 k 1 φ(0,k 2 if k 1 <100 k 2 +1 return j 1 B4 k 2 +1 return j 1 B4 B7 j 3 1 k 3 k 2 B7 j 3 1 k 3 k 2 CMPUT Compiler Design 157and Optimizatio
28 Example: Dead Code Elimination B2 j 1 φ(1,1 k 1 φ(0,k 2 if k 1 <100 B2 j 1 φ(1,1 k 1 φ(0,k 2 if k 1 <100 k 2 +1 return j 1 B4 k 2 +1 return j 1 B4 B7 j 3 1 k 3 k 2 CMPUT Compiler Design 158and Optimizatio
29 Example: φ-function Simplification B2 j 1 φ(1,1 k 1 φ(0,k 2 if k 1 <100 B2 j 1 1 k 1 φ(0,k 2 if k 1 <100 k 2 +1 return j 1 B4 k 2 +1 return j 1 B4 CMPUT Compiler Design 159and Optimizatio
30 Example: Constant Propagation B2 j 1 1 k 1 φ(0,k 2 if k 1 <100 B2 j 1 1 k 1 φ(0,k 2 if k 1 <100 k 2 +1 return j 1 B4 k 2 +1 return 1 B4 CMPUT Compiler Design 160and Optimizatio
31 Example: Dead Code Elimination B2 j 1 1 k 1 φ(0,k 2 if k 1 <100 return 1 B4 return 1 B4 k 2 +1 CMPUT Compiler Design 161and Optimizatio
32 More readings Control-flow graph Loop-nesting forest Static single assignment References Cytron, Ferrante, Rosen, Wegman, Zadek. Efficiently computing static single assignment form and the control dependence graph, ACM Transactions on Programming Languages and Systems, 13(4: , Ramalingam. On loops, dominators, and dominance frontiers. ACM Transactions on Programming Languages and Systems, 24(5: , Recent advances in SSA SSA-based compilers & JIT compilation. Register allocation, out-of-ssa conversion, liveness analysis. SSA extensions: SSI, gated SSA, psi-ssa, value state dependence graph, array SSA, safetsa, etc.
33 Links between the different notions Control-flow graph Loop-nesting forest Static single assignment A few important results: If S contains the entry node, J(S = J + (S = DF + (S. G is reducible iff simplifiable by the rules T 1 and T 2. iff each SCC has a unique entry node. iff removing all (u, v where v dominates u makes G acyclic.... Dominators and iterated dominance frontiers can be computed quickly from loop-nesting forest, especially if G is reducible. Conversely, DJ-graphs can be used to build loop forests. Advanced algorithms use Tarjan s union-find with almost-linear complexity (see Ramalingam, Sreedhar, Havlak, Steensgaard.
34 Early attempts and pitfalls Swap problem Cytron et al. (1991: copies in predecessor basic blocks. a =... b =... n = b b = a a = n
35 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Swap problem a 1 =... b 1 =... a 2 = φ(a 1, a 3 b 2 = φ(b 1, b 3 n = b 2 b 3 = a 2 a 3 = n
36 Early attempts and pitfalls Swap problem Cytron et al. (1991: copies in predecessor basic blocks. a 1 =... b 1 =... a 2 = φ(a 1, b 2 b 2 = φ(b 1, a 2
37 Early attempts and pitfalls Swap problem Cytron et al. (1991: copies in predecessor basic blocks. a 1 =... b 1 =... a 2 = a 1 b 2 = b 1 a 2 = φ(a 1, b 2 b 2 = φ(b 1, a 2 a 2 = b 2 b 2 = a 2
38 Early attempts and pitfalls Swap problem Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies. a 1 =... b 1 =... a 2 = a 1 b 2 = b 1 a 2 = b 2 b 2 = a 2
39 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies. Lost copy problem x =... y = x x = x + 1 y
40 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies. Lost copy problem x 1 =... x 2 = φ(x 1, x 3 y = x 2 x 3 = x y
41 x 2 = φ(x 1, x 3 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies. Lost copy problem x 1 =... x 3 = x x 2
42 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies. Lost copy problem x 1 =... x 2 = x 1 x 2 = φ(x 1, x 3 x 3 = x x 2 = x 3 x 2
43 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies; Bad understanding of critical edges and interferences. Lost copy problem x 1 =... x 2 = x 1 x 3 = x x 2 = x 3 x 2
44 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies; Bad understanding of critical edges and interferences. Briggs et al. (1998: both problems identified. General correctness unclear. Lost copy problem x 1 =... x 2 = x 1 x 3 = x x 2 = x 3 x 2
45 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies; Bad understanding of critical edges and interferences. Briggs et al. (1998: both problems identified. General correctness unclear. Sreedhar et al. (1999: correct but handling of complex branching instructions unclear; interplay with coalescing unclear; virtualization hard to implement. Lost copy problem x 2 x 1 =... x 2 = x 1 x 3 = x x 2 = x 3
46 Early attempts and pitfalls Cytron et al. (1991: copies in predecessor basic blocks. Incorrect! Bad understanding of parallel copies; Bad understanding of critical edges and interferences. Briggs et al. (1998: both problems identified. General correctness unclear. Sreedhar et al. (1999: correct but handling of complex branching instructions unclear; interplay with coalescing unclear; virtualization hard to implement. Lost copy problem x 2 x 1 =... x 2 = x 1 x 3 = x x 2 = x 3 Many SSA optimizations turned off in gcc and Jikes.
47 Going to CSSA (conventional SSA: Sreedhar et al. Definition (conventional SSA CSSA: if variables can be renamed, without changing program semantics, so that, for all φ-function a 0 = φ(a 1,..., a n, a 0,..., a n have the same name. From SSA to CSSA B i B n a 0 = φ(a 1,..., a n
48 Going to CSSA (conventional SSA: Sreedhar et al. Definition (conventional SSA CSSA: if variables can be renamed, without changing program semantics, so that, for all φ-function a 0 = φ(a 1,..., a n, a 0,..., a n have the same name. From SSA to CSSA B i B n a 1 = a 1 a i = a i a n = a n Correctness After introduction of variables a i and copies, the code is in CSSA. a 0 = φ(a 1,..., a n a 0 = a 0
49 Going to CSSA (conventional SSA: Sreedhar et al. Definition (conventional SSA CSSA: if variables can be renamed, without changing program semantics, so that, for all φ-function a 0 = φ(a 1,..., a n, a 0,..., a n have the same name. Correctness After introduction of variables a i and copies, the code is in CSSA. Code quality Aggressive coalescing can remove useless copies. But better use accurate notion of interferences. From SSA to CSSA B i a 0 = φ(a 1,..., a n a 0 = a 0 B n a 1 = a 1 a i = a i a n = a n Liveness of φ defined by the a i. Be careful with potential bugs due to conditional branches that use or define variables.
50 Coalesced example: the swap problem a 1 =... b 1 =... a 1 =... b 1 =... (u 1, v 1 = (a 1, b 1 a 2 = φ(a 1, b 2 b 2 = φ(b 1, a 2 u 0 = φ(u 1, u 2 v 0 = φ(v 1, v 2 (a 2, b 2 = (u 0, v 0 (u 2, v 2 = (b 2, a 2
51 Coalesced example: the swap problem a 1 u = (u 0, u 1, u 2 a 2 a 1 =... b 1 =... (u 1, v 1 = (a 1, b 1 u 0 = φ(u 1, u 2 v 0 = φ(v 1, v 2 (a 2, b 2 = (u 0, v 0 b 1 v = (v 0, v 1, v 2 b 2 (u 2, v 2 = (b 2, a 2
52 Coalesced example: the swap problem a 1 u = (u 0, u 1, u 2 a 2 a 1 =... b 1 =... (u 1, v 1 = (a 1, b 1 u 0 = φ(u 1, u 2 v 0 = φ(v 1, v 2 (a 2, b 2 = (u 0, v 0 b 1 v = (v 0, v 1, v 2 b 2 (u 2, v 2 = (b 2, a 2
53 Coalesced example: the swap problem a 1 =... b 1 =... a 1 =... b 1 =... (u 1, v 1 = (a 1, b 1 u 0 = φ(u 1, u 2 v 0 = φ(v 1, v 2 (a 2, b 2 = (u 0, v 0 (u 2, v 2 = (b 2, a 2 (u 2, v 2 = (b 2, a 2
54 Coalesced example: the swap problem a 1 =... b 1 =... a 1 =... b 1 =... (u 2, v 2 = (b 2, a 2 n = b 1 b 1 = a 1 a 1 = n
55 Coalesced example: the lost copy problem x 1 =... x 1 =... u 1 = x 1 x 2 = φ(x 1, x 3 u 0 = φ(u 1, u 2 x 3 = x x 2 = u 0 x 3 = x u 2 = x 3 x 2 x 2
56 Coalesced example: the lost copy problem x 1 =... u 1 = x 1 x 1 u = (u 0, u 1, u 2 x 3 u 0 = φ(u 1, u 2 x 2 x 2 = u 0 x 3 = x u 2 = x 3 x 2
57 Coalesced example: the lost copy problem x 1 =... u 1 = x 1 x 1 u = (u 0, u 1, u 2 x 3 u 0 = φ(u 1, u 2 x 2 x 2 = u 0 x 3 = x u 2 = x 3 x 2
58 Coalesced example: the lost copy problem x 1 =... x 1 =... u 1 = x 1 u 0 = φ(u 1, u 2 x 2 = x 1 x 1 = x x 2 = u 0 x 3 = x u 2 = x 3 x 2 x 2
59 Outline 1 Control-flow graph Loop-nesting forest Static single assignment 2 3 Dominance, liveness, interferences, and chordal graphs Construction of liveness sets in reducible CFGs for strict SSA Extensions to irreducible CFGs and for checking liveness
60 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. b =... c =... a = φ(b, c d = φ(b, a
61 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. b =... c =... a = b a = c d = b a = a d = a d = d d b a c d a
62 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. b =... c =... a = b a = c d = b a = a d = a d = d d b a c d a
63 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. Need to update interference graph after coalescing. b =... c =... c = b b = c b b c d a d a
64 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. b =... c =... a = b a = c d = b Need to update interference graph after coalescing. Unique value V of a SSA variable a = a d = a For a copy b = a, V (b = V (a (traversal of dominance tree. Value-based interference a and b interfere if V (a V (b and Live-range(a Live-range(b. d = d d d b a a c
65 Exploiting SSA: value-based interferences Definition (Chaitin interference Two variables interfere if one is live at the definition of the other, which is not a copy of the first. b =... b =... Need to update interference graph after coalescing. Unique value V of a SSA variable For a copy b = a, V (b = V (a (traversal of dominance tree. b Value-based interference a and b interfere if V (a V (b and Live-range(a Live-range(b. d d b a a c
66 Using parallel copies instead of sequential copies Parallel copy semantics In (a 1,..., a n = (b 1,..., b n, all copies a i = b i are simultaneous. Fewer interferences than with sequential copies. Easier insertion & liveness updates. But need to sequentialize.
67 Using parallel copies instead of sequential copies Parallel copy semantics In (a 1,..., a n = (b 1,..., b n, all copies a i = b i are simultaneous. Fewer interferences than with sequential copies. Easier insertion & liveness updates. But need to sequentialize. Particular copy structure Directed graph with edges b i a i. Directed trees with roots=circuits. Insert copies for the leaves first. a d (a, b, c, d = (c, a, b, c b c
68 Using parallel copies instead of sequential copies Parallel copy semantics In (a 1,..., a n = (b 1,..., b n, all copies a i = b i are simultaneous. Fewer interferences than with sequential copies. Easier insertion & liveness updates. But need to sequentialize. Particular copy structure Directed graph with edges b i a i. Directed trees with roots=circuits. Insert copies for the leaves first. a d d = c (a, b, c = (d, a, b b c
69 Using parallel copies instead of sequential copies Parallel copy semantics In (a 1,..., a n = (b 1,..., b n, all copies a i = b i are simultaneous. Fewer interferences than with sequential copies. Easier insertion & liveness updates. But need to sequentialize. Particular copy structure Directed graph with edges b i a i. Directed trees with roots=circuits. Insert copies for the leaves first. a d d = c c = b b = a a = d b c
70 Using parallel copies instead of sequential copies Parallel copy semantics In (a 1,..., a n = (b 1,..., b n, all copies a i = b i are simultaneous. Fewer interferences than with sequential copies. Easier insertion & liveness updates. But need to sequentialize. Particular copy structure Directed graph with edges b i a i. Directed trees with roots=circuits. Insert copies for the leaves first. Simple circuit: one more copy. a d d = c c = b b = a a = d b c
71 Algorithm 1: Parallel copy sequentialization algorithm Data: Set P of parallel copies a b, a b, one extra fresh variable n Output: List of copies in sequential order 1 ready [] ; to do [] ; pred(n ; 2 forall the (a b P do 3 loc(b ; pred(a ; /* initialization */ 4 forall the (a b P do 5 loc(a a ; pred(b a ; to do.push(b ; /* copy into b to be done */ 6 forall the (a b P do 7 if loc(b = then ready.push(b ; /* b is not used and can be overwritten */ 8 while to do [] do 9 while ready [] do 10 b ready.pop( ; a pred(b ; /* pick a free location */ 11 c loc(a ; emit copy(c b ; loc(a b ; /* generate the copy */ 12 if a = c and pred(a then ready.push(a ; /* first time copied */ 13 b to do.pop( ; /* look for remaining copy */ 14 if b = loc(b then 15 emit copy(b n ; loc(b n ; ready.push(b ; /* break circuit */
72 Qualitative experiments with SPEC CINT2000 Key points of the out-of-ssa translation Copy insertion (to go to CSSA and to handle register renaming constraints followed by coalescing. Value-based interferences coalescing is improved and independent of virtualization (i.e., as in Sreedhar III. Parallel copies followed by sequentialization Intersection Sreedhar I Chaitin Value Sreedhar III Value IS Sharing gzip 175.vpr 176.gcc 181.mcf 186.crafty 197.parser 253.perlbmk 254.gap 255.vortex 256.bzip2 300.twolf sum
73 Bug tracking RVM-254 of Jikes RVM Dominance, liveness, interferences, and chordal graphs Construction of liveness sets in reducible CFGs for strict SSA Extensions to irreducible CFGs and for checking liveness Problems with SSA form: lack of loop unrolling breaks VM This problem is probably one of the most serious in the RVM currently. When loop unrolling is disabled and SSA enabled the created IR is corrupt. The error has in the past look like we were suffering from the lost copy problem, but implementing a naive solution to this didn t solve the problem. Their is sound logic behind the code so we need to identify a small test case where things are broken and then reason about what s wrong in leave SSA. This has been attempted once (with the code that removes an element from the live set but the problem no longer appears to surface here. Currently these optimizations are disabled but by RVM 3.0 they should be re-enable and this bug cured.
74 Potential bugs with conditional branches Dominance, liveness, interferences, and chordal graphs Construction of liveness sets in reducible CFGs for strict SSA Extensions to irreducible CFGs and for checking liveness u =... v =... u =... v =... w =... v =... B 2 Br(u, B 3, B 4 B 2 v = v Br(u, B 3, B 4 B 2 w = v Br(w, B 3, B 4 B 3 B 4 B 3 B 4 B 3 B 4 w = φ(u, v... = w Initial code w = φ(u, v... = w Blind Sreedhar III... = w Wrong output code
75 Unfeasible out-of-ssa translation example Dominance, liveness, interferences, and chordal graphs Construction of liveness sets in reducible CFGs for strict SSA Extensions to irreducible CFGs and for checking liveness u 0 u u u 1 = φ(u 0, u 2 u 2 = u 1 1 t 0 = u 2 Br(u 2,, B 2 Br dec(u,, B 2 Br dec(u,, B 2 t 0 = u B 2 t 1 = φ(t 0, t 2 t 2 = t Br(t 2,, B 2 B 2 t 1 = φ(u, t 2 t 2 = t Br(t 2,, B 2 B 2 t 1 = φ(t 0, t 2 t 2 = t Br(t 2,, B 2 B 3... = u 2 B 3... = u B 3... = u Initial code After optimization Needs edge splitting
Compiler Construction 2009/2010 SSA Static Single Assignment Form
Compiler Construction 2009/2010 SSA Static Single Assignment Form Peter Thiemann March 15, 2010 Outline 1 Static Single-Assignment Form 2 Converting to SSA Form 3 Optimization Algorithms Using SSA 4 Dependencies
More informationTowards an SSA based compiler back-end: some interesting properties of SSA and its extensions
Towards an SSA based compiler back-end: some interesting properties of SSA and its extensions Benoit Boissinot Équipe Compsys Laboratoire de l Informatique du Parallélisme (LIP) École normale supérieure
More informationAdvanced Compilers CMPSCI 710 Spring 2003 Computing SSA
Advanced Compilers CMPSCI 70 Spring 00 Computing SSA Emery Berger University of Massachusetts, Amherst More SSA Last time dominance SSA form Today Computing SSA form Criteria for Inserting f Functions
More informationSimple and Efficient Construction of Static Single Assignment Form
Simple and Efficient Construction of Static Single Assignment Form saarland university Matthias Braun, Sebastian Buchwald, Sebastian Hack, Roland Leißa, Christoph Mallon and Andreas Zwinkau computer science
More informationRevisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency
Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency Benoit Boissinot, Alain Darte, Fabrice Rastello, Benoît Dupont de Dinechin, Christophe Guillon To cite this version: Benoit
More informationComputing Static Single Assignment (SSA) Form. Control Flow Analysis. Overview. Last Time. Constant propagation Dominator relationships
Control Flow Analysis Last Time Constant propagation Dominator relationships Today Static Single Assignment (SSA) - a sparse program representation for data flow Dominance Frontier Computing Static Single
More information8. Static Single Assignment Form. Marcus Denker
8. Static Single Assignment Form Marcus Denker Roadmap > Static Single Assignment Form (SSA) > Converting to SSA Form > Examples > Transforming out of SSA 2 Static Single Assignment Form > Goal: simplify
More informationWhat If. Static Single Assignment Form. Phi Functions. What does φ(v i,v j ) Mean?
Static Single Assignment Form What If Many of the complexities of optimization and code generation arise from the fact that a given variable may be assigned to in many different places. Thus reaching definition
More informationStatic Single Assignment (SSA) Form
Static Single Assignment (SSA) Form A sparse program representation for data-flow. CSL862 SSA form 1 Computing Static Single Assignment (SSA) Form Overview: What is SSA? Advantages of SSA over use-def
More information6. Intermediate Representation
6. Intermediate Representation Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/
More information6. Intermediate Representation!
6. Intermediate Representation! Prof. O. Nierstrasz! Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes.! http://www.cs.ucla.edu/~palsberg/!
More informationAn Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection
An Overview of GCC Architecture (source: wikipedia) CS553 Lecture Control-Flow, Dominators, Loop Detection, and SSA Control-Flow Analysis and Loop Detection Last time Lattice-theoretic framework for data-flow
More informationCSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1
CSE P 0 Compilers SSA Hal Perkins Spring 0 UW CSE P 0 Spring 0 V- Agenda Overview of SSA IR Constructing SSA graphs Sample of SSA-based optimizations Converting back from SSA form Sources: Appel ch., also
More informationA Practical and Fast Iterative Algorithm for φ-function Computation Using DJ Graphs
A Practical and Fast Iterative Algorithm for φ-function Computation Using DJ Graphs Dibyendu Das U. Ramakrishna ACM Transactions on Programming Languages and Systems May 2005 Humayun Zafar Outline Introduction
More informationIntermediate representation
Intermediate representation Goals: encode knowledge about the program facilitate analysis facilitate retargeting facilitate optimization scanning parsing HIR semantic analysis HIR intermediate code gen.
More informationSSA-Form Register Allocation
SSA-Form Register Allocation Foundations Sebastian Hack Compiler Construction Course Winter Term 2009/2010 saarland university computer science 2 Overview 1 Graph Theory Perfect Graphs Chordal Graphs 2
More informationRegister Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations
Register Allocation Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class
More informationExample. Example. Constant Propagation in SSA
Example x=1 a=x x=2 b=x x=1 x==10 c=x x++ print x Original CFG x 1 =1 a=x 1 x 2 =2 x 3 =φ (x 1,x 2 ) b=x 3 x 4 =1 x 5 = φ(x 4,x 6 ) x 5 ==10 c=x 5 x 6 =x 5 +1 print x 5 CFG in SSA Form In SSA form computing
More informationCSC D70: Compiler Optimization Register Allocation
CSC D70: Compiler Optimization Register Allocation Prof. Gennady Pekhimenko University of Toronto Winter 2018 The content of this lecture is adapted from the lectures of Todd Mowry and Phillip Gibbons
More informationSSA Construction. Daniel Grund & Sebastian Hack. CC Winter Term 09/10. Saarland University
SSA Construction Daniel Grund & Sebastian Hack Saarland University CC Winter Term 09/10 Outline Overview Intermediate Representations Why? How? IR Concepts Static Single Assignment Form Introduction Theory
More informationControl-Flow Analysis
Control-Flow Analysis Dragon book [Ch. 8, Section 8.4; Ch. 9, Section 9.6] Compilers: Principles, Techniques, and Tools, 2 nd ed. by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jerey D. Ullman on reserve
More informationSSA. Stanford University CS243 Winter 2006 Wei Li 1
SSA Wei Li 1 Overview SSA Representation SSA Construction Converting out of SSA 2 Static Single Assignment Each variable has only one reaching definition. When two definitions merge, a Ф function is introduced
More informationOutline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006
P3 / 2006 Register Allocation What is register allocation Spilling More Variations and Optimizations Kostis Sagonas 2 Spring 2006 Storing values between defs and uses Program computes with values value
More informationTransla'on Out of SSA Form
COMP 506 Rice University Spring 2017 Transla'on Out of SSA Form Benoit Boissinot, Alain Darte, Benoit Dupont de Dinechin, Christophe Guillon, and Fabrice Rastello, Revisi;ng Out-of-SSA Transla;on for Correctness,
More informationCompiler Design. Register Allocation. Hwansoo Han
Compiler Design Register Allocation Hwansoo Han Big Picture of Code Generation Register allocation Decides which values will reside in registers Changes the storage mapping Concerns about placement of
More informationSSI Revisited. Benoit Boissinot, Philip Brisk, Alain Darte, Fabrice Rastello. HAL Id: inria https://hal.inria.
SSI Revisited Benoit Boissinot, Philip Brisk, Alain Darte, Fabrice Rastello To cite this version: Benoit Boissinot, Philip Brisk, Alain Darte, Fabrice Rastello. SSI Revisited. [Research Report] LIP 2009-24,
More informationCode generation for modern processors
Code generation for modern processors Definitions (1 of 2) What are the dominant performance issues for a superscalar RISC processor? Refs: AS&U, Chapter 9 + Notes. Optional: Muchnick, 16.3 & 17.1 Instruction
More informationSSI Properties Revisited
SSI Properties Revisited 21 BENOIT BOISSINOT, ENS-Lyon PHILIP BRISK, University of California, Riverside ALAIN DARTE, CNRS FABRICE RASTELLO, Inria The static single information (SSI) form is an extension
More informationCode generation for modern processors
Code generation for modern processors What are the dominant performance issues for a superscalar RISC processor? Refs: AS&U, Chapter 9 + Notes. Optional: Muchnick, 16.3 & 17.1 Strategy il il il il asm
More informationA Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013
A Bad Name Optimization is the process by which we turn a program into a better one, for some definition of better. CS 2210: Optimization This is impossible in the general case. For instance, a fully optimizing
More informationThe Development of Static Single Assignment Form
The Development of Static Single Assignment Form Kenneth Zadeck NaturalBridge, Inc. zadeck@naturalbridge.com Ken's Graduate Optimization Seminar We learned: 1.what kinds of problems could be addressed
More informationStudying Optimal Spilling in the light of SSA
Studying Optimal Spilling in the light of SSA Quentin Colombet, Florian Brandner and Alain Darte Compsys, LIP, UMR 5668 CNRS, INRIA, ENS-Lyon, UCB-Lyon Journées compilation, Rennes, France, June 18-20
More informationCompiler Construction 2016/2017 Loop Optimizations
Compiler Construction 2016/2017 Loop Optimizations Peter Thiemann January 16, 2017 Outline 1 Loops 2 Dominators 3 Loop-Invariant Computations 4 Induction Variables 5 Array-Bounds Checks 6 Loop Unrolling
More informationCS153: Compilers Lecture 23: Static Single Assignment Form
CS153: Compilers Lecture 23: Static Single Assignment Form Stephen Chong https://www.seas.harvard.edu/courses/cs153 Pre-class Puzzle Suppose we want to compute an analysis over CFGs. We have two possible
More informationLecture 23 CIS 341: COMPILERS
Lecture 23 CIS 341: COMPILERS Announcements HW6: Analysis & Optimizations Alias analysis, constant propagation, dead code elimination, register allocation Due: Wednesday, April 25 th Zdancewic CIS 341:
More informationCombining Analyses, Combining Optimizations - Summary
Combining Analyses, Combining Optimizations - Summary 1. INTRODUCTION Cliff Click s thesis Combining Analysis, Combining Optimizations [Click and Cooper 1995] uses a structurally different intermediate
More informationGlobal Register Allocation via Graph Coloring
Global Register Allocation via Graph Coloring Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission
More informationImprovements to Linear Scan register allocation
Improvements to Linear Scan register allocation Alkis Evlogimenos (alkis) April 1, 2004 1 Abstract Linear scan register allocation is a fast global register allocation first presented in [PS99] as an alternative
More informationCS 406/534 Compiler Construction Putting It All Together
CS 406/534 Compiler Construction Putting It All Together Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy
More informationRegister Allocation via Hierarchical Graph Coloring
Register Allocation via Hierarchical Graph Coloring by Qunyan Wu A THESIS Submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN COMPUTER SCIENCE MICHIGAN TECHNOLOGICAL
More informationStatic Single Assignment Form in the COINS Compiler Infrastructure
Static Single Assignment Form in the COINS Compiler Infrastructure Masataka Sassa (Tokyo Institute of Technology) Background Static single assignment (SSA) form facilitates compiler optimizations. Compiler
More informationECE 5775 High-Level Digital Design Automation Fall More CFG Static Single Assignment
ECE 5775 High-Level Digital Design Automation Fall 2018 More CFG Static Single Assignment Announcements HW 1 due next Monday (9/17) Lab 2 will be released tonight (due 9/24) Instructor OH cancelled this
More informationData Flow Analysis and Computation of SSA
Compiler Design 1 Data Flow Analysis and Computation of SSA Compiler Design 2 Definitions A basic block is the longest sequence of three-address codes with the following properties. The control flows to
More informationControl Flow Analysis & Def-Use. Hwansoo Han
Control Flow Analysis & Def-Use Hwansoo Han Control Flow Graph What is CFG? Represents program structure for internal use of compilers Used in various program analyses Generated from AST or a sequential
More informationLecture 21 CIS 341: COMPILERS
Lecture 21 CIS 341: COMPILERS Announcements HW6: Analysis & Optimizations Alias analysis, constant propagation, dead code elimination, register allocation Available Soon Due: Wednesday, April 25 th Zdancewic
More informationRegister Allocation 3/16/11. What a Smart Allocator Needs to Do. Global Register Allocation. Global Register Allocation. Outline.
What a Smart Allocator Needs to Do Register Allocation Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations Determine ranges for each variable can benefit from using
More informationCompiler Construction 2010/2011 Loop Optimizations
Compiler Construction 2010/2011 Loop Optimizations Peter Thiemann January 25, 2011 Outline 1 Loop Optimizations 2 Dominators 3 Loop-Invariant Computations 4 Induction Variables 5 Array-Bounds Checks 6
More informationTopic I (d): Static Single Assignment Form (SSA)
Topic I (d): Static Single Assignment Form (SSA) 621-10F/Topic-1d-SSA 1 Reading List Slides: Topic Ix Other readings as assigned in class 621-10F/Topic-1d-SSA 2 ABET Outcome Ability to apply knowledge
More informationregister allocation saves energy register allocation reduces memory accesses.
Lesson 10 Register Allocation Full Compiler Structure Embedded systems need highly optimized code. This part of the course will focus on Back end code generation. Back end: generation of assembly instructions
More informationThe Designing and Implementation of A SSA-based register allocator
The Designing and Implementation of A SSA-based register allocator Fernando Magno Quintao Pereira UCLA - University of California, Los Angeles Abstract. The register allocation problem has an exact polynomial
More informationCS577 Modern Language Processors. Spring 2018 Lecture Optimization
CS577 Modern Language Processors Spring 2018 Lecture Optimization 1 GENERATING BETTER CODE What does a conventional compiler do to improve quality of generated code? Eliminate redundant computation Move
More informationIntermediate Representation (IR)
Intermediate Representation (IR) Components and Design Goals for an IR IR encodes all knowledge the compiler has derived about source program. Simple compiler structure source code More typical compiler
More informationRegister allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices
Register allocation IR instruction selection register allocation machine code errors Register allocation: have value in a register when used limited resources changes instruction choices can move loads
More informationModule 14: Approaches to Control Flow Analysis Lecture 27: Algorithm and Interval. The Lecture Contains: Algorithm to Find Dominators.
The Lecture Contains: Algorithm to Find Dominators Loop Detection Algorithm to Detect Loops Extended Basic Block Pre-Header Loops With Common eaders Reducible Flow Graphs Node Splitting Interval Analysis
More informationRegister Allocation (wrapup) & Code Scheduling. Constructing and Representing the Interference Graph. Adjacency List CS2210
Register Allocation (wrapup) & Code Scheduling CS2210 Lecture 22 Constructing and Representing the Interference Graph Construction alternatives: as side effect of live variables analysis (when variables
More informationGlobal Register Allocation - Part 2
Global Register Allocation - Part 2 Y N Srikant Computer Science and Automation Indian Institute of Science Bangalore 560012 NPTEL Course on Compiler Design Outline Issues in Global Register Allocation
More informationECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment
ECE 5775 (Fall 17) High-Level Digital Design Automation Static Single Assignment Announcements HW 1 released (due Friday) Student-led discussions on Tuesday 9/26 Sign up on Piazza: 3 students / group Meet
More informationIntroduction to Machine-Independent Optimizations - 4
Introduction to Machine-Independent Optimizations - 4 Department of Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Principles of Compiler Design Outline of
More informationControl flow and loop detection. TDT4205 Lecture 29
1 Control flow and loop detection TDT4205 Lecture 29 2 Where we are We have a handful of different analysis instances None of them are optimizations, in and of themselves The objective now is to Show how
More informationRegister allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V.
Register allocation CS3300 - Compiler Design Liveness analysis and Register allocation V. Krishna Nandivada IIT Madras Copyright c 2014 by Antony L. Hosking. Permission to make digital or hard copies of
More informationInstruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators.
Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators Comp 412 COMP 412 FALL 2016 source code IR Front End Optimizer Back
More informationFast Copy Coalescing and Live-Range Identification
Fast Copy Coalescing and Live-Range Identification Zoran Budimlić, Keith D. Cooper, Timothy J. Harvey, Ken Kennedy, Timothy S. Oberg, and Steven W. Reeves Department of Computer Science Rice University,
More informationGlobal Register Allocation
Global Register Allocation Y N Srikant Computer Science and Automation Indian Institute of Science Bangalore 560012 NPTEL Course on Compiler Design Outline n Issues in Global Register Allocation n The
More informationMechanizing Conventional SSA for a Verified Destruction with Coalescing
Mechanizing Conventional SSA for a Verified Destruction with Coalescing Delphine Demange University of Rennes 1 / IRISA / Inria, France delphine.demange@irisa.fr Yon Fernandez de Retana University of Rennes
More informationReview; questions Basic Analyses (2) Assign (see Schedule for links)
Class 2 Review; questions Basic Analyses (2) Assign (see Schedule for links) Representation and Analysis of Software (Sections -5) Additional reading: depth-first presentation, data-flow analysis, etc.
More informationLecture 3 Local Optimizations, Intro to SSA
Lecture 3 Local Optimizations, Intro to SSA I. Basic blocks & Flow graphs II. Abstraction 1: DAG III. Abstraction 2: Value numbering IV. Intro to SSA ALSU 8.4-8.5, 6.2.4 Phillip B. Gibbons 15-745: Local
More informationThe C2 Register Allocator. Niclas Adlertz
The C2 Register Allocator Niclas Adlertz 1 1 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated
More informationMiddle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code
Traditional Three-pass Compiler Source Code Front End IR Middle End IR Back End Machine code Errors Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce
More informationControl Flow Analysis
Control Flow Analysis Last time Undergraduate compilers in a day Today Assignment 0 due Control-flow analysis Building basic blocks Building control-flow graphs Loops January 28, 2015 Control Flow Analysis
More informationShengyue Wang, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota
Loop Selection for Thread-Level Speculation, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota Chip Multiprocessors (CMPs)
More informationData structures for optimizing programs with explicit parallelism
Oregon Health & Science University OHSU Digital Commons CSETech March 1991 Data structures for optimizing programs with explicit parallelism Michael Wolfe Harini Srinivasan Follow this and additional works
More informationExecution-based Prediction Using Speculative Slices
Execution-based Prediction Using Speculative Slices Craig Zilles and Guri Sohi University of Wisconsin - Madison International Symposium on Computer Architecture July, 2001 The Problem Two major barriers
More informationIntermediate Representations. Reading & Topics. Intermediate Representations CS2210
Intermediate Representations CS2210 Lecture 11 Reading & Topics Muchnick: chapter 6 Topics today: Intermediate representations Automatic code generation with pattern matching Optimization Overview Control
More informationControl Flow Analysis
COMP 6 Program Analysis and Transformations These slides have been adapted from http://cs.gmu.edu/~white/cs60/slides/cs60--0.ppt by Professor Liz White. How to represent the structure of the program? Based
More informationPhi-Predication for Light-Weight If-Conversion
Phi-Predication for Light-Weight If-Conversion Weihaw Chuang Brad Calder Jeanne Ferrante Benefits of If-Conversion Eliminates hard to predict branches Important for deep pipelines How? Executes all paths
More informationA Sparse Algorithm for Predicated Global Value Numbering
Sparse Predicated Global Value Numbering A Sparse Algorithm for Predicated Global Value Numbering Karthik Gargi Hewlett-Packard India Software Operation PLDI 02 Monday 17 June 2002 1. Introduction 2. Brute
More informationParameterized Construction of Program Representations for Sparse Dataflow Analyses
Parameterized Construction of Program Representations for Sparse Dataflow Analyses André L. C. Tavares 1, Mariza A. S. Bigonha 1, Roberto S. Bigonha 1, Benoit Boissinot 2, Fernando M. Q. Pereira 1 and
More informationStatements or Basic Blocks (Maximal sequence of code with branching only allowed at end) Possible transfer of control
Control Flow Graphs Nodes Edges Statements or asic locks (Maximal sequence of code with branching only allowed at end) Possible transfer of control Example: if P then S1 else S2 S3 S1 P S3 S2 CFG P predecessor
More informationBumyong Choi Leo Porter Dean Tullsen University of California, San Diego ASPLOS
Bumyong Choi Leo Porter Dean Tullsen University of California, San Diego ASPLOS 2008 1 Multi-core proposals take advantage of additional cores to improve performance, power, and/or reliability Moving towards
More informationLive Variable Analysis. Work List Iterative Algorithm Rehashed
Putting Data Flow Analysis to Work Last Time Iterative Worklist Algorithm via Reaching Definitions Why it terminates. What it computes. Why it works. How fast it goes. Today Live Variable Analysis (backward
More informationLecture Notes on Static Single Assignment Form
Lecture Notes on Static Single Assignment Form 15-411: Compiler Design Frank Pfenning Lecture 6 September 12, 2013 1 Introduction In abstract machine code of the kind we have discussed so far, a variable
More informationLoops. Lather, Rinse, Repeat. CS4410: Spring 2013
Loops or Lather, Rinse, Repeat CS4410: Spring 2013 Program Loops Reading: Appel Ch. 18 Loop = a computation repeatedly executed until a terminating condition is reached High-level loop constructs: While
More informationIntermediate Representations Part II
Intermediate Representations Part II Types of Intermediate Representations Three major categories Structural Linear Hybrid Directed Acyclic Graph A directed acyclic graph (DAG) is an AST with a unique
More informationCalvin Lin The University of Texas at Austin
Loop Invariant Code Motion Last Time Loop invariant code motion Value numbering Today Finish value numbering More reuse optimization Common subession elimination Partial redundancy elimination Next Time
More informationSSA Elimination after Register Allocation
SSA Elimination after Register Allocation Fernando Magno Quintão Pereira and Jens Palsberg UCLA University of California, Los Angeles Abstract. Compilers such as gcc use static-single-assignment (SSA)
More informationPunctual Coalescing. Fernando Magno Quintão Pereira
Punctual Coalescing Fernando Magno Quintão Pereira Register Coalescing Register coalescing is an op7miza7on on top of register alloca7on. The objec7ve is to map both variables used in a copy instruc7on
More informationWhy Global Dataflow Analysis?
Why Global Dataflow Analysis? Answer key questions at compile-time about the flow of values and other program properties over control-flow paths Compiler fundamentals What defs. of x reach a given use
More informationProgramming Language Processor Theory
Programming Language Processor Theory Munehiro Takimoto Course Descriptions Method of Evaluation: made through your technical reports Purposes: understanding various theories and implementations of modern
More informationAnticipation-based partial redundancy elimination for static single assignment form
SOFTWARE PRACTICE AND EXPERIENCE Softw. Pract. Exper. 00; : 9 Published online 5 August 00 in Wiley InterScience (www.interscience.wiley.com). DOI: 0.00/spe.68 Anticipation-based partial redundancy elimination
More informationThe Static Single Assignment Form:
The Static Single Assignment Form: Construction and Application to Program Optimizations - Part 3 Department of Computer Science Indian Institute of Science Bangalore 560 012 NPTEL Course on Compiler Design
More informationAn Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors
An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors Onur Mutlu Hyesoon Kim David N. Armstrong Yale N. Patt High Performance Systems Group
More informationFaculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology
Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology exam Compiler Construction in4020 July 5, 2007 14.00-15.30 This exam (8 pages) consists of 60 True/False
More informationBYTECODE-LEVEL ANALYSIS AND OPTIMIZATION OF JAVA CLASSES. A Thesis. Submitted to the Faculty. Purdue University. Nathaniel John Nystrom
BYTECODE-LEVEL ANALYSIS AND OPTIMIZATION OF JAVA CLASSES A Thesis Submitted to the Faculty of Purdue University by Nathaniel John Nystrom In Partial Fulfillment of the Requirements for the Degree of Master
More informationBYTECODE-LEVEL ANALYSIS AND OPTIMIZATION OF JAVA CLASSES. A Thesis. Submitted to the Faculty. Purdue University. Nathaniel John Nystrom
BYTECODE-LEVEL ANALYSIS AND OPTIMIZATION OF JAVA CLASSES A Thesis Submitted to the Faculty of Purdue University by Nathaniel John Nystrom In Partial Fulfillment of the Requirements for the Degree of Master
More informationCompiler Optimisation
Compiler Optimisation 4 Dataflow Analysis Hugh Leather IF 1.18a hleather@inf.ed.ac.uk Institute for Computing Systems Architecture School of Informatics University of Edinburgh 2018 Introduction This lecture:
More informationCSE P 501 Compilers. Register Allocation Hal Perkins Autumn /22/ Hal Perkins & UW CSE P-1
CSE P 501 Compilers Register Allocation Hal Perkins Autumn 2011 11/22/2011 2002-11 Hal Perkins & UW CSE P-1 Agenda Register allocation constraints Local methods Faster compile, slower code, but good enough
More informationChallenges in the back end. CS Compiler Design. Basic blocks. Compiler analysis
Challenges in the back end CS3300 - Compiler Design Basic Blocks and CFG V. Krishna Nandivada IIT Madras The input to the backend (What?). The target program instruction set, constraints, relocatable or
More informationRematerialization. Graph Coloring Register Allocation. Some expressions are especially simple to recompute: Last Time
Graph Coloring Register Allocation Last Time Chaitin et al. Briggs et al. Today Finish Briggs et al. basics An improvement: rematerialization Rematerialization Some expressions are especially simple to
More informationPOSH: A TLS Compiler that Exploits Program Structure
POSH: A TLS Compiler that Exploits Program Structure Wei Liu, James Tuck, Luis Ceze, Wonsun Ahn, Karin Strauss, Jose Renau and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign
More informationRegister Allocation. Register Allocation. Local Register Allocation. Live range. Register Allocation for Loops
DF00100 Advanced Compiler Construction Register Allocation Register Allocation: Determines values (variables, temporaries, constants) to be kept when in registers Register Assignment: Determine in which
More information