CONTROL FLOW ANALYSIS


CONTROL FLOW ANALYSIS

PROGRAM CONTROL FLOW
Control flow: the sequence of operations in a program
Representations: control flow graph, control dependence, call graph
Control flow analysis: analyzing a program to discover its control structure

CONTROL FLOW GRAPH
A CFG models the flow of control in the program (procedure) as a directed graph G = (N, E)
Node n ∈ N: a basic block — a maximal sequence of statements with a single entry point, a single exit point, and no internal branches
For simplicity, we assume a unique entry node n0 and a unique exit node nf in later discussions
Edge e = (ni, nj) ∈ E: a possible transfer of control from block ni to block nj
Example: if (x==y) { ... } else { ... } yields a branch node for the test with one edge to each arm

BASIC BLOCKS
Definition: a basic block is a maximal sequence of consecutive statements with a single entry point, a single exit point, and no internal branches
The basic unit in control flow analysis
The scope of local code optimizations: redundancy elimination, register allocation

BASIC BLOCK EXAMPLE
How many basic blocks are in this code fragment? What are they?
 (1)  i := m
 (2)  j := n
 (3)  t1 := 4 * n
 (4)  v := a[t1]
 (5)  i := i + 1
 (6)  t2 := 4 * i
 (7)  t3 := a[t2]
 (8)  if t3 < v goto (5)
 (9)  j := j - 1
 (10) t4 := 4 * j
 (11) t5 := a[t4]
 (12) if t5 > v goto (9)
 (13) if i >= j goto (23)
 (14) t6 := 4 * i
 (15) x := a[t6]
Answer: five blocks — (1)-(4), (5)-(8), (9)-(12), (13), (14)-(15)

IDENTIFY BASIC BLOCKS
Input: a sequence of intermediate code statements
Determine the leaders, the first statements of basic blocks:
The first statement in the sequence (entry point) is a leader
Any statement that is the target of a branch (conditional or unconditional) is a leader
Any statement immediately following a branch (conditional or unconditional) or a return is a leader
For each leader, its basic block consists of the leader and all statements up to, but not including, the next leader or the end of the program

EXAMPLE: LEADERS
 (1)  i := m                 (16) t7 := 4 * i
 (2)  j := n                 (17) t8 := 4 * j
 (3)  t1 := 4 * n            (18) t9 := a[t8]
 (4)  v := a[t1]             (19) a[t7] := t9
 (5)  i := i + 1             (20) t10 := 4 * j
 (6)  t2 := 4 * i            (21) a[t10] := x
 (7)  t3 := a[t2]            (22) goto (5)
 (8)  if t3 < v goto (5)     (23) t11 := 4 * i
 (9)  j := j - 1             (24) x := a[t11]
 (10) t4 := 4 * j            (25) t12 := 4 * i
 (11) t5 := a[t4]            (26) t13 := 4 * n
 (12) if t5 > v goto (9)     (27) t14 := a[t13]
 (13) if i >= j goto (23)    (28) a[t12] := t14
 (14) t6 := 4 * i            (29) t15 := 4 * n
 (15) x := a[t6]             (30) a[t15] := x
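The leader rules above can be sketched in a few lines of Python. The instruction representation (dicts with an "op" field and an optional "target" index) is an assumption for illustration, not something the slides prescribe.

```python
# Sketch of leader identification and basic-block partitioning.
# Instructions are dicts; branch targets are 0-based instruction indices.

def find_leaders(instrs):
    """Return sorted leader indices: the entry point, every branch
    target, and every instruction that follows a branch or return."""
    leaders = {0}                              # rule 1: entry point
    for i, ins in enumerate(instrs):
        if ins["op"] in ("goto", "if", "return"):
            if "target" in ins:
                leaders.add(ins["target"])     # rule 2: branch target
            if i + 1 < len(instrs):
                leaders.add(i + 1)             # rule 3: after a branch
    return sorted(leaders)

def partition_blocks(instrs):
    """Each basic block runs from a leader up to the next leader."""
    bounds = find_leaders(instrs) + [len(instrs)]
    return [instrs[bounds[k]:bounds[k + 1]] for k in range(len(bounds) - 1)]

# Tiny fragment:  0: x := 1   1: if ... goto 0   2: y := 2
code = [{"op": "assign"}, {"op": "if", "target": 0}, {"op": "assign"}]
blocks = partition_blocks(code)
```

Note that instruction 1 is not itself a leader: it is neither a target nor preceded by a branch, so it joins instruction 0's block.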

EXAMPLE: LEADERS (ANSWER)
Leaders: (1) entry point; (5) target of (8) and (22); (9) target of (12), follows (8); (13) follows (12); (14) follows (13); (23) target of (13), follows (22)

EXAMPLE: BASIC BLOCKS (ANSWER)
B1 = (1)-(4), B2 = (5)-(8), B3 = (9)-(12), B4 = (13), B5 = (14)-(22), B6 = (23)-(30)

GENERATING CFGS
Partition the intermediate code into basic blocks
Add edges corresponding to control flow between blocks:
Unconditional goto → one edge; conditional branch → multiple edges
Sequential flow → control passes to the next block (if no branch at the end)
If there is no unique entry node n0 or exit node nf, add dummy nodes and insert the necessary edges
Ideally no edges enter n0 and no edges exit nf
This simplifies many analysis and transformation algorithms
(Input: the 30-statement leaders example from the previous slides)
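The edge-adding step can be sketched as follows, continuing the hypothetical dict-based instruction representation from the earlier sketch: one edge per explicit branch target, plus a fall-through edge when the block does not end in an unconditional goto.

```python
# Sketch of CFG edge construction over already-partitioned blocks.

def build_cfg(blocks, leader_of):
    """blocks: list of blocks (lists of instruction dicts); the last
    instruction of a block may carry a 'target' leader index and an op
    of 'goto' or 'if'.  leader_of maps a leader index to its block
    number.  Returns a successor map {block#: [block#, ...]}."""
    succ = {i: [] for i in range(len(blocks))}
    for i, blk in enumerate(blocks):
        last = blk[-1]
        if "target" in last:
            succ[i].append(leader_of[last["target"]])   # branch edge
        if last["op"] != "goto" and i + 1 < len(blocks):
            succ[i].append(i + 1)                       # fall-through
    return succ

# Blocks: B0 = [assign], B1 = [if goto B0], B2 = [assign]
blocks = [[{"op": "assign"}],
          [{"op": "if", "target": 0}],
          [{"op": "assign"}]]
succ = build_cfg(blocks, {0: 0})
```

Here B1 gets two successors (the branch target B0 and the fall-through B2), while B0 only falls through.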

EXAMPLE: CFG
Blocks: B1 = (1)-(4), B2 = (5)-(8), B3 = (9)-(12), B4 = (13), B5 = (14)-(22), B6 = (23)-(30)
Edges: B1→B2 (fall-through), B2→B2 (if t3 < v goto (5)), B2→B3, B3→B3 (if t5 > v goto (9)), B3→B4, B4→B6 (goto (23)), B4→B5, B5→B2 (goto (5)), B6→exit

CFG AND HL CODE
I = J = K = L = 1
do
  if (P) {
    J = I
    if (Q) L = 1 else L = 3
    K = K + 1
  }
  else K = K + 2
  print (I, J, K, L)
  do
    if (R) then L = L + 4
  while (S)
  I = I + 6
while (T)

COMPLICATIONS IN CFG CONSTRUCTION
Function calls: instruction scheduling may prefer function calls as basic block boundaries
Special functions such as setjmp() and longjmp(); exception handling
Ambiguous jumps: jump r  // target stored in register r
Static analysis may generate edges that never occur at runtime
Record potential targets if possible

NODES IN CFG
Given a CFG = (N, E), if there is an edge ni → nj ∈ E:
ni is a predecessor of nj; nj is a successor of ni
For any node n ∈ N:
Pred(n): the set of predecessors of n
Succ(n): the set of successors of n
A branch node is a node that has more than one successor
A join node is a node that has more than one predecessor
[Figure: a four-node example CFG with nodes A, B, C, D]

DEPTH-FIRST TRAVERSAL
A CFG is a rooted, directed graph, with the entry node as the root
Depth-first traversal (depth-first search)
Idea: start at the root and explore as far/deep as possible along each branch before backtracking
Can build a spanning tree for the graph
A spanning tree of a directed graph G contains all nodes of G such that:
there is a path from the root to any node reachable in the original graph, and
there are no cycles

DFS SPANNING TREE ALGORITHM
procedure span(v)  /* v is a node in the graph */
  InTree(v) = true
  for each w that is a successor of v do
    if (!InTree(w))
      add edge v → w to the spanning tree
      span(w)
end span
Initial call: span(n0)
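The span() procedure translates almost directly into Python; this is a minimal sketch using a successor map, with the slide's InTree flag kept as a set.

```python
# Direct transcription of span() from the slide.

def dfs_spanning_tree(succ, root):
    """succ: dict node -> list of successors.
    Returns the set of spanning-tree edges."""
    in_tree = {root}
    tree_edges = set()

    def span(v):
        for w in succ.get(v, []):
            if w not in in_tree:
                in_tree.add(w)
                tree_edges.add((v, w))   # edge v -> w joins the tree
                span(w)

    span(root)
    return tree_edges

# Diamond CFG: A -> B, A -> C, B -> D, C -> D
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
tree = dfs_spanning_tree(cfg, "A")
```

D is reached first through B, so C → D does not become a tree edge — it will later be classified as a cross edge.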

DFST EXAMPLE
Nodes are numbered in the order visited during the search — this is the depth-first pre-order numbering
[Figure: a ten-node example CFG (A-J); the second copy labels each node with its visit number, A=1, B=2, C=3, D=4, E=5, G=6, H=7, I=8, F=9, J=10]

CFG EDGE CLASSIFICATION
An edge x → y in a CFG is:
an advancing edge if x is an ancestor of y in the tree —
  a tree edge if it is part of the spanning tree
  a forward edge if it is not part of the spanning tree and x is an ancestor of y in the tree
a retreating edge if it is not part of the spanning tree and y is an ancestor of x in the tree
a cross edge if it is not part of the spanning tree and neither node is an ancestor of the other

DFST EXAMPLE
[Figure: the ten-node example CFG with its edges classified as tree, forward, retreating, and cross edges]

BACK EDGES AND REDUCIBILITY
An edge x → y in a CFG is a back edge if every path from the entry node of the flow graph to x goes through y
(y dominates x — more details later)
Every back edge is a retreating edge. Vice versa? Not necessarily.
A flow graph is reducible if all its retreating edges in any DFST are also back edges
Flow graphs that occur in practice are almost always reducible

NON-REDUCIBLE GRAPHS
Testing reducibility: take any DFST for the flow graph, remove the back edges, and check that the result is acyclic
[Figure: the classic irreducible triangle A → B, A → C, B → C, C → B; in any DFST, whichever of the edges between B and C is traversed second will be a retreating edge, but it is not a back edge]
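The reducibility test on this slide can be sketched directly: compute dominators (with the iterative algorithm covered later in these notes), delete every back edge (x → y where y dominates x), and check that what remains is acyclic.

```python
# Reducibility test sketch: remove back edges, then look for cycles.

def dominators(succ, entry):
    nodes = set(succ) | {w for ws in succ.values() for w in ws}
    preds = {n: set() for n in nodes}
    for v, ws in succ.items():
        for w in ws:
            preds[w].add(v)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def is_reducible(succ, entry):
    dom = dominators(succ, entry)
    # Keep only non-back edges (w dominating v makes v -> w a back edge).
    fwd = {v: [w for w in ws if w not in dom[v]] for v, ws in succ.items()}
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def has_cycle(v):                    # DFS cycle detection
        color[v] = GREY
        for w in fwd.get(v, []):
            c = color.get(w, WHITE)
            if c == GREY or (c == WHITE and has_cycle(w)):
                return True
        color[v] = BLACK
        return False

    nodes = set(succ) | {w for ws in succ.values() for w in ws}
    return not any(color.get(n, WHITE) == WHITE and has_cycle(n)
                   for n in nodes)

simple = {"A": ["B"], "B": ["B", "C"], "C": []}        # self-loop: reducible
triangle = {"A": ["B", "C"], "B": ["C"], "C": ["B"]}   # slide's counterexample
```

On the triangle, neither B nor C dominates the other, so neither edge between them is removed and the cycle survives — the graph is irreducible.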

NODES ORDERING WRT DFST
Enhanced depth-first spanning tree algorithm:
time = 0;
procedure span(v)  /* v is a node in the graph */
  InTree(v) = true;
  d[v] = ++time;
  for each w that is a successor of v do
    if (!InTree(w)) then
      add edge v → w to spanning tree
      span(w)
  f[v] = ++time;
end span
Associates two numbers with each node v in the graph:
d[v]: discovery time of v in the traversal
f[v]: finish time of v in the traversal

NODES ORDERING WRT DFST
Pre-ordering: ordering of vertices based on discovery time
Post-ordering: ordering of vertices based on finish time
Reverse post-ordering: the reverse of a post-ordering, i.e., ordering of vertices in the opposite order of their finish times
Not the same as pre-ordering
Commonly used in forward data-flow analysis
Backward data-flow analysis: RPO on the reverse CFG
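A sketch of the timestamped DFS and the three orderings derived from it, matching the d[v]/f[v] pseudocode above:

```python
# Timestamped DFS: d[v] on discovery, f[v] on finish, then derive
# pre-order, post-order, and reverse post-order.

def dfs_orderings(succ, root):
    d, f = {}, {}
    time = [0]                           # mutable counter shared by span()

    def span(v):
        time[0] += 1
        d[v] = time[0]                   # discovery time
        for w in succ.get(v, []):
            if w not in d:
                span(w)
        time[0] += 1
        f[v] = time[0]                   # finish time

    span(root)
    pre = sorted(d, key=lambda v: d[v])
    post = sorted(f, key=lambda v: f[v])
    rpo = post[::-1]
    return pre, post, rpo

# The four-node example used on the next slide: D -> E, D -> F, E -> G, F -> G
g = {"D": ["E", "F"], "E": ["G"], "F": ["G"]}
pre, post, rpo = dfs_orderings(g, "D")
```

This reproduces the slide's orderings: pre-order D E G F, post-order G E F D, reverse post-order D F E G — note RPO differs from pre-order.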

ORDERING EXAMPLE
[Figure: D → E, D → F, E → G, F → G; discovery/finish times d/f: D=1/8, E=2/5, G=3/4, F=6/7]
Pre-ordering: D E G F
Post-ordering: G E F D
Reverse post-ordering: D F E G

BIG PICTURE
Why care about orderings / back edges?
CFGs are commonly used to propagate information between nodes (basic blocks) — data-flow analysis
The existence of back edges / cycles in a flow graph indicates that we may need to traverse the graph more than once
Iterative algorithms: when can we stop? How quickly can we stop?
A proper ordering of nodes during an iterative algorithm ensures the number of passes is limited by the number of nested back edges

REGIONS IN CFG
Extended basic block (EBB)
An EBB is a maximal set of nodes in a CFG that contains no join nodes other than the entry node
A single entry and possibly multiple exits
Some optimizations, like value numbering and instruction scheduling, are more effective when applied to EBBs
Natural loop
A loop is a collection of nodes in a CFG such that:
all nodes in the collection are strongly connected, and
the collection has a unique entry — the only way to reach the loop from outside
A loop that contains no other loops is an inner loop
The main target of program optimizations

EBB EXAMPLE
[Figure: the ten-node example CFG]
Max-size EBBs: {A,B}, {C,J}, {D,E,F}, {G,H,I}
Loops? Not that obvious — can use dominator-based loop detection

DOMINANCE
Node d of a CFG dominates node n if every path from the entry node of the graph to n passes through d (d dom n)
Dom(n): the set of dominators of node n
Every node dominates itself: n ∈ Dom(n)
Node d strictly dominates n if d ∈ Dom(n) and d ≠ n
Dominance-based loop recognition: the entry of a loop dominates all nodes in the loop
Each node n has a unique immediate dominator m, which is the last dominator of n on any path from the entry to n (m idom n), m ≠ n
The immediate dominator m of n is the strict dominator of n that is closest to n

DOMINATOR EXAMPLE
Block  Dom                IDom
1      {1}                -
2      {1,2}              1
3      {1,3}              1
4      {1,3,4}            3
5      {1,3,4,5}          4
6      {1,3,4,6}          4
7      {1,3,4,7}          4
8      {1,3,4,7,8}        7
9      {1,3,4,7,8,9}      8
10     {1,3,4,7,8,10}     8

DOMINATOR TREES
In a dominator tree, a node's parent is its immediate dominator

OTHER SETS OF INTEREST
Block  SDom           Dom⁻¹ (nodes dominated by the block)
1      {}             {1,2,3,4,5,6,7,8,9,10}
2      {1}            {2}
3      {1}            {3,4,5,6,7,8,9,10}
4      {1,3}          {4,5,6,7,8,9,10}
5      {1,3,4}        {5}
6      {1,3,4}        {6}
7      {1,3,4}        {7,8,9,10}
8      {1,3,4,7}      {8,9,10}
9      {1,3,4,7,8}    {9}
10     {1,3,4,7,8}    {10}

EXAMPLE
Block  Dom                 IDom
1      {1}                 -
2      {1,2}               1
3      {1,2,3}             2
4      {1,2,3,4}           3
5      {1,2,3,5}           3
6      {1,2,3,6}           3
7      {1,2,7}             2
8      {1,2,8}             2
9      {1,2,8,9}           8
10     {1,2,8,9,10}        9
11     {1,2,8,9,11}        9
12     {1,2,8,9,11,12}     11

ALGORITHM: COMPUTING DOM
An iterative fixed-point calculation; N is the set of nodes in the CFG
DOM(n0) = {n0}  (n0 is the entry)
For all nodes x ≠ n0: DOM(x) = N
Until no more changes to the dominator sets:
  for all nodes x ≠ n0:
    DOM(x) = {x} ∪ (∩ DOM(P) over all predecessors P of x)
At termination, d ∈ DOM(n) iff d dominates n

DOMINATOR EXAMPLE
Block  initial  iteration 1
0      {0}      {0}
1      N        {1} ∪ (Dom(0) ∩ Dom(9)) = {0,1}
2      N        {2} ∪ Dom(1) = {0,1,2}
3      N        {3} ∪ (Dom(1) ∩ Dom(2) ∩ Dom(8) ∩ Dom(4)) = {0,1,3}
4      N        {4} ∪ (Dom(3) ∩ Dom(7)) = {0,1,3,4}
5      N        {5} ∪ Dom(4) = {0,1,3,4,5}
6      N        {6} ∪ Dom(4) = {0,1,3,4,6}
7      N        {7} ∪ (Dom(5) ∩ Dom(6) ∩ Dom(10)) = {0,1,3,4,7}
8      N        {8} ∪ Dom(7) = {0,1,3,4,7,8}
9      N        {9} ∪ Dom(8) = {0,1,3,4,7,8,9}
10     N        {10} ∪ Dom(8) = {0,1,3,4,7,8,10}
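The fixed-point calculation above can be sketched compactly; this version takes a predecessor map rather than a successor map, which matches the equation directly.

```python
# Iterative fixed-point DOM computation from the slide.

def compute_dom(preds, entry):
    """preds: dict node -> list of predecessors. Returns DOM sets."""
    nodes = set(preds)
    dom = {n: set(nodes) for n in nodes}    # all x != n0 start at N
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Diamond with a join: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
preds = {0: [], 1: [0], 2: [0], 3: [1, 2]}
dom = compute_dom(preds, 0)
```

At the join node 3, the intersection of the two incoming DOM sets leaves only the entry, so Dom(3) = {0, 3}.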

DOMINATOR EXAMPLE
Block  initial  iteration 1         iteration 2
0      {0}      {0}                 {0}
1      N        {0,1}               {0,1}
2      N        {0,1,2}             {0,1,2}
3      N        {0,1,3}             {0,1,3}
4      N        {0,1,3,4}           {0,1,3,4}
5      N        {0,1,3,4,5}         {0,1,3,4,5}
6      N        {0,1,3,4,6}         {0,1,3,4,6}
7      N        {0,1,3,4,7}         {0,1,3,4,7}
8      N        {0,1,3,4,7,8}       {0,1,3,4,7,8}
9      N        {0,1,3,4,7,8,9}     {0,1,3,4,7,8,9}
10     N        {0,1,3,4,7,8,10}    {0,1,3,4,7,8,10}

COMPUTING IDOM FROM DOM
1. For each node n, initially set IDOM(n) = DOM(n) − {n} (SDOM — the strict dominators)
2. For each node p in IDOM(n), see if p has dominators other than itself also included in IDOM(n); if so, remove them from IDOM(n)
The immediate dominator m of n is the strict dominator of n that is closest to n
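The two-step IDOM extraction above is short enough to sketch directly; it assumes DOM sets have already been computed (for example by the iterative algorithm).

```python
# IDOM extraction: start from the strict dominators of n, then discard
# every node that is itself strictly dominated by another candidate.

def compute_idom(dom, entry):
    idom = {}
    for n, ds in dom.items():
        if n == entry:
            continue                       # the entry has no IDOM
        cand = set(ds) - {n}               # SDOM(n)
        for p in list(cand):
            cand -= (dom[p] - {p})         # drop p's own strict dominators
        idom[n] = cand.pop()               # exactly one node remains
    return idom

# DOM sets for a small tree-shaped example: 0 -> 1 -> {2, 3, 4}
dom = {0: {0}, 1: {0, 1}, 2: {0, 1, 2}, 3: {0, 1, 3}, 4: {0, 1, 4}}
idom = compute_idom(dom, 0)
```

For node 2, the candidates start as {0, 1}; since 0 strictly dominates candidate 1, node 0 is removed, leaving 1 as the immediate dominator.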

I-DOMINATOR EXAMPLE
Block  initial (SDOM)   IDom
0      {}               -
1      {0}              {0}
2      {0,1}            {1}   // 0 is 1's dominator
3      {0,1}            {1}   // 0 is 1's dominator
4      {0,1,3}          {3}   // 0, 1 are 3's dominators
5      {0,1,3,4}        {4}   // 0, 1, 3 are 4's dominators
6      {0,1,3,4}        {4}   // 0, 1, 3 are 4's dominators
7      {0,1,3,4}        {4}   // 0, 1, 3 are 4's dominators
8      {0,1,3,4,7}      {7}   // 0, 1, 3, 4 are 7's dominators
9      {0,1,3,4,7,8}    {8}   // 0, 1, 3, 4, 7 are 8's dominators
10     {0,1,3,4,7,8}    {8}   // 0, 1, 3, 4, 7 are 8's dominators

POST-DOMINANCE
A related concept (we will look at it more later)
Node d of a CFG post-dominates node n if every path from n to the exit node passes through d (d pdom n)
Pdom(n): the set of post-dominators of node n
Every node post-dominates itself: n ∈ Pdom(n)
Each node n has a unique immediate post-dominator m

POST-DOMINATOR EXAMPLE
Block  Pdom                       IPdom
1      {1,3,4,7,8,10,exit}        3
2      {2,3,4,7,8,10,exit}        3
3      {3,4,7,8,10,exit}          4
4      {4,7,8,10,exit}            7
5      {5,7,8,10,exit}            7
6      {6,7,8,10,exit}            7
7      {7,8,10,exit}              8
8      {8,10,exit}                10
9      {1,3,4,7,8,9,10,exit}      1
10     {10,exit}                  exit
[Figure: the example CFG, with the unique exit node reached from block 10]

NATURAL LOOPS
Natural loops that are suitable for improvement have two essential properties:
A loop must have a single entry point, called the header
There must be at least one way to iterate the loop, i.e., at least one path back to the header
Identifying natural loops:
Search for back edges (n → d) in the CFG, edges whose heads dominate their tails
For an edge a → b, b is the head and a is the tail
A back edge flows from a node n to one of n's dominators d
The natural loop for that edge is {d} ∪ the set of nodes that can reach n without going through d
d is the header of the loop

BACK EDGE EXAMPLE
Block  Dom                IDom
1      {1}                -
2      {1,2}              1
3      {1,3}              1
4      {1,3,4}            3
5      {1,3,4,5}          4
6      {1,3,4,6}          4
7      {1,3,4,7}          4
8      {1,3,4,7,8}        7
9      {1,3,4,7,8,9}      8
10     {1,3,4,7,8,10}     8
Back edges?

IDENTIFYING NATURAL LOOPS
Given a back edge n → d, the natural loop of the edge includes:
node d, and
any node that can reach n without going through d
Loop construction:
set loop = {d}
add n to loop if n ≠ d
for each node m ≠ d that we know is in loop, make sure that m's predecessors are also inserted into loop

NATURAL LOOPS EXAMPLE
Back edge    Natural loop
10 → 7       {7, 10, 8}
7 → 4        {4, 7, 5, 6, 10, 8}
8 → 3        {3, 4, 7, 5, 6, 10, 8}
9 → 1        {1, 9, 8, 7, 5, 6, 10, 4, 3, 2}
Why is neither {3,4} nor {4,5,6,7} a natural loop? Other nodes can reach the loop's tail without passing through its header (e.g., 8 and 10 reach 7 without going through 4), so a natural loop must include them.
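The loop-construction steps above amount to a small worklist walk over predecessors; a minimal Python sketch:

```python
# Natural loop of a back edge n -> d: {d} plus everything that can
# reach n without passing through d (worklist over predecessors).

def natural_loop(preds, n, d):
    loop = {d}
    worklist = []
    if n != d:
        loop.add(n)
        worklist.append(n)
    while worklist:
        m = worklist.pop()
        for p in preds.get(m, []):     # predecessors must join the loop
            if p not in loop:
                loop.add(p)
                worklist.append(p)
    return loop

# Loop 3 -> 4 -> 5 -> 3, entered from node 1; back edge is 5 -> 3.
preds = {3: [1, 5], 4: [3], 5: [4]}
loop = natural_loop(preds, 5, 3)
```

Note that node 1 (the header's predecessor from outside) is never added: the header d is placed in the loop without ever being put on the worklist, so the walk never continues past it.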

INNER LOOPS
A useful property of natural loops: unless two loops have the same header, they are either disjoint or one is entirely contained in (nested within) the other
[Figure: nested loops over blocks B0-B3]
An inner loop is a loop that contains no other loops — a good optimization candidate
The inner loop of the previous example: {7, 8, 10}

DOMINANCE FRONTIERS
For a node n in the CFG, DF(n) denotes the dominance frontier set of n
DF(n) contains all nodes x such that n dominates an immediate predecessor of x but does not strictly dominate x
For this to happen, there is some path from node n to x, n → … → y → x, where (n DOM y) but !(n SDOM x)
Informally, DF(n) contains the first nodes reachable from n that n does not strictly dominate, on each CFG path leaving n
Used in SSA construction and redundancy elimination

DOMINANCE FRONTIER FOR A NODE
[Figure: paths of interest leaving node 7 in the example CFG]
DF(7) = {1, 3, 4, 7}

DOMINANCE FRONTIER FOR A NODE
[Figure: paths of interest leaving node 4 in the example CFG]
DF(4) = {1, 3, 4}

COMPUTING DOMINANCE FRONTIERS
Easiest way: DF(x) = SUCC(Dom⁻¹(x)) − SDom⁻¹(x), where SUCC(x) is the set of successors of x in the CFG
But this is not the most efficient way
Observations:
Nodes in a DF must be join nodes
The predecessor of any join node j must have j in its DF, unless it dominates j
The dominators of j's predecessors must have j in their DF sets, unless they also dominate j

COMPUTING DOMINANCE FRONTIERS
for all nodes n, initialize DF(n) = Ø
for all nodes n
  if n has multiple predecessors, then
    for each predecessor p of n
      runner = p
      while (runner ≠ IDom(n))
        DF(runner) = DF(runner) ∪ {n}
        runner = IDom(runner)
First identify the join nodes j in the CFG
Starting with j's predecessors, walk up the dominator tree until we reach the immediate dominator of j
Node j is included in the DF set of every node we pass, except for j's immediate dominator
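The runner-based algorithm above translates directly; it assumes the IDom map has already been computed.

```python
# Dominance frontiers via the runner walk from the slide.

def dominance_frontiers(preds, idom):
    """preds: dict node -> list of predecessors; idom: immediate
    dominator of each non-entry node. Returns DF sets."""
    df = {n: set() for n in preds}
    for n, ps in preds.items():
        if len(ps) < 2:
            continue                       # only join nodes contribute
        for p in ps:
            runner = p
            while runner != idom[n]:       # walk up the dominator tree
                df[runner].add(n)
                runner = idom[runner]
    return df

# Diamond: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3 (idom of 1, 2, 3 is 0)
preds = {0: [], 1: [0], 2: [0], 3: [1, 2]}
idom = {1: 0, 2: 0, 3: 0}
df = dominance_frontiers(preds, idom)
```

Both runners for join node 3 stop as soon as they reach IDom(3) = 0, so only nodes 1 and 2 pick up 3 in their frontier — the entry itself does not.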

COMPUTING DOMINANCE FRONTIER
Join node 1 (predecessors 0 and 9):
runner = 0 = IDom(1): stop
runner = 9: DF(9) += {1}
runner = 8: DF(8) += {1}
runner = 7: DF(7) += {1}
runner = 4: DF(4) += {1}
runner = 3: DF(3) += {1}
runner = 1: DF(1) += {1}
runner = 0 = IDom(1): stop

COMPUTING DOMINANCE FRONTIER
Join node 3 (predecessors 1, 2, 4, 8):
runner = 1 = IDom(3): stop
runner = 2: DF(2) += {3}
runner = 4: DF(4) += {3}
runner = 3: DF(3) += {3}
runner = 8: DF(8) += {3}
runner = 7: DF(7) += {3}

COMPUTING DOMINANCE FRONTIER
Join node 4 (predecessors 3 and 7):
runner = 3 = IDom(4): stop
runner = 7: DF(7) += {4}
runner = 4: DF(4) += {4}

Join node 7 (predecessors 5, 6, 10):
runner = 5: DF(5) += {7}
runner = 6: DF(6) += {7}
runner = 10: DF(10) += {7}
runner = 8: DF(8) += {7}
runner = 7: DF(7) += {7}

DOMINANCE FRONTIER EXAMPLE
Block  DF
0      {}
1      {1}
2      {3}
3      {1,3}
4      {1,3,4}
5      {7}
6      {7}
7      {1,3,4,7}
8      {1,3,7}
9      {1}
10     {7}

EXAMPLE
[Figure: an exercise CFG; fill in the DF set for each block]

DOMINATOR-BASED ANALYSIS
Idea: use dominators to discover loops for optimization
Advantages:
Sufficient for use by iterative data-flow analysis and optimizations
Least time-intensive to implement
Favored by most current optimizing compilers
Alternative approaches: interval-based analysis / structural analysis

REDUNDANT EXPRESSION ELIMINATION

REDUNDANT EXPRESSIONS
Definition: an expression x op y is redundant at a point p if it has already been computed and no intervening operations redefine x or y
Optimization for a redundant expression:
Preserve the result of the earlier computation
Replace subsequent evaluations with references to the saved value
Safety: need to prove that x op y is redundant
Today: redundancy elimination at different levels

REDUNDANT EXPRESSION EXAMPLE
An expression x op y is redundant at a point p if it has already been computed and no intervening operations redefine x or y

m = 2*y*z        t0 = 2*y          t0 = 2*y
n = 3*y*z        m = t0*z          m = t0*z
o = 2*y - z      t1 = 3*y          t1 = 3*y
                 n = t1*z          n = t1*z
                 t2 = 2*y  (redundant)   t2 = t0
                 o = t2 - z        o = t0 - z

Redundancy elimination + copy propagation + dead code elimination then remove t2 = t0 entirely

REDUNDANCY ELIMINATION
Tasks:
Need to prove that x op y is redundant
Need to rewrite the code
In basic blocks: using DAGs, using value numbering
Beyond basic blocks: must consider all paths between the occurrences

DAG REPRESENTATIONS
A DAG (directed acyclic graph) for a basic block labels its nodes as follows:
Leaves are labeled by atomic operands (unique identifiers/numbers)
Interior nodes are labeled by operators/ids, with edges pointing to their operands
Nodes can have multiple labels, since they represent computed values
Example: x := y op z; a := y op z — a single interior node "op" labeled x,a pointing to leaves y and z

GENERATING DAGS FROM IR
Process the statements in a basic block sequentially. For statement i: x := y op z
1. If a node for y op z exists, add x to the labels for that node; otherwise add a new node for op (the hardest part!). If y or z already exist in the DAG, point to the existing locations; otherwise add leaves for y and/or z and have the op node point to them. Label the op node with x
2. If x existed previously as a leaf, subscript that previous entry
3. If x was previously associated with another interior node, remove that previous entry
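A minimal DAG builder along the lines of rules 1-3 above. This is a simplified sketch: the node/tuple representation is an assumption, and the leaf-subscripting of rule 2 (for reassigned operands) is omitted.

```python
# Basic-block DAG construction: repeated computations share one node,
# and a reassigned destination's label moves to the new node (rule 3).

def build_dag(block):
    """block: list of (dst, op, src1, src2) tuples.
    Returns (nodes, cur): nodes is the DAG, cur maps each name to the
    node id currently holding its value."""
    nodes = []        # node id = index into this list
    cur = {}          # name -> node id currently holding its value
    cache = {}        # (op, arg ids) -> node id, for redundancy detection

    def leaf(name):
        if name not in cur:
            nodes.append({"op": None, "name": name, "labels": [name]})
            cur[name] = len(nodes) - 1
        return cur[name]

    for dst, op, a, b in block:
        key = (op, leaf(a), leaf(b))
        if key in cache:
            nid = cache[key]                 # redundant: reuse the node
        else:
            nodes.append({"op": op, "args": key[1:], "labels": []})
            nid = cache[key] = len(nodes) - 1
        if dst in cur and dst in nodes[cur[dst]]["labels"]:
            nodes[cur[dst]]["labels"].remove(dst)   # rule 3: move label
        nodes[nid]["labels"].append(dst)
        cur[dst] = nid
    return nodes, cur

block = [
    ("t0", "*", "2", "y"),
    ("m",  "*", "t0", "z"),
    ("t1", "*", "2", "y"),   # same value as t0: shares its node
]
nodes, cur = build_dag(block)
```

After the third statement, t1 is simply an extra label on the node already labeled t0 — exactly the sharing the DAG example slides illustrate.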

DAG EXAMPLE
Code: t0 := 2*y;  m := t0*z;  t1 := 3*y;  n := t1*z;  t1 := 2*y;  o := t1*z
[Figure: the DAG built step by step]
t0 := 2*y creates node (* 2 y), labeled t0
m := t0*z creates node (* t0 z), labeled m
t1 := 3*y creates node (* 3 y), labeled t1
n := t1*z creates node (* t1 z), labeled n
t1 := 2*y finds the existing node (* 2 y): the label t1 moves to that node, and its old association is removed
o := t1*z finds the existing node (* t0 z): o is added to its labels — the node is now labeled m,o

GENERATING CODE FROM DAGS
Generate code by graph traversal:
t0 = 2 * y
t1 = 3 * y
n = t1 * z
m = t0 * z
o = m
t1 = t0
[Figure: the final DAG from the previous example]

DAG GENERATION PROBLEMS
Arrays
Must equate all references to a given array, since for a[i] and a[j], i may or may not equal j
  x = a[i]; a[j] = y; z = a[i]   /* multiple references to array a */
Pointers
How can we determine where a pointer is pointing in memory?
  *p = 0;   /* increment the subscript of every variable that may be modified */

ARRAY
x = a[i]; a[j] = y; z = a[i]
[Figure: DAG with array-access nodes []= and =[] over leaves a0, i0, y0, j0, labeled x and z]

BLOCK OPTIMIZATION AND ARRAY REFERENCES
Array references behave differently:
  x = a[i]; a[j] = y; z = a[i]   vs.   x = a[i]; a[j] = y; z = x
If i ≠ j, the two fragments are equivalent
If i = j, they are not equivalent
Different logic is needed when building a DAG for an array reference

BLOCK OPTIMIZATION AND ARRAY REFERENCES
x = a[i]; a[j] = y; z = a[i]
[Figure: DAG in which the assignment a[j] = y kills the node for the first a[i], so z = a[i] must build a new access node over leaves a0, i0, j0, y0]

POINTER ASSIGNMENTS AND PROCEDURE CALLS
The =* operator must take as arguments all nodes associated with an identifier
It must kill all DAG nodes built up to that point

REDUNDANCY ELIMINATION
Tasks:
Need to prove that x op y is redundant
Need to rewrite the code
In basic blocks: using DAGs, using value numbering
Beyond basic blocks: must consider all paths between the occurrences

LOCAL VALUE NUMBERING (LVN)
Goal: group expressions that provably have the same value
Associate a unique value number with each distinct value created or used within a block
Two expressions have the same value number if and only if they are provably identical for all possible operands
The classical way to fold constants and eliminate redundant expressions

LVN ALGORITHM
Construct a value table for a basic block: a hash table that maps variables, constants, and computed values to their value numbers
Start with an empty value table
For each operation o = o1 operator o2 in the block:
1. Get value numbers for the operands via hash lookup in the value table
2. Hash <operator, VN(o1), VN(o2)> to get a value number for o
3. If o already had a value number, replace o with a reference; otherwise generate a new value number for o
4. Record o's value number in the value table
If hashing behaves, the algorithm runs in linear time
Minor issues: commutative operations; look at the VN of an operand, not its name

EXAMPLE
a = i + 1
b = i + 1
i = j
if i + 1 goto L1

For variable a: hash <+, VN(i), VN(1)> — get a new number
Name     VN
i        1
1        2
(1)+(2)  3
a        3
((#) means the entry associated with number #)
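The four steps above can be sketched in Python. The tuple-based IR and the "copy" rewrite form are assumptions for illustration; commutativity is handled by sorting operand value numbers, and, as the later "naming issues" slides discuss, the rewrite is only safe if the name holding a value is not later overwritten (SSA-like names).

```python
# Local value numbering sketch: hash <op, VN(a), VN(b)> and replace
# repeated computations with copies.

def lvn(block):
    """block: list of (dst, op, a, b). Returns the rewritten block."""
    vn = {}          # name or constant -> value number
    table = {}       # (op, vn1, vn2) -> value number
    home = {}        # value number -> a name holding that value
    next_vn = [0]

    def number_of(x):
        if x not in vn:
            vn[x] = next_vn[0]
            next_vn[0] += 1
        return vn[x]

    out = []
    for dst, op, a, b in block:
        key = (op, *sorted((number_of(a), number_of(b))))  # commutative
        if key in table:
            v = table[key]
            out.append((dst, "copy", home[v], None))       # redundant
        else:
            v = next_vn[0]; next_vn[0] += 1
            table[key] = v
            home[v] = dst
            out.append((dst, op, a, b))
        vn[dst] = v                                        # record dst's VN
    return out

code = [("a", "+", "i", "1"),
        ("b", "+", "i", "1"),    # same VN key: becomes a copy of a
        ("c", "+", "1", "i")]    # commuted operands: also a copy of a
new = lvn(code)
```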

EXAMPLE
For variable b: hash <+, VN(i), VN(1)> — get the existing number 3
Name     VN
i        1
1        2
(1)+(2)  3
a        3
b        3

EXAMPLE
After i = j, VN(i) is changed to 4
Name     VN
i        4
1        2
(1)+(2)  3
a        3
b        3
j        4

EXAMPLE
For the condition i + 1: hash <+, VN(i), VN(1)> = <+, (4), (2)> — a new number
Name     VN
i        4
1        2
(1)+(2)  3
a        3
b        3
j        4
(2)+(4)  5

EXAMPLE
a^3 = i^1 + 1^2          a = i + 1
b^3 = i^1 + 1^2          b = a
i^4 = j^4                i = j
if (i^4 + 1^2)^5 goto L1   if i + 1 goto L1
a and b are given the same value number, but not the condition expression of the if statement

NAMING ISSUES
Original code   With VNs            Rewritten
a ← x + y       a^3 ← x^1 + y^2     a^3 ← x^1 + y^2
b ← x + y       b^3 ← x^1 + y^2     b^3 ← a^3
a ← 17          a^4 ← 17^4          a^4 ← 17^4
c ← x + y       c^3 ← x^1 + y^2     c^3 ← a^3 (?? — a no longer holds VN 3)
Options:
Use c^3 ← b^3
Save a^3 in t^3
Give each value a unique name

NAMING ISSUES
Original code   With VNs            Rewritten
a ← x + y       a^3 ← x^1 + y^2     a0^3 ← x0^1 + y0^2
b ← x + y       b^3 ← x^1 + y^2     b0^3 ← a0^3
a ← 17          a^4 ← 17^4          a1^4 ← 17^4
c ← x + y       c^3 ← x^1 + y^2     c0^3 ← a0^3
Give each value a unique name — no value is ever killed
These are SSA names (static single assignment)
We will see how to construct SSA later in this course

LVN LIMITATIONS
a^3 = i^1 + 1^2          a = i + 1
b^3 = i^1 + 1^2          b = a
i^4 = j^4                i = j
if i + 1 goto L1         t = i + 1
c = i + 1                if t goto L1
                         c = t
LVN cannot eliminate the redundant expression c = i + 1. Why? The branch ends the basic block, so c = i + 1 lies in a different block from the earlier evaluation of i + 1 — out of LVN's reach.

EXTENSIONS TO LVN
Constant folding
Add a bit that records when a value is constant
Evaluate constant values at compile time
Replace with a load immediate or an immediate operand
No stronger local algorithm exists
Algebraic identities: x+0, x−0, x×1, x÷1, x−x, x×0, x÷x, x∨0, x∧0xFF…FF, max(x,MAXINT), min(x,MININT), max(x,x), min(y,y), and so on
Must check (many) special cases
Replace the result with a copy operation
Build a decision tree on the operation
Work with values, not names

FINDING/FOLDING CONSTANTS
i = 2
j = i * 2
k = i + 1
Expression  Value number  Constant value
2           1             const = 2
i           1             const = 2

FINDING/FOLDING CONSTANTS
i = 2
j^2 = i * 2
k = i + 1
Expression  Value number  Constant value
2           1             const = 2
i           1             const = 2
(1)*(1)     2             const = 4
j           2             const = 4

FINDING/FOLDING CONSTANTS
i = 2            i = 2
j^2 = i * 2      j = 4
k^4 = i + 1      k = 3
Expression  Value number  Constant value
2           1             const = 2
i           1             const = 2
(1)*(1)     2             const = 4
j           2             const = 4
1           3             const = 1
(1)+(3)     4             const = 3
k           4             const = 3

LOCAL VALUE NUMBERING
Safety
x op y has been computed: the hash table starts empty
Operands not redefined: the mapping uses VN(x) and VN(y), not x and y
With SSA, no value is ever redefined
Profitability
Assumes a copy is cheaper than an operation
Loading a constant is cheaper than an operation
Algebraic identities: do not include non-profitable ones
Opportunity
Linear scan of the basic block — an exhaustive search for optimization opportunities

REDUNDANCY ACROSS BLOCKS
[CFG: A → B, A → C; C → D, C → E; D → F, E → F; B → G, F → G]
A: m = a + b; n = a + b        (LVN catches n)
B: p = c + d; r = c + d        (LVN catches r)
C: q = a + b; r = c + d        ← missed by LVN
D: e = b + 8; s = a + b; u = e + f   ← s missed by LVN
E: e = a + 7; t = c + d; u = e + f   ← t missed by LVN
F: v = a + b; w = c + d; x = e + f   ← missed by LVN
G: y = a + b; z = c + d        ← missed by LVN
Need to consider regions larger than basic blocks in the CFG

REDUNDANCY ACROSS BLOCKS
Expressions evaluated in some predecessor/ancestor block are missed
LVN: each block starts with an empty hash table — can we change this?
Superlocal value numbering (SVN): work on EBBs

SUPERLOCAL VALUE NUMBERING
EBBs for this CFG: {A,B,C,D,E}, {F}, {G}
Consider each path through an EBB: AB, ACD, ACE
EBB: only the entry node can be a join node

SUPERLOCAL VALUE NUMBERING
Idea
Apply the local method to each path in the EBB as if the blocks on a path were a single block: AB, ACD, ACE
Results from ancestors can be propagated to descendants: A → C → D, A → C → E
Use A's hash table to initialize B's and C's
Efficiency
Avoid re-analyzing common ancestors
Use a stack-like scoped hash table: A, AB, A, AC, ACD, AC, ACE

REDUNDANCY ACROSS BLOCKS
With SVN applied to the EBB paths, the redundancies in C, D, and E are caught as well; the opportunities in blocks F and G are still missed by both LVN and SVN
Rewriting: need to map value numbers to unique names — names cross block boundaries

NAMING ISSUES
Need a VN-to-name mapping to handle kills: use the SSA name space
Add subscripts to variable names for uniqueness
Insert φ-functions at merge points to reconcile the name spaces
x ← …                x0 ← …
  x ← …      becomes   x1 ← …
x + …                x2 ← φ(x0, x1)
                     x2 + …

SSA FORM
A: m0 = a0 + b0; n0 = a0 + b0                  (LVN)
B: p0 = c0 + d0; r0 = c0 + d0                  (LVN)
C: q0 = a0 + b0; r1 = c0 + d0                  (SVN)
D: e0 = b0 + 8; s0 = a0 + b0; u0 = e0 + f0     (SVN)
E: e1 = a0 + 7; t0 = c0 + d0; u1 = e1 + f0     (SVN)
F: e2 = φ(e0, e1); u2 = φ(u0, u1); v0 = a0 + b0; w0 = c0 + d0; x0 = e2 + f0
G: r2 = φ(r0, r1); y0 = a0 + b0; z0 = c0 + d0
SVN does not help block F or G — how do we process join nodes?

LARGER REGIONS
Problem with join nodes: multiple predecessors
For block F, combine the VN tables of D and E? Merging states is expensive
Fall back on what is known: both paths to F share the common prefix {A, C}
Dominator-based value numbering
Use the VN table of IDOM(J) as the initial state for processing any join node J
Start with the VN table produced by processing C for block F
Will block D or E interfere? SSA ensures that D and E can add information to the value table, but they cannot invalidate it

DOMINATOR-BASED VALUE NUMBERING
For join node F:
DOM(F) = {A, C}, IDOM(F) = C
Perform value numbering for F with the table obtained after processing C as the initial state
Join node G? IDOM(G) = A

DOMINATOR-BASED VALUE NUMBERING
DVN features:
Discovers more redundancy (+)
Little additional cost (+)
Still misses some opportunities (−)
No values flow along back edges (−)
Some opportunities are missed by LVN, SVN, and DVN alike

GLOBAL REDUNDANCY ELIMINATION
A global algorithm processes an entire cycle of blocks and can potentially find more redundant operations
Neither SVN nor DVN can propagate information backward along back edges
May need to process a block more than once
Cannot rewrite the code before the analysis phase has finished
Classic method: use data-flow analysis to compute the set of expressions available on entry to each block — AVAIL analysis

AVAIL ANALYSIS
An expression e is defined at point p if its value is computed at p
  p is called a definition site for e
An expression e is killed at point p if one or more of its operands is defined at p
  p is called a kill site for e
An expression e is available at a point p if every path leading to p contains a definition of e, and e is not killed between that definition and p
An expression x op y is redundant at a point p if it has already been computed and no intervening operations redefine x or y

REDUNDANT EXPRESSIONS
Candidates: a + b, a * c, d * d, c * 2, i + 1
B1: c = a + b; d = a * c; i = 1       ← definition site for a + b
B2: f[i] = a + b; c = c * 2; if c > d   ← a + b is available here: redundant!
B3: g = a * c       B4: g = d * d
B5: i = i + 1; if i > 0

REDUNDANT EXPRESSIONS
The assignment c = c * 2 in B2 is a kill site for a * c: it redefines the operand c
So at g = a * c in B3, a * c is not available — not redundant

FINDING GLOBAL REDUNDANCY
1. Build the CFG
2. For each basic block b, compute local information:
   DEExpr(b) — downward-exposed expressions: e ∈ DEExpr(b) if b evaluates e and none of e's operands is redefined after that evaluation
   ExprKill(b) — expressions killed by definitions in the block
3. Using the local information, compute AVAIL_IN(b) and AVAIL_OUT(b) over the entire CFG
   AVAIL_IN(b) is the set of expressions available on entry to block b

COMPUTING LOCAL INFORMATION
Assume a block B with operations o1, o2, …, ok
VARKILL[B] = {}; DEExpr[B] = {}
for i = k down to 1          // backward through the block
  assume oi is x = y op z
  add x to VARKILL[B]
  if (y ∉ VARKILL[B] && z ∉ VARKILL[B])
    add y op z to DEExpr[B]
// O(k) steps
EXPRKILL[B] = {}
for each expression e in the procedure
  for each variable v ∈ operands(e)
    if (v ∈ VARKILL[B])
      EXPRKILL[B] = EXPRKILL[B] ∪ {e}
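The backward pass for the local sets can be sketched as follows, using simple (dst, op1, op2) triples as an assumed IR and (op1, op2) pairs standing in for the expression "op1 op op2":

```python
# Local AVAIL sets: one backward pass gives VARKILL and the
# downward-exposed expressions; EXPRKILL is derived from VARKILL.

def local_sets(block, all_exprs):
    """block: list of (dst, y, z) triples; all_exprs: all (y, z)
    expression pairs in the procedure.
    Returns (de_expr, var_kill, expr_kill)."""
    var_kill, de_expr = set(), set()
    for dst, y, z in reversed(block):      # backward through the block
        var_kill.add(dst)
        if y not in var_kill and z not in var_kill:
            de_expr.add((y, z))            # no later redefinition: exposed
    expr_kill = {e for e in all_exprs
                 if e[0] in var_kill or e[1] in var_kill}
    return de_expr, var_kill, expr_kill

# Block D from the running example: e = b + 8; s = a + b; u = e + f
block = [("e", "b", "8"), ("s", "a", "b"), ("u", "e", "f")]
exprs = {("a", "b"), ("c", "d"), ("b", "8"), ("e", "f"), ("a", "7")}
de, vk, ek = local_sets(block, exprs)
```

This matches the example slides: all three expressions in block D are downward exposed (e + f is evaluated after e's definition), while e + f is the only candidate expression the block kills.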

EXAMPLE
Set of expressions to be considered: {a+b, c+d, b+8, e+f, a+7}
Block  DEExpr           VARKILL
A      a+b              n, m
B      c+d              p, r
C      a+b, c+d         q, r
D      b+8, a+b, e+f    e, s, u
E      a+7, c+d, e+f    e, t, u
F      a+b, c+d, e+f    v, w, x
G      a+b, c+d         y, z

EXAMPLE
If we change the order of statements in D to: u = e + f; e = b + 8; s = a + b
then DEExpr(D) = {b+8, a+b}: e + f is no longer downward exposed, because e is redefined after it
Block  DEExpr           VARKILL
D      b+8, a+b         e, s, u

COMPUTING LOCAL INFORMATION

The same backward pass (with oi of the form x = y + z) computes VARKILL[B] and DEExpr[B]; the second loop over every expression in the procedure computes

EXPRKILL[B] = { e | some v ∈ operands(e) is in VARKILL[B] }

in O(N) steps, where N is the number of operations.

EXAMPLE

Block   DEExpr             EXPRKILL
A       a+b                {}
B       c+d                {}
C       a+b, c+d           {}
D       b+8, a+b, e+f      e+f
E       a+7, c+d, e+f      e+f
F       a+b, c+d, e+f      {}
G       a+b, c+d           {}
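The two passes above can be turned into a short sketch. This is illustrative code, not the lecture's: operations are encoded as (x, y, z) triples meaning x = y op z (the operator is omitted because it does not affect the analysis), expressions are (y, z) pairs, and the data are block D of the example.

```python
def local_info(block, all_exprs):
    """Compute VARKILL, DEExpr and EXPRKILL for one basic block.

    block: list of (x, y, z) triples meaning x = y op z (operator omitted).
    all_exprs: every expression (y, z) that occurs in the procedure.
    """
    varkill, deexpr = set(), set()
    for x, y, z in reversed(block):            # i = k down to 1
        varkill.add(x)                         # x is (re)defined in the block
        if y not in varkill and z not in varkill:
            deexpr.add((y, z))                 # y op z is downward exposed
    # an expression is killed if the block redefines any of its operands
    exprkill = {e for e in all_exprs if e[0] in varkill or e[1] in varkill}
    return varkill, deexpr, exprkill

# Block D: e = b + 8; s = a + b; u = e + f
D = [("e", "b", "8"), ("s", "a", "b"), ("u", "e", "f")]
exprs = {("a", "b"), ("c", "d"), ("b", "8"), ("e", "f"), ("a", "7")}
varkill, deexpr, exprkill = local_info(D, exprs)
```

This reproduces DEExpr(D) = {b+8, a+b, e+f} and EXPRKILL(D) = {e+f} from the tables; evaluating the reordered version of D (u = e + f first) drops e + f from DEExpr.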

FINDING GLOBAL REDUNDANCY

1. Build the CFG
2. For each basic block b, compute the local information DEExpr(b) and ExprKill(b)
3. Using the local information, compute AVAIL_IN(b) and AVAIL_OUT(b) over the entire CFG

COMPUTING AVAILABLE EXPRESSIONS

For each block b:
- ExprKill(b): set of expressions killed in b
- DEExpr(b): set of downward-exposed expressions
- AVAIL_IN(b): set of expressions available on entry to b

AVAIL_IN(b) = ∩_{x ∈ pred(b)} AVAIL_OUT(x)
AVAIL_OUT(b) = DEExpr(b) ∪ (AVAIL_IN(b) − ExprKill(b))
AVAIL_IN(b0) = Ø, where b0 is the entry node of the CFG

This system of simultaneous equations forms a data-flow problem; solve it with a data-flow algorithm.

ITERATIVE ALGORITHM FOR AVAIL

AVAIL_IN(b0) = Ø
for i = 0 to k
    AVAIL_OUT(bi) = DEExpr(bi)
changed = true
while (changed)
    changed = false
    for i = 0 to k
        OldValue = AVAIL_IN(bi)
        AVAIL_IN(bi) = ∩_{x ∈ pred(bi)} AVAIL_OUT(x)                  // available on ALL incoming edges
        AVAIL_OUT(bi) = DEExpr(bi) ∪ (AVAIL_IN(bi) − ExprKill(bi))    // computed locally, or incoming and not killed
        if AVAIL_IN(bi) ≠ OldValue then changed = true

EXAMPLE

AVAIL_IN[A] = Ø                AVAIL_OUT[A] = {a+b}
AVAIL_IN[B] = {a+b}            AVAIL_OUT[B] = {a+b, c+d}
AVAIL_IN[C] = {a+b}            AVAIL_OUT[C] = {a+b, c+d}
AVAIL_IN[D] = {a+b, c+d}       AVAIL_OUT[D] = {a+b, c+d, b+8, e+f}
AVAIL_IN[E] = {a+b, c+d}       AVAIL_OUT[E] = {a+b, c+d, a+7, e+f}
AVAIL_IN[F] = AVAIL_OUT[D] ∩ AVAIL_OUT[E] = {a+b, c+d, e+f}
AVAIL_OUT[F] = {a+b, c+d, e+f}
AVAIL_IN[G] = AVAIL_OUT[B] ∩ AVAIL_OUT[F] = {a+b, c+d}
AVAIL_OUT[G] = {a+b, c+d}
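The iterative algorithm can be sketched as a round-robin solver. This is a sketch under assumptions (not the lecture's code): expressions are plain strings, the A-G CFG is encoded as a predecessor map, and AVAIL_OUT is initialized to DEExpr as the slide prescribes.

```python
def solve_avail(blocks, preds, deexpr, exprkill, entry):
    """Iterate the AVAIL equations to a fixed point.

    blocks: block names; preds: block name -> list of predecessor names.
    """
    avail_in = {b: set() for b in blocks}
    avail_out = {b: set(deexpr[b]) for b in blocks}   # initial guess
    changed = True
    while changed:
        changed = False
        for b in blocks:
            ins = [avail_out[p] for p in preds[b]]
            new_in = set() if b == entry or not ins else set.intersection(*ins)
            new_out = deexpr[b] | (new_in - exprkill[b])
            if new_in != avail_in[b] or new_out != avail_out[b]:
                avail_in[b], avail_out[b] = new_in, new_out
                changed = True
    return avail_in, avail_out

# CFG of the running example: A->B, A->C, C->D, C->E, D->F, E->F, B->G, F->G
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["C"], "E": ["C"],
         "F": ["D", "E"], "G": ["B", "F"]}
deexpr = {"A": {"a+b"}, "B": {"c+d"}, "C": {"a+b", "c+d"},
          "D": {"b+8", "a+b", "e+f"}, "E": {"a+7", "c+d", "e+f"},
          "F": {"a+b", "c+d", "e+f"}, "G": {"a+b", "c+d"}}
exprkill = {"A": set(), "B": set(), "C": set(), "D": {"e+f"},
            "E": {"e+f"}, "F": set(), "G": set()}
avail_in, avail_out = solve_avail(list(preds), preds, deexpr, exprkill, "A")
```

Iterating to the fixed point reproduces the AVAIL_IN/AVAIL_OUT sets listed above.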

GLOBAL REDUNDANCY ELIMINATION

Redundancy elimination based on AVAIL. After computing AVAIL_IN[B]:
- For each expression e ∈ AVAIL_IN[B] for some block B, assign a unique name name(e) to e
- At a definition site of e where e is not available -- a new evaluation of e -- add a copy assignment: name(e) := e
- At a definition site of e where e is available, replace the computation of e by name(e)

EXAMPLE

For the running example, assign a+b → g1, c+d → g2, e+f → g3 (AVAIL_IN and AVAIL_OUT sets as computed above).
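The rewriting rules can be sketched for a single block. A sketch with assumed encodings (not the lecture's code): statements are (dest, y, z) triples, names maps each expression to its unique temporary (g1, g2 stand in for the slide's names), and expressions killed inside the block are pruned as the walk proceeds.

```python
def rewrite_block(block, avail_in, names):
    """Rewrite one block's statements using AVAIL_IN and name(e) temporaries.

    Returns a statement list mixing (dest, src_name) copies with
    (dest, (y, z)) evaluations, each new evaluation followed by the
    copy name(e) := dest that records the value for later reuse.
    """
    avail = set(avail_in)
    out = []
    for dest, y, z in block:
        e = (y, z)
        if e in avail:
            out.append((dest, names[e]))       # e is available: reuse name(e)
        else:
            out.append((dest, e))              # keep the evaluation
            out.append((names[e], dest))       # copy: name(e) := dest
            avail.add(e)
        avail = {x for x in avail if dest not in x}   # dest kills exprs using it
    return out

# Block C from the example with AVAIL_IN(C) = {a+b}
C = [("q", "a", "b"), ("r", "c", "d")]
names = {("a", "b"): "g1", ("c", "d"): "g2"}
result = rewrite_block(C, {("a", "b")}, names)
# -> [('q', 'g1'), ('r', ('c', 'd')), ('g2', 'r')]
```

That matches the rewritten block C on the next slide: q = g1; r = c + d; g2 = r.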

EXAMPLE

After rewriting, with a+b → g1, c+d → g2, e+f → g3:

A: m = a + b        B: p = c + d        C: q = g1
   g1 = m              g2 = p              r = c + d
   n = g1              r = g2              g2 = r

D: e = b + 8        E: e = a + 7        F: v = g1
   s = g1              t = g2              w = g2
   u = e + f           u = e + f           x = g3
   g3 = u              g3 = u

G: y = g1
   z = g2

COMPARING THE ALGORITHMS

LVN/SVN/DVN: local / superlocal / dominator-based value numbering; GRE: global redundancy elimination.

Which technique catches each redundant evaluation:
A: n = a + b -- LVN              B: r = c + d -- LVN
C: q = a + b -- GRE, SVN
D: s = a + b -- GRE, SVN         E: t = c + d -- GRE, SVN
F: v = a + b, w = c + d -- GRE, DVN;  x = e + f -- GRE only
G: y = a + b -- GRE, DVN;  z = c + d -- GRE only

N.B.: SVN subsumes LVN; DVN subsumes SVN; GRE and the value-numbering techniques are not directly comparable.

ANOTHER GRE EXAMPLE -- PASS 1

Universe U = {a+b, a*c, i+1, c*2, d*d}

B1: c = a + b
    d = a * c
    i = 1
B2: f[i] = a + b
    c = c * 2
    if c > d ...
B3: g = a * c
B4: g = d * d
B5: i = i + 1
    if ... (loops back to B2)

Edges: B1→B2, B2→B3, B2→B4, B3→B5, B4→B5, B5→B2

Blk   VarKill   DEExpr      ExprKill
1     i, d, c   a*c, a+b    a*c, d*d, c*2, i+1
2     c, f      a+b         a*c, c*2
3     g         a*c         {}
4     g         d*d         {}
5     i         {}          i+1

AVAIL_IN(b) = ∩_{x ∈ pred(b)} AVAIL_OUT(x)
AVAIL_OUT(b) = DEExpr(b) ∪ (AVAIL_IN(b) − ExprKill(b))
AVAIL_IN(b0) = Ø

Pass 1:
AVAIL_IN(1) = Ø,      AVAIL_OUT(1) = {a+b, a*c}
AVAIL_IN(2) = Ø,      AVAIL_OUT(2) = {a+b}      (AVAIL_OUT(5) is still Ø)
AVAIL_IN(3) = {a+b},  AVAIL_OUT(3) = {a+b, a*c}
AVAIL_IN(4) = {a+b},  AVAIL_OUT(4) = {a+b, d*d}
AVAIL_IN(5) = {a+b},  AVAIL_OUT(5) = {a+b}

ANOTHER GRE EXAMPLE -- PASS 2

Only AVAIL_IN(2) changes:
AVAIL_IN(2) = AVAIL_OUT(1) ∩ AVAIL_OUT(5) = {a+b, a*c} ∩ {a+b} = {a+b}

ANOTHER GRE EXAMPLE -- PASS 3

No further changes: we have reached a fixed point of the solution.
Final solution: AVAIL_IN(2) = AVAIL_IN(3) = AVAIL_IN(4) = AVAIL_IN(5) = {a+b}, so the evaluation of a + b in B2 is redundant, while a * c in B3 is not.

GRE BASED ON AVAIL

Safety
- Available expressions prove that the replacement value is current
- The transformation must ensure the right name-value mapping

Profitability
- Adds no new evaluations
- Adds some copy operations
  - Copies are inexpensive
  - Copies can shrink or stretch live ranges

PARTIAL REDUNDANCY

B1: z = a; if x > 3 ...
B2: z = x * y; if y < 5 ...     B3: if z < 7 ...
B4: (empty)                     B5: (empty)
B6: b = x * y                   B7: (empty)
B8: (empty)
B9: c = x * y
Exit

Edges: B1→B2, B1→B3, B2→B4, B2→B6, B3→B5, B3→B7, B4→B8, B5→B8, B8→B9, B6→Exit, B7→Exit, B9→Exit

Can we make this better? x * y in B6 is fully redundant (every path to B6 passes through the evaluation in B2), but x * y in B9 is only partially redundant: it is redundant along the path through B2 but not along the path through B3.

PARTIAL REDUNDANCY ELIMINATION

An expression is partially redundant if it is available on some, but not all, paths reaching it.
- Use standard data-flow techniques to figure out where to move the code
- Subsumes classical global common subexpression elimination and code motion of loop invariants
- Used by many optimizing compilers
- Traditionally applied to lexically equivalent expressions; with SSA support, applied to values as well

PARTIAL REDUNDANCY ELIMINATION

PRE may add a block to deal with critical edges. A critical edge is an edge leading from a block with more than one successor to a block with more than one predecessor.

Before: one predecessor of the join block computes a := d + e; the join block computes c := d + e. After: the computing block becomes t := d + e; a := t; the critical edge from the other predecessor is split with a new block containing t := d + e; the join block's computation becomes c := t.

Code duplication can also deal with such redundancy: duplicate the join block B4 (containing c := d + e) along each incoming path. The copy reached after a := d + e (with t := a inserted) uses c := t, while the other copy keeps c := d + e.

Can we find a way to deal with redundancy in general?

LAZY CODE MOTION

Redundancy handled: common subexpressions, loop-invariant expressions, and partially redundant expressions.

Desirable properties:
- All redundant computations of expressions that can be eliminated without code duplication are eliminated
- The optimized program performs no computation that is not in the original program's execution
- Expressions are computed at the latest possible time

LAZY CODE MOTION

Solve four data-flow problems that reveal the limits of code motion:
- AVAIL: available expressions
- ANT: anticipated expressions
- EARLIEST: earliest placement for expressions
- LATER: expressions whose evaluation can be postponed

Compute INSERT and DELETE sets for each basic block based on the data-flow solutions; they define how to move expressions between basic blocks.

LAZY CODE MOTION: EXAMPLE

The same CFG as before (B1: z = a; x > 3 ... B2: z = x * y ... B3: z < 7 ... B6: b = x * y ... B9: c = x * y ... Exit). Can we make this better?

The annotated CFG marks one evaluation point for x * y on each path out of B1: at the evaluation already in B2, and on the path through B3 before the join at B8. Placing computations at these points ensures the conditions above.

LOCAL INFORMATION

For each block b, compute the local sets:
- DEExpr(b): an expression is downward-exposed (locally generated) if it is computed in b and its operands are not modified after its last computation
- UEExpr(b): an expression is upward-exposed if it is computed in b and its operands are not modified before its first computation
- NotKilled(b): an expression is not killed if none of its operands is modified in b

Example block:
    f = b + d
    a = b + c
    d = a + e

DEExpr = {a + e, b + c}
UEExpr = {b + d, b + c}
NotKilled = {b + c}

What do they imply?
- e ∈ DEExpr(b): evaluating e at the end of b produces the same result as evaluating it at its original position in b
- e ∈ UEExpr(b): evaluating e at the entry of b produces the same result as evaluating it at its original position in b
- e ∈ NotKilled(b): evaluating e at either the start or the end of b produces the same result as evaluating it at its original position
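All three local sets can be computed in two linear passes. An illustrative sketch (not the lecture's code), encoding each operation x = y op z as a (x, y, z) triple and applying it to the block above; NotKilled is taken relative to the expressions occurring in the block.

```python
def local_sets(block):
    """Compute DEExpr, UEExpr and NotKilled for one block.

    block: list of (x, y, z) triples meaning x = y op z (operator omitted).
    """
    defined, ueexpr = set(), set()
    for x, y, z in block:                      # forward pass: upward exposed
        if y not in defined and z not in defined:
            ueexpr.add((y, z))                 # operands unmodified before this point
        defined.add(x)
    varkill, deexpr = set(), set()
    for x, y, z in reversed(block):            # backward pass: downward exposed
        varkill.add(x)
        if y not in varkill and z not in varkill:
            deexpr.add((y, z))
    exprs = {(y, z) for _, y, z in block}
    notkilled = {e for e in exprs
                 if e[0] not in varkill and e[1] not in varkill}
    return deexpr, ueexpr, notkilled

# f = b + d; a = b + c; d = a + e
block = [("f", "b", "d"), ("a", "b", "c"), ("d", "a", "e")]
de, ue, nk = local_sets(block)
```

This reproduces the sets shown above: DEExpr = {a+e, b+c}, UEExpr = {b+d, b+c}, NotKilled = {b+c}.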

EXAMPLE

Block   NotKilled   DEExpr   UEExpr
B1      {x*y}       {}       {}
B2      {x*y}       {x*y}    {x*y}
B3      {x*y}       {}       {}
B4      {x*y}       {}       {}
B5      {x*y}       {}       {}
B6      {x*y}       {x*y}    {x*y}
B7      {x*y}       {}       {}
B8      {x*y}       {}       {}
B9      {x*y}       {x*y}    {x*y}
Exit    {x*y}       {}       {}

GLOBAL INFORMATION

Availability
AvailIn(n0) = Ø
AvailIn(b) = ∩_{x ∈ pred(b)} AvailOut(x),  b ≠ n0
AvailOut(b) = DEExpr(b) ∪ (AvailIn(b) ∩ NotKilled(b))

Initialize AvailIn and AvailOut to the set of all expressions for every block except the entry block n0.

Interpreting Avail sets:
- e ∈ AvailOut(b): evaluating e at the end of b produces the same value for e as its most recent evaluation, whether that evaluation is inside b or not
- AvailOut tells the compiler how far forward e can move

EXAMPLE: AVAIL

AvailIn(b) = ∩_{x ∈ pred(b)} AvailOut(x)
AvailOut(b) = DEExpr(b) ∪ (AvailIn(b) ∩ NotKilled(b))

Block   NotKilled   DEExpr   AvailOut
B1      {x*y}       {}       {}
B2      {x*y}       {x*y}    {x*y}
B3      {x*y}       {}       {}
B4      {x*y}       {}       {x*y}
B5      {x*y}       {}       {}
B6      {x*y}       {x*y}    {x*y}
B7      {x*y}       {}       {}
B8      {x*y}       {}       {}
B9      {x*y}       {x*y}    {x*y}
Exit    {x*y}       {}       {}

GLOBAL INFORMATION

Anticipability (a backward problem!)
Expression e is anticipated at a point p if e is certain to be evaluated along every computation path leaving p before any re-computation of e's operands.

AntOut(nf) = Ø
AntOut(b) = ∩_{x ∈ succ(b)} AntIn(x),  b ≠ nf
AntIn(b) = UEExpr(b) ∪ (AntOut(b) ∩ NotKilled(b))

Initialize AntOut to the set of all expressions for every block except the exit block nf.

Interpreting Ant sets:
- e ∈ AntIn(b): evaluating e at the start of b produces the same value for e as evaluating it at its original position (later than the start of b), with no additional overhead
- AntIn tells the compiler how far backward e can move
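The ANT system is the same style of fixed-point computation run backward over successors. A sketch under assumptions (not the lecture's code): the nine-block CFG's edges are reconstructed from the example figures, "x*y" is the only tracked expression, and AntOut starts at the full universe everywhere except the exit, as the slide prescribes.

```python
def solve_ant(blocks, succs, ueexpr, notkilled, universe, exit_block):
    """Iterate the ANT equations (a backward data-flow problem)."""
    ant_in = {b: set(universe) for b in blocks}
    ant_out = {b: set(universe) for b in blocks}
    ant_out[exit_block] = set()
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == exit_block:
                new_out = set()
            else:                               # meet over successors
                new_out = set.intersection(*[ant_in[s] for s in succs[b]])
            new_in = ueexpr[b] | (new_out & notkilled[b])
            if new_in != ant_in[b] or new_out != ant_out[b]:
                ant_in[b], ant_out[b] = new_in, new_out
                changed = True
    return ant_in, ant_out

U = {"x*y"}
succs = {"B1": ["B2", "B3"], "B2": ["B4", "B6"], "B3": ["B5", "B7"],
         "B4": ["B8"], "B5": ["B8"], "B6": ["Exit"], "B7": ["Exit"],
         "B8": ["B9"], "B9": ["Exit"], "Exit": []}
ueexpr = {b: (set(U) if b in ("B2", "B6", "B9") else set()) for b in succs}
notkilled = {b: set(U) for b in succs}
ant_in, ant_out = solve_ant(list(succs), succs, ueexpr, notkilled, U, "Exit")
```

The solver reproduces the AntIn column of the next example: x * y is anticipated at B2, B4, B5, B6, B8 and B9, but not at B1, B3 or B7.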

EXAMPLE: ANT

AntOut(b) = ∩_{x ∈ succ(b)} AntIn(x)
AntIn(b) = UEExpr(b) ∪ (AntOut(b) ∩ NotKilled(b))

Block   NotKilled   UEExpr   AntIn
B1      {x*y}       {}       {}
B2      {x*y}       {x*y}    {x*y}
B3      {x*y}       {}       {}
B4      {x*y}       {}       {x*y}
B5      {x*y}       {}       {x*y}
B6      {x*y}       {x*y}    {x*y}
B7      {x*y}       {}       {}
B8      {x*y}       {}       {x*y}
B9      {x*y}       {x*y}    {x*y}
Exit    {x*y}       {}       {}

EXAMPLE: AVAIL AND ANT

The interesting spots are the points where x * y is anticipated but not available -- these are the candidates for moving or inserting an evaluation.

PLACING EXPRESSIONS

Earliest placement: for an edge <i,j> in the CFG, an expression e is in EARLIEST(i,j) if and only if the computation can legally move to <i,j> and cannot move to any earlier edge.

EARLIEST(i,j) = AntIn(j) ∩ ¬AvailOut(i) ∩ (Killed(i) ∪ ¬AntOut(i))

where Killed(i) is the complement of NotKilled(i) and ¬S denotes the complement of S.

- e ∈ AntIn(j): we can move e to the start of block j without introducing an unnecessary computation
- e ∉ AvailOut(i): no previous computation of e is available at the exit of i; if one were, it would make the computation on <i,j> redundant
- e ∈ Killed(i) ∪ ¬AntOut(i): we cannot move e further upward:
  - e ∈ Killed(i): e cannot be moved to an edge <x,i> with the same value
  - e ∉ AntOut(i): there is another path starting with some edge <i,x> along which e is not evaluated with the same value
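Once the AVAIL and ANT solutions are known, EARLIEST is a pure set computation per edge. A minimal sketch with assumed encodings: per-block sets for a few blocks of the example are supplied by hand, and complements are taken relative to the universe of tracked expressions.

```python
def earliest(i, j, ant_in, avail_out, notkilled, ant_out, universe, entry):
    """EARLIEST(i,j) = AntIn(j) minus AvailOut(i), further restricted to
    Killed(i) union (universe minus AntOut(i)) when i is not the entry."""
    e = ant_in[j] - avail_out[i]              # AntIn(j) ∩ ¬AvailOut(i)
    if i == entry:
        return e                              # nothing moves above the entry node
    killed = universe - notkilled[i]          # Killed(i) = ¬NotKilled(i)
    return e & (killed | (universe - ant_out[i]))

U = {"x*y"}
ant_in = {"B2": set(U), "B4": set(U), "B5": set(U), "B8": set(U)}
ant_out = {"B1": set(), "B2": set(U), "B3": set(), "B5": set(U)}
avail_out = {"B1": set(), "B2": set(U), "B3": set(), "B5": set()}
notkilled = {b: set(U) for b in ("B1", "B2", "B3", "B5")}

e12 = earliest("B1", "B2", ant_in, avail_out, notkilled, ant_out, U, "B1")
e35 = earliest("B3", "B5", ant_in, avail_out, notkilled, ant_out, U, "B1")
e58 = earliest("B5", "B8", ant_in, avail_out, notkilled, ant_out, U, "B1")
e24 = earliest("B2", "B4", ant_in, avail_out, notkilled, ant_out, U, "B1")
```

These four edges reproduce the example's results: x * y is earliest on (1,2) and (3,5) but not on (5,8) or (2,4).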

EARLIEST(i,j) = AntIn(j) ∩ ¬AvailOut(i) ∩ (Killed(i) ∪ ¬AntOut(i))

Block   NotKilled   AvailOut   AntIn    AntOut
B1      {x*y}       {}         {}       {}
B2      {x*y}       {x*y}      {x*y}    {x*y}
B3      {x*y}       {}         {}       {}
B4      {x*y}       {x*y}      {x*y}    {x*y}
B5      {x*y}       {}         {x*y}    {x*y}
B6      {x*y}       {x*y}      {x*y}    {}
B7      {x*y}       {}         {}       {}
B8      {x*y}       {}         {x*y}    {x*y}
B9      {x*y}       {x*y}      {x*y}    {}
Exit    {x*y}       {}         {}       {}

Edge        EARLIEST
(1,2)       {x*y} ∩ {x*y} ∩ ({} ∪ {x*y}) = {x*y}
(1,3)       {} ∩ {x*y} ∩ ({} ∪ {x*y}) = {}
(2,4)       {x*y} ∩ {} ∩ ({} ∪ {}) = {}
(2,6)       {x*y} ∩ {} ∩ ({} ∪ {}) = {}
(3,5)       {x*y} ∩ {x*y} ∩ ({} ∪ {x*y}) = {x*y}
(3,7)       {} ∩ {x*y} ∩ ({} ∪ {x*y}) = {}
(4,8)       {x*y} ∩ {} ∩ ({} ∪ {}) = {}
(5,8)       {x*y} ∩ {x*y} ∩ ({} ∪ {}) = {}
(6,exit)    {} ∩ {} ∩ ({} ∪ {x*y}) = {}
(7,exit)    {} ∩ {x*y} ∩ ({} ∪ {x*y}) = {}
(8,9)       {x*y} ∩ {x*y} ∩ ({} ∪ {}) = {}
(9,exit)    {} ∩ {} ∩ ({} ∪ {x*y}) = {}

x * y is in EARLIEST only on edges (1,2) and (3,5).

POSTPONING EVALUATIONS

We want to delay the evaluation of expressions as long as possible.
Motivation: save register usage.
There is a limit to this delay:
- not past a use of the expression
- not so far that we end up re-computing an expression that is already evaluated

PLACING EXPRESSIONS

Later (than earliest) placement:

An expression e is in LaterIn(k) if the evaluation of e can be moved through the entry of k without losing any benefit: e ∈ LaterIn(k) if and only if every path that reaches k includes an edge <p,q> such that e ∈ EARLIEST(p,q), and the path from q to k neither kills e nor uses e.

LaterIn(j) = ∩_{i ∈ pred(j)} LATER(i,j),  j ≠ n0
LaterIn(n0) = Ø
LATER(i,j) = EARLIEST(i,j) ∪ (LaterIn(i) ∩ ¬UEExpr(i)),  i ∈ pred(j)

An expression e is in LATER(i,j) if the evaluation of e can be postponed to CFG edge <i,j>: e ∈ LATER(i,j) if <i,j> is its earliest placement, or if it can be moved to the entry of i (e ∈ LaterIn(i)) and there is no evaluation (use) of e in block i.

EXAMPLE: LATER

LaterIn(j) = ∩_{i ∈ pred(j)} LATER(i,j),  j ≠ n0
LATER(i,j) = EARLIEST(i,j) ∪ (LaterIn(i) ∩ ¬UEExpr(i)),  i ∈ pred(j)

EARLIEST is {x*y} on edges (1,2) and (3,5) and empty elsewhere.
LATER is {x*y} on edges (1,2), (3,5) and (5,8) and empty elsewhere.

Block   UEExpr   LaterIn
B1      {}       {}
B2      {x*y}    {x*y}
B3      {}       {}
B4      {}       {}
B5      {}       {x*y}
B6      {x*y}    {}
B7      {}       {}
B8      {}       {}
B9      {x*y}    {}
Exit    {}       {}

REWRITING CODE

Insert set for each CFG edge -- the computations that LCM should insert on that edge:
Insert(i,j) = LATER(i,j) ∩ ¬LaterIn(j)
e ∈ Insert(i,j) means an evaluation of e should be added between block i and block j; there are three possible places to add it.

Delete set for each block -- the computations that LCM should delete from that block:
Delete(i) = UEExpr(i) ∩ ¬LaterIn(i),  i ≠ n0
The first computation of e in i is redundant.
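The LATER system plus the INSERT/DELETE sets can be sketched end to end on the example. A sketch under assumptions (not the lecture's code): EARLIEST is taken as {x*y} on edges (B1,B2) and (B3,B5), the CFG edges are reconstructed from the figures, and the acyclic example lets a simple round-robin iteration reach the fixed point.

```python
def lcm_place(blocks, preds, earliest, ueexpr, entry):
    """Solve LaterIn/LATER, then compute INSERT (per edge) and DELETE (per block)."""
    edges = [(i, j) for j in blocks for i in preds[j]]
    later_in = {b: set() for b in blocks}          # LaterIn(n0) = empty
    later = {e: set(earliest[e]) for e in edges}
    changed = True
    while changed:
        changed = False
        for j in blocks:
            if j == entry or not preds[j]:
                continue
            new_in = set.intersection(*[later[(i, j)] for i in preds[j]])
            for i in preds[j]:                     # LATER over incoming edges
                new_later = earliest[(i, j)] | (later_in[i] - ueexpr[i])
                if new_later != later[(i, j)]:
                    later[(i, j)] = new_later
                    changed = True
            if new_in != later_in[j]:
                later_in[j] = new_in
                changed = True
    insert = {e: later[e] - later_in[e[1]] for e in edges}
    delete = {b: ueexpr[b] - later_in[b] for b in blocks if b != entry}
    return insert, delete

U = {"x*y"}
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2"], "B6": ["B2"],
         "B5": ["B3"], "B7": ["B3"], "B8": ["B4", "B5"], "B9": ["B8"],
         "Exit": ["B6", "B7", "B9"]}
edges = [(i, j) for j in preds for i in preds[j]]
earliest = {e: (set(U) if e in [("B1", "B2"), ("B3", "B5")] else set())
            for e in edges}
ueexpr = {b: (set(U) if b in ("B2", "B6", "B9") else set()) for b in preds}
insert, delete = lcm_place(list(preds), preds, earliest, ueexpr, "B1")
```

The result matches the tables: the evaluations of x * y in B6 and B9 are deleted, the one in B2 stays, and a single evaluation is inserted on the edge from B5 to B8.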

EXAMPLE: INSERT & DELETE

Insert(i,j) = LATER(i,j) ∩ ¬LaterIn(j)
Delete(i) = UEExpr(i) ∩ ¬LaterIn(i),  i ≠ n0

Block   UEExpr   LaterIn   Delete
B1      {}       {}        {}
B2      {x*y}    {x*y}     {}
B3      {}       {}        {}
B4      {}       {}        {}
B5      {}       {x*y}     {}
B6      {x*y}    {}        {x*y}
B7      {}       {}        {}
B8      {}       {}        {}
B9      {x*y}    {}        {x*y}
Exit    {}       {}        {}

Insert is {x*y} only on edge (5,8): b = x * y in B6 and c = x * y in B9 are deleted, and a single evaluation of x * y is inserted on the edge from B5 to B8.

REWRITING CODE

Insert set for each CFG edge -- the computations that LCM should insert on that edge:
Insert(i,j) = LATER(i,j) ∩ ¬LaterIn(j)
- If i has only one successor, insert the computations at the end of i
- If j has only one predecessor, insert the computations at the entry of j
- Otherwise, split the edge and insert the computations in a new block between i and j

Delete set for each block -- the computations that LCM should delete from that block:
Delete(i) = UEExpr(i) ∩ ¬LaterIn(i),  i ≠ n0
The first computation of e in i is redundant: remove it.

INSERTING CODE

Evaluation placement for x ∈ INSERT(i,j) -- three cases:
- |succs(i)| = 1: insert at the end of i
- |succs(i)| > 1, but |preds(j)| = 1: insert at the start of j
- |succs(i)| > 1 and |preds(j)| > 1: create a new block on edge <i,j> for x

On the example this yields the INSERT and DELETE sets above: Delete(B6) = Delete(B9) = {x*y}, with the single insertion on edge (5,8).

Control Flow Analysis

Control Flow Analysis COMP 6 Program Analysis and Transformations These slides have been adapted from http://cs.gmu.edu/~white/cs60/slides/cs60--0.ppt by Professor Liz White. How to represent the structure of the program? Based

More information

Lazy Code Motion. Comp 512 Spring 2011

Lazy Code Motion. Comp 512 Spring 2011 Comp 512 Spring 2011 Lazy Code Motion Lazy Code Motion, J. Knoop, O. Ruthing, & B. Steffen, in Proceedings of the ACM SIGPLAN 92 Conference on Programming Language Design and Implementation, June 1992.

More information

Introduction to Optimization. CS434 Compiler Construction Joel Jones Department of Computer Science University of Alabama

Introduction to Optimization. CS434 Compiler Construction Joel Jones Department of Computer Science University of Alabama Introduction to Optimization CS434 Compiler Construction Joel Jones Department of Computer Science University of Alabama Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

More information

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code Traditional Three-pass Compiler Source Code Front End IR Middle End IR Back End Machine code Errors Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce

More information

Intermediate representation

Intermediate representation Intermediate representation Goals: encode knowledge about the program facilitate analysis facilitate retargeting facilitate optimization scanning parsing HIR semantic analysis HIR intermediate code gen.

More information

Introduction to Optimization Local Value Numbering

Introduction to Optimization Local Value Numbering COMP 506 Rice University Spring 2018 Introduction to Optimization Local Value Numbering source IR IR target code Front End Optimizer Back End code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights

More information

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis Synthesis of output program (back-end) Intermediate Code Generation Optimization Before and after generating machine

More information

Control Flow Analysis & Def-Use. Hwansoo Han

Control Flow Analysis & Def-Use. Hwansoo Han Control Flow Analysis & Def-Use Hwansoo Han Control Flow Graph What is CFG? Represents program structure for internal use of compilers Used in various program analyses Generated from AST or a sequential

More information

Control Flow Analysis

Control Flow Analysis Control Flow Analysis Last time Undergraduate compilers in a day Today Assignment 0 due Control-flow analysis Building basic blocks Building control-flow graphs Loops January 28, 2015 Control Flow Analysis

More information

Local Optimization: Value Numbering The Desert Island Optimization. Comp 412 COMP 412 FALL Chapter 8 in EaC2e. target code

Local Optimization: Value Numbering The Desert Island Optimization. Comp 412 COMP 412 FALL Chapter 8 in EaC2e. target code COMP 412 FALL 2017 Local Optimization: Value Numbering The Desert Island Optimization Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2017, Keith D. Cooper & Linda Torczon,

More information

Challenges in the back end. CS Compiler Design. Basic blocks. Compiler analysis

Challenges in the back end. CS Compiler Design. Basic blocks. Compiler analysis Challenges in the back end CS3300 - Compiler Design Basic Blocks and CFG V. Krishna Nandivada IIT Madras The input to the backend (What?). The target program instruction set, constraints, relocatable or

More information

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection An Overview of GCC Architecture (source: wikipedia) CS553 Lecture Control-Flow, Dominators, Loop Detection, and SSA Control-Flow Analysis and Loop Detection Last time Lattice-theoretic framework for data-flow

More information

Control-Flow Analysis

Control-Flow Analysis Control-Flow Analysis Dragon book [Ch. 8, Section 8.4; Ch. 9, Section 9.6] Compilers: Principles, Techniques, and Tools, 2 nd ed. by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jerey D. Ullman on reserve

More information

CSC D70: Compiler Optimization LICM: Loop Invariant Code Motion

CSC D70: Compiler Optimization LICM: Loop Invariant Code Motion CSC D70: Compiler Optimization LICM: Loop Invariant Code Motion Prof. Gennady Pekhimenko University of Toronto Winter 2018 The content of this lecture is adapted from the lectures of Todd Mowry and Phillip

More information

Lecture 9: Loop Invariant Computation and Code Motion

Lecture 9: Loop Invariant Computation and Code Motion Lecture 9: Loop Invariant Computation and Code Motion I. Loop-invariant computation II. III. Algorithm for code motion Partial redundancy elimination ALSU 9.5-9.5.2 Phillip B. Gibbons 15-745: Loop Invariance

More information

Module 14: Approaches to Control Flow Analysis Lecture 27: Algorithm and Interval. The Lecture Contains: Algorithm to Find Dominators.

Module 14: Approaches to Control Flow Analysis Lecture 27: Algorithm and Interval. The Lecture Contains: Algorithm to Find Dominators. The Lecture Contains: Algorithm to Find Dominators Loop Detection Algorithm to Detect Loops Extended Basic Block Pre-Header Loops With Common eaders Reducible Flow Graphs Node Splitting Interval Analysis

More information

CS577 Modern Language Processors. Spring 2018 Lecture Optimization

CS577 Modern Language Processors. Spring 2018 Lecture Optimization CS577 Modern Language Processors Spring 2018 Lecture Optimization 1 GENERATING BETTER CODE What does a conventional compiler do to improve quality of generated code? Eliminate redundant computation Move

More information

CS293S Redundancy Removal. Yufei Ding

CS293S Redundancy Removal. Yufei Ding CS293S Redundancy Removal Yufei Ding Review of Last Class Consideration of optimization Sources of inefficiency Components of optimization Paradigms of optimization Redundancy Elimination Types of intermediate

More information

Goals of Program Optimization (1 of 2)

Goals of Program Optimization (1 of 2) Goals of Program Optimization (1 of 2) Goal: Improve program performance within some constraints Ask Three Key Questions for Every Optimization 1. Is it legal? 2. Is it profitable? 3. Is it compile-time

More information

CS 406/534 Compiler Construction Putting It All Together

CS 406/534 Compiler Construction Putting It All Together CS 406/534 Compiler Construction Putting It All Together Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy

More information

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210 Intermediate Representations CS2210 Lecture 11 Reading & Topics Muchnick: chapter 6 Topics today: Intermediate representations Automatic code generation with pattern matching Optimization Overview Control

More information

Compiler Construction 2009/2010 SSA Static Single Assignment Form

Compiler Construction 2009/2010 SSA Static Single Assignment Form Compiler Construction 2009/2010 SSA Static Single Assignment Form Peter Thiemann March 15, 2010 Outline 1 Static Single-Assignment Form 2 Converting to SSA Form 3 Optimization Algorithms Using SSA 4 Dependencies

More information

Lecture 3 Local Optimizations, Intro to SSA

Lecture 3 Local Optimizations, Intro to SSA Lecture 3 Local Optimizations, Intro to SSA I. Basic blocks & Flow graphs II. Abstraction 1: DAG III. Abstraction 2: Value numbering IV. Intro to SSA ALSU 8.4-8.5, 6.2.4 Phillip B. Gibbons 15-745: Local

More information

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz Compiler Design Fall 2015 Control-Flow Analysis Sample Exercises and Solutions Prof. Pedro C. Diniz USC / Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina del Rey, California 90292

More information

Control flow and loop detection. TDT4205 Lecture 29

Control flow and loop detection. TDT4205 Lecture 29 1 Control flow and loop detection TDT4205 Lecture 29 2 Where we are We have a handful of different analysis instances None of them are optimizations, in and of themselves The objective now is to Show how

More information

Example (cont.): Control Flow Graph. 2. Interval analysis (recursive) 3. Structural analysis (recursive)

Example (cont.): Control Flow Graph. 2. Interval analysis (recursive) 3. Structural analysis (recursive) DDC86 Compiler Optimizations and Code Generation Control Flow Analysis. Page 1 C.Kessler,IDA,Linköpings Universitet, 2009. Control Flow Analysis [ASU1e 10.4] [ALSU2e 9.6] [Muchnick 7] necessary to enable

More information

Calvin Lin The University of Texas at Austin

Calvin Lin The University of Texas at Austin Loop Invariant Code Motion Last Time Loop invariant code motion Value numbering Today Finish value numbering More reuse optimization Common subession elimination Partial redundancy elimination Next Time

More information

Why Global Dataflow Analysis?

Why Global Dataflow Analysis? Why Global Dataflow Analysis? Answer key questions at compile-time about the flow of values and other program properties over control-flow paths Compiler fundamentals What defs. of x reach a given use

More information

Data Flow Analysis and Computation of SSA

Data Flow Analysis and Computation of SSA Compiler Design 1 Data Flow Analysis and Computation of SSA Compiler Design 2 Definitions A basic block is the longest sequence of three-address codes with the following properties. The control flows to

More information

Lecture 23 CIS 341: COMPILERS

Lecture 23 CIS 341: COMPILERS Lecture 23 CIS 341: COMPILERS Announcements HW6: Analysis & Optimizations Alias analysis, constant propagation, dead code elimination, register allocation Due: Wednesday, April 25 th Zdancewic CIS 341:

More information

Compiler Optimization Techniques

Compiler Optimization Techniques Compiler Optimization Techniques Department of Computer Science, Faculty of ICT February 5, 2014 Introduction Code optimisations usually involve the replacement (transformation) of code from one sequence

More information

Lecture 10: Lazy Code Motion

Lecture 10: Lazy Code Motion Lecture : Lazy Code Motion I. Mathematical concept: a cut set II. Lazy Code Motion Algorithm Pass : Anticipated Expressions Pass 2: (Will be) Available Expressions Pass 3: Postponable Expressions Pass

More information

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013 A Bad Name Optimization is the process by which we turn a program into a better one, for some definition of better. CS 2210: Optimization This is impossible in the general case. For instance, a fully optimizing

More information

Compiler Construction 2010/2011 Loop Optimizations

Compiler Construction 2010/2011 Loop Optimizations Compiler Construction 2010/2011 Loop Optimizations Peter Thiemann January 25, 2011 Outline 1 Loop Optimizations 2 Dominators 3 Loop-Invariant Computations 4 Induction Variables 5 Array-Bounds Checks 6

More information

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion Outline Loop Optimizations Induction Variables Recognition Induction Variables Combination of Analyses Copyright 2010, Pedro C Diniz, all rights reserved Students enrolled in the Compilers class at the

More information

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 Compiler Optimizations Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 2 Local vs. Global Optimizations Local: inside a single basic block Simple forms of common subexpression elimination, dead code elimination,

More information

Data Structures and Algorithms in Compiler Optimization. Comp314 Lecture Dave Peixotto

Data Structures and Algorithms in Compiler Optimization. Comp314 Lecture Dave Peixotto Data Structures and Algorithms in Compiler Optimization Comp314 Lecture Dave Peixotto 1 What is a compiler Compilers translate between program representations Interpreters evaluate their input to produce

More information

Lecture 8: Induction Variable Optimizations

Lecture 8: Induction Variable Optimizations Lecture 8: Induction Variable Optimizations I. Finding loops II. III. Overview of Induction Variable Optimizations Further details ALSU 9.1.8, 9.6, 9.8.1 Phillip B. Gibbons 15-745: Induction Variables

More information

Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts

Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts Compiler Structure Source Code Abstract Syntax Tree Control Flow Graph Object Code CMSC 631 Program Analysis and Understanding Fall 2003 Data Flow Analysis Source code parsed to produce AST AST transformed

More information

Computer Science 160 Translation of Programming Languages

Computer Science 160 Translation of Programming Languages Computer Science 160 Translation of Programming Languages Instructor: Christopher Kruegel Code Optimization Code Optimization What should we optimize? improve running time decrease space requirements decrease

More information

Compiler Construction 2016/2017 Loop Optimizations

Compiler Construction 2016/2017 Loop Optimizations Compiler Construction 2016/2017 Loop Optimizations Peter Thiemann January 16, 2017 Outline 1 Loops 2 Dominators 3 Loop-Invariant Computations 4 Induction Variables 5 Array-Bounds Checks 6 Loop Unrolling

More information

Flow Graph Theory. Depth-First Ordering Efficiency of Iterative Algorithms Reducible Flow Graphs

Flow Graph Theory. Depth-First Ordering Efficiency of Iterative Algorithms Reducible Flow Graphs Flow Graph Theory Depth-First Ordering Efficiency of Iterative Algorithms Reducible Flow Graphs 1 Roadmap Proper ordering of nodes of a flow graph speeds up the iterative algorithms: depth-first ordering.

More information

Tour of common optimizations

Tour of common optimizations Tour of common optimizations Simple example foo(z) { x := 3 + 6; y := x 5 return z * y } Simple example foo(z) { x := 3 + 6; y := x 5; return z * y } x:=9; Applying Constant Folding Simple example foo(z)

More information

Statements or Basic Blocks (Maximal sequence of code with branching only allowed at end) Possible transfer of control

Statements or Basic Blocks (Maximal sequence of code with branching only allowed at end) Possible transfer of control Control Flow Graphs Nodes Edges Statements or asic locks (Maximal sequence of code with branching only allowed at end) Possible transfer of control Example: if P then S1 else S2 S3 S1 P S3 S2 CFG P predecessor

More information

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 Compiler Optimizations Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 2 Local vs. Global Optimizations Local: inside a single basic block Simple forms of common subexpression elimination, dead code elimination,

More information

Data Flow Analysis. Agenda CS738: Advanced Compiler Optimizations. 3-address Code Format. Assumptions

Data Flow Analysis. Agenda CS738: Advanced Compiler Optimizations. 3-address Code Format. Assumptions Agenda CS738: Advanced Compiler Optimizations Data Flow Analysis Amey Karkare karkare@cse.iitk.ac.in http://www.cse.iitk.ac.in/~karkare/cs738 Department of CSE, IIT Kanpur Static analysis and compile-time

More information

Languages and Compiler Design II IR Code Optimization

Languages and Compiler Design II IR Code Optimization Languages and Compiler Design II IR Code Optimization Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring 2010 rev.: 4/16/2010 PSU CS322 HM 1 Agenda IR Optimization

More information

Topic 9: Control Flow

Topic 9: Control Flow Topic 9: Control Flow COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer 1 The Front End The Back End (Intel-HP codename for Itanium ; uses compiler to identify parallelism)

More information

Principles of Compiler Design

Principles of Compiler Design Principles of Compiler Design Intermediate Representation Compiler Lexical Analysis Syntax Analysis Semantic Analysis Source Program Token stream Abstract Syntax tree Unambiguous Program representation

More information

A main goal is to achieve a better performance. Code Optimization. Chapter 9

A main goal is to achieve a better performance. Code Optimization. Chapter 9 1 A main goal is to achieve a better performance Code Optimization Chapter 9 2 A main goal is to achieve a better performance source Code Front End Intermediate Code Code Gen target Code user Machineindependent

More information

Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators.

Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators. Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators Comp 412 COMP 412 FALL 2016 source code IR Front End Optimizer Back

More information

Control Flow Graphs. (From Matthew S. Hetch. Flow Analysis of Computer Programs. North Holland ).

Control Flow Graphs. (From Matthew S. Hetch. Flow Analysis of Computer Programs. North Holland ). Control Flow Graphs (From Matthew S. Hetch. Flow Analysis of Computer Programs. North Holland. 1977. ). 38 Control Flow Graphs We will now discuss flow graphs. These are used for global optimizations (as

More information

CODE GENERATION Monday, May 31, 2010

CODE GENERATION Monday, May 31, 2010 CODE GENERATION memory management returned value actual parameters commonly placed in registers (when possible) optional control link optional access link saved machine status local data temporaries A.R.

More information

Topic I (d): Static Single Assignment Form (SSA)

Topic I (d): Static Single Assignment Form (SSA) Topic I (d): Static Single Assignment Form (SSA) 621-10F/Topic-1d-SSA 1 Reading List Slides: Topic Ix Other readings as assigned in class 621-10F/Topic-1d-SSA 2 ABET Outcome Ability to apply knowledge

More information

EECS 583 Class 2 Control Flow Analysis LLVM Introduction

EECS 583 Class 2 Control Flow Analysis LLVM Introduction EECS 583 Class 2 Control Flow Analysis LLVM Introduction University of Michigan September 8, 2014 - 1 - Announcements & Reading Material HW 1 out today, due Friday, Sept 22 (2 wks)» This homework is not

More information

Control Flow Analysis. Reading & Topics. Optimization Overview CS2210. Muchnick: chapter 7

Control Flow Analysis. Reading & Topics. Optimization Overview CS2210. Muchnick: chapter 7 Control Flow Analysis CS2210 Lecture 11 Reading & Topics Muchnick: chapter 7 Optimization Overview Control Flow Analysis Maybe start data flow analysis Optimization Overview Two step process Analyze program

More information

Overview Of Op*miza*on, 2

Overview Of Op*miza*on, 2 OMP 512 Rice University Spring 2015 Overview Of Op*miza*on, 2 Superlocal Value Numbering, SE opyright 2015, Keith D. ooper & Linda Torczon, all rights reserved. Students enrolled in omp 512 at Rice University

More information

CS 432 Fall Mike Lam, Professor. Data-Flow Analysis

CS 432 Fall Mike Lam, Professor. Data-Flow Analysis CS 432 Fall 2018 Mike Lam, Professor Data-Flow Analysis Compilers "Back end" Source code Tokens Syntax tree Machine code char data[20]; int main() { float x = 42.0; return 7; } 7f 45 4c 46 01 01 01 00

More information

Compiler Design. Fall Data-Flow Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Fall Data-Flow Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compiler Design Fall 2015 Data-Flow Analysis Sample Exercises and Solutions Prof. Pedro C. Diniz USC / Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina del Rey, California 90292 pedro@isi.edu

More information

Data Flow Analysis. CSCE Lecture 9-02/15/2018

Data Flow Analysis. CSCE Lecture 9-02/15/2018 Data Flow Analysis CSCE 747 - Lecture 9-02/15/2018 Data Flow Another view - program statements compute and transform data So, look at how that data is passed through the program. Reason about data dependence

More information

Example. Example. Constant Propagation in SSA

Example. Example. Constant Propagation in SSA Example x=1 a=x x=2 b=x x=1 x==10 c=x x++ print x Original CFG x 1 =1 a=x 1 x 2 =2 x 3 =φ (x 1,x 2 ) b=x 3 x 4 =1 x 5 = φ(x 4,x 6 ) x 5 ==10 c=x 5 x 6 =x 5 +1 print x 5 CFG in SSA Form In SSA form computing

More information

Lecture 2: Control Flow Analysis

Lecture 2: Control Flow Analysis COM S/CPRE 513 x: Foundations and Applications of Program Analysis Spring 2018 Instructor: Wei Le Lecture 2: Control Flow Analysis 2.1 What is Control Flow Analysis Given program source code, control flow

More information

Compiler Optimisation

Compiler Optimisation Compiler Optimisation 4 Dataflow Analysis Hugh Leather IF 1.18a hleather@inf.ed.ac.uk Institute for Computing Systems Architecture School of Informatics University of Edinburgh 2018 Introduction This lecture:

More information

Static Single Assignment (SSA) Form

Static Single Assignment (SSA) Form Static Single Assignment (SSA) Form A sparse program representation for data-flow. CSL862 SSA form 1 Computing Static Single Assignment (SSA) Form Overview: What is SSA? Advantages of SSA over use-def

More information

ELEC 876: Software Reengineering

ELEC 876: Software Reengineering ELEC 876: Software Reengineering () Dr. Ying Zou Department of Electrical & Computer Engineering Queen s University Compiler and Interpreter Compiler Source Code Object Compile Execute Code Results data

More information

Lecture 4. More on Data Flow: Constant Propagation, Speed, Loops

Lecture 4. More on Data Flow: Constant Propagation, Speed, Loops Lecture 4 More on Data Flow: Constant Propagation, Speed, Loops I. Constant Propagation II. Efficiency of Data Flow Analysis III. Algorithm to find loops Reading: Chapter 9.4, 9.6 CS243: Constants, Speed,

More information

CS553 Lecture Profile-Guided Optimizations 3

CS553 Lecture Profile-Guided Optimizations 3 Profile-Guided Optimizations Last time Instruction scheduling Register renaming alanced Load Scheduling Loop unrolling Software pipelining Today More instruction scheduling Profiling Trace scheduling CS553

More information

CS5363 Final Review. cs5363 1

CS5363 Final Review. cs5363 1 CS5363 Final Review cs5363 1 Programming language implementation Programming languages Tools for describing data and algorithms Instructing machines what to do Communicate between computers and programmers

More information

Intermediate Code Generation

Intermediate Code Generation Intermediate Code Generation In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target

More information

Contents of Lecture 2

Contents of Lecture 2 Contents of Lecture Dominance relation An inefficient and simple algorithm to compute dominance Immediate dominators Dominator tree Jonas Skeppstedt (js@cs.lth.se) Lecture / Definition of Dominance Consider

More information

What If. Static Single Assignment Form. Phi Functions. What does φ(v i,v j ) Mean?

What If. Static Single Assignment Form. Phi Functions. What does φ(v i,v j ) Mean? Static Single Assignment Form What If Many of the complexities of optimization and code generation arise from the fact that a given variable may be assigned to in many different places. Thus reaching definition

More information

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Traditional Three-pass Compiler

More information

Loop Invariant Code Motion. Background: ud- and du-chains. Upward Exposed Uses. Identifying Loop Invariant Code. Last Time Control flow analysis

Loop Invariant Code Motion. Background: ud- and du-chains. Upward Exposed Uses. Identifying Loop Invariant Code. Last Time Control flow analysis Loop Invariant Code Motion Loop Invariant Code Motion Background: ud- and du-chains Last Time Control flow analysis Today Loop invariant code motion ud-chains A ud-chain connects a use of a variable to

More information

CONTROL FLOW ANALYSIS. The slides adapted from Vikram Adve

CONTROL FLOW ANALYSIS. The slides adapted from Vikram Adve CONTROL FLOW ANALYSIS The slides adapted from Vikram Adve Flow Graphs Flow Graph: A triple G=(N,A,s), where (N,A) is a (finite) directed graph, s N is a designated initial node, and there is a path from

More information

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1 CSE P 0 Compilers SSA Hal Perkins Spring 0 UW CSE P 0 Spring 0 V- Agenda Overview of SSA IR Constructing SSA graphs Sample of SSA-based optimizations Converting back from SSA form Sources: Appel ch., also

More information

MIT Introduction to Program Analysis and Optimization. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

MIT Introduction to Program Analysis and Optimization. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology MIT 6.035 Introduction to Program Analysis and Optimization Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Program Analysis Compile-time reasoning about run-time behavior

More information

Lecture 5. Partial Redundancy Elimination

Lecture 5. Partial Redundancy Elimination Lecture 5 Partial Redundancy Elimination I. Forms of redundancy global common subexpression elimination loop invariant code motion partial redundancy II. Lazy Code Motion Algorithm Mathematical concept:

More information

Partial Redundancy Analysis

Partial Redundancy Analysis Partial Redundancy Analysis Partial Redundancy Analysis is a boolean-valued data flow analysis that generalizes available expression analysis. Ordinary available expression analysis tells us if an expression

More information

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210 Redundant Computation Elimination Optimizations CS2210 Lecture 20 Redundancy Elimination Several categories: Value Numbering local & global Common subexpression elimination (CSE) local & global Loop-invariant

More information

Data Flow Analysis. Program Analysis

Data Flow Analysis. Program Analysis Program Analysis https://www.cse.iitb.ac.in/~karkare/cs618/ Data Flow Analysis Amey Karkare Dept of Computer Science and Engg IIT Kanpur Visiting IIT Bombay karkare@cse.iitk.ac.in karkare@cse.iitb.ac.in

More information

Review; questions Basic Analyses (2) Assign (see Schedule for links)

Review; questions Basic Analyses (2) Assign (see Schedule for links) Class 2 Review; questions Basic Analyses (2) Assign (see Schedule for links) Representation and Analysis of Software (Sections -5) Additional reading: depth-first presentation, data-flow analysis, etc.

More information

Introduction to Machine-Independent Optimizations - 4

Introduction to Machine-Independent Optimizations - 4 Introduction to Machine-Independent Optimizations - 4 Department of Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Principles of Compiler Design Outline of

More information

Control flow graphs and loop optimizations. Thursday, October 24, 13

Control flow graphs and loop optimizations. Thursday, October 24, 13 Control flow graphs and loop optimizations Agenda Building control flow graphs Low level loop optimizations Code motion Strength reduction Unrolling High level loop optimizations Loop fusion Loop interchange

More information

Anticipation-based partial redundancy elimination for static single assignment form

Anticipation-based partial redundancy elimination for static single assignment form SOFTWARE PRACTICE AND EXPERIENCE Softw. Pract. Exper. 00; : 9 Published online 5 August 00 in Wiley InterScience (www.interscience.wiley.com). DOI: 0.00/spe.68 Anticipation-based partial redundancy elimination

More information

Computing Static Single Assignment (SSA) Form. Control Flow Analysis. Overview. Last Time. Constant propagation Dominator relationships

Computing Static Single Assignment (SSA) Form. Control Flow Analysis. Overview. Last Time. Constant propagation Dominator relationships Control Flow Analysis Last Time Constant propagation Dominator relationships Today Static Single Assignment (SSA) - a sparse program representation for data flow Dominance Frontier Computing Static Single

More information

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space Code optimization Have we achieved optimal code? Impossible to answer! We make improvements to the code Aim: faster code and/or less space Types of optimization machine-independent In source code or internal

More information

When we eliminated global CSE's we introduced copy statements of the form A:=B. There are many

When we eliminated global CSE's we introduced copy statements of the form A:=B. There are many Copy Propagation When we eliminated global CSE's we introduced copy statements of the form A:=B. There are many other sources for copy statements, such as the original source code and the intermediate

More information

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program and Code Improvement Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program Review Front end code Source code analysis Syntax tree Back end code Target code

More information

Code Placement, Code Motion

Code Placement, Code Motion Code Placement, Code Motion Compiler Construction Course Winter Term 2009/2010 saarland university computer science 2 Why? Loop-invariant code motion Global value numbering destroys block membership Remove

More information

Induction Variable Identification (cont)

Induction Variable Identification (cont) Loop Invariant Code Motion Last Time Uses of SSA: reaching constants, dead-code elimination, induction variable identification Today Finish up induction variable identification Loop invariant code motion

More information

CSE Section 10 - Dataflow and Single Static Assignment - Solutions

CSE Section 10 - Dataflow and Single Static Assignment - Solutions CSE 401 - Section 10 - Dataflow and Single Static Assignment - Solutions 1. Dataflow Review For each of the following optimizations, list the dataflow analysis that would be most directly applicable. You

More information

Calvin Lin The University of Texas at Austin

Calvin Lin The University of Texas at Austin Loop Invariant Code Motion Last Time SSA Today Loop invariant code motion Reuse optimization Next Time More reuse optimization Common subexpression elimination Partial redundancy elimination February 23,

More information

Combining Analyses, Combining Optimizations - Summary

Combining Analyses, Combining Optimizations - Summary Combining Analyses, Combining Optimizations - Summary 1. INTRODUCTION Cliff Click s thesis Combining Analysis, Combining Optimizations [Click and Cooper 1995] uses a structurally different intermediate

More information

An Implementation of Lazy Code Motion for Machine SUIF

An Implementation of Lazy Code Motion for Machine SUIF An Implementation of Lazy Code Motion for Machine SUIF Laurent Rolaz Swiss Federal Institute of Technology Processor Architecture Laboratory Lausanne 28th March 2003 1 Introduction Optimizing compiler

More information

Global Register Allocation via Graph Coloring

Global Register Allocation via Graph Coloring Global Register Allocation via Graph Coloring Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission

More information

Code generation for modern processors

Code generation for modern processors Code generation for modern processors Definitions (1 of 2) What are the dominant performance issues for a superscalar RISC processor? Refs: AS&U, Chapter 9 + Notes. Optional: Muchnick, 16.3 & 17.1 Instruction

More information

Code generation for modern processors

Code generation for modern processors Code generation for modern processors What are the dominant performance issues for a superscalar RISC processor? Refs: AS&U, Chapter 9 + Notes. Optional: Muchnick, 16.3 & 17.1 Strategy il il il il asm

More information

CS202 Compiler Construction

CS202 Compiler Construction S202 ompiler onstruction pril 15, 2003 S 202-32 1 ssignment 11 (last programming assignment) Loop optimizations S 202-32 today 2 ssignment 11 Final (necessary) step in compilation! Implement register allocation

More information

4/1/15 LLVM AND SSA. Low-Level Virtual Machine (LLVM) LLVM Compiler Infrastructure. LL: A Subset of LLVM. Basic Blocks

4/1/15 LLVM AND SSA. Low-Level Virtual Machine (LLVM) LLVM Compiler Infrastructure. LL: A Subset of LLVM. Basic Blocks 4//5 Low-Level Virtual Machine (LLVM) LLVM AND SSA Slides adapted from those prepared by Steve Zdancewic at Penn Open-Source Compiler Infrastructure see llvm.org for full documntation Created by Chris

More information

Compiler Optimization and Code Generation

Compiler Optimization and Code Generation Compiler Optimization and Code Generation Professor: Sc.D., Professor Vazgen Melikyan 1 Course Overview Introduction: Overview of Optimizations 1 lecture Intermediate-Code Generation 2 lectures Machine-Independent

More information