Languages and Compiler Design II
IR Code Optimization
Material provided by Prof. Jingke Li
Stolen with pride and modified by Herb Mayer
PSU Spring 2010, rev.: 4/16/2010
PSU CS322 HM
Agenda
  IR Optimization
  Redundancy Elimination
  Sample: CSE
  Partial Redundancy Elimination (PRE)
  Copy Propagation
  Value Numbering
  Loop Invariant Code Motion
  Counter Examples
  Strength Reduction
  Induction Variable (IV) Elimination
IR Optimization
Definition: Optimization is the translation of an original program P1 into a semantically equivalent program P2 with better properties.
What counts as better depends on the project. Possibilities include code compactness, execution speed, numeric precision, and others.
IR Optimization
Optimizations transform a program into a functionally equivalent program with better performance. Transformations can be implemented at various stages and levels.
Advantages of IR-Level Optimization:
  IR operations are explicit, so cost estimates can be accurate
  IR optimizations are machine-independent, hence the results are portable across different target machines
Scopes of Optimization:
  Local: transforming code by analyzing a single basic block
  Global: transforming code by analyzing a whole subroutine
  Inter-Procedural: transforming code by analyzing the whole program
Concepts and Techniques:
  Basic blocks & flow graphs
  Control-flow analysis & data-flow analysis
Redundancy Elimination
IR code optimization removes redundant computations. The following are specific examples:
  Common Subexpression Elimination (CSE)
    Based on lexical representation, applicable to global scope
  Partial Redundancy Elimination
    More powerful than CSE
  Copy Propagation
    Companion optimization to CSE
  Value Numbering (VN)
    Value based, single Basic Block
  Super-local Value Numbering
    Extends VN to multiple blocks
  Loop Invariant Elimination
    Moves code from frequently executed to rarely executed parts of the program
Common Subexpression Elimination (CSE)
E is a common subexpression if it occurs at L1 and L2, was computed at L1, and none of its components received new values along the path to L2.
To achieve CSE, introduce a temp to hold the subexpression when it is first evaluated; see this example from Quicksort():

  BB before CSE      BB after CSE       BB after total CSE
  t11 := 4*i         t11 := 4*i         t11 := 4*i
  x := a[t11]        x := a[t11]        x := a[t11]
  t12 := 4*i         t12 := t11         t13 := 4*j
  t13 := 4*j         t13 := 4*j         t14 := a[t13]
  t14 := a[t13]      t14 := a[t13]      a[t11] := t14
  a[t12] := t14      a[t12] := t14      a[t13] := x
  t15 := 4*j         t15 := t13
  a[t15] := x        a[t15] := x

The second occurrence of 4*i in the BB (from Quicksort()) is a common subexpression; so is the second occurrence of 4*j.
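The step from "BB before CSE" to "BB after CSE" can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the course's implementation: the tuple encoding of instructions and the name local_cse are hypothetical, array loads and stores are passed through as opaque strings, and operand order is compared literally (4*i and i*4 would not match).

```python
def local_cse(block):
    """Replace repeated binary expressions in one basic block by a copy."""
    available = {}  # (op, left, right) -> variable currently holding the value
    result = []
    for dst, rhs in block:
        if isinstance(rhs, tuple) and rhs in available:
            rhs = available[rhs]  # reuse the earlier temp: dst := temp
        # Assigning dst invalidates expressions that read dst or are held in dst.
        available = {e: v for e, v in available.items()
                     if dst not in e[1:] and v != dst}
        if isinstance(rhs, tuple) and dst not in rhs[1:]:
            available[rhs] = dst
        result.append((dst, rhs))
    return result


# The Quicksort() basic block; binary expressions are (op, left, right) tuples.
quicksort_block = [
    ("t11", ("*", "4", "i")),
    ("x", "a[t11]"),
    ("t12", ("*", "4", "i")),
    ("t13", ("*", "4", "j")),
    ("t14", "a[t13]"),
    ("a[t12]", "t14"),
    ("t15", ("*", "4", "j")),
    ("a[t15]", "x"),
]

optimized = local_cse(quicksort_block)
for dst, rhs in optimized:
    print(dst, ":=", rhs)
```

The copies t12 := t11 and t15 := t13 that this pass leaves behind are what copy propagation later cleans up to reach the "total CSE" form.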
CSE Across BBs
CSE can eliminate redundant computation across Basic Blocks:

        before CSE            after CSE
  BB1:  i := j                i := j
        a := 4 * i            temp := 4 * i
        if ... goto BB3       a := temp
                              if ... goto BB3
  BB2:  i := j                i := j
        b := 4 * i            b := temp
  BB3:  i := j                i := j
        c := 4 * i            c := temp
Global CSE
Both occurrences of 4*i in BB5 (and in BB6) are CSEs: eliminate t6, t7, t11, and t12, and replace them with t2
4*j in BB5 and 4*n in BB6 are CSEs: eliminate t10 and t15, and replace them with t8 and t13
Now a[t2] in BB5 and BB6 becomes a CSE: replace it with t3

  BB1:  i := m-1
        j := n
        t1 := 4*n
        v := a[t1]
  BB2:  i := i+1
        t2 := 4*i
        t3 := a[t2]
        if t3 < v goto BB2
  BB3:  j := j-1
        t4 := 4*j
        t5 := a[t4]
        if t5 > v goto BB3
  BB4:  if i >= j goto BB6

  BB5:                    BB6:
        t6 := 4*i               t11 := 4*i
        x := a[t6]              x := a[t11]
        t7 := 4*i               t12 := 4*i
        t8 := 4*j               t13 := 4*n
        t9 := a[t8]             t14 := a[t13]
        a[t7] := t9             a[t12] := t14
        t10 := 4*j              t15 := 4*n
        a[t10] := x             a[t15] := x
        goto BB2
Global CSE

  BB1:  i := m-1
        j := n
        t1 := 4*n
        v := a[t1]
  BB2:  i := i+1
        t2 := 4*i
        t3 := a[t2]
        if t3 < v goto BB2
  BB3:  j := j-1
        t4 := 4*j
        t5 := a[t4]
        if t5 > v goto BB3
  BB4:  if i >= j goto BB6

  BB5:                    BB6:
        x := t3                 x := t3
        a[t2] := t5             t14 := a[t1]
        a[t4] := x              a[t2] := t14
        goto BB2                a[t1] := x
CSE Algorithm
Available expressions: An expression x ⊕ y is available at node n if every path from the entry node to n evaluates the expression, and there are no definitions of x or y after the last evaluation
Algorithm:
1. Compute available expressions for all expressions.
2. At each node n: w := x ⊕ y, where the expression x ⊕ y is available, search backwards for the evaluations of x ⊕ y that reach n
3. Replace each evaluation v := x ⊕ y found in the search by t := x ⊕ y; v := t
4. Replace n by w := t
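Step 1, computing available expressions, is a forward data-flow problem. The sketch below is one common formulation, assuming in[n] is the intersection of out[p] over all predecessors p and out[n] = gen[n] ∪ (in[n] − kill[n]). The four-block CFG, the tuple encoding, and all function names are hypothetical and chosen only for illustration.

```python
def gen_kill(block, universe):
    """gen: expressions evaluated and still valid at block end; kill: expressions invalidated."""
    gen, kill = set(), set()
    for dst, expr in block:
        if expr is not None:
            gen.add(expr)  # the expression is evaluated before dst is written
        killed = {e for e in universe if dst in e[1:]}
        gen -= killed
        kill |= killed
    return gen, kill


def available_expressions(blocks, preds, entry):
    universe = {expr for b in blocks.values() for _, expr in b if expr is not None}
    gk = {n: gen_kill(b, universe) for n, b in blocks.items()}
    avail_in = {n: set() for n in blocks}
    avail_out = {n: set(universe) for n in blocks}  # optimistic initial value
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for n in blocks:
            if n == entry:
                new_in = set()  # nothing is available on entry
            else:
                new_in = set(universe)
                for p in preds[n]:
                    new_in &= avail_out[p]
            gen, kill = gk[n]
            new_out = gen | (new_in - kill)
            if new_in != avail_in[n] or new_out != avail_out[n]:
                avail_in[n], avail_out[n] = new_in, new_out
                changed = True
    return avail_in


# Hypothetical diamond CFG: B1 -> B2, B1 -> B3, B2 -> B4, B3 -> B4.
# Statements are (dst, expr); expr is (op, left, right) or None for other right-hand sides.
blocks = {
    "B1": [("t1", ("+", "a", "b"))],
    "B2": [("a", None), ("t2", ("+", "a", "b"))],  # redefines a, then recomputes a+b
    "B3": [("t3", ("*", "c", "d"))],
    "B4": [("t4", ("+", "a", "b")), ("t5", ("*", "c", "d"))],
}
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}

avail_in = available_expressions(blocks, preds, "B1")
print(avail_in["B4"])
```

In this example a+b reaches B4 along both paths, so it is available there, while c*d is evaluated only on the path through B3 and is therefore not available at B4.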
An Improved CSE Algorithm
The previous CSE algorithm performs the expensive backward search and inserts a new temp for every use of a common subexpression. The following ideas can improve the algorithm:
  Reduce the number of new temps by assigning a unique name to each unique expression
  Avoid the backward search by a separate traversal of the CFG
Algorithm:
1. Compute available expressions for all expressions
2. Initialize an array Name[e] = ø for all expressions
3. At each node n: w := x ⊕ y, where the expression x ⊕ y (denoted e below) is available:
   If Name[e] = ø, allocate a new name t and set Name[e] = t;
   Else let t = Name[e];
   Replace n by w := t;
4. In a subsequent traversal of the CFG, at each node v := e, if Name[e] != ø, let t = Name[e]; replace the node by t := e; v := t;
Yet Another CSE Algorithm
Ideas: Create one temp for each unique expression. Let a subsequent pass eliminate unnecessary temps.
Algorithm:
1. Compute available expressions for all expressions.
2. At each evaluation of e:
   Hash e to a name, t, in a table
   Insert assignment t = e.
3. At a use of e where e is available:
   Look up e's name t in the hash table
   Replace e with t.
Partial Redundancy Elimination (PRE)
An expression x ⊕ y is partially redundant at node n if some path from the entry node to n evaluates x ⊕ y, and there are no definitions of x or y after the last evaluation
PRE Optimization (it subsumes CSE):
  Discover partially redundant expressions
  Convert them to fully redundant expressions
  Remove redundancy, to reduce the overall number of computations at runtime
[Figure: flow-graph sketches showing evaluations of x ⊕ y on paths into node n]
Copy Propagation
A copy statement has the form f := g
A large number of copy statements may be generated after performing CSE optimizations. Copy propagation eliminates copy statements by using g for f wherever possible

  Before (BB5)          After (BB5)
  t6 := 4*i             t6 := 4*i
  x := a[t6]            x := a[t6]
  t7 := t6              t8 := 4*j
  t8 := 4*j             t9 := a[t8]
  t9 := a[t8]           a[t6] := t9
  a[t7] := t9           a[t8] := x
  t10 := t8             goto BB2
  a[t10] := x
  goto BB2
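The BB5 transformation can be sketched in Python for a single basic block. This is a minimal sketch under stated assumptions: instructions are hypothetical (dst, rhs) string pairs, the trailing goto is omitted, and the cleanup step assumes that a copied temp is dead at block exit unless it is listed in live_out. The function names are illustrative only.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def substitute(text, copies):
    """Rewrite every identifier f in text to g when the copy f := g is still valid."""
    return IDENT.sub(lambda m: copies.get(m.group(0), m.group(0)), text)


def copy_propagate(block):
    copies = {}  # f -> g for each copy statement f := g that is still valid
    result = []
    for dst, rhs in block:
        rhs = substitute(rhs, copies)
        if "[" in dst:  # array store a[idx] := ...; idx is a use, not a definition
            result.append((substitute(dst, copies), rhs))
            continue
        # dst is redefined: forget copies that mention it on either side.
        copies = {f: g for f, g in copies.items() if f != dst and g != dst}
        if IDENT.fullmatch(rhs) and rhs != dst:
            copies[dst] = rhs
        result.append((dst, rhs))
    return result


def remove_dead_copies(block, live_out=()):
    """Drop copies f := g whose target f is never read later in the block."""
    result = []
    for i, (dst, rhs) in enumerate(block):
        is_copy = IDENT.fullmatch(dst) and IDENT.fullmatch(rhs)
        used_later = any(
            dst in IDENT.findall(r) or ("[" in d and dst in IDENT.findall(d))
            for d, r in block[i + 1:]
        )
        if is_copy and not used_later and dst not in live_out:
            continue
        result.append((dst, rhs))
    return result


bb5_before = [
    ("t6", "4*i"), ("x", "a[t6]"), ("t7", "t6"), ("t8", "4*j"),
    ("t9", "a[t8]"), ("a[t7]", "t9"), ("t10", "t8"), ("a[t10]", "x"),
]
bb5_after = remove_dead_copies(copy_propagate(bb5_before))
for dst, rhs in bb5_after:
    print(dst, ":=", rhs)
```

Splitting the work into a propagation pass and a dead-copy pass keeps each step simple; a production compiler would use liveness analysis rather than the live_out assumption made here.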
Cascading Problem
CSE transformations may have a cascading effect: more rounds of CSE and copy propagation may be needed before reaching the final form:

  Original         After CSE        After copy propagation   After CSE
  x := b + c       x := b + c       x := b + c               x := b + c
  y := a + x       y := a + x       y := a + x               y := a + x
  u := b + c       u := x           v := a + x               v := y
  v := a + u       v := a + u
Value Numbering
  Each variable is assumed to have a unique initial value
  Each unique value is assigned a unique number
  An expression's value is represented by a corresponding symbolic expression based on the operands' numbers
    E.g. the value of expression x + y is 1+2, if 1 and 2 are the value numbers of x and y, respectively
  Each unique expression value is also assigned a unique number
  When a new variable or expression is encountered, check whether it has already been assigned a number; if so, use that number, otherwise assign it a new number
  Use a hash table for efficient number lookup
Sample: Value Numbering

  x := b + c
  y := a + x
  u := b + c
  v := a + u

  statement      var or expr     assigned #
  x := b + c     b               1
                 c               2
                 b+c (1+2)       3
                 x               3
  y := a + x     a               4
                 a+x (4+3)       5
                 y               5
  u := b + c     u (1+2)         3
  v := a + u     v (4+3)         5

Value numbering uses a single round to calculate the effect of cascaded optimizations
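The table can be reproduced with a short Python sketch. It assumes a hypothetical (dst, op, left, right) encoding of the four statements, uses dictionaries as the hash tables, and only reports which statements recompute a known value; it does not rewrite the code.

```python
from itertools import count


def value_number(block):
    fresh = count(1)  # source of new value numbers
    var_vn = {}       # variable -> value number
    expr_vn = {}      # (op, left number, right number) -> value number
    redundant = []    # targets whose right-hand side already had a number
    for dst, op, left, right in block:
        for v in (left, right):
            if v not in var_vn:  # first sighting: unique initial value
                var_vn[v] = next(fresh)
        key = (op, var_vn[left], var_vn[right])
        if key in expr_vn:
            redundant.append(dst)
        else:
            expr_vn[key] = next(fresh)
        var_vn[dst] = expr_vn[key]
    return var_vn, redundant


sample = [
    ("x", "+", "b", "c"),
    ("y", "+", "a", "x"),
    ("u", "+", "b", "c"),
    ("v", "+", "a", "u"),
]
numbers, redundant = value_number(sample)
print(numbers)
print(redundant)
```

Because u receives the same number as x, the key for a + u equals the key for a + x, so v is found redundant in the same pass, without a separate round of copy propagation.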
Loop Invariant Code Motion
If a loop contains a statement t := a ⊕ b such that a and b have the same values each time around the loop, then t will also have the same value each time.
Hoist such loop-invariant statements out of the loop!

        Before                     After
  BB1:  t1 := 0                    t1 := 0
                                   t2 := a * b
  BB2:  i := i+1                   i := i+1
        t2 := a * b                M[i] := t2
        M[i] := t2                 if a < N goto BB3
        if a < N goto BB3
  BB3:  x := t2                    x := t2
Loop Invariant Criteria
A statement S: t := a1 ⊕ a2 is loop-invariant within loop L if, for each operand ai:
1.) ai is a constant, or
2.) all definitions of ai that reach S are outside the loop, or
3.) only 1 definition of ai reaches S, and that definition is loop-invariant
An iterative algorithm can be used to find all loop-invariant statements
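The iterative algorithm can be sketched in Python under a simplifying assumption that replaces full reaching-definitions analysis: every variable assigned inside the loop is defined before it is used there, so a variable with no definition in the loop has only outside definitions (criterion 2) and a variable with exactly one definition in the loop has one reaching definition (criterion 3). The tuple encoding, the function name, and the statement t3 := t2 + 4 are hypothetical additions for illustration.

```python
def loop_invariants(loop_body):
    """Return the indices of loop-invariant statements in loop_body."""
    defs = {}  # variable -> indices of the statements in the loop that define it
    for idx, (dst, _op, _a, _b) in enumerate(loop_body):
        defs.setdefault(dst, []).append(idx)

    invariant = set()

    def operand_ok(x):
        if x.lstrip("-").isdigit():  # criterion 1: constant
            return True
        if x not in defs:            # criterion 2: defined only outside the loop
            return True
        # criterion 3: a single definition, and it is already known invariant
        return len(defs[x]) == 1 and defs[x][0] in invariant

    changed = True
    while changed:  # repeat until no new invariant statement is found
        changed = False
        for idx, (_dst, _op, a, b) in enumerate(loop_body):
            if idx not in invariant and operand_ok(a) and operand_ok(b):
                invariant.add(idx)
                changed = True
    return sorted(invariant)


# Loop body as (dst, op, left, right).
body = [
    ("i", "+", "i", "1"),    # not invariant: depends on its own definition
    ("t2", "*", "a", "b"),   # invariant: a and b are defined outside the loop
    ("t3", "+", "t2", "4"),  # hypothetical: invariant because t2 is invariant
]
print(loop_invariants(body))
```

Finding a statement invariant is only the first half of code motion; hoisting it additionally requires that moving the definition does not change what other uses of the target observe.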
Strength Reduction (SR)
Definition: Reduction in strength is the replacement of an operation by a cheaper one, e.g. replace * by + if feasible
Do not make such changes in the source, e.g. do not replace j=2*k; with j=k+k; let the optimizer do this

        Before                     After
  BB1:  if i >= y goto BB3         if i >= y goto BB3
  BB2:  Call func1                 Call func1
        j := 2 * k                 j := k + k
        i := i + 1                 i := i + 1
        goto BB1                   goto BB1
  BB3:  x := ...                   x := ...
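As a minimal sketch of the j := 2 * k rewrite, a peephole pass over hypothetical (dst, op, left, right) tuples might look like this. It covers only multiplication by the literal 2; the function name and encoding are illustrative assumptions.

```python
def reduce_strength(block):
    """Rewrite dst := 2 * k (or k * 2) as dst := k + k; leave everything else alone."""
    result = []
    for dst, op, left, right in block:
        if op == "*" and left == "2":
            result.append((dst, "+", right, right))
        elif op == "*" and right == "2":
            result.append((dst, "+", left, left))
        else:
            result.append((dst, op, left, right))
    return result


bb2 = [("j", "*", "2", "k"), ("i", "+", "i", "1")]
print(reduce_strength(bb2))
```

Whether the addition is actually cheaper depends on the target machine, which is one reason to leave this choice to the optimizer rather than the source code.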
Induction Variable Elimination (IVE)
Definition: An Induction Variable (IV) is a variable iterating through a linear progression of values in a program section
  The program section is frequently a proper loop
  IVs are either fundamental or dependent on other IVs
  IV elimination reduces multiple IVs into fewer, thus saving operations
  Since these operations are inside inner loops, savings can be significant
  After IVE, other optimizations can be applied too, e.g. SR
Induction Variable Elimination, Cont'd

  integer a(100)
  do i = 1, 100
    a(i) = 2 * i
  enddo

-- low bound is 1, not 0 like in C++ or Java, subtract!
-- OK for i to be undefined after loop
-- rhs deliberately not 4 * i, which would be easy: = IV

Original IR:
  BB0:  t1 = 1            // i
  BB1:  if t1 > 100 goto BB3
  BB2:  t2 = 2 * t1
        t3 = 4 * t1
        t4 = t3 - 4
        t5 = A(a) + t4
        *t5 = t2
        t1 = t1 + 1
        goto BB1
  BB3:  after loop, i is undefined

With IV t0 as byte offset:
  BB0:  t0 = 0            // IV
        t1 = 1            // i
  BB1:  if t0 >= 400 goto BB3
  BB2:  t2 = 2 * t1
        t5 = A(a) + t0
        *t5 = t2
        t0 = t0 + 4
        t1 = t1 + 1
        goto BB1
  BB3:  after loop, i is undefined

With IV t0 as address:
  BB0:  t0 = A(a)         // IV
        t1 = 1            // i
  BB1:  if t0 >= A(a) + 400 goto BB3
  BB2:  t2 = 2 * t1
        *t0 = t2
        t0 = t0 + 4
        t1 = t1 + 1
        goto BB1
  BB3:  after loop, i is undefined
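One way to check that the three IR versions agree is to simulate them and compare the memory they write. The sketch below is an illustration only: BASE is a hypothetical stand-in for the address A(a), memory is modeled as a Python dict keyed by byte address, and the function names are invented. The sketch increments t1 in every version, because t2 = 2 * t1 depends on it.

```python
BASE = 1000  # hypothetical value of A(a), the address of a(1)


def original():
    mem = {}
    t1 = 1                      # i
    while not t1 > 100:
        t2 = 2 * t1
        t3 = 4 * t1
        t4 = t3 - 4             # low bound is 1, so subtract one element
        t5 = BASE + t4
        mem[t5] = t2            # *t5 = t2
        t1 = t1 + 1
    return mem


def offset_iv():
    mem = {}
    t0, t1 = 0, 1               # t0 is the byte-offset IV
    while not t0 >= 400:
        t2 = 2 * t1
        t5 = BASE + t0
        mem[t5] = t2
        t0 = t0 + 4
        t1 = t1 + 1
    return mem


def address_iv():
    mem = {}
    t0, t1 = BASE, 1            # t0 is the address IV
    while not t0 >= BASE + 400:
        t2 = 2 * t1
        mem[t0] = t2            # *t0 = t2
        t0 = t0 + 4
        t1 = t1 + 1
    return mem


print(original() == offset_iv() == address_iv())
```

Each step removes work from the loop body: the offset version drops the multiply and subtract used for addressing, and the address version also drops the add of A(a).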