Group B                                Assignment 8
Att (2)    Perm (3)    Oral (5)    Total (10)    Sign

Title of Assignment: Code Optimization using DAG

8.1.1 Problem Definition:
Code optimization using DAG.

8.1.2 Prerequisite:
Lex, Yacc, Compiler Construction

8.1.3 Relevant Theory / Literature Survey:

Code Optimization:
Code optimization is the phase of compilation that focuses on generating good code. Most of the time, good code means code that runs fast. However, there are some cases where good code is code that does not require a lot of memory.

8.1.3.1 Types of optimizations
Techniques used in optimization can be broken up among various scopes, which can affect anything from a single statement to the entire program. Generally speaking, locally scoped techniques are easier to implement than global ones but result in smaller gains. Some examples of scopes include:

8.1.3.1.1 Local optimizations
These consider only information local to a function definition. This reduces the amount of analysis that needs to be performed (saving time and reducing storage requirements) but means that worst-case assumptions have to be made when function calls occur or global variables are accessed. An optimization is considered local if it is done at the basic-block level (a basic block is a sequence of instructions with no branch into or out of it through its entirety). Local optimization focuses on:
- Elimination of redundant operations
- Effective instruction scheduling
- Effective register allocation

SNJB's Late Sau. KBJ College Of Engineering, Chandwad
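The basic-block notion can be made concrete with a short sketch. The instruction format and function names below are hypothetical illustrations, not from any particular compiler: a new block starts at the first instruction, at every jump target, and immediately after every jump.

```python
# A minimal sketch: partition three-address code into basic blocks.
# Hypothetical instruction format: (op, arg1, arg2, result);
# 'goto' / 'if_goto' jump to the index of their target instruction.

def find_leaders(code):
    """Return the set of indices that start a basic block."""
    leaders = {0}                      # first instruction is a leader
    for i, (op, *args) in enumerate(code):
        if op in ("goto", "if_goto"):
            leaders.add(args[0])       # target of a jump is a leader
            if i + 1 < len(code):
                leaders.add(i + 1)     # instruction after a jump is a leader
    return leaders

def basic_blocks(code):
    """Split the instruction list at each leader."""
    leaders = sorted(find_leaders(code))
    return [code[s:e] for s, e in zip(leaders, leaders[1:] + [len(code)])]

code = [
    ("assign", "a", 0, None),      # 0
    ("if_goto", 4, "a>10", None),  # 1: conditional jump to index 4
    ("add", "a", 1, "a"),          # 2
    ("goto", 1, None, None),       # 3
    ("assign", "b", "a", None),    # 4
]
blocks = basic_blocks(code)        # four blocks: [0], [1], [2,3], [4]
```

Within each resulting block, local techniques such as redundancy elimination can then be applied safely, since control enters only at the top and leaves only at the bottom.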
8.1.3.1.2 Global optimizations
Global optimization focuses on:
- The same techniques performed by local optimization, but at a multi-basic-block level
- Code modifications to improve the performance of loops

Both local and global optimization use a control flow graph to represent the program, and a data flow analysis algorithm to trace the flow of information.

8.1.3.1.3 Peephole optimizations
Peephole optimization is usually performed late in the compilation process, after machine code has been generated. This form of optimization examines a few adjacent instructions (like "looking through a peephole" at the code) to see whether they can be replaced by a single instruction or a shorter sequence of instructions. For instance, a multiplication of a value by 2 might be executed more efficiently by left-shifting the value or by adding the value to itself. (This example is also an instance of strength reduction.)

Peephole optimization works by sliding a several-instruction window (a peephole) over the target code and looking for suboptimal patterns of instructions. The patterns to look for are heuristic, and are typically based on the special instructions available on a given machine. The peephole is a small, moving window on the target program. The code in the peephole need not be contiguous, although some implementations do require this. It is characteristic of peephole optimization that each improvement may spawn opportunities for additional improvements. The following program transformations are characteristic of peephole optimizations:
- Redundant-instruction elimination
- Flow-of-control optimizations
- Algebraic simplifications
- Use of machine idioms
- Unreachable-code elimination

1. Redundant Loads and Stores:
If we see the instruction sequence
    (1) MOV R0, a
    (2) MOV a, R0

we can delete instruction (2), because whenever (2) is executed, (1) will have ensured that the value of a is already in register R0. If (2) had a label, we could not be sure that (1) was always executed immediately before (2), and so we could not remove (2).

2. Unreachable Code:
Another opportunity for peephole optimization is the removal of unreachable instructions. An unlabeled instruction immediately following an unconditional jump may be removed. This operation can be repeated to eliminate a sequence of instructions.

3. Flow-of-Control Optimizations:
Unnecessary jumps can be eliminated in either the intermediate code or the target code by the following types of peephole optimizations. We can replace the jump sequence

        goto L1
        ...
    L1: goto L2

by the sequence

        goto L2
        ...
    L1: goto L2

If there are now no jumps to L1, then it may be possible to eliminate the statement L1: goto L2, provided it is preceded by an unconditional jump.

4. Algebraic Simplification:
There is no end to the amount of algebraic simplification that can be attempted through peephole optimization. Only a few algebraic identities occur frequently enough that it is worth implementing them. For example, statements such as

    x := x + 0
or
    x := x * 1
are often produced by straightforward intermediate code-generation algorithms, and they can be eliminated easily through peephole optimization.

5. Reduction in Strength:
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators. For example, x^2 is invariably cheaper to implement as x*x than as a call to an exponentiation routine. Fixed-point multiplication or division by a power of two is cheaper to implement as a shift. Floating-point division by a constant can be implemented as multiplication by a constant, which may be cheaper.

    x^2    →    x * x

6. Use of Machine Idioms:
The target machine may have hardware instructions to implement certain specific operations efficiently. For example, some machines have auto-increment and auto-decrement addressing modes. These add or subtract one from an operand before or after using its value. The use of these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing. These modes can also be used in code for statements like i := i + 1.

    i := i + 1    →    i++
    i := i - 1    →    i--

8.1.3.1.4 Loop optimizations
These act on the statements which make up a loop, such as a for loop (e.g., loop-invariant code motion). Loop optimizations can have a significant impact because many programs spend a large percentage of their time inside loops.

8.1.3.1.5 Interprocedural or whole-program optimization
These analyze all of a program's source code. The greater quantity of information extracted means that optimizations can be more effective than when they only have access to local information (i.e., within a single function). This kind of optimization
can also allow new techniques to be performed. For instance, function inlining, where a call to a function is replaced by a copy of the function body.

8.1.3.1.6 Machine code optimization
These analyze the executable task image of the program after all of the executable machine code has been linked. Some of the techniques that can be applied in a more limited scope, such as macro compression (which saves space by collapsing common sequences of instructions), are more effective when the entire executable task image is available for analysis. [1]

In addition to scoped optimizations, there are two further general categories of optimization:

8.1.3.2 Programming language-independent vs. language-dependent
Most high-level languages share common programming constructs and abstractions: decision (if, switch, case), looping (for, while, repeat..until, do..while), and encapsulation (structures, objects). Thus similar optimization techniques can be used across languages. However, certain language features make some kinds of optimizations difficult. For instance, the existence of pointers in C and C++ makes it difficult to optimize array accesses (see alias analysis). However, languages such as PL/1 (which also supports pointers) nevertheless have sophisticated optimizing compilers available that achieve better performance in various other ways. Conversely, some language features make certain optimizations easier. For example, in some languages functions are not permitted to have side effects. Therefore, if a program makes several calls to the same function with the same arguments, the compiler can immediately infer that the function's result need be computed only once.

8.1.3.3 Machine-independent vs. machine-dependent
Many optimizations that operate on abstract programming concepts (loops, objects, structures) are independent of the machine targeted by the compiler, but many of the most effective optimizations are those that best exploit special features of the target platform.
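The machine-independent category can be illustrated with a small rewriting pass combining the algebraic simplifications and strength reductions described earlier. The three-address tuple format (op, src1, src2, dest) and the function name are assumptions for illustration only:

```python
# A minimal machine-independent peephole sketch over hypothetical
# three-address tuples (op, src1, src2, dest).

def simplify(ins):
    op, a, b, d = ins
    # algebraic identities: x + 0 -> x,  x * 1 -> x
    if op == "+" and b == 0:
        return ("=", a, None, d)
    if op == "*" and b == 1:
        return ("=", a, None, d)
    # strength reduction: multiply by a power of two -> left shift
    # (valid for fixed-point operands only)
    if op == "*" and isinstance(b, int) and b > 0 and b & (b - 1) == 0:
        return ("<<", a, b.bit_length() - 1, d)
    return ins

code = [("+", "x", 0, "x"),    # x := x + 0  -> copy
        ("*", "y", 8, "t1"),   # y * 8       -> y << 3
        ("*", "z", 3, "t2")]   # no rule applies
optimized = [simplify(i) for i in code]
```

A real peephole pass would iterate until no more patterns match, since each rewrite may expose further opportunities, as noted above.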
The following is an instance of a local machine-dependent optimization. To set a register to 0, the obvious way is to use the constant '0' in an instruction that sets a register value to a constant. A less obvious way is to XOR a register with itself. It is up to the compiler to know which instruction variant to use. On many RISC machines, both instructions would be equally appropriate, since they would be the same length and take the same time.
On many other microprocessors, such as the Intel x86 family, it turns out that the XOR variant is shorter and probably faster, as there is no need to decode an immediate operand or use the internal "immediate operand register". (A potential problem with this is that XOR may introduce a data dependency on the previous value of the register, causing a pipeline stall. However, processors often treat XOR of a register with itself as a special case that does not cause stalls.)

Performing operations at compile time (if possible)
Computations and type conversions on constants, such as computing the addresses of array elements with constant indexes, can be performed by the compiler itself.

Value propagation
Tracing the values of variables and substituting known values where possible.

Inlining small functions
Repeatedly inserting the function code instead of calling it saves the calling overhead and enables further optimizations. Inlining large functions, however, will make the executable too large.

Code hoisting
Moving as many computations as possible outside loops saves computing time. In the following example, (2.0 * PI) is an invariant expression; there is no reason to recompute it 100 times:

      DO I = 1, 100
        ARRAY(I) = 2.0 * PI * I
      ENDDO

Introducing a temporary variable t, it can be transformed to:

      t = 2.0 * PI
      DO I = 1, 100
        ARRAY(I) = t * I
      ENDDO

Dead store elimination
If the compiler detects variables that are never used, it may safely ignore many of the operations that compute their values. Such operations can't be ignored if (non-intrinsic) function calls are involved; those functions have to be called because of their possible side effects. Remember that before Fortran 95, Fortran didn't have the concept of a "pure" function.
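Dead-store elimination within a single basic block can be sketched as a backward scan. The (dest, op, src1, src2) tuple format, the function name, and the absence of function calls are all illustrative assumptions:

```python
# A minimal sketch of dead-store elimination in one basic block,
# assuming hypothetical tuples (dest, op, src1, src2), no function
# calls, and that only the variables in live_out are needed afterwards.

def eliminate_dead_stores(block, live_out):
    live = set(live_out)
    kept = []
    for dest, op, s1, s2 in reversed(block):
        if dest not in live:
            continue                  # value is never used: drop the store
        kept.append((dest, op, s1, s2))
        live.discard(dest)            # this definition satisfies later uses
        # the operands must now be live before this instruction
        live.update(v for v in (s1, s2) if isinstance(v, str))
    return kept[::-1]

block = [
    ("t", "*", "a", "b"),
    ("u", "+", "a", 1),    # dead: u is never used afterwards
    ("x", "+", "t", "c"),
]
optimized = eliminate_dead_stores(block, live_out={"x"})
# the store to u is removed; t is kept because x depends on it
```

Scanning backward is what makes this a single pass: an instruction's result is known to be dead or live before its operands are examined.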
Programs used as performance tests that perform no 'real' computations should be written to avoid being 'completely optimized out'; writing the 'results' to the screen or a file may be enough to fool the compiler.

Strength reduction
(Discussed above under peephole optimization.)

Taking advantage of the machine architecture
A simple example (the subject is clearly too machine-dependent and highly technical for more than that): register operations are much faster than memory operations, so all compilers try to keep heavily used data, such as temporary variables and array indexes, in registers. To facilitate such 'register scheduling', the largest sub-expressions may be computed before the smaller ones.

Elimination of Common Sub-Expressions (CSE)

A. Construction of the tuple CSE_DAG:
1. Initialize a table which will list the nodes which currently define each variable or temporary.
2. Each CSE_DAG node consists of an operand or operator, a name list, and a node number. If the node consists of an operator, it will also have edges to the appropriate operand nodes. Initially, pre-defined variables (assigned prior to this basic block) will be unconnected nodes in the graph.
3. For each tuple:
   - Identify the nodes currently defining each operand from the table.
   - If a node does not join the two operand nodes with the correct operator, create one.
   - If the tuple destination is a temporary, add the tuple destination to the operator node's name list and update the entry in the node table for the destination temporary.
   - If the tuple destination is a variable and the operator node is not a child, add the tuple destination to the operator node's name list and update the entry in the node table for the destination variable.
   - If the tuple destination is a variable and the operator node is a child, create a new node for the destination variable using the = operator.

B. Creating the optimized tuples from the CSE_DAG:
Traverse the CSE_DAG operator nodes in the order they were created; for each operator:
1.
Construct a tuple using the first variable name as destination. If no variables are listed, use the first temporary.
2. If more than one variable name is present, add assignment tuples for each extra name.

    X = A * LOG(Y) + (LOG(Y) ** 2)

We introduce an explicit temporary variable t:

    t = LOG(Y)
    X = A * t + (t ** 2)

We saved one 'heavy' function call by eliminating the common sub-expression LOG(Y). Now we will save the exponentiation as well:

    X = (A + t) * t

which is much better. The compiler may do all of this automatically, so don't waste too much energy on such transformations.

A classic example - computing the value of a polynomial
Eliminating common subexpressions may inspire good algorithms, like the classic Horner's rule for computing the value of a polynomial:

    y = A + B*x + C*(x**2) + D*(x**3)     (canonical form)

It is more efficient (i.e., executes faster) to perform the two exponentiations by converting them to multiplications; in this way we get 3 additions and 5 multiplications in all. The following forms are more efficient still: they require fewer operations, and the operations that are saved are the 'heavy' ones (multiplication takes much more CPU time than addition).

    Stage #1:           y = A + (B + C*x + D*(x**2))*x
    Stage #2 and last:  y = A + (B + (C + D*x)*x)*x

The last form requires 3 additions and only 3 multiplications! The algorithm hinted at here can be implemented with one loop to compute a polynomial of arbitrary order. It may also be better numerically than direct computation of the canonical form.

Example #1:
    d := (a + b) * c;
    e := a + b;
    f := (a + b) * c;
tuples:
    + a b t1
    * t1 c t2
    = t2 0 d
    + a b t3
    = t3 0 e
    + a b t4
    * t4 c t5
    = t5 0 f

Example #2:
    d := (a + b) * c;
    c := a + b;
    f := (a + b) * c;

tuples:
    + a b t1
    * t1 c t2
    := t2 0 d
    + a b t3
    := t3 0 c
    + a b t4
    * t4 c t5
    := t5 0 f

Example #3:
    a := b * c + d * e;
    f := c + e * d;
    g := b * c + d * e * f;
    h := f * b * g;
    c := c + 1;
    g := b * c + d * e * f;
    h := d * e + d * e;

tuples:
    * b c t1
    * d e t2
    + t1 t2 t3
    := t3 0 a
    * e d t4
    + c t4 t5
    := t5 0 f
    * b c t6
    * d e t7
    * t7 f t8
    + t6 t8 t9
    := t9 0 g
    * f b t10
    * t10 g t11
    := t11 0 h
    + c 1 t12
    := t12 0 c
    * b c t13
    * d e t14
    * t14 f t15
    + t13 t15 t16
    := t16 0 g
    * d e t17
    * d e t18
    + t17 t18 t19
    := t19 0 h

8.1.4 Assignment Questions:
1. What is code optimization?
2. List the principal sources of code optimization.
3. What is meant by loop optimization?
4. Define optimizing compilers.
5. Give the organization of code optimizers.
6. What are the two levels of code optimization techniques?
7. What are the phases of code optimization?
8. Define local optimization.
9. Define global optimization.
10. What is code motion?
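As a final illustrative sketch, the CSE_DAG construction described in section 8.1.3 can be approximated by a value-numbering pass over tuples. The tuple format and all names below are illustrative assumptions, not the algorithm as literally stated; applied to Example #1's first tuples, the recomputation of a + b becomes a copy:

```python
# A simplified sketch of DAG-based common-subexpression elimination
# (value numbering) over hypothetical tuples (op, arg1, arg2, dest).
# Identical (operator, operand-node) keys map to the same DAG node,
# so repeated computations are replaced by copies.

def cse(tuples):
    node_of = {}   # variable/temporary -> node currently defining it
    nodes = {}     # (op, left_node, right_node) -> name of first result
    out = []

    def node(x):
        # pre-defined variables and constants are unconnected leaf nodes
        return node_of.setdefault(x, ("leaf", x))

    for op, a, b, dest in tuples:
        key = (op, node(a), node(b))
        if key in nodes:
            # node already in the DAG: emit a copy instead of recomputing
            out.append(("=", nodes[key], None, dest))
        else:
            nodes[key] = dest
            out.append((op, a, b, dest))
        node_of[dest] = key            # dest is now defined by this node
    return out

# first tuples of Example #1: d := (a + b) * c;  e := a + b
code = [
    ("+", "a", "b", "t1"),
    ("*", "t1", "c", "t2"),
    ("+", "a", "b", "t3"),   # recomputes a + b
]
optimized = cse(code)
# the third tuple becomes a copy: ("=", "t1", None, "t3")
```

Because a redefined variable (as with c in Example #2) gets a fresh node entry, later uses of it key to the new node, so stale subexpressions are not incorrectly reused.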