Group B Assignment 8. Title of Assignment: Code optimization using DAG. Problem Definition: Code optimization using DAG. Prerequisite: Lex, Yacc, Compiler Construction


Group B Assignment 8        Att (2)   Perm (3)   Oral (5)   Total (10)   Sign

Title of Assignment: Code optimization using DAG.

8.1.1 Problem Definition: Code optimization using DAG.

8.1.2 Prerequisite: Lex, Yacc, Compiler Construction

8.1.3 Relevant Theory / Literature Survey:

Code Optimization: Code optimization is the phase of compilation that focuses on generating good code. Most of the time, good code means code that runs fast; in some cases, however, good code means code that does not require a lot of memory.

8.1.3.1 Types of optimizations

Optimization techniques can be grouped by scope, which may range from a single statement to the entire program. Generally speaking, locally scoped techniques are easier to implement than global ones but result in smaller gains. Some examples of scopes include:

8.1.3.1.1 Local optimizations

These consider only information local to a single function definition. This reduces the amount of analysis that needs to be performed (saving time and reducing storage requirements), but it means that worst-case assumptions have to be made when function calls occur or global variables are accessed. An optimization is considered local if it is performed at the level of a basic block (a sequence of instructions with no branch into or out of it except at its beginning and end). Local optimization focuses on:
- elimination of redundant operations
- effective instruction scheduling
- effective register allocation.
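To make the basic-block notion concrete, the following is a minimal C sketch (not part of the original assignment) of the standard leader rule for partitioning three-address code into basic blocks: the first instruction is a leader, every labelled instruction (a jump target) is a leader, and every instruction immediately following a jump is a leader. The instruction record (struct Instr with is_label / is_jump flags) is assumed purely for illustration.

#include <stdio.h>

/* Hypothetical three-address instruction record, assumed for illustration. */
typedef struct {
    const char *text;   /* printable form of the instruction                   */
    int is_label;       /* 1 if this instruction carries a label (jump target) */
    int is_jump;        /* 1 if this instruction is a jump of any kind         */
} Instr;

/* Mark leaders: the first instruction, every labelled instruction, and
   every instruction that immediately follows a jump starts a new block. */
static void find_leaders(const Instr *code, int n, int *leader)
{
    for (int i = 0; i < n; i++)
        leader[i] = (i == 0) || code[i].is_label || code[i - 1].is_jump;
}

int main(void)
{
    Instr code[] = {
        { "t1 = a + b",     0, 0 },
        { "if t1 goto L1",  0, 1 },
        { "t2 = t1 * c",    0, 0 },   /* follows a jump -> leader */
        { "L1: d = t1 + 1", 1, 0 },   /* labelled       -> leader */
    };
    int n = (int)(sizeof code / sizeof code[0]);
    int leader[16];

    find_leaders(code, n, leader);
    for (int i = 0; i < n; i++)
        printf("%s%s\n", leader[i] ? "-- new basic block --\n" : "", code[i].text);
    return 0;
}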

8.1.3.1.2 Global optimizations

Global optimization focuses on:
- the same techniques performed by local optimization, but at the multi-basic-block level
- code modifications that improve the performance of loops.

Both local and global optimization use a control flow graph to represent the program, and a data flow analysis algorithm to trace the flow of information through it.
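As a rough illustration of the control-flow-graph representation just mentioned, each basic block can be held as a node that stores its instructions together with edges to its successor blocks. The structure below is only a sketch; the field names and fixed sizes are assumptions, not a prescribed format.

#include <stdio.h>

#define MAX_INSTRS 16
#define MAX_SUCCS  2    /* fall-through edge plus at most one branch target */

/* One node of the control flow graph: a basic block and its successor edges. */
typedef struct BasicBlock {
    const char *instrs[MAX_INSTRS];       /* three-address instructions, in order */
    int n_instrs;
    struct BasicBlock *succ[MAX_SUCCS];   /* blocks control may flow to next      */
    int n_succ;
} BasicBlock;

int main(void)
{
    BasicBlock exit_bb  = { { "return t2" }, 1, { 0 }, 0 };
    BasicBlock entry_bb = { { "t1 = a + b", "t2 = t1 * c" }, 2, { &exit_bb }, 1 };

    /* A data flow analysis (e.g. reaching definitions) would propagate facts
       along the succ[] edges until a fixed point is reached.                  */
    printf("entry block has %d successor(s)\n", entry_bb.n_succ);
    return 0;
}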

8.1.3.1.3 Peephole optimizations

These are usually performed late in the compilation process, after machine code has been generated. This form of optimization examines a few adjacent instructions (like "looking through a peephole" at the code) to see whether they can be replaced by a single instruction or by a shorter sequence of instructions. For instance, multiplication of a value by 2 might be executed more efficiently by left-shifting the value or by adding the value to itself. (This example is also an instance of strength reduction.)

Peephole optimization works by sliding a window of a few instructions (the peephole) over the target code and looking for suboptimal patterns of instructions. The patterns to look for are heuristic, and are typically based on the special instructions available on a given machine. The peephole is a small, moving window on the target program. The code in the peephole need not be contiguous, although some implementations do require this. It is characteristic of peephole optimization that each improvement may spawn opportunities for additional improvements. The following program transformations are characteristic of peephole optimization:
- elimination of redundant instructions
- flow-of-control optimizations
- algebraic simplifications
- use of machine idioms
- removal of unreachable code.

1. Redundant Loads and Stores: If we see the instruction sequence

(1) MOV R0, a
(2) MOV a, R0

we can delete instruction (2), because whenever (2) is executed, (1) will have ensured that the value of a is already in register R0. If (2) had a label, we could not be sure that (1) was always executed immediately before (2), and so we could not remove (2).

2. Unreachable Code: Another opportunity for peephole optimization is the removal of unreachable instructions. An unlabeled instruction immediately following an unconditional jump may be removed. This operation can be repeated to eliminate a sequence of instructions.

3. Flow-of-Control Optimizations: Unnecessary jumps can be eliminated in either the intermediate code or the target code by the following kind of peephole optimization. We can replace the jump sequence

goto L1
...
L1: goto L2

by the sequence

goto L2
...
L1: goto L2

If there are now no jumps to L1, it may be possible to eliminate the statement L1: goto L2, provided it is preceded by an unconditional jump.

4. Algebraic Simplification: There is no end to the amount of algebraic simplification that can be attempted through peephole optimization, but only a few algebraic identities occur frequently enough to be worth implementing. For example, statements such as

x := x + 0
x := x * 1

are often produced by straightforward intermediate code-generation algorithms, and they can be eliminated easily through peephole optimization.

5. Reduction in Strength: Reduction in strength replaces expensive operations with equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators. For example, x^2 is invariably cheaper to implement as x*x than as a call to an exponentiation routine. Fixed-point multiplication or division by a power of two is cheaper to implement as a shift. Floating-point division by a constant can be implemented as multiplication by a constant, which may be cheaper.

x^2  ->  x*x

6. Use of Machine Idioms: The target machine may have hardware instructions that implement certain specific operations efficiently. For example, some machines have auto-increment and auto-decrement addressing modes, which add or subtract one from an operand before or after using its value. The use of these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing, and they can also be used in code for statements like i := i + 1.

i := i + 1  ->  i++
i := i - 1  ->  i--
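The fragment below is a small sketch of how transformations 1, 4 and 5 above might be mechanized: it slides a window over a list of simple two-operand instructions, drops redundant store/load pairs and useless additions of zero or multiplications by one, and rewrites multiplication by two as a shift. The instruction encoding (struct PInstr, the MOV/ADD/MUL/SHL mnemonics) is an assumption made for the example, not a format required by the assignment.

#include <stdio.h>
#include <string.h>

/* Hypothetical two-operand instruction; operands are kept in the order
   they appear in the assembly text (as in MOV R0,a / MOV a,R0 above).   */
typedef struct {
    char op[8];
    char opnd1[8];
    char opnd2[8];
} PInstr;

/* One pass of a two-instruction peephole window:
   - MOV p,q immediately followed by MOV q,p  -> the second MOV is redundant
   - ADD x,0 or MUL x,1                       -> the instruction does nothing
   - MUL x,2                                  -> rewritten as SHL x,1          */
static int peephole(PInstr *code, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if ((strcmp(code[i].op, "ADD") == 0 && strcmp(code[i].opnd2, "0") == 0) ||
            (strcmp(code[i].op, "MUL") == 0 && strcmp(code[i].opnd2, "1") == 0))
            continue;                                /* x := x + 0, x := x * 1 */
        if (strcmp(code[i].op, "MUL") == 0 && strcmp(code[i].opnd2, "2") == 0) {
            strcpy(code[i].op, "SHL");               /* strength reduction     */
            strcpy(code[i].opnd2, "1");
        }
        if (out > 0 &&
            strcmp(code[i].op, "MOV") == 0 &&
            strcmp(code[out - 1].op, "MOV") == 0 &&
            strcmp(code[i].opnd1, code[out - 1].opnd2) == 0 &&
            strcmp(code[i].opnd2, code[out - 1].opnd1) == 0)
            continue;                                /* redundant load/store   */
        code[out++] = code[i];
    }
    return out;                                      /* new instruction count  */
}

int main(void)
{
    PInstr code[] = {
        { "MOV", "R0", "a"  },
        { "MOV", "a",  "R0" },   /* deleted: a already matches R0 */
        { "ADD", "x",  "0"  },   /* deleted: adds nothing         */
        { "MUL", "y",  "2"  },   /* rewritten as SHL y, 1         */
        { "MOV", "R1", "b"  },
    };
    int n = peephole(code, 5);
    for (int i = 0; i < n; i++)
        printf("%s %s, %s\n", code[i].op, code[i].opnd1, code[i].opnd2);
    return 0;
}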

8.1.3.1.4 Loop optimizations

These act on the statements that make up a loop, such as a for loop (e.g., loop-invariant code motion). Loop optimizations can have a significant impact because many programs spend a large percentage of their time inside loops.

8.1.3.1.5 Interprocedural or whole-program optimization

These analyze all of a program's source code. The greater quantity of information extracted means that optimizations can be more effective than when they only have access to local information (i.e., within a single function). This kind of optimization can also allow new techniques to be performed, for instance function inlining, where a call to a function is replaced by a copy of the function body.
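As a small source-level illustration of inlining (the function name square is invented for this example), replacing a call by the callee's body removes the call overhead and exposes further optimizations such as constant folding:

#include <stdio.h>

static int square(int x) { return x * x; }

int main(void)
{
    int v = 5;

    /* Before inlining: an explicit call. */
    int a = square(v);

    /* After inlining: the function body is substituted at the call site.
       With v known to be 5, a constant-folding pass could reduce this to
       b = 25;                                                             */
    int b = v * v;

    printf("%d %d\n", a, b);
    return 0;
}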

8.1.3.1.6 Machine code optimization

These techniques analyze the executable task image of the program after all of the executable machine code has been linked. Some techniques that can be applied in a more limited scope, such as macro compression (which saves space by collapsing common sequences of instructions), are more effective when the entire executable task image is available for analysis. [1]

In addition to scoped optimizations, there are two further general categories of optimization:

8.1.3.2 Programming language-independent vs. language-dependent

Most high-level languages share common programming constructs and abstractions: decision (if, switch, case), looping (for, while, repeat..until, do..while), and encapsulation (structures, objects). Thus similar optimization techniques can be used across languages. However, certain language features make some kinds of optimization difficult. For instance, the existence of pointers in C and C++ makes it difficult to optimize array accesses (see alias analysis). However, languages such as PL/1, which also supports pointers, nevertheless have sophisticated optimizing compilers that achieve better performance in various other ways. Conversely, some language features make certain optimizations easier. For example, in some languages functions are not permitted to have side effects; therefore, if a program makes several calls to the same function with the same arguments, the compiler can immediately infer that the function's result need be computed only once.

8.1.3.3 Machine independent vs. machine dependent

Many optimizations that operate on abstract programming concepts (loops, objects, structures) are independent of the machine targeted by the compiler, but many of the most effective optimizations are those that best exploit special features of the target platform. The following is an instance of a local, machine-dependent optimization. To set a register to 0, the obvious way is to use the constant '0' in an instruction that sets a register value to a constant. A less obvious way is to XOR a register with itself; it is up to the compiler to know which instruction variant to use. On many RISC machines both instructions would be equally appropriate, since they would be the same length and take the same time. On many other microprocessors, such as the Intel x86 family, it turns out that the XOR variant is shorter and probably faster, as there is no need to decode an immediate operand or use an internal "immediate operand register". (A potential problem with this is that XOR may introduce a data dependency on the previous value of the register, causing a pipeline stall; however, processors often treat XOR of a register with itself as a special case that does not cause stalls.)
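The two register-zeroing choices above can be pictured at the source level as follows. This is only an illustration of the pattern the code generator selects between; the exact instructions and their encodings are machine-specific.

#include <stdio.h>

int main(void)
{
    int r;

    r = 0;       /* the obvious way: load the constant 0                      */
    r = r ^ r;   /* the XOR trick: on x86 this avoids encoding an immediate
                    operand, so the emitted instruction is shorter            */

    printf("%d\n", r);
    return 0;
}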

Some further common optimization techniques are listed below.

Performing operations at compile time (if possible): Computations and type conversions on constants, and the computation of addresses of array elements with constant indexes, can already be performed by the compiler.

Value propagation: Tracing the value assigned to a variable and substituting that value at later uses of the variable (as in constant propagation).

Inlining small functions: Repeatedly inserting the function code instead of calling it saves the calling overhead and enables further optimizations. Inlining large functions, however, would make the executable too large.

Code hoisting: Moving as much computation as possible outside loops saves computing time. In the following example, (2.0 * PI) is a loop-invariant expression, and there is no reason to recompute it 100 times:

DO I = 1, 100
  ARRAY(I) = 2.0 * PI * I
ENDDO

Introducing a temporary variable t, it can be transformed to:

t = 2.0 * PI
DO I = 1, 100
  ARRAY(I) = t * I
ENDDO

Dead store elimination: If the compiler detects variables that are never used, it may safely ignore many of the operations that compute their values. Such operations cannot be ignored if (non-intrinsic) function calls are involved: those functions have to be called because of their possible side effects. Remember that before Fortran 95, Fortran did not have the concept of a "pure" function. Programs used as performance tests, which perform no 'real' computation, should be written so that they are not 'completely optimized out'; writing the 'results' to the screen or to a file may be enough to fool the compiler.
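For readers working in C rather than Fortran, the same loop-invariant code motion can be pictured as below; the array size and the names are chosen only for this illustration.

#include <stdio.h>

#define PI 3.14159265358979

int main(void)
{
    double array[100];

    /* Before hoisting: 2.0 * PI is recomputed on every iteration. */
    for (int i = 1; i <= 100; i++)
        array[i - 1] = 2.0 * PI * i;

    /* After hoisting: the invariant product is computed only once. */
    double t = 2.0 * PI;
    for (int i = 1; i <= 100; i++)
        array[i - 1] = t * i;

    printf("%f\n", array[99]);
    return 0;
}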

Strength reduction: (see Reduction in Strength under peephole optimizations above).

Taking advantage of the machine architecture: A simple example only; the subject is clearly too machine-dependent and too technical for more than that. Register operations are much faster than memory operations, so all compilers try to keep heavily used data, such as temporary variables and array indexes, in registers. To facilitate such register scheduling, the largest sub-expressions may be computed before the smaller ones.

Elimination of Common Sub-expressions (CSE)

A. Construction of the tuple CSE_DAG:
1. Initialize a table that lists the node currently defining each variable or temporary.
2. Each CSE_DAG node consists of an operand or operator, a name list, and a node number. If the node holds an operator, it also has edges to the appropriate operand nodes. Initially, pre-defined variables (assigned prior to this basic block) are unconnected nodes in the graph.
3. For each tuple:
   - Identify, from the table, the nodes currently defining each operand.
   - If no node joins the two operand nodes with the correct operator, create one.
   - If the tuple destination is a temporary, add the destination to the operator node's name list and update the entry in the node table for the destination temporary.
   - If the tuple destination is a variable and the operator node is not a child, add the destination to the operator node's name list and update the entry in the node table for the destination variable.
   - If the tuple destination is a variable and the operator node is a child, create a new node for the destination variable using the = operator.

B. Creating the optimized tuples from the CSE_DAG: Traverse the CSE_DAG operator nodes in the order they were created; for each operator node:
1. Construct a tuple using the first variable name as destination. If no variables are listed, use the first temporary.
2. If more than one variable name is present, add assignment tuples for each extra name.

A rough C sketch of part A of this construction is given below.
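The following is a simplified sketch of part A, not the required implementation: the fixed-size arrays, the 16-character names, and the decision to simply attach every destination name to the operator node (glossing over the variable-versus-temporary cases of step A.3) are assumptions made for the illustration. It builds the DAG for Example #1, given later in this handout, and reports how many operator nodes were needed.

#include <stdio.h>
#include <string.h>

#define MAXN   64
#define MAXSYM 64

/* One CSE_DAG node: a leaf (initial variable) or an operator node with
   edges to its operands, plus the list of names currently attached to it. */
typedef struct {
    char op;             /* '\0' for a leaf, otherwise '+', '*', ...      */
    int  left, right;    /* operand node indices, -1 for a leaf           */
    char leaf[16];       /* leaf name when op == '\0'                     */
    char names[8][16];   /* variables/temporaries defined by this node    */
    int  n_names;
} DagNode;

static DagNode dag[MAXN];
static int     n_nodes;

/* Table listing the node that currently defines each variable/temporary. */
static char sym_name[MAXSYM][16];
static int  sym_node[MAXSYM];
static int  n_syms;

static void attach_name(int n, const char *name)
{
    strcpy(dag[n].names[dag[n].n_names++], name);
}

static void set_def(const char *name, int n)   /* update the node table */
{
    for (int i = 0; i < n_syms; i++)
        if (strcmp(sym_name[i], name) == 0) { sym_node[i] = n; return; }
    strcpy(sym_name[n_syms], name);
    sym_node[n_syms++] = n;
}

/* Node currently defining 'name'; create a leaf node if none exists yet. */
static int node_for(const char *name)
{
    for (int i = 0; i < n_syms; i++)
        if (strcmp(sym_name[i], name) == 0)
            return sym_node[i];
    int n = n_nodes++;
    dag[n].op = '\0';
    dag[n].left = dag[n].right = -1;
    strcpy(dag[n].leaf, name);
    set_def(name, n);
    return n;
}

/* Process a tuple "op a1 a2 dst": reuse an existing node with the same
   operator and operands (the common sub-expression), else create one.    */
static void tuple(char op, const char *a1, const char *a2, const char *dst)
{
    int l = node_for(a1), r = node_for(a2), n = -1;
    for (int i = 0; i < n_nodes; i++)
        if (dag[i].op == op && dag[i].left == l && dag[i].right == r) { n = i; break; }
    if (n < 0) {
        n = n_nodes++;
        dag[n].op = op;  dag[n].left = l;  dag[n].right = r;
    }
    attach_name(n, dst);
    set_def(dst, n);
}

static void copy(const char *src, const char *dst)   /* models an "= src 0 dst" tuple */
{
    int n = node_for(src);
    attach_name(n, dst);
    set_def(dst, n);
}

int main(void)
{
    /* Example #1: d := (a+b)*c;  e := a+b;  f := (a+b)*c; */
    tuple('+', "a",  "b", "t1");  tuple('*', "t1", "c", "t2");  copy("t2", "d");
    tuple('+', "a",  "b", "t3");  copy("t3", "e");
    tuple('+', "a",  "b", "t4");  tuple('*', "t4", "c", "t5");  copy("t5", "f");

    int ops = 0;
    for (int i = 0; i < n_nodes; i++)
        if (dag[i].op != '\0') ops++;
    printf("%d operator nodes built (the three 'a + b' tuples share one)\n", ops);
    return 0;
}

Part B would then walk dag[] in creation order, emitting one tuple per operator node and using the first variable name attached to a node as its destination.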

For example, consider the expression

X = A * LOG(Y) + (LOG(Y) ** 2)

We introduce an explicit temporary variable t:

t = LOG(Y)
X = A * t + (t ** 2)

We have saved one 'heavy' function call by eliminating the common sub-expression LOG(Y); now we can also save the exponentiation by writing

X = (A + t) * t

which is much better. The compiler may do all of this automatically, so do not waste too much energy on such transformations.

A classic example: computing the value of a polynomial

Eliminating common sub-expressions can inspire good algorithms, like the classic Horner's rule for computing the value of a polynomial:

y = A + B*x + C*(x**2) + D*(x**3)    (canonical form)

It is more efficient (i.e., it executes faster) to perform the two exponentiations by converting them to multiplications; that way we get 3 additions and 5 multiplications in all. The following forms are still more efficient to compute: they require fewer operations, and the operations saved are the 'heavy' ones (a multiplication takes much more CPU time than an addition).

Stage #1:           y = A + (B + C*x + D*(x**2))*x
Stage #2 (final):   y = A + (B + (C + D*x)*x)*x

The last form requires 3 additions and only 3 multiplications. The algorithm hinted at here can be implemented with one loop to compute a polynomial of arbitrary order, and it may also be numerically better than direct computation of the canonical form.
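The one-loop form of Horner's rule mentioned above can be sketched in C as follows; the coefficients are stored from the constant term upward, and the function name is chosen only for this example.

#include <stdio.h>

/* Evaluate A + B*x + C*x^2 + D*x^3 with Horner's rule:
   y = A + (B + (C + D*x)*x)*x  -- 3 additions and 3 multiplications. */
static double horner(const double *coef, int degree, double x)
{
    double y = coef[degree];           /* start with the highest-degree coefficient */
    for (int i = degree - 1; i >= 0; i--)
        y = y * x + coef[i];
    return y;
}

int main(void)
{
    double coef[] = { 1.0, 2.0, 3.0, 4.0 };    /* A, B, C, D */
    printf("%f\n", horner(coef, 3, 2.0));      /* 1 + 2*2 + 3*4 + 4*8 = 49 */
    return 0;
}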

Example #1:
d := (a + b) * c;
e := a + b;
f := (a + b) * c;

tuples:
+ a b t1
* t1 c t2
= t2 0 d
+ a b t3
= t3 0 e
+ a b t4
* t4 c t5
= t5 0 f

Example #2:
d := (a + b) * c;
c := a + b;
f := (a + b) * c;

tuples:
+ a b t1
* t1 c t2
:= t2 0 d
+ a b t3
:= t3 0 c
+ a b t4
* t4 c t5
:= t5 0 f

Example #3:
a := b * c + d * e;
f := c + e * d;
g := b * c + d * e * f;
h := f * b * g;
c := c + 1;
g := b * c + d * e * f;
h := d * e + d * e;

tuples:
* b c t1
* d e t2
+ t1 t2 t3
:= t3 0 a
* e d t4
+ c t4 t5
:= t5 0 f
* b c t6
* d e t7
* t7 f t8
+ t6 t8 t9
:= t9 0 g
* f b t10
* t10 g t11
:= t11 0 h
+ c 1 t12
:= t12 0 c
* b c t13
* d e t14
* t14 f t15
+ t13 t15 t16
:= t16 0 g
* d e t17
* d e t18
+ t17 t18 t19
:= t19 0 h
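These listings show only the unoptimized tuples. As a hand-worked illustration that is not part of the original text, applying the CSE_DAG procedure above to Example #1 could yield a sequence along the following lines, assuming each operator node is referred to by the first variable name attached to it:

+ a b e     (the single '+' node, which also carries t1, t3 and t4)
* e c d     (the single '*' node, which also carries t2 and t5)
= d 0 f     (extra variable name f on the '*' node)

Note that in Example #2 the variable c is reassigned between the two uses of (a + b) * c, so the two '*' tuples must not be merged; keeping the node table of step A.3 up to date is exactly what prevents this.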

8.1.4 Assignment Questions:
1. What is code optimization?
2. List the principal sources of code optimization.
3. What is meant by loop optimization?
4. Define optimizing compilers.
5. Give the organization of code optimizers.
6. What are the two levels of code optimization techniques?
7. What are the phases of code optimization?
8. Define local optimization.
9. Define global optimization.
10. What is code motion?