Lesson 7: Code optimisation. Compilers. Code optimisation. Code optimisation. Kinds of optimisations. Introductory concepts

Similar documents
8 Optimisation. 8.2 Machine-Independent Optimisation

8 Optimisation. 8.2 Machine-Independent Optimisation

8 Optimisation. 8.1 Introduction to Optimisation

COMPILER DESIGN - CODE OPTIMIZATION

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

Computer Architecture and Assembly Language. Practical Session 3

Computer Architecture and System Programming Laboratory. TA Session 3

Compiler Theory. (Intermediate Code Generation Abstract S yntax + 3 Address Code)

Machine-Independent Optimizations

9/21/17. Outline. Expression Evaluation and Control Flow. Arithmetic Expressions. Operators. Operators. Notation & Placement

Basic Assembly Instructions

Tour of common optimizations

Code Generation. CS 540 George Mason University

Program Optimization. Jo, Heeseung

Intermediate Code Generation

THEORY OF COMPILATION

Compiler Design and Construction Optimization

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space

4. An interpreter is a program that

Informatics Ingeniería en Electrónica y Automática Industrial

Islamic University Gaza Engineering Faculty Department of Computer Engineering ECOM 2125: Assembly Language LAB. Lab # 8. Conditional Processing

Programming Language Implementation

Compiler Architecture

IR Lowering. Notation. Lowering Methodology. Nested Expressions. Nested Statements CS412/CS413. Introduction to Compilers Tim Teitelbaum

Lexical Considerations

Code generation and optimization

Expression Evaluation and Control Flow. Outline

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

Notes on the Exam. Question 1. Today. Comp 104:Operating Systems Concepts 11/05/2015. Revision Lectures (separate questions and answers)

Using Static Single Assignment Form

CP FAQS Q-1) Define flowchart and explain Various symbols of flowchart Q-2) Explain basic structure of c language Documentation section :

Compiler Construction 2010/2011 Loop Optimizations

EECE.3170: Microprocessor Systems Design I Summer 2017 Homework 4 Solution

Comp 204: Computer Systems and Their Implementation. Lecture 25a: Revision Lectures (separate questions and answers)

16.10 Exercises. 372 Chapter 16 Code Improvement. be translated as

Compiler Construction 2016/2017 Loop Optimizations

Principles of Computer Science

UNIT IV INTERMEDIATE CODE GENERATION

Faculty of Engineering Student Number:

University of Technology Department of Computer Sciences. Final Examination st Term. Subject:Compilers Design

Program Verification using Templates over Predicate Abstraction. Saurabh Srivastava and Sumit Gulwani

Induction Variable Identification (cont)

Operators in C. Staff Incharge: S.Sasirekha

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 19: Efficient IL Lowering 5 March 08

CSc 453. Compilers and Systems Software. 13 : Intermediate Code I. Department of Computer Science University of Arizona

Group B Assignment 9. Code generation using DAG. Title of Assignment: Problem Definition: Code generation using DAG / labeled tree.

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08

Practical Malware Analysis

3.1 DATA MOVEMENT INSTRUCTIONS 45

complement) Multiply Unsigned: MUL (all operands are nonnegative) AX = BH * AL IMUL BH IMUL CX (DX,AX) = CX * AX Arithmetic MUL DWORD PTR [0x10]

Programming for Engineers Iteration

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Inline Assembler. Willi-Hans Steeb and Yorick Hardy. International School for Scientific Computing

Assignment 3. Problem Definition:

1 Lexical Considerations

Load Effective Address Part I Written By: Vandad Nahavandi Pour Web-site:

CS553 Lecture Generalizing Data-flow Analysis 3

Group B Assignment 8. Title of Assignment: Problem Definition: Code optimization using DAG Perquisite: Lex, Yacc, Compiler Construction

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

x86 assembly CS449 Fall 2017

INSTITUTE OF AERONAUTICAL ENGINEERING

Cache Memories. Lecture, Oct. 30, Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition

A flow chart is a graphical or symbolic representation of a process.

Module 2 - Part 2 DATA TYPES AND EXPRESSIONS 1/15/19 CSE 1321 MODULE 2 1

Question 1. Notes on the Exam. Today. Comp 104: Operating Systems Concepts 11/05/2015. Revision Lectures

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

Unit 3. Operators. School of Science and Technology INTRODUCTION

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Condition-Controlled Loop. Condition-Controlled Loop. If Statement. Various Forms. Conditional-Controlled Loop. Loop Caution.

QUIZ: What value is stored in a after this

CS 701. Class Meets. Instructor. Teaching Assistant. Key Dates. Charles N. Fischer. Fall Tuesdays & Thursdays, 11:00 12: Engineering Hall

Lab 6: Conditional Processing

CS110D: PROGRAMMING LANGUAGE I

UNIT-3. (if we were doing an infix to postfix translator) Figure: conceptual view of syntax directed translation.

The PCAT Programming Language Reference Manual


CSCI 127 Introduction to Database Systems

Intel386 TM DX MICROPROCESSOR SPECIFICATION UPDATE

Control flow and loop detection. TDT4205 Lecture 29

Winter Compiler Construction T11 Activation records + Introduction to x86 assembly. Today. Tips for PA4. Today:

Variables vs. Registers/Memory. Simple Approach. Register Allocation. Interference Graph. Register Allocation Algorithm CS412/CS413

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

RAQUEL s Relational Operators

A main goal is to achieve a better performance. Code Optimization. Chapter 9

COMPILER CONSTRUCTION (Intermediate Code: three address, control flow)

Repetition and Loop Statements Chapter 5

Compilers. Lecture 2 Overview. (original slides by Sam

UNIT- 3 Introduction to C++

Language Extensions for Vector loop level parallelism

Basic operators, Arithmetic, Relational, Bitwise, Logical, Assignment, Conditional operators. JAVA Standard Edition

LABORATORY WORK NO. 7 FLOW CONTROL INSTRUCTIONS

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

Introduction to Visual Basic and Visual C++ Arithmetic Expression. Arithmetic Expression. Using Arithmetic Expression. Lesson 4.

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2016 Lecture 12

Second Part of the Course

CONTENTS: Compilation Data and Expressions COMP 202. More on Chapter 2

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

Transcription:

Compilers Lesson 7: Code optimisation Third year Spring term Alfonso Ortega: alfonso.ortega@uam.es Enrique Alfonseca: Enrique.Alfonseca@uam.es Code optimisation Introductory concepts This process can be performed in three places: Between the semantic analysis and the code generation, to obtain optimised quadruples. During the generation of the code. After the generation (by optimising the code just produced) Perfect code optimisation is an undecidable problem (Aho, 70). Therefore, It can only be performed partially. Optimisation and debugging are not compatible. For instance, the optimisation may reorder the instructions, or eliminate the code associated to a part of the code, so it will be much more difficult to debug the program. Code optimisation The different kinds of optimisation can be grouped in the following types: Machine-dependent: Assignment of registers Special instructions Code reordering Kinds of optimisations Machine-independent: Execution at compilation time. Elimination of redundancies Reordering of operations Frequency reduction Strength reduction 3 4

Machine-dependent code optimisation Special instructions Most assemblers provide special instructions that facilitate the generation of the code in assembly language. For instance, in nasm loop, loope, loopz, loopne, loopnz. implement loops with a counter that is decremented at each iteration, even though there is no corresponding instruction in machine code. 5 Machine-dependent code optimisation Code reordering The underlying idea is that the order of the assembly instructions may influence the size of the final program. The following is an example: The following ASPLE code a:=b/c; d:=b%c; can produce the following code in assembly language mov dword eax, [b] cdq idiv dword eax, [c] mov dword [a], eax mov dword eax, [b] cdq idiv dword eax, [c] mov dword [d], edx 6 Machine-dependent code optimisation That code is not as efficient as it could be, as the instruction idiv leaves the remainder of the division in edx, and it is not necessary to repeat it. The following code is equivalent: mov dword eax, [b] cdq idiv dword eax, [c] mov dword [a], eax mov dword [d], edx Code reordering Execution during the compilation The motivation of this kind of optimisation is that there are some expressions whose value can be determined by the compiler during execution time. To do that, it is necessary to know the value of each variable at each point in the program: It can be stored in the symbols table or in a special table used just for this purpose. This optimisation is usually applied: To arithmetic and logic operations. To type conversions. 7 8

Execution during the compilation The following algorithm will be applied to each of the quadruples: (op, op1, op2, res), where op1 is an identifier, and (op1,v1) is in the symbols table T We substitute the quadruple op1 by v1. (op, op1, op2, res), where op2 is an identifier, and (op2,v2) is in the symbols table T We substitute the quadruple op2 by v2. (op, v1, v2, res), where v1 and v2 are constant values We eliminate the quadruple, remove from T the pair (res, v), if it exists, and add to T the pair (res, v1 op v2), unless v1 op v2 produces an error. In this case, we shall output a warning message and leave the quadruple as it was. Example: if (false) f = 1/0; (warning, not error message) (=, v1,, res), remove from T the pair (res, v), if exists. If v1 is a constant value, add to T the pair (res, v1). (+,2,3,t1) (=,t1,,i) (=,4,,i) (CIF,i,,t2) (+,t2,2.5,t3) (=,t3,,f) Optimised quadruples: Execution during the compilation Remove quadruple, T = {(t1,5)} Subst. by (=,5,,i), T = {(t1,5),(i,5)} T = {(t1,5),(i,4)} Subst. by (CIF,4,,t2), Rem. quadruple, T = {(t1,5),(i,4),(t2,4.0)} Subst. by (+,4.0,2.5,t3) Rem. quadruple, T = {(t1,5),(i,4),(t2,4.0),(t3,6.5)} Subst. by (=,6.5,,f) (=,5,,i) (=,4,,i) (=,6.5,,f) 9 10 Execution during the compilation: observations The compiler must have the values stored in the table permanently, and it has to be correctly updated. Whenever a variable may possibly change its value, it has to keep track of it: After a label, every value is forgotten. Loops Function calls etc. Problem: if the compiler executes in a computer, and generates code for a different computer, the first compiler might have less precision than the second one. int a,b,c,d; Removing redundancies: introductory example a = a+b*c; (*,b,c,t1) (*,b,c,t1) (+,a,t1,t2) (+,a,t1,t2) (=,t2,,a) (=,t2,,a) d = a+b*c; (*,b,c,t3) (+,a,t3,t4) (+,a,t1,t4) (=,t4,,d) (=,t4,,d) b = a+b*c; (*,b,c,t5) (+,a,t5,t6) (=,t6,,b) (=,t4,,b) 11 12

Removing redundancies: algorithm Each of the variable in the symbols table is assigned the dependency -1. Number the quadruples. for (i=0; i< number-of-quadruples ; i++) { The identifier that stores the result of quadruple (i) is assigned the dependency i. The quadruple number (i) will be assigned as dependency: 1 + max. number of dependencies of the operands. If Either quadruple i is not an assignment, and it has the same operation code and operands as j, j<i, Or quadruple i is an assignment, and it has the same operation code, operands, and result variable, and the dependencies of both quadruples are the same, then: Substitute the quadruple i by a null quadruple (SAME,j,0,0), which will not generate code. In the next quadruples, we are going to substitute the result of quadruple i with the result of quadruple j. VAR 0 * b c t1 1 + a t1 t2 2 = t2 a 3 * b c t3 INITIAL SET OF QUADRUPLES a -1 13 14 i VAR 0 * b c a -1 1 + a t1 t2 2 = t2 a 3 * b c t3 VAR 0 * b c a -1 1 + a t1 2 = t2 a 3 * b c t3 QUADRUPLE 0 QUADRUPLE 1 15 16

VAR 0 * b c a -1 2 1 + a t1 3 * b c t3 VAR 0 * b c a -1 2 1 + a t1 3 * b c t3 0 QUADRUPLE 2 QUADRUPLE 3.1 17 18 VAR 0 * b c a -1 2 1 + a t1 3 * b c t3 0 SAME 0 - - QUADRUPLE 3.2 VAR 0 * b c a -1 2 1 + a t1 QUADRUPLE 3.3 19 20

VAR 0 * b c 1 + a t1 t1 QUADRUPLE 4.1 VAR 0 * b c 1 + a t1 t1 3 QUADRUPLE 4.2 21 22 VAR 0 * b c 1 + a t1 5 4 + a t1 t4 3 5 QUADRUPLE 5 VAR 0 * b c 1 + a t1 d 5 4 + a t1 t4 3 5 0 SAME 0 - - t5 6 QUADRUPLE 6 23 24

VAR 0 * b c 1 + a t1 d 5 4 + a t1 t4 3 5 6 SAME 0 - - (0) 3 t1 t5 6 t6 7 QUADRUPLE 7.1 VAR 0 * b c 1 + a t1 d 5 4 + a t1 t4 3 5 6 SAME 0 - - (0) 7 + a t1 t6 3 SAME 4 - - t5 6 t6 7 QUADRUPLE 7.2 25 26 VAR 0 * b c 1 + a t1 d 5 4 + a t1 t4 3 5 6 SAME 0 - - (0) 7 SAME 4 - - (3) 5 t5 6 t4 t6 7 QUADRUPLE 8.1 VAR 0 * b c 1 + a t1 8 d 5 4 + a t1 t4 3 5 6 SAME 0 - - (0) 7 SAME 4 - - (3) 8 = t4 b 5 t5 6 t6 7 QUADRUPLE 8.2 27 28

Reordering operations (* b c t1) (+ a t1 t2) (= t2 a) (+ a t1 t4) (= t4 d) (= t4 b) FINAL RESULT t1 = b * c a = a + t1 d = a + t1 b = t4 We can take advantage of the commutability and associability of some operations to obtain a code which is more efficient, without altering the result. For instance, let us consider the following quadruple: (*, b, c, -) (*, c, b, -) We shall see the following possibilities: Fixing a canonical ordering of the operands in commutative operations. Maximising the use of monadic operators, to increase the number of equivalent quadruples. Reusing intermediate variables, when applying the properties of the operations, to transform the initial expressions. 29 30 Reordering operations We can order the operands according to the following order: 1. Terms that are neither variables nor constants. 2. Variables, by alphabetical order. 3. Constants. Example: a=1+c+d+3; a=c+d+1+3; b=d+c+2; b=c+d+2; In this case, the reordering allows us: To optimise 1+3 because it will be done during compilation time. Optimise the calculations, because c+d will be identified as common parts of those expressions. We shall not attempt to optimise every possible situation. For instance: a=1+c+d+3; a=c+d+1+3; b=d+c+c+d; b=c+c+d+d In this case, one of the two c+d is not recognised as common part of those expressions. Reordering operations: increasing the use of unary operators Some sub-expressions may appear in other parts of the code, affected by a monadic operator. For instance, in this example, we have c-d and d-c: a = c-d; (-,c,d,t1) (-,c,d,t1) (=,t1,,a) (=,t1,,a) b = d-c; (-,d,c,t2) (-,t1,,t2) (=,t2,,b) (=,t2,,b) In this case, the optimisation does not reduce the number of clauses, but a clause is substituted by other which is more efficient. 31 32

Reordering operations: reducing intermediate variables Note that reducing the number of intermediate variables may be incompatible with other kinds of optimisation. The procedure will be illustrated with one example. Consider the following expression: (a*b)+(c+d) Because addition is associative: (a*b)+(c+d) ((a*b)+c )+d We have two possible sets of tuples for this expression: (*,a,b,t1) (*,a,b,t1) (+,c,d,t2) (+,t1,c,t1) (+,t1,t2,t1) (+,t1,d,t1) Because addition is commutative, (a+b)+(c*d) (c*d)+(a+b) the following are alternative sets of tuples for the expression: (+,a,b,t1) (*,c,d,t2) (+,t1,t2,t1) (*,c,d,t1) (+,a,t1,t1) (+,t1,b,t1) Algorithm for calculating the min. number of variables required 1. Build a graph for the expression. 2. If (j,k)are the labels of the children of node i: 3. If (j=k) 1. Associate (k+1) to node i. 4. Otherwise, associate max(j,k)to node i. 33 34 : example (a*b)+(c+d) ((a*b)+c)+d +(2) +(1) --------------------- -------------------- *(1) +(1) +(1) d(0) ------------- ------------ ------------------- a(0) b(0) c(0) d(0) *(1) c(0) ------------- a(0) b(0) (a+b)+(c*d) a+(c*d)+b +(2) +(1) ----------------- ------------------ +(1) *(1) +(1) b(0) ------------- --- --------- ------------------ a(0) b(0) c(0) d(0) a(0) *(1) ------------ c(0) d(0) 35 Loop optimisation It is possible to optimise the execution of loops in the following two ways: 1. By loop invariance: An operation is invariant with respect to a loop if none of its operands changes its value during the execution of the loop. In these cases, it is possible to take the comparison out of the loop, because it is not necessary to execute it inside the loop. 2. By strength reduction: This consists in substituting, inside the loop, a strong operation (one that is computationally expensive, such as the product) with a weak operation (a less expensive one), such as the addition or the change of sign. 36

Optimising loops with strength reduction In the following loop: for (i=a; i<c; i+=b) { d=i*k; } the variable d receives the following values: d=a*k d=(a+b)*k d=(a+b+b)*k We can assume that b,k are invariant inside the loop (if b is an expression, all its operands should be invariant) i is not modified inside the body of the loop. It only changes in the instruction i+=b d is never used before the instruction, and it is not modified afterwards. Optimising loops with strength reduction In this case, we can substitute the loop with the following code: d=a*k; t1=b*k; for (i=a; i<c; i+=b, d+=t1) {} In this case, the values of d are the same as before: d=a*k d=a*k+b*k d=a*k+b*k+b*k But, at each iteration of the loop, rather than executing a product we only need to execute an addition. 37 38 Optimising loops with strength reduction: generalisation INIT (*, vb, ib, v) (*, vb, ib, v) (*, inc, ib, t) INCR (+, v, t, v) (+, vb, ib, v) INIT INCR (+, vb, ib, v) (+, v, inc, v) vb variable of the loop ib invariant variable in the loop inc increment of vb 39 Optimising loops with strength reduction: example In the following loop: for (i=0; i<10; i++) { a=(b+c*i)*d; } The variables b, c, d are invariant in the loop. We only need to identify in each sentence which is invariant, which is the increment, and which is the loop variable. INIT: (=, 0,,,i) LOOP: (*, c, i, t1) INCR: (+, i, 1, i) inc vb INIT: (=, 0,, i) (*, c, i, t1) (*, c, 1, t4) LOOP: INCR: (+, i, 1, i) (+, t1, t4, t1) 40

INIT: (=, 0,, i) (*, c, 0, t1) (*, c, 1, t4) LOOP: INCR: (+, i, 1, i) (+, t1, t4, t1) Optimising loops with strength reduction: example inc vb INIT: (=, 0,, i) (*, c, 0, t1) (*, c, 1, t4) LOOP: INCR: (+, i, 1, i) (+, t1, t4, t1) (+, t2, t4, t2) This is not used inside the loop, so we may remove it 41 INIT: (=, 0,, i) (*, c, 0, t1) (*, c, 1, t4) LOOP: INCR: (+, i, 1, i) (+, t2, t4, t2) Optimising loops with strength reduction: example inc vb INIT: (=, 0,, i) (*, c, 0, t1) (*, c, 1, t4) (*, t4, d, t5) LOOP: INCR: (+, i, 1, i) (+, t2, t4, t2) (+, t3, t5, t3) This is not used inside the loop, so we may remove it 42 Final result: Optimising loops with strength reduction: example INIT: (=, 0,, i) (*, c, 0, t1) (*, c, 1, t4) (*, t4, d, t5) LOOP: INCR: (+, i, 1, i) (+, t3, t5, t3) Optimising loops with strength reduction: example If we have nested loops, strength reduction should be applied starting in the inner ones. If there are functions calls inside the loop, The compiler must know whether the function receives the reference of a variable (in this case it may not be invariant). Equally, the function might change the value of global variables. It a loop will only be executed a few times, the optimisation will not be appreciated. It may even degrade the performance. This means that loop optimisation is not always recommended. Sometimes, it is performed in two steps: Analysing of the program, and gathering information about the variables. Loop optimisation. 43 44

Other optimisations Region optimisation: When we try to optimise a complete program, we apply all these optimisations, and the program is next represented as a graph, with blocks, indicating the program s control flow. There are techniques to simplify the blocks. Concerning dead assignments: In a program, a variable might receive a value, which is not used before the variable is assigned a different value. In this case, we say that the first assignment is a dead assignment, and we can remove it. To identify these circumstances in the whole program may be too costly, because it may be necessary to consider every possible control flow. 45