Compiler Design and Construction Optimization

Generating Code via Macro Expansion

Macroexpand each IR tuple or subtree. For example,

    A := B + C;
    D := A * C;

expands, tuple by tuple, into

    lw  $t0, B
    lw  $t1, C
    add $t2, $t0, $t1
    sw  $t2, A
    lw  $t0, A
    lw  $t1, C
    mul $t2, $t0, $t1
    sw  $t2, D

Generating Code via Macro Expansion

    D := (B + C) * C;

    t1 = B + C          lw  $t0, B
                        lw  $t1, C
                        add $t2, $t0, $t1
                        sw  $t2, t1

    t2 = t1 * C         lw  $t0, t1
                        lw  $t1, C
                        mul $t2, $t0, $t1
                        sw  $t2, t2

    D = t2              lw  $t0, t2
                        sw  $t0, D

Generating Code via Macro Expansion

- Macro expansion gives poor-quality code if each tuple is expanded separately
    - It ignores state (values already loaded)
- What if more than one tuple can be replaced with one instruction?
    - Powerful addressing modes
    - Powerful instructions
    - A loop construct that decrements, tests, and jumps if necessary
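The per-tuple expansion described above can be sketched in a few lines of Python. The tuple layout `(op, left, right, dest)` and the names `expand` and `macro_expand` are assumptions made for this illustration; they are not taken from the slides. The sketch reproduces the redundant reload of A that the slide points out.

```python
# Assumed tuple layout: (op, left, right, dest). Illustrative only.
MIPS_OP = {"+": "add", "*": "mul"}

def expand(ir_tuple):
    """Expand one IR tuple with a fixed template, ignoring register state."""
    op, left, right, dest = ir_tuple
    return [
        f"lw $t0, {left}",
        f"lw $t1, {right}",
        f"{MIPS_OP[op]} $t2, $t0, $t1",
        f"sw $t2, {dest}",
    ]

def macro_expand(tuples):
    """Concatenate the template expansion of every tuple."""
    code = []
    for t in tuples:
        code.extend(expand(t))
    return code

# A := B + C; D := A * C;
code = macro_expand([("+", "B", "C", "A"), ("*", "A", "C", "D")])
```

Instructions 4 and 5 of `code` store A and then immediately load it again, and C is loaded twice. This is the cost of expanding each tuple without remembering what is already in a register.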

Register and Temporary Management

- Efficient use of registers
    - Values used often remain in registers
    - Temp values are reused if possible
- Define register classes
    - Allocatable: explicitly allocated and freed
    - Reserved: have a fixed function
    - Volatile: can be used at any time; good for temp values (A := B)

Temporaries

- Usually in registers (for quick access)
- Storage temporaries
    - Reside in memory
    - Save registers or large data objects
- Pseudoregisters
    - Load into a volatile register, then save back out
    - Generates poor code
- Moving a temp from a register to a pseudoregister is called spilling

- A separate generator for each tuple
    - Modularized
    - Simpler
    - Harder to generate good code
    - Easy to add to yacc!
- A single generator
    - More complex

- Instruction selection
    - Addressing modes, intermediates
    - R-R, R-M, M-M, RI, ...
- Address-mode selection
    - Remember all the types!
- Register allocation
- These are tightly coupled
    - Address mode affects instruction
    - Instruction can affect register
- See the handout for a + code generator (following slides)
    - Doesn't handle 0 or the same operand twice

Code Generator for Integer Add (from Fischer and LeBlanc, Fig. 15.1)

Generate code for integer add: (+,A,B,C), where A and B are the operands and C is the destination.

Possible operand modes for A and B are:
    (1) Literal (stored in value field)
    (2) Indexed (stored in adr field as (Reg,Displacement) pair, indirect=f)
    (3) Indirect (stored in adr field as (Reg,Displacement) pair, indirect=t)
    (4) Live register (stored in Reg field)
    (5) Dead register (stored in Reg field)

Possible operand modes for C are:
    (1) Indexed (stored in adr field as (Reg,Displacement) pair, indirect=f)
    (2) Indirect (stored in adr field as (Reg,Displacement) pair, indirect=t)
    (3) Live register (stored in Reg field)
    (4) Unassigned register (stored in Reg field, when assigned)

(a) Swap operands (knowing addition is commutative)

    if (B.mode == DEAD_REGISTER || A.mode == LITERAL)
        Swap A and B;
    /* This may save a load or store, since addition overwrites the first operand. */

(b) Target result directly into C (if possible)

    switch (C.mode) {
    case LIVE_REGISTER:
        Target = C.reg;
        break;
    case UNASSIGNED_REGISTER:
        if (A.mode == DEAD_REGISTER)
            C.reg = A.reg;   /* Compute into A's reg, then assign it to C. */
        else
            Assign a register to C.reg;
        C.mode = LIVE_REGISTER;
        Target = C.reg;
        break;
    case INDIRECT:
    case INDEXED:
        if (A.mode == DEAD_REGISTER)
            Target = A.reg;
        else
            Target = v2;     /* vi is the i-th volatile register. */
        break;
    }

(c) Map operand B to the right operand of the add instruction (the "Source")

    if (B.mode == INDIRECT) {
        /* Use indexing to simulate indirection. */
        generate(load, B.adr, v1, );   /* v1 is a volatile register. */
        B.mode = INDEXED;
        B.adr = (v1, 0);               /* (Reg,Displacement) pair */
    }
    Source = B;

(d) Now generate the add instruction

    switch (A.mode) {   /* Load operand A (if necessary). */
    case LITERAL:
        if (B.mode == LITERAL)   /* Do we need to fold? */
            generate(load, #(A.val+B.val), Target, );
        else
            generate(load, #A.val, Target, );
        break;
    case INDEXED:
        generate(load, A.adr, Target, );
        break;
    case LIVE_REGISTER:
        generate(load, A.reg, Target, );
        break;
    case INDIRECT:
        generate(load, A.adr, v2, );
        t.reg = v2; t.displacement = 0;
        generate(load, t, Target, );
        break;
    case DEAD_REGISTER:
        if (Target != A.reg)
            generate(load, A.reg, Target, );
        break;
    }
    /* Note: if both operands were literals and were folded above,
       this add must be skipped, or B is added twice. */
    generate(add, Source, Target, );

(e) Store result into C (if necessary)

    if (C.mode == INDEXED)
        generate(store, C.adr, Target, );
    else if (C.mode == INDIRECT) {
        generate(load, C.adr, v3, );
        t.reg = v3; t.displacement = 0;
        generate(store, t, Target, );
    }

Improving Code

Removing extra loads and stores:

    r2 := r1 + 5              r2 := r1 + 5
    i  := r2           =>     i  := r2
    r3 := i                   r3 := r2 * 3
    r3 := r3 * 3

Copy propagation:

    r2 := r1                  r2 := r1                  r3 := r1 + r1
    r3 := r1 + r2      =>     r3 := r1 + r1      =>     r2 := 5
    r2 := 5                   r2 := 5

What about the following? Can we use the previous loads?

    if (?) then A := B + C;
    D := B * C;

Improving Code

Constant folding:

    r2 := 4 * 3        =>     r2 := 12

Constant propagation:

    r2 := 4                   r2 := 4                   r3 := r1 + 4
    r3 := r1 + r2      =>     r3 := r1 + 4       =>     r2 := ...
    r2 := ...                 r2 := ...
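Constant folding and constant propagation over one straight-line block can be sketched as a single forward pass. The tuple layout `(dest, op, a, b)`, with `op` set to `None` for a plain copy, and the function name `fold_and_propagate` are assumptions made for this illustration. Removing the assignment that becomes dead afterwards is left to a separate pass.

```python
def fold_and_propagate(block):
    """block: list of (dest, op, a, b); op is '+', '*', or None for 'dest := a'.

    Operands are ints (literals) or register names (strings).
    """
    consts = {}   # register name -> constant it is known to hold
    out = []
    for dest, op, a, b in block:
        # Constant propagation: replace registers with known constant values.
        a = consts.get(a, a)
        b = consts.get(b, b)
        # Constant folding: both operands are literals, so compute now.
        if op is not None and isinstance(a, int) and isinstance(b, int):
            a = a + b if op == "+" else a * b
            op, b = None, None
        # Record or invalidate what we know about dest.
        if op is None and isinstance(a, int):
            consts[dest] = a
        else:
            consts.pop(dest, None)
        out.append((dest, op, a, b))
    return out

block = [
    ("r2", "*", 4, 3),         # r2 := 4 * 3
    ("r3", "+", "r1", "r2"),   # r3 := r1 + r2
    ("r2", None, "r5", None),  # r2 := r5 (r2 is redefined)
]
result = fold_and_propagate(block)
```

Here the first tuple folds to `r2 := 12` and the second becomes `r3 := r1 + 12`. Because r2 is redefined before any other use, the folded assignment is now dead.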

Redundant computations

- Common subexpression (CSE)

      A := b + c;
      D := 3 * (b + c);

    - b+c is already calculated, so don't calculate it again
- But what about this?

      A := b + c;
      b := 3;
      D := 3 * (b + c);

    - Need to know whether the CSE is alive or dead
    - This also applies to copy propagation
- Array indexing often causes CSEs

Redundant computations

- Common subexpression (CSE)

      A := b + c;               A := b + c;
      D := b + f + c;    =>     D := A + f;

    - b+c is already calculated, so don't calculate it again
- The problem is to identify the CSEs
    - Store A+B+C, A+C+B, and B+C+A all in the same form
    - E = A + C; D = A + B + C
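Both concerns raised above, storing commutative expressions in one canonical form and killing a CSE when one of its operands is reassigned, can be sketched for a single basic block. The tuple layout `(dest, op, operands)` and the name `local_cse` are assumptions made for this illustration; sorting the operand names is just one possible canonical form.

```python
COMMUTATIVE = {"+", "*"}

def local_cse(block):
    """block: list of (dest, op, operands); operands is a tuple of names."""
    available = {}   # canonical expression -> variable that holds its value
    out = []
    for dest, op, operands in block:
        # Canonical form: sorted operands, so b+c and c+b share one key.
        key = (op, tuple(sorted(operands))) if op in COMMUTATIVE else None
        holder = available.get(key) if key else None
        if holder:
            out.append((dest, "copy", (holder,)))   # reuse the earlier result
        else:
            out.append((dest, op, tuple(operands)))
        # Assigning dest kills expressions that read dest or are held in dest.
        available = {k: v for k, v in available.items()
                     if dest not in k[1] and v != dest}
        if key and dest not in operands:
            available.setdefault(key, dest)
    return out

alive = local_cse([("A", "+", ("b", "c")),
                   ("D", "+", ("c", "b"))])
killed = local_cse([("A", "+", ("b", "c")),
                    ("b", "const", ("3",)),
                    ("D", "+", ("b", "c"))])
```

In `alive`, the second tuple turns into a copy from A. In `killed`, the assignment to b removes b+c from the available set, so D is recomputed.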

Redundant computations

- To take advantage of CSEs, keep track of which values are already in the temp registers and when they die
- This can be complex
    - Can use a simple stack approach
    - A more complex allocation scheme: allocation/deallocation with spilling
    - Allocation with automatic deallocation based on usage pattern
- What about this?

      A(i) := b + c;
      D := A(j);

    - A(j) is already stored if (i == j)
    - This is aliasing and can cause problems
    - If A(j) gets set, A(i) should be killed

Peephole Optimization

- As in the + example in the handout, we could have special cases in all of the semantic routines
- Or we could worry about it later and look at the generated code for special cases
- Pattern-replacement pairs can be used
    - A pattern-replacement pair has the form Pattern => Replacement
    - If Pattern is seen, it is replaced with Replacement

Peephole Optimization: Useful Replacement Rules

- Constant folding: don't calculate constants at run time
      (+,Lit1,Lit2,Result)  =>  (:=,Lit1+Lit2,Result)
      (:=,Lit1,Result1),(+,Lit2,Result1,Result2)  =>  (:=,Lit1,Result1),(:=,Lit1+Lit2,Result2)
- Strength reduction: replace a slow op with a fast op
      (*,Oprnd,2,Res)  =>  (ShiftLeft,Oprnd,1,Res)
      (*,Oprnd,4,Res)  =>  (ShiftLeft,Oprnd,2,Res)
- Null sequences: delete useless calculations
      (+,Oprnd,0,Res)  =>  (:=,Oprnd,Res)
      (*,Oprnd,1,Res)  =>  (:=,Oprnd,Res)

Peephole Optimization: Useful Replacement Rules

- Combine operations: replace many with one
      Load A,Ri; Load A+1,Ri+1  =>  DoubleLoad A,Ri
      BranchZero L1,R1; Branch L2; L1:  =>  BranchNotZero L2,R1
      Subtract #1,R1; BranchZero L1,R1  =>  SubtractOneBranch L1,R1
- Algebraic laws
      (+,Lit,Oprnd,Res)  =>  (+,Oprnd,Lit,Res)
      (-,0,Oprnd,Res)  =>  (negate,Oprnd,Res)

Peephole Optimization: Useful Replacement Rules

- Combine operations: replace many with one
      Subtract #1,R1  =>  Decrement R1
      Add #1,R1  =>  Increment R1
      Load #0,R1; Store A,R1  =>  Clear A
- Address-mode operations
      Load A,R1; Add 0(R1),R2  =>  Add @A,R2          (@A denotes indirect addressing)
      Subtract #2,R1; Clear 0(R1)  =>  Clear -(R1)    (-(Ri) denotes auto decrement)
- Others
      (:=,A,A)  =>  (deleted)
      (:=,Oprnd1,A),(:=,Oprnd2,A)  =>  (:=,Oprnd2,A)
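A handful of the tuple rules listed on these slides (constant folding, strength reduction, null sequences, and the literal-second algebraic law) can be sketched as one pass over a tuple list. The 4-tuple layout `(op, a, b, result)`, the use of `None` for an unused field, and the name `peephole` are assumptions made for this illustration. A real peephole optimizer would typically match multi-instruction windows and repeat until nothing changes.

```python
def peephole(code):
    """code: list of (op, a, b, result); literals are ints, operands are names."""
    out = []
    for op, a, b, res in code:
        # Algebraic law: (+,Lit,Oprnd,Res) => (+,Oprnd,Lit,Res), likewise for *.
        if op in ("+", "*") and isinstance(a, int) and not isinstance(b, int):
            a, b = b, a
        if op == "+" and isinstance(a, int) and isinstance(b, int):
            out.append((":=", a + b, None, res))        # constant folding
        elif (op, b) in {("+", 0), ("*", 1)}:
            out.append((":=", a, None, res))            # null sequence
        elif op == "*" and b in (2, 4):
            shift = 1 if b == 2 else 2
            out.append(("ShiftLeft", a, shift, res))    # strength reduction
        else:
            out.append((op, a, b, res))                 # no rule applies
    return out

optimized = peephole([
    ("+", 2, 3, "t1"),
    ("*", "x", 2, "t2"),
    ("+", 0, "y", "t3"),
    ("*", "z", 1, "t4"),
    ("-", "a", "b", "t5"),
])
```

Putting the literal second before the other checks means one pattern per rule covers both operand orders, which is the reason the slides list that algebraic law.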

Global Optimizations vs Local Optimizations

Consider

    if A = B then
        C := 1;
        D := 2;
    else
        E := 3;
    endif;
    A := 1;

Data flow graph:

                    A = B?
                 T /      \ F
    C := 1; D := 2;        E := 3;
                  \       /
                   A := 1;

- Local optimization: on a branch
- Global optimization: between branches

Global Optimizations vs Local Optimizations

Consider

    A := B + C;
    D := B + C;
    if A > 0 then
        E := B + C;
    endif;

- The first CSE is detected with a local optimization
- The second requires a global one

Loop optimizations

Invariant expressions within a loop

    while J > I loop
        C := 8 * I;
        A(J) := C;
        J := J - 2;

- Should C := 8 * I happen on each iteration?
- Can we move it out of the loop?

    C := 8 * I;
    while J > I loop
        A(J) := C;
        J := J - 2;

Loop optimizations

Invariant expressions within a loop: can we move it out of the loop?

    C := 3; J := 1; I := 10;    /* Values before loop */

    Original:                   Optimized:

    while J > I loop            C := 8 * I;
        C := 8 * I;             while J > I loop
        A(J) := C;                  A(J) := C;
        J := J - 2;                 J := J - 2;
    R := C;                     R := C;

- What value is R with our optimization?
- What value is R without it?
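The two questions above can be checked by running both versions with the slide's initial values. The function name `run` and the `hoist` flag are assumptions made for this illustration; a dictionary stands in for the array A.

```python
def run(hoist):
    """Return R for the slide's loop, with or without hoisting C := 8 * I."""
    C, J, I = 3, 1, 10      # values before the loop
    A = {}
    if hoist:
        C = 8 * I           # invariant moved in front of the loop
    while J > I:            # J = 1, I = 10: the body never executes
        if not hoist:
            C = 8 * I
        A[J] = C
        J -= 2
    return C                # R := C

r_without = run(hoist=False)
r_with = run(hoist=True)
```

Because the loop body never runs, the unoptimized program leaves C at 3 while the hoisted version sets it to 80. Hoisting an assignment is only safe if the loop is known to execute at least once or the variable is not used after the loop.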

Loop optimizations

Invariant expressions within a loop

    while J > I loop
        A(J) := 10/I;
        J := J - 2;

- Should 10/I happen on each iteration?
- Can we move it out of the loop?
- We are only moving a subexpression that isn't used elsewhere.

Loop optimizations

Invariant expressions within a loop

    I := 0; J := -3;
    while J > I loop
        A(J) := 10/I;
        J := J - 2;

- Should 10/I happen on each iteration?
- Can we move it out of the loop?
- We are only moving a subexpression that isn't used elsewhere.
- What happens?
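What happens can be demonstrated directly. In this sketch the function names `original`, `hoisted`, and `guarded` are assumptions made for illustration, a dictionary stands in for the array A, and integer division stands in for the slide's `/`. The guarded variant shows one common way to keep the hoist safe: evaluate the invariant only on the path where the loop body runs at least once.

```python
def original(I, J):
    A = {}
    while J > I:
        A[J] = 10 // I      # evaluated only if the body executes
        J -= 2
    return A

def hoisted(I, J):
    A = {}
    t = 10 // I             # evaluated unconditionally: faults when I == 0
    while J > I:
        A[J] = t
        J -= 2
    return A

def guarded(I, J):
    A = {}
    if J > I:               # hoist only where the loop runs at least once
        t = 10 // I
        while J > I:
            A[J] = t
            J -= 2
    return A

safe = original(0, -3)      # loop never runs, so no division happens
try:
    hoisted(0, -3)
    hoist_faulted = False
except ZeroDivisionError:
    hoist_faulted = True
```

With I = 0 and J = -3 the original program never divides, but the naively hoisted version raises a divide-by-zero fault that the source program could not have produced.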

Loop optimizations

Invariant expressions within a loop

    loop I = 1 to 1,000,000
        while J > I loop
            SomeExpensiveCalculationThatDoesn'tUseJ
            J := J - 2;
        end loop

- Should we move the expensive calculation?

Loop optimizations

Invariant expressions within a loop

    J := 3;
    loop I = 1 to 1,000,000
        while J > I loop
            SomeExpensiveCalculationThatDoesn'tUseJ
            J := J - 2;
        end loop

- What happens if we move it?

Loop optimizations

Invariant expressions within a loop

    for (i = 0; i < N; i++)
        b = e * f;
        c(i) = b * i;
    end for

What is happening?

    c(0) = b*0 = 0
    c(1) = b*1 = 0+b
    c(2) = b*2 = 0+b+b
    c(3) = b*3 = 0+b+b+b
    ...

- The multiply can be changed to an add (strength reduction)
- What is the consequence of changing to an add if b is an int? If b is a real?
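The int-versus-real question can be explored with a small experiment. The function names `with_multiply` and `with_add` are assumptions made for this illustration; Python's `float` (an IEEE 754 double) stands in for "real".

```python
def with_multiply(b, n):
    """c(i) = b * i, computed with one multiply per element."""
    return [b * i for i in range(n)]

def with_add(b, n):
    """Strength-reduced form: c(i+1) = c(i) + b."""
    out = []
    acc = b * 0             # c(0) = b*0
    for _ in range(n):
        out.append(acc)
        acc = acc + b
    return out

ints_match = with_multiply(7, 1000) == with_add(7, 1000)
floats_match = with_multiply(0.1, 1000) == with_add(0.1, 1000)
```

For integers the two forms agree exactly. For floating-point values the repeated addition accumulates rounding error, so the sequences drift apart (for example, ten additions of 0.1 do not give exactly the same value as 0.1 * 10). A compiler therefore cannot apply this transformation to reals without changing results.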

The Truth About Global Optimization

- Global optimization is complex, expensive, and sometimes unsafe
- The effect of calls must be considered when performing other optimizations
- Most programs don't need to be optimized
- Optimization can save 25-50% (speed and size)
- A better algorithm is much more effective!
- Loops and function calls are the best places to apply it

Loop Optimization

    for i = 1..100 loop
        for j = 1..100 loop
            for k = 1..100 loop
                A(i,j,k) := i*j*k;

Written with explicit subscripting and grouping:

    for i = 1..100 loop
        for j = 1..100 loop
            for k = 1..100 loop
                A(i)(j)(k) := (i*j)*k;

- 3,000,000 subscripting operations
- 2,000,000 multiplies

Loop Optimization: Factor inner loop

    for i = 1..100 loop
        for j = 1..100 loop
            for k = 1..100 loop
                A(i,j,k) := i*j*k;

- 3,000,000 subscripting operations
- 2,000,000 multiplies

Factor the inner loop:

    for i = 1..100 loop
        for j = 1..100 loop
            temp1 := adr(A(i)(j));
            temp2 := i*j;
            for k = 1..100 loop
                temp1(k) := temp2*k;

- 1,020,000 subscripts
- 1,010,000 multiplies

Loop Optimization

    for i = 1..100 loop
        for j = 1..100 loop
            temp1 := adr(A(i)(j));
            temp2 := i*j;
            for k = 1..100 loop
                temp1(k) := temp2*k;

Inner loop factored: 1,020,000 subscripts, 1,010,000 multiplies

Factor the second loop:

    for i = 1..100 loop
        temp3 := adr(A(i))
        for j = 1..100 loop
            temp1 := adr(temp3(j));
            temp2 := i*j;
            for k = 1..100 loop
                temp1(k) := temp2*k;

- Adds?
- Subscripts? 1,010,100
- Mults? 1,010,000

Loop Optimization: Strength Reduction

    for i = 1..100 loop
        temp3 := adr(A(i))
        for j = 1..100 loop
            temp1 := adr(temp3(j));
            temp2 := i*j;
            for k = 1..100 loop
                temp1(k) := temp2*k;

After strength reduction:

    for i = 1..100 loop
        temp3 := adr(A(i))
        temp4 := i;                     -- init i*j
        for j = 1..100 loop
            temp1 := adr(temp3(j));
            temp2 := temp4;             -- temp4 == i*j
            temp5 := temp2;             -- init temp2*k
            for k = 1..100 loop
                temp1(k) := temp5;
                temp5 := temp5 + temp2;
            temp4 := temp4 + i

- Adds? Subscripts? Mults?

Loop Optimization: Copy Propagation

    for i = 1..100 loop
        temp3 := adr(A(i))
        temp4 := i;                     -- init i*j
        for j = 1..100 loop
            temp1 := adr(temp3(j));
            temp2 := temp4;             -- temp4 == i*j
            temp5 := temp2;             -- init temp2*k
            for k = 1..100 loop
                temp1(k) := temp5;
                temp5 := temp5 + temp2;
            temp4 := temp4 + i

After copy propagation:

    for i = 1..100 loop
        temp3 := adr(A(i))
        temp4 := i;                     -- initial value of i*j
        for j = 1..100 loop
            temp1 := adr(temp3(j));
            -- temp2 := temp4;          -- temp4 == i*j
            temp5 := temp4;
            for k = 1..100 loop
                temp1(k) := temp5;      -- holds i*j*k
                temp5 := temp5 + temp4;
            temp4 := temp4 + i

- Adds? Subscripts? Mults?

Loop Optimization: Expand Subscripting Code

    for i = 1..100 loop
        temp3 := adr(A(i))
        temp4 := i;                     -- initial value of i*j
        for j = 1..100 loop
            temp1 := adr(temp3(j));
            -- temp2 := temp4;          -- temp4 == i*j
            temp5 := temp4;
            for k = 1..100 loop
                temp1(k) := temp5;      -- holds i*j*k
                temp5 := temp5 + temp4;
            temp4 := temp4 + i

After expanding the subscripting code:

    for i = 1..100 loop
        temp3 := A0 + (10000*i) - 10000;
        temp4 := i;                     -- initial value of i*j
        for j = 1..100 loop
            temp1 := temp3 + (100*j) - 100;
            -- temp4 holds i*j
            temp5 := temp4;
            for k = 1..100 loop
                (temp1+k-1) := temp5;   -- holds i*j*k
                temp5 := temp5 + temp4;
            temp4 := temp4 + i

- Adds? Subscripts? Mults?

Strength Reduction on Subscripting Code

    for i = 1..100 loop
        temp3 := A0 + (10000*i) - 10000;
        temp4 := i;                     -- initial value of i*j
        for j = 1..100 loop
            temp1 := temp3 + (100*j) - 100;
            -- temp4 holds i*j
            temp5 := temp4;
            for k = 1..100 loop
                (temp1+k-1) := temp5;   -- holds i*j*k
                temp5 := temp5 + temp4;
            temp4 := temp4 + i

After strength reduction:

    temp6 := A0;
    for i = 1..100 loop
        temp3 := temp6;
        temp4 := i;                     -- initial value of i*j
        temp7 := temp3;                 -- initial value of Adr(A(i)(j))
        for j = 1..100 loop
            temp1 := temp7;
            temp5 := temp4;             -- initial value of temp4*k
            temp8 := temp1;             -- initial value of Adr(A(i)(j)(k))
            for k = 1..100 loop
                (temp8) := temp5;       -- temp5 holds i*j*k
                temp5 := temp5 + temp4;
                temp8 := temp8 + 1;
            temp4 := temp4 + i;
            temp7 := temp7 + 100;
        temp6 := temp6 + 10000;

- Adds? Subscripts? Mults?

Copy Propagation

    temp6 := A0;
    for i = 1..100 loop
        temp3 := temp6;
        temp4 := i;                     -- initial value of i*j
        temp7 := temp3;                 -- initial value of Adr(A(i)(j))
        for j = 1..100 loop
            temp1 := temp7;
            temp5 := temp4;             -- initial value of temp4*k
            temp8 := temp1;             -- initial value of Adr(A(i)(j)(k))
            for k = 1..100 loop
                (temp8) := temp5;       -- temp5 holds i*j*k
                temp5 := temp5 + temp4;
                temp8 := temp8 + 1;
            temp4 := temp4 + i;
            temp7 := temp7 + 100;
        temp6 := temp6 + 10000;

After copy propagation:

    temp6 := A0;
    for i = 1..100 loop
        -- temp3 := temp6;
        temp4 := i;                     -- initial value of i*j
        temp7 := temp6;                 -- initial value of Adr(A(i)(j))
        for j = 1..100 loop
            -- temp1 := temp7;
            temp5 := temp4;             -- initial value of temp4*k
            temp8 := temp7;             -- initial value of Adr(A(i)(j)(k))
            for k = 1..100 loop
                (temp8) := temp5;       -- temp5 holds i*j*k
                temp5 := temp5 + temp4;
                temp8 := temp8 + 1;
            temp4 := temp4 + i;
            temp7 := temp7 + 100;
        temp6 := temp6 + 10000;

- Adds? Subscripts? Mults?

Loop Optimization

Original:

    for i = 1..100 loop
        for j = 1..100 loop
            for k = 1..100 loop
                A(i,j,k) := i*j*k;

    Adds            0
    Subscripts      3,000,000
    Mults           2,000,000
    Assigns         1,000,000

Optimized:

    temp6 := A0;
    for i = 1..100 loop
        -- temp3 := temp6;
        temp4 := i;                     -- initial value of i*j
        temp7 := temp6;                 -- initial value of Adr(A(i)(j))
        for j = 1..100 loop
            -- temp1 := temp7;
            temp5 := temp4;             -- initial value of temp4*k
            temp8 := temp7;             -- initial value of Adr(A(i)(j)(k))
            for k = 1..100 loop
                (temp8) := temp5;       -- temp5 holds i*j*k
                temp5 := temp5 + temp4;
                temp8 := temp8 + 1;
            temp4 := temp4 + i;
            temp7 := temp7 + 100;
        temp6 := temp6 + 10000;

    Adds            2,020,100
    Subscripts      0
    Mults           0
    Assigns         3,040,301
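The fully reduced loop can be checked against the original by simulating both. In this sketch a flat Python list stands in for the array, index 0 stands in for the base address A0, and the names `naive` and `reduced` are assumptions made for illustration. Loop-control overhead is not counted, and `n * n` plays the role of the compile-time constant stride between successive values of i.

```python
def naive(n=100):
    """A(i,j,k) := i*j*k with explicit row-major subscripting."""
    a = [0] * (n * n * n)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in range(1, n + 1):
                a[(i - 1) * n * n + (j - 1) * n + (k - 1)] = i * j * k
    return a

def reduced(n=100):
    """Strength-reduced version; also counts additions and assignments."""
    a = [0] * (n * n * n)
    adds = assigns = 0
    temp6 = 0                       # temp6 := A0
    assigns += 1
    for i in range(1, n + 1):
        temp4 = i                   # initial value of i*j
        temp7 = temp6               # address of A(i)(1)
        assigns += 2
        for j in range(1, n + 1):
            temp5 = temp4           # initial value of temp4*k
            temp8 = temp7           # address of A(i)(j)(1)
            assigns += 2
            for _ in range(n):
                a[temp8] = temp5    # temp5 holds i*j*k
                temp5 += temp4
                temp8 += 1
                adds += 2
                assigns += 3
            temp4 += i
            temp7 += n
            adds += 2
            assigns += 2
        temp6 += n * n              # constant stride, known at compile time
        adds += 1
        assigns += 1
    return a, adds, assigns

result, adds, assigns = reduced()
```

For n = 100 this sketch counts 2,020,100 additions and 3,040,301 assignments, with no multiplies or subscript calculations left inside the loops, and it fills the array with the same values as the naive version.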

Data Flow Analysis

- Determine which variables are live going in and out of a block
- These tools allow deeper analysis

    A := D;                     if D = B then
    if A = B then                   B := 1
        B := 1           =>     else
    else                            C := 1
        C := 1                  end if;
    end if;                     A := D + B;
    A := A + B

- A is only used twice between assignments; copy propagation reduces that to zero, so we can remove the assignment

Data Flow Analysis

Removing dead code:

    if (debug) printf(...

This will never be executed if Debug := false; appears before the if.

Optimizations for Machine Code

- Filling load and branch delays
    - May be competing with the hardware scheduler
- For CISC/VLIW, replace multiple instructions with more complex instructions
- Loop unrolling
- Taking advantage of memory accesses
- Understanding pipelines
- Multiple cores