Compiler Design. Code Shaping. Hwansoo Han

Similar documents
Code Shape II Expressions & Assignment

CS415 Compilers. Intermediate Represeation & Code Generation

CS 406/534 Compiler Construction Instruction Scheduling beyond Basic Block and Code Generation

(just another operator) Ambiguous scalars or aggregates go into memory

Generating Code for Assignment Statements back to work. Comp 412 COMP 412 FALL Chapters 4, 6 & 7 in EaC2e. source code. IR IR target.

Implementing Control Flow Constructs Comp 412

Code Shape Comp 412 COMP 412 FALL Chapters 4, 5, 6 & 7 in EaC2e. source code. IR IR target. code. Front End Optimizer Back End

Handling Assignment Comp 412

Arrays and Functions

Intermediate Code Generation

CSE 504: Compiler Design. Runtime Environments

Intermediate Code Generation (ICG)

Compiling Techniques

COMP 181. Prelude. Intermediate representations. Today. High-level IR. Types of IRs. Intermediate representations and code generation

The structure of a compiler

Intermediate Code Generation

Intermediate Representations

Procedure and Function Calls, Part II. Comp 412 COMP 412 FALL Chapter 6 in EaC2e. target code. source code Front End Optimizer Back End

Instruction Selection: Preliminaries. Comp 412

CS 432 Fall Mike Lam, Professor. Code Generation

CSE P 501 Compilers. Intermediate Representations Hal Perkins Spring UW CSE P 501 Spring 2018 G-1

Run Time Environment. Procedure Abstraction. The Procedure as a Control Abstraction. The Procedure as a Control Abstraction

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

The Procedure Abstraction

Intermediate Representations

CSE 401/M501 Compilers

CSE 504. Expression evaluation. Expression Evaluation, Runtime Environments. One possible semantics: Problem:

CS415 Compilers. Procedure Abstractions. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412

Intermediate Code Generation

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

Instruction Selection and Scheduling

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

CS5363 Final Review. cs5363 1

Intermediate Representations Part II

Intermediate Representations & Symbol Tables

Run Time Environment. Activation Records Procedure Linkage Name Translation and Variable Access

CSE 401/M501 Compilers

Optimizer. Defining and preserving the meaning of the program

Instruction Selection, II Tree-pattern matching

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Intermediate Representations

Computer Architecture and Organization. Instruction Sets: Addressing Modes and Formats

Instruction Selection: Peephole Matching. Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

Code Generation. Dragon: Ch (Just part of it) Holub: Ch 6.

Instruction Sets: Characteristics and Functions Addressing Modes

Page 1. Structure of von Nuemann machine. Instruction Set - the type of Instructions

CODE GENERATION Monday, May 31, 2010

Naming in OOLs and Storage Layout Comp 412

Review of the C Programming Language

Alternatives for semantic processing

Semantic actions for declarations and expressions

Semantic actions for expressions

Semantic actions for declarations and expressions. Monday, September 28, 15

Lecture 4: Instruction Set Design/Pipelining

CS 406/534 Compiler Construction Putting It All Together

Runtime Support for Algol-Like Languages Comp 412

Control Abstraction. Hwansoo Han

Separate compilation. Topic 6: Runtime Environments p.1/21. CS 526 Topic 6: Runtime Environments The linkage convention

Semantic actions for declarations and expressions

Instruction Sets Ch 9-10

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

Module 27 Switch-case statements and Run-time storage management

Instruction Sets Ch 9-10

ENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design

Type Checking Binary Operators

1 Lexical Considerations

Machine Language Instructions Introduction. Instructions Words of a language understood by machine. Instruction set Vocabulary of the machine

Compiler Optimization Techniques

Instruction Scheduling

Code Shape Comp 412 COMP 412 FALL Chapters 4, 5, 6 & 7 in EaC2e. source code. IR IR target. code. Front End Optimizer Back End

CS 314 Principles of Programming Languages. Lecture 13

Review of the C Programming Language for Principles of Operating Systems

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis?

Acknowledgement. CS Compiler Design. Intermediate representations. Intermediate representations. Semantic Analysis - IR Generation

Goals of Program Optimization (1 of 2)

Lexical Considerations

Compilers. Lecture 2 Overview. (original slides by Sam

CMSC 611: Advanced Computer Architecture

Code Generation. The Main Idea of Today s Lecture. We can emit stack-machine-style code for expressions via recursion. Lecture Outline.

Chapter 11. Instruction Sets: Addressing Modes and Formats. Yonsei University

We can emit stack-machine-style code for expressions via recursion

CSE P 501 Exam Sample Solution 12/1/11

Intermediate Representation (IR)

Compilers. Type checking. Yannis Smaragdakis, U. Athens (original slides by Sam

Rui Wang, Assistant professor Dept. of Information and Communication Tongji University.

Compilers. Compiler Construction Tutorial The Front-end

Computing Inside The Parser Syntax-Directed Translation. Comp 412

Instruction Set II. COMP 212 Computer Organization & Architecture. COMP 212 Fall Lecture 7. Instruction Set. Quiz. What is an Instruction Set?

Procedure and Object- Oriented Abstraction

Tour of common optimizations

CS 314 Principles of Programming Languages Fall 2017 A Compiler and Optimizer for tinyl Due date: Monday, October 23, 11:59pm

Semantic Processing. Semantic Errors. Semantics - Part 1. Semantics - Part 1

The SPL Programming Language Reference Manual

Compiler Code Generation COMP360

Parsing II Top-down parsing. Comp 412

Addressing Modes. Immediate Direct Indirect Register Register Indirect Displacement (Indexed) Stack

Transcription:

Compiler Design Code Shaping Hwansoo Han

Code Shape Definition All those nebulous properties of the code that impact performance & code quality Includes code, approach for different constructs, cost, storage requirements & mapping, & choice of operations Code shape is the end product of many decisions Code shape influences algorithm choice & results Code shape can encode important facts, or hide them Expose as much derived information as possible 2

Code Shape (example 1) An example x + y + z x + y t1 x + z t1 y + z t1 t1+ z t2 t1+ y t2 t1+ z t2 z x y z x y What if x is 2 and z is 3? What if y+z is evaluated earlier? x z y y z Addition is commutative & associative for integers x The best shape for x+y+z depends on contextual knowledge There may be several conflicting options May need to consider resolution of data type (e.g. float) May need to consider common subexpression elimination 3

Code Shape (example 2) Another example -- the case statement Implement it as cascaded if-then-else statements Cost depends on where your case actually occurs O(number of cases) Implement it as a binary search Less dense set of conditions, and too many for linear search Uniform (log n) cost Implement it as a jump table Lookup address in a table & jump to it Need a dense set of conditions to search Uniform (constant) cost Compiler must choose the best implementation strategy 4

Program Construct Expressions Types Scalar variable, Array, structure Control statements Conditionals (if-then-else, switch) Loops Function calls 5

Generating Code for Expressions The key code quality issue is holding values in registers When can a value be safely allocated to a register? When only 1 name can reference its value Pointers, parameters, aggregates & arrays all cause trouble When should a value be allocated to a register? When it is both safe & profitable Encoding this knowledge into the IR Assign a virtual register to anything that can go into register Relies on a strong register allocator Load or store the others at each reference 6

Generating Code for Expressions expr(node) { int result, t1, t2; switch (type(node)) { case,,, : t1 expr(left child(node)); t2 expr(right child(node)); result NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1 base(node); t2 offset(node); result NextRegister(); emit (loadao, t1, t2, result); break; case NUMBER: result NextRegister(); emit (loadi, val(node), none, result); break; } return result; } The concept Use a simple treewalk evaluator Bury complexity in routines > base(), offset(), & val() Implements expected behavior > Visits & evaluates children > Emits code for the op itself > Returns register with result Works for simple expressions Easily extended to other operators Does not handle control flow 7

Example (1) expr(node) { int result, t1, t2; switch (type(node)) { case,,, : t1 expr(left child(node)); t2 expr(right child(node)); result NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1 base(node); t2 offset(node); result NextRegister(); emit (loadao, t1, t2, result); break; case NUMBER: result NextRegister(); emit (loadi, val(node), none, result); break; } return result; } Example: Produces: expr( x ) x loadi @x r1 loadao r arp, r1 r2 expr( y ) loadi @y r3 loadao r arp, r3 r4 NextRegister() r5 emit(add,r2,r4,r5) add r2, r4 r5 y 8

Example (2) expr(node) { int result, t1, t2; switch (type(node)) { case,,, : t1 expr(left child(node)); t2 expr(right child(node)); result NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1 base(node); t2 offset(node); result NextRegister(); emit (loadao, t1, t2, result); break; case NUMBER: result NextRegister(); emit (loadi, val(node), none, result); break; } return result; } Example: Generates: x 2 loadi @x r1 loadao r0, r1 r2 loadi 2 r3 loadi @y r4 loadao r0,r4 r5 mult r3, r5 r6 sub r2, r6 r7 r0 = r arp y 9

Extending the Simple Treewalk (1) What about values in registers? Modify the IDENTIFIER case Already in a register return the register name Not in a register load it as before, but record the fact Choose names to avoid creating false dependences (anti-, output-) What about parameter values? Many linkages pass the first several values in registers Accessing parameters Call-by-value just a local variable with negative offset Call-by-reference needs an extra indirection What about function calls in expressions? Generate the calling sequence & load the return value Severely limits compiler s ability to reorder operations 10

Extending the Simple Treewalk (2) Adding other operators Evaluate the operands, then perform the operation Complex operations may turn into library calls Handle assignment as an operator Mixed-type expressions Insert conversions as needed from conversion table Most languages have symmetric & rational conversion tables Typical Addition Table + Integer Real Double Complex Integer Integer Real Double Complex Real Real Real Double Complex Double Double Double Double Complex Complex Complex Complex Complex Complex 11

Extending the Simple Treewalk (3) What about evaluation order? Can use commutativity & associativity to improve code This problem is truly hard What about order of evaluating operands? 1 st operand must be preserved while 2 nd is evaluated Takes an extra register for 1 st operand Should evaluate more demanding operand expression first 12

Handling Assignment lhs rhs Treat as just another operator Evaluate rhs to a value (an rvalue) Evaluate lhs to a location (an lvalue) lvalue is a register move rhs lvalue is an address store rhs If rvalue & lvalue have different types Evaluate rvalue to its natural type Convert that value to the type of *lvalue Location Register: scalars with no aliases Memory: scalars with possible aliases, aggregates such as structures, arrays 13

Generating Code in the Parser Need to generate an initial IR form ASTs vs. 3-address code Generate an AST Use AST for some high-level, near-source work (type checking, optimization), Then traverse AST and emit a lower-level IR The big picture Recursive algorithm really works bottom-up Actions on non-leaves occur after children are done Can encode the same basic structure into ad-hoc SDT scheme Identifiers load themselves & stack virtual register name Operators emit appropriate code & stack resulting VR name Assignment requires evaluation to an lvalue or an rvalue 14

Ad-hoc SDT vs. Recursive Treewalk Goal : Expr { $$ = $1; } ; Expr: Expr PLUS Term { t = NextRegister(); emit(add,$1,$3,t); $$ = t; } Expr MINUS Term { } Term { $$ = $1; } ; Term: Term TIMES Factor { t = NextRegister(); emit(mult,$1,$3,t); $$ = t; }; Term DIVIDES Factor { } Factor { $$ = $1; }; Factor: NUMBER { t = NextRegister(); emit(loadi,val($1),none, t ); $$ = t; } ID { t1 = base($1); t2 = offset($1); t = NextRegister(); emit(loadao,t1,t2,t); $$ = t; } expr(node) { int result, t1, t2; switch (type(node)) { case,,, : t1 expr(left child(node)); t2 expr(right child(node)); result NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1 base(node); t2 offset(node); result NextRegister(); emit (loadao, t1, t2, result); break; case NUMBER: result NextRegister(); emit (loadi, val(node), none, result); break; } return result; } 15

Handle Arrays - A[i,j] Row-major order (most languages) Lay out as a sequence of consecutive rows Rightmost subscript varies fastest A[1,1], A[1,2], A[1,3], A[2,1], A[2,2], A[2,3] Column-major order (Fortran) Lay out as a sequence of columns Leftmost subscript varies fastest A[1,1], A[2,1], A[1,2], A[2,2], A[1,3], A[2,3] Indirection vectors (Java) Vector of pointers to pointers to to values Takes much more space, trades indirection for arithmetic Everything is 1D vector conceptually simple design Not amenable to analysis 16

Laying Out Arrays The Concept A 1,1 1,2 1,3 1,4 2,1 2,2 2,3 2,4 These have distinct & different cache behavior Row-major order A 1,1 1,2 1,3 1,4 2,1 2,2 2,3 2,4 Column-major order A 1,1 2,1 1,2 2,2 1,3 2,3 1,4 2,4 Indirection vectors 1,1 1,2 1,3 1,4 A 2,1 2,2 2,3 2,4 17

Computing an Array Address A[ i ] @A + ( i low ) x sizeof(a[1]) Almost always a power of 2, known at compile-time use a shift for speed In general: base(a) + ( i low ) x sizeof(a[1]) What about A[i 1,i 2 ]? int A[1:10] low is 1 Make low 0 for faster access (saves a ) Row-major order, two dimensions @A + (( i 1 low 1 ) x (high 2 low 2 + 1) + i 2 low 2 ) x sizeof(a[1]) Column-major order, two dimensions @A + (( i 2 low 2 ) x (high 1 low 1 + 1) + i 1 low 1 ) x sizeof(a[1]) Indirection vectors, two dimensions *(A[i 1 ])[i 2 ] where A[i 1 ] is, itself, a 1-d array reference Expensive! Require two memory references 18

Optimizing Address Calculation for A[i,j] In row-major order where w = sizeof(a[1,1]) @A + (i low 1 )(high 2 low 2 +1) x w + (j low 2 ) x w Which can be factored into @A + i x (high 2 low 2 +1) x w + j x w (low 1 x (high 2 low 2 +1) x w) + (low 2 x w) If low i, high i, and w are known, Define @A 0 = @A (low 1 x (high 2 low 2 +1) x w + low 2 x w len 2 = (high 2 -low 2 +1) Then, the address expression becomes @A 0 + (i x len 2 + j ) x w Compile-time constants 19

Array-valued Parameters Some languages rely on compiler to pass necessary information at function calls Whole arrays, as call-by-reference parameters Need dimension information build a dope vector Store the values in the calling sequence Pass the address of the dope vector in the parameter slot Generate complete address polynomial at each reference Some improvement is possible Save len i and low i rather than low i and high i Pre-compute the fixed terms in prologue sequence What about call-by-value? Most call-by-value languages pass arrays by reference This is a language design issue @A low 1 high 1 low 2 high 2 20

Array Address Calculations in a Loop DO J = 1, N A[I,J] = A[I,J] + B[I,J] END DO Naïve: Perform the address calculation twice DO J = 1, N R1 = @A 0 + (J x len 1 + I ) x floatsize R2 = @B 0 + (J x len 1 + I ) x floatsize MEM(R1) = MEM(R1) + MEM(R2) END DO 21

Array Address Calculations in a Loop DO J = 1, N A[I,J] = A[I,J] + B[I,J] END DO Move common calculations out of loop R1 = I x floatsize c = len 1 x floatsize! Compile-time constant R2 = @A 0 + R1 R3 = @B 0 + R1 DO J = 1, N a = J x c R4 = R2 + a R5 = R3 + a MEM(R4) = MEM(R4) + MEM(R5) END DO 22

Array Address Calculations in a Loop DO J = 1, N A[I,J] = A[I,J] + B[I,J] END DO Convert multiply to add (Operator Strength Reduction) R1 = I x floatsize c = len 1 x floatsize! Compile-time constant R2 = @A 0 + R1 ; R3 = @B 0 + R1 DO J = 1, N R2 = R2 + c R3 = R3 + c MEM(R2) = MEM(R2) + MEM(R3) END DO 23

Boolean & Relational Values (1) Numerical representation Assign values to TRUE and FALSE Use hardware AND, OR, and NOT operations Use comparison to get a boolean from a relational expression Examples cmp_lt r x,r y r 1 If (x < y) cbr r 1 L T,L F then becomes L T : stmt_t stmt_t br L E else L F : stmt_f stmt_f L E : other stmts 24

Boolean & Relational Values (2) Positional (implicit) representation ISA uses a condition code Compare provides enough info for every possible relation Must use a conditional branch to interpret result of compare Example: cmp r x,r y cc 1 If (x < y) cbr_lt cc 1 L T,L F then becomes L T : stmt_t stmt_t br L E else L F : stmt_f stmt_f L E : other stmts Condition codes are an architect s hack allow ISA to avoid some comparisons complicates code for simple cases 25

Conditional Move & Predication Conditional move & Predication both simplify if-then-else Example if (x < y) then a c + d else a e + f OTHER ARCHITECTURAL VARIATIONS Conditional Move Predicated Execution comp r x,r y cc 1 cmp_lt r x,r y r 1, r 2 add r c,r d r 1 (r 1 ) add r c,r d r a add r e,r f r 2 (r 2 ) add r e,r f r a i2i_lt cc 1,r 1,r 2 r a Both versions avoid the branches Both are shorter than CCs or Boolean-valued compare Are they better? 26

Control Flow If-then-else Follow model for evaluating relationals & booleans with branches Branching versus predication (e.g., IA-64) Frequency of execution Uneven distribution do what it takes to speed common case Amount of code in each case Unequal amounts means predication may waste issue slots Another control flow inside then-block or else-block Any branching activity within the block complicates the predicates and makes branches attractive 27

Control Flow - loop Loops Evaluate condition before loop (if needed) Evaluate condition after loop Branch back to the top (if needed) Merges post-test with last block of loop body Fill the delay slot of post-test branch with instructions in last block of loop body Pre-test Loop body while, for, do, & until all fit this basic model Post-test Next block 28

Loop Implementation Code for (i = 1; i< 100; i++) { body } next statement loadi 1 r 1 loadi 1 r 2 loadi 100 r 3 cmp_ge r 1, r 3 r 4 cbr r 4 L 2, L 1 L 1 : body Initialization Pre-test add r 1, r 2 r 1 cmp_lt r 1, r 3 r 5 cbr r 5 L 1, L 2 L 2 : next statement Post-test 29

Break & Continue Statements Many modern languages include a break Exits from the innermost control-flow statement Out of the innermost loop Out of a case statement Pre-test Translates into a jump Loop head Targets statement outside controlflow construct Creates multiple-exit construct Break in B 1 B 1 B 2 Skip in B 2 Skip in loop goes to next iteration Post-test Next block 30

Control Flow - case Case Statements 1 Evaluate the controlling expression 2 Branch to the selected case 3 Execute the code for that case 4 Branch to the statement after the case (use break) Parts 1, 3, & 4 are well understood, part 2 is the key Strategies Linear search (nested if-then-else constructs) Build a table of case expressions & binary search it Directly compute an address (requires dense case set) 31

Procedure Call register save If p calls q, one of them must Preserve register values in ARs (caller-saves vs. callee-saves) Space tradeoff Pre-call & post-return occur on every call Prolog & epilog occur once per procedure Moving operations into prolog/epilog saves space As register sets grow, save/restore code does, too Each register costs 2 operations Optimizations Hardware support for save/restore or storem/loadm Use a library routine for save/restore to save code space 32

Procedure Call - parameters Evaluating parameters Call by reference evaluate parameter to an lvalue Call by value evaluate parameter to an rvalue & store it Aggregates, arrays, & strings are usually call-by-reference Procedure-valued parameters Must pass starting address of procedure With access links, need the lexical level as well Procedure value is a tuple < level, address > Precall inserts the appropriate code to fetch level and adjust access link 33

Summary Expression Treewalk with unlimited number of virtual registers Use registers for multiple uses of same variable Array references Address calculations for element accesses Optimize array references within loop Boolean & relational value Numerical representation Positional (implicit) representation use condition code Control flow Patterns for If-then-else, Loop, & Case Procedure call Register save, Parameter passing 34