CSE 401/M501 Compilers

Similar documents
CSE 401 Compilers. Code Shape I Basic Constructs Hal Perkins Winter UW CSE 401 Winter 2015 K-1

CSE 401/M501 Compilers

CSE 401/M501 Compilers

CSE 401/M501 Compilers

CSE P 501 Exam Sample Solution 12/1/11

CSE 401 Compilers. x86-64 Lite for Compiler Writers A quick (a) introduccon or (b) review. Hal Perkins Winter UW CSE 401 Winter 2017 J-1

CSC 252: Computer Organization Spring 2018: Lecture 6

CSE P 501 Exam 12/1/11

L08: Assembly Programming II. Assembly Programming II. CSE 351 Spring Guest Lecturer: Justin Hsia. Instructor: Ruth Anderson

x86-64 Programming III & The Stack

Assembly Programming IV

CS201- Lecture 7 IA32 Data Access and Operations Part II

x86 Programming I CSE 351 Winter

The Stack & Procedures

x86 64 Programming I CSE 351 Autumn 2017 Instructor: Justin Hsia

x86 Programming II CSE 351 Winter

Assembly II: Control Flow. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CSE P 501 Compilers. Intermediate Representations Hal Perkins Spring UW CSE P 501 Spring 2018 G-1

x86 64 Programming II

Recitation #6. Architecture Lab

Changes made in this version not seen in first lecture:

CSE 401/M501 Compilers

CSE P 501 Compilers. x86 Lite for Compiler Writers Hal Perkins Autumn /25/ Hal Perkins & UW CSE J-1

Where We Are. Optimizations. Assembly code. generation. Lexical, Syntax, and Semantic Analysis IR Generation. Low-level IR code.

Assembly II: Control Flow

MACHINE LEVEL PROGRAMMING 1: BASICS

x86-64 Programming III

x86 64 Programming I CSE 351 Autumn 2018 Instructor: Justin Hsia

CSE P 501 Compilers. x86-64 Lite for Compiler Writers A quick (a) introduceon (b) review. Hal Perkins Winter UW CSE P 501 Winter 2016 J-1

L09: Assembly Programming III. Assembly Programming III. CSE 351 Spring Guest Lecturer: Justin Hsia. Instructor: Ruth Anderson

CSE351 Autumn 2014 Midterm Exam (29 October 2014)

Assembly Programming IV

CS 432 Fall Mike Lam, Professor. Code Generation

CSE 401/M501 Compilers

Intel x86-64 and Y86-64 Instruction Set Architecture

CS 3330 Exam 3 Fall 2017 Computing ID:

Question Points Score Total: 100

Machine-Level Programming II: Control

Lecture 3 CIS 341: COMPILERS

CS367. Program Control

Areas for growth: I love feedback

Control flow (1) Condition codes Conditional and unconditional jumps Loops Conditional moves Switch statements

CSE P 501 Compilers. Static Semantics Hal Perkins Winter /22/ Hal Perkins & UW CSE I-1

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Autumn /20/ Hal Perkins & UW CSE H-1

C to Machine Code x86 basics: Registers Data movement instructions Memory addressing modes Arithmetic instructions

Machine-Level Programming II: Control

Code Generation: An Example

x86-64 Programming II

x86 Programming I CSE 351 Autumn 2016 Instructor: Justin Hsia

Princeton University Computer Science 217: Introduction to Programming Systems. Assembly Language: Function Calls

Assembly III: Procedures. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

6.035 Project 3: Unoptimized Code Generation. Jason Ansel MIT - CSAIL

Compiling C Programs Into X86-64 Assembly Programs

Compiler Design. Code Shaping. Hwansoo Han

The Stack & Procedures

THE UNIVERSITY OF BRITISH COLUMBIA CPSC 261: MIDTERM 1 February 14, 2017

EXAMINATIONS 2014 TRIMESTER 1 SWEN 430. Compiler Engineering. This examination will be marked out of 180 marks.

last time Assembly part 2 / C part 1 condition codes reminder: quiz

x86 Programming III CSE 351 Autumn 2016 Instructor: Justin Hsia

Assembly Language: Function Calls

CSE 582 Autumn 2002 Exam 11/26/02

Assembly Programming III

Machine-Level Programming (2)

Midterm 1 topics (in one slide) Bits and bitwise operations. Outline. Unsigned and signed integers. Floating point numbers. Number representation

CSE 351 Midterm - Winter 2015

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Winter /22/ Hal Perkins & UW CSE H-1

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 5

Changelog. Changes made in this version not seen in first lecture: 26 October 2017: slide 28: remove extraneous text from code

Changelog. Performance. locality exercise (1) a transformation

Machine/Assembler Language Putting It All Together

CSE351 Spring 2018, Midterm Exam April 27, 2018

Changelog. Assembly (part 1) logistics note: lab due times. last time: C hodgepodge

CSE 401 Final Exam. December 16, 2010

Stack Frame Components. Using the Stack (4) Stack Structure. Updated Stack Structure. Caller Frame Arguments 7+ Return Addr Old %rbp

CS415 Compilers. Intermediate Represeation & Code Generation

CSE 351 Midterm - Winter 2015 Solutions

Pipelining and Vector Processing

CS 3330 Exam 2 Fall 2017 Computing ID:

Agenda. CSE P 501 Compilers. Big Picture. Compiler Organization. Intermediate Representations. IR for Code Generation. CSE P 501 Au05 N-1

Corrections made in this version not seen in first lecture:

Optimization part 1 1

CS429: Computer Organization and Architecture

CS 33. Architecture and Optimization (2) CS33 Intro to Computer Systems XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

2/12/2016. Today. Machine-Level Programming II: Control. Condition Codes (Implicit Setting) Processor State (x86-64, Partial)

Static Semantics. Winter /3/ Hal Perkins & UW CSE I-1

%r8 %r8d. %r9 %r9d. %r10 %r10d. %r11 %r11d. %r12 %r12d. %r13 %r13d. %r14 %r14d %rbp. %r15 %r15d. Sean Barker

Instructions and Instruction Set Architecture

CS 261 Fall Mike Lam, Professor. x86-64 Control Flow

CS 406/534 Compiler Construction Putting It All Together

Intermediate Code Generation

CSE351 Autumn 2014 Midterm Exam (29 October 2014)

CS 3330 Exam 1 Fall 2017 Computing ID:

Compiler Construction D7011E

Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition. Carnegie Mellon

CSE 401 Final Exam. March 14, 2017 Happy π Day! (3/14) This exam is closed book, closed notes, closed electronics, closed neighbors, open mind,...

Assembly I: Basic Operations. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Computer Organization MIPS Architecture. Department of Computer Science Missouri University of Science & Technology

Simple Machine Model. Lectures 14 & 15: Instruction Scheduling. Simple Execution Model. Simple Execution Model

CSE 5317 Midterm Examination 4 March Solutions

Machine-Level Programming III: Procedures

Transcription:

CSE 401/M501 Compilers Code Shape I Basic Constructs Hal Perkins Autumn 2018 UW CSE 401/M501 Autumn 2018 K-1

Administrivia Semantics/type check due next Thur. 11/15 How s it going? Be sure to (re-)read the MiniJava project overview carefully as well as the semantics/type-checking assignment to be sure you catch all the things in MiniJava Midterm: a bit too long, but in the end the scores were very good Regrades: check solution first, then submit via gradescope if we goofed Perspective: how to look at the results, course grades, etc. UW CSE 401/M501 Autumn 2018 K-2

Agenda Mapping source code to x86-64 Mapping for other common architectures is similar This lecture: basic statements and expressions We ll go quickly since this is review for many, fast orientation for others, and pretty straightforward Next: Object representation, method calls, and dynamic dispatch Later: specific details for project (in lecture or sections depending on timing) *Footnote: These slides include more than is specifically needed for the course project UW CSE 401/M501 Autumn 2018 K-3

Review: Variables For us, all data will be either: In a stack frame (method local variables) In an object (instance variables) Local variables accessed via %rbp movq -16(%rbp),%rax Object instance variables accessed via an offset from an object address in a register Details later UW CSE 401/M501 Autumn 2018 K-4

Conventions for Examples Examples show code snippets in isolation Much the way we ll generate code for different parts of the AST in a compiler visitor pass A different perspective from the 351 holistic view Register %rax used here as a generic example Rename as needed for more complex code using multiple registers 64-bit data used everywhere A few peephole optimizations shown to suggest what s possible Some might be fairly easy to do in the compiler project UW CSE 401/M501 Autumn 2018 K-5

What we re skipping for now Real code generator needs to deal with many things like: Which registers are busy at which point in the program Which registers to spill into memory when a new register is needed and no free ones are available Dealing with different sizes of data Exploiting the full instruction set UW CSE 401/M501 Autumn 2018 K-6

Code Generation for Constants Source 17 x86-64 movq $17,%rax Idea: realize constant value in a register Optimization: if constant is 0 xorq %rax,%rax (but some processors do better with movq $0,%rax and this has changed over time, too) UW CSE 401/M501 Autumn 2018 K-7

Assignment Statement Source var = exp; x86-64 <code to evaluate exp into, say, %rax> movq %rax,offset var (%rbp) UW CSE 401/M501 Autumn 2018 K-8

Unary Minus Source -exp x86-64 <code evaluating exp into %rax> negq %rax Optimization Collapse -(-exp) to exp Unary plus is a no-op UW CSE 401/M501 Autumn 2018 K-9

Binary + Source exp 1 + exp 2 x86-64 <code evaluating exp 1 into %rax> <code evaluating exp 2 into %rdx> addq %rdx,%rax UW CSE 401/M501 Autumn 2018 K-10

Binary + Some optimizations If exp 2 is a simple variable or constant, don t need to load it into another register first. Instead: addq exp 2,%rax Change exp 1 + (-exp 2 ) into exp 1 -exp 2 If exp 2 is 1 incq %rax Somewhat surprising: whether this is better than addq $1,%rax depends on processor implementation and has changed over time UW CSE 401/M501 Autumn 2018 K-11

Binary -, * Same as + Use subq for (but not commutative!) Use imulq for * Some optimizations Use left shift to multiply by powers of 2 If your multiplier is slow or you ve got free scalar units and multiplier is busy, you can do 10*x = (8*x)+(2*x) But might be slower depending on microarchitecture Use x+x instead of 2*x, etc. (often faster) Can use leaq (%rax,%rax,4),%rax to compute 5*x, then addq %rax,%rax to get 10*x, etc. etc. Use decq for x-1 (but check: subq $1 might be faster) UW CSE 401/M501 Autumn 2018 K-12

Signed Integer Division Ghastly on x86-64 Only works on 128-bit int divided by 64-bit int (similar instructions for 64-bit divided by 32-bit in 32-bit x86) Requires use of specific registers Source exp 1 / exp 2 x86-64 <code evaluating exp 1 into %rax ONLY> <code evaluating exp 2 into %rbx> cqto # extend to %rdx:%rax, clobbers %rdx idivq %rbx # quotient in %rax, remainder in %rdx UW CSE 401/M501 Autumn 2018 K-13

Control Flow Basic idea: decompose higher level operation into conditional and unconditional gotos In the following, j false is used to mean jump when a condition is false No such instruction on x86-64 Will have to realize with appropriate sequence of instructions to set condition codes followed by conditional jumps Normally don t need to actually generate the value true or false in a register But this is a useful shortcut hack for the project UW CSE 401/M501 Autumn 2018 K-14

While Source while (cond) stmt x86-64 test: <code evaluating cond> j false done <code for stmt> jmp test done: Note: In generated asm code we will need to have unique labels for each loop, conditional statement, etc. UW CSE 401/M501 Autumn 2018 K-15

Aside Instruction execution Actual execution of an instruction has multiple steps/phases inside a processor. Fairly typical steps for a simple processor: IF: instruction fetch. Load instruction from memory/cache into internal processor register(s) ID: instruction decode / read operand registers EX: execute or calculate memory addresses MEM: access memory (not all instructions) WB: write back store result (x86-64 is waaaaay more complex, but basic ideas are the same) See 351 textbook, sec. 4.4, 4.5, etc. for more details UW CSE 401/M501 Autumn 2018 K-16

Pipelining (on 1 slide, oversimplified) If instructions are independent, we can execute them on an assembly line start processing the next one while previous one is in some later stage. Ideally we could overlap like this: 1. IF ID EX MEM WB 2. IF ID EX MEM WB 3. IF ID EX MEM WB 4. IF ID EX MEM WB 5. IF ID Modern processors have multiple function units and buffers to support this UW CSE 401/M501 Autumn 2018 K-17

Pipelining bottlenecks This strategy works great if the instructions are independent. Things that cause problems: Output of one instruction needed for next one: next one can t proceed until data is available from earlier one Jumps: If there s a conditional jump, the processor has to either stall the pipeline until we decide whether to jump, or make a guess and be prepared to undo if it guesses wrong Processors have lots of hardware to try to guess right and avoid delays caused by these dependencies, but Compilers can help the processor by generating code to minimize these issues. UW CSE 401/M501 Autumn 2018 K-18

Optimization for While Put the test at the end: loop: test: jmp test <code for stmt> <code evaluating cond> j true loop Why bother? Pulls one instruction (jmp) out of the loop Avoids a pipeline stall on jmp on each iteration Although modern processors will often predict control flow and avoid the stall x86-64 does this particularly well Easy to do from AST or other IR; not so easy if generating code on the fly (e.g., recursive descent 1-pass compiler) UW CSE 401/M501 Autumn 2018 K-19

Do-While Source x86-64 do stmt while(cond) loop: <code for stmt> <code evaluating cond> j true loop UW CSE 401/M501 Autumn 2018 K-20

If Source x86-64 if (cond) stmt <code evaluating cond> j false skip <code for stmt> skip: UW CSE 401/M501 Autumn 2018 K-21

If-Else Source if (cond) stmt 1 else stmt 2 x86-64 <code evaluating cond> j false else <code for stmt 1 > jmp done else: <code for stmt 2 > done: UW CSE 401/M501 Autumn 2018 K-22

Jump Chaining Observation: naïve implementation can produce jumps to jumps (if-else if- -else; or nested loops or conditionals, ) Optimization: if a jump has as its target an unconditional jump, change the target of the first jump to the target of the second Repeat until no further changes Often done in peephole optimization pass after initial code generation UW CSE 401/M501 Autumn 2018 K-23

Boolean Expressions What do we do with this? x > y It is an expression that evaluates to true or false Could generate the value (0/1 or whatever the local convention is) But normally we don t want/need the value we re only trying to decide whether to jump (Although for our project we might simplify and always produce the value) UW CSE 401/M501 Autumn 2018 K-24

Code for exp1 > exp2 Basic idea: Generated code depends on context: What is the jump target? Jump if the condition is true or if false? Example: evaluate exp1 > exp2, jump on false, target if jump taken is L123 <evaluate exp1 to %rax> <evaluate exp2 to %rdx> cmpq %rdx,%rax # dst-src = exp1-exp2 jng L123 UW CSE 401/M501 Autumn 2018 K-25

Boolean Operators:! Source! exp Context: evaluate exp and jump to L123 if false (or true) To compile!, just reverse the sense of the test: evaluate exp and jump to L123 if true (or false) UW CSE 401/M501 Autumn 2018 K-26

Boolean Operators: && and In C/C++/Java/C#/many others, these are short-circuit operators Right operand is evaluated only if needed Basically, generate the if statements that jump appropriately and only evaluate operands when needed UW CSE 401/M501 Autumn 2018 K-27

Example: Code for && Source if (exp 1 && exp 2 ) stmt x86-64 <code for exp 1 > j false skip <code for exp 2 > j false skip <code for stmt> skip: UW CSE 401/M501 Autumn 2018 K-28

Example: Code for Source if (exp 1 exp 2 ) stmt x86-64 <code for exp 1 > j true doit <code for exp 2 > j false skip doit: <code for stmt> skip: UW CSE 401/M501 Autumn 2018 K-29

Realizing Boolean Values If a boolean value needs to be stored in a variable or method call parameter, generate code needed to actually produce it Typical representations: 0 for false, +1 or -1 for true C specifies 0 and 1; we ll use that Best choice can depend on machine instructions & language; normally some convention is picked during the primeval history of the architecture UW CSE 401/M501 Autumn 2018 K-30

Boolean Values: Example Source var = bexp; x86-64 <code for bexp> j false genfalse movq $1,%rax jmp store genfalse: movq $0,%rax # or xorq store: movq %rax,offset var (%rbp) # generated by asg stmt UW CSE 401/M501 Autumn 2018 K-31

Better, If Enough Registers Source var = bexp; x86-64 xorq %rax,%rax # or movq $0,%rax <code for bexp> j false store incq %rax # or movq $1,%rax store: movq %rax,offset var (%rbp) # generated by asg Better: use movecc instruction to avoid conditional jump Can also use conditional move instruction for sequences like x = y<z? y : z UW CSE 401/M501 Autumn 2018 K-32

Better yet: setcc Source var = x < y; x86-64 movq offset x (%rbp),%rax # load x cmpq offset y (%rbp),%rax # compare to y setl %al # set low byte %rax to 0/1 movzbq %al,%rax # zero-extend to 64 bits movq %rax,offset var (%rbp) # gen. by asg stmt UW CSE 401/M501 Autumn 2018 K-33

Other Control Flow: switch Naïve: generate a chain of nested if-else if statements Better: switch statement is intended to allow easier generation of O(1) selection, provided the set of switch values is reasonably compact Idea: create a 1-D array of jumps or labels and use the switch expression to select the right one Need to generate the equivalent of an if statement to ensure that expression value is within bounds UW CSE 401/M501 Autumn 2018 K-34

Switch Source switch (exp) { case 0: stmts 0 ; case 1: stmts 1 ; case 2: stmts 2 ; } break is an unconditional jump to the end of switch x86-64: <put exp in %rax> if (%rax < 0 %rax > 2) jmp defaultlabel movq swtab(,%rax,8),%rax jmp *%rax.data swtab:.quad L0.quad L1.quad L2.text L0: <stmts 0 > L1: <stmts 1 > L2: <stmts 2 > UW CSE 401/M501 Autumn 2018 K-35

Arrays Several variations C/C++/Java 0-origin: an array with n elements contains variables a[0] a[n-1] 1 dimension (Java); 1 or more dimensions using row major order (C/C++) Key step is evaluate subscript expression, then calculate the location of the corresponding array element UW CSE 401/M501 Autumn 2018 K-36

0-Origin 1-D Integer Arrays Source x86-64 exp 1 [exp 2 ] <evaluate exp 1 (array address) in %rax> <evaluate exp 2 in %rdx> address is (%rax,%rdx,8) # if 8 byte elements UW CSE 401/M501 Autumn 2018 K-37

2-D Arrays Subscripts start with 0 (default) C/C++, etc. use row-major order E.g., an array with 3 rows and 2 columns is stored in sequence: a(0,0), a(0,1), a(1,0), a(1,1), a(2,0), a(2,1) Fortran uses column-major order Exercises: What is the layout? How do you calculate location of a[i][j]? What happens when you pass array references between Fortran and C/C++ code? Java does not have real 2-D arrays. A Java 2-D array is a pointer to a list of pointers to the rows And rows may have different lengths (ragged arrays) UW CSE 401/M501 Autumn 2018 K-38

a[i][j] in C/C++/etc. If a is a real 0-origin, 2-D array, to find a[i][j], we need to know: Values of i and j How many columns (but not rows!) the array has Location of a[i][j] is: Location of a + (i*(#of columns) + j) * sizeof(elt) Can factor to pull out allocation-time constant part and evaluate that once no recalculating at runtime; only calculate part depending on i, j Details in most compiler books UW CSE 401/M501 Autumn 2018 K-39

Coming Attractions Code Generation for Objects Representation Method calls Inheritance and overriding Strategies for implementing code generators Code improvement optimization UW CSE 401/M501 Autumn 2018 K-40