THEORY OF COMPILATION


Lecture 10: Code Generation
Eran Yahav
Reference: Dragon Book, Chapter 8; MCD 4.2.4

You Are Here
Compiler: source text (txt) -> Lexical Analysis -> Syntax Analysis (Parsing) -> Semantic Analysis -> Intermediate Representation (IR) -> Code Generation -> executable code (exe)

Last Week: Runtime, Part II
- Nested procedures
- Object layout
- Inheritance
- Multiple inheritance

Today
- Runtime checks
- Garbage collection
- Generating assembly code

Runtime Checks
Generate code that checks for attempted illegal operations:
- Null pointer check
  - for MoveField, MoveArray, ArrayLength, and VirtualCall
  - reference arguments to library functions should not be null
- Array bounds check
- Array allocation size check
- Division by zero
If a check fails, jump to error-handler code that prints a message and exits the program gracefully.

Null Pointer Check
    # null pointer check
    cmp $0,%eax
    je labelnpe
A single generated handler serves the entire program:
    labelnpe:
        push $strnpe    # error message
        call println
        push $1         # error code
        call exit

Array Bounds Check
    # array bounds check
    mov -4(%eax),%ebx   # ebx = length
    mov $0,%ecx         # ecx = index
    cmp %ecx,%ebx
    jle labelabe        # ebx <= ecx?
    cmp $0,%ecx
    jl labelabe         # ecx < 0?
A single generated handler serves the entire program:
    labelabe:
        push $strabe    # error message
        call println
        push $1         # error code
        call exit

Array Allocation Size Check
    # array size check
    cmp $0,%eax         # eax = array size
    jle labelase        # eax <= 0?
A single generated handler serves the entire program:
    labelase:
        push $strase    # error message
        call println
        push $1         # error code
        call exit
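
A code generator might produce these check sequences from small helper functions. The Python sketch below is only an illustration: the helper names (`emit_null_check`, `emit_bounds_check`, `emit_size_check`, and the `*_fails` predicates) and the choice of %ebx as a scratch register are assumptions, not part of the slides. The label names and the convention that the array length sits 4 bytes before element 0 follow the snippets above. The `*_fails` predicates restate in Python the conditions under which each emitted jump is taken.

```python
from typing import List

# Hypothetical emitters for the AT&T-syntax check sequences shown above.
# Assumption: an array's length is stored at -4 from its base, as in -4(%eax).

def emit_null_check(ref_reg: str) -> List[str]:
    return [f"cmp $0,{ref_reg}", "je labelnpe"]          # jump if ref == 0

def emit_bounds_check(array_reg: str, index_reg: str, tmp_reg: str = "%ebx") -> List[str]:
    return [
        f"mov -4({array_reg}),{tmp_reg}",                # tmp = length
        f"cmp {index_reg},{tmp_reg}",                    # flags from tmp - index
        "jle labelabe",                                  # length <= index?
        f"cmp $0,{index_reg}",                           # flags from index - 0
        "jl labelabe",                                   # index < 0?
    ]

def emit_size_check(size_reg: str) -> List[str]:
    return [f"cmp $0,{size_reg}", "jle labelase"]        # size <= 0?

# Conditions under which each emitted sequence jumps to its error handler.
def null_check_fails(ref: int) -> bool:
    return ref == 0

def bounds_check_fails(length: int, index: int) -> bool:
    return length <= index or index < 0

def size_check_fails(size: int) -> bool:
    return size <= 0
```

For example, `"\n".join(emit_bounds_check("%eax", "%ecx"))` yields the same instruction sequence as the bounds-check slide, minus the `mov $0,%ecx` that loads the index.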

Automatic Memory Management
Automatically free memory when it is no longer needed.
- Not limited to OO programs; we show it here because it is prevalent in OO languages such as Java, and also in functional languages.
- Approximate reasoning about object liveness: use reachability to approximate liveness, assuming that reachable objects are live and non-reachable objects are dead.
Three classical garbage collection techniques:
- reference counting
- mark and sweep
- copying

GC Using Reference Counting
Add a reference-count field to every object, recording how many references point to it. When rc == 0, the object is unreachable; unreachable means dead, so the object can be collected (deallocated).

Managing Reference Counts
Each object o has a reference count o.rc. A newly allocated object o gets o.rc = 1. Why?
Write barrier for reference updates:
    update(x, old, new) {
      new.rc++;    // increment first, so that old == new cannot free a live object
      old.rc--;
      if (old.rc == 0) collect(old);
    }
collect(old) decrements the RC of each of old's children and recursively collects the objects whose RC reaches 0.
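
A minimal Python sketch of this bookkeeping, assuming a toy heap in which each object has an `rc` field and a list of child slots. The names `Obj`, `release`, `collect`, `update`, and the `freed` log are hypothetical; `release` models a local variable dropping its reference. The barrier increments the new target before decrementing the old one, so a self-assignment cannot free a live object.

```python
from typing import List, Optional

freed: List[str] = []  # names of collected objects, in collection order

class Obj:
    def __init__(self, name: str) -> None:
        self.name = name
        self.rc = 1  # in this sketch, the allocator's returned reference is the first one
        self.children: List[Optional["Obj"]] = []

def release(obj: Obj) -> None:
    """Drop one reference to obj; collect it if that was the last one."""
    obj.rc -= 1
    if obj.rc == 0:
        collect(obj)

def collect(obj: Obj) -> None:
    """Free obj, then release each child, recursively collecting any that reach 0."""
    freed.append(obj.name)
    for child in obj.children:
        if child is not None:
            release(child)

def update(holder: Obj, slot: int, new: Optional[Obj]) -> None:
    """Write barrier for the store holder.children[slot] = new."""
    old = holder.children[slot]
    if new is not None:
        new.rc += 1   # increment first: safe even when old is new
    if old is not None:
        release(old)
    holder.children[slot] = new
```

Storing `a` into `root` and then overwriting that slot with `None` frees `a` and, transitively, any child whose count drops to 0.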

Cycles!
Reference counting cannot identify unreachable cycles: the reference counts of the nodes on a cycle never drop to 0. Several approaches deal with cycles:
- ignore them
- periodically invoke a tracing algorithm to collect cycles
- use specialized algorithms for collecting cycles
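
A small self-contained Python illustration, with hypothetical names, of why cycles leak: after the only root reference to a two-node cycle is dropped, each node still holds a count of 1 contributed by the other node, so a collector that frees objects only at count 0 never reclaims them, even though a reachability check finds neither node.

```python
from typing import List, Optional, Set

class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.rc = 0                          # number of references to this node
        self.next: Optional["Node"] = None

def reachable(roots: List[Node]) -> Set[str]:
    """Names of all nodes reachable from the roots."""
    seen: Set[str] = set()
    work = list(roots)
    while work:
        n = work.pop()
        if n.name not in seen:
            seen.add(n.name)
            if n.next is not None:
                work.append(n.next)
    return seen

# Build root -> a, a -> b, b -> a, maintaining the counts by hand.
a, b = Node("a"), Node("b")
roots: List[Node] = [a]
a.rc += 1
a.next = b
b.rc += 1
b.next = a
a.rc += 1

# Drop the only root reference to the cycle.
roots.remove(a)
a.rc -= 1

dead_by_rc = [n.name for n in (a, b) if n.rc == 0]   # what reference counting would free
live_by_reachability = reachable(roots)              # what is actually still reachable
```

Here `dead_by_rc` is empty while `live_by_reachability` is also empty: both nodes are garbage, but neither count reached 0.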

GC Using Mark & Sweep
Marking phase:
- mark the roots
- trace all objects transitively reachable from the roots
- mark every traversed object
Sweep phase:
- scan all objects in the heap
- collect all unmarked objects

GC Using Mark & Sweep
    mark_sweep() {
      for ptr in Roots
        mark(ptr)
      sweep()
    }

    mark(obj) {
      if mark_bit(obj) == unmarked {
        mark_bit(obj) = marked
        for c in Children(obj)
          mark(c)
      }
    }

    sweep() {
      p = Heap_bottom
      while (p < Heap_top) {
        s = size(p)                  // read the size before p may be freed
        if (mark_bit(p) == unmarked) then free(p)
        else mark_bit(p) = unmarked  // reset for the next collection
        p = p + s
      }
    }
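
The same algorithm as a runnable Python sketch. Assumptions: a Python list stands in for the address range Heap_bottom..Heap_top, "freeing" means dropping an object from that list, and `mark` uses an explicit stack instead of recursion (same result, no deep recursion). `HeapObj` and the function names are illustrative.

```python
from typing import List

class HeapObj:
    def __init__(self, name: str) -> None:
        self.name = name
        self.marked = False
        self.children: List["HeapObj"] = []

def mark(obj: HeapObj) -> None:
    """Mark everything reachable from obj (explicit stack instead of recursion)."""
    stack = [obj]
    while stack:
        o = stack.pop()
        if not o.marked:
            o.marked = True
            stack.extend(o.children)

def sweep(heap: List[HeapObj]) -> List[str]:
    """Free unmarked objects; clear the mark bit of survivors for the next cycle."""
    freed: List[str] = []
    survivors: List[HeapObj] = []
    for o in heap:
        if o.marked:
            o.marked = False
            survivors.append(o)
        else:
            freed.append(o.name)
    heap[:] = survivors  # "free" = drop from the simulated heap
    return freed

def mark_sweep(roots: List[HeapObj], heap: List[HeapObj]) -> List[str]:
    for r in roots:
        mark(r)
    return sweep(heap)
```

Unlike reference counting, this collects an unreachable cycle: with roots `[a]`, `a -> b`, and a cycle `c <-> d`, the sweep frees `c` and `d`.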

Copying GC
- Partition the heap into two parts: old space and new space.
- GC: copy all reachable objects from old space to new space, then swap the roles of old and new space.

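Example (figure): before collection, old space holds objects A, B, C, D, and E, with the roots pointing into old space; after collection, the reachable objects A and C have been copied into new space.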
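
A Python sketch of copying collection, using Cheney's breadth-first scan (one common way to implement it; the slides do not specify a variant). Python lists stand in for the two spaces, and `Cell`, `copy_obj`, and `copying_gc` are hypothetical names. Each copied object leaves a forwarding pointer in old space, so shared objects and cycles are copied exactly once.

```python
from typing import List, Optional

class Cell:
    def __init__(self, name: str, fields: Optional[List["Cell"]] = None) -> None:
        self.name = name
        self.fields: List["Cell"] = list(fields) if fields else []
        self.forward: Optional["Cell"] = None  # set once copied to new space

def copy_obj(obj: Cell, new_space: List[Cell]) -> Cell:
    """Copy obj into new space once; later calls return the forwarding pointer."""
    if obj.forward is None:
        clone = Cell(obj.name, obj.fields)  # fields still point into old space
        new_space.append(clone)
        obj.forward = clone
    return obj.forward

def copying_gc(roots: List[Cell]) -> List[Cell]:
    """Copy everything reachable from roots and return the new space."""
    new_space: List[Cell] = []
    roots[:] = [copy_obj(r, new_space) for r in roots]
    scan = 0
    while scan < len(new_space):  # copied-but-unscanned objects form the queue
        obj = new_space[scan]
        obj.fields = [copy_obj(f, new_space) for f in obj.fields]
        scan += 1
    return new_space  # serves as the old space for the next collection
```

With roots pointing to A, A pointing to C, and C pointing back to A, only A and C end up in new space, matching the example above; B, D, and E are simply left behind in old space.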

Summary
- How objects are organized in memory
- Automatic management of memory
Coming up: generating assembly code

Target Languages
From the IR and the symbol table, code generation can produce absolute machine code, relative machine code, or assembly.

From IR to ASM: Challenges
- Mapping IR to ASM operations: what instruction(s) should be used to implement an IR operation?
- How do we translate code sequences?
- Call/return of routines and managing activation records
- Memory allocation
- Register allocation
- Optimizations

Intel IA-32 Assembly
- Going from assembly to binary: assembling, linking
- AT&T syntax vs. Intel syntax: we will use AT&T syntax, which matches the GNU assembler (GAS)

AT&T vs. Intel Syntax
Parameter order
  AT&T: the source comes before the destination.
  Intel: the destination comes before the source.
Parameter size
  AT&T: mnemonics are suffixed with a letter indicating the size of the operands (e.g., "q" for qword, "l" for dword, "w" for word, and "b" for byte).
  Intel: derived from the name of the register that is used (e.g., eax vs. ax vs. al).
Immediate values and registers
  AT&T: immediate values are prefixed with "$", and registers must be prefixed with "%".
  Intel: the assembler automatically detects the type of symbols, i.e., whether they are registers, constants, or something else.
Effective addresses
  AT&T: general syntax DISP(BASE,INDEX,SCALE). Example: movl mem_location(%ebx,%ecx,4), %eax
  Intel: memory operands, including variables, are written in square brackets; size keywords such as byte, word, or dword are added when the operand size cannot be inferred. Example: mov eax, dword [ebx + ecx*4 + mem_location]

IA-32 Registers
Eight 32-bit general-purpose registers:
- EAX: accumulator for operands and result data; used to return values from function calls
- EBX: pointer to data; often used as an array base address
- ECX: counter for string and loop operations
- EDX: I/O pointer (general purpose for us)
- ESI: general purpose, and source pointer for string operations
- EDI: general purpose, and destination pointer for string operations
- EBP: stack frame (base) pointer
- ESP: stack pointer
Also: the EFLAGS register, the EIP (instruction pointer) register, and six 16-bit segment registers (ignore the rest for our purposes).

Not All Registers Are Born Equal
- EAX, EDX: implicit operands of the one-operand forms of MUL, IMUL, DIV, and IDIV, and hold their results; for DIV and IDIV, EAX receives the quotient and EDX the remainder
- ESI: required source pointer for string instructions
- EDI: required destination pointer for string instructions
- Destination registers of arithmetic operations: EAX, EBX, ECX, EDX
- EBP: stack frame (base) pointer
- ESP: stack pointer

IA-32 Addressing Modes
Machine instructions take zero or more operands.
- Source operand: immediate, register, or memory location (I/O port)
- Destination operand: register or memory location (I/O port)

Immediate and Register Operands
Immediate: the value is specified in the instruction itself. In GAS syntax, immediate values are preceded by $:
    add $4, %esp
Register: the register name is used. In GAS syntax, register names are preceded by %:
    mov %esp, %ebp

Memory and Base Displacement Operands
Memory operand: the value at a given address. GAS syntax uses parentheses:
    mov (%eax), %eax
Base displacement: the value at a computed address, built from a base register, an index register, a scale factor, and a displacement:
    address = base + (index * scale) + displacement
Syntax: disp(base,index,scale)
    movl $42, 2(%eax)
    movl $42, 1(%eax,%ecx,4)

Base Displacement Addressing
Example (figure): an array of eight 4-byte elements holding 7, 0, 2, 4, 5, 6, 7, 1, with %ecx holding the array base address and %ebx = 3.
    mov (%ecx,%ebx,4), %eax
address = base + (index * scale) + displacement = base + (3 * 4) + 0 = base + 12, so %eax receives element 3, the value 4.
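
The address arithmetic can be checked with a short Python sketch that simulates the figure's array in a byte buffer. `BASE` (the value in %ecx) is a hypothetical address, and `effective_address` and `load32` are illustrative names; only the formula and the little-endian 32-bit layout reflect IA-32.

```python
import struct

def effective_address(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """address = base + (index * scale) + displacement"""
    if scale not in (1, 2, 4, 8):  # the only scale factors IA-32 can encode
        raise ValueError("scale must be 1, 2, 4, or 8")
    return base + index * scale + disp

BASE = 0x1000  # hypothetical address of the array (the value in %ecx)
MEMORY = struct.pack("<8i", 7, 0, 2, 4, 5, 6, 7, 1)  # eight little-endian 32-bit ints

def load32(addr: int) -> int:
    """Read the 32-bit signed integer stored at addr in the simulated memory."""
    return struct.unpack_from("<i", MEMORY, addr - BASE)[0]

# mov (%ecx,%ebx,4), %eax  with  %ecx = BASE, %ebx = 3, displacement 0
eax = load32(effective_address(BASE, 3, 4))
```

A displacement simply shifts the result: `effective_address(100, 2, 4, 8)` is 100 + 8 + 8 = 116.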

How Do We Generate the Code?
Break the IR into basic blocks. A basic block is a sequence of instructions with
- a single entry (the first instruction): there are no jumps into the middle of the block
- a single exit (the last instruction)
Code in a block executes as a sequence from the first instruction to the last, without any jumps in between. There is an edge from basic block B1 to basic block B2 when the last statement of B1 may jump to B2 (or B1 may fall through to B2).

Example
    B1: t1 := 4 * i
        t2 := a[t1]
        if t2 <= 20 goto B3        (true: B3; false: B2)
    B2: t3 := 4 * i
        t4 := b[t3]
        goto B4
    B3: t5 := t2 * t4
        t6 := prod + t5
        prod := t6
        goto B4
    B4: t7 := i + 1
        i := t7
        goto B5

Creating Basic Blocks
Input: a sequence of three-address statements.
Output: a list of basic blocks, with each three-address statement in exactly one block.
Method:
1. Determine the set of leaders (the first statement of each block):
   - The first statement is a leader.
   - Any statement that is the target of a conditional or unconditional jump is a leader.
   - Any statement that immediately follows a goto or conditional jump statement is a leader.
2. For each leader, its basic block consists of the leader and all statements up to, but not including, the next leader or the end of the program.
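
A Python sketch of the leader-based partitioning. It assumes statements are strings numbered from 1 and that jump targets are written as `goto (n)`, as in the numbered nested-loop IR in this lecture; `find_leaders` and `basic_blocks` are hypothetical names.

```python
import re
from typing import List

GOTO = re.compile(r"goto\s*\((\d+)\)")  # assumed jump syntax: goto (n)

def find_leaders(stmts: List[str]) -> List[int]:
    """Return the 1-based numbers of the statements that start a basic block."""
    leaders = {1}                            # rule 1: the first statement
    for n, s in enumerate(stmts, start=1):
        m = GOTO.search(s)
        if m:
            leaders.add(int(m.group(1)))     # rule 2: the target of a jump
            if n < len(stmts):
                leaders.add(n + 1)           # rule 3: the statement after a jump
    return sorted(leaders)

def basic_blocks(stmts: List[str]) -> List[List[str]]:
    """Each block runs from a leader up to (not including) the next leader."""
    leaders = find_leaders(stmts)
    bounds = leaders + [len(stmts) + 1]
    return [stmts[bounds[k] - 1 : bounds[k + 1] - 1] for k in range(len(leaders))]
```

On the 17-statement nested-loop IR, the leaders are statements 1, 2, 3, 10, 12, and 13, giving six blocks.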

Control Flow Graph
A directed graph G = (V, E):
- nodes V = basic blocks
- edges E = control flow: (B1, B2) ∈ E when control from B1 may flow to B2
Example:
    B1: prod := 0
        i := 1
    B2: t1 := 4 * i
        t2 := a[t1]
        t3 := 4 * i
        t4 := b[t3]
        t5 := t2 * t4
        t6 := prod + t5
        prod := t6
        t7 := i + 1
        i := t7
        if i <= 20 goto B2
Edges: B1 -> B2 (fall-through) and B2 -> B2 (the loop back edge).
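
A Python sketch of CFG construction from already-formed blocks. Assumptions: each block is a `(name, statements)` pair in program order, jumps name their target block as `goto Bk`, and a block ends either in an unconditional `goto`, a conditional `if ... goto`, or a plain statement. `build_cfg` is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

JUMP = re.compile(r"goto\s+(\w+)")  # assumed syntax: goto <block name>

def build_cfg(blocks: List[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Map each block name to the names of its successor blocks."""
    succ: Dict[str, List[str]] = {}
    for k, (name, body) in enumerate(blocks):
        last = body[-1].strip()
        edges: List[str] = []
        m = JUMP.search(last)
        if m:
            edges.append(m.group(1))           # explicit jump target
        unconditional = last.startswith("goto")
        if not unconditional and k + 1 < len(blocks):
            edges.append(blocks[k + 1][0])     # fall-through to the next block
        succ[name] = edges
    return succ
```

For the two-block example above, the result is `{"B1": ["B2"], "B2": ["B2"]}`: B1 falls through into B2, and B2 loops back to itself. The last block has no fall-through successor because it exits the program.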

Example CFG
Source:
    for i from 1 to 10 do
        for j from 1 to 10 do
            a[i, j] = 0.0;
    for i from 1 to 10 do
        a[i, i] = 1.0;
IR:
    1)  i = 1
    2)  j = 1
    3)  t1 = 10 * i
    4)  t2 = t1 + j
    5)  t3 = 8 * t2
    6)  t4 = t3 - 88
    7)  a[t4] = 0.0
    8)  j = j + 1
    9)  if j <= 10 goto (3)
    10) i = i + 1
    11) if i <= 10 goto (2)
    12) i = 1
    13) t5 = i - 1
    14) t6 = 88 * t5
    15) a[t6] = 1.0
    16) i = i + 1
    17) if i <= 10 goto (13)
CFG:
    B1: i = 1
    B2: j = 1
    B3: t1 = 10 * i
        t2 = t1 + j
        t3 = 8 * t2
        t4 = t3 - 88
        a[t4] = 0.0
        j = j + 1
        if j <= 10 goto B3
    B4: i = i + 1
        if i <= 10 goto B2
    B5: i = 1
    B6: t5 = i - 1
        t6 = 88 * t5
        a[t6] = 1.0
        i = i + 1
        if i <= 10 goto B6

Variable Liveness
A statement x = y + z defines x and uses y and z. A variable x is live at a program point if its value is used at a later point.
    y = 42        x undef, y live, z undef
    z = 73        x undef, y live, z live
    x = y + z     x live,  y dead, z dead
    print(x);     x dead,  y dead, z dead
(The right-hand column shows the state after each statement.)

Computing Liveness Information
- Between basic blocks: dataflow analysis (next lecture)
- Within a single basic block: use the symbol table to record next-use information; scan the basic block backwards, updating the next use of each variable

Computing Liveness Information
INPUT: a basic block B of three-address statements. The symbol table initially shows all non-temporary variables in B as live on exit.
OUTPUT: at each statement i: x = y + z in B, the liveness and next-use information of x, y, and z at i.
Start at the last statement in B and scan backwards. At each statement i: x = y + z in B, do the following:
1. Attach to i the information currently found in the symbol table regarding the next use and liveness of x, y, and z.
2. In the symbol table, set x to "not live" and "no next use".
3. In the symbol table, set y and z to "live" and the next uses of y and z to i.

Computing Liveness Information: Example
Apply the backward scan to the block:
    x = 1
    y = x + 3
    z = x * 3
    x = x * z
Can we change the order of steps 2 and 3?
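
A Python sketch of the backward next-use scan. Assumptions: a statement `x = y op z` is modeled as `("x", ["y", "z"])`, constants appear as strings such as `"3"` and are skipped because they are not identifiers, and statements are numbered from 1. `next_use` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Stmt = Tuple[str, List[str]]       # x = y op z  ->  ("x", ["y", "z"])
Info = Tuple[bool, Optional[int]]  # (live?, number of the next-use statement, or None)

def next_use(block: List[Stmt], live_on_exit: List[str]) -> List[Dict[str, Info]]:
    """Return, for each statement, the liveness/next-use info attached to it."""
    table: Dict[str, Info] = {v: (True, None) for v in live_on_exit}
    attached: List[Dict[str, Info]] = [{} for _ in block]
    for i in range(len(block), 0, -1):                    # scan backwards; i is 1-based
        x, operands = block[i - 1]
        uses = [o for o in operands if o.isidentifier()]  # skip constants such as "3"
        # Step 1: attach the current symbol-table entries for x and its operands.
        attached[i - 1] = {v: table.get(v, (False, None)) for v in [x] + uses}
        # Step 2: x is redefined here, so its earlier value is not live.
        table[x] = (False, None)
        # Step 3: the operands are used here. Doing this after step 2 keeps
        # x live in a statement such as x = x * z.
        for v in uses:
            table[v] = (True, i)
    return attached
```

On the example block, with x, y, and z live on exit, statement 1 records x as next used at statement 2, statement 3 records z as next used at statement 4, and statement 4 records x as live on exit with no further use in the block.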

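Common-Subexpression Elimination
    a = b + c            a = b + c
    b = a - d     =>     b = a - d
    c = b + c            c = b + c
    d = a - d            d = b
The second and fourth statements compute the same expression a - d, and neither a nor d changes in between, so the fourth can reuse b. Statements 1 and 3 both read b + c, but b is redefined in between, so that is not a common subexpression.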

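DAG Representation of Basic Blocks
Block: a = b + c; b = a - d; c = b + c; d = a - d
DAG (figure): leaves b0, c0, and d0 for the initial values; a node + labeled a with children b0 and c0; a node - labeled b, d with children a and d0; a node + labeled c with children (b, d) and c0.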

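DAG Representation of Basic Blocks
Block: a = b + c; b = b - d; c = c + d; e = b + c
DAG (figure): leaves b0, c0, and d0; a node + labeled a with children b0 and c0; a node - labeled b with children b0 and d0; a node + labeled c with children c0 and d0; a node + labeled e with children b and c. The two occurrences of b + c get different nodes because b and c are redefined in between.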
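
A Python sketch of DAG construction for a basic block, in the style of local value numbering. Assumptions: statements are `(target, left, op, right)` tuples, a variable's first use creates a leaf named with suffix 0 (as in b0), and commutativity is not exploited. `build_dag` is a hypothetical name. Two statements share a node exactly when they apply the same operator to the same child nodes.

```python
from typing import Dict, List, Tuple, Union

Node = Union[str, Tuple[str, int, int]]  # leaf such as "b0", or (op, left id, right id)

def build_dag(block: List[Tuple[str, str, str, str]]) -> Tuple[List[Node], Dict[str, int]]:
    """block holds statements (target, left, op, right), i.e. target = left op right."""
    nodes: List[Node] = []                          # node id -> node
    existing: Dict[Tuple[str, int, int], int] = {}  # (op, left id, right id) -> node id
    current: Dict[str, int] = {}                    # variable -> node holding its value

    def node_for(v: str) -> int:
        if v not in current:           # first use of v: leaf for its initial value
            nodes.append(v + "0")
            current[v] = len(nodes) - 1
        return current[v]

    for target, left, op, right in block:
        key = (op, node_for(left), node_for(right))
        if key not in existing:        # nothing computes op(left, right) yet: add a node
            nodes.append(key)
            existing[key] = len(nodes) - 1
        current[target] = existing[key]  # target now labels this node
    return nodes, current
```

For the first block, b and d end up labeling the same node (so d = b), and the DAG has six nodes. For the second block, e and a get different nodes, as the figure shows.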

Algebraic Identities
    a = x^2      =>   a = x*x
    b = x*2      =>   b = x+x
    c = x/2      =>   c = x*0.5
    d = 1*x      =>   d = x
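
A Python sketch of these rewrites applied to a single binary operation, assuming expressions arrive as `(left, op, right)` with `^` meaning exponentiation. `simplify` is a hypothetical name, and it returns the rewritten right-hand side as a string. Whether each rewrite actually pays off depends on the target machine, and x/2 => x*0.5 is only valid for floating-point division, not integer division.

```python
from typing import Union

Operand = Union[str, int, float]

def simplify(left: Operand, op: str, right: Operand) -> str:
    """Apply the identities above to left op right; otherwise leave it unchanged."""
    if op == "^" and right == 2:
        return f"{left}*{left}"   # squaring: a multiply instead of a power call
    if op == "*" and right == 2:
        return f"{left}+{left}"   # doubling: an add is often cheaper than a multiply
    if op == "/" and right == 2:
        return f"{left}*0.5"      # floating point only; exact because 0.5 is a power of two
    if op == "*" and left == 1:
        return str(right)         # multiplicative identity
    return f"{left}{op}{right}"
```

Expressions that match none of the identities pass through unchanged, e.g. `simplify("x", "-", "y")` gives `"x-y"`.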

Coming up next: register allocation

The End