Winter Compiler Construction T11: Activation records + Introduction to x86 assembly


Winter 2006-2007
Compiler Construction T11: Activation records + Introduction to x86 assembly
Mooly Sagiv and Roman Manevich
School of Computer Science, Tel-Aviv University

Today
  Compiler pipeline:
    IC language (ic) -> Lexical Analysis -> Syntax Analysis (Parsing) -> AST, Symbol Table etc. -> Intermediate Representation (IR) -> Code Generation -> Executable code (exe)
  Today:
    PA4 tips
    Activation records
    Memory layout
    Introduction to x86 assembly
  Next time:
    Code generation
    Register allocation
    Assembling
    Linking
    Activation records in depth
    PA5

Tips for PA4
  Keep a list of LIR instructions for each translated method
  Keep ClassLayout information for each class
    Field offsets
    Method offsets
    Don't forget to take superclass fields and methods into account
  It may be useful to keep a reference in each LIR instruction to the AST node from which it was generated
  Two AST passes:
    Pass 1: collect and name string literals (Map<ASTStringLiteral,String>), create a ClassLayout for each class
    Pass 2: use literals and field/method offsets to translate method bodies
  Finally: print string literals, dispatch tables, and the translation list of each method body
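The ClassLayout tip in "Tips for PA4" can be illustrated with a small sketch. The slides do not show the course's ClassLayout, so everything below is an assumption made for illustration: the attribute names, the `_Class_method` label scheme, and the choice to reserve slot 0 of each object for the dispatch-table pointer. The point it shows is that a subclass starts from its superclass layout, so inherited fields and methods keep their offsets and an overriding method reuses the inherited slot.

```python
class ClassLayout:
    """Hypothetical layout record: field offsets and method (dispatch) offsets."""

    def __init__(self, name, fields, methods, superclass=None):
        self.name = name
        # Start from the superclass layout so inherited slots keep their offsets.
        self.field_offsets = dict(superclass.field_offsets) if superclass else {}
        self.method_offsets = dict(superclass.method_offsets) if superclass else {}
        self.method_impl = dict(superclass.method_impl) if superclass else {}
        for field in fields:
            # Assumption: slot 0 of every object holds the dispatch-table pointer,
            # so fields are numbered from 1. Field names are assumed distinct.
            self.field_offsets[field] = len(self.field_offsets) + 1
        for method in methods:
            if method not in self.method_offsets:
                # A new method gets the next free dispatch-table slot.
                self.method_offsets[method] = len(self.method_offsets)
            # An overriding method keeps the slot but points at the new code label.
            self.method_impl[method] = f"_{name}_{method}"

    def dispatch_table(self):
        """Code labels in slot order, as they would be printed for this class."""
        ordered = sorted(self.method_offsets, key=self.method_offsets.get)
        return [self.method_impl[m] for m in ordered]


base = ClassLayout("A", fields=["x"], methods=["foo", "bar"])
derived = ClassLayout("B", fields=["y"], methods=["bar", "baz"], superclass=base)
print(derived.field_offsets, derived.dispatch_table())
```

Offsets here are counted in slots; multiplying by 4 gives byte offsets on a 32-bit target.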

Roadmap
  PA4 (HIR/LIR)
    LIR data: string literals, dispatch vectors
    Array access and field access instructions
    Call/return primitives
    LIR registers
    LIR file (file.lir) generation
  PA5 (ASM)
    Static information: string literals, dispatch tables
    Handlers for runtime errors (here or in PA4)
    Call sequences
    Dynamic binding mechanism
    Machine registers
    Assembly file (file.s) generation

LIR vs. assembly
                     LIR                       Assembly
    #Registers       Unlimited                 Limited
    Function calls   Implicit                  Runtime stack
    Instruction set  Abstract                  Concrete
    Types            Basic and user defined    Limited basic types
  Runtime stack holds: local variables, parameters, LIR registers

Function calls
  LIR: simply call/return
  Conceptually
    Supply new environment (frame) with temporary memory for local variables
    Pass parameters to new environment
    Transfer flow of control (call/return)
    Return information from new environment (ret. value)
  Assembly: pass parameters (+this), save registers, call, virtual method lookup, restore registers, pass return value, return

Activation records
  New environment = activation record (frame)
  Activation record = data of current function / method call
    User data
      Local variables
      Parameters
      Return values
      Register contents
    Administration data
      Code addresses
      Pointers to other activation records (not needed for IC)
  In IC a stack is sufficient!

Runtime stack
  Stack of activation records
  Call = push new activation record
  Return = pop activation record
  Only one active activation record: top of stack
  Naturally handles recursion

Runtime stack
  Stack grows downwards (towards smaller addresses)
  SP: stack pointer
    Top of current frame
  FP: frame pointer
    Base of current frame
    Sometimes called BP (base pointer)
  Diagram, from higher to lower addresses:
    Previous frame
    FP -> base of current frame
    Current frame
    SP -> top of current frame
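The "call = push, return = pop" rule from the Runtime stack slide can be tried out with a toy model. This is only a sketch under assumptions: `RuntimeStack`, the frame dictionary keys, and the `fact` example are hypothetical names chosen for illustration, and real frames hold raw words rather than dictionaries. It shows why a stack handles recursion naturally: each recursive call gets its own record, and only the top record is active.

```python
class RuntimeStack:
    """Toy runtime stack: a Python list whose last element is the top frame."""

    def __init__(self):
        self.frames = []
        self.max_depth = 0

    def call(self, function, **params):
        # Call = push a new activation record for the callee.
        frame = {"function": function.__name__, "params": params, "locals": {}}
        self.frames.append(frame)
        self.max_depth = max(self.max_depth, len(self.frames))
        try:
            return function(self, frame)
        finally:
            # Return = pop the callee's activation record.
            self.frames.pop()


def fact(stack, frame):
    """Recursive factorial that keeps its data in its own activation record."""
    n = frame["params"]["n"]
    if n <= 1:
        return 1
    frame["locals"]["rest"] = stack.call(fact, n=n - 1)
    return n * frame["locals"]["rest"]


stack = RuntimeStack()
print(stack.call(fact, n=5), stack.max_depth, len(stack.frames))
```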

x86 runtime stack support
  Pentium stack registers
    Register   Usage
    ESP        Stack pointer
    EBP        Base pointer
  x86 stack and call/ret instructions
    Instruction    Usage
    push, pusha    Push on runtime stack
    pop, popa      Pop from runtime stack
    call           Transfer control to called routine
    ret            Transfer control back to caller

Call sequences
  The processor does not save the content of registers on procedure calls
  So who will?
    Caller saves and restores registers
    Callee saves and restores registers
    But can also have both save/restore some registers
  Some conventions exist (e.g., cdecl)

Call sequences
  Caller push code
    Push caller-save registers
    Push actual parameters (in reverse order)
  call
    Push return address
    Jump to call address
  Callee push code (prologue)
    Push current base-pointer
    bp = sp
    Push local variables
    Push callee-save registers
  Callee pop code (epilogue)
    Pop callee-save registers
    Pop callee activation record
    Pop old base-pointer
  return
    Pop return address
    Jump to address
  Caller pop code
    Pop parameters
    Pop caller-save registers

Call sequences: foo(42,21)
  Caller push code
      push %ecx           Push caller-save registers
      push $21            Push actual parameters (in reverse order)
      push $42
  call
      call _foo           Push return address, jump to call address
  Callee push code (prologue)
      push %ebp           Push current base-pointer
      mov %esp, %ebp      bp = sp
      sub $8, %esp        Push local variables (callee variables)
      push %ebx           Push callee-save registers
  Callee pop code (epilogue)
      pop %ebx            Pop callee-save registers
      mov %ebp, %esp      Pop callee activation record
      pop %ebp            Pop old base-pointer
  return
      ret                 Pop return address, jump to address
  Caller pop code
      add $8, %esp        Pop parameters
      pop %ecx            Pop caller-save registers

Caller-save/callee-save convention
  Callee-saved registers need only be saved when callee modifies their value
  Some conventions exist (e.g., cdecl)
    %eax, %ecx, %edx: caller save
    %ebx, %esi, %edi: callee save
    %esp: stack pointer
    %ebp: frame pointer
    %eax: procedure return value

Accessing stack variables
  Use offset from EBP
  Remember stack grows downwards
  Above EBP = parameters
  Below EBP = locals
  Examples
    %ebp + 4 = return address
    %ebp + 8 = first parameter
    %ebp - 4 = first local
  Frame layout, from higher to lower addresses:
    Param n ... param 1       (param 1 at FP+8)
    Return address
    Previous fp               <- FP
    Local 1 ... Local n       (local 1 at FP-4)
    Param n ... param 1
    Return address            <- SP
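To check the offsets quoted in "Accessing stack variables" against the foo(42,21) call sequence, here is a hedged simulation of the stack. The `Machine` class, the starting addresses, the return address 0x400 and the initial register values are all made up for the sketch; only the order of pushes and pops follows the slides.

```python
class Machine:
    """Toy 32-bit machine: word-addressed memory dictionary plus a few registers."""

    def __init__(self, esp=0x1000, ebp=0x1000):
        self.mem = {}
        # Arbitrary initial values (assumptions) so that save/restore is observable.
        self.reg = {"esp": esp, "ebp": ebp, "ebx": 7, "ecx": 9}

    def push(self, value):
        self.reg["esp"] -= 4              # the stack grows towards smaller addresses
        self.mem[self.reg["esp"]] = value

    def pop(self):
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


def call_foo(m, return_address=0x400):
    """Replay the foo(42,21) call sequence and report what the callee sees."""
    r = m.reg
    m.push(r["ecx"])                      # caller: push caller-save register
    m.push(21)                            # caller: push parameters in reverse order
    m.push(42)
    m.push(return_address)                # call _foo: push return address
    m.push(r["ebp"])                      # prologue: push %ebp
    r["ebp"] = r["esp"]                   # prologue: mov %esp, %ebp
    r["esp"] -= 8                         # prologue: sub $8, %esp (room for locals)
    m.push(r["ebx"])                      # prologue: push callee-save register
    seen = {
        "old_ebp": m.mem[r["ebp"]],
        "return_address": m.mem[r["ebp"] + 4],
        "param1": m.mem[r["ebp"] + 8],
        "param2": m.mem[r["ebp"] + 12],
    }
    r["ebx"] = m.pop()                    # epilogue: pop %ebx
    r["esp"] = r["ebp"]                   # epilogue: mov %ebp, %esp
    r["ebp"] = m.pop()                    # epilogue: pop %ebp
    m.pop()                               # ret: pop return address
    r["esp"] += 8                         # caller: add $8, %esp (pop parameters)
    r["ecx"] = m.pop()                    # caller: pop %ecx
    return seen


machine = Machine()
print(call_foo(machine), machine.reg)
```

After the full sequence the registers are back at their starting values, which is the balance the caller and callee code must maintain between them.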

main calling method bar
    int bar(int x) {
      int y;
      ...
    }
    static void main(string[] args) {
      int z;
      Foo a = new Foo();
      z = a.bar(31);
      ...
    }
  Examples
    %ebp + 4 = return address
    %ebp + 8 = first parameter
      Always this in virtual function calls
    %ebp = old %ebp (pushed by callee)
    %ebp - 4 = first local
  Stack layout, from higher to lower addresses:
    (main's frame)
    EBP+12    31
    EBP+8     this (a)
    EBP+4     Return address
    EBP       Previous fp        (bar's frame)
    EBP-4     y                  <- ESP

main calling method bar (this made explicit)
    int bar(Foo this, int x) {        // implicit parameter
      int y;
      ...
    }
    static void main(string[] args) {
      int z;
      Foo a = new Foo();
      z = a.bar(a,31);                // implicit argument
      ...
    }
  Examples and stack layout are the same as on the previous slide: this (a) is at EBP+8 and 31 is at EBP+12.

Conceptual memory layout
  Stack variables
    Param 1 ... Param n
    Return address
    Previous FP
    Local 1 ... Local k
  Heap locations
  Global variables
    Global 1 ... Global m

Memory segments layout
  Code
  Static area: globals and static data (strings, dispatch tables)
    Data layout and addresses known in advance (compile time)
  Heap: dynamically allocated data (objects, arrays)
  Stack: activation records (locals, parameters etc.)

x86 assembly
  AT&T syntax and Intel syntax
  We'll be using AT&T syntax
    Works with GNU Assembler (GAS)
  Summary of differences
    Order of operands
      AT&T:  op a,b means b = b op a (second operand is destination)
      Intel: op a,b means a = a op b (first operand is destination)
    Memory addressing
      AT&T:  disp(base,index,scale)
      Intel: [base + index * scale + disp]
    Size of memory operands
      AT&T:  instruction suffixes (b,w,l) (e.g., movb, movw, movl)
      Intel: operand prefixes (e.g., byte ptr, word ptr, dword ptr)
    Registers
      AT&T:  %eax, %ebx, etc.
      Intel: eax, ebx, etc.
    Constants
      AT&T:  $4, $foo, etc.
      Intel: 4, foo, etc.

IA-32 registers
  Eight 32-bit general-purpose registers
    EAX, EBX, ECX, EDX, ESI, EDI
    EBP: stack frame (base) pointer
    ESP: stack pointer
  EFLAGS register: info on results of arithmetic operations
  EIP (instruction pointer) register
  Six 16-bit segment registers
  We ignore the rest

Register roles
  EAX
    Accumulator for operands and result data
    Required operand of MUL, IMUL, DIV and IDIV instructions
    Contains the result of these operations
    Used to store procedure return value
  EBX
    Pointer to data, often used as array-base address
  ECX
    Counter for string and loop operations
  EDX
    Stores remainder of a DIV or IDIV instruction (EAX stores quotient)
  ESI, EDI
    ESI: required source pointer for string instructions
    EDI: required destination pointer for string instructions
  Destination registers for arithmetic operations
    EAX, EBX, ECX, EDX
  EBP: stack frame (base) pointer
  ESP: stack pointer

Sample of x86 instructions
    Instruction   Operands                                Meaning
    add           reg,reg   imd,reg   mem,reg             Addition
    sub           reg,reg   imd,reg   mem,reg             Subtraction
    inc           reg                                     Increment value in register
    dec           reg                                     Decrement value in register
    neg           reg                                     Negate
    mul           reg,%eax   imd,%eax   mem,%eax          Unsigned multiplication
    imul          reg   reg,%eax   imd,%eax   mem,%eax    Signed multiplication, eax = eax * reg
    idiv          reg                                     Signed division of eax / reg, eax = quotient, edx = remainder

IA-32 addressing modes
  Machine instructions take zero or more operands (usually at most one memory operand is allowed)
  Source operand
    Immediate
    Register
    Memory location
    (I/O port)
  Destination operand
    Register
    Memory location
    (I/O port)
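The idiv entry in the instruction table ("eax = quotient, edx = remainder") can be sketched in Python. This is an illustrative model rather than the full instruction: it assumes the dividend fits in %eax with %edx holding its sign extension (as a preceding cdq would arrange), it ignores the overflow case, and the function name `idiv` is just a label for the sketch. It is worth spelling out because x86 signed division truncates toward zero, whereas Python's `//` rounds toward negative infinity.

```python
def idiv(eax, divisor):
    """Return (quotient, remainder) as idiv would leave them in (%eax, %edx).

    Assumes the dividend fits in %eax (with %edx as its sign extension)
    and that the quotient does not overflow 32 bits.
    """
    if divisor == 0:
        raise ZeroDivisionError("idiv raises a divide error on a zero divisor")
    # Truncate toward zero: divide the magnitudes, then fix up the sign.
    quotient = abs(eax) // abs(divisor)
    if (eax < 0) != (divisor < 0):
        quotient = -quotient
    # The remainder takes the sign of the dividend.
    remainder = eax - quotient * divisor
    return quotient, remainder


print(idiv(7, 2), idiv(-7, 2), idiv(7, -2))
```

A code generator that folds constants at compile time needs the same truncating behaviour, otherwise folded and unfolded expressions disagree on negative operands.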

Immediate and register operands
  Immediate
    Value specified in the instruction itself
    GAS syntax: immediate values preceded by $
    Example: add $4,%esp
  Register
    Register name is used
    GAS syntax: register names preceded by %
    Example: mov %esp,%ebp

Memory and base displacement operands
  Memory operands
    Obtain value at given address
    GAS syntax: parentheses
    Example: mov (%eax), %eax
  Base displacement
    Obtain value at computed address
    Address computed out of base register, index register, scale factor, displacement
      offset = base + (index * scale) + displacement
    Syntax: disp(base,index,scale)
    Example: movl $42,2(%eax)
    Example: movl $42,1(%eax,%ecx,4)
    (The l suffix gives the operand size, which an immediate-to-memory move does not otherwise determine.)

Representing strings and arrays
  [Figure: a 4-byte length word holding 13, followed by one-byte characters "Hello world\n"; the string reference points at the first character]
  Each string preceded by a word indicating the length of the string
  Array length stored in memory word preceding base address (i.e., the location at offset -4)
  Project-wise
    String literals allocated statically, concatenation using stringcat
    Use allocatearray to allocate arrays
      NOTICE: accepts number of bytes not including the length word
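The address formula from the base displacement slide, offset = base + (index * scale) + displacement, is easy to evaluate by hand, and a short sketch makes the disp(base,index,scale) operand order concrete. The function name and the register values used below are assumptions for illustration; the 32-bit wraparound and the restriction of scale to 1, 2, 4 or 8 reflect IA-32 addressing.

```python
def effective_address(disp=0, base=0, index=0, scale=1):
    """Address denoted by the AT&T operand disp(base,index,scale)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    # Addresses are 32-bit values, so the sum wraps around modulo 2**32.
    return (base + index * scale + disp) & 0xFFFFFFFF


# Assumed register contents for the two slide examples: %eax = 100, %ecx = 3.
print(effective_address(2, base=100))                     # 2(%eax)
print(effective_address(1, base=100, index=3, scale=4))   # 1(%eax,%ecx,4)
```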

Base displacement addressing
  [Figure: array of eight 4-byte cells (7 0 2 4 5 6 7 1); the array base reference is in %ecx and (%ecx,%ebx,4) selects one cell]
  mov (%ecx,%ebx,4), %eax
  offset = base + (index * scale) + displacement
    %ecx = base
    %ebx = 3
    offset = base + (3*4) + 0 = base + 12

Instruction examples
  Translate a=p+q into
      mov 16(%ebp),%ecx        (load)
      add 8(%ebp),%ecx         (arithmetic)
      mov %ecx,-8(%ebp)        (store)
  Translate a=12 into
      movl $12,-8(%ebp)
  Accessing strings:
      str: .string "Hello world!"
    access as constant: push $str
  Array access: a[i]=1
      mov -4(%ebp),%ebx        (load a)
      mov -8(%ebp),%ecx        (load i)
      movl $1,(%ebx,%ecx,4)    (store into the heap)
  Jumps:
    Unconditional: jmp label2
    Conditional:
      cmp $0,%ecx
      jnz L

Runtime check example
  Suppose we have an array access statement ...=v[i] or v[i]=...
  In LIR: MoveArray R1[R2],... or MoveArray ...,R1[R2]
  We have to check that 0 <= i < v.length
      cmpl $0,-12(%ebp)        (compare i to 0)
      jl ArrayBoundsError      (test lower bound)
      mov -8(%ebp),%ecx        (load v into %ecx)
      mov -4(%ecx),%ecx        (load array length into %ecx)
      cmp -12(%ebp),%ecx       (compare i to array length)
      jle ArrayBoundsError     (test upper bound)
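The runtime check above depends on the array length being stored in the word just before the array base. A minimal sketch of that layout and of the two comparisons follows; the byte-array "heap", the helper names, and the element values are assumptions for illustration, and the real allocation is done by the course's runtime library rather than by code like this.

```python
import struct


class ArrayBoundsError(Exception):
    """Stands in for the ArrayBoundsError handler the assembly jumps to."""


def alloc_array(heap, elements):
    """Append a 4-byte length word, then 4-byte elements; return the base address."""
    heap.extend(struct.pack("<i", len(elements)))
    base = len(heap)                      # base points just past the length word
    for element in elements:
        heap.extend(struct.pack("<i", element))
    return base


def load_word(heap, address):
    """Read one little-endian signed 32-bit word from the heap."""
    return struct.unpack_from("<i", heap, address)[0]


def load_element(heap, v, i):
    """Checked read of v[i], mirroring the cmp/jl and cmp/jle pairs."""
    if i < 0:                             # cmpl $0,i ; jl ArrayBoundsError
        raise ArrayBoundsError(i)
    length = load_word(heap, v - 4)       # mov -4(%ecx),%ecx
    if length <= i:                       # cmp i,%ecx ; jle ArrayBoundsError
        raise ArrayBoundsError(i)
    return load_word(heap, v + i * 4)     # (base,index,4) addressing


heap = bytearray()
v = alloc_array(heap, [10, 20, 30, 40])
print(load_element(heap, v, 3))
```

Note that the upper-bound test jumps when length <= i, which is why the assembly uses jle after comparing with the length in %ecx as the second operand.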

Hello world example
    class Library {
      void println(string s);
    }
    class Hello {
      static void main(string[] args) {
        Library.println("Hello world!");
      }
    }

Assembly file structure
    .title "hello.ic"                # header
    # global declarations
    .global ic_main                  # symbol exported to linker
    # data section: statically-allocated data (string literals and dispatch tables)
    .data
    .align 4
    .int 13                          # string length in bytes
    str1: .string "Hello world\n"
    # text (code) section: method bodies and error handlers
    .text
    #----------------------------------------------------
    .align 4
    ic_main:
    push %ebp                        # prologue
    mov %esp,%ebp
    push $str1                       # print(...)
    call print
    add $4, %esp
    mov $0,%eax                      # return 0
    mov %ebp,%esp                    # epilogue
    pop %ebp
    ret

Assembly file structure (annotated)
  Syntax notes
    Immediates have $ prefix
    Register names have % prefix
    Comments using #
    Labels end with the (standard) :
  Annotated code
    .title "hello.ic"
    # global declarations
    .global ic_main
    # data section
    .data
    .align 4
    .int 13
    str1: .string "Hello world\n"
    # text (code) section
    .text
    #----------------------------------------------------
    .align 4
    ic_main:
    push %ebp                        # prologue: save ebp and set it to be esp
    mov %esp,%ebp
    push $str1                       # print(...): push print parameter
    call print                       # call print
    add $4, %esp                     # pop parameter
    mov $0,%eax                      # return 0: store return value of main in eax
    mov %ebp,%esp                    # epilogue: restore esp and ebp (pop)
    pop %ebp
    ret

Calls/returns
  Direct function call syntax: call name
    Example: call println
  Return instruction: ret

Virtual functions
  Indirect call: call *(Reg)
    Example: call *(%eax)
  Used for virtual function calls
    Dispatch table lookup
    Passing/receiving the this variable
  More on this next time

See you next week
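The slides defer the details of virtual calls to the next session, so the following is only a rough sketch of the idea behind call *(Reg): look up the target in the object's dispatch table, then call it indirectly with this as the first argument. The object representation (a list whose element 0 is the dispatch table), the names `FOO_DV`, `SUB_DV` and `BAR_SLOT`, and the method bodies are all assumptions made for the example.

```python
def foo_bar(this, x):
    """Hypothetical body of Foo.bar: adds the object's field to x."""
    return this[1] + x


def sub_bar(this, x):
    """Hypothetical override of bar in a subclass: multiplies instead."""
    return this[1] * x


# Dispatch vectors: one list of code "addresses" per class, in slot order.
FOO_DV = [foo_bar]
SUB_DV = [sub_bar]
BAR_SLOT = 0          # assumed dispatch-table offset of bar


def new_object(dispatch_vector, *fields):
    """Assumed object layout: element 0 is the dispatch table, then the fields."""
    return [dispatch_vector, *fields]


def virtual_call(obj, slot, *args):
    """Dispatch-table lookup followed by an indirect call, passing this first."""
    table = obj[0]                # load the dispatch-table pointer from the object
    target = table[slot]          # load the code address at the method's offset
    return target(obj, *args)     # indirect call, like call *(%eax)


a = new_object(FOO_DV, 11)
print(virtual_call(a, BAR_SLOT, 31))
```

The same call site works for an instance of either class because the slot number is fixed at compile time while the table it indexes is chosen at run time.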