Assembly Language: Function Calls

Similar documents
Princeton University Computer Science 217: Introduction to Programming Systems. Assembly Language: Function Calls

Assembly Language: Function Calls

Assembly Language: Function Calls" Goals of this Lecture"

Assembly Language: Function Calls. Goals of this Lecture. Function Call Problems

Assembly Language: Function Calls" Goals of this Lecture"

Machine Program: Procedure. Zhaoguo Wang

CS429: Computer Organization and Architecture

The Stack & Procedures

Assembly III: Procedures. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CSC 8400: Computer Systems. Using the Stack for Function Calls

Machine-level Programs Procedure

Mechanisms in Procedures. CS429: Computer Organization and Architecture. x86-64 Stack. x86-64 Stack Pushing

CSC 2400: Computer Systems. Using the Stack for Function Calls

Machine-Level Programming III: Procedures

Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition. Carnegie Mellon

CSC 2400: Computing Systems. X86 Assembly: Function Calls"

The Stack & Procedures

6.1. CS356 Unit 6. x86 Procedures Basic Stack Frames

CSC 2400: Computing Systems. X86 Assembly: Function Calls

%r8 %r8d. %r9 %r9d. %r10 %r10d. %r11 %r11d. %r12 %r12d. %r13 %r13d. %r14 %r14d %rbp. %r15 %r15d. Sean Barker

Machine-Level Programming (2)

University of Washington

Assembly Programming IV

Procedures and the Call Stack

Part 7. Stacks. Stack. Stack. Examples of Stacks. Stack Operation: Push. Piles of Data. The Stack

Computer Organization: A Programmer's Perspective

Machine Programming 3: Procedures

x86-64 Programming III

Function Calls and Stack

Function Calls COS 217. Reading: Chapter 4 of Programming From the Ground Up (available online from the course Web site)

Assembly Programming IV

Machine- Level Representa2on: Procedure

CSE P 501 Exam Sample Solution 12/1/11

CSE351 Spring 2018, Midterm Exam April 27, 2018

143A: Principles of Operating Systems. Lecture 5: Calling conventions. Anton Burtsev January, 2017

Lecture 3 CIS 341: COMPILERS

CSE 401/M501 Compilers

Recitation: Attack Lab

Lecture 4 CIS 341: COMPILERS

143A: Principles of Operating Systems. Lecture 4: Calling conventions. Anton Burtsev October, 2017

C to Assembly SPEED LIMIT LECTURE Performance Engineering of Software Systems. I-Ting Angelina Lee. September 13, 2012

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 5

CSC 2400: Computer Systems. Using the Stack for Function Calls

18-600: Recitation #4 Exploits

Areas for growth: I love feedback

Stack Frame Components. Using the Stack (4) Stack Structure. Updated Stack Structure. Caller Frame Arguments 7+ Return Addr Old %rbp


CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 4

Buffer Overflow Attack (AskCypert CLaaS)

CSCI 2021: x86-64 Control Flow

Test I Solutions MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall Department of Electrical Engineering and Computer Science

x86-64 Programming III & The Stack

CS356: Discussion #6 Assembly Procedures and Arrays. Marco Paolieri

238P: Operating Systems. Lecture 3: Calling conventions. Anton Burtsev October, 2018

Functions. Ray Seyfarth. August 4, Bit Intel Assembly Language c 2011 Ray Seyfarth

Computer Systems C S Cynthia Lee

Machine Language CS 3330 Samira Khan

Intel x86-64 and Y86-64 Instruction Set Architecture

L14: Structs and Alignment. Structs and Alignment. CSE 351 Spring Instructor: Ruth Anderson

Building an Executable

Announcements. Class 7: Intro to SRC Simulator Procedure Calls HLL -> Assembly. Agenda. SRC Procedure Calls. SRC Memory Layout. High Level Program

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth

6.035 Project 3: Unoptimized Code Generation. Jason Ansel MIT - CSAIL

18-600: Recitation #4 Exploits (Attack Lab)

Computer Architecture I: Outline and Instruction Set Architecture. CENG331 - Computer Organization. Murat Manguoglu

Unoptimized Code Generation

CS 107. Lecture 13: Assembly Part III. Friday, November 10, Stack "bottom".. Earlier Frames. Frame for calling function P. Increasing address

18-600: Recitation #3

Machine/Assembler Language Putting It All Together

CSC 252: Computer Organization Spring 2018: Lecture 6

Data Representa/ons: IA32 + x86-64

Credits and Disclaimers

x86 Programming I CSE 351 Winter

Structs and Alignment CSE 351 Spring

Generation. representation to the machine

Where We Are. Optimizations. Assembly code. generation. Lexical, Syntax, and Semantic Analysis IR Generation. Low-level IR code.

void P() {... y = Q(x); print(y); return; } ... int Q(int t) { int v[10];... return v[t]; } Computer Systems: A Programmer s Perspective

Credits and Disclaimers

C++ Named Return Value Optimization

Princeton University Computer Science 217: Introduction to Programming Systems Exceptions and Processes

x64 Cheat Sheet Fall 2014

Structs & Alignment. CSE 351 Autumn Instructor: Justin Hsia

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 7

Introduction to Compilers and Language Design Copyright (C) 2017 Douglas Thain. All rights reserved.

cmovxx ra, rb 2 fn ra rb irmovq V, rb 3 0 F rb V rmmovq ra, D(rB) 4 0 ra rb mrmovq D(rB), ra 5 0 ra rb OPq ra, rb 6 fn ra rb jxx Dest 7 fn Dest

Compiling C Programs Into x86-64 Assembly Programs

CSE 401 Compilers. x86-64 Lite for Compiler Writers A quick (a) introduccon or (b) review. Hal Perkins Winter UW CSE 401 Winter 2017 J-1

CS 3330 Exam 1 Fall 2017 Computing ID:

CSC 252: Computer Organization Spring 2018: Lecture 11

CS356: Discussion #15 Review for Final Exam. Marco Paolieri Illustrations from CS:APP3e textbook

Changelog. Assembly (part 1) logistics note: lab due times. last time: C hodgepodge

Processor state. Information about currently executing program. %rax, %rdi, ... %r14 %r15. %rsp. %rip CF ZF SF OF. Temporary data

Machine-Level Programming II: Control

Corrections made in this version not seen in first lecture:

Machine-Level Programming II: Control

THE UNIVERSITY OF BRITISH COLUMBIA CPSC 261: MIDTERM 1 February 14, 2017

CSE 351 Midterm - Winter 2017

CS367. Program Control

Assembly II: Control Flow

CA Compiler Construction

Transcription:

Assembly Language: Function Calls 1

Goals of this Lecture Help you learn: Function call problems x86-64 solutions Pertinent instructions and conventions 2

Function Call Problems (1) Calling and returning How does caller function jump to callee function? How does callee function jump back to the right place in caller function? (2) Passing arguments How does caller function pass arguments to callee function? (3) Storing local variables Where does callee function store its local variables? (5) Returning a value How does callee function send return value back to caller function? How does caller function access the return value? (6) Optimization How do caller and callee function minimize memory access? 3

Running Example long absadd(long a, long b) { long absa, absb, sum; absa = labs(a); absb = labs(b); sum = absa + absb; return sum; } Calls standard C labs() function Returns absolute value of given long 4

Agenda Calling and returning Passing arguments Storing local variables Returning a value Optimization 5

Problem 1: Calling and Returning How does caller jump to callee? I.e., Jump to the address of the callee s first instruction How does the callee jump back to the right place in caller? I.e., Jump to the instruction immediately following the most-recently-executed call instruction absadd(3l, -4L); 2 1 long absadd(long a, long b) { long absa, absb, sum; absa = labs(a); absb = labs(b); sum = absa + absb; return sum; } 6

Attempted Solution: jmp Instruction Attempted solution: caller and callee use jmp instruction f: g: jmp g # Call g jmp freturnpoint # Return freturnpoint: 7

Attempted Solution: jmp Instruction Problem: callee may be called by multiple callers f1: g: jmp g # Call g jmp??? # Return f1returnpoint: f2: jmp g # Call g f2returnpoint: 8

Attempted Solution: Use Register Attempted solution: Store return address in register f1: g: movq $f1returnpoint, %rax jmp g # Call g jmp *%rax # Return f1returnpoint: f2: movq $f2returnpoint, %rax jmp g # Call g f2returnpoint: Special form of jmp instruction 9

Attempted Solution: Use Register Problem: Cannot handle nested function calls f: movq $freturnpoint, %rax jmp g # Call g freturnpoint: Problem if f() calls g(), and g() calls h() Return address g() -> f() is lost g: movq $greturnpoint, %rax jmp h # Call h greturnpoint: h: jmp *%rax # Return jmp *%rax # Return 10

x86-64 Solution: Use the Stack Observations: May need to store many return addresses The number of nested function calls is not known in advance A return address must be saved for as long as the invocation of this function is live, and discarded thereafter Stored return addresses are destroyed in reverse order of creation f() calls g() => return addr for g is stored RIP for h g() calls h() => return addr for h is stored RIP for g h() returns to g() => return addr for h is destroyed RIP for f g() returns to f() => return addr for g is destroyed LIFO data structure (stack) is appropriate x86-64 solution: Use the STACK section of memory Via call and ret instructions 11

call and ret Instructions ret instruction knows the return address f: call h 1 2 h: ret call g 3 6 g: call h 4 5 ret 12

Implementation of call RSP (stack pointer) register points to top of stack 0 Instruction pushq src popq dest Effective Operations subq $8, %rsp movq src, (%rsp) movq (%rsp), dest addq $8, %rsp RSP 13

Implementation of call RIP (instruction pointer) register points to next instruction to be executed 0 Instruction pushq src popq dest Effective Operations subq $8, %rsp movq src, (%rsp) movq (%rsp), dest addq $8, %rsp Note: Can t really access RIP directly, but this is implicitly what call is doing call addr pushq %rip jmp addr call instruction pushes return addr (old RIP) onto stack, then jumps RSP before call 14

Implementation of call 0 Instruction Effective Operations pushq src subq $8, %rsp movq src, (%rsp) popq dest movq (%rsp), dest addq $8, %esp call addr pushq %rip jmp addr RSP after call Old RIP 15

Implementation of ret 0 Instruction pushq src popq dest Effective Operations subq $8, %rsp movq src, (%rsp) movq (%rsp), dest addq $8, %rsp Note: can t really access RIP directly, but this is implicitly what ret is doing call addr ret pushq %rip jmp addr popq %rip RSP before ret Old RIP ret instruction pops stack, thus placing return addr (old RIP) into RIP 16

Implementation of ret 0 Instruction pushq src popq dest Effective Operations subq $8, %rsp movq src, (%rsp) movq (%rsp), dest addq $8, %rsp call addr pushq %rip jmp addr ret popq %rip RSP after ret 17

Running Example # long absadd(long a, long b) absadd: # long absa, absb, sum # absa = labs(a) call labs # absb = labs(b) call labs # sum = absa + absb # return sum ret 18

Agenda Calling and returning Passing arguments Storing local variables Returning a value Optimization 19

Problem 2: Passing Arguments Problem: How does caller pass arguments to callee? How does callee accept parameters from caller? long absadd(long a, long b) { long absa, absb, sum; absa = labs(a); absb = labs(b); sum = absa + absb; return sum; } 20

X86-64 Solution 1: Use the Stack Observations (déjà vu): May need to store many arg sets The number of arg sets is not known in advance Arg set must be saved for as long as the invocation of this function is live, and discarded thereafter Stored arg sets are destroyed in reverse order of creation LIFO data structure (stack) is appropriate 21

x86-64 Solution: Use the Stack x86-64 solution: Pass first 6 (integer or address) arguments in registers RDI, RSI, RDX, RCX, R8, R9 More than 6 arguments => Pass arguments 7, 8, on the stack (Beyond scope of COS 217) Arguments are structures => Pass arguments on the stack (Beyond scope of COS 217) Callee function then saves arguments to stack Or maybe not! See optimization later this lecture Callee accesses arguments as positive offsets vs. RSP 22

Running Example # long absadd(long a, long b) absadd: pushq %rdi # Push a pushq %rsi # Push b 0 # long absa, absb, sum # absa = labs(a) movq 8(%rsp), %rdi call labs # absb = labs(b) movq 0(%rsp), %rdi call labs # sum = absa + absb # return sum addq $16, %rsp ret RSP RSP+8 Old RIP b a 23

Agenda Calling and returning Passing arguments Storing local variables Returning a value Optimization 24

Problem 3: Storing Local Variables Where does callee function store its local variables? long absadd(long a, long b) { long absa, absb, sum; absa = labs(a); absb = labs(b); sum = absa + absb; return sum; } 25

x86-64 Solution: Use the Stack Observations (déjà vu again!): May need to store many local var sets The number of local var sets is not known in advance Local var set must be saved for as long as the invocation of this function is live, and discarded thereafter Stored local var sets are destroyed in reverse order of creation LIFO data structure (stack) is appropriate x86-64 solution: Use the STACK section of memory Or maybe not! See later this lecture 26

Running Example # long absadd(long a, long b) absadd: pushq %rdi # Push a pushq %rsi # Push b 0 # long absa, absb, sum subq $24, %rsp # absa = labs(a) movq 32(%rsp), %rdi call labs # absb = labs(b) movq 24(%rsp), %rdi call labs # sum = absa + absb movq 16(%rsp), %rax addq 8(%rsp), %rax movq %rax, 0(%rsp) # return sum addq $40, %rsp ret RSP RSP+8 RSP+16 RSP+24 RSP+32 Old RIP sum abs B abs A b a 27

Agenda Calling and returning Passing arguments Storing local variables Returning a value Optimization 28

Problem 4: Return Values Problem: How does callee function send return value back to caller function? How does caller function access return value? long absadd(long a, long b) { long absa, absb, sum; absa = labs(a); absb = labs(b); sum = absa + absb; return sum; } 29

x86-64 Solution: Use RAX In principle Store return value in stack frame of caller Or, for efficiency Known small size => store return value in register Other => store return value in stack x86-64 convention Integer or address: Store return value in RAX Floating-point number: Store return value in floating-point register (Beyond scope of COS 217) Structure: Store return value on stack (Beyond scope of COS 217) 30

Running Example # long absadd(long a, long b) absadd: pushq %rdi # Push a pushq %rsi # Push b 0 # long absa, absb, sum subq $24, %rsp # absa = labs(a) movq 32(%rsp), %rdi call labs movq %rax, 16(%rsp) # absb = labs(b) movq 24(%rsp), %rdi call labs movq %rax, 8(%rsp) # sum = absa + absb movq 16(%rsp), %rax addq 8(%rsp), %rax movq %rax, 0(%rsp) RSP RSP+8 RSP+16 RSP+24 RSP+32 Old RIP sum abs B abs A b a # return sum movq 0(%rsp), %rax addq $40, %rsp ret Complete! 31

Agenda Calling and returning Passing arguments Storing local variables Returning a value Optimization 32

Problem 5: Optimization Observation: Accessing memory is expensive More expensive than accessing registers For efficiency, want to store parameters and local variables in registers (and not in memory) when possible Observation: Registers are a finite resource In principle: Each function should have its own registers In reality: All functions share same small set of registers Problem: How do caller and callee use same set of registers without interference? Callee may use register that the caller also is using When callee returns control to caller, old register contents may have been lost Caller function cannot continue where it left off 33

x86-64 Solution: Register Conventions Callee-save registers RBX, RBP, R12, R13, R14, R15 Callee function cannot change contents If necessary Callee saves to stack near beginning Callee restores from stack near end Caller-save registers RDI, RSI, RDX, RCX, R8, R9, RAX, R10, R11 Callee function can change contents If necessary Caller saves to stack before call Caller restores from stack after call 34

Running Example Local variable handling in non-optimized version: At beginning, absadd() allocates space for local variables (absa, absb, sum) in stack Body of absadd() uses stack At end, absadd() pops local variables from stack Local variable handling in optimized version: absadd() keeps local variables in R13, R14, R15 Body of absadd() uses R13, R14, R15 Must be careful: absadd()cannot change contents of R13, R14, or R15 So absadd() must save R13, R14, and R15 near beginning, and restore near end 35

Running Example # long absadd(long a, long b) absadd: pushq %r13 # Save R13, use for absa pushq %r14 # Save R14, use for absb pushq %r15 # Save R15, use for sum # absa = labs(a) pushq %rsi # Save RSI call labs movq %rax, %r13 popq %rsi # Restore RSI # absb += labs(b) movq %rsi, %rdi call labs movq %rax, %r14 # sum = absa + absb movq %r13, %r15 addq %r14, %r15 # return sum movq %r15, %rax popq %r15 # Restore R15 popq %r14 # Restore R14 popq %r13 # Restore R13 ret absadd() stores local vars in R13, R14, R15, not in memory absadd() cannot change contents of R13, R14, R15 So absadd() must save R13, R14, R15 near beginning and restore near end 36

Running Example Parameter handling in non-optimized version: absadd() accepts parameters (a and b) in RDI and RSI At beginning, absadd() copies contents of RDI and RSI to stack Body of absadd() uses stack At end, absadd() pops parameters from stack Parameter handling in optimized version: absadd() accepts parameters (a and b) in RDI and RSI Body of absadd() uses RDI and RSI Must be careful: Call of labs() could change contents of RDI and/or RSI absadd() must save contents of RDI and/or RSI before call of labs(), and restore contents after call 37

Running Example # long absadd(long a, long b) absadd: pushq %r13 # Save R13, use for absa pushq %r14 # Save R14, use for absb pushq %r15 # Save R15, use for sum # absa = labs(a) pushq %rsi # Save RSI call labs movq %rax, %r13 popq %rsi # Restore RSI # absb += labs(b) movq %rsi, %rdi call labs movq %rax, %r14 # sum = absa + absb movq %r13, %r15 addq %r14, %r15 # return sum movq %r15, %rax popq %r15 # Restore R15 popq %r14 # Restore R14 popq %r13 # Restore R13 ret absadd() keeps a and b in RDI and RSI, not in memory labs() can change RDI and/or RSI absadd() must retain contents of RSI (value of b) across 1 st call of labs() So absadd() must save RSI before call and restore RSI after call 38

Non-Optimized vs. Optimized Patterns Non-optimized pattern Parameters and local variables strictly in memory (stack) during function execution Pro: Always possible Con: Inefficient gcc compiler uses when invoked without O option Optimized pattern Parameters and local variables strictly in registers during function execution Pro: Efficient Con: Sometimes impossible More than 6 local variables Local variable is a structure or array Function computes address of parameter or local variable gcc compiler uses when invoked with O option, when it can! 39

Hybrid Patterns Hybrids are possible Example Parameters in registers Local variables in memory (stack) Hybrids are error prone for humans Example (continued from previous) Step 1: Access local variable ß local var is at stack offset X Step 2: Push caller-save register Step 3: Access local variable ß local var is at stack offset X+8!!! Step 4: Call labs() Step 6: Access local variable ß local var is at stack offset X+8!!! Step 7: Pop caller-save register Step 8: Access local variable ß local var is at stack offset X Avoid hybrids for Assignment 4 40

Summary Function calls in x86-64 assembly language Calling and returning call instruction pushes RIP onto stack and jumps ret instruction pops from stack to RIP Passing arguments Caller copies args to caller-saved registers (in prescribed order) Non-optimized pattern: Callee pushes args to stack Callee uses args as positive offsets from RSP Callee pops args from stack Optimized pattern: Callee keeps args in caller-saved registers Be careful! 41

Summary (cont.) Storing local variables Non-optimized pattern: Callee pushes local vars onto stack Callee uses local vars as positive offsets from RSP Callee pops local vars from stack Optimized pattern: Callee keeps local vars in callee-saved registers Be careful! Returning values Callee places return value in RAX Caller accesses return value in RAX 42