Princeton University Computer Science 217: Introduction to Programming Systems. Assembly Language: Function Calls

Similar documents
Assembly Language: Function Calls

Assembly Language: Function Calls

Assembly Language: Function Calls" Goals of this Lecture"

Assembly Language: Function Calls. Goals of this Lecture. Function Call Problems

Assembly Language: Function Calls" Goals of this Lecture"

Machine Program: Procedure. Zhaoguo Wang

Assembly III: Procedures. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CS429: Computer Organization and Architecture

Machine-level Programs Procedure

The Stack & Procedures

6.1. CS356 Unit 6. x86 Procedures Basic Stack Frames

CSC 8400: Computer Systems. Using the Stack for Function Calls

Machine-Level Programming III: Procedures

Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition. Carnegie Mellon

CSC 2400: Computer Systems. Using the Stack for Function Calls

University of Washington

%r8 %r8d. %r9 %r9d. %r10 %r10d. %r11 %r11d. %r12 %r12d. %r13 %r13d. %r14 %r14d %rbp. %r15 %r15d. Sean Barker

Assembly Programming IV

Mechanisms in Procedures. CS429: Computer Organization and Architecture. x86-64 Stack. x86-64 Stack Pushing

CSC 2400: Computing Systems. X86 Assembly: Function Calls"

Machine Programming 3: Procedures

The Stack & Procedures

CSC 2400: Computing Systems. X86 Assembly: Function Calls

Procedures and the Call Stack

Machine-Level Programming (2)

Stack Frame Components. Using the Stack (4) Stack Structure. Updated Stack Structure. Caller Frame Arguments 7+ Return Addr Old %rbp

Assembly Programming IV

x86-64 Programming III

C to Assembly SPEED LIMIT LECTURE Performance Engineering of Software Systems. I-Ting Angelina Lee. September 13, 2012

Function Calls COS 217. Reading: Chapter 4 of Programming From the Ground Up (available online from the course Web site)

Function Calls and Stack

Areas for growth: I love feedback

x86-64 Programming III & The Stack

Lecture 3 CIS 341: COMPILERS

Machine/Assembler Language Putting It All Together

Machine- Level Representa2on: Procedure

Credits and Disclaimers

Building an Executable

CSE P 501 Exam Sample Solution 12/1/11

CSC 2400: Computer Systems. Using the Stack for Function Calls

CSE351 Spring 2018, Midterm Exam April 27, 2018

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 5

CS 107. Lecture 13: Assembly Part III. Friday, November 10, Stack "bottom".. Earlier Frames. Frame for calling function P. Increasing address

Part 7. Stacks. Stack. Stack. Examples of Stacks. Stack Operation: Push. Piles of Data. The Stack

Data Representa/ons: IA32 + x86-64

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 4

Credits and Disclaimers

CSCI 2021: x86-64 Control Flow

Machine Language CS 3330 Samira Khan

143A: Principles of Operating Systems. Lecture 5: Calling conventions. Anton Burtsev January, 2017

143A: Principles of Operating Systems. Lecture 4: Calling conventions. Anton Burtsev October, 2017

Generation. representation to the machine

18-600: Recitation #4 Exploits

C++ Named Return Value Optimization

CS356: Discussion #15 Review for Final Exam. Marco Paolieri Illustrations from CS:APP3e textbook

void P() {... y = Q(x); print(y); return; } ... int Q(int t) { int v[10];... return v[t]; } Computer Systems: A Programmer s Perspective

CSE 401/M501 Compilers

Computer Organization: A Programmer's Perspective

Changelog. Assembly (part 1) logistics note: lab due times. last time: C hodgepodge

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 7

Corrections made in this version not seen in first lecture:

Computer Systems C S Cynthia Lee


Credits and Disclaimers

6.035 Project 3: Unoptimized Code Generation. Jason Ansel MIT - CSAIL

Test I Solutions MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall Department of Electrical Engineering and Computer Science

Machine-Level Programming II: Control

Machine-Level Programming II: Control

CSE 351 Midterm - Winter 2017

Buffer Overflow Attack (AskCypert CLaaS)

Lecture 4 CIS 341: COMPILERS

Compiling C Programs Into X86-64 Assembly Programs

Functions. Ray Seyfarth. August 4, Bit Intel Assembly Language c 2011 Ray Seyfarth

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 12

x86 Programming I CSE 351 Winter

Recitation: Attack Lab

Unoptimized Code Generation

238P: Operating Systems. Lecture 3: Calling conventions. Anton Burtsev October, 2018

CSC 252: Computer Organization Spring 2018: Lecture 6

EXAMINATIONS 2014 TRIMESTER 1 SWEN 430. Compiler Engineering. This examination will be marked out of 180 marks.

18-600: Recitation #4 Exploits (Attack Lab)

Intel x86-64 and Y86-64 Instruction Set Architecture

Where We Are. Optimizations. Assembly code. generation. Lexical, Syntax, and Semantic Analysis IR Generation. Low-level IR code.

Arrays. CSE 351 Autumn Instructor: Justin Hsia

L14: Structs and Alignment. Structs and Alignment. CSE 351 Spring Instructor: Ruth Anderson

The Compilation Process

How Software Executes

Princeton University Computer Science 217: Introduction to Programming Systems Exceptions and Processes

Assembly II: Control Flow. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Machine Level Programming: Control

Systems Programming and Computer Architecture ( )

CS356: Discussion #6 Assembly Procedures and Arrays. Marco Paolieri

Control flow (1) Condition codes Conditional and unconditional jumps Loops Conditional moves Switch statements

Structs and Alignment CSE 351 Spring

Assembly II: Control Flow

Computer Architecture I: Outline and Instruction Set Architecture. CENG331 - Computer Organization. Murat Manguoglu

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth

Compiling C Programs Into x86-64 Assembly Programs

1. A student is testing an implementation of a C function; when compiled with gcc, the following x86-64 assembly code is produced:

CS429: Computer Organization and Architecture

Transcription:

Princeton University Computer Science 217: Introduction to Programming Systems Assembly Language: Function Calls 1

Goals of this Lecture Help you learn: Function call problems x86-64 solutions Pertinent instructions and conventions 2

Function Call Problems (1) Calling and returning How does caller function jump to callee function? How does callee function jump back to the right place in caller function? (2) Passing arguments How does caller function pass arguments to callee function? (3) Storing local variables Where does callee function store its local variables? (4) Returning a value How does callee function send return value back to caller function? How does caller function access the return value? (5) Optimization How do caller and callee function minimize memory access? 3

Running Example long absadd(long a, long b) { long absa, absb, sum; absa = labs(a); absb = labs(b); sum = absa + absb; return sum; } Calls standard C labs() function Returns absolute value of given long 4

Agenda Calling and returning Passing arguments Storing local variables Returning a value Optimization 5

Problem 1: Calling and Returning How does caller jump to callee? i.e., Jump to the address of the callee s first instruction How does the callee jump back to the right place in caller? i.e., Jump to the instruction immediately following the most-recently-executed call instruction absadd(3l, -4L); 2 1 long absadd(long a, long b) { long absa, absb, sum; absa = labs(a); absb = labs(b); sum = absa + absb; return sum; } 6

Attempted Solution: jmp Instruction Attempted solution: caller and callee use jmp instruction f: jmp g freturnpoint: # Call g g: jmp freturnpoint # Return 7

Attempted Solution: jmp Instruction Problem: callee may be called by multiple callers f1: g: jmp g # Call g jmp??? # Return f1returnpoint: f2: jmp g # Call g f2returnpoint: 8

Attempted Solution: Use Register Attempted solution: Store return address in register f1: g: movq $f1returnpoint, %rax jmp g # Call g jmp *%rax # Return f1returnpoint: f2: movq $f2returnpoint, %rax jmp g # Call g f2returnpoint: Special form of jmp instruction 9

Attempted Solution: Use Register Problem: Cannot handle nested function calls f: movq $freturnpoint, %rax jmp g # Call g freturnpoint: Problem if f() calls g(), and g() calls h() Return address g() -> f() is lost g: movq $greturnpoint, %rax jmp h # Call h greturnpoint: h: jmp *%rax # Return jmp *%rax # Return 10

x86-64 Solution: Use the Stack Observations: May need to store many return addresses The number of nested function calls is not known in advance A return address must be saved for as long as the invocation of this function is live, and discarded thereafter Stored return addresses are destroyed in reverse order of creation f() calls g() return addr for g is stored g() calls h() return addr for h is stored h() returns to g() return addr for h is destroyed g() returns to f() return addr for g is destroyed LIFO data structure (stack) is appropriate RIP for h RIP for g RIP for f x86-64 solution: Use the STACK section of memory, usually accsesed via RSP Via call and ret instructions 11

call and ret Instructions ret instruction knows the return address f: call h call g 1 2 4 3 6 g: call h ret 5 h: ret 12

Stack operations RSP (stack pointer) register points to top of stack 0 Instruction pushq src popq dest Equivalent to subq $8, %rsp movq src, (%rsp) movq (%rsp), dest addq $8, %rsp RSP 13

Implementation of call RIP (instruction pointer) register points to next instruction to be executed 0 Instruction pushq src popq dest Equivalent to subq $8, %rsp movq src, (%rsp) movq (%rsp), dest addq $8, %rsp Note: Can t really access RIP directly, but this is implicitly what call does call addr pushq %rip jmp addr call instruction pushes return addr (old RIP) onto stack, then jumps RSP before call 14

Implementation of call RIP (instruction pointer) register points to next instruction to be executed 0 Instruction Equivalent to pushq src subq $8, %rsp movq src, (%rsp) popq dest call addr movq (%rsp), dest addq $8, %rsp pushq %rip jmp addr RSP after call Old RIP 15

Implementation of ret RIP (instruction pointer) register points to next instruction to be executed 0 Instruction pushq src Equivalent to subq $8, %rsp movq src, (%rsp) popq dest call addr ret movq (%rsp), dest addq $8, %rsp pushq %rip jmp addr popq %rip ret instruction pops stack, thus placing return addr (old RIP) into RIP RSP before ret Old RIP 16

Implementation of ret RIP (instruction pointer) register points to next instruction to be executed 0 Instruction pushq src popq dest call addr ret Equivalent to subq $8, %rsp movq src, (%rsp) movq (%rsp), dest addq $8, %rsp pushq %rip jmp addr popq %rip RSP after ret 17

Running Example # long absadd(long a, long b) absadd: # long absa, absb, sum # absa = labs(a) call labs # absb = labs(b) call labs # sum = absa + absb # return sum ret 18

Agenda Calling and returning Passing arguments Storing local variables Returning a value Optimization 19

Problem 2: Passing Arguments Problem: How does caller pass arguments to callee? How does callee accept parameters from caller? long absadd(long a, long b) { long absa, absb, sum; absa = labs(a); absb = labs(b); sum = absa + absb; return sum; } 20

x86-64 Solution 1: Use the Stack Observations (déjà vu): May need to store many arg sets The number of arg sets is not known in advance Arg set must be saved for as long as the invocation of this function is live, and discarded thereafter Stored arg sets are destroyed in reverse order of creation LIFO data structure (stack) is appropriate 21

x86-64 Solution 2: Use Registers x86-64 solution: Pass first 6 (integer or address) arguments in registers for efficiency RDI, RSI, RDX, RCX, R8, R9 More than 6 arguments Pass arguments 7, 8, on the stack (Beyond scope of COS 217) Arguments are structures Pass arguments on the stack (Beyond scope of COS 217) Callee function then saves arguments to stack Or maybe not! See optimization later this lecture Callee accesses arguments as positive offsets vs. RSP 22

Running Example # long absadd(long a, long b) absadd: pushq %rdi # Push a pushq %rsi # Push b 0 # long absa, absb, sum # absa = labs(a) movq 8(%rsp), %rdi call labs # absb = labs(b) movq 0(%rsp), %rdi call labs # sum = absa + absb # return sum addq $16, %rsp ret RSP RSP+8 Old RIP b a 23

Agenda Calling and returning Passing arguments Storing local variables Returning a value Optimization 24

Problem 3: Storing Local Variables Where does callee function store its local variables? long absadd(long a, long b) { long absa, absb, sum; absa = labs(a); absb = labs(b); sum = absa + absb; return sum; } 25

x86-64 Solution: Use the Stack Observations (déjà vu again!): May need to store many local var sets The number of local var sets is not known in advance Local var set must be saved for as long as the invocation of this function is live, and discarded thereafter Stored local var sets are destroyed in reverse order of creation LIFO data structure (stack) is appropriate x86-64 solution: Use the STACK section of memory Or maybe not! See later this lecture 26

Running Example # long absadd(long a, long b) absadd: pushq %rdi # Push a pushq %rsi # Push b 0 # long absa, absb, sum subq $24, %rsp # absa = labs(a) movq 32(%rsp), %rdi call labs # absb = labs(b) movq 24(%rsp), %rdi call labs # sum = absa + absb movq 16(%rsp), %rax addq 8(%rsp), %rax movq %rax, 0(%rsp) # return sum addq $40, %rsp ret RSP RSP+8 RSP+16 RSP+24 RSP+32 Old RIP sum absb absa b a 27

Agenda Calling and returning Passing arguments Storing local variables Returning a value Optimization 28

Problem 4: Return Values Problem: How does callee function send return value back to caller function? How does caller function access return value? long absadd(long a, long b) { long absa, absb, sum; absa = labs(a); absb = labs(b); sum = absa + absb; return sum; } 29

x86-64 Solution: Use RAX In principle Store return value in stack frame of caller Or, for efficiency Known small size store return value in register Other store return value in stack x86-64 convention Integer or address: Store return value in RAX Floating-point number: Store return value in floating-point register (Beyond scope of COS 217) Structure: Store return value on stack (Beyond scope of COS 217) 30

Running Example # long absadd(long a, long b) absadd: pushq %rdi # Push a pushq %rsi # Push b 0 # long absa, absb, sum subq $24, %rsp # absa = labs(a) movq 32(%rsp), %rdi call labs movq %rax, 16(%rsp) # absb = labs(b) movq 24(%rsp), %rdi call labs movq %rax, 8(%rsp) # sum = absa + absb movq 16(%rsp), %rax addq 8(%rsp), %rax movq %rax, 0(%rsp) RSP RSP+8 RSP+16 RSP+24 RSP+32 Old RIP sum absb absa b a # return sum movq 0(%rsp), %rax addq $40, %rsp ret Complete! 31

Agenda Calling and returning Passing arguments Storing local variables Returning a value Optimization 32

Problem 5: Optimization Observation: Accessing memory is expensive More expensive than accessing registers For efficiency, want to store parameters and local variables in registers (and not in memory) when possible Observation: Registers are a finite resource In principle: Each function should have its own registers In reality: All functions share same small set of registers Problem: How do caller and callee use same set of registers without interference? Callee may use register that the caller also is using When callee returns control to caller, old register contents may have been lost Caller function cannot continue where it left off 33

x86-64 Solution: Register Conventions Callee-save registers RBX, RBP, R12, R13, R14, R15 Callee function must preserve contents If necessary Callee saves to stack near beginning Callee restores from stack near end Caller-save registers RDI, RSI, RDX, RCX, R8, R9, RAX, R10, R11 Callee function can change contents If necessary Caller saves to stack before call Caller restores from stack after call 34

Running Example Local variable handling in unoptimized version: At beginning, absadd() allocates space for local variables (absa, absb, sum) on stack Body of absadd() uses stack At end, absadd() pops local variables from stack Local variable handling in optimized version: absadd() keeps local variables in R13, R14, R15 Body of absadd() uses R13, R14, R15 Must be careful: absadd()cannot change contents of R13, R14, or R15 So absadd() must save R13, R14, and R15 near beginning, and restore near end 35

Running Example # long absadd(long a, long b) absadd: pushq %r13 # Save R13, use for absa pushq %r14 # Save R14, use for absb pushq %r15 # Save R15, use for sum # absa = labs(a) pushq %rsi # Save RSI call labs movq %rax, %r13 popq %rsi # Restore RSI # absb += labs(b) movq %rsi, %rdi call labs movq %rax, %r14 # sum = absa + absb movq %r13, %r15 addq %r14, %r15 # return sum movq %r15, %rax popq %r15 # Restore R15 popq %r14 # Restore R14 popq %r13 # Restore R13 ret absadd() stores local vars in R13, R14, R15, not in memory absadd() cannot destroy contents of R13, R14, R15 So absadd() must save R13, R14, R15 near beginning and restore near end 36

Running Example Parameter handling in unoptimized version: absadd() accepts parameters (a and b) in RDI and RSI At beginning, absadd() copies contents of RDI and RSI to stack Body of absadd() uses stack At end, absadd() pops parameters from stack Parameter handling in optimized version: absadd() accepts parameters (a and b) in RDI and RSI Body of absadd() uses RDI and RSI Must be careful: Call of labs() could change contents of RDI and/or RSI absadd() must save contents of RDI and/or RSI before call of labs(), and restore contents after call 37

Running Example # long absadd(long a, long b) absadd: pushq %r13 # Save R13, use for absa pushq %r14 # Save R14, use for absb pushq %r15 # Save R15, use for sum # absa = labs(a) pushq %rsi # Save RSI call labs movq %rax, %r13 popq %rsi # Restore RSI # absb += labs(b) movq %rsi, %rdi call labs movq %rax, %r14 # sum = absa + absb movq %r13, %r15 addq %r14, %r15 # return sum movq %r15, %rax popq %r15 # Restore R15 popq %r14 # Restore R14 popq %r13 # Restore R13 ret absadd() keeps a and b in RDI and RSI, not in memory labs() can change RDI and/or RSI absadd() must retain contents of RSI (value of b) across 1 st call of labs() So absadd() must save RSI before call and restore RSI after call 38

Non-Optimized vs. Optimized Patterns Unoptimized pattern Parameters and local variables strictly in memory (stack) during function execution Pro: Always possible Con: Inefficient gcc compiler uses when invoked without O option Optimized pattern Parameters and local variables mostly in registers during function execution Pro: Efficient Con: Sometimes impossible More than 6 local variables Local variable is a structure or array Function computes address of parameter or local variable gcc compiler uses when invoked with O option, when it can! 39

Hybrid Patterns Hybrids are possible Example Parameters in registers Local variables in memory (stack) Hybrids are error prone for humans Example (continued from previous) Step 1: Access local variable local var is at stack offset X Step 2: Push caller-save register Step 3: Access local variable local var is at stack offset X+8!!! Step 4: Call labs() Step 6: Access local variable local var is at stack offset X+8!!! Step 7: Pop caller-save register Step 8: Access local variable local var is at stack offset X Avoid hybrids for Assignment 4 40

Summary Function calls in x86-64 assembly language Calling and returning call instruction pushes RIP onto stack and jumps ret instruction pops from stack to RIP Passing arguments Caller copies args to caller-saved registers (in prescribed order) Unoptimized pattern: Callee pushes args to stack Callee uses args as positive offsets from RSP Callee pops args from stack Optimized pattern: Callee keeps args in caller-saved registers Be careful! 41

Summary (cont.) Storing local variables Unoptimized pattern: Callee pushes local vars onto stack Callee uses local vars as positive offsets from RSP Callee pops local vars from stack Optimized pattern: Callee keeps local vars in callee-saved registers Be careful! Returning values Callee places return value in RAX Caller accesses return value in RAX 42

Putting it all together Add up the keys of a tree struct tree { int key; struct tree *left; struct tree *right; }; int sum (struct tree *t) { if (t==null) return 0; else return t->key + sum(t->left) + sum(t->right); } t 3 4 5 3 //// 5 4 0 0 0 0 //// ////.text.globl sum sum: # LOCAL VARIABLES: # %r12=t, %r13d=partial sum pushq %r12 pushq %r13 movq %rdi, %r12 cmpq $0, %r12 jne.l2 movl $0, %eax jmp.l3.l2: movl 0(%r12), %r13d movq 8(%r12), %rdi call sum addl %eax, %r13d movq 16(%r12), %rdi call sum addl %eax, %r13d movl %r13d, %eax.l3: popq %r13 popq %r12 ret 43