Machine/Assembler Language Putting It All Together


COMP 40: Machine Structure and Assembly Language Programming, Fall 2015
Machine/Assembler Language: Putting It All Together
Noah Mendelsohn, Tufts University
Email: noah@cs.tufts.edu
Web: http://www.cs.tufts.edu/~noah

Goals for today
Explore the compilation of realistic programs from C to ASM.
Fill in a few more details on things like structure mapping.

Review

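x86-64 / AMD64 / Intel 64 General Purpose Registers

    64-bit   32-bit   16-bit   8-bit high   8-bit low      64-bit
    %rax     %eax     %ax      %ah          %al            %r8
    %rcx     %ecx     %cx      %ch          %cl            %r9
    %rdx     %edx     %dx      %dh          %dl            %r10
    %rbx     %ebx     %bx      %bh          %bl            %r11
    %rsi     %esi     %si                                  %r12
    %rdi     %edi     %di                                  %r13
    %rsp     %esp     %sp                                  %r14
    %rbp     %ebp     %bp                                  %r15

The 64-bit registers span bits 63 to 0; the 32-bit names refer to bits 31 to 0 of the same register.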

Machine code (typical)
Simple instructions: each does a small unit of work.
Stored in memory.
Bit-packed into compact binary form.
Directly executed by transistor/hardware logic.*
* We'll show later that some machines execute user-provided machine code directly; some convert it to an even lower-level machine code and execute that.

A simple function in C

    int times16(int i) {
        return i * 16;
    }

Compiled code (offset, machine code bytes, assembler):

    0:  89 f8       mov  %edi,%eax
    2:  c1 e0 04    shl  $0x4,%eax
    5:  c3          retq

REMEMBER: you can see the assembler code for any C program by running gcc with the -S flag. Do it!!
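
The shift in the listing works because multiplying by 16 is the same as shifting left by four bits. A minimal sketch of that equivalence follows; the names times16_shift and check_times16 are made up for illustration, and the check assumes small non-negative inputs so that nothing overflows.

```c
#include <assert.h>

/* The function from the slide: the compiler may pick any instruction
   sequence that produces i * 16. */
int times16(int i) {
    return i * 16;
}

/* Hypothetical hand-written counterpart of "shl $0x4,%eax".
   The shift is done on an unsigned value, where it is always defined. */
int times16_shift(int i) {
    return (int)((unsigned int)i << 4);
}

/* Small self-check: both forms agree on a few sample values. */
void check_times16(void) {
    assert(times16(0) == times16_shift(0));
    assert(times16(3) == 48);
    assert(times16_shift(3) == 48);
    assert(times16(1000) == times16_shift(1000));
}
```

Calling check_times16() from any main function exercises both versions.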

Addressing modes

    %rax                    // contents of rax is data
    (%rax)                  // data pointed to by rax
    $123,%rax               // Copy 123 into %rax
    0x10(%rax)              // get *(16 + rax)
    0x4089a0(, %rax, 8)     // Global array index of 8-byte things
    (%ebx, %ecx, 2)         // Add base and scaled index
    4(%ebx, %ecx, 2)        // Add base and scaled index plus offset 4

Note that a displacement in a memory operand is written without a leading $; the $ prefix marks an immediate value such as $123.

movl and leal

    movl (%ebx, %ecx, 1), %edx    // edx <- *(ebx + (ecx * 1))
    leal (%ebx, %ecx, 1), %edx    // edx <- (ebx + (ecx * 1))

movl and leal: movl
movl moves the data at the address:

    movl (%ebx, %ecx, 1), %edx    // edx <- *(ebx + (ecx * 1))

similar to:

    char *ebxp;
    char edx = ebxp[ecx];

movl and leal: leal
leal moves the address itself:

    leal (%ebx, %ecx, 1), %edx    // edx <- (ebx + (ecx * 1))

similar to:

    char *ebxp;
    char *edxp = &(ebxp[ecx]);

movl and leal: scale factors
Scale factors support indexing larger types:

    movl (%ebx, %ecx, 4), %edx    // edx <- *(ebx + (ecx * 4))
    leal (%ebx, %ecx, 4), %edx    // edx <- (ebx + (ecx * 4))

similar to:

    int *ebxp;
    int edx = ebxp[ecx];
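
The difference between the two instructions can be sketched in C as the difference between reading an array element and taking its address. The function names below are hypothetical, and the scale factor of 4 in the slide corresponds to sizeof(int) being 4, which is an assumption about the platform.

```c
#include <assert.h>
#include <stddef.h>

/* Roughly what "movl (%ebx,%ecx,4), %edx" does: compute
   base + index * 4 and load the value stored there. */
int load_element(const int *base, size_t index) {
    return base[index];
}

/* Roughly what "leal (%ebx,%ecx,4), %edx" does: compute the same
   address but never touch memory. */
const int *element_address(const int *base, size_t index) {
    return &base[index];
}

/* Self-check on a small sample array. */
void check_mov_vs_lea(void) {
    int table[4] = {10, 20, 30, 40};
    assert(load_element(table, 2) == 30);
    assert(element_address(table, 2) == table + 2);
    /* The address is base plus index * sizeof(int) bytes. */
    assert((const char *)element_address(table, 2)
           == (const char *)table + 2 * sizeof(int));
}
```

Because lea only does arithmetic, compilers also use it for plain sums and small multiplications that have nothing to do with addresses.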

Control flow: simple jumps

    .L4:
        code here
        jmp .L4        // jump back to .L4

Conditional jumps

    .L4:
        movq  (%rdi,%rdx), %rcx
        leaq  (%rax,%rcx), %rsi
        testq %rcx, %rcx
        cmovg %rsi, %rax
        addq  $8, %rdx
        cmpq  %r8, %rdx
        jne   .L4          // conditional: jump iff %r8 != %rdx

This technique is the key to compiling if statements, for loops, while loops, etc.
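
To see how a loop reduces to a label plus a conditional jump, here is a sketch that writes the same loop twice in C, once normally and once with goto in the shape a compiler tends to emit. The functions sum_to and sum_to_goto are invented for this illustration and are not taken from the slides.

```c
#include <assert.h>

/* An ordinary counted loop. */
long sum_to(long n) {
    long total = 0;
    for (long i = 0; i < n; i++)
        total += i;
    return total;
}

/* The same loop laid out with one label, one compare,
   and one conditional jump back to the label. */
long sum_to_goto(long n) {
    long total = 0;
    long i = 0;
    if (i >= n)        /* guard: skip the loop entirely */
        goto done;
loop:                  /* plays the role of .L4 */
    total += i;
    i++;
    if (i != n)        /* like cmpq followed by jne .L4 */
        goto loop;
done:
    return total;
}

/* Self-check: both versions agree, including on empty ranges. */
void check_sum_to(void) {
    assert(sum_to(5) == 10);
    assert(sum_to_goto(5) == 10);
    assert(sum_to_goto(0) == 0);
    assert(sum_to_goto(-3) == sum_to(-3));
}
```

The guard before the label is what lets the loop body test at the bottom instead of the top.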

Challenge: how would this compile?

    switch (exp) {
        case 0: SS0;
        case 1: SS1;
        case 2: SS2;
        case 3: SS3;
        default: SSd;
    }

One answer, written as C-like pseudocode with a jump table:

    static code *jump_table[4] = { L0, L1, L2, L3 };

    t = exp;
    if ((unsigned)t < 4) {
        t = jump_table[t];
        goto t;
    } else
        goto Ld;

    L0: SS0;
    L1: SS1;
    L2: SS2;
    L3: SS3;
    Ld: SSd;
    Lend: ...

And break becomes goto Lend.
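
Standard C cannot jump to a computed label, but a table of function pointers gives a runnable approximation of the same idea. This is only a sketch: the case bodies and their return values are invented stand-ins for SS0 through SSd, and each case behaves as if it ended in break, so fall-through is not modeled.

```c
#include <assert.h>

/* Hypothetical case bodies standing in for SS0..SS3 and SSd. */
static int case0(void) { return 100; }
static int case1(void) { return 101; }
static int case2(void) { return 102; }
static int case3(void) { return 103; }
static int case_default(void) { return -1; }

/* Plays the role of jump_table: one entry per case value. */
static int (*const dispatch_table[4])(void) = { case0, case1, case2, case3 };

int dispatch(int exp) {
    /* One unsigned comparison rejects both negative and too-large
       values, like the (unsigned)t < 4 test in the pseudocode. */
    if ((unsigned int)exp < 4u)
        return dispatch_table[exp]();
    return case_default();
}

/* Self-check: in-range values hit the table, others the default. */
void check_dispatch(void) {
    assert(dispatch(0) == 100);
    assert(dispatch(3) == 103);
    assert(dispatch(4) == -1);
    assert(dispatch(-1) == -1);
}
```

The cost of the dispatch is the same for every case value, which is why compilers prefer a table when the case labels are dense.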

Calling Functions (Review)

Function calls on Linux/AMD64
The caller pushes the return address on the stack.
Where practical, arguments are passed in registers. Exceptions: structs, etc., and calls with too many arguments. What can't be passed in registers is at known offsets from the stack pointer!
Return values are in a register, typically %rax for integers and pointers. Exception: structures.
Each function gets a stack frame. Leaf functions that make no calls may skip setting one up.

Arguments/return values in registers

    Operand size (bits)   Arg 1   Arg 2   Arg 3   Arg 4   Arg 5   Arg 6
    64                    %rdi    %rsi    %rdx    %rcx    %r8     %r9
    32                    %edi    %esi    %edx    %ecx    %r8d    %r9d
    16                    %di     %si     %dx     %cx     %r8w    %r9w
    8                     %dil    %sil    %dl     %cl     %r8b    %r9b

Arguments and return values are passed in registers when the types are suitable and when there aren't too many.
Return values are usually in %rax, %eax, etc.
The callee may change these and most other registers. However, the callee must preserve %rsp, %rbp, %rbx, and %r12 through %r15!
Floating-point values use separate registers: float and double arguments go in the SSE registers %xmm0 through %xmm7 and are returned in %xmm0; the x87 registers are used for long double.
Read the specifications for full details!

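The stack: general case
[Figure: three views of the stack. Before call: the arguments, with %rsp marking the current top of the stack. After callq: the return address has been pushed after the arguments and %rsp points at it. If the callee needs a frame: space for callee variables and for arguments to the next call follows the return address, and %rsp has moved by the frame size.]
The frame is allocated with:

    sub $0x{framesize},%rsp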

A simple function

    unsigned int times2(unsigned int a) {
        return a * 2;
    }

Double the argument and put it in the return value register:

    times2:
        leaq (%rdi,%rdi), %rax
        ret

The stack: general case (alignment)
[Figure: the same three views of the stack, annotated.]
Before call: %rsp must be a multiple of 16 when a function is called!
After callq: the pushed return address means %rsp is now not a multiple of 16.
If the callee needs a frame: %rsp must be a multiple of 16 here too.

The stack: general case (re-establishing alignment)
[Figure: before the call %rsp must be a multiple of 16; after callq the callee wastes 8 bytes next to the return address.]
If we'll be calling other functions with no arguments, it is very common to see

    sub $8,%rsp

to re-establish alignment.

Getting the details on function call linkages
Bryant and O'Hallaron has an excellent introduction. Watch for differences between 32 and 64 bit.
The official specification: System V Application Binary Interface: AMD64 Architecture Processor Supplement.
Find it at: http://www.cs.tufts.edu/comp/40/readings/amd64-abi.pdf
See especially sections 3.1 and 3.2. E.g., you'll see that %rbp, %rbx, and %r12 through %r15 are callee saves. All other GPRs are caller saves. See also Figure 3.4.

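Factorial Revisited

    int fact(int n) {
        if (n == 0)
            return 1;
        else
            return n * fact(n - 1);
    }

    fact:
    .LFB2:
        pushq %rbx
    .LCFI0:
        movq  %rdi, %rbx
        movl  $1, %eax
        testq %rdi, %rdi
        je    .L4
        leaq  -1(%rdi), %rdi
        call  fact
        imulq %rbx, %rax
    .L4:
        popq  %rbx
        ret

The listing uses 64-bit registers and instructions (%rdi, %rax, imulq), which corresponds to a 64-bit integer type such as long rather than int.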

An example that integrates what we've learned

Sum positive numbers in an array

    unsigned long sum_positives(long arr[], unsigned int length) {
        unsigned long sum = 0;
        unsigned int i;
        for (i = 0; i < length; i++) {
            if (arr[i] > 0)
                sum += arr[i];
        }
        return sum;
    }

Sum positive numbers in an array: the compiled code

    sum_positives:
    .LFB0:
        movl  $0, %eax
        testl %esi, %esi
        je    .L2
        subl  $1, %esi
        leaq  8(,%rsi,8), %r8
        movl  $0, %edx
    .L4:
        movq  (%rdi,%rdx), %rcx
        leaq  (%rax,%rcx), %rsi
        testq %rcx, %rcx
        cmovg %rsi, %rax
        addq  $8, %rdx
        cmpq  %r8, %rdx
        jne   .L4
    .L2:
        rep ret

Looks like sum will be in %eax/%rax: the function starts with movl $0, %eax, which matches sum = 0.

If the length argument is zero, just leave: testl %esi, %esi sets the zero flag when length is 0, and je .L2 then jumps straight to the return.

We're testing the value of %rcx (testq %rcx, %rcx), but why? Could that be our arr[i] > 0 test? Yes! %rcx holds the array element just loaded by movq (%rdi,%rdx), %rcx.

Hmm. We're conditionally updating our result (%rax) with %rsi (cmovg %rsi, %rax), so we must have computed the sum there! Indeed, leaq (%rax,%rcx), %rsi put sum + arr[i] into %rsi.

The sequence addq $8, %rdx / cmpq %r8, %rdx / jne .L4 is looking a lot like our for loop index increment and test. But why are we adding 8 to the index variable? And what is %r8? It is the byte offset just past the end of the array, computed by leaq 8(,%rsi,8), %r8. (It is unclear why the compiler bothers to subtract one and then add 8.)

But what is %rdx? It is the byte offset to the next array element, which grows by 8 (the size of a long) on every iteration. We never actually compute the variable i!
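
Putting the observations together, the compiled code can be read back into C roughly as follows. This is a reconstruction for illustration, not compiler output; the name sum_positives_lowered and the local variable names are invented, and the comments map each statement to the instruction it is meant to mirror.

```c
#include <assert.h>
#include <stddef.h>

unsigned long sum_positives_lowered(const long arr[], unsigned int length) {
    unsigned long sum = 0;                      /* movl $0, %eax */
    if (length == 0)                            /* testl / je .L2 */
        return sum;

    /* Byte offset just past the last element (%r8 in the listing). */
    size_t end = (size_t)length * sizeof(long);
    size_t offset = 0;                          /* movl $0, %edx */
    do {
        /* movq (%rdi,%rdx), %rcx: load the element at this offset. */
        long value = *(const long *)((const char *)arr + offset);
        /* leaq (%rax,%rcx), %rsi: compute the candidate sum always. */
        unsigned long candidate = sum + (unsigned long)value;
        /* testq / cmovg: keep the candidate only if value > 0. */
        sum = (value > 0) ? candidate : sum;
        offset += sizeof(long);                 /* addq $8, %rdx */
    } while (offset != end);                    /* cmpq / jne .L4 */
    return sum;
}

/* Self-check on a small sample array. */
void check_sum_positives(void) {
    long data[] = {3, -1, 4, 0, -5, 9};
    assert(sum_positives_lowered(data, 6) == 16UL);
    assert(sum_positives_lowered(data, 0) == 0UL);
}
```

Note that there is no branch inside the loop body: the conditional move replaces the if statement, and the index i has been replaced by a byte offset.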

How C Language Data is Represented (Review)

Sizes and alignments

    C type                      Width (bytes)             Alignment (bytes)
    char                        1                         1
    short                       2                         2
    int                         4                         4
    float                       4                         4
    pointer, long, long long    8                         8
    double                      8                         8
    long double                 10 (occupies 16 in memory)  16

http://www.cs.tufts.edu/comp/40/readings/amd64-abi.pdf#page=13
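
The table can be checked on any given machine with sizeof and alignof. The sketch below assumes a C11 compiler; the macro SHOW and the function names are made up for illustration, and the printed numbers depend on the platform, so only properties that hold everywhere are asserted.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdio.h>

/* Print one row of the size/alignment table for a given type. */
#define SHOW(type) \
    printf("%-12s size %2zu  align %2zu\n", #type, sizeof(type), alignof(type))

void print_sizes(void) {
    SHOW(char);
    SHOW(short);
    SHOW(int);
    SHOW(float);
    SHOW(long);
    SHOW(long long);
    SHOW(void *);
    SHOW(double);
    SHOW(long double);
}

void check_sizes(void) {
    /* Guaranteed by the C standard on every platform. */
    assert(sizeof(char) == 1);
    /* Every size is a multiple of its alignment, so arrays stay aligned. */
    assert(sizeof(int) % alignof(int) == 0);
    assert(sizeof(double) % alignof(double) == 0);
    assert(sizeof(long double) % alignof(long double) == 0);
}
```

On x86-64 Linux the output of print_sizes() should agree with the table, with sizeof(long double) reported as 16 because the 10-byte value is padded.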

Why align to natural sizes?
The memory system delivers data aligned to cache blocks.
The CPU loads aligned words or blocks from the cache.
Data that spans boundaries needs two accesses!
Naturally aligned data never unnecessarily crosses a boundary.
Intel/AMD alignment rules:
Pre-AMD64: unaligned data allowed but slower.
AMD64 ABI: data must be aligned as shown on the previous slide.
http://www.cs.tufts.edu/comp/40/readings/amd64-abi.pdf#page=13

Alignment of C Language Data Structures
Wasted space is used to pad between struct members if needed to maintain alignment. Therefore: when practical, put members of larger primitive types ahead of smaller ones!
Structures are sized for the worst-case member: the total size is padded to a multiple of the strictest member alignment, so arrays of structures will have properly aligned members.
malloc return values are always a multiple of 16, so structp = (struct s *)malloc(sizeof(struct s)) is always OK.
The argument area on the stack is 16-byte aligned: (%rsp - 8) is a multiple of 16 on entry to functions. Extra space is wasted in the caller's frame as needed. Therefore: the called function knows how to maintain alignment.
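
The advice about member order can be illustrated with two structs that hold the same members in a different order. The struct and function names are hypothetical, and the exact sizes depend on the platform, so the check asserts only relationships that follow from the alignment rules.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Small members interleaved with a large one: padding is needed
   before 'b' and after 'c'. */
struct padded {
    char a;
    long b;
    char c;
};

/* Same members, largest first: the two chars sit together at the end. */
struct reordered {
    long b;
    char a;
    char c;
};

void check_layout(void) {
    /* Each member sits at an offset that is a multiple of its alignment. */
    assert(offsetof(struct padded, b) % alignof(long) == 0);
    /* In the reordered struct, 'a' follows 'b' with no padding. */
    assert(offsetof(struct reordered, a) == sizeof(long));
    /* Reordering does not make this struct bigger. */
    assert(sizeof(struct reordered) <= sizeof(struct padded));
    /* The size is a multiple of the strictest member alignment. */
    assert(sizeof(struct padded) % alignof(long) == 0);
}
```

With the sizes from the table above (8-byte long), struct padded should come out at 24 bytes and struct reordered at 16.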

Summary
C code is compiled to assembler.
Data is moved to registers for manipulation.
Conditional and jump instructions provide control flow.
The stack is used for function calls.
Compilers play all sorts of tricks when compiling code.
Simple C data types map directly to properly aligned machine types in memory and registers.