Low-Level Essentials for Understanding Security Problems Aurélien Francillon

Similar documents
Function Calls COS 217. Reading: Chapter 4 of Programming From the Ground Up (available online from the course Web site)

Assembly Language. Assembly language for x86 compatible processors using GNU/Linux operating system

Program Exploitation Intro

W4118: PC Hardware and x86. Junfeng Yang

x86 assembly CS449 Fall 2017

Practical Malware Analysis

CNIT 127: Exploit Development. Ch 1: Before you begin. Updated

CSC 2400: Computer Systems. Towards the Hardware: Machine-Level Representation of Programs

Module 3 Instruction Set Architecture (ISA)

CSC 8400: Computer Systems. Machine-Level Representation of Programs

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08

Second Part of the Course

Digital Forensics Lecture 3 - Reverse Engineering

CS 31: Intro to Systems ISAs and Assembly. Kevin Webb Swarthmore College February 9, 2016

CS 31: Intro to Systems ISAs and Assembly. Kevin Webb Swarthmore College September 25, 2018

SOEN228, Winter Revision 1.2 Date: October 25,

x86 assembly CS449 Spring 2016

Winter Compiler Construction T11 Activation records + Introduction to x86 assembly. Today. Tips for PA4. Today:

Islamic University Gaza Engineering Faculty Department of Computer Engineering ECOM 2125: Assembly Language LAB. Lab # 7. Procedures and the Stack

Process Layout and Function Calls

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions?

X86 Addressing Modes Chapter 3" Review: Instructions to Recognize"

Compiler Construction D7011E

x86 architecture et similia

CS61 Section Solutions 3

System calls and assembler

CS 31: Intro to Systems ISAs and Assembly. Martin Gagné Swarthmore College February 7, 2017

An Introduction to x86 ASM

Objectives. ICT106 Fundamentals of Computer Systems Topic 8. Procedures, Calling and Exit conventions, Run-time Stack Ref: Irvine, Ch 5 & 8

Lecture 15 Intel Manual, Vol. 1, Chapter 3. Fri, Mar 6, Hampden-Sydney College. The x86 Architecture. Robb T. Koether. Overview of the x86

What is a Compiler? Compiler Construction SMD163. Why Translation is Needed: Know your Target: Lecture 8: Introduction to code generation

Assembly Language: Overview!

CS 31: Intro to Systems Functions and the Stack. Martin Gagne Swarthmore College February 23, 2016

x86 Assembly Tutorial COS 318: Fall 2017

Assembly Language: Function Calls

The x86 Architecture

Assembly Language: Function Calls" Goals of this Lecture"

Towards the Hardware"

Assembly Language: Function Calls" Goals of this Lecture"

Assembly Language: Function Calls. Goals of this Lecture. Function Call Problems

The Instruction Set. Chapter 5

Memory Models. Registers

Introduction to Reverse Engineering. Alan Padilla, Ricardo Alanis, Stephen Ballenger, Luke Castro, Jake Rawlins

3. Process Management in xv6

Lab 3. The Art of Assembly Language (II)

How Software Executes

x86 Assembly Crash Course Don Porter

Assembly Language: Overview

Sistemi Operativi. Lez. 16 Elementi del linguaggio Assembler AT&T

EECE.3170: Microprocessor Systems Design I Summer 2017 Homework 4 Solution

THEORY OF COMPILATION

Procedure Calls. Young W. Lim Sat. Young W. Lim Procedure Calls Sat 1 / 27

Lecture 4 CIS 341: COMPILERS

Assembly Language Lab # 9

CS165 Computer Security. Understanding low-level program execution Oct 1 st, 2015

Computer Processors. Part 2. Components of a Processor. Execution Unit The ALU. Execution Unit. The Brains of the Box. Processors. Execution Unit (EU)

Sample Exam I PAC II ANSWERS

For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to

Dr. Ramesh K. Karne Department of Computer and Information Sciences, Towson University, Towson, MD /12/2014 Slide 1

Reverse Engineering Low Level Software. CS5375 Software Reverse Engineering Dr. Jaime C. Acosta

CPS104 Recitation: Assembly Programming

What the CPU Sees Basic Flow Control Conditional Flow Control Structured Flow Control Functions and Scope. C Flow Control.

Machine and Assembly Language Principles

Lab 2: Introduction to Assembly Language Programming

CS241 Computer Organization Spring 2015 IA

Real instruction set architectures. Part 2: a representative sample

Assembly Language Programming: Procedures. EECE416 uc. Charles Kim Howard University. Fall

Reverse Engineering II: Basics. Gergely Erdélyi Senior Antivirus Researcher

Systems Architecture I

Addressing Modes on the x86

Credits and Disclaimers

Instruction Set Architecture

Instruction Set Architecture

Stack -- Memory which holds register contents. Will keep the EIP of the next address after the call

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2016 Lecture 12

Reverse Engineering II: The Basics

CS429: Computer Organization and Architecture

CSE351 Spring 2018, Midterm Exam April 27, 2018

Assignment 11: functions, calling conventions, and the stack

Procedure Calls. Young W. Lim Mon. Young W. Lim Procedure Calls Mon 1 / 29

SYSC3601 Microprocessor Systems. Unit 2: The Intel 8086 Architecture and Programming Model

16.317: Microprocessor Systems Design I Fall 2015

Chapter 4 Processor Architecture: Y86 (Sections 4.1 & 4.3) with material from Dr. Bin Ren, College of William & Mary

How Software Executes

CS Bootcamp x86-64 Autumn 2015

Reverse Engineering II: The Basics

Assembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam

Summary: Direct Code Generation

Credits and Disclaimers

Review Questions. 1 The DRAM problem [5 points] Suggest a solution. 2 Big versus Little Endian Addressing [5 points]

Assembly Language. Lecture 2 x86 Processor Architecture

Question 4.2 2: (Solution, p 5) Suppose that the HYMN CPU begins with the following in memory. addr data (translation) LOAD 11110

CS 33: Week 3 Discussion. x86 Assembly (v1.0) Section 1G

Lecture 10 Return-oriented programming. Stephen Checkoway University of Illinois at Chicago Based on slides by Bailey, Brumley, and Miller

Using MMX Instructions to Perform Simple Vector Operations

CSE P 501 Compilers. x86 Lite for Compiler Writers Hal Perkins Autumn /25/ Hal Perkins & UW CSE J-1

MICROPROCESSOR TECHNOLOGY

Y86 Processor State. Instruction Example. Encoding Registers. Lecture 7A. Computer Architecture I Instruction Set Architecture Assembly Language View

Chapter 3: Addressing Modes

Computer Science 104:! Y86 & Single Cycle Processor Design!

Transcription:

Low-Level Essentials for Understanding Security Problems Aurélien Francillon francill@eurecom.fr

Computer Architecture The modern computer architecture is based on Von Neumann Two main parts: CPU (Central Processing Unit) and Memory This architecture is used almost everywhere This architecture is the most common one What is memory used for? E.g., location of cursor, size of windows, shape of each letter being displayed, graphics of icons, text, values, etc Von Neuman also says that not only data, but programs should also be in main memory 2

Von Neumann Architecture Core Address bus Data bus Instructions, data from the same memory The most common CPU architecture Beware of the caches Performance Self modifying code Memory Instructions Data Stack PC SP 3

Harvard Architecture Program memory Program Addresses Data Addresses SRAM memory PC Instructions Core Data Instructions bus Data bus Instructions, data from separate memory Common in embedded systems (AVR, PIC) and DSP Stack SP 4

The Core (or CPU) Storing data by itself, of course, is not enough The Core reads instructions from memory one by one Executes them (fetch-execute-cycle) Following components make up the Core Program Counter Instruction Decoder Data bus General-purpose registers Arithmetic and logic unit 5

Program Counter and Instruction Decoder Is used to tell the CPU where to fetch next instruction There is no difference between memory and data Program counter holds memory address of next instruction The instruction decoder then makes sense of the instruction Addition? Subtraction? Multiplication? Move operation? Etc. A typical instruction usually consists of memory locations as well E.g., move this piece of data from memory address X to memory address Y 6

The Data Bus and Registers The data bus connects the memory and the CPU i.e., it is the actual, physical wire In addition to the memory, the processor has highspeed, special memory location Called registers Special-purpose registers General-purpose registers Registers are used for computation 7

Arithmetic and Logic Unit Once instruction has been decoded, CPU passes data and decoded instruction to ALU Now, the instruction is actually executed Results of the computation are placed on the data bus and sent to memory locations given in the instruction For example, we can tell ALU to add 1 to register A and place result in register B 8

Some basics (you know them ;)) The number attached to each storage location is an address A single storage location is called a byte On x86 processors, a byte is between 0..255 Obviously, two bytes can be used to represent any number between 0..65536 Four bytes can be used to represent numbers between 0..4294967295., Luckily, we do not have to worry about this. The architecture helps us to do math with 4 byte numbers (8 bytes/64 bits: 0..18,446,744,073,709,551,615 (i.e. 18*10^18)) Addresses sizes are in words (a word is 4 bytes) Note that this means that a number and an address are dealt with the same way 9

Data Accessing Methods Processors have a number of different ways of accessing data Known as addressing modes The simplest method is known as immediate mode Data access is enabled in instruction itself In register addressing mode the instruction contains a register to access rather than memory location In direct addressing mode, the instruction contains memory address In indexed addressing mode, the instruction contains memory address to access and an index register to offset E.g., Address 1000, register=4, address=1004 10

Data Accessing Methods Processors have a number of different ways of accessing data Indirect addressing mode We take address that is stored in register Base point addressing mode You take address in register and add an offset Used a lot 11

A Simple Program in Assembler # No INPUT # Returns a status code, you can view it by typing echo $? # ebx holds the return code.intel_syntax noprefix.section.data.section.text.globl _start _start: mov eax, 1 #This is the sys call for exiting program mov ebx, 0 #This value is returned as status int 0x80 #This interrupt calls the kernel, to execute sys call 12

So now we have the source code So how do we create the application? Well, we need to assemble and link the code This can be done by using the assembler as: as exit.s o exit.o ld o exit exit.o 13

Anatomy of the Code.section breaks up the program into sections.data, obviously, is used for variables and data you might need.text is the code of the program that you write.globl _start indicates that _start is a symbol, required by the linker _start, in fact, indicates that the loader will load and start the program from this location The next instruction, mov, has two operands: destination and source mov, add, sub 14

A note on the syntax.intel_syntax noprefix Forces that the assembly is parsed in Intel-syntax instruction destination, source But: Majority of GNU/Linux tools default to AT&T Syntax instruction source, destination More explicit due to prefixes and suffixes Used syntax is a matter of taste Be aware of the differences! 15

The same program (AT&T syntax) # No INPUT # Returns a status code, you can view it by typing echo $? # %ebx holds the return code.section.data.section.text.globl _start _start: mov $1, %eax # This is the sys call for exiting program movl $0, %ebx # This value is returned as status int $0x80 # This interrupt calls the kernel, to execute sys call 16

Registers on the x86 Architecture General purpose registers: eax, ebx, ecx, edx, edi, esi Special purpose registers: ebp, esp, eip, eflags Some registers (e.g., eip, eflags) can only be accessed through special instructions 17

Let Us Write a More Complicated Program Task: Find the maximum of a list of numbers Questions to ask: Where will the numbers be stored? How do we find the maximum number? How much storage do we need? Will registers be enough or is memory needed? Let us designate registers for the task at hand: edi holds position in list ebx will hold current highest eax will hold current element examined 18

Algorithm we use Check if eax is zero (i.e., termination sign) If yes, exit If not, increase current position edi Load next value in the list to eax We need to think about what addressing mode to use here Compare eax (the current value) with highest value so far ebx If the current value is higher, replace ebx Repeat until termination 19

Let s get down to the code.intel_syntax noprefix.section.data data_items:.long 3,67,34,222,45,75,54,34,44,33,22,11,66,0.section.text.globl _start _start: mov edi, 0 # Reset index mov eax, [edi*4+data_items] mov ebx, eax #First item is the biggest so far 20

Let s get down to the code start_loop: cmp eax, 0 je loop_exit inc edi # Increment edi mov eax, [edi*4+data_items] #load the next value cmp eax, ebx # Compare ebx with eax jle start_loop # if it is less, then just jump to the beginning mov ebx, eax #Otherwise, store the new largest number jmp start_loop loop_exit: mov eax, 1 # Remember the exit sys call? It is 1 int 0x80 21

Important Instructions The compare instruction cmp eax, 0 je end_loop Other jump instructions jg, jge, jl, jle, jmp mov instruction Used often. One of the most important and common instructions that you are going to see and use, for example, when writing shell code 22

Different Addressing Modes direct addressing mode mov eax, [ADDRESS] indirect addressing mode mov ebx, [eax] indexed addressing mode mov eax, [ecx*1+data_start] uses the ADDRESS and the INDEX portion Immediate mode (already seen) 23

What if we do not want to move a word? Suppose you only need to move a byte at a time, not a word You can use the mov byte ptr instruction In eax, the least significant half is addressed by ax (i.e., 2 bytes) ax is devided into ah and al (1 byte in size) 24

Memory Layout Stack segment local variables procedure activation records Data segment global uninitialized variables (.bss) global initialized variables (.data) dynamic variables (heap) Code (Text) segment program instructions usually read-only In linux, under the proc filesystem $ cat /proc/<pid>/maps Top of Memory Environment variables Stack Heap ( bss.) data ( data.) data ( text.) code Shared Libraries Syssec 25

Functions A function is composed of several different pieces function name Symbol that represents where the function starts function parameters Data items passed to function for processing Local variables Temporary storage areas used in the function Thrown away when the processing finishes Static variables Storage area that is reused over invocations Global variables Storage areas outside the function 26

The Return Address The return address is a parameter which tells the function where to resume execution after the function is completed It is invisible the programmer does not necessarily know the address When a function is invoked, the calling point is saved When the function completes, it returns to the initial calling point In machine code, functions are called with call and they return when they execute ret Return value Usually, a single value is returned to caller 27

The Stack In most architectures (Intel, Sparc, Motorola, MIPS), the stack grows toward the bottom of the memory The ESP register (stack pointer) always points to the current top of the stack Composed of frames: Upon function call, a new frame is pushed on the stack Upon function return, the frame is discarded The EBP register (base pointer) points at the current frame Each frame contains: Return address (where to jump at the end of the function) Address of the previous frame Parameters passed to the function Local variables of the function Syssec 28

The Calling Convention The way that the variables are stored and the parameters and return values are transferred by the computer varies Depends on language, compiler, architecture The Application Binary Interface (ABI) This variance is known as the calling convention The assembly language can use any calling convention You can make one up yourself however If you want to inter-operate with other languages, use libraries, you need to follow their calling conventions E.g., suppose you want your code to be callable from a C program 29

The Stack Each computer program that runs uses a region of memory called the stack to enable functions to work properly You generally keep the things that you are working on toward the top, and you take things off as you are finished working with them The computer s stack lives at the very top addresses of memory You can push values onto stack using push You can retrieve items from the stack using pop 30

The Stack Where is the top of the stack Because of architectural considerations, the computer stack grows from higher addresses to lower addresses I.e., it grows downwards How do we know where the top of the stack is? The esp register stores a pointer to stack location If something is pushed onto stack, the stack decreases by esp 4 if push is used Yes, but if I only want to read? How can I do this without popping? mov eax, [esp] mov eax, [esp+4] (for addressing a higher value) 31

The C calling convention The stack is the key element for implementing a function s local variables, parameters, and return address Before executing a function, a program pushes all of the parameters for the function onto the stack in the reverse order that they are documented Program issues call instruction call does two things pushes address of next instruction (i.e., return) Modifies eip to point to function start 32

The C calling convention Parameter #N Parameter 2 Parameter 1 Return Address <--- (esp) Now, function has to do some thing It saves the current base pointer register ebp Needed for local variables and parameters push ebp mov ebp, esp 33

The C calling convention Copying the stack pointer into the base pointer at the beginning of a function allows you to always know where your parameters are Parameter #N <--- N*4+[ebp+4]... Parameter 2 <--- [ebp+12] Parameter 1 <--- [ebp+8] Return Address <--- [ebp+4] Saved ebp <--- [esp] and [ebp] 34

The C calling convention stack frame consists of all of the stack variables used within a function Next, the function reserves space on the stack for any local variables it needs This is done by simply moving the stack pointer out of the way E.g., if we need two words: sub esp, 8 Variables are local because when function returns, the stack frame is reset and variables disappear 35

The C calling convention Our stack now looks like this: Parameter #N <--- N*4+[ebp+4]... Parameter 2 <--- [ebp+12] Parameter 1 <--- [ebp+8] Return Address <--- [ebp+4] Saved ebp <--- [ebp] Local Variable 1 <--- [ebp-4] Local Variable 2 <--- [ebp-8] and [esp] 36

The C calling convention Now we can use the ebp for accessing all variables ebp was specifically designed for this purpose When a function is done It stores its return value in eax It resets the stack to what it was when it was called It gets rid of the current stack frame and puts the stack frame of the calling code back into effect It returns by invoking the ret command mov esp, ebp pop ebp ret 37

Bonus-Slide: Addressing Modes in AT&T-Syntax The general form of memory address references is this: ADDRESS_OR_OFFSET(%BASE_OR_OFFSET, %INDEX,MULTIPLIER) To calculate the address, simply perform the following calculation: FINAL ADDRESS = ADDRESS_OR_OFFSET + %BASE_OR_OFFSET + MULTIPLIER * %INDEX ADDRESS_OR_OFFSET and MULTIPLIER must both be constants, while the other two must be registers. 38