Reverse Engineering Low Level Software. CS5375 Software Reverse Engineering Dr. Jaime C. Acosta

Machine code
Assembly is compiled (assembled) into machine code, and machine code is disassembled back into assembly. Some elements are directly mappable between the two forms; others are not.

Computer Architecture
- CPU
  - Control Unit: handles control logic
  - ALU: handles arithmetic
  - Registers: short-term storage, FAST access!
- Main memory (RAM) and Disk I/O: external, longer-term storage with higher latency than registers

Our Focus
- CPU: Control Unit, ALU, Registers
- Main Memory, from lower to higher memory addresses: Text (contains the program instructions), Data, Heap, Stack

Low-level Instruction Sets
Instruction set architecture (ISA): the set of low-level instructions defined by the architecture vendor
- Instructions map directly to machine code/digital logic in hardware; e.g., mov ECX, <imm32> is encoded as the opcode byte 0xB9 (1011 1001 in binary) followed by the 32-bit immediate
- A limited set of registers corresponds to hardware components
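To make the opcode example concrete, here is a small C sketch that hand-assembles one instruction form, mov ecx, imm32, into its bytes. It assumes only that form (opcode 0xB9 followed by a little-endian 32-bit immediate); the function name encode_mov_ecx_imm32 is made up for illustration, and a real assembler handles many more encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-assemble "mov ecx, imm32" (only this one instruction form).
 * Encoding: opcode 0xB8 + register number (ECX is register 1), giving
 * 0xB9, followed by the 32-bit immediate, least significant byte first.
 * Returns the instruction length in bytes. */
size_t encode_mov_ecx_imm32(uint32_t imm, uint8_t out[5]) {
    out[0] = 0xB9; /* 1011 1001 */
    for (int i = 0; i < 4; i++) {
        out[1 + i] = (uint8_t)(imm >> (8 * i));
    }
    return 5;
}
```

For instance, an immediate of 0x12345678 produces the bytes B9 78 56 34 12.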

Low-Level Perspectives
A high-level C function that multiplies x by y into z and returns z corresponds to these low-level steps:
1. Store the current state before executing the function code
2. Allocate memory for z
3. Load parameters x and y from memory into registers
4. Multiply x and y, storing the result in a register
5. Copy the result into the memory allocated for z
6. Restore the state saved in step 1
7. Return to the caller, sending back z as the return value

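The seven steps can be tied back to C with a sketch like the one below. The function name multiply and the int parameter types are assumptions, since the original C code is not reproduced here; what a compiler actually emits for each step depends on the compiler, calling convention, and optimization level.

```c
#include <assert.h>

/* Hypothetical function matching the seven low-level steps. */
int multiply(int x, int y) {
    /* Steps 1 and 6 (saving and restoring state) are emitted by the
     * compiler as the function prologue and epilogue. */
    int z;        /* step 2: memory for z, typically a stack slot in
                     unoptimized builds (or just a register) */
    z = x * y;    /* steps 3-5: load x and y, multiply in a register,
                     copy the result into z */
    return z;     /* step 7: z goes back to the caller (in EAX on IA-32) */
}
```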
Low-Level Data Management: Registers
- Small memory that resides within the processor
- Little or no performance penalty to access
- Very few of them (8 32-bit general-purpose registers in IA-32)
- Used in conjunction with external memory; assembly code manages these issues explicitly

Low-Level Perspectives (revisited)
The low-level steps are the same seven listed earlier, with one caveat: the multiplication (step 4) may also take its operands directly from data memory instead of first loading both into registers (step 3).

Low-Level Data Management: Stack
- Non-register memory used for short-term secondary storage
- LIFO (last in, first out)
- Uses of the stack: temporarily saved register values, local variables, function parameters and return addresses

Stack layout: each slot is 32 bits (a DWORD). ESP points to the top of the stack and EBP to its base; slots beyond the top hold unknown (unused) data, and a previously stored value sits at the higher memory address.
- Pushing Value 1, Value 2, and Value 3 stores each one at a lower address than the previous one (the push direction is toward lower memory addresses).
- Popping into EAX, EBX, and ECX, in that order, returns the values in LIFO order: EAX = Value 3, EBX = Value 2, ECX = Value 1.
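A toy C model of the push/pop behavior described above. It is a simulation, not real stack access: the slot array, the 64-byte size, and the function names are hypothetical. ESP is tracked as a byte offset that moves down by 4 on each push and up by 4 on each pop.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64 /* hypothetical stack size for the toy model */

typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* 32-bit (DWORD) slots */
    uint32_t esp;                  /* byte offset of the top of the stack */
} Stack;

/* Empty stack: ESP starts just past the highest slot. */
void stack_init(Stack *s) { s->esp = STACK_BYTES; }

/* PUSH: move ESP down one DWORD (toward lower addresses), then store. */
void push(Stack *s, uint32_t value) {
    assert(s->esp >= 4); /* toy overflow check */
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* POP: read the top, then move ESP up one DWORD. The slot is not
 * cleared; it simply counts as unused from now on. */
uint32_t pop(Stack *s) {
    assert(s->esp + 4 <= STACK_BYTES); /* toy underflow check */
    uint32_t value = s->mem[s->esp / 4];
    s->esp += 4;
    return value;
}
```

Pushing 1, 2, 3 and then popping three times yields 3, 2, 1, matching the EAX/EBX/ECX example, and ESP ends where it started. As on real hardware, pop only moves ESP; the old values stay in memory until overwritten.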

Low-Level Data Management: Heap
- Variable-sized memory allocation/de-allocation
- The program requests a block and gets back a pointer/reference to it (new, malloc, calloc, ...)
- Used for objects that are too big for the stack
Data section
- Global variables, e.g., char szwelcome[] = "Hello.";
- Long-term storage
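The following C sketch puts one object in each region discussed so far: a global in the data section, a local that the compiler keeps on the stack or in a register, and a heap block obtained from malloc. The helper name heap_copy_welcome is hypothetical, and exact placement is up to the compiler and runtime.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Initialized global: lives in the data section for the whole run. */
char szwelcome[] = "Hello.";

/* Copies the global greeting into a heap block sized at run time.
 * The caller owns the returned pointer and must free() it. */
char *heap_copy_welcome(void) {
    size_t len = strlen(szwelcome) + 1; /* local: stack slot or register */
    char *block = malloc(len);          /* heap: request a block, get a pointer */
    if (block != NULL) {
        memcpy(block, szwelcome, len);
    }
    return block;
}
```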

IA-32 Assembly Language
- Intel Architecture, 32-bit (AKA i386)
- Used by most Intel-compatible x86 CPUs, including AMD and VIA
- Two notations (semantically equivalent): AT&T syntax, used by the GNU tools on Unix, and Intel syntax, common on Windows. This class uses Intel notation.
  e.g., Intel mov eax, 2 is written movl $2, %eax in AT&T syntax (operand order reversed, % before registers, $ before immediates)

Some IA-32 Registers
- 8 general registers: usable for any purpose, though some conventions are good practice
- 6 segment registers: point to areas in memory for efficiency
- 1 FLAGS register: maintains some state; set according to the results of instruction execution
- 1 instruction pointer: contains the memory address of the next instruction to be executed

IA-32 General Registers: common usage
- EAX, EBX, ECX, EDX: general purpose; EAX usually holds function return values, ECX usually holds an iterator (loop counter)
- ESI, EDI: indices for efficient memory copies (source and destination)
- ESP: points to the top of the stack
- EBP: points to the base of the stack (the current frame)

Flags Register
- Special register (not directly modifiable with ordinary instructions such as MOV)
- Contains flags that hold status and other information, recording the current logical state
- Updated by logical/integer instructions to record their outcomes; later instructions may depend on these outcomes
- e.g., bit 0 is CF (carry flag): set when an unsigned result is out of range; bit 6 is ZF (zero flag): set when the result of an operation is 0

Instruction Pointer Register
Labeled EIP, it contains the address of the next instruction to execute: it tells the processor what to do next.

Instruction Format
Instruction name (opcode) Destination operand, Source operand
Example:
I. MOV eax, 2 -> EAX = 2
II. ADD eax, 1 -> EAX = 3
III. MOV ebx, eax -> EAX = 3, EBX = 3
mov is really a copy: after III, EAX still holds 3.
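The same three instructions traced in C, using a hypothetical Regs struct as a stand-in for the register file. Each assignment has the destination on the left and the source on the right, mirroring the Intel operand order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for two 32-bit registers. */
typedef struct { uint32_t eax, ebx; } Regs;

Regs run_example(void) {
    Regs r = {0, 0};
    r.eax = 2;     /* I.   MOV eax, 2   : destination <- source          */
    r.eax += 1;    /* II.  ADD eax, 1   : destination <- dest + source   */
    r.ebx = r.eax; /* III. MOV ebx, eax : a copy; EAX keeps its value    */
    return r;
}
```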

Instruction Format
- Instructions usually consist of an opcode (operation code) and one or two operands, like a function name and its parameters: mov a, b reads as move(a, b)
- Operands come in three forms: register name, immediate (constant value), memory address

Operands
Type | Example operand | Description
Register | EAX | Access the EAX register for reading/writing
Immediate | 6, 0x4000 349e, <label>* | A constant value
Memory address | [0x4000 349e], [EAX], <label>* | A memory address
* With some exceptions, control flow instructions (jmp, call, etc.) treat labels as immediates, while non-control flow instructions treat them as memory addresses (more on this later).

Common Arithmetic Operations
Instruction | Operation
1. ADD A, B | A = A + B
2. SUB A, B | A = A - B
3. MUL A | EDX:EAX = EAX * A (unsigned)
4. DIV A | EAX = EDX:EAX / A, EDX = EDX:EAX % A (unsigned)
5. IMUL A | same as 3, except signed
6. IDIV A | same as 4, except signed
ADD and SUB serve both signed and unsigned values; CF signals unsigned overflow and OF signals signed overflow. EDX:EAX means the 64-bit value with EDX as the high 32 bits and EAX as the low 32 bits (shown here for 32-bit operands).
Note: Some opcodes have more than one signature (e.g., IMUL also has two- and three-operand forms).
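The EDX:EAX notation is easy to misread, so here is a C sketch of 32-bit MUL and DIV using 64-bit intermediates. The struct and function names are hypothetical; where the CPU would raise a divide error (divisor 0, or a quotient that does not fit in 32 bits), the sketch asserts instead.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t eax, edx; } EdxEax;

/* MUL A: EDX:EAX = EAX * A (unsigned, full 64-bit product). */
EdxEax mul32(uint32_t eax, uint32_t a) {
    uint64_t product = (uint64_t)eax * a;
    EdxEax r = { (uint32_t)product, (uint32_t)(product >> 32) };
    return r;
}

/* DIV A: EAX = EDX:EAX / A, EDX = EDX:EAX % A (unsigned). */
EdxEax div32(uint32_t edx, uint32_t eax, uint32_t a) {
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    assert(a != 0);                      /* CPU: divide error */
    assert(dividend / a <= UINT32_MAX);  /* CPU: divide error (overflow) */
    EdxEax r = { (uint32_t)(dividend / a), (uint32_t)(dividend % a) };
    return r;
}
```

For example, 0x80000000 * 4 does not fit in 32 bits, so the high part lands in EDX (2) and EAX is 0.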

Common Conditional Instructions
1. CMP A, B: computes A - B, discards the result, and sets the flags (CF reflects an unsigned comparison)
   A < B: CF=1, ZF=0
   A = B: CF=0, ZF=1
   A > B: CF=0, ZF=0
2. TEST A, B: computes A AND B, discards the result, and sets the flags
   If (A AND B) == 0 {ZF=1; CF=0} Else {ZF=0; CF=0}
   A == 0 or B == 0 always gives ZF=1, but so do nonzero operands with no set bits in common (e.g., 1 AND 2 == 0).
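A C sketch of how CMP and TEST set CF and ZF, packing them into an EFLAGS-like word at bit 0 and bit 6 as in the Flags Register slide. Only these two flags are modeled (the real instructions also update SF, OF, and PF), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CF_BIT 0 /* carry flag position in EFLAGS */
#define ZF_BIT 6 /* zero flag position in EFLAGS */

/* CMP A, B: compute A - B, discard the difference, keep the flags. */
uint32_t cmp_flags(uint32_t a, uint32_t b) {
    uint32_t diff = a - b; /* wraps modulo 2^32, like the hardware */
    uint32_t flags = 0;
    if (a < b)     flags |= 1u << CF_BIT; /* borrow: A < B as unsigned */
    if (diff == 0) flags |= 1u << ZF_BIT; /* A == B */
    return flags;
}

/* TEST A, B: compute A AND B, discard it; CF is always cleared. */
uint32_t test_flags(uint32_t a, uint32_t b) {
    uint32_t flags = 0;
    if ((a & b) == 0) flags |= 1u << ZF_BIT;
    return flags;
}

int cf_set(uint32_t flags) { return (int)((flags >> CF_BIT) & 1u); }
int zf_set(uint32_t flags) { return (int)((flags >> ZF_BIT) & 1u); }
```

Note that test_flags(1, 2) sets ZF even though neither operand is 0, because 1 AND 2 == 0.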

Function Call Instructions
1. CALL ADDR
   1. Push the address of the instruction after CALL onto the stack (this adjusts the stack pointer, ESP)
   2. Place ADDR into EIP
2. LEAVE
   1. Reset the top of the stack to the frame base, discarding locals (MOV ESP, EBP)
   2. Restore EBP to the old EBP (POP EBP)
3. RET/RETN
   1. Pop the return address from the stack into EIP (this adjusts ESP)

Function Calls
Caller:
   PUSH EAX
   CALL FuncA
   ADD ESP, 4
FuncA:
   <do something>
   RET
Steps:
1. Push parameters (PUSH EAX)
2. Push current state (CALL pushes the return address)
3. Process FuncA
4. Pop previous state (RET pops the return address into EIP)
5. Adjust stack to drop the parameters (ADD ESP, 4 removes the one 4-byte parameter)
6. Continue processing
While FuncA runs, the stack holds, from ESP toward higher addresses: the current state data, the value from EAX, and the previously stored value.
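A toy C simulation of the call sequence, showing how ESP, EBP, and EIP change. Several details are assumptions: the addresses are made up (PUSH EAX at 0x100, so the 5-byte CALL ends and ADD ESP, 4 begins at 0x106), FuncA is given a typical PUSH EBP; MOV EBP, ESP prologue and two DWORDs of locals so that LEAVE has something to undo, and stack positions are counted in DWORD slots rather than bytes. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 16 /* toy stack size, in DWORD slots */

typedef struct {
    uint32_t stack[SLOTS];
    uint32_t esp, ebp; /* slot indices; lower index = lower address */
    uint32_t eip;
} Cpu;

void push(Cpu *c, uint32_t v) { assert(c->esp > 0); c->stack[--c->esp] = v; }
uint32_t pop(Cpu *c) { assert(c->esp < SLOTS); return c->stack[c->esp++]; }

/* CALL addr: push the address of the next instruction, then jump. */
void call(Cpu *c, uint32_t addr, uint32_t next_insn) {
    push(c, next_insn);
    c->eip = addr;
}
/* Typical prologue (assumed): PUSH EBP; MOV EBP, ESP */
void prologue(Cpu *c) { push(c, c->ebp); c->ebp = c->esp; }
/* LEAVE: MOV ESP, EBP; POP EBP */
void leave(Cpu *c) { c->esp = c->ebp; c->ebp = pop(c); }
/* RET: pop the return address into EIP */
void ret(Cpu *c) { c->eip = pop(c); }

/* The caller/callee pair from the slide, with hypothetical addresses. */
Cpu run_example(uint32_t eax) {
    Cpu c = { {0}, SLOTS, SLOTS, 0x100 }; /* EIP at PUSH EAX */
    push(&c, eax);          /* 0x100 PUSH EAX    1. push parameters    */
    call(&c, 0x200, 0x106); /* 0x101 CALL FuncA  2. push current state */
    prologue(&c);           /* FuncA: set up a frame (assumed)         */
    c.esp -= 2;             /* SUB ESP, 8: two local DWORDs (assumed)  */
                            /* 3. <do something>                       */
    leave(&c);              /* LEAVE                                   */
    ret(&c);                /* RET               4. pop previous state */
    c.esp += 1;             /* 0x106 ADD ESP, 4  5. drop the parameter */
    return c;               /* 6. continue processing at EIP           */
}
```

After the sequence, EIP is back at the instruction following CALL and ESP and EBP have their original values; the parameter and return address are still physically in the array, just above the top of the stack.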

Common Jumping Instructions
Instructions set the flags based on their results; conditional jumps then use those flags to decide where control goes.
1. jz/je target: jump if zero (ZF = 1)
2. jnz/jne target: jump if not zero (ZF = 0)
3. ja target: jump if above (ZF = 0 and CF = 0) (unsigned)
4. jb target: jump if below (CF = 1) (unsigned)
5. jg target: jump if greater (ZF = 0 and SF = OF) (signed)
6. jl target: jump if less (SF ≠ OF) (signed)
7. jge target: jump if greater or equal (SF = OF) (signed)
8. jmp target: unconditional jump
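To see why the signed and unsigned jumps differ, here is a C sketch that computes CF, ZF, SF, and OF the way CMP does and then evaluates each jump condition from those flags. The Flags struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } Flags;

/* Flags produced by CMP A, B (i.e., by computing A - B). */
Flags cmp(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.cf = a < b;                               /* unsigned borrow */
    f.zf = r == 0;
    f.sf = (r >> 31) & 1u;                      /* sign bit of the result */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1u;    /* signed overflow */
    return f;
}

/* Jump conditions evaluated from the flags, as the CPU does. */
bool jz(Flags f)  { return f.zf; }
bool jnz(Flags f) { return !f.zf; }
bool ja(Flags f)  { return !f.cf && !f.zf; }         /* unsigned > */
bool jb(Flags f)  { return f.cf; }                   /* unsigned < */
bool jg(Flags f)  { return !f.zf && f.sf == f.of; }  /* signed >   */
bool jl(Flags f)  { return f.sf != f.of; }           /* signed <   */
bool jge(Flags f) { return f.sf == f.of; }           /* signed >=  */
```

Comparing 0xFFFFFFFF with 1, ja is taken (4294967295 > 1 as unsigned) but jg is not (-1 < 1 as signed).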

Other Common Instructions
1. SHR A, B: shift A right by B bits, store in A (each bit shifted divides an unsigned value by 2)
2. SHL A, B: shift A left by B bits, store in A (each bit shifted multiplies by 2)
3. ROR A, B: rotate A right by B bits, store in A (4-bit illustration, B = 1: 1001 -> 1100)
4. ROL A, B: rotate A left by B bits, store in A (4-bit illustration, B = 1: 1100 -> 1001)
5. XOR A, B: bitwise exclusive OR, store in A
   A | B | Result
   0 | 0 | 0
   0 | 1 | 1
   1 | 0 | 1
   1 | 1 | 0
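C has shift operators but no rotate operator; compilers commonly turn a shift-and-OR idiom like the one sketched here into a single ROL or ROR. This is an assumed 32-bit version: on a 32-bit register the bit rotated out of position 0 re-enters at bit 31, not bit 3 as in the 4-bit illustration. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate counts are masked to 0-31, as the CPU does for 32-bit
 * operands; n == 0 is handled separately because shifting a
 * uint32_t by 32 is undefined in C. */
uint32_t rol32(uint32_t a, unsigned n) {
    n &= 31;
    return n ? (a << n) | (a >> (32 - n)) : a;
}
uint32_t ror32(uint32_t a, unsigned n) {
    n &= 31;
    return n ? (a >> n) | (a << (32 - n)) : a;
}

/* SHR/SHL on unsigned values: divide/multiply by 2 per bit shifted. */
uint32_t shr32(uint32_t a, unsigned n) { return a >> (n & 31); }
uint32_t shl32(uint32_t a, unsigned n) { return a << (n & 31); }
```

XOR of a value with itself is always 0, which is why xor eax, eax is a common way to clear a register.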

Example 1
1. cmp ebx,0xf020
2. jnz 0x10026509
If EBX == 0xf020 -> don't jump (the cmp sets ZF = 1)
If EBX != 0xf020, e.g., EBX == 0x0000 -> jump to 0x10026509

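Example 2
1. mov edi,[ecx+0xb0]
2. nop
3. mov ebx,[ecx+0xb8]
4. imul edi,ebx
nop is "no operation": it does nothing. MUL only has a one-operand form, so a two-operand multiply like step 4 is written imul.
The fixed offsets from ECX (0xb0, 0xb8) suggest the code is probably accessing fields of some data structure whose address is in ECX.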
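One way to read Example 2 is as field accesses on a structure pointed to by ECX. The struct below is entirely hypothetical (field names and padding are invented) and only shows how fixed offsets such as 0xb0 and 0xb8 arise from a C layout; the multiply in step 4 is modeled as a 32-bit product, the low half that a two-operand imul keeps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout producing the offsets used in Example 2. */
typedef struct {
    uint8_t  unknown_00[0xb0]; /* fields this code does not touch */
    uint32_t width;            /* +0xb0: mov edi,[ecx+0xb0] */
    uint32_t unknown_b4;       /* +0xb4 */
    uint32_t height;           /* +0xb8: mov ebx,[ecx+0xb8] */
} Object;

/* p plays the role of ECX; the product would end up in EDI. */
uint32_t example2(const Object *p) {
    uint32_t edi = p->width;   /* 1. load field at +0xb0 */
                               /* 2. nop */
    uint32_t ebx = p->height;  /* 3. load field at +0xb8 */
    return edi * ebx;          /* 4. low 32 bits of the product */
}
```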

Example 3
1. push eax
2. push ebx
3. push ecx
4. push esi
5. call 0x10026eeb
Pushing parameters onto the stack and then calling a function. Assuming a convention such as cdecl or stdcall, which push arguments right to left, ESI (pushed last) would be the first argument.

Example 4a: Register Operands (EBX = 0x00B3 0040)
1. mov eax, ebx -> EAX = 0x00B3 0040 (a copy of EBX)
Example 4b: Indirect Addressing
1. mov eax, [ebx+8] -> EAX = 0x0000 0020 (the DWORD stored in memory at EBX+8)
Example 4c: Load Effective Address
1. lea eax, [ebx+8] -> EAX = 0x00B3 0048 (the address EBX+8 itself; no memory is read)
Example 4d: Offset and Code Labels
1. push offset loc_b30048 -> pushes the address of loc_b30048, 0x00B3 0048, on top of the previously stored value
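A C sketch contrasting the register, indirect, and lea forms, using a small simulated memory whose base address matches the example (EBX = 0x00B30040, with the DWORD 0x20 stored at EBX+8). The array, the base-address constant, and the function names are assumptions made for illustration. The key point: mov with brackets reads memory, while lea only computes the address; push offset is similar to lea in that it uses the address itself.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BASE 0x00B30040u /* hypothetical address of the simulated memory */
static uint8_t mem[64];      /* simulated bytes starting at MEM_BASE */

/* Read the little-endian DWORD at a simulated address, i.e., [addr]. */
uint32_t load32(uint32_t addr) {
    assert(addr >= MEM_BASE && addr - MEM_BASE <= sizeof mem - 4);
    const uint8_t *p = &mem[addr - MEM_BASE];
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Write a little-endian DWORD at a simulated address. */
void store32(uint32_t addr, uint32_t v) {
    assert(addr >= MEM_BASE && addr - MEM_BASE <= sizeof mem - 4);
    for (int i = 0; i < 4; i++) mem[addr - MEM_BASE + i] = (uint8_t)(v >> (8 * i));
}

uint32_t ex4a_mov_reg(uint32_t ebx) { return ebx; }             /* mov eax, ebx     */
uint32_t ex4b_mov_mem(uint32_t ebx) { return load32(ebx + 8); } /* mov eax, [ebx+8] */
uint32_t ex4c_lea(uint32_t ebx)     { return ebx + 8; }         /* lea eax, [ebx+8] */
```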

Label usage examples
Control flow
- jmp <label>: jump to the memory address <label> (here <label> is treated as an immediate operand)
Non-control flow
- mov EAX, <label>: store the value contained at memory address <label> (here treated as a memory operand)
- mov EAX, offset <label>: store the memory address <label> itself (here treated as an immediate operand)

Example 5
1. mov ecx, esi
2. mov eax, [edx+ecx*4]
3. push eax
4. add ecx, 1
5. mov eax, [edx+ecx*4]
6. push eax
7. call 0x10026eeb
[edx+ecx*4] scales the index in ECX by 4, the size of a DWORD, so EDX likely holds the base of an array of 32-bit values: the code pushes element ESI, then element ESI+1, and calls the function.
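Read as C, Example 5 looks like two consecutive array reads. This sketch assumes EDX holds the base of an array of 32-bit values and ESI an index; the function and struct names are hypothetical. The *4 scale in [edx+ecx*4] corresponds to sizeof(uint32_t), which C applies automatically when indexing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t first_pushed, second_pushed; } Pushed;

/* edx[ecx] is the DWORD at address edx + ecx * 4. */
Pushed example5(const uint32_t *edx, uint32_t esi) {
    Pushed p;
    uint32_t ecx = esi;         /* 1. mov ecx, esi                   */
    p.first_pushed = edx[ecx];  /* 2-3. mov eax,[edx+ecx*4]; push eax */
    ecx += 1;                   /* 4. add ecx, 1                     */
    p.second_pushed = edx[ecx]; /* 5-6. mov eax,[edx+ecx*4]; push eax */
    return p;                   /* 7. call: with right-to-left pushes,
                                   second_pushed is the first argument */
}
```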

Size directives: byte ptr, word ptr, and dword ptr state that a memory operand is 8, 16, or 32 bits wide

Example 6
1. movzx eax, byte ptr [eax]
2. cmp al, mychar
movzx loads the single byte at [eax] and zero-extends it into EAX, so the cmp compares that byte (AL) with the byte at mychar.
Variant: 2. cmp al, [mychar]
In MASM/IDA-style Intel syntax the brackets only make the memory access explicit: this still compares the byte at [eax] with the byte stored at mychar. x86 has no double-indirect memory operand, so comparing with the byte that mychar points to would require loading that pointer into a register first.
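A C sketch of what movzx does to EAX, with its signed counterpart movsx shown for contrast (movsx does not appear in the example). The function names are hypothetical. The comparison only looks at AL, the low byte of EAX.

```c
#include <assert.h>
#include <stdint.h>

/* movzx eax, byte ptr [p]: load one byte; the upper 24 bits become 0. */
uint32_t movzx8(const uint8_t *p) { return (uint32_t)*p; }

/* movsx eax, byte ptr [p]: load one byte and copy its sign bit (bit 7)
 * into the upper 24 bits. Written with masks to avoid relying on
 * implementation-defined signed conversions. */
uint32_t movsx8(const uint8_t *p) {
    uint32_t v = *p;
    return (v & 0x80u) ? (v | 0xFFFFFF00u) : v;
}

/* cmp al, mychar: only AL (the low byte of EAX) takes part; ZF = 1 iff equal. */
int al_equals(uint32_t eax, uint8_t mychar) { return (uint8_t)eax == mychar; }
```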

Some Things to Keep in Mind
- Endianness
  - x86 is little endian (least significant byte at the lowest memory address): the DWORD 0x42 is stored as 0x42 00 00 00
  - IP (network) data and some other formats use big endian (least significant byte at the highest memory address): 0x42 is stored as 0x00 00 00 42
- Some compiler optimizations: loop unrolling, redundancy elimination, instruction reordering
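The two byte orders can be checked in C by serializing a DWORD explicitly, which behaves the same on any host. The function names are hypothetical; host_is_little_endian inspects the machine the code runs on and returns 1 on x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* x86 (little-endian) order: least significant byte first. */
void store_le32(uint32_t v, uint8_t out[4]) {
    for (int i = 0; i < 4; i++) out[i] = (uint8_t)(v >> (8 * i));
}

/* Network (IP, big-endian) order: most significant byte first. */
void store_be32(uint32_t v, uint8_t out[4]) {
    for (int i = 0; i < 4; i++) out[i] = (uint8_t)(v >> (8 * (3 - i)));
}

/* Look at how the host itself lays out 0x42 in memory. */
int host_is_little_endian(void) {
    uint32_t v = 0x42;
    uint8_t bytes[4];
    memcpy(bytes, &v, sizeof v);
    return bytes[0] == 0x42;
}
```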

Keep in Mind
What if you encounter an unfamiliar instruction?
- Intel 64 and IA-32 Architectures Software Developer's Manuals: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
  - Volume I: Basic Architecture
  - Volume II: Instruction Set Reference A-M, N-Z
  - Volume III: System Programming Guide
- The x86 assembly guide: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html#memory

Software Execution Environments: Bytecodes
- Bytecode execution: high-level code is compiled to bytecode, which is then compiled or interpreted by a virtual machine
- Native execution: high-level code is compiled to machine code/assembly, which the CPU executes directly

Software Execution Environments: Bytecodes
- Platform isolation: runs on any OS where the VM can execute, avoiding compatibility issues and facilitating baseline software distribution
- Enhanced functionality: monitors not available in hardware, resource management, type safety

Software Execution Environments: Bytecodes, Drawbacks
- Performance! Alleviated by just-in-time (JIT) compilation
- Easier to reverse, because of the metadata used by the interpreter/VM/runtime; obfuscation can be used to make reversing more difficult

Exercise