Reverse Engineering Low Level Software
CS5375 Software Reverse Engineering
Dr. Jaime C. Acosta

Machine code
Machine code
- Assembly is compiled into machine code.
- Machine code is disassembled back into assembly.
- [Figure: the two conversions are annotated "Directly mappable" and "Not directly mappable".]

Computer Architecture
- CPU
  - Control unit: handles control logic
  - ALU: handles arithmetic
  - Registers: short-term storage with fast access
- Main memory (RAM) and disk I/O: external, longer-term storage with higher latency than registers

Our Focus
- CPU: control unit, registers, ALU
- Main memory, from lower to higher addresses: text, data, heap, stack
  - Text: contains program instructions

Low-level Instruction Sets
- Instruction set architecture (ISA): the set of low-level instructions defined by the architecture vendor
- Instructions map directly to machine code, and from there to digital logic in hardware
  - e.g., a mov of an immediate into ECX assembles to bytes beginning 0xB9 16; the opcode byte 0xB9 is 1011 1001 in binary
- A limited set of registers corresponds to hardware components

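The mapping from an instruction to bytes can be sketched in a few lines of Python. This is a minimal illustration, assuming the 32-bit form of mov into ECX (opcode byte 0xB9 followed by a 4-byte little-endian immediate); the function name encode_mov_ecx_imm32 and the immediate 0x16 are chosen here for illustration and do not come from the slides.

```python
import struct


def encode_mov_ecx_imm32(value: int) -> bytes:
    """Encode `mov ecx, imm32`: opcode byte 0xB9, then the immediate, little-endian."""
    return bytes([0xB9]) + struct.pack("<I", value & 0xFFFFFFFF)


# 0x16 is an assumed immediate, used only to have something to encode.
encoded = encode_mov_ecx_imm32(0x16)
print(encoded.hex(" "))           # b9 16 00 00 00
print(format(encoded[0], "08b"))  # 10111001
```

A disassembler performs the reverse lookup: it reads 0xB9, recognizes a mov into ECX, and consumes the next four bytes as the immediate.
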
Low-Level Perspectives
High-level view (C code): a function takes parameters x and y, stores their product in a local variable z, and returns z.
Low-level steps:
1. Store the current state before executing the function code
2. Allocate memory for z
3. Load parameters x and y from memory into registers
4. Multiply x and y, storing the result in a register
5. Copy the result into the memory allocated for z
6. Restore the state saved in step 1
7. Return to the caller, passing back z as the return value

Low-Level Data Management: Registers
- Small memories that reside within the processor
- Little or no performance penalty to access
- Very few of them (eight 32-bit general-purpose registers in IA-32)
- Used in conjunction with external memory
- Moving data between registers and memory is managed explicitly in assembly code

Low-Level Perspectives
Low-level pseudocode:
1. Store the current state before executing the function code
2. Allocate memory for z
3. Load parameters x and y from memory into registers
4. Multiply x and y, storing the result in a register
5. Copy the result into the memory allocated for z
6. Restore the state saved in step 1
7. Return to the caller, passing back z as the return value
Note: the multiply may also take a value directly from data memory instead of loading it into a register first.

Low-Level Data Management: Stack
- Non-register memory used for short-term secondary storage
- LIFO (last in, first out)
- Uses of the stack:
  - Temporarily saved register values
  - Local variables
  - Function parameters and return addresses

Low-Level Data Management: Stack (push and pop)
- Each stack slot is 32 bits (a DWORD); ESP and EBP mark the top and the base of the stack.
- Initial state: the top of the stack holds a previously stored value; the slots at lower memory addresses hold unknown (unused) data.
- Push: Value 1, Value 2, and Value 3 are pushed in that order. The push direction is toward lower memory addresses.
- Pop: popping into EAX, EBX, and ECX loads Value 3, Value 2, and Value 1 respectively (last in, first out).

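The push and pop behavior can be modeled with a toy stack in Python. This is only a sketch: the class name Stack32 and the starting address 0x1000 are hypothetical, and memory is represented as a dictionary rather than real RAM.

```python
class Stack32:
    """Toy model of an IA-32 stack: grows toward lower addresses, 4 bytes per slot."""

    def __init__(self, top_address: int = 0x1000):  # 0x1000 is an arbitrary start
        self.esp = top_address
        self.memory = {}  # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= 4                       # ESP moves to a lower address first
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory[self.esp]       # read the slot ESP points to
        self.esp += 4                       # then ESP moves back up
        return value


stack = Stack32()
for value in (1, 2, 3):   # push Value 1, Value 2, Value 3
    stack.push(value)

eax = stack.pop()
ebx = stack.pop()
ecx = stack.pop()
print(eax, ebx, ecx)      # 3 2 1
```

Note that pop does not erase the slot; the old values stay in memory below ESP until something overwrites them.
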
Low-Level Data Management: Heap and Data Section
Heap
- Variable-sized memory allocation and de-allocation
- The program requests a block and gets back a pointer/reference to it (new, malloc, calloc, ...)
- Used for objects that are too big for the stack
Data section
- Global variables, e.g., char szwelcome[] = "Hello.";
- Long-term storage

IA-32 Assembly Language
- Intel Architecture, 32-bit (AKA i386)
- Used by most Intel-compatible CPUs (AMD, VIA, and other x86 processors)
- Two notations (semantically equivalent):
  - AT&T notation, used by GNU tools (Unix)
  - Intel notation (Windows); this is the notation used in this class

Some IA-32 Registers
- 8 general registers: usable for any purpose, though some conventions are good practice
- 6 segment registers: point to areas in memory, for efficiency
- 1 FLAGS register: maintains some state; set according to the results of instruction execution
- 1 instruction pointer: contains the memory address of the next instruction to be executed

IA-32 General Registers: Common Usage
- EAX, EBX, ECX, EDX: general purpose
  - EAX usually holds function return values
  - ECX usually holds an iterator (loop counter)
- ESP: points to the top of the stack
- ESI, EDI: indices for efficient memory copies
- EBP: points to the base of the stack (the current stack frame)

Flags Register
- Special register (not directly modifiable)
- Contains flags that hold status and other information, recording the current logical state
- Updated by logical/integer instructions to record their outcomes; later instructions may depend on these outcomes
- e.g., bit 0 is CF (carry flag): set when an unsigned result is out of range
- e.g., bit 6 is ZF (zero flag): set when the result of an operation is 0

Instruction Pointer Register
- Labeled EIP
- Contains the address of the next instruction to execute, i.e., it tells the processor what to do next

Instruction Format
I. Instruction name (opcode)   II. Destination operand   III. Source operand
Example, with register contents after each instruction:
  MOV eax, 2      ; EAX = 2
  ADD eax, 1      ; EAX = 3
  MOV ebx, eax    ; EAX = 3, EBX = 3
Note: mov is really a copy; the source keeps its value.

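The three-instruction example can be traced with a tiny interpreter. This is a rough sketch rather than a real emulator: it supports only mov and add, and the tuple layout (opcode, destination, source) and the function name run are assumptions made for illustration.

```python
def run(program):
    """Execute (opcode, destination, source) tuples on a toy register file."""
    regs = {"eax": 0, "ebx": 0}

    def read(operand):
        # A string names a register; anything else is an immediate value.
        return regs[operand] if isinstance(operand, str) else operand

    for opcode, dest, src in program:
        if opcode == "mov":
            regs[dest] = read(src) & 0xFFFFFFFF          # copy source into destination
        elif opcode == "add":
            regs[dest] = (regs[dest] + read(src)) & 0xFFFFFFFF
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return regs


final = run([("mov", "eax", 2), ("add", "eax", 1), ("mov", "ebx", "eax")])
print(final)  # {'eax': 3, 'ebx': 3}
```

After the last mov, both registers hold 3, which is why mov is better read as "copy" than "move".
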
Instruction Format
- An instruction usually consists of an opcode (operation code) and one or two operands
  - Think of a function name and its parameters: MOV a, b is like move(a, b)
- Operands come in three forms:
  - Register name
  - Immediate (constant value)
  - Memory address

Operands
- Register, e.g., EAX: access the EAX register for reading/writing
- Immediate, e.g., 6, 0x4000349e, <label>*: a constant value
- Memory address, e.g., [0x4000349e], [EAX], <label>*: a memory address
* With some exceptions, control flow instructions (jmp, call, etc.) treat labels as immediates, while non-control flow instructions treat them as memory addresses (more on this later).

Common Arithmetic Operations
1. ADD A, B: A = A + B
2. SUB A, B: A = A - B
3. MUL A: EDX:EAX = EAX * A (unsigned)
4. DIV A: EAX = EDX:EAX / A, EDX = EDX:EAX % A (unsigned)
5. IMUL A: same as 3, except signed
6. IDIV A: same as 4, except signed
Note: ADD and SUB produce the same bits for signed and unsigned operands; CF reports unsigned overflow and OF reports signed overflow.
Note: some opcodes have more than one signature (e.g., IMUL also has two- and three-operand forms).

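The EDX:EAX register pair used by MUL and DIV can be illustrated with plain integer arithmetic. The helper names mul32 and div32 below are hypothetical, and the sketch models only the unsigned 32-bit forms.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax: int, a: int):
    """Unsigned MUL: return (edx, eax), the high and low halves of the 64-bit product."""
    product = (eax & MASK32) * (a & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def div32(edx: int, eax: int, a: int):
    """Unsigned DIV: divide the 64-bit value EDX:EAX by a, return (eax, edx) = (quotient, remainder)."""
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, a & MASK32)  # raises ZeroDivisionError if a == 0
    if quotient > MASK32:
        # The real instruction raises a divide error when the quotient does not fit in EAX.
        raise OverflowError("quotient does not fit in 32 bits")
    return quotient, remainder


print(mul32(0xFFFFFFFF, 2))  # (1, 4294967294): EDX = 1, EAX = 0xFFFFFFFE
print(div32(0, 7, 2))        # (3, 1): quotient in EAX, remainder in EDX
```

The first call shows why the result needs two registers: the product 0x1FFFFFFFE no longer fits in 32 bits, so the overflow lands in EDX.
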
Common Conditional Instructions
1. CMP A, B: computes A - B, discards the result, and sets the flags (unsigned comparison shown):
   - A < B: CF=1, ZF=0
   - A = B: CF=0, ZF=1
   - A > B: CF=0, ZF=0
2. TEST A, B: computes A AND B, discards the result, and sets the flags:
   - (A AND B) == 0: ZF=1, CF=0. This always holds if A == 0 or B == 0, but also whenever A and B share no set bits.
   - Otherwise: ZF=0, CF=0

Function Call Instructions
1. CALL ADDR
   1. Push the address of the instruction after the CALL onto the stack (adjusting the stack pointer, ESP)
   2. Place ADDR into EIP
2. LEAVE
   1. Set the top of the stack to the previous top (MOV ESP, EBP)
   2. Set EBP to the old EBP (POP EBP)
3. RET/RETN
   1. Pop the return address from the stack and place it into EIP (adjusting ESP)

Function Calls
Caller:
  PUSH EAX
  CALL FuncA
  ADD ESP, 4
Callee:
  FuncA:
    <do something>
    RET
Steps:
1. Push parameters
2. Push current state
3. Process FuncA
4. Pop previous state and parameters
5. Adjust stack
6. Continue processing
[Figure: stack holding current state data, the value in EAX, and a previously stored value, with ESP and EBP markers.]

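The effect of the caller's three instructions on ESP and EIP can be followed in a small simulation. This is a sketch under assumed values: the function name simulate_call, the starting ESP 0x1000, and the return address 0x401005 are all hypothetical.

```python
def simulate_call(esp: int, eax: int, return_address: int):
    """Trace PUSH EAX / CALL FuncA / RET / ADD ESP, 4 on a toy stack."""
    memory = {}

    esp -= 4
    memory[esp] = eax               # PUSH EAX: the parameter goes on the stack

    esp -= 4
    memory[esp] = return_address    # CALL: push the address of the next instruction
    esp_inside = esp                # ESP as FuncA sees it on entry

    eip = memory[esp]               # RET: pop the return address into EIP
    esp += 4

    esp += 4                        # ADD ESP, 4: the caller discards the parameter
    return eip, esp, esp_inside


eip, esp, esp_inside = simulate_call(esp=0x1000, eax=42, return_address=0x401005)
print(hex(eip), hex(esp), hex(esp_inside))  # 0x401005 0x1000 0xff8
```

ESP ends where it started, which is the point of the ADD ESP, 4 after the call: the caller is responsible for removing the 4-byte parameter it pushed.
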
Common Jumping Instructions
Flags are set based on the results of previous instructions. Conditional jumps use the flags to determine control flow.
1. jz/je target: jump if zero (zero flag is set)
2. jnz/jne target: jump if not zero (zero flag not set)
3. ja target: jump if above (zero flag not set and carry flag not set) (unsigned)
4. jb target: jump if below (carry flag is set) (unsigned)
5. jg target: jump if greater (signed)
6. jl target: jump if less (signed)
7. jge target: jump if greater or equal (signed)
8. jmp target: just jump

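The difference between the unsigned jumps (ja, jb) and the signed jumps (jg, jl, jge) comes down to which flags they read. The sketch below models the flags after a 32-bit CMP and evaluates each condition; the names flags_after_cmp and JUMP_CONDITIONS are hypothetical, and the signed conditions use the sign flag (SF) and overflow flag (OF), which the list above does not spell out.

```python
MASK32 = 0xFFFFFFFF


def flags_after_cmp(a: int, b: int) -> dict:
    """Compute CF, ZF, SF, OF as a 32-bit CMP a, b (a - b) would set them."""
    a, b = a & MASK32, b & MASK32
    result = (a - b) & MASK32
    return {
        "CF": int(a < b),                              # unsigned borrow
        "ZF": int(result == 0),
        "SF": result >> 31,                            # sign bit of the result
        "OF": (((a ^ b) & (a ^ result)) >> 31) & 1,    # signed overflow
    }


JUMP_CONDITIONS = {
    "jz":  lambda f: f["ZF"] == 1,
    "jnz": lambda f: f["ZF"] == 0,
    "ja":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "jb":  lambda f: f["CF"] == 1,
    "jg":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "jl":  lambda f: f["SF"] != f["OF"],
    "jge": lambda f: f["SF"] == f["OF"],
    "jmp": lambda f: True,
}

# 0xFFFFFFFF is 4294967295 unsigned but -1 signed.
flags = flags_after_cmp(0xFFFFFFFF, 1)
taken = sorted(name for name, cond in JUMP_CONDITIONS.items() if cond(flags))
print(taken)  # ['ja', 'jl', 'jmp', 'jnz']
```

The same comparison is "above" for unsigned code and "less" for signed code, so the choice of jump in a disassembly hints at the signedness of the original variables.
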
Other Common Instructions
1. SHR A, B: shift A right by B bits (each bit shifted divides by 2); result stored in A
2. SHL A, B: shift A left by B bits (each bit shifted multiplies by 2); result stored in A
3. ROR A, B: rotate A right by B bits (e.g., 1001 -> 1100 for one bit); result stored in A
4. ROL A, B: rotate A left by B bits (e.g., 1100 -> 1001 for one bit); result stored in A
5. XOR A, B: bitwise exclusive OR; result stored in A
   A  B  Result
   0  0  0
   0  1  1
   1  0  1
   1  1  0

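Shifts and rotates are easy to confuse, so here is a small Python sketch of both. The helper names and the width parameter are assumptions for illustration; a 4-bit width reproduces the rotate examples above, and the real instructions additionally mask the shift count, which is not modeled here.

```python
def shr(a: int, n: int, width: int = 32) -> int:
    """Logical shift right: bits shifted out are lost, zeros shift in."""
    return (a & ((1 << width) - 1)) >> n


def shl(a: int, n: int, width: int = 32) -> int:
    """Shift left: bits shifted past the top are lost."""
    return (a << n) & ((1 << width) - 1)


def ror(a: int, n: int, width: int = 32) -> int:
    """Rotate right: bits shifted out on the right re-enter on the left."""
    mask = (1 << width) - 1
    a, n = a & mask, n % width
    return ((a >> n) | (a << (width - n))) & mask


def rol(a: int, n: int, width: int = 32) -> int:
    """Rotate left: bits shifted out on the left re-enter on the right."""
    mask = (1 << width) - 1
    a, n = a & mask, n % width
    return ((a << n) | (a >> (width - n))) & mask


print(format(ror(0b1001, 1, width=4), "04b"))  # 1100
print(format(rol(0b1100, 1, width=4), "04b"))  # 1001
print(shr(8, 1), shl(8, 1))                    # 4 16
print(0b1100 ^ 0b1010)                         # 6 (binary 0110)
```

A related idiom worth recognizing is xor eax, eax: any value XORed with itself is 0, so compilers use it to clear a register.
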
Example 1
  cmp ebx, 0xf020
  jnz 0x10026509
- If EBX == 0xf020: the values are equal, ZF is set, so don't jump
- If EBX == 0x0000: the values differ, ZF is clear, so jump

Example 2
  mov edi, [ecx+0xb0]
  nop
  mov ebx, [ecx+0xb8]
  imul edi, ebx
- nop: no operation, does nothing
- The loads from [ecx+0xb0] and [ecx+0xb8] are probably accessing fields of some data structure that ECX points to

Example 3
  push eax
  push ebx
  push ecx
  push esi
  call 0x10026eeb
- Pushing parameters onto the stack and then calling a function.

Example 4a: Register Operands
  mov eax, ebx
- Copies the value of EBX into EAX; with EBX = 0x00B30040, EAX becomes 0x00B30040

Example 4b: Indirect Addressing
  mov eax, [ebx+8]
- Reads the value stored at address EBX+8 (0x00B30048); with 0x00000020 stored there, EAX becomes 0x00000020

Example 4c: Load Effective Address
  lea eax, [ebx+8]
- Computes the address EBX+8 without reading memory; EAX becomes 0x00B30048

Example 4d: Offset and Code Labels
  push offset loc_b30048
- Pushes the address of the label loc_b30048 (0x00B30048) onto the stack, on top of the previously stored value

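Examples 4a through 4c differ only in whether memory is read. A minimal Python model, using the register and memory values from the examples and a dictionary standing in for memory (an assumption of this sketch), makes the contrast explicit.

```python
# Values taken from Examples 4a-4c; the dict is a stand-in for real memory.
memory = {0x00B30048: 0x00000020}
ebx = 0x00B30040

eax_register_operand = ebx         # mov eax, ebx      -> copy the register value
eax_indirect = memory[ebx + 8]     # mov eax, [ebx+8]  -> read memory at EBX+8
eax_lea = ebx + 8                  # lea eax, [ebx+8]  -> compute the address only

print(hex(eax_register_operand))   # 0xb30040
print(hex(eax_indirect))           # 0x20
print(hex(eax_lea))                # 0xb30048
```

Because lea only does arithmetic, compilers also use it for plain integer math that has nothing to do with addresses.
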
Label Usage Examples
Control flow
- jmp <label>: jump to the memory address <label> (the label is treated as an immediate operand)
Non-control flow
- mov EAX, <label>: store in EAX the value contained at memory address <label> (the label is treated as a memory operand)
- mov EAX, offset <label>: store in EAX the memory address <label> itself (the label is treated as an immediate operand)

Example 5
  mov ecx, esi
  mov eax, [edx+ecx*4]
  push eax
  add ecx, 1
  mov eax, [edx+ecx*4]
  push eax
  call 0x10026eeb
- [edx+ecx*4] indexes an array of 4-byte elements starting at EDX; the code pushes the elements at index ESI and ESI+1 as parameters and then calls a function

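The scaled-index addressing in Example 5 can be reproduced with a packed byte buffer. In this sketch the array contents, the base offset in edx, and the starting index in esi are all made-up values, and struct.unpack_from stands in for a 32-bit memory read.

```python
import struct

# Hypothetical array of four 32-bit little-endian integers, as it would sit in memory.
memory = struct.pack("<4I", 10, 20, 30, 40)


def read_dword(address: int) -> int:
    """Read 4 bytes at the given offset as a little-endian unsigned integer."""
    return struct.unpack_from("<I", memory, address)[0]


edx = 0   # assumed base of the array (offset into `memory`)
esi = 1   # assumed starting index

pushed = []
ecx = esi                                   # mov ecx, esi
pushed.append(read_dword(edx + ecx * 4))    # mov eax, [edx+ecx*4] / push eax
ecx += 1                                    # add ecx, 1
pushed.append(read_dword(edx + ecx * 4))    # mov eax, [edx+ecx*4] / push eax

print(pushed)  # [20, 30]
```

The scale factor is the element size, so a factor of 4 in a disassembly is a strong hint of an array of 32-bit integers or pointers.
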
Size directives
Example 6
  movzx eax, byte ptr [eax]
  cmp al, mychar
- byte ptr is a size directive: one byte is read from [eax], and movzx zero-extends it to 32 bits
- Compares a single byte at [eax] with a byte at mychar

  movzx eax, byte ptr [eax]
  cmp al, [mychar]
- Compares a single byte at [eax] with a byte at the address inside of mychar

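Some Things to Keep in Mind
Endianness
- x86 is little endian (least significant byte at the lowest memory address): the 32-bit value 0x00000042 is stored as the bytes 42 00 00 00
- IP data and others use big endian (least significant byte at the highest memory address): the same value is stored as the bytes 00 00 00 42
Some compiler optimizations
- Loop unrolling
- Redundancy elimination
- Instruction reordering
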
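Byte order is easy to verify with Python's standard struct module, where "<" selects little endian and ">" selects big endian. The variable names here are illustrative only.

```python
import struct

value = 0x00000042

little = struct.pack("<I", value)  # x86 layout: least significant byte first
big = struct.pack(">I", value)     # network byte order: most significant byte first

print(little.hex(" "))  # 42 00 00 00
print(big.hex(" "))     # 00 00 00 42

# Reading the same four bytes with the wrong byte order gives a very different number.
print(hex(int.from_bytes(little, "big")))  # 0x42000000
```

When reading a hex dump of an x86 process, multi-byte values therefore appear reversed: the address 0x00B30048 shows up as 48 00 b3 00.
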
Keep in Mind
What if you encounter an unfamiliar instruction?
- Intel software developer manuals: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
  - Volume I: Basic Architecture
  - Volume II: Instruction Set Reference A-M, N-Z
  - Volume III: System Programming Guide
- The x86 assembly guide: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html#memory

Software Execution Environments - Bytecodes
- Bytecode execution: high-level code -> compile -> bytecode -> compile/interpret -> machine code/assembly -> CPU execution
- Native execution: high-level code -> compile -> machine code/assembly -> CPU execution

Software Execution Environments - Bytecodes
Platform isolation
- Runs on any OS where the VM can execute
- Avoids compatibility issues
- Facilitates baseline software distribution
Enhanced functionality
- Monitors not available on hardware
- Resource management
- Type safety

Software Execution Environments - Bytecodes
Drawbacks
- Performance! Alleviation: just-in-time compilation
- Easier to reverse because of the metadata used by the interpreter/VM/runtime
  - Obfuscation can be used to make reversing more difficult

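Python itself is a bytecode environment, so it offers a quick way to see how much metadata bytecode carries. The sketch below uses the standard dis module on a hypothetical multiply function; the exact opcode names printed depend on the Python version, so no instruction listing is shown here.

```python
import dis


def multiply(x, y):
    z = x * y
    return z


code = multiply.__code__

# Names survive compilation: the function name and every local variable name.
print(code.co_name)      # multiply
print(code.co_varnames)  # ('x', 'y', 'z')

# Each bytecode instruction, with the variable names it refers to.
for instruction in dis.get_instructions(multiply):
    print(instruction.opname, instruction.argrepr)
```

A native binary compiled without debug symbols keeps none of these names, which is one reason bytecode is usually easier to reverse than machine code.
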
Exercise