Reverse Engineering II: The Basics

This document is only to be distributed to teachers and students of the Malware Analysis and Antivirus Technologies course and should only be used in accordance with the course guidelines.

Protecting the irreplaceable | f-secure.com
Agenda
- Very basics
- Intel x86 crash course
- Basics of C reversing
Binary Numbers
Nibble (4 bits): 1011 = 0xB
Byte (8 bits): 1011 1101 = 0xBD
Word (16 bits): 1011 1101 0011 1001 = 0xBD39
Byte Order a.k.a. Endianness
Bytes at offsets 00 01 = 12 34: 0x3412 (little endian), 0x1234 (big endian)
Bytes at offsets 00 01 = 34 12: 0x1234 (little endian), 0x3412 (big endian)
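To make the rule concrete, here is a minimal C sketch that reads the same two bytes both ways. The function names `read_u16_le` and `read_u16_be` are illustrative. Building the value from shifts, rather than casting a pointer, gives the same answer on any host CPU.

```c
#include <stdint.h>

/* Little endian: the byte at the lowest address (offset 00) is the
   least significant byte of the value. */
uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Big endian: the byte at the lowest address is the most significant. */
uint16_t read_u16_be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For the bytes 12 34, `read_u16_le` yields 0x3412 and `read_u16_be` yields 0x1234, matching the first row of the slide.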
Little Endian Dword
Bytes at offsets 00 01 02 03 = 12 34 56 78: 0x78563412
Bytes at offsets 00 01 02 03 = 78 56 34 12: 0x12345678
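The same idea extended to a dword, as a small sketch with a hypothetical helper name: the byte at offset 00 is the least significant, so reading a little-endian dump by eye means reversing the four bytes.

```c
#include <stdint.h>

/* Assemble a 32-bit little-endian value from four bytes in memory order.
   Each byte is widened to uint32_t before shifting so that the shift by
   24 cannot overflow a signed int. */
uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```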
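Endianness Matters
Byte order has to be agreed on whenever data leaves one machine:
- Data exchange between computers
- Networking protocols (network byte order is big endian)
- File formats for disk storage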
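When writing data for a protocol or file format, one portable approach is to build the bytes with shifts, so the output does not depend on the host CPU. POSIX `htonl()` performs the equivalent host-to-network conversion for 32-bit values. The helper names below are illustrative, not part of any particular protocol.

```c
#include <stdint.h>

/* Store v in big-endian ("network") order: most significant byte first. */
void write_u32_be(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Store v in little-endian order: least significant byte first, which is
   also how an x86 CPU lays out a dword in memory. */
void write_u32_le(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```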
System Endianness
Little endian: Intel x86, Intel 8051, most microcontrollers
Big endian: PowerPC (except G5), SPARC (except v9), System/370
Switchable: ARM, Alpha, Intel IA-64
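A program can also check the endianness of the CPU it runs on. This sketch (hypothetical function name) stores a known dword and inspects the byte at its lowest address.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this CPU stores multi-byte integers least significant
   byte first (as Intel x86 does), 0 otherwise. */
int host_is_little_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char lowest;
    memcpy(&lowest, &probe, 1);  /* copy the byte stored at the lowest address */
    return lowest == 0x04;       /* 0x04 is the least significant byte */
}
```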
ASCII Code
0x00-0x1F: Control characters (e.g. backspace, line feed)
0x20-0x3F: Digits and punctuation (space, 0-9 < > = . , : * - ( ) !)
0x40-0x5F: Upper-case letters and special (A B C D ... @ [ ] \ ^ _)
0x60-0x7E: Lower-case letters and special (a b c d ... ` { } ~)
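When scanning a binary for text, these ranges are what a reverser checks byte by byte. The sketch below classifies a byte into the four ranges listed above; the enum and function names are illustrative. It also shows a consequence of the layout: upper- and lower-case letters sit exactly 0x20 apart, so clearing bit 5 upper-cases a letter, which may show up in disassembly as an AND with 0DFh.

```c
/* The four ASCII ranges, plus everything outside 0x00-0x7E. */
enum ascii_class {
    ASCII_CONTROL,      /* 0x00-0x1F */
    ASCII_DIGIT_PUNCT,  /* 0x20-0x3F: space, digits, punctuation */
    ASCII_UPPER,        /* 0x40-0x5F: @, A-Z, [ \ ] ^ _ */
    ASCII_LOWER,        /* 0x60-0x7E: `, a-z, { | } ~ */
    ASCII_OTHER         /* 0x7F (DEL) and 0x80-0xFF (not 7-bit ASCII) */
};

enum ascii_class ascii_class_of(unsigned char c)
{
    if (c <= 0x1F) return ASCII_CONTROL;
    if (c <= 0x3F) return ASCII_DIGIT_PUNCT;
    if (c <= 0x5F) return ASCII_UPPER;
    if (c <= 0x7E) return ASCII_LOWER;
    return ASCII_OTHER;
}

/* 'a' (0x61) and 'A' (0x41) differ only in bit 5 (0x20); clearing that
   bit upper-cases a letter. Only applied to a-z here. */
unsigned char ascii_to_upper(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (unsigned char)(c & ~0x20);
    return c;
}
```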
ASCII Example
Text: "Hello 1234"
Bytes: 48 65 6C 6C 6F 20 31 32 33 34
http://en.wikipedia.org/wiki/ascii
Unicode Strings
Text: "Hello" in UTF-16 / UCS-2, little endian, preceded by a byte order mark (BOM)
Bytes: ff fe (BOM) 48 00 65 00 6c 00 6c 00 6f 00
http://en.wikipedia.org/wiki/utf-16/ucs-2
http://en.wikipedia.org/wiki/category:unicode
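A minimal sketch of producing the byte sequence above: after the FF FE byte order mark, each ASCII character becomes its code followed by a zero byte. It assumes 7-bit ASCII input only and is not a general UTF-8 to UTF-16 converter; the function name is hypothetical. The interleaved zero bytes are why a plain ASCII string search misses UTF-16 text in a binary.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode an ASCII C string as UTF-16LE with a leading BOM.
   Stops early if out_size is too small. Returns the bytes written. */
size_t ascii_to_utf16le_bom(const char *s, uint8_t *out, size_t out_size)
{
    size_t n = 0;
    if (out_size < 2)
        return 0;
    out[n++] = 0xFF;                 /* BOM, little-endian order */
    out[n++] = 0xFE;
    for (; *s != '\0' && n + 2 <= out_size; s++) {
        out[n++] = (uint8_t)*s;      /* low byte: the ASCII code */
        out[n++] = 0x00;             /* high byte: zero for ASCII input */
    }
    return n;
}
```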
String Storage
- ASCIIZ: zero-terminated ASCII
- Pascal: size byte + ASCII string
- Delphi: size dword + ASCII or Unicode string
"Hello" as ASCIIZ: 48 65 6C 6C 6F 00
"Hello" as Pascal: 05 48 65 6C 6C 6F
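A sketch of reading the two byte layouts shown above (helper names are illustrative): ASCIIZ needs a scan for the 00 terminator, while a Pascal string states its length up front, which also limits it to 255 characters. The Delphi layout is left out, since the slide does not show its exact bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASCIIZ: count bytes until the 00 terminator, never reading past max. */
size_t asciiz_length(const uint8_t *p, size_t max)
{
    size_t n = 0;
    while (n < max && p[n] != 0x00)
        n++;
    return n;
}

/* Pascal: the first byte is the length; no terminator follows the text.
   Copies the text into dst as a C string (dst needs at least 256 bytes).
   Returns the length. */
size_t pascal_to_c(const uint8_t *p, char *dst)
{
    size_t n = p[0];
    memcpy(dst, p + 1, n);
    dst[n] = '\0';
    return n;
}
```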
Intel x86 Architecture
[Image: Intel x86 architecture, copyright 2004 GNU]
Introduction to Intel x86
- Started with the 8086 in 1978
- Continued with the 8088, 80186, 80286, 386, 486, Pentium, 686 (P6)...
- CISC architecture
- 32-bit is called x86-32 or IA-32
- 64-bit is called x86-64, AMD64 or EM64T (Intel 64)
- The 80386, introduced in 1985:
  - is a 32-bit processor (in Intel terminology a 32-bit value is a doubleword, or dword)
  - has eight general-purpose registers
  - supports paging and virtual memory
  - addresses up to 4 GiB of memory
Data Register Layout
[Image: layout of the data registers, copyright 1997-2008 Intel Corporation]
Data Registers
AL / AH / AX / EAX: Accumulator; arithmetic operations
BL / BH / BX / EBX: Base register; general data storage, index
CL / CH / CX / ECX: Counter; loop constructs
DL / DH / DX / EDX: Data register; arithmetic (e.g. the high half of MUL and DIV operands)
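Because the 8- and 16-bit names are views into the 32-bit register, their values can be derived with masks and shifts. This C sketch (function names are illustrative) uses EAX; the same holds for EBX, ECX and EDX.

```c
#include <stdint.h>

/* AL = bits 0-7, AH = bits 8-15, AX = bits 0-15 of EAX. */
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFF); }
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFF); }
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFF); }

/* Emulate "MOV AH, value": only bits 8-15 of EAX change. */
uint32_t set_ah(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)value << 8);
}
```

With EAX = 0x12345678, AL is 0x78, AH is 0x56 and AX is 0x5678; writing 0xAB to AH gives EAX = 0x1234AB78.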
Address Registers
IP / EIP: Instruction Pointer; program execution
SP / ESP: Stack Pointer; stack operations
BP / EBP: Base Pointer; stack frame
SI / ESI: Source Index; string operations
DI / EDI: Destination Index; string operations
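Segment Registers
CS: Code Segment; program code
DS: Data Segment; program data
SS: Stack Segment; the stack
ES / FS / GS: other segments; other uses (e.g. on 32-bit Windows, FS points to the Thread Environment Block)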
EFLAGS Register
[Image: EFLAGS register bit layout, copyright 1997-2008 Intel Corporation]
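Conditional jumps read their decision from EFLAGS. A minimal C sketch, assuming the standard bit positions from Intel's manual (CF bit 0, PF 2, AF 4, ZF 6, SF 7, TF 8, IF 9, DF 10, OF 11), models how CMP sets ZF, SF and CF. `cmp_flags` and the `FLAG_*` names are illustrative; OF, PF and AF are deliberately not computed.

```c
#include <stdint.h>

/* Commonly used EFLAGS bits. */
enum {
    FLAG_CF = 1u << 0,   /* Carry (unsigned borrow/overflow) */
    FLAG_PF = 1u << 2,   /* Parity */
    FLAG_AF = 1u << 4,   /* Auxiliary carry */
    FLAG_ZF = 1u << 6,   /* Zero */
    FLAG_SF = 1u << 7,   /* Sign */
    FLAG_TF = 1u << 8,   /* Trap (single-step) */
    FLAG_IF = 1u << 9,   /* Interrupt enable */
    FLAG_DF = 1u << 10,  /* Direction for string instructions */
    FLAG_OF = 1u << 11   /* Signed overflow */
};

/* ZF, SF and CF as "CMP a, b" sets them (CMP computes a - b and
   discards the result). */
uint32_t cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    uint32_t f = 0;
    if (r == 0)          f |= FLAG_ZF;  /* equal: JZ/JE would jump */
    if (r & 0x80000000u) f |= FLAG_SF;  /* top bit of the result set */
    if (a < b)           f |= FLAG_CF;  /* unsigned borrow: JB would jump */
    return f;
}
```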
Mnemonic Examples
MOV EAX, 1    Move 1 to EAX
ADD EDX, 5    Add 5 to EDX
SUB EBX, 2    Subtract 2 from EBX
AND ECX, 0    Bitwise AND of ECX with 0 (clears ECX)
XOR EDX, 4    Bitwise exclusive OR of EDX with 4
SHL ECX, 6    Shift ECX left by 6 bits
ROR EBX, 3    Rotate EBX right by 3 bits
INC ECX       Increment ECX
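MOV, ADD, SUB, AND and XOR map directly onto C's `=`, `+=`, `-=`, `&=` and `^=`, but C has no rotate operator. The sketch below (illustrative function names) shows SHL and ROR on 32-bit values, masking the count to 0-31 as the CPU does for 32-bit operands.

```c
#include <stdint.h>

uint32_t shl32(uint32_t x, unsigned n)
{
    return x << (n & 31);
}

/* ROR: bits shifted out on the right re-enter on the left. */
uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31;
    if (n == 0)
        return x;                     /* x << 32 would be undefined in C */
    return (x >> n) | (x << (32 - n));
}
```

A related idiom worth recognizing: compilers commonly emit `XOR EAX, EAX` rather than `MOV EAX, 0` to clear a register.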
More Mnemonics
JNZ label     Jump to label if not zero (ZF = 0; same as JNE, "not equal")
JMP label     Unconditional jump to label
CALL func     Call function
RET           Return from function
LOOP label    Decrement ECX; jump to label if ECX is not zero
PUSH EAX      Push EAX onto the stack
POP EDI       Pop the top of the stack into EDI
LODSB         Load byte from DS:ESI into AL, then advance ESI (or move it back if DF is set)
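LOOP is a do-while in disguise: the body runs first, then ECX is decremented and tested. This sketch (hypothetical function name) counts how many times a LOOP body runs. If ECX starts at 0, the real instruction wraps it to 0xFFFFFFFF and runs the body 2^32 times, so code often guards the loop with JECXZ; the sketch models that guard explicitly.

```c
#include <stdint.h>

/* Number of times a body ending in "LOOP label" executes for a given
   starting ECX, with a JECXZ-style guard for ECX = 0. */
uint32_t count_loop_iterations(uint32_t ecx)
{
    uint32_t iterations = 0;
    if (ecx == 0)
        return 0;               /* guard: skip the loop entirely */
    do {
        iterations++;           /* loop body */
    } while (--ecx != 0);       /* LOOP: ECX--, jump back if ECX != 0 */
    return iterations;
}
```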
Reversing C code
Basic Data Types (typical sizes on 32-bit x86)
- char: 1 byte
- short: 2 bytes
- int: 4 bytes (natural size on the platform)
- long: 4 bytes
- float: 4 bytes, floating point
- double: 8 bytes, floating point
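Sizes are fixed by the compiler's data model rather than by the C language, so it is worth checking them on the target. A trivial sketch (illustrative function name): on 64-bit Linux and macOS `long` is 8 bytes, while 64-bit Windows keeps it at 4.

```c
#include <stdio.h>

/* Print the sizes this compiler actually uses. */
void print_type_sizes(void)
{
    printf("char   %zu\n", sizeof(char));
    printf("short  %zu\n", sizeof(short));
    printf("int    %zu\n", sizeof(int));
    printf("long   %zu\n", sizeof(long));
    printf("float  %zu\n", sizeof(float));
    printf("double %zu\n", sizeof(double));
}
```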
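Pointers and Arrays
- Pointers can point to any memory location
- One-dimensional arrays are flat memory: A[0] A[1] A[2] A[3]
- Multi-dimensional arrays declared as T a[N][M] are also flat, stored row by row; pointers come in with arrays of pointers (e.g. char *argv[])

char a[4];
char *b, c;
c = a[2];      /* direct indexing */
b = a;         /* the array name decays to a pointer to a[0] */
c = *(b+2);    /* the same element through pointer arithmetic */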
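The equivalence between indexing and pointer arithmetic, and the flat row-major layout of a declared two-dimensional array, can be checked with a short sketch. `third_element`, `flat_index`, `ROWS` and `COLS` are illustrative names.

```c
#include <stddef.h>

/* a[i] is defined as *(a + i): indexing is pointer arithmetic scaled by
   the element size, which disassembly shows as [base + index*size]. */
char third_element(const char *a)
{
    return *(a + 2);   /* identical to a[2] */
}

/* A declared 2-D array is one flat block stored row by row: element
   [r][c] of an array with COLS columns is element r*COLS + c of it. */
#define ROWS 3
#define COLS 4

size_t flat_index(size_t r, size_t c)
{
    return r * COLS + c;
}
```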
Composite Types: Structure
- Memory is allocated for all members
- Members are accessible separately

struct {
    unsigned int id;
    unsigned short age;
    char name[16];
} record;
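Alignment
- Compilers place each member at an offset that is a multiple of its natural alignment (up to the word size), inserting padding where needed, and pad the end of the structure to a multiple of its largest alignment
- The #pragma pack(n) directive lowers the maximum alignment to n bytes
- #pragma pack(1) removes padding entirely
- Important when reconstructing structures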
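Structure Storage (4-byte long, as on 32-bit x86)
Aligned:
long id;          offset 0
short age;        offset 4
char name[16];    offset 6
2 bytes of trailing padding
sizeof(record) = 24

Packed:
long id;          offset 0
short age;        offset 4
char name[16];    offset 6
sizeof(record) = 22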
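The layout can be checked with `offsetof` and `sizeof`. This sketch assumes GCC, Clang or MSVC, which all accept `#pragma pack(push, 1)` / `#pragma pack(pop)`, and uses `uint32_t`/`uint16_t` so the member sizes match the 32-bit example on 64-bit hosts too. `name` starts at offset 6 in both versions because a char array needs no alignment; the aligned version's two extra bytes are trailing padding that rounds the size up to a multiple of 4. The struct tags are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Default alignment: 4 + 2 + 16 = 22 bytes of members, padded to 24. */
struct record_aligned {
    uint32_t id;
    uint16_t age;
    char     name[16];
};

/* Packed: no padding at all, 22 bytes. */
#pragma pack(push, 1)
struct record_packed {
    uint32_t id;
    uint16_t age;
    char     name[16];
};
#pragma pack(pop)
```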
Composite Types: Union
- Memory is allocated for the largest member
- Holds only one member at a time

union foo {
    int one;
    char two;
};
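A short sketch using `union foo` from the slide: its size is that of its largest member, and every member starts at offset 0, so writing `one` overwrites the bytes that `two` reads.

```c
#include <stddef.h>

union foo {
    int  one;   /* largest member: determines the union's size */
    char two;   /* shares its first byte with one */
};
```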
Control Structures
- Conditional branch
- Iteration
- Switch-case
- Goto label
Conditional Branch: if

int example_if()
{
    int foo = 0;
    if (foo) {
        do_one_thing();
    } else {
        do_another();
    }
}

var_c = dword ptr -0Ch
        push ebp
        mov  ebp, esp
        sub  esp, 18h
        mov  [ebp+var_c], 0
        cmp  [ebp+var_c], 0
        jz   short loc_1f27
        call _do_one_thing
        jmp  short locret_1f2c
loc_1f27:
        call _do_another
locret_1f2c:
        leave
        retn
Iteration: for

int example_for()
{
    int i;
    for (i = 0; i < 10; i++) {
        if (check_something(i))
            break;
    }
}

        push ebp
        mov  ebp, esp
        sub  esp, 28h
        mov  [ebp+var_c], 0
        jmp  short loc_1f51
loc_1f3d:
        mov  eax, [ebp+var_c]
        mov  [esp], eax
        call _check_something
        test eax, eax
        jnz  short locret_1f57
        lea  eax, [ebp+var_c]
        inc  dword ptr [eax]
loc_1f51:
        cmp  [ebp+var_c], 9
        jle  short loc_1f3d
locret_1f57:
        leave
        retn
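Reading the listing back into C is easier if the control flow is first rewritten with gotos in the same order as the assembly. The sketch below does that for the for loop: initialize, jump to the test at the bottom, loop back while the test holds. `check_something` is a hypothetical stand-in (here it succeeds when i reaches 7), and the function returns i only so the result can be checked. Note that the compiler turned `i < 10` into `cmp [ebp+var_c], 9` / `jle`, that is `i <= 9`.

```c
/* Hypothetical stand-in so the sketch compiles: "succeeds" at i == 7. */
static int check_something(int i)
{
    return i == 7;
}

int example_for_lowered(void)
{
    int i = 0;                  /* mov [ebp+var_c], 0          */
    goto test;                  /* jmp short loc_1f51          */
body:                           /* loc_1f3d:                   */
    if (check_something(i))     /* call _check_something; test */
        goto done;              /* jnz short locret_1f57       */
    i++;                        /* lea eax, ...; inc [eax]     */
test:                           /* loc_1f51:                   */
    if (i <= 9)                 /* cmp [ebp+var_c], 9          */
        goto body;              /* jle short loc_1f3d          */
done:                           /* locret_1f57:                */
    return i;
}
```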
Iteration: while

int example_while()
{
    int i = 0;
    while (i < 100) {
        if (check_something(i))
            break;
    }
}

        push ebp
        mov  ebp, esp
        sub  esp, 28h
        mov  [ebp+var_c], 0
        jmp  short loc_1f77
loc_1f68:
        mov  eax, [ebp+var_c]
        mov  [esp], eax
        call _check_something
        test eax, eax
        jnz  short locret_1f7d
loc_1f77:
        cmp  [ebp+var_c], 64h
        jl   short loc_1f68
locret_1f7d:
        leave
        retn
Branching: Switch-Case

int example_switch()
{
    int i = 1;
    switch (i) {
    case 0:
        do_one_thing();
        break;
    case 1:
        do_another();
        break;
    default:
        check_something(i);
    }
}

        push ebp
        mov  ebp, esp
        sub  esp, 38h
        mov  [ebp+var_c], 1
        mov  eax, [ebp+var_c]
        mov  [ebp+var_1c], eax
        cmp  [ebp+var_1c], 0
        jz   short loc_1fab
        cmp  [ebp+var_1c], 1
        jz   short loc_1fb2
        mov  eax, [ebp+var_c]
        mov  [esp], eax
        call _check_something
        jmp  short locret_1fb9
loc_1fab:
        call _do_one_thing
        jmp  short locret_1fb9
loc_1fb2:
        call _do_another
        jmp  short $+2
locret_1fb9:
        leave
        retn
Branching: Goto

int example_goto(void)
{
    open_files();
    if (do_one_thing())
        goto error;
    if (do_another())
        goto error;
    close_files();
    return 1;
error:
    close_files();
    return 0;
}

        push ebp
        mov  ebp, esp
        sub  esp, 18h
        call _open_files
        call _do_one_thing
        test eax, eax
        jnz  short loc_1fe6
        call _do_another
        test eax, eax
        jnz  short loc_1fe6
        call _close_files
        mov  [ebp+var_c], 1
        jmp  short loc_1ff2
loc_1fe6:
        call _close_files
        mov  [ebp+var_c], 0
loc_1ff2:
        mov  eax, [ebp+var_c]
        leave
        retn
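Function Calling Conventions
Common calling conventions:
- stdcall: standard for the Win32 API; the callee cleans the stack
- cdecl: the most common C calling convention; the caller cleans the stack
- fastcall: passes some arguments in registers (ECX and EDX in the Microsoft variant)
- thiscall: C++ member functions; the this pointer is passed in ECX (Microsoft compilers)
Most important: who is going to clean the stack?
If caller and callee disagree on the convention, the stack becomes unbalanced and the program will typically crash.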
Simple C Program

int foobar(int x, int y)
{
    int z;
    return x;
}

int main(void)
{
    int z = foobar(1, 2);
}
cdecl Calls

Caller:
        PUSH arg2
        PUSH arg1
        CALL function
        ADD  ESP, 8         ; caller removes the two arguments

Callee:
        PUSH EBP
        MOV  EBP, ESP
        SUB  ESP, 4
        MOV  EAX, [EBP+8]
        MOV  ESP, EBP
        POP  EBP
        RET

Stack (higher addresses first):
ARG2          EBP+12
ARG1          EBP+8
RET Addr.     EBP+4
Saved EBP     EBP
LOC1          EBP-4
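stdcall Calls

Caller:
        PUSH arg2
        PUSH arg1
        CALL function       ; no stack cleanup in the caller

Callee:
        PUSH EBP
        MOV  EBP, ESP
        SUB  ESP, 4
        MOV  EAX, [EBP+8]
        MOV  ESP, EBP
        POP  EBP
        RETN 8              ; return and remove 8 bytes of arguments

Stack (higher addresses first):
ARG2          EBP+12
ARG1          EBP+8
RET Addr.     EBP+4
Saved EBP     EBP
LOC1          EBP-4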
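The cdecl and stdcall listings differ only in who removes the arguments. The toy model below replays `foobar(1, 2)` on a small array used as the stack; it is a simulation written for illustration, not real 32-bit code, and all its names (`toy_cpu`, `push`, `pop`, `call_foobar`) are hypothetical. The callee finds arg1 at EBP+8, and ESP ends up back where it started whether the caller runs ADD ESP, 8 (cdecl) or the callee runs RETN 8 (stdcall).

```c
#include <stdint.h>

#define STACK_BYTES 64

/* A toy 32-bit stack: mem[] holds 4-byte slots; esp and ebp are byte
   addresses into it. The stack grows towards lower addresses. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp;
    uint32_t ebp;
} toy_cpu;

static void push(toy_cpu *c, uint32_t v) { c->esp -= 4; c->mem[c->esp / 4] = v; }
static uint32_t pop(toy_cpu *c) { uint32_t v = c->mem[c->esp / 4]; c->esp += 4; return v; }
static uint32_t load(const toy_cpu *c, uint32_t addr) { return c->mem[addr / 4]; }

/* foobar(x, y) returns x. callee_pops is 0 for RET, 8 for RETN 8. */
static uint32_t foobar(toy_cpu *c, uint32_t callee_pops)
{
    push(c, c->ebp);                      /* PUSH EBP                */
    c->ebp = c->esp;                      /* MOV EBP, ESP            */
    c->esp -= 4;                          /* SUB ESP, 4 (LOC1)       */
    uint32_t eax = load(c, c->ebp + 8);   /* MOV EAX, [EBP+8] = arg1 */
    c->esp = c->ebp;                      /* MOV ESP, EBP            */
    c->ebp = pop(c);                      /* POP EBP                 */
    pop(c);                               /* RET pops return address */
    c->esp += callee_pops;                /* RETN 8 drops the args   */
    return eax;
}

/* Caller side of foobar(1, 2). Stores the return value in *result and
   returns ESP after the whole call sequence. */
uint32_t call_foobar(int stdcall, uint32_t *result)
{
    toy_cpu c = { {0}, STACK_BYTES, 0 };
    push(&c, 2);                          /* PUSH arg2                  */
    push(&c, 1);                          /* PUSH arg1                  */
    push(&c, 0xDEADBEEFu);                /* CALL: dummy return address */
    *result = foobar(&c, stdcall ? 8 : 0);
    if (!stdcall)
        c.esp += 8;                       /* cdecl: ADD ESP, 8          */
    return c.esp;
}
```

Both calls leave ESP at its starting value of 64. Applying both cleanups (a cdecl-style ADD ESP, 8 after a stdcall callee) would leave ESP 8 bytes too high, which is the mismatch that crashes programs.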
Reading
- Intel x86 Function-call Conventions: http://www.unixwiz.net/techtips/win32-callconv-asm.html
- C Programming Information: http://www.cprogramming.com/