Reverse Engineering II: The Basics
  Gergely Erdélyi
  Senior Manager, Anti-malware Research
  Protecting the irreplaceable
  f-secure.com
Binary Numbers
  1011                   Nibble   B
  1011 1101              Byte     BD
  1011 1101 0011 1001    Word     BD39
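The grouping of bits into nibbles and bytes can be sketched in C with shifts and masks. This is only an illustration; the function names `nibble_of` and `high_byte` are hypothetical and do not come from the slides.

```c
#include <stdint.h>

/* Return nibble number `index` of a 16-bit word. Index 0 is the least
   significant nibble, index 3 the most significant one. */
static unsigned nibble_of(uint16_t word, unsigned index)
{
    return ((unsigned)word >> (4u * index)) & 0xFu;
}

/* Return the most significant byte of a 16-bit word. */
static unsigned high_byte(uint16_t word)
{
    return ((unsigned)word >> 8) & 0xFFu;
}
```

For the word BD39 from the slide, `nibble_of(0xBD39, 3)` yields 0xB and `high_byte(0xBD39)` yields 0xBD.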
Byte Order a.k.a. Endianness
  Offset:  00 01
  Bytes:   12 34   = 0x3412 (Little Endian)
                   = 0x1234 (Big Endian)
  Bytes:   34 12   = 0x1234 (Little Endian)
                   = 0x3412 (Big Endian)
Little Endian Dword
  Offset:  00 01 02 03
  Bytes:   12 34 56 78   = 0x78563412
  Bytes:   78 56 34 12   = 0x12345678
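A minimal C sketch of how the same four bytes produce different dword values depending on byte order. The helper names `read_le32` and `read_be32` are hypothetical; the code assembles the value byte by byte, so it behaves the same on any host.

```c
#include <stdint.h>

/* Interpret 4 bytes as a little-endian dword: the byte at the lowest
   offset is the least significant one. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Interpret 4 bytes as a big-endian dword: the byte at the lowest
   offset is the most significant one. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}
```

With the bytes 12 34 56 78, `read_le32` returns 0x78563412 and `read_be32` returns 0x12345678.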
Endianness Matters
  Data exchange between computers
  Networking protocols
  File formats for disk storage
  Mixing endianness
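When data leaves the machine (network, file), a common approach is to write it in one fixed byte order regardless of the host. The sketch below shows one way to do that in C; `host_is_little_endian` and `store_be32` are hypothetical helper names, and big-endian output is chosen here only as an example.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if this machine stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 0x1234;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* look at the byte at the lowest address */
    return first == 0x34;
}

/* Store a dword in big-endian order, independent of the host byte order. */
static void store_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}
```

Because `store_be32` uses shifts rather than a raw memory copy, the bytes written are the same on little-endian and big-endian hosts.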
System Endianness
  Little Endian           Big Endian           Switchable Endianness
  Intel x86               PowerPC (exc. G5)    ARM
  Intel 8051              Sparc (exc. v9)      Alpha
  Most microcontrollers   System/370           Intel IA64
ASCII Code
  Range       Contents                          Examples
  0x00-0x1F   Control Characters                Backspace, Line feed
  0x20-0x3F   Digits and Punctuation            0-9 <> =.,: *-()!
  0x40-0x5F   Upper-case Letters and Special    ABCD... @[]\^_
  0x60-0x7E   Lower-case Letters and Special    abcd... `{} ~
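The four ranges above can be expressed as a small classifier, which is handy when eyeballing whether a byte in a dump is likely text. This is a sketch; the enum and function names are hypothetical.

```c
/* The four ASCII ranges from the table, plus a catch-all for 0x7F and above. */
enum ascii_class {
    ASCII_CONTROL,        /* 0x00-0x1F */
    ASCII_DIGIT_PUNCT,    /* 0x20-0x3F */
    ASCII_UPPER_SPECIAL,  /* 0x40-0x5F */
    ASCII_LOWER_SPECIAL,  /* 0x60-0x7E */
    ASCII_OTHER           /* 0x7F (DEL) and non-ASCII bytes */
};

static enum ascii_class classify_ascii(unsigned char c)
{
    if (c <= 0x1F) return ASCII_CONTROL;
    if (c <= 0x3F) return ASCII_DIGIT_PUNCT;
    if (c <= 0x5F) return ASCII_UPPER_SPECIAL;
    if (c <= 0x7E) return ASCII_LOWER_SPECIAL;
    return ASCII_OTHER;
}
```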
ASCII Example
  Text:   H  e  l  l  o     1  2  3  4
  Bytes:  48 65 6C 6C 6F 20 31 32 33 34
  http://en.wikipedia.org/wiki/ascii
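To reproduce the byte sequence above, a string can be dumped as hex pairs. The following is a minimal sketch that assumes the compiler uses ASCII for character literals; `hex_dump` is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Write the bytes of `text` into `out` as space-separated upper-case hex
   pairs. Returns 0 on success, -1 if `out_size` is too small. */
static int hex_dump(const char *text, char *out, size_t out_size)
{
    size_t n = strlen(text);
    if (out_size < 3 * n + 1)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            *out++ = ' ';
        out += sprintf(out, "%02X", (unsigned)(unsigned char)text[i]);
    }
    *out = '\0';
    return 0;
}
```

Dumping the text "Hello 1234" this way gives 48 65 6C 6C 6F 20 31 32 33 34, matching the slide.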
Unicode Strings
  Text:   BOM    H     e     l     l     o
  Bytes:  ff fe  48 00 65 00 6c 00 6c 00 6f 00
  Encoding: UTF-16 / UCS-2
  http://en.wikipedia.org/wiki/utf-16/ucs-2
  http://en.wikipedia.org/wiki/category:unicode
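The byte pattern above (BOM ff fe, then each character followed by 00) is little-endian UTF-16. A sketch of producing it for plain ASCII input is shown below; it does not handle characters outside ASCII, and `ascii_to_utf16le` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode an ASCII string as UTF-16LE with a leading byte order mark.
   `out` must have room for 2 + 2 * strlen(text) bytes.
   Returns the number of bytes written. */
static size_t ascii_to_utf16le(const char *text, uint8_t *out)
{
    size_t n = 0;
    out[n++] = 0xFF;                  /* BOM, low byte first */
    out[n++] = 0xFE;
    for (; *text != '\0'; text++) {
        out[n++] = (uint8_t)*text;    /* low byte: the ASCII code */
        out[n++] = 0x00;              /* high byte: zero for ASCII */
    }
    return n;
}
```

For "Hello" this writes 12 bytes: ff fe 48 00 65 00 6c 00 6c 00 6f 00.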
String Storage
  ASCIIZ: Zero-terminated ASCII
  Pascal: Size byte + ASCII string
  Delphi: Size Dword + ASCII or Unicode string
  Example: H e l l o
    ASCIIZ: 48 65 6C 6C 6F 00
    Pascal: 05 48 65 6C 6C 6F
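The practical difference between the two layouts is how the length is found: ASCIIZ requires scanning for the terminator, Pascal strings carry the length up front. A small sketch, with hypothetical function names:

```c
#include <stddef.h>
#include <stdint.h>

/* ASCIIZ: count bytes until the zero terminator. */
static size_t asciiz_length(const uint8_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Pascal: the first byte holds the length (so at most 255 characters). */
static size_t pascal_length(const uint8_t *s)
{
    return s[0];
}

/* Pascal: the characters start right after the size byte. */
static const uint8_t *pascal_chars(const uint8_t *s)
{
    return s + 1;
}
```

Both example encodings of "Hello" from the slide report a length of 5.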
Intel x86 Architecture
  [Image, copyright 2004 GNU]
Introduction to Intel x86
  Started with the 8086 in 1978
  Continued with 8088, 80186, 80286, 386, 486, Pentium, 686...
  CISC architecture
  32-bit is called x86-32 or IA-32
  64-bit is called x86-64, AMD64 or EM64T
  80386, introduced in 1985:
    Has a 32-bit word length
    Has eight general-purpose registers
    Supports paging and virtual memory
    Addresses up to 4 GiB of memory
Data Register Layout
  [Figure: data register layout. Image copyright 1997-2008 Intel Corporation]
Data Registers
  8/16-bit        32-bit   Name            Typical use
  AL / AH / AX    EAX      Accumulator     Arithmetic operations
  BL / BH / BX    EBX      Data index      General data storage, index
  CL / CH / CX    ECX      Loop counter    Loop constructs
  DL / DH / DX    EDX      Data register   Arithmetic
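AL, AH and AX are not separate registers but views onto parts of EAX: AX is the low 16 bits, AL the low 8 bits, and AH bits 8 to 15. The sketch below models that with masks and shifts; the function names are hypothetical.

```c
#include <stdint.h>

/* AX: low 16 bits of EAX. */
static uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AL: low 8 bits of EAX. */
static uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* AH: bits 8..15 of EAX. */
static uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Effect of MOV AL, value: only the low 8 bits of EAX change. */
static uint32_t set_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}
```

With EAX = 0x12345678, AX is 0x5678, AH is 0x56 and AL is 0x78. The same pattern applies to EBX, ECX and EDX.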
Address Registers
  16/32-bit    Name                  Typical use
  IP / EIP     Instruction Pointer   Program execution
  SP / ESP     Stack Pointer         Stack operation
  BP / EBP     Base Pointer          Stack frame
  SI / ESI     Source Index          String operation
  DI / EDI     Destination Index     String operation
Segment Registers
  Register        Name             Typical use
  CS              Code Segment     Program code
  DS              Data Segment     Program data
  ES / FS / GS    Other Segments   Other uses
EFLAGS Register
  [Figure: EFLAGS register layout. Image copyright 1997-2008 Intel Corporation]
Mnemonic Examples
  MOV EAX, 1    Move 1 to EAX
  ADD EDX, 5    Add 5 to EDX
  SUB EBX, 2    Subtract 2 from EBX
  AND ECX, 0    Bit-wise AND ECX with 0
  XOR EDX, 4    Bit-wise exclusive OR EDX with 4
  SHL ECX, 6    Shift ECX left by 6 bits
  ROR EBX, 3    Bit-wise rotate EBX right by 3 bits
  INC ECX       Increment ECX
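Most of these instructions map directly onto C operators (`+`, `-`, `&`, `^`, `<<`, `++`). Rotation has no C operator, so a sketch of what ROR does on a 32-bit register is given below; `ror32` is a hypothetical helper name.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by `count` bits: bits shifted out on the
   right re-enter on the left, which is what ROR does to a register. */
static uint32_t ror32(uint32_t value, unsigned count)
{
    count &= 31u;              /* only the low 5 bits of the count matter */
    if (count == 0)
        return value;          /* avoid an undefined shift by 32 */
    return (value >> count) | (value << (32u - count));
}
```

For example, `ror32(0x00000001, 3)` gives 0x20000000: the single set bit wraps around from bit 0 to bit 29.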
More Mnemonics
  JNZ label     Jump if not zero (not equal)
  JMP label     Unconditional jump to label
  CALL func     Call function
  RET           Return from function
  LOOP label    Decrement ECX, jump to label if ECX is not zero
  PUSH EAX      Push EAX to stack
  POP EDI       Pop EDI from stack
  LODSB         Load byte from DS:ESI to AL
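LODSB and LOOP often appear together in byte-processing loops. The sketch below models a hypothetical loop that sums `ecx` bytes starting at `esi`; it assumes the direction flag is clear, so LODSB advances ESI by one after each load. The function name and the summing body are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* C model of a loop of the form:
       next: LODSB
             ADD EDX, EAX     (with the upper bits of EAX zero)
             LOOP next                                               */
static uint32_t sum_bytes(const uint8_t *esi, uint32_t ecx)
{
    uint32_t edx = 0;
    /* Guard added for this sketch: a real LOOP decrements first, so
       starting with ECX = 0 would run 2^32 iterations. */
    if (ecx == 0)
        return 0;
    do {
        uint8_t al = *esi++;   /* LODSB: AL = [ESI], then ESI = ESI + 1 */
        edx += al;
    } while (--ecx != 0);      /* LOOP: ECX = ECX - 1, jump if ECX != 0 */
    return edx;
}
```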
Reversing C Code
  [Image, copyright 1988, 1978 by Bell Telephone Laboratories, Incorporated]
Basic Data Types
  char   - 1 byte
  short  - 2 bytes
  int    - 4 bytes (platform word)
  long   - 4 bytes
  float  - 4 bytes floating point
  double - 8 bytes floating point
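Arrays and Pointers
  Pointers can point to any memory location
  One-dimensional arrays are flat memory
  Multi-dimensional arrays may use pointers (for example, an array of row pointers)
  Memory layout of a:  A A A A
    char a[4];
    char *b, c;
    b = a;          /* b must point somewhere valid before it is dereferenced */
    c = a[2];
    c = *(b+2);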
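Indexing and pointer arithmetic are two spellings of the same address computation: element address = base + index * element size. The sketch below shows this; the function names are hypothetical.

```c
#include <stddef.h>

/* Read the third element using array indexing. */
static char read_by_index(const char *a)
{
    return a[2];
}

/* Read the third element using pointer arithmetic; equivalent to a[2]. */
static char read_by_pointer(const char *b)
{
    return *(b + 2);
}

/* Distance in bytes between an array base and one of its elements. */
static size_t byte_offset(const void *base, const void *element)
{
    return (size_t)((const char *)element - (const char *)base);
}
```

In disassembly this shows up as a scaled index: for an `int` array, element 2 lives `2 * sizeof(int)` bytes past the base.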
Structures and Unions
  Structure
    struct {
        unsigned int id;
        unsigned short age;
        char name[16];
    } record;
    Memory is allocated for all members combined (plus any alignment padding).
    sizeof(record) = 24
  Union
    union foo {
        int one;
        char two;
    };
    Memory is allocated for the largest member only.
    sizeof(foo) = 4
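The sizes quoted above can be checked with `sizeof`. The sketch below assumes a typical x86 compiler with a 4-byte `int` and a 2-byte `short`; other platforms may give different numbers. The tag `record` and the helper `union_members_overlap` are added here for illustration.

```c
#include <stddef.h>

struct record {
    unsigned int id;      /* 4 bytes */
    unsigned short age;   /* 2 bytes */
    char name[16];        /* 16 bytes */
};                        /* 22 bytes of members, rounded up by the compiler */

union foo {
    int one;
    char two;
};                        /* as large as its largest member */

/* All members of a union start at the same address. */
static int union_members_overlap(void)
{
    union foo u;
    return (void *)&u.one == (void *)&u.two;
}
```

On such a platform `sizeof(struct record)` is 24 and `sizeof(union foo)` equals `sizeof(int)`, that is 4.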
Structure Alignment
  Data structures are aligned to word size by default
  #pragma pack(n) directive can change it
  #pragma pack(1) removes alignment
  Important when reconstructing structures
Structure Storage
  Aligned                  Packed
  DWORD id                 DWORD id
  WORD age                 WORD age
  16 BYTES name            16 BYTES name
  2-byte padding
  sizeof(record) = 24      sizeof(record) = 22
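The effect of packing can be observed directly by declaring the same members twice. This sketch assumes a compiler that supports `#pragma pack(push, 1)` and `#pragma pack(pop)` (GCC, Clang and MSVC do) and a 4-byte `int`; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Default alignment: the compiler may add padding. */
struct record_aligned {
    unsigned int id;
    unsigned short age;
    char name[16];
};

/* Packed to 1-byte boundaries: no padding is inserted. */
#pragma pack(push, 1)
struct record_packed {
    unsigned int id;
    unsigned short age;
    char name[16];
};
#pragma pack(pop)

static size_t aligned_size(void) { return sizeof(struct record_aligned); }
static size_t packed_size(void)  { return sizeof(struct record_packed); }
```

On a typical x86 build `aligned_size()` returns 24 and `packed_size()` returns 22. When a structure reconstructed from a binary does not line up with observed offsets, a packing directive in the original source is one possible cause.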
Simple C Program
    int foobar(int x, int y)
    {
        int z = x+y;
        return z;
    }

    int main(void)
    {
        int z = foobar(1, 2);
    }
Function Calls
  Calling conventions are important to know
  Mixing them corrupts the stack and usually crashes the program
  stdcall  - Standard calls on Windows
  cdecl    - Most common C calling convention
  fastcall - Uses registers for arguments
  thiscall - Passes the this pointer in ECX in C++ (Microsoft compilers)
cdecl Calls
  Caller                 Callee
  PUSH arg2              PUSH EBP
  PUSH arg1              MOV EBP, ESP
  CALL function          SUB ESP, 4
  ADD ESP, 8             MOV EAX, [EBP+8]
                         MOV ESP, EBP
                         POP EBP
                         RET
  Stack (highest address first)
    ARG2
    ARG1
    RET Addr.
    Saved EBP
    LOC1
  arg1: EBP+8    arg2: EBP+12    loc1: EBP-4
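The EBP-relative offsets can be verified with a tiny simulation of the stack. The sketch below is not an emulator; it only tracks ESP, EBP and a small array standing in for stack memory. All names are hypothetical, and any return address passed in is an arbitrary placeholder.

```c
#include <stdint.h>

enum { STACK_WORDS = 64 };

struct machine {
    uint32_t stack[STACK_WORDS];  /* simulated stack memory, one dword per slot */
    uint32_t esp;                 /* byte offset into the stack, grows downward */
    uint32_t ebp;
};

static void machine_init(struct machine *m)
{
    m->esp = STACK_WORDS * 4;     /* empty stack: ESP just past the top */
    m->ebp = 0;
}

static void push(struct machine *m, uint32_t value)
{
    m->esp -= 4;
    m->stack[m->esp / 4] = value;
}

static uint32_t load(const struct machine *m, uint32_t address)
{
    return m->stack[address / 4];
}

/* Caller side plus callee prologue of a two-argument cdecl call. */
static void enter_cdecl(struct machine *m, uint32_t arg1, uint32_t arg2,
                        uint32_t return_address)
{
    push(m, arg2);                /* PUSH arg2 */
    push(m, arg1);                /* PUSH arg1 */
    push(m, return_address);      /* CALL pushes the return address */
    push(m, m->ebp);              /* PUSH EBP */
    m->ebp = m->esp;              /* MOV EBP, ESP */
    m->esp -= 4;                  /* SUB ESP, 4 (room for loc1) */
}
```

After `enter_cdecl`, `load(&m, m.ebp + 8)` returns arg1, `load(&m, m.ebp + 12)` returns arg2, and the local variable slot is at `m.ebp - 4`, which equals `m.esp`.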
stdcall Calls
  Caller                 Callee
  PUSH arg2              PUSH EBP
  PUSH arg1              MOV EBP, ESP
  CALL function          SUB ESP, 4
                         MOV EAX, [EBP+8]
                         MOV ESP, EBP
                         POP EBP
                         RETN 8
  Stack (highest address first)
    ARG2
    ARG1
    RET Addr.
    Saved EBP
    LOC1
  arg1: EBP+8    arg2: EBP+12    loc1: EBP-4
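The key difference between cdecl and stdcall is who removes the arguments from the stack: the caller (ADD ESP, 8) or the callee (RETN 8). The sketch below does the ESP bookkeeping for a two-argument call and shows what happens when caller and callee disagree. The function name and parameters are hypothetical.

```c
#include <stdint.h>

/* Return ESP after a complete call to a two-argument function.
   callee_pops: bytes removed by the callee's return (0 for RET, 8 for RETN 8)
   caller_pops: bytes removed by the caller afterwards (8 for ADD ESP, 8) */
static uint32_t esp_after_call(uint32_t esp, uint32_t callee_pops,
                               uint32_t caller_pops)
{
    esp -= 8;                 /* two PUSH instructions, 4 bytes each */
    esp -= 4;                 /* CALL pushes the return address */
    esp += 4 + callee_pops;   /* RET pops the return address, RETN n pops n more */
    esp += caller_pops;       /* cdecl caller cleanup, absent in stdcall */
    return esp;
}
```

With matching conventions ESP returns to its starting value: `esp_after_call(esp, 0, 8)` for cdecl and `esp_after_call(esp, 8, 0)` for stdcall. If a cdecl caller invokes a stdcall callee, `esp_after_call(esp, 8, 8)` ends 8 bytes too high, which is how mixing conventions corrupts the stack.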
Further Reading
  Intel Processor Documentation
    http://www.intel.com/products/processor/manuals/index.htm
  Netwide Assembler Mnemonic Documentation
    http://sourceforge.net/docman/display_doc.php?docid=47259&group_id=6208
  The Art of Assembly Language Programming, Windows 32-bit Edition
    http://webster.cs.ucr.edu/aoa/index.html