Reverse Engineering II: Basics
Gergely Erdélyi, Senior Antivirus Researcher

Agenda
  Very basics
  Intel x86 crash course
  Basics of C

Binary Numbers
  1 0 1 1                             = B        (Nibble, 4 bits)
  1 0 1 1  1 1 0 1                    = B D      (Byte, 8 bits)
  1 0 1 1  1 1 0 1  0 0 1 1  1 0 0 1  = B D 3 9  (Word, 16 bits)

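The nibble-to-hex-digit mapping above can be sketched in C. This is only an illustration; the function names nibble_of and word_to_hex are made up for this example and are not part of the original slides.

```c
#include <stdint.h>

/* Return nibble number `index` of a 16-bit word; index 0 is the lowest 4 bits. */
unsigned nibble_of(uint16_t word, unsigned index)
{
    return (word >> (4u * index)) & 0xFu;
}

/* Write the four hex digits of `word` into `out` (4 digits plus a terminator). */
void word_to_hex(uint16_t word, char out[5])
{
    static const char digits[] = "0123456789ABCDEF";
    for (unsigned i = 0; i < 4; i++)
        out[i] = digits[nibble_of(word, 3u - i)];  /* highest nibble first */
    out[4] = '\0';
}
```

Calling word_to_hex with the bit pattern 1011 1101 0011 1001 (0xBD39) should yield the digits B, D, 3 and 9 from the slide.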
Byte Order a.k.a. Endianness
  Address:  00 01
  Bytes:    12 34   = 0x3412 (Little Endian)   = 0x1234 (Big Endian)
  Bytes:    34 12   = 0x1234 (Little Endian)   = 0x3412 (Big Endian)

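A minimal sketch of the two interpretations, assuming the two bytes sit in a buffer at addresses 00 and 01. The helper names read_le16 and read_be16 are hypothetical.

```c
#include <stdint.h>

/* Little endian: the byte at the lower address is the least significant. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Big endian: the byte at the lower address is the most significant. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

Both helpers build the value with shifts, so they give the same answer regardless of the byte order of the machine running them.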
Little Endian Dword
  Address:  00 01 02 03
  Bytes:    12 34 56 78   = 0x78563412
  Bytes:    78 56 34 12   = 0x12345678

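The same idea extends to 32-bit values. The sketch below, with the made-up names read_le32 and host_is_little_endian, decodes a little-endian dword from memory and also checks how the host itself stores integers.

```c
#include <stdint.h>
#include <string.h>

/* Assemble a dword from four bytes stored lowest byte first. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Return 1 if the host keeps the least significant byte at the lowest address. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);  /* look at the first byte in memory */
    return first == 1;
}
```

On an x86 machine host_is_little_endian is expected to return 1.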
Endianness Matters
  Data exchange between computers
  Networking protocols
  File formats for disk storage

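When data leaves the machine, one common approach is to fix the byte order explicitly instead of dumping memory as-is. The sketch below assumes big-endian ("network") order on the wire; store_be32 and load_be32 are illustrative names, not functions from any particular library.

```c
#include <stdint.h>

/* Write `value` into `out` most significant byte first, whatever the host order. */
void store_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Read back a value written by store_be32. */
uint32_t load_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

A file or packet written this way reads back identically on little-endian and big-endian hosts.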
System Endianness
  Little Endian          Big Endian         Switchable Endianness
  Intel x86              PowerPC (exc. G5)  ARM
  Intel 8051             Sparc (exc. v9)    Alpha
  Most microcontrollers  System/370         Intel IA64

ASCII Code
  Range      Contents                        Examples
  0x00-0x1F  Control Characters              Backspace, Line feed
  0x20-0x3F  Digits and Punctuation          0-9 <> =.,: *-()!
  0x40-0x5F  Upper-case Letters and Special  ABCD... @[]\^_
  0x60-0x7E  Lower-case Letters and Special  abcd... `{} ~

ASCII Example
  Text:   H  e  l  l  o     1  2  3  4
  Bytes:  48 65 6C 6C 6F 20 31 32 33 34
  http://en.wikipedia.org/wiki/ascii

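To reproduce the byte values above, a small hex formatter is enough. This is a sketch only; ascii_to_hex is a hypothetical helper, and the output buffer size is the caller's responsibility.

```c
#include <stddef.h>
#include <stdio.h>

/* Format each byte of `text` as two upper-case hex digits separated by spaces. */
void ascii_to_hex(const char *text, char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; text[i] != '\0'; i++) {
        /* Each byte needs up to 3 characters (separator + 2 digits) plus the terminator. */
        if (used + 4 > out_size)
            break;
        used += (size_t)snprintf(out + used, out_size - used, "%s%02X",
                                 i ? " " : "", (unsigned)(unsigned char)text[i]);
    }
}
```

Passing the text "Hello 1234" with a 64-byte buffer should give the byte row shown on the slide, including 20 for the space.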
Unicode Strings
  Text:   (BOM)  H     e     l     l     o
  Bytes:  ff fe  48 00 65 00 6c 00 6c 00 6f 00
  Encoding: UTF-16 / UCS-2 (the BOM ff fe marks little-endian byte order)
  http://en.wikipedia.org/wiki/utf-16/ucs-2
  http://en.wikipedia.org/wiki/category:unicode

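A rough sketch of how such a buffer might be decoded: check the byte order mark, then read 16-bit code units in the order it indicates. The function utf16_to_ascii is made up for this example and deliberately handles only plain ASCII characters; anything else becomes '?', and surrogate pairs are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode BOM-prefixed UTF-16 into ASCII. Returns the number of characters
 * written, or -1 if the input has no BOM or an odd length. */
int utf16_to_ascii(const uint8_t *in, size_t len, char *out, size_t out_size)
{
    int little;
    size_t n = 0;

    if (len < 2 || len % 2 != 0 || out_size == 0)
        return -1;
    if (in[0] == 0xFF && in[1] == 0xFE)
        little = 1;                       /* ff fe: little endian */
    else if (in[0] == 0xFE && in[1] == 0xFF)
        little = 0;                       /* fe ff: big endian */
    else
        return -1;

    for (size_t i = 2; i + 1 < len && n + 1 < out_size; i += 2) {
        uint16_t unit = little ? (uint16_t)(in[i] | (in[i + 1] << 8))
                               : (uint16_t)((in[i] << 8) | in[i + 1]);
        out[n++] = unit < 0x80 ? (char)unit : '?';
    }
    out[n] = '\0';
    return (int)n;
}
```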
String Storage
  ASCIIZ: Zero-terminated ASCII
  Pascal: Size byte + ASCII string
  Delphi: Size Dword + ASCII or Unicode string

  Example: H e l l o
  ASCIIZ: 48 65 6C 6C 6F 00
  Pascal: 05 48 65 6C 6C 6F

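The difference between the first two layouts shows up in how the length is found. A minimal sketch, with hypothetical helper names, covering only the ASCIIZ and Pascal forms from the example:

```c
#include <stddef.h>
#include <stdint.h>

/* ASCIIZ: scan until the terminating zero byte. */
size_t asciiz_length(const uint8_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Pascal: the length is stored in the first byte, so no scan is needed. */
size_t pascal_length(const uint8_t *s)
{
    return s[0];
}

/* Copy a Pascal string into a zero-terminated buffer of at least s[0] + 1 bytes. */
void pascal_to_asciiz(const uint8_t *s, char *out)
{
    size_t n = pascal_length(s);
    for (size_t i = 0; i < n; i++)
        out[i] = (char)s[1 + i];
    out[n] = '\0';
}
```

A one-byte length prefix limits a Pascal string to 255 characters, while an ASCIIZ string cannot contain a zero byte.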
Intel x86 Architecture
  [Image. Copyright 2004 GNU]

Introduction to Intel x86
  Started with 8086 in 1978
  Continued with 8088, 80186, 80286, 386, 486, Pentium, 686...
  CISC architecture
  32-bit is called x86-32 or IA-32
  64-bit is called x86-64, AMD64, EM64T

Intel 80386
  Introduced in 1985
  Has a 32-bit word length
  Has 8 general-purpose registers
  Supports paging and virtual memory
  Addresses up to 4GiB of memory

Data Register Layout
  [Image. Copyright 1997-2008 Intel Corporation]

Data Registers
  8/16-bit      32-bit  Name           Typical use
  AL / AH / AX  EAX     Accumulator    Arithmetic operations
  BL / BH / BX  EBX     Data index     General data storage, index
  CL / CH / CX  ECX     Loop counter   Loop constructs
  DL / DH / DX  EDX     Data register  Arithmetic

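The 8-bit and 16-bit names refer to parts of the same 32-bit register: AX is the low 16 bits of EAX, AL the low byte and AH the byte above it. A small sketch using masks and shifts (the function names are invented for illustration):

```c
#include <stdint.h>

uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* bits 0-15 */
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* bits 0-7  */
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* bits 8-15 */

/* Writing AL changes only the lowest byte; the rest of EAX is preserved. */
uint32_t set_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}
```

With EAX holding 0x12345678, AX reads 0x5678, AH reads 0x56 and AL reads 0x78.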
Address Registers
  Register  Name                 Use
  IP / EIP  Instruction Pointer  Program execution
  SP / ESP  Stack Pointer        Stack operation
  BP / EBP  Base Pointer         Stack frame
  SI / ESI  Source Index         String operation
  DI / EDI  Destination Index    String operation

EFLAGS Register
  [Image. Copyright 1997-2008 Intel Corporation]

Segment Registers
  Register      Name            Use
  CS            Code Segment    Program code
  DS            Data Segment    Program data
  ES / FS / GS  Other Segments  Other uses

Mnemonic Examples
  MOV EAX, 1   Move 1 to EAX
  ADD EDX, 5   Add 5 to EDX
  SUB EBX, 2   Subtract 2 from EBX
  AND ECX, 0   Bit-wise AND ECX with 0
  XOR EDX, 4   Bit-wise exclusive OR EDX with 4
  SHL ECX, 6   Shift ECX left by 6
  ROR EBX, 3   Bit-wise rotate EBX right by 3
  INC ECX      Increment ECX

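Each of these instructions has a close C counterpart, which is what a reverser mentally translates back to. The sketch below models four registers in a struct (Regs, ror32 and run_examples are illustrative names) and ignores the EFLAGS side effects that the real instructions have.

```c
#include <stdint.h>

typedef struct { uint32_t eax, ebx, ecx, edx; } Regs;

/* C has no rotate operator, so ROR is built from two shifts. */
uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

void run_examples(Regs *r)
{
    r->eax = 1;                 /* MOV EAX, 1 */
    r->edx += 5;                /* ADD EDX, 5 */
    r->ebx -= 2;                /* SUB EBX, 2 */
    r->ecx &= 0;                /* AND ECX, 0 */
    r->edx ^= 4;                /* XOR EDX, 4 */
    r->ecx <<= 6;               /* SHL ECX, 6 */
    r->ebx = ror32(r->ebx, 3);  /* ROR EBX, 3 */
    r->ecx += 1;                /* INC ECX */
}
```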
More Mnemonics
  JNZ label    Jump if not zero (not equal)
  JMP label    Unconditional jump to label
  CALL func    Call function
  RET          Return from function
  LOOP label   ECX--, jump to label if ECX is not zero
  PUSH EAX     Push EAX to stack
  POP EDI      Pop EDI from stack
  LODSB        Load byte from DS:ESI to AL, then step ESI by one

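LODSB and LOOP often appear together in small byte-processing loops. As an illustration, the hypothetical function below mirrors the sequence "next: LODSB / ADD EDX, EAX / LOOP next" in C, assuming EAX and EDX start at zero and the direction flag is clear so that ESI moves upward.

```c
#include <stdint.h>

/* Sum `ecx` bytes starting at `esi`, written to follow the assembly loop step by step. */
uint32_t sum_bytes(const uint8_t *esi, uint32_t ecx)
{
    uint32_t eax = 0, edx = 0;

    if (ecx == 0)
        return 0;  /* a real LOOP entered with ECX == 0 would wrap and run 2^32 times */
    do {
        eax = (eax & 0xFFFFFF00u) | *esi++;  /* LODSB: AL = [ESI], ESI = ESI + 1 */
        edx += eax;                          /* ADD EDX, EAX */
    } while (--ecx != 0);                    /* LOOP next: ECX--, jump if ECX != 0 */
    return edx;
}
```

The do/while shape matters: LOOP tests the counter after the body has run, so the body always executes at least once.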
Reversing C
  [Image. Copyright 1988, 1978 by Bell Telephone Laboratories, Incorporated]

Basic Data Types
  char   - 1 byte
  short  - 2 bytes
  int    - 4 bytes (platform word)
  long   - 4 bytes
  float  - 4 bytes floating point
  double - 8 bytes floating point
  Sizes are those of 32-bit x86; other platforms differ (long is 8 bytes on 64-bit Linux, for example)

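Because these sizes depend on the compiler and target, it is worth checking them rather than assuming. A small sketch (print_type_sizes is an invented name) that prints what the compiler in use actually reports:

```c
#include <stdio.h>

/* Print the size in bytes of each basic type for the current compiler and target. */
void print_type_sizes(void)
{
    printf("char   %zu\n", sizeof(char));
    printf("short  %zu\n", sizeof(short));
    printf("int    %zu\n", sizeof(int));
    printf("long   %zu\n", sizeof(long));
    printf("float  %zu\n", sizeof(float));
    printf("double %zu\n", sizeof(double));
}
```

Only sizeof(char) == 1 is guaranteed by the C standard; the other values are conventions of the platform.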
Arrays and Pointers
  Pointers can point to any memory location
  One-dimensional arrays are flat memory
  Multi-dimensional arrays are stored row by row in flat memory, or built from arrays of pointers

  A[0] A[1] A[2] A[3]

  char a[4];
  char *b = a, c;
  c = a[2];
  c = *(b+2);

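The equivalence of a[2] and *(b+2), and the flat layout of a true two-dimensional array, can be checked directly. The two functions below are illustrative only; their names and the sample values are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Indexing and pointer arithmetic read the same element. */
int index_and_pointer_agree(void)
{
    char a[4] = { 'w', 'x', 'y', 'z' };
    char *b = a;                /* b points at a[0] */
    return a[2] == *(b + 2);
}

/* In a char grid[3][4], row 1 starts exactly 4 bytes after row 0:
 * the rows are laid out back to back, with no pointers in between. */
size_t row_stride(void)
{
    char grid[3][4];
    return (size_t)((uintptr_t)&grid[1][0] - (uintptr_t)&grid[0][0]);
}
```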
Structures and Unions
  Structure
    struct {
        unsigned int id;
        unsigned short age;
        char name[16];
    } record;
    Memory is allocated for all members combined (plus alignment padding).
    sizeof(record) = 24

  Union
    union foo {
        int one;
        char two;
    };
    Memory is allocated for the largest member only.
    sizeof(union foo) = 4

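The sizes quoted above can be confirmed with sizeof. The sketch below gives the structure a tag (struct record) so that it can be named in expressions, which is an addition for this example; the sizes assume a typical x86 compiler with 4-byte int and 2-byte short.

```c
#include <stddef.h>

struct record {
    unsigned int id;
    unsigned short age;
    char name[16];
};

union foo {
    int one;
    char two;
};

/* All members of a union start at the same address, so writing one overwrites the other. */
int union_members_overlap(void)
{
    union foo f;
    return (void *)&f.one == (void *)&f.two;
}

size_t record_size(void) { return sizeof(struct record); }
size_t foo_size(void)    { return sizeof(union foo); }
```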
Structure Alignment
  Structure members are aligned by default, up to the platform word size
  #pragma pack(n) directive can change it
  #pragma pack(1) removes alignment
  Important when reconstructing structures

Structure Storage
  Aligned               Packed
  DWORD id              DWORD id
  WORD age              WORD age
  16 BYTES name         16 BYTES name
  2 bytes padding
  sizeof(record) = 24   sizeof(record) = 22
  (Typical compilers place the 2 padding bytes after name, because a char array needs no alignment.)

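The two layouts can be compared side by side. This sketch assumes a compiler that accepts #pragma pack(push, 1) and #pragma pack(pop), as GCC, Clang and MSVC do; the struct tags are made up for the example.

```c
#include <stddef.h>

/* Default alignment: the struct is padded to a multiple of 4 bytes. */
struct record_aligned {
    unsigned int id;
    unsigned short age;
    char name[16];
};

/* Packed: members are stored back to back with no padding at all. */
#pragma pack(push, 1)
struct record_packed {
    unsigned int id;
    unsigned short age;
    char name[16];
};
#pragma pack(pop)

size_t aligned_size(void) { return sizeof(struct record_aligned); }
size_t packed_size(void)  { return sizeof(struct record_packed); }
size_t aligned_name_offset(void) { return offsetof(struct record_aligned, name); }
```

When reconstructing a structure from a binary, offsetof on a candidate definition is a quick way to check that each field lands where the disassembly accesses it.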
Simple C Program
  int foobar(int x, int y)
  {
      int z = x+y;
      return z;
  }

  int main(void)
  {
      int z = foobar(1, 2);
  }

Function Calls
  Calling conventions are important to know
  Mixing them will crash the program
  stdcall - Standard calls on Windows
  cdecl - Most common C calling convention
  fastcall - Uses registers for arguments
  thiscall - Pass this pointer in ECX in C++

cdecl Calls
  Caller
    PUSH arg2
    PUSH arg1
    CALL function
    ADD ESP,8

  Callee
    PUSH EBP
    MOV EBP, ESP
    SUB ESP, 4
    MOV EAX, [EBP+8]
    MOV ESP, EBP
    POP EBP
    RET

  Stack (highest address first)
    ARG2        arg2: EBP+12
    ARG1        arg1: EBP+8
    RET Addr.
    Saved EBP
    LOC1        loc1: EBP-4

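To see where the EBP offsets come from, the sequence can be replayed on a simulated stack. Everything here is an assumption made for illustration: the Cpu struct, the helper names, the made-up return address, and the callee body, which computes loc1 = arg1 + arg2 in the spirit of the earlier foobar example instead of the single MOV shown on the slide.

```c
#include <stdint.h>

enum { STACK_BYTES = 64 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4];  /* simulated stack memory */
    uint32_t esp, ebp, eax;         /* ESP and EBP are byte offsets into mem */
} Cpu;

static void push(Cpu *c, uint32_t v) { c->esp -= 4; c->mem[c->esp / 4] = v; }
static uint32_t pop(Cpu *c) { uint32_t v = c->mem[c->esp / 4]; c->esp += 4; return v; }
static uint32_t load(const Cpu *c, uint32_t addr) { return c->mem[addr / 4]; }

/* Replay a cdecl call; returns 1 if ESP ends up where it started. */
int simulate_cdecl(Cpu *c, uint32_t arg1, uint32_t arg2)
{
    c->esp = c->ebp = STACK_BYTES;   /* empty stack, growing toward lower addresses */
    uint32_t start = c->esp;

    push(c, arg2);                   /* PUSH arg2 */
    push(c, arg1);                   /* PUSH arg1 */
    push(c, 0x00401000u);            /* CALL function: pushes a (made-up) return address */

    push(c, c->ebp);                 /* PUSH EBP */
    c->ebp = c->esp;                 /* MOV EBP, ESP */
    c->esp -= 4;                     /* SUB ESP, 4: room for loc1 at EBP-4 */
    c->mem[(c->ebp - 4) / 4] =
        load(c, c->ebp + 8) + load(c, c->ebp + 12);  /* loc1 = arg1 + arg2 */
    c->eax = load(c, c->ebp - 4);    /* MOV EAX, [EBP-4] */
    c->esp = c->ebp;                 /* MOV ESP, EBP */
    c->ebp = pop(c);                 /* POP EBP */
    (void)pop(c);                    /* RET: pops the return address */

    c->esp += 8;                     /* ADD ESP, 8: the caller removes both arguments */
    return c->esp == start;
}
```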
stdcall Calls
  Caller
    PUSH arg2
    PUSH arg1
    CALL function

  Callee
    PUSH EBP
    MOV EBP, ESP
    SUB ESP, 4
    MOV EAX, [EBP+8]
    MOV ESP, EBP
    POP EBP
    RETN 8

  Stack (highest address first)
    ARG2        arg2: EBP+12
    ARG1        arg1: EBP+8
    RET Addr.
    Saved EBP
    LOC1        loc1: EBP-4

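The practical difference from cdecl is who removes the arguments: with stdcall the callee does it through RETN n, with cdecl the caller does it afterwards. A minimal bookkeeping sketch that tracks only ESP, assuming 4-byte arguments; the enum and function names are hypothetical.

```c
enum convention { CONV_CDECL, CONV_STDCALL };

/* Net change of ESP, in bytes, at the moment control returns to the caller. */
int esp_delta_at_return(enum convention conv, int arg_count)
{
    int esp = 0;
    esp -= 4 * arg_count;          /* one PUSH per argument */
    esp -= 4;                      /* CALL pushes the return address */
    esp += 4;                      /* RET / RETN pops the return address */
    if (conv == CONV_STDCALL)
        esp += 4 * arg_count;      /* RETN n also discards the arguments */
    return esp;
}

/* Bytes the caller still has to add to ESP after the call (ADD ESP, n). */
int caller_cleanup_bytes(enum convention conv, int arg_count)
{
    return -esp_delta_at_return(conv, arg_count);
}
```

If caller and callee disagree on the convention, ESP ends up off by 4 bytes per argument after every call, which is why mixing conventions corrupts the stack.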
Further Reading
  Intel Processor Documentation
    http://www.intel.com/products/processor/manuals/index.htm
  Netwide Assembler Mnemonic Documentation
    http://sourceforge.net/docman/display_doc.php?docid=47259&group_id=6208
  The Art of Assembly Language Programming, Windows 32-bit Edition
    http://webster.cs.ucr.edu/aoa/index.html