Shellcode Analysis (Chapter 19)
What Is Shellcode
Shellcode is a payload of raw executable code. The name comes from attackers' use of such code to obtain interactive shell access. It is a binary chunk of data and is generally described as self-contained executable code. IDA Pro can load a shellcode binary, but it performs no automatic analysis, because there is no executable file format describing the content.
Position-Independent Code
Shellcode uses no hard-coded addresses (Table 19-1, p. 408). call and jmp instructions are position independent: the CPU calculates their target addresses by adding an offset to the address of the next instruction. A mov that accesses a global memory location is not position independent, whereas a mov that accesses memory at an offset from a register (for example [ebp-4]) is. Because no memory addresses are hard-coded and all branches and jumps are relative, the code can be placed anywhere in memory and still function as intended. This is essential for exploit code and for shellcode injected from a remote location, since the attacker does not know in advance at which address it will land.
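To make the distinction concrete, the following Python sketch decodes two instructions from a made-up byte string: a relative call (opcode E8 followed by a signed 32-bit displacement) and mov edx, [disp32] (8B 15 followed by an absolute address), the encoding used for global-memory access. The byte string and the two load addresses are hypothetical; the point is that the call's target moves with the code while the mov's operand does not.

```python
import struct

# Hypothetical code bytes: call +0x0B (target = offset 0x10), then mov edx, [0x407030].
CODE = b"\xE8\x0B\x00\x00\x00" + b"\x8B\x15\x30\x70\x40\x00"


def call_target(code: bytes, offset: int, base: int) -> int:
    """Target of an E8 rel32 call at code[offset] when the code is loaded at base."""
    if code[offset] != 0xE8:
        raise ValueError("not a call rel32")
    rel = struct.unpack_from("<i", code, offset + 1)[0]
    next_eip = base + offset + 5  # displacement is relative to the next instruction
    return (next_eip + rel) & 0xFFFFFFFF


def absolute_operand(code: bytes, offset: int) -> int:
    """Address read by mov edx, [disp32] (8B 15 disp32); fixed when the code was built."""
    if code[offset:offset + 2] != b"\x8B\x15":
        raise ValueError("not mov edx, [disp32]")
    return struct.unpack_from("<I", code, offset + 2)[0]


for base in (0x00401000, 0x10000000):
    print(f"base {base:#010x}: call -> {call_target(CODE, 0, base):#010x}, "
          f"mov reads {absolute_operand(CODE, 5):#010x}")
```

Moving the code shifts the call target by the same amount, while the mov always reads 0x00407030, which is only meaningful if the code sits where it was originally expected.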
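Identifying Execution Location
Shellcode may need to find its own execution location so that it can dereference a base pointer to its embedded data. x86 does not provide EIP-relative access to embedded data as it does for control-flow instructions, so the shellcode must first load EIP into a general-purpose register. The problem is that an instruction such as mov eax, eip is not allowed. Two methods work around this. With call/pop, call pushes the address of the next instruction (the saved EIP) onto the stack, and pop retrieves it into a register (Listing 19-1, p. 410). With fnstenv, the x87 FPU environment saved to memory includes the address of the last FPU instruction executed.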
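Example: JMP-CALL-POP
A jmp first transfers control forward to a call placed immediately before the data (the string "Hello World"). When the call executes, the address of the next instruction, which is the address of the string, is pushed onto the stack, and control returns into the shellcode body, where a pop moves that address into EDI. The shellcode thus figures out the string's memory address dynamically, with no hard-coded address.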
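The JMP-CALL-POP sequence can be checked with a toy emulator. The Python sketch below assembles a hypothetical stub (the byte layout is illustrative, not copied from the book's listing) and emulates only the four opcodes it uses: EB (jmp rel8), E8 (call rel32), 5F (pop edi), and CC (int3), where int3 stands in for the real payload. Whatever load address is chosen, EDI ends up pointing at the string.

```python
import struct

MESSAGE = b"Hello World\x00"


def build_stub(message: bytes) -> bytes:
    """Assemble a minimal JMP-CALL-POP stub followed by its data (hypothetical layout)."""
    jmp = b"\xEB\x02"                       # 0: jmp short +2 -> offset 4 (the call)
    pop_edi = b"\x5F"                       # 2: pop edi (EDI = address of the data)
    payload = b"\xCC"                       # 3: int3, placeholder for the real payload
    call = b"\xE8" + struct.pack("<i", -7)  # 4: call rel32 -> 9 + (-7) = offset 2
    return jmp + pop_edi + payload + call + message  # data starts at offset 9


def emulate(code: bytes, base: int) -> int:
    """Run the stub as if loaded at `base`; return EDI when the placeholder is reached."""
    eip, stack, edi = base, [], None
    while True:
        op = code[eip - base]
        if op == 0xEB:                      # jmp rel8
            rel = struct.unpack_from("<b", code, eip - base + 1)[0]
            eip += 2 + rel
        elif op == 0xE8:                    # call rel32
            rel = struct.unpack_from("<i", code, eip - base + 1)[0]
            stack.append(eip + 5)           # push the address of the next instruction
            eip += 5 + rel
        elif op == 0x5F:                    # pop edi
            edi = stack.pop()
            eip += 1
        elif op == 0xCC:                    # stop where the payload would begin
            return edi
        else:
            raise ValueError(f"unsupported opcode {op:#04x}")


code = build_stub(MESSAGE)
for base in (0x00401000, 0x7FFD0000):
    edi = emulate(code, base)
    print(hex(edi), code[edi - base:].rstrip(b"\x00"))
```

Only relative displacements appear in the stub, so the same bytes yield the correct string address at either base.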
Manual Symbol Resolution
Shellcode needs to resolve external symbols, but it cannot use the Windows loader to ensure that the required libraries are in process memory, so it must find symbols by itself. In particular, it must dynamically locate functions such as LoadLibraryA and GetProcAddress (both located in kernel32.dll).
Finding kernel32.dll in memory requires traversing undocumented structures (Figure 19-1, Listing 19-4, pp. 414-415). From Windows 2000 through Vista, kernel32.dll follows ntdll.dll, in second place in the InInitializationOrderLinks list. Windows 7 changed this ordering, and later versions such as Windows 10 do not restore it, so the shellcode needs to confirm the module by checking its FullDllName field (a UNICODE_STRING).
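Locate kernel32.dll
The traversal begins with the TEB, which is reachable through the FS segment register. Offset 0x30 in the TEB holds a pointer to the PEB, and offset 0xC within the PEB points to the PEB_LDR_DATA structure, whose linked lists of loaded modules are then traversed. From Windows 2000 through Vista, kernel32.dll follows ntdll.dll in the list; this changed starting with Windows 7.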
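The traversal can be modeled in Python against a fake 32-bit address space, since FS:[0x30] cannot be read from a script, so the model starts at the PEB address directly. The sketch assumes the standard 32-bit offsets (PEB.Ldr at 0xC; InInitializationOrderModuleList at 0x1C in PEB_LDR_DATA; in each LDR_DATA_TABLE_ENTRY, InInitializationOrderLinks at 0x10, DllBase at 0x18, FullDllName at 0x24). The memory layout, module paths, and base addresses are made up, and only Flink pointers are modeled. The module order imitates Windows 7, where KERNELBASE.dll sits in second place, so checking the name finds kernel32.dll while blindly taking the second entry does not.

```python
from __future__ import annotations

import struct

MEM_SIZE = 0x10000
PEB_ADDR, LDR_ADDR = 0x1000, 0x2000        # hypothetical addresses in the fake memory
ENTRY_AREA, STRING_AREA = 0x3000, 0x8000


def rd32(mem: bytearray, addr: int) -> int:
    return struct.unpack_from("<I", mem, addr)[0]


def rd16(mem: bytearray, addr: int) -> int:
    return struct.unpack_from("<H", mem, addr)[0]


def build_mock_process(modules: list[tuple[str, int]]) -> bytearray:
    """Lay out PEB -> PEB_LDR_DATA -> LDR_DATA_TABLE_ENTRY list (32-bit, Flink only)."""
    mem = bytearray(MEM_SIZE)
    struct.pack_into("<I", mem, PEB_ADDR + 0x0C, LDR_ADDR)         # PEB.Ldr
    head = LDR_ADDR + 0x1C                                          # list head
    links = [ENTRY_AREA + i * 0x100 + 0x10 for i in range(len(modules))]
    for i, (full_name, dll_base) in enumerate(modules):
        entry = links[i] - 0x10                                     # start of the entry
        flink = links[i + 1] if i + 1 < len(links) else head        # last one loops to head
        struct.pack_into("<I", mem, links[i], flink)
        struct.pack_into("<I", mem, entry + 0x18, dll_base)         # DllBase
        text = full_name.encode("utf-16-le")
        buf = STRING_AREA + i * 0x100
        mem[buf:buf + len(text)] = text
        # FullDllName UNICODE_STRING: Length, MaximumLength, Buffer
        struct.pack_into("<HHI", mem, entry + 0x24, len(text), len(text) + 2, buf)
    struct.pack_into("<I", mem, head, links[0] if links else head)  # head.Flink
    return mem


def find_kernel32(mem: bytearray, peb: int) -> int | None:
    """Walk InInitializationOrderLinks and match FullDllName against kernel32.dll."""
    head = rd32(mem, peb + 0x0C) + 0x1C
    link = rd32(mem, head)                 # first entry's InInitializationOrderLinks
    while link != head:
        length = rd16(mem, link + 0x14)    # FullDllName.Length (entry + 0x24)
        buf = rd32(mem, link + 0x18)       # FullDllName.Buffer (entry + 0x28)
        name = mem[buf:buf + length].decode("utf-16-le")
        if name.lower().endswith("\\kernel32.dll"):
            return rd32(mem, link + 0x08)  # DllBase (entry + 0x18)
        link = rd32(mem, link)             # follow Flink
    return None


def second_entry_base(mem: bytearray, peb: int) -> int:
    """The pre-Windows 7 shortcut: assume the second entry is kernel32.dll."""
    first = rd32(mem, rd32(mem, peb + 0x0C) + 0x1C)
    return rd32(mem, rd32(mem, first) + 0x08)


WIN7_STYLE = [  # hypothetical paths and bases in a Windows 7-style order
    (r"C:\Windows\System32\ntdll.dll", 0x77000000),
    (r"C:\Windows\System32\KERNELBASE.dll", 0x75000000),
    (r"C:\Windows\System32\kernel32.dll", 0x76000000),
]
mem = build_mock_process(WIN7_STYLE)
print(hex(find_kernel32(mem, PEB_ADDR)), hex(second_entry_base(mem, PEB_ADDR)))
```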
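Parsing PE Export Data
Once the base address of kernel32.dll is found, the shellcode parses the PE export data in kernel32.dll to find exported symbols. The addresses of exported functions are reachable from the header as relative virtual addresses (RVAs) in the IMAGE_EXPORT_DIRECTORY, through the AddressOfFunctions, AddressOfNames, and AddressOfNameOrdinals arrays (Figure 19-2, p. 417). The shellcode searches AddressOfNames for a matching name at some index i, reads the ordinal at the same index i of AddressOfNameOrdinals, and uses that ordinal as an index into AddressOfFunctions to obtain the function's RVA. To keep the shellcode compact, it compares hashes of function names instead of the full name strings; the hash used is a 32-bit rotate-right-additive hash (Listings 19-5 and 19-6, pp. 418-419), which calculates a 32-bit value for each name.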
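Below is a Python sketch of both steps: the rotate-right-additive hash (for each character, rotate the running value right, then add the character; the terminating NULL is not hashed) and the three-array lookup. The rotation amount of 13 is an assumption here, chosen because it is the value widely used in Windows shellcode; with it, LoadLibraryA hashes to 0xEC0E4E8E. The ExportDirectory class and the sample table are hypothetical simplifications: in a real image the name array holds RVAs of strings, and all three arrays are read from memory at the DLL base plus their RVAs.

```python
from __future__ import annotations

from dataclasses import dataclass


def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits (0 < count < 32)."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Rotate-right-additive hash: ROR the running hash by 13, then add the character."""
    h = 0
    for ch in name.encode("ascii"):  # the terminating NULL is not hashed
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h


@dataclass
class ExportDirectory:
    """Hypothetical, simplified view of the three IMAGE_EXPORT_DIRECTORY arrays."""
    address_of_functions: list[int]      # function RVAs, indexed by ordinal
    address_of_names: list[str]          # names (a real image stores RVAs to strings)
    address_of_name_ordinals: list[int]  # parallel to address_of_names


def resolve_by_hash(dll_base: int, exports: ExportDirectory, wanted: int) -> int | None:
    """Return the absolute address of the export whose name hash equals `wanted`."""
    for i, name in enumerate(exports.address_of_names):
        if ror13_hash(name) == wanted:
            ordinal = exports.address_of_name_ordinals[i]            # name index -> ordinal
            return dll_base + exports.address_of_functions[ordinal]  # ordinal -> RVA
    return None


# Made-up export table and base address, for illustration only.
exports = ExportDirectory(
    address_of_functions=[0x1000, 0x2000, 0x3000],
    address_of_names=["GetProcAddress", "LoadLibraryA", "Sleep"],
    address_of_name_ordinals=[2, 0, 1],
)
print(hex(ror13_hash("LoadLibraryA")))
print(hex(resolve_by_hash(0x76000000, exports, ror13_hash("LoadLibraryA"))))
```

Storing only the 4-byte hash per needed function, instead of its name, is what keeps the shellcode small; the cost is that an analyst must hash candidate export names to learn which functions are being resolved.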
Shellcode Encoding
Shellcode must be embedded in the program before the exploit occurs, or passed to it along with the exploit. Exploits often target unsafe string functions such as strcpy and strcat, which do not enforce a maximum length and therefore allow buffer overflows. The shellcode must then look like valid data for the function: if it is copied with strcpy or strcat, it cannot contain NULL bytes, because the first NULL ends the copy and terminates the buffer overflow prematurely. Attackers therefore encode the payload so that it passes such filters, which also makes analysis more difficult.
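A single-byte XOR is one of the simplest encodings that can remove NULL bytes. The Python sketch below is a hypothetical example, not a scheme taken from a particular sample: it searches for a key that no payload byte equals (so no encoded byte becomes 0x00) and encodes the payload. Applying the same XOR again decodes it, which is the job of a small decoder stub placed in front of the encoded bytes at run time. The payload bytes are arbitrary placeholders.

```python
def choose_key(payload: bytes, bad: bytes = b"\x00") -> int:
    """Return the first one-byte XOR key whose output (and the key itself) avoids `bad`."""
    for key in range(1, 256):
        if key not in bad and all((b ^ key) not in bad for b in payload):
            return key
    raise ValueError("no single-byte key avoids every bad byte")


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with `key`; the same call both encodes and decodes."""
    return bytes(b ^ key for b in data)


# Arbitrary placeholder bytes with embedded NULLs (not real shellcode).
PAYLOAD = bytes([0x31, 0xC0, 0x50, 0x00, 0x68, 0x00, 0x00, 0x00])

key = choose_key(PAYLOAD)
encoded = xor_bytes(PAYLOAD, key)
print(f"key={key:#04x}", encoded.hex(), 0 in encoded)
assert xor_bytes(encoded, key) == PAYLOAD  # decoding restores the original bytes
```

For analysis the same property helps in reverse: once the key is recovered from the decoder stub, one XOR pass yields the original payload.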
Buffer Overflow Attacks
The return address is stored on the stack. Attackers want to overwrite it with another, malicious address that redirects execution to the shellcode. To do so they have to deal with two unknowns:
1. What is the distance between the overflowed buffer and the return address slot? Attackers have to guess this displacement.
2. What is the actual address of the shellcode? The shellcode sits in the buffer as part of the data, so attackers have to guess its address; NOP sleds increase the probability of a hit.
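The first unknown can be illustrated with a mock stack frame in Python. The sketch assumes a hypothetical 16-byte buffer followed directly by the saved EBP and the return address (no stack canary, no compiler padding, which real compilers may add and which is exactly why the distance has to be guessed), and models strcpy as a copy with no length check that stops after the first NULL byte. Filling exactly RET_OFFSET bytes before the new address replaces the return address; a wrong distance guess, or a NULL byte early in the input, leaves it intact.

```python
import struct

BUF_SIZE = 16              # hypothetical size of the local buffer
RET_OFFSET = BUF_SIZE + 4  # buffer, then saved EBP (4 bytes), then the return address


def make_frame(ret_addr: int) -> bytearray:
    """Mock frame: [buffer][saved EBP][return address], little-endian 32-bit."""
    frame = bytearray(BUF_SIZE + 8)
    struct.pack_into("<I", frame, RET_OFFSET, ret_addr)
    return frame


def unsafe_strcpy(frame: bytearray, src: bytes) -> None:
    """Copy up to and including the first NULL with no length check, like strcpy.

    The copy stops at the end of the mock frame only because the model has no
    memory beyond it.
    """
    end = src.index(0) + 1 if 0 in src else len(src)
    n = min(end, len(frame))
    frame[:n] = src[:n]


def return_address(frame: bytearray) -> int:
    return struct.unpack_from("<I", frame, RET_OFFSET)[0]


NEW_RET = 0x7FFDF010  # hypothetical guessed shellcode address (contains no NULL byte)
for filler in (RET_OFFSET - 4, RET_OFFSET):  # wrong distance guess, right guess
    frame = make_frame(0x00401234)
    unsafe_strcpy(frame, b"A" * filler + struct.pack("<I", NEW_RET))
    print(filler, hex(return_address(frame)))
```

With 16 filler bytes the new address only lands on the saved EBP slot; with 20 it replaces the return address.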
NOP Sleds
A NOP (no operation) instruction does nothing. A NOP sled is a long sequence of NOPs preceding the shellcode. It increases the exploit's likelihood of a hit by providing a range of addresses, any of which results in the shellcode executing. To avoid detection, a sled can use repeated instructions that increment or decrement registers instead of NOPs.
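The effect of a sled on the second unknown can be estimated with a small Python model. It treats the buffer as a list of byte offsets, assumes every sled byte is a one-byte instruction (0x90 NOP, or 0x40-0x4F, which are inc/dec of a 32-bit register in 32-bit mode), and counts how many possible jump targets slide into the first shellcode byte. The 32-byte shellcode is a hypothetical int3 placeholder.

```python
NOP = 0x90
# One-byte x86-32 opcodes usable in a sled: NOP, plus 0x40-0x47 (inc r32) and
# 0x48-0x4F (dec r32), harmless as long as the shellcode ignores those registers.
SLED_OPCODES = {NOP} | set(range(0x40, 0x50))


def hits_shellcode(buf: bytes, shellcode_offset: int, guess: int) -> bool:
    """Slide through sled bytes from `guess`; True if execution reaches the shellcode."""
    pc = guess
    while pc < shellcode_offset and buf[pc] in SLED_OPCODES:
        pc += 1  # each sled instruction is one byte long
    return pc == shellcode_offset


def count_hits(buf: bytes, shellcode_offset: int) -> int:
    """Number of jump targets inside `buf` that end up executing the shellcode."""
    return sum(hits_shellcode(buf, shellcode_offset, g) for g in range(len(buf)))


SHELLCODE = b"\xCC" * 32  # hypothetical placeholder payload
print(count_hits(SHELLCODE, 0))
print(count_hits(bytes([NOP]) * 96 + SHELLCODE, 96))
print(count_hits(bytes([0x90, 0x41, 0x49]) * 32 + SHELLCODE, 96))
```

Without a sled only a jump to offset 0 works (1 of 32 targets). A 96-byte sled, whether plain NOPs or a mix of NOP, inc ecx (0x41), and dec ecx (0x49), makes 97 of 128 targets succeed.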