Exercise 6: Buffer Overflow and return-into-libc Attacks

Technische Universität Darmstadt
Fachbereich Informatik
System Security Lab
Prof. Dr.-Ing. Ahmad-Reza Sadeghi
M.Sc. David Gens

Course: Secure, Trusted and Trustworthy Computing, WS 2017/2018

Organization

!!! BRING USB THUMB DRIVES; NO INTERNET ON LAB PCs !!!

- Submit a .txt or .pdf file with your solution (there will be no points for .doc/.docx).
- Include the .c files that were used to solve the exercise.
- Zip everything; the file name should include the last names of all group members, separated by hyphens.
- Send an email with the subject "[stc-ws17-ex06-sol] <last names of group members>" by 2.2.18, 23:59 at the latest, to david.gens@cs.tu-darmstadt.de.

1 Introduction

In this exercise we introduce the main principles of buffer overflow and return-into-libc attacks. In the practical assignment, you will examine vulnerable sample programs and exploit them by means of a buffer overflow and a return-into-libc attack.

Despite the large amount of research done over the past decades, many applications still suffer from buffer overflow vulnerabilities. This is due to insufficient security assurance during development and to unsafe languages like C/C++. Such applications use dangerous functions that perform no bounds checking while reading input from an untrusted user or an untrusted device, and therefore allow the adversary to launch buffer overflow attacks. The main goal of a buffer overflow attack is usually to launch a (root) shell for the adversary in order to gain full control over the system.

A great deal of attention has been paid to the W ⊕ X security model [4], which marks a memory page as either writable or executable, but never both. AMD and Intel even equip their newer processors with a non-executable bit that can be enabled for each memory page. With W ⊕ X, the adversary is no longer able to execute injected malicious code (that launches a shell to the adversary), because the injected code has to be placed into some writable (but not executable) memory area. However, a return-into-libc attack bypasses the W ⊕ X security model, as we will describe in Section 1.3. Before introducing the main principles of buffer overflow attacks in Section 1.2, we briefly recall the x86 architecture in Section 1.1.

[Figure 1: The stack frame. From high to low addresses it holds the function arguments, the return address, the saved base pointer, and the local variables. The stack pointer marks the top of the stack, which grows downwards.]

1.1 Intel x86 Architecture

The Intel x86 or IA-32 architecture [2] is a well-established instruction set architecture deployed in personal computers. One native machine word is 32 bits wide, and each word is stored in little-endian format (see footnote 1). Instructions are of variable length, and unaligned memory access is allowed. To enable program execution, x86 provides eight general-purpose registers (%eax, %ebx, %ecx, %edx, %esi, %edi, %ebp, and %esp), six segment registers, one status register (%eflags), and the instruction pointer (%eip). Each general-purpose register is 32 bits wide; some of them can also be accessed as 16-bit registers (e.g., %ax) or as 8-bit registers (e.g., %ah and %al).

The instruction pointer (%eip) holds the address of the next instruction to be executed. Usually, the instruction pointer is simply advanced past each instruction unless a branch instruction occurs. In that case, the offset specified by the branch instruction is added to the current value of the instruction pointer.

The traditional buffer overflow attack described by Aleph One [1] changes the flow of execution by overflowing a buffer allocated on the stack. Generally, the stack is a last in, first out (LIFO) memory area that provides two basic operations: pushing elements onto the stack (push instruction) and removing elements from the stack (pop instruction). Moreover, the stack is divided into individual stack frames (one stack frame per function call), and each function call sets up a new stack frame on top of the stack. As depicted in Figure 1, a stack frame usually holds the function arguments, the return address, the saved (old) base pointer (%ebp), and the local variables. A special-purpose register, the stack pointer (%esp), points to the top of the stack. On the x86 architecture the stack grows downwards (from high memory addresses to low memory addresses), and therefore the stack pointer points to the lowest address of the stack. Local variables and function arguments are accessed via an offset to the base pointer (which is therefore sometimes referred to as the local frame pointer). Upon function return, control transfers to the code pointed to by the return address, i.e., back to the caller of the function.

For this exercise, you don't need full knowledge of assembler programming. For buffer overflow attacks, however, it is very important that you understand the calling convention implemented on the x86 architecture. You need to know how the stack is arranged and for what purpose the %ebp and %esp registers are used.

1.2 Conventional Buffer Overflow Attack

The main goal of a conventional buffer overflow attack [1] is to subvert the usual execution flow of a program by redirecting it to malicious code that was not originally placed there by the programmer. Basically, the attack consists of two tasks: (i) injecting new malicious code into some writable memory area and (ii) changing a code pointer in such a way that it points to the injected malicious code. The injected malicious code usually launches a shell to the adversary and is therefore often referred to as shellcode. The preferred code pointer for the attack is the return address on the stack. However, the saved base pointer can also be used as an attack target in case the return address cannot be overwritten. This attack is referred to as frame pointer overwriting [3]. Figure 2 depicts a conventional buffer overflow on the stack, which is described in the following.

Footnote 1: Little-endian format means that the least significant byte (LSB) is stored first, i.e., at the lowest address. For instance, if the four characters W, O, R, D form one 32-bit word with W as the most significant byte, they are stored in memory in the order D, R, O, W.
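The little-endian footnote can be made concrete with a few lines of C. The following is only a minimal sketch: the helper names store_le32 and load_le32 are invented for this illustration, and the code spells out the byte order explicitly so that it behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value the way x86 does: least significant byte first.
 * (store_le32 is a hypothetical helper name used only in this sketch.) */
void store_le32(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    for (size_t i = 0; i < 4; i++) {
        out[i] = (unsigned char)((value >> (8 * i)) & 0xFFu);
    }
}

/* Reassemble the 32-bit value from its little-endian byte sequence. */
uint32_t load_le32(const unsigned char in[4])
{
    assert(in != NULL);
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

Calling store_le32(0x574F5244, b), where 0x57, 0x4F, 0x52 and 0x44 are the ASCII codes of W, O, R and D, leaves b holding the bytes D, R, O, W in ascending address order. The same reversal applies to every address written into an overflow payload.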

[Figure 2: Conventional Buffer Overflow Attack. The stack frame of the vulnerable function holds, from low to high addresses, the local buffer and the saved base pointer (both overwritten with arbitrary data), the new return address, and the shellcode. The numbers 1 to 7 in the figure refer to the steps below.]

1. The vulnerable program is started by an authorized user.
2. After the program is initialized, it expects user input, which will afterwards be stored in a local buffer on the stack.
3. The adversary, who has access to the program, provides input that is longer than the program expects. The input consists of arbitrary data (to fill the buffer and overwrite the saved base pointer), a new return address, and new code. The new return address points to the beginning of the injected code.
4. After reading the user input, the current function continues execution until a return instruction is issued.
5. The return instruction of the function is reached.
6. The processor executes the return instruction and redirects control to the injected code, because the adversary was able to change the return address in step 3.
7. The injected code is executed and performs its malicious behavior, e.g., it typically launches a shell that the adversary can use to execute further commands on the machine.

1.3 Return-into-libc

If the W ⊕ X model [4] is enabled by the operating system (and supported by the hardware), the adversary is no longer able to execute injected code, since a memory page is marked either writable or executable. Therefore, a more sophisticated attack was proposed that bypasses defense mechanisms like W ⊕ X by using code that already resides in the process's image. Useful code can be found in particular in the Unix C library libc, which is linked to nearly every Unix program and provides a number of functions that are useful to the adversary. Hence, the return address is changed to point to a valid function in libc, like system or execve. The attack is referred to as return-into-libc [6].

Figure 3 depicts a return-into-libc attack, which is described in the following. The adversary overflows a local buffer as in Section 1.2 and, in step 3, changes the return address to point to the libc function system. The system function allows the execution of one command with arguments. If, for instance, the goal of the attack is to launch a shell, the adversary could execute the command /bin/sh. Above the address of the system function, the adversary places the address of the exit function, another libc function that terminates the program; system will use this value as its own return address. Further, another pointer is needed to reference the argument for the system function, which is effectively the string /bin/sh; this string can be brought into the process image by setting up an environment variable. Finally, when the vulnerable function returns in step 6, the system function is invoked with the argument /bin/sh to launch a new shell. After completion of the system function in step 7 (i.e., once the adversary closes the shell), the exit function from libc is called (step 8) to terminate the program.
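The order of the three words described above follows from the 32-bit x86 calling convention: on entry, a function expects its return address at the top of the stack and its first argument directly above it. The following is a minimal sketch of that layout only; layout_ret2libc and RET2LIBC_WORDS are names invented for this illustration, and all addresses are caller-supplied placeholders, since real values depend on the target process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit words that replace the return address and the two
 * words above it. (Hypothetical name used only in this sketch.) */
enum { RET2LIBC_WORDS = 3 };

/*
 * Fill `words` in ascending address order, starting at the slot of the
 * overwritten return address. The three addresses are placeholders that
 * the caller has to determine for the concrete process.
 */
void layout_ret2libc(uint32_t system_addr, uint32_t exit_addr,
                     uint32_t binsh_addr, uint32_t words[RET2LIBC_WORDS])
{
    assert(words != NULL);
    words[0] = system_addr; /* replaces the return address: ret jumps to system() */
    words[1] = exit_addr;   /* seen by system() as its own return address */
    words[2] = binsh_addr;  /* seen by system() as its first argument */
}
```

The filler that precedes these words (to cover the local buffer and the saved base pointer) is left out on purpose, because its length depends on the vulnerable function.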

[Figure 3: Return-into-libc attack that launches a shell to the adversary. The overflowed stack frame of the vulnerable function holds, from low to high addresses, arbitrary data over the local buffer and the saved base pointer, the address of system() in place of the old return address of the vulnerable function, the address of exit() as the return address of system(), and a pointer to the string "/bin/sh", which is stored among the environment variables ($SHELL=/bin/sh). The numbers 1 to 8 in the figure refer to the steps in the text.]

However, return-into-libc attacks are subject to some constraints. First, only those functions that reside in libc can be called by the adversary. If the designers of libc removed functions that are of particular interest to the adversary (e.g., system, execve, etc.), crafting a return-into-libc attack would become more difficult. Second, the adversary can only execute straight-line code, i.e., functions can only be invoked one after the other.

1.4 GNU Debugger gdb

In the practical exercise, we will often use the GNU debugger gdb. You will find a quick gdb reference card at http://users.ece.utexas.edu/~adnan/gdb-refcard.pdf.

You can start gdb as follows:

    gdb program             Start program under control of gdb
    gdb -c core             Examine a core dump file with gdb

The following gdb commands are necessary for this exercise:

    quit                    Exit gdb
    list                    Show lines of the source code
    disassemble function    Disassemble a function
    disassemble main        Disassemble the main function
    break                   Set a breakpoint
    break 5                 Set a breakpoint at line number 5
    break main              Set a breakpoint at the main function
    run [arglist]           Run the program with arguments arglist
    step                    Execute until another line has been reached
    info registers          Print the processor register values
    print [expression]      Show the value of expression
    print $esp              Show the value of the %esp register
    x/[nuf] expression      Examine memory at address expression, where n is the
                            number of units to display, u the unit size, and f the
                            printing format
    x/4wx $esp              Show the top four (4) 32-bit words (w) on the stack
                            ($esp) in hexadecimal (x) notation
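To get a feeling for what x/4wx reports, the following sketch formats a run of bytes as 32-bit little-endian words in hexadecimal. It is only an approximation written for this sheet: format_words_hex is an invented helper name, and real gdb output additionally prefixes each row with the memory address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format `nwords` 32-bit little-endian words starting at `mem` as
 * space-separated hex values. Returns 0 on success, -1 if `out` is too
 * small. (format_words_hex is a hypothetical helper for this sketch.)
 */
int format_words_hex(const unsigned char *mem, size_t nwords,
                     char *out, size_t outsz)
{
    assert(mem != NULL && out != NULL);
    if (outsz == 0) {
        return -1;
    }
    memset(out, 0, outsz);

    size_t used = 0;
    for (size_t i = 0; i < nwords; i++) {
        const unsigned char *p = mem + 4 * i;
        /* Assemble the word with the least significant byte first. */
        uint32_t w = (uint32_t)p[0]
                   | ((uint32_t)p[1] << 8)
                   | ((uint32_t)p[2] << 16)
                   | ((uint32_t)p[3] << 24);
        int n = snprintf(out + used, outsz - used, "%s0x%08lx",
                         i ? " " : "", (unsigned long)w);
        if (n < 0 || (size_t)n >= outsz - used) {
            return -1;
        }
        used += (size_t)n;
    }
    return 0;
}
```

For the eight bytes of the string ABCDEFGH and nwords set to 2, the result is 0x44434241 0x48474645. Each word appears byte-reversed relative to the order in memory, which is worth keeping in mind when looking for 0x41414141 or for a return address in a stack dump.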

2 Practical Assignments

In the following you will launch buffer overflow and return-into-libc attacks. Note that Exercises 2.1 and 2.2 are largely inspired by an existing Buffer Overflow Primer (http://www.securitytube.net/groups?operation=view&groupid=4).

Please take the following steps before starting with the exercise:

1. To enforce the generation of core files, set the maximum size for core files to unlimited as follows:

    $ ulimit -c unlimited

   Usually a core file will be created if, for instance, a segmentation fault occurs during program execution. In such a case the core file holds the return address that caused the problem. You can analyze core files with the GNU debugger gdb.

2. Address Space Layout Randomization (ASLR) randomizes the base address of the stack. Since we need to know the precise addresses of libc functions and of the injected shellcode, launching a buffer overflow or a return-into-libc attack becomes more difficult if ASLR is enabled. Hence, before you start with the practical exercises, disable ASLR as follows:

    $ sudo nano /proc/sys/kernel/randomize_va_space

   and replace the 2 with a 0.

2.1 Simple Buffer Overflow

In your Buffer-Overflow folder you will find a program (vulnerable.c) suffering from a buffer overflow vulnerability. Compile the program as follows:

    $ gcc vulnerable.c -o vulnerable -ggdb -mpreferred-stack-boundary=2 \
          -fno-stack-protector

We use the -ggdb option to include debugging information. The -fno-stack-protector option disables return address protection mechanisms like StackGuard or ProPolice. Finally, the -mpreferred-stack-boundary=2 option aligns the stack in dword-size (4-byte) increments.

1. Explain why the program suffers from a buffer overflow vulnerability.

2. How is the stack arranged directly before the buffer overflow occurs?

    ... Bytes
    ... Bytes
    ... Bytes

3. Now, overflow the local buffer with A (0x41) characters and change the return address to a value of 0x41414141! How many A characters are at least necessary to change the return address to 0x41414141? (Hint: If you don't want to type all A characters manually, you can use perl to print as many A characters as you desire:

    $ ./vulnerable `perl -e 'print "A" x 27'`

   )

4. After you overflow the local buffer, the program crashes. Explain why a segmentation fault occurs.

5. The I_Never_Execute function within the vulnerable program is never called and will therefore never execute. Your task now is to change the return address of the main function by means of a buffer overflow in order to transfer execution to the I_Never_Execute function. Describe the steps you took (with gdb) and write down the start address of the I_Never_Execute function!

   Hint: You will probably want to insert the address of the I_Never_Execute function on the command line. This can be accomplished with perl as follows:

    $ ./vulnerable `perl -e 'print "A" x 27 . "\xd3\xc2\xb1\xa0"'`
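The escape sequence in the hint above lists the address bytes in little-endian order: \xd3\xc2\xb1\xa0 stands for the example address 0xa0b1c2d3. As a minimal sketch, the conversion can be written as follows; addr_to_escapes and ESCAPED_ADDR_SIZE are names invented for this illustration, and the address is the placeholder from the hint, not the address of any function in vulnerable.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Four "\xNN" escapes plus the terminating NUL byte.
 * (Hypothetical constant used only in this sketch.) */
enum { ESCAPED_ADDR_SIZE = 4 * 4 + 1 };

/*
 * Write the four bytes of `addr` as \xNN escapes, least significant byte
 * first, which is the order in which x86 keeps the address in memory.
 */
void addr_to_escapes(uint32_t addr, char out[ESCAPED_ADDR_SIZE])
{
    assert(out != NULL);
    for (size_t i = 0; i < 4; i++) {
        unsigned byte = (unsigned)((addr >> (8 * i)) & 0xFFu);
        /* Each escape takes exactly four characters; the NUL written by
         * snprintf is overwritten by the next escape. */
        snprintf(out + 4 * i, 5, "\\x%02x", byte);
    }
    assert(strlen(out) == ESCAPED_ADDR_SIZE - 1);
}
```

For 0xa0b1c2d3 the function yields the text \xd3\xc2\xb1\xa0, which can be pasted into the perl one-liner. Addresses that contain a zero byte or whitespace cannot be passed this way on the command line, because the shell and the C string handling would cut the argument short.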

2.2 Exploit a Vulnerable Program

As you may have recognized in the previous subsection, we are able to launch a buffer overflow attack on the vulnerable.c program. We are able to crash the program and to invoke the I_Never_Execute function. Usually, however, the main goal of a buffer overflow attack is to spawn a new shell for the adversary. For this we need so-called shellcode. Aleph One [1] describes in detail how shellcode can be written for the x86 architecture. However, the most difficult part of a buffer overflow attack is to transfer execution to the injected shellcode.

For the following task, we use the 24 byte shellcode from http://www.exploit-db.com/exploits/13444/. Our 24 byte shellcode should be stored in an environment variable (called $ENV). Afterwards the environment variable is used as input to the vulnerable program. Instead of launching the attack on the vulnerable.c program, we will use the target.c program by Shacham [5], which we also need for the return-oriented programming exercise. This program establishes the stack frame for our target buffer in a fixed, pre-allocated memory area. Thus, we are able to bypass randomization techniques in use. Nevertheless, this program suffers from the same buffer overflow vulnerability as the vulnerable.c program. Compile the target.c program as follows:

    $ gcc target.c -o target -ggdb -mpreferred-stack-boundary=2 \
          -fno-stack-protector

After you exploit the program, you want to have full control (root privileges) over the system. Hence, change the permissions of the target program as follows:

    $ sudo chown root target
    $ sudo chmod +s target

The first command changes the owner of the target program to root. The second command sets the setuid bit, which allows ordinary users to run the target program with the privileges of the owner of the file (here: root). In practice, many programs are set up in such a way. For instance, the setuid bit on the passwd command gives an ordinary user the root privileges needed to change his or her password.

1. Fill the exploit.c program with an appropriate function that inserts the 24 byte shellcode into the allocated memory area at position buffer+4!

2. Compile and run the exploit.c program and check whether the environment variable was established. Write down the command that you used to check the value of $ENV!

3. As you may have recognized, the return address stored in the environment variable is 0x41414141. You have to change the return address so that it holds the start address of the 24 byte shellcode (i.e., the address of buffer+4 in the exploit.c program) as follows:

   (a) Start the target program under control of gdb!
   (b) Set a breakpoint directly before the vulnerable function (i.e., the function where the buffer overflow occurs) is called!
   (c) Run the vulnerable program with the environment variable ($ENV) as input!
   (d) Once the breakpoint has been reached, examine the stack and find the return address of the overflow() function! Write down the return address of the overflow() function and describe how you determined it!
   (e) Make one step in the program! (The buffer has now been overflowed.)
   (f) Once again, examine the stack and check whether the return address has been changed to 0x41414141. Further, find out and write down the start address of the injected shellcode!
   (g) Now close gdb and change the return address in the exploit.c program to the address you determined in the previous step!

4. Recompile and run the exploit.c program! Afterwards start the target program with input $ENV, which should now launch a root shell for you. To check whether you really became root, issue the command

    $ whoami

   before and after the buffer overflow attack! Now save a copy of your exploit.c program!
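The effect of the setuid bit described above can be observed from inside a program by comparing its real and its effective user ID. The following is a minimal sketch using the POSIX calls getuid and geteuid; the helper names runs_as_root, privileges_differ and describe_ids are invented for this illustration and are not part of target.c.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the process runs with root's effective user ID (0). */
int runs_as_root(void)
{
    return geteuid() == 0;
}

/* Returns 1 if real and effective user ID differ, as is the case when an
 * ordinary user starts a program whose setuid bit is set. */
int privileges_differ(void)
{
    return getuid() != geteuid();
}

/* Write a one-line summary of both IDs into `out`. */
void describe_ids(char *out, size_t outsz)
{
    assert(out != NULL && outsz > 0);
    snprintf(out, outsz, "real uid=%ld effective uid=%ld",
             (long)getuid(), (long)geteuid());
}
```

When an ordinary user starts a binary that is owned by root and has the setuid bit set, the real user ID stays that of the user while the effective user ID becomes 0, so both helper functions return 1.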

2.3 Return-into-libc

As we mentioned in Section 1.3, conventional buffer overflow attacks cannot be launched if the W ⊕ X security model is in use, because the adversary is no longer able to execute injected shellcode. Therefore, in the following we will use only code that resides in the process's image and launch a shell by means of a return-into-libc attack. Ideally, we want to issue the call system("/bin/sh"), which should launch a shell for us without injecting any code of our own. After the shell is closed by the adversary, the program should terminate without a segmentation fault. For this purpose we make use of the exit function, as described in Section 1.3.

1. Since we want to invoke the system function from libc, start the target program with gdb and find out the address of the system function. Write down the precise address of the system function!

2. To close the program, we need the address of the exit function. Determine and write down the appropriate address!

3. You also need the address of the string /bin/sh. The easiest way to find such a string in the address space is to store it in an environment variable. Afterwards you just need the address of the environment variable. To this end, complete the following steps:

   (a) Create an environment variable named $MYSHELL with the string /bin/sh as content! (Hint: Use the export command on the command line.)
   (b) In your code folder you will find a program getenv.c that expects the name of an environment variable as input and returns the corresponding address. Run the getenv.c program and write down the address of $MYSHELL!

4. Now you have all the addresses you need to launch the return-into-libc attack. Run the attack and write down the payload you used! As before, check whether you are root with the command

    $ whoami
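Looking up the address of an environment variable, as in step 3, boils down to a call to the standard getenv function, which returns a pointer to the value inside the process's own environment. The following is only a sketch of how such a lookup might look; it is not the getenv.c from the code folder, and the helper names env_value_address and format_env_address are invented for this illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the address of the value of environment variable `name`,
 * or NULL if the variable is not set. */
const char *env_value_address(const char *name)
{
    assert(name != NULL);
    return getenv(name);
}

/* Render "<name> is at <address>" into `out`.
 * Returns 0 on success, -1 if the variable is unset or `out` is too small. */
int format_env_address(const char *name, char *out, size_t outsz)
{
    assert(out != NULL);
    const char *value = env_value_address(name);
    if (value == NULL) {
        return -1;
    }
    int n = snprintf(out, outsz, "%s is at %p", name, (void *)value);
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

The address reported by a separate helper process is only a starting point: the position of the environment on the stack also depends on other data placed there, such as the program name, so the address inside the target process may be shifted by a few bytes and should be verified with gdb.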

References

[1] Aleph One. Smashing the stack for fun and profit. Phrack Magazine, 49(14), 1996.
[2] Intel Corporation. Intel 64 and IA-32 architectures software developer's manuals. http://www.intel.com/products/processor/manuals/.
[3] klog. The frame pointer overwrite. Phrack Magazine, 55(9), 1999.
[4] PaX Team. http://pax.grsecurity.net/.
[5] H. Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In CCS '07: Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 552-561. ACM, 2007.
[6] Solar Designer. "return-to-libc" attack. Bugtraq, 1997.