PRESENTED BY: SANTOSH SANGUMANI & SHARAN NARANG

Table of contents
- Introduction
- Binary Disassembly
- Return Address Defense
- Prototype Implementation
- Experimental Results
- Conclusion

Buffer Overflow Attacks
- Lack of array bound checking in the compiler results in:
  - Overwriting the return address on the stack.
  - Corrupting memory pointer variables on the stack.
- Solutions:
  - Build bound checking mechanisms into the compiler.
  - Require applications to strictly follow programming guidelines.
  - Implement a binary rewriting solution that provides the same level of protection.
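To make the first failure mode concrete, here is a minimal sketch that models a stack frame as a byte array. The 16-byte layout (8-byte buffer, saved EBP, return address), the addresses, and the helper names are all assumptions made up for illustration; they are not taken from the slides.

```python
import struct

# Hypothetical frame layout, lowest address first (an assumption):
# 8-byte local buffer | 4-byte saved EBP | 4-byte return address.
BUFFER_SIZE = 8
RET_OFFSET = BUFFER_SIZE + 4

def make_frame(return_address):
    """Build the toy frame with a made-up saved EBP and the given return address."""
    frame = bytearray(16)
    struct.pack_into("<I", frame, BUFFER_SIZE, 0x0012FF80)  # made-up saved EBP
    struct.pack_into("<I", frame, RET_OFFSET, return_address)
    return frame

def unchecked_copy(frame, data):
    """Copy data to the start of the buffer with no bounds check, as strcpy would."""
    frame[0:len(data)] = data

def read_return_address(frame):
    """Read back the 4-byte little-endian return address slot."""
    return struct.unpack_from("<I", frame, RET_OFFSET)[0]

frame = make_frame(0x00401000)
# 12 filler bytes cover the buffer and saved EBP; the next 4 land on the return address.
unchecked_copy(frame, b"A" * 12 + struct.pack("<I", 0xDEADBEEF))
print(hex(read_return_address(frame)))  # prints 0xdeadbeef
```

An input that fits in the 8-byte buffer leaves the return address slot untouched; only the oversized copy reaches it.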

Challenges
- Where should the protection instructions be inserted?
- How to insert the instructions without disturbing existing code?

Binary Disassembly
- Identifies the boundary of every procedure in the input program
- Linear sweep vs. recursive algorithms
- Challenges:
  - Data embedded in the code region
  - Variable instruction size
  - Indirect branch instructions
  - Callback functions
  - Hand-crafted assembly code

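Binary Disassembly
Consider the following code sequence:
    0x0F 0x85 0xC0 0x0F 0x85
Decoded from the first byte:
    0x0F 0x85       jne offset
Decoded from the second byte:
    0x0F            (not decoded)
    0x85 0xC0       test eax, eax
    0x0F 0x85       jne offset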
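The ambiguity can be reproduced with a toy decoder that knows only the two encodings in this sequence (0F 85 followed by a 32-bit displacement for jne, and 85 C0 for test eax, eax). This is a sketch, not a real x86 decoder, and the four displacement bytes appended after the slide's five bytes are an assumption added so that the second jne is complete.

```python
# The five bytes from the slide plus a hypothetical displacement of 0x10.
SLIDE_BYTES = bytes([0x0F, 0x85, 0xC0, 0x0F, 0x85, 0x10, 0x00, 0x00, 0x00])

def decode_from(code, start):
    """Decode linearly from `start`, returning (offset, text) pairs."""
    out = []
    i = start
    while i < len(code):
        if code[i:i + 2] == b"\x0f\x85" and i + 6 <= len(code):
            # jne rel32: two opcode bytes plus a signed 4-byte displacement
            rel = int.from_bytes(code[i + 2:i + 6], "little", signed=True)
            out.append((i, f"jne {rel:+#x}"))
            i += 6
        elif code[i:i + 2] == b"\x85\xc0":
            out.append((i, "test eax, eax"))
            i += 2
        else:
            # Anything else is outside this toy decoder; emit it as a raw byte.
            out.append((i, f"db {code[i]:#04x}"))
            i += 1
    return out

for start in (0, 1):
    print(start, decode_from(SLIDE_BYTES, start))
```

Starting at offset 0 swallows the test instruction into the displacement of a bogus jne; starting at offset 1 yields test eax, eax followed by jne +0x10.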

Disassembly Engine
- Binary rewriting inserts protection code at the start of each procedure.
- Need to identify as many code bytes as possible.
- Original program semantics must be maintained.

Disassembly Engine Implementation
- Step 1: Dispatch tables. Identify potential address bytes in the dispatch table.
- Step 2: Main entry point. Perform control flow analysis. Identify code and data sections.
- Step 3: Callback functions. Identify interesting pieces of code associated with callback functions.

Disassembly Engine Implementation
- Step 4: Reset points. Parse instructions from the reset point until an unconditional branch is encountered.
- Step 5: Unconditional branch instructions. All code sequences end with an unconditional branch (jmp or ret). Bytes immediately after an unconditional branch must be either:
  - data bytes, or
  - a code byte that has been marked as a function entry point.
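The rule in Steps 4 and 5 (decode until an unconditional branch, then continue only from known targets) can be sketched as a worklist traversal. The sketch below assumes a tiny subset of x86 (nop, ret, short jmp, short je, near call) and a made-up 7-byte program; it illustrates the general technique and is not the presenters' engine.

```python
def recursive_traversal(code, entry_points):
    """Return the set of byte offsets reached as code from the entry points."""
    code_bytes = set()
    worklist = list(entry_points)
    while worklist:
        pc = worklist.pop()
        while 0 <= pc < len(code) and pc not in code_bytes:
            op = code[pc]
            if op == 0x90:                      # nop
                size, targets, stop = 1, [], False
            elif op == 0xC3:                    # ret: unconditional, ends the run
                size, targets, stop = 1, [], True
            elif op in (0xEB, 0x74):            # jmp rel8 / je rel8
                if pc + 2 > len(code):
                    break
                rel = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
                size, targets, stop = 2, [pc + 2 + rel], op == 0xEB
            elif op == 0xE8:                    # call rel32
                if pc + 5 > len(code):
                    break
                rel = int.from_bytes(code[pc + 1:pc + 5], "little", signed=True)
                size, targets, stop = 5, [pc + 5 + rel], False
            else:
                break                           # opcode outside the toy subset
            code_bytes.update(range(pc, pc + size))
            worklist.extend(targets)
            if stop:
                break                           # do not decode past jmp or ret
            pc += size
    return code_bytes

# Hypothetical program: je +3 | jmp +2 | 0xFF (data) | nop | ret
PROGRAM = bytes([0x74, 0x03, 0xEB, 0x02, 0xFF, 0x90, 0xC3])
print(sorted(recursive_traversal(PROGRAM, [0])))  # prints [0, 1, 2, 3, 5, 6]
```

Offset 4 sits immediately after an unconditional jmp and is never a branch target, so it is left unmarked instead of being decoded as an instruction.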

Where to insert RAD code?
- Interesting functions
- Function prolog: a copy of the return address is saved in the return address repository (RAR).
- Function epilog: check the return address on the stack against the address saved in the RAR.

RAD Code
- A JMP instruction transfers control to the RAD prolog, which then executes the function prolog.
- For the epilog, the following order is observed:
  - Function epilog until the RET instruction.
  - RAD epilog checking code.
  - RET instruction if there are no problems.
- Some instructions in the function need to be replaced by a JMP instruction (5 bytes).
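The save-then-check behaviour of the RAD prolog and epilog can be modelled in a few lines. This sketch assumes the RAR behaves like a simple last-in, first-out list and that a mismatch is reported by raising an exception; the class and method names are hypothetical, and the slides do not say how the real RAR is organized or how a violation is handled.

```python
class ReturnAddressRepository:
    """Toy model of the RAR: a LIFO list of saved return addresses (an assumption)."""

    def __init__(self):
        self._saved = []

    def prolog(self, return_address):
        """RAD prolog: keep a copy of the return address found on the stack."""
        self._saved.append(return_address)

    def epilog(self, return_address_on_stack):
        """RAD epilog: compare the stack's return address with the saved copy."""
        expected = self._saved.pop()
        if return_address_on_stack != expected:
            raise RuntimeError(
                f"return address changed: saved {expected:#x}, "
                f"found {return_address_on_stack:#x}"
            )
        return return_address_on_stack  # safe to execute RET


rar = ReturnAddressRepository()
rar.prolog(0x00401000)            # on function entry
print(hex(rar.epilog(0x00401000)))  # prints 0x401000; RET may proceed
```

If the value on the stack had been overwritten between the two calls, epilog would raise instead of returning.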

Prolog
The function prolog of an interesting function comprises 3 instructions:
    push ebp        // 1 byte
    mov ebp, esp    // 2 bytes
    sub esp, x      // 3-6 bytes
Thus, the function prolog contains at least 6 bytes.
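Because the prolog is at least 6 bytes, a 5-byte JMP fits over it. The sketch below shows one way such a patch could be computed, using the standard E9 rel32 encoding (displacement measured from the end of the JMP). The offsets, the NOP padding of the leftover byte, and the function names are assumptions for illustration, not details from the slides.

```python
import struct

JMP_REL32_SIZE = 5  # opcode E9 plus a 4-byte signed displacement

def encode_jmp_rel32(source, target):
    """Encode `jmp target` for a JMP placed at address `source`."""
    rel = target - (source + JMP_REL32_SIZE)
    return b"\xE9" + struct.pack("<i", rel)

def patch_prolog(code, func_offset, stub_offset, prolog_size):
    """Overwrite the prolog with a JMP to the stub and return the displaced bytes."""
    if prolog_size < JMP_REL32_SIZE:
        raise ValueError("prolog too short to hold a 5-byte JMP")
    displaced = bytes(code[func_offset:func_offset + prolog_size])
    padding = b"\x90" * (prolog_size - JMP_REL32_SIZE)  # NOP fill (an assumption)
    code[func_offset:func_offset + prolog_size] = (
        encode_jmp_rel32(func_offset, stub_offset) + padding
    )
    return displaced

# push ebp; mov ebp, esp; sub esp, 0x10  (1 + 2 + 3 = 6 bytes), then filler
code = bytearray(b"\x55\x8B\xEC\x83\xEC\x10" + b"\x90" * 4)
displaced = patch_prolog(code, func_offset=0, stub_offset=0x100, prolog_size=6)
print(code[:6].hex(), displaced.hex())  # prints e9fb00000090 558bec83ec10
```

The displaced bytes are what the stub would need to execute after the RAD prolog so that the original function prolog still runs.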

Epilog
Stack frame de-allocation uses the following instructions:
    mov esp, ebp    // 2 bytes
    pop ebp         // 1 byte
    ret             // 1 byte
Or:
    leave           // 1 byte
    ret             // 1 byte
Thus, the function epilog takes only 2-4 bytes of instructions, fewer than the 5 bytes a JMP needs. Some more instructions prior to the stack frame de-allocation instructions must therefore be replaced.

Epilog
- In some cases, it can be difficult to find instructions to be replaced.
- Use an interrupt to solve this problem:
  - Replace the first byte of the instruction prior to RET with an INT 3 instruction.
  - The exception handler performs the return address check.
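A minimal sketch of the INT 3 fallback: one byte is overwritten with 0xCC (the one-byte INT 3 opcode), the original byte is remembered, and a handler performs the check when the breakpoint fires. The table of saved bytes, the function names, and the handler's return value are assumptions; how the real handler resumes execution is not covered by the slides.

```python
INT3 = 0xCC  # one-byte breakpoint opcode

def plant_breakpoint(code, offset, saved_bytes):
    """Replace the byte at `offset` with INT 3, remembering the original byte."""
    saved_bytes[offset] = code[offset]
    code[offset] = INT3

def handle_breakpoint(offset, saved_bytes, expected_return, return_on_stack):
    """Model of the exception handler: check the return address, then hand back
    the original byte so the displaced instruction can still be carried out."""
    if offset not in saved_bytes:
        raise KeyError("breakpoint was not planted by the rewriter")
    if return_on_stack != expected_return:
        raise RuntimeError("return address on the stack was modified")
    return saved_bytes[offset]

# Hypothetical epilog: nop; leave; ret. The instruction prior to RET is leave (C9).
code = bytearray(b"\x90\xC9\xC3")
saved = {}
plant_breakpoint(code, 1, saved)
print(code.hex(), hex(handle_breakpoint(1, saved, 0x401000, 0x401000)))
# prints 90ccc3 0xc9
```

Only a single byte is needed, which is why this works even when fewer than 5 replaceable bytes precede the RET.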

Prototype Implementation
We will discuss:
- Software architecture
- RAR initialization
- Limitations of the implementation

Software Architecture

A high-level picture
[Figure: memory layout. Header; .CODE section (code + interleaved data).]

A high-level picture
[Figure: memory layout with the added .RAD section. Header; .CODE section (code + interleaved data); .RAD section. Legend: existing stuff, newly added stuff. Labels: RAD (set R during run-time), mine zones, RAR (set RW).]

RAR Initialization
- Create a mine section to protect the Return Address Repository (RAR).
- Call the Win32 API VirtualProtect().
- Problem: how to call it?
  - Case a: It is called by the target program. Solution: it resides in the program's Import Address Table.
  - Case b: It is not loaded in the program's address space. Solution: the idea for loading it is derived from virus code!

Limitations
Limitations of the disassembler. Two types of problems:
1. Missed functions
   - This can happen either when a sequence of bytes is identified fully as code or when it is identified as data.
   - Usually happens with callback functions.
2. Falsely identified functions
   - The RAR might overflow.
   - Can happen with hand-crafted assembly.

Limitations
Limitations of RAD:
- It helps only when the return address is modified.
- No protection if the GOT or IAT is modified, e.g. printf might point to malicious code.
- Multithreaded applications are not handled.
- Self-modifying code cannot be handled.

Results
- Space overhead
- Disassembly accuracy

Results
- Disassembly accuracy (cont'd)
- Runtime overhead

Final Thoughts
- Pros:
  - It works on binaries, hence it can protect legacy applications.
  - Runtime overhead is quite reasonable.
  - The implementation and problems are described in good detail.
- Cons:
  - It protects only against stack buffer overflow attacks.
  - Some cases go unhandled because disassembly is less than 100% accurate.

THANK YOU