Overview REWARDS TIE HOWARD Summary CS 6V Data Structure Reverse Engineering. Zhiqiang Lin

Similar documents
TIE: Principled Reverse Engineering of Types in Binary Programs! JongHyup Lee, Thanassis Avgerinos, and David Brumley

15-213/18-243, Spring 2011 Exam 1

CS , Spring 2004 Exam 1

CS , Fall 2004 Exam 1

CS , Fall 2001 Exam 1

CSE351 Autumn 2012 Midterm Exam (5 Nov 2012)

CSC 405 Computer Security Shellcode

The IA-32 Stack and Function Calls. CS4379/5375 Software Reverse Engineering Dr. Jaime C. Acosta

Labeling Library Functions in Stripped Binaries

Identifying and Analyzing Pointer Misuses for Sophisticated Memory-corruption Exploit Diagnosis

Computer Systems Organization V Fall 2009

Outline. x86 Architecture

UW CSE 351, Winter 2013 Midterm Exam

CS / ECE , Spring 2010 Exam 1

Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p

CS , Fall 2002 Exam 1

CMSC 313 Lecture 12 [draft] How C functions pass parameters

Machine Programming 3: Procedures

Assembly I: Basic Operations. Jo, Heeseung

THEORY OF COMPILATION

CMSC 313 Lecture 12. Project 3 Questions. How C functions pass parameters. UMBC, CMSC313, Richard Chang

ASSEMBLY I: BASIC OPERATIONS. Jo, Heeseung

Function Call Convention

Reverse Engineering II: Basics. Gergely Erdélyi Senior Antivirus Researcher

Binghamton University. CS-220 Spring X86 Debug. Computer Systems Section 3.11

Machine-level Programming (3)

Full Name: CISC 360, Fall 2008 Example of Exam

Function Calls COS 217. Reading: Chapter 4 of Programming From the Ground Up (available online from the course Web site)

Overview of Compiler. A. Introduction

Assembly Language: Function Calls

Abstraction Recovery for Scalable Static Binary Analysis

Assembly Programmer s View Lecture 4A Machine-Level Programming I: Introduction

Processes (Intro) Yannis Smaragdakis, U. Athens

Helping Johnny To Analyze Malware: A Usability-Optimized Decompiler and Malware Analysis User Study

CSCE 212H, Spring 2008 Lab Assignment 3: Assembly Language Assigned: Feb. 7, Due: Feb. 14, 11:59PM

238P: Operating Systems. Lecture 7: Basic Architecture of a Program. Anton Burtsev January, 2018

Buffer Overflow Attack

SYSTEM CALL IMPLEMENTATION. CS124 Operating Systems Fall , Lecture 14

Space Traveling across VM

Assembly Language: Function Calls" Goals of this Lecture"

Reverse Engineering II: The Basics

Università Ca Foscari Venezia

Machine Language, Assemblers and Linkers"

W4118: PC Hardware and x86. Junfeng Yang

Assembly Language: Function Calls" Goals of this Lecture"

Assembly Language: Function Calls. Goals of this Lecture. Function Call Problems

Shell Code For Beginners

15-213/18-243, Fall 2010 Exam 1 - Version A

AS08-C++ and Assembly Calling and Returning. CS220 Logic Design AS08-C++ and Assembly. AS08-C++ and Assembly Calling Conventions

An Experience Like No Other. Stack Discipline Aug. 30, 2006

CMSC 313 COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE PROGRAMMING PREVIEW SLIDES 16, SPRING 2013

MACHINE-LEVEL PROGRAMMING I: BASICS COMPUTER ARCHITECTURE AND ORGANIZATION

CS241 Computer Organization Spring 2015 IA

CMSC 313 Fall2009 Midterm Exam 2 Section 01 Nov 11, 2009

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08

Assembly III: Procedures. Jo, Heeseung

Binghamton University. CS-220 Spring Loading Code. Computer Systems Chapter 7.5, 7.8, 7.9

Return-orientated Programming

Program Exploitation Intro

ASSEMBLY III: PROCEDURES. Jo, Heeseung

Link 2. Object Files

Introduction to Computer Systems , fall th Lecture, Sep. 28 th

Second Part of the Course

CS213. Machine-Level Programming III: Procedures

1 /* file cpuid2.s */ 4.asciz "The processor Vendor ID is %s \n" 5.section.bss. 6.lcomm buffer, section.text. 8.globl _start.

CS165 Computer Security. Understanding low-level program execution Oct 1 st, 2015

Section 4: Threads and Context Switching

BUFFER OVERFLOW DEFENSES & COUNTERMEASURES

CS 3214 Spring # Problem Points Min Max Average Median SD Grader. 1 Memory Layout and Locality Bill

CPS104 Recitation: Assembly Programming

CPEG421/621 Tutorial

Homework. In-line Assembly Code Machine Language Program Efficiency Tricks Reading PAL, pp 3-6, Practice Exam 1

CS 318 Principles of Operating Systems

University of Washington

The course that gives CMU its Zip! Machine-Level Programming III: Procedures Sept. 17, 2002

CSC 591 Systems Attacks and Defenses Return-into-libc & ROP

Section 4: Threads CS162. September 15, Warmup Hello World Vocabulary 2

Stack -- Memory which holds register contents. Will keep the EIP of the next address after the call

4) C = 96 * B 5) 1 and 3 only 6) 2 and 4 only

CISC 360. Machine-Level Programming I: Introduction Sept. 18, 2008

ICS143A: Principles of Operating Systems. Midterm recap, sample questions. Anton Burtsev February, 2017

CS , Fall 2009 Exam 1

Digital Forensics Lecture 3 - Reverse Engineering

Reverse Engineering Low Level Software. CS5375 Software Reverse Engineering Dr. Jaime C. Acosta

Link Edits and Relocatable Code

Assembly I: Basic Operations. Computer Systems Laboratory Sungkyunkwan University

Machine Programming 1: Introduction

Instruction Set Architectures

Instruction Set Architectures

Advanced Buffer Overflow

The Geometry of Innocent Flesh on the Bone

CMSC 313 COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE PROGRAMMING

Practical Malware Analysis

Is stack overflow still a problem?

CS , Fall 2009 Exam 1

Machine-Level Programming II: Control Flow

IA-32 Architecture. CS 4440/7440 Malware Analysis and Defense

CS 3214 Computer Systems. Do not start the test until instructed to do so! printed

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions?

x86 assembly CS449 Fall 2017

Transcription:

CS 6V81-05 Data Structure Reverse Engineering Zhiqiang Lin Department of Computer Science The University of Texas at Dallas September 2 nd, 2011

Outline 1 Overview 2 REWARDS 3 TIE 4 HOWARD 5 Summary

Outline 1 Overview 2 REWARDS 3 TIE 4 HOWARD 5 Summary

What is Data Structure Reverse Engineering Data structure reverse engineering aims to infer both the 1 Syntax (layout, size, offset) of variables

What is Data Structure Reverse Engineering Data structure reverse engineering aims to infer both the 1 Syntax (layout, size, offset) of variables 2 Semantics (meaningful use, e.g., ip_addr, pid_t) of Types

What is Data Structure Reverse Engineering Data structure reverse engineering aims to infer both the 1 Syntax (layout, size, offset) of variables 2 Semantics (meaningful use, e.g., ip_addr, pid_t) of Types when only given a binary code.

Who cares 1 Memory Forensics

Who cares 1 Memory Forensics 2 Vulnerability Discovery

Who cares 1 Memory Forensics 2 Vulnerability Discovery 3 Protocol Reverse Engineering

Who cares 1 Memory Forensics 2 Vulnerability Discovery 3 Protocol Reverse Engineering 4 Program signature

Who cares 1 Memory Forensics 2 Vulnerability Discovery 3 Protocol Reverse Engineering 4 Program signature 5 Virtual machine introspection

Who cares 1 Memory Forensics 2 Vulnerability Discovery 3 Protocol Reverse Engineering 4 Program signature 5 Virtual machine introspection 6...

Understanding the Compilation Credit: figure is from Jonghyup Lee s NDSS 11 presentation

Understanding the Compilation Credit: figure is from Jonghyup Lee s NDSS 11 presentation

Understanding the Compilation Credit: figure is from Jonghyup Lee s NDSS 11 presentation

Understanding the Compilation Credit: figure is from Jonghyup Lee s NDSS 11 presentation

Our goal Again, from binary code, infer 1 Syntax (layout, size, offset) of variables 2 Semantics (meaningful use, e.g., ip_addr, pid_t) of Types

Three papers REWARDS: Automatic Reverse Engineering of Data Structures from Binary Execution. NDSS 2010

Three papers REWARDS: Automatic Reverse Engineering of Data Structures from Binary Execution. NDSS 2010 TIE: Principled Reverse Engineering of Types in Binary Programs. NDSS 2011

Three papers REWARDS: Automatic Reverse Engineering of Data Structures from Binary Execution. NDSS 2010 TIE: Principled Reverse Engineering of Types in Binary Programs. NDSS 2011 Howard: A Dynamic Excavator for Reverse Engineering Data Structures. NDSS 2011

Outline 1 Overview 2 REWARDS 3 TIE 4 HOWARD 5 Summary

Observation 1 struct { 1 extern foo 2 section.text 2 unsigned int pid; 3 global _start 3 char data[16]; 4 }test; 4 5 5 _start: 6 void foo(){ 6 call foo 7 mov eax,1 7 char *p="hello world"; 8 mov ebx,0 8 test.pid=my_getpid(); 9 int 80h 9 strcpy(test.data,p); 10 } (a) Source code of function foo and the _start assembly code [Nr] Name Type Addr Off Size... [ 1].text PROGBITS 080480a0 0000a0 000078 [ 2].rodata PROGBITS 08048118 000118 00000c [ 3].bss NOBITS 08049124 000124 000014... (c) Section map of the example binary rodata_0x08048118{ fun_0x08048110{ +00: char[12] +00: ret_addr_t } } bss_0x08049124{ +00: pid_t, fun_0x080480e0{ -08: unused[4], +04: char[12], -04: stack_frame_t, +16: unused[4] } +00: ret_addr_t, fun_0x080480b4{ +04: char*, +08: char* -28: unused[20], } -08: char *, -04: stack_frame_t, +00: ret_addr_t } 1 80480a0: e8 0f 00 00 00 call 0x80480b4 2 80480a5: b8 01 00 00 00 mov $0x1,%eax 3 80480aa: bb 00 00 00 00 mov $0x0,%ebx 4 80480af: cd 80 int $0x80 5... 6 80480b4: 55 push %ebp 7 80480b5: 89 e5 mov %esp,%ebp 8 80480b7: 83 ec 18 sub $0x18,%esp 9 80480ba: c7 45 fc 18 81 04 08 movl $0x8048118,0xfffffffc(%ebp) 10 80480c1: e8 4a 00 00 00 call 0x8048110 11 80480c6: a3 24 91 04 08 mov %eax,0x8049124 12 80480cb: 8b 45 fc mov 0xfffffffc(%ebp),%eax 13 80480ce: 89 44 24 04 mov %eax,0x4(%esp) 14 80480d2: c7 04 24 28 91 04 08 movl $0x8049128,(%esp) 15 80480d9: e8 02 00 00 00 call 0x80480e0 16 80480de: c9 leave 17 80480df: c3 ret 18 80480e0: 55 push %ebp 19 80480e1: 89 e5 mov %esp,%ebp 20 80480e3: 53 push %ebx 21 80480e4: 8b 5d 08 mov 0x8(%ebp),%ebx 22 80480e7: 8b 55 0c mov 0xc(%ebp),%edx 23 80480ea: 89 d8 mov %ebx,%eax 24 80480ec: 29 d0 sub %edx,%eax 25 80480ee: 8d 48 ff lea 0xffffffff(%eax),%ecx 26 80480f1: 0f b6 02 movzbl (%edx),%eax 27 80480f4: 83 c2 01 add $0x1,%edx 28 80480f7: 84 c0 test %al,%al 29 80480f9: 88 04 0a mov %al,(%edx,%ecx,1) 30 80480fc: 75 f3 jne 0x80480f1 31 80480fe: 89 d8 mov %ebx,%eax 32 8048100: 5b pop %ebx 33 8048101: 5d pop %ebp 34 8048102: c3 ret 35... 36 8048110: b8 14 00 00 00 mov $0x14,%eax 37 8048115: cd 80 int $0x80 38 8048117: c3 ret (d) Output of REWARDS (b) Disassembly code of the example binary

Key Ideas 1 Identifying type resolution points in binary code

Key Ideas 1 Identifying type resolution points in binary code 2 Tracking data flow to resolve other all involved memory/register types

Type Resolution points <getpid> 8048110: mov $0x14,%eax 8048115: int $0x80 8048117: ret System calls: syscall num (eax) Syscall Enter: Type parameter passing registers (i.e., ebx, ecx, edx, esi, edi, and ebp) if they involved

Type Resolution points <getpid> 8048110: mov $0x14,%eax 8048115: int $0x80 8048117: ret System calls: syscall num (eax) Syscall Enter: Type parameter passing registers (i.e., ebx, ecx, edx, esi, edi, and ebp) if they involved Syscall Exit: Type return value (eax)

Type Resolution points 80480ce: mov %eax,0x4(%esp) <arg2> 80480d2: movl $0x8049128, (%esp) <arg1> 80480d9: call 0x80480e0 <strcpy> Standard Library Call (API) Type corresponding argument and return value

Type Resolution points 80480ce: mov %eax,0x4(%esp) <arg2> 80480d2: movl $0x8049128, (%esp) <arg1> 80480d9: call 0x80480e0 <strcpy> Standard Library Call (API) Type corresponding argument and return value More wealth than System call 2016 APIs in Libc.so.6 vs. 289 sys call (2.6.15)

Type Resolution points Other type revealing instructions in binary code:

Type Resolution points Other type revealing instructions in binary code: String related (e.g., MOVS/B/D/W, LOADS/STOS/B/D/W) Floating-point related (e.g., FADD, FABS, FST) Pointer-related (e.g., MOV (%edx), %ebx))

Data Flow Tracking Standard technique:

Data Flow Tracking Standard technique: Using shadow memory to keep the variable attributes and track the propagation

Data Flow Tracking Standard technique: Using shadow memory to keep the variable attributes and track the propagation New challenges

Data Flow Tracking Standard technique: Using shadow memory to keep the variable attributes and track the propagation New challenges Two-way type resolution Dynamic life time of stack and heap variables Memory locations will be reused Multiple instances of the same type

Implementations Dynamic Binary Instrumentation: A pin-tool on top of Pin-2.6 http://www.pintool.org/

Implementations Dynamic Binary Instrumentation: A pin-tool on top of Pin-2.6 http://www.pintool.org/ Data structures of interest File information Network communication Process information Input-influenced (for vulnerability discovery)

Limitation Loss of data structure hierarchy

Limitation Loss of data structure hierarchy There is no corresponding type resolution point with the same hierarchical type Heuristics exist (e.g., AutoFormat [Lin et al, NDSS2008])

Limitation Loss of data structure hierarchy There is no corresponding type resolution point with the same hierarchical type Heuristics exist (e.g., AutoFormat [Lin et al, NDSS2008]) Path-sensitive memory reuse Compiler might assign different local variables declared in different program paths to the same memory address Path-sensitive analysis

Limitation Loss of data structure hierarchy There is no corresponding type resolution point with the same hierarchical type Heuristics exist (e.g., AutoFormat [Lin et al, NDSS2008]) Path-sensitive memory reuse Compiler might assign different local variables declared in different program paths to the same memory address Path-sensitive analysis Type cast

Limitation Loss of data structure hierarchy There is no corresponding type resolution point with the same hierarchical type Heuristics exist (e.g., AutoFormat [Lin et al, NDSS2008]) Path-sensitive memory reuse Compiler might assign different local variables declared in different program paths to the same memory address Path-sensitive analysis Type cast int a=(float)b;

Limitation Dynamic analysis

Limitation Dynamic analysis Full coverage of data structures? TIE [NDSS 2011]

Limitation Dynamic analysis Full coverage of data structures? TIE [NDSS 2011] Only user-level data structure OS kernel?

Limitation Dynamic analysis Full coverage of data structures? TIE [NDSS 2011] Only user-level data structure OS kernel? Obfuscated binary can cheat REWARDS

Limitation Dynamic analysis Full coverage of data structures? TIE [NDSS 2011] Only user-level data structure OS kernel? Obfuscated binary can cheat REWARDS No explicit library call Memory cast

Conclusion REWARDS

Conclusion REWARDS Binary only Dynamic analysis Data flow tracking

Conclusion REWARDS Binary only Dynamic analysis Data flow tracking Key insight

Conclusion REWARDS Binary only Dynamic analysis Data flow tracking Key insight Using system call/api/type revealing instruction as type resolution point Two-way type resolution Unique benefits to memory forensics and vulnerability discovery

Outline 1 Overview 2 REWARDS 3 TIE 4 HOWARD 5 Summary

Goal I: Variable Discovery Credit: figure is from Jonghyup Lee s NDSS 11 presentation

Goal II: Type Inference Credit: figure is from Jonghyup Lee s NDSS 11 presentation

Problems: multiple possible types Credit: figure is from Jonghyup Lee s NDSS 11 presentation

TIE s Contribution reg32_t reg16_t reg8_t reg1_t num32_t ptr(α) code_t num16_t num8_t int32_t uint32_t int16_t uint16_t int8_t uint8_t

Key Idea Collecting Type Constraint from

Key Idea Collecting Type Constraint from Standard Library Call System Call Type Revealing Instructions

Key Idea Collecting Type Constraint from Standard Library Call System Call Type Revealing Instructions Solving Type Constraint based on the Type Lattice

Key Idea Collecting Type Constraint from Standard Library Call System Call Type Revealing Instructions Solving Type Constraint based on the Type Lattice Equality (A = B), essentially Type Unification Subtype relation (A <: B), using unification closure algorithm Conjunctive ( A B) Disjunctive ( A B)

Limitations Works on regular programs compiled from C code

Limitations Works on regular programs compiled from C code Not very informative for irregular programs

Limitations Works on regular programs compiled from C code Not very informative for irregular programs Infers types as what is in the TIE type system only

Conclusion Principled reverse engineering

Conclusion Principled reverse engineering Well defined process + Type theory

Conclusion Principled reverse engineering Well defined process + Type theory Type inference with more expressive type system (unification closure algorithm)

Outline 1 Overview 2 REWARDS 3 TIE 4 HOWARD 5 Summary

Key Idea Credit: Picture from Asia Slowinska s NDSS 11 presentation

Intuition in HOWARD Observe how memory is used at runtime to detect data structures. For instance, if A is a pointer, then if

Intuition in HOWARD Observe how memory is used at runtime to detect data structures. For instance, if A is a pointer, then if A is a function frame pointer, then *(A + 8) is perhaps a function argument

Intuition in HOWARD Observe how memory is used at runtime to detect data structures. For instance, if A is a pointer, then if A is a function frame pointer, then *(A + 8) is perhaps a function argument A is an address of a structure, then *(A + 8) is perhaps a field in this structure

Intuition in HOWARD Observe how memory is used at runtime to detect data structures. For instance, if A is a pointer, then if A is a function frame pointer, then *(A + 8) is perhaps a function argument A is an address of a structure, then *(A + 8) is perhaps a field in this structure A is an address of an array, then *(A + 8) is perhaps an element of this array

Contributions Increased coverage Leverage S2E/Klee to increase the path coverage

Contributions Increased coverage Leverage S2E/Klee to increase the path coverage Can discover array and more internal data structure

Contributions Increased coverage Leverage S2E/Klee to increase the path coverage Can discover array and more internal data structure Look for chains of accesses in a loop (e.g., elem = next++;)

Contributions Increased coverage Leverage S2E/Klee to increase the path coverage Can discover array and more internal data structure Look for chains of accesses in a loop (e.g., elem = next++;) Look for sets of accesses with the same base in a linear space (e.g., elem = array[i];)

Contributions Increased coverage Leverage S2E/Klee to increase the path coverage Can discover array and more internal data structure Look for chains of accesses in a loop (e.g., elem = next++;) Look for sets of accesses with the same base in a linear space (e.g., elem = array[i];) Look for chains of accesses in a loop

Contributions Increased coverage Leverage S2E/Klee to increase the path coverage Can discover array and more internal data structure Look for chains of accesses in a loop (e.g., elem = next++;) Look for sets of accesses with the same base in a linear space (e.g., elem = array[i];) Look for chains of accesses in a loop Boundary elements accessed outside the loop Nested loops Multiple loops in sequence

Outline 1 Overview 2 REWARDS 3 TIE 4 HOWARD 5 Summary

Summary All three techniques: from data use to infer syntax (offset, size, layout) of the variables

Summary All three techniques: from data use to infer syntax (offset, size, layout) of the variables REWARDS REWARDS makes a first step in reconstructing the syntax and semantics of data structure, and demonstrated the security benefits of such reversing.

Summary All three techniques: from data use to infer syntax (offset, size, layout) of the variables REWARDS REWARDS makes a first step in reconstructing the syntax and semantics of data structure, and demonstrated the security benefits of such reversing. TIE TIE is able to statically analyze the binary code, and figure out the types for each individual variables

Summary All three techniques: from data use to infer syntax (offset, size, layout) of the variables REWARDS REWARDS makes a first step in reconstructing the syntax and semantics of data structure, and demonstrated the security benefits of such reversing. TIE TIE is able to statically analyze the binary code, and figure out the types for each individual variables HOWARD HOWARD integrates symbolic execution to have a larger coverage than REWARDS, and also it focuses more on the internal data structures defined in the program.