Automated static deobfuscation in the context of Reverse Engineering

Similar documents
Everybody be cool, this is a roppery!

OptiCode: Machine Code Deobfuscation for Malware Analysis

Practical Malware Analysis

Lecture Notes on Decompilation

Code deobfuscation by optimization. Branko Spasojević

Detecting Metamorphic Computer Viruses using Supercompilation

Lecture 10 Return-oriented programming. Stephen Checkoway University of Illinois at Chicago Based on slides by Bailey, Brumley, and Miller

Random Code Variation Compilation Automated software diversity s performance penalties

Software Protection: How to Crack Programs, and Defend Against Cracking Lecture 4: Code Obfuscation Minsk, Belarus, Spring 2014

Detecting Self-Mutating Malware Using Control-Flow Graph Matching

Software Protection with Obfuscation and Encryption

PLT Course Project Demo Spring Chih- Fan Chen Theofilos Petsios Marios Pomonis Adrian Tang

Obfuscating Transformations. What is Obfuscator? Obfuscation Library. Obfuscation vs. Deobfuscation. Summary

T Jarkko Turkulainen, F-Secure Corporation

Reverse Engineering Low Level Software. CS5375 Software Reverse Engineering Dr. Jaime C. Acosta

Software Protection with Obfuscation and Encryption

ESET Research Whitepapers // January 2018 // Author: Filip Kafka ESET S GUIDE TO DEOBFUSCATING AND DEVIRTUALIZING FINFISHER

Binary Code Analysis: Concepts and Perspectives

CNIT 127: Exploit Development. Ch 1: Before you begin. Updated

Program Exploitation Intro

Inside VMProtect. Introduction. Internal. Analysis. VM Logic. Inside VMProtect. Conclusion. Samuel Chevet. 16 January 2015.

Be a Binary Rockst r. An Introduction to Program Analysis with Binary Ninja

GCC Plugins Die by the Sword

Staticly Detect Stack Overflow Vulnerabilities with Taint Analysis

Assembly Language: Overview!

Rev101. spritzers - CTF team. spritz.math.unipd.it/spritzers.html

ISSISP 2014 Code Obfuscation Verona, Italy

Problem with Scanning an Infix Expression

ROPInjector: Using Return- Oriented Programming for Polymorphism and AV Evasion

Protecting Against Unexpected System Calls

SPRING TERM BM 310E MICROPROCESSORS LABORATORY PRELIMINARY STUDY

Compiler construction. x86 architecture. This lecture. Lecture 6: Code generation for x86. x86: assembly for a real machine.

Let s Break Modern Binary Code Obfuscation

CSCD 303 Essential Computer Security Fall 2017

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08

Background. How it works. Parsing. Virtual Deobfuscator

like a boss Automatically Extracting Obfuscated Strings from Malware

Problem with Scanning an Infix Expression

Breaking State-of-the-Art Binary Code Obfuscation

Binsec: a platform for binary code analysis

Instruction-set Design Issues: what is the ML instruction format(s) ML instruction Opcode Dest. Operand Source Operand 1...

Inject malicious code Call any library functions Modify the original code

Advanced Buffer Overflow

Instruction Embedding for Improved Obfuscation

Intro to Cracking and Unpacking. Nathan Rittenhouse

Talen en Compilers. Johan Jeuring , period 2. December 15, Department of Information and Computing Sciences Utrecht University

Computer Architecture Prof. Smruti Ranjan Sarangi Department of Computer Science and Engineering Indian Institute of Technology, Delhi

Digital Forensics Lecture 4 - Reverse Engineering

Instruction-set Design Issues: what is the ML instruction format(s) ML instruction Opcode Dest. Operand Source Operand 1...

IFE: Course in Low Level Programing. Lecture 6

Second Part of the Course

Summary: Direct Code Generation

CS 11 C track: lecture 8

RetDec: An Open-Source Machine-Code Decompiler. Jakub Křoustek Peter Matula

IA-32 Architecture. CS 4440/7440 Malware Analysis and Defense

Lab 3. The Art of Assembly Language (II)

3.1 DATA MOVEMENT INSTRUCTIONS 45

Abstraction Recovery for Scalable Static Binary Analysis

Return-Oriented Rootkits

CS 261 Fall Mike Lam, Professor. x86-64 Control Flow

Assembly Language: Overview

Program analysis and constraint solvers. Edgar Barbosa SyScan360 Beijing

Virus Analysis. Introduction to Malware. Common Forms of Malware

Emulating Code In Radare2. pancake Lacon 2015

Marking Scheme. Examination Paper. Module: Microprocessors (630313)

Chapter 4 Processor Architecture: Y86 (Sections 4.1 & 4.3) with material from Dr. Bin Ren, College of William & Mary

Towards the Hardware"

complement) Multiply Unsigned: MUL (all operands are nonnegative) AX = BH * AL IMUL BH IMUL CX (DX,AX) = CX * AX Arithmetic MUL DWORD PTR [0x10]

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2016 Lecture 12

Industrial Approach: Obfuscating Transformations

Helping RE with LLVM.

Overview. EE 4504 Computer Organization. Much of the computer s architecture / organization is hidden from a HLL programmer

Advanced Buffer Overflow

Stack Shape Analysis to Detect Obfuscated calls in Binaries

Fall Compiler Principles Lecture 12: Register Allocation. Roman Manevich Ben-Gurion University

Return-orientated Programming

Basic Execution Environment

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Remix: On-demand Live Randomization

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

The Geometry of Innocent Flesh on the Bone

Project 3 Due October 21, 2015, 11:59:59pm

Intermediate Code & Local Optimizations

Registers. Registers

Defeating Return-Oriented Rootkits with Return-less Kernels

Dissecting APT Sample Step by Step. Kārlis Podiņš, CERT.LV

CS 261 Fall Mike Lam, Professor. x86-64 Control Flow

Computer Systems Lecture 9

Reverse Engineering with IDA Pro. CS4379/5375 Software Reverse Engineering Dr. Jaime C. Acosta

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 12

SOEN228, Winter Revision 1.2 Date: October 25,

Introduction to Reverse Engineering. Alan Padilla, Ricardo Alanis, Stephen Ballenger, Luke Castro, Jake Rawlins

A New Approach to Instruction-Idioms Detection in a Retargetable Decompiler

BINARY-LEVEL SECURITY: SEMANTIC ANALYSIS TO THE RESCUE

Control Flow Integrity for COTS Binaries Report

Reverse Engineering Malware Binary Obfuscation and Protection

Lecture 3 CIS 341: COMPILERS

Digital Forensics Lecture 3 - Reverse Engineering

CSC 2400: Computer Systems. Towards the Hardware: Machine-Level Representation of Programs

Android Obfuscation and Deobfuscation. Group 11

Transcription:

Automated static deobfuscation in the context of Reverse Engineering Sebastian Porst (sebastian.porst@zynamics.com) Christian Ketterer (cketti@gmail.com)

Sebastian zynamics GmbH Lead Developer BinNavi REIL/MonoREIL Christian Student University of Karlsruhe Deobfuscation

This talk Obfuscated Code Readable Code (mysterious things happen here) 20% 40% 40%

Motivation Combat common obfuscation techniques Can it be done? Will it produce useful results? Can it be integrated into our technology stack?

Examples of Obfuscation Simple Jump chains Splitting calculations Garbage code insertion Predictable branches Self-modifying code Control-flow flattening Opaque predicates Code parallelization Virtual Machines... Tricky

Our Deobfuscation Approach I. Copy ancient algorithms from compiler theory books II. Translate obfuscated assembly code to REIL III. Run algorithms on REIL code IV. Profit (?)

We re late in the game... 199X 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 U of Auckland F. Perriot Mathur M. Mohammed U of Wisc + TU Munich U of Ghent Mathur zynamics Christodorescu (see end of this presentation for proper source references) Bruschi

... but 199X 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 Malware Research Defensive Reverse Engineering Offensive Reverse Engineering

REIL Reverse Engineering Intermediate Language Specifically designed for Reverse Engineering Design Goal: As simple as possible, but not simpler In use since 2007

Uses of REIL Register Tracking: Helps Reverse Engineers follow data flow through code (Never officially presented) Index Underflow Detection: Automatically find negative array accesses (CanSecWest 2009, Vancouver) Automated Deobfuscation: Make obfuscated code more readable (SOURCE Barcelona 2009, Barcelona) ROP Gadget Generator: Automatically generates return-oriented shellcode (Work in progress; scheduled for Q1/2010)

The REIL Instruction Set Arithmetical Bitwise Data Transfer Logical Other ADD SUB MUL DIV MOD BSH AND OR XOR STR LDM STM BISZ JCC NOP UNDEF UNKN

Why REIL? Simplifies input code Makes effects obvious Makes algorithms platform-independent

MonoREIL Monotone Framework for REIL Based on Abstract Interpretation Used to write static code analysis algorithms http://www.flickr.com/photos/wedrrc/3586908193/

Why MonoREIL? In General: Makes complicated algorithms simple (trade brain effort for runtime) Deobfuscator: Wrong choice really, but we wanted more real-life test cases for MonoREIL

Building the Deobfuscator Java BinNavi Plugin REIL + MonoREIL http://www.flickr.com/photos/mattimattila/3602654187/

Block Merging Long chains of basic blocks ending with unconditional jumps Confusing to follow in text-based disassemblers Advantage of higher abstraction level in BinNavi Block merging is purely cosmetic

Block Merging Before After

Constant Propagation and Folding Two different concepts One algorithm in our implementation Partial evaluation of the input code

Constant Propagation and Folding Before After

Dead Branch Elimination Removes branches that are never executed Turns conditional jumps into unconditional jumps Removes code from unreachable branch Requires constant propagation/folding

Dead Branch Elimination Before After

Dead Code Elimination Removes code that computes unused values Gets rid of inserted garbage code Cleans up after constant propagation/folding

Dead Code Elimination Before After

Dead Store Elimination Comparable to dead code elimination Removes useless memory write accesses Limited to stack access in our implementation Only platform-specific part of our optimizer

Dead Store Elimination Before After

Suddenly it dawned us: Deobfuscation for RE brings new problems which do not exist in other areas

Let s get some help

Problem: Side effects push 10 pop eax mov eax, 10 Removed code was used in a CRC32 integrity check as key of a decryption routine as part of an anti-debug check...

Problem: Code Blowup mov eax, 10 add eax, 10 mov eax, 20 clc... Good luck setting AF CF OF PF ZF

Problem: Moving addresses 0000: jmp ecx 0002: push 10 0003: pop eax 0000: jmp ecx 0002: mov eax, 10 ecx is 0003 but static analysis can not know this we just missed the pop instruction

Problem: Inability to debug mov eax, 10 Executable Input File Deobfuscated list of Instructions but no executable file

The only way to solve all* problems: A full-blown native code compiler with an integrated optimizer Too much work, maybe we can approximate... * except for the side-effects issue

Only generate optimized REIL code Before After

Only generate optimized REIL code Produces excellent input for other analysis algorithms Code blow-up solved Keeps address/instruction mapping Code can not be debugged natively but interpreted Side effects problem remains Pretty much unreadable for human reverse engineers

Effect comments Before After

Effect comments Results can easily be used by human reverse engineers Code blow-up solved Side effects problem remains Address mapping problem Code can not be debugged Comments have semantic meaning

Extract formulas from code Before After

Extract formulas from code Results can easily be used by human reverse engineers No code generation necessary, only extraction of semantic information Solves all problems because original program remains unchanged Not really deobfuscation (but produces similar result?)

Implement a small pseudo-compiler Before After

Implement a small pseudo-compiler This is what we did Closest thing to the real deal Code blow-up is solved Partially Natively debug the output not in our case pseudo x86 instructions Side effects problem remains Address mapping problem remains Why not go for a complete compiler?

Economic value in creating a complete optimizing compiler for RE? Not for us Small company Limited market Wrong approach?

Alternative Approaches Deobfuscator built into disassembler REIL-based formula extraction Hex-Rays Decompiler Code optimization and generation based on LLVM Emulation / Dynamic deobfuscation

Conclusion The concept of static deobfuscation is sound Except for things like side-effects, SMC,... A lot of work Expression reconstruction might be much easier and still produce comparable results

Related work A taxonomy of obfuscating transformations Defeating polymorphism through code optimization Code Normalization for Self-Mutating Malware Software transformations to improve malware detection Zeroing in on Metamorphic Computer Viruses...

http://www.flickr.com/photos/marcobellucci/3534516458/