T-110.6220 Reverse Engineering Malware: Static Analysis I


T-110.6220 Reverse Engineering Malware: Static Analysis I
Antti Tikkanen, F-Secure Corporation

Representing Data

Binary Numbers
  Nibble (4 bits):   1011                  = 0xB
  Byte (8 bits):     1011 1101             = 0xBD
  Word (16 bits):    1011 1101 0011 1001   = 0xBD39

Endianness (a.k.a. Byte Order)
  Actual bytes:    c9 33 41 03
  Little endian:   0x034133c9
  Big endian:      0xc9334103
  Architectures listed: Intel x86, Intel 8051, PowerPC (exc. G5), Sparc (exc. v9)
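The two interpretations of the same four bytes can be sketched in C. This is a minimal illustration, not part of the original slides; the function names read_le32 and read_be32 are hypothetical.

```c
#include <stdint.h>

/* Interpret four bytes as a 32-bit little-endian value (the x86 byte order):
 * the first byte in memory is the least significant one. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Interpret the same four bytes as a 32-bit big-endian value:
 * the first byte in memory is the most significant one. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

For the bytes c9 33 41 03, read_le32 gives 0x034133c9 and read_be32 gives 0xc9334103, matching the slide. Building the value with shifts keeps the result independent of the byte order of the machine running the code.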

Strings
  ASCII "Hello 1234":
    48 65 6C 6C 6F 20 31 32 33 34
  Unicode (UCS-2) "Hello", preceded by a byte order mark (BOM, ff fe):
    ff fe 48 00 65 00 6C 00 6C 00 6F 00

How do you know when the string ends?
  ASCIIZ (null-terminated):             48 65 6C 6C 6F 00
  Null-terminated Unicode:              48 00 65 00 6C 00 6C 00 6F 00 00 00
  Pascal (1-byte length prefix):        05 48 65 6C 6C 6F
  Delphi (4-byte length prefix):        05 00 00 00 48 65 6C 6C 6F
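A rough sketch of how the length of "Hello" would be recovered from each of the four layouts. The function names are hypothetical, and the length prefixes are assumed to be little-endian, as in the byte dumps on the slide.

```c
#include <stddef.h>
#include <stdint.h>

/* ASCIIZ: count bytes up to the terminating 0x00. */
size_t asciiz_length(const uint8_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Null-terminated UCS-2: count 16-bit units up to the 00 00 terminator. */
size_t ucs2z_length(const uint8_t *s)
{
    size_t n = 0;
    while (s[2 * n] != 0 || s[2 * n + 1] != 0)
        n++;
    return n;
}

/* Pascal-style: the first byte holds the length. */
size_t pascal_length(const uint8_t *s)
{
    return s[0];
}

/* Delphi-style: a 32-bit little-endian length precedes the characters. */
size_t delphi_length(const uint8_t *s)
{
    return (size_t)((uint32_t)s[0]
                  | ((uint32_t)s[1] << 8)
                  | ((uint32_t)s[2] << 16)
                  | ((uint32_t)s[3] << 24));
}
```

All four functions report 5 for the corresponding byte sequences shown on the slide. The terminated forms must scan the data, while the prefixed forms know the length up front and may contain embedded zero bytes.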

Representing Code

CPU Architectures
  CISC (Complex Instruction Set Computing)
    Emphasis on hardware and assembly programming
    Complex multi-clock instructions
    High code density
    Examples: x86, m68k, IBM z/Architecture
  RISC (Reduced Instruction Set Computing)
    Emphasis on software and high-level languages
    Simple, reduced instruction set
    Low code density
    Examples: ARM, PowerPC, MIPS

Introduction to Intel x86
  Started with the 8086 in 1978
  Continued with the 8088, 80186, 80286, 386, 486, Pentium, 686...
  CISC architecture
  32-bit is called x86-32 or IA-32
  64-bit is called x86-64, IA-32e, AMD64, EM64T, x64 or Intel 64
  IA-64 is Itanium, which is a different architecture

Intel 80386
  Introduced in 1985
  Has a 32-bit word length
  Has 8 general-purpose registers
  Supports paging and virtual memory
  Addresses up to 4 GiB of memory

Data Register Layout

Data Registers
  Registers             Name            Typical use
  AL / AH / AX / EAX    Accumulator     Arithmetic operations
  BL / BH / BX / EBX    Data index      General data storage, index
  CL / CH / CX / ECX    Loop counter    Loop constructs
  DL / DH / DX / EDX    Data register   Arithmetic
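The relationship between the 8-bit, 16-bit and 32-bit names of a data register can be modelled with masks and shifts. This is only an illustrative sketch; the helper names are hypothetical and the register is represented by a plain 32-bit integer.

```c
#include <stdint.h>

/* AX is the low 16 bits of EAX. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AL is the low byte of AX. */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* AH is the high byte of AX (bits 8..15 of EAX). */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Writing AL replaces only the low byte; the upper 24 bits are preserved. */
uint32_t set_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}
```

With EAX = 0x11223344, AX is 0x3344, AH is 0x33 and AL is 0x44. The same pattern applies to EBX, ECX and EDX.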

Address Registers
  Registers   Name                 Typical use
  IP / EIP    Instruction Pointer  Program execution
  SP / ESP    Stack Pointer        Stack operation
  BP / EBP    Base Pointer         Stack frame
  SI / ESI    Source Index         String operation
  DI / EDI    Destination Index    String operation

Segment Registers
  Registers      Name            Typical use
  CS             Code Segment    Program code
  DS             Data Segment    Program data
  ES / FS / GS   Other Segments  Other uses

EFLAGS

Examples of Mnemonics
  MOV EAX, 1    Move 1 to EAX
  ADD EDX, 5    Add 5 to EDX
  SUB EBX, 2    Subtract 2 from EBX
  AND ECX, 0    Bit-wise AND 0 to ECX
  XOR EDX, 4    Bit-wise exclusive OR 4 to EDX
  SHL ECX, 6    Shift ECX left by 6
  ROR EBX, 3    Bit-wise rotate EBX right by 3
  INC ECX       Increment ECX
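Most of these mnemonics map directly to C operators (+, -, &, ^, <<, ++). Rotation has no C operator, so a sketch of SHL and ROR on 32-bit values may help when reading disassembly. The names shl32 and ror32 are hypothetical; the count is masked to 5 bits, which mirrors how x86 treats shift and rotate counts for 32-bit operands.

```c
#include <stdint.h>

/* SHL: shift left, zero bits enter from the right. */
uint32_t shl32(uint32_t value, unsigned count)
{
    return value << (count & 31u);
}

/* ROR: rotate right, bits leaving on the right re-enter on the left. */
uint32_t ror32(uint32_t value, unsigned count)
{
    count &= 31u;
    if (count == 0)
        return value; /* avoid an undefined shift by 32 below */
    return (value >> count) | (value << (32u - count));
}
```

For example, shl32(1, 6) is 64, while ror32(1, 3) moves the lowest bit around to bit 29 and yields 0x20000000.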

Examples of Mnemonics (contd)
  JNZ label     Jump if not zero (not equal)
  JMP label     Unconditional jump to label
  CALL func     Call function
  RET           Return from function
  LOOP label    ECX--, jump to label if ECX is not zero
  PUSH EAX      Push EAX to stack
  POP EDI       Pop EDI from stack

Intel x86 Instruction Format (32-bit)
  Variable-size complex instruction format (CISC)
  Instruction format: OPERATION, [OPERAND 1, OPERAND 2, [OPERAND 3]]
  An operand can be a register, a memory address or immediate data
  A single instruction can be anywhere from 1 byte to 15 bytes long (compare to RISC, where the instruction size is constant, for example 4 bytes)

Instruction Format (64-bit)
  64-bit mode is a backward-compatible extension to 32-bit mode
  Adds an optional REX prefix to the 32-bit format
  Image copyright Intel Corporation

Instruction Prefix Bytes
  Group 1: Lock and repeat prefixes
    Lock ensures exclusive use of memory in multiple-processor environments
    Repeat is used in string operations (sort of a loop)
  Group 2: Segment overrides or branch hints
    The segment in a memory access is implied by the instruction, but can be overridden using a Group 2 prefix
    Some bytes in this group indicate a branch hint if the instruction is a conditional jump
  Group 3: Operand size override
    Switches between 16-bit and 32-bit operand size
  Group 4: Address size override
    Switches between 16-bit and 32-bit addressing

REX Prefix
  Semantics for accessing extra registers in 64-bit mode
  Additional 64-bit operand size overrides

Opcode Bytes
  Byte(s) that represent the actual operation (MOV, INC, PUSH, etc.)
  Originally only a single byte, but later specifications define 1-3 bytes (or 4, depending on definition)
  The first byte can be any opcode in the range 0-255, or the escape byte (0F) indicating that another opcode byte follows
  The second opcode byte defines an opcode after the escape byte
  The third opcode byte is a prefix (66, F2 or F3) to multimedia (SSE) instructions
  The latest instruction sets define escape bytes in the second opcode table (38 and 3A)
  Some opcodes imply the registers used, others need more information (MODRM/SIB)
  About 1200 different opcodes!

MODRM and SIB
  These bytes define the registers and other data used by the instruction
  MODRM defines the registers used for addressing the memory or register
  Sometimes addressing is more complex than a simple register can offer
  The SIB byte is used for complex memory addressing: indexing and scaling
  Example of complex addressing: MOV EBX, [EBP + EAX*2 + 4]
    EBX   Destination register
    EBP   Base register
    EAX   Index register
    2     Scale factor
    4     Displacement offset
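The bit fields of the two bytes can be pulled apart as in the following sketch. The struct and function names are hypothetical. The field layout (2 + 3 + 3 bits in each byte) and the register numbering (EAX=0, ECX=1, EDX=2, EBX=3, ESP=4, EBP=5, ESI=6, EDI=7) come from the Intel manuals rather than from the slides.

```c
#include <stdint.h>

struct modrm { unsigned mod, reg, rm; };
struct sib   { unsigned scale, index, base; };

/* MODRM: mod (bits 7-6), reg (bits 5-3), r/m (bits 2-0). */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 3u;
    m.reg = (byte >> 3) & 7u;
    m.rm  = byte & 7u;
    return m;
}

/* SIB: scale (bits 7-6, stored as a power of two), index (5-3), base (2-0). */
struct sib decode_sib(uint8_t byte)
{
    struct sib s;
    s.scale = 1u << ((byte >> 6) & 3u);
    s.index = (byte >> 3) & 7u;
    s.base  = byte & 7u;
    return s;
}

/* Address computed by base + index*scale + displacement. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t displacement)
{
    return base + index * scale + (uint32_t)displacement;
}
```

MOV EBX, [EBP + EAX*2 + 4] encodes as 8B 5C 45 04. Decoding 0x5C gives mod=1 (an 8-bit displacement follows), reg=3 (EBX) and rm=4 (a SIB byte follows); decoding 0x45 gives scale=2, index=0 (EAX) and base=5 (EBP).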

Displacement and Immediate Data
  Some complex addressing forms require an offset within the memory reference, called displacement
  The displacement immediately follows the MODRM/SIB bytes (1, 2 or 4 bytes)
  Some instructions use immediate data as an operand value (1, 2 or 4 bytes)
  Some instructions can use 64-bit (8-byte) displacement and immediate values in 64-bit mode

Monster Instruction
  Assembly presentation: REP MOV [SS:EAX + EBX*8 + 0x11223344], 0x55667788
  Data in hex: F3 36 C7 84 D8 44 33 22 11 88 77 66 55
    F3            repetition prefix (REP)
    36            segment override (SS)
    C7            opcode (MOV)
    84, D8        MODRM, SIB (base register EAX, index register EBX, scale factor 8)
    44 33 22 11   displacement data (0x11223344)
    88 77 66 55   immediate data (0x55667788)
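The breakdown above can be reproduced with a deliberately narrow decoder. This sketch only understands the F3 and 36 prefixes and the C7 opcode with a SIB byte and 32-bit displacement, which is just enough for this one instruction. It is nowhere near a general x86 decoder, and every name in it is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct decoded {
    int      has_rep;   /* F3 prefix seen */
    int      has_ss;    /* 36 prefix seen */
    uint8_t  opcode, modrm, sib;
    uint32_t disp, imm;
    size_t   length;    /* total instruction length in bytes */
};

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the bytes do not fit the narrow pattern. */
int decode_mov_imm32(const uint8_t *code, size_t size, struct decoded *out)
{
    size_t i = 0;
    out->has_rep = 0;
    out->has_ss = 0;

    /* 1. Prefix bytes come first, in any order. */
    while (i < size && (code[i] == 0xF3 || code[i] == 0x36)) {
        if (code[i] == 0xF3) out->has_rep = 1;
        else                 out->has_ss = 1;
        i++;
    }

    /* 2. Opcode, MODRM, SIB, disp32 and imm32 need 11 more bytes. */
    if (size - i < 11 || code[i] != 0xC7)
        return -1;
    out->opcode = code[i++];
    out->modrm  = code[i++];

    /* Only mod=10 (disp32), reg=000, rm=100 (SIB follows) is handled. */
    if ((out->modrm >> 6) != 2 || ((out->modrm >> 3) & 7) != 0
        || (out->modrm & 7) != 4)
        return -1;

    out->sib  = code[i++];
    out->disp = le32(code + i); i += 4;   /* 3. displacement */
    out->imm  = le32(code + i); i += 4;   /* 4. immediate */
    out->length = i;
    return 0;
}
```

Fed the 13 bytes from the slide, the sketch reports both prefixes, opcode C7, MODRM 84, SIB D8, displacement 0x11223344, immediate 0x55667788 and a total length of 13.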

Example of RISC: ARM
  Fixed instruction size: 4 bytes (2 bytes in Thumb mode)
  Load/store architecture: operations only on registers
  But including fancy things like embedded shifting: a += (j << 2) in one operation!
  Almost all instructions are conditional (mov, moveq, movne, ...)
  Limited (but quite expressive) instruction set

Disassemblers
  Bits in memory:               0 1 0 1 0 0 0 0
  Hexadecimal presentation:     0x50
  Human-readable disassembly:   PUSH EAX
  Another example: 8B 5C 45 04 = MOV EBX, [EBP+EAX*2+4]

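Disassemblers: Linear Sweep
  Linear sweep disassembler: take the first byte of code, decode the instruction, and use the length of the instruction to skip to the next instruction
  Example of a problem with data embedded in code:
    0042ABFE: EB 01               jmp 0042AC01
    0042AC00: F3 BB 44 33 22 11   rep mov ebx,11223344h
    0042AC06: 5F                  pop edi
  The jump skips the byte F3 at 0042AC00, which is data. The sweep decodes it as a REP prefix, although the real instruction (mov ebx,11223344h) starts at 0042AC01.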
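The failure mode can be reproduced with a toy linear sweep. The length decoder below is an assumption made purely for illustration: it knows only the four byte patterns that occur in the example (EB, F3, BB, 5F), and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy length decoder for the example bytes only. Returns 0 if unknown. */
size_t toy_length(const uint8_t *p, size_t avail)
{
    if (avail == 0)
        return 0;
    switch (p[0]) {
    case 0xEB: return avail >= 2 ? 2 : 0;   /* JMP rel8 */
    case 0xBB: return avail >= 5 ? 5 : 0;   /* MOV EBX, imm32 */
    case 0x5F: return 1;                    /* POP EDI */
    case 0xF3: {                            /* REP prefix + next instruction */
        size_t rest = toy_length(p + 1, avail - 1);
        return rest ? rest + 1 : 0;
    }
    default:   return 0;
    }
}

/* Linear sweep: decode at `start`, record the offset, skip by the length.
 * Writes at most `max` offsets and returns how many were found. */
size_t linear_sweep(const uint8_t *code, size_t size, size_t start,
                    size_t *offsets, size_t max)
{
    size_t count = 0, pos = start;
    while (pos < size && count < max) {
        size_t len = toy_length(code + pos, size - pos);
        if (len == 0)
            break;                          /* unknown byte: stop */
        offsets[count++] = pos;
        pos += len;
    }
    return count;
}
```

For the bytes EB 01 F3 BB 44 33 22 11 5F, sweeping from offset 0 finds instructions at offsets 0, 2 and 8, swallowing the data byte F3 as a prefix. Starting at offset 3, the real jump target, finds the intended instructions at offsets 3 and 8.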

Disassemblers: Recursive Traversal
  Recursive traversal disassembler: start from the entry point and follow each branch (depth-first or breadth-first) to locate new starting points
  Still has problems, e.g. with dynamically calculated branch targets

Reversing C

Basic Data Types
  char     1 byte
  short    2 bytes
  int      4 bytes (platform word)
  long     4 bytes
  float    4 bytes floating point
  double   8 bytes floating point

Arrays
  One-dimensional arrays: char a[4];
    a[0] a[1] a[2] a[3]
  Multidimensional arrays: char a[2][3];
    a[0][0] a[0][1] a[0][2] a[1][0] a[1][1] a[1][2]

Example of multidimensional arrays
  char val[2][3][2] = {{{'0','1'},{'2','3'},{'4','5'}},
                       {{'6','7'},{'8','9'},{'a','b'}}};
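Because C stores arrays in row-major order, an access such as val[i][j][k] compiles down to a single offset from the start of the array. A small sketch of that calculation, with hypothetical helper names:

```c
#include <stddef.h>

static const char val[2][3][2] = {{{'0','1'},{'2','3'},{'4','5'}},
                                  {{'6','7'},{'8','9'},{'a','b'}}};

/* Byte offset of val[i][j][k] for dimensions [2][3][2]:
 * each i step skips 3*2 elements, each j step skips 2 elements. */
size_t offset_3d(size_t i, size_t j, size_t k)
{
    return (i * 3 + j) * 2 + k;
}

/* Read an element through a flat pointer, as compiled code effectively does. */
char flat_at(size_t i, size_t j, size_t k)
{
    const char *flat = (const char *)val;
    return flat[offset_3d(i, j, k)];
}
```

For instance, val[1][2][1] sits at offset (1*3 + 2)*2 + 1 = 11, the last byte of the array, which holds 'b'. In memory the array is simply the twelve characters 0123456789ab in sequence.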

Structures and Unions
  struct {
      unsigned int id;
      unsigned short age;
      char name[16];
  } record;
  Memory allocated for all members combined: sizeof(record) = 24

  union foo {
      int one;
      char two;
  };
  Memory allocated for the largest member only: sizeof(union foo) = 4

Structure Alignment
  Data structures are aligned to word size
  The #pragma pack() directive can change it
  Default layout (24 bytes):
    unsigned int id (4 bytes)
    short age (2 bytes)
    2 bytes pad
    name (16 bytes)
  With #pragma pack(1) (22 bytes):
    unsigned int id (4 bytes)
    short age (2 bytes)
    name (16 bytes)
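The effect of packing on the total size can be checked with sizeof. The sketch below assumes a platform with 4-byte int and 2-byte short (true for common x86 compilers) and uses the #pragma pack(push, 1) / #pragma pack(pop) form accepted by MSVC, GCC and Clang. The type and function names are hypothetical.

```c
#include <stddef.h>

/* Default alignment: the compiler pads the struct to a multiple of 4. */
struct record {
    unsigned int   id;
    unsigned short age;
    char           name[16];
};

/* Packed to 1-byte alignment: no padding at all. */
#pragma pack(push, 1)
struct packed_record {
    unsigned int   id;
    unsigned short age;
    char           name[16];
};
#pragma pack(pop)

/* A union is as large as its largest member. */
union foo {
    int  one;
    char two;
};

size_t record_size(void)        { return sizeof(struct record); }
size_t packed_record_size(void) { return sizeof(struct packed_record); }
size_t foo_size(void)           { return sizeof(union foo); }
```

Under those assumptions the three functions return 24, 22 and 4. When reversing a binary, a 2-byte discrepancy like this between expected and observed structure sizes is a hint that padding or packing is involved.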

Calling Conventions: stdcall
  int __stdcall foobar(int x, int y, char *foo, char *bar);

Calling Conventions: cdecl
  int __cdecl foobar(int x, int y, char *foo, char *bar);

Calling Conventions: fastcall
  int __fastcall foobar(int x, int y, char *foo, char *bar);

Calling Conventions: thiscall
  class Foo {
  public:
      int Bar(int x, int y, char *foo, char *bar);
  };

PE/COFF

Introduction to PE/COFF
  PE stands for Portable Executable
  Microsoft introduced PE in Windows NT 3.1
  It originates from Unix COFF
  Features dynamic linking, symbol exporting/importing
  Can contain Intel, Alpha, MIPS and even .NET MSIL binary code
  The 64-bit version is called PE32+

Complete Structure of PE

MZ Header
  File layout: MZ Header, PE Header, .text, .data, Imports, Resources

PE Header
  Header layout: MZ Header, File Header, Optional Header, Section Header[]
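Locating the PE header from the MZ header is the first step of any PE parser. The sketch below relies on two facts from the PE specification that the slides do not spell out: the 32-bit little-endian field at file offset 0x3C (e_lfanew) holds the offset of the PE header, and that header starts with the signature "PE\0\0". The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the file offset of the PE signature, or -1 if the buffer
 * does not look like a PE file. */
long find_pe_header(const uint8_t *file, size_t size)
{
    uint32_t e_lfanew;

    /* The MZ header is at least 0x40 bytes and starts with "MZ". */
    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return -1;

    /* e_lfanew: little-endian offset of the PE header. */
    e_lfanew = (uint32_t)file[0x3C]
             | ((uint32_t)file[0x3D] << 8)
             | ((uint32_t)file[0x3E] << 16)
             | ((uint32_t)file[0x3F] << 24);

    /* The 4-byte signature must lie inside the buffer. */
    if (e_lfanew > size - 4)
        return -1;
    if (memcmp(file + e_lfanew, "PE\0\0", 4) != 0)
        return -1;

    return (long)e_lfanew;
}
```

The File Header follows directly after the 4-byte signature, then the Optional Header and the array of section headers.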

PE Loading
  File on disk:      MZ/PE Header, Section Table, .text, .data
  Image in memory:   MZ/PE Header, Section Table, .text, .data
  RVA = Relative Virtual Address = offset from the image base in memory
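Since sections sit at different offsets on disk and in memory, static analysis tools constantly convert between RVAs and file offsets. A minimal sketch, assuming a simplified section record with hypothetical field names (the real IMAGE_SECTION_HEADER carries more fields):

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one entry of the section table. */
struct section {
    uint32_t virtual_address;   /* RVA of the section in memory */
    uint32_t raw_offset;        /* offset of the section data in the file */
    uint32_t raw_size;          /* size of the section data in the file */
};

/* Virtual address = image base + RVA. */
uint32_t rva_to_va(uint32_t image_base, uint32_t rva)
{
    return image_base + rva;
}

/* Find the section containing `rva` and translate it to a file offset.
 * Returns 0 on success, -1 if no section's file data covers the RVA. */
int rva_to_file_offset(const struct section *sections, size_t count,
                       uint32_t rva, uint32_t *file_offset)
{
    for (size_t i = 0; i < count; i++) {
        const struct section *s = &sections[i];
        if (rva >= s->virtual_address
            && rva - s->virtual_address < s->raw_size) {
            *file_offset = s->raw_offset + (rva - s->virtual_address);
            return 0;
        }
    }
    return -1;
}
```

With a .data section at RVA 0x2000 stored at file offset 0x600, RVA 0x2010 maps to file offset 0x610, and with an image base of 0x400000 the same RVA is the virtual address 0x402010.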

Importing Symbols
  Symbols (functions/data) can be imported from external DLLs
  The loader will load external DLLs automatically
  All the dependencies are loaded as well
  DLLs will be loaded only once
  External addresses are written to the Import Address Table (IAT)

Importing Symbols (continued)
  Each imported DLL has one IMAGE_IMPORT_DESCRIPTOR
  The descriptor points to two parallel lists of symbols to import:
    Import Address Table (IAT)
    Import Name Table (INT)
  The first list (IAT) is overwritten by the loader, the second one (INT) is not
  Executables can be pre-bound to DLLs to speed up loading
  Symbols can be imported by ASCII name or ordinal
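Whether an entry in these lists imports by name or by ordinal is encoded in the entry itself. The sketch below covers 32-bit PE files only and relies on a detail from the PE specification, not from the slides: the top bit of the 32-bit entry (IMAGE_ORDINAL_FLAG32 in winnt.h) marks an ordinal import; otherwise the entry is the RVA of a hint/name record. The helper names are hypothetical.

```c
#include <stdint.h>

/* Same value as IMAGE_ORDINAL_FLAG32 in winnt.h. */
#define ORDINAL_FLAG32 0x80000000u

/* Non-zero if the entry imports by ordinal rather than by name. */
int thunk_is_ordinal(uint32_t thunk)
{
    return (thunk & ORDINAL_FLAG32) != 0;
}

/* Ordinal number: the low 16 bits of an ordinal entry. */
uint16_t thunk_ordinal(uint32_t thunk)
{
    return (uint16_t)(thunk & 0xFFFFu);
}

/* RVA of the hint/name record for an import by name. */
uint32_t thunk_name_rva(uint32_t thunk)
{
    return thunk & 0x7FFFFFFFu;
}
```

An entry of 0x80000010 therefore means "import ordinal 16", while 0x00002345 points at RVA 0x2345, where a 2-byte hint is followed by the ASCII name of the symbol.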

Import Descriptors

Exports
  Symbols can be exported with ordinals, names or both
  Ordinals are simple index numbers of symbols
  The name is the full ASCII name of the exported symbol
  Exports can be forwarded to another DLL
  A forwarded symbol's address points to a name in the exports section

Resources
  Resources in PE are similar to an archive
  Resource files can be organised into directory trees
  The data structure is quite complex but there are tools to handle it
  Most common resources:
    Icons
    Version information
    GUI resources

Base Relocation
  Preferred image base:   MZ/PE Header, Section Table, .text, .data
  Actual image base:      MZ/PE Header, Section Table, .text, .data
  Relocation offset = difference between the actual and the preferred image base

Base Relocation (continued)
  Sometimes a DLL cannot be loaded at its preferred address
  When rebasing, the loader has to adjust all hardcoded addresses
  Relocations are done in 4 KiB blocks (page size on x86)
  Each relocation entry gives a type and points to a location
  The loader calculates the base address difference
  The offsets are adjusted according to the difference
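The steps above can be sketched for one relocation block of a 32-bit image. The entry encoding used here comes from the PE specification rather than the slides: each 16-bit entry stores the type in its upper 4 bits and the offset within the 4 KiB page in its lower 12 bits, type 3 (IMAGE_REL_BASED_HIGHLOW) means "add the difference to a 32-bit value", and type 0 is padding. The function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define REL_BASED_ABSOLUTE 0u   /* padding entry, nothing to do */
#define REL_BASED_HIGHLOW  3u   /* adjust a full 32-bit address */

/* Apply one block of relocation entries to an image mapped at `image`.
 * `page_rva` is the RVA of the 4 KiB page the block describes and
 * `delta` is (actual image base - preferred image base). */
void apply_reloc_block(uint8_t *image, uint32_t page_rva,
                       const uint16_t *entries, size_t count, uint32_t delta)
{
    for (size_t i = 0; i < count; i++) {
        unsigned type   = entries[i] >> 12;
        uint32_t offset = entries[i] & 0x0FFFu;
        uint8_t *p;
        uint32_t value;

        if (type != REL_BASED_HIGHLOW)
            continue;               /* only 32-bit fixups are handled here */

        /* Read the hardcoded address, add the difference, write it back. */
        p = image + page_rva + offset;
        value = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
              | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        value += delta;
        p[0] = (uint8_t)(value & 0xFFu);
        p[1] = (uint8_t)((value >> 8) & 0xFFu);
        p[2] = (uint8_t)((value >> 16) & 0xFFu);
        p[3] = (uint8_t)((value >> 24) & 0xFFu);
    }
}
```

If an image preferring base 0x00400000 is loaded at 0x10000000, the difference is 0x0FC00000, and a hardcoded address 0x00401000 becomes 0x10001000 after the fixup.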

ELF
  ELF = Executable and Linkable Format
  Used on many platforms: Linux, Solaris, FreeBSD, NetBSD, OpenBSD
  Native executables for Android
    We'll talk more about Android applications in later lectures
  We won't go into ELF details during this lecture