anti-virus and anti-anti-virus

Similar documents
anti-anti-virus (continued)

T Jarkko Turkulainen, F-Secure Corporation

CSE351 Spring 2018, Midterm Exam April 27, 2018

Inside VMProtect. Introduction. Internal. Analysis. VM Logic. Inside VMProtect. Conclusion. Samuel Chevet. 16 January 2015.

Practical Malware Analysis

CS 261 Fall Machine and Assembly Code. Data Movement and Arithmetic. Mike Lam, Professor

IA-32 Architecture. CS 4440/7440 Malware Analysis and Defense

CS 161 Computer Security. Week of January 22, 2018: GDB and x86 assembly

Changelog. Assembly (part 1) logistics note: lab due times. last time: C hodgepodge

CSCD 303 Essential Computer Security Fall 2017

T Using debuggers to analyze malware. Antti Tikkanen, F-Secure Corporation

Corrections made in this version not seen in first lecture:

Machine/Assembler Language Putting It All Together

CS Bootcamp x86-64 Autumn 2015

last time Assembly part 2 / C part 1 condition codes reminder: quiz

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions?

CSCD 303 Essential Computer Security Fall 2018

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth

Changelog. Virtual Memory (2) exercise: 64-bit system. exercise: 64-bit system

Virus Analysis. Introduction to Malware. Common Forms of Malware

Computer Processors. Part 2. Components of a Processor. Execution Unit The ALU. Execution Unit. The Brains of the Box. Processors. Execution Unit (EU)

Agenda. Motivation Generic unpacking Typical problems Results

Module 3 Instruction Set Architecture (ISA)

Credits and Disclaimers

Virtual Memory (2) 1

CS 31: Intro to Systems ISAs and Assembly. Martin Gagné Swarthmore College February 7, 2017

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08

Lecture 12 Malware Defenses. Stephen Checkoway University of Illinois at Chicago CS 487 Fall 2017 Slides based on Bailey s ECE 422

Instruction Set Architectures

CS 31: Intro to Systems ISAs and Assembly. Kevin Webb Swarthmore College February 9, 2016

CS 31: Intro to Systems ISAs and Assembly. Kevin Webb Swarthmore College September 25, 2018

Slides for Lecture 6

Scott M. Lewandowski CS295-2: Advanced Topics in Debugging September 21, 1998

Areas for growth: I love feedback

Malware Analysis and Antivirus Technologies: Using Debuggers to Analyze Malware

Introduction to Machine/Assembler Language

Is stack overflow still a problem?

x86 Programming I CSE 351 Winter

Function Calls and Stack

1. A student is testing an implementation of a C function; when compiled with gcc, the following x86-64 assembly code is produced:

x86 architecture et similia

Exploiting Stack Buffer Overflows Learning how blackhats smash the stack for fun and profit so we can prevent it

Corrections made in this version not in first posting:

Intel x86-64 and Y86-64 Instruction Set Architecture

Program Exploitation Intro

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 4

Machine Language CS 3330 Samira Khan

Intro to x86 Binaries. From ASM to exploit

RISC I from Berkeley. 44k Transistors 1Mhz 77mm^2

Reverse Engineering Low Level Software. CS5375 Software Reverse Engineering Dr. Jaime C. Acosta

Important From Last Time

Lecture 4 CIS 341: COMPILERS

CNIT 127: Exploit Development. Ch 1: Before you begin. Updated

C to Assembly SPEED LIMIT LECTURE Performance Engineering of Software Systems. I-Ting Angelina Lee. September 13, 2012

CSE2421 FINAL EXAM SPRING Name KEY. Instructions: Signature

16.317: Microprocessor Systems Design I Fall 2014

The IA-32 Stack and Function Calls. CS4379/5375 Software Reverse Engineering Dr. Jaime C. Acosta

Assembly I: Basic Operations. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

MACHINE-LEVEL PROGRAMMING I: BASICS COMPUTER ARCHITECTURE AND ORGANIZATION

CS165 Computer Security. Understanding low-level program execution Oct 1 st, 2015

CSCI 2021: x86-64 Control Flow

buffer overflow exploitation

PRESENTED BY: SANTOSH SANGUMANI & SHARAN NARANG

Binghamton University. CS-220 Spring X86 Debug. Computer Systems Section 3.11

Instruction Set Architectures

16.317: Microprocessor Systems Design I Fall 2015

Lab 3. The Art of Assembly Language (II)

Assembly II: Control Flow. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

HUNTING FOR METAMORPHIC ENGINES

Rev101. spritzers - CTF team. spritz.math.unipd.it/spritzers.html

Instruction Set Architecture (ISA) Data Types

The x86 Architecture. ICS312 - Spring 2018 Machine-Level and Systems Programming. Henri Casanova

Where We Are. Optimizations. Assembly code. generation. Lexical, Syntax, and Semantic Analysis IR Generation. Low-level IR code.

Low-Level Essentials for Understanding Security Problems Aurélien Francillon

C++ to assembly to machine code

Binghamton University. CS-220 Spring X86 Debug. Computer Systems Section 3.11

Branching and Looping

Buffer overflow is still one of the most common vulnerabilities being discovered and exploited in commodity software.

CSE 351 Section 4 GDB and x86-64 Assembly Hi there! Welcome back to section, we re happy that you re here

How Software Executes

Assembly Language Each statement in an assembly language program consists of four parts or fields.

Do not turn the page until 5:10.

15-213/18-243, Spring 2011 Exam 1

Rakan El-Khalil xvr α xvr net

x86 Assembly Tutorial COS 318: Fall 2017

CSE 351 Midterm Exam Spring 2016 May 2, 2015

CSC 2400: Computer Systems. Towards the Hardware: Machine-Level Representation of Programs

logistics LEX assignment out exam in on week come with questions on Monday (review)

Objectives. ICT106 Fundamentals of Computer Systems Topic 8. Procedures, Calling and Exit conventions, Run-time Stack Ref: Irvine, Ch 5 & 8

Assembly II: Control Flow

16.317: Microprocessor Systems Design I Spring 2015

SOEN228, Winter Revision 1.2 Date: October 25,


The X86 Assembly Language Instruction Nop Means

Instruction Set Architectures

ID: Sample Name: 11youtube3.com Cookbook: default.jbs Time: 08:17:42 Date: 12/04/2018 Version:

Machine Programming 3: Procedures

X86 Addressing Modes Chapter 3" Review: Instructions to Recognize"

Buffer Overflow Attack

Protecting Against Unexpected System Calls

Transcription:

anti-virus and anti-anti-virus 1

logistics: TRICKY 2 HW assignment out infecting an executable

anti-virus techniques last time: signature-based detection regular expression-like matching snippets of virus(-like) code heuristic detection look for suspicious things behavior-based detection look for virus activity not explicitly mentioned: producing signatures manual? analysis not explicitly mentioned: disinfection manual? analysis 3

anti-virus techniques last time: signature-based detection regular expression-like matching snippets of virus(-like) code heuristic detection look for suspicious things behavior-based detection look for virus activity not explicitly mentioned: producing signatures manual? analysis not explicitly mentioned: disinfection manual? analysis 4

regular expression cheatsheet a matches a a* matches (empty string), a, aa, aaa, a\ matches the string a* foo bar matches foo, bar [ab] matches a, b [^ab] matches any byte except a and b (foo bar)* (empty string), foo, bar, foobar, barfoo, (. \n)* matches anything whatsoever 5

recall: why regular expressions? 6 (essentially) one-pass, lookup table not the most flexible, but fast flex regular expressions + code for exceptions

recall: faster than regular expressions? 7 optimization 1: look for fixed-length strings sliding window + hashtable test with full pattern optimization 2: head/tail scanning avoid reading whole files

scanning for fixed strings 8 12 34 56 78 9A BC DE F0 23 45 67 89 AB CD EF 03 45 67 16-byte anchor 204D616C6963696F7573205468696E6720 34567890ABCDEF023456789ABCDEFG0345 6120766972757320737472696E679090F2 malware Virus A Virus B Virus C

scanning for fixed strings 8 12 34 56 78 9A BC DE F0 23 45 67 89 AB CD EF 03 45 67 hash function 16-byte 4-byte anchor hash 204D616C6963696F7573205468696E6720 FC923131 34567890ABCDEF023456789ABCDEFG0345 34598873 6120766972757320737472696E679090F2 994254A3 malware Virus A Virus B Virus C

scanning for fixed strings 8 12 34 56 78 9A BC DE F0 23 45 67 89 AB CD EF 03 45 67 hash function 16-byte 4-byte anchor hash 204D616C6963696F7573205468696E6720 FC923131 34567890ABCDEF023456789ABCDEFG0345 34598873 6120766972757320737472696E679090F2 994254A3 malware Virus A Virus B Virus C (full pattern for Virus B)

scanning for fixed strings 8 12 34 56 78 9A BC DE F0 23 45 67 89 AB CD EF 03 45 67 hash function 16-byte 4-byte anchor hash 204D616C6963696F7573205468696E6720 FC923131 34567890ABCDEF023456789ABCDEFG0345 34598873 6120766972757320737472696E679090F2 994254A3 malware Virus A Virus B Virus C (full pattern for Virus B)

scanning for fixed strings 8 12 34 56 78 9A BC DE F0 23 45 67 89 AB CD EF 03 45 67 hash function 16-byte 4-byte anchor hash 204D616C6963696F7573205468696E6720 FC923131 34567890ABCDEF023456789ABCDEFG0345 34598873 6120766972757320737472696E679090F2 994254A3 malware Virus A Virus B Virus C (full pattern for Virus B)

virus patterns 9 specific large snippet of code from virus false positives essentially impossible general strategy (e.g. push + ret) false positives possible real applications might do this? might appear in application data?

detecting new malware 10 goal: detect unseen malware some signatures might do this look for strategies also look for anomalies hope that real compilers/linkers/etc. don t do

anti-virus techniques last time: signature-based detection regular expression-like matching snippets of virus(-like) code heuristic detection look for suspicious things behavior-based detection look for virus activity not explicitly mentioned: producing signatures manual? analysis not explicitly mentioned: disinfection manual? analysis 11

viruses and executable formats 12 header: machine type, file type, etc. program header: segments to load (also, some other information) segment 1 data segment 2 data segment 3 data virus segment

viruses and executable formats 12 header: machine type, file type, etc. program header: segments to load (also, some other information) length edited by virus segment 1 data segment 2 data virus code + new entry point? segment 3 data virus segment

viruses and executable formats 12 header: machine type, file type, etc. program header: segments to load (also, some other information) heuristiclength 1: entry edited point byinvirus last segment? (last segment usually not code) segment 1 data segment 2 data virus code + new entry point? segment 3 data virus segment

viruses and executable formats 12 header: machine type, file type, etc. program header: segments to load (also, some other information) new segment added by virus segment 1 data segment 2 data segment 3 data virus segment

viruses and executable formats 12 header: machine type, file type, etc. program header: segments to load (also, some other information) heuristic 1: is entry point in last segment? new segment added by virus (last segment usually not code) segment 1 data segment 2 data segment 3 data virus segment

viruses and executable formats 12 header: machine type, file type, etc. program header: segments to load heuristic (also, 2: some did other virus mess information) up header? (e.g. do sizes newused segment by linker added but not by loader virus disagree) section names segment disagree 1 data with usage? segment 2 data segment 3 data virus segment

defeating entry point checking 13 insert jump in normal code section and set as entry point; or assume it s reached soon

defeating entry point checking 13 insert jump in normal code section and set as entry point; or assume it s reached soon dynamic heuristic: run code in VM for a while, see if switches sections

heuristics: library calls 14 dynamic linking functions called by name how do viruses add to dynamic linking tables? often don t! instead dynamically look-up functions if do could mess that up/lots of code heuristic: look for API function name strings (outside linking info)

evading library call checking 15 modify dynamic linking tables reimplement library call manually Linux: usually easy Windows: system calls not well documented, change hide names

evading library call checking 15 modify dynamic linking tables reimplement library call manually Linux: usually easy Windows: system calls not well documented, change hide names

hiding library call names 16 common approach: store hash of name runtime: read library, scan list of functions for name bonus: makes analysis harder

anti-virus techniques last time: signature-based detection regular expression-like matching snippets of virus(-like) code heuristic detection look for suspicious things behavior-based detection look for virus activity not explicitly mentioned: producing signatures manual? analysis not explicitly mentioned: disinfection manual? analysis 17

behavior-based detection 18 things malware does that other programs don t?

behavior-based detection 18 things malware does that other programs don t? modify system files modifying existing executables open network connections to lots of random places

behavior-based detection things malware does that other programs don t? modify system files modifying existing executables open network connections to lots of random places monitor all programs for weird behavior problem: false positives (e.g. installers) 18

heuristic detection 19 virus shortcuts generally: not producing executable via normal linker generally: trying to make analysis harder push then ret instead of jmp entry point in wrong segment switching segments library calls without normal dynamic linker mechanisms infection behavior modifying executables/system files weird network connections

example heuristics: DREBIN (1) from 2014 research paper on Android malware: Arp et al, DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket features from applications (without running): hardware requirements requested permissions whether it runs in background, with pushed notifications, etc. what API calls it uses network addresses detect dynamic code generation explicitly statistics (i.e. machine learning) to determine score 20

example heuristics: DREBIN (2) 21 advantage: Android uses Dalvik bytecode (Java-like) high-level machine code much easier/more useful to analyze accuracy? tested on 131k apps, 94% of malware, 1% false positives versus best commercial: 96%, < 0.3% false positives (probably has explicit patterns for many known malware samples) but statistics: training set needs to be typical of malware cat-and-mouse: what would attackers do in response?

anti-virus techniques last time: signature-based detection regular expression-like matching snippets of virus(-like) code heuristic detection look for suspicious things behavior-based detection look for virus activity not explicitly mentioned: producing signatures manual? analysis not explicitly mentioned: disinfection manual? analysis 22

anti-anti-virus 23 defeating signatures: avoid things compilers/linkers never do make analysis harder takes longer to produce signatures takes longer to produce repair program make changing viruses make any one signature less effective

some terms 24 armored viruses viruses designed to make analysis harder metamorphic/polymorphic/oligomorphic viruses viruses that change their code each time different terms different types of changes (later)

encrypted(?) data 25 char obviousstring[] = "Please open this 100%" " safe attachment"; char lessobviousstring[] = "osz^lz\037pozq\037kwvl\037\016\017" "\017\032\037L^YZ\037^KK^\\WRZQK"; for (int i = 0; i < sizeof(lessobviousstring) 1; lessobviousstring[i] = lessobviousstring[i] ^ '?'; }

recall: hiding API calls 26 /* functions, functionsnames retrieved from library before */ /* 0xd7c9e758 = hash("getfileattributesa") */ unsigned hashofstring = 0xd7c9e758; for (int i = 0; i < num_functions; ++i) { unsigned functionhash = 0; for (int j = 0; j < strlen(functionnames[i]); functionhash = (functionhash * 7 + functionnames[i][j]); } if (functionhash == hashofstring) { return functions[i]; } }

encrypted data and signatures 27 doesn t really stop signatures encrypted string + decryption code is more unique but makes analyzing virus a little harder how much harder? exercise: how would you decrypt strings?

encrypted data and signatures 27 doesn t really stop signatures encrypted string + decryption code is more unique but makes analyzing virus a little harder how much harder? exercise: how would you decrypt strings? can we do better?

encrypted(?) viruses 28 char encrypted[] = "\x12\x45..."; char key[] = "..."; virusentrypoint() { decrypt(encrypted, key); goto encrypted; } decrypt(char *buffer, char *key) {...} choose a new key each time! not good encryption key is there sometimes mixed with compression

encrypted viruses: no signature? 29 decrypt is a pretty good signature still need to a way to disguise that code how about analysis? how does one analyze this?

not just anti-antivirus 30 encrypted body just running objdump not enough instead run debugger, set breakpoint after decryption dump decrypted memory afterwords

unneeded steps 31 understanding the encryption algorithm more complex encryption algorithm won t help extracting the key and encrypted data making key less obvious won t help

unneeded steps 31 understanding the encryption algorithm more complex encryption algorithm won t help extracting the key and encrypted data making key less obvious won t help needed to know when encryption finished needed debugger to work

unneeded steps understanding the encryption algorithm more complex encryption algorithm won t help extracting the key and encrypted data making key less obvious won t help needed to know when encryption finished needed debugger to work countermeasures? encrypt in strange order? multiple passes? anti-debugging (later) 31

example: Cascade decrypter lea encrypted_code, %si decrypt: mov $0x682, %sp // length of body xor %si, (%si) xor %sp, (%si) inc %si dec %sp jnz decrypt encrypted_code:... Szor Listing 7.1 32

example: Cascade decrypter lea encrypted_code, %si decrypt: mov $0x682, %sp // length of body xor %si, (%si) xor %sp, (%si) inc %si dec %sp jnz decrypt encrypted_code:... Szor Listing 7.1 32

example: Cascade decrypter lea encrypted_code, %si decrypt: mov $0x682, %sp // length of body xor %si, (%si) xor %sp, (%si) inc %si dec %sp jnz decrypt encrypted_code:... Szor Listing 7.1 32

decrypter 33 more variations: nested decrypters, different orders, etc. still problem: decrypter code is signature but harder to distinguish different malware often tries to frustrate debugging in other ways e.g. use stack pointer (not for the stack) (more on this later)

decrypter 33 more variations: nested decrypters, different orders, etc. still problem: decrypter code is signature but harder to distinguish different malware often tries to frustrate debugging in other ways e.g. use stack pointer (not for the stack) disinfection (more this later) want to precisely identify malware

decrypter 33 more variations: nested decrypters, different orders, etc. still problem: decrypter code is signature but harder to distinguish different malware often tries to frustrate debugging in other ways e.g. use easiest stackway pointer to defeat (not for decrypter the stack) manually: (more on this later) run in debugger until code is decrypted

legitimate packers 34 some commercial software is packaged in this way including antidebugging stuff why? intended to be copy/reverse engineering protection

playing mouse 35 signature-based techniques: scan for pattern of constant part of virus scan for strings, approx. 16-bytes long shortcut: scan top and bottom virus-writer hat: how can you defeat these? encrypting code? encrypter is pattern

playing mouse 36 signature-based techniques: scan for pattern of constant part of virus scan for strings, approx. 16-bytes long shortcut: scan top and bottom virus-writer hat: how can you defeat these? encrypting code? encrypter is pattern change some trivial part of virus e.g. add nops somewhere

adding nops 37 instead of copying, copy but insert nops a little tricky only between instructions could have hard-coded places to insert likely easy to turn into signature or tricky to write or can parse instructions x86 encoding isn t that bad malware can use limited subset

producing changing malware 38 not just nop: switch between synonym instructions swap registers random instructions that manipulate unused register

oligomorphic viruses 39 use packing technique but make slight changes to decrypters

example: W95/Memorial decrypt: nop nop xor %al, (%esi) inc %esi nop inc %al dec %ecx jnz decrypt... decrypt: nop nop xor %al, (%esi) inc %esi nop inc %al loop decrypt...... Szor, Listsings 7.3 and 7.4 40 mov $0x405000, %ebp mov $0x550, %ecx lea 0x2e(%ebp), %esi add 0x29(%ebp), %ecx mov 0x2d(%ebp), %al mov $0x550, %ecx mov $0x13bc000, %ebp lea 0x2e(%ebp), %esi add 0x29(%ebp), %ecx mov 0x2d(%ebp), %al

example: W95/Memorial mov $0x405000, %ebp mov $0x550, %ecx lea 0x2e(%ebp), %esi add 0x29(%ebp), %ecx mov 0x2d(%ebp), %al decrypt: nop nop xor %al, (%esi) inc %esi nop inc %al dec %ecx jnz decrypt... mov $0x550, %ecx mov $0x13bc000, %ebp lea 0x2e(%ebp), %esi add 0x29(%ebp), %ecx mov 0x2d(%ebp), %al change instruction order; location of decryption key/etc. decrypt: nop nop xor %al, (%esi) inc %esi nop inc %al loop decrypt...... Szor, Listsings 7.3 and 7.4 40

example: W95/Memorial mov $0x405000, %ebp mov $0x550, %ecx lea 0x2e(%ebp), %esi add 0x29(%ebp), %ecx mov 0x2d(%ebp), %al decrypt: nop nop xor %al, (%esi) inc %esi nop inc %al dec %ecx jnz decrypt... mov $0x550, %ecx mov $0x13bc000, %ebp lea 0x2e(%ebp), %esi add 0x29(%ebp), %ecx mov 0x2d(%ebp), %al variable choices of loop instructions decrypt: nop nop xor %al, (%esi) inc %esi nop inc %al loop decrypt...... Szor, Listsings 7.3 and 7.4 40

example: W95/Memorial mov $0x405000, %ebp mov $0x550, %ecx lea 0x2e(%ebp), %esi add 0x29(%ebp), %ecx mov 0x2d(%ebp), %al decrypt: nop nop xor %al, (%esi) inc %esi nop inc %al dec %ecx jnz decrypt... mov $0x550, %ecx mov $0x13bc000, %ebp lea 0x2e(%ebp), %esi add 0x29(%ebp), %ecx mov 0x2d(%ebp), %al Szor: 96 different decryptor patterns decrypt: nop nop xor %al, (%esi) inc %esi nop inc %al loop decrypt...... Szor, Listsings 7.3 and 7.4 40

more advanced changes? 41 Szor calls W95/Memorial oligomoprhic encrypted code plus small changes to decrypter What about doing more changes to decrypter? many, many variations Szor calls doing this polymorphic polymorphic example: 1260

example: 1260 (virus) inc %si mov $0x0e9b, %ax clc mov $0x12a, %di nop mov $0x571, %cx decrypt: xor %cx, (%di) sub %dx, %bx sub %cx, %bx sub %ax, %bx nop xor %cx, %dx xor %ax, (%di)... mov $0x0a43, %ax nop mov $0x15a, %di sub %dx, %bx sub %cx, %bx mov $0x571, %cx clc decrypt: xor %cx, (%di) xor %cx, %dx sub %cx, %bx nop xor %cx, %bx xor %ax, (%di)... adapted from Szor, Listing 7.5 42

example: 1260 (virus) inc %si mov $0x0e9b, %ax clc mov $0x12a, %di nop mov $0x571, %cx decrypt: xor %cx, (%di) sub %dx, %bx sub %cx, %bx sub %ax, %bx nop xor %cx, %dx xor %ax, (%di)... mov $0x0a43, %ax nop mov $0x15a, %di sub %dx, %bx sub %cx, %bx mov $0x571, %cx clc decrypt: xor %cx, (%di) xor %cx, %dx sub %cx, %bx nop xor %cx, %bx xor %ax, (%di)... adapted from Szor, Listing 7.5 42

example: 1260 (virus) inc %si mov $0x0e9b, %ax clc mov $0x12a, %di nop mov $0x571, %cx decrypt: xor %cx, (%di) sub %dx, %bx sub %cx, %bx sub %ax, %bx nop xor %cx, %dx xor %ax, (%di)... mov $0x0a43, %ax nop mov $0x15a, %di sub %dx, %bx sub %cx, %bx mov $0x571, %cx clc decrypt: xor %cx, (%di) xor %cx, %dx sub %cx, %bx nop xor %cx, %bx xor %ax, (%di)... do-nothing instructions adapted from Szor, Listing 7.5 42

example: 1260 (virus) inc %si mov $0x0e9b, %ax clc mov $0x12a, %di nop mov $0x571, %cx decrypt: xor %cx, (%di) sub %dx, %bx sub %cx, %bx sub %ax, %bx nop xor %cx, %dx xor %ax, (%di)... mov $0x0a43, %ax nop mov $0x15a, %di sub %dx, %bx sub %cx, %bx mov $0x571, %cx clc decrypt: xor %cx, (%di) xor %cx, %dx sub %cx, %bx nop xor %cx, %bx xor %ax, (%di)... adapted from Szor, Listing 7.5 42

example: 1260 (virus) inc %si mov $0x0e9b, %ax clc mov $0x12a, %di nop mov $0x571, %cx decrypt: xor %cx, (%di) sub %dx, %bx sub %cx, %bx sub %ax, %bx nop xor %cx, %dx xor %ax, (%di)... mov $0x0a43, %ax nop mov $0x15a, %di sub %dx, %bx sub %cx, %bx mov $0x571, %cx clc decrypt: xor %cx, (%di) xor %cx, %dx sub %cx, %bx nop xor %cx, %bx xor %ax, (%di)... different decryption key adapted from Szor, Listing 7.5 42

lots of variation 43 essentially limitless variations of decrypter huge number of nop-like sequences plus reordering non-nop instructions can t just make scanner that skips obvious nops

lots of variation 43 essentially limitless variations of decrypter huge number of nop-like sequences plus reordering non-nop instructions can t just make scanner that skips obvious nops could try to analyze more deeply for nops could identify when instruction s result is unused

lots of variation essentially limitless variations of decrypter huge number of nop-like sequences plus reordering non-nop instructions can t just make scanner that skips obvious nops could try to analyze more deeply for nops could identify when instruction s result is unused but attacker can be more sophisticated: inc %ax; dec %ax xor %ax, %bx; xor %bx, %ax; xor %ax, %bx 43

interlude: anti-packer strategies 44

finding packers 45 easiest way to decrypt self-decrypting code run it! solution: virtual machine in antivirus software makes antivirtualization/emulation more important

finding packers with VM 46 run program in VM for a while how long? then scan memory for known patterns or detect jumping to written memory

stopping packers 47 it s unusual to jump to code you wrote modern OSs: memory is executable or writable not both

stopping packers 47 it s unusual to jump to code you wrote modern OSs: memory is executable or writable not both

diversion: DEP/W^X 48 memory executable or writeable but not both exists for exploits (later in course), not packers requires hardware support to be fast (early 2000s+) various names for this feature: Data Execution Prevention (DEP) (Windows) W^X ( write XOR execute ) NX/XD/XN bit (underlying hardware support) (No Execute/eXecute Disable/eXecute Never) special system call to switch modes

unusual, but 49 binary translation convert machine code to new machine code at runtime Java virtual machine, JavaScript implementations just-in-time compilers dynamic linkers load new code from a file same as writing code? those packed commercial programs programs need to explicitly ask for write+exec

finding packers 50 easiest way to decrypt self-decrypting code run it! solution: virtual machine in antivirus software makes antivirtualization/emulation more important

antivirtualization techniques 51 query virtual devices time operations that are slower in VM/emulation use operations not supported by VM

antivirtualization techniques 52 query virtual devices time operations that are slower in VM/emulation use operations not supported by VM

virtual devices 53 VirtualBox device drivers? VMware-brand ethernet device?

antivirtualization techniques 54 query virtual devices solution: mirror devices of some real machine time operations that are slower in VM/emulation use operations not supported by VM

antivirtualization techniques 54 query virtual devices time operations that are slower in VM/emulation use operations not supported by VM

slower operations 55 not- native VM: everything is really slow otherwise trigger callbacks to VM implementation: system calls? allocating and accessing memory? and hope it s reliably slow enough

antivirtualization techniques 56 query virtual devices time operations that are slower in VM/emulation solution: virtual clock use operations not supported by VM

antivirtualization techniques 56 query virtual devices time operations that are slower in VM/emulation use operations not supported by VM

operations not supported 57 missing instructions kinds? FPU instructions MMX/SSE instructions undocumented (!) CPU instructions not handling OS features? setting up special handlers for segfault multithreading system calls that make callbacks antivirus not running system VM to do decryption needs to emulate lots of the OS itself

attacking emulation patience 58 looking for unpacked virus in VM or other malicious activity when are you done looking?

attacking emulation patience 58 looking for unpacked virus in VM or other malicious activity when are you done looking? malware solution: take too long not hard if emulator uses slow implementation

attacking emulation patience 58 looking for unpacked virus in VM or other malicious activity when are you done looking? malware solution: take too long not hard if emulator uses slow implementation malware solution: don t infect consistently

probability antivirus emulator: randomnumber() == 3 looks clean! if (randomnumber() == 4) { unpackandrunevilcode(); } real execution #1: randomnumber() == 2 no infection! real execution #N: randomnumber() == 4 infect! 59

on goats 60 analysis (and maybe detection) uses goat files sacrificial goat to get changed by malware heuristics can avoid simple goat files, e.g.: don t infect small programs don t infect huge programs don t infect programs with huge amounts of nops

goats as detection 61 tripwire for malware touching do-nothing.exe very likely bad

goats as analysis 62 more important for analysis of changing malware want examples of multiple versions want it to be obvious where malware code added e.g. big cavities to fill in original e.g. obvious patterns in original code/data

changing bodies 63 decrypting a virus body gives body for signature just need to run decrypter how about avoiding static signatures entirely called metamorphic versus polymorphic only change decrypter

example: changing bodies pop %edx mov $0x4h, %edi mov %ebp, %esi mov $0xC, %eax add $0x88, %edx mov (%edx), %ebx mov %ebx, 0x1118(%esi,%eax,4) code above: after decryption every instruction changes pop %eax mov $0x4h, %ebx mov %ebp, %esi mov $0xC, %edi add $0x88, %eax mov (%eax), %esi mov %esi, 0x1118(%esi, still has good signatures with alternatives for each possible register selection but harder to write/slower to match 64

case study: Evol 65 via Lakhatia et al, Are metamorphic viruses really invincible?, Virus Bulletin, Jan 2005. mutation engine run as part of propagating the virus code disassemble instr. lengths transform relocate code

case study: Evol 66 via Lakhatia et al, Are metamorphic viruses really invincible?, Virus Bulletin, Jan 2005. mutation engine run as part of propagating the virus code disassemble instr. lengths transform relocate code

Evol instruction lengths 67 sounds really complicated? virus only handles instructions it has: about 61 opcodes, 32 of them identified by first four bits e.g. opcode 0x7x conditional jump no prefixes, no floating point only %reg or $constant or offset(%reg)

case study: Evol 68 via Lakhatia et al, Are metamorphic viruses really invincible?, Virus Bulletin, Jan 2005. mutation engine run as part of propagating the virus code disassemble instr. lengths transform relocate code

Evol transformations some stuff left alone static or random one of N transformations example: mov %eax, 8(%ebp) push %ecx mov %ebp, %ecx add $0x12, %ecx mov %eax, 0xa(%ecx) pop %ecx uses more stack space save temporary code gets bigger each time Lakhotia et al., Are metamorphic viruses really invincible?, Virus Bulletin, Jan 2005 69

case study: Evol 70 via Lakhatia et al, Are metamorphic viruses really invincible?, Virus Bulletin, Jan 2005. mutation engine run as part of propagating the virus code disassemble instr. lengths transform relocate code

mutation with relocation 71 table mapping old to new locations list of number of bytes generated by each transformation list of locations references in original record relative offset in jump record absolute offset in original

relocation example mov... mov... decrypt: xor %rax, (%rbx) inc %rbx dec %rcx jne decrypt orig. len new len instr 5 10 mov1 2 3 mov2 2 7 xor1 1 1 inc1 1 5 dec1 3 3 jne1 address loc orig. target new target 10+3+7+1+5+1 xor1 (5 + 2) xor1 (10 + 3) (jne1+1) 72

mutation engines 73 tools for writing polymorphic viruses best: no constant bytes, no no-op instructions tedious work to build state-machine-based detector ((almost) a regular expression to match it) apparently done manually automatable? pattern: used until reliably detected

fancier mutation 74 can do mutation on generic machine code just need full disassembler identify both instruction lengths and addresses hope machine code not written to rely on machien code sizes, etc. hope to identify tables of function pointers, etc.

fancier mutation 75 also an infection technique no cavity needed create one obviously tricky to implement need to fix all executable headers what if you misparse assembly? what if you miss a function pointer? example: Simile virus

antiantivirus 76 already covered: break disassemblers with packers break VMs/emulators break debuggers make analysis harder break antivirus software itself retrovirus

antiantivirus 77 already covered: break disassemblers with packers break VMs/emulators break debuggers make analysis harder break antivirus software itself retrovirus

diversion: debuggers 78 we ll care about two pieces of functionality: breakpoints debugger gets control when certain code is reached single-step debugger gets control after a single instruction runs

diversion: debuggers 79 we ll care about two pieces of functionality: breakpoints debugger gets control when certain code is reached single-step debugger gets control after a single instruction runs

implementing breakpoints 80 idea: change movq %rax, %rdx addq %rbx, %rdx // BREAKPOINT HERE subq 0(%rsp), %r8... into movq %rax, %rdx jmp debugger_code subq 0(%rsp), %r8...

implementing breakpoints 80 idea: change movq %rax, %rdx addq %rbx, %rdx // BREAKPOINT HERE subq 0(%rsp), %r8... into movq %rax, %rdx jmp debugger_code subq 0(%rsp), %r8... problem: jmp might be bigger than addq?

int 3 81 x86 breakpoint instruction: int 3 Why 3? fourth entry in table of handlers one byte instruction encoding: CC debugger modifies code to insert breakpoint has copy of original somewhere invokes handler setup by OS debugger can ask OS to be run by handler or changes pointer to handler directly on old OSes

int 3 handler 82 kind of exception handler recall: exception handler = way for CPU to run OS code x86 CPU saves registers, PC for debugger x86 CPU has easy to way to resume debugged code from handler

detecting int 3 directly (1) checksum running code mycode:... movq $0, %rbx movq $mycode, %rax loop: addq (%rax), %rbx addq $8, %rax cmpq $endcode, %rax jl loop cmpq %rbx, $EXPECTED_VALUE jne debugger_found... endcode: 83

detecting int 3 directly (2) 84 query the handler for int 3 old OSs only; today: cannot set directly modern OSs: ask if there s a debugger attached or try to attach as debugger yourself doesn t work debugger present, probably does work broke any debugger? // Windows API function! if (IsDebuggerPresent()) {

modern debuggers 85 int 3 is the oldest x86 debugging mechanism modern x86: 4 breakpoint registers (DR0 DR3) contain address of program instructions need more than 4? sorry processor triggers exception when address reached 4 extra registers + comparators in CPU? flag to invoke debugger if debugging registers used enables nested debugging

diversion: debuggers 86 we ll care about two pieces of functionality: breakpoints debugger gets control when certain code is reached single-step debugger gets control after a single instruction runs

implementing single-stepping (1) set a breakpoint on the following instruction? movq %rax, %rdx addq %rbx, %rdx // STOPPED HERE subq 0(%rsp), %r8 // SINGLE STEP TO HERE subq 8(%rsp), %r8... transformed to movq %rax, %rdx addq %rbx, %rdx // STOPPED HERE int 3 // SINGLE STEP TO HERE subq 8(%rsp), %... then jmp to addq 87

implementing single-stepping (1) set a breakpoint on the following instruction? movq %rax, %rdx addq %rbx, %rdx // STOPPED HERE subq 0(%rsp), %r8 // SINGLE STEP TO HERE subq 8(%rsp), %r8... transformed to movq %rax, %rdx addq %rbx, %rdx // STOPPED HERE int 3 // SINGLE STEP TO HERE subq 8(%rsp), %... then jmp to addq 87

implementing single-stepping (2) 88 typically hardware support for single stepping x86:int 1 handler (second entry in table) x86: TF flag: execute handler after every instruction except during handler (whew!)

Defeating single-stepping 89 try to install your own int 1 handler (if OS allows) try to clear TF? (if debugger doesn t reset it)

unstealthy debuggers 90 is a debugger installed? unlikely on Windows, maybe ignore those machines is a debugger process running (don t check if it s tracing you)

confusing debuggers 91 broken executable formats e.g., recall ELF: segments and sections corrupt sections program still works overlapping segments/sections program still works use the stack pointer not for the stack stack trace?

antiantivirus 92 already covered: break disassemblers with packers break VMs/emulators break debuggers make analysis harder break antivirus software itself retrovirus

attacking antivirus (1) 93 how does antivirus software scan new things? register handlers with OS/applications new files, etc. how about registering your own?

hooking 94 hooking getting a hook to run on (OS) operations e.g. creating new files ideal mechanism: OS support less ideal mechanism: change library loading e.g. replace open, fopen, etc. in libraries less ideal mechanism: replace OS exception (system call) handlers very OS version dependent

hooking 95 hooking getting a hook to run on (OS) operations e.g. creating new files ideal mechanism: OS support less ideal mechanism: change library loading e.g. replace open, fopen, etc. in libraries less ideal mechanism: replace OS exception (system call) handlers very OS version dependent

96

hooking 97 hooking getting a hook to run on (OS) operations e.g. creating new files ideal mechanism: OS support less ideal mechanism: change library loading e.g. replace open, fopen, etc. in libraries less ideal mechanism: replace OS exception (system call) handlers very OS version dependent

changing library loading 98 e.g. install new library or edit loader, but not everything uses library functions what if your wrapper doesn t work exactly the same?

hooking 99 hooking getting a hook to run on (OS) operations e.g. creating new files ideal mechanism: OS support less ideal mechanism: change library loading e.g. replace open, fopen, etc. in libraries less ideal mechanism: replace OS exception (system call) handlers very OS version dependent

attacking antivirus (2) 100 just directly modify it example: IDEA.6155 modifies database of scanned files preserve checksums example: HybrisF preserved CRC32 checksums of infected files some AV software won t scan again

armored viruses 101 encrypted viruses not strong encryption key is there! self-changing viruses: encrypted oligiomorphic polymorphic metamorphic breaking debuggers, antivirus

residence 102 our model of malware runs when triggered reality: sometimes keep on running evade active detection spread to new programs/files as created/run

real signatures: ClamAV 103 ClamAV: open source email scanning software signature types: hash of file hash of contents of segment of executable built-in executable, archive file parser fixed string basic regular expressions wildcards, character classes, alternatives more complete regular expressions including features that need more than state machines meta-signatures: match if other signatures match icon image fuzzy-matching