RAPID RECOGNITION OF BLACKLISTED FILES AND FRAGMENTS. MICHAEL MCCARRIN, BRUCE ALLEN


MANY THANKS TO: OSDFCon and Basis, Bruce Allen, Scott Young, Joel Young, and Simson Garfinkel, all of whom have helped with this project immensely.

PROBLEM STATEMENT: GIVEN A LIST OF INTERESTING FILES, FIND ALL DEVICES THAT CONTAIN AT LEAST ONE.

A COMMON SOLUTION: HASH EVERYTHING. COMPARE HASHES. 1. Keep a database of the hashes of your interesting files. 2. Extract files from each new device you see: overt, undeleted, and carved. 3. Hash the extracted files. Look for matches against the database. 4. Match (yes/no)? Solved!
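The whole-file workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the talk: the function names, the directory of already-extracted files, and the set of known hashes are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_files(extracted_dir, known_hashes):
    """Return paths under extracted_dir whose MD5 appears in known_hashes.

    extracted_dir is a hypothetical folder holding files already pulled
    from a device (overt, undeleted, or carved).
    """
    matches = []
    for path in sorted(Path(extracted_dir).rglob("*")):
        if path.is_file() and md5_of_file(path) in known_hashes:
            matches.append(path)
    return matches
```

A device would count as a hit whenever `find_known_files` returns a non-empty list. The match is all-or-nothing per file, which is the weakness the following slides describe.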

THERE ARE MANY COMMON SCENARIOS WHERE FILE HASHING DOESN'T WORK. Comparing cryptographic hashes only finds exact matches. Flip a bit, the hash changes completely. Everybody knows this already. Usually it comes up when people are worrying about mildly clever adversaries. But there are many more mundane reasons to worry.
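The "flip a bit" behavior is easy to reproduce. The sketch below uses made-up stand-in content rather than any file from the talk, and simply compares the MD5 digests before and after changing one bit.

```python
import hashlib

original = bytes(range(256)) * 4          # 1 KiB of stand-in file content
modified = bytearray(original)
modified[100] ^= 0x01                     # flip the lowest bit of one byte

digest_before = hashlib.md5(original).hexdigest()
digest_after = hashlib.md5(bytes(modified)).hexdigest()

# Count hex digits that happen to agree at the same position.
same_positions = sum(a == b for a, b in zip(digest_before, digest_after))
print(digest_before)
print(digest_after)
print(f"{same_positions} of 32 hex digits agree")
```

The two digests share no useful structure, so a whole-file hash cannot tell "one bit changed" apart from "completely different file".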

EXAMPLE 1: EXCEL CHANGES FILE CONTENT WHEN YOU SAVE. How to get a new MD5 for your Excel file: Open your file. Click save. Complete!

HERE'S THE HEX VIEW OF THE CHANGES. [Figure: hex views before saving and after saving; content is shifted from the marked point to the end (~1.5K).]

EXAMPLE 2: POWERPOINT EXHIBITS SIMILAR BEHAVIOR. Not as many changes, but one byte is enough to get a new MD5. Note that the user content is 100% unchanged!

DIFF OF POWERPOINT HEX DUMPS BEFORE AND AFTER SAVING:

EXAMPLE 3: TWO IDENTICAL PDFS CREATED BY PDFLATEX. MD5 (PLAN.PDF) = 0E2CDDFF785730F97C93E087A270250A. MD5 (PLAN.PDF) = 04206496CA24F93A75C79186572E8670. Small, fixed length. Infinitesimal false positive rate.

AGAIN, HERE IS THE HEX VIEW. IF YOU LOOK CLOSELY YOU CAN SEE THE TIMESTAMP HAS CHANGED. (Figure annotation: FILE ID?)

CONCLUSION: MANY APPLICATIONS CHANGE FILE-LAYER CONTENT. The point is not that your investigation relies on Excel files or finding PowerPoint presentations. The point is: applications don't care about forensic integrity. The changes above are transparent to the user.

COROLLARY: WE MAY BE MISSING EVIDENCE. Other things you won't find with file hashes: Files that are deleted and partially overwritten. Slightly edited files. Corrupted files in overt space. Badly carved files. Fragments of files in unallocated space.

WORSE: THESE ARE CASES WHERE THE SUSPECT IS NOT EVEN TRYING. At most, they require someone to click save or empty the recycling bin. The state produced by default behavior is much more concerning than any single clever person. There are many reasons to improve this process, but the most compelling is economic: solving the problem should cost less than creating it.

AN ARGUMENT FOR APPROXIMATE MATCHING STRATEGIES: Not a fancy thing: we often just want to find files with the exact same user-level content. Fundamentally, the approach is not that different: 1. Cut the object into pieces. 2. Hash the pieces. 3. Count matching hashes.
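The three steps (cut, hash, count) can be sketched as follows. This is an illustrative sketch only: the 512-byte piece size and MD5 are taken from later slides, while the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 512  # assumed piece size, matching the sector size used later


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each complete fixed-size block of data, in order."""
    usable = len(data) - len(data) % block_size
    return [hashlib.md5(data[i:i + block_size]).hexdigest()
            for i in range(0, usable, block_size)]


def count_matching_blocks(target, candidate):
    """Count candidate blocks whose hash also occurs in target."""
    target_set = set(block_hashes(target))
    return sum(h in target_set for h in block_hashes(candidate))
```

For an eight-block file with a single byte edited, seven of the eight blocks still match, where a whole-file hash would report no match at all. Fixed offsets only line up while unchanged content stays block-aligned, so inserted bytes that shift later content by a non-multiple of the block size defeat this simple version.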

THIS PROBLEM HAS MANY KNOWN SOLUTIONS. Numerous sophisticated matching schemes: Rabin fingerprinting, ssdeep, sdhash, frag_find, sector hashing. Lots of good stuff; we try to build on it. A skilled examiner could find everything we mentioned. Many groups have in-house solutions. Why write a new tool?

REASONS FOR A NEW TOOL 1. Nothing on the previous slide will change the economic problem. 2. Make already developed techniques more available. 3. Getting it right is harder than it first seems. - In particular, the feature selection problem has not been solved.

OUR GOALS: TRIVIAL, SCALABLE, COMPREHENSIBLE. Trivial: It has to be easier to discover targets than it is to obscure them. Scalable: Should be able to search for millions of files at once. Comprehensible: operation of tool and interpretation of results must be intuitive.

SECTORSCOPE DEMO. Target: the 3 files shown above. Media: a modified disk image from the M57-Patents scenario (exFAT). Files were added to a subfolder on the image, then deleted. The subfolder was deleted, then replaced, so it is not trivially easy to recover through Autopsy / TSK.

STEP 1: CREATE A BLACKLIST. We store our blacklist in hashdb: a fast, lightweight key-value store we designed for this project. Blacklist files are hashed in 512-byte chunks and imported into the database using the hashdb bulk_extractor scanner. (Pipeline: 512 B blocks, bulk_extractor, MD5, hashdb.) At the moment, this requires a command line.
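To make the data model concrete, here is a rough in-memory stand-in for the blacklist. It does not use hashdb or bulk_extractor and makes no claim about their formats; `build_blacklist` and its dict-of-bytes input are hypothetical names chosen for the sketch.

```python
import hashlib
from collections import defaultdict

SECTOR = 512  # chunk size stated on the slide


def build_blacklist(files):
    """Map each 512-byte block MD5 to the (name, offset) pairs where it occurs.

    `files` is a dict of name -> bytes; a real workflow would read from disk
    and store the result in a database rather than in memory.
    """
    blacklist = defaultdict(list)
    for name, data in files.items():
        # Only complete blocks are hashed; a short tail is skipped.
        for offset in range(0, len(data) - SECTOR + 1, SECTOR):
            digest = hashlib.md5(data[offset:offset + SECTOR]).digest()
            blacklist[digest].append((name, offset))
    return blacklist
```

Keeping the source name and offset with each hash is what later lets a match on the media be traced back to a specific place in a specific blacklist file.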

STEP 2: SCAN MEDIA. Could do this at the command line. Let's use Autopsy. Install SectorScope and the SectorScope plugin. Add your disk image to the case. Run the SectorScope ingest module.
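Conceptually, the scan reads the media one sector at a time and looks each sector hash up in the blacklist. The sketch below shows that idea in plain Python; it is not the SectorScope ingest module or bulk_extractor, and `scan_image` and its parameters are assumptions for illustration.

```python
import hashlib

SECTOR = 512


def scan_image(image_path, blacklisted_digests, sector=SECTOR):
    """Return byte offsets of sectors in the image whose MD5 is blacklisted.

    blacklisted_digests is any container supporting `in`, for example a set
    of raw MD5 digests built from the blacklist files.
    """
    hits = []
    offset = 0
    with open(image_path, "rb") as image:
        while True:
            block = image.read(sector)
            if len(block) < sector:      # stop at end of image or a short tail
                break
            if hashlib.md5(block).digest() in blacklisted_digests:
                hits.append(offset)
            offset += sector
    return hits
```

A single-threaded loop like this is far slower than a purpose-built scanner, but the output (a list of matching offsets) is the same kind of information the results view is built from.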

STEP 3: ANALYZE THE RESULTS. Open SectorScope from the Reports listing:

RESULTS FROM OUR 3-FILE BLACKLIST. [Figure labels: blacklist matches; media is represented as a single line; sources found.]

PAN AND ZOOM TO NAVIGATE THE MEDIA. The list is sorted by percent of file found. Highlight a file to see where it is. Ignore it to get it off the screen. Select a range to see which sources are there. Adjust the resolution. Open a hex-dump of the data.
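Sorting by percent of file found can be sketched as below. The input shapes and the name `rank_sources` are hypothetical; SectorScope's own computation is not shown in the slides and may differ.

```python
def rank_sources(source_block_counts, matched_blocks):
    """Rank sources by the percentage of their blocks found on the media.

    source_block_counts: name -> number of blocks in the blacklist file
    matched_blocks: name -> set of that file's block offsets seen on the media
    """
    ranking = []
    for name, total in source_block_counts.items():
        found = len(matched_blocks.get(name, set()))
        percent = 100.0 * found / total if total else 0.0
        ranking.append((name, percent))
    # Highest percentage first; ties broken by name for a stable display order.
    return sorted(ranking, key=lambda item: (-item[1], item[0]))
```

Using a set of matched offsets per source means a block that appears many times on the media still counts once toward the percentage.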

LET'S LOOK AT A NON-TRIVIAL EXAMPLE.

THE SCAN RUNS BULK_EXTRACTOR'S HASHDB MODULE TO QUERY THE BLACKLIST. Scanning took 109 seconds against 20,772,713 hashes on a Lenovo ThinkPad with 8 GB of RAM and a dual-core i7-5600U CPU (2.6 GHz). You can beat this speed with faster hardware. Compare to 99 seconds to scan against 2,607 hashes: a blacklist roughly 8,000 times larger takes 10 seconds longer.
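The scaling claim can be checked with the numbers on the slide. The variable names are illustrative; the figures are the ones reported above.

```python
large_hashes, large_seconds = 20_772_713, 109
small_hashes, small_seconds = 2_607, 99

size_ratio = large_hashes / small_hashes    # how much bigger the blacklist is
time_ratio = large_seconds / small_seconds  # how much longer the scan takes

print(f"blacklist is {size_ratio:,.0f}x larger")
print(f"scan time is {time_ratio:.2f}x longer")
```

The blacklist grows by a factor of nearly 8,000 while the scan time grows by about 10 percent, which suggests that lookups against the database are a small part of the total scan cost on this hardware.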

THINGS GET MORE COMPLICATED WITH 20 MILLION HASHES. The hashes were added from govdocs, and not all of the blocks are high entropy. Need to use noise reduction to filter out false positives.
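One simple way to spot low-entropy blocks is to measure Shannon entropy per block. This is a generic heuristic offered as a sketch, not the noise-reduction method used by the tool, and the 6.0 bits-per-byte threshold is an arbitrary assumption.

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Return the Shannon entropy of a block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(block).values())


def is_low_entropy(block, threshold=6.0):
    """Flag blocks below an assumed entropy threshold as likely noise."""
    return shannon_entropy(block) < threshold
```

A sector of all zero bytes scores 0.0 and is flagged, while compressed or encrypted content scores close to 8.0 and passes. Blocks in between (text, tables, padding patterns) are where a fixed threshold becomes a judgment call.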

NOISE REDUCTION IS TRICKY. Down at the level of a 512-byte block, it's not really clear what a match tells us. Some blocks are extremely rare. (2^4096 is a lot of possibilities.) Some blocks show up all over the place. We use an algorithm published in a recent DFRWS paper: Garfinkel, S. L., & McCarrin, M. (2015). Hash-based carving: Searching media for complete files and file fragments with sector hashing and hashdb. Digital Investigation, 14, S95-S105.
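To illustrate the difference between rare blocks and blocks that "show up all over the place", the sketch below keeps only hashes that occur in few blacklist files. It is a simplified stand-in, not the algorithm from the cited paper; `distinctive_hashes` and `max_sources` are hypothetical names.

```python
from collections import Counter


def distinctive_hashes(blocks_by_source, max_sources=1):
    """Keep block hashes that occur in at most max_sources blacklist files.

    blocks_by_source: source name -> iterable of block hashes from that file
    """
    source_counts = Counter()
    for hashes in blocks_by_source.values():
        source_counts.update(set(hashes))   # count each source once per hash
    return {h for h, n in source_counts.items() if n <= max_sources}
```

A hash that appears in only one source is strong evidence for that source when it turns up on the media. A hash shared by many sources (headers, padding, common embedded structures) says little about which file, if any, was present.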

FUTURE WORK: ANNOTATE WITH TSK AND BULK_EXTRACTOR DATA. [Figure: media map with blacklist matches and nulls marked; partitions Part 0 (Unalloc), Part 1 (NTFS), Part 2 (Unalloc); regions labeled overt files and unalloc/unknown; zoom detail showing friends.xls, 555-3233, 555-0999, jo@m57.biz, pat@m57.biz.]

YOU NEED TO KNOW WHAT YOU DON'T KNOW. Few forensics tools will tell you this. How much of the image contains data that you know nothing about at all? Our end goal is to bring available tools together to map out every sector.
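The question "how much of the image do you know nothing about" reduces to interval coverage once each tool reports the sector ranges it can explain. The sketch below assumes such ranges are available as (start, end) pairs; that input format and the function name are hypothetical, since this is described as future work.

```python
def unknown_fraction(total_sectors, known_ranges):
    """Return the fraction of sectors not covered by any known range.

    known_ranges: iterable of (start, end) sector ranges, end exclusive;
    ranges may overlap and arrive in any order.
    """
    if total_sectors <= 0:
        raise ValueError("total_sectors must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(known_ranges):
        start, end = max(start, 0), min(end, total_sectors)
        if start >= end:
            continue                      # empty or out-of-bounds range
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)   # merge overlapping ranges
    if current_end is not None:
        covered += current_end - current_start
    return 1.0 - covered / total_sectors
```

Merging overlaps before counting matters because several tools may annotate the same sectors (for example, a file system parser and a blacklist match), and those sectors should only be counted as known once.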

Questions? Contact: Michael McCarrin, mrmccarr@nps.edu; Bruce Allen, bdallen@nps.edu. SectorScope on GitHub: https://github.com/nps-deep/nps-sectorscope