The Normalized Compression Distance as a File Fragment Classifier

Similar documents
File Fragment Encoding Classification: An Empirical Approach

Categories of Digital Investigation Analysis Techniques Based On The Computer History Model

File Classification Using Sub-Sequence Kernels

Ranking Algorithms For Digital Forensic String Search Hits

Developing a New Digital Forensics Curriculum

Extracting Hidden Messages in Steganographic Images

Design Tradeoffs for Developing Fragmented Video Carving Tools

An Evaluation Platform for Forensic Memory Acquisition Software

Technical University of Munich - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Automated Identification of Installed Malicious Android Applications

Monitoring Access to Shared Memory-Mapped Files

System for the Proactive, Continuous, and Efficient Collection of Digital Forensic Evidence

OPSWAT Metadefender. Superior Malware Threat Prevention and Analysis

Fast Indexing Strategies for Robust Image Hashes

Android Forensics: Automated Data Collection And Reporting From A Mobile Device

Hash Based Disk Imaging Using AFF4

Statistical Disk Cluster Classification for File Carving

A Study of User Data Integrity During Acquisition of Android Devices

Rapid Forensic Imaging of Large Disks with Sifting Collectors

A Correlation Method for Establishing Provenance of Timestamps in Digital Evidence

CAT Detect: A Tool for Detecting Inconsistency in Computer Activity Timelines

File Upload extension User Manual

Predicting the Types of File Fragments

Unification of Digital Evidence from Disparate Sources

New Look for the My Goals Page

NAS 256 DataSync for Microsoft OneDrive. Backup data from Microsoft OneDrive to your NAS

Uploading a File in the Desire2Learn Content Area

On Criteria for Evaluating Similarity Digest Schemes

PhotoFast MemoriesCable U2. Market leading design and technology

A Functional Reference Model of Passive Network Origin Identification

Honeynets and Digital Forensics

Document Version: 1.0. Purpose: This document provides an overview of IBM Clinical Development v released by the IBM Corporation.

WHITE PAPER WEB CACHE DECEPTION ATTACK. Omer Gil. July

Multidimensional Investigation of Source Port 0 Probing

Breaking the Performance Wall: The Case for Distributed Digital Forensics

Privacy-Preserving Forensics

Extending Datacenter Capacity To Drive ROI

Cracked Character Count Tool pc software download free full ]

Irfan Ahmed, Kyung-Suk Lhee, Hyun-Jung Shin and Man-Pyo Hong

COMMERCIAL CARD SERVICES CARDHOLDER GUIDE. Adding Receipt Images

Features & Functionalities

ScholarOne Manuscripts. Author File Upload Guide

bbc Adobe LiveCycle Content Services Mobile System requirements APPLIES TO Server CONTENTS iphone License information

A Novel Support Vector Machine Approach to High Entropy Data Fragment Classification

Supported File Types

. Click on the File Sharing link to access File Sharing..

Documents & Links. Introduction. Documents. Quick Answer. Uploading a New Document

Features & Functionalities

Introduction to Content

Completing the Administration Form

HTM, HTML, MHT, MHTML Web document Brightspace Learning Environment strips the <title> tag and text within the tag from user created web documents

Characterisation. Digital Preservation Planning: Principles, Examples and the Future with Planets. July 29 th, 2008

National Aeronautics and Space Admin. - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

CollegiateLink Student Leader User Guide

DBit Ersatz-11 PDP-11 Emulator - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

How to submit an assignment to Turnitin full student guide

Evaluating Text Extraction: Developing a Toolkit for Apache Tika

Enolsoft PDF Creator Tutorial

Different File Types and their Use

A Novel Approach of Mining Write-Prints for Authorship Attribution in Forensics

PEERNET File Conversion Center

Can Digital Evidence Endure the Test of Time?

FILE SHARING OVERVIEW HOW DO I ACCESS FILE SHARING? NAVIGATION

The content editor has two view modes: simple mode and advanced mode. Change the view in the upper-right corner of the content editor.

Pearson Higher Education - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Client Website Overview Guide

Genesis Webinar-To-Go Quick Reference Guide

Wealth Management Center Overview Guide

Cisdem PDF Converter OCR Tutorial

The purpose of this document is to explain how to use the Teacher/Principal Evaluation system as a person being evaluated (an evaluatee).

Eee Note Quick Start Guide

Instructions for Using Events

University of Osnabruck - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Send to Binder in Content (KA #849)

JoomDOC Documentation

How to apply: The online application process explained step by step

Introduction. Collecting, Searching and Sorting evidence. File Storage

IKS Service GmbH - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Digital Signs 101. Learning the Content Manager Express (CMX) Web App Enterprise Digital Signage Training. Page! 1 of! 17

Using the Sp13 Text Editor elearning Blackboard Learn 9.1 for Students

Website Overview. Your Disclaimer Here. 1 Website Overview

Exhibitor Area User Guide

Broward County Cultural Division FY2019 GUIDE TO THE ONLINE APPLICATION

FlipBook Creator utility Print any Document to flash flip book. User Documentation

How to Join a Phone Conference

Coastal Connections. Student Leader User Guide

SharePoint Attachment Extractor and Metadata Manager for Microsoft Dynamics CRM. Release Notes

CollegiateLink Student Leader User Guide

Instructions for Web CRD

University of California - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

ALL IN ONE ANDROID NETWORK SCREENS

WHAT'S NEW UVC MOBILE APP VERSION 2.0

Instructor: Craig Duckett. Lecture 03: Tuesday, April 3, 2018 SQL Sorting, Aggregates and Joining Tables

Student Leader and Advisor User Guide shockersync.wichita.edu Updated November 2018

INFORMZ USER GUIDE: The Asset Manager

WebSpace - Creating Content, Pages, And Posts

DOWNLOAD OR READ : WORD AND IMAGE IN ARTHURIAN LITERATURE PDF EBOOK EPUB MOBI

SPring8/SACLA. Scholar One Manuscripts Online System Manual. Submitting Your Manuscript to Scholar One Manuscript. revision 2016/12/9 第 7 版

ELIMINATING ZERO-DAY MALWARE ATTACKS IN DOCUMENTS DO NOT ASSUME OPENING A NORMAL BUSINESS DOCUMENT IS RISK FREE

Publitas e-docconverter API User Guide Publitas e-docconverter API User Guide

Transcription:

DIGITAL FORENSIC RESEARCH CONFERENCE The Normalized Compression Distance as a File Fragment Classifier By Stefan Axelsson Presented At The Digital Forensic Research Conference DFRWS 2010 USA Portland, OR (Aug 2 nd - 4 th ) DFRWS is dedicated to the sharing of knowledge and ideas about digital forensics research. Ever since it organized the first open workshop devoted to digital forensics in 2001, DFRWS continues to bring academics and practitioners together in an informal environment. As a non-profit, volunteer organization, DFRWS sponsors technical working groups, annual conferences and challenges to help drive the direction of research and development. http:/dfrws.org

The Normalized Compression Distance as a File Fragment Classifier Stefan Axelsson, Sweden stefan.axelsson@bth.se

The problem An analyst is often faced with an assortment of file fragments from slack space on e.g. usb-sticks, harddisks etc. Often enough of the header hasn't survived to help identify the type of the file fragment The type is an important piece of information when trying to reconstruct the actual files (or parts of them at least) Thus automatic file fragment type classification is a worthwhile endeavour

The approach Normalised compression distance: Built on the idea that modern compression algorithms are close to the (incomputable) Kolmogorov complexity measure of distance The idea is that by compressing two vectors (A and B) and then the concatenation (AB) then the shorter the compressed concatenation is compared to A (or B) then the closer they are, i.e. the less redundancy there is between them Formally:

Classification Straight old k-nearest-neighbour (for different values of 'k') i.e. with k=10 if three of the closest feature vectors were of type zip and seven were of type exe, then the feature vector under classification is assigned class exe, even if the closest example might have been a zip feature vector We do the full n-valued classification here, i.e. we don't try and classify pairs of files against each other, but classify all file fragments into one of the (twenty-eight) different file types

The corpus In order to promote comparison between approaches we've used the public corpus by Simpson et.al. In particular the 1 million files of different file types It contains 28 different types of files (based on file name endings) pdf, html, jpg, text, doc, xls, ppt, xml, gif, ps, csv, gz, eps, png, swf, pps, sql, java, pptx, docx, ttf, js, pub, bmp, xbm, xlsx, jar, and zip

Experimental data As a rule fourteen 512byte fragments were selected at random from each file (for files that were long enough) 512 was chosen since that's the minimum frag size in common use and also the most conservative choice In total close to 32000 fragments were chosen Since run time is quadratic and in lieu of doing ten-fold cross validation we ran the experiment ten times on ca 3000 blocks each time

Overall results Overall accuracy (random would be 3.5%, i.e. 1/28

Results per file type

Min accuracy per frag type (%)

Partial confusion matrix

The prototype detector 20 fragments that were the best at predicting their brethren were chosen for each file type (counting excluded fragments from the same file) In total 560 fragments This includes examples of file types that do not do that well as the classifier would be blind to these types otherwise Downloaded ten files for each type at random and did a similar experiment as before (but only five runs)

Conclusion Even though the method is very straightforward and simple, simplistic even and easily implemented we still manage to correctly classify quite a few file types with a worthwhile degree of accuracy Given a fairly conservative setting, n-valued detection problem etc. However, even though the method is difficult to compare to other research in the area, others report better figures; so there's room for improvement Our data is available on request