Breaking the Performance Wall: The Case for Distributed Digital Forensics

Similar documents
Rapid Forensic Imaging of Large Disks with Sifting Collectors

Categories of Digital Investigation Analysis Techniques Based On The Computer History Model

Extracting Hidden Messages in Steganographic Images

File Fragment Encoding Classification: An Empirical Approach

Monitoring Access to Shared Memory-Mapped Files

Unification of Digital Evidence from Disparate Sources

System for the Proactive, Continuous, and Efficient Collection of Digital Forensic Evidence

Better Security Tool Designs: Brainpower, Massive Threading, and Languages

An Evaluation Platform for Forensic Memory Acquisition Software

File Classification Using Sub-Sequence Kernels

Automated Identification of Installed Malicious Android Applications

Developing a New Digital Forensics Curriculum

Design Tradeoffs for Developing Fragmented Video Carving Tools

The Normalized Compression Distance as a File Fragment Classifier

A Novel Approach of Mining Write-Prints for Authorship Attribution in Forensics

A Correlation Method for Establishing Provenance of Timestamps in Digital Evidence

Ranking Algorithms For Digital Forensic String Search Hits

Clustering and Reclustering HEP Data in Object Databases

On Criteria for Evaluating Similarity Digest Schemes

CAT Detect: A Tool for Detecting Inconsistency in Computer Activity Timelines

A Functional Reference Model of Passive Network Origin Identification

Fast Indexing Strategies for Robust Image Hashes

SuperImager TM -Rugged USB Display Touch Screen SAS Drive Slots A Computer Forensic- Field Analysis Platform Unit

A Strategy for Testing Hardware Write Block Devices

SECURE, AUDITED PROCESSING OF DIGITAL EVIDENCE: FILESYSTEM SUPPORT FOR DIGITAL EVIDENCE BAGS

Quantifying FTK 3.0 Performance with Respect to Hardware Selection

Computer Forensics: Investigating Data and Image Files, 2nd Edition. Chapter 3 Forensic Investigations Using EnCase

Android Forensics: Automated Data Collection And Reporting From A Mobile Device

ANALYSIS AND VALIDATION

Information Assurance In A Distributed Forensic Cluster

Honeynets and Digital Forensics

Implementing Software RAID

Forensic Toolkit System Specifications Guide

Virtualizing Agilent OpenLAB CDS EZChrom Edition with VMware

A Study of User Data Integrity During Acquisition of Android Devices

Entuity Network Monitoring and Analytics 10.5 Server Sizing Guide

Privacy-Preserving Forensics

Cisco MCS 7845-H1 Unified CallManager Appliance

Hash Based Disk Imaging Using AFF4

Multidimensional Investigation of Source Port 0 Probing

RAPID RECOGNITION OF BLACKLISTED FILES AND FRAGMENTS MICHAEL MCCARRIN BRUCE ALLEN

ECE 598 Advanced Operating Systems Lecture 14

MMR: A Platform for Large-Scale Forensic Computing

ARCHITECTURE OF MADIS DATA PROCESSING AND DISTRIBUTION AT FSL

Unification of Digital Evidence from Disparate Sources (Digital Evidence Bags)

Microsoft Office SharePoint Server 2007

Segmentation with Paging. Review. Segmentation with Page (MULTICS) Segmentation with Page (MULTICS) Segmentation with Page (MULTICS)

Cisco MCS 7825-I1 Unified CallManager Appliance

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Memory Management Outline. Operating Systems. Motivation. Paging Implementation. Accessing Invalid Pages. Performance of Demand Paging

Securing the Frisbee Multicast Disk Loader

Archival Science, Digital Forensics and New Media Art

A Framework for Attack Patterns Discovery in Honeynet Data

CS 31: Intro to Systems Virtual Memory. Kevin Webb Swarthmore College November 15, 2018

A CLOUD COMPUTING PLATFORM FOR LARGE-SCALE FORENSIC COMPUTING

DIGITAL FORENSIC PROCEDURE. Procedure Name: Mounting an EnCase E01 Logical Image file with FTK Imager. Category: Image Mounting

The Google File System

Lab - Create a Partition in Windows 8


Chapter 11: File System Implementation. Objectives

Chapter 3 - Memory Management

PowerEdge 350 systems contain the following major features:

Modern RAID Technology. RAID Primer A Configuration Guide

Managing Multi-user Windows (Citrix) from a System Manager's View

Emulating Goliath Storage Systems with David

PARALLEL TRAINING OF NEURAL NETWORKS FOR SPEECH RECOGNITION

Technical Brief: Specifying a PC for Mascot

Parallels Virtuozzo Containers 4.6 for Windows

Parallels Containers for Windows 6.0

Enhanced Performance of Database by Automated Self-Tuned Systems

Network+ Guide to Networks, Fourth Edition. Chapter 8 Network Operating Systems and Windows Server 2003-Based Networking

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017

Macrorit Partition Expert 4.3.5

ECE 598 Advanced Operating Systems Lecture 22

Disclaimer of Liability: Redistribution Policy:

Leveraging CybOX to Standardize Representation and Exchange of Digital Forensic Information

Lesson 1: Using Task Manager

The Lion of storage systems

Integrated Ultra320 Smart Array 6i Redundant Array of Independent Disks (RAID) Controller with 64-MB read cache plus 128-MB batterybacked

SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores

NetVault Backup Client and Server Sizing Guide 2.1

Intel Enterprise Edition Lustre (IEEL-2.3) [DNE-1 enabled] on Dell MD Storage

CYB 610 Project 6 Workspace Exercise

Chapter 12: Advanced Operating Systems

NetVault Backup Client and Server Sizing Guide 3.0

Designing Robustness and Resilience in Digital Investigation Laboratories

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

Making a Bootable Linux USB Flash Drive with the Universal USB Installer.

Can Digital Evidence Endure the Test of Time?

File System Implementation. Sunu Wibirama

The Cisco MCS 7835-H2 can run any of the following Cisco applications:

AccessData Advanced Forensics

CSN08101 Digital Forensics. Module Leader: Dr Gordon Russell Lecturers: Robert Ludwiniak

PAC094 Performance Tips for New Features in Workstation 5. Anne Holler Irfan Ahmad Aravind Pavuluri

Introduction. Collecting, Searching and Sorting evidence. File Storage

FILE SYSTEM IMPLEMENTATION. Sunu Wibirama

HP Enterprise Integration and Management of HP ProLiant Servers. Download Full Version :

CIT 668: System Architecture. Amazon Web Services

The DETER Testbed: Overview 25 August 2004

TRW. Agent-Based Adaptive Computing for Ground Stations. Rodney Price, Ph.D. Stephen Dominic TRW Data Technologies Division.

Transcription:

DIGITAL FORENSIC RESEARCH CONFERENCE Breaking the Performance Wall: The Case for Distributed Digital Forensics By Vassil Roussev, Golden Richard Presented At The Digital Forensic Research Conference DFRWS 2004 USA Baltimore, MD (Aug 11 th - 13 th ) DFRWS is dedicated to the sharing of knowledge and ideas about digital forensics research. Ever since it organized the first open workshop devoted to digital forensics in 2001, DFRWS continues to bring academics and practitioners together in an informal environment. As a non-profit, volunteer organization, DFRWS sponsors technical working groups, annual conferences and challenges to help drive the direction of research and development. http:/dfrws.org

Breaking the Performance Wall: The Case for Distributed Digital Forensics Golden G. Richard III Associate Professor, Dept. of Computer Science, University of New Orleans Technical Advisor, Gulf Coast Computer Forensics Laboratory (GCCFL) Co-founder, Digital Forensics Solutions golden@cs.uno.edu Vassil Roussev Assistant Professor, Dept. of Computer Science, University of New Orleans vassil@cs.uno.edu

Single CPU Forensics Investigations 300GB 300GB 2004) 2

Single CPU State of the Art Start right away approach (e.g., Encase) Little preprocessing of evidence Can start working right away Ridiculously slow searches for large disk images Unacceptable performance for larger images Preprocessing-first approach (e.g., FTK) Substantial preprocessing of evidence Can t work on case until after preprocessing completes can take days Have keyword indices, thumbnails, etc. available Ridiculously slow searches for un-indexed things Example: regular expression searches Common trait: Too slow. 2004) 3

Symptoms Machines tied up for days doing preprocessing Painful to think outside the box (i.e., outside the index) during investigation If a regular expression search takes an entire day to complete, what do you do in the meantime? Hint: It requires a $300 video card For large collections of digital evidence, no extra resources to devote to cooler tools 2004) 4

The Culprit: The Entire Machine CPU driving investigation... For simple analysis, OK. But it s holding us back. Hard drives storing digital evidence for investigation Capacity OK. WAY too slow. Case is OK. Memory. WAY too small. 2004) 5

What could we be doing? Summaries for digital video files Extraction of key frames Better image classification Beyond hashing feature identification Searching audio files for voiceprints Generation of searchable text from audio files using speech recognition Automatic detection of steganography Backgrounding digital evidence preprocessing Analysis of evidence during preprocessing phase means investigatory phase can start right away Live searches regular expression searching even on huge drive images uninterrupted brainstorming! What else do you want to do? 2004) 6

The Solution: Distributed Forensics Forensics analysis on a cluster More CPUs == more horsepower for sophisticated processing If you have high performance file storage, e.g., a RAID can load the file server much more effectively More memory == can cache digital evidence for analysis Cache entire disk image in memory of cluster machines 2004) 7

Why don t we already do this? Cluster computing is fairly mature Why so few distributed forensics tools? Field is young? Requires more sophistication on the part of tool builders? Cost? ~$125K for a dedicated machine that can process 200GB images entirely in memory Can be clever and do it for less We re not being elitist. We just can t take it anymore! 2004) 8

Distributed Digital Forensics (DDF) Framework Architecture Worker JOIN / LEAVE CANCEL / EXIT FETCH CACHE LOAD Coordinator HASH / GREP / COMMAND/ DONE / REPORT / ERROR Worker SHUTDOWN STARTUP STORE Local Store Common Store 2004) 9

Requirements For Framework Scalable Want to support at least IMAGE SIZE / RAM_PER_NODE nodes Platform independent Want to be able to incorporate any (reasonable) machine that s available Lightweight Horsepower is for forensics, not the framework less fat Highly interactive Extensible Allow incorporation of existing sequential tools e.g., stegdetect, image thumbnailing, file classification, hashing, Robust Clusters can be notorious Must handle failed nodes smoothly 2004) 10

Goals for Framework Allow investigation to begin immediately after drive image is loaded SIGNIFICANTLY speed up traditional evidence processing Beat the hell out of high performance file servers Cache all evidence in RAM Enable new investigatory techniques N machines greater than N-fold speedup Brainstorming == ON during investigation. No extensive idle time for human allowed 2004) 11

Protocol <id> JOIN <name> <cache_size> <IP> <port> <id> LEAVE <name> <id> EXIT <id> SHUTDOWN <name> <id> STARTUP <name> <id> CACHE <file_name> <file_size> <hash> <reply_port> <file_data> <id> FETCH <file_name> <reply_port> <id> LOAD <files> <reply_port> <id> STORE <files> <reply_port> <id> FREE <files> <reply_port> <id> DONE <id> ERROR <code> <message> <id> REPORT <report> <id> PROGRESS <processed> <all> <id> CANCEL <req_id> 2004) 12

Protocol (2) <id> CLASSIFY <files> <progress> <reply_port> <id> HASH <method> <files> <progress> <reply_port> <id> GREP <expr> <files> <progress> <reply_port> <id> THUMB {<files> <type>} <tdir> <progress> <reply_port> <id> STEGO {<file> <type>} <progress> <reply_port> <id> CRACK <file> <key_range> <progress> <reply_port> <id> EXEC <command> <arguments> <reply_port> 2004) 13

Experimental Setup all your data are belong to us 72 x 2.4GHz P4s 2004) 14

Experimental Setup (2) File Server CPU: 2x1.4GHz Xeon RAM: 2GB SCSI RAID: 504GB 1Gb Switch 96-port, 10/100/1000 Mb 24 Gb Backplane Node CPU: 2.4 GHz Pentium 4 RAM: 1 GB 2004) 15

A Few Preliminary Results Target: Dell Optiplex GX1 w/ 6.4GB IDE drive NTFS, ~110,000 files in ~7,800 directories Imaged using dd w/ a Linux boot disk Machine used for traditional investigation: 3GHz P4, 2GB RAM, 2 x 73GB 15Krpm Ultra320 SCSI FTK v1.43a Initial Operation Time (hh:mm:ss) FTK Add Evidence 1:38:00 CACHE 0:09:36 8-node LOAD 0:03:58 1-node LOAD 0:05:19 more nodes better at loading the fileserver 2004) 16

Results (2) Live string search: Vassil Roussev Regular expression search: v[a-z]*i[a-z]*a[a-z]*g[a-z]*r[a-z]*a Search time: String Expression (mm:ss) Search time: Regular Expression (mm:ss) FTK 08:27 41:50 8-node System 00:27 00:28 2004) 17

A Different Experiment Stego detection using Stegdetect 0.5 under RH9 Linux on the cluster Traditional: 6GB image mounted using loopback device find /mnt/loop exec./stegdetect {} \; 790 seconds == 13:10 minutes Using the distributed framework Stegdetect 0.5 code incorporated into framework Detection against cached files STEGO command (after IMAGE/CACHE) 82 seconds == 1:22 minutes 9.6X faster with 8 machines CPU bound operation 2004) 18

To Do User interface! (unless you love Putty) 2004) 19

To Do (2) Code cleanup Case persistence Better fault tolerance Intelligent caching schemes to support larger images Will swap save us? Collaboration with colleagues (you?) working in: Image analysis/classification Speech recognition Stego Other CPU horsepower-intensive, forensics-applicable stuff 2004) 20

? golden@cs.uno.edu golden@digitalforensicssolutions.com