DERRIC. Model-Driven Engineering in Digital Forensics. Jeroen van den Bos

Similar documents
Model-Driven Engineering in Digital Forensics. Jeroen van den Bos with Tijs van der Storm and Leon Aronson

Bytes are read Right to Left, so = 0x3412, = 0x

Rascal: Language Technology for Model-Driven Engineering

Bringing Domain-Specific Languages to Digital Forensics

A Little Language: Little Maintenance?

Domain-Specific Languages for Digital Forensics

Introduction to Programming Using Java (98-388)

CSE 401 Midterm Exam Sample Solution 11/4/11

Dissecting Files. Endianness. So Many Bytes. Big Endian vs. Little Endian. Example Number. The "proper" order of things. Week 6

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

Computing Science 114 Solutions to Midterm Examination Tuesday October 19, In Questions 1 20, Circle EXACTLY ONE choice as the best answer

Force Open: Lightweight Black Box File Repair

Semantic Analysis. Compiler Architecture

Intermediate Code Generation

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CS /534 Compiler Construction University of Massachusetts Lowell

Java Primer 1: Types, Classes and Operators

1 Lexical Considerations

EASY Programming with Rascal

COSC121: Computer Systems: Runtime Stack

Compilers and Code Optimization EDOARDO FUSELLA

Domain-Specific Languages for Program Analysis

Rascal Tutorial. Tijs van der Storm Wednesday, May 23, 12

Grammars and Parsing, second week

Accessing Arbitrary Hierarchical Data

Programming Assignment IV

Today s Topics. Last Time Top-down parsers - predictive parsing, backtracking, recursive descent, LL parsers, relation to S/SL

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Recap: Pointers. int* int& *p &i &*&* ** * * * * IFMP 18, M. Schwerhoff

Computer Forensics: Investigating Data and Image Files, 2nd Edition. Chapter 3 Forensic Investigations Using EnCase

Question: How can we compare two objects of a given class to see if they are the same? Is it legal to do: Rectangle r(0,0,3,3);,, Rectangle s(0,0,4,4)

Programming Assignment IV Due Monday, November 8 (with an automatic extension until Friday, November 12, noon)

Error Handling Syntax-Directed Translation Recursive Descent Parsing

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

BASIC COMPUTATION. public static void main(string [] args) Fundamentals of Computer Science I

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Project 6 Due 11:59:59pm Thu, Dec 10, 2015

CS313D: ADVANCED PROGRAMMING LANGUAGE

Object-oriented Compiler Construction

Lexical Considerations

Intermediate Representations

The SPL Programming Language Reference Manual

Implementing a VSOP Compiler. March 20, 2018

Grammars and Parsing. Paul Klint. Grammars and Parsing

Compilers and computer architecture: Semantic analysis

Objectives. Chapter 4: Control Structures I (Selection) Objectives (cont d.) Control Structures. Control Structures (cont d.) Relational Operators

A little History Domain Specific Languages Examples Tools Benefits A more theoretical View Programming and Modeling The LWES Project Bonus: Best

Lecture Overview Code generation in milestone 2 o Code generation for array indexing o Some rational implementation Over Express Over o Creating

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

Common File Formats. Need a standard to store images Raster data Photos Synthetic renderings. Vector Graphic Illustrations Fonts

Motivation was to facilitate development of systems software, especially OS development.

Project Compiler. CS031 TA Help Session November 28, 2011

This is not yellow. Image Files - Center for Graphics and Geometric Computing, Technion 2

Chapter 11 Introduction to Programming in C

Chapter 4: Control Structures I (Selection) Objectives. Objectives (cont d.) Control Structures. Control Structures (cont d.

CSC 467 Lecture 13-14: Semantic Analysis

Input File Syntax The parser expects the input file to be divided into objects. Each object must start with the declaration:

age = 23 age = age + 1 data types Integers Floating-point numbers Strings Booleans loosely typed age = In my 20s

CS2112 Fall Assignment 4 Parsing and Fault Injection. Due: March 18, 2014 Overview draft due: March 14, 2014

Domain-Specific Languages

Chapter 1: Object-Oriented Programming Using C++

Programming Assignment IV Due Thursday, November 18th, 2010 at 11:59 PM

CSE 401 Final Exam. December 16, 2010

EECS1710. Checklist from last lecture (Sept 9, 2014) " get an EECS account (if you don t have it already) " read sections

AP COMPUTER SCIENCE JAVA CONCEPTS IV: RESERVED WORDS

10/18/18. Outline. Semantic Analysis. Two types of semantic rules. Syntax vs. Semantics. Static Semantics. Static Semantics.

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

ASML Language Reference Manual

Domain-Specific Languages Language Workbenches

Chapter 14 Abstract Classes and Interfaces

An overview about DroidBasic For Android

Chapter 11 Introduction to Programming in C

Java How to Program, 10/e. Copyright by Pearson Education, Inc. All Rights Reserved.

Java How to Program, 10/e. Copyright by Pearson Education, Inc. All Rights Reserved.

CGS 3066: Spring 2015 JavaScript Reference

Chapter 3 Syntax, Errors, and Debugging. Fundamentals of Java

7. Introduction to Denotational Semantics. Oscar Nierstrasz

Crafting a Compiler with C (II) Compiler V. S. Interpreter

2. Reachability in garbage collection is just an approximation of garbage.

Origin Tracking + Text Differencing = Textual Model Differencing

1 Overview. 2 Basic Program Structure. 2.1 Required and Optional Parts of Sketch

Project 1: Scheme Pretty-Printer

Syntax and Grammars 1 / 21

CSE P 501 Compilers. Static Semantics Hal Perkins Winter /22/ Hal Perkins & UW CSE I-1

The Rascal Approach to Code in Prose, Computed Properties, and Language Extension

CSE 504: Compiler Design. Runtime Environments

The PCAT Programming Language Reference Manual

25. Interfaces. Java. Summer 2008 Instructor: Dr. Masoud Yaghini

IOSR Journal of Computer Engineering (IOSRJCE) ISSN: Volume 3, Issue 3 (July-Aug. 2012), PP

Reference Grammar Meta-notation: hfooi means foo is a nonterminal. foo (in bold font) means that foo is a terminal i.e., a token or a part of a token.

TABLES AND HASHING. Chapter 13

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

VARIABLES AND TYPES CITS1001

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

Motivation was to facilitate development of systems software, especially OS development.

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit

CSC 1052 Algorithms & Data Structures II: Interfaces

Lexical Considerations

Algorithms and Programming I. Lecture#12 Spring 2015

Welcome to Python 3. Some history

Transcription:

DERRIC Model-Driven Engineering in Digital Forensics Jeroen van den Bos jeroen@infuse.org

DERRIC Model-Driven Engineering in Digital Forensics Experience: Jeroen van den Bos jeroen@infuse.org 1998: software engineer 2002: architect at NFI 2009: researcher at NFI/CWI Education: BICT CS/MSc SE

Contents How we decided to build a DSL What DSL we built How we built it

How we decided to build a DSL Or: Domain analysis of and our experience in digital forensics

Netherlands Forensic Institute Improve our clients information position through high-quality forensic services

What is digital forensics? From Wikipedia: Digital forensics is a branch of forensic science encompassing the recovery and investigation of material found in digital devices, often in relation to computer crime.

Do we need (custom) software (engineering)? Software: yes, there is no other way to do digital forensics. Custom software: yes, because we have specific requirements. Software engineering: yes, for legal, business and engineering reasons.

What to develop?

RDD Defraser TULP2G Aftertime

Main activities Acquisition Recovery Analysis Securing the data Turning data into information Finding relevant information

One clear bottleneck Data acquisition Hardware and hardware interfaces are slow-moving. Data recovery New platforms, apps and versions emerge daily. Lots of variants due to vendor-specific implementations. Underlying storage approaches are slow-moving. Data analysis Analyses and types of questions are slow-moving.

Critical factors High time pressure Performance/ scalability concerns Last-minute tool modifications Changes required in several places Division of work

Requirements 1. Data structure definitions that are easy to develop and modify. 2. Highest possible runtime performance and scalability. 3. Reuse of changes across applications. 4. Separation forensic investigation and software engineering concerns.

What DSL we built Or: DERRIC, a user s perspective

A DERRIC description 1. Header Name and encoding/ type defaults Format PNG strings ascii size 1 unit byte sign false type integer order lsb0 endian little 2. Sequence Data structure ordering Sequence Signature IHDR (ITXT ICMT)* PLTE? IDAT IDAT* IEND 3. Structures Layout of individual data structures Structures IHDR { l: lengthof(d) size 4; n: IHDR ; d: {... } c: checksum (...) size 4; }

Structures Chunk { length: lengthof(chunkdata) size 4; chunktype: type string size 4; chunkdata: size length; crc: checksum(algorithm="crc32-ieee", fields=chunktype+chunkdata) size 4; } IHDR = Chunk { chunktype: "IHDR"; chunkdata: { width:!0 size 4; height:!0 size 4; bitdepth: 1 2 4 8 16; imagesize: (width*height*bitdepth)/8 size 4; colourtype: 0 2 3 4 6; interlace: 0 1; } }

Applying Derric Each format has one/several descriptions. Code generator uses descriptions: Applies (domain-specific) optimizations/transformations. Runs quickly, so easy to rerun after changes. May output for multiple targets. Runtime application uses components: Recognizes, extracts or ignores files. Implements algorithms and common optimizations.

input to storage device Derric Descriptions Code Generator Format Validators File Carver recovered files input to produces input to produces Excavator/Derric architecture

How we built it Or: Implementing a DSL

Format description (Derric) Storage device (hard drive) Language grammar Derric Generator (Rascal) Excavator recovery tool (Java) Recovery algorithm Report findings (files) implode invoke AST refine Format validator generate

Compilation stages 1. Parse (concrete syntax) 2. Implode (FileFormat AST) 3. Propagate defaults 4. Desugar (x5) 5. Propagate constants 6. Fold constants (5/6 chg? 5) 7. Annotate expressions 8. Transform (Validator AST) 9. Generate code (Java)

Tree transforms 1. Parse (concrete syntax) 2. Implode (FileFormat AST) 3. Propagate defaults 4. Desugar (x5) 5. Propagate constants 6. Fold constants (changed? 5) 7. Annotate expressions 8. Transform (Validator AST) 9. Generate code (Java)

Tree refinements 1. Parse (concrete syntax) 2. Implode (FileFormat AST) 3. Propagate defaults 4. Desugar (x5) 5. Propagate constants 6. Fold constants (changed? 5) 7. Annotate expressions 8. Transform (Validator AST) 9. Generate code (Java)

Tree refinement pattern Input tree. Build data structure with required knowledge. Traverse tree, removing/modifying nodes. Return tree.

private AST performrefinement(ast ast) { } rel[str, str, str, int] env = { }; for (term <- ast.terms) { // build the environment } Term fixterm(term term) { // transform node using environment return term; } return top-down visit(ast) { case t:term(list[field] fields) => fixterm(t) case t:term(str name, list[field] fields) => fixterm(t) }

Refinement examples Remove inheritance Find root, walk back Overwrite/add encountered fields Remove offset()-keyword For each field, determine all preceding fields Replace offset()s with lengthof()s of preceding fields Remove strings Replace each string with a list of byte values obtained from an encode function.

Tree annotations 1. Parse (concrete syntax) 2. Implode (FileFormat AST) 3. Propagate defaults 4. Desugar (x5) 5. Propagate constants 6. Fold constants (changed? 5) 7. Annotate expressions 8. Transform (Validator AST) 9. Generate code (Java)

Structures Chunk { length: lengthof(chunkdata) size 4; chunktype: type string size 4; chunkdata: size length; crc: checksum(algorithm="crc32-ieee", fields=chunktype+chunkdata) size 4; } IHDR = Chunk { chunktype: "IHDR"; chunkdata: { width:!0 size 4; height:!0 size 4; bitdepth: 1 2 4 8 16; imagesize: (width*height*bitdepth)/8 size 4; colourtype: 0 2 3 4 6; interlace: 0 1; } }

Source length: lengthof(chunkdata) size 4 @checkafter(chunkdata) chunkdata: unknown size length @ref(global) crc: checksum size 4 Target length: declare variable l read into l declare variable l2 calculate l2 chunkdata: declare global variable d read into d check lengthof(d) = l2 crc: declare variable c read into c declare variable c2 calculcate c2 check c = c2

Rules of thumb Transform from representation to representation until you reach target For each node, a simple (typically one-line) transformation When too complex, refine current representation A single step at a time Use annotations to guide transformations

General conclusions Don t invest in DSL/MDE unless you have extensive background in the problem domain. Software is hard, metasoftware is harder. Implement DSLs using a large amount of individual steps that are themselves simple. It actually works!

Questions? Jobs @ NFI: http://www.holmes.nl/