Random Sampling applied to Rapid Disk Analysis

System & Network Engineering Research Project
Nicolas Canceill
July 4, 2013

1 Rapid Disk Analysis
2 The Math
3 The Aftermath
4 Conclusions

Introduction
  Background
    Assoc. Prof. S. Garfinkel, Naval Postgraduate School
    Advanced Forensics Format
    The Sleuth Kit
    Better analysis for digital evidence
    Searching a 1TB hard drive in 10 minutes (ACM 2013)
  Research
    E. van Eijk, Z. Geradts, Nederlands Forensisch Instituut
    Stability? Scalability? Precision?

Rapid Analysis: Why?
  Traditionally: investigation was leisurely
    Reading a 1TB hard drive: about 3.5h
    The cost of seek: 1 36GB 100, KiB
  New challenges
    Large installations: computer rooms, datacenters...
    Forensic control at checkpoints: border crossings, airports...
    The bomb will go off in the next hour!

Rapid Analysis: What for?
  Profit
    Indications
  Data analysis
    Determine free/wiped space
    Characterize data based on signatures
    Hash sectors to look for specific data

Rapid Analysis: How?
  Data characteristics
    Described (header/trailer)
    Encoded/formatted
    Sectorized and distributed
  Analysis strategies
    Simplify: hashing
    Tolerate: extract signature
    Reduce: random sampling

Research scope
  Research question
    How can random sampling help forensically investigate hard disk drives?
      What kind of indications may be provided?
      Which parameters are in play?
      Which degree of certainty may be achieved?

Analysis process
  Built on top of S. Garfinkel's frag_find tool
  Input
    Image file to search
    Data-set/Signatures-set to look for
    Parameters: hashing, sampling, tolerance
  Process
    Build Bloom filter (hashing)
    Select sample
    For each block in sample: filter (and compare)
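The three process steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the frag_find implementation: the function names, the 512-byte block size, and the use of SHA-1 are hypothetical choices, and a plain Python set stands in for the Bloom filter.

```python
import hashlib
import random

BLOCK_SIZE = 512  # assumed block size in bytes (not taken from the slides)


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set:
    """Hash every full block of the data set to look for."""
    return {
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(0, len(data) - block_size + 1, block_size)
    }


def sample_search(image: bytes, target: bytes, fraction: float,
                  block_size: int = BLOCK_SIZE, seed=None) -> list:
    """Return the offsets of sampled image blocks that match a target block."""
    # Step 1: build the filter (a set is used here instead of a Bloom filter).
    known = block_hashes(target, block_size)
    # Step 2: select a random sample of block indices, without replacement.
    n_blocks = len(image) // block_size
    n_sample = round(n_blocks * fraction)
    sampled = random.Random(seed).sample(range(n_blocks), n_sample)
    # Step 3: filter each sampled block against the known hashes.
    hits = []
    for b in sorted(sampled):
        block = image[b * block_size:(b + 1) * block_size]
        if hashlib.sha1(block).digest() in known:
            hits.append(b * block_size)
    return hits
```

With `fraction=1.0` every block is read and the search degenerates to a full scan; smaller fractions trade certainty for time.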

Random sampling: Basic model
  Using a random sample of a statistical population to estimate/predict characteristics
  Simple scenario
    Is this hard drive empty/wiped?
    M non-empty blocks out of N
    n sampled blocks out of N
  Error rate
    The probability to sample only empty blocks:
    $E = \prod_{i=1}^{n} \frac{N - (i-1) - M}{N - (i-1)}$
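As a worked illustration of the basic model, the error rate can be evaluated numerically. The sketch below reads the slide's formula as a product of (N - (i-1) - M) / (N - (i-1)) for i from 1 to n, with M taken as the number of non-empty blocks; the function name is hypothetical.

```python
def miss_probability(N: int, M: int, n: int) -> float:
    """Probability that n blocks sampled without replacement from N
    are all empty, when M of the N blocks are non-empty (assumed reading)."""
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("need 0 <= M <= N and 0 <= n <= N")
    p = 1.0
    for i in range(1, n + 1):
        remaining = N - (i - 1)   # blocks not yet sampled
        misses = remaining - M    # of those, blocks that hold no data
        if misses <= 0:
            return 0.0            # more samples than empty blocks: a hit is certain
        p *= misses / remaining
    return p
```

For example, `miss_probability(10, 5, 2)` multiplies 5/10 by 4/9: the first sample misses with probability one half, and the second has one fewer empty block left to land on.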

Random sampling: Data layout
  Data is sectorized: (figure)
  Data is not always aligned: (figure)

Random sampling: Advanced model
  A more realistic scenario
    Does this hard drive contain the target block?
    All possible offsets: overlap transactions by $B - F$
    All possible transactions: $N = \frac{C}{T - (B - F)}$
    All target transactions: $M = \frac{D}{T}$
  Error rate
    The probability to miss all target blocks:
    $E = \prod_{i=1}^{n} \frac{\frac{C}{T - (B - F)} - (i-1) - \frac{D}{T}}{\frac{C}{T - (B - F)} - (i-1)}$
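The advanced model can be evaluated the same way as the basic one. The sketch below rests on an interpretation of the transcribed slide: N = C / (T - (B - F)) possible transactions, M = D / T target transactions, and an error rate equal to the product of (N - (i-1) - M) / (N - (i-1)) for i from 1 to n. The operators were lost in transcription and the slide does not define its symbols, so both the formula and the parameter descriptions in the comments are assumptions.

```python
def advanced_miss_probability(C: float, T: float, B: float, F: float,
                              D: float, n: int) -> float:
    """Probability of missing every target transaction in n samples.

    Assumed meanings (not stated on the slide): C image size,
    T transaction size, B block size, F offset step, D target data length.
    """
    stride = T - (B - F)          # transactions overlap by B - F
    if stride <= 0:
        raise ValueError("overlap must be smaller than the transaction size")
    N = C / stride                # all possible transactions
    M = D / T                     # all target transactions
    p = 1.0
    for i in range(1, n + 1):
        remaining = N - (i - 1)
        misses = remaining - M
        if remaining <= 0 or misses <= 0:
            return 0.0
        p *= misses / remaining
    return p
```

When B equals F there is no overlap, and with T = 1 the expression reduces to the basic model with N = C and M = D.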

Experimental protocol
  Experimental image set
    Parameters: image size, sector size, % of empty sectors, length of target data, offset size
    Input: Random files and NSRL Reference DataSet
  Experimental process
    Parameters: image size, sector size, transaction size, sampling fraction
    Randomly select a master file signature
    Generate several images (length of target data, % of empty sectors)
    Successively run several timed searches
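A test image with the listed parameters could be generated along the following lines. This is a hypothetical sketch, not the author's scripts: `make_image` and its arguments are illustrative names, and pseudo-random bytes stand in for the random files and the NSRL data.

```python
import random


def make_image(n_sectors: int, sector_size: int, empty_fraction: float,
               target: bytes, target_offset: int, seed=None) -> bytes:
    """Build an image with a given share of zeroed sectors and the target
    data written at target_offset (which need not be sector-aligned)."""
    if target_offset < 0 or target_offset + len(target) > n_sectors * sector_size:
        raise ValueError("target does not fit in the image")
    rng = random.Random(seed)
    n_empty = round(n_sectors * empty_fraction)
    empty_ids = set(rng.sample(range(n_sectors), n_empty))
    image = bytearray(n_sectors * sector_size)  # starts out all zeros
    for s in range(n_sectors):
        if s not in empty_ids:
            start = s * sector_size
            # Fill non-empty sectors with pseudo-random filler bytes.
            image[start:start + sector_size] = bytes(
                rng.getrandbits(8) for _ in range(sector_size)
            )
    # Write the target last so it overrides whatever was at that position.
    image[target_offset:target_offset + len(target)] = target
    return bytes(image)
```

Fixing `seed` makes a run repeatable, which helps when several timed searches are compared on the same image.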

Results: statistical distribution
  (Plot. Axes: presence of target data; number of transactions)

Results: block-to-transaction scaling
  (Plot. Axes: average error variance; sample size in blocks. Legend: transaction size of 2, 4, 8 blocks)

Results: precision scaling
  (Plot. Axes: average error variance; number of transactions. Legend: image size of 2MB, 4MB, 10MB, 20MB)

Results: time scaling
  (Plot. Axes: average search time in seconds; number of sampled blocks. Legend: image size of 200kB, 400kB, 1MB, 2MB, 4MB, 10MB, 20MB, 40MB, 100MB)

Results: time overhead
  (Plot. Axes: average search time in seconds; number of transactions. Legend: image size of 2MB, 4MB, 10MB, 20MB)

Contributions
  Main findings
    Parameters analyzed:
      Image characteristics: image size, sector size, data alignment, size of target data
      Sampling settings: sample size, transaction size, tolerance
    Scalability:
      Sample size scales with time: $S \propto t$
      Error rate scales with time: $E \propto \frac{1}{t}$
  Public material
    Fork of S. Garfinkel's tools on GitHub
    Most of the experimental scripts on Gist

Research answers
  What kind of indications may be provided?
    Presence/absence of target data or signature
  Which parameters are in play?
    Disk and data characteristics
    Sampling parameters
  Which degree of certainty may be achieved?
    Certainty scales well with time
    Insight about target disk will improve certainty
  Random sampling is a powerful, scalable, adaptive technique for fast HDD analysis
  Efficiency relies on suitable sampling settings, and limited insight on target HDD

Further research
  Improving insight of target
    Pre-determine sector size, data alignment
    Look for optimal block-to-transaction ratio
    One step further: pre-sampling
  Automate decision process
    Optimal time spending
    Automatic settings balance
    Simple user-side: time or certainty

Appendix 1: Bloom Filter (a)
  Hash-based filtering technique
  Initialize
    An array of n bits set to zero
    k different hash functions uniformly mapping to [0, n-1]
  Add an element
    Apply functions to compute k integers in [0, n-1]
    Set k corresponding bits to 1
  Query an element
    Apply functions to compute k integers in [0, n-1]
    Check if k corresponding bits are all 1
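The initialize, add and query steps map directly onto a small class. The version below is a minimal sketch, not the filter used by frag_find: deriving the k hash functions from SHA-256 with a counter prefix is an assumption made here for brevity.

```python
import hashlib


class BloomFilter:
    """Bit array of n_bits with k hash functions (illustrative sketch)."""

    def __init__(self, n_bits: int, k: int):
        self.n_bits = n_bits
        self.k = k
        self.bits = bytearray((n_bits + 7) // 8)  # all bits start at zero

    def _positions(self, item: bytes):
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A query can return a false positive when other elements happen to set the same k bits, but never a false negative, which is why a positive answer is followed by a comparison step in the analysis process.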

Appendix 1: Bloom Filter (b)
  (Plot. Axes: average error variance; number of transactions. Legend: Bloom filter size, legible entries 8 bits and 32 bits)

Appendix 1: Bloom Filter (c)
  (Plot. Axes: average building and search time in seconds; number of transactions. Legend: Bloom filter size, legible entries 16 bits, 30 bits and 32 bits)

Appendix 2: Data layout (a)
  Optimal transaction size depends on sector size
  Best case: (figure)
  Worst case: (figure)

Appendix 2: Data layout (b)
  Optimal transaction size depends on data layout


More information

Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete

Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete 1 DDN Who We Are 2 We Design, Deploy and Optimize Storage Systems Which Solve HPC, Big Data and Cloud Business

More information

Quiz for Chapter 6 Storage and Other I/O Topics 3.10

Quiz for Chapter 6 Storage and Other I/O Topics 3.10 Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: 1. [6 points] Give a concise answer to each of the following

More information

Characterizing Storage Resources Performance in Accessing the SDSS Dataset Ioan Raicu Date:

Characterizing Storage Resources Performance in Accessing the SDSS Dataset Ioan Raicu Date: Characterizing Storage Resources Performance in Accessing the SDSS Dataset Ioan Raicu Date: 8-17-5 Table of Contents Table of Contents...1 Table of Figures...1 1 Overview...4 2 Experiment Description...4

More information

CS 31: Intro to Systems Virtual Memory. Kevin Webb Swarthmore College November 15, 2018

CS 31: Intro to Systems Virtual Memory. Kevin Webb Swarthmore College November 15, 2018 CS 31: Intro to Systems Virtual Memory Kevin Webb Swarthmore College November 15, 2018 Reading Quiz Memory Abstraction goal: make every process think it has the same memory layout. MUCH simpler for compiler

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space Today CSCI 5105 Coda GFS PAST Instructor: Abhishek Chandra 2 Coda Main Goals: Availability: Work in the presence of disconnection Scalability: Support large number of users Successor of Andrew File System

More information

Design of Flash-Based DBMS: An In-Page Logging Approach

Design of Flash-Based DBMS: An In-Page Logging Approach SIGMOD 07 Design of Flash-Based DBMS: An In-Page Logging Approach Sang-Won Lee School of Info & Comm Eng Sungkyunkwan University Suwon,, Korea 440-746 wonlee@ece.skku.ac.kr Bongki Moon Department of Computer

More information

File Directories Associated with any file management system and collection of files is a file directories The directory contains information about

File Directories Associated with any file management system and collection of files is a file directories The directory contains information about 1 File Management 2 File Directories Associated with any file management system and collection of files is a file directories The directory contains information about the files, including attributes, location

More information

Computer Organization (Autonomous)

Computer Organization (Autonomous) 2-7-27 Computer Organization (Autonomous) UNIT IV Sections - A & D SYLLABUS The Memory System: Memory Hierarchy, - RAM and ROM Chips, Memory Address Maps, Memory Connection to, Auxiliary Magnetic Disks,

More information

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Using Synology SSD Technology to Enhance System Performance Synology Inc. Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_WP_ 20121112 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD

More information

Source: https://articles.forensicfocus.com/2018/03/02/evidence-acquisition-using-accessdata-ftk-imager/

Source: https://articles.forensicfocus.com/2018/03/02/evidence-acquisition-using-accessdata-ftk-imager/ by Chirath De Alwis Source: https://articles.forensicfocus.com/2018/03/02/evidence-acquisition-using-accessdata-ftk-imager/ Forensic Toolkit or FTK is a computer forensics software product made by AccessData.

More information

2.0. Technical Release Notes

2.0. Technical Release Notes LEAPFROG WORKS LEAPFROG WORKS 2.0 Technical Release Notes This document outlines the features available in the version 2.0 release of Leapfrog Works. Contact your Leapfrog support team to arrange access

More information

Computer Forensics: Investigating Data and Image Files, 2nd Edition. Chapter 3 Forensic Investigations Using EnCase

Computer Forensics: Investigating Data and Image Files, 2nd Edition. Chapter 3 Forensic Investigations Using EnCase Computer Forensics: Investigating Data and Image Files, 2nd Edition Chapter 3 Forensic Investigations Using EnCase Objectives After completing this chapter, you should be able to: Understand evidence files

More information

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE WHITEPAPER DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE A Detailed Review ABSTRACT While tape has been the dominant storage medium for data protection for decades because of its low cost, it is steadily

More information

NAVAL POSTGRADUATE SCHOOL THESIS

NAVAL POSTGRADUATE SCHOOL THESIS NAVAL POSTGRADUATE SCHOOL MONTEREY, CALIFORNIA THESIS USING DISTINCT SECTORS IN MEDIA SAMPLING AND FULL MEDIA ANALYSIS TO DETECT PRESENCE OF DOCUMENTS FROM A CORPUS by Kristina Foster September 2012 Thesis

More information

Multi-version concurrency control

Multi-version concurrency control Spanner Storage insights 2P & CC = strict serialization Provides semantics as if only one transaction was running on DB at time, in serial order + Real-time guarantees CS 518: Advanced Computer Systems

More information

Evaluation of Performance of Cooperative Web Caching with Web Polygraph

Evaluation of Performance of Cooperative Web Caching with Web Polygraph Evaluation of Performance of Cooperative Web Caching with Web Polygraph Ping Du Jaspal Subhlok Department of Computer Science University of Houston Houston, TX 77204 {pdu, jaspal}@uh.edu Abstract This

More information

DATABASE COMPRESSION. Pooja Nilangekar [ ] Rohit Agrawal [ ] : Advanced Database Systems

DATABASE COMPRESSION. Pooja Nilangekar [ ] Rohit Agrawal [ ] : Advanced Database Systems DATABASE COMPRESSION Pooja Nilangekar [ poojan@cmu.edu ] Rohit Agrawal [ rohit10@cmu.edu ] 15721 : Advanced Database Systems PROJECT OBJECTIVE Compressing the DBMS :- Use less space to store cold data

More information

GiST: A Generalized Search Tree for Database Systems

GiST: A Generalized Search Tree for Database Systems GiST: A Generalized Search Tree for Database Systems Joe Hellerstein UC Berkeley jmh - GiST 1/19/96, p 1 Road Map Motivation Intuition on Generalized Search Trees Overview of GiST ADT Example indices:

More information

Tools for Social Networking Infrastructures

Tools for Social Networking Infrastructures Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes

More information