Generating Realistic Impressions for File-System Benchmarking. Nitin Agrawal Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
|
|
- Brett Dennis Campbell
- 6 years ago
- Views:
Transcription
1 Generating Realistic Impressions for File-System Benchmarking Nitin Agrawal Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
2 For better or for worse," benchmarks shape a field " David Patterson 2
3 Inputs to file-system benchmarking Application Disk layout FS logical organization File System Storage device Input: Benchmark workload Postmark, FileBench, Fstress, Bonnie, IOZone, TPCC, etc etc Input: In-memory state Cold cache/warm cache Input: File-System Image Anything goes! 3
4 FS images in past: use what is convenient Typical desktop file system w/ no description (SOSP 05) 5-deep tree, 5 subdirs, 10 8KB files in each (FAST 04) Randomly generated files of several MB (FAST 08) 1000 files in 10 dirs w/ random data (SOSP 03) 188GB and 129GB volumes in Engg dept (OSDI 99) files from /usr/local, size 354MB (SOSP 01) 1641 files, 109 dirs, 13.4 MB total size (OSDI 02)
5 Performance of find operation Relative Time Taken Disk layout & cache state File-system logical organization 5
6 Problem scope Characteristics of file-system images have strong impact on performance We need to incorporate representative file-system images in benchmarking & design How to create representative file-system images? 6
7 Requirements for creating FS images Access to data on file systems and disk layout Properties of file-system metadata [Satyanarayan81, Mullender84, Irlam93, Sienknecht94, Douceur99, Agrawal07] Disk fragmentation [Smith97] More such studies in future? A technique to create file-system images that is Representative: given a set of input distributions Controllable: supply additional user constraints Reproducible: control & report internal parameters Easy to use: for widespread adoption and consensus 7
8 Introducing Impressions Powerful statistical framework to generate file-system images Takes properties of file-system attributes as input Works out underlying statistical details of the image Mounted on a disk partition for real benchmarking Satisfies the four design goals Applying Impressions gives useful insights What is the impact on performance and storage size? How does an application behave on a real FS image? 8
9 Outline Introduction Generating realistic file-system images Applying Impressions: Desktop search Conclusion 9
10 Overview of Impressions Impressions 10
11 Properties of file-system metadata Five-year study of file-system metadata [FAST07] (Agrawal, Bolosky, Douceur, Lorch) Used as exemplar for metadata properties in Impressions 11
12 Features of Impressions Modes of operation for different usages Basic mode: choose default settings for parameters Advanced mode: several individually tunable knobs Thorough statistical machinery ensures accuracy Uses parameterized curve fits Allows arbitrary user constraints Built-in statistical tests for goodness-of-fit Generates namespace, metadata, file content, and disk fragmentation using above techniques 12
13 Creating valid metadata Creating file-system namespace Uses Generative Model proposed earlier [FAST 07] Explains the process of directory tree creation Accurately regenerates distribution of directory size and of directory depth 13
14 Creating namespace Cumulative Fraction of % directories of Probability of parent selection Count(subdirs)+2 Dirs Dirs by namespace by subdir count depth Directories by by Subdirectory Namespace Count Depth D G Dataset D 0.02 Generated G Namespace depth (bin size 1) i Count of subdirectories Directory tree Monte Carlo run Incorporates dirs by depth and dirs by subdir count 14
15 Creating valid metadata Creating file-system namespace Creating files: stepwise process File size, file extension, file depth, parent directory Uses statistical models & analytical approximations 15
16 Example: creating realistic file sizes Contribution to used space File Sizes Lognormal Hybrid Containing file size (bytes, log scale) Pure lognormal distribution no longer good fit Hybrid model: lognormal body, Pareto tail Fits observed data more accurately, used to recreate file sizes in Impressions 16
17 Creating files Files by Files containing by Containing bytes Bytes Fraction of bytes D G 0 8 2K 512K 512M128G File Size (bytes, log scale) S 8 S 9 S 7 S 6 S 5 S 4 S 3 S 2 S 1 i File Size Model Lognormal body, Pareto tail Captures bimodal curve 17
18 Creating files Top extensions Top Extensions by Count count 1 Fraction of files Desired others txt null jpg htm h gif exe dll cpp others txt null jpg htm h gif exe dll cpp Generated S 9 S 8 E 9 E 8 S 7 S 6 E 7 E 6 S 5 E 5 S 4 S 3 S 2 S 1 E 4 E 3 E 2 E 1 i File Extensions Percentile values Top 20 extensions account for 50% of files and bytes 18
19 Creating files S 9 S 8 E 9 E 8 D 9 D 8 S 7 S 6 S 5 E 7 E 6 E 5 D 7 D 6 D 5 S 4 S 3 S 2 S 1 E 4 E 3 E 2 E 1 D 4 D 3 D 2 D 1 Mean Fraction bytes of per files file (log scale) Bytes by namespace depth Files by Files Bytes namespace by by Namespace depth Depth D MB G 768KB KB KB KB i Namespace depth (bin size 1) 1) File Depth Poisson Multiplicative model along with bytes by depth 19
20 Creating files Fraction of files Files by namespace depth i 0 Files by Namespace Depth (with Special Directories) w/ special dirs D G Namespace depth (bin size 1) Parent Dir Inverse Polynomial Satisfies distribution of dirs with file count 20
21 Resolving arbitrary constraints 21
22 Resolving arbitrary constraints Fraction of files OC Original Constrained 8 2K 512K 8M File Size (bytes, log scale) Contrived for sum Constraint: Given count of files & size distribution, ensure Accurate both for the sum and the distribution sum of file sizes matches a desired total file system size 22
23 Resolving arbitrary constraints Arbitrarily specified on file system parameters Variant of NP-complete Subset Sum Problem Approximation algorithm based solution (in paper) Oversampling to get additional sample values Local improvement to iteratively converge to the desired sum by identifying best-fit in current sample While constraints are satisfied, constrained distribution also retains original characteristics 23
24 Interpolation and extrapolation Why don t we just use available data values? Limited to empirical data in input dataset What-if analysis beyond available dataset Efficient to maintain compact curve fits and use interpolation/extrapolation instead of all data Technique: Piecewise interpolation 24
25 Interpolation technique & accuracy 0.14 Piecewise Interpolation Piecewise Interpolation Fraction of bytes Segment Value Interpola9on: Seg File System Size (GB) Segment GB 50 GB 10 GB K 8K 32K 128K File Size Interpolation (75 GB) Each distribution broken down 0.12into segments Fraction of bytes File Size interpolation 75GB Data points within a segment used for curve fit Combine segment interpolations Extrapolated for new curve 0 R I Real 8 2K 512K 128M 32G File Size Interpolated Fraction of bytes 512K 2M M 0 32M 128M 512M 2G 8G 32G 128G File Size extrapolation Extrapolation ( GB GB) R E Real 8 2K 512K 128M 32G 25 File Size
26 File content Files having natural language content Word-popularity model (heavy tailed) Word-length frequency model (for the long tail) Content for other files (mp3, gif, mpeg etc) Impressions generates valid header/footer Uses third-party libraries and software 26
27 Disk layout and fragmentation 27
28 Disk layout and fragmentation Simplistic technique Layout Score for degree of fragmentation [Smith97] Pairs of file create/delete operations till desired layout score is achieved In future more File 1 nuanced ways are File possible 2 Out-of-order file writes, writes with long delays Access to file-system specific interfaces 1 non-contiguous FIPMAP in block Linux, (out XFS_IOC_GETBMAP of 8) All blocks for contiguous XFS File Layout Score = 7/8 File Layout Score = 1 (6/6) Perhaps a tool complementary to Impressions 28
29 Outline Introduction Generating realistic file-system images Applying Impressions: Desktop search Conclusion 29
30 Applying Impressions Case study: desktop search Google desktop for linux (GDL) and Beagle Metrics of interest: Size of index, time to build initial search index Identifying application bugs and policies GDL doesn t index content beyond 10-deep Computing realistic rules of thumb Overhead of metadata replication? 30
31 Impact of file content Index Size Comparison Index Size/FS size Text (1 Word) Text (Model) Binary Beagle GDL File Understanding content has design: significant GDL affect: index around smaller 300% than increase Beagle for in index text files, size for larger both for GDL binary & Beagle files 31
32 Impact of metadata and content Relative Index Size Original Beagle: Index Size TextCache Default Text Image Binary DisDir DisFilter Developer aid: understanding impact of different file system content & different index schemes 32
33 Impact of metadata and content Relative Index Size Original Beagle: Index Size TextCache Beagle Default Text Image Binary DisDir DisFilter Future App Reproducing identical file-system image to compare other apps or ones developed later 33
34 Conclusion Impressions framework for realistic FS images Representative, controllable, reproducible, easy to use Includes almost all file system params of interest Extensible architecture Plug in new statistical constructs, new models for metadata and content generation Powerful utility for file-system benchmarking To be contributed publicly (coming soon) 34
35 Questions? Nitin Agrawal Impressions download (coming soon) i 35
Generating Realistic Impressions for File-System Benchmarking
Generating Realistic Impressions for File-System Benchmarking NITIN AGRAWAL, ANDREA C. ARPACI-DUSSEAU, and REMZI H. ARPACI-DUSSEAU University of Wisconsin-Madison The performance of file systems and related
More informationGenerating Realistic Impressions for File-System Benchmarking
enerating Realistic Impressions for File-System Benchmarking Nitin Agrawal, Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau Department of Computer Sciences, University of Wisconsin-Madison {nitina,
More informationEmulating Goliath Storage Systems with David
Emulating Goliath Storage Systems with David Nitin Agrawal, NEC Labs Leo Arulraj, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau ADSL Lab, UW Madison 1 The Storage Researchers Dilemma Innovate Create
More informationFILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23
FILE SYSTEMS CS124 Operating Systems Winter 2015-2016, Lecture 23 2 Persistent Storage All programs require some form of persistent storage that lasts beyond the lifetime of an individual process Most
More informationZettabyte Reliability with Flexible End-to-end Data Integrity
Zettabyte Reliability with Flexible End-to-end Data Integrity Yupu Zhang, Daniel Myers, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau University of Wisconsin - Madison 5/9/2013 1 Data Corruption Imperfect
More informationFile Systems. Chapter 11, 13 OSPP
File Systems Chapter 11, 13 OSPP What is a File? What is a Directory? Goals of File System Performance Controlled Sharing Convenience: naming Reliability File System Workload File sizes Are most files
More informationYiying Zhang, Leo Prasath Arulraj, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. University of Wisconsin - Madison
Yiying Zhang, Leo Prasath Arulraj, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison 1 Indirection Reference an object with a different name Flexible, simple, and
More informationFile System Aging: Increasing the Relevance of File System Benchmarks
File System Aging: Increasing the Relevance of File System Benchmarks Keith A. Smith Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences File System Performance Read Throughput
More informationWarming up Storage-level Caches with Bonfire
Warming up Storage-level Caches with Bonfire Yiying Zhang Gokul Soundararajan Mark W. Storer Lakshmi N. Bairavasundaram Sethuraman Subbiah Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau 2 Does on-demand
More informationMODERN FILESYSTEM PERFORMANCE IN LOCAL MULTI-DISK STORAGE SPACE CONFIGURATION
INFORMATION SYSTEMS IN MANAGEMENT Information Systems in Management (2014) Vol. 3 (4) 273 283 MODERN FILESYSTEM PERFORMANCE IN LOCAL MULTI-DISK STORAGE SPACE CONFIGURATION MATEUSZ SMOLIŃSKI Institute of
More informationUniversity of Osnabruck - FTP Site Statistics. Top 20 Directories Sorted by Disk Space
University of Osnabruck - FTP Site Statistics Property Value FTP Server ftp.usf.uni-osnabrueck.de Description University of Osnabruck Country Germany Scan Date 17/May/2014 Total Dirs 29 Total Files 92
More informationFile System: Interface and Implmentation
File System: Interface and Implmentation Two Parts Filesystem Interface Interface the user sees Organization of the files as seen by the user Operations defined on files Properties that can be read/modified
More informationDesigning a True Direct-Access File System with DevFS
Designing a True Direct-Access File System with DevFS Sudarsun Kannan, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau University of Wisconsin-Madison Yuangang Wang, Jun Xu, Gopinath Palani Huawei Technologies
More informationCoerced Cache Evic-on and Discreet- Mode Journaling: Dealing with Misbehaving Disks
Coerced Cache Evic-on and Discreet- Mode Journaling: Dealing with Misbehaving Disks Abhishek Rajimwale *, Vijay Chidambaram, Deepak Ramamurthi Andrea Arpaci- Dusseau, Remzi Arpaci- Dusseau * Data Domain
More informationMain Points. File layout Directory layout
File Systems Main Points File layout Directory layout File System Design Constraints For small files: Small blocks for storage efficiency Files used together should be stored together For large files:
More informationPreview. COSC350 System Software, Fall
Preview File System File Name, File Structure, File Types, File Access, File Attributes, File Operation Directories Directory Operations File System Layout Implementing File Contiguous Allocation Linked
More informationGFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures
GFS Overview Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures Interface: non-posix New op: record appends (atomicity matters,
More informationSegmentation with Paging. Review. Segmentation with Page (MULTICS) Segmentation with Page (MULTICS) Segmentation with Page (MULTICS)
Review Segmentation Segmentation Implementation Advantage of Segmentation Protection Sharing Segmentation with Paging Segmentation with Paging Segmentation with Paging Reason for the segmentation with
More informationFile System Internals. Jo, Heeseung
File System Internals Jo, Heeseung Today's Topics File system implementation File descriptor table, File table Virtual file system File system design issues Directory implementation: filename -> metadata
More informationFile Systems. CS170 Fall 2018
File Systems CS170 Fall 2018 Table of Content File interface review File-System Structure File-System Implementation Directory Implementation Allocation Methods of Disk Space Free-Space Management Contiguous
More informationService and Cloud Computing Lecture 10: DFS2 Prof. George Baciu PQ838
COMP4442 Service and Cloud Computing Lecture 10: DFS2 www.comp.polyu.edu.hk/~csgeorge/comp4442 Prof. George Baciu PQ838 csgeorge@comp.polyu.edu.hk 1 Preamble 2 Recall the Cloud Stack Model A B Application
More informationAtari Games - FTP Site Statistics. Top 20 Directories Sorted by Disk Space
Property Value FTP Server ftp.infogrames.net Description Atari Games Country United States Scan Date 02/Apr/2015 Total Dirs 488 Total Files 1,547 Total Data 26.66 GB Top 20 Directories Sorted by Disk Space
More informationFS Consistency & Journaling
FS Consistency & Journaling Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau) Why Is Consistency Challenging? File system may perform several disk writes to serve a single request Caching
More informationREPRESENTATIVE, REPRODUCIBLE, AND PRACTICAL BENCHMARKING OF FILE AND STORAGE SYSTEMS. Nitin Agrawal
REPRESENTATIVE, REPRODUCIBLE, AND PRACTICAL BENCHMARKING OF FILE AND STORAGE SYSTEMS by Nitin Agrawal A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
More informationECE 598 Advanced Operating Systems Lecture 14
ECE 598 Advanced Operating Systems Lecture 14 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 19 March 2015 Announcements Homework #4 posted soon? 1 Filesystems Often a MBR (master
More informationBigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13
Bigtable A Distributed Storage System for Structured Data Presenter: Yunming Zhang Conglong Li References SOCC 2010 Key Note Slides Jeff Dean Google Introduction to Distributed Computing, Winter 2008 University
More informationFile System. Preview. File Name. File Structure. File Types. File Structure. Three essential requirements for long term information storage
Preview File System File System File Name, File Structure, File Types, File Access, File Attributes, File Operation Directories Directory Operations Contiguous Allocation Linked List Allocation Linked
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2018 Lecture 22 File Systems Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Disk Structure Disk can
More informationIntroduction (1 of 2) Performance and Extension of User Space File. Introduction (2 of 2) Introduction - FUSE 4/15/2014
Performance and Extension of User Space File Aditya Raigarhia and Ashish Gehani Stanford and SRI ACM Symposium on Applied Computing (SAC) Sierre, Switzerland, March 22-26, 2010 Introduction (1 of 2) Developing
More informationThe Google File System
October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single
More informationTFS: A Transparent File System for Contributory Storage
TFS: A Transparent File System for Contributory Storage James Cipar, Mark Corner, Emery Berger http://prisms.cs.umass.edu/tcsm University of Massachusetts, Amherst Contributory Applications Users contribute
More informationLong-term Information Storage Must store large amounts of data Information stored must survive the termination of the process using it Multiple proces
File systems 1 Long-term Information Storage Must store large amounts of data Information stored must survive the termination of the process using it Multiple processes must be able to access the information
More informationFile Systems. Kartik Gopalan. Chapter 4 From Tanenbaum s Modern Operating System
File Systems Kartik Gopalan Chapter 4 From Tanenbaum s Modern Operating System 1 What is a File System? File system is the OS component that organizes data on the raw storage device. Data, by itself, is
More informationECE 598 Advanced Operating Systems Lecture 18
ECE 598 Advanced Operating Systems Lecture 18 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 5 April 2016 Homework #7 was posted Project update Announcements 1 More like a 571
More informationCS 318 Principles of Operating Systems
CS 318 Principles of Operating Systems Fall 2018 Lecture 16: Advanced File Systems Ryan Huang Slides adapted from Andrea Arpaci-Dusseau s lecture 11/6/18 CS 318 Lecture 16 Advanced File Systems 2 11/6/18
More informationPERSISTENCE: FSCK, JOURNALING. Shivaram Venkataraman CS 537, Spring 2019
PERSISTENCE: FSCK, JOURNALING Shivaram Venkataraman CS 537, Spring 2019 ADMINISTRIVIA Project 4b: Due today! Project 5: Out by tomorrow Discussion this week: Project 5 AGENDA / LEARNING OUTCOMES How does
More informationTEFS: A Flash File System for Use on Memory Constrained Devices
2016 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE) TEFS: A Flash File for Use on Memory Constrained Devices Wade Penson wpenson@alumni.ubc.ca Scott Fazackerley scott.fazackerley@alumni.ubc.ca
More informationCase study: ext2 FS 1
Case study: ext2 FS 1 The ext2 file system Second Extended Filesystem The main Linux FS before ext3 Evolved from Minix filesystem (via Extended Filesystem ) Features Block size (1024, 2048, and 4096) configured
More informationImproving Confidence in Network Simulations
Improving Confidence in Network Simulations Michele C. Weigle Department of Computer Science Old Dominion University Winter Simulation Conference December 5, 2006 Introduction! Credibility crisis in networking
More informationUnderstanding Your Virtual Workload. Irfan Ahmad CTO CloudPhysics
What s PRESENTATION Your Shape? TITLE GOES 5 HERE Steps to Understanding Your Virtual Workload Irfan Ahmad CTO CloudPhysics SNIA Legal Notice The material contained in this tutorial is copyrighted by the
More informationFile Systems. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, M. George, E. Sirer, R. Van Renesse]
File Systems CS 4410 Operating Systems [R. Agarwal, L. Alvisi, A. Bracy, M. George, E. Sirer, R. Van Renesse] The abstraction stack I/O systems are accessed through a series of layered abstractions Application
More informationTolera'ng File System Mistakes with EnvyFS
Tolera'ng File System Mistakes with EnvyFS Lakshmi N. Bairavasundaram NetApp, Inc. Swaminathan Sundararaman Andrea C. Arpaci Dusseau Remzi H. Arpaci Dusseau University of Wisconsin Madison File Systems
More informationmode uid gid atime ctime mtime size block count reference count direct blocks (12) single indirect double indirect triple indirect mode uid gid atime
Recap: i-nodes Case study: ext FS The ext file system Second Extended Filesystem The main Linux FS before ext Evolved from Minix filesystem (via Extended Filesystem ) Features (4, 48, and 49) configured
More informationCS 550 Operating Systems Spring File System
1 CS 550 Operating Systems Spring 2018 File System 2 OS Abstractions Process: virtualization of CPU Address space: virtualization of memory The above to allow a program to run as if it is in its own private,
More informationVirtual memory Paging
Virtual memory Paging M1 MOSIG Operating System Design Renaud Lachaize Acknowledgments Many ideas and slides in these lectures were inspired by or even borrowed from the work of others: Arnaud Legrand,
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung December 2003 ACM symposium on Operating systems principles Publisher: ACM Nov. 26, 2008 OUTLINE INTRODUCTION DESIGN OVERVIEW
More informationCase study: ext2 FS 1
Case study: ext2 FS 1 The ext2 file system Second Extended Filesystem The main Linux FS before ext3 Evolved from Minix filesystem (via Extended Filesystem ) Features Block size (1024, 2048, and 4096) configured
More informationComputer Systems Laboratory Sungkyunkwan University
File System Internals Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics File system implementation File descriptor table, File table
More informationPebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees
PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees Pandian Raju 1, Rohan Kadekodi 1, Vijay Chidambaram 1,2, Ittai Abraham 2 1 The University of Texas at Austin 2 VMware Research
More informationThere is a general need for long-term and shared data storage: Files meet these requirements The file manager or file system within the OS
Why a file system? Why a file system There is a general need for long-term and shared data storage: need to store large amount of information persistent storage (outlives process and system reboots) concurrent
More informationFile Management By : Kaushik Vaghani
File Management By : Kaushik Vaghani File Concept Access Methods File Types File Operations Directory Structure File-System Structure File Management Directory Implementation (Linear List, Hash Table)
More informationVirtual Allocation: A Scheme for Flexible Storage Allocation
Virtual Allocation: A Scheme for Flexible Storage Allocation Sukwoo Kang, and A. L. Narasimha Reddy Dept. of Electrical Engineering Texas A & M University College Station, Texas, 77843 fswkang, reddyg@ee.tamu.edu
More informationEnosis: Bridging the Semantic Gap between
Enosis: Bridging the Semantic Gap between File-based and Object-based Data Models Anthony Kougkas - akougkas@hawk.iit.edu, Hariharan Devarajan, Xian-He Sun Outline Introduction Background Approach Evaluation
More informationSlacker: Fast Distribution with Lazy Docker Containers. Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, Remzi H.
Slacker: Fast Distribution with Lazy Docker Containers Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau Container Popularity spoon.net Theory and Practice Theory:
More informationA study of practical deduplication
A study of practical deduplication Dutch T. Meyer University of British Columbia Microsoft Research Intern William Bolosky Microsoft Research Why Dutch is Not Here A study of practical deduplication Dutch
More informationCurrent Topics in OS Research. So, what s hot?
Current Topics in OS Research COMP7840 OSDI Current OS Research 0 So, what s hot? Operating systems have been around for a long time in many forms for different types of devices It is normally general
More informationUniversity of Wisconsin-Madison
Evolving RPC for Active Storage Muthian Sivathanu Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin-Madison Architecture of the future Everything is active Cheaper, faster processing
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google SOSP 03, October 19 22, 2003, New York, USA Hyeon-Gyu Lee, and Yeong-Jae Woo Memory & Storage Architecture Lab. School
More informationTowards Efficient, Portable Application-Level Consistency
Towards Efficient, Portable Application-Level Consistency Thanumalayan Sankaranarayana Pillai, Vijay Chidambaram, Joo-Young Hwang, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau 1 File System Crash
More informationFile System Case Studies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
File System Case Studies Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics The Original UNIX File System FFS Ext2 FAT 2 UNIX FS (1)
More informationCS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding
CS6B Handout 34 Autumn 22 November 2 th, 22 Data Compression and Huffman Encoding Handout written by Julie Zelenski. In the early 98s, personal computers had hard disks that were no larger than MB; today,
More informationFilesystem. Disclaimer: some slides are adopted from book authors slides with permission
Filesystem Disclaimer: some slides are adopted from book authors slides with permission 1 Recap Directory A special file contains (inode, filename) mappings Caching Directory cache Accelerate to find inode
More informationFile System Implementation. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
File System Implementation Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Implementing a File System On-disk structures How does file system represent
More informationFile Open, Close, and Flush Performance Issues in HDF5 Scot Breitenfeld John Mainzer Richard Warren 02/19/18
File Open, Close, and Flush Performance Issues in HDF5 Scot Breitenfeld John Mainzer Richard Warren 02/19/18 1 Introduction Historically, the parallel version of the HDF5 library has suffered from performance
More informationViewBox. Integrating Local File Systems with Cloud Storage Services. Yupu Zhang +, Chris Dragga + *, Andrea Arpaci-Dusseau +, Remzi Arpaci-Dusseau +
ViewBox Integrating Local File Systems with Cloud Storage Services Yupu Zhang +, Chris Dragga + *, Andrea Arpaci-Dusseau +, Remzi Arpaci-Dusseau + + University of Wisconsin Madison *NetApp, Inc. 5/16/2014
More informationDistributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Distributed Systems Lec 10: Distributed File Systems GFS Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 1 Distributed File Systems NFS AFS GFS Some themes in these classes: Workload-oriented
More informationGoogle File System. Arun Sundaram Operating Systems
Arun Sundaram Operating Systems 1 Assumptions GFS built with commodity hardware GFS stores a modest number of large files A few million files, each typically 100MB or larger (Multi-GB files are common)
More informationFile Systems: Fundamentals
File Systems: Fundamentals 1 Files! What is a file? Ø A named collection of related information recorded on secondary storage (e.g., disks)! File attributes Ø Name, type, location, size, protection, creator,
More informationIntroduction to MapReduce
Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed
More informationFile System Case Studies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
File System Case Studies Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics The Original UNIX File System FFS Ext2 FAT 2 UNIX FS (1)
More informationUbiquitous and Mobile Computing CS 525M: Virtually Unifying Personal Storage for Fast and Pervasive Data Accesses
Ubiquitous and Mobile Computing CS 525M: Virtually Unifying Personal Storage for Fast and Pervasive Data Accesses Pengfei Tang Computer Science Dept. Worcester Polytechnic Institute (WPI) Introduction:
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in
More informationGFS: The Google File System
GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one
More informationOutline. V Computer Systems Organization II (Honors) (Introductory Operating Systems) Advantages of Multi-level Page Tables
Outline V22.0202-001 Computer Systems Organization II (Honors) (Introductory Operating Systems) Lecture 15 Memory Management (cont d) Virtual Memory March 30, 2005 Announcements Lab 4 due next Monday (April
More information2. PICTURE: Cut and paste from paper
File System Layout 1. QUESTION: What were technology trends enabling this? a. CPU speeds getting faster relative to disk i. QUESTION: What is implication? Can do more work per disk block to make good decisions
More informationFile Systems: Fundamentals
1 Files Fundamental Ontology of File Systems File Systems: Fundamentals What is a file? Ø A named collection of related information recorded on secondary storage (e.g., disks) File attributes Ø Name, type,
More informationVirtual Memory. Kevin Webb Swarthmore College March 8, 2018
irtual Memory Kevin Webb Swarthmore College March 8, 2018 Today s Goals Describe the mechanisms behind address translation. Analyze the performance of address translation alternatives. Explore page replacement
More informationFile Systems. File system interface (logical view) File system implementation (physical view)
File Systems File systems provide long-term information storage Must store large amounts of data Information stored must survive the termination of the process using it Multiple processes must be able
More informationCS 537 Lecture 6 Fast Translation - TLBs
CS 537 Lecture 6 Fast Translation - TLBs Michael Swift 9/26/7 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift Faster with TLBS Questions answered in this lecture: Review
More informationAdvanced Database Systems
Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed
More informationCS November 2017
Bigtable Highly available distributed storage Distributed Systems 18. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account
More informationCopyright 2014 Fusion-io, Inc. All rights reserved.
Snapshots in a Flash with iosnap TM Sriram Subramanian, Swami Sundararaman, Nisha Talagala, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau Presented By: Samer Al-Kiswany Snapshots Overview Point-in-time representation
More informationCLOUD-SCALE FILE SYSTEMS
Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients
More information*-Box (star-box) Towards Reliability and Consistency in Dropbox-like File Synchronization Services
*-Box (star-box) Towards Reliability and Consistency in -like File Synchronization Services Yupu Zhang, Chris Dragga, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau University of Wisconsin - Madison 6/27/2013
More informationFile Systems: Allocation Issues, Naming, and Performance CS 111. Operating Systems Peter Reiher
File Systems: Allocation Issues, Naming, and Performance Operating Systems Peter Reiher Page 1 Outline Allocating and managing file system free space File naming and directories File volumes File system
More informationCS3600 SYSTEMS AND NETWORKS
CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 11: File System Implementation Prof. Alan Mislove (amislove@ccs.neu.edu) File-System Structure File structure Logical storage unit Collection
More informationFile System Internals. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
File System Internals Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics File system implementation File descriptor table, File table
More informationA file system is a clearly-defined method that the computer's operating system uses to store, catalog, and retrieve files.
File Systems A file system is a clearly-defined method that the computer's operating system uses to store, catalog, and retrieve files. Module 11: File-System Interface File Concept Access :Methods Directory
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 25 File Systems Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 FAQ Q 2 Data and Metadata
More informationKVZone and the Search for a Write-Optimized Key-Value Store
KVZone and the Search for a Write-Optimized Key-Value Store Salil Gokhale, Nitin Agrawal, Sean Noonan, Cristian Ungureanu NEC Laboratories America, Princeton, NJ {salil, nitin, sean, cristian}@nec-labs.com
More informationProgramming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines
A programming model in Cloud: MapReduce Programming model and implementation for processing and generating large data sets Users specify a map function to generate a set of intermediate key/value pairs
More informationFile System Implementation
File System Implementation Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3044: Operating Systems, Fall 2016, Jinkyu Jeong (jinkyu@skku.edu) Implementing
More information1 / 23. CS 137: File Systems. General Filesystem Design
1 / 23 CS 137: File Systems General Filesystem Design 2 / 23 Promises Made by Disks (etc.) Promises 1. I am a linear array of fixed-size blocks 1 2. You can access any block fairly quickly, regardless
More informationLocal File Stores. Job of a File Store. Physical Disk Layout CIS657
Local File Stores CIS657 Job of a File Store Recall that the File System is responsible for namespace management, locking, quotas, etc. The File Store s responsbility is to mange the placement of data
More informationFilesystem. Disclaimer: some slides are adopted from book authors slides with permission 1
Filesystem Disclaimer: some slides are adopted from book authors slides with permission 1 Storage Subsystem in Linux OS Inode cache User Applications System call Interface Virtual File System (VFS) Filesystem
More informationFile Directories Associated with any file management system and collection of files is a file directories The directory contains information about
1 File Management 2 File Directories Associated with any file management system and collection of files is a file directories The directory contains information about the files, including attributes, location
More informationEfficient Metadata Management in Cloud Computing
Send Orders for Reprints to reprints@benthamscience.ae The Open Cybernetics & Systemics Journal, 2015, 9, 1485-1489 1485 Efficient Metadata Management in Cloud Computing Open Access Yu Shuchun 1,* and
More informationChapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction
Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.
More informationAnnouncements. Persistence: Log-Structured FS (LFS)
Announcements P4 graded: In Learn@UW; email 537-help@cs if problems P5: Available - File systems Can work on both parts with project partner Watch videos; discussion section Part a : file system checker
More informationIntroduction to OS. File Management. MOS Ch. 4. Mahmoud El-Gayyar. Mahmoud El-Gayyar / Introduction to OS 1
Introduction to OS File Management MOS Ch. 4 Mahmoud El-Gayyar elgayyar@ci.suez.edu.eg Mahmoud El-Gayyar / Introduction to OS 1 File Management Objectives Provide I/O support for a variety of storage device
More information