File System Aging: Increasing the Relevance of File System Benchmarks

Published in the Proceedings of the 1997 ACM SIGMETRICS Conference, June 1997.


File System Aging: Increasing the Relevance of File System Benchmarks
Keith A. Smith, Margo I. Seltzer
Harvard University, Division of Engineering and Applied Sciences

[Figure: File System Performance. Read throughput (MB/sec) versus file size (KB, 16 to 16384) for an empty file system and an 80% full file system.]

Problem #1
- Full and empty file systems perform differently.
- Most research uses empty file systems.
- Real-world file systems are never empty.

Don't benchmark empty file systems!

Problem #2
- Just filling a file system isn't enough.
- The history of a file system determines its state.
- Design decisions may affect how state evolves over time.
- Most research uses empty file systems.
- Researchers ignore a large area of design space.

Don't benchmark empty file systems!

Our Solution
- Use a simulated workload to age the file system.

Overview
- Problem
- File system aging
- Creating the workload
- Verifying the workload
- Example
- Conclusions

File System Aging Goals
- Examine state of file system after many months of activity.
- Support different workloads.
- Allow reproducibility.
- Be architecture independent.
- Make easy to use.

File System Aging Method
- Use real file system usage patterns to generate an artificial aging workload.
- Aging workload is a sequence of file create, write, and delete operations.
- Different workloads mimic different usage patterns.
- Reproducibility is provided by reusing the same workload.
- Workload is parameterized in terms of the POSIX interface.
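As an illustration of a workload expressed as create, write, and delete operations against the POSIX interface, here is a minimal replay sketch. The record layout `(op, path, size)`, the `replay` function, and the sample paths are assumptions made for this example, not the authors' tool or trace format.

```python
import os
import tempfile

def replay(workload, root):
    """Apply (op, relative_path, size) records to the tree under root."""
    for op, rel_path, size in workload:
        path = os.path.join(root, rel_path)
        if op == "create":
            # Create an empty file, making parent directories as needed.
            os.makedirs(os.path.dirname(path), exist_ok=True)
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.close(fd)
        elif op == "write":
            # Append `size` bytes of filler; only the size matters for layout.
            fd = os.open(path, os.O_WRONLY | os.O_APPEND)
            try:
                os.write(fd, b"\0" * size)
            finally:
                os.close(fd)
        elif op == "delete":
            os.unlink(path)
        else:
            raise ValueError(f"unknown operation: {op}")

# Hypothetical five-operation workload.
workload = [
    ("create", "home/a.txt", 0),
    ("write", "home/a.txt", 8192),
    ("create", "home/b.txt", 0),
    ("write", "home/b.txt", 4096),
    ("delete", "home/a.txt", 0),
]

with tempfile.TemporaryDirectory() as root:
    replay(workload, root)
    remaining = sorted(os.listdir(os.path.join(root, "home")))
    b_size = os.path.getsize(os.path.join(root, "home", "b.txt"))
```

Because the records name only portable operations, the same list can be replayed on any file system mounted at `root`, which is what makes a run reproducible.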

Source for Aging Workload
- Long-term trace was impractical.
- Data we had available:
  1. UNIX file system snapshots
     - Describe all files on the file system.
     - Daily for one year.
  2. NFS traces
     - All NFS requests to a large file server.
     - Continuous for two weeks.

Generating Aging Workload
1. Start with sequence of snapshots.
2. Populate file system.
   - Create files present in first snapshot.
3. Add inter-day file activity.
   - Compare successive snapshots.
   - Identify created and deleted files.
   - Add corresponding create, write, and delete operations.

Generating Aging Workload
4. Add intra-day file activity.
   - Use NFS traces to model short-lived file activity.
   - Intersperse create, write, and delete operations based on the model.
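Steps 2 and 3 can be sketched as a snapshot diff. This is a simplified illustration, assuming each snapshot is a dictionary mapping a file path to its size in bytes; the function names are hypothetical. It ignores files whose size changes between snapshots, and it does not model the intra-day activity of step 4.

```python
def diff_snapshots(prev, curr):
    """Return the operations that turn snapshot `prev` into snapshot `curr`."""
    ops = []
    # Files present yesterday but missing today were deleted.
    for path in sorted(prev.keys() - curr.keys()):
        ops.append(("delete", path, 0))
    # Files missing yesterday but present today were created and written.
    for path in sorted(curr.keys() - prev.keys()):
        ops.append(("create", path, 0))
        ops.append(("write", path, curr[path]))
    return ops

def build_workload(snapshots):
    """Populate from the first snapshot, then add inter-day activity."""
    ops = []
    # Diffing against an empty snapshot populates the file system (step 2).
    for prev, curr in zip([{}] + snapshots, snapshots):
        ops.extend(diff_snapshots(prev, curr))
    return ops

# Three hypothetical daily snapshots.
snapshots = [
    {"a": 100, "b": 200},
    {"b": 200, "c": 50},
    {"c": 50},
]
workload = build_workload(snapshots)
```

Sorting the paths keeps the generated sequence deterministic, so the same snapshots always yield the same workload.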

Sample Workload
Aging workload:
- Seven months of activity
- 1 GB file system
- ~1.3 million file operations
- Writes 87.3 GB to disk
- Typical run time is 39 hours.

Verifying Workload
- Start with empty file system.
- Age file system using workload.
  - Execute file operations from workload on the test file system.
- Compare file fragmentation on aged file system to last snapshot of file system from which workload was generated.

Verification Metric
Layout Score
- Measures quality of file layout
- Range: 0.0 to 1.0
- Decreases as file fragmentation increases
- Score is the fraction of file system blocks that are contiguous
  - 1.0 => All files are contiguously allocated
  - 0.0 => No contiguous allocation
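A small sketch of how such a score could be computed, assuming each file is given as the list of disk block addresses it occupies, in logical order. A block counts as contiguous when it immediately follows the previous block of the same file; the first block of each file is ignored, as the Fragmentation Metric slide later in the deck states. The function name and the choice to return 1.0 when no block can be scored are assumptions of this example.

```python
def layout_score(files):
    """files: list of per-file block-address lists, in logical block order."""
    contiguous = 0
    counted = 0
    for blocks in files:
        # Compare each block with its predecessor; the first block of a
        # file has no predecessor and is therefore not counted.
        for prev, curr in zip(blocks, blocks[1:]):
            counted += 1
            if curr == prev + 1:
                contiguous += 1
    # Assumption: with nothing to score, report a perfect layout.
    return contiguous / counted if counted else 1.0

# Hypothetical layout: one contiguous file, one with a gap, one single-block file.
files = [[10, 11, 12, 13], [20, 22, 23], [30]]
score = layout_score(files)
```

In this sample, four of the five scored blocks follow their predecessor, so the score is 0.8.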

[Figure: Aging Verification. Layout score (0.0 to 1.0) versus time (0 to 200 days) for the simulated and real file systems.]

Example
- Modification to UNIX file system (FFS)
- Use aging to evaluate performance tradeoffs.

Test Platform
- 200 MHz Pentium Pro
- 32 MB RAM
- PCI bus
- NCR 53c825 SCSI controller
- Fujitsu M2694ES disk
  - 1 GB, 5400 RPM
  - 15 heads, 94 sectors/track (avg.), 1818 cylinders
  - 9.5 ms average seek
- BSD/OS 2.1
- 8 KB file system block size
- maxcontig = 7 blocks (56 KB)

[Figure: Baseline FFS Performance (aged file system). Read and write throughput (MB/sec) versus file size (KB, 16 to 16384); the 96 KB file size is marked.]

[Diagram: The UNIX File System (FFS). The file system is divided into cylinder groups 0 through N. A cylinder group holds data blocks and inode blocks. An inode records size, owner, permission, and block list.]

Cylinder Groups
- Cylinder groups are allocation pools.
- They exploit locality of reference.
- Related data are collocated in same cylinder group.
  - All files in a directory
  - Sequential blocks of a file

File Allocation
- First 12 file data blocks are allocated from same cylinder group as the file's directory.
- The 13th and subsequent blocks are allocated in a different cylinder group.
- All files have a large seek between 12th and 13th block.
- 12 blocks = 96 KB

Solution
NoSwitch file system
- Don't switch cylinder groups after the 12th file block.
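The difference between the two policies can be sketched as a single decision per block. This is an illustration only: `next_group` stands in for whatever cylinder group the real allocator would pick after the switch, the block index is 0-based, and all names are hypothetical.

```python
BLOCK_SIZE_KB = 8          # file system block size on the test platform
SWITCH_AFTER_BLOCKS = 12   # baseline FFS switches groups after this many blocks

def cylinder_group_for_block(block_index, dir_group, next_group, no_switch=False):
    """Return the cylinder group targeted for a file's block (0-based index)."""
    if no_switch or block_index < SWITCH_AFTER_BLOCKS:
        # Stay in the cylinder group of the file's directory.
        return dir_group
    # Baseline only: the 13th and later blocks go to a different group.
    return next_group

# A 14-block (112 KB) file whose directory lives in cylinder group 3.
baseline = [cylinder_group_for_block(i, 3, 7) for i in range(14)]
noswitch = [cylinder_group_for_block(i, 3, 7, no_switch=True) for i in range(14)]
switch_point_kb = SWITCH_AFTER_BLOCKS * BLOCK_SIZE_KB
```

Under the baseline policy the group changes once the file passes `switch_point_kb` (96 KB), which is where the large seek between the 12th and 13th blocks comes from.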

Potential Problem
- Too many large files in one directory would cause cylinder group to run out of space.
- Creates split files.
  - Files in different cylinder group than their directory.
  - Extra seek to get from directory to file.
- But does this happen?
- If so, how does it affect performance?

Evaluation of NoSwitch
- Age two file systems, one that switches cylinder groups, and one that doesn't.
- Compare the resulting file systems:
  - Overall performance
  - Number of split files

[Figure: Performance. File system throughput (MB/sec) versus file size (KB, 16 to 16384) for NoSwitch (read), Baseline (read), NoSwitch (write), and Baseline (write).]

Number of Split Files

                             Baseline   NoSwitch
Number of Files                33,797     33,797
Number of Split Files           4,312      9,155
Percentage of Split Files         13%        27%
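The percentages follow directly from the file counts on the slide. The short check below recomputes them; the variable names are chosen for this example.

```python
total_files = 33797
split_files = {"Baseline": 4312, "NoSwitch": 9155}

# Percentage of files that are split, rounded to a whole percent.
percent_split = {
    name: round(100 * count / total_files)
    for name, count in split_files.items()
}

# How many times more split files NoSwitch produces than the baseline.
ratio = split_files["NoSwitch"] / split_files["Baseline"]
```

NoSwitch roughly doubles the number of split files relative to the baseline, which is the cost side of the trade-off examined next.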

Hot File Benchmark
- Measure performance using files from aging workload.
- Files modified during final 30 days:
  - 92 MB (14.5% of allocated storage)
  - 3,207 files (9.5% of files)
  - 119 files large enough to benefit from NoSwitch
- Two-phase benchmark:
  1. Read entire file set
  2. Overwrite entire file set

Hot File Performance

                          Baseline       NoSwitch
Layout Score                 0.928          0.931
Number of Split Files          327            594
Read Throughput        0.81 MB/sec    0.84 MB/sec
Write Throughput       0.49 MB/sec    0.50 MB/sec

Analysis
- NoSwitch file system improves performance of medium and large files.
- NoSwitch file system increases the number of split files.
- Net effect is a small performance improvement.
- Exact trade-off depends on workload!

Conclusions
- Benchmarking empty file systems is unrealistic.
- Benchmarking empty file systems can be misleading.
- File system aging is a technique for increasing the relevance of file system benchmarking.

Don't benchmark empty file systems!

File System Aging: Increasing the Relevance of File System Benchmarks
Keith A. Smith (keith@eecs.harvard.edu)
Margo I. Seltzer (margo@eecs.harvard.edu)
http://www.eecs.harvard.edu/~keith/sigmetrics97

Fragmentation Metric
- Layout Score measures fragmentation.
- Fraction of blocks that are contiguous
- Ignores first block of a file.
[Figure: Sample file layouts with scores 0.0, 0.5, and 1.0, showing contiguous and non-contiguous blocks.]

Sequential I/O Benchmark
- 32 MB data set
- Uniform file size (16 to 16,384 KB)
- 25 files per directory
- Two phases:
  - Create phase: Create and write all files
  - Read phase: Read all files
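To make the parameters concrete, the sketch below plans the data set for one file size: how many files fill 32 MB and how many directories they need at 25 files each. The function name and the directory and file naming scheme are assumptions for illustration; the actual benchmark's naming is not given in the slides.

```python
import math

DATA_SET_KB = 32 * 1024   # 32 MB data set
FILES_PER_DIR = 25

def plan_data_set(file_size_kb):
    """Return (file count, directory count, relative paths) for one file size."""
    n_files = DATA_SET_KB // file_size_kb
    n_dirs = math.ceil(n_files / FILES_PER_DIR)
    # Hypothetical naming: fill each directory with 25 files before moving on.
    paths = [
        f"dir{i // FILES_PER_DIR:03d}/file{i % FILES_PER_DIR:02d}"
        for i in range(n_files)
    ]
    return n_files, n_dirs, paths

small_files, small_dirs, small_paths = plan_data_set(16)
large_files, large_dirs, large_paths = plan_data_set(16384)
```

At the smallest size the data set is 2,048 files spread over 82 directories; at the largest it is just two files in one directory. The create phase would write every path in the plan and the read phase would read them back.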

[Figure: Comparison (empty). Read throughput (MB/sec) versus file size (KB, 16 to 16384) for Smart Clustering and Dumb Clustering.]

[Figure: Comparison (aged). Throughput (MB/sec) versus file size (KB, 16 to 16384) for Smart Clustering and Dumb Clustering.]

[Figure: Aging Verification. Layout score (0.0 to 1.0) versus file size (KB, 16 to 65536) for the simulated and real file systems.]

[Figure: Performance (empty). File system throughput (MB/sec) versus file size (KB, 16 to 16384) for NoSwitch (read), Baseline (read), NoSwitch (write), and Baseline (write).]

[Figure: Seek Distances in Split Files. Cumulative number of split files versus distance (number of cylinder groups, 0 to 60) for NoSwitch and Baseline.]

Future Work
- Improve aging algorithm.
- Expand to cover more workloads.
- Parameterize for amount of aging or size of file system.