Passive NFS Tracing of and Research Workloads. Daniel Ellard, Jonathan Ledlie, Pia Malkani, Margo Seltzer FAST April 1, 2003

Similar documents
Proceedings of FAST 03: 2nd USENIX Conference on File and Storage Technologies

SOS Technical Report: Everything You Always Wanted to Know about NFS Trace Analysis, but Were Afraid to Ask

The Utility of File Names

On the Relationship of Server Disk Workloads and Client File Requests

File System Aging: Increasing the Relevance of File System Benchmarks

Due: March 8, 11:59pm. Project 1

New NFS Tracing Tools and Techniques for System Analysis

Network File System (NFS)

Network File System (NFS)

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Finding a needle in Haystack: Facebook's photo storage

Distributed File Systems. Distributed Systems IT332

Chapter 11: Implementing File Systems

File System Internals. Jo, Heeseung

Chapter 12: File System Implementation

Chapter 10: File System Implementation

Distributed File Systems Part II. Distributed File System Implementation

Evaluating SMB2 Performance for Home Directory Workloads

EECS 482 Introduction to Operating Systems

Forget IOPS: A Proper Way to Characterize & Test Storage Performance Peter Murray SwiftTest

OPERATING SYSTEM. Chapter 12: File System Implementation

Attribute-Based Prediction of File Properties

SmartSaver: Turning Flash Drive into a Disk Energy Saver for Mobile Computers

PostMark: A New File System Benchmark

Distributed Systems 16. Distributed File Systems II

Best Practices. Deploying Optim Performance Manager in large scale environments. IBM Optim Performance Manager Extended Edition V4.1.0.

Let s Make Parallel File System More Parallel

What is a file system

Announcements. Reading: Chapter 16 Project #5 Due on Friday at 6:00 PM. CMSC 412 S10 (lect 24) copyright Jeffrey K.

Name: Instructions. Problem 1 : Short answer. [48 points] CMU / Storage Systems 20 April 2011 Spring 2011 Exam 2

CS-537: Final Exam (Fall 2013) The Worst Case

File System Basics. Farmer & Venema. Mississippi State University Digital Forensics 1

OPERATING SYSTEMS II DPL. ING. CIPRIAN PUNGILĂ, PHD.

VFS Interceptor: Dynamically Tracing File System Operations in real. environments

Chapter 12: File System Implementation. Operating System Concepts 9 th Edition

Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017

Chapter 12: File System Implementation

ROP. Research opportunity program applications are due this Friday.

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Deduplication and Incremental Accelleration in Bacula with NetApp Technologies. Peter Buschman EMEA PS Consultant September 25th, 2012

Vnodes. Every open file has an associated vnode struct All file system types (FFS, NFS, etc.) have vnodes

A Comparison of File. D. Roselli, J. R. Lorch, T. E. Anderson Proc USENIX Annual Technical Conference

Chapter 11: Implementing File Systems

Week 12: File System Implementation

Lecture S3: File system data layout, naming

CS 537 Fall 2017 Review Session

Making Storage Smarter Jim Williams Martin K. Petersen

Chapter 11: File System Implementation

Backup types. Excerpted from

Chapter 11: File System Implementation

University of Wisconsin-Madison

File Systems Management and Examples

FILE SYSTEMS, PART 2. CS124 Operating Systems Fall , Lecture 24

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures

A Comprehensive Analytical Performance Model of DRAM Caches

Chapter 11: Implementing File Systems

L7: Performance. Frans Kaashoek Spring 2013

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9b: Distributed File Systems INTRODUCTION. Transparency: Flexibility: Slide 1. Slide 3.

File Systems. Chapter 11, 13 OSPP

CAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon

Measuring the Processing Performance of NetSniff

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

I/O Characterization of Commercial Workloads

Visualization of Internet Traffic Features

Name: Instructions. Problem 1 : Short answer. [48 points] CMU / Storage Systems 20 April 2011 Spring 2011 Exam 2

CSE 120: Principles of Operating Systems. Lecture 10. File Systems. November 6, Prof. Joe Pasquale

Flash vs. Disk Storage: Testing Workloads is Key

Web Mechanisms. Draft: 2/23/13 6:54 PM 2013 Christopher Vickery

Storage validation at GoDaddy Best practices from the world s #1 web hosting provider

NFS version 4 LISA `05. Mike Eisler Network Appliance, Inc.

5 Choosing keywords Initially choosing keywords Frequent and rare keywords Evaluating the competition rates of search

Oracle Database 12c Performance Management and Tuning

Effective Use of CSAIL Storage

NFS Design Goals. Network File System - NFS

This project must be done in groups of 2 3 people. Your group must partner with one other group (or two if we have add odd number of groups).

Introduction. File System Design for an NFS File Server Appliance. Introduction: WAFL. Introduction: NFS Appliance 4/1/2014

Analyzing NFS Client Performance with IOzone

FILE SYSTEMS, PART 2. CS124 Operating Systems Winter , Lecture 24

The What, Why and How of the Pure Storage Enterprise Flash Array. Ethan L. Miller (and a cast of dozens at Pure Storage)

METADATA FRAMEWORK 6.3. Probe Configuration

Distributed Systems - III

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission

SoftNAS Cloud Performance Evaluation on Microsoft Azure

The Google File System

Computer Systems Laboratory Sungkyunkwan University

Storage and File System

File Systems. CS170 Fall 2018

Presented by: Nafiseh Mahmoudi Spring 2017

File Systems. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, M. George, E. Sirer, R. Van Renesse]

CA485 Ray Walshe Google File System

It s Time for Mass Scale VDI Adoption

HYRISE In-Memory Storage Engine

Capriccio: Scalable Threads for Internet Services

Linux Filesystems Ext2, Ext3. Nafisa Kazi

I/O CHARACTERIZATION AND ATTRIBUTE CACHE DATA FOR ELEVEN MEASURED WORKLOADS

Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality-of-Service

A Robust Classifier for Passive TCP/IP Fingerprinting

File Systems. Before We Begin. So Far, We Have Considered. Motivation for File Systems. CSE 120: Principles of Operating Systems.

Chapter 6. File Systems

Lesson 9 Transcript: Backup and Recovery

Transcription:

Passive NFS Tracing of Email and Research Workloads Daniel Ellard, Jonathan Ledlie, Pia Malkani, Margo Seltzer FAST 2003 - April 1, 2003

Talk Outline Motivation Tracing Methodology Trace Summary New Findings Conclusion 2

Motivation - Why Gather Traces? Our research agenda: build file systems that Tune themselves for their workloads Can adapt to diverse workloads Underlying assumptions: There is a significant variation between workloads. There are workload-specific optimizations that we can apply on-the fly. We must test whether these assumptions hold for contemporary workloads. 3

Why Use Passive NFS Traces? Passive: no changes to the server or client Sniff packets from the network Non-invasive trace methods are necessary for real-world data collection NFS is ubiquitous and important Many workloads to trace Analysis is useful to real users Captures exactly what the server sees Matches our research needs 4

Difficulties of Analyzing NFS Traces Underlying file system details are hidden Disk activity File layout The NFS interface is different from a native file system interface No open/close, no seek Client-side caching can skew the operation mix Some NFS calls and responses are lost NFS calls may arrive out-of-order 5

Our Tracing Software Based on tcpdump and libpcap (a packetcapture library) Captures more information than tcpdump Handles RPC over TCP and jumbo frames Anonymizes the traces Very important for real-world data collection Tunable to remove/preserve specific information Open source, freely available 6

Overview of the Traced Systems CAMPUS Central college facility Almost entirely email: SMTP, POP/IMAP, pine No R&D Normal users Digital UNIX 53G of storage (1 of 14 home directory disks) EECS EE/CS facility No email Research: software projects, experiments Research users Network Appliances filer 450G of storage 7

Summary of Average Daily Activity 10/21/2001-10/27/2001 Total Ops Read Ops Write Ops Other R/W Ops CAMPUS 26.7 Million 65% (119.6 GB) 21% (44.6 GB) 14% 3.01 EECS 4.4 Million 10% (5.1 GB) 15% (9.1 GB) 75% - getattr, lookup, access 0.69 8

Workload Characteristics CAMPUS Data-Oriented 95%+ of reads/writes are to large mailboxes For newly created files: 96%+ are zero-length Most of the remainder are < 16k < 1% are write-only EECS Metadata-Oriented Mix of applications, mix of file sizes For newly created files: 5% are zero-length Less than half of the remainder are < 16k 57% are write-only 9

How File Data Blocks Die CAMPUS: 99.1% of the blocks die by overwriting Most blocks live in immortal mailboxes EECS: 42.4% of the blocks die by overwriting 51.8% die because their file is deleted Overwriting is common, and a potential opportunity to relocate/reorganize blocks on disk 10

File Data Block Life Expectancy CAMPUS: More than 50% live longer than 15 minutes EECS: Less than 50% live longer than 1 second Of the rest, only 50% live longer than two minutes Most blocks die in the cache on EECS, but on CAMPUS blocks are more likely to die on disk 11

Talk Outline Motivation Tracing Methodology Trace Summary New Findings Conclusion 12

Variation of Load Over Time EECS: load has detectable patterns CAMPUS: load is quite predictable Busiest 9am-6pm and evenings Monday - Friday Quiet in the late night / early morning Each system has idle times, which could be used for file system tuning or reorganization. Analyses of workload must include time. 13

The Daily Rhythm of CAMPUS 14

New Finding: File Names Predict File Properties For most files, there is a strong relationship between the file name and its properties Many filenames are chosen by applications Applications are predictable The filename suffix is useful by itself, but the entire name is better The relationships between filenames and file properties vary from system to another 15

Name-Based Hints for CAMPUS Files named inbox are large, live forever, are overwritten frequently, and read sequentially. Files with names starting with inbox.lock or ending with the name of the client host are zero-length lock files and live for a fraction of a second. Files with names starting with # are temporary composer files. They always contain data, but are usually short and are deleted after a few minutes. Dot files are read-only, except.history. 16

Name-Based Hints for EECS On EECS the patterns are harder to see. We wrote a program to detect relationships between file names and properties, and make predictions based upon them Developed this tool on CAMPUS and EECS data Successful on other later traces as well We can automatically build a model to accurately predict important attributes of a file based on its name. 17

Accuracy of the Models Accuracy of predictions for EECS, for the model trained on 10/22/2001 for the trace from 10/23/2001. Prediction Accuracy Length = 0 Length < 16K 99.4% 91.8% 18

Accuracy of the Models Accuracy of predictions for EECS, for the model trained on 10/22/2001 for the trace from 10/23/2001. Prediction Accuracy % of files Accuracy w/o names Length = 0 99.4% 12.5% 87.5% Length < 16K 91.8% 35.2% 64.8% 19

Accuracy of the Models Accuracy of predictions for EECS, for the model trained on 10/22/2001 for the trace from 10/23/2001. Prediction Accuracy % of files Accuracy w/o names % Error Reduction Length = 0 99.4% 12.5% 87.5% 94.9% Length < 16K 91.8% 35.2% 64.8% 76.6% 20

New Finding: Out-of-Order Requests On busy networks, requests can be delivered to the server in a different order than they were generated by the client nfsiods can re-order requests Network effects can also contribute This can break fragile read-ahead heuristics on the server We investigated this for FreeBSD and found that read-ahead was affected 21

Conclusions Workloads do vary, sometimes enormously New traces are valuable We gain new insights from almost every trace We have identified several possible areas for future research: Name-based file system heuristics Handling out-of-order requests 22

The Last Word Please contact me if you are interested in exchanging traces or using our tracing software or anonymizer: http://www.eecs.harvard.edu/sos Another resource: ellard@eecs.harvard.edu www.snia.org 23