A Low-bandwidth Network File System


A Low-bandwidth Network File System

Athicha Muthitacharoen and Benjie Chen (MIT Lab for Computer Science)
David Mazières (NYU Department of Computer Science)

Motivation

Network file systems are a useful abstraction...
But few people use them over wide-area networks
- E.g., people who travel do use network file systems
- But they don't use them over 802.11b while traveling
- The file systems that do work over WANs provide non-traditional semantics
Network file systems require too much bandwidth
- Saturate bottleneck links
- Interfere with other users
- Block processes for seconds while waiting for the network

Other ways of accessing remote data

Relax consistency semantics (Coda, CVS, ...)
- Many applications need strict consistency (email, RCS, ...)
Copy files back and forth to work on them
- Threatens consistency: where is the latest version?
- Not all files work when copied (symlinks, CVS/Root, ...)
Use remote login to work on files remotely
- Graphical applications require too much bandwidth (figure editors, PostScript previewers, ...)
- Interactive programs are sensitive to latency and packet loss
- Delayed character echoes are extremely frustrating!

Remote login frustration!

[Figure: execution time (seconds, 0-2000) vs. packet loss rate (0-8%) for ssh, AFS, Leases+Gzip, and LBFS]

Client-server bandwidth

[Figure: bandwidth consumed (MBytes, 0-50) vs. keys typed, for NFS, AFS, and LBFS]

Observation: Much inter-file commonality

Editing/word-processing workloads
- Often modify only one part of a large file
- Generate autosave files with mostly redundant content
Software development workloads
- Modifying a header & recompiling recreates similar object files
- Concatenate object files into a library
LBFS: Exploit commonality to save bandwidth
- Won't always work, but big potential savings

Avoiding redundant data transfers

Identify blocks by a collision-resistant hash (SHA-1)
To transfer a file between client and server:
- Break the file into ~8K data chunks
- Send hashes of the file's chunks
- Only send the chunks the recipient actually needs
Index the file system and client cache to find chunks
- Keep a database mapping hash → (file, offset, length)
- Use chunks from any file when reconstructing any other
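
As a rough sketch of that database (the ChunkIndex class and its dict backing store here are illustrative; LBFS keeps the index in a B-tree and, as discussed under implementation details below, treats entries only as hints):

    import hashlib

    class ChunkIndex:
        """Toy chunk index: SHA-1 digest -> (path, offset, length)."""

        def __init__(self):
            self.db = {}

        def add(self, path, offset, data):
            self.db[hashlib.sha1(data).digest()] = (path, offset, len(data))

        def lookup(self, digest):
            # Entries may be stale: callers reread the region and
            # recheck the hash before trusting the data.
            return self.db.get(digest)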

Dividing files into chunks

Straw man: split the file into aligned 8K chunks
- Inserting one byte at the start of the file changes all chunks
Base chunks on file contents, not position
- Allow variable-length chunks
- Compute a running hash of every overlapping 48-byte region
- If the hash mod 8K equals a special value, create a chunk boundary
Chunk boundaries are insensitive to shifting offsets
- Inserting/deleting data only affects the surrounding chunk(s)

Example: Breaking a file into chunks

a. c1 c2 c3 c4 c5 c6 c7
b. c1 c2 c3 c8 c5 c6 c7
c. c1 c2 c3 c8 c9 c10 c6 c7
d. c1 c11 c8 c9 c10 c6 c7

(a) is the original file; (b) an edit inside c4 yields a new chunk c8 while every other chunk is unchanged; (c) an insertion creates a new boundary, splitting c5 into c9 and c10; (d) an edit removes the boundary between c2 and c3, merging them into c11.

Pathological cases

Tiny chunks
- E.g., caused by an unlucky 48-byte region repeating
- Sending the hashes consumes more bandwidth than the data
Enormous chunks
- E.g., a long run of all zeros
- Hard to handle (can't hold chunks in memory)
Solution: impose min/max chunk sizes (2K/64K)
- Could conceivably derail alignment
- Just an optimization; can afford low-probability failures
- The problem cases are often very compressible!
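
A minimal sketch in Python of the boundary rule from the previous slide together with the min/max fix. It substitutes a simple polynomial rolling hash for the Rabin fingerprints LBFS actually computes, so the helper names and hash constants are illustrative only:

    WINDOW = 48          # rolling-hash window (bytes)
    AVG = 8192           # boundary when hash % AVG == MAGIC: ~8K chunks
    MAGIC = AVG - 1
    MIN_CHUNK = 2048     # 2K minimum
    MAX_CHUNK = 65536    # 64K maximum
    BASE = 257
    MOD = (1 << 61) - 1
    POW = pow(BASE, WINDOW - 1, MOD)   # coefficient of the oldest byte

    def chunk_boundaries(data):
        """Yield (start, end) offsets of content-defined chunks of bytes."""
        h = start = 0
        for i, b in enumerate(data):
            if i - start >= WINDOW:        # window full: drop oldest byte
                h = (h - data[i - WINDOW] * POW) % MOD
            h = (h * BASE + b) % MOD       # shift in the new byte
            size = i - start + 1
            # Enforce min/max sizes: ignore boundaries inside tiny
            # chunks, force one before a chunk can exceed MAX_CHUNK.
            if (h % AVG == MAGIC and size >= MIN_CHUNK) or size >= MAX_CHUNK:
                yield (start, i + 1)
                start, h = i + 1, 0        # restart the window
        if start < len(data):
            yield (start, len(data))       # final partial chunk

On two versions of a file that differ by a single insertion, every chunk outside the edited region keeps its boundaries and hashes, so only a chunk or two ever needs to cross the network.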

LBFS overview

Provides traditional file system semantics
- Close-to-open consistency
- Data safely stored on the server before close returns
Large client cache holds the user's working set
- Eliminates all communication not required for consistency
- When the user modifies a file, it must be written through to the server
- When a different client modifies a file, download the new version
Elides transfers of redundant data
Conventionally compresses the remaining traffic

LBFS protocol

Derived from the NFS protocol
Adds more aggressive caching
- Persistent, on-disk cache holds the user's entire working set
- Callbacks & leases save an RPC on many open/stat calls
Client and server index data chunks with a B-tree
Five new RPCs exploit inter-file commonality
- GETHASH: like READ, but returns hashes rather than data
- CONDWRITE: a write that takes a hash instead of data
- 3 RPCs for atomic file updates
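
Sketched as an interface (signatures paraphrased from this slide, not the real XDR definitions; the actual GETHASH, like READ, also takes offset and count arguments):

    class LBFSServer:
        """Paraphrased sketch of the five LBFS RPC extensions."""

        def gethash(self, path):
            """Like READ, but yield a (sha1, size) pair per chunk."""

        def mktmpfile(self, fd):
            """Create a temp file named by client-chosen descriptor fd."""

        def condwrite(self, fd, offset, length, sha1):
            """Write the chunk with this hash into the temp file if the
            hash is in the DB; otherwise return "NOTFOUND"."""

        def tmpwrite(self, fd, offset, data):
            """Plain write of a chunk the server reported NOTFOUND."""

        def committmp(self, fd, path):
            """Atomically copy the temp file's contents into path."""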

Read caching

Leases let the client validate cached attributes
- Most file operations grant the client a lease on attributes
- The server must notify the client if attributes change while leased
Attributes let the client validate cached file contents
- Check modification/change times
When the client must download a file:
- Retrieve the file's chunk hashes with GETHASH
- Request chunks not already in the cache using normal READs
- Update the local chunk index to reflect the new cache data

Read protocol

[Diagram: the client sends GETHASH; the server replies (hash1, size1), (hash2, size2), (hash3, size3), EOF; the client issues READs only for the two chunks it lacks and receives data2 and data3]
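
A minimal sketch of this read path, reusing the hypothetical ChunkIndex above, an assumed cache with get/put, and a read_chunk stub standing in for ordinary READs:

    def fetch_file(server, cache, index, path):
        """Reconstruct a file, transferring only the missing chunks."""
        contents, offset = [], 0
        for digest, size in server.gethash(path):
            data = cache.get(digest)           # chunk already cached?
            if data is None:
                data = server.read_chunk(path, offset, size)  # normal READ
                cache.put(digest, data)
                index.add(path, offset, data)  # keep the chunk index current
            contents.append(data)
            offset += size
        return b"".join(contents)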

Writing back a modified file

Idea: first send hashes, then the missing data
Complications:
- The new file likely contains many chunks of the file it is overwriting
- Unaligned writes can be expensive (they cause a disk read)
- Reordering writes creates confusing intermediate states
- What if the client crashes in the middle of sending the file?
Solution: atomic updates
- Write the data to a new temporary file
- Commit the contents of the temporary file to the file being written

Atomic update RPCs

MKTMPFILE creates a temporary file
- File named by a client-chosen descriptor
CONDWRITE sends hashes of chunks
- Can be pipelined immediately behind MKTMPFILE
- The server writes the chunk if its hash is in the DB, else returns NOTFOUND
TMPWRITE sends the data for NOTFOUND chunks
COMMITTMP copies the temporary file to the target file
- The server then updates its chunk index

Update protocol

[Diagram: the client pipelines MKTMPFILE, two CONDWRITEs, a TMPWRITE, and COMMITTMP; the server answers OK to the first CONDWRITE, NOTFOUND to the second, then OK to the TMPWRITE and OK to the commit]
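
The corresponding write-back path as a sketch, reusing chunk_boundaries from the chunking example; where the real client pipelines all of these RPCs, this version waits on each reply for clarity:

    import hashlib

    def write_back(server, path, data, fd=1):
        """Atomically update path on the server; fd is the
        client-chosen temporary-file descriptor."""
        server.mktmpfile(fd)
        missing = []
        for start, end in chunk_boundaries(data):
            chunk = data[start:end]
            digest = hashlib.sha1(chunk).digest()
            # First pass: hashes only; the server fills in known chunks.
            if server.condwrite(fd, start, len(chunk), digest) == "NOTFOUND":
                missing.append((start, chunk))
        for start, chunk in missing:
            server.tmpwrite(fd, start, chunk)  # send only missing data
        server.committmp(fd, path)             # atomically replace target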

Implementation

[Diagram: an LBFS client (chunk index, cache, xfs client) connected over TCP to an LBFS server (chunk index, NFS server, file system)]

Client uses xfs, the device driver of the ARLA AFS clone
Server accesses the file system by pretending to be an NFS client
The chunk index uses a BerkeleyDB B-tree

Implementation details

Never assume the chunk index is correct
- Automatically fix errors as they are encountered
- No need for expensive crash-recovery precautions
- Allows the server to be updated by non-LBFS clients
Keep old temporary files around
- They often contain useful chunks for subsequent files
- Move them to a trash directory; evict in FIFO order
A background thread deletes invalid DB entries
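
For example, the server can verify every index hit before acting on a CONDWRITE, deleting stale entries as it goes; a sketch reusing the hypothetical ChunkIndex, with read_region an assumed helper over the server's file system:

    import hashlib

    def verified_lookup(index, read_region, digest):
        """Return the chunk data for digest, or None on a miss; stale
        entries are deleted on the spot, so the index self-corrects."""
        entry = index.lookup(digest)
        if entry is None:
            return None                        # genuine miss: NOTFOUND
        path, offset, length = entry
        data = read_region(path, offset, length)
        if data is None or hashlib.sha1(data).digest() != digest:
            del index.db[digest]               # file changed underneath us
            return None
        return data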

Bandwidth: emacs recompile

[Figure: upstream and downstream bandwidth consumed (MBytes, 0-30) for NFS, AFS, Leases+Gzip, LBFS with a new database, and LBFS]

Performance: emacs recompile

[Figure: execution time in seconds: NFS 1312, AFS 470, Leases+Gzip 193, LBFS with a new database 182, LBFS 113; NFS on the LAN takes 138]

Evaluated over a simulated ADSL line
- 1.5 Mbit/s downstream, 384 Kbit/s upstream, 30 ms latency
- LBFS on ADSL beats NFS on a 100 Mbit/s LAN

Compile time vs. bandwidth

[Figure: execution time (seconds, 0-1600) vs. available bandwidth (0.1-10 Mbit/s) for AFS, Leases+Gzip, and LBFS]

Saving a 1.4 MByte MS Word doc

[Figure: two panels, upstream bandwidth (MBytes, 0-5) and run time (seconds, 0-100), with bars for CIFS LAN, AFS LAN, CIFS, AFS, Leases+Gzip, and LBFS]

Effect of network latency on performance

[Figure: execution time (seconds, 0-400) vs. round-trip time (20-100 ms) for AFS, Leases+Gzip, and LBFS]

Related work

Weaken consistency (Coda)
Send deltas (diff/patch, CVS, xdelta)
- Requires the server to keep old versions of files around
The rsync algorithm (synchronizes two files)
- One file often contains chunks of many files (e.g., ar archives)
- Not obvious which file to choose at the receiving end (emacs: #foo# vs. foo, RCS: 1v22825 vs. foo,v, ...)

Conclusions

A network file system is often the best way to access remote data
- Copying files back and forth threatens consistency
- Remote login is frustrating given latency or packet loss
Most file systems are too bandwidth-hungry for the WAN
LBFS exploits inter-file commonality to save bandwidth
- Breaks files into variable-size chunks based on content
- Indexes chunks in the file system and the client cache
- Avoids sending chunks already present in other files
LBFS works where other network file systems are impractical