COS 318: Operating Systems. NFS, Snapshot, Dedup and Review


Topics
- NFS
- Case study: NetApp file system
- Deduplication storage system
- Course review

Network File System
- Sun introduced NFS v2 in the early 1980s
- An NFS server exports directories to clients
- Clients mount the NFS server's exported directories (auto-mount is possible)
- Multiple clients share an NFS server

[Diagram: multiple clients connected over a network to an NFS server]

NFS Protocol (v3)
1. NULL: Do nothing
2. GETATTR: Get file attributes
3. SETATTR: Set file attributes
4. LOOKUP: Look up a filename
5. ACCESS: Check access permission
6. READLINK: Read from a symbolic link
7. READ: Read from a file
8. WRITE: Write to a file
9. CREATE: Create a file
10. MKDIR: Create a directory
11. SYMLINK: Create a symbolic link
12. MKNOD: Create a special device
13. REMOVE: Remove a file
14. RMDIR: Remove a directory
15. RENAME: Rename a file or directory
16. LINK: Create a link to an object
17. READDIR: Read from a directory
18. READDIRPLUS: Extended read from a directory
19. FSSTAT: Get dynamic file system information
20. FSINFO: Get static file system information
21. PATHCONF: Retrieve POSIX information
22. COMMIT: Commit cached data on a server to stable storage

NFS Protocol
- No open and close
  - The server doesn't really know what clients are doing, who has files open, etc.
- Use a global handle in the protocol
  - Read some bytes
  - Write some bytes
- Questions
  - What does stateless mean?
  - Is NFS stateless?
  - What are the tradeoffs of stateless vs. stateful?
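
To make the stateless idea concrete, here is a minimal sketch, assuming a hypothetical NFSServer class with path-style handles (real NFS handles are opaque): every request carries its full context, so the server opens, reads, and closes within a single call and remembers nothing between requests.

    import os

    class NFSServer:
        def __init__(self, export_root):
            self.export_root = export_root   # directory exported to clients

        def read(self, handle, offset, count):
            # the request itself names the file, offset, and length;
            # no per-client open-file table is consulted or updated
            path = os.path.join(self.export_root, handle)
            with open(path, "rb") as f:      # open and close inside one request
                f.seek(offset)
                return f.read(count)

    # Two independent requests; nothing is remembered between them:
    # server = NFSServer("/export")
    # data = server.read("docs/notes.txt", 0, 4096)
    # more = server.read("docs/notes.txt", 4096, 4096)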

NFS Implementation

[Diagram: on the client, the kernel's virtual file system dispatches to either a local FS or the NFS client; on the server, the NFS server module goes through the server's virtual file system to its local file systems. Client and server each maintain a buffer cache and communicate over the network.]

NFS Client Caching Issues
- Client caching
  - Read-only file and directory data (expires in 60 seconds)
  - Data written by the client machine (written back within 30 seconds)
- Consistency issues
  - Multiple client machines can perform writes to their own caches
  - Some implementations cache file data only, and disable client caching of a file if it is opened by multiple clients
  - Some implement a network lock manager
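
A hedged sketch of the time-based caching described above, using the 60-second read expiry and 30-second write-back windows from the slide (class and callback names are illustrative, not from any real NFS client). Note the consistency gap it implies: another client can read stale data until its cached copy expires.

    import time

    READ_TTL = 60      # cached read data expires after 60 seconds
    WRITE_BACK = 30    # dirty data must be flushed within 30 seconds

    class ClientCache:
        def __init__(self, fetch, flush):
            self.fetch = fetch       # RPC: handle -> data
            self.flush = flush       # RPC: (handle, data) -> None
            self.entries = {}        # handle -> [data, fetch_time, dirty]

        def read(self, handle):
            e = self.entries.get(handle)
            if e and time.time() - e[1] < READ_TTL:
                return e[0]          # fresh enough: no server round trip
            data = self.fetch(handle)
            self.entries[handle] = [data, time.time(), False]
            return data

        def write(self, handle, data):
            self.entries[handle] = [data, time.time(), True]

        def sync(self):
            # run periodically, at least every WRITE_BACK seconds
            for handle, e in self.entries.items():
                if e[2]:
                    self.flush(handle, e[0])
                    e[2] = False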

NFS Protocol Development
- Version 2 issues
  - 18 operations
  - Size: file size limited to 4GB
  - Write performance: the server writes data synchronously
  - Several other issues
- Version 3 changes (most products still use this version)
  - 22 operations
  - Size: increased to 64 bits
  - Write performance: asynchronous WRITE followed by COMMIT
  - Fixed several other issues
  - Still stateless
- Version 4 changes
  - 42 operations
  - Solves the consistency issues
  - Addresses security issues
  - Stateful

Case Study: NetApp's NFS File Server
- WAFL: Write Anywhere File Layout
  - NetApp's core file system
- Design goals
  - Fast service (more operations/sec and higher bandwidth)
  - Support large file systems that can grow smoothly
  - High-performance software RAID
  - Restart quickly after a crash
- Special features
  - Introduced snapshots
  - Uses NVRAM to reduce latency and maintain consistency

Snapshots
- A snapshot is a read-only copy of the file system
  - Introduced in 1993
  - Has become a standard feature of today's file servers
- Using snapshots
  - The system administrator configures the number and frequency of snapshots
  - An initial system can keep up to 20 snapshots
  - Snapshots can be used to recover individual files
- An example:
    arizona% cd .snapshot
    arizona% ls
    hourly.0  hourly.2  hourly.4  nightly.0  nightly.2  weekly.1
    hourly.1  hourly.3  hourly.5  nightly.1  weekly.0
    arizona%
- How much space does a snapshot consume?
  - 10-20% of space per week

i-node, Indirect and Data Blocks
- WAFL uses 4KB blocks
  - i-nodes (evolved from UNIX's)
  - Data blocks
- File size < 64 bytes
  - The i-node stores the data directly
- File size < 64KB
  - The i-node stores 16 pointers to data blocks
- File size < 64MB
  - The i-node stores 16 pointers to indirect blocks
  - Each indirect block stores 1K pointers to data blocks
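
The thresholds follow from the constants: 16 pointers x 4KB = 64KB, and 16 x 1024 pointers x 4KB = 64MB. A small sketch of the resulting decision:

    BLOCK = 4096          # 4KB blocks
    INODE_PTRS = 16       # pointers stored in the i-node
    INDIRECT_PTRS = 1024  # pointers per indirect block

    def inode_layout(size):
        if size < 64:
            return "data stored directly in the i-node"
        if size < INODE_PTRS * BLOCK:                    # < 64KB
            return "16 direct pointers to data blocks"
        if size < INODE_PTRS * INDIRECT_PTRS * BLOCK:    # < 64MB
            return "16 pointers to indirect blocks (1K pointers each)"
        return "needs another level of indirection (beyond this slide)"

    print(inode_layout(50))          # directly in the i-node
    print(inode_layout(32 * 2**10))  # direct pointers
    print(inode_layout(10 * 2**20))  # indirect blocks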

WAFL Layout
- A WAFL file system has:
  - A root i-node: the root of everything
  - An i-node file: contains all i-nodes
  - A block map file: indicates which blocks are free
  - An i-node map file: indicates which i-nodes are free
  - Data files: the real files that users can see
- Everything except the root i-node, including the metadata above, is kept in ordinary files

Why Keep Metadata in Files?
- Allows metadata blocks to be written anywhere on disk
  - This is the origin of the name Write Anywhere File Layout
  - Any performance advantage?
- Makes it easy to grow the file system dynamically
  - Adding a disk can lead to adding i-nodes
- Enables copy-on-write to create snapshots
  - Fixed metadata locations are cumbersome
  - New data and metadata can be copy-on-written to any new disk locations

Snapshot Implementation
- A WAFL file system is a tree of blocks
- Snapshot step 1
  - Replicate the root i-node
  - The new root i-node is the active file system
  - The old root i-node is the snapshot
- Snapshot steps 2..n
  - On a write, copy the affected blocks up to the root (copy-on-write)
  - The active root i-node points to the new blocks
  - Writes go to the new blocks
  - Future writes to the new blocks will not trigger copy-on-write

[Diagram: modifying a leaf block copies it and its ancestors up to the active root, while the snapshot root keeps pointing at the old blocks]
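
A toy copy-on-write sketch of these two steps, assuming made-up Block objects standing in for i-nodes and disk blocks: duplicating the root creates the snapshot, and a write copies only the blocks on the path from the root down to the modified block, sharing everything else.

    class Block:
        def __init__(self, data, children=()):
            self.data = data
            self.children = list(children)

    def snapshot(root):
        # step 1: replicate only the root; both trees share all blocks below
        return Block(root.data, root.children)

    def cow_write(root, path, new_data):
        # steps 2..n: return a new root whose path to the target is copied;
        # untouched subtrees remain shared with the snapshot
        if not path:
            return Block(new_data)
        copy = Block(root.data, root.children)
        copy.children[path[0]] = cow_write(root.children[path[0]],
                                           path[1:], new_data)
        return copy

    old_root = Block("root", [Block("A"), Block("B")])
    active_root = snapshot(old_root)                 # new root = active FS
    active_root = cow_write(active_root, [1], "B'")  # active FS sees B'
    assert old_root.children[1].data == "B"          # snapshot still sees B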

File System Consistency
- Create a consistency point (an internal snapshot) every 10 seconds
  - On a crash, revert the file system to this consistency point
  - Not visible to users
- Many requests between consistency points
  - Consistency point i
  - Many writes
  - Consistency point i+1 (advanced atomically)
  - Many writes
  - ...
- Question
  - What is the relationship with transactions?
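
The transaction analogy can be made concrete with a minimal sketch, assuming toy stand-in structures (not WAFL's): writes accumulate in memory, and the durable state advances all-or-nothing from consistency point i to i+1, like a commit.

    class FS:
        def __init__(self):
            self.durable = {}   # state as of the last consistency point
            self.pending = {}   # writes since then; lost on crash unless
                                # also logged to NVRAM (next slide)

        def write(self, name, data):
            self.pending[name] = data

        def consistency_point(self):
            # like committing a transaction: a single atomic switch
            self.durable = {**self.durable, **self.pending}
            self.pending = {}

    fs = FS()
    fs.write("a", b"new data")  # a crash here reverts to the old state
    fs.consistency_point()      # now the write is durable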

Non-Volatile RAM
- Non-volatile RAM options
  - Flash memory (slower)
  - Battery-backed DRAM (fast, but the battery lasts only days)
- Use NVRAM to buffer writes
  - Buffer all write requests since the last consistency point
  - A clean shutdown empties the NVRAM, creates one more snapshot, and turns off the NVRAM
  - Crash recovery replays the NVRAM data on top of the most recent snapshot, then brings the system up
- Use two logs (sketched below)
  - Fill one while flushing the other
- Issues
  - What is the main disadvantage of NVRAM?
  - How large should the NVRAM be?
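
A minimal sketch of the two-log scheme, with illustrative names: new requests append to one log while the other is applied to disk at a consistency point, so logging never stalls behind the flush.

    class DoubleLog:
        def __init__(self):
            self.logs = [[], []]
            self.active = 0                   # log currently taking writes

        def record(self, request):
            self.logs[self.active].append(request)

        def consistency_point(self, apply_to_disk):
            flushing = self.active
            self.active = 1 - self.active     # new writes go to the other log
            for req in self.logs[flushing]:   # replayable if we crash mid-flush
                apply_to_disk(req)
            self.logs[flushing].clear()       # safe to discard once on disk

    log = DoubleLog()
    log.record(("write", "file1", b"data"))
    log.consistency_point(lambda req: None)   # stand-in for real disk writes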

Write Allocation
- WAFL can write to any blocks on disk
  - File metadata (the i-node file, block map file, and i-node map file) lives in the file system itself
- WAFL can write blocks in any order
  - Relies on consistency points to enforce file consistency
  - Uses NVRAM-buffered writes to implement ordering

Snapshot Data Structure
- WAFL uses 32-bit entries in the block map file
  - 32 bits for each 4KB disk block
  - Entry = 0: the block is free
- Bit 0 = 1: the active file system references the block
- Bit 1 = 1: the most recent snapshot references the block
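
A small sketch of these entries as bitmasks, modeling only the two low bits named on the slide (the helper functions are illustrative): a block is reclaimable only when no file system or snapshot bit remains set.

    ACTIVE_FS = 1 << 0        # bit 0: active file system uses the block
    NEWEST_SNAPSHOT = 1 << 1  # bit 1: most recent snapshot uses the block

    def is_free(entry):
        return entry == 0     # nothing references the block

    def on_snapshot_create(entry):
        # copy the active-FS bit into the snapshot bit (see Algorithm below)
        return entry | NEWEST_SNAPSHOT if entry & ACTIVE_FS else entry

    def on_snapshot_delete(entry):
        return entry & ~NEWEST_SNAPSHOT   # entry may drop to 0: block freed

    entry = on_snapshot_create(ACTIVE_FS)      # block pinned by the snapshot
    entry &= ~ACTIVE_FS                        # active FS stops using it...
    assert not is_free(entry)                  # ...but the snapshot still does
    assert is_free(on_snapshot_delete(entry))  # deleting the snapshot frees it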

Snapshot Creation
- Problem
  - Many NFS requests may arrive while a snapshot is being created
  - The file cache may need replacements
  - It is undesirable to suspend the NFS request stream
- WAFL solution
  - Before creation, mark dirty cache data as in-snapshot and suspend the NFS request stream
  - Defer all modifications to in-snapshot data
  - Modify cache data not marked in-snapshot
  - Do not flush cache data not marked in-snapshot

Algorithm
- Steps
  - Allocate disk space for in-snapshot cached i-nodes
    - Copy these i-nodes to disk buffers
    - Clear the in-snapshot bit of all cached i-nodes
  - Update the block map file
    - For each entry, copy the active-FS bit to the new snapshot bit
  - Flush
    - Write all in-snapshot disk buffers to their new disk locations
    - Restart the NFS request stream
  - Duplicate the root i-node
- Performance
  - Typically takes less than a second

Snapshot Deletion
- Delete the snapshot's root i-node
- Clear bits in the block map file
  - For each entry in the block map file, clear the bit representing the snapshot

Performance
- On the SPEC SFS benchmark, it is about 8X faster than other systems

Progress on Data Protection

[Figure: data protection techniques and their typical reductions]
- Deduplication for capacity optimization: ~10-50X traffic and storage reduction
- Deduplication for bandwidth optimization: ~10X traffic reduction
- Continuous data protection: ~3X traffic reduction
- Snapshot: ~5X traffic reduction

Global Compression or Deduplication
- Then, apply local compression to the unique segments

De-duplicated Replication

One Method (used by Data Domain)
- A file is divided into segments
  - Secure hashes are used as references to segments
- Read a file
  - Use the hashes to fetch the data segments
- Write a file
  - Anchor the write data stream into segments
  - For each segment:
    - Compute its secure hash and look it up in the database
    - If the hash is new, output the segment and insert the hash into the database
    - If it is a duplicate, output only the hash
  - Apply local compression to each unique segment
  - Bundle many segments before writing to disk
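
A hedged sketch of this write path (not Data Domain's code): fixed-size segmentation stands in for content-based anchoring, and an in-memory dict stands in for the on-disk hash database.

    import hashlib, zlib

    store = {}   # secure hash -> locally compressed unique segment

    def write_file(data, seg_size=8192):
        recipe = []                               # the file as a list of hashes
        for i in range(0, len(data), seg_size):
            seg = data[i:i + seg_size]
            h = hashlib.sha256(seg).hexdigest()   # secure hash as the reference
            if h not in store:                    # new segment: keep it
                store[h] = zlib.compress(seg)     # local compression
            recipe.append(h)                      # duplicate: output hash only
        return recipe

    def read_file(recipe):
        return b"".join(zlib.decompress(store[h]) for h in recipe)

    recipe = write_file(b"hello dedup " * 4096)   # highly redundant input
    assert read_file(recipe) == b"hello dedup " * 4096
    print(len(store), "unique segments for", len(recipe), "references")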

Why Is This Challenging?
- High read throughput
  - Prefetch the right segments
  - Coarse-grained segments are better
- High write throughput
  - Avoid disk I/Os when checking redundancy
  - Coarse-grained segments are better
- High global compression ratio
  - Handle data shifts with content-based anchoring (Manber '94); see the sketch below
  - Fine-grained segments are better
- Data Domain's performance
  - Throughput: ~250MB/sec on a 2x dual Xeon server
  - Compression ratios: 10-30X (~8KB segments)
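
A toy sketch of content-based anchoring in the spirit of Manber '94, with an illustrative hash and constants (real systems use Rabin fingerprints): a segment boundary is declared wherever a hash of the most recent bytes matches a pattern, so an insertion shifts only nearby boundaries instead of re-aligning every fixed-size segment.

    import os

    MASK = 0x1FFF    # boundary when hash & MASK == 0: ~8KB average segments
    MIN_SEG = 64     # avoid pathologically tiny segments

    def anchor_segments(data):
        segments, start, h = [], 0, 0
        for i, b in enumerate(data):
            h = ((h << 1) + b) & 0xFFFFFFFF   # crude hash of recent bytes
            if i - start + 1 >= MIN_SEG and (h & MASK) == 0:
                segments.append(data[start:i + 1])
                start, h = i + 1, 0
        if start < len(data):
            segments.append(data[start:])     # trailing partial segment
        return segments

    data = os.urandom(100_000)
    shifted = b"X" + data                     # one-byte insertion at the front
    a, b = anchor_segments(data), anchor_segments(shifted)
    print(len(set(a) & set(b)), "of", len(a), "segments survive the shift")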