Replication, History, and Grafting in the Ori File System Ali José Mashtizadeh, Andrea Bittau, Yifeng Frank Huang, David Mazières Stanford University


Managed Storage: $5-10/GB + $1/GB/year. Local Storage: $0.04/GB.

What's missing? Data management:
- Availability: data is always live
- Accessibility: data is globally accessible
- Durability: data is never lost (history, snapshots, backup)
- Usability: collaboration and version control are easy

Ori File System Goal: All the benefits of Managed Storage, implemented with hardware you already own. Local Storage $0.04/GB

Two Main Usage Models: Personal storage, Shared storage, Public Folders

Managed storage limitations today
- Bandwidth: limited by WAN bandwidth
- Privacy
- Storage cost: $ per GB for managed solutions
- Poor integration of replication, versioning & sharing: copying files across machines; Apple Time Machine, Windows 8 File History, and applications implementing their own versioning; emailing documents; distributed version control

Idea: Leverage trends to do better Big disks Fast LANs Mobile storage

Disk vs WAN Throughput Growth. [Chart, log scale, 1990-2013: disk space has grown much faster than Internet speed; the time to transfer a full disk grew from 14 hours to 278 days, a 468x transfer-time gap.]

Ori design principles
- Store not just files but file history: take advantage of disk space
- Replicate files and history widely: make replication easy and instantaneous; no master replica (OK if any device fails); uses LAN speed and disk space
- Use history for sharing

Ori Provides: History, Public Folders, Replication, File Sharing with History (Grafting), Recovery

History

SFSRO/Git-like Data Model: content-addressable storage. [Diagram: each commit points to its older commit and to a tree; trees point to sub-trees and blobs, all named by SHA-256 hash, giving a globally unique namespace. Large blobs are split into fragments, and shared blobs are deduplicated.]
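The content-addressable model above can be sketched in a few lines of Python. The `ObjectStore` class here is illustrative, not Ori's actual API: the key property is that objects are named by the SHA-256 hash of their contents, so identical blobs deduplicate automatically.

```python
import hashlib

class ObjectStore:
    """Toy content-addressable store: objects are keyed by their SHA-256 hash."""
    def __init__(self):
        self.objects = {}  # hex hash -> bytes

    def put(self, data: bytes) -> str:
        oid = hashlib.sha256(data).hexdigest()
        self.objects[oid] = data  # identical data maps to the same key
        return oid

    def get(self, oid: str) -> bytes:
        return self.objects[oid]

store = ObjectStore()
a = store.put(b"hello world")
b = store.put(b"hello world")  # duplicate blob: same hash, stored once
assert a == b and len(store.objects) == 1
```

Because names are derived purely from content, two replicas that hash the same bytes agree on the name without any coordination.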

Apply DVCS Techniques
- Merge diverging replicas
- Detect conflicts: no magic bullets for all file types; make the merge base available; 3-way merge for line-oriented files
- Provide convenient tools: history, snapshots, branches, ...
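The 3-way merge of line-oriented files can be sketched as follows. This toy version assumes both sides only replace lines in place (no insertions or deletions), unlike a real diff3:

```python
def merge3(base, ours, theirs):
    """Toy 3-way merge over equal-length line lists.

    For each line: if only one side changed it, take that side; if both
    changed it the same way, take either; otherwise record a conflict.
    """
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:            # both agree (or neither changed)
            merged.append(o)
        elif o == b:          # only theirs changed this line
            merged.append(t)
        elif t == b:          # only ours changed this line
            merged.append(o)
        else:                 # both changed it differently: conflict
            merged.append(o)
            conflicts.append(i)
    return merged, conflicts

m, c = merge3(["a", "b", "c"], ["a", "B", "c"], ["a", "b", "C"])
assert m == ["a", "B", "C"] and c == []
```

Having the merge base available is what makes this possible: without it, there is no way to tell which side changed a differing line.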

Storage Layout
- Objects are deduplicated, compressed, and stored
- Log-structured storage (files on your local file system)
- An index is used to look up object locations
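A minimal sketch of this layout, using an in-memory buffer as a stand-in for an on-disk packfile (the function names are hypothetical, not Ori's): objects are compressed and appended to a log, and a side index maps each object hash to its offset.

```python
import zlib, hashlib, io

def pack_append(pack: io.BytesIO, index: dict, data: bytes) -> str:
    """Append a compressed object to a log-structured pack.
    The index maps object hash -> (offset, compressed length)."""
    oid = hashlib.sha256(data).hexdigest()
    if oid in index:
        return oid                    # deduplicated: already stored
    comp = zlib.compress(data)
    off = pack.seek(0, io.SEEK_END)   # log-structured: append only
    pack.write(comp)
    index[oid] = (off, len(comp))
    return oid

def pack_read(pack: io.BytesIO, index: dict, oid: str) -> bytes:
    off, length = index[oid]
    pack.seek(off)
    return zlib.decompress(pack.read(length))

pack, index = io.BytesIO(), {}
oid = pack_append(pack, index, b"file contents")
pack_append(pack, index, b"file contents")  # duplicate: pack does not grow
assert pack_read(pack, index, oid) == b"file contents"
assert len(index) == 1
```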

Replication Simplify data management

Today: Backup, Centralized File Storage, Dropbox, SCP/Rsync/AirDrop

Egalitarian Replication

Replication subsumes backup: after a crash, recover with replication. The background-fetch optimization makes replica creation feel instantaneous.

Replication in Ori
- Opportunistic replication (use the LAN): bulk transport over SSH
- Automatic device discovery and synchronization: UDP multicast messages at a 5-second interval
- Set a cluster name and symmetric key; traffic protected by AES-CBC

Replicate Deltas. [Diagram: deltas (Δ) over the commit history; a delta consists of a collection of objects (commits, trees, blobs, and blob fragments), with shared blobs already present on the receiver.] Versioning makes replication easy!
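With content addressing, computing a delta reduces to a set difference over object hashes, as in this sketch (the helper is illustrative, not Ori's API):

```python
def compute_delta(local: dict, remote_have: set) -> dict:
    """A delta is the set of local objects the remote is missing,
    keyed by content hash."""
    return {oid: data for oid, data in local.items()
            if oid not in remote_have}

local = {"h1": b"commit", "h2": b"tree", "h3": b"new blob"}
delta = compute_delta(local, remote_have={"h1", "h2"})
assert delta == {"h3": b"new blob"}  # shared objects are not re-sent
```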

Protocol
- Content-addressable storage: objects are identical on disk and on the wire, so no rewriting of objects is needed
- Reference counting: decompress metadata to update reference counts (decompression is faster than compression)

Distributed Fetch. [Diagram: fetch objects over the fast LAN (Gbps) from nearby replicas, even an unrelated file system, instead of over the WAN (Mbps).] Depends on content-addressable storage; trades storage for bandwidth.
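The fetch policy can be sketched as a simple preference order; peers and origin are modeled here as dictionaries keyed by object hash (an assumption for illustration). Content addressing is what makes this safe: any copy with the right hash is the right bytes, whichever file system it came from.

```python
def fetch(oid: str, lan_peers: list, wan_origin: dict) -> bytes:
    """Try nearby LAN replicas first; fall back to the WAN origin."""
    for peer in lan_peers:          # fast path: Gbps LAN
        if oid in peer:
            return peer[oid]
    return wan_origin[oid]          # slow path: Mbps WAN

peer = {"abc": b"shared object"}
origin = {"abc": b"shared object", "xyz": b"rare object"}
assert fetch("abc", [peer], origin) == b"shared object"  # served by LAN peer
assert fetch("xyz", [peer], origin) == b"rare object"    # only on the WAN
```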

Grafting File Sharing with History

Collaboration Today: Cloud, Over Email, Version Control

File Sharing with Versioning. We want the file system to manage versioning and sharing, requiring no forethought in setting up version control. No more insane naming: Presentation_Alice_Final_Bob_2_Final.pptx

Grafting in Ori. [Diagram: commit histories with cross-repository links. Alice: A1, A2, A3, B3*. Bob: B1, B2, A1*, A2*, A3*, B3. Starred commits are grafted snapshots of the other user's latest state.]

Conflicts in Ori
- Detects conflicts using history
- Automatic merging when possible
- Otherwise, provides files for a 3-way merge: file, file:conflict, file:base
- Conflicts rarely occur in the single-user model
- Conflicts are more likely with grafts, where merges are explicit
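When automatic merging fails, the fallback is to expose all three versions under the naming above so the user (or a merge tool) can finish the 3-way merge. A sketch of that materialization (the function name is hypothetical; the colon-suffixed names follow the slide and assume a POSIX file system, since colons are not valid in Windows filenames):

```python
import pathlib, tempfile

def materialize_conflict(dirpath, name, ours, base, theirs):
    """Write our version, their version, and the common ancestor
    side by side for a manual 3-way merge."""
    d = pathlib.Path(dirpath)
    (d / name).write_bytes(ours)                   # our version
    (d / f"{name}:conflict").write_bytes(theirs)   # their version
    (d / f"{name}:base").write_bytes(base)         # common ancestor

with tempfile.TemporaryDirectory() as tmp:
    materialize_conflict(tmp, "notes.txt", b"ours", b"base", b"theirs")
    base_bytes = (pathlib.Path(tmp) / "notes.txt:base").read_bytes()
    ours_bytes = (pathlib.Path(tmp) / "notes.txt").read_bytes()
assert base_bytes == b"base" and ours_bytes == b"ours"
```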

Mobile Devices Sneakernets!

Today: device space is underutilized (iCloud, Google Drive, Office 365/SkyDrive)

Data Carriers: Phone Storage Space. [Chart: phone storage capacity (GB), Oct 2006 to Dec 2014, axis 0-140 GB, growing steadily.]

Fast wireless networks. [Chart: per-stream bandwidth (Mbps, log scale) of 802.11, 802.11b, 802.11g, 802.11n, 802.11ac, and 802.11ad, Oct 1995 to Dec 2014; 4-8 streams with MIMO.]


Sneakernets
- Average commute in the US: 25 minutes
- Carry 16 GB storage
- 5.2 Gbps effective bandwidth
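The effective bandwidth of a sneakernet is just bits carried divided by travel time. A back-of-the-envelope helper (decimal units assumed); for example, carrying roughly a terabyte over a 25-minute commute works out to about 5 Gbps:

```python
def effective_gbps(storage_gb: float, commute_minutes: float) -> float:
    """Effective bandwidth of physically carrying storage:
    bits moved divided by travel time."""
    return storage_gb * 8 / (commute_minutes * 60)

# ~1 TB over a 25-minute commute is multi-gigabit:
assert round(effective_gbps(975, 25), 1) == 5.2
```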

"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." - Andrew S. Tanenbaum

Performance

Performance
- File system benchmarks: Filebench
- Network file system: source code build
* Everything measured on an SSD, except the network benchmark

Filesystem in Userspace (FUSE). Ori is built using FUSE. [Diagram: the benchmarked FUSE drivers (orifs, loopback) run in user space atop the FUSE kernel module and ext4 on an SSD.] Baseline against the FUSE loopback; compare ext4, ori, and loopback.

Architecture. [Diagram: orifs (FUSE driver) keeps FS metadata in memory (directories, fstat) plus a staging area for file data only. libori provides blob/tree/commit objects, a connection manager, and storage backends (HttpStorage, LocalStorage, SSHStorage). Object storage on ext4 comprises packfiles, an index, metadata, and a staging area (data cache).]

Filebench: Synthetic Workloads. [Chart: normalized operations/s of ext4, ori, and loopback on the fileserver, webserver, varmail, webproxy, and networkfs workloads; higher is better.]

Ori vs NFS: Remote Compile (lower is better; BF = on-demand background fetch).
LAN (1 Gbps): NFSv3 20.45 s, NFSv4 19.45 s, Ori 11.33 s, Ori w/BF 16.04 s.
WAN (2/20 Mbps, 17 ms): NFSv3 54.85 s, NFSv4 44.07 s, Ori 15.30 s, Ori w/BF 19.34 s.
(Ori w/BF vs Ori: 40% longer, 23% longer.)

Related Work
- Network file systems: AFP, CIFS, LBFS, NFS, Shark, ...
- Distributed file systems: AFS, ...
- Disconnected file systems: Coda, Ficus, JetFile, InterMezzo, ...
- Archival file systems: Elephant, Plan 9, WAFL, Wayback, ZFS, ...
- Version control: Git, Mercurial, ...
- Application solutions: Bayou, Dropbox, ...

Lessons Learned
- Hardware and use cases have evolved; file systems need to catch up!
- Replication is no longer just for data centers
- Keeping file history should be the default
- Mobile devices create an opportunity for better solutions: fast LANs, large storage, sneakernets

Future Work
- Application support for merging on Ori: API complications; merges can surprise applications and users; event notification?
- Integrating grafting and Orisync
- Authentication

Questions? Visit http://ori.scs.stanford.edu/. Available for OS X, Linux, and FreeBSD. See the paper for details on additional features.

Backup Slides

Mobile Device Battery Life. Use 802.11 (or USB): better for battery life. Some platforms have:
- Periodic callbacks (opportunistically optimize battery life)
- Geofencing callbacks (wake up when arriving at a location)

Bonnie: I/O Benchmark. [Chart: operations per second for 16K read, 16K write, and 16K rewrite on ext4, ori, and loopback; higher is better.]

Distributed Fetch: Performance. Remote pull of the Python 3.2.3 source; the destination reaches the source over the Internet (110 ms, 290/530KB up/down) and has a nearby peer holding either Python 2.7.3 or 3.2.3. [Chart: distributed pull 7.75 s, partially distributed pull 132.05 s, remote pull 170.79 s.]

Ori vs NFS (time in seconds):

            NFSv3           NFSv4           Ori             Ori on-demand
            LAN     WAN     LAN     WAN     LAN     WAN     LAN     WAN
Replicate   -       -       -       -       0.49    2.93    -       -
Configure   8.14    21.52   7.25    15.54   0.66    0.66    1.01    1.33
Build       12.32   33.33   12.20   28.54   9.50    9.55    11.45   12.77
Snapshot    -       -       -       -       0.19    0.19    2.72    3.37
Push        -       -       -       -       0.49    1.58    0.85    1.89
Total       20.45   54.85   19.45   44.07   11.33   15.30   16.04   19.34