Opportunistic Use of Content Addressable Storage for Distributed File Systems


Opportunistic Use of Content Addressable Storage for Distributed File Systems
Niraj Tolia*, Michael Kozuch, M. Satyanarayanan*, Brad Karp, Thomas Bressoud, and Adrian Perrig*
*Carnegie Mellon University; Intel Research Pittsburgh; Denison University

Introduction
Using a distributed file system on a Wide Area Network is slow. However, the number of providers of Content Addressable Storage (CAS) appears to be growing. Can we therefore make opportunistic use of these CAS providers to benefit client-server file systems such as NFS, AFS, and Coda?

Content Addressable Storage
Content Addressable Storage is data that is identified by its contents instead of by a name.
[Figure: a file (Foo.txt) is passed through a cryptographic hash to yield its content addressable name, e.g. 0xd58e23b71b1b...]
An example of a CAS provider is a Distributed Hash Table (DHT).
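The naming scheme above can be sketched in a few lines of Python. SHA-1 is used here because the recipe example later in the talk specifies sha-1, but any cryptographic hash would serve:

```python
import hashlib

def content_address(data: bytes) -> str:
    """A content addressable name: the cryptographic hash of the data itself."""
    return "0x" + hashlib.sha1(data).hexdigest()

# Identical contents yield identical names, regardless of the file's name.
assert content_address(b"same bytes") == content_address(b"same bytes")
```

This is what lets any party that happens to hold the same bytes act as a CAS provider: the name can be recomputed and verified by the consumer.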

Motivation
Use CAS as a performance enhancement when the file server is remote: convert data transfers from WAN to LAN.
[Figure: a client retrieves data from nearby CAS providers over the LAN rather than from the distant file server over the WAN]

Talk Outline
Introduction
The CASPER File System
  Building Blocks: Recipes, Jukeboxes, Recipe Servers
  Architecture
Benchmarks and Performance
Fuzzy Matching
Conclusions

The CASPER File System
It can make use of any available CAS provider to improve read performance. However, it does not depend on CAS providers: in the absence of a useful one, you are no worse off than you were originally. Writes are sent directly to the file server, not to the CAS provider.

Recipes
[Figure: each block of a file's data is passed through a cryptographic hash; the resulting list of content addressable names (0x330c7eb274a4..., 0xf13758906c8d..., 0xe13b918d6a50..., 0xf9d09794b6d7..., 0x1deb72e98470...) forms the file's recipe]

Building Blocks: Recipes
A description of objects in a content addressable way.
A first class entity in the file system; can be cached.
Uses XML for data representation; compression is used over the network.
Can be maintained lazily, as they contain version information.

Recipe Example (XML)
<recipe type="file">
  <metadata>
    <version>00 00 01 04 01</version>
  </metadata>
  <recipe_choice>
    <hash_list hash_type="sha-1" block_type="variable" number="5">
      <hash size="4189">330c7eb274a4...</hash>
    </hash_list>
  </recipe_choice>
</recipe>
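As a rough sketch of how such a recipe might be generated. CASPER uses variable-size blocks; fixed-size 4 KB blocks are used here purely for brevity, and the version string is a placeholder:

```python
import hashlib
import xml.etree.ElementTree as ET

def make_recipe(data: bytes, block_size: int = 4096) -> ET.Element:
    """Build a recipe element listing the SHA-1 name of each block.

    Illustrative only: real recipes use variable-size blocks and carry
    actual version metadata from the file system.
    """
    recipe = ET.Element("recipe", type="file")
    metadata = ET.SubElement(recipe, "metadata")
    ET.SubElement(metadata, "version").text = "00 00 01 04 01"
    choice = ET.SubElement(recipe, "recipe_choice")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    hash_list = ET.SubElement(choice, "hash_list", hash_type="sha-1",
                              block_type="fixed", number=str(len(blocks)))
    for block in blocks:
        ET.SubElement(hash_list, "hash",
                      size=str(len(block))).text = hashlib.sha1(block).hexdigest()
    return recipe

print(ET.tostring(make_recipe(b"x" * 10000), encoding="unicode"))
```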

CASPER Architecture
[Figure: on the client, a DFS client; on the server, a DFS file server and a recipe server, reached over a WAN connection; a CAS provider (jukebox) is reachable over the LAN]

Building Blocks: Jukeboxes
Jukeboxes are abstractions of a Content Addressable Storage provider.
They provide access to data based on its hash value, with no guarantee to consumers about persistence or reliability.
They support a Query() and Fetch() interface; MultiQuery() and MultiFetch() are also available.
Examples include your desktop, a departmental jukebox, P2P systems, etc.
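A minimal sketch of the jukebox interface described above. The in-memory dict backend is purely illustrative; a real jukebox may discard data at any time, which is why no persistence is guaranteed:

```python
import hashlib

class Jukebox:
    """Best-effort content addressable store: Query/Fetch plus batched variants.
    Method names follow the slide; the storage backend is hypothetical."""

    def __init__(self):
        self._store = {}  # hash -> block; entries may vanish at any time

    def add(self, block: bytes) -> str:
        """Store a block under its content addressable name."""
        name = hashlib.sha1(block).hexdigest()
        self._store[name] = block
        return name

    def query(self, name: str) -> bool:
        """Does this jukebox currently hold the named block?"""
        return name in self._store

    def fetch(self, name: str):
        """Return the named block, or None if it is not held."""
        return self._store.get(name)

    def multi_query(self, names):
        """Batched query: the subset of names this jukebox holds."""
        return [n for n in names if self.query(n)]

    def multi_fetch(self, names):
        """Batched fetch: name -> block for every name that is held."""
        return {n: self._store[n] for n in names if n in self._store}
```

Batching matters in practice: MultiQuery/MultiFetch amortize round trips when a recipe names many blocks.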

Building Blocks: Recipe Server
This module generates a recipe representation of files present in the underlying file system.
It can be placed either on the distributed file system server or on any other machine well connected to it.
It helps maintain consistency by informing the client of changes to files that the client has reconstructed.

CASPER Details
The file system is based on Coda: whole file caching, open-close consistency.
A proxy-based layering approach is used; Coda takes care of consistency, conflict detection, resolution, etc.
The file server is the final authoritative source.
CASPER allows us to service cache misses faster than might usually be possible.

CASPER Architecture
[Figure: on the client, the Coda client sits behind a DFS proxy; the proxy reaches the Coda file server and recipe server over a WAN connection, and a CAS provider (jukebox) over the LAN]

CASPER Implementation
[Figure: request flow]
1. File request (application to Coda client)
2. Recipe request (DFS proxy to recipe server, over the WAN)
3. Recipe response
4. CAS request (to the jukebox, over the LAN)
5. CAS response
6. Missed block request (to the Coda file server, over the WAN)
7. Missed block response
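The CAS and missed-block steps of this flow can be sketched as follows. Here `jukebox` and `file_server` are hypothetical lookup callables (hash to block, with the jukebox returning None on a miss); verification against the recipe's hashes is essential because the jukebox is untrusted:

```python
import hashlib

def reconstruct(recipe_hashes, jukebox, file_server):
    """Rebuild a file from its recipe: fill blocks from the nearby jukebox,
    fetch the misses from the distant file server, verify everything."""
    blocks = {}
    # Try the LAN-attached CAS provider first.
    for h in recipe_hashes:
        blocks[h] = jukebox(h)
    # Anything the jukebox missed comes from the file server over the WAN.
    for h in recipe_hashes:
        if blocks[h] is None:
            blocks[h] = file_server(h)
    # Jukebox data is untrusted, so check each block against the recipe.
    for h in recipe_hashes:
        assert hashlib.sha1(blocks[h]).hexdigest() == h, "corrupt block"
    return b"".join(blocks[h] for h in recipe_hashes)
```

Because every block is checked against its cryptographic hash, a malicious or stale jukebox can only cause extra WAN fetches, never incorrect file contents.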

Talk Outline
Introduction
The CASPER File System
  Building Blocks: Recipes, Jukeboxes, Recipe Servers
  Architecture
Benchmarks and Performance
Fuzzy Matching
Conclusions

Experimental Setup
[Testbed: the client and jukebox are connected through a NIST Net router to the file server + recipe server]
WAN bandwidth limitations between the server and client were controlled using NIST Net: 10 Mb/s, 1 Mb/s + 10 ms, and 100 Kb/s + 100 ms.
The hit ratio on the jukebox was set to 100%, 66%, 33%, and 0%.
Clients began with a cold cache (no files or recipes).

Benchmarks
Binary Install Benchmark
Virtual Machine Migration
Modified Andrew Benchmark

Benchmark Description
Binary Install (RPM based): installed RPMs for the Mozilla 1.1 browser; 6 RPMs, total size of 13.5 MB.
Virtual Machine Migration: time taken to resume a migrated virtual machine and execute an MS Office-based benchmark; the trace accesses ~1000 files at 256 KB each; no think time modeled.

Benchmark Description (II)
Modified Andrew Benchmark: phases include Create Directory, Copy, Scan Directory, Read All, and Make. However, only the Copy phase will exhibit an improvement.
Uses the Apache 1.3.27 source tree: 11.36 MB, 977 files; 53% of the files are less than 4 KB and 71% are less than 8 KB in size.

Mozilla (RPM) Install
[Chart: runtime normalized against vanilla Coda, at 100 Kb/s, 1 Mb/s, and 10 Mb/s, for jukebox hit ratios of 100%, 66%, 33%, and 0%. Baseline times: 1238 sec, 150 sec, 44 sec.]
The gain is most pronounced at lower bandwidth. Very low overhead was seen for these experiments (between 1-5%).

Virtual Machine Migration
[Chart: runtime normalized against vanilla Coda, at 100 Kb/s, 1 Mb/s, and 10 Mb/s, for jukebox hit ratios of 100%, 66%, 33%, and 0%. Baseline times: 21523 sec, 2046 sec, 203 sec.]
Large amounts of data show benefit even at higher bandwidths. The high overhead seen at 10 Mb/s is an artifact of data buffering.

Andrew Benchmark
[Chart: runtime normalized against vanilla Coda, at 100 Kb/s, 1 Mb/s, and 10 Mb/s, for jukebox hit ratios of 100%, 66%, 33%, and 0%; some bars exceed the chart's range (labeled 1.5, 1.6, 1.8). Baseline times: 1150 sec, 103 sec, 13 sec.]
Only the Copy phase shows benefits. More than half of the files are fetched over the WAN without talking to the jukebox because of their small size.

Commonality
The question of where and how much commonality can be found is still open. However, a number of applications will benefit from this approach, including:
Virtual machine migration
Binary installs and upgrades
Software development
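One rough way to quantify commonality between two versions of a binary or source tree is to compare their block hash sets. Fixed-size blocks are used here for simplicity; the function names are illustrative, not from the paper:

```python
import hashlib

def block_hashes(data: bytes, block_size: int = 4096) -> set:
    """The set of SHA-1 names for the data's fixed-size blocks."""
    return {hashlib.sha1(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)}

def commonality(old: bytes, new: bytes) -> float:
    """Fraction of the new version's distinct blocks already in the old one."""
    new_blocks = block_hashes(new)
    return len(new_blocks & block_hashes(old)) / len(new_blocks)
```

A high value means a jukebox holding the old version can supply most of the new one, which is the situation the following charts measure across releases.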

Mozilla Binary Commonality
[Chart: block-level commonality between Mozilla binary releases mozilla-16 through mozilla-25; y-axis 0-100%]

Linux Kernel Commonality
[Chart: block-level commonality between Linux kernel releases 2.2.0 through 2.2.24; y-axis 0-100%]

Related Work
Delta encoding: rsync, HTTP, etc.
Distributed file systems: NFS, AFS, Coda, etc.
P2P content addressable networks: Chord, Pastry, Freenet, CAN, etc.
Hash-based storage and file systems: Venti, LBFS, Ivy, EMC's Centera, Farsite, etc.

Conclusions
Introduced the concept of recipes.
Demonstrated the benefits of opportunistic use of content providers by traditional distributed file systems on WANs.
Introduced Fuzzy Matching.


Backup Slides

Where did the time go?
For the Andrew benchmark, reconstruction of a large number of small files takes 4 round trips. There is also the overhead of compression, verification, etc. Some parts of the system (CAS requests) can be optimized by performing work in parallel.

Number of Round Trips
[Figure: 1. recipe request/response (DFS proxy to recipe server, over the WAN); 2 & 3. CAS request/response (to the jukebox, over the LAN); 4. missed block request/response (to the Coda file server, over the WAN)]

Absolute Andrew
Andrew Benchmark: Copy Performance in seconds (standard deviation in parentheses)

Jukebox        Network Bandwidth
Hit-Ratio    100 Kb/s        1 Mb/s        10 Mb/s
100%          261.3 (0.5)    40.3 (0.9)    17.3 (1.2)
66%           520.7 (0.5)    64.0 (0.8)    20.0 (1.6)
33%           762.7 (0.5)    85.0 (0.8)    21.3 (0.5)
0%           1069.3 (1.7)   108.7 (0.5)    23.7 (0.5)
Baseline     1150.7 (0.5)   103.3 (0.5)    13.3 (0.5)

NFS Implementation?
The benchmark results would not change significantly (with the possible exception of the virtual machine migration benchmark). It is definitely possible to adopt a similar approach; in fact, an NFS proxy (without CASPER) exists. However, the semantics of such a system are still unclear.

Fuzzy Matching
Question: can we convert an incorrect block into what we need? If there is a block that is near to what is needed, treat the difference as a transmission error and fix it by applying an error-correcting code.
Fuzzy Matching needs three components: an exact hash, a fuzzy hash, and ECC information.

Fuzzy Hashes
[Figure: shingleprints (e.g. 4 7 8 9 2 0 3) are computed over the file data and sorted (0 2 3 4 7 8 9) to form the fuzzy hash]

Fuzzy Hashing and ECCs
A fuzzy hash could simply be a shingle: hash a number of features with a sliding window, and take the first m hashes after sorting to be representative of the data. These m shingles are used to find near blocks. After finding a similar block, an ECC that tolerates a number of changes could be applied to recover the original block.
There is a definite tradeoff between recipe size and Fuzzy Matching, but this approach is promising.
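The shingling step just described might look like the sketch below. The window size and m are illustrative parameters, not the paper's values, and a real system would pair this with ECC recovery:

```python
import hashlib

def fuzzy_hash(data: bytes, window: int = 8, m: int = 4) -> tuple:
    """Shingle-based fuzzy hash: hash every `window`-byte sliding window
    (the shingleprints), sort, and keep the first m as the fingerprint."""
    shingles = {hashlib.sha1(data[i:i + window]).hexdigest()
                for i in range(len(data) - window + 1)}
    return tuple(sorted(shingles)[:m])

def similar(a: bytes, b: bytes) -> bool:
    """Two blocks are 'near' if their fuzzy hashes share any shingles."""
    return len(set(fuzzy_hash(a)) & set(fuzzy_hash(b))) > 0
```

Because a small edit changes only the shingles whose windows overlap it, mostly-identical blocks tend to keep the same smallest-m shingles and therefore collide under the fuzzy hash, which is exactly what makes them candidates for ECC repair.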

Linux Kernel Commonality - 2.4
[Chart: block-level commonality between Linux kernel releases 2.4.0 through 2.4.20; y-axis 0-100%]

Linux Kernel 2.2 (B/W)
[Chart: black-and-white rendering of the Linux kernel 2.2 commonality chart]