Simplifying Wide-Area Application Development with WheelFS

Simplifying Wide-Area Application Development with WheelFS. Jeremy Stribling, in collaboration with Jinyang Li, Frans Kaashoek, and Robert Morris. MIT CSAIL & New York University

Resources Spread over Wide-Area Net PlanetLab Google datacenters China grid 2

Grid Computations Share Data. Nodes in a distributed computation share: program binaries, initial input data, and processed output from one node used as intermediate input to another node. 3

So Do Users and Distributed Apps Apps aggregate disk/computing at hundreds of nodes Example apps Content distribution networks (CDNs) Backends for web services Distributed digital research library All applications need distributed storage 4

State of the Art in Wide-Area Storage. Existing wide-area file systems are inadequate: they are designed only to store files for users. E.g., no support for hundreds of nodes writing files to the same dir; e.g., strong consistency at the cost of availability. So each app builds its own storage! Distributed Hash Tables (DHTs), ftp, scp, wget, etc. 5

Current Solutions (diagram: testbed/Grid nodes copy a file foo through a central file server). Usual drawbacks: all data flows through one node, and file systems are too transparent, so they mask failures and incur long delays. 6

If We Had a Good Wide-Area FS? (diagram: CDN nodes write /a/foo and /a/bar into a shared wide-area FS, and other nodes read the same files back.) An FS makes building apps simpler. 7

Why Is It Hard? Apps care a lot about performance WAN is often the bottleneck High latency, low bandwidth Transient failures How to give app control without sacrificing ease of programmability? 8

Our Contribution: WheelFS Suitable for wide-area apps Gives app control through cues over: Consistency vs. availability tradeoffs Data placement Timing, reliability, etc. Prototype implementation 9

Talk Outline Challenges & our approach Basic design Application control Running a Grid computation over WheelFS 10

What Does a File System Buy You? Re-use existing software Simplify the construction of new applications A hierarchical namespace A familiar interface Language-independent usage 11

Why Is It Hard To Build a FS on a WAN? High latency, low bandwidth: 100s of ms of latency instead of a few ms, and 10s of Mbps of bandwidth instead of 1000s of Mbps. Transient failures are common: 32 outages over 64 hours across 132 paths [Andersen 01]. 12

What If a Grid Uses AFS over the WAN? (diagram: several Grid nodes acting as AFS clients fetch a.dat from a single AFS server.) A client blocks forever under failure despite an available cached copy, and there is potentially unnecessary data transfer. 13

Design Challenges High latency Store data close to where it is needed Low wide-area bandwidth Avoid wide-area communication if possible Transient failures are common Cannot block all access during partial failures Only applications have the needed information! 14

WheelFS Gives Apps Control. Goals: can apps control how to handle failures? Can apps control data placement? AFS, NFS, and GFS offer total network transparency and give apps neither kind of control; WheelFS gives applications control over both. 15

WheelFS: Main Ideas. App control: apps embed semantic cues to inform the FS about failure handling, data placement, etc. Good default policy: write locally, strict consistency. 16

Talk Outline Challenges & our approach Basic design Application control Running a Grid computation over WheelFS 17

File Systems 101. Basic FS operations: name resolution maps a hierarchical name to a flat id, e.g., open("/wfs/a/foo", ...) -> id 1235; data operations read/write file data, e.g., read(1235, ...), write(1235, ...); namespace operations add/remove files or dirs, e.g., mkdir("/wfs/b", ...). 18
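
As a concrete illustration, here is a minimal C sketch of the three kinds of operations against a WheelFS client mounted at /wfs. The paths /wfs/a/foo and /wfs/b and the id 1235 come from the slide; everything else (flags, buffer sizes, contents) is illustrative.

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        /* Namespace operation: add a directory. */
        mkdir("/wfs/b", 0755);

        /* Name resolution: open() maps the hierarchical name to a flat id
           (e.g., 1235) inside the FS and returns a file descriptor. */
        int fd = open("/wfs/a/foo", O_RDWR | O_CREAT, 0644);
        if (fd < 0)
            return 1;

        /* Data operations: read/write file data through the descriptor. */
        char buf[16];
        write(fd, "hello", 5);
        lseek(fd, 0, SEEK_SET);
        read(fd, buf, sizeof(buf));

        close(fd);
        return 0;
    }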

File Systems 101 (diagram of the namespace as a tree of ids: the root directory, id 0, maps a -> 246 and b -> 357; directory 246 maps file2 -> 468 and file3 -> 579; directory 357 maps file1 -> 135, which is a plain file). 19

Distribute a FS across nodes (diagram: the directories and files from the previous slide are spread over multiple nodes). Must automatically configure node membership. Must locate files/dirs using ids. Must keep files versioned to distinguish new data from old. 20

Basic Design of WheelFS (diagram: nodes 076, 135, 150, 257, 402, 554, and 653 store files identified by flat ids; file 135 exists in versions v2 and v3 on different nodes; consistency servers track the configuration). 21

Default Behavior: Write Locally, Strict Consistency Write locally: Store newly created files at the writing node Writes are fast Transfer data lazily for reads when necessary Strict consistency: data behaves as in a local FS Once new data is written and the file is closed, the next open will see the new data 22
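
A minimal C sketch of what this default guarantee means to an application, assuming the two functions run on different nodes (the file name and data are illustrative; the main() at the end just runs both locally for completeness):

    #include <fcntl.h>
    #include <unistd.h>

    /* Writer (node A): the newly created file is stored at the writing node. */
    static void writer(void) {
        int fd = open("/wfs/a/foo", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        write(fd, "new data", 8);
        close(fd);          /* after close, the new data is visible to others */
    }

    /* Reader (node B, after the writer's close): under the default strict
       consistency, this open() sees the new data; the bytes are transferred
       lazily from node A only when they are actually read. */
    static void reader(void) {
        char buf[16];
        int fd = open("/wfs/a/foo", O_RDONLY);
        read(fd, buf, sizeof(buf));
        close(fd);
    }

    int main(void) { writer(); reader(); return 0; }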

Write Locally (diagram: to create foo/bar, the writing node 1. chooses an id for the new file, here 550; 2. creates the entry bar = 550 in directory 209 (foo); 3. writes file 550 locally. A node that later reads foo/bar fetches file 550 from the writer). 23

Strict Consistency. All writes go to the same node; all reads do too (diagram: every write of file 135 and every read of file 135 is directed to the node responsible for it, which holds the current version). 24

Talk Outline Challenges & our approach Basic design Application control Running a Grid computation over WheelFS 25

WheelFS Gives Apps Control with Cues. Apps want to control consistency, data placement, etc. How? Embed cues in path names: a flexible and minimal interface change. Coarse-grained: /wfs/cache/.cue/a/b/ applies the cue recursively over an entire subtree of files. Fine-grained: /wfs/cache/a/b/.cue/foo applies the cue to a single file. 26
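
A minimal C sketch of the two granularities. Here .EventualConsistency stands in for whichever cue an application wants (the cue vocabulary is listed on the next slide); the path components a, b, and foo are illustrative.

    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        /* Coarse-grained: the cue applies recursively to everything
           accessed under /wfs/cache/a/b/ through this path. */
        int fd1 = open("/wfs/cache/.EventualConsistency/a/b/foo", O_RDONLY);

        /* Fine-grained: the same cue applied to this single file only. */
        int fd2 = open("/wfs/cache/a/b/.EventualConsistency/foo", O_RDONLY);

        if (fd1 >= 0) close(fd1);
        if (fd2 >= 0) close(fd2);
        return 0;
    }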

Eventual Consistency: Reads. Read any version of the file you can find, within a given time limit (diagram: a node asking for file 135 may read a cached copy from another node instead of the node responsible for it). 27

Eventual Consistency: Writes. Write to any replica of the file (diagram: different nodes write to different replicas of file 135, each producing a new version). 28

Handle Read Hotspots (diagram: a node reading file 135 1. contacts the node responsible for the file, 2. receives a list of nodes that hold cached copies or chunks, and 3. gets chunks of the file from those caches). 29

WheelFS Cues (name and purpose). Control consistency: .EventualConsistency controls whether reads must see fresh data and whether writes must be serialized; .MaxTime= specifies a time limit for operations. Large reads: .HotSpot declares that a file will be read simultaneously by many nodes, so p2p caching should be used. Hint about data placement: .Site= hints which node or group of nodes a file should be stored on. Other types of cues: durability, system information, etc. 30

Example Use of Cues: Content Distribution Networks. CDNs prefer availability over consistency (diagram: several Apache caching proxies, each running on a WheelFS node, share a cache directory; the CDN blocks under failure with the default strict consistency). Proxy logic: if $url exists in the cache dir, read $url from WheelFS; else get the page from the web server and store it in WheelFS. One-line change in the Apache config file: /wfs/cache/$url 31

Example Use of Cues: CDN. The Apache proxy handles potentially stale files well: the freshness of cached web pages can be determined from saved HTTP headers. Cache dir: /wfs/cache/.EventualConsistency/.HotSpot. The .EventualConsistency cue tells WheelFS to read a cached file even when the corresponding file server cannot be contacted, and to write file data anywhere even when that server cannot be contacted; the .HotSpot cue tells WheelFS to read data from the nearest client cache it can find. 32
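
A minimal C sketch of the proxy-side logic from the previous two slides, using the cue-laden cache directory above. fetch_from_origin() is a hypothetical helper standing in for the proxy's normal upstream fetch, and naming cached pages by a url_key under the cache dir is an assumption for illustration.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical helper: fetch the page from the origin web server. */
    extern ssize_t fetch_from_origin(const char *url_key, char *buf, size_t len);

    /* Cache dir with cues: eventual consistency + p2p (HotSpot) caching. */
    #define CACHE_DIR "/wfs/cache/.EventualConsistency/.HotSpot"

    ssize_t get_page(const char *url_key, char *buf, size_t len) {
        char path[1024];
        snprintf(path, sizeof(path), "%s/%s", CACHE_DIR, url_key);

        int fd = open(path, O_RDONLY);
        if (fd >= 0) {                      /* cache hit: read from WheelFS */
            ssize_t n = read(fd, buf, len);
            close(fd);
            return n;
        }

        ssize_t n = fetch_from_origin(url_key, buf, len);  /* cache miss */
        if (n > 0) {
            fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd >= 0) {                  /* store page for other proxies */
                write(fd, buf, (size_t)n);
                close(fd);
            }
        }
        return n;
    }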

Example Use of Cues: BLAST Grid Computation. BLAST is a DNA alignment tool run on Grids: copy separate DB portions and queries to many nodes, run separate computations, and later fetch and combine the results. Read the binary using .HotSpot; write output using .EventualConsistency. 33
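
A minimal C sketch of how a worker node might name its files for this workflow. Only the two cues come from the slide; the /wfs/blast layout, file names, and the use of the process id to keep outputs distinct are illustrative assumptions.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        char buf[4096];
        ssize_t n;

        /* 1. Stage the shared binary to local disk. Many workers read it at
              once, so the .HotSpot cue enables p2p chunk caching. */
        int in  = open("/wfs/blast/.HotSpot/blastall", O_RDONLY);
        int tmp = open("/tmp/blastall", O_WRONLY | O_CREAT | O_TRUNC, 0755);
        if (in < 0 || tmp < 0)
            return 1;
        while ((n = read(in, buf, sizeof(buf))) > 0)
            write(tmp, buf, (size_t)n);
        close(in);
        close(tmp);

        /* 2. Write this worker's result where the combiner fetches it later;
              .EventualConsistency keeps the write available despite failures. */
        char outpath[256];
        snprintf(outpath, sizeof(outpath),
                 "/wfs/blast/.EventualConsistency/out/result-%d", (int)getpid());
        int out = open(outpath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (out < 0)
            return 1;
        write(out, "alignment results\n", 18);
        close(out);
        return 0;
    }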

Talk Outline Challenges & our approach Basic design Application control Running a Grid computation over WheelFS 34

Experiment Setup Up to 16 nodes run WheelFS on Emulab 100Mbps access links 12ms delay 3 GHz CPUs nr protein database (673 MB), 16 partitions 19 queries sent to all nodes 35

BLAST Achieves Near-Ideal Speedup on WheelFS (plot: speedup vs. number of nodes, 1 to 16; the BLAST-on-WheelFS curve closely tracks the ideal-speedup line). 36

Related Work. Cluster FS: Farsite, GFS, xFS, Ceph. Wide-area FS: JetFile, CFS, Shark. Grid: LegionFS, GridFTP, IBP. POSIX I/O High Performance Computing Extensions. 37

Conclusion A WAN FS simplifies app construction FS must let app control data placement & consistency WheelFS exposes such control via cues Building apps is easy with WheelFS 38