GlobalFS: A Strongly Consistent Multi-Site Filesystem

Similar documents
Architekturen für die Cloud

Changing Requirements for Distributed File Systems in Cloud Storage

Global Data Plane. The Cloud is not enough: Saving IoT from the Cloud & Toward a Global Data Infrastructure PRESENTED BY MEGHNA BAIJAL

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

Basic vs. Reliable Multicast

Distributed File Systems II

Replication in Distributed Systems

Clouds are complex so they fail. Cloud File System Security and Dependability with SafeCloud-FS

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

Large-Scale Key-Value Stores Eventual Consistency Marco Serafini

A Global In-memory Data System for MySQL Daniel Austin, PayPal Technical Staff

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013

SWIFTCLOUD: GEO-REPLICATION RIGHT TO THE EDGE

SCALABLE CONSISTENCY AND TRANSACTION MODELS

Map-Reduce. Marco Mura 2010 March, 31th

Trade- Offs in Cloud Storage Architecture. Stefan Tai

Consistency and Replication. Some slides are from Prof. Jalal Y. Kawash at Univ. of Calgary

DIVING IN: INSIDE THE DATA CENTER

Google is Really Different.

Designing Fault-Tolerant Applications

EECS 498 Introduction to Distributed Systems

Lecture 6 Consistency and Replication

CA485 Ray Walshe Google File System

an Object-Based File System for Large-Scale Federated IT Infrastructures

Distributed Systems. GFS / HDFS / Spanner

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Building Consistent Transactions with Inconsistent Replication

A Journey to DynamoDB

Migrating Oracle Databases To Cassandra

Computing Parable. The Archery Teacher. Courtesy: S. Keshav, U. Waterloo. Computer Science. Lecture 16, page 1

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

11. Replication. Motivation

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Dynamo: Amazon s Highly Available Key-Value Store

Finding a Needle in a Haystack. Facebook s Photo Storage Jack Hartner

Building Consistent Transactions with Inconsistent Replication

Handling Big Data an overview of mass storage technologies

Dynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat

Intra-cluster Replication for Apache Kafka. Jun Rao

DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun

The Google File System

Eventual Consistency 1

Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, et al. 6.S897

The Google File System. Alexandru Costan

Outline. INF3190:Distributed Systems - Examples. Last week: Definitions Transparencies Challenges&pitfalls Architecturalstyles

Towards Transparent Integration of Heterogeneous Cloud Storage Platforms

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

DISTRIBUTED SYSTEMS [COMP9243] Lecture 8a: Cloud Computing WHAT IS CLOUD COMPUTING? 2. Slide 3. Slide 1. Why is it called Cloud?

CAP Theorem, BASE & DynamoDB

Cloud Storage with AWS: EFS vs EBS vs S3 AHMAD KARAWASH

CS 655 Advanced Topics in Distributed Systems

10. Replication. Motivation

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures

Chapter 24 NOSQL Databases and Big Data Storage Systems

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering

HPSS Treefrog Introduction.

Replication and Consistency. Fall 2010 Jussi Kangasharju

Cloud Computing & Visualization

BERLIN. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Distributed Filesystem

CS6450: Distributed Systems Lecture 11. Ryan Stutsman

CIT 668: System Architecture. Amazon Web Services

Extreme Computing. NoSQL.

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Riak. Distributed, replicated, highly available

Applications of Paxos Algorithm

DISTRIBUTED COMPUTER SYSTEMS

Consistency & Replication

Replication. Feb 10, 2016 CPSC 416

Evaluating Cloud Storage Strategies. James Bottomley; CTO, Server Virtualization

X X C 1. Recap. CSE 486/586 Distributed Systems Gossiping. Eager vs. Lazy Replication. Recall: Passive Replication. Fault-Tolerance and Scalability

Cloud & AWS Essentials Agenda. Introduction What is the cloud? DevOps approach Basic AWS overview. VPC EC2 and EBS S3 RDS.

Azure Cosmos DB Technical Deep Dive

/ Cloud Computing. Recitation 8 March 1 st, 2016

ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS

TAPIR. By Irene Zhang, Naveen Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan Ports Presented by Todd Charlton

A BigData Tour HDFS, Ceph and MapReduce

CSAL: A CLOUD STORAGE ABSTRACTION LAYER TO ENABLE PORTABLE CLOUD APPLICATIONS

What is a distributed system?

Achieving the Potential of a Fully Distributed Storage System

Chapter 4: Distributed Systems: Replication and Consistency. Fall 2013 Jussi Kangasharju

Replication and Consistency

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016

Eventual Consistency Today: Limitations, Extensions and Beyond

At Course Completion Prepares you as per certification requirements for AWS Developer Associate.

CS5412: DIVING IN: INSIDE THE DATA CENTER

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

Enhancing Throughput of

FOR A WALL STREET INVESTMENT BANK JOSH WEST SOLUTIONS ARCHITECT RED HAT FINANCIAL SERVICES

Final Exam Logistics. CS 133: Databases. Goals for Today. Some References Used. Final exam take-home. Same resources as midterm

Exploring Amazon RDS MySQL Second Tier Read Replica

A Design for Networked Flash

CS6450: Distributed Systems Lecture 15. Ryan Stutsman

AWS 101. Patrick Pierson, IonChannel

CAP Theorem. March 26, Thanks to Arvind K., Dong W., and Mihir N. for slides.

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

Transcription:

GlobalFS: A Strongly Consistent Multi-Site Filesystem Leandro Pacheco Raluca Halalai Valerio Schiavoni Fernando Pedone Etienne Rivière Pascal Felber RainbowFS Workshop May 3rd, 2017

Distributed applications GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 2

Distributed applications GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 2

Distributed applications GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 2

Distributed applications? GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 2

Distributed applications Distributed Storage GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 2

Distributed applications Distributed Storage SQL Databases NoSQL Databases Key-value storage Coordination Systems Caches File Systems GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 2

Distributed applications Distributed Storage SQL Databases NoSQL Databases Key-value storage Coordination Systems Caches File Systems Easy interoperability for existing aplications GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 2

Global infrastructure Amazon s AWS global infrastructure GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 3

CAP theorem Weak Consistency Lower latency Higher availability Possibly incorrect/unexpected results Strong Consistency Clear semantics and guarantees Easier to reason about Block instead of providing incorrect results GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 4

What is GlobalFS? Geographically distributed filesystem Familiar interface (POSIX) Strong consistency Fault-tolerance through replication Flexible performance through locality GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 5

Overall design Separate data and metadata Partial replication Metadata protocol exploiting atomic multicast Causal reads GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 6

Separate data and metadata Immutable data Variable sized blobs Metadata Controls file contents, properties and filesystem structure Metadata refers to data blobs 1 2 3 4 GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 7

Partial replication Immutable data is simple to replicate consistently Metadata is partitioned between replica groups (i.e., partitions) GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 8

Partial replication US EU SA GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 9

Partial replication US EU / www bin etc home SA alice bob mark GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 10

Partial replication US EU / www bin etc home SA alice bob mark US SA EU GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 11

Partial replication US EU Global Replication / www bin etc home SA alice bob mark US SA EU GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 12

Partial replication US EU Global Replication / www bin etc home SA alice bob mark Local US multicast SA EU - fast updates - local or remote reads GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 13

US Partial replication Global multicast (global replication) - costly updates - Global fast local Replication reads / EU www bin etc home SA alice bob mark Local US multicast SA EU - fast updates - local or remote reads GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 14

Partial ordering GlobalFS exploits atomic multicast Atomic delivery to groups of processes Partial ordering: messages for different groups don t have to be ordered betweem themselves Partial ordering is critical for scalability GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 15

Architecture Metadata replicas Send read or update commands Atomic multicast Application Client (FUSE) Data store Insert or fetch immutable data GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 16

Consistent update operations Step 1 Write data blobs to data store Step 2 Issue a metadata update GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 17

Consistent update operations Step 1 Write data blobs to data store Step 2 Issue a metadata update Req Single-partition Reply Req Uncoordinated multi-partition Reply Req Coordinated multi-partition Reply G 1 G 1 G 1 G 2 G 2 G 2 write to file in G 1 write to file in {G 1,G 2 } move file from G 1 to G 2 Atomic Multicast Execution GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 17

Causal read operations Causally related updates are seen in the same order e.g., operations done by the same client GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 18

Causal read operations Causally related updates are seen in the same order e.g., operations done by the same client Client A Creates an image cat.jpg Modifies a page pets.html to include the image cat.jpg GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 18

Causal read operations Causally related updates are seen in the same order e.g., operations done by the same client Client A Creates an image cat.jpg Modifies a page pets.html to include the image cat.jpg Client B Opens the pets.html page and finds a broken image reference Where is the cat? GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 18

Causal read operations Step 1 Contact a metadata replica for a list of blob ids Step 2 Get the data from the data store Approach inspired by vector clocks Vector is composed of one counter per replica group GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 19

Evaluation Complete prototype in Java https://github.com/pacheco/globalfs Filesystem in Userspace (FUSE) URingPaxos for atomic multicast Global deployment using Amazon EC2 GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 20

Maximum throughput by operation GlobalFS throughput Operations/sec 60000 50000 40000 30000 20000 10000 0 read 1KB GlobalFS CalvinFS Locality local write 1KB local create 1KB 1800 1600 1400 1200 1000 800 600 400 200 0 glob. write 1KB glob. create 1KB 3 region deployment US west, US east and Europe GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 21

Geographical scalability 1 Region 3 Regions 6 Regions 9 Regions Geographical Scalability 1 0.8 0.6 0.4 0.2 16081 ops 6882 ops 3072 ops Ideal read 1KB create write 1KB Normalized throughput per region as more regions are added 9 regions uses all EC2 regions available at the time GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 22

GlobalFS: Summary Strong consistency at global scale Simple and familiar API (POSIX) Flexible performance through partial replication and locality Cheap causal read operations GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 23

GlobalFS: Summary Strong consistency at global scale Simple and familiar API (POSIX) Flexible performance through partial replication and locality Cheap causal read operations Thank you! Leandro Pacheco pachecol@usi.ch GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 23