CGAR: Strong Consistency without Synchronous Replication. Seo Jin Park Advised by: John Ousterhout


1 CGAR: Strong Consistency without Synchronous Replication Seo Jin Park Advised by: John Ousterhout

2 CGAR improves the update performance of storage systems that use master-backup replication.
- Fast: updates complete before replication to backups
- Safe: saved RPC requests are retried if the master crashes
Two variants:
- CGAR-C: saves RPC requests in the client library
- CGAR-W: saves RPC requests in a separate server (a witness)
Performance result overview:
- RAMCloud: 0.5x latency, 4x throughput
- Redis: strongly consistent (cost: 12% latency increase)
Slide 2

3 CGAR's Role in the Platform Lab
[Diagram: the Granular Computing Platform stack, with Cluster Scheduling, Low-Latency RPC, Scalable Notifications, Thread/App Mgmt, Hardware Accelerators, and Low-Latency Storage, where CGAR sits]
Slide 3

4 Consistency in Master-Backup Replication
Master-backup replication: clients send updates to a master, and the master replicates state to backups.
Consistency after a crash:
- Responses for update operations must wait for replication to backups (synchronous replication)
- Must not reveal a non-replicated value
[Diagram: client sends "write x = 1"; the master stores X: 1, replicates it to the backups, and only then replies ok]
Slide 4

5 Waiting for Replication is Not Cheap
Synchronous replication increases the latency of updates.
Alternative: asynchronous replication
- Non-replicated data can be lost: sacrifices consistency if the master crashes
- Enables batched replication (more efficient)
[Timing diagram: processing time for a RAMCloud WRITE operation, broken into 4 µs, 3 µs, and 8 µs segments; an asynchronous update completes in 7 µs]
Slide 5

6 Consistency over Performance: RAMCloud
RAMCloud uses synchronous replication, so it stays consistent even after a crash.
- Write: 14.3 µs vs. read: 5 µs
- Focused on minimizing latency while remaining consistent
- Polling wait for replication ⇒ write throughput is only 18% of read throughput
[Diagram: client sends Write; the master appends to its durable log on the backups before replying Ok]
Slide 6

7 Performance over Consistency: Redis
Redis uses asynchronous replication to a log file on disk (the corresponding redis.conf settings are sketched below).
- Default: fsync every second; loses data if a master crashes
- Option for strong consistency: fsync-always, which adds a 1~2 ms delay on SSDs (without fsync, a SET takes 25 µs)
[Diagram: client sends SET to the server's memory; the server replies Ok and fsyncs the log file to disk asynchronously]
Can we have both consistency and performance?
Slide 7
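For reference, these two durability modes map onto Redis's standard append-only-file (AOF) configuration directives. A minimal redis.conf sketch; the directives are real Redis options, while the latency figures in the comments come from the slide, not from this config:

```
# redis.conf - append-only file (AOF) persistence
appendonly yes        # log every write command to the AOF

# Default mode: fsync the AOF once per second.
# Fast, but a master crash can lose up to a second of writes.
appendfsync everysec

# Strong-consistency mode: fsync after every write.
# On SSDs this adds roughly 1-2 ms to each SET.
# appendfsync always
```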

8 Consistency-Guaranteed Asynchronous Replication
Asynchronous replication → performance.
For consistency:
- Save RPC requests in a third-party server (a witness)
- Replay the RPCs saved in the witness if the master crashes
(A client-side sketch of this update path follows.)
[Diagram: client sends the RPC to both the witness and the master; after a crash, the witness's saved RPCs drive recovery]
Slide 8
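A minimal sketch of the CGAR-W client-side update path, assuming hypothetical method names (`master.write`, `witness.record`, `master.wait_for_replication`) rather than the actual RAMCloud API:

```python
def cgar_write(master, witness, rpc_id, key, value):
    """CGAR-W client update: completes without waiting for backup
    replication, provided the witness has saved the request."""
    # In a real client the two RPCs are multicast in parallel;
    # they are shown sequentially here for clarity.
    master_ok = master.write(rpc_id, key, value)     # executes; replicates async
    witness_ok = witness.record(rpc_id, key, value)  # saves request for replay

    if master_ok and witness_ok:
        return  # safe: if the master crashes, the witness replays this RPC
    # Witness rejected (e.g., it already holds a record for this key):
    # fall back to waiting until the master has replicated to backups.
    master.wait_for_replication(rpc_id)
```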

9 Witness Record Operation
The client multicasts each RPC request to the master and the witness.
The witness vouches that the RPC will be retried if the master crashes (one possible record-keeping scheme is sketched below).
[Diagram: client multicasts "write x = 1" to the witness (an 8 MB buffer) and the master; the master applies X: 1 and replicates asynchronously while the backups still hold X: 0]
Slide 9
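A sketch of how a witness could enforce the one-record-per-key rule described on the following slides. The dictionary-based table is an assumption; the slide only says the witness is a small (8 MB) buffer:

```python
class Witness:
    """Witness sketch: keeps at most one saved RPC per key, so the saved
    RPCs always touch distinct keys and replay order cannot matter."""

    def __init__(self):
        self.records = {}  # key -> (rpc_id, request)

    def record(self, rpc_id, key, request):
        # Reject a second RPC on a key we already hold: accepting it could
        # make replay disagree with the order the master executed.
        if key in self.records:
            return False  # caller must fall back to synchronous replication
        self.records[key] = (rpc_id, request)
        return True

    def drop(self, rpc_id):
        # Garbage collection: the master reports that this RPC is durably
        # replicated, so it will never need to be replayed.
        self.records = {k: v for k, v in self.records.items()
                        if v[0] != rpc_id}
```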

10 Recovery Steps of CGAR-W
Step 1: recover from backups.
Step 2: retry the update RPCs saved in the witness (sketched below).
[Diagram: the crashed master held X: 1, Y: 7, but only X: 0, Y: 7 had been replicated; the new master recovers X: 0, Y: 7 from the backups, then the witness retries "write x = 1", restoring X: 1]
Slide 10
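The two recovery steps could look like the following sketch. `load_from` and `execute` are hypothetical; `witness_records` is the key -> (rpc_id, request) table from the Witness sketch above:

```python
def recover_master(backups, witness_records, new_master):
    """CGAR-W recovery sketch: restore replicated state, then replay."""
    # Step 1: rebuild state from the backups' durable log. This restores
    # the last replicated values (e.g., X: 0, Y: 7 in the example above).
    new_master.load_from(backups)

    # Step 2: retry every RPC the witness saved. The witness holds at most
    # one record per key, so replay order does not matter; RIFL (next
    # slide) filters out RPCs the crashed master had already completed.
    for key, (rpc_id, request) in witness_records.items():
        new_master.execute(rpc_id, request)  # e.g., re-applies "write x = 1"
```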

11 Challenges in Using a Witness for Recovery
The witness may receive RPCs in a different order than the master.
- Solution: the witness saves only one record per key; for concurrent operations on the same key, the witness rejects all but the first.
A retry may re-execute an RPC.
- Solution: use RIFL to ignore already-completed RPCs (a minimal filter is sketched below).
An update may depend on an unreplicated value in the master, and the master cannot assume the witness saved the RPC request.
- Solution: delay the update if the current value is not yet replicated.
Slide 11
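The re-execution problem can be handled with a RIFL-style at-most-once filter. A minimal sketch; real RIFL also makes results durable and garbage-collects them via client leases, which is omitted here:

```python
class AtMostOnceFilter:
    """Each RPC carries a unique id; completed ids and their results are
    remembered so a replayed RPC returns the original result instead of
    executing twice."""

    def __init__(self):
        self.completed = {}  # rpc_id -> saved result

    def execute(self, rpc_id, operation):
        if rpc_id in self.completed:
            # Retry (e.g., a replay from a witness after a crash):
            # do not re-execute, just return the saved result.
            return self.completed[rpc_id]
        result = operation()  # run the actual update exactly once
        self.completed[rpc_id] = result
        return result
```

Replaying an increment RPC, for example, then returns the first execution's result instead of incrementing the counter a second time.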

12 Example: RPCs in a Different Order
[Diagram: Client Red issues "write x = 2" while Client Blue issues "write x = 1" on the same key; the witness and the master may receive the two requests in different orders]
Slide 12

13 Example: RPCs in a Different Order
[Diagram, continued: the witness saved only one of the two conflicting writes; Client Red, whose request the witness did not save, must wait for replication, while Client Blue can complete as soon as the master returns ok]
Slide 13

14 Garbage Collection
The witness must drop a record before accepting a new one with the same key (see the sketch below).
[Diagram: after "write x = 1" is replicated asynchronously, the master tells the witness to drop the saved "write x = 1", using the RPC ids assigned by RIFL; only then can the witness accept the client's "write x = 2"]
Slide 14
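One way the master could drive this garbage collection, written against the Witness class sketched earlier. The notification mechanism is an assumption; the slide only says records are named by RIFL-assigned RPC ids:

```python
def on_replication_complete(witness, replicated_rpc_ids):
    """Once updates are durable on the backups, their witness records are
    no longer needed for recovery and can be dropped."""
    for rpc_id in replicated_rpc_ids:
        witness.drop(rpc_id)  # frees the key's slot in the witness
    # Only after "write x = 1" is dropped may the witness accept
    # "write x = 2": dropping the record any earlier would let a crash
    # lose x = 1, and keeping both would make replay order ambiguous.
```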

15 Using Multiple Witnesses
A system can use multiple witnesses per master for higher availability (recovery can use any witness).
To complete an update asynchronously, all witnesses must accept it (sketched below).
[Diagram: the client multicasts "write x = 1" to all witnesses and the master]
Slide 15
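The all-witnesses-must-accept rule extends the earlier client sketch naturally, again with hypothetical method names:

```python
def cgar_write_multi(master, witnesses, rpc_id, key, value):
    """Multi-witness CGAR-W update: recovery may use ANY single witness,
    so async completion is safe only if EVERY witness saved the RPC."""
    master_ok = master.write(rpc_id, key, value)
    # Shown sequentially; a real client multicasts to all witnesses.
    all_saved = all(w.record(rpc_id, key, value) for w in witnesses)

    if master_ok and all_saved:
        return  # any surviving witness can replay this RPC after a crash
    # If any witness rejected, recovery through that witness would miss
    # this update, so wait for synchronous replication instead.
    master.wait_for_replication(rpc_id)
```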

16 Evaluation of CGAR
- RAMCloud implementation: performance improvement, latency reduction
- Redis implementation: supports a wide range of operations
Slide 16

17 RAMCloud's Latency after CGAR
Writes are issued sequentially by a client to a master.
[Latency plot: median write latency drops from 14.3 μs to 6.6 μs and 7.1 μs with CGAR]
Slide 17

18 RAMCloud's Throughput after CGAR
Batching replication improved throughput.
Slide 18

19 Making Redis Consistent with Small Cost
Operations covered:
- SET: write to the key-value store
- HMSET: write a member of a hash map
- INCR: increment an integer counter
Slide 19

20 Conclusion
- Fast: updates don't wait for replication
- Consistent: CGAR saves RPC requests in a witness; if the server crashes, the saved RPCs are retried to recover
- High throughput: replication can be batched
Slide 20

21 Questions Slide 21

22 Latency under Skewed Workloads
YCSB-A: Zipfian distribution (1M objects, p = 0.99)
Slide 25

23 CGAR Decoupled Replication from Update
[Plot: effect of delaying replication RPCs on update completion time]
Slide 26
