RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University

Similar documents
RAMCloud: Scalable High-Performance Storage Entirely in DRAM John Ousterhout Stanford University

RAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University

Low-Latency Datacenters. John Ousterhout Platform Lab Retreat May 29, 2015

At-most-once Algorithm for Linearizable RPC in Distributed Systems

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout

CGAR: Strong Consistency without Synchronous Replication. Seo Jin Park Advised by: John Ousterhout

Indexing in RAMCloud. Ankita Kejriwal, Ashish Gupta, Arjun Gopalan, John Ousterhout. Stanford University

A Brief Introduction to Key-Value Stores

RAMCloud. Scalable High-Performance Storage Entirely in DRAM. by John Ousterhout et al. Stanford University. presented by Slavik Derevyanko

Low-Latency Datacenters. John Ousterhout

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout

FaRM: Fast Remote Memory

RAMCube: Exploiting Network Proximity for RAM-Based Key-Value Store

Log-structured Memory for DRAM-based Storage

CS3600 SYSTEMS AND NETWORKS

Data Analytics on RAMCloud

Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, et al. 6.S897

contributed articles With scalable high-performance storage entirely in DRAM, RAMCloud will enable a new breed of data-intensive applications.

18-hdfs-gfs.txt Thu Oct 27 10:05: Notes on Parallel File Systems: HDFS & GFS , Fall 2011 Carnegie Mellon University Randal E.

Rocksteady: Fast Migration for Low-Latency In-memory Storage. Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman

FLAT DATACENTER STORAGE CHANDNI MODI (FN8692)

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information

Memshare: a Dynamic Multi-tenant Key-value Cache

The Google File System

Scalable Low-Latency Indexes for a Key-Value Store Ankita Kejriwal

Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong

18-hdfs-gfs.txt Thu Nov 01 09:53: Notes on Parallel File Systems: HDFS & GFS , Fall 2012 Carnegie Mellon University Randal E.

What s Wrong with the Operating System Interface? Collin Lee and John Ousterhout

4 Myths about in-memory databases busted

Making RAMCloud Writes Even Faster

CS 318 Principles of Operating Systems

Distributed File Systems II

BlueDBM: An Appliance for Big Data Analytics*

Ambry: LinkedIn s Scalable Geo- Distributed Object Store

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Chapter 11: Implementing File Systems

Copysets: Reducing the Frequency of Data Loss in Cloud Storage

CPS 512 midterm exam #1, 10/7/2016

CS 318 Principles of Operating Systems

Chapter 11: Implementing File

Chapter 11: Implementing File Systems. Operating System Concepts 9 9h Edition

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

CLOUD-SCALE FILE SYSTEMS

CS 655 Advanced Topics in Distributed Systems

Scaling Distributed Machine Learning with the Parameter Server

ChunkStash: Speeding Up Storage Deduplication using Flash Memory

Lightning Talks. Platform Lab Students Stanford University

IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 12, 2016 ISSN (online):

Asynchronous Logging and Fast Recovery for a Large-Scale Distributed In-Memory Storage

Scaling Without Sharding. Baron Schwartz Percona Inc Surge 2010

OPERATING SYSTEM. Chapter 12: File System Implementation

NPTEL Course Jan K. Gopinath Indian Institute of Science

Week 12: File System Implementation

Exploiting Commutativity For Practical Fast Replication

IBM FlashSystem. IBM FLiP Tool Wie viel schneller kann Ihr IBM i Power Server mit IBM FlashSystem 900 / V9000 Storage sein?

An Implementation of the Homa Transport Protocol in RAMCloud. Yilong Li, Behnam Montazeri, John Ousterhout

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission

C 1. Recap. CSE 486/586 Distributed Systems Distributed File Systems. Traditional Distributed File Systems. Local File Systems.

File System Implementation

High-Level Data Models on RAMCloud

Chapter 12: File System Implementation

File System Implementations

Lecture 18: Reliable Storage

No Compromises. Distributed Transactions with Consistency, Availability, Performance

Chapter 10: File System Implementation

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space

CS370 Operating Systems

Single-pass restore after a media failure. Caetano Sauer, Goetz Graefe, Theo Härder

OPERATING SYSTEMS II DPL. ING. CIPRIAN PUNGILĂ, PHD.

Atlas: Baidu s Key-value Storage System for Cloud Data

GFS: The Google File System. Dr. Yingwu Zhu

Chapter 12: File System Implementation. Operating System Concepts 9 th Edition

Dept. Of Computer Science, Colorado State University

Yuval Carmel Tel-Aviv University "Advanced Topics in Storage Systems" - Spring 2013

Da-Wei Chang CSIE.NCKU. Professor Hao-Ren Ke, National Chiao Tung University Professor Hsung-Pin Chang, National Chung Hsing University

CS370: Operating Systems [Fall 2018] Dept. Of Computer Science, Colorado State University

Chapter 12: File System Implementation

CS370: Operating Systems [Fall 2018] Dept. Of Computer Science, Colorado State University

SMORE: A Cold Data Object Store for SMR Drives

FAWN. A Fast Array of Wimpy Nodes. David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan

Deduplication File System & Course Review

Durability for Memory-Based Key-Value Stores

Chapter 12: File System Implementation

VOLTDB + HP VERTICA. page

Memory-Based Cloud Architectures

Discretized Streams. An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters

Cluster-Level Google How we use Colossus to improve storage efficiency

Distributed File Systems

CSE 153 Design of Operating Systems

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University

CA485 Ray Walshe Google File System

GFS. CS6450: Distributed Systems Lecture 5. Ryan Stutsman

Dalí: A Periodically Persistent Hash Map

Highly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture

Improved Solutions for I/O Provisioning and Application Acceleration

GFS: The Google File System

Extend PB for high availability. PB high availability via 2PC. Recall: Primary-Backup. Putting it all together for SMR:

Leveraging Flash in HPC Systems

Transcription:

RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel Rosenblum and John Ousterhout)

a Storage System that provides: Scale Data size: 10 PB Accessible by 100,000 nodes (10 Million cores) Uniform fast random access time to all data 100 B read: 2 µs RPC 100 B write: 5 µs RPC Durable and available What if you had February 07, 2013 RAMCloud Slide 2

RAMCloud General-purpose storage system All data always in DRAM Scale: 1000 10000 servers, 1 PB data Performance goals: High throughput: 1M ops/sec/server Low-latency access: 5-10µs RPC Durable and available Potential impact: enable new class of applications Primary motivation: Web sphere Maybe HPC? February 07, 2013 RAMCloud Slide 3

DRAM in Storage Systems Facebook: 200 TB total data 150 TB cache! memcached UNIX buffer cache Main-memory databases Large file caches Web indexes entirely in DRAM Main-memory DBs, again 1970 1980 1990 2000 2010 February 07, 2013 RAMCloud Slide 4

DRAM in Storage Systems DRAM usage specialized/limited Clumsy (consistency with backing store) Lost performance (backing store, cache misses) Main-memory databases Web indexes entirely in DRAM Facebook: 200 TB total data 150 TB cache! memcached UNIX buffer cache Large file caches Main-memory DBs, again 1970 1980 1990 2000 2010 February 07, 2013 RAMCloud Slide 5

Dataset Size (TB) DRAM is cheaper! 10000 Lowest TCO from "Andersen et al., "FAWN: A Fast Array of Wimpy Nodes", Proc. 22nd Symposium on Operating System Principles, 2009, pp. 1-14. 1000 Disk 100 10 Flash 1 DRAM 0.1 0.1 1 10 100 1000 February 07, 2013 Query Rate (millions/sec) RAMCloud Slide 6

Application Servers Why Does Latency Matter? Traditional Application Web Application UI App. Logic Single machine Data Structures << 1µs latency 0.5-10ms latency UI App. Logic Datacenter Storage Servers Large-scale apps struggle with high latency Random access data rate has not scaled! Facebook: can only make 100-150 internal requests per page February 07, 2013 RAMCloud Slide 7

Application Servers RAMCloud Goal: Scale and Latency Traditional Application Web Application UI App. Logic Data Structures Single machine << 1µs latency 0.5-10ms latency 5-10µs Enable new class of applications UI App. Logic Datacenter Storage Servers February 07, 2013 RAMCloud Slide 8

RAMCloud Architecture 1000 100,000 Application Servers Commodity Servers Appl. Library Appl. Library Appl. Library Datacenter Network Appl. Library High-speed networking: 5 µs round-trip Full bisection bandwidth Coordinator Master Backup Master Backup Master Backup Master Backup DRAM 32-256 GB per server 1000 10,000 Storage Servers February 07, 2013 RAMCloud Slide 9

Data Model: Key-Value Store Tables read(tableid, key) => blob, version write(tableid, key, blob) => version cwrite(tableid, key, blob, version) => version delete(tableid, key) enumerate(tableid) Object Key ( 64KB) Version (64b) Blob ( 1MB) Richer model in the future: Indexes? Transactions? Graphs? February 07, 2013 RAMCloud Slide 10

Durability and Availability Goals: No impact on performance Minimum cost, energy Keep replicas in DRAM of other servers? 3x system cost, energy Still have to handle power failures RAMCloud approach: 1 copy in DRAM Backup copies on disk/flash: durability ~ free! Issues to resolve: Synchronous disk I/O s during writes?? Data unavailable after crashes?? February 07, 2013 RAMCloud Slide 11

Hash Table Buffered Logging Write request Buffered Segment Backup Buffered Segment Disk Disk Backup In-Memory Log Master Buffered Segment Backup Disk No disk I/O during write requests Log-structured: backup disks and master s memory Log cleaning February 07, 2013 RAMCloud Slide 12

Crash Recovery Server crashes: Must replay log to reconstruct data Crash recovery: Choose recovery master Backup reads log info from disk Transfers logs to recovery master Recovery master replays log Dead Master Recovery Master Meanwhile, data is unavailable RAMCloud approach: fast crash recovery 1-2 seconds for 100 GB of data Use system scale to get around bottlenecks Backups February 07, 2013 RAMCloud Slide 13

Fast Crash Recovery Scatter backup data across backups Divide each master s data into partitions Recover each partition on a separate recovery master Each backup divides its log data among recovery masters Dead Master Recovery Masters Backups February 07, 2013 RAMCloud Slide 14

RAMCloud Project Status Goal: build production-quality implementation Nearing 1.0-level release Current test cluster: 80 servers, 2 TB data High speed Infiniband networking Performance: 100 B read: 5.3 µs RPC 100 B write: 15 µs RPC Interested in finding applications for RAMCloud February 07, 2013 RAMCloud Slide 15

Is RAMCloud right for HPC apps? Properties of RAMCloud relevant to application developers: Durability and availability Key-value store Commodity hardware Read / write access latency Random access to small objects February 07, 2013 RAMCloud Slide 16

General-purpose storage system All data always in DRAM Designed for: Scale: 1000 10000 servers, 1 PB data Performance: 5-10µs RPC Durable and available Conclusion February 07, 2013 RAMCloud Slide 17

Is RAMCloud appropriate for HPC Applications? Durability and availability Key-value store Commodity hardware Read / write access latency Questions Random access to small objects One thing that we could change to make RAMCloud interesting to you! February 07, 2013 RAMCloud Slide 18

Thank you!