PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013

Similar documents
PNUTS and Weighted Voting. Vijay Chidambaram CS 380 D (Feb 8)

PNUTS: Yahoo! s Hosted Data Serving Platform

Extreme Computing. NoSQL.

NFSv4 as the Building Block for Fault Tolerant Applications

CS November 2017

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

CS November 2018

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

Distributed Systems. Tutorial 9 Windows Azure Storage

Asynchronous View Maintenance for VLSD Databases

MillWheel:Fault Tolerant Stream Processing at Internet Scale. By FAN Junbo

BigTable: A Distributed Storage System for Structured Data

Tools for Social Networking Infrastructures

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

HDFS Architecture. Gregory Kesden, CSE-291 (Storage Systems) Fall 2017

Rule 14 Use Databases Appropriately

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

CA485 Ray Walshe Google File System

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

CLOUD-SCALE FILE SYSTEMS

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

DIVING IN: INSIDE THE DATA CENTER

CSE 124: Networked Services Lecture-17

10. Replication. Motivation

MapReduce. U of Toronto, 2014

Distributed File Systems II

Data Infrastructure at LinkedIn. Shirshanka Das XLDB 2011

CS 347 Parallel and Distributed Data Processing

Research Faculty Summit Systems Fueling future disruptions

CS 347 Parallel and Distributed Data Processing

Coherence & WebLogic Server integration with Coherence (Active Cache)

Today: Coda, xfs. Case Study: Coda File System. Brief overview of other file systems. xfs Log structured file systems HDFS Object Storage Systems

Chapter 24 NOSQL Databases and Big Data Storage Systems

A Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers

Distributed Filesystem

The Google File System

BigData and Map Reduce VITMAC03

CISC 7610 Lecture 2b The beginnings of NoSQL

GR Reference Models. GR Reference Models. Without Session Replication

Data Acquisition. The reference Big Data stack

Distributed System. Gang Wu. Spring,2018

Dynamo: Amazon s Highly Available Key-Value Store

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

MDCC MULTI DATA CENTER CONSISTENCY. amplab. Tim Kraska, Gene Pang, Michael Franklin, Samuel Madden, Alan Fekete

Yves Goeleven. Solution Architect - Particular Software. Shipping software since Azure MVP since Co-founder & board member AZUG

The Google File System

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures

ebay s Architectural Principles

Huge market -- essentially all high performance databases work this way

CS5412: OTHER DATA CENTER SERVICES

NPTEL Course Jan K. Gopinath Indian Institute of Science

Google is Really Different.

MongoDB Distributed Write and Read

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

BigTable. CSE-291 (Cloud Computing) Fall 2016

SCALABLE WEB PROGRAMMING. CS193S - Jan Jannink - 2/02/10

The Google File System (GFS)

ebay Marketplace Architecture

A BigData Tour HDFS, Ceph and MapReduce

! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like

Data Storage Revolution

Tutorial 8 Build resilient, responsive and scalable web applications with SocketPro

Map-Reduce. Marco Mura 2010 March, 31th

Authors : Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung Presentation by: Vijay Kumar Chalasani

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

SCALABLE DATABASES. Sergio Bossa. From Relational Databases To Polyglot Persistence.

Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

FOR A WALL STREET INVESTMENT BANK JOSH WEST SOLUTIONS ARCHITECT RED HAT FINANCIAL SERVICES

Intra-cluster Replication for Apache Kafka. Jun Rao

Changing Requirements for Distributed File Systems in Cloud Storage

GFS: The Google File System. Dr. Yingwu Zhu

Distributed System. Gang Wu. Spring,2018

BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis

CS 655 Advanced Topics in Distributed Systems

The Google File System

Replication in Distributed Systems

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University

Integrity in Distributed Databases

Dynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat

Couchbase Architecture Couchbase Inc. 1

Architekturen für die Cloud

Today CSCI Recovery techniques. Recovery. Recovery CAP Theorem. Instructor: Abhishek Chandra

Current Topics in OS Research. So, what s hot?

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Engineering Goals. Scalability Availability. Transactional behavior Security EAI... CS530 S05

Datacenter replication solution with quasardb

Oracle E-Business Availability Options. Solution Series for Oracle: 2 of 5

Scaling with mongodb

Big Data Analytics. Rasoul Karimi

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Distributed Systems 16. Distributed File Systems II

Scalability of web applications

Amazon Aurora Relational databases reimagined.

EECS 498 Introduction to Distributed Systems

Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, et al. 6.S897

MySQL Architecture Design Patterns for Performance, Scalability, and Availability

Amazon Aurora Deep Dive

Transcription:

PNUTS: Yahoo! s Hosted Data Serving Platform Reading Review by: Alex Degtiar (adegtiar) 15-799 9/30/2013

What is PNUTS? Yahoo s NoSQL database Motivated by web applications Massively parallel Geographically distributed Per-record consistency web apps, not complex queries

Goals and Requirements Scalability Response Time and Geographic Scope High Availability and Fault Tolerance Relaxed Consistency Guarantees 1. Scalability (architectural, handle periods of rapid growth) 2. Response Time and Geographic Scope (reads from nearby server -> low latency for users across the globe) 3. High Availability and Fault Tolerance (read & write availability, handle server failures, network partitions, power loss, etc)) 4. Relaxed Consistency Guarantees

Consistency Tradeoff between performance, availability, consistency Serializable transactions expensive in distributed systems Strong consistency not always important for web apps Want to make it easy to reason about consistency

Eventual Consistency Updates to photo metadata on social site U1: Remove his mother from the list of people who can view his photos U2: Post spring-break photos

Per-record timeline consistency All replicas of a record apply record updates in same order

API and Specified Consistency Read-any Read-critical(>=version) Read-latest Write Test-and-set-write(version)

Per-Record Timeline Consistency example U1: Remove his mother from the list of people who can view his photos U2: Post spring-break photos

Data Model Simplified relational data model Tables of records with attributes Blob data types w/ arbitrary structures Updates/deletes specify primary key Point/range access Parallel multi-get range has predicate no complex queries, no constraint enforcement

Tables and Tablets Tables (ordered, hash) Partitioned into tablets Hash more efficient at load balancing

Architecture Regions with identical components

Storage Units Physical data storage nodes API: GET/SET/SCAN

Tablet Controller Holds interval -> tablet mappings Remaps under load imbalance Handles failure

Tablet splitting and balancing

Router Routes requests Keeps tablet mapping cache on error from SU, updates cache

Message Broker (YMB) Persistently updates logs Guarantees in-order delivery - pub/sub Sends updates to master on error from SU, updates cache

Record-Level Mastering Each record has chosen master Master updated for locality Update Sent to master node Sent to YMB & committed Forwarded to slave nodes Tablet master selected for each tablet Ensures no duplicate inserts on primary key ~85% of reads/writes are with good locality/latency history of 3 masters kept - if changing, relocate master.

Failure and Recovery Copy lost tablets from another replica 1. Tablet controller requests from source tablet replica 2. Checkpoint message to YMB to ensure inflight updates reach source replica 3. Source tablet copied to new region Made possible by synchronized split boundaries

Other Features Scatter-gather engine Part of router Can support Top-K in range query Notifications Pub/sub support via YMB Hosted database service Balances capacity among added servers Automatic recovery Isolation between different workloads/applications (via different SU)

Experimental Results 1 router, 2 message brokers, 5 storage units High cost for inserts in non-master region

More Experimental results

Limitations No multi-record transactions Record-level consistency forces use of same model for in-order updates Poor latency guarantees Writes & consistent reads go to (possibly remote) master Optimized for read/write single records and small scans (tens or hundreds of records)

Other Criticisms Range scans don t scale Slow/expensive failure recovery Unclear how YMB works/scales On-record-at-a-time consistency not always enough Experiment not very large scale Is scale tested at all? Ordered table not tested at scale hot keys?

Future Work Bundled updates Multi-record consistency Relaxed consistency e.g. for major region outages Indexes and materialized view via update stream Batch-query processing

PNUTS Conclusion Rich database functionality and low latency at massive scale Async replication ensures low latency w/ geographic replication Per-record timeline consistency model YMB as replication mechanism + redo log Hosted service to minimize operation cost

Acknowledgements Information, figures, etc. PNUTS: Yahoo!'s Hosted Data Serving Platform, B. Cooper, et al. Consistency and tablet diagrams adapted/taken from Yahoo talk. http: //www.slideshare.net/smilekg1220/pnuts-12502407. Relevant source overview to help understand the material: http://the-papertrail.org/blog/yahoos-pnuts/.