CockroachDB on DC/OS. Ben Darnell, CTO, Cockroach Labs

Similar documents
Data Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich

GridGain and Apache Ignite In-Memory Performance with Durability of Disk

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Distributed PostgreSQL with YugaByte DB

How we build TiDB. Max Liu PingCAP Amsterdam, Netherlands October 5, 2016

A Brief Introduction of TiDB. Dongxu (Edward) Huang CTO, PingCAP

SQL, NoSQL, MongoDB. CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden

Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database.

CISC 7610 Lecture 2b The beginnings of NoSQL

Introduction to NoSQL Databases

Shen PingCAP 2017

Distributed Computation Models

CPS 512 midterm exam #1, 10/7/2016

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database

Distributed Data on Distributed Infrastructure. Claudius Weinberger & Kunal Kusoorkar, ArangoDB Jörg Schad, Mesosphere

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

Performance and Forgiveness. June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences

Database Architectures

MySQL High Availability Solutions. Alex Poritskiy Percona

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

CSE 444: Database Internals. Section 9: 2-Phase Commit and Replication

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

EBOOK. FROM DISASTER RECOVERY TO ACTIVE-ACTIVE: NuoDB AND MULTI-DATA CENTER DEPLOYMENTS

Building Consistent Transactions with Inconsistent Replication

Modern Database Concepts

Beyond Relational Databases: MongoDB, Redis & ClickHouse. Marcos Albe - Principal Support Percona

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline

Large-Scale Key-Value Stores Eventual Consistency Marco Serafini

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Next-Generation Cloud Platform

Rule 14 Use Databases Appropriately

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

MDCC MULTI DATA CENTER CONSISTENCY. amplab. Tim Kraska, Gene Pang, Michael Franklin, Samuel Madden, Alan Fekete

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein Sameh Elnikety. Copyright 2012 Philip A. Bernstein

Integrity in Distributed Databases

Distributed System. Gang Wu. Spring,2018

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

How to Scale MongoDB. Apr

@joerg_schad Nightmares of a Container Orchestration System

Replication in Distributed Systems

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions

Conceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.

Virtual protection gets real

relational Relational to Riak Why Move From Relational to Riak? Introduction High Availability Riak At-a-Glance

MySQL Replication Options. Peter Zaitsev, CEO, Percona Moscow MySQL User Meetup Moscow,Russia

10. Replication. Motivation

Technical Note. Dell/EMC Solutions for Microsoft SQL Server 2005 Always On Technologies. Abstract

Best practices for OpenShift high-availability deployment field experience

A Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers

DrRobert N. M. Watson

Become a MongoDB Replica Set Expert in Under 5 Minutes:

Microsoft SQL AlwaysOn and High Availability

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

Everything You Need to Know About MySQL Group Replication

Database Architectures

There Is More Consensus in Egalitarian Parliaments

DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Distributed Data Analytics Partitioning

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Exam 2 Review. October 29, Paul Krzyzanowski 1

Differentiating Your Datacentre in the Networked Future John Duffin

Distributed Coordination with ZooKeeper - Theory and Practice. Simon Tao EMC Labs of China Oct. 24th, 2015

4 Myths about in-memory databases busted

Distributed Systems 8L for Part IB

MySQL HA Solutions Selecting the best approach to protect access to your data

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

NewSQL Without Compromise

Distributed Data Management Transactions

Parallel DBs. April 25, 2017

MySQL High Availability. Michael Messina Senior Managing Consultant, Rolta-AdvizeX /

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

Azure File Sync. Webinaari

MQ High Availability and Disaster Recovery Implementation scenarios

Datacenter replication solution with quasardb

A Guide to Architecting the Active/Active Data Center

Don t Give Up on Serializability Just Yet. Neha Narula

Spanner: Google's Globally-Distributed Database. Presented by Maciej Swiech

Real-time Protection for Microsoft Hyper-V

SCALABLE CONSISTENCY AND TRANSACTION MODELS

Big Data Analytics. Rasoul Karimi

HOW CLUSTRIXDB RDBMS SCALES WRITES & READS

Recall: Primary-Backup. State machine replication. Extend PB for high availability. Consensus 2. Mechanism: Replicate and separate servers

Intra-cluster Replication for Apache Kafka. Jun Rao

CA485 Ray Walshe NoSQL

SQL Server Availability Groups

HIGH-AVAILABILITY & D/R OPTIONS FOR MICROSOFT SQL SERVER

Outline. Spanner Mo/va/on. Tom Anderson

CLUSTERING HIVEMQ. Building highly available, horizontally scalable MQTT Broker Clusters

Distributed Data Management Replication

Scaling for Humongous amounts of data with MongoDB

MySQL Enterprise High Availability

CS5412: TRANSACTIONS (I)

There is a tempta7on to say it is really used, it must be good

@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS

Leader or Majority: Why have one when you can have both? Improving Read Scalability in Raft-like consensus protocols

Evaluating Cloud Storage Strategies. James Bottomley; CTO, Server Virtualization

BIG DATA AND CONSISTENCY. Amy Babay

Transcription:

CockroachDB on DC/OS Ben Darnell, CTO, Cockroach Labs

Agenda A cloud-native database CockroachDB on DC/OS Why CockroachDB Demo!

Cloud-Native Database

What is Cloud-Native? Horizontally scalable Individual machines aren t special Built to handle failures

Why? Adapt to Change Grow and shrink based on demands Easy, rapid deployment Reorganize to reduce load and latency Load follows the sun

Traditional Databases More comfortable scaling vertically than horizontally Primaries vs secondaries vs read-only replicas Failover is often manual Asynchronous replication may lose data

Cloud-Native CockroachDB Add nodes with automatic rebalancing Any node can serve any role Automatic failover with consistent replication

CockroachDB on DC/OS

DC/OS Provides an Ideal Environment Elastic allocation and scheduling Discovery, VIPs, and load balancing Weakly persistent local storage

Storage Dilemma RAID or network-attached storage is expensive CockroachDB supplies its own redundancy Naive container deployments could lose all local volumes at once DC/OS SDK provides persistent scheduling

DC/OS Package dcos package install cockroachdb Starts a 3-node cluster Add more later by changing NODE_COUNT variable Or one-click install through the web

Why CockroachDB? An open source SQL database for global cloud services.

Why CockroachDB? Distributed SQL Data integrity at global scale Multi-active availability Consistent transactions

Distributed SQL A database that grows with your application

What is Distributed SQL? SQL with ACID semantics Runs across multiple database servers Distributed storage Distributed compute

Distributed Storage Tables scale to any size No manual partitioning necessary Automatic rebalancing as machines are added

Distributed Execution Nodes with data compute their portion of results SELECT SUM(duration) FROM sessions

Flexible Scheduling Schedule CockroachDB nodes......near your application servers for lower latency...or on your application nodes when you have underutilized disks

Global Distribution Span three datacenters to survive major outages Put data close to your customers

Multi-Active Availability Survive disasters

Disaster Recovery, Version 1.0 DC1 DC2 microservices{ Remote Backup

Recovery 2.0: Active/Passive DC1 DC2 async Primary Secondary

Recovery 3.0: Active/Active DC1 DC2

Multi-Active Availability DC1 DC2 consensus consensus consensus DC3

Consistent Transactions Data Integrity at Global Scale

Consensus Replication A majority of replicas must agree to commit Failovers never lose committed data Raft consensus algorithm

Distributed Transactions Transactions can span rows, tables, even databases ACID semantics Serializable isolation

Demo

Status

Current Version: 1.0.6 First production-ready release in May 2017 Distributed SQL Multi-active availability Enterprise: Incremental, distributed backup/restore

Very Soon: 1.1 Ruggedization Diagnostic & debugging tools Performance SQL coverage Fast data import (CSV)

Up Next Performance SQL feature parity (JSONB) Change data capture (via Kafka) Global data architecture Enterprise: row-level partitioning

The End https://www.cockroachlabs.com https://github.com/cockroachdb/cockroach https://github.com/dcos/examples https://gitter.im/cockroachdb/cockroach

Appendix

Target Customers Is CockroachDB right for you?

Fastest growing companies in the European Gaming Company world are using CockroachDB to solve their global data challenges $2B+ gaming company that uses CockroachDB to improve performance and availability for their global customer base. $60B+ web services company that uses CockroachDB to power their knowledge graph Real-time gaming platform that uses CockroachDB as the enabling technology to deliver their PaaS Proprietary & Confidential

Is CockroachDB right for you? Global, scalable cloud services We hope this eventually means everyone Not for: Latency-sensitive writes Unstructured data Efficient OLAP processing (for now) Analytics in a non-sql language (mapreduce)

Correctness or Performance? We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions Authors of Google s Spanner paper

CockroachDB Internals Thorax, wings, eyes, legs,...

Layered architecture SQL Transactional KV Distribution Replication Storage

Data Distribution SQL Transactional KV Distribution Replication Storage

Data Distribution Data is a large collection of Key:Value pairs Data split into Ranges of Key:Value pairs Ranges are ~64 MB of data CockroachDB/Bigtable/HBase/Spanner

Data Distribution Options for finding data Hashing Order-Preserving

Order-preserving data distribution Pro: efficient scans over total ordering of data Con: requires additional indexing

Keyspace as Monolithic, Sorted Map

Ranges Form the Keyspace

Ranges Map to Nodes

Data Lookup

Consensus SQL Transactional KV Distribution Replication Storage

Consensus Replication Replicate to N (where N >= 3) nodes Commit happens when a quorum have written the data CockroachDB/Etcd/Spanner/Aurora/Zookeeper/...

Consensus Replication Raft is our consensus protocol of choice. Run one consensus group per range of data

Consensus Replication Put cherry Leader apricot banana blueberry cherry grape Follower apricot banana blueberry Follower apricot banana blueberry grape grape

Consensus Replication Put cherry Leader apricot Put cherry banana blueberry cherry grape Follower apricot banana blueberry Follower apricot banana blueberry grape grape

Consensus Replication Put cherry Leader apricot Put cherry banana blueberry cherry grape Write committed when 2 out of 3 nodes have written data Follower apricot banana blueberry cherry grape Follower apricot banana blueberry grape

Consensus Replication Put cherry Leader Put cherry apricot Ack banana blueberry cherry grape Ack Follower apricot banana blueberry Follower apricot banana blueberry cherry grape grape

Consensus Replication Put cherry Leader apricot Put cherry banana blueberry cherry grape Ack Follower apricot banana blueberry Follower apricot banana blueberry cherry cherry grape grape

Consensus Replication What happens during failover?

Consensus Replication: Failover Put cherry Leader apricot banana blueberry cherry grape Follower apricot banana blueberry Follower apricot banana blueberry grape grape

Consensus Replication: Failover Put cherry Leader Elect a new Leader apricot banana blueberry cherry grape Follower apricot banana blueberry Follower apricot banana blueberry grape grape

Consensus Replication: Failover Follower apricot Leader apricot banana blueberry On failover, only data written to a quorum is considered present Follower apricot banana grape banana blueberry blueberry cherry grape grape

Consensus Replication: Failover Read cherry Key not found Follower apricot Leader apricot banana blueberry Follower apricot banana grape banana blueberry blueberry cherry grape grape

Consensus Replication Consensus provides atomic writes But only for each range What about operations that hit multiple ranges?

Distributed Transactions SQL Transactional KV Distribution Replication Storage

Transactions Supports ACID semantics All-or-nothing Serializable [default] & Snapshot isolation levels

Distributed Transactions Transaction record Keyed by a transaction UUID Located on first written range Atomic commit or abort via write to txn record One phase commit fast path

Distributed Transactions 1. Begin Txn 1 Put cherry Txn-1 Record Status: PENDING apricot cherry (txn 1) grape

Distributed Transactions 1. Begin Txn 1 Put cherry 2. Put mango Txn-1 Record Status: PENDING apricot cherry (txn 1) grape lemon mango (txn 1) orange

Distributed Transactions 1. Begin Txn 1 Put cherry 2. Put mango 3. Commit Txn 1 Txn-1 Record Status: COMMITTED apricot cherry (txn 1) grape lemon mango (txn 1) orange

Distributed Transactions 1. Begin Txn 1 Put cherry 2. Put mango 3. Commit Txn 1 4. Clean up intents Txn-1 Record Status: COMMITTED apricot cherry grape lemon mango orange

Distributed Transactions 1. Begin Txn 1 Put cherry 2. Put mango 3. Commit Txn 1 4. Clean up intents 5. Remove Txn 1 apricot cherry grape lemon mango orange

Distributed Transactions That s the happy case What about conflicting transactions?

Distributed Transactions (read conflict) 1. Begin Txn 1 Put cherry Txn-1 Record Status: PENDING apricot cherry (txn 1) grape

Distributed Transactions (read conflict) 1. Begin Txn 1 Put cherry Txn-1 Record Status: PENDING apricot cherry (txn 1) 1. Begin Txn 2 2. Get cherry grape

Distributed Transactions (read conflict) 1. Begin Txn 1 Put cherry Txn-1 Record Status: PENDING apricot cherry (txn 1) 1. Begin Txn 2 2. Get cherry Check Txn-1 record grape

Distributed Transactions (read conflict) 1. Begin Txn 1 Put cherry Txn-1 Record Status: PENDING apricot cherry (txn 1) 1. Begin Txn 2 2. Get cherry Check Txn-1 record Wait while PENDING grape

Distributed Transactions (read conflict) 1. Begin Txn 1 Put cherry 2. Commit Txn1 Txn-1 Record Status: COMMITTED apricot cherry (txn 1) 1. Begin Txn 2 2. Get cherry Check Txn-1 record Wait while PENDING grape

Distributed Transactions (read conflict) 1. Begin Txn 1 Put cherry 2. Commit Txn1 Txn-1 Record Status: COMMITTED apricot cherry (txn 1) 1. Begin Txn 2 2. Get cherry Check Txn-1 record Wait while PENDING 3. Commit Txn2 grape

Distributed Transactions What about write / write conflicts?

Distributed Transactions (write conflict) 1. Begin Txn 1 Put cherry Txn-1 Record Status: PENDING apricot cherry (txn 1) grape

Distributed Transactions (write conflict) 1. Begin Txn 1 Put cherry Txn-1 Record Status: PENDING Txn-2 Record Status: PENDING 1. Begin Txn 2 2. Put cherry apricot cherry (txn 1) grape

Distributed Transactions (write conflict) 1. Begin Txn 1 Put cherry Txn-1 Record Status: PENDING Txn-2 Record Status: PENDING 1. Begin Txn 2 2. Put cherry Check Txn 1 record apricot cherry (txn 1) grape

Distributed Transactions (write conflict) 1. Begin Txn 1 Put cherry Txn-1 Record Status: PENDING Txn-2 Record Status: PENDING 1. Begin Txn 2 2. Put cherry Check Txn 1 record Wait while PENDING apricot cherry (txn 1) grape

Distributed Transactions (write conflict) 1. Begin Txn 1 Put cherry 2. Commit Txn 1 Txn-1 Record Status: COMMITTED Txn-2 Record Status: PENDING 1. Begin Txn 2 2. Put cherry Check Txn 1 record Wait while PENDING apricot cherry (txn 1) grape

Distributed Transactions (write conflict) 1. Begin Txn 1 Put cherry 2. Commit Txn 1 Txn-1 Record Status: COMMITTED Txn-2 Record Status: PENDING 1. Begin Txn 2 2. Put cherry Check Txn 1 record Resolve cherry (txn 1) Put cherry apricot cherry (txn 2) cherry grape

Click to edit title Click to edit text Second level Third level Fourth level» Fifth level