A Brief Introduction of TiDB. Dongxu (Edward) Huang CTO, PingCAP


About me
- Dongxu (Edward) Huang, Cofounder & CTO of PingCAP
- PingCAP, based in Beijing, China
- Infrastructure software engineer, open source hacker
- Projects: Codis / TiDB / TiKV
- Languages: Golang / Python / Rust

What would you do when...
- RDBMS is becoming the performance bottleneck of your backend service
- The amount of data stored in the RDBMS is overwhelming
- You want to run complex queries on a sharded cluster, e.g. a simple JOIN or GROUP BY
- Your application needs ACID transactions on a sharded cluster

TiDB Project - Goal
- SQL is necessary
- Transparent sharding and data movement
- 100% OLTP + 80% OLAP: transactions + complex queries
- Compatible with MySQL in most cases
- 24/7 availability, even in case of datacenter outages, thanks to the Raft consensus algorithm
- Open source, of course

Agenda
- Technical overview of TiDB / TiKV: storage, distributed SQL
- Tools
- Real-world cases and benchmarks
- Demo

Architecture (diagram)
- Stateless SQL layer: multiple TiDB instances
- Placement Driver (PD): serves metadata / timestamp requests; control flow for balancing and failover
- Distributed storage layer: TiKV instances replicating data through Raft
- All components communicate over gRPC

Storage stack 1/2
- TiKV is the underlying storage layer, written in Rust!
- Physically, data is stored in RocksDB
- We build a Raft layer on top of RocksDB (What is Raft?)
- Stack, top to bottom: TiKV API (gRPC) / Transaction / MVCC / Raft (gRPC) / RocksDB
- Transactional KV API: https://github.com/pingcap/tidb/blob/master/cmd/benchkv/main.go
- Raw KV API: https://github.com/pingcap/tidb/blob/master/cmd/benchraw/main.go

Storage stack 2/2
- Data is organized by Regions
- Region: a set of continuous key-value pairs
- Each Region (e.g. Region 1:[a-e], Region 2:[f-j], Region 3:[k-o], Region 4:[p-t], Region 5:[u-z]) is replicated across several TiKV instances, and the replicas form a Raft group
- Stack per instance: RPC (gRPC) / Transaction / MVCC / Raft / RocksDB
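The Region concept above can be sketched in Go (hypothetical types and names, not TiKV's actual API): each Region owns a contiguous key range, so routing a key to its Region is a binary search over sorted, non-overlapping ranges.

```go
package main

import (
	"fmt"
	"sort"
)

// Region covers the half-open key range [Start, End).
type Region struct {
	ID         int
	Start, End string
}

// findRegion routes a key to the Region whose range contains it.
// The regions slice must be sorted by Start and non-overlapping.
func findRegion(regions []Region, key string) (Region, bool) {
	i := sort.Search(len(regions), func(i int) bool {
		return regions[i].End > key
	})
	if i < len(regions) && regions[i].Start <= key {
		return regions[i], true
	}
	return Region{}, false
}

func main() {
	// The slides' Region 1:[a-e], 2:[f-j], 3:[k-o], as half-open ranges.
	regions := []Region{
		{1, "a", "f"},
		{2, "f", "k"},
		{3, "k", "p"},
	}
	r, _ := findRegion(regions, "g")
	fmt.Println("key g lives in Region", r.ID) // key g lives in Region 2
}
```

In the real system this routing table is cached in TiDB and kept authoritative by PD; the sketch only shows why contiguous ranges make the lookup cheap.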

Dynamic Multi-Raft
- What's Dynamic Multi-Raft? Dynamic split / merge
- Safe split / merge
- Example: Region 1:[a-e] splits into Region 1.1:[a-c] and Region 1.2:[d-e]

Safe Split 1/4: Region 1:[a-e] is a Raft group replicated on TiKV1 (Leader), TiKV2 (Follower), and TiKV3 (Follower).

Safe Split 2/4: The leader on TiKV1 splits its copy into Region 1.1:[a-c] and Region 1.2:[d-e]; the followers on TiKV2 and TiKV3 still hold Region 1:[a-e].

Safe Split 3/4: The split log is replicated by Raft from the leader on TiKV1 to the followers on TiKV2 and TiKV3.

Safe Split 4/4: Every replica applies the split log, so TiKV1, TiKV2, and TiKV3 each end up with Region 1.1:[a-c] and Region 1.2:[d-e], still connected via Raft.
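The split itself can be sketched as a pure function over key ranges (hypothetical types; in real TiKV the split is a command in the Raft log, which is what makes it "safe" - every replica applies the same deterministic operation):

```go
package main

import "fmt"

// Region covers the half-open key range [Start, End).
type Region struct {
	ID         string
	Start, End string
}

// split divides r at splitKey into two adjacent regions.
// Replicated through the Raft log, every replica applies
// exactly the same split at exactly the same log position.
func split(r Region, splitKey string) (Region, Region) {
	left := Region{r.ID + ".1", r.Start, splitKey}
	right := Region{r.ID + ".2", splitKey, r.End}
	return left, right
}

func main() {
	// The slides' Region 1:[a-e], represented half-open as [a, f).
	r1 := Region{"1", "a", "f"}
	l, r := split(r1, "d")
	fmt.Printf("%s:[%s-%s)  %s:[%s-%s)\n", l.ID, l.Start, l.End, r.ID, r.Start, r.End)
}
```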

Scale-out (initial state): Regions 1, 2, and 3 each have three replicas spread across Nodes A, B, C, and D; the leader of Region 1 (marked *) is on Node A.

Scale-out (add new node): Node E joins the cluster. 1) Transfer leadership of Region 1 from Node A to Node B.

Scale-out (balancing): 2) Add a replica of Region 1 on Node E.

Scale-out (balancing): 3) Remove the replica of Region 1 from Node A.
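The three balancing steps above can be sketched as edits on a replica placement map (a toy model with hypothetical names; PD actually schedules these as Raft membership changes):

```go
package main

import "fmt"

// placement maps node name -> whether that node holds the Region's leader.
type placement map[string]bool

// 1) Move leadership between two existing replicas.
func transferLeader(p placement, from, to string) {
	p[from], p[to] = false, true
}

// 2) Add a new (follower) replica on a node.
func addReplica(p placement, node string) { p[node] = false }

// 3) Drop the replica on a node.
func removeReplica(p placement, node string) { delete(p, node) }

func main() {
	// Region 1 starts on A (leader), B, C.
	region1 := placement{"A": true, "B": false, "C": false}
	transferLeader(region1, "A", "B") // step 1: leader A -> B
	addReplica(region1, "E")          // step 2: new replica on E
	removeReplica(region1, "A")       // step 3: drop replica on A
	fmt.Println(len(region1), region1["B"]) // 3 true
}
```

The ordering matters: the leader is moved off Node A before its replica is removed, so the Raft group never loses its leader mid-rebalance.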

ACID Transaction
- Based on Google Percolator
- Almost decentralized 2-phase commit
- Timestamp Allocator
- Optimistic transaction model
- Default isolation level: Repeatable Read
- External consistency: Snapshot Isolation + Lock (SELECT ... FOR UPDATE)
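The optimistic model above can be reduced to a toy conflict check (a heavily simplified sketch with hypothetical names; real TiKV runs Percolator's two-phase prewrite/commit with per-key locks, and the timestamps come from PD):

```go
package main

import "fmt"

// lastCommit records the latest commit timestamp per key.
var lastCommit = map[string]uint64{}

// tso is a stand-in for PD's monotonic timestamp allocator.
var tso uint64

func nextTS() uint64 { tso++; return tso }

// commit succeeds only if no key written by this transaction was
// committed by another transaction after startTS (write-write conflict);
// on success all keys get the same freshly allocated commit timestamp.
func commit(startTS uint64, keys []string) bool {
	for _, k := range keys {
		if lastCommit[k] > startTS {
			return false // conflict: the optimistic txn must retry
		}
	}
	commitTS := nextTS()
	for _, k := range keys {
		lastCommit[k] = commitTS
	}
	return true
}

func main() {
	t1 := nextTS() // txn 1 starts
	t2 := nextTS() // txn 2 starts, both touch key "a"
	fmt.Println(commit(t2, []string{"a"})) // true: no conflict yet
	fmt.Println(commit(t1, []string{"a"})) // false: "a" committed after t1 began
}
```

This is why the slides recommend workloads without heavy conflicts: under contention, optimistic transactions abort and retry instead of queueing on locks.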

Distributed SQL
- Full-featured SQL layer
- Predicate pushdown
- Distributed join
- Distributed cost-based optimizer (distributed CBO)

TiDB SQL Layer overview

What happens behind a query
CREATE TABLE t (c1 INT, c2 TEXT, KEY idx_c1 (c1));
SELECT COUNT(c1) FROM t WHERE c1 > 10 AND c2 = 'percona';

Query Plan
- Physical plan on TiKV (index scan): read the index idx_c1 over the range (10, +inf), read row data by RowID, filter on c2 = 'percona', and compute a partial aggregate COUNT(c1) on each TiKV node
- Physical plan on TiDB: a DistSQL scan collects the partial COUNT(c1) values and a final aggregate computes SUM(COUNT(c1))
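The two-level aggregation in the plan can be sketched in Go (hypothetical types, not TiDB's executor): the filter and partial COUNT run next to the data on each TiKV node, and TiDB only sums the small partial results.

```go
package main

import "fmt"

// row models one row of table t: (c1 INT, c2 TEXT).
type row struct {
	c1 int
	c2 string
}

// partialCount is the work pushed down to one TiKV region:
// COUNT(c1) under the predicate c1 > 10 AND c2 = 'percona'.
func partialCount(rows []row) int {
	n := 0
	for _, r := range rows {
		if r.c1 > 10 && r.c2 == "percona" {
			n++
		}
	}
	return n
}

func main() {
	// Rows held by two different regions.
	region1 := []row{{12, "percona"}, {5, "percona"}}
	region2 := []row{{20, "percona"}, {30, "mysql"}}
	// Final aggregate on TiDB: SUM over the partial counts.
	total := partialCount(region1) + partialCount(region2)
	fmt.Println(total) // 2
}
```

The point of the pushdown is bandwidth: each region returns one integer instead of every matching row.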

What happens behind a query
CREATE TABLE left (id INT, email TEXT, KEY idx_id (id));
CREATE TABLE right (id INT, email TEXT, KEY idx_id (id));
SELECT * FROM left JOIN right WHERE left.id = right.id;

Distributed Join (HashJoin)

Supported Distributed Join Types
- Hash join
- Sort-merge join
- Index-lookup join
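The first of these, hash join, can be sketched in a few lines (a toy single-node version; TiDB's distributed executor builds and probes in parallel across workers):

```go
package main

import "fmt"

// row models one row of the left/right tables from the query above.
type row struct {
	id    int
	email string
}

// hashJoin builds a hash table keyed on id from the build side,
// then probes it with each row of the probe side, emitting matches.
func hashJoin(build, probe []row) [][2]row {
	table := make(map[int][]row, len(build))
	for _, b := range build {
		table[b.id] = append(table[b.id], b)
	}
	var out [][2]row
	for _, p := range probe {
		for _, b := range table[p.id] {
			out = append(out, [2]row{b, p})
		}
	}
	return out
}

func main() {
	left := []row{{1, "a@x.com"}, {2, "b@x.com"}}
	right := []row{{2, "c@y.com"}, {3, "d@y.com"}}
	fmt.Println(len(hashJoin(left, right))) // 1
}
```

Picking the smaller input as the build side keeps the hash table in memory, which is why joining two huge unindexed tables is listed as an anti-pattern on the next slide.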

No silver bullet (anti-patterns for TiDB SQL)
- Joins between large tables without an index or any hints
- Getting distinct values from large tables without an index
- Sorting without an index
- Result set is too large (forgot LIMIT N?)

Best practices
- Random, massive read / write workloads
- No hot small tables
- Use transactions, but avoid heavy conflicts

Tools matter
- Syncer
- TiDB-Binlog
- Mydumper / MyLoader (loader)
Open sourced, too: https://github.com/pingcap/tidb-tools

Syncer
- Synchronizes data from MySQL in real time
- Hooks up as a MySQL replica (fake slave)
- Pipeline: MySQL (master) -> binlog -> Syncer (rule filter, save point on disk) -> one or more TiDB clusters

TiDB-Binlog
- Subscribes to the incremental data from TiDB
- Outputs Protobuf-formatted data or MySQL binlog format (WIP)
- Pipeline: TiDB servers -> Pumper -> Cistern / Sorter -> Protobuf (3rd-party applications) or MySQL binlog (another TiDB cluster, MySQL)

MyDumper / Loader
- Backup / restore in parallel
- Works for TiDB too
- Actually, we don't have our own data migration tool for now

Use case 1: OLTP + OLAP
- One of the most popular bike-sharing companies in China
- 7-node TiDB cluster for order storage (OLTP)
- Hooked up as a MySQL replica: Syncer instances synchronize data from the MySQL masters to a 10-node TiDB slave cluster for ad-hoc OLAP

Use case 1: Ad-hoc OLAP

TiDB elapsed (3 nodes) | MySQL elapsed
5.07699437s            | 19.93s
10.524703077s          | 43.23s
10.077812714s          | 43.33s
10.285957629s          | >20 mins
10.462306097s          | 36.81s
9.968078965s           | 1 min 0.27 sec
9.998030375s           | 44.05s
10.866549284s          | 43.18s

Use case 2: Distributed OLTP
- One of the biggest MMORPG games in China
- 2.2 TB of data, 18 nodes
- Drop-in replacement for MySQL
- Distributed OLTP

Sysbench setup
OS:   Linux (Ubuntu 14.04)
CPU:  28 ECUs, 8 vCPUs, 2.8 GHz, Intel Xeon E5-2680v2
RAM:  16 GB
Disk: 80 GB (SSD)
Notice: 3 replicas

Sysbench (Read)

nodes   | table count | table size | sysbench threads | QPS      | latency (avg / .95)
3 nodes | 16          | 1M rows    | 256              | 21899.59 | 11.69ms / 19.87ms
6 nodes | 16          | 1M rows    | 256              | 41928.84 | 6.10ms / 10.96ms
9 nodes | 16          | 1M rows    | 256              | 58044.80 | 4.41ms / 7.36ms

Sysbench (Read)

Sysbench (Insert)

nodes   | table count | table size | sysbench threads | TPS      | latency (avg / .95)
3 nodes | 16          | 1M rows    | 256              | 6686.59  | 38.28ms / 78.21ms
6 nodes | 16          | 1M rows    | 256              | 11448.08 | 22.36ms / 44.61ms
9 nodes | 16          | 1M rows    | 512              | 14977.01 | 34.18ms / 86.85ms

Sysbench (Insert)

Roadmap
- TiSpark: integrate TiKV with SparkSQL
- Better optimizer (statistics && CBO)
- JSON type and document store for TiDB (MySQL 5.7.12+ X-Plugin)
- Integration with Kubernetes, via an Operator by CoreOS

Thanks https://github.com/pingcap/tidb https://github.com/pingcap/tikv Contact me: huang@pingcap.com