MyRocks Engineering Features and Enhancements. Manuel Ung, Facebook, Inc. Dublin, Ireland, Sept 25-27, 2017


Agenda
- Bulk load
- Time to live (TTL)
- Debugging deadlocks
- Persistent auto-increment values
- Improved transactions

Bulk Load

Sorted Bulk Load
[Diagram: usual RocksDB writes pass through memtables before reaching SST data files; bulk loads t1 and t2 each produce SST files directly.]
- SET ROCKSDB_BULK_LOAD = 1; to enable.
- Uses the RocksDB SstFileWriter feature.
- Bypasses the memtable; writes go directly to SST files.
- Keys must be added in ascending or descending order (no secondary keys).
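The ordering requirement can be illustrated with a small model. This is a sketch in Python, not RocksDB code: `ToySstWriter` and `bulk_load` are hypothetical names, and the model only covers the ascending case.

```python
class ToySstWriter:
    """Hypothetical stand-in for a sorted-file writer; not the RocksDB API."""

    def __init__(self):
        self.entries = []      # (key, value) pairs in file order
        self._last_key = None  # last key accepted so far

    def put(self, key: bytes, value: bytes) -> None:
        # A sorted file is append-only, so each key must be strictly
        # greater than the one written before it.
        if self._last_key is not None and key <= self._last_key:
            raise ValueError(f"key {key!r} is not greater than {self._last_key!r}")
        self.entries.append((key, value))
        self._last_key = key


def bulk_load(rows):
    """Write rows straight to the 'file', with no in-memory write buffer."""
    writer = ToySstWriter()
    for key, value in rows:
        writer.put(key, value)
    return writer.entries
```

Feeding the rows in key order succeeds; feeding them out of order raises an error, which is why a sorted bulk load expects input already ordered by primary key.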

Fast Secondary Key Creation
[Diagram: ALTER TABLE ADD INDEX reads the primary key SST files, goes through a tmpfile, and writes the secondary key SST files.]
- SstFileWriter is integrated into ALTER TABLE ADD INDEX.
- Disable secondary keys during the initial table load, then add them back afterwards.

Unsorted Bulk Load
[Diagram: INSERT INTO t ... feeds tmpfiles, from which the primary key and secondary key SST files are written.]
- SET ROCKSDB_BULK_LOAD_ALLOW_UNSORTED = 1;
- No need to drop secondary keys.
- INSERTs can occur out of primary key order.
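One plausible reading of the tmpfile step is a sort-then-merge pass that turns unordered inserts into the ordered stream a sorted file writer needs. The sketch below assumes that reading; `sorted_runs` and `merge_runs` are illustrative names, and in-memory lists stand in for temporary files.

```python
import heapq


def sorted_runs(rows, run_size):
    """Split incoming rows into runs of at most run_size, each sorted by key."""
    for start in range(0, len(rows), run_size):
        yield sorted(rows[start:start + run_size], key=lambda kv: kv[0])


def merge_runs(runs):
    """Merge the sorted runs into a single list in ascending key order."""
    return list(heapq.merge(*runs, key=lambda kv: kv[0]))


# Rows arrive out of primary key order.
rows = [(3, "c"), (1, "a"), (2, "b"), (5, "e"), (4, "d")]
ordered = merge_runs(sorted_runs(rows, run_size=2))
```

Each run is small enough to sort on its own, and the merge only needs the head of every run at a time, so the full data set never has to be sorted in one piece.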

Time to Live (TTL)

Time to Live (TTL)
- Some workloads have datasets that should expire after some time.
- One solution: add a create-time column and issue deletes through a daily job.
  - Requires CPU for processing the delete query.
  - Adds delete markers, which slow down scans.
- With RocksDB, we can leverage the compaction filter for this.
- The compaction filter is already used for dropping tables.
  - Respond immediately to the request to drop a table.
  - Actual data is removed when compaction occurs.

DDL Syntax
Implicit timestamp:
    CREATE TABLE t1 (
      a INT, b INT, c INT,
      PRIMARY KEY (a)
    ) ENGINE=ROCKSDB
    COMMENT "ttl_duration=3600;";
Explicit timestamp:
    CREATE TABLE t2 (
      a INT, b INT, c INT,
      ts BIGINT UNSIGNED NOT NULL,
      PRIMARY KEY (a)
    ) ENGINE=ROCKSDB
    COMMENT "ttl_duration=3600;ttl_col=ts;";
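The TTL settings travel inside the table comment as semicolon-terminated key=value pairs. The sketch below shows one way such a string could be split; `parse_ttl_comment` is a hypothetical helper, not MyRocks code, and it assumes ttl_duration is an integer (taken here to be seconds).

```python
def parse_ttl_comment(comment: str) -> dict:
    """Split 'key=value;' pairs from a table comment into a dict."""
    options = {}
    for part in comment.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = part.partition("=")
        options[key.strip()] = value.strip()
    # Assumption: ttl_duration is an integer number of seconds.
    if "ttl_duration" in options:
        options["ttl_duration"] = int(options["ttl_duration"])
    return options


implicit = parse_ttl_comment("ttl_duration=3600;")
explicit = parse_ttl_comment("ttl_duration=3600;ttl_col=ts;")
```

With no ttl_col present, the implicit form applies and the row's insertion time is used as its timestamp.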

Row Format
    INSERT INTO t1 (a, b, c) VALUES (1, 10, 20);
    t1-pk | 1 | TTL-now | 10 | 20

    INSERT INTO t2 (a, b, c, ts) VALUES (3, 30, 35, 1490000000);
    t2-pk | 3 | 1490000000 | 30 | 35 | 1490000000
- A TTL field holding the create time is added to each table row.
- Implicit timestamp uses the row insertion time.
- Explicit timestamp uses the value from the column specified by ttl_col.

Read Filtering
- Rows might disappear during a transaction if they expire while the transaction is active.
  - This is a problem for repeatable read.
- Compaction removes only rows that expired before the oldest snapshot.
- Rows are filtered on read based on snapshot creation time.

Read Filtering: Repeatable Read
[Timeline figure: ttl_duration = 1000; time axis from 0 to 4000; lanes for Transaction 1, Transaction 2 and Compaction. Events in order: INSERT INTO t VALUES (1); INSERT INTO t VALUES (2); BEGIN; SELECT * FROM t; compaction removes row 1 and keeps row 2; SELECT * FROM t.]
- The SELECTs see row 2 only, because row 1 is filtered out of the result set:
    timestamp(row 1) < timestamp(current) - ttl_duration
- Compaction keeps row 2 despite it being expired already:
    timestamp(row 2) >= timestamp(oldest snapshot) - ttl_duration
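The two conditions can be checked with a small model. This sketch assumes both conditions compare the row timestamp against a reference time minus ttl_duration, and it invents concrete event times (rows inserted at 0 and 1000, snapshot taken at 1500, compaction at 3000) because the exact positions on the figure are not recoverable. All function names are illustrative.

```python
TTL_DURATION = 1000  # ttl_duration from the slide


def expired(row_ts, reference_ts, ttl=TTL_DURATION):
    """A row counts as expired when row_ts < reference_ts - ttl."""
    return row_ts < reference_ts - ttl


def read_rows(rows, snapshot_ts):
    """Read filter: hide rows that were already expired at snapshot creation."""
    return [key for key, ts in rows.items() if not expired(ts, snapshot_ts)]


def compact(rows, oldest_snapshot_ts):
    """Compaction filter: drop only rows expired before the oldest snapshot."""
    return {key: ts for key, ts in rows.items()
            if not expired(ts, oldest_snapshot_ts)}


rows = {1: 0, 2: 1000}        # row key -> insert time (assumed values)
snapshot_ts = 1500            # Transaction 2 takes its snapshot (assumed)
compaction_time = 3000        # wall-clock time of the compaction (assumed)

first_select = read_rows(rows, snapshot_ts)
row2_expired_by_wall_clock = expired(rows[2], compaction_time)
rows = compact(rows, oldest_snapshot_ts=snapshot_ts)
second_select = read_rows(rows, snapshot_ts)
```

Both SELECTs return the same result even though row 2 has expired by wall-clock time when compaction runs, because compaction measures expiry against the oldest open snapshot rather than the current time.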

TTL with Secondary Keys
Read filtering makes secondary keys with TTL possible.
Implicit timestamp:
    CREATE TABLE t1 (
      a INT, b INT, c INT,
      PRIMARY KEY (a),
      KEY (b)
    ) ENGINE=ROCKSDB
    COMMENT "ttl_duration=3600;";
Explicit timestamp:
    CREATE TABLE t2 (
      a INT, b INT, c INT,
      ts BIGINT UNSIGNED NOT NULL,
      PRIMARY KEY (a),
      KEY (b)
    ) ENGINE=ROCKSDB
    COMMENT "ttl_duration=3600;ttl_col=ts;";

Debugging Deadlocks

Snapshot Conflicts vs Deadlocks
- Both snapshot conflicts and deadlocks return ER_LOCK_DEADLOCK.
- Snapshot conflicts
  - Happen under REPEATABLE READ when multiple transactions modify the same row.
  - Error message: Deadlock found when trying to get lock; try restarting transaction (snapshot conflict)
- Deadlocks
  - Happen when multiple transactions lock rows in different orders.
  - Error message: Deadlock found when trying to get lock; try restarting transaction
- Get the most recent deadlocks from SHOW ENGINE ROCKSDB TRANSACTION STATUS;
- The number of deadlocks stored is controlled by rocksdb_max_latest_deadlocks.
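Because both cases share one error code, a client can only tell them apart by the message suffix, and the message itself suggests restarting the transaction. The sketch below shows that idea under stated assumptions: `DbError` is a hypothetical exception type standing in for whatever a real MySQL driver raises, and `classify` and `run_with_retry` are illustrative helpers.

```python
ER_LOCK_DEADLOCK = 1213  # MySQL error number for ER_LOCK_DEADLOCK


class DbError(Exception):
    """Hypothetical driver exception carrying a MySQL error number and message."""

    def __init__(self, errno, message):
        super().__init__(message)
        self.errno = errno
        self.message = message


def classify(err):
    """Return 'snapshot conflict', 'deadlock', or 'other' for a DbError."""
    if err.errno != ER_LOCK_DEADLOCK:
        return "other"
    if err.message.rstrip().endswith("(snapshot conflict)"):
        return "snapshot conflict"
    return "deadlock"


def run_with_retry(transaction, max_attempts=3):
    """Call transaction() and retry it when it fails with ER_LOCK_DEADLOCK."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DbError as err:
            # Give up on unrelated errors or when attempts are exhausted.
            if classify(err) == "other" or attempt == max_attempts:
                raise
```

Logging the result of `classify` before retrying helps separate lock-ordering problems from plain write contention on the same rows.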

Latest Detected Deadlocks
    mysql> SHOW ENGINE ROCKSDB TRANSACTION STATUS;
    ----------LATEST DETECTED DEADLOCKS----------
    *** DEADLOCK PATH
    =========================================
    TRANSACTION ID: 2
    COLUMN FAMILY NAME: default
    WAITING KEY: 0000010580000001
    LOCK TYPE: EXCLUSIVE
    INDEX NAME: PRIMARY
    TABLE NAME: test.t
    ---------------WAITING FOR---------------
    TRANSACTION ID: 1
    COLUMN FAMILY NAME: default
    WAITING KEY: 0000010580000002
    LOCK TYPE: EXCLUSIVE
    INDEX NAME: PRIMARY
    TABLE NAME: test.t
    ---------------WAITING FOR---------------
    TRANSACTION ID: 2
    COLUMN FAMILY NAME: default
    WAITING KEY: 0000010580000001
    LOCK TYPE: EXCLUSIVE
    INDEX NAME: PRIMARY
    TABLE NAME: test.t
    --------TRANSACTION ID: 1 GOT DEADLOCK---------
    -----------------------------------------

Statements that produce the deadlock:
    Transaction 1:
        BEGIN;
        SELECT * FROM t WHERE i = 1 FOR UPDATE;
        SELECT * FROM t WHERE i = 2 FOR UPDATE;   (deadlock)
    Transaction 2:
        BEGIN;
        SELECT * FROM t WHERE i = 2 FOR UPDATE;
        SELECT * FROM t WHERE i = 1 FOR UPDATE;
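The deadlock path in this output is a cycle in a wait-for graph: transaction 2 waits for transaction 1, which waits for transaction 2. The sketch below finds such a cycle in a simplified model where each transaction waits on at most one other; `find_deadlock_path` is an illustrative name and this is not the detection code used by RocksDB.

```python
def find_deadlock_path(waits_for, start):
    """Follow wait-for edges from start; return the path if it loops back.

    waits_for maps a transaction ID to the single transaction it waits on.
    Returns a list such as [2, 1, 2], or None when start is not in a cycle.
    """
    path = [start]
    seen = {start}
    current = start
    while current in waits_for:
        current = waits_for[current]
        path.append(current)
        if current == start:
            return path      # the chain came back to where it began
        if current in seen:
            return None      # a cycle exists, but start is not part of it
        seen.add(current)
    return None              # the chain ended at a transaction that is not waiting


# Edges taken from the output above: 2 waits for 1, and 1 waits for 2.
path = find_deadlock_path({2: 1, 1: 2}, start=2)
```

The returned path mirrors the sequence of TRANSACTION ID entries separated by WAITING FOR lines in the status output.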

Persistent Auto-increment Values

Auto-increment values
- Auto-increment values are not persisted across server restarts (in both InnoDB and RocksDB).
- InnoDB behavior is fixed in MySQL 8.0.
- RocksDB is fixed by storing the maximum ID in the data dictionary.

Example of the behavior without the fix:
    STATEMENT                                             LAST_INSERT_ID
    CREATE TABLE t (i int AUTO_INCREMENT PRIMARY KEY);
    INSERT INTO t VALUES (NULL);                          1
    INSERT INTO t VALUES (NULL);                          2
    INSERT INTO t VALUES (NULL);                          3
    DELETE FROM t;
    # Restart server
    INSERT INTO t VALUES (NULL);                          1

Data Dictionary
[Diagram: data dictionary entry with the fields 0x9, INDEX_ID, VERSION, AUTO_INC ID.]
- The maximum auto-increment ID is stored in the data dictionary.
- It is keyed by the primary key index ID of the table containing the auto-increment column.
- Makes use of the RocksDB merge operator feature.

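Merge Operator
[Diagram: three transactions each insert a row and issue a merge for the table's index ID before committing; the merge operands accumulate in the memtable.]
    tx1: INSERT INTO t VALUES (NULL);   PUT(1) : MERGE(IDX_ID) : 1   COMMIT
    tx2: INSERT INTO t VALUES (NULL);   PUT(2) : MERGE(IDX_ID) : 2   COMMIT
    tx3: INSERT INTO t VALUES (NULL);   PUT(3) : MERGE(IDX_ID) : 3   COMMIT
    Memtable: MERGE(IDX_ID) : 2, MERGE(IDX_ID) : 3, MERGE(IDX_ID) : 1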

Merge Operator
[Diagram: GET(IDX_ID) combines the merge operands held in the memtable.]
    Memtable: MERGE(IDX_ID) : 2, MERGE(IDX_ID) : 3, MERGE(IDX_ID) : 1
    MO(2, 3) -> 3
    MO(3, 1) -> 3
    GET(IDX_ID) -> VALUE : 3
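The slide's results, MO(2, 3) giving 3 and MO(3, 1) giving 3, are consistent with a merge operator that keeps the maximum. The sketch below models that behaviour; `ToyStore` and `max_merge` are hypothetical names and this is not the RocksDB merge operator API.

```python
from functools import reduce


def max_merge(existing, operand):
    """Merge operator assumed here: keep the larger of the two values."""
    return max(existing, operand)


class ToyStore:
    """Hypothetical key-value store that queues merge operands per key."""

    def __init__(self):
        self.operands = {}

    def merge(self, key, operand):
        # Writers only append an operand; they never read the current value.
        self.operands.setdefault(key, []).append(operand)

    def get(self, key):
        # Readers fold the queued operands with the merge operator.
        pending = self.operands.get(key)
        if not pending:
            return None
        return reduce(max_merge, pending)


store = ToyStore()
for committed_id in (2, 3, 1):   # commit order from the slide
    store.merge("IDX_ID", committed_id)
max_id = store.get("IDX_ID")
```

Since taking a maximum does not depend on operand order, transactions can commit in any order without a read-modify-write on the stored value.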

Improved Transactions

Problems
- Low throughput
- Commit stalls
- Memory footprint

Low Throughput
- Separate queues for prepare and commit
  - Decreases queue latency for commits
- FlushWAL
  - Avoids fwrite syscall latency in the commit path
  - http://rocksdb.org/blog/2017/08/25/flushwal.html
[Chart: Linkbench transactions per second, before and after, at 8, 16, 32 and 64 threads.]

Commit Stalls
- Move the memtable write from commit to prepare.
  - Less work is done at commit time.
  - Higher throughput.
  - Large transactions won't stall the server.
- Work in progress.

Memory Footprint
- Move the memtable write from prepare to put.
  - Uncommitted data is written into the database without needing to be buffered in memory.
- Work in progress.

Additional Information

GitHub
- https://github.com/facebook/mysql-5.6
- Currently based on 5.6.35
- Feedback and contributions are welcome!

Q&A