PostgreSQL to MySQL: A DBA's Perspective
Patrick King, @mr_mustash

Yelp's Mission
Connecting people with great local businesses.

My Database Experience
- Started using Postgres 7 years ago
- Postgres 8.4 (released in 2009)
- 50+ TB OLAP database
- Started using MySQL 11 months, 22 days, and 2 hours ago
- First time using another RDBMS in a professional setting

Topics I'll Cover Today
- Replication
- Schema Changes
- Query Plans
- Indexing

Topics I Won't Cover Today

There is more that unites us than divides us.

MySQL at Yelp
- Monolithic LAMP stack dating back to 2004
- Moving features and data out from the monolith and into services
- Hundreds of DBs/schemas
- 15+ schema changes each week, achieved by using pt-osc
- 400 engineers with 100 interns, and 4 DBAs

MySQL at Yelp
- MySQL 5.6
- Statement-based replication
- Replication trees that are up to 5 nodes deep
- One "intermediate master" per datacenter
- Vertical sharding at the database level, no horizontal sharding of data across multiple machines
- No site downtime allowed for database maintenance
- Online re-mastering of a database cluster

MySQL at Yelp: Surprises
- No physical sharding or partitioning?
- Largest single table is 4B+ rows
- The number of schema changes we do each week
- Nested replication hierarchy
- MySQL replication in general

Postgres at Yelp
- Used by both Eat24 and Yelp Reservations
- Postgres 9.5 and 9.6
- Monolithic data, very few services or service databases

Replication

Replication: Postgres
- Streaming replication
- Replicas are byte-for-byte copies of the master database
- Replicas are fully read-only
- hot_standby_feedback
- Write-Ahead Log (WAL) is used for both replication and crash recovery
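As a rough illustration of the slide above, these queries show one way to inspect streaming replication on the Postgres 9.5/9.6 versions mentioned earlier. They are a sketch and not taken from the talk; the comments note which server each statement is meant to run on.

```sql
-- On a replica: returns true while the server is a read-only hot standby.
SELECT pg_is_in_recovery();

-- On a replica: time since the last replayed commit, a rough proxy for replay delay.
-- The value also grows when the master is simply idle.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;

-- On a replica: shows whether the replica reports its oldest snapshot to the master.
SHOW hot_standby_feedback;

-- On the master: one row per connected streaming replica.
SELECT client_addr, state, sync_state
FROM pg_stat_replication;
```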

Replication: MySQL
- Statement-based replication
- Each insert/update/delete is logged in the binary logs after it is committed
- Replicas pull changes from the binary logs and run the same SQL statements
- No other communication between master and replica
- Allows for awesome architecture designs where replicas have partial data, or different indexes
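The statements below are a minimal sketch of how the setup described above could be inspected, assuming the MySQL 5.6 syntax used at the time of the talk (later MySQL releases rename some of these commands). They are illustrative and not from the slides.

```sql
-- On the master: confirm the binary log format (STATEMENT, ROW, or MIXED).
SHOW VARIABLES LIKE 'binlog_format';

-- On the master: current binary log file and position.
SHOW MASTER STATUS;

-- On a replica: state of the I/O and SQL threads, plus Seconds_Behind_Master.
SHOW SLAVE STATUS;
```

In the replica output, Slave_IO_Running and Slave_SQL_Running indicate whether changes are being pulled and applied, and Seconds_Behind_Master gives an estimate of replication delay.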

Replication: Lessons Learned
- MySQL replicas only receive a transaction after it has been committed on the master
- Long-running statements (like an update with a non-indexed WHERE clause) can take forever on the master, and then take forever on each replica

Replication: Lessons Learned
- Replication delay: MySQL statement-based vs Postgres
- Frequent cause in MySQL: large inserts/updates/deletes on the master database being run by all replicas
- Frequent cause in Postgres: long-running selects on the replica locking rows/tables that need to be updated

Replication: Lessons Learned
- Long-running transactions on the replica in Postgres cause the master to slow down due to hot_standby_feedback
- Long-running transactions on the replica in MySQL cause replication delay on the replica
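One way to spot the long-running transactions described above on a Postgres replica is to query pg_stat_activity. This is a sketch under the assumption of Postgres 9.5/9.6; the column selection is only a suggestion.

```sql
-- On a Postgres replica (or master): open transactions, oldest first.
SELECT pid,
       now() - xact_start AS xact_age,
       state,
       query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
```

On MySQL, the information_schema.INNODB_TRX table exposes comparable information through its trx_started column.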

Schema Changes

Schema Changes: Postgres
- Most changes can just be performed with minimal table locking or replication concerns
- This is because Postgres is using WAL replication, so on-disk changes are shipped over to the replicas while they're happening on the master

Schema Changes: Postgres
Exceptions to this include:
- Adding a column with a default value
- Changing a column type
- Adding an index (use CREATE INDEX CONCURRENTLY instead)
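The exceptions above can often be worked around. The following is a hedged sketch for Postgres 9.5/9.6; the table orders_demo and its columns are hypothetical names introduced only for illustration.

```sql
-- Hypothetical table for illustration.
CREATE TABLE orders_demo (
    id            serial PRIMARY KEY,
    restaurant_id integer NOT NULL
);

-- Build an index without blocking writes.
-- Note: this statement cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY orders_demo_restaurant_idx
    ON orders_demo (restaurant_id);

-- Add a column with a default in three steps instead of one rewriting ALTER:
-- 1. Add the column with no default (existing rows read as NULL, no table rewrite).
ALTER TABLE orders_demo ADD COLUMN status text;
-- 2. Set the default, which applies to rows inserted from now on.
ALTER TABLE orders_demo ALTER COLUMN status SET DEFAULT 'new';
-- 3. Backfill existing rows (on large tables, do this in batches).
UPDATE orders_demo SET status = 'new' WHERE status IS NULL;
```

From Postgres 11 onward, adding a column with a constant default no longer rewrites the table, so the three-step approach mainly matters on the older versions discussed in this talk.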

Schema Changes: MySQL
- Tools like pt-online-schema-change or gh-ost are required for safe changes during online operations
- MySQL does have some online schema changes, but we chose to use pt-osc for tables over 100MB

Schema Changes: MySQL
- This is especially true if using statement-based replication, as the ALTER statement will only be shipped to replicas after it completes on the master
- See Jenni Snyder's PL16 talk "Let Robots Manage your Schema (without destroying all humans)"
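For reference, the native online schema changes mentioned above look roughly like this in MySQL 5.6 with InnoDB. The table reviews_demo is a hypothetical example, not one from the talk.

```sql
-- Hypothetical table for illustration.
CREATE TABLE reviews_demo (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    business_id INT UNSIGNED NOT NULL,
    rating      TINYINT NOT NULL
) ENGINE=InnoDB;

-- Request an in-place index build that permits concurrent reads and writes.
-- If the operation cannot be done this way, MySQL returns an error
-- instead of silently falling back to a blocking table copy.
ALTER TABLE reviews_demo
    ADD INDEX idx_business_id (business_id),
    ALGORITHM=INPLACE, LOCK=NONE;
```

Even when the ALTER is online on the master, the replication concern from the slide still applies: the statement reaches replicas only after it finishes on the master.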

Schema Changes: Lessons Learned
- There's no one correct way to do schema changes
- Pick the tool and method that are best for your environment

Query Plans

    create table species_groups (
        id_no serial PRIMARY KEY,
        species varchar(64) NOT NULL
    );

    create table doctors (
        id_no serial PRIMARY KEY,
        name varchar(64) NOT NULL,
        hire_date date NOT NULL,
        termination_date date
    );

    create table doctors_species_groups (
        doctor_no integer,
        species_groups_no integer
    );

    select t1.name, t3.species
    FROM doctors t1
    INNER JOIN doctors_species_groups t2 ON t1.id_no = t2.doctor_no
    INNER JOIN species_groups t3 on t2.species_groups_no = t3.id_no;

    pking@[local]:5432 [test]=# explain select t1.name, t3.species FROM doctors t1 INNER JOIN doctors_species_groups t2 ON t1.id_no = t2.doctor_no INNER JOIN species_groups t3 on t2.species_groups_no = t3.id_no;
                                       QUERY PLAN
    ---------------------------------------------------------------------------------
     Hash Join  (cost=24.34..51.24 rows=143 width=152)
       Hash Cond: (t2.species_groups_no = t3.id_no)
       ->  Hash Join  (cost=4.22..29.15 rows=143 width=10)
             Hash Cond: (t1.id_no = t2.doctor_no)
             ->  Seq Scan on doctors t1  (cost=0.00..16.00 rows=1000 width=10)
             ->  Hash  (cost=2.43..2.43 rows=143 width=8)
                   ->  Seq Scan on doctors_species_groups t2  (cost=0.00..2.43 rows=143 width=8)
       ->  Hash  (cost=14.50..14.50 rows=450 width=150)
             ->  Seq Scan on species_groups t3  (cost=0.00..14.50 rows=450 width=150)
    (9 rows)

    Time: 2.681 ms

    pking@[local]:5432 [test]=# explain analyze select t1.name, t3.species FROM doctors t1 INNER JOIN doctors_species_groups t2 ON t1.id_no = t2.doctor_no INNER JOIN species_groups t3 on t2.species_groups_no = t3.id_no;
                                       QUERY PLAN
    ---------------------------------------------------------------------------------
     Hash Join  (cost=24.34..51.24 rows=143 width=152) (actual time=0.354..0.354 rows=0 loops=1)
       Hash Cond: (t2.species_groups_no = t3.id_no)
       ->  Hash Join  (cost=4.22..29.15 rows=143 width=10) (actual time=0.059..0.330 rows=143 loops=1)
             Hash Cond: (t1.id_no = t2.doctor_no)
             ->  Seq Scan on doctors t1  (cost=0.00..16.00 rows=1000 width=10) (actual time=0.010..0.120 rows=1000 loops=1)
             ->  Hash  (cost=2.43..2.43 rows=143 width=8) (actual time=0.041..0.041 rows=143 loops=1)
                   Buckets: 1024  Batches: 1  Memory Usage: 14kB
                   ->  Seq Scan on doctors_species_groups t2  (cost=0.00..2.43 rows=143 width=8) (actual time=0.007..0.019 rows=143 loops=1)
       ->  Hash  (cost=14.50..14.50 rows=450 width=150) (actual time=0.007..0.007 rows=6 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 9kB
             ->  Seq Scan on species_groups t3  (cost=0.00..14.50 rows=450 width=150) (actual time=0.003..0.005 rows=6 loops=1)
     Planning time: 0.191 ms
     Execution time: 0.376 ms
    (13 rows)

    pking@[local]:5432 [test]=# explain (analyze, buffers) select t1.name, t3.species FROM doctors t1 INNER JOIN doctors_species_groups t2 ON t1.id_no = t2.doctor_no INNER JOIN species_groups t3 on t2.species_groups_no = t3.id_no;
                                       QUERY PLAN
    ---------------------------------------------------------------------------------
     Hash Join  (cost=24.34..51.24 rows=143 width=152) (actual time=0.306..0.306 rows=0 loops=1)
       Hash Cond: (t2.species_groups_no = t3.id_no)
       Buffers: shared hit=8
       ->  Hash Join  (cost=4.22..29.15 rows=143 width=10) (actual time=0.050..0.277 rows=143 loops=1)
             Hash Cond: (t1.id_no = t2.doctor_no)
             Buffers: shared hit=7
             ->  Seq Scan on doctors t1  (cost=0.00..16.00 rows=1000 width=10) (actual time=0.007..0.105 rows=1000 loops=1)
                   Buffers: shared hit=6
             ->  Hash  (cost=2.43..2.43 rows=143 width=8) (actual time=0.036..0.036 rows=143 loops=1)
                   Buckets: 1024  Batches: 1  Memory Usage: 14kB
                   Buffers: shared hit=1
                   ->  Seq Scan on doctors_species_groups t2  (cost=0.00..2.43 rows=143 width=8) (actual time=0.004..0.014 rows=143 loops=1)
                         Buffers: shared hit=1
       ->  Hash  (cost=14.50..14.50 rows=450 width=150) (actual time=0.007..0.007 rows=6 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 9kB
             Buffers: shared hit=1

    mysql> explain select t1.name, t3.species FROM doctors t1 INNER JOIN doctors_species_groups t2 ON t1.id_no = t2.doctor_no INNER JOIN species_groups t3 on t2.species_groups_no = t3.id_no;
    +----+-------------+-------+------------+--------+---------------+---------+---------+---------------------------+------+----------+-------------+
    | id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                       | rows | filtered | Extra       |
    +----+-------------+-------+------------+--------+---------------+---------+---------+---------------------------+------+----------+-------------+
    |  1 | SIMPLE      | t2    | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                      |  100 |   100.00 | Using where |
    |  1 | SIMPLE      | t3    | NULL       | eq_ref | PRIMARY,id_no | PRIMARY | 8       | test.t2.species_groups_no |    1 |   100.00 | Using where |
    |  1 | SIMPLE      | t1    | NULL       | eq_ref | PRIMARY,id_no | PRIMARY | 8       | test.t2.doctor_no         |    1 |   100.00 | Using where |
    +----+-------------+-------+------------+--------+---------------+---------+---------+---------------------------+------+----------+-------------+
    3 rows in set, 1 warning (0.00 sec)
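The "1 warning" reported under the EXPLAIN output is not an error. In MySQL versions that print the partitions and filtered columns by default, EXPLAIN attaches a note containing the statement as rewritten by the optimizer. A short sketch of how to view it, assuming the example tables from this talk exist in the current database:

```sql
-- Produce the plan, then read the note that EXPLAIN attached to it.
EXPLAIN
SELECT t1.name, t3.species
FROM doctors t1
INNER JOIN doctors_species_groups t2 ON t1.id_no = t2.doctor_no
INNER JOIN species_groups t3 ON t2.species_groups_no = t3.id_no;

-- Shows a Note (code 1003) whose message is the optimizer's rewritten query.
SHOW WARNINGS;
```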

    mysql> explain FORMAT=JSON select t1.name, t3.species FROM doctors t1 INNER JOIN doctors_species_groups t2 ON t1.id_no = t2.doctor_no INNER JOIN species_groups t3 on t2.species_groups_no = t3.id_no;
    {
      "query_block": {
        "select_id": 1,
        "cost_info": {
          "query_cost": "261.00"
        },
        "nested_loop": [
          {
            "table": {
              "table_name": "t2",
              "access_type": "ALL",
              "rows_examined_per_scan": 100,
              "rows_produced_per_join": 100,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "1.00",
                "eval_cost": "20.00",
                "prefix_cost": "21.00",
                "data_read_per_join": "1K"
              },
              "used_columns": [
                "doctor_no",
                "species_groups_no"
              ],
              "attached_condition": "((`test`.`t2`.`species_groups_no` is not null) and (`test`.`t2`.`doctor_no` is not null))"
            }
          },
          {
            "table": {
              "table_name": "t3",
              "access_type": "eq_ref",
              "possible_keys": [
                "PRIMARY",
                "id_no"
              ],
              "key": "PRIMARY",
              "used_key_parts": [
                "id_no"
              ],
              "key_length": "8",
              "ref": [
                "test.t2.species_groups_no"
              ],
              "rows_examined_per_scan": 1,
              "rows_produced_per_join": 100,
              "filtered": "100.00",

Query Plans: Lessons Learned
- Learning to read query plans correctly is hard for any database
- MySQL: Baron Schwartz's "EXPLAIN Demystified"
- Postgres: Josh Berkus's "Explain Explained"

Index Types and Indexing Strategies

Indexes: Postgres
- B-tree
- GiST
- SP-GiST
- GIN
- BRIN
- Hash

Indexes: Postgres
- Functional and partial indexing
- Useful to speed up particularly complex queries
- Indexing on the WHERE clause of a given query (a partial index):
    CREATE INDEX order_not_completed ON orders USING btree (restaurant_id, creation_date) WHERE ((paid = 0) AND (payment_id IS NULL))
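The CREATE INDEX statement on this slide carries a WHERE clause, which the Postgres documentation calls a partial index; an index on a computed expression is an expression (functional) index. The sketch below shows both. The column types of orders are assumptions, since the slide shows only the index definition, and the second index name is hypothetical.

```sql
-- Assumed shape of the orders table; only the column names come from the slide.
CREATE TABLE orders (
    id            serial PRIMARY KEY,
    restaurant_id integer NOT NULL,
    creation_date timestamp NOT NULL,
    paid          integer NOT NULL DEFAULT 0,
    payment_id    integer
);

-- Partial index from the slide: only unpaid orders without a payment are indexed.
CREATE INDEX order_not_completed ON orders USING btree (restaurant_id, creation_date)
    WHERE ((paid = 0) AND (payment_id IS NULL));

-- A query whose WHERE clause implies the index predicate, so the planner may use it.
EXPLAIN
SELECT id
FROM orders
WHERE restaurant_id = 42
  AND creation_date >= DATE '2017-01-01'
  AND paid = 0
  AND payment_id IS NULL;

-- Expression ("functional") index: indexes the result of an expression.
CREATE INDEX orders_creation_day ON orders ((creation_date::date));
```

Because the partial index covers only the rows matching its predicate, it stays small even when the table is large, which is part of why this style of indexing is cheap to maintain.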

Indexes: Postgres
- You will often find more indexes than the number of columns in a table
- Postgres is already optimized for rewriting data all the time, which is why the cost of having so many indexes isn't cumbersome

Indexes: MySQL
- InnoDB has clustered indexing
- Long primary keys are bad and affect performance on all indexes because of clustered indexing
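To make the primary key point concrete: in InnoDB every secondary index entry also stores the primary key columns, so a wide key is repeated in each secondary index. The two table definitions below are a hypothetical sketch (names and columns are invented for illustration) contrasting a 36-byte key with an 8-byte one.

```sql
-- Wide primary key: each entry in idx_user and idx_created also carries
-- the 36-byte event_uuid (latin1, one byte per character).
CREATE TABLE events_wide_pk (
    event_uuid CHAR(36) CHARACTER SET latin1 NOT NULL,
    user_id    INT UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (event_uuid),
    KEY idx_user (user_id),
    KEY idx_created (created_at)
) ENGINE=InnoDB;

-- Narrow primary key: secondary index entries carry an 8-byte BIGINT instead.
-- The UUID stays unique through its own secondary index.
CREATE TABLE events_narrow_pk (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    event_uuid CHAR(36) CHARACTER SET latin1 NOT NULL,
    user_id    INT UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_event_uuid (event_uuid),
    KEY idx_user (user_id),
    KEY idx_created (created_at)
) ENGINE=InnoDB;

-- After loading comparable data, refresh statistics and compare index sizes (in pages).
ANALYZE TABLE events_wide_pk, events_narrow_pk;
SELECT table_name, index_name, stat_value AS size_in_pages
FROM mysql.innodb_index_stats
WHERE table_name IN ('events_wide_pk', 'events_narrow_pk')
  AND stat_name = 'size';
```

On empty tables the sizes will look identical; the difference appears only once both tables hold a meaningful number of rows.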

Community
- Postgres doesn't have the equivalent of the MySQL Utilities, Percona Toolkit, or just searching GitHub for MySQL
- While there are big Postgres consulting companies, there is no one company driving the major changes
- No official bug tracker in Postgres
- Almost all communication is done on the official Postgres email lists

Things I Miss from Postgres
- Flexible indexing
- Transactional DDL
- In-database online schema changes
- WAL-style replication

Things I Wish Postgres Had from MySQL
- Sub-millisecond primary key selects on large tables
- Community support
- Replication flexibility

Questions?

fb.com/yelpengineers @YelpEngineering engineeringblog.yelp.com github.com/yelp

We're Hiring! www.yelp.com/careers/