Preventing and Resolving MySQL Downtime. Jervin Real, Michael Coburn Percona

About Us
- Jervin Real, Technical Services Manager
  - Engineer Engineering Engineers
  - APAC
- Michael Coburn, Principal Technical Account Manager
  - Responsible for managing the technical relationship with Percona's highest revenue customers

What is Downtime?
- When your Application is completely unavailable
- When your Application is in a degraded state
- Whenever your boss says so :)

Why Prevent Downtime?
- Your business loses money when the Application is down
- Your reputation, and your team's, suffers

Agenda
- Real world adventures
- Problems
- Solutions
- Prevention
- Putting them all together

I Had a Crash On You

I Had a Crash On You (1): Page Corruption

I Had a Crash On You (1): Page Corruption > About
- Bad sectors on disk, neither monitored nor checked
- Page corruption at the disk level
- Server crashes when reading the corrupted page from disk
- Keeps crashing :(

I Had a Crash On You (1): Page Corruption > Solutions
- On Percona Server, we tried innodb_corrupt_table_action = salvage
- Worked! Dropped the table, recreated it, and the application was back online
- Worst case: innodb_force_recovery > 0, then data recovery

I Had a Crash On You (2): Assertion > About
- Running 5.6.11 as an early adopter of InnoDB FULLTEXT
- Upgraded to 5.6.18, MySQL crashed
- Data was unusable: bug#72079

I Had a Crash On You (2): Assertion > Solutions
- Downgrade and restore from backup
- Re-execute the upgrade in a way that avoids the bug

I Had a Crash On You (1): Page Corruption > Prevention
- innodb_corrupt_table_action = salvage / warn
- pt-table-checksum
- Regularly read through all of your data and check the error log for errors
- RAID card health checks (these can vary by vendor)
- SMART checks
- Be vigilant for disk-level errors
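
Checking the error log for corruption messages is easy to automate. The following is a minimal sketch, not a tool from the talk: the pattern list and the function name are hypothetical, and the exact wording of InnoDB messages differs between versions, so the patterns should be adjusted to what your own error log contains.

```python
import re

# Hypothetical patterns. Real InnoDB wording differs between versions,
# so tune this list against your own error log.
CORRUPTION_PATTERNS = [
    re.compile(r"page.*corrupt", re.IGNORECASE),
    re.compile(r"checksum mismatch", re.IGNORECASE),
    re.compile(r"assertion failure", re.IGNORECASE),
]


def find_corruption_lines(log_lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(pattern.search(line) for pattern in CORRUPTION_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Feeding it the lines of the error log from a scheduled job and alerting on a non-empty result would cover the "check the error log" habit; it does not replace RAID and SMART checks, which catch failing disks before a page is ever read.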

Nobody's Watching

Nobody's Watching (1): Nobody Cared > About
- Percona XtraDB Cluster, 3 nodes
- A few months ago, node 3 went down due to a conflict, but nobody noticed
- A few hours ago, node 2 was killed by the OOM killer and the cluster lost quorum
- EVERYBODY NOTICED!

Nobody's Watching (1): Nobody Cared > Solutions
- Bootstrap the remaining node:
  SET GLOBAL wsrep_provider_options='pc.bootstrap=1';
- SST the second and third nodes
- Define wsrep_notify_cmd temporarily
- Implement better alerting
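
"Better alerting" for a cluster like this can start from three wsrep status variables. The sketch below assumes the values have already been fetched (for example from SHOW GLOBAL STATUS LIKE 'wsrep_%') into a dict; the function name and the expected_nodes parameter are illustrative, not part of any Percona tool.

```python
def cluster_alerts(status, expected_nodes=3):
    """Return alert messages for a dict of wsrep status variables.

    `status` maps variable names to their string values, as returned by
    SHOW GLOBAL STATUS. `expected_nodes` is an assumption about the
    intended cluster size.
    """
    alerts = []
    size = int(status.get("wsrep_cluster_size", 0))
    if size < expected_nodes:
        alerts.append(f"cluster size is {size}, expected {expected_nodes}")
    if status.get("wsrep_cluster_status") != "Primary":
        alerts.append("node is not part of a primary component")
    if status.get("wsrep_local_state_comment") != "Synced":
        alerts.append("node is not synced")
    return alerts
```

With a check like this running against every node, the loss of node 3 would have raised a cluster size alert months before the second failure cost the cluster its quorum. The "not synced" check also fires while a node is acting as a donor, so it may need a grace period in practice.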

Nobody's Watching (2): Dropped the Bomb > About
- New sysadmin received a disk space alert
- du -hx --max-depth=1 /
- /var has lots of data
- find /var/ -size +5G -exec rm -rf {} \;
- Bam, ibdata1 gone!
- A maintenance restart occurred later in the day...

Nobody's Watching (2): Dropped the Bomb > Solutions
- Restore from backup
- Really, they were lucky!

Nobody's Watching: Prevention
- Percona Monitoring Plugins
  - pmp-check-deleted-files
  - pmp-check-mysql-status
  - pmp-check-mysql-innodb
- Define a script executable by the mysql user, triggered on node state changes
- Take backups, and alert on failure
- Don't restart the server: the file handles are still open!
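
The reason not to restart is that a deleted file stays readable for as long as mysqld holds it open. On Linux this is visible in /proc: the symlinks under /proc/<pid>/fd point to the original path with " (deleted)" appended. The sketch below illustrates that idea only; it is not how pmp-check-deleted-files is implemented, and both function names are hypothetical.

```python
import os

DELETED_SUFFIX = " (deleted)"


def read_fd_targets(pid):
    """Return the symlink targets of every open file descriptor of `pid`.

    Linux only; needs permission to read /proc/<pid>/fd.
    """
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def deleted_open_files(link_targets):
    """Return the paths of files that are open but already unlinked."""
    return [
        target[: -len(DELETED_SUFFIX)]
        for target in link_targets
        if target.endswith(DELETED_SUFFIX)
    ]
```

Calling deleted_open_files(read_fd_targets(pid)) with the mysqld process ID lists the affected files. MySQL routinely keeps unlinked temporary files open, so a real check would filter the result down to paths inside the data directory before alerting.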

Self Induced Pain

Self Induced Pain (1): Query Cache
Waiting for query cache lock

root# ~> pt-sift /var/lib/pt-stalk/...
--processlist--
  State
    226
     90 Waiting for query cache lock
      4 Sending data
      4 Master has sent all binlog to slave; waiting for binlog to be updated
      2 init
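
The processlist summary above is a count of threads per State value. When pt-stalk samples are not available, the same view can be approximated from the State column of SHOW PROCESSLIST. This is a small sketch with a hypothetical function name; it assumes the states have already been collected into a list of strings.

```python
from collections import Counter


def summarize_states(states):
    """Count threads per processlist state, most frequent first.

    Ties are broken alphabetically so the output is stable.
    """
    counts = Counter(states)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

A single state dominating the list, such as "Waiting for query cache lock" here, points at one contended resource rather than at general load.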

Self Induced Pain (1): Query Cache > About
- Global mutex
- Point of contention
- Especially on a hot dataset/table
- More so with a large query cache

Self Induced Pain (1): Query Cache > Solutions
- Set it to a small size to reduce the performance overhead
- Disable it completely to avoid contention
- Hint offending queries to skip the query cache, e.g. SELECT SQL_NO_CACHE

Self Induced Pain (2): Buffer Pool Dump/Restore
- Dumps the buffer pool page list to disk
- Reloads the buffer pool based on this list at startup
- Meant to help speed up buffer pool warmup

Self Induced Pain (2): Buffer Pool Dump/Restore > About
- Maintenance restart, buffer pool dump and restore enabled
- Yay! Expecting everything to go well.
- 30 minutes in, performance is still really bad, with IO thrashing
- Large buffer pool, busy read/write workload

Self Induced Pain (2): Buffer Pool Dump/Restore > Solutions
- Extend your maintenance period to let the server warm up if possible; otherwise the buffer pool load and application traffic will contend for IO
- RAID1 of 2 SATA disks is not a license to use buffer pool warmup on 240GB of buffer pool
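
A rough estimate shows why 30 minutes was not enough. The sketch below assumes the whole buffer pool is reloaded and that the disks sustain a fixed read rate for the entire load; both throughput figures used afterwards are illustrative assumptions, not measurements from the incident.

```python
def warmup_minutes(buffer_pool_gb, read_mb_per_sec):
    """Estimate minutes needed to read `buffer_pool_gb` back from disk.

    Assumes the full pool is reloaded at a constant `read_mb_per_sec`,
    with no competing application traffic.
    """
    total_mb = buffer_pool_gb * 1024
    return total_mb / read_mb_per_sec / 60
```

At an assumed 100 MB/s of sequential reads, warmup_minutes(240, 100) comes to about 41 minutes with the disks fully dedicated to the load. If random reads and production traffic cut the effective rate to 20 MB/s, the same 240GB takes roughly 205 minutes, which is the argument for a longer maintenance window or faster storage.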

Self-Induced Pain: Prevention
- Percona Toolkit
  - pt-stalk
  - pt-sift
  - pt-kill
- Disable OOM killer
- Configure an appropriate disk scheduler
- Check the error log for "Buffer pool load complete"

MySQL, MySQL! What Have Suffereth Ye Thee?

MySQL, MySQL! What Have Suffereth Ye Thee? (1): Grind to a Halt > About
- Slow queries
- Connections build up
- Slow response times
- Long running transactions
- Stop the World scenario

MySQL, MySQL! What Have Suffereth Ye Thee? (1): Grind to a Halt > About
--innodb--
  txns: 486xACTIVE (28s) 994xnot (0s) 227xLOCK WAIT (25844s)
  0 queries inside InnoDB, 0 queries in queue
  Main thread: sleeping, pending reads 0, writes 28, flush 1
  Log: lsn = 2147483647, chkp = 2147483647, chkp age = 210625191

MySQL, MySQL! What Have Suffereth Ye Thee? (1): Grind to a Halt > About
---TRANSACTION 230207990, ACTIVE 13779 sec fetching rows
mysql tables in use 1, locked 1
80337 lock struct(s), heap size 8271400, 10979242 row lock(s)
MySQL thread id 671621, OS thread handle 0x7fe03528a700, query id 37505085 localhost magento Sending data
SELECT `sales_flat_quote_item`.* FROM `sales_flat_quote_item` LIMIT 376 OFFSET 491056

MySQL, MySQL! What Have Suffereth Ye Thee? (1): Grind to a Halt > Solutions
- KILL long running transactions
- pt-kill for persistently long running transactions
- Deploy immediate code changes to disable the offending code
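
Finding what to KILL can be scripted from information_schema.INNODB_TRX, which exposes trx_mysql_thread_id and trx_started for each open transaction. The sketch below assumes those rows have already been fetched into dicts with trx_started as a datetime; the function name and the 600 second default are hypothetical. It only builds the statements, so a person can review them before anything is executed.

```python
from datetime import datetime, timedelta


def kill_statements(trx_rows, now, max_age_seconds=600):
    """Build KILL statements for transactions older than the threshold.

    Each row is a dict with `trx_mysql_thread_id` (int) and
    `trx_started` (datetime), mirroring information_schema.INNODB_TRX.
    """
    limit = timedelta(seconds=max_age_seconds)
    statements = []
    for row in trx_rows:
        if now - row["trx_started"] > limit:
            statements.append(f"KILL {row['trx_mysql_thread_id']};")
    return statements
```

For the transaction shown earlier (thread id 671621, active for 13779 seconds) this would emit KILL 671621; while leaving the 28 second transactions alone. pt-kill covers the same ground with more safeguards and is the better choice for anything that keeps recurring.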

MySQL, MySQL! What Have Suffereth Ye Thee? (2): CPU Load > About
- MySQL is still responding
- Contention on all sorts of mutexes
  - trx_sys->mutex
  - block->lock
  - lock_sys->mutex
  - lock_sys->wait_mutex
- ... and it is killing latency
- Service impact means lost income

MySQL, MySQL! What Have Suffereth Ye Thee? (2): CPU Load > Solutions
- innodb_thread_concurrency > 0

MySQL, MySQL! What Have Suffereth Ye Thee? (3): CPU Load > About
- Opening tables, closing tables
--processlist--
  State
    578 Opening tables
     32 closing tables

MySQL, MySQL! What Have Suffereth Ye Thee? (3): CPU Load > About
- Contention on the LOCK_open mutex
- Risk of negative scalability

MySQL, MySQL! What Have Suffereth Ye Thee? (3): CPU Load > Solutions
- Tune table_open_cache / table_definition_cache
- table_open_cache_instances (5.6+)
- Shard either logically or horizontally, or run multiple mysql instances, to reduce the number of objects per instance

MySQL, MySQL! What Have Suffereth Ye Thee? (2,3): Prevention
- pt-kill --log
- MySQL Server Configuration
  a. Remember to tune innodb_thread_concurrency (default is 0)
  b. table_open_cache + table_open_cache_instances
- Application Stack Configuration (Schema Design)
  a. Single tenant per schema
  b. Multiple tenants per schema (each table has a client_id column)
  c. All tenants in one schema

Wizard of OS (1): Disk Performance
Disk performance cascading to MySQL, and from there to the application

Wizard of OS (1): Disk Performance > About
- Slow writes: binlogs, redo logs, syncs
- Transactions stalling on COMMIT, updating, inserting
- Replication getting delayed if the node is a slave
- Translates to latency

Wizard of OS (1): Disk Performance > Solutions
- Check whether the RAID controller is running in write-through mode
- Could also be a bad disk!

Wizard of OS (2): Swapping
Swapping heavily, with a significant amount of RAM free

Wizard of OS (2): Swapping > About
- Swapping induces a significant amount of IO
- Swapping in and out of disk is mighty expensive
- Affects MySQL in magnificent ways
- Swap Insanity!
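
The symptom described here, swap in use while plenty of RAM is free, can be detected from /proc/meminfo on Linux. The sketch below is an illustration under stated assumptions: the function names and both percentage thresholds are hypothetical, and it measures how much swap is occupied, not how actively the host is swapping (for that, watch the si/so columns of vmstat).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def swapping_with_free_ram(meminfo, swap_used_pct=10.0, free_pct=20.0):
    """True when swap is noticeably used although RAM is still available.

    Thresholds are illustrative assumptions, not recommended values.
    """
    swap_total = meminfo.get("SwapTotal", 0)
    mem_total = meminfo.get("MemTotal", 0)
    if swap_total == 0 or mem_total == 0:
        return False
    swap_used = swap_total - meminfo.get("SwapFree", 0)
    # Older kernels lack MemAvailable, so fall back to MemFree.
    available = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return (
        100.0 * swap_used / swap_total >= swap_used_pct
        and 100.0 * available / mem_total >= free_pct
    )
```

On a multi-socket server a positive result is a hint to look at the per-node numbers in numastat, since one NUMA node can be exhausted while another still has free memory.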

Wizard of OS (2): Swapping > Solutions
- NUMA interleave
- Percona Server is NUMA configurable
  - numa_interleave
  - flush_caches
- Check numastat - perl check_numa.pl

Wizard of OS: Prevention
- Tune:
  - vm.swappiness
  - NUMA policy
  - disk scheduler
  - mount options appropriately (ext4, xfs) (nobarrier, noatime)
- pt-heartbeat - monitor replication delay
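
Heartbeat-based monitoring measures replication delay by writing a timestamp on the master at a fixed interval and comparing the replicated copy against the clock on the replica. The sketch below shows only that comparison; the function name is hypothetical, and it assumes the clocks on both hosts are synchronized.

```python
from datetime import datetime


def replication_delay_seconds(heartbeat_ts, now):
    """Seconds between the last replicated heartbeat and `now`.

    Clamped at zero so that small clock skew never reports a negative delay.
    """
    return max(0.0, (now - heartbeat_ts).total_seconds())
```

Because the heartbeat row travels through the same binlog and relay log path as application writes, a stall on slow disks shows up as a growing delay even when the replication threads report that they are running.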

Percona Server Features
- Enable InnoDB Buffer Pool warming
- Enable userstat for table & index statistics
- Enable verbose slow log
- Enable Query Response Time plugin

Thank You!
- Jervin Real, jervin.real@percona.com, Technical Services Manager, APAC
- Michael Coburn, michael.coburn@percona.com, Principal Technical Account Manager, USA