Choosing a MySQL HA Solution Today. Choosing the best solution among a myriad of options


Questions...Questions...Questions??? How to zero in on the right solution

You can't hit a target if you don't have one!

Many in upper management don't really know what they want and then just end up saying they want it all! Your job is to determine the most essential component for success.

No solution is one-size-fits-all; rather, the best you can hope for is one-size-fits-most.

What is the problem you are trying to solve? A very important question!

Redundancy vs. Scaling vs. High Availability

These are not necessarily all the same!

Redundancy: you need multiple copies of data in the event of a disaster.
Scaling: you need to increase read and/or write throughput.
High Availability: you need to minimize outage duration.

CAP Theorem

Choose any two of the following:

Consistency: all nodes see the same data at the same time.
Availability: every request receives a response about whether it succeeded or not.
Partition Tolerance: the system continues to operate despite arbitrary partitioning due to network failures.

Guaranteeing Consistency

MySQL replication is great, but on its own it does not guarantee consistency across all nodes. There is always the potential for data to be out of sync, since transactions can be lost during failover and for other reasons.

Galera-based clusters such as PXC are certification-based to prevent this!

Can I afford to lose data?

It depends on the application. Apps should check status codes on transactions to be sure they were committed. Many do not!

Lost transactions: during failover, simple replication schemes can lose data.

Inconsistent nodes: without conflict detection and resolution, inconsistency is unavoidable. Run pt-table-checksum often to check for inconsistent data across replication nodes, or use a Galera-based distributed cluster, such as PXC, with its certification process.
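The idea behind checking replicas for drift can be sketched in a few lines. This is only an illustration loosely in the spirit of a chunked checksum comparison, not how pt-table-checksum is implemented; the function names, the in-memory row lists, and the position-based chunking are all assumptions made for the example.

```python
import hashlib

def chunk_checksums(rows, chunk_size=2):
    """Return {chunk_index: digest} for (primary_key, value) rows sorted by key."""
    ordered = sorted(rows)
    sums = {}
    for start in range(0, len(ordered), chunk_size):
        chunk = ordered[start:start + chunk_size]
        # Hash a stable text form of the chunk; hypothetical encoding for the sketch.
        sums[start // chunk_size] = hashlib.sha256(repr(chunk).encode("utf-8")).hexdigest()
    return sums

def differing_chunks(source_rows, replica_rows, chunk_size=2):
    """Return the sorted chunk indexes whose checksums differ between two nodes."""
    src = chunk_checksums(source_rows, chunk_size)
    rep = chunk_checksums(replica_rows, chunk_size)
    return sorted(i for i in src.keys() | rep.keys() if src.get(i) != rep.get(i))

# Made-up data: the replica has silently diverged on the row with key 4.
source = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")]
replica = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "DAVE")]
print(differing_chunks(source, replica))
```

Because the chunks here are cut by position rather than by key range, a missing row would shift every later chunk; a real tool has to handle that case more carefully.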

Avoiding a Single Point of Failure

What is watching your system? Is anything standing ready to intervene in a failure?

For replication, take a look at MHA and MySQL Orchestrator. Both are great tools for performing failover to a replica. There are others.

For PXC, failover is typically much faster, but it is not the perfect solution in every case.

Can I afford lost transactions?

Many MySQL DBAs worry about setting innodb_flush_log_at_trx_commit to 1 for ACID compliance, and about sync_binlog, but then use replication with no consistency checks! Is this logically consistent?

PXC maintains consistency through certification.

Conflict Detection & Resolution

Galera's Certification Process

1. The transaction continues on a node as normal until it reaches the COMMIT stage.
2. Changes are collected into a writeset.
3. The writeset is sent to all nodes for certification.
4. PKs are used to determine if the writeset can be applied.
5. If certification fails, the writeset is dropped and the transaction is rolled back. If it succeeds, the transaction commits and the writeset is applied to all of the nodes.

All nodes reach the same decision on every transaction, so the process is deterministic.
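The steps above can be sketched as a toy model. This is a simplified illustration, not Galera's actual code: the Writeset and Certifier classes, the field names, and the rule "conflict if a writeset ordered after the transaction's snapshot touched the same key" are assumptions chosen to show why every node reaches the same verdict when writesets arrive in the same order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Writeset:
    origin: str            # node where the transaction ran (hypothetical label)
    last_seen_seqno: int   # last global seqno the transaction had seen
    keys: frozenset        # primary keys the transaction modified

class Certifier:
    """Per-node certification state; every node runs the same logic."""

    def __init__(self):
        self.seqno = 0      # position in the global writeset order
        self.certified = [] # (seqno, keys) of writesets that passed

    def certify(self, ws):
        self.seqno += 1     # every delivered writeset takes a slot in the order
        for seqno, keys in self.certified:
            # Conflict: a writeset the transaction never saw changed the same key.
            if seqno > ws.last_seen_seqno and keys & ws.keys:
                return False
        self.certified.append((self.seqno, ws.keys))
        return True

# Two transactions race on the same row; a third touches a different row.
ordered = [
    Writeset("node1", 0, frozenset({("accounts", 1)})),
    Writeset("node2", 0, frozenset({("accounts", 1)})),
    Writeset("node3", 0, frozenset({("accounts", 2)})),
]

# Each node certifies the same writesets in the same order, independently.
decisions = {}
for node in ("node1", "node2", "node3"):
    certifier = Certifier()
    decisions[node] = [certifier.certify(ws) for ws in ordered]
print(decisions)
```

In this run the second writeset loses to the first on every node, which is the point: no coordination round is needed beyond agreeing on the order.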

Do I want Failover or a Distributed System?

Failover pitfalls: failover systems have a monitor which detects failed nodes and moves services elsewhere if available. Failover takes time!

Distributed systems to the rescue: distributed systems minimize failover time.

Automatic or Manual Failover?

Advantage of manual failover: a human can usually make a better decision as to whether failover is necessary. Automated systems rarely get it perfect, but they can be close!

Advantages of automatic failover: more nines due to minimized outages, and no need to wait for a DBA to perform the failover.

How Fast Does Failover Have to Occur?

Replication / MHA / MMM: depends upon how long it takes for pending replica transactions to complete before failover can occur. Typically around 30 seconds.

DRBD: typically between 15 and 30 seconds.

PXC / MySQL Cluster: VERY fast failover. Typically less than 1 second, depending upon the load balancer.

How Many 9s Do You Really Need?

Every manager always says, "As many as I can get." That sounds great, but the reality is that tradeoffs are required! Many applications can tolerate a few minutes of downtime with minimal impact.
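To make the tradeoff concrete, the downtime budget behind each extra nine is simple arithmetic. The sketch below assumes a 365-day year and uses the roughly 30-second replication failover figure from the previous slide as an example input; the function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored

def downtime_budget_seconds(nines):
    """Allowed downtime per year for `nines` nines of availability (2 means 99%)."""
    return SECONDS_PER_YEAR / 10 ** nines

def failovers_within_budget(nines, failover_seconds):
    """How many failovers of the given length fit into the yearly budget."""
    return int(downtime_budget_seconds(nines) // failover_seconds)

for nines in (2, 3, 4, 5):
    budget = downtime_budget_seconds(nines)
    print(f"{nines} nines: {budget / 60:.1f} min/year, "
          f"room for {failovers_within_budget(nines, 30)} failovers of 30 s")
```

Three nines leaves about 8.8 hours a year, while five nines leaves a little over five minutes, which is only about ten 30-second failovers with nothing left over for anything else.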


Do I need to scale reads and/or writes?

Scaling reads: most solutions offer the ability to read from multiple nodes or replicas. MHA, PXC, MySQL Cluster, and others are well suited for this.

Scaling writes: many people wrongly try to scale writes by writing to multiple nodes in PXC, which leads to conflicts. Others try it with Master-Master replication, which is also problematic. Possibly the best solution in this regard is MySQL Cluster.

What about provisioning new nodes?

Replication: largely, this is a manual process. MySQL Utilities makes this easier than ever.

Distributed clusters: PXC and MySQL Cluster make this much easier. PXC uses state transfer (either SST or IST) to automate the process for cluster nodes.

The Rule of Threes

With PXC, try to have three of everything. If you span data centers, have three data centers. If your nodes are on a switch, try to have three switches.

PXC really needs at least three nodes in the cluster. An odd number is preferred for voting reasons.

Forget about trying to keep a cluster alive during a failure with only two data centers. You are better off making one a DR site. Forget about custom weighting to try to get by on two data centers, too. The 51% rule will get you anyway!
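The voting argument can be checked with a small calculation. This is a simplified model that assumes quorum means strictly more than half of the total weight remains reachable; the site names and weights are invented, and the real quorum calculation has details this sketch leaves out.

```python
def has_quorum(reachable_weight, total_weight):
    """Strict majority: exactly half is not enough."""
    return 2 * reachable_weight > total_weight

def survives_loss_of(sites, failed_site):
    """sites maps a site name to its total vote weight (hypothetical values)."""
    total = sum(sites.values())
    reachable = sum(w for name, w in sites.items() if name != failed_site)
    return has_quorum(reachable, total)

layouts = {
    "two equal data centers": {"dc1": 2, "dc2": 2},
    "two data centers, dc1 weighted up": {"dc1": 3, "dc2": 2},
    "three data centers": {"dc1": 1, "dc2": 1, "dc3": 1},
}

for label, sites in layouts.items():
    outcome = {site: survives_loss_of(sites, site) for site in sites}
    print(label, outcome)
```

With two equal sites, losing either one drops the survivor to exactly half, so the cluster stops. Weighting one site only moves the problem: the cluster survives losing the light site but not the heavy one. Only the three-site layout survives the loss of any single site.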

How many data centers do I have?

What if I only have 1 data center? You can gain protection against a single failed node or more, depending on cluster size.

What if I have 2 data centers? You should probably be considering the second data center as a DR solution.

What about 3 or more? This is the most robust solution when using Galera-based clusters such as PXC.

How do I plan for Disaster Recovery?

Make sure the DR node(s) can handle the traffic, even if at a reduced performance level.

Options for replicating from a PXC cluster to a DR site:

Asynchronous replication from PXC to a single node
Asynchronous replication from PXC to a replication topology
Asynchronous replication from PXC to another PXC cluster

What storage engine(s) do I need?

MHA: not storage engine dependent. Works with all storage engines.

PXC: requires InnoDB. Support for MyISAM is experimental and should not be used in production.

MySQL Cluster: requires the NDB storage engine.

Load Balancer Options

HAProxy: open-source software solution. Cannot split reads and writes. If that is a requirement, the app will need to do it!

F5 BigIP: typical hardware solution.

MaxScale: can do read/write splitting.

Elastic Load Balancer (ELB): Amazon solution.

What happens if the cluster reboots?

A power outage in a single data center could lead to issues. PXC can be configured to auto-bootstrap, but this may not always work when all nodes lose power simultaneously: while the server is running, the grastate.dat file shows -1 for seqno.

Surviving a reboot: a normal shutdown sets seqno properly, which is helpful if nodes are shut down by a system administrator for a reboot or other such process.
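The role of seqno after a full-cluster restart can be illustrated with a short sketch that reads each node's saved state and picks the most advanced node as the bootstrap candidate. The sample file contents, UUIDs, and helper names below are made up for the example, and the "highest seqno wins" rule is a simplification of the real recovery procedure.

```python
def parse_grastate(text):
    """Parse 'key: value' lines from a grastate.dat-style file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state

def pick_bootstrap_candidate(files):
    """files maps node name to file text. Return the node with the highest
    known seqno, or None if every node reports -1 (unclean shutdown)."""
    best_node, best_seqno = None, -1
    for node, text in files.items():
        seqno = int(parse_grastate(text).get("seqno", "-1"))
        if seqno > best_seqno:
            best_node, best_seqno = node, seqno
    return best_node

TEMPLATE = """# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000001
seqno:   {seqno}
"""

clean_shutdown = {n: TEMPLATE.format(seqno=s)
                  for n, s in (("node1", 1432), ("node2", 1440), ("node3", 1438))}
power_loss = {n: TEMPLATE.format(seqno=-1) for n in ("node1", "node2", "node3")}

print(pick_bootstrap_candidate(clean_shutdown))
print(pick_bootstrap_candidate(power_loss))
```

After clean shutdowns the files identify the most advanced node directly. After a simultaneous power loss every file still says -1, so the files alone cannot answer the question and the position has to be recovered some other way.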

Do I need to be able to read after writing?

Asynchronous replication does not guarantee consistent views of data across nodes.

PXC offers causal reads: a node waits for pending events to be applied before processing additional queries, guaranteeing a consistent read state across nodes.
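The difference between a plain read and a causal read can be shown with a toy replica. This is a single-threaded model with invented names: in a real server the read waits while a separate applier catches up, whereas here the read itself drains the pending queue to keep the example deterministic.

```python
from collections import deque

class Replica:
    """Toy node that receives events first and applies them later."""

    def __init__(self):
        self.data = {}
        self.applied_seqno = 0
        self.pending = deque()  # (seqno, key, value) received but not yet applied

    def receive(self, seqno, key, value):
        self.pending.append((seqno, key, value))

    def apply_next(self):
        seqno, key, value = self.pending.popleft()
        self.data[key] = value
        self.applied_seqno = seqno

    def read(self, key):
        """Plain read: returns whatever has been applied so far."""
        return self.data.get(key)

    def causal_read(self, key, wait_for_seqno):
        """Read only after everything up to wait_for_seqno has been applied."""
        while self.applied_seqno < wait_for_seqno:
            if not self.pending:
                raise TimeoutError("event not received yet")
            self.apply_next()
        return self.data.get(key)

replica = Replica()
replica.receive(1, "balance", 100)  # written elsewhere, delivered but not applied

stale = replica.read("balance")
fresh = replica.causal_read("balance", wait_for_seqno=1)
print(stale, fresh)
```

The plain read misses the write that was just acknowledged elsewhere; the causal read pays a wait in exchange for seeing it.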

What if I do a lot of data loading?

In the recent past, conventional wisdom was to use replication rather than PXC in such scenarios. MTS does help if data is distributed over multiple schemas, but it is not a fit for all situations.

PXC is now a viable option, since we discovered a bug in Galera that kept it from properly splitting large transactions.

Have I taken precautions against split brain?

Split brain occurs when a cluster has its nodes divided from one another, most often due to a network blip, and the nodes form two or more new and independent (and thus divergent) clusters.

PXC is configured to go into a non-primary state and refuse to take traffic. A newer setting in PXC allows dirty reads on non-primary nodes.

Does my app require high concurrency?

Newer approaches to replication, such as Multi-Threaded Slaves (MTS), allow for parallel threads. (PXC has had this from the beginning.)

MTS allows a replica to have multiple SQL threads, all with their own relay logs.

Enable GTID to make backups via Percona XtraBackup safer, because SHOW SLAVE STATUS cannot be trusted to give the relay log position.

Am I limited on RAM?

Some distributed solutions, such as MySQL Cluster, require a lot of RAM, even with file-based tables. Be sure to plan appropriately.

PXC works much more like a stand-alone node.

How stable is my network?

Networks are never really 100% reliable. Some network problems are due to outside factors, such as system resource contention (especially on virtual machines). Network problems can trigger inappropriate failovers.

Use LAN segments with PXC to minimize network traffic across the WAN.

Making the right choice depends upon...

Knowing what you really need!
Knowing your options.
Knowing your constraints!
Understanding the pros/cons of each solution.
Setting expectations properly!

Q&A Your chance to ask questions!