Lessons learned while automating MySQL in the AWS cloud
Stephane Combaudon, DB Engineer - Slice

Our environment
5 DB stacks.
  Data volume ranging from 30GB to 2TB+.
  Master + N slaves for each stack.
  Master is handling all application traffic.
  Specialized slaves (backups, reports, custom jobs).
Stacks are duplicated in several dimensions.
  Regions (US, JP).
  Environment (QA, Staging, Prod).

Problems we wanted to fix
Hosted in the AWS cloud, but relying on a 3rd-party vendor for DB automation.
The 3rd-party vendor became a liability.
  Expensive.
  Automation only works with MySQL 5.5.
  Security issues.
  Failover unavailable.

Our goals
Create our own MySQL automation!
Instance lifecycle
  DBA/SA people create an instance from a template.
  Software gets provisioned automatically.
  Data gets provisioned automatically.
  Replication (if slave) starts automatically.
Bonus: add the ability to fail over to a slave easily.
How can we get there?

Technical Solution Overview
Creating instances from a template: CloudFormation.
Installing software: Chef.
Data provisioning: Galera? Custom scripts?
High availability: Galera? MHA?

CloudFormation
Provides a way to manage AWS resources through templates (infrastructure as code).
A CloudFormation template
  Is a JSON file.
  Describes the configuration of your resources.
Pro: any AWS resource can be described.
Con: learning curve is steep.
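To make "a template is a JSON file" concrete, here is a minimal sketch that builds such a template from Python and prints it as JSON. The resource names (MySQLLaunchTemplate, MySQLGroup), the parameters and the instance type are hypothetical placeholders, not the templates used at Slice.

```python
import json


def build_template(stack_role: str, instance_count: int) -> dict:
    """Return a minimal CloudFormation template as a Python dict.

    All logical names, parameters and the instance type below are
    illustrative assumptions.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"MySQL {stack_role} servers (illustrative sketch)",
        "Parameters": {
            "AmiId": {"Type": "AWS::EC2::Image::Id"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
        },
        "Resources": {
            # Describes how each MySQL instance is launched.
            "MySQLLaunchTemplate": {
                "Type": "AWS::EC2::LaunchTemplate",
                "Properties": {
                    "LaunchTemplateData": {
                        "ImageId": {"Ref": "AmiId"},
                        "InstanceType": "r4.large",
                    }
                },
            },
            # Fixed-size group: a failed instance gets replaced automatically.
            "MySQLGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "MinSize": str(instance_count),
                    "MaxSize": str(instance_count),
                    "VPCZoneIdentifier": {"Ref": "SubnetIds"},
                    "LaunchTemplate": {
                        "LaunchTemplateId": {"Ref": "MySQLLaunchTemplate"},
                        "Version": {
                            "Fn::GetAtt": [
                                "MySQLLaunchTemplate",
                                "LatestVersionNumber",
                            ]
                        },
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template("slave", 2), indent=2))
```

Generating the JSON from code is only one option; the same structure can be written by hand.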

AWS Components - 1st try
[Diagram: Master, Slave 1 ... Slave N in an Autoscaling Group; MHA Manager on a standalone EC2 instance.]

Data Provisioning
Galera
  Natively solves the data provisioning + HA issues.
  But not a good fit for all our workloads + app changes needed.
Let's write a custom provisioning script!
For a master
  Do nothing. We only create a master for a new (empty) stack.
For a slave
  Restore the latest available backup.
  Start replication.
But how will servers know whether they're a master or a slave?

MHA
Automated vs semi-automated mode
  App is not ready for automatic MySQL failover.
  Semi-automated mode is chosen: master failure detection is manual, slave promotion is a single command.
MHA requirement
  MHA configuration needs to know the exact instances of the replication topology.

Back to AWS components
CloudFormation allows you to add dependencies between components.
  Create MHA Manager.
  Add IP of MHA Manager in some file of the MySQL servers when they are created by CloudFormation.
  During MySQL bootstrap, add IP of MySQL server to MHA config.
But there's a catch: if MHA Manager goes down, we lose our failover ability.

AWS Components - 2nd try
[Diagram: Master, Slave 1 ... Slave N in an Autoscaling Group; MHA Manager in an Autoscaling Group of 1 instance.]

Back to AWS components again
CloudFormation is no longer able to know the IP of the MHA Manager in advance.
Therefore MySQL servers can no longer register themselves in the MHA config file.
Once again, we need service discovery.

Service Discovery - 1
No such service available in our infrastructure.
We tried several options.
  Zookeeper, etcd: another infrastructure to manage.
  DynamoDB: race conditions.
In the end the AWS API seemed a strong enough option.

Service Discovery - 2
All components in a CF stack share a tag (aws:cloudformation:stack-name).
Within a CF stack, names of ASGs are predictable.
We can then find the IP address of all instances within a specific ASG.
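The lookup described above can be sketched as a single EC2 DescribeInstances call filtered on tags. This is an illustration, not the script used in production: the function name and its arguments are assumptions, and pagination is ignored for brevity. The client is passed in, so any object with a describe_instances method works (for example a boto3 EC2 client).

```python
from typing import Any, List


def asg_private_ips(ec2_client: Any, stack_name: str, asg_name: str) -> List[str]:
    """Return the private IPs of the running instances of one ASG in one CF stack.

    ec2_client is expected to behave like a boto3 EC2 client.
    Pagination (NextToken) is not handled in this sketch.
    """
    response = ec2_client.describe_instances(
        Filters=[
            # Tag shared by all components of a CloudFormation stack.
            {"Name": "tag:aws:cloudformation:stack-name", "Values": [stack_name]},
            # Tag set by Auto Scaling on the instances it launches.
            {"Name": "tag:aws:autoscaling:groupName", "Values": [asg_name]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    ips = []
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            ips.append(instance["PrivateIpAddress"])
    return sorted(ips)
```

With boto3 installed, a call could look like asg_private_ips(boto3.client("ec2"), "my-stack", "my-asg-name"), where both names are placeholders.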

Back to MHA config
Now instances are able to register themselves when bootstrapping.
Upon instance termination, MHA config needs to be updated.
  Hooks can be added to run a custom script, but not very fast.
What else can we do?

Another MHA problem - 1
MHA command lines are not very user friendly.
  mha@manager$ masterha_master_switch --conf=/etc/mha.conf --master_state=alive --new_master_host=172.25.2.73 --orig_master_is_new_slave --interactive=0
We built a wrapper script.
  Simpler options.
  Autocompletion.
  root@manager# db_ha promote --new_master=172.25.2.73
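A wrapper of this kind can be quite small. The sketch below maps the short db_ha promote form onto the long masterha_master_switch command line shown above. Only the option names come from the slide; the Python structure is an assumption, and the command is printed rather than executed.

```python
import argparse
from typing import List, Optional

MHA_CONF = "/etc/mha.conf"  # path taken from the example command line


def build_switch_command(new_master: str, conf: str = MHA_CONF) -> List[str]:
    """Build the masterha_master_switch command for an online master switch."""
    return [
        "masterha_master_switch",
        f"--conf={conf}",
        "--master_state=alive",
        f"--new_master_host={new_master}",
        "--orig_master_is_new_slave",
        "--interactive=0",
    ]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse 'db_ha promote --new_master=<ip>'."""
    parser = argparse.ArgumentParser(prog="db_ha")
    actions = parser.add_subparsers(dest="action", required=True)
    promote = actions.add_parser("promote", help="promote a slave to master")
    promote.add_argument("--new_master", required=True)
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> List[str]:
    args = parse_args(argv)
    command = build_switch_command(args.new_master)
    # A real wrapper would run the command here, e.g. with
    # subprocess.run(command, check=True). This sketch only prints it.
    print(" ".join(command))
    return command
```

Calling main(["promote", "--new_master=172.25.2.73"]) prints the long command from the slide. Autocompletion is left out of this sketch.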

Another MHA problem - 2
Wait, couldn't we also sync the MHA conf when running this script?
Yes, of course!
MHA conf is synced on demand with this script.
  Ensures the conf is always up-to-date when we need it.
  No more need to care about MySQL instance termination.

Another MHA problem - 3
So far, so good, but...
  Some of our slaves are not suitable at all to become master.
  We want no_master=1 in MHA config for these servers.
MHA Manager just knows a bunch of MySQL servers, how can it add the no_master flag?
We need to refine our AWS components diagram again.

AWS Components - 3rd try
[Diagram: MySQL servers (Master, Slave 1 ... Slave N) split between ASG1 (Potential Masters) and ASG2 (Slaves only); MHA Manager in an Autoscaling Group of 1 instance.]
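With two ASGs, the no_master flag follows directly from group membership. A minimal sketch of how the MHA config could be regenerated from the two IP lists is shown below. The function name is hypothetical, and the global settings of the [server default] section (users, SSH options and so on) are deliberately left empty.

```python
from typing import Iterable, List


def render_mha_conf(potential_masters: Iterable[str], slaves_only: Iterable[str]) -> str:
    """Render an MHA config: one [serverN] section per MySQL instance.

    Instances from the slaves-only group get no_master=1 so that MHA
    never promotes them.
    """
    lines: List[str] = ["[server default]"]
    index = 1
    for ip in potential_masters:
        lines += [f"[server{index}]", f"hostname={ip}"]
        index += 1
    for ip in slaves_only:
        lines += [f"[server{index}]", f"hostname={ip}", "no_master=1"]
        index += 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example IPs are placeholders.
    print(render_mha_conf(["172.25.2.73", "172.25.2.74"], ["172.25.3.10"]))
```

Because the file is rebuilt from scratch on every run, terminated instances simply disappear from it.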

Recap so far
At this point
  We can create an arbitrary number of MySQL servers.
  MHA config is synced automatically.
  Any node (MySQL or Manager) that fails is rebuilt automatically thanks to ASGs.
All good? Not exactly...

Back to Data Provisioning
We have separate code paths for master and slaves.
But how do we know if a new instance is a master or a slave?
Let's use the AWS API again.
  If the instance is part of ASG2: slave.
  If the instance is part of ASG1: 1st instance is master, others are slaves.
We add a replication_role tag for each instance.
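The role decision above fits in a few lines. The sketch below makes one assumption that the slide does not spell out: "1st instance" is read as "no other instance is already tagged replication_role=master". The function and parameter names are hypothetical, and two instances booting at the same moment could still race.

```python
from typing import Iterable


def replication_role(
    asg_name: str,
    peer_roles: Iterable[str],
    slaves_only_asg: str = "ASG2",
) -> str:
    """Decide the replication_role tag of a bootstrapping instance.

    peer_roles holds the replication_role tag values of the other
    instances of the stack, as read from the AWS API.
    """
    if asg_name == slaves_only_asg:
        return "slave"
    # In the potential-masters group, only the first instance becomes master.
    if "master" in set(peer_roles):
        return "slave"
    return "master"
```

The result then selects the provisioning code path: nothing to do for a master, restore plus start replication for a slave.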

Backups - XtraBackup vs EBS snapshots
EBS snapshots
  Simple to use and super fast (incremental backups).
XtraBackup
  Very complex, super slow.
  Incremental backups are difficult.
Let's use EBS snapshots then? Well, not so fast...

Backups vs Restores
EBS snapshots are great for backups, not for restores.
  Data is lazily loaded from S3, i.e. warmup takes forever.
Example on our write-heaviest cluster
  Restore + replication catchup with XB: 8-9 hours.
  Same with EBS snapshots: I gave up after 2 days.

Backup Script
XtraBackup takes a full backup.
Backup is uploaded to S3.
Frequency of backups is stack dependent.
  Configuration file in S3.
Tags are added on backup servers.
  Timestamp and status of latest backup.
  Progress bar if a backup is being taken.
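One way to combine the per-stack frequency with the timestamp tag is a simple "is a backup due?" check. This is a sketch under assumptions: the slide does not give the tag format or the layout of the configuration file, so an ISO 8601 timestamp and an interval in hours are used here for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_is_due(
    last_backup_tag: Optional[str],
    interval_hours: int,
    now: datetime,
) -> bool:
    """Return True if the stack needs a new backup.

    last_backup_tag is the timestamp tag of the latest backup, assumed
    to be ISO 8601 with a UTC offset, or None if no backup exists yet.
    interval_hours is the stack-specific frequency from the config file.
    """
    if not last_backup_tag:
        return True
    last_backup = datetime.fromisoformat(last_backup_tag)
    return now - last_backup >= timedelta(hours=interval_hours)


if __name__ == "__main__":
    print(backup_is_due(None, 24, datetime.now(timezone.utc)))
```

A scheduler on the backup server could run this check and start XtraBackup only when it returns True.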

Roadmap
Migration to 5.7
  Automation already supports both 5.5 and 5.7.
Better monitoring of errors on restores.
Integration with PMM
  Implemented but broken.
Realtime binlog streaming to Elastic Filesystem
  Implemented but broken.
Group Replication instead of MHA.

The end
Thanks for attending!!
Questions/comments: stephane@slice.com