O Reilly RailsConf,

Similar documents
Redis to the Rescue? O Reilly MySQL Conference

Practical MySQL Performance Optimization. Peter Zaitsev, CEO, Percona July 02, 2015 Percona Technical Webinars

NoSQL Databases Analysis

CISC 7610 Lecture 2b The beginnings of NoSQL

To Shard or Not to Shard That is the question! Peter Zaitsev April 21, 2016

MySQL in the Cloud Tricks and Tradeoffs

Scaling MongoDB. Percona Webinar - Wed October 18th 11:00 AM PDT Adamo Tonete MongoDB Senior Service Technical Service Engineer.

Triple R Riak, Redis and RabbitMQ at XING

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

The NoSQL Landscape. Frank Weigel VP, Field Technical Opera;ons

Beating the Final Boss: Launch your game!

Running a Virtualized Splunk Enterprise Infrastructure Ted Knudsen

Database Solution in Cloud Computing

PostgreSQL migration from AWS RDS to EC2

Deploy. A step-by-step guide to successfully deploying your new app with the FileMaker Platform

Docker and HPE Accelerate Digital Transformation to Enable Hybrid IT. Steven Follis Solutions Engineer Docker Inc.

Unlimited Scalability in the Cloud A Case Study of Migration to Amazon DynamoDB

Announcements. Problem Set 3 is due this Tuesday! Midterm graded and will be returned on Friday during tutorial (average 60%)

Fixing Twitter.... and Finding your own Fail Whale. John Adams Twitter Operations

Evolution of the ATLAS PanDA Workload Management System for Exascale Computational Science

Architecture and Design of MySQL Powered Applications. Peter Zaitsev CEO, Percona Highload Moscow, Russia 31 Oct 2014

Microservices. SWE 432, Fall 2017 Design and Implementation of Software for the Web

Introduction to Database Services

Improve WordPress performance with caching and deferred execution of code. Danilo Ercoli Software Engineer

CACHE ME IF YOU CAN! GETTING STARTED WITH AMAZON ELASTICACHE. AWS Charlotte Meetup / Charlotte Cloud Computing Meetup Bilal Soylu October 2013

HOW TO PLAN & EXECUTE A SUCCESSFUL CLOUD MIGRATION

Eventually Consistent HTTP with Statebox and Riak

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016

AUTOMATED RESTORE TESTING FOR TIVOLI STORAGE MANAGER

How to bootstrap a startup using Django. Philipp Wassibauer philw ) & Jannis Leidel

How Enova Financial Uses Postgres. Jim Nasby, Lead Database Architect

My Other Car is a Redis. Etan Grundstein & Sasha Popov DYNAMIC YIELD

Caching Memcached vs. Redis

Matthias Wobben working in Berlin, Germany. Senior Sales Engineer at Nextcloud

MyRocks deployment at Facebook and Roadmaps. Yoshinori Matsunobu Production Engineer / MySQL Tech Lead, Facebook Feb/2018, #FOSDEM #mysqldevroom

How can you implement this through a script that a scheduling daemon runs daily on the application servers?

Proven video conference management software for Cisco Meeting Server

Amazon AWS-Solution-Architect-Associate Exam

Practical MySQL Performance Optimization. Peter Zaitsev, CEO, Percona July 20 th, 2016 Percona Technical Webinars

Beyond 1001 Dedicated Data Service Instances

Buffering to Redis for Efficient Real-Time Processing. Percona Live, April 24, 2018

MySQL Database Scalability

Architekturen für die Cloud

Real-Time & Big Data GIS: Best Practices. Josh Joyner Adam Mollenkopf

Keeping Rails on the Tracks

+ + a journey to zero-downtime

The Long Road from Capistrano to Kubernetes

Principal Solutions Architect. Architecting in the Cloud

Jordan Levesque - Keeping your Business Secure

Introduction Storage Processing Monitoring Review. Scaling at Showyou. Operations. September 26, 2011

Automatic Renewal Using DIY Technology to Create an Improved Patron Experience

Data Center Fundamentals: The Datacenter as a Computer

Conceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.

How you can benefit from using. javier

Amazon AWS and RDS, moving towards it. Dimitri Vanoverbeke Solution Percona

MySQL In the Cloud. Migration, Best Practices, High Availability, Scaling. Peter Zaitsev CEO Los Angeles MySQL Meetup June 12 th, 2017.

Using PostgreSQL in Tantan - From 0 to 350bn rows in 2 years

CIT 668: System Architecture. Amazon Web Services

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

MySQL Performance Improvements

<Insert Picture Here> MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure

Developing Enterprise Cloud Solutions with Azure

Persistence & State. SWE 432, Fall 2016 Design and Implementation of Software for the Web

Identifying Workloads for the Cloud

Rails on HBase. Zachary Pinter and Tony Hillerson RailsConf 2011

Give Your Site a Boost With memcached. Ben Ramsey

Life as a Service. Scalability and Other Aspects. Dino Esposito JetBrains ARCHITECT, TRAINER AND CONSULTANT

Application Deployment

Redis as a Reliable Work Queue. Percona University

Running MySQL on AWS. Michael Coburn Wednesday, April 15th, 2015

Design Patterns for the Cloud. MCSN - N. Tonellotto - Distributed Enabling Platforms 68

PROFESSIONAL. NoSQL. Shashank Tiwari WILEY. John Wiley & Sons, Inc.

Cloud Computing. DB Special Topics Lecture (10/5/2012) Kyle Hale Maciej Swiech

ApsaraDB for Redis. Product Introduction

Case Study. Performance Optimization & OMS Brainvire Infotech Pvt. Ltd Page 1 of 1

BEYOND CLOUD HOSTING. Andrew Melck, Regional Manager DACH,

Using MySQL for Distributed Database Architectures

Map Reduce. Yerevan.

Manual Mysql Query Cache Hit Rate 0

ICALEPS 2013 Exploring No-SQL Alternatives for ALMA Monitoring System ADC

Migrating to Cassandra in the Cloud, the Netflix Way

Home of Redis. April 24, 2017

Run your own Open source. (MMS) to avoid vendor lock-in. David Murphy MongoDB Practice Manager, Percona

Atlassian s Journey Into Splunk

High Noon at AWS. ~ Amazon MySQL RDS versus Tungsten Clustering running MySQL on AWS EC2

Large-Scale Web Applications

CS5412 CLOUD COMPUTING: PRELIM EXAM Open book, open notes. 90 minutes plus 45 minutes grace period, hence 2h 15m maximum working time.

Persistence. SWE 432, Fall 2017 Design and Implementation of Software for the Web

JAVASCRIPT CHARTING. Scaling for the Enterprise with Metric Insights Copyright Metric insights, Inc.

INFRASTRUCTURE BEST PRACTICES FOR PERFORMANCE

Aurora, RDS, or On-Prem, Which is right for you

Redis Tuesday, May 29, 12

Erlang and VoltDB TechPlanet 2012 H. Diedrich

Sample Title. Magento 2 performance comparison in different environments. DevelopersParadise 2016 / Opatija / Croatia

Scaling DreamFactory

The SHARED hosting plan is designed to meet the advanced hosting needs of businesses who are not yet ready to move on to a server solution.

Microservices. Chaos Kontrolle mit Kubernetes. Robert Kubis - Developer Advocate,

Middle East Technical University. Jeren AKHOUNDI ( ) Ipek Deniz Demirtel ( ) Derya Nur Ulus ( ) CENG553 Database Management Systems

How to pimp high volume PHP websites. 27. September 2008, PHP conference Barcelona. By Jens Bierkandt

XP: Backup Your Important Files for Safety

Transcription:

O Reilly RailsConf, 2011-05- 18

Who is that guy? Jesper Richter- Reichhelm / @jrirei Berlin, Germany Head of Engineering @ wooga

Wooga does social games

Wooga has dedicated game teams Cooming soon PHP Cloud Ruby Cloud Ruby Bare metal Erlang Cloud

Flash client sends state changes to backend Flash client Ruby backend

Social games need to scale quite a bit 1 million daily users 200 million daily HTTP requests

Social games need to scale quite a bit 1 million daily users 200 million daily HTTP requests 100,000 DB operagons per second 40,000 DB updates per second

Social games need to scale quite a bit 1 million daily users 200 million daily HTTP requests 100,000 DB operagons per second 40,000 DB updates per second but no cache

A journey to 1,000,000 daily users Start of the journey 6 weeks of pain Paradise Looking back

We used two MySQL shards right from the start data_fabric gem Easy to use Sharding by Facebook s user id Sharding on controller level

Scaling applicalon servers was very easy Running at Amazon EC2 Scalarium for automagon and deployment Role based Chef recipes for customiza5on Scaling app servers up and out was simple

Master- replicalon for DBs worked fine lb app app app

We added a few applicalon servers over Lme lb app app app app app app app app app

Basic setup worked well for 3 months May Jun Jul Aug Sep Oct Nov Dec

A journey to 1,000,000 daily users Start of the journey 6 weeks of pain Paradise Looking back

SQL queries generated by Rubyamf gem Rubyamf serializes returned objects Wrong config => it also loads associagons

SQL queries generated by Rubyamf gem Rubyamf serializes returned objects Wrong config => it also loads associagons Really bad but very easy to fix

New Relic shows what happens during an API call

More traffic using the same cluster lb app app app app app app app app app

Our DB problems began May Jun Jul Aug Sep Oct Nov Dec

MySQL hiccups were becoming more frequent Everything was running fine...... and then DB throughput went down to 10% A]er a few minutes everything stabilizes again...... just to repeat the cycle 20 minutes later!

AcLveRecord s checks caused 20% extra DB load Check connecgon state before each SQL query MySQL process list full of status calls

AcLveRecord s status checks caused 20% extra DB Check connecgon state before each SQL query MySQL process list full of status calls

I/O on MySQL masters slll was the boyleneck New Relic shows DB table usage 60% of all UPDATEs were done on the Gles table

New Relic shows most used tables

Tiles are part of the core game loop Core game loop: 1) plant 2) wait 3) harvest All are operagons on Tiles table.

We started to shard on model, too We put this table in 2 extra shards old master old

We started to shard on model, too We put this table in 2 extra shards 1) Setup new masters as s of old ones old master old new master

We started to shard on model, too We put this table in 2 extra shards 1) Setup new masters as s of old ones old master old new master new

We started to shard on model, too We put this table in 2 extra shards 1) Setup new masters as s of old ones 2) App servers start using new masters, too old master old new master new

We started to shard on model, too We put this table in 2 extra shards 1) Setup new masters as s of old ones 2) App servers start using new masters, too 3) Cut replica5on old master old new master new

We started to shard on model, too We put this table in 2 extra shards 1) Setup new masters as of old ones 2) App servers start using new masters, too 3) Cut replica5on 4) Truncate not- used tables old master old new master new

8 DBs and a few more servers lb app app app app app app app app app app app app app app app app 5les 5les 5les 5les

Doubling the amount of DBs didn t fix it May Jun Jul Aug Sep Oct Nov Dec

We improved our MySQL setup RAID- 0 of EBS volumes Using XtraDB instead of vanilla MySQL Tweaking my.cnf inno_flush_log_at_trx_commit inno_flush_method

Data- fabric gem circumvented AR s internal cache 2x Tile.find_by_id(id) => 1x SELECT

Data- fabric gem circumvented AR s internal cache 2x Tile.find_by_id(id) => 1x SELECT

2 + 2 masters and slll I/O was not fast enough We were geing desperate: If 2 + 2 is not enough,... perhaps 4 + 4 masters will do?

It s no fun to handle 16 MySQL DBs lb app app app app app app app app app app app app app app app app app app 5les 5les 5les 5les

It s no fun to handle 16 MySQL DBs lb app app app app app app app app app app app app app app app app app app 5les 5les 5les 5les 5les 5les 5les 5les

We were at a dead end with MySQL May Jun Jul Aug Sep Oct Nov Dec

I/O remained the boyleneck for MySQL UPDATEs Peak throughput overall: 5,000 writes/s Peak throughput single master: 850 writes/s We could get to ~1,000 writes/s but then what?

Pick the right tool for the job!

Redis is fast but goes beyond simple key/value Redis is a key- value store Hashes, Sets, Sorted Sets, Lists Atomic opera5ons like set, get, increment 50,000 transacgons/s on EC2 Writes are as fast as reads

Shelf Lles : An ideal candidate for using Redis Shelf Lles: { plant1 => 184, plant2 => 141, plant3 => 130, plant4 => 112, }

Shelf Lles : An ideal candidate for using Redis Redis Hash HGETALL: load whole hash HGET: load a single key HSET: set a single key HINCRBY: increase value of single key

Migrate on the fly when accessing new model

Migrate on the fly - but only once true if id could be added else false

Migrate on the fly - and clean up later 1. Let this running everything cools down 2. Then migrate the rest manually 3. Remove the migragon code 4. Wait ungl no fallback necessary 5. Remove the SQL table

MigraLons on the fly all look the same Typical migragon throughput over 3 days Ini5al peak at 100 migra5ons / second

A journey to 1,000,000 daily users Start of the journey 6 weeks of pain Paredise (or not?) Looking back

Again: Tiles are part of the core game loop Core game loop: 1) plant 2) wait 3) harvest All are operagons on Tiles table.

Size mayers for migralons MigraGon check overloaded Redis Migra5on only on startup We overlooked an edge case Next 5me only migrate 1% of users Migrate the rest when everything worked out

In- memory DBs don t like to saving to disk You sgll need to write data to disk eventually SAVE is blocking BGSAVE needs free RAM Dumping on master increased latency by 100%

In- memory DBs don t like to dump to disk You sgll need to write data to disk eventually SAVE is blocking BGSAVE needs free RAM Dumping on master increased latency by 100% Running BGSAVE s every 15 minutes

ReplicaLon puts Redis master under load ReplicaGon on master starts with a BGSAVE Not enough free RAM => big problem Master queue requests that cannot process Restoring dump on s is blocking

Redis has a memory fragmenlon problem 44 GB 24 GB in 8 days

Redis has a memory fragmenlon problem 38 GB 24 GB in 3 days

If MySQL is a truck... Fast enough for reads Can store on disk Robust replicagon h\p://www.flickr.com/photos/erix/245657047/

If MySQL is a truck, Redis is a Ferrari Fast enough for reads Super fast reads/writes Can store on disk Out of memory => dead Robust replicagon Fragile replicagon h\p://www.flickr.com/photos/erix/245657047/

Big and stalc data in MySQL, rest goes to Redis Fast enough for reads Super fast reads/writes Can store on disk Out of memory => dead Robust replicagon Fragile replicagon 256 GB data 60 GB data 10% writes 50% writes h\p://www.flickr.com/photos/erix/245657047/

95% of operalon effort is handling DBs! lb lb app app app app app app app app app app app app app app app app app app app app app app app app app app app app app app app app app app app app app app app redis redis redis redis redis redis redis redis redis redis

We fixed all our problem - we were in Paradise! May Jun Jul Aug Sep Oct Nov Dec

A journey to 1,000,000 daily users Start of the journey 6 weeks of pain Paredise (or not?) Looking back

Data handling is most important How much data will you have? How o]en will you read/update that data? What data is in the client, what in the server? It s hard to refactor data later on

Monitor and measure your applicalon closely On applicagon level / on OS level Learn your app s normal behavior Store historical data so you can compare later React if necessary!

Listen closely and answer truthfully Q & A Jesper Richter- Reichhelm @jrirei slideshare.net/wooga

If you are in Berlin just drop by Thank you Jesper Richter- Reichhelm @jrirei slideshare.net/wooga