Introduction Storage Processing Monitoring Review. Scaling at Showyou. Operations. September 26, 2011

Similar documents
DEMYSTIFYING BIG DATA WITH RIAK USE CASES. Martin Schneider Basho Technologies!

relational Relational to Riak Why Move From Relational to Riak? Introduction High Availability Riak At-a-Glance

Distributed Computation Models

What am I? Bryan Hunt Basho Client Services Engineer Erlang neophyte JVM refugee Be gentle

From Relational to Riak

Putting together the platform: Riak, Redis, Solr and Spark. Bryan Hunt

Tuning Intelligent Data Lake Performance

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju

Scaling. Marty Weiner Grayskull, Eternia. Yashh Nelapati Gotham City

Riak. Distributed, replicated, highly available

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona

Redis to the Rescue? O Reilly MySQL Conference

Introduction to the Active Everywhere Database

Before proceeding with this tutorial, you must have a good understanding of Core Java and any of the Linux flavors.

#IoT #BigData. 10/31/14

Tuning Intelligent Data Lake Performance

Eventually Consistent HTTP with Statebox and Riak

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

HDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1

Effective Testing for Live Applications. March, 29, 2018 Sveta Smirnova

HBase Solutions at Facebook

Developing MapReduce Programs

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Accelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite. Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona Percona Technical Webinars 9 May 2018

Distributed Data Management Replication

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

Scalable Streaming Analytics

CS 655 Advanced Topics in Distributed Systems

MySQL High Availability

How to pimp high volume PHP websites. 27. September 2008, PHP conference Barcelona. By Jens Bierkandt

Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016

Aurora, RDS, or On-Prem, Which is right for you

Triple R Riak, Redis and RabbitMQ at XING

/ Cloud Computing. Recitation 9 March 15th, 2016

Job sample: SCOPE (VLDBJ, 2012)

Working with Velti NoSQL Roadshow Basel

Spotify. Scaling storage to million of users world wide. Jimmy Mårdell October 14, 2014

VOLTDB + HP VERTICA. page

ApsaraDB for Redis. Product Introduction

Python, PySpark and Riak TS. Stephen Etheridge Lead Solution Architect, EMEA

Relational databases

Search Engines and Time Series Databases

Non-Relational Databases. Pelle Jakovits

Migrating and living on RDS/Aurora. life after Datacenters

Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database.

CGAR: Strong Consistency without Synchronous Replication. Seo Jin Park Advised by: John Ousterhout

Remote Procedure Call. Tom Anderson

What is the Future of PostgreSQL?

Advanced Databases ( CIS 6930) Fall Instructor: Dr. Markus Schneider. Group 17 Anirudh Sarma Bhaskara Sreeharsha Poluru Ameya Devbhankar

Improvements in MySQL 5.5 and 5.6. Peter Zaitsev Percona Live NYC May 26,2011

<Insert Picture Here> MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure

Why NoSQL? Why Riak?

MySQL Replication. Rick Golba and Stephane Combaudon April 15, 2015

Dr. Chuck Cartledge. 3 Dec. 2015

How you can benefit from using. javier

Choosing a MySQL HA Solution Today

PhxSQL: A High-Availability & Strong-Consistency MySQL Cluster. Ming

The Stream Processor as a Database. Ufuk

Using MySQL for Distributed Database Architectures

Data-Intensive Distributed Computing

Clustering Lecture 8: MapReduce

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

CIB Session 12th NoSQL Databases Structures

Extreme Computing. NoSQL.

Practical MySQL Performance Optimization. Peter Zaitsev, CEO, Percona July 02, 2015 Percona Technical Webinars

Conceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.

Database Architectures

Apache Flink. Alessandro Margara

Scaling. Yashh Nelapati Gotham City. Marty Weiner Krypton. Friday, July 27, 12

Preventing and Resolving MySQL Downtime. Jervin Real, Michael Coburn Percona

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services

What Came First? The Ordering of Events in

MySQL Cluster Web Scalability, % Availability. Andrew

HA solution with PXC-5.7 with ProxySQL. Ramesh Sivaraman Krunal Bauskar

To Shard or Not to Shard That is the question! Peter Zaitsev April 21, 2016

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

Scaling for Humongous amounts of data with MongoDB

Tuning Enterprise Information Catalog Performance

Utilizing Databases in Grid Engine 6.0

Choosing a MySQL HA Solution Today. Choosing the best solution among a myriad of options

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures

Design Patterns for Large- Scale Data Management. Robert Hodges OSCON 2013

ProxySQL Tutorial. With a GPL license! High Performance & High Availability Proxy for MySQL. Santa Clara, California April 23th 25th, 2018

Scaling Up from 1000 to 10 Nodes

How to setup Orchestrator to manage thousands of MySQL servers. Simon J Mudd 3 rd October 2017

CISC 7610 Lecture 2b The beginnings of NoSQL

FROM LEGACY, TO BATCH, TO NEAR REAL-TIME. Marc Sturlese, Dani Solà

Hadoop An Overview. - Socrates CCDH

Hadoop. copyright 2011 Trainologic LTD

Fixing Twitter.... and Finding your own Fail Whale. John Adams Twitter Operations

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

Accelerating NoSQL. Running Voldemort on HailDB. Sunny Gleason March 11, 2011

Key Features. High-performance data replication. Optimized for Oracle Cloud. High Performance Parallel Delivery for all targets

MySQL HA Solutions. Keeping it simple, kinda! By: Chris Schneider MySQL Architect Ning.com

IBM InfoSphere Streams v4.0 Performance Best Practices

Transactions and ACID

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

ProxySQL - GTID Consistent Reads. Adaptive query routing based on GTID tracking

Transcription:

Scaling at Showyou Operations September 26, 2011

I m Kyle Kingsbury Handle aphyr Code http://github.com/aphyr Email kyle@remixation.com Focus Backend, API, ops

What the hell is Showyou?

Nontrivial complexity

Challenges ˆ Scanning social networks ˆ Feeds ˆ Search ˆ Trends ˆ Responsive client experience

Challenges ˆ Scanning social networks ˆ Feeds ˆ Search ˆ Trends ˆ Responsive client experience ˆ Everything fails all the time

Storage

We left MySQL

We left MySQL ˆ Changing the schema requires downtime ˆ Crashes ˆ Master-slave lag ˆ Slow restarts ˆ Node replacements difficult ˆ Fully normalized queries complex, slow

MySQL does scale But there are tradeoffs

Riak ˆ Key/value store ˆ Homogenous ˆ Scales linearly with nodes ˆ Excellent durability/recoverability ˆ Eventually consistent

We use Riak as our durable datastore ˆ Users, feeds, videos, etc ˆ Highly denormalized ˆ Limited MR queries (feeds, etc) ˆ Latency-bounded MR jobs are Erlang ˆ Hot-deployable ˆ Extensive use of conflict resolution ˆ Made possible by Risky

Riak at Showyou ˆ 51 million keys (153 M replicated) ˆ 100 GB of data (300 GB replicated) ˆ 260 gets/sec (baseline) ˆ 75 puts/sec (baseline) ˆ Capable of over 3000 ops/sec

SSDs are amazing WD 7200RPM ˆ 100 ops/sec ˆ 95%: 100-300ms Micron RealSSD P300 ˆ 1000+ ops/sec ˆ 95%: 3-5ms

When Riak fails, ˆ Another node takes up the slack ˆ Clients connected to that node reconnect to others ˆ Typically no service interruption ˆ However, latencies may rise ˆ Especially for MR jobs

Riak has downsides ˆ Difficult to debug ˆ Membership changes are dangerous ˆ Significantly slower than MySQL ˆ (Bitcask) All keys must fit in memory ˆ Mapreduce is only appropriate for known keys ˆ List-keys can take down your cluster Long story short: it s only a KV store

+Redis

We use Redis for fast, temporary state ˆ List of users ˆ List of videos ˆ Counters ˆ Queues Incredibly fast, excellent primitives

When Redis fails, ˆ Daemons using those indexes pause ˆ Frontend service continues ˆ Bitcask scanners and incremental updaters repair any lost data

When Redis fails, ˆ Daemons using those indexes pause ˆ Frontend service continues ˆ Bitcask scanners and incremental updaters repair any lost data Eventually consistent.

We also use SOLR extensively ˆ Supplements Riak ˆ Complex indices ˆ Full-text search ˆ Analytics More on that later...

Processing

Do one thing well Lots of small processes handling well-defined tasks ˆ Easier to debug ˆ Easier to test ˆ Simplifies parallelism ˆ Simplifies error handling ˆ Less likely to cause total system failure

Minimize Shared State ˆ Vector clocks for concurrent modification ˆ Queues for message passing ˆ Riak for durable storage ˆ Redis for fast synchronous state

Crash by Default ˆ Someone else will take your work ˆ Repair constantly ˆ Assume everybody is out to kill you

Distribute ˆ Multiple threads, processes, hosts ˆ Failover IPs with Heartbeat ˆ Rolling restarts mean frequent deploys and nobody notices ˆ Losing a node is no big deal ˆ Scaling out is easy

Monitoring

UState: A state aggregator

Receive states over protobufs Host backend1.showyou.com Service feed merger rate Time unix epoch seconds State ok Metric 12.5 Description 12.5 feed items/sec

Query states ˆ state = "warning" or state = "critical" ˆ service = "api %" and host!= null

ˆ Combine states together (sum, average,... ) ˆ Send email on changes ˆ Forward to another UState server ˆ Forward to Graphite ˆ Dashboard

Understand application behavior

When can we...?

It s 23:15 PST.

It s 23:15 PST. Do you know where YOUR database is?

http://github.com/aphyr/ustate

Recap ˆ Robust, discrete components ˆ Highly distributed ˆ Message passing ˆ Eventual consistency ˆ Comprehensive monitoring

Thanks! ˆ Basho (esp. Pharkmillups!) ˆ Formspring ˆ Bump