TWOO.COM CASE STUDY CUSTOMER SUCCESS STORY

Similar documents
DATABASE SCALE WITHOUT LIMITS ON AWS

HOW CLUSTRIXDB RDBMS SCALES WRITES & READS

THREE KEY ELEMENTS TO MAINTAINING A SUCCESSFUL E-COMMERCE WEBSITE

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

Introduction to the Active Everywhere Database

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

<Insert Picture Here> MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure

Intro Cassandra. Adelaide Big Data Meetup.

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Rule 14 Use Databases Appropriately

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

5 Fundamental Strategies for Building a Data-centered Data Center

CISC 7610 Lecture 2b The beginnings of NoSQL

QLIK INTEGRATION WITH AMAZON REDSHIFT

Architekturen für die Cloud

Case Study: Tata Communications Delivering a Truly Interactive Business Intelligence Experience on a Large Multi-Tenant Hadoop Cluster

COMPARISON WHITEPAPER. Snowplow Insights VS SaaS load-your-data warehouse providers. We do data collection right.

Become a MongoDB Replica Set Expert in Under 5 Minutes:

The Data Explosion. A Guide to Oracle s Data-Management Cloud Services

MariaDB MaxScale 2.0, basis for a Two-speed IT architecture

When, Where & Why to Use NoSQL?

Strategic Briefing Paper Big Data

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Embedded Technosolutions

Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card

Now on Now: How ServiceNow has transformed its own GRC processes

Module - 17 Lecture - 23 SQL and NoSQL systems. (Refer Slide Time: 00:04)

Microsoft Exam

Progress DataDirect For Business Intelligence And Analytics Vendors

Exploring Amazon RDS MySQL Second Tier Read Replica

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

Five Common Myths About Scaling MySQL

RAIFFEISENBANK BULGARIA

Apache Hadoop Goes Realtime at Facebook. Himanshu Sharma

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016

Postgres Plus and JBoss

REFERENCE ARCHITECTURE Quantum StorNext and Cloudian HyperStore

Archiving, Backup, and Recovery for Complete the Promise of Virtualisation Unified information management for enterprise Windows environments

Oracle NoSQL Database and Cisco- Collaboration that produces results. 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

SQL Server 2017 for your Mission Critical applications

Oracle NoSQL Database Overview Marie-Anne Neimat, VP Development

Technology Overview ScaleArc. All Rights Reserved.

How to Scale Out MySQL on EC2 or RDS. Victoria Dudin, Director R&D, ScaleBase

Business today runs on technology. Modernize Your Datacenter. Challenges facing IT. Modernize Your Datacenter 10/17/ % Enterprise IT

מרכז התמחות DBA. NoSQL and MongoDB תאריך: 3 דצמבר 2015 מציג: רז הורוביץ, ארכיטקט מרכז ההתמחות

MarkLogic 8 Overview of Key Features COPYRIGHT 2014 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Advanced Database Technologies NoSQL: Not only SQL

Identifying Workloads for the Cloud

Backup 2.0: Simply Better Data Protection

10. Replication. Motivation

Practical MySQL Performance Optimization. Peter Zaitsev, CEO, Percona July 02, 2015 Percona Technical Webinars

Big Data Analytics. Rasoul Karimi

Bringing OpenStack to the Enterprise. An enterprise-class solution ensures you get the required performance, reliability, and security

Transactions and ACID

NoSQL Databases Analysis

Real-time Protection for Microsoft Hyper-V

70-532: Developing Microsoft Azure Solutions

FLORIDA DEPARTMENT OF TRANSPORTATION PRODUCTION BIG DATA PLATFORM

Challenges for Data Driven Systems

To Shard or Not to Shard That is the question! Peter Zaitsev April 21, 2016

Intermedia s Private Cloud Exchange

CIB Session 12th NoSQL Databases Structures

Renovating your storage infrastructure for Cloud era

Oracle Database Exadata Cloud Service Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE

Open Source Database Ecosystem in Peter Zaitsev 3 October 2016

IBM Compose Managed Platform for Multiple Open Source Databases

Tim Cohn TimWCohn

DATACENTER SERVICES DATACENTER

MongoDB - a No SQL Database What you need to know as an Oracle DBA

Nutanix Tech Note. Virtualizing Microsoft Applications on Web-Scale Infrastructure

e BOOK Do you feel trapped by your database vendor? What you can do to take back control of your database (and its associated costs!

Microsoft Big Data and Hadoop

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

INTRODUCTION. In this guide, I m going to walk you through the most effective strategies for growing an list in 2016.

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies

Microsoft SQL Server on Stratus ftserver Systems

JAVASCRIPT CHARTING. Scaling for the Enterprise with Metric Insights Copyright Metric insights, Inc.

NetApp Clustered Data ONTAP 8.2 Storage QoS Date: June 2013 Author: Tony Palmer, Senior Lab Analyst

relational Relational to Riak Why Move From Relational to Riak? Introduction High Availability Riak At-a-Glance

2013 AWS Worldwide Public Sector Summit Washington, D.C.

A Robust, Flexible Platform for Expanding Your Storage without Limits

THE MORE THINGS CHANGE THE MORE THEY STAY THE SAME FOR BACKUP!

Mining for insight. Osma Ahvenlampi, CTO, Sulake Implementing business intelligence for Habbo

Drecom Accelerates Anytime, Anywhere Gaming

Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems

Hybrid Data Platform

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

MySQL Cluster Web Scalability, % Availability. Andrew

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Middle East Technical University. Jeren AKHOUNDI ( ) Ipek Deniz Demirtel ( ) Derya Nur Ulus ( ) CENG553 Database Management Systems

STATE OF MODERN APPLICATIONS IN THE CLOUD

CASE STUDY INSURANCE. Innovation at Moody's Analytics: A new approach to database provisioning using SQL Clone

Solid IT infrastructure foundations for Telford Homes

~3333 write ops/s ms response

NoSQL database and its business applications

High Noon at AWS. ~ Amazon MySQL RDS versus Tungsten Clustering running MySQL on AWS EC2

CASE STUDY FINANCE. Enhancing software development with SQL Monitor

Pump Solutions Group CASE STUDY. The Whitewater gateway and the Veeam software work very well together to improve VM data protection.

Windows Server 2012/R2 Overview

Transcription:

TWOO.COM CUSTOMER SUCCESS STORY With over 30 million users, Twoo.com is Europe s leading social discovery site. Twoo runs the world s largest scale-out SQL deployment, with 4.4 billion transactions a day without a full-time DBA. ClustrixDB has seamlessly supported the exponential growth in Twoo users and new application features, expanding from 24 to 168 cores over the past 3 years. With this kind of scale-out simplicity designed for cloud, ClustrixDB is not your run-of-the-mill legacy SQL database. Read on to learn more about Twoo s massive workload 2 Terabyte tables with seven-way joins and complex analytics and the benefits of u sing a scale-out SQL database solution. CASE STUDY

With over 11 million unique visitors monthly, Twoo.com has grown into one of the largest social discovery sites and most popular network of its kind. What makes Twoo special, and sets it apart from its competitors, is that it s free to use and features innovative matchmaking algorithms that bring you closer to the people you want to meet. The site offers an enormous number of active and real members; 3 million active users every single day. It is available in 38 languages and 200 countries, with support for desktop and mobile devices. A Platform for Hyper-Scale Growth to Millions of Users When Facebook first saw rapid growth, very few scalable database options existed. The Facebook operational database was built and still runs on top of a highly customized sharded MySQL environment with an abstraction layer added in the middle. Separately they built Cassandra for inbox search. Websites such as Facebook and Twitter rolled out their own custom infrastructure because they had no choice. Twoo.com, on the other hand, has been able to build and grow entirely on a single scale-out Clustrix database, enabling sophisticated application and user queries. The distributed database industry has evolved quickly over the last few years. Innovative companies now have the option of using proven, off-the-shelf database platforms that don t require in-house database engineering so they can keep all of their engineering talent focused on application innovation. With ClustrixDB, Twoo.com never had to hardcode their application for database sharding, making the core simpler and cheaper to maintain. And the customer service provided by ClustrixDB engineers made sure that Twoo.com could run their business with five-9s of availability. Twoo Takes the New Path The Twoo.com team was passionate about avoiding a sharded database deployment. Massive Media, the company behind Twoo.com, has another website called Netlog. To scale Netlog s database in 2007, Massive Media decided to partition, or shard it by language. As the database continued to grow, this strategy proved insufficient and they were forced to shard by language and user ID, which required a very large in-house development effort to maintain and optimize the database layer. In addition, the Netlog development team was limited in the types of queries they could efficiently execute in this architecture. It was a maintenance headache. There is always some risk, but a trust relationship needs to build. We really liked Clustrix since it is building ground-breaking technology. Twoo.com started out on a single MySQL server and quickly ran into scale issues. At that time ClustrixDB was being tested as a slave, but when Twoo s MySQL server could not handle the load and crashed, they switched over completely to ClustrixDB. By 2010, the product was in production and they have not looked back since. World s Largest Scale-Out SQL Deployment Twoo.com supports millions of complex user interactions every day, resulting in billions of transactions per day. It is also a data-intensive application, with petabytes of read/write data volume each month. 2

Twoo.com runs entirely on a single ClustrixDB database cluster. It is the largest scale-out SQL database in the world. Twoo.com has a master cluster and a slave cluster for disaster recovery. 30 MILLION USERS 4.4 BILLION TRANSACTIONS PER DAY 70,000 TRANSACTIONS PER SECOND The Twoo Application The goal of the Twoo application is to be the most fun way to meet new people near you. It allows users to log in and discover interesting people who match their interests. Twoo.com is not just a dating site, since it also enables users to find friends and develop relationships. With the site s high number of users and fast growth, the application has challenging requirements. Twoo runs the entire website and stores all their business-critical data on ClustrixDB. In addition, they have logging data that goes to Hadoop, where offline analytics and reporting are performed, albeit with a multi-hour delay. Primary Data Profile Core to the Twoo.com site is the user information, which is updated very frequently. The site also includes the text and messaging system allowing users to communicate with one another. To support the matchmaking algorithm, the application uses many multi-billion row tables. Naturally these tables have an intense query load, and schemas change as the company s algorithm evolves. User_contactlist recently had an online schema change for a table that is nearly 2 terabytes in size and ClustrixDB handled it easily. Complex Analytics Queries & Recommendations Ad hoc analytic queries are performed routinely in ClustrixDB. Some queries can be challenging. For example, a male may be interested in finding females with certain characteristics between 30 to 35 years of age, but we can only show him females who are also looking for males who are 33 years of age. 3

This type of complex query needs split-second response time and is typical of social applications. It presents a more complicated problem than is dealt with by traditional e-commerce sites such as Amazon. For example, a bookstore can easily suggest that users who bought book_1 also buy book_2. Clearly book_2 is not going to decide it s not interested in customer A! An example query (bottom) to get the top messages in your inbox as you log in has a 5-table join, including the chat_message table with 4.7 billion rows and the chat_conversation table that is more than 300GB in size. (Right) This example query looks up the list of friends of a user; it involves a 7-table join including uxxx_cxxx, a table that is 1.9TB in size. Strict Low-Latency Requirements A critical requirement for Twoo.com is that every page must render within 800 milliseconds. Given the network latency, resources (such as images and javascript), and application work, the total database access time must be less than 40 milliseconds. ClustrixDB is up to the challenge with average query times under 5 milliseconds. Other Options Sharding MySQL and NoSQL are other options that companies struggling with scale typically consider. These options are often inferior. Here s why these options didn t work for Twoo. 4

Why Clustrix is Preferable to Sharding We spent a lot of time on optimizing for performance and scale. Now we can spend those resources better. With ClustrixDB, database-wide ad hoc queries can be run easily to mine the live database for useful trends a feature that was not possible in a sharded architecture. In their previous web property Netlog, the company needed multiple DBAs to maintain the sharded infrastructure. Now they don t have any dedicated DBAs, only application developers. Toon Coppens, CTO and co-founder of Twoo, said, It does not make sense to build a new site using the old system of sharding MySQL. Pre-ClustrixDB, we spent a lot of time on optimizing for performance and scale. Now we can spend those resources better. Why NoSQL Was Not an Option NoSQL databases didn t provide any real-time analytics capabilities. Twoo really cares about SQL and ACID guarantees. In addition to their free user tier, Twoo provides users with premium for-fee functionality. This level of service requires the data on their end to be correct and consistent all the time with no issues. When first considering the database options, Twoo looked at MongoDB, Redis, and other NoSQL data stores, but the company could not tolerate any inconsistencies or irregularities in the data store. They also found that tool support for the NoSQL databases was lacking. Tools such as phpmyadmin show data in an understandable way, but the company was not able to find such tools for the NoSQL databases. To make matters worse, NoSQL databases don t provide any real-time analytics capabilities, as they offer no support for joins and other SQL constructs. Twoo s Journey with ClustrixDB & Its Rapid Growth Since 2010 With ClustrixDB, we have not run into scaling issues anymore. As we need capacity, we just add nodes and see linear growth. We have seamlessly gone from a 3-node cluster to a 21-node cluster. Nicolas Van Eenaeme, CIO of Twoo s parent company Massive Media Growth The design decision to use ClustrixDB enabled very rapid development, and the Twoo.com website went from concept to launch in less than six months from December 2010 to May 2011 with less than 30 developers. The 5

Twoo.com website experienced greater than 1,000% growth from June to December 2011. The site started on a single MySQL server and quickly ran out of capacity. Since switching to ClustrixDB, they have been able to handle growth in traffic simply by adding more nodes. Now they re pushing the limits of the MySQL replication protocol as they run the highest-volume MySQL replication stream on the planet to their disaster recovery cluster. Business Impact On the rapid growth experienced by Twoo.com, Van Eenaeme said, Growth as a web 2.0 company was challenging because in 1.5 years it became the largest dating site. When asked if he believes ClustrixDB helped the company grow, he added, It definitely helped us grow faster. Our company wants to build the website and its features and not worry about the database or database development. We don t have a database lab like Clustrix has. We also didn t want to hire a team of DBAs. Clustrix has many high-traffic customers. If you d ask us again, we d happily go down this path. We don t regret it one bit since then. ClustrixDB is designed to be self-managing, and ClustrixDB Insight provides a rich view into the database. The rich built-in statistics inform both the user and the automation inside ClustrixDB. ClustrixDB intelligently distributes data across the cluster. Tables are divided into slices and each slice has a replica. Indexes are internally treated like tables and get their own distribution. The patented Rebalancer running in the background makes sure that the data and load are always evenly distributed. New nodes are added with a single command or click and the Rebalancer will move data to the new node. If node or disk failure occurs, the Rebalancer will regenerate lost copies and move them to other nodes automatically. If the cluster is under stress or some event needs attention, the database will send an email to notify the user and will optionally alert the Clustrix support team. Clustrix Clustrix provides the leading scale-out SQL database engineered for the cloud. With ClustrixDB you can build innovative business critical applications that deliver real-time analytics on live operational data with massive transactional volume. Our exceptional customer service team supports more than one trillion transactions per month across a range of industry segments including Ad Tech, e-commerce, and social analytics. Clustrix customers include AOL, engage:bdr, MedExpert, Photobox, Rakuten, Symantec, and Twoo.com. Headquartered in San Francisco, Clustrix is funded by HighBAR Partners, Sequoia Capital, U.S. Venture Partners, Don Listwin, and ATA Ventures. ClustrixDB is available as a free community edition for developers, a software download that runs on any cloud, and on the AWS marketplace. To learn more about Clustrix, visit us at www.clustrix.com 6