Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

Transcription:

Building High Performance Apps using NoSQL Swami Sivasubramanian General Manager, AWS NoSQL

Building high performance apps
- There is a lot to building high performance apps:
  - Scalability
  - Performance at high percentiles
  - Availability
- Database choice has a disproportionate impact on these

History of Databases in Amazon
- Amazon.com is a big startup
  - Composed of thousands of services
  - Each service is developed and operated independently
  - No central mandate or coordination (slows down execution..)
- A vast ecosystem.. Survival of the fittest!
- We have gone through multiple iterations on the following question:
  What is the right database architecture for Amazon apps?

Relational Era
- An Amazon.com page is composed of responses from 1000s of independent services
- Query patterns for different services are different
  - Catalog service is usually heavy key-value
  - Ordering service is very write intensive (key-value)
  - Catalog search has a different pattern for querying
- However: usually, a relational database is the default database choice!

Relational Era (contd..)
- How did it go?
  - Poor availability
  - Poor scalability (Q4/Christmas was a big project)
  - Exorbitantly high costs for hardware, software and administration

Lessons
- Relational databases are used even when they are not the right tool!
  - Didn't need all the query capabilities an RDBMS provides
- Need a database that can provide:
  - Extreme availability
  - Seamless scalability: no more re-architecture for planning
- Embrace failures and make them part of normal operations
- (Hey, this was the early 2000s, when these were not obvious)

Distributed Systems Era: Amazon Dynamo
- Replicated DHT with consistency management
  - Consistent hashing
  - Optimistic replication
  - Sloppy quorum
  - Anti-entropy mechanisms
  - Object versioning
- Specialist tool:
  - Limited query capabilities
  - Simpler consistency
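
A minimal sketch of the consistent hashing idea listed above, not Dynamo's actual implementation. The class name `HashRing`, the `vnodes` parameter, the use of MD5 for tokens, and the node names are all assumptions made for illustration. Each node owns several tokens on a ring, and a key is stored on the first `n` distinct nodes found walking clockwise from the key's hash.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node contributes `vnodes` tokens to the ring.
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._tokens = [token for token, _ in ring]

    @staticmethod
    def _hash(key):
        # First 8 bytes of MD5 as an integer token (an arbitrary choice).
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def preference_list(self, key, n=3):
        """Return the first n distinct nodes clockwise from the key's token."""
        start = bisect.bisect_right(self._tokens, self._hash(key))
        owners = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
replicas = ring.preference_list("user-42")  # three distinct nodes
```

The property that matters for incremental scalability is that adding a node only moves the keys that the new node takes over; every other key keeps its previous owner.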

Amazon Dynamo Usage
- Higher availability
- Incremental scalability
- Lower costs.. however less query capability
- Adopted by services for which scale and availability are most important
- Dynamo inspired many other internal variants for distributed caching, messaging etc..
- Services that needed complex queries used RDBMS

Amazon Dynamo: Lessons learned
- What could have been better?
- Lack of strong consistency
  - Forced a model which may not fit every app
- We forced every engineer to learn distributed systems
  - Version clocks, sloppy quorum, anti-entropy, cluster balancing
- Operational complexity
  - Required each service to carry pagers
  - Manage their fleet
  - Deal with performance tuning
- In the end, Amazon developers wanted Dynamo as a service, not a product!

Cloud Era
- Time when AWS was just starting
- Developers loved:
  - Amazon S3 for storage
  - Amazon EC2 for compute
- Why?
  - Lets them focus on their app
  - Not deal with operations
- They wanted the equivalent of S3 for databases
  - Seamless scalability
  - No operational overhead

Cloud Era: Amazon DynamoDB
- Non-Relational
- Fast & Predictable Performance
- Seamless Scalability
- Easy Administration

We built DynamoDB to make developers' lives easier

Where is Amazon.com right now?
- NoSQL (DynamoDB) has been a huge central piece for Amazon
  - Most of the online workloads are using DynamoDB
  - No other solution meets our scale, availability and cost needs
- We use other cloud databases too!
  - RDS for relational workloads
  - ElastiCache for caching
  - Redshift for warehouse applications
  - EMR for analytics

So much for Amazon.com, what about AWS and its customers?

State of NoSQL in AWS: Brief Recap

DynamoDB: Looking ahead
- We will continue to invest in making sure DynamoDB continues to be:
- Secure
- Extremely reliable
  - Three datacenter replication
  - Synchronous replication
  - Extremely well tested replication pipeline
  - No compromise on reliability for costs or performance
- Seamlessly scalable
- Cost effective
  - Launched in April: 75% price drop for storage + 35% drop for throughput
  - Reserved capacity options: 1-year = 53% discount; 3-year = 76% discount
  - 4KB read capacity units

DynamoDB: Looking ahead (contd..)
- We will continue to improve query capabilities
  - Launched local secondary indexes (April 2013)
  - Launched Parallel Scan API (May 2013)
  - Launched geospatial indexing library today!
  - Lot more to come..
- We will continue to reduce your operational overhead
  - Example: Dynamic DynamoDB, autoscale-dynamodb, etc..
- We will continue to integrate with other AWS services seamlessly
  - EMR integration
  - One click copy to Redshift (Feb 2013)
  - Data Pipeline template to backup/restore (Mar 2013)
  - More to come..

ElastiCache
- Managed caching service
- Offered Memcached as a service
- Added Redis support yesterday!
- Look out for more caching features here..!

Run your own database on EC2
- There is a rich ecosystem of NoSQL solutions in EC2
  - MongoDB
  - Cassandra
  - Riak
  - Graph databases
- Pick the right solutions based on your needs.

Getting back to original question

How do I choose the right database for my app?

So many choices, what to pick? Choose the right tool for each job.

Redux..
- Decision point #1: Optimize query patterns
- Decision point #2: Plan for (business) success
- Decision point #3: Plan for (infrastructure) failures
- Decision point #4: What is the operational expense for my pick?

Decision #1: Choosing right query patterns
- Understand your app's query pattern carefully
- Identify which queries need to scale linearly with growth in user base
- For those queries, pick a database architecture that scales linearly
  - Perf should be the same for a 10MB table or 10GB table or 10TB table.
- If your db does not grow with your business growth, you are signing up for operational hell
  - Don't think about sharding as an afterthought

Decision #1: Choosing right query patterns (contd..)
- Separate query patterns carefully
  - Interactive parts of your app need to perform well and scale
  - Avoid non-scalable queries in the interactive user workflow
- Good real-time query
  - Example: Load user preferences, set user preferences
- Bad real-time query
  - Example: Compute all friends of friends for user A who are interested in X
- For complex queries, pre-compute the results and store them in a cache
  - Example: Compute recommendations for user A and store in a cache
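
The pre-compute-and-cache pattern above could be sketched roughly as follows. `TTLCache` is a hypothetical in-process stand-in for a real cache such as ElastiCache, and the function names, the `recs:` key prefix, and the `compute` callback are assumptions made for illustration. The expensive query runs in a batch job; the interactive path only does a key lookup.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a real cache (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def put(self, key, value):
        # Store the value together with its expiry time.
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # drop stale entries lazily
            return None
        return value


def precompute_recommendations(user_ids, compute, cache):
    """Batch job: run the expensive query offline and cache each result."""
    for user_id in user_ids:
        cache.put(f"recs:{user_id}", compute(user_id))


def load_recommendations(user_id, cache):
    """Interactive path: a key lookup only, never the expensive query."""
    cached = cache.get(f"recs:{user_id}")
    return cached if cached is not None else []
```

Returning an empty list on a miss keeps the interactive workflow fast; whether a miss should instead trigger a background recomputation depends on the application.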

Optimize Query Patterns
- For time series data:
  - Separate cold data from hot data
  - Enables you to separate read heavy workload from write heavy workload
- Example: an ordering application is a great example for time series data
  - Past few days' orders are hot
  - 6-month-old orders are cold
- Recommendation:
  - Create an ordering table every week
  - Store recent orders in this week's table
  - Archive the old tables or dial down their read throughput
  - You can query across tables
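
One way the table-per-week recommendation might look in application code. The naming scheme (`orders_<ISO year>_w<ISO week>`) and both function names are assumptions for illustration; the slides do not prescribe a naming convention. Writes go to the current week's table, and a query over a date range fans out to the tables that cover it.

```python
from datetime import date, timedelta


def weekly_table_name(day, prefix="orders"):
    """Name of the table holding orders placed on `day` (hypothetical scheme)."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}_{iso_year}_w{iso_week:02d}"


def tables_for_range(start, end, prefix="orders"):
    """Ordered list of weekly tables covering the inclusive range [start, end]."""
    names = []
    day = start
    while day <= end:
        name = weekly_table_name(day, prefix)
        if not names or names[-1] != name:
            names.append(name)
        day += timedelta(days=1)
    return names


# Sunday 10 Nov 2013 falls in ISO week 45; Monday 11 Nov starts week 46.
print(tables_for_range(date(2013, 11, 10), date(2013, 11, 13)))
# → ['orders_2013_w45', 'orders_2013_w46']
```

Older tables returned by `tables_for_range` are the ones whose read throughput can be dialed down or that can be archived.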

Decision #2: Plan for success
- Understand scale needs
  - Talk to your CFO/product visionary/business owner
  - What does success look like?
- Don't postpone tough decisions until you are successful
  - Re-architecting while dealing with growth is a pain
  - Pick query flexibility vs. scalability carefully
- Don't take shortcuts
  - Plan to sleep well for the other 51 weekends

Decision #2: Plan for success (contd..)
- Test for scale
- You will find strange bottlenecks in these tests
  - Connection timeouts
  - Cluster reconfiguration issues
  - Load balancing..
- Test how the system scales
  - More throughput capacity (for DynamoDB)
  - More cache nodes (for ElastiCache)
  - More EC2 instances (for run your own database)

Decision #3: Plan for failure
- Do not treat failure as a special case
- Replication and redundancy are key!
- Pick replication technology carefully
  - Synchronous vs. asynchronous
    - Hint: If you care about your data, pick synchronous replication
  - Multi-AZ vs. Single-AZ
    - Hint: If you care about availability, pick Multi-AZ replication
- Pick replication factor carefully
  - Two is a terrible number in distributed systems
  - Three is better (and is not a crowd)
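
The "two is a terrible number" point can be illustrated with majority-quorum arithmetic. This assumes a scheme in which an operation must be acknowledged by a majority of replicas, which is an assumption for illustration; the slides do not say which quorum rule is meant. The function names are likewise hypothetical.

```python
def majority(replicas):
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """Replicas that can be down while a majority is still reachable."""
    return replicas - majority(replicas)


for n in (1, 2, 3, 5):
    print(n, majority(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

Under this rule, two replicas tolerate no more failures than one replica does, while three replicas tolerate the loss of any single replica.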

Decision #3: Test for failures..
- Plans are only good intentions..
- In DynamoDB, we test for failures
  - Unit tests
  - Mock tests
  - Cluster tests
  - Performance tests
  - Datacenter failure tests
  - Network degradation tests
  - Dependency failure tests
- Also, we use strong theoretical foundation when necessary..
- Fault injection testing is key!
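
A small sketch of what a fault injection test can look like, not a description of how DynamoDB is tested. `FlakyStore` is a hypothetical test double that fails its first few calls, and `get_with_retry` is a hypothetical client helper; the test checks that the client survives the injected failures and gives up cleanly when they persist.

```python
class FlakyStore:
    """Test double that raises ConnectionError for the first `fail_first` calls."""

    def __init__(self, data, fail_first):
        self._data = dict(data)
        self._fail_first = fail_first
        self.calls = 0

    def get(self, key):
        self.calls += 1
        if self.calls <= self._fail_first:
            raise ConnectionError("injected failure")
        return self._data.get(key)


def get_with_retry(store, key, attempts=3):
    """Read `key`, retrying on ConnectionError up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return store.get(key)
        except ConnectionError as error:
            last_error = error
    raise last_error


# Two injected failures are absorbed by three attempts.
store = FlakyStore({"user-1": "prefs"}, fail_first=2)
assert get_with_retry(store, "user-1") == "prefs"
assert store.calls == 3
```

A production client would normally add backoff between attempts; it is left out here to keep the sketch deterministic.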

Decision #4: What is the operational overhead?
- Understand the operational costs of your app
- Don't underestimate the cost of:
  - Managing hardware
  - Maintaining and patching software
  - Configuring and keeping multi-AZ replication
- Plan for repeated game days and hardware upgrades
- Plan for optimizing costs
- Plan for operations staff
- If a cloud service works for you and meets your needs (#1 to #3), great!
- If not, do it on your own but plan accordingly.

Simple rule of thumb..
- When you need seamless scale and super availability: DynamoDB
- Complex query workloads that need relational capabilities: choose Amazon RDS
  - Usually MySQL is a good choice
- Caching:
  - ElastiCache Memcached for scaling key-value
  - ElastiCache Redis for advanced data structures
- For data warehousing: choose Amazon Redshift
- Cases where these services are not the right fit: build your own on EC2!

Thank you! swami@amazon.com