DATABASES IN THE CMU-Q December 3 rd, 2014

Similar documents
Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

VOLTDB + HP VERTICA. page

A Predictive Load Balancing Service for Cloud-Replicated Databases

Squall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases

E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems

BenchPress: Dynamic Workload Control in the OLTP-Bench Testbed

Tour of Database Platforms as a Service. June 2016 Warner Chaves Christo Kutrovsky Solutions Architect

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

Part #1 Part #2 Part #3. Background Engineering Oracle Rant

MySQL In the Cloud. Migration, Best Practices, High Availability, Scaling. Peter Zaitsev CEO Los Angeles MySQL Meetup June 12 th, 2017.

Part #1 Part #2 Part #3. Background Engineering Oracle Rant

Introduction to Database Services

IBM Compose Managed Platform for Multiple Open Source Databases

Achieving Horizontal Scalability. Alain Houf Sales Engineer

Architectural challenges for building a low latency, scalable multi-tenant data warehouse

STORAGE LATENCY x. RAMAC 350 (600 ms) NAND SSD (60 us)

DATABASE SCALE WITHOUT LIMITS ON AWS

Autonomous Database Level 100

White Paper / Azure Data Platform: Ingest

28 February 1 March 2018, Trafo Baden. #techsummitch

5 Fundamental Strategies for Building a Data-centered Data Center

Oracle Autonomous Database

RACKSPACE ONMETAL I/O V2 OUTPERFORMS AMAZON EC2 BY UP TO 2X IN BENCHMARK TESTING

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

Amazon AWS-Solution-Architect-Associate Exam

Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud

What is Data Warehouse like

BERLIN. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Data 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp.

Safe Harbor Statement

HyPer-sonic Combined Transaction AND Query Processing

The Freedom to Choose

Aurora, RDS, or On-Prem, Which is right for you

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

OtterTune. Automatic Database Management System Tuning Through Large-scale Machine Learning

Transform your data estate with cloud, data and AI

Amazon Aurora Deep Dive

Model-Driven Geo-Elasticity In Database Clouds

Data-Intensive Distributed Computing

How to Scale Out MySQL on EC2 or RDS. Victoria Dudin, Director R&D, ScaleBase

Pervasive Insight. Mission Critical Platform

An Insider s Guide to Oracle Autonomous Transaction Processing

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture

Best Practices and Performance Tuning on Amazon Elastic MapReduce

Performance Isolation in Multi- Tenant Relational Database-asa-Service. Sudipto Das (Microsoft Research)

AWS Solution Architect Associate

CompSci 516 Database Systems

Modernizing Business Intelligence and Analytics

Modern Data Warehouse The New Approach to Azure BI

Cassandra - A Decentralized Structured Storage System. Avinash Lakshman and Prashant Malik Facebook

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

Oracle #1 RDBMS Vendor

Big Data solution benchmark

Benchmarking Database Cloud Services

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies

Data 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp.

CA ERwin Data Modeler s Role in the Relational Cloud. Nuccio Piscopo.

HyPer-sonic Combined Transaction AND Query Processing

DOLLY: Virtualization-Driven Database Provisioning for the Cloud

MySQL Cluster Web Scalability, % Availability. Andrew

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

Rickard Linck Client Technical Professional Core Database and Lifecycle Management Common Analytic Engine Cloud Data Servers On-Premise Data Servers

What is new in the cloud? Donald Kossmann ETH Zurich

Percona Server for MySQL 8.0 Walkthrough

S-Store: Streaming Meets Transaction Processing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 17 Database Systems as a Cloud Service

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant

E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems

Oracle made it easy: Cloud DB Vergleich

HyPer on Cloud 9. Thomas Neumann. February 10, Technische Universität München

Cloud & AWS Essentials Agenda. Introduction What is the cloud? DevOps approach Basic AWS overview. VPC EC2 and EBS S3 RDS.

Which technology to choose in AWS?

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Distributed Databases: SQL vs NoSQL

Course Outline. Upgrading Your Skills to SQL Server 2016 Course 10986A: 3 days Instructor Led

Tutorial Outline. Map/Reduce vs. DBMS. MR vs. DBMS [DeWitt and Stonebraker 2008] Acknowledgements. MR is a step backwards in database access

Highly Available Database Architectures in AWS. Santa Clara, California April 23th 25th, 2018 Mike Benshoof, Technical Account Manager, Percona

High Noon at AWS. ~ Amazon MySQL RDS versus Tungsten Clustering running MySQL on AWS EC2

Migrating Enterprise Applications to the Cloud Session 672. Leighton L. Nelson

Shark: SQL and Rich Analytics at Scale. Reynold Xin UC Berkeley

Trade- Offs in Cloud Storage Architecture. Stefan Tai

Cloud Analytics and Business Intelligence on AWS

Amazon Aurora Deep Dive

Memory-Based Cloud Architectures

AWS Solutions Architect Associate (SAA-C01) Sample Exam Questions

SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING

SQL Server on Linux and Containers

Acknowledgements. Beyond DBMSs. Presentation Outline

Conceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.

SQL Server Everything built-in

Crescando: Predictable Performance for Unpredictable Workloads

Overview of AWS Security - Database Services

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

HYRISE In-Memory Storage Engine

AWS Database Migration Service

Informatica Cloud Data Integration Winter 2017 December. What's New

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

Cross-Site Virtual Network Provisioning in Cloud and Fog Computing

Transcription:

DATABASES IN THE CLOUD @andy_pavlo CMU-Q 15-440 December 3 rd, 2014

OLTP vs. OLAP databases. Source: https://www.flickr.com/photos/adesigna/3237575990

On-line Transaction Processing Fast operations that ingest new data and then update state using ACID transactions. Only access a small amount of data. Volume: 1k to 1m txn/sec Latency: >1-50 ms Database Size: 100s GB to 10s TB 3

Example the OLTP database. -line game in Game Application Framework Click Stream OLTP DBMS Pre-computed model decides the next level the player is shown. Game Updates 4

Example the OLTP database. -line game in Game Application Framework Click Stream OLTP DBMS Game Updates Real-time Monitoring 5

Database Warehouses Complete history of OLTP databases. Complex queries that analyze large segments of fact tables and combine them with dimension tables. Volume: A couple queries per second Latency: 1-60 seconds Database Size: 100s TB to 10s PB 6

Example Compute model used to guide OLTP DBMS decisions from historical data. Game Application Framework Click Stream OLTP DBMS ETL OLAP DBMS Game Updates New Model 7

OLTP vs. OLAP Storage Format: OLTP Row-oriented OLAP Column-oriented Primary Database Location: OLTP In-Memory OLAP Disks Workloads: OLTP Write-Heavy OLAP Read-Only 8

Things to consider with databases in the cloud. Source: https://www.flickr.com/photos/arvidnn/15285491335

Good Things Better Resource Utilization Elastic Scaling Database-as-a-Service Offerings 10

Better Resource Utilization Combine multiple silos onto overprovisioned resources. Public platform providers achieve better economies of scale. Database machines are (mostly) dead. Optimal multi-tenant placement is a difficult problem. 11

Elastic Scaling Automatically provision new resources on the fly as needed. Scaling up vs. Scaling out. Difficult for OLTP DBMS to continue processing transactions while data migrates. 12

OLTP Scale-out Example Elapsed Time TPC-C Benchmark on H-Store (Fall 2014) Scaling from 3 to 4 nodes E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing R.Taft, E.Mansour, M.Serafini, J.Duggan, A.J. Elmore, A.Aboulnaga, A.Pavlo, M.Stonebraker Proceedings of the VLDB Endowment, vol. 8, iss. 3, pages. 245 256, November 2014. 13

Database-as-a-Service Cloud provider manages physical configuration of a DBMS. Ideal for applications that are co-located in Combine private data with curated databases (i.e., data marts) 14

Bad Things I/O Virtualization File system Replication Security + Privacy Concerns Performance Variance 15

I/O Virtualization Distributed file system stores data transparently across multiple nodes. This causes a DBMS pull data to query push query to data 16

OLAP I/O Virtualization SELECT YEAR(o_date) AS o_year, AVG(o_amount) FROM orders GROUP BY o_year ORDER BY o_year ASC Terabytes! OLAP DBMS Distributed Filesystem 17

OLAP I/O Virtualization SELECT YEAR(o_date) AS o_year, AVG(o_amount) FROM orders GROUP BY o_year ORDER BY o_year ASC OLAP DBMS Bytes! Distributed Filesystem 18

File System Replication The DBMS should not rely on file system replication for durability. OLTP systems maintain replicas in-memory. OLAP systems can store copies of tables in different ways on replica nodes. 19

OLAP Replication Sort Order OLAP DBMS Table 1: Table 2: Table 1: Table 2: name name id id Replica #1 Replica #2 Sort Order 20

OLAP Replication Sort Order Table1.name Table2.name OLAP DBMS Table 1: Table 2: Table 1: Table 2: name name id id Replica #1 Replica #2 Sort Order 21

Security + Privacy Concerns No truly encrypted solution exists. Many companies are unable to use public cloud platforms. 22

Performance Variance DBMSs are sensitive to changes in underlying hardware performance. large fluctuations in performance. 23

OLTP Performance Variance 35% Difference YCSB on MySQL (Winter 2012) Medium EC2 Instances OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, Philippe Cudre-Mauroux Proceedings of the VLDB Endowment, vol. 7, pages. 277 288, December 2013. 24

Cloud database vendors. Source: https://www.flickr.com/photos/alestra/8891585632

Important Features Automatic Back-ups Geo-replication Elasticity / Live Reconfiguration Efficient Multi-Tenancy Workload Awareness 26

Cloud Database Vendors Cloud-friendly systems Database-as-a-Service (DBaaS) 27

Cloud-friendly DBMSs Most DBMS vendors make it easy to deploy on cloud platforms. Others provide support for easy scale-out in a cloud environment. More than just pre-configured instances. 28

OLTP DBaaS Amazon RDS / Aurora Microsoft Azure Google Cloud SQL Database.com ClearDB GenieDB Clustrix 29

Amazon Redshift Google BigQuery Microsoft Azure Snowflake OLAP DBaaS 30

Parting Thoughts The cloud does not magically make database problems go away. DBMS on the cloud. AFAIK, there is no truly autonomous DBMS as of yet. 31

32

END @andy_pavlo