The End of an Architectural Era (It's Time for a Complete Rewrite)

Similar documents
Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

Low Overhead Concurrency Control for Partitioned Main Memory Databases. Evan P. C. Jones Daniel J. Abadi Samuel Madden"

Low Overhead Concurrency Control for Partitioned Main Memory Databases

NewSQL Databases. The reference Big Data stack

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

The End of an Architectural Era (It s Time for a Complete Rewrite)

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

The End of an Architectural Era (It s Time for a Complete Rewrite)

Column-Stores vs. Row-Stores: How Different Are They Really?

Using MVCC for Clustered Databases

C-STORE: A COLUMN- ORIENTED DBMS

A Survey Paper on NoSQL Databases: Key-Value Data Stores and Document Stores

NewSQL Databases MemSQL and VoltDB Experimental Evaluation

Distributed KIDS Labs 1

Concurrency Control In Distributed Main Memory Database Systems. Justin A. DeBrabant

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant

COLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE)

Column Stores vs. Row Stores How Different Are They Really?

Traditional RDBMS Wisdom is All Wrong -- In Three Acts. Michael Stonebraker

Main-Memory Databases 1 / 25

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

An Evaluation of Distributed Concurrency Control. Harding, Aken, Pavlo and Stonebraker Presented by: Thamir Qadah For CS590-BDS

Column-Stores vs. Row-Stores: How Different Are They Really?

NewSQL. Database Landscape From: the 451 group. OLTP Focus. NewSQL: Flying on ACID. Cloud DB, Winter 2014, Lecture 14

NewSQL: Flying on ACID

CompSci 516 Database Systems

Traditional RDBMS Wisdom is All Wrong -- In Three Acts "

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

STEPS Towards Cache-Resident Transaction Processing

VOLTDB + HP VERTICA. page

Architecture of C-Store (Vertica) On a Single Node (or Site)

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems

Transactum Business Process Manager with High-Performance Elastic Scaling. November 2011 Ivan Klianev

Transactional Consistency and Automatic Management in an Application Data Cache Dan R. K. Ports MIT CSAIL

CSE 544: Principles of Database Systems

C-Store: A column-oriented DBMS

Scott Meder Senior Regional Sales Manager

CSE544 Database Architecture

Primary/Backup. CS6450: Distributed Systems Lecture 3/4. Ryan Stutsman

Primary-Backup Replication

Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, Samuel Madden Presenter : Akshada Kulkarni Acknowledgement : Author s slides are used with

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

Distributed PostgreSQL with YugaByte DB

Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016

CSE 530A ACID. Washington University Fall 2013

Squall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases

Condusiv s V-locity Server Boosts Performance of SQL Server 2012 by 55%

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe

A Database System Performance Study with Micro Benchmarks on a Many-core System

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions

Chapter 1 Introduction

Column-Stores vs. Row-Stores. How Different are they Really? Arul Bharathi

HyPer-sonic Combined Transaction AND Query Processing

Performance comparisons and trade-offs for various MySQL replication schemes

HyPer-sonic Combined Transaction AND Query Processing

Performance and Forgiveness. June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CSE 544 Principles of Database Management Systems

Integrity in Distributed Databases

Database Architectures

Panu Silvasti Page 1

IMPROVING THE PERFORMANCE, INTEGRITY, AND MANAGEABILITY OF PHYSICAL STORAGE IN DB2 DATABASES

Huge market -- essentially all high performance databases work this way

10. Replication. Motivation

Crescando: Predictable Performance for Unpredictable Workloads

Which technology to choose in AWS?

PostgreSQL: Hyperconverged DBMS

Schema-Agnostic Indexing with Azure Document DB

Table of contents. OpenVMS scalability with Oracle Rdb. Scalability achieved through performance tuning.

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Database Fundamentals Chapter 1

CSE 444: Database Internals. Lectures 13 Transaction Schedules

Spanner: Google's Globally-Distributed Database* Huu-Phuc Vo August 03, 2013

6.UAP Final Report: Replication in H-Store

Performance tests of Hypertable. Mauro Giacchini, Robert Petkus Group Meeting 10/27, BROOKHAVEN SCIENCE

Scaling PostgreSQL on SMP Architectures

Data Tamer: A Scalable Data Curation System. Michael Stonebraker

Chapter 11 - Data Replication Middleware

Lock Tuning. Concurrency Control Goals. Trade-off between correctness and performance. Correctness goals. Performance goals.

Consistency in Distributed Systems

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction

SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING

Anti-Caching: A New Approach to Database Management System Architecture. Guide: Helly Patel ( ) Dr. Sunnie Chung Kush Patel ( )

Outline. Parallel Database Systems. Information explosion. Parallelism in DBMSs. Relational DBMS parallelism. Relational DBMSs.

LATEST INTEL TECHNOLOGIES POWER NEW PERFORMANCE LEVELS ON VMWARE VSAN

FOEDUS: OLTP Engine for a Thousand Cores and NVRAM

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia Final Exam. Administrivia Final Exam

VoltDB for Financial Services Technical Overview

Announcements. SQL: Part IV. Transactions. Summary of SQL features covered so far. Fine prints. SQL transactions. Reading assignments for this week

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

Percolator. Large-Scale Incremental Processing using Distributed Transactions and Notifications. D. Peng & F. Dabek

ScaleArc Performance Benchmarking with sysbench

DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Transaction Management: Concurrency Control, part 2

Locking for B+ Trees. Transaction Management: Concurrency Control, part 2. Locking for B+ Trees (contd.) Locking vs. Latching

A Single Source of Truth

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.

Transcription:

The End of an Architectural Era (It's Time for a Complete Rewrite) Michael Stonebraker Samuel Madden Daniel Abadi Stavros Harizopoulos Nabil Hachem Pat Helland Paper presentation: Craig Hawkins craig_hawkins@brown.edu February 23, 2015

2007 An emergent time...

2 3 1 5 4 1:www.apple.com (Jan 2007 announcement) 2:www.toyota.com (mainstream ~ 2007) 3:www.intel.com (rollout late 2006 - early 2007) 4:www.wikipedia.org (worldwide rollout jan 2007.. 32 and 64 bit editions...mac >> intel chips 2006) 5: www.facebook.com

While all of this is happening, there is unrest in the database forest... 1 1: phrase adapted from Trees, by the music group Rush

Meet the new boss: Michael Stonebraker Princeton, Michigan U.C. Berkeley, M.I.T. 1 (Same as the old boss) 2 Ingres, Postgres, Vertica, VoltDB H-Store, Aurora 1: www.wikipedia.org 2: lyrics from Won't Get Fooled Again, by the music group The Who It's no longer a one-size-fits-all database world.

The rationale: 350 300 250 200 150 100 50 Seagate hard drive mb/s transfer (normalized) Thinkpad RAM (normalized) 0 1992 1996 2007 sources: www.wikipedia.org cs.berkeley.edu http://www.eetimes.com/document.asp?doc_id=1209142 http://oldcomputers.net/ibm-thinkpad.html http://www.seagate.com/staticfiles/support/disc/manuals/enterprise/cheetah/15k.5/sas/100384784c.pdf

Key aspects of H-Store...

H-Store Architecture A shared-nothing grid of computers. Each H-Store site is single-threaded, and performs incoming SQL commands to completion, without interruption. Each site is decomposed into a number of single-threaded logical sites, one for each core; each logical site gets its own partition of the main memory. Each node manages one or more "partitions" i.e. disjoint sets of data. Transactions may be initiated from any site on the grid. (VLDB paper, pg 5). A shared nothing architecture (SN) is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage. Shared-nothing architectures are highly scalable, Google et. al. www.wikipedia.org

1 1. http://cs.brown.edu/courses/csci1270/files/lectures/l19_paralleldbs_1.pdf NAS: "network-attached storage", RAID, etc. contention/coordination doesn't scale well... the mythical man month control node, non-standardized middleware, erlang, go

A single-threaded execution model

Advantage: Simplification: No need for multi-threaded data structures or concurrent B-trees

Disadvantage: Not good for long-running, ad-hoc commands. However, this does not happen often in OLTP environments.

An in-memory data store Useful transaction work is completed in under 1 millisecond. Factor of 82 speed up on TPC-C benchmark. 1 1. 2007 VLDB Conference, The End of an Architectural Era, pgs 1,3 TPCC: Transaction Processing Performance Council

The complete H-Store workload is specified in advance as a collection of transaction classes. 1 1. 2007 VLDB Conference, The End of an Architectural Era, pg 4 some of these classes perform better than others

Optimal classes: Transactions that can be completed entirely at a single node (i.e. single sited) can run without stalls. They can also run without much intranode communication. 1. 2007 VLDB Conference, The End of an Architectural Era, pg 4

Optimal classes: One-shot applications: all of their transactions can be executed in parallel with little or no intra-node communication. Each transaction can be decomposed into a collection of single-site plans 1. 2007 VLDB Conference, The End of an Architectural Era, pg 4 ~divide and conquer approach

H-Store's "automatic physical database designer": Specify horizontal partitioning, replication locations, and indexing that maximizes the likelihood of transactions being single-sited. 1. 2007 VLDB Conference, The End of an Architectural Era, pg 4 other aspects: two-phase, strongly two-phase, commutable, sterile

The complete workload is specified in advance as a collection of transaction classes. 1 SQL statements must be executed as stored procedures. 1. 2007 VLDB Conference, The End of an Architectural Era, pg 4

What on 1 is a stored procedure? 1. www.nasa.gov

A stored procedure is a named, parameterized SQL routine. It is stored in the database as part of the schema.

What are some of its advantages: Reduced network congestion Security: permissioning abstraction, indirect user-access to tables DBMS can optimize / compile once rather than each time.

What are some of its disadvantages: Increased programmer sophistication Varying vendor implementation DB admin. overhead: 1000 tables X 4 CRUD = 4000 prepared statements.

How do you create one: CREATE PROCEDURE get_balance(in customerid CHAR(10)) SELECT balance FROM accounts WHERE id = customerid; 1. www.nasa.gov

How do you use one: CALL get_balance(1004200011); cs = this.con.preparecall("{call SHOW_SUPPLIERS()}"); ResultSet rs = cs.executequery(); while (rs.next()) { String supplier = rs.getstring("sup_name"); String coffee = rs.getstring("cof_name"); System.out.println(supplier + ": " + coffee); } 1. http://docs.oracle.com/javase/tutorial/jdbc/basics/storedprocedures.html#calling_javadb_mysql Java DB with mysql example

H-Store's Transaction management...

Each transaction receives a timestamp on entry into H-Store: site_id local_unique_timestamp 1 transactions are executed in timestamp order clocks need to be nearly in sync 1. 2007 VLDB Conference, The End of an Architectural Era, pg 6 clock sync vis-a-vis NTP: Network Time Protocol sterile transcactions: fully "commutable" H-Store classes. order doesn't matter... don't have to wait otherwise wait a small amount to account for network delays, replication, clock variance...infiniband all replicas update in the same order

H-Store takes an "optimistic" approach Favors aborts over heavy locking and latch wait schemes Uses a transaction coordinator, and a transaction monitor. locks, rows. latch waits: protecting a block's integrity when moving in/out of disk transaction coordinator: at arrival site. sends out subplan pieces to "worker sites". when gets "ok" from all sites, proceeds to next collection of subplans. when no more subplans...commits, otherwise aborts. (VLDB 2007 paper: pg 6) small time stalls placed in to see if a subplan with an earlier timestamp appears (VLDB 2007 paper: pg 6) transaction monitor: if notices too many aborts, will adjust plan (VLDB 2007 paper: pg 6) OLTP: fast insert, update, delete

Where's the disk? 1 1. www.wikipedia.org

Data persistence is achieved through snapshots and a command log. Snapshot: each DBMS node continuously write asynchronous snapshots of the entire database to disk at fixed intervals. Command log: in-between snapshots, a disk-based command log records the successful transactions only. Interestingly, the log contains the transactions, not record-level logging. ACID: client can't see effects of a transaction until it gets recorded in the command log. Optimizations: batch processing of command log writes, delta updates for snapshotting (VLDB 2013 paper pgs 3, 6)

Major drawback: What happens when you run out of main memory? This brings us to...

Anti-caching! Go Christian!