Fast HTAP over loosely coupled Nodes

Wildfire: Fast HTAP over Loosely Coupled Nodes
R. Barber, C. Garcia-Arellano, R. Grosman, V. Raman, R. Sidle, M. Spilchen, A. Storm, Y. Tian, P. Tozun, D. Zilio, M. Huras, C. Mohan, F. Ozcan, H. Pirahesh
IBM Research, IBM Analytics

Apps emit tons of events
- Sources: IoT, mobile commerce, wearables
- Today's DBMS is too slow for this firehose, and too costly (storage)
- Events happen in the real world and are not tied to transaction commit (e.g., Place Order, Withdraw Cash)
- Events always have asterisks: events are concurrent, and event ordering is done later

Events need Availability and Consistency
- Multi-master, disconnected operation
- Today: consistency is achieved via application logic
  - Compensation, apologies, coupons, ...
  - Weak atomicity and durability
- Growing pressure for the DBMS to give both availability and consistency
  - Due to globalization (e.g., credit cards)

Events need HTAP
- Mobile commerce
- Retail: inventory analysis, shipping time analysis
- Securities trading: global risk analysis, margin rules
- Complex analysis within a transaction
  - Touches lots of rows
  - Handled poorly by both 2PL and OCC
- Analysis involves far more than SQL: graph, machine learning, ...

Wildfire Goals
- Peak transaction speed
  - Inserts/updates: keep up with the bandwidth of the durability mechanism (today: 1e6/s/node)
  - Full indexing: keep up with random access speed (hash tables + atomics on SSD → NVRAM)
- Multi-master and ACID
  - Commit is local (no consensus)
  - High-value events can wait for conflict resolution (after "commit")
  - Fully versioned
- HTAP
  - In-transaction analytics
  - Analytics over a snapshot (1 s or 10 mins)
- Open format
  - All data groomed to Parquet format on shared storage
  - Directly accessible by analytics platforms (e.g., Spark)
- Wildfire targets 1M xsacs/s/node (run at the bandwidth of the durable medium / network)
  - Higher throughput and more economical scale-out

Wildfire architecture
[Figure: applications (high-volume transactions; analytics that requires the most recent data; analytics that can tolerate slightly stale data), several Wildfire engines with SSD/NVM, and a shared file system.]

Data lifecycle
[Figure: data moves over time from OLTP nodes to analytics nodes through three zones, via groom and postgroom steps.]
- LIVE zone (~1 sec)
- GROOMED zone (~10 mins)
- ORGANIZED zone (PBs of data)
- Read views: HTAP (see latest: snapshot isolation); 1-sec-old snapshot; optimized snapshot (10 mins stale)
- Analytics nodes: ML, etc. (Spark); BI; Snapshot Lookups

Live Zone: Thin Commit
[Figure labels: xsacs; per-xsac logs (uncommitted); log (committed); replicate.]

What happens at commit
1. Append xsac deltas (Ins/Del/Upd) to the common log; replicated in the background
   - Everything is an upsert: key, (values)*
   - No synchronous conflict resolution
2. Flush to local SSD
3. High-value xsacs wait for grooming (to timestamp the xsac and resolve conflicts)
   - Can time out

Driven by speed
- No tracking down prior versions
- No indexing
- No waiting for consensus with other nodes
- Multiple versions of the same key can coexist; queries pick the right version based on their xsac snapshot
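
The thin commit path and the "queries pick the right version" rule can be illustrated with a small sketch. This is not Wildfire's implementation: the `LiveLog` class, its method names, and the use of a log position as the snapshot boundary are all assumptions made for illustration (the slides say real timestamps are assigned later, at groom time).

```python
from dataclasses import dataclass, field


@dataclass
class LiveLog:
    """Hypothetical append-only commit log for one node (illustration only)."""
    entries: list = field(default_factory=list)  # (seq, key, values)

    def commit(self, deltas):
        """Append each delta as an upsert: no prior-version lookup, no index."""
        for key, values in deltas:
            self.entries.append((len(self.entries), key, values))
        # Assumption: the log length serves as a snapshot boundary.
        return len(self.entries)

    def read(self, key, snapshot):
        """Return the newest version of `key` among the first `snapshot` entries."""
        for _seq, k, values in reversed(self.entries[:snapshot]):
            if k == key:
                return values
        return None


log = LiveLog()
s1 = log.commit([("order-1", {"qty": 1})])
s2 = log.commit([("order-1", {"qty": 3})])  # second version coexists with the first
```

Reading `"order-1"` at snapshot `s1` sees the first version, while a read at `s2` sees the second; neither commit had to locate or overwrite the earlier entry.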

Grooming (Live → Groomed zone)
[Figure labels: xsacs; per-xsac logs (uncommitted); log (committed); replicate; groom.]
- Runs distributed consensus to timestamp the xsacs (pick a serialization order)
- Takes quorum-visible deltas, forms data blocks, and publishes them to the shared file system
- Adds a begints field to each row: (groomts, localtime, nodeid)
- Conflicts and constraints are resolved lazily (including logical rollback)
- No assumptions about
  - Clock synchronization
  - Partitioning / failures (multiple groomers possible)
- Details offline
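
One way to read the begints triple is as a composite timestamp compared field by field. The sketch below assumes that reading; the `BeginTS` type, the `groom` function, and the shape of the input deltas are hypothetical, and the consensus step that picks the groom timestamp is not modeled.

```python
from typing import NamedTuple


class BeginTS(NamedTuple):
    groom_ts: int    # groom cycle number (assumed to come from the groomer)
    local_time: int  # commit time on the originating node's own clock
    node_id: int     # tie-breaker between nodes


def groom(cycle, deltas):
    """Stamp (local_time, node_id, key, values) deltas and sort them.

    Tuples compare field by field, so groom_ts dominates and unsynchronized
    local clocks only affect ordering within a single groom cycle.
    """
    stamped = [(BeginTS(cycle, lt, nid), key, values)
               for lt, nid, key, values in deltas]
    return sorted(stamped, key=lambda row: row[0])


block = groom(7, [(105, 2, "k", "b"), (100, 3, "k", "a"), (105, 1, "k", "c")])
order = [values for _ts, _key, values in block]
```

Because the groom timestamp is the leading field, any row groomed in an earlier cycle sorts before every row of a later cycle, whatever the nodes' local clocks say.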

Postgrooming
Organize data so that queries can run fast (and deal with immutable storage!)
- Resolve conflicts (and stamp xsacs with resolution/rollback status)
- Compute endtime and prevrid
- Partition data (along multiple dimensions)
- Maintain primary indexes, secondary indexes, and synopses
Continual refinement is done in the background; the big challenge is supporting concurrent groom and queries.
[Figure labels: CURRENT partitions, HISTORY partitions, groomed block, log; ORGANIZED zone, GROOMED zone, LIVE zone.]

Postgrooming: index maintenance
- Primary index: maps key hash → RID [+ include columns]
- Secondary index: maps key hash → pkey [+ TSN hint]
- Works in the background to add groomed records to the indexes
- The index is a variant of an LSM tree
  - Merging happens in the background
  - Lives in multiple tiers: memory, SSD, shared storage, and purged
- Multiple versions of each key live in the index
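
A rough sketch of a multi-version, tiered index in the spirit of this slide follows. It is only an illustration under simplifying assumptions: `TieredIndex` and its methods are hypothetical names, tiers are plain in-memory dictionaries rather than memory/SSD/shared-storage runs, and hash collisions are ignored.

```python
class TieredIndex:
    """Hypothetical multi-version index: newest run first, older runs merged later."""

    def __init__(self):
        # Each run maps key_hash -> [(begints, rid), ...]; runs[0] is the newest.
        self.runs = [{}]

    def add(self, key, begints, rid):
        """Add a groomed record to the newest run; older versions stay in place."""
        self.runs[0].setdefault(hash(key), []).append((begints, rid))

    def seal(self):
        """Freeze the newest run and start a fresh one (stands in for a tier move)."""
        self.runs.insert(0, {})

    def merge_oldest(self):
        """Background merge of the two oldest runs, keeping every version."""
        if len(self.runs) < 2:
            return
        oldest = self.runs.pop()
        for key_hash, versions in oldest.items():
            self.runs[-1].setdefault(key_hash, []).extend(versions)

    def lookup(self, key, snapshot):
        """RID of the newest version with begints <= snapshot, or None."""
        visible = [v for run in self.runs
                   for v in run.get(hash(key), []) if v[0] <= snapshot]
        return max(visible)[1] if visible else None


idx = TieredIndex()
idx.add("k", 10, "r1")
idx.seal()
idx.add("k", 20, "r2")
```

Lookups give the same answer before and after `merge_oldest()`, which is the property a background merge needs when queries run concurrently.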

Postgrooming: computing endts
- At groom, each row has a begints but no endts
- Without postgrooming, every table scan must group by primary key to pick the appropriate version
- Postgroom picks groomed rows and assigns endts
  1. Massive set intersection: PrimaryKeyIndex (key → latest RID) with RecentlyGroomed
     - Prior versions can be arbitrarily far back!
  2. Squeeze the endts into data blocks (with concurrent readers!)
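
The two steps above can be sketched on in-memory dictionaries. Everything here is an assumption for illustration: the row layout, the `postgroom` and `visible` helpers, and the idea that a version's endts equals the begints of its successor. Concurrent readers and immutable data blocks, which the slide flags as the hard part, are not modeled.

```python
def postgroom(primary_index, rows, recently_groomed):
    """Assign endts to superseded versions.

    primary_index: key -> RID of the latest known version
    rows: RID -> {"key", "begints", "endts", "prevrid"}
    recently_groomed: RIDs in begints order
    """
    for rid in recently_groomed:
        row = rows[rid]
        prev = primary_index.get(row["key"])  # intersect with the primary key index
        if prev is not None:
            rows[prev]["endts"] = row["begints"]  # close the prior version
            row["prevrid"] = prev
        primary_index[row["key"]] = rid


def visible(row, snapshot):
    """A version is visible when begints <= snapshot < endts (open-ended if None)."""
    return row["begints"] <= snapshot and (
        row["endts"] is None or snapshot < row["endts"])


rows = {
    0: {"key": "a", "begints": 10, "endts": None, "prevrid": None},
    1: {"key": "a", "begints": 20, "endts": None, "prevrid": None},
    2: {"key": "b", "begints": 20, "endts": None, "prevrid": None},
}
index = {"a": 0}
postgroom(index, rows, [1, 2])
```

Once endts is filled in, a scan can filter each row with `visible` on its own, without grouping by primary key.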

Postgrooming: resolving transaction status (single-shard)
ReadSet tracking is pessimistic, especially with complex queries. E.g.:
    currentinventory ← select sum(..) from ledger where productid=_
    if (currentinventory > 2) insert into ledger values (-1, productid, ...);  # buy one item

Constraint-based resolution
- Transaction Type 1: { read*; fullyspecifiedwrite*; if (trigger) { rollback or other action } }
  - ATM withdrawal, securities trading (higher-granularity checks)
- Transaction Type 2: { read*; if (trigger) { fullyspecifiedwrite* } }
  - Submit order
- The trigger condition is checked as a continuous query (incrementally fed new deltas)

ReadSet-based resolution
- { read*; write*; if (trigger) { rollback or other action } }
- The trigger (readset changed since query snapshot) is checked as a continuous query

More general transactions (e.g., RWRW) are hard to check incrementally, so they fall back to traditional readset tracking.
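
For the inventory example, constraint-based resolution can be pictured as a running aggregate that is fed deltas in serialization order. The sketch below is a simplification under stated assumptions: the `resolve` function, the delta tuple shape, and the status strings are hypothetical, and the threshold of 2 is taken from the example query.

```python
from collections import defaultdict


def resolve(deltas, min_stock=2):
    """Check the trigger `inventory > min_stock` incrementally.

    deltas: (xsac_id, product_id, qty_change) in groomed serialization order.
    Returns {xsac_id: "committed" | "rolled back"}.
    """
    stock = defaultdict(int)  # running sum(..) per product, updated per delta
    status = {}
    for xsac_id, product, change in deltas:
        if change < 0 and not stock[product] > min_stock:
            status[xsac_id] = "rolled back"  # logical rollback: delta is ignored
            continue
        stock[product] += change
        status[xsac_id] = "committed"
    return status


status = resolve([
    ("t1", "p", 4),   # restock
    ("t2", "p", -1),  # inventory 4 > 2: allowed
    ("t3", "p", -1),  # inventory 3 > 2: allowed
    ("t4", "p", -1),  # inventory 2 is not > 2: rolled back
])
```

The check needs only the running sum, not each transaction's read set, which is why fully specified writes are attractive for this style of resolution.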

Postgrooming: resolving transaction status (multi-shard)
- Each transaction can produce multiple deltas, spread across groom cycles
- No 2PC
- Each transaction is stamped with its delta count
- Resolve considers a delta only if all deltas of that transaction are available
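
The delta-count rule can be sketched as a small buffer that releases a transaction only when all of its deltas have been groomed. The `Resolver` class and its `on_groomed` method are hypothetical names; failure handling and timeouts are left out.

```python
from collections import defaultdict


class Resolver:
    """Hypothetical buffer that holds deltas until a transaction is complete."""

    def __init__(self):
        self.pending = defaultdict(list)  # xsac_id -> deltas seen so far

    def on_groomed(self, xsac_id, delta_count, delta):
        """Record one groomed delta.

        Returns the full delta list once `delta_count` deltas have arrived,
        otherwise None (the transaction is not yet considered by resolve).
        """
        seen = self.pending[xsac_id]
        seen.append(delta)
        if len(seen) < delta_count:
            return None
        return self.pending.pop(xsac_id)


r = Resolver()
first = r.on_groomed("t1", 2, "shardA:debit")    # groom cycle n
second = r.on_groomed("t1", 2, "shardB:credit")  # a later groom cycle
```

Here `first` is `None` and `second` carries both deltas, so the transaction is resolved as a unit without a two-phase commit between shards.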

Concluding Remarks
- OLTP/OLAP separation is going away
- The DBMS needs to be much faster, and to stop controlling the data format, the data storage, and the kinds of analytics
- The DBMS can be the manager for event data → VVLDB
Thank you
