HyPer-sonic Combined Transaction AND Query Processing

Similar documents
HyPer-sonic Combined Transaction AND Query Processing

The mixed workload CH-BenCHmark. Hybrid y OLTP&OLAP Database Systems Real-Time Business Intelligence Analytical information at your fingertips

Main-Memory Databases 1 / 25

HyPer on Cloud 9. Thomas Neumann. February 10, Technische Universität München

In-Memory Columnar Databases - Hyper (November 2012)

VOLTDB + HP VERTICA. page

Column Stores vs. Row Stores How Different Are They Really?

Data Blocks: Hybrid OLTP and OLAP on compressed storage

Crescando: Predictable Performance for Unpredictable Workloads

In-Memory Data Management

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation

REFERENCE ARCHITECTURE Microsoft SQL Server 2016 Data Warehouse Fast Track. FlashStack 70TB Solution with Cisco UCS and Pure Storage FlashArray//X

Incrementally Parallelizing. Twofold Speedup on a Quad-Core. Thread-Level Speculation. A Case Study with BerkeleyDB. What Am I Working on Now?

FlashStack 70TB Solution with Cisco UCS and Pure Storage FlashArray

Architecture of a Real-Time Operational DBMS

COLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE)

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

Upgrade to Microsoft SQL Server 2016 with Dell EMC Infrastructure

Bottleneck Hunters: How Schooner increased MySQL throughput by more than 800% Jeremy Cole

SQL Server 2014 In-Memory OLTP: Prepare for Migration. George Li, Program Manager, Microsoft

SAP IQ - Business Intelligence and vertical data processing with 8 GB RAM or less

Scaling Without Sharding. Baron Schwartz Percona Inc Surge 2010

Benchmarking Databases. PGcon 2016 Jan Wieck OpenSCG

NoVA MySQL October Meetup. Tim Callaghan VP/Engineering, Tokutek

Dell Microsoft Business Intelligence and Data Warehousing Reference Configuration Performance Results Phase III

HYBRID TRANSACTION/ANALYTICAL PROCESSING COLIN MACNAUGHTON

Data Modeling and Databases Ch 7: Schemas. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich

MemTest: A Novel Benchmark for In-memory Database

CSE 344 Final Review. August 16 th

Column-Stores vs. Row-Stores. How Different are they Really? Arul Bharathi

Columnstore and B+ tree. Are Hybrid Physical. Designs Important?

Parallelism and Concurrency. COS 326 David Walker Princeton University

Condusiv s V-locity Server Boosts Performance of SQL Server 2012 by 55%

Accelerating OLTP performance with NVMe SSDs Veronica Lagrange Changho Choi Vijay Balakrishnan

HYRISE In-Memory Storage Engine

Memory-Based Cloud Architectures

STAR: Scaling Transactions through Asymmetric Replication

April Copyright 2013 Cloudera Inc. All rights reserved.

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant

Performance Testing of SQL Server on Kaminario K2 Storage

Implementing SQL Server 2016 with Microsoft Storage Spaces Direct on Dell EMC PowerEdge R730xd

Comparing Software versus Hardware RAID Performance

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

CompSci 516 Database Systems

SQL Server 2014 Highlights der wichtigsten Neuerungen In-Memory OLTP (Hekaton)

SAP HANA Scalability. SAP HANA Development Team

How Scalable is your SMB?

In-Memory Data Management for Enterprise Applications. BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP)

Traditional RDBMS Wisdom is All Wrong -- In Three Acts "

Benefits of Automatic Data Tiering in OLTP Database Environments with Dell EqualLogic Hybrid Arrays

DATABASE SCALE WITHOUT LIMITS ON AWS

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

HP ProLiant delivers #1 overall TPC-C price/performance result with the ML350 G6

Storage Optimization with Oracle Database 11g

GFS: The Google File System. Dr. Yingwu Zhu

Columnstore in real life

From SQL-query to result Have a look under the hood

Dell Microsoft Reference Configuration Performance Results

WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX924 S2

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture

I, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3.

HIGH PERFORMANCE SANLESS CLUSTERING THE POWER OF FUSION-IO THE PROTECTION OF SIOS

Database Ph.D. Qualifying Exam Spring 2006

Chapter 8: Main Memory

Oracle Database 10g The Self-Managing Database

Low Overhead Concurrency Control for Partitioned Main Memory Databases

Staggeringly Large Filesystems

DBMS Data Loading: An Analysis on Modern Hardware. Adam Dziedzic, Manos Karpathiotakis*, Ioannis Alagiannis, Raja Appuswamy, Anastasia Ailamaki

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

CSE 124: Networked Services Lecture-17

Data pipelines with PostgreSQL & Kafka

! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like

Chapter 7: Main Memory. Operating System Concepts Essentials 8 th Edition

The Case for Heterogeneous HTAP

Benchmarking In PostgreSQL

NewSQL. Database Landscape From: the 451 group. OLTP Focus. NewSQL: Flying on ACID. Cloud DB, Winter 2014, Lecture 14

Chapter 8: Memory-Management Strategies

SQL Server Performance on AWS. October 2018

NewSQL: Flying on ACID

Adaptive Query Processing on Prefix Trees Wolfgang Lehner

Database Architectures

Database Architectures

STEPS Towards Cache-Resident Transaction Processing

Scaling PostgreSQL on SMP Architectures

The Google File System (GFS)

Data Warehousing & Data Mining

CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES

Top Trends in DBMS & DW

CA485 Ray Walshe Google File System

Data-Intensive Distributed Computing

Hardware Transactional Memory on Haswell

Evolution of Database Systems

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

Background. Contiguous Memory Allocation

Choosing Hardware and Operating Systems for MySQL. Apr 15, 2009 O'Reilly MySQL Conference and Expo Santa Clara,CA by Peter Zaitsev, Percona Inc

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach

Hard Disk Drives (HDDs) Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Chapter 8: Main Memory. Operating System Concepts 9 th Edition

Transcription:

HyPer-sonic Combined Transaction AND Query Processing Thomas Neumann Technische Universität München October 26, 2011

Motivation - OLTP vs. OLAP OLTP and OLAP have very different requirements OLTP high rate of small/tiny transactions high locality in data access update performance is critical OLAP few, but long running transactions aggregates large parts of the database must see a consistent database state the whole time Traditionally, DBMSs either good at OLTP or good at OLAP Thomas Neumann HyPer 2 / 18

Motivation - Traditional Solution OLTP Requests /Tx OLAP Queries ETL Extract Transform Load not very satisfying. stale data, redundancy, etc. Thomas Neumann HyPer 3 / 18

Motivation - Hardware Trends Intel Tera Scale Initiative Server with 1 TB main memory ca. 40K Euro from Dell main memory grows faster than (business) data can afford to keep data in memory memory is not just a fast disk should make use of this facts Amazon Data Volume Revenue: 25 billion Euro Avg. Item Price: 15 Euro ca. 1.6 billion order lines per year ca. 54 Bytes per order line ca. 90 GB per year + additional data - compression Transaction Rate Avg: 32 orders per s Peak rate: Thousands/s + inquiries Thomas Neumann HyPer 4 / 18

HyPer Our system HyPer OLTP Requests /Tx Hybrid OLTP&OLAP High-Performance Database System OLAP Queries Combined OLTP/OLAP system using modern hardware Thomas Neumann HyPer 5 / 18

HyPer - Design OLTP performance is crucial avoid anything that would slow down OLTP OLTP should operate as if there were no OLAP OLAP is not that performance sensitive, but needs consistency locking/latching is out of question (OLAP would slow down OLTP) Idea: we are a main memory database. Use hardware support. Thomas Neumann HyPer 6 / 18

HyPer - Pure OLTP workload Main Memory OLTP Requests /Tx purely main memory, OLTP transactions need a few µs can afford serial execution of transactions (at least initially) avoids any concurrency issues Thomas Neumann HyPer 7 / 18

HyPer - Virtual Memory Supported Snapshots OLTP Requests /Tx c d b a Read a Virtual Memory OLAP sessions need a consistent snapshot over a relatively long time use the MMU / OS support to separate OLTP and OLAP the fork separates OLTP from OLAP, even though they are initially the same Thomas Neumann HyPer 8 / 18

HyPer - Copy on Update OLTP Requests /Tx c b a Read a d b a Virtual Memory the MMU detects writes to shared data modified pages are copied, both parts have unique copies afterwards avoids any interaction between OLTP and OLAP like an ultra-efficient shadow paging without the disadvantages Thomas Neumann HyPer 9 / 18

HyPer - Snapshots We use fork to create transaction consistent snapshots each OLAP sessions sees one certain point in time can do long-running aggregates/analysis the data (apparently) stays the same if it changes, the MMU makes sure that OLAP does not notice eliminates need for latching/locking And fork is cheap! only the page table is copied, not the pages themselves some care is needed to scale to large memory sizes but can fork 40GB in 2.7ms Thomas Neumann HyPer 10 / 18

HyPer - Missing Pieces c d b b a OLAP Ses Undo- Log a OLTP Requests /Tx a c b a Backup Process OLAP Session d b a Virtual Memory multiple OLAP sessions, each copies just what is needed logging is needed for ACID properties backups for fast restart Thomas Neumann HyPer 11 / 18

Data-Centric Query Execution HyPer does not use the classical iterator model Why does the iterator model (and its variants) use the operator structure for execution? it is convenient, and feels natural the operator structure is there anyway but otherwise the operators only describe the data flow in particular operator boundaries are somewhat arbitrary What we really want is data centric query execution data should be read/written as rarely as possible data should be kept in CPU registers as much as possible the code should center around the data, not the data move according to the code increase locality, reduce branching Thomas Neumann HyPer 12 / 18

Data-Centric Query Execution (2) Processing is oriented along pipeline fragments. Corresponding code fragments: initialize memory of B a=b, B c=z, and Γ z for each tuple t in R 1 if t.x = 7 materialize t in hash table of B a=b for each tuple t in R 2 if t.y = 3 aggregate t in hash table of Γ z for each tuple t in Γ z materialize t in hash table of B z=c for each tuple t 3 in R 3 for each match t 2 in B z=c [t 3.c] for each match t 1 in B a=b [t 3.b] output t 1 t 2 t 3 R 1 x=7 a=b z=c z;count(*) y=3 R 2 R 3 Thomas Neumann HyPer 13 / 18

Data-Centric Query Execution (3) The algebraic expression is translated into query fragments. Each operator has two interfaces: 1. produce asks the operator to produce tuples and push it into 2. consume which accepts the tuple and pushes it further up Note: only a mental model! the functions are not really called they only exist conceptually during code generation each call generates the corresponding code operator boundaries are blurred, code centers around data we generate machine code at compile time initially using C++, now using LLVM Thomas Neumann HyPer 14 / 18

Evaluation We used a combined TPC-C and TPC-H benchmark (12 warehouses) Region (5) Supplier (10k) in Nation (62) in (10W, 10W ) Warehouse (W ) (100k, 100k) serves District (W 10) (10, 10) (1, 1) sup-by stored History (W 30k) located-in (1, 1) has (1, 1) (1, 1) (1, 1) (1, 1) Stock (W 100k) New-Order (W 9k) Customer (W 30k) (1, 1) (3, 3) (1, 1) (1, ) in (3k, 3k) OLTP Workload OLAP Workload (n 1 parallel Tx Streams) (m 1 parallel Query Streams) StockLevel 4% NewOrder (10 OrderLines) 44% Payment 44% Q 1 Q 2 Q 3 Q 4... Q π(1) Q π(2) Q π(3) Q π(4)... (W, W ) of Item (100k) pending issues available (1, 1) (0, 1) (1, 1) Order-Line (W 300k) Order (W 30k) (1, 1) (5, 15) OrderStatus 4% Delivery (batch 10 Orders) 4%... Q 20 Q 21 Q 22... Q π(20) Q π(21) Q π(22) contains TPC-C transactions are unmodified TPC-H queries adapted to the combined schema OLTP and OLAP runs in parallel Thomas Neumann HyPer 15 / 18

TPC-C+H Performance HyPer configurations MonetDB VoltDB one query session (stream) 3 query sessions (streams) no OLTP no OLAP single threaded OLTP 5 OLTP threads 1 query stream only OLTP OLTP Query resp. OLTP Query resp. Query resp. results from Query No. throughput times (ms) throughput times (ms) times (ms) VoltDB web page Q1 67 71 63 Q2 163 212 210 Q3 66 73 75 Q4 194 226 6003 Q5 1276 1564 5930 Q6 9 17 123 Q7 1151 1466 1713 Q8 399 593 172 Q9 206 249 208 Q10 1871 2260 6209 Q11 33 35 35 Q12 156 170 192 Q13 185 229 284 Q14 122 156 722 Q15 528 792 533 Q16 1353 1500 3562 Q17 159 168 342 Q18 108 119 2505 Q19 103 183 1698 Q20 114 197 750 Q21 46 50 329 Q22 7 9 141 Dual Intel X5570 Quad-Core-CPU, 64GB RAM, RHEL 5.4 new order: 56961 tps; total: 126576 tps new order: 171384 tps; total: 380868 tps Thomas Neumann HyPer 16 / 18 55000 tps on single node; 300000 tps on 6 nodes

Memory Consumption 6000 5000 A. OLTP only B. Hybrid (idle OLAP) C. Hybrid (idle OLAP, respawned) B C A Memory Consumption [MB] 4000 3000 2000 1000 0 0 500000 1e+06 1.5e+06 2e+06 2.5e+06 3e+06 3.5e+06 4e+06 4.5e+06 Transactions we only have to replicate the working set Thomas Neumann HyPer 17 / 18

Conclusion main memory databases change the game very high throughput, transactions should never wait minimize latching and locks to get best performance use MMU support instead to separate OLTP and OLAP compiled, data-centric queries for excellent performance HyPer is a very fast hybrid OLTP/OLAP system top performance for both OLTP and OLAP full ACID support It is indeed possible to build a combined OLTP/OLAP system! Thomas Neumann HyPer 18 / 18