Hyrise - a Main Memory Hybrid Storage Engine

Similar documents
HYRISE In-Memory Storage Engine

Column-Oriented Database Systems. Liliya Rudko University of Helsinki

In-Memory Data Management

In-Memory Data Management Jens Krueger

COLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE)

Citation for published version (APA): Ydraios, E. (2010). Database cracking: towards auto-tunning database kernels

An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

Column Stores vs. Row Stores How Different Are They Really?

In-Memory Data Structures and Databases Jens Krueger

Data Structures for Mixed Workloads in In-Memory Databases

class 5 column stores 2.0 prof. Stratos Idreos

In-Memory Columnar Databases - Hyper (November 2012)

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture

Advanced Databases: Parallel Databases A.Poulovassilis

Main-Memory Databases 1 / 25

Column-Stores vs. Row-Stores. How Different are they Really? Arul Bharathi

In-Memory Data Management for Enterprise Applications. BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP)

After completing this course, participants will be able to:

Data Partitioning and MapReduce

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

Syllabus. Syllabus. Motivation Decision Support. Syllabus

Architecture-Conscious Database Systems

Fast Column Scans: Paged Indices for In-Memory Column Stores

Bridging the Processor/Memory Performance Gap in Database Applications

Safe Harbor Statement

Weaving Relations for Cache Performance

class 6 more about column-store plans and compression prof. Stratos Idreos

Data Structures for Mixed Workloads in In-Memory Databases

Overview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri

Information Management course

COURSE 12. Parallel DBMS

Evolution of Database Systems

In-Memory Technology in Life Sciences

SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING

CS317 File and Database Systems

HadoopDB: An open source hybrid of MapReduce

OLAP Introduction and Overview

Beyond Relational Databases: MongoDB, Redis & ClickHouse. Marcos Albe - Principal Support Percona

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Distributed KIDS Labs 1

Architecture and Implementation of Database Systems (Summer 2018)

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Efficient Transaction Processing for Hyrise in Mixed Workload Environments

Composite Group-Keys

1Z0-526

Columnstore and B+ tree. Are Hybrid Physical. Designs Important?

Scalable Enterprise Networks with Inexpensive Switches

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

COLUMN DATABASES A NDREW C ROTTY & ALEX G ALAKATOS

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

Insider s Guide on Using ADO with Database In-Memory & Storage-Based Tiering. Andy Rivenes Gregg Christman Oracle Product Management 16 November 2016

Course Outline. Upgrading Your Skills to SQL Server 2016 Course 10986A: 3 days Instructor Led

Database Applications (15-415)

Locality. Christoph Koch. School of Computer & Communication Sciences, EPFL

Things To Know. When Buying for an! Alekh Jindal, Jorge Quiané, Jens Dittrich

Processing Analytical Queries over Encrypted Data

PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Evolving To The Big Data Warehouse

CompSci 516 Database Systems

Evaluation of Relational Operations: Other Techniques

Parallel DBMS. Prof. Yanlei Diao. University of Massachusetts Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Data Warehousing and OLAP

Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

Architecture and Implementation of Database Systems Query Processing

Query Processing on Prefix Trees Revisited

CSE 544: Principles of Database Systems

CS317 File and Database Systems

Partner Presentation Faster and Smarter Data Warehouses with Oracle OLAP 11g

NoDB: Querying Raw Data. --Mrutyunjay

Distributed Databases: SQL vs NoSQL

SQL Server 2014 In-Memory OLTP: Prepare for Migration. George Li, Program Manager, Microsoft

Evaluation of Relational Operations: Other Techniques

Database Applications (15-415)

Evaluation of Relational Operations

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A

Generic Business Simulation Using an In-Memory Column Store

CSIT5300: Advanced Database Systems

Course Outline. Performance Tuning and Optimizing SQL Databases Course 10987B: 4 days Instructor Led

Performance in the Multicore Era

Updating Your Skills to SQL Server 2016

High Speed ETL on Low Budget

Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here

Automating Information Lifecycle Management with

Evaluation of Relational Operations

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 14 - Data Warehousing and Column Stores

CompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy

Parallel DBMS. Chapter 22, Part A

Enterprise Data Warehousing

Row-Store / Column-Store / Hybrid-Store

CHAPTER 3 Implementation of Data warehouse in Data Mining

How to Deploy Enterprise Analytics Applications With SAP BW and SAP HANA

Autonomic Workload Execution Control Using Throttling

Oracle In-Memory & Data Warehouse: The Perfect Combination?

Data Warehousing and Decision Support

VOLTDB + HP VERTICA. page

SAP Research. Sumeet Bajaj, SAP Labs, Palo Alto. April 30, 2009 SYSTEMATIC THOUGHT LEADERSHIP FOR INNOVATIVE BUSINESS

1. Attempt any two of the following: 10 a. State and justify the characteristics of a Data Warehouse with suitable examples.

Transcription:

Hyrise - a Main Memory Hybrid Storage Engine Philippe Cudré-Mauroux exascale Infolab U. of Fribourg - Switzerland & MIT joint work w/ Martin Grund, Jens Krueger, Hasso Plattner, Alexander Zeier (HPI) and Sam Madden (MIT) October 6, 2010 SAP Labs Palo alto

Outline Motivation (OLTP VS OLAP) Hyrise Architecture Hybrid Layouts Query Execution Cost Model Physical Layouters Optimal Layouter Scalable Layouter Performance Current & Future Work

Motivation: the DBMS Divide OLTP Data entry Mix of reads and writes to few rows at a time Index structures (B+ Trees) Transaction Processing System VS OLAP Aggregates Bulk updates (ETL) Large sequential scans spanning few columns but many rows Warehousing Rise of columnstores OLTP HYRISE OLAP

Problems w/ current divide Increasing needs for real-time analytics and hybrid workloads e.g., ATP (Available To Promise) applications involve both OLTP queries and OLAP aggregates fundamental bottleneck implied by implicit separation Duplication of data management systems is cumbersome and costly for SMEs... and even for large companies

Hyrise Hyrise: a Main memory Simplicity Efficiency Future of DBMS market for enterprise data Hybrid storage engine Partitions tables into vertical partitions of varying widths depending on how columns accessed (e.g., transactionally or analytically) OLTP HYRISE OLAP

Hyrise Architecture Query Processor R R Layout Manager Layouter R Workload Data In-Memory Storage Manager Attribute Groups Attribute Groups Data Container

Hybrid Layouts A2 A4 A5 A6 A7 A1 A3 A8 A9 OLTP -style hybrid chunks OLAP -style

Query Processing Query plans Trees of operators Standard operators Look-ups, aggregations, joins, etc. Support both early and late-materialization strategies

Examples of query plans Position Lookup Position AND Positions Values Positions Position Lookup 3,4 Position AND Hash Join 1,3,4,5 Positions 3,4,5 Values 1,2,3,4 Filter Filter Filter Filter Filter 1 2 3 4 5 1 2 3 4 5 Dimension Tables 1 2 3 4 5 Fact Table 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 Join Plan non-join Plan

Physical Database Design (1) Obviously, performance heavily depends on the way attributes are partitioned

Physical Database Design (2) Optimal physical layout λopt dependent on Set of attributes and #tuples (DB) Important / frequent queries (W) Cost of various queries given a layout (Cost) Two issues Accurate cost-model Optimal and efficient layout-selection algorithm

Cost-Model (1) What to model? layout-dependent VS layout-independent costs Where does time go in main-memory layoutdependent operations? cache misses (L1/L2) Highly accurate model for cache misses in hybrid DBMSs

Cost Model Basics C.o a1 a2 a3 a4 a5 r0 r1 r2 r3!.w C.w L.w Partial projections: with and

Cost Model (2) Additional expressions for Arbitrary combination of partial projections Selections Joins, aggregates Container padding, cache collision, prefetching decisions etc.

Layout Selection We can discriminate layouts thanks to the cost-model How many layouts shall we consider for n attributes? a(n) = (2n 1)a(n 1) (n 1)(n 2)a(n 2) w/ a(0) = a(1) = 1 3,535,017,524,403 for a table of 15 attributes Efficient optimal algorithm Scalable approximate algorithm

Optimal Layouter Three phases Candidate Generation Primary partitions (no container overhead cost) Candidate Merging Combine candidates Discard candidates that are subsumed by a set of more efficient candidates e.g., Cost( A1, A2 ) < Cost ( A1,A2 ) Layout Generation Generate valid layouts Exponential in the worst-case Relevant for small workloads

Scalable Layouter (1) Divide-and-conquer approximate algorithm Graph storing affinities between primary partitions p 7 p 2 p1 p 5 p 8 p 9 p 3 p 0 p 10 p 4 p 6 min-cut graph-partitioning to generate sub-graphs containing at most K primary partitions approximate multilevel k-way partitioner p 7 p 2 p1 p 5 p 8 p 9 p 3 p 0 p 10 p 4 p 6

Scalable Layouter (2) Generate optimal layouts for each subset worst-case exponential in K Combine sub-layouts to generate final layout Requires layout evaluation in the worst case ( P = total # partitions from previous step) Approximate result Very tight upper-bound on penalty incurred Scales to hundred of primary partitions and thousands of queries e.g., a few minutes for 1000 queries and 500 att.

Hyrise Performance (1) Detailed performance evaluation on a realistic hybrid workload (Krueger 2010) Business Partner (KNA1) KUNNR Business Partner Address (ADRC) KUNNR Sales Document Header (VBAK) Material Text (MAKT) VBELN MATNR Sales Document Item (VBAP) MATNR Material (MARA) MATNR Material Hierarchy (MATH)

Workload (OLTP-Style)

Workload ( OLAP-Style )

Results (1) 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Q1 Q3 Q12 Hybrid 9707 2585 101615 Column 26488 2356 101957 Row 14995 2764 840862 Hybrid Column Row

Results (2) Hyrise combines the best of both-worlds (OLTP & OLAP systems) in one system Final speed-up depends on query weighting 40% to 400% speed-up compared to all-column / all-row layouts on a hybrid workload

Current Status Fully functional prototype Cost-model framework Automated layouter Article in PVLDB 2010 presentation at VLDB 2011 Current work on Multi-core optimizations Hybrid horizontal + vertical partitionings

Future Work Data replication & partial materialization Maximize utility function given a space budget Hybrid storage for new application domains multidimensional / scientific / multimedia data graph data (Linked Data?) XILAB

Hyrise - a Main Memory Hybrid Storage Engine Philippe Cudré-Mauroux exascale Infolab U. of Fribourg - Switzerland & MIT pcm@unifr.ch October 6, 2010 SAP Labs Palo alto