In-Memory Data Management


In-Memory Data Management
Martin Faust, Research Assistant
Research Group of Prof. Hasso Plattner
Hasso Plattner Institute for Software Engineering, University of Potsdam

Agenda (slide 2)
1. Changed Hardware
2. Advances in Data Processing
3. Today's Enterprise Applications
4. In-Memory Data Management for Enterprise Applications
5. Impact on Enterprise Applications
6. Influence on Processes

All Areas Have to Be Taken into Account (slide 3)
- Changed hardware
- Advances in data processing (software)
- Complex enterprise applications
Our focus lies where these three areas meet.

Why a New Data Management? (slide 4)
- The traditional DBMS architecture (query engine, buffer pool) has not changed in decades.
- A redesign is needed to handle the changes in:
  - hardware trends (CPU/cache/memory)
  - workloads
  - data characteristics and data volume
- Some academic prototypes: MonetDB, C-Store, HyPer, HYRISE
- Several database vendors have picked up the idea and have new databases in place (e.g., SAP, Vertica, Greenplum, Oracle).

Changes in Hardware (slide 5)
Changes in hardware give us an opportunity to rethink yesterday's assumptions in light of what is possible today:
- Multi-core architectures (96 cores per server); one blade (~$50,000) equals one enterprise-class server; parallel scaling across blades
- A 64-bit address space: 2 TB of main memory in current servers, ~25 GB/s of memory bandwidth per core
- Rapidly declining cost-performance ratio: main memory becomes cheaper and larger
- Memory hierarchies

In the Meantime, Research Has Come Up with Several Advances in Data Processing Software (slide 6)
- Column-oriented data organization (the column store):
  - sequential scans allow the best bandwidth utilization between CPU cores and memory
  - independence of tuples within columns allows easy partitioning and therefore parallel processing
- Plus lightweight compression: reduces the data volume while increasing processing speed through late materialization
- And more, e.g., parallel scan/join/aggregation
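The two column-store properties named above can be made concrete with a small sketch (illustrative Python, not from the slides): the same table stored row-wise and column-wise, where an aggregation in the column layout touches only one contiguous vector, which can in turn be split into independent chunks for parallel processing.

```python
# Row-store layout: one complete record after another.
rows = [
    {"doc_num": 1, "value": 100},
    {"doc_num": 2, "value": 250},
    {"doc_num": 3, "value": 75},
]

# Column-store layout: one contiguous vector per attribute.
columns = {
    "doc_num": [1, 2, 3],
    "value":   [100, 250, 75],
}

# Aggregating in the row layout reads every attribute of every row object.
total_row = sum(r["value"] for r in rows)

# In the column layout the scan stays inside a single vector -- a purely
# sequential access pattern -- and because tuples are independent within
# the column, the vector can be partitioned into chunks and processed in
# parallel (here sequentially, for clarity).
chunks = [columns["value"][0:2], columns["value"][2:]]
total_col = sum(sum(chunk) for chunk in chunks)

assert total_row == total_col == 425
```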

Memory Hierarchy (slide 7)
[Diagram: two Nehalem quad-core CPUs connected via QPI, each with four cores, a TLB, per-core L1 and L2 caches, and a shared L3 cache above main memory. Data moves between cache levels in units of cache lines and between main memory and caches in units of memory pages.]

Two Different Principles of Physical Data Storage: Row Store vs. Column Store (slide 8)
- Row store: rows are stored consecutively; optimal for row-wise access (e.g., SELECT *)
- Column store: columns are stored consecutively; optimal for attribute-focused access (e.g., SUM, GROUP BY)
- Note: the concept is independent of the storage medium, but only an in-memory implementation allows fast tuple reconstruction in a column store.
[Diagram: a sales-order table (Doc Num, Doc Date, Sold-To, Value, Sales Org, Status; rows 1-4) laid out row-wise and column-wise.]

OLTP- and OLAP-Style Queries Favor Different Storage Patterns (slide 9)
- OLTP-style, favors the row store:
  SELECT * FROM Sales Orders WHERE Document Number = 95779216
- OLAP-style, favors the column store:
  SELECT SUM(Order Value) FROM Sales Orders WHERE Document Date > 2009-01-20
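The contrast between the two query styles can be sketched in a few lines of Python (illustrative data, not from the slides): on a column layout, the OLTP-style query must touch every column vector to reconstruct the full tuple, while the OLAP-style query scans only the two columns it actually needs.

```python
from datetime import date

# Column layout of a toy "Sales Orders" table: one vector per attribute.
doc_num  = [95779216, 95779217, 95779218, 95779219]
doc_date = [date(2009, 1, 15), date(2009, 1, 21),
            date(2009, 1, 25), date(2009, 2, 2)]
value    = [120, 300, 80, 500]

# OLTP-style: SELECT * WHERE Document Number = 95779216.
# Tuple reconstruction visits every column vector at one position --
# cheap in main memory, expensive on disk (one seek per column).
i = doc_num.index(95779216)
full_tuple = (doc_num[i], doc_date[i], value[i])

# OLAP-style: SELECT SUM(Order Value) WHERE Document Date > 2009-01-20.
# Only the date and value vectors are scanned, sequentially.
total = sum(v for d, v in zip(doc_date, value) if d > date(2009, 1, 20))

assert full_tuple == (95779216, date(2009, 1, 15), 120)
assert total == 880
```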

Motivation for Compression in Databases (slide 10)
- Main memory access is the bottleneck.
- Idea: trade CPU time to compress and decompress data.
- Lightweight, lossless compression:
  - reduces I/O operations to main memory
  - leads to fewer cache misses, since more information fits on a cache line
  - enables operations directly on compressed data
  - allows offsetting through the use of fixed-length data types

Lightweight Dictionary Encoding for Compression and Late Materialization (slide 11)
- Store each distinct value once in a separate mapping table (the dictionary).
- Associate a unique mapping key (valueid) with each distinct value.
- Store the valueid instead of the value in the attribute vector.
- Enables offsetting with bit-encoded, fixed-length data types.

Example for the attribute "Company Name" of a table with records 1-8 (each record also carries a month, JAN-AUG): INTEL, ABB, HP, INTEL, IBM, IBM, SIEMENS, INTEL.

  Attribute vector (RecId -> ValueId): 1 -> 4, 2 -> 1, 3 -> 2, 4 -> 4, 5 -> 3, 6 -> 3, 7 -> 5, 8 -> 4
  Dictionary (ValueId -> Value): 1 ABB, 2 HP, 3 IBM, 4 INTEL, 5 SIEMENS
  Inverted index (ValueId -> RecIdList): 1 -> 2; 2 -> 3; 3 -> 5,6; 4 -> 1,4,8; 5 -> 7

- Typical compression factor for enterprise software: 10; in financial applications up to 50.
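The slide's dictionary-encoding example can be reproduced directly in Python — a minimal sketch of the three structures (dictionary, attribute vector, inverted index), using the slide's 1-based RecIds and ValueIds:

```python
values = ["INTEL", "ABB", "HP", "INTEL", "IBM", "IBM", "SIEMENS", "INTEL"]

# Sorted dictionary: each distinct value is stored exactly once.
dictionary = sorted(set(values))        # ['ABB', 'HP', 'IBM', 'INTEL', 'SIEMENS']
valueid = {v: i + 1 for i, v in enumerate(dictionary)}

# Attribute vector: small fixed-length valueids instead of variable-length strings.
attribute_vector = [valueid[v] for v in values]     # [4, 1, 2, 4, 3, 3, 5, 4]

# Inverted index: valueid -> list of record ids containing that value.
inverted_index = {}
for rec_id, vid in enumerate(attribute_vector, start=1):
    inverted_index.setdefault(vid, []).append(rec_id)

assert attribute_vector == [4, 1, 2, 4, 3, 3, 5, 4]
assert inverted_index[valueid["INTEL"]] == [1, 4, 8]

# Late materialization: the value for RecId 5 is found by offsetting
# into the dictionary with its valueid.
assert dictionary[attribute_vector[5 - 1] - 1] == "IBM"
```

Note that filtering (e.g., "all records for INTEL") operates entirely on the compressed valueids; the string is only materialized at the end.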

Data Modifications in a Compressed Store (slide 12)
- Differential store: two separate in-memory partitions
  - read-optimized main partition (ROS)
  - write-optimized delta partition (WOS)
- Together, both represent the current state of the data: inserts and updates go to the WOS, selects read the union of both.
- The WOS/delta serves as intermediate storage for several modifications; re-compression costs are shared among all recent modifications by the merge process, which runs asynchronously.

Today's Enterprise Applications (slide 13)
- Enterprise applications have evolved: it is no longer just OLTP vs. OLAP.
- There is demand for real-time analytics on transactional data; high-throughput analytics runs completely in memory.
- Examples:
  - Available-to-promise check: perform a real-time ATP check directly on transactional data during order entry, without materialized aggregates of available stock.
  - Dunning: search for open invoices interactively instead of in scheduled batch runs.
  - Operational analytics: instant customer sales analytics with always up-to-date data.
- Data integration remains a big challenge (e.g., POS data).

Enterprise Workloads Are Read-Mostly (slide 14)
- It is a myth that OLTP is write-oriented and OLAP is read-oriented.
- The real world is more complicated than single-tuple access: there are lots of range queries.
[Charts: workload breakdown into writes (delete, modification, insert) and reads (range select, table scan, lookup) for real OLTP and OLAP enterprise systems, contrasted with the far more write-heavy TPC-C benchmark. Both real workloads are dominated by reads.]

Enterprise Data Is Typically Sparse (slide 15)
- Enterprise data is wide and sparse: most columns are empty or have a low cardinality of distinct values.
- This sparse distribution facilitates high compression.
[Chart: distribution of distinct values per column. Inventory management: 64% of columns have 1-32 distinct values, 24% have 33-1023, 12% have 1024 or more. Financial accounting: 78% have 1-32, 13% have 33-1023, 9% have 1024 or more.]

Challenge 1 for Enterprises: Dealing with All Sorts of Data (slide 16)
A single data management layer (multi-core CPUs + caches + memory) has to serve:
- Transactional data entry — sources: machines, transactional apps, user interaction, etc.
- Real-time analytics on structured data — sources: reporting, classical analytics, planning, simulation
- Event processing and stream data — sources: machines, sensors, high-volume systems
- Text analytics on unstructured data — sources: web, social media, logs, support systems, etc.

Challenge 2 for Enterprises: Current Application Architectures (slide 17)
Current application architectures create application-specific silos with redundant data, which reduce real-time behavior and increase complexity:
- Transactional: RDB on disk (tuples)
- Analytical: ETL into an RDB on disk (star schemas) plus analytical cubes, or an in-memory column-store accelerator (fully cached result sets)
- Text processing: blobs and text columns in memory

Drawbacks of This Separation (slide 18)
Historically, OLTP and OLAP systems were separated because of resource contention and hardware limitations. But this separation has several disadvantages:
- The OLAP system does not have the latest data.
- The OLAP system only has a predefined subset of the data.
- A cost-intensive ETL process has to keep both systems in sync.
- There is a lot of redundancy.
- Different data schemas introduce complexity for applications that combine both sources.

Approach (slide 19)
Change the overall data management system assumptions:
- In-memory only
- Vertically partitioned (column store), complemented by row and text storage where appropriate, with backup to non-volatile storage
- CPU-cache optimized, with only one optimization objective: main memory access
Rethink how enterprise application persistence is built:
- A single data management system for OLTP, OLAP, and text
- No redundant data, no materialized views, no cubes
- Computational application logic moves closer to the database (i.e., complex queries, stored procedures)

Intermezzo (slide 20)
- Hardware advances: more computing power through multi-core CPUs; larger and cheaper main memory; algorithms need to be aware of the memory wall.
- Software advances: column stores are superior for analytic-style queries; lightweight compression schemes utilize modern hardware.
- Enterprise applications: need to execute complex queries in real time; a single source of truth is needed.

How Does It All Come Together? (slide 21)
1. Mixed workload combining OLTP and analytic-style queries:
   - column stores are best suited for analytic-style queries
   - an in-memory database enables fast tuple reconstruction
   - an in-memory column store allows aggregation on the fly
2. Sparse enterprise data:
   - lightweight compression schemes are optimal
   - compression increases query execution speed and improves the feasibility of an in-memory database
3. Mostly-read workload:
   - read-optimized stores, i.e., a compressed in-memory column store, provide the best throughput
   - a write-optimized store as delta partition is sufficient to handle data changes

SanssouciDB: An In-Memory Database for Enterprise Applications (slide 22)
- In-memory database (IMDB): data resides permanently in main memory, which is the primary persistence.
- Still: logging to disk and recovery from disk; non-volatile memory holds the recovery log, snapshots, and passive (historical) data.
- Main memory access is the new bottleneck; cache-conscious algorithms and data structures are crucial (locality is king).
[Architecture diagram, per blade: interface services and session management; query execution with metadata and transaction manager; active data in a main store of columns and combined columns (with data aging and time travel) plus a differential store, folded in by the merge; inverted and object data guide indexes; logging and recovery; a distribution layer spanning blades.]

Impact on Application Development (slide 23)
- Traditional stack: application cache, materialized views, prebuilt aggregates, raw data.
- In-memory column store: fewer caches needed; no redundant objects; no maintenance of materialized views or aggregates; minimized index maintenance; minimized data movement.

Impact on Enterprise Applications: Financials as of Today (slide 24)

Impact on Enterprise Applications: Simplified Financials on an In-Memory DB (slide 25)
- Only base tables, algorithms, and some indexes
- Reduces complexity and lowers TCO, while adding more flexibility, integration, and functionality

IMDB Conclusion (slide 26)
- In-memory column stores are better suited than conventional DBMSs as database management systems for enterprise applications.
- In-memory column stores utilize modern hardware optimally.
- Several data processing techniques leverage in-memory-only data processing.
- Enterprise applications show specific characteristics: sparsely filled data tables and a complex, read-mostly workload.
- Real-world experience has proven the feasibility of the in-memory column store.

Challenge the Status Quo (slide 27)
- Don't ask: "Where do we currently have performance problems?" Systems are productive, with the majority of performance issues addressed through user training, process adaptations, and workarounds; people are used to the situation and have adapted.
- Don't think only incrementally: merely making things faster will not realize the full potential. Design new processes around the assumption that the information is available.

Ask Indirectly (1/2) (slide 28)
- Where did we define processes around performance?
- Where did we train people?
- What did we forbid or disable?
- Where do we have workaround processes?
- Where do we have exception-driven processes stemming from a lack of information?
- Where do we have the most change requests in reporting?
- Which batch processes run multiple times a day?
- Where would I love to drill down from reporting into transactions?

Ask Indirectly (2/2) (slide 29)
- Where do my users follow the pattern of gathering data and storing it intermediately in Excel?
- Where have we simplified the data model?
- Where could information-worker tasks be automated?
- Which processes include massive rework due to data latency?
- Which processes are too slow?
- Which processes have not been implemented due to performance requirements and data quantity?
- Which processes require variations of existing aggregates?

Thanks! (slide 30)
Questions?
Martin Faust, Hasso Plattner Institute
Martin.Faust@hpi.uni-potsdam.de