Introduction to Column Stores with MemSQL
Seminar Database Systems
Final presentation, 11 January 2016, by Christian Bisig

Topics (slide 2)
- Scope and goals
- Approaching Column-Stores
- Introducing MemSQL
- Benchmark setup & execution
- Benchmark result & interpretation
- Conclusion
- Questions and feedback

Scope and goals

Scope and goals (slide 4)
- Present the topic of column stores in an understandable way
  - Why and for what is columnar data storage used?
- Introduction to MemSQL
  - Columnar table usage in MemSQL
- Benchmark (MemSQL vs. PostgreSQL, and in-memory tables vs. columnar tables)
- Deliverables:
  - Article
  - Presentation
  - Tutorial module

Approaching Column-Stores

Approaching Column-Stores (slide 6)
- "A decomposition storage model", SIGMOD conference, 1985
  - Vertically partitioned data
- "C-Store: A Column-oriented DBMS", 2005
  - One of the first column-store DBMSs
- "The Design and Implementation of Modern Column-Oriented Database Systems", 2012

Approaching Column-Stores (slide 7)
- Business process automation
  - Mostly transaction-based (OLTP)
  - e.g. registering new client data, executing money transactions
- Additionally, business process improvement through gaining business intelligence
  - Analytical processing (OLAP)
  - e.g. evaluating client purchases, budget forecasts

Approaching Column-Stores (slide 8)
[Figure: row-based vs. column-based storage layout]
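The difference between the two layouts can be sketched in a few lines of Python. This is a toy illustration only, not a description of MemSQL internals; the table contents and the helper names `to_columns` and `fetch_record` are made up for the example.

```python
# Toy illustration (not MemSQL internals): the same table held row-wise
# and column-wise. All names and values are invented for this example.
rows = [
    {"fid": 1, "name": "Alpine Peak", "state": "CO", "elevation": 4300},
    {"fid": 2, "name": "Bear Lake", "state": "UT", "elevation": 1800},
    {"fid": 3, "name": "Cedar Ridge", "state": "CO", "elevation": 2500},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def fetch_record(columns, position):
    """Reassemble one record from a column layout: touches every column."""
    return {name: values[position] for name, values in columns.items()}


columns = to_columns(rows)

# Row layout: an aggregate over one attribute still walks every full record.
avg_from_rows = sum(row["elevation"] for row in rows) / len(rows)

# Column layout: the same aggregate reads a single list of values.
avg_from_columns = sum(columns["elevation"]) / len(columns["elevation"])
```

The aggregate only needs one list in the column layout, while `fetch_record` has to visit every column to rebuild a single record. That asymmetry is the intuition behind scans and aggregations favouring column stores and single-record lookups favouring row stores.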

Approaching Column-Stores (slide 9)
Dos:
- Large table scans and aggregations
- Range queries (BETWEEN, IN, <, >)
- Data compression (sparse and repeated data)
- Large data loads
Don'ts:
- Random / specific searches
- Large transaction volume (inserts and updates)
- Small inserts and updates (single-record insert performance)

Introducing MemSQL

Intro MemSQL (slide 11)
- Developed as an in-memory database
- Columnar tables added with version 3.0
- Provides a solution for both OLTP (row tables, in-memory tables) and OLAP (columnar tables on the hard disk)
- Wire-compatible with MySQL
- Compiled queries

Intro MemSQL (slide 12)
- Two-tier architecture (aggregator nodes and leaf nodes)
- Distributed system (commodity hardware)
  - Reference tables
  - Shard tables
- Lock-free data structures
  - Skip lists, hash tables, stacks, queues
- MVCC (multi-version concurrency control)

Intro MemSQL (slide 13)
- Sharding (shard tables)
  - Data is partitioned and distributed across the leaves
- Reference tables
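The distinction between shard tables and reference tables can be sketched as follows. This is a minimal sketch under stated assumptions: the CRC32 hash, the leaf count `NUM_LEAVES`, and the helper names `leaf_for`, `distribute`, and `replicate` are illustrative choices, not MemSQL's actual partitioning scheme.

```python
import zlib

NUM_LEAVES = 4  # hypothetical cluster size, chosen for the example


def leaf_for(shard_key_value, num_leaves=NUM_LEAVES):
    """Map a shard key to a leaf index with a deterministic hash.

    CRC32 is an assumption for illustration; the real hash is not
    specified in the slides.
    """
    digest = zlib.crc32(str(shard_key_value).encode("utf-8"))
    return digest % num_leaves


def distribute(rows, key, num_leaves=NUM_LEAVES):
    """Shard table: each row lives on exactly one leaf."""
    leaves = [[] for _ in range(num_leaves)]
    for row in rows:
        leaves[leaf_for(row[key], num_leaves)].append(row)
    return leaves


def replicate(rows, num_leaves=NUM_LEAVES):
    """Reference table: every leaf holds a full copy."""
    return [list(rows) for _ in range(num_leaves)]
```

A shard table therefore scales with the number of leaves, whereas a reference table trades storage for the ability to join locally on every leaf, which suits small lookup tables.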

Intro MemSQL (slide 14)

Row table:

    CREATE TABLE gnis (
      x double precision not null,
      y double precision not null,
      fid integer primary key,
      name text,
      class text,
      state text,
      county text,
      elevation integer,
      map text
    );

Columnar table:

    CREATE COLUMNAR TABLE gnis_col (
      x double precision not null,
      y double precision not null,
      fid integer,
      name text,
      class text,
      state text,
      county text,
      elevation integer,
      map text,
      KEY (`fid`) USING CLUSTERED COLUMNSTORE,
      SHARD KEY()
    );

Intro MemSQL (slide 15)
MemSQL column-store segmentation
To consider:
- Every insert or update creates a new row-segment-group
- The more row-segment-groups there are, the worse the performance
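Why many row-segment-groups hurt can be illustrated with a toy model. The class below is a deliberately simplified assumption, not MemSQL's storage engine: every `insert` call adds one sorted group, a range query must look at every group whose key range overlaps the query, and `merge` stands in for whatever consolidation the real system performs.

```python
class ToyColumnTable:
    """Toy model: each insert call creates one new sorted segment group."""

    def __init__(self):
        self.groups = []

    def insert(self, keys):
        """Store one batch of keys as its own sorted group."""
        self.groups.append(sorted(keys))

    def groups_touched(self, low, high):
        """Count groups whose [min, max] key range overlaps [low, high]."""
        return sum(
            1 for group in self.groups
            if group and group[0] <= high and group[-1] >= low
        )

    def merge(self):
        """Consolidate all groups into a single sorted group."""
        merged = sorted(key for group in self.groups for key in group)
        self.groups = [merged]


table = ToyColumnTable()
for offset in range(100):
    # 100 small batches whose key ranges all overlap each other.
    table.insert(range(offset, 10_000, 100))

touched_before = table.groups_touched(500, 600)
table.merge()
touched_after = table.groups_touched(500, 600)
```

In this model the same range query has to inspect every one of the 100 overlapping groups before the merge and only one afterwards, which is the intuition behind preferring large batch loads over many small writes.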

Intro MemSQL (slide 16)
Compression in MemSQL
- Compression algorithms: dictionary (tokenization), run-length encoding
- Example with the osm_poi_tag_ch table
  - Table statistics: compression ratio of 3.6:1, which corresponds to around 72% space savings
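The two encodings named above, and the arithmetic behind the 72% figure, can be sketched in Python. These are textbook versions of dictionary and run-length encoding written for illustration; the function names are hypothetical and the code makes no claim about how MemSQL implements compression.

```python
def dictionary_encode(values):
    """Replace each value with a small integer token.

    Returns (dictionary, tokens), where dictionary maps value -> token.
    """
    dictionary = {}
    tokens = []
    for value in values:
        # A new value gets the next free token; a known value reuses its token.
        token = dictionary.setdefault(value, len(dictionary))
        tokens.append(token)
    return dictionary, tokens


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def space_savings(ratio):
    """Fraction of space saved at a compression ratio (3.6 means 3.6:1)."""
    return 1 - 1 / ratio
```

For the ratio quoted on the slide, `space_savings(3.6)` evaluates to 1 - 1/3.6, roughly 0.72. Both encodings work best on columns with few distinct or long runs of repeated values, which is why they pair well with a columnar layout.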

Benchmark setup & execution

Benchmark setup (slide 18)
- MemSQL (v4.1.10) running with Docker
  - Creating the GNIS tables as columnar and row tables
  - Comparing the performance of columnar and row tables
- PostgreSQL (9.4) row tables
- Benchmark on:
  - iMac (late 2009), 2.8 GHz Intel Core i7, OS X El Capitan
  - RAM: 16 GB 1067 MHz DDR3
  - SSD 500 GB, read: ~260 MB/s, write: ~270 MB/s

Benchmark setup (slide 19)
SQL load script: major changes to the original scripts
- Instead of the PostgreSQL \copy command to load CSV: LOAD DATA LOCAL INFILE ... INTO TABLE
- Instead of CREATE TABLE AS SELECT: CREATE TABLE followed by INSERT INTO ... SELECT, for the creation of the 1 million, 2 million, and 3 million record tables
- Slightly different naming (e.g. column name keyz instead of key)

Benchmark execution (slide 20)
- Python scripts for benchmark execution
  - Used for both PostgreSQL and MemSQL
  - No reasonable timing mechanism in MemSQL
  - Using the psycopg2 (PostgreSQL) and MySQLdb Python drivers
- Every query was run 3 times on row tables (PostgreSQL) and on columnar and row tables (MemSQL); the best run of each was used for the comparison
- Second benchmark part: a script for bulk insert/update/delete
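The "best of three runs" measurement described above might look roughly like the sketch below. It is not the author's script: the helper `best_of` is hypothetical, and the standard-library `sqlite3` module stands in for the PostgreSQL and MemSQL connections so that the example runs without a database server.

```python
import sqlite3
import time


def best_of(cursor, query, params=(), runs=3):
    """Run the query `runs` times; return (best_seconds, rows_of_last_run)."""
    best = float("inf")
    rows = []
    for _ in range(runs):
        start = time.perf_counter()
        cursor.execute(query, params)
        rows = cursor.fetchall()  # include fetch time in the measurement
        best = min(best, time.perf_counter() - start)
    return best, rows


# Stand-in database (assumption): sqlite3 instead of PostgreSQL or MemSQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE gnis (fid INTEGER PRIMARY KEY, state TEXT, elevation INTEGER)")
cur.executemany(
    "INSERT INTO gnis VALUES (?, ?, ?)",
    [(i, "CO" if i % 2 else "UT", i % 4000) for i in range(10_000)],
)
conn.commit()

seconds, result = best_of(cur, "SELECT state, AVG(elevation) FROM gnis GROUP BY state")
```

Because `best_of` only relies on the DB-API cursor methods `execute` and `fetchall`, the same function should work with cursors from psycopg2 or MySQLdb. Note that those drivers use `%s` placeholders for parameters, whereas `sqlite3` uses `?`. Taking the minimum rather than the mean reduces the influence of cold caches and background activity on the reported time.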

Benchmark execution (slides 21 and 22)
[Python script excerpts; the code was shown as images and is not part of this transcript]

Benchmark result & interpretation

Benchmark result (slide 24)
[Figure: benchmark results]

Benchmark result (slide 25)
Single-tuple data manipulation
- 10 000 inserts
- 10 000 updates
- 10 000 deletes
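A single-tuple data manipulation test of this kind can be sketched as below. This is an illustrative sketch, not the benchmark script from the presentation: the table `bench`, the function `run_single_tuple_benchmark`, and the use of `sqlite3` as a stand-in database are all assumptions.

```python
import sqlite3
import time


def run_single_tuple_benchmark(conn, n=10_000):
    """Time n single-row inserts, updates, and deletes, one statement each.

    Returns (timings_in_seconds, rows_remaining).
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE bench (id INTEGER PRIMARY KEY, val INTEGER)")
    statements = {
        "insert": ("INSERT INTO bench (id, val) VALUES (?, ?)", lambda i: (i, i)),
        "update": ("UPDATE bench SET val = ? WHERE id = ?", lambda i: (i + 1, i)),
        "delete": ("DELETE FROM bench WHERE id = ?", lambda i: (i,)),
    }
    timings = {}
    for label, (sql, make_params) in statements.items():
        start = time.perf_counter()
        for i in range(n):
            cur.execute(sql, make_params(i))  # one statement per tuple
        conn.commit()
        timings[label] = time.perf_counter() - start
    cur.execute("SELECT COUNT(*) FROM bench")
    remaining = cur.fetchone()[0]
    return timings, remaining
```

Calling `run_single_tuple_benchmark(sqlite3.connect(":memory:"))` returns one elapsed time per operation type. Issuing one statement per tuple, rather than a batched `executemany`, is what makes this a single-tuple test; against a column store the per-statement overhead is exactly what the measurement is meant to expose.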

Benchmark interpretation (slide 26)
Four points to mention:
1. A specific search on a non-indexed column that returns multiple tuples performs well
2. Both store types have their field of application (e.g. specific search or range search)
3. Joined selects perform badly on the column store
4. Column stores are well suited to a large number of single-tuple data manipulation operations

Conclusion

Conclusion (slide 28)
- Favouring one store type over the other is not an option; each has its field of application
- An SSD cannot compensate for the I/O disadvantages of a column store
- Impressed by the compression and performance
- Measuring execution times in MemSQL was a struggle
- Interested in testing MemSQL in a larger setup

Any questions? (slide 29)

References:
- Image, slide 1: https://victoriafrederick.files.wordpress.com/2014/09/060922-120543-doric-columns-frieze-with-triglyphsand-metopes-and-pediment-at-the-back-of-the-temple-of-hera-ii1.jpg
- Image, slide 11: http://www.storagenewsletter.com/wp-content/uploads/2014/02/memsqlv3.0.jpg
- Slide 18, Docker: https://www.docker.com/

Author: Christian Bisig, cbisig@hsr.ch, cbisig@gmail.com
Student for Master of Science in Engineering at Hochschule für Technik Rapperswil
Master Research Unit, Software and Systems

Hardware/software used for tests:
- iMac (late 2009)
- CPU: 2.8 GHz Intel Core i7
- RAM: 16 GB 1067 MHz DDR3
- SSD: 500 GB, read: ~260 MB/s, write: ~270 MB/s
- OS: OS X El Capitan