ColumnStore Indexes: What's New in SQL Server 2014?

Meir Dudai, meir@valinor.co.il


Column vs. row store

Row store (heap / B-tree)

  Data page 1000
  ProductID  OrderDate  Cost
  310        20010701   2171.29
  311        20010701   1912.15
  312        20010702   2171.29
  313        20010702   413.14

  Data page 1001
  ProductID  OrderDate  Cost
  314        20010701   333.42
  315        20010701   1295.00
  316        20010702   4233.14
  317        20010702   641.22

Column store (values compressed)

  Data page 2000 (ProductID): 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321
  Data page 2001 (OrderDate): 20010701, 20010702, 20010703, 20010704
  Data page 2002 (Cost): 2171.29, 1912.15, 2171.29, 413.14, 333.42, 1295.00, 4233.14, 641.22, 24.95, 64.32, 1111.25

Columnstore storage model

- Data stored column-wise
  - Each page stores data from a single column
- Highly compressed
  - More data fits in memory
- Each column can be accessed independently
  - Fetch only columns needed
  - Can dramatically decrease I/O
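The storage model can be illustrated with a rough Python sketch. This is a toy model, not SQL Server internals: the function names are hypothetical, and the four rows are borrowed from the row-store figure. It pivots rows into one list per column and counts how many values a query over Cost alone has to touch in each layout.

```python
# Toy model of row-wise vs. column-wise layout (illustrative only).

ROWS = [
    (310, 20010701, 2171.29),
    (311, 20010701, 1912.15),
    (312, 20010702, 2171.29),
    (313, 20010702, 413.14),
]
COLUMNS = ("ProductID", "OrderDate", "Cost")


def to_column_store(rows, column_names):
    """Pivot row tuples into a dict holding one value list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def values_touched_row_store(rows):
    """A row store reads whole rows, so every value in each row is touched."""
    return sum(len(row) for row in rows)


def values_touched_column_store(store, needed):
    """A column store reads only the columns the query references."""
    return sum(len(store[name]) for name in needed)


store = to_column_store(ROWS, COLUMNS)
# Equivalent of SELECT SUM(Cost): only the Cost list is read.
total_cost = sum(store["Cost"])
```

In this toy case the row layout touches 12 values to answer the query, while the column layout touches 4; real savings depend on table width and compression.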

How are performance gains achieved?

Two complementary technologies:

- Storage: data is stored in a compressed columnar format (by column) instead of row-store format (by row).
  1. Columnar storage means less data is accessed when only a subset of columns is referenced.
  2. Data density/selectivity determines how compression-friendly a column is (for example State / City / Gender).
  3. This translates to improved buffer pool memory usage.
- New batch mode execution: data can be processed in batches (blocks of about 1,000 rows) rather than row by row.
- Depending on filtering and other factors, a query may also benefit from segment elimination: bypassing million-row chunks (segments) of data, further reducing I/O.
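Segment elimination can be sketched as follows. This is a minimal illustration, assuming each segment records the minimum and maximum of the values it holds; the Segment class and scan_range function are hypothetical names, and the segments here hold a handful of values rather than about a million rows.

```python
class Segment:
    """One chunk of a single column, with min/max metadata kept alongside."""

    def __init__(self, values):
        self.values = list(values)
        # Metadata computed once, when the segment is built.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_range(segments, low, high):
    """Count values in [low, high]; return (matches, segments_read)."""
    matches = 0
    segments_read = 0
    for seg in segments:
        if seg.max_value < low or seg.min_value > high:
            continue  # eliminated using metadata only, values never read
        segments_read += 1
        matches += sum(1 for v in seg.values if low <= v <= high)
    return matches, segments_read


order_date_segments = [
    Segment([20010701, 20010701, 20010702]),
    Segment([20010702, 20010703]),
    Segment([20010704, 20010704]),
]
result = scan_range(order_date_segments, 20010703, 20010704)
```

Here the first segment is skipped without reading its values, because its maximum is below the requested range. Elimination works best when data is loaded in an order that keeps each segment's range narrow.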

Batch mode: improving CPU utilization

- Biggest advancement in query processing in years!
- Data moves in batches through query plan operators
  - Minimizes instructions per row
  - Takes advantage of cache structures
- Highly efficient algorithms
- Better parallelism

Batch mode processing

- Process ~1,000 rows at a time
- Batch stored in vector form
- Optimized to fit in cache
- Vector operators implemented: filter, hash join, hash aggregation, ...
- Greatly reduced CPU time (7 to 40X)

[Figure: QP vector operators work on a batch object made of column vectors plus a bitmap of qualifying rows]
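The shape of batch processing can be sketched in Python. This is only a structural illustration under stated assumptions: the function names are hypothetical, and plain Python gains none of the cache or CPU benefits described above. A filter operator marks qualifying rows in a bitmap instead of copying them, and the next operator consumes the column vector together with that bitmap.

```python
BATCH_SIZE = 1000  # figure taken from the slide: roughly 1,000 rows per batch


def make_batches(columns, batch_size=BATCH_SIZE):
    """Split column lists into batches; each batch keeps one vector per column."""
    n_rows = len(next(iter(columns.values())))
    for start in range(0, n_rows, batch_size):
        yield {name: vals[start:start + batch_size] for name, vals in columns.items()}


def filter_batch(batch, column, predicate):
    """Filter operator: build a bitmap of qualifying rows, copying nothing."""
    return [predicate(v) for v in batch[column]]


def sum_batch(batch, column, bitmap):
    """Aggregate operator: add up only values whose bitmap entry is set."""
    return sum(v for v, keep in zip(batch[column], bitmap) if keep)


def sum_where(columns, sum_column, filter_column, predicate):
    """Run filter then aggregate, one batch at a time."""
    total = 0
    for batch in make_batches(columns):
        bitmap = filter_batch(batch, filter_column, predicate)
        total += sum_batch(batch, sum_column, bitmap)
    return total


table = {"qty": list(range(2500)), "price": [2] * 2500}
even_qty_total = sum_where(table, "price", "qty", lambda q: q % 2 == 0)
```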


Updatable columnstore index

- Table consists of a column store and a delta (row) store; a tuple mover moves data from the delta store into the column store
- DML (update, delete, insert) operations leverage the delta store
- INSERT VALUES: always lands in the delta store
- DELETE: logical operation; data is physically removed only after a REBUILD operation is performed
- UPDATE: DELETE followed by INSERT
- BULK INSERT: if the batch is < 100K rows, inserts go into the delta store, otherwise into the columnstore
- SELECT: unifies data from the column and row stores (internal UNION operation)
- Tuple mover converts data into columnar format once a segment is full (1M rows)
- REORGANIZE statement forces the tuple mover to start
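The interplay of delta store, deleted bitmap, and tuple mover can be modelled with a toy Python class. Everything here is an assumption made for illustration: the class and method names are hypothetical, the row group size is 4 instead of about a million, and deleting delta-store rows directly is a simplification of this sketch.

```python
class ToyColumnstore:
    """Toy model: columnar row groups + row-wise delta store + deleted bitmap."""

    def __init__(self, rowgroup_size=4):
        self.rowgroup_size = rowgroup_size  # stands in for ~1M rows
        self.row_groups = []   # each group is {column: [values]}
        self.delta_store = []  # row-wise buffer of dict rows
        self.deleted = set()   # (row_group_index, position) pairs

    def insert(self, row):
        """INSERT always lands in the delta store."""
        self.delta_store.append(dict(row))

    def tuple_mover(self):
        """Convert full delta-store chunks into columnar row groups."""
        while len(self.delta_store) >= self.rowgroup_size:
            chunk = self.delta_store[:self.rowgroup_size]
            self.delta_store = self.delta_store[self.rowgroup_size:]
            self.row_groups.append({k: [r[k] for r in chunk] for k in chunk[0]})

    def delete(self, key, value):
        """Logical delete: flag columnar rows; delta rows are simply dropped."""
        for g, group in enumerate(self.row_groups):
            for pos, v in enumerate(group[key]):
                if v == value:
                    self.deleted.add((g, pos))
        self.delta_store = [r for r in self.delta_store if r[key] != value]

    def update(self, key, value, new_row):
        """UPDATE is a DELETE followed by an INSERT."""
        self.delete(key, value)
        self.insert(new_row)

    def select(self):
        """Union of non-deleted columnar rows and delta-store rows."""
        rows = []
        for g, group in enumerate(self.row_groups):
            n = len(next(iter(group.values())))
            for pos in range(n):
                if (g, pos) not in self.deleted:
                    rows.append({k: group[k][pos] for k in group})
        return rows + [dict(r) for r in self.delta_store]


t = ToyColumnstore(rowgroup_size=4)
for i in range(5):
    t.insert({"id": i, "cost": i * 10})
t.tuple_mover()                               # ids 0..3 become a row group
t.delete("id", 2)                             # flagged, still stored
t.update("id", 4, {"id": 4, "cost": 99})      # delete + insert in delta store
visible = t.select()
```

Note that the deleted row still occupies space in its row group; in this model, as on the slide, only a rebuild would reclaim it.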

Comparing space savings

101-million-row table, table + index space

[Bar chart. Values shown: 19.7 GB, 10.9 GB, 6.9 GB, 5.0 GB, 4.0 GB, 1.8 GB. Configurations compared: table with customary indexing; table with customary indexing (page compression); table with no indexing; table with no indexing (page compression); table with columnstore index; clustered columnstore.]

Structure of in-memory data warehouse: how it works

- CREATE CLUSTERED COLUMNSTORE: organizes and compresses data into columnstore
- BULK INSERT: creates new columnstore row groups
- INSERT: rows are placed in the row store (heap); when the row store is big enough, a new columnstore row group is created

[Figure: a partition made up of the columnstore, the deleted bitmap, and the row store]

Structure of in-memory data warehouse: how it works (continued)

- DELETE: rows are marked in the deleted bitmap
- UPDATE: delete plus insert
- Most data is in columnstore format

Batch mode processing: what's new in SQL Server 2014?

- Support for all flavors of JOINs
  - OUTER JOIN
  - Semi-join: IN, NOT IN
- UNION ALL
- Scalar aggregates
- Mixed mode plans
- Improvements in bitmaps, spill support, ...

Archival compression: what's new?

- Adds an additional layer of compression on top of the inherent compression used by columnstore
- Shrinks on-disk database sizes by up to 27%
- Compression applies per partition and can be set either during index creation or during rebuild

Enhanced compression

Chart legend: COLUMNSTORE, COLUMNSTORE_ARCHIVE

  TPCH        3.1X
  TPCDS       2.8X
  Customer 1  3.9X
  Customer 2  4.3X

** Compression measured against raw data file
sys.partitions

Partitioning columnstores: the basic mechanism

- The motivation is manageability over performance
- CREATE TABLE <table> ( ): as usual
- CREATE CLUSTERED COLUMNSTORE INDEX <name> ON <table>: converts the entire table to columnstore format
- BULK INSERT, SELECT INTO
- INSERT
- UPDATE
- DELETE

Inserting and updating data: load sizes

- Bulk insert
  - Creates row groups of 1 million rows; the last row group is probably not full
  - But if < 100K rows, the data is left in the row store
- Insert/Update
  - Collects rows in the row store
- Tuple mover
  - When the row store reaches 1 million rows, converts it to a columnstore row group
  - Runs every 5 minutes by default
  - Started explicitly by ALTER INDEX <name> ON <table> REORGANIZE
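The load-size rules can be written as a small routing function. This is a sketch under stated assumptions: route_bulk_insert is a hypothetical name, and the two thresholds are the rounded figures from the slide rather than exact engine constants.

```python
ROWGROUP_MAX = 1_000_000  # slide figure: row groups of 1 million rows
BULK_MIN = 100_000        # slide figure: fewer than 100K rows stay in the row store


def route_bulk_insert(n_rows, rowgroup_max=ROWGROUP_MAX, bulk_min=BULK_MIN):
    """Return (compressed_rowgroup_sizes, rows_left_in_row_store) for one load."""
    full_groups, remainder = divmod(n_rows, rowgroup_max)
    groups = [rowgroup_max] * full_groups
    if remainder >= bulk_min:
        groups.append(remainder)  # last row group, only partly filled
        remainder = 0
    return groups, remainder


big_load = route_bulk_insert(2_500_000)
small_load = route_bulk_insert(50_000)
```

Under these assumptions a load of 2,500,000 rows yields two full row groups and one partly filled one, while a load of 50,000 rows stays entirely in the row store until the tuple mover picks it up.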


Columnstore enhancements summary

What's being delivered
- Clustered and updatable columnstore index
- Columnstore archive option for data compression

Main benefits
- Real-time, super-fast data warehouse engine
- Ability to keep querying while data is updated, without dropping and recreating the index or resorting to partition switching
- Huge disk space savings due to compression
- Ability to compress data 5-15x using archival per-partition compression
- Better performance and more efficient (less memory) query processing using batch mode rather than row mode

Call to action: download SQL Server 2014 CTP2 from www.microsoft.com/sqlserver