SQL Server 2014 Column Store Indexes. Vivek Sanil Microsoft Sr. Premier Field Engineer

SQL Server 2014 Column Store Indexes Vivek Sanil Microsoft Vivek.sanil@microsoft.com Sr. Premier Field Engineer

Trends in the Data Warehousing Space
Approximate data volume managed by DW (Source: TDWI Report, Next Generation DW)
Data volume        Today   In 3 years
Less than 1 TB     41%     17%
1-3 TB             21%     18%
3-10 TB            19%     25%
More than 10 TB    17%     34%
Don't know          2%      6%
- Scale more: DW systems continue to grow at a fast pace; scalability is a key concern, growing a system from 10s of TBs, to 100s of TBs, to PBs
- Performance at scale: ability to analyze massive amounts of data while offering interactive query response
- Data warehousing for the masses: drive down price per TB
Columnstore is designed to address the above needs

Columnstore Index
In-memory columnstore
- Lives in both memory and disk
- Built in to the core RDBMS engine
Customer benefits:
- 10-100x faster
- Reduced design effort
- Hyper-efficient storage subsystem
- Works on customers' existing hardware
- Easy upgrade, easy management
"By using SQL Server 2012 In-Memory Columnstore, we were able to extract about 100 million records in 2 or 3 seconds versus the 30 minutes required previously." - Atsuo Nakajima, Asst Director, Bank of Nagoya
[Diagram: Columnstore Index Representation; existing tables (partitions) with columns C1 C2 C3 C4 C5 C6]

Columnstore Index Storage Model
Data stored column-wise
- Each page stores data from a single column
- Highly compressed
- More data fits in memory
Each column (C1 to C6) can be accessed independently
- Fetch only the columns that are needed
- Can dramatically decrease I/O

Batch Mode: Improving CPU Utilization
- Biggest advancement in query processing in years!
- Data moves in batches through query plan operators
- Minimizes instructions per row
- Takes advantage of cache structures
- Highly efficient algorithms
- Better parallelism
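As a rough illustration of the kind of query these slides have in mind, the sketch below aggregates two columns of the SIMPLETABLE example used on the Syntax slide. The schema name, the column data types, and the presence of a columnstore index on the table are assumptions. Whether an operator actually runs in batch mode is decided by the optimizer; the execution mode shown for each operator in the actual execution plan is the place to check.

```sql
-- Hypothetical aggregate over dbo.SimpleTable (column names from the Syntax slide).
-- Only SimpleStateID and Amt are referenced, so only those column segments
-- need to be read from a columnstore index.
SELECT SimpleStateID,
       COUNT_BIG(*) AS row_count,
       SUM(Amt)     AS total_amt
FROM dbo.SimpleTable
GROUP BY SimpleStateID
ORDER BY total_amt DESC;
```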

Columnstore in SQL 2012
SQL Server 2012 Columnstore functionality
- Nonclustered Columnstore indexes
- Improved compression compared to ROW/PAGE compression
- Improved query performance for large read scenarios
Limitations
- No DML support, no updates (data refresh)
- Only secondary, nonclustered Columnstore indexes supported
- Poor memory management (Resource Governor was not honored; index build/rebuild; run time)
- Limited data type support
- Limited batch operations supported

SQL 2014 - Clustered Columnstore Index: Why is a clustered index important?
- Saves space
- Simplifies management: no secondary indexes to maintain
- Columnstore (and the clustered Columnstore index) will be the PREFERRED storage engine for DW scenarios
- We encourage users to either move existing tables to CCI, or start using CCI for new tables
- Additional data types are supported (high-precision decimal, datetimeoffset, binary, varbinary, uniqueidentifier, etc.)
- Unsupported types: spatial, XML, max types
- DDL supported: evolve your schema design as needed
[Chart: Sample Space Used in GB (101 million row table), y-axis 0.0 to 20.0. Bars: table with customary indexing; table with customary indexing (page compression); table with no indexing; table with no indexing (page compression); table with columnstore index; clustered columnstore. Callout: 91% savings. Space Used = Table space + Index space]

Clustered Columnstore Index: Key Characteristics
- Available in Enterprise, Developer, and Evaluation editions
- Updateable
- Includes all columns in the table
- Only index on the table; cannot be combined with any other indexes
- Uses Columnstore compression
- Columns are not physically sorted; data is stored to improve compression and performance

Nonclustered Columnstore Index: Key Characteristics
- No need to include all of the columns in the table
- Stores a copy of the columns in the index
- Is not updateable; changing data means rebuilding the index
- Can be combined with other indexes on the table
- Uses Columnstore compression
- Columns are not physically sorted; data is stored to improve compression and performance

Archival Compression: What's New?
- Adds an additional layer of compression on top of the inherent compression used by Columnstore
- Shrinks on-disk database sizes by up to 27%
- Compression applies per partition and can be set either during index creation or during rebuild
- Use archival compression only when the extra time and CPU resources to compress and retrieve the data are affordable
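A minimal sketch of setting archival compression per partition during a rebuild. It assumes a partitioned table dbo.SimpleTable with a clustered columnstore index named CL_Simple (names borrowed from the Syntax slide); the partition number is an arbitrary example.

```sql
-- Assumption: dbo.SimpleTable is partitioned and has a clustered
-- columnstore index named CL_Simple.

-- Apply archival compression to one (for example, rarely queried) partition.
ALTER INDEX CL_Simple ON dbo.SimpleTable
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

-- Revert the same partition to standard columnstore compression.
ALTER INDEX CL_Simple ON dbo.SimpleTable
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = COLUMNSTORE);

-- Check which compression setting each partition currently uses.
SELECT partition_number, data_compression_desc, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'dbo.SimpleTable');
```

Keeping recent partitions on standard columnstore compression and archiving only older ones is one way to limit the extra CPU cost to data that is read less often.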

Columnstore Enhancements Summary
New functionality delivered
- Clustered and updateable Columnstore index
- Columnstore archive option for data compression
- Global batch aggregation
Main benefits
- Real-time, super-fast data warehouse engine
- Ability to continue queries while updating, without the need to drop and recreate the index or use partition switching
- Huge disk space savings due to compression
- Ability to compress data 5-15x using archival per-partition compression
- Better performance and more efficient (less memory) query processing using batch mode rather than row mode

Columnstore Index Structure: Row Groups & Segments
Segment
- A segment contains values for one column for a set of rows
- Segments are compressed
- Each segment is stored in a separate LOB
- A segment is the unit of transfer between disk and memory
Row group
- Segments for the same set of rows comprise a row group
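One way to look at segments directly is the sys.column_store_segments catalog view. The sketch below assumes a table dbo.SimpleTable with a columnstore index; it lists one row per column segment, so segments sharing a segment_id belong to the same row group.

```sql
-- Assumption: dbo.SimpleTable has a columnstore index.
-- One row per column segment; segment_id identifies the row group.
SELECT p.partition_number,
       s.segment_id,
       s.column_id,
       s.row_count,
       s.on_disk_size
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
  ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.SimpleTable')
ORDER BY p.partition_number, s.segment_id, s.column_id;
```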

Columnstore Index Processing Example

Horizontally Partition - Row Groups (~1M rows)

Vertical Partition - Segments

Compress Each Segment*
- Some compress more than others
*Encoding and reordering not shown

Concepts Coming Together: Loading Data into a Nonclustered Columnstore Index
[Diagram: rows to load, split into row groups, split into column segments (C1 C2 C3 C4), added to the Columnstore (C1 C2 C3 C4)]
- Compressed column segments are added to the Columnstore

Syntax
CREATE CLUSTERED COLUMNSTORE INDEX CL_Simple ON SIMPLETABLE WITH (MAXDOP = 0) ON [PRIMARY];
CREATE COLUMNSTORE INDEX NCI_Simple ON SIMPLETABLE (SimpleID, SimpleAddressID, SimpleStateID, Amt);
- Columns have to be specified for a nonclustered columnstore index
CREATE CLUSTERED COLUMNSTORE INDEX CL_Simple ON SIMPLETABLE WITH (DROP_EXISTING = ON) ON [PRIMARY];
- DROP_EXISTING = ON is required if there is an existing clustered index / columnstore index
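For context, here is a small end-to-end sketch of converting a rowstore table to a clustered columnstore index. The column names come from the slide; the dbo schema, the data types, and the initial rowstore clustered index are assumptions made for illustration.

```sql
-- Hypothetical table: column names follow the slide, data types are assumed.
CREATE TABLE dbo.SimpleTable
(
    SimpleID        INT   NOT NULL,
    SimpleAddressID INT   NOT NULL,
    SimpleStateID   INT   NOT NULL,
    Amt             MONEY NOT NULL
);

-- Start with an ordinary rowstore clustered index named CL_Simple.
CREATE CLUSTERED INDEX CL_Simple ON dbo.SimpleTable (SimpleID);

-- Convert it in place. With DROP_EXISTING = ON the new index must use
-- the same name as the index it replaces.
CREATE CLUSTERED COLUMNSTORE INDEX CL_Simple ON dbo.SimpleTable
WITH (DROP_EXISTING = ON);
```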

Limitations and Restrictions
- Combination with nonclustered indexes: a table with a clustered columnstore index cannot have any type of nonclustered index
- Constraints: a table with a clustered columnstore index cannot have unique constraints, primary key constraints, or foreign key constraints
- Views: a columnstore index cannot be created on a view or indexed view
- Keywords: a columnstore index cannot be created by using the INCLUDE, ASC, or DESC keywords

Unsupported Data Types
The following data types are not supported:
- ntext, text, and image
- varchar(max) and nvarchar(max)
- rowversion (and timestamp)
- sql_variant
- CLR types (hierarchyid and spatial types)
- xml
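Before creating a columnstore index, the catalog views can be used to look for columns of the types listed above. This is a sketch against a hypothetical table dbo.SimpleTable; it checks only the types named on this slide and does not resolve user-defined alias types to their base types.

```sql
-- Lists columns of dbo.SimpleTable (hypothetical table) whose type appears
-- in the unsupported list on the slide.
SELECT c.name AS column_name,
       t.name AS type_name,
       c.max_length
FROM sys.columns AS c
JOIN sys.types AS t
  ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.SimpleTable')
  AND (
        -- rowversion is stored under the name 'timestamp' in sys.types
        t.name IN (N'ntext', N'text', N'image', N'timestamp', N'sql_variant', N'xml')
        -- max_length = -1 marks the (max) variants
        OR (t.name IN (N'varchar', N'nvarchar') AND c.max_length = -1)
        -- CLR types, including hierarchyid, geometry, and geography
        OR t.is_assembly_type = 1
      );
```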

Updatable Columnstore Index
- The table consists of a column store and a delta (row) store
- DML (update, delete, insert) operations leverage the delta store
- INSERT ... VALUES: always lands in the delta store
- DELETE: logical operation; data is physically removed after a REBUILD operation is performed
- UPDATE: DELETE followed by INSERT
- BULK INSERT: if the batch is smaller than 102,400 rows, inserts go into the delta store; otherwise into the columnstore
- SELECT: unifies data from the column and row stores (internal UNION operation)
Tuple Mover
- Converts data into columnar format once a row group is full (1,048,576 rows)
- The REORGANIZE statement forces the Tuple Mover to start
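The DML paths described above can be exercised with ordinary statements. The sketch below assumes dbo.SimpleTable (column names from the Syntax slide, data types assumed) already has a clustered columnstore index; the values are made up.

```sql
-- Assumes dbo.SimpleTable has a clustered columnstore index.

-- Trickle insert: the row lands in a delta store row group.
INSERT INTO dbo.SimpleTable (SimpleID, SimpleAddressID, SimpleStateID, Amt)
VALUES (1, 10, 5, 99.50);

-- Update: handled internally as a delete followed by an insert.
UPDATE dbo.SimpleTable
SET Amt = 120.00
WHERE SimpleID = 1;

-- Delete: a logical operation; rows in compressed row groups stay on disk
-- until the index is rebuilt.
DELETE FROM dbo.SimpleTable
WHERE SimpleID = 1;

-- Reads see column store and delta store rows together.
SELECT COUNT_BIG(*) AS remaining_rows
FROM dbo.SimpleTable;
```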

RowGroup DMV
SELECT * FROM sys.column_store_row_groups
Row group states:
- OPEN: row store (deltastore) that can accept rows
- CLOSED (full): waiting to be compressed
- COMPRESSED: columnstore
- RETIRED: all rows deleted
- INVISIBLE: data in memory only
Each row group has its own deltastore
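Rather than SELECT *, a summary per state is often easier to read. A sketch, assuming a table dbo.SimpleTable with a clustered columnstore index:

```sql
-- Row groups of dbo.SimpleTable (hypothetical table) summarized by state.
-- deleted_rows is NULL for delta store row groups; SUM ignores those.
SELECT partition_number,
       state_description,
       COUNT(*)          AS row_groups,
       SUM(total_rows)   AS total_rows,
       SUM(deleted_rows) AS deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID(N'dbo.SimpleTable')
GROUP BY partition_number, state_description
ORDER BY partition_number, state_description;
```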

Bulk Insert Optimizations
- The bulk insert threshold is now 102,400 rows per batch
  - < 102,400 rows: inserted into the delta store
  - >= 102,400 rows: loaded directly into the columnstore
  - If the batch is greater than 1,048,576 rows, the row group size is limited to 1,048,576
- Less-than-full columnstore row groups created by bulk insert will not be consolidated
  - Batches of 90K row inserts: you eventually get large segments
  - Batches of 120K row inserts: you get many 120K segments, and performance may not be as optimal long term
  - An index rebuild will fix this by defragmenting the index, but that is resource intensive
- The physical order of the data file determines how segments are created
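A sketch of a bulk load sized to produce full row groups. The file path, the delimiters, and the table name are hypothetical; the batch size is chosen to match the maximum row group size quoted above.

```sql
-- Hypothetical load into dbo.SimpleTable (clustered columnstore assumed).
BULK INSERT dbo.SimpleTable
FROM 'C:\load\simple.csv'        -- hypothetical file path
WITH (
    FIELDTERMINATOR = ',',       -- assumed file format
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 1048576    -- batches of 102,400+ rows bypass the delta store;
                                 -- 1,048,576 fills a whole row group
);
```

The final batch of a file is usually smaller than BATCHSIZE, so if it falls under 102,400 rows it will still go through the delta store.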

Bulk Insert Optimizations
- Bulk insert of < 102,400 rows
- Bulk insert of 105K rows (>= 102,400 rows)
- ALTER INDEX REBUILD

Tuple Mover
- Runs every 5 minutes by default
- When a row store reaches 1,048,576 rows, converts it to columnstore
- De-allocates row groups where all rows are deleted
- Start manually: ALTER INDEX REORGANIZE
- Extended events: columnstore_tuple_mover_begin_compress, columnstore_tuple_mover_end_compress
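The slide names the extended events but not the package they belong to. Rather than guess, the following sketch looks them up in the Extended Events metadata views, which is a reasonable first step before defining an event session.

```sql
-- Find the Tuple Mover events and the package that exposes them.
SELECT p.name AS package_name,
       o.name AS event_name,
       o.description
FROM sys.dm_xe_objects AS o
JOIN sys.dm_xe_packages AS p
  ON p.guid = o.package_guid
WHERE o.object_type = N'event'
  AND o.name LIKE N'columnstore_tuple_mover%';
```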

Tuple Mover Control
- The Tuple Mover does consume resources
- Trace flag 634 disables the Tuple Mover, as an edge case
- When disabled, it has to be manually invoked with ALTER INDEX ( ) REORGANIZE/REBUILD
- If disabled and not manually invoked:
  - Can cause performance issues when querying data
  - Can end up with multiple rowstores (deltastores) which won't be compressed
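A sketch of the edge case described above: disabling the background Tuple Mover with trace flag 634 and compressing closed row groups by hand. The table and index names are the hypothetical ones from the Syntax slide, and enabling the flag globally with -1 is an assumption about how it would be applied.

```sql
-- Disable the background Tuple Mover for the instance (edge case only).
DBCC TRACEON (634, -1);

-- Confirm the flag is active.
DBCC TRACESTATUS (634, -1);

-- With the flag on, closed row groups must be compressed manually.
ALTER INDEX CL_Simple ON dbo.SimpleTable REORGANIZE;

-- Re-enable the background Tuple Mover.
DBCC TRACEOFF (634, -1);
```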

Index Maintenance Operations
Index rebuild: re-creates the clustered columnstore index completely
- ALTER TABLE REBUILD
- ALTER INDEX REBUILD
- CREATE CLUSTERED COLUMNSTORE INDEX WITH (DROP_EXISTING = ON)
Reorganize: forces delta store operations only
- ALTER INDEX REORGANIZE: compresses closed row groups
- REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON): compresses all row groups
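Spelled out against the hypothetical dbo.SimpleTable and CL_Simple names from the Syntax slide, the maintenance statements look roughly as follows. The three rebuild forms are alternatives, not a sequence. The COMPRESS_ALL_ROW_GROUPS option is left out of the sketch because its availability depends on the SQL Server version.

```sql
-- Full rebuild: pick ONE of the following three forms.
ALTER TABLE dbo.SimpleTable REBUILD;

ALTER INDEX CL_Simple ON dbo.SimpleTable REBUILD;

CREATE CLUSTERED COLUMNSTORE INDEX CL_Simple ON dbo.SimpleTable
WITH (DROP_EXISTING = ON);

-- Lighter-weight alternative: only compresses closed delta store row groups.
ALTER INDEX CL_Simple ON dbo.SimpleTable REORGANIZE;
```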

Statistics for Columnstore Index
The need for statistics
- The optimizer requires statistics histograms to generate query plans that use Columnstore indexes
Best practices
- Keep statistics up to date
- Create multicolumn statistics on correlated columns
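A minimal sketch of both practices. It treats SimpleStateID and SimpleAddressID on the hypothetical dbo.SimpleTable as correlated columns, which is an assumption made purely for illustration; the statistics name is also made up.

```sql
-- Multicolumn statistics on two columns assumed to be correlated.
CREATE STATISTICS st_Simple_State_Address
ON dbo.SimpleTable (SimpleStateID, SimpleAddressID)
WITH FULLSCAN;

-- Refresh all statistics on the table, for example after a large load.
UPDATE STATISTICS dbo.SimpleTable WITH FULLSCAN;
```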

Best Practices
- Create columnstore indexes on large fact tables
- Leverage star joins
- Join on integer keys
- Leverage parallelism
- Provide sufficient memory
- Use in conjunction with partitioned tables
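To illustrate the last point, here is a sketch of a partitioned fact table with a clustered columnstore index. All object names, the integer keys, and the yearly boundary values are hypothetical.

```sql
-- Hypothetical yearly partitioning on an integer key.
CREATE PARTITION FUNCTION pf_SalesYear (INT)
AS RANGE RIGHT FOR VALUES (2012, 2013, 2014);

CREATE PARTITION SCHEME ps_SalesYear
AS PARTITION pf_SalesYear ALL TO ([PRIMARY]);

-- Fact table with integer keys, created on the partition scheme.
CREATE TABLE dbo.FactSales
(
    SalesYear   INT   NOT NULL,
    DateKey     INT   NOT NULL,
    CustomerKey INT   NOT NULL,
    Amt         MONEY NOT NULL
) ON ps_SalesYear (SalesYear);

-- With no ON clause, the index is placed on the table's partition scheme.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;
```

Each partition then has its own row groups, so a single partition can be rebuilt or switched without touching the rest of the table.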

Non-Clustered Columnstore Indexes: Do we still need them?
Yes, if you need constraints or triggers on the table
- Creating a CCI will fail if there is a B-tree enforcing a key constraint
- However, you won't be able to update the table
No, if constraints are not needed
- Create the table and add a clustered columnstore index
- No other indexes to worry about
- Can insert / update / delete in the table
- Consistent fast query performance

Updating Non-Clustered Columnstore
- Disable index, update data, rebuild, or
- Use partition switching, or
- Use a delta table and UNION ALL
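Two of these options can be sketched briefly. The names dbo.SimpleTable and NCI_Simple come from the Syntax slide; the delta table, the view, the column data types, and the sample update are hypothetical. GO is the batch separator used by SSMS and sqlcmd, needed here because CREATE VIEW must start its own batch.

```sql
-- Option 1: disable the nonclustered columnstore index, change data, rebuild.
ALTER INDEX NCI_Simple ON dbo.SimpleTable DISABLE;

UPDATE dbo.SimpleTable
SET Amt = Amt * 1.1
WHERE SimpleStateID = 5;

ALTER INDEX NCI_Simple ON dbo.SimpleTable REBUILD;
GO

-- Option 3: keep new rows in a small rowstore delta table and
-- expose both tables through one view.
CREATE TABLE dbo.SimpleTable_Delta
(
    SimpleID        INT   NOT NULL,
    SimpleAddressID INT   NOT NULL,
    SimpleStateID   INT   NOT NULL,
    Amt             MONEY NOT NULL
);
GO

CREATE VIEW dbo.SimpleTable_All
AS
SELECT SimpleID, SimpleAddressID, SimpleStateID, Amt FROM dbo.SimpleTable
UNION ALL
SELECT SimpleID, SimpleAddressID, SimpleStateID, Amt FROM dbo.SimpleTable_Delta;
GO
```

With the delta-table approach, rows are periodically moved from the delta table into the main table during a maintenance window, at which point the columnstore index is rebuilt once.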

Questions?