Multidimensional Clustering (MDC) Tables in DB2 LUW. DB2Night Show. January 14, 2011

Similar documents
Deep Dive Into Storage Optimization When And How To Use Adaptive Compression. Thomas Fanghaenel IBM Bill Minor IBM

IBM DB2 LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Multi-Dimensional Clustering: a new data layout scheme in DB2

Efficient Bulk Deletes for Multi Dimensional Clustered Tables in DB2

The DB2Night Show Episode #89. InfoSphere Warehouse V10 Performance Enhancements

Best Practices. Physical Database Design. IBM DB2 for Linux, UNIX, and Windows

What Developers must know about DB2 for z/os indexes

Presentation Abstract

Heckaton. SQL Server's Memory Optimized OLTP Engine

E-Guide DATABASE DESIGN HAS EVERYTHING TO DO WITH PERFORMANCE

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

Robin Von Boeschoten, John P Kennedy IBM Toronto Laboratories Markham, Ontario, Canada

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Kathleen Durant PhD Northeastern University CS Indexes

Performance Tuning Batch Cycles

Data on External Storage

Column Stores vs. Row Stores How Different Are They Really?

Query Optimization, part 2: query plans in practice

DESIGNING FOR PERFORMANCE SERIES. Smokin Fast Queries Query Optimization

Range Partitioning Fundamentals

IBM DB2 courses, Universidad Cenfotec

CMSC424: Database Design. Instructor: Amol Deshpande

Storage hierarchy. Textbook: chapters 11, 12, and 13

Resource Optimization

That is my advertised agenda, but I might skip about a bit The idea is to share observations gleaned from experience of using systems that have been

File System Interface and Implementation

Review: Memory, Disks, & Files. File Organizations and Indexing. Today: File Storage. Alternative File Organizations. Cost Model for Analysis

So you think you know everything about Partitioning?

Course Outline. SQL Server Performance & Tuning For Developers. Course Description: Pre-requisites: Course Content: Performance & Tuning.

DB2 for z/os Optimizer: What have you done for me lately?

Chapter 12: Query Processing

Context. File Organizations and Indexing. Cost Model for Analysis. Alternative File Organizations. Some Assumptions in the Analysis.

7. Query Processing and Optimization

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop

DB2 ADAPTIVE COMPRESSION - Estimation, Implementation and Performance Improvement

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Outline. Database Tuning. Ideal Transaction. Concurrency Tuning Goals. Concurrency Tuning. Nikolaus Augsten. Lock Tuning. Unit 8 WS 2013/2014

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Chapter 11: Storage and File Structure. Silberschatz, Korth and Sudarshan Updated by Bird and Tanin

Data Warehousing & Big Data at OpenWorld for your smartphone

Foster B-Trees. Lucas Lersch. M. Sc. Caetano Sauer Advisor

XML Storage in DB2 Basics and Improvements Aarti Dua DB2 XML development July 2009

Database Management and Tuning

Chapter 13: Query Processing

Teradata Analyst Pack More Power to Analyze and Tune Your Data Warehouse for Optimal Performance

Datenbanksysteme II: Caching and File Structures. Ulf Leser

Chapter 12: Query Processing. Chapter 12: Query Processing

CS 245: Database System Principles

L9: Storage Manager Physical Data Organization

BDCC: Exploiting Fine-Grained Persistent Memories for OLAP. Peter Boncz

Outline. Database Tuning. Join Strategies Running Example. Outline. Index Tuning. Nikolaus Augsten. Unit 6 WS 2014/2015


CMSC 424 Database design Lecture 13 Storage: Files. Mihai Pop

Information Systems (Informationssysteme)

COLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE)

Database Ph.D. Qualifying Exam Spring 2006

Lesson 9 Transcript: Backup and Recovery

In-Memory Data Management

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

How Viper2 Can Help You!

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

Advanced Database Systems

Chapter 12. File Management

QUIZ: Is either set of attributes a superkey? A candidate key? Source:

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012

CompSci 516: Database Systems

Modern DB2 for z/os Physical Database Design

Column-Stores vs. Row-Stores. How Different are they Really? Arul Bharathi

Scalable Access to SAS Data Billy Clifford, SAS Institute Inc., Austin, TX

Revival of the SQL Tuner

Arrays are a very commonly used programming language construct, but have limited support within relational databases. Although an XML document or

Improving PowerCenter Performance with IBM DB2 Range Partitioned Tables

Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Categories and Subject Descriptors H.2.4 [Systems]: Relational databases. General Terms Algorithms, Measurement, Performance, Design.

SQLSaturday Sioux Falls, SD Hosted by (605) SQL

Seminar: Presenter: Oracle Database Objects Internals. Oren Nakdimon.

IMPROVING THE PERFORMANCE, INTEGRITY, AND MANAGEABILITY OF PHYSICAL STORAGE IN DB2 DATABASES

Indexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

CAPITALIZING On DB2 LUW V9.7 Capabilities - A Data Management Feature Presentation

Query Processing with Indexes. Announcements (February 24) Review. CPS 216 Advanced Database Systems

Why Is This Important? Overview of Storage and Indexing. Components of a Disk. Data on External Storage. Accessing a Disk Page. Records on a Disk Page

Optimizing Database I/O

Outlines. Chapter 2 Storage Structure. Structure of a DBMS (with some simplification) Structure of a DBMS (with some simplification)

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23

FILE SYSTEM IMPLEMENTATION. Sunu Wibirama

Index Tuning. Index. An index is a data structure that supports efficient access to data. Matching records. Condition on attribute value

Chapter 12: Query Processing

OPERATING SYSTEM. Chapter 12: File System Implementation

CS3600 SYSTEMS AND NETWORKS

Chapter 11: File System Implementation. Objectives

Chapter 12: Indexing and Hashing. Basic Concepts

Administração e Optimização de Bases de Dados 2012/2013 Index Tuning

BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis

Storing Data: Disks and Files. Storing and Retrieving Data. Why Not Store Everything in Main Memory? Database Management Systems need to:

Tuesday, April 6, Inside SQL Server

Boost ERP/CRM Performance by Reorganizing Your Oracle Database: A Proven Reorganization Strategy Donald Bakels

Transcription:

Multidimensional Clustering (MDC) Tables in DB2 LUW DB2Night Show January 14, 2011 Pat Bates, IBM Technical Sales Professional, Data Warehousing Paul Zikopoulos, Director, IBM Information Management Client Technical Professionals January 14, 2011

Agenda Multi-Dimensional Cluster Tables (MDCs) Why are they useful? MDC Precursor: Traditional Clustered Indexes MDC Benefits How They Work A History of Enhancements Customer Experiences DB2 for Data Warehousing and Business Intelligence Breaking down I/O barriers 2

Multidimensional Clustering Tables (MDCs) MDCs are a unique object in DB2 LUW that provide many advantages over other indexes Particularly regular clustered indexes Provide continuous, flexible and automatic clustering of data on disk Yield significant improvements in Query performance Disk space efficiency Data management overhead Dimensional and great for warehousing / BI Great for OLTP too 3

Before MDCs Traditional Clustered Indexes Data physically clustered according to the cluster column Efficient access on one dimension, but. Can only cluster on one column RID-based indexing on other columns doesn t benefit from ordering Heavy maintenance load Inefficient disk clustering over time Monitor and re-org to reclaim lost space Clustered index on REGION Table Index on YEAR Large RID-based index overhead Excessive index space requirements 4

Benefits of MDCs Efficient I/O == Performance 3-4X average query performance improvement, 10X+ for some queries Automated dimensional index creation & management DB2 automatically creates and manages dimensional indexes Never REORG an MDC table for re-clustering Only reorganize an MDC table to perform space reclamation Up to 64 Clustered Indexes per table (Not just the one) 90+% dimension index compression because of the on-disk nature of a MDC table and its associated block pointers You can mix MDC indexes with traditional RID indexes Administration-free rolling ranges No manual ATTACH or DETACH for range cycling: just load the data and MDC automatically provides the clustering 5

MDCs: How They Work REGION, YEAR Composite Block Index on REGION and YEAR East, 2009 East, 2009 Dimension Block Index on REGION REGION North, 2009 North, 2010 North, 2010 YEAR Dimension Block Index on YEAR South, 2010 Data is ordered along extant boundaries according to dimension (clustering) values DB2 creates n+1 block indexes when MDC table is created BIDs point to blocks of rows, not rows, so very compact 6

Processing a SELECT on an MDC Table (AND) 1. Index lookup done on each dimension block index 2. Join with block ANDing 3. Mini-relation scan of result blocks SELECT * FROM MDCTABLE WHERE COLOR= BLUE AND NATION= USA Block Index on Color Block Index on Nation Resulting List of Blocks to Scan Blue 4,0 12,0 48,0 52,0 76,0 100,0 216,0 USA 12,0 76,0 92,0 100,0112,0216,0 276,0 12,0 76,0 100,0 216,0 + = 7

Processing a SELECT on an MDC Table (OR) 1. Index lookup done on each dimension block index 2. Join with block ORing 3. Mini-relation scan of result blocks SELECT * FROM MDCTABLE WHERE COLOR= BLUE OR NATION= USA Block Index on Color Block Index on Nation Resulting List of Blocks to Scan Blue 4,0 12,0 48,0 52,0 76,0 100,0 216,0 USA 12,0 76,0 92,0 100,0112,0216,0 276,0 4,0 12,0 48,0 52,0 76,0 100,0 112,0 216,0 276,0 + = 8

Processing a SELECT on an MDC Table (with RID) 1. Block index lookup and RID index lookup 2. Block and RID ANDing 3. Result is row id s in qualifying blocks SELECT * FROM MDCTABLE WHERE COLOR= BLUE AND PARTNO < 1000 Block Index on Color RIDs from Index Resulting RIDS to fetch Blue 4,0 12,0 48,0 52,0 76,0 100,0 216,0 6,4 8,12 50,1 77,3 107,0115,0219,5 276,9 6,4 50,1 77,3 219,5 + = 9

Processing a SELECT on an MDC Table (with RID) 1. Block index lookup and RID index lookup 2. Block and RID ORing 3. Result is row id s in qualifying blocks SELECT * FROM MDCTABLE WHERE COLOR= BLUE OR PARTNO < 1000 Block Index on Color Blue 4,0 12,0 48,0 52,0 76,0 100,0 216,0 RIDs from Index 6,4 8,12 50,1 77,3 107,0115,0219,5 276,9 Resulting RIDS to fetch 8,12 107,0 115,0 276,9 Blue 4,0 12,0 48,0 52,0 76,0 100,0 216,0 + = 10

MDC Benefit Faster Queries Queries can take advantage of block indexes Quickly and easily narrow down a portion of the table having particular dimension values or ranges of values (block elimination) Block indexes are small for very fast index lookups Relation scans of blocks are faster than RID based retrieval Prefetch entire blocks of data for block index scanner - no need for sequential detection on data access Block-level index ANDing and ORing; mixed with RIDs Data is guaranteed to be clustered on extents much faster retrieval BIDs provide additional access plan choices and do not prevent the use of traditional access plans (rid scans, joins, table scans, etc) 11

MDC Index Columns (Dimension) Selection When choosing dimensions for a table, consider: First, which queries will benefit? Examine workload and look for: Columns in equality or range queries Columns with coarse granularity FKs in fact tables consider generated columns to group continuous values like employee numbers Second, consider expected density of cells based on expected data # possible cells = cross product of dimension cardinalities (use stats) Possibility of sparsely populated blocks/cells Third, manipulate for optimal cell density Vary the number of dimensions Vary the granularity of one or more dimensions (rollup to higher grain) Vary the block (extent) size Or Use the DB2 Design Advisor! 12

MDCs INSERT, UPDATE, DELETE INSERT Find the Dimension combo in the composite block index search those blocks for space If not found or full: assign a new block (reuse empty block first) DELETE Find the matching dimension combo UPDATE Non-Dimension values: in-place update If variable length column and no more space, find another block Dimension column value: write to a different block Convert to DELETE then INSERT LOAD Utility LOAD and IMPORT work just like regular tables 13

MDC Block Delete Enhancements Since 8.1 8.2 Design Advisor for dimension selection Let the Design Advisor do the work for you 8.2.2 and 9.1 Block-by-block delete optimization Fast BID update and page-by-page delete Secondary RID index update slow Probe RID index, key-by-key deletes, write to log per index key deleted Secondary indexes could result in long ROLL-out times 9.5 Improved delete with asynchronous RID index cleanup Reduced I/O algorithm and page-by-page logging makes it very fast Fully parallelized for multiple index updates Perform all DELETE activity as a single unit for work - cleanup in a single pass of the data Continuous improvement from 8.2 through 9.7 and beyond 14

MDC Bulk Deletion Results in DB2 9.5 11,000,000 row table with 134,260 16KB pages and 8 RID indexes on a 4 node cluster 15

MDC Sparse Table Enhancements in DB2 9.7 Pre-DB2 9.7, blocks remained property of MDC table after DELETE Sole option was to perform classic REORG to reclaim space Think of this as a high water marker issue for the space that MDC tables occupy Even with empty extents in the MDC table, could have a full table space Needed to perform full offline REORG to give space back to table space Table reconstruction necessary without concurrent WRITE access to the table Operations on MDC table could reuse this space however Example: Insert new E and 04 dimensional set 16

Sparse MDC Tables From DB2 9.7 Onward DB2 9.7 extends the REORG command with a new RECLAIM EXTENTS ONLY REORG TABLE <mdc table name> RECLAIM EXTENTS ONLY Can include in automated table maintenance Not really a REORG: no COPY phase, no shadow copy, etc. Allows you to free space back to the table space in a minimum amount of time with maximum concurrency Storage is freed incrementally during processing Can control concurrency with ALLOW keyword during processing ALLOW WRITE (default) allows concurrent transactions to read and write Default to run on all partitions (range or hash): can override for specific partition Very fast: done in-place with no data movement with minimal logging 1. Find empty blocks in block map 2. Marks new empty block in the table s block map as unallocated MDC table no longer thinks those pages belong to it 3. Marks blocks as unallocated in the table space s space map pages SMPs Now the table space thinks it can use them 17

Sparse MDC Tables When should you REORG like this? Could make it auto-reorg New RECLAIMABLE_SPACE column added to ADMIN_GET_TAB_INFO() function to help you make that decision Provides information that isn t available to catalog tables Monitoring Examples Show me the amount of reusable space in my MDC table SELECT reclaimable_space as SPACE_AVAILABLE FROM TABLE SYSPROC.ADMIN_GET_TAB_INFO_V97 ( paulz, emp )) AS RECLAIMABLE_SPACE_FOR_THIS_TABLE Show me all MDC tables that have more than 10 MB of reusable space SELECT tabschema, tabname, reclaimable_space FROM sysibmadm.admintabinfo WHERE reclaimable_space > 10,000,00 18

Performance Optimization for MDC Tables in 9.7 Pre-9.7: Empty extents could be brought into the buffer pool Starting point of new scan Sequential pre-fetching algorithm grabs the data Pre-9.7: Wasted memory and resources to bring a block (extent) of pages into the buffer pool that have no value DB2 9.7 & Onward: Empty blocks returned to the table space helps performance by avoiding these scenarios 19

MDC Customer Experiences Sample Canadian Astronomy Data Centre "With the MDC function of the DB2 database, the customer can run queries on the more than one billion row database in less than a minute. Compared to other database solutions, that represents an acceleration of 20 to 70 percent for such complex queries" Brazil Telecom By using MDCs, we were able to run (in less then 2 minutes) one very important report that will allow our company be more competitive. Such report was impossible to run in our environment because it was requiring too many resources from the system 20

Customers Performance Experiences Query performance results: Most averaged around or just above 3X query performance improvement Maximum speedup included: 10X, 30X, 100X, 2000+X Average query speedup - # of X 14 Maximum query speedup 12 10 8 6 4 2 0 Cust 1 Cust 3 Cust 5 Cust 7 Cust 2 Cust 4 Cust 6 92x 20.0% 2000x 20.0% 30x 20.0% 5X 20.0% 10x 20.0% 5X 10x 30x 92x 2000x 21

How Does DB2 with MDCs Help BI? DB2 has proven technology to break the I/O barrier Optimize the pipe with Deep Compression Parallelize I/O with Database Partitioning Feature (DPF) Reduce I/O with Range Partitioning Compact I/O with Multidimensional Clustering Tables (MDC) The following will illustrate.. 22

Traditional Large Scans Result in I/O Wait 23

DB2 Database Partitioning Feature = Divide I/O Database Partition 1 Database Partition 2 Database Partition 3 24

Add Range Partitioning to Further Reduce I/O Database Partition 1 Database Partition 2 Database Partition 3 January February March 25

Add MDC to Further Reduce I/O Database Partition 1 Database Partition 2 Database Partition 3 January February March 26

Compression Further Reduces I/O by a Factor of X Database Partition 1 Database Partition 2 Database Partition 3 January February March 27

Questions and Thank You 28