The DB2Night Show Episode #89. InfoSphere Warehouse V10 Performance Enhancements

Similar documents
Multidimensional Clustering (MDC) Tables in DB2 LUW. DB2Night Show. January 14, 2011

Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag.

HANA Performance. Efficient Speed and Scale-out for Real-time BI

DB2 for LUW Advanced Statistics with Statistical Views. John Hornibrook Manager DB2 for LUW Query Optimization Development

Presentation Abstract

What Developers must know about DB2 for z/os indexes

An Index's Guide to the Universe

Column-Stores vs. Row-Stores. How Different are they Really? Arul Bharathi

7. Query Processing and Optimization

Range Partitioning Fundamentals

DB2 9 for z/os Selected Query Performance Enhancements

Decision Support Systems aka Analytical Systems

IBM DB2 LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Parallelism Strategies In The DB2 Optimizer

Chapter 9. Cardinality Estimation. How Many Rows Does a Query Yield? Architecture and Implementation of Database Systems Winter 2010/11

Best Practices. Physical Database Design. IBM DB2 for Linux, UNIX, and Windows

Overview of Implementing Relational Operators and Query Evaluation

Segregating Data Within Databases for Performance Prepared by Bill Hulsizer

CS330. Query Processing

Efficient Bulk Deletes for Multi Dimensional Clustered Tables in DB2

Basics of Dimensional Modeling

Overview of Query Evaluation. Overview of Query Evaluation

What s new in DB2 9 for z/os for Applications

Principles of Data Management. Lecture #9 (Query Processing Overview)

DB2 Performance Essentials

Deploying an IBM Industry Data Model on an IBM Netezza data warehouse appliance

Advanced Data Management Technologies Written Exam

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Optimizer Challenges in a Multi-Tenant World

Real-World Performance Training Star Query Edge Conditions and Extreme Performance

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Multi-Dimensional Clustering: a new data layout scheme in DB2

AUTOMATIC CLUSTERING PRASANNA RAJAPERUMAL I MARCH Snowflake Computing Inc. All Rights Reserved

Part 1: Indexes for Big Data

Information Systems (Informationssysteme)

Column Stores vs. Row Stores How Different Are They Really?

ALTERNATE SCHEMA DIAGRAMMING METHODS DECISION SUPPORT SYSTEMS. CS121: Relational Databases Fall 2017 Lecture 22

Welcome to the presentation. Thank you for taking your time for being here.

COLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE)

Data warehouses Decision support The multidimensional model OLAP queries

Overview of Query Evaluation

Data Warehousing & Data Mining

InfoSphere Warehouse V9.5 Exam.

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Triangle SQL Server User Group Adaptive Query Processing with Azure SQL DB and SQL Server 2017

An Overview of Data Warehousing and OLAP Technology

Accelerating BI on Hadoop: Full-Scan, Cubes or Indexes?

Huge market -- essentially all high performance databases work this way

Arrays are a very commonly used programming language construct, but have limited support within relational databases. Although an XML document or

Course Number : SEWI ZG514 Course Title : Data Warehousing Type of Exam : Open Book Weightage : 60 % Duration : 180 Minutes

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Syllabus. Syllabus. Motivation Decision Support. Syllabus

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

15-415/615 Faloutsos 1

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

SAP CERTIFIED APPLICATION ASSOCIATE - SAP HANA 2.0 (SPS01)

Data Warehousing 2. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

The DB2Night Show Episode #88. Real Time Warehousing with IBM InfoSphere Warehouse V10

Relational Query Optimization. Highlights of System R Optimizer

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Evaluation of Relational Operations

DB2 10 for z/os Optimization and Query Performance Improvements

DB2 for z/os Optimizer: What have you done for me lately?

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop

Evaluation of Relational Operations: Other Techniques

Copyright 2014, Oracle and/or its affiliates. All rights reserved.

Hash-Based Indexing 165

Advanced Oracle SQL Tuning v3.0 by Tanel Poder

Cost-based Query Sub-System. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class.

Cost Models. the query database statistics description of computational resources, e.g.

Best Practices - Pentaho Data Modeling

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz II

DATA WAREHOUSING II. CS121: Relational Databases Fall 2017 Lecture 23

Data Warehousing and Decision Support, part 2

Query Evaluation Overview, cont.

Outline. Database Tuning. Ideal Transaction. Concurrency Tuning Goals. Concurrency Tuning. Nikolaus Augsten. Lock Tuning. Unit 8 WS 2013/2014

Column-Stores vs. Row-Stores: How Different Are They Really?

Informix Extended Parallel Server 8.3

Introduction to Data Warehousing

Database performance becomes an important issue in the presence of

SQL Gone Wild: Taming Bad SQL the Easy Way (or the Hard Way) Sergey Koltakov Product Manager, Database Manageability

Implementation of Relational Operations: Other Operations

DB2 12 for z Optimizer

Database Management and Tuning

Evaluation of Relational Operations: Other Techniques

Evaluation of relational operations

Guide Users along Information Pathways and Surf through the Data

Query Evaluation Overview, cont.

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012

Oracle Database 11g: SQL Tuning Workshop

An Introduction to Big Data Formats

Evaluation of Relational Operations

Designing dashboards for performance. Reference deck

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

XML: Changing the data warehouse

Evaluation of Relational Operations: Other Techniques

IBM DB2 10 for z/os beta. Reduce costs with improved performance

Creating indexes suited to your queries

Transcription:

The DB2Night Show Episode #89 InfoSphere Warehouse V10 Performance Enhancements Pat Bates, WW Technical Sales for Big Data and Warehousing, jpbates@us.ibm.com June 27, 2012 June 27, 2012

Multi-Core Parallelism Improvements What is new? Starting in DB2 10 Greater flexibility in controlling degree of parallelism through WLM for concurrent running applications and workloads Better built-in runtime decision of parallelism degree whenany is specified DB2 10 reduces overhead for queries with no parallelism (DEGREE=1) New REBAL plan operator rebalances work among subagents Contention on various latches alleviated or eliminated How to enable Multi-Core Parallelism? Multi-core parallelism is enabled by simply turning on theintra_parallel dbm configuration parameter [YES, NO] and setting the degree of parallelism DFT_DEGREE db configuration parameter SET CURRENT DEGREE [ANY,1,..,n] BIND parameter DEGREE [ANY,1,..,n] ALTER WORKLOAD myworkload MAXIMUM DEGREE [ANY,1..n] 5

Greater Flexibility in Controlling Multi-Core Parallelism Enable/disable parallelism within a database application CALL SYSPROC.ADMIN_SET_INTRA_PARALLEL ( YES NO ) Controlling maximum parallelism degree through DB2 workload manager Set themaximum DEGREE workload attribute ALTER WORKLOAD myworkload MAXIMUM DEGREE 2 Only affects workloadmyworkload Takes effect at transaction boundary NO instance or database recycle Precedence Order WLM:MAXIMUM DEGREE Stored Procedure:ADMIN_SET_INTRA_PARALLEL Instance level configuration parameter:intra_parallel 6

Parallelism and Workload Imbalance Data filtering and data skew can cause workloads between subagents to become imbalanced Rebalance (REBAL) operator is a new access plan operator A mechanism for transferring rows between subagents to rebalance their workload Light weight mechanism for rebalance This operator puts underutilized subagents to work Parallel plan with unbalanced workload Rebalanced data stream Only one CPU is busy LTQ NLJN Data redistributed equally among the subagents LTQ NLJN IXScan Index: X1 Fetch IXScan Table: T1 Index: X1 REBAL IXScan Index: X1 Fetch IXScan Table: T1 Index: X1 7

Jump Scan What is Jump Scan? Improvement to avoid large and expensive index scans when the index has gaps Index: (c1, c2) [c1: gap column; c2 non-gap column] Query: SELECT info FROM TPCD.LINEITEM T WHERE T.c2 = 10 (Equality Predicate on c2) (No predicate on c1 unconstrained gap) Why do we need Jump Scan? Difficult to find optimal set of indexes & optimize workloads with many ad-hoc queries ad-hoc queries often have gaps in composite indexes Jump Scan improves performance of Index Scans with gaps in the index When does Jump Scan work best? In data warehouse and SAP environments where query predicates leave gaps in the index, which slows down the index scanning process & leads to poor query performance When the gap column has small cardinality (= number of distinct values that exist for the gap) and the non-gap column predicate is highly selective Ideal case example c1 has 10 distinct values c2 contains a million distinct values, 10 of which satisfy the query predicate. Avoids the need for additional indexes 10

Jump Scan How Does it Work? Composite Index (product_id, order_location, order_amnt) SELECT * FROM orders WHERE product_id = 10 AND order_amnt = 898 Index Scan with Gap (before DB2 10) GAP Index Scan with Gap (with Jump Scan) 11 DB2 scans a huge range of the index between start/stop keys while applying predicates to locate qualifying keys Queries against tables with composite (multicolumn) indexes present a challenge when the query results in a gap (constrained or unconstrained) Index manager skips forward through the index while bypassing large sections that will not yield any results Processing involves two related scans Positioning: fills in the missing key parts Consuming: locates matching keys

Smart Data and Smart Index Prefetching (Data and Index page Sequential Detection + Read Ahead (SD+RA)) What is Smart Data/Index Prefetching? New Prefetching types in DB2 10 that switches between DB2 s original Sequential Detection Prefetching (SD) to Read Ahead Prefetching when tables and indexes become unclustered Uses the index to determine which index and data pages are accessed next (in contrast to SD, which guesses which pages are needed in the future) When does it work best? When tables get very disorganized (e.g. through frequent IUD operations) Fetch Why do we need Smart Data / Index Prefetching? Effective Pre-fetching is critical for MCP Without pre-fetching, all subagents might wait while one performs I/O Maximize data page prefetching and improve performance of IXSCAN and IXSCAN/FETCH for imperfectly clustered tables Minimize need for table REORGs Increase IXSCAN and IXSCAN/FETCH performance consistency independent from the degree of table and index organization IXScan Table: T1 Index: X1 12

Star Schema Defined What is Star Schema? Simplest form of a dimensional model How the data is organized? Facts Dimensions A typical star schema based query Joins a subset of the dimensions with the fact tables In a snowflake schema there may also be joins between dimension tables Account Id Name Company Better performance Improves the performance of queries in data warehouse or data mart environments Reduces the total cost of ownership Less complex tuning actions throughout the entire application life cycle Date Id Date Day Month Quarter Year Sales Date_Id Store_Id Product_Id Units_Sold Product Id EAN_Code Product_Name Brand Category Fact table Dimension table Snowflakes Store Id Store_Number Province Country 15

Star Schema Enhancements in InfoSphere Warehouse 10 New star schema detection method Allows the query optimizer to detect stars based on unique attributes Primary keys, unique indexes, or unique constraints New zigzag join method Provides consistent query performance in warehousing environments Supports multiple fact table queries Exploits indexes even when there is a gap in probing key Optim Query Workload Tuner helps to determine optimal multicolumns indexes to enable the zigzag join 16

Zigzag Join What is zigzag join? A new join method for complex queries on dimensional schemas Works for star schema queries in single or multiple subject areas with snowflakes Works seamlessly with serial or DPF databases, range-partitioned and MDC tables Simple index adviser integrated with db2exfmt and OQT to help with fact table index design and get best performance from zigzag join ZZJOIN Why do we need zigzag join? Improved query performance More stable performance that is less sensitive to small query or configuration changes Easier logical database design When can DB2 use zigzag join? Joins between fact table(s) and dimension tables on dimension unique keys Fact table must have a multicolumn index that contains at least two of the join keys used in the query Table: T1 Table: T2 IXScan Index: t1_t2 17

Improved Performance of Queries View the new operators added as part of PED and PEA processing in EXPLAIN NEW values added to the EXPLAIN_ARGUMENT table Indicate queries using the new hashing function ARGUMENT_TYPE column = UNIQUE ARGUMENT_VALUE column: HASHED_PARTIAL SELECT DISTINCT c11, c12, c21, c22 FROM Table1, Table2 WHERE c11 = c21 db2exfmt output access plan details punique in the access plan signifies the PED operation 22

RUNSTATS Enhancements 23 Index sampling In DB2 9.7 only data pages can be sampled Global indexes on range-partitioned tables can be very large All partitions of a partitioned index are read in DB2 9.7 Performance improvements Path length reduction RUNSTATS is very CPU-intensive Use new index readahead prefetching capability More efficient I/O when sequential detect prefetching isn t possible Usability improvements Specifying table or index schema will be optional Support VIEW keyword Make SAMPLED the default when DETAILED is specified SAMPLED DETAILED is currently the recommended index option Similar changes to CREATE INDEX COLLECT STATISTICS clause 23

RUNSTATS Index Sampling 25 >>-RUNSTATS ON--+-TABLE-+--object-name--------------------------> '-VIEW--'... >--+----------------------------+--+----------------------------+---> '- Table Sampling Options -' '- Index Sampling Options -' Table Sampling Options --TABLESAMPLE--+-BERNOULLI-+--(--numeric-literal--)------------> '-SYSTEM----' >--+-----------------------------------+------------------------ '-REPEATABLE--(--integer-literal--)-' Index Sampling Options --INDEXSAMPLE--+-BERNOULLI-+--(--numeric-literal--)------------ '-SYSTEM----' 25