Multidimensional Grouping Made Easy

Similar documents
OLAP Introduction and Overview

OLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube

Advanced Data Management Technologies

Performance Tuning BI on SAP NetWeaver Using DB2 for i5/os and i5 Navigator

Data Warehousing and Data Mining SQL OLAP Operations

DB2 SQL Class Outline

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

Oracle Database 10g: Introduction to SQL

Microsoft Querying Microsoft SQL Server 2014

Introduction to Computer Science and Business

My Favorite Things in DB2 11 for z/os

Optimize SQL Performance on DB2 for i. Strategies to make your query and application as fast as possible.

CS 1655 / Spring 2013! Secure Data Management and Web Applications

Tools, tips, and strategies to optimize BEx query performance for SAP HANA

SQL Saturday Cork Welcome to Cork. Andrea Martorana Tusa T-SQL advanced: Grouping and Windowing

Chapter 18: Data Analysis and Mining

REPORTING AND QUERY TOOLS AND APPLICATIONS

$99.95 per user. Writing Queries for SQL Server (2005/2008 Edition) CourseId: 160 Skill level: Run Time: 42+ hours (209 videos)

Sql Server Syllabus. Overview

Andrea Martorana Tusa. T-SQL Advanced: Grouping and Windowing

Multidimensional Queries

Beyond Query/400: Leap into Business Intelligence with DB2 Web Query

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

Working with DB2 Data Using SQL and XQuery Answers

Building a Waterfall Chart in Excel

BUILDING A WATERFALL CHART IN EXCEL

6 SSIS Expressions SSIS Parameters Usage Control Flow Breakpoints Data Flow Data Viewers

COGNOS (R) 8 GUIDELINES FOR MODELING METADATA FRAMEWORK MANAGER. Cognos(R) 8 Business Intelligence Readme Guidelines for Modeling Metadata

Guide Users along Information Pathways and Surf through the Data

20761B: QUERYING DATA WITH TRANSACT-SQL

IBM i Version 7.3. Database Administration IBM

T-SQL Training: T-SQL for SQL Server for Developers

Data warehouses Decision support The multidimensional model OLAP queries

CUBE, ROLLUP, AND MATERIALIZED VIEWS: MINING ORACLE GOLD John Jay King, King Training Resources

Taking a First Look at Excel s Reporting Tools

Actual4Test. Actual4test - actual test exam dumps-pass for IT exams

Querying Data with Transact-SQL

Database Lab Queries. Fall Term 2017 Dr. Andreas Geppert

Querying Microsoft SQL Server (MOC 20461C)

How metadata can reduce query and report complexity As printed in the September 2009 edition of the IBM Systems Magazine

Advanced Multidimensional Reporting

Learning Alliance Corporation, Inc. For more info: go to

IBM DB2 for z/os Application Developer Certification

E-Guide DATABASE DESIGN HAS EVERYTHING TO DO WITH PERFORMANCE

DEVELOPING DATABASE APPLICATIONS (INTERMEDIATE MICROSOFT ACCESS, X405.5)

Getting Started Guide. ProClarity Analytics Platform 6. ProClarity Professional

Course Outline. Writing Reports with Report Builder and SSRS Level 1 Course 55123: 2 days Instructor Led. About this course

20761 Querying Data with Transact SQL

1. Attempt any two of the following: 10 a. State and justify the characteristics of a Data Warehouse with suitable examples.

InfoSphere Warehouse V9.5 Exam.

Oral Questions and Answers (DBMS LAB) Questions & Answers- DBMS

Microsoft Access Illustrated. Unit B: Building and Using Queries

Advanced Functions with DB2 and PHP for IBM i

Table of Contents. Table of Contents

AVANTUS TRAINING PTE LTD

Course Outline. Querying Data with Transact-SQL Course 20761B: 5 days Instructor Led

32 nd Annual Spring Conference Tuesday All Day Session

DB Fundamentals Exam.

SSAS Multidimensional vs. SSAS Tabular Which one do I choose?

Relational Databases

1. Analytical queries on the dimensionally modeled database can be significantly simpler to create than on the equivalent nondimensional database.

Querying Data with Transact-SQL

5. Single-row function

Data Mining Concepts & Techniques

Querying Microsoft SQL Server

Oracle and Toad Course Descriptions Instructor: Dan Hotka

THE POWER OF PIVOT TABLES

Data is moving faster than ever, the volume of data is exploding, the

On the use of GROUP BY ROLLUP

About using Microsoft Query to retrieve external data

Microsoft Querying Data with Transact-SQL - Performance Course

Writing Reports with Report Designer and SSRS 2014 Level 1

Building Data Models with Microsoft Excel PowerPivot

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

Querying Microsoft SQL Server

Database Processing: Fundamentals, Design, and Implementation

BASICS BEFORE STARTING SAS DATAWAREHOSING Concepts What is ETL ETL Concepts What is OLAP SAS. What is SAS History of SAS Modules available SAS

Concepts of Database Management Eighth Edition. Chapter 2 The Relational Model 1: Introduction, QBE, and Relational Algebra

OPTIMIZING ACCESS ACROSS HIERARCHIES IN DATA WAREHOUSES QUERY REWRITING ICS 624 FINAL PROJECT MAY 成玉 Cheng, Yu. Supervisor: Lipyeow Lim, PhD

Access Intermediate

Oracle PL SQL Training & Certification

Writing Queries Using Microsoft SQL Server 2008 Transact-SQL. Overview

Querying Microsoft SQL Server 2014

Using the AT and FOR Options with Relational Summary Functions

"Charting the Course to Your Success!" MOC D Querying Microsoft SQL Server Course Summary

20461: Querying Microsoft SQL Server

Chapter 3. Databases and Data Warehouses: Building Business Intelligence

MTA Database Administrator Fundamentals Course

Data Warehouse and Data Mining

Teiid Designer User Guide 7.5.0

Working with Databases and Database Objects - Answers

Course 20461C: Querying Microsoft SQL Server

Querying Data with Transact-SQL

Implementing Table Operations Using Structured Query Language (SQL) Using Multiple Operations. SQL: Structured Query Language

Querying Data with Transact-SQL

IBM. Database Database overview. IBM i 7.1

MIS NETWORK ADMINISTRATOR PROGRAM

CSC Web Programming. Introduction to SQL

1Z0-071 Exam Questions Demo Oracle. Exam Questions 1Z Oracle Database 12c SQL.

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

Transcription:

Multidimensional Grouping Made Easy Introducing DB2 for V6R1 i5/os s new SQL group functions by Mike Cain As business solutions increasingly rely on robust data-centric processing, the capabilities of the database management system must continue to rise to the challenge and provide not just fast and efficient data access, but also the ability to handle complex query requests that involve sophisticated data manipulations. With the release of i5/os V6R1, the integrated version of DB2 delivers many new features. One of the features is the addition of advanced support for grouping sets and super groups, which allows for online analytical processing (OLAP) via SQL. This new advanced support also brings the new keywords ROLLUP, CUBE, GROUPING SETS, and GROUPING. With these new grouping functions, application developers can more easily deliver results for multidimensional requests. How the Functions Work SQL OLAP functions meet query and reporting requirements that call for a result set to contain not only aggregation for each specific group (i.e., totals or averages for each concatenated set of grouping columns), but they also include intermediate and final results for different sets of grouping columns. In effect, you will be able to calculate all combinations with one query. The following fictitious table definition illustrates how these functions work: CREATE TABLE ( OrderNo INTEGER NOT NULL, CustomerNo INTEGER NOT NULL, OrderYear INTEGER NOT NULL, OrderQuarter INTEGER NOT NULL, OrderMonth INTEGER NOT NULL, DiscCode CHAR(3), OrderAmount DECIMAL(5,2) NOT NULL ); The fictitious data represents orders for the end of 2006 and the beginning of 2007, such that OrderYear contains two distinct values: 2006 and 2007 OrderQuarter contains four distinct values: 1, 2, 3, and 4 OrderMonth contains 10 distinct values: 1, 2, 3, 4, 5, 8, 9, 10, 11, and 12 DiscCode contains three distinct values: 'ABC', 'DEF', and NULL Before the new support, a query request that specified the grouping expression consisting of columns and OrderMonth would result in information only for each distinct value of OrderYear+OrderQuarter+OrderMonth (Figure 1). FIgure 1 Resulting order table OrderMonth; 2006 3 8 210 2006 3 9 275 2006 4 10 305 2006 4 11 320 2006 4 12 450 2007 1 1 200 2007 1 2 260 2007 1 3 350 2007 2 4 310 2007 2 5 365 28 ProVIP System inews march 2008 SystemiNetwork.com

FIgure 2 ROLLUP query * ( UNION ALL OrderMonth NULL, NULL UNION ALL NULL, NULL, NULL, NULL UNION ALL clause represents super grouping. This function calculates and returns aggregates for each level of the hierarchy implicitly represented in the grouping columns. Given the hierarchy represented by columns X/Y/Z, ROLLUP (X, Y, Z) produces the following aggregation results for grouping sets: (X, Y, Z) (X, Y) (X) () = grand totals NULL, NULL, NULL, NULL, NULL, NULL ) Now imagine that the application requires not only sales figures for the detail-level groups represented in OrderYear+OrderQuarter+ but also the intermediate and final totals for each level. Typically, you need to execute queries to obtain figures for each level detail, subtotal, and final total. Multiple queries mean that you do more work to get all the information that you need. Specifying a new super grouping function enables the DB2 for i5/os engine to calculate and provide information for each distinct value OrderYear+OrderQuarter+OrderMonth OrderYear+OrderQuarter OrderYear Overall totals for the entire set With the proper super grouping or grouping set syntax, the engine can also return any or all combinations of the grouping columns, such as OrderYear+OrderQuarter+OrderMonth OrderYear+OrderQuarter OrderYear OrderYear+OrderMonth OrderQuarter+OrderMonth OrderQuarter OrderMonth Overall totals for the entire set This is yet another case of driving more data-centric processing from DB2 with a single SQL statement. ROLLUP The ROLLUP function added to the For example, if a query request specifies ROLLUP and the grouping columns of and the database engine provides information for each distinct value of OrderYear+OrderQuarter+OrderMonth; OrderYear+OrderQuarter; OrderYear; and a grand total for the entire set of results. ROLLUP ( OrderMonth); To help understand the amount of work involved, consider the ROLLUP query as logically equivalent to the pseudo SQL in Figure 2. In addition to the 10 rows that the query normally returns, DB2 also produces seven more rows (shown in blue in Figure 3). The seven rows show the subtotals and final total for the grouping column values represented in the hierarchy: months within quarter, quarters within year, and years within total. This represents the super grouping work. Notice that for each subtotal and total, NULL values are used for the rolled-up grouping column. This is true even if that particular column is not NULL capable. (I will explain later how to distinguish NULL values produced by super group operations from NULL values in the underlying table.) Because the sample SQL did not specify an ORDER BY clause, the actual order of the results is unpredictable (more about this later). CUBE The CUBE function added to the clause (also considered super grouping) calculates and returns aggregates for all possible distinct combinations represented by the grouping columns. In other words, in addition to grouping along the hierarchy produced by SystemiNetwork.com march 2008 System inews ProVIP 29

Multidimensional Grouping FIgure 3 ROLLUP table FIgure 4 CUBE table ROLLUP, CUBE also calculates cross-tabular results. Cubing operations are particularly useful to produce multidimensional results in support of OLAP. Imagine a single SQL query that provides information at the various intersections of subjects (e.g., sales figures by department, store, location, date) and effectively produces a cube of information. Given the hierarchy represented by columns X/Y/Z, CUBE (X, Y, Z) produces aggregation results for grouping sets: (X, Y, Z) (X, Y) (X, Z) (X) (Y, Z) (Y) (Z) ( ) = grand totals Note that (X,Y) is equal to (Y,X) and thus redundant and not produced. For example, if a query request specifies cube and the grouping columns of and the database engine provides information for each distinct value of OrderYear+OrderQuarter+OrderMonth; OrderYear+OrderQuarter; OrderYear+OrderMonth; OrderYear; OrderQuarter+OrderMonth; OrderQuarter; OrderMonth; and a grand total for the entire set of results. 2006 3 8 210 2006 3 9 275 2006 3-485 2006 4 10 305 2006 4 11 320 2006 4 12 450 2006 4-1075 2006 - - 1560 2007 1 1 200 2007 1 2 260 2007 1 3 350 2007 1-810 2007 2 4 310 2007 2 5 365 2007 2-675 2007 - - 1485 - - - 3045 2006 3 8 210 2006 3 9 275 2006 3-485 2006 4 10 305 2006 4 11 320 2006 4 12 450 2006 4-1075 2006 - - 1560 2007 1 1 200 2007 1 2 260 2007 1 3 350 2007 1-810 2007 2 4 310 2007 2 5 365 2007 2-675 2007 - - 1485-3 - 485-4 - 1075-1 - 810-2 - 675 - - 1 200 - - 2 260 - - 3 350 - - 4 310 - - 5 365 - - 8 210 - - 9 275 - - 10 305 - - 11 320 - - 12 450-3 8 210-3 9 275-4 10 305-4 11 320-4 12 450-1 1 200-1 2 260-1 3 350-2 4 310-2 5 365 2007-1 200 2007-2 260 2007-3 350 2007-4 310 2007-5 365 2006-8 210 2006-9 275 2006-10 305 2006-11 320 2006-12 450 - - - 3045 CUBE ( OrderMonth); In addition to the 10 rows that the query normally returns, DB2 also produced 41 more rows (shown in blue in Figure 4) that represent the subtotals and final total for the various combinations of grouping column values. This represents the expansive super grouping 30 ProVIP System inews march 2008 SystemiNetwork.com

FIgure 5 GROUPING SET table 2006 - - 1560 2007 - - 1485-3 - 485-4 - 1075-1 - 810-2 - 675 2006-8 210 2006-9 275 2006-10 305 2006-11 320 2006-12 450 2007-1 200 2007-2 260 2007-3 350 2007-4 310 2007-5 365 FIgure 6 Table showing super groups Order DiscCode OrderYear DiscCode YearGroup Group OrderAmount - - 1 1 3425 2006 ABC 0 0 505 2006-0 0 675 2006 DEF 0 0 445 2007-0 0 800 2007 ABC 0 0 650 2007 DEF 0 0 350 2006-0 1 1625 2007-0 1 1800 accomplished by the database engine. Again, notice that for each subtotal and total, NULL values are used to represent the empty grouping columns. GROUPING SETS The GROUPING SET functionality lets you specify multiple grouping clauses in a single statement where DB2 calculates and returns aggregates for each set. Before this new grouping support, only one set of grouping columns would be specified. Now more than one set of grouping criteria can be provided. The CUBE and ROLLUP operations can produce a series of grouping sets, as previously illustrated. For n grouping columns of the ROLLUP operation, n+1 grouping sets are produced. For n grouping columns of the CUBE operation, 2^n (2 to the power n) grouping sets are produced. The exponential growth of grouping sets and the corresponding increase in data processing are important to keep in mind. Where CUBE and ROLLUP implicitly predefine which subtotals will be calculated, the GROUPING SET notation lets the application explicitly define which groups will have subtotals returned. For example, instead of all the cross-tabular grouping combinations returned from a CUBE operation, maybe the application needs only a subset of all possible combinations. Given the hierarchy represented by columns X/Y/Z, GROUPING SETS ((X), (Y), (X, Z)) produces aggregation results for grouping sets: (X) (Y) (X, Z) For example, given a query request that specifies the three grouping sets (OrderYear), (OrderQuarter), and ( OrderMonth), the database engine provides aggregate information only for each distinct value of OrderYear (which is set 1), OrderQuarter (set 2), and OrderYear+OrderMonth (set 3). GROUPING SETS ( (OrderYear), (OrderQuarter), ( OrderMonth)); Figure 5 shows the resulting GROUPING SETS table. The grouping set support is powerful, and it can produce a huge set of results depending on the syntax you use. This is especially significant given that you can specify CUBE (set 1) and/or ROLLUP (set 2) as part of a grouping set specification (i.e., DiscCode set 3, CustomerNo set 4). GROUPING SETS (CUBE( OrderMonth), ROLLUP( OrderNo), (DiscCode), (CustomerNo)); The CUBE or ROLLUP operation expands into yet more grouping sets, as I previously described. Find more examples and information about the additive and multiplicative nature of grouping set notation in the DB2 for i5/os SQL Reference Manual. It is a good idea to lay out the grouping required and to carefully examine the SQL syntax before running the query. If you don t, you may unleash the proverbial query from hell. GROUPING Aggregate Function As I previously stated, part of the grouping sets and super groups functionality includes the database SystemiNetwork.com march 2008 System inews ProVIP 31

Multidimensional Grouping engine setting some or all of the result columns to NULL. The GROUPING aggregate function helps the application understand the value of a particular grouping column. In other words, did DB2 retrieve and use a value from the table, or did DB2 set a NULL value to represent that no grouping value is present in the query result? As part of the query s list, the aggregate function returns a small integer data type, which is set to either 0 or 1. The value 1 is used when DB2 sets the referenced grouping column to NULL because the row was produced from a grouping set or super group operation. The value 0 is used when the referenced grouping column contains an actual group by value from the underlying data set. DiscCode, GROUPING(OrderYear) AS OrderYearGroup, GROUPING(DiscCode) AS DiscCodeGroup, ROLLUP ( DiscCode); Recall that the ROLLUP operation produces the following grouping sets: ( DiscCode) (OrderYear) = DiscCode value set to NULL by DB2 () = OrderYear and DiscCode value set to NULL by DB2 Given that columns OrderYear and DiscCode are set to NULL as part of the ROLLUP super grouping operation, the GROUPING aggregate function is used on OrderYear and DiscCode to tell the application whether the respective column values are part of the underlying data set. If DB2 sets OrderYear to NULL, the value of column OrderYearGroup is set to 1. If DB2 sets DiscCode to NULL, the value of column DiscCodeGroup is set to 1. In all other situations, 0 is returned in these columns. The GROUPING function is particularly useful when the grouping column is NULL capable and the application needs to determine whether the NULL value in the result set is part of a super group or the source data. For example, the column DiscCode is defined as NULL capable, so there may be an actual group with a legitimate NULL value (Figure 6). The rows highlighted in blue in Figure 6 are the super groups. These rows have DiscCode and/or OrderYear set to NULL by DB2. The other rows represent groups from the underlying data set. Provide Order While you review the examples, you may notice that the order of the rows in the result sets is random and unpredictable. SQL does not guarantee a particular order or consistency of ordering unless you specify the ORDER BY clause in the query. Without an ORDER BY clause, even the same query (against the same data) with the same result set might produce a different result order. Given that the new grouping operations can return NULL for grouping column values, remember that the default behavior of DB2 is to order the NULL values last. In other words, the NULL value is higher than all other values. WHERE ORDER BY DiscCode, GROUPING(OrderYear) AS OrderYearGroup GROUPING(DiscCode) AS DiscCodeGroup DiscCode IS NOT NULL ROLLUP ( DiscCode) OrderYear ASC, DiscCode ASC; If the application expects or assumes a particular order of rows from the query, you need to specify the ORDER BY clause, especially when you use the GROUPING SETS, CUBE, and ROLLUP functions. SQL Query Optimization As part of the ongoing DB2 for i5/os enhancement strategy, virtually all new features and functions are available only when you use the SQL Query Engine (SQE) to optimize and run the query request. The new V6R1 grouping functions are no exception. Queries optimized and executed by the original Classic Query Engine (CQE) cannot specify GROUPING SETS, CUBE, or ROLLUP operations. The good news is that almost all SQL requests can now take advantage of SQE s benefits when running on V6R1. DB2 for i5/os continues to use sophisticated, and in many cases IBM-patented, technology when it optimizes and runs queries with the new grouping functions. Because one of the primary goals of the DB2 Query Optimizer is to minimize I/O and build the most efficient plan, query rewrite occurs for GROUPING SETS, ROLLUP, and CUBE operations. This rewrite capability lets the integrated version of DB2 perform multiple sets of aggregations with a single pass of the data, convert CUBE to a series of ROLLUPs to take advantage of the single pass process, and combine disparate GROUPING SETs and ROLLUPs into ROLLUPs with depth control. 32 ProVIP System inews march 2008 SystemiNetwork.com

The query rewrite capability also includes support for materialized query tables (MQTs). This lets DB2 rewrite a super grouping query in order to use a properly created and populated MQT, which further enhances performance and efficiency. SQL Query Tuning To help application developers and SQL performance analysts, the DB2 for i5/os OnDemand Performance Center tools have been enhanced to recognize and support the new grouping functions. The SQL Plan Cache Snapshots, SQL Performance Monitor, and state-of-the-art Visual Explain report on and illustrate the grouping query plans. As with most queries, a proper indexing strategy gives DB2 statistics and more options to optimize and implement an SQL request that contains GROUPING SETS, CUBE, and ROLLUP. Although DB2 will automatically provide advice, and possibly even create the proper temporary index, the best practice is to provide a permanent index before you run in production. Typically, it s best to create an index with key columns that cover the local selection equality predicates (e.g., WHERE MyColumn = 'ABC') and the grouping column(s). This basic strategy also applies to grouping sets and super groups. DiscCode, WHERE CustomerNo = 112358 ROLLUP ( DiscCode) ORDER BY DiscCode; CREATE INDEX Index ON (CustomerNo, DiscCode); DB2 for i5/os has also been enhanced to automatically provide index advice for these new types of grouping queries. Upgrade to the Latest and Greatest With the powerful new grouping and OLAP functionality, V6R1 brings yet another level of data-centric capabilities to the application developer. It s time to do more with SQL. It s time to let DB2 for i5/os do more for you. Mike Cain is a senior technical staff member with IBM, where he focuses on researching, quantifying, and demonstrating i5/os support for very large database, business intelligence, and SQL query performance. You can e-mail Mike at mcain@ us.ibm.com. Up to Speed on the Essentials? Our expert-written Essential Guides cover Remote Journaling IFS Security Encryption Implementing RFID Global Data Synchronization WDSc for PDM/SEU Users Backup and Recovery Connecting Excel and the iseries Stored Procedures Thwarting Hackers SCM XML High Availability Free-Format RPG Check them out at SystemiNetwork.com /essentialguides Your byline could be here! Help your fellow System i professionals AND earn money doing it. Our most valuable articles, tips, and utilities come from technical professionals like you who write from their own experiences. Even if you've never written for publication before, your professional experience is a valuable resource. Support the System i community by sharing your field-tested techniques and practices in System inews. Our editors will work with you to polish your prose. Contact Vicki Hamende (vhamende@ penton.com) for more information or to propose a topic idea. SystemiNetwork.com march 2008 System inews ProVIP 33