Why You Should Run TPC-DS: A Workload Analysis

Similar documents
Transaction Processing Performance Council (TPC) TPC Overview

35 Database benchmarking 25/10/17 12:11 AM. Database benchmarking

TPC-DI. The First Industry Benchmark for Data Integration

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

TPC-Energy Benchmark Development: Mike Nikolaiev, Chairman of the TPC-Energy Specification Committee

Transaction Processing Performance Council. Past, Present, Future

New TPC Benchmarks for Decision Support and Web Commerce

Data Warehouse Tuning. Without SQL Modification

Oracle 1Z0-515 Exam Questions & Answers

Performance evaluation and. INF5100 Autumn 2007 Jarle Søberg

Column-Stores vs. Row-Stores: How Different Are They Really?

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture

1Z0-526

I/O Characterization of Commercial Workloads

Optimizing Data Transformation with Db2 for z/os and Db2 Analytics Accelerator

From BigBench to TPCx-BB: Standardization of a Big Data Benchmark

Performance evaluation and benchmarking of DBMSs. INF5100 Autumn 2009 Jarle Søberg

Oracle Database 11g: Data Warehousing Fundamentals

VERITAS Storage Foundation 4.0 for Oracle

Oracle Database 11g: Administer a Data Warehouse

TPCX-BB (BigBench) Big Data Analytics Benchmark

Fundamentals of Information Systems, Seventh Edition

Parallel Query Optimisation

BI, Big Data, Mission Critical. Eduardo Rivadeneira Specialist Sales Manager

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

Data Warehousing and Decision Support

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

Benchmark TPC-H 100.

Ahmad Ghazal, Minqing Hu, Tilmann Rabl, Alain Crolotte, Francois Raab, Meikel Poess, Hans-Arno Jacobsen

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A

Oracle Database 11g: SQL Tuning Workshop

Data Warehouse and Data Mining

Data Warehousing 11g Essentials

Data Modeling and Databases Ch 7: Schemas. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich

On-Disk Bitmap Index Performance in Bizgres 0.9

One Size Fits All: An Idea Whose Time Has Come and Gone

Course Contents: 1 Datastage Online Training

CHAPTER 3 Implementation of Data warehouse in Data Mining

Data Warehousing and Decision Support

Syllabus. Syllabus. Motivation Decision Support. Syllabus

Practical Database Design Methodology and Use of UML Diagrams Design & Analysis of Database Systems

Achieving Horizontal Scalability. Alain Houf Sales Engineer

Information Management course

SAP IQ Software16, Edge Edition. The Affordable High Performance Analytical Database Engine

HP ProLiant delivers #1 overall TPC-C price/performance result with the ML350 G6

Columnstore and B+ tree. Are Hybrid Physical. Designs Important?

An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

Transaction Performance vs. Moore s Law

Oracle GoldenGate and Oracle Streams: The Future of Oracle Replication and Data Integration

Evolution of Database Systems

Data Warehouse and Data Mining

IT Best Practices Audit TCS offers a wide range of IT Best Practices Audit content covering 15 subjects and over 2200 topics, including:

April Copyright 2013 Cloudera Inc. All rights reserved.

In-Memory Data Management Jens Krueger

Oracle s JD Edwards EnterpriseOne IBM POWER7 performance characterization

Scaling To Infinity: Making Star Transformations Sing. Thursday 15-November 2012 Tim Gorman

Real-World Performance Training Star Query Edge Conditions and Extreme Performance

(Extended) Entity Relationship

SAP NetWeaver BW Performance on IBM i: Comparing SAP BW Aggregates, IBM i DB2 MQTs and SAP BW Accelerator

Oracle In-Memory & Data Warehouse: The Perfect Combination?

Data Warehouse and Data Mining

QLogic 16Gb Gen 5 Fibre Channel for Database and Business Analytics

Big Data solution benchmark

Oracle Hyperion Profitability and Cost Management

6+ years of experience in IT Industry, in analysis, design & development of data warehouses using traditional BI and self-service BI.

Low Overhead Concurrency Control for Partitioned Main Memory Databases. Evan P. C. Jones Daniel J. Abadi Samuel Madden"

SQL Server 2014 Highlights der wichtigsten Neuerungen In-Memory OLTP (Hekaton)

HANA Performance. Efficient Speed and Scale-out for Real-time BI

Advanced Data Management Technologies Written Exam

Capturing Your Changed Data

Oracle Performance on M5000 with F20 Flash Cache. Benchmark Report September 2011

Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration

QLogic/Lenovo 16Gb Gen 5 Fibre Channel for Database and Business Analytics

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

E(xtract) T(ransform) L(oad)

Partner Presentation Faster and Smarter Data Warehouses with Oracle OLAP 11g

COLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE)

IBM DB2 Analytics Accelerator

TPC Benchmark H Full Disclosure Report. ThinkServer RD630 VectorWise RedHat Enterprise Linux 6.4

Copyright 2015, Oracle and/or its affiliates. All rights reserved.

Multi-Vendor, Un-integrated

EZY Intellect Pte. Ltd., #1 Changi North Street 1, Singapore

Microsoft SQL Server Training Course Catalogue. Learning Solutions

Column Stores vs. Row Stores How Different Are They Really?

Cisco Systems, Inc. TPC Benchmark H Full Disclosure Report

Data Warehousing Conclusion. Esteban Zimányi Slides by Toon Calders

The mixed workload CH-BenCHmark. Hybrid y OLTP&OLAP Database Systems Real-Time Business Intelligence Analytical information at your fingertips

Relational Database Systems 2 1. System Architecture

Enterprise Data Warehousing

EMC VMAX 400K SPC-2 Proven Performance. Silverton Consulting, Inc. StorInt Briefing

Data sharing and transformation in real time. Stephan Leisse Solution Architect

Data Warehouse and Data Mining

Exadata Implementation Strategy

SQL/MX UPDATE STATISTICS Enhancements

Call: Datastage 8.5 Course Content:35-40hours Course Outline

IBM WEB Sphere Datastage and Quality Stage Version 8.5. Step-3 Process of ETL (Extraction,

D B M G Data Base and Data Mining Group of Politecnico di Torino

Answer: A Reference: df(page 1, first para)

What is Real Application Testing?

Oracle Database 10g: Implement and Administer a Data Warehouse

Transcription:

Why You Should Run TPC-DS: A Workload Analysis Meikel Poess Oracle USA Raghunath Othayoth Nambiar Hewlett-Packard Company Dave Walrath Sybase Inc (in absentia)

Agenda Transaction Processing Performance Council (TPC) Scope of TPC-DS benchmark TPC-DS Design Considerations TPC-DS Workload Analysis TPC-DS Metric Analysis Q&A 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 2

Transaction Processing Performance Council The TPC defines transaction processing and database benchmarks and delivers trusted results to the industry. Most credible, system-level benchmark evaluation test for the server industry Fulfilling the role of a Consumer Reports for the computing industry Scores are the most requested server benchmarks in server RFPs Active benchmarks TPC-C: Online transaction processing TPC-H: Data Warehouse for ad hoc queries TPC-App: Application server and web services TPC-E: Online transaction processing (new) Benchmarks under development TPC-DS: Decision Support 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 3

TPC Membership 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 4

What makes the TPC unique TPC is the only benchmark organization that requires priceperformance scores across all of its benchmarks All tests require full documentation of the components and applications under test, so that the test can be replicated The TPC requires an independent audit of results prior to publication Extensive oversight via fair use policies TPC tests the whole system performance, not just a piece TPC is database agnostic: Oracle, IBM DB2, Sybase, Microsoft SQL Server, NonStop SQL/MX and other databases TPC provides cross-platform performance comparisons, a view of processor versus real performance, technology comparisons and actual cost of performance comparisons 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 5

Objectives of TPC Benchmarks System and database vendors Competitive analysis Release to release progress Technology development Customers Cross vendor/architecture performance comparison Cross vendor/architecture TCO comparison Evaluate new technologies Eliminate investment in in-house characterization Research community A standard yet customizable workload 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 6

TPC s DW/DSS Benchmark History TPC-D - Data Warehouse (1995-1999) TPC-R - Data Warehouse for reporting queries (99-04) TPC-H - Data Warehouse for ad hoc queries (99- current) TPC-DS - Decision Support (target 2008) Latest status and specification http://www.tpc.org/tpcds/default.asp Series of Presentations TPC-DS, Taking Decision Support Benchmarking to the Next Level, SIGMOD 2002 The Making of TPC-DS, VLDB 2006 Why You Should Run TPC-DS: Workload Analysis, VLDB 2007 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 7

Scope of TPC DS Benchmark Measures generally applicable aspects of a Decision Support System Examine large volume of data Give answers to real-world business questions Execute queries of various operational requirements Generate intense activity against the database server component of a system (IO, memory, CPU, Interconnect) Remain closely synchronized with source OLTP database through a periodic database maintenance function Provides the industry An objective means of comparing the performance of decision support systems the price-performance of decision support systems A standard yet customizable workload Gartner Inc. showed business intelligence (BI) as a top priority for CIOs. http://www.gartner.com/2_events/conferences/bie7i.jsp 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 8

Hardware TPC Decision Support Benchmark Representative Large Query Set Non-Uniform Data Set Ad-Hoc and Reporting Queries Database Intensive Load on CPU Disk IO (Read and Write) Memory Metric reflects all above Architectural Neutral Sometimes in conflict Enable Serve TPC-DS Enable Exercise Query Optimizer Data Storage Methods Data Access Methods Join Methods Sort/GB Algorithms Complex ADS Simple Metric System Realism Application Realism Benchmark Consumer 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 9

Benchmark Execution: Bird s Eye View System Setup Database Setup Database Load Query Run #1 ETL #1 Query Run #2 + + ETL #2 Un-timed Timed Setup Creation Database Query of: Run of: Load #1 Runs Servers/ Data n Maintenance-ETL concurrent Operating users System #2 #1 each Load System of raw tables data Storage users Arrays executes including 99 queries Load into fact tables RAID Creation Table of spaces auxiliary data Networks Delete from fact tables structures File Groups Database Maintain Statistics Software slowly changing dimensions Log files gathering Flat files (optional) Repeat of Query Run #1 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 10

Hardware Vendor Requirements CPU Disk IO Read and Write IO Balanced Query Mix Memory Access Architectural Neutral Metric reflects all of the above Implementation CPU bound queries IO bound queries ETL Large query set/concurrent user Large hash/joins, sorts, GB ANSI SQL, wide industrial representation in TPC Metric includes Load, Query and ETL performance 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 11

Requirements Query Optimizer Join Operations Sort/GB Operations Complex ADS Data Storage Techniques Data Access Patterns Database Vendor Implementation Rich Query Set: - star transformation and traditional large join operations Rich Data Set: - NULLs +non-uniform distributions Multiple Snowflake Schemas: - Nested Loops - Hash Joins - Bitmap Joins Sort/GB on large data sets ADS are allowed on a subset of the schema Physical Partitioning /Clustering/Compression Query Set allows for large sequential scans and random IOs 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 12

Query Run The query run tests the system s ability to execute the most number of queries in the least amount of time (multi user test) Queries can be categorized by: Query Class Ad Hoc Reporting Iterative Data Mining Queries Schema Coverage Resource Utilization SQL Features 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 13

Query Categorization by Resource Utilization 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 14

CPU Intensive Query (Query 70) SELECT sum(ss_net_profit) as total_sum. s_state,s_county,grouping(s_state)+grouping(s_county),rank()over(partition by grouping(s_state) +grouping(s_county),case when grouping(s_county)=0 then s_state end order by sum(ss_net_profit) desc) FROM store_sales,date_dim,store WHERE d_year = [YEAR] AND d_date_sk = ss_sold_date_sk AND s_store_sk = ss_store_sk AND s_state in (SELECT s_state FROM (SELECT s_state,rank()over(partition by s_state order by sum(ss_net_profit)desc) as r FROM store_sales,store,date_dim WHERE d_year =[YEAR] AND d_date_sk = ss_sold_date_sk AND s_store_sk = ss_store_sk GROUP BY s_state) WHERE r <= 5) GROUP BY ROLLUP(s_state,s_county) ORDER BY lochierarchy desc,case WHEN lochierarchy = 0 THEN s_state END,rank_within_parent; 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 15

IO Intensive Query (82) SELECT i_item_id,i_item_desc,i_current_price FROM item, inventory,date_dim,store_sales WHERE i_current_price between [P] and [P] + 30 AND inv_item_sk = i_item_sk AND d_date_sk=inv_date_sk AND d_date between cast('[date]' as date) AND (cast('[date]' as date)+60) AND i_manufact_id IN ([ID.1],[ID.2],[ID.3]) AND inv_quantity_on_hand between 100 and 500 AND ss_item_sk = i_item_sk GROUP BY i_item_id,i_item_desc,i_current_price ORDER BY i_item_id; 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 16

Data Maintenance Functions Are defined as pseudo code Can be implemented in SQL, or programming SQL Need to guarantee referential integrity Maintain slowly changing dimensions Insert and delete fact tables 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 17

Updates/Inserts/Deletes (non history keeping) Dimension Dimension entries are identified by their business key and all changed fields are updated. 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 18

Updates/Inserts/Deletes (history keeping) Dimension Dimension entries are identified by their business key and end_rec_date=null Then rows is updated by setting end_rec_date to new date New row is inserted 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 19

Deletes/Inserts Fact Tables Assuming partitioning by sales date, sales are deleted by date range Fact Table Sales Fact Table Returns Return rows might be scattered 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 20

Primary Performance Metric Queries per Hour QphDS @ SF = 99*2* S *3600* SF ( ) T + T + 0.01* S* T TT1 TT 2 Load S: Number of query streams SF: Scale Factor T TT1 and T TT2 : elapsed times to complete query run #1 and #2 T LOAD is the total elapsed time to complete the database load 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 21

Metric Analysis: Use of Materialization 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 22

More Information Specification Dbgen2 Qgen2 Query templates http://www.tpc.org/tpcds/default.asp 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 23

Q & A 33nd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria 24