TPC-H Benchmark Set

TPC-H Benchmark

TPC-H is an ad-hoc decision support benchmark. Some of its queries are available in the current Tajo. You can download the TPC-H data generator here.

DDL for TPC-H datasets

The TPC-H benchmark provides eight tables. The DDL statements below define them.

create external table supplier (
  S_SUPPKEY bigint,
  S_NAME text,
  S_ADDRESS text,
  S_NATIONKEY bigint,
  S_PHONE text,
  S_ACCTBAL double,
  S_COMMENT text)

create external table lineitem (
  L_ORDERKEY bigint,
  L_PARTKEY bigint,
  L_SUPPKEY bigint,
  L_LINENUMBER bigint,
  L_QUANTITY double,
  L_EXTENDEDPRICE double,
  L_DISCOUNT double,
  L_TAX double,
  L_RETURNFLAG text,
  L_LINESTATUS text,
  L_SHIPDATE date,
  L_COMMITDATE date,
  L_RECEIPTDATE date,
  L_SHIPINSTRUCT text,
  L_SHIPMODE text,
  L_COMMENT text)

create external table part (
  P_PARTKEY bigint,
  P_NAME text,
  P_MFGR text,
  P_BRAND text,
  P_TYPE text,
  P_SIZE integer,
  P_CONTAINER text,
  P_RETAILPRICE double,
  P_COMMENT text)

create external table partsupp (
  PS_PARTKEY bigint,
  PS_SUPPKEY bigint,
  PS_AVAILQTY int,
  PS_SUPPLYCOST double,
  PS_COMMENT text)

create external table customer (
  C_CUSTKEY bigint,
  C_NAME text,
  C_ADDRESS text,
  C_NATIONKEY bigint,
  C_PHONE text,
  C_ACCTBAL double,
  C_MKTSEGMENT text,
  C_COMMENT text)

create external table orders (
  O_ORDERKEY bigint,
  O_CUSTKEY bigint,
  O_ORDERSTATUS text,
  O_TOTALPRICE double,
  O_ORDERDATE date,
  O_ORDERPRIORITY text,
  O_CLERK text,
  O_SHIPPRIORITY int,
  O_COMMENT text)

create external table nation (
  N_NATIONKEY bigint,
  N_NAME text,
  N_REGIONKEY bigint,
  N_COMMENT text)

create external table region (
  R_REGIONKEY bigint,
  R_NAME text,
  R_COMMENT text)

TPC-H Queries

Q1

select
  l_returnflag,
  l_linestatus,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price,
  sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
  sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
  avg(l_quantity) as avg_qty,
  avg(l_extendedprice) as avg_price,
  avg(l_discount) as avg_disc,
  count(*) as count_order
from
  lineitem
where
  l_shipdate <= '1998-09-01'::date
group by
  l_returnflag, l_linestatus
order by
  l_returnflag, l_linestatus;

Q2

Tajo does not support scalar subqueries yet, so Q2 has to be split into multiple queries as follows:

create table nation_region as
select
  n_regionkey, r_regionkey, n_nationkey, n_name, r_name
from
  region join nation on n_regionkey = r_regionkey
where
  r_name = 'EUROPE';

create table r2_1 as
select
  s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment, ps_supplycost
from
  nation_region
  join supplier on s_nationkey = n_nationkey
  join partsupp on s_suppkey = ps_suppkey
  join part on p_partkey = ps_partkey
where
  p_size = 15 and p_type like '%BRASS';

create table r2_2 as
select
  p_partkey, min(ps_supplycost) as min_ps_supplycost
from
  r2_1
group by
  p_partkey;

select
  s_acctbal, s_name, n_name, r2_1.p_partkey, p_mfgr, s_address, s_phone, s_comment
from
  r2_1 join r2_2 on r2_1.p_partkey = r2_2.p_partkey
where
  ps_supplycost = min_ps_supplycost
order by
  s_acctbal, n_name, s_name, r2_1.p_partkey;

Q3

select
  l_orderkey,
  sum(l_extendedprice*(1-l_discount)) as revenue,
  o_orderdate,
  o_shippriority
from
  customer as c
  join orders as o on c.c_mktsegment = 'BUILDING' and c.c_custkey = o.o_custkey
  join lineitem as l on l.l_orderkey = o.o_orderkey
where
  o_orderdate < '1995-03-15'::date
  and l_shipdate > '1995-03-15'::date
group by
  l_orderkey, o_orderdate, o_shippriority
order by
  revenue desc, o_orderdate;

Q5

select
  n_name,
  sum(l_extendedprice * (1 - l_discount)) as revenue
from
  customer, orders, lineitem, supplier, nation, region
where
  c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and l_suppkey = s_suppkey
  and c_nationkey = s_nationkey
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'ASIA'
  and o_orderdate >= '1994-01-01'::date
  and o_orderdate < '1995-01-01'::date
group by
  n_name
order by
  revenue desc;

Q6

select
  sum(l_extendedprice*l_discount) as revenue
from
  lineitem
where
  l_shipdate >= '1994-01-01'::date
  and l_shipdate < '1995-01-01'::date
  and l_discount >= 0.05
  and l_discount <= 0.07
  and l_quantity < 24;

Q10

select
  c_custkey,
  c_name,
  sum(l_extendedprice * (1 - l_discount)) as revenue,
  c_acctbal,
  n_name,
  c_address,
  c_phone,
  c_comment
from
  customer as c
  join nation as n on c.c_nationkey = n.n_nationkey
  join orders as o on c.c_custkey = o.o_custkey
    and o.o_orderdate >= '1993-10-01'::date
    and o.o_orderdate < '1994-01-01'::date
  join lineitem as l on l.l_orderkey = o.o_orderkey and l.l_returnflag = 'R'
group by
  c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
order by
  revenue desc;

Q12

select
  l_shipmode,
  sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count,
  sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count
from
  orders, lineitem
where
  o_orderkey = l_orderkey
  and (l_shipmode = 'MAIL' or l_shipmode = 'SHIP')
  and l_commitdate < l_receiptdate
  and l_shipdate < l_commitdate
  and l_receiptdate >= '1994-01-01'::date
  and l_receiptdate < '1995-01-01'::date
group by
  l_shipmode
order by
  l_shipmode;

Q14

select
  100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice*(1-l_discount) else 0 end)
    / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
  lineitem, part
where
  l_partkey = p_partkey
  and l_shipdate >= '1995-09-01'::date
  and l_shipdate < '1995-10-01'::date;

Q19

select
  sum(l_extendedprice * (1 - l_discount)) as revenue
from
  lineitem, part
where
  (
    p_partkey = l_partkey
    and p_brand = 'Brand#12'
    and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
    and l_quantity >= 1 and l_quantity <= 1 + 10
    and p_size between 1 and 5
    and l_shipmode in ('AIR', 'AIR REG')
    and l_shipinstruct = 'DELIVER IN PERSON'
  )
  or
  (
    p_partkey = l_partkey
    and p_brand = 'Brand#23'
    and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
    and l_quantity >= 10 and l_quantity <= 10 + 10
    and p_size between 1 and 10
    and l_shipmode in ('AIR', 'AIR REG')
    and l_shipinstruct = 'DELIVER IN PERSON'
  )
  or
  (
    p_partkey = l_partkey
    and p_brand = 'Brand#34'
    and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
    and l_quantity >= 20 and l_quantity <= 20 + 10
    and p_size between 1 and 15
    and l_shipmode in ('AIR', 'AIR REG')
    and l_shipinstruct = 'DELIVER IN PERSON'
  );
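
The multi-step rewrite used for Q2 can be checked on a small data set. The sketch below is only an illustration, not part of Tajo: it uses Python's built-in sqlite3 module as a stand-in engine, a trimmed-down schema holding just the columns needed, and made-up rows. It runs the three intermediate tables (nation_region, r2_1, r2_2) followed by the final join, then compares the result with a scalar-subquery form of the same query, which SQLite does support. The column subset, the sample rows, and the subquery aliases (ps2, s2, n2, r2) are assumptions introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy schema: only the columns this sketch needs (an assumption, not the full TPC-H DDL).
cur.executescript("""
create table region (r_regionkey integer, r_name text);
create table nation (n_nationkey integer, n_name text, n_regionkey integer);
create table supplier (s_suppkey integer, s_name text, s_nationkey integer, s_acctbal real);
create table part (p_partkey integer, p_mfgr text, p_type text, p_size integer);
create table partsupp (ps_partkey integer, ps_suppkey integer, ps_supplycost real);

insert into region values (1, 'EUROPE'), (2, 'ASIA');
insert into nation values (10, 'FRANCE', 1), (11, 'GERMANY', 1), (20, 'JAPAN', 2);
insert into supplier values
  (100, 'Supplier#100', 10, 500.0),
  (101, 'Supplier#101', 11, 900.0),
  (102, 'Supplier#102', 20, 700.0);
insert into part values
  (1, 'Manufacturer#1', 'LARGE POLISHED BRASS', 15),
  (2, 'Manufacturer#2', 'SMALL PLATED STEEL', 15);
insert into partsupp values (1, 100, 30.0), (1, 101, 20.0), (1, 102, 5.0), (2, 100, 10.0);
""")

# Steps 1-3: materialize the intermediate tables, as in the Q2 rewrite.
cur.executescript("""
create table nation_region as
select n_regionkey, r_regionkey, n_nationkey, n_name, r_name
from region join nation on n_regionkey = r_regionkey
where r_name = 'EUROPE';

create table r2_1 as
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, ps_supplycost
from nation_region
  join supplier on s_nationkey = n_nationkey
  join partsupp on s_suppkey = ps_suppkey
  join part on p_partkey = ps_partkey
where p_size = 15 and p_type like '%BRASS';

create table r2_2 as
select p_partkey, min(ps_supplycost) as min_ps_supplycost
from r2_1
group by p_partkey;
""")

# Step 4: the final join replaces the correlated scalar subquery.
decomposed = cur.execute("""
select s_acctbal, s_name, n_name, r2_1.p_partkey, p_mfgr
from r2_1 join r2_2 on r2_1.p_partkey = r2_2.p_partkey
where ps_supplycost = min_ps_supplycost
order by s_acctbal, n_name, s_name, r2_1.p_partkey
""").fetchall()

# Reference: the same question asked with a scalar subquery.
reference = cur.execute("""
select s_acctbal, s_name, n_name, p_partkey, p_mfgr
from part, supplier, partsupp, nation, region
where p_partkey = ps_partkey and s_suppkey = ps_suppkey
  and p_size = 15 and p_type like '%BRASS'
  and s_nationkey = n_nationkey and n_regionkey = r_regionkey
  and r_name = 'EUROPE'
  and ps_supplycost = (
    select min(ps2.ps_supplycost)
    from partsupp ps2, supplier s2, nation n2, region r2
    where ps2.ps_partkey = part.p_partkey
      and ps2.ps_suppkey = s2.s_suppkey
      and s2.s_nationkey = n2.n_nationkey
      and n2.n_regionkey = r2.r_regionkey
      and r2.r_name = 'EUROPE')
order by s_acctbal, n_name, s_name, p_partkey
""").fetchall()

print(decomposed)
print(decomposed == reference)
```

In the sample rows, the cheapest supplier of part 1 overall is in ASIA, so the check also confirms that the minimum is taken only over EUROPE suppliers, which is what r2_2 computes from r2_1.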