Histogram Support in MySQL 8.0

Similar documents
Towards Comprehensive Testing Tools

When and How to Take Advantage of New Optimizer Features in MySQL 5.6. Øystein Grøvlen Senior Principal Software Engineer, MySQL Oracle

MySQL 8.0 What s New in the Optimizer

Jayant Haritsa. Database Systems Lab Indian Institute of Science Bangalore, India

Anorexic Plan Diagrams

Robust Optimization of Database Queries

Technical Report - Distributed Database Victor FERNANDES - Université de Strasbourg /2000 TECHNICAL REPORT

TPC-H Benchmark Set. TPC-H Benchmark. DDL for TPC-H datasets

High Volume In-Memory Data Unification

How to Analyze and Tune MySQL Queries for Better Performance

Avoiding Sorting and Grouping In Processing Queries

On-Disk Bitmap Index Performance in Bizgres 0.9

Optimizer Standof. MySQL 5.6 vs MariaDB 5.5. Peter Zaitsev, Ovais Tariq Percona Inc April 18, 2012

How to Analyze and Tune MySQL Queries for Better Performance

Oracle. Professional. WITH Function-Based Indexes (FBIs), I was able to alter an execution. Avoid Costly Joins with FBIs Pedro Bizarro.

Orri Erling (Program Manager, OpenLink Virtuoso), Ivan Mikhailov (Lead Developer, OpenLink Virtuoso).

MySQL 8.0: Common Table Expressions

How to Analyze and Tune MySQL Queries for Better Performance

Comparison of Database Cloud Services

A Compression Framework for Query Results

How to Analyze and Tune MySQL Queries for Better Performance

Comparison of Database Cloud Services

MOIRA A Goal-Oriented Incremental Machine Learning Approach to Dynamic Resource Cost Estimation in Distributed Stream Processing Systems

Query Optimizer Plan Diagrams: Production, Reduction and Applications

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 12

Developing a Dynamic Mapping to Manage Metadata Changes in Relational Sources

Beyond EXPLAIN. Query Optimization From Theory To Code. Yuto Hayamizu Ryoji Kawamichi. 2016/5/20 PGCon Ottawa

Press Release Writing Tips and Tricks for the Enterprise Technology Space

Tuning Relational Systems I

Schema Tuning. Tuning Schemas : Overview

Safe Harbor Statement

6.830 Problem Set 2 (2017)

Combining SQL and NoSQL with MySQL

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Oracle NoSQL Database at OOW 2017

MariaDB Optimizer. Current state, comparison with other branches, development plans

Oracle Exam 1z0-882 Oracle Certified Professional, MySQL 5.6 Developer Version: 7.0 [ Total Questions: 100 ]

MySQL InnoDB Cluster. MySQL HA Made Easy! Miguel Araújo Senior Software Developer MySQL Middleware and Clients. FOSDEM 18 - February 04, 2018

More SQL in MySQL 8.0

What's New in MySQL 5.7?

CSC317/MCS9317. Database Performance Tuning. Class test

Creating and Working with JSON in Oracle Database

Midterm Review. March 27, 2017

MySQL Query Tuning 101. Sveta Smirnova, Alexander Rubin April, 16, 2015

Infrastructure at your Service. In-Memory-Pläne für den 12.2-Optimizer: Teuer oder billig?

Safe Harbor Statement

Copyright 2017, Oracle and/or its aff iliates. All rights reserved.

Lazy Maintenance of Materialized Views

Parallelism Strategies In The DB2 Optimizer

<Insert Picture Here> Looking at Performance - What s new in MySQL Workbench 6.2

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

On-Disk Bitmap Index In Bizgres

TPC BENCHMARK TM H (Decision Support) Standard Specification Revision

<Insert Picture Here> DBA s New Best Friend: Advanced SQL Tuning Features of Oracle Database 11g

Oracle 11g Partitioning new features and ILM

Mix n Match Async and Group Replication for Advanced Replication Setups. Pedro Gomes Software Engineer

Two-Phase Optimization for Selecting Materialized Views in a Data Warehouse

Unit 1 - Chapter 4,5

MySQL JSON. Morgan Tocker MySQL Product Manager. Copyright 2015 Oracle and/or its affiliates. All rights reserved.

Efficient in-memory query execution using JIT compiling. Han-Gyu Park

Optimizing Queries Using Materialized Views

COUNT Function. The COUNT function returns the number of rows in a query.

#MySQL #oow16. MySQL Server 8.0. Geir Høydalsvik

SQL Gone Wild: Taming Bad SQL the Easy Way (or the Hard Way) Sergey Koltakov Product Manager, Database Manageability

Real-World Performance Training Star Query Prescription

Multiple query optimization in middleware using query teamwork

MySQL 5.7: What s New in the Optimizer?

TPC BENCHMARK TM H (Decision Support) Standard Specification Revision 2.8.0

Exam Questions 1z0-882

Develop Python Applications with MySQL Connector/Python DEV5957

Advanced Query Optimization

State of the Dolphin Developing new Apps in MySQL 8

Querying Data with Transact SQL

Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Best Practices for Performance

Chapter 9. Cardinality Estimation. How Many Rows Does a Query Yield? Architecture and Implementation of Database Systems Winter 2010/11

COMPUTING SCIENCE. A P2P Database Server Based on BitTorrent. John Colquhoun and Paul Watson TECHNICAL REPORT SERIES

Insider s Guide on Using ADO with Database In-Memory & Storage-Based Tiering. Andy Rivenes Gregg Christman Oracle Product Management 16 November 2016

Self-Tuning Database Systems: The AutoAdmin Experience

Cost Models. the query database statistics description of computational resources, e.g.

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Databases (MariaDB/MySQL) CS401, Fall 2015

1 Copyright 2013, Oracle and/or its affiliates. All rights reserved.

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

Fusion Product Hub Training Data Governance: Business Rules and Impact Analysis. July 2014

Materialized Views. March 28, 2018

OpenJDK Adoption Group

Oracle Optimizer: What s New in Oracle Database 12c? Maria Colgan Master Product Manager

Oracle Secure Backup 12.2 What s New. Copyright 2018, Oracle and/or its affiliates. All rights reserved.

1/3/2015. Column-Store: An Overview. Row-Store vs Column-Store. Column-Store Optimizations. Compression Compress values per column

Oracle Database In-Memory By Example

<Insert Picture Here> Upcoming Changes in MySQL 5.7 Morgan Tocker, MySQL Community Manager

CPI Phoenix IQ-201 using EXASolution 2.0

CPI Phoenix IQ-201 using EXASolution 2.0

MySQL Group Replication in a nutshell

NoSQL + SQL = MySQL. Nicolas De Rico Principal Solutions Architect

Modern Development With MySQL

Materialized Views. March 26, 2018

<Insert Picture Here> Boosting performance with MySQL partitions

<Insert Picture Here> MySQL Cluster What are we working on

Transcription:

Histogram Support in MySQL 8.0 Øystein Grøvlen Senior Principal Software Engineer MySQL Optimizer Team, Oracle February 2018

Program Agenda 1 2 3 4 5 Motivating example Quick start guide How are histograms used? Query example Some advice 3

Program Agenda 1 2 3 4 5 Motivating example Quick start guide How are histograms used? Query example Some advice 4

Motivating Example JOIN Query EXPLAIN SELECT * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000; id select type table type possible keys key 1 SIMPLE orders ALL 1 SIMPLE customer eq_ ref i_o_orderdate, i_o_custkey key len PRIMARY PRIMARY 4 ref rows filtered extra NULL NULL NULL 15000000 31.19 dbt3.orders. o_custkey 1 33.33 Using where Using where 5

Motivating Example Reverse join order EXPLAIN SELECT /*+ JOIN_ORDER(customer, orders) */ * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000; id select type table type possible keys key key len ref rows filtered extra 1 SIMPLE customer ALL PRIMARY NULL NULL NULL 1500000 33.33 1 SIMPLE orders ref i_o_orderdate, i_o_custkey i_o_custkey 5 dbt3. customer. c_custkey 15 31.19 Using where Using where 6

Query Execution Time (seconds) Comparing Join Order Performance 16 14 12 10 8 6 4 2 0 orders customer customer orders

Histograms Create histogram to get a better plan ANALYZE TABLE customer UPDATE HISTOGRAM ON c_acctbal WITH 1024 BUCKETS; EXPLAIN SELECT * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000; id select type table type possible keys key key len ref rows filtered extra 1 SIMPLE customer ALL PRIMARY NULL NULL NULL 1500000 0.00 1 SIMPLE orders ref i_o_orderdate, i_o_custkey i_o_custkey 5 dbt3. customer. c_custkey 15 31.19 Using where Using where 8

Program Agenda 1 2 3 4 5 Motivating example Quick start guide How are histograms used? Query example Some advice 9

Histograms Column statistics Information about value distribution for a column Data values group in buckets Frequency calculated for each bucket Maximum 1024 buckets May use sampling to build histogram Sample rate depends on available memory Automatically chooses between two histogram types: Singleton: One value per bucket Equi-height: Multiple values per bucket 10

Frequency Singleton Histogram One value per bucket Each bucket stores: Value Cumulative frequency Well suited to estimate both equality and range predicates 0,25 0,2 0,15 0,1 0,05 0 0 1 2 3 5 6 7 8 9 10

Frequency Equi-Height Histogram Multiple values per bucket Not quite equi-height Values are not split across buckets Frequent values in separate buckets Each bucket stores: Minimum value Maximum value Cumulative frequency Number of distinct values Best suited for range predicates 0,35 0,3 0,25 0,2 0,15 0,1 0,05 0 0-0 1-1 2-3 5-6 7-10

Usage Create or refresh histogram(s) for column(s): ANALYZE TABLE table UPDATE HISTOGRAM ON column [, column] WITH n BUCKETS; Note: Will only update histogram, not other statistics Drop histogram: ANALYZE TABLE table DROP HISTOGRAM ON column [, column]; Based on entire table or sampling: Depends on avail. memory: histogram_generation_max_mem_size (default: 20 MB) New storage engine API for sampling Default implementation: Full table scan even when sampling Storage engines may implement more efficient sampling 13

Storage Stored in a JSON column in data dictionary Can be inspected in Information Schema table: SELECT JSON_PRETTY(histogram) FROM information_schema.column_statistics WHERE schema_name = 'dbt3_sf1' AND table_name ='lineitem' AND column_name = 'l_linenumber'; 14

Histogram content { "buckets": [[1, 0.24994938524948698], [2, 0.46421066400720523], [3, 0.6427401784471978], [4, 0.7855470933802572], [5, 0.8927398868395817], [6, 0.96423707532558], [7, 1] ], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2018-02-03 21:05:21.690872", "sampling-rate": 0.20829115437457252, "histogram-type": "singleton", "number-of-buckets-specified": 1024 } 15

Strings Max. 42 characters considered Base64 encoded SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist WHERE column_name = 'o_orderstatus'; +-------+--------------------+ value cumulfreq +-------+--------------------+ F 0.4862529264385756 O 0.974029654577566 P 0.9999999999999999 +-------+--------------------+ 16

Calculate Bucket Frequency Use window function SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq, c - LAG(c, 1, 0) over () freq FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist WHERE column_name = 'o_orderstatus'; +-------+--------------------+----------------------+ value cumulfreq freq +-------+--------------------+----------------------+ F 0.4862529264385756 0.4862529264385756 O 0.974029654577566 0.48777672813899037 P 0.9999999999999999 0.025970345422433927 +-------+--------------------+----------------------+ 17

Program Agenda 1 2 3 4 5 Motivating example Quick start guide How are histograms used? Query example Some advice 18

Condition filter effect Ref access When are Histograms useful? Estimate cost of join t x JOIN t x+1 Number of records read from t x Records passing the table conditions on t x Cardinality statistics for index t x t x+1 records(t x+1 ) = records(t x ) * condition_filter_effect * records_per_key

How to Calculate Condition Filter Effect, MySQL 5.7 SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01'; Filter estimate based on what is available: 1. Range estimate 2. Index statistics 3. Guesstimate = 0.1 <=,<,>,>= 1/3 BETWEEN 1/9 NOT <op> 1 SEL(<op>) AND P(A and B) = P(A) * P(B) OR P(A or B) = P(A) + P(B) P(A and B)

How to Calculate Condition Filter Effect, MySQL 5.7 SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01'; Filter estimate based on what is available: 1. Range estimate 2. Index statistics 3. Histograms 4. Guesstimate = 0.1 <=,<,>,>= 1/3 BETWEEN 1/9 NOT <op> 1 SEL(<op>) AND P(A and B) = P(A) * P(B) OR P(A or B) = P(A) + P(B) P(A and B)

Calculating Condition Filter Effect for Tables Example without histograms 0.03 (index) SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01'; 0.29 (range) 0.1 (guesstimate) Condition filter effect for tables: office: 0.03 employee: 0.29 * 0.1 * 0.33 0.01 0.33 (guesstimate)

Calculating Condition Filter Effect for Tables Example with histogram 0.03 (index) SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01'; 0.29 (range) 0.1 (guesstimate) Condition filter effect for tables: office: 0.03 employee: 0.29 * 0.1 * 0.95 0.03 0.95 (histogram)

Frequency Computing Selectivity From Histogram Example 1 0,9 0,8 0,7 0,6 0,5 0,4 0,3 0,2 0,1 0 0-7 age <= 21 Selectivity = 0.203 + (0.306 0.203) * 5/8 = 0.267 age > 21 Selectivity = 1-0.267 = 0.733 0.203 8-16 0.306 17-24 25-31 32-38 age 39-46 47-53 54-61 62-70 71-104 Cumulative Frequency

Program Agenda 1 2 3 4 5 Motivating example Quick start guide How are histograms used? Query example Some advice 25

DBT-3 Query 7 Volume Shipping Query SELECT supp_nation, cust_nation, l_year, SUM(volume) AS revenue FROM (SELECT n1.n_name AS supp_nation, n2.n_name AS cust_nation, EXTRACT(YEAR FROM l_shipdate) AS l_year, l_extendedprice * (1 - l_discount) AS volume FROM supplier, lineitem, orders, customer, nation n1, nation n2 WHERE s_suppkey = l_suppkey AND o_orderkey = l_orderkey AND c_custkey = o_custkey AND s_nationkey = n1.n_nationkey AND c_nationkey = n2.n_nationkey AND ((n1.n_name = 'RUSSIA' AND n2.n_name = 'FRANCE') OR (n1.n_name = 'FRANCE' AND n2.n_name = 'RUSSIA')) AND l_shipdate BETWEEN '1995-01-01' AND '1996-12-31') AS shipping GROUP BY supp_nation, cust_nation, l_year ORDER BY supp_nation, cust_nation, l_year;

DBT-3 Query 7 Query plan without histogram

DBT-3 Query 7 Query plan with histogram

Query Execution Time (seconds) DBT-3 Query 7 Performance 1,8 1,6 1,4 1,2 1,0 0,8 0,6 0,4 0,2 0,0 Without histogram With histogram

Program Agenda 1 2 3 4 5 Motivating example Quick start guide How is histograms used? Query example Some advice 30

Some advice Which columns to create histograms for? Histograms are useful for columns that are not the first column of any index, and used in WHERE conditions of JOIN queries Queries with IN-subqueries ORDER BY... LIMIT queries Best fit Low cardinality columns (e.g., gender, orderstatus, dayofweek, enums) Columns with uneven distribution (skew) Stable distribution (do not change much over time)

Some more advice When not to create histograms: First column of an index Never used in WHERE clause Monotonically increasing column values (e.g. date columns) Histogram will need frequent updates to be accurate Consider to create index How many buckets? If possible, enough to get a singleton histogram For equi-height, 100 buckets should be enough

More information MySQL Server Team blog http://mysqlserverteam.com/ https://mysqlserverteam.com/histogram-statistics-in-mysql/ (Erik Frøseth) My blog: http://oysteing.blogspot.com/ MySQL forums: Optimizer & Parser: http://forums.mysql.com/list.php?115 Performance: http://forums.mysql.com/list.php?24

Safe Harbor Statement The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle. 34

35