Oracle Optimizer: What's New in Oracle Database 12c?
Maria Colgan, Master Product Manager

PART 3

Program Agenda
- Adaptive Query Optimization
- Statistics Enhancements
- What's new in SQL Plan Management

Histograms
- Histograms tell the Optimizer about the data distribution in a column
- Creation is controlled by the METHOD_OPT parameter
- Default: create a histogram on any column that has been used in the WHERE clause or GROUP BY of a statement AND has a data skew
- Relies on column usage information gathered at compilation time and stored in SYS.COL_USAGE$
- Four types of histograms:
  - Frequency
  - Top-Frequency
  - Height balanced
  - Hybrid
Oracle Confidential

Histograms: Frequency Histograms (FREQUENCY)
- A frequency histogram is only created if the number of distinct values (NDV) in a column is less than 254
[Figure: frequency histogram]

Histograms: Top Frequency (TOP-FREQUENCY)
- Traditionally a frequency histogram is only created if NDV < 254
- What if a small number of values occupies most of the rows (>99%)?
- Creating a frequency histogram on that small set of values is very useful even though NDV is greater than 254
- Ignoring unpopular values allows for a better quality histogram
- Built using the same technique used for frequency histograms
- Only created with AUTO_SAMPLE_SIZE

Top Frequency Histogram Example
- Table PRODUCT_SALES contains information on Christmas ornament sales
- It has 1.78 million rows
- There are 620 distinct TIME_IDs
- But 99.9% of the rows have fewer than 254 distinct TIME_IDs
- The TIME_ID column is therefore a perfect candidate for a top-frequency histogram
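The qualification test for a top-frequency histogram can be sketched in Python. This is a simplified model, not Oracle's implementation: the function name and the toy data are hypothetical, and the default cutoff of 1 - 1/num_buckets (about 99.6% for 254 buckets) is an assumption taken from Oracle's documentation; the slide itself only says ">99%".

```python
from collections import Counter


def top_frequency_candidate(values, num_buckets=254, threshold=None):
    """Return (qualifies, coverage) for a simplified top-frequency check.

    coverage is the fraction of rows held by the num_buckets most
    frequent values.
    """
    counts = Counter(values)
    if len(counts) <= num_buckets:
        # Few enough distinct values: a plain frequency histogram fits.
        return False, 1.0
    if threshold is None:
        # Assumed cutoff: 1 - 1/num_buckets (about 0.996 for 254 buckets).
        threshold = 1 - 1 / num_buckets
    top_rows = sum(count for _, count in counts.most_common(num_buckets))
    coverage = top_rows / len(values)
    return coverage >= threshold, coverage


# Hypothetical toy column: 4 popular values plus 6 values seen once each.
skewed = [v for v in range(4) for _ in range(240)] + list(range(4, 10))
print(top_frequency_candidate(skewed, num_buckets=4))
```

With the default of 254 buckets, the same check applied to the PRODUCT_SALES example would compare the rows held by the 254 most frequent TIME_IDs against the total row count.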

Histograms: Height Balanced Histograms (HEIGHT BALANCED)
- A height balanced histogram is created if the number of distinct values (NDV) in a column is greater than 254
[Figure: height balanced histogram]

Histograms: Hybrid Histograms (HYBRID)
- A hybrid histogram is created if the number of distinct values (NDV) in a column is greater than 254
[Figure: hybrid histogram]

Histograms: Hybrid Histograms (HYBRID)
- Similar to a height balanced histogram in that it is created if NDV > 254
- Stores the actual frequencies of bucket endpoints in the histogram
- No values are allowed to spill over multiple buckets
- More endpoint values can be squeezed into a histogram
- Achieves the same effect as increasing the number of buckets
- Only created with AUTO_SAMPLE_SIZE

Height-balanced versus Hybrid Histogram
[Figure: side-by-side comparison, Oracle Database 11g and Oracle Database 12c]

Height-balanced Histogram Example
Step 1: SELECT row_num, time_id FROM sales ORDER BY 2;

ROWNUM  TIME_ID
1       02-JAN-98
2       03-JAN-98
3       05-JAN-98
4       05-JAN-98
5       06-JAN-98
6       09-JAN-98
7       09-JAN-98
8       09-JAN-98
9       10-JAN-98
10      10-JAN-98
:       :

- With a traditional height-balanced histogram, an even number of rows goes in each bucket
- Multiple buckets can have the same endpoint
- There are 960 rows in the sales table
- Automatically created histograms have 254 buckets
- That is about 3.78 rows per bucket (960 / 254)

Height-balanced Histogram Example
Step 2: Assign roughly an equal number of rows per bucket

ROWNUM  TIME_ID
1       02-JAN-98
2       03-JAN-98
3       05-JAN-98
4       05-JAN-98
5       06-JAN-98
6       09-JAN-98
7       09-JAN-98
8       09-JAN-98
9       10-JAN-98
10      10-JAN-98
:       :

- Bucket 0 has endpoint 02-JAN-98
- Bucket 1 has endpoint 05-JAN-98
- Bucket 2 has endpoint 09-JAN-98

Step 3: Bucket 0 added for min value
Step 4: Buckets with the same endpoint are compressed

Height-balanced Histogram Example
- Multiple buckets have the same endpoint
- Maximum number of popular values that could have been recorded: 96
- Missing bucket numbers are buckets that have been compressed
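The four steps of the height-balanced example can be sketched in Python. This is a simplified illustration under stated assumptions rather than Oracle's actual algorithm: the function names are hypothetical, dates are represented as plain integers (day of January 1998), and a value is treated as popular when it closes more than one bucket.

```python
def height_balanced(values, num_buckets):
    """Return compressed (bucket_number, endpoint) pairs."""
    ordered = sorted(values)                       # Step 1: order the rows
    n = len(ordered)
    buckets = [(0, ordered[0])]                    # Step 3: bucket 0 = min value
    for b in range(1, num_buckets + 1):            # Step 2: equal rows per bucket
        idx = (b * n + num_buckets - 1) // num_buckets - 1  # ceil(b*n/buckets) - 1
        buckets.append((b, ordered[idx]))
    compressed = []                                # Step 4: compress duplicates
    for number, endpoint in buckets:
        if compressed and compressed[-1][1] == endpoint:
            compressed[-1] = (number, endpoint)    # keep the highest bucket number
        else:
            compressed.append((number, endpoint))
    return compressed


def popular_values(compressed):
    """Endpoints whose bucket number jumps by more than 1 span several buckets."""
    return [
        endpoint
        for (prev, _), (number, endpoint) in zip(compressed, compressed[1:])
        if number - prev > 1
    ]


# First eight rows of the slide's data, two buckets of four rows each.
print(height_balanced([2, 3, 5, 5, 6, 9, 9, 9], num_buckets=2))
```

The gaps in the bucket numbers of the compressed result are what the slide calls missing bucket numbers: they mark the buckets that were merged because they shared an endpoint.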


Hybrid Histogram Example
Step 1: SELECT row_num, time_id FROM sales ORDER BY 2;

ROWNUM  TIME_ID
1       02-JAN-98
2       03-JAN-98
3       05-JAN-98
4       05-JAN-98
5       06-JAN-98
6       09-JAN-98
7       09-JAN-98
8       09-JAN-98
9       10-JAN-98
10      10-JAN-98
:       :

- As with a traditional height-balanced histogram, we want an even number of rows in each bucket
- But no two buckets have the same endpoint
- There are 960 rows in the sales table
- Automatically created histograms have 254 buckets
- That is about 3.78 rows per bucket (960 / 254)

Hybrid Histogram Example
Step 2: Assign roughly an equal number of rows per bucket

ROWNUM  TIME_ID
1       02-JAN-98
2       03-JAN-98
3       05-JAN-98
4       05-JAN-98
5       06-JAN-98
6       09-JAN-98
7       09-JAN-98
8       09-JAN-98
9       10-JAN-98
10      10-JAN-98
:       :

- Bucket 1 has endpoint 02-JAN-98 (endpoint repeat count 1)
- Bucket 5 has endpoint 06-JAN-98 (endpoint repeat count 1)
- Bucket 10 has endpoint 10-JAN-98 (endpoint repeat count 2)

Hybrid Histogram Example
- No two buckets have the same endpoint
- The frequency at which an endpoint value occurs in the sample is recorded in a new column called endpoint repeat count
- Potential to record 254 popular values
- You can manually increase the number of buckets to 2048
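The hybrid example can be sketched in Python in the same spirit. This is a simplified model, not Oracle's implementation: the function name is hypothetical, dates are shown as integers (day of January 1998), and the rule that the first bucket holds only the minimum value is inferred from the slide's bucket numbers (1, 5, 10).

```python
from collections import Counter


def hybrid_histogram(values, bucket_size):
    """Return (endpoint_number, endpoint_value, endpoint_repeat_count) tuples.

    endpoint_number is the cumulative row count up to and including the bucket.
    """
    ordered = sorted(values)
    n = len(ordered)
    frequency = Counter(ordered)
    buckets = []
    start = 0
    target = 1  # assumed: the first bucket covers just the minimum value
    while start < n:
        end = min(start + target, n) - 1
        endpoint = ordered[end]
        # Extend the bucket so the endpoint value never spills into the next one.
        while end + 1 < n and ordered[end + 1] == endpoint:
            end += 1
        buckets.append((end + 1, endpoint, frequency[endpoint]))
        start = end + 1
        target = bucket_size
    return buckets


# The ten rows shown on the slide, with roughly four rows per bucket.
rows = [2, 3, 5, 5, 6, 9, 9, 9, 10, 10]
print(hybrid_histogram(rows, bucket_size=4))
```

Because each bucket is stretched to absorb every duplicate of its endpoint, no two buckets share an endpoint, and the repeat count records how often that endpoint occurs.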

Online Statistics Gathering
- Statistics are gathered as part of direct path load operations: Create Table As Select (CTAS) or Insert As Select (IAS) commands
- Statistics are available directly after the load
- No additional table scan is required to gather statistics
- All internal maintenance operations that use CTAS benefit from this
- Note: only occurs on IAS if the table is empty

Online Statistics Gathering Example
- New table SALES2 is created using a CTAS command
- Both table and column level statistics are available immediately after the table has been created

Online Statistics Gathering Example
- Histogram & index statistics are not gathered
- To gather these statistics without re-gathering basic statistics, set the OPTIONS parameter of GATHER_TABLE_STATS to GATHER AUTO

Online Statistics Gathering Example
- Histogram & index statistics are gathered using the new GATHER AUTO option without re-gathering base column statistics

Session private statistics for GTT: Overview
- Traditionally, statistics gathered on a global temporary table (GTT) were shared by all sessions
- Shared statistics are not always optimal
- Now each session can have its own version of statistics for a GTT
- Controlled by the new preference GLOBAL_TEMP_TABLE_STATS
- Default value is SESSION (non-shared)
- To force sharing (as in 11g), set the table preference to SHARED

Session private statistics for GTT
- Statistics gathered on a GTT are no longer shared by all sessions
- By default, statistics on a GTT are session-private
- To restore shared statistics, change the table preference GLOBAL_TEMP_TABLE_STATS to SHARED

Enhanced Incremental Statistics
- Incremental statistics allow global level statistics to be accurately generated from partition level statistics
- NDV statistics can be accurately aggregated thanks to the introduction of the synopsis
- The synopses are stored in the Sysaux tablespace
- 12c reduces the space required to store synopses on disk

Enhanced Incremental Statistics
- NDV statistics can now be accurately aggregated
- The synopses are stored in the Sysaux tablespace
- 12c reduces the space required to store synopses on disk
1. Partition level stats are gathered & synopsis created
2. Global stats generated by aggregating partition level statistics and synopsis
[Diagram: Sales table with daily partitions May 18th 2012 through May 23rd 2012; synopses stored in the Sysaux tablespace]
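Why a synopsis is needed for NDV can be shown with a small Python sketch. It is a deliberate simplification: here a synopsis is modelled as the plain set of distinct values in a partition, whereas Oracle's synopses hold hashed values; the function names and the sample data are hypothetical.

```python
def partition_synopsis(column_values):
    """Simplified synopsis: the set of distinct values seen in one partition."""
    return set(column_values)


def global_ndv(synopses):
    """Global NDV is the size of the union of all partition synopses."""
    return len(set().union(*synopses))


# Hypothetical customer IDs in two daily partitions; B and C appear in both.
may_22 = ["A", "B", "C"]
may_23 = ["B", "C", "D"]
synopses = [partition_synopsis(may_22), partition_synopsis(may_23)]

summed = sum(len(s) for s in synopses)  # naive: adds partition NDVs, double counts
merged = global_ndv(synopses)           # correct: merges the synopses
print(summed, merged)
```

Adding the partition-level NDVs counts shared values twice, while merging the synopses does not, so global statistics can be refreshed after a load by gathering only the new partition's synopsis and merging it with the stored ones.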

Enhanced Incremental Statistics for Partition Exchange
1. Create external table for flat files
2. Use CTAS command to create non-partitioned table TMP_SALES
3. Set INCREMENTAL to true & INCREMENTAL_LEVEL to TABLE
4. Gather statistics
5. Alter table Sales exchange partition May_24_2012 with table tmp_sales
6. Global stats generated by aggregating partition level statistics for existing partitions with stats on the new partition
[Diagram: DBA, TMP_SALES, Sales table with daily partitions May 18th 2012 through May 24th 2012, and the Sysaux tablespace]

Enhanced Incremental Statistics: Staleness Tolerance
- During a data load, some rows may go to older partitions
- In 11g any DML on older partitions triggered partition statistics to be re-gathered
1. Partition level stats are gathered & synopsis created
2. Global stats generated by aggregating partition level statistics and synopsis
[Diagram: Sales table with daily partitions May 18th 2012 through May 23rd 2012; synopses stored in the Sysaux tablespace]

Enhanced Incremental Statistics: Staleness Tolerance
- New DBMS_STATS preference INCREMENTAL_STALENESS
- When set to USE_STALE_PERCENT, DML on less than 10% of rows in older partitions will not trigger a re-gather
1. Partition level stats are gathered & synopsis created
2. Global stats generated by aggregating partition level statistics and synopsis
[Diagram: Sales table with daily partitions May 18th 2012 through May 23rd 2012; synopses stored in the Sysaux tablespace]
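The difference between the 11g behaviour and USE_STALE_PERCENT can be modelled with a short Python sketch. The function name, the partition names, and the row counts are hypothetical; the 10% cutoff comes from the slide, and the exact boundary handling is an assumption.

```python
def partitions_to_regather(partitions, incremental_staleness=None, stale_percent=10.0):
    """Return names of partitions whose statistics would be re-gathered.

    partitions maps a partition name to (num_rows, rows_changed_by_dml).
    """
    selected = []
    for name, (num_rows, changed) in partitions.items():
        if changed == 0:
            continue  # untouched partitions are never re-gathered
        if incremental_staleness == "USE_STALE_PERCENT":
            percent = 100.0 * changed / num_rows if num_rows else 100.0
            if percent < stale_percent:
                continue  # tolerated: below the staleness threshold
        selected.append(name)
    return selected


# Hypothetical load that touched two older partitions.
sales = {"MAY_22": (1000, 50), "MAY_23": (1000, 200), "MAY_21": (1000, 0)}
print(partitions_to_regather(sales))                       # any DML triggers a re-gather
print(partitions_to_regather(sales, "USE_STALE_PERCENT"))  # only heavily changed partitions
```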

Statistics Enhancements: Concurrent Statistics Gathering
- Originally introduced in 11.2
- Gather statistics on multiple objects at the same time
- Controlled by the DBMS_STATS preference CONCURRENT
- Uses Database Scheduler and Advanced Queuing
- Number of concurrent gather operations is controlled by the job_queue_processes parameter
- Each gather operation can still operate in parallel

Concurrent Statistics Gathering for SH Schema
EXEC DBMS_STATS.GATHER_SCHEMA_STATS('SH');
- A statistics gathering job is created for each table and partition in the schema
- Level 1 contains statistics gathering jobs for all non-partitioned tables and a coordinating job for each partitioned table
- Level 2 contains statistics gathering jobs for each partition in the partitioned tables
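The two job levels can be pictured with a small Python sketch. It only models the job layout described on the slide, not the Scheduler or Advanced Queuing machinery; the function name, the job labels, and the partition names are hypothetical.

```python
def plan_jobs(schema):
    """Split a schema into level 1 and level 2 statistics gathering jobs.

    schema maps a table name to its list of partitions (empty if non-partitioned).
    """
    level1, level2 = [], []
    for table, partitions in schema.items():
        if partitions:
            # Partitioned table: one coordinating job, one job per partition.
            level1.append(f"coordinate {table}")
            level2.extend(f"gather {table}.{p}" for p in partitions)
        else:
            level1.append(f"gather {table}")
    return level1, level2


# Hypothetical subset of a schema with one partitioned table.
schema = {"CUSTOMERS": [], "SALES": ["Q1", "Q2"]}
level1, level2 = plan_jobs(schema)
print(level1)
print(level2)
```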

Concurrent Statistics Enhancements in 12c
- Multiple partitioned table support
- Batch manager to batch smaller jobs together to reduce scheduling overhead
- Auto statistics gathering job can now use concurrency
- Cap to limit the resources available to the job via Resource Manager

Extended Statistics
- Two types of extended statistics:
  - Column group statistics: useful when multiple columns from the same table are used in where clause predicates or a group by clause
  - Expression statistics: useful when a column is used as part of a complex expression in a where clause predicate
- Automatically maintained when statistics are gathered on the table
- Candidates for column groups can be manually or automatically determined

Automatic Column Group Detection
1. Start column group usage capture
- Column groups are detected based on a SQL Tuning Set (STS) or by monitoring a workload
- Uses the DBMS_STATS procedure SEED_COL_USAGE
- If the first two arguments are set to NULL, the current workload will be monitored
- The third argument is the time limit in seconds

Automatic Column Group Detection
2. Run your workload
- Actual number of rows returned by this query is 932
- Optimizer under-estimates the cardinality as it assumes each where clause predicate will independently reduce the number of rows returned
- Optimizer is not aware of real-world relations between city, state, & country
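The under-estimate can be reproduced with a toy Python model. It assumes uniform selectivity of 1/NDV per equality predicate and ignores histograms, so it is only a sketch of the independence assumption, not of the Optimizer's real costing; the function name and the sample rows are hypothetical.

```python
def cardinality_estimates(rows, predicate):
    """Return (independent_estimate, column_group_estimate, actual_rows).

    rows is a list of dicts; predicate maps column name to the required value.
    """
    n = len(rows)
    columns = list(predicate)
    # Independence assumption: multiply the per-column selectivities (1 / NDV).
    selectivity = 1.0
    for column in columns:
        selectivity *= 1.0 / len({row[column] for row in rows})
    independent = n * selectivity
    # Column group: a single NDV for the combination of columns.
    group_ndv = len({tuple(row[c] for c in columns) for row in rows})
    column_group = n / group_ndv
    actual = sum(all(row[c] == v for c, v in predicate.items()) for row in rows)
    return independent, column_group, actual


# Hypothetical customers: city determines state, state determines country.
places = [("Los Angeles", "CA", "US"), ("San Francisco", "CA", "US"),
          ("Munich", "BY", "DE"), ("Nuremberg", "BY", "DE")]
customers = [{"city": c, "state": s, "country": k}
             for c, s, k in places for _ in range(25)]
query = {"city": "Los Angeles", "state": "CA", "country": "US"}
print(cardinality_estimates(customers, query))
```

In this toy data the independence estimate is a quarter of the true row count, while the column group estimate matches it, which is the gap that column group statistics are meant to close.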

Automatic Column Group Detection
2. Run your workload
- Actual number of rows returned by this query is 145
- Optimizer over-estimates the cardinality as it is not aware of the real-world relations between state & country

Automatic Column Group Detection
3. Check column usage information recorded for our table
SELECT dbms_stats.report_col_usage(user, 'customers') FROM dual;

COLUMN USAGE REPORT FOR SH.CUSTOMERS
1. COUNTRY_ID : EQ
2. CUST_CITY : EQ
3. CUST_STATE_PROVINCE : EQ
4. (CUST_CITY, CUST_STATE_PROVINCE, COUNTRY_ID) : FILTER
5. (CUST_STATE_PROVINCE, COUNTRY_ID) : GROUP_BY

- EQ means the column was used in an equality predicate in query 1
- FILTER means the columns were used together as filter predicates rather than join etc. Comes from query 1
- GROUP_BY means the columns were used in a group by expression in query 2

Automatic Column Group Detection
4. Create extended stats for customers based on usage
SELECT dbms_stats.create_extended_stats(user, 'customers') FROM dual;

EXTENSIONS FOR SH.CUSTOMERS
1. (CUST_CITY, CUST_STATE_PROVINCE, COUNTRY_ID) : SYS_STUMZ$C3AIHLPBROI#SKA58H_N created
2. (CUST_STATE_PROVINCE, COUNTRY_ID) : SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ created

- Column group statistics will now be automatically maintained every time you gather statistics on this table

END OF PART 3