Oracle Database 11gR2 Optimizer Insights
Marcus Bender, Distinguished Sales Consultant, Presales Fellow
Strategic Technical Support (STU), ORACLE Deutschland GmbH, Geschäftsstelle Hamburg
Agenda
- Parallel Execution & Optimizer
- Optimizer
  - Optimizer / Cardinality
  - Optimizer / Access Path
  - Optimizer / Join Types
  - Optimizer / Join Order
- Monitoring Execution
  - SQL Monitor
What Happens When a SQL Statement Is Issued?
1. Syntax check
2. Semantic check
3. Shared pool check: if a matching cursor already exists in the library cache (shared SQL area), it is reused
4. Otherwise the statement is hard parsed: query transformation, then the optimizer (cost estimator and plan generator) chooses an execution plan, and the code generator produces the cursor
Optimizer Engine
- Cardinality: how to determine what the cardinality estimate should be, and how to combat common causes of incorrect cardinality estimates
- Access paths: how to determine the access path selected, and common causes for why the wrong access path was selected
- Join type: how to determine what join type was selected, and common causes for why the wrong join type was selected
- Join order: how to determine what join order was selected, and common causes for why the wrong join order was selected
Execution Plan

SELECT /*+ gather_plan_statistics */ p.prod_name, SUM(s.quantity_sold)
FROM   sales s, products p
WHERE  s.prod_id = p.prod_id
GROUP  BY p.prod_name;

SELECT * FROM table(dbms_xplan.display_cursor(null, null, 'ADVANCED'));
Execution Plan - Cardinality
Cardinality: the estimated number of rows returned by each operation in the plan
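A hedged way to put the estimated cardinality (E-Rows) next to the actual row counts (A-Rows) is to run the statement with the GATHER_PLAN_STATISTICS hint, as in the example above, and then format the cursor with ALLSTATS LAST (a sketch; it assumes the statement is the most recent one in the session):

```sql
-- E-Rows (estimated) and A-Rows (actual) appear side by side per plan line
SELECT * FROM TABLE(dbms_xplan.display_cursor(NULL, NULL, 'ALLSTATS LAST'));
```

Large gaps between E-Rows and A-Rows are the usual starting point when hunting for cardinality problems.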
Execution Plan - Access Path
Look in the Operation section to see how each object is being accessed
Access Paths: Getting the Data
- Full table scan: reads all rows from the table and filters out those that do not meet the WHERE clause predicates. Used when there is no index, a degree of parallelism is set, etc.
- Table access by ROWID: the ROWID specifies the datafile, the data block containing the row, and the location of the row in that block. Used if the ROWID is supplied by an index or in the WHERE clause.
- Index unique scan: only one row will be returned. Used when the statement contains a UNIQUE or PRIMARY KEY constraint that guarantees that only a single row is accessed.
- Index range scan: accesses adjacent index entries and returns ROWID values. Used with equality predicates on non-unique indexes or range predicates (<, >, BETWEEN, etc.) on unique indexes.
- Index skip scan: skips the leading edge of the index and uses the rest. Advantageous if there are few distinct values in the leading column and many distinct values in the non-leading column.
- Full index scan: processes all leaf blocks of an index, but only enough branch blocks to find the first leaf block. Used when all necessary columns are in the index and the ORDER BY clause matches the index structure, or if a sort merge join is done.
- Fast full index scan: scans all blocks in the index; used to replace a full table scan when all necessary columns are in the index. Uses multi-block I/O and can run in parallel.
- Index joins: a hash join of several indexes that together contain all the table columns referenced in the query. Will not eliminate a sort operation.
- Bitmap indexes: use a bitmap for key values and a mapping function that converts each bit position to a ROWID. Can efficiently merge indexes that correspond to several conditions in a WHERE clause.
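When checking whether the optimizer picked a sensible access path, hints can force an alternative for comparison. A sketch against the HR sample schema (the index name EMP_EMP_ID_PK is an assumption):

```sql
-- Force a full table scan on EMPLOYEES
SELECT /*+ FULL(e) */ last_name
FROM   employees e
WHERE  employee_id = 101;

-- Force access through a named index (assumes index EMP_EMP_ID_PK exists)
SELECT /*+ INDEX(e emp_emp_id_pk) */ last_name
FROM   employees e
WHERE  employee_id = 101;
```

Comparing the cost and runtime of the two plans shows whether the optimizer's default choice was justified.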
Execution Plan - Join Type
Look in the Operation section to check if the correct join type is used
Join Types
- Nested loops join: for every row in the outer table, Oracle accesses all the rows in the inner table. Useful when joining small subsets of data and there is an efficient way to access the second table (index lookup).
- Hash join: the smaller of the two tables is scanned and the resulting rows are used to build a hash table on the join key in memory. The larger table is then scanned, the join columns of the resulting rows are hashed, and the values are used to probe the hash table to find the matching rows. Useful for larger tables and equality predicates.
- Sort merge join: consists of two steps: (1) sort join operation: both inputs are sorted on the join key; (2) merge join operation: the sorted lists are merged together. Useful when the join condition between the two tables is an inequality condition.
- Cartesian join: joins every row from one data source with every row from the other data source, creating the Cartesian product of the two sets. Only good if the tables are very small; the only choice if no join condition is specified in the query.
- Outer join: returns all rows that satisfy the join condition, and also returns all rows from the table without the (+) for which no rows from the other table satisfy the join condition.
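As a sketch (table names reuse the earlier SALES/PRODUCTS example), hints can force a particular join method when comparing plans:

```sql
-- Force a nested loops join driven by PRODUCTS
SELECT /*+ LEADING(p) USE_NL(s) */ p.prod_name, s.quantity_sold
FROM   products p, sales s
WHERE  s.prod_id = p.prod_id;

-- Force a hash join instead
SELECT /*+ LEADING(p) USE_HASH(s) */ p.prod_name, s.quantity_sold
FROM   products p, sales s
WHERE  s.prod_id = p.prod_id;
```

Such hints are a diagnostic tool; if the forced plan is consistently better, the fix is usually better statistics, not a permanent hint.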
Join Order
Start with the table that reduces the result set the most. If the join order is not correct, check the statistics, cardinality, and access methods.
Optimizer / Cardinality - Oracle 11gR2
- Cardinality at the object and join level is determined by the optimizer to find the best execution plan
- For the optimizer, cardinality means the number of rows returned by an operation
- The ROWS column in the execution plan, or ESTIMATED ROWS in SQL Monitor, shows this information
- Correct information is crucial for correct execution plans
- Cardinality feedback (object level): the optimizer estimates cardinality, and the actual rows processed are kept in the row source tree. If the estimates are wrong by a factor of 2 or more, they are overwritten
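One hedged way to see whether cardinality feedback triggered a re-optimization is the USE_FEEDBACK_STATS column of V$SQL_SHARED_CURSOR (available in 11.2; the sql_id value below is illustrative):

```sql
-- 'Y' means this child cursor was built because cardinality feedback
-- overrode the original estimates
SELECT sql_id, child_number, use_feedback_stats
FROM   v$sql_shared_cursor
WHERE  sql_id = '6m2u8p3fyt12x';  -- illustrative sql_id, substitute your own
```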
Check Cardinality using SQL Monitor
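A sketch of pulling a SQL Monitor report for one statement (DBMS_SQLTUNE.REPORT_SQL_MONITOR is available from 11g on; the sql_id value is illustrative):

```sql
-- Text report including estimated vs. actual rows per plan line
SELECT dbms_sqltune.report_sql_monitor(
         sql_id => '6m2u8p3fyt12x',  -- illustrative sql_id, substitute your own
         type   => 'TEXT') AS report
FROM   dual;
```

The same report is available graphically in Enterprise Manager.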
Causes of Incorrect Cardinality Estimates
- No statistics or stale statistics
- Data skew
- Correlated single-table predicates
- Multiple correlated columns used in a join
- Function-wrapped column
- Complicated expression
Optimizer Statistics
[Diagram: the optimizer reads table, column, index, and system (CPU & I/O) statistics from the data dictionary, illustrated with the PROMOTIONS table and its PROMO_PK index, to cost the execution plan]
Stale Statistics
Statistics are considered stale when 10% or more of the rows in the object have changed. Changes include inserts, updates, deletes, etc. Query the dictionary to check if statistics are stale:

SELECT table_name, stale_stats FROM user_tab_statistics;

Table Name   Stale_stats
Sales        NO      -- NO means stats are good
Customers    YES     -- YES means stats are stale
Product      --      -- NULL means no stats

Solution: gather statistics
How to Gather Statistics
Use the DBMS_STATS package. Your gather statistics commands should be this simple.
How to Gather Statistics: Sample Size
The #1 most commonly asked question: what sample size should I use?
- Controlled by the ESTIMATE_PERCENT parameter
- From 11g onwards, use the default value AUTO_SAMPLE_SIZE
- New hash-based algorithm: the speed of a 10% sample with the accuracy of a 100% sample
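A minimal sketch of such a command (schema and table names are illustrative); since AUTO_SAMPLE_SIZE and the AUTO histogram policy are the 11g defaults, the explicit parameters are shown only for clarity:

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SH',
    tabname          => 'SALES',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,   -- default from 11g on
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO');  -- default histogram policy
END;
/
```

In practice the two-argument call DBMS_STATS.GATHER_TABLE_STATS('SH','SALES') is equivalent under the defaults, which is exactly the "this simple" point above.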
Example of Data Skew

SELECT * FROM HR.Employees WHERE job_id = 'AD_VP';

NAME      ENUM  JOB
Kochhar   101   AD_VP
De Haan   102   AD_VP

The optimizer assumes an even distribution, so the cardinality estimate is NUM_ROWS / NDV = 107 / 19 ≈ 6

HR.EMPLOYEES table:
Last_name  Emp_id  Job_id
SMITH      99      CLERK
ALLEN      7499    CLERK
WARD       2021    CLERK
KOCHHAR    101     AD_VP
DE HAAN    102     AD_VP
CLARK      7782    CLERK
Solution: Gather Histogram Stats

EXEC DBMS_STATS.GATHER_TABLE_STATS('HR', 'EMPLOYEES', method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY');

SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'EMPLOYEES';
What Is a Frequency Histogram?
One bucket per distinct value of JOB_ID, with the bucket recording that value's frequency:
- Bucket 1: AC_ACCOUNT = 4
- Bucket 2: AD_VP = 2
- Bucket 3: AD_ASST = 3
- Bucket 4: AD_PRES = 11
- Bucket 5: CLERK = 36
- ...
- Bucket 19: FI_ACCOUNT = 8
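To inspect the buckets themselves, a sketch (HR schema assumed) queries the histogram dictionary view:

```sql
-- Each row is a bucket endpoint; for a frequency histogram the difference
-- between consecutive ENDPOINT_NUMBERs is the frequency of that value
SELECT endpoint_number, endpoint_actual_value
FROM   user_tab_histograms
WHERE  table_name  = 'EMPLOYEES'
AND    column_name = 'JOB_ID'
ORDER  BY endpoint_number;
```

ENDPOINT_ACTUAL_VALUE is used here because JOB_ID is a character column; the raw ENDPOINT_VALUE holds a numeric encoding of the string.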
Multiple Correlated Columns

SELECT ... FROM ... WHERE model = '530xi' AND make = 'BMW';

Make  Model  Color
BMW   530xi  RED
BMW   530xi  BLACK
BMW   530xi  SILVER

Three records selected. Cardinality = #ROWS * (1/NDV_c1) * (1/NDV_c2) = 12 * 1/3 * 1/4 = 1

VEHICLES table:
Make     Model  Color
BMW      530xi  RED
BMW      530xi  BLACK
BMW      530xi  SILVER
PORSCHE  911    RED
MERC     SLK    RED
MERC     C320   SILVER
Solution
Create extended statistics on the MODEL and MAKE columns:

exec dbms_stats.gather_table_stats('dwh', 'VEHICLES', degree=>8, method_opt=>'for columns (MODEL, MAKE) size 256');

SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'VEHICLES';

The column group appears as a new column with a system-generated name.
Solutions for Correct Cardinality Estimates
- Stale or missing statistics -> DBMS_STATS
- Data skew -> create a histogram
- Multiple single-column predicates on a table -> create a column group using DBMS_STATS.CREATE_EXTENDED_STATS
- Multiple columns used in a join -> create a column group using DBMS_STATS.CREATE_EXTENDED_STATS
- Function-wrapped column -> create statistics on the function-wrapped column using DBMS_STATS.CREATE_EXTENDED_STATS
- Complicated expression containing columns from multiple tables -> use dynamic sampling level 4 or higher
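A sketch of creating a column group explicitly (schema and table names reuse the VEHICLES example and are illustrative); CREATE_EXTENDED_STATS returns the system-generated name of the virtual column, and statistics must still be gathered afterwards:

```sql
DECLARE
  ext_name VARCHAR2(128);
BEGIN
  -- Define the (MODEL, MAKE) column group
  ext_name := DBMS_STATS.CREATE_EXTENDED_STATS(
                ownname   => 'DWH',
                tabname   => 'VEHICLES',
                extension => '(MODEL,MAKE)');
  DBMS_OUTPUT.PUT_LINE('Created extension: ' || ext_name);

  -- Gather statistics so the column group gets NDV (and, if skewed, a histogram)
  DBMS_STATS.GATHER_TABLE_STATS('DWH', 'VEHICLES');
END;
/
```

The same column group is created implicitly by the method_opt syntax shown in the previous slide; the explicit call is useful when the group should exist before the next regular statistics run.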
Optimizer / Cardinality - Oracle 12c
The optimizer in 12c checks the cardinality during execution. Rows are buffered before being processed, and the join method is chosen adaptively:

  if #rows <= 10 then NL else HASH

[Plan sketch: an NL/HASH buffer (statistics collector) sits above a nested loops join of Scan A, via Index B, and Scan C]
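In 12c, a hedged way to see both the chosen and the discarded subplan of such an adaptive plan is the ADAPTIVE format modifier of DBMS_XPLAN (12c onwards):

```sql
-- Inactive plan lines of the adaptive plan are marked with a leading '-'
SELECT * FROM TABLE(dbms_xplan.display_cursor(NULL, NULL, 'ALLSTATS LAST +ADAPTIVE'));
```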
Optimizer / Statistics - Oracle 12c Dynamic Sampling
The optimizer gathers statistics during the parse operation:

alter system set optimizer_dynamic_sampling = 6;

Optimizer dynamic sampling levels (no joins):
- 0 = off
- 2 = DEFAULT: tables without stats, 32 blocks
- 3 = level 2 + complex single predicates, 64 blocks
- 4 = statistics are gathered even if the table is analyzed
- 5 = level 4, 2x the default number of blocks
- 6 = level 4, 4x the default number of blocks (PQ)
- 7 = level 4, 8x the default number of blocks
- 8 = level 4, 32x the default number of blocks
- 9 = level 4, 128x the default number of blocks
- 10 = level 4, the complete table
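Dynamic sampling can also be requested per statement with a hint rather than system-wide; a sketch reusing the VEHICLES example (alias and predicate values illustrative):

```sql
-- Sample VEHICLES at level 4 for this statement only
SELECT /*+ dynamic_sampling(v 4) */ COUNT(*)
FROM   vehicles v
WHERE  model = '530xi'
AND    make  = 'BMW';
```

This keeps the extra parse-time sampling cost confined to the statements that need it.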
Simplification & Performance
From experience we know that running a statement with the optimal execution plan gives the best performance.
- Performance: in 11gR2, extended statistics, cardinality feedback, and improved dynamic sampling enable the optimizer to find the best execution plan
- Simple: for extended statistics, a tool is provided that analyzes the workload and recommends all relevant column group statistics
Agenda
- Parallel Execution & Optimizer
- Optimizer
  - Optimizer / Cardinality
  - Optimizer / Access Path
  - Optimizer / Join Types
  - Optimizer / Join Order
- Monitoring Execution
  - SQL Monitor
Simplification & Performance
Simple: monitoring execution provides an excellent overview of all running statements on a system, including:
- Execution plan
- Number of PQ slaves and PQ distribution
- Top wait events
- Temp usage
- Actual rows processed
- Elapsed time and phase of execution
- CPU and I/O consumption
Performance: this enables a DBA to easily get the best and most detailed information to analyze potential performance problems