Real-World Performance Training SQL Performance

Jeffry McCarthy
5 years ago
2 Real-World Performance Training SQL Performance Real-World Performance Team

3 Agenda The Optimizer Optimizer Inputs Optimizer Output Advanced Optimizer Behavior Why is my SQL slow? Optimizer Edge Cases and Top SQL Mistakes

5 The Optimizer

6 The Optimizer Engine
The Rule Based Optimizer determines execution plan based on rules
  Inputs
    SQL
    Physical Design: Indexes, Partitioning
  Execution Plan
    Index access if index is present
    Join type based on how data is stored
  The rule based optimizer used a set of rules and rankings to determine the execution plan based on the SQL statement and how data was stored
  Not available in Oracle 11 or 12

7 The Cost Based Optimizer Introduces Data Awareness What would we need to know? SQL Design Physical Layout Table layout, partitions, indexes, constraints Data Layout = Statistics How many rows in the table? How big is the table? For each column how many values? How good is the system at table scanning? If we are joining how many rows will one row on one table get from the join?

8 The Cost Based Optimizer Different Types of Statistics System Statistics Performance information for the system IO and CPU capability Table Statistics Reflect a point in time view of the data Static Dynamic Statistics Run time sampling of the data

9 The Optimizer Engine
The cost based optimizer determines execution plan based on inputs
  Inputs
    SQL
    Schema Design
    Physical Design
    Statistics: Table, Column, Dynamic Statistics, System Statistics
  Execution Plan
    Access Method
    Join Method
    Join Order
    Distribution Method

10 Agenda The Optimizer Optimizer Inputs Optimizer Output Advanced Optimizer Behavior Why is my SQL slow? Optimizer Edge Cases and Top SQL Mistakes

11 Optimizer Inputs Schema Design Third Normal Dimensional Other

12 Optimizer Inputs SQL Design The design of the SQL statement and the operators used have a considerable impact on performance intersect minus exists not exists window functions multi-table inserts outer joins distinct vs group by

13 Optimizer Inputs Physical Design The physical design of the system determines which optimizations are available during execution Indexes index access vs full table scans Partitioning partition pruning and partition wise joins

14 Statistics Basics The statistics contain information which the optimizer uses when developing the execution plan They do not change unless they are recollected Statistics are kept at different levels Table Partition Column Index

15 Statistics Main Table and Partition Statistics
  Statistic        Description
  Rows             Number of rows in the object
  Blocks           Number of blocks in the object
  Empty blocks     Number of empty blocks in the object
  Avg Row Length   Average Row Length in the object

16 Statistics Main Column Statistics
  Statistic    Description
  NDV          Number of distinct values
  Low Value    Low value in column
  High Value   High value in column
  Density      Shows distribution of values
  Num Nulls    Number of nulls in the column

17 Gathering Statistics Problems
  I cannot collect statistics, it takes too long
  If I collect statistics things might change
  What sample size should I use?
  I don't need to bother because Oracle automatically collects statistics every night

18 Gathering Statistics Guidelines Use the DBMS_STATS procedure; do not use ANALYZE DBMS_STATS changes between releases and patches Use AUTO_SAMPLE_SIZE This uses a faster and more accurate NDV algorithm Allows the use of Incremental Statistics Allows the use of Concurrent Statistics Allows the use of new histogram types

19 Statistics Gathering Performance Auto Sample Size Fast and Accurate Incremental Stats Concurrent Stats Gather statistics on bulk loads New in 12c gathers stats on a bulk load into an empty table On by default and can be controlled with the GATHER_OPTIMIZER_STATISTICS and NO_GATHER_OPTIMIZER_STATISTICS hints More for convenience than performance

20 Gathering Statistics Incremental Stats Partitioned tables typically need both partition level and global statistics For queries that prune to a single partition, the partition level stats are used For queries that prune to more than one partition, the global level stats are used With incremental stats The statistics for a newly loaded partition are gathered and recorded in a synopsis Global level stats are computed from the synopsis 12c supports incremental stats on non-partitioned tables Creates synopsis on table to facilitate partition exchange

23 Gathering Statistics Incremental Stats
  Object                       Column Values   NDV
  Partition #1                 1,3,3,4,5       4
  Partition #2                 2,3,4,5,6       5
  NDV by Addition (WRONG)                      9
  NDV by Synopsis (CORRECT)                    6
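
The arithmetic on this slide can be reproduced with a short Python sketch. It is only an illustration: plain Python sets stand in for the synopsis here, whereas a real synopsis is a compact summary rather than the full list of values, and all variable names are made up for the example.

```python
# Column values per partition, taken from the slide.
partitions = {
    "Partition #1": [1, 3, 3, 4, 5],
    "Partition #2": [2, 3, 4, 5, 6],
}

# Per-partition NDV: the number of distinct values inside each partition.
partition_ndv = {name: len(set(values)) for name, values in partitions.items()}

# Wrong: adding partition NDVs double-counts values present in both partitions.
ndv_by_addition = sum(partition_ndv.values())

# Correct: merge the per-partition summaries (plain sets in this sketch)
# and count each distinct value once.
synopses = [set(values) for values in partitions.values()]
ndv_by_synopsis = len(set().union(*synopses))

print(partition_ndv)    # {'Partition #1': 4, 'Partition #2': 5}
print(ndv_by_addition)  # 9
print(ndv_by_synopsis)  # 6
```

The values 3, 4 and 5 occur in both partitions, which is why simple addition overstates the global NDV by three.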

24 Gathering Statistics Incremental Stats Comparison of Incremental vs non-incremental stats Simulates a daily load over 60 days 1,000,000 Rows Per Day 60 Partitions

25 Gathering Statistics Incremental Stats
  BEGIN
    DBMS_STATS.SET_TABLE_PREFS(USER, 'TABLE NAME', 'INCREMENTAL', 'TRUE');
  END;
  /

26 Gathering Statistics Concurrent Stats By default, stats are gathered sequentially for each partition of a partitioned table Concurrent stats enables partition statistics to be gathered concurrently Enabled through job scheduler Stats jobs only run on one partitioned table at a time Full details:

27 Gathering Statistics Concurrent Stats
  Gather Schema Stats
    Job1: Table 1 Global Stats
    Job2: Table 2 Global Stats
    Job3: Table 3 Coord Job
      Job3.1: Table 3 Partition 1 Stats
      Job3.2: Table 3 Partition 2 Stats
      Job3.3: Table 3 Partition 3 Stats
    Job4: Table 4 Global Stats

28 Gathering Statistics Concurrent Stats
  BEGIN
    DBMS_STATS.SET_GLOBAL_PREFS('CONCURRENT', 'TRUE');
  END;
  /
  Number of jobs limited by job_queue_processes
  Coordinate job_queue_processes and parallel setting on stats job

29 Dynamic Statistics Previously called dynamic sampling. Statistics gathered at parse time to augment the existing statistics. Always invoked for parallel execution. Used when regular statistics are not sufficient to get good quality cardinality estimates. Limited to table-level statistics prior to 12c; enhanced in 12c to include joins and group-by predicates. Controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter. Dynamic statistics can greatly enhance the quality of statistics and lead to better execution plans. But there is a cost: better statistics are usually derived with larger sample sizes, leading to increased parse times.

30 System Statistics
System Statistics provide the Optimizer information about the performance capabilities of the server: CPU and IO
  1. Gather system statistics
     System statistics describe the system's hardware characteristics, such as I/O and CPU performance
     Recommended in the Upgrade Guide
  2. Set Exadata statistics
     Exadata specific MOS Note
  Do not gather or set system statistics
     RWP Recommendation
     System statistics add another variable for the optimizer to consider, which can lead to suboptimal plans

31 Agenda The Optimizer Optimizer Inputs Optimizer Output Advanced Optimizer Behavior Why is my SQL slow? Optimizer Edge Cases and Top SQL Mistakes

32 The Optimizer Engine Inputs SQL Schema Design Statistics Table Column Dynamic Statistics System Statistics Execution Plan Access Method Join Method Join Order Distribution Method

33 The Execution Plan A product of the Inputs The optimizer takes the inputs and creates an Execution Plan, which contains the detailed steps the system will use to execute the SQL statement The quality of the execution plan is dependent on the quality of the inputs Poor Quality Inputs = Poor Execution Plan Good Quality Inputs = Good Execution Plan

34 The Execution Plan Determining the execution Plan Each line in an execution plan is a row source which produces rows and sends them to the next row source in the plan The optimizer estimates how many rows will be produced by a row source and associates a cost with the operation The optimizer evaluates multiple execution plans and determines their cost The optimizer picks the plan with the lowest cost

35 Execution Plan Basic Components
  Execution Plan   Shows the detailed steps used to execute a SQL statement
  Row Source       Execution plan operators that produce and/or consume rows
  Access method    The way in which the data is accessed
  Join method      The method used to join tables
  Join order       The order in which the tables are joined
  Cardinality      Number of rows produced by a row source

36 How to see the Execution Plan
  DBMS_XPLAN.DISPLAY_* procedures
  SQL Monitor Report
  SQL Trace
  SQL*Plus Auto Trace
  EXPLAIN PLAN FOR: shows the expected execution plan, which may be different from the actual execution plan

37 Execution Plan DBMS_XPLAN.DISPLAY_* Gives the actual execution plan from various sources Cursor cache dbms_xplan.display_cursor AWR dbms_xplan.display_awr SQL Tuning Set dbms_xplan.display_sqlset

38 Execution Plan DBMS_XPLAN.DISPLAY_CURSOR Output
  Plan hash value:
  Id Operation Name Rows Bytes Cost (%CPU) Time
  SELECT STATEMENT 187K(100)
  1 SORT AGGREGATE 1 14
  * 2 TABLE ACCESS STORAGE FULL LINEORDER K 187K (1) 00:00:
  Predicate Information (identified by operation id):
  storage(("lo_tax"=0 AND "LO_ORDERDATE">=TO_DATE(' :00:00', 'syyyy-mm-dd hh24:mi:ss') AND "LO_ORDERDATE"<=TO_DATE(' :00:00', 'syyyy-mm-dd hh24:mi:ss')))
  filter(("lo_tax"=0 AND "LO_ORDERDATE">=TO_DATE(' :00:00', 'syyyy-mm-dd hh24:mi:ss') AND "LO_ORDERDATE"<=TO_DATE(' :00:00', 'syyyy-mm-dd hh24:mi:ss')))

39 Execution Plan SQL Monitor Report 7/16/2018

40 An Example You As The Optimizer

41 You As The Optimizer If a query retrieves 10 rows from a 50 million row table, what is the correct access method? INDEX If a query retrieves 40 million rows from a 50 million row table, what is the correct access method? TABLE SCAN Did you make this decision based on rules, or an estimate of cost?

42 You As The Optimizer Determining the Access Method In the previous example we were given the number of rows to be retrieved. But usually the optimizer is not given this information, so let's look at a query and discuss how the optimizer decides which access method to use. The query will ask: What is the total quantity of orders in the week of July 4th, 1997 sold with a zero tax rate?

43 You As The Optimizer What information do you need to determine the access method?
  SELECT SUM(lo_quantity)
  FROM lineorder
  WHERE lo_orderdate BETWEEN to_date(' ','yyyymmdd') AND to_date(' ','yyyymmdd')
  AND lo_tax = 0;

44 You As The Optimizer What information do you need to determine the access method?
  SELECT SUM(lo_quantity)
  FROM lineorder
  WHERE lo_orderdate BETWEEN to_date(' ','yyyymmdd') AND to_date(' ','yyyymmdd')
  AND lo_tax = 0;

  Number of rows in the table
  Existence of indexes
  If the table is partitioned, how many partitions will need to be accessed
  Estimate of the number of rows to be retrieved from the table
    NDV: Number of distinct values in a column
    How selective are the filter columns?
    If there are lots of distinct values the filter predicates will probably have fewer rows
    If there are few distinct values the filter predicates will probably have more rows

45 You As The Optimizer Determining the Access Method
  SELECT SUM(lo_quantity)
  FROM lineorder
  WHERE lo_orderdate BETWEEN to_date(' ','yyyymmdd') AND to_date(' ','yyyymmdd')
  AND lo_tax = 0;

  lineorder rows           53,986,608
  lineorder partitioned    no
  lo_orderdate indexed     yes, btree
  lo_tax indexed           no
  lo_orderdate NDV         2406
  lo_tax NDV               9
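
As a rough illustration of how these statistics turn into a row estimate, the Python sketch below applies the textbook uniform-distribution model (rows divided by NDV for an equality predicate). It is a simplification and not the optimizer's actual algorithm. The seven-day range is an assumption based on the "week of July 4th" wording, since the date literals are not legible in the query, and the function and variable names are illustrative.

```python
# Statistics from the slide.
TABLE_ROWS = 53_986_608
NDV_LO_ORDERDATE = 2_406
NDV_LO_TAX = 9

# Assumption: the BETWEEN range covers 7 distinct dates (one week).
DAYS_IN_RANGE = 7


def equality_selectivity(ndv: int) -> float:
    """Fraction of rows matching one value, assuming values are evenly spread."""
    return 1.0 / ndv


# lo_tax = 0: one value out of 9.
rows_tax = TABLE_ROWS * equality_selectivity(NDV_LO_TAX)

# Date range treated as 7 equality matches on lo_orderdate.
rows_date = TABLE_ROWS * DAYS_IN_RANGE * equality_selectivity(NDV_LO_ORDERDATE)

# Both predicates together, assuming the two columns are independent.
rows_both = rows_date * equality_selectivity(NDV_LO_TAX)

print(f"lo_tax = 0      : {rows_tax:,.0f}")   # 5,998,512
print(f"7-day date range: {rows_date:,.0f}")
print(f"both predicates : {rows_both:,.0f}")
```

The first figure is consistent with the 5998K row estimate that appears on the LINEORDER full scan in the join plans later in the deck. The date-range filter is far more selective than the tax filter, which is why the index on lo_orderdate is worth considering.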

46 You As The Optimizer Determining the Access Method Index Scan
  Id Operation Name Rows Bytes Cost (%CPU) Time
  SELECT STATEMENT (1) 00:00:02
  1 SORT AGGREGATE 1 14
  * 2 TABLE ACCESS BY INDEX ROWID BATCHED LINEORDER K (1) 00:00:02
  * 3 INDEX RANGE SCAN LO_DATE_N 179K 480 (1) 00:00:

47 You As The Optimizer Determining the Access Method Table Scan
  Id Operation Name Rows Bytes Cost (%CPU) Time
  SELECT STATEMENT K (1) 00:00:08
  1 SORT AGGREGATE 1 14
  * 2 TABLE ACCESS FULL LINEORDER K 187K (1) 00:00:

48 You As The Optimizer Determining the Access Method
  Access Method   Cost
  Index Scan      47,908
  Table Scan      187,519

49 You As The Optimizer Determining the Join Order and Method Now let's look at a query that has more than one table. We will ask the same question, but instead of putting date predicates directly on the lineorder table, we join the lineorder table to date_dim and specify our date range using date_dim. What is the total quantity of orders in the week of July 4th, 1997 sold with a zero tax rate?

50 You As The Optimizer Determining the Join Order and Method
  SELECT SUM(lo_quantity)
  FROM lineorder
  JOIN date_dim ON lo_orderdate = d_datekey
  WHERE d_year = 1997
  AND d_weeknuminyear = 27
  AND lo_tax = 0;

51 You As The Optimizer Determining the Join Order and Method Consider the join order. If we retrieve X number of rows from lineorder, how many rows will we then retrieve from date_dim? And vice versa, if we retrieve X number of rows from date_dim, how many rows will we then retrieve from lineorder? How many rows from each table overlap or join?

52 You As The Optimizer Determining the Join Order and Method
  Table       Statistic           Value
  lineorder   rows                53,986,608
  date_dim    rows                2,556
  lineorder   lo_orderdate low    01-JAN-92
  lineorder   lo_orderdate high   02-AUG-98
  date_dim    d_datekey low       31-DEC-91
  date_dim    d_datekey high      29-DEC-98

53 You As The Optimizer Determining the Join Order and Method
  Table       Statistic             Value
  lineorder   lo_orderdate NDV      2,406
  lineorder   lo_tax NDV            9
  date_dim    d_datekey NDV         2,556
  date_dim    d_year NDV            8
  date_dim    d_weeknuminyear NDV   53

54 You As The Optimizer Determining the Join Order and Method For each order, we then consider which join method to use Nested Loop Hash Merge

55 You As The Optimizer Determining the Join Order and Method Nested Loops Join Nested loops joins work best when retrieving a small number of rows for the join condition A nested loop join involves the following steps: 1. The optimizer determines the driving table and designates it as the outer table. 2. The other table is designated as the inner table. 3. For every row in the outer table, retrieve matching rows in the inner table. If 10 rows are retrieved from the outer table we need to perform 10 lookups in the inner table If 10M rows are retrieved from the outer table we need to perform 10M lookups in the inner table These lookups can be performed via index or table scans
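
The three steps above can be sketched in Python as follows. This is a toy model rather than database code: the dictionary stands in for an index on the inner table's join column, and the function name and sample rows are invented for the illustration.

```python
def nested_loops_join(outer_rows, inner_rows, outer_key, inner_key):
    """Join two lists of dicts; return (joined_rows, lookups_performed)."""
    # Stand-in for an index on the inner table's join column.
    inner_index = {}
    for row in inner_rows:
        inner_index.setdefault(row[inner_key], []).append(row)

    joined = []
    lookups = 0
    for outer in outer_rows:
        # One lookup in the inner table for every row of the outer table.
        lookups += 1
        for inner in inner_index.get(outer[outer_key], []):
            joined.append({**outer, **inner})
    return joined, lookups


# Tiny made-up tables; column names follow the deck's example.
date_dim = [{"d_datekey": d} for d in (1, 2, 3)]
lineorder = [
    {"lo_orderdate": 1, "lo_quantity": 10},
    {"lo_orderdate": 1, "lo_quantity": 5},
    {"lo_orderdate": 2, "lo_quantity": 7},
    {"lo_orderdate": 9, "lo_quantity": 1},
]

rows_a, lookups_a = nested_loops_join(date_dim, lineorder, "d_datekey", "lo_orderdate")
rows_b, lookups_b = nested_loops_join(lineorder, date_dim, "lo_orderdate", "d_datekey")
print(len(rows_a), lookups_a)  # 3 3
print(len(rows_b), lookups_b)  # 3 4
```

Both join orders return the same rows, but the number of lookups equals the number of rows in the outer table, which is why the smaller filtered row set is the better driving table.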

56 You As The Optimizer Determining the Join Order and Method Nested Loops Join
  Id Operation Name Rows Bytes Cost (%CPU) Time
  SELECT STATEMENT (1) 00:00:02
  1 SORT AGGREGATE
  NESTED LOOPS K (1) 00:00:02
  3 NESTED LOOPS 157K 486K (1) 00:00:02
  4 TABLE ACCESS BY INDEX ROWID BATCHED DATE_DIM (0) 00:00:01
  * 5 INDEX RANGE SCAN DATE_DIM_N1 7 1 (0) 00:00:01
  * 6 INDEX RANGE SCAN LO_DATE_N (0) 00:00:01
  * 7 TABLE ACCESS BY INDEX ROWID LINEORDER (1) 00:00:
  Outer Table: DATE_DIM
  Inner Table: LINEORDER

57 You As The Optimizer Determining the Join Order and Method Nested Loops Join
  Id Operation Name Rows Bytes Cost (%CPU) Time
  SELECT STATEMENT K (1) 00:04:02
  1 SORT AGGREGATE
  NESTED LOOPS K 6187K (1) 00:04:02
  3 NESTED LOOPS 5998K 486K 6187K (1) 00:04:02
  * 4 TABLE ACCESS FULL LINEORDER 5998K 80M 187K (1) 00:00:08
  * 5 INDEX UNIQUE SCAN DATE_DIM_PK 1 0 (0) 00:00:01
  * 6 TABLE ACCESS BY INDEX ROWID DATE_DIM (0) 00:00:
  Outer Table: LINEORDER
  Inner Table: DATE_DIM

58 You As The Optimizer Determining the Join Order and Method HASH Join Hash joins are used for joining large data sets. 1. The optimizer uses the smaller of the two tables or data sources to build a hash table on the join key in memory, using a deterministic hash function to specify the location in which to store each row in the hash table. This is called the build side. 2. It then scans the larger table, called the probe table, probing the hash table to find the joined rows. This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.
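
The build and probe phases described above can be sketched in Python. This is a minimal in-memory model under stated assumptions: a Python dict plays the role of the hash table, "smaller" is judged by row count only, and the function name and sample rows are invented for the illustration.

```python
def hash_join(left_rows, left_key, right_rows, right_key):
    """Equi-join two lists of dicts using a build phase and a probe phase."""
    # Use the smaller input as the build side, the larger one as the probe side.
    if len(left_rows) <= len(right_rows):
        build, build_key, probe, probe_key = left_rows, left_key, right_rows, right_key
    else:
        build, build_key, probe, probe_key = right_rows, right_key, left_rows, left_key

    # Build phase: hash every build-side row on its join key.
    hash_table = {}
    for row in build:
        hash_table.setdefault(row[build_key], []).append(row)

    # Probe phase: a single pass over the probe side, looking up each key.
    joined = []
    for row in probe:
        for match in hash_table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined


# Tiny made-up tables; column names follow the deck's example.
date_dim = [{"d_datekey": d} for d in (1, 2, 3)]
lineorder = [
    {"lo_orderdate": 1, "lo_quantity": 10},
    {"lo_orderdate": 1, "lo_quantity": 5},
    {"lo_orderdate": 2, "lo_quantity": 7},
    {"lo_orderdate": 9, "lo_quantity": 1},
]

joined = hash_join(date_dim, "d_datekey", lineorder, "lo_orderdate")
total_quantity = sum(row["lo_quantity"] for row in joined)
print(len(joined), total_quantity)  # 3 22
```

Each input is read exactly once, which matches the single read pass described on the slide for the case where the build side fits in memory.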

59 You As The Optimizer Determining the Join Order and Method HASH Join
  Id Operation Name Rows Bytes Cost (%CPU) Time
  SELECT STATEMENT K (1) 00:00:08
  1 SORT AGGREGATE 1 29
  * 2 HASH JOIN K 187K (1) 00:00:08
  3 TABLE ACCESS BY INDEX ROWID BATCHED DATE_DIM (0) 00:00:01
  * 4 INDEX RANGE SCAN DATE_DIM_N1 7 1 (0) 00:00:01
  * 5 TABLE ACCESS FULL LINEORDER 5998K 80M 187K (1) 00:00:
  Build Table: DATE_DIM
  Probe Table: LINEORDER

60 You As The Optimizer Determining the Join Order and Method HASH Join
  Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
  SELECT STATEMENT K (1) 00:00:08
  1 SORT AGGREGATE 1 29
  * 2 HASH JOIN K 148M 194K (1) 00:00:08
  * 3 TABLE ACCESS FULL LINEORDER 5998K 80M 187K (1) 00:00:08
  4 TABLE ACCESS BY INDEX ROWID BATCHED DATE_DIM (0) 00:00:01
  * 5 INDEX RANGE SCAN DATE_DIM_N1 7 1 (0) 00:00:
  Build Table: LINEORDER
  Probe Table: DATE_DIM

61 You As The Optimizer Determining the Join Order and Method Merge Join Merge joins are used for joining large data sets. Requires the data to be sorted, so it is not as efficient as a hash join. Used with inequalities, where a hash join cannot be used. How it works: 1. A sort merge join reads two data sets and sorts them when they are not already sorted. 2. For each row in the first data set, the database finds a starting row in the second data set, and then reads the second data set until it finds a nonmatching row. 3. For large data sets that do not fit into memory, the sorts will spill to temp, which has a noticeable performance impact because of the physical IO.
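
Steps 1 and 2 can be sketched in Python for the equality case. This is an in-memory illustration only: there is no spill to temp, the inequality variants are not covered, and the function name and sample rows are invented.

```python
def sort_merge_join(left_rows, right_rows, left_key, right_key):
    """Equi-join two lists of dicts by sorting both inputs and merging them."""
    # Step 1: sort both inputs on the join key.
    left_sorted = sorted(left_rows, key=lambda r: r[left_key])
    right_sorted = sorted(right_rows, key=lambda r: r[right_key])

    joined = []
    start = 0  # first right-side row that can still match
    for left in left_sorted:
        # Step 2a: find the starting row in the second data set.
        while start < len(right_sorted) and right_sorted[start][right_key] < left[left_key]:
            start += 1
        # Step 2b: read the second data set until a non-matching row appears.
        pos = start
        while pos < len(right_sorted) and right_sorted[pos][right_key] == left[left_key]:
            joined.append({**left, **right_sorted[pos]})
            pos += 1
    return joined


# Tiny made-up, unsorted tables; column names follow the deck's example.
date_dim = [{"d_datekey": d} for d in (3, 1, 2)]
lineorder = [
    {"lo_orderdate": 2, "lo_quantity": 7},
    {"lo_orderdate": 1, "lo_quantity": 10},
    {"lo_orderdate": 9, "lo_quantity": 1},
    {"lo_orderdate": 1, "lo_quantity": 5},
]

merged = sort_merge_join(date_dim, lineorder, "d_datekey", "lo_orderdate")
print(len(merged))  # 3
```

The two sorts dominate the work here, which is the reason the slide ranks this method below a hash join when both are applicable.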

62 You As The Optimizer Determining the Join Order and Method Merge Join
  Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
  SELECT STATEMENT K (1) 00:00:09
  1 SORT AGGREGATE
  MERGE JOIN K 217K (1) 00:00:09
  3 SORT JOIN (34) 00:00:01
  4 TABLE ACCESS BY INDEX ROWID BATCHED DATE_DIM (0) 00:00:01
  * 5 INDEX RANGE SCAN DATE_DIM_N1 7 1 (0) 00:00:01
  * 6 SORT JOIN 5998K 80M 275M 217K (1) 00:00:09
  * 7 TABLE ACCESS FULL LINEORDER 5998K 80M 187K (1) 00:00:

63 You As The Optimizer Determining the Join Order and Method Merge Join
  Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
  SELECT STATEMENT K (1) 00:00:09
  1 SORT AGGREGATE
  MERGE JOIN K 217K (1) 00:00:09
  3 SORT JOIN 5998K 80M 275M 217K (1) 00:00:09
  * 4 TABLE ACCESS FULL LINEORDER 5998K 80M 187K (1) 00:00:08
  * 5 SORT JOIN (34) 00:00:01
  6 TABLE ACCESS BY INDEX ROWID BATCHED DATE_DIM (0) 00:00:01
  * 7 INDEX RANGE SCAN DATE_DIM_N1 7 1 (0) 00:00:

64 You As The Optimizer Determining the Join Order and Method The optimizer will associate a cost with each of these based on its estimate of the amount of work. The goal is to reduce the number of rows early so it will perform less work throughout the execution of the SQL statement. We basically have a matrix of join orders and methods and the cost associated with each.

65 You As The Optimizer Determining the Join Order and Method
  Join Method    Join Order: date_dim, lineorder   Join Order: lineorder, date_dim
  Nested Loops   39,480                            6,187,540
  Hash           187,                              ,909
  Merge          217,                              ,129

66 Serial vs Parallel Execution

67 Serial and Parallel Execution
  Serial Execution
    SQL is executed by one process
    The correct solution when:
      the query references a small data set
      high concurrency
      efficiency is important
  Parallel Execution
    SQL is executed by many processes working together
    The correct solution when:
      the query references a large data set
      low concurrency
      elapsed time is important

68 Parallel Execution Basics
  Query Coordinator (QC)           The top level process for the parallel query
  Parallel Execution Server (PX)   An (OS) process that operates on part of a parallel query
  Degree of Parallelism (DoP)      The number of parallel execution servers used in each parallel server group during parallel execution
  Parallel server group            The group of parallel server processes that operate on a row source
  Distribution method              The method by which data is sent from one set of PX servers to another

69 Distribution Methods

70 Parallel Query Let's look at 2 parallel execution plans for the same query: select count(*) from yellow y, green g where y.deptno = g.deptno

71 Parallel Query Hash Distribution
  [Diagram: PQ1 and PQ2 perform the Green Table Scan and the Yellow Table Scan and apply f(x) for Hash Distribution; PQ3 and PQ4 Join and Aggregate; the QC returns the Result Set]

72 Parallel Query Hash Distribution

73 Hash Distribution Producers Consumers

74 Hash Distribution Producers Consumers Sent 16 rows Received 16 rows

75 Parallel Query Broadcast Distribution
  [Diagram: PQ1 and PQ2 perform the Green Table Scan and Broadcast Distribution; PQ3 and PQ4 perform the Yellow Table Scan, Join and Aggregate; the QC returns the Result Set]

76 Parallel Query Broadcast Distribution

77 Broadcast Distribution Producers Consumers

78 Broadcast Distribution Producers Consumers Sent 16 rows Received 64 rows
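
The row counts on the hash and broadcast distribution slides can be reproduced with a small Python simulation. It assumes four consumer processes, a figure inferred from the 16 sent and 64 received rows rather than stated on the slides, and it uses a simple modulo in place of the real hash function f(x).

```python
N_CONSUMERS = 4  # assumption: 64 received / 16 sent implies 4 consumers


def hash_distribute(keys, n_consumers):
    """Send each row to exactly one consumer, chosen from its join key."""
    received = [[] for _ in range(n_consumers)]
    for key in keys:
        received[key % n_consumers].append(key)  # modulo stands in for f(x)
    return received


def broadcast_distribute(keys, n_consumers):
    """Send a full copy of every row to every consumer."""
    return [list(keys) for _ in range(n_consumers)]


rows_sent = list(range(16))  # 16 producer rows, as on the slides

hash_received = sum(len(c) for c in hash_distribute(rows_sent, N_CONSUMERS))
broadcast_received = sum(len(c) for c in broadcast_distribute(rows_sent, N_CONSUMERS))

print(len(rows_sent), hash_received)       # 16 16
print(len(rows_sent), broadcast_received)  # 16 64
```

Broadcast multiplies the rows received by the number of consumers, so it only pays off when the broadcast row source is small.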

79 Small Table Replication Table is loaded into the buffer cache
  If the table is not already in the buffer cache, one or more of the px processes first load it into the buffer cache
  [Diagram: px processes 1 to 4 and the Buffer Cache]

80 Small Table Replication Table is read from the buffer cache
  Each parallel process reads the table from the buffer cache
  [Diagram: px processes 1 to 4 and the Buffer Cache]

81 Parallel Execution What may go wrong
  Skew
    A PX process will do much more work than the others
    SELECT DISTINCT when there are very few distinct values
    A hash join with 1 or 2 values having most of the rows
  Distribution method
    Using broadcast when it should use hash
      Usually happens when row estimate is much lower than actual
    Using hash when it should broadcast
      Too few values, row distribution is skewed
  Usually, better/extended statistics help solve the problem
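
The skew problem can be illustrated with a short Python sketch that counts how many rows each consumer receives under hash distribution. The data is invented, and a simple modulo stands in for the real hash function.

```python
from collections import Counter


def rows_per_consumer(join_keys, n_consumers):
    """Count the rows each consumer receives when rows are hashed on the join key."""
    counts = Counter(key % n_consumers for key in join_keys)
    return [counts.get(i, 0) for i in range(n_consumers)]


# Evenly spread join keys: every consumer gets the same share.
even_keys = list(range(1000))

# Skewed join keys: one value (7) accounts for 900 of the 1,000 rows.
skewed_keys = [7] * 900 + list(range(100))

print(rows_per_consumer(even_keys, 4))    # [250, 250, 250, 250]
print(rows_per_consumer(skewed_keys, 4))  # [25, 25, 25, 925]
```

With the skewed keys, one process handles 925 of 1,000 rows while the others sit mostly idle, so the elapsed time approaches that of serial execution.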

82 You As The Optimizer Determining the Distribution Method
  SELECT SUM(lo_quantity)
  FROM lineorder
  JOIN date_dim ON lo_orderdate = d_datekey
  WHERE d_year = 1997
  AND d_weeknuminyear = 27
  AND lo_tax = 0;

  Size of the tables and rows retrieved
  Selectivity of join predicates
    How many rows do we expect to join from each table
    How many rows do we expect to retrieve, and thus distribute, from each table
  Skew in the data
    Hash distribution does not work well with data skew

83 You As The Optimizer Determining the Distribution Method Hash Distribution
  Id Operation Name Rows Bytes Cost (%CPU) Time TQ IN-OUT PQ Distrib
  SELECT STATEMENT K (1) 00:00:05
  1 SORT AGGREGATE
  PX COORDINATOR
  3 PX SEND QC (RANDOM) :TQ Q1,02 P->S QC (RAND)
  4 SORT AGGREGATE 1 29 Q1,02 PCWP
  * 5 HASH JOIN K 104K (1) 00:00:05 Q1,02 PCWP
  6 JOIN FILTER CREATE :BF (0) 00:00:01 Q1,02 PCWP
  7 PX RECEIVE (0) 00:00:01 Q1,02 PCWP
  8 PX SEND HASH :TQ (0) 00:00:01 Q1,00 P->P HASH
  9 PX BLOCK ITERATOR (0) 00:00:01 Q1,00 PCWC
  * 10 TABLE ACCESS FULL DATE_DIM (0) 00:00:01 Q1,00 PCWP
  11 PX RECEIVE 5998K 80M 104K (1) 00:00:05 Q1,02 PCWP
  12 PX SEND HASH :TQ K 80M 104K (1) 00:00:05 Q1,01 P->P HASH
  13 JOIN FILTER USE :BF K 80M 104K (1) 00:00:05 Q1,01 PCWP
  14 PX BLOCK ITERATOR 5998K 80M 104K (1) 00:00:05 Q1,01 PCWC
  * 15 TABLE ACCESS FULL LINEORDER 5998K 80M 104K (1) 00:00:05 Q1,01 PCWP

84 You As The Optimizer Determining the Distribution Method Broadcast Distribution
  Id Operation Name Rows Bytes Cost (%CPU) Time TQ IN-OUT PQ Distrib
  SELECT STATEMENT K (1) 00:00:05
  1 SORT AGGREGATE
  PX COORDINATOR
  3 PX SEND QC (RANDOM) :TQ Q1,01 P->S QC (RAND)
  4 SORT AGGREGATE 1 29 Q1,01 PCWP
  * 5 HASH JOIN K 104K (1) 00:00:05 Q1,01 PCWP
  6 JOIN FILTER CREATE :BF (0) 00:00:01 Q1,01 PCWP
  7 PX RECEIVE (0) 00:00:01 Q1,01 PCWP
  8 PX SEND BROADCAST :TQ (0) 00:00:01 Q1,00 P->P BROADCAST
  9 PX BLOCK ITERATOR (0) 00:00:01 Q1,00 PCWC
  * 10 TABLE ACCESS FULL DATE_DIM (0) 00:00:01 Q1,00 PCWP
  11 JOIN FILTER USE :BF K 80M 104K (1) 00:00:05 Q1,01 PCWP
  12 PX BLOCK ITERATOR 5998K 80M 104K (1) 00:00:05 Q1,01 PCWC
  * 13 TABLE ACCESS FULL LINEORDER 5998K 80M 104K (1) 00:00:05 Q1,01 PCWP

85 You As The Optimizer Determining the Distribution Method Small Table Replication
  Id Operation Name Rows Bytes Cost (%CPU) Time TQ IN-OUT PQ Distrib
  SELECT STATEMENT K (1) 00:00:05
  1 SORT AGGREGATE
  PX COORDINATOR
  3 PX SEND QC (RANDOM) :TQ Q1,00 P->S QC (RAND)
  4 SORT AGGREGATE 1 29 Q1,00 PCWP
  * 5 HASH JOIN K 104K (1) 00:00:05 Q1,00 PCWP
  6 JOIN FILTER CREATE :BF (0) 00:00:01 Q1,00 PCWP
  * 7 TABLE ACCESS FULL DATE_DIM (0) 00:00:01 Q1,00 PCWP
  8 JOIN FILTER USE :BF K 80M 104K (1) 00:00:05 Q1,00 PCWP
  9 PX BLOCK ITERATOR 5998K 80M 104K (1) 00:00:05 Q1,00 PCWC
  * 10 TABLE ACCESS FULL LINEORDER 5998K 80M 104K (1) 00:00:05 Q1,00 PCWP

86 You As The Optimizer Determining the Execution Plan The optimizer makes decisions for every step in the plan based on available information For more complex queries, the matrix of decisions becomes larger and the difficulty of producing accurate cardinality estimates increases There are lots of tools which can be used to increase performance but they are only effective if the optimizer chooses the correct plan Partitioning Compression Exadata Database In-Memory

87 The Optimizer Poor plans The vast majority of bad execution plans are the result of poor cardinality estimates Verification of cardinality estimates should always be the starting point for diagnosis Check for order of magnitude differences Consider whether gathering new statistics fits the scope of the problem Is there one problem query? Are there many problem queries? Not every poor cardinality estimate results in a bad plan
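
The order-of-magnitude check can be sketched as a small Python helper that compares estimated and actual row counts for each plan line, such as those shown in a SQL Monitor report. The plan lines, the object names and the factor of 10 below are hypothetical values chosen for the illustration.

```python
def flag_misestimates(plan_lines, factor=10.0):
    """Return plan lines whose estimated and actual rows differ by >= factor."""
    flagged = []
    for operation, estimated, actual in plan_lines:
        # Treat zero as one row so the ratio is always defined.
        est = max(estimated, 1)
        act = max(actual, 1)
        ratio = max(est, act) / min(est, act)
        if ratio >= factor:
            flagged.append((operation, estimated, actual, ratio))
    return flagged


# Hypothetical (operation, estimated rows, actual rows) triples.
plan_lines = [
    ("TABLE ACCESS FULL T1", 100, 250_000),
    ("HASH JOIN", 5_000, 7_200),
    ("INDEX RANGE SCAN T2_IX", 40_000, 35),
]

for operation, estimated, actual, ratio in flag_misestimates(plan_lines):
    print(f"{operation}: estimated {estimated:,}, actual {actual:,}, off by {ratio:,.0f}x")
```

Only the first and last lines are reported. The hash join estimate is off by less than a factor of two, which rarely changes a plan.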

88 Agenda The Optimizer Optimizer Inputs Optimizer Output Advanced Optimizer Behavior Why is my SQL slow? Optimizer Edge Cases and Top SQL Mistakes

89 Advanced Optimizer Engine Basic statistics provide basic information about the data Assumes even distribution of data values Does not account for correlation between predicate filters There are facilities to restrict and/or guide the optimizer Versions 11g and 12c introduced features to allow the optimizer to learn and adapt

90 Advanced Optimizer Engine Inputs Statistics Histograms Extended Statistics Plan Management Hints Adaptive Statistics Execution Plan Adaptive Plans

91 The Optimizer The optimizer has become increasingly more dynamic with each release We can classify the information the optimizer uses into the following categories 1. Things to help get the right plan the first time 2. Things to help get the right plan during execution 3. Things to help get the right plan on subsequent executions 4. Things to help get the right plan if the others fail

92 The Optimizer Engine 1. Getting the plan right the first time Inputs SQL Schema Design Physical Design Statistics Table Column Dynamic Statistics System Statistics Execution Plan Access Method Join Method Join Order Distribution Method

93 The Optimizer Engine 1. Getting the plan right the first time work on getting the inputs correct Inputs Execution Plan SQL Schema Design Access Method Physical Design Join Method Statistics Join Order Table Column Dynamic Statistics Do not rely on these System Statistics Histograms Extended Statistics Distribution Method

94 Handling Data Skew Histograms
  Designed to give more detail on data distribution
  Histograms are built based on SQL Usage
    Hence the need to seed the optimizer
    sys.col_usage$ keeps track of columns used in predicates and joins
    These columns are then candidates for histograms
  New in 12c
    Up to 2,048 buckets but default values still use 254
    Uses a full table scan to create histograms with AUTO_SAMPLE_SIZE; more accurate for unpopular values
  Types
    Frequency: for columns with NDV <= 254
    Top-frequency: frequency histogram where NDV > 254 but majority of the rows <= 254 values
    Height Balanced: no longer used with AUTO_SAMPLE_SIZE
      Replaced by Hybrid
      Still created with estimate_percent
      Remain after upgrades
    Hybrid: for columns with NDV > 254 and lots of popular values
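
To see why skewed data needs a histogram, the Python sketch below compares the uniform estimate (rows divided by NDV) with a frequency histogram that stores a row count per value. The column values are invented, and a Counter is only a stand-in for the stored histogram.

```python
from collections import Counter

# Invented skewed column: one popular value and two unpopular ones.
column_values = ["A"] * 900 + ["B"] * 90 + ["C"] * 10

num_rows = len(column_values)
ndv = len(set(column_values))

# Without a histogram, every value is assumed to be equally common.
uniform_estimate = num_rows / ndv

# A frequency histogram keeps one bucket (a row count) per distinct value.
frequency_histogram = Counter(column_values)

for value in ("A", "B", "C"):
    print(f"{value}: uniform estimate {uniform_estimate:.0f}, "
          f"histogram estimate {frequency_histogram[value]}")
```

The uniform model estimates about 333 rows for every value, which is too low for the popular value and far too high for the unpopular ones.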

95 Handling Data Correlation and Functions
Two Types of Extended Statistics
  Column Groups
    Provide the optimizer with information about the relationship (correlation) between the data stored in different columns of the same table
      City and airport code
      Month and zodiac sign
    Allow the optimizer to compute a better cardinality estimate when several columns from the same table are used together in the WHERE clause of a SQL statement
  Expression Statistics
    Help estimate the cardinality of a WHERE clause predicate that has columns embedded inside expressions
      UPPER(EMP_LAST_NAME)=:B1
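
The effect of a column group can be illustrated in Python using the city and airport code example. The rows are invented, and the arithmetic uses the simple rows-divided-by-NDV model rather than the optimizer's actual formulas.

```python
# Invented table: each city has exactly one airport code, 250 rows per pair.
pairs = [("Boston", "BOS"), ("Denver", "DEN"), ("Seattle", "SEA"), ("Miami", "MIA")]
rows = [pair for pair in pairs for _ in range(250)]

num_rows = len(rows)
ndv_city = len({city for city, _ in rows})
ndv_code = len({code for _, code in rows})
ndv_group = len(set(rows))  # NDV of the (city, code) column group

# Predicate: city = 'Denver' AND code = 'DEN'
# Treating the columns as independent multiplies the two selectivities.
independent_estimate = num_rows * (1 / ndv_city) * (1 / ndv_code)

# With column group statistics the combination is treated as one value.
column_group_estimate = num_rows / ndv_group

actual = sum(1 for row in rows if row == ("Denver", "DEN"))

print(independent_estimate)   # 62.5
print(column_group_estimate)  # 250.0
print(actual)                 # 250
```

Because the city fully determines the airport code, the second predicate filters nothing further, and only the column group estimate reflects that.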

96 Extended Statistics
Use the dbms_stats.create_extended_stats function to create extended statistics
  SELECT dbms_stats.create_extended_stats(user, 'AIRPORTS', '(CITY,CODE)') FROM dual;
  SELECT dbms_stats.create_extended_stats(user, 'EMP', '(UPPER(EMP_LAST_NAME))') FROM dual;

97 Extended Statistics Restrictions Only used when the SQL statement predicates are equalities or in-lists. Not used if there are histograms present on the underlying columns and there is no histogram present on the column group.

98 Advanced Optimizer Engine 2. Get the right plan during execution Inputs Execution Plan Adaptive Plans

99 Advanced Optimizer Engine 3. Get the right plan on subsequent executions Inputs Execution Plan Adaptive Statistics

100 Adaptive Query Optimization Goals Introduced in 12.1 Increase the likelihood of good plans by making use of execution statistics During execution As feedback for re-optimization During future compilations As feedback for sampling As feedback to direct subsequent gathering of statistics LEARNING from your workload and persisting information in the data dictionary, increasing the chance of good plans first time

101 Adaptive Query Optimization What happens during the execution of a statement?
  Adaptive Plans (REACTING)
    Use the actual number of rows to choose appropriate join methods and parallel distribution methods during execution
  Automatic Re-Optimization (LEARNING)
    Use the actual number of rows to re-optimize on subsequent execution
  Adaptive Statistics (LEARNING)
    Use the actual number of rows to direct the gathering of statistics for future compilations of similar statements

102 The Real World
  The Adaptive Plans features in 12.1 have very few reported problems
    These features are enabled by default in 12.2
  The Adaptive Statistics features in 12.1 have many reported problems
    Inefficient execution plans while the system is learning
    Unpredictable execution plans as Oracle searches for better plans
    Long parse times
    These features are disabled by default in 12.2

103 Changes Between 12.1 and 12.2
  Oracle 12.1
    optimizer_adaptive_features (Default: TRUE) controls both Adaptive Plans and Adaptive Statistics
  Oracle 12.2
    optimizer_adaptive_plans (Default: TRUE) controls Adaptive Plans
    optimizer_adaptive_statistics (Default: FALSE) controls Adaptive Statistics

104 Recommended Strategy for 12.1 Install patch for BUG# Splits the parameter optimizer_adaptive_features into two parameters optimizer_adaptive_plans enabled by default optimizer_adaptive_statistics disabled by default Install patch for BUG# Disables the automatic creation of extended statistics by default Details in MOS Note

105 Advanced Optimizer Engine 4. Get the right plan if the other methods fail Inputs Execution Plan Plan Management Hints

106 Execution Plan Management Several features guide or restrict the Optimizer to use specific plans SQL Plan Baselines constrains the optimizer to only select from a set of accepted plans for a SQL statement Controlled through SQL Plan Management SQL Profiles auxiliary information specific to a SQL statement which guides the optimizer to a better plan These features are useful to protect against plan changes which could adversely affect performance These features are widely misused to compensate for poor inputs to the optimizer engine

107 Hints Hints can be used to influence the execution plan chosen by the optimizer Hints are a useful tool during development and testing to check the performance impact of a different execution plan Using hints in a production environment is poor practice It is difficult to apply hints appropriately for a large number of SQL statements They do not allow enhancements to the optimizer and execution engine to be used It is safer and easier to correct deficiencies in the optimizer inputs

108 Changes to the Optimizer Engine Database Versions and Patches The set of algorithms which make up the optimizer change between database releases Optimizer changes can also be introduced in patches and patch sets If the quality of inputs is poor, such as inaccurate statistics, this can lead to execution plan differences between versions

109 Changes to the Optimizer Engine Database Parameters Database parameters influence or restrict the behavior of the optimizer optimizer_* parameters Hidden underscore parameters DANGER! changing these parameters is often done to compensate for poor inputs and they should be left at the default values

110 Agenda The Optimizer Optimizer Inputs Optimizer Output Advanced Optimizer Behavior Why is my SQL slow? Optimizer Edge Cases and Top SQL Mistakes

111 Why is My SQL Slow?

112 Problem Query Table has 1B rows and is 55 GB.

113 Problem Query Query 1 consists of two subqueries. The first subquery finds all of the Ferraris.

114 Problem Query The second subquery finds all of the Ferrari 458s.

115 Problem Query Outer query performs aggregations. Outer query joins the results of the subqueries.

116 Problem Query Query 2 is the same but has different predicate values.

117 Default Statistics

118 Default Statistics Query 1 takes 40 seconds with default statistics

119 Default Statistics

120 Default Statistics Query 2 takes 3 seconds with default statistics

121 Default Statistics

122 Default Statistics Development Findings Baseline Performance for Query 1 exceeds target Baseline Performance for Query 2 meets target

123 Initial Optimization Steps More Predicate Values

124 More Predicate Values Increase the list of predicate values Now query 1 takes 2 seconds

125 More Predicate Values

126 More Predicate Values Development Findings Query runs faster just by changing the list of values in the select list. Plan changed from a broadcast to a hash distribution due to the higher but inaccurate cardinality estimate. Getting the correct plan with a wrong cardinality estimate can lead to inconsistent plans and performance.

127 Initial Optimization Steps Increase Degree of Parallelism

128 Degree of Parallelism Change DoP from 32 to 128 Now query takes 2 seconds

129 Degree of Parallelism

130 Degree of Parallelism Development Findings Changing DoP from 32 to 128 improves performance and meets the target; 4X more resources yields a 20X performance improvement Plan has changed from a broadcast distribution to a hash distribution due to DoP change DoP is a resource management technique, not a query tuning tool

131 Indexes

132 Indexes Development Findings Indexes on columns: owner_id; country; make; model; (country, make, model)

133 Indexes Add indexes and the query takes longer: 58 seconds!

134 Indexes Query performance varies depending on whether the index is cached or not. Index lookups on millions of rows are slow.

135 Indexes Development Findings Not understanding the big/little data challenge. Indexes are not efficient for operations on a large number of rows. Full table scan is faster, with predictable performance.

136 To Index or Not Indexing is an OLTP technique for operations on a small number of rows A table scan may consume more resources but it will be predictable no matter how many rows are returned Indexes impact DML operations If I/O bandwidth went from 70MB/sec to 70GB/sec would you change your optimization/execution strategy?

137 To Index or Not Index-driven query retrieving 1,000,000 rows. Assume the index is cached and the data is not: 1,000,000 random I/Os at 5 ms per I/O. This would require 5000 seconds (over 1 hour) to execute. How much data could you scan in 5000 seconds with a fully sized I/O system able to scan 25 GB/sec? Over 100 TB!
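The arithmetic on this slide can be checked directly. The sketch below uses only the slide's figures (1,000,000 random I/Os, 5 ms each, 25 GB/sec scan rate) and assumes 1 TB = 1000 GB for the conversion.

```sql
SELECT 1000000 * 0.005                 AS index_seconds,    -- 5000
       ROUND(1000000 * 0.005 / 3600, 2) AS index_hours,     -- 1.39
       25 * (1000000 * 0.005) / 1000   AS tb_scanned        -- 125
FROM   dual;
```

At 125 TB scanned in the same 5000 seconds, the slide's "over 100 TB" is a conservative figure.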

138 Histograms

139 Histograms Rerun stats to get histograms: no change in plan or run time

140 Histograms Lots of wait time on temp IO

141 Histograms Development Findings Re-gathered stats to automatically create histograms Frequency histograms on country, make and model columns No change in plan query still exceeds target
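Re-gathering statistics so that histograms are created automatically could look like the sketch below. The table name CARS is hypothetical, since the slides do not name the 1B-row table; the column names come from the slides.

```sql
-- SIZE AUTO lets the database decide which columns get histograms,
-- based on data skew and recorded column usage.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'CARS',                        -- hypothetical table name
    method_opt => 'FOR ALL COLUMNS SIZE AUTO');
END;
/

-- Check which columns now have a histogram and of what type.
SELECT column_name, num_distinct, histogram, num_buckets
FROM   user_tab_col_statistics
WHERE  table_name = 'CARS'
AND    column_name IN ('COUNTRY', 'MAKE', 'MODEL');
```

Single-column histograms describe each column on its own, which is why they do not help when the poor estimate comes from correlation between the columns.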

142 Flash Temp

143 Flash Temp Query time reduced to 23 seconds

144 Flash Temp Now IO accounts for a smaller percentage of database time

145 Flash Temp Development Findings Most of the wait time was spent performing IO on temp, so move temp to flash disks Improved performance but still does not meet target Not a good use of flash Incorrect use of tools/products

146 Manual Memory Parameters

147 Manual Memory Parameters

148 Manual Memory Parameters Very little IO in database time. Poor cardinality estimate: 256K estimated rows vs 40M actual rows. Increased memory size manually; now there is no use of temp.

149 Manual Memory Parameters Development Findings Set sort_area_size and hash_area_size to 2G Eliminated temp usage but still did not meet target Memory is allocated per parallel server process, which can quickly exceed resources Moving to a solution before understanding the problem

150 Cardinality Estimates

151 Cardinality Estimates

152 Cardinality Estimates Plan switches from a broadcast to a hash distribution Use cardinality hint to specify correct number of rows

153 Cardinality Estimates Cardinality Hint SQL Monitor showed poor cardinality estimates Cardinality hint gives optimizer the correct number of rows for the table scan Plan changed from a broadcast to hash distribution Query time now meets target Now temp is not an issue
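For diagnosis, a cardinality hint can be written roughly as in the sketch below. The problem query itself is not reproduced in the slides, so the table CARS, its alias and the predicate values are hypothetical; the 40M figure is the actual row count reported by SQL Monitor on the earlier slide. As far as I know the CARDINALITY hint is not part of the documented hint set, so treat it as a test aid rather than a fix.

```sql
-- Tell the optimizer to assume 40M rows come out of the scan of alias c,
-- then check whether the distribution method switches from broadcast to hash.
EXPLAIN PLAN FOR
SELECT /*+ CARDINALITY(c 40000000) */ c.owner_id
FROM   cars c                                   -- hypothetical table name
WHERE  c.country = 'Italy'                      -- hypothetical predicate values
AND    c.make    = 'Ferrari';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```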

154 Disable Broadcast Distribution

155 Disable Broadcast Distribution

156 Disable Broadcast Distribution Disable broadcast distribution and now we have the hash distribution as with the cardinality hint

157 Disable Broadcast Distribution Development Findings Google reveals a hidden parameter to disable broadcast distribution Plan and run times are similar to cardinality hint, meeting target Moving to a solution before understanding the problem

158 Second Query with Broadcast Distribution Disabled

159 Query 2: Broadcast Distribution Disabled

160 Query 2: Broadcast Distribution Disabled Query 2 also uses a hash distribution but no longer meets the target

161 Query 2: Broadcast Distribution Disabled Development Findings Plan uses a hash distribution Exceeds target

162 Second Query with Broadcast Distribution Enabled

163 Query 2: Broadcast Distribution Enabled

164 Query 2: Broadcast Distribution Enabled Reset parameter to enable broadcast distribution: now query 2 uses a broadcast and meets the target

165 Query 2: Broadcast Distribution Enabled Development Findings Reset _parallel_broadcast_enabled Plan now uses a broadcast distribution Meets target Should not change system parameters to tune one query

166 Extended Stats

167 Extended Stats

168 Extended Stats Created column group but still have a poor cardinality estimate

169 Extended Stats Development Findings High correlation between Country, Make and Model columns Created column group Query still exceeds target Still have poor cardinality estimate
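Creating the column group described here could be sketched as follows. The table name CARS is hypothetical; the column names are those listed on the slide.

```sql
-- Create a column group so the optimizer can keep statistics on the
-- combination of the three correlated columns. Returns the generated
-- extension name (SYS_STU...).
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
         ownname   => USER,
         tabname   => 'CARS',                   -- hypothetical table name
         extension => '(COUNTRY, MAKE, MODEL)') AS extension_name
FROM   dual;

-- Statistics on the new column group only exist after a gather.
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'CARS');
```

The column group on its own records the number of distinct combinations; with skewed data the estimate can stay poor until a histogram exists on the group, which matches the finding on this slide.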

170 Histogram on Column Group

171 Histogram on Column Group

172 Histogram on Column Group With a histogram on the column group we now have a good cardinality estimate Now we get a hash distribution and meet the target

173 Histogram on Column Group Development Findings Re-gathered stats after running the query with the column groups Frequency Histogram on the column group Accurate cardinality estimates Optimizer now uses a hash distribution
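One way to confirm that the re-gather produced a histogram on the column group is to join the extension definition to the column statistics, as in this sketch. CARS is again a hypothetical table name.

```sql
-- Re-gather after the query has run, so column usage has been recorded
-- for the column group and SIZE AUTO can choose to build a histogram.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'CARS',                        -- hypothetical table name
    method_opt => 'FOR ALL COLUMNS SIZE AUTO');
END;
/

-- The column group appears as a hidden column named after the extension.
SELECT e.extension_name, e.extension, s.histogram, s.num_buckets
FROM   user_stat_extensions    e
JOIN   user_tab_col_statistics s
  ON   s.table_name  = e.table_name
 AND   s.column_name = e.extension_name
WHERE  e.table_name = 'CARS';
```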

174 Second Query with Histogram on Column Group

175 Query 2: Histogram Column Group

176 Query 2: Histogram Column Group Query 2 also has a good cardinality estimate And uses a broadcast distribution

177 Query 2: Histogram Column Group Development Findings Accurate cardinality estimates Optimizer uses a broadcast distribution on second query

178 Histogram on Column Groups Now we have the correct solution! Both queries have good cardinality estimates Correct plans Meet targets

179 Auto Column Group Creation: Seed Column Usage

180 Auto Column Group Creation

181 Auto Column Group Creation Back to the default stats while seeding column usage: poor cardinality estimate as seen earlier, and a broadcast distribution for query 1

182 Auto Column Group Creation: Seed Column Usage Development Findings Start with default statistics Execute dbms_stats.seed_col_usage to monitor column usage Run query

183 Auto Column Group Creation: Create Extended Stats

184 Auto Column Group Creation

185 Auto Column Group Creation With the column group identified and created, we have a good cardinality estimate And we get a hash distribution

186 Auto Column Group Creation: Create Extended Stats Development Findings dbms_stats.report_col_usage shows column groups identified during Seed Column Usage dbms_stats.create_extended_stats creates column groups identified Automatically identifies usage of Country, Make and Model columns together and creates column group

187 Auto Column Group Creation: Create Extended Stats Development Findings Regather stats Automatically creates Histogram on the column group Query meets target
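The whole seed, report, create and regather sequence from the last three findings slides can be sketched as below. The table name CARS and the 300-second monitoring window are assumptions made for illustration.

```sql
SET LONG 100000
SET PAGESIZE 0

-- 1. Record column usage for the next 300 seconds (assumed window).
EXEC DBMS_STATS.SEED_COL_USAGE(NULL, NULL, 300);

-- 2. Run the problem queries here while monitoring is active.

-- 3. Report which columns were used together in predicates.
SELECT DBMS_STATS.REPORT_COL_USAGE(USER, 'CARS') FROM dual;

-- 4. Create the column groups identified during seeding
--    (no extension argument: groups come from the recorded usage).
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(USER, 'CARS') FROM dual;

-- 5. Regather so the new column groups get statistics and histograms.
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'CARS');
```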

188 What Did We Learn?

189 Root Causes of Suboptimal Database Performance Using DoP for query tuning; indexes for large data sets; temp on flash; forcing use of more memory; disabling broadcast distribution

190 Agenda SQL and the Optimizer You As The Optimizer Optimization Strategies Why is my SQL slow? Optimizer Edge Cases Top SQL Mistakes

191 Optimizer Edge Cases Always stale statistics Arbitrary high and low values Correlation Functions

192 Optimizer Edge Cases Always stale statistics Scenario: Data is loaded into a partitioned table as it is received Statistics are gathered by nightly maintenance process Resulting stats indicate 0 rows in partition Poor cardinality estimates and plans Resolution: Copy statistics from the previous partition, or set statistics to appropriate values Lock statistics to prevent updates by maintenance job Optimizer is able to use representative stats to get more accurate cardinality estimates Optional: Unlock and gather new statistics after the partition is fully populated
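The resolution steps can be sketched with DBMS_STATS as follows. The table and partition names are taken from the listing on the next slide (LINEORDER_STALE, with R1997 as the last complete partition and R1998 as the one being loaded); which partition is representative is an assumption.

```sql
BEGIN
  -- Copy statistics from the previous, fully populated partition.
  DBMS_STATS.COPY_TABLE_STATS(
    ownname     => USER,
    tabname     => 'LINEORDER_STALE',
    srcpartname => 'R1997',
    dstpartname => 'R1998');

  -- Lock them so the nightly maintenance job does not overwrite them.
  DBMS_STATS.LOCK_PARTITION_STATS(
    ownname  => USER,
    tabname  => 'LINEORDER_STALE',
    partname => 'R1998');
END;
/

-- Optional, once the partition is fully populated:
BEGIN
  DBMS_STATS.UNLOCK_PARTITION_STATS(USER, 'LINEORDER_STALE', 'R1998');
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => USER,
    tabname     => 'LINEORDER_STALE',
    partname    => 'R1998',
    granularity => 'PARTITION');
END;
/
```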

193 Optimizer Edge Cases Always stale statistics
TABLE_NAME PARTITION_NAME INT GLOBAL_STATS NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
LINEORDER_STALE R1992 NO YES OCT-14
LINEORDER_STALE R1993 NO YES OCT-14
LINEORDER_STALE R1994 NO YES OCT-14
LINEORDER_STALE R1995 NO YES OCT-14
LINEORDER_STALE R1996 NO YES OCT-14
LINEORDER_STALE R1997 NO YES OCT-14
LINEORDER_STALE R1998 NO YES 0 24-OCT-14
7 rows selected.

194 Optimizer Edge Cases Always stale statistics

195 Optimizer Edge Cases Representative statistics
TABLE_NAME PARTITION_NAME INT GLOBAL_STATS NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
LINEORDER_YR R1992 NO YES OCT-14
LINEORDER_YR R1993 NO YES OCT-14
LINEORDER_YR R1994 NO YES OCT-14
LINEORDER_YR R1995 NO YES OCT-14
LINEORDER_YR R1996 NO YES OCT-14
LINEORDER_YR R1997 NO YES OCT-14
LINEORDER_YR R1998 NO YES OCT-14
7 rows selected.

196 Optimizer Edge Cases Representative statistics

197 Optimizer Edge Cases Arbitrary High and Low Values C1 C2 1 date' ' 2 date' ' 3 date' ' 4 date' ' 5 date' ' 6 date' '

198 Optimizer Edge Cases With Arbitrary High Value for Unknown Dates
CREATE TABLE LINEORDER (
  "LO_ORDERKEY" NUMBER NOT NULL ENABLE,
  "LO_LINENUMBER" NUMBER,
  "LO_CUSTKEY" NUMBER NOT NULL ENABLE,
  "LO_PARTKEY" NUMBER NOT NULL ENABLE,
  "LO_SUPPKEY" NUMBER NOT NULL ENABLE,
  "LO_ORDERDATE" DATE NOT NULL ENABLE,
  ...,
  "LO_CLOSEDATE" DATE DEFAULT '31-DEC-2999'
) ;

199 Optimizer Edge Cases With Arbitrary High Value for Unknown Dates

200 Optimizer Edge Cases With Nulls to Represent Unknown Dates
CREATE TABLE LINEORDER (
  "LO_ORDERKEY" NUMBER NOT NULL ENABLE,
  "LO_LINENUMBER" NUMBER,
  "LO_CUSTKEY" NUMBER NOT NULL ENABLE,
  "LO_PARTKEY" NUMBER NOT NULL ENABLE,
  "LO_SUPPKEY" NUMBER NOT NULL ENABLE,
  "LO_ORDERDATE" DATE NOT NULL ENABLE,
  ...,
  "LO_CLOSEDATE" DATE
) ;

201 Optimizer Edge Cases With Nulls to Represent Unknown Dates

202 Optimizer Edge Cases Correlation When different columns in a given table have values that are correlated: make and model of a car or a phone; month and zodiac sign; city and airport. Correlation causes an under-estimate in the cardinality because predicates are seen as independent by the optimizer.

203 Optimizer Edge Cases Correlation

204 Optimizer Edge Cases Functions and Wildcards Functions and wildcards make it very difficult to obtain good cardinality estimates In many cases the optimizer will simply guess at a cardinality estimate based upon 1% or 5% of the rows in a table Techniques such as dynamic sampling and extended statistics should be evaluated to improve the cardinality estimates.

205 Optimizer Edge Cases Rounding Function
SELECT d_sellingseason, p_category, s_region
FROM lineorder
JOIN customer ON lo_custkey = c_custkey
JOIN date_dim ON lo_orderdate = d_datekey
JOIN part ON lo_partkey = p_partkey
JOIN supplier ON lo_suppkey = s_suppkey
WHERE lo_orderdate between to_date('15-jun-1993','dd-mon-yyyy') and to_date('15-jun-1995','dd-mon-yyyy')
AND d_monthnuminyear in (12, 1)
AND p_container in ('JUMBO PACK')
AND p_color in ('red')
AND round(lo_extendedprice) >
ORDER BY d_sellingseason, p_category, s_region

206 Optimizer Edge Cases Rounding Function

207 Optimizer Edge Cases Substring Function SELECT d_sellingseason, p_category, s_region FROM lineorder JOIN customer ON lo_custkey = c_custkey JOIN date_dim ON lo_orderdate = d_datekey JOIN part ON lo_partkey = p_partkey JOIN supplier ON lo_suppkey = s_suppkey WHERE lo_orderdate between to_date('15-jun-1993','dd-mon-yyyy') and to_date('15-jun-1995','dd-mon-yyyy') AND d_monthnuminyear in (12, 1) AND SUBSTR(p_container,-4,4) in ('DRUM') AND p_color in ('red') ORDER BY d_sellingseason, p_category, s_region

208 Optimizer Edge Cases Substring Function

209 Optimizer Edge Cases Using Extended Statistics for Substring Function
BEGIN
  DBMS_STATS.gather_table_stats(
    ownname => 'DW',
    tabname => 'PART',
    method_opt => 'for all columns size skewonly for columns (SUBSTR(P_CONTAINER,-4,4))' );
END;
/
PL/SQL procedure successfully completed.
select extension_name, extension from user_stat_extensions where table_name='PART';
EXTENSION_NAME EXTENSION
SYS_STUW_AMVY8N$KU59A 847#Z7P (SUBSTR("P_CONTAINER",(-4),4))

210 Optimizer Edge Cases Using Extended Statistics for Substring Function

211 Agenda SQL and the Optimizer You As The Optimizer Optimization Strategies Why is my SQL slow? Optimizer Edge Cases Top SQL Mistakes

212 Top SQL Mistakes Missing Joins Implicit or Wrong Data Type Conversions More Top SQL Mistakes shown in the Reference Materials

213 Top SQL Mistakes Missing Joins There should be n-1 join conditions in a query, where n is the number of tables in the query block, otherwise Cartesian products will occur Often a problem with programmers new to SQL developing/testing on small datasets where a DISTINCT is used to reduce the rows This is often not seen as a performance problem until the datasets become large! If the query aggregates the data, the total number of rows returned would be unchanged, but the values are likely to be incorrect

214 Top SQL Mistakes Missing Join: 5 tables, 3 joins
SELECT d_sellingseason, p_category, s_region, sum(lo_extendedprice)
FROM lineorder, customer, date_dim, part, supplier
WHERE -- lo_custkey = c_custkey AND
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND d_year in (1993, 1994, 1995)
AND d_monthnuminyear in (12, 1)
AND p_container in ('JUMBO PACK')
AND p_color in ('red')
GROUP BY d_sellingseason, p_category, s_region
ORDER BY d_sellingseason, p_category, s_region

215 Top SQL Mistakes Missing Join: 5 tables, 3 joins

216 Top SQL Mistakes With All Joins: 5 tables, 4 joins
SELECT d_sellingseason, p_category, s_region, sum(lo_extendedprice)
FROM lineorder, customer, date_dim, part, supplier
WHERE lo_custkey = c_custkey
AND lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND d_year in (1993, 1994, 1995)
AND d_monthnuminyear in (12, 1)
AND p_container in ('JUMBO PACK')
AND p_color in ('red')
GROUP BY d_sellingseason, p_category, s_region
ORDER BY d_sellingseason, p_category, s_region

217 Top SQL Mistakes With All Joins: 5 tables, 4 joins

218 Top SQL Mistakes Implicit or Wrong Data Type Conversions SQL does not enforce much data type checking on SQL statements Where there are data type mismatches SQL will automatically cast/convert data into the appropriate type to execute the SQL statement This may result in the following effects Increased resource usage converting data types Poor execution plans avoiding indexes and partition pruning Failed SQL statements as data values may not convert correctly

219 Top SQL Mistakes Implicit or Wrong Data Type Conversions Common in columns that contain numeric data but are never used for arithmetic operations: telephone numbers, credit card numbers, check numbers. When a programmer references these columns, care must be taken to ensure the bind variables are of type VARCHAR2 and not numbers.

220 Top SQL Mistakes Implicit or Wrong Data Type Conversions
SELECT d_sellingseason, p_category, s_region, sum(lo_extendedprice)
FROM lineorder
JOIN customer ON lo_custkey = c_custkey
JOIN date_dim ON lo_orderdate = d_datekey
JOIN part ON lo_partkey = p_partkey
JOIN supplier ON lo_suppkey = s_suppkey
WHERE d_year in ('1993', '1994', '1995')
AND p_container in ('JUMBO PACK')
GROUP BY d_sellingseason, p_category, s_region
ORDER BY d_sellingseason, p_category, s_region

221 Top SQL Mistakes Implicit or Wrong Data Type Conversions
SQL> desc date_dim
Name             Null?     Type
D_DATEKEY        NOT NULL  DATE
D_DATE                     VARCHAR2(18)
D_DAYOFWEEK                VARCHAR2(10)
D_MONTH                    VARCHAR2(9)
D_YEAR                     NUMBER
D_YEARMONTHNUM             NUMBER
D_YEARMONTH                VARCHAR2(7)
D_DAYNUMINWEEK             NUMBER
D_DAYNUMINMONTH            NUMBER
...

222 Top SQL Mistakes No Data Type Conversion SELECT d_sellingseason, p_category, s_region, sum(lo_extendedprice) FROM lineorder JOIN customer ON lo_custkey = c_custkey JOIN date_dim ON lo_orderdate = d_datekey JOIN part ON lo_partkey = p_partkey JOIN supplier ON lo_suppkey = s_suppkey WHERE d_year in (1993, 1994, 1995) AND p_container in ('JUMBO PACK') GROUP BY d_sellingseason, p_category, s_region ORDER BY d_sellingseason, p_category, s_region

223 Top SQL Mistakes Implicit or Wrong Data Type Conversions
SELECT d_sellingseason, p_category, s_region, sum(lo_extendedprice)
FROM lineorder
JOIN customer ON lo_custkey = c_custkey
JOIN date_dim ON lo_orderdate = d_datekey
JOIN part ON lo_partkey = p_partkey
JOIN supplier ON lo_suppkey = s_suppkey
WHERE d_year in (1993, 1994, 1995)
AND c_phone =
GROUP BY d_sellingseason, p_category, s_region
ORDER BY d_sellingseason, p_category, s_region
This SQL statement is vulnerable to: bad plans because the index cannot be used; ORA-01722 errors if there are non-numeric characters; increased resource usage converting data; poor cardinality estimates.

224 Top SQL Mistakes Implicit or Wrong Data Type Conversions
SQL> desc customer
Name          Null?     Type
C_CUSTKEY     NOT NULL  NUMBER
C_NAME                  VARCHAR2(25)
C_ADDRESS               VARCHAR2(25)
C_CITY                  VARCHAR2(10)
C_NATION                VARCHAR2(15)
C_REGION                VARCHAR2(12)
C_PHONE                 VARCHAR2(15)
C_MKTSEGMENT            VARCHAR2(10)

225 Top SQL Mistakes Implicit or Wrong Data Type Conversions SELECT d_sellingseason, * ERROR at line 1: ORA-12801: error signaled in parallel query server P00E, instance scao08adm01.us.oracle.com:imtst1 (1) ORA-01722: invalid number

226 Top SQL Mistakes No Data Type Conversion SELECT d_sellingseason, p_category, s_region, sum(lo_extendedprice) FROM lineorder JOIN customer ON lo_custkey = c_custkey JOIN date_dim ON lo_orderdate = d_datekey JOIN part ON lo_partkey = p_partkey JOIN supplier ON lo_suppkey = s_suppkey WHERE d_year in (1993, 1994, 1995) AND c_phone = GROUP BY d_sellingseason, p_category, s_region ORDER BY d_sellingseason, p_category, s_region

227 Top SQL Mistakes No Data Type Conversion
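The difference between the two c_phone queries comes down to the type of the comparison value. The sketch below shows the three cases on the CUSTOMER table described above; the phone value is hypothetical, since the literal is not legible in the slides.

```sql
-- Numeric literal against a VARCHAR2 column: Oracle converts the column,
-- so the predicate section of the plan shows TO_NUMBER("C_PHONE")=5551234.
-- An index on c_phone cannot be used, and any non-numeric value in the
-- column can raise ORA-01722.
SELECT c_custkey FROM customer WHERE c_phone = 5551234;

-- Character literal: no conversion on the column.
SELECT c_custkey FROM customer WHERE c_phone = '5551234';

-- Bind variable declared with the column's type (SQL*Plus syntax).
VARIABLE phone VARCHAR2(15)
EXEC :phone := '5551234'
SELECT c_custkey FROM customer WHERE c_phone = :phone;
```

Checking the predicate section of the plan for TO_NUMBER, TO_CHAR or INTERNAL_FUNCTION wrapped around a column is a quick way to spot this class of problem.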

228 Top SQL Mistakes Implicit or Wrong Data Type Conversions
SELECT /*+ MONITOR */ p_category, s_region, sum(lo_extendedprice)
FROM lineorder_mon
JOIN part ON lo_partkey = p_partkey
JOIN supplier ON lo_suppkey = s_suppkey
WHERE lo_orderdate between to_timestamp('15-jun-1993','dd-mon-yyyy') and to_timestamp('15-jun-1995','dd-mon-yyyy')
AND p_container in ('JUMBO PACK')
GROUP BY p_category, s_region
ORDER BY p_category, s_region
This SQL statement is vulnerable to: bad plans because partition pruning may not take place; poor cardinality estimates.

229 Top SQL Mistakes Implicit or Wrong Data Type Conversions
SQL> desc lineorder
Name              Null?     Type
LO_ORDERKEY       NOT NULL  NUMBER
LO_LINENUMBER               NUMBER
LO_CUSTKEY        NOT NULL  NUMBER
LO_PARTKEY        NOT NULL  NUMBER
LO_SUPPKEY        NOT NULL  NUMBER
LO_ORDERDATE      NOT NULL  DATE
LO_ORDERPRIORITY            VARCHAR2(15)
LO_SHIPPRIORITY             VARCHAR2(1)
LO_QUANTITY                 NUMBER
LO_EXTENDEDPRICE            NUMBER
LO_ORDTOTALPRICE            NUMBER
...

230 Top SQL Mistakes Implicit or Wrong Data Type Conversions

231 Top SQL Mistakes Implicit or Wrong Data Type Conversions

232 Top SQL Mistakes Correct Data Type Conversion
SELECT /*+ MONITOR */ p_category, s_region, sum(lo_extendedprice)
FROM lineorder_mon
JOIN part ON lo_partkey = p_partkey
JOIN supplier ON lo_suppkey = s_suppkey
WHERE lo_orderdate between to_date('15-jun-1993','dd-mon-yyyy') and to_date('15-jun-1995','dd-mon-yyyy')
GROUP BY p_category, s_region
ORDER BY p_category, s_region

233 Top SQL Mistakes Correct Data Type Conversion

234 Top SQL Mistakes Correct Data Type Conversion

235 Reference Section

236 Tools for SQL Statement Analysis

237 Tools For SQL Analysis Static Analysis Dynamic Analysis Tracing and Debugging

238 Tools for SQL Analysis Static Analysis Explain Plan and dbms_xplan.display SQL*Plus Autotrace

239 Static Analysis Explain Plan Overview Provides an indication of the possible execution plan DDL e.g. CREATE TABLE AS SELECT DML Queries Strengths Available in every installation Does not require execution of the SQL statement

240 Static Analysis Explain Plan Weaknesses Bind variables All binds treated as VARCHAR No bind peeking Beware of PLAN_TABLE inherited from previous releases Based on expectations rather than reality

241 Static Analysis Explain Plan Predicate Information Filters Transformations Candidates for Offload Bloom Filters Notes Section Automatic Degree of Parallelism (aka Auto DoP) Rationale for choice of DoP Dynamic Sampling When no statistics are available When the optimizer chooses to use dynamic sampling in the evaluation of parallel queries Cardinality Feedback

242 Static Analysis Explain Plan needs to be formatted Plan hash value: Id Operation Name Rows Bytes Cost (%CPU) Time SELECT STATEMENT (0) 00:00:01 1 TABLE ACCESS BY INDEX ROWID EMP (0) 00:00:01 * 2 INDEX RANGE SCAN EMP_N1 3 1 (0) 00:00: Predicate Information (identified by operation id): access("deptno"=10)

243 Static Analysis Explain Plan Formatting SET TAB OFF SET TRIMSPOOL ON Use fixed width font Preserve spaces Avoid truncating or wrapping long lines
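Putting the formatting advice together, a typical SQL*Plus session for Explain Plan might look like this sketch. It uses the EMP query from the surrounding slides; the linesize value is the one used in the Autotrace example.

```sql
SET TAB OFF
SET TRIMSPOOL ON
SET LINESIZE 132

-- Populate PLAN_TABLE without executing the statement.
EXPLAIN PLAN FOR
SELECT * FROM emp WHERE deptno = 10;

-- Format the most recently explained plan, including predicate information.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```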

244 Static Analysis Explain Plan Plan hash value: Id Operation Name Rows Bytes Cost (%CPU) Time SELECT STATEMENT (0) 00:00:01 1 TABLE ACCESS BY INDEX ROWID EMP (0) 00:00:01 * 2 INDEX RANGE SCAN EMP_N1 3 1 (0) 00:00: Predicate Information (identified by operation id): access("deptno"=10)

245 Static Analysis Explain Plan More Background Explain the Explain Plan Written by Maria Colgan

246 Static Analysis SQL*Plus Autotrace SQL> set linesize 132 tab off SQL> set autotrace traceonly explain SQL> SELECT * FROM EMP WHERE DEPTNO = 10; Execution Plan Plan hash value: Id Operation Name Rows Bytes Cost (%CPU) Time SELECT STATEMENT (0) 00:00:01 1 TABLE ACCESS BY INDEX ROWID EMP (0) 00:00:01 * 2 INDEX RANGE SCAN EMP_N1 3 1 (0) 00:00: Predicate Information (identified by operation id): access("deptno"=10)

247 Static Analysis SQL*Plus Autotrace SQL> set autotrace traceonly explain statistics SQL> SELECT * FROM EMP WHERE DEPTNO = 10; Execution Plan Statistics recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 1159 bytes sent via SQL*Net to client 525 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed

248 Static Analysis SQL*Plus Autotrace Overview Provides an indication of the possible execution plan DML Queries Strengths Built in functionality with some useful shortcuts May not require execution of queries Weaknesses Much the same as Explain Plan

249 Tools for SQL Analysis Dynamic Analysis dbms_xplan.display_* SQL Monitor

250 Dynamic Analysis DBMS_XPLAN.DISPLAY_* Overview Gives the actual execution plan from various sources Cursor cache dbms_xplan.display_cursor AWR dbms_xplan.display_awr SQL Tuning Set dbms_xplan.display_sqlset

251 Dynamic Analysis dbms_xplan.display_cursor() SQL> set feedback off linesize 132 pagesize 0 tab off SQL> SELECT * FROM EMP WHERE DEPTNO = :n; SQL> select * from table(dbms_xplan.display_cursor()); SQL_ID 5xt3urx81f9th, child number SELECT * FROM EMP WHERE DEPTNO = :n Plan hash value: Id Operation Name Rows Bytes Cost (%CPU) Time SELECT STATEMENT 2 (100) 1 TABLE ACCESS BY INDEX ROWID EMP (0) 00:00:01 * 2 INDEX RANGE SCAN EMP_N1 3 1 (0) 00:00: Predicate Information (identified by operation id): access("deptno"=:n)

252 Dynamic Analysis dbms_xplan.display_cursor() Various formatting options Default is usually sufficient ALLSTATS LAST can be useful when using the GATHER_PLAN_STATISTICS hint to gather execution statistics Beware of overhead Ignore the following artifact in parallel queries storage(:z>=:z AND :Z<=:Z)
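The ALLSTATS LAST option mentioned here compares estimated rows with actual rows for the last execution. A minimal sketch on the EMP table follows; SERVEROUTPUT is switched off because otherwise SQL*Plus runs its own DBMS_OUTPUT call after each statement and display_cursor reports on that call instead of the query.

```sql
SET SERVEROUTPUT OFF
SET LINESIZE 132 PAGESIZE 0 TAB OFF

-- Collect row-source execution statistics for this statement only.
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM emp WHERE deptno = 10;

-- NULL, NULL = last statement executed by this session.
-- Look for large gaps between the E-Rows and A-Rows columns.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```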

253 Dynamic Analysis SQL Monitor Overview Feature of SQL Tuning Pack Generation Oracle Enterprise Manager dbms_sqltune.report_sql_monitor Text, HTML or Active Active is preferred

254 Dynamic Analysis SQL Monitor set trimspool on set trim on set pagesize 0 set linesize 1000 set long set longchunksize spool sqlmon_previous.html select dbms_sqltune.report_sql_monitor( session_id=>sys_context('userenv','sid'), report_level=>'all', type=>'active') from dual; spool off

255 Dynamic Analysis SQL Monitor By default SQL Monitor is limited to 300 plan lines. If the report exceeds 300 lines, it will not show up in v$sql_monitor or the EM SQL Monitoring Page. This value can be increased using the _sqlmon_max_planlines parameter: alter session set "_sqlmon_max_planlines" = 500; 300 is usually sufficient and should only be changed if needed.

256 Dynamic Analysis SQL Monitor

257 Performance Diagnosis and Tuning EM Access to Reports SQL Monitor can be accessed by navigating: Performance Tab; SQL Monitoring Link; Select SQL Statement of Interest to drill down. Shows recent SQL statements running for more than 5 seconds. SQL statement details can quickly highlight issues such as skew in PQ execution and non-representative statistics (large difference between estimated rows and actual rows).

258 EM Monitor Report Select SQL Monitoring

259 EM Monitor Report Select SQL of Interest

260 EM Monitor Report

261 Tools for SQL Analysis Tracing Active Session History Discussed in OLTP session

262 Tracing Overview Tracing of calls and optionally wait events and bind values. Raw tracefile can be useful to identify anomalies. Generally post-processed using tkprof (trace kernel profile). Strengths: difficult to get some of the information any other way. Weaknesses: tracing needs to be enabled; multiple tracefiles are generated for parallel execution, making it almost useless; runtime and space overhead.

263 Tracing Embed an identifier in the name of the tracefile SQL> alter session set tracefile_identifier = my_10046_trace; Level 1 is equivalent to SQL_TRACE = TRUE SQL> alter session set events = '10046 trace name context forever, level 1';

264 Tracing Use dbms_monitor to set it Easy way to enable/disable tracing in other sessions SQL> execute dbms_monitor.session_trace_enable() SQL> execute dbms_monitor.session_trace_disable()
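To trace another session, dbms_monitor takes the target session's SID and serial number. The sketch below looks them up for the TEACHER user seen in the tkprof output; the numeric values passed to the procedures are placeholders to be replaced with the values the first query returns.

```sql
-- Find the target session.
SELECT sid, serial#
FROM   v$session
WHERE  username = 'TEACHER';

-- Enable tracing with wait events and bind values
-- (the same information as event 10046 at level 12).
BEGIN
  DBMS_MONITOR.SESSION_TRACE_ENABLE(
    session_id => 123,      -- placeholder SID
    serial_num => 4567,     -- placeholder serial#
    waits      => TRUE,
    binds      => TRUE);
END;
/

-- Disable tracing when done to limit runtime and space overhead.
BEGIN
  DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 123, serial_num => 4567);
END;
/
```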

265 Tracing Trace Level Functionality Off (specified in place of forever ) Disable Tracing 1 Equivalent to SQL_TRACE=TRUE 4 Include Bind Values 8 Include Wait Events 12 Include Bind Values and Wait Events

266 Tracing select * from emp where deptno = :n call count cpu elapsed disk query current rows Parse Execute Fetch total Misses in library cache during parse: 1 Optimizer mode: ALL_ROWS Parsing user id: 88 (TEACHER) Number of plan statistics captured: 2 Rows (1st) Rows (avg) Rows (max) Row Source Operation TABLE ACCESS BY INDEX ROWID EMP (cr=4 pr=0 pw=0 time=152 us cost=2 size=114 card=3) INDEX RANGE SCAN EMP_N1 (cr=2 pr=0 pw=0 time=120 us cost=1 size=0 card=3)(object id 64363)

267 Tracing Rows Execution Plan SELECT STATEMENT MODE: ALL_ROWS 3 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF 'EMP' (TABLE) 3 INDEX MODE: ANALYZED (RANGE SCAN) OF 'EMP_N1' (INDEX) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited Waited Disk file operations I/O SQL*Net message to client SQL*Net message from client ********************************************************************************

268 Tools for SQL Analysis Debugging Superseded by SQL_COMPILER trace SQL Test Case Builder SQLTXPLAIN my.oracle.support Doc ID

269 Debugging SQL_COMPILER Embed an identifier in the name of the tracefile: SQL> alter session set tracefile_identifier = SQLCOMP_trace; For all SQL statements: SQL> alter session set events = 'trace [SQL_COMPILER.*]'; For a specific SQL Identifier: SQL> alter session set events = 'trace [sql_compiler] [SQL:7h35uxf5uhmm1]';

270 Debugging SQL Test Case Builder Overview Export the execution environment for a SQL statement Generation Oracle Enterprise Manager dbms_sqldiag.export_sql_testcase dbms_sqldiag.import_sql_testcase TIP: Avoid the need to quote quotes, for example: SQL> variable i clob SQL> exec :i := q'#select * from emp where ename = 'KING'#'

271 Debugging SQL Test Case Builder variable i CLOB variable o CLOB BEGIN :i := q'#select * from emp where deptno = 10#'; END; / BEGIN dbms_sqldiag.export_sql_testcase ( directory => 'TEACHER_TCB_EXPORT',sql_text => :i,testcase => :o ); END; /

272 Debugging SQL Test Case Builder BEGIN dbms_sqldiag.import_sql_testcase ( directory => 'TEACHER_TCB_IMPORT',filename => 'oratcb1_171f07e50002main.xml' ); END; /

273 More Top SQL Mistakes

274 More Top SQL Mistakes Parse Errors and Incorrect SQL Sending SQL that is syntactically incorrect is very expensive for the database and has historically been very difficult to diagnose. Exception handling and throwing errors is expensive. Bad SQL leaves no trace within the shared pool because it is bad and hence unshareable. The statistic parse count (failures) indicates it may be happening. The danger is the clever programmer who sends bad SQL to see whether it parses, in order to determine the version or other information.
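The parse failure statistic mentioned above can be read from v$sysstat alongside the other parse counters, as in this sketch. The values are cumulative since instance startup, so it is the rate of change between two samples that matters.

```sql
-- A steadily growing 'parse count (failures)' suggests an application
-- is repeatedly sending SQL that fails to parse.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('parse count (total)',
                'parse count (hard)',
                'parse count (failures)')
ORDER  BY name;
```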

275 More Top SQL Mistakes Parse Errors and Incorrect SQL SQL> select pk_id from history_version_2 where pk_id is NULL; select pk_id from history_version_2 where pk_id is NULL; * ERROR at line 1: ORA-00942: table or view does not exist

276 ANSI Outer Join
