Using Oracle STATSPACK to Assist with Application Performance Tuning

Scenario

You are experiencing periodic performance problems with an application that uses a back-end Oracle database.

Solution

Introduction

Before we begin, we will assume that STATSPACK has been set up and is configured to take snapshots at 15-minute intervals. For more information on installing and configuring STATSPACK, please review the following URL: http://download.oracle.com/docs/cd/b10500_01/server.920/a96533/statspac.htm#27255

The STATSPACK utility is one mechanism for monitoring the performance of an Oracle database. It provides the capability to analyze database statistics by taking snapshots at different times and generating reports on the differences between them. The collection of statistics is called a snapshot: a point-in-time capture of the statistics available via the V$ views, identified by a Snap_ID value. Reports can be generated on the changes between any two snapshots. Snapshots are taken for the following reasons:

- To evaluate performance during specific tests of the system.
- To evaluate performance changes over a long period of time.

To support different collection levels, STATSPACK provides the level parameter. By default this is set to 5, which is adequate for most reports. A level 5 snapshot gathers the same statistics as the lower levels, plus high-resource-usage SQL statements, and this default offers a significant degree of flexibility when querying for the most resource-intensive SQL statements. To understand how to configure the level parameter further, please review the URL: http://www.adp-gmbh.ch/ora/tuning/statspack.html

A STATSPACK report provides the following key information, though it is not limited to these items:

- Instance information, such as the database ID and name, version, operating system, hostname, and so on.
- Database cache size information for the buffer cache, log buffer, and shared pool.
- Overall load statistics, by second and by transaction, such as the amount of redo generated, the number of transactions, statements executed, and so on.
- Efficiency percentages, also known as hit ratios, such as library cache hits, buffer hits, soft parse percentage, and so on.
- Shared pool utilization, showing memory usage over the observed period of time.
- Top 5 timed events: what you have been waiting for/waiting on.
- A report of all wait events in the system during the observed period of time.
- Various top-SQL reports, such as the SQL with the most buffer gets (those that do the most logical I/O), the SQL with the most physical reads, the most-executed SQL, the most frequently parsed SQL, and so on.
- A statistics report showing all of the various counters for the observed period of time, such as how many enqueue waits there were (locks that caused a wait), how many physical reads, how many disk sorts, and so on.
- I/O reports by tablespace and file, and so on.

Reports are a function of the level of the snapshots taken and the reporting options used. A report could be 20 pages long and still be considered small; however, you should be able to skim and absorb the contents of a STATSPACK report in minutes, as demonstrated in the remainder of this article. We recommend that STATSPACK snapshots be captured at 15-minute intervals (note: a job can be scheduled to run a STATSPACK capture). This interval allows us to focus in on the specific time periods where performance problems have been experienced. In the remainder of this article, we will review what to look for in a STATSPACK report.

Below is a standard header for a STATSPACK report:

STATSPACK report for

DB Id: 4041216860      Instance: APPINS       Inst Num: 1
Startup Time: 05-Jul-10 00:00   Release: 10.2.0.4.0   RAC: NO
Host Name: hpclom01    Num CPUs: 20   Phys Memory (MB): 98,253

             Snap Id   Snap Time            Sessions  Curs/Sess  Comment
Begin Snap:  18358     06-Jul-10 14:00:58   82        14.0
End Snap:    18367     06-Jul-10 17:01:21   81        15.0
Elapsed:     180.38 (mins)

Notice the elapsed time period for this report; it should be large enough to contain meaningful data and small enough to be relevant. As noted previously, we would like snapshot periods to be 15 minutes in duration, not the 180.38 minutes covered by this report. With an elapsed period this long, problems are very hard to identify; it is the equivalent of looking for a needle in a haystack. We will keep analyzing this STATSPACK report and see what else we can find.
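A minimal sketch of how the report's rate columns come about: every "Per Second" and "Per Transaction" figure is one underlying counter delta between the two snapshots, divided by the elapsed seconds or the transaction count. The figures below are taken from this report's header and Load Profile; because both columns share one delta, their ratio recovers the transaction rate.

```python
# Sketch (not STATSPACK code): deriving the transaction rate from the
# report's own Load Profile figures.

elapsed_minutes = 180.38                # from "Elapsed: 180.38 (mins)"
elapsed_seconds = elapsed_minutes * 60  # ~10,823 s of observed activity

redo_per_second = 10_104.92             # "Redo size" Per Second column
redo_per_txn = 8_516.90                 # "Redo size" Per Transaction column

# Both columns divide the same redo-size delta, so their ratio is the
# number of transactions per second.
txn_per_second = redo_per_second / redo_per_txn
print(round(txn_per_second, 2))         # 1.19, matching "Transactions: 1.19"
```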
Now we review the Load Profile section of the report:

Load Profile            Per Second    Per Transaction
Redo size:               10,104.92           8,516.90
Logical reads:          454,381.94         382,974.51
Block changes:               33.82              28.51
Physical reads:               2.56               2.15
Physical writes:              4.78               4.03
User calls:                  77.96              65.71
Parses:                      30.52              25.72
Hard parses:                  0.12               0.11
Sorts:                        8.94               7.53
Logons:                       0.00               0.00
Executes:                    31.26              26.34
Transactions:                 1.19
% Blocks changed per Read:    0.01    Recursive Call %:   13.98
Rollback per transaction %:   2.94    Rows per Sort:     109.91

We focus on three items, Hard Parses, Executes (how many statements we execute per second/transaction), and Transactions (how many transactions per second we process), to get a quick health check of system performance. These values give an overall view of the load on the server. In this case we are looking at a light system load, just 1-2 transactions per second.

Next we look at the Instance Efficiency Percentages section of the report:

Buffer Nowait %:              100.00    Redo NoWait %:      100.00
Buffer Hit %:                 100.00    In-memory Sort %:   100.00
Library Hit %:                 98.85    Soft Parse %:        99.59
Execute to Parse %:             2.36    Latch Hit %:         69.38
Parse CPU to Parse Elapsd %:   73.77    % Non-Parse CPU:     93.15

Shared Pool Statistics        Begin     End
Memory Usage %:               81.39     80.26
% SQL with executions>1:      82.76     87.11
% Memory for SQL w/exec>1:    81.58     85.49

Three items, the Library Hit, Soft Parse, and Execute to Parse ratios, explain how well the shared pool is being utilized. This is the area in which huge performance gains can be achieved. In this case the Library Hit and Soft Parse ratios are high, and these are good values. If the Library Hit ratio were low, it could indicate a shared pool that is too small, or a system that does not make use of bind variables in the application. The Soft Parse % is one of the most important ratios in the database; it should be as near to 100% as possible. In our case the Execute to Parse % is too low. It is possible that the application in question is not using shareable SQL, or that the database has sub-optimal parameters that reduce the effectiveness of cursor sharing. A problem like excessive parsing is likely to manifest itself as additional network traffic between the application server and its clients, and the additional parse activity may also show up as a marked increase in CPU consumption on the database server.
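The two parse ratios discussed above can be recomputed directly from the raw counters that appear later in the Instance Activity section of this same report, using the standard STATSPACK definitions:

```python
# Recomputing Soft Parse % and Execute to Parse % from the Instance
# Activity counters shown later in this report.

parse_total = 330_307      # parse count (total)
parse_hard = 1_352         # parse count (hard)
execute_count = 338_286    # execute count

# Soft Parse %: fraction of parses that found a shareable cursor.
soft_parse_pct = 100 * (1 - parse_hard / parse_total)

# Execute to Parse %: fraction of executions that did not need a parse.
# Near zero means almost every execution was preceded by a parse call.
execute_to_parse_pct = 100 * (1 - parse_total / execute_count)

print(f"{soft_parse_pct:.2f}")        # 99.59, matching the report
print(f"{execute_to_parse_pct:.2f}")  # 2.36, matching the report
```

A healthy OLTP system keeps statements cached and re-executes them, pushing Execute to Parse % well above this value.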
Note: All Oracle SQL statements must be parsed the first time they execute. Parsing involves a syntax check, a semantic check (against the dictionary), the creation of a decision tree, and the generation of the lowest-cost execution plan. Once the execution plan is created, it is stored in the library cache (part of the shared_pool_size) to facilitate re-execution. There are two types of parses:

- Hard parse: a new SQL statement must be parsed from scratch. If the database is parsing every statement that it executes, the parse-to-execute ratio will be close to 1:1 (high hard parses), often indicating non-reentrant SQL that does not use host variables.
- Soft parse: a reentrant SQL statement, where the only unique features are the host variables.
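To see why host (bind) variables matter here: the library cache shares cursors keyed by, among other things, the exact SQL text. This toy sketch (the table and statements are purely illustrative, not from the report) shows how literal values manufacture distinct statements while a bind placeholder keeps every execution reentrant:

```python
# Toy model of cursor sharing: distinct SQL texts mean distinct cursors,
# and each new cursor costs a hard parse.

# 100 lookups with literal values: every statement text is unique.
literal_stmts = {f"SELECT * FROM orders WHERE id = {i}" for i in range(100)}

# The same 100 lookups with a bind placeholder: one shareable text.
bind_stmts = {"SELECT * FROM orders WHERE id = :id" for _ in range(100)}

print(len(literal_stmts))  # 100 distinct texts -> 100 hard parses
print(len(bind_stmts))     # 1 shareable text -> 1 hard parse, 99 soft
```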
Next we move on to the Top 5 Timed Events section of the report:

Event                        Waits        Time (s)  Avg Wait (ms)  % Total Call Time
latch: cache buffers chains  112,325,351  118,283   1              91.3
wait list latch free         575,195      9,031     16             7.0
CPU time                                  1,344                    1.0
db file parallel write       40,238       317       8              .2
log file parallel write      13,682       180       13             .1

latch: cache buffers chains waits are high. This latch is used to protect a buffer list in the buffer cache; it is taken when searching for, adding, or removing a buffer from the buffer cache. Contention on this latch usually means that there is a block that is greatly contended for (known as a hot block). The value here is high, so this contention should be reviewed.

wait list latch free waits are high. The latch free wait occurs when a process is waiting for a latch held by another process. Latch free waits are usually due to SQL without bind variables, but buffer chains and redo generation can also cause them.

CPU time is the amount of time that the Oracle database spent processing SQL statements, parsing statements, or managing the buffer cache. If this is the main timed event, tuning SQL statements and/or increasing server CPU resources will provide the greatest performance improvement. In this case it is low and can be ignored.

db file parallel write: the DBWR process produces this wait event as it writes dirty blocks to the data files. This event can cause poor read performance, as the writes may interfere with reads from the data files. Moving the tables that experience the highest write activity to solid-state disks may help to alleviate this wait event. In this case it is low and can be ignored.

log file parallel write: this event occurs when Oracle is waiting for the completion of writes to the redo log files. Moving some or all copies of your redo logs to the WriteAccelerator can reduce the amount of time spent waiting for this event. In this case it is low and can be ignored.

Now we move on to the I/O statistics.
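A quick sanity check on the table above: the "Avg Wait (ms)" column is simply total wait time divided by the number of waits. Recomputing it from the report's own Waits and Time (s) figures reproduces the printed values:

```python
# Recomputing Avg Wait (ms) for the four wait events in the Top 5 table.
# (CPU time is not a wait event and has no waits count.)

events = {
    "latch: cache buffers chains": (112_325_351, 118_283),  # (waits, time_s)
    "wait list latch free":        (575_195,     9_031),
    "db file parallel write":      (40_238,      317),
    "log file parallel write":     (13_682,      180),
}

for name, (waits, time_s) in events.items():
    avg_ms = time_s / waits * 1000
    print(f"{name}: {avg_ms:.0f} ms")
# Prints 1, 16, 8, and 13 ms respectively, matching the report.
```

Note that the cache buffers chains latch is waited on 112 million times but each wait is only about 1 ms; it is the enormous wait count, not the individual wait length, that makes it 91.3% of total call time.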
Tablespace IO Stats
-> ordered by IOs (Reads + Writes) desc

Tablespace   Reads   Av Reads/s  Av Rd(ms)  Av Blks/Rd  Writes  Av Writes/s  Buffer Waits  Av Buf Wt(ms)
REMDBP_DATA  23,144  2           4.8        1.0         17,291  2            2             0.0
STATSPACK    1,846   0           5.9        1.0         21,852  2            0             0.0
SYSAUX       1,173   0           2.2        1.0         713     0            0             0.0
REMDBP_UND   4       0           10.0       1.0         986     0            2             0.0
SYSTEM       268     0           8.2        1.0         297     0            0             0.0
TOOLS        72      0           0.0        1.0         61      0            0             0.0

To recap, the key Load Profile statistics:

Logical reads:             454,381/s    Parses:         30.52/s
Physical reads:            2/s          Hard parses:    0.12/s
Physical writes:           4/s          Transactions:   1.19/s
Rollback per transaction:  2.94%        Buffer Nowait:  100%
This database has relatively high logical I/O at 454,381 reads per second. Logical reads include data block reads from both memory and disk. High LIO is sometimes associated with high CPU activity. CPU bottlenecks occur when the CPU run queue exceeds the number of CPUs on the database server; this can be seen in the "r" column of the vmstat UNIX/Linux utility or within the Windows performance manager. Consider tuning your application to reduce unnecessary data buffer touches (SQL tuning or PL/SQL bulking), using faster CPUs, or adding more CPUs to your system.

Next we take a look at the Instance Activity statistics:

Statistic                           Total          per Second   per Trans
SQL*Net roundtrips to/from client   583,850        54.0         45.5
consistent gets                     4,917,339,356  454,341.6    382,940.5
consistent gets - examination       2,353,067,601  217,413.6    183,246.5
db block changes                    366,040        33.8         28.5
execute count                       338,286        31.3         26.3
parse count (hard)                  1,352          0.1          0.1
parse count (total)                 330,307        30.5         25.7
physical reads                      27,659         2.6          2.2
physical reads direct               15,138         1.4          1.2
physical writes                     51,706         4.8          4.0
physical writes direct              1,504          0.1          0.1
redo writes                         13,682         1.3          1.1
session cursor cache hits           314,843        29.1         24.5
sorts (memory)                      96,741         8.9          7.5
table fetch continued row           10,106,317     933.8        787.0
table scans (long tables)           222            0.0          0.0
table scans (short tables)          22,589         2.1          1.8
workarea executions - onepass       0              0.0          0.0

The database has 217,413.6 "consistent gets - examination" per second. This statistic is different from regular consistent gets: it counts reads of undo blocks for consistent-read purposes, but also the first part of an index read and hash cluster I/O. To reduce logical I/O, you may consider moving your indexes to a large-blocksize tablespace. Because index splitting and spawning are controlled at the block level, a larger blocksize will result in a flatter index tree structure. In our case we also have 10,106,317 "table fetch continued row" actions during this period.
Migrated/chained rows always cause double the I/O for a row fetch. A "table fetch continued row" (chained row fetch) happens when we fetch BLOB/CLOB columns (if the avg_row_len > db_block_size), when we have tables with more than 255 columns, and when PCTFREE is too small. In our case we may need to reorganize the affected tables with the dbms_redefinition utility and reset their PCTFREE parameters to prevent future row chaining.

We also have a high rate of small-table full-table scans, at 2.1 per second. Verify that the KEEP pool is sized properly to cache frequently referenced tables and indexes. Moving frequently referenced tables and indexes to SSD or the WriteAccelerator will significantly increase the speed of small-table full-table scans.

Next we move on to the Buffer Pool Advisory:

Buffer Pool Advisory
Current:     3,419,000 disk reads
Optimized:   2,948,000 disk reads
Improvement: 13.78% fewer

The Oracle buffer cache advisory utility indicates 3,419,000 disk reads during the sample interval. Oracle estimates that doubling the data buffer size (by increasing db_cache_size) will reduce disk reads to
2,948,000, a 13.78% decrease.

Next we move on to the init.ora parameters:

Parameter                Value
cursor_sharing           force
db_block_size            8,192
log_archive_start        true
optimizer_index_caching  100
optimizer_mode           choose
pga_aggregate_target     1.48GB
query_rewrite_enabled    true
session_cached_cursors   1,000
_optimizer_cost_model    choose

We are not using the KEEP pool to cache frequently referenced tables and indexes, which may cause unnecessary I/O. When configured properly, the KEEP pool guarantees full caching of popular tables and indexes. Remember, an average buffer get is often 100 times faster than a disk read. Any table or index that consumes more than 10% of the data buffer, or tables and indexes that have more than 50% of their blocks residing in the data buffer, should be cached in the KEEP pool. This process can be fully automated with scripts.

The optimizer_index_cost_adj parameter is an initialization parameter that can be very useful for SQL tuning. It is a numeric parameter with values from zero to 10,000 and a default value of 100. It can also be set at the session level by using the alter session set optimizer_index_cost_adj = nn syntax. This parameter lets you tune the optimizer's access-path selection to be more or less index-friendly, and it is very useful when you feel that the default CBO behavior favors full-table scans over index scans.

We currently have an obsolete parameter set: log_archive_start. Oracle classifies non-current parameters as either deprecated or obsolete. Deprecated parameters will prevent your database from starting up, whereas obsolete parameters will simply display a warning and may be ignored.
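To illustrate what optimizer_index_cost_adj does, here is a simplified sketch of its documented behavior: the value is interpreted as a percentage applied to the optimizer's cost estimate for index access paths. The cost numbers below are made up purely for the demonstration, not taken from any real plan:

```python
# Simplified model (not Oracle internals): optimizer_index_cost_adj
# scales the perceived cost of an index access path.

def adjusted_index_cost(index_cost, optimizer_index_cost_adj=100):
    """Index path cost as seen by the CBO after the percentage adjustment."""
    return index_cost * optimizer_index_cost_adj / 100

full_scan_cost = 80   # hypothetical full-table scan cost
index_cost = 100      # hypothetical raw index access cost

# At the default of 100, the index cost is unchanged, so the cheaper
# full-table scan would be chosen.
print(adjusted_index_cost(index_cost, 100) < full_scan_cost)  # False

# Lowering the parameter to 50 halves the perceived index cost,
# tipping the choice toward the index path.
print(adjusted_index_cost(index_cost, 50) < full_scan_cost)   # True
```

This is why the parameter is described as making the optimizer more or less index-friendly: it does not change the real I/O cost, only the estimate used when comparing access paths.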