Oracle Performance Tuning Boot Camp: 10 New Problem-Solving Tips Using ASH & AWR
Debaditya Chatterjee Vitor Promeet Mansata
3 Types of Performance Management
- Reactive Performance Management
- Proactive Performance Management
- Preventive Performance Management
Reactive Performance Management
1. Comparing Performance Across Two Time Periods
2. Database Hang Analysis
3. SQL Performance Analysis
Comparing Performance Across Two Periods
"Performance was fine yesterday, but today my application is really slow."
Inconsistent Performance
- Overutilization of system resources
- High-load ad hoc query consuming resources
- Change in the execution plan of a query
- Parallel execution downgrade
Compare Period ADDM
[Diagram labels: AWR Snapshot Period 1, AWR Snapshot Period 2, Compare Period ADDM Analysis, Report, SQL Commonality, Regressed SQL, I/O Bound, Undersized SGA]
- Full ADDM analysis across two AWR snapshot periods
- Detects causes, measures effects, then correlates them
  - Causes: workload changes, configuration changes
  - Effects: regressed SQL, reaching resource limits (CPU, I/O, memory, interconnect)
- Makes actionable recommendations along with quantified impact
Compare Period ADDM: Method
1. Identify what changed
   - Configuration changes, workload changes
   - Example: 30% smaller buffer cache, 10% new SQL
2. Quantify performance differences
   - Uses DB Time as the basis for measuring performance
   - Example: Top SQL increased 45%, read I/O up 55%
3. Identify root cause
   - Correlate performance differences with changes
   - Example: buffer cache reduction caused the read I/O increase
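The "quantify performance differences" step can be illustrated with a minimal Python sketch. It is not how ADDM is implemented. It only assumes that each period can be summarized as a mapping from SQL ID to DB time in seconds, and that "new SQL" means statements present only in the second period. The function name, SQL IDs, and numbers are hypothetical.

```python
def compare_periods(period1, period2):
    """Compare per-SQL DB time (seconds) between two snapshot periods."""
    total1 = sum(period1.values())
    total2 = sum(period2.values())
    if total1 <= 0 or total2 <= 0:
        raise ValueError("both periods need a positive total DB time")

    # SQL seen only in the second period counts as "new SQL" (assumption).
    new_sql = period2.keys() - period1.keys()

    # Percentage change in DB time for statements present in both periods.
    per_sql_change_pct = {
        sql_id: (period2[sql_id] - period1[sql_id]) / period1[sql_id] * 100.0
        for sql_id in period1.keys() & period2.keys()
        if period1[sql_id] > 0
    }
    return {
        "db_time_change_pct": (total2 - total1) / total1 * 100.0,
        "new_sql_pct_of_db_time": sum(period2[s] for s in new_sql) / total2 * 100.0,
        "per_sql_change_pct": per_sql_change_pct,
    }


# Hypothetical DB time per SQL ID, in seconds.
baseline = {"sql_a": 100.0, "sql_b": 50.0}
current = {"sql_a": 145.0, "sql_b": 50.0, "sql_c": 5.0}
result = compare_periods(baseline, current)
```

Expressing every difference as a share of DB time keeps workload changes and SQL regressions on one scale, which is the point the slide makes about using DB Time as the basis.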
Reactive Performance Management
1. Comparing Performance Across Two Time Periods: Compare Period ADDM
2. Database Hang Analysis
3. SQL Performance Analysis
Database Hang Analysis
"My database has hung, and I do not want to bounce it again."
Database Hung State
- Blocking sessions
- Memory allocation issues
- Library cache issues
- Unresponsive storage (ASM)
- Interconnect problems
Real-Time ADDM Architecture
[Diagram labels: EM Agent, Diagnostic Connection, Unresponsive DB, Deadlocks, Hangs, Enterprise Manager, JDBC Connection, Real-time analysis, Database, ADDM Analysis, Latches]
- Uses a pre-established diagnostic connection for unresponsive systems
- Initiates a standard JDBC connection for real-time analysis
- Diagnostic connection collects data without holding latches or running SQL
- First intelligent advisor to diagnose problems in real time as they occur, no matter how sick the system is
Real-Time ADDM
- Real-time analysis of hung or slow database systems
- Holistically identifies global resource contention and deadlocks
- Quantified performance impact
- Precise, actionable recommendations
- Provides cluster-wide analysis for RAC
Reactive Performance Management
1. Comparing Performance Across Two Time Periods: Compare Period ADDM
2. Database Hang Analysis: Real-Time ADDM
3. SQL Performance Analysis
SQL Performance Analysis
"I enabled parallel query, yet this query is taking so long. Can you take a look?"
Parallel Downgrades
- Uncontrolled parallel execution
- Parallel server availability
- Object-level settings
- Session-level settings
Real-Time SQL Monitoring
- Insert executed with a parallel hint
Real-Time SQL Monitoring: Parallel Tab
- Parallel coordinator busy for the entire duration!
Real-Time SQL Monitoring: Enabled Parallel DML
- Parallel slaves busy for the entire duration!
Reactive Performance Management
1. Comparing Performance Across Two Time Periods: Compare Period ADDM
2. Database Hang Analysis: Real-Time ADDM
3. SQL Performance Analysis: SQL Monitoring
Proactive Performance Management
4. Proactively Monitoring Long Running Programs
5. Analyzing Transient Performance Problems
Understanding Workload Profile
6. Correlating ASH & AWR
7. Using ASH Analytics
Reactive Tracing of Long Running Programs
"Can you trace my program?"
What is wrong with tracing?
- A very reactive way of looking at problems
- Overhead of writing data to trace files
- Programs we want to trace are usually the ones with issues
- Impacts the performance of the production system
Real-Time Database Operation Monitoring (NEW)
Database Operation (DBOP)
- Simple DBOP (already supported in 11g)
  - A SQL statement (e.g. SQL for DSS, batch/report SQL, runaway SQL)
  - A PL/SQL procedure or function
- Composite DBOP (new in 12c)
  - Session(s) activity between two points in time, defined by application code or the DBA
  - For example: SQL*Plus script, batch job, ETL processing
- At most one DBOP per DB session
Naming a Database Operation
- Explicit (bracketing): BEGIN_OPERATION, then SQL, PL/SQL blocks, SQL, SQL, then END_OPERATION
- Implicit (naming or tagging): a DBOP tag applied to SQL, PL/SQL blocks, SQL, SQL
Real-Time Database Operations Monitoring
- Database monitoring of application jobs
- Grouping of SQL statements and sessions for the application job
- Key scenarios: ETL operations, quarter-end close job
- Real-time monitoring driven by application-specified tagging
- Automatically tags Data Pump jobs
- Tagging ability in PL/SQL, OCI, JDBC
- Avoids the overhead of SQL Trace
- Visibility of top SQL statements, system and session performance metrics
Proactive Performance Management
4. Reactive Tracing of Long Running Programs: Database Operations
5. Analyzing Transient Performance Problems
Understanding Workload Profile
6. Correlating ASH & AWR
7. Using ASH Analytics
Analyzing Transient Performance Problems
"What happened last night? The batch job took twice the time to finish."
No way to detect transient issues
- We look at AWR data: averaged out over the snapshot window
- On-disk ASH data: sampled every 10 seconds
- Very difficult to detect such issues in the past
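Why averaging hides transient problems can be shown with a small Python sketch. It assumes a one-hour snapshot window described by hypothetical per-minute average active session (AAS) values, with a five-minute spike in the middle. The function names and numbers are illustrative only.

```python
from statistics import mean


def summarize_window(aas_per_minute):
    """Return (average, peak) of average active sessions over a window."""
    return mean(aas_per_minute), max(aas_per_minute)


def spike_minutes(aas_per_minute, threshold):
    """Return the minute offsets whose load exceeds the threshold."""
    return [minute for minute, aas in enumerate(aas_per_minute) if aas > threshold]


# Hypothetical one-hour window: steady load of 2 AAS with a 5-minute spike of 30.
snapshot_window = [2.0] * 30 + [30.0] * 5 + [2.0] * 25

average, peak = summarize_window(snapshot_window)
spikes = spike_minutes(snapshot_window, threshold=10.0)
```

The window average stays close to the steady-state load while the peak is many times higher, so a report built only on the averaged figure gives little hint that anything went wrong.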
Automatic Performance Diagnostics
- ADDM
  - Diagnoses persistent performance issues
  - Uses AWR snapshots
  - Regular interval
  - Automatic or manual
- Compare Period ADDM
  - Coarse-grained performance comparison across two periods
  - Relies on AWR data
  - Manual
- Real-Time ADDM
  - Hung or extremely slow databases
  - Uses a normal and a diagnostic mode connection
  - Manual
- Enhanced Real-Time ADDM (NEW)
  - Proactively detects and diagnoses transient high-impact problems
  - Built inside the database
  - Automatic
  - Runs every 3 seconds
Real-Time ADDM (NEW)
Automatic real-time problem detection and analysis
- Runs every 3 seconds
- Database self-monitors for serious performance issues
- Recognizes bad performance trends and triggers analysis: high CPU, I/O spikes, memory, interconnect, hangs, deadlocks
- Identifies a problem before it threatens application performance
- Short-duration (5-minute spikes) ADDM analysis
- Actionable advice for critical issues
- Richer data set available for analysis
- Reports (analysis and data) stored in AWR for historical analysis: ADDM, SQL Monitoring reports
Triggering Conditions
# | Rule | Condition
1 | High Load | Average active sessions greater than 3 times the number of CPU cores
2 | I/O bound | Impact on active sessions based on single block read performance
3 | CPU bound | Active sessions greater than 10% of total load and CPU utilization greater than 50%
4 | Over-allocated memory | Allocation over 95% of physical memory
5 | Interconnect bound | Based on single block interconnect transfer time
6 | Session Limit | Session limit close to 100%
7 | Process Limit | Process limit close to 100%
8 | Hung Session | Significant number of hung sessions (greater than 10% of total sessions)
9 | Deadlock Detected | Any deadlock detected by hang analyzer
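The rules with explicit numeric thresholds can be sketched as a simple check in Python. This is an illustration of the table, not the database's internal logic. Rules 2, 5, 6 and 7 are left out because the slide gives no exact threshold for them. The `Metrics` fields are hypothetical names, and reading "active sessions greater than 10% of total load" as sessions on CPU relative to average active sessions is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical point-in-time measurements for one database."""
    avg_active_sessions: float
    cpu_active_sessions: float   # assumed: active sessions currently on CPU
    cpu_cores: int
    cpu_utilization_pct: float
    memory_allocated_pct: float  # share of physical memory allocated
    hung_sessions: int
    total_sessions: int
    deadlocks_detected: int


def triggered_rules(m):
    """Return the names of the rules from the table that would fire."""
    rules = []
    if m.avg_active_sessions > 3 * m.cpu_cores:
        rules.append("High Load")
    if (m.avg_active_sessions > 0
            and m.cpu_active_sessions > 0.10 * m.avg_active_sessions
            and m.cpu_utilization_pct > 50):
        rules.append("CPU bound")
    if m.memory_allocated_pct > 95:
        rules.append("Over-allocated memory")
    if m.total_sessions > 0 and m.hung_sessions > 0.10 * m.total_sessions:
        rules.append("Hung Session")
    if m.deadlocks_detected > 0:
        rules.append("Deadlock Detected")
    return rules


# Hypothetical busy system: 30 average active sessions on 8 cores.
busy = Metrics(30.0, 12.0, 8, 85.0, 70.0, 2, 200, 0)
# Hypothetical quiet system.
quiet = Metrics(2.0, 0.1, 8, 20.0, 60.0, 0, 200, 0)
```

Calling `triggered_rules(busy)` reports the high-load and CPU-bound rules, while the quiet system triggers nothing.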
Real-Time ADDM Report
Proactive Performance Management
4. Reactive Tracing of Long Running Programs: Database Operations
5. Analyzing Transient Performance Problems: Real-Time ADDM
Understanding Workload Profile
6. Correlating ASH & AWR
7. Using ASH Analytics
Understanding Workload Profile
"The SQL Response metric crossed the warning threshold. What is wrong?"
Several factors can impact SQL response time
- Increased or unusual load on the system
- Hardware issues
- Runaway queries consuming system resources
- Changes in execution plans
- Missing or stale object statistics
Need a mechanism to quickly analyze in-memory performance data
2009 Amadeus IT Group SA
The largest transaction processor in travel
- Transaction-based business model
- Operates globally in the growing travel and technology market
- Two highly synergistic and profitable businesses: Distribution and IT solutions
Travel buyers
- Consumers / general public
- Corporate travel departments
Travel providers
- 711 airlines (over 420 bookable)
- 24 insurance companies
- 50+ cruise and ferry lines
- 207 tour operators
- 110,000+ hotel properties
- 30 car rental companies
- 95 railways
IT Solutions (including direct distribution technology)
Shared across both businesses
- Common / overlapping platforms & applications
- Common data centre
- Common customers
- Common sales & marketing infrastructure
Distribution Business (provision of indirect distribution services)
- Travel agencies
- Travel management companies
- Business travel agencies
- Leisure travel agencies
- Online travel agencies
- Consolidators
- Single-site agency
- Travel search companies
- Airline sales offices and airline websites connected to Amadeus direct sell technology
Operational Oracle DBs
Some numbers (production only)
- 53 Oracle DBs
- 30 MySQL DBs
- 80 clusters
- 700 TB DB volume
- 4 PB storage volume
Technologies
- Stack 2: Oracle 10.2.0.3 in HP-UX 11.11 (and RHEL), with Symantec Volume Manager and Clusterware; RAC and Single Instance
- Stack 3: Oracle 11.2.0.2 in HP-UX 11.21 and RHEL 5.7, with Oracle Grid Infrastructure; RAC and RAC One
- Stack 4: Oracle 11.2.0.3 in RHEL 5.7, with Oracle Grid Infrastructure; RAC and RAC One
DB Response Time Analysis: AWR
- The AWR Top 5 section shows the wait class that contributes most to DB wait time
- The Foreground Wait Class section in AWR shows the distribution of DB waits over wait classes
- Objects involved in TX row lock contention can be identified in the Segment Statistics section of AWR
From AWR to ASH
- The ASH report for the period in which Application waits increased will show the same waits as AWR
- Can I get the application module that suffered from this type of contention?
Extracting More Data from ASH
- Identify SQL statements and sessions impacted by waits on the Application wait class
Extracting More Data from ASH
- Get a list of blocking sessions and DB objects!
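The kind of breakdown described on these slides can be sketched in Python over ASH sample rows that have already been fetched into memory. The dictionary keys loosely mirror ASH columns such as `sql_id`, `module`, `wait_class` and `blocking_session`, while `current_obj` stands in for the object identifier. The rows themselves, including the module names and IDs, are hypothetical and only illustrate the aggregation.

```python
from collections import Counter


def application_wait_profile(samples):
    """Count Application-class wait samples by module, SQL, blocker and object."""
    app = [s for s in samples if s["wait_class"] == "Application"]
    return {
        "by_module": Counter(s["module"] for s in app),
        "by_sql_id": Counter(s["sql_id"] for s in app),
        "by_blocker": Counter(
            s["blocking_session"] for s in app if s["blocking_session"] is not None
        ),
        "by_object": Counter(s["current_obj"] for s in app),
    }


# Hypothetical ASH samples; each row is one sampled active session.
ash_samples = [
    {"session_id": 101, "sql_id": "sql_upd", "module": "BOOKING",
     "wait_class": "Application", "blocking_session": 77, "current_obj": 5001},
    {"session_id": 102, "sql_id": "sql_upd", "module": "BOOKING",
     "wait_class": "Application", "blocking_session": 77, "current_obj": 5001},
    {"session_id": 103, "sql_id": "sql_del", "module": "PRICING",
     "wait_class": "Application", "blocking_session": 88, "current_obj": 5002},
    {"session_id": 104, "sql_id": "sql_sel", "module": "PRICING",
     "wait_class": "User I/O", "blocking_session": None, "current_obj": 5003},
]

profile = application_wait_profile(ash_samples)
top_module = profile["by_module"].most_common(1)[0][0]
top_blocker = profile["by_blocker"].most_common(1)[0][0]
```

Because each ASH sample represents a slice of active session time, counting samples per dimension gives a rough ranking of which module, statement, blocker, or object accounts for most of the contention.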
Understanding Workload Profile
- Graphical ASH report for advanced analysis
- Provides visual filtering for recursive drill-downs
- Select any time period for analysis
- Analyze performance across many dimensions
- Different visualizations: stacked chart or tree map
- Collaborate with others using Active Reports
Proactive Performance Management
4. Reactive Tracing of Long Running Programs: Database Operations
5. Analyzing Transient Performance Problems: Real-Time ADDM
Understanding Workload Profile: ASH Analytics
6. Correlating ASH & AWR
7. Using ASH Analytics
Preventive Performance Management
8. Prevent Regression After Upgrade
9. Ensure Optimal Resource Allocation
10. Prevent Performance Issues Due To Application Changes
Using SQL Profiles to Regress to an Older Plan
- LinkedIn's ERP systems were being upgraded from 10g to 11g
- Presence of a large amount of custom code
- Limited time frame to complete the upgrade
- Management concern about system performance
- Initial testing showed no major problems or concerns
- A week before go-live, several potential showstopper performance issues were noticed
The Approach
- Rewriting or tuning several pieces of code was not feasible in a short window of time
- Decision to use either SQL Profiles or Baselines to regress to the 10g plan in the interim
Using OEM to Regress Back to the Old Plan
- Run the job that calls the badly performing SQL
- In OEM, open the Performance tab and search for the session by SID (or any other criteria)
Run SQL Tuning Advisor
- Launch Schedule Tuning Advisor by drilling down to the session and clicking the SQL ID
Run SQL Tuning Advisor
- You can compare the old and new explain plans in the same window
- Click Implement to implement the SQL profile. Done!
Preventive Performance Management
8. Prevent Regression After Upgrade: SQL Tuning Advisor
9. Ensure Optimal Resource Allocation
10. Prevent Performance Issues Due To Application Changes
Ensure Optimal Resource Allocation
"In a consolidated environment, how can I ensure one database is not running away with all my system resources?"
- Database Resource Manager directives prevent a single session from running away with all resources
- In DB 12c, CDB-level resource plans ensure optimal resource allocation across PDBs
- Create a resource allocation strategy
- Allocate appropriate CPU and I/O (Exadata) across PDBs
Allocating Resources in DB 12c (NEW)
- No resource allocation
  - Gives maximum flexibility for each PDB
  - Allows any PDB to consume all available resources
  - Risky, as one PDB can run away with all resources
- Specify a minimum allocation
  - Ensures all PDBs get a specific share of the resources
  - Allows any PDB to consume any unused resources
  - Kicks in at 100% resource utilization
  - Assumes that not all PDBs will use their allocated resources
- Specify a minimum and maximum
  - Ensures all PDBs get a specific share of the resources
  - Prevents a PDB from taking more than the maximum value assigned
  - May result in unused capacity
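The difference between the minimum-only and the minimum-and-maximum strategies can be explored with a simplified Python model. It is not Oracle Resource Manager's actual algorithm. It assumes each PDB has a relative `shares` weight and an optional `limit` cap expressed in percent of CPU, loosely mirroring the shares and utilization limit settings of a CDB plan. The PDB names and numbers are hypothetical.

```python
def guaranteed_minimum(plan):
    """Minimum CPU percentage each PDB is entitled to under full contention."""
    total_shares = sum(d["shares"] for d in plan.values())
    return {pdb: d["shares"] / total_shares * 100.0 for pdb, d in plan.items()}


def allocate_cpu(plan, demand):
    """Split 100% CPU by shares, honoring each PDB's demand and optional cap."""
    alloc = {pdb: 0.0 for pdb in plan}
    remaining = 100.0
    active = set(plan)
    while remaining > 1e-9 and active:
        total_shares = sum(plan[p]["shares"] for p in active)
        satisfied = set()
        granted = 0.0
        for p in active:
            offer = remaining * plan[p]["shares"] / total_shares
            ceiling = min(demand[p], plan[p].get("limit", 100.0))
            take = min(offer, ceiling - alloc[p])
            alloc[p] += take
            granted += take
            if ceiling - alloc[p] <= 1e-9:
                satisfied.add(p)  # demand met or cap reached
        remaining -= granted
        active -= satisfied
        if not satisfied:
            break  # every active PDB took its full offer; nothing left over
    return alloc


# Hypothetical plan: pdb_test is capped at 20% of CPU.
plan = {
    "pdb_sales": {"shares": 3},
    "pdb_hr": {"shares": 1},
    "pdb_test": {"shares": 1, "limit": 20.0},
}

contended = allocate_cpu(plan, {"pdb_sales": 100.0, "pdb_hr": 100.0, "pdb_test": 100.0})
sales_idle = allocate_cpu(plan, {"pdb_sales": 10.0, "pdb_hr": 100.0, "pdb_test": 100.0})
only_test_busy = allocate_cpu(plan, {"pdb_sales": 10.0, "pdb_hr": 10.0, "pdb_test": 100.0})
```

In the `sales_idle` case the uncapped PDB absorbs the capacity that the sales PDB leaves unused, while in the `only_test_busy` case the capped PDB stays at its limit and the rest of the machine sits idle, which matches the "may result in unused capacity" trade-off on the slide.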
Setting Up Resource Manager in Oracle Enterprise Manager
- CDB resource plans are simple to manage using the Enterprise Manager UI
Preventive Performance Management
8. Prevent Regression After Upgrade: SQL Tuning Advisor
9. Ensure Optimal Resource Allocation: DB Resource Manager
10. Prevent Performance Issues Due To Application Changes
Prevent Performance Issues Due to Application Changes
"The new BI system has very aggressive SLAs defined. How can we ensure consistent performance across the system?"
- Code migration, new indexes, and new objects can often impact the performance of the application
- How do we validate the performance of critical queries before rolling out these changes?
Validate Impact of Custom Code Migration
[Diagram: Trial 1 runs against State 1; after Custom Code Changes, Trial 2 runs against State 2]
- Use SPA Guided Workflow (recommended) or PL/SQL APIs
- Create a SQL tuning set of the top X (20 or 30) queries
- Establish the first trial remotely using the current state as the baseline
- Make the change: create the indexes or migrate custom code
- Establish the second trial remotely using the same SQL tuning set
- Review the SPA report and roll out or roll back the changes
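The review step at the end of this workflow amounts to a before-and-after comparison per statement. The Python sketch below is not SPA itself. It assumes each trial is available as a mapping from SQL ID to elapsed time in seconds, and it uses a hypothetical 10% threshold to decide what counts as a meaningful change.

```python
def compare_trials(trial1, trial2, threshold_pct=10.0):
    """Classify each SQL present in both trials as improved, regressed or unchanged."""
    report = {"improved": [], "regressed": [], "unchanged": []}
    for sql_id in sorted(trial1.keys() & trial2.keys()):
        before, after = trial1[sql_id], trial2[sql_id]
        if before <= 0:
            continue  # cannot compute a relative change without a baseline
        change_pct = (after - before) / before * 100.0
        if change_pct > threshold_pct:
            report["regressed"].append((sql_id, change_pct))
        elif change_pct < -threshold_pct:
            report["improved"].append((sql_id, change_pct))
        else:
            report["unchanged"].append((sql_id, change_pct))
    return report


# Hypothetical elapsed times (seconds) for the same SQL tuning set.
trial_before = {"q1": 2.0, "q2": 10.0, "q3": 5.0}
trial_after = {"q1": 1.0, "q2": 15.0, "q3": 5.1}

spa_like_report = compare_trials(trial_before, trial_after)
regressed_ids = [sql_id for sql_id, _ in spa_like_report["regressed"]]
```

Running both trials against the same set of statements is what makes the comparison meaningful: any statement that lands in the regressed list is a reason to hold the change back or tune it before rollout.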
Take the Guesswork Out!
- Run your trial before and after migrating the change
- Make sure your most important queries have not regressed
- Take the guesswork out
Preventive Performance Management
8. Prevent Regression After Upgrade: SQL Tuning Advisor
9. Ensure Optimal Resource Allocation: DB Resource Manager
10. Prevent Performance Issues Due To Application Changes: SQL Performance Analyzer