Real-World Performance Training Extreme OLTP Performance

Similar documents
Real-World Performance Training Core Database Performance

Oracle 1Z0-054 Exam Questions and Answers (PDF) Oracle 1Z0-054 Exam Questions 1Z0-054 BrainDumps

SQL Gone Wild: Taming Bad SQL the Easy Way (or the Hard Way) Sergey Koltakov Product Manager, Database Manageability

Oracle EXAM - 1Z Oracle Database 11g: Performance Tuning. Buy Full Product.

Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Anthony AWR report INTERPRETATION PART I

Oracle Database 12c: Performance Management and Tuning

Oralogic Education Systems

EZY Intellect Pte. Ltd., #1 Changi North Street 1, Singapore

What is Real Application Testing?

Oracle Database 10g The Self-Managing Database

Database Performance Analysis Techniques Using Metric Extensions and SPA

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

<Insert Picture Here> Maximizing Database Performance: Performance Tuning with DB Time

What every DBA needs to know about JDBC connection pools Bridging the language barrier between DBA and Middleware Administrators

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

Moving Databases to Oracle Cloud: Performance Best Practices

Oracle Database 11g: SQL Tuning Workshop

KillTest *KIJGT 3WCNKV[ $GVVGT 5GTXKEG Q&A NZZV ]]] QORRZKYZ IUS =K ULLKX LXKK [VJGZK YKX\OIK LUX UTK _KGX

The Role of Database Aware Flash Technologies in Accelerating Mission- Critical Databases

Exadata X3 in action: Measuring Smart Scan efficiency with AWR. Franck Pachot Senior Consultant

End-to-end Management with Grid Control. John Abrahams Technology Sales Consultant Oracle Nederland B.V.

Toad for Oracle Suite 2017 Functional Matrix

Managing Oracle Database 12c with Oracle Enterprise Manager 12c

Heckaton. SQL Server's Memory Optimized OLTP Engine

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 12

<Insert Picture Here> DBA s New Best Friend: Advanced SQL Tuning Features of Oracle Database 11g

Diagnostics in Testing and Performance Engineering

Understanding Oracle RAC ( ) Internals: The Cache Fusion Edition

Oracle Performance Tuning. Overview of performance tuning strategies

Real-World Performance 2017

In the Oracle Database 12c: Performance Management and

<Insert Picture Here> DBA Best Practices: A Primer on Managing Oracle Databases

Katharina Römer Principal Sales Consultant STCC Stuttgart ORACLE Deutschland GmbH

What every DBA needs to know about JDBC connection pools * Bridging the language barrier between DBA and Middleware Administrators

Oracle Diagnostics Pack For Oracle Database

Oracle Database 11g: Real Application Testing & Manageability Overview

Oracle Database 12c Performance Management and Tuning

Session 1079: Using Real Application Testing to Successfully Migrate to Exadata - Best Practices and Customer Case Studies

<Insert Picture Here> Controlling resources in an Exadata environment

Oracle Database 11g: Self-Managing Database - The Next Generation

Automatic Workload Repository: Soup to Nuts. Graham Wood Doug Burns

Oracle Database 12c: JMS Sharded Queues

Using Automatic Workload Repository for Database Tuning: Tips for Expert DBAs. Kurt Engeleiter Product Manager

Configuring the Oracle Network Environment. Copyright 2009, Oracle. All rights reserved.

Best Practices. Deploying Optim Performance Manager in large scale environments. IBM Optim Performance Manager Extended Edition V4.1.0.

Oracle Database 11g: Performance Tuning DBA Release 2

RAC Performance Monitoring and Diagnosis using Oracle Enterprise Manager. Kai Yu Senior System Engineer Dell Oracle Solutions Engineering

Oracle 1Z Oracle Database 11g- Administrator I. Download Full Version :

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. reserved. Insert Information Protection Policy Classification from Slide 8

Optimize Your Databases Using Foglight for Oracle s Performance Investigator

IBM Education Assistance for z/os V2R2

Jyotheswar Kuricheti

Using Oracle STATSPACK to assist with Application Performance Tuning

Monitoring & Tuning Azure SQL Database

VMWARE VREALIZE OPERATIONS MANAGEMENT PACK FOR. Amazon Aurora. User Guide

1z0-062.exam.215q 1z0-062 Oracle Database 12c: Installation and Administration

Oracle 10g Self-Management Framework Internals: Exploring the Automatic Workload Repository. Open World September 2005

EMC Unisphere for VMAX Database Storage Analyzer

Configuring JDBC data-sources

(10393) Database Performance Tuning Hands-On Lab

Exadata Implementation Strategy

Enterprise Manager: Scalable Oracle Management

Oracle Database 10G. Lindsey M. Pickle, Jr. Senior Solution Specialist Database Technologies Oracle Corporation

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in

MAXGAUGE for Oracle Web Version 5.3

Learning Objectives : This chapter provides an introduction to performance tuning scenarios and its tools.

What's new in MySQL 5.5? Performance/Scale Unleashed

InnoDB: Status, Architecture, and Latest Enhancements

Oracle Database 12c Rel. 2 Cluster Health Advisor - How it Works & How to Use it

Manage Change With Confidence: Upgrading to Oracle Database 11g with Oracle Real Application Testing

Managing Oracle Real Application Clusters. An Oracle White Paper January 2002

PERFORMANCE TUNING TRAINING IN BANGALORE

<Insert Picture Here> Managing Oracle Exadata Database Machine with Oracle Enterprise Manager 11g

Architecting & Tuning IIB / extreme Scale for Maximum Performance and Reliability

Oracle Database 11g: Performance Tuning DBA Release 2

Essential (free) Tools for DBA!

Exadata Implementation Strategy

Microsoft SQL Server Fix Pack 15. Reference IBM

The Oracle DBMS Architecture: A Technical Introduction

1Z Upgrade to Oracle Database 12cm Exam Summary Syllabus Questions

Performance improvements in MySQL 5.5

Removing the I/O Bottleneck in Enterprise Storage

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!

Database Management and Tuning

1-2 Copyright Ó Oracle Corporation, All rights reserved.

Oracle Database 11g: Manageability Overview. An Oracle White Paper August 2007

Data Management in Application Servers. Dean Jacobs BEA Systems

Oracle Database 18c and Autonomous Database

OpenEdge 12.0 Database Performance and Server Side Joins. Richard Banville Fellow, OpenEdge Development October 12, 2018

Oracle Database 11g : Performance Tuning DBA Release2

Oracle 1Z0-497 Exam Questions and Answers (PDF) Oracle 1Z0-497 Exam Questions 1Z0-497 BrainDumps

Oracle Enterprise Manager Cloud Control 12 c Oracle Enterprise Manager Grand Tour Hands On Lab

1Z Oracle Database Performance and Tuning Essentials 2015 Exam Summary Syllabus Questions

Manjunath Subburathinam Sterling L2 Apps Support 11 Feb Lessons Learned. Peak Season IBM Corporation

ORACLE ENTERPRISE MANAGER 10g ORACLE DIAGNOSTICS PACK FOR NON-ORACLE MIDDLEWARE

Rhapsody Interface Management and Administration

Outline. Database Tuning. Ideal Transaction. Concurrency Tuning Goals. Concurrency Tuning. Nikolaus Augsten. Lock Tuning. Unit 8 WS 2013/2014

Key to A Successful Exadata POC

Optimizing Database I/O

Transcription:

Real-World Performance Training Extreme OLTP Performance Real-World Performance Team

Extreme OLTP Workloads Small transactions Processing small numbers of rows Fast (single-digit millisecond) response times Large numbers of users It s all about efficiency Get-in-get-out Be nice to others

Agenda 1 2 3 4 5 6 7 Connection Pools Serialization Leaking Database / Middleware Interaction Extreme IO Tools for Diagnostics Analysis

Agenda 1 2 3 4 5 6 7 Connection Pools Serialization Leaking Database / Middleware Interaction Extreme IO Tools for Diagnostics Analysis

Connection Pools Architectural Challenge The primary reason for escalation of OLTP systems into the Real-World Performance Group is poor connection pooling strategies. Symptoms of a poor connection strategy: A high number of connections to the database ( 1,000s ) A dynamic connection pool with a large number of logon/off to the database ( > 1/Sec ) Periods of acceptable performance and then unexplainable/undebuggable periods of poor performance/availability The inability to determine what is happening in real time

Connection Pools Connection Pools

Connection Pools Performance Data The workload is incrementally increased by doubling the load. System appears scalable up to 65% CPU on the DB server.

Connection Pools Performance Data A checkpoint is initiated, creating a spike that results in unpredictable response time

Connection Pools Performance Data A slight increase to the workload results in a disproportionate CPU increase, while response time degrades and the system througput becomes unstable. System monitoring tools become unreliable

Connection Pools Performance Data Reducing the connection pool by 50% results in more application server queuing and fewer DB processes in a wait state. Observable improvement in response time and a consistent transaction rate.

Connection Pools Performance Data Another slight increase to the workload results in a an unstable system with spikes in response time and throughput

Connection Pools Performance Data Connection pool reduced to 144. Note improvement in response time and transaction rate. CPU utilization is also reduced.

Connection Pools Staying in Control In order to stay in control of an OLTP system, the DBA must be able to: Query the system Extract trace files Debug issues Without these tools, when a race condition occurs, it becomes impossible to determine the root cause of any problem or to prevent the problem from reoccurring Resource Management: Learning how to manage and control resources RM is not a turbo, it is a gatekeeper RM will slow/stop some processes so that other process can run efficiently

Connection Pools Control By reducing the CPU_COUNT in the resource manager, the database can be throttled back. Note the increase in response time and the wait event resmgr: cpu quantum

Connection Pools Race Conditions Connection Storms SQL Plan degradations

Connection Pools Connection Storms May be caused by application servers that allow the size of the pool of database connections to increase rapidly when the pool is exhausted A connection storm may take the number of connections from hundreds to thousands in a matter of seconds The process creation and logon activity may mask the real problem of why the connection pool was exhausted and make debugging a longer process A connection storm may render the database server unstable, unusable

Connection Pools The Anatomy of a Connection Storm

Connection Pools The Anatomy of a Connection Storm

Connection Pools The Anatomy of a Connection Storm

Connection Pools SQL Plan Degradations SQL Plan changes can be caused by multiple issues: Change in init.ora parameters that influence the optimizer RDBMS version change New Schema statistics Bind Peeking Poor SQL can also be introduced by: Badly tested new application code Tools version changes All have the same impact they introduce SQL that is executing sub optimally resulting in increases of: CPU I/O

Connection Pools SQL Plan Degradations The increases in CPU and I/O can be orders of magnitude greater than the optimal SQL Plan The impact is usually to overload the system beyond the ability of the system to respond When the number of concurrent sub optimal SQL statements > CPUs a race condition will take place Symptoms include System appears hung Unable to get keyboard response from the Database Server even at the console To regain control you may have pull the power from the HW

Connection Pools SQL Plan Degradations SQL Plan degradations are going to happen, it is important to be able to diagnose them and obtain the debug info before you loose the system. To assist in this process there things that can be done to help set init.ora(cpu_count) to 75% of the actual CPUs and invoke the default resource plan or construct a resource plan leaving some resources idle Limit connections to the DB to 2-3 times the number of Cores Ensure a DB connection pool cannot grow under load

Agenda 1 2 3 4 5 6 7 Connection Pools Serialization Leaking Database / Middleware Interaction Extreme IO Tools for Diagnostics Analysis

Serialization Observations The current system performance is acceptable New requirement: Capture transaction summary statistics and store data in a flattened history table for further analysis Implementations of the INSERT statement causes a dramatic reduction in system performance The accepted solution must scale, as the application will be deployed globally in the new year

Serialization

Serialization Performance Data Transaction rate: 52,000 TPS Response time: 2 ms Dominant wait events: log file sync and CPU

Serialization Performance Data Transaction rate: 5,300 TPS Response time: 18 ms Dominant wait events: buffer busy waits and enq:tx index contention

Serialization Reverse Key Index Transaction rate: 19,000 TPS Response time: 5 ms Contention reduced while the history table remains small. CPU increases with workload.

Serialization Reverse Key Index Transaction rate: 3,000 TPS Response time: 33 ms Contention shifts to I/O subsystem when the history table grows larger

Serialization Hash Partition Index Transaction rate: 20,000 TPS Response time: 5 ms Contention reduced by hash partitioned history table index

Serialization Add RAC node Transaction rate: 21,000 TPS Response time: 9 ms RAC appears not to scale well with contention in cluster waits

Serialization Composite Key Index Transaction rate: 35,000 TPS Response time: 5 ms RAC configuration scales well with no contention or gc waits

Serialization Data Analysis INSERT statement results in index contention due to surrogate primary key generated by using a sequence Solution 1: REVERSE KEY primary key index This proves unsatisfactory after the table has grown to a critical size Solution 2: HASH PARTITION primary key index This proves unsatisfactory after moving to a RAC cluster Solution 3: Code the sequence to ensure no index contention New sequence structure: instance# session id sequence Solution 3 demonstrates optimal scaling and response time characteristics

The Scalable Key Code INSERT INTO clo_normal.log_big_clever VALUES ( 1E+25 * USERENV('INSTANCE') + 1E+20 * mod(userenv('sid'),100) + clo_normal.big_clever_seq.nextval, 0,0,'card id','hallid','tid',sysdate,1,12,'wheelpos',123456,123456,123456,0,1);

Serialization Performance Testing Results 25000 35 20000 30 25 15000 10000 20 15 Tx Rate per sec Response Time (ms) 10 5000 5 0 Normal Reverse Key Small Reverse Key Big Hash Big Hash Big 2 Nodes Smart Big 2 Nodes 0

Response Time Debugging by Numbers Response time impacted by non database code Application server time Code problem Capacity planning Response time impacted by database internal contention Right growing index problem Reverse Key Index ( Good for small tables ) Hash partitioned Index ( Good for large tables, poor with RAC ) Non contentious primary key ( Good in all scenarios, application code change required )

Agenda 1 2 3 4 5 6 7 Connection Pools Serialization Leaking Database / Middleware Interaction Extreme IO Tools for Diagnostics Analysis

Leaking

Leaking Observations Intermittent error: ORA-01000: Maximum number of cursors exceeded. Application server fails and must be restarted The DBA has suggested that the init.ora parameter open_cursors be reset to 30,000 to make the problem go away for a while. Symptoms of cursor leaking

Leaking Performance Data Error message: ORA-01000 Maximum open cursors exceeded SQL*Net break/reset to client

Leaking Session Details SQL*Net break/reset to client

Leaking Cursor Data Cursor list with Count > 1 implies leaked cursors

Leaking Observations After a period of time, the system performance begins to decline and then degrades rapidly After rapid degradation, the application servers time out and the system is unavailable The DBA claims the database is not the problem and simply needs more connections The init.ora parameter processes is increased to 20,000

Leaking Performance Data Load diminishes to zero

Leaking Session Leaking Due to coding errors on exception handling, the application leaks connections in the connection pool making them programmatically impossible to use This reduces the effective size of the connection pool The remaining connection are unable to keep up with the incoming workload The rate of connection leakage is accelerated until there are no useable connections left in the pool

Leaking Session Leaking Potential indicators of session leaking: Frequent application server resets init.ora parameters process and sessions set very high Configuration of large and dynamic connection pools Large number of idle connections connected to the database Free memory on database server continually reduced Presence of idle connection kill scripts or middleware configured to kill idle sessions

Leaking Observations Without warning, the database appears to hang and the application servers time out simultaneously The DBA sees that all connections are waiting on a single lock held by a process that has not been active for a while. Each time the problem occurs, the DBA responds by running a script to kill sessions held by long time lock holders and allowing the system to restart.

Leaking Performance Data Leaked lock holder causes entire system to stall

Leaking Blocking Tree Block tree reveals lock holder which can be killed

Leaking Performance Data enq: TX row lock contention

Leaking Performance Data System returns to normal when lock is released. There could have been logical data corruption and it may happen again

Leaking Lock Leaking Lock leaking is usually a side effect of session leaking and the exception handling code failing to execute a commit or rollback in the exception handling process. A leaked session may be programmatically lost to the connection while holding locks and uncommitted changes to the database.

Leaking Lock Leaking Programming error impact: Potential system hangs: all connections queue up for the held lock Potential database logical corruptions: end users may have thought transactions were committed when in fact they have not been If sessions return to the connection pool but still have uncommitted changes, it is not deterministic, if and/or when the changes are committed or rolled back. This is a serious data integrity issue.

Leaking How to Develop High Performance Applications Developer Bugs Incorrect/untested exception handling Cursor, session and lock leaking High values for init.ora ( open_cursors, processes, sessions ) Idle process and lock holder kill scripts Oversized connection pools of largely idle processes

Agenda 1 2 3 4 5 6 7 Connection Pools Serialization Leaking Database / Middleware Interaction Extreme IO Tools for Diagnostics Analysis

Database / Middleware Interaction

Database / Middleware Interaction Scenario Devices ship files. Files read and processed by multiple application servers Each application server uses multiple threads that connect to database through a connection pool which is distributed by a scan listener over two instances.

Database / Middleware Interaction Problem It s too slow It s a problem with the database Look at all those waits Need to be able to process an order of magnitude more data Obviously need to move to Hadoop

Database / Middleware Interaction Analysis Only small amount of data being processed. Both instances essentially idle with most processes waiting in RAC and concurrency waits.

Database / Middleware Interaction Solution Remove all of those RAC waits by running against a single database instance.

Database / Middleware Interaction Analysis Throughput up by factor of 2x RAC waits gone CPU time actually visible High concurrency waits Buffer busy Tx index contention

Database / Middleware Interaction Solution Reduce contention waits by processing a file entirely within a single application server

Database / Middleware Interaction Analysis Throughput improved again Concurrency events reduced but still present

Database / Middleware Interaction Solution Introduce affinity for a related set of records to a single thread by hashing All records for the same primary key processed by single thread so no contention in index for same primary key vale

Database / Middleware Interaction Analysis More throughput Log file sync predominant event CPU usage close to core count

Database / Middleware Interaction Solution Reintroduce RAC to add more CPU resource Implement separate service for each instance Connect application server to one instance

Agenda 1 2 3 4 5 6 7 Connection Pools Serialization Leaking Database / Middleware Interaction Extreme IO Tools for Diagnostics Analysis

Extreme IO

Extreme IO Performance Data Database slow down is magnified in application server Erratic database performance

Extreme IO Performance Data Note smoothing effect of async logging. Beware of data integrity issues! log file sync events have been replaced with log buffer space events

Extreme IO Resolution Performance is consistent as flash cache protects system from I/O surges With uniform log file sync events, we now see a slight slow down of other DB I/O

Extreme IO Data Analysis Average response times for log writes mask outlier values of very high write times > 1 sec ( log file parallel write ) Investigation reveals surges in workloads on shared storage from unrelated workloads The business requires full data integrity. Asynchronous logging is not an option due to potential data loss There are three potential solutions: Eliminate background workloads Relocate log files to dedicated devices Use Exadata flash logging

Agenda 1 2 3 4 5 6 7 Connection Pools Serialization Leaking Database / Middleware Interaction Extreme IO Tools for Diagnostics Analysis

Main Tools for OLTP Diagnostics AWR Report Diagnostic information on the instance or database level Information for the system as a whole ADDM Report Recommendations based on AWR information ASH Report More granular information such as the session level

AWR Report Instance, database reports of database activity HTML reports are preferred SQL>REM SQL>REM For Current Database Instance SQL>@?/rdbms/admin/awrrpt.sql SQL>REM SQL>REM For Full RAC Clusterwide Report SQL>@?/rdbms/admin/awrgrpt.sql SQL>REM SQL>REM For SQL Exec History(Detecting Plan Changes) SQL>@?/admin/awrsqrpt.sql

AWR Report

AWR Report EM Access to Reports AWR report can be accessed by navigating Database Tab Automatic Workload Repository Select snapshots You may find the command line interface is quicker!

AWR Report EM Access to Reports

ADDM Report ADDM Performs performance diagnostic analysis and makes recommendations for improvement. The ADDM report should accompany any AWR report as a matter of standard practice SQL>REM SQL>REM For Current Database Instance SQL>@?/rdbms/admin/addmrpt.sql

ADDM Report

ADDM Report EM Access to Reports ADDM report can be accessed by navigating Performance Tab Historical Right hand drop down menu Select snapshot (Run ADDM) Button

ADDM Report EM Access Select Historical

ADDM Report EM Access Choose analysis icon of interest Run ADDM Report

ASH Report ASH Active Session History reports can provided fine granularity tightly scoped reports e.g. for a short time period (< AWR interval) or for an individual session or a particular module. SQL>REM SQL>REM For Current Database Instance SQL>@?/rdbms/admin/ashrpt.sql

ASH Report

ASH Report EM Access to Reports ASH report can be accessed by navigating Performance Tab Top Activity Link Slide moving window over period of interest (Run ASH Report) Button

ASH Report Select Top Activity Link

ASH Report Slide to period of Interest Run ASH Report

ASH Report

Agenda 1 2 3 4 5 6 7 Connection Pools Serialization Leaking Database / Middleware Interaction Extreme IO Tools for Diagnostics Analysis

AWR Architecture Analysis More than just wait events and top SQL Large amount of data in the AWR report Tells us about the way that the system has been architected and designed as well as about how it is performing Often see common mistakes

AWR from online system Ready for Black Friday?

AWR from Online system Testing system for Black Friday readiness Cannot generate load expected on test system Do you see any problems with this system scaling up from this test? Will we survive Black Friday?

AWR Header First-level bulleted text is Calibri 28 pt Second-level text (press tab key) is 24 pt Calibri is the only font used in the template All bulleted text is sentence case (capitalize first letter of first word) Use bold, red, or both to highlight ONLY KEY text 32 Cores available Over processed Sessions is 100x cores Session count growing Session leak Dynamic connection pools Cursors per session growing Cursor leakage

Load Profile ~260 sessions active on average ~40 on CPU Only have 32 cores System CPU limited 10 logons per second In a stable system? Session leaks Dynamic connection pools 40% of user txns are rollbacks Coding for failure

Init.ora Underscore parameters Db_block_size=16384 Cursor_sharing=FORCE Db_file_multiblock_read_count=32

Init.ora Db_writer_processes=12 On a system that supports asynchio? Open_cursors=2000 Per session limit Implies cursor leaking

Init.ora Optimizer_index_cost_adj=50 Classic hack parameter Processes=5500 Sessions=8320

Top events Where is the time going? Concurrency waits > 35% of time Library cache: mutex X Latch:row cache objects Typical of high CPU load A symptom, not the problem Row lock contention 18% of time IO with 7ms avg read time CPU only 15% of DB Time Log file sync?

Top SQL Where is the time going? Top statement SELECT /*SHOP*/ YFS_ORDER_HEADER.* FROM YFS_ORDER_HEADER WHERE (ORDER_HEADER_KEY = :1 ) FOR UPDATE 12% of load 2 million executions Average execution 0.1 sec

Top SQL Top statement SELECT /*SHOP*/ YFS_ORDER_HEADER.* FROM YFS_ORDER_HEADER WHERE (ORDER_HEADER_KEY = :1 ) FOR UPDATE 40% of RLWs are on that object

Top SQL Next two statements Call of DBMS_APPLICATION_INFO Application instrumentation 14% of load 26M executions each Instrumentation is a good thing BUT Not needed since Oracle 10g Use parameters to OCI or Java instead

Other SQL

Online Summary Not looking good for Black Friday System is CPU bound at test load levels System seems to be leaking both cursors and sessions (and maybe locks) System is running far too many processes High overhead application instrumentation

Reference Section

Operating System Diagnostics ExaWatcher and OSWatcher Captures Operating system data top vmstat netstat iostat mpstat Described in MOS Notes 1617454.1 and 301137.1. Note that Exadata deployment is a modified version of OSWatcher that is installed by default on all database nodes and storage cells.

Operating System Diagnostics ExaWatcher and OSWatcher Lives in /opt/oracle.exawatcher /opt/oracle.oswatcher/osw The collected data will be under /opt/oracle.exawatcher/archive /opt/oracle.oswatcher/osw/archive one subdirectory per collector Should purge data after seven days