WELCOME Supporting Tuning Measures with the Help of Capacity Management. DOAG SIG Database, 28.02.2013. Robert Kruzynski, Principal Consultant, Partner, Trivadis GmbH, München. BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN 1
Our company. Trivadis is a market leader in IT consulting, system integration and the provision of IT services focusing on and technologies in Switzerland, Germany and Austria. We offer our services in the following strategic business fields: Trivadis Services takes over the interacting operation of your IT systems. 2
With over 600 specialists and IT experts in your region. 11 Trivadis branches and more than 600 employees. 200 Service Level Agreements. Over 4,000 training participants. Research and development budget: CHF 5.0 / EUR 4 million. Financially self-supporting and sustainably profitable. Experience from more than 1,900 projects per year at over 800 customers. Locations: Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Basel, Munich, Vienna, Bern, Zurich, Lausanne. 3
AGENDA 1. Capacity Management and Performance Tuning 2. Real Life Examples 3. Conclusion 4
Capacity Management What do we mean by the term Capacity Management (in the context of database systems)? A process to ensure that the capacity of database systems meets current and future business requirements in a cost-effective manner. Its goals are to avoid resource shortages (they may result in performance and stability problems) and to avoid wastage of resources and overcapacity (both have a negative influence on TCO). Capacity Management implementation is a project. 5
What resources are we talking about? These are the most relevant resources when doing Capacity Management CPU usage - database and database user: CPU usage as a metric - server: busy percentage, load Database and database user I/O - reads & writes - rate in IOs/s - throughput in MB/s - differentiated by small and large operations, including the I/O category (backup, redo logging, archiving, data file, etc.) Memory usage - database instance: SGA, PGA, process memory - server: busy/free memory, swap space 6
What other information do we need? Further information needed in Capacity Management projects Database size Session count Active session count / DB time Redo volume Usage of separately licensed Oracle Database options - Partitioning - Advanced Security - Advanced Compression Operating criteria - Criticality - Availability class - Group membership 7
Capacity Management Approach (1) Record the usage of relevant resources: in bigger environments or on complex database systems we suggest installing TVD-CapMan™ (Trivadis Capacity Manager). Look for resource shortages: high CPU busy, high memory usage, high IO rates (e.g. small SGAs with high IO rates, small DBs with high IO rates). Look for spare capacity: low CPU busy, low memory usage, large instances with potential to decrease the SGA size. 8
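The search for shortages and spare capacity can be sketched as a simple threshold check. This is only an illustration and not how TVD-CapMan works internally: the 80 % and 20 % thresholds, the `ServerSample` record and the sample servers are assumptions, since the slides do not define what counts as "high" or "low".

```python
from dataclasses import dataclass

# Hypothetical thresholds; the slides do not define "high" or "low".
HIGH_PCT = 80.0
LOW_PCT = 20.0


@dataclass
class ServerSample:
    name: str
    cpu_busy_pct: float  # average CPU busy percentage over the observed period
    mem_used_pct: float  # average memory usage percentage over the same period


def classify(sample: ServerSample) -> str:
    """Return 'shortage', 'spare' or 'ok' for one server."""
    if sample.cpu_busy_pct >= HIGH_PCT or sample.mem_used_pct >= HIGH_PCT:
        return "shortage"
    if sample.cpu_busy_pct <= LOW_PCT and sample.mem_used_pct <= LOW_PCT:
        return "spare"
    return "ok"


# Made-up sample data for illustration only.
samples = [
    ServerSample("srv-a", 92.0, 60.0),
    ServerSample("srv-b", 8.0, 15.0),
    ServerSample("srv-c", 45.0, 70.0),
]
report = {s.name: classify(s) for s in samples}
print(report)
```

In practice the thresholds would be agreed per environment, and averages would be complemented by peak values.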
Capacity Management Approach (2) Look for top consumers (databases or database applications). Most important: - IO usage: a few users/queries/tables can have a big impact on the whole environment, especially when using shared storage - CPU usage. Perform proactive performance analysis on top consumers: implement and document performance tuning activities and resulting changes, and check their impact on the usage of resources. If applicable, check the utilization of clustered systems: Can one node handle the whole load? Check the memory (SGA+PGA) and the number of processes. Check CPU and IO usage. 9
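Finding the top consumers amounts to ranking databases by a resource metric and keeping the few that account for most of the load. The sketch below assumes the per-database IO throughput is already available as a dictionary; the function name `top_consumers`, the 80 % cut-off and the sample values are hypothetical.

```python
def top_consumers(io_mb_per_s: dict[str, float], share: float = 0.8) -> list[tuple[str, float]]:
    """Return the smallest set of databases that together cause `share` of total IO."""
    total = sum(io_mb_per_s.values())
    ranked = sorted(io_mb_per_s.items(), key=lambda kv: kv[1], reverse=True)
    selected: list[tuple[str, float]] = []
    running = 0.0
    for name, value in ranked:
        # Stop as soon as the databases selected so far cover the requested share.
        if total == 0 or running / total >= share:
            break
        selected.append((name, value))
        running += value
    return selected


# Made-up throughput figures in MB/s, for illustration only.
usage = {"db1": 600.0, "db2": 100.0, "db3": 20.0, "db4": 10.0, "db5": 5.0}
print(top_consumers(usage))
```

With these sample values a single database already exceeds the 80 % share, which mirrors the observation that a few consumers can dominate a shared storage system.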
Capacity Management Approach (3) After some time: generate charts to see the trends, check the impact of implemented changes, repeat capacity analyses. Further steps may include: support of consolidation activities, sizing new systems, forecasting future system behavior (typically only for critical systems), performance monitoring. 10
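A trend can be quantified with an ordinary least-squares line through periodic averages. The sketch below is one simple way to do this, not the method used for the charts in this talk; the monthly CPU figures are invented.

```python
def linear_trend(values: list[float]) -> tuple[float, float]:
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept); the slope is the change per period.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented monthly averages of CPU cores used by all databases.
monthly_cpu_cores = [40.0, 38.0, 35.0, 33.0, 30.0, 28.0]
slope, intercept = linear_trend(monthly_cpu_cores)
print(f"change per month: {slope:.2f} CPU cores")
```

A negative slope after a tuning change supports the claim that the change reduced resource usage; extrapolating the line far into the future is a much weaker statement and would need more care.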
Extracts from O-TUN Database Performance Troubleshooting and Tuning Performance Analysis The main goal of performance analysis is the identification of one or more bottlenecks. It can be achieved by several analysis methods. The selection of suitable analysis methods depends on several factors: period when the problem occurred, license situation (availability of Diagnostic/Tuning Packs), availability of other tools, personal experience. The next slide contains a block diagram showing the performance analysis methodology suggested in Trivadis course O-TUN 11
Extracts from O-TUN Database Performance Troubleshooting and Tuning
Performance Analysis Methodology (block diagram; only the box labels are reproduced here)
Starting points: Problem occurred in the past; Problem occurs now
Decision: Diagnostic pack (appears twice)
Report and session analysis: Analyse Statspack report; Analyse AWR report and ADDM findings; Analyse ASH report; Identify top/involved sessions; Perform SQL Trace
Detail analysis: Check database key performance indicators (CPU, IO-rate, DB-time, Redo Tx-Rate, Logons...); Identify top wait events; Check memory advisories; Identify hot segments; Identify top/important SQL; Check server resources (Memory, Swap, CPU (user, kernel)); Identify recent changes; Check/tune other databases; Perform SQL tuning; Check optimizer statistics/configuration
Result: Identify the bottleneck
Problem areas: Slow IO; High IO rate; Bad Indexing; High CPU usage; Hot Segments; Inefficient SQL; Memory; Parsing; Locking; High executions; Many logons; Not a database problem
Source: Oracle Database Tuning, Performance Analysis Methodology
Extracts from O-TUN Database Performance Troubleshooting and Tuning Recommendations Performance analysis might discover some bottlenecks. The resulting performance troubleshooting activities look for ways to eliminate or reduce these bottlenecks and will hopefully result in some recommendations. The implementation of a recommendation may vary in difficulty: it could be fully transparent to the application, could require a downtime of the application, the database or the server, or could require an application change. Since performance tuning is an iterative process, the next performance analysis could begin shortly after a recommendation has been implemented 13
What is TVD-CapMan™? TVD-CapMan (Trivadis Capacity Manager) is a software solution that: allows enterprise-wide capacity, performance and resource management, accounting and sizing of Oracle database systems; collects data about servers, database instances and optionally about application sessions; shows the big picture of your Oracle environment in the form of an interactive resource map; allows various analyses. Technical features: uses only standard Oracle features (no extra-cost features); data gathering is agentless; supports Oracle >= 8.1.7; allows gathering data from up to 500 databases on 50 servers per minute. 14
AGENDA 1. Capacity Management and Performance Tuning 2. Real Life Examples 3. Conclusion 15
Project 1 Some Facts About 350 databases Solaris (Sun Enterprise Class) Platform migration to Linux (Blade-Server/Intel/RAC/ASM) Trivadis takes over the responsibility for the operation of databases and database servers Capacity Management goals no wastage of resources, no shortages, no performance escalations operating stability fulfil the SLA (includes capacity management) 16
Project 1 Capacity Management Activities A few hours after the installation of TVD-CapMan, the first reports show that a few databases perform a lot of I/O and some servers have too busy CPUs. A short performance analysis is performed on the top databases with the focus on IO; the resulting recommendations are provided to the customer. During the following months further recommendations were implemented. Performance analysis activities are still ongoing: proactive (based on top consumers) and reactive (based on complaints). Trend reports are generated monthly and discussed with the customer. Implementation of accounting reports based on TVD-CapMan Session Collector. 17
Example 1: Performance Tuning Results
Selected recommendations and their impact on resource usage
DB# | Problem | Recommendation | Impact
314 | Very high IO throughput (peaks up to 600 MB/s) | Indexing | Indexes created on 19.08, peaks reduced to 20 MB/s, CPU usage dropped by 60-70%
201 | Top segment CTXSYS.SYS_IOT_TOP_42740 with 40 million physical reads/h | ALTER TABLE CTXSYS.DR$PENDING MOVE ONLINE | Total DB IO reduced by 40%, CPU by 50-60% (07.07)
252 | Top segment with high IO, users complain about bad performance | ALTER TABLE CACHE, Keep Pool, Index | Implemented on 14.06, total DB IO dropped from 100 MB/s to 10 MB/s
204 | Table SYS_CHANGE_PROTOCOL with 10-20 million reads/h | Index creation | Total DB IO reduced by 50-60% (26.07 and 19.08)
316 | Table M_ARCHIVE with over 10 million reads/h | Index creation, SGA increase | Total DB IO reduced by 40% (26.07), SGA increased on 21.08
177 | Top segments with many full scans and high IO throughput | ALTER TABLE SHRINK SPACE CASCADE | Segment size reduced from 17 GB to 2 MB, total DB IO reduced by 80%, CPU by 40% (02.08)
354 | High IO and CPU usage due to wrong SQL plans | "_fix_control"='7452863:OFF' | Significant reduction of IO and CPU usage
Example 1: Performance Tuning Results DB# Problem Recommendation Impact 314 Very high IO-throughput (peaks up to 600MB/s) Indexing Indexes created on 19.08, peaks reduced to 20MB/s, CPU usage dropped by 60-70% Chart shows the evidence 19
Example 1: Performance Tuning Results DB# Problem Recommendation Impact 201 Top Segment CTXSYS.SYS_IOT_TOP_42740 with 40 million physical reads/h ALTER TABLE CTXSYS.DR$PENDING MOVE ONLINE Total DB IO reduced by 40%, CPU by 50-60% (07.07) Chart shows the evidence 20
Example 1: Performance Tuning Results DB# Problem Recommendation Impact 204 Table SYS_CHANGE_PROTOCOL with 10-20 million reads/h Index creation Total DB IO reduced by 50-60% (26.07 and 19.08) Chart shows the IO throughput of two databases (prod and test). The second recommendation has been implemented only in test 21
Example 1: Performance Tuning Results DB# Problem Recommendation Impact 354 High IO and CPU usage due to wrong SQL plans "_fix_control"='7452863:OFF' Significant reduction of IO and CPU usage Chart shows the evidence 22
Example 2: Trend Analysis All databases on all servers Reduction of the DB CPU-Usage is clearly visible 23
Example 2: Trend Analysis All databases on all servers Comparison of DB CPU-Usage and User Calls (number of executed statements) 24
Example 2: Trend Analysis All databases on all servers IO-throughput could be reduced 25
Example 3: Big Picture CPU Usage at project start: a few servers are red, some are yellow Top Database 26
Example 3: Big Picture CPU Usage three months later: no red servers 27
Example 4: RAC Cluster Utilization Average of one day black: free memory blue: SGA pink: PGA 28
Example 4: RAC Cluster Utilization Average of the same day black: free CPUs green: CPUs used by DB orange: non DB CPU conclusion: RAM/core factor is too low 29
Example 5: Backup-IO Reduction Scheduling of database backups has been optimized A new software component decides if and when a backup should be started Charts show weekly averages 6 MB/s ~ 506 GB/day ~ 15 TB/month 30
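The conversion from an average throughput to daily and monthly volumes can be reproduced as follows. The slide's figures appear to be consistent with binary units (1 GB = 1024 MB) and a 30-day month; both are assumptions made here, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def mb_per_s_to_gb_per_day(mb_per_s: float) -> float:
    """Average throughput in MB/s -> volume in GB/day (assuming 1 GB = 1024 MB)."""
    return mb_per_s * SECONDS_PER_DAY / 1024


def gb_per_day_to_tb_per_month(gb_per_day: float, days: int = 30) -> float:
    """Daily volume in GB -> monthly volume in TB (assuming 1 TB = 1024 GB)."""
    return gb_per_day * days / 1024


gb_day = mb_per_s_to_gb_per_day(6)            # 506.25 GB/day
tb_month = gb_per_day_to_tb_per_month(gb_day)  # about 14.8 TB/month
print(f"{gb_day:.0f} GB/day, {tb_month:.1f} TB/month")
```

The point of the calculation is that even a modest average rate of a few MB/s adds up to a two-digit number of terabytes per month.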
Project 2 Some Facts A few Sun M9000 servers with zones large databases (10-20 TB each) long running batch jobs Goals transparency - which application occupies the database servers - when and how many batch jobs can be started (avoid overloading the system) - what are the system limits performance monitoring platform migration planned 31
Project 2 Capacity Management Activities Focus on one server with three large databases Installation of TVD-CapMan including the Session Data Collector Many interesting findings who does the IO backup scheduling issues curious database parameters Performance analysis on important batch jobs resulting recommendations passed to software vendor one quarterly batch job improved from ~1 week to ~36 hours some recommendations still not in production Performance monitoring 32
Example 6: Identification of batch jobs Batch job at work: 10-15 active sessions use 8-10 CPUs Schema information (extracted by TVD-CapMan) was very important in order to identify the batch job Wait time about 40% of the DB time 33
Example 7: Recommendations Performance tuning recommendations: Implement some tables as IOTs (index-organized tables): this saves the maintenance of a few indexes and many table-access-by-rowid operations. Drop some indexes. Compress some tables: use Basic Table Compression in order to reduce the table size by a factor of 2-3; it will save space and IO and reduce the duration of a full scan. 34
Example 8: Performance Monitoring Goals Near real-time view of the whole environment - Application servers (Java on Windows/VMware) - Databases - Database servers Give answers to the following questions - Which application server is currently running a batch operation? - What is the impact of this operation on the database server? Data is provided by TVD-CapMan Reports implemented in APEX 35
Example 8: Performance Monitoring 36
Project 3 Some Facts About 3000 databases Goal reduction of the number of CPU cores consolidation of databases with the Partitioning Option First analysis showed In some departments there is a high potential to reduce the number of servers RAM/core ratio has to be increased when ordering new servers Databases using the Partitioning Option are distributed over many servers Consolidation/migration recommendations could be provided 37
Example 9: Consolidation in a department 26 database servers with 90 production databases Hardware - 22 * 8 core/64 GB (4 without Hyper-Threading) - 2 * 12 core/256 GB - 2 * 24 core/256 GB - Total number of CPU cores: 248 DB size: 23500 GB Total Oracle memory demand: 433 GB - SGA 350 GB - PGA 35 GB - Max. sessions 6000 (~48 GB RAM) Maximum CPU usage: 20-30 CPU cores Maximum IO: 15000 IOs/s and 1200 MB/s (high IO intensity) 38
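The totals on this slide can be re-derived from the listed hardware and memory components. The sketch below only redoes the arithmetic; the derived RAM-per-core and peak-utilization figures are not stated on the slide, and the variable names are illustrative.

```python
# (number of servers, CPU cores per server, RAM in GB per server), as listed on the slide
hardware = [(22, 8, 64), (2, 12, 256), (2, 24, 256)]

total_servers = sum(n for n, _, _ in hardware)            # 26
total_cores = sum(n * cores for n, cores, _ in hardware)  # 248
total_ram_gb = sum(n * ram for n, _, ram in hardware)     # 2432

# Oracle memory demand: SGA + PGA + session memory (6000 sessions, ~48 GB)
oracle_memory_gb = 350 + 35 + 48                          # 433

# Derived figures (not on the slide)
ram_per_core_gb = total_ram_gb / total_cores              # about 9.8 GB per core
peak_cpu_share = 30 / total_cores                         # upper bound of 20-30 cores, about 12 %

print(total_servers, total_cores, total_ram_gb, oracle_memory_gb)
print(f"{ram_per_core_gb:.1f} GB/core, peak CPU share {peak_cpu_share:.0%}")
```

Both derived figures point the same way as the slide: the installed CPU capacity is far above the observed peak demand.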
Example 9: Consolidation in a department Partitioning option used by 19 of 90 databases On 128 of 248 CPU cores (without considering the cluster partners!) DB CPU-Usage Partitioning: up to 15 CPU-Threads Without Partitioning: up to 15 CPU-Threads IO-Rates Partitioning: 3000-10000 IO/s Without Partitioning: 5000-10000 IO/s 39
Example 9: Consolidation in a department Consolidation from 26 to 12 database servers is possible Recommended hardware: 6 CPU cores/128 GB RAM per server, 6 two-node clusters Total number of CPU cores: 72 (currently 248!) Consolidation to 6 servers (12 CPU cores/256 GB RAM) is not recommended: high IO intensity; expected long backup duration because of the database size and compressed backup sets Separation of databases with and without partitioning: 3 clusters for databases with partitioning, 3 clusters for databases without partitioning 40
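A rough plausibility check of the recommended configuration against the department-wide peak figures from the previous slides might look like this. It is a sketch under stated assumptions: it works on aggregates only, ignores operating system memory and the placement of individual databases, and assumes as a worst case that one node of every cluster is down at the same time.

```python
# Figures taken from the slides
peak_cpu_cores = 30       # upper bound of the observed 20-30 cores
oracle_memory_gb = 433    # SGA + PGA + session memory
clusters = 6
nodes_per_cluster = 2
cores_per_server = 6
ram_per_server_gb = 128

servers = clusters * nodes_per_cluster        # 12
total_cores = servers * cores_per_server      # 72
total_ram_gb = servers * ram_per_server_gb    # 1536

# Worst-case assumption (not from the slides): one node per cluster has failed.
surviving_servers = clusters * (nodes_per_cluster - 1)       # 6
surviving_cores = surviving_servers * cores_per_server       # 36
surviving_ram_gb = surviving_servers * ram_per_server_gb     # 768

cpu_ok = surviving_cores >= peak_cpu_cores
memory_ok = surviving_ram_gb >= oracle_memory_gb
print(f"cores: {total_cores} total, {surviving_cores} after failover, enough: {cpu_ok}")
print(f"RAM:   {total_ram_gb} GB total, {surviving_ram_gb} GB after failover, enough: {memory_ok}")
```

On these aggregates the proposal still covers the peak demand with half of the nodes down; a real sizing would repeat the check per cluster with per-database figures.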
AGENDA 1. Capacity Management and Performance Tuning 2. Real Life Examples 3. Conclusion 41
Conclusion Performance Tuning Most frequent tuning recommendations Indexing Improved caching Segment reorganization Optimizing data structures (e.g. IOTs) Optimizer statistics (e.g. system statistics) SQL profiles SQL tuning (syntax changes) Instance/session parameter changes 42
Conclusion Capacity Management Capacity Management efforts shown here: provide transparency (who is using how much), frequently result in performance analyses, support performance tuning decisions, show the impact of implemented recommendations, support consolidation decisions. More sophisticated Capacity Management methods exist and can be used in eligible situations, e.g. sizing new hardware, forecasting resource usage/utilization for critical database systems. TVD-CapMan™ essentially supports capacity management activities in various projects 43
THANK YOU. Robert.Kruzynski@trivadis.com www.trivadis.com 44