iDoctor for IBM i - Practical examples
Morten Buur Rasmussen, Power Performance Specialist, IBM Lab Services Europe
Goal of this 45-minute presentation
This presentation gives practical examples of how to use iDoctor and Collection Services Investigator to display performance data, so that you can understand and solve performance problems. You will also see examples of iDoctor and Job Watcher, which provide deeper information about jobs and threads that can help you solve performance problems.
Examples are important
Performance disclaimer: it depends
Performance information and recommendations in this presentation are based on measurements, analysis, and projections in customer environments for specific performance workloads and may not apply to other installations. Your results may vary significantly and are dependent on the application and configuration. This information is provided along with general recommendations for you to better understand system performance. Information is provided *AS IS* without warranty of any kind.
Session objectives
- Short introduction to iDoctor and wait accounting
- Collection Services Investigator & Job Watcher data examples
- Use of SMT4 and SMT8
- JVM example
- QZDASOINIT job and SMP
- Save While Active influence on running jobs
- JW and DBMON examples
- SQL sequences
- Journal caching
Experience: fault counts or wait accounting
In times of rapid change, experience could be your worst enemy. Nancy Uthke-Schmucki/Rochester
Fee vs Free: components of iDoctor
Unit of Work (thread or threads in job and System task)
All units of work are in one of three states:
1. Dispatched on a CPU (running or waiting)
2. Ready to use a CPU, but waiting for a processor to become available (CPU queued)
3. Waiting on something or someone (blocked or idle)
[Figure: elapsed time divided into CPU (1), CPU queuing (2) and Waiting (3)]
How will it show me where I am waiting?
Elapsed time is accounted for by harvesting Time and Count values and categorizing them into 32 different wait buckets.
We do this every n minutes, for every job, thread and task on the system.
n = collection interval as specified on the INTERVAL parameter of the CFGPFRCOL command
- Count: the number of times the thread or LIC task has experienced a state covered by the specific wait bucket
- Time: the elapsed time (wall-clock time) the thread or LIC task has spent in a state covered by the specific wait bucket
Exceptions:
- The current, "in progress" wait at the time Collection Services takes its interval sample
- Entire intervals in which a thread or task has not used CPU
How will it show me where I am waiting?
Wait bucket number and description (source: QAPMJOBWTD & QAPMJOBWT files)
 1  Time dispatched on a CPU
 2  CPU queuing
 3  Reserved
 4  Other waits
 5  Disk page faults
 6  Disk non-fault reads
 7  Disk space usage contention
 8  Disk op-start contention
 9  Disk writes
10  Disk other
11  Journaling
12  Semaphore contention
13  Mutex contention
14  Machine level gate serialization
15  Seize contention
16  Database record lock contention
17  Object lock contention
18  Ineligible waits
19  Main storage pool overcommitment
20  Classic JVM user including locks
21  Classic JVM
22  Classic JVM other
23  Reserved
24  Socket transmits
25  Socket receives
26  Socket other
27  IFS
28  PASE
29  Data queue receives
30  Idle / waiting for work
31  Synchronization token contention
32  Abnormal contention
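The bucket times can also be summarized directly with SQL. The following is a minimal sketch and is not taken from the presentation. It assumes that the collection data is in library QPFRDATA, that the collection of interest is the member SQL reads by default, and that the QAPMJOBWT fields are named JWTDE (task identifier) and JWTM01 to JWTM32 (time per bucket). Verify these names against the Collection Services file documentation for your release before use.

```sql
-- Sketch only: library, member and column names are assumptions (see text above).
-- Lists the 20 threads/tasks with the most time in seize contention (bucket 15),
-- together with a few other buckets from the table above.
SELECT JWTDE,
       SUM(JWTM01) AS CPU_TIME,
       SUM(JWTM02) AS CPU_QUEUING_TIME,
       SUM(JWTM05) AS DISK_PAGE_FAULT_TIME,
       SUM(JWTM09) AS DISK_WRITE_TIME,
       SUM(JWTM11) AS JOURNALING_TIME,
       SUM(JWTM15) AS SEIZE_TIME
  FROM QPFRDATA.QAPMJOBWT
 GROUP BY JWTDE
 ORDER BY SEIZE_TIME DESC
 FETCH FIRST 20 ROWS ONLY;
```

iDoctor does this kind of aggregation for you and adds the graphs; the query is only meant to show where the numbers come from.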
Example from CSI
Power8 running IBM i 7.1 uses SMT4 by default.
Can you use CSI to view the differences between SMT4 and SMT8?
Change processor multitasking
- Change system value QPRCMLTTSK to 0: turns multitasking off
- Change system value QPRCMLTTSK to 1 (SMT mode) or 2 (system controlled)
- CALL PGM(QWCCHGPR) PARM(X'00000002'): sets the system to SMT2
- CALL PGM(QWCCHGPR) PARM(X'00000004'): sets the system to SMT4
- CALL PGM(QWCCHGPR) PARM(X'00000008'): sets the system to SMT8
- CALL PGM(QWCCHGPR) PARM(X'00000000'): sets the system to the default level (SMT4 for P8 on 7.1, SMT8 for P8 on 7.2 and 7.3)
- Retrieve the current SMT value via the API QWCRTVPR
Sample of a CL program: http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/apis/qwcrtvpr.htm
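The system value can also be checked from SQL. A small sketch, assuming the IBM i service view QSYS2.SYSTEM_VALUE_INFO is available on the partition. It shows the QPRCMLTTSK setting only; the actual SMT level still has to be retrieved with QWCRTVPR.

```sql
-- Sketch: show the current processor multitasking system value.
-- Assumes QSYS2.SYSTEM_VALUE_INFO is installed on this release.
SELECT SYSTEM_VALUE_NAME,
       CURRENT_NUMERIC_VALUE,
       CURRENT_CHARACTER_VALUE
  FROM QSYS2.SYSTEM_VALUE_INFO
 WHERE SYSTEM_VALUE_NAME = 'QPRCMLTTSK';
```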
CSI scenario with very high CPU load: P8 running SMT4
CSI scenario with very high CPU load: P8 running SMT8
Learning from the change from SMT4 to SMT8 in 7.1 on P8
By changing the maximum number of allowed CPU threads per core from 4 to 8, IBM i is able to use more CPU and reduce CPU queuing and abnormal contention.
Combined example of CSI and Job Watcher
The following example first uses Collection Services Investigator to get an overview of the job in question and then uses Job Watcher to drill down and look at the call stack etc. The scenario is a JVM (Java) job running invoicing from 21:00 to 07:00. The job itself runs for days, but the invoicing window is not 24/7. The challenge is to find ways to reduce the run time without changing the application.
Ex 1: JVM job B.6103 invoicing 21:00 to 07:00
Ex 1: Access by JW
Ex 1: Thread wait time signature
Ex 1: SSDs?
Ex 1: Call stack showing RLA (Record Level Access)
Ex 1: What are we reading?
Learning from the Java job
CSI gives a good overview of where the job is spending its time. JW adds information about the call stack and what the job is waiting for. Consider whether the files that are read should be kept in memory or placed on SSDs. If the investment budget allows, the data clearly shows that faster storage would let the job finish much sooner.
Job Watcher examples
From time to time the system slows down and CPU queuing occurs. What is causing that?
Practical example with P8 SMT4 V7R1 - EC 11.8/VP 12: occasional slowdowns with high CPU
JW: You can look at the threads taking up the most CPU resources, but unfortunately it is easy to lose the overview. Some threads have the same job number, so these are multithreaded jobs.
JW: It is better to look at the job level and include the number of participating threads.
JW: Time signature for contributing threads
JW: We can go into any of the threads; for a secondary thread, click the arrow to go to the main thread.
JW and link to System i Navigator with Plan Cache: here we can find the SQL statement and take a look in the Plan Cache.
JW: From the Plan Cache we have the Visual Explain.
Learning from Job Watcher analysis
Job Watcher can lead you to the SQL requests in question and gives detailed information about multithreaded jobs. The query degree *OPTIMIZE can be aggressive in a partition, especially when several heavy queries run at the same time. Consider lowering the query degree; a good start could be 50% or less, e.g. *OPTIMIZE 050 in the QAQQINI for the query jobs.
Recap: system value QQRYDEGREE
- Who knows the system value QQRYDEGREE?
- Who has set QQRYDEGREE to *OPTIMIZE?
- Who has set QQRYDEGREE to *MAX?
- Who knows the query options file QAQQINI?
- QAQQINI does not follow the library list or the SQL PATH
- QAQQINI is set in different ways, commonly CHGQRYA QRYOPTLIB(mylib)
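As an illustration, the query degree could be lowered for a set of jobs along these lines. This is a sketch under assumptions: MYLIB is a placeholder library that already holds a copy of QAQQINI (for example created with CRTDUPOBJ from the template in QSYS), the value 50 follows the "50% or less" suggestion above, and the QSYS2 services used are available on the partition.

```sql
-- 1. Check the system-wide default (assumes QSYS2.SYSTEM_VALUE_INFO exists).
SELECT CURRENT_CHARACTER_VALUE
  FROM QSYS2.SYSTEM_VALUE_INFO
 WHERE SYSTEM_VALUE_NAME = 'QQRYDEGREE';

-- 2. Lower the parallel degree in the copy of QAQQINI in MYLIB (placeholder name).
UPDATE MYLIB.QAQQINI
   SET QQVAL = '*OPTIMIZE 50'
 WHERE QQPARM = 'PARALLEL_DEGREE';

-- 3. Point the current job at that options file; other jobs need their own CHGQRYA.
CALL QSYS2.QCMDEXC('CHGQRYA QRYOPTLIB(MYLIB)');
```

Because QAQQINI does not follow the library list, step 3 (or an equivalent setting in the job or connection) is what actually makes the job pick up the changed option.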
Job Watcher examples
The customer is using Save While Active. At the same time a time-critical application is running, and it requires stable response times.
JW scenario with Save While Active: the system spends most of its time in the save, but the point here is the seize contention.
JW takes you to a list of jobs/threads/tasks spending time in seize contention
JW shows the call stack and the holder of the seize
JW shows you the holder and its activity
Learning: JW and Save While Active
- Many small seizes cause the jobs to wait during the Save While Active
- JW shows you the challenge, and it is up to you to decide how to change this
- Consider doing Save While Active over less data
- Consider whether Save While Active is the right solution here
Additional JW example: large amount of time spent in seizes
What jobs are influenced by the seizes?
One of the jobs influenced
Have a look at the wait buckets, and when you see a seize wait, go to the call stack
QQQDBLOG is the DBMON
Learning: JW and DBMON
The DBMON was running with a selection for CQE (Classic Query Engine) only:
STRDBMON OUTFILE(MYLIB/MYMONFile) JOB(*ALL) TYPE(*DETAIL) COMMENT('WANT_CQE_ONLY')
This is recommended by M B Rasmussen to investigate CQE queries on the system :o)
The partition has not enabled concurrent write; check the status with: CALL QDBENCWT
Take care when running DBMON with selections over a longer period.
Job Watcher examples
The customer is running an SQL update, but it takes longer than expected:
update FE_HISTORIQUE_ENCAISSEMENT mnt set FE_HISTORIQUE_ENCAISSEMENT_ID = (NEXT VALUE FOR FE_HISTORIQUE_ENCAISSEMENT_SEQ)
The Visual Explain graph shows a full table scan. This makes sense, as the entire table is going to be updated.
JW: From Job Watcher we can observe that most of the job's time is spent in disk writes:
JW: Job Watcher shows us which object we are waiting for while doing the writes:
JW: The objects are the data areas used by the SQL sequences:
JW: When we look at the time consumed per 10 seconds for the job, it is obvious that it is constantly waiting for the disk writes:
JW: Here we have the call stack:
JW: A lot of time is spent in disk writes. The duration of each disk write is normal, but the writes are done very frequently.
Definition of SQL sequence
The option to generate sequence values in the order requested was changed to this:
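The slide image with the new definition is not reproduced in this text. As a sketch only, assuming the change was from ORDER to NO ORDER combined with a cache, it could look like the statement below. The cache size of 1000 is a hypothetical value, and the sequence is assumed to be resolved through the current schema.

```sql
-- Sketch: stop forcing values to be generated in request order and let the
-- database preallocate values in memory instead of updating the sequence's
-- data area for every NEXT VALUE. CACHE 1000 is an illustrative value.
ALTER SEQUENCE FE_HISTORIQUE_ENCAISSEMENT_SEQ
  NO ORDER
  CACHE 1000;
```

The trade-off is that cached values which have not been used can be lost, for example after a system failure, so gaps in the generated numbers must be acceptable to the application.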
JW: After changing the sequence properties, the run became much faster. The graph for the new run is similar to before, but it takes only 22 seconds for the 781,000 updates, compared to 20 minutes before the change (roughly 55 times faster).
Learning: JW
The SQL definition of the sequence can influence performance significantly. JW shows where the time is spent and what the job is mostly doing by showing the call stack.
Job Watcher example
The customer is running another SQL update, and it takes longer than expected:
update SNPTEMP2.I$_REP_POL_GEN_ROLES S set IND_UPDATE = 'U' where exists (select '?' from xxxxx)
The Visual Explain graph shows a full table scan. Always check for index recommendations.
But can iDoctor give insight into the duration of the run?
Job Watcher example: most of the time is spent in journaling:
Job Watcher example
Check if journal caching is activated. If journal caching is allowed, it can be activated with:
CHGJRN JRN(JRNLIB/MYJRN) JRNCACHE(*YES)
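The check and the change can also be done from an SQL session. A sketch under assumptions: JRNLIB/MYJRN are the placeholder names from the command above, and the view QSYS2.JOURNAL_INFO with a JOURNAL_CACHE column is assumed to be available on the partition (it is only delivered on more recent releases, so verify this first).

```sql
-- Assumption: QSYS2.JOURNAL_INFO exists on this release and exposes JOURNAL_CACHE.
SELECT JOURNAL_LIBRARY, JOURNAL_NAME, JOURNAL_CACHE
  FROM QSYS2.JOURNAL_INFO
 WHERE JOURNAL_LIBRARY = 'JRNLIB'
   AND JOURNAL_NAME = 'MYJRN';

-- Run the CL command from the slide through QCMDEXC.
CALL QSYS2.QCMDEXC('CHGJRN JRN(JRNLIB/MYJRN) JRNCACHE(*YES)');
```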
Job Watcher example: with journal caching the run takes 8 seconds:
Job Watcher example: the time spent in journaling can hardly be seen:
Journal caching
Is journal caching used in a partition? Use WRKLICINF to find HA Journal Performance (F11).
Learning: JW
Journal caching can have a huge impact on performance. iDoctor shows how much time is spent waiting for journal operations and helps you find the jobs spending time in journaling.
Einstein would probably have loved IBM i!!
iDoctor and you: price estimates per serial number, P05 to P20 and P20 to P50
IBM Lab Services Vouchers for IBM i
- With the IBM i and selected Power Systems servers, valuable education and services vouchers are included at no additional charge
- Vouchers are designed to help you more fully understand and use the advanced features and capabilities of IBM i
- Vouchers are only available with selected new Power Systems servers, and are not available with system upgrades
- Vouchers are valid for 5 years beyond the ship date
For more information, eligible systems and registration information, see: http://www-03.ibm.com/systems/power/hardware/vouchers/index.html
Or contact: Camilla Jellingdal - CAMILLA@dk.ibm.com; Pia Grynderup - piag@dk.ibm.com
IBM i Vouchers: available services
- IBM i Performance: IBM i Performance; SQL Performance
- IBM i Database: DB2 for IBM i Best Practices; DB2 Web Query for IBM i
- Security: Security Assessment; PowerSC; Single Sign On
- Availability: IBM i Availability Assessment; IBM i BRMS; PowerHA on IBM i
- System Solutions: Migration Assistance; PowerVM Virtual I/O Server and IBM i; External Storage for IBM i
- Middleware: WebSphere with IBM i; PHP and Open Source on IBM i
- Applications: SAP on IBM
IBM Lab Services: Proven expertise to help IT leaders plan, design and implement infrastructure to accelerate digital transformation
Contact in the Nordics:
- Pia Grynderup, Nordic Manager, IBM Systems Lab Services, piag@dk.ibm.com
- Camilla Jellingdal, Nordic Opportunity Manager, CAMILLA@dk.ibm.com
iDoctor resources
- iDoctor e-mail list (usage tips, build updates, PTF info, etc.): send join requests to mccargar@us.ibm.com
- iDoctor website: http://www-912.ibm.com/i_dir/idoctor.nsf/
- Presentations (What's New, etc.): http://www-912.ibm.com/i_dir/idoctor.nsf/downloadsdemos.html
- YouTube channel (20+ videos): http://www.youtube.com/user/ibmidoctor?feature=mhum
  These videos are also available on IBM.COM if your company blocks YouTube. Click the links titled "Video name on IBM.COM" on the Video Library pages of the website: https://www-912.ibm.com/i_dir/idoctor.nsf/videos.html
- iDoctor forum: http://www.ibm.com/developerworks/forums/forum.jspa?forumid=871
- Documentation: https://www-912.ibm.com/i_dir/idoctor.nsf/f204de4f34767e0686256f4000757a90/$file/idoctorv7r1.pdf