Creating indexes suited to your queries

1 Creating indexes suited to your queries
Jacek Surma, PKO Bank Polski S.A.
Michał Białecki, IBM Silicon Valley / SWG Cracow Lab
Session Code: B11. Wed, 16th Oct 2013, 11:00-12:00. Platform: DB2 z/OS

In every shop the big challenge is query performance (access path selection). Fortunately, it is surprising how little we really need to understand about the optimizer to improve query performance by building proper indexes suited to a particular query. The presentation describes two algorithms for creating indexes for Boolean Term (BT) predicates. The first algorithm creates indexes for maximum MATCHCOLS (matching). The second creates indexes for sort avoidance. The algorithms are compared from a performance point of view: which one performs better for a particular query on particular data, and where to stop adding columns to an index. We will also see how to approach the same task in practice for a number of queries.

2 Agenda
1. Types of access to DB2 data
2. The algorithm to create an index with maximum Matching Columns (MC)
3. The algorithm to create an index for sort avoidance
4. Comparison of algorithms
5. Tuning multiple queries workload

As an introduction to the topic, I will talk about methods of DB2 data access. Then, step by step, I will discuss the two algorithms, and finally I will compare them using a real example. After my talk, my colleague will present tuning of a multiple-query workload.

3 Agenda: 1. Types of access to DB2 data

Let's start with some terms and with how we can access data.

4 Types of access to DB2 data
Pages in an index or table can be read in 3 different ways:
1. Random Read (Synchronous Read)
2. Sequential Read (Sequential Prefetch)
3. Skip-sequential Read

Total I/O time:
Random Read from disk = 10 ms
Sequential Read from disk = 0,01 ms
Read from buffer pool = 50 µs
Sort cost = 0,002 ms

Every time DB2 reads a single index leaf page or a single data page, that read is counted as one Random Read. In computer science, Random Access (sometimes called Direct Access) is the ability to access an element at any position in a sequence. Its opposite is Sequential Access, in which a group of elements is accessed in a predetermined, ordered sequence.
DB2 has a mechanism called Sequential Detection. It monitors the access pattern per statement, and if it detects sequential access, it uses Sequential Prefetch. DB2 uses different read types to prefetch data and avoid costly synchronous reads that can cause application wait times. Prefetch is a mechanism for reading a set of pages, usually 32, into the buffer pool with only one I/O operation. The maximum number of pages read by a single prefetch operation is determined by the size of the buffer pool used for the operation. DB2 uses several types of prefetch.

5 Primary Key Index Access
    SELECT CUSTID, LNAME, FNAME
    FROM CUST
    WHERE CUSTID = :CUSTID

[Diagram: index IX1(CUSTID) holds (CUSTID, RID) entries that point to rows of table CUST; T marks one touch on an index row and one touch on a table row.]

The number of touches is the basic measure of the cost of an access path. This holds for any DBMS, not only for DB2, and it is covered in any DBA bible, e.g. Tapio Lahdenmäki, "Relational Database Index Design and the Optimizers". One touch means one read of an index entry or of one table row. In this example, access through the primary key index needs only 1 touch for the index entry and 1 touch for the table row. Both are Random Reads, so the cost is 2 x 10 ms = 20 ms.
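The touch arithmetic can be sketched in a few lines of Python. This is only an illustration, assuming the per-touch times quoted on the access-types slide (10 ms per random read, 0,01 ms per sequential read); the function name `touch_cost_ms` is hypothetical and not a DB2 facility.

```python
# Rough I/O cost of an access path derived from touch counts (illustrative only).
RANDOM_TOUCH_MS = 10.0       # random read from disk, as quoted on the slide
SEQUENTIAL_TOUCH_MS = 0.01   # sequential read from disk, as quoted on the slide


def touch_cost_ms(random_touches: int, sequential_touches: int = 0) -> float:
    """Estimate I/O time in milliseconds from the number of touches."""
    return (random_touches * RANDOM_TOUCH_MS
            + sequential_touches * SEQUENTIAL_TOUCH_MS)


# Primary key access: 1 index touch + 1 table touch, both random.
primary_key_cost = touch_cost_ms(random_touches=2)
print(f"primary key access: {primary_key_cost} ms")
```

The same helper can be reused for any access path once its touches have been counted by hand.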

6 Clustering Index Access
    SELECT CUSTID, LNAME, FNAME
    FROM CUST
    WHERE ZIPCOD = :ZIPCOD AND LNAME = :LNAME
    ORDER BY FNAME

[Diagram: index IX2(ZIPCOD, LNAME, FNAME) holds (ZIPCOD, LNAME, FNAME, RID) entries such as SURMA ADAM, SURMA JACK, SURMA JOHN; the touched index rows (T) point to adjacent rows of table CUST.]

Because all index rows are sorted, each index slice is read sequentially, which is why reading an index slice is very fast. From the index point of view, the RIDs point to random data pages. You can, however, define one index as clustering, which means that DB2 will try to keep the table rows in the sequence of the index column(s). In this example the table slice is also read sequentially.

7 Nonclustering Index Access
    SELECT CUSTID, LNAME, FNAME
    FROM CUST
    WHERE ZIPCOD = :ZIPCOD AND LNAME = :LNAME
    ORDER BY FNAME

[Diagram: index IX2(ZIPCOD, LNAME, FNAME) holds (ZIPCOD, LNAME, FNAME, RID) entries; the touched index rows (T) point to rows scattered across table CUST.]

On this visual, the table rows are not in the same sequence as the index rows; therefore all table touches are random. Minimizing the number of random touches with better indexes is very important. The smaller the index slice, the lower the Elapsed and CPU Time.
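To see how much the clustering sequence matters, the two access patterns can be compared with a small Python sketch. It assumes the touch times from the access-types slide and a simplified model (one random touch to position in the index or table, then sequential touches); `slice_cost_ms` and the 1000-row slice are hypothetical.

```python
RANDOM_TOUCH_MS = 10.0       # random read from disk, as quoted on the slide
SEQUENTIAL_TOUCH_MS = 0.01   # sequential read from disk, as quoted on the slide


def slice_cost_ms(rows: int, clustered: bool) -> float:
    """Estimate the I/O time for reading an index slice and its table rows."""
    # Index slice: one random touch to position, then a sequential scan.
    index_cost = RANDOM_TOUCH_MS + rows * SEQUENTIAL_TOUCH_MS
    if clustered:
        # Table rows follow the index order: one random touch, then sequential.
        table_cost = RANDOM_TOUCH_MS + rows * SEQUENTIAL_TOUCH_MS
    else:
        # Table rows are scattered: every table touch is a random read.
        table_cost = rows * RANDOM_TOUCH_MS
    return index_cost + table_cost


for clustered in (True, False):
    print(clustered, round(slice_cost_ms(1000, clustered), 2), "ms")
```

Under these assumptions the nonclustering case is dominated entirely by the random table touches, which is the point the slide makes.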

8 Algorithm for index creation: goal
There are three main reasons to create indexes:
- To improve query performance
- To ensure uniqueness of values
- To ensure a physical clustering sequence of table data

Goals for query performance:
- Sort avoidance: SORT=N
- Index access only: INDEXONLY=Y
- Reduce the cost and time of the query

In this presentation we focus only on query performance using an index. What we would like to gain from an index:
- Uniqueness
- Matching
- Screening
- List Prefetch for RID sort
- Index Only, to save access to data pages
- No Sort, to avoid a sort in the Sort Pool or DSNDB07
- Clustering
- Partitioning

The best situation is one index per table. This one index supports the primary key, foreign key, partitioning, clustering, and the data access. Of course this kind of design is difficult, but we should try.

9 Agenda: 2. The algorithm to create an index with maximum Matching Columns (MC)

We now move to the second point of the agenda.

10 The algorithm to create an index with the maximum Matching Columns (MC) and INDEX ONLY
Maximum Matching Columns: only 3 steps
Maximum Matching Columns and Index Only: only 5 steps

Boolean term (BT) predicate: a simple or compound predicate that, when it evaluates to false for a specific row, makes the entire WHERE clause false for that row. Example:
    WHERE LASTN = 'SURMA' AND FIRSTN = 'JACEK'

What does Matching Columns mean?
Matching Columns (MC) are the index columns that define the size of the Index Slice. Screening Columns (SC) are the index columns that eliminate rows from the Index Slice before the table is touched. The higher the number of Matching Columns, the smaller the Index Slice.
If MATCHCOLS is 0, the access method is called a Nonmatching Index Scan. All the index keys and their RIDs are read; rows can be eliminated only by Screening Columns.
If MATCHCOLS is greater than 0, the access method is called a Matching Index Scan. A Matching Index Scan is possible as long as the predicates in the WHERE clause are connected with AND, all are Equal Predicates, and there is at most one IN-list predicate and at most one Range Predicate.
Index Matching reduces the number of index pages to read. Index Screening reduces the number of table rows to read.
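The matching rule can be sketched as a small Python function. This is a simplified model of the rule as stated on the slide, not the optimizer's actual logic; `count_matching_columns`, the predicate labels, and the columns ZIP, BORN and HEIGHT in the usage lines are hypothetical.

```python
def count_matching_columns(index_columns, predicates):
    """Count leading index columns that can be matching.

    predicates maps a column name to 'equal', 'in' or 'range'.
    Simplified rule: equal predicates match, one IN-list may match,
    and the first range predicate matches and ends the matching.
    """
    matching = 0
    in_list_used = False
    for column in index_columns:
        kind = predicates.get(column)
        if kind == "equal":
            matching += 1
        elif kind == "in" and not in_list_used:
            matching += 1
            in_list_used = True
        elif kind == "range":
            matching += 1   # the first range column still counts
            break           # but it interrupts further matching
        else:
            break           # no usable predicate on this column
    return matching


preds = {"LASTN": "equal", "FIRSTN": "equal"}
print(count_matching_columns(["LASTN", "FIRSTN"], preds))   # both columns match
print(count_matching_columns(["ZIP", "LASTN"], preds))      # nonmatching scan
```

A leading index column without a predicate gives MATCHCOLS = 0, which is why the order of index columns matters so much in the steps that follow.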

11 The algorithm to create an index with the maximum Matching Columns (MC) and INDEX ONLY
Cardinality: the number of unique values
Filter Factor (FF): the selectivity of the predicate

Average (estimated) FF = 1 / Cardinality, e.g. for NAME = :NAME *)
Specific (actual) FF = Number of result rows / Number of source rows, e.g. for NAME = 'JACEK'

These are very important definitions. The Filter Factor of a predicate is the number of qualifying rows divided by the number of source rows.
FF = 0%: no rows qualify; the predicate is false for all rows
FF = 100%: all rows qualify; the predicate is true for all rows
*) The estimated FF can be different when frequency or histogram statistics have been collected.
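Both definitions are simple ratios, shown here as a Python sketch. The function names and the sample numbers (a cardinality of 200, 50 qualifying rows out of 1000) are hypothetical and chosen only to illustrate how far the average and the actual value can differ.

```python
def estimated_filter_factor(cardinality: int) -> float:
    """Average FF of an equal predicate when values are assumed uniform."""
    return 1.0 / cardinality


def actual_filter_factor(result_rows: int, source_rows: int) -> float:
    """Specific FF of a predicate for one concrete literal value."""
    return result_rows / source_rows


# Hypothetical column with 200 distinct values.
average_ff = estimated_filter_factor(200)
# Hypothetical literal that qualifies 50 rows out of 1000.
specific_ff = actual_filter_factor(50, 1000)
print(f"estimated FF = {average_ff:.3%}, actual FF = {specific_ff:.3%}")
```

When the data is skewed, the two numbers diverge, which is the situation the real example later in the talk runs into.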

12 The algorithm to create an index with the maximum Matching Columns (MC) and INDEX ONLY
Step 1: matching
Place the EQUAL predicates as leading index columns, that is: =, IS NULL, IS NOT DISTINCT FROM. Order them from the most restrictive (for a CLUSTER index, from the least restrictive).

    SELECT STREET, NUMBER, ZIP, BORN FROM CUST
    WHERE LNAME = 'JONES' AND BORN > 1973 AND AGE IN (30, 40)
      AND CITY = 'LONDON' AND HEIGHT < 150
    ORDER BY BORN, ZIP

Index columns IX1: LNAME, CITY

Order the columns from the most filtering one: the most selective or restrictive column, the one with the highest cardinality (the largest number of distinct values) and a Filter Factor close to 0.

13 The algorithm to create an index with the maximum Matching Columns (MC) and INDEX ONLY
Step 2: matching
The next index column is the column with the IN-list predicate. If there is more than one IN-list column, select only the most restrictive one. Starting with DB2 version 10 we can consider multiple IN-list columns.

    SELECT STREET, NUMBER, ZIP, BORN FROM CUST
    WHERE LNAME = 'JONES' AND BORN > 1973 AND AGE IN (30, 40)
      AND CITY = 'LONDON' AND HEIGHT < 150
    ORDER BY BORN, ZIP

Index columns IX1: LNAME, CITY, AGE

IN-list Index Scan (ACCESSTYPE=N) is a special case of the Matching Index Scan in which a single indexable IN-list predicate is used as a Matching Equal Predicate. Before DB2 10, at most one IN-list predicate can be matching on an index. With List Prefetch or Multiple Index Access, IN-list predicates cannot be used as matching predicates.

14 The algorithm to create an index with the maximum Matching Columns (MC) and INDEX ONLY
Step 3: matching/screening
Next come the Range predicate columns (>, <, >=, <=, BETWEEN, LIKE 'x%'). Order them from the most restrictive (the smallest FF).

    SELECT STREET, NUMBER, ZIP, BORN FROM CUST
    WHERE LNAME = 'JONES' AND BORN > 1973 AND AGE IN (30, 40)
      AND CITY = 'LONDON' AND HEIGHT < 150
    ORDER BY BORN, ZIP

Index columns IX1: LNAME, CITY, AGE, HEIGHT, BORN

Range predicates interrupt the Matching Index Scan. Only the first Range Column in the index is included in the matching count; the remaining range columns act as Screening Columns.

15 The algorithm to create an index with the maximum Matching Columns (MC) and INDEX ONLY
Step 4: index only
Add the ORDER BY columns in the order in which they appear. Omit the columns that already appeared in steps 1, 2, 3.

    SELECT STREET, NUMBER, ZIP, BORN FROM CUST
    WHERE LNAME = 'JONES' AND BORN > 1973 AND AGE IN (30, 40)
      AND CITY = 'LONDON' AND HEIGHT < 150
    ORDER BY BORN, ZIP

Index columns IX1: LNAME, CITY, AGE, HEIGHT, BORN, ZIP

Index Only access means that all the columns needed for the query can be found in the index, so DB2 does not access the table.
- Index Only is never possible with Multiple Index Access.
- Index Only is not possible for any step that uses List Prefetch.
- Index Only is not possible for padded indexes when VARCHAR (varying-length) columns are returned.
- Index Only uses only one multicolumn index.

16 The algorithm to create an index with the maximum Matching Columns (MC) and INDEX ONLY
Step 5: index only
Add the columns that appear after SELECT. Omit the columns that already appeared in steps 1, 2, 3, 4. Updatable columns should be placed at the end of the index.

    SELECT STREET, NUMBER, ZIP, BORN FROM CUST
    WHERE LNAME = 'JONES' AND BORN > 1973 AND AGE IN (30, 40)
      AND CITY = 'LONDON' AND HEIGHT < 150
    ORDER BY BORN, ZIP

Index columns IX1: LNAME, CITY, AGE, HEIGHT, BORN, ZIP, NUMBER, STREET

Once the remaining columns from the SELECT clause are added, we reach Index Only access.
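The five steps can be written down as a short Python sketch that reproduces the IX1 column list for the example query. The filter factors passed in and the choice of STREET as the updatable column are assumptions made for illustration (the slides do not give them), and `max_mc_index` is a hypothetical helper, not a DB2 feature.

```python
def max_mc_index(equal, in_list, ranges, order_by, select, updatable=()):
    """Build an index column list for maximum Matching Columns + Index Only.

    equal, in_list and ranges map column -> filter factor
    (a smaller filter factor means a more restrictive predicate).
    """
    columns = []

    def add(candidates):
        for column in candidates:
            if column not in columns:      # omit columns placed earlier
                columns.append(column)

    def by_ff(predicates):
        return sorted(predicates, key=predicates.get)

    add(by_ff(equal))                      # step 1: equal predicates
    add(by_ff(in_list)[:1])                # step 2: most restrictive IN-list
    add(by_ff(ranges))                     # step 3: range predicates
    add(order_by)                          # step 4: ORDER BY columns
    add([c for c in select if c not in updatable])   # step 5: SELECT columns,
    add([c for c in select if c in updatable])       # updatable ones last
    return columns


ix1 = max_mc_index(
    equal={"LNAME": 0.01, "CITY": 0.05},    # assumed filter factors
    in_list={"AGE": 0.10},
    ranges={"HEIGHT": 0.05, "BORN": 0.40},
    order_by=["BORN", "ZIP"],
    select=["STREET", "NUMBER", "ZIP", "BORN"],
    updatable={"STREET"},                   # assumption: STREET is updated often
)
print("CREATE INDEX IX1 ON CUST (" + ", ".join(ix1) + ")")
```

With other filter factors the order within each step changes, which is why current statistics matter before the index is designed.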

17 Agenda: 3. The algorithm to create an index for sort avoidance

Let's start with the algorithm for Sort Avoidance.

18 The algorithm to create an index for sort avoidance and INDEX ONLY
No Sort: only 2 steps
No Sort and Index Only: only 4 steps

Attention! We cannot avoid a sort for:
- List Prefetch with ORDER BY
- ORDER BY on columns of the inner table of a Nested Loop Join
- ORDER BY on columns RANDOM
- UNION
- INTERSECT
- EXCEPT

If the result rows do not come from the database in ORDER BY sequence, the DBMS must read and sort all result rows before the first FETCH. Sorting is very fast on current hardware, but we still need an index for sort avoidance when the query fills screens and uses the SQL option OPTIMIZE FOR N ROWS (with N>1). An index is always ordered, so some sorts can be avoided if the index keys are in the order needed by ORDER BY, GROUP BY, a JOIN operation, or DISTINCT in an aggregate function.
With List Prefetch, DB2 sorts twice:
- a RID sort in the RID pool
- a data sort in the sort pool or the DSNDB07 database
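Whether an index delivers rows in ORDER BY sequence can be checked with a simplified rule: skip the leading index columns that have equal predicates, then require the ORDER BY columns to follow in the same sequence. The Python sketch below assumes this simplification, ignores ASC/DESC and the exceptions listed above, and uses a hypothetical function name.

```python
def index_avoids_sort(index_columns, equal_columns, order_by):
    """Simplified check: does the index return rows in ORDER BY sequence?"""
    position = 0
    # Leading columns bound by equal predicates hold one value each,
    # so they do not disturb the order of the columns that follow.
    while (position < len(index_columns)
           and index_columns[position] in equal_columns):
        position += 1
    # ORDER BY columns fixed by an equal predicate need no ordering.
    needed = [c for c in order_by if c not in equal_columns]
    return index_columns[position:position + len(needed)] == needed


equal_columns = {"LNAME", "CITY"}
order_by = ["BORN", "ZIP"]
print(index_avoids_sort(["LNAME", "CITY", "BORN", "ZIP"], equal_columns, order_by))
print(index_avoids_sort(["LNAME", "CITY", "AGE", "BORN", "ZIP"], equal_columns, order_by))
```

The second index fails the check because the IN-list column AGE sits between the equal columns and the ORDER BY columns.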

19 The algorithm to create an index for sort avoidance and INDEX ONLY
Step 1: matching
Place the EQUAL predicates as leading index columns, that is: =, IS NULL, IS NOT DISTINCT FROM. Order them from the most restrictive.

    SELECT STREET, NUMBER, ZIP, BORN FROM CUST
    WHERE LNAME = 'JONES' AND BORN > 1973 AND AGE IN (30, 40)
      AND CITY = 'LONDON' AND HEIGHT < 150
    ORDER BY BORN, ZIP

Index columns IX2: LNAME, CITY

This step is the same as in the Maximum Matching Columns algorithm.

20 The algorithm to create an index for sort avoidance and INDEX ONLY
Step 2: no sort
Add the ORDER BY columns in the same sequence as they appear in the ORDER BY clause, and with the same ASC/DESC options. Ignore columns that were already placed in step 1.

    SELECT STREET, NUMBER, ZIP, BORN FROM CUST
    WHERE LNAME = 'JONES' AND BORN > 1973 AND AGE IN (30, 40)
      AND CITY = 'LONDON' AND HEIGHT < 150
    ORDER BY BORN, ZIP

Index columns IX2: LNAME, CITY, BORN, ZIP

21 The algorithm to create an index for sort avoidance and INDEX ONLY
Step 3: screening
Add all the remaining columns of the WHERE clause (IN-list and Range predicates) in any order. Omit the columns that already appeared in steps 1, 2.

    SELECT STREET, NUMBER, ZIP, BORN FROM CUST
    WHERE LNAME = 'JONES' AND BORN > 1973 AND AGE IN (30, 40)
      AND CITY = 'LONDON' AND HEIGHT < 150
    ORDER BY BORN, ZIP

Index columns IX2: LNAME, CITY, BORN, ZIP, AGE, HEIGHT

Once the remaining columns from both the WHERE and SELECT clauses are in the index, we reach Index Only access. In this step we add the remaining columns of the WHERE clause, in any order.

22 The algorithm to create an index for sort avoidance and INDEX ONLY
Step 4: index only
Add the columns that appear after SELECT. Omit the columns that already appeared in steps 1, 2, 3. Updatable columns should be placed at the end of the index.

    SELECT STREET, NUMBER, ZIP, BORN FROM CUST
    WHERE LNAME = 'JONES' AND BORN > 1973 AND AGE IN (30, 40)
      AND CITY = 'LONDON' AND HEIGHT < 150
    ORDER BY BORN, ZIP

Index columns IX2: LNAME, CITY, BORN, ZIP, AGE, HEIGHT, NUMBER, STREET

Finally, we add the columns that appear after SELECT.
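The four steps of the sort-avoidance algorithm can be sketched in Python in the same way, reproducing the IX2 column list. The filter factors and the updatable column STREET are assumptions for illustration, and `sort_avoidance_index` is a hypothetical helper.

```python
def sort_avoidance_index(equal, order_by, other_where, select, updatable=()):
    """Build an index column list for sort avoidance + Index Only.

    equal maps column -> filter factor; other_where lists the IN-list
    and range predicate columns in any order.
    """
    columns = []

    def add(candidates):
        for column in candidates:
            if column not in columns:      # omit columns placed earlier
                columns.append(column)

    add(sorted(equal, key=equal.get))      # step 1: equal predicates
    add(order_by)                          # step 2: ORDER BY columns, in order
    add(other_where)                       # step 3: remaining WHERE columns
    add([c for c in select if c not in updatable])   # step 4: SELECT columns,
    add([c for c in select if c in updatable])       # updatable ones last
    return columns


ix2 = sort_avoidance_index(
    equal={"LNAME": 0.01, "CITY": 0.05},    # assumed filter factors
    order_by=["BORN", "ZIP"],
    other_where=["BORN", "AGE", "HEIGHT"],
    select=["STREET", "NUMBER", "ZIP", "BORN"],
    updatable={"STREET"},                   # assumption: STREET is updated often
)
print("CREATE INDEX IX2 ON CUST (" + ", ".join(ix2) + ")")
```

Compared with the maximum Matching Columns variant, only the position of the ORDER BY columns changes: they move directly behind the equal predicate columns.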

23 The algorithm to create an index: summary
Fat index IX1 (max MC): LNAME, CITY, AGE, HEIGHT, BORN, ZIP, NUMBER, STREET
Fat index IX2 (sort avoidance): LNAME, CITY, BORN, ZIP, AGE, HEIGHT, NUMBER, STREET

Fat index: all columns that appear in the query are in one index (Index Only).
Semifat index: all predicate columns are in one index (maximum index screening).

The terms Fat Index and Semifat Index appear often in the documentation. If all the columns needed for the query can be found in the index, the index is called a Fat Index. If all the predicate columns are in the index, it is called a Semifat Index.
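The two definitions reduce to subset tests, as this small Python sketch shows. The function name `classify_index` and the label "neither" are hypothetical; the column lists come from the example query.

```python
def classify_index(index_columns, predicate_columns, query_columns):
    """Classify an index as 'fat', 'semifat' or 'neither' for one query."""
    index_set = set(index_columns)
    if set(query_columns) <= index_set:
        return "fat"        # every column of the query is in the index
    if set(predicate_columns) <= index_set:
        return "semifat"    # every predicate column is in the index
    return "neither"


predicate_columns = ["LNAME", "BORN", "AGE", "CITY", "HEIGHT"]
query_columns = predicate_columns + ["STREET", "NUMBER", "ZIP"]

ix1 = ["LNAME", "CITY", "AGE", "HEIGHT", "BORN", "ZIP", "NUMBER", "STREET"]
print(classify_index(ix1, predicate_columns, query_columns))
print(classify_index(ix1[:5], predicate_columns, query_columns))
```

Dropping the last three columns of IX1 turns the fat index into a semifat one: the predicates are still covered, but the table must be touched for STREET, NUMBER and ZIP.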

24 Agenda: 4. Comparison of algorithms

Let's start with an example from real life.

25 Algorithms of index creation: cost comparison
Which index is more efficient for a particular query and particular data? Analysis of the case.

We have to choose the best index for our query.

26 Algorithms for index creation (max MC): no index
    SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
    FROM XXXX.BGDTAMC
    WHERE AMC_CENTRO_MOD =                 -- FF=0,99
      AND AMC_IINCOME_M1 > 0               -- FF=0,16
      AND AMC_IINCOME_M2 IN (0, 15, 16)    -- FF=0,83
      AND AMC_NPERIOD = 6                  -- FF=0,95
      AND AMC_IINCOME_M3 <                 -- FF=0,84
    ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;

TOTAL COST = (DSN_STATEMNT_TABLE)
MC = 0, ACCESSTYPE = R, PREFETCH = S (pure sequential prefetch)

This is the estimated cost of the query with no index.

27 Algorithms for index creation (max MC): no index
[Accounting report: AVERAGE times for APPL (CL.1) and DB2 (CL.2): ELAPSED TIME, CP CPU TIME, SUSPEND TIME; values not preserved.]

To get real measurements, we should produce a report from the DB2 accounting traces.

28 Algorithms for index creation (max MC): matching
    SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
    FROM XXXX.BGDTAMC
    WHERE AMC_CENTRO_MOD =                 -- FF=0,99
      AND AMC_IINCOME_M1 > 0               -- FF=0,16
      AND AMC_IINCOME_M2 IN (0, 15, 16)    -- FF=0,83
      AND AMC_NPERIOD = 6                  -- FF=0,95
      AND AMC_IINCOME_M3 <                 -- FF=0,84
    ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;

INDEX IX1: AMC_NPERIOD, AMC_CENTRO_MOD
TOTAL COST = (DSN_STATEMNT_TABLE)
MC = 2, ACCESSTYPE = I, PREFETCH = S (pure sequential prefetch)

After the first step of the max MC algorithm, the estimated total cost is reduced by nearly half.

29 Algorithms for index creation (max MC): matching
[Accounting report: AVERAGE times for APPL (CL.1) and DB2 (CL.2): ELAPSED TIME, CP CPU TIME, SUSPEND TIME; values not preserved.]

Elapsed Time is reduced, but CPU Time has increased by nearly 45%. The reason is the FF close to 99%.

30 Algorithms for index creation (max MC): matching + IN-list
    SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
    FROM XXXX.BGDTAMC
    WHERE AMC_CENTRO_MOD =                 -- FF=0,99
      AND AMC_IINCOME_M1 > 0               -- FF=0,16
      AND AMC_IINCOME_M2 IN (0, 15, 16)    -- FF=0,83
      AND AMC_NPERIOD = 6                  -- FF=0,95
      AND AMC_IINCOME_M3 <                 -- FF=0,84
    ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;

INDEX IX1: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M2
TOTAL COST = 12,6 (DSN_STATEMNT_TABLE)
MC = 3, ACCESSTYPE = N, PREFETCH = S (pure sequential prefetch)

When we add the IN-list column, the estimated total cost drops to 12,6.

31 Algorithms for index creation (max MC): matching + IN-list
[Accounting report: AVERAGE times for APPL (CL.1) and DB2 (CL.2): ELAPSED TIME, CP CPU TIME, SUSPEND TIME; values not preserved.]

CPU Time is also reduced, but it is still higher than without the index.

32 Algorithms for index creation (max MC): matching + IN-list + screening
    SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
    FROM XXXX.BGDTAMC
    WHERE AMC_CENTRO_MOD =                 -- FF=0,99
      AND AMC_IINCOME_M1 > 0               -- FF=0,16
      AND AMC_IINCOME_M2 IN (0, 15, 16)    -- FF=0,83
      AND AMC_NPERIOD = 6                  -- FF=0,95
      AND AMC_IINCOME_M3 <                 -- FF=0,84
    ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;

INDEX IX1: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M2, AMC_IINCOME_M1, AMC_IINCOME_M3
TOTAL COST = 7,12 (DSN_STATEMNT_TABLE)
MC = 4, ACCESSTYPE = N, PREFETCH = NO

When we add the Screening Columns, the estimated total cost is only 7,12.

33 Algorithms for index creation (max MC): matching + IN-list + screening
[Accounting report: AVERAGE times for APPL (CL.1) and DB2 (CL.2): ELAPSED TIME, CP CPU TIME, SUSPEND TIME; values not preserved.]

According to the accounting report, there is a huge drop in both Elapsed and CPU Time.

34 Algorithms for index creation (max MC): matching + IN-list + screening + INDEX ONLY
    SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
    FROM XXXX.BGDTAMC
    WHERE AMC_CENTRO_MOD =                 -- FF=0,99
      AND AMC_IINCOME_M1 > 0               -- FF=0,16
      AND AMC_IINCOME_M2 IN (0, 15, 16)    -- FF=0,83
      AND AMC_NPERIOD = 6                  -- FF=0,95
      AND AMC_IINCOME_M3 <                 -- FF=0,84
    ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;

INDEX IX1: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M2, AMC_IINCOME_M1, AMC_IINCOME_M3, AMC_CENTRO_ALTA, AMC_ENTIDAD, AMC_IINCOME_M4
TOTAL COST = 6,4 (DSN_STATEMNT_TABLE)
MC = 4, ACCESSTYPE = N, PREFETCH = NO

With Index Only access, the estimated total cost is 6,4.

35 Algorithms for index creation (max MC): matching + IN-list + screening + INDEX ONLY
[Accounting report: AVERAGE times for APPL (CL.1) and DB2 (CL.2): ELAPSED TIME, CP CPU TIME, SUSPEND TIME; values not preserved.]

With Index Only access, CPU and Elapsed Time are slightly lower.

36 Algorithms for index creation (NO SORT): matching + NO SORT
    SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
    FROM XXXX.BGDTAMC
    WHERE AMC_CENTRO_MOD =                 -- FF=0,99
      AND AMC_IINCOME_M1 > 0               -- FF=0,16
      AND AMC_IINCOME_M2 IN (0, 15, 16)    -- FF=0,83
      AND AMC_NPERIOD = 6                  -- FF=0,95
      AND AMC_IINCOME_M3 <                 -- FF=0,84
    ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;

INDEX IX2: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M1, AMC_CENTRO_ALTA
TOTAL COST = (DSN_STATEMNT_TABLE)
MC = 3, ACCESSTYPE = I, PREFETCH = S (pure sequential prefetch)

Consider now the second algorithm. With no sort, the estimated total cost is reduced by nearly half.

37 Algorithms for index creation (NO SORT): matching + NO SORT
[Accounting report: AVERAGE times for APPL (CL.1) and DB2 (CL.2): ELAPSED TIME, CP CPU TIME, SUSPEND TIME; values not preserved.]

According to the accounting report, Elapsed Time was reduced to 20,72 s and CPU Time to 1,54 s.

38 Algorithms for index creation (NO SORT): matching + NO SORT + screening
    SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
    FROM XXXX.BGDTAMC
    WHERE AMC_CENTRO_MOD =                 -- FF=0,99
      AND AMC_IINCOME_M1 > 0               -- FF=0,16
      AND AMC_IINCOME_M2 IN (0, 15, 16)    -- FF=0,83
      AND AMC_NPERIOD = 6                  -- FF=0,95
      AND AMC_IINCOME_M3 <                 -- FF=0,84
    ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;

INDEX IX2: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M1, AMC_CENTRO_ALTA, AMC_IINCOME_M2, AMC_IINCOME_M3
TOTAL COST = (DSN_STATEMNT_TABLE)
MC = 3, ACCESSTYPE = I, PREFETCH = S (pure sequential prefetch)

When we add the Screening Columns, the estimated total cost is

39 Algorithms for index creation (NO SORT): matching + NO SORT + screening
[Accounting report: AVERAGE times for APPL (CL.1) and DB2 (CL.2): ELAPSED TIME, CP CPU TIME; values not preserved.]

According to the accounting report, there is a large drop in both Elapsed and CPU Time.

40 Algorithms for index creation (NO SORT): matching + NO SORT + screening + INDEX ONLY
    SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
    FROM XXXX.BGDTAMC
    WHERE AMC_CENTRO_MOD =                 -- FF=0,99
      AND AMC_IINCOME_M1 > 0               -- FF=0,16
      AND AMC_IINCOME_M2 IN (0, 15, 16)    -- FF=0,83
      AND AMC_NPERIOD = 6                  -- FF=0,95
      AND AMC_IINCOME_M3 <                 -- FF=0,84
    ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;

INDEX IX2: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M1, AMC_CENTRO_ALTA, AMC_IINCOME_M2, AMC_IINCOME_M3, AMC_ENTIDAD, AMC_IINCOME_M4
TOTAL COST = (DSN_STATEMNT_TABLE)
MC = 3, ACCESSTYPE = I, PREFETCH = S (pure sequential prefetch)

With Index Only access, the estimated total cost is

41 Algorithms for index creation (NO SORT): matching + NO SORT + screening + INDEX ONLY
[Accounting report: AVERAGE times for APPL (CL.1) and DB2 (CL.2): ELAPSED TIME, CP CPU TIME, SUSPEND TIME; values not preserved.]

With Index Only access, CPU Time is slightly lower, but Elapsed Time is only 1 s.

42 Algorithms for creating an index: summary of query performance results for the given case

We can now compare the results for our specific query and select the best index from the Elapsed Time point of view.

43 Algorithms for creating an index: summary of query performance results for the given case

We can now compare the results for our specific query and select the best index from the CPU Time point of view.

44 Algorithms for creating an index: summary
The costs of adding an index or a column to an index

Adding an index:
- Disk space
- Cache (for non-leaf pages)
- Insert: 10 ms per added row
- Update: 10 ms when columns of the new index are updated
- Delete: 10 ms per removed row
- Index maintenance (Reorg/Rebuild/Runstats)

Adding a column to an index:
- Disk space
- Insert: none if there is adequate free space
- Update: 10 ms when the new column is updated
- Delete: none

If a new index must be created, it is important to measure its full impact. Every secondary index added to a table introduces a random read for inserts, deletes, and updates to key values. Every insert and delete (and some updates) causes an I/O against the secondary index to add and remove keys. Typically the secondary index is not in the same clustering sequence as the table data, which can result in many random I/Os to bring index pages into the buffer pool for the operation. Thus an insert into a table with a secondary index incurs additional random reads. It is therefore extremely important to understand the execution frequency of all statements in an application.
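The maintenance figures can be turned into a rough daily cost with a short Python sketch. It assumes the 10 ms per touched index row quoted above; the function names and the workload numbers are hypothetical.

```python
INDEX_TOUCH_MS = 10.0   # random read per maintained index row, as quoted on the slide


def added_index_cost_ms(inserts: int, deletes: int, key_updates: int) -> float:
    """Extra I/O time caused by one additional index."""
    return (inserts + deletes + key_updates) * INDEX_TOUCH_MS


def added_column_cost_ms(updates_of_new_column: int) -> float:
    """Extra I/O time caused by one additional column in an existing index.

    Inserts and deletes are treated as free, assuming adequate free space.
    """
    return updates_of_new_column * INDEX_TOUCH_MS


# Hypothetical daily workload on the table.
new_index_ms = added_index_cost_ms(inserts=1000, deletes=200, key_updates=50)
new_column_ms = added_column_cost_ms(updates_of_new_column=50)
print(f"new index: {new_index_ms / 1000} s/day, new column: {new_column_ms / 1000} s/day")
```

Comparing such a figure with the time saved on the queries gives a first indication of whether an extra index or column pays off.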

45 Agenda: 5. Tuning multiple queries workload

46 Designing indexes for a number of queries (workload)
What if you have:
- 10 different queries: create 10 different indexes (or consolidate the indexes)? Possible... with some downsides (time, resources).
- 1000 queries: consolidate the proposed indexes somehow into 1 or a few indexes? (More time needed: "Hi boss, I will be ready with it in 2016".)
- What if you do not know your queries or how often they execute (common with dynamic queries)? Magic wand, please!

Jacek has shown algorithms for Boolean term predicates for a particular query. However, that does not answer the question of what to do with 10 queries: will you define 10 indexes? Or with 1000 queries: will you make the effort to consider an index for every SELECT based on query frequency? How do you consolidate them? And what if you do not know which queries users run, because they are hard to collect (dynamic queries)? Actually, when I asked Jacek, he replied: "On this table in our system we in fact have just one query." So this approach is perfect for Jacek's company. How about other customers? How should they deal with it? Knowing how an index should be defined is crucial, and it helps you correct the indexes you already have, so what Jacek presented is extremely valuable. Having a tool for it is nice, but it does not mean you no longer need to think.

47 Optim Query Workload Tuner for DB2: Index Advisor
Improve query efficiency:
- Indexing foreign keys in queries that do not have indexes defined
- Identifying index filtering and screening
- Support for index only access (with INCLUDE columns supported after DB2 V10)
- Indexing to avoid sorts
Simplify use:
- Consolidates indexes and provides a single recommendation
- Enables what-if analysis
- Provides DDL to create indexes: run immediately or save
Test before deployment:
- Utilizes virtual index capabilities built into the DB2 engine
- Compares the access plan change after applying index recommendations virtually

The magic wand is not free, but it is probably cheaper than DBA time (depending on location ;-)). IBM, like other vendors (e.g. BMC, CA), has a tool called Index Advisor, which is part of Optim Query Workload Tuner. It is a cost-based tool: under the covers it uses virtual indexes and compares their costs taken from the explain tables. So let's see what this tool proposes for Jacek's query, and later what it proposes for other sample workload queries.
Query Tuner also provides index advice. It analyzes the query and recommends additional indexes that would benefit the query access. Index Advisor might recommend indexes for the following reasons:
- Foreign keys that do not have indexes defined
- Indexes that will provide index filtering and/or screening for the SQL statement
- Indexes that will provide index-only access for the SQL statement
- Indexes that can help to avoid sorts

48 Optim Query Workload Tuner for DB2: Index Advisor
1. Identify the problematic query whose access path should be improved
2. Ensure statistics are up to date (run the recommended RUNSTATS)
3. Review the index DDL
4. Verify / test the proposed indexes at run time

49 Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Identify the problematic query whose access path should be improved

As you see, this query looks simple if you know nothing about the data. In fact it is an extremely hard query to propose an index for. Why? Look at the filtering. The predicates do not filter much: the filter factors of 4 predicates are high (FF = 0,83 to 0,99 on the slides) and the data is skewed. Only the range predicate filters reasonably well, but a range predicate is not what we like most in index creation, right? So it is questionable whether to create an index for this query at all. Maybe an R-scan would be better here, given that the index would likely be about the size of the data? But what do we know about the data so far? Maybe not enough yet. Anyway, let's see whether Index Advisor has anything interesting to propose here.

50 Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Up-to-date statistics

Before you start creating an index, one prior step is required so that DB2 knows your data: you need current statistics. For this you can use the Statistics Advisor, a part of QWT that is free of charge.

51 Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Up-to-date statistics

So here is the recommended RUNSTATS, and we run it. After it completes, we run the Statistics option again to be sure it was the only recommendation (and so on, until no more statistics are recommended). Then we can ask for an index recommendation.

52 Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Review the index DDL

INDEX VIRT: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M1, AMC_CENTRO_ALTA, AMC_IINCOME_M2, AMC_IINCOME_M3, AMC_IINCOME_M4, AMC_ENTIDAD
The index proposed by Index Advisor is similar to X2 (the no-sort index designed manually on the previous pages); only the last two columns are flipped.

And we have an index recommendation. We can test the candidate index to see what the access path will be. The index is very similar to X2, which avoided the sort.

53 Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Verify / test the proposed indexes at run time
Index suggested by Index Advisor
[Accounting report: AVERAGE times for APPL (CL.1) and DB2 (CL.2): ELAPSED TIME, CP CPU TIME, SUSPEND TIME; values not preserved.]

So here we have a fairly good index, with no sort and an index scan, which is comparable to the index handcrafted by Jacek.

54 Optim Query Workload Tuner for DB2 Index Advisor single query tuning
Verify / test proposed indexes in runtime
X2 index (manually designed), with the same cost estimate and similar performance
[Accounting report, AVERAGE: APPL (CL.1) and DB2 (CL.2) ELAPSED TIME, CP CPU TIME, SUSPEND TIME]

For comparison, here is index X2. We can say that Index Advisor made almost as good a choice as the manually designed X2.

55 Optim Query Workload Tuner for DB2 Index Advisor single query tuning
Verify / test proposed indexes in runtime
X1 has a higher cost estimate and uses a workfile. It performed very well due to a strong correlation between the range predicate and the IN-list predicate, which significantly reduced the number of rows.
[Accounting report, AVERAGE: APPL (CL.1) and DB2 (CL.2) ELAPSED TIME, CP CPU TIME, SUSPEND TIME]

And now look at the access path and cost estimate for index X1, which was manually designed by Jacek and performed about 100% better than the ones that avoid the sort (X2). When I first saw it, I asked myself: why? Why did Index Advisor not propose it in this case? The answer is not obvious unless you look closer at your data and the correlation between columns (next foil).

56 Optim Query Workload Tuner for DB2 Index Advisor single query tuning

SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
FROM XXXX.BGDTAMC
WHERE AMC_CENTRO_MOD =                 - FF=0.99
  AND AMC_IINCOME_M1 > 0               - FF=0.16
  AND AMC_IINCOME_M2 IN (0,15,16)      - FF=0.83
  AND AMC_NPERIOD = 6                  - FF=0.95
  AND AMC_IINCOME_M3 <                 - FF=0.84
ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA ;

(The predicates on AMC_IINCOME_M1 and AMC_IINCOME_M2 taken together: actual FF=0.034.)

Returned rows are reduced a lot (3.4% << 84%) due to the strong correlation between AMC_IINCOME_M1 and AMC_IINCOME_M2:
- 21 mln out of 25 mln total rows have the value 0.
- AMC_IINCOME_M1 > 0 in almost all cases implies that AMC_IINCOME_M2 IN (0,15,16) is effectively AMC_IINCOME_M2 IN (15,16), so the actual FF is extremely selective.

Because of the wrong estimate, the number of rows returned after the index scan (108,545) is far lower than estimated (1,312,881), and DB2 over-estimates the cost of sorting for the ORDER BY. As a result DB2 chooses the index that avoids sorting. In this case, however, because of the strong correlation, adding P_M2 to the matching columns saves a lot of cost during screening of the index leaf pages. In the end, the accounting report shows that the 4-matching-column index performs better.

Without knowing about the correlation, it would be surprising to see this index perform so well. For the same data but different predicates, this index can perform far worse. E.g., with AND AMC_IINCOME_M1 >= 0 the filtering would not be so good, and the index would perform worse than X2 (check it!). If the correlation between columns were weaker (different data), it would be the same story.

One can ask why we did not detect the correlation and take it into account for costing. The answer is: because there is a range predicate (AMC_IINCOME_M1 > 0), colgroup statistics are not useful even if we collect them. We do not do this for range predicates; the optimizer cannot correlate them and use them yet (RFE opened (24009? )). So Jacek designed 2 indexes, but he verified which one performs better for ...
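The effect of the correlation can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the textbook rule that filter factors of independent predicates are multiplied, which is a simplification of what DB2 really does when costing. The numbers are the ones quoted on the slide; the variable and function names are made up for this example.

```python
# Filter factors quoted on the slide.
FF_M1_RANGE = 0.16          # AMC_IINCOME_M1 > 0
FF_M2_INLIST = 0.83         # AMC_IINCOME_M2 IN (0,15,16)
ACTUAL_FF_COMBINED = 0.034  # both predicates together, as measured on the data

# Row counts after the index scan, also quoted on the slide.
ESTIMATED_ROWS = 1_312_881
ACTUAL_ROWS = 108_545


def combined_ff_independent(*filter_factors):
    """Combine filter factors under the ASSUMPTION that predicates are independent."""
    result = 1.0
    for ff in filter_factors:
        result *= ff
    return result


# What the independence assumption predicts for the two correlated predicates.
estimated_ff = combined_ff_independent(FF_M1_RANGE, FF_M2_INLIST)

# How far off the independence assumption is for this pair of predicates.
ff_overestimate = estimated_ff / ACTUAL_FF_COMBINED

# How far off the row estimate was for the whole query.
rows_overestimate = ESTIMATED_ROWS / ACTUAL_ROWS

print(f"FF assuming independence: {estimated_ff:.4f}")
print(f"actual combined FF:       {ACTUAL_FF_COMBINED:.4f}")
print(f"FF over-estimated by a factor of   {ff_overestimate:.1f}")
print(f"rows over-estimated by a factor of {rows_overestimate:.1f}")
```

The two ratios are not expected to match: the row counts come from DB2's own estimate for the whole access path, which this sketch does not try to reproduce. Both point the same way, though: the sort looks much more expensive to the optimizer than it really is.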

57 Optim Query Workload Tuner for DB2 Index Advisor workload query tuning
WHAT IF you have multiple queries?
Single query tuning vs. workload tuning:
- Single query tuning concerns the performance of a specific query
- Workload tuning focuses on the performance of all queries in the workload
An application may (and usually does) consist of a set of queries, and it is not practical to perform single query tuning for each query.
- Analyzing queries in isolation does not account for the effect of index changes on other queries and may result in too many indexes
- An index created for a single query may create an imbalance for other queries

That was about a single query; now let's see how it looks for a workload of queries.

58 Optim Query Workload Tuner for DB2 Workload Index Advisor
Steps to be taken:
- Identify the workload (queries) to be tuned
- Review index recommendations
- Validate and compare before and after

59 Optim Query Workload Tuner for DB2 Index Advisor workload query tuning

This is an example; we can select from many sources. Here I select queries from the dynamic statement cache, but we can also select them from packages, PLAN_TABLE, etc.

60 Optim Query Workload Tuner for DB2 Index Advisor workload query tuning
Estimated performance improvement

And that is what we got: a set of indexes suited to the queries we captured, with the potential / ESTIMATED benefit (change in performance). The increase in DASD space that the indexes would take is also listed.

61 Optim Query Workload Tuner for DB2 Index Advisor workload query tuning
Comparing workload before and after changes (and after RUNSTATS for new indexes)
- Real performance improvement
- Performance with no statistics for new indexes
Performance has improved after creating the new indexes: elapsed time reduced from 1349.23 s to 395.60 s, CPU time reduced from 406.55 s to 191.12 s.

And now we have created the indexes, run the workload again and compared the results, so we can see how it performed. Remember, you also need to run the recommended RUNSTATS so that the optimizer has enough information about the indexes you created. If you do not, it will use defaults or ignore the index, so performance can be worse. On our example workload it was worse.
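To put the before/after figures into proportion, the relative reductions can be worked out as in the sketch below. The timings are the ones quoted on the slide; the helper name reduction_pct is made up for this example.

```python
def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Workload timings in seconds, before and after creating the new indexes
# (and running RUNSTATS for them), as quoted on the slide.
elapsed_reduction = reduction_pct(1349.23, 395.60)
cpu_reduction = reduction_pct(406.55, 191.12)

print(f"elapsed time reduced by {elapsed_reduction:.1f}%")
print(f"CPU time reduced by     {cpu_reduction:.1f}%")
```

So on this workload elapsed time dropped by roughly 70% and CPU time by roughly half.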

62 Conclusions / Summary
- Know index design algorithms and weigh their pros & cons
- Know the queries that run on your tables
- Balance what is more practical or economical for you: designing indexes for a multiple-query workload requires more time ($) or a tool ($)
- Verify index performance during runtime

- Every DBA should know the algorithms for index design. This is, however, a manual and time-consuming task. In some cases it gives us a cheap and quick way to design an index, when the query is simple and we do not have to care about other queries.
- It is essential to know all the queries that you have, so that an index designed for one query does not imbalance the other queries.
- Designing indexes for multiple queries requires more time and more work, so you need to balance what is more efficient/quicker/cheaper: designing indexes by yourself (time spent/cost of DBA work) or using a tool (quicker, but also involves the cost of the tool).
- The most important thing in this picture is not the index design but how such an index works on your data and your query. You should always test and verify that the index you or the tool invented works as expected, with no surprises ;-)
I wish all your indexes to be well designed, and I hope our presentation will help you with this process.
- If you have any questions, please feel free to ask now; we are also available during session breaks or via e-mail.

63 QUESTIONS

64 References
- Relational Database Index Design and the Optimizers, Tapio Lahdenmaki, Michael Leach
- DB2 for z/OS and OS/390 Development for Performance, Gabrielle & Associates

65 Jacek Surma PKO Bank Polski S.A. Session: B11 Creating indexes suited to your queries Michal Bialecki IBM SVL / SWG Cracow Lab michal.bialecki@pl.ibm.com 65
