Creating indexes suited to your queries


Jacek Surma, PKO Bank Polski S.A.
Michał Białecki, IBM Silicon Valley / SWG Cracow Lab
Session Code: B11. Wed, 16th Oct 2013, 11:00-12:00. Platform: DB2 z/OS
In every shop the big challenge is query performance (access path selection). Fortunately, it is surprising how little we really need to understand about the optimizer to improve query performance by building proper indexes that suit a particular query. The presentation describes two algorithms for creating indexes for Boolean Term (BT) predicates. The first algorithm creates indexes for maximum MATCHCOLS (matching). The second creates indexes for sort avoidance. The algorithms are compared from a performance point of view, showing which one to choose for a particular query on particular data and where to stop adding columns to an index. We will also see how to approach the same task in practice for a number of queries.

Agenda
1. Types of access to DB2 data
2. The algorithm to create an index with maximum Matching Columns (MC)
3. The algorithm to create an index for sort avoidance
4. Comparison of algorithms
5. Tuning a multiple-query workload
As an introduction, I will talk about the methods of DB2 data access. Then, step by step, I will discuss the two algorithms, and finally I will compare them using a real example. After my talk, my colleague will present tuning a multiple-query workload.

1. Types of access to DB2 data
Let's start with some terms and with how we can access data.

Types of access to DB2 data
Pages in an index or table can be read in 3 different ways:
1. Random Read (Synchronous Read)
2. Sequential Read (Sequential Prefetch)
3. Skip-sequential Read
Total I/O time:
- Random Read from disk = 10 ms
- Sequential Read from disk = 0.01 ms
- Read from buffer pool = 50 µs
- Sort cost = 0.002 ms
Every time DB2 reads a single index leaf page or a single data page, that read is counted as one Random Read. In computer science, Random Access (sometimes called Direct Access) is the ability to access an element at any position in a sequence. Its opposite is Sequential Access, in which a group of elements is accessed in a predetermined, ordered sequence. DB2 has a mechanism called Sequential Detection. It monitors the access pattern per statement and, if it detects sequential access, it uses Sequential Prefetch. DB2 uses different read types to prefetch data and avoid costly synchronous read operations that can cause application wait times. Prefetch is a mechanism for reading a set of pages, usually 32, into the buffer pool with only one I/O operation. The maximum number of pages read by a single prefetch operation is determined by the size of the buffer pool that is used for the operation. DB2 uses several types of prefetch.

Primary Key Index Access
  SELECT CUSTID, LNAME, FNAME
  FROM CUST
  WHERE CUSTID = :CUSTID
Index: IX1(CUSTID), with index rows holding CUSTID and RID. Table: CUST.
(Diagram: T marks one touch on an index row and one touch on a table row.)
The number of touches is the basic measure of the cost of an access path. This is universally true for any DBMS, not only for DB2, and it is covered in any DBA bible, e.g. Tapio Lahdenmaki, "Relational Database Index Design and the Optimizers". One touch means reading one index entry or one table row. In this example, when we access data through the primary key index, we need only 1 touch for the index entry and 1 touch for the table row. Each touch is a Random Read, so the cost is 2 x 10 ms = 20 ms.
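The touch arithmetic in these notes can be sketched in a few lines of Python. The two constants are taken from the I/O-time slide; the function name and the idea of simply summing touches are illustrative assumptions, not the DB2 cost model.

```python
# Rough touch-based cost estimate (illustrative only, not the DB2 cost formula).
RANDOM_READ_MS = 10.0      # random read from disk, as quoted on the I/O-time slide
SEQUENTIAL_READ_MS = 0.01  # sequential read from disk, from the same slide


def touch_cost_ms(random_touches: int, sequential_touches: int = 0) -> float:
    """Return the summed cost of random and sequential touches in milliseconds."""
    return random_touches * RANDOM_READ_MS + sequential_touches * SEQUENTIAL_READ_MS


# Primary key access: 1 random index touch + 1 random table touch.
primary_key_cost = touch_cost_ms(random_touches=2)
print(primary_key_cost)
```

The same helper makes the contrast with a sequential slice visible: a hundred sequential touches add about 1 ms, while a single extra random touch adds 10 ms.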

Clustering Index Access
  SELECT CUSTID, LNAME, FNAME
  FROM CUST
  WHERE ZIPCOD = :ZIPCOD
    AND LNAME = :LNAME
  ORDER BY FNAME
Index: IX2(ZIPCOD, LNAME, FNAME). Table: CUST.
(Diagram: index rows with columns ZIPCOD, LNAME, FNAME and RID, e.g. SURMA ADAM, SURMA JACK, SURMA JOHN, pointing to adjacent table rows; T marks each touch.)
Because all index rows are sorted, reading an index slice is a Sequential Read. This is why reading an index slice is very fast. From the index point of view, the RIDs point to random data pages. You can, however, define one index as clustering, which means that DB2 will try to maintain the table rows in the sequence of the index column(s). In this example the table slice is also read sequentially.

Nonclustering Index Access
  SELECT CUSTID, LNAME, FNAME
  FROM CUST
  WHERE ZIPCOD = :ZIPCOD
    AND LNAME = :LNAME
  ORDER BY FNAME
Index: IX2(ZIPCOD, LNAME, FNAME). Table: CUST.
(Diagram: the same index rows as before, now pointing to table rows scattered across the table; T marks each touch.)
In this diagram the table rows are not in the same sequence as the index rows; therefore, all table touches are random. Minimizing the number of random touches with better indexes is very important. The smaller the index slice, the smaller the Elapsed and CPU Time.

Algorithm for index creation: goal
There are three main reasons to create indexes:
- To improve query performance
- To ensure uniqueness of values
- To ensure a physical clustering sequence of table data
For query performance the targets are:
- Sort avoidance: SORT=N
- Index-only access: INDEXONLY=Y
- Reduced cost and time of the query
In this presentation we focus only on query performance using indexes. What we would like to gain from an index: Uniqueness; Matching; Screening; List Prefetch for RID sort; Index Only, to save access to data pages; No Sort, to avoid a sort in the sort pool or DSNDB07; Clustering; Partitioning. The best situation is one index per table. This one index supports the primary key, foreign key, partitioning, clustering, and data access. Of course this kind of design is difficult, but we should try.

2. The algorithm to create an index with maximum Matching Columns (MC)
We now move to the second point of the agenda.

The algorithm to create an index with maximum Matching Columns (MC) and INDEX ONLY
Maximum Matching Columns: only 3 steps
Maximum Matching Columns and Index Only: only 5 steps
Boolean term (BT) predicate: a simple or compound predicate that, when it evaluates to false for a specific row, makes the entire WHERE clause false for that row.
  WHERE LASTN = 'SURMA' AND FIRSTN = 'JACEK'
What does Matching Columns mean? Matching Columns (MC) are index columns that define the size of the Index Slice. Screening Columns (SC) are index columns that eliminate rows from the Index Slice before the table is touched. The higher the number of Matching Columns, the smaller the Index Slice. If MATCHCOLS is 0, the access method is called a Nonmatching Index Scan: all the index keys and their RIDs are read, and any filtering comes from Screening Columns. If MATCHCOLS is greater than 0, the access method is called a Matching Index Scan. Matching continues across index columns as long as the predicates in the WHERE clause are connected with AND and are Equal Predicates, with at most one IN-list predicate and at most one Range Predicate. Index Matching reduces the number of index pages to read. Index Screening reduces the number of table rows to read.

The algorithm to create an index with maximum Matching Columns (MC) and INDEX ONLY
Cardinality: the number of unique values
Filter Factor (FF): the selectivity of the predicate
Average (estimated) FF = 1 / Cardinality, e.g. for NAME = :NAME (see notes)
Specific (actual) FF = Number of result rows / Number of source rows, e.g. for NAME = 'JACEK'
Here we have a very important definition. The Filter Factor of a predicate is the number of qualifying rows divided by the number of source rows. FF=0%: no rows qualify, the predicate is false for all rows. FF=100%: all rows qualify, the predicate is true for all rows. The estimated FF can differ when frequency or histogram statistics have been collected.
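The two Filter Factor definitions translate directly into code. The following is a minimal sketch; the function names and the sample numbers (1,000 distinct names, 50,000 qualifying rows out of 1,000,000) are hypothetical and chosen only for illustration.

```python
def average_ff(cardinality: int) -> float:
    """Estimated FF for an equal predicate with a host variable: 1 / cardinality."""
    return 1.0 / cardinality


def actual_ff(result_rows: int, source_rows: int) -> float:
    """Actual FF for a specific literal: qualifying rows / source rows."""
    return result_rows / source_rows


# Hypothetical NAME column with 1,000 distinct values in a 1,000,000-row table.
estimated = average_ff(1000)
# Hypothetical literal that is far more common than the average value.
observed = actual_ff(50_000, 1_000_000)
print(f"estimated FF = {estimated:.3f}, actual FF = {observed:.3f}")
```

When the two numbers differ this much, the column is skewed, which is the situation where collected frequency or histogram statistics change the estimate.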

The algorithm to create an index with maximum Matching Columns (MC) and INDEX ONLY
Step 1: matching
Place the columns of the EQUAL predicates (=, IS NULL, IS NOT DISTINCT FROM) as the leading index columns. Order them from the most restrictive (for CLUSTER, from the least restrictive).
  SELECT STREET, NUMBER, ZIP, BORN
  FROM CUST
  WHERE LNAME = 'JONES'
    AND BORN > 1973
    AND AGE IN (30, 40)
    AND CITY = 'LONDON'
    AND HEIGHT < 150
  ORDER BY BORN, ZIP
Index columns IX1: LNAME, CITY
"Most restrictive" means the most filtering or selective column: the one with the highest cardinality (the largest number of distinct values) and a Filter Factor closest to 0.

The algorithm to create an index with maximum Matching Columns (MC) and INDEX ONLY
Step 2: matching
The next index column is the column with the IN-list predicate. If there is more than one IN-list column, select only the most restrictive one. Starting with DB2 version 10 we can consider multiple IN-list columns.
  SELECT STREET, NUMBER, ZIP, BORN
  FROM CUST
  WHERE LNAME = 'JONES'
    AND BORN > 1973
    AND AGE IN (30, 40)
    AND CITY = 'LONDON'
    AND HEIGHT < 150
  ORDER BY BORN, ZIP
Index columns IX1: LNAME, CITY, AGE
IN-list Index Scan (ACCESSTYPE=N) is a special case of the Matching Index Scan in which a single indexable IN-list predicate is used as a Matching Equal Predicate. At most one IN-list predicate can be matching on an index. With List Prefetch or Multiple Index Access, IN-list predicates cannot be used as matching predicates.

The algorithm to create an index with maximum Matching Columns (MC) and INDEX ONLY
Step 3: matching/screening
Next come the columns of the Range predicates (>, <, >=, <=, BETWEEN, LIKE 'x%'). Order them from the most restrictive (the smallest FF).
  SELECT STREET, NUMBER, ZIP, BORN
  FROM CUST
  WHERE LNAME = 'JONES'
    AND BORN > 1973
    AND AGE IN (30, 40)
    AND CITY = 'LONDON'
    AND HEIGHT < 150
  ORDER BY BORN, ZIP
Index columns IX1: LNAME, CITY, AGE, HEIGHT, BORN
Range predicates interrupt the Matching Index Scan. Only the first range column in the index is included in the matching count; later range columns act as screening columns.
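The matching rules from steps 1 to 3 can be expressed as a small counting function. This is a simplified sketch of the rules as stated in these slides (equal predicates match, one IN-list matches, the first range column matches and then stops matching); the function name and the "eq"/"in"/"range" labels are hypothetical, and the real optimizer handles more cases.

```python
from typing import Dict, Sequence


def count_matching_columns(index_cols: Sequence[str], predicates: Dict[str, str]) -> int:
    """Count matching columns under the simplified rules from the slides.

    predicates maps a column name to "eq", "in" or "range".
    """
    matching = 0
    in_list_used = False
    for col in index_cols:
        kind = predicates.get(col)
        if kind == "eq":
            matching += 1
        elif kind == "in" and not in_list_used:
            matching += 1
            in_list_used = True
        elif kind == "range":
            matching += 1  # the first range column still counts...
            break          # ...but it ends matching
        else:
            break          # no usable predicate on this column: matching stops
    return matching


# Predicates of the CUST example query.
cust_predicates = {"LNAME": "eq", "CITY": "eq", "AGE": "in", "HEIGHT": "range", "BORN": "range"}

print(count_matching_columns(["LNAME", "CITY", "AGE", "HEIGHT", "BORN"], cust_predicates))
print(count_matching_columns(["LNAME", "CITY", "BORN", "ZIP"], cust_predicates))
```

For the five-column index the count stops at HEIGHT, so BORN only screens; for an index that places BORN right after the equal columns, matching ends at BORN.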

The algorithm to create an index with maximum Matching Columns (MC) and INDEX ONLY
Step 4: index only
Add the ORDER BY columns in the order in which they appear. Omit the columns that already appeared in steps 1, 2 and 3.
  SELECT STREET, NUMBER, ZIP, BORN
  FROM CUST
  WHERE LNAME = 'JONES'
    AND BORN > 1973
    AND AGE IN (30, 40)
    AND CITY = 'LONDON'
    AND HEIGHT < 150
  ORDER BY BORN, ZIP
Index columns IX1: LNAME, CITY, AGE, HEIGHT, BORN, ZIP
Index Only access means that all of the columns needed for the query can be found in the index, so DB2 does not access the table. Index Only uses only one multicolumn index. Index Only is never possible with Multiple Index Access. Index Only is not possible for any step that uses List Prefetch. Index Only is not possible for padded indexes when VARCHAR (varying-length) columns are returned.

The algorithm to create an index with maximum Matching Columns (MC) and INDEX ONLY
Step 5: index only
Add the columns appearing after SELECT. Omit the columns that already appeared in steps 1, 2, 3 and 4. Updatable columns should be placed at the end of the index.
  SELECT STREET, NUMBER, ZIP, BORN
  FROM CUST
  WHERE LNAME = 'JONES'
    AND BORN > 1973
    AND AGE IN (30, 40)
    AND CITY = 'LONDON'
    AND HEIGHT < 150
  ORDER BY BORN, ZIP
Index columns IX1: LNAME, CITY, AGE, HEIGHT, BORN, ZIP, NUMBER, STREET
When we add the rest of the columns from the SELECT clause, we reach Index Only access.
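The five steps can be sketched as one function that returns the index column order. The Filter Factors passed in below are hypothetical (the slides give none for the CUST query), as is the choice to treat STREET as the updatable column; placing any extra IN-list columns after the range columns is also an assumption of this sketch.

```python
from typing import AbstractSet, Dict, List, Sequence


def max_mc_index(
    equal: Dict[str, float],
    in_list: Dict[str, float],
    ranges: Dict[str, float],
    order_by: Sequence[str],
    select: Sequence[str],
    volatile: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Return index columns for maximum matching plus index-only access.

    The three dicts map a predicate column to its Filter Factor (smaller = more restrictive).
    """
    columns: List[str] = []

    def add(cols: Sequence[str]) -> None:
        for col in cols:
            if col not in columns:
                columns.append(col)

    def by_ff(preds: Dict[str, float]) -> List[str]:
        return sorted(preds, key=preds.get)

    add(by_ff(equal))          # step 1: equal predicates, most restrictive first
    in_cols = by_ff(in_list)
    add(in_cols[:1])           # step 2: only the most restrictive IN-list column
    add(by_ff(ranges))         # step 3: range predicates, smallest FF first
    add(in_cols[1:])           # assumption: other IN-list columns only screen
    add(order_by)              # step 4: ORDER BY columns in their given order
    rest = [c for c in select if c not in columns]
    add([c for c in rest if c not in volatile])  # step 5: remaining SELECT columns
    add([c for c in rest if c in volatile])      # updatable columns go last
    return columns


ix1 = max_mc_index(
    equal={"LNAME": 0.01, "CITY": 0.05},   # hypothetical FFs
    in_list={"AGE": 0.10},                 # hypothetical FF
    ranges={"BORN": 0.50, "HEIGHT": 0.20}, # hypothetical FFs
    order_by=["BORN", "ZIP"],
    select=["STREET", "NUMBER", "ZIP", "BORN"],
    volatile={"STREET"},                   # assumed to be the updatable column
)
print(ix1)
```

With these inputs the function reproduces the column order shown for IX1 on the slide; different Filter Factors would reorder the columns within each step.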

3. The algorithm to create an index for sort avoidance
Let's start with the algorithm for sort avoidance.

The algorithm to create an index for sort avoidance and INDEX ONLY
No Sort: only 2 steps
No Sort and Index Only: only 4 steps
Attention! We cannot avoid a sort for:
- List Prefetch with ORDER BY
- ORDER BY on columns of the inner table of a Nested Loop Join
- ORDER BY on RANDOM columns
- UNION, INTERSECT, EXCEPT
If the result rows do not come from the database in ORDER BY sequence, the DBMS must read and sort all result rows before the first FETCH. Sorting is very fast on current hardware, but we still need an index for sort avoidance when our query fills screens and uses the SQL option OPTIMIZE FOR N ROWS (with N>1). An index is always ordered, so some sorts can be avoided if the index keys are in the order needed by ORDER BY, GROUP BY, a JOIN operation, or DISTINCT in an aggregate function. With List Prefetch DB2 sorts twice: a RID sort in the RID pool and a data sort in the sort pool or the DSNDB07 database.

The algorithm to create an index for sort avoidance and INDEX ONLY
Step 1: matching
Place the columns of the EQUAL predicates (=, IS NULL, IS NOT DISTINCT FROM) as the leading index columns. Order them from the most restrictive.
  SELECT STREET, NUMBER, ZIP, BORN
  FROM CUST
  WHERE LNAME = 'JONES'
    AND BORN > 1973
    AND AGE IN (30, 40)
    AND CITY = 'LONDON'
    AND HEIGHT < 150
  ORDER BY BORN, ZIP
Index columns IX2: LNAME, CITY
This step is the same as in the Maximum Matching Columns algorithm.

The algorithm to create an index for sort avoidance and INDEX ONLY
Step 2: no sort
Add the ORDER BY columns in the same sequence as they appear in the ORDER BY clause, and with the same ASC/DESC options. Ignore columns that were already placed in step 1.
  SELECT STREET, NUMBER, ZIP, BORN
  FROM CUST
  WHERE LNAME = 'JONES'
    AND BORN > 1973
    AND AGE IN (30, 40)
    AND CITY = 'LONDON'
    AND HEIGHT < 150
  ORDER BY BORN, ZIP
Index columns IX2: LNAME, CITY, BORN, ZIP

The algorithm to create an index for sort avoidance and INDEX ONLY
Step 3: screening
Add all the remaining columns of the WHERE clause (IN-list and Range predicates) in any order. Omit the columns that already appeared in steps 1 and 2.
  SELECT STREET, NUMBER, ZIP, BORN
  FROM CUST
  WHERE LNAME = 'JONES'
    AND BORN > 1973
    AND AGE IN (30, 40)
    AND CITY = 'LONDON'
    AND HEIGHT < 150
  ORDER BY BORN, ZIP
Index columns IX2: LNAME, CITY, BORN, ZIP, AGE, HEIGHT
So now we add all the remaining columns of the WHERE clause in any order. Once the remaining columns of both the WHERE and SELECT clauses are in the index, we reach Index Only access.

The algorithm to create an index for sort avoidance and INDEX ONLY
Step 4: index only
Add the columns appearing after SELECT. Omit the columns that already appeared in steps 1, 2 and 3. Updatable columns should be placed at the end of the index.
  SELECT STREET, NUMBER, ZIP, BORN
  FROM CUST
  WHERE LNAME = 'JONES'
    AND BORN > 1973
    AND AGE IN (30, 40)
    AND CITY = 'LONDON'
    AND HEIGHT < 150
  ORDER BY BORN, ZIP
Index columns IX2: LNAME, CITY, BORN, ZIP, AGE, HEIGHT, NUMBER, STREET
Finally we add the columns appearing after SELECT.
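The four sort-avoidance steps can be sketched the same way as a single function. As before, the Filter Factors and the choice of STREET as the updatable column are hypothetical inputs; the function and parameter names are illustrative.

```python
from typing import AbstractSet, Dict, List, Sequence


def sort_avoidance_index(
    equal: Dict[str, float],
    other_where: Sequence[str],
    order_by: Sequence[str],
    select: Sequence[str],
    volatile: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Return index columns for sort avoidance plus index-only access.

    equal maps equal-predicate columns to their Filter Factor;
    other_where lists the IN-list and range predicate columns in any order.
    """
    columns: List[str] = []

    def add(cols: Sequence[str]) -> None:
        for col in cols:
            if col not in columns:
                columns.append(col)

    add(sorted(equal, key=equal.get))  # step 1: equal predicates, most restrictive first
    add(order_by)                      # step 2: ORDER BY columns, same sequence
    add(other_where)                   # step 3: remaining WHERE columns (screening)
    rest = [c for c in select if c not in columns]
    add([c for c in rest if c not in volatile])  # step 4: remaining SELECT columns
    add([c for c in rest if c in volatile])      # updatable columns go last
    return columns


ix2 = sort_avoidance_index(
    equal={"LNAME": 0.01, "CITY": 0.05},  # hypothetical FFs
    other_where=["BORN", "AGE", "HEIGHT"],
    order_by=["BORN", "ZIP"],
    select=["STREET", "NUMBER", "ZIP", "BORN"],
    volatile={"STREET"},                  # assumed to be the updatable column
)
print(ix2)
```

The sketch ignores ASC/DESC options; a fuller version would carry the direction with each ORDER BY column so that the index definition can repeat it.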

The algorithm to create an index: summary
Fat index IX1 (max MC): LNAME, CITY, AGE, HEIGHT, BORN, ZIP, NUMBER, STREET
Fat index IX2 (sort avoidance): LNAME, CITY, BORN, ZIP, AGE, HEIGHT, NUMBER, STREET
Fat index: all columns that appear in the query are in one index (Index Only)
Semifat index: all predicate columns are in one index (maximum index screening)
In the documentation we often meet the terms Fat Index and Semifat Index. If all the columns needed for the query can be found in the index, the index is called a Fat Index. If all the predicate columns are in the index, the index is called a Semifat Index.
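The two definitions amount to set-containment checks, which the following sketch makes explicit. The function name and the label "neither" for an index that covers not even all predicate columns are assumptions of this example.

```python
from typing import Iterable


def classify_index(
    index_cols: Iterable[str],
    predicate_cols: Iterable[str],
    all_query_cols: Iterable[str],
) -> str:
    """Classify an index as "fat", "semifat" or "neither" for one query."""
    idx = set(index_cols)
    if set(all_query_cols) <= idx:
        return "fat"       # every column of the query is in the index
    if set(predicate_cols) <= idx:
        return "semifat"   # every predicate column is in the index
    return "neither"


# Columns of the CUST example query.
predicate_cols = ["LNAME", "BORN", "AGE", "CITY", "HEIGHT"]
all_cols = predicate_cols + ["STREET", "NUMBER", "ZIP"]

print(classify_index(["LNAME", "CITY", "AGE", "HEIGHT", "BORN", "ZIP", "NUMBER", "STREET"],
                     predicate_cols, all_cols))
print(classify_index(["LNAME", "CITY", "AGE", "HEIGHT", "BORN"], predicate_cols, all_cols))
```

Column order does not matter for the classification, only for matching and sort avoidance.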

4. Comparison of algorithms
Let's start with an example from real life.

Algorithms of index creation: cost comparison
Which index is more efficient for a particular query and particular data? Analysis of the case.
We have to choose the best index for our query.

Algorithms for index creation (max MC): no index
  SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
  FROM XXXX.BGDTAMC
  WHERE AMC_CENTRO_MOD =                 - FF=0.99
    AND AMC_IINCOME_M1 > 0               - FF=0.16
    AND AMC_IINCOME_M2 IN (0, 15, 16)    - FF=0.83
    AND AMC_NPERIOD = 6                  - FF=0.95
    AND AMC_IINCOME_M3 <                 - FF=0.84
  ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;
TOTAL COST = (DSN_STATEMNT_TABLE)
MC = 0, ACCESSTYPE = R, PREFETCH = S (pure sequential prefetch)
This is the estimated cost for this query with no index.

Algorithms for index creation (max MC): no index
(Accounting report, AVERAGE: ELAPSED TIME, CP CPU TIME and SUSPEND TIME for APPL (CL.1) and DB2 (CL.2); values not preserved.)
When we want real data, we should produce a report from the DB2 accounting traces.

Algorithms for index creation (max MC): matching
  SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
  FROM XXXX.BGDTAMC
  WHERE AMC_CENTRO_MOD =                 - FF=0.99
    AND AMC_IINCOME_M1 > 0               - FF=0.16
    AND AMC_IINCOME_M2 IN (0, 15, 16)    - FF=0.83
    AND AMC_NPERIOD = 6                  - FF=0.95
    AND AMC_IINCOME_M3 <                 - FF=0.84
  ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;
INDEX IX1: AMC_NPERIOD, AMC_CENTRO_MOD
TOTAL COST = (DSN_STATEMNT_TABLE)
MC = 2, ACCESSTYPE = I, PREFETCH = S (pure sequential prefetch)
After the first step of the max MC algorithm, the estimated total cost is reduced by nearly half.

Algorithms for index creation (max MC): matching
(Accounting report, AVERAGE: ELAPSED TIME, CP CPU TIME and SUSPEND TIME for APPL (CL.1) and DB2 (CL.2); values not preserved.)
Elapsed Time is reduced, but CPU Time has increased by nearly 45%. The reason is the FF close to 99%.

Algorithms for index creation (max MC): matching + IN-list
  SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
  FROM XXXX.BGDTAMC
  WHERE AMC_CENTRO_MOD =                 - FF=0.99
    AND AMC_IINCOME_M1 > 0               - FF=0.16
    AND AMC_IINCOME_M2 IN (0, 15, 16)    - FF=0.83
    AND AMC_NPERIOD = 6                  - FF=0.95
    AND AMC_IINCOME_M3 <                 - FF=0.84
  ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;
INDEX IX1: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M2
TOTAL COST = 12.6 (DSN_STATEMNT_TABLE)
MC = 3, ACCESSTYPE = N, PREFETCH = S (pure sequential prefetch)
When we add the IN-list column, the estimated total cost drops to 12.6.

Algorithms for index creation (max MC): matching + IN-list
(Accounting report, AVERAGE: ELAPSED TIME, CP CPU TIME and SUSPEND TIME for APPL (CL.1) and DB2 (CL.2); values not preserved.)
CPU Time is also reduced, but it is still higher than without the index.

Algorithms for index creation (max MC): matching + IN-list + screening
  SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
  FROM XXXX.BGDTAMC
  WHERE AMC_CENTRO_MOD =                 - FF=0.99
    AND AMC_IINCOME_M1 > 0               - FF=0.16
    AND AMC_IINCOME_M2 IN (0, 15, 16)    - FF=0.83
    AND AMC_NPERIOD = 6                  - FF=0.95
    AND AMC_IINCOME_M3 <                 - FF=0.84
  ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;
INDEX IX1: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M2, AMC_IINCOME_M1, AMC_IINCOME_M3
TOTAL COST = 7.12 (DSN_STATEMNT_TABLE)
MC = 4, ACCESSTYPE = N, PREFETCH = NO
When we add the Screening Columns, the estimated total cost is only 7.12.

Algorithms for index creation (max MC): matching + IN-list + screening
(Accounting report, AVERAGE: ELAPSED TIME, CP CPU TIME and SUSPEND TIME for APPL (CL.1) and DB2 (CL.2); values not preserved.)
According to the accounting report, we see a huge drop in both Elapsed Time and CPU Time.

Algorithms for index creation (max MC): matching + IN-list + screening + INDEX ONLY
  SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
  FROM XXXX.BGDTAMC
  WHERE AMC_CENTRO_MOD =                 - FF=0.99
    AND AMC_IINCOME_M1 > 0               - FF=0.16
    AND AMC_IINCOME_M2 IN (0, 15, 16)    - FF=0.83
    AND AMC_NPERIOD = 6                  - FF=0.95
    AND AMC_IINCOME_M3 <                 - FF=0.84
  ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;
INDEX IX1: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M2, AMC_IINCOME_M1, AMC_IINCOME_M3, AMC_CENTRO_ALTA, AMC_ENTIDAD, AMC_IINCOME_M4
TOTAL COST = 6.4 (DSN_STATEMNT_TABLE)
MC = 4, ACCESSTYPE = N, PREFETCH = NO
With Index Only access the estimated total cost is 6.4.

Algorithms for index creation (max MC): matching + IN-list + screening + INDEX ONLY
(Accounting report, AVERAGE: ELAPSED TIME, CP CPU TIME and SUSPEND TIME for APPL (CL.1) and DB2 (CL.2); values not preserved.)
With Index Only access the CPU and Elapsed Time are slightly lower.

Algorithms for index creation (NO SORT): matching + NO SORT
  SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
  FROM XXXX.BGDTAMC
  WHERE AMC_CENTRO_MOD =                 - FF=0.99
    AND AMC_IINCOME_M1 > 0               - FF=0.16
    AND AMC_IINCOME_M2 IN (0, 15, 16)    - FF=0.83
    AND AMC_NPERIOD = 6                  - FF=0.95
    AND AMC_IINCOME_M3 <                 - FF=0.84
  ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;
INDEX IX2: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M1, AMC_CENTRO_ALTA
TOTAL COST = (DSN_STATEMNT_TABLE)
MC = 3, ACCESSTYPE = I, PREFETCH = S (pure sequential prefetch)
Consider now the second algorithm. With no sort, the estimated total cost is reduced by nearly half.

Algorithms for index creation (NO SORT): matching + NO SORT
(Accounting report, AVERAGE: ELAPSED TIME, CP CPU TIME and SUSPEND TIME for APPL (CL.1) and DB2 (CL.2); values not preserved.)
According to the accounting report, the Elapsed Time was reduced to 20.72 s and the CPU Time was reduced to 1.54 s.

Algorithms for index creation (NO SORT): matching + NO SORT + screening
  SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
  FROM XXXX.BGDTAMC
  WHERE AMC_CENTRO_MOD =                 - FF=0.99
    AND AMC_IINCOME_M1 > 0               - FF=0.16
    AND AMC_IINCOME_M2 IN (0, 15, 16)    - FF=0.83
    AND AMC_NPERIOD = 6                  - FF=0.95
    AND AMC_IINCOME_M3 <                 - FF=0.84
  ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;
INDEX IX2: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M1, AMC_CENTRO_ALTA, AMC_IINCOME_M2, AMC_IINCOME_M3
TOTAL COST = (DSN_STATEMNT_TABLE)
MC = 3, ACCESSTYPE = I, PREFETCH = S (pure sequential prefetch)
When we add Screening Columns the estimated total cost is

Algorithms for index creation (NO SORT): matching + NO SORT + screening
(Accounting report, AVERAGE: ELAPSED TIME and CP CPU TIME for APPL (CL.1) and DB2 (CL.2); values not preserved.)
According to the accounting report, we see a large drop in both Elapsed Time and CPU Time.

Algorithms for index creation (NO SORT): matching + NO SORT + screening + INDEX ONLY
  SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
  FROM XXXX.BGDTAMC
  WHERE AMC_CENTRO_MOD =                 - FF=0.99
    AND AMC_IINCOME_M1 > 0               - FF=0.16
    AND AMC_IINCOME_M2 IN (0, 15, 16)    - FF=0.83
    AND AMC_NPERIOD = 6                  - FF=0.95
    AND AMC_IINCOME_M3 <                 - FF=0.84
  ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;
INDEX IX2: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M1, AMC_CENTRO_ALTA, AMC_IINCOME_M2, AMC_IINCOME_M3, AMC_ENTIDAD, AMC_IINCOME_M4
TOTAL COST = (DSN_STATEMNT_TABLE)
MC = 3, ACCESSTYPE = I, PREFETCH = S (pure sequential prefetch)
When we do Index Only access then the estimated total cost is

Algorithms for index creation (NO SORT): matching + NO SORT + screening + INDEX ONLY
(Accounting report, AVERAGE: ELAPSED TIME, CP CPU TIME and SUSPEND TIME for APPL (CL.1) and DB2 (CL.2); values not preserved.)
With Index Only access the CPU Time is only slightly lower, but the Elapsed Time is just 1 s.

Algorithms for creating an index: summary of query performance results for the given case
We can now compare the results for our specific query and select the best index from the Elapsed Time point of view.

Algorithms for creating an index: summary of query performance results for the given case
We can now compare the results for our specific query and select the best index from the CPU Time point of view.

Algorithms for creating an index: summary
The costs of adding an index or a column to an index
Adding an index:
- Disk space
- Cache (for non-leaf pages)
- Insert: 10 ms per added row
- Update: 10 ms when columns of the new index are updated
- Delete: 10 ms per removed row
- Index maintenance (REORG / REBUILD / RUNSTATS)
Adding a column to an index:
- Disk space
- Insert: none if there is adequate free space
- Update: 10 ms when the new column is updated
- Delete: none
If a new index must be created, it is important to measure its full impact. Every secondary index that is added to a table introduces a random read for inserts, deletes, and updates to key values. Every insert and delete (and some updates) causes an I/O against the secondary index to add and remove keys. Typically the secondary index is not in the same clustering sequence as the table data, and that can result in many random I/Os to get index pages into the buffer pool for the operation. Thus an insert to a table with a secondary index will incur additional random reads. Therefore, it is extremely important to understand the frequency of execution of all statements in an application.
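A quick back-of-the-envelope calculation shows how the 10 ms figures add up for one new index. The workload numbers below (1,000 inserts, 200 deletes, 50 updates of indexed columns) are hypothetical, and the function is only a sketch of the arithmetic on this slide.

```python
MS_PER_INDEX_TOUCH = 10.0  # from the slide: about 10 ms per insert, delete or key update


def added_index_cost_ms(inserts: int, deletes: int, key_updates: int) -> float:
    """Estimate the extra random-I/O time caused by one additional index."""
    return (inserts + deletes + key_updates) * MS_PER_INDEX_TOUCH


# Hypothetical workload for one period.
extra_ms = added_index_cost_ms(inserts=1000, deletes=200, key_updates=50)
print(f"{extra_ms / 1000.0:.1f} s of additional I/O time")
```

Comparing this figure with the time the index saves across all executions of the queries it supports gives a first indication of whether the index pays for itself.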

5. Tuning a multiple-query workload

Designing indexes for a number of queries (workload)
What if you have:
- 10 different queries: create 10 different indexes (or consolidate indexes)? Possible, with some downsides (time, resources).
- 1000 queries: consolidate the proposed indexes somehow into 1 or a few indexes? More time needed ("Hi boss, I will be ready with it in 2016").
- No knowledge of your queries or of how often they are executed (common in the case of dynamic queries)? Magic wand, please!
Jacek has shown algorithms for Boolean term predicates for a particular query. However, that does not address the question: if you have 10 queries, will you define 10 indexes? What if you have 1000 queries? Will you make the effort to consider an index for every SELECT based on the frequency of the query? How do you consolidate them? And what if you do not know which queries users run, because they are hard to collect (dynamic queries)? Actually, when I asked Jacek, he replied: on this table in our system we in fact have just one query. So this approach is indeed perfect for Jacek's company. What about other customers? How should they deal with it? Knowing how an index should be defined is crucial and helps you correct the indexes you already have, so what Jacek presented is extremely valuable. Having a tool for it is nice, but it does not mean that you no longer need to think.

Optim Query Workload Tuner for DB2: Index Advisor
Improve query efficiency:
- Indexing foreign keys in queries that do not have indexes defined
- Identifying index filtering and screening
- Support for index-only access (with INCLUDE columns supported after DB2 V10)
- Indexing to avoid sorts
Simplify use:
- Consolidate indexes and provide a single recommendation
- Enables what-if analysis
- Provides DDL to create indexes; run immediately or save
Test before deployment:
- Utilize virtual index capabilities built into the DB2 engine
- Compare the access plan change after applying index recommendations virtually
The magic wand is not free, but it is probably cheaper than DBA time (depending on location ;-)). IBM, similarly to other vendors (e.g. BMC, CA), has a tool called Index Advisor, which is part of Optim Query Workload Tuner. This is a cost-based tool: under the covers it uses virtual indexes and compares their costs taken from the EXPLAIN tables. So let's see what this tool proposes for Jacek's query, and later what it proposes for other sample workload queries. Query Tuner also provides index advice. It analyzes the query and recommends additional indexes that would benefit the query access. Index Advisor might recommend indexes for the following reasons: foreign keys that do not have indexes defined; indexes that will provide index filtering and/or screening for the SQL statement; indexes that will provide index-only access for the SQL statement; indexes that can help to avoid sorts.

Optim Query Workload Tuner for DB2: Index Advisor
- Identify the problematic query to improve its access path
- Ensure statistics are up to date (run the recommended RUNSTATS)
- Review the index DDL
- Verify / test the proposed indexes at runtime

Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Identify the problematic query to improve its access path
As you see, this query is simple if you do not know anything about the data. In fact, it is an extremely hard query to propose an index for. Why? Look at the filtering. Those predicates do not filter much: the filter factor is high for 4 predicates and the data is skewed. Only the range predicate filters reasonably well, but a range predicate is not what we like most in index creation, right? So it is questionable whether to create an index for this query at all. Maybe a table space scan (R-scan) would be better here, taking into account, for example, that the index will likely be the size of the data? But what do we know about the data now? Maybe not enough yet. Anyway, let's see whether Index Advisor has anything interesting to propose here.

Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Up-to-date statistics
Before you start creating an index, one prior step is required so that DB2 knows your data: you need current statistics. For this you can use the Statistics Advisor, a part of QWT that is free of charge.

Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Up-to-date statistics
So here is the recommended RUNSTATS, and we run it. After it completes, we run the Statistics option again to be sure it was the only recommendation (and so on, until no more statistics are recommended). Then we can ask for an index recommendation.

Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Review the index DDL
INDEX VIRT: AMC_NPERIOD, AMC_CENTRO_MOD, AMC_IINCOME_M1, AMC_CENTRO_ALTA, AMC_IINCOME_M2, AMC_IINCOME_M3, AMC_IINCOME_M4, AMC_ENTIDAD
The index proposed by Index Advisor is similar to X2, the no-sort index designed manually on the previous pages; only the last two columns are swapped.
And we have an index recommendation. We can test the candidate index to see what the access path will be. The index is very similar to X2, which avoided the sort.

Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Verify / test the proposed indexes at runtime: index suggested by Index Advisor
(Accounting report, AVERAGE: ELAPSED TIME, CP CPU TIME and SUSPEND TIME for APPL (CL.1) and DB2 (CL.2); values not preserved.)
So here we have a fairly good index, with no sort and an index scan, which is comparable to the index that Jacek designed by hand.

Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Verify / test the proposed indexes at runtime: X2 index (manually designed), with the same cost estimate and similar performance
(Accounting report, AVERAGE: ELAPSED TIME, CP CPU TIME and SUSPEND TIME for APPL (CL.1) and DB2 (CL.2); values not preserved.)
For comparison, here is index X2. We can say that Index Advisor made almost as good a choice as index X2.

Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
Verify / test the proposed indexes at runtime
X1 has a higher cost estimate and uses a workfile. It performed very well because of the strong correlation between the range predicate and the IN-list predicate, which significantly reduced the number of rows.
(Accounting report, AVERAGE: ELAPSED TIME, CP CPU TIME and SUSPEND TIME for APPL (CL.1) and DB2 (CL.2); values not preserved.)
And now look at the access path and cost estimate of index X1, which was designed manually by Jacek and performed about 100% better than the ones that avoid the sort (X2). When I first saw it I asked myself: why? Why did Index Advisor not propose it in this case? The answer is not obvious unless you look closer at your data and at the correlation between columns (next foil).

Optim Query Workload Tuner for DB2: Index Advisor, single query tuning
  SELECT AMC_ENTIDAD, AMC_IINCOME_M4, AMC_CENTRO_ALTA
  FROM XXXX.BGDTAMC
  WHERE AMC_CENTRO_MOD =                 - FF=0.99
    AND AMC_IINCOME_M1 > 0               - FF=0.16  } actual FF=0.034
    AND AMC_IINCOME_M2 IN (0, 15, 16)    - FF=0.83  }
    AND AMC_NPERIOD = 6                  - FF=0.95
    AND AMC_IINCOME_M3 <                 - FF=0.84
  ORDER BY AMC_IINCOME_M1, AMC_CENTRO_ALTA;
Returned rows are reduced a lot (3.4% << 84%) due to the strong correlation between AMC_IINCOME_M1 and AMC_IINCOME_M2. 21 million out of 25 million total rows have the value 0. AMC_IINCOME_M1 > 0 in almost all cases implies that AMC_IINCOME_M2 IN (0, 15, 16) is effectively AMC_IINCOME_M2 IN (15, 16), which gives an extremely selective actual FF.
Because of the wrong estimate, the number of rows returned after the index scan (108,545) is much lower than estimated (1,312,881), so DB2 overestimates the cost of sorting for ORDER BY and chooses the index that avoids the sort. In this case, however, because of the strong correlation, adding P_M2 to the matching columns saves a lot of cost when screening index leaf pages. In the end, the accounting report shows that the 4-matching-column index performs better. Without knowing about the correlation, it would be surprising to see this index perform well. For the same data but different predicates, this index can perform far worse. For example, with AMC_IINCOME_M1 >= 0 the filtering would not be as good and the index would perform worse than X2 (check it!). The same holds if the correlation between columns is weaker (different data). One can ask why we did not detect the correlation and take it into account for costing. The answer: because there is a range predicate (AMC_IINCOME_M1 > 0), colgroup statistics are not useful even if we collect them. The optimizer cannot yet correlate range predicates (RFE opened (24009?)). So Jacek did design 2 indexes, but he verified which one performs better for
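The effect of correlation can be illustrated by comparing a combined Filter Factor computed under an independence assumption with the actual value from the slide. Multiplying individual Filter Factors is a common optimizer simplification and is used here as an assumption, as is reading the slide's actual FF of 0.034 as the combined value for the two predicates on AMC_IINCOME_M1 and AMC_IINCOME_M2.

```python
import math
from typing import Iterable


def combined_ff_independent(filter_factors: Iterable[float]) -> float:
    """Combine Filter Factors by multiplication, i.e. assuming independent predicates."""
    return math.prod(filter_factors)


# Individual FFs from the slide: AMC_IINCOME_M1 > 0 and AMC_IINCOME_M2 IN (0, 15, 16).
estimated = combined_ff_independent([0.16, 0.83])
actual = 0.034  # actual FF quoted on the slide

print(f"estimated {estimated:.4f}, actual {actual:.3f}, ratio {estimated / actual:.1f}x")
```

Under these assumptions the independence estimate is several times too high, which is the kind of overestimate that makes the sort look more expensive than it really is.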

57 57 Optim Query Workload Tuner for DB2 Index Advisor workload query tuning WHAT IF you have multiple queries? Single query tuning vs. workload tuning: Single query tuning concerns the performance of a specific query. Workload tuning focuses on the performance of all queries in the workload. An application may (and usually does) consist of a set of queries, and it is not practical to perform single query tuning for each query. Analyzing queries in isolation does not account for the effect of index changes on other queries and may result in too many indexes. An index created for a single query may create an imbalance for other queries. That was about a single query; now let's see how it looks for a workload of queries.

58 58 Optim Query Workload Tuner for DB2 Workload Index Advisor Steps to be taken: 1. Identify the workload (queries) to be tuned 2. Review index recommendations 3. Validate and compare before and after

59 59 Optim Query Workload Tuner for DB2 Index Advisor workload query tuning This is an example; we can select queries from many sources. Here I select queries from the dynamic statement cache, but we could also select them from packages, plan_table, etc.

60 60 Optim Query Workload Tuner for DB2 Index Advisor workload query tuning Estimated performance improvement. And this is what we got: a set of indexes suited to the queries we captured, with the potential / ESTIMATED benefit (change in performance). The increase in DASD space that the indexes would take is also listed.

61 61 Optim Query Workload Tuner for DB2 Index Advisor workload query tuning Comparing workload before and after changes (and after RUNSTATS for new indexes). Real performance improvement. Performance with no statistics for new indexes. Performance has improved after creating the new indexes: elapsed time reduced from 1349,23s to 395,60s, CPU time reduced from 406,55s to 191,12s. Now we created the indexes, ran the workload again and compared the results, so we can see how it performed. Remember that you also need to run the recommended RUNSTATS so that the optimizer has enough information about the indexes you created. If you do not, it will use defaults or not use the index at all, so performance can be worse. In our example workload it was worse.
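To put the before/after timings from slide 61 into relative terms, a small calculation is enough. This is a minimal sketch: the helper name is hypothetical and the only inputs are the four timings quoted on the slide (decimal commas written as points).

```python
def reduction_percent(before_s: float, after_s: float) -> float:
    """Relative reduction of a timing, as a percentage of the 'before' value."""
    return (before_s - after_s) / before_s * 100.0


# Workload timings in seconds, as quoted on the slide.
elapsed_before, elapsed_after = 1349.23, 395.60
cpu_before, cpu_after = 406.55, 191.12

elapsed_gain = reduction_percent(elapsed_before, elapsed_after)
cpu_gain = reduction_percent(cpu_before, cpu_after)

print(f"Elapsed time reduced by {elapsed_gain:.1f}%")
print(f"CPU time reduced by {cpu_gain:.1f}%")
```

The same helper can be applied to a run without RUNSTATS on the new indexes to quantify how much of the gain depends on the statistics; the slide does not give those timings, so they are left out here.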

62 62 Conclusions / Summary
Know index design algorithms and weigh their pros & cons
Know the queries that run on your tables
Balance what is more practical or economical for you: designing indexes for a multiple-query workload requires more time ($) or a tool ($)
Verify index performance at runtime

- Every DBA should know the algorithms for index design. This is, however, a manual and time-consuming task. In some cases it gives us a cheap and quick way to design an index, when the query is simple and we do not have to care about other queries.
- It is essential to know all the queries you have, so that an index designed for one query does not imbalance the other queries.
- Designing indexes for multiple queries requires more time and more work, so you need to balance what is more efficient/quicker/cheaper: designing indexes yourself (time spent / cost of DBA work) or using a tool (quicker, but it also involves the cost of the tool).
- The most important thing in this picture is not index design itself but how such an index works on your data and your query. You should always test and verify that the index you or the tool came up with works as expected, with no surprises ;-)

I wish all your indexes to be well designed, and I hope our presentation will help you with this process.
- If you have any questions, please feel free to ask now, or we are available during session breaks or via .

63 63 QUESTIONS

64 64 References: Relational Database Index Design and the Optimizers, Tapio Lahdenmäki, Michael Leach; DB2 for z/OS and OS/390 Development for Performance, Gabrielle & Associates

65 Jacek Surma PKO Bank Polski S.A. Session: B11 Creating indexes suited to your queries Michal Bialecki IBM SVL / SWG Cracow Lab michal.bialecki@pl.ibm.com 65


More information

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism Parallel DBMS Parallel Database Systems CS5225 Parallel DB 1 Uniprocessor technology has reached its limit Difficult to build machines powerful enough to meet the CPU and I/O demands of DBMS serving large

More information

DB2 V8 Neat Enhancements that can Help You. Phil Gunning September 25, 2008

DB2 V8 Neat Enhancements that can Help You. Phil Gunning September 25, 2008 DB2 V8 Neat Enhancements that can Help You Phil Gunning September 25, 2008 DB2 V8 Not NEW! General Availability March 2004 DB2 V9.1 for z/os announced March 2007 Next release in the works and well along

More information

Oracle SQL Tuning for Developers Workshop Student Guide - Volume I

Oracle SQL Tuning for Developers Workshop Student Guide - Volume I Oracle SQL Tuning for Developers Workshop Student Guide - Volume I D73549GC10 Edition 1.0 October 2012 D78799 Authors Sean Kim Dimpi Rani Sarmah Technical Contributors and Reviewers Nancy Greenberg Swarnapriya

More information

Query Processing & Optimization. CS 377: Database Systems

Query Processing & Optimization. CS 377: Database Systems Query Processing & Optimization CS 377: Database Systems Recap: File Organization & Indexing Physical level support for data retrieval File organization: ordered or sequential file to find items using

More information

Efficient Object-Relational Mapping for JAVA and J2EE Applications or the impact of J2EE on RDB. Marc Stampfli Oracle Software (Switzerland) Ltd.

Efficient Object-Relational Mapping for JAVA and J2EE Applications or the impact of J2EE on RDB. Marc Stampfli Oracle Software (Switzerland) Ltd. Efficient Object-Relational Mapping for JAVA and J2EE Applications or the impact of J2EE on RDB Marc Stampfli Oracle Software (Switzerland) Ltd. Underestimation According to customers about 20-50% percent

More information

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques 376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 16 Query optimization What happens Database is given a query Query is scanned - scanner creates a list

More information

Lecture 12. Lecture 12: Access Methods

Lecture 12. Lecture 12: Access Methods Lecture 12 Lecture 12: Access Methods Lecture 12 If you don t find it in the index, look very carefully through the entire catalog - Sears, Roebuck and Co., Consumers Guide, 1897 2 Lecture 12 > Section

More information

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9) Technology & Information Management Instructor: Michael Kremer, Ph.D. Class 6 Professional Program: Data Administration and Management MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9) AGENDA

More information

Oracle Database 11g: SQL Tuning Workshop. Student Guide

Oracle Database 11g: SQL Tuning Workshop. Student Guide Oracle Database 11g: SQL Tuning Workshop Student Guide D52163GC10 Edition 1.0 June 2008 Author Jean-François Verrier Technical Contributors and Reviewers Muriel Fry (Special thanks) Joel Goodman Harald

More information

Top 5 Issues that Cannot be Resolved by DBAs (other than missed bind variables)

Top 5 Issues that Cannot be Resolved by DBAs (other than missed bind variables) Top 5 Issues that Cannot be Resolved by DBAs (other than missed bind variables) March 12, 2013 Michael Rosenblum Dulcian, Inc. www.dulcian.com 1 of 43 Who Am I? Misha Oracle ACE Co-author of 2 books PL/SQL

More information

CS121 MIDTERM REVIEW. CS121: Relational Databases Fall 2017 Lecture 13

CS121 MIDTERM REVIEW. CS121: Relational Databases Fall 2017 Lecture 13 CS121 MIDTERM REVIEW CS121: Relational Databases Fall 2017 Lecture 13 2 Before We Start Midterm Overview 3 6 hours, multiple sittings Open book, open notes, open lecture slides No collaboration Possible

More information

Introduction to Database Systems CSE 344

Introduction to Database Systems CSE 344 Introduction to Database Systems CSE 344 Lecture 6: Basic Query Evaluation and Indexes 1 Announcements Webquiz 2 is due on Tuesday (01/21) Homework 2 is posted, due week from Monday (01/27) Today: query

More information