Session: E07
GALIO - DB2 index advisor, how we implemented it and
Viktor Kovačević, MSc, Hermes Softlab
6th October 2009, 14:15 (60')
Platform: UDB / LUW
OUTLINE
- Application & Database tuning
- Self-made index advising tool?! Why?
- DB2 tools
- Combinatorial problem of optimal index search
- GALIO design in detail
- DB2 functionalities implemented
APPLICATION & DATABASE TUNING
- Top-down or bottom-up approach
- Query workload execution optimization
- B+ tree indexes
APPLICATION & DATABASE TUNING
- Database design and B+ tree indexes
- Database optimizer
- Minimizing query cost
- Index advising tools
APPLICATION & DATABASE TUNING
Typical situation in software development:
[Figure: DEVELOPMENT side with the development databases; PRODUCTION side with Production 1 DB, Production 2 DB, ..., Production n DB.]
APPLICATION & DATABASE TUNING
- Improve the performance of a particular statement or workload during development, for a specific production environment.
- How to include existing tools in the continuous optimization of each new version of a specific product.
APPLICATION & DATABASE TUNING
- Improve the performance of the most frequently executed queries for a specific product release.
- Determine how to optimize the performance of a new key query in a new version, in a specific production environment.
- Find objects that are not used by a workload.
APPLICATION & DATABASE TUNING
    SELECT MAX(Optimization) FROM Development
    WHERE Title IN ('Developer', 'Database developer', 'DBA', 'Business analyst', 'Project Manager')
    UNION
    SELECT MAX(Optimization) FROM Production
    WHERE Title IN ('Business owners', 'IT staff', 'DBA')
SELF-MADE INDEX ADVISING TOOL?! WHY?
SELF-MADE INDEX ADVISING TOOL?! WHY?
- Automate the optimization process as part of the standard development process.
- Build and tune each new release of the product automatically.
- Optimize the product without a connection to the production database environment.
DB2 TOOLS
DB2 has had an Index Advisor since Version 6.1.
DB2 TOOLS
The Design Advisor analyzes a specified workload and generates recommendations that minimize the total cost of running it. It considers factors such as:
- the type of the workload statements,
- the frequency with which a particular statement occurs,
- the characteristics of your database.
DB2 TOOLS
You can import statements into the workload from several sources:
- a delimited text file
- an event monitor table
- Query Patroller historical data tables, by using the -qp option from the command line
- explained statements in the EXPLAIN_STATEMENT table
- recent SQL statements that have been captured with a DB2 snapshot
DB2 TOOLS
To run the Design Advisor on dynamic SQL statements:
1. Reset the database monitor with the following command:
   db2 reset monitor for database databasename
2. Issue the db2advis command with the -g option. If you want to save the dynamic SQL statements in the ADVISE_WORKLOAD table for later reference, use the -p option as well.
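For an automated build, the two steps can be scripted. The sketch below is an illustration, not part of GALIO: the helper names `design_advisor_commands` and `run_steps` are hypothetical, it assumes the `db2` command line processor and `db2advis` are on the PATH, and it uses only the `-d`, `-g` and `-p` options of db2advis.

```python
import shlex
import subprocess
from typing import List


def design_advisor_commands(database: str, keep_workload: bool = True) -> List[List[str]]:
    """Hypothetical helper: build the two command lines described in the steps."""
    # Step 1: reset the database monitor.
    reset = ["db2", "reset", "monitor", "for", "database", database]
    # Step 2: Design Advisor on statements from the dynamic SQL snapshot (-g).
    advise = ["db2advis", "-d", database, "-g"]
    if keep_workload:
        # Combined with -g, -p keeps the statements in the ADVISE_WORKLOAD table.
        advise.append("-p")
    return [reset, advise]


def run_steps(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands. In practice, run the reset,
    # let the workload of interest execute, then run the advisor step.
    run_steps(design_advisor_commands("SAMPLE"))
```

Keeping command construction separate from execution makes the commands easy to log or check in a build pipeline before anything touches a database.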
COMBINATORIAL PROBLEM OF OPTIMAL INDEX SEARCH
[Figure: competing suggestions for indexes on table EMP: on lastname? on salary? on combinations of lastname, hiredate and salary? And "How about updates on EMP table?"]
Combinatorial explosion
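The explosion is easy to quantify. This is a minimal sketch, assuming an index is an ordered, non-empty sequence of distinct columns of one table and an index configuration is any subset of those candidates, chosen independently per table. The optional caps `max_cols` (columns per index) and `max_indexes` (indexes per table) are illustrative parameters, not documented GALIO settings.

```python
from math import comb, perm
from typing import List, Optional


def candidate_indexes(n: int, max_cols: Optional[int] = None) -> int:
    """Ordered, non-empty column sequences on an n-column table: sum of n!/(n-i)!."""
    k = n if max_cols is None else min(n, max_cols)
    return sum(perm(n, i) for i in range(1, k + 1))


def configurations(columns_per_table: List[int],
                   max_cols: Optional[int] = None,
                   max_indexes: Optional[int] = None) -> int:
    """Number of index configurations over all tables (product of per-table counts)."""
    total = 1
    for n in columns_per_table:
        m = candidate_indexes(n, max_cols)
        if max_indexes is None:
            total *= 2 ** m  # every subset of the candidates: the power set
        else:
            # subsets with at most max_indexes indexes
            total *= sum(comb(m, i) for i in range(max_indexes + 1))
    return total


print(candidate_indexes(2), candidate_indexes(3))          # 4 15
print(configurations([2, 3]))                              # 524288
print(configurations([2, 3], max_cols=3, max_indexes=3))   # 8640
print(candidate_indexes(10))                               # 9864100
```

Even a single 10-column table already has close to ten million candidate indexes, which is why exhaustive search is impractical and a heuristic search is needed.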
Example with two tables of two and three columns:
$$T = \{T_1, T_2\},\quad C_1 = \{C_{11}, C_{12}\},\quad C_2 = \{C_{21}, C_{22}, C_{23}\},\quad N = \{n_1 = 2,\; n_2 = 3\}$$
Candidate indexes (ordered, non-empty column sequences) on each table:
$$I_1 = \{(C_{11}), (C_{12}), (C_{11}, C_{12}), (C_{12}, C_{11})\}$$
$$I_2 = \{(C_{21}), (C_{22}), (C_{23}), (C_{21}, C_{22}), (C_{21}, C_{23}), (C_{22}, C_{23}), (C_{22}, C_{21}), (C_{23}, C_{21}), (C_{23}, C_{22}), (C_{21}, C_{22}, C_{23}), (C_{21}, C_{23}, C_{22}), (C_{22}, C_{21}, C_{23}), (C_{22}, C_{23}, C_{21}), (C_{23}, C_{21}, C_{22}), (C_{23}, C_{22}, C_{21})\}$$
$$|I_1| = \sum_{i=1}^{2} \frac{n_1!}{(n_1 - i)!} = \frac{2!}{1!} + \frac{2!}{0!} = 2 + 2 = 4$$
$$|I_2| = \sum_{i=1}^{3} \frac{n_2!}{(n_2 - i)!} = \frac{3!}{2!} + \frac{3!}{1!} + \frac{3!}{0!} = 3 + 6 + 6 = 15$$
An index configuration of a table is any subset of its candidate indexes, i.e. an element of the power set ρ(I_j):
$$\rho(I_1) = \{\{\}, \{(C_{11})\}, \{(C_{12})\}, \ldots, \{(C_{11}), (C_{12}), (C_{11}, C_{12}), (C_{12}, C_{11})\}\},\quad |\rho(I_1)| = 2^{4} = 16$$
$$\rho(I_2) = \{\{\}, \{(C_{21})\}, \{(C_{22})\}, \{(C_{23})\}, \ldots, \{(C_{21}), (C_{22}), \ldots, (C_{23}, C_{22}, C_{21})\}\},\quad |\rho(I_2)| = 2^{15} = 32768$$
The unconstrained search space for the whole database:
$$|\Omega| = |\rho(I_1)| \cdot |\rho(I_2)| = 16 \cdot 32768 = 524288$$
With ΔC = 3 (at most three columns per index, so I_1^{ΔC} = I_1 and I_2^{ΔC} = I_2) and ΔT = 3 (at most three indexes per table):
$$|\rho(I_1)^{\Delta C, \Delta T}| = \sum_{i=0}^{\Delta T} \binom{4}{i} = 1 + 4 + 6 + 4 = 15$$
$$|\rho(I_2)^{\Delta C, \Delta T}| = \sum_{i=0}^{\Delta T} \binom{15}{i} = 1 + 15 + 105 + 455 = 576$$
$$|\Omega^{\Delta C, \Delta T}| = |\rho(I_1)^{\Delta C, \Delta T}| \cdot |\rho(I_2)^{\Delta C, \Delta T}| = 15 \cdot 576 = 8640$$

GALIO DESIGN IN DETAIL
Genetic algorithms are a search technique that finds suboptimal (approximate) solutions to optimization problems. They are a class of evolutionary algorithms, which use techniques and approaches based on the principles of biological evolution, such as inheritance, mutation, selection and crossover (recombination). Genetic algorithms are implemented as a computer simulation in which a population of candidate solutions to the given problem (individuals or phenotypes) evolves toward the optimal solution. The representations, or encodings, of solutions in a genetic algorithm are called chromosomes, genotypes or genomes. Traditionally, solutions are represented as binary strings, but other representations tailored to the specific problem are also possible, provided that the encoding and the corresponding evolutionary operators (mutation, crossover) are compatible.
In the master's thesis we developed a GA for the index selection problem in which solutions are represented in two-dimensional form, as matrices. Each secondary index configuration is represented by a matrix whose columns correspond to the table columns of the entity-relationship model, ordered lexicographically and grouped by table. The rows of the matrix correspond to the individual secondary indexes in the configuration: each matrix element is the ordinal position of the corresponding column in that secondary index, or an empty value if the column is not part of the index. In addition, each matrix has a base probability row containing, for each table column, the probability of selecting it as a building block of a secondary index; this probability is based on the number of distinct data values in the column. That number is taken from the table data statistics kept in the database's system statistics catalog.
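GALIO encodes an index configuration as a matrix with one row per secondary index and one column per table column, plus a base probability row. The sketch below illustrates that encoding under stated assumptions; it is not GALIO's code. The schema, the helper names and the distinct-value counts are hypothetical, and because the slides do not say exactly how distinct-value counts become probabilities, the sketch assumes probabilities proportional to each column's count, normalized within its table (in DB2 the per-column distinct count is kept in the COLCARD column of SYSCAT.COLUMNS).

```python
import random
from typing import Dict, List, Optional, Tuple

Column = Tuple[str, str]      # (table, column)
Row = List[Optional[int]]     # one secondary index: key position per matrix column, or None


def matrix_columns(schema: Dict[str, List[str]]) -> List[Column]:
    """Matrix columns: table columns ordered lexicographically and grouped by table."""
    return [(t, c) for t in sorted(schema) for c in sorted(schema[t])]


def base_probability_row(cols: List[Column], colcard: Dict[Column, int]) -> List[float]:
    """ASSUMPTION: probability proportional to distinct values, normalized per table."""
    per_table: Dict[str, int] = {}
    for col in cols:
        per_table[col[0]] = per_table.get(col[0], 0) + colcard[col]
    return [colcard[col] / per_table[col[0]] for col in cols]


def decode(cols: List[Column], row: Row) -> Tuple[str, List[str]]:
    """Turn one matrix row into (table, key columns in index order)."""
    used = sorted((pos, cols[j]) for j, pos in enumerate(row) if pos is not None)
    if not used:
        raise ValueError("empty index row")
    return used[0][1][0], [name for _, (_, name) in used]


def random_index(cols: List[Column], probs: List[float], table: str,
                 width: int, rng: random.Random) -> Row:
    """Draw `width` distinct columns of `table` (width <= its column count),
    weighted by the base probability row."""
    candidates = [j for j, (t, _) in enumerate(cols) if t == table]
    row: Row = [None] * len(cols)
    for pos in range(1, width + 1):
        j = rng.choices(candidates, weights=[probs[k] for k in candidates])[0]
        candidates.remove(j)
        row[j] = pos
    return row


schema = {"EMP": ["SALARY", "LASTNAME", "HIREDATE"], "DEPT": ["DEPTNO", "DEPTNAME"]}
cols = matrix_columns(schema)
colcard = {("DEPT", "DEPTNAME"): 10, ("DEPT", "DEPTNO"): 10,
           ("EMP", "HIREDATE"): 300, ("EMP", "LASTNAME"): 600, ("EMP", "SALARY"): 100}
probs = base_probability_row(cols, colcard)
config = [[None, None, None, 1, 2],     # EMP (LASTNAME, SALARY)
          [None, 1, None, None, None]]  # DEPT (DEPTNO)
print([decode(cols, row) for row in config])
```

Weighting the column draw by distinct values biases new indexes toward selective columns, which is the intent the base probability row expresses.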
GALIO DESIGN IN DETAIL
$$COST_{\Omega} = \sum_{q_i \in Q} N_i \, COST^{\Omega}_i(q_i) + \sum_{I_i \in SI} INDEXSTAT(I_i)$$
COST^Ω_i(q_i) is the cost estimate given by the database optimizer, calculated through the query explain (access plan) mechanism.
SI' ⊆ SI is the new index configuration that contains only the usable indexes.
$$COST_{\Omega}^{BEST} = \min_{\Omega^{\Delta C, \Delta T, DATASTAT}} \left( F_1 \sum_{q_i \in Q} N_i \, COST^{\Omega}_i(q_i) + F_2 \sum_{I_i \in SI} INDEXSTAT(I_i) \right)$$
Optimization goal: the minimum total query execution cost with the minimal number of indexes.
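To make the weighted objective concrete, here is a minimal sketch that evaluates it for a few candidate configurations and picks the cheapest. Everything in it is hypothetical: N_i is taken to be the execution count of query q_i, the per-query costs stand in for optimizer estimates read from the Explain tables, and INDEXSTAT is treated as a single per-index penalty number because the slide does not define it further.

```python
from typing import Dict, List, Tuple

# (optimizer cost per query, INDEXSTAT value per index): hypothetical shape
Config = Tuple[Dict[str, float], List[float]]


def total_cost(freq: Dict[str, int], config: Config,
               f1: float = 1.0, f2: float = 1.0) -> float:
    """F1 * sum(N_i * COST_i(q_i)) + F2 * sum(INDEXSTAT(I_i))."""
    query_cost, index_stats = config
    return f1 * sum(n * query_cost[q] for q, n in freq.items()) + f2 * sum(index_stats)


def best_configuration(freq: Dict[str, int], configs: Dict[str, Config],
                       f1: float = 1.0, f2: float = 1.0) -> str:
    """Name of the configuration with the minimal weighted cost."""
    return min(configs, key=lambda name: total_cost(freq, configs[name], f1, f2))


# Hypothetical numbers for illustration only.
freq = {"q1": 100, "q2": 10}
configs = {
    "no indexes": ({"q1": 50.0, "q2": 200.0}, []),
    "one index": ({"q1": 5.0, "q2": 200.0}, [300.0]),
    "two indexes": ({"q1": 5.0, "q2": 20.0}, [300.0, 1500.0]),
}
print(best_configuration(freq, configs))          # two indexes
print(best_configuration(freq, configs, f2=2.0))  # one index
```

Raising F2 penalizes every additional index more heavily, so the same workload can end up with a smaller configuration; this is how the two weights trade query cost against the number of indexes.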
GALIO DESIGN IN DETAIL
[Figure: GALIO architecture. Components: query workload; table data statistics; database table model (entity-relationship model); estimated query access paths (index usability, costs); genetic algorithm; column probabilities adaptation module; new index candidates.]
GALIO DESIGN IN DETAIL
- Deleting an index
- Adding a new index
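On a matrix encoding of an index configuration, these two changes act on whole rows. This is a minimal sketch, assuming each row holds the 1-based key position of every matrix column in the index (None when the column is not part of it); the function names are hypothetical, and the slides do not say how GALIO chooses which index to drop or which new index to add.

```python
import random
from typing import List, Optional

Row = List[Optional[int]]   # key position per matrix column, None if unused
Matrix = List[Row]          # one row per secondary index


def delete_index(matrix: Matrix, rng: random.Random) -> Matrix:
    """Return a copy of the configuration with one randomly chosen index removed."""
    if not matrix:
        return []
    victim = rng.randrange(len(matrix))
    return [row[:] for i, row in enumerate(matrix) if i != victim]


def add_index(matrix: Matrix, new_row: Row) -> Matrix:
    """Return a copy with new_row appended, unless the same index already exists."""
    result = [row[:] for row in matrix]
    if new_row not in result:
        result.append(new_row[:])
    return result


config = [[1, None, 2], [None, 1, None]]
print(delete_index(config, random.Random(0)))
print(add_index(config, [None, None, 1]))
```

Both operators return new matrices instead of modifying their input, so a parent configuration stays intact while its mutated offspring are evaluated.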
GALIO DESIGN IN DETAIL
- Deleting a column
- Swapping two columns
- Adding a new column
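The column-level changes act inside a single index row. Again this is a sketch under an assumed encoding (1-based key positions, None for unused columns); the helper names are hypothetical, and the caller is assumed to add only columns that belong to the same table as the index.

```python
from typing import List, Optional

Row = List[Optional[int]]  # key position (1-based) per matrix column, None if unused


def delete_column(row: Row, j: int) -> Row:
    """Remove matrix column j from the index and close the gap in the key order."""
    removed = row[j]
    if removed is None:
        return row[:]
    return [None if k == j else (p - 1 if p is not None and p > removed else p)
            for k, p in enumerate(row)]


def swap_columns(row: Row, a: int, b: int) -> Row:
    """Exchange the key positions of two columns that are both in the index."""
    if row[a] is None or row[b] is None:
        raise ValueError("both columns must belong to the index")
    result = row[:]
    result[a], result[b] = result[b], result[a]
    return result


def add_column(row: Row, j: int) -> Row:
    """Append column j as the new last key column (caller checks it is the same table)."""
    if row[j] is not None:
        return row[:]
    result = row[:]
    result[j] = max((p for p in row if p is not None), default=0) + 1
    return result


index = [2, None, 1, 3]   # key order: column 2, column 0, column 3
print(delete_column(index, 2), swap_columns(index, 0, 2), add_column(index, 1))
```

Renumbering on deletion keeps every row a valid key order (positions 1..k with no gaps), so decoding a row back into an index definition never has to handle holes.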
GALIO DESIGN IN DETAIL
- Importing the query workload from various sources (from the database or from specific application logs).
- Evaluating the cost of existing queries under different optimizer settings.
- Exporting index configuration DDL.
- Creating the recommended indexes.
- Comparing different index configurations.
- ...
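Exporting a configuration as DDL comes down to turning each index definition into a CREATE INDEX statement. This is a minimal sketch; the schema name `HR` and the `GALIO_IX` naming scheme are hypothetical, and real code would also need to quote identifiers that are not plain upper-case names.

```python
from typing import List, Tuple

IndexDef = Tuple[str, List[str]]  # (table, key columns in index order)


def create_index_ddl(schema: str, table: str, columns: List[str], name: str) -> str:
    """One DB2 CREATE INDEX statement for a table and its ordered key columns."""
    return f"CREATE INDEX {schema}.{name} ON {schema}.{table} ({', '.join(columns)});"


def export_configuration(config: List[IndexDef], schema: str,
                         prefix: str = "GALIO_IX") -> List[str]:
    """DDL for every index of a configuration, named prefix1, prefix2, ..."""
    return [create_index_ddl(schema, table, cols, f"{prefix}{n}")
            for n, (table, cols) in enumerate(config, start=1)]


for stmt in export_configuration([("EMP", ["LASTNAME", "SALARY"]), ("DEPT", ["DEPTNO"])], "HR"):
    print(stmt)
```

The generated script can be reviewed, versioned with the product release, or executed with `db2 -tf` to create the recommended indexes.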
DB2 FUNCTIONALITIES IMPLEMENTED
db2look
    db2look -d SAMPLE -a -e -m -l -x -f -o db2look.sql
writes to the file db2look.sql:
- the DDL for all objects in database SAMPLE;
- UPDATE statements to replicate the statistics on all tables and indexes;
- GRANT (authorization) statements;
- UPDATE statements for optimizer-related database and database manager configuration parameters;
- db2set statements for optimizer-related registry variables;
- the DDL for all user-defined database partition groups, buffer pools and table spaces in database SAMPLE.
DB2 FUNCTIONALITIES IMPLEMENTED
EXPLAIN PLAN
    EXPLAIN PLAN SELECTION SET QUERYNO = 13 SET QUERYTAG = 'TEST13' FOR SELECT C1 FROM T1
DB2 FUNCTIONALITIES IMPLEMENTED
EXPLAIN PLAN
The Explain tables capture access plans when the Explain facility is activated. The Explain tables must be created before Explain can be invoked:
    db2 -tf EXPLAIN.DDL
Related tables: EXPLAIN_ARGUMENT, EXPLAIN_OBJECT, EXPLAIN_OPERATOR, EXPLAIN_PREDICATE, EXPLAIN_STREAM, EXPLAIN_INSTANCE, EXPLAIN_STATEMENT
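After an EXPLAIN PLAN statement such as the one tagged QUERYNO = 13 and QUERYTAG = 'TEST13', the optimizer's cost estimate can be read back from EXPLAIN_STATEMENT. This sketch uses the Python DB-API. `explained_cost` is a hypothetical helper; the columns QUERYNO, QUERYTAG, EXPLAIN_LEVEL ('P' for the access plan row), EXPLAIN_TIME and TOTAL_COST follow the DB2 LUW Explain tables and should be checked against your version. The demo uses an in-memory SQLite stand-in for the table instead of a DB2 connection; with DB2 you would pass a DB-API connection from a driver such as ibm_db_dbi.

```python
import sqlite3
from typing import Optional

# Most recent plan-level ('P') estimate for one explained statement.
COST_SQL = (
    "SELECT TOTAL_COST FROM EXPLAIN_STATEMENT "
    "WHERE QUERYNO = ? AND QUERYTAG = ? AND EXPLAIN_LEVEL = 'P' "
    "ORDER BY EXPLAIN_TIME DESC"
)


def explained_cost(conn, queryno: int, querytag: str) -> Optional[float]:
    """Return TOTAL_COST of the most recent explained plan, or None if absent."""
    cur = conn.cursor()
    cur.execute(COST_SQL, (queryno, querytag))
    row = cur.fetchone()
    cur.close()
    return None if row is None else float(row[0])


# Demo only: a tiny SQLite stand-in with just the columns the query needs.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE EXPLAIN_STATEMENT (QUERYNO INTEGER, QUERYTAG TEXT, "
             "EXPLAIN_LEVEL TEXT, EXPLAIN_TIME TEXT, TOTAL_COST REAL)")
demo.executemany("INSERT INTO EXPLAIN_STATEMENT VALUES (?, ?, ?, ?, ?)", [
    (13, "TEST13", "O", "2000-01-01-00.00.00", 0.0),
    (13, "TEST13", "P", "2000-01-01-00.00.00", 25.5),
])
print(explained_cost(demo, 13, "TEST13"))
```

Tagging each workload statement with its own QUERYNO/QUERYTAG makes it straightforward to map optimizer costs back to individual queries when comparing index configurations.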
Session: GALIO - DB2 index advisor, how we implemented it and
Viktor Kovačević, Hermes Softlab
viktor.kovacevic@hermes-softlab.com