CS133: Databases, Spring 2017
Lec 13, 3/02
Prof. Beth Trushkowsky

Administrivia
- Lab 3 starts next week
- No problem set out this week
- Updated grutoring hours
- Take poll on Piazza for my office hours timing!

Goals for Today
- Reason about the stages of query optimization
- Understand how to estimate the cost of a full query plan
  - Pipelining vs. materialization
  - Intermediate result sizes

Cost-based Query Sub-System
[Figure: components labeled Queries, Query Parser, Query Optimizer (Plan Generator, Plan Cost Estimator), Catalog Manager (Schema, Statistics), Query Plan Evaluator]
- What plans are considered?
- How is the cost of a plan estimated?
- Ideally: find the best query plan
- Reality: avoid the worst plans!

Query Optimization Overview
- Query converted to relational algebra expression
- Relational algebra converted to tree, joins as branches
  - Operators can also be applied in different order!
- Each operator has implementation choices; choosing forms the physical plan

SELECT S. FROM R, S
WHERE R.sid=S.sid AND R. AND S.rating>5

[Figure: relational algebra expression and plan tree for the query, with projection π and selection σ (rating > 5); the left branch is the outer relation]

Query Optimizer algorithm
Goal: given a query, the optimizer wants to
- Decide which query plans to consider
- Compare plans and choose the "best" one (best = shortest time to run)

How about this algorithm?
Step 1: enumerate the space of all possible plans
Step 2: run each query plan, measure its runtime
Step 3: choose the plan that ran the fastest!

Algorithm
Step 1: consider a set of possible plans
Step 2: estimate cost for each plan
Step 3: choose the plan with lowest cost

Estimating Cost
- Don't want to execute a plan to figure out its run-time!
- Instead estimate cost of the plan
  - Use cost as a proxy for run-time
- Cost of a plan = sum of the costs for each operator in the plan
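
The three-step algorithm and the "sum of operator costs" rule can be sketched in a few lines of Python. This is only an illustration: the `Operator` class, the helper names, and all cost numbers are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """One node of a query plan tree (hypothetical structure)."""
    name: str
    est_cost: float                      # made-up per-operator estimate, in I/Os
    children: list = field(default_factory=list)


def plan_cost(op):
    """Cost of a plan = sum of the costs of each operator in the plan."""
    return op.est_cost + sum(plan_cost(child) for child in op.children)


def choose_plan(plans):
    """Steps 2 and 3: estimate the cost of each candidate, keep the cheapest."""
    return min(plans, key=plan_cost)


# Step 1: a (tiny) set of candidate plans with invented costs.
plan_a = Operator("join", 100, [Operator("scan A", 10), Operator("scan B", 20)])
plan_b = Operator("join", 40, [Operator("scan B", 20), Operator("scan A", 10)])

best = choose_plan([plan_a, plan_b])
print(best.name, plan_cost(best))
```

No plan is ever executed here; only the estimates are compared, which is the point of using cost as a proxy for run-time.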

Reasoning about operator cost
For questions below, assume:
- Each relation is 5 pages and stored as a heap file, no indexes
- Buffer pool has 4 frames
- Join algorithm is page-nested-loop-join (PNLJ)
- Order by operator uses general external merge-sort
1. (Review) What is the cost in I/Os for this plan, ignoring cost of final output?
   [Figure: plan over A and B]
2. Now what about the cost of this plan? What information are you missing?
   [Figure: plan with ORDER BY (A.foo)]

Pipelined vs. Materialized
Query plan operator's output could be generated in either materialized or pipelined fashion
- Materialized
  - Output of an operator written back to disk as a temporary file before its parent reads it in
- Pipelining ("on-the-fly")
  - Output of operator immediately given to parent as input
[Figure: plan over A and B]

Pipelining
- Parent and child operators executing concurrently
- Iterator model
  - Parent calls next() on child/children
  - (As needed) child calls next() on its child/children
- Savings compared to materialization
  - No write I/O cost for child's output
  - No read I/O cost for parent's input
- Algorithms of operators must support pipelining for this to work

Exercise 2: Pipelining
Use Page-Nested-Loop joins for the join algorithm
Some examples:
- (A join B) join C: Pipelined
- C join (A join B): Since (A join B) is the inner relation for the second join, need to materialize it
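
A minimal sketch of the iterator model, using Python generators as a stand-in for operators with a next() interface. The operator functions and the sample data are hypothetical, and the in-memory lists merely play the role of pages, so no real I/O happens.

```python
def scan(pages):
    """Leaf operator: hands out tuples one at a time from a list of 'pages'."""
    for page in pages:
        for tup in page:
            yield tup


def select(child, predicate):
    """Pulls from its child only when its own parent asks for the next tuple."""
    for tup in child:
        if predicate(tup):
            yield tup


def project(child, cols):
    """Keeps only the requested column positions of each tuple."""
    for tup in child:
        yield tuple(tup[c] for c in cols)


# Hypothetical data: tuples of (sid, rating) spread over two pages.
pages = [[(1, 7), (2, 3)], [(3, 9)]]

# Pipelined plan: project(select(scan)). Nothing is written to a temp file.
plan = project(select(scan(pages), lambda t: t[1] > 5), [0])

first = next(plan)   # the parent calls next(); each child is pulled as needed
rest = list(plan)    # drain the remaining output
print(first, rest)
```

Each tuple flows from the scan to the top of the plan before the next one is read, which is why the child's output costs no write I/O and the parent's input costs no read I/O.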

Schema for Examples
S: (sid: integer, : string, rating: integer, age: real)
R: (sid: integer, bid: integer, day: date, rname: string)
R:
- Each record is 40 bytes long
- 100 records per page
- 1000 pages
- Suppose there are 100 boats (uniformly distributed)
S:
- Each record is 50 bytes long
- 80 records per page
- 500 pages
- Suppose there are 10 ratings (uniformly distributed 1-10)

Motivating Example
SELECT S. FROM R, S WHERE R.sid=S.sid AND R. AND S.rating>5
Query Plan:
[Figure: query plan tree]
- Cost: 500 + 500*1000 I/Os
- By no means the worst plan!
- Misses several opportunities: selections could have been "pushed" earlier, no use is made of any available indexes...
- Goal of optimization: to find more efficient plans that compute the same answer.

Alternative Plans: Push SELECTs (No Indexes)
[Figure: alternative plan trees, with labels "rating > 5" and "(Scan & Write to temp T)"]
Costs of the plans shown:
- 500,500 I/Os
- 250,500 I/Os
- 250,500 I/Os
- 4010 I/Os = 500 + 1000 + 10 + (250*10)
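
The plan costs above can be checked with a little arithmetic. The sketch below assumes that S is the 500-page relation and R the 1000-page relation, that the selection on rating keeps half of S (250 pages), and that the selection on R leaves a 10-page temp file T. The breakdown of the 250,500 figure is one reading of the slide, not something it states.

```python
def pnlj_cost(outer_pages, inner_pages):
    """Page-nested-loop join: read the outer once, scan the inner per outer page."""
    return outer_pages + outer_pages * inner_pages


S_PAGES, R_PAGES = 500, 1000   # assumed mapping of the page counts to S and R

# Original plan: S as the outer relation, no selections pushed down.
baseline = pnlj_cost(S_PAGES, R_PAGES)                 # 500 + 500*1000

# Push rating > 5 onto S on the fly: scan S once, but only the surviving
# 250 pages' worth of tuples drive scans of R (assumed interpretation).
s_filtered_pages = S_PAGES // 2
pushed_rating = S_PAGES + s_filtered_pages * R_PAGES   # 500 + 250*1000

# Also scan R, write the selected tuples to a 10-page temp T, and join with T.
t_pages = 10
pushed_both = S_PAGES + R_PAGES + t_pages + s_filtered_pages * t_pages

print(baseline, pushed_rating, pushed_both)
```

The last line reproduces the slide's own formula, 500 + 1000 + 10 + (250*10).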

Exercise 3-4: Estimate I/O cost
[Figure: plan trees, with label "(Scan & Write to temp T)"]
- 6000 I/Os = 1000 + (10*500)
- 4250 I/Os = 1000 + 500 + 250 + (10*250)

Alternative Plan: Indexes
Suppose we have these indexes:
- Clustered Alt 1 hash index on bid of R
- Unclustered Alt 2 hash index on sid of S
[Figure: plan tree, with labels "(Use Alt 1 hash index on bid)" and "(Index Nested Loops on sid)"]
- Using the index on bid, we get 100,000/100 boats = 1000 records on 1000/100 = 10 pages
- Join column sid is a key for S!
- Cost: selection on R (10 I/Os); then, for each R tuple, get [one] matching S tuple (1000*1.2); total = 1210 I/Os.
- Due to the index on sid, decide not to push down rating > 5

Query Blocks: Units of Optimization
Outer block:
SELECT S. FROM S WHERE S.age IN
Nested block:
  (SELECT MAX(S2.age) FROM S2 GROUP BY S2.rating)
- An SQL query is parsed into a collection of query blocks, and these are optimized one block at a time.
- Inner blocks are usually treated as subroutines
- Computed:
  - once per query (for uncorrelated sub-queries)
  - or once per outer tuple (for correlated sub-queries)
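
The index-plan arithmetic can be replayed as follows. The constant names are made up for this sketch, the 1.2 I/Os per probe is taken directly from the slide, and the mapping of the 1000-page relation to R is an assumption.

```python
R_PAGES = 1000
R_RECORDS_PER_PAGE = 100
N_BOATS = 100
PROBE_COST = 1.2   # I/Os per hash-index lookup on sid, the figure used on the slide

r_tuples = R_PAGES * R_RECORDS_PER_PAGE          # 100,000 tuples in R
matching = r_tuples // N_BOATS                   # tuples for one boat (uniform)

# The index on bid is clustered, so matching tuples sit on contiguous pages.
selection_ios = matching // R_RECORDS_PER_PAGE   # 1000 / 100 = 10 pages

# Index nested loops: one probe of the sid index per selected R tuple.
join_ios = matching * PROBE_COST
total = selection_ios + join_ios

print(matching, selection_ios, total)
```

Because sid is a key for S, each probe returns at most one matching tuple, so the join cost grows with the 1000 selected tuples rather than with the size of S.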

The System R (aka "Selinger-style") Query Optimizer
Impact:
- Inspired most optimizers in use today
- Works well for small-medium complexity queries (< 10 joins)
Cost estimation:
- Very inexact, but works ok in practice.
- Statistics, maintained in system catalogs, used to estimate cost of operations and result sizes.
- Considers a simple combination of CPU and I/O costs.
Plan Space: Too large, must be pruned!
Pat Selinger: https://www-03.ibm.com/ibm/history/witexhibit/wit_fellows_selinger.html

Statistics and cardinality estimation
Catalogs typically contain at least:
- Number of tuples (NTuples) and pages (NPages) per relation
and for each index:
- Number of distinct key values (NKeys).
- Low/high key values (Low/High).
- Index height (Height) for each tree index.
- Index size (NPages) (e.g., leaf pages for tree).
Statistics in catalogs updated periodically.
- Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok.

Size Estimation and Reduction Factors
Consider a query block:
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
- Reduction factor (RF) associated with each term reflects the impact of the term in reducing result size
  - RF is also called "selectivity"
How to predict size of output?
- Need to know/estimate input size
- Need to know/estimate RFs
- Need to know/assume how terms are related

Result Size Estimation for Selections
Result cardinality (for conjunctive terms) = number of input tuples * product of all RFs
Assumptions:
1. Values are uniformly distributed and terms are independent!
2. In System R, stats only tracked for indexed attributes (modern systems have removed this restriction)

Term          Reduction Factor
col = value   1/NKeys(I)
col > value   (High(I) - value) / (High(I) - Low(I))

Note: in System R, if missing indexes, assume RF = 1/10
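
The reduction-factor table and the cardinality rule translate directly into code. In the sketch below the function names and the example statistics (10 distinct keys, a 0 to 100 value range, 1000 input tuples) are hypothetical; only the formulas come from the slides.

```python
def rf_equals(nkeys):
    """col = value: 1 / NKeys(I), assuming uniformly distributed values."""
    return 1 / nkeys


def rf_greater(value, low, high):
    """col > value: (High(I) - value) / (High(I) - Low(I))."""
    return (high - value) / (high - low)


DEFAULT_RF = 1 / 10   # System R fallback when no index statistics exist


def result_cardinality(ntuples, rfs):
    """Input tuples times the product of all RFs (terms assumed independent)."""
    size = ntuples
    for rf in rfs:
        size *= rf
    return size


# Hypothetical query: col_a = 3 AND col_b > 75 over 1000 input tuples.
rf_a = rf_equals(nkeys=10)                      # 1/10
rf_b = rf_greater(value=75, low=0, high=100)    # 25/100
estimate = result_cardinality(1000, [rf_a, rf_b])
print(rf_a, rf_b, estimate)
```

Multiplying the factors is only valid under the independence assumption; correlated terms would make this estimate too small.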

Exercise 5
- RF = 16/40 * 1/10 = 1/25
- Result size: 20 pages or 1600 tuples

Result Size Estimation for Joins
For equi-join of R and S, range of result sizes (in number of tuples)?
- If R and S have no join attribute values in common?
- If join attributes are a key for S?
  - And if the join attributes are also a foreign key in R?
General case: join attribute a in common, a key for neither
- Assumption: the set of distinct R.a values is contained in S.a
  - Idea: each tuple of R has a 1/NKeys(S) chance of joining with a tuple in S
  - NTuples(R) * NTuples(S) / NKeys(S)
- Reversing above assumption yields NTuples(S) * NTuples(R) / NKeys(R)
- (use smaller of two if different)
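
A small sketch of the general-case join estimate, taking the smaller of the two formulas. The function name is hypothetical. The tuple counts follow the example schema (100,000 and 40,000 tuples, assuming R is the larger relation), while NKeys(R) = 20,000 is an invented statistic.

```python
def join_size_estimate(ntuples_r, ntuples_s, nkeys_r, nkeys_s):
    """Estimate the equi-join result size in tuples, using the smaller formula."""
    # Assumes distinct R.a values are contained in S.a.
    r_in_s = ntuples_r * ntuples_s / nkeys_s
    # Reversed assumption: distinct S.a values are contained in R.a.
    s_in_r = ntuples_s * ntuples_r / nkeys_r
    return min(r_in_s, s_in_r)


# sid is a key for S, so NKeys(S) equals NTuples(S).
estimate = join_size_estimate(
    ntuples_r=100_000,
    ntuples_s=40_000,
    nkeys_r=20_000,    # invented for illustration
    nkeys_s=40_000,
)
print(estimate)
```

With a key on the S side the estimate collapses to NTuples(R), matching the key/foreign-key case in which every R tuple joins with exactly one S tuple.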