Scalable Exploration of Physical Database Design

Size: px
Start display at page:

Download "Scalable Exploration of Physical Database Design"

Transcription

1 Scalable Exploratio of Physical Database Desig Ard Christia Köig Microsoft Research Shubha U. abar Staford Uiversity Abstract Physical database desig is critical to the performace of a large-scale DBMS. The correspodig automated desig tuig tools eed to select the best physical desig from a large set of cadidate desigs quickly. However, for large workloads, evaluatig the cost of each query i the workload for every cadidate does ot scale. To overcome this, we preset a ovel compariso primitive that oly evaluates a fractio of the workload ad provides a accurate estimate of the likelihood of selectig correctly. We show how to use this primitive to costruct accurate ad scalable selectio procedures. Furthermore, we address the issue of esurig that the estimates are coservative, eve for highly skewed cost distributios. The proposed techiques are evaluated through a prototype implemetatio iside a commercial physical desig tool. 1 Itroductio The performace of applicatios ruig agaist eterprise database systems depeds crucially o the physical database desig chose. To eable the exploratio of potetial desigs, today s commercial database systems have icorporated APIs that allow What-if aalysis [8]. These take as iput a query Q ad a database cofiguratio C, ad retur the optimizer-estimated cost of executig Q if cofiguratio C were preset. This iterface is the key to buildig tools for exploratory aalysis as well as automated recommedatio of physical database desigs [13, 20, 1, 10]. The problem of physical desig tuig is the defied as follows: a physical desig tool receives a represetative query workload WL ad costraits o the cofiguratio space to explore as iput, ad outputs a cofiguratio i which executig WL has the least possible cost as measured by the optimizer cost model) 1. To determie the best cofiguratio, a umber of cadidate cofiguratios are eumerated ad the evaluated usig What-if aalysis. The represetative workload is typically obtaied by tracig the queries that execute agaist a productio system usig tools such as IBM Query Patroler, SQL Server Profiler, or ADDM [10]) over a represetative period of time. 1 ote that queries iclude both update ad select statemets. Thus this formulatio models the trade-off betwee the improved performace of select-queries ad the maiteace costs of additioal idexes ad views. Sice a large umber of SQL statemets may execute i this time, the straightforward approach of comparig cofiguratios by repeatedly ivokig the query optimizer for each query i WL with every cofiguratio is ofte ot tractable [5, 20]. Our experiece with a commercial physical desig tool shows that a large part of the tool s overhead arises from repeated optimizer calls to evaluate large umbers of cofiguratio/query combiatios. Most of the research o such tools [13, 20, 1, 10, 7, 8] has focused o miimizig this overhead by evaluatig oly a few carefully chose cofiguratios. Our approach is orthogoal to this work ad focuses o reducig the umber of queries for which the optimizer calls are issued. Because the topic of which cofiguratios to eumerate has bee studied extesively, we will ot commet further o the cofiguratiospace. Curret commercial tools address the issue of large workloads by compressig them up-frot, i.e. iitially selectig a subset of queries ad the tuig oly this smaller set [5, 20]. These approaches do ot offer ay guaratees o how the compressio affects the likelihood of choosig the right cofiguratio. However, such guaratees are importat due to the sigificat overhead of chagig the physical desig ad the performace impact of bad database desig. The problem we study i this paper is that of efficietly comparig two or more) cofiguratios for large workloads. Solvig this problem is crucial for scalability of automated physical desig tuig tools [20, 1, 10]. We provide probabilistic guaratees o the likelihood of correctly choosig the best cofiguratio for a large workload from a give set of cadidate cofiguratios. Our approach is to sample from the workload ad use statistical iferece techiques to compute the probability of selectig correctly. The resultig probabilistic compariso primitive ca be used for a) fast iteractive exploratory aalysis of the cofiguratio space, allowig the DB admiistrator to quickly fid promisig cadidates for full evaluatio, or b) as the core compariso primitive iside a automated physical desig tool, providig both scalability ad locally good decisios with probabilistic guaratees o the accuracy of each compariso. Depedig o the search strategy used, the latter ca be exteded to guaratees o the quality of the fial result. The key challeges to this probabilistic approach are

2 twofold. First, the accuracy of the estimatio depeds critically o the variace of the estimator we use. The challege is thus to pick a estimator with as little variace as possible. Secod, samplig techiques rely o a) the applicability of the Cetral Limit Theorem CLT) [9] to derive cofidece statemets for the estimates ad b) the sample variace beig a good estimate of the true variace of the uderlyig distributio. Ufortuately, both assumptios may ot be valid i this sceario. Thus, there is a eed for techiques to determie the applicability of the CLT for the give workload ad set of cofiguratios. 1.1 Our Cotributios We propose a ew probabilistic compariso primitive that give as iput a workload WL, a set of cofiguratios C ad a target probability α outputs the cofiguratio with the lowest optimizer-estimated cost of executig WL with probability at or above the target probability α. It works by icremetally samplig queries from the origial workload ad computig the probability of selectig the best cofiguratio with each ew sample, stoppig oce the target probability is reached. Our work makes the followig saliet cotributios: We derive probabilistic guaratees o the likelihood of selectig the best cofiguratio Sectio 4). We propose a modified samplig scheme that sigificatly reduces estimator variace by leveragig the fact that query costs exhibit some stability across cofiguratios Sectio 4.2). We show how to reduce the estimator variace further through a stratified samplig scheme that leverages commoality betwee queries Sectio 5). Fially, we describe a ovel techique to address the problem of highly skewed distributios i which the sample may ot be represetative of the overall distributio ad/or the CLT may ot apply for a give sample size Sectio 6). The remaider of the paper is orgaized as follows: I Sectio 2 we review related work. I Sectio 3 we give a formal descriptio of the problem ad itroduce the ecessary otatio. I Sectio 4 we describe two samplig schemes that ca be used to estimate the probability of selectig the correct cofiguratio. I Sectio 5 we the show how to reduce variaces through the use of stratificatio ad combie all parts ito a efficiet algorithm. I Sectio 6 we describe how to validate the assumptios o which the probabilistic guaratees described earlier are based. We evaluate our techiques experimetally i Sectio 7. 2 Related Work The techiques developed i this paper are related to the field of statistical selectio ad rakig [15] which is cocered with the probabilistic rakig of systems i experimetal setups based o a series of measuremets from each system. However, statistical rakig techiques are typically aimed at comparig systems for which the idividual measuremets are distributed accordig to a ormal distributio. This is clearly ot the case i our sceario. To icorporate o-ormal distributios ito statistical selectio, techiques like batchig e.g. [17]) have bee suggested, that iitially geerate a large umber of measuremets, ad the trasform this raw data ito batch meas. The batch sizes are chose to be large eough so that the idividual batch meas are approximately idepedet ad ormally distributed. However, because procedures of this type eed to produce a umber of ormally distributed estimates per cofiguratio, they require a large umber of iitial measuremets accordig to [15], batch sizes of over 1000 measuremets are commo), thereby ullifyig the efficiecy gai due to samplig for our sceario. Workload compressio techiques such as [5, 20] compute a compact represetatio of a large workload before submittig it to a physical desig tool. Both of these approaches are heuristics i the sese that they have o meas of assessig the impact of the compressio o cofiguratio selectio or cosequetly o physical desig tuig itself. [5] poses workload compressio as a clusterig problem, usig a distace fuctio that models the maximum differece i cost betwee two queries for arbitrary cofiguratios. This distace fuctio does ot use the optimizer s cost estimate, so it is ot clear how well this approximatio holds for complex queries. [20] compresses a workload by selectig queries i order of their curret costs, util a user-specifiable percetage X of the total workload cost has bee reached. While computatioally simple, this approach may lead to a sigificat reductio i tuig quality for workloads i which queries of some templates are geerally more expesive tha the remaiig queries, as the oly some templates are cosidered for tuig. Makig X user-specifiable does ot alleviate this problem, for the user has o way of assessig the impact of X o tuig quality. The most serious drawback of both approaches is that they do ot adapt the umber of queries retaied to the space of cofiguratios cosidered. Cosequetly, they may result i compressio that is either too coservative, resultig i excessive umbers of optimizer calls or too coarse, resultig i iferior physical desigs. 3 Problem Statemet I this sectio we give a formal defiitio of the problem addressed i this paper. This defiitio is idetical to the iformal oe give i Sectio 1.1, but i additio to the parameters C, WL ad the target probability α, we itroduce 2

3 a additioal parameter δ, which describes the miimum differece i cost betwee cofiguratios that we care to detect. Specifyig a value of δ > 0 helps to avoid scearios i which the algorithm samples a large fractio of the workload whe comparig cofiguratios of early) idetical costs. While accuracy for such cofiguratios is ecessary i some cases, detectig large differeces is all that matters i others. For istace, whe comparig a ew cadidate cofiguratio to the curret oe, the overhead of chagig the physical database desig is justified oly whe the ew cofiguratio is sigificatly better. 3.1 otatio I this paper, we use Costq, C) to deote the optimizerestimated cost of executig a query q i a cofiguratio C. Similarly, the total estimated cost of executig a set of queries {q 1,..., q } i cofiguratio C is Cost{q 1,..., q }, C) := i=1 Costq i, C); whe the set of queries icludes the etire workload, this will be referred to as the cost of cofiguratio C. I the iterest of simple otatio, we will use the simplifyig assumptio that the overhead of makig a sigle optimizer call is costat across queries 2. I the remaider of the paper we use the term cost to describe optimizer-estimated query costs. We use the phrase to sample a query to deote the process of obtaiig the query s text from a workload table or file, ad evaluatig its cost usig the query optimizer. The cofiguratio beig used to evaluate the query will be clear from cotext. The probability of a evet A will be deoted by P ra), ad the coditioal probability of evet A give evet B by P ra B). 3.2 The Cofiguratio Selectio Problem Problem Formulatio: Give a set of physical database cofiguratios C = {C 1,..., C k } ad a workload WL = {q 1,..., q }, a target probability α ad a sesitivity parameter δ, select a cofiguratio C i such that the probability of a correct selectio, P rcs), is larger tha α, where P rcs) := P rcostwl, C i ) < CostWL, C j ) + δ, j i), 1) while makig a miimal umber of optimizer-calls. Our approach to solvig the cofiguratio selectio problem is to repeatedly sample queries from WL, evaluate P rcs), ad termiate oce the target probability α has bee reached. ext, we describe how to estimate P rcs) Sectio 4) ad how to use these estimates ad stratificatio of the workload to costruct a algorithm for the cofiguratio selectio problem Sectio 5). 2 We discuss how to icorporate differeces i optimizatio costs betwee queries i Sectio Samplig Techiques I this sectio we preset two samplig schemes to derive P rcs) estimates first we describe a straightforward approach called Samplig Sectio 4.1); the we describe Delta Samplig, which exploits properties of query costs to come up with a more accurate estimator Sectio 4.2). 4.1 Samplig Samplig is the base samplig scheme that we use to estimate the differeces i costs of pairs of cofiguratios. First, we defie a ubiased estimator X i of CostWL, C i ). The estimator is obtaied by samplig a set of queries SL i from WL ad is calculated as X i = SL i q SL i Costq, C i ), i.e., X i is the mea of SL i radom variables, scaled up by the total umber of queries i the workload. The variace of the uderlyig cost-distributio is σi 2 l=1 = Costq l,c i ) CostWL,Ci) ) 2. ow, whe usig a simple radom sample of size to estimate X i, the variace of this estimate is V arx i ) := 2 σ 2 i ) 1 1 [9]. I the followig, we will use the shorthad Si 2 := σi 2 / 1). To choose the better of two cofiguratios C l ad C j we ow use the radom variable X l X j, which is a ubiased estimator of the true differece i costs µ l,j := CostWL, C l ) CostWL, C j ). Uder large-sample assumptios, the stadardized radom variable l,j := S 2 l SL l X l X j ) µ l,j 1 SL l )+ S 2 j SL j 1 SL j ) ) 2) is ormally distributed with mea 0 ad variace 1 accordig to the CLT). Based o this we ca assess the probability of makig a correct selectio whe choosig betwee C l ad C j, which we write as P rcs l,j ). The decisio procedure to choose betwee the two cofiguratios is to pick the cofiguratio C l, for which the estimated cost X l is the smallest. I this case the probability of makig a icorrect selectio correspods to the evet X l X j < 0 ad µ l,j > δ. Thus, P rcs l,j ) = 1 P rx l X j < 0 µ l,j > δ) = P rx l X j 0 µ l,j > δ) P rx l X j 0 µ l,j = δ) = δ P r l,j > S 2 l SL l 1 SL l ) + S 2 j SL j 1 SL ) j ). 3

4 Give that l,j 0, 1), we ca compute this probability usig the appropriate lookup-tables. Because the true variaces of the cost distributios are ot kow, we use the sample variaces si 2 = q SL Costq,C q SL Costq,C i ) i i ) ) 2 i SL i SL i 1, which are ubiased estimators of the true variaces, i their place i P rcs l,j ). Cosequetly, accurate estimatio of P rcs l,j ) depeds o a) havig sampled sufficietly may values for the Cetral Limit Theorem CLT) [9] to be applicable ad b) beig able to estimate σi 2 accurately by si 2. It is ot possible to verify these coditios based o a sample aloe, as a sigle very large outlier value may domiate both the variace ad the skew of the cost distributio. We will describe how to use additioal domai kowledge o the cost distributios to verify a) ad b) i Sectio 6. I practice, a stadard rule-of-thumb is to sample mi := 30 queries before P rcs) is computed based o the ormality of l,j. While this i may cases results i the CLT beig applicable, the depedece o the sample-variaces si 2 still meas that P rcs) may be either over- or uder-estimated. Whe the umber of cofiguratios k is greater tha two, we derive the probability of correct selectio P rcs) based o the pairwise selectio probabilities P rcs l,j ). Oce agai, the selectio procedure here is to choose the cofiguratio, C i, with the smallest estimate, X i. Cosider the case where C i is chose as the best cofiguratio from C 1,..., C k. C i would be a icorrect choice, if for ay j i, X i X j < 0 but µ i,j > δ. Thus we ca derive a trivial boud for P rcs) usig the Boferroi iequality: P rcs) 1 j {1,...,k},j i 1 P rcsi,j ) ). 3) 4.2 Delta Samplig Delta-Samplig is a variatio of Samplig, that leverages the stability of the rakigs of query costs across cofiguratios to obtai sigificat reductio i the variace of the resultig estimator. I Delta Samplig, we pick oly a sigle set of queries SL WL from the workload ad the estimate the differece i total cost betwee two cofiguratios C l, C j as X l,j = SL q SL Costq, C l) Costq, C j )). The selectio procedure here is to pick the cofiguratio C l for which X l,j < 0. To give bouds for the probability of correct selectio, we defie l,j := X l,j µ l,j S l,j 2 1 SL SL ) 4) where Sl,j 2 := 1 σ l,j 2 ad σ l,j 2 is the variace of the distributio {Costq i, C l ) Costq i, C j ) i = 1,..., } of the cost-differeces betwee C l ad C j. Agai l,j 0, 1) ad we ca derive P RCS l,j ) as i the case of Samplig. The lower boud o P RCS) from equatio 3 i the case of multiple cofiguratios applies here as well. To explai why Delta Samplig typically outperforms samplig, compare equatio 4 to equatio 2. The differece i estimator-variace betwee Delta Samplig ad Samplig depeds o the differece betwee the terms Sl,j 2 / SL i Equatio 4) ad Sl 2/ SL l +Sj 2/ SL j i Equatio 2), i.e. o the weighted differece betwee the respective variaces. It is kow that σl,j 2 = σ l 2 +σj 2 2 Cov l,j [3], where Cov l,j is the covariace of the distributios of the query costs of the workload whe evaluated i cofiguratios C l ad C j. Delta Samplig exploits this as follows. For large workloads it typically holds that whe rakig the queries by their costs i C l ad C j the raks for each idividual query do ot vary too much betwee cofiguratios especially cofiguratios of similar costs which correspod to difficult selectio problems). This meas that i most cases if the cost of a query q i, whe evaluated i C l is higher tha the average cost of all queries i C l, its cost whe evaluated i C j is likely to be higher tha that of the average query i C j ad vice versa for below-average costs). For example, multi-joi queries will be typically more expesive tha sigle-value lookups, o matter what the physical desig. ow, the covariace Cov l,j is defied as i=1 Costq i,c l ) CostWL,C l) )Costq CostWL,Cj) ) ) i,c j). All queries q i for which the above property holds, will add a positive term to the summatio i the above formula as the sigs of both factors will be equal). Cosequetly, if this property holds for sufficietly may queries, the covariace Cov l,j itself will be positive, thereby makig σl,j 2 < σ l 2 + σj 2. So, if the umber of optimizer calls are roughly equally distributed betwee cofiguratios i Samplig, Delta-Samplig should result i tighter estimates of P rcs). We verify this experimetally i Sectio 7. 5 Workload It is possible to further reduce the estimator variaces usig stratified samplig [9]. Stratified samplig is a geeralizatio of uiform samplig, i which the workload is partitioed ito L disjuct strata WL 1... WL L = WL. Strata that exhibit high variaces cotribute more samples. We itroduce stratificatio by partitioig the WL ito icreasigly fie strata as the samplig progresses. The resultig algorithm for the cofiguratio selectio problem is show i Algorithm 1. The algorithm samples from WL util P rcs) is greater tha α. After each sample, we check if further stratificatio is expected to lower estimator variace; if so, we implemet it. We describe this i detail i Sectio 5.1. Oce the workload is stratified, we also have 4

5 to choose which stratum to pick the ext sample from. We describe this i detail i Sectio 5.2. A further heuristic for large k) is to stop samplig for cofiguratios that are clearly ot optimal. If C l is the chose cofiguratio i a iteratio ad 1 P rcs l,j ) is egligible for some C j, the C j is dropped from future iteratios. This way, may cofiguratios are elimiated quickly, ad oly the oes cotributig the most ucertaity remai i the pool. We study the effects of this optimizatio i Sectio 7. 1: // Iput: Workload WL, Cofiguratios C 1,..., C k, 2: // Probability α, Sesitivity δ 3: // Output: Cofiguratio C best, Probability P rcs) 4: For each C l, pick pilot-sample SL l, SL l = mi. 5: // ote: for Delta-Samplig SL =... = SL k 6: repeat 7: if further beeficial the 8: Create ew strata; sample additioal queries so that every stratum cotais mi queries Sectio 5.1) 9: ed if 10: Select ext query q ad i case of Samplig cofiguratio C i to evaluatesectio 5.2) 11: Evaluate i,j for all 1 i < j k Sectios 4.1, 4.2) 12: Select C best from C s.t. X best X j < 0 j best for -samplig or X best,j < 0 j best for Delta-samplig Sectios 4.1, 4.2) 13: util P rcs) > α 14: retur C best,p rcs) Algorithm 1: Computig PrCS) through Samplig Preprocessig: For workloads large eough that the query strigs do ot fit ito memory, we write all query strigs to a database table, which also cotais the query s ID ad template also kow as a query s sigature [6, 5] or skeleto [18]). Two queries have the same template if they are idetical i everythig but the costat bidigs of their parameters. Template-iformatio ca be acquired either by usig a workload-collectio tool capable of recordig template iformatio e.g. [6]) or by parsig the queries, which geerally requires a small fractio of the overhead ecessary to optimize them. ow we ca obtai a radom sample of size from this table by computig a radom permutatio of the query IDs ad the usig a sigle sca) readig the queries correspodig to the first IDs ito memory. This approach trivially exteds to stratified samplig. 5.1 Progressive Give the strata WL h, h = 1,..., L ad a cofiguratio C i, the stratified estimates X i, i the case of idepedet samplig, are calculated as the sum of the weighted estimates for each stratum WL h. Let the variace of the cost distributio of WL h, whe evaluated i cofiguratio C i be deoted by σih) 2 ad S ih) 2 := σ ih) 2 WL h WL h ). Let 1) the umber of queries sampled from WL h be h. The the variace of the estimator X i of the cost of C i for Samplig is [9] V arx i ) = L h=1 WL h 2 S 2 ih) h 1 h WL h ). 5) We would thus like to pick a stratificatio ad a allocatio of sample sizes L = that miimizes equatio 5. For Delta-samplig, the variaces of the estimators X i,j ca be obtaied aalogously. Pickig a stratificatio: Iitially, cosider the case of Samplig: here, we ca use a differet stratificatio of WL for every X i. I the followig we discuss how to stratify WL for oe cofiguratio C i. Because the costs of queries that share a template typically exhibit much smaller variace tha the costs of the etire workload, it is ofte possible to get a good estimate of the average cost of queries sharig a template usig oly few sample queries. Cosequetly, we oly cosider stratificatios i which all queries of oe template are grouped ito the same stratum ad use the average costs per template to estimate the strata variaces S 2 ih). This does ot mea that usig a very fie-graied stratificatio with a sigle stratum for every template must ecessarily be the best way to stratify. While a fie stratificatio may result i a small variace for large sample-sizes, it comes at a price: i order for the estimate based o stratified samplig to be valid, the umber of samples h take from each stratum WL h has to be sufficietly large so that the estimator of the cost of a cofiguratio i each stratum is ormal. Choosig betwee stratificatios: Based o the above, we ca compute the stratum variaces resultig from a stratificatio ST = {WL 1,..., WL L }, give a sample allocatio T = 1,..., L ) usig equatio 5. Usig these variaces, we ca the estimate the umber of samples, #T otal Samples, ecessary to achieve P rcs) > α for a set of cofiguratios C. We omit the details of this algorithm due to space costraits, but it essetially ivolves computig target variaces T argetv arx i ) for each estimator X i such that these variaces, whe substituted i to the formulas i Sectio 4, give us the target probability. For a give a stratificatio ST, iitial sample allocatio T ad cofiguratio C i we deote the miimum umber of samples required to achieve the target variace uder the assumptio that the uderlyig σih) 2 #SamplesC i, ST, T ). 3 remai costat) by We use this estimate to compare differet stratificatio schemes. We ow choose the stratificatio of WL for a cofiguratio C i as follows: whe samplig, we maitai the average cost of the queries for C i i each template ad use it to estimate Sih) 2, oce we have see a small umber of queries for each template. The algorithm starts with a sigle stratum ST = WL). After every sample from C i it evaluates 3 If we igore the fiite populatio correctio, the this umber ca be computed usig OL log 2 )) operatios by combiig a biary search ad eyma Allocatio [9]. 5

6 if it should chage the stratificatio of the queries for this cofiguratio. This is doe by iteratig over a set of possible ew stratificatios ST p, p = 1..., #templates 1 resultig from splittig oe of the existig strata ito two at the pth-most expesive template ad computig #SamplesC i, ST p, T p ). Here, we chose the T p such that iitially each stratum cotais the maximum of the umber of queries already sampled from it ad mi. For scalability, we oly add a sigle stratum per step. ote that ulike stratificatio i the cotext of survey samplig or approximate query processig e.g. [4] where the chose stratificatio is the result of a complex optimizatio problem), we icur egligible computatioal overhead for chagig the stratificatio. The details are show i Algorithm 2. 1: // Iput: Strata WL 1,..., WL L, Cofiguratio C i 2: // Output: Stratum to split WL s = WL 1 s WL 2 s 3: // Templates to store i WL 1 s, WL 2 s 4: s := L + 1. // Iitializatio 5: mi sam := #Samples C i, {WL 1,.., WL L }, 1,.., L ). 6: for all WL j {WL 1,..., WL L } do 7: Set j := the expected umber of samples from WL j i #Samples C i, {WL 1,..., WL L }, 1,..., L ). 8: if j 2 mi the 9: Order the query templates i WL j by their average cost as tmp p1,..., tmp pm. 10: for Every split poit t {p 1,..., p m 1} do 11: Cosider the split WL j = WL 1 j WL 2 j ito templates {tmp p1,..., tmp t }, {tmp t+1,..., tmp pm }. 12: Compute sam[t] := #Samples C i, {WL 1... WL 1 j, WL 2 j... WL L}, 1,..., max{ mi, WL 1 j }, max{ mi, WL 2 j },..., L ). 13: if sam[t] < mi sam the 14: mi sam := sam[t]; s = t. 15: ed if 16: ed for 17: ed if 18: ed for 19: if s L + 1 the 20: Update #T otal Samples ad target variaces. 21: retur WL s ad template iformatio. 22: ed if Algorithm 2: Computig the optimum stratificatio Overhead: Give T templates i WL, there ca oly be T 1 possible split poits over all strata. For each split poit, we have to evaluate the correspodig value of #Samples...), which requires OL log 2 )) operatios. Cosequetly, the etire algorithm rus i OL log 2 ) T ) time. Because the umber of differet templates is typically orders of magitude smaller tha the size of the etire workload, the resultig overhead is egligible whe compared to the overhead of optimizig eve a sigle query. ote that we execute this algorithm oly for the cofiguratio C i that the last sample was chose from, as the estimates ad sample variaces for the remaiig cofiguratios stay uchaged. for Delta Samplig: For Delta-Samplig, the problem of fidig a good stratificatio is complicated by the fact that Delta-Samplig samples the same set of queries for all cofiguratios ad thus a stratificatio scheme eeds to recocile the variaces of multiple radom variables X i,j. As a cosequece, we cosider the reductio i the average variace of X i,j, 1 i < j k whe comparig stratificatio schemes. Furthermore, the orderig of the templates for each X i,j may vary, resultig i up to k k 1)/2 orderigs to cosider whe computig a ew cadidate stratificatio. For reasos of tractability, we oly cosider a sigle rakig over the X i,j istead. 5.2 Pickig the ext sample We first cosider Samplig without stratificatio: ideally, we would like to pick the ext combiatio of query ad cofiguratio to evaluate so that P rcs) is maximized. We use a heuristic approach, attemptig to miimize the sum of the estimator-variaces istead. I case of Samplig this meas icreasig oe sample-size SL j by oe, so that k i=1 V arx i) is miimized. Sice we do t kow the effect the ext query will have o sample meas ad si 2 s beforehad, we estimate the chage to the sum of variaces assumig that these remai uchaged. For Delta-Samplig, the sampled query is evaluated i every cofiguratio, so sample selectio is trivial. If the workload is stratified, we use a similar approach for Samplig, choosig the cofiguratio ad stratum resultig i the biggest estimated improvemet i the sum of variaces. For Delta-Samplig with stratificatio, a query from the stratum resultig i the biggest estimated improvemet i the sum of the variaces of all estimators is chose ad evaluated i every cofiguratio. While this approach assumes optimizatio times to be costat, differet optimizatio times for each template ca be modeled by computig the average overhead for each cofiguratio/stratum pair ad selectig the oe maximizig the variace reductio relative to the expected overhead. 6 Applicability of the CLT As discussed i Sectio 4.1, the estimatio of P rcs) relies o the fact that i) we have sampled sufficietly may queries for the cetral limit theorem to apply ad ii) the sample variaces si 2 are accurate estimators of the true variaces σi 2. While these assumptios ofte do hold i practice, the potetial impact of choosig a iferior database desig makes it desirable to be able to validate them. We do this as follows: i the sceario of physical database desig, we do kow the total umber of queries i the workload ad ca obtai upper ad lower bouds o their idividual costs. Usig these, we the compute upper bouds o the skew ad variace of the uderlyig distributio ad use these to verify the assumptios i) ad ii). I 6

7 Sectio 6.1, we will describe how to obtai bouds o the costs of queries that have ot bee sampled; the, i Sectio 6.2 we describe how to use these bouds to compute upper bouds o skew ad variace. 6.1 Derivig Cost Bouds for Queries The problem of boudig the true cost of arbitrary database queries has bee studied extesively ad is kow to be hard e.g. [14]). However, i the cotext of physical database desig, we oly seek the cofiguratio with the best optimizer-estimated cost, ad ot bouds o the true selectivity of query expressios. This simplificatio allows us to utilize kowledge o the space of physical database desigs ad the optimizer itself to make this problem tractable; moreover, it is actually essetial to physical desig tuig, as the tuig tool caot ifluece optimizer-decisios at ru-time ad thus eeds to keep its recommedatios i syc with optimizer behavior. First, cosider the issue of obtaiig bouds o SELECT-queries. I the cotext of automated physical desig tools, it is possible to determie a so-called base cofiguratio, cosistig of all idexes ad views that will be preset i all cofiguratios eumerated durig the tuig process. If the optimizer is well-behaved, the addig a idex or view to the base cofiguratio ca oly improve the optimizer estimated cost of a SELECT-query. Cosequetly, the cost of a SELECT-query i its base cofiguratio gives a upper boud o the cost for ay cofiguratio eumerated durig the tuig. Lower bouds o the costs of a SELECT-query ca be obtaied usig the reverse method: usig the cost of a query Q i a cofiguratio cotaiig all idexes ad views that may be useful to Q. All automated physical desig tools kow to us have compoets that suggest a set of structures the query may beefit from; however, ultimately the umber of such structures may be very large. To overcome this, it is possible to use the techiques described i [2]. Here, the query-optimizer is istrumeted with additioal code that outputs for ay idividual access path cosidered durig the optimizatio of a query the correspodig idex or view that would be optimal to support this access path. This reduces the umber of relevat idexes ad views sigificatly, as i) we do ot have to guess which access paths may be relevat ad ii) while a complex SQL query may be executed i a extremely large umber of differet ways, the optimizer itself for reasos of scalability oly cosiders a subset of them. I order to boud the costs of UPDATE statemets icludig ISERT ad DELETE statemets), we use the stadard approach of splittig a complex update statemet ito a SELECT ad a UPDATE part, e.g. the statemet UP- DATE R SET A 1 = A 3 WHERE A 2 < 4 is separated ito: i) SELECT A 3 FROM R WHERE A 2 < 4, ad ii) UPDATE TOPk) R SET A1 = 0, where k is the estimated selectivity of i). The SELECTpart of the query is hadled as outlied above. To boud the cost of the UPDATE-part, we use two observatios: 1) the umber of differet update templates occurrig teds to be orders of magitude smaller tha the workload size ad 2) i the cost models curretly used i query optimizers, the cost of a pure update statemet grows with its selectivity. Thus, we ca boud the cost of all UPDATE-statemets of a specific template T o a cofiguratio C usig the optimizer cost-estimate i C for the two queries with the largest/smallest selectivity i T. This meas that we have to make 2 optimizer calls per template ad cofiguratio ulike whe boudig the cost of SELECT queries); however, this approach scales well sice the umber of differet templates is geerally orders of magitude smaller tha the size of the workload. ote that automated physical desig tuig tools already issue a sigle optimizer call for all queries i WL to derive iitial costs e.g. [20], figure 2) ad require a secod optimizer call per query for report geeratio. While these techiques are very simple, eve very coservative cost bouds ted to work well, as the resultig cofidece bouds give by the CLT coverge quickly. A exhaustive discussio of boudig techiques for SQL queries is ot the focus of this paper, ad these are a iterestig research challege themself. 6.2 Verifyig the validity of the estimators ow, the presece of bouds o the costs of the queries ot part of the sample allows us to compute a upper boud σmax 2 o σi 2, which we ca use i its place, thereby esurig that the estimated value of P rcs) will be coservative. Computig σmax 2 is a maximizatio problem with costraits defied by the cost itervals. Problem Statemet [Boudig the Variace]: Give a set of variables V = {v 1,..., v } represetig query costs, with each v i bouded by low i v i high i, compute σ 2 max = 1 max v 1,...,v ) R i:low i v i high i i=0 v i i=0 v ) i 2. 6) It is kow that this problem is P-hard [11] to solve exactly or to approximate withi a arbitrary ɛ > 0 [12]. The best kow algorithm for this problem requires up to O2 2 ) operatios [12], ad is thus ot practical for the workload sizes we cosider. Therefore, we propose a ovel algorithm approximatig σmax. 2 The basic idea is to solve the problem for values of v i that are multiples of a appropriately chose factor ρ ad to boud the differece betwee the solutio obtaied this way which we ll refer to as ˆσ max) 2 ad the solutio σmax 2 for ucostraied v i. We use the otatio v ρ i whe describig the rouded values. Rewritig equatio 6, we get σmax ) 2 = max v i ) 2 v i ). 7) v 1,...,v ) R i:low i v i high i i=0 i=0 7

8 ow, we defie MaxV 2 [m][j] as the maximum value of m m i=1 vρ i )2 uder the costrait that i=1 vρ i ) = m i=1 lowρ i + j ρ. Cosequetly, every solutio ˆσ2 max to the maximizatio problem for multiples of ρ must be of the form 1 MaxV 2 [][j] 1 m low ρ i + j ρ)2), 8) i=1 ad we ca fid the solutio by examiig all possible values of equatio 8. Because we are oly cosiderig multiples of ρ, we roud the boudaries of each v i to the closest multiples of ρ: low ρ i := low i + ρ/2)/ρ ρ ad high ρ i := high i + ρ/2)/ρ ρ. Thus each v i ca oly take o rage i = high ρ i lowρ i )/ρ + 1 distict values. Cosequetly, the average of the first m values ca take o total m = m i=1 rage i) m 1) differet values. This limitatio is the key idea to makig our approximatio algorithm scalable, as the values of total i typically icrease much more slowly tha the umber of combiatios of differet high i ad low i values. We ca compute all possible values of MaxV 2 [m][j] for m = 1,...,, j = 0,..., total m 1 1 usig the followig recurrece: MaxV 2 [i][j + l] { max MaxV 2 [i 1][l] + low ρ i = + j l=1,...,total ρ)2, if i > 1 i 1 low ρ i + j ρ)2, otherwise. This computatio requires total i 1 steps for each i = 1,...,. ow we ca compute the solutio to the costraied maximizatio problem usig equatio 8 i total steps. Two further performace-optimizatios are possible: because the 2d cetral momet of a distributio has o global maximum over a compact box that is ot attaied at a boudary poit [16], each v ρ i must either take o its maximum or miimum value. Cosequetly, we oly eed to check the cases of j = 0 ad j = total m 1 1 i the recurrece. Furthermore, we ca miimize the umber of steps i the algorithm by traversig the values v i i icreasig order of rage i whe computig MaxV 2 [m][j]. Accuracy of the approximatio: The differece betwee the approximate solutio ˆσ max 2 ad the true optimum σmax 2 ca be bouded as follows: the existece of a set of variables v 1,..., v with i : low i v i high i implies that there exists a set of of multiples of ρ, v ρ 1,..., vρ with i : low ρ i vρ i highρ i, so that i : v i v ρ i ρ/2. Cosequetly, the differece betwee the value of i=0 vρ i )2 ad the value of i=0 v i) 2 i equatio 7 for these sets of values is at most i=0 ρ v ρ i + ρ2 /4 ). Similarly, the differece betwee 1 i=0 vρ i )2 ad 1 i=0 v i) 2 i equatio 7 is at most i=0 ρ v ρ i + ρ2 /4 ). Therefore, the existece of a solutio σ 2 max to equatio 6 implies the existece of a set of of multiples of ρ, v ρ 1,..., vρ with i : low ρ i v ρ i high ρ i such that the differece betwee σmax 2 ad the variace of v ρ 1,..., vρ is bouded by θ := 2 i=0 ρ v ρ i + ρ2 /4 ) *). Similarly, the existece of a solutio ˆσ max 2 to the optimizatio problem for multiples of ρ implies the existece of a set of variables v 1,..., v with i : low i v i high i such that the differece betwee ˆσ max 2 ad the variace of v 1,..., v is bouded by θ as well **). Therefore, if we compute a solutio ˆσ max 2 to the optimizatio problem for multiples of ρ, the **) implies that the solutio to the ucostraied problem is at least ˆσ max 2 θ ad *) implies that o solutio σmax 2 to equatio 6 exists that is larger tha ˆσ max 2 + θ. Overhead: To demostrate the scalability of this approximatio algorithm, we have measured the overhead of computig ˆσ max 2 for a TPC-D workload of 100K queries ad various values of ρ o a Petium 4 2.8GHz) PC i Table 1. =100K ρ=10 =100K ρ=1 =100K ρ=1/10 Timeˆσ 2 max) 0.4 sec. 5.2 sec. 53 sec. Table 1: Overhead of approximatig σ 2 max Verifyig the applicability of the CLT: The 2d assumptio made whe modelig P rcs) is that the sample size is sufficietly large for the CLT to apply. However, we have ot stated thus far how a sufficietly large miimum sample size mi is picked. Oe geeral result i this area is Cochra s rule [9], p. 42) which states that for populatios marked by positive skewess i.e. the populatio cotais sigificat outliers), the sample size should satisfy > 25 G 1 ) 2, where G 1 is Fisher s measure of skew. Uder the assumptio that the disturbace i ay momet higher tha the 3rd is egligible ad additioal coditios o the samplig fractio f := 1 /), it ca be show that this coditio guaratees that a 95% cofidece statemet will be wrog o more tha 6% of the time, i.e. for the stadardized statistic Z := 1 f X σ i µ i : 2 P rz 1.96) P rz 1.96) > The derivatio of additioal results of this type is studied i [19]. While details of this work are beyod the scope of this paper, we use a modificatio of Cochra s Rule proposed i [19], which was foud to be robust for the populatio sizes we cosider: > G 1 ) 2. 9) I order to verify this coditio, we eed to be able to compute a upper boud o G 1 : Problem Statemet [Boudig the skew]: Give a set of variables V = {v 1,..., v }, with each v i bouded by 8

9 low i v i high i, compute G1 max = max v 1,...,v ) R i:low i v i high i i=0 v i i=0 v i ) 3 i=0 vi i=0 vi ) ) Ulike the case of variace maximizatio, to the best of our kowledge, the complexity of maximizig G 1 is ot kow. We approach this problem usig a approximatio scheme similar to the oe used for σ 2 max; because of size costraits, we omit the full descriptio of this algorithm. Because of the quick covergece of the CLT, eve the rough bouds derived i the previous sectio allow us to formulate practical costraits o sample size. For example, for the highly skewed query costs vary by multiple degrees of magitude) 13K query TPC-D workload described i Sectio 7, satisfyig equatio 9 required about a 4% sample; for a 131K query TPC-D workload, a samples of less tha 0.6% of the queries was eeded. 7 Experimets I this sectio we first evaluate the efficiecy of the samplig techiques described i Sectios 4 ad 5. The, we show how the estimatio of P rcs) ca be used to costruct a scalable ad accurate compariso primitive for large umbers of cofiguratios. Fially, we compare our approach to existig work experimetally. Setup: All experimets were ru o a Petium 4 2.8GHz) PC ad use the cost model of the SQL Server optimizer. We evaluated our techiques o two databases, oe sythetic, ad oe real-life. The sythetic database follows the TPC-D schema ad was geerated so that the frequecy of attribute values follows a Zipf-like distributio, usig the skew-parameter θ = 1. The total data size is 1GB. Here, we use a workload cosistig of about 13K queries, geerated usig the stadard QGE tool. The real-life database used was a database ruig a CRM applicatio with over 500 tables ad of size 0.7 GB. We obtaied the workload for this database by usig a trace tool; the resultig workload cotais about 6K queries, iserts, updates ad deletes. 7.1 Evaluatig the Samplig Techiques I this sectio, we evaluate the efficiecy of the differet samplig techiques described ad Delta Samplig, both with ad without progressive stratificatio). I the first experimet, we use the TPC-D workload ad cosider the problem of choosig betwee two cofiguratios C 1, C 2 that have a sigificat differece i cost 7%) ad i their respective sets of physical desig structures C 2 is idex-oly, whereas C 1 cotais a umber of views). ote that WL = 13K, so solvig the cofiguratioselectio problem for k = 2 exactly requires 26K optimizer calls. I the first experimet we set δ = 0 ad ru each samplig scheme for a give sample size ad output the selected cofiguratio. This process is repeated 5000 times, resultig i a Mote Carlo simulatio to compute the true probability of correct selectio for a give umber of samples. The results are show i Figure 1. We ca see that Probability of correct Selectio Sample Size Samplig Delta Samplig Samplig + Delta Samplig + Figure 1: Mote Carlo Simulatio of P rcs) the samplig-based approach to cofiguratio selectio is efficiet: less tha 1% of the umber of optimizer calls required to compute the cofiguratio costs exactly suffice to select the correct cofiguratio with ear-certaity. Delta- Samplig outperforms Samplig sigificatly for small sample sizes, whereas addig progressive stratificatio makes little differece, give the small sample sizes. I the ext experimet we illustrates the issues of iitially pickig a very fie stratificatio of the workload. Here, we ru the same experimet agai, ad use both samplig schemes whe havig partitioed the workload ito L strata, oe for each query template Figure 2). For the fie Probability of correct Selectio Sample Size 130 Figure 2: Progressive vs. fie stratificatio Samplig + prog. Delta Samplig + prog. Samplig + simple Delta Samplig + simple stratificatio ad small sample sizes, the estimates withi each stratum are ot ormal ad thus the probability of correct selectio is sigificatly lower. For large sample sizes, the accuracy of the fie stratificatio is comparable to the progressive stratificatio scheme. We have foud this to hold true for a umber of differet experimetal setups. The ext experimet Figure 3) uses the same workload, but two cofiguratios that are sigificatly harder to distiguish differece i cost 2%), ad share a sigificat umber of desig structures both cofiguratios are idex-oly). Here, Delta Samplig outperforms Idepe- 9

10 det Samplig by a bigger margi, sice the cofiguratios share a large umber of objects, resultig i higher covariace betwee the cost distributios. Because of the larger sample sizes, stratificatio sigificatly improves the accuracy of Samplig. Probability of correct Selectio Sample Size Samplig Delta Samplig Samplig + prog. Delta Samplig + prog. Figure 3: Mote-Carlo Simulatio of P rcs) We coducted the same experimets o the real-life database, usig two cofiguratios that were difficult to compare differece i cost < 1%), ad had little overlap i the physical desig structures Figure 4). Cosequetly, the advatage of Delta Samplig is less proouced. Furthermore, this workload cotais a relatively large umber of distict templates > 120). This meas that we rarely have estimates of the avg. cost of all templates, so progressive stratificatio is used oly i a few cases. Probability of correct Selectio Sample Size Samplig Delta Samplig Samplig + prog. Delta Samplig + prog. Figure 4: Mote-Carlo Simulatio of P rcs) 7.2 Experimets o multiple cofiguratios I this sectio we evaluate the accuracy ad scalability of the compariso primitive based o P rcs)-estimates for large umbers of cofiguratios k, which were collected from a commercial physical desig tool. We ru Algorithm 1 to determie the best cofiguratio with probability α = 90%, σ = 0 ad usig Delta-Samplig with progressive stratificatio as the samplig techique. I these experimets, we use the sample variaces si 2 to estimate σi 2. I order to guard agaist the oscillatio of the P rcs)-estimates, we oly accept a P rcs)-coditio if it holds for more tha 10 cosecutive samples. As a additioal optimizatio, we stop samplig from cofiguratios C j for which the cotributio to the overall ucertaity i P rcs) is egligible i this case P rcs l,j ) >.995). We compare the primitive to two alterative sampleallocatio methods usig idetical umber of samples): a) samplig without stratificatio ad b) samplig the same umber of queries from each stratum. For all methods, we report the resultig probability of selectig the best cofiguratio True P rcs) ) ad the maximum differece i cost betwee the best cofiguratio ad the oe selected by each method Max. ). The latter allows us to assess the worst-case impact of usig the alterative techiques. Each experimet was carried out 5000 times, for both the TPC-D Table 2) ad the CRM database Table 3). Method k = 50 k = 100 k = 500 Delta-Samplig True P rcs) 91.7% 88.2% 88.3% Max. 0.5% 1.5% 1.6% o Strat. True P rcs) 39.1% 28.2% 12.0% Max. 8.8% 9.9% 9.8% Equal Alloc. True P rcs) 42.5% 28.6% 12.8% Max. 7.7% 9.0% 8.6% Table 2: Results for TPCD-Workload Method k = 50 k = 100 k = 500 Delta-Samplig True P rcs) 97.5% 94.4% 89.7% Max. 1.7% 1.4% 0.8% o Strat. True P rcs) 56.0% 37.5% 11.0% Max % 12.69% 6.5% Equal Alloc. True P rcs) 71.1% 52.8% 17.0% Max. 7.2% 5.8% 3.26% Table 3: Results for CRM-Workload I both cases, the compariso primitive based o Delta- Samplig performs sigificatly better tha the alteratives. Moreover, the true P rcs) resultig from the selectio primitive matches the target probability α closely or exceeds it 4, demostratig its capability to chose the sample size effectively to match the accuracy requiremet. 7.3 Compariso to Workload Compressio I this sectio, we compare existig workload compressio techqiues [5, 20] to our approach with three key evaluatio criteria: scalability, quality ad adaptivity. I [20], queries are selected i order of their costs for the curret cofiguratio util a prespecified percetage, X, of the total workload cost is selected. Obviously, this method scales well. Regardig quality, however, the techique fails whe oly few query templates cotai the most expesive queries. To illustrate this, we geerated a 2K query TPC-D workload usig the QGE tool. If X = 20%, [20] will capture queries correspodig to oly few of the TPC-D query 4 The high P rcs) for the CRM workload is due to the requiremet that P rcs) > α must hold for 10 cosecutive samples, which causes over-samplig for easy selectio problems. 10

11 templates. Cosequetly, tuig this compressed workload fails to yield several desig structures beeficial for the remaiig templates. To show this, we tued 5 differet radom samples of the same size as the compressed workload; the improvemet over the etire workload) resultig from tuig each sample was more tha twice the improvemet resultig from tuig the compressed workload. To evaluate the quality resultig from [5], we measured differece i improvemet whe tuig a Delta-sample ad a compressed workload of the same size for a TPC-D workload; i this experimet, both approaches performed comparably. Regardig scalability, [5] requires up to O WL 2 ) complex distace-computatios as a preprocessig step to the clusterig problem. I cotrast, the overhead for executig the Algorithms 1 ad 2 is egligible, as all ecessary couters ad measuremets ca be maitaied icremetally at costat cost. The biggest differece betwee [5, 20] ad our techique cocers adaptivity. [5, 20] both deped o a iitial guess of a sesitivity parameter. I [5], it is the maximum allowable icrease i the estimated ruig time whe queries are discarded ad i [20] it is the percetage of total cost retaied. It is ot clear how to set this parameter correctly, as these determiistic techiques do ot allow the formulatio of probabilistic guaratees similar to the oes i this paper; i both [5, 20] the sesitivity parameter is set up-frot, without takig the cofiguratios space ito accout. We have observed i experimets that the fractio of a workload required for accurate selectio varies sigificatly for differet sets of cadidate cofiguratios. Thus choosig the sesitivity parameter icorrectly has sigificat impact o tuig quality ad speed. Our algorithm, i cotrast, offers a pricipled way of adjustig the sample size olie. 8 Coclusio ad Outlook I this paper, we preseted a scalable compariso primitive solvig the cofiguratio selectio problem, while providig accurate estimates of the probability of correct selectio. This primitive is of crucial importace to scalable exploratio of the space of physical database desigs ad ca serve as a core compoet withi automated physical desig tools. Moreover, we described a ovel scheme to verify the validity of usig the Cetral Limit Theorem ad sample variaces to compute P rcs). These techiques are ot limited to this problem domai, but ca be used for ay CLT-based estimator, if the required bouds o idividual values i the uderlyig distributio ca be obtaied. Refereces [1] S. Agrawal, S. Chaudhuri, et al. Database Tuig Advisor for Microsoft SQL Server. Proc. of the 30th VLDB Cof., Toroto, Caada, [2]. Bruo ad S. Chaudhuri. Automatic Physical Database Tuig: a Relaxatio-based Approach. I ACM SIGMOD Cof., [3] G. Casella ad R. L. Berger. Statistical Iterferece. Duxbury, [4] S. Chaudhuri, G. Das, et al. A Robust, Optimizatio-Based Approach for Aprroximate Aswerig of Aggregate Queries. I Proc. of ACM SIGMOD Cof., [5] S. Chaudhuri, A. Gupta, et al. Compressig SQL Workloads. I Proc. of ACM SIGMOD Cof., [6] S. Chaudhuri, A. C. Köig, et al. SQLCM: A Cotiuous Moitorig Framework for Relatioal Database Egies. I Proc. of 20th I CDE Cof., [7] S. Chaudhuri ad V. R. arasayya. A Efficiet Cost-Drive Idex Selectio Tool for Microsoft SQL Server. I Proc. the 23rd VLDB Cof., Athes, Greece, [8] S. Chaudhuri ad V. R. arasayya. AutoAdmi What-If Idex Aalysis Utility. I Proc. of ACM SIGMOD Cof., Seattle, WA, USA, [9] W. G. Cochra. Samplig Techiques. Wiley, [10] B. Dageville, D. Das, et al. Automatic SQL Tuig i Oracle 10g. I Proc. of the 30th VLDB Cof., [11] S. Ferso, L. Gizburg, et al. Computig Variace for Iterval Data is P-Hard. I ACM SIGACT ews, Vol. 33, o. 2, pages , [12] S. Ferso, L. Gizburg, et al. Exact Bouds o Fiite Populatios of Iterval Data. Techical Report UTEP-CS-02-13d, Uiversity of Texas at El Paso, [13] S. Fikelstei, M. Schikolick, et al. Physical Database Desig for Relatioal Databases. I ACM TODS, 131), [14] Y. Ioaidis ad S. Christodoulakis. Optimal Histograms for limitig Worst-Case Error Propagatio i the Size of Joi Results. I ACM TODS, [15] S.-H. Kim ad B. L. elso. Selectig the Best System: Theory ad Methods. I Proc. of the 2003 Witer Simulatio Cof., pages [16] V. Kreiovich, S. Ferso, et al. Computig Higher Cetral Momets for Iterval Data. Techical Report UTEP-CS-03-14, Uiversity of Texas at El Paso, [17]. M. Steiger ad J. R. Wilso. Improved Batchig for Cofidece Iterval Costructio i Steady-State Simulatio. I Proc. of the 1999 Witer Simulatio Cof., [18] M. Stillger, G. Lohma, V. Markl, ad M. Kadil. LEO - DB2 s Learig Optimizer. I Proc. of the 27th Coferece o Very Large Databases, [19] R. Sugde, T. Smith, et al. Cochra s rule for simple radom samplig. I Joural of the Royal Statistical Society, pages , [20] D. C. Zilio, J. Rao, et al. DB2 Desig Advisor: Itegrated Automatic Physical Database Desig. Proc. of the 30th VLDB Cof., Caada, Ackowledgemets: This paper beefitted tremedously from may isightful commets from Vivek arasayya, Surajit Chaudhuri, Sajay Agrawal ad Ami Saberi. 11

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8)

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8) CIS 11 Data Structures ad Algorithms with Java Fall 017 Big-Oh Notatio Tuesday, September 5 (Make-up Friday, September 8) Learig Goals Review Big-Oh ad lear big/small omega/theta otatios Practice solvig

More information

Bayesian approach to reliability modelling for a probability of failure on demand parameter

Bayesian approach to reliability modelling for a probability of failure on demand parameter Bayesia approach to reliability modellig for a probability of failure o demad parameter BÖRCSÖK J., SCHAEFER S. Departmet of Computer Architecture ad System Programmig Uiversity Kassel, Wilhelmshöher Allee

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpeCourseWare http://ocw.mit.edu 6.854J / 18.415J Advaced Algorithms Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 18.415/6.854 Advaced Algorithms

More information

Improving Template Based Spike Detection

Improving Template Based Spike Detection Improvig Template Based Spike Detectio Kirk Smith, Member - IEEE Portlad State Uiversity petra@ee.pdx.edu Abstract Template matchig algorithms like SSE, Covolutio ad Maximum Likelihood are well kow for

More information

3D Model Retrieval Method Based on Sample Prediction

3D Model Retrieval Method Based on Sample Prediction 20 Iteratioal Coferece o Computer Commuicatio ad Maagemet Proc.of CSIT vol.5 (20) (20) IACSIT Press, Sigapore 3D Model Retrieval Method Based o Sample Predictio Qigche Zhag, Ya Tag* School of Computer

More information

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information

Lecture 5. Counting Sort / Radix Sort

Lecture 5. Counting Sort / Radix Sort Lecture 5. Coutig Sort / Radix Sort T. H. Corme, C. E. Leiserso ad R. L. Rivest Itroductio to Algorithms, 3rd Editio, MIT Press, 2009 Sugkyukwa Uiversity Hyuseug Choo choo@skku.edu Copyright 2000-2018

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:

More information

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro

More information

1 Graph Sparsfication

1 Graph Sparsfication CME 305: Discrete Mathematics ad Algorithms 1 Graph Sparsficatio I this sectio we discuss the approximatio of a graph G(V, E) by a sparse graph H(V, F ) o the same vertex set. I particular, we cosider

More information

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein 068.670 Subliear Time Algorithms November, 0 Lecture 6 Lecturer: Roitt Rubifeld Scribes: Che Ziv, Eliav Buchik, Ophir Arie, Joatha Gradstei Lesso overview. Usig the oracle reductio framework for approximatig

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

Big-O Analysis. Asymptotics

Big-O Analysis. Asymptotics Big-O Aalysis 1 Defiitio: Suppose that f() ad g() are oegative fuctios of. The we say that f() is O(g()) provided that there are costats C > 0 ad N > 0 such that for all > N, f() Cg(). Big-O expresses

More information

Introduction. Nature-Inspired Computing. Terminology. Problem Types. Constraint Satisfaction Problems - CSP. Free Optimization Problem - FOP

Introduction. Nature-Inspired Computing. Terminology. Problem Types. Constraint Satisfaction Problems - CSP. Free Optimization Problem - FOP Nature-Ispired Computig Hadlig Costraits Dr. Şima Uyar September 2006 Itroductio may practical problems are costraied ot all combiatios of variable values represet valid solutios feasible solutios ifeasible

More information

Administrative UNSUPERVISED LEARNING. Unsupervised learning. Supervised learning 11/25/13. Final project. No office hours today

Administrative UNSUPERVISED LEARNING. Unsupervised learning. Supervised learning 11/25/13. Final project. No office hours today Admiistrative Fial project No office hours today UNSUPERVISED LEARNING David Kauchak CS 451 Fall 2013 Supervised learig Usupervised learig label label 1 label 3 model/ predictor label 4 label 5 Supervised

More information

Fast Fourier Transform (FFT) Algorithms

Fast Fourier Transform (FFT) Algorithms Fast Fourier Trasform FFT Algorithms Relatio to the z-trasform elsewhere, ozero, z x z X x [ ] 2 ~ elsewhere,, ~ e j x X x x π j e z z X X π 2 ~ The DFS X represets evely spaced samples of the z- trasform

More information

Image Segmentation EEE 508

Image Segmentation EEE 508 Image Segmetatio Objective: to determie (etract) object boudaries. It is a process of partitioig a image ito distict regios by groupig together eighborig piels based o some predefied similarity criterio.

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Pseudocode ( 1.1) High-level descriptio of a algorithm More structured

More information

UNIT 4 Section 8 Estimating Population Parameters using Confidence Intervals

UNIT 4 Section 8 Estimating Population Parameters using Confidence Intervals UNIT 4 Sectio 8 Estimatig Populatio Parameters usig Cofidece Itervals To make ifereces about a populatio that caot be surveyed etirely, sample statistics ca be take from a SRS of the populatio ad used

More information

Ones Assignment Method for Solving Traveling Salesman Problem

Ones Assignment Method for Solving Traveling Salesman Problem Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:

More information

condition w i B i S maximum u i

condition w i B i S maximum u i ecture 10 Dyamic Programmig 10.1 Kapsack Problem November 1, 2004 ecturer: Kamal Jai Notes: Tobias Holgers We are give a set of items U = {a 1, a 2,..., a }. Each item has a weight w i Z + ad a utility

More information

A General Framework for Accurate Statistical Timing Analysis Considering Correlations

A General Framework for Accurate Statistical Timing Analysis Considering Correlations A Geeral Framework for Accurate Statistical Timig Aalysis Cosiderig Correlatios 7.4 Vishal Khadelwal Departmet of ECE Uiversity of Marylad-College Park vishalk@glue.umd.edu Akur Srivastava Departmet of

More information

How do we evaluate algorithms?

How do we evaluate algorithms? F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:

More information

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers *

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers * Load balaced Parallel Prime umber Geerator with Sieve of Eratosthees o luster omputers * Soowook Hwag*, Kyusik hug**, ad Dogseug Kim* *Departmet of Electrical Egieerig Korea Uiversity Seoul, -, Rep. of

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies

More information

. Written in factored form it is easy to see that the roots are 2, 2, i,

. Written in factored form it is easy to see that the roots are 2, 2, i, CMPS A Itroductio to Programmig Programmig Assigmet 4 I this assigmet you will write a java program that determies the real roots of a polyomial that lie withi a specified rage. Recall that the roots (or

More information

Heuristic Approaches for Solving the Multidimensional Knapsack Problem (MKP)

Heuristic Approaches for Solving the Multidimensional Knapsack Problem (MKP) Heuristic Approaches for Solvig the Multidimesioal Kapsack Problem (MKP) R. PARRA-HERNANDEZ N. DIMOPOULOS Departmet of Electrical ad Computer Eg. Uiversity of Victoria Victoria, B.C. CANADA Abstract: -

More information

Data Structures and Algorithms. Analysis of Algorithms

Data Structures and Algorithms. Analysis of Algorithms Data Structures ad Algorithms Aalysis of Algorithms Outlie Ruig time Pseudo-code Big-oh otatio Big-theta otatio Big-omega otatio Asymptotic algorithm aalysis Aalysis of Algorithms Iput Algorithm Output

More information

Chapter 3 Classification of FFT Processor Algorithms

Chapter 3 Classification of FFT Processor Algorithms Chapter Classificatio of FFT Processor Algorithms The computatioal complexity of the Discrete Fourier trasform (DFT) is very high. It requires () 2 complex multiplicatios ad () complex additios [5]. As

More information

CSC165H1 Worksheet: Tutorial 8 Algorithm analysis (SOLUTIONS)

CSC165H1 Worksheet: Tutorial 8 Algorithm analysis (SOLUTIONS) CSC165H1, Witer 018 Learig Objectives By the ed of this worksheet, you will: Aalyse the ruig time of fuctios cotaiig ested loops. 1. Nested loop variatios. Each of the followig fuctios takes as iput a

More information

CS 683: Advanced Design and Analysis of Algorithms

CS 683: Advanced Design and Analysis of Algorithms CS 683: Advaced Desig ad Aalysis of Algorithms Lecture 6, February 1, 2008 Lecturer: Joh Hopcroft Scribes: Shaomei Wu, Etha Feldma February 7, 2008 1 Threshold for k CNF Satisfiability I the previous lecture,

More information

( n+1 2 ) , position=(7+1)/2 =4,(median is observation #4) Median=10lb

( n+1 2 ) , position=(7+1)/2 =4,(median is observation #4) Median=10lb Chapter 3 Descriptive Measures Measures of Ceter (Cetral Tedecy) These measures will tell us where is the ceter of our data or where most typical value of a data set lies Mode the value that occurs most

More information

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem A Improved Shuffled Frog-Leapig Algorithm for Kapsack Problem Zhoufag Li, Ya Zhou, ad Peg Cheg School of Iformatio Sciece ad Egieerig Hea Uiversity of Techology ZhegZhou, Chia lzhf1978@126.com Abstract.

More information

Structuring Redundancy for Fault Tolerance. CSE 598D: Fault Tolerant Software

Structuring Redundancy for Fault Tolerance. CSE 598D: Fault Tolerant Software Structurig Redudacy for Fault Tolerace CSE 598D: Fault Tolerat Software What do we wat to achieve? Versios Damage Assessmet Versio 1 Error Detectio Iputs Versio 2 Voter Outputs State Restoratio Cotiued

More information

On (K t e)-saturated Graphs

On (K t e)-saturated Graphs Noame mauscript No. (will be iserted by the editor O (K t e-saturated Graphs Jessica Fuller Roald J. Gould the date of receipt ad acceptace should be iserted later Abstract Give a graph H, we say a graph

More information

Elementary Educational Computer

Elementary Educational Computer Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified

More information

Big-O Analysis. Asymptotics

Big-O Analysis. Asymptotics Big-O Aalysis 1 Defiitio: Suppose that f() ad g() are oegative fuctios of. The we say that f() is O(g()) provided that there are costats C > 0 ad N > 0 such that for all > N, f() Cg(). Big-O expresses

More information

Random Graphs and Complex Networks T

Random Graphs and Complex Networks T Radom Graphs ad Complex Networks T-79.7003 Charalampos E. Tsourakakis Aalto Uiversity Lecture 3 7 September 013 Aoucemet Homework 1 is out, due i two weeks from ow. Exercises: Probabilistic iequalities

More information

The Magma Database file formats

The Magma Database file formats The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Sprig 2017 A secod course i data miig http://www.it.uu.se/edu/course/homepage/ifoutv2/vt17/ Kjell Orsbor Uppsala Database Laboratory Departmet of Iformatio Techology, Uppsala Uiversity,

More information

prerequisites: 6.046, 6.041/2, ability to do proofs Randomized algorithms: make random choices during run. Main benefits:

prerequisites: 6.046, 6.041/2, ability to do proofs Randomized algorithms: make random choices during run. Main benefits: Itro Admiistrivia. Sigup sheet. prerequisites: 6.046, 6.041/2, ability to do proofs homework weekly (first ext week) collaboratio idepedet homeworks gradig requiremet term project books. questio: scribig?

More information

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs What are we goig to lear? CSC316-003 Data Structures Aalysis of Algorithms Computer Sciece North Carolia State Uiversity Need to say that some algorithms are better tha others Criteria for evaluatio Structure

More information

Algorithms for Disk Covering Problems with the Most Points

Algorithms for Disk Covering Problems with the Most Points Algorithms for Disk Coverig Problems with the Most Poits Bi Xiao Departmet of Computig Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog csbxiao@comp.polyu.edu.hk Qigfeg Zhuge, Yi He, Zili Shao, Edwi

More information

Dynamic Programming and Curve Fitting Based Road Boundary Detection

Dynamic Programming and Curve Fitting Based Road Boundary Detection Dyamic Programmig ad Curve Fittig Based Road Boudary Detectio SHYAM PRASAD ADHIKARI, HYONGSUK KIM, Divisio of Electroics ad Iformatio Egieerig Chobuk Natioal Uiversity 664-4 Ga Deokji-Dog Jeoju-City Jeobuk

More information

Pattern Recognition Systems Lab 1 Least Mean Squares

Pattern Recognition Systems Lab 1 Least Mean Squares Patter Recogitio Systems Lab 1 Least Mea Squares 1. Objectives This laboratory work itroduces the OpeCV-based framework used throughout the course. I this assigmet a lie is fitted to a set of poits usig

More information

Fundamentals of Media Processing. Shin'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dinh Le

Fundamentals of Media Processing. Shin'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dinh Le Fudametals of Media Processig Shi'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dih Le Today's topics Noparametric Methods Parze Widow k-nearest Neighbor Estimatio Clusterig Techiques k-meas Agglomerative Hierarchical

More information

IMP: Superposer Integrated Morphometrics Package Superposition Tool

IMP: Superposer Integrated Morphometrics Package Superposition Tool IMP: Superposer Itegrated Morphometrics Package Superpositio Tool Programmig by: David Lieber ( 03) Caisius College 200 Mai St. Buffalo, NY 4208 Cocept by: H. David Sheets, Dept. of Physics, Caisius College

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time ( 3.1) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step- by- step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Ruig Time Most algorithms trasform iput objects ito output objects. The

More information

FREQUENCY ESTIMATION OF INTERNET PACKET STREAMS WITH LIMITED SPACE: UPPER AND LOWER BOUNDS

FREQUENCY ESTIMATION OF INTERNET PACKET STREAMS WITH LIMITED SPACE: UPPER AND LOWER BOUNDS FREQUENCY ESTIMATION OF INTERNET PACKET STREAMS WITH LIMITED SPACE: UPPER AND LOWER BOUNDS Prosejit Bose Evagelos Kraakis Pat Mori Yihui Tag School of Computer Sciece, Carleto Uiversity {jit,kraakis,mori,y

More information

A Boolean Query Processing with a Result Cache in Mediator Systems

A Boolean Query Processing with a Result Cache in Mediator Systems A Boolea Query Processig with a Result Cache i Mediator Systems Jae-heo Cheog ad Sag-goo Lee * Departmet of Computer Sciece Seoul Natioal Uiversity Sa 56-1 Shillim-dog Kwaak-gu, Seoul Korea {cjh, sglee}cygus.su.ac.kr

More information

Homework 1 Solutions MA 522 Fall 2017

Homework 1 Solutions MA 522 Fall 2017 Homework 1 Solutios MA 5 Fall 017 1. Cosider the searchig problem: Iput A sequece of umbers A = [a 1,..., a ] ad a value v. Output A idex i such that v = A[i] or the special value NIL if v does ot appear

More information

FPGA IMPLEMENTATION OF BASE-N LOGARITHM. Salvador E. Tropea

FPGA IMPLEMENTATION OF BASE-N LOGARITHM. Salvador E. Tropea FPGA IMPLEMENTATION OF BASE-N LOGARITHM Salvador E. Tropea Electróica e Iformática Istituto Nacioal de Tecología Idustrial Bueos Aires, Argetia email: salvador@iti.gov.ar ABSTRACT I this work, we preset

More information

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects. The

More information

A Generalized Set Theoretic Approach for Time and Space Complexity Analysis of Algorithms and Functions

A Generalized Set Theoretic Approach for Time and Space Complexity Analysis of Algorithms and Functions Proceedigs of the 10th WSEAS Iteratioal Coferece o APPLIED MATHEMATICS, Dallas, Texas, USA, November 1-3, 2006 316 A Geeralized Set Theoretic Approach for Time ad Space Complexity Aalysis of Algorithms

More information

Evaluating Top-k Selection Queries

Evaluating Top-k Selection Queries Evaluatig Top-k Selectio Queries Surajit Chaudhuri Microsoft Research surajitc@microsoft.com Luis Gravao Columbia Uiversity gravao@cs.columbia.edu Abstract I may applicatios, users specify target values

More information

Evaluation scheme for Tracking in AMI

Evaluation scheme for Tracking in AMI A M I C o m m u i c a t i o A U G M E N T E D M U L T I - P A R T Y I N T E R A C T I O N http://www.amiproject.org/ Evaluatio scheme for Trackig i AMI S. Schreiber a D. Gatica-Perez b AMI WP4 Trackig:

More information

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig

More information

BOOLEAN MATHEMATICS: GENERAL THEORY

BOOLEAN MATHEMATICS: GENERAL THEORY CHAPTER 3 BOOLEAN MATHEMATICS: GENERAL THEORY 3.1 ISOMORPHIC PROPERTIES The ame Boolea Arithmetic was chose because it was discovered that literal Boolea Algebra could have a isomorphic umerical aspect.

More information

A Note on Least-norm Solution of Global WireWarping

A Note on Least-norm Solution of Global WireWarping A Note o Least-orm Solutio of Global WireWarpig Charlie C. L. Wag Departmet of Mechaical ad Automatio Egieerig The Chiese Uiversity of Hog Kog Shati, N.T., Hog Kog E-mail: cwag@mae.cuhk.edu.hk Abstract

More information

Lecture 2: Spectra of Graphs

Lecture 2: Spectra of Graphs Spectral Graph Theory ad Applicatios WS 20/202 Lecture 2: Spectra of Graphs Lecturer: Thomas Sauerwald & He Su Our goal is to use the properties of the adjacecy/laplacia matrix of graphs to first uderstad

More information

Markov Chain Model of HomePlug CSMA MAC for Determining Optimal Fixed Contention Window Size

Markov Chain Model of HomePlug CSMA MAC for Determining Optimal Fixed Contention Window Size Markov Chai Model of HomePlug CSMA MAC for Determiig Optimal Fixed Cotetio Widow Size Eva Krimiger * ad Haiph Latchma Dept. of Electrical ad Computer Egieerig, Uiversity of Florida, Gaiesville, FL, USA

More information

Performance Plus Software Parameter Definitions

Performance Plus Software Parameter Definitions Performace Plus+ Software Parameter Defiitios/ Performace Plus Software Parameter Defiitios Chapma Techical Note-TG-5 paramete.doc ev-0-03 Performace Plus+ Software Parameter Defiitios/2 Backgroud ad Defiitios

More information

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:

More information

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov Sortig i Liear Time Data Structures ad Algorithms Adrei Bulatov Algorithms Sortig i Liear Time 7-2 Compariso Sorts The oly test that all the algorithms we have cosidered so far is compariso The oly iformatio

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

Solving Fuzzy Assignment Problem Using Fourier Elimination Method

Solving Fuzzy Assignment Problem Using Fourier Elimination Method Global Joural of Pure ad Applied Mathematics. ISSN 0973-768 Volume 3, Number 2 (207), pp. 453-462 Research Idia Publicatios http://www.ripublicatio.com Solvig Fuzzy Assigmet Problem Usig Fourier Elimiatio

More information

Lecture 1: Introduction and Strassen s Algorithm

Lecture 1: Introduction and Strassen s Algorithm 5-750: Graduate Algorithms Jauary 7, 08 Lecture : Itroductio ad Strasse s Algorithm Lecturer: Gary Miller Scribe: Robert Parker Itroductio Machie models I this class, we will primarily use the Radom Access

More information

Małgorzata Sterna. Mateusz Cicheński, Mateusz Jarus, Michał Miszkiewicz, Jarosław Szymczak

Małgorzata Sterna. Mateusz Cicheński, Mateusz Jarus, Michał Miszkiewicz, Jarosław Szymczak Małgorzata Stera Mateusz Cicheński, Mateusz Jarus, Michał Miszkiewicz, Jarosław Szymczak Istitute of Computig Sciece Pozań Uiversity of Techology Pozań - Polad Scope of the Talk Problem defiitio MP Formulatio

More information

1.2 Binomial Coefficients and Subsets

1.2 Binomial Coefficients and Subsets 1.2. BINOMIAL COEFFICIENTS AND SUBSETS 13 1.2 Biomial Coefficiets ad Subsets 1.2-1 The loop below is part of a program to determie the umber of triagles formed by poits i the plae. for i =1 to for j =

More information

New HSL Distance Based Colour Clustering Algorithm

New HSL Distance Based Colour Clustering Algorithm The 4th Midwest Artificial Itelligece ad Cogitive Scieces Coferece (MAICS 03 pp 85-9 New Albay Idiaa USA April 3-4 03 New HSL Distace Based Colour Clusterig Algorithm Vasile Patrascu Departemet of Iformatics

More information

Lecturers: Sanjam Garg and Prasad Raghavendra Feb 21, Midterm 1 Solutions

Lecturers: Sanjam Garg and Prasad Raghavendra Feb 21, Midterm 1 Solutions U.C. Berkeley CS170 : Algorithms Midterm 1 Solutios Lecturers: Sajam Garg ad Prasad Raghavedra Feb 1, 017 Midterm 1 Solutios 1. (4 poits) For the directed graph below, fid all the strogly coected compoets

More information

BASED ON ITERATIVE ERROR-CORRECTION

BASED ON ITERATIVE ERROR-CORRECTION A COHPARISO OF CRYPTAALYTIC PRICIPLES BASED O ITERATIVE ERROR-CORRECTIO Miodrag J. MihaljeviC ad Jova Dj. GoliC Istitute of Applied Mathematics ad Electroics. Belgrade School of Electrical Egieerig. Uiversity

More information

Recursive Estimation

Recursive Estimation Recursive Estimatio Raffaello D Adrea Sprig 2 Problem Set: Probability Review Last updated: February 28, 2 Notes: Notatio: Uless otherwise oted, x, y, ad z deote radom variables, f x (x) (or the short

More information

Chapter 11. Friends, Overloaded Operators, and Arrays in Classes. Copyright 2014 Pearson Addison-Wesley. All rights reserved.

Chapter 11. Friends, Overloaded Operators, and Arrays in Classes. Copyright 2014 Pearson Addison-Wesley. All rights reserved. Chapter 11 Frieds, Overloaded Operators, ad Arrays i Classes Copyright 2014 Pearso Addiso-Wesley. All rights reserved. Overview 11.1 Fried Fuctios 11.2 Overloadig Operators 11.3 Arrays ad Classes 11.4

More information

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19 CIS Data Structures ad Algorithms with Java Sprig 09 Stacks, Queues, ad Heaps Moday, February 8 / Tuesday, February 9 Stacks ad Queues Recall the stack ad queue ADTs (abstract data types from lecture.

More information

Outline and Reading. Analysis of Algorithms. Running Time. Experimental Studies. Limitations of Experiments. Theoretical Analysis

Outline and Reading. Analysis of Algorithms. Running Time. Experimental Studies. Limitations of Experiments. Theoretical Analysis Outlie ad Readig Aalysis of Algorithms Iput Algorithm Output Ruig time ( 3.) Pseudo-code ( 3.2) Coutig primitive operatios ( 3.3-3.) Asymptotic otatio ( 3.6) Asymptotic aalysis ( 3.7) Case study Aalysis

More information

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS APPLICATION NOTE PACE175AE BUILT-IN UNCTIONS About This Note This applicatio brief is iteded to explai ad demostrate the use of the special fuctios that are built ito the PACE175AE processor. These powerful

More information

c-dominating Sets for Families of Graphs

c-dominating Sets for Families of Graphs c-domiatig Sets for Families of Graphs Kelsie Syder Mathematics Uiversity of Mary Washigto April 6, 011 1 Abstract The topic of domiatio i graphs has a rich history, begiig with chess ethusiasts i the

More information

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA Creatig Exact Bezier Represetatios of CST Shapes David D. Marshall Califoria Polytechic State Uiversity, Sa Luis Obispo, CA 93407-035, USA The paper presets a method of expressig CST shapes pioeered by

More information

CS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1

CS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1 CS200: Hash Tables Prichard Ch. 13.2 CS200 - Hash Tables 1 Table Implemetatios: average cases Search Add Remove Sorted array-based Usorted array-based Balaced Search Trees O(log ) O() O() O() O(1) O()

More information

The isoperimetric problem on the hypercube

The isoperimetric problem on the hypercube The isoperimetric problem o the hypercube Prepared by: Steve Butler November 2, 2005 1 The isoperimetric problem We will cosider the -dimesioal hypercube Q Recall that the hypercube Q is a graph whose

More information

Computational Geometry

Computational Geometry Computatioal Geometry Chapter 4 Liear programmig Duality Smallest eclosig disk O the Ageda Liear Programmig Slides courtesy of Craig Gotsma 4. 4. Liear Programmig - Example Defie: (amout amout cosumed

More information

Redundancy Allocation for Series Parallel Systems with Multiple Constraints and Sensitivity Analysis

Redundancy Allocation for Series Parallel Systems with Multiple Constraints and Sensitivity Analysis IOSR Joural of Egieerig Redudacy Allocatio for Series Parallel Systems with Multiple Costraits ad Sesitivity Aalysis S. V. Suresh Babu, D.Maheswar 2, G. Ragaath 3 Y.Viaya Kumar d G.Sakaraiah e (Mechaical

More information

CIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13

CIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13 CIS Data Structures ad Algorithms with Java Sprig 08 Stacks ad Queues Moday, February / Tuesday, February Learig Goals Durig this lab, you will: Review stacks ad queues. Lear amortized ruig time aalysis

More information

n n B. How many subsets of C are there of cardinality n. We are selecting elements for such a

n n B. How many subsets of C are there of cardinality n. We are selecting elements for such a 4. [10] Usig a combiatorial argumet, prove that for 1: = 0 = Let A ad B be disjoit sets of cardiality each ad C = A B. How may subsets of C are there of cardiality. We are selectig elemets for such a subset

More information

Package popkorn. R topics documented: February 20, Type Package

Package popkorn. R topics documented: February 20, Type Package Type Pacage Pacage popkor February 20, 2015 Title For iterval estimatio of mea of selected populatios Versio 0.3-0 Date 2014-07-04 Author Vi Gopal, Claudio Fuetes Maitaier Vi Gopal Depeds

More information

BAYESIAN WITH FULL CONDITIONAL POSTERIOR DISTRIBUTION APPROACH FOR SOLUTION OF COMPLEX MODELS. Pudji Ismartini

BAYESIAN WITH FULL CONDITIONAL POSTERIOR DISTRIBUTION APPROACH FOR SOLUTION OF COMPLEX MODELS. Pudji Ismartini Proceedig of Iteratioal Coferece O Research, Implemetatio Ad Educatio Of Mathematics Ad Scieces 014, Yogyakarta State Uiversity, 18-0 May 014 BAYESIAN WIH FULL CONDIIONAL POSERIOR DISRIBUION APPROACH FOR

More information

arxiv: v2 [cs.ds] 24 Mar 2018

arxiv: v2 [cs.ds] 24 Mar 2018 Similar Elemets ad Metric Labelig o Complete Graphs arxiv:1803.08037v [cs.ds] 4 Mar 018 Pedro F. Felzeszwalb Brow Uiversity Providece, RI, USA pff@brow.edu March 8, 018 We cosider a problem that ivolves

More information

Analysis of Algorithms

Analysis of Algorithms Presetatio for use with the textbook, Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Aalysis of Algorithms Iput 2015 Goodrich ad Tamassia Algorithm Aalysis of Algorithms

More information

Descriptive Statistics Summary Lists

Descriptive Statistics Summary Lists Chapter 209 Descriptive Statistics Summary Lists Itroductio This procedure is used to summarize cotiuous data. Large volumes of such data may be easily summarized i statistical lists of meas, couts, stadard

More information

Primitive polynomials selection method for pseudo-random number generator

Primitive polynomials selection method for pseudo-random number generator Joural of hysics: Coferece Series AER OEN ACCESS rimitive polyomials selectio method for pseudo-radom umber geerator To cite this article: I V Aiki ad Kh Alajjar 08 J. hys.: Cof. Ser. 944 0003 View the

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Ruig Time of a algorithm Ruig Time Upper Bouds Lower Bouds Examples Mathematical facts Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite

More information

Characterizing graphs of maximum principal ratio

Characterizing graphs of maximum principal ratio Characterizig graphs of maximum pricipal ratio Michael Tait ad Josh Tobi November 9, 05 Abstract The pricipal ratio of a coected graph, deoted γg, is the ratio of the maximum ad miimum etries of its first

More information

One advantage that SONAR has over any other music-sequencing product I ve worked

One advantage that SONAR has over any other music-sequencing product I ve worked *gajedra* D:/Thomso_Learig_Projects/Garrigus_163132/z_productio/z_3B2_3D_files/Garrigus_163132_ch17.3d, 14/11/08/16:26:39, 16:26, page: 647 17 CAL 101 Oe advatage that SONAR has over ay other music-sequecig

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations Applied Mathematical Scieces, Vol. 1, 2007, o. 25, 1203-1215 A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045, Oe

More information

A Parallel DFA Minimization Algorithm

A Parallel DFA Minimization Algorithm A Parallel DFA Miimizatio Algorithm Ambuj Tewari, Utkarsh Srivastava, ad P. Gupta Departmet of Computer Sciece & Egieerig Idia Istitute of Techology Kapur Kapur 208 016,INDIA pg@iitk.ac.i Abstract. I this

More information

Analysis of Documents Clustering Using Sampled Agglomerative Technique

Analysis of Documents Clustering Using Sampled Agglomerative Technique Aalysis of Documets Clusterig Usig Sampled Agglomerative Techique Omar H. Karam, Ahmed M. Hamad, ad Sheri M. Moussa Abstract I this paper a clusterig algorithm for documets is proposed that adapts a samplig-based

More information