Self-tuning Histograms: Building Histograms Without Looking at Data


Ashraf Aboulnaga
Computer Sciences Department, University of Wisconsin - Madison
ashraf@cs.wisc.edu

Surajit Chaudhuri
Microsoft Research
surajitc@microsoft.com

Abstract

In this paper, we introduce self-tuning histograms. Although similar in structure to traditional histograms, these histograms infer data distributions not by examining the data or a sample thereof, but by using feedback from the query execution engine about the actual selectivity of range selection operators to progressively refine the histogram. Since the cost of building and maintaining self-tuning histograms is independent of the data size, self-tuning histograms provide a remarkably inexpensive way to construct histograms for large data sets with little up-front costs. Self-tuning histograms are particularly attractive as an alternative to multi-dimensional traditional histograms that capture dependencies between attributes but are prohibitively expensive to build and maintain. In this paper, we describe the techniques for initializing and refining self-tuning histograms. Our experimental results show that self-tuning histograms provide a low-cost alternative to traditional multi-dimensional histograms with little loss of accuracy for data distributions with low to moderate skew.

1. Introduction

Database systems require knowledge of the distribution of the data they store. This information is primarily used by query optimizers to estimate the selectivities of the operations involved in a query and choose the query execution plan. It could also be used for other purposes such as approximate query processing, load balancing in parallel database systems, and guiding the process of sampling from a relation. Histograms are widely used for capturing data distributions. They are used in most commercial database systems such as Microsoft SQL Server, Oracle, Informix, and DB2.
While histograms impose very little cost at query optimization time, the cost of building them and maintaining or rebuilding them when the data is modified has to be considered when we choose the attributes or attribute combinations for which we build histograms. Building a histogram involves scanning or sampling the data, and sorting the data and partitioning it into buckets, or finding quantiles. For large databases, the cost is significant enough to prevent us from building all the histograms that we believe are useful. This problem is particularly striking for multi-dimensional histograms that capture joint distributions of correlated attributes [MD88, PI97]. These histograms can be extremely useful for optimizing decision-support queries since they provide valuable information that helps in estimating the selectivities of multi-attribute predicates on correlated attributes. Despite their potential, to the best of our knowledge, no commercial database system supports multi-dimensional histograms. The usual alternative to multi-dimensional histograms is to assume that the attributes are independent, which enables using a combination of one-dimensional histograms. This approach is efficient but also very inaccurate. The inaccuracy results in a poor choice of execution plans by the query optimizer.

Footnote: Work done while the author was at Microsoft Research.

Self-tuning Histograms

In this paper, we explore a novel approach that helps reduce the cost of building and maintaining histograms for large tables. Our approach is to build histograms not by examining the data but by using feedback information about the execution of the queries on the database (query workload). We start with an initial histogram built with whatever information we have about the distribution of the histogram attribute(s). For example, we will construct an initial two-dimensional histogram from two existing one-dimensional histograms assuming independence of the attributes. As queries are issued on the database, the query optimizer uses the histogram to estimate selectivities in the process of choosing query execution plans.
Whenever a plan is executed, the query execution engine can count the number of tuples produced by each operator. Our approach is to use this free feedback information to refine the histogram. Whenever a query uses the histogram, we compare the estimated selectivity to the actual selectivity and refine the histogram based on the selectivity estimation error. This incremental refinement progressively reduces estimation errors and leads to a histogram that is accurate for similar workloads. We call histograms built using this process self-tuning histograms or ST-histograms for short.

This work was done in the broader context of the AutoAdmin project at Microsoft Research, which investigates techniques to make databases self-tuning. ST-histograms make it possible to build higher dimensional histograms incrementally with little overhead, thus providing commercial systems with a low-cost approach to creating and maintaining such histograms. The ST-histograms have a low up-front cost because they are initialized without looking at the data. The refinement of ST-histograms is a simple low-cost procedure that leverages free information from the execution engine. Furthermore, we demonstrate that histogram refinement converges quickly. Thus, the overall cost of ST-histograms is much lower than that of traditional multi-dimensional histograms, yet the loss of accuracy is very acceptable for data with low to moderate skew in the joint distribution of the attributes.

[Figure 1: On-line and off-line refinement of ST-histograms. The diagram shows the optimizer using the ST-histogram to produce a plan, the plan being executed, and the result size of each selection either refining the histogram immediately (on-line refinement) or being written to a workload log that refines it later (off-line refinement).]

A ST-histogram can be refined on-line or off-line (Figure 1). In the on-line mode, the module executing a range selection immediately updates the histogram. In the off-line mode, the execution module writes every selection range and its result size to a workload log. Tools available with commercial database systems, e.g., Profiler in Microsoft SQL Server, can accomplish such logging. The workload log is used to refine the histogram in a batch at a later time. On-line refinement ensures that the histogram reflects the most up-to-date feedback information but it imposes more overhead during query execution than off-line refinement and can also cause the histogram to become a high-contention hot spot. The overhead imposed by histogram refinement, whether on-line or off-line, can easily be tailored. In particular, the histogram need not be refined in response to every single selection that uses it. We can choose to refine the histogram only for selections with a high selectivity estimation error. We can also skip refining the histogram during periods of high load or when there is contention for accessing it.

On-line refinement of ST-histograms brings a ST-histogram closer to the actual data distribution, whether the estimation error driving this refinement is due to the initial inaccuracy of the histogram or to modifications in the underlying relation. Thus, ST-histograms automatically adapt to database updates. Another advantage of ST-histograms is that their accuracy depends on how often they are used. The more a ST-histogram is used, the more it is refined, and the more accurate it becomes.

Applications of Self-tuning Histograms

One can expect traditional histograms built by looking at the data to be more accurate than ST-histograms that learn the distribution without ever looking at the data. Nevertheless, ST-histograms, and especially multi-dimensional ST-histograms, are suitable for a wide range of applications.
As mentioned above, multi-dimensional ST-histograms are particularly attractive. Traditional multi-dimensional histograms, most notably MHIST-p histograms [PI97], are significantly more expensive than traditional one-dimensional histograms, increasing the value of the savings in cost offered by ST-histograms. Furthermore, ST-histograms are very competitive in terms of accuracy with MHIST-p histograms for data distributions with low to moderate skew (Section 5). Multi-dimensional ST-histograms can be initialized using traditional one-dimensional histograms and subsequently refined to provide a cheap and efficient way of capturing the joint distribution of multiple attributes. The other inexpensive alternative of assuming independence has been repeatedly demonstrated to be inaccurate (see, for example, [PI97] and our experiments in Section 5). Furthermore, note that building traditional histograms is an off-line process, meaning that histograms cannot be used until the system incurs the whole cost of completely building them. This is not true of ST-histograms. Finally, note that ST-histograms make it possible to inexpensively build not only two-dimensional, but also n-dimensional histograms.

ST-histograms are also a suitable alternative when there is not enough time for updating database statistics to allow building all the desired histograms in the traditional way. This may happen in data warehouses that are updated periodically with huge amounts of data. The sheer data size may prohibit rebuilding all the desired histograms during the batch window. This very same data size makes ST-histograms an attractive option, because examining the workload to build histograms will be cheaper than examining the data and can be tailored to a given time budget.

The technique of ST-histograms can be an integral part of database servers as we move towards self-tuning database systems. If a self-tuning database system decides that a histogram on some attribute or attribute combination may improve performance, it can start by building a ST-histogram.
The low cost of ST-histograms allows the system to experiment more extensively and try out more histograms than if traditional histograms were the only choice. Subsequently, one can construct a traditional histogram only if the ST-histogram does not provide the required accuracy.

Finally, an intriguing possible application of ST-histograms will be for applications that involve queries on remote data sources. With recent trends in database usage, query optimizers will have to optimize queries involving remote data sources not under their direct control, e.g., queries involving data sources accessed over the Internet. Accessing the data and building traditional histograms for such data sources may not be easy or even possible. Query results, on the other hand, are available from the remote source, making the technique of ST-histograms an attractive option.

The rest of this paper is organized as follows. In Section 2 we present an overview of the related work. Section 3 describes one-dimensional ST-histograms and introduces the basic concepts that lead towards Section 4 where we describe multi-dimensional ST-histograms. Section 5 presents an experimental evaluation of our proposed techniques. Section 6 contains concluding remarks.

2. Related Work

Histograms were introduced in [Koo80], and most commercial database systems now use histograms for selectivity estimation.

Although one-dimensional equi-depth histograms are used in most commercial systems, more accurate histograms have been proposed recently [PIHS96]. [PI97] extends the techniques in [PIHS96] to multiple dimensions. However, we are unaware of any commercial systems that use the MHIST-p technique proposed in [PI97]. A novel approach for building histograms based on wavelets is presented in [MVW98].

A major disadvantage of histograms is the cost of building and maintaining them. Some recent work has addressed this shortcoming. [MRL98] proposes a one-pass algorithm for computing approximate quantiles that could be used to build approximate equi-depth histograms in one pass over the data. Reducing the cost of maintaining equi-depth and compressed histograms is the focus of [GMP97]. Recall that our approach is not to examine the data at all, but to build histograms using feedback from the query execution engine. However, our technique for refining ST-histograms shares commonalities with the split and merge algorithm proposed in [GMP97]. This relationship is further discussed in Section 3.

In addition to histograms, another technique for selectivity estimation is sampling the data at query optimization time [LNS90]. The main disadvantage of this approach is the overhead it adds to query optimization.

The concept of using feedback from the query execution engine to estimate data distributions is introduced in [CR94]. In that paper, the data distribution is represented as a linear combination of model functions. Feedback information is used to adjust the weighting coefficients of this linear combination by a method called recursive-least-square-error. That paper considers only one-dimensional distributions. It remains an open problem whether one can find suitable multi-dimensional model functions, or whether the recursive-least-square-error technique would work well for multi-dimensional distributions. In contrast, we show how our technique can be used to construct multi-dimensional histograms as well as one-dimensional histograms.
Furthermore, our work is easily integrated into existing systems because we use the same histogram data structures that are currently supported in commercial systems.

A different type of feedback from the execution engine to the optimizer is proposed in [KD98]. In that paper, the execution engine invokes the query optimizer to re-optimize a query if it believes, based on statistics collected during execution, that this will result in a better query execution plan.

3. One-dimensional ST-histograms

Although the main focus of our paper is to demonstrate that ST-histograms are low cost alternatives to traditional multi-dimensional histograms, the fundamentals of ST-histograms are best introduced using ST-histograms for single attributes. Single-attribute ST-histograms are similar in structure to traditional histograms. Such a ST-histogram consists of a set of buckets. Each bucket, b, stores the range that it represents, [low(b), high(b)], and the number of tuples in this range, or the frequency, freq(b). Adjacent buckets share the bucket endpoints, and the ranges of all the buckets together cover the entire range of values of the histogram attribute. We assume that the refinement of ST-histograms is driven by feedback from range selection queries.

A ST-histogram assumes that the data is uniformly distributed until the feedback observation contradicts the uniformity assumption. Thus, the refinement/restructuring of ST-histograms corresponds to weakening the uniformity assumption as needed in response to feedback information. Therefore, the lifecycle of a ST-histogram consists of two stages. First, it is initialized and then, it is refined. The process of refinement can be broken down further into two parts: (a) refining individual bucket frequencies, and (b) restructuring the histogram, i.e., moving the bucket boundaries. The refinement process is driven by a query workload (see Section 1). The bucket frequencies are updated with every range selection on the histogram attribute, while the bucket boundaries are updated by periodically restructuring the histogram.
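The bucket structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Bucket` and `init_uniform` are hypothetical, and the number of buckets, the tuple count, and the attribute's minimum and maximum are simply taken as given parameters. The histogram starts from the uniformity assumption, with equal-width buckets that share endpoints and hold equal frequencies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    low: float    # lower end of the range the bucket represents
    high: float   # upper end of the range the bucket represents
    freq: float   # estimated number of tuples whose value falls in the range


def init_uniform(num_buckets: int, num_tuples: float,
                 lo: float, hi: float) -> List[Bucket]:
    """Equal-width buckets covering [lo, hi], each with the same frequency."""
    width = (hi - lo) / num_buckets
    return [
        Bucket(lo + i * width, lo + (i + 1) * width, num_tuples / num_buckets)
        for i in range(num_buckets)
    ]


# Hypothetical example: 4 buckets over [0, 100] for a relation of 1000 tuples.
hist = init_uniform(num_buckets=4, num_tuples=1000, lo=0, hi=100)
```

Each bucket in this example starts with 250 tuples; feedback from range selections would then move the frequencies away from this uniform starting point.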
We describe each of these steps in the rest of the section.

3.1 Initial Histogram

To build a ST-histogram, h, on an attribute, a, we need to know the required number of histogram buckets, B, the number of tuples in the relation, T, and the minimum and maximum values of attribute a, min and max. The B buckets of the initial histogram are evenly spaced between min and max. At the time of initializing the histogram structure, we have no feedback information. Therefore, we make the uniformity assumption and assign each of the buckets a frequency of T/B tuples (with some provision for rounding).

The parameter T can be looked up from system catalogs maintained for the database. However, the system may not store minimum and maximum values of attributes in its catalogs. The precise value of the minimum and maximum is not critical. Therefore, the initialization phase of ST-histograms can exploit additional sources to project an estimate that may subsequently be refined. For example, domain constraints on the column, as well as the minimum and maximum values referenced in the query workload, can be used for such estimation.

3.2 Refining Bucket Frequencies

The bucket frequencies of a ST-histogram are refined (updated) with feedback information from the queries of the workload. For every selection on the histogram attribute, we compute the absolute estimation error, which is the difference between the estimated and actual result sizes. Based on this error, we refine the frequencies of the buckets that were used in estimation.

The key problem is to decide how to distribute the blame for the error among the histogram buckets that overlap the range of a given query. In a ST-histogram, error in estimation may be due to incorrect frequencies in any of the buckets that overlap the selection range. This is different from traditional histograms in which, if the histogram has been built using a full scan of data and has not been degraded in accuracy by database updates, the estimation error can result only from the first or last bucket, and only if they partially overlap the selection range.
Buckets that are totally contained in the selection range do not contribute to the error.

The change in frequency of any bucket should depend on how much it contributes to the error. We use the heuristic that buckets with higher frequencies contribute more to the estimation error than buckets with lower frequencies. Specifically, we assign the blame for the error to the buckets used for estimation in proportion to their current frequencies. An alternative heuristic, not studied in this paper, is to assign the blame in proportion to the current ranges of the buckets. Finally, we multiply the estimation error by a damping factor between 0 and 1 to make sure that bucket frequencies are not modified too much in response to errors, as this may lead to oversensitive or unstable histograms.

Figure 2 presents the algorithm for updating the bucket frequencies of a ST-histogram, h, in response to a range selection, [rangelow, rangehigh], with actual result size act. This algorithm is used for both on-line and off-line refinement.

algorithm UpdateFreq
Inputs: h, rangelow, rangehigh, act
Outputs: h with updated bucket frequencies
begin
1  Get the set of k buckets overlapping the selection range, {b_1, b_2, ..., b_k};
2  est = Estimated result size of selection using histogram h;
3  esterr = act - est;  /* Compute the absolute estimation error. */
4  /* Distribute the error among the buckets in proportion to frequency. */
5  for i = 1 to k do
6      frac = (min(rangehigh, high(b_i)) - max(rangelow, low(b_i)) + 1) / (high(b_i) - low(b_i) + 1);
7      freq(b_i) = max(freq(b_i) + α * esterr * frac * freq(b_i) / est, 0);
8  endfor
end UpdateFreq

Figure 2: Algorithm for updating bucket frequencies in one-dimensional ST-histograms

The algorithm first determines the histogram buckets that overlap the selection range, whether they partially overlap the range or are totally contained in it, and the estimated result size. The query optimizer usually obtains this information during query optimization, so we can save some effort by retaining this information for subsequently refining bucket frequencies.

Next, the algorithm computes the absolute estimation error, denoted by esterr (line 3 in Figure 2). The error formula distinguishes between overestimation, indicated by a negative error and requiring the bucket frequencies to be lowered, and underestimation, indicated by a positive error and requiring the bucket frequencies to be raised. As mentioned earlier, the blame for this error is assigned to histogram buckets in proportion to the frequencies that they contribute to the result size. We assume that each bucket contains all possible values in the range that it represents, and we approximate all frequencies in a bucket by their average (i.e., we make the continuous values and uniform frequencies assumptions [PIHS96]). Under these assumptions, the contribution of a histogram bucket to the result size is equal to its frequency times the fraction of the bucket overlapping the selection range.
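The steps of Figure 2 can be sketched in Python as follows. This is an illustrative reading of the algorithm rather than a definitive implementation. It assumes integer-valued attributes with inclusive, non-overlapping bucket ranges such as [1,10] and [11,20], which is what the `+ 1` terms in the fraction suggest. The names `Bucket`, `overlap_fraction`, `update_freq` and `alpha` are hypothetical, and the early return when the estimate is zero is an added guard that Figure 2 does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    low: int      # inclusive lower end of the bucket range
    high: int     # inclusive upper end of the bucket range
    freq: float   # current frequency estimate


def overlap_fraction(b: Bucket, range_low: int, range_high: int) -> float:
    """Fraction of bucket b covered by [range_low, range_high] (line 6)."""
    overlap = min(range_high, b.high) - max(range_low, b.low) + 1
    return max(overlap, 0) / (b.high - b.low + 1)


def update_freq(hist: List[Bucket], range_low: int, range_high: int,
                act: float, alpha: float = 0.5) -> None:
    """Refine bucket frequencies in place using one selection's result size."""
    fracs = [overlap_fraction(b, range_low, range_high) for b in hist]
    est = sum(f * b.freq for f, b in zip(fracs, hist))   # line 2
    if est == 0:
        return  # assumption: no contribution to apportion blame to, so skip
    esterr = act - est                                   # line 3
    for f, b in zip(fracs, hist):
        if f > 0:
            # line 7: damped share of the error, proportional to the
            # bucket's contribution f * freq to the estimate; never negative
            b.freq = max(b.freq + alpha * esterr * f * b.freq / est, 0.0)


# Hypothetical example: the selection [6, 20] actually returns 300 tuples.
hist = [Bucket(1, 10, 100.0), Bucket(11, 20, 100.0), Bucket(21, 30, 100.0)]
update_freq(hist, range_low=6, range_high=20, act=300.0, alpha=0.5)
```

In this example the estimate is 150 tuples (half of the first bucket plus all of the second), so the error is +150. With a damping factor of 0.5 the first bucket rises to 125 and the second to 150, while the third, which does not overlap the range, is left unchanged.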
This fraction is the length of the interval where the bucket overlaps the selection range divided by the length of the interval represented by the bucket (line 6). To distribute the error among buckets in proportion to frequency, each bucket is assigned a portion of the absolute estimation error, esterr, equal to its contribution to the result size, frac * freq(b_i), divided by the total result size, est, damped by a damping factor, α (line 7). We experimentally demonstrate in Section 5 that the refinement process is robust across a wide range of values for α, and we recommend using values of α in the range 0.5 to 1.

3.3 Restructuring

Refining bucket frequencies is not enough to get an accurate histogram. The frequencies in a bucket are approximated by their average. If there is a large variation in frequency within a bucket, the average frequency is a poor approximation of the individual frequencies, no matter how accurate it is. Specifically, high frequency values will be contained in high frequency buckets, but they may be grouped with low frequency values in these buckets. Thus, in addition to refining the bucket frequencies, we must also restructure the buckets, i.e., move the bucket boundaries to get a better partitioning that avoids grouping high frequency and low frequency values in the same buckets. Ideally, we would like to make high frequency buckets as narrow as possible. In the limit, this approach separates out high frequency values in singleton buckets of their own, a common objective for histograms (e.g., see [PIHS96]). Therefore, we choose buckets that currently have high frequency and split them into several buckets. Splitting induces the separation of high frequency and low frequency values into different buckets, and the frequency refinement process later adjusts the frequencies of these new buckets. In order to ensure that the number of buckets assigned to the ST-histogram does not increase due to splitting, we need a mechanism to reclaim buckets as well. To that end, we use a step of merging that groups a run of consecutive buckets with similar frequencies into one bucket.
Thus, our approach is to restructure the histogram periodically by merging buckets and using the buckets thus freed to split high frequency buckets. Restructuring may be triggered using a variety of heuristics. In this paper, we study the simplest scheme where the restructuring process is invoked after every R selections that use the histogram. The parameter R is called the restructuring interval.

To merge buckets with similar frequencies, we first have to decide how to quantify "similar frequencies". We assume that two bucket frequencies are similar if the difference between them is within m percent of the number of tuples in the relation, T. m is a parameter that we call the merge threshold. In most of our experiments, m ≤ 1% was a suitable choice. We use a greedy strategy to form a run of adjacent buckets with similar frequencies and collapse them into a single bucket. We repeat this step until no further merging is possible that satisfies the merge threshold condition (Steps 2-9 in Figure 3).

We also need to decide which high frequency buckets to split. We choose to split the s percent of the buckets with the highest frequencies. s is a parameter that we call the split threshold. In our experiments, we used s = 10%. Our heuristic distributes the reclaimed buckets among the high frequency buckets in proportion to frequency. The higher the frequency of a bucket, the more extra buckets it gets.

Figure 3 presents the algorithm for restructuring a ST-histogram, h, of B buckets on a relation with T tuples.

algorithm RestructureHist
Inputs: h
Outputs: restructured h
begin
1  /* Find buckets with similar frequencies to merge. */
2  Initialize B runs of buckets such that each run contains one histogram bucket;
3  For every two consecutive runs of buckets, find the maximum difference in frequency between a bucket in the
4      first run and a bucket in the second run;
5  Find the minimum of all these maximum differences, mindiff;
6  if mindiff ≤ m * T then
7      Merge the two runs of buckets corresponding to mindiff into one run;
8      Look for other runs to merge. Goto line 3;
9  endif
10
11 /* Assign the extra buckets freed by merging to the high frequency buckets. */
12 k = s * B;
13 Find the set, {b_1, b_2, ..., b_k}, of buckets with the k highest frequencies that were not chosen to be
14     merged with other buckets in the merging step;
15 Assign the buckets freed by merging to the buckets of this set in proportion to their frequencies;
16
17 /* Construct the restructured histogram by merging and splitting. */
18 Merge each previously formed run of buckets into one bucket spanning the range represented by all the buckets
19     in the run and having a frequency equal to the sum of their frequencies;
20 Split the k buckets chosen for splitting, giving each one the number of extra buckets assigned to it earlier.
21     The new buckets are evenly spaced in the range spanned by the old bucket and the frequency of the old
22     bucket is equally distributed among them;
end RestructureHist

Figure 3: Algorithm for restructuring one-dimensional ST-histograms

The first step in histogram restructuring is greedily finding runs of consecutive buckets with similar frequencies to merge. The algorithm repeatedly finds the pair of adjacent runs of buckets such that the maximum difference in frequency between a bucket in the first run and a bucket in the second run is the minimum over all pairs of adjacent runs. The two runs are merged into one if this difference is within the threshold m * T, and we stop looking for runs to merge if it is not. This process results in a number of runs of several consecutive buckets. Each run is replaced with one bucket spanning its entire range, and with a frequency equal to the total frequency of all the buckets in the run. This frees a number of buckets to allocate to high frequency buckets during splitting.

Splitting starts by identifying the s percent of the buckets that have the highest frequencies and are not singleton buckets.
We avoid splitting buckets that have been chosen for merging since their selection indicates that they have similar frequencies to their neighbors. The extra buckets freed by merging are distributed among the buckets being split in proportion to their frequencies. A bucket being split, b_i, gets freq(b_i) / totalfreq of the extra buckets, where totalfreq is the total frequency of the buckets being split. To split a bucket, it is replaced with itself plus the extra buckets assigned to it. These new buckets evenly divide the range of the old bucket, and the frequency of the old bucket is evenly distributed among them.

Splitting and merging are used in [GMP97] to redistribute histogram buckets in the context of maintaining approximate equi-depth and compressed histograms. The algorithm in [GMP97] merges pairs of buckets whose total frequency is less than a threshold, whereas our algorithm merges runs of buckets based on the differences in their frequency. Our algorithm assigns the freed buckets to the buckets being split in proportion to the frequencies of the latter, whereas the algorithm in [GMP97] merges only one pair of buckets at a time and can, thus, split only one bucket into two. A key difference between the two approaches is that in [GMP97], a sample of the tuples of the relation is continuously maintained (the "backing sample"), and buckets are split at their approximate medians computed from this sample. On the other hand, our approach does not examine the data at any point, so we do not have information similar to that represented in the backing sample of [GMP97]. Hence, our restructuring algorithm splits buckets at evenly spaced intervals, without using any information about the data distribution within a bucket.

Figure 4 gives an example of histogram restructuring. In this example, the merge threshold is such that algorithm RestructureHist merges buckets if the difference between their frequencies is within 3. The algorithm identifies two runs of buckets to be merged, buckets 1 and 2, and buckets 4 to 6. Merging these runs frees three buckets to assign to high frequency buckets.
The split threshold is such that we split the two buckets with the highest frequencies, buckets 8 and 10. Assigning the extra buckets to these two buckets in proportion to frequency means that bucket 8 gets two extra buckets and bucket 10 gets one extra bucket.

Splitting may unnecessarily separate values with similar, low frequencies into different buckets. Such runs of buckets with similar low frequencies would be merged during subsequent restructuring. Notice that splitting distorts the frequency of a bucket by distributing it among the new buckets. This means that the histogram may lose some of its accuracy by restructuring. This accuracy is restored when the bucket frequencies are refined through subsequent feedback.

In summary, our model is as follows: The frequency refinement process is applied to the histogram, and the refined frequency information is periodically used to restructure the histogram. Restructuring may reduce accuracy by distributing frequencies among buckets during splitting but frequency refinement restores, and hopefully increases, histogram accuracy.
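The merge and split steps of Section 3.3 can be sketched in Python as follows. This is one possible reading of Figure 3 under stated assumptions, not the authors' code. The function name `restructure` and its parameter names are hypothetical. The text does not say how proportional shares of freed buckets are rounded, so the sketch uses largest-remainder rounding. It also does not treat single-value buckets specially. The bucket ranges and frequencies in the example are made-up values, chosen so that the outcome matches the narrative of the Figure 4 example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    low: float
    high: float
    freq: float


def restructure(hist: List[Bucket], num_tuples: float,
                m: float, s: float) -> List[Bucket]:
    """Merge runs of similar buckets, then split the highest-frequency ones."""
    # Merge phase: start with one run per bucket (runs hold bucket indices).
    runs = [[i] for i in range(len(hist))]
    while len(runs) > 1:
        def max_diff(j: int) -> float:
            return max(abs(hist[a].freq - hist[b].freq)
                       for a in runs[j] for b in runs[j + 1])
        j = min(range(len(runs) - 1), key=max_diff)
        if max_diff(j) > m * num_tuples:
            break  # no adjacent pair of runs is within the merge threshold
        runs[j:j + 2] = [runs[j] + runs[j + 1]]
    freed = len(hist) - len(runs)

    # Split phase: pick the k highest-frequency buckets that were not merged.
    k = int(s * len(hist))
    singles = [r[0] for r in runs if len(r) == 1]
    chosen = sorted(singles, key=lambda i: hist[i].freq, reverse=True)[:k]
    total = sum(hist[i].freq for i in chosen)
    extra = {i: 0 for i in chosen}
    if total > 0:
        # Assumption: largest-remainder rounding of the proportional shares.
        shares = {i: freed * hist[i].freq / total for i in chosen}
        extra = {i: int(shares[i]) for i in chosen}
        by_remainder = sorted(chosen, key=lambda i: shares[i] - extra[i],
                              reverse=True)
        for i in by_remainder[:freed - sum(extra.values())]:
            extra[i] += 1

    # Build the new histogram: one bucket per merged run, several per split.
    out: List[Bucket] = []
    for r in runs:
        lo, hi = hist[r[0]].low, hist[r[-1]].high
        f = sum(hist[i].freq for i in r)
        parts = 1 + extra.get(r[0], 0) if len(r) == 1 else 1
        width = (hi - lo) / parts
        out.extend(Bucket(lo + p * width, lo + (p + 1) * width, f / parts)
                   for p in range(parts))
    return out


# Hypothetical data: ten buckets of width 10 and a merge threshold of
# roughly 3 tuples (m * T = 0.015 * 213), with s * B = 2 buckets to split.
freqs = [10, 13, 17, 14, 13, 11, 25, 70, 10, 30]
hist = [Bucket(10 * i, 10 * (i + 1), f) for i, f in enumerate(freqs)]
new_hist = restructure(hist, num_tuples=sum(freqs), m=0.015, s=0.2)
```

With these values the sketch merges buckets 1 and 2 and buckets 4 to 6, which frees three buckets. The two highest-frequency buckets, 8 and 10, receive two and one extra buckets respectively, so the histogram still has ten buckets afterwards.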

[Figure 4: Example of histogram restructuring, with merge threshold m * T = 3 and s * B = 2 buckets to split. Two merges free 1 and 2 extra buckets, 3 in total, which are used to split two high frequency buckets.]

4. Multi-dimensional ST-histograms

In this section, we present multi-dimensional (i.e., multi-attribute) ST-histograms. Our goal is to build histograms representing the joint distribution of multiple attributes of a single relation. These histograms will be used to estimate the result size of conjunctive range selections on these attributes, and are refined based on feedback from these selections. Using accurate one-dimensional histograms for all the attributes is not enough, because they do not reflect the correlation between attributes. In this section, we discuss the special considerations for multi-dimensional histograms.

Working in multiple dimensions raises the issue of how to partition the multi-dimensional space into histogram buckets. The effectiveness of ST-histograms stems from their ability to pinpoint the buckets contributing to the estimation error and learn the data distribution. The partitioning we choose must efficiently support this learning process. It must also be a partitioning that is easy to construct and maintain, because we want the cost of ST-histograms to remain as low as possible. To achieve these objectives, we use a grid partitioning of the multi-dimensional space. Each dimension of the space is partitioned into a number of partitions. The partitions of a dimension may vary in size, but the partitioning of the space is always fully described by the partitioning of the dimensions.

We choose a grid partitioning due to its simplicity and low cost, even though it does not offer as much flexibility in grouping values into buckets as other partitionings such as, for example, the MHIST-p histogram partitioning [PI97]. The simplicity of a grid partitioning allows our histograms to have more buckets for a given amount of memory.
It is easier for ST-histograms to infer the data distribution from feedback information when working with a simple high-resolution representation of the distribution than it is when working with a complex low-resolution representation. Furthermore, we doubt that the simple feedback information used for refinement can be used to glean enough information about the data distribution to justify a more complex partitioning.

Each dimension, i, of an n-dimensional ST-histogram is partitioned into B_i partitions. B_i does not necessarily equal B_j for i ≠ j. The partitioning of the space is described by n arrays, one per dimension, which we call the scales [NHS84]. Each array element of the scales represents the range of one partition, [low, high]. In addition to the scales, a multi-dimensional ST-histogram has an n-dimensional matrix representing the grid cell frequencies, which we call the frequency matrix. Figure 5 presents an example of a 5 × 5 two-dimensional ST-histogram and a range selection that uses it.

[Figure 5: A 2d ST-histogram and a range selection using it. The scales partition attribute 1 into [1,5], [6,10], [11,15], [16,20], [21,25] and attribute 2 into [1,10], [11,20], [21,30], [31,40], [41,50]; the frequency matrix holds one frequency per grid cell.]

4.1 Initial Histogram

To build a ST-histogram on attributes a_1, a_2, ..., a_n, we can assume complete uniformity and independence, or we can use existing one-dimensional histograms but assume independence of the attributes as the starting point.

If we start with the uniformity and independence assumption, we need to know the minimum and maximum values of each attribute a_i, min_i and max_i. We also need to specify the number of partitions for each dimension, B_1, B_2, ..., B_n. Then, each dimension, i, is partitioned into B_i equally spaced partitions, and the T tuples of the relation are evenly distributed among all the buckets of the frequency matrix. This technique is an extension of one-dimensional ST-histograms.

Another way of building multi-dimensional ST-histograms is to start with traditional one-dimensional histograms on all the multi-dimensional histogram attributes.
Such one-dimensional histograms, if they are available, provide a better starting point than assuming uniformity and independence. In this case, we initialize the scales by partitioning the space along the bucket boundaries of the one-dimensional histograms, and we initialize the frequency matrix using the bucket frequencies of the one-dimensional histograms and assuming that the attributes are independent. Under the independence assumption, the initial frequency of a cell of the frequency matrix is given by

$$freq[j_1, j_2, \ldots, j_n] = \frac{1}{T^{n-1}} \prod_{i=1}^{n} freq_i[j_i],$$

where $freq_i[j_i]$ is the frequency of bucket $j_i$ of the histogram for dimension $i$.
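The independence-based initialization above can be sketched in Python. This is a minimal illustration, assuming the frequency matrix is stored as a dictionary keyed by tuples of partition indices. That layout, the function name `init_from_1d`, and the example frequencies are hypothetical choices made for the sketch.

```python
from itertools import product
from math import prod
from typing import Dict, List, Tuple


def init_from_1d(freqs_1d: List[List[float]],
                 num_tuples: float) -> Dict[Tuple[int, ...], float]:
    """Frequency matrix built from 1-D bucket frequencies under independence."""
    n = len(freqs_1d)
    scale = num_tuples ** (n - 1)   # the T^(n-1) term of the formula
    matrix = {}
    # Enumerate every grid cell (j_1, ..., j_n), using 0-based indices.
    for cell in product(*(range(len(f)) for f in freqs_1d)):
        matrix[cell] = prod(freqs_1d[i][j] for i, j in enumerate(cell)) / scale
    return matrix


# Hypothetical example: two attributes of a relation with T = 100 tuples.
freq_a1 = [50.0, 30.0, 20.0]   # bucket frequencies of the histogram on a_1
freq_a2 = [60.0, 40.0]         # bucket frequencies of the histogram on a_2
matrix = init_from_1d([freq_a1, freq_a2], num_tuples=100.0)
```

In this example the cell for the first partition of both attributes gets 50 * 60 / 100 = 30 tuples, and summing the matrix along either dimension gives back the one-dimensional frequencies it was built from.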

4.2 Refining Bucket Frequencies

The algorithm for refining bucket frequencies in the multi-dimensional case is identical to the one-dimensional algorithm presented in Figure 2, except for two differences. First, finding the histogram buckets that overlap a selection range (line 1 in Figure 2) now requires examining a multi-dimensional structure. Second, a bucket is now a multi-dimensional cell in the frequency matrix, so the fraction of a bucket overlapping the selection range (line 6) is equal to the volume of the region where the bucket overlaps the selection range divided by the volume of the region represented by the whole bucket (Figure 5).

4.3 Restructuring

Periodic restructuring is needed only for multi-dimensional ST-histograms initialized assuming uniformity and independence. ST-histograms initialized using traditional one-dimensional histograms do not need to be periodically restructured, assuming that the one-dimensional histograms are accurate. This is based on the assumption that the partitioning of an accurate traditional one-dimensional histogram built by looking at the data is more accurate when used for multi-dimensional ST-histograms than a partitioning built by splitting and merging.

As in the one-dimensional case, restructuring in the multi-dimensional case is based on merging buckets with similar frequencies and splitting high frequency buckets. The required parameters are also the same, namely the restructuring interval, R, the merge threshold, m, and the split threshold, s. Restructuring changes the partitioning of the multi-dimensional space one dimension at a time. The dimensions are processed in any order, and the partition boundaries of each dimension are modified independently of the other dimensions. The algorithm for restructuring one dimension of the multi-dimensional ST-histogram is similar to the algorithm in Figure 3. However, merging and splitting in multiple dimensions present some additional problems.

For an n-dimensional ST-histogram, every partition of the scales in any dimension identifies an (n-1)-dimensional slice of the grid (e.g., a row or a column in a two-dimensional histogram).
Thus, merging two partitions of the scales requires merging two slices of the frequency matrix, each containing several buckets. Every bucket from the first slice is merged with the corresponding bucket from the second slice. To decide whether or not to merge two slices, we find the maximum difference in frequency between any two corresponding buckets that would be merged if these two slices are merged. We merge the two slices only if this difference is within $m \cdot T$ tuples. We use this method to identify runs of partitions to merge.

The high frequency partitions of any dimension are split by assigning them the extra partitions freed by merging in the same dimension. Thus, restructuring does not change the number of partitions in a dimension. To decide which partitions to split in any dimension and how many extra partitions each one gets, we use the marginal frequency distribution along this dimension. The marginal frequency of a partition is the total frequency of all buckets in the slice of the frequency matrix that it identifies. Thus, the marginal frequency of partition $j_i$ in dimension i is given by

$$f_i(j_i) = \sum_{j_1=1}^{B_1} \cdots \sum_{j_{i-1}=1}^{B_{i-1}} \; \sum_{j_{i+1}=1}^{B_{i+1}} \cdots \sum_{j_n=1}^{B_n} \mathit{freq}[j_1, j_2, \ldots, j_n].$$

As in the one-dimensional case, we split the s percent of the partitions in any dimension with the highest marginal frequencies, and we assign them the extra partitions in proportion to their current marginal frequencies.

[Figure 6: Restructuring the vertical dimension. Attribute 2 scales before restructuring: 1 [1,10], 2 [11,20], 3 [21,30], 4 [31,40], 5 [41,50]. Attribute 2 scales after restructuring: 1 [1,10], 2 [11,20], 3 [21,25], 4 [26,30], 5 [31,50]. Attribute 1 scales: [1,5], [6,10], [11,15], [16,20], [21,25].]

Figure 6 demonstrates restructuring the histogram in Figure 5 along the vertical dimension (attribute 2). In this example, the merge threshold is such that we merge two partitions if the maximum difference in frequency between buckets in their slices that would be merged is within 5. This condition leads us to merge partitions 4 and 5. The split threshold is such that we split one partition along the vertical dimension. We compute the marginal frequency distribution along the vertical dimension and identify the partition with the maximum marginal frequency, partition 3.
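The two quantities that drive restructuring, the maximum bucket-wise difference between two slices and the marginal frequency of each partition, can be sketched as follows. This is a minimal illustration under stated assumptions: the frequency matrix is stored as a dictionary keyed by cell-index tuples, and the helper names (`marginal`, `max_slice_diff`, `can_merge`) and the sample frequencies are hypothetical. Identifying runs of partitions to merge and allocating the freed partitions proportionally are not shown.

```python
def marginal(freq, dim, n_parts):
    """Marginal frequency of every partition along dimension `dim`:
    the total frequency of all buckets in the slice it identifies."""
    totals = [0.0] * n_parts
    for cell, f in freq.items():
        totals[cell[dim]] += f
    return totals


def max_slice_diff(freq, dim, a, b):
    """Largest absolute frequency difference between corresponding
    buckets of slices `a` and `b` along dimension `dim`."""
    worst = 0.0
    for cell, f in freq.items():
        if cell[dim] == a:
            # Same cell, but with index b in the restructured dimension.
            other = cell[:dim] + (b,) + cell[dim + 1:]
            worst = max(worst, abs(f - freq[other]))
    return worst


def can_merge(freq, dim, a, b, m, total_tuples):
    """Merge two slices only if the difference is within m * T tuples."""
    return max_slice_diff(freq, dim, a, b) <= m * total_tuples


if __name__ == "__main__":
    # Hypothetical 3 x 2 frequency matrix with T = 120 tuples.
    freq = {(0, 0): 10, (0, 1): 20,
            (1, 0): 12, (1, 1): 19,
            (2, 0): 50, (2, 1): 9}
    T = 120
    m = 5 / T  # chosen so that the merge threshold m * T is about 5 tuples
    print(can_merge(freq, 0, 0, 1, m, T))  # slices differ by at most 2
    print(can_merge(freq, 0, 1, 2, m, T))  # slices differ by up to 38
    totals = marginal(freq, 0, 3)
    print(totals.index(max(totals)))       # partition to consider splitting
```

Taking the maximum over corresponding buckets, rather than comparing the marginal totals of the two slices, keeps two slices apart when they have similar totals but different internal distributions.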
Merging and splitting (with some provisions for rounding) result in the histogram shown in Figure 6.

5. Experimental Evaluation

In this section, we present an experimental evaluation of our techniques using synthetic data sets and workloads. We investigate the accuracy and efficiency of one and multi-dimensional ST-histograms. In particular, we are interested in the accuracy of ST-histograms for data distributions with varying degrees of skew, and for workloads with different access patterns. We examine whether histogram refinement converges to an accurate state, or whether it oscillates in response to refinement. Another important consideration is how well ST-histograms adapt to database updates, and how efficiently they use the available memory. Due to space limitations, we present only a subset of the experiments conducted.

5.1 Setup for Experiments

Data Sets

We present the results of experiments using one to three-dimensional integer data sets. The results for higher dimensional data sets are similar. The one-dimensional data sets have 100K tuples and the multi-dimensional data sets have 500K tuples. Each dimension in a data set has V distinct values drawn randomly from a domain ranging from 1 to V = 200, 100, and 10, for 1, 2, and 3 dimensions, respectively. For multi-dimensional data sets, the number of distinct values and the domains of all dimensions are identical, and the value sets of all dimensions are generated independently. Frequencies are generated according to the Zipfian distribution [Zip49] with parameter z = 0, 0.5, 1, 2, and 3. z controls the skew of the distribution, with z=0 representing a uniform distribution (no skew). For one-dimensional data sets, the frequencies are assigned at random to the values. For multi-dimensional data sets, the frequencies are assigned at random to combinations of values using the technique proposed in [PI97], namely assigning the generated frequencies to randomly chosen cells in the joint frequency distribution matrix.

Query Workloads

We use workloads consisting of random range selection queries in one or more dimensions. Each workload consists of 2000 independent selection queries. Most experiments use random workloads, in which the corner points of each selection range are independently generated from a uniform distribution over the entire domain. Some experiments use workloads with locality of reference. The attribute values used for selection range corner points in these workloads are generated from piecewise uniform distributions in which there is an 80% probability of choosing a value from a locality range that is 20% of the domain. The locality ranges for the different dimensions are independently chosen at random according to a uniform distribution.

Histograms

Unless otherwise stated, we use 100, 50, and 15 buckets per dimension for 1, 2, and 3 dimensional ST-histograms, respectively. For multi-dimensional ST-histograms, we use the same number of buckets in all dimensions, resulting in two and three-dimensional histograms with a total of 2500 and 3375 buckets. The one, two, and three-dimensional ST-histograms occupy 1.2, 10.5, and 13.5 kilobytes of memory, respectively. Our traditional histograms of choice are MaxDiff(V,A) histograms for one dimension, and MHIST-2 MaxDiff(V,A) histograms for multiple dimensions. These histograms were recommended in [PIHS96] and [PI97] for their accuracy and ease of construction. We compare the accuracy of ST-histograms to traditional histograms of these types occupying the same amount of memory. We consider a wider range of memory allocation than most previous works (e.g., [PIHS96], [PI97], and [MVW98]) because of current trends in memory technology. We also demonstrate that our techniques are effective across a wide range of available memory (Section 5.7).
Note that the cost of building and maintaining traditional histograms is a function of the size of the relation (or the size of the sample used to build the histogram). In contrast, the cost of ST-histograms is independent of the data size and depends on the size of the query workload used for refinement.

Refinement Parameters

Unless otherwise stated, the parameters we use for restructuring the histogram (Section 3.3) are a restructuring interval, R=200 queries, a merge threshold, m=0.025%, and a split threshold, s=10%. For frequency refinement (Section 3.2), we use a damping factor, α=0.5 for one dimension, and α=1 for multiple dimensions.

Measuring Histogram Accuracy

We use the relative estimation error (abs(actual result size - estimated result size) / actual result size) to measure the accuracy of query result size estimation. To measure accuracy over an entire workload, we use the average relative estimation error for all queries in the workload, ignoring queries whose actual result size is zero.

One important question is with respect to which workload we should measure the accuracy of a ST-histogram. Recall that the premise of ST-histograms is that they are able to adapt to feedback from query execution. Therefore, for our evaluation we generate workloads that are statistically similar, but not the same as the training workload.

Unless otherwise stated, our experiments use off-line histogram refinement. Our steps for verifying the effectiveness of ST-histograms for some particular data set are:

1. Initialize a ST-histogram for the data set.
2. Issue the query workload that will be used to refine the histogram and generate a workload log. We call this the refinement workload.
3. Refine the histogram off-line based on the generated workload log.
4. After refinement, issue the refinement workload again and compute the estimation error. Verify that the error after refinement is less than the error before refinement.
5. Issue a different workload in which the queries have the same distribution as the workload used for refinement. We call this the test workload.
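The accuracy measure used in these steps can be written down directly. The sketch below is only an illustration of the stated definition; the function name `avg_relative_error` and the choice to raise an error when every query has an empty result are assumptions of this sketch.

```python
def avg_relative_error(actual_sizes, estimated_sizes):
    """Average of |actual - estimated| / actual over a workload,
    ignoring queries whose actual result size is zero."""
    errors = [abs(a - e) / a
              for a, e in zip(actual_sizes, estimated_sizes)
              if a != 0]
    if not errors:
        raise ValueError("no query in the workload has a non-zero result size")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Three queries; the second has an empty result and is skipped.
    actual = [100, 0, 50]
    estimated = [90, 5, 75]
    print(avg_relative_error(actual, estimated))  # mean of 0.1 and 0.5
```

Skipping zero-result queries avoids division by zero, at the cost of not penalizing overestimates for queries that return nothing.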
We cannot expect the workload issued before refinement to be repeated exactly after refinement, but we can reasonably expect a workload with similar statistical characteristics. The ultimate test of accuracy is whether the ST-histogram performs well on the test workload.

5.2 Accuracy of One-dimensional ST-histograms

In this section, we experimentally study the effectiveness of one-dimensional ST-histograms for a wide range of data skew (z) using random workloads and the procedure outlined in Section 5.1. We demonstrate that ST-histograms are always better than assuming uniformity, and that they are competitive with MaxDiff(V,A) histograms in terms of accuracy except for highly skewed data sets.

[Figure 7: One-dimensional data, random workload. Y-axis: relative error; x-axis: z (skew). Series: Assuming Uniformity, Before Refinement, After Refinement, After Refinement - Test Workload, MaxDiff(V,A), MaxDiff(V,A) - Test Workload.]

Figure 7 presents the estimation errors for a random refinement workload on one-dimensional data sets with varying z. For each data set, the figure presents the estimation error for the random refinement workload assuming a uniform distribution and using the initial ST-histogram constructed assuming uniformity. The estimation errors in these two cases are different due to rounding errors during histogram initialization. The figure also presents the average relative estimation error for the random refinement workload using the refined ST-histogram when this workload is issued again after it is used for refinement. It also presents the error for a statistically similar test workload using the refined ST-histogram. Finally, the figure presents the estimation errors for the refinement and test workloads using a traditional MaxDiff(V,A) histogram occupying the same amount of memory as the ST-histogram.

Histogram refinement results in a significant reduction in estimation error for all values of z. This reduced error is observed for both the refinement workload and the test workload, indicating a true improvement in histogram quality. Thus, ST-histograms are always better than assuming uniformity. The MaxDiff(V,A) histograms are more accurate than the ST-histograms. This is expected because MaxDiff(V,A) histograms are built based on the true distribution determined by examining the data. However, for low values of z, the estimation errors using refined ST-histograms are very close to the errors using MaxDiff(V,A) histograms, and are small enough for query optimization purposes. MaxDiff(V,A) histograms are considerably more accurate than ST-histograms only for highly skewed data sets (z ≥ 2). This is expected because as z increases, the data distribution becomes more difficult to capture using simple feedback information. At the same time, the benefit of MaxDiff(V,A) histograms is maximum for highly skewed distributions [PIHS96].

5.3 Accuracy of Multi-Dimensional ST-histograms

In this section, we show that multi-dimensional ST-histograms initialized using traditional one-dimensional histograms are much more accurate than assuming independence. We also compare the performance of such ST-histograms and MHIST-2 histograms. In particular, we demonstrate that these ST-histograms are more accurate than MHIST-2 histograms for low to moderate values of z (i.e., low correlation). This is an important result because it indicates that ST-histograms are better than MHIST-2 histograms in both cost and accuracy for data distributions with low to medium correlation.

For this paper, we only present the results of our experiments with ST-histograms initialized using traditional histograms. Experiments with the less accurate ST-histograms initialized assuming uniformity and independence have similar results.

Figures 8 and 9 present the results of using multi-dimensional ST-histograms initialized using MaxDiff(V,A) histograms and assuming independence for random workloads on two and three-dimensional data sets with varying z.
The information presented is the same as in Figure 7, except that we do not show the estimation error assuming uniformity because one would never assume uniformity when one-dimensional histograms are available, and we compare the performance of the ST-histograms against multi-dimensional MHIST-2 histograms instead of one-dimensional MaxDiff(V,A) histograms.

[Figure 8: Two-dimensions, starting with MaxDiff(V,A). Y-axis: relative error; x-axis: z (of joint distribution). Series: Before Refinement, After Refinement, After Refinement - Test Workload, MHIST-2, MHIST-2 - Test Workload.]

[Figure 9: Three-dimensions, starting with MaxDiff(V,A). Y-axis: relative error; x-axis: z (of joint distribution). Series: Before Refinement, After Refinement, After Refinement - Test Workload, MHIST-2, MHIST-2 - Test Workload.]

Since the ST-histograms are initialized using MaxDiff(V,A) histograms, using them before refinement is the same as using the one-dimensional histograms and assuming independence. The refined ST-histograms are more accurate than assuming independence, and the benefit of using them (i.e., the reduction in error) increases as z increases.

ST-histograms are not as accurate as MHIST-2 histograms for high z, especially in three dimensions. This indicates that inferring joint data distributions based on simple feedback information becomes increasingly difficult with increasing dimensionality. As expected, MHIST-2 histograms are very accurate for high z [PI97], but we must bear in mind that the cost of building multi-dimensional MHIST-2 histograms is much more than the cost of building one-dimensional MaxDiff(V,A) histograms. Furthermore, this cost increases with increasing dimensionality.

Notice, though, that ST-histograms are more accurate than MHIST-2 histograms for low z. This is because MHIST-2 histograms use a complex partitioning of the space (as compared to ST-histograms). Representing this complex partitioning requires MHIST-2 histograms to have complex buckets that consume more memory than ST-histogram buckets. Consequently, ST-histograms have more buckets than MHIST-2 histograms occupying the same amount of memory.
For low z, the complex partitioning of MHIST-2 histograms does not increase accuracy because the joint distribution is close to uniform, so any partitioning is fine. On the other hand, the large number of buckets in ST-histograms allows them to represent the distribution at a finer granularity, leading to higher accuracy. This result demonstrates the value of multi-dimensional ST-histograms for database systems. For data with low to moderate skew, ST-histograms provide an effective way of capturing dependencies between attributes at a low cost.

Table 1: Starting with different types of 1d histograms

       Equi-width           Equi-depth           MaxDiff(V,A)
z      Before    After      Before    After      Before    After
…      …%        5.47%      6.41%     6.65%      4.93%     4.95%
…      …%        5.84%      8.67%     8.21%      6.64%     6.35%
…      …%        11.64%     39.94%    12.61%     36.37%    11.08%
…      …%        …%         …%        78.36%     …%        22.57%
…      …%        …%         …%        48.26%     …%        26.07%

Table 1 presents the estimation errors for random workloads on two-dimensional data sets with varying z using ST-histograms built starting with traditional one-dimensional histograms. The errors are shown before refinement and after off-line refinement using the same random workloads. All one-dimensional histograms have 50 buckets. In addition to MaxDiff(V,A) histograms, the table presents the errors when we start with equi-width histograms, which are the simplest type of histograms, and when we start with equi-depth histograms, which are currently used by many commercial database systems. The table shows that ST-histograms are equally effective for all three types of one-dimensional histograms.

5.4 Effect of Locality of Reference in the Query Workload

An interesting issue is studying the performance of ST-histograms on workloads with locality of reference in accessing the data. Locality of reference is a fundamental concept underlying all database accesses, so one would expect real life workloads to have such locality. Moreover, purely random workloads provide feedback information about the entire distribution, while workloads with locality of reference provide most of their feedback about a small part of the distribution. We would like to know how effective this type of feedback is for histogram refinement. In this section, we demonstrate that ST-histograms perform well for workloads with locality of reference. We also demonstrate that histogram refinement adapts to changes in the locality range of the workload.

[Figure 10: Workloads with locality of reference, z=1. Y-axis: relative error; x-axis: dimensions. Bars: W1 Unif and Indep, W1 Before Refinement, W1 After Refinement, W1 Traditional, W2 Unif and Indep, W2 Refined on W1, W2 After Refinement, W2 Traditional.]

Figure 10 presents the estimation errors for workloads with an 80%-20% locality for one and two-dimensional data sets with z=1. The first four bars for each data set present the errors for a workload, W1. The first two bars respectively show the errors assuming uniformity and independence, and using an initial ST-histogram representing the uniformity and independence assumption. The bars are not identical because of rounding errors. The third bar shows the error using the ST-histogram when issuing W1 again after it is used for refinement. The fourth bar shows the error for W1 using a traditional histogram. It is clear that refinement considerably improves estimation accuracy, making the ST-histogram almost as accurate as the traditional histogram. This improvement is also observed on test workloads that are statistically similar to W1.
Next, we keep the refined histogram and change the locality of reference of the workload. We issue a new workload, W2, with a different locality range. The next four bars in Figure 10 present the estimation errors for W2. First, we issue W2 and use the ST-histogram refined on W1 for result size estimation (sixth bar). This histogram is not as accurate for W2 as it was for W1, but it is better than assuming uniformity and independence. This means that refinement was still able to infer some information from the 20% of the queries of W1 that lie outside the locality range. When we refine the histogram on W2 and issue it again, we see that the ST-histogram becomes as accurate for W2 as it was for W1 after refinement. This improvement is also seen for workloads that are statistically similar to W2.

[Figure 11: Adapting to database updates, z=1. Y-axis: relative error; x-axis: dimensions (1, 2). Bars: R1 Unif and Indep, R1 Before Refinement, R1 After Refinement, R1 Traditional, R2 Unif and Indep, R2 Refined on R1, R2 After Refinement, R2 Traditional for R1, R2 Traditional.]

5.5 Adapting to Database Updates

The results of this section demonstrate that although ST-histograms do not examine data, the feedback mechanism enables these histograms to adapt to updates in the underlying relation.

Figure 11 presents the estimation errors for one and two-dimensional data sets with z=1 using random workloads. The first four bars present the estimation errors for the original relation before update, which we denote by R1. We update the relation by deleting a random 25% of its tuples and inserting an equal number of tuples following a Zipfian distribution with z=1. We denote this updated relation by R2. We retain the traditional and ST-histograms built for R1 and re-issue the same random workload on R2. The fifth and sixth bars in Figure 11 are the estimation error for this workload on R2 assuming uniformity and independence, and using the ST-histogram that was refined for R1, respectively. The histogram is not as accurate as it was for R1, which is expected, but it is still more accurate than assuming uniformity and independence.
The seventh bar shows the error using the ST-histogram for R2 after refinement using the same workload. Refinement restores the accuracy of the ST-histogram and adapts it to the updates in the relation. We also observe this improvement in error for statistically similar test workloads. The last two bars in Figure 11 present the estimation error for the random workload issued on R2 using the traditional histograms for R1 and R2, respectively. As expected, updating the relation reduces histogram accuracy, and rebuilding the histogram restores this accuracy.

5.6 Refinement Parameters

In this section, we investigate the effect of the refinement parameters: R, m, and s for restructuring, and α for updating bucket frequencies.

Table 2 presents the average relative estimation errors for random test workloads using ST-histograms that have been refined off-line using other random refinement workloads for one to three-dimensional data sets with varying z. For each data set, the error is presented if the histogram is not restructured during refinement, and if it is restructured with R=200, m=0.025%, and s=10%. Restructuring has no benefit for low z, but as z increases the need for restructuring becomes evident. Thus, restructuring extends the range of data skew for which ST-histograms are effective.

Figure 12 presents the estimation errors for random workloads and workloads with locality of reference on one to three-dimensional data sets with z=1 using ST-histograms that have been refined off-line using other statistically similar refinement workloads for α=0.01 to 1. The estimation errors are relatively


More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

arxiv: v3 [cs.ds] 7 Feb 2017

arxiv: v3 [cs.ds] 7 Feb 2017 : A Two-stage Sketch for Data Streams Tong Yang 1, Lngtong Lu 2, Ybo Yan 1, Muhammad Shahzad 3, Yulong Shen 2 Xaomng L 1, Bn Cu 1, Gaogang Xe 4 1 Pekng Unversty, Chna. 2 Xdan Unversty, Chna. 3 North Carolna

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Parameter estimation for incomplete bivariate longitudinal data in clinical trials

Parameter estimation for incomplete bivariate longitudinal data in clinical trials Parameter estmaton for ncomplete bvarate longtudnal data n clncal trals Naum M. Khutoryansky Novo Nordsk Pharmaceutcals, Inc., Prnceton, NJ ABSTRACT Bvarate models are useful when analyzng longtudnal data

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach Dstrbuted Resource Schedulng n Grd Computng Usng Fuzzy Approach Shahram Amn, Mohammad Ahmad Computer Engneerng Department Islamc Azad Unversty branch Mahallat, Iran Islamc Azad Unversty branch khomen,

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

High resolution 3D Tau-p transform by matching pursuit Weiping Cao* and Warren S. Ross, Shearwater GeoServices

High resolution 3D Tau-p transform by matching pursuit Weiping Cao* and Warren S. Ross, Shearwater GeoServices Hgh resoluton 3D Tau-p transform by matchng pursut Wepng Cao* and Warren S. Ross, Shearwater GeoServces Summary The 3D Tau-p transform s of vtal sgnfcance for processng sesmc data acqured wth modern wde

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007 Syntheszer 1.0 A Varyng Coeffcent Meta Meta-Analytc nalytc Tool Employng Mcrosoft Excel 007.38.17.5 User s Gude Z. Krzan 009 Table of Contents 1. Introducton and Acknowledgments 3. Operatonal Functons

More information

Report on On-line Graph Coloring

Report on On-line Graph Coloring 2003 Fall Semester Comp 670K Onlne Algorthm Report on LO Yuet Me (00086365) cndylo@ust.hk Abstract Onlne algorthm deals wth data that has no future nformaton. Lots of examples demonstrate that onlne algorthm

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Random Kernel Perceptron on ATTiny2313 Microcontroller

Random Kernel Perceptron on ATTiny2313 Microcontroller Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r

More information

Simplification of 3D Meshes

Simplification of 3D Meshes Smplfcaton of 3D Meshes Addy Ngan /4/00 Outlne Motvaton Taxonomy of smplfcaton methods Hoppe et al, Mesh optmzaton Hoppe, Progressve meshes Smplfcaton of 3D Meshes 1 Motvaton Hgh detaled meshes becomng

More information

Estimating Costs of Path Expression Evaluation in Distributed Object Databases

Estimating Costs of Path Expression Evaluation in Distributed Object Databases Estmatng Costs of Path Expresson Evaluaton n Dstrbuted Obect Databases Gabrela Ruberg, Fernanda Baão, and Marta Mattoso Department of Computer Scence COPPE/UFRJ P.O.Box 685, Ro de Janero, RJ, 2945-970

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

(1) The control processes are too complex to analyze by conventional quantitative techniques.

(1) The control processes are too complex to analyze by conventional quantitative techniques. Chapter 0 Fuzzy Control and Fuzzy Expert Systems The fuzzy logc controller (FLC) s ntroduced n ths chapter. After ntroducng the archtecture of the FLC, we study ts components step by step and suggest a

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Analysis of Collaborative Distributed Admission Control in x Networks

Analysis of Collaborative Distributed Admission Control in x Networks 1 Analyss of Collaboratve Dstrbuted Admsson Control n 82.11x Networks Thnh Nguyen, Member, IEEE, Ken Nguyen, Member, IEEE, Lnha He, Member, IEEE, Abstract Wth the recent surge of wreless home networks,

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

arxiv: v2 [cs.db] 18 Sep 2017

arxiv: v2 [cs.db] 18 Sep 2017 Effcent Approxmate Query Answerng over Sensor Data wth Determnstc Error Guarantees arxv:1707.01414v2 [cs.db] 18 Sep 2017 ABSTRACT Jaquelne Brto UC San Dego jabrto@cs.ucsd.edu Yanns Katss UC San Dego katss@cs.ucsd.edu

More information

Lecture #15 Lecture Notes

Lecture #15 Lecture Notes Lecture #15 Lecture Notes The ocean water column s very much a 3-D spatal entt and we need to represent that structure n an economcal way to deal wth t n calculatons. We wll dscuss one way to do so, emprcal

More information

Automatic selection of reference velocities for recursive depth migration

Automatic selection of reference velocities for recursive depth migration Automatc selecton of mgraton veloctes Automatc selecton of reference veloctes for recursve depth mgraton Hugh D. Geger and Gary F. Margrave ABSTRACT Wave equaton depth mgraton methods such as phase-shft

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments Comparson of Heurstcs for Schedulng Independent Tasks on Heterogeneous Dstrbuted Envronments Hesam Izakan¹, Ath Abraham², Senor Member, IEEE, Václav Snášel³ ¹ Islamc Azad Unversty, Ramsar Branch, Ramsar,

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information