arxiv: v2 [cs.db] 18 Sep 2017


Efficient Approximate Query Answering over Sensor Data with Deterministic Error Guarantees

Jaqueline Brito, UC San Diego
Yannis Katsis, UC San Diego

ABSTRACT
With the recent proliferation of sensor data, there is an increasing need for the efficient evaluation of analytical queries over multiple sensor datasets. The magnitude of such datasets makes exact query answering infeasible, leading researchers into the development of approximate query answering approaches. However, existing approximate query answering algorithms are not suited for the efficient processing of queries over sensor data, as they exhibit at least one of the following shortcomings: (a) they do not provide deterministic error guarantees, resorting to weaker probabilistic error guarantees that are in many cases not acceptable, (b) they allow queries only over a single dataset, thus not supporting the multitude of queries over multiple datasets that appear in practice, such as correlation or cross-correlation, and (c) they support relational data in general and thus miss speedup opportunities created by the special nature of sensor data, which are not random but typically follow a smooth underlying phenomenon. To address these problems, we propose PlatoDB, a system that exploits the nature of sensor data to compress them and provide efficient processing of queries over multiple sensor datasets, while providing deterministic error guarantees. PlatoDB achieves the above through a novel architecture that (a) at data import time pre-processes each dataset, creating for it an intermediate hierarchical data structure that provides a hierarchy of summarizations of the dataset together with appropriate error measures, and (b) at query processing time leverages the pre-computed data structures to compute an approximate answer and deterministic error guarantees for ad hoc queries, even when these combine multiple datasets.
As a result of its novel architecture, PlatoDB exhibits, in experiments performed on real sensor datasets, speedups of 1-3 orders of magnitude compared to systems that use the entire sensor datasets to compute exact query answers.

Korhan Demirkaya, UC San Diego, kdemirka@cs.ucsd.edu
Chunbin Lin, UC San Diego, chunbinlin@cs.ucsd.edu
Boursier Etienne, UC San Diego, eboursier@cs.ucsd.edu
Yannis Papakonstantinou, UC San Diego, yannis@cs.ucsd.edu
Supported by NSF BIGDATA

1. INTRODUCTION
The increasing affordability of sensors and storage has recently led to the proliferation of sensor data in a variety of domains, including transportation, environmental protection, healthcare, fitness, etc. These data are typically of high granularity and as a result have substantial storage requirements, ranging from a few GB to many TB. For instance, a Formula 1 car produces 20 GB of data during two 90-minute practice sessions¹, while a commercial aircraft may generate 2.5 TB of data per day².

The magnitude of sensor datasets creates a significant challenge when it comes to query evaluation. Running analytical queries over the data (such as finding correlations between signals), which typically involve aggregates, can be very expensive, as the queries have to access significant amounts of data. This problem becomes worse when queries combine multiple sensor datasets in ad hoc ways. For instance, consider a data analytics scenario where a user wants to combine (a) a location dataset providing the location of users for different points in time (as recorded by their smartphone's GPS) and (b) an air pollution dataset recording the air quality at different points in time and space (as recorded by air quality sensors) to compute the average quality of air inhaled by each user over a certain time period³. Answering this query requires accessing all location and air pollution measurements in the time period of interest, which can be substantial for long periods.
To solve this problem, researchers have proposed approximate query processing algorithms [17, 1, 37, 2, 26, 26, 31, 24] that approximate the query result by looking at a subset of the data. However, existing approaches have the following shortcomings when it comes to the query processing of multiple sensor datasets:

Lack of deterministic error guarantees. Most query approximation algorithms provide probabilistic error guarantees. While this is sufficient for some use cases, it does not cover scenarios where the user needs deterministic guarantees ensuring that the returned answer is within the specified error bounds.

Lack of support for queries over multiple datasets. Many techniques, such as wavelets, provide error guarantees only for queries over a single dataset. The errors can be arbitrarily large for queries ranging over multiple datasets, as these techniques are unaware of how multiple datasets interact with each other.

Data agnosticism. The majority of existing techniques work for relational data in general and do not leverage compression opportunities that come from the fact that sensor data are not random in nature but typically follow smooth continuous phenomena.

To overcome these limitations, we design the PlatoDB system, which leverages the nature of sensor data to compress them and provide efficient processing of analytical queries over multiple sensor datasets, while providing deterministic error guarantees. In a nutshell, PlatoDB operates as follows: When initiated, it preprocesses each time series dataset and builds for it a binary tree structure, which provides a hierarchy of summarizations of segments of the original time series. A node in the tree structure summarizes a segment of the time series through two components: (i) a compression function estimating the data points in the segment, and (ii) error measures indicating the distance between the compressed segment and the original one. Lower-level nodes refer to finer-grained segments and have smaller errors. During runtime, PlatoDB takes as input an aggregate query over potentially multiple sensor datasets together with an error or time budget, and utilizes the tree structure of each of the datasets involved in the query to obtain an approximate answer together with a deterministic error guarantee that satisfies the time/error budget.

² ...science-airbus-puts sensors-in-every-single
³ This is a real example encountered during the DELPHI project conducted at UC San Diego, which studied how health-related data about individuals, including large amounts of sensor data, can be leveraged to discover the determinants of health conditions [18].

Contributions. In this work, we make the following contributions:

We define a query language over sensor data, which is powerful enough to express most common statistics over both single and multiple time series, such as variance, correlation, and cross-correlation (Section 3).

We propose a novel tree structure (structurally similar to hierarchical histograms) and a corresponding tree generation algorithm that provides a hierarchical summarization of each time series independently of the other time series. The summarization is based on the combination of arbitrary compression functions that can be reused from the literature together with three novel error measures that can be used to provide deterministic error guarantees, regardless of the employed compression function (Section 4).

We design an efficient query processing algorithm operating on the pre-computed tree structures, which can provide deterministic error guarantees for queries ranging over multiple time series, even though each tree refers to one time series in isolation.
The algorithm is based on a combination of error estimation formulas that leverage the error measures of individual time series segments to compute an error for an entire query (Section 5), together with a tree navigation algorithm that efficiently traverses the time series tree to quickly compute an approximate answer that satisfies the error guarantees (Section 6).

We conduct experiments on two real-life datasets to evaluate our algorithms. The results show that our algorithm outperforms the baseline by 1-3 orders of magnitude (Section 7).

2. SYSTEM ARCHITECTURE
Figure 1 depicts PlatoDB's architecture. PlatoDB operates in two steps, performed at two different points in time. At data import time, PlatoDB pre-processes the incoming time series data, creating a segment tree structure for each time series. At query execution time, it leverages these segment trees to provide an approximate query answer together with deterministic error guarantees. We next describe these two steps in detail.

Off-line Pre-Processing. At data import time, PlatoDB takes as input a set of time series. The time series are created from the raw sensor data by the typical Extract-Transform-Load (ETL) scripts, potentially combined with de-noising algorithms, which are outside the focus of this paper. For each such time series, PlatoDB's Segment Tree Generator creates a hierarchy of summarizations of the data in the form of a segment tree, i.e., a tree whose nodes summarize the data for segments of the original time series. Intuitively, the structure of the segment tree corresponds to a way of splitting the time series recursively into smaller segments: The root $S_1$ of the tree corresponds to the entire time series, which can be split into two subsegments (generally of different length), represented by the root's children $S_{1.1}$ and $S_{1.2}$. The segment corresponding to $S_{1.1}$ can in turn be split further into two smaller segments, represented by the children $S_{1.1.1}$ and $S_{1.1.2}$ of $S_{1.1}$, and so on.
Since each node provides a brief summarization of the corresponding segment, lower levels of the tree provide a more precise representation of the time series than upper levels. As we will see later, this hierarchical structure of segments is crucial for the query processor's ability to adapt to a wide variety of error/time budgets provided by the user. When the user is willing to accept a large error, the query processor will mostly use the top levels of the trees, providing a quick response. On the other hand, if the user demands a lower error, the algorithm will be able to satisfy the request by visiting lower levels of the segment trees (which exact nodes will be visited also depends on the query and the interplay of the time series in it). Leveraging the trees, PlatoDB can even provide users with continuously improving approximate answers and error guarantees, allowing them to stop the computation at any time, similar to works in online aggregation [15, 7, 26].

Each node of the tree summarizes the corresponding segment through two data items: (a) a compression function, which represents the data points in a segment in a compact way (e.g., through a constant [21] or a line [19]), and (b) a set of error measures, which are metrics of the distance between the data point values estimated by the compression function and the actual values of the data points. As we will see, the query processor uses the compression function and error measures of the segment tree nodes to produce an approximate answer of the query and the error guarantees, respectively.

Interestingly, PlatoDB's internals are agnostic of the compression function used. As we will discuss in Section 4, PlatoDB's query processor works independently of the employed compression functions, allowing the system to be combined with all popular compression techniques. For instance, in our example above we utilized the Piecewise Aggregate Approximation (PAA) [21], which returns the average of a set of values.
However, we could have used other compression techniques, such as the Adaptive Piecewise Constant Approximation (APCA) [20], the Piecewise Linear Representation (PLR) [19], or others.

Remark. It is important to note that the segment tree is not necessarily a balanced tree. PlatoDB decides whether a segment needs to be split based on how close the values derived from the compression function are to the actual values of the segment, and splits the segment when the difference is large. Intuitively, this means that the segment tree contains more nodes for parts of the domain where the time series is irregular and/or rapidly changing, and fewer nodes for the smooth parts. PlatoDB treats the problem of finding the splitting positions as an optimization problem, splitting at the positions that bring the largest error reduction. We will present the segment tree generator algorithms in Section 4.

EXAMPLE 1. Figure 1(a) shows the segment tree for a time series $T$. The root node $S_1$ of the tree (corresponding to the segment covering the entire time series) summarizes this segment through two items: a set of parameters describing a compression function $f_1$ (in this case the function returns the average $v$ of the values of the time series and can therefore be described by the single value $v$) and a set of error measures $M_1$ (the details of error measures will be presented in Section 4). This entire segment is split into two

subsegments $S_{1.1}$ and $S_{1.2}$, giving rise to the identically-named tree nodes. Note that the tree is not balanced. Segment $S_{1.2}$ is not split further as its function $f_{1.2}$ correctly predicts the values within the corresponding segment. In contrast, the segment $S_{1.1}$ displays great variability in the time series values and is thus split further into segments $S_{1.1.1}$ and $S_{1.1.2}$.

Figure 1: PlatoDB's architecture, including details on the segment tree generation and query processing. (a) Generating the segment tree of a time series $T$. (b) Evaluating a query involving $T_1$ and $T_2$. [Diagram: sensor data pass through ETL and noise removal into the Segment Tree Generator (offline pre-processing), which produces the segment trees; the Query Processor (online query processing) takes a query $Q$ and an error/time budget ($\varepsilon_{max}$ / $t_{max}$) and returns an approximate answer and an error guarantee.]

On-line Query Processing. At query evaluation time, PlatoDB's Query Processor receives a query and a time or error budget and leverages the pre-processed segment trees to produce an approximate query answer and a corresponding error guarantee satisfying the provided budget. To compute the answer and error guarantee, PlatoDB traverses the segment trees of all time series involved in the query in parallel, in a top-down fashion. At any step of this process, it uses the compression function and error measures in the currently accessed nodes to calculate an approximate query answer and the corresponding error. If it has not yet reached the time/error budget (i.e., if there is still time left or if the current error is still greater than the error budget), PlatoDB greedily chooses among all the currently accessed nodes the one whose children nodes would yield the greatest error reduction and uses them to replace their parent in the answer and error estimation. Otherwise, PlatoDB stops accessing further nodes of the segment trees and outputs the currently computed approximate answer and error.
Query processing is described in detail in Sections 5 and 6.

Remark. It is important to note that, in contrast to existing approximate query answering systems, PlatoDB can answer queries that span across different time series, even though the segment trees were pre-processed for each time series individually. As we will see, the fact that the segment trees were generated for each time series individually leads to interesting problems at query processing time, such as aligning the segments of different time series and reasoning about how these segments interact to produce the query answer and error guarantees. Finally, it is also important to note that PlatoDB adapts to the provided error budget by accessing a different number of nodes. Larger error budgets lead to fewer node accesses, while smaller error budgets require more node accesses.

EXAMPLE 2. Consider a query $Q$ involving two time series $T_1$ and $T_2$ and an error budget $\varepsilon_{max} = 10$. Figure 1(b) shows how the query processing algorithm uses the pre-computed segment trees of the two time series. PlatoDB first accesses the root nodes of both segment trees in parallel and computes the current approximate query answer $\hat{R}$ and error $\hat{\varepsilon}$, using the compression function and error measures in the root nodes. Let us assume that $\hat{\varepsilon} = 20$. Since $\hat{\varepsilon} > \varepsilon_{max}$, PlatoDB keeps traversing the trees by greedily choosing a node and replacing it by its children, so that the error reduction at each step is maximized. This process continues until the error budget is satisfied. For instance, assume that using the yellow shaded nodes in Figure 1(b) PlatoDB obtains an error $\hat{\varepsilon} = 6 < \varepsilon_{max}$. Then PlatoDB stops traversing the trees and outputs the approximate answer and the error $\hat{\varepsilon} = 6$. Note that none of the descendants of the shaded nodes is touched, resulting in big performance savings.
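The greedy refinement described above can be sketched for the simplest possible case: a single time series, a Sum query over the whole series, and PAA compression. This is a minimal illustration under stated assumptions, not PlatoDB's query processor: the names `Node`, `build` and `approximate_sum` are hypothetical, the toy tree is built with midpoint splits rather than with the generator of Section 4, and only an error budget (no time budget) is handled.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    lo: int                        # first index of the segment (inclusive)
    hi: int                        # last index of the segment (inclusive)
    mean: float                    # PAA compression: one constant per segment
    L: float                       # sum of |d_i - mean| over the segment
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(data: List[float], lo: int, hi: int) -> Node:
    """Toy builder using midpoint splits (an assumption made for brevity)."""
    seg = data[lo:hi + 1]
    mean = sum(seg) / len(seg)
    node = Node(lo, hi, mean, sum(abs(d - mean) for d in seg))
    if hi > lo and node.L > 0:     # stop at single points or perfect fits
        mid = (lo + hi) // 2
        node.left = build(data, lo, mid)
        node.right = build(data, mid + 1, hi)
    return node


def approximate_sum(root: Node, max_error: float) -> Tuple[float, float, int]:
    """Return (answer, error bound, nodes accessed) for Sum over the series."""
    answer = root.mean * (root.hi - root.lo + 1)
    error = root.L
    accessed = 1
    heap = []                      # entries: (-error reduction, ticket, node)
    ticket = 0                     # unique tie-breaker, so nodes are never compared

    def offer(node: Node) -> None:
        nonlocal ticket
        if node.left is not None:
            gain = node.L - (node.left.L + node.right.L)
            heapq.heappush(heap, (-gain, ticket, node))
            ticket += 1

    offer(root)
    while error > max_error and heap:
        _, _, node = heapq.heappop(heap)   # node whose children help the most
        answer -= node.mean * (node.hi - node.lo + 1)
        error -= node.L
        for child in (node.left, node.right):
            answer += child.mean * (child.hi - child.lo + 1)
            error += child.L
            accessed += 1
            offer(child)
    return answer, error, accessed
```

With a generous budget the loop never runs and only the root is accessed; tightening the budget forces the traversal into the irregular part of the series while smooth segments stay summarized by a single node.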
As a result of this architecture, PlatoDB achieves speedups of 1-3 orders of magnitude in query processing of sensor data compared to approaches that use the entire dataset to compute exact query answers (more details are included in PlatoDB's experimental evaluation in Section 7).

3. DATA AND QUERIES
Before describing the PlatoDB system, we first present its data model and query language.

Query Expression ($Q$): $Q \rightarrow Ar$
Arithmetic Expression ($Ar$): $Ar \rightarrow number \mid Agg \mid Ar \otimes Ar$, where $\otimes \in \{+, -, \times, \div\}$
Aggregation Expression ($Agg$): $Agg \rightarrow Sum(T, l_s, l_e) = \sum_{i=l_s}^{l_e} d_i$
Time Series Expression ($T$): $T \rightarrow$ base time series
  $\mid SeriesGen(\upsilon, n) = (\upsilon, \upsilon, \dots, \upsilon)$ ($n$ entries)
  $\mid Plus(T, T) = (d^{(1)}_1 + d^{(2)}_1, \dots, d^{(1)}_n + d^{(2)}_n)$
  $\mid Minus(T, T) = (d^{(1)}_1 - d^{(2)}_1, \dots, d^{(1)}_n - d^{(2)}_n)$
  $\mid Times(T, T) = (d^{(1)}_1 \times d^{(2)}_1, \dots, d^{(1)}_n \times d^{(2)}_n)$
Figure 2: Grammar of query expressions.

Data Model. For the purpose of this work, a time series $T = [(t_1, d_1), (t_2, d_2), \dots, (t_n, d_n)]$ is a sequence of (time, data point) pairs $(t_i, d_i)$, such that the data point $d_i$ was observed at time $t_i$. We follow existing work [13] to normalize and standardize the time series so that all time series are in the same domain and have the same resolution. Since all time series are aligned, for ease of exposition we omit the exact time points and use instead the index of the data points whenever we need to define a time interval. For instance, we will denote the above time series simply as $T = (d_1, d_2, \dots, d_n)$, and use $[i, j]$ to refer to the time interval $[t_i, t_j]$. A subsequence of a time series is called a time series segment. For example, $S = (5.01, 5.06)$ is a segment of the time series $T = (5.05, 5.01, 5.06, 5.06, 5.08)$.

Query Language. PlatoDB supports queries whose main building blocks are aggregation queries over time series. Figure 2 shows the formal definition of the query language and Table 1 lists several common statistics that can be expressed in this language. A query expression $Q$ is an arithmetic expression of the form $Ar_1 \otimes Ar_2 \otimes \dots \otimes Ar_n$, where $\otimes$ are the standard arithmetic operators ($+$, $-$, $\times$, $\div$) and $Ar_i$ is either an arithmetic literal or an aggregation expression over a time series. An aggregation expression $Sum(T, l_s, l_e)$ over a time series $T$ computes the sum of all data points of $T$ in the time interval $[l_s, l_e]$. Note that the time series that is aggregated could either be a base time series or a derived time series that was computed from a set of base time series through a set of time series operators.
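As an informal reading of the grammar in Figure 2, the operators can be written as plain functions over tuples and evaluated exactly on the raw data. This is only a sketch of the semantics: the Python names are hypothetical, intervals are taken to be 1-indexed and inclusive as in the Data Model paragraph, and the variance is assumed to be the population (divide-by-$n$) form.

```python
from typing import Tuple

Series = Tuple[float, ...]


def series_gen(v: float, n: int) -> Series:
    """SeriesGen(v, n): a series of n data points, all equal to v."""
    return (v,) * n


def plus(a: Series, b: Series) -> Series:
    return tuple(x + y for x, y in zip(a, b))


def minus(a: Series, b: Series) -> Series:
    return tuple(x - y for x, y in zip(a, b))


def times(a: Series, b: Series) -> Series:
    return tuple(x * y for x, y in zip(a, b))


def sum_agg(t: Series, l_s: int, l_e: int) -> float:
    """Sum(T, l_s, l_e): sum of the data points in [l_s, l_e] (1-indexed, inclusive)."""
    return sum(t[l_s - 1:l_e])


def variance(t: Series) -> float:
    """Population variance written only with Sum and Times (assumed divide-by-n form)."""
    n = len(t)
    return sum_agg(times(t, t), 1, n) / n - sum_agg(t, 1, n) * sum_agg(t, 1, n) / (n * n)


def squared_deviations(t: Series, mu: float) -> float:
    """Sum(Times(Minus(T, SeriesGen(mu, n)), Minus(T, SeriesGen(mu, n))), 1, n)."""
    n = len(t)
    centered = minus(t, series_gen(mu, n))
    return sum_agg(times(centered, centered), 1, n)
```

For the series `(1.0, 2.0, 3.0, 4.0)`, `variance` and `squared_deviations(t, 2.5) / 4` agree, which shows two different query expressions for the same statistic.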
PlatoDB allows a series of time series operators, including $Plus(T_1, T_2)$, $Minus(T_1, T_2)$, and $Times(T_1, T_2)$ (which return a time series that has data points computed by adding, subtracting, and multiplying the respective data points of the original time series, respectively), as well as $SeriesGen(v, n)$, which takes as input a value $v$ and a counter $n$ and creates a new time series that contains $n$ data points with the value $v$.

Note that the query language can be used to express many common statistics over time series encountered in practice, including all the queries we encountered during the DELPHI project conducted at UC San Diego, which explored how health-related data about individuals, including large amounts of sensor data, can be leveraged to discover the determinants of health conditions and which served as the motivation for this work [18]. These include the mean and variance of a single time series, as well as the covariance, correlation, and cross-correlation between two time series. Table 1 shows how common statistics can be expressed in PlatoDB's query language.

4. SEGMENT TREE
As explained in Section 2, at data import time, PlatoDB creates for each time series a hierarchy of summarizations of the series in the form of the segment tree. In this section we first explain the structure of the tree and then describe the segment tree generation algorithm.

4.1 Segment Tree Structure
Let $T = (d_1, \dots, d_n)$ be a time series. The segment tree of $T$ is a binary tree whose nodes summarize segments of the time series, with nodes higher up the tree summarizing large segments and nodes lower down the tree summarizing progressively smaller segments. In particular, the root node summarizes the entire time series $T$. Moreover, for each node $n$ of the tree summarizing a segment $S = (d_i, \dots, d_j)$ of $T$, its left and right children nodes $n_l$ and $n_r$ summarize two subsegments $S_l = (d_i, \dots, d_k)$ and $S_r = (d_{k+1}, \dots, d_j)$, respectively, which form a partitioning of the original segment $S$.
As we will see in Section 6, this hierarchical structure allows PlatoDB to adapt to varying error/time budgets by only accessing the parts of the tree required to achieve the given error/time budget.

At each node $n$ corresponding to segment $S = (d_i, \dots, d_j)$, PlatoDB summarizes the segment $S$ by keeping two types of measures: (a) a description of a compression function that is used to approximately represent the time series values in the segment and (b) a set of error measures describing how far the above approximate values are from the real values. As we will see in Sections 5 and 6, PlatoDB uses at query processing time the compression function and error measures stored in each node to compute an approximate answer of the query and deterministic error guarantees, respectively. We next describe the compression functions and error measures stored within each segment tree node in detail.

Segment Compression Function. Let $S = (d_1, \dots, d_n)$ be a segment. PlatoDB summarizes its contents through a compression function $f$ chosen by the user. PlatoDB supports the use of any of the compression functions suggested in the literature [21, 20, 19, 11, 5, 4]. Examples include but are not limited to the Piecewise Aggregate Approximation (PAA) [21], the Adaptive Piecewise Constant Approximation (APCA) [20], the Piecewise Linear Representation (PLR) [19], the Discrete Fourier Transformation (DFT) [11], the Discrete Wavelet Transformation (DWT) [5], and the Chebyshev polynomials (CHEB) [4]. To describe the function, PlatoDB stores in the segment node parameters describing the function. These parameters depend on the type of the function. For instance, if $f$ is a Piecewise Aggregate Approximation (PAA), estimating all values within a segment by a single value $b$, then the parameter is just the single value $b$. On the other hand, if $f$ is a Piecewise Linear Representation (PLR), estimating the values in the segment through a line $ax + b$, then the function parameters are the coefficients $a$ and $b$ of the polynomial used to describe the line.
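To make the notion of function parameters concrete, the sketch below computes the single PAA parameter $b$ and the two PLR parameters $(a, b)$ for a segment. It is illustrative only: the function names are hypothetical, and fitting the line by ordinary least squares over the positions $1, \dots, n$ is an assumption, since the text above does not prescribe how the line is obtained.

```python
from typing import Sequence, Tuple


def paa_params(segment: Sequence[float]) -> float:
    """PAA: the whole segment is estimated by one value b (its average)."""
    return sum(segment) / len(segment)


def plr_params(segment: Sequence[float]) -> Tuple[float, float]:
    """PLR: the segment is estimated by a line a*x + b over positions x = 1..n.

    Least-squares fitting is an assumption of this sketch.
    """
    n = len(segment)
    mean_x = (n + 1) / 2
    mean_y = sum(segment) / n
    sxx = sum((x - mean_x) ** 2 for x in range(1, n + 1))
    if sxx == 0:                      # single-point segment: horizontal line
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(range(1, n + 1), segment))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def plr_value(params: Tuple[float, float], i: int) -> float:
    """Estimated value f(i) for the i-th element of the segment."""
    a, b = params
    return a * i + b
```

A node therefore stores one number per segment under PAA and two under PLR, independently of the segment length.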
In the rest of the document, we will refer directly to the compression function $f$ (instead of the parameters that are used to describe it). Given a segment $(d_1, \dots, d_n)$, we will use $f(i)$ to denote the

value for element $d_i$ of the segment, as derived by $f$.

Table 1: Query expressions for common statistics.
Mean, $E(T)$. Definition: $\frac{1}{n}\sum_{i=1}^{n} d_i$. Query expression: $\frac{Sum(T,1,n)}{n}$.
Variance, $Var(T)$. Definition: $\frac{1}{n}\sum_{i=1}^{n} (d_i - E(T))^2$. Query expression: $\frac{Sum(Times(T,T),1,n)}{n} - \frac{Sum(T,1,n) \times Sum(T,1,n)}{n^2}$.
Covariance, $Cov(T_1, T_2)$. Definition: $\frac{\sum_{i=1}^{n} (d^{(1)}_i - E(T_1))(d^{(2)}_i - E(T_2))}{n-1}$. Query expression: $\frac{Sum(Times(T_1,T_2),1,n)}{n-1} - \frac{Sum(T_1,1,n) \times Sum(T_2,1,n)}{n(n-1)}$.
Correlation, $Corr(T_1, T_2)$. Definition: $\frac{\sum_{i=1}^{n} (d^{(1)}_i - E(T_1))(d^{(2)}_i - E(T_2))}{\sqrt{\sum_{i=1}^{n} (d^{(1)}_i - E(T_1))^2 \sum_{i=1}^{n} (d^{(2)}_i - E(T_2))^2}}$. Query expression: $\frac{Sum(Times(T_1,T_2)) - \frac{1}{n} Sum(T_1,1,n) \times Sum(T_2,1,n)}{n\sqrt{Var(T_1)Var(T_2)}}$.
Cross-correlation, $Coss(T_1, T_2, l)$. Definition: $\frac{\sum_{i=1}^{n} (d^{(1)}_i - E(T_1))(d^{(2)}_{i+l} - E(T_2))}{\sqrt{\sum_{i=1}^{n} (d^{(1)}_i - E(T_1))^2 \sum_{i=1}^{n} (d^{(2)}_{i+l} - E(T_2))^2}}$. Query expression: $\frac{Sum(Times(T_1,T_2)) - \frac{1}{n} Sum(T_1,1,n) \times Sum(T_2,1+l,n+l)}{n\sqrt{Var(T_1)Var(T_2)}}$.

Segment Error Measures. In addition to the compression function, PlatoDB also stores a set of error measures for each time series segment $S = (d_1, \dots, d_n)$. PlatoDB stores the following three error measures:

$L$: The sum of the absolute distances between the original and the compressed time series (also known as the Manhattan or $L_1$ distance), i.e., $L = \sum_{i=1}^{n} |d_i - f(i)|$.
$d^*$: The maximum absolute value of the original time series, i.e., $d^* = \max\{|d_i| : 1 \le i \le n\}$.
$f^*$: The maximum absolute value of the compressed time series, i.e., $f^* = \max\{|f(i)| : 1 \le i \le n\}$.

EXAMPLE 3. For instance, consider a segment $S = (5.12, 5.09, 5.07, 5.04)$ summarized through the PAA compression function $f = 5.08$ (i.e., $f(1) = f(2) = f(3) = f(4) = 5.08$). Then $L = 0.04 + 0.01 + 0.01 + 0.04 = 0.1$, $d^* = \max\{5.12, 5.09, 5.07, 5.04\} = 5.12$ and $f^* = \max\{5.08, 5.08, 5.08, 5.08\} = 5.08$.

As we will see in Section 5, the above three error measures are sufficient to compute deterministic error guarantees for any query supported by the system, regardless of the employed compression function $f$. This allows administrators to select the compression function best suited to each time series, without worrying about computing the error guarantees, which is automatically handled by PlatoDB.

4.2 Segment Tree Generation
We next describe the algorithm generating the segment tree.
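Since the generation algorithm is driven by the error measures of Section 4.1, it may help to see them computed first. The sketch below evaluates $L$, $d^*$ and $f^*$ for an arbitrary compression function passed as a callable and checks them against Example 3; the function name `error_measures` and the callable-based interface are assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple


def error_measures(segment: Sequence[float],
                   f: Callable[[int], float]) -> Tuple[float, float, float]:
    """Return (L, d_star, f_star) for a segment and its compression function.

    f(i) is the estimated value of the i-th element, with i running from 1 to n.
    """
    estimates = [f(i) for i in range(1, len(segment) + 1)]
    L = sum(abs(d - e) for d, e in zip(segment, estimates))   # Manhattan distance
    d_star = max(abs(d) for d in segment)                      # max |original|
    f_star = max(abs(e) for e in estimates)                    # max |compressed|
    return L, d_star, f_star


# Example 3: PAA with the constant 5.08 on S = (5.12, 5.09, 5.07, 5.04).
L, d_star, f_star = error_measures((5.12, 5.09, 5.07, 5.04), lambda i: 5.08)
```

Because the function only needs the estimates $f(i)$, the same three numbers can be produced for PAA, PLR or any other compression function, which is what makes the error guarantees independent of the chosen technique.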
To build the tree, the algorithm has to decide how to build the children nodes from a parent node, i.e., how to partition a segment into two non-overlapping subsegments. Each possible splitting point will lead to different children segments and as a result to different errors when PlatoDB uses the children segments to answer a query at query processing time. Ideally, the splitting point should be the one that minimizes the error among all possible splitting points. However, since PlatoDB supports ad hoc queries and since each query may benefit from a different splitting point, there is no way for PlatoDB to choose a splitting point that is optimal for all queries.

Segment Tree Generation Algorithm. Based on this observation, PlatoDB chooses the splitting point that minimizes the error for the basic query that simply computes the sum of all data points of the original segment. In particular, the segment tree generation algorithm starts from the root and, proceeding in a top-down fashion, given a segment $S = (d_1, \dots, d_n)$ selects a splitting point $d_k$ that leads to two subsegments $S_l = (d_1, \dots, d_k)$ and $S_r = (d_{k+1}, \dots, d_n)$ such that the sum of the Manhattan distances of the new subsegments, $L_{S_l} + L_{S_r}$, is minimized. The algorithm stops splitting a segment $S$ further when one of the following two conditions holds: (i) the Manhattan distance $L_S$ of the segment is smaller than a threshold $\tau$, or (ii) the size of the segment is below a threshold $\kappa$. The choice between conditions (i) and (ii) and the values of the corresponding thresholds $\tau$ and $\kappa$ are specified by the system administrator.

Since the algorithm needs time proportional to the size of a segment to compute the splitting point of a single segment and it repeats this process for every non-leaf tree node, it exhibits a worst-case time complexity of $O(mn)$, where $n$ is the size of the original time series (i.e., the number of its data points) and $m$ is the number of nodes in the resulting segment tree.

Discussion.
Note that by deciding independently how to split each individual segment into two subsegments, the segment tree generation algorithm is a greedy algorithm, which, even though it makes optimal local decisions for the basic aggregation query, may not lead to optimal global decisions. For instance, there is no guarantee that the $k$ nodes that exist at a particular level of the segment tree correspond to the $k$ nodes that minimize the error of the basic aggregation query. The literature contains a multitude of algorithms that can provide such a guarantee for a given $k$, i.e., algorithms that can, given a time series $T$ and a number $k$, produce $k$ segments of $T$ that minimize some error metric. Examples include the optimal algorithm of [3], as well as approximation algorithms with formal guarantees presented in [34]. However, all these algorithms have a very high worst-case time complexity that makes them prohibitive for the large number of data points typically found in sensor datasets, and they are therefore not considered in this work. Several heuristic segmentation algorithms also exist, such as the Sliding Windows [33], the Top-down [22] and the Bottom-Up [23] algorithms, but, similarly to our greedy algorithm, they do not provide any formal guarantees.

Finally, note that the tree generated by the above algorithm will in general be unbalanced. Intuitively, the algorithm will create more nodes and corresponding tree levels to cover segments that

contain data points that are more irregular and/or rapidly changing, utilizing fewer nodes for smooth segments.

5. COMPUTING APPROXIMATE QUERY ANSWERS AND ERROR GUARANTEES
Given pre-computed segment trees for time series $T_1, \dots, T_n$, PlatoDB answers ad hoc queries over the time series by accessing their segment trees. In particular, to answer a given query $Q$ under an error/time budget, PlatoDB navigates the segment trees of the time series involved in $Q$, selects segment nodes (or simply segments) that satisfy the budget, and computes an approximate answer for $Q$ together with deterministic error guarantees. We will next present the query processing algorithm. For ease of exposition, we will start by describing how PlatoDB computes an approximate query answer and the associated error guarantees assuming that the segment nodes have already been chosen, and will explain in Section 6 how PlatoDB traverses the tree to choose the segment nodes.

Approximate query answering problem under given segments. Formally, let $T_1, \dots, T_k$ be time series, such that time series $T_i$ is partitioned into segments $S_{i,1}, \dots, S_{i,n}$. Given (a) these segments and the associated measures as described above and (b) a query $Q$ over the time series $T_1, \dots, T_k$, we will show how PlatoDB computes an approximate query answer $\hat{R}$ and an estimated error $\hat{\varepsilon}$, such that the approximate query answer $\hat{R}$ is guaranteed to be within $\pm\hat{\varepsilon}$ of the accurate query answer $R$⁴, i.e., $|R - \hat{R}| \le \hat{\varepsilon}$.

For ease of exposition, we first describe the simple case where each time series $T_i$ contains a single segment perfectly aligned with the single segment of the other series, before describing the general case, where each time series $T_i$ contains multiple segments, which may also not be perfectly aligned with the segments of the other time series.

5.1 Single Time Series Segment
Let $T_1, \dots, T_k$ be $k$ time series with single aligned segments, i.e., $T_i$ is approximated by a single segment $S_i$. Also let $f_i$ be the compression function and $(L_i, d^*_i, f^*_i)$ the error measures of segment $S_i$, respectively.
To compute the approximate answer and error guarantees of a query $Q$ over $T_1, \ldots, T_k$ using the single segments $S_1, \ldots, S_k$, PlatoDB employs an algebraic approach: proceeding bottom-up, it computes for each algebraic operator $op$ of $Q$ the approximate answer and error guarantees of the subquery corresponding to the subtree rooted at $op$. This approach is based on formulas that, for each algebraic query operator, take an approximate answer and error for the inputs of the operator and provide the corresponding answer and error for its output. Figure 3 shows the formulas employed by PlatoDB for each algebraic query operator supported by the system. Note that the output signatures differ between operators. This is due to the different types of operators supported by PlatoDB, as explained next.

Recall from Section 3 that PlatoDB's query language consists of three types of operators: (i) time series operators, (ii) the aggregation operator, and (iii) arithmetic operators. While time series operators output a time series, the aggregation and arithmetic operators output a single number. As a result, the formulas used for answer and error estimation treat these two classes of operators differently. For time series operators, the formulas return, similarly to the input time series, the compression function and error measures of the output time series. For the aggregation and arithmetic operators, which return a single number and not an entire time series, the formulas simply return a single approximate answer and estimated error. Figure 3 shows the resulting formulas (see footnote 5). Without going into the details of each of them, we next explain through an example how they can be used to compute the answer and corresponding error guarantees for an entire query.

4 The accurate answer is the one obtained by running the query over the raw data. Note that, in this work, we can provide estimated errors without computing the accurate answers.

5 Out of the formulas, the most involved are the output measure estimation formulas of the Times operator. More details on how they were derived can be found in Appendix A.1.

Time Series Operators
Operator | Compression function $f$ | $L$ | $d^*$ | $f^*$
SeriesGen($\upsilon$, $n$) | $\upsilon$ | $0$ | $\upsilon$ | $\upsilon$
Plus($T_1$, $T_2$) | $f_1 + f_2$ | $L_1 + L_2$ | $d_1^* + d_2^*$ | $f_1^* + f_2^*$
Minus($T_1$, $T_2$) | $f_1 - f_2$ | $L_1 + L_2$ | $d_1^* + d_2^*$ | $f_1^* + f_2^*$
Times($T_1$, $T_2$) | $f_1 \times f_2$ | $\min\{d_2^* L_1 + f_1^* L_2,\; f_2^* L_1 + d_1^* L_2\}$ | $d_1^* \times d_2^*$ | $f_1^* \times f_2^*$

Aggregation Operator
Operator | Approximate output | Estimated error
Sum($T$, $l_s$, $l_e$) | $\sum_{i=l_s}^{l_e} f(i)$ | $L$

Arithmetic Operators
Operator | Approximate output | Estimated error
Agg $+$ Number | $\widehat{Agg} + Number$ | $\hat{\varepsilon}$
Agg $-$ Number | $\widehat{Agg} - Number$ | $\hat{\varepsilon}$
Agg $\times$ Number | $\widehat{Agg} \times Number$ | $\hat{\varepsilon} \times Number$
Agg $\div$ Number | $\widehat{Agg} \div Number$ | $\hat{\varepsilon} \div Number$
Agg$_a$ $+$ Agg$_b$ | $\widehat{Agg}_a + \widehat{Agg}_b$ | $\hat{\varepsilon}_a + \hat{\varepsilon}_b$
Agg$_a$ $-$ Agg$_b$ | $\widehat{Agg}_a - \widehat{Agg}_b$ | $\hat{\varepsilon}_a + \hat{\varepsilon}_b$
Agg$_a$ $\times$ Agg$_b$ | $\widehat{Agg}_a \times \widehat{Agg}_b$ | $\widehat{Agg}_a \hat{\varepsilon}_b + \widehat{Agg}_b \hat{\varepsilon}_a + \hat{\varepsilon}_a \hat{\varepsilon}_b$
Agg$_a$ $\div$ Agg$_b$ | $\widehat{Agg}_a \div \widehat{Agg}_b$ | $\frac{\widehat{Agg}_a + \hat{\varepsilon}_a}{\widehat{Agg}_b - \hat{\varepsilon}_b} - \frac{\widehat{Agg}_a}{\widehat{Agg}_b}$

Figure 3: Formulas for estimating answer and error for each algebraic operator (single segment).

Figure 4: Approximate query answer and associated error for query $Q$ = Sum(Times(Minus($T$, SeriesGen($\mu$, $n$)), Minus($T$, SeriesGen($\mu$, $n$))), 1, $n$). Compression functions and error measures are shown in blue and red, respectively.

EXAMPLE 4. This example shows how to use the formulas in Figure 3 to compute the approximate answer and associated error for a query computing the variance of a time series $T$ consisting of a single segment $S$. For simplicity of the query expression we assume that the mean $\mu$ of $T$ is known in advance (note that even if $\mu$ was not known, the query would still be expressible in PlatoDB's query language, albeit through a longer expression). Let $f$ be the compression function and $(L, d^*, f^*)$ the error measures of $S$. The query can be expressed as $Q$ = Sum(Times(Minus($T$, SeriesGen($\mu$, $n$)), Minus($T$, SeriesGen($\mu$, $n$))), 1, $n$). Figure 4 shows how PlatoDB evaluates this query in a bottom-up fashion. It first uses the formula of the SeriesGen operator to compute the compression function ($f = \mu$) and error measures ($L = 0$, $d^* = \mu$, $f^* = \mu$) for the output of the SeriesGen operator. It then computes the compression function ($f - \mu$) and error measures ($L$, $d^* + \mu$, $f^* + \mu$) for the output of the Minus operator. The computation continues in a bottom-up fashion, until PlatoDB computes the output of the Sum operator in the form of an approximate answer $\hat{R} = \sum_{i=1}^{n} (f(i) - \mu)^2$, where $n$ is the number of data points in $T$, and an estimated error $\hat{\varepsilon} = (d' + f')L$, where $d' = d^* + \mu$ and $f' = f^* + \mu$ are the error measures of the Minus output, i.e., $\hat{\varepsilon} = (d^* + f^* + 2\mu)L$.

Importantly, the formulas shown in Figure 3 are guaranteed to produce the best error estimation out of any formula that uses the three error measures employed by PlatoDB, as explained by the following theorem:

THEOREM 1. The estimated errors produced through the use of the formulas shown in Figure 3 are the lowest among all possible error estimations produced by using the error measures described in Section 4.

The proof can be found in Appendix A.

Multiple Segment Time Series

Let us now consider the general case, where each time series $T$ contains multiple segments of varying sizes. As a result of the varying sizes of the segments, segments of different time series may not fully align.

EXAMPLE 5. For instance, consider the top two time series $T_1 = (S_{1,1}, S_{1,2})$ and $T_2 = (S_{2,1}, S_{2,2})$ of Figure 5 (ignore the third time series for now). Segment $S_{1,1}$ overlaps with both $S_{2,1}$ and $S_{2,2}$. Similarly, segment $S_{2,2}$ overlaps with both $S_{1,1}$ and $S_{1,2}$.
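The overlap pattern described in Example 5 can be found mechanically by merging the segment boundaries of the two series. The Python sketch below is only an illustration: the function name `align`, the 0-based segment indices, and the concrete end positions are assumptions (Figure 5 gives no numeric positions), chosen so that the overlaps match those stated in the example.

```python
from collections import Counter


def align(ends_a, ends_b):
    """Merge the segment boundaries of two series covering the same range.

    ends_a and ends_b are sorted lists holding the (inclusive) end position
    of every segment of T_a and T_b. Returns one tuple (end, u, v) per
    aligned output segment: its end position and the 0-based indices of the
    input segments of T_a and T_b that cover it.
    """
    if ends_a[-1] != ends_b[-1]:
        raise ValueError("both series must cover the same range")
    aligned = []
    u = v = 0
    for end in sorted(set(ends_a) | set(ends_b)):
        aligned.append((end, u, v))
        # Advance to the next input segment once the current one is used up.
        if ends_a[u] == end:
            u += 1
        if ends_b[v] == end:
            v += 1
    return aligned


# Hypothetical positions: S_{1,1} ends at 6 and S_{1,2} at 10,
# while S_{2,1} ends at 3 and S_{2,2} at 10.
aligned = align([6, 10], [3, 10])

# How many aligned output segments each segment of T_2 takes part in.
uses_of_t2_segments = Counter(v for _, _, v in aligned)
```

With these positions, `aligned` contains three output segments, and the second segment of the second series (index 1) takes part in two of them, which is the situation the following paragraphs are concerned with.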
One may think that this can be easily solved by creating subsegments that are perfectly aligned and then using for each of them the answer and error estimation formulas of Section 5.1.

EXAMPLE 6. Continuing our example, the two time series $T_1$ and $T_2$ can be split into the three aligned subsegments shown as the output time series $T_3$. Then for each of these output segments, we can compute the error based on the formulas of Section 5.1.

However, the problem with this approach is that the resulting error will be severely overestimated, as the error of a single segment of the original time series may be counted multiple times when it overlaps with multiple output segments.

EXAMPLE 7. For instance, for a query over the time series $T_1$ and $T_2$ of Figure 5, the error of $S_{2,2}$ will be double-counted, as it will be counted towards the error of the two output segments $S_{3,2}$ and $S_{3,3}$.

Figure 5: Example of aligned time series segments. The newly generated time series $T_3$ is shown in red.

To avoid this pitfall, PlatoDB does not estimate the error for each segment individually but instead computes the error holistically for the entire time series. Figures 6 and 7 show the resulting answer and error estimation formulas for the time series operators and the aggregation operator, respectively. The formulas of the arithmetic operators are omitted as they remain the same as in the single segment case: the arithmetic operators take as input single numbers instead of time series and are thus not affected by multiple segments.

6. NAVIGATING THE SEGMENT TREE

So far we have seen how PlatoDB computes the approximate answer to a query and its associated error, assuming that the segments that are used for query processing have already been selected. In this section, we explain how this selection is performed. In particular, we show how PlatoDB navigates the segment trees of the time series involved in the query to continuously compute better estimations of the query answer until the given error or time budget is satisfied.

Query Processing Algorithm.
Let $T_1, \ldots, T_m$ be a set of time series and $I_1, \ldots, I_m$ the respective segment trees. Let also $Q$ be a query over $T_1, \ldots, T_m$ and $\varepsilon_{max}$ / $t_{max}$ an error/time budget, respectively. To answer $Q$ under the given budget, PlatoDB first starts from the roots of $I_1, \ldots, I_m$ and uses them to compute the approximate query answer $\hat{R}$ and corresponding error $\hat{\varepsilon}$ using the formulas presented in Section 5. If the estimated error is greater than the error budget (i.e., if $\hat{\varepsilon} > \varepsilon_{max}$) or if the elapsed time is smaller than the allowed time budget, PlatoDB chooses one of the tree nodes used above, replaces it with its children and repeats the above procedure using the newly selected nodes until the given error/time budget is reached.

What is important is the criterion that is used to choose the node that is replaced at each step by its children. In general, PlatoDB will have to select between several nodes, as it will be exploring in which segment tree, and moreover in which part of the selected segment tree, it pays off to navigate further down. Since PlatoDB aims to reduce the estimated error as much as possible, at each step it greedily chooses the node whose replacement by its children leads to the biggest reduction in the estimated error. The resulting procedure is shown as Algorithm 1 (see footnote 6).

Algorithm 1: PlatoDB Query Processing
Input: Segment trees $I_1, \ldots, I_m$, query $Q$, error budget $\varepsilon_{max}$ or time budget $t_{max}$
Output: Approximate answer $\hat{R}$ and error $\hat{\varepsilon}$
Access the roots of $I_1, \ldots, I_m$;
Compute $\hat{R}$ and $\hat{\varepsilon}$ by using the compression functions and error measures of the currently accessed nodes (see Section 5 for details);
while $\hat{\varepsilon} > \varepsilon_{max}$ or elapsed time $< t_{max}$ do
    Choose a node maximizing the error reduction and replace it with its children;
    Update the current answer $\hat{R}$ and error $\hat{\varepsilon}$ using the compression functions and error measures of the currently accessed nodes;
Return ($\hat{R}$, $\hat{\varepsilon}$);

6 Note that the algorithm is shown for both the error and the time budget case. When a time budget is provided, the algorithm always has to keep a computed estimated answer $\hat{R}$ in order to return it when the time budget runs out. In the case of the error budget this is not required. Thus, in the latter case, it suffices to compute $\hat{R}$ only at the very last step of the algorithm, avoiding its iterative computation during the while loop.

Time Series Operators
Operator | Compression function $f$ | $L$ | $d^*$ | $f^*$
SeriesGen($\upsilon$, $n$) | $\upsilon$ | $0$ | $\upsilon$ | $\upsilon$
Plus($T_a$, $T_b$) | $\{(f_{c,1}, \ldots, f_{c,k}) \mid f_{c,i} = f_{a,u} + f_{b,v},\ i \in [1, k]\}$ | $\sum_{i=1}^{p} L_{a,i} + \sum_{j=1}^{q} L_{b,j}$ | $\max\{d^*_{c,i} \mid d^*_{c,i} = d^*_{a,u} + d^*_{b,v},\ i \in [1, k]\}$ | $\max\{f^*_{c,i} \mid f^*_{c,i} = f^*_{a,u} + f^*_{b,v},\ i \in [1, k]\}$
Minus($T_a$, $T_b$) | $\{(f_{c,1}, \ldots, f_{c,k}) \mid f_{c,i} = f_{a,u} - f_{b,v},\ i \in [1, k]\}$ | $\sum_{i=1}^{p} L_{a,i} + \sum_{j=1}^{q} L_{b,j}$ | $\max\{d^*_{c,i} \mid d^*_{c,i} = d^*_{a,u} + d^*_{b,v},\ i \in [1, k]\}$ | $\max\{f^*_{c,i} \mid f^*_{c,i} = f^*_{a,u} + f^*_{b,v},\ i \in [1, k]\}$
Times($T_a$, $T_b$) | $\{(f_{c,1}, \ldots, f_{c,k}) \mid f_{c,i} = f_{a,u} \times f_{b,v},\ i \in [1, k]\}$ | $L_{T_c}$ | $\max\{d^*_{c,i} \mid d^*_{c,i} = d^*_{a,u} \times d^*_{b,v},\ i \in [1, k]\}$ | $\max\{f^*_{c,i} \mid f^*_{c,i} = f^*_{a,u} \times f^*_{b,v},\ i \in [1, k]\}$

Figure 6: Formulas for estimating answer and error for time series operators (multiple segments). For each output time series segment $S_{c,i}$, let $S_{a,u}$ and $S_{b,v}$ be the input segments that overlap with $S_{c,i}$.

Aggregation Operator
Operator | Approximate output | Estimated error
Sum($T$, $l_s$, $l_e$) | $\sum_{i=u}^{v} \sum_{j=1}^{|S_i|} f_i(j)$ | $\sum_{i=u}^{v} L_i$

Figure 7: Formulas for estimating answer and error for the aggregation operator (multiple segments).

Algorithm Optimality. Given its greedy nature, one may wonder whether the query processing algorithm is optimal. To answer this question, we first have to define optimality. Since the aim of the query processing algorithm is to produce the lowest possible error in the fastest possible time (which can be approximated by the number of nodes that are accessed), we say that an algorithm is optimal if for every possible query, set of segment trees, and error budget $\varepsilon_{max}$ it answers the query under the given budget while accessing the lowest number of nodes among all possible algorithms. Since a comparison against every possible algorithm is hard, we restrict our attention to deterministic algorithms that access the segment trees in a top-down fashion (i.e., to access a node $N$, all its ancestor nodes should also be accessed). We denote this class of algorithms as $\mathcal{A}$. It turns out that no algorithm in $\mathcal{A}$ can be optimal, as the following theorem states:

THEOREM 2. There is no optimal algorithm in $\mathcal{A}$.

PROOF. Consider the following segment trees of two time series $T_1$ and $T_2$. The segment tree of $T_1$ is shown in Figure 8 and the segment tree of $T_2$ is a tree containing a single node. Now consider a query $Q$ over these two time series and an error budget $\varepsilon_{max} = h - 1$, where $h > 1$ is the height of $T_1$'s tree. Assume that the query error using the tree roots is $\varepsilon_{root} = 2h$. Also assume that whenever the query processing algorithm replaces a node by its children, the error for the query is reduced by $\frac{1}{2^h}$, with the exception of the shaded node, which, when replaced by its children, leads to an error reduction of $h + 1$. This means that the query processing algorithm can only terminate after accessing the children of the shaded node, as the query error in that case will be at most $2h - (h + 1) = h - 1$. Otherwise, the error estimated by the algorithm will be at least $2h - 2^h \left(\frac{1}{2^h}\right) = 2h - 1 > h - 1$, which exceeds the error budget and thus does not allow the algorithm to terminate. Since the shaded node can be placed at an arbitrary position in the tree, for every given deterministic algorithm we can place the shaded node in the tree so that the algorithm accesses the children of the shaded node only after it has accessed all the other nodes in the tree. However, this is suboptimal, as there is a way to access the children of the shaded node with fewer node accesses (i.e., by following the path from the root to the shaded node). Therefore, no algorithm in $\mathcal{A}$ is optimal.

Figure 8: Segment Tree for Theorem 2.

As a result of the above theorem, PlatoDB's query processing algorithm cannot be optimal in general. However, we can show that it is optimal for segment trees that exhibit the following property: For every pair of nodes $N$ and $N'$ of the segment tree such that $N'$ is a descendant of $N$, the error reduction $\Delta\varepsilon(N)$ achieved by replacing $N$ with its children is greater than or equal to the error reduction $\Delta\varepsilon(N')$ achieved by replacing $N'$ with its children.
Such a tree is called a fine-error-reduction tree; intuitively, it guarantees that any node leads to a greater or equal error reduction than any of its descendants. If all trees satisfy the above property, PlatoDB's query processing algorithm is optimal:

THEOREM 3. In the presence of segment trees that are fine-error-reduction trees, PlatoDB's query processing algorithm is optimal.

Operator | Incremental error update
Plus($T_a$, $T_b$) | $\hat{\varepsilon}' = \hat{\varepsilon} - (L_a - (L_{a.1} + L_{a.2}))$
Minus($T_a$, $T_b$) | $\hat{\varepsilon}' = \hat{\varepsilon} - (L_a - (L_{a.1} + L_{a.2}))$
Times($T_a$, $T_b$) | $\hat{\varepsilon}' = \hat{\varepsilon} - (\max(p_{b,1}, \ldots, p_{b,k}) L_a - (\max(p_{b,1}, \ldots, p_{b,i}) L_{a.1} + \max(p_{b,i}, \ldots, p_{b,k}) L_{a.2}))$

Table 2: Incremental update of estimated errors for time series operators, where $p_{b,i} \in \{d^*_{b,i}, f^*_{b,i}\}$.

Incremental Error Update. Having proven the optimality of the algorithm for fine-error-reduction trees, we will next discuss an optimization that can be employed to speed up the algorithm. By studying the algorithm, it is easy to observe that as the algorithm moves from a set $\mathcal{N} = \{N_1, \ldots, N_n\}$ of nodes to a set $\mathcal{N}' = \{N_1, \ldots, N_{a-1}, N_{a.1}, N_{a.2}, N_{a+1}, \ldots, N_n\}$ of nodes (by replacing node $N_a$ by its children $N_{a.1}$ and $N_{a.2}$), it recomputes the error using all nodes in $\mathcal{N}'$, although only the two nodes $N_{a.1}$ and $N_{a.2}$ have changed from the previous node set $\mathcal{N}$. This observation led to the incremental error update optimization of PlatoDB's query processing algorithm described next. Instead of recomputing the error of $\mathcal{N}'$ from scratch using all nodes, PlatoDB incrementally updates the error of $\mathcal{N}$ by using only the error measures of the replaced node $N_a$ and the newly inserted nodes $N_{a.1}$ and $N_{a.2}$.

Let $(L_a, d^*_a, f^*_a)$, $(L_{a.1}, d^*_{a.1}, f^*_{a.1})$, and $(L_{a.2}, d^*_{a.2}, f^*_{a.2})$ be the error measures of nodes $N_a$, $N_{a.1}$, and $N_{a.2}$, respectively. Assume that the segments $S_{b,1}, \ldots, S_{b,k}$ overlap with the segment of node $N_a$, the segments $S_{b,1}, \ldots, S_{b,i}$ ($i \le k$) overlap with the segment of node $N_{a.1}$, and the segments $S_{b,i}, \ldots, S_{b,k}$ overlap with the segment of node $N_{a.2}$. Then the estimated error $\hat{\varepsilon}'$ using nodes $N_{a.1}$ and $N_{a.2}$ can be incrementally computed from the error $\hat{\varepsilon}$ using node $N_a$ through the incremental error update formulas shown in Table 2 (see footnote 7).

Probabilistic Extension. While PlatoDB provides deterministic error guarantees, which as we discussed above are in many cases required, it is interesting to note that it can be easily extended to provide probabilistic error guarantees if needed. Most importantly, this can be done simply by changing the error measures computed for each segment from $(L, d^*, f^*)$ to $(\sigma_\varepsilon, \varepsilon^*, f^*)$, where $\sigma_\varepsilon$ is the variance of $d_i - f(i)$ and $\varepsilon^*$ is the maximal absolute value of $d_i - f(i)$. Then we can employ the Central Limit Theorem (CLT) [10] to bound the accurate error $\varepsilon$ by $\Pr(\varepsilon \le \hat{\varepsilon}) \ge 1 - \alpha$, where $\alpha$ can be adjusted by the users to get different confidence levels.
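To make the change of error measures concrete, the Python sketch below computes both sets of measures for one segment. It is an illustration under stated assumptions, not PlatoDB's code: the function name `paa_measures` is hypothetical, the compression function is taken to be the segment average (0-degree polynomial), and, consistent with how the formulas of Section 5 use them, $L$ is read as the sum of the absolute residuals $|d_i - f(i)|$, $d^*$ as the largest absolute data value, and $f^*$ as the largest absolute approximated value. The CLT-based bound itself is not reproduced here.

```python
from statistics import fmean, pvariance


def paa_measures(data):
    """Summarize one segment with a constant (average) compression function.

    Returns the constant f plus two dictionaries: the deterministic error
    measures and the measures used by the probabilistic variant.
    """
    f = fmean(data)                       # constant approximation f(i) = f
    residuals = [d - f for d in data]     # d_i - f(i) for every data point

    deterministic = {
        "L": sum(abs(r) for r in residuals),     # assumed: sum of |d_i - f(i)|
        "d_star": max(abs(d) for d in data),     # assumed: max |d_i|
        "f_star": abs(f),                        # assumed: max |f(i)|
    }
    probabilistic = {
        "sigma": pvariance(residuals),               # variance of d_i - f(i)
        "eps_star": max(abs(r) for r in residuals),  # max |d_i - f(i)|
        "f_star": abs(f),
    }
    return f, deterministic, probabilistic
```

For example, `paa_measures([1.0, 2.0, 3.0, 6.0])` summarizes a four-point segment by its average and returns both sets of measures; only the per-segment numbers differ between the two variants.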
It is interesting that the rest of the system, including the hierarchical structure of the segment tree and the tree navigation algorithm employed at query processing time, does not need to be modified. In our future work we plan to further explore this probabilistic extension and compare it to existing approximate query answering techniques with probabilistic guarantees.

7 The SeriesGen operator is omitted, since its input is not a time series and as a result there is no segment tree associated with its input.

7. EXPERIMENTAL EVALUATION

To evaluate PlatoDB's performance and verify our hypothesis that PlatoDB is able to provide significant savings in the query processing of sensor data, we are conducting experiments on real sensor data. We present here early data points that we have discovered.

Datasets. For our preliminary experiments, we used two real sensor datasets:

1. Intel Lab Data (ILD). Smart home data (humidity and temperature) collected at 31-second intervals from 54 sensors deployed at the Intel Berkeley Research Lab between February 28th and April 5th. The dataset contains about 2.3 million tuples (i.e., 4.6 million sensor readings in total).
2. EPA Air Quality Data (AIR). Air quality data collected at hourly intervals from about 1000 sensors from January 1st, 2000 to April 1st. The dataset contains about 133 million tuples (i.e., 266 million sensor readings in total).

From each dataset we extracted multiple time series, each corresponding to a single attribute of the dataset: Humidity and Temperature for ILD, and Ozone and SO2 for AIR. We then used PlatoDB to create the corresponding segment tree for each time series and to answer queries over them.

Experimental platform. All experiments were performed on a computer with a 4th generation Intel processor (4 × 32 KB L1 data cache, L2 cache, 8 MB shared L3 cache, 4 physical cores, 3.6 GHz) and 16 GB RAM, running Ubuntu. All the algorithms were implemented in C++ and compiled with g++ using -O3 optimization. All data was stored in main memory.
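As a rough illustration of the two steps exercised in these experiments (building a segment tree for a time series and answering a query over it under an error budget, as in Algorithm 1), the Python sketch below handles a single Sum query over one series. It is not PlatoDB's C++ implementation. The class `Node`, the function `approx_sum`, the midpoint splitting, the constant (average) compression function, the half-open 0-based range `[ls, le)`, and the reading of $L$ as the sum of absolute residuals are all assumptions made for illustration. The error is maintained incrementally in the spirit of Table 2, adapted here to the Sum operator.

```python
import heapq
from itertools import count
from statistics import fmean


class Node:
    """Segment tree node summarizing data[lo:hi] by its average f and error L."""

    def __init__(self, data, lo, hi, min_len=2):
        seg = data[lo:hi]
        self.lo, self.hi = lo, hi
        self.f = fmean(seg)
        self.L = sum(abs(d - self.f) for d in seg)   # assumed: sum of |d_i - f|
        self.children = []
        if hi - lo > min_len:
            mid = (lo + hi) // 2
            self.children = [Node(data, lo, mid, min_len),
                             Node(data, mid, hi, min_len)]


def approx_sum(root, ls, le, eps_max):
    """Greedy top-down evaluation of the sum over positions ls .. le-1.

    Returns (approximate answer, error bound, number of nodes accessed).
    """
    def overlap(n):          # number of queried positions covered by node n
        return max(0, min(n.hi, le) - max(n.lo, ls))

    def kids(n):             # children that intersect the queried range
        return [c for c in n.children if overlap(c) > 0]

    def gain(n):             # error reduction obtained by expanding n
        return n.L - sum(c.L for c in kids(n))

    answer = root.f * overlap(root)
    error = root.L
    accessed = 1
    tie = count()            # tie-breaker so the heap never compares Node objects
    heap = []
    if root.children:
        heapq.heappush(heap, (-gain(root), next(tie), root))

    while error > eps_max and heap:
        neg_gain, _, node = heapq.heappop(heap)   # node with the largest gain
        error += neg_gain                          # incremental error update
        answer -= node.f * overlap(node)
        for child in kids(node):
            answer += child.f * overlap(child)
            accessed += 1
            if child.children:
                heapq.heappush(heap, (-gain(child), next(tie), child))
    return answer, error, accessed
```

A call such as `approx_sum(Node(data, 0, len(data)), 10, 50, eps_max=5.0)` returns an approximate sum together with a bound on its distance from `sum(data[10:50])`. The loop stops as soon as the bound meets the budget or no expandable node is left, so smoother data should need fewer node accesses.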
7.1 Experimental Results

In our preliminary evaluation, we measured two quantities: first, the size of the segment tree created by PlatoDB, since this segment tree is stored in main memory, and second, the query processing performance of PlatoDB compared to a system that answers queries using the entirety of the raw sensor data. In our future work, we will be conducting a more thorough evaluation of the system. We next present our preliminary results.

Dataset | # Tuples | Raw Data | Segment Tree (0-degree) | Segment Tree (1-degree)
ILD | 2,313, | MB | 0.14 MB | 0.67 MB
AIR | 133,075, | GB | 4.37 MB | 8.11 MB

Table 3: Raw data and segment tree sizes.

Segment tree size. Table 3 shows the size of the raw data and the combined size of the segment trees built for all the time series extracted from the ILD and AIR datasets (see footnote 10). We experimented with two different compression functions, resulting in different segment tree sizes: a 0-degree polynomial (corresponding to the Piecewise Aggregate Approximation [21], where each value within a segment is approximated through the average of the values in the segment) and a 1-degree polynomial (corresponding to the Piecewise Linear Approximation [19], where each segment is approximated through a line). As shown, the segment trees are significantly smaller than the raw sensor data (about 0.40% to 1.90% and 0.22% to 0.40% of the raw data size for the ILD and AIR datasets, respectively). As a result, the segment trees of the time series can be easily kept in main memory, even when the system stores a large number of time series.

10 To make a fair comparison, the raw data size refers only to the combined size of the attributes used in the time series and does not include other attributes that exist in the original dataset (such as location codes etc.).

Figure 9: Query processing performance for correlation query (time shown in ms). Panels: (a) ILD, (b) AIR. Horizontal axis: Error (%); vertical axis: Time Cost (ms).

Query processing performance. We next compared the query processing performance of PlatoDB against a baseline, which is a custom in-memory algorithm that computes the exact answer of the queries using the raw data. To compare the systems, we measured the time required to process a correlation query between two time series (i.e., correlation(Humidity, Temperature) in ILD and correlation(Ozone, SO2) in AIR) with a varying error budget (ranging from 5% to 25%). Figure 9 shows the resulting times for each of the two datasets. Each graph depicts the performance of three systems: Exact, which is the baseline method of answering queries over the raw data, and PlatoDB-0 and PlatoDB-1, which are instances of PlatoDB using the 0-degree and 1-degree polynomial compression functions, as explained above.

By studying Figure 9, we can make the following observations. Both instances of PlatoDB outperform Exact by one to three orders of magnitude, depending on the provided error budget. In contrast to Exact, which always uses the entire raw dataset to compute exact query answers, PlatoDB allows the user to select the appropriate tradeoff between time spent in query processing and resulting error by specifying the desired error budget. The system adapts to the budget by providing faster responses as the allowed error budget increases. Notably, PlatoDB remains significantly faster than Exact even for small error budgets. In particular, PlatoDB is over 9× and 37× faster than Exact when the error is 5% in ILD and AIR, respectively.

In summary, our preliminary results show that PlatoDB has significant potential for speeding up the processing of ad hoc queries over large amounts of sensor data, as it outperforms exact query processing algorithms in many cases by several orders of magnitude. Moreover, it can provide such speedups while providing deterministic error guarantees, in contrast to existing sampling-based approximate query answering approaches that provide only probabilistic guarantees, which may not hold in practice.
Despite the difference in guarantees, in our future work we will be conducting a more thorough evaluation of the system, comparing it also against sampling-based systems.

8. RELATED WORK

Approximate query answering has been the focus of an extensive body of work, which we summarize next. However, to the best of our knowledge, this is the first work that provides deterministic guarantees for aggregation queries over multiple time series.

Approximate query answering with probabilistic error guarantees. Most of the existing work on approximate query processing has focused on using sampling to compute approximate query answers by appropriately evaluating the queries on small samples of the data [17, 1, 37, 2, 26]. Such approaches typically leverage statistical inequalities and the central limit theorem to compute the confidence interval or variance of the computed approximate answer. As a result, their error guarantees are probabilistic. While probabilistic guarantees are often sufficient, they are not suitable for scenarios where one wants to be certain that the answer will fall within a certain interval. (Note that, as discussed in Section 6, PlatoDB can also be extended to provide probabilistic guarantees when deterministic guarantees are not required, simply by modifying the error measures computed for each segment.)

A special form of sampling-based methods are online aggregation approaches, which provide a continuously improving query answer, allowing users to stop the query evaluation when they are satisfied with the resulting error [15, 7, 26]. With its hierarchical segment tree, PlatoDB can support the online aggregation paradigm, while providing deterministic error guarantees.

Approximate query answering with deterministic error guarantees. Approximately answering queries while providing deterministic error guarantees has so far received only very limited attention [31, 24, 30]. Existing work in the area has focused on simple aggregation queries that involve a single relational table.
In contrast, PlatoDB provides deterministic error guarantees on queries that may involve multiple time series (each of which can be thought of as a single relational table), enabling the evaluation of many common statistics that span tables, such as correlation, cross-correlation and others.

Approximate query answering over sensor data. Moreover, PlatoDB is one of the first approximate query answering systems that leverage the fact that sensor data are not random but follow a usually smooth underlying phenomenon. The majority of existing works on approximate query answering looked at general relational data. The ones that studied approximate query processing for sensor data focused on the networking aspect of the problem, studying how aggregate queries can be efficiently evaluated in a distributed sensor network [25, 8, 9]. In contrast, our work focuses on the continuous nature of the sensor data, which it leverages to accelerate query processing even in a single-machine scenario, where historical sensor data already accumulated on the machine have to be analyzed.

Data summarizations. Last but not least, there has been extensive work on creating summarizations of sensor data. Work in this area has come mostly from two different communities: the database community [16, 30, 27, 35] and the signal processing community [21, 20, 19, 5, 11]. The database community has mostly focused on creating summarizations (also referred to as synopses or sketches) that can be used to answer specific queries. These include, among others, histograms [16, 30, 12, 29] (e.g., EquiWidth and EquiDepth histograms [28], V-Optimal histograms [16], Hierarchical Model Fitting (HMF) histograms [36], and Compact Hierarchical Histograms (CHH) [32]), as well as sampling methods [14, 6], used among others for cardinality estimation [16] and selectivity estimation [30]. In contrast to such special-purpose approaches, PlatoDB supports a large class of queries over arbitrary sensor data.
The signal processing community, on the other hand, produced a variety of methods that can be used to compress time series data. These include, among others, the Piecewise Aggregate Approximation (PAA) [21], the Adaptive Piecewise Constant Approximation (APCA) [20], the Piecewise Linear Representation (PLR) [19], the Discrete Wavelet Transform (DWT) [5], and the Discrete Fourier Transform (DFT) [11]. However, this community has not been concerned with how such compression techniques can be used to answer general queries. PlatoDB's modular architecture allows the easy incorporation of such techniques as compression functions, which are then automatically leveraged by the system to enable approximate answering of a large number of queries with deterministic error guarantees.

9. CONCLUSION

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC Trees with Lumped and Distributed Elements

Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC Trees with Lumped and Distributed Elements Explct Formulas and Effcent Algorthm for Moment Computaton of Coupled RC Trees wth Lumped and Dstrbuted Elements Qngan Yu and Ernest S.Kuh Electroncs Research Lab. Unv. of Calforna at Berkeley Berkeley

More information
