A Multiresolution Symbolic Representation of Time Series


Vasileios Megalooikonomou (1), Qiang Wang (1), Guo Li (1), Christos Faloutsos (2)
(1) Department of Computer & Information Sciences, Temple University, Philadelphia, PA, USA
(2) Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
{vasilis,qwang,gli}@temple.edu, christos@cs.cmu.edu

Abstract

Efficiently and accurately searching for similarities among time series and discovering interesting patterns is an important and non-trivial problem. In this paper, we introduce a new representation of time series, the Multiresolution Vector Quantized (MVQ) approximation, along with a new distance function. The novelty of MVQ is that it keeps both local and global information about the original time series in a hierarchical mechanism, processing the original time series at multiple resolutions. Moreover, the proposed representation is symbolic, employing key subsequences, and potentially allows the application of text-based retrieval techniques to the similarity analysis of time series. The proposed method is fast and scales linearly with the size of the database and the dimensionality. Contrary to the vast majority of the literature, which uses the Euclidean distance, MVQ uses a multi-resolution/hierarchical distance function. We performed experiments with real and synthetic data. The proposed distance function consistently outperforms all the major competitors (Euclidean, Dynamic Time Warping, Piecewise Aggregate Approximation), achieving up to 20% better precision/recall and clustering accuracy on the tested datasets.

1. Introduction

The problem of efficient retrieval of similar time series has received a lot of attention due to its many applications in different domains. Briefly, this problem can be stated as follows: Given a query sequence $q$, a database $S$ of $N$ sequences, $S_1, S_2, \ldots, S_N$, a distance measure $D$ and a tolerance threshold $\varepsilon$, find the set of sequences $R$ in $S$ that are within distance $\varepsilon$ from $q$. More precisely, find: $R = \{ S_i \in S \mid D(q, S_i) \le \varepsilon \}$. To compare two given time series, a suitable measure of similarity should be given.
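The range query stated above can be sketched in a few lines. This is only an illustration of the problem definition, not the MVQ method: the function names `range_query` and `euclidean` and the toy data are assumptions made for the example, and the distance is passed in as a parameter so that any measure $D$ can be substituted.

```python
import numpy as np

def euclidean(a, b):
    """Plain Euclidean (L2) distance between two equal-length sequences."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def range_query(q, database, dist, eps):
    """Return the indices i with dist(q, database[i]) <= eps (hypothetical helper)."""
    return [i for i, s in enumerate(database) if dist(q, s) <= eps]

# Toy database of three short sequences (invented data).
S = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.1, 0.0, -0.1]]
q = [0.0, 0.0, 0.0]
hits = range_query(q, S, euclidean, eps=0.5)
print(hits)  # [0, 2]
```

The linear scan costs one distance computation per database sequence, which is the baseline that the dimensionality reduction techniques discussed next try to improve on.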
Naive approaches for comparing time sequences generally take polynomial time in the length of the sequences, typically linear or quadratic time. These approaches are not useful for large time series databases. Promising techniques include those that are based on the reduction of dimensionality of the original sequences. In this case, the sequences can be represented as multidimensional vectors and similar sequences can be retrieved in sublinear time. There may be several different criteria to evaluate a method, but generally speaking, a good one should be fast, scalable, and accurate (according to some ground truth). In this paper, we introduce a new method that satisfies these requirements. Our method is called Multiresolution Vector Quantized (MVQ) approximation and has the following characteristics: 1) It uses time-tested vector quantization methods to discover a vocabulary of subsequences; 2) It takes multiple resolutions into account, which brings improved accuracy; 3) It provides a new distance function utilizing text-based techniques from Information Retrieval to weigh down uninteresting matches, thus improving the accuracy.

As Agrawal et al. [2] proposed, compared to the Euclidean distance, a more intuitive idea is that two series should be considered similar if they have enough non-overlapping time-ordered pairs of subsequences that are similar. In this paper, instead of calculating the Euclidean distance, we first extract key subsequences utilizing the Vector Quantization (VQ) [12] technique and encode each time series based on the frequency of appearance of each key subsequence. We then calculate similarities between different time series in terms of key subsequence matches. This method can be very meaningful in many domains. For example, when comparing two stocks during a long period, we may want to find out during how many months the stocks have similar movements, though the same trend may appear in different months for different stocks. This application is similar to mining motifs in massive time series databases [22].
While the histogram metric can record the local information very well, it may lose much global information of the time series, since it does not keep track of the order of appearance of different key subsequences. To deal with this problem, we propose to apply a hierarchical mechanism: the original time series are processed at several different resolutions, and similarity analysis is performed using a weighted distance function combining all the resolution levels. For example, when considering a time series representing a stock price

movement, we know that subsequences of different length have different real meanings. If the length is 5, the subsequence stands for a weekly trend of the stock. Similarly, for length 20 we have the monthly trend. As we demonstrate in the experiments, MVQ outperforms previous state-of-the-art methods in clustering and similarity searches. Intuitively, the excellent performance of the proposed method can be justified by the following facts: 1) it exploits prior knowledge about the data using a learning approach; 2) it takes multiple resolutions into account; and 3) unlike wavelets (which also take multiple resolutions into account), it partially ignores the ordering of the codewords within the time sequence, due to the histogram model that is used to calculate similarity. Moreover, the proposed representation is symbolic, employing key subsequences, and allows the application of text-based retrieval techniques to the similarity analysis of time series.

2. Background

2.1 Related Work

Many approaches and techniques have been proposed in the past decade [1, 2, 4, 9, 10, 13, 14, 16, 18, 19, 21, 27, 31, 32] that address the problem of similarity in time series. To deal with dimensionality reduction, the solution of extracting a signature from each sequence and indexing the signature space was originally proposed by Faloutsos et al. [9,10]. To guarantee completeness (i.e., no false dismissals), the admissibility criterion that the distance function used in the signature space must underestimate the true distance measure (bounding lemma) was also proposed [10]. Obeying the admissibility criterion, many methods have been suggested and proved useful in different fields, such as the F-index introduced by Agrawal et al. [1] or the ST-index proposed by Faloutsos et al. [10]. Other approaches for efficient similarity searches on time sequences are based on piecewise constant approximation (PCA) or piecewise aggregate approximation (PAA). Yi and Faloutsos [32] and Keogh et al.
[19,21] proposed to divide each sequence into $k$ segments of equal length and to use the average value of each segment as a coordinate of a $k$-dimensional feature vector. The advantages of this transform are that it is very fast and easy to implement, the signature can be used with arbitrary $L_p$ norms, and the index can be built in linear time. In addition, the representation can be used with a weighted Euclidean distance where each segment of the sequence has a different weight. Keogh et al. [18] have also proposed an Adaptive Piecewise Constant Approximation (APCA) where the segments can be of variable length, offering a more effective compression than PCA. In [26] the authors propose a piecewise vector quantized approximation (PVQA) of time series. In [7], a technique for compressing multiple streams of data in sensor networks has been proposed that employs an approximate representation using a base signal extracted from historic information. The algorithm constructs a dictionary of candidate base signals in the process of building a base signal. The use of multi-scale histograms and a weighted Euclidean distance for measuring the similarity of time series at several precision levels has been investigated in [6]. In addition, general dimensionality reduction techniques such as Singular Value Decomposition (SVD) have been used on time series data [19].

For these methods, in which the distance metric lower bounds the Euclidean distance, one of the most significant characteristics is the avoidance of false dismissals, though there may be a lot of false alarms. However, in some cases, the existence of too many false alarms may decrease the efficiency of retrieval. At the same time, as many researchers have mentioned in their work [15,29], the Euclidean distance is not always the optimal distance measure. For example, in some time series, different parts have different levels of significance in their meaning. Also, the Euclidean distance does not allow shifting along the time axis, which is not unusual in real-life applications. In order to extract high-level features out of time series, Koudas et al.
[28] formalized problems of identifying various representative trends in time series data. Since the Euclidean is not the best distance one can use (as shown later in our paper and in papers we referenced earlier), here we propose a new distance function. We do not deal with the problem of lower bounding the Euclidean distance on the original vectors, since this is no longer meaningful.

2.2 Preliminaries

To make the presentation of the proposed work clear, we now give descriptions of various concepts and definitions used in the paper. We start with the definition of a time sequence and its subsequences.

Definition 1. Time Sequence: A sequence (ordered collection) of real values, $X = x_1, x_2, \ldots, x_n$, where $n$ can be very large.

Definition 2. Subsequence: Given a time sequence $X = x_1, x_2, \ldots, x_n$ of length $n$, a subsequence $S$ of $X$ is a sequence of length $m$ consisting of contiguous positions from $X$, i.e., $S = x_k, x_{k+1}, \ldots, x_{k+m-1}$; $1 \le k \le n-m+1$.

In similarity analysis, we need to define a metric for the similarity, that is, a measure of the distance between two time series. Given two time series, $X = x_1, x_2, \ldots, x_n$ and $Y = y_1, y_2, \ldots, y_n$, their distance, $D$, is defined, in general,

as an $L_p$ norm, where for $p=2$ the distance is the Euclidean, the most popular among the metrics. An intuitive notion of exact and approximate similarity was also formalized by Goldin and Kanellakis [8].

Obviously, the simplest way of calculating the similarity (or distance) among time series is to compute the Euclidean distance directly, i.e., on the original series. For a small dataset this may be feasible; however, for large datasets efficiency is a problem, since the time complexity is $O(N \cdot n)$, where $n$ is the number of features that need to be represented for each time series and $N$ is the number of time series in the dataset. In order to compute efficiently without significantly affecting the accuracy, many techniques of dimensionality reduction (as introduced in Section 2.1) have been suggested.

In addition to the computational complexity associated with the Euclidean distance calculation on the original time series, we cannot always be sure that the nearest neighbors in Euclidean space are indeed the most similar ones. This is because the point-based information model (computing similarity based on every point) contains only low-level features of the time series and is vulnerable to different kinds of shape transformations, such as shifting and scaling. Under such circumstances, it would be better if we could find some high-level features and apply a more robust information retrieval model for time series analysis.

Based on this idea, we introduce a new framework that uses key subsequences to represent time series and facilitate similarity retrieval. This framework consists of the following main components: 1) Codebook generation from a set of training samples; 2) Time series encoding using the codebook; 3) Time series feature representation and retrieval. This framework is similar to the key block framework suggested by Zhang et al. [33] for content-based image retrieval. In the time sequences domain the idea was introduced in [26]. However, in order to keep both local and global information and improve the accuracy, we introduce the use of multiple codebooks with different resolutions.
For each resolution, Vector Quantization [12, 24] is applied to discover the vocabulary of subsequences in a time series database. In VQ a codeword (or codevector) is used to represent a number of similar vectors. More precisely, a vector quantizer $Q$ of dimension $n$ and size $s$ is a mapping $Q: \mathbb{R}^n \to C$ from a vector or a point in $n$-dimensional Euclidean space, $\mathbb{R}^n$, to a finite set $C = \{c_1, c_2, \ldots, c_s\}$, the codebook, containing $s$ output or reproduction points $c_i \in \mathbb{R}^n$, called codewords. Associated with every $s$-point VQ is a partition of $\mathbb{R}^n$ into $s$ regions or cells $R_i$ for $i \in J = \{1, 2, \ldots, s\}$, where $R_i = \{x \in \mathbb{R}^n : Q(x) = c_i\}$. For a given distortion function$^1$ $d(x, c_i)$ (such as the mean squared error (MSE)) between an input vector $x$ and a codeword $c_i$, an optimal mapping should satisfy two conditions:

(a) Nearest Neighbor Condition (NNC): For a given codebook, the optimal partition $R = \{R_i : i = 1, 2, \ldots, s\}$ satisfies $R_i = \{x : d(x, c_i) \le d(x, c_j),\ \forall j\}$, where $c_i$ is the codeword representing partition $R_i$. Given a point $x$ in the dataset, the encoding function for $x$ is $\mathrm{Encoding}(x) = c_i$ only if $d(x, c_i) \le d(x, c_j)\ \forall j$.

(b) Centroid Condition (CC): For a given partition region $\{R_i : i = 1, \ldots, s\}$, the optimal reconstruction vector (codeword) satisfies $c_i = \mathrm{centroid}(R_i)$, where the centroid of a set $R = \{x_i : i = 1, \ldots, |R|\}$ is defined as $\mathrm{centroid}(R) = \frac{1}{|R|} \sum_{i=1}^{|R|} x_i$.

Figure 1. The Generalized Lloyd Algorithm (GLA).

The Generalized Lloyd Algorithm (GLA) [24, 25] is an iterative procedure that produces a locally optimal codebook from a training set based on these two conditions (which together form the Lloyd iteration). This is done during a training phase. The main structure of GLA is given in the flowchart (see Figure 1). Starting with an initial codebook, the GLA algorithm repeats the Lloyd iteration until the fractional drop of the distortion becomes less than a given threshold. This process is guaranteed to converge, since from the necessary conditions for optimality each application of the Lloyd iteration must reduce or leave unchanged the average distortion [12].
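The Lloyd iteration described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `lloyd_gla`, the squared-error distortion, the default threshold of `1e-3`, the iteration cap, and the choice to leave the codeword of an empty cell unchanged are all assumptions made for the example.

```python
import numpy as np

def lloyd_gla(train, codebook, threshold=1e-3, max_iter=100):
    """Repeat the Lloyd iteration until the fractional drop in distortion is small.

    train:    (num_vectors, dim) training vectors
    codebook: (s, dim) initial codewords
    Returns the final codebook plus the labels and average distortion
    computed in the last iteration.
    """
    train = np.asarray(train, dtype=float)
    codebook = np.array(codebook, dtype=float)  # copy, so the caller's data is untouched
    prev = np.inf
    for _ in range(max_iter):
        # Nearest Neighbor Condition: assign every vector to its closest codeword.
        d2 = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        distortion = d2[np.arange(len(train)), labels].mean()
        # Centroid Condition: move every codeword to the centroid of its cell.
        for i in range(len(codebook)):
            members = train[labels == i]
            if len(members) > 0:  # assumption: an empty cell keeps its codeword
                codebook[i] = members.mean(axis=0)
        # Stop when the fractional drop of the distortion falls below the threshold.
        if np.isfinite(prev) and (prev - distortion) <= threshold * prev:
            break
        prev = distortion
    return codebook, labels, distortion

# Two well-separated groups of 2-D training vectors (invented data).
train = [[0, 0], [0, 1], [10, 10], [10, 11]]
codebook_out, labels_out, distortion_out = lloyd_gla(train, [[0, 0], [10, 10]])
print(codebook_out)  # codewords at (0, 0.5) and (10, 10.5)
```

Because each step can only lower or preserve the average distortion, the loop ends at a locally optimal codebook; the result still depends on the initial codebook that is passed in.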
To quantitatively measure the similarity between different time series encoded with a VQ codebook, we employ the Histogram Model (HM) that has been successfully applied in image retrieval [33]. We present this model in the context of time series analysis:

$$S_{HM}(q, t) = \frac{1}{1 + dis(q, t)} \qquad (1)$$

where

$$dis(q, t) = \sum_{i=1}^{s} \frac{|f_{i,t} - f_{i,q}|}{1 + f_{i,t} + f_{i,q}}.$$

$^1$ The distortion is a measure of overall quality degradation due to approximation of a vector by its closest representative from a codebook.
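As a worked instance of Eq. (1), the sketch below computes the Histogram Model similarity between two codeword-frequency vectors. It assumes that the numerator of $dis(q, t)$ is the absolute frequency difference; the function name `hm_similarity` and the frequency vectors are invented for the example.

```python
import numpy as np

def hm_similarity(f_q, f_t):
    """Histogram Model similarity of Eq. (1) for two codeword-frequency vectors."""
    f_q = np.asarray(f_q, dtype=float)
    f_t = np.asarray(f_t, dtype=float)
    # dis(q, t): per-codeword absolute frequency difference, damped by 1 + f_t + f_q.
    dis = np.sum(np.abs(f_t - f_q) / (1.0 + f_t + f_q))
    return 1.0 / (1.0 + dis)

same = hm_similarity([2, 1, 0], [2, 1, 0])       # identical histograms
different = hm_similarity([2, 1, 0], [0, 1, 2])  # dis = 2/3 + 0 + 2/3 = 4/3
print(same)       # 1.0
print(different)  # 3/7, about 0.4286
```

The similarity equals 1 for identical histograms and decreases towards 0 as the frequency vectors diverge.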

In the formula, $f_{i,t}$ and $f_{i,q}$ refer to the appearance frequency of codeword $c_i$ in time series $t$ and $q$, respectively. Although this model focuses on the appearance of individual key subsequences in time series, correlation between key subsequences can also be addressed [33]. Information about some alternative models can be found in Appendix A.

3. Proposed Method: MVQ

We propose a new method to represent time series data, the Multiresolution Vector Quantized (MVQ) approximation, along with a new distance function. The method partitions each time series into equi-length segments and represents each segment with the most similar key subsequence from a codebook. The codebook is generated earlier, during a training phase, using VQ. By counting the appearance frequency of each codeword in each time series a new representation is obtained. The piecewise approximation with VQ encoding is applied at several resolutions. Table 1 gives a brief description of the notation we use in the rest of the paper. In the following subsections, we introduce the components of our method.

Table 1. Symbol Table

Symbol   Description
X        Original time series, $X = x_1, x_2, \ldots, x_n$, of length $n$
X'       Encoded form of the original time series, $X' = f_1, f_2, \ldots, f_s$
N        Number of time series in the dataset
n        Length of original time series
C        Codebook: a set of codewords $\{c_1, \ldots, c_k, \ldots, c_s\}$
c        Number of resolution levels
s        Size of codebook
l        Length of codeword

3.1 Codebook Generation

For a given dataset, a codebook with $s$ codewords $C = \{c_1, c_2, \ldots, c_s\}$ is first generated using a clustering algorithm (such as the GLA introduced in Section 2). We apply this algorithm to generate the codebook based on the dataset $T$ of time series. The dataset is preprocessed before the generation of the codebook: each time series in $T$ is partitioned into a number of segments, each of length $l$, and each segment forms a sample of the training set that is used to generate the codebook. Each codeword in the codebook corresponds to a key subsequence; it is an approximation for a certain group of subsequences of length $l$.
All the time series in the database are then encoded using the codebook (see Section 3.2). The version of GLA we use requires a partition split mechanism to solve the initial codebook generation problem. The algorithm starts with a codebook containing only one codeword, the centroid of the whole training set. In each repetition, and before the application of the Lloyd iteration, it doubles the number of codewords (and cells) from the previous iteration by splitting the most populous cells. Table 2 shows some of the codewords (at two different levels) used by MVQ to represent the Control Chart dataset (SYNDATA) [30].

Table 2. Codewords of a 2-level codebook that are used to represent SYNDATA in the MVQ approximation.

3.2 Time Series Encoding

After a codebook is generated, we can form a new representation for each time series in the dataset. In the process of encoding, every series is decomposed into segments (i.e., subsequences) of length $l$ (which is equal to the length of each codeword). For each segment, the closest codeword in the codebook (based on a distance metric) is then found and the corresponding index is used to represent this segment. After finding the corresponding codeword index for each segment, the appearance frequency of each codeword is counted. The new representation of a time series is a vector $X' = f_1, f_2, \ldots, f_s$ showing the appearance frequency of every codeword. By applying this new encoding form, we can easily deal with time series with an arbitrarily large number of points, since we can always reduce their dimensionality to a rather small number given by the size, $s$, of the codebook.

3.3 Time Series Summarization

Besides achieving dimensionality reduction, this encoding process also provides a very nice summarization of the time series, which is useful in many applications. Table 2 shows different codewords we obtain using this method; these codewords stand for the most representative subsequences (of a given length) for the entire time series dataset. Instead of the whole time series, we may be more interested in the usage of representative key subsequences.
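The encoding step of Section 3.2 can be illustrated with a short sketch: cut the series into segments of the codeword length, replace each segment by the index of its nearest codeword, and count how often each index occurs. The function name `encode`, the squared Euclidean distance, the toy codebook, and the decision to drop trailing points that do not fill a whole segment are assumptions of this example.

```python
import numpy as np

def encode(series, codebook):
    """Return (codeword index per segment, appearance frequency per codeword)."""
    series = np.asarray(series, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    s, l = codebook.shape                 # s codewords, each of length l
    m = len(series) // l                  # whole segments only (assumption: drop the remainder)
    segments = series[: m * l].reshape(m, l)
    # Squared Euclidean distance from every segment to every codeword.
    d2 = ((segments[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)               # nearest codeword index for each segment
    freq = np.bincount(idx, minlength=s)  # the frequency vector f_1, ..., f_s
    return idx, freq

# Hypothetical codebook with s = 3 codewords of length l = 2.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
series = [0.0, 0.0, 1.0, 1.0, 0.1, 0.9, 0.0, 0.1]
idx, freq = encode(series, codebook)
print(idx.tolist())   # [0, 1, 2, 0]
print(freq.tolist())  # [2, 1, 1]
```

Whatever the length of the input series, the frequency vector has exactly $s$ entries, which is the dimensionality reduction described above.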
This is very useful in the discovery of motifs or approximately repeated subsequences in time series [22]. In this case, we can just check the appearance frequencies of these codewords and get an overview of the time series. For example, in Figure 2, we show a time series representation using a number of codewords. Two of these codewords are used twice, revealing a pattern that would remain undetected using previous

techniques. Results on time series summarization are presented later in the paper.

Figure 2. A time series (bottom) is represented as a sequence of representative subsequences, i.e., codewords (top). Two codewords (#3 and #5) are used twice in this representation.

3.4 Distance measure and a multiresolution representation

Based on the frequency of appearance of key subsequences within time series, features of time series are extracted, forming a new representation of a rather small dimensionality, and similarity retrieval can be performed efficiently. We still need a distance measure appropriate for this new representation. We choose the Histogram Model as the distance measure, and all the experimental results presented in Section 4 are based on it. By applying the histogram model, it is not difficult to identify the time series that are similar to a given query (i.e., that have similar frequent patterns).

However, using only one codebook (analysis at a single resolution) introduces some problems that cannot be ignored. First, although the local information of a time series is kept after the encoding process, the new representation of a time series does not record the order among the indices of different codewords. Some important global information of the time series is lost in this representation. In Figure 3, we see two different time series whose encoded representations are the same (2, 1). This problem in the key subsequence representation correspondingly increases the number of false alarms, reducing the performance of the single resolution (i.e., single codebook) method. On the other hand, in real applications, it is not always easy to find a suitable resolution (correspondingly, a suitable codeword length). Moreover, an inappropriate codeword length may reduce the efficiency. In order to solve these potential problems occurring due to the use of a single resolution, we introduce a hierarchical mechanism, which involves several different resolutions for encoding.
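The loss of ordering information can be reproduced with a tiny example. The two index sequences below are invented (they are not the series of Figure 3); they only show how two different codeword orderings collapse to the same frequency vector (2, 1).

```python
import numpy as np

# Two hypothetical encodings over a codebook of size 2: the same codewords
# are used, but in a different order along the time axis.
seq_a = [0, 0, 1]
seq_b = [0, 1, 0]

hist_a = np.bincount(seq_a, minlength=2)
hist_b = np.bincount(seq_b, minlength=2)
print(hist_a.tolist(), hist_b.tolist())  # [2, 1] [2, 1]
```

A single-resolution histogram cannot tell these two series apart, which is the source of the false alarms mentioned above.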
While the encoding at higher resolution pays more attention to the detail of local information, that at lower resolution represents more global information. The piecewise approximation with VQ encoding is applied at several resolutions. For each resolution this is done by grouping a different number of consecutive segments together, i.e., the length of the segment at a given resolution is a multiple (usually double) of the length of the segment at the immediately higher resolution. Thus, we call this representation the Multiresolution Vector Quantized (MVQ) approximation. Figure 4 shows a time series and its reconstruction using different resolutions. (For the different resolution levels, the sizes of the codebooks are the same, 32, and the lengths of codewords are 128, 64, 32, 16, respectively.) By assigning reasonable weights to different resolutions, we define a new weighted similarity measure, the Hierarchical Histogram Model:

$$S_{HHM}(q, d_j) = \sum_{i=1}^{c} w_i \cdot S^{i}_{HM}(q, d_j) \qquad (2)$$

where $c$ is the number of resolution levels, $w_i$ is the weight assigned to level $i$, and $S^{i}_{HM}$ is the similarity of Eq. (1) computed at level $i$.

Figure 3. Necessity of multiresolution representation: different series with the same encoded representation.

Figure 4. Reconstruction of time series using different resolutions.

3.5 Parameters of MVQ

Here we discuss in more detail the parameters of MVQ and how to choose their values. For the number $c$ of resolution levels an intuitive choice is $c = \log_2 n$, with the length of a codeword at the $i$-th level being $2^{i-1}$ ($1 \le i \le \log_2 n$). However, when the codeword is too short (e.g., of length 1 or 2), this becomes meaningless. Thus, we need to set a minimum value of codeword length $l_{min}$ and set the number of hierarchical levels as $c = \log_2(n / l_{min}) + 1$. The codeword length ($l$) for each level is chosen as follows: at the first level, each time series is treated as a whole ($l = n$); at the second level, each time series is partitioned into two parts ($l = n/2$); and at the $i$-th level, $l = n / 2^{i-1}$. In cases where $n$ is not a power of two we satisfy this constraint approximately.
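The level structure and the weighted combination of Eq. (2) can be sketched as follows. The function names `level_lengths` and `hhm_similarity` are hypothetical, rounding the level count down and using integer division for the lengths is one possible reading of "approximately" when $n$ is not a power of two, and the per-level similarity values in the usage example are invented.

```python
import math

def level_lengths(n, l_min):
    """Codeword length per level: l = n / 2**(i-1) for i = 1, ..., c."""
    c = int(math.log2(n / l_min)) + 1  # assumption: round down for non-powers of two
    return [n // 2 ** (i - 1) for i in range(1, c + 1)]

def hhm_similarity(level_sims, weights):
    """Eq. (2): weighted sum of the per-level Histogram Model similarities."""
    return sum(w * s for w, s in zip(weights, level_sims))

lengths = level_lengths(128, 16)
print(lengths)  # [128, 64, 32, 16]

# Invented per-level similarities, combined with equal weights.
combined = hhm_similarity([0.5, 0.8, 0.9, 0.7], [1, 1, 1, 1])
```

With $n = 128$ and $l_{min} = 16$ this reproduces the four codeword lengths quoted for Figure 4. The weights are not normalized here, which does not change the ranking of candidates for a fixed weight vector.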
The size of the codebook at each resolution level is data dependent, since the more subsequences used during the training process and the higher their variability, the

larger the size of the codebook needed. In fact, the higher the number of partitions and the number of codewords, the better the approximation, but also the more computation and space are needed. So, there is a tradeoff between efficiency and accuracy of approximation. In practice (as also shown in our experiments in Section 4), the use of a rather small codebook can achieve very good results. In addition to the number of codewords, the Lloyd algorithm uses a threshold to stop the iterations when the fractional drop of the distortion between consecutive iterations reaches a certain point. A common value for this threshold is […].

Our experiments show that a multiresolution representation achieves much higher accuracy than a single resolution one. The price for this improvement is slightly more computation, since we have to calculate the similarity at each resolution level before we can finally compute $S_{HHM}$. In our experiments we studied the behavior of the multiresolution approach with different weights assigned to each resolution level. Lacking any information or prior knowledge about the domain (i.e., the most realistic case), the straightforward solution is to use equal weights for all resolutions. This choice provides the best results in almost all of the experiments we performed. The proposed method also provides the ability to include prior domain knowledge in the selection of the weights.

4. Experiments

In time series similarity analysis, best match retrieval and clustering are two of the most common and important applications. We performed experiments to evaluate the effectiveness and efficiency of our method in these two applications. We address the following issues: (a) how accurate the method is, (b) how it compares to alternatives, and (c) how fast and scalable it is. We start with a description of the datasets we used in our experiments.

4.1 Datasets

In the experiments presented in this section, one synthetic and two real datasets are involved. We used the Control Chart synthetic dataset (SYNDATA), which is downloadable from the UCI KDD archive [30].
This dataset contains 600 examples of control charts (each has 60 points) synthetically generated by the process in Alcock and Manolopoulos [3]. The time series belong to six different classes of control charts: Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, and Downward shift, with each class having 100 time series.

The first real dataset, CAMMOUSE, is a spatiotemporal dataset of 5 words obtained using the Camera Mouse Program [5]. The 2D time series obtained represent the X and Y position of a human tracking feature (e.g., tip of finger). In conjunction with a spelling program the user can write various words, and the transitions of the tracking feature or word image's profiles are recorded. We used 3 recordings of 5 words. The 5 words were: Athens, Berlin, Boston, London, and Paris. For simplicity, only the x-values are considered. The average length of sequences in this dataset is 1100 points. The shortest one is 834 points and the longest one is 1719 points. Since the length of sequences varies for different instances, we stretched all sequences to the same length of 1600 points.

The second real dataset, RTT, consists of RTT (packet round trip time) measurements from UCR to CMU with a sending rate of 50 msec for a day (Feb 10, 2002, starting at 8:20pm). The total number of RTT values is 1,728,000. The dataset was partitioned into 24 time series of length 72,000, each standing for an hour of RTT measurements. These measurements vary between 70 and 150. For clustering experiments we separated the time series into the following three classes based on the ratio of time where the RTT value is greater than 100: (a) heavy traffic hours: ratio > 0.5 (6 series), (b) medium traffic hours: 0.5 > ratio > 0.1 (7 series) and (c) light traffic hours: ratio < 0.1 (11 series).

In order to avoid the effects of scaling and shifting in the analysis, before we actually perform any experiment, we preprocess the datasets with zero-mean normalization.
That is, each time series $X$ is normalized as $X_{norm} = (X - \bar{X}) / \sigma(X)$, where $\bar{X}$ is the mean value of $X$ and $\sigma(X)$ is its standard deviation. For the RTT dataset we take logarithms before we apply the normalization.

4.2 Best Match Searching

Experiment design

The best match searching is defined as follows: given a query sequence, find the best $k$ matches in the database (i.e., those having the lowest dissimilarity with the query) or find all the time series whose dissimilarity with the query is below some predefined threshold. In order to evaluate the performance of different approaches in best match searching, we need an evaluation metric.

Definition 3. For a given query, the set of time series which are actually within the same class as the query (given our prior knowledge) is taken as the standard set ($std\_set(q)$), and the results found by different approaches ($kNN(q)$) are compared with this set. The matching accuracy is defined as:

$$\mathrm{Accuracy} = \frac{|kNN(q) \cap std\_set(q)|}{k} \times 100\% \qquad (3)$$

In the definition above, $kNN(q)$ is the set of $k$ nearest neighbors for the query found by a certain method. In our experiments, every time series in the dataset is treated as a query, and the best $k$ matches ($k$ nearest neighbors) are sought within the whole dataset. The average accuracy of a certain method is then calculated based on the matching

results taking each time series as a query. The actual value of $k$ we use depends on the number of time series within the same class. In our experiments, the value of $k$ can vary, but for the purpose of demonstration, we just show the results when $k$ is set to the number of time series within the same class.

Experiments on SYNDATA

In this section, we show the results of the experiments performed on the SYNDATA dataset. The experimental parameters for different resolution levels are given in Table 3. With the increase of resolution, the codeword length decreases and the size of the codebook increases (since there are more training samples available for that resolution). The experimental results on SYNDATA are shown in Table 4. The first element in the weight vector represents the weight assigned to the first level, the second element the weight assigned to the second level, and so on (e.g., with a weight vector [ ], only the first level is involved in distance calculations). Accuracy is defined based on Eq. (3). The experimental results clearly demonstrate the effect of using a multiresolution approach: the combination of multiple resolutions dramatically improves the matching accuracy over the single resolution approach.

Table 3. Experiment parameters for SYNDATA (MVQ parameters: Level, l, s).

Table 4. Matching accuracy on SYNDATA

Method            Weight Vector   Accuracy
Single level VQ   [ ]             0.55
                  [ ]             0.70
                  [ ]             0.65
                  [ ]             0.48
                  [ ]             0.46
MVQ               [ ]             0.83
Euclidean                         0.51

To show the effectiveness of the proposed representation and distance metric, we applied the plain Euclidean distance (naïve method) to the same dataset; this method directly computes the Euclidean distance to measure the similarity between time series. From Table 4 we can conclude that for this dataset, the naïve method does worse than most of the single level VQ approximations, while MVQ provides a much better matching accuracy.

Table 5. Experiment parameters for CAMMOUSE data (MVQ parameters: Level, l, s).

Experiments on CAMMOUSE

We performed similar experiments as with the SYNDATA dataset.
The experiment parameters and results are shown in Tables 5 and 6, respectively.

Table 6. Matching accuracy on CAMMOUSE data

Method            Weight Vector   Accuracy
Single level VQ   [ ]             0.56
                  [ ]             0.60
                  [ ]             0.44
                  [ ]             0.56
                  [ ]             0.60
MVQ               [ ]             0.83
Euclidean                         0.58

From Table 6, it is clear that for the CAMMOUSE dataset, the hierarchical mechanism also helps to improve the accuracy obtained with a single resolution level. Compared with the average matching accuracy of the plain Euclidean method, the retrieval accuracy of MVQ is much better (25 percentage points higher).

Comparison with other methods

In order to compare the efficiency and accuracy of MVQ in similarity searches, we considered alternative methods including the Discrete Fourier Transform (DFT), plain Euclidean, Dynamic Time Warping (DTW) and Symbolic Aggregate approXimation (SAX) [23]. For evaluation and comparison, every time series in the dataset is taken as a query, and the precision and recall pairs corresponding to the top $1, 2, 3, \ldots, k$ retrieved time series are calculated. Then the average value of precision and recall is computed for the whole dataset. The actual value for $k$ is different for different methods. For DFT, SAX and MVQ, some parameters need to be set up for the experiments. For DFT, we take the first 16 non-zero coefficients; for SAX the number of segments is set to 15 (SYNDATA) or 16 (CAMMOUSE) and the codebook size is set to 16. For MVQ we take the same codebook sizes as in the previous subsection for the 5 resolution levels and use [ ] as the weight vector. Figure 5 shows the precision-recall performance on SYNDATA and CAMMOUSE. Notice that for a fixed recall ratio, the fewer time series are retrieved the better, and subsequently the higher the precision is. For both

datasets the precision decreases quickly with plain Euclidean, DFT, SAX and DTW, while the precision with MVQ stays at a high level. MVQ achieves the best performance on these datasets. When the time series are short (as in the case of SYNDATA), MVQ's need for more space due to the multiple codebooks is noticeable. However, MVQ is the best distance function and provides the best accuracy. An interesting observation is that in most cases, even with only one layer, our distance measure can provide comparable or even better results than the other methods. Later in the experiments, we restrict the space requirements of MVQ so that they are comparable to those of the other methods.

Figure 5. Precision-recall for different methods (a) on SYNDATA (b) on CAMMOUSE.

Besides accuracy, other considerations for a good method should include speed and scalability. Figure 6 shows the processing time of different methods on datasets of various sizes. The experimental settings for the different methods are the same as before. DFT shows the best processing efficiency with the shortest time, but considering the poor accuracy shown in Figure 5, it should not be taken as a good candidate. In comparison to the other methods we considered here, although the encoding of the query consumes some time, MVQ outperforms them all in speed when the database size is not too small. Notice that the time reported here for MVQ does not include the preprocessing needed during the training phase to obtain the codebook(s) for a dataset. A brief discussion about the preprocessing cost can be found in Appendix B.

Figure 6. Processing time and scalability.

4.3 Clustering experiments

Table 7. Clustering accuracy of MVQ on SYNDATA

Method            Weight Vector   Accuracy
Single level VQ   [ ]             0.69
                  [ ]             0.71
                  [ ]             0.63
                  [ ]             0.51
                  [ ]             0.49
MVQ               [ ]             0.82
DFT                               0.67
SAX                               0.65
DTW                               0.80
Euclidean                         0.55

Experiment design. For time series clustering, we conducted experiments on both synthetic and real datasets. The PAM (Partitioning Around Medoids) clustering algorithm was used to cluster the original time series in every dataset.
Because each method computes distances differently, the resulting distance matrices for the time series differ, and so do the clustering results. In order to evaluate the clustering accuracy and quality of our approach, a cluster similarity metric was used. Given two clusterings, G = G_1, G_2, ..., G_k (the true clusters) and A = A_1, A_2, ..., A_k (the clustering result of a certain method), the clustering accuracy is evaluated with the cluster similarity defined as:

$$\mathrm{Sim}(G, A) = \frac{\sum_{i=1}^{k} \max_{j} \mathrm{Sim}(G_i, A_j)}{k} \qquad (4)$$

where

$$\mathrm{Sim}(G_i, A_j) = \frac{2\,|G_i \cap A_j|}{|G_i| + |A_j|}.$$

This metric was introduced in [11] to evaluate clustering results and was also used in [17]. The metric value ranges between 0 and 1, and it takes its maximal value, i.e. 1, when the clustering result is perfect. For each dataset, we used the same experiment parameters as in Section 4.1. Considering the stochastic nature of the PAM algorithm, for a given set of parameters each experiment was repeated 10 times, and the average result is reported here. For the purpose of comparison, clustering results with other methods are also provided.

Experiments on SYNDATA dataset. Taking the same parameters as shown in Table 3, clustering experiments were performed on the SYNDATA dataset. The experimental results are listed in Table 7. Clustering performance of the other methods is also reported. It is clear that for this dataset we cannot achieve satisfactory performance using the Euclidean distance as the distance metric, while the suggested method is very promising. The performance achieved by several single resolution levels of the VQ approximation is better than that of the naïve method (Euclidean on the original time series) and comparable to or better than that of the other methods. By combining different resolution levels, the clustering result is further improved. For completeness we compared a multiresolution implementation of SAX to MVQ. We used 5 resolution levels with the number of segments set to 2, 3, 6, 30 and 60 respectively. The accuracies of SAX with different resolutions vary between 0.54 and . However, when we tried to combine the distance measurement in all resolution levels, the accuracy was . Since SAX already encodes the order of segments in the original time series, the use of multiple resolution levels does not improve the accuracy of the representation and its performance.

Experiments on CAMMOUSE dataset. The experimental parameters for the CAMMOUSE dataset are the same as in Table 5. Table 8 displays the results for MVQ with different weight vectors and the results of the other methods. Again, the performance of the plain Euclidean distance is poor, while MVQ provides much better clustering results. Its performance is also superior to the other methods we tested. Observe again that even with only one layer, our distance measure can provide comparable or even better results than the others (in this case MVQ has similar space requirements to the other methods).

Table 8. Clustering accuracy of MVQ on CAMMOUSE

Method            Weight Vector   Accuracy
Single level VQ   [ ]             0.61
                  [ ]             0.60
                  [ ]             0.59
                  [ ]             0.63
                  [ ]             0.62
MVQ               [ ]             0.79
DFT                               0.62
SAX                               0.58
DTW                               0.69
Euclidean                         0.61

Experiments on the RTT dataset. For MVQ we used 5 different layers (1-5) with 3, 8, 8, 16, and 16 codewords respectively. This is a total of 51 codewords. We used the same number of parameters for DFT and SAX. Table 9 compares the clustering accuracy of MVQ with that of the other methods. An important observation here is that we do not need to take all layers into consideration to get the best performance. The reason is that when the different resolution levels do not carry uniformly rich information, the involvement of less informative levels will reduce the overall accuracy.
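The cluster similarity of Eq. (4), used for all clustering accuracies reported in this section, can be sketched as follows. This is a minimal illustration assuming each cluster is given as a non-empty collection of item identifiers; the function name `cluster_similarity` and the toy clusterings are hypothetical.

```python
def cluster_similarity(true_clusters, found_clusters):
    """Sim(G, A): average over true clusters G_i of max_j Sim(G_i, A_j)."""
    true_sets = [set(g) for g in true_clusters]
    found_sets = [set(a) for a in found_clusters]
    total = 0.0
    for g in true_sets:
        # Sim(G_i, A_j) = 2 |G_i ∩ A_j| / (|G_i| + |A_j|), maximized over j.
        total += max(2 * len(g & a) / (len(g) + len(a)) for a in found_sets)
    return total / len(true_sets)

# Toy usage: item 2 is assigned to the wrong cluster.
G = [{0, 1, 2}, {3, 4, 5}]
A = [{0, 1}, {2, 3, 4, 5}]
print(cluster_similarity(G, A))
print(cluster_similarity(G, G))
```

In the toy example the first true cluster is best matched with similarity 4/5 and the second with 6/7, so the score is their mean; comparing a clustering with itself gives 1.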
Furthermore, the study at different single resolution levels can help us identify the importance of different layers in discriminating among classes.

Table 9. Clustering results on RTT (with the same space requirements for MVQ as for the other methods)

Method            Weight Vector   Accuracy
Single level VQ   [10000]         0.55
                  [01000]         0.52
                  [00100]         0.57
                  [00010]         0.80
                  [00001]         0.79
MVQ               [00011]         0.81
                  [11111]         0.60
DFT                               0.54
SAX                               0.54
DTW                               0.62
Euclidean

Summarizing time series

Here, we present results from applying MVQ to summarize time series. We consider the SYNDATA dataset. To help in evaluating the summarization capabilities of the proposed approach, in Figure 7 we present a few typical time series that we manually extracted from each of the six classes.

Figure 7. Representative time series extracted manually from the SYNDATA dataset.

Table 10 shows how the codewords of the first codebook are used to represent each class at the first level (of resolution). The actual codewords are displayed in Figure 8. The first number in each cell of Table 10 shows how the usage of a codeword (row) is distributed across classes (we show percentages). These numbers add up to 100 for each row (codeword). The second number in each cell shows the usage (in percentages) of all codewords for a certain class (column). They add up to 100 for each column.

Table 10. The codewords (c: 1-6) used to represent each one of the 6 classes of SYNDATA at level 1.

c   class1    class2    class3    class4    class5    class6
1   0, 0      100, 31   0, 0      0, 0      0, 0      0, 0
2   96, 100   4, 4      0, 0      0, 0      0, 0      0, 0
3   0, 0      0, 0      0, 0      50, 100   0, 0      50, 100
4   0, 0      0, 0      50, 100   0, 0      50, 100   0, 0
5   0, 0      100, 40   0, 0      0, 0      0, 0      0, 0
6   0, 0      100, 25   0, 0      0, 0      0, 0      0, 0

One can make the following observations about the representation of classes at this level (the coarser approximation). For all time series in class 1 (normal) only the 2nd codeword is used, and only class 2 (cyclic) time series use the same codeword (rarely, though). The 2nd codeword is indeed very representative of the time series in class 1. Time series in class 2 make comparable use of codewords 1, 5, and 6 (31%, 40% and 25% of their codewords, respectively) while they rarely use codeword 2. Since class 2 is the cyclic one, this makes a lot of sense. One could have a concise representation by just looking at the codewords and the frequency of their use in different classes. Classes 3 (increasing trend) and 5 (upward shift) make equal use of the 4th codeword, and no other class uses this codeword. They both have an increasing trend, so this summarizes them very well. Similarly, for classes 4 (decreasing trend) and 6 (downward shift) the 3rd codeword is used and no other class uses this codeword. At this first level we cannot discriminate between classes 3 and 5 or between classes 4 and 6.

Figure 8. The codewords used to represent time series of the SYNDATA dataset at the first level.

The second level, though, provides more detail, enabling the discrimination between classes 3 and 5 and between classes 4 and 6. Table 11 shows how the codewords of the second codebook are used to represent each class at the second level. The actual codewords are displayed in Figure 9. Please note that codeword numbers correspond to different codewords (not the same codewords as for level 1). Time series in class 5 make heavy use of codeword 12, which indeed represents the upward shift. This is not the case for class 3, which instead makes heavy use of codewords 1 and 9. Similarly, class 6 makes heavy use of codeword 15, which indeed represents the downward shift. Class 4 uses codeword 15 very rarely. The tables for the other levels are not shown here due to space limitations. They are also not very useful for summarization of this particular dataset, since most of the useful summarization information is extracted from the first two levels.

Figure 9. The codewords used to represent time series of the SYNDATA dataset at the second level.

Table 11. The codewords (c: 1-16) used to represent each one of the 6 classes of SYNDATA at level 2.

c    class1   class2   class3   class4    class5    class6
1    2, 1     0, 0     94, 19   0, 0      2, 1      2,
2      , 51   10, 6    0, 0     0, 0      0, 0      0, 0
3    1, 1     0, 0     0, 0     3, 1      56, 21    40,
4      , 0    0, 0     96, 48   0, 0      3, 1      1, 1
5    0, 0     0, 0     37, 5    0, 0      63, 9     0,
6      , 5    89, 39   0, 0     0, 0      0, 0      0, 0
7    0, 0     0, 0     0, 0     100, 45   0, 0      0, 0
8    0, 0     0, 0     1, 1     3, 2      46, 28    5,
9      , 0    0, 0     98, 26   0, 0      2, 1      0,
10     , 39   22, 11   0, 0     0, 0      0, 0      0,
11     , 0    0, 0     0, 0     12, 3     25, 7     63,
12     , 0    0, 0     7, 1     0, 0      93, 21    0,
13     , 0    0, 0     0, 0     0, 0      100, 11   0,
14     , 2    96, 44   0, 0     0, 0      0, 0      0,
15     , 0    0, 0     0, 0     3, 1      0, 0      97,
16     , 1    0, 0     0, 0     70, 48    0, 0      29, 20

These results demonstrate the ability of MVQ to provide a summarization of time series datasets. This is possible due to the symbolic and multiresolution nature of the representation.

5. Discussion

The MVQ approach that we proposed for representing time series data in order to make their analysis more efficient is a natural extension of the piecewise constant approximation schemes proposed earlier. By applying Vector Quantization to extract high-level features of the data and by involving a multiresolution approach we were able to identify a vocabulary of subsequences of various lengths and improve performance and efficiency in time series similarity retrieval. We were especially successful in domains where we could not achieve good results using the Euclidean distance as the similarity metric. In addition, the new representation is very useful in summarizing time series by providing typical patterns observed at different resolutions. We presented the main idea of an approach to represent time series along with a new distance function that outperformed previous distance functions in our experiments and is fast to compute. There are many variations of this approach, including the use of sliding windows, non-rigid borders for subsequences, and different rules for assigning weights to different resolutions. These are directions in which this work can be extended.
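The two percentages reported in each cell of the codeword usage tables (Tables 10 and 11) can be derived from a matrix of raw usage counts, as sketched below. This is a minimal illustration assuming `counts[c][j]` holds how many segments of class `j` were encoded with codeword `c`; the function name `usage_percentages` and the toy counts are hypothetical and are not the SYNDATA data.

```python
import numpy as np

def usage_percentages(counts):
    """Return (row_pct, col_pct) for a codeword-by-class count matrix."""
    counts = np.asarray(counts, dtype=float)
    row_tot = counts.sum(axis=1, keepdims=True)  # total usage per codeword
    col_tot = counts.sum(axis=0, keepdims=True)  # total usage per class
    # First number in a cell: share of a codeword's usage that falls in each class.
    row_pct = 100 * np.divide(counts, row_tot,
                              out=np.zeros_like(counts), where=row_tot > 0)
    # Second number in a cell: share of a class's usage that goes to each codeword.
    col_pct = 100 * np.divide(counts, col_tot,
                              out=np.zeros_like(counts), where=col_tot > 0)
    return row_pct, col_pct

# Toy usage: 2 codewords (rows) by 2 classes (columns).
row_pct, col_pct = usage_percentages([[0, 10], [24, 1]])
print(row_pct)
print(col_pct)
```

Rows of `row_pct` and columns of `col_pct` each sum to 100 whenever the corresponding codeword or class is used at all, matching the reading of the tables given in the text.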
Another interesting problem is related to the size of the codebook. When we generate the codebooks for different resolutions, the size of each codebook affects the performance of encoding. The more codewords at a given resolution, the better the approximation, but the efficiency of the method decreases. Future studies include looking into these tradeoffs in more detail.

6. Conclusions

In this paper we introduced a new symbolic representation of time series, MVQ, along with a new distance function that outperformed the major competitors in our experiments. By partitioning a sequence into equal-length segments and using vector quantization to represent each sequence by the appearance frequencies of key subsequences, MVQ provides a more meaningful similarity metric for many domains, in addition to the improvement in efficiency that comes from dimensionality reduction, especially in the case of long sequences. Moreover, using a multiresolution approach, MVQ can record both local and global information about time series, which further improves the robustness of the similarity calculation while requiring little more computation than a single resolution approach. The experimental evaluation of the proposed method showed that it outperforms current state-of-the-art methods in clustering and similarity searches. This is due to the following: (a) it exploits prior knowledge about the data, (b) it takes multiple resolutions into account and (c) it partially ignores the ordering of the codewords within the time sequence due to the histogram model that it uses. The proposed representation is symbolic, potentially allowing the application of text-based retrieval techniques to the similarity analysis of time series. Moreover, due to the symbolic and multiresolution representation, the proposed approach is excellent at summarizing time series by providing typical patterns observed at different resolutions. The proposed transformation processes long time series very quickly, since the length of the new representation depends only on the size of the codebook. The parameters of our method are easy to determine. In particular, a general conclusion from our experiments is that, in the absence of prior knowledge, assigning equal weights to all resolution levels works well most of the time.
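The ingredients summarized above (equal-length segments, nearest-codeword encoding, codeword frequency histograms, and a weighted combination over resolution levels) can be sketched as follows. This is a hedged illustration rather than the distance function defined earlier in the paper: the per-level L1 difference between histograms, the hand-written toy codebooks (real codebooks would be trained), and the names `encode_histogram` and `multires_distance` are all assumptions made for the example.

```python
import numpy as np

def encode_histogram(series, codebook):
    """Count how often each codeword is the nearest match to a segment."""
    codebook = np.asarray(codebook, dtype=float)
    seg_len = codebook.shape[1]
    series = np.asarray(series, dtype=float)
    n_seg = len(series) // seg_len
    # Partition the series into equal-length, non-overlapping segments.
    segments = series[: n_seg * seg_len].reshape(n_seg, seg_len)
    # Squared Euclidean distance from every segment to every codeword.
    d2 = ((segments[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.bincount(nearest, minlength=len(codebook))

def multires_distance(x, y, codebooks, weights):
    """Weighted sum over levels of an (assumed) L1 distance between histograms."""
    total = 0.0
    for codebook, w in zip(codebooks, weights):
        hx = encode_histogram(x, codebook)
        hy = encode_histogram(y, codebook)
        total += w * np.abs(hx - hy).sum()
    return total

# Toy codebooks: level 1 uses segments of length 4, level 2 of length 2.
codebooks = [
    [[0, 0, 0, 0], [1, 1, 1, 1]],
    [[0, 0], [1, 1], [0, 1]],
]
weights = [1.0, 1.0]  # equal weights for all levels

x = [0, 0, 0, 0, 1, 1, 1, 1]
z = [1, 1, 1, 1, 0, 0, 0, 0]  # same segments as x in a different order
y = [1, 1, 1, 1, 1, 1, 1, 1]
print(multires_distance(x, z, codebooks, weights))
print(multires_distance(x, y, codebooks, weights))
```

In this toy setting `x` and `z` are at distance zero because a histogram discards the order in which codewords occur, which illustrates point (c) above; `y` differs from `x` at both levels.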
While the experimental results presented here mainly focus on similarity analysis, clustering, and summarization, our approach can also be easily adjusted to other applications, such as frequent pattern retrieval (i.e., motif discovery), association rule mining, and other data mining applications.

Acknowledgements

The authors are grateful to the anonymous referees and to Eamonn Keogh for providing helpful comments. This work was supported in part by NSF under Grant No. IIS , by NIH under Grant No. R01MH A1 (funded by NIMH, NINDS, and NIA) and by the Pennsylvania Department of Health.

References

[1] Agrawal, R., Faloutsos, C. and Swami, A., Efficient similarity search in sequence databases, Proceedings of the 4th Int'l Conference on Foundations of Data Organization and Algorithms, Chicago, IL, Oct 13-15, pp.
[2] Agrawal, R., Lin, K. I., Sawhney, H. S. and Shim, K., Fast similarity search in the presence of noise, scaling, and translation in time-series databases, Proceedings of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, Sept., 1995, pp.
[3] Alcock, R.J. and Manolopoulos, Y., Time-Series Similarity Queries Employing a Feature-Based Approach, Proceedings of the 7th Hellenic Conference on Informatics, Ioannina, Greece, Aug , 1999, pp. III.1-9.
[4] Baeza-Yates, R.A. and Gonnet, G.H., A fast algorithm on average for all-against-all sequence matching, Proceedings of the String Processing and Information Retrieval Symposium, 1999, pp.
[5] Betke, M., Gips, J. and Fleming, P., The Camera Mouse: Visual Tracking of Body Features to Provide Computer Access For People with Severe Disabilities, IEEE Transactions on Neural Systems and Rehabilitation Engineering, 10:1, March 2002, pp.
[6] Chen, L. and Ozsu, M.T., Multi-scale histograms for answering queries over time series data, Proceedings of the 20th International Conference on Data Engineering, Boston, MA, 2004, p.
[7] Deligiannakis, A., Kotidis, Y. and Roussopoulos, N., Compressing historical information in sensor networks, Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, Paris, France, June 2004, pp.
[8] Goldin, D.Q. and Kanellakis, P.C., On similarity queries for time-series data: Constraint specification and implementation, Proceedings of Constraint Programming, Marseilles, France,
[9] Faloutsos, C., Jagadish, H., Mendelzon, A. and Milo, T., A signature technique for similarity-based queries, Proceedings of the Int'l Conference on Compression and Complexity of Sequences, Positano-Salerno, Italy, Jun 11-13,
[10] Faloutsos, C., Ranganathan, M. and Manolopoulos, Y., Fast subsequence matching in time-series databases, Proceedings of the ACM SIGMOD Int'l Conference on Management of Data, Minneapolis, MN, May 25-27, 1994, pp.
[11] Gavrilov, M., Anguelov, D., Indyk, P. and Motwani, R., Mining the stock market: Which measure is best?, Proceedings of the International Conference on Data Mining and Knowledge Discovery, 2000, pp.
[12] Gersho, A. and Gray, R. M., Vector Quantization and Signal Compression, Kluwer Academic Publishers,
[13] Gusfield, D., Algorithms on Strings, Trees and Sequences, Cambridge University Press,
[14] Hetland, M. L., A survey of recent methods for efficient retrieval of similar time sequences, in Mark Last, Abraham Kandel, and Horst Bunke, editors, Data Mining in Time Series Databases, World Scientific,
[15] Höppner, F., Discovery of temporal patterns: learning rules about the qualitative behavior of time series, Proceedings of the 5th European Conference on Principles and Practice of Knowledge Discovery in Databases, Freiburg, Germany, 2001, pp.

[16] Huhtala, Y., Kärkkäinen, J. and Toivonen, H., Mining for similarities in aligned time series using wavelets, Data Mining and Knowledge Discovery: Theory, Tools, and Technology, SPIE Proceedings Series, Vol. , Orlando, FL, Apr., 1999, pp.
[17] Kalpakis, K., Gara, D. and Puttagunta, V., Distance Measures for Effective Clustering of ARIMA Time-Series, Proceedings of the 2001 IEEE International Conference on Data Mining, San Jose, CA, Nov 29-Dec 2, 2001, pp.
[18] Keogh, E., Chakrabarti, K., Pazzani, M. and Mehrotra, S., Locally adaptive dimensionality reduction for indexing large time series databases, Proceedings of ACM SIGMOD Conference on Management of Data, Santa Barbara, CA, May 21-24, 2001, pp.
[19] Keogh, E., Chakrabarti, K., Pazzani, M. and Mehrotra, S., Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases, Journal of Knowledge and Information Systems, 2001.
[20] Keogh, E. and Folias, T., The UCR Time Series Data Mining Archive, Riverside, CA, University of California, Computer Science & Engineering Department.
[21] Keogh, E. and Pazzani, M., A simple dimensionality reduction technique for fast similarity search in large time series databases, Proceedings of the Fourth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, Japan,
[22] Lin, J., Keogh, E., Patel, P. and Lonardi, S., Finding motifs in time series, The 2nd Workshop on Temporal Data Mining, at the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, July 23-26,
[23] Lin, J., Keogh, E., Lonardi, S. and Chiu, B., A Symbolic Representation of Time Series, with Implications for Streaming Algorithms, Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, San Diego, CA, June 13,
[24] Linde, S., Buzo, A. and Gray, A., An algorithm for vector quantizer design, IEEE Transactions on Communications, vol. 28, 1980, pp.
[25] Lloyd, S. P., Least squares quantization in PCM, IEEE Transactions on Information Theory, IT(28), 1982, pp.
[26] Megalooikonomou, V., Li, G. and Wang, Q., A Dimensionality Reduction Technique for Efficient Similarity Analysis of Time Series Databases, Proceedings of the 13th ACM CIKM International Conference on Information and Knowledge Management, Washington, DC, Nov. 8-13, 2004, pp.
[27] Park, S., Chu, W.W., Yoon, J. and Hsu, C., Efficient search for similar subsequences of different lengths in sequence databases, Proceedings of the ICDE, 2000, pp.
[28] Indyk, P., Koudas, N. and Muthukrishnan, S., Identifying Representative Trends in Massive Time Series Data Sets Using Sketches, Proceedings of VLDB, 2000, pp.
[29] Rafiei, D., On similarity-based queries for time series data, Proceedings of the 15th International Conference on Data Engineering (ICDE), Sydney, Australia, 1999, pp.
[30] UCI KDD Archive.
[31] Wu, Y., Agrawal, D. and El Abbadi, A., A comparison of DFT and DWT based similarity search in time-series databases, Proceedings of the 9th ACM CIKM Int'l Conference on Information and Knowledge Management, McLean, VA, Nov 6-11, 2000, pp.
[32] Yi, B-K and Faloutsos, C., Fast Time Sequence Indexing for Arbitrary Lp Norms, Proceedings of the VLDB, Cairo, Egypt, Sept,
[33] Zhu, L., Rao, A. and Zhang, A., Theory of Keyblock-based Image Retrieval, ACM Transactions on Information Systems, 20(2), 2002, pp.

Appendix

A. Time series codeword representation: Other models of similarity

In VQ-based image retrieval [33], two other models that have been proposed are the Boolean Model (BM) and the Vector Model (VM). The Histogram Model we adopted in our methodology can be considered a special case of VM.
For completeness we present these models in the context of time series analysis below.

Boolean Model (BM): computes the similarity of the Boolean models of the codeword representations of two time series using the following formula:

$$S_{BM}(q, t) = n_{11} \, w_{11} + n_{00} \, w_{00}$$

where $n_{11}$ is the number of identical indices (codewords present in both representations) and $n_{00}$ is the number of indices of the codewords that are absent from both representations, while $w_{11}$ and $w_{00}$ are the weights assigned to these frequencies.

Vector Model (VM): computes the similarity between the frequency-based representations of two time series using the following formula:

$$S_{VM}(q, t) = \frac{\sum_{i=1}^{s} f_{i,t} \, f_{i,q}}{\sqrt{\sum_{i=1}^{s} f_{i,t}^{2}} \; \sqrt{\sum_{i=1}^{s} f_{i,q}^{2}}}$$

In the above formula, $f_{i,t}$ denotes the frequency of codeword $i$ in time series $t$.

B. Preprocessing cost: Codebook generation

In MVQ a codebook needs to be generated for each one of the multiresolution levels using training data before the encoding can be performed. Let $i$ be the number of iterations in the training process, where $i$ depends on the predefined threshold of the fractional drop of the distortion. During each iteration, every training vector is compared to every codeword. Since the size of the codebook is $s$, there are in total $N \cdot w$ training vectors ($N$ is the number of time series in the training set and $w$ is the number of fragments at the highest resolution of a time series), and $c$ is the number of resolution levels, the time complexity of preprocessing of a single level is:

$$T(\text{training}) = O(c \cdot N \cdot w \cdot s \cdot i).$$

This time complexity is not prohibitive, since training is done once during preprocessing and, as we showed earlier, the size of the codebook need not be large to achieve a very good approximation using MVQ. If the data is modified over time, there is no additional overhead as long as the distributions remain the same. If the codebook quality decreases, an incremental update of the codewords needs to be considered.
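The two similarity models of Appendix A can be sketched on codeword frequency vectors as follows. This is a minimal illustration under stated assumptions: a codeword counts as present when its frequency is positive, the default weights `w11 = w00 = 1.0` are placeholders, and the function names are hypothetical.

```python
import numpy as np

def boolean_similarity(fq, ft, w11=1.0, w00=1.0):
    """Boolean Model: weighted count of codewords present in both or in neither."""
    q = np.asarray(fq) > 0
    t = np.asarray(ft) > 0
    n11 = int(np.sum(q & t))    # codewords used by both series
    n00 = int(np.sum(~q & ~t))  # codewords used by neither series
    return n11 * w11 + n00 * w00

def vector_similarity(fq, ft):
    """Vector Model: cosine similarity between the two frequency vectors."""
    fq = np.asarray(fq, dtype=float)
    ft = np.asarray(ft, dtype=float)
    denom = np.linalg.norm(fq) * np.linalg.norm(ft)
    return float(fq @ ft / denom) if denom > 0 else 0.0

# Toy usage with a codebook of size s = 4 (hypothetical frequencies).
fq = [2, 0, 1, 0]
ft = [1, 0, 0, 3]
print(boolean_similarity(fq, ft))
print(vector_similarity(fq, ft))
```

The Boolean Model only looks at which codewords occur, whereas the Vector Model also takes into account how often each codeword occurs.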


More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

An Image Compression Algorithm based on Wavelet Transform and LZW

An Image Compression Algorithm based on Wavelet Transform and LZW An Image Compresson Algorthm based on Wavelet Transform and LZW Png Luo a, Janyong Yu b School of Chongqng Unversty of Posts and Telecommuncatons, Chongqng, 400065, Chna Abstract a cylpng@63.com, b y27769864@sna.cn

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Mining User Similarity Using Spatial-temporal Intersection

Mining User Similarity Using Spatial-temporal Intersection www.ijcsi.org 215 Mnng User Smlarty Usng Spatal-temporal Intersecton Ymn Wang 1, Rumn Hu 1, Wenhua Huang 1 and Jun Chen 1 1 Natonal Engneerng Research Center for Multmeda Software, School of Computer,

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

High resolution 3D Tau-p transform by matching pursuit Weiping Cao* and Warren S. Ross, Shearwater GeoServices

High resolution 3D Tau-p transform by matching pursuit Weiping Cao* and Warren S. Ross, Shearwater GeoServices Hgh resoluton 3D Tau-p transform by matchng pursut Wepng Cao* and Warren S. Ross, Shearwater GeoServces Summary The 3D Tau-p transform s of vtal sgnfcance for processng sesmc data acqured wth modern wde

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Combining The Global and Partial Information for Distance-Based Time Series Classification and Clustering

Combining The Global and Partial Information for Distance-Based Time Series Classification and Clustering Paper: Combnng The Global and Partal Informaton for Dstance-Based Tme Seres Hu Zhang, Tu Bao Ho, Mao-Song Ln, and We Huang School of Knowledge Scence, Japan Advanced Insttute of Scence and Technology,

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Querying by sketch geographical databases. Yu Han 1, a *

Querying by sketch geographical databases. Yu Han 1, a * 4th Internatonal Conference on Sensors, Measurement and Intellgent Materals (ICSMIM 2015) Queryng by sketch geographcal databases Yu Han 1, a * 1 Department of Basc Courses, Shenyang Insttute of Artllery,

More information

A fast algorithm for color image segmentation

A fast algorithm for color image segmentation Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au

More information

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database A Mult-step Strategy for Shape Smlarty Search In Kamon Image Database Paul W.H. Kwan, Kazuo Torach 2, Kesuke Kameyama 2, Junbn Gao 3, Nobuyuk Otsu 4 School of Mathematcs, Statstcs and Computer Scence,

More information

Image Alignment CSC 767

Image Alignment CSC 767 Image Algnment CSC 767 Image algnment Image from http://graphcs.cs.cmu.edu/courses/15-463/2010_fall/ Image algnment: Applcatons Panorama sttchng Image algnment: Applcatons Recognton of object nstances

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval Fuzzy -Means Intalzed by Fxed Threshold lusterng for Improvng Image Retreval NAWARA HANSIRI, SIRIPORN SUPRATID,HOM KIMPAN 3 Faculty of Informaton Technology Rangst Unversty Muang-Ake, Paholyotn Road, Patumtan,

More information

COMPLEX WAVELET TRANSFORM-BASED COLOR INDEXING FOR CONTENT-BASED IMAGE RETRIEVAL

COMPLEX WAVELET TRANSFORM-BASED COLOR INDEXING FOR CONTENT-BASED IMAGE RETRIEVAL COMPLEX WAVELET TRANSFORM-BASED COLOR INDEXING FOR CONTENT-BASED IMAGE RETRIEVAL Nader Safavan and Shohreh Kasae Department of Computer Engneerng Sharf Unversty of Technology Tehran, Iran skasae@sharf.edu

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Study of Data Stream Clustering Based on Bio-inspired Model

Study of Data Stream Clustering Based on Bio-inspired Model , pp.412-418 http://dx.do.org/10.14257/astl.2014.53.86 Study of Data Stream lusterng Based on Bo-nspred Model Yngme L, Mn L, Jngbo Shao, Gaoyang Wang ollege of omputer Scence and Informaton Engneerng,

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Shape Representation Robust to the Sketching Order Using Distance Map and Direction Histogram

Shape Representation Robust to the Sketching Order Using Distance Map and Direction Histogram Shape Representaton Robust to the Sketchng Order Usng Dstance Map and Drecton Hstogram Department of Computer Scence Yonse Unversty Kwon Yun CONTENTS Revew Topc Proposed Method System Overvew Sketch Normalzaton

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Video Content Representation using Optimal Extraction of Frames and Scenes

Video Content Representation using Optimal Extraction of Frames and Scenes Vdeo Content Representaton usng Optmal Etracton of rames and Scenes Nkolaos D. Doulam Anastasos D. Doulam Yanns S. Avrths and Stefanos D. ollas Natonal Techncal Unversty of Athens Department of Electrcal

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College

More information