Using internal evaluation measures to validate the quality of diverse stream clustering algorithms


Vietnam J Comput Sci (2017) 4
REGULAR PAPER

Marwan Hassani (1) · Thomas Seidl (2)

Received: 23 December 2015 / Accepted: 30 September 2016 / Published online: 14 October 2016
The Author(s). This article is published with open access at Springerlink.com

Abstract Measuring the quality of a clustering algorithm has been shown to be as important as the algorithm itself. It is a crucial part of choosing the clustering algorithm that performs best for a given input. Streaming input data have many features that make them much more challenging than static ones: they are endless, varying, and arrive at high speed. This has raised new challenges for clustering algorithms as well as for their evaluation measures. Until now, external evaluation measures were exclusively used for validating stream clustering algorithms. While external validation requires a ground truth, which is not provided in most applications, particularly in the streaming case, internal clustering validation is efficient and realistic. In this article, we analyze the properties and performances of eleven internal clustering measures. In particular, we apply these measures to carefully synthesized stream scenarios to reveal how they react to clusterings on evolving data streams using both k-means-based and density-based clustering algorithms. A series of experimental results shows that, different from the case with static data, the Calinski-Harabasz index performs best in coping with common aspects and errors of stream clustering for k-means-based algorithms, while the revised validity index performs best for density-based ones.
Keywords Stream clustering · Internal evaluation measures · Clustering validation · MOA

Marwan Hassani: m.hassani@tue.nl
Thomas Seidl: seidl@dbs.ifi.lmu.de
(1) Architecture of Information Systems Group, Eindhoven University of Technology, Eindhoven, The Netherlands
(2) Database Systems Group, LMU Munich, Munich, Germany

1 Introduction

Clustering of data objects is a well-established data mining task that aims at grouping these objects. The grouping is made such that similar objects are aggregated together in the same group (or cluster) while dissimilar ones are grouped in different clusters. In this context, the definition of similarity, and thus the final clustering, is highly dependent on the applied distance function between the data objects. Unlike classification, clustering does not use a subset of the data objects with known class labels to learn a classification model. As a completely unsupervised task, clustering calculates the similarity between objects without having any information about their correct distribution (also known as the ground truth). The latter fact motivated the research in the field of clustering validation notably more than in the field of classification evaluation. It has even been stated that clustering validation is regarded as being as important as the clustering itself [32].

There are two types of clustering validation [31]. The first is external validation, which compares the clustering result to a reference result that is considered as the ground truth. If the result is sufficiently similar to the reference, we regard this final output as a good clustering. This validation is straightforward when the similarity between two clusterings has been well defined; however, it has the fundamental caveat that the reference result is not provided in most real applications. Therefore, external evaluation is largely used for synthetic data and mostly for tuning clustering algorithms. Internal validation is the other type of clustering evaluation, where the clustering is evaluated only against the result itself, i.e., the structure of the found clusters and their relations to each other.
This is much more realistic and efficient in many real-world scenarios, as it does not refer to any assumed references from outside, which are not always feasible to obtain. Particularly, with the huge increase of the data size and dimensionality as in recent applications with streaming data outputs, one can hardly claim that a complete knowledge of the ground truth is available or always valid.

Obviously, clustering evaluation is a stand-alone process that is not included within the clustering task. It is usually performed after the final clustering output is generated. However, internal evaluation methods have been used in the validation phase within some clustering algorithms like k-means [29], k-medoids [26], EM [8] and k-center [19].

Stream clustering deals with evolving input objects where the distribution, the density and the labels of objects are continuously changing [16]. Whether it is high-dimensional stream clustering [14,24], hierarchical stream clustering [15,23] or sensor data clustering [19-21], evaluating the clustering output using external evaluation measures (like SubCMM [17,18]) requires a ground truth that is very difficult to obtain in the above-mentioned scenarios.

For the previous reasons, we focus in this article on internal clustering validation and study its usability for drifting streaming data. To fairly discuss the ability of internal measures to validate the quality of different types of stream clustering algorithms, we expand the study to cover both a k-means-based stream clustering algorithm [1] and a density-based stream clustering one [6]. This is mainly motivated by the fact that those algorithms are good representatives of the two main categories of stream clustering algorithms.

The remainder of this article is organized as follows: Sect. 2 examines some popular criteria for deciding whether found clusters are valid, and the general procedure we used in this article to evaluate stream clustering. In Sect. 3, we list the eleven most used internal evaluation measures and briefly show how they are actually exploited in clustering evaluation. In Sect.
4, we introduce a set of thorough experiments on different kinds of data streams with different errors to show the behaviors of these internal measures in practice with a k-means-based stream clustering algorithm. In addition, we investigate more concretely how the internal measures react to stream-specific properties of data. To do this, several common error scenarios in stream clusterings are simulated and also evaluated with internal clustering validation. In Sect. 5, the internal evaluation measures are again used to validate a density-based stream clustering. This is done by first extracting a ground truth of the clustering quality using external evaluation measures and then checking which of the internal measures has the highest correlation with that ground truth. Finally, in Sect. 6, we summarize the contents of this article. This article further discusses the initial technical results introduced in [22] and extends them by elaborating the algorithmic description in Sect. 2, enriching the results in Sect. 4 and introducing Sect. 5 completely.

2 Internal clustering validation

In this section, we describe the concepts of internal clustering validation and how they are realized in existing internal validation measures. Additionally, we show an abstract procedure for making use of these measures in streaming environments in practice.

2.1 Validation criteria

Contrary to external validation, internal clustering validation is based only on the intrinsic information of the data. Since we can only refer to the input dataset itself, internal validation needs assumptions about a good structure of found clusters, which are normally given by the reference result in external validation. Two main concepts, the compactness and the separation, are the most popular ones. Most other concepts are actually just combinations of variations of these two [34].

The compactness measures how closely data points are grouped in a cluster. Grouped points in the cluster are supposed to be related to each other by sharing a common feature which reflects a meaningful pattern in practice.
Compactness is normally based on distances between in-cluster points. A very popular way of calculating the compactness is through the variance, i.e., the average distance to the mean, to estimate how tightly objects are bound together around their mean as the center. A small variance indicates a high compactness (cf. Fig. 1). Quantitatively, one way of calculating the compactness using the average distance is explained in Eq. 1.

The separation measures how different the found clusters are from each other. Users of clustering algorithms are not interested in similar or vague patterns when clusters are not well separated (cf. Fig. 2). A distinct cluster that is far from the others corresponds to a unique pattern. Similar to the compactness, the distances between objects are widely used to measure separation, e.g., pairwise distances between cluster centers, or pairwise minimum distances between objects in different clusters. Separation is an inter-cluster criterion in the sense of the relation between clusters. An example of how to quantitatively calculate the separation using the average distance is explained in Eq. 2.

Fig. 1 Clusters on the left have better compactness than the ones on the right
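To make the two criteria concrete, the following is a minimal Python sketch, not the formulation used by any particular index. It assumes Euclidean distance, takes compactness as the average distance of a cluster's points to their mean, and takes separation as the average pairwise distance between cluster centers. The helper names (centroid, compactness, separation) and the sample points are hypothetical and chosen only for illustration.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equally sized points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def compactness(points):
    """Average Euclidean distance to the cluster mean (smaller = more compact)."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

def separation(clusters):
    """Average pairwise distance between cluster centers (larger = better separated)."""
    centers = [centroid(c) for c in clusters]
    pairs = [(i, j) for i in range(len(centers)) for j in range(i + 1, len(centers))]
    return sum(math.dist(centers[i], centers[j]) for i, j in pairs) / len(pairs)

# Hypothetical toy data: the same square shape at two different scales.
tight = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
loose = [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0), (4.0, 4.0)]
shifted = [(x + 10.0, y) for x, y in tight]

print(compactness(tight) < compactness(loose))  # the smaller square is more compact
print(separation([tight, shifted]))             # distance between the two centers
```

Note that separation needs at least two clusters; with a single cluster there are no center pairs to average over.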

Fig. 2 Clusters on the left have better separation than the ones on the right

2.2 General procedure

Using a carefully generated synthetic data set, where we know the underlying partitioning and the distribution of the data, we apply the internal validation measures using different parameters of the clustering algorithms. The target is to observe which of the evaluation measures reaches its best value when the parameters of the selected clustering algorithm are set to best reflect the distribution of the data set. We collect the values of the internal measures for each batch, and finally average the values of all batches. An abstract procedure of this process is listed in Algorithm 1. The algorithm explains both cases of a k-means-based algorithm and a DBSCAN-based algorithm.

Algorithm 1: InternalValidationProcedure()
  Prepare the current stream batch from the dataset;
  Initialize the clustering algorithm (a k-means-based or a DBSCAN-based one);
  Initialize a set T of all combinations of meaningful ranges for each parameter;
  foreach parameter setting ps ∈ T do
    Run the selected clustering algorithm with the parameter setting ps;
    foreach batch in the stream do
      Compute the corresponding internal validation index of the clustering output;
    end
    Average the clustering quality of the validation index over all batches from the stream;
  end
  if the current algorithm is k-means based then
    Check which index is reaching its best values with the correct number of generated clusters k in the data set;
  end
  else
    Check which parameter setting ps ∈ T causes best values of external evaluation measures over the current DBSCAN-based algorithm;
    Check which internal index has the highest correlation with the external measures w.r.t. ps;
  end

3 Considered existing internal evaluation measures

In this section, we briefly review the eleven internal clustering measures most used in recent works. For each measure, one can easily figure out which design criteria are chosen and how they are realized in mathematical form.
We first introduce important notations used in the formulas of these measures: D is the input dataset, n is the number of points in D, g is the center of the whole dataset D, P is the number of dimensions of D, NC is the number of clusters, $C_i$ is the i-th cluster, $n_i$ is the number of data points in $C_i$, $c_i$ is the center of cluster $C_i$, $\sigma(C_i)$ is the variance vector of $C_i$, and d(x, y) is the distance between points x and y. For convenience, we assign an abbreviation to each measure and use it through the rest of this article.

First, some measures are designed to evaluate only one of compactness or separation. The simplest one is the root-mean-square standard deviation (RMSSTD):

\[ \mathrm{RMSSTD} = \left( \frac{\sum_{i} \sum_{x \in C_i} \lVert x - c_i \rVert^2}{P \sum_{i} (n_i - 1)} \right)^{1/2} \tag{1} \]

This measure is the square root of the pooled sample variance of all the attributes, which measures only the compactness of found clusters [10]. Another measure, which considers only the separation between clusters, is the R-squared (RS) [10]:

\[ \mathrm{RS} = \frac{\sum_{x \in D} \lVert x - g \rVert^2 - \sum_{i} \sum_{x \in C_i} \lVert x - c_i \rVert^2}{\sum_{x \in D} \lVert x - g \rVert^2} \tag{2} \]

RS is the complement of the ratio of sum of squared distances between objects in different clusters to the total sum of squares. It is an intuitive and simple formulation of measuring the differences between clusters. Another measure considering only separation is the modified Hubert Γ statistic (Γ) [25]:

\[ \Gamma = \frac{2}{n(n-1)} \sum_{i,j \in \{1 \ldots NC\},\, i \neq j} \; \sum_{x \in C_i} \sum_{y \in C_j} d(x, y) \cdot d(c_i, c_j) \tag{3} \]

Γ calculates the average weighted pairwise distances between data points belonging to different clusters by multiplying them by the distances between the centers of their clusters.

The following measures are designed to reflect both compactness and separation at the same time. Naturally, considering only one of the two criteria is not enough to evaluate complex clusterings. We first introduce the Calinski-Harabasz index (CH) [5]:

\[ \mathrm{CH} = \frac{\sum_{i} n_i \, d^2(c_i, g) / (NC - 1)}{\sum_{i} \sum_{x \in C_i} d^2(x, c_i) / (n - NC)} \tag{4} \]

CH measures the two criteria simultaneously with the help of the average between-cluster and within-cluster sums of squares. The numerator reflects the degree of separation in terms of how much the cluster centers are spread, and the denominator corresponds to compactness, reflecting how closely the in-cluster objects are gathered around the cluster center. The following two measures also share this type of formulation, i.e., numerator-separation/denominator-compactness. First, the I index (I) [30]:

\[ I = \left( \frac{1}{NC} \cdot \frac{\sum_{x \in D} d(x, g)}{\sum_{i} \sum_{x \in C_i} d(x, c_i)} \cdot \max_{i,j} d(c_i, c_j) \right)^{P} \tag{5} \]

To measure separation, I adopts the maximum distance between cluster centers. For compactness, the distance from a data point to its cluster center is used, as in CH. Another famous measure is Dunn's index (D) [9]:

\[ D = \min_{i} \min_{j} \left( \frac{\min_{x \in C_i,\, y \in C_j} d(x, y)}{\max_{k} \left( \max_{x, y \in C_k} d(x, y) \right)} \right) \tag{6} \]

D uses the minimum pairwise distance between points in different clusters as the inter-cluster separation and the maximum diameter among all clusters as the intra-cluster compactness. As mentioned above, CH, I, and D follow the form (Separation)/(Compactness), though they use different distances and different weights of the two factors. The optimal cluster number can be achieved by maximizing these three indices.

Another commonly used measure is the Silhouette index (S) [33]:

\[ S = \frac{1}{NC} \sum_{i} \left( \frac{1}{n_i} \sum_{x \in C_i} \frac{b(x) - a(x)}{\max[b(x), a(x)]} \right) \tag{7} \]

where

\[ a(x) = \frac{1}{n_i - 1} \sum_{y \in C_i,\, y \neq x} d(x, y) \quad \text{and} \quad b(x) = \min_{j \neq i} \left[ \frac{1}{n_j} \sum_{y \in C_j} d(x, y) \right]. \]

S does not take $c_i$ or g into account and uses the pairwise distance between all the objects in a cluster for quantifying compactness (a(x)). Here, b(x) measures the separation with the average distance of objects to the alternative cluster, i.e., the second closest cluster.
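As an illustration of the (Separation)/(Compactness) form, the following is a minimal Python sketch of Eq. 4 (CH) and Eq. 6 (D). It assumes Euclidean distance and represents a clustering as a list of clusters, each a list of points. The function names and the toy data are hypothetical, and the code is not taken from MOA or from the authors' implementation.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equally sized points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def calinski_harabasz(clusters):
    """Eq. 4: between-cluster over within-cluster sum of squares (higher = better)."""
    all_points = [p for c in clusters for p in c]
    n, nc = len(all_points), len(clusters)
    g = centroid(all_points)
    between = sum(len(c) * math.dist(centroid(c), g) ** 2 for c in clusters)
    within = sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (nc - 1)) / (within / (n - nc))

def dunn(clusters):
    """Eq. 6: smallest inter-cluster distance over largest cluster diameter."""
    min_between = min(
        math.dist(x, y)
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for x in a
        for y in b
    )
    max_diameter = max(math.dist(x, y) for c in clusters for x in c for y in c)
    return min_between / max_diameter

# Hypothetical toy data: two unit squares whose centers are 5 apart.
left = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
right = [(5.0, 0.0), (5.0, 1.0), (6.0, 0.0), (6.0, 1.0)]

print(calinski_harabasz([left, right]))
print(dunn([left, right]))
```

Both functions assume at least two clusters and a non-degenerate clustering: CH divides by the within-cluster sum of squares and D by the largest diameter, so either raises ZeroDivisionError if every cluster collapses to a single point.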
The Davies-Bouldin index (DB) [7] is an old but still widely used internal validation measure:

\[ \mathrm{DB} = \frac{1}{NC} \sum_{i} \max_{j \neq i} \left( \frac{\frac{1}{n_i} \sum_{x \in C_i} d(x, c_i) + \frac{1}{n_j} \sum_{x \in C_j} d(x, c_j)}{d(c_i, c_j)} \right) \tag{8} \]

DB uses intra-cluster variance and inter-cluster center distance to find the worst partner cluster, i.e., the closest most scattered one, for each cluster. Thus, minimizing DB gives us the optimal number of clusters. The Xie-Beni index (XB) [35] is defined as:

\[ \mathrm{XB} = \frac{\sum_{i} \sum_{x \in C_i} d^2(x, c_i)}{n \cdot \min_{i \neq j} d^2(c_i, c_j)} \tag{9} \]

Apparently, the smaller the values of XB, the better the clustering quality. Along with DB, XB has the form (Compactness)/(Separation), which is the opposite of CH, I, and D. Therefore, it reaches the optimum clustering by being minimized. It defines the inter-cluster separation as the minimum square distance between cluster centers, and the intra-cluster compactness as the mean square distance between each data object and its cluster center.

In the following, we present more recent clustering validation measures. The SD validity index (SD) [12]:

\[ \mathrm{SD} = Dis(NC_{max}) \cdot Scat(NC) + Dis(NC) \tag{10} \]

where $NC_{max}$ is the maximum number of possible clusters,

\[ Scat(NC) = \frac{1}{NC} \sum_{i} \frac{\lVert \sigma(C_i) \rVert}{\lVert \sigma(D) \rVert}, \qquad Dis(NC) = \frac{\max_{i,j} d(c_i, c_j)}{\min_{i,j} d(c_i, c_j)} \sum_{i} \left( \sum_{j} d(c_i, c_j) \right)^{-1} \]

SD is composed of two terms: Scat(NC) stands for the scattering within clusters and Dis(NC) stands for the dispersion between clusters. Like DB and XB, SD measures the compactness with the variance of clustered objects and the separation with the distance between cluster centers, but uses them in a different way. The smaller the value of SD, the better.

A revised version of SD is S_Dbw [11]:

\[ \mathrm{S\_Dbw} = Scat(NC) + Dens\_bw(NC) \tag{11} \]

\[ Dens\_bw(NC) = \frac{1}{NC(NC-1)} \sum_{i} \left[ \sum_{j \neq i} \frac{\sum_{x \in C_i \cup C_j} f(x, u_{ij})}{\max \left\{ \sum_{x \in C_i} f(x, c_i), \; \sum_{x \in C_j} f(x, c_j) \right\}} \right] \]

\[ f(x, y) = \begin{cases} 0 & \text{if } d(x, y) > \tau, \\ 1 & \text{otherwise,} \end{cases} \]

where $u_{ij}$ is the middle point of $c_i$ and $c_j$, τ is a threshold to determine the neighbors, approximated by the average standard deviation of cluster centers: $\tau = \frac{1}{NC} \sum_{i=1}^{NC} \lVert \sigma(C_i) \rVert$, and Scat(NC) is the same as that of SD. S_Dbw takes the density into account to measure the separation between clusters.
It assumes that for each pair of cluster centers, at least one of their densities should be larger than the density of their midpoint for the clustering to be good. Both SD and S_Dbw indicate the optimal clustering when they are minimized.
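For the opposite (Compactness)/(Separation) form, the following is a minimal Python sketch of Eq. 8 (DB) and Eq. 9 (XB), again assuming Euclidean distance and a clustering given as a list of point lists. The function names and the toy data are hypothetical and serve only as an illustration of the two formulas.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equally sized points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def scatter(cluster):
    """Average distance of a cluster's points to its own center."""
    c = centroid(cluster)
    return sum(math.dist(p, c) for p in cluster) / len(cluster)

def davies_bouldin(clusters):
    """Eq. 8: average over clusters of the worst (scatter sum / center distance) ratio."""
    centers = [centroid(c) for c in clusters]
    scatters = [scatter(c) for c in clusters]
    total = 0.0
    for i in range(len(clusters)):
        total += max(
            (scatters[i] + scatters[j]) / math.dist(centers[i], centers[j])
            for j in range(len(clusters))
            if j != i
        )
    return total / len(clusters)

def xie_beni(clusters):
    """Eq. 9: mean squared distance to the center over the minimum squared center distance."""
    centers = [centroid(c) for c in clusters]
    n = sum(len(c) for c in clusters)
    within = sum(
        math.dist(p, centers[i]) ** 2 for i, c in enumerate(clusters) for p in c
    )
    min_sep = min(
        math.dist(centers[i], centers[j]) ** 2
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return within / (n * min_sep)

# Hypothetical toy data: two unit squares whose centers are 5 apart.
left = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
right = [(5.0, 0.0), (5.0, 1.0), (6.0, 0.0), (6.0, 1.0)]

print(davies_bouldin([left, right]))  # smaller = better
print(xie_beni([left, right]))        # smaller = better
```

Both indices divide by a distance between cluster centers, so the sketch assumes at least two clusters with distinct centers.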

4 Internal validation of stream clusterings

In this section, we evaluate the result of stream clustering algorithms with internal validation measures.

4.1 Robustness to conventional clustering aspects

The results on using internal evaluation measures for clustering static data with simple errors in [28] show that the performance of the internal measures is affected by various aspects of input data, i.e., noise, density of clusters, skewness, and subclusters. Each of the 11 discussed evaluation measures reacts differently to those aspects. We perform more complex experiments than the ones in [28], this time on stream clusterings, to see how the internal measures behave on real-time continuous data. We run the CluStream [1] clustering algorithm with different parameters, choose the optimal number of clusters according to the evaluation results, and compare it to the true number of clusters.

According to [10], RMSSTD, RS and Γ have the property of monotonicity, and their curves will have either an upwards or a downwards tendency towards the optimum when we monotonically increase (or decrease) the number of clusters (or the parameter at hand). The optimal value for each of these measures is at the shift point of their curves, which is also known as the elbow.

Streaming data usually has complex properties that occur at the same time. The experiments in [28], however, are limited to very simple toy datasets reflecting only one clustering aspect at a time. To make it more realistic, we use a data stream reflecting five conventional clustering aspects at the same time.

Experimental settings

To simulate streaming scenarios, we use the MOA (Massive Online Analysis) [4] framework. We have chosen RandomRBFGenerator, which emits data instances continuously from a set of circular clusters, as the input stream generator (cf. Fig. 3). In this stream, we can specify the size, density, and moving speed of the instance-generating clusters, from which we can simulate the skewness, the different densities, and the subcluster aspect.
We set the parameters as follows: number of generating clusters = 5, radius of clusters = 0.11, their dimensionality = 10, varying range of cluster radius = 0.07, varying range of cluster density = 1, cluster speed = 0.01 per 200 points, noise level = 0.1, noise does not appear inside clusters. The parameters which are not mentioned are not directly related to this experiment and are set to the default values of MOA. For the clustering algorithm, we have chosen CluStream [1] with k-means as its macro-clusterer. We vary the parameter k from 2 to 9, where the optimal number of clusters is 5. We set the evaluation frequency to 1000 points and run our stream generator long enough to obtain 30 evaluation results.

Fig. 3 A screenshot of Dimensions 1 and 2 of the synthetic data stream used in the experiment. Colored points represent the incoming instances, and the colors are faded out as the processing time passes. Ground truth cluster boundaries are drawn as black circles. Gray circles indicate the former state, expressing that the clusters are moving. Black (faded out to gray) points represent noise points. (color figure online)

Results

Table 1 contains the mean value of the 30 evaluation results which we obtained in the whole streaming interval. It shows that RMSSTD, RS, CH, I, and S_Dbw correctly reach their optimal number of clusters, while the others do not. According to the results in [28], the optimal value of each of RMSSTD, RS, and Γ is difficult to determine. For this reason, we do not accept their results even if some of them show a good performance.

In the static case in [28], CH and I were unable to find the right optimal number of clusters. CH is shown to be vulnerable to noise, since the noise inclusion (in cases when k < 5) makes the found clusters larger and less compact. However, in the streaming case, most clustering algorithms follow the online-offline-phases model. The online phase removes a lot of noise when summarizing the data into microclusters, and the offline phase (k-means in the case of CluStream [1]) deals only with these cleaned summaries.
Of course, there will always be a chance to get a summary that is completely formed of noise points, but those will have less impact on the final clustering than in the static case. Thus, since not all the noise points are integrated into the clusters, the amount of cluster expansion is a bit smaller than in the static case.

Therefore, the effect of noise on CH is smaller in the streaming case than in the static one. In the static case, I was slightly affected by the different densities of the clusters, and the reasons were not well revealed. Therefore, it is not surprising that I performs well as we take the average of its evaluation results over the whole streaming interval.

The evaluation of D did not result in a very useful output, since it gives unconditional zero values at most evaluation points (before they are averaged as in Table 1). This is because the numerator of Eq. 6 becomes zero when at least one pair of x and y happens to be equal, i.e., the distance between x and y is zero. This case arises when $C_i$ and $C_j$ overlap and the pair (x, y) is selected from the overlapping region. Streaming data has a high probability of containing overlapping clusters, and so does the input of this experiment. This drives D to produce zero, making it an unstable measure in streaming environments.

Similar to the static data case, S, DB, XB, and SD perform badly in the streaming settings. The main reason also lies in the overlapping of clusters. Overlapping clusters are the extreme case of subclusters in the experiments of the static case discussed in [28].

Table 1 Evaluation results of internal validation on the stream clusterings. Columns: k, RMSSTD, RS, Γ, CH, I, D, S, DB, XB, SD, S_Dbw. The best obtained values for each parameter (not necessarily the maximum or the minimum) are in bold. The best values for RMSSTD, RS and Γ are selected as the first elbow in their monotonically increasing or decreasing curves (according to [10]).

4.2 Sensitivity to common errors of stream clustering

In this section, we perform a more detailed study on the behaviors of internal measures in streaming environments. The previous experiment is more or less a general test on a single data stream, so we use here the internal clustering indices on a series of elaborately designed experiments which reflect the stream clustering scenarios well.
The MOA framework has an interesting tool called ClusterGenerator, which can produce a found clustering by manipulating ground truth clusters with a certain error level. It can simulate different kinds of error types and even combine them to construct complicated clustering error scenarios. It is very useful since we can test the sensitivity of evaluation measures to specific errors [27]. Evaluating a variation of the ground truth seems a bit awkward in the sense of internal validation, since it actually refers to the predefined result. However, this kind of experiment is absolutely meaningful, because we can watch the reactions of internal measures to some errors of interest. [27] used this tool to show the behavior of internal measures, e.g., S, Sum of Squares (SSQ), and C-index. The error types exploited in [27] are limited, however, and those measures are not of interest here, as they already proved to perform badly in the previous experiments.

Experimental settings

Due to the drifting nature of streaming data, certain errors commonly appear for stream clustering algorithms. These errors are reflected by a wrong grouping of the drifting objects. The correct grouping of the objects is reflected in the original data set, where we assume that the real distribution of the objects (and thus the grouping) is previously known. This already known assignment of the drifting objects to their correct clusters is called the ground truth. The closer the output of a clustering algorithm is to this ground truth, the better its quality. The above-mentioned errors are the deviations of the output of clustering algorithms from the ground truth. A good validation measure should be able to evaluate the amounts of these errors correctly. In the case of internal validation measures, this should be possible even without accessing the ground truth. To obtain a controlled amount of this error, a simulation of a stream clustering algorithm is embedded in the MOA framework [4].
This previously explained simulation, called ClusterGenerator, allows the user to control the amount of deviation from the ground truth using different parameters. The ClusterGenerator has six error types as its parameters, and they effectively reflect common errors of stream clusterings. Radius increase and Radius decrease change the radius of clusters (Fig. 4a, b), which normally happens in stream clustering since data points keep fading in and out. Thus, in Fig. 4a, for instance, the ground truth is represented by the solid line, and the arrows represent the direction of the evolution of the data in the ground truth where the cluster is shrinking. The dashed line, however, represents the output of the simulated clustering algorithm using the ClusterGenerator, which suffers from the Radius increase error. The same explanation applies to all other errors depicted in Fig. 4. Cluster add and Cluster remove change the number of found clusters, which is caused by grouping noise points or by falsely detecting meaningful patterns as a noise cloud. Cluster join merges two overlapping clusters into one cluster (Fig. 4c), which is a crucial error in streaming scenarios. Finally, Position offset changes the cluster position, and this commonly happens due to the movement of clusters in data streams (Fig. 4d).

Fig. 4 Common errors of stream clusterings. A solid circle represents a true cluster, and a dashed circle indicates the corresponding found cluster (error). The cause of the error is the fast evolution of the stream in the direction of the arrows.

We perform the experiments on all the above error types. We increase the level of one error at a time and evaluate its output with CH, I and S_Dbw, which performed well in the previous experiment. For the input stream, we use the same stream settings as in the previous experiment.

Results

In Fig. 5, the evaluation values are plotted on the y-axis according to the corresponding error level on the x-axis. From Fig. 5a, we can see that the CH value decreases as the level of the Radius increase, Cluster add, Cluster join, and Position offset errors increases. CH correctly and constantly penalizes the four errors, since a smaller CH value corresponds to a worse clustering. However, it shows completely reversed curves for the Radius decrease and Cluster remove errors.
The reason for the wrong rewarding of the Radius decrease error is that the reduction of the size of clusters increases their compactness, and thus both CH and I increase. The detection of the Cluster remove error is a general problem for all internal measures, as they compare the clustering result only to itself. Apart from the Radius decrease and the Cluster remove errors, CH has generally the best performance on streaming data compared to the other measures.

We can see in Fig. 5b that using I results in a misinterpretation of the Radius decrease and Cluster remove error situations. The reason is similar to that of CH, since I also adopts the distance between objects within clusters as the intra-cluster compactness and the distance between cluster centers as the inter-cluster separation. In addition, I wrongly favors the Position offset error instead of penalizing it. If the boundaries of found clusters are moved away from the true ones, they often miss the data points, which produces a situation similar to Radius decrease, to which I is vulnerable.

S_Dbw produces high values when it regards a clustering as a bad result, which is opposite to the previous two measures. In Fig. 5c, we can see that it correctly penalizes the three error types Radius increase, Cluster add, and Cluster join. For the Position offset error, one can say that the value is somehow increasing, but the curve is actually fluctuating too much. It also fails to penalize Cluster remove correctly.

From these results, we can determine that among the discussed internal evaluation measures, CH is the best one, as it can handle many stream clustering errors well. Even though S_Dbw performs very well on the static data (cf. [28]) and on the streaming data in the previous experiments (cf. Sect ), we observed that it has a weak capability to capture common errors of stream clustering.
Fig. 5 Experimental results for each error type. Evaluation values (y-axis) are plotted according to each error level (x-axis). Some error curves are drawn on a secondary axis due to their range: a Radius decrease and Cluster remove, b Radius decrease, Cluster remove, and Position offset.

5 Internal evaluation measures of density-based stream clustering algorithms

In this section, we evaluate the performance of internal stream clustering measures using a density-based stream clustering algorithm, namely DenStream [6]. We start by running DenStream with external evaluation measures to get some kind of ground truth; then we compare the performance of the internal evaluation measures by how close they are to this ground truth.

Similar to the previous section, we use the MOA (Massive Online Analysis) [4] framework for the evaluation. Again we have used the RandomRBFGenerator to create a 10-dimensional dataset of 30,000 objects forming 5 drifting clusters with different and varying densities and sizes. For DenStream [6] and MOA, we set the parameters as follows: the evaluation horizon = 1000, the outlier microcluster controlling threshold β = 0.15, the initial number of objects initPoints = 1000, the offline factor of ɛ compared to the online one = 2, the decaying factor λ = 0.25, and the processing speed of the evaluation = 100. The parameters which are not mentioned are not directly related to this experiment and are set to the defaults of MOA.

5.1 Deriving the ground truth using external evaluation measures

Internal evaluation measures do not benefit from the ground truth information provided in the form of cluster labels in our dataset. This was not a problem in the case of the k-means-based algorithm CluStream [1] discussed in Sect. 4.1, since the optimal parameter setting was simply k = 5, as we have generated 5 clusters. In the case of the density-based stream clustering algorithm DenStream [6], this is not as straightforward. To obtain some kind of ground truth for a density-based stream clustering algorithm like DenStream, we used the results from some external evaluation measures to derive the parameter settings for the best and the worst clustering results. The following external evaluation measures were used.

The first one is the F1 measure [3], a widely used external evaluation measure that harmonizes the precision and the recall of the clustering output. The second one is the purity measure, which is widely used [2,6,24] to evaluate the quality of a clustering. Intuitively, the purity can be seen as the pureness of the final clusters compared to the classes of the ground truth. The average purity is defined as follows:

\[ \mathrm{purity} = \frac{\sum_{i=1}^{NC} \frac{n_i^d}{n_i}}{NC} \tag{12} \]

where NC represents the number of clusters, $n_i^d$ denotes the number of objects with the dominant class label in cluster $C_i$, and $n_i$ denotes the number of objects in the cluster $C_i$. The third external evaluation measure is the number of clusters, which averages the previous numbers of clusters within the window H. Similarly, the F1 and the purity are computed over a certain predefined window H from the current time. This is done since the weights of the objects decay over time.
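The average purity of Eq. 12 can be sketched in a few lines of Python. This is a minimal illustration, not MOA's implementation: it ignores the decaying weights and the window H, and it assumes each found cluster is given simply as the list of ground-truth class labels of its objects. The function name and the sample labels are hypothetical.

```python
from collections import Counter

def average_purity(clusters):
    """Eq. 12: mean over clusters of (objects with the dominant class label) / (cluster size)."""
    ratios = []
    for labels in clusters:
        dominant_count = Counter(labels).most_common(1)[0][1]
        ratios.append(dominant_count / len(labels))
    return sum(ratios) / len(ratios)

# Hypothetical found clustering: the first cluster mixes two classes, the second is pure.
found = [["a", "a", "a", "b"], ["b", "b"]]
print(average_purity(found))  # (3/4 + 2/2) / 2
```

Because every cluster contributes equally regardless of its size, a clustering with many tiny pure clusters scores high, which is one reason purity is read here together with F1 and the number of clusters.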
Thus, the number of found clusters could be any real value, while F1 and purity could be any real value from 0 to 1.

Results

Table 2 contains the mean value of 5 evaluation results which we obtained in the whole streaming interval when considering the external evaluation measures F1, purity and the number of clusters for different settings of the μ and ɛ parameters of DenStream. The bold values of each column represent the best value of the index among the outputs of the used parameter settings. It is the highest value in the case of F1 and the purity, and the closest value to 5 in the case of the number of clusters. The worst values in each column are underlined. It can be seen from Table 2 that among the selected 9 parameter settings, μ = 2 and ɛ = 0.18 results in the worst clustering output of DenStream over the current dataset, while μ = 2 and ɛ = 0.06 results in the best one. Figure 6 depicts the values of the external evaluation measures for these settings. Our task now is to find the internal evaluation measure that shows the highest correlation with this result.

Table 2 Evaluation results of external validation on the stream clusterings. Columns: μ, ɛ, F1, Purity, Number of clusters. The best obtained values for each measure are in bold, the worst ones are underlined.

5.2 The results of using internal evaluation measures for density-based stream clustering

Figures 7, 8 and 9 show the mean values of 5 evaluation results using all internal evaluation measures over the previous parameter settings. These results are summarized in Table 3, where RMSSTD, RS and Γ are directly excluded due to the subjective process of defining the first elbow in their monotonically increasing or decreasing curves (according to [10]). We obtained these results for the different selected parameter settings, and the final values are summarized from the measurements in the whole streaming interval. Table 3 shows that all the internal evaluation measures except for SD reach their worst values (underlined values) exactly at the setting (μ = 2 and ɛ = 0.18).
This shows that the results of those internal measures are in line with those of the external ones w.r.t. penalizing the worst setting. What is left now is to check which of those measures reaches its best value at the same setting where the external evaluation measures reach their best values (i.e., μ = 2 and ɛ = 0.06). It can be seen from Table 3 that none of the internal evaluation measures reaches its best value (in bold) at that parameter setting.

Fig. 6 External evaluation measures (F1, Purity, Num of Clusters) on the y-axis using different parameter settings of DenStream on the x-axis (ɛ = 0.06, 0.12, 0.18 for each of MinPoints = 2, 3, 4)

Fig. 7 Performance of the internal evaluation measures RMSSTD, XB and S_Dbw on the y-axis using different parameter settings of DenStream on the x-axis

Fig. 8 Performance of the internal evaluation measures Γ, D and SD on the y-axis using different parameter settings of DenStream on the x-axis

We now have to calculate which of those internal evaluation measures has the highest (local) correlation between its best value $V_{best}$ and the value calculated at the best ground truth setting (μ = 2 and ɛ = 0.06), which we call $V_{truth}$. Let

$$V_{avg} = \frac{\sum_{s=1}^{9} V_s}{9} \qquad (13)$$

be the average of the values $V_s$ that an internal measure takes over the 9 considered parameter settings $s$. Our target is to find, among the 7 winning internal measures in Table 3, the internal evaluation measure that achieves

$$\min\left(\frac{|V_{best} - V_{truth}|}{|V_{best} - V_{avg}|}\right) \qquad (14)$$
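The selection rule of Eqs. 13 and 14 can be illustrated with a short Python sketch. The values below are invented for illustration only and are not the entries of Table 3; the names `tendency_ratio`, `measure_A` and `measure_B` are hypothetical, and the sketch assumes that the best value of a measure is its maximum or its minimum over the 9 settings, depending on the measure.

```python
def tendency_ratio(values, truth_index, higher_is_better):
    """Return |V_best - V_truth| / |V_best - V_avg| for one internal measure.

    values           -- the measure's value at each parameter setting
    truth_index      -- position of the best ground-truth setting in `values`
    higher_is_better -- True if larger values of the measure mean better clusterings
    """
    v_best = max(values) if higher_is_better else min(values)
    v_truth = values[truth_index]
    v_avg = sum(values) / len(values)          # Eq. 13
    denominator = abs(v_best - v_avg)
    if denominator == 0:
        raise ValueError("the measure is constant over all settings")
    return abs(v_best - v_truth) / denominator  # the fraction minimized in Eq. 14


# Made-up values over 9 settings; index 0 is assumed to be the best ground-truth setting.
TRUTH_INDEX = 0
measures = {
    "measure_A": ([0.9, 0.8, 0.1, 1.0, 0.7, 0.3, 0.6, 0.5, 0.4], True),
    "measure_B": ([0.30, 0.25, 0.90, 0.20, 0.40, 0.60, 0.50, 0.45, 0.10], False),
}

ratios = {
    name: tendency_ratio(values, TRUTH_INDEX, higher_is_better)
    for name, (values, higher_is_better) in measures.items()
}
winner = min(ratios, key=ratios.get)  # measure with the smallest ratio
print(winner)
```

A ratio of 0 would mean that the measure reaches its best value exactly at the ground-truth setting, while a ratio close to or above 1 means that its value at that setting is no closer to its best value than its own average is.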

Fig. 9 Performance of the internal evaluation measures RS and S on the y-axis using different parameter settings of DenStream on the x-axis

Table 3 Evaluation results of internal validation on the stream clusterings (columns: μ, ɛ, CH, I, D, S, DB, XB, SD, S_Dbw). The best obtained values (not necessarily the maximum or the minimum) are in bold.

Table 4 Testing the winning internal measures (i.e., those whose worst value in Table 3 matched the worst ground truth): the value of $\frac{|V_{best} - V_{truth}|}{|V_{best} - V_{avg}|}$ for each of the internal measures CH, I, D, S, DB, XB and S_Dbw. The test aims to find which of those has the highest correlation between its best value $V_{best}$ and its value at the best ground truth setting $V_{truth}$.

In other words, we are seeking the measure whose $V_{best}$ has the smallest deviation from $V_{truth}$ relative to its deviation from the mean. It should be noted that the simple tendency check in Eq. 14 is reliable. For a specific measure, minimizing the fraction in Eq. 14 implies that the numerator is considerably smaller than the denominator. Thus, the deviation of the measure's best value from the ground truth value $V_{truth}$ is considerably smaller than its deviation from its own mean, which gives some kind of guarantee that this correlation is strong enough. As we are unable to find an internal measure whose $V_{best} = V_{truth}$, we perform this approximation to find the one with the closest tendency to make $V_{truth}$ its $V_{best}$.

Table 4 shows that S_Dbw has the highest correlation between its best value $V_{best}^{S\_Dbw}$ and its ground truth value $V_{truth}^{S\_Dbw}$, because it has the smallest value of $\frac{|V_{best} - V_{truth}|}{|V_{best} - V_{avg}|}$ (highlighted in bold). This means that, among the tested internal evaluation measures, S_Dbw has shown the best results when considering the density-based stream clustering algorithm DenStream [6]. Similar to the static data case, CH, I, D, S, DB, and SD perform badly in the streaming setting. This is different from the k-means stream clustering case, where CH performed the best.
On the other hand, S_Dbw performed the best, which is similar to the static case results reported in [28]. XB also worked well.

6 Conclusions and outlook

Evaluating clustering results is very important to the success of clustering tasks. In this article, we discussed the internal clustering validation scheme in both k-means and density-based stream clustering scenarios. In the absence of any previous knowledge about the data, this scheme is much more efficient and easier to apply than external validation. We explained the fundamental theory of internal validation measures and gave examples of them.

In the k-means-based case, we performed a set of clustering validation experiments that reflect the properties of a streaming environment well, with five common clustering aspects present at the same time. These aspects are monotonicity, noise, different densities of clusters, skewness and the existence of subclusters in the underlying streaming data. The three winners of the first experimental evaluation were then further evaluated in the second phase of experiments, where the sensitivity of each of those three measures was tested w.r.t. six stream clustering errors. Different from the results of a recent work on static data, our final experimental results on streaming data showed that the Calinski-Harabasz index (CH) [5] has, in general, the best performance in k-means-based streaming environments. It is robust to the combination of the five conventional aspects of clustering, and also correctly penalizes the common errors in stream clustering.

In the density-based case, we performed a set of experiments over different parameter settings using the DenStream [6] algorithm. We used external evaluation measures to extract some ground truth, and used this ground truth to define the best and the worst parameter settings. Then, we tested which of the internal measures has the highest correlation with the ground truth. Our results showed that the revised validity index S_Dbw [11] shows the best performance under density-based stream clustering algorithms. This is in line with the results reported over static data in [28]. Additionally, the Xie-Beni index (XB) [35] has also shown a good performance.

In the future, we want to test those measures on different categories of advanced stream clustering algorithms like adaptive hierarchical density-based ones (e.g., HAStream [15]) or projected/subspace ones (e.g., PreDeConStream [24] and SubClusTree [14]).
Additionally, we want to evaluate the measures when streams of clusters available in subspaces [13] are processed by the above algorithms.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (ons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A framework for clustering evolving data streams. In: VLDB (2003)
2. Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A framework for projected clustering of high dimensional data streams. In: VLDB (2004)
3. Assent, I., Krieger, R., Müller, E., Seidl, T.: INSCY: Indexing subspace clusters with in-process-removal of redundancy. In: Proceedings of the 8th IEEE International Conference on Data Mining, ICDM '08. IEEE (2008)
4. Bifet, A., Holmes, G., Pfahringer, B., Kranen, P., Kremer, H., Jansen, T., Seidl, T.: MOA: Massive online analysis, a framework for stream classification and clustering. JMLR 11 (2010)
5. Calinski, T., Harabasz, J.: A dendrite method for cluster analysis. Comm. Stat. 3(1), 1-27 (1974)
6. Cao, F., Ester, M., Qian, W., Zhou, A.: Density-based clustering over an evolving data stream with noise. In: SIAM SDM (2006)
7. Davies, D., Bouldin, D.: A cluster separation measure. IEEE PAMI 1(2) (1979)
8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39(1), 1-38 (1977)
9. Dunn, J.: Well separated clusters and optimal fuzzy partitions. J. Cybern. 4(1) (1974)
10. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17(2) (2001)
11. Halkidi, M., Vazirgiannis, M.: Clustering validity assessment: Finding the optimal partitioning of a data set. In: IEEE ICDM (2001)
12. Halkidi, M., Vazirgiannis, M., Batistakis, Y.: Quality scheme assessment in the clustering process. In: PKDD (2000)
13. Hassani, M., Kim, Y., Seidl, T.: Subspace MOA: subspace stream clustering evaluation using the MOA framework. In: DASFAA (2013)
14. Hassani, M., Kranen, P., Saini, R., Seidl, T.: Subspace anytime stream clustering. In: SSDBM, p. 37 (2014)
15. Hassani, M., Spaus, P., Seidl, T.: Adaptive multiple-resolution stream clustering. In: MLDM, MLDM '14 (2014)
16. Hassani, M.: Efficient clustering of big data streams. PhD thesis, RWTH Aachen University (2015)
17. Hassani, M., Kim, Y., Choi, S., Seidl, T.: Effective evaluation measures for subspace clustering of data streams. In: Trends and Applications in Knowledge Discovery and Data Mining, PAKDD 2013 International Workshops (2013)
18. Hassani, M., Kim, Y., Choi, S., Seidl, T.: Subspace clustering of data streams: new algorithms and effective evaluation measures. J. Intell. Inf. Syst. 45(3) (2015)
19. Hassani, M., Müller, E., Seidl, T.: EDISKCO: Energy Efficient Distributed In-Sensor-Network K-center Clustering with Outliers. In: Proceedings of the 3rd International Workshop on Knowledge Discovery from Sensor Data, SensorKDD '09. ACM (2009)
20. Hassani, M., Müller, E., Spaus, P., Faqolli, A., Palpanas, T., Seidl, T.: Self-organizing energy aware clustering of nodes in sensor networks using relevant attributes. In: Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data, SensorKDD '10. ACM (2010)
21. Hassani, M., Seidl, T.: Distributed weighted clustering of evolving sensor data streams with noise. J. Dig. Inf. Manag. (JDIM) 10(6) (2012)
22. Hassani, M., Seidl, T.: Internal clustering evaluation of data streams. In: Trends and Applications in Knowledge Discovery and Data Mining, PAKDD 2015 Workshop: QIMIE, Revised Selected Papers (2015)
23. Hassani, M., Spaus, P., Cuzzocrea, A., Seidl, T.: Adaptive stream clustering using incremental graph maintenance. In: Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine 2015 at KDD '15 (2015)
24. Hassani, M., Spaus, P., Gaber, M.M., Seidl, T.: Density-based projected clustering of data streams. In: Proceedings of the 6th International Conference on Scalable Uncertainty Management, SUM '12 (2012)
25. Hubert, L., Arabie, P.: Comparing partitions. J. Intell. Inf. Syst. 2(1) (1985)
26. Kaufman, L., Rousseeuw, P.: Clustering by means of medoids. Statistical Data Analysis Based on the L1 Norm (1987)
27. Kremer, H., Kranen, P., Jansen, T., Seidl, T., Bifet, A., Holmes, G., Pfahringer, B.: An effective evaluation measure for clustering on evolving data streams. In: ACM SIGKDD (2011)
28. Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J.: Understanding of internal clustering validation measures. In: ICDM (2010)
29. MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1. University of California Press (1967)
30. Maulik, U., Bandyopadhyay, S.: Performance evaluation of some clustering algorithms and validity indices. IEEE PAMI 24 (2002)
31. Rendón, E., Abundez, I., Arizmendi, A., Quiroz, E.M.: Internal versus external cluster validation indexes. Int. J. Comp. Comm. 5(1) (2011)
32. Ramze Rezaee, M., Lelieveldt, B.B.F., Reiber, J.H.C.: A new cluster validity index for the fuzzy c-mean. Pattern Recogn. Lett. 19(3-4) (1998)
33. Rousseeuw, P.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20(1) (1987)
34. Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison-Wesley Longman, Inc., Boston (2005)
35. Xie, X.L., Beni, G.: A validity measure for fuzzy clustering. IEEE PAMI 13(8) (1991)


More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

(1) The control processes are too complex to analyze by conventional quantitative techniques.

(1) The control processes are too complex to analyze by conventional quantitative techniques. Chapter 0 Fuzzy Control and Fuzzy Expert Systems The fuzzy logc controller (FLC) s ntroduced n ths chapter. After ntroducng the archtecture of the FLC, we study ts components step by step and suggest a

More information

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity Journal of Sgnal and Informaton Processng, 013, 4, 114-119 do:10.436/jsp.013.43b00 Publshed Onlne August 013 (http://www.scrp.org/journal/jsp) Corner-Based Image Algnment usng Pyramd Structure wth Gradent

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Clustering algorithms and validity measures

Clustering algorithms and validity measures Clusterng algorthms and valdty measures M. Hald, Y. Batstas, M. Vazrganns Department of Informatcs Athens Unversty of Economcs & Busness Emal: {mhal, yanns, mvazrg}@aueb.gr Abstract Clusterng ams at dscoverng

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

cos(a, b) = at b a b. To get a distance measure, subtract the cosine similarity from one. dist(a, b) =1 cos(a, b)

cos(a, b) = at b a b. To get a distance measure, subtract the cosine similarity from one. dist(a, b) =1 cos(a, b) 8 Clusterng 8.1 Some Clusterng Examples Clusterng comes up n many contexts. For example, one mght want to cluster journal artcles nto clusters of artcles on related topcs. In dong ths, one frst represents

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Clustering. A. Bellaachia Page: 1

Clustering. A. Bellaachia Page: 1 Clusterng. Obectves.. Clusterng.... Defntons... General Applcatons.3. What s a good clusterng?. 3.4. Requrements 3 3. Data Structures 4 4. Smlarty Measures. 4 4.. Standardze data.. 5 4.. Bnary varables..

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

FlockStream: a Bio-inspired Algorithm for Clustering Evolving Data Streams

FlockStream: a Bio-inspired Algorithm for Clustering Evolving Data Streams FlockStream: a Bo-nspred Algorthm for Clusterng Evolvng Data Streams Agostno Forestero, Clara Pzzut, Gandomenco Spezzano Insttute for Hgh Performance Computng and Networkng, ICAR-CNR Va Petro Bucc, 41C

More information

Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input

Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input Real-tme Jont Tracng of a Hand Manpulatng an Object from RGB-D Input Srnath Srdhar 1 Franzsa Mueller 1 Mchael Zollhöfer 1 Dan Casas 1 Antt Oulasvrta 2 Chrstan Theobalt 1 1 Max Planc Insttute for Informatcs

More information

SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS

SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS J.H.Guan, F.B.Zhu, F.L.Ban a School of Computer, Spatal Informaton & Dgtal Engneerng Center, Wuhan Unversty, Wuhan, 430079,

More information

Vanishing Hull. Jinhui Hu, Suya You, Ulrich Neumann University of Southern California {jinhuihu,suyay,

Vanishing Hull. Jinhui Hu, Suya You, Ulrich Neumann University of Southern California {jinhuihu,suyay, Vanshng Hull Jnhu Hu Suya You Ulrch Neumann Unversty of Southern Calforna {jnhuhusuyay uneumann}@graphcs.usc.edu Abstract Vanshng ponts are valuable n many vson tasks such as orentaton estmaton pose recovery

More information

An Internal Clustering Validation Index for Boolean Data

An Internal Clustering Validation Index for Boolean Data BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 6 Specal ssue wth selecton of extended papers from 6th Internatonal Conference on Logstc, Informatcs and Servce Scence

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap Int. Journal of Math. Analyss, Vol. 8, 4, no. 5, 7-7 HIKARI Ltd, www.m-hkar.com http://dx.do.org/.988/jma.4.494 Emprcal Dstrbutons of Parameter Estmates n Bnary Logstc Regresson Usng Bootstrap Anwar Ftranto*

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information