Data & Knowledge Engineering
|
|
- Cameron Welch
- 5 years ago
- Views:
Transcription
1 Data & Knowledge Engneerng 9 (4) Contents lsts avalable at ScenceDrect Data & Knowledge Engneerng journal homepage: Edtoral WB-ndex: A sum-of-squares based ndex for cluster valdty Qnpe Zhao a,, Pas Fränt b a School of Software Engneerng, Tongj Unversty, 84, Chna b School of Computng, Unversty of Eastern Fnland, 8, Fnland artcle nfo abstract Artcle hstory: Receved 5 September Receved n revsed form May 4 Accepted July 4 Avalable onlne 7 July 4 Keywords: Cluster valdty Keywords categorzaton Short text mnng Clusterng Classfcaton and assocaton rules Determnng the s an mportant part of cluster valdty that has been wdely studed n cluster analyss. Sum-of-squares based ndces show promsng propertes n terms of determnng the. However, knee pont detecton s often requred because most ndces show monotoncty wth ncreasng. Therefore, ndces wth a clear mnmum or maxmum value are preferred. The am of ths paper s to revst a sum-ofsquares based ndex called the WB-ndex that has a mnmum value as the determned number of clusters. We shed lght on the relaton between the WB-ndex and two popular ndces whch are the Calnsk Harabasz and the Xu-ndex. Accordng to a theoretcal comparson, the Calnsk Harabasz ndex s shown to be affected by the data sze and level of data overlap. The Xu-ndex s close to the WB-ndex theoretcally, however, t does not work well when the dmenson of the data s greater than two. Here, we conduct a more thorough comparson of nternal ndces and provde a summary of the expermental performance of dfferent ndces. Furthermore, we ntroduce the sum-of-squares based ndces nto automatc keyword categorzaton, where the ndces are specally defned for determnng the. 4 Elsever B.V. All rghts reserved.. Introducton Dfferent clusterng algorthms wth varous cost functons produce dfferent solutons, and there s no sngle best clusterng algorthm for all possble data sets. A prmary task s therefore to select the best possble clusterng for a gven data set. Cluster valdty provdes a way of valdatng the qualty of clusterng algorthms and a means of dscoverng the natural structure of data sets. For most clusterng algorthms, the s a man parameter. However, the settng of the parameter s not always known and hence determnng the s essental. Cluster valdty measures can be used for determnng the. In fact, the clusterng procedure and cluster valdty are much lke the chcken-and-the-egg problem where knowng how to defne a good cluster valdty ndex requres understandng the data and the clusterng algorthm, but the clusterng algorthm s one of the prncpal tools used to understand the data [] wthout a pror nformaton. Therefore, the study of the cluster valdty s as mportant as that of the clusterng algorthms. The nternal valdty ndex, as one of the categores of cluster valdty, evaluates the results of a clusterng algorthm n terms of quanttes that nvolve the vectors of the data set themselves. A good clusterng algorthm generates clusters wth ntra-cluster homogenety, nter-cluster separaton and connectedness. One category of nternal valdty ndces s based on these propertes. Another category s based on whether the nternal ndces are appled to hard (crsp) or soft (fuzzy) clusterng. A revew of fuzzy cluster valdty ndces s avalable n []. Correspondng author. E-mal address: qnpezhao@tongj.edu.cn (Q. Zhao) X/ 4 Elsever B.V. All rghts reserved.
2 78 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) There are many examples of nternal valdty ndces based on the compactness wthn a cluster and the separaton between clusters. The sum-of-squares based ndces are founded on sum-of-squares wthn cluster (SSW) and/or sum-of-squares between clusters (SSB) values, for example, Ball and Hall [3], Hartgan [4], Calnsk and Harabasz (CH) [5] and Xu [6]. WB-ndex [7] s a sum-of-squares based ndex where a mnmum value can be attaned as the. Examples of other popular ndces are gven n [8 ]. Dunn-type ndces [8] are based on the nter-cluster dstance and dameter of a cluster hypersphere. A Dunn ndex s senstve to outlers, whereas the Daves and Bouldn ndex s defned by the average of cluster evaluaton measures for all the clusters. Xe Ben [] adopts the mnmum dstance between any par of clusters and the global average of dstances between each data object and clusters as nter- and ntra-cluster dstances, respectvely. S_Dbw [] replaces the total separaton wth the densty of data objects n the mddle of two clusters and omts the weghtng factor. A model selecton method called the Bayesan nformaton crteron (BIC) [] has been appled n model-based clusterng, but t can be adapted to partton-based clusterng [3], too. To reduce computaton on dstance calculatons n the slhouette coeffcent (SC), an approach s proposed n [4] to compute the SC quckly by decreasng the number of addton operatons. Mllgan and Cooper [5] have presented a comparson of over 3 nternal valdty ndces as stoppng crtera n herarchcal clusterng algorthms. Dmtradou et al. [6] conducted a comparson study of 5 valdty ndces on bnary data. An organzed study of 6 external valdaton measures for k-means clusterng s gven n [7,8]. A survey of cluster valdaton ndces n the analyss of post-genomc and other data famlarzes researchers wth some of the fundamental concepts behnd cluster valdaton technques [9]. Exstng studes of the comparsons of nternal valdty ndces are prmarly based on one certan clusterng algorthm, for example, herarchcal clusterng algorthm. In ths paper, we study ndces on two clusterng algorthms, the k-means and Random Swap (RS) clusterng algorthms [,]. Conventonal k-means s known to have local maxma problem because of ts ntalzaton. The RS algorthm mproves on the conventonal k-means, whch has been shown to obtan a more stable result. We chose these two clusterng algorthms to study the performance of the ndces dependng on f the clusterng results are stable or not. Besdes a thorough comparson among the ndces, we also revst the WB-ndex. We attempt to analyze the dfference and relaton between the WB-ndex and two smlar ndces (CH and Xu-ndex) theoretcally. For two-dmensonal data, the three ndces seem to work smlarly. However, accordng to the theoretcal analyss, t s shown that the CH ndex s more affected by the data sze N and more senstve to the degree of overlappng of data sets than the WB-ndex. For low dmensonal data (.e., D ), the Xu-ndex s remarkably close to the WB-ndex. However, t rarely detects a clear mnmum value for hgh dmensonal data (.e., D N ). Cluster valdty ndces are dscussed n many applcatons, such as mage segmentaton [], transactonal data [3],post-genomc data analyss [9] and text clusterng [4,5]. Wth more and more short text snppets beng generated, such as search query and results, mcroblogs, tweets and other types of comments n socal network, there s a growng nterest n mnng the short text by clusterng technques [6 8]. Short texts typcally have lmted length, pervasve abbrevatons and coned acronyms. In most cases, there s no contextual nformaton. Therefore, tradtonal clusterng technques used for mnng documents are not sutable for short text clusterng. Automatc keywords categorzaton s to cluster short texts of keywords wth a determned number of clusters. For automatc keywords categorzaton, the prmary ssues are the semantc smlarty measure for keywords and the cluster valdty n the clusterng method. Snce short text snppets lack contextual nformaton, external knowledge s ntroduced to enrch the nformaton contaned wthn short texts (for example, Wkpeda data [9,3], Google search results [3] and lexcal databases [3]). The obtaned through cluster valdty measures s a parameter showng the match between the clusterng results and users' expectaton. There s very lttle research on cluster valdty measures employng n automatc keywords categorzaton. We ntroduce a procedure for automatc keywords categorzaton. Based on the compactness wthn a class and separaton between classes, varants of the sum-of-squares based ndces that are the CH ndex, the Xu-ndex and the WB-ndex are ntroduced and compared n ths paper.. Background.. Prelmnares Determnng the (M) reles on the cluster valdty ndces. Meanwhle, the determned s used to valdate a valdty ndex. Thus, the problem of determnng the s commonly studed as a key problem n cluster valdty. In order to determne the optmal M, other parameters are fxed and the parameter M s optmzed by the valdty crtera. A procedure for determnng the optmal s gven as follows. Gven the data set X, a specfc clusterng algorthm and a fxed range of [M mn, M max ], the basc procedure nvolves:. Repeat a clusterng algorthm successvely for the M from predefned values of M mn to M max.. Obtan the clusterng results (parttons P and centrods C) and calculate the ndex value for each. 3. Select the M as M for whch the partton provdes the best result accordng to some crtera (mnmum, maxmum or knee pont). 4. Compare the detected (M ) wth external nformaton f avalable. A summary of commonly used nternal valdty ndces appears n Table. Symbol X ={x,,x N } represents the data set wth ND-dmensonal ponts, and X ¼ N x =N s the center of the entre data set. The centrods of clusters are C ={c,,c M }, where c s the th cluster and M s the. The log-lkelhood n the BIC s defned as L, and n Xe Ben u k denotes the membershp of the th pont to the kth cluster.
3 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) Table Formulas for nternal ndces. Name SSW SSB Calnsk Harabasz [5] Hartgan [4] Formula N SSW ¼ x C p M SSB ¼ n c X SSB= M CH ¼ ð Þ SSW= ðn MÞ H M ¼ SSWM SSW Mþ ðn M Þ or: H M = log(ssb M /SSW M ) Krzanowsk La [33] dff M =(M ) /D SSW M M /D SSW M KL M = dff M / dff M + Ball & Hall [3] BH M = SSW M /M rffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff Xu-ndex [6] Xu ¼ Dlog SSW M = DN þ logm Dunn's ndex [8] dc ; c j ¼ mnx c;y c j kx yk damðc k Þ ¼ max x;y ck kx yk Dunn ¼ mnm mnm j¼þ dc ; c j max M k¼ dam ð c kþ Daves&Bouldn [9] SC [4] BIC [34] Xe Ben [] R j ¼ S þ S j d j n ; j where: d j = c c j, S ¼ n j¼ M DBI ¼ M R nm ax ð Þ ¼ nm x x j x j¼; j ;x j cm bx ð Þ ¼ mn kc t c m k t m x Ct sx ð Þ ¼ bx ð Þ aðx Þ maxðaðx Þ; bx ð ÞÞ N SC ¼ N sx ð Þ M BIC ¼ L N MDþ ð Þ logðn Þ N M u kk x C k k k¼ XB ¼ n o N mn kc t s t C s k x j c and, R ¼ max R j; ¼ ; ; M j¼; ;M.. Knee pont detecton The knee pont potentally ndcates the optmal, but the act of locatng the knee pont n the valdty ndex curve has not been well studed. The maxmum and mnmum values are the most straghtforward knee ponts. Indces wth clear maxmum and mnmum value are preferred. Some ndces, for example SSW and log-lkelhood, are monotonous and as such there s no clear knee pont. Other ndces mght have several local maxmum or mnmum values due to the selecton of M max. The second successve dfference between ndex values (Fg. ) can be used for knee pont detecton, although ths approach only reflects local nformaton. Other methods, such as the L-method [35], have been proposed to fnd the knee pont of the curve by examnng the boundary between the par of straght lnes that most closely ft the curve n herarchcal/segmentaton clusterng. More general methods should be used based on the global trend of the curve. 3. WB-ndex versus the Calnsk Harabasz ndex and the Xu-ndex A SSW cluster s a commonly used measure of compactness, whle a SSB cluster s a measure of separaton. Sum-of-squares-based ndces (see Table ) are manly functons of M, N, D, SSW and SSB, and they usually have a so-called elbow phenomenon (see Fg. ), where knee pont detecton s requred. The second successve dfference s commonly used for knee pont detecton. The WB-ndex [7] s defned as: WBðMÞ ¼ M SSW=SSB ðþ We revst the WB-ndex to determne the dfference among the WB-ndex and the CH ndex and the Xu-ndex based on theoretcal analyses. The CH ndex and Xu-ndex are two commonly used sum-of-squares based ndces, whch work smlarly as the WBndex.
4 8 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) x 4 BIC of S S4 5 Successve Dfference BIC value s 7.5 s local maxmas s3 s Dfference Value of BIC s s s3 s Fg.. Number of clusters versus the BIC on S S4 (see Secton 5) and ther second successve dfferences. Let us assume that cluster has n ponts and there s a ground-truth average pont for cluster, whch s x.(seefg. ). The wthncluster varance for cluster (W ) can then be reformulated as: W ¼ n kx c k ; ½; MŠ ¼ n X c ; ½; MŠ SSW SSB ðmþ ¼ X M W X M W ¼ þ W þ þ W M N B þ B þ þ B M ðþ Wth an ncrement of one cluster, the dfference of SSW/SSB can be wrtten as: Δ SSW SSB ðmþ ¼ SSW SSB ðm SSW Þ SSB ðmþ ¼ ¼ W ¼ B M W W þ W M M X þ B M þ B M W W M X M B M þ B M þ X M W þ W M ð3þ Snce M W s monotoncally decreasng and M s monotoncally ncreasng wth respect to ncreasng M, ΔSSW/SSB(M) then monotonously decreases as well. Ths result ndcates that the decrement of SSW/SSB from cluster sze M- to M s larger than that from M to M +..e., the decrement decreases wth ncreasng M (see Fg. 3). When the decrement degree of Δ(SSW/SSB) s larger than the lnear ncrement of M at the begnnng, WB decreases untl M M mn. A specal case s that Ws ncreasng for all M when M M mn. Thus, there exsts a value M such that WB(M) WB(M )form M and WB(M) b WB(M )form N M. The optmal s determned by the mnmum value of the WB-ndex. The result of the WB-ndex for data S S4 (see Secton 5 for the data) s shown n Fg. 4. Although SSW decreases monotoncally wth ncreasng M, WB-ndex has a U-shape wth clear mnma at M =5. Based on the facts presented above, we can establsh the relatonshp of the WB-ndex wth the CH ndex and the Xu-ndex.
5 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) Fg.. Calculaton of W and. For the CH ndex and the WB-ndex, we have WB CH ¼ MN M ð Þ M ¼ þ ðn MÞ ð4þ M Snce M N, we obtan an upper bound that WB CH (N ). Therefore, WB ðn Þ CH ð5þ Based on Eq. (5), the WB-ndex shows a smlar trend as the nverse of the CH ndex. However, the CH ndex s affected by the data sze N. When N s large, the factor M N M plays a more mportant role than SSW/SSn the whole ndex. Consderng data wth overlappng clusters, for example S S4, whch have ncreasng degree of overlappng, the CH ndex has dffculty dealng wth hghly overlappng data such as S3 and S4. For hgher overlap, SSW/SSB contans less nformaton about M. The Xu-ndex can be wrtten as: Xu ¼ log MSSW ð ÞD= DN D= ð6þ. S S.5 SSW/SSB..5 Δ SSW/SSB(3) Δ SSW/SSB() (M) Fg. 3. Values of SSW/SSB as a functon of M for clusterng results on data S and S. ΔSSW/SSs decreasng when M s lnearly ncreasng.
6 8 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) WB Index S S S3 S Fg. 4. The WB-ndex versus the for S S4. Then, e Xu ¼ MSSW ð ÞD= M SSW SSWD= DN D= ¼ DN D= ð7þ WB SSB SSWD= ¼ DN D= WB SSB SSW D= The relaton between the Xu-ndex and the WB-ndex depends on the dmenson D. e Xu ¼ WB f ðssb; SSWÞ; D e Xu ¼ WB f ðssb; SSWÞ; D N ð8þ where f s a monotoncally ncreasng functon domnated by SSB and f s domnated by SSW, whch s monotoncally decreasng. As shown n Fg. 5, SSB has smaller effect than SSW. The nformaton about M orgnates manly from M SSW. Therefore, the Xu-ndex s close to the WB-ndex when D, but for D N, the Xu-ndex s domnated by SSW, whch rarely fnds a clear mnmum value.. M*SSW SSB WB ndex Fg. 5. Plot of M SSW and SSB and the WB-ndex for data S. SSW, SSB and the WB-ndex are normalzed to one.
7 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) Fg. 6. A procedure for automatc keywords categorzaton. 4. Automatc keywords categorzaton Gven a lst of N keywords T ={T,T,,T N }, keywords categorzaton ams at clusterng the keywords nto groups. The keywords can be obtaned from several sources such as moves, servces, and tags. The generated clusters are defned as C ={c,c,,c M }. A procedure for the automatc keywords categorzaton s ntroduced n Fg. 6. There are two man aspects to consder n the procedure. The frst aspect s the smlarty measure between two keywords to obtan the smlarty matrx. Wth the smlarty matrx, clusterng algorthms such as herarchcal or spectral clusterng can be employed to obtan the clusters. The second aspect s how to determne the. Calculatng the semantc smlarty between two keywords drectly s an open queston. We use a lexcal database WordNet n ths paper. WordNet provdes a herarchy for thesaurus and a part of the WordNet herarchy by a web nterface WordVs s shown n Fg. 7. The smlarty score between two keywords can then be derved from the herarchy. For example, the smlarty of eatery and grll from Jang and Conrath [36] s.36. We defned a modfed WB-ndex n [37] for determnng the number of groups n keywords categorzaton. In ths paper, we expand our work on the CH ndex and the Xu-ndex also. The basc elements of the sum-of-squares based ndces are defned as: SSWðMÞ ¼ max t SSBðMÞ ¼ XM X M t¼ s N t max ; j mn ; j JC T ; T j T T j c t JC T ; T j T c t ;T j c s þ X jc t j¼ In SSW(M), T and T j are the th and jth keywords n cluster c. Snce t s not possble to calculate the smlarty wth only one keyword n a cluster, we sum up the wth sngle a keyword accordng to j ct j¼. Smlarly, T and T j n SSB(M) are the th and jth keywords n cluster c t and cluster c s respectvely, M s the. Therefore, the three sum-of-squares based are defned: ð9þ WB ndex ¼ M SSWðMÞ SSBðMÞ SSB= M CH ¼ ð Þ SSW= ðn M qffffffffffffffffffffffffffffffffffff Þ Xu ndex ¼ log SSW=N þ logm ðþ 5. Experments The data n the experment nclude shaped, Gaussan-lke and real data. The propertes of the data, such as name, data sze, dmenson and the for reference are summarzed n Table. The nternal ndces based on k-means and RS [] are mplemented n the C programmng language Comparsons of ndces The valdty ndces are tested wth the k-means and the RS wth repettons. The results are examned n lght p of the performance of the ndces wth dfferent clusterng algorthms usng both artfcal and real data, and M mn = and M max ¼ ffffffff N. The values of ndces n the test are computed for all clusters n [M mn,m max ]. The determned corresponds to the mnmum (Krzanowsk La, Xu, Wb, DBI, Xe Ben) or maxmum value (CH, Dunn, SC, SCI and BIC) of the ndces. For some of the ndces (Ball & Hall, Hartgan), the mnmum or maxmum value of the second successve dfference s used as a knee pont detecton method
8 84 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) Fg. 7. A part of the WordNet taxonomy vsualzed through the WordVs web-nterface. We plot the performance of valdty ndces on DBI, Xe Ben and the WB-ndex wth the k-means and the RS. As shown n Fgs. 8 and 9, the valdty ndces wth the k-means rarely acheve the correct. However, there are clear mnma for ndces wth the RS. Furthermore, the ndces have hgher varance wth the k-means than the RS, so t s necessary to choose a stable algorthm n cluster valdty. Indexes usng the mn or max functon such as DBI and Xe Ben have hgh varance among repettons. On the other hand, sum-of-squares ndces such as the WB-ndex are more stable. The numbers n Tables 3, 4 and 5 are the mean values of the determned usng dfferent valdty ndces wth the RS. The average values are obtaned from repettons, except for brch. Restrcted by the runnng tme, we calculate the ndces once for brch. For unbalanced and shaped data (e.g., Aggregaton), as well as for data wth denstes (e.g., Compound), almost all of the ndces fal (see n Table 3 that the ndces obtan % correctly determned ). It s nterestng to vew the performance of the ndces for these data sets through the lens of densty-based clusterng or spectral clusterng nstead of RS, whch can be studed n Table Attrbutes of the data sets that have been used. Name Data sze Dmenson # of Clusters Shaped data sets Touchng 73 pathbased 3 3 Compound Aggregaton Gaussan-lke data sets S S4 5 5 R5 6 5 D brch, Real data sets Irs Wne Control Image Wdbc Yeast 484 8
9 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) DBI S S.5.4 S3.5 S Xe Ben WB ndex Fg. 8. Valdty ndces ncludng DBI, Xe Ben and WB-ndex wth the k-means are repeated tmes on data sets S S4. The sold lne s the mean value and the dashed lnes are the mnmum and maxmum values. the future. DBI and SC work very well for both Touchng and pathbased, whch contan connected clusters. For ths type of data sets, the performance of the WB-ndex s close to that of the Xu-ndex. However, the determned by the WB-ndex s larger than those from the Xu-ndex. Among the three ndces, the obtaned from the CH ndex s closest to the ground-truth value. The clusterng algorthms such as k-means and RS are sutable for Gaussan-lke data n general. Therefore, the performance of ndces on Gaussan-lke data s much better than that on shaped data (see Table 4). For large data sets such as brch, most of the valdty ndces produce cluster numbers close to ; however, none of them gves exactly as the. The explanaton may be that the data set brch contans the clusters n a regular grd structure. Among the sum-of-squares ndces, the Xu ndex and the WB-ndex have smlar performances. The Calnsk Harabsz, however, has worse results than the WB-ndex and the.5.4 DBI S S.4 S3.4 S Xe Ben WB ndex Fg. 9. Valdty ndces ncludng DBI, Xe Ben and WB-ndex wth the RS are repeated tmes on data sets S S4. The sold lne s the mean value and the dashed lnes are the mnmum and maxmum values.
10 86 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) Table 3 Mean value of the determned by varants of cluster valdty ndces wth RS on shape data. Touchng pathbased Compound Aggregaton M M max Ball & Hall CH Hartgan Krzanowsk La Xu-ndex Wndex Dunn DBI SC SCI Xe Ben BIC Table 4 Mean values of the determned by varants of cluster valdty ndces wth RS on Gaussan-lke data. R5 D3 S S S3 S4 brch M M max Ball & Hall CH Hartgan Krzanowsk La Xu-ndex Wndex Dunn DBI SC SCI Xe Ben BIC Xu-ndex. The CH ndex s more senstve to the degree of overlappng of data sets, such as S3 and S4. The Xu-ndex, WB-ndex and SC have good performance for ths type of data set compared wth other ndces. For real data, the ndces gve more dverse results; ther performance depends strongly on the data sets. For nstance, the WBndex s the only ndex workng for Irs, but t does not work for wne, control, mage and yeast see Table 5. The numbers of clusters determned by the WB-ndex for real data sets are smaller than those by the Xu-ndex n general. Comparng the WB-ndex and the CH ndex, the WB-ndex s smlar to the CH ndex except for the Irs data. From the expermental results, t can be concluded that the sum-of-squares based ndces are mostly ft for Gaussan-lke data sets. For shape-data and real data, the performance of the ndces also depends on the clusterng algorthm, whether the structure of data sets s well detected by the algorthm. The WB-ndex, compared to the CH ndex and the Xu-ndex works qute smlarly as the other Table 5 Mean values of the determned by varants of cluster valdty ndces wth RS on real data. Irs Wne Control Image Wdbc Yeast M M max Ball & Hall Calnsk Harabsz Hartgan Krzanowsk La Xu-ndex Wndex Dunn DBI SC SCI Xe Ben BIC
11 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) Fg.. The three sum-of-squares based ndces on two manually generated data sets. The fgures n the left column are the results for data and the fgures n the rght column are the results for data. two ndces. However, the WB-ndex works slghtly better among the three ndces n general. It detects the correct result for eght out of 7 data sets and provdes the most correctly determned among the sum-of-squares ndces and other ndces (see Tables 3, 4 and 5). The CH ndex s more affected by the data sze N and s more senstve to the degree of overlappng of data sets than the WB-ndex. For two-dmensonal data, the Xu-ndex s remarkably close to the WB-ndex. However, t rarely detects clear mnmum value for hgh dmensonal, real-type data. 5.. Comparson of ndces for automatc keywords categorzaton Three data sets are nvolved n the experment. Two data sets (see Fg. ) are aggregated manually, whle four are humanly judged as the proper for both data. The other data s collected from a project MOPSI, 4 whch ncludes varous locatontagged data such as servces, photos and routes. Each servce ncludes a set of keywords to descrbe what t s. In all, 378 texts were 4
12 88 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) CH ndex maxmum.5 Xu ndex WB ndex mnmum 3.. mnmum Fg.. The V.S. the three sum-of-squares based. collected after smple cleanng and 36 of these texts are tested n the experments. The ground truth for the data sets are based on human judgment. Even human judgment often dffers from person to person n terms of the categorzatons and the ; the clusterng algorthm and valdty ndces can suggest a potentally approprate categorzaton and provde a possble as a gudelne. We studed the sum-of-squares based ndces, CH ndex, Xu-ndex and WB-ndex for automatc keywords categorzaton on the data sets. To obtan the semantc smlarty among keywords, we used JAVA API provded by WordNet 3.. Herarchcal clusterng algorthm s used n the test. The determned by three sum-of-squares based ndces on two manually generated data sets are shown n Fg.. Four s determned by the CH ndex and the WB-ndex for data (left column). For the Xu-ndex, the results four and seven are close. However, seven s detected wth a mnmum value. Smlarly, four s determned by the CH ndex and WB-ndex on data (rght column). The Xu-ndex detects fve as the wth a mnmum value. The correctness of the categorzaton on both data sets s judged by human and we beleve that the categorzaton of four as the number of groups mostly matches users' expectaton. The three ndces are also compared wth the mops data n Fg.. There s a clear maxmum value of the CH ndex at. The values of the Xu-ndex on the from 7 to are exactly the same. Therefore, t s dffcult to recognze whch number s the mnmum value for the potental number of groups. For the WB-ndex, the mnmum value s detected when the number of clusters s. In general, the performance of the ndces on the mops data s smlar as that on the manually generated data. Comparng the result from the ndces to the ground truth from human judgment, the ndces cannot provde the exactly correct number of clusters; however, they are able to provde suggestons for users. In some real applcatons, the suggestons mght be helpful to the clusterng algorthms. 6. Conclusons In ths paper, we revst a sum-of-squares based ndex, the WB-ndex. Ths ndex s desgned to reach ts mnmum value when the approprate s acheved. There are two smlar sum-of-squares based ndces, whch are the CH and the Xu-ndex. To study the dfference among the three ndces, we perform an analyss of the relaton between the WB-ndex and the other two ndces. It s shown that the CH ndex s affected by the data sze (N) and hgh levels of cluster overlap. For hgh dmensonal data where D N, the Xu-ndex does not work as well as for data wth D. Furthermore, a systematc experment on nternal wth two clusterng algorthms and varants of data sets was conducted to study the effect of clusterng algorthms and data sets on the ndces. The s used for valdatng the result. Accordng to the expermental result, a good and stable clusterng algorthm should be selected for obtanng correct number of clusters from the ndces. Sum-of-squares based ndces work well for Gaussan-type data, although most of them cannot provde a global mnmum or maxmum pont for the correct. The WB-ndex works slghtly better than the other two ndces. The sum-of-squares ndces (e.g., the Wndex) are also more stable than ndces employng mn max functons (e.g., DBI). In addton to the comparsons of ndces, we ntroduce a procedure for performng automatc keywords categorzaton, where texts wth multple keywords are consdered and extensons of three sum-of-squares based ndces are employed for determnng the number of groups. The ndces are compared n the experments and they are shown to be vald for automatc keywords categorzaton. Acknowledgments One of the authors s sponsored by the Fundamental Research Funds for the Central Unverstes and the Natonal Natural Scence Foundaton Commttee of Chna under contract no. 638.
13 Q. Zhao, P. Fränt / Data & Knowledge Engneerng 9 (4) References [] R. Caruana, M. Elhawary, N. Nguyen, C. Smth, Meta clusterng, Sxth Int. Conf. on Data Mnng (ICDM'6), 6, pp [] W. Wang, Y. Zhang, On fuzzy cluster valdty ndces, Fuzzy Sets Syst. 58 (9) (7) [3] G. Ball, D. Hall, ISODATA, a novel method of data analyss and pattern classfcaton, Menlo Park, Calf, Stanford Research Insttute, 965. [4] J. Hartgan, Clusterng algorthms, John Wley & Sons, Inc., New York, NY, USA, 975. [5] T. Calnsk, J. Harabasz, A dendrte method for cluster analyss, Commun. Stat. 3 (974) 7. [6] L. Xu, Bayesan yng-yang machne, clusterng and, Pattern Recogn. Lett. 8 (997) [7] Q. Zhao, M. Xu, P. Fränt, Sum-of-square based cluster valdty ndex and sgnfcance analyss, Proc. of the 7th Int. Conf. on Adaptve and Natural Computng Algorthms, 9, pp [8] J. Dunn, Well separated clusters and optmal fuzzy parttons, J. Cybern. 4 (974) [9] D. Daves, D. Bouldn, Cluster separaton measure, IEEE Trans. Pattern Anal. Mach. Intell. () (979) [] X.L. Xe, G. Ben, A valdty measure for fuzzy clusterng, IEEE Trans. Pattern Anal. Mach. Intell. 3 (8) (99) [] M. Halkd, M. Vazrganns, Clusterng valdty assessment: fndng the optmal parttonng of a data set, Proc. of the IEEE Int. Conf. on Data Mnng (ICDM'),, pp [] S. Stll, W. Balek, How many clusters? An nformaton theoretc perspectve, Neural Comput. 6 () (4) [3] D. Pelleg, A. Moore, X-means: extendng K-means wth effcent estmaton of the, Proc. of the 7th Int. Conf. on Machne Learnng,, pp [4] M. Zoub, M. Raw, An effcent approach for computng slhouette coeffcents, J. Comput. Sc. 4 (3) (8) [5] G. Mllgan, M. Cooper, An examnaton of procedures for determnng the n a data set, Psychometrka 5 (985) [6] E. Dmtradou, S. Dolncar, A. Wengassel, An examnaton of ndexes for determnng the n bnary data sets, Psychometrka 67 () () [7] J. Wu, H. Xong, J. Chen, Adaptng the rght measures for k-means clusterng, 5th ACM SIGKDD Conference on Knowledge Dscovery and Data Mnng (KDD'9), 9, pp [8] J. Wu, J. Chen, H. Xong, M. Xe, External valdaton measures for k-means clusterng: a data dstrbuton perspectve, Expert Syst. Appl. 36 (3) (9) [9] J. Handl, J. Knowles, D. Kell, Computatonal cluster valdaton n post-genomc data analyss, Bonformatcs (5) (5) 3 3. [] P. Fränt, J. Kvjärv, Randomsed local search algorthm for the clusterng problem, Pattern. Anal. Applc. 3 (4) () [] P. Fränt, M. Tuononen, O. Vrmajok, Determnstc and randomzed local search algorthms for clusterng, IEEE Int. Conf. on Multmeda and Expo, [] H.L. Captane, C. Frelcot, On selectng an optmal for color mage segmentaton, Int. Conf. on, Pattern Recognton (ICPR'),, pp [3] H. Yan, K. Chen, L. Lu, J. Bae, Determnng the best k for clusterng transactonal datasets: a coverage densty-based approach, Data Knowl. Eng. 68 (9) [4] G. Mecca, S. Raunch, A. Pappalardo, A new algorthm for clusterng search results, Data Knowl. Eng. 6 (7) [5] D. Ingaramo, D. Pnto, P. Rosso, M. Errecalde, Evaluaton of nternal valdty measures n short-text corpora, CICLng'8, [6] X. N, X. Quan, Z. Lu, W. Lu, B. Hua, Short text clusterng by fndng core terms, Knowl. Inf. Syst. 7 (3) () [7] X. Yan, J. Guo, S. Lu, X. Cheng, Y. Wang, Clusterng short text usng Ncut-weghted non-negatve matrx factorzaton, Proc. of the st ACM Intl. Conf. on Informaton and Knowledge Management,, pp [8] P. Shrestha, C. Jacqun, B. Dalle, Clusterng short text and ts evaluaton, Computatonal Lngustcs and Intellgent Text Processng,, pp [9] S. Banerjee, K. Ramanathan, A. Gupta, Clusterng short texts usng Wkpeda, Proc. of the 3th Annual Intl. ACM SIGIR Conf. on Research and Development n Informaton Retreval, 7, pp [3] X. Hu, X. Zhang, C. Lu, E. Park, X.Zhou, Explotng Wkpeda as external knowledge for document clusterng, Proc. of the 5th ACM SIGKDD Intl. Conf. on Knowledge Dscovery and Data Mnng, 9, pp [3] D. Bollegala, Y. Matsuo, M. Ishzuka, Measurng semantc smlarty between words usng web search engnes, Proc. of the 6th Intl. Conf. on World Wde Web, 7, pp [3] C. Chen, F. Tseng, T. Lang, An ntegraton of WordNet and fuzzy assocaton rule mnng for mult-label document clusterng, Data Knowl. Eng. 69 () 8 6. [33] W. Krzanowsk, Y. La, A crteron for determnng the number of groups n a data set usng sum-of-squares clusterng, Bometrcs 44 () (988) [34] Q. Zhao, M. Xu, P. Fränt, Knee Pont Detecton on Bayesan Informaton, Crteron, ICTAI'8, [35] S. Salvador, P. Chan, Determnng the /segments n herarchcal clusterng/segmentaton algorthms, Proc. of the 6th IEEE Int. Conf. on Tools wth Artfcal Intellgence, 4, pp [36] J.J. Jang, D.W. Conrath, Semantc smlarty based on corpus statstcs and lexcal taxonomy, Proc. of Int. Conf. on Research n Computatonal Lngustcs, 997, pp [37] Q. Zhao, M. Rezae, H. Chen, P. Fränt, Keyword clusterng for automatc categorzaton, IEEE Int. Conf. on Pattern Recognton,, pp Qnpe Zhao receved the B.Sc. degree n Automaton Technology from X'dan Unversty, X'an, Chna n 4. She receved the M.Sc. degree n Pattern Recognton and Image Processng n Shangha Jaotong Unversty, Shangha, Chna n 7. She obtaned the Ph.D. degree n Computer Scence at the Unversty of Eastern Fnland n. Her current research s focused on clusterng algorthm and multmeda processng. Pas Fränt receved hs MSc and Ph.D. degrees from the Unversty of Turku, 99 and 994 n Scence. Snce, he has been a professor of Computer Scence at the Unversty of Eastern Fnland. He has publshed 64 journals and 4 peer revew conference papers, ncludng 3 IEEE transacton papers. Hs current research nterests nclude clusterng algorthms and locaton-aware applcatons. He has supervsed 9 Ph.D. theses and s the head of the East Fnland doctoral program n Computer Scence & Engneerng (ECSE).
Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;
Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features
More informationLearning the Kernel Parameters in Kernel Minimum Distance Classifier
Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department
More informationCluster Analysis of Electrical Behavior
Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School
More informationParallelism for Nested Loops with Non-uniform and Flow Dependences
Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr
More informationA New Approach For the Ranking of Fuzzy Sets With Different Heights
New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays
More informationA Binarization Algorithm specialized on Document Images and Photos
A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a
More informationAn Optimal Algorithm for Prufer Codes *
J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,
More informationOutline. Type of Machine Learning. Examples of Application. Unsupervised Learning
Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton
More informationCS 534: Computer Vision Model Fitting
CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust
More informationUnsupervised Learning
Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and
More informationAn Internal Clustering Validation Index for Boolean Data
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 6 Specal ssue wth selecton of extended papers from 6th Internatonal Conference on Logstc, Informatcs and Servce Scence
More informationMachine Learning: Algorithms and Applications
14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of
More informationTsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance
Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for
More informationPerformance Evaluation of Information Retrieval Systems
Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence
More informationHierarchical clustering for gene expression data analysis
Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally
More informationDetermining the Optimal Bandwidth Based on Multi-criterion Fusion
Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn
More informationTN348: Openlab Module - Colocalization
TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages
More informationFeature Reduction and Selection
Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components
More informationA Deflected Grid-based Algorithm for Clustering Analysis
A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan
More informationSkew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach
Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research
More informationA Fast Visual Tracking Algorithm Based on Circle Pixels Matching
A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng
More informationClassifier Selection Based on Data Complexity Measures *
Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.
More informationQuery Clustering Using a Hybrid Query Similarity Measure
Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan
More informationSupport Vector Machines
/9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.
More informationSolving two-person zero-sum game by Matlab
Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by
More informationClustering validation
MOHAMMAD REZAEI Clusterng valdaton Publcatons of the Unversty of Eastern Fnland Dssertatons n Forestry and Natural Scences No 5 Academc Dssertaton To be presented by permsson of the Faculty of Scence and
More informationContent Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers
IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth
More informationUser Authentication Based On Behavioral Mouse Dynamics Biometrics
User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA
More informationThe Research of Support Vector Machine in Agricultural Data Classification
The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou
More informationImprovement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration
Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,
More informationAvailable online at Available online at Advanced in Control Engineering and Information Science
Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced
More informationOptimal Fuzzy Clustering in Overlapping Clusters
46 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 4, October 008 Optmal Fuzzy Clusterng n Overlappng Clusters Ouafa Ammor, Abdelmoname Lachar, Khada Slaou 3, and Noureddne Ras Department
More informationDetermining Fuzzy Sets for Quantitative Attributes in Data Mining Problems
Determnng Fuzzy Sets for Quanttatve Attrbutes n Data Mnng Problems ATTILA GYENESEI Turku Centre for Computer Scence (TUCS) Unversty of Turku, Department of Computer Scence Lemmnkäsenkatu 4A, FIN-5 Turku
More informationUnsupervised Learning and Clustering
Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also
More informationRelated-Mode Attacks on CTR Encryption Mode
Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory
More informationSmoothing Spline ANOVA for variable screening
Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory
More informationProblem Definitions and Evaluation Criteria for Computational Expensive Optimization
Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty
More informationTerm Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task
Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto
More informationPrivate Information Retrieval (PIR)
2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market
More informationSteps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices
Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between
More informationPositive Semi-definite Programming Localization in Wireless Sensor Networks
Postve Sem-defnte Programmng Localzaton n Wreless Sensor etworks Shengdong Xe 1,, Jn Wang, Aqun Hu 1, Yunl Gu, Jang Xu, 1 School of Informaton Scence and Engneerng, Southeast Unversty, 10096, anjng Computer
More informationS1 Note. Basis functions.
S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type
More informationSupport Vector Machines
Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned
More informationA PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION
1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute
More informationNAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics
Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson
More informationSum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints
Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan
More informationType-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data
Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES
More informationUsing internal evaluation measures to validate the quality of diverse stream clustering algorithms
Vetnam J Comput Sc (2017) 4:171 183 DOI 10.1007/s40595-016-0086-9 REGULAR PAPER Usng nternal evaluaton measures to valdate the qualty of dverse stream clusterng algorthms Marwan Hassan 1 Thomas Sedl 2
More informationBioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.
[Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented
More informationAvailable online at ScienceDirect. Procedia Environmental Sciences 26 (2015 )
Avalable onlne at www.scencedrect.com ScenceDrect Proceda Envronmental Scences 26 (2015 ) 109 114 Spatal Statstcs 2015: Emergng Patterns Calbratng a Geographcally Weghted Regresson Model wth Parameter-Specfc
More informationUnsupervised Learning and Clustering
Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned
More informationFEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur
FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents
More informationOutline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1
4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:
More informationHelsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)
Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute
More informationNetwork Intrusion Detection Based on PSO-SVM
TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*
More informationFuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval
Fuzzy -Means Intalzed by Fxed Threshold lusterng for Improvng Image Retreval NAWARA HANSIRI, SIRIPORN SUPRATID,HOM KIMPAN 3 Faculty of Informaton Technology Rangst Unversty Muang-Ake, Paholyotn Road, Patumtan,
More informationDocument Representation and Clustering with WordNet Based Similarity Rough Set Model
IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 5, No 3, September 20 ISSN (Onlne): 694-084 www.ijcsi.org Document Representaton and Clusterng wth WordNet Based Smlarty Rough Set Model
More informationUSING GRAPHING SKILLS
Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp
More informationSLAM Summer School 2006 Practical 2: SLAM using Monocular Vision
SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,
More informationA Robust Method for Estimating the Fundamental Matrix
Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.
More informationCollaboratively Regularized Nearest Points for Set Based Recognition
Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,
More informationOn the Efficiency of Swap-Based Clustering
On the Effcency of Swap-Based Clusterng Pas Fränt and Oll Vrmaok Department of Computer Scence, Unversty of Joensuu, Fnland {frant, ovrma}@cs.oensuu.f Abstract. Random swap-based clusterng s very smple
More informationAn Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem
An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r
More informationClustering algorithms and validity measures
Clusterng algorthms and valdty measures M. Hald, Y. Batstas, M. Vazrganns Department of Informatcs Athens Unversty of Economcs & Busness Emal: {mhal, yanns, mvazrg}@aueb.gr Abstract Clusterng ams at dscoverng
More informationA Clustering Algorithm for Key Frame Extraction Based on Density Peak
Journal of Computer and Communcatons, 2018, 6, 118-128 http://www.scrp.org/ournal/cc ISSN Onlne: 2327-5227 ISSN Prnt: 2327-5219 A Clusterng Algorthm for Key Frame Extracton Based on Densty Peak Hong Zhao
More informationThree supervised learning methods on pen digits character recognition dataset
Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru
More informationMULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION
MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and
More informationFrom Comparing Clusterings to Combining Clusterings
Proceedngs of the Twenty-Thrd AAAI Conference on Artfcal Intellgence (008 From Comparng Clusterngs to Combnng Clusterngs Zhwu Lu and Yuxn Peng and Janguo Xao Insttute of Computer Scence and Technology,
More informationCS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15
CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc
More informationVirtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory
Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process
More informationA Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems
A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty
More informationAn Improved Image Segmentation Algorithm Based on the Otsu Method
3th ACIS Internatonal Conference on Software Engneerng, Artfcal Intellgence, Networkng arallel/dstrbuted Computng An Improved Image Segmentaton Algorthm Based on the Otsu Method Mengxng Huang, enjao Yu,
More informationCMPS 10 Introduction to Computer Science Lecture Notes
CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not
More informationA mathematical programming approach to the analysis, design and scheduling of offshore oilfields
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and
More informationIncremental Learning with Support Vector Machines and Fuzzy Set Theory
The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and
More informationR s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes
SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges
More informationProgramming in Fortran 90 : 2017/2018
Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values
More informationOutline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:
Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A
More informationVirtual Machine Migration based on Trust Measurement of Computer Node
Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on
More informationLoad-Balanced Anycast Routing
Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance
More informationAn Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices
Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal
More informationBAYESIAN MULTI-SOURCE DOMAIN ADAPTATION
BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,
More informationREFRACTIVE INDEX SELECTION FOR POWDER MIXTURES
REFRACTIVE INDEX SELECTION FOR POWDER MIXTURES Laser dffracton s one of the most wdely used methods for partcle sze analyss of mcron and submcron sze powders and dspersons. It s quck and easy and provdes
More informationWeb Document Classification Based on Fuzzy Association
Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu
More informationA Cluster Number Adaptive Fuzzy c-means Algorithm for Image Segmentation
, pp.9-204 http://dx.do.org/0.4257/jsp.203.6.5.7 A Cluster Number Adaptve Fuzzy c-means Algorthm for Image Segmentaton Shaopng Xu, Lngyan Hu, Xaohu Yang and Xaopng Lu,2 School of Informaton Engneerng,
More informationA Fast Content-Based Multimedia Retrieval Technique Using Compressed Data
A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,
More informationLobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide
Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.
More informationParallel matrix-vector multiplication
Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more
More informationK-means and Hierarchical Clustering
Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your
More informationAdaptive Transfer Learning
Adaptve Transfer Learnng Bn Cao, Snno Jaln Pan, Yu Zhang, Dt-Yan Yeung, Qang Yang Hong Kong Unversty of Scence and Technology Clear Water Bay, Kowloon, Hong Kong {caobn,snnopan,zhangyu,dyyeung,qyang}@cse.ust.hk
More informationMachine Learning. Topic 6: Clustering
Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess
More informationAn Image Fusion Approach Based on Segmentation Region
Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua
More informationStudy of Data Stream Clustering Based on Bio-inspired Model
, pp.412-418 http://dx.do.org/10.14257/astl.2014.53.86 Study of Data Stream lusterng Based on Bo-nspred Model Yngme L, Mn L, Jngbo Shao, Gaoyang Wang ollege of omputer Scence and Informaton Engneerng,
More informationPRÉSENTATIONS DE PROJETS
PRÉSENTATIONS DE PROJETS Rex Onlne (V. Atanasu) What s Rex? Rex s an onlne browser for collectons of wrtten documents [1]. Asde ths core functon t has however many other applcatons that make t nterestng
More informationProper Choice of Data Used for the Estimation of Datum Transformation Parameters
Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and
More informationPruning Training Corpus to Speedup Text Classification 1
Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan
More informationResearch on Categorization of Animation Effect Based on Data Mining
MATEC Web of Conferences 22, 0102 0 ( 2015) DOI: 10.1051/ matecconf/ 2015220102 0 C Owned by the authors, publshed by EDP Scences, 2015 Research on Categorzaton of Anmaton Effect Based on Data Mnng Na
More informationObject-Based Techniques for Image Retrieval
54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the
More informationAn Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method
Internatonal Journal of Computatonal and Appled Mathematcs. ISSN 89-4966 Volume, Number (07), pp. 33-4 Research Inda Publcatons http://www.rpublcaton.com An Accurate Evaluaton of Integrals n Convex and
More informationMathematics 256 a course in differential equations for engineering students
Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the
More information