Clustering of Words Based on Relative Contribution for Text Categorization


Jie-Ming Yang, Zhi-Ying Liu, Zhao-Yang Qu

Abstract: Term clustering tries to group words based on a similarity criterion between words, so that the groups can be used as the dimensions of the vector space in text categorization. We propose two new similarity criteria, which consider the relative contribution of a feature to one category relative to the other categories, and the difference between the relative contributions of two features. We used the proposed methods in a hierarchical clustering algorithm and generated a compact and efficient representation of documents. The proposed methods are evaluated on three benchmark corpora (20-Newsgroups, Reuters-21578 and Industry Sector), combined with two classification algorithms (Support Vector Machines, K-Nearest Neighbor), and compared with three similarity measures (weighted average KL divergence, City-block, Euclidean). The experimental results indicate that the performance of the proposed methods is comparable with that of the other methods when Support Vector Machines is used; the proposed methods significantly outperform Euclidean and City-block and achieve comparable performance with weighted average KL divergence when the K-Nearest Neighbor classifier is used.

Index Terms: term clustering, similarity measure, text categorization, relative contribution

I. INTRODUCTION

Automatic document categorization, which assigns predefined categories to a new text document, is an important tool for people to organize the vast amount of digital data on the Internet [1].

This research is supported by the National Natural Science Foundation of China under Grant no. Jie-Ming Yang is with the College of Information Engineering, Northeast Dianli University, Jilin, Jilin, China (e-mail: yjmlzy@gmail.com). Zhi-Ying Liu is with the College of Information Engineering, Northeast Dianli University, Jilin, Jilin, China (e-mail: yjmlzy@163.com). Zhao-Yang Qu is with the College of Information Engineering, Northeast Dianli University, Jilin, Jilin, China (e-mail: qzywww@mail.nedu.edu.cn).
Raw documents cannot be directly fed into a classifier, so they must be transformed into a uniform representation based on their content [2]. Many experiments show that the document representation approach based on the bag of words (BOW) is a well-established one [2]. A major characteristic of text categorization is the high dimensionality of the feature vector space, which can reach tens or hundreds of thousands of terms for even a moderate-sized dataset [3, 4]. This is a big hurdle in applying many sophisticated learning algorithms to text categorization [5]. Other major characteristics of text categorization are the high level of feature redundancy, feature irrelevance [4] and the sparseness problem [6]. These characteristics not only hinder the classification process and hurt the performance of the classifier but also bring about over-fitting. Because of this, dimensionality reduction is used to reduce the size of the feature vector space [2]. Term clustering is one of the dimensionality reduction methods [2, 7]. It can create a new, reduced-size feature space by grouping words with high similarity [1], so that many words can be replaced with the centroid or representative feature of the corresponding word cluster. There are three key benefits of term clustering: (1) features that have correlations with the class labels assigned to documents are merged into a new feature in the reduced-size feature space; (2) term clustering can result in higher classification accuracy; (3) term clustering can provide a good solution to the sparseness problem and generate extremely compact representations [1, 6, 7]. The crucial stage of term clustering is how to measure the similarity of the terms [8]. The measurement of the similarity between two elements is essential to most clustering procedures. Based on the similarity, one can decide which clusters should be combined or where a cluster should be split. McGill [9] listed more than 60 different similarity functions. The quality of clustering depends on whether the similarity metric is appropriate or not.
There are many similarity measures which are widely used in

clustering, such as weighted average KL divergence [1], the Euclidean metric [10, 11], and the City-block metric [12]. However, not all proximity measures are applicable in every environment [13]. Two new similarity measures are proposed for term clustering in this paper. The contribution of a feature to one category is represented by the sum of its term frequencies in that category. The relative contribution of a feature occurring in one category relative to the other categories is considered. If the relative contributions of one feature occurring in each category are the same as those of another feature, there exists the highest similarity between these two features in terms of contribution to categorization. The difference between the relative contributions of two features is regarded as a new similarity measure. To evaluate the proposed methods, we used two classification algorithms, Support Vector Machines (SVM) and K-Nearest Neighbor (KNN), on three benchmark text corpora (20-Newsgroups, Reuters-21578 and Industry Sector) and compared them with three well-known similarity measures (weighted average KL divergence, City-block, Euclidean). The experiments show that the performance of the proposed methods is comparable with that of the others when Support Vector Machines is used; the proposed methods significantly outperform Euclidean and City-block and achieve comparable performance with weighted average KL divergence when the K-Nearest Neighbor classifier is used.

II. RELATED WORK

Much study has been devoted to word clustering for text categorization in recent years. In this section we review the results which are most relevant to our study. Word clustering was first investigated and used in text categorization by Lewis [14]. Lewis used reciprocal nearest neighbor (RNN) clustering for clustering terms. A reciprocal nearest neighbor pair consists of two items, each of which is the nearest neighbor of the other according to the similarity measure. Lewis chose a probabilistic approach to text categorization, and his results were inferior to those obtained by word indexing.
Baker & McCallum [1] introduced distributional clustering to document classification with a Naïve Bayes classifier. Differing from other similarity metrics, distributional clustering calculates the probability distribution over the classes induced by the different words to be clustered. The Kullback-Leibler divergence, which is an information-theoretic measure, was used to measure the difference between two probability distributions. Baker and McCallum found that distributional clustering is better than feature selection with regard to preserving the information contained in redundant features. The experimental results showed that categorization based on word clustering can maintain good performance while keeping a significantly more compact representation.

III. ALGORITHM DESCRIPTION

A. Motivation

Baker and McCallum [1] considered that the probability distribution of a particular word w_t over the classes C can be described as P(C|w_t). When words w_t and w_s are clustered together, the new distribution is the weighted average of the distributions of w_t and w_s. Table I lists the term frequencies of three features in 10 categories and the class probability distributions for these features. The numbers in parentheses are the class probability distributions. The class distribution curves of the three features are shown in Fig 1. The horizontal axis is the list of class labels. The vertical axis indicates the probability of a term over each class. Each curve in Fig 1 can be interpreted as the probability distribution of a feature over the classes. It can be seen from Fig 1 that the shape of the probability distribution of feature 1 is quite similar to that of feature 2 and dissimilar to that of feature 3. Thus feature 1 and feature 2 can be clustered together according to the idea of distributional clustering. The probability distributions of words over the classes are used as the similarity metric in distributional clustering of words [1].
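The merge step described above can be sketched as follows. This is a minimal illustration with hypothetical word probabilities and class distributions; the function names `merged_distribution` and `kl_divergence` are illustrative, not from the original papers.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two class distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log2((p + eps) / (q + eps))))

def merged_distribution(p_wt, p_ws, dist_t, dist_s):
    """Weighted average of two word-class distributions, i.e. the class
    distribution of the cluster formed by merging the two words."""
    w = p_wt + p_ws
    return (p_wt / w) * np.asarray(dist_t) + (p_ws / w) * np.asarray(dist_s)

# Hypothetical word probabilities and class distributions over 3 classes:
dist_t = np.array([0.7, 0.2, 0.1])
dist_s = np.array([0.6, 0.3, 0.1])
merged = merged_distribution(0.02, 0.01, dist_t, dist_s)  # still sums to 1
```

Because the merged distribution is a convex combination of the two word distributions, it remains a valid probability distribution.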
Inspired by the probability distributions of one word over the classes, we believe that the differences in the term frequencies of a feature occurring in the various categories can be regarded as the level of the contribution of the feature to each class in the context of document classification. The term frequencies of a feature occurring in the various categories are points in a two-dimensional space consisting of term frequency and class label. Similar to the class probability distribution curve in distributional clustering, the term frequencies of a feature occurring in the various categories are joined to form a contribution line in the term-frequency-against-class-label plane. The contribution curves of the three features listed in Table I are shown in Fig 2. The horizontal axis has ticks for the list of class labels. The vertical axis

indicates the term frequencies of a feature occurring in the various classes. It can be seen from Fig 2 that the differences among the relative contributions of the three features are equal to zero if their positions in the two-dimensional space are not considered. In respect of document classification, the relative contributions of the three features to each class are identical. Therefore, these features should be clustered together.

TABLE I
THE TERM FREQUENCIES AND THE PROBABILITIES OF THREE FEATURES OVER 10 CATEGORIES.
          C1           C2           C3           C4           C5           C6           C7           C8           C9           C10
Feature 1 115(0.2174)  103(0.1947)  104(0.1966)  200(0.3781)  190(0.3592)  109(0.2060)  114(0.2155)  103(0.1947)  107(0.2023)  101(0.1909)
Feature 2 85(0.2191)   73(0.1881)   74(0.1907)   170(0.4381)  160(0.4124)  79(0.2036)   85(0.2191)   74(0.1907)   75(0.1933)   73(0.1881)
Feature 3 15(0.0605)   3(0.0121)    4(0.0161)    100(0.4032)  90(0.3629)   9(0.0363)    15(0.0605)   4(0.0161)    5(0.0202)    3(0.0121)

Fig 1. The probability distribution curves of features 1, 2 and 3 over the various categories.
Fig 2. The contribution curves of features 1, 2 and 3 for classification.
Fig 3. Two pairs of contribution curves; one pair is similar and the other is dissimilar.

B. Similarity measurement

It is assumed that the contribution curves of two features can be described as two vectors X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_n}; x_i and y_i, i ∈ [1, n], n ≥ 2, are the term frequencies of the two features occurring in category c_i, respectively. We believe that if the distances (d_i = x_i − y_i) between corresponding elements of the two vectors are equal or approximately equal, the relative contributions of the two features to classification should be similar. Fig 3 shows two pairs of contribution curves. d_i, i ∈ [1, 10], is the distance between corresponding elements of the two vectors. The two contribution curves drawn in Fig 3(a) are similar to each other because the distances between corresponding elements of the two vectors are equal; the two curves drawn in Fig 3(b) are not similar to each other because d_4 and d_5 are significantly greater than the others. In this paper, we propose two approaches to measure the similarity between the contribution curves of two features. In the first approach, we use the variance of the differences between corresponding elements of the two vectors to measure the similarity between two contribution curves; this is called Relative Contribution 1. The smaller the variance is, the more similar the relative contributions of the two features are. Specifically, if the variance is equal to zero, the relative contributions of the two features are identical. The similarity between two vectors X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_n} corresponding to two features is measured as follows:

d_1(X, Y) = (1/n) * Σ_{i=1}^{n} ((x_i − y_i) − m)^2,  where m = (1/n) * Σ_{j=1}^{n} (x_j − y_j)

In the second approach, we use the difference between two adjacent elements of a vector as the relative contribution of those elements to categorization; this is called Relative Contribution 2. We first calculate the differences between adjacent elements within each vector, and then calculate the difference between the two vectors.
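The two proposed measures can be sketched as follows, assuming Relative Contribution 1 is the variance of the element-wise differences and Relative Contribution 2 averages the absolute differences between adjacent-element slopes, as described above. The function names are illustrative.

```python
import numpy as np

def relative_contribution_1(x, y):
    """Variance of the element-wise differences between two term-frequency
    vectors; 0 means the relative contributions are identical."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean((d - d.mean()) ** 2))

def relative_contribution_2(x, y):
    """Mean absolute difference between the adjacent-element differences
    (slopes) of two term-frequency vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.abs(np.diff(x) - np.diff(y))))

# Two curves that differ only by a constant vertical shift have
# distance 0 under both measures:
x = np.array([115.0, 103.0, 104.0, 200.0, 190.0])
y = x - 30.0
```

Note that both measures are invariant to a constant shift of one curve, which matches the paper's intuition that only the shape of the contribution curve matters, not its absolute position.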
The similarity between two vectors X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_n} corresponding to two features is measured as follows:

d_2(X, Y) = (1/(n−1)) * Σ_{i=1}^{n−1} |(x_{i+1} − x_i) − (y_{i+1} − y_i)|

C. Document representation

After all terms are grouped into K clusters, every document in the training set and test set is represented as a vector in which the words occurring in the document are mapped into the K clusters. For instance, Table II lists the words and their term frequencies in one document, and Table III indicates 10 clusters created from them. In our experiment, the document is represented as a vector whose element values are the sums of the term frequencies of the terms included in the same cluster. Thus, the document listed in Table II can be represented by {6, 13, 6, 1, 7, 2, 1, 2, 3, 2}.

IV. EXPERIMENTAL SETUP

A. Datasets and preprocessing

The document data that is to be used for term clustering is first converted into a term list. In the experiment, stop-words are removed and a stemmer program reduces the different forms of the same word root [15]. We used a stop-word list which contains 571 words. For the stemmer program, we used Porter's stemming algorithm. The details are as follows:
1. Identify all words in the training set and make a list of terms.
2. Remove stop-words and apply the stemming algorithm to each word in the list.
3. Build a matrix M: each column (j) is a category in the training set, and each row (i) is a term in the list. M(i, j) is the sum of the term frequencies of the i-th term occurring in the j-th category.

In order to evaluate the performance of the proposed methods, three benchmark datasets - 20-Newsgroups, Reuters-21578 and Industry Sector - were used in our study. The 20-Newsgroups corpus was collected by Ken Lang (1995) and has become one of the standard corpora for text categorization. It contains newsgroup postings, evenly assigned to 20 different UseNet groups. In our study, we only consider three categories: talk.politics.guns, talk.politics.mideast and talk.politics.misc.
We ignore the UseNet header and only consider the content of the document when tokenizing the document.
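The preprocessing steps above can be sketched as follows. The stop-word set and the suffix-stripping `stem` below are simplified stand-ins for the 571-word list and the Porter stemmer used in the paper.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "of", "and", "to"}  # stand-in for the 571-word list

def stem(word):
    """Placeholder for the Porter stemmer assumed by the paper."""
    return word[:-1] if word.endswith("s") else word

def build_term_category_matrix(docs):
    """docs: list of (category, text) pairs. Returns a nested mapping where
    M[term][category] is the summed term frequency of the term in that
    category, i.e. the matrix M(i, j) of preprocessing step 3."""
    M = defaultdict(Counter)
    for category, text in docs:
        for token in text.lower().split():
            if token in STOP_WORDS:
                continue
            M[stem(token)][category] += 1
    return M

# Two tiny hypothetical documents:
docs = [("grain", "wheat exports and wheat prices"),
        ("crude", "oil prices of crude shipments")]
M = build_term_category_matrix(docs)
```

Each row of M is exactly the term-frequency-per-category vector on which the relative-contribution measures operate.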

TABLE II
LIST OF WORDS AND THEIR TERM FREQUENCIES IN ONE DOCUMENT.
word        number   word         number   word      number
hong        2        share        3        own       1
kong        2        total        1        fund      1
firm        2        outstanding  1        publicly  1
stake       3        common       2        held      1
washington  1        stock        2        zealand   1
industrial  2        filing       1        company   1
equity      2        security     1        bought    2
pacific     1        exchange     1        disclose  1
investment  2        commission   1        earlier   1
raise       1        principally  1        month     1

TABLE III
LIST OF 10 CLUSTERS CREATED FROM THE WORDS OCCURRING IN TABLE II. THE NUMBER IN PARENTHESES IS THE SUM OF THE TERM FREQUENCIES OF THE WORDS IN THE SAME CLUSTER.
C-1(6)  C-2(13)   C-3(6)   C-4(1)       C-5(7)   C-6(2)      C-7(1)  C-8(2)      C-9(3)   C-10(2)
hong    stake     company  outstanding  common   month       total   disclose    held     principally
kong    exchange  own      publicly     earlier  security    raise   commission  zealand  investment
firm    pacific   filing   washington   fund     industrial  share   stock       bought   equity

The Reuters-21578 corpus contains stories taken from the Reuters newswire. The stories are non-uniformly divided into 135 categories. In this paper, we only consider the top 10 categories: Earn, Acquisition, Money-fx, Grain, Crude, Trade, Interest, Wheat, Ship and Corn. There are 9982 documents in the top 10 categories. The Industry Sector corpus, made available by Market Guide Inc., consists of company web pages classified in a hierarchy of industry sectors [16]. There are 6440 web pages attached to 12 root classes and then hierarchically partitioned into 71 classes. In this paper, we only consider 7 root classes: capital.goods.sector, conglomerates.industry, consumer.non-cyclical.sector, energy.sector, healthcare.sector, transportation.sector and utilities.sector.

B. Clustering algorithm

Clustering [17, 18], a method of unsupervised learning, is a common technique for statistical data analysis applied in many fields, such as machine learning, data mining, pattern recognition and information processing. There exist many different approaches to clustering, such as Hierarchical Clustering, Nearest Neighbor Clustering, K-means Clustering [19], Expectation Maximization, and so on.
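The cluster-based document representation of Section III-C can be sketched as follows. The word-to-cluster assignment here is a small hypothetical example, not the assignment of Table III.

```python
from collections import Counter

def document_vector(tokens, word_to_cluster, k):
    """Represent a document as a k-dimensional vector whose j-th element is
    the summed term frequency of the document's words assigned to cluster j.
    Words outside the clustered vocabulary are ignored."""
    vec = [0] * k
    for word, freq in Counter(tokens).items():
        if word in word_to_cluster:
            vec[word_to_cluster[word]] += freq
    return vec

# Hypothetical assignment of a few words to 3 clusters:
word_to_cluster = {"hong": 0, "kong": 0, "stake": 1, "share": 1, "company": 2}
tokens = ["hong", "kong", "hong", "stake", "share", "share", "company"]
vec = document_vector(tokens, word_to_cluster, 3)  # [3, 3, 1]
```

This is the mapping that turns the 30 distinct words of Table II into the 10-dimensional vector {6, 13, 6, 1, 7, 2, 1, 2, 3, 2}.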
The partitions generated by hierarchical clustering algorithms are more versatile than those generated by other clustering algorithms [20]. In cluster-based document retrieval, hierarchical clustering algorithms have performed better than other clustering algorithms [20]. The results of hierarchical clustering are usually presented in a dendrogram, which represents the nested grouping of the data and the similarity levels at which groupings change [20]. In hierarchical clustering, the number of clusters need not be known in advance and can be determined from the dendrogram according to the actual requirements [21]. The strategies of hierarchical clustering generally fall into two types. One is agglomerative hierarchical clustering, which begins with each element as a separate cluster and merges them into

larger clusters. The other is divisive hierarchical clustering, which begins with the whole set and proceeds to divide it into smaller clusters. In this paper, the agglomerative hierarchical clustering algorithm developed by Hoon, et al. [22] was adopted.

C. Similarity Measure

The similarity measures which are compared with the proposed methods are detailed in this section. The KL divergence to the mean, which is the average of the KL divergences of each distribution, is used in distributional clustering [1]. Baker and McCallum used a weighted average instead of the simple average. The weighted average KL divergence is defined as follows:

d(w_t, w_s) = P(w_t) * D(P(C|w_t) || P(C|w_t ∨ w_s)) + P(w_s) * D(P(C|w_s) || P(C|w_t ∨ w_s))

where P(w_t) and P(w_s) are the probabilities of the words w_t and w_s occurring in the training set, respectively; P(C|w_t) and P(C|w_s) are the contributions of the words w_t and w_s to classification, respectively; and P(C|w_t ∨ w_s) is the contribution to classification of the cluster in which the words w_t and w_s are combined. D(P(C|w_t) || P(C|w_t ∨ w_s)) is the measure of the inefficiency that occurs when messages are sent according to one distribution, P(C|w_t), but encoded with a code that is optimal for a different distribution, P(C|w_t ∨ w_s).

The City-block metric, also known as the Manhattan distance, is the sum of the distances along each element of the vectors X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_n}:

d(X, Y) = Σ_{i=1}^{n} |x_i − y_i|

The Euclidean metric is one of the most common types of distance. It is the geometric distance in a multidimensional space. The Euclidean distance between two points X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_n} in n-dimensional space can be computed as follows:

d(X, Y) = sqrt(Σ_{i=1}^{n} (x_i − y_i)^2)

D. Classifiers

Many classifiers have been used in text categorization in recent years, such as Naïve Bayes (NB), K-Nearest Neighbor (KNN), Support Vector Machines (SVM) [23], Rocchio, and so on. Compared with the state-of-the-art methods, Support Vector Machines is a highly efficient classifier in text categorization. There is much empirical support for using Support Vector Machines for text categorization [6, 24, 25].
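The three baseline measures can be sketched as follows. The function names are illustrative, and the merged distribution P(C|w_t ∨ w_s) is taken as the probability-weighted average of the two word distributions, as described in Section III-A.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """D(p || q) in bits."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log2((p + eps) / (q + eps))))

def weighted_average_kld(p_wt, p_ws, dist_t, dist_s):
    """Weighted average KL divergence between two words' class distributions
    and the class distribution of their merged cluster."""
    dist_t, dist_s = np.asarray(dist_t, float), np.asarray(dist_s, float)
    merged = (p_wt * dist_t + p_ws * dist_s) / (p_wt + p_ws)
    return p_wt * kl(dist_t, merged) + p_ws * kl(dist_s, merged)

def city_block(x, y):
    """Manhattan distance between two term-frequency vectors."""
    return float(np.sum(np.abs(np.asarray(x, float) - np.asarray(y, float))))

def euclidean(x, y):
    """Euclidean distance between two term-frequency vectors."""
    return float(np.sqrt(np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)))
```

Note that the weighted average KL divergence operates on class-conditional probability distributions, while City-block and Euclidean operate directly on the raw term-frequency vectors.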
Published results show that KNN is quite effective and that its performance is comparable to that obtained by SVM [2]. In this paper, we use SVM and KNN to compare the performance of the various clustering methods. Support Vector Machines is based on the structural risk minimization principle from computational learning theory; it was originally developed by Drucker, et al. [26] and applied to text categorization by Joachims [3]. Since Joachims [3] argued that most text categorization problems are linearly separable, the linear kernel for SVM is selected. In this study, we use the LIBSVM toolkit [27]. C-SVM [28, 29] is selected and the penalty parameter C is 1. KNN [30, 31] is a simple machine learning algorithm that makes its decision depending on the majority category labels attached to the k training documents which are most similar to the test object; it is a type of instance-based classifier or lazy learner, since no decision is made until all the objects in the training set are scanned [2]. We used k = 30 in this experiment, and the cosine distance was used as the measure of document similarity.

E. Evaluations

Classification effectiveness in text categorization is usually measured in terms of precision (P) and recall (R) [2], which were originally defined for binary classification [24]. To compute averaged estimates in the multiclass classification context, the micro-averaging and macro-averaging methods are used [25]. The micro-averaging measures are computed as follows:

P_micro = Σ_{i=1}^{|C|} TP_i / Σ_{i=1}^{|C|} (TP_i + FP_i)

R_micro = Σ_{i=1}^{|C|} TP_i / Σ_{i=1}^{|C|} (TP_i + FN_i)

F1_micro = 2 * P_micro * R_micro / (P_micro + R_micro)
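Given per-category (TP, FP, FN) counts, the two averaged F1 scores can be computed as in the following sketch. The function name is illustrative; macro averaging takes the mean of the per-category precisions and recalls.

```python
def micro_macro_f1(confusions):
    """confusions: list of (TP, FP, FN) tuples, one per category.
    Returns (micro_F1, macro_F1)."""
    # Micro: pool the counts over all categories, then compute P, R, F1.
    tp = sum(c[0] for c in confusions)
    fp = sum(c[1] for c in confusions)
    fn = sum(c[2] for c in confusions)
    p_micro = tp / (tp + fp)
    r_micro = tp / (tp + fn)
    micro = 2 * p_micro * r_micro / (p_micro + r_micro)

    # Macro: average per-category precision and recall, then compute F1.
    ps = [t / (t + f) if t + f else 0.0 for t, f, _ in confusions]
    rs = [t / (t + n) if t + n else 0.0 for t, _, n in confusions]
    p_macro = sum(ps) / len(confusions)
    r_macro = sum(rs) / len(confusions)
    macro = 2 * p_macro * r_macro / (p_macro + r_macro)
    return micro, macro

micro_f1, macro_f1 = micro_macro_f1([(80, 10, 5), (40, 20, 30)])
```

Micro averaging weights every document equally, so it is dominated by the large categories; macro averaging weights every category equally.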

The macro-averaging measures are computed as follows:

P_macro = (1/|C|) * Σ_{i=1}^{|C|} P_i

R_macro = (1/|C|) * Σ_{i=1}^{|C|} R_i

F1_macro = 2 * P_macro * R_macro / (P_macro + R_macro)

where TP_i is the number of documents which are correctly classified into category c_i; FP_i is the number of documents which are misclassified into category c_i; FN_i is the number of documents which belong to category c_i but are misclassified into other categories; and |C| is the number of categories. The accuracy, which is defined as the percentage of correctly labeled documents in the test set, is also widely used in text categorization [6, 15, 32-34]. The formula for accuracy is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

F. Validation

In order to validate the text representation method based on the word clusters created by the proposed methods, 5x2-fold cross validation [35] is used in this paper. We perform 5 replications of 2-fold cross validation. In each replication, the documents in the corpus are randomly partitioned into two equal-sized subsets (A and B). The learning algorithm is trained on subset A and tested on subset B, and then trained on subset B and tested on subset A. Thus 10 performance estimates are produced by 5x2-fold cross validation.

V. RESULTS

A. Results on 20-Newsgroups corpus

Table IV and Table V show the micro F1 measures of Support Vector Machines and K-Nearest Neighbor when the five similarity measures are used in the hierarchical clustering algorithm on 20-Newsgroups. It can be seen from Table IV that the proposed methods outperform the other similarity measures in terms of micro F1 when the number of clusters is 100, 200 or 300. Table V shows that the performance of Relative Contribution 1 is superior to that of the other similarity measures when the number of clusters is 800, 900 or 1000. The macro F1 measures of Support Vector Machines and K-Nearest Neighbor when the five similarity measures are used in the hierarchical clustering algorithm on 20-Newsgroups are listed in Table VI and Table VII.
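The 5x2-fold cross-validation protocol of Section IV-F can be sketched as follows. `train_and_test` is a caller-supplied function returning one performance estimate; the name is illustrative.

```python
import random

def five_by_two_cv(documents, train_and_test, seed=0):
    """5 replications of 2-fold cross validation: in each replication the
    corpus is randomly split into two equal halves; the learner is trained
    on A and tested on B, then trained on B and tested on A.
    Returns the list of 10 performance estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(5):
        docs = documents[:]
        rng.shuffle(docs)
        half = len(docs) // 2
        a, b = docs[:half], docs[half:]
        estimates.append(train_and_test(a, b))  # train on A, test on B
        estimates.append(train_and_test(b, a))  # train on B, test on A
    return estimates

# Dummy learner that just reports the size of its test fold:
estimates = five_by_two_cv(list(range(10)), lambda train, test: len(test))
```

The 10 estimates can then be averaged, or fed into Dietterich's 5x2cv significance test [35].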
It can be seen from Table VI that the proposed methods outperform the other similarity measures in terms of macro F1 when the number of clusters is 100, 200 or 300. Table VII shows that the performance of Relative Contribution 1 is superior to that of City-block and Euclidean except when the number of clusters is 500 or 600. Moreover, the performance of Relative Contribution 1 is superior to that of WeightedKLD when the number of clusters is 800, 900 or 1000. Fig 4 shows the accuracy curves of SVM and KNN when the five similarity measures are used on 20-Newsgroups. It can be seen from Fig 4(a) that the proposed methods achieve better performance with fewer clusters. Fig 4(b) indicates that the accuracy of the proposed methods is superior to that of the other similarity measures when the number of clusters is greater than 700.

B. Results on Reuters-21578 corpus

Table VIII and Table IX show the micro F1 measures of Support Vector Machines and K-Nearest Neighbor when the five similarity measures are used in the hierarchical clustering

TABLE IV
THE MICRO F1 OF SUPPORT VECTOR MACHINES BASED ON 20-NEWSGROUPS. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

algorithm on Reuters-21578. Table X and Table XI show the macro F1 measures of Support Vector Machines and K-Nearest Neighbor when the five similarity measures are used in the hierarchical clustering algorithm on Reuters-21578. Fig 5 draws the accuracy curves of SVM and KNN when the five similarity measures are used in hierarchical clustering on Reuters-21578. It can be seen from Tables VIII-XI and Fig 5 that the performance of the proposed measures is inferior only to that of WeightedKLD and superior to that of City-block and Euclidean.

TABLE V
THE MICRO F1 OF K-NEAREST NEIGHBOR BASED ON 20-NEWSGROUPS. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

TABLE VI
THE MACRO F1 OF SUPPORT VECTOR MACHINES BASED ON 20-NEWSGROUPS. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

TABLE VII
THE MACRO F1 OF K-NEAREST NEIGHBOR BASED ON 20-NEWSGROUPS. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

Fig 4. The accuracy curves of Support Vector Machines and K-Nearest Neighbor on 20-Newsgroups.

TABLE VIII
THE MICRO F1 OF SUPPORT VECTOR MACHINES BASED ON REUTERS-21578. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

TABLE IX
THE MICRO F1 OF K-NEAREST NEIGHBOR BASED ON REUTERS-21578. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

TABLE X
THE MACRO F1 OF SUPPORT VECTOR MACHINES BASED ON REUTERS-21578. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

TABLE XI
THE MACRO F1 OF K-NEAREST NEIGHBOR BASED ON REUTERS-21578. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

Fig 5. The accuracy curves of Support Vector Machines and K-Nearest Neighbor on Reuters-21578.

C. Results on Industry Sector corpus

Table XII and Table XIII show the micro F1 measures of Support Vector Machines and K-Nearest Neighbor when the five similarity measures are used in the hierarchical clustering algorithm on the Industry Sector corpus. Table XIV and Table XV show the macro F1 measures of Support Vector Machines and K-Nearest Neighbor when the five similarity measures are used in the hierarchical clustering algorithm on the Industry Sector corpus. It can be seen from Table XII and Table XIV that the performance of SVM combined with Relative Contribution 2 is superior to that of the other similarity measures when the number of clusters is 900 or 1000. Table XIII and Table XV indicate that the performance of KNN combined with Relative Contribution 1 is superior to that of the other similarity measures when the number of clusters is 800 or 900. Fig 6 shows the accuracy of SVM and KNN when the five similarity measures are used in hierarchical clustering on the Industry Sector corpus. Fig 6(a) indicates that the accuracy curve of SVM combined with the proposed method is higher than those of the other methods when the number of clusters is 600, 700 or 800. Fig 6(b) shows that the accuracy curve of KNN combined with Relative Contribution 1 is higher than those of the other methods when the number of clusters is 700, 800 or 900.

TABLE XII
THE MICRO F1 OF SUPPORT VECTOR MACHINES BASED ON INDUSTRY-SECTOR. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

TABLE XIII
THE MICRO F1 OF K-NEAREST NEIGHBOR BASED ON INDUSTRY-SECTOR. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

TABLE XIV
THE MACRO F1 OF SUPPORT VECTOR MACHINES BASED ON INDUSTRY-SECTOR. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

TABLE XV
THE MACRO F1 OF K-NEAREST NEIGHBOR BASED ON INDUSTRY-SECTOR. Rows: Relative Contribution 1, Relative Contribution 2, WeightedKLD, Cityblock, Euclidean.

Fig 6. The accuracy curves of Support Vector Machines and K-Nearest Neighbor on Industry-Sector.

VI. ANALYSIS AND DISCUSSION

Evaluating clustering results to find the partitioning that best fits the underlying data is one of the most important issues in cluster analysis [36, 37]. Many cluster validation indices have been proposed in the literature to validate the quality of clusters [38]. Davies and Bouldin [39] presented a validation index which can infer the appropriateness of various divisions of the data. The Davies-Bouldin index has a low computational complexity, so we chose it to validate the quality of the clusters produced by the proposed methods. Its formula is as follows:

DB = (1/M) * Σ_{i=1}^{M} max_{j=1,...,M; j≠i} ((σ_i + σ_j) / d(c_i, c_j))

where M is the number of clusters; σ_i (σ_j) is the average distance of all elements in cluster i (j) to their cluster center c_i (c_j); and d(c_i, c_j) is the distance between the cluster centers c_i and c_j. The range of DB is [0, ∞), and a lower value indicates a better clustering [38]. Table XVI lists the Davies-Bouldin indices of the clusters when the various similarity measures are used and all words in the feature vector space are clustered into k clusters, where k is equal to 100, 200, 300, 400, 500, 600, 700, 800, 900 or 1000. Because their values are too large to display, the Davies-Bouldin indices of the clusters generated by City-block and Euclidean are not listed in Table XVI. It can be seen from Table XVI that the quality of the clusters generated by the proposed methods is the best. However, there exists a problem: the quality of a term clustering is not always consistent with the performance of the classifiers based on it. We ran our experiments on an Intel Core2 Q6600 PC with 2 GB RAM under Windows XP. Due to the limitation of memory, the size of the feature space had to be less than 25000; otherwise, the clustering software reported that the memory was insufficient for clustering.
In our experiments, we chose a part of each corpus (20-Newsgroups, Reuters-21578 and Industry Sector) to ensure that the size of the feature space was less than 25000. Clustering based on the proposed methods ran for about one hour.

TABLE XVI
THE DAVIES-BOULDIN INDICES OF THE CLUSTERS GENERATED BY VARIOUS SIMILARITY MEASURES ON THREE TEXT CORPORA. Rows: 20-Newsgroups (Relative Contribution, weightedKLD), Reuters-21578 (Relative Contribution, weightedKLD), Industry Sector (Relative Contribution, weightedKLD).
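The Davies-Bouldin index defined above can be computed as in the following sketch, assuming Euclidean distances for both the within-cluster scatter σ and the center separation d(c_i, c_j).

```python
import numpy as np

def davies_bouldin(points, labels):
    """Davies-Bouldin index of a clustering; lower values indicate better
    clustering. points: (n, d) array; labels: one cluster id per point."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    # Cluster centers and average within-cluster distances (sigma).
    centers = np.array([points[labels == i].mean(axis=0) for i in ids])
    sigma = np.array([
        np.mean(np.linalg.norm(points[labels == i] - centers[k], axis=1))
        for k, i in enumerate(ids)])
    m = len(ids)
    total = 0.0
    for i in range(m):
        # Worst-case (largest) scatter-to-separation ratio for cluster i.
        ratios = [(sigma[i] + sigma[j]) / np.linalg.norm(centers[i] - centers[j])
                  for j in range(m) if j != i]
        total += max(ratios)
    return total / m

# Two tight, well-separated clusters give a small index:
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
db = davies_bouldin(points, [0, 0, 1, 1])  # 0.1
```

In the paper's setting the "points" are the term vectors (rows of the matrix M) and the labels are the cluster assignments produced by the hierarchical clustering.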

VII. CONCLUSION

We proposed two new similarity measures which use the relative contribution of a feature to the categories as the measurement criterion. The proposed similarity metrics are used in term clustering for text categorization and can reduce the feature space by one to three orders of magnitude while losing only a few percent in classification performance. We evaluated the proposed methods on three benchmark corpora (20-Newsgroups, Reuters-21578 and Industry Sector), using two classification algorithms (Support Vector Machines, K-Nearest Neighbor), and compared them with three well-known similarity measures (weighted average KL divergence, City-block, Euclidean). The results show that the performance of the proposed methods is comparable with that of the other methods when Support Vector Machines is used; the proposed methods significantly outperform Euclidean and City-block, and achieve comparable performance with weighted average KL divergence when the K-Nearest Neighbor classifier is used. Moreover, the quality of the clusters generated by the proposed methods is the best. In the future, the relationship between the quality of a cluster and the performance of classifiers based on the cluster is a research focus for us.

REFERENCES

[1] L. D. Baker and A. K. McCallum, "Distributional clustering of words for text classification," in Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia.
[2] F. Sebastiani, "Machine learning in automated text categorization," ACM Computing Surveys, vol. 34, pp. 1-47.
[3] T. Joachims, "Text categorization with Support Vector Machines: learning with many relevant features," in Machine Learning: ECML-98, vol. 1398, C. Nédellec and C. Rouveirol, Eds., Springer Berlin / Heidelberg, 1998.
[4] C. Lee and G. G. Lee, "Information gain and divergence-based feature selection for machine learning-based text categorization," Information Processing & Management, vol. 42.
[5] D.
Fragoudis, et al., "Best terms: an efficient feature-selection algorithm for text categorization," Knowledge and Information Systems, vol. 8.
[6] R. Bekkerman, et al., "Distributional word clusters vs. words for text categorization," J. Mach. Learn. Res., vol. 3.
[7] A.-M. Hisham, "A new text categorization technique using distributional clustering and learning logic," IEEE Transactions on Knowledge and Data Engineering, vol. 18.
[8] N. Slonim and N. Tishby, "The power of word clusters for text classification," in Proceedings of ECIR-01, 23rd European Colloquium on Information Retrieval Research, Darmstadt, Germany.
[9] M. McGill, "An evaluation of factors affecting document ranking by information retrieval systems," Technical report, Syracuse University School of Information Studies.
[10] S. Ramaswamy and K. Rose, "Adaptive cluster distance bounding for high-dimensional indexing," IEEE Transactions on Knowledge and Data Engineering, vol. 23.
[11] P.-E. Danielsson, "Euclidean distance mapping," Computer Graphics and Image Processing, vol. 14.
[12] R. M. C. R. de Souza and F. d. A. T. de Carvalho, "Clustering of interval data based on city-block distances," Pattern Recognition Letters, vol. 25.
[13] J. D'Hondt, et al., "Pairwise-adaptive dissimilarity measure for document clustering," Information Sciences, vol. 180.
[14] D. D. Lewis, "An evaluation of phrasal and clustered representations on a text categorization task," in Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Copenhagen, Denmark.
[15] E. Youn and M. K. Jeong, "Class dependent feature scaling method using naive Bayes classifier for text datamining," Pattern Recognition Letters, vol. 30.
[16] A. McCallum, et al., "Improving text classification by shrinkage in a hierarchy of classes," in ICML-98.
[17] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31.
[18] Y. Zhao and G. Karypis, "Data clustering in life sciences," Molecular Biotechnology, vol. 31.
[19] J. Y.
B. Tan, et al., "Tme Seres Predcton usng Backpropagaton Network Optmzed by Hybrd K-means-Greedy Algorthm," Engneerng Letters, vol. 20, pp , [20] A. K. Jan, et al., "Data clusterng: a revew," ACM Comput. Surv., vol. 31, pp , [21] Y. Mn-Shen and W. Kuo-Lung, "A smlarty-based robust clusterng method," IEEE Transactons on Pattern Analyss and Machne Intellgence,, vol. 26, pp , [22] M. J. L. d. Hoon, et al., "Open source clusterng software," Bonformatcs, vol. 20, pp [23] I. Nurtano, et al., "Classfyng Cyst and Tumor Leson Usng Support Vector Machne Based on Dental Panoramc Images Texture Features," IAENG Internatonal Journal of Computer Scence, vol. 40, pp , [24] T. Joachms, Learnng to Classfy Text Usng Support Vector

13 Machnes: Methods, Theory and Algorthms: Kluwer Academc Publshers [25] H. Ogura, et al., "Feature selecton wth a measure of devatons from Posson n text categorzaton," Expert Systems wth Applcatons, vol. 36, pp , [26] H. Drucker, et al., "Support Vector Machnes for Spam Categorzaton," IEEE TRANSACTIONS ON NEURAL NETWORKS, vol. 10, pp , [27] C.-C. Chang and C.-J. Ln. (2001). LIBSVM : a lbrary for support vector machnes Avalable: [28] C.-C. Chang and C.-J. Ln, "Tranng v-support Vector Classfers: Theory and Algorthms," Neural Computaton, vol. 13, pp , [29] M. A. Davenport, et al., "Controllng False Alarms wth Support Vector Machnes," presented at the IEEE Internatonal Conference on Acoustcs, Speech, and Sgnal Processng (ICASSP), [30] T. Cover and P. Hart, "Nearest neghbor pattern classfcaton," IEEE Transactons on Informaton Theory, vol. 13, pp , [31] E. Uchno, et al., "IVUS-Based Coronary Plaque Tssue Characterzaton Usng Weghted Multple k-nearest Neghbor," Engneerng Letters, vol. 20, pp , [32] Y. H. L and A. K. Jan, "Classfcaton of Text Documents," The Computer Journal, vol. 41, pp , January 1, [33] M. Benkhalfa, et al., "Integratng External Knowledge to Supplement Tranng Data n Sem-Supervsed Learnng for Text Categorzaton," Informaton Retreval, vol. 4, pp , [34] A. Markov, et al., "The hybrd representaton model for web document classfcaton," Internatonal Journal of Intellgent Systems, vol. 23, pp , [35] T. G. Detterch, "Approxmate Sratstcal Tests for Comparng Supervsed Classfcaton Learnng Algorthms," Neural Computaton, vol. 10, pp , [36] M. Halkd, et al., "On Clusterng Valdaton Technques," Journal of Intellgent Informaton Systems, vol. 17, pp , [37] H. Alzadeh, et al., "A Framework for Cluster Ensemble Based on a Max Metrc as Cluster Evaluator," IAENG Internatonal Journal of Computer Scence, vol. 39, pp , [38] S. Günter and H. Bunke, "Valdaton ndces for graph clusterng," Pattern Recognton Letters, vol. 24, pp , [39] D. L. Daves and D. W. 
Bouldn, "A Cluster Separaton Measure," IEEE Transactons on Pattern Analyss and Machne Intellgence, vol. 1, pp , Je-Mng Yang receved hs MSc and PhD degree n computer appled technology from Northeast DanL Unversty, Chna n 2008 and Computer Scence and Technology from Jln Unversty, Chna n 2013, respectvely. Hs research nterests are n machne learnng and data mnng. Zh-Yng Lu receved her MSc degree n computer appled technology from Northeast DanL Unversty, Chna n Hs research nterests are n nformaton retreval and personalzed recommendaton. Zhao-Yang Qu receved hs MSc and PhD degree n computer scence from Dalan Unversty of Technology, Chna n 1988 and North Chna Electrc Power Unversty, Chna n 2012, respectvely. Hs research nterests are n artfcal ntellgence, machne learnng and data mnng.


12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Audio Content Classification Method Research Based on Two-step Strategy

Audio Content Classification Method Research Based on Two-step Strategy (IJACSA) Internatonal Journal of Advanced Computer Scence and Applcatons, Audo Content Classfcaton Method Research Based on Two-step Strategy Sume Lang Department of Computer Scence and Technology Chongqng

More information

A Lazy Ensemble Learning Method to Classification

A Lazy Ensemble Learning Method to Classification IJCSI Internatonal Journal of Computer Scence Issues, Vol. 7, Issue 5, September 2010 ISSN (Onlne): 1694-0814 344 A Lazy Ensemble Learnng Method to Classfcaton Haleh Homayoun 1, Sattar Hashem 2 and Al

More information

Correlative features for the classification of textural images

Correlative features for the classification of textural images Correlatve features for the classfcaton of textural mages M A Turkova 1 and A V Gadel 1, 1 Samara Natonal Research Unversty, Moskovskoe Shosse 34, Samara, Russa, 443086 Image Processng Systems Insttute

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Yan et al. / J Zhejiang Univ-Sci C (Comput & Electron) in press 1. Improving Naive Bayes classifier by dividing its decision regions *

Yan et al. / J Zhejiang Univ-Sci C (Comput & Electron) in press 1. Improving Naive Bayes classifier by dividing its decision regions * Yan et al. / J Zhejang Unv-Sc C (Comput & Electron) n press 1 Journal of Zhejang Unversty-SCIENCE C (Computers & Electroncs) ISSN 1869-1951 (Prnt); ISSN 1869-196X (Onlne) www.zju.edu.cn/jzus; www.sprngerlnk.com

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

Spam Filtering Based on Support Vector Machines with Taguchi Method for Parameter Selection

Spam Filtering Based on Support Vector Machines with Taguchi Method for Parameter Selection E-mal Spam Flterng Based on Support Vector Machnes wth Taguch Method for Parameter Selecton We-Chh Hsu, Tsan-Yng Yu E-mal Spam Flterng Based on Support Vector Machnes wth Taguch Method for Parameter Selecton

More information

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Impact of a New Attribute Extraction Algorithm on Web Page Classification Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty

More information

Visual Thesaurus for Color Image Retrieval using Self-Organizing Maps

Visual Thesaurus for Color Image Retrieval using Self-Organizing Maps Vsual Thesaurus for Color Image Retreval usng Self-Organzng Maps Chrstopher C. Yang and Mlo K. Yp Department of System Engneerng and Engneerng Management The Chnese Unversty of Hong Kong, Hong Kong ABSTRACT

More information

A fast algorithm for color image segmentation

A fast algorithm for color image segmentation Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au

More information

Comparison Study of Textural Descriptors for Training Neural Network Classifiers

Comparison Study of Textural Descriptors for Training Neural Network Classifiers Comparson Study of Textural Descrptors for Tranng Neural Network Classfers G.D. MAGOULAS (1) S.A. KARKANIS (1) D.A. KARRAS () and M.N. VRAHATIS (3) (1) Department of Informatcs Unversty of Athens GR-157.84

More information