A Novel Term_Class Relevance Measure for Text Categorization

D S Guru, Mahamad Suhil
Department of Studies in Computer Science, University of Mysore, Mysore, India

Abstract: In this paper, we introduce a new measure called Term_Class relevance to compute the relevancy of a term in classifying a document into a particular class. The proposed measure estimates the degree of relevance of a given term, in placing an unlabeled document as a member of a known class, as a product of Class_Term weight and Class_Term density. The Class_Term weight is the ratio of the number of documents of the class containing the term to the total number of documents containing the term, and the Class_Term density is the relative density of occurrence of the term in the class with respect to the total occurrence of the term in the entire population. Unlike other existing term weighting schemes such as TF-IDF and its variants, the proposed relevance measure takes into account the degree of relative participation of the term across all documents of the class with respect to the entire population. To demonstrate the significance of the proposed measure, experimentation has been conducted on the 20 Newsgroups dataset. Further, the superiority of the novel measure is brought out through a comparative analysis.

Keywords: text categorization, term weight, term-document relevance, term_class relevance.

1. Introduction

For the past few decades, automatic content-based classification of documents from huge collections has been an active area of research, because electronic data on the internet has become unmanageably big and is increasing exponentially day by day. Manual, tag-based classification has lost its significance because of the huge size of the data that need to be processed and the inability of tags to describe the content of the documents.
Various applications of text classification that are currently in demand, such as spam filtering in e-mails, classification of e-books, classification of news documents and classification of text data from social networks, have also led researchers to explore various ways of analyzing and representing these data so that quick and efficient retrieval and management of such huge data can be achieved.

1.1 A review of the available term weighting schemes

As our work focuses on proposing a new term weighting scheme rather than a classification framework, here we consider only the literature on term weighting schemes. Terms are the basic information units of any text document, so all weighting schemes developed in the literature measure the weight of a term in representing the content of a document [1-5]. Based on whether or not the membership of documents in predefined categories is used to measure the weight of a term, term weighting schemes are broadly classified into two classes, namely unsupervised term weighting schemes and supervised term weighting schemes. In the following subsections we review both kinds of schemes along with the techniques that have adopted them.

Unsupervised term weighting schemes

Most unsupervised term weighting schemes come from the information retrieval (IR) field. These methods are very useful when the training documents are not labeled with their classes. The traditional term weighting methods borrowed from IR, such as binary, term frequency (TF), TF-IDF and its various variants, are unsupervised schemes [2]. TF-IDF, whose IDF component was proposed by Jones [6, 7], and its variants are the most widely used term weighting schemes for text classification. Some of the variants of TF are raw term frequency, $\log(tf)$, $\log(tf+1)$ and $\log(tf)+1$ [1-2]. If $n$ is the number of documents containing the term and $N$ is the number of documents in the collection, then the variants of IDF are $1/n$, $\log(1/n)$, $\log(N/n)$, $\log(N/n)+1$ and $\log(N/n - 1)$ [1]. In [18], a novel inverse corpus frequency (ICF) based technique is proposed, which computes the document representation in linear time.

Supervised term weighting schemes

Supervised term weighting schemes were developed especially for text categorization, where supervised knowledge of the class labels of the training samples is available [1-4]. All supervised term weighting schemes make use of this class information in different ways. They are further divided into subcategories based on whether the weight estimates the relevancy of a term in preserving the document content or the relevancy of a term in placing a document as a member of a class. It is therefore more apt to call the schemes that measure the relevance of a term in preserving the document content term-document relevance measures, and those that measure the relevance of a term in categorizing a document term_class relevance measures.
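The TF and IDF variants listed above can be illustrated with a minimal Python sketch. The function names (`tf_variants`, `idf_variants`, `tf_idf`) are hypothetical, natural logarithms are assumed because the review does not fix a base, and log-based TF forms are set to 0 when the term does not occur (also an assumption).

```python
import math

# Illustrative sketch with hypothetical names; not code from [1] or [2].


def tf_variants(tf: int) -> dict:
    """TF variants from the review; log forms are 0 when tf == 0 (assumption)."""
    return {
        "raw": float(tf),                                    # raw term frequency
        "log": math.log(tf) if tf > 0 else 0.0,              # log(tf)
        "log_tf_plus_1": math.log(tf + 1),                   # log(tf + 1)
        "log_plus_1": math.log(tf) + 1 if tf > 0 else 0.0,   # log(tf) + 1
    }


def idf_variants(n: int, N: int) -> dict:
    """IDF variants; n = documents containing the term, N = documents in the collection."""
    return {
        "inverse": 1.0 / n,                  # 1/n
        "log_inverse": math.log(1.0 / n),    # log(1/n)
        "log": math.log(N / n),              # log(N/n)
        "log_plus_1": math.log(N / n) + 1,   # log(N/n) + 1
        # log(N/n - 1) is undefined when the term occurs in every document
        "log_minus_1": math.log(N / n - 1) if n < N else float("nan"),
    }


def tf_idf(tf: int, n: int, N: int) -> float:
    """Plain TF-IDF weight: raw TF times log(N/n)."""
    return tf * math.log(N / n)


# A term seen 3 times in a document and present in 10 of 100 documents
print(tf_idf(3, 10, 100))
```

Keeping each variant in a dictionary makes it easy to swap the TF or IDF component when comparing schemes on the same corpus.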
Term-Document Relevance measures

These measures are useful to select a discriminating subset of terms for representing a document, by weighting the terms according to their relevance in preserving the content of the document. They are created by replacing the IDF component of the TF-IDF scheme. The techniques most frequently used to replace IDF include the chi-square measure ($\chi^2$), information gain (IG), gain ratio, mutual information (MI) and odds ratio (OR) [1-4, 8-12]. In the past few years, many researchers have proposed alternative term-document relevance schemes [1, 13-16]. All of these are basically feature selection techniques used within term weighting schemes. In [14], a comparison of corpus-based and class-based keyword selection is proposed, using TF-IDF as the weighting scheme. In [4], a class-indexing-based term weighting for automatic text classification is proposed: an inverse class space density frequency ($ICS_{\delta}F$) is used along with the TF-IDF method, which provides a positive discrimination on infrequent and frequent terms.

Term_class Relevance measures

These measures compute the ability of a term to classify a document as a member of a class. To the best of our knowledge, only one work in this category has been proposed, by Isa et al. [20], using the Bayes posterior probability. Though some works make use of Bayes probability for representation, they have not clearly stated the advantage of the measure in classification [11, 18]. After [20], this measure was extensively used for term weighting [21, 22]. The strength of this measure lies in the fact that, instead of computing the weight of a term in preserving the content of a document, the relevancy of the term in categorizing the document as a member of a class is measured directly. It is computed as the Bayes posterior probability $P(C \mid t)$ for a class $C$ and a term $t$:

$$P(C \mid t) = \frac{P(t \mid C)\, P(C)}{P(t)}$$

where

$$P(C) = \frac{\text{total number of words in } C}{\text{total number of words in the training dataset}}$$

$$P(t) = \frac{\text{occurrences of } t \text{ in all categories}}{\text{occurrences of all terms in all categories}}$$

$$P(t \mid C) = \frac{\text{occurrences of } t \text{ in } C}{\text{occurrences of all terms in } C}$$

To make full use of this relevance measure, Isa et al. [20] also propose a text representation scheme that works with a reduced dimension for each document at the time of representation itself. This work happened to be the very first of its kind in the text classification literature in which a document is represented with only as many dimensions as there are classes in the corpus, without applying any dimensionality reduction technique. In this representation scheme, first, a matrix $F$ of size $m \times k$ is created for every document, where $m$ is the number of terms assumed to be available in the document and $k$ is the number of classes.
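The Bayes posterior used by Isa et al. [20] can be computed from word counts as in the minimal sketch below. It is an illustration, not code from [20]: each class is assumed to be given as a `collections.Counter` of word occurrences over its training documents, the function name `bayes_posterior` is hypothetical, and an unseen term is mapped to zero for every class by assumption.

```python
from collections import Counter

# Illustrative sketch with a hypothetical name; not code from [20].


def bayes_posterior(class_counts: dict, term: str) -> dict:
    """P(C | t) = P(t | C) P(C) / P(t), with all probabilities estimated from word counts.

    class_counts maps each class label to a Counter of word occurrences in that class.
    """
    total_words = sum(sum(c.values()) for c in class_counts.values())  # words in training set
    term_total = sum(c[term] for c in class_counts.values())            # occurrences of t overall
    if term_total == 0:
        # Unseen term: the posterior is undefined, so return zeros (assumption)
        return {label: 0.0 for label in class_counts}
    p_t = term_total / total_words                                       # P(t)
    posterior = {}
    for label, counts in class_counts.items():
        words_in_c = sum(counts.values())
        p_c = words_in_c / total_words                                   # P(C)
        p_t_given_c = counts[term] / words_in_c if words_in_c else 0.0   # P(t | C)
        posterior[label] = p_t_given_c * p_c / p_t
    return posterior


counts = {
    "sport": Counter({"ball": 3, "team": 2}),
    "tech": Counter({"ball": 1, "code": 4}),
}
print(bayes_posterior(counts, "ball"))  # roughly 0.75 for sport and 0.25 for tech
```

With these count-based estimates the word total of $C$ cancels, so $P(C \mid t)$ reduces to the share of the occurrences of $t$ that fall in $C$; the sketch keeps the three factors separate only to mirror the formula.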
Then, every entry $F(i, j)$ of the matrix is filled with the relevancy of the corresponding term $t_i$ in classifying the document as a member of class $C_j$. Next, a feature vector $f$ of dimension $k$ is created as the representative of the document, where $f(j)$ is the average relevancy of every term to class $C_j$. It should be carefully observed that a document with any number of terms is represented by a feature vector whose dimension equals the number of classes in the population. This is very small compared with the feature vector of any other vector space representation scheme, whose dimension equals the total number of terms in all documents of the population. Therefore, a great amount of dimensionality reduction is achieved at the time of representation itself, without applying any dimensionality reduction technique. However, the classification accuracy achieved is not very high.

Motivated by this work, in this paper we propose a novel term_class relevance measure with the following objectives: (i) exploiting the complete advantage of the text representation scheme proposed by Isa et al. [20]; (ii) comparing the effectiveness of the proposed term_class relevance measure with that of the Bayes posterior probability based measure; and (iii) since Isa et al. [20] use SVM as the classifier, investigating the effect of SVM on our proposed relevance measure and comparing it with other available classifiers.

The rest of the paper is organized as follows. The proposed Term_Class relevance measure is presented in Section 2. Section 3 presents the results and discussion of the experimentation. A comparative analysis of the proposed relevance measure with other contemporary works is given in Section 4. Finally, Section 5 presents the conclusion and future enhancements.

2. A New Term_Class Relevance Measure

In this section, we propose a novel measure called the term_class relevance measure. Term_class relevancy is defined as the ability of a term $t$ to classify a document $D$ as a member of a class $C_i$. We begin by introducing two new concepts that decide the role of a term in a class, namely Class_Term Weight and Class_Term Density.
Class_Term Weight: It is the relative weight of the term with respect to a class of interest, computed by counting only those documents of the class of interest that contain the term of interest, against the same count over the entire corpus. That is, the class_term weight of a term $t$ in the class $C_i$ is the ratio of $\mathrm{ClassFrequency}(t, C_i)$ to $\mathrm{CorpusFrequency}(t)$:

$$\mathrm{Class\_TermWeight}(t, C_i) = \frac{\mathrm{ClassFrequency}(t, C_i)}{\mathrm{CorpusFrequency}(t)}$$

where $\mathrm{ClassFrequency}(t, C_i)$ is the number of documents of $C_i$ containing $t$ at least once and $\mathrm{CorpusFrequency}(t)$ is the number of documents of the entire corpus containing $t$ at least once. If the class_term weight of a term $t$ with respect to the class $C_i$ is very high, then a document $D$ that contains $t$ is very likely to be a member of $C_i$. Therefore, the relevancy of a term in deciding the class of a document, which we call $\mathrm{Term\_ClassRelevancy}(t, C_i)$, is directly proportional to the class_term weight of the term, i.e.,

$$\mathrm{Term\_ClassRelevancy}(t, C_i) \propto \mathrm{Class\_TermWeight}(t, C_i) \qquad (1)$$

Class_Term Density: It is the relative density of a term of interest with respect to the class of interest, computed as the ratio of the number of occurrences of the term in the class of interest to its number of occurrences in the entire corpus. That is, the class_term density of a term $t$ with respect to the class $C_i$ is the ratio of the frequency of $t$ in $C_i$ to its frequency in the corpus:

$$\mathrm{Class\_TermDensity}(t, C_i) = \frac{\mathrm{TermFrequency}(t, C_i)}{\sum_{j=1}^{k} \mathrm{TermFrequency}(t, C_j)}$$

where $\mathrm{TermFrequency}(t, C_i)$ is the frequency of $t$ in the class $C_i$, computed as the sum of the frequencies of $t$ in every document of $C_i$:

$$\mathrm{TermFrequency}(t, C_i) = \sum_{doc=1}^{d} \mathrm{Frequency}(t, D_{doc})$$

where $\mathrm{Frequency}(t, D)$ is the frequency of occurrence of the term $t$ in document $D$ and $d$ is the number of documents in the class $C_i$. It should be noticed that if the class_term density of a term $t$ in a class $C_i$ is very high, then a document $D$ that contains $t$ is very likely to be a member of $C_i$.
Therefore, the relevancy of a term in deciding the class of a document is directly proportional to the class_term density of the term, i.e.,

$$\mathrm{Term\_ClassRelevancy}(t, C_i) \propto \mathrm{Class\_TermDensity}(t, C_i) \qquad (2)$$

By combining (1) and (2), the term_class relevancy is directly proportional to the product of the class_term weight and the class_term density of the term,

$$\mathrm{Term\_ClassRelevancy}(t, C_i) \propto \mathrm{Class\_TermWeight}(t, C_i) \times \mathrm{Class\_TermDensity}(t, C_i)$$

which, written as an equality with a proportionality constant, gives

$$\mathrm{Term\_ClassRelevancy}(t, C_i) = c_i \times \mathrm{Class\_TermWeight}(t, C_i) \times \mathrm{Class\_TermDensity}(t, C_i)$$

where $c_i$ is the proportionality constant, which we decide based on the weight of the class with respect to the entire population.

Class Weight ($c_i$): It is the weight of the $i$-th class $C_i$ in the corpus, computed as the ratio of the number of documents in $C_i$, denoted by $\mathrm{Size\_of}(C_i)$, to the total number of documents in the entire corpus:

$$\mathrm{ClassWeight}(C_i) = c_i = \frac{\mathrm{Size\_of}(C_i)}{\sum_{j=1}^{k} \mathrm{Size\_of}(C_j)}$$

where $k$ is the number of classes. If every class has an equal number of documents, the class weight is the same ($1/k$) for all classes and acts only as a common scaling factor in computing the relevance of a term; otherwise, it increases or decreases the relevancy of a term to a class when that class is larger or smaller, respectively, than the other classes. Therefore, the proposed relevancy of a term $t$ in placing a document $D$ as a member of a class $C_i$ is given by the product of three aspects, namely class weight, class_term weight and class_term density:

$$\mathrm{Term\_ClassRelevancy}(t, C_i) = c_i \times \mathrm{Class\_TermWeight}(t, C_i) \times \mathrm{Class\_TermDensity}(t, C_i)$$

The main advantages of the proposed term_class relevancy measure are as follows: (i) it directly computes the relevancy of the term with respect to a class of interest, which can itself be used as a clue to identify the possible class of a document without the need for a classifier; (ii) it uses class as well as corpus information together, as opposed to the conventional TF-IDF scheme, which uses the document frequency from the corpus only; and (iii) the relevancy of a term to a class is high only if all three factors, class_term weight, class_term density and class weight, are high. This helps in properly deciding the weight of a term without any bias towards a particular class, which in turn helps a classifier in deciding the class.
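The three factors of the proposed measure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: training documents are given as already tokenized `(tokens, label)` pairs, the proportionality constant $c_i$ is taken to be $\mathrm{ClassWeight}(C_i)$ as described above, and the function name `term_class_relevance` is hypothetical.

```python
from collections import Counter, defaultdict

# Illustrative sketch with a hypothetical name; inputs are tokenized (tokens, label) pairs.


def term_class_relevance(docs):
    """Return {label: {term: Term_ClassRelevancy(t, C)}} for the training documents."""
    class_size = Counter(label for _, label in docs)   # Size_of(C_i)
    n_docs = len(docs)
    class_freq = defaultdict(Counter)   # ClassFrequency(t, C): documents of C containing t
    corpus_freq = Counter()             # CorpusFrequency(t): documents containing t
    term_freq = defaultdict(Counter)    # TermFrequency(t, C): occurrences of t in C
    total_term_freq = Counter()         # sum over classes of TermFrequency(t, C_j)

    for tokens, label in docs:
        for term, freq in Counter(tokens).items():
            class_freq[label][term] += 1
            corpus_freq[term] += 1
            term_freq[label][term] += freq
            total_term_freq[term] += freq

    relevance = {}
    for label, size in class_size.items():
        class_weight = size / n_docs    # c_i = ClassWeight(C_i)
        relevance[label] = {
            term: class_weight
            * (class_freq[label][term] / corpus_freq[term])        # Class_TermWeight
            * (term_freq[label][term] / total_term_freq[term])     # Class_TermDensity
            for term in corpus_freq
        }
    return relevance


docs = [
    (["ball", "team", "ball"], "sport"),
    (["team", "score"], "sport"),
    (["code", "ball"], "tech"),
    (["code", "bug"], "tech"),
]
rel = term_class_relevance(docs)
print(rel["sport"]["ball"], rel["tech"]["ball"])
```

In this toy corpus "ball" appears in one document of each class but twice as often in sport, so its relevance to sport (1/6) is twice its relevance to tech (1/12), while "team" gets 0.5 for sport and 0 for tech.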
Once the term_class relevance of all terms of the training documents is computed with respect to every class, each training document is represented using the representation scheme proposed by Isa et al. [20], as explained earlier. A document is first represented as a matrix of size $m \times k$, where $m$ is the number of terms assumed to be available in the document and $k$ is the number of classes. Then, every entry $F(i, j)$ of the matrix is filled with the relevancy of the corresponding term $t_i$ with respect to the class $C_j$. Next, a feature vector $f$ of dimension $k$ is created as the representative of the document, where $f(j)$ is the average relevancy of all terms with respect to class $C_j$. The feature matrix of size $n \times k$ thus created for the $n$ training documents is used for the learning process. A similar $k$-dimensional vector is created for each test document and given to the learning algorithm or classifier for labeling. The process of training and testing the classifiers is explained in the next section.

3. Classification with SVM and k-NN classifiers

To evaluate the applicability of the proposed term_class relevance measure, we use SVM as the learning algorithm because of its good generalization ability. Moreover, although the training time of SVM grows with the size of the training set, the training burden here is small because the representative feature vectors have a dimension equal only to the number of classes. So, to test the effectiveness of the proposed relevance measure, we have experimented with the SVM classifier with linear, Gaussian radial basis function (RBF) and polynomial kernels. We use the 20 Newsgroups dataset for our experimentation. It consists of approximately 20,000 newsgroup documents in 20 classes, each class having a nearly equal number of samples, and has become a popular dataset for text classification and clustering applications. Some of the documents are closely related to each other while others are highly unrelated. We conduct experiments with various proportions of training samples to validate the performance of the proposed relevancy measure. Fig. 1 shows the overall classification accuracy of the system with various percentages of training samples using the SVM classifier with different kernels. Fig. 2 shows the precision of the SVM classifier with different kernels, and recall is shown in Fig. 3. In Fig. 4, the overall F-measure is presented. It can be observed from Figs. 1-4 that the SVM classifier with the RBF kernel works better than with the other kernels.
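The representation step described above can be sketched as follows, assuming a relevance table shaped like `{class: {term: relevance}}`. Two details are assumptions rather than statements from [20]: each distinct known term of a document counts once (so $m$ is the number of distinct training-vocabulary terms in the document), and terms never seen in training are ignored. The helper name `represent` is hypothetical.

```python
# Illustrative sketch with a hypothetical helper; not code from [20].


def represent(tokens, relevance, classes):
    """Map a tokenized document to the k-dimensional vector f of average term_class relevancies."""
    vocab = set().union(*(relevance[c].keys() for c in classes))
    terms = sorted(set(tokens) & vocab)   # the m known terms of the document (assumption)
    if not terms:
        return [0.0] * len(classes)       # no known term: zero vector (assumption)
    # Matrix F (m x k): F[i][j] is the relevance of term t_i to class C_j
    F = [[relevance[c].get(t, 0.0) for c in classes] for t in terms]
    # f(j) is the average of column j of F
    return [sum(row[j] for row in F) / len(terms) for j in range(len(classes))]


relevance = {
    "sport": {"ball": 0.2, "team": 0.6},
    "tech": {"ball": 0.1, "code": 0.5},
}
classes = ["sport", "tech"]
train_docs = [["ball", "team", "ball"], ["code", "ball", "unseen"]]
X = [represent(doc, relevance, classes) for doc in train_docs]  # n x k feature matrix
print(X)
```

However many terms a document has, its vector always has one entry per class, which is where the dimensionality reduction at representation time comes from.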
[Figure 1. Overall accuracy of classification with Linear, RBF and Polynomial kernels (x-axis: percentage of training; y-axis: accuracy)]

[Figure 2. Precision of the SVM classifier with Linear, RBF and Polynomial kernels (x-axis: percentage of training; y-axis: precision)]

[Figure 3. Recall of the SVM classifier with Linear, RBF and Polynomial kernels (x-axis: percentage of training; y-axis: recall)]

[Figure 4. F-measure of the SVM classifier with Linear, RBF and Polynomial kernels (y-axis: F-measure)]

Further, the k-NN classifier is also adopted to test the proposed method because of its simplicity. We performed the experimentation with values of k from 1 to 20, and the classifier performed best for k = 10. Table 2 shows the results of the k-NN classifier for k = 10 along with a comparison with the best results of SVM.

Table 2. Results of SVM with RBF kernel and k-NN with k = 10
% of training | Accuracy (k-NN) | Accuracy (SVM) | Precision (k-NN) | Precision (SVM) | Recall (k-NN) | Recall (SVM) | F-measure (k-NN) | F-measure (SVM)

To compare the class-wise performance of each classifier, we show the F-measure of each class in Figures 5 and 6. Figure 5 shows the F-measure of each class using the k-NN classifier with k = 10 and 10% training. It can be noticed that the performance is relatively low for classes 2, 3, 4, 7, 13 and 20. Further, the F-measure of each class for the SVM classifier with RBF kernel and 10% training is shown in Figure 6. Although the results of SVM are poorer than those of k-NN, SVM also shows relatively low performance on the same classes as k-NN.
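The k-NN labeling of the $k$-dimensional vectors and the class-wise F-measure discussed above can be sketched as below. The paper does not state the distance metric or the tie-breaking rule, so Euclidean distance and ties going to the label of the nearest neighbour are assumptions; `knn_predict` and `per_class_f_measure` are hypothetical names. The neighbour count is called `n_neighbors` here to keep it distinct from the number of classes $k$.

```python
import math
from collections import Counter

# Illustrative sketch with hypothetical names; metric and tie-breaking are assumptions.


def knn_predict(train_X, train_y, x, n_neighbors=10):
    """Label x by majority vote among its n_neighbors nearest training vectors (Euclidean)."""
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(x, pair[0])
    )[:n_neighbors]
    votes = Counter(label for _, label in nearest)
    # most_common keeps first-encountered order among equal counts, so ties
    # favour the label of the closest neighbour
    return votes.most_common(1)[0][0]


def per_class_f_measure(y_true, y_pred):
    """F-measure (harmonic mean of precision and recall) for every class."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


train_X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = ["sport", "sport", "tech", "tech"]
print(knn_predict(train_X, train_y, [0.85, 0.15], n_neighbors=3))
print(per_class_f_measure(["a", "a", "b"], ["a", "b", "b"]))
```

Because each document vector has only as many entries as there are classes, the distance computations stay cheap even when k-NN compares a test document against every training document.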

[Figure 5. Classification performance vs. class for the k-NN classifier with 10% training (x-axis: class number; y-axis: F-measure)]

[Figure 6. Classification performance vs. class for the SVM classifier with RBF kernel and 10% training (x-axis: class number; y-axis: F-measure)]

4. Comparative Analysis

In this section, we provide a quantitative comparative analysis of the proposed term_class relevance measure against the results of Isa et al. [20] in Table 3. The results corresponding to [20] have been taken directly from that paper, as the representation scheme is the same in both works and they also report results on the same 20 Newsgroups dataset, using only the SVM classifier with different kernels. Table 3 shows that the proposed term_class relevance measure outperforms the measure used by Isa et al. [20]. Along with SVM, we compare the results of the k-NN classifier with k = 10. Table 3 also shows that the k-NN classifier with k = 10 gives better results than SVM with any kernel, under either relevance measure. We therefore recommend k-NN as the classifier for better classification performance.

Table 3. Comparison of the results of the proposed method with the work of Isa et al. [20]
% of training | [20], SVM Linear | [20], SVM RBF | [20], SVM Polynomial | Proposed, SVM Linear | Proposed, SVM RBF | Proposed, SVM Polynomial | Proposed, k-NN

5. Conclusion

In this paper, a novel term_class relevance measure is proposed to compute the relevance of a term in classifying an unknown document as a member of a particular class. The proposed term_class relevance measure is a product of three aspects, namely class_term weight, class_term density and class weight. Experiments are conducted on the 20 Newsgroups dataset using the SVM and k-NN classifiers. An effective text representation scheme, which allows text documents to be represented in a reduced dimension, is adopted to test the proposed term_class relevance measure. The comparative analysis of the results of the proposed work against other contemporary research works shows the superiority of the proposed term_class relevance measure.

References

1. Lan, M., Tan, C. L., Su, J., and Lu, Y. Supervised and Traditional Term Weighting Methods for Automatic Text Categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4).
2. Salton, G., and Buckley, C. Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, 24(5).
3. Debole, F., and Sebastiani, F. Supervised Term Weighting for Automated Text Categorization. Proceedings of the 2003 ACM Symposium on Applied Computing.
4. Ren, F., and Sohrab, M. G. Class-indexing-based term weighting for automatic text classification. Information Sciences, 236 (2013).
5. Harish, B. S., Guru, D. S., and Manjunath, S. (2010). Representation and Classification of Text Documents: A Brief Review. IJCA Special Issue on Recent Trends in Image Processing and Pattern Recognition (RTIPPR).
6. Jones, K. S. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28.
7. Jones, K. S. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 60.
8. Altınçay, H., and Erenel, Z. Analytical evaluation of term weighting schemes for text categorization. Pattern Recognition Letters, 31.
9. Liu, Y., Loh, H. T., and Sun, A. Imbalanced text classification: A term weighting approach. Expert Systems with Applications, 36.
10. Mladenic, D., and Grobelnik, M. Feature selection on hierarchy of web documents. Decision Support Systems, 35(1).
11. Sebastiani, F. Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1-47.

12. Yang, Y., and Pedersen, J. O. A comparative study on feature selection in text categorization. In: Proc. ICML-97, 14th International Conference on Machine Learning. Morgan Kaufmann Publishers, San Francisco, US.
13. Liu, H., and Yu, L. Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17(4).
14. Ozgur, A., Ozgur, L., and Gungor, T. Text categorization with class-based and corpus-based keyword selection. In: Proc. 20th International Symposium on Computer and Information Sciences. Lecture Notes in Computer Science, vol. 3733, Springer-Verlag.
15. Tsai, R. T., Hung, H., Dai, H., Lin, Y., and Hsu, W. Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles. BMC Bioinformatics.
16. Wang, D., and Zhang, H. Inverse-Category-Frequency Based Supervised Term Weighting Schemes for Text Categorization. Journal of Information Science and Engineering, 29.
17. Reed, J. W., Jiao, Y., Potok, T. E., Klump, B. A., Elmore, M. T., and Hurson, A. R. TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams. 5th International Conference on Machine Learning and Applications. IEEE Computer Society, Washington.
18. Fuhr, N., Hartmann, S., Lustig, G., Schwantner, M., Tzeras, K., Darmstadt, T. H., et al. (1991). AIR/X: A rule-based multistage indexing system for large subject fields. In: Proceedings of RIAO.
19. Soucy, P., and Mineau, G. W. Beyond TFIDF Weighting for Text Categorization in the Vector Space Model. Proc. Int'l Joint Conf. Artificial Intelligence.
20. Isa, D., Lee, L. H., Kallimani, V. P., and RajKumar, R. Text document preprocessing with the Bayes formula for classification using the support vector machine. IEEE Transactions on Knowledge and Data Engineering, 20.
21. Isa, D., Kallimani, V. P., and Lee, L. H. Using the self-organizing map for clustering of text documents. Expert Systems with Applications, 36.
22. Guru, D. S., Harish, B. S., and Manjunath, S. Symbolic representation of text documents. In Proceedings of the Third Annual ACM Bangalore Conference.


More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Novel Pattern-based Fingerprint Recognition Technique Using 2D Wavelet Decomposition

Novel Pattern-based Fingerprint Recognition Technique Using 2D Wavelet Decomposition Mathematcal Methods for Informaton Scence and Economcs Novel Pattern-based Fngerprnt Recognton Technque Usng D Wavelet Decomposton TUDOR BARBU Insttute of Computer Scence of the Romanan Academy T. Codrescu,,

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Gender Classification using Interlaced Derivative Patterns

Gender Classification using Interlaced Derivative Patterns Gender Classfcaton usng Interlaced Dervatve Patterns Author Shobernejad, Ameneh, Gao, Yongsheng Publshed 2 Conference Ttle Proceedngs of the 2th Internatonal Conference on Pattern Recognton (ICPR 2) DOI

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

A New Feature of Uniformity of Image Texture Directions Coinciding with the Human Eyes Perception 1

A New Feature of Uniformity of Image Texture Directions Coinciding with the Human Eyes Perception 1 A New Feature of Unformty of Image Texture Drectons Concdng wth the Human Eyes Percepton Xng-Jan He, De-Shuang Huang, Yue Zhang, Tat-Mng Lo 2, and Mchael R. Lyu 3 Intellgent Computng Lab, Insttute of Intellgent

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Classification of Face Images Based on Gender using Dimensionality Reduction Techniques and SVM

Classification of Face Images Based on Gender using Dimensionality Reduction Techniques and SVM Classfcaton of Face Images Based on Gender usng Dmensonalty Reducton Technques and SVM Fahm Mannan 260 266 294 School of Computer Scence McGll Unversty Abstract Ths report presents gender classfcaton based

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

Using Neural Networks and Support Vector Machines in Data Mining

Using Neural Networks and Support Vector Machines in Data Mining Usng eural etworks and Support Vector Machnes n Data Mnng RICHARD A. WASIOWSKI Computer Scence Department Calforna State Unversty Domnguez Hlls Carson, CA 90747 USA Abstract: - Multvarate data analyss

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier

Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier Usng Ambguty Measure Feature Selecton Algorthm for Support Vector Machne Classfer Saet S.R. Mengle Informaton Retreval Lab Computer Scence Department Illnos Insttute of Technology Chcago, Illnos, U.S.A

More information

An Improvement to Naive Bayes for Text Classification

An Improvement to Naive Bayes for Text Classification Avalable onlne at www.scencedrect.com Proceda Engneerng 15 (2011) 2160 2164 Advancen Control Engneerngand Informaton Scence An Improvement to Nave Bayes for Text Classfcaton We Zhang a, Feng Gao a, a*

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples 94 JOURNAL OF COMPUTERS, VOL. 4, NO. 1, JANUARY 2009 Relable Negatve Extractng Based on knn for Learnng from Postve and Unlabeled Examples Bangzuo Zhang College of Computer Scence and Technology, Jln Unversty,

More information

Clustering of Words Based on Relative Contribution for Text Categorization

Clustering of Words Based on Relative Contribution for Text Categorization Clusterng of Words Based on Relatve Contrbuton for Text Categorzaton Je-Mng Yang, Zh-Yng Lu, Zhao-Yang Qu Abstract Term clusterng tres to group words based on the smlarty crteron between words, so that

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

Research Article A High-Order CFS Algorithm for Clustering Big Data

Research Article A High-Order CFS Algorithm for Clustering Big Data Moble Informaton Systems Volume 26, Artcle ID 435627, 8 pages http://dx.do.org/.55/26/435627 Research Artcle A Hgh-Order Algorthm for Clusterng Bg Data Fanyu Bu,,2 Zhku Chen, Peng L, Tong Tang, 3 andyngzhang

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

A Misclassification Reduction Approach for Automatic Call Routing

A Misclassification Reduction Approach for Automatic Call Routing A Msclassfcaton Reducton Approach for Automatc Call Routng Fernando Uceda-Ponga 1, Lus Vllaseñor-Pneda 1, Manuel Montes-y-Gómez 1, Alejandro Barbosa 2 1 Laboratoro de Tecnologías del Lenguaje, INAOE, Méxco.

More information

A MODIFIED K-NEAREST NEIGHBOR CLASSIFIER TO DEAL WITH UNBALANCED CLASSES

A MODIFIED K-NEAREST NEIGHBOR CLASSIFIER TO DEAL WITH UNBALANCED CLASSES A MODIFIED K-NEAREST NEIGHBOR CLASSIFIER TO DEAL WITH UNBALANCED CLASSES Aram AlSuer, Ahmed Al-An and Amr Atya 2 Faculty of Engneerng and Informaton Technology, Unversty of Technology, Sydney, Australa

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson

More information

Computer Aided Drafting, Design and Manufacturing Volume 25, Number 2, June 2015, Page 14

Computer Aided Drafting, Design and Manufacturing Volume 25, Number 2, June 2015, Page 14 Computer Aded Draftng, Desgn and Manufacturng Volume 5, Number, June 015, Page 14 CADDM Face Recognton Algorthm Fusng Monogenc Bnary Codng and Collaboratve Representaton FU Yu-xan, PENG Lang-yu College

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Classic Term Weighting Technique for Mining Web Content Outliers

Classic Term Weighting Technique for Mining Web Content Outliers Internatonal Conference on Computatonal Technques and Artfcal Intellgence (ICCTAI'2012) Penang, Malaysa Classc Term Weghtng Technque for Mnng Web Content Outlers W.R. Wan Zulkfel, N. Mustapha, and A. Mustapha

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

A Weighted Method to Improve the Centroid-based Classifier

A Weighted Method to Improve the Centroid-based Classifier 016 Internatonal onference on Electrcal Engneerng and utomaton (IEE 016) ISN: 978-1-60595-407-3 Weghted ethod to Improve the entrod-based lassfer huan LIU, Wen-yong WNG *, Guang-hu TU, Nan-nan LIU and

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

Face Recognition Based on SVM and 2DPCA

Face Recognition Based on SVM and 2DPCA Vol. 4, o. 3, September, 2011 Face Recognton Based on SVM and 2DPCA Tha Hoang Le, Len Bu Faculty of Informaton Technology, HCMC Unversty of Scence Faculty of Informaton Scences and Engneerng, Unversty

More information

A Method of Hot Topic Detection in Blogs Using N-gram Model

A Method of Hot Topic Detection in Blogs Using N-gram Model 84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna

More information

A NOTE ON FUZZY CLOSURE OF A FUZZY SET

A NOTE ON FUZZY CLOSURE OF A FUZZY SET (JPMNT) Journal of Process Management New Technologes, Internatonal A NOTE ON FUZZY CLOSURE OF A FUZZY SET Bhmraj Basumatary Department of Mathematcal Scences, Bodoland Unversty, Kokrajhar, Assam, Inda,

More information

Efficient Text Classification by Weighted Proximal SVM *

Efficient Text Classification by Weighted Proximal SVM * Effcent ext Classfcaton by Weghted Proxmal SVM * Dong Zhuang 1, Benyu Zhang, Qang Yang 3, Jun Yan 4, Zheng Chen, Yng Chen 1 1 Computer Scence and Engneerng, Bejng Insttute of echnology, Bejng 100081, Chna

More information

Incremental Learnng wth Feature Shft Detecton for Personalzed E-mal Spam Flterng Gop Sanghan 1, Dr. Ketan Kotecha 2 1 Computer Engneerng Department, Nrma Unversty, Ahmedabad-382481, Inda. 2 Parul Unversty,

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Fuzzy Rough Neural Network and Its Application to Feature Selection

Fuzzy Rough Neural Network and Its Application to Feature Selection 70 Internatonal Journal of Fuzzy Systems, Vol. 3, No. 4, December 0 Fuzzy Rough Neural Network and Its Applcaton to Feature Selecton Junyang Zhao and Zhl Zhang Abstract For the sake of measurng fuzzy uncertanty

More information

Multiclass Object Recognition based on Texture Linear Genetic Programming

Multiclass Object Recognition based on Texture Linear Genetic Programming Multclass Object Recognton based on Texture Lnear Genetc Programmng Gustavo Olague 1, Eva Romero 1 Leonardo Trujllo 1, and Br Bhanu 2 1 CICESE, Km. 107 carretera Tjuana-Ensenada, Mexco, olague@ccese.mx,

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

Fuzzy Logic Based RS Image Classification Using Maximum Likelihood and Mahalanobis Distance Classifiers

Fuzzy Logic Based RS Image Classification Using Maximum Likelihood and Mahalanobis Distance Classifiers Research Artcle Internatonal Journal of Current Engneerng and Technology ISSN 77-46 3 INPRESSCO. All Rghts Reserved. Avalable at http://npressco.com/category/jcet Fuzzy Logc Based RS Image Usng Maxmum

More information

On Evaluating Open Biometric Identification Systems

On Evaluating Open Biometric Identification Systems Proceedngs of Student/Faculty Research Day, CSIS, Pace Unversty, May 6th, 2005 On Evaluatng Open Bometrc Identfcaton Systems Mchael Gbbons, Sungsoo Yoon, Sung-Hyuk Cha and Charles Tappert mkegbb@us.bm.com,

More information

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS P.G. Demdov Yaroslavl State Unversty Anatoly Ntn, Vladmr Khryashchev, Olga Stepanova, Igor Kostern EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS Yaroslavl, 2015 Eye

More information

A Background Subtraction for a Vision-based User Interface *

A Background Subtraction for a Vision-based User Interface * A Background Subtracton for a Vson-based User Interface * Dongpyo Hong and Woontack Woo KJIST U-VR Lab. {dhon wwoo}@kjst.ac.kr Abstract In ths paper, we propose a robust and effcent background subtracton

More information

Enhanced Watermarking Technique for Color Images using Visual Cryptography

Enhanced Watermarking Technique for Color Images using Visual Cryptography Informaton Assurance and Securty Letters 1 (2010) 024-028 Enhanced Watermarkng Technque for Color Images usng Vsual Cryptography Enas F. Al rawashdeh 1, Rawan I.Zaghloul 2 1 Balqa Appled Unversty, MIS

More information

An Improved Image Segmentation Algorithm Based on the Otsu Method

An Improved Image Segmentation Algorithm Based on the Otsu Method 3th ACIS Internatonal Conference on Software Engneerng, Artfcal Intellgence, Networkng arallel/dstrbuted Computng An Improved Image Segmentaton Algorthm Based on the Otsu Method Mengxng Huang, enjao Yu,

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Spam Filtering Based on Support Vector Machines with Taguchi Method for Parameter Selection

Spam Filtering Based on Support Vector Machines with Taguchi Method for Parameter Selection E-mal Spam Flterng Based on Support Vector Machnes wth Taguch Method for Parameter Selecton We-Chh Hsu, Tsan-Yng Yu E-mal Spam Flterng Based on Support Vector Machnes wth Taguch Method for Parameter Selecton

More information

Classification / Regression Support Vector Machines

Classification / Regression Support Vector Machines Classfcaton / Regresson Support Vector Machnes Jeff Howbert Introducton to Machne Learnng Wnter 04 Topcs SVM classfers for lnearly separable classes SVM classfers for non-lnearly separable classes SVM

More information

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method Internatonal Journal of Computatonal and Appled Mathematcs. ISSN 89-4966 Volume, Number (07), pp. 33-4 Research Inda Publcatons http://www.rpublcaton.com An Accurate Evaluaton of Integrals n Convex and

More information

Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset

Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset Under-Samplng Approaches for Improvng Predcton of the Mnorty Class n an Imbalanced Dataset Show-Jane Yen and Yue-Sh Lee Department of Computer Scence and Informaton Engneerng, Mng Chuan Unversty 5 The-Mng

More information

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 48 CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 3.1 INTRODUCTION The raw mcroarray data s bascally an mage wth dfferent colors ndcatng hybrdzaton (Xue

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Audio Content Classification Method Research Based on Two-step Strategy

Audio Content Classification Method Research Based on Two-step Strategy (IJACSA) Internatonal Journal of Advanced Computer Scence and Applcatons, Audo Content Classfcaton Method Research Based on Two-step Strategy Sume Lang Department of Computer Scence and Technology Chongqng

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Training of Kernel Fuzzy Classifiers by Dynamic Cluster Generation

Training of Kernel Fuzzy Classifiers by Dynamic Cluster Generation Tranng of Kernel Fuzzy Classfers by Dynamc Cluster Generaton Shgeo Abe Graduate School of Scence and Technology Kobe Unversty Nada, Kobe, Japan abe@eedept.kobe-u.ac.jp Abstract We dscuss kernel fuzzy classfers

More information

Credibility Adjusted Term Frequency: A Supervised Term Weighting Scheme for Sentiment Analysis and Text Classification

Credibility Adjusted Term Frequency: A Supervised Term Weighting Scheme for Sentiment Analysis and Text Classification Credblty Adjusted Term Frequency: A Supervsed Term Weghtng Scheme for Sentment Analyss and Text Classfcaton Yoon Km New York Unversty yhk255@nyu.edu Owen Zhang zhonghua.zhang2006@gmal.com Abstract We provde

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information