Keywords - Web page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines


(IJCSIS) International Journal of Computer Science and Information Security

Hierarchical Web Page Classification Based on a Topic Model and Neighboring Pages Integration

Wongkot Sriurai, Department of Information Technology, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
Phayung Meesad, Department of Teacher Training in Electrical Engineering, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
Choochart Haruechayasak, Human Language Technology Laboratory, National Electronics and Computer Technology Center (NECTEC), Bangkok, Thailand

Abstract - Most Web page classification models typically apply the bag of words (BOW) model to represent the feature space. The original BOW representation, however, is unable to recognize semantic relationships between terms. One possible solution is to apply the topic model approach, based on the Latent Dirichlet Allocation algorithm, to cluster the term features into a set of latent topics. Terms assigned to the same topic are semantically related. In this paper, we propose a novel hierarchical classification method based on a topic model and on integrating additional term features from neighboring pages. Our hierarchical classification method consists of two phases: (1) feature representation by using a topic model and integrating neighboring pages, and (2) a hierarchical Support Vector Machines (SVM) classification model constructed from a confusion matrix. From the experimental results, the proposed hierarchical SVM model, integrating the current page with neighboring pages via the topic model, yielded the best performance, with an accuracy of 90.33% and an F1 measure of 90.14%; an improvement of 5.12% and 5.13%, respectively, over the original SVM model.

Keywords - Web page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines
I. INTRODUCTION

Due to the rapid growth of Web documents (e.g., Web pages, blogs, emails) on the World Wide Web (WWW), Web page classification has become one of the key techniques for managing and organizing those documents, e.g., for document filtering in information retrieval. Generally, Web page classification applies the techniques of text categorization, which use supervised machine learning algorithms for learning the classification model [1, 2]. Most previous works on Web page classification typically applied the bag of words (BOW) model to represent the feature space. Under the BOW model, a Web page is represented by a vector in which each dimension contains a weight value (e.g., frequency) of a word (or term) occurring in the page. The original BOW representation, however, is unable to recognize synonyms from a given word set. As a result, the performance of a classification model using the BOW representation can deteriorate. In this paper, we apply a topic model to represent the feature space for learning the Web page classification model. Under the topic model concept, words (or terms) which are statistically dependent are clustered into the same topics. Given a set of documents D consisting of a set of terms (or words) W, a topic model generates a set of latent topics T based on statistical inference over the term set W. In this paper, we apply the Latent Dirichlet Allocation (LDA) [3] algorithm to generate a probabilistic topic model from a Web page collection. A topic model can help capture the hypernyms, hyponyms and synonyms of a given word. For example, the words vehicle (hypernym) and automobile (hyponym) would be clustered into the same topic. Likewise, the words film and movie (synonyms) would be clustered into the same topic. The topic model helps improve the performance of a classification model by (1) reducing the number of feature dimensions and (2) mapping semantically related terms into the same feature dimension.
In addition to the topic model concept, our proposed method also integrates additional term features from neighboring pages (i.e., parent, child and sibling pages). Using additional terms from neighboring pages provides more evidence for learning the classification model [4, 5]. We used Support Vector Machines (SVM) [6, 7] as the classification algorithm. SVM has been successfully applied to text categorization tasks [6, 7, 8, 9]. SVM is based on the structural risk minimization principle from computational learning theory. The algorithm addresses the general problem of learning to discriminate between positive and negative members of a given class of n-dimensional vectors. The basic SVM classifier, however, is designed to solve only the binary classification problem [7]. To manage the multi-class classification problem, many studies have proposed hierarchical classification methods. For example, Dumais and Chen proposed a hierarchical method using the SVM classifier for classifying a large, heterogeneous collection of Web content; the study showed that the hierarchical method performs better than the flat method [10]. Cai and Hofmann proposed a hierarchical classification method that generalizes SVM based on discriminant functions that are structured in a way that mirrors the class hierarchy; the study showed that the hierarchical SVM method performs better than the flat SVM method [11]. Most related work presented hierarchical classification methods using different approaches; in previous works, however, the bag of words (BOW) model is used to represent the feature space. In this paper, we propose a new hierarchical classification method using a topic model and integrating neighboring pages. Our hierarchical classification method consists of two phases: (1) feature representation and (2) learning the classification model. We evaluated three different feature representations: (1) applying the simple BOW model on the current page, (2) applying the topic model on the current page, and (3) integrating the neighboring pages via the topic model. To construct a hierarchical classification model, we use the class relationships obtained from a confusion matrix of the flat SVM classification model. The experimental results showed that by integrating the additional neighboring information via a topic model, the classification performance under the F1 measure was significantly improved over the simple BOW model. In addition, our proposed hierarchical classification method yielded a better performance compared to the flat SVM classification method.

The rest of this paper is organized as follows. In the next section we provide a brief review of Latent Dirichlet Allocation (LDA). Section 3 presents the proposed framework of hierarchical classification via the topic model and neighboring pages integration. Section 4 presents the experiments with a discussion of the results. Section 5 concludes the paper.

II. A REVIEW OF LATENT DIRICHLET ALLOCATION

Latent Dirichlet Allocation (LDA) has been introduced as a generative probabilistic model for a set of documents [3, 12]. The basic idea behind this approach is that documents are represented as random mixtures over latent topics. Each topic is represented by a probability distribution over the terms.
Each article is represented by a probability distribution over the topics. LDA has also been applied for identification of topics in a number of different areas such as classification, collaborative filtering [3] and content-based filtering [13].

Generally, an LDA model can be represented as a probabilistic graphical model as shown in Figure 1 [3]. There are three levels to the LDA representation. The variables α and β are corpus-level parameters, which are assumed to be sampled during the process of generating a corpus. α is the parameter of the uniform Dirichlet prior on the per-document topic distributions. β is the parameter of the uniform Dirichlet prior on the per-topic word distribution. θ is a document-level variable, sampled once per document. Finally, the variables z and w are word-level variables and are sampled once for each word in each document. The variable N is the number of word tokens in a document and the variable M is the number of documents.

Figure 1. The Latent Dirichlet Allocation (LDA) model

The LDA model [3] introduces a set of K latent variables, called topics. Each word in the document is assumed to be generated by one of the topics. The generative process for each document w can be described as follows:
1. Choose θ ~ Dir(α): choose a latent topic mixture vector θ from the Dirichlet distribution.
2. For each word w_n in W:
(a) Choose a topic z_n ~ Multinomial(θ): choose a latent topic z_n from the multinomial distribution.
(b) Choose a word w_n from P(w_n | z_n, β), a multinomial probability conditioned on the topic z_n.

III. THE PROPOSED HIERARCHICAL CLASSIFICATION FRAMEWORK

Figure 2 illustrates the proposed hierarchical classification framework, which consists of two phases: (1) feature representation for learning the Web page classification models, and (2) learning classification models based on Support Vector Machines (SVM).
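As an illustration of the LDA generative process reviewed in Section II, the following is a minimal Python sketch. The two hand-made topic distributions, the toy vocabulary and all helper names are illustrative assumptions, not part of the paper's experimental setup.

```python
import random

def sample_dirichlet(alpha, k):
    # Step 1: theta ~ Dir(alpha), via normalized Gamma draws
    draws = [random.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, topics, alpha=0.1):
    """Generate one toy document following the LDA generative process.
    `topics` plays the role of beta: a list of per-topic word
    distributions, each a dict mapping word -> probability."""
    k = len(topics)
    theta = sample_dirichlet(alpha, k)
    words = []
    for _ in range(n_words):
        # Step 2(a): z_n ~ Multinomial(theta)
        z = random.choices(range(k), weights=theta)[0]
        # Step 2(b): w_n drawn from P(w | z_n, beta)
        vocab = list(topics[z])
        probs = [topics[z][w] for w in vocab]
        words.append(random.choices(vocab, weights=probs)[0])
    return words

# two hand-made topics echoing the synonym/hypernym examples in Section I
topics = [
    {"film": 0.5, "movie": 0.4, "actor": 0.1},
    {"vehicle": 0.5, "automobile": 0.4, "engine": 0.1},
]
doc = generate_document(10, topics)
print(doc)  # 10 words, each drawn topic-by-topic
```

Note how semantically related words (film/movie, vehicle/automobile) live in the same topic distribution, which is exactly what lets the topic representation map them onto the same feature dimension.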
In our proposed framework, we evaluated three different feature representations: (1) applying the simple BOW model on the current page, (2) applying the topic model on the current page, and (3) integrating the neighboring pages via the topic model. After the feature representation process, we use the class relationships obtained from a confusion matrix of the flat SVM classification model for building a new hierarchical classification method.

A. Feature Representation

The process for feature representation can be explained in detail as follows.

Approach 1 (BOW): A Web page collection consists of an article collection, which is a set of m documents denoted by D = {D_0, ..., D_{m-1}}. Text processing is applied to extract terms. The resulting set of terms is represented as W = {W_0, ..., W_{k-1}}, where k is the total number of terms. Each term is assigned a weight w equal to its term frequency. The set of terms is then filtered by using the information gain (IG) feature selection technique [1]. Once the term features are obtained, we apply Support Vector Machines (SVM) to learn the classification model. The model is then used to evaluate the performance of category prediction.
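The term-frequency representation of Approach 1 can be sketched as follows (a minimal sketch; the vocabulary and helper name are illustrative, and the IG filtering and SVM steps are omitted):

```python
from collections import Counter

def bow_vector(tokens, vocabulary):
    """Represent one page as a term-frequency vector over a fixed
    term set W = {W_0, ..., W_{k-1}} (the BOW model of Approach 1)."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

vocab = ["film", "movie", "vehicle"]                    # illustrative term set
print(bow_vector(["film", "film", "vehicle"], vocab))   # [2, 0, 1]
```

The output illustrates the weakness discussed above: "film" and "movie" occupy separate dimensions even though they are synonyms, which is what the topic-model representations of Approaches 2 and 3 address.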

Approach 2 (TOPIC_CURRENT): Given a Web page collection consisting of an article collection, which is a set of m documents denoted by D = {D_0, ..., D_{m-1}}, text processing is applied to extract terms. A topic model is then generated from the set of terms by using the LDA algorithm. The LDA algorithm generates a set of n topics denoted by T = {T_0, ..., T_{n-1}}. Each topic T_i is a probability distribution over p words, denoted by T_i = [w_0, ..., w_{p-1}], where w_j is the probabilistic value of word j assigned to topic i. Based on this topic model, each document D_i can be represented as a probability distribution over the topic set T, i.e., D_i = [t_0, ..., t_{n-1}], where t_j is the probabilistic value of topic j assigned to document i. The output of this step is the topic probability representation for each article. Support Vector Machines (SVM) are again used to learn the classification model.

Approach 3 (TOPIC_INTEGRATED): The main difference of this approach from Approach 2 is that we integrate additional term features obtained from the neighboring pages to improve the performance of Web page classification. The process of integrating the neighboring pages is explained as follows. Figure 3 shows the three types of neighboring pages: parent, child and sibling pages. Given a Web page (i.e., the current page), there are typically incoming links from parent pages, outgoing links to child pages, and links from its parent pages to sibling pages. The parent, child and sibling pages are collectively referred to as the neighboring pages. Using the additional terms from the neighboring pages provides more evidence for learning the classification model. In this paper, we vary the weight value of neighboring pages from zero to one. A weight value equal to zero means the neighboring pages are not included in the feature representation. Under this approach, terms from the different page types (i.e., current, parent, child and sibling) are first transformed into a set of n topics (denoted by T = {T_0, ..., T_{n-1}}) by using the LDA algorithm.
Weight values from 0 to 1 are then multiplied into the topic dimensions T of the parent, child and sibling pages. The combined topic feature vector, integrating the neighboring topic vectors with adjusted weight values, can be computed by using the algorithm listed in Table I.

Figure 2. The proposed hierarchical classification framework

Figure 3. A current Web page with three types of neighboring pages

The INP algorithm that we present in this paper incorporates term features obtained from the neighboring pages (i.e., parent, child and sibling pages) into the classification model. Using additional terms from the neighboring pages provides more evidence for learning the classification model. In this algorithm, we propose a function for varying the weight values of terms from parent pages (PDT), child pages (CDT) and sibling pages (SDT). The probability values from all neighboring pages are integrated with the current page (CurDT) to form a new integrated matrix (IDT). The algorithm begins with the results from the LDA model, that is, the document-topic matrices from all page types. It gathers values from the document-topic matrices (CurDT, PDT, CDT, SDT) using the getPValue function. All P-values of the document-topic matrices, except those of the current-page matrix, are multiplied by the weight value of their matrix. Finally, the P-values from the four matrices are summed and written to IDT using the setPValue function. After the integration process, we use the IDT matrix for learning the classification model.

TABLE I. THE INTEGRATING NEIGHBORING PAGES (INP) ALGORITHM

Algorithm: INP
Input: CurDT, PDT, CDT, SDT, W_p, W_c, W_s
for all documents i in CurDT do
    for all topics j in CurDT do
        Cur ← getPValue(CurDT, i, j)
        PP ← getPValue(PDT, i, j) * W_p
        PC ← getPValue(CDT, i, j) * W_c
        PS ← getPValue(SDT, i, j) * W_s
        setPValue(IDT, Cur + PP + PC + PS, i, j)
    end for
end for
return IDT

Parameters and variables:
CurDT: document-topic matrix from the current page
PDT: document-topic matrix from parent pages
CDT: document-topic matrix from child pages
SDT: document-topic matrix from sibling pages
IDT: integrated document-topic matrix
PP: P-value from PDT at a specific index
PC: P-value from CDT at a specific index
PS: P-value from SDT at a specific index
W_p: weight value for parent pages, 0.0 ≤ W_p ≤ 1.0
W_c: weight value for child pages, 0.0 ≤ W_c ≤ 1.0
W_s: weight value for sibling pages, 0.0 ≤ W_s ≤ 1.0
P-value: probability value
getPValue(M, r, c): returns the P-value at row r and column c of matrix M
setPValue(M, p, r, c): sets the P-value at row r, column c of matrix M to value p

B. Classification Model

The three feature representation approaches are used as input to the classifiers. In this paper, we propose two methods for building the classification models: (1) Model 1, in which we adopt the SVM to classify the features, and (2) Model 2, a new hierarchical classification method that uses the class relationships obtained from a confusion matrix for learning a classification model. Each method is described in detail as follows.

Model 1 (SVM): We used the SVM for learning a classification model. The SVM is the machine learning algorithm proposed by Vapnik [7]. The algorithm constructs a maximum margin hyperplane which separates a set of positive examples from a set of negative examples. For examples that are not linearly separable, SVM uses a kernel function to map the examples from the input space into a high dimensional feature space; using a kernel function can solve the non-linear problem. In our experiments, we used a polynomial kernel. We implemented the SVM classifier by using the WEKA(1) library.

Model 2 (HSVM): The proposed method is based on the SVM classifier and uses the class relationships obtained from a confusion matrix for building a hierarchical SVM (HSVM). A confusion matrix shows the numbers of correct and incorrect predictions made by the model compared with the actual classifications of the test data. The size of the confusion matrix is m-by-m, where m is the number of classes. Figure 4 shows an example of a confusion matrix from Approach 3, built on a collection of articles obtained from the Wikipedia Selection for Schools. In a confusion matrix, each row corresponds to an actual class, and each column corresponds to a predicted class.
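A direct Python transcription of the INP pseudocode in Table I may help make the weighting concrete. This is a sketch under the assumption that the document-topic matrices are nested lists; the toy probability values are illustrative, while the weights are the best-performing ones reported in Section IV.

```python
def integrate_neighbors(cur_dt, p_dt, c_dt, s_dt, w_p, w_c, w_s):
    """INP: combine the document-topic matrices of the current,
    parent, child and sibling pages into the integrated matrix IDT.
    Each matrix has one row per document and one column per topic;
    the current page keeps weight 1, neighbors are down-weighted."""
    idt = []
    for i, row in enumerate(cur_dt):
        new_row = []
        for j, cur in enumerate(row):
            pp = p_dt[i][j] * w_p  # parent contribution
            pc = c_dt[i][j] * w_c  # child contribution
            ps = s_dt[i][j] * w_s  # sibling contribution
            new_row.append(cur + pp + pc + ps)
        idt.append(new_row)
    return idt

# one document over two topics, toy probabilities
cur, parent, child, sibling = [[0.6, 0.4]], [[0.8, 0.2]], [[0.5, 0.5]], [[0.3, 0.7]]
# best-performing weights from Section IV: W_p=0.4, W_c=0.0, W_s=0.3
idt = integrate_neighbors(cur, parent, child, sibling, 0.4, 0.0, 0.3)
print([round(x, 2) for x in idt[0]])  # [1.01, 0.69]
```

With W_c = 0.0 the child-page matrix is ignored entirely, matching the finding below that parent and sibling pages are more informative than child pages.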
In this example, for the class art, the model makes correct predictions for 49 instances, and incorrect predictions into the class citizenship (c) for 1 instance and into the class design and technology (e) for 5 instances.

(1) Weka

Figure 4. A confusion matrix of the Wikipedia Selection for Schools

We used the confusion matrix to construct a hierarchical structure. First, we transform the confusion matrix into a new symmetric matrix, called the average pairwise confusion matrix (APCM), by computing average values of the pairwise relationships between classes in the confusion matrix (CM). The process of transforming CM into APCM can be explained as follows. Given a confusion matrix CM = [v_{a,p}], where each row a corresponds to an actual class and each column p corresponds to a predicted class: for a correct prediction, i.e., a equal to p in CM, we set the value in APCM to 0. If a is not equal to p, i.e., an incorrect prediction, we compute the average of v_{a,p} and v_{p,a} as the pairwise confusion value at this position. We apply this calculation to every row and column. For example, in Figure 4, v_{0,0} = 49; since a is equal to p (a correct prediction), w_{0,0} is set to 0 in APCM. For v_{0,2} = 1, where a = 0 and p = 2 (a is not equal to p), the average pairwise confusion value of v_{0,2} and v_{2,0} is equal to 1. The final result of the average pairwise confusion matrix computation is shown in Figure 5. The computation of an average pairwise value is summarized by the following equations:

w_{a,p} = (v_{a,p} + v_{p,a}) / 2, if a ≠ p    (1)
w_{a,p} = 0, if a = p    (2)

where w_{a,p} is the value of the average pairwise confusion matrix (APCM) at row a and column p, and v_{a,p} is the value of the confusion matrix (CM) at row a and column p.

Figure 5. An average pairwise confusion matrix of the Wikipedia Selection for Schools

Once the average pairwise confusion matrix (APCM) is obtained, we construct a dendrogram based on the single-link algorithm of hierarchical agglomerative clustering (HAC) [14, 15]. Single-link clustering, which merges the two clusters with the smallest minimum pairwise distance, is known to be confused by nearby overlapping clusters [14]. To construct our hierarchical classification structure, we adopt the single-link algorithm to merge two clusters by selecting the maximum average pairwise value in the confusion matrix. We first select the pair of classes with the maximum average pairwise value in APCM into the dendrogram, then select the next highest average pairwise value, and continue this process until all classes have been selected into the dendrogram. The final dendrogram is shown in Figure 6. For example, the average pairwise value between classes f and o is 21.5, the highest value in APCM, so class f and class o are selected as the first pair in the dendrogram. The second highest value is 19, the average pairwise value between classes h and m, so class h and class m are selected as the second pair. The third highest value is 15, between classes g and o. However, class o is already paired with class f; therefore, we take only class g and combine it with the class f and class o nodes. We perform this process for all remaining classes and finally obtain a complete dendrogram for constructing the hierarchical classification model. The hierarchical classification models are constructed bottom-up. With this hierarchical classification structure, classes with lower confusion values are classified before classes with higher confusion. The hierarchical classification model could thus help improve the performance of multi-class classification.

Figure 6. A hierarchy of the Wikipedia Selection for Schools
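The APCM transformation of Eqs. (1)-(2) and the greedy pairing that orders classes into the dendrogram can be sketched as follows. This is a simplified sketch of the procedure described above, not the authors' implementation; the 3-class confusion matrix is a made-up example rather than the one in Figure 4.

```python
def average_pairwise_confusion(cm):
    """Eqs. (1)-(2): w[a][p] = (v[a][p] + v[p][a]) / 2 for a != p,
    and 0 on the diagonal (correct predictions)."""
    m = len(cm)
    return [[0.0 if a == p else (cm[a][p] + cm[p][a]) / 2.0
             for p in range(m)]
            for a in range(m)]

def merge_sequence(apcm, labels):
    """Simplified sketch of the merge order: repeatedly take the
    highest remaining APCM value; if one class of the pair is already
    in the dendrogram, only the new class is attached to it."""
    m = len(labels)
    pairs = sorted(((apcm[a][p], a, p) for a in range(m)
                    for p in range(a + 1, m)), reverse=True)
    placed, order = set(), []
    for value, a, p in pairs:
        if a in placed and p in placed:
            continue  # both classes already in the dendrogram
        order.append((labels[a], labels[p], value))
        placed.update((a, p))
    return order

# toy 3-class confusion matrix (rows: actual, columns: predicted)
cm = [[49, 1, 5],
      [1, 40, 2],
      [3, 0, 60]]
apcm = average_pairwise_confusion(cm)
print(apcm)  # [[0.0, 1.0, 4.0], [1.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
print(merge_sequence(apcm, ["a", "b", "c"]))  # [('a', 'c', 4.0), ('b', 'c', 1.0)]
```

Classes a and c, which the flat classifier confuses most, are merged first and therefore sit deepest in the tree, so the easier separations are made at the upper levels of the hierarchy.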

IV. EXPERIMENTS AND DISCUSSION

A. Web page collection

In our experiments, we used a collection of articles obtained from the Wikipedia Selection for Schools, which is available from the SOS Children's Villages Web site(2). There are 15 categories: art, business studies, citizenship, countries, design and technology, everyday life, geography, history, IT, language and literature, mathematics, music, people, religion and science. The total number of articles is 4,625. Table II lists the first-level subject categories available from the collection. Organizing articles into the subject category set provides users a convenient way to access the articles on the same subject. Each article contains many hypertext links to other articles which are related to the current article.

TABLE II. THE SUBJECT CATEGORIES UNDER THE WIKIPEDIA SELECTION FOR SCHOOLS

Category                   No. of Articles
Art                        74
Business Studies           88
Citizenship                224
Countries                  220
Design and Technology      250
Everyday life              380
Geography                  650
History                    400
IT                         64
Language and literature    196
Mathematics                45
Music                      140
People                     680
Religion                   146
Science                    1068

B. Experiments

We used the LDA algorithm provided by the linguistic analysis tool LingPipe(3) to run our experiments. LingPipe is a suite of Java tools designed to perform linguistic analysis on natural language data. In this experiment, we applied the LDA algorithm provided under the LingPipe API and set the number of topics to 200 and the number of epochs to 2,000. For the text classification process, we used WEKA, an open-source machine learning tool, to perform the experiments.

C. Evaluation Metrics

The standard performance metrics for evaluating text classification used in the experiments are accuracy, precision, recall and the F1 measure [16]. We tested all algorithms by using 10-fold cross validation. Accuracy, precision, recall and the F1 measure are defined as:

Accuracy = (number of correctly classified test documents) / (total number of test documents)    (3)

Precision = (number of correct positive predictions) / (number of positive predictions)    (4)

Recall = (number of correct positive predictions) / (number of positive data)    (5)

F1 = (2 × Precision × Recall) / (Precision + Recall)    (6)

Accuracy represents the percentage of correct predictions among all predictions. Precision (P) is the percentage of the documents predicted for a given category that are classified correctly. Recall (R) is the percentage of the documents belonging to a given category that are classified correctly. The F1 measure is a single measure that combines precision and recall; it ranges from 0 to 1, and higher is better.

D. Experimental results

We started by evaluating the weight values of neighboring pages under Approach 3. Table III shows the results of combining the weight values of neighboring pages in our algorithm. For the SVM model, the best combination of neighboring pages, with an accuracy of 85.21% and an F1 measure of 85.01%, is obtained with the weights of parent, child and sibling pages equal to 0.4, 0.0 and 0.3, respectively; the HSVM model achieves its best combination, with an accuracy of 90.33% and an F1 measure of 90.14%, at the same weights as the SVM model. The results showed that using information from parent pages and sibling pages is more effective than using child pages for improving the performance of a classification model.

TABLE III. CLASSIFICATION RESULTS BY INTEGRATING NEIGHBORING PAGES

Models   W_p   W_c   W_s   P   R   F1   Accuracy (%)
SVM
HSVM

From Table IV, comparing the two classification models, the SVM model and the hierarchical SVM (HSVM), the approach of integrating the current page with the neighboring pages via the topic model (TOPIC_INTEGRATED) yielded a higher accuracy than applying the topic model on the current page (TOPIC_CURRENT) and applying the BOW model. For the SVM model, the TOPIC_INTEGRATED approach gives the highest accuracy, 85.21%; an improvement of 23.96% over the BOW model.
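Eqs. (3)-(6) can be checked with a small self-contained sketch; the labels and helper names below are illustrative, not from the paper's data.

```python
def evaluate(predicted, actual, positive):
    """Accuracy, and per-category precision and recall, Eqs. (3)-(5)."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    accuracy = correct / len(actual)
    # gold labels of the documents predicted as the positive category
    positive_predictions = [a for p, a in zip(predicted, actual) if p == positive]
    true_positives = sum(1 for a in positive_predictions if a == positive)
    precision = true_positives / len(positive_predictions) if positive_predictions else 0.0
    positives = sum(1 for a in actual if a == positive)
    recall = true_positives / positives if positives else 0.0
    return accuracy, precision, recall

def f1(precision, recall):
    """Eq. (6): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

predicted = ["art", "art", "music", "art"]
actual    = ["art", "music", "music", "art"]
acc, p, r = evaluate(predicted, actual, "art")
print(round(acc, 2), round(p, 2), round(r, 2))  # 0.75 0.67 1.0
print(round(f1(p, r), 2))                       # 0.8
```

The harmonic mean in Eq. (6) punishes imbalance: a classifier that predicts "art" for everything gets perfect recall but low precision, and its F1 drops accordingly.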
For the HSVM model, the TOPIC_INTEGRATED approach gives the highest accuracy, 90.33%; an improvement of 4.64% over the BOW model.

(2) SOS Children's Villages Web site: charity-news/wikipedia-for-schools.htm
(3) LingPipe

TABLE IV. EVALUATION RESULTS ON CLASSIFICATION MODELS BY USING THREE FEATURE REPRESENTATION APPROACHES

Feature Representation Approaches                  SVM Accuracy (%)   HSVM Accuracy (%)
1. BOW
2. TOPIC_CURRENT
3. TOPIC_INTEGRATED (W_p=0.4, W_c=0.0, W_s=0.3)

TABLE V. CLASSIFICATION RESULTS BASED ON THREE FEATURE REPRESENTATION APPROACHES

                                                   SVM                HSVM
Feature Representation Approaches                  P    R    F1       P    R    F1
1. BOW
2. TOPIC_CURRENT
3. TOPIC_INTEGRATED (W_p=0.4, W_c=0.0, W_s=0.3)

Table V shows the experimental results of the three feature representation approaches using the two classification models, the SVM model and the hierarchical SVM (HSVM) model. From this table, the approach of integrating the current page with neighboring pages via the topic model (TOPIC_INTEGRATED) yielded a higher performance than applying the topic model on the current page (TOPIC_CURRENT) and applying the BOW model. The HSVM classification model yielded a higher performance than the SVM classification model in all three feature representation approaches. For the SVM model, applying the TOPIC_CURRENT approach improved the performance over the BOW model by 17.2% based on the F1 measure, and applying the TOPIC_INTEGRATED approach yielded the best performance, with an F1 measure of 85.01%; an improvement of 23.81% over the BOW model. For the HSVM model, applying the TOPIC_CURRENT approach improved the performance over the BOW model by 3.88% based on the F1 measure, and applying the TOPIC_INTEGRATED approach yielded the best performance, with an F1 measure of 90.14%; an improvement of 5.11% over the BOW model. The combination of the TOPIC_INTEGRATED approach and the HSVM model thus yielded the best overall performance, with an F1 measure of 90.14%; an improvement of 5.13% over the TOPIC_INTEGRATED approach using the SVM model. Integrating the additional neighboring information, especially from the parent and sibling pages, via a topic model can therefore significantly improve the performance of a classification model. The reason is that parent pages often provide terms, such as those in anchor texts, which give additional descriptive information about the current page.

V. CONCLUSIONS

To improve the performance of Web page classification, we proposed a new hierarchical classification method based on a topic model, integrating additional term features obtained from the neighboring pages. We applied the topic model approach, based on the Latent Dirichlet Allocation algorithm, to cluster the term features into a set of latent topics; terms assigned to the same topic are semantically related. Our hierarchical classification method consists of two phases: (1) feature representation by using a topic model and integrating neighboring pages, and (2) a hierarchical Support Vector Machines (SVM) classification model constructed from a confusion matrix. From the experimental results, the approach of integrating the current page with the neighboring pages via the topic model yielded a higher performance than applying the topic model on the current page or applying the BOW model. The hierarchical SVM classification model yielded a higher performance than the flat SVM classification model in all three feature representation approaches, and integrating the current page with the neighboring pages via the topic model yielded the best performance, with an F1 measure of 90.14%; an improvement of 5.11% over the BOW model.
The approach of integrating the current page with the neighboring pages via the topic model, combined with the hierarchical SVM classification model, yielded the best performance, with an accuracy of 90.33% and an F1 measure of 90.14%; an improvement of 5.12% and 5.13%, respectively, over the original SVM model.

REFERENCES

[1] Y. Yang and J.O. Pedersen, "A Comparative Study on Feature Selection in Text Categorization," Proceedings of the 14th International Conference on Machine Learning, pp. 412-420, 1997.
[2] S.T. Dumais, J. Platt, D. Heckerman, and M. Sahami, "Inductive Learning Algorithms and Representations for Text Categorization," Proceedings of CIKM 1998.
[3] D.M. Blei, A.Y. Ng, and M.I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[4] X. Qi and B.D. Davison, "Classifiers Without Borders: Incorporating Fielded Text From Neighboring Web Pages," Proceedings of the 31st Annual International ACM SIGIR Conference on Research & Development on Information Retrieval, Singapore, 2008.

[5] G. Chen and B. Choi, "Web page genre classification," Proceedings of the 2008 ACM Symposium on Applied Computing, 2008.
[6] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," Proceedings of the European Conference on Machine Learning (ECML), Berlin, 1998.
[7] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[8] A. Sun, E.-P. Lim, and W.-K. Ng, "Web classification using support vector machine," Proceedings of the 4th International Workshop on Web Information and Data Management (WIDM), ACM Press, 2002.
[9] W. Sriurai, P. Meesad, and C. Haruechayasak, "A Topic-Model Based Feature Processing for Text Categorization," Proceedings of the 5th National Conference on Computer and Information Technology.
[10] S. Dumais and H. Chen, "Hierarchical classification of Web content," Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval, ACM Press, New York, 2000.
[11] L. Cai and T. Hofmann, "Hierarchical document categorization with support vector machines," Proceedings of CIKM 2004.
[12] M. Steyvers and T.L. Griffiths, "Probabilistic topic models," in T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning, Laurence Erlbaum, 2007.
[13] C. Haruechayasak and C. Damrongrat, "Article Recommendation Based on a Topic Model for Wikipedia Selection for Schools," Proceedings of the 11th International Conference on Asian Digital Libraries, 2008.
[14] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.
[15] G. Karypis, E. Han, and V. Kumar, "Chameleon: A hierarchical clustering algorithm using dynamic modeling," IEEE Computer, 32(8):68-75, 1999.
[16] H. Yu, J. Han, and K. Chen-Chuan Chang, "PEBL: Web Page Classification without Negative Examples," IEEE Transactions on Knowledge and Data Engineering, 16(1):70-81, 2004.

AUTHORS PROFILE

Wongkot Sriurai received the B.Sc. degree in Computer Science and the M.S. degree in Information Technology from Ubon Ratchathani University. Currently, she is a Ph.D. candidate in the Department of Information Technology at King Mongkut's University of Technology North Bangkok. Her current research interests include Web mining, information filtering and recommender systems.

Phayung Meesad received the B.S. degree from King Mongkut's University of Technology North Bangkok and the M.S. and Ph.D. degrees in Electrical Engineering from Oklahoma State University. His current research interests include fuzzy systems and neural networks, evolutionary computation and discrete control systems. Currently, he is an Assistant Professor in the Department of Teacher Training in Electrical Engineering at King Mongkut's University of Technology North Bangkok, Thailand.

Choochart Haruechayasak received the B.S. degree from the University of Rochester, the M.S. degree from the University of Southern California and the Ph.D. degree in Computer Engineering from the University of Miami. His current research interests include search technology, data/text/Web mining, information filtering and recommender systems. Currently, he is chief of the Intelligent Information Infrastructure Section under the Human Language Technology Laboratory (HLT) at the National Electronics and Computer Technology Center (NECTEC), Thailand.


More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Text Similarity Computing Based on LDA Topic Model and Word Co-occurrence

Text Similarity Computing Based on LDA Topic Model and Word Co-occurrence 2nd Internatonal Conference on Software Engneerng, Knowledge Engneerng and Informaton Engneerng (SEKEIE 204) Text Smlarty Computng Based on LDA Topc Model and Word Co-occurrence Mngla Shao School of Computer,

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Experiments in Text Categorization Using Term Selection by Distance to Transition Point

Experiments in Text Categorization Using Term Selection by Distance to Transition Point Experments n Text Categorzaton Usng Term Selecton by Dstance to Transton Pont Edgar Moyotl-Hernández, Héctor Jménez-Salazar Facultad de Cencas de la Computacón, B. Unversdad Autónoma de Puebla, 14 Sur

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 48 CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 3.1 INTRODUCTION The raw mcroarray data s bascally an mage wth dfferent colors ndcatng hybrdzaton (Xue

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Impact of a New Attribute Extraction Algorithm on Web Page Classification Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Face Recognition Based on SVM and 2DPCA

Face Recognition Based on SVM and 2DPCA Vol. 4, o. 3, September, 2011 Face Recognton Based on SVM and 2DPCA Tha Hoang Le, Len Bu Faculty of Informaton Technology, HCMC Unversty of Scence Faculty of Informaton Scences and Engneerng, Unversty

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Query classification using topic models and support vector machine

Query classification using topic models and support vector machine Query classfcaton usng topc models and support vector machne Deu-Thu Le Unversty of Trento, Italy deuthu.le@ds.untn.t Raffaella Bernard Unversty of Trento, Italy bernard@ds.untn.t Abstract Ths paper descrbes

More information

Spam Filtering Based on Support Vector Machines with Taguchi Method for Parameter Selection

Spam Filtering Based on Support Vector Machines with Taguchi Method for Parameter Selection E-mal Spam Flterng Based on Support Vector Machnes wth Taguch Method for Parameter Selecton We-Chh Hsu, Tsan-Yng Yu E-mal Spam Flterng Based on Support Vector Machnes wth Taguch Method for Parameter Selecton

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

A Novel Term_Class Relevance Measure for Text Categorization

A Novel Term_Class Relevance Measure for Text Categorization A Novel Term_Class Relevance Measure for Text Categorzaton D S Guru, Mahamad Suhl Department of Studes n Computer Scence, Unversty of Mysore, Mysore, Inda Abstract: In ths paper, we ntroduce a new measure

More information

Feature Selection as an Improving Step for Decision Tree Construction

Feature Selection as an Improving Step for Decision Tree Construction 2009 Internatonal Conference on Machne Learnng and Computng IPCSIT vol.3 (2011) (2011) IACSIT Press, Sngapore Feature Selecton as an Improvng Step for Decson Tree Constructon Mahd Esmael 1, Fazekas Gabor

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images Internatonal Journal of Informaton and Electroncs Engneerng Vol. 5 No. 6 November 015 Usng Fuzzy Logc to Enhance the Large Sze Remote Sensng Images Trung Nguyen Tu Huy Ngo Hoang and Thoa Vu Van Abstract

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

An Evolvable Clustering Based Algorithm to Learn Distance Function for Supervised Environment

An Evolvable Clustering Based Algorithm to Learn Distance Function for Supervised Environment IJCSI Internatonal Journal of Computer Scence Issues, Vol. 7, Issue 5, September 2010 ISSN (Onlne): 1694-0814 www.ijcsi.org 374 An Evolvable Clusterng Based Algorthm to Learn Dstance Functon for Supervsed

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

CUM: An Efficient Framework for Mining Concept Units

CUM: An Efficient Framework for Mining Concept Units CUM: An Effcent Framework for Mnng Concept Unts P.Santh Thlagam Ananthanarayana V.S Department of Informaton Technology Natonal Insttute of Technology Karnataka - Surathkal Inda 575025 santh_soc@yahoo.co.n,

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Classification / Regression Support Vector Machines

Classification / Regression Support Vector Machines Classfcaton / Regresson Support Vector Machnes Jeff Howbert Introducton to Machne Learnng Wnter 04 Topcs SVM classfers for lnearly separable classes SVM classfers for non-lnearly separable classes SVM

More information

Efficient Segmentation and Classification of Remote Sensing Image Using Local Self Similarity

Efficient Segmentation and Classification of Remote Sensing Image Using Local Self Similarity ISSN(Onlne): 2320-9801 ISSN (Prnt): 2320-9798 Internatonal Journal of Innovatve Research n Computer and Communcaton Engneerng (An ISO 3297: 2007 Certfed Organzaton) Vol.2, Specal Issue 1, March 2014 Proceedngs

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

User Tweets based Genre Prediction and Movie Recommendation using LSI and SVD

User Tweets based Genre Prediction and Movie Recommendation using LSI and SVD User Tweets based Genre Predcton and Move Recommendaton usng LSI and SVD Saksh Bansal, Chetna Gupta Department of CSE/IT Jaypee Insttute of Informaton Technology,sec-62 Noda, Inda sakshbansal76@gmal.com,

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

A User Selection Method in Advertising System

A User Selection Method in Advertising System Int. J. Communcatons, etwork and System Scences, 2010, 3, 54-58 do:10.4236/jcns.2010.31007 Publshed Onlne January 2010 (http://www.scrp.org/journal/jcns/). A User Selecton Method n Advertsng System Shy

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Document Representation and Clustering with WordNet Based Similarity Rough Set Model

Document Representation and Clustering with WordNet Based Similarity Rough Set Model IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 5, No 3, September 20 ISSN (Onlne): 694-084 www.ijcsi.org Document Representaton and Clusterng wth WordNet Based Smlarty Rough Set Model

More information

Collaborative Topic Regression with Multiple Graphs Factorization for Recommendation in Social Media

Collaborative Topic Regression with Multiple Graphs Factorization for Recommendation in Social Media Collaboratve Topc Regresson wth Multple Graphs Factorzaton for Recommendaton n Socal Meda Qng Zhang Key Laboratory of Computatonal Lngustcs (Pekng Unversty) Mnstry of Educaton, Chna zqcl@pku.edu.cn Houfeng

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

CLASSIFICATION OF ULTRASONIC SIGNALS

CLASSIFICATION OF ULTRASONIC SIGNALS The 8 th Internatonal Conference of the Slovenan Socety for Non-Destructve Testng»Applcaton of Contemporary Non-Destructve Testng n Engneerng«September -3, 5, Portorož, Slovena, pp. 7-33 CLASSIFICATION

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

An Anti-Noise Text Categorization Method based on Support Vector Machines *

An Anti-Noise Text Categorization Method based on Support Vector Machines * An Ant-Nose Text ategorzaton Method based on Support Vector Machnes * hen Ln, Huang Je and Gong Zheng-Hu School of omputer Scence, Natonal Unversty of Defense Technology, hangsha, 410073, hna chenln@nudt.edu.cn,

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples 94 JOURNAL OF COMPUTERS, VOL. 4, NO. 1, JANUARY 2009 Relable Negatve Extractng Based on knn for Learnng from Postve and Unlabeled Examples Bangzuo Zhang College of Computer Scence and Technology, Jln Unversty,

More information

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc.

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 97-735 Volume Issue 9 BoTechnology An Indan Journal FULL PAPER BTAIJ, (9), [333-3] Matlab mult-dmensonal model-based - 3 Chnese football assocaton super league

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

ISSN: International Journal of Engineering and Innovative Technology (IJEIT) Volume 1, Issue 4, April 2012

ISSN: International Journal of Engineering and Innovative Technology (IJEIT) Volume 1, Issue 4, April 2012 Performance Evoluton of Dfferent Codng Methods wth β - densty Decodng Usng Error Correctng Output Code Based on Multclass Classfcaton Devangn Dave, M. Samvatsar, P. K. Bhanoda Abstract A common way to

More information

Efficient Text Classification by Weighted Proximal SVM *

Efficient Text Classification by Weighted Proximal SVM * Effcent ext Classfcaton by Weghted Proxmal SVM * Dong Zhuang 1, Benyu Zhang, Qang Yang 3, Jun Yan 4, Zheng Chen, Yng Chen 1 1 Computer Scence and Engneerng, Bejng Insttute of echnology, Bejng 100081, Chna

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

An Evaluation of Divide-and-Combine Strategies for Image Categorization by Multi-Class Support Vector Machines

An Evaluation of Divide-and-Combine Strategies for Image Categorization by Multi-Class Support Vector Machines An Evaluaton of Dvde-and-Combne Strateges for Image Categorzaton by Mult-Class Support Vector Machnes C. Demrkesen¹ and H. Cherf¹, ² 1: Insttue of Scence and Engneerng 2: Faculté des Scences Mrande Galatasaray

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

Enhanced Watermarking Technique for Color Images using Visual Cryptography

Enhanced Watermarking Technique for Color Images using Visual Cryptography Informaton Assurance and Securty Letters 1 (2010) 024-028 Enhanced Watermarkng Technque for Color Images usng Vsual Cryptography Enas F. Al rawashdeh 1, Rawan I.Zaghloul 2 1 Balqa Appled Unversty, MIS

More information

Clustering of Words Based on Relative Contribution for Text Categorization

Clustering of Words Based on Relative Contribution for Text Categorization Clusterng of Words Based on Relatve Contrbuton for Text Categorzaton Je-Mng Yang, Zh-Yng Lu, Zhao-Yang Qu Abstract Term clusterng tres to group words based on the smlarty crteron between words, so that

More information

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson

More information

A Hidden Markov Model Variant for Sequence Classification

A Hidden Markov Model Variant for Sequence Classification Proceedngs of the Twenty-Second Internatonal Jont Conference on Artfcal Intellgence A Hdden Markov Model Varant for Sequence Classfcaton Sam Blasak and Huzefa Rangwala Computer Scence, George Mason Unversty

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Learning Semantics-Preserving Distance Metrics for Clustering Graphical Data

Learning Semantics-Preserving Distance Metrics for Clustering Graphical Data Learnng Semantcs-Preservng Dstance Metrcs for Clusterng Graphcal Data Aparna S. Varde, Elke A. Rundenstener Carolna Ruz Mohammed Manruzzaman,3 Rchard D. Ssson Jr.,3 Department of Computer Scence Center

Unsupervised Learning

Pattern Recognition Lecture 8. Outline: Introduction; Unsupervised Learning; Parametric vs. Non-Parametric Approach; Mixture of Densities; Maximum-Likelihood Estimates; Clustering. Prof. Daniel Yeung, School of Computer Science and

Data Mining For Multi-Criteria Energy Predictions

Kashif Gill and Dennis Moon. Abstract: We present a data mining technique for multi-criteria predictions of wind energy. A multi-criteria (MC) evolutionary computing method has

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Laila Khreisat. Dept. of Computer Science, Math and Physics, Fairleigh Dickinson University, 285 Madison Ave, Madison NJ 07940. Khresat@fdu.edu

Edge Detection in Noisy Images Using the Support Vector Machines

Hilario Gómez-Moreno, Saturnino Maldonado-Bascón, Francisco López-Ferreras. Signal Theory and Communications Department, University of Alcalá, Crta. Madrid-Barcelona

Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier

Saket S.R. Mengle. Information Retrieval Lab, Computer Science Department, Illinois Institute of Technology, Chicago, Illinois, U.S.A.

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies

Dikan Xing, Gui-Rong Xue, Qiang Yang, Yong Yu. Shanghai Jiao Tong University, Shanghai, China. {xaobao,grxue,yyu}@apex.sjtu.edu.cn

GA-Based Learning Algorithms to Identify Fuzzy Rules for Fuzzy Neural Networks

Seventh International Conference on Intelligent Systems Design and Applications. K. Almejalli, K. Dahal, Member, IEEE, and A. Hossain, Member

Vehicle Fault Diagnostics Using Text Mining, Vehicle Engineering Structure and Machine Learning

International Journal of Intelligent Information Systems 2015; 4(3): 58-70. Published online July 8, 2015 (http://www.sciencepublishinggroup.com//s). doi: 0.648/.s.2050403.2. ISSN: 2328-7675 (Print); ISSN: 2328-7683

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and Accessibility (Gravity with Competition) Indices

I. Dissimilarity Index Measurement: The following formula can be used to measure the evenness between

Feature-Based Matrix Factorization

arXiv:1109.2271v3 [cs.AI] 29 Dec 2011. Tianqi Chen, Zhao Zheng, Qiuxia Liu, Weinan Zhang, Yong Yu. {tqchen,zhengzhao,luquxa,wnzhang,yyu}@apex.stu.edu.cn. Apex Data & Knowledge Management

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures

Fahuan Hu, Guoping Liu*, Zengwen Dong. 1. School of Mechanical & Electrical Engineering, Nanchang University, Nanchang, 330031, China; 2. School

SRBIR: Semantic Region Based Image Retrieval by Extracting the Dominant Region and Semantic Learning

Journal of Computer Science 7 (3): 400-408, 2011. ISSN 1549-3636. © 2011 Science Publications. I. Felci Rajam and S.
