Keywords - Web page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines
(IJCSIS) International Journal of Computer Science and Information Security

Hierarchical Web Page Classification Based on a Topic Model and Neighboring Pages Integration

Wongkot Sriurai, Department of Information Technology, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand; Phayung Meesad, Department of Teacher Training in Electrical Engineering, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand; Choochart Haruechaiyasak, Human Language Technology Laboratory, National Electronics and Computer Technology Center (NECTEC), Bangkok, Thailand.

Abstract - Most Web page classification models typically apply the bag of words (BOW) model to represent the feature space. The original BOW representation, however, is unable to recognize semantic relationships between terms. One possible solution is to apply the topic model approach based on the Latent Dirichlet Allocation algorithm to cluster the term features into a set of latent topics. Terms assigned to the same topic are semantically related. In this paper, we propose a novel hierarchical classification method based on a topic model and on integrating additional term features from neighboring pages. Our hierarchical classification method consists of two phases: (1) feature representation by using a topic model and integrating neighboring pages, and (2) a hierarchical Support Vector Machines (SVM) classification model constructed from a confusion matrix. From the experimental results, the approach of using the proposed hierarchical SVM model and integrating the current page with neighboring pages via the topic model yielded the best performance, with an accuracy of 90.33% and an F1 measure of 90.14%; an improvement of 5.12% and 5.13% over the original SVM model, respectively.

Keywords - Web page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

I. INTRODUCTION

Due to the rapid growth of Web documents (e.g., Web pages, blogs, emails) on the World Wide Web (WWW), Web page classification has become one of the key techniques for managing and organizing those documents, e.g., for document filtering in information retrieval. Generally, Web page classification applies the techniques of text categorization, which use supervised machine learning algorithms for learning the classification model [1, 2]. Most previous works on Web page classification typically applied the bag of words (BOW) model to represent the feature space. Under the BOW model, a Web page is represented by a vector in which each dimension contains a weight value (e.g., frequency) of a word (or term) occurring in the page. The original BOW representation, however, is unable to recognize synonyms within a given word set. As a result, the performance of a classification model using the BOW representation can deteriorate. In this paper, we apply a topic model to represent the feature space for learning the Web page classification model. Under the topic model concept, words (or terms) that are statistically dependent are clustered into the same topics. Given a set of documents D consisting of a set of terms (or words) W, a topic model generates a set of latent topics T based on a statistical inference over the term set W. In this paper, we apply the Latent Dirichlet Allocation (LDA) [3] algorithm to generate a probabilistic topic model from a Web page collection. A topic model can help capture the hypernyms, hyponyms and synonyms of a given word. For example, the words vehicle (hypernym) and automobile (hyponym) would be clustered into the same topic, and the words film and movie (synonyms) would likewise be clustered into the same topic. The topic model helps improve the performance of a classification model by (1) reducing the number of feature dimensions and (2) mapping semantically related terms into the same feature dimension.
In addition to the topic model, our proposed method integrates additional term features from neighboring pages (i.e., parent, child and sibling pages). Using additional terms from neighboring pages can provide more evidence for learning the classification model [4, 5]. We used Support Vector Machines (SVM) [6, 7] as the classification algorithm. SVM has been successfully applied to text categorization tasks [6, 7, 8, 9]. SVM is based on the structural risk minimization principle from computational learning theory. The algorithm addresses the general problem of learning to discriminate between positive and negative members of a given class of n-dimensional vectors. However, the SVM classifier is designed to solve only the binary classification problem [7]. To manage the multi-class classification problem, many studies have proposed hierarchical classification methods. For example, Dumais and Chen proposed a hierarchical method using the SVM classifier for classifying a large, heterogeneous collection of Web content; their study showed that the hierarchical method outperforms the flat method [10]. Cai and Hofmann proposed a hierarchical classification method that generalizes SVM based on discriminant functions structured in a way that mirrors the class hierarchy; their study showed that the hierarchical SVM method outperforms the flat SVM method [11]. Most of the related work presented hierarchical classification methods using different approaches; in previous works, however, the bag of words (BOW) model was used to represent the feature space. In this paper, we propose a new hierarchical classification method using a topic model and integrating neighboring pages. Our hierarchical classification method consists of two phases: (1) feature representation and (2) learning the classification model. We evaluated three different feature representations: (1) applying the simple BOW model on the current page, (2) applying the topic model on the current page, and (3) integrating the neighboring pages via the topic model. To construct a hierarchical classification model, we use the class relationships obtained from a confusion matrix of the flat SVM classification model. The experimental results showed that by integrating the additional neighboring information via a topic model, the classification performance under the F1 measure was significantly improved over the simple BOW model. In addition, our proposed hierarchical classification method yielded a better performance compared to the flat SVM classification method. The rest of this paper is organized as follows. In the next section we provide a brief review of Latent Dirichlet Allocation (LDA). Section 3 presents the proposed framework of hierarchical classification via the topic model and neighboring pages integration. Section 4 presents the experiments with a discussion of the results. In Section 5, we conclude the paper.

II. A REVIEW OF LATENT DIRICHLET ALLOCATION

Latent Dirichlet Allocation (LDA) has been introduced as a generative probabilistic model for a set of documents [3, 12]. The basic idea behind this approach is that documents are represented as random mixtures over latent topics. Each topic is represented by a probability distribution over the terms.
Each article is represented by a probability distribution over the topics. LDA has also been applied for the identification of topics in a number of different areas such as classification, collaborative filtering [3] and content-based filtering [13]. Generally, an LDA model can be represented as a probabilistic graphical model as shown in Figure 1 [3]. There are three levels to the LDA representation. The variables α and β are corpus-level parameters, assumed to be sampled during the process of generating a corpus: α is the parameter of the uniform Dirichlet prior on the per-document topic distributions, and β is the parameter of the uniform Dirichlet prior on the per-topic word distribution. θ is a document-level variable, sampled once per document. Finally, the variables z and w are word-level variables, sampled once for each word in each document. The variable N is the number of word tokens in a document and M is the number of documents.

Figure 1. The Latent Dirichlet Allocation (LDA) model

The LDA model [3] introduces a set of K latent variables, called topics. Each word in the document is assumed to be generated by one of the topics. The generative process for each document w can be described as follows:

1. Choose θ ~ Dir(α): choose a latent topic mixture vector θ from the Dirichlet distribution.
2. For each word w_n in W:
   (a) Choose a topic z_n ~ Multinomial(θ): choose a latent topic z_n from the multinomial distribution.
   (b) Choose a word w_n from P(w_n | z_n, β), a multinomial probability conditioned on the topic z_n.

III. THE PROPOSED HIERARCHICAL CLASSIFICATION FRAMEWORK

Figure 2 illustrates the proposed hierarchical classification framework, which consists of two phases: (1) feature representation for learning the Web page classification models, and (2) learning classification models based on Support Vector Machines (SVM).
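As an illustration of the generative process reviewed in Section II, the following simulates it directly (this is the forward sampling process, not the posterior inference LDA actually performs on a corpus); the topic count, vocabulary size and Dirichlet parameters are invented for the example:

```python
import numpy as np

def generate_document(alpha, beta, n_words, rng):
    """Simulate LDA's generative process for one document.

    alpha : Dirichlet parameter, shape (K,)        -- per-document topic prior
    beta  : topic-word distributions, shape (K, V) -- each row sums to 1
    Returns the list of sampled word indices.
    """
    K, V = beta.shape
    theta = rng.dirichlet(alpha)          # 1. theta ~ Dir(alpha)
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)        # 2a. z_n ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])      # 2b. w_n from P(w | z_n, beta)
        words.append(w)
    return words

rng = np.random.default_rng(0)
K, V = 3, 10                              # assumed: 3 topics, 10-word vocabulary
alpha = np.full(K, 0.1)                   # symmetric Dirichlet prior
beta = rng.dirichlet(np.ones(V), size=K)  # random topic-word distributions
doc = generate_document(alpha, beta, n_words=50, rng=rng)
```

A small α concentrates each document on a few topics, which is the behavior the paper relies on when it uses the document-topic distribution as a compact feature vector.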
In our proposed framework, we evaluated three different feature representations: (1) applying the simple BOW model on the current page, (2) applying the topic model on the current page, and (3) integrating the neighboring pages via the topic model. After the feature representation process, we use the class relationships obtained from a confusion matrix of the flat SVM classification model for building the new hierarchical classification method.

A. Feature Representation

The feature representation process can be explained in detail as follows.

Approach 1 (BOW): A Web page collection consists of an article collection, which is a set of m documents denoted by D = {D_0, ..., D_{m-1}}. Text processing is applied to extract terms, yielding a set of terms W = {W_0, ..., W_{k-1}}, where k is the total number of terms. Each term is assigned a weight w equal to its term frequency. The set of terms is then filtered by using the information gain (IG) feature selection technique [1]. Once the term features are obtained, we apply Support Vector Machines (SVM) to learn the classification model. The model is then used to evaluate the performance of category prediction.
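As a sketch of Approach 1, the snippet below builds term-frequency features and ranks terms by information gain on a tiny hypothetical corpus; the documents, labels and helper names are invented for the example, and documents are reduced to term sets (presence/absence) for brevity:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a class-label list."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(term, docs, labels):
    """IG of a term: H(C) minus the class entropy remaining after
    splitting documents on presence/absence of the term."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    cond = sum(len(part) / n * entropy(part)
               for part in (present, absent) if part)
    return entropy(labels) - cond

# Toy corpus (hypothetical): each document is a set of terms.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"market", "trade"}]
labels = ["sport", "sport", "finance", "finance"]

vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda t: information_gain(t, docs, labels), reverse=True)
top_k = ranked[:3]                              # keep the k most informative terms
tf = [Counter(docs[0])[t] for t in top_k]       # term-frequency vector for one page
```

Terms that perfectly separate the classes (here "goal" and "market") get the maximum information gain of 1 bit and survive the filtering step; the resulting vectors are what the SVM is trained on.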
Approach 2 (TOPIC_CURRENT): Given a Web page collection consisting of an article collection, i.e., a set of m documents denoted by D = {D_0, ..., D_{m-1}}, text processing is applied to extract terms. The set of topics is then generated by using the topic model based on the LDA algorithm. The LDA algorithm generates a set of n topics denoted by T = {T_0, ..., T_{n-1}}. Each topic T_i is a probability distribution over p words, denoted by T_i = [w_{i0}, ..., w_{i(p-1)}], where w_{ij} is the probabilistic value of word j assigned to topic i. Based on this topic model, each document D_i can be represented as a probability distribution over the topic set T, i.e., D_i = [t_{i0}, ..., t_{i(n-1)}], where t_{ij} is the probabilistic value of topic j assigned to document i. The output from this step is the topic probability representation for each article. Support Vector Machines (SVM) are again used to learn the classification model.

Approach 3 (TOPIC_INTEGRATED): The main difference of this approach from Approach 2 is that we integrate additional term features obtained from the neighboring pages to improve the performance of Web page classification. The process of integrating the neighboring pages is explained as follows. Figure 3 shows the three types of neighboring pages: parent, child and sibling pages. Given a Web page (i.e., the current page), there are typically incoming links from parent pages, outgoing links to child pages, and links from its parent pages to sibling pages. Parent, child and sibling pages are collectively referred to as the neighboring pages. Using the additional terms from the neighboring pages can provide more evidence for learning the classification model. In this paper, we vary the weight value of the neighboring pages from zero to one; a weight value of zero means the neighboring pages are not included in the feature representation. Under this approach, terms from the different page types (i.e., current, parent, child and sibling) are first transformed into a set of n topics (denoted by T = {T_0, ..., T_{n-1}}) by using the LDA algorithm.
The weight values from 0 to 1 are then multiplied into the topic dimensions T of the parent, child and sibling pages. The combined topic feature vector, integrating the neighboring topic vectors with adjusted weight values, can be computed by using the algorithm listed in Table 1.

Figure 2. The proposed hierarchical classification framework
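In code, the integration of neighboring topic vectors with adjusted weights (the INP algorithm of Table 1) reduces to a weighted element-wise sum of the four document-topic matrices. The sketch below assumes the matrices are nested lists of identical shape; the example values are invented, and the weights (0.4, 0.0, 0.3) are the paper's best-performing setting:

```python
def integrate_neighboring_pages(cur_dt, p_dt, c_dt, s_dt, w_p, w_c, w_s):
    """INP: combine the current, parent, child and sibling document-topic
    matrices into one integrated matrix by a weighted element-wise sum."""
    idt = []
    for i, row in enumerate(cur_dt):          # for all documents d_i
        idt_row = []
        for j, cur in enumerate(row):         # for all topics t_j
            pp = p_dt[i][j] * w_p             # weighted parent P-value
            pc = c_dt[i][j] * w_c             # weighted child P-value
            ps = s_dt[i][j] * w_s             # weighted sibling P-value
            idt_row.append(cur + pp + pc + ps)
        idt.append(idt_row)
    return idt

# Two documents, three topics (hypothetical values).
cur = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
par = [[0.5, 0.4, 0.1], [0.1, 0.6, 0.3]]
chi = [[0.3, 0.3, 0.4], [0.4, 0.4, 0.2]]
sib = [[0.7, 0.2, 0.1], [0.2, 0.2, 0.6]]
idt = integrate_neighboring_pages(cur, par, chi, sib, w_p=0.4, w_c=0.0, w_s=0.3)
```

With w_c = 0.0 the child matrix contributes nothing, matching the paper's finding that parent and sibling pages are the useful neighbors.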
Figure 3. A current Web page with three types of neighboring pages

The INP algorithm that we present in this paper incorporates term features obtained from the neighboring pages (i.e., parent, child and sibling pages) into the classification model. Using additional terms from the neighboring pages can provide more evidence for learning the classification model. In this algorithm, we propose a function for varying the weight values of terms from parent pages (PDT), child pages (CDT) and sibling pages (SDT). The probability values from all neighboring pages are integrated with the current page (CurDT) to form a new integrated matrix (IDT). The algorithm begins with the results from the LDA model, that is, the document-topic matrices from all page types. It gathers data from the document-topic matrices (CurDT, PDT, CDT, SDT) using the getPValue function. All P-values of the document-topic matrices are then multiplied by the weight value of each matrix, except for the current page matrix. Finally, all P-values from the four matrices are summed and stored in IDT using the setPValue function. After the integrating process, we use the IDT matrix for learning the classification model.

TABLE I. THE INTEGRATING NEIGHBORING PAGES (INP) ALGORITHM

Algorithm: INP
Input: CurDT, PDT, CDT, SDT, Wp, Wc, Ws
for all d_i in CurDT do
    for all t_j in CurDT do
        Cur ← getPValue(CurDT, i, j)
        PP ← getPValue(PDT, i, j) * Wp
        PC ← getPValue(CDT, i, j) * Wc
        PS ← getPValue(SDT, i, j) * Ws
        setPValue(IDT, Cur + PP + PC + PS, i, j)
    end for
end for
return IDT

Parameters and variables:
CurDT : document-topic matrix from the current page
PDT : document-topic matrix from parent pages
CDT : document-topic matrix from child pages
SDT : document-topic matrix from sibling pages
IDT : integrated document-topic matrix
PP : P-value from PDT at a specific index
PC : P-value from CDT at a specific index
PS : P-value from SDT at a specific index
Wp : weight value for parent pages, 0.0 ≤ Wp ≤ 1.0
Wc : weight value for child pages, 0.0 ≤ Wc ≤ 1.0
Ws : weight value for sibling pages, 0.0 ≤ Ws ≤ 1.0
P-value : probability value
getPValue(M, r, c) : returns the P-value at row r and column c of matrix M
setPValue(M, p, r, c) : sets the P-value at row r, column c of matrix M to value p

B. Classification Model

The three different feature representation approaches are used as input to the classifiers. In this paper, we propose two methods for building the classification models: (1) Model 1, where we adopt the SVM to classify the features, and (2) Model 2, where we present a new hierarchical classification method that uses the class relationships obtained from a confusion matrix for learning the classification model. Each method is described in detail as follows.

Model 1 (SVM): We used the SVM for learning a classification model. SVM is the machine learning algorithm proposed by Vapnik [7]. The algorithm constructs a maximum margin hyperplane which separates a set of positive examples from a set of negative examples. When the examples are not linearly separable, SVM uses a kernel function to map the examples from the input space into a high-dimensional feature space; using a kernel function can solve the non-linear problem. In our experiments, we used a polynomial kernel. We implemented the SVM classifier by using the WEKA 1 library.

Model 2 (HSVM): The proposed method is based on the SVM classifier and uses the class relationships obtained from a confusion matrix for building a hierarchical SVM (HSVM). A confusion matrix shows the number of correct and incorrect predictions made by the model compared with the actual classifications of the test data. The size of the confusion matrix is m-by-m, where m is the number of classes. Figure 4 shows an example of a confusion matrix from Approach 3, built on a collection of articles obtained from the Wikipedia Selection for Schools. In a confusion matrix, each row corresponds to an actual class and each column corresponds to a predicted class.
In this example, for the class art, the model makes correct predictions for 49 instances, and incorrect predictions into class citizenship (c) for 1 instance and into class design and technology (e) for 5 instances.

1 WEKA
Figure 4. A confusion matrix of the Wikipedia Selection for Schools

We used the confusion matrix for constructing a hierarchical structure. First, we transform the confusion matrix into a new symmetric matrix, called the average pairwise confusion matrix (APCM), by computing the average values of the pairwise relationships between classes in the confusion matrix (CM). The process of transforming CM into APCM can be explained as follows. Given a confusion matrix CM = [v_{a,p}], where a denotes each row corresponding to the actual classes and p denotes each column corresponding to the predicted classes: for a correct prediction, i.e., a equals p in CM, we set the value in APCM to 0; if a is not equal to p, i.e., an incorrect prediction, we compute the average of v_{a,p} and v_{p,a} as the pairwise confusion value at this position. We apply this calculation for every row and column. For example, in Figure 4, v_{0,0} = 49; since a equals p (a correct prediction), the corresponding value is set to 0 in APCM. For v_{0,2} = 1, where a = 0 and p = 2 (a is not equal to p), the average pairwise confusion value of v_{0,2} and v_{2,0} is equal to 1. The final result of the average pairwise confusion matrix computation is shown in Figure 5. The computation of an average pairwise value is summarized by the following equations:

w_{a,p} = (v_{a,p} + v_{p,a}) / 2, if a ≠ p    (1)

w_{a,p} = 0, if a = p    (2)

where w_{a,p} is the value of the average pairwise confusion matrix (APCM) at row a and column p, and v_{a,p} is the value of the confusion matrix (CM) at row a and column p.

Figure 5. An average pairwise confusion matrix of the Wikipedia Selection for Schools

Once the average pairwise confusion matrix (APCM) is obtained, we construct a dendrogram based on the single-link algorithm of hierarchical agglomerative clustering (HAC) [14, 15]. Single-link clustering is known to be confused by nearby overlapping clusters, since it merges the two clusters with the smallest minimum pairwise distance [14]. To construct our hierarchical classification structure, we adopt the single-link algorithm to merge two clusters by selecting the maximum average pairwise value in the confusion matrix. We first select the pair of classes with the maximum average pairwise value in APCM into a dendrogram, then select the next highest average pairwise value, and continue this process until all classes are selected into the dendrogram. The final dendrogram is shown in Figure 6. For example, the average pairwise value between classes f and o is 21.5, the highest value in APCM, therefore class f and class o are selected as the first pair in the dendrogram. The second highest value is 19, the average pairwise value between classes h and m, therefore class h and class m are selected as the second pair. The third highest value is 15, between classes g and o. However, class o is already paired with class f; therefore, we take only class g and combine it with the class f and class o nodes. We perform this process for all remaining classes and finally obtain a complete dendrogram for constructing the hierarchical classification model. The hierarchical classification models are constructed bottom-up. With this hierarchical classification structure, classes with lower confusion values are classified before classes with higher confusion. The hierarchical classification model can thus help improve the performance of the multi-class classification method.

Figure 6. A hierarchy of the Wikipedia Selection for Schools
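The APCM computation of equations (1) and (2), together with the greedy selection of the most-confused class pair that starts the dendrogram, can be sketched as follows; the 3-class confusion matrix is hypothetical:

```python
def average_pairwise_confusion(cm):
    """Build the APCM from a confusion matrix: off-diagonal entries are
    the average of the two symmetric counts, diagonal entries are zero."""
    m = len(cm)
    return [[0.0 if a == p else (cm[a][p] + cm[p][a]) / 2
             for p in range(m)] for a in range(m)]

def max_confusion_pair(apcm):
    """Greedy step of the dendrogram construction: return the pair of
    distinct classes with the highest average pairwise confusion."""
    m = len(apcm)
    return max(((a, p) for a in range(m) for p in range(a + 1, m)),
               key=lambda ap: apcm[ap[0]][ap[1]])

# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
cm = [[49, 1, 5],
      [1, 40, 2],
      [3, 0, 38]]
apcm = average_pairwise_confusion(cm)
pair = max_confusion_pair(apcm)   # the two classes merged first
```

Here classes 0 and 2 are confused most often (5 and 3 misclassifications averaging to 4.0), so they would be merged first when building the hierarchy.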
IV. EXPERIMENTS AND DISCUSSION

A. Web page collection

In our experiments, we used a collection of articles obtained from the Wikipedia Selection for Schools, which is available from the SOS Children's Villages Web site 2. There are 15 categories: art, business studies, citizenship, countries, design and technology, everyday life, geography, history, IT, language and literature, mathematics, music, people, religion and science. The total number of articles is 4,625. Table 2 lists the first-level subject categories available from the collection. Organizing articles into the subject category set provides users a convenient way to access articles on the same subject. Each article contains many hypertext links to other articles related to the current article.

TABLE II. THE SUBJECT CATEGORIES UNDER THE WIKIPEDIA SELECTION FOR SCHOOLS

Category: No. of Articles
Art: 74
Business Studies: 88
Citizenship: 224
Countries: 220
Design and Technology: 250
Everyday life: 380
Geography: 650
History: 400
IT: 64
Language and literature: 196
Mathematics: 45
Music: 140
People: 680
Religion: 146
Science: 1,068

B. Experiments

We used the LDA algorithm provided by the linguistic analysis tool called LingPipe 3 to run our experiments. LingPipe is a suite of Java tools designed to perform linguistic analysis on natural language data. In this experiment, we applied the LDA algorithm provided under the LingPipe API and set the number of topics to 200 and the number of epochs to 2,000. For the text classification process, we used WEKA, an open-source machine learning tool, to perform the experiments.

C. Evaluation Metrics

The standard performance metrics for evaluating text classification used in the experiments are accuracy, precision, recall and the F1 measure [16]. We tested all algorithms by using 10-fold cross validation. Accuracy, precision, recall and F1 measure are defined as:

Accuracy = (number of correctly classified test documents) / (total number of test documents)    (3)

Precision = (number of correct positive predictions) / (number of positive predictions)    (4)

Recall = (number of correct positive predictions) / (number of positive data)    (5)

F1 = 2 × (precision × recall) / (precision + recall)    (6)

where Accuracy represents the percentage of correct predictions among all predictions, Precision (P) is the percentage of the predicted documents for a given category that are classified correctly, Recall (R) is the percentage of the documents belonging to a given category that are classified correctly, and the F1 measure is a single measure that combines precision and recall; it ranges from 0 to 1, and higher is better.

D. Experimental results

We started by evaluating the weight values of neighboring pages under Approach 3. Table 3 shows the results of combining the weight values of neighboring pages in our algorithm. For the SVM model, the best combination of neighboring pages, with an accuracy of 85.21% and an F1 measure of 85.01%, was obtained with weights of parent, child and sibling pages equal to 0.4, 0.0 and 0.3, respectively; the HSVM model achieved its best combination of neighboring pages, with an accuracy of 90.33% and an F1 measure of 90.14%, at the same weights as the SVM model. The results showed that using information from parent pages and sibling pages is more effective than using child pages for improving the performance of a classification model.

TABLE III. CLASSIFICATION RESULTS BY INTEGRATING NEIGHBORING PAGES
(columns: Models, Wp, Wc, Ws, P, R, F1, Accuracy (%); rows: SVM, HSVM)

From Table 4, comparing the two classification models, the SVM model and the hierarchical SVM (HSVM), the approach of integrating the current page with the neighboring pages via the topic model (TOPIC_INTEGRATED) yielded a higher accuracy compared to applying the topic model on the current page (TOPIC_CURRENT) and applying the BOW model. For the SVM model, under the TOPIC_INTEGRATED approach the highest accuracy is 85.21%, an improvement of 23.96% over the BOW model.
For the HSVM model, on the TOPIC_INTEGRATED approach, the hghest accuracy s 90.33%; mprovement of 4.64% over the BOW model. 2 SOS Chldren's Vllages Web ste. charty-news/wkpeda-for- schools.htm 3 LngPpe
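The evaluation metrics of equations (3)-(6) used for these results can be sketched as follows; accuracy is computed over all predictions, while precision, recall and F1 are computed for a single target class (the labels below are invented for illustration):

```python
def evaluate(y_true, y_pred, positive):
    """Accuracy over all predictions; precision/recall/F1 for one class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)                         # eq. (3)
    tp = sum(t == positive and p == positive
             for t, p in zip(y_true, y_pred))                # correct positives
    predicted_pos = sum(p == positive for p in y_pred)
    actual_pos = sum(t == positive for t in y_true)
    precision = tp / predicted_pos if predicted_pos else 0.0  # eq. (4)
    recall = tp / actual_pos if actual_pos else 0.0           # eq. (5)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                     # eq. (6)
    return accuracy, precision, recall, f1

y_true = ["art", "art", "music", "art", "music"]
y_pred = ["art", "music", "music", "art", "art"]
acc, p, r, f1 = evaluate(y_true, y_pred, positive="art")
```

In a multi-class setting such as the paper's, the per-class precision, recall and F1 would be averaged across the 15 categories to obtain the reported single numbers.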
TABLE IV. EVALUATION RESULTS ON CLASSIFICATION MODELS BY USING THREE FEATURE REPRESENTATION APPROACHES
(rows: 1. BOW; 2. TOPIC_CURRENT; 3. TOPIC_INTEGRATED (Wp = 0.4, Wc = 0.0, Ws = 0.3); columns: SVM Accuracy (%), HSVM Accuracy (%))

TABLE V. CLASSIFICATION RESULTS BASED ON THREE FEATURE REPRESENTATION APPROACHES
(rows: 1. BOW; 2. TOPIC_CURRENT; 3. TOPIC_INTEGRATED (Wp = 0.4, Wc = 0.0, Ws = 0.3); columns: P, R, F1 for SVM and for HSVM)

Table 5 shows the experimental results of the three feature representation approaches using the two learning models, the SVM model and the hierarchical SVM (HSVM) model. From this table, the approach of integrating the current page with neighboring pages via the topic model (TOPIC_INTEGRATED) yielded a higher performance compared to applying the topic model on the current page (TOPIC_CURRENT) and applying the BOW model. The HSVM classification model yielded a higher performance compared to the SVM classification model for all three feature representation approaches. For the classification model based on SVM, applying the TOPIC_CURRENT approach improved the performance over BOW by 17.2% based on the F1 measure, and applying the TOPIC_INTEGRATED approach yielded the best performance with an F1 measure of 85.01%, an improvement of 23.81% over the BOW model. For the classification model based on HSVM, applying the TOPIC_CURRENT approach improved the performance over BOW by 3.88% based on the F1 measure, and applying the TOPIC_INTEGRATED approach yielded the best performance with an F1 measure of 90.14%, an improvement of 5.11% over the BOW model. Moreover, the TOPIC_INTEGRATED approach combined with the HSVM model yielded the best overall performance, with an F1 measure of 90.14%, an improvement of 5.13% over the TOPIC_INTEGRATED approach with the SVM model.

Thus, integrating the additional neighboring information, especially from the parent pages and sibling pages, via a topic model can significantly improve the performance of a classification model. The reason is that parent pages often provide terms, such as in anchor texts, that give additional descriptive information about the current page.

V. CONCLUSIONS

To improve the performance of Web page classification, we proposed a new hierarchical classification method based on a topic model that integrates additional term features obtained from the neighboring pages. We applied the topic model approach based on the Latent Dirichlet Allocation algorithm to cluster the term features into a set of latent topics; terms assigned to the same topic are semantically related. Our hierarchical classification method consists of two phases: (1) feature representation by using a topic model and integrating neighboring pages, and (2) a hierarchical Support Vector Machines (SVM) classification model constructed from a confusion matrix. From the experimental results, the approach of integrating the current page with the neighboring pages via the topic model yielded a higher performance compared to applying the topic model on the current page and applying the BOW model. For learning the classification model, the hierarchical SVM classification model yielded a higher performance compared to the SVM classification model for all three feature representation approaches, and integrating the current page with the neighboring pages via the topic model yielded the best performance, with an F1 measure of 90.14%, an improvement of 5.11% over the BOW model.
The approach of integrating the current page with the neighboring pages via the topic model, combined with the hierarchical SVM classification model, yielded the best performance, with an accuracy of 90.33% and an F1 measure of 90.14%; an improvement of 5.12% and 5.13% over the original SVM model, respectively.

REFERENCES

[1] Y. Yang and J. O. Pedersen, "A Comparative Study on Feature Selection in Text Categorization," Proceedings of the 14th International Conference on Machine Learning.
[2] S. T. Dumais, J. Platt, D. Heckerman, and M. Sahami, "Inductive Learning Algorithms and Representations for Text Categorization," Proceedings of CIKM 1998.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, vol. 3.
[4] X. Qi and B. D. Davison, "Classifiers Without Borders: Incorporating Fielded Text From Neighboring Web Pages," Proceedings of the 31st Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, Singapore.
[5] G. Chen and B. Choi, "Web Page Genre Classification," Proceedings of the 2008 ACM Symposium on Applied Computing.
[6] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," Proceedings of the European Conference on Machine Learning (ECML), Berlin.
[7] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[8] A. Sun, E.-P. Lim, and W.-K. Ng, "Web Classification Using Support Vector Machine," Proceedings of the 4th Int'l Workshop on Web Information and Data Management (WIDM), ACM Press.
[9] W. Sriurai, P. Meesad, and C. Haruechaiyasak, "A Topic-Model Based Feature Processing for Text Categorization," Proceedings of the 5th National Conference on Computer and Information Technology.
[10] S. Dumais and H. Chen, "Hierarchical Classification of Web Content," Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval, ACM Press, New York.
[11] L. Cai and T. Hofmann, "Hierarchical Document Categorization with Support Vector Machines," in CIKM.
[12] M. Steyvers and T. L. Griffiths, "Probabilistic Topic Models," in T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning, Lawrence Erlbaum.
[13] C. Haruechaiyasak and C. Damrongrat, "Article Recommendation Based on a Topic Model for Wikipedia Selection for Schools," Proceedings of the 11th International Conference on Asian Digital Libraries.
[14] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall.
[15] G. Karypis, E. Han, and V. Kumar, "Chameleon: A Hierarchical Clustering Algorithm Using Dynamic Modeling," IEEE Computer, 32(8):68-75.
[16] H. Yu, J. Han, and K. Chen-Chuan Chang, "PEBL: Web Page Classification without Negative Examples," IEEE Transactions on Knowledge and Data Engineering, 16(1):70-81.

AUTHORS PROFILE

Wongkot Sriurai received her B.Sc. degree in Computer Science and M.S. degree in Information Technology from Ubon Ratchathani University. Currently, she is a Ph.D.
candidate in the Department of Information Technology at King Mongkut's University of Technology North Bangkok. Her current research interests include Web mining, information filtering and recommender systems.

Phayung Meesad received his B.S. from King Mongkut's University of Technology North Bangkok, and M.S. and Ph.D. degrees in Electrical Engineering from Oklahoma State University. His current research interests include fuzzy systems and neural networks, evolutionary computation and discrete control systems. Currently, he is an Assistant Professor in the Department of Teacher Training in Electrical Engineering at King Mongkut's University of Technology North Bangkok, Thailand.

Choochart Haruechaiyasak received his B.S. from the University of Rochester, M.S. from the University of Southern California and Ph.D. degree in Computer Engineering from the University of Miami. His current research interests include search technology, data/text/Web mining, information filtering and recommender systems. Currently, he is chief of the Intelligent Information Infrastructure Section under the Human Language Technology Laboratory (HLT) at the National Electronics and Computer Technology Center (NECTEC), Thailand.
More information