Semi-Supervised Learning using Higher-Order Co-occurrence Paths to Overcome the Complexity of Data Representation


Semi-Supervised Learning using Higher-Order Co-occurrence Paths to Overcome the Complexity of Data Representation

Murat Can Ganiz, Computer Engineering Department, Faculty of Engineering, Marmara University, İstanbul, Turkey

Abstract: We present a novel approach to semi-supervised learning for text classification based on the higher-order co-occurrence paths of words. We name the proposed method Semi-Supervised Semantic Higher-Order Smoothing (S3HOS). S3HOS is built on a tri-partite graph-based data representation of labeled and unlabeled documents that allows the semantics in higher-order co-occurrence paths between terms (words) to be exploited. Several graph-based techniques have been proposed in the literature to diffuse class labels from labeled documents to unlabeled documents. In this study we propose a different and natural way of estimating class-conditional probabilities for the terms in unlabeled documents without needing to label those documents first. The proposed approach estimates class-conditional probabilities for the terms in unlabeled documents and, at the same time, improves the estimates for the terms in the labeled documents. We experimentally show that S3HOS can substantially improve parameter estimation and hence increase classification accuracy, particularly when labeled data is scarce but unlabeled data is plentiful.

Keywords: Semi-Supervised Learning, Naïve Bayes, Semantic Smoothing, Higher-Order Naïve Bayes, Text Classification.

I. INTRODUCTION

Semi-supervised classification algorithms aim to improve classifier accuracy by incorporating large amounts of unlabeled data, which is readily available or easy to acquire. In many real-world applications it is costly to obtain labeled instances, so using unlabeled instances to improve accuracy can be a much more cost-effective alternative than labeling more instances. Research on semi-supervised learning (SSL) has received considerable attention in recent years in several domains, including image classification and text classification, both of which are domains in which labeling instances is particularly expensive because the size and complexity of the data demand human intervention. Several tutorials and surveys on semi-supervised learning can be found in [15], [16], [17]. There are several different approaches to semi-supervised learning, and in particular to semi-supervised classification. These include generative models, semi-supervised Support Vector Machines, graph-based models [24], and co-training and multi-view models [15]. In the text classification domain, one of the best-known generative models is the semi-supervised variant of the Multinomial Naïve Bayes algorithm proposed by Nigam, McCallum, and Mitchell [18]. That study uses a generative classifier, Naïve Bayes (NB), to classify unlabeled instances starting from a small training set of labeled instances. This classification of the unlabeled instances is applied iteratively and optimized using Expectation Maximization (EM). Our approach bears similarities to generative models since we use parameter estimates similar to those of Naïve Bayes (NB). Additionally, it can be considered a graph-based model since we too use graph data structures. On the other hand, our approach diverges from generative models by not employing an iterative procedure to label the unlabeled instances, and it differs from graph-based models in that our graph data structures model the terms (words) in documents rather than the documents themselves.
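For context, the EM-based semi-supervised Naïve Bayes baseline described above can be summarized with a minimal sketch. This is not the authors' code: it assumes dense term-count arrays, uses scikit-learn's MultinomialNB, and for brevity assigns hard labels to the unlabeled documents in each iteration, whereas Nigam et al. weight each unlabeled document by its class posterior.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def em_nb(X_labeled, y_labeled, X_unlabeled, n_iter=10):
    """EM-style semi-supervised Naive Bayes (hard assignments for brevity)."""
    clf = MultinomialNB().fit(X_labeled, y_labeled)   # initialize from labeled data only
    for _ in range(n_iter):
        y_hat = clf.predict(X_unlabeled)              # E-step: label the unlabeled documents
        X_all = np.vstack([X_labeled, X_unlabeled])   # M-step: refit on the union
        y_all = np.concatenate([y_labeled, y_hat])
        clf = MultinomialNB().fit(X_all, y_all)
    return clf
```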
Furthermore, many graph-based SSL algorithms are intrinsically transductive [19], and as a result they cannot naturally classify previously unseen instances that are not present during training. In this paper, we present a novel approach to semi-supervised learning based on the higher-order co-occurrence paths of words. We name the proposed method Semi-Supervised Semantic Higher-Order Smoothing (S3HOS). It is a higher-order learning approach. Higher-order learning approaches [3], [4], [13] exploit the implicit relations that stem from the basic structure of natural language. If terms or words appear together in a meaningful context such as a sentence or a document, they clearly have a semantic relationship, and these relationships are transitive, which leads to higher-order co-occurrences. These relations, specifically the higher-order co-occurrence relations, link terms in different contexts such as documents, thereby blurring document boundaries. In our case, higher-order co-occurrence relations can link the terms in unlabeled documents to those in labeled documents. More specifically, S3HOS is built on a graph-based data representation similar to that of HOS [13], which allows the semantics in higher-order co-occurrence paths between terms (words) to be exploited.

In contrast to the several graph-based techniques proposed in the literature to diffuse class labels from labeled documents to unlabeled documents, we propose a novel and natural way of estimating class-conditional probabilities for the terms in unlabeled documents without needing to label them first. The approach allows estimating class-conditional probabilities even for terms that occur only in unlabeled documents while, at the same time, improving the estimates for terms in the labeled documents. We experimentally show that S3HOS can markedly improve parameter estimation and increase classification accuracy, particularly when labeled data is scarce and unlabeled data is plentiful. The remainder of this paper is organized as follows: Section 2 presents background information on studies of higher-order co-occurrence, Section 3 presents and analyzes the proposed classifier algorithm, Section 4 presents the experimental setup and introduces the datasets, Section 5 presents the experimental results together with a discussion, and the last section presents conclusions and future work.

II. BACKGROUND

Traditional machine learning algorithms assume that instances are independent and identically distributed (IID) [1]. Several studies exploit explicit link information in order to overcome the shortcomings of the IID assumption in supervised learning [1], [2]. However, the use of explicit links has a significant drawback: in order to classify a single instance, additional context needs to be provided. In addition, there are many graph-based semi-supervised models which rely on explicit relationships generated with a distance or similarity metric between documents [20], [21], [23], [24]. These models are highly sensitive to how the adjacency graph is constructed from the documents and to the regularization parameters [22]. The Latent Semantic Indexing (LSI) algorithm [6] is a widely used technique in Text Mining and Information Retrieval. It has been shown that LSI takes advantage of implicit higher-order (or latent) structure in the association of terms and documents; higher-order relations in LSI capture latent semantics [7]. There are several LSI-based classifiers. Among these, the authors of [8] propose an LSI-based k-Nearest Neighborhood (LSI-kNN) algorithm in a semi-supervised setting for short text classification (SS-LSI-kNN). Several previous studies have developed LSI-motivated approaches that explicitly make use of higher-order relations. These include a novel Bayesian framework called Higher-Order Naïve Bayes (HONB) [3], [4], a novel data-driven space transformation that allows vector space classifiers to take advantage of relational dependencies captured by higher-order paths between features [3], and Higher-Order Support Vector Machines (HOSVM) [5]. In general, Naïve Bayes parameter estimation suffers drastically from sparsity due to the very large number of parameters to be estimated in text classification. The number of parameters is |V||C| + |C|, where V denotes the dictionary and C denotes the set of class labels [10]. The sparsity of text data increases the importance of smoothing methods, since they are intended to distribute a certain amount of probability mass to previously unseen events. Based on similar principles, Higher-Order Smoothing (HOS) for Naïve Bayes text classification takes the concept of exploiting higher-order relations one step further and exploits the relationships between instances of different classes in order to improve parameter estimation [13]. HOS is built on the graph-based data representation of previous algorithms in the higher-order learning framework such as HONB [4].
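As a rough back-of-the-envelope illustration (using the figures reported later in the experimental setup: a vocabulary reduced to 2,000 terms and a four-class problem such as WebKB4), the parameter count is

$$ |V|\,|C| + |C| = 2000 \times 4 + 4 = 8004, $$

all of which must be estimated from only a small labeled training set at the lowest training percentages, which is why smoothing plays such a central role.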
While in HONB higher-order paths are extracted in the context of a particular class, in HOS higher-order paths are extracted from the whole training set, including documents of all classes. Consequently, it is possible to exploit the relations between the terms and documents of different classes. In other words, the basic units for parameter estimation, the higher-order paths, move not only beyond document boundaries but also beyond class boundaries to exploit the latent semantic information between terms. In this study, we develop the concept of higher-order paths even further to include unlabeled documents along with the labeled dataset.

III. APPROACH

Higher-order paths simultaneously capture term co-occurrences within documents as well as term-sharing patterns across documents, and in doing so provide a much richer data representation than the traditional feature vector form [3]. A higher-order co-occurrence path is shown in Fig. 1. The figure depicts three documents, d_1, d_2 and d_3, each containing two terms: t_1, t_2 in d_1; t_2, t_3 in d_2; and t_3, t_4 in d_3. Below the documents, a small graph of four nodes and three edges is illustrated. This chain represents a higher-order path that links term t_1 with term t_4 through t_2 and t_3. It is an example of a third-order path, since three links, or hops, connect t_1 with t_4. Similarly, there is a second-order path between t_1 and t_3 through t_2: t_1 co-occurs with t_2 in document d_1, and t_2 co-occurs with t_3 in document d_2. Even if terms t_1 and t_3 never co-occur in any document of a corpus, the regularity of these second-order paths may reveal latent semantic relationships such as synonymy [13].

Fig. 1. Higher-order co-occurrence. Adopted from [7].
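The chaining of co-occurrences in Fig. 1 can be illustrated with a small sketch. This is only an approximation of the idea, not the authors' path-counting algorithm: the published HONB/HOS methods require the documents and terms along a higher-order path to be distinct, a constraint that a plain matrix product does not enforce.

```python
import numpy as np

# Binary document-term matrix for the Fig. 1 example (rows: d1..d3, columns: t1..t4).
X = np.array([[1, 1, 0, 0],    # d1 contains t1, t2
              [0, 1, 1, 0],    # d2 contains t2, t3
              [0, 0, 1, 1]])   # d3 contains t3, t4

C1 = X.T @ X                   # first-order: terms sharing at least one document
np.fill_diagonal(C1, 0)

C2 = C1 @ C1                   # second-order: term pairs linked through a bridge term
np.fill_diagonal(C2, 0)

print(C1[0, 2])  # 0  -> t1 and t3 never co-occur directly
print(C2[0, 2])  # 1  -> but they are linked by the second-order path t1-d1-t2-d2-t3
```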

In [13], a tri-partite graph is used to represent terms, documents, and class labels. In this study we use a similar tri-partite graph structure. However, in our case the graph is created from the combination of the labeled and unlabeled sets of documents. Therefore, as can be seen in Fig. 2, the set of terms and the set of documents each consist of two regions indicating whether they come from the labeled or the unlabeled part of the dataset. The nodes representing documents consist of two separate node sets: labeled documents D_L and unlabeled documents D_U. Similarly, the nodes representing terms are also divided into two sets, T_U and T_L.

Fig. 2. Tri-partite graph representation of the labeled (D_L) and unlabeled (D_U) sets of documents, terms, and class labels, showing a second-order co-occurrence path between a term and a class label.

In general we use the tri-partite graph structure depicted in Fig. 2 and the following definitions in the remaining sections:

D_U: the set of unlabeled documents
D_L: the set of labeled documents
T_U: the set of terms that occur in unlabeled documents
T_L: the set of terms that occur in labeled documents
T_OU: the set of terms that occur only in unlabeled documents; T_OU = T_U \ T_L
T_OL: the set of terms that occur only in labeled documents
T_UL: the set of terms that occur in both labeled and unlabeled documents; T_UL = T_U ∩ T_L

We assume that D_U ∩ D_L = ∅ and T_UL ≠ ∅. (t_n, c_m)_1 is the number of first-order paths between term t_n and class label c_m. These are basically co-occurrences in binary form, corresponding to the number of documents in class m that include term t_n. (t_n, c_m)_2 is the number of second-order paths between t_n and class label c_m; these are higher-order co-occurrences. The path t_1 - d_1 - t_n - d_{r+1} - c_1 is an example of a second-order co-occurrence path between t_1 and c_1, indicated by bold dashed lines in Fig. 2. Looking carefully at this example, it is a special case that links a term occurring only in unlabeled documents to a class label, since t_1 ∈ T_OU, t_n ∈ T_UL, d_1 ∈ D_U, and d_{r+1} ∈ D_L. By using this kind of relation we can estimate parameters for the terms in unlabeled documents, which is otherwise not possible without labeling them first. In S3HOS, similarly to HOS [13], the parameters of the Naïve Bayes model can be calculated from higher-order paths (i.e., second-order paths) instead of documents, as in Eqs. (1) and (2):

$$ P_1(t_n \mid c_m) = \frac{(t_n, c_m)_1}{\sum_{t_i \in T} (t_i, c_m)_1} \quad (1) $$

$$ P_2(t_n \mid c_m) = \frac{(t_n, c_m)_2}{\sum_{t_i \in T} (t_i, c_m)_2} \quad (2) $$

In order to fuse the power of first-order and second-order paths, we combine the probability estimates from first-order and second-order paths using linear interpolation, as in Eq. (3). If λ = 1, only the second-order paths are used in the parameter estimates; if λ = 0, only the first-order co-occurrences are used. It is important to note that λ also controls the effect of the labeled documents on the overall estimate, since only the terms in labeled documents can have first-order co-occurrences with class labels. In other words, terms that occur only in unlabeled documents cannot have a first-order co-occurrence with a class label.

$$ P(t_n \mid c_m) = \frac{(1-\lambda)(t_n, c_m)_1 + \lambda (t_n, c_m)_2 + s}{\sum_{t_i \in T} \left( (1-\lambda)(t_i, c_m)_1 + \lambda (t_i, c_m)_2 \right) + 2s} \quad (3) $$

Here s is a smoothing parameter used to avoid zero probabilities; we assign it a very small value (0 < s << 1) in our experiments. The second parameter of the Naïve Bayes model, the class prior probability for class m, can be calculated using Eq. (4). We assume that each class includes at least one document.

$$ P(c_m) = \frac{\sum_{t_i \in T} \left( (1-\lambda)(t_i, c_m)_1 + \lambda (t_i, c_m)_2 \right)}{\sum_{c_j \in C} \sum_{t_i \in T} \left( (1-\lambda)(t_i, c_j)_1 + \lambda (t_i, c_j)_2 \right)} \quad (4) $$

This new formulation makes it possible to calculate class-conditional probabilities for some of the terms that do not occur in the documents of a particular class, as well as for terms that do not occur in the documents of the labeled dataset at all.
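The estimation in Eqs. (3) and (4) can be sketched directly in code. This sketch assumes the first- and second-order path counts between terms and class labels have already been extracted from the combined labeled and unlabeled tri-partite graph into |T| x |C| arrays N1 and N2; the classify helper is simply the standard multinomial NB decision rule applied to these estimates.

```python
import numpy as np

def s3hos_parameters(N1, N2, lam=0.5, s=1e-6):
    """N1, N2: |T| x |C| first- and second-order path counts; lam = lambda, s = smoothing."""
    mixed = (1.0 - lam) * N1 + lam * N2                        # interpolated path counts
    p_t_given_c = (mixed + s) / (mixed.sum(axis=0) + 2 * s)    # Eq. (3)
    p_c = mixed.sum(axis=0) / mixed.sum()                      # Eq. (4): class priors
    return p_t_given_c, p_c

def classify(doc_term_counts, p_t_given_c, p_c):
    # Multinomial Naive Bayes decision rule in log space using the S3HOS estimates.
    log_posterior = np.log(p_c) + doc_term_counts @ np.log(p_t_given_c)
    return np.argmax(log_posterior)
```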
Furthermore, the use of second-order paths provides more evidence for the terms that exist in a particular class by making use of the terms and documents in the unlabeled dataset. The higher-order paths capture term co-occurrences within a class of documents, term relation patterns across classes, and term relation patterns across labeled and unlabeled documents. Consequently, the Naïve Bayes type probability estimation extracts patterns from a much richer data representation than the traditional vector space representation.

IV. EXPERIMENT SETUP

We use three different datasets, including widely used benchmark datasets in the text classification domain.

The first dataset is a widely used dataset in the text classification literature named WebKB. The WebKB dataset includes web pages collected from the computer science departments of different universities, organized into seven categories. We use the four-class version of the WebKB dataset, which was also used in previous work [13]; we call this dataset WebKB4 in the following sections. The WebKB4 dataset has a highly skewed class distribution. The second dataset, called 1150haber, includes 1150 news articles in five categories (economy, magazine, health, politics and sports) collected from Turkish online newspapers [11]. Our third dataset, imdb, is a movie review dataset commonly used in sentiment analysis studies [14], since each document is labeled as negative or positive. The class label, document and vocabulary statistics are given in Table I, where C is the number of classes, D is the number of documents, and V is the vocabulary size, i.e., the number of distinct terms. All datasets are preprocessed using stop word removal, stemming, and dimensionality reduction to 2,000 terms using Information Gain (IG).

TABLE I. DESCRIPTIONS OF THE DATASETS BEFORE PREPROCESSING
Data Set     C    D        V
WebKB4       4             16,116
1150Haber    5    1,150    11,038
imdb         2             16,679

We also provide term-based statistics for each dataset in Table II. As can be seen from this table, the WebKB4 dataset diverges from the others with much longer documents and the largest deviation in document lengths. Such divergences may have a notable effect on the results of our model, since the higher-order path based models fundamentally rely on the quantity and quality of the terms shared between documents.

TABLE II. THE TERM-BASED CHARACTERISTICS OF THE DATASETS (average and standard deviation of terms per document, average and standard deviation of term length, and sparsity, for WebKB4, 1150Haber and imdb)

All of these datasets are fully labeled, i.e., each and every document has a single class label. In our experiments, in order to measure the performance of the semi-supervised algorithms, we divide each dataset into three chunks, a training set, an unlabeled set and a test set, using several different percentages. The class labels of the unlabeled set are removed. The test set percentage is kept constant at 20%. The supervised algorithms are trained with the training set and evaluated on the test set, while the semi-supervised algorithms are trained with both the training set and the unlabeled set (which includes no class labels) and evaluated on the test set likewise. These percentages can be seen in Table III.

TABLE III. LABELED AND UNLABELED DATA SPLIT PERCENTAGES (training set %, unlabeled set %, test set %; the test set is fixed at 20%)

Ten folds are created for each dataset and each of the six levels by randomly selecting the training set percentage and the unlabeled set percentage of the data, and leaving the remaining 20% for the test set. Stratified sampling is used while dividing the datasets, i.e., the class distributions are preserved. This approach is similar to [9], [12], [13]. Similarly to [13], the experimental results are reported as the average accuracy and standard deviation (StdDev) over these 10 folds. We employ paths up to second order, based on the experimental results of previous studies [3], [4], [7], [13].
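A hedged sketch of the stratified labeled/unlabeled/test split described above, using scikit-learn: the test portion is held out first at a fixed 20%, and the remaining documents are divided into a small labeled training set and a large unlabeled set according to the chosen percentage level. The function name and interface are illustrative, not taken from the paper.

```python
from sklearn.model_selection import train_test_split

def split_dataset(X, y, train_pct, seed=0):
    # Hold out a stratified 20% test set first.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=seed)
    # From the remaining 80%, keep train_pct of the whole dataset as labeled
    # data and treat the rest as unlabeled (its labels are discarded).
    labeled_frac = train_pct / 0.80
    X_lab, X_unlab, y_lab, _ = train_test_split(
        X_rest, y_rest, train_size=labeled_frac, stratify=y_rest, random_state=seed)
    return X_lab, y_lab, X_unlab, X_test, y_test
```

For example, split_dataset(X, y, 0.01) yields roughly the 1% labeled / 79% unlabeled / 20% test configuration used at the smallest level.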
We compare against several common semi-supervised text classifiers: semi-supervised Naïve Bayes (EM-NB) [18] and semi-supervised Latent Semantic Indexing k-Nearest Neighborhood (SS-LSI-kNN) [8], [4], [13]. We use the exact same parameter settings for LSI-kNN: the number of neighbors is set to 25, and the k parameter of LSI is automatically optimized for each dataset and training set percentage using the Kaiser approach. The SS-LSI-kNN algorithm is the semi-supervised variant of the LSI-kNN algorithm used in [13]. This is simply achieved by providing the union of the labeled and unlabeled datasets as input to the LSI algorithm. However, the kNN part of the algorithm only considers the labeled instances in the LSI space, which is generated from the union of the labeled instances and the large number of unlabeled instances. Due to the extremely high computational and space complexity of the LSI algorithm and the extensive number of experiments we conduct for each dataset (10 folds for each of the 6 training set percentages), we could not obtain SS-LSI-kNN results for our largest dataset, WebKB4; our computational resources simply do not allow these runs to finish in a reasonable time frame.
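A minimal sketch of the SS-LSI-kNN baseline as just described: LSI is fit on the union of labeled and unlabeled documents, but the kNN step only considers the labeled documents in the reduced space. This assumes dense term matrices and takes the number of LSI dimensions k as a plain argument; the paper selects it automatically with the Kaiser criterion, which is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import KNeighborsClassifier

def ss_lsi_knn(X_lab, y_lab, X_unlab, X_test, k=100, n_neighbors=25):
    svd = TruncatedSVD(n_components=k)
    svd.fit(np.vstack([X_lab, X_unlab]))      # LSI space from labeled + unlabeled docs
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(svd.transform(X_lab), y_lab)      # neighbors are drawn from labeled docs only
    return knn.predict(svd.transform(X_test))
```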

V. EXPERIMENT RESULTS AND DISCUSSION

In this section we provide a detailed empirical comparison of the proposed algorithm, Semi-Supervised Semantic Higher-Order Smoothing (S3HOS), with several semi-supervised algorithms. Although we calculate quite a few different evaluation metrics, such as the F-measure (F1) and the area under the ROC curve (AUC), we use accuracy as our main evaluation metric in this section, as it is the most common approach.

In the following figures the algorithm we propose in this study is indicated as S3HOS, accompanied by the λ (lambda) value, which shows how the first-order and second-order paths are combined in the parameter estimates. λ = 1 means that only the second-order paths are used, whereas λ = 0.5 means that first-order and second-order paths are combined with equal weights; see the Approach section for the details of the combination method. The figures compare S3HOS with the semi-supervised algorithms semi-supervised Naïve Bayes (EM+MNB) and SS-LSI-kNN. We consider EM+MNB to be our baseline, since it too is a Naïve Bayes based semi-supervised algorithm.

We start with the imdb dataset, since we obtained the most interesting results on this dataset, as can be seen in Fig. 3. The lambda parameter has a critical influence on the performance of S3HOS on this dataset. S3HOS with λ = 1 achieves 88% accuracy, which is more than a 10% difference from its closest competitor, when the training set size is only 1% and the unlabeled data size is 79%. This is the highest gain we observe among all our experimental results. Even S3HOS with λ = 0.5 performs better than our baseline EM+MNB throughout the chart. Interestingly, the performance of S3HOS with λ = 1 (second-order paths only) seems to decrease slightly as we increase the labeled dataset percentage. SS-LSI-kNN performs worse than the other algorithms, which is usually the case throughout the experiments.

Fig. 3. Accuracy comparison of S3HOS with semi-supervised algorithms on the imdb dataset.

Our second dataset is 1150haber, and in Fig. 4 the results are again quite interesting. S3HOS with λ = 1 (second-order paths only) achieves an accuracy of more than 90% using only 1% of the data as the labeled training set and 79% of the data as the unlabeled set. The accuracy reaches almost 95% when the training set size is increased to 5%.

Fig. 4. Accuracy comparison of S3HOS with semi-supervised algorithms on the 1150haber dataset.

S3HOS also surpasses our baseline semi-supervised algorithm EM+MNB on the WebKB4 dataset, as can be seen in Fig. 5. Interestingly, the lambda parameter seems to have a different effect here; in other words, first-order paths seem to play a more important role on this dataset. We attribute this phenomenon to the highly skewed class distribution of the dataset. Furthermore, this dataset has larger documents than the others, and the differences between the documents of different classes are relatively smaller. As a result, the higher-order paths may be introducing much more noise into the algorithm than on the other datasets.

Fig. 5. Accuracy comparison of S3HOS with semi-supervised algorithms on the WebKB4 dataset.

Overall, our experiments show that S3HOS outperforms both our baseline semi-supervised model EM+MNB and SS-LSI-kNN (we could not provide results of the latter for our larger dataset WebKB4 due to its exceptionally high complexity).

VI. CONCLUSIONS AND FUTURE WORK

We present a novel and natural approach to semi-supervised learning for text classification based on the Higher-Order Smoothing (HOS) method developed in our previous study. We name the proposed method Semi-Supervised Semantic Higher-Order Smoothing (S3HOS). S3HOS is built on a graph-based data representation of documents, terms and class labels, which allows the semantics in higher-order co-occurrence paths between terms (and also the class labels) to be exploited. Several graph-based techniques have been proposed in the literature to diffuse class labels from labeled documents to unlabeled documents. In this study we propose a novel and natural way of estimating class-conditional probabilities for the terms in unlabeled documents without needing to label them first. The approach allows estimating class-conditional probabilities even for the terms in unlabeled documents while, at the same time, improving the estimates for the terms in the labeled documents. This new formulation makes it possible to calculate class-conditional probabilities for some of the terms that do not occur in the documents of a particular class, as well as for terms that do not occur in the documents of the labeled dataset at all. Furthermore, the use of second-order paths provides more evidence for the terms that exist in a particular class by making use of the terms and documents in the unlabeled dataset. Higher-order paths capture term co-occurrences within a class of documents, term relation patterns across classes, and term relation patterns across labeled and unlabeled documents. Consequently, the Naïve Bayes type probability estimation extracts patterns from a much richer data representation than the traditional vector space representation. We experimentally show that S3HOS can markedly improve parameter estimation and increase classification accuracy, particularly when labeled data is scarce and unlabeled data is plentiful.

ACKNOWLEDGMENT

This work is supported in part by The Scientific and Technological Research Council of Turkey (TÜBİTAK), grant number 111E239. Points of view in this document are those of the authors and do not necessarily represent the official position or policies of TÜBİTAK.

REFERENCES

[1] Taskar B, Abbeel P, Koller D. Discriminative Probabilistic Models for Relational Data. In Proc. Uncertainty in Artificial Intelligence (UAI02), Alberta, Canada, August 1-4, 2002.
[2] Getoor L, Diehl C P. Link Mining: A Survey. ACM SIGKDD Explorations Newsletter, 2005, 7(2).
[3] Ganiz M C, Lytkin N, Pottenger W M. Leveraging Higher Order Dependencies between Features for Text Classification. In Proc. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), Bled, Slovenia, September 2009.
[4] Ganiz M C, George C, Pottenger W M. Higher Order Naïve Bayes: A Novel Non-IID Approach to Text Classification. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(7).
[5] Lytkin N. Variance-based clustering methods and higher order data transformations and their applications. Ph.D. Thesis, Rutgers University, NJ, U.S.A., 2009.
[6] Deerwester S C, Dumais S T, Landauer T K, Furnas G W, Harshman R A. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6), 1990.
[7] Kontostathis A, Pottenger W M. A Framework for Understanding LSI Performance. Information Processing and Management, 42(1), 2006.
[8] Zelikovitz S, Hirsh H. Transductive LSI for Short Text Classification Problems. In Proc. of the 17th International Florida Artificial Intelligence Research Society Conference (FLAIRS), Miami Beach, FL, May 2004.
[9] McCallum A, Nigam K. A Comparison of Event Models for Naive Bayes Text Classification. In Proc. AAAI 1998 Workshop on Learning for Text Categorization, Madison, Wisconsin, July 1998.
[10] McCallum A, Nigam K. Text classification by bootstrapping with keywords, EM and shrinkage. In Working Notes of the ACL 1999 Workshop on Unsupervised Learning in Natural Language Processing (ACL), Maryland, USA.
[11] Amasyalı M F, Beken A. Türkçe Kelimelerin Anlamsal Benzerliklerinin Ölçülmesi ve Metin Sınıflandırmada Kullanılması (Measurement of Turkish Word Semantic Similarity and Text Categorization Application). In Proc. IEEE Sinyal İşleme ve İletişim Uygulamaları Kurultayı (SIU) (Signal Processing and Communications Applications Conference), Kocaeli, Turkey, April.
[12] Rennie J D, Shih L, Teevan J, Karger D R. Tackling the poor assumptions of naive Bayes text classifiers. In Proc. ICML 2003, August 2003.
[13] Poyraz M, Kilimci Z H, Ganiz M C. Higher-Order Smoothing: A Novel Semantic Smoothing Method for Text Classification. Journal of Computer Science and Technology, 29(3), 2014.
[14] Pang B, Lee L. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proc. of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), July 2004, p. 271.
[15] Zhu X. Semi-Supervised Learning Literature Survey. University of Wisconsin-Madison, Computer Sciences Technical Report, last modified December 14, 2007.
[16] Chapelle O, Schölkopf B, Zien A. Semi-supervised learning. MIT Press, 2006.
[17] Zhu X. Semi-supervised learning. In Encyclopedia of Machine Learning. Springer US, 2010.
[18] Nigam K, McCallum A, Mitchell T. Semi-supervised text classification using EM. In Semi-Supervised Learning, 2006.
[19] Krishnapuram B, Williams D, Xue Y, Carin L, Figueiredo M, Hartemink A J. On semi-supervised classification. In Advances in Neural Information Processing Systems, 2004.
[20] Belkin M, Niyogi P, Sindhwani V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. The Journal of Machine Learning Research, 7, 2006.
[21] Wang F, Zhang C. Label propagation through linear neighborhoods. IEEE Transactions on Knowledge and Data Engineering, 20(1), 2008.
[22] de Sousa C A R, Rezende S O, Batista G E. Influence of graph construction on semi-supervised learning. In Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2013.
[23] Lebichot B, Kivimäki I, Françoisse K, Saerens M. Semi-supervised Classification Through the Bag-of-Paths Group Betweenness. IEEE Transactions on Neural Networks and Learning Systems, 25(6), 2014.
[24] Subramanya A, Talukdar P P. Graph-Based Semi-Supervised Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 8(4), 2014.


BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

A Multivariate Analysis of Static Code Attributes for Defect Prediction

A Multivariate Analysis of Static Code Attributes for Defect Prediction Research Paper) A Multvarate Analyss of Statc Code Attrbutes for Defect Predcton Burak Turhan, Ayşe Bener Department of Computer Engneerng, Bogazc Unversty 3434, Bebek, Istanbul, Turkey {turhanb, bener}@boun.edu.tr

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Face Recognition Based on SVM and 2DPCA

Face Recognition Based on SVM and 2DPCA Vol. 4, o. 3, September, 2011 Face Recognton Based on SVM and 2DPCA Tha Hoang Le, Len Bu Faculty of Informaton Technology, HCMC Unversty of Scence Faculty of Informaton Scences and Engneerng, Unversty

More information

Fast Feature Value Searching for Face Detection

Fast Feature Value Searching for Face Detection Vol., No. 2 Computer and Informaton Scence Fast Feature Value Searchng for Face Detecton Yunyang Yan Department of Computer Engneerng Huayn Insttute of Technology Hua an 22300, Chna E-mal: areyyyke@63.com

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Learning Semantics-Preserving Distance Metrics for Clustering Graphical Data

Learning Semantics-Preserving Distance Metrics for Clustering Graphical Data Learnng Semantcs-Preservng Dstance Metrcs for Clusterng Graphcal Data Aparna S. Varde, Elke A. Rundenstener Carolna Ruz Mohammed Manruzzaman,3 Rchard D. Ssson Jr.,3 Department of Computer Scence Center

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information