Deep Classification in Large-scale Text Hierarchies


Gui-Rong Xue (1), Dikan Xing (1), Qiang Yang (2), Yong Yu (1)
(1) Dept. of Computer Science and Engineering, Shanghai Jiao-Tong University. {grxue, dkxing, yyu}@apex.sjtu.edu.cn
(2) Hong Kong University of Science and Technology, Clearwater Bay, Kowloon, Hong Kong. qyang@cs.ust.hk

ABSTRACT
Most classification algorithms are best at categorizing Web documents into a small number of categories, such as the top two levels of the Open Directory Project. Such a classification method does not give the user very detailed topic-related class information, because the first two levels are often too coarse. However, classification over a large-scale hierarchy is known to be intractable when there are many target categories with cross-link relationships among them. In this paper, we propose a novel deep-classification approach for categorizing Web documents into the categories of a large-scale taxonomy. The approach consists of two stages: a search stage and a classification stage. In the first stage, a category-search algorithm is used to acquire the category candidates for a given document. Based on the category candidates, we prune the large-scale hierarchy so as to focus our classification effort on a small subset of the original hierarchy. As a result, the classification model is trained on this small subset before being applied to assign a category to a new document. Since the category candidates are sufficiently close to each other in the hierarchy, a statistical-language-model-based classifier using n-gram features is exploited. Furthermore, the structure of the taxonomy can be utilized in this stage to improve classification performance. We demonstrate the performance of our proposed algorithms on the Open Directory Project with over 130,000 categories. Experimental results show that our proposed approach can reach 51.8% on the measure of Mi-F1 at the 5th level, which is a 77.7% improvement over top-down based SVM classification algorithms.
Categories and Subject Descriptors: H.4.m [Information Systems]: Miscellaneous; I.5.4 [Pattern Recognition]: Applications - Text processing.
General Terms: Algorithms, Performance, Experimentation.
Keywords: Deep Classification, Large Scale Hierarchy, Hierarchical Classification.

1. INTRODUCTION
Text classification is at the heart of Web page classification, which finds many applications ranging from Web personalization to targeted advertisements [] on Web pages. In text classification, our aim is to categorize a given text document into predefined classes, where the main techniques used are machine learning methods such as support vector machines (SVM). However, most machine learning methods confine themselves to classifying a document into two or a few predefined categories. As such, the power of Web-page classification is severely limited. In this paper, we take the first step in exploring how to scale up the target categories from a few to hundreds of thousands, in hierarchies of classes such as the Open Directory Project (ODP) and the Yahoo! Directory, thus elevating text classification to a new, practical level. Three main difficulties prevent traditional approaches to classification from being applied. The first is the sheer size of the taxonomy of categories: our experiments show that as the number of classes increases to a moderate level, predictive accuracy dramatically decreases to a level that renders the classifiers unusable. The second difficulty, also caused by the large size of the taxonomy, is that traditional methods require a very long time for training.

(Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR'08, July 20-24, 2008, Singapore. Copyright 2008 ACM /08/07...$5.00.)
Traditional methods even become intractable for large-scale hierarchies [2][3]. The third difficulty lies in the fact that, in practice, categories are usually organized in a hierarchical structure. As a result, complex relationships, such as parent-child relations, often exist among the target classes. However, most previous work assumes the categories of a large-scale hierarchy to be independent, and thus cannot utilize the structure information. Moreover, the failure of this assumption may even mislead these methods and decrease their performance. Hence, it is important to utilize the structure of the taxonomy in order to obtain satisfactory performance. Previous methods for solving the hierarchical classification problem can be classified according to the strategies used in classification [8]. These methods can generally be divided into two types: big-bang approaches and top-down level-based approaches. In big-bang approaches, a single classifier is trained on the entire target hierarchy. Big-bang methods may allow the classification model to consider the hierarchical structure of the classes; examples are hierarchical SVM [2] and Rocchio-like classifiers []. However, it has been shown in [2][3] that it is infeasible to directly build a classifier for a large-scale hierarchy. A second approach is the top-down approach, which constructs classifiers at each level of the category tree, where each classifier works as a flat classifier at that level. A document is first classified by the classifier at the root level, and is then classified by the classifiers trained at the lower-level categories until it reaches a final category [6]. In order to be classified into a category correctly, a document must be classified correctly at all of that category's ancestors. As a result, a potential problem of the top-down approach is that misclassification at a parent or ancestor category may force a document to be excluded from the child categories before it can be examined by the child categories' classifiers.
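The top-down routing just described can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `classifiers` (node to local classifier) and `children` (node to child list) are hypothetical structures.

```python
def top_down_classify(doc, classifiers, children, node="Top"):
    """Route a document down the category tree: at each internal node, the
    node's local classifier picks one child; stop when a leaf is reached."""
    while children.get(node):              # internal node: descend one level
        node = classifiers[node](doc, children[node])
    return node
```

A wrong decision at any ancestor permanently excludes the correct subtree, which is exactly the error-propagation problem discussed above.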
Moreover, classification over the high-level categories may fail easily, since some of these categories are too general and thus harder to discriminate, as we show in the experiments. In this case, the performance of the top-down approach is significantly impaired. This indicates that the approach makes very restrictive assumptions on the hierarchies.

Liu et al. [2] evaluated a hierarchical SVM classification algorithm on the Yahoo! hierarchy, which contains 132,199 categories. The results show that the classification performance on the hierarchy drops quickly as the level of the categories increases. Generally, text classification over large-scale target hierarchies remains an unsolved problem. In this paper, we propose a novel method that can overcome these difficulties and consequently improve the performance of classification in large text hierarchies. In particular, we present a two-stage approach for large-scale hierarchical classification; we call our method deep classification. In the first stage, we treat the hierarchy as a set of flat categories and perform a search over the large-scale hierarchy, retrieving the categories related to a given document. We rank the categories and take the most related ones as category candidates. Thus, the large-scale hierarchy is pruned into a much smaller but focused one. In the second stage, we train a classification model on this small subset of the original hierarchy and classify the given document within it. During this stage, we propose several strategies for training the classifiers, and the structure of the original hierarchy is utilized to improve classification performance. To evaluate our deep-classification approach, we have conducted several experiments on the Open Directory Project, which contains more than 130,000 categories. We test the effectiveness of the proposed deep-classification algorithm by comparing it to state-of-the-art hierarchical classification algorithms. Experimental results show that our proposed approach can reach 51.8% on the measure of Mi-F1 at the 5th level, which is a 77.7% improvement over the top-down based SVM classification algorithm. The rest of the paper is organized as follows. In Section 2, we give a brief overview of related work. In Section 3, we describe the framework of the proposed algorithms. In Sections 4 and 5, we focus on the different strategies used at each stage. The evaluation results are shown in Section 6.
Section 7 concludes with a summary and suggestions for future work.

2. RELATED WORK
2.1 Traditional Text Classification
In traditional text classification, many algorithms [7][22] have been proposed, such as the Support Vector Machine (SVM), k-Nearest Neighbor (kNN), Naive Bayes (NB) and so on. Empirical evaluations on benchmark datasets such as Reuters-21578 [8] and RCV1 [] have shown that most of these methods are effective in traditional text classification applications. In Web applications, most classification methods, such as SVM and NB, adapt these text classification methods to Web documents by introducing many novel Web-related features, such as anchor text, metadata and link structure, to optimize performance. As reported in [2], flat classification based on SVM generally performs worse than top-down based SVM for large-scale hierarchical classification. In the first work to investigate performance on a large-scale hierarchy, Liu et al. conducted a large-scale analysis on the entire Yahoo! category hierarchy and reported that the performance of flat SVM is about 3% lower on the measure of Micro-F1 at the 4th level and deeper. A recall system [3] was proposed for performing large-scale flat classification, in which simple feature-based intermediate filtering is used to reduce the potential categories for an instance to a small, manageable set. However, that system did not investigate the rich structure among the hierarchical categories. Our experimental results in Section 6 show that higher performance can be achieved by considering such structure information.

2.2 Hierarchical Text Classification
There are generally two approaches adopted by existing hierarchical classification methods [8], namely the big-bang approach and the top-down approach.
2.2.1 Big-bang Approach
As described in [8], in the big-bang approach only a single classifier is used, which takes the hierarchical structure of the categories into account. Given a document, the classifier assigns it to one or more categories in the category tree.
The big-bang approach has been designed using SVM [2], Rocchio-like classifiers [], rule-based classifiers [6] and association rules [9]. Assuming the distribution of hierarchical categories follows a power law, Yang et al. [24] gave a theoretical analysis of the scalability of text classification with flat and hierarchical methods. As reported in their work, the time cost of big-bang classification is larger than that of top-down hierarchical classification. In [2], a modified SVM version is applied to the whole hierarchy. In [4], a search-based approach is proposed to find the top K most similar categories for further search-result filtering. In [4], McCallum et al. proposed a hierarchical classification approach using shrinkage, in which the smoothed parameter estimate of a data-sparse child node is combined with that of its parent node in order to obtain robust parameter estimates; an EM algorithm is used to estimate the interpolation parameters. However, it is very difficult to carry out this process in our problem setting due to the large number of categories. Furthermore, in most previous work, experiments were conducted with at most a few thousand categories. The task of building even a single classifier for a large-scale hierarchy is known to be intractable [2]. In contrast, as we show in this paper, our method is scalable in handling large text hierarchies with hundreds of thousands of categories.
2.2.2 Top-down Approach
Top-down level-based classification has been designed based on multiple Bayesian classifiers in [9] and SVM classifiers in [5] and [6]. In [5] and [6], Dumais and Chen proposed a classifier on the top two levels of the LookSmart hierarchy, with 63 categories in total. A top-down based SVM is applied to a very large-scale hierarchy in [2]. As reported in that work, the performance is about 4% lower on the measure of Micro-F1 at the 5th level and deeper of the Yahoo! directory. Directly building top-down classifiers cannot work well in a large-scale hierarchy due to the problem of error propagation.
TAPER [3] is a system for large-scale hierarchical classification using naive Bayes and per-level feature selection. TAPER also performs top-down classification on the whole hierarchy. In earlier work, a search-result classification system was developed that classifies search results into deep hierarchies using category candidates retrieved by the query. However, that work focused on analyzing search results through the query, and did not directly address the document classification problem. This paper proposes a new algorithm for document classification over deep hierarchies.

3. DEEP CLASSIFICATION
In this section, we propose a deep-classification algorithm for a large-scale category hierarchy. Our algorithm works as follows. For a given document, the entire set of categories can be divided into two kinds according to their similarity to the document: categories related to the document and categories unrelated to it.

For a very large-scale hierarchy, the number of categories related to a document is much smaller than the number of unrelated categories. Traditional hierarchical classification algorithms focus on building a global classification model that optimizes performance over all categories, despite the fact that most of the categories are not related to any given document. Our deep-classification approach utilizes this property and thus focuses on the categories related to the document. We first extract a small subset of related categories from the large-scale hierarchy. We then perform classification over these extracted categories, utilizing the structure of the original hierarchy.

Figure 1. Flowchart of Deep Classification

The algorithm is shown in Figure 1, where we present a two-stage procedure consisting of a search stage and a classification stage. In the search stage, we try to find a subset of categories from the large-scale hierarchy that are related to the given document. As a result, the large-scale hierarchy is pruned into a small one. Then, in the classification stage, we train the classifier on this small hierarchy. Intuitively, classification performance over a few categories will be better than over a larger set of categories. Moreover, structure information from the original hierarchy is applied in this stage to enhance the classification results. In the search stage, a search-based algorithm is used to find the category candidates for the given document. We begin with a set of categories and a pre-classified training set of pages. One can obtain the training set from taxonomies like ODP or Yahoo!, or from other resources depending on the desired application. Compared with the entire hierarchy, this narrowing-down procedure helps reduce the number of target category candidates. The details of this part are discussed in Section 4. Next, based on the structure of the pruned hierarchy, a classifier is trained and used to categorize the document.
In this stage, by considering the pruned hierarchical structure, three training data selection strategies, which utilize the hierarchical structure, are proposed in Section 5.1. Then, based on the selected training data, we perform classification for the given document. Since the classification model needs to be built on the fly, it is important for the algorithm to be efficient in order to make our method scalable. To satisfy this goal, we compare different classifiers and propose a lightweight classifier based on the naive Bayes classifier, which is described in Section 5.2.

4. STRATEGIES IN THE SEARCH STAGE
In the search stage, we propose two strategies to find the category candidates for a given document: a document-based search strategy and a category-based search strategy.
4.1 Document-based Strategy
The document-based strategy compares the relevance between the given document and the documents in the training set. The documents in the training set and the given document to be classified are both represented as normalized term frequency vectors, and the comparison is done using the cosine similarity measure. The top N most similar documents are selected as documents related to the given document, and the categories of these documents are taken as the category candidates.
4.2 Category-based Strategy
With the category-based strategy, we represent each category by the Web pages in that category and then compute the similarity between the categories and the given document. From the pre-classified pages in the categories, we build a vector of term frequencies for each category. The given document is also represented by its term frequency vector. Then, we compute the cosine similarity between the vector of the given document and those of the categories. From the search stage, we acquire the related categories, each of which can be either a leaf node or an internal node of the hierarchy. In the next step, we classify the given document into these category candidates.

5. STRATEGIES IN THE CLASSIFICATION STAGE
Based on the related category candidates, the large hierarchy is pruned into a narrow one.
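Before turning to the classification stage, the category-based search of Section 4.2 can be sketched as follows. Names are illustrative, and raw term frequencies are used without normalization for brevity.

```python
from collections import Counter
from math import sqrt

def tf_vector(texts):
    """Term-frequency vector for a bag of documents (a category profile)."""
    vec = Counter()
    for text in texts:
        vec.update(text.split())
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidate_categories(doc, category_pages, top_n=10):
    """Rank categories by cosine similarity to the document; keep the top N."""
    d = tf_vector([doc])
    scored = [(cosine(d, tf_vector(pages)), cat)
              for cat, pages in category_pages.items()]
    return [cat for _, cat in sorted(scored, reverse=True)[:top_n]]
```

In practice the category profiles and their norms would be precomputed offline, since they do not depend on the incoming document.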
A category is kept if the category itself or one of its descendants is among the candidates; the remaining categories are removed from the hierarchy. An example of a pruned hierarchy is shown in Figure 2, where nine categories, shown in bold font, are the categories related to the given document, as acquired in the search stage. Then, we perform classification on the pruned hierarchy. Since the pruned hierarchy still retains the relationship links among the categories, we wish to use these relations to enhance the classification results. We apply classification with different strategies in this stage. Below, we consider the steps of this stage in detail.
5.1 Strategies for Training Data Selection
5.1.1 Flat Strategy
The flat strategy is a simple strategy for training data selection in which we treat the category candidates as a flat structure, without considering the category information of their ancestors. From the viewpoint of hierarchical classification, this strategy places all the category candidates directly under the root, as shown in Figure 3. We then directly train the classifier on the Web pages in the candidate categories.
5.1.2 Pruned Top-down Strategy
Considering the tree structure of the pruned hierarchy, we can use the pruned top-down strategy to train the classifiers. The pruned top-down strategy can be seen as a specific form of the top-down classification methods proposed in [6][2], applied after first reducing the large hierarchy to a narrow one. A document is first classified by the classifier at the root level, and is then classified by the classifiers of the lower-level categories until it reaches a final category.
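The pruning rule above (keep a category iff it is a candidate or an ancestor of one) can be sketched as follows, assuming a hypothetical `parent` map from each category to its parent (None at the root):

```python
def prune_hierarchy(parent, candidates):
    """Return the set of categories kept after pruning: every candidate
    plus all categories on the path from the root down to it."""
    keep = set()
    for cat in candidates:
        while cat is not None and cat not in keep:
            keep.add(cat)
            cat = parent.get(cat)   # walk up toward the root
    return keep
```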

Figure 2. Pruned Hierarchy
Figure 3. Flat Strategy
Figure 4. Ancestor-Assistant Strategy

5.1.3 Ancestor-assistant Strategy
The structure of the hierarchy is largely ignored by the previous two strategies. However, as discussed in Section 1, an ideal strategy for training data selection should take this structural information into account. Thus, we propose the ancestor-assistant strategy to utilize this information. The strategy is guided by two observations. First, the training data from a category candidate itself may be insufficient in size, especially for a deep category, so we need to obtain more data from elsewhere. Second, although the training data from ancestors high up in the hierarchy may be too general to reflect the characteristics of a deep category candidate, we can still borrow data from nearby ancestors; we should simply not do this for ancestors that are too high up. Hence, we propose a trade-off between the hierarchical strategy and the flat strategy: we combine the training data of the category candidate itself with the training data of its ancestors, as long as those ancestors are not shared with other category candidates. By considering the structure of the hierarchy in this way, the scarcity of training data for deep categories can be alleviated. In addition, we include the training data from the node itself to preserve the characteristics of the category, so that the training data are not dominated by the data borrowed from the ancestors. As shown in Figure 2, since the common ancestor is category 24, the training data for category 874 come from categories 834, 875 and 874, while the training data for category 92 come from categories 854 and 92. The tree in Figure 4 illustrates this strategy. If a node were allowed to go up to an arbitrarily high level, too much training data would be involved; such large amounts of training data may become unbalanced and degrade performance. In this work, we limit a node to borrowing from ancestors at most two levels above itself when applying this method.
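A sketch of the ancestor-assistant selection under the stated two-level limit. The `parent` and `pages` maps and the function name are illustrative assumptions, not the paper's notation.

```python
def ancestor_assistant_data(cand, parent, pages, candidates, max_up=2):
    """Training set for one candidate: its own pages plus the pages of
    ancestors up to `max_up` levels above it, stopping before any ancestor
    that is shared with another candidate."""
    ancestors_of_others = set()
    for other in candidates:
        if other == cand:
            continue
        node = other
        while node is not None:
            ancestors_of_others.add(node)
            node = parent.get(node)
    data = list(pages.get(cand, []))       # always keep the node's own data
    node, hops = parent.get(cand), 0
    while node is not None and hops < max_up and node not in ancestors_of_others:
        data += pages.get(node, [])        # borrow from a nearby ancestor
        node, hops = parent.get(node), hops + 1
    return data
```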
5.2 Strategies for Classifier Selection
For a given document, we need to train a specific classifier. It is therefore preferable to employ a lightweight classifier that does not take too much time to train, because a classifier over a different collection of categories may be required for each different document. If a classifier such as SVM were employed, the long training time might prevent us from delivering the results to the user in a timely manner. To this end, we prefer the Naive Bayes Classifier (NBC), since the probability estimates of NB can be computed offline. In the experimental section, we also give results for SVM and compare the efficiency and effectiveness of the two.
5.2.1 Standard NBC
Standard NBC estimates the probability that a test example belongs to a category by computing:

    P(c_i | d) ∝ P(d | c_i) P(c_i) = P(c_i) ∏_{j=1..N} P(t_j | c_i)^{d_j}    (1)

where c_i is a category, d is the test example, N is the vocabulary size, t_j is the j-th term in the vocabulary, and d_j is the corresponding value in d for term t_j (usually its term frequency). During the classification stage, the classifier assigns a category to the given document according to:

    c* = argmax_{c_i ∈ C} {P(c_i | d)} = argmax_{c_i ∈ C} {P(c_i) P(d | c_i)} = argmax_{c_i ∈ C} {P(c_i) ∏_{j=1..N} P(t_j | c_i)^{d_j}}    (2)

It is clear that the probabilities P(t_j | c_i) for each category c_i can be computed offline. NBC therefore takes much less training time than an SVM on the pruned hierarchies; it is a lightweight classifier.
5.2.2 N-gram Language Models as Classifiers
In NBC, terms are considered independent of each other given the category. However, in our situation, most of the candidate categories are very close to each other, and it is difficult for NBC to distinguish them based on independent term features. In this work, we propose to use a Markov n-gram language model to perform the classification over the candidate categories, taking into account the Markov dependency between adjacent terms [7][5].
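Equations (1) and (2) can be implemented in log space. The sketch below assumes Laplace smoothing (the paper does not specify the smoothing used for NBC) and precomputes the per-category statistics offline, as noted above:

```python
from collections import Counter
from math import log

def train_nb(docs_by_cat, vocab, alpha=1.0):
    """Offline step: log P(c) and Laplace-smoothed log P(t|c) per category."""
    total = sum(len(docs) for docs in docs_by_cat.values())
    model = {}
    for c, docs in docs_by_cat.items():
        counts = Counter(t for d in docs for t in d.split())
        denom = sum(counts.values()) + alpha * len(vocab)
        model[c] = (log(len(docs) / total),
                    {t: log((counts[t] + alpha) / denom) for t in vocab})
    return model

def classify_nb(model, doc, vocab):
    """Online step, Eq. (2): argmax_c log P(c) + sum_j d_j log P(t_j|c)."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[t] for t in doc.split() if t in vocab)
    return max(model, key=score)
```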
For a term sequence t_1 t_2 ... t_T, the probability of the sequence is written as:

    P(t_1 t_2 ... t_T) = ∏_{i=1..T} P(t_i | t_1 ... t_{i-1})    (3)

An n-gram model approximates this probability by assuming that the only terms relevant to predicting P(t_i | t_1 ... t_{i-1}) are the previous n-1 terms; that is, it makes the Markov n-gram independence assumption P(t_i | t_1 ... t_{i-1}) = P(t_i | t_{i-n+1} ... t_{i-1}). We make a straightforward maximum likelihood estimate of the n-gram probabilities from their observed frequencies in a corpus; different smoothing strategies have been proposed and evaluated in [5].
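A minimal 3-gram (n = 3) classifier along these lines, with add-alpha smoothing standing in for the smoothing strategies of [5] and a uniform prior assumed over the candidate categories:

```python
from collections import Counter
from math import log

def train_trigram(docs_by_cat):
    """Per-category trigram and bigram-context counts, plus vocabulary size."""
    models = {}
    for c, docs in docs_by_cat.items():
        tri, bi, vocab = Counter(), Counter(), set()
        for d in docs:
            toks = ["<s>", "<s>"] + d.split()   # pad so every term has context
            vocab.update(toks)
            for i in range(2, len(toks)):
                tri[tuple(toks[i - 2:i + 1])] += 1
                bi[tuple(toks[i - 2:i])] += 1
        models[c] = (tri, bi, len(vocab))
    return models

def classify_trigram(models, doc, alpha=0.1):
    """Pick the category maximizing sum_i log P(t_i | t_{i-2} t_{i-1}, c),
    with add-alpha smoothing (an assumed choice, not the paper's)."""
    def loglik(c):
        tri, bi, v = models[c]
        toks = ["<s>", "<s>"] + doc.split()
        return sum(log((tri[tuple(toks[i - 2:i + 1])] + alpha)
                       / (bi[tuple(toks[i - 2:i])] + alpha * v))
                   for i in range(2, len(toks)))
    return max(models, key=loglik)
```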

By using n-gram features for text classification, our prediction is:

    c* = argmax_{c_i ∈ C} {P(c_i | d)} = argmax_{c_i ∈ C} {P(c_i) P(d | c_i)} = argmax_{c_i ∈ C} {P(c_i) ∏_{i=1..T} P(t_i | t_{i-n+1} ... t_{i-1}, c_i)}    (4)

In this work, we use 3-grams for our classification, based on the result reported in [5], which states that 3-grams often give the best performance for text classification.

6. EXPERIMENTS
6.1 Experimental Setup
6.1.1 Dataset

Figure 5. Documents Distribution on Different Levels
Figure 6. Categories Distribution on Different Levels

To evaluate the performance of our algorithm, experiments are conducted on a set of classified Web pages extracted from the Open Directory Project (ODP). ODP has about 4,8,87 Web pages and 172,548 categories, in which each Web page is classified by human experts into 17 top-level categories (Arts, Business and Economy, Computers and Internet, Games, Health, Home, Kids and Teens, News, Recreation, Reference, Regional, Science, Shopping, Society, Sports, Adult and World). Because the Web pages in the Regional category are also included in other categories, and because many Web pages in the World category are not written in English, these two categories are removed in our experiments. Accordingly, 15 top-level categories in all are used in the experiments. After downloading from the Web, we obtain about 1.3 million Web documents, which are divided into a training set and a testing set. The distribution of these Web pages over the 130,000 categories is shown in Figure 5: about 76.8% of the documents belong to categories in the top six levels, and about 68.6% of the documents belong to fourth-to-sixth-level categories. The distribution of the 130,000 categories is shown in Figure 6: about 67.8% of the categories are in the top six levels, and about 64% of the categories are at levels four to six. This shows that classifying Web pages into deep categories is very important. As mentioned in Section 1, the number of categories related to a given document is small. Here we present statistics on the number of categories per document. As shown in Table 1, about 93.46% of the documents belong to exactly one category.
Only 6.54% of the documents have two or more categories. It is thus reasonable to select a small subset of the large-scale hierarchy on which to perform classification for this dataset.

Table 1. Category Number Distribution (the number of documents and the percentage for documents belonging to 1, 2, 3, and >=4 categories; about 93.46% of the documents belong to exactly one category)

Since the whole dataset is too large, we sample documents from the 1.3 million as the testing data. Furthermore, in order to tune the performance of the different strategies, additional documents are randomly selected as validation data. The remaining data are used as the training data. We build the document index and the category index used in the related-category search stage.
6.1.2 Evaluation Metrics
In typical classification experiments, the number of documents is usually an order of magnitude greater than the number of categories. However, the number of target categories in our tests exceeds 130,000, and conducting experiments with a comparably large number of testing documents would be very time-consuming. To avoid the undefined-value problem of Macro-F1 measurements over so many categories, we use the level-wise Mi-F1 metric described in [2] and measure Mi-F1 at each level. The evaluation process is as follows. First, we classify a document into the whole deep hierarchy; for example, a Web page p can be classified into the category Top/Computers/Programming/Languages/JavaScript/W3C_DOM. Then, we evaluate the performance at each level of the hierarchy according to the assigned category. That is, when evaluating performance at level one, we judge whether p belongs to Top/Computers; when evaluating performance at level two, we judge whether p belongs to Top/Computers/Programming. This differs from the traditional method, which trains a classifier at level 1 or level 2 by aggregating the data of the children nodes into their parent categories and evaluates performance only at that level.
6.2 Overall Performance
Three algorithms are compared in this work:
- Hierarchical SVM: Top-down classification is an efficient algorithm.
In this work, we employ the hierarchical SVM as a representative algorithm for top-down classification.
- Search-based Strategy: As described in our deep-classification algorithm, we can take the most similar category as the category for the given document, which is similar to a nearest-neighbor approach.
- Deep Classification: This is our proposed algorithm. As mentioned, there are several strategies for each step; we tune these strategies in Section 6.3 and then adopt the ones that achieve the highest performance. The top-ranked categories are taken as category candidates, and category-based search, the ancestor-assistant strategy and the 3-gram language model classifier are used as the setting for deep classification.
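The level-wise evaluation of Section 6.1.2 can be sketched as follows; in the single-label case, micro-averaged F1 at a level reduces to accuracy over the documents whose true category reaches that level (names are illustrative):

```python
def truncate(path, level):
    """Keep the first `level` components below the root of a category path."""
    return "/".join(path.split("/")[:level + 1])   # component 0 is "Top"

def mi_f1_at_level(pairs, level):
    """pairs: (predicted_path, true_path) tuples. Truncate both paths to
    `level` and compare; single-label micro-F1 equals accuracy."""
    scored = [(truncate(p, level), truncate(t, level)) for p, t in pairs
              if len(t.split("/")) > level]        # document reaches this level
    if not scored:
        return 0.0
    return sum(p == t for p, t in scored) / len(scored)
```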

Each algorithm is tuned to achieve its highest performance on the validation data. The overall performance of the three algorithms is shown in Figure 7.

Figure 7. Performance at Different Levels (Search-based Strategy, Hierarchical SVM, Deep Classification)

As shown in Figure 7, our proposed deep-classification algorithm achieves consistent improvements over the other algorithms at all levels of the hierarchy. The performance of our algorithm reaches 51.8% at level 5, while the hierarchical SVM achieves only 29.2% at the same level; that is, our algorithm obtains about a 77.4% improvement over the top-down approach at level 5. By using the two-stage scheme, our algorithm can make accurate classifications on a pruned hierarchy. Since the hierarchical SVM proceeds top-down, the structure of the hierarchy is not properly utilized, as discussed above, and errors at higher levels are propagated to the deeper levels; as a result, deep-level classification cannot achieve good performance. Another reason is that the hierarchical SVM cannot construct training sets of sufficient size when learning the deep categories of the hierarchy, which further reduces its performance on the deep-level categories. Furthermore, as shown in Figure 7, the deep-classification algorithm also achieves higher performance than the search-based strategy. This shows that the classification stage of the deep-classification algorithm is indeed necessary, and leads to more precise results in the deep hierarchy.
6.3 Strategy Selection
In this section, we evaluate the different strategies used in each stage of the proposed deep-classification algorithm. All strategies are tested on randomly chosen documents from the validation data.
We tune these strategies one by one, fixing the other strategies while tuning each one.
6.3.1 Search Strategy
As proposed in Section 4, there are two strategies for finding the category candidates for a new document: the document-based strategy and the category-based strategy. Here we evaluate which strategy produces higher performance. The NB classifier is used for its simplicity, and all top-ranked categories are used. The experimental results are shown in Figure 8. As the figure shows, the category-based strategy produces higher performance than the document-based strategy at every level; at level 5, the category-based strategy achieves a 69.2% improvement over the document-based strategy on the measure of Mi-F1. We explain this observation by the fact that the similarity scores between a given document and a few retrieved documents of a category cannot represent the similarity between the whole category and the given document: the category provides more information than any individual document within it. Furthermore, the time cost of the category-based strategy is much lower than that of the document-based strategy. We therefore use the category-based strategy in the search stage of the deep-classification algorithm.

Figure 8. Performance of Different Search Strategies (Category-Based, Document-Based)

6.3.2 Candidate Category Number Selection
In the search stage, the system can return different numbers of category candidates, so we must decide how many top-ranked categories to use as candidates. If we choose only one category, the two-stage method degenerates into the search-based strategy alone. We perform the evaluation on the tuning data; the experimental results are reported in Figure 9. As shown in Figure 9, the more categories chosen in the search stage, the more likely we are to find the correct target category in the classification stage. However, too many categories also increase the training time in the classification stage.

Figure 9. Performance with Different Numbers of Category Candidates
As shown in the figure, the performance on the top three levels decreases, although only very slightly, as the number of candidate categories is increased. At the deeper levels, however, the performance increases significantly and then tends to become stable. Thus, the number of category candidates is set as a trade-off between time complexity and performance. In the following experiments, we use the category-based search strategy and take the top-ranked categories as category candidates.
6.3.3 Feature Selection
Given the category candidates found in the search stage for a new document, a large hierarchy is reduced to a small one. In our problem, however, the total number of features is still very large in most situations. To address this, we carry out feature selection and report the performance with different numbers of features. We use CHI-square feature selection, which has been verified as one of the best feature selection methods for text classification [23]. Two different learning methods are evaluated: hierarchical SVM and naive Bayes (NB). As shown in Figure 10, the performance with the selected features is similar to that with the full feature set, but it is an obvious advantage that fewer features can reduce the time of training and testing.
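The CHI-square score used here is computed from a 2x2 contingency table per (term, category) pair; a sketch with an assumed table layout (not the paper's notation):

```python
def chi_square(n11, n10, n01, n00):
    """CHI-square for one (term, category) pair:
    n11 = docs in c containing t,  n10 = docs outside c containing t,
    n01 = docs in c without t,     n00 = docs outside c without t."""
    n = n11 + n10 + n01 + n00
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0

def select_features(term_tables, k):
    """Rank terms by their maximum CHI-square over categories; keep the top k.
    term_tables: term -> {category: (n11, n10, n01, n00)}."""
    scored = {t: max(chi_square(*cells) for cells in tables.values())
              for t, tables in term_tables.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

A term that occurs uniformly across categories (like a stopword) gets a score near zero, while a term concentrated in one category scores highly.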

7 can reduce tme of tranng and testng. Therefore, n ths work, the feature number s lmted to 2 selected by CHI-Square feature selecton. M-F SVM NB+CHI NB SVM+CHI Fgure. Performance on Feature Selecton Tranng Data Selecton Based on the pruned herarchy, we consdered three strateges of tranng data selecton for further classfcaton. In order to show the performance of dfferent strateges, we conduct an experment on the small herarchy generated from the category canddates usng the naïve Bayesan classfer. The expermental results are shown n Fgure. As shown n the fgure, we can fnd the Ancestor- Assstant strategy for tranng data selecton can acheve hghest performance. There are about 3.6% and 9.5% mprovement over the herarchcal strategy and the flat strategy on the measurement of M-F, respectvely, at level 5. Flat.8 Pruned Top-Down Ancestor-Assstant Fgure. Performance on Dfferent Strateges on Tranng Data Selecton As shown n these fgures, we can fnd that the performance of the flat strategy s lower than that of the Ancestor-Assstant strategy snce ths strategy gnores the structure of the herarchy. Thus t cannot acqure enough tranng data at some cases snce the nformaton from the ancestors s not used to enhance the classfer. The nformaton from the ancestors s vtally mportant when the tranng data from the category canddate tself s nsuffcent. The performance of the flat strategy wll be very poor n ths case. Ths experment also proves that usng rch structure of herarchcal categores can enhance the performance of large scale classfcaton, whch s largely gnored n [3]. The low performance of the Top-down strategy s due to two factors: () In the top-down scheme, error rates are accumulated at each level whch gradually reach an unbearable amount at some deep level of the herarchy. Ths problem s overcome n our flat and Ancestor-Assstant strateges where the classfcaton s performed usng a flat classfer. 
(2) The training data from an ancestor may be too general and cannot characterize the category candidates. In other words, this method improperly utilizes the structure information and thus introduces noise when supplementing the training examples. For example, in Figure 2, training data from categories 834 and 854 are used to train the classifiers that classify the documents in categories 874 and 92, respectively. Our Ancestor-Assistant strategy overcomes this problem, since generalized information from the structure and specific information from the category itself are employed together.

6.3.5 Classifier Selection

Classifier selection is a key step in obtaining the final category for a new document. Since the model is trained on the fly once a document is given, NB and 3-gram NB are proposed in view of their efficiency. Here we conduct experiments to compare the performance of the two algorithms, and also compare them with SVM. We further report the performance of SVM with features generated by the 3-gram language model, which we call 3-gram SVM. As shown in Figure 12, our proposed 3-gram based classification method achieves higher performance than traditional NB. Since the candidate categories are very similar to one another, it is difficult for NB to distinguish them without considering the dependency between words. Another explanation is that the category candidates were acquired from independent term features; if we still rely on such features for classification, the effectiveness of the classifiers decreases. The 3-gram classifier takes associated terms into account, and thus uses more discriminative features than the NB method. As a result, the 3-gram classifier achieves higher performance.

[Figure 12. Performance on Different Classifier Selections (3-Gram NB, 3-Gram SVM, NB, SVM; Mi-F1)]

Generally, the SVM and 3-gram SVM based algorithms achieve higher performance than the NB and 3-gram NB algorithms, respectively. However, the second stage of deep classification needs an efficient classifier because of the online computation.
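The Ancestor-Assistant idea above — use a candidate category's own training documents first, and borrow from its ancestors only when they are insufficient — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sufficiency threshold MIN_DOCS and all function names are assumptions.

```python
# Sketch of Ancestor-Assistant training-data selection: each candidate
# contributes its own documents; when these are too few, documents from
# progressively higher ancestors are added until a threshold is reached.
MIN_DOCS = 3  # assumed sufficiency threshold (illustrative)

def ancestors(parent, cat):
    """Yield the ancestors of cat, from nearest parent up to the root."""
    while cat in parent:
        cat = parent[cat]
        yield cat

def collect_training_data(docs_by_cat, parent, candidate):
    """Own documents first; top up from ancestors until MIN_DOCS is met."""
    data = list(docs_by_cat.get(candidate, []))
    for anc in ancestors(parent, candidate):
        if len(data) >= MIN_DOCS:
            break
        data.extend(docs_by_cat.get(anc, []))
    return data

# Toy hierarchy: root -> science -> biology
parent = {"science": "root", "biology": "science"}
docs_by_cat = {
    "root": ["d_root"],
    "science": ["d_sci1", "d_sci2"],
    "biology": ["d_bio1"],
}
print(collect_training_data(docs_by_cat, parent, "biology"))
# prints ['d_bio1', 'd_sci1', 'd_sci2']
```

Note how the sketch keeps the specific information (the candidate's own documents) while adding generalized information from ancestors only on demand, which is the trade-off the flat and top-down strategies each miss in one direction.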
If we use the 3-gram based SVM, training the model in the online step is very time-consuming. Hence, in this work, 3-gram NB is taken as the second-stage classifier for its combination of high performance and efficiency.

[Figure 13. Performance of Different Classifiers on Far-Distance Categories (3-Gram NB, 3-Gram SVM, NB, SVM; Dataset 1, Dataset 2, Dataset 3, AVG)]

We also conducted additional experiments to validate this conclusion. We randomly picked three groups of deep categories, each containing three categories that are far apart from one another (they differ at the first level). We then ran the 3-gram NB classifier, NB, 3-gram SVM, and SVM with a linear kernel on the same training and testing data for each category group. As shown in Figure 13, these classifiers achieve comparable performance, and SVM and 3-gram SVM achieve better performance than NB and the 3-gram classifier, respectively.
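To make the NB vs. 3-gram NB distinction concrete, the following is a minimal word-trigram naive Bayes scorer in the spirit of the second-stage classifier. It is a sketch under simplifying assumptions (add-one smoothing, uniform class prior, whitespace tokenization), not the paper's exact model; the class and method names are illustrative.

```python
# Minimal 3-gram (word-trigram) naive Bayes classifier: categories are
# scored by the smoothed log-likelihood of the document's trigrams, so
# word-order dependencies help separate very similar categories.
from collections import Counter
from math import log

def ngrams(tokens, n=3):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

class TrigramNB:
    def __init__(self):
        self.counts = {}   # category -> Counter of trigrams
        self.totals = {}   # category -> total trigram count
        self.vocab = set()

    def train(self, category, docs):
        c = self.counts.setdefault(category, Counter())
        for doc in docs:
            grams = ngrams(doc.split())
            c.update(grams)
            self.vocab.update(grams)
        self.totals[category] = sum(c.values())

    def classify(self, doc):
        grams = ngrams(doc.split())
        def score(cat):
            n, v = self.totals[cat], len(self.vocab)
            # add-one smoothed log-likelihood; uniform prior assumed
            return sum(log((self.counts[cat][g] + 1) / (n + v)) for g in grams)
        return max(self.counts, key=score)

nb = TrigramNB()
nb.train("ml", ["support vector machine training data", "machine learning model"])
nb.train("db", ["relational database query optimization", "database index structure"])
print(nb.classify("support vector machine model"))  # prints "ml"
```

Because scoring only counts the document's trigrams against per-category tables, the per-document cost grows with document length and the number of candidate categories, which is what makes this family of classifiers affordable for the online second stage.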

6.3.6 Time Complexity

The indexing process and the training of the NB classifier and the 3-gram language model for classification are conducted off-line. The time complexity of the online computation is estimated as follows. As estimated in [24], the average times for document-based search and category-based search are O(n·l_n²/V) + O(n) and O(m·l_n²/V) + O(m), respectively, where l_n is the average length of new documents, V is the vocabulary size, and m and n are the numbers of categories and training documents, respectively. Since n is much larger than m, the testing time of category-based search is lower than that of document-based search. For the classification stage, we perform classification only on a narrow hierarchy. Assuming m′ candidate categories, where m′ is a small constant, the time cost is about O(l_d·m′ + m′·log m′) for NB and about O(l_d³·m′ + m′·log m′) for the 3-gram language model. The online time complexity is therefore acceptable, which indicates that our algorithm is scalable and can handle very large hierarchies efficiently.

7. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a novel algorithm for Web classification over a large-scale text hierarchy. A two-stage algorithm is presented, consisting of a search stage and a classification stage. The search stage prunes the original large hierarchy into a small and tractable one, and the structure of the original hierarchy is considered when we train a classifier in the classification stage. As a result, our method is both efficient and effective in handling very large hierarchies. Experimental results showed that our proposed algorithm achieves a 77.7% improvement in accuracy over the top-down based SVM classification algorithm at the 5th level of the large-scale hierarchy. As future work, we will extend the deep classification algorithm to other kinds of applications, such as online advertisement classification. Another direction is to improve the efficiency of the search-stage algorithm of deep classification; we will develop more effective indexing algorithms to improve the classification performance. 8.
REFERENCES
[1] Broder, A., Fontoura, M., Josifovski, V., and Riedel, L. A Semantic Approach to Contextual Advertising. In Proc. of ACM SIGIR '07, ACM, New York, NY, 2007.
[2] Cai, L. and Hofmann, T. Hierarchical Document Categorization with Support Vector Machines. In Proc. of CIKM 2004, 2004.
[3] Chakrabarti, S., Dom, B., Agrawal, R., and Raghavan, P. Scalable Feature Selection, Classification and Signature Generation for Organizing Large Text Databases into Hierarchical Topic Taxonomies. The VLDB Journal, vol. 7, no. 3, 1998.
[4] Chekuri, C., Goldwasser, M., Raghavan, P., and Upfal, E. Web Search Using Automatic Classification. In Proc. of ACM WWW-96, San Jose, US, 1996.
[5] Chen, H. and Dumais, S. Bringing Order to the Web: Automatically Categorizing Search Results. In Proc. of CHI 2000, 2000.
[6] Dumais, S. and Chen, H. Hierarchical Classification of Web Content. In Proc. of the 23rd ACM SIGIR, 2000.
[7] Gao, J. F., Nie, J. Y., Wu, G. Y., and Cao, G. H. Dependence Language Model for Information Retrieval. In Proc. of the 27th ACM SIGIR, pp. 170-177, ACM Press, 2004.
[8] 578/.
[9] Koller, D. and Sahami, M. Hierarchically Classifying Documents Using Very Few Words. In Proc. of the 14th ICML, 1997.
[10] Labrou, Y. and Finin, T. W. Yahoo! as an Ontology: Using Yahoo! Categories to Describe Documents. In Proc. of the 8th ACM CIKM, 1999.
[11] Lewis, D. D., Yang, Y., Rose, T. G., and Li, F. RCV1: A New Benchmark Collection for Text Categorization Research. Journal of Machine Learning Research, vol. 5, pp. 361-397, 2004.
[12] Liu, T.-Y., Yang, Y.-M., Wan, H., Zeng, H.-J., Chen, Z., and Ma, W.-Y. Support Vector Machines Classification with a Very Large-scale Taxonomy. SIGKDD Explorations, 7(1), 2005.
[13] Madani, O., Greiner, W., Kempe, D., and Salavatipour, M. Recall Systems: Efficient Learning and Use of Category Indices. In Proc. of AISTATS, 2007.
[14] McCallum, A., Rosenfeld, R., Mitchell, T., and Ng, A. Y. Improving Text Classification by Shrinkage in a Hierarchy of Classes. In Proc. of ICML-98, 1998.
[15] Peng, F. C., Schuurmans, D., and Wang, S. J. Augmenting Naive Bayes Text Classifier with Statistical Language Models. Information Retrieval, 7(3-4), Kluwer Academic Publishers, 2004.
[16] Sasaki, M. and Kita, K. Rule-based Text Categorization Using Hierarchical Categories. In Proc. of the IEEE Int. Conf. on Systems, Man, and Cybernetics, 1998.
[17] Sebastiani, F. Machine Learning in Automated Text Categorization. ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.
[18] Sun, A. and Lim, E.-P. Hierarchical Text Classification and Evaluation. In Proc. of IEEE ICDM, IEEE Computer Society, 2001.
[19] Wang, K., Zhou, S., and He, Y. Hierarchical Classification of Real Life Documents. In Proc. of the 1st SIAM Int. Conf. on Data Mining, Chicago, 2001.
[20] Xing, D.-K., Xue, G.-R., Yang, Q., and Yu, Y. Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies. In Proc. of ACM WSDM 2008, 2008.
[21] Yang, Y. An Evaluation of Statistical Approaches to Text Categorization. Journal of Information Retrieval, vol. 1, no. 1/2, 1999.
[22] Yang, Y. and Liu, X. A Re-examination of Text Categorization Methods. In Proc. of ACM SIGIR '99, pp. 42-49, 1999.
[23] Yang, Y. and Pedersen, J. O. A Comparative Study on Feature Selection in Text Categorization. In Proc. of the 14th ICML, pp. 412-420, 1997.
[24] Yang, Y., Zhang, J., and Kisiel, B. A Scalability Analysis of Classifiers in Text Categorization. In Proc. of ACM SIGIR '03, pp. 96-103, 2003.

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies Deep Classfer: Automatcally Categorzng Search Results nto Large-Scale Herarches Dkan Xng 1, Gu-Rong Xue 1, Qang Yang 2, Yong Yu 1 1 Shangha Jao Tong Unversty, Shangha, Chna {xaobao,grxue,yyu}@apex.sjtu.edu.cn

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Experiments in Text Categorization Using Term Selection by Distance to Transition Point

Experiments in Text Categorization Using Term Selection by Distance to Transition Point Experments n Text Categorzaton Usng Term Selecton by Dstance to Transton Pont Edgar Moyotl-Hernández, Héctor Jménez-Salazar Facultad de Cencas de la Computacón, B. Unversdad Autónoma de Puebla, 14 Sur

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

A User Selection Method in Advertising System

A User Selection Method in Advertising System Int. J. Communcatons, etwork and System Scences, 2010, 3, 54-58 do:10.4236/jcns.2010.31007 Publshed Onlne January 2010 (http://www.scrp.org/journal/jcns/). A User Selecton Method n Advertsng System Shy

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status Internatonal Journal of Appled Busness and Informaton Systems ISSN: 2597-8993 Vol 1, No 2, September 2017, pp. 6-12 6 Implementaton Naïve Bayes Algorthm for Student Classfcaton Based on Graduaton Status

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

A Novel Term_Class Relevance Measure for Text Categorization

A Novel Term_Class Relevance Measure for Text Categorization A Novel Term_Class Relevance Measure for Text Categorzaton D S Guru, Mahamad Suhl Department of Studes n Computer Scence, Unversty of Mysore, Mysore, Inda Abstract: In ths paper, we ntroduce a new measure

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Issues and Empirical Results for Improving Text Classification

Issues and Empirical Results for Improving Text Classification Issues and Emprcal Results for Improvng Text Classfcaton Youngoong Ko 1 and Jungyun Seo 2 1 Dept. of Computer Engneerng, Dong-A Unversty, 840 Hadan 2-dong, Saha-gu, Busan, 604-714, Korea yko@dau.ac.kr

More information

Efficient Text Classification by Weighted Proximal SVM *

Efficient Text Classification by Weighted Proximal SVM * Effcent ext Classfcaton by Weghted Proxmal SVM * Dong Zhuang 1, Benyu Zhang, Qang Yang 3, Jun Yan 4, Zheng Chen, Yng Chen 1 1 Computer Scence and Engneerng, Bejng Insttute of echnology, Bejng 100081, Chna

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier

Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier Usng Ambguty Measure Feature Selecton Algorthm for Support Vector Machne Classfer Saet S.R. Mengle Informaton Retreval Lab Computer Scence Department Illnos Insttute of Technology Chcago, Illnos, U.S.A

More information

Learning to Classify Documents with Only a Small Positive Training Set

Learning to Classify Documents with Only a Small Positive Training Set Learnng to Classfy Documents wth Only a Small Postve Tranng Set Xao-L L 1, Bng Lu 2, and See-Kong Ng 1 1 Insttute for Infocomm Research, Heng Mu Keng Terrace, 119613, Sngapore 2 Department of Computer

More information

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers Journal of Convergence Informaton Technology Volume 5, Number 2, Aprl 2010 Investgatng the Performance of Naïve- Bayes Classfers and K- Nearest Neghbor Classfers Mohammed J. Islam *, Q. M. Jonathan Wu,

More information

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks Federated Search of Text-Based Dgtal Lbrares n Herarchcal Peer-to-Peer Networks Je Lu School of Computer Scence Carnege Mellon Unversty Pttsburgh, PA 15213 jelu@cs.cmu.edu Jame Callan School of Computer

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Impact of a New Attribute Extraction Algorithm on Web Page Classification Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

Distribution Statement A: Approved for public release; distribution is unlimited. An Entropy-Based Approach to Integrated Information Needs Assessment June 8, 2004 William J. Farrell Lockheed Martin Advanced Technology

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Science and Technology, East China Normal University 500 Dongchuan Road, Shanghai 200241, P. R. China E-MAIL: slsun@cs.ecnu.edu.cn,

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering Clustering Fundamental to all clustering techniques is the choice of distance measure between data points; D(x_i, x_j) = Σ_{k=1}^{q} (x_{ik} − x_{jk})² Squared Euclidean distance Assumption: All features
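The distance named in the snippet above is the squared Euclidean distance, the sum over features of squared coordinate differences. A minimal sketch in Python (function name is illustrative):

```python
def squared_euclidean(x, y):
    """Squared Euclidean distance: sum over features k of (x_k - y_k)^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# e.g. points (0, 0) and (3, 4): 3^2 + 4^2
squared_euclidean((0, 0), (3, 4))  # → 25
```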

Machine Learning: Algorithms and Applications

14/05/12 Machine Learning: Algorithms and Applications Floriano Zini Free University of Bozen-Bolzano Faculty of Computer Science Academic Year 2011-2012 Lecture 10: 14 May 2012 Unsupervised Learning cont Slides courtesy of

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University sertekin@cse.psu.edu ABSTRACT Ever since computers were invented, mankind wondered whether they might be made

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Science and Information Engineering Tamkang University 151 Ying-chuan

Query classification using topic models and support vector machine

Query classification using topic models and support vector machine Dieu-Thu Le University of Trento, Italy dieuthu.le@disi.unitn.it Raffaella Bernardi University of Trento, Italy bernardi@disi.unitn.it Abstract This paper describes

Incremental MQDF Learning for Writer Adaptive Handwriting Recognition 1

2010 12th International Conference on Frontiers in Handwriting Recognition Incremental MQDF Learning for Writer Adaptive Handwriting Recognition Kai Ding, Lianwen Jin * School of Electronic and Information Engineering, South

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat-2.108 Independent research projects in applied mathematics (3 cr) Antti Laukkanen 506 R ajlaukka@cc.hut.fi 2 Introduction...3 2 Multiattribute

Query Clustering Using a Hybrid Query Similarity Measure

Query clustering using a hybrid query similarity measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transaction on Computers, 3(3), 700-705. Query Clustering Using a Hybrid Query Similarity Measure Lin Fu, Dion Hoe-Lian

Fast Feature Value Searching for Face Detection

Vol., No. 2 Computer and Information Science Fast Feature Value Searching for Face Detection Yunyang Yan Department of Computer Engineering Huaiyin Institute of Technology Huai an 22300, China E-mail: areyyyke@163.com

A Background Subtraction for a Vision-based User Interface *

A Background Subtraction for a Vision-based User Interface * Dongpyo Hong and Woontack Woo KJIST U-VR Lab. {dhon wwoo}@kjist.ac.kr Abstract In this paper, we propose a robust and efficient background subtraction

USING GRAPHING SKILLS

Name: BIOLOGY: Date: _ Class: USING GRAPHING SKILLS INTRODUCTION: Recorded data can be plotted on a graph. A graph is a pictorial representation of information recorded in a data table. It is used to show a relationship

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Guihong Cao Stephen Robertson Jian-Yun Nie Dept. of Computer Science and Operations Research Microsoft Research at Cambridge Dept. of Computer

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

4/14/2011 Outline Discriminative classifiers for image recognition Wednesday, April 13 Kristen Grauman UT-Austin Last time: window-based generic object detection basic pipeline face detection with boosting as case study Today:

S1 Note. Basis functions.

S1 Note. Basis functions. Contents Types of basis functions...1 The Fourier basis...2 B-spline basis...3 Power and type I error rates with different numbers of basis functions...4 Table S1. Simulation results of type

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Omar Javed Saad Ali Mubarak Shah Computer Vision Lab School of Computer Science University of Central Florida Orlando, FL 32816

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Jiannan Wang Guoliang Li Jianhua Feng Department of Computer Science and Technology, Tsinghua National Laboratory for Information

Advanced in Control Engineering and Information Science

Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 1642 1646 Procedia Engineering www.elsevier.com/locate/procedia Advanced

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Hilario Gómez-Moreno, Saturnino Maldonado-Bascón, Francisco López-Ferreras Signal Theory and Communications Department. University of Alcalá Crta. Madrid-Barcelona

Module Management Tool in Software Development Organizations

Journal of Computer Science (5): 8-, 7 ISSN 59-66 7 Science Publications Module Management Tool in Software Development Organizations Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahliyyah Amman University,

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting Spring 2004 Ahmed Elgammal Dept of Computer Science CS 534 Model Fitting - 1 Outlines Model fitting is important Least-squares fitting Maximum likelihood estimation MAP estimation Robust

Data Mining: Model Evaluation

Data Mining: Model Evaluation April 16, 2013 1 Issues: Evaluating Classification Methods Accuracy classifier accuracy: predicting class label predictor accuracy: guessing value of predicted attributes Speed time to construct

A Weighted Method to Improve the Centroid-based Classifier

2016 International Conference on Electrical Engineering and Automation (IEEA 2016) ISBN: 978-1-60595-407-3 A Weighted Method to Improve the Centroid-based Classifier Chuan LIU, Wen-yong WANG *, Guang-hui TU, Nan-nan LIU and

A Similarity-Based Prognostics Approach for Remaining Useful Life Estimation of Engineered Systems

2008 INTERNATIONAL CONFERENCE ON PROGNOSTICS AND HEALTH MANAGEMENT A Similarity-Based Prognostics Approach for Remaining Useful Life Estimation of Engineered Systems Tianyi Wang, Jianbo Yu, David Siegel, and Jay Lee

An Improvement to Naive Bayes for Text Classification

Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 2160 2164 Advanced in Control Engineering and Information Science An Improvement to Naive Bayes for Text Classification Wei Zhang a, Feng Gao a, a*

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening a useful tool for metamodels training and multi-objective optimization L. Ricco, E. Rigoni, A. Turco Outline RSM Introduction Possible coupling Test case MOO MOO with Game Theory

X- Chart Using ANOM Approach

ISSN 1684-8403 Journal of Statistics Volume 17, 2010, pp. 3-3 Abstract X- Chart Using ANOM Approach Gullapalli Chakravarthi 1 and Chaluvadi Venkateswara Rao Control limits for individual measurements (X) chart are

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights Pushpinder Singh School of Mathematics Computer Applications Thapar University, Patiala-147 004 India pushpindersnl@gmail.com ABSTRACT Ranking of fuzzy sets plays

A Topology-aware Random Walk

A Topology-aware Random Walk Inkwan Yu, Richard Newman Dept. of CISE, University of Florida, Gainesville, Florida, USA Abstract When a graph can be decomposed into clusters of well connected subgraphs, it is possible

An Image Fusion Approach Based on Segmentation Region

Rong Wang, Li-Qun Gao, Shu Yang, Yu-Hua Chai, and Yan-Chun Liu An Image Fusion Approach Based on Segmentation Region Rong Wang, Li-Qun Gao, Shu Yang 3, Yu-Hua

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

The 25th Workshop on Combinatorial Mathematics and Computation Theory Incremental Learning with Support Vector Machines and Fuzzy Set Theory Yu-Ming Chuang 1 and Chia-Hwa Lin 2* 1 Department of Computer Science and

Private Information Retrieval (PIR)

2 Levente Buttyán Problem formulation Alice wants to obtain information from a database, but she does not want the database to learn which information she wanted e.g., Alice is an investor querying a stock-market

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity

Journal of Signal and Information Processing, 2013, 4, 114-119 doi:10.4236/jsip.2013.43b00 Published Online August 2013 (http://www.scirp.org/journal/jsip) Corner-Based Image Alignment using Pyramid Structure with Gradient

A Taxonomy Fuzzy Filtering Approach

JOURNAL OF AUTOMATIC CONTROL, UNIVERSITY OF BELGRADE, VOL. 13(1):25-29, 2003 A Taxonomy Fuzzy Filtering Approach S. Vrettos and A. Stafylopatis Abstract - Our work proposes the use of topic taxonomies as part

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. However, the shift taken place at sample #21 is not apparent. 92 For this set samples,
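The Cusum chart named above accumulates deviations from the target mean beyond an allowance k, which is why it surfaces small shifts an x chart misses. A minimal sketch of the standard one-sided upper Cusum statistic, C_i = max(0, x_i − (μ0 + k) + C_{i−1}), in Python (names illustrative, not from the book):

```python
def cusum_upper(samples, mu0, k):
    """One-sided upper Cusum: C_i = max(0, x_i - (mu0 + k) + C_{i-1}), C_0 = 0."""
    c, out = 0.0, []
    for x in samples:
        c = max(0.0, x - (mu0 + k) + c)
        out.append(c)
    return out

# target mean 10, allowance 1: only the excursion to 12 accumulates
cusum_upper([10, 12, 9], mu0=10, k=1)  # → [0.0, 1.0, 0.0]
```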

SI485i : NLP. Set 5 Using Naïve Bayes

SI485i : NLP Set 5 Using Naïve Bayes Motivation We want to predict something. We have some text related to this something. something = target label text = text features Given, what is the most probable? Motivation: Author
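The prediction task sketched above (given text features, pick the most probable label) is the usual naïve Bayes argmax over log P(label) + Σ log P(word | label). A minimal hedged sketch in Python with a tiny probability floor for unseen words (function and parameter names are ours):

```python
import math

def nb_predict(text_words, priors, likelihoods, floor=1e-6):
    """Naive Bayes: argmax over labels of log P(label) + sum of log P(word|label)."""
    best, best_score = None, -math.inf
    for label, prior in priors.items():
        score = math.log(prior)
        for w in text_words:
            # unseen words get a small floor probability instead of zero
            score += math.log(likelihoods[label].get(w, floor))
        if score > best_score:
            best, best_score = label, score
    return best

# toy model: "win" is far more likely under the spam label
nb_predict(["win"], {"spam": 0.5, "ham": 0.5},
           {"spam": {"win": 0.9}, "ham": {"win": 0.1}})  # → 'spam'
```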

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorina Purcaru Faculty of Automation, Computers and Electronics University of Craiova 13 Al. I. Cuza Street, Craiova RO-1100 ROMANIA E-mail: dpurcaru@electronics.ucv.ro
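The k-nearest neighbor rule named in this entry assigns a query the majority label among its k closest training samples. A minimal sketch in Python using squared Euclidean distance (illustrative, not the paper's implementation):

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """k-NN rule: majority label among the k training points closest to query.

    train is a list of (point, label) pairs; points are coordinate tuples.
    """
    by_dist = sorted(train,
                     key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# two 'a' points near the origin outvote the distant 'b' cluster
train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b")]
knn_classify(train, (0, 0), k=3)  # → 'a'
```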

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples

94 JOURNAL OF COMPUTERS, VOL. 4, NO. 1, JANUARY 2009 Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples Bangzuo Zhang College of Computer Science and Technology, Jilin University,

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quintiliano 1 & Antonio Santa-Rosa 1 Federal Police Department, Brasilia, Brazil. E-mails: quintiliano.pqs@dpf.gov.br and

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision Javier Civera, University of Zaragoza Andrew J. Davison, Imperial College London J.M.M Montiel, University of Zaragoza. josemar@unizar.es, jcivera@unizar.es,