Pruning Training Corpus to Speedup Text Classification


Jihong Guan and Shuigeng Zhou
School of Computer Science, Wuhan University, Wuhan, China
State Key Lab of Software Engineering, Wuhan University, Wuhan 430072, China

Abstract: With the rapid growth of online text information, efficient text classification has become one of the key techniques for organizing and processing text repositories. In this paper, an efficient text classification approach is proposed based on pruning the training corpus. With the proposed approach, noisy and superfluous documents in training corpora can be cut off drastically, which leads to a substantial improvement in classification efficiency. An effective algorithm for training corpus pruning is proposed. Experiments over the commonly used Reuters benchmark are carried out, which validate the effectiveness and efficiency of the proposed approach.

Keywords: text classification; fast classification; k-nearest neighbor (kNN); training-corpus pruning.

1 Introduction

As the amount of on-line textual information increases by leaps and bounds, effective retrieval is difficult without the support of appropriate indexing and summarization of text content. Text classification is one solution to this problem. By placing documents into different classes according to their respective contents, retrieval can be done by first locating a specific class of documents relevant to the query and then searching the targeted documents within the selected class, which is significantly more efficient and reliable than searching the whole document repository. Text classification has been a hot research topic in machine learning and information retrieval, and a number of methods for text classification have been proposed [1, 2]. Among the existing methods, kNN is the simplest strategy: it searches for the k nearest training documents to the test document and uses the classes assigned to those training documents to decide the class of the test document [3, 4, 5, 6]. The kNN classification method is easy to implement, for it does not require the classifier-training phase that other classification methods must have.
This work was supported by the Natural Science Foundation of China (NSFC) and the Provincial Natural Science Foundation of Hubei of China (No. 2002ABB050).

R. Cicchetti et al. (Eds.): DEXA 2002, LNCS 2453, pp. 831-840, 2002. © Springer-Verlag Berlin Heidelberg 2002

Furthermore, experimental research shows that the kNN method offers promising performance in text

classification [2, 6]. However, the kNN method is of low efficiency because it requires a large amount of computational power for evaluating a measure of the similarity between a test document and every training document, and for sorting the similarities. Such a drawback makes it unsuitable for applications where classification efficiency is pressing, for example, on-line text classification, where the classifier has to respond to many documents arriving simultaneously in stream format.

Some researchers in IR have addressed the problem of using representative training documents for text classification to improve classification efficiency. In [7] we proposed an algorithm for selecting representative boundary documents to replace the entire training sets so that classification efficiency can be improved. However, [7] did not provide any criterion on how many boundary documents should be selected, and it could not guarantee the classification performance. Linear classifiers [8] represent a category with a generalized vector that summarizes all training documents in that category; the decision on the assignment of the category can be viewed as considering the similarity between the test document and the generalized vector. Analogously, [9] utilizes the centroid of each class as the only representative of the entire class: a test document is assigned to the class whose centroid is the nearest one to that test document. However, these approaches do not do well when the sizes of different classes are quite different and the distribution of training documents within each class is irregular in document space. Combining traditional kNN and linear classification methods, [5] uses a set of generalized instances to replace the entire training corpus, and classification is based on this set of generalized instances. Experiments show that this approach outperforms both traditional kNN and linear classification methods. In this paper, our focus is also on the efficiency of kNN based text classification.
We provide a robust and controlled way to prune noisy and superfluous documents so that the training corpus can be significantly condensed while its classification competence is maintained as much as possible, which leads to greatly improved classification efficiency. We design an effective algorithm for training corpus pruning, and carry out experiments over the commonly used Reuters benchmark to validate the effectiveness and efficiency of our proposed approach. Our approach is especially suitable for on-line classification applications.

The rest of this paper is organized as follows. Section 2 introduces the vector space model (VSM), a clustering-based feature selection method, and the kNN classification method. Section 3 first presents the concepts and algorithm for pruning noisy and superfluous training documents, and then gives a fast kNN classification approach based on the proposed training-corpus pruning algorithm. Section 4 describes experiments for evaluating the proposed approach. Section 5 concludes the paper.

2 Preliminaries for kNN Based Text Classification

2.1 Document Representation by the Vector Space Model (VSM)

In kNN based text classification, the vector space model (VSM) [10] is used to represent documents. That is, a document corresponds to an n-dimensional document

vector. Each dimension of the document vector corresponds to an important term appearing in the training corpus. These terms are also called document features. Given a document vector, its dimensional components indicate the corresponding terms' weights, which are related to the importance of those terms in that document.

Denote by D a training corpus and by V the set of document features, V = {t_1, t_2, ..., t_n}. A document d in D can be represented by VSM as follows:

    d⃗ = (w_1, w_2, ..., w_n).    (1)

Above, d⃗ indicates the vector of document d, and w_i (i = 1, ..., n) is the weight of term t_i. Usually, the weight is evaluated by the TFIDF method. A commonly used formula is:

    w_i = tf_i · log(N/n_i) / ( Σ_{j=1}^{n} (tf_j)^2 · [log(N/n_j)]^2 )^{1/2}.    (2)

Here, N is the total number of documents in D, tf_i is the occurrence frequency of t_i in document d, and n_i is the number of documents in which t_i appears. Obviously, document vectors calculated by (2) are unit vectors. Given two documents d_1 and d_2, the similarity coefficient between them is measured by the inner product of their corresponding document vectors, i.e.,

    Sim(d_1, d_2) = d⃗_1 · d⃗_2.    (3)

2.2 Clustering Based Feature Selection

To calculate document vectors for training documents, the first step is to select a set of proper document features. A number of statistical methods have been used for document feature selection in the literature [11]. However, in this paper we use a new method, which is referred to as clustering-based feature selection. From the point of view of geometry, every document is a unit vector in document space (an n-dimensional space). Basically, documents belonging to the same class are closer to each other in document space than documents from different classes; that is, they have smaller distance (or larger similarity). Documents in the same class form a dense hyper-cone area in document space, and a training corpus corresponds to a cluster of hyper-cones, each of which corresponds to a class. Certainly, different hyper-cones may overlap with each other.
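As an illustration, the TFIDF weighting of equation (2) and the inner-product similarity of equation (3) can be sketched in Python. This is a sketch only; the function names and the dictionary-based sparse-vector representation are our own, not from the paper:

```python
import math
from collections import Counter

def tfidf_unit_vectors(docs):
    """Build unit-length TFIDF vectors per equations (1)-(2).
    docs: list of token lists; returns one {term: weight} dict per document."""
    N = len(docs)
    df = Counter()                       # n_i: number of documents containing t_i
    for d in docs:
        df.update(set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        # terms occurring in every document get weight 0 and are dropped
        raw = {t: tf[t] * math.log(N / df[t]) for t in tf if df[t] < N}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: w / norm for t, w in raw.items()} if norm else {})
    return vectors

def sim(v1, v2):
    """Equation (3): inner product of two sparse unit vectors."""
    if len(v2) < len(v1):
        v1, v2 = v2, v1                  # iterate over the shorter vector
    return sum(w * v2.get(t, 0.0) for t, w in v1.items())
```

Because each vector is normalized, `sim` returns values in [0, 1], matching the unit-vector observation below equation (2).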
Intuitively, the goal of the feature selection task here is to select a subset of document features such that the overlap among different training classes in document space is as small as possible. The basic idea of our clustering-based feature selection method is as follows: treat each training class as a distinct cluster, then use a genetic algorithm to select a subset of document features such that the difference among all clusters is maximized. We define the difference among all clusters as follows:

    Diff = (1/m) Σ_{k=1}^{m} [ (1/|C_k|^2) Σ_{d_i ∈ C_k} Σ_{d_j ∈ C_k} Sim(d_i, d_j) ]
           − (2/(m(m−1))) Σ_{k=1}^{m} Σ_{l=k+1}^{m} [ (1/(|C_k|·|C_l|)) Σ_{d_i ∈ C_k} Σ_{d_j ∈ C_l} Sim(d_i, d_j) ].    (4)
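The Diff measure admits a direct, if naive, implementation: average intra-cluster similarity minus average inter-cluster similarity. The Python sketch below is our own illustration (the name `cluster_difference` and the similarity callback are assumed, not from the paper):

```python
def cluster_difference(clusters, sim):
    """Diff of equation (4). clusters: list of m clusters, each a list of
    document vectors; sim: similarity function over two vectors."""
    m = len(clusters)
    # first part: average intra-cluster similarity, normalized by |C_k|^2
    intra = sum(
        sum(sim(a, b) for a in C for b in C) / (len(C) ** 2)
        for C in clusters
    ) / m
    # second part: average inter-cluster similarity over m(m-1)/2 cluster pairs
    inter = 0.0
    for k in range(m):
        for l in range(k + 1, m):
            Ck, Cl = clusters[k], clusters[l]
            inter += sum(sim(a, b) for a in Ck for b in Cl) / (len(Ck) * len(Cl))
    inter *= 2.0 / (m * (m - 1))
    return intra - inter
```

A genetic algorithm would call this as the fitness function, evaluating Diff on the vectors restricted to each candidate feature subset.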

Above, m is the number of clusters; the first part indicates the average intra-cluster similarity, and the second part the average inter-cluster similarity. Due to space limitations, we omit the details of the clustering based feature selection algorithm.

2.3 kNN Based Text Classification

The kNN based text classification approach is quite simple [2]: given a test document, the system finds the k nearest neighbors among the training documents in the training corpus, and uses the classes of the k nearest neighbors to weight the class candidates. The similarity score of each nearest-neighbor document to the test document is used as the weight of the classes of that neighbor document. If several of the k nearest neighbors share a class, then the per-neighbor weights of that class are added together, and the resulting weighted sum is used as the likelihood score of that class with respect to the test document. By sorting the scores of the candidate classes, a ranked list is obtained for the test document. By thresholding these scores, binary class assignments are obtained. Formally, the decision rule in kNN classification can be written as:

    score(d, c_i) = Σ_{d_j ∈ kNN(d)} Sim(d, d_j) · δ(d_j, c_i) − b_i.    (5)

Above, kNN(d) indicates the set of k nearest neighbors of document d; b_i is the class-specific threshold for the binary decisions, which can be automatically learned using cross-validation; and δ(d_j, c_i) is the classification of document d_j with respect to class c_i, that is, δ(d_j, c_i) = 1 if d_j ∈ c_i, and 0 otherwise.

Obviously, for a test document d, the similarity between d and each document in the training corpus must be evaluated before d can be classified. The time complexity of kNN classification is O(n_t · |D| · log(|D|)), where |D| and n_t are the size of the training corpus and the number of test documents respectively. To improve classification efficiency, a possible way is to reduce |D|, which is the goal of this paper.
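The decision rule (5) can be sketched as follows. This is an illustration only: the class-specific thresholds b_i are omitted, and all function names are our own:

```python
def knn_scores(test_vec, train, k, sim):
    """Weighted-sum scores of equation (5), without thresholds b_i.
    train: list of (vector, label) pairs; returns {label: score}."""
    neighbors = sorted(train, key=lambda dl: sim(test_vec, dl[0]),
                       reverse=True)[:k]
    scores = {}
    for vec, label in neighbors:
        # per-neighbor similarities to a shared class are added together
        scores[label] = scores.get(label, 0.0) + sim(test_vec, vec)
    return scores

def classify(test_vec, train, k, sim):
    """Single-label decision of equation (6): the class with the highest score."""
    scores = knn_scores(test_vec, train, k, sim)
    return max(scores, key=scores.get)
```

Note the O(|D| log |D|) sort per test document; this cost is exactly what the pruning of Section 3 attacks by shrinking |D|.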
In this paper, we assume that 1) the class space has a flat structure and all classes are semantically disjoint; 2) each document in the training corpus belongs to only one class; 3) each test document can be classified into only one class. With these assumptions, a test document d should belong to the class that has the highest resulting weighted sum in (5). That is,

    d ∈ c_i only if score(d, c_i) = max{ score(d, c_j), j = 1, ..., n }.    (6)

3 Training-Corpus Pruning for Fast Text Classification

Examining the process of kNN classification, we can see that the outer documents or boundary documents (located near the boundary) of each class (or document hyper-cone) play a more decisive role in classification. On the contrary, the inner documents or central documents (located in the interior area) of each class (or

document hyper-cone) are less important as far as kNN classification is concerned, because their contribution to the classification decision can be obtained from the outer documents. In this sense, the inner documents of each class can be seen as superfluous documents. Superfluous documents just do not tell us much about making the classification decision; the job they do in informing the classification decision can be done by other documents. Apart from superfluous documents, there may also be some noisy documents in the training corpus, i.e., incorrectly labeled training documents. We seek to discard superfluous and noisy documents to reduce the size of the training corpus so that classification efficiency can be boosted. Meanwhile, we try to guarantee that the pruning of superfluous documents will not cause degradation of classification performance (precision and recall).

In the context of kNN text classification, for a training document d in training corpus D, there are two sets of documents in D that are related to d in different ways. Documents in one of the two sets are critical to the classification decision on d if d were a test document; for documents in the other set, d can contribute to the classification decisions on those documents if they were treated as test documents. Formal definitions of the two document sets are as follows.

Definition 1. Given document d in training corpus D, the set of the k nearest documents to d in D constitutes the k-reachability set of d, which is referred to as k-reachability(d). Formally, k-reachability(d) = {d_i | d_i ∈ D and d_i ∈ kNN(d)}.

Definition 2. Given document d in training corpus D, there is a set of documents in the same class as d, in which each document's k-reachability set contains d. We define this set of documents as the k-coverage set of d, or simply k-coverage(d). Formally, k-coverage(d) = {d_i | d_i ∈ D and d_i ∈ class(d) and d ∈ k-reachability(d_i)}. Here, class(d) indicates the class to which d belongs. Note that in Definition 2, k-coverage(d) contains only documents from the same class as d.
The reason lies in the following fact: our aim is to prune the training corpus while maintaining its classification competence. Obviously, pruning d may negatively impact the classification decisions on documents in the same class as d; however, it can benefit the classification decisions on documents in the other classes. Hence, we need take care only of the documents that are in the same class as d and whose k-reachability sets contain d.

Definition 3. Given document d in training corpus D, if d could be correctly classified with k-reachability(d) based on the kNN method, in other words, if d can be implied by k-reachability(d), then d is a superfluous document in D.

Definition 4. Given document d in training corpus D, d is a critical document if one of the following conditions is fulfilled: a) at least one document d_i in k-coverage(d) cannot be implied by its k-reachability(d_i); b) after d is pruned from D, at least one document d_i in k-coverage(d) cannot be implied by its k-reachability(d_i).
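Definitions 1-3 can be made concrete with a small Python sketch. This is our own illustration, not the authors' code; brute-force neighbor search is used for clarity:

```python
def reachability_and_coverage(docs, k, sim):
    """Definitions 1-2. docs: list of (vector, label) pairs; returns
    (k_reach, k_cover), each mapping a document index to a set of indices."""
    n = len(docs)
    k_reach = {}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim(docs[i][0], docs[j][0]),
                        reverse=True)
        k_reach[i] = set(others[:k])
    k_cover = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(n):
            # j is in k-coverage(i) when it shares i's label
            # and i lies in k-reachability(j)
            if j != i and docs[j][1] == docs[i][1] and i in k_reach[j]:
                k_cover[i].add(j)
    return k_reach, k_cover

def is_superfluous(i, docs, k_reach, sim):
    """Definition 3: i is superfluous if kNN voting over k-reachability(i)
    already predicts its own label."""
    scores = {}
    for j in k_reach[i]:
        lbl = docs[j][1]
        scores[lbl] = scores.get(lbl, 0.0) + sim(docs[i][0], docs[j][0])
    return bool(scores) and max(scores, key=scores.get) == docs[i][1]
```

A noisy document (Definition 5 below) is then one that is neither superfluous nor covered by any same-class document.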

Definition 5. Given document d in training corpus D, if d is not a superfluous document and its k-coverage(d) is empty, then d is a noisy document in D.

In summary, a superfluous document is superfluous because its class assignment can be derived from other documents; a critical document is critical to other documents because it can contribute to making correct classification decisions about those documents; and a noisy document is noise as far as classification is concerned because it is incorrectly labeled. In kNN classification, noisy documents must be given up; superfluous documents can be discarded; however, critical documents must be kept in order to maintain the classification competence of the training corpus. Based on this consideration, we give a rule for training corpus pruning as follows.

Rule 1. The rule for training-document pruning. A document d in training corpus D can be pruned from D if 1) it is a noisy document in D, or 2) it is a superfluous document, but not a critical document, in D.

For the second case in Rule 1, the first constraint is the prerequisite for pruning a certain document from the training corpus, while the second constraint guarantees that pruning that document will not cause degradation of the classification competence of the training corpus.

While pruning superfluous documents, it is worth pointing out that the order of pruning is also critical, because the pruning of one document may affect the decision on whether other documents can be pruned. Intuitively, the inner documents of a class in the training corpus should be pruned before the outer documents. This strategy increases the chance of retaining as many outer documents as possible. Otherwise, if outer documents were pruned before inner documents, a domino effect could occur in which a lot of documents, including outer documents, are pruned from the training corpus, which would greatly degrade the classification competence of the training corpus. Therefore, some rule is necessary to control the order of document pruning.
Generally speaking, the inner documents of a certain class in the training corpus have some common features: 1) inner documents may have more documents of their own class around them than outer documents have; 2) inner documents are closer to the center of their class than outer documents are; 3) inner documents are farther from the documents of other classes than outer documents are. Based on these observations, we give a rule for a superfluous document's pruning priority as follows. Here, we denote by H-kNN(d) the number of documents in kNN(d) that belong to the class of d; by similarity-c(d) the similarity of document d to the center of its own class; and by similarity-ne(d) the similarity of document d to the nearest document that does not belong to its own class.

Rule 2. The rule for setting the priority of pruning superfluous documents. Given two documents d_1, d_2 in a class of the training corpus, where both d_1 and d_2 are superfluous documents that can be pruned according to Rule 1:
1) if H-kNN(d_1) > H-kNN(d_2), then prune d_1 before d_2;
2) if similarity-c(d_1) > similarity-c(d_2), then prune d_1 before d_2;
3) if similarity-ne(d_1) < similarity-ne(d_2), then prune d_1 before d_2;

4) if they have similar H-kNN, similarity-c and similarity-ne, then either one can be pruned first;
5) the priority of using H-kNN, similarity-c and similarity-ne is: H-kNN > similarity-c > similarity-ne.

The following is an algorithm for training corpus pruning. In Algorithm 1, we assume that there is only one class in the training corpus. If there are multiple classes in the training corpus, the pruning process of Algorithm 1 is simply carried out over one class after another.

Algorithm 1. Pruning-training-corpus (T: training corpus, P: pruned corpus)
1) P = T; S = Φ;
2) for each document d in T
3)   compute k-reachability(d); compute k-coverage(d);
4) for each noisy document d in T
5)   S = S ∪ {d}; T = T − {d}; P = P − {d};
6)   for each document d_i in k-coverage(d)
7)     remove d from k-reachability(d_i) and update k-reachability(d_i) in T;
8)   for each document d_j in k-reachability(d)
9)     remove d from k-coverage(d_j);
10) for each document d in T but not in S
11)   if d can be pruned and has the highest priority to be pruned, then
12)     S = S ∪ {d}; P = P − {d};
13)     for each document d_i in k-coverage(d)
14)       update k-reachability(d_i) in T;
15) return P.

Based on the technique of training corpus pruning, a fast algorithm for kNN text classification is outlined as follows.

Algorithm 2. Fast kNN classification based on training-document pruning (outline)
1) Select document features with the proposed clustering based feature selection method; build training document vectors with the selected features;
2) Prune the training corpus by using Algorithm 1;
3) For each test document d, calculate its similarity to each training document in the pruned training corpus;
4) Sort the computed similarities to get kNN(d);
5) Decide d's class based on formulas (5) and (6).

4 Experimental Results

We evaluate the proposed approach by using the Reuters benchmark compiled by Apte et al. for their evaluation of SWAP-1, obtained by removing all of the unlabeled documents from the original Reuters corpus and restricting the categories to have a training-set frequency of at least two [12]. Usually this corpus is simply referred to as
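A condensed Python rendering of Algorithm 1, with the Rule 2 priority folded into a sort key, might look as follows. This is our own simplification for illustration: it recomputes neighborhoods on the fly rather than maintaining them incrementally as steps 7) and 14) do, and it handles one class at a time, as the paper assumes:

```python
def prune_class(docs, cls, k, sim):
    """docs: list of (vector, label) pairs; prunes class `cls` per Rule 1,
    ordered by Rule 2; returns the set of indices kept."""
    idx = list(range(len(docs)))

    def knn(i, pool):
        return sorted((j for j in pool if j != i),
                      key=lambda j: sim(docs[i][0], docs[j][0]),
                      reverse=True)[:k]

    def implied(i, pool):
        # kNN vote over the current pool predicts i's own label (Definition 3)
        scores = {}
        for j in knn(i, pool):
            scores[docs[j][1]] = scores.get(docs[j][1], 0.0) + sim(docs[i][0], docs[j][0])
        return bool(scores) and max(scores, key=scores.get) == docs[i][1]

    def coverage(i, pool):
        # same-class documents whose k-reachability contains i (Definition 2)
        return [j for j in pool
                if j != i and docs[j][1] == docs[i][1] and i in knn(j, pool)]

    kept = set(idx)
    # steps 4)-9): drop noisy documents (Definition 5)
    for i in [i for i in idx if docs[i][1] == cls]:
        if not implied(i, kept) and not coverage(i, kept):
            kept.discard(i)

    # Rule 2 as a sort key: H-kNN desc, similarity-c desc, similarity-ne asc
    def priority(i):
        own = sum(1 for j in knn(i, kept) if docs[j][1] == cls)   # H-kNN
        members = [j for j in kept if docs[j][1] == cls]
        center = {}                                               # class center
        for j in members:
            for t, w in docs[j][0].items():
                center[t] = center.get(t, 0.0) + w / len(members)
        sim_c = sim(docs[i][0], center)
        sim_ne = max((sim(docs[i][0], docs[j][0])
                      for j in kept if docs[j][1] != cls), default=0.0)
        return (-own, -sim_c, sim_ne)

    # steps 10)-14): repeatedly prune the highest-priority superfluous,
    # non-critical document until no further pruning is possible
    changed = True
    while changed:
        changed = False
        for i in [i for i in sorted(kept, key=priority)
                  if docs[i][1] == cls and implied(i, kept)]:
            rest = kept - {i}
            # critical check (Definition 4): pruning i must not leave any
            # covered document unable to be implied by its neighbors
            if all(implied(j, rest) for j in coverage(i, kept)):
                kept = rest
                changed = True
                break
    return kept
```

The restart-after-each-removal loop keeps the coverage checks consistent with the shrinking corpus, at the price of extra recomputation that Algorithm 1 avoids through its incremental updates in steps 7) and 14).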

Apte. We do not use the Apte corpus directly; instead, we first remove the training and test documents that belong to two or more categories, and then select the top 10 categories to form our own compiled Apte corpus. Statistics of the compiled Apte corpus are listed in Table 1.

Table 1. Our compiled Apte corpus (TC-Apte), listing the number of training and test documents for each of the categories acq, coffee, crude, earn, interest, money-fx, money-supply, ship, sugar and trade, together with the totals.

We implemented a prototype with VC++ under Windows 2000. Experiments were carried out on a PC with a P4 1.4GHz CPU and 256MB memory. The goal of the experiments is to evaluate the performance (effectiveness and efficiency) of our approach. For simplicity, we denote the compiled Apte corpus TC-Apte. In the experiments, TC-Apte is pruned first by using our pruning algorithm; the resulting pruned corpus is denoted TC-Apte-pruned. Classifiers are trained with TC-Apte and its corresponding pruned corpus TC-Apte-pruned, and the trained classifiers' performances are then measured and compared. Three performance parameters are measured: precision (p), recall (r), and classification speedup (or simply speedup), in which p and r are used for effectiveness measurement, and speedup is used for measuring the efficiency improvement of our approach. Here we use the micro-averaging method for evaluating performance averaged across multiple classes. In the context of this paper (i.e., each document, whether for training or for testing, belongs to only one category), micro-average p (or simply micro-p) and micro-average r (or simply micro-r) have similar values. In this paper, we use micro-p, which can be evaluated as follows:

    micro-p = (the number of correctly assigned test documents) / (the number of test documents).

We define the efficiency speedup by the following formula:

    speedup = t_TC-Apte / t_TC-Apte-pruned.

Above, t_TC-Apte and t_TC-Apte-pruned are the time costs for classifying a test document (or a set of test documents) based on TC-Apte and TC-Apte-pruned respectively.

The Apte corpus is available at:
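The two measurements can be sketched directly (the function names are our own):

```python
def micro_p(predicted, actual):
    """Micro-averaged precision for single-label classification:
    correctly assigned test documents / all test documents."""
    assert len(predicted) == len(actual) and actual
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def speedup(t_full, t_pruned):
    """speedup = t_TC-Apte / t_TC-Apte-pruned (same time unit for both)."""
    return t_full / t_pruned
```

In the single-label setting of this paper, every test document receives exactly one prediction, which is why micro-averaged precision and recall coincide as noted above.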

Due to space limitations, here we give only partial experimental results. Fig. 1 illustrates the impact of the k value on pruning effectiveness and classification performance over TC-Apte. From Fig. 1, we can see that by using our corpus-pruning technique, classification efficiency can be improved by a factor larger than 4, with less than 3% degradation of micro-averaging performance. Obviously, this result is acceptable.

Fig. 1. Impact of k value on pruning effectiveness and classification performance: (a) k value vs. micro-p, with and without pruning; (b) k value vs. speedup.

5 Conclusion

The rapid growth of available text information raises the requirement for efficient text classification methods. Although kNN based text classification is a good method as far

as performance is concerned, it is inefficient because it has to calculate the similarity of the test document to each training document in the training corpus. In this paper, we propose a training-corpus pruning based approach to speed up the kNN method. By using our approach, the size of the training corpus can be reduced significantly while classification performance can be kept at a level close to that without training document pruning. Experimental results validate the efficiency and effectiveness of the proposed approach.

References

1. Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1): 1-47, 2002.
2. Y. Yang and X. Liu. A re-examination of text categorization methods. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99), 1999.
3. B. Masand, G. Linoff, and D. Waltz. Classifying news stories using memory-based reasoning. Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'92), 1992.
4. Y. Yang. Expert network: effective and efficient learning from human decisions in text categorization and retrieval. Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 1994.
5. W. Lam and C. Y. Ho. Using a generalized instance set for automatic text categorization. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'98), 1998.
6. S. Zhou, J. Guan. Chinese documents classification based on N-grams. A. Gelbukh (Ed.): Intelligent Text Processing and Computational Linguistics, LNCS Vol. 2276, Springer-Verlag, 2002.
7. S. Zhou. Key Techniques of Chinese Text Database. PhD thesis, Fudan University, China.
8. D. D. Lewis, R. E. Schapire, J. P. Callan, and R. Papka. Training algorithms for linear text classifiers. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'96), 1996.
9. E. H. Han and G. Karypis.
Centroid-based document classification: analysis & experimental results. Technical Report TR-00-017, Dept. of CS, Univ. of Minnesota, Minneapolis, 2000.
10. G. Salton, A. Wong, and C. S. Yang. A vector space model for automatic indexing. In K. S. Jones and P. Willett (Eds.), Readings in Information Retrieval. Morgan Kaufmann, 1997.
11. Y. Yang and J. P. Pedersen. A comparative study on feature selection in text categorization. Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97), 1997.
12. C. Apte, F. Damerau, and S. Weiss. Text mining with decision rules and decision trees. Proceedings of the Conference on Automated Learning and Discovery, Workshop 6: Learning from Text and the Web, 1998.

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples 94 JOURNAL OF COMPUTERS, VOL. 4, NO. 1, JANUARY 2009 Relable Negatve Extractng Based on knn for Learnng from Postve and Unlabeled Examples Bangzuo Zhang College of Computer Scence and Technology, Jln Unversty,

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

A Weighted Method to Improve the Centroid-based Classifier

A Weighted Method to Improve the Centroid-based Classifier 016 Internatonal onference on Electrcal Engneerng and utomaton (IEE 016) ISN: 978-1-60595-407-3 Weghted ethod to Improve the entrod-based lassfer huan LIU, Wen-yong WNG *, Guang-hu TU, Nan-nan LIU and

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Clustering of Words Based on Relative Contribution for Text Categorization

Clustering of Words Based on Relative Contribution for Text Categorization Clusterng of Words Based on Relatve Contrbuton for Text Categorzaton Je-Mng Yang, Zh-Yng Lu, Zhao-Yang Qu Abstract Term clusterng tres to group words based on the smlarty crteron between words, so that

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Fast Feature Value Searching for Face Detection

Fast Feature Value Searching for Face Detection Vol., No. 2 Computer and Informaton Scence Fast Feature Value Searchng for Face Detecton Yunyang Yan Department of Computer Engneerng Huayn Insttute of Technology Hua an 22300, Chna E-mal: areyyyke@63.com

More information

A Novel Term_Class Relevance Measure for Text Categorization

A Novel Term_Class Relevance Measure for Text Categorization A Novel Term_Class Relevance Measure for Text Categorzaton D S Guru, Mahamad Suhl Department of Studes n Computer Scence, Unversty of Mysore, Mysore, Inda Abstract: In ths paper, we ntroduce a new measure

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Issues and Empirical Results for Improving Text Classification

Issues and Empirical Results for Improving Text Classification Issues and Emprcal Results for Improvng Text Classfcaton Youngoong Ko 1 and Jungyun Seo 2 1 Dept. of Computer Engneerng, Dong-A Unversty, 840 Hadan 2-dong, Saha-gu, Busan, 604-714, Korea yko@dau.ac.kr

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Fuzzy Modeling of the Complexity vs. Accuracy Trade-off in a Sequential Two-Stage Multi-Classifier System

Fuzzy Modeling of the Complexity vs. Accuracy Trade-off in a Sequential Two-Stage Multi-Classifier System Fuzzy Modelng of the Complexty vs. Accuracy Trade-off n a Sequental Two-Stage Mult-Classfer System MARK LAST 1 Department of Informaton Systems Engneerng Ben-Guron Unversty of the Negev Beer-Sheva 84105

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

CS47300: Web Information Search and Management

CS47300: Web Information Search and Management CS47300: Web Informaton Search and Management Prof. Chrs Clfton 15 September 2017 Materal adapted from course created by Dr. Luo S, now leadng Albaba research group Retreval Models Informaton Need Representaton

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Human Face Recognition Using Generalized. Kernel Fisher Discriminant

Human Face Recognition Using Generalized. Kernel Fisher Discriminant Human Face Recognton Usng Generalzed Kernel Fsher Dscrmnant ng-yu Sun,2 De-Shuang Huang Ln Guo. Insttute of Intellgent Machnes, Chnese Academy of Scences, P.O.ox 30, Hefe, Anhu, Chna. 2. Department of

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

An Improved Image Segmentation Algorithm Based on the Otsu Method

An Improved Image Segmentation Algorithm Based on the Otsu Method 3th ACIS Internatonal Conference on Software Engneerng, Artfcal Intellgence, Networkng arallel/dstrbuted Computng An Improved Image Segmentaton Algorthm Based on the Otsu Method Mengxng Huang, enjao Yu,

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

Local Quaternary Patterns and Feature Local Quaternary Patterns

Local Quaternary Patterns and Feature Local Quaternary Patterns Local Quaternary Patterns and Feature Local Quaternary Patterns Jayu Gu and Chengjun Lu The Department of Computer Scence, New Jersey Insttute of Technology, Newark, NJ 0102, USA Abstract - Ths paper presents

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Fingerprint matching based on weighting method and SVM

Fingerprint matching based on weighting method and SVM Fngerprnt matchng based on weghtng method and SVM Ja Ja, Lanhong Ca, Pnyan Lu, Xuhu Lu Key Laboratory of Pervasve Computng (Tsnghua Unversty), Mnstry of Educaton Bejng 100084, P.R.Chna {jaja}@mals.tsnghua.edu.cn

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

A Feature-Weighted Instance-Based Learner for Deep Web Search Interface Identification

A Feature-Weighted Instance-Based Learner for Deep Web Search Interface Identification Research Journal of Appled Scences, Engneerng and Technology 5(4): 1278-1283, 2013 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scentfc Organzaton, 2013 Submtted: June 28, 2012 Accepted: August 08, 2012

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

A Similarity Measure Method for Symbolization Time Series

A Similarity Measure Method for Symbolization Time Series Research Journal of Appled Scences, Engneerng and Technology 5(5): 1726-1730, 2013 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scentfc Organzaton, 2013 Submtted: July 27, 2012 Accepted: September 03, 2012

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Improving Web Image Search using Meta Re-rankers

Improving Web Image Search using Meta Re-rankers VOLUME-1, ISSUE-V (Aug-Sep 2013) IS NOW AVAILABLE AT: www.dcst.com Improvng Web Image Search usng Meta Re-rankers B.Kavtha 1, N. Suata 2 1 Department of Computer Scence and Engneerng, Chtanya Bharath Insttute

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College

More information

A Notable Swarm Approach to Evolve Neural Network for Classification in Data Mining

A Notable Swarm Approach to Evolve Neural Network for Classification in Data Mining A Notable Swarm Approach to Evolve Neural Network for Classfcaton n Data Mnng Satchdananda Dehur 1, Bjan Bhar Mshra 2 and Sung-Bae Cho 1 1 Soft Computng Laboratory, Department of Computer Scence, Yonse

More information

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS P.G. Demdov Yaroslavl State Unversty Anatoly Ntn, Vladmr Khryashchev, Olga Stepanova, Igor Kostern EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS Yaroslavl, 2015 Eye

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

An Internal Clustering Validation Index for Boolean Data

An Internal Clustering Validation Index for Boolean Data BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 6 Specal ssue wth selecton of extended papers from 6th Internatonal Conference on Logstc, Informatcs and Servce Scence

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Learning from Multiple Related Data Streams with Asynchronous Flowing Speeds

Learning from Multiple Related Data Streams with Asynchronous Flowing Speeds Learnng from Multple Related Data Streams wth Asynchronous Flowng Speeds Zh Qao, Peng Zhang, Jng He, Jnghua Yan, L Guo Insttute of Computng Technology, Chnese Academy of Scences, Bejng, 100190, Chna. School

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

An Improvement to Naive Bayes for Text Classification

An Improvement to Naive Bayes for Text Classification Avalable onlne at www.scencedrect.com Proceda Engneerng 15 (2011) 2160 2164 Advancen Control Engneerngand Informaton Scence An Improvement to Nave Bayes for Text Classfcaton We Zhang a, Feng Gao a, a*

More information