Application of k-nn Classifier to Categorizing French Financial News


Huazhong KOU 1, Georges GARDARIN 2, Alain D'heygère 2, Karine Zeitouni 1
1 PRiSM Laboratory, University of Versailles Saint-Quentin, 45 Etats-Unis Road, Versailles, France {Huazhong.Kou,
2 e-xmlmedia, 3 Avenue du Général Leclerc, Bourg-la-Reine, France {Georges.Gardarin, Alain.D'heygère}@e-xmlmedia.fr

Abstract: We have implemented the document categorization system DocCat to automatically organize French financial news for the FirstInvest site. This paper describes the system framework and the main techniques we use. In DocCat, both a relational database and XML are used to organize documents, our CBA algorithm is used to select features, and the k nearest neighbor algorithm is implemented as the categorization model. We use 4000 financial news items to train and evaluate DocCat. Preliminary experimental results show that DocCat produces satisfactory performance. The flexible design allows users to easily adapt DocCat to different application domains.

Keywords: k-nn, document categorization, machine learning, XML.

1. Introduction

Created in 1997, FirstInvest is an Internet financial medium specializing in the diffusion of both financial news and expert opinions on the stock exchange. Today, it is one of the most significant financial sites in France, with more than … visitors per month [1]. To facilitate the diffusion of financial news, every day the news items are edited and then manually organized into predefined categories by experts at FirstInvest. Manual categorization has drawbacks, for example high cost and time consumption. This led us to collaborate with FirstInvest to organize French financial news automatically. In this framework, we have proposed and implemented a document categorization system called DocCat. This research is supported by a national RNTL project called CONTEXTE Bourse.

Document categorization is the procedure of assigning one or multiple predefined category labels to a free-text document.
A primary application of text categorization is to assign subject categories to documents, either to support information retrieval or to aid human indexers in assigning such categories. Categorization can also help build a personalized net news filter. In DocCat, we implement the k nearest neighbor (k-nn) categorization algorithm. k-nn is a classical instance-based machine learning algorithm, and many empirical studies report that it is one of the top-performing classifiers [2][3]. This paper focuses on the application of DocCat to organizing the financial news at the FirstInvest site.

The rest of this paper is organized as follows: Section 2 describes the FirstInvest corpus; Section 3 presents the k-nn categorization model; Section 4 discusses the general framework of the system, the document organization schema and the system functionality; Section 5 explains the evaluation measures and experiments, while the conclusion is given in Section 6.

2. FirstInvest Corpus

4000 financial news items have been collected at the FirstInvest site from 08/0/200 to 0/3/2002. The news items dated on or before January 0 of 2002 are selected as training documents, while the remaining 500 news items are used to evaluate the system. Each news item belongs to only one of 30 predefined categories (see Table 1), but the distribution of these categories in the corpus is uneven. For example, .8% of the documents belong to Biotechnologie and .3% to Telecoms.

Table 1. Financial Categories of FirstInvest

Aéronautique/Défense; Immobilier; Medias/TV/Communication; Pharmacie/chimie/gaz; Automobile; Distribution alimentaire; Web Agency; Marchés financiers; Biotechnologie; SSII; Assurances; Holdings; MP/Biens d'équipement; Biens de consommation; Courtage en ligne; Agro-alimentaire; FAI/Portail; Construction/BTP; Hotellerie/loisir/transport; Technologiques; Energie/environnement/services; Telecoms; Editeur de logiciels; Cosmétique/luxe; Marketing/Bases de données; Banques; Matériaux de construction; Distribution spécialisée; Editeur de jeux vidéos; Pétrolier

Figure 1 shows the format of an example training news item. Each news item contains one attribute and five elements. To categorize such a news item manually, the indexer must read and analyze the content element.

    <?xml version="1.0" encoding="iso-8859-1"?>
    <corpus>
      <news newsid="5000">
        <newsdate>0-jan-2002</newsdate>
        <category>editeur de logiciels</category>
        <text>
          <title>BVRP affiche un chiffre d'affaires en hausse</title>
          <content>bvrp, éditeur de logiciel de communication, a publié ses chiffres des neuf premiers mois de l'exercice (à la fin avril). Il en ressort un chiffre d'affaires en hausse sur un an de 26,9%, à 28,7 millions d'euros. Si on exclut Lab Production, sa filiale Multimédia, qu'il a cédé récemment, </content>
        </text>
      </news>
    </corpus>

Figure 1. News Format

3. k-nn Categorization Model

According to k-nn, given a new document, the system ranks its neighbors among all training documents by calculating document similarity, and the top k neighbor documents are selected to be used in the following steps. The categories of the k top-ranking neighbors are called candidate categories. Then a category score is calculated for each candidate category using the similarities of the selected k documents to the new document. Finally, one or more categories are assigned to the new document by a suitable thresholding strategy [4].

k-nn is a top-performing algorithm, comparable to the most effective support vector machine algorithm reported in [2]. It uses the document vector representation model, under which documents are mapped into points of a high-dimensional concept space [5]. In practice, all document vectors are normalized to unit length. The values of the document vector elements can be calculated by term weighting algorithms; the tf-idf term weighting model and its variants are often used. In DocCat, the following weighting model is implemented:

$$ w_{ij} = \frac{\log(tf_{ij} + 1.0) \cdot \log(N / df_i)}{\sqrt{\sum_{l} \left[ \log(tf_{lj} + 1.0) \cdot \log(N / df_l) \right]^{2}}} \tag{1} $$

where $w_{ij}$ is the weight of the i-th term in the j-th document, $N$ is the number of training documents, $df_i$ is the number of training documents containing the i-th term (document frequency), and $tf_{ij}$ is the number of times the i-th term occurs in the j-th document (term frequency). The cosine similarity (2) is then used to find the neighbors of a given document:

$$ \mathrm{Sim}(d_i, d_j) = \cos(d_i, d_j) = \frac{\sum_{l} w_{li}\, w_{lj}}{\sqrt{\sum_{l} w_{li}^{2}}\;\sqrt{\sum_{l} w_{lj}^{2}}} \tag{2} $$

where $d_i$ and $d_j$ are two document vectors [5].

Figure 2. The General Framework (components: Example documents, Preprocessing, Unique terms in documents, Feature Selection, Dictionary, Representation, Document vectors; New document, Unique terms, Document Vector, Similarity Calculation, k top Neighbors, Category Score Calculation, Assigned Category)

4. Implementation of the System

This section presents the general framework of the system and the main techniques we use, including document organization and system functionality.

4.1 General Framework

Figure 2 shows the general framework of DocCat. It is composed of two subsystems: the learning subsystem, linked by thick arrows, and the categorizing subsystem, linked by thin arrows. The goal of the learning subsystem is to determine all system parameters and to create the knowledge database. It is conducted in the following steps:

Preprocessing: we extract all unique words present in each training document and remove stop words, punctuation marks and non-letter characters; the remaining words are folded to lower case and converted into their stems by the Porter stemming algorithm [6]. The final form of a word is called a term. Both term frequency and document frequency are counted for each term, and terms with high or low document frequency are removed. The resulting terms and their frequencies are stored in database tables as intermediate data.

Feature selection: after preprocessing, the number of remaining terms is still very large, so an optimal subset of terms must be selected by a feature selection algorithm. The χ² test is a well-known feature selection algorithm [7], and our system implements it. The Concept-Based Algorithm (CBA) we proposed [8] is also implemented (see Section 4.2). Both are filter algorithms: first they calculate corpus-level term weights that indicate how well each term predicts categories; then all terms are ranked in descending order of these weights; finally, the top terms are selected as feature terms that make up the indexing dictionary. The indexing dictionary is one of the most important parts of the knowledge database.

Document representation: at the preprocessing step, the unique terms of every document have been identified, and both their document frequency and term frequency have been counted. Then, given a document, the term weight defined by (1) is calculated for each dictionary term it contains.
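The weighting (1) used in this step, the cosine similarity (2), and the k-nn category scoring outlined in Section 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DocCat's code: documents are assumed to be already preprocessed into term-frequency dictionaries, the normalization in (1) is assumed to run over dictionary terms only, a category's score is taken as the sum of its neighbors' similarities (the example rule given in the text), and only the top-scoring category is kept, as with RCut of value 1. All function names are hypothetical.

```python
import math
from collections import defaultdict


def weight_vector(term_counts, df, n_docs, dictionary):
    """Sketch of weighting (1): log(tf + 1.0) * log(N / df), scaled to unit length.

    term_counts: dict term -> tf in one document (assumed already stemmed)
    df: dict term -> number of training documents containing the term
    n_docs: N, the number of training documents
    dictionary: set of selected feature terms (assumption: only these are weighted)
    """
    raw = {
        term: math.log(tf + 1.0) * math.log(n_docs / df[term])
        for term, tf in term_counts.items()
        if term in dictionary and df.get(term, 0) > 0
    }
    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0.0:
        return {}  # no informative dictionary term in this document
    return {term: w / norm for term, w in raw.items()}


def cosine(u, v):
    """Similarity (2) between two sparse vectors stored as dict term -> weight."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_categorize(query_vec, training, k):
    """Return the highest-scoring candidate category (RCut with value 1).

    training: list of (vector, category) pairs for the training documents.
    A candidate category's score is the sum of the similarities of those of
    the k nearest neighbors that belong to it (assumed scoring rule).
    Returns None when there are no training documents.
    """
    ranked = sorted(
        ((cosine(query_vec, vec), cat) for vec, cat in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    scores = defaultdict(float)
    for sim, cat in ranked[:k]:
        scores[cat] += sim
    return max(scores, key=scores.get) if scores else None
```

Because the vectors are normalized to unit length, (2) reduces to a dot product; the sketch still divides by the norms so that it also accepts unnormalized input.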
With these weights, the j-th document can be represented by the vector (3):

$$ d_j = (w_{1j}, w_{2j}, w_{3j}, \ldots, w_{Tj}) \in \mathbb{R}^{T} \tag{3} $$

where $w_{ij}$ is the weight of the i-th dictionary term in the j-th document $d_j$, with $1 \le i \le T$ and $1 \le j \le N$. The document vectors of all training documents constitute the core of the knowledge database in a k-nn categorization system.

The learning phase is followed by the categorizing phase. Categorizing a document begins by preprocessing it, in order to identify which of the T dictionary terms are present in the document. Its document vector is then created as described above. The remaining steps to categorize a document are:

Similarity calculation: the similarity between the new document vector and every training document vector stored in the knowledge database is calculated by (2).

k nearest neighbors: based on these similarities, all training document vectors are ranked in descending order of similarity, and the top k document vectors are kept for calculating category scores in the next step.

Category score calculation: the categories to which the k nearest neighbor documents belong are called candidate categories. A score is calculated for each candidate category by some score calculation algorithm, for example by summing the similarities of the k nearest neighbor documents that belong to this category.

Assigning category: all candidate categories are ordered in descending order of their scores, then a thresholding strategy is used to decide which category (or categories) should be assigned to the new document. Thresholding strategies for text categorization are studied in [4].

In addition, the system has many parameters, such as the dictionary size, the value of k, and the language. To make the system more flexible, we store all system parameters in a system property file; by modifying this file, users can easily configure DocCat and adapt it to the needs of their application domain.

4.2 Feature Selection Algorithm

We present the concept-based algorithm (CBA) to select features.
Under the vector representation model, a document d can be represented by (3). One vector is then created for every category by averaging the vectors of the documents belonging to that category; this vector is called the concept vector of the category. The values of the concept vector elements characterize the relationship between terms and categories. The concept vector of category $C_j$, noted $\vec{C}_j$, is calculated by (4):

$$ \vec{C}_j = \frac{1}{|C_j|} \sum_{i=1}^{|C_j|} \vec{d}_{ij} \tag{4} $$

where $\vec{d}_{ij}$ is the vector of the i-th document in the j-th category $C_j$, and $|C_j|$ is the number of example documents in $C_j$. The l-th element $w_{c_j l}$ of $\vec{C}_j$ is calculated by (5):

$$ w_{c_j l} = \frac{1}{|C_j|} \sum_{i=1}^{|C_j|} w_{ijl} \tag{5} $$

where $w_{ijl}$ is the weight of the l-th term of the i-th document in the j-th category $C_j$, calculated by (1). We use $w_{c_j l}$ to measure the term-goodness between the l-th term and category $C_j$; it is a local weight of the l-th term with respect to $C_j$. We then combine all local weights of a term into a global, corpus-level weight of the l-th term $t_l$, noted $CW(t_l)$, by (6):

$$ CW(t_l) = \sum_{j=1}^{k} \Pr(C_j)\, w_{c_j l} \tag{6} $$

where k is the number of categories and $\Pr(C_j)$ is the distribution of category $C_j$ in the corpus, that is, the number of documents in $C_j$ divided by the total number of documents in the corpus. Combining (5) and (6), we have (7):

$$ CW(t_l) = \sum_{j=1}^{k} \Pr(C_j) \cdot \frac{1}{|C_j|} \sum_{i=1}^{|C_j|} w_{ijl} \tag{7} $$

All terms found in the corpus are then ranked in descending order of $CW(t)$, and the top terms are selected to constitute the corpus dictionary. We call this algorithm the Concept-Based Algorithm (CBA). For an analysis of CBA, see [8].

4.3 Document Database Schema

Figure 3 shows the main parts of the document data schema in DocCat. The dictionary, category and training document vectors make up the knowledge database of the k-nn categorization system and are stored in relational tables. The Dictionary table has 5 attributes, among them term, document frequency and document IDs. The Training Document Vectors table is a vector representation of the original example financial news; it has 4 attributes: document ID, date, document vector and category ID. A document vector is composed of all (term, weight) pairs, where the term is a dictionary term found in the current document. The category ID records which category the document belongs to. The Category table contains the names and IDs of the categories used by FirstInvest. The StemWords table keeps the mapping between words and stems.

All original production financial news items are stored in XML documents, with at most 2000 news items per XML document.
The corresponding XML schema is shown as follows:

    <xsd:schema xmlns="…XMLSchema">
      <element name="news" maxOccurs="2000">
        <attribute name="newsid" type="ID"/>
        <element name="newsdate" type="date"/>
        <element name="text">
          <complexType>
            <element name="title" type="string"/>
            <element name="content" type="string"/>
          </complexType>
        </element>
      </element>
    </xsd:schema>

For each new production financial news item, we create an image that stores its category, keywords, vector, etc. These images are stored in XML image documents, with at most 2000 news images per XML image document. The schema of the XML image documents is defined by:

    <schema xmlns="…XMLSchema">
      <element name="newsimage" maxOccurs="2000">
        <sequence>
          <attribute name="newsid" type="ID"/>
          <element name="category" type="string"/>
          <element name="keywords" type="string" maxOccurs="20"/>
          <element name="docvector">
            <element name="termweightpair" maxOccurs="unbounded">
              <sequence>
                <element name="term" type="string"/>
                <element name="weight" type="decimal"/>
              </sequence>
            </element>
          </element>
        </sequence>
      </element>
    </schema>

The XML image documents are much shorter than the original financial news and are oriented to machine processing. Keyword- and content-based searches are conducted on the XML image documents (see Section 4.4). We store the data obtained from the training documents in relational tables in order to take advantage of database technology to analyze the corpus. We use XML documents to store the production news and their images so that Web technologies such as XQuery and XSLT can be used to process and disseminate financial news across the Internet.

4.4 System Functionality

The most basic functionality is to categorize new financial news into the proper category. Besides this, DocCat supports the following functionalities: keyword extraction, and keyword- and content-based searches.

CATEGORIZING FINANCIAL NEWS

Given a new financial news item, DocCat assigns exactly one category to it. To categorize a news item, DocCat first identifies the dictionary terms present in its content and generates the corresponding document vector, then uses the k-nn classifier to retrieve a suitable category. The document vector and the retrieved category are stored in the XML image document as the values of the docvector and category elements, respectively.

KEYWORD EXTRACTION

In DocCat, keywords do not mean exactly the same thing as traditional library keywords: they are statistical keywords. To extract keywords for a news document, all elements of its document vector are ranked in descending order of weight, and the terms corresponding to the first h elements are selected. If the stemming algorithm has been applied, the selected terms are stems rather than real words. In this case, the mapping stored in the StemWords table is used to convert the selected terms from stem form back to real words present in the document. The resulting words are considered as keywords and stored in the XML image document as the value of the keywords element.
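A minimal sketch of this keyword extraction step, assuming the document vector is held as a dict from stem to weight and that the StemWords table has been loaded into a dict from each stem to its surface words. The function name and the alphabetical tie-breaking are hypothetical choices, not taken from DocCat; h is the number of keywords kept per news item (the image schema above allows up to 20 keywords elements).

```python
def extract_keywords(doc_vector, stem_words, doc_words, h):
    """Pick the h highest-weighted dictionary terms and map stems back to words.

    doc_vector: dict stem -> weight (the document vector)
    stem_words: dict stem -> list of surface words (assumed load of StemWords)
    doc_words: set of surface words occurring in this document
    h: number of keywords to keep
    """
    # Rank by descending weight; ties broken alphabetically for determinism.
    ranked = sorted(doc_vector.items(), key=lambda item: (-item[1], item[0]))[:h]
    keywords = []
    for stem, _weight in ranked:
        # Prefer a surface form that actually occurs in this document;
        # fall back to the stem itself if no mapping is recorded.
        forms = [w for w in stem_words.get(stem, []) if w in doc_words]
        keywords.append(forms[0] if forms else stem)
    return keywords
```

For instance, with weights {"chiffr": 0.6, "bvrp": 0.55, "affair": 0.5, "hauss": 0.3} and h = 3, the sketch returns the words behind the three heaviest stems, restoring "chiffres" and "affaires" from their stems and keeping "bvrp" unchanged because it has no recorded mapping.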
In this way, the words representing the document content are identified, while the words not significant to the document content are removed.

KEYWORD-BASED SEARCH

In traditional keyword search, the full original text is searched by matching the keywords input by users; if a word matching a given keyword is found, the document is returned. This produces three problems. First, searching the full text is time-consuming. Second, irrelevant documents are returned if words not significant to the document content match the keywords given by users. Last, documents containing the concepts the user wants are not retrieved if they do not contain the keywords given by the user. In reality, there are usually many ways to express a given concept, so the literal terms in a user's query may not match those of a relevant document. On the other hand, most words have multiple meanings, so the terms in a user's query will literally match terms in documents that are not of interest to the user. By searching the XML image documents of the financial news, the first two problems can be overcome to some extent, because XML image documents are much shorter than the original documents and, in particular, contain only the words significant to the document content. Figure 4 shows keyword-based search using XML image documents: only the text of the keywords element of the XML image documents is searched.

Figure 4. Keyword search by using XML image documents (the user's input words are matched against the XML image documents, which point to the news in the news XML documents)

CONTENT-BASED SEARCH

Content-based search means that users can start their query with free text as the query string, for example sentences expressing the desired concepts. DocCat treats the query string as a financial news document and transforms it into a document vector. The similarities between this query vector and all document vectors stored in the XML image documents are then calculated by the similarity model (2). The financial news items in the news XML documents are ordered in descending order of similarity, and the top l news items are considered content-related documents and returned to users. Keyword-based search only considers the presence or absence of query words in the documents, whereas content-based search weights words by the degree to which they contribute to the document content.

5. Evaluation and Experiments

Based on the 4000 financial news items collected at the FirstInvest site, we conducted several experiments. At the preprocessing step, we remove 39 stop words and convert words into their stems using the Porter stemming algorithm [6]. Finally, 4,428 unique terms are obtained. Note that only the content parts of the news are used in DocCat; the title parts are not involved. We ran experiments varying both the number of feature terms (1000, 2000 and 3000) and the k value of k-nn (10, 20, 30, 40, 50, 60, 70). The RCut thresholding strategy [4] with value 1 is adopted, that is, the category with the highest category score among the candidate categories is assigned to the document. RCut suits the FirstInvest situation, since FirstInvest classifies each news item into only one category (see Section 2).

To evaluate categorization systems, we use three standard measures: Recall (r), Precision (p) and F1(r, p). For a category, recall is the proportion of correctly assigned documents to all documents belonging to the category, and precision is the proportion of correctly assigned documents to all documents assigned to the category. The F1 measure combines recall and precision [3] as follows:

$$ F_1(r, p) = \frac{2pr}{p + r} $$

We also check the average performance of a binary classifier over multiple categories, namely the macro-average and the micro-average [3]. Macro-averaging gives equal weight to the performance on every category, regardless of how rare or common the category is. Micro-averaging, however, gives equal weight to the performance on every document (category instance), and thus favors performance on common categories. For details, see [3].

Table 2. System performance over 10 categories with 1000 features and k = 10
Columns: Category | Recall | Precision | F1 | Rate
Categories: Holdings; Pharmacie/chimie/gaz; Hotellerie/loisir/transport; Marchés financiers; Telecoms; Aéronautique/Défense; Web Agency; Banques; Biotechnologie; Distribution spécialisée

Due to space limits, we only present the experimental results over 10 categories in Table 2, where the last column (Rate) gives the distribution of each category in the corpus. The results in Table 2 are obtained by setting k = 10 and selecting 1000 features with the CBA algorithm. With 1000 features, the system achieves its best performance at k = 10.
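The per-category measures and the two averages can be sketched as follows for the single-label setting used here (one true and one assigned category per news item). This is an illustrative sketch, not the evaluation code behind Table 2: the function name is hypothetical, a recall or precision with an empty denominator is taken as 0, and macro F1 is taken as the mean of the per-category F1 scores, which is one common convention.

```python
from collections import Counter


def evaluate(gold, predicted, categories):
    """Per-category (recall, precision, F1) plus macro- and micro-averages.

    gold, predicted: parallel lists with one category label per document.
    categories: the categories to report on.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_cat, pred_cat in zip(gold, predicted):
        if true_cat == pred_cat:
            tp[true_cat] += 1
        else:
            fp[pred_cat] += 1  # document wrongly assigned to pred_cat
            fn[true_cat] += 1  # document of true_cat that was missed

    def rpf(t, f_pos, f_neg):
        r = t / (t + f_neg) if t + f_neg else 0.0
        p = t / (t + f_pos) if t + f_pos else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return r, p, f1

    per_category = {c: rpf(tp[c], fp[c], fn[c]) for c in categories}
    # Macro-average: every category counts equally.
    n = len(categories)
    macro = tuple(sum(m[i] for m in per_category.values()) / n for i in range(3))
    # Micro-average: pool the counts, so every document counts equally.
    micro = rpf(
        sum(tp[c] for c in categories),
        sum(fp[c] for c in categories),
        sum(fn[c] for c in categories),
    )
    return per_category, macro, micro
```

A rare category with no correct assignment pulls the macro-average down as much as a common one would, while it barely moves the micro-average; this is the difference between the two averages described above.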
The micro-averaged recall, precision and F1 are 0.72, 0.67 and 0.70, while the macro-averaged values are 0.636, 0.7 and … The experimental results indicate that the system achieves good performance on common categories. For less frequent categories, the performance is not satisfactory; this weak performance on small categories arises from the uneven category distribution.

6. Conclusion

This paper briefly presents the application of the document categorization system DocCat, whose goal is to automatically categorize French financial news at the financial portal FirstInvest. The general framework and common categorization techniques are also discussed. In addition, by creating XML image documents for production financial news, we propose two approaches to searching text: keyword-based search and content-based search. Furthermore, the flexibility of the system allows users to easily adapt it to their application domain and requirements.

References
[1]
[2] Joachims, T. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of ECML, 1998.
[3] Yang, Y. An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1), pp. 69-90, 1999.
[4] Yang, Y. A study on thresholding strategies for text categorization. In Proceedings of ACM SIGIR'01, 2001.
[5] Salton, G. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, Massachusetts, 1989.
[6] Porter, M. F. An algorithm for suffix stripping. Program, Vol. 14, No. 3, 1980.
[7] Yang, Y. and Pedersen, J. O. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the 14th ICML, 1997.
[8] Kou, H., Gardarin, G. and Zeitouni, K. Two New Approaches to Feature Selection for Document Categorization. Technical Report #2002/9, PRiSM Laboratory, University of Versailles, 2002.


More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION

A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION 1 FENG YONG, DANG XIAO-WAN, 3 XU HONG-YAN School of Informaton, Laonng Unversty, Shenyang Laonng E-mal: 1 fyxuhy@163.com, dangxaowan@163.com, 3 xuhongyan_lndx@163.com

More information

Novel Pattern-based Fingerprint Recognition Technique Using 2D Wavelet Decomposition

Novel Pattern-based Fingerprint Recognition Technique Using 2D Wavelet Decomposition Mathematcal Methods for Informaton Scence and Economcs Novel Pattern-based Fngerprnt Recognton Technque Usng D Wavelet Decomposton TUDOR BARBU Insttute of Computer Scence of the Romanan Academy T. Codrescu,,

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

CUM: An Efficient Framework for Mining Concept Units

CUM: An Efficient Framework for Mining Concept Units CUM: An Effcent Framework for Mnng Concept Unts P.Santh Thlagam Ananthanarayana V.S Department of Informaton Technology Natonal Insttute of Technology Karnataka - Surathkal Inda 575025 santh_soc@yahoo.co.n,

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval Fuzzy -Means Intalzed by Fxed Threshold lusterng for Improvng Image Retreval NAWARA HANSIRI, SIRIPORN SUPRATID,HOM KIMPAN 3 Faculty of Informaton Technology Rangst Unversty Muang-Ake, Paholyotn Road, Patumtan,

More information

Cross-Language Information Retrieval

Cross-Language Information Retrieval Feature Artcle: Cross-Language Informaton Retreval 19 Cross-Language Informaton Retreval Jan-Yun Ne 1 Abstract A research group n Unversty of Montreal has worked on the problem of cross-language nformaton

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals nkselector: A Web Mnng Approach to Hyperlnk Selecton for Web Portals Xao Fang and Olva R. u Sheng Department of Management Informaton Systems Unversty of Arzona, AZ 8572 {xfang,sheng}@bpa.arzona.edu Submtted

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Discriminative Dictionary Learning with Pairwise Constraints

Discriminative Dictionary Learning with Pairwise Constraints Dscrmnatve Dctonary Learnng wth Parwse Constrants Humn Guo Zhuoln Jang LARRY S. DAVIS UNIVERSITY OF MARYLAND Nov. 6 th, Outlne Introducton/motvaton Dctonary Learnng Dscrmnatve Dctonary Learnng wth Parwse

More information

Visual Thesaurus for Color Image Retrieval using Self-Organizing Maps

Visual Thesaurus for Color Image Retrieval using Self-Organizing Maps Vsual Thesaurus for Color Image Retreval usng Self-Organzng Maps Chrstopher C. Yang and Mlo K. Yp Department of System Engneerng and Engneerng Management The Chnese Unversty of Hong Kong, Hong Kong ABSTRACT

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

Laplacian Eigenmap for Image Retrieval

Laplacian Eigenmap for Image Retrieval Laplacan Egenmap for Image Retreval Xaofe He Partha Nyog Department of Computer Scence The Unversty of Chcago, 1100 E 58 th Street, Chcago, IL 60637 ABSTRACT Dmensonalty reducton has been receved much

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Issues and Empirical Results for Improving Text Classification

Issues and Empirical Results for Improving Text Classification Issues and Emprcal Results for Improvng Text Classfcaton Youngoong Ko 1 and Jungyun Seo 2 1 Dept. of Computer Engneerng, Dong-A Unversty, 840 Hadan 2-dong, Saha-gu, Busan, 604-714, Korea yko@dau.ac.kr

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

User Tweets based Genre Prediction and Movie Recommendation using LSI and SVD

User Tweets based Genre Prediction and Movie Recommendation using LSI and SVD User Tweets based Genre Predcton and Move Recommendaton usng LSI and SVD Saksh Bansal, Chetna Gupta Department of CSE/IT Jaypee Insttute of Informaton Technology,sec-62 Noda, Inda sakshbansal76@gmal.com,

More information

A Method of Hot Topic Detection in Blogs Using N-gram Model

A Method of Hot Topic Detection in Blogs Using N-gram Model 84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples 94 JOURNAL OF COMPUTERS, VOL. 4, NO. 1, JANUARY 2009 Relable Negatve Extractng Based on knn for Learnng from Postve and Unlabeled Examples Bangzuo Zhang College of Computer Scence and Technology, Jln Unversty,

More information

Automatic Text Categorization of Mathematical Word Problems

Automatic Text Categorization of Mathematical Word Problems Automatc Text Categorzaton of Mathematcal Word Problems Suleyman Cetntas 1, Luo S 2, Yan Png Xn 3, Dake Zhang 3, Joo Young Park 3 1,2 Department of Computer Scence, 2 Department of Statstcs, 3 Department

More information

Using an Automatic Weighted Keywords Dictionary for Intelligent Web Content Filtering

Using an Automatic Weighted Keywords Dictionary for Intelligent Web Content Filtering Journal of Advances n Computer Research Quarterly pissn: 2345-606x eissn: 2345-6078 Sar Branch, Islamc Azad Unversty, Sar, I.R.Iran (Vol. 6, No. 1, February 2015), Pages: 101-114 www.jacr.ausar.ac.r Usng

More information

Gender Classification using Interlaced Derivative Patterns

Gender Classification using Interlaced Derivative Patterns Gender Classfcaton usng Interlaced Dervatve Patterns Author Shobernejad, Ameneh, Gao, Yongsheng Publshed 2 Conference Ttle Proceedngs of the 2th Internatonal Conference on Pattern Recognton (ICPR 2) DOI

More information

Background Removal in Image indexing and Retrieval

Background Removal in Image indexing and Retrieval Background Removal n Image ndexng and Retreval Y Lu and Hong Guo Department of Electrcal and Computer Engneerng The Unversty of Mchgan-Dearborn Dearborn Mchgan 4818-1491, U.S.A. Voce: 313-593-508, Fax:

More information

Classic Term Weighting Technique for Mining Web Content Outliers

Classic Term Weighting Technique for Mining Web Content Outliers Internatonal Conference on Computatonal Technques and Artfcal Intellgence (ICCTAI'2012) Penang, Malaysa Classc Term Weghtng Technque for Mnng Web Content Outlers W.R. Wan Zulkfel, N. Mustapha, and A. Mustapha

More information

MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN

MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS by XUNYU PAN (Under the Drecton of Suchendra M. Bhandarkar) ABSTRACT In modern tmes, more and more

More information

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures A Novel Adaptve Descrptor Algorthm for Ternary Pattern Textures Fahuan Hu 1,2, Guopng Lu 1 *, Zengwen Dong 1 1.School of Mechancal & Electrcal Engneerng, Nanchang Unversty, Nanchang, 330031, Chna; 2. School

More information

Decision Strategies for Rating Objects in Knowledge-Shared Research Networks

Decision Strategies for Rating Objects in Knowledge-Shared Research Networks Decson Strateges for Ratng Objects n Knowledge-Shared Research etwors ALEXADRA GRACHAROVA *, HAS-JOACHM ER **, HASSA OUR ELD ** OM SUUROE ***, HARR ARAKSE *** * nsttute of Control and System Research,

More information

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database A Mult-step Strategy for Shape Smlarty Search In Kamon Image Database Paul W.H. Kwan, Kazuo Torach 2, Kesuke Kameyama 2, Junbn Gao 3, Nobuyuk Otsu 4 School of Mathematcs, Statstcs and Computer Scence,

More information

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies Deep Classfer: Automatcally Categorzng Search Results nto Large-Scale Herarches Dkan Xng 1, Gu-Rong Xue 1, Qang Yang 2, Yong Yu 1 1 Shangha Jao Tong Unversty, Shangha, Chna {xaobao,grxue,yyu}@apex.sjtu.edu.cn

More information

A Hybrid Text Classification System Using Sentential Frequent Itemsets

A Hybrid Text Classification System Using Sentential Frequent Itemsets A Hybrd Text Classfcaton System Usng Sentental Frequent Itemsets Shzhu Lu, Hepng Hu College of Computer Scence, Huazhong Unversty of Scence and Technology, Wuhan 430074, Chna stoneboo@26.com Abstract:

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

Audio Content Classification Method Research Based on Two-step Strategy

Audio Content Classification Method Research Based on Two-step Strategy (IJACSA) Internatonal Journal of Advanced Computer Scence and Applcatons, Audo Content Classfcaton Method Research Based on Two-step Strategy Sume Lang Department of Computer Scence and Technology Chongqng

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Pictures at an Exhibition

Pictures at an Exhibition 1 Pctures at an Exhbton Stephane Kwan and Karen Zhu Department of Electrcal Engneerng Stanford Unversty, Stanford, CA 9405 Emal: {skwan1, kyzhu}@stanford.edu Abstract An mage processng algorthm s desgned

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Human Face Recognition Using Generalized. Kernel Fisher Discriminant

Human Face Recognition Using Generalized. Kernel Fisher Discriminant Human Face Recognton Usng Generalzed Kernel Fsher Dscrmnant ng-yu Sun,2 De-Shuang Huang Ln Guo. Insttute of Intellgent Machnes, Chnese Academy of Scences, P.O.ox 30, Hefe, Anhu, Chna. 2. Department of

More information

Classification of Face Images Based on Gender using Dimensionality Reduction Techniques and SVM

Classification of Face Images Based on Gender using Dimensionality Reduction Techniques and SVM Classfcaton of Face Images Based on Gender usng Dmensonalty Reducton Technques and SVM Fahm Mannan 260 266 294 School of Computer Scence McGll Unversty Abstract Ths report presents gender classfcaton based

More information

Efficient Text Classification by Weighted Proximal SVM *

Efficient Text Classification by Weighted Proximal SVM * Effcent ext Classfcaton by Weghted Proxmal SVM * Dong Zhuang 1, Benyu Zhang, Qang Yang 3, Jun Yan 4, Zheng Chen, Yng Chen 1 1 Computer Scence and Engneerng, Bejng Insttute of echnology, Bejng 100081, Chna

More information

Face Recognition Based on SVM and 2DPCA

Face Recognition Based on SVM and 2DPCA Vol. 4, o. 3, September, 2011 Face Recognton Based on SVM and 2DPCA Tha Hoang Le, Len Bu Faculty of Informaton Technology, HCMC Unversty of Scence Faculty of Informaton Scences and Engneerng, Unversty

More information

Modular PCA Face Recognition Based on Weighted Average

Modular PCA Face Recognition Based on Weighted Average odern Appled Scence odular PCA Face Recognton Based on Weghted Average Chengmao Han (Correspondng author) Department of athematcs, Lny Normal Unversty Lny 76005, Chna E-mal: hanchengmao@163.com Abstract

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Query classification using topic models and support vector machine

Query classification using topic models and support vector machine Query classfcaton usng topc models and support vector machne Deu-Thu Le Unversty of Trento, Italy deuthu.le@ds.untn.t Raffaella Bernard Unversty of Trento, Italy bernard@ds.untn.t Abstract Ths paper descrbes

More information

KIDS Lab at ImageCLEF 2012 Personal Photo Retrieval

KIDS Lab at ImageCLEF 2012 Personal Photo Retrieval KD Lab at mageclef 2012 Personal Photo Retreval Cha-We Ku, Been-Chan Chen, Guan-Bn Chen, L-J Gaou, Rong-ng Huang, and ao-en Wang Knowledge, nformaton, and Database ystem Laboratory Department of Computer

More information

Improving Web Image Search using Meta Re-rankers

Improving Web Image Search using Meta Re-rankers VOLUME-1, ISSUE-V (Aug-Sep 2013) IS NOW AVAILABLE AT: www.dcst.com Improvng Web Image Search usng Meta Re-rankers B.Kavtha 1, N. Suata 2 1 Department of Computer Scence and Engneerng, Chtanya Bharath Insttute

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

Fingerprint matching based on weighting method and SVM

Fingerprint matching based on weighting method and SVM Fngerprnt matchng based on weghtng method and SVM Ja Ja, Lanhong Ca, Pnyan Lu, Xuhu Lu Key Laboratory of Pervasve Computng (Tsnghua Unversty), Mnstry of Educaton Bejng 100084, P.R.Chna {jaja}@mals.tsnghua.edu.cn

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

An Improved Image Segmentation Algorithm Based on the Otsu Method

An Improved Image Segmentation Algorithm Based on the Otsu Method 3th ACIS Internatonal Conference on Software Engneerng, Artfcal Intellgence, Networkng arallel/dstrbuted Computng An Improved Image Segmentaton Algorthm Based on the Otsu Method Mengxng Huang, enjao Yu,

More information

A Clustering Algorithm for Key Frame Extraction Based on Density Peak

A Clustering Algorithm for Key Frame Extraction Based on Density Peak Journal of Computer and Communcatons, 2018, 6, 118-128 http://www.scrp.org/ournal/cc ISSN Onlne: 2327-5227 ISSN Prnt: 2327-5219 A Clusterng Algorthm for Key Frame Extracton Based on Densty Peak Hong Zhao

More information

Signature and Lexicon Pruning Techniques

Signature and Lexicon Pruning Techniques Sgnature and Lexcon Prunng Technques Srnvas Palla, Hansheng Le, Venu Govndaraju Centre for Unfed Bometrcs and Sensors Unversty at Buffalo {spalla2, hle, govnd}@cedar.buffalo.edu Abstract Handwrtten word

More information