Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task
Proceedings of NTCIR-6 Workshop Meeting, May 15-18, 2007, Tokyo, Japan

Kotaro Hashimoto+
Nagaoka University of Technology
Kamitomioka-cho, Nagaoka-shi, Niigata, Japan

Takashi Yukawa
Nagaoka University of Technology
Kamitomioka-cho, Nagaoka-shi, Niigata, Japan
yukawa@vos.nagaokaut.ac.jp

+ Current affiliation: Nippon Telegraph and Telephone West Corporation

Abstract

In the present paper, a term weighting classification method using the chi-square statistic is proposed and evaluated in the classification subtask at the NTCIR-6 patent retrieval task. In this task, large numbers of patent applications are classified into F-term categories. A patent classification system therefore requires high classification speed as well as high classification accuracy. The chi-square statistic takes into account both how often a word appears in an F-term and how often it does not appear in that F-term. The proposed method treats each word as a scalar value, and the ranking algorithm simply adds, for each F-term, the values of the words included in the test patent document. As a result, the proposed method classifies significantly faster than other methods. The proposed method is evaluated in terms of A-precision, R-precision, and F-measure. Although it did not obtain the best score, it achieves a classification accuracy as high as that of other methods that use machine learning or the vector classification method. Because the task itself does not evaluate processing speed, processing speed is evaluated separately. The evaluation results show that the proposed method is much faster than the vector classification method. Together, the results for classification accuracy and processing speed confirm that the proposed method is effective and practical.

Keywords: patent classification, F-term, chi-square statistic

1. Introduction

In the previous NTCIR workshop, machine learning methods and vector classification methods, such as the k-nearest neighbor method, provided good results for the classification subtask. However, these methods are expected to require a long processing time when classifying a large number of patent documents. In the classification subtask at NTCIR-6, the numbers of F-term themes and test documents increase considerably. Therefore, the patent classification system is required to have a high classification speed as well as high classification accuracy.

In the present paper, a high-speed classification method is proposed whose classification accuracy is as good as or better than that of traditional methods that use complex machine learning. The details of the proposed method are described herein. The proposed method is implemented and evaluated on the test collection of the classification subtask. The results of the evaluation are presented and discussed.
2. Background

2.1 Classification subtask at NTCIR-5

In the classification subtask at NTCIR-5 [2], two subtasks were performed: the theme classification subtask and the F-term classification subtask. The F-term classification subtask covered only five themes, with 2,562 test documents. Each of these documents is assigned multiple F-terms. In the NTCIR-5 F-term classification subtask, popular methods included the k-nearest neighbor method [3] and the support vector machine method [4]. These methods provided good classification accuracy but require a long processing time.

2.2 Classification subtask at NTCIR-6

In the classification subtask at NTCIR-6 [5], only the F-term classification task is performed. The total number of F-term themes is 108 and the total number of test documents is 21,606, a large increase over NTCIR-5. Therefore, the classification system requires faster classification. Even so, the Japan Patent Office (JPO) categorizes approximately 1,900 F-term themes and holds over 4.5 million patent applications, so the NTCIR-6 test collection is still small compared with the scale of practical use.

Table 1. Number of themes and documents

                 NTCIR-5   NTCIR-6   JPO
Theme            5         108       about 1,900
Test document    2,562     21,606    over 4.5 million

In the evaluation, A-precision, R-precision, and F-measure are used. A-precision is the average of the precision values obtained at the rank of each relevant F-term in the ranked list for a test document. R-precision is the precision over the top R ranked F-terms for a test document, where R is the number of relevant categories. The F-measure is the harmonic mean of recall and precision. Recall is the ratio of correct outputs to the total number of correct categories. Precision is the ratio of correct outputs to the total number of outputs.

3. Basic Chi-square Statistic Weighting

To achieve a high-speed classification system, a method that avoids machine learning techniques is proposed. The proposed method treats a word as a scalar value. In this section, the proposed patent classification method using chi-square statistic weighting [6] is described in detail.

3.1 Preprocessing

A patent application consists of five parts: bibliographical information (title of invention, application number, patent applicant, inventor, etc.), an abstract, a claim, a detailed description, and a brief description and schematic drawings. For this task, the proposed method uses only the abstract and the claim. ChaSen is used for morphological analysis, and only nouns are used in this method.

3.2 Chi-square statistic

Chi-square statistic weighting is a term weighting classification method, like the TF-IDF method [7]. Unlike TF-IDF, however, chi-square statistic weighting takes into account words that do not appear in the documents as well as words that do. The chi-square statistic weighting applies the following equation to each word for each F-term:

\[ \chi^2(x, y) = D(y, \alpha\beta n) + D(x - y, \alpha(1 - \beta)n) + D(m - y, (1 - \alpha)\beta n) + D(n - x - (m - y), (1 - \alpha)(1 - \beta)n) \tag{1} \]

where \alpha = x/n, \beta = m/n, and

\[ D(o, e) = \frac{(o - e)^2}{e} \tag{2} \]

Each parameter can be estimated using the following 2x2 contingency table, which counts the co-occurrence of each word and each F-term in the training patent documents.

Table 2. 2x2 contingency table

                 Word B    not Word B       Subtotal
F-term A         y         x - y            x
not F-term A     m - y     n - x - m + y    n - x
Subtotal         m         n - m            n

Here, y is the number of documents assigned F-term A in which word B appears, x - y is the number of documents assigned F-term A in which word B does not appear, m - y is the number of documents not assigned F-term A in which word B appears, and n - x - m + y is the number of documents not assigned F-term A in which word B does not appear. Furthermore, x is the total number of patent documents assigned F-term A, m is the total number of documents in which word B appears, and n is the total number of documents in each theme.

3.3 Classification

For each F-term, the ranking algorithm sums the chi-square statistic weights of the words appearing in the test document. The F-terms are then sorted in descending order of this sum:

\[ R(D, F) = \sum_{w_i \in D} W_i(F) \tag{3} \]

where W_i(F) is the chi-square statistic weight, for F-term F, of the i-th word w_i included in document D.

3.4 Evaluation at tentative test set

Table 3 shows the evaluation results for the average precision using the NTCIR-5 test collection as a tentative test set, together with the results for the TF-IDF method as a baseline term weighting method.

Table 3. Evaluation results of the chi-square statistic and baseline method at NTCIR-5 test collection (columns: System, 2B022, 3G301, 4B064, 5H180, 5J104; rows: Chi-square, TF-IDF)

As shown in Table 3, the chi-square statistic term weighting method performs well for all themes. In particular, the results for 2B022 and 5J104 are good. These themes have few training patent documents, which indicates that the chi-square statistic term weighting classification method performs well even with little training data.

4. Improved Chi-square

The previous section indicates that the proposed method performs well compared with the baseline term weighting method. However, the evaluation results also indicated that the proposed method had no advantage over the other methods used at NTCIR-5. Therefore, two improvements are applied to the proposed method. In this section, these improvements to the chi-square statistic term weighting are described in detail.

4.1 N-gram chi-square statistic weighting

N-grams are used to take into account the co-occurrence relation between words. The N-gram chi-square statistic is calculated in the same way as that for words.

Table 4. N-gram 2x2 contingency table

                 N-gram B  not N-gram B     Subtotal
F-term A         y         x - y            x
not F-term A     m - y     n - x - m + y    n - x
Subtotal         m         n - m            n

The value of a document is calculated as the sum of the following two values:

- the sum of the chi-square statistic weights of the words that appear in the document;
- the sum of the chi-square statistic weights of the N-grams that appear in the document.

\[ R(D, F) = \sum_{w_i \in D} W_i(F) + \sum_{g_j \in D} N_j(F) \tag{4} \]

where N_j(F) is the N-gram chi-square statistic weight, for F-term F, of the j-th N-gram g_j included in document D. In this paper, bigrams (N = 2) are used for the evaluation.

4.2 Weight emphasis for words in F-term descriptions

Words in F-term descriptions are considered to be important words. Therefore, if words in the F-term descriptions appear in the test document, the chi-square statistic weights of these words are emphasized:

\[ R(D, F) = \sum_{w_i \in D} W'_i(F) + \sum_{g_j \in D} N_j(F), \qquad W'_i(F) = \begin{cases} W_i(F) & w_i \notin L \\ \lambda W_i(F) & w_i \in L \end{cases} \tag{5} \]

where L is the set of words in the F-term descriptions, including the F-term, and \lambda is the emphasis weight applied to those words.

4.3 Evaluation with the improved method

Table 5 shows the evaluation results for the average precision at the NTCIR-5 test collection, compared with the other methods considered herein. In Table 5, VSM, SVM, and K-NN are the methods and results of other teams that participated in the classification subtask at NTCIR-5. As shown in Table 5, the improved method achieves accuracy as high as or better than that of the other vector classification methods. These results show that even simple term weighting methods such as the proposed method can obtain good results for patent document classification.
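The weighting and ranking steps of Sections 3.2 to 4.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy documents, and the parameters `gram_coef` and `label_coef` are hypothetical; all counts are taken to be document counts; each distinct word of a test document is counted once; and only word/F-term pairs that co-occur at least once in training are stored, so any other pair contributes zero.

```python
from collections import defaultdict


def d(observed, expected):
    """D(o, e) = (o - e)^2 / e. Returning 0 when e == 0 is an assumption."""
    return 0.0 if expected == 0 else (observed - expected) ** 2 / expected


def chi_square(x, y, m, n):
    """Chi-square statistic for one (feature, F-term) pair.

    x: documents assigned the F-term; y: of those, documents containing the feature;
    m: documents containing the feature; n: all documents in the theme.
    """
    a, b = x / n, m / n
    return (d(y, a * b * n)
            + d(x - y, a * (1 - b) * n)
            + d(m - y, (1 - a) * b * n)
            + d(n - x - (m - y), (1 - a) * (1 - b) * n))


def train(docs):
    """docs: list of (set_of_features, set_of_fterms). Returns {fterm: {feature: weight}}.

    Features may be words or N-grams (for example, tuples of two words).
    """
    n = len(docs)
    x = defaultdict(int)  # F-term -> number of documents assigned it
    m = defaultdict(int)  # feature -> number of documents containing it
    y = defaultdict(int)  # (F-term, feature) -> number of documents with both
    for features, fterms in docs:
        for w in features:
            m[w] += 1
        for f in fterms:
            x[f] += 1
            for w in features:
                y[(f, w)] += 1
    weights = defaultdict(dict)
    for (f, w), y_fw in y.items():
        weights[f][w] = chi_square(x[f], y_fw, m[w], n)
    return dict(weights)


def rank(words, word_w, grams=(), gram_w=None, labels=None,
         gram_coef=1.0, label_coef=1.0):
    """Score every F-term by summing weights, then sort in descending order.

    With the defaults this is the plain word sum; passing grams/gram_w adds the
    N-gram sum, and labels/label_coef scales words found in F-term descriptions.
    gram_coef and label_coef are hypothetical tuning knobs.
    """
    gram_w = gram_w or {}
    labels = labels or {}
    scores = {}
    for f, table in word_w.items():
        emphasized = labels.get(f, set())
        s = sum(table.get(w, 0.0) * (label_coef if w in emphasized else 1.0)
                for w in words)
        s += gram_coef * sum(gram_w.get(f, {}).get(g, 0.0) for g in grams)
        scores[f] = s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy training data (hypothetical): two documents per F-term.
docs = [
    ({"engine", "fuel"}, {"A"}),
    ({"engine", "valve"}, {"A"}),
    ({"enzyme", "cell"}, {"B"}),
    ({"enzyme", "gene"}, {"B"}),
]
weights = train(docs)
print(rank({"engine", "fuel"}, weights))  # the top-ranked F-term is "A"
```

Because classification reduces to dictionary lookups and additions, its cost grows with the number of distinct features in the test document times the number of F-terms, which is consistent with the speed argument made in the paper. Note that the chi-square statistic measures the strength of association regardless of direction, which is one reason this sketch restricts stored weights to pairs that actually co-occur.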
Table 5. Evaluation results of the improved chi-square method at NTCIR-5 test collection (columns: System, 2B022, 3G301, 4B064, 5H180, 5J104; rows: Proposed, VSM, SVM, K-NN)

5. Evaluation of the NTCIR-6 test set

In this section, the evaluation results of the proposed method at NTCIR-6 are described and compared with those of all of the teams that participated in the classification subtask at the NTCIR-6 patent retrieval task.

5.1 Evaluation results of the proposed system

The proposed method is applied to six runs with different parameters, as shown in Table 6. Runs 1 through 3 use word chi-square statistic term weighting and bigram chi-square statistic term weighting. Runs 4 through 6 use word chi-square statistic term weighting, bigram chi-square statistic term weighting, and weights for words in the F-term description.

The weight for each word in the F-term description is varied from 1 to 2, and the N-gram chi-square statistic weighting is varied from 0.5 to 1.0. Table 7 shows the parameters and results for each system. For the proposed system, the best score was obtained by NUT5, which uses an N-gram chi-square statistic weight of 0.8 together with the F-term label keyword weight.

Table 6. System parameters (columns: Run ID, N-gram, F-term label; rows: NUT1 to NUT6)

Table 7. Evaluation results of NTCIR-6 (columns: Run ID, A-Precision, R-Precision, F-measure; rows: NUT1 to NUT6)

5.2 Evaluation result of NTCIR-6

Figures 1, 2, and 3 show the results for all of the teams for A-Precision, R-Precision, and F-measure, respectively. Six teams with a total of 46 systems, including the team of the present study, participated in the workshop. The proposed method ranked 5th among all teams. However, there is no significant difference in A-Precision and R-Precision between the top four teams.

Fig. 1 Evaluation Result of A-Precision
Fig. 2 Evaluation Result of R-Precision
Fig. 3 Evaluation Result of F-measure
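The three measures reported above can be sketched for a single test document as follows. This is an illustrative sketch using the common textbook definitions; the official NTCIR-6 evaluation tool may differ in details such as how the output set for the F-measure is selected. The function names and the example ranking are hypothetical.

```python
def average_precision(ranked, relevant):
    """A-precision: mean of the precision values at the rank of each relevant F-term."""
    hits, total = 0, 0.0
    for k, fterm in enumerate(ranked, start=1):
        if fterm in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """R-precision: precision over the top R ranked F-terms, R = number of relevant ones."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for fterm in ranked[:r] if fterm in relevant) / r


def f_measure(selected, relevant):
    """F-measure: harmonic mean of precision and recall for a selected output set."""
    correct = len(set(selected) & set(relevant))
    if correct == 0:
        return 0.0
    precision = correct / len(set(selected))
    recall = correct / len(set(relevant))
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranking for one test document with two relevant F-terms.
ranked = ["A", "B", "C", "D"]
relevant = {"A", "C"}
print(average_precision(ranked, relevant))
print(r_precision(ranked, relevant))
print(f_measure(ranked[:3], relevant))
```

In this example the relevant F-terms sit at ranks 1 and 3, so A-precision averages 1/1 and 2/3, R-precision looks only at the top two ranks, and the F-measure is computed for the top three ranks taken as the output set.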
5.3 Discussion

Table 8 shows the classification methods of all teams. Most teams use a machine learning system or the vector classification method. These systems are assumed to require a long time to learn and to classify all of the documents, whereas the proposed system achieves fast classification. Because classification speed is not evaluated in this task, the proposed method is evaluated separately in comparison with the vector classification method. During the preprocessing phase, the proposed method performs approximately 3.5 times faster than the vector classification method. In addition, during the classification phase, the proposed method performs approximately five times faster than the vector classification method.

Table 8. Classification method of the NTCIR-6 patent workshop

System             Classification method
Team 1             HSMV, SVM
Team 2             SVM, NB
Team 3             NB, Maximum Entropy
Team 4             K-NN
Team 5             K-NN
Proposed system    Chi-Squared

6. Conclusion

In the present paper, a high-accuracy and high-speed patent classification method is proposed for the F-term classification subtask. The proposed method is a fast term weighting method using the chi-square statistic. The proposed method was applied to six systems in the classification subtask at NTCIR-6, and the results were evaluated. The accuracy results were good, even though the proposed method did not obtain the best score, which confirms the effectiveness of the proposed method.

References

[1] Japan Patent Office. Administration of Patent 2006 Annual report.
[2] Makoto Iwayama, Atsushi Fujii, Noriko Kando. Overview of Classification Subtask at NTCIR-5 Patent Retrieval Task. Proc. NTCIR-5 Workshop Meeting (2005).
[3] Y. Yang and X. Liu. A re-examination of text categorization methods. Proc. 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1999).
[4] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press (2000).
[5] Makoto Iwayama, Atsushi Fujii, Noriko Kando. Overview of Classification Subtask at NTCIR-6 Patent Retrieval Task. Proceedings of the 6th NTCIR Workshop.
[6] Shinichi Morishita, Jun Sese. Traversing Itemset Lattices with Statistical Metric Pruning. Proc. ACM SIGACT-SIGMOD-SIGART Symp. on Database Systems (PODS) (2000).
[7] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management 24 (1988).
[8] C. Shannon. A Mathematical Theory of Communication. Bell Syst. Tech. J. 27.
More informationEnhanced AMBTC for Image Compression using Block Classification and Interpolation
Internatonal Journal of Computer Applcatons (0975 8887) Volume 5 No.0, August 0 Enhanced AMBTC for Image Compresson usng Block Classfcaton and Interpolaton S. Vmala Dept. of Comp. Scence Mother Teresa
More informationHigh-Boost Mesh Filtering for 3-D Shape Enhancement
Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,
More informationVirtual Machine Migration based on Trust Measurement of Computer Node
Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on
More informationMULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION
MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and
More informationSmoothing Spline ANOVA for variable screening
Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory
More informationComparison Study of Textural Descriptors for Training Neural Network Classifiers
Comparson Study of Textural Descrptors for Tranng Neural Network Classfers G.D. MAGOULAS (1) S.A. KARKANIS (1) D.A. KARRAS () and M.N. VRAHATIS (3) (1) Department of Informatcs Unversty of Athens GR-157.84
More informationLobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide
Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.
More informationQuery classification using topic models and support vector machine
Query classfcaton usng topc models and support vector machne Deu-Thu Le Unversty of Trento, Italy deuthu.le@ds.untn.t Raffaella Bernard Unversty of Trento, Italy bernard@ds.untn.t Abstract Ths paper descrbes
More informationWishing you all a Total Quality New Year!
Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma
More informationDynamic Integration of Regression Models
Dynamc Integraton of Regresson Models Nall Rooney 1, Davd Patterson 1, Sarab Anand 1, Alexey Tsymbal 2 1 NIKEL, Faculty of Engneerng,16J27 Unversty Of Ulster at Jordanstown Newtonabbey, BT37 OQB, Unted
More informationCUM: An Efficient Framework for Mining Concept Units
CUM: An Effcent Framework for Mnng Concept Unts P.Santh Thlagam Ananthanarayana V.S Department of Informaton Technology Natonal Insttute of Technology Karnataka - Surathkal Inda 575025 santh_soc@yahoo.co.n,
More informationHelsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)
Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute
More informationClassifying Acoustic Transient Signals Using Artificial Intelligence
Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)
More informationCAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University
CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made
More informationNUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS
ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana
More informationClassification of Face Images Based on Gender using Dimensionality Reduction Techniques and SVM
Classfcaton of Face Images Based on Gender usng Dmensonalty Reducton Technques and SVM Fahm Mannan 260 266 294 School of Computer Scence McGll Unversty Abstract Ths report presents gender classfcaton based
More informationWeb-supported Matching and Classification of Business Opportunities
Web-supported Matchng and Classfcaton of Busness Opportuntes. DIRO Unversté de Montréal C.P. 628, succursale Centre-vlle Montréal, Québec, H3C 3J7, Canada Jng Ba, Franços Parads,2, Jan-Yun Ne {bajng, paradfr,
More informationMachine Learning 9. week
Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below
More informationSLAM Summer School 2006 Practical 2: SLAM using Monocular Vision
SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,
More informationA Fast Visual Tracking Algorithm Based on Circle Pixels Matching
A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng
More informationReliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples
94 JOURNAL OF COMPUTERS, VOL. 4, NO. 1, JANUARY 2009 Relable Negatve Extractng Based on knn for Learnng from Postve and Unlabeled Examples Bangzuo Zhang College of Computer Scence and Technology, Jln Unversty,
More informationA Method of Hot Topic Detection in Blogs Using N-gram Model
84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna
More informationCorner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity
Journal of Sgnal and Informaton Processng, 013, 4, 114-119 do:10.436/jsp.013.43b00 Publshed Onlne August 013 (http://www.scrp.org/journal/jsp) Corner-Based Image Algnment usng Pyramd Structure wth Gradent
More informationBAYESIAN MULTI-SOURCE DOMAIN ADAPTATION
BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,
More informationAudio Content Classification Method Research Based on Two-step Strategy
(IJACSA) Internatonal Journal of Advanced Computer Scence and Applcatons, Audo Content Classfcaton Method Research Based on Two-step Strategy Sume Lang Department of Computer Scence and Technology Chongqng
More informationComparison of Performance in Text Mining using Categorization of Unstructured Data
Indan Journal of Scence and Technology, Vol 9(4), DOI: 0.7485/jst/06/v94/9648, June 06 ISSN (Prnt) : 0974-6846 ISSN (Onlne) : 0974-5645 Comparson of Performance n Text Mnng usng Categorzaton of Unstructured
More informationTHE CONDENSED FUZZY K-NEAREST NEIGHBOR RULE BASED ON SAMPLE FUZZY ENTROPY
Proceedngs of the 20 Internatonal Conference on Machne Learnng and Cybernetcs, Guln, 0-3 July, 20 THE CONDENSED FUZZY K-NEAREST NEIGHBOR RULE BASED ON SAMPLE FUZZY ENTROPY JUN-HAI ZHAI, NA LI, MENG-YAO
More informationOracle Database: SQL and PL/SQL Fundamentals Certification Course
Oracle Database: SQL and PL/SQL Fundamentals Certfcaton Course 1 Duraton: 5 Days (30 hours) What you wll learn: Ths Oracle Database: SQL and PL/SQL Fundamentals tranng delvers the fundamentals of SQL and
More informationVol. 5, No. 3 March 2014 ISSN Journal of Emerging Trends in Computing and Information Sciences CIS Journal. All rights reserved.
Journal of Emergng Trends n Computng and Informaton Scences 009-03 CIS Journal. All rghts reserved. http://www.csjournal.org Unhealthy Detecton n Lvestock Texture Images usng Subsampled Contourlet Transform
More informationJournal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data
Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray
More informationJournal of Engineering Science and Technology Review 7 (3) (2014) Research Article
Jestr Journal of Engneerng cence and Technology Revew 7 (3) (2014) 151 157 Research Artcle JOURAL OF Engneerng cence and Technology Revew www.estr.org Traffc Classfcaton Method by Combnaton of Host Behavour
More information