Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier


Saket S.R. Mengle, Information Retrieval Lab, Computer Science Department, Illinois Institute of Technology, Chicago, Illinois, U.S.A.
Nazli Goharian, Information Retrieval Lab, Computer Science Department, Illinois Institute of Technology, Chicago, Illinois, U.S.A.

ABSTRACT
With the ever-increasing number of documents on the web, in digital libraries, news sources, etc., the need for a text classifier that can classify massive amounts of data is becoming more critical and difficult to meet. The major problem in text classification is the high dimensionality of the feature space. The Support Vector Machine (SVM) classifier has been shown to perform consistently better than other text classification algorithms. However, the time taken to train an SVM model is greater than for other algorithms. We explore the use of the Ambiguity Measure (AM) feature selection method, which uses only the most unambiguous keywords to predict the category of a document. Our analysis shows that AM reduces the training time by more than 50% compared to when no feature selection is used, while maintaining the accuracy of the text classifier equivalent to or better than using the whole feature set. We empirically show the effectiveness of our approach in outperforming seven different feature selection methods on two standard benchmark datasets.

Categories and Subject Descriptors
H.3.3 [Information Systems and Retrieval]: Information filtering, Information Search and Retrieval - search process

General Terms
Algorithms, Performance, Experimentation

Keywords
Feature selection, Text classification, SVM

1. INTRODUCTION
Text classification involves scanning through text documents and assigning categories to them to reflect their content. A supervised learning algorithm induces decision rules that are used to categorize documents into different categories by learning from a set of training examples. One of the problems in text classification is the high dimensionality of the feature space.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC'08, March 16-20, 2008, Fortaleza, Ceará, Brazil. Copyright 2008 ACM /08/0003 $5.00.

Some features are commonly used terms that are not specific to any category. These features may hurt the accuracy of the classifier. Moreover, the time required for induction increases as the number of features increases; that is, irrelevant features lead to an increase in training time. Feature selection methods are used to achieve two objectives: to reduce the size of the feature set to optimize classification efficiency, and to reduce the noise in the data to optimize classification effectiveness [11]. Feature selection methods are used as a preprocessing step in the learning process, and the selected features from the training set are then used to classify new incoming documents. Among the well-known feature selection methods are information gain, expected cross entropy, the weight of evidence of text, odds ratio, term frequency, mutual information and CHI. The Ambiguity Measure (AM) feature selection method has been shown to perform better than state-of-the-art feature selection algorithms on statistical classifiers [9]. The Ambiguity Measure algorithm selects the most unambiguous features, where unambiguous features are those whose presence in a document indicates a high degree of confidence that the document belongs to one specific category. One of the widely used text classification algorithms is Support Vector Machines (SVM) [3][4][5][16]. Prior work [5] indicates that SVM performs consistently better than Naïve Bayes, kNN, C4.5 and Rocchio text classifiers. However, one of the limitations of SVM is its time complexity.
[16] shows that SVM has a higher time complexity for training a model than other text classification algorithms. To overcome this limitation of SVM, feature selection methods are used as a preprocessing step before training SVM [12][13][14]. Many well-known feature selection algorithms have been used with SVM to improve its accuracy and efficiency. We explore the effects of the AM feature selection method when applied to SVM and evaluate its performance in comparison to the published state-of-the-art feature selection algorithms on SVM. We use the AM feature selection method as a pre-processing step for the Support Vector Machine classifier. The features whose AM values are below a given threshold, i.e., the more ambiguous terms, are purged, while the features whose AM values are above the threshold are used for the SVM learning phase. We compare AM with the other feature selection algorithms on two different standard benchmark datasets and show that AM performs statistically significantly better than seven published state-of-the-art feature selection methods, reported in [13][14], with 99% confidence. We also empirically show that we can reduce the
training time by more than 50% compared to when no feature selection is used, while maintaining the accuracy of the classifier.

2. PRIOR WORK
To show the effectiveness of our feature selection algorithm, we compare our approach with the existing feature selection methods listed in Table 1. The description of these feature selection methods is given in [2][13][15][17]; thus we forgo their mathematical justification and provide a brief explanation of the differences. Feature selection methods like odds ratio, information gain and CHI use knowledge about the presence of terms in the relevant categories (c_i) as well as in the non-relevant categories (~c_i). In our approach, the AM feature selection method only uses knowledge about the presence of terms in the relevant categories to calculate how confidently a keyword points to a given category. Our objective is to choose only the features that confidently point to only one category. In the Improved Gini Index and cross entropy methods, the probabilities of a term with respect to all categories are considered. Thus, if the term t appears many times in the documents of category c_i, or if it appears in every document of category c_i, it is assigned a higher weight. In a situation where t appears in both categories c_1 and c_2 an equal number of times, and moreover appears in every document of both categories, it is assigned a lower weight. In this case t is ambiguous, as it does not point to a single category. Our proposed AM feature selection avoids such situations and assigns a lower weight to such features. For the tfidf method, tf refers to term frequency with respect to a given category and df indicates the ratio of documents in the collection that contain a given term. In the tfcf method, cf indicates the ratio of categories that contain a given term. Some terms may appear only in one category a small number of times. Although these terms appear in only a single category or document, they are purged during the feature selection process if they have a low term frequency. Furthermore, some terms frequently appear in a few categories or documents (i.e., a high cf or df) with a similar distribution of occurrence across those categories. Such terms are ambiguous, as they do not point strongly to only a single category; however, as their term frequency is high, they may still be selected as good features. The AM feature selection method avoids such situations by only considering the ratio of the number of occurrences of a term in a given category to the total number of occurrences of the term in the training set. Thus, both of these situations are avoided.

Table 1. Different feature selection algorithms (~t denotes the absence of term t; ~c denotes the complement of category c)
Odds Ratio [17]: OR(t_k, c_i) = [P(t_k | c_i) * (1 - P(t_k | ~c_i))] / [(1 - P(t_k | c_i)) * P(t_k | ~c_i)]
tfcf [2]: tfcf(t, c) = TF(t, c) * log(|C| / cf(t))
tfidf [2]: tfidf(t, d) = TF(t, d) * log(|D| / df(t))
Improved Gini Index [13]: Gini(t) = sum_i P(t | c_i)^2 * P(c_i | t)^2
Information Gain [15]: IG(t_k) = sum over c in {c_i, ~c_i} and t in {t_k, ~t_k} of P(t, c) * log(P(t, c) / (P(t) * P(c)))
Cross Entropy [13]: CE(t) = P(t) * sum_i P(c_i | t) * log(P(c_i | t) / P(c_i))
CHI [15]: CHI(t, c) = N * [P(t, c) * P(~t, ~c) - P(t, ~c) * P(~t, c)]^2 / (P(t) * P(~t) * P(c) * P(~c))

3. METHODOLOGY
Initially, we describe the intuitive motivation behind our approach and then provide a formal definition of our method. We consider the human perception of identifying the topic of a document by glancing at it and capturing its keywords. Normally, one bases a decision about the topic of a document on the most unambiguous words that the eye captures. We explain this using a hypothetical example. Consider the short paragraph below, extracted from [6]:

"Metallica is a Grammy Award-winning American heavy metal/thrash metal band formed in 1981 and has become one of the most commercially successful musical acts of recent decades. They are considered one of the 'Big Four' pioneers of thrash metal, along with Anthrax, Slayer, and Megadeth. Metallica has sold more than 90 million records worldwide, including 57 million albums in the United States alone."

The paragraph seems to be about Music. Our human perception is based on our knowledge of the domain or what we hear daily on various subjects.
Thus, if one is familiar with the famous rock metal band Metallica, then without reading the text, one can confidently claim that the text belongs to Music rather than Medicine or Sports. Accordingly, if a feature points to only one category, we assign a higher ambiguity measure to that feature, and if a feature is vague and does not point to any given category in particular, we assign a lower ambiguity measure to it. Formally, the Ambiguity Measure (AM) is defined as the probability that a term falls into a particular category. The closer the AM value is to 1, the less ambiguous the term is; conversely, if the AM value is closer to 0, the term is more ambiguous with respect to a given category. The formula for calculating AM is as follows:

AM(t_k, C_i) = tf(t_k, C_i) / tf(t_k)

AM(t_k) = max_i AM(t_k, C_i)

where tf(t_k, C_i) is the term frequency of term t_k in category C_i and tf(t_k) is the term frequency of term t_k in the entire collection. The result of the Ambiguity Measure calculation for the feature Metallica is given in Table 2, indicating the Music category for the term. The AM value for the feature Metallica is 0.99, which indicates that Metallica is an unambiguous feature and should be kept and not filtered. The feature Anthrax is related to the Medicine category, but with a lower AM value: Anthrax is also the name of a famous music band of the 1980s, and hence it also appears in the category Music. Thus, the ambiguity measure of Anthrax is less than that of Metallica. In some cases the
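A minimal sketch of the AM computation and threshold filtering described above (toy counts chosen to mirror the Metallica/Records example; the data layout and function names are ours, not the paper's implementation):

```python
def ambiguity_measure(term_cat_tf):
    """AM(t, C) = tf(t, C) / tf(t); AM(t) is the maximum over all categories.

    term_cat_tf maps term -> {category: term frequency}.
    Returns term -> (category with the highest AM, AM value).
    """
    scores = {}
    for term, cat_tf in term_cat_tf.items():
        total = sum(cat_tf.values())        # tf(t) over the whole collection
        best = max(cat_tf, key=cat_tf.get)  # category maximizing AM(t, C)
        scores[term] = (best, cat_tf[best] / total)
    return scores

def select_features(term_cat_tf, threshold):
    """Keep only the unambiguous terms: those with AM above the threshold.
    The kept AM value can double as the feature's weight in the SVM input."""
    return {t: am for t, (_, am) in ambiguity_measure(term_cat_tf).items()
            if am > threshold}

# Toy counts: "metallica" is concentrated in one category, "records" is spread.
counts = {
    "metallica": {"music": 99, "medicine": 1},
    "records": {"music": 5, "medicine": 5, "sports": 5},
}
scores = ambiguity_measure(counts)
kept = select_features(counts, threshold=0.5)
```

With a threshold of 0.5, only `metallica` survives; its AM value (0.99) would then serve as the feature weight passed to the SVM.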
ambiguity measure of some features is low because they appear consistently in different categories. An example of such is the term Records, which may appear in all different categories. Thus, the AM value of such a term is low (0.33), and it is desirable to filter out such features. This reduction in the dimensionality of the feature set increases accuracy by avoiding the terms that have lower AM values. We empirically determine a threshold and filter out all features whose AM value is below that threshold.

[Table 2. Ambiguity Measure (AM) example: per-category counts and AM values for the terms Metallica, Anthrax, and Records across the categories Medicine, Music, Sports, and Politics]

Furthermore, we also use the AM value of a feature as its weight. In the SVM classifier, a weight of importance is assigned to each feature. Thus, if the AM value of a feature is higher, then the feature has more weight, and if the AM value is lower, the feature has less weight.

4. EXPERIMENTAL SETUP
In all our experiments, we use a single computer with an AMD Athlon 2.16GHz processor and 1 GB of RAM. We use the linear SVM kernel in our experiments, as the non-linear versions gain very little in terms of performance [11]. For training and testing the SVM model, we use LibSVM 2.84 [1], a software package commonly used for classifying documents into binary or multi-labeled categories.

4.1 Datasets
To demonstrate the effectiveness of the AM feature selection algorithm, we perform experiments on two standard benchmark datasets: 20 Newsgroups and Reuters 21578.

20 Newsgroups
The 20 Newsgroups (20NG) dataset [7] consists of a total of 19,997 documents that are categorized into twenty different news groups. Each category contains one thousand documents. Some of the categories are very closely related to each other (e.g., comp.sys.ibm.pc.hardware and comp.sys.mac.hardware), while others are highly unrelated (e.g., misc.forsale and soc.religion.christian). This characteristic contributes to the difficulty of categorizing documents that belong to very similar categories. We use a 9-1 train-test split for the 20 Newsgroups dataset.
Thus we have 18,000 documents for training and 1,997 documents for testing. The total number of unique features in the 20 Newsgroups dataset is 62,061.

Reuters 21578
The Reuters 21578 corpus [8] contains Reuters news articles from 1987. The documents range from being multi-labeled, single-labeled, or not labeled. The Reuters dataset consists of a total of 135 categories (labels). However, ten of these categories have significantly more documents than the rest. Thus, commonly the top 10 categories are used for experimentation and to compare the accuracy of classification results. The top 10 categories of Reuters 21578 are earn, acq, money-fx, grain, trade, crude, interest, wheat, corn and ship. We use the Mod-Apte train-test split for the Reuters 21578 dataset. There are 7,053 documents in the training set and 2,726 documents in the testing set. The total number of unique features in the Reuters 21578 dataset is 19,428.

4.2 Evaluation Measures
To evaluate the accuracy of our approach and compare AM to the results of the state-of-the-art feature selection methods, we use the micro-F1 measure. The F1 measure is a common measure in text classification that combines recall and precision into a single score, giving equal importance to each, according to the formula:

F1 = (2 * P * R) / (P + R)

where P is precision and R is recall.

5. RESULTS & ANALYSIS
We organize the results into two subsections. In Section 5.1, the effectiveness of our approach on two standard benchmark datasets is presented. We compare our results with the published state-of-the-art results and show that AM performs statistically significantly better than the seven existing feature selection algorithms that are summarized and published in [13][14]. To our knowledge, the classification results for the SVM algorithm using odds ratio, tfidf and tfcf have not been reported in any prior work on the Reuters 21578 and 20 Newsgroups datasets; thus, we implemented these feature selection methods on SVM and report the results in Figure 1 and Figure 2. In Section 5.2, we demonstrate how AM feature selection reduces the training time while optimizing the F1 measure.
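The micro-averaged F1 used in our evaluation pools true positives, false positives, and false negatives across all categories before applying the F1 formula. A minimal sketch (toy counts; the tuple layout is ours):

```python
def micro_f1(per_category_counts):
    """per_category_counts: iterable of (tp, fp, fn) tuples, one per category.
    Micro-averaging sums the counts globally, then applies F1 = 2PR / (P + R)."""
    tp = sum(c[0] for c in per_category_counts)
    fp = sum(c[1] for c in per_category_counts)
    fn = sum(c[2] for c in per_category_counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Two toy categories with (tp, fp, fn) counts.
score = micro_f1([(80, 10, 20), (40, 10, 10)])
```

Because the counts are pooled before averaging, micro-F1 weights frequent categories more heavily than a macro average would.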
We also explain the effects of the threshold value on the classification results.

5.1 Accuracy Comparison
The comparison of the classification performance of the AM feature selection method with the various feature selection methods reported in [13] on the Reuters 21578 dataset is summarized in Figure 1. [13] proposed an improved version of the Gini index that performs better than the other reported feature selection algorithms. Our proposed AM feature selection method statistically significantly outperforms the Improved Gini index and the other feature selection methods depicted in Figure 1, with a confidence level of 99% on Reuters 21578 using a two-tailed paired t-test. Similarly, the classification performance on the 20 Newsgroups dataset is summarized in Figure 2. We compare our results to the orthogonal centroid feature selection (OCFS) method reported in [14]. To keep our presentation of results consistent with [14], we, too, report the micro-F1 measures of OCFS after applying a ceiling function to the results, rounding to the next highest integer. As shown, the AM feature selection method clearly outperforms the OCFS method on the 20 Newsgroups dataset with a significant improvement. Moreover, AM also statistically significantly outperforms the accuracy of the information gain, CHI, odds ratio, tfidf and tfcf feature selection methods. As depicted in Figure 1 and Figure 2, the F1 measure on the Reuters dataset (89.14%) is significantly higher than the F1 measure on the 20 Newsgroups dataset (78.74%). The difference between the F1
results of the Reuters 21578 and 20NG datasets is due to the percentage of positive and negative examples in the training sets of each. That is, we only consider the top 10 categories for the Reuters 21578 dataset. The training set consists of 10% of every category on average. As SVM is a binary classifier and we use the one-against-rest approach for multi-labeled datasets, the number of positive examples (actual category) in the training set is 10% and the number of negative examples is 90%. In the 20NG dataset, we have 20 categories with 5% of the documents of each category in the training set. Thus, during classification, we have 5% positive examples and 95% negative examples. Hence, there are fewer positive examples to learn from in the 20NG dataset as compared to the Reuters dataset, resulting in better accuracy for the Reuters 21578 dataset.

[Figure 1: Comparison of AM with other feature selection methods in terms of F1 measure on the Reuters 21578 dataset]
[Figure 2: Comparison of AM with other feature selection methods in terms of F1 measure on the 20 Newsgroups dataset]
[Figure 3: Correlation between AM thresholds and training/testing time, and between the AM threshold and micro-F1, using the SVM classifier on the Reuters 21578 dataset]
[Figure 4: Correlation between AM thresholds and training/testing time, and between the AM threshold and micro-F1, using the SVM classifier on the 20 Newsgroups dataset]

5.2 Tradeoff of accuracy and time with respect to threshold values
In this section, we report the effects of the AM thresholds in the process of feature selection on the values of the F1 measure and the corresponding time taken to train the model and classify the documents using the SVM classifier. Figure 3 and Figure 4 show the results for the Reuters 21578 and 20 Newsgroups datasets, respectively. The x-axis represents different threshold values and the y-axis represents the micro-F1 measure and time. The threshold value indicates that all features whose weights are above that value are selected and the remaining features are filtered.
The "% of keywords" value (Figures 3 and 4) indicates the corresponding percentage of keywords selected when the threshold is set to a given value. As shown in Figure 3, when we apply the AM feature selection method, the micro-F1 measure increases as we filter out the features with lower AM values. We obtain the best micro-F1 value when the threshold is set to 0.3. Only 70.16% of the features are retained when the threshold is 0.3. As the threshold is increased further, the micro-F1 measure starts dropping. This indicates that when the threshold is less than 0.3, most of the features that are filtered are ambiguous, and filtering them leads to a higher classifier accuracy. When the threshold is above 0.3, most of the features that are filtered contain information relevant to text classification; thus, when these features are filtered, the accuracy of the classifier decreases. The training time includes the feature selection time and the time taken to train the SVM model using LibSVM. The testing time is the time taken by LibSVM to classify the testing data. Figure 3 demonstrates that when no feature selection is applied, i.e., when the threshold is equal to zero, the time taken for training is 33 seconds. When we reduce the dimensionality of the feature set by setting the threshold to 0.3, the training time reduces to 12 seconds. This demonstrates the effect of feature selection in reducing the training time for SVM while optimizing the results.
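The threshold sweep behind Figures 3 and 4 can be sketched as follows (hypothetical AM values; the actual experiments train LibSVM on the retained features at each threshold, which we omit here):

```python
def retained_fraction(am_scores, threshold):
    """Fraction of the vocabulary whose AM value exceeds the threshold."""
    kept = sum(1 for am in am_scores.values() if am > threshold)
    return kept / len(am_scores)

# Hypothetical AM values for a tiny vocabulary.
am_scores = {"metallica": 0.99, "anthrax": 0.6, "records": 0.33, "the": 0.2}

# Raising the threshold filters more ambiguous features, shrinking the
# SVM training problem; micro-F1 typically peaks at an intermediate value.
sweep = {t: retained_fraction(am_scores, t) for t in (0.0, 0.2, 0.3, 0.5)}
```

Plotting the retained fraction (and the resulting micro-F1) against the threshold reproduces the shape of the tradeoff curves discussed above.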
As shown in Figure 4, the behavior of the micro-F1 measure on the 20 Newsgroups dataset is similar to the results on the Reuters dataset. The results consistently improve when the threshold is below 0.2. Only 41% of the features are retained when the threshold is set to 0.2. As the threshold increases, more features are filtered; thus, past a certain point the accuracy of the classifier consistently degrades as the threshold further increases. When no feature selection is applied, the time taken for training is 387 seconds. However, when we reduce the dimensionality of the feature set by setting the threshold to 0.2, the training time reduces to 185 seconds. We also obtain the best F1 measure value when the threshold is set to 0.2. This shows that even though the learning time is reduced by more than 50%, we still obtain comparable or better results than when we do not apply any feature selection. The 20 Newsgroups dataset has more training documents (18,000) than the Reuters 21578 dataset (7,053). The number of features (62,061) and the average document length (78) for the 20 Newsgroups dataset are also greater than for the Reuters 21578 dataset (number of features: 19,428; average document length: 53). Thus, the training time for 20 Newsgroups is greater than the training time for the Reuters 21578 dataset. One of the limitations of using feature selection algorithms is finding a proper threshold for a given dataset. We found the threshold for the Reuters 21578 dataset to be 0.3 and for the 20 Newsgroups dataset to be 0.2. Additionally, we experimented using stratified 10-fold cross validation and confirmed the same thresholds as we reported for the Reuters Mod-Apte split and the 20 Newsgroups 9-1 split. To further investigate this problem, we experimented on two additional standard datasets from the statlog collection [10]: the DNA dataset (3 categories; 2,000 training documents; 1,186 testing documents) and the Vehicle dataset (4 categories; 761 training documents; 85 testing documents). We found that the threshold for both the DNA dataset (micro-F1: 93.17%) and the Vehicle dataset (micro-F1: 82.9%) is also 0.3.
Thus, the observations indicate that a threshold between 0.2 and 0.3 yields the best results on the four datasets we used in our experimentation.

6. CONCLUSION
We explored an effective feature selection algorithm, Ambiguity Measure (AM), and applied AM to SVM text classification. With an ever-increasing number of digital documents, many traditional text classification techniques fail to handle the scale of this data due to their time complexity and space requirements. In this paper, we have shown that the AM feature selection method can reduce the computation time of the SVM text classifier without hurting the effectiveness of the classifier. We performed experiments on two standard benchmark datasets, Reuters 21578 and 20 Newsgroups. We showed that AM performs statistically significantly better than the currently published state-of-the-art feature selection algorithms on SVM. Furthermore, we provided an analysis of how micro-F1 is affected as we set more stringent thresholds for feature selection. We demonstrated that as the threshold for selecting features is increased, the micro-F1 measure improves up to a specific threshold, while the time taken for training the classifier is much lower than when no feature selection is used. Increasing the threshold beyond that point decreases the effectiveness of the text classifier.

7. REFERENCES
[1] Chang C.C., Lin C.J., LIBSVM: a library for support vector machines, 2001.
[2] Chih H.B., Kulathuramaiyer N., An Empirical Study of Feature Selection for Text Categorization based on Term Weightage. IEEE/WIC/ACM International Conference on Web Intelligence, 2004.
[3] Cortes C., Vapnik V., Support-vector networks. Machine Learning, Volume 20, Number 3, September 1995, pg 273-297.
[4] Joachims T., Making large-scale support vector machine learning practical. In B. Schölkopf et al. (Eds.), Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.
[5] Joachims T., Text Categorization with Support Vector Machines: Learning with many relevant features.
10th European Conference on Machine Learning, 1998.
[6]
[7] Lang K., Original 20 Newsgroups Dataset. people.csail.mit.edu/jrennie/20Newsgroups.
[8] Lewis D., Reuters-21578. www.daviddlewis.com/resources/testcollections/reuters21578.
[9] Mengle S., Goharian N., Platt A., FACT: Fast Algorithm for Categorizing Text. IEEE 5th International Conference on Intelligence and Security Informatics, 2007.
[10] Michie D., Spiegelhalter D., Taylor C., Machine Learning, Neural and Statistical Classification. Prentice Hall, 1994.
[11] Mladenić D., Brank J., Grobelnik M., Milic-Frayling N., Feature Selection using Linear Classifier Weights: Interaction with Classification Models. 27th ACM SIGIR Conference on Research and Development in Information Retrieval, 2004.
[12] Novovičová J., Malík A., Information-theoretic feature selection algorithms for text classification. IEEE International Joint Conference on Neural Networks (IJCNN), 2005, Volume 5.
[13] Wenqian S., Houkuan H., Haibin Z., Yongmin L., Youli Q., Zhihai W., A novel feature selection algorithm for text classification. Expert Systems with Applications: An International Journal, Volume 33, Issue 1, 2007, pg 1-5.
[14] Yan J., Liu N., Zhang B., Yan S., Chen Z., Cheng Q., Fan W., Ma W., OCFS: optimal orthogonal centroid feature selection for text categorization. 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005.
[15] Yang Y., Pedersen J., A comparative study on feature selection in text categorization. 14th International Conference on Machine Learning, 1997, pg 412-420.
[16] Yang Y., Zhang J., Kisiel B., A scalability analysis of classifiers in text categorization. 26th ACM SIGIR Conference on Research and Development in Information Retrieval, 2003.
[17] Zheng Z., Srihari R., Optimally Combining Positive and Negative Features for Text Categorization. In Proceedings of the ICML Workshop on Learning from Imbalanced Datasets II, Washington DC, 2003.


More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 48 CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 3.1 INTRODUCTION The raw mcroarray data s bascally an mage wth dfferent colors ndcatng hybrdzaton (Xue

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Spam Filtering Based on Support Vector Machines with Taguchi Method for Parameter Selection

Spam Filtering Based on Support Vector Machines with Taguchi Method for Parameter Selection E-mal Spam Flterng Based on Support Vector Machnes wth Taguch Method for Parameter Selecton We-Chh Hsu, Tsan-Yng Yu E-mal Spam Flterng Based on Support Vector Machnes wth Taguch Method for Parameter Selecton

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

An Anti-Noise Text Categorization Method based on Support Vector Machines *

An Anti-Noise Text Categorization Method based on Support Vector Machines * An Ant-Nose Text ategorzaton Method based on Support Vector Machnes * hen Ln, Huang Je and Gong Zheng-Hu School of omputer Scence, Natonal Unversty of Defense Technology, hangsha, 410073, hna chenln@nudt.edu.cn,

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Feature Selection as an Improving Step for Decision Tree Construction

Feature Selection as an Improving Step for Decision Tree Construction 2009 Internatonal Conference on Machne Learnng and Computng IPCSIT vol.3 (2011) (2011) IACSIT Press, Sngapore Feature Selecton as an Improvng Step for Decson Tree Constructon Mahd Esmael 1, Fazekas Gabor

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

Feature Selection for Natural Language Call Routing Based on Self-Adaptive Genetic Algorithm

Feature Selection for Natural Language Call Routing Based on Self-Adaptive Genetic Algorithm IOP Conference Seres: Materals Scence and Engneerng PAPER OPEN ACCESS Feature Selecton for Natural Language Call Routng Based on Self-Adaptve Genetc Algorthm To cte ths artcle: A Koromyslova et al 017

More information

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies Deep Classfer: Automatcally Categorzng Search Results nto Large-Scale Herarches Dkan Xng 1, Gu-Rong Xue 1, Qang Yang 2, Yong Yu 1 1 Shangha Jao Tong Unversty, Shangha, Chna {xaobao,grxue,yyu}@apex.sjtu.edu.cn

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures A Novel Adaptve Descrptor Algorthm for Ternary Pattern Textures Fahuan Hu 1,2, Guopng Lu 1 *, Zengwen Dong 1 1.School of Mechancal & Electrcal Engneerng, Nanchang Unversty, Nanchang, 330031, Chna; 2. School

More information

Random Kernel Perceptron on ATTiny2313 Microcontroller

Random Kernel Perceptron on ATTiny2313 Microcontroller Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department

More information

Feature Kernel Functions: Improving SVMs Using High-level Knowledge

Feature Kernel Functions: Improving SVMs Using High-level Knowledge Feature Kernel Functons: Improvng SVMs Usng Hgh-level Knowledge Qang Sun, Gerald DeJong Department of Computer Scence, Unversty of Illnos at Urbana-Champagn qangsun@uuc.edu, dejong@cs.uuc.edu Abstract

More information

Multiclass Object Recognition based on Texture Linear Genetic Programming

Multiclass Object Recognition based on Texture Linear Genetic Programming Multclass Object Recognton based on Texture Lnear Genetc Programmng Gustavo Olague 1, Eva Romero 1 Leonardo Trujllo 1, and Br Bhanu 2 1 CICESE, Km. 107 carretera Tjuana-Ensenada, Mexco, olague@ccese.mx,

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

Using Neural Networks and Support Vector Machines in Data Mining

Using Neural Networks and Support Vector Machines in Data Mining Usng eural etworks and Support Vector Machnes n Data Mnng RICHARD A. WASIOWSKI Computer Scence Department Calforna State Unversty Domnguez Hlls Carson, CA 90747 USA Abstract: - Multvarate data analyss

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Journal of Process Control

Journal of Process Control Journal of Process Control (0) 738 750 Contents lsts avalable at ScVerse ScenceDrect Journal of Process Control j ourna l ho me pag e: wwwelsevercom/locate/jprocont Decentralzed fault detecton and dagnoss

More information

SRBIR: Semantic Region Based Image Retrieval by Extracting the Dominant Region and Semantic Learning

SRBIR: Semantic Region Based Image Retrieval by Extracting the Dominant Region and Semantic Learning Journal of Computer Scence 7 (3): 400-408, 2011 ISSN 1549-3636 2011 Scence Publcatons SRBIR: Semantc Regon Based Image Retreval by Extractng the Domnant Regon and Semantc Learnng 1 I. Felc Raam and 2 S.

More information

Clustering of Words Based on Relative Contribution for Text Categorization

Clustering of Words Based on Relative Contribution for Text Categorization Clusterng of Words Based on Relatve Contrbuton for Text Categorzaton Je-Mng Yang, Zh-Yng Lu, Zhao-Yang Qu Abstract Term clusterng tres to group words based on the smlarty crteron between words, so that

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

ISSN: International Journal of Engineering and Innovative Technology (IJEIT) Volume 1, Issue 4, April 2012

ISSN: International Journal of Engineering and Innovative Technology (IJEIT) Volume 1, Issue 4, April 2012 Performance Evoluton of Dfferent Codng Methods wth β - densty Decodng Usng Error Correctng Output Code Based on Multclass Classfcaton Devangn Dave, M. Samvatsar, P. K. Bhanoda Abstract A common way to

More information

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

A MODIFIED K-NEAREST NEIGHBOR CLASSIFIER TO DEAL WITH UNBALANCED CLASSES

A MODIFIED K-NEAREST NEIGHBOR CLASSIFIER TO DEAL WITH UNBALANCED CLASSES A MODIFIED K-NEAREST NEIGHBOR CLASSIFIER TO DEAL WITH UNBALANCED CLASSES Aram AlSuer, Ahmed Al-An and Amr Atya 2 Faculty of Engneerng and Informaton Technology, Unversty of Technology, Sydney, Australa

More information

A Powerful Feature Selection approach based on Mutual Information

A Powerful Feature Selection approach based on Mutual Information 6 IJCN Internatonal Journal of Computer cence and Network ecurty, VOL.8 No.4, Aprl 008 A Powerful Feature electon approach based on Mutual Informaton Al El Akad, Abdelall El Ouardgh, and Drss Aboutadne

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

An Evaluation of Divide-and-Combine Strategies for Image Categorization by Multi-Class Support Vector Machines

An Evaluation of Divide-and-Combine Strategies for Image Categorization by Multi-Class Support Vector Machines An Evaluaton of Dvde-and-Combne Strateges for Image Categorzaton by Mult-Class Support Vector Machnes C. Demrkesen¹ and H. Cherf¹, ² 1: Insttue of Scence and Engneerng 2: Faculté des Scences Mrande Galatasaray

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Modular PCA Face Recognition Based on Weighted Average

Modular PCA Face Recognition Based on Weighted Average odern Appled Scence odular PCA Face Recognton Based on Weghted Average Chengmao Han (Correspondng author) Department of athematcs, Lny Normal Unversty Lny 76005, Chna E-mal: hanchengmao@163.com Abstract

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Discriminative Dictionary Learning with Pairwise Constraints

Discriminative Dictionary Learning with Pairwise Constraints Dscrmnatve Dctonary Learnng wth Parwse Constrants Humn Guo Zhuoln Jang LARRY S. DAVIS UNIVERSITY OF MARYLAND Nov. 6 th, Outlne Introducton/motvaton Dctonary Learnng Dscrmnatve Dctonary Learnng wth Parwse

More information

CS47300: Web Information Search and Management

CS47300: Web Information Search and Management CS47300: Web Informaton Search and Management Prof. Chrs Clfton 15 September 2017 Materal adapted from course created by Dr. Luo S, now leadng Albaba research group Retreval Models Informaton Need Representaton

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

Specialized Weighted Majority Statistical Techniques in Robotics (Fall 2009)

Specialized Weighted Majority Statistical Techniques in Robotics (Fall 2009) Statstcal Technques n Robotcs (Fall 09) Keywords: classfer ensemblng, onlne learnng, expert combnaton, machne learnng Javer Hernandez Alberto Rodrguez Tomas Smon javerhe@andrew.cmu.edu albertor@andrew.cmu.edu

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

Classification and clustering using SVM

Classification and clustering using SVM Lucan Blaga Unversty of Sbu Hermann Oberth Engneerng Faculty Computer Scence Department Classfcaton and clusterng usng SVM nd PhD Report Thess Ttle: Data Mnng for Unstructured Data Author: Danel MORARIU,

More information

Parallel Sequential Minimal Optimization for the Training. of Support Vector Machines

Parallel Sequential Minimal Optimization for the Training. of Support Vector Machines Parallel Sequental Mnmal Optmzaton for the Tranng of Sport Vector Machnes 1 L.J. Cao a, S.S. Keerth b, C.J. Ong b, P. Uvaraj c, X.J. Fu c and H.P. Lee c, J.Q. Zhang a a Fnancal Studes of Fudan Unversty,

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information