A Hybrid Text Classification System Using Sentential Frequent Itemsets

Shizhu Liu, Heping Hu
College of Computer Science, Huazhong University of Science and Technology, Wuhan, China

Abstract: Text classification techniques mostly rely on single-term analysis of the document data set, while many concepts, especially the specific ones, are usually conveyed by sets of terms. To achieve a more accurate text classifier, more informative features, including words that frequently co-occur in the same sentence and their weights, are particularly important in such scenarios. In this paper, we propose a novel approach to text classification using sentential frequent itemsets, a concept that comes from association rule mining. The approach views a sentence rather than a document as a transaction, and uses a variable precision rough set based method to evaluate each sentential frequent itemset's contribution to the classification. Experiments over the Reuters corpus validate the practicability of the proposed system.

Key-Words: text classification, sentential frequent itemsets, variable precision rough set model.

1. Introduction

In an effort to keep up with the tremendous growth of the World Wide Web, many research projects have targeted how to organize such information in a way that makes it easier for end users to find the information they want efficiently and accurately. Information on the Web is mostly present in the form of text documents, and that is the reason content-based document management tasks (collectively known as information retrieval, IR) have, in the last 10 years, gained a prominent status in the information systems field. Text classification (TC, also known as text categorization or topic spotting), the activity of labeling natural language texts with thematic categories from a predefined set, is one such task.
TC, which became a major subfield of the information systems discipline in the early 1990s, is now being applied in many contexts, ranging from document indexing based on a controlled vocabulary to document filtering, automated metadata generation, word sense disambiguation, population of hierarchical catalogues of Web resources, and in general any application of selective and adaptive document dispatching.

Recent studies in the data mining community proposed new methods for classification employing association rule mining [1, 2]. All these current associative classifiers, to the best of our knowledge, exploit document-level co-occurring words, which are groups of words co-occurring frequently in the same document [3, 4]: training documents are modeled as transactions where items are words from the document. Frequent words (itemsets) are then mined from such transactions to capture document semantics and generate IF-THEN rules accordingly. However, while the document is the unit representing an entire idea, the basic semantic unit in a document is actually the sentence. Words co-occurring in the same sentence have some semantic association, and convey more local information than a set of words scattered across several sentences of a document.

Following these observations, in this paper we propose a system for text classification based on two key concepts. The first is the document DB model, which treats the sentence rather than the document as the transaction in order to mine sentential frequent itemsets (SFIs) as the features of that document. The second is a variable precision rough set model based method to evaluate each SFI's contribution to the classification. The system consists of four components:

1. A document restructuring scheme that cleans noisy information in the document and maps the original document into a document DB in which a sentence is the transaction and the items are the words in the sentence.
2. An SFI generator that uses the Apriori algorithm to mine sentential frequent itemsets in the training documents DB; the itemsets are employed as the features of the corresponding document.
3. A topic template generator that prunes the SFIs and uses the remaining ones to construct topic templates.
4. A classifier that scores each SFI's weight in the test document and the topic templates using our novel weighting scheme, and measures the similarity between them.

The integration of these four components proved to be of superior performance to traditional text classification methods. Although the performance of the whole system is quite good, each component could be used independently of the others. The overall system design is illustrated in Fig. 1.

The rest of this paper is organized as follows: Section 2 introduces some preliminary knowledge and states the problem formally. Section 3 presents the steps of data preparation. Section 4 introduces the document DB model and the sentential frequent itemset mining process. Section 5 introduces the SFI pruning method. Section 6 presents our proposed SFI weighting scheme and SFI-based similarity measure. Section 7 discusses the experimental results. Finally, we conclude and discuss future work in the last section.

[Figure 1. Text classification system design. Components shown: Training Documents, Document DB Constructor, SFI Miner, Topic Template Generator, Unlabeled Documents, Classifier.]

2. Preliminary and Problem Definition

2.1 Text categorization

Text categorization is the task of assigning a Boolean value to each pair $\langle d_j, c_i \rangle \in D \times C$, where $D$ is a domain of documents and $C = \{c_1, \ldots, c_{|C|}\}$ is a set of predefined categories. A value of $T$ assigned to $\langle d_j, c_i \rangle$ indicates a decision to file $d_j$ under $c_i$. More formally, the task is to approximate the unknown target function $\breve{\Phi} : D \times C \to \{T, F\}$ (that describes how documents ought to be classified) by means of a function $\Phi : D \times C \to \{T, F\}$ called the classifier (also known as rule, hypothesis, or model) such that $\breve{\Phi}$ and $\Phi$ coincide as much as possible.

Most research in text categorization comes from the machine learning and information retrieval communities. Rocchio's algorithm [10] is the classical method in information retrieval, used in routing and filtering documents. Researchers have tackled text categorization in many ways. Classifiers based on probabilistic methods have been proposed, starting with the first presented in the literature by Maron in 1961 and continuing with naïve Bayes [11], which proved to perform well. ID3 and C4.5 are well-known packages whose cores use decision trees to build automatic classifiers [12, 13, 14]. K-nearest neighbor (k-NN) is another technique used in text categorization [15]. Another way to construct a text categorization system is by an inductive rule learning method.
This type of classifier is represented by a set of rules in disjunctive normal form that best cover the training set [16, 17, 18]. As reported in [19], the use of bigrams improved text categorization accuracy compared with the use of unigrams. In addition, in the last decades neural networks and support vector machines (SVM) were used in text categorization and proved to be powerful tools [20, 21, 14].

2.2 Variable precision rough set model

Classification is the core foundation of rough set theory. Pawlak's rough set model has the limitation that the classification is either completely correct or completely wrong; that is, the definitions of the lower and upper approximations are crisp, which makes them inapplicable to some complicated classifications. Based on the majority inclusion relation, Ziarko [7] presented a generalized rough set model, named the variable precision rough set, to overcome this limitation. Given $X, Y \subseteq U$, we define the inclusion of $X$ in $Y$, denoted $C(X, Y)$, by:

$$C(X, Y) = \begin{cases} |X \cap Y| / |X|, & |X| > 0 \\ 0, & |X| = 0 \end{cases} \qquad (1)$$

$IS = \langle U, A, V, f \rangle$ is the information system of discourse, where $A$ is the set of attributes, $A = \{a_1, a_2, \ldots, a_k\}$, $V$ is the domain of values of $A$, and $f$ is an information function $f : U \times A \to V$. In text classification, $U$ is the text collection, $A$ is the feature set, and $V$ is the domain of the weight values of the features in $A$. $R$ is an indiscernibility relation defined on $U$, and $U/R = \{X_1, X_2, \ldots, X_N\}$ is the family of $R$-equivalence classes. $X \subseteq U$ is a subset of interest, and we define its $\alpha$-lower approximation and $\alpha$-upper approximation by:

$$\underline{R}_\alpha X = \bigcup \{ X_i \in U/R : C(X_i, X) \ge \alpha \}, \qquad \overline{R}_\alpha X = \bigcup \{ X_i \in U/R : C(X_i, X) > 1 - \alpha \} \qquad (2)$$

Accordingly, the $\alpha$-boundary region of $X$ is defined by:

$$BND_\alpha X = \bigcup \{ X_i \in U/R : 1 - \alpha < C(X_i, X) < \alpha \} \qquad (3)$$

where $\alpha \in [0.5, 1]$. It is easy to show that this model is equivalent to Pawlak's model when $\alpha = 1$.

This generalization smooths the boundary between the lower and upper approximations. In the original rough set model, the classification of the data with respect to the target event uses three regions: the positive region, in which an event would occur with certainty; the negative region, in which an event would not occur with certainty; and a boundary region, in which an event might or might not occur. The variable precision rough set model defines the positive and negative regions as areas where an approximate classification with respect to the target event is possible with an error frequency below some predefined level. In other words, the positive region becomes a region where the event occurred most of the time, and the negative region is the region where the event occurred infrequently.

3. Data Preparation

In our approach, converting the text of a document into our proposed document DB model, which is introduced in Section 4.1, requires some data preprocessing of each document. A sentence boundary detector algorithm was developed to locate sentence boundaries in the documents. The algorithm is based on a finite state machine lexical analyzer with heuristic rules for finding the boundaries. About 97 percent of the actual boundaries are correctly detected. The resulting documents contain very accurate sentence separation, with almost negligible noise.
Finally, to weed out words that contribute little to building the classifier and to reduce the high dimensionality of the data, a document cleaning step removes stop-words that have no significance and stems the words using the popular Porter stemmer algorithm [5]. The subsequent phase consists of discovering SFIs from each document DB.

4. Document Database and SFI Mining

4.1 Document DB Model

A sentence is a grammatical unit that is syntactically independent and has a subject that is expressed or understood. The central meaning of a document is conveyed by organizing the basic ideas of its sentences. Focusing on mining the local context information in the sentences, we propose a document DB model. In the document database model, a word is viewed as an item, a natural sentence is viewed as a transaction, and a document is viewed as a transaction database. The detailed workflow of constructing the document DB is illustrated in Fig. 2. The work presented here takes a step further toward an efficient way of mining local context information.

[Figure 2. Process of document DB construction: Documents, Sentence Segmenter, Stop-Words Remover, Stemmer, Encoder, Document DB.]
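The document DB model can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular-expression sentence splitter, the short stop-word list, and the `naive_stem` suffix stripper are simplified stand-ins (assumptions) for the finite-state boundary detector, the stop-word remover, and the Porter stemmer, and all function names are hypothetical.

```python
import re

# Tiny illustrative stop-word list (an assumption; the paper does not give its list).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to",
              "is", "are", "was", "were", "for", "by", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def naive_stem(word):
    """Crude suffix stripper standing in for the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_document_db(text):
    """Map a document to a transaction DB: one set of stemmed words per sentence."""
    db = []
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z]+", sentence.lower())
        items = frozenset(naive_stem(w) for w in words if w not in STOP_WORDS)
        if items:
            db.append(items)
    return db


def support(db, itemset):
    """Fraction of sentences (transactions) that contain every item of the itemset."""
    itemset = frozenset(itemset)
    if not db:
        return 0.0
    return sum(1 for transaction in db if itemset <= transaction) / len(db)


doc = ("Oil prices rose sharply. The oil market reacted to rising prices. "
       "Traders were calm.")
db = build_document_db(doc)
print(len(db), support(db, {"oil", "pric"}))
```

Here the itemset {oil, pric} is counted once per sentence that contains both stems, so its support reflects sentence-level rather than document-level co-occurrence.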

4.2 SFI Mining

After mapping each document to a transaction DB, we employ the Apriori algorithm to extract frequently occurring sets of terms in the sentences of each document and use them as that document's characteristic. Compared with document-level frequent co-occurring words, sentential frequent words convey more local context information. The algorithm is described in more detail in Figure 3. In the algorithm, step (2) generates the frequent 1-itemsets. In steps (3-13) all the frequent k-itemsets are generated and merged with the category in C. The sentence space is reduced in each iteration by eliminating the transactions that do not contain any of the frequent itemsets. This step is done by the FilterTable(S, F) function. The sentential frequent itemsets discovered in this stage of the process are further processed to build the topic templates.

[Figure 3. Algorithm: find sentential frequent itemsets in the given document DB.]

5. Variable Precision Rough Set Model Based SFI Evaluation Method

By merging the SFIs of documents that belong to the same category, we directly obtain the features of that category. We use these frequent itemsets to construct each category's topic template. Because the number of SFIs associated with a category can be very large, the key problem is how to calculate each SFI's global weight, that is, the SFI's contribution to the classification. We propose a weighting scheme based on the variable precision rough set model to evaluate each SFI's global weight, on the basis of which we can select the SFIs for each topic template.

Let $F = \{X_1, X_2, \ldots, X_{|C|}\}$ be a partition of $D$, namely the classification of the training documents according to a set of predefined categories $C = \{c_1, \ldots, c_{|C|}\}$. $R \subseteq A$ is an SFI, which serves as the condition attribute subset here. According to Pawlak's rough set model, $\underline{R}F$ and $\overline{R}F$ are defined by:

$$\underline{R}F = \{\underline{R}X_1, \underline{R}X_2, \ldots, \underline{R}X_{|C|}\}, \qquad \overline{R}F = \{\overline{R}X_1, \overline{R}X_2, \ldots, \overline{R}X_{|C|}\} \qquad (4)$$

Correspondingly, in the variable precision rough set model $\underline{R}_\alpha F$ and $\overline{R}_\alpha F$ are defined by:

$$\underline{R}_\alpha F = \{\underline{R}_\alpha X_1, \underline{R}_\alpha X_2, \ldots, \underline{R}_\alpha X_{|C|}\}, \qquad \overline{R}_\alpha F = \{\overline{R}_\alpha X_1, \overline{R}_\alpha X_2, \ldots, \overline{R}_\alpha X_{|C|}\} \qquad (5)$$

A measure called the approximate classification quality was introduced to quantify the imprecision of this classification. It is defined by:

$$\gamma_R(F) = \frac{\sum_{i=1}^{|C|} |\underline{R}_\alpha X_i|}{|U|} \qquad (6)$$

The approximate quality denotes the ratio of objects that can be classified into the $F$-equivalence classes with certainty by SFI $R$. In other words, $\gamma_R(F)$ measures the degree of consistency between the classification by $R$ and $F$, which may be interpreted as the contribution that the SFI makes to the classification. If $\gamma_R(F)$ is calculated for all SFIs and the values are sorted in ascending order, we can obtain a concise representation of the data by cutting the features whose classification quality value is lower than a user-defined threshold.

6. An SFI-Based Similarity Measure

As mentioned earlier, sentential frequent itemsets convey local context information, which is essential for accurately ranking a document's appropriateness to categories. To this end, we devised a scheme to calculate the weight of an SFI in the test document and the topic templates, and the cosine measure based on these weights is used to perform the classification.

The SFI weighting scheme is a function of three factors: the length $l$ of the SFI, the frequency $f$ of the SFI in the document or topic template, and the level of significance (global weight) $\gamma$ of the SFI, which is presented in Section 5:

$$w_{ij} = f_{ij} \, l_j \, \gamma_j \qquad (7)$$

where $w_{ij}$ is the weight of SFI $j$ in document or topic template $i$. The frequency of an SFI is an important factor in the measure: the more frequently the SFI appears in the document or the topic template, the more important the information it conveys. Similarly, the longer the SFI is, the more specific the information it conveys.

The similarity of the test document $d_j$ and the topic $c_i$ is calculated with the cosine measure:

$$sim_{SFI}(c_i, d_j) = \frac{\sum_{k=1}^{N} w_{ik} \, w_{jk}}{\sqrt{\sum_{k=1}^{N} w_{ik}^2} \, \sqrt{\sum_{k=1}^{N} w_{jk}^2}} \qquad (8)$$

6.1 Combining Single-Term and SFI Similarities

If the similarity between a document and a topic is based solely on matching frequent itemsets, with no single terms considered, related documents could be judged as non-similar when they do not share enough SFIs (a typical case). Shared SFIs provide important local context matching, but sometimes similarity based on SFIs alone is not sufficient. To alleviate this problem, and to produce high-quality classification, we combine a single-term similarity measure with our itemset-based similarity measure. We used the cosine correlation similarity measure [6], [7], with TF-IDF term weights, as the single-term measure. The cosine measure was chosen because of its wide use in the text classification literature, and because it is described as capturing human categorization behavior well. TF-IDF weighting is also a widely used term weighting scheme.

The combination of the term-based and the SFI-based similarity measures is a weighted average of the two quantities, and is given by (9). The reason for separating single terms and SFIs in the similarity equation, as opposed to treating a single term as a one-word itemset, is to evaluate the blending factor between the two quantities and to see the effect of SFIs on similarity compared with single terms.

$$sim(c_i, d_j) = \alpha \cdot sim_{SFI}(c_i, d_j) + (1 - \alpha) \cdot sim_t(c_i, d_j) \qquad (9)$$

where $\alpha$ is a value in the interval [0, 1] that determines the weight of the SFI similarity measure, or, as we call it, the Similarity Blend Factor. According to the experimental results discussed in Section 7, a value between 0.6 and 0.8 for $\alpha$ results in the maximum improvement in classification quality.

7. Experimental Results

In order to test the effectiveness of the text classification system, we conducted a set of experiments using our proposed document DB model, the variable precision rough set model based SFI pruning method, the SFI weighting scheme, and the similarity measure.

7.1 Text Corpora

Our evaluation experiments were conducted on the well-known Reuters-21578 collection, which is usually split into two parts: a training set for building the classifier and a testing set for evaluating the effectiveness of the system. There are many splits of the Reuters collection; we selected the ModApte version. This split leads to a corpus of 12,902 documents consisting of 9,603 training documents and 3,299 testing documents. All these documents belong to 135 topics. However, only 93 topics have more than one document in the training set and 82 topics have fewer than 100 documents [8]. Obviously, performance on the categories with just a few documents would be very low, especially for those that do not even have a document in the training set. Some of the documents have no topic assigned to them. We chose to ignore such documents since no knowledge can be derived from them. Finally, we selected the ten categories with the largest number of corresponding training documents to test our system. Because other researchers often employ a similar strategy, we can conveniently compare our experimental results with their work. There are 6488 training documents and 2545 testing documents in these ten retained categories.

7.2 Evaluation Measures

In order to assess the performance of our approach, we adopted some quality measures widely used in the text mining literature for the purpose of text classification. The first two measures are precision and recall. The terms used to express precision and recall are given in the contingency table (Table 1). Estimates of precision and recall with respect to $c_i$ may thus be obtained as

$$P_i = \frac{TP_i}{TP_i + FP_i} \qquad (10)$$

$$R_i = \frac{TP_i}{TP_i + FN_i} \qquad (11)$$

For obtaining estimates of $P$ and $R$, two different methods are adopted:

microaveraging: $P$ and $R$ are obtained by summing over all individual decisions:

$$P^{\mu} = \frac{TP}{TP + FP} = \frac{\sum_{i=1}^{|C|} TP_i}{\sum_{i=1}^{|C|} (TP_i + FP_i)} \qquad (12)$$

$$R^{\mu} = \frac{TP}{TP + FN} = \frac{\sum_{i=1}^{|C|} TP_i}{\sum_{i=1}^{|C|} (TP_i + FN_i)} \qquad (13)$$

where $\mu$ indicates microaveraging.
The global contingency table (Table 2) is thus obtained by summing over the category-specific contingency tables;

macroaveraging: precision and recall are first evaluated locally for each category, and then globally by averaging over the results of the different categories:

$$P^{M} = \frac{\sum_{i=1}^{|C|} P_i}{|C|} \qquad (14)$$

$$R^{M} = \frac{\sum_{i=1}^{|C|} R_i}{|C|} \qquad (15)$$

where $M$ indicates macroaveraging.

Another measure taken here is the Break-Even Point (BEP), that is, the value at which precision equals recall.

Table 1. The contingency table for category c_i

                            Classifier judgments
Category c_i                YES        NO
Expert judgments    YES     TP_i       FN_i
                    NO      FP_i       TN_i

Table 2. The global contingency table

                                  Classifier judgments
Category set C = {c_1,...,c_|C|}  YES                NO
Expert judgments    YES           TP = sum_i TP_i    FN = sum_i FN_i
                    NO            FP = sum_i FP_i    TN = sum_i TN_i

7.3 Experimental Results

In order to better understand the effect of the SFI-based similarity measure on classification quality, we carried out a set of experiments on the text corpus mentioned in Section 7.1 and compared the experimental results with the most well-known methods.
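The micro- and macro-averaged estimates of Section 7.2 can be sketched in Python as follows. This is only an illustration: the per-category counts are made-up values, not results from the paper, and the function names are hypothetical.

```python
def precision(tp, fp):
    """P = TP / (TP + FP); defined as 0.0 when the classifier made no YES decision."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """R = TP / (TP + FN); defined as 0.0 when the category has no positive example."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def micro_macro(tables):
    """tables maps a category name to its (TP, FP, FN) counts.

    Returns (micro_P, micro_R, macro_P, macro_R).
    """
    # Microaveraging: sum the counts first (global contingency table), then divide.
    tp = sum(t[0] for t in tables.values())
    fp = sum(t[1] for t in tables.values())
    fn = sum(t[2] for t in tables.values())
    micro_p, micro_r = precision(tp, fp), recall(tp, fn)

    # Macroaveraging: evaluate each category locally, then average the results.
    n = len(tables)
    macro_p = sum(precision(t[0], t[1]) for t in tables.values()) / n
    macro_r = sum(recall(t[0], t[2]) for t in tables.values()) / n
    return micro_p, micro_r, macro_p, macro_r


# Hypothetical counts: one large, easy category and one small, hard category.
tables = {"earn": (90, 10, 10), "grain": (5, 5, 15)}
micro_p, micro_r, macro_p, macro_r = micro_macro(tables)
print(micro_p, micro_r, macro_p, macro_r)
```

With these counts the micro-averaged values stay close to those of the large category, while the macro-averaged values are pulled down by the small one, which is why both are usually reported.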

Table 3 (the results for the other classification systems are reported as given in [9]) shows a comparison between our classifier and the other well-known methods. The measures used here are the precision/recall break-even point, micro-average, and macro-average on the ten most populated Reuters categories. Our system outperforms most of the conventional methods, although its performance is not very good for three categories, i.e., grain, money-fx, and trade.

[Table 3. Precision/recall break-even point on the ten most populated Reuters categories for SFI-BC and the best-known classifiers.]

8. Conclusion and Future Work

Text classification is a key text-mining problem, which is useful to a great number of text-based applications. We presented a system composed of four components in an attempt to improve text classification. The first component cleans the data and maps each document to a document DB. The second component uses the Apriori algorithm to mine the sentential frequent itemsets from the document DB and uses them as the features of the corresponding document. The third component is the topic template generator, for which we propose a variable precision rough set based method to evaluate each SFI's contribution to the classification. The fourth component is the SFI-based similarity measure. By carefully examining the factors affecting the classification, we devised an SFI-based similarity measure that is capable of accurately calculating the similarity between a test document and a topic template. The merit of such a design is that each component can be used independently of the others. Nevertheless, we are confident that the combination of these components leads to better results. The experimental results show that the SFI-based classifier performs well and that its effectiveness is comparable to most well-known text classifiers.

There are a number of future research directions to extend and improve this work. One direction is to improve the accuracy of SFI-BC. Although the current scheme proved more accurate than traditional methods, there is still room for improvement.
Another direction is to improve the feature selection quality. Other feature selection techniques, such as latent semantic analysis, which could give insight into the discriminative features among classes, may complement our strategy. Although the work presented here is aimed at text classification, it could easily be adapted to Web documents as well. However, it would have to take the semi-structure of Web documents into account. Our intention is to develop a Web document classification system with our approach.

References

[1] W. Li and J. Pei. CMAR: Accurate and efficient classification based on multiple class-association rules. In IEEE International Conference on Data Mining (ICDM'01), San Jose, California, November 29-December 2, 2001.
[2] B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD'98), pages 80-86, New York City, NY, August 1998.
[3] M. Antonie and O. R. Zaïane. Text document categorization by term association. In Proc. of IEEE Intl. Conf. on Data Mining, pages 19-26, 2002.
[4] D. Meretakis, D. Fragoudis, H. Lu, and S. Likothanassis. Scalable association-based text classification. In Proc. of ACM CIKM, 2000.
[5] M. F. Porter. An algorithm for suffix stripping. Program, vol. 14, no. 3, pp. 130-137, July 1980.
[6] G. Salton, A. Wong, and C. Yang. A vector space model for automatic indexing. Comm. ACM, vol. 18, no. 11, pp. 613-620, Nov. 1975.
[7] G. Salton. Automatic Text Processing: The Transformation, Analysis and Retrieval of Information by Computer. Reading, Mass.: Addison Wesley, 1989.
[8] O. R. Zaïane and M.-L. Antonie. Classifying text documents by associating terms with text categories. In Thirteenth Australasian Database Conference (ADC'02), Melbourne, Australia, January 2002.
[9] T. Joachims. Text categorization with support vector machines: learning with many relevant features. In 10th European Conference on Machine Learning (ECML-98), pages 137-142, 1998.
[10] D. A. Hull. Improving text retrieval for the routing problem using latent semantic indexing. In 17th ACM International Conference on Machine Learning (ECML-98), pages 137-142, 1998.
[11] D. Lewis. Naïve (Bayes) at forty: The independence assumption in information retrieval. In 10th Conference on Machine Learning (ECML-98), pages 4-15, 1998.
[12] W. Cohen and H. Hirsh. Joins that generalize: text classification using WHIRL. In 4th International Conference on Knowledge Discovery and Data Mining (SIGKDD'98), pages 169-173, New York City, USA, 1998.
[13] W. Cohen and Y. Singer. Context-sensitive learning methods for text categorization. ACM Transactions on Information Systems, 17(2):141-173, 1999.
[14] T. Joachims. Text categorization with support vector machines: learning with many relevant features. In 10th European Conference on Machine Learning (ECML-98), pages 137-142, 1998.
[15] Y. Yang. An evaluation of statistical approaches to text categorization. Technical Report CMU-CS-97-127, Carnegie Mellon University, April 1997.
[16] I. Moulinier and J.-G. Ganascia. Applying an existing machine learning algorithm to text categorization. In S. Wermter, E. Riloff, and G. Scheler, editors, Connectionist, statistical, and symbolic approaches to learning for natural language processing. Springer Verlag, Heidelberg, Germany, 1996. Lecture Notes for Computer Science series, number 1040.
[17] H. Li and K. Yamanishi. Text classification using ESC-based stochastic decision lists. In 8th ACM International Conference on Information and Knowledge Management (CIKM-99), pages 122-130, Kansas City, USA, 1999.
[18] C. Apté, F. Damerau, and S. Weiss. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3):233-251, 1994.
[19] C. M. Tan, Y. F. Wang, and C. D. Lee. The use of bigrams to enhance text categorization. Journal of Information Processing and Management.
[20] M. Ruiz and P. Srinivasan. Neural networks for text categorization. In 22nd ACM SIGIR International Conference on Information Retrieval, Berkeley, CA, USA, August 1999.
[21] Y. Yang and X. Liu. A re-examination of text categorization methods. In 22nd ACM International Conference on Research and Development in Information Retrieval (SIGIR-99), pages 42-49, Berkeley, US, 1999.


More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

Improving Web Image Search using Meta Re-rankers

Improving Web Image Search using Meta Re-rankers VOLUME-1, ISSUE-V (Aug-Sep 2013) IS NOW AVAILABLE AT: www.dcst.com Improvng Web Image Search usng Meta Re-rankers B.Kavtha 1, N. Suata 2 1 Department of Computer Scence and Engneerng, Chtanya Bharath Insttute

More information

Application of k-nn Classifier to Categorizing French Financial News

Application of k-nn Classifier to Categorizing French Financial News Applcaton of k-nn Classfer to Categorzng French Fnancal News Huazhong KOU, Georges GARDARIN 2, Alan D'heygère 2, Karne Zetoun PRSM Laboratory, Unversty of Versalles Sant-Quentn 45 Etats-Uns Road, 78035

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Document Representation and Clustering with WordNet Based Similarity Rough Set Model

Document Representation and Clustering with WordNet Based Similarity Rough Set Model IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 5, No 3, September 20 ISSN (Onlne): 694-084 www.ijcsi.org Document Representaton and Clusterng wth WordNet Based Smlarty Rough Set Model

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

A Novel Term_Class Relevance Measure for Text Categorization

A Novel Term_Class Relevance Measure for Text Categorization A Novel Term_Class Relevance Measure for Text Categorzaton D S Guru, Mahamad Suhl Department of Studes n Computer Scence, Unversty of Mysore, Mysore, Inda Abstract: In ths paper, we ntroduce a new measure

More information

Issues and Empirical Results for Improving Text Classification

Issues and Empirical Results for Improving Text Classification Issues and Emprcal Results for Improvng Text Classfcaton Youngoong Ko 1 and Jungyun Seo 2 1 Dept. of Computer Engneerng, Dong-A Unversty, 840 Hadan 2-dong, Saha-gu, Busan, 604-714, Korea yko@dau.ac.kr

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Hierarchical Semantic Perceptron Grid based Neural Network CAO Huai-hu, YU Zhen-wei, WANG Yin-yan Abstract Key words 1.

Hierarchical Semantic Perceptron Grid based Neural Network CAO Huai-hu, YU Zhen-wei, WANG Yin-yan Abstract Key words 1. Herarchcal Semantc Perceptron Grd based Neural CAO Hua-hu, YU Zhen-we, WANG Yn-yan (Dept. Computer of Chna Unversty of Mnng and Technology Bejng, Bejng 00083, chna) chhu@cumtb.edu.cn Abstract A herarchcal

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

CLASSIFICATION OF ULTRASONIC SIGNALS

CLASSIFICATION OF ULTRASONIC SIGNALS The 8 th Internatonal Conference of the Slovenan Socety for Non-Destructve Testng»Applcaton of Contemporary Non-Destructve Testng n Engneerng«September -3, 5, Portorož, Slovena, pp. 7-33 CLASSIFICATION

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

CUM: An Efficient Framework for Mining Concept Units

CUM: An Efficient Framework for Mining Concept Units CUM: An Effcent Framework for Mnng Concept Unts P.Santh Thlagam Ananthanarayana V.S Department of Informaton Technology Natonal Insttute of Technology Karnataka - Surathkal Inda 575025 santh_soc@yahoo.co.n,

More information

A User Selection Method in Advertising System

A User Selection Method in Advertising System Int. J. Communcatons, etwork and System Scences, 2010, 3, 54-58 do:10.4236/jcns.2010.31007 Publshed Onlne January 2010 (http://www.scrp.org/journal/jcns/). A User Selecton Method n Advertsng System Shy

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Syntactic Tree-based Relation Extraction Using a Generalization of Collins and Duffy Convolution Tree Kernel

Syntactic Tree-based Relation Extraction Using a Generalization of Collins and Duffy Convolution Tree Kernel Syntactc Tree-based Relaton Extracton Usng a Generalzaton of Collns and Duffy Convoluton Tree Kernel Mahdy Khayyaman Seyed Abolghasem Hassan Abolhassan Mrroshandel Sharf Unversty of Technology Sharf Unversty

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Face Recognition Based on SVM and 2DPCA

Face Recognition Based on SVM and 2DPCA Vol. 4, o. 3, September, 2011 Face Recognton Based on SVM and 2DPCA Tha Hoang Le, Len Bu Faculty of Informaton Technology, HCMC Unversty of Scence Faculty of Informaton Scences and Engneerng, Unversty

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies Deep Classfer: Automatcally Categorzng Search Results nto Large-Scale Herarches Dkan Xng 1, Gu-Rong Xue 1, Qang Yang 2, Yong Yu 1 1 Shangha Jao Tong Unversty, Shangha, Chna {xaobao,grxue,yyu}@apex.sjtu.edu.cn

More information

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples 94 JOURNAL OF COMPUTERS, VOL. 4, NO. 1, JANUARY 2009 Relable Negatve Extractng Based on knn for Learnng from Postve and Unlabeled Examples Bangzuo Zhang College of Computer Scence and Technology, Jln Unversty,

More information

Efficient Text Classification by Weighted Proximal SVM *

Efficient Text Classification by Weighted Proximal SVM * Effcent ext Classfcaton by Weghted Proxmal SVM * Dong Zhuang 1, Benyu Zhang, Qang Yang 3, Jun Yan 4, Zheng Chen, Yng Chen 1 1 Computer Scence and Engneerng, Bejng Insttute of echnology, Bejng 100081, Chna

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

Data Preprocessing Based on Partially Supervised Learning Na Liu1,2, a, Guanglai Gao1,b, Guiping Liu2,c

Data Preprocessing Based on Partially Supervised Learning Na Liu1,2, a, Guanglai Gao1,b, Guiping Liu2,c 6th Internatonal Conference on Informaton Engneerng for Mechancs and Materals (ICIMM 2016) Data Preprocessng Based on Partally Supervsed Learnng Na Lu1,2, a, Guangla Gao1,b, Gupng Lu2,c 1 College of Computer

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Impact of a New Attribute Extraction Algorithm on Web Page Classification Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

Automatic Text Categorization of Mathematical Word Problems

Automatic Text Categorization of Mathematical Word Problems Automatc Text Categorzaton of Mathematcal Word Problems Suleyman Cetntas 1, Luo S 2, Yan Png Xn 3, Dake Zhang 3, Joo Young Park 3 1,2 Department of Computer Scence, 2 Department of Statstcs, 3 Department

More information

Using Neural Networks and Support Vector Machines in Data Mining

Using Neural Networks and Support Vector Machines in Data Mining Usng eural etworks and Support Vector Machnes n Data Mnng RICHARD A. WASIOWSKI Computer Scence Department Calforna State Unversty Domnguez Hlls Carson, CA 90747 USA Abstract: - Multvarate data analyss

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

Relevance Feedback Document Retrieval using Non-Relevant Documents

Relevance Feedback Document Retrieval using Non-Relevant Documents Relevance Feedback Document Retreval usng Non-Relevant Documents TAKASHI ONODA, HIROSHI MURATA and SEIJI YAMADA Ths paper reports a new document retreval method usng non-relevant documents. From a large

More information

Decision Strategies for Rating Objects in Knowledge-Shared Research Networks

Decision Strategies for Rating Objects in Knowledge-Shared Research Networks Decson Strateges for Ratng Objects n Knowledge-Shared Research etwors ALEXADRA GRACHAROVA *, HAS-JOACHM ER **, HASSA OUR ELD ** OM SUUROE ***, HARR ARAKSE *** * nsttute of Control and System Research,

More information

A Method of Hot Topic Detection in Blogs Using N-gram Model

A Method of Hot Topic Detection in Blogs Using N-gram Model 84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna

More information

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status Internatonal Journal of Appled Busness and Informaton Systems ISSN: 2597-8993 Vol 1, No 2, September 2017, pp. 6-12 6 Implementaton Naïve Bayes Algorthm for Student Classfcaton Based on Graduaton Status

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

Online Text Mining System based on M2VSM

Online Text Mining System based on M2VSM FR-E2-1 SCIS & ISIS 2008 Onlne Text Mnng System based on M2VSM Yasufum Takama 1, Takash Okada 1, Toru Ishbash 2 1. Tokyo Metropoltan Unversty, 2. Tokyo Metropoltan Insttute of Technology 6-6 Asahgaoka,

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

Cross-Language Information Retrieval

Cross-Language Information Retrieval Feature Artcle: Cross-Language Informaton Retreval 19 Cross-Language Informaton Retreval Jan-Yun Ne 1 Abstract A research group n Unversty of Montreal has worked on the problem of cross-language nformaton

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

Fuzzy Weighted Association Rule Mining with Weighted Support and Confidence Framework

Fuzzy Weighted Association Rule Mining with Weighted Support and Confidence Framework Fuzzy Weghted Assocaton Rule Mnng wth Weghted Support and Confdence Framework M. Sulaman Khan, Maybn Muyeba, Frans Coenen 2 Lverpool Hope Unversty, School of Computng, Lverpool, UK 2 The Unversty of Lverpool,

More information

A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION

A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION 1 FENG YONG, DANG XIAO-WAN, 3 XU HONG-YAN School of Informaton, Laonng Unversty, Shenyang Laonng E-mal: 1 fyxuhy@163.com, dangxaowan@163.com, 3 xuhongyan_lndx@163.com

More information

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database A Mult-step Strategy for Shape Smlarty Search In Kamon Image Database Paul W.H. Kwan, Kazuo Torach 2, Kesuke Kameyama 2, Junbn Gao 3, Nobuyuk Otsu 4 School of Mathematcs, Statstcs and Computer Scence,

More information