Classic Term Weighting Technique for Mining Web Content Outliers

International Conference on Computational Techniques and Artificial Intelligence (ICCTAI'2012) Penang, Malaysia

W.R. Wan Zulkifeli, N. Mustapha, and A. Mustapha

W. R. Wan Zulkifeli is with the Department of Computer Science, Faculty of Computer Science and Information Technology, University Putra Malaysia, Serdang, Selangor, Malaysia (e-mail: wanrusla@gmail.com).
N. Mustapha is with the Department of Computer Science, Faculty of Computer Science and Information Technology, University Putra Malaysia, Serdang, Selangor, Malaysia (e-mail: norwati@fsktm.upm.edu.my).
A. Mustapha is with the Department of Computer Science, Faculty of Computer Science and Information Technology, University Putra Malaysia, Serdang, Selangor, Malaysia (e-mail: aida@fsktm.upm.edu.my).

Abstract: Outlier analysis has become a popular topic in the field of data mining, but there has been less work on how to detect outliers in web content. Mining Web Content Outliers is used to detect irrelevant web content within a web portal. Term Frequency (TF) techniques from Information Retrieval (IR) have been used to detect the relevance of a term in a web document. However, when document length varies, relative frequency is preferred. This study used maximum frequency normalization and applied the Inverse Document Frequency (IDF) weighting technique, a traditional term weighting method in IR, to exploit less frequent terms among documents, which are considered more discriminative than frequent terms. The benchmark outlier pages are taken from The 20 Newsgroups Dataset. TF.IDF is used in the dissimilarity measure, and the result achieves up to 91.10% accuracy, which is about 17.77% higher than the previous technique.

Keywords: information retrieval, outliers, term weighting, web content

I. INTRODUCTION

In the past few years, there has been a rapid expansion of activities in the Web Content Mining area. However, the focus has been only on technical and visual design and on frequent web content patterns, while less frequent web content patterns, called outliers, have been undervalued. Web content outlier mining focuses on detecting irrelevant web pages among the rest of the web pages in the same category [3],[5]. Web content outlier mining is not only helpful for detecting outliers when a web portal is hacked, but may also lead to the discovery of emerging business patterns and trends [12]. Unlike traditional outlier mining algorithms designed solely for numeric data sets, web outlier mining algorithms should be applicable to varying types of data such as text, hypertext, video, audio, images and HTML tags [11]. There are two groups of web content outlier mining strategies: those that directly mine the content outliers of documents to discover information about outliers, and those that reject outliers to improve the search content of other tools such as search engines.

Web content outlier mining is related to data outlier mining and text outlier mining, because many data mining techniques can be applied in Web content mining and most web content is text. However, it differs from both, because Web data are mainly semi-structured and/or unstructured, while data mining deals primarily with structured data and text mining focuses only on unstructured text. Web content outlier mining thus requires creative applications of data outlier mining and/or text outlier mining techniques to build its own unique approaches.

Both n-gram based and word based techniques can be used in the preprocessing stage of mining web content outliers. The widely used n-gram based technique decomposes each word into substrings of length n. N-gram based techniques suit web content outlier mining because the fixed length helps memory utilization, and they support partial matching of strings, which is useful for outlier detection [11],[12],[14]. However, n-gram based systems become slow on very large datasets because of the huge number of n-gram vectors generated while mining web content outliers [14]. The word based technique, in contrast, keeps each word whole.
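To make the size difference concrete, here is a minimal Python sketch contrasting the two preprocessing choices. It assumes trigrams (n = 3), lower-cased whitespace tokenization, and that a word shorter than n is kept whole; these are illustrative choices rather than details of the cited systems, and the function names are hypothetical.

```python
def char_ngrams(word, n=3):
    """Slice a word into overlapping substrings of length n.
    Assumption: a word shorter than n is kept whole rather than padded."""
    if len(word) < n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def word_tokens(text):
    """Word-based alternative: keep every whitespace-separated word whole."""
    return text.lower().split()


words = word_tokens("Outlier mining detects irrelevant pages")
grams = [g for w in words for g in char_ngrams(w)]
print(len(words), len(grams))  # 5 words become 25 trigrams

# Partial matching: a variant spelling still shares most of its n-grams.
shared = set(char_ngrams("outlier")) & set(char_ngrams("outliers"))
print(sorted(shared))  # ['ier', 'lie', 'out', 'tli', 'utl']
```

Five words expand into 25 trigrams, which illustrates why n-gram vectors grow quickly, while "outlier" and "outliers" still share five trigrams, which is the partial matching the n-gram approach relies on.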
Although words vary in length, the efficiency of word based web content outlier mining can be increased by indexing the words in a two-dimensional format (i, j) and indexing the domain dictionary by word length [4], [6]. The organized domain dictionary reduces the memory space, search time and run time needed to check the relevance of web documents [4]. N-gram based systems take longer to complete a task than word based systems even when the data is not very large. Given the exponential growth of data on the Internet, this makes word based techniques necessary for faster web content outlier mining.

Term weighting techniques such as TF.IDF [7] have been used intensively for various text retrieval tasks. A wealth of approaches to modeling the term vector space has been proposed [1],[2],[8],[10], but interest in applying those techniques to Mining Web Content Outliers has so far been limited. In this paper, we use the classic vector space technique TF.IDF to examine its suitability for Mining Web Content Outliers.

II. RELATED WORKS

Weighting techniques have been used in Mining Web Content Outliers, but the concept differs from term weighting techniques in Information Retrieval. The weight assigned to text in web content depends on which HTML tags enclose the text. META and TITLE tags are given a larger weight than BODY tags because they give a better representation of web content. Relative Document Weight (RDW) uses this concept. It can compare documents of varying sizes in the same category, but the issue is that most web pages do not have a META tag description [11].

This technique was later modified into an n-gram weighting technique that uses n-grams with a domain dictionary [12] and without a domain dictionary [13] to determine the similarity of strings and to expand the result to pages containing similar strings. N-grams are used because they support partial matching of strings with errors. The HyCOQ algorithm was developed to enhance the n-gram weighting technique without a domain dictionary by combining the strengths of n-gram based and word based systems. The individual document dissimilarities were derived using k-dissimilarity, neighborhood dissimilarity and nearest dissimilarity density, adapted from the local outlier concept [17].

Word based systems apply different techniques than n-gram based systems. Besides applying full word matching, the domain dictionary was indexed by word length to enhance term searching quality [4].

There are three types of outlier detection in web content. The first type detects outliers in web content and removes them immediately from the original web content to give the user the required content. That system used a clustering technique and mathematical set operations such as subset, union and intersection for detecting outliers [3]. The second type detects outliers in web pages and returns the pages suspected to be outliers to the user [11], [12], [13], [17]. This application captures web content outliers to gain interesting values that can lead to new emerging business patterns and trends.
In addition, the third type detects outliers in web pages, removes the outlier pages and improves the search result page by removing redundant web pages [5], [6]. Every type of application is important. This study focuses on the second type of outlier detection, where much remains to be improved, especially the quality of the returned outliers. A word based system used TF [9] but implemented it neither as a weighting technique nor as TF.IDF [6]. The existing method used TF.IDF in its application, but implemented it with an n-gram based technique. Because n-gram based systems run slowly, this paper switches to a word based technique while still applying TF.IDF, to assess how efficiently a word based technique detects web content outliers with TF.IDF [7].

III. ARCHITECTURE DESIGN

The proposed algorithm uses the advantages of full word matching and an organized domain dictionary indexed by word length [4]. The paper assumes the existence of a dictionary for the intended category. A full word frequency profile is generated for each web page. The web pages are weighted by their word frequencies, and a penalty is applied to a word that is present in the document but not in the domain dictionary, because such a word contributes more to the dissimilarity of the document, while words found in the dictionary increase the similarity between the document and the dictionary [12].

The weight of a term corresponds to its frequency of occurrence, and two types of frequency are distinguished. The term frequency is the number of occurrences of a term in the information concerned, while the absolute frequency is the frequency of the stemmed word in the whole collection [16]. Terms with a weak frequency are not representative of the document content, while the most significant terms are those whose frequency is intermediate. When document length varies, relative frequency is preferred to raw frequency values.
Maximum Frequency Normalization is used with the Inverse Document Frequency (IDF) weighting technique because terms that are less frequent across documents may be more discriminative. The relative weight of a document determines its dissimilarity weight compared with the other documents in the category, and the outliers are the documents whose dissimilarity weights rank highest in the category. Fig. 1 shows the architecture design of the proposed system.

Fig. 1 Architecture design of the proposed system (components: Organized Domain Dictionary, Extracted Web Pages, Preprocessing, Full Word Profile Generation, Compute Dissimilarity Measure, Determine Outliers)

A. Document Extraction

In the first phase, the web pages in the same category of interest are retrieved and extracted, which can be achieved using a web search engine or web crawlers [18]. The web pages are analyzed to eliminate text that is not enclosed in TITLE, META or BODY tags. However, this paper used an already extracted dataset taken from the WEBKB data repository [14], [20].
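The tag filtering in this step can be sketched with Python's standard html.parser module. This is a minimal sketch under stated assumptions: it keeps text inside TITLE and BODY, takes META text only from the content attribute of description and keywords meta tags, and skips script and style content. The class and function names are hypothetical, and the paper itself worked from a pre-extracted dataset.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Keep text enclosed in TITLE or BODY tags plus META description/keywords;
    everything else (including script and style content) is discarded."""

    KEPT_META = {"description", "keywords"}  # assumption: which META tags count

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open (non-META) tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in self.KEPT_META and attributes.get("content"):
                self.chunks.append(attributes["content"].strip())
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed children.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        inside_kept = "title" in self.open_tags or "body" in self.open_tags
        inside_code = "script" in self.open_tags or "style" in self.open_tags
        if inside_kept and not inside_code and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = TagTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ('<html><head><title>CS 101</title>'
        '<meta name="keywords" content="course, syllabus">'
        '<script>var x = 1;</script></head>'
        '<body><h1>Lecture notes</h1><script>track()</script>'
        '<p>Exam dates</p></body></html>')
print(extract_text(page))  # CS 101 course, syllabus Lecture notes Exam dates
```

The stack-based tag tracking is a design choice for tolerance: real pages often leave tags unclosed, so closing a tag pops everything opened after it.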

B. Preprocessing

In the preprocessing phase, all data other than text embedded in the HTML tags, such as hyperlinks, images, sound, numeric characters, symbols, null values (whitespace and other predefined characters at both ends of a string) and stop words, were removed. Stop words, defined as words whose frequency exceeds a user-specified frequency, were removed from the web content using a public list of stop words [21]. The web content was also stemmed with the Porter Stemming Algorithm [22] to reduce words to their root form.

C. Generate Full Word Profile

The filtered dataset is then used to generate the full word profile. At this point, the domain dictionary has been indexed by word length [4]. Using an organized domain dictionary is important because every word in the web pages is checked against the dictionary entries of the same length. If a word exists in both, it is flagged as 1; otherwise 0 is returned. The word frequency is then counted. The full word profile is generated by indexing all words in a two-dimensional format (i, j) [4], and every word is attached to its frequency, its length and a binary flag indicating whether it exists in the domain dictionary.

D. Compute Dissimilarity Measure

For the weighting computation, a classic term weighting technique from Information Retrieval (IR), TF.IDF [7], was adopted to evaluate the representativeness of terms in the web content. The dissimilarity measure is computed to determine the difference among pages within the same category [11]. Maximum Frequency Normalization is applied to Term Frequency (TF) weighting because, when document length varies, relative frequency is preferred [16]. Since term frequency alone may not have the discriminating power to separate relevant documents from irrelevant ones, an IDF (Inverse Document Frequency) factor, which takes the collection distribution into account, has been proposed to improve IR performance [15].
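The profile generation of Section C and the weighting ingredients just described can be combined into a small end-to-end sketch that also averages and ranks the per-word weights in the way the algorithm in Section IV does. It is a minimal Python illustration under stated assumptions, not the authors' implementation: tokens are assumed already preprocessed (Section B), the normalized term frequency is taken as the classic augmented form 0.5 + 0.5 · f/MaxFreq, IDF is taken as log(N/k), the per-document score is the mean weight over the document's dictionary words, and a page with no dictionary words is scored as maximally dissimilar (infinity). All function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def index_by_length(dictionary_words):
    """Organized domain dictionary: bucket words by length, so a lookup only
    compares a word with dictionary words of the same length."""
    buckets = {}
    for word in dictionary_words:
        buckets.setdefault(len(word), []).append(word)
    return buckets


def full_word_profile(tokens, buckets):
    """Map each distinct word to (frequency, length, flag), where flag is 1
    if the word is in the length-indexed dictionary and 0 otherwise."""
    profile = {}
    for word, freq in Counter(tokens).items():
        flag = 1 if word in buckets.get(len(word), []) else 0
        profile[word] = (freq, len(word), flag)
    return profile


def dissimilarity(profiles):
    """Score each document; a higher score means more outlying.
    Assumed weight: (0.5 + 0.5 * f / MaxFreq) * log(N / k)."""
    n_docs = len(profiles)
    # k for each word: the number of documents in which it appears.
    doc_freq = Counter(word for profile in profiles for word in profile)
    scores = []
    for profile in profiles:
        # MaxFreq(d_i): highest frequency of any word in the document.
        max_freq = max((freq for freq, _, _ in profile.values()), default=1)
        weights = [
            (0.5 + 0.5 * freq / max_freq) * math.log(n_docs / doc_freq[word])
            for word, (freq, _, flag) in profile.items()
            if flag == 1  # only words found in the domain dictionary count
        ]
        # Mean over dictionary words; assumption: none at all -> infinity.
        scores.append(sum(weights) / len(weights) if weights else math.inf)
    return scores


def top_n_outliers(scores, n):
    """Indices of the n most dissimilar documents."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


dictionary = index_by_length(["course", "lecture", "exam", "homework", "syllabus"])
docs = [
    ["course", "lecture", "exam"],
    ["course", "lecture", "exam", "exam"],
    ["course", "lecture", "homework"],
    ["patient", "doctor", "patient", "syllabus"],  # off-topic page
]
scores = dissimilarity([full_word_profile(d, dictionary) for d in docs])
print(top_n_outliers(scores, 1))  # [3]
```

The off-topic page ranks first because its only dictionary word, "syllabus", appears in a single document and therefore carries the largest IDF, while the shared course vocabulary of the other pages receives small IDF values.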
$$\delta_{i,j} = e_j\left(0.5 + 0.5\,\frac{f(t_j, d_i)}{\mathrm{MaxFreq}(d_i)}\right)\log\frac{N}{k} \tag{1}$$

where $e_j$ indicates whether the word exists in the domain dictionary (1 if it does, 0 otherwise), $f(t_j, d_i)$ denotes the frequency of term $t_j$ in document $d_i$, $\mathrm{MaxFreq}(d_i)$ is the maximum frequency of any word in document $d_i$, $N$ is the total number of documents, and $k$ is the number of documents in which term $t_j$ appears.

Because $e_j$ is binary, the dissimilarity measure (1) only counts the words that exist in the dictionary; words that do not exist in the domain dictionary contribute nothing. The reason is that a word in the dictionary is more relevant to the domain category and represents the power of the document. Outliers have the lowest frequency of dictionary words, and only a few of their words exist in the domain dictionary. Therefore, the dissimilarity measure returns a higher dissimilarity value for them than for other web pages. The same result is given by the dissimilarity function below:

$$\delta_{i,j} = \left(0.5 + 0.5\,\frac{f(t_j, e_i)}{\mathrm{MaxFreq}(d_i)}\right)\log\frac{N}{k}, \qquad t_j \in e_i \tag{2}$$

where $e_i$ denotes the words in document $d_i$ that exist in the domain dictionary; the other symbols have the same meaning as in (1). Equation (2) is the dissimilarity measure simplified from (1): it computes only the words that exist in both the document and the domain dictionary.

E. Determine Outliers

The output of the dissimilarity measure is ranked to determine the outliers. The top n results (n equals the number of benchmark pages) are declared outliers.

IV. ALGORITHM

Input: Domain dictionary and web documents d_i
Output: Outlying documents
1. Read the content of the documents and the domain dictionary.
2. Extract the documents and preprocess them.
3. Generate the full word profile.
4. Generate the organized domain dictionary.
5. for (int i = 0; i < NoOfDoc; i++) { δ_i = 0;
6.     for (int j = 0; j < NoOfWords; j++) {
7.         if (t_j exists in the domain dictionary) {
8.             δ_i = δ_i + (0.5 + 0.5 × f(t_j, e_i) / MaxFreq(d_i)) × log(N / k);
           }
9.     } // end of inner loop
10.    δ_i = δ_i / (number of words in the document that exist in the domain dictionary);
   }
11. Rank the results δ_i.
12. The top n results are declared outliers.

V. EXPERIMENTAL RESULTS

This technique was tested with two datasets. The first dataset consists of 35 web pages from the Course folder of Cornell University, provided by the World Wide Knowledge Base (WebKB). There is no benchmark data for testing web content outliers, so embedded motifs are the only way to know whether the returned outliers are real outliers. Therefore, the experiment used 10 benchmark web pages from the Science Medical folder provided by The 20 Newsgroups Dataset. Although outliers usually constitute less than 10% of an entire dataset [19], 10 web pages were embedded as motifs in the first experiment to see how well the system detects outliers when there are more outliers in the dataset.

Fig. 2 Performance of outlier detection from the first dataset (accuracy and F1-measure, 0-100%, for the N-GRAMS, TF and TF.IDF techniques)

Fig. 2 shows the performance of outlier detection on the first dataset. The results are counted by how many web content outliers (that is, pages from the benchmark dataset) the system returns. The results are ranked, and the top 10 web pages are categorized as web content outliers. Performance is measured by two parameters: accuracy and F1-measure. The experimental results show that the system using the TF.IDF technique achieves up to 91.10% accuracy, which is about 17.77% higher than the TF technique and 13.10% higher than the N-Gram based technique. It also achieves up to 80% F1-measure, a 40% improvement over the TF technique and a 30% improvement over the N-gram based technique. Moreover, the recommended technique executes faster than the N-gram based system and is suitable for large datasets.

The second dataset consists of 200 web pages from the Course folders of the University of Texas, Washington and Wisconsin, provided by the World Wide Knowledge Base. Twenty benchmark web pages (10% of the entire dataset) were again taken from the Science Medical folder of The 20 Newsgroups Dataset. Fig. 3 shows the performance of outlier detection on the second dataset. The top 20 results returned by the system were considered outliers.

Fig. 3 Performance of outlier detection from the second dataset (accuracy and F1-measure, 0-100%, for the N-GRAMS, TF and TF.IDF techniques)

The second experiment shows that the TF.IDF technique achieves up to 93.63% accuracy, which is about 7.27% higher than the TF technique and 1.54% higher than the N-Gram based technique.
It also achieves up to 65% F1-measure, a 40% improvement over the TF technique and a 10% improvement over the N-gram based technique. The N-gram based system performs well, but it is not very efficient because it takes a very long time to process large datasets, owing to the huge number of n-gram vectors generated during mining [14].

VI. CONCLUSION AND FUTURE WORK

Mining Web Content Outliers is related to mining text outliers and to Information Retrieval, so many techniques from both fields can be adopted for it. Some effort is still needed to improve the quality of outlier detection in web content. This paper used TF.IDF [7], a traditional weighting technique from Information Retrieval that is commonly used in text mining. The experiments show that the TF.IDF technique is not only applicable to detecting web content outliers but also returns better results than the previous works. This encourages efforts to use other weighting techniques from those disciplines for mining web content outliers in the future. The technique could also be enhanced with additional computation to remove redundant web pages, if any exist.

REFERENCES

[1] A. Khan, B. Baharudin and K. Khan, Efficient feature selection and domain relevance term weighting method for Document Classification, Second International Conference on Computer Engineering and Applications, IEEE.
[2] C. Deisy, M. Gowri, S. Baskar, S.M.A. Kalaiarasi, and N. Ramraj, A novel term weighting scheme MIDF for Text Categorization, Journal of Engineering Science and Technology, Vol. 5, No. 1.
[3] G. Poonkuzhali, K. Thiagarajan, and K. Sarukesi, Set theoretical approach for mining web content through outliers detection, International Journal on Research and Industrial Applications, Vol. 2, Jan.
[4] G. Poonkuzhali, K. Thiagarajan, K. Sarukesi, and G.V. Uma, Signed approach for mining web content outliers, Proceedings of World Academy of Science, Engineering and Technology, Vol. 56.
[5] G. Poonkuzhali, K. Thiagarajan, and K. Sarukesi, Elimination of Redundant Links in Web Pages - Mathematical Approach, World Academy of Science, Engineering and Technology 52.
[6] G. Poonkuzhali, K. Sarukesi, and G.V. Uma, Web content outlier mining through mathematical approach and trust rating, 10th WSEAS International Conference on Applied Computer and Applied Computational Science (ACACOS '11).
[7] G. Salton, Automatic Text Processing: The Transformation, Analysis and Retrieval of Information by Computer, Addison-Wesley.
[8] G. Tsatsaronis and V. Panagiotopoulou, A generalized vector space model for Text Retrieval Based on Semantic Relatedness, Proceedings of the EACL, Association for Computational Linguistics, Athens, Greece, April.
[9] H.P. Luhn, A statistical approach to mechanized encoding and searching of literary information, IBM Journal of Research and Development (4).
[10] L-S. Chen and C-W. Chang, A new term weighting method by introducing class information for sentiment classification of Textual Data, Proceedings of the International MultiConference of Engineers and Computer Scientists, Vol. I, IMECS, Hong Kong, March.
[11] M. Agyemang, K. Barker, and R.S. Alhajj, Framework for Mining Web Content Outliers, ACM Symposium on Applied Computing, 2004.
[12] M. Agyemang, K. Barker, and R.S. Alhajj, Mining web content outliers using structure oriented weighting techniques and n-grams, Proceedings of ACM SAC, New Mexico.
[13] M. Agyemang, K. Barker, and R.S. Alhajj, WCOND-Mine: Algorithm for Detecting Web Content Outliers from Web Documents, Proceedings of the 10th IEEE Symposium on Computers and Communications (ISCC).
[14] M. Agyemang, K. Barker, and R.S. Alhajj, Hybrid approach to web content outlier mining without query vector, Springer Berlin, Vol. 3589.
[15] M. Lan, C. L. Tan, and J. Su, Supervised and traditional term weighting methods for Automatic Text Categorization, Journal of IEEE PAMI, Vol. 10, July.
[16] M. Mohammadian, Intelligent Agents for Data Mining and Information Retrieval, University of Canberra, Australia, Idea Group Publishing, Hershey, London, Melbourne, Singapore, 2004.
[17] M.M. Breunig, H-P. Kriegel, R.T. Ng, and J. Sander, LOF: Identifying Outliers in Large Dataset, Proc. of ACM SIGMOD, Dallas, TX.
[18] S. Chakrabarti, M. Berg, and B. Dom, Focused crawling: A new approach to topic-specific Web Resource Discovery, Computer Networks, Amsterdam, Netherlands.
[19] V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley.
[20] July
[21] July
[22] July


More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

Revealing Paths of Relevant Information in Web Graphs

Revealing Paths of Relevant Information in Web Graphs Revealng Paths of Relevant Informaton n Web Graphs Georgos Kouzas 1, Vassleos Kolas 2, Ioanns Anagnostopoulos 1 and Eleftheros Kayafas 2 1 Unversty of the Aegean Department of Fnancal and Management Engneerng

More information

Study of Data Stream Clustering Based on Bio-inspired Model

Study of Data Stream Clustering Based on Bio-inspired Model , pp.412-418 http://dx.do.org/10.14257/astl.2014.53.86 Study of Data Stream lusterng Based on Bo-nspred Model Yngme L, Mn L, Jngbo Shao, Gaoyang Wang ollege of omputer Scence and Informaton Engneerng,

More information

Application of k-nn Classifier to Categorizing French Financial News

Application of k-nn Classifier to Categorizing French Financial News Applcaton of k-nn Classfer to Categorzng French Fnancal News Huazhong KOU, Georges GARDARIN 2, Alan D'heygère 2, Karne Zetoun PRSM Laboratory, Unversty of Versalles Sant-Quentn 45 Etats-Uns Road, 78035

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Remote Sensing Image Retrieval Algorithm based on MapReduce and Characteristic Information

Remote Sensing Image Retrieval Algorithm based on MapReduce and Characteristic Information Remote Sensng Image Retreval Algorthm based on MapReduce and Characterstc Informaton Zhang Meng 1, 1 Computer School, Wuhan Unversty Hube, Wuhan430097 Informaton Center, Wuhan Unversty Hube, Wuhan430097

More information

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College

More information

CUM: An Efficient Framework for Mining Concept Units

CUM: An Efficient Framework for Mining Concept Units CUM: An Effcent Framework for Mnng Concept Unts P.Santh Thlagam Ananthanarayana V.S Department of Informaton Technology Natonal Insttute of Technology Karnataka - Surathkal Inda 575025 santh_soc@yahoo.co.n,

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Information Retrieval

Information Retrieval Anmol Bhasn abhasn[at]cedar.buffalo.edu Moht Devnan mdevnan[at]cse.buffalo.edu Sprng 2005 #$ "% &'" (! Informaton Retreval )" " * + %, ##$ + *--. / "#,0, #'",,,#$ ", # " /,,#,0 1"%,2 '",, Documents are

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

Mining Image Features in an Automatic Two- Dimensional Shape Recognition System

Mining Image Features in an Automatic Two- Dimensional Shape Recognition System Internatonal Journal of Appled Mathematcs and Computer Scences Volume 2 Number 1 Mnng Image Features n an Automatc Two- Dmensonal Shape Recognton System R. A. Salam, M.A. Rodrgues Abstract The number of

More information

LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval

LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval LRD: Latent Relaton Dscovery for Vector Space Expanson and Informaton Retreval Techncal Report KMI-06-09 March, 006 Alexandre Gonçalves, Janhan Zhu, Dawe Song, Vctora Uren, Roberto Pacheco In Proc. of

More information

A Feature-Weighted Instance-Based Learner for Deep Web Search Interface Identification

A Feature-Weighted Instance-Based Learner for Deep Web Search Interface Identification Research Journal of Appled Scences, Engneerng and Technology 5(4): 1278-1283, 2013 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scentfc Organzaton, 2013 Submtted: June 28, 2012 Accepted: August 08, 2012

More information

SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB

SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB V. Hotař, A. Hotař Techncal Unversty of Lberec, Department of Glass Producng Machnes and Robotcs, Department of Materal

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval Fuzzy -Means Intalzed by Fxed Threshold lusterng for Improvng Image Retreval NAWARA HANSIRI, SIRIPORN SUPRATID,HOM KIMPAN 3 Faculty of Informaton Technology Rangst Unversty Muang-Ake, Paholyotn Road, Patumtan,

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Gender Classification using Interlaced Derivative Patterns

Gender Classification using Interlaced Derivative Patterns Gender Classfcaton usng Interlaced Dervatve Patterns Author Shobernejad, Ameneh, Gao, Yongsheng Publshed 2 Conference Ttle Proceedngs of the 2th Internatonal Conference on Pattern Recognton (ICPR 2) DOI

More information

A fast algorithm for color image segmentation

A fast algorithm for color image segmentation Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

A Clustering Algorithm for Key Frame Extraction Based on Density Peak

A Clustering Algorithm for Key Frame Extraction Based on Density Peak Journal of Computer and Communcatons, 2018, 6, 118-128 http://www.scrp.org/ournal/cc ISSN Onlne: 2327-5227 ISSN Prnt: 2327-5219 A Clusterng Algorthm for Key Frame Extracton Based on Densty Peak Hong Zhao

More information

Using an Automatic Weighted Keywords Dictionary for Intelligent Web Content Filtering

Using an Automatic Weighted Keywords Dictionary for Intelligent Web Content Filtering Journal of Advances n Computer Research Quarterly pissn: 2345-606x eissn: 2345-6078 Sar Branch, Islamc Azad Unversty, Sar, I.R.Iran (Vol. 6, No. 1, February 2015), Pages: 101-114 www.jacr.ausar.ac.r Usng

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Feature Selection for Natural Language Call Routing Based on Self-Adaptive Genetic Algorithm

Feature Selection for Natural Language Call Routing Based on Self-Adaptive Genetic Algorithm IOP Conference Seres: Materals Scence and Engneerng PAPER OPEN ACCESS Feature Selecton for Natural Language Call Routng Based on Self-Adaptve Genetc Algorthm To cte ths artcle: A Koromyslova et al 017

More information

HCMX: AN EFFICIENT HYBRID CLUSTERING APPROACH FOR MULTI-VERSION XML DOCUMENTS

HCMX: AN EFFICIENT HYBRID CLUSTERING APPROACH FOR MULTI-VERSION XML DOCUMENTS HCMX: AN EFFICIENT HYBRID CLUSTERING APPROACH FOR MULTI-VERSION XML DOCUMENTS VIJAY SONAWANE 1, D.RAJESWARA.RAO 2 1 Research Scholar, Department of CSE, K.L.Unversty, Green Felds, Guntur, Andhra Pradesh

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Multiblock method for database generation in finite element programs

Multiblock method for database generation in finite element programs Proc. of the 9th WSEAS Int. Conf. on Mathematcal Methods and Computatonal Technques n Electrcal Engneerng, Arcachon, October 13-15, 2007 53 Multblock method for database generaton n fnte element programs

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

Intrinsic Plagiarism Detection Using Character n-gram Profiles

Intrinsic Plagiarism Detection Using Character n-gram Profiles Intrnsc Plagarsm Detecton Usng Character n-gram Profles Efstathos Stamatatos Unversty of the Aegean 83200 - Karlovass, Samos, Greece stamatatos@aegean.gr Abstract: The task of ntrnsc plagarsm detecton

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Local Quaternary Patterns and Feature Local Quaternary Patterns

Local Quaternary Patterns and Feature Local Quaternary Patterns Local Quaternary Patterns and Feature Local Quaternary Patterns Jayu Gu and Chengjun Lu The Department of Computer Scence, New Jersey Insttute of Technology, Newark, NJ 0102, USA Abstract - Ths paper presents

More information

HIGH-LEVEL SEMANTICS OF IMAGES IN WEB DOCUMENTS USING WEIGHTED TAGS AND STRENGTH MATRIX

HIGH-LEVEL SEMANTICS OF IMAGES IN WEB DOCUMENTS USING WEIGHTED TAGS AND STRENGTH MATRIX HIGH-LEVEL SEMANTICS OF IMAGES IN WEB DOCUMENTS USING WEIGHTED TAGS AND STRENGTH MATRIX P.Shanmugavadvu 1, P.Sumathy 2, A.Vadvel 3 12 Department of Computer Scence and Applcatons, Gandhgram Rural Insttute,

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Querying by sketch geographical databases. Yu Han 1, a *

Querying by sketch geographical databases. Yu Han 1, a * 4th Internatonal Conference on Sensors, Measurement and Intellgent Materals (ICSMIM 2015) Queryng by sketch geographcal databases Yu Han 1, a * 1 Department of Basc Courses, Shenyang Insttute of Artllery,

More information

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.15 No.10, October 2015 1 Evaluaton of an Enhanced Scheme for Hgh-level Nested Network Moblty Mohammed Babker Al Mohammed, Asha Hassan.

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

Visual Thesaurus for Color Image Retrieval using Self-Organizing Maps

Visual Thesaurus for Color Image Retrieval using Self-Organizing Maps Vsual Thesaurus for Color Image Retreval usng Self-Organzng Maps Chrstopher C. Yang and Mlo K. Yp Department of System Engneerng and Engneerng Management The Chnese Unversty of Hong Kong, Hong Kong ABSTRACT

More information