Using an Automatic Weighted Keywords Dictionary for Intelligent Web Content Filtering


Journal of Advances in Computer Research, Quarterly
Sari Branch, Islamic Azad University, Sari, I.R.Iran
(Vol. 6, No. 1, February 2015)

Najibeh Farzi Veijouyeh 1, Jamshid Bagherzadeh 2
1) Islamic Azad University of Shabestar Branch, Shabestar, Iran
2) Assistant professor, Computer Science and Eng. Dept., Urmia University, Urmia, Iran
Najibeh.Farzi@yahoo.com; J.Bagherzadeh@urmia.ac.ir
Received: 2013/07/12; Accepted: 2014/05/26

Abstract

Filtering web pages with inappropriate content is one of the major issues in the field of intelligent network security. An intelligent filtering method with high accuracy and speed is needed by any country that wants to control users' access to the web, so the problem has been considered by many researchers. Representing web pages in a way that machines can understand is one of the most important preprocessing steps. A way to describe web pages with fewer dimensions would therefore be very effective, especially in determining whether a page should be filtered out or not. In this paper, we propose an automatic method to detect forbidden keywords from web pages. Next, we define a new vector representation of web pages, named RWSF, which consists of the weighted sum and frequency of forbidden keywords in different parts of a web page. For this, a ranking dictionary of keywords including forbidden keywords is used. To evaluate the proposed method, 2643 pages consisting of 1311 normal pages and 1332 forbidden pages were used. Among these, 1851 pages were used to train the system and 792 pages were used for system evaluation. The system has been assessed using various classifiers: k-Nearest Neighbor, Support Vector Machines, Decision Tree and Artificial Neural Networks. Evaluation results indicate the high efficiency and accuracy of the proposed method with all classifiers.

Keywords: Content based filtering, Forbidden keywords extraction, Ranking keywords, Web page representation

1. Introduction

The number of web pages has expanded greatly because of the fast growth of the World Wide Web. The indexed Web contained at least 8.33 billion pages as of July 8, [...]. Web page filtering has various purposes. For instance, protection against improper content is one of the major ones. The Web provides an advantageous space for users to gain all kinds of information, but this space has also been filled with harmful web pages featuring pornography, violence, racism, and so on. In 2001, the Online Computer Library Center's annual review found 74,000 adult websites accounting for 2% of sites on the net; together they brought in profits of more than $1 billion, although many were small scale, with half making $20,000 a year. Consequently, web filtering can be used to block access to pages that are against the established policy.

Another purpose of filtering is to avoid misuse of the network. A survey by the International Data Corporation (IDC) found that people spend one third of their on-line time on tasks other than their job-related tasks. The Internet clearly accelerates communication and makes research activities more effective, but it also brings problems. Employees often use the internet for personal activities such as on-line shopping, chatting with friends or downloading material during work hours, which decreases their productivity and their responsibility towards the company they work for. Hence, in recent years, many researchers have taken a noticeable interest in studying and offering solutions to manage and filter improper information on the web.

Existing filtering methods can be roughly divided into four major categories [1, 2, 3, 4]:

Blacklist and white-list: The blacklist contains banned web sites, which cannot be accessed, and the white-list contains the pages that are allowed. A newly requested web page is available or forbidden depending on whether its URL matches the blacklist or the white-list. The obstacle here is that keeping the URL lists complete and up to date is a very tough task.

PICS: PICS (Platform for Internet Content Selection) can provide rankings for web sites. There are usually two ways to rank web pages: self-ranking and other-ranking. The difference between the two is whether or not the ranking results are given by the web publishers themselves. Filtering systems can operate by means of the ranking information of web sites. PICS is not an obligatory labeling system, so the ranking information is not always reliable.

Keywords filtering: This method is an easy approach that blocks access to web sites according to the occurrence of forbidden words. A list of forbidden words or phrases is required, and a web page is blocked when the number of forbidden words in it exceeds a predefined limit. The problem with filtering systems based on keyword analysis is that they rely heavily on the keyword lists, which take great effort to build. Besides, finding enough specific keywords in some fields is hard, and the meaning of a word depends on its context. For example, if content is filtered by matching a word like "sex", web sites about gender may be blocked by mistake. For this reason, this method unavoidably causes over-blocking. In addition, it can easily be defeated by misspelled words.

Intelligent approach to web content filtering: A web filtering system can use an intelligent approach to analyze the content. For instance, trained models or data mining techniques are efficient ways to classify web content automatically. Content analysis is a widely used method for web page filtering because illegal web sites are known to include particular text, images and other information that can help to filter them. Supervised learning methods are used broadly in web page filtering systems. The problem with supervised learning methods is that a large set of high-quality labeled samples is needed, and such samples are hard to obtain. Semi-supervised training methods are efficient when the available labeled sample set is not large.

In this paper, we propose a new method for web page representation that uses a weighted forbidden keywords dictionary to represent web pages. We have compared it with the TFIDF method in accuracy, training time and memory usage. We also evaluated the effect of the weighted forbidden keywords dictionary on the accuracy of the proposed method using different classifiers.

The remainder of the paper is organized as follows. Section 2 reviews related work on web filtering. The architecture of our filtering system is described in section 3. Web page classification in our system is explained in section 4. Section 5 describes the proposed method for document representation and the weighted keywords dictionary. Experimental results are given and discussed in section 6, prior to the conclusion in section 8.

2. Related Work

Machine learning methods such as k-Nearest Neighbors (kNN), Decision Trees, Support Vector Machines (SVM) and Neural Networks (NN) are broadly used in web page filtering problems [5, 6, 7, 8, 9, 17].

Du et al. [1] proposed a web filtering system that uses a text classification approach to classify web pages into desirable and undesirable ones. Similarities between the input web page and all training web page samples are averaged and compared with a threshold to determine the label of the input page. The system was trained with a training dataset of 487 adult URLs, without any non-adult URLs, and a database that included 329 adult URLs and 587 non-adult URLs was used to test the system performance. Their method achieved a high accuracy on the data set containing adult texts from the adult category of Yahoo. Because the styles of pornographic texts and stories are not the same, this approach cannot work well in the real world [10].

In [10], Wu et al. introduced a system like a Cellular Neural Network (a CNN-like word net) to extract and reflect semantic and statistic aspects of texts. They analyzed three types of keywords: obvious keywords, hidden keywords and logical keywords. SVM was applied as a classifier. To evaluate the performance of their system, they used a dataset containing 3162 Chinese texts, of which 577 were tricky texts, 585 were related to sex but normal at the same time, and 2000 were normal texts. 300 tricky texts, 300 sex-related normal texts and 1000 normal texts were used as training data, and the rest acted as test data. They also gathered a list of 109 expressive terms containing 29 apparent keywords, 33 hidden keywords and 47 logical keywords. Their experimental results showed that the three kinds of keywords can improve the recognition rates noticeably. They also obtained the best classification rate when using the CNN-like word net to extract aspects of texts, which affirms that the CNN-like word net can accurately represent the semantic features of tricky texts.

Chen et al. [11] first used a C4.5 decision tree to classify input pages into three classes: continuous texts, discrete texts, and image pages. A CNN net is applied to recognize the semantic relations within continuous texts and a naïve Bayesian algorithm is adopted to identify discrete texts. After that, a fusion classifier based on Bayes' theorem integrated texts and images, and a 91.8% classification rate was obtained over 1500 sample pages. Using only URLs and keywords instead of a content based analysis, a small set of test data, and a relatively low accuracy rate are some shortcomings of their work [12].

He et al. [2] used a semi-supervised framework for web page filtering, with the AdaBoost algorithm as the classifier. Their experimental results show that the semi-supervised learning approach outperforms the supervised method when the available labeled sample set is small.

Feature reduction should be employed to decrease the number of feature terms to an acceptable level before filtering. In [13] the authors proposed to use a rough set to reduce the original feature terms. After selecting features, all web pages were represented by the feature vector with the weighting function. They also presented a new coefficient weighting method that applies rough sets to the Bayesian formula. The method improves filtering performance, but it is not very effective at increasing filtering correctness.

In [19], Ma proposed a neural network method for determining whether a requested URL exists in a large prohibited collection. The simulation results show superior performance in both memory requirement and speed compared with a database implementation on the same PC.

3. Filtering Architecture

We use a combination of three methods, namely blacklist, keyword blocking and intelligent content filtering, for web page filtering.
The workflow of our system architecture is as follows:

1) A URL is requested.
2) If the site exists in the blacklist, block the page and stop.
3) Load the page's HTML source code.
4) If the frequency of the forbidden keywords in the page is more than a predefined threshold, classify the page as a forbidden page and go to step (6).
5) Analyze the content of the web page and make a further decision on whether to allow or deny access to the site.
6) Block the page if it is judged to be a forbidden page and update the blacklist.

Figure 1 shows the general architecture of our filtering system.
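The six steps above can be sketched as a small cascade. This is only an illustrative sketch, not the authors' implementation: the names `filter_url`, `fetch_html` and `classify_content` are hypothetical, the blacklist is assumed to be keyed by host name, and the keyword "frequency" of step 4 is assumed to be a raw count of forbidden words.

```python
import re
from typing import Callable, Set
from urllib.parse import urlparse


def filter_url(
    url: str,
    blacklist: Set[str],
    forbidden_keywords: Set[str],
    keyword_threshold: int,
    fetch_html: Callable[[str], str],         # hypothetical page loader
    classify_content: Callable[[str], bool],  # hypothetical trained classifier
) -> str:
    """Return "blocked" or "allowed" for a requested URL."""
    host = urlparse(url).netloc.lower()

    # Step 2: a blacklisted site is blocked without loading the page.
    if host in blacklist:
        return "blocked"

    # Step 3: load the page's HTML source code.
    html = fetch_html(url)

    # Step 4: count forbidden keywords (assumption: raw count, not a ratio).
    words = re.findall(r"[a-z]+", html.lower())
    hits = sum(1 for w in words if w in forbidden_keywords)
    forbidden = hits > keyword_threshold

    # Step 5: otherwise let the content classifier decide.
    if not forbidden:
        forbidden = classify_content(html)

    # Step 6: block forbidden pages and update the blacklist.
    if forbidden:
        blacklist.add(host)
        return "blocked"
    return "allowed"
```

Ordering the checks from cheapest to most expensive means the classifier only runs on pages that neither the blacklist nor the keyword threshold has already settled.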

Figure 1. Filtering Architecture

4. Web Page Classification

Machine learning techniques provide powerful ways to automatically predict forbidden web pages using manually classified web pages. Figure 2 illustrates the general schema of the proposed approach. It consists of two phases: a phase that generates a prediction model and a detection phase.

For web page representation, web pages have to be transformed from their full text version to web page vectors. The first step consists of tokenization, stop word removal and word stemming to build a vocabulary in which each term occurs at least once in a certain number of web pages. In the second step, we prepare the forbidden keywords using the vocabulary and calculate their ranks. After that we represent all the training web pages as vectors of 18 features using the ranking dictionary of forbidden keywords obtained in the previous step. The web page vectors are used as inputs to learn a model (classifier) for prediction. In the detection phase a new web page is converted to its corresponding vector using the forbidden keywords and their ranks, and then the classifier classifies it.
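The first preprocessing step can be illustrated with a minimal sketch. It assumes a tiny hand-written stop word list and a hypothetical `min_pages` parameter standing in for the "certain number of web pages"; stemming is left out, so terms are kept in their surface form.

```python
import re
from collections import Counter
from typing import Iterable, List

# Tiny illustrative stop word list (an assumption, not the list used in the paper).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(pages: Iterable[str], min_pages: int = 2) -> List[str]:
    """Keep the terms that occur in at least `min_pages` different pages."""
    doc_freq: Counter = Counter()
    for page in pages:
        # A set, so each term is counted once per page.
        doc_freq.update(set(tokenize(page)))
    return sorted(t for t, df in doc_freq.items() if df >= min_pages)
```

For instance, `build_vocabulary(["The cat and the dog", "A dog in the park", "Cats play"])` keeps only `dog`, the one term shared by two pages; without stemming, `cat` and `cats` are counted as different terms.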

Figure 2. General schema of the proposed approach

5. Web Page Representation

Algorithms that can improve classification efficiency while maintaining accuracy are highly desired. Web page representation is one of the preprocessing techniques used to reduce the complexity of documents and make them easier to handle. It is an important aspect of web page classification and denotes the mapping of a web page into a compact form of its content.

5.1 Feature vector with the TFIDF weighting function

A web page is typically represented as a vector of term weights (word features) from a set of terms (vocabulary). The vocabulary is the set of all distinct words and other tokens occurring in any web page from the training dataset [18]. A major characteristic of the web page classification problem is the extremely high dimensionality of web page data. After selecting feature subsets, all documents are represented by the feature vector with the TFIDF weighting function. That is, the weight of term t_i in document d_j is calculated by

$$W_{ij} = \mathrm{tfidf}(t_i, d_j) = \frac{f_{ij}}{\sqrt{\sum_{k=1}^{M} f_{kj}^{2}}} \times \log\frac{N}{n(t_i)} \tag{1}$$

where f_ij denotes the number of times t_i occurs in document d_j, n(t_i) the number of documents in which t_i occurs at least once, N the total number of documents, and M the size of the feature subset.

5.2 Feature vector with the RWSF representation method

Hammami et al. [4, 14, 15] use another method to represent web pages. They represent web pages as vectors of numbers that give the numbers and frequencies of forbidden keywords in different parts of a web page, such as the title, body and links. As the speed of filtering is important, this method is a good way of representing web pages. The resulting vectors have fewer dimensions, which speeds up building a classifier and consequently web page classification.

In all the papers that use a forbidden keywords dictionary to represent web pages, the dictionary is built by experts based on forbidden groups, except for the method of [15], which creates a semi-automatic dictionary based on n-grams that has high accuracy in contrast to manual and automatic methods. Semi-automatic methods still need experts to select keywords, which is costly and error prone. In this paper we propose an automatic method based on Chi-square [9] to select forbidden keywords from the training documents. The term-goodness measure is defined as:

$$\chi^{2}(t) = \frac{N\,(ad - bc)^{2}}{(a+b)(a+c)(d+b)(c+d)} \tag{2}$$

where a is the number of times t occurs in the forbidden web pages, b is the number of times t occurs in the normal web pages, c is the number of forbidden web pages without t, d is the number of normal web pages without t, and N is the total number of web pages. Using this formula we can choose a number k of keywords as forbidden keywords, namely those whose goodness is more than a predefined threshold.

In all the papers and systems that use a forbidden keywords dictionary to represent web pages, the number and frequency of forbidden keywords have been considered as the main features.
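As a rough illustration of the Chi-square selection step, the sketch below scores every term and keeps those above a threshold. It rests on two assumptions that are not stated in the text: a and b are taken as page counts (so that a + b + c + d = N), and, because the statistic is symmetric, only terms that lean towards the forbidden class (ad > bc) are kept. The function names are hypothetical.

```python
from typing import Dict, List, Set


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square term goodness from a 2x2 contingency table.

    a: forbidden pages containing t   b: normal pages containing t
    c: forbidden pages without t      d: normal pages without t
    """
    n = a + b + c + d
    denom = (a + b) * (a + c) * (d + b) * (c + d)
    if denom == 0:
        return 0.0  # the term occurs in all pages or in none
    return n * (a * d - b * c) ** 2 / denom


def select_forbidden_keywords(
    forbidden_pages: List[Set[str]],
    normal_pages: List[Set[str]],
    threshold: float,
) -> Dict[str, float]:
    """Return {term: score} for terms whose goodness exceeds the threshold."""
    n_f, n_n = len(forbidden_pages), len(normal_pages)
    vocabulary = set().union(*forbidden_pages, *normal_pages)
    selected: Dict[str, float] = {}
    for term in vocabulary:
        a = sum(term in page for page in forbidden_pages)
        b = sum(term in page for page in normal_pages)
        c, d = n_f - a, n_n - b
        score = chi_square(a, b, c, d)
        # Assumption: keep only terms associated with the forbidden class.
        if score > threshold and a * d > b * c:
            selected[term] = score
    return selected
```

Each page is passed in as a set of terms, so a term is counted at most once per page; the returned scores can then serve as the raw values from which a ranking is derived.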
These methods give equal importance to all forbidden keywords of the dictionary. However, when we need k keywords, not all of them are equally involved in forbidden web pages. We can achieve high accuracy by ranking the forbidden keywords of the dictionary and taking into account the weighted sum and frequency of forbidden keywords instead of their number and frequency. We have selected a number of words, normalized their rankings with respect to their minimum and maximum values, and mapped them into the (1, 40) interval. Then we represent web pages as vectors of 18 features using the ranking dictionary of forbidden keywords. The textual and profile features that we used to represent web pages are shown in Table 1. The weighted sum and frequency of forbidden keywords in different parts of web pages are calculated by the following formula:

$$\text{Weighted Sum} = \sum_{t} \mathrm{Rank}(t)\, n(t), \qquad \text{Weighted Frequency} = \frac{\sum_{t} \mathrm{Rank}(t)\, n(t)}{\sum_{t} \mathrm{Rank}(t)\, n(t) + m} \tag{3}$$

where n(t) is the number of times t occurs in the target part of the web page and m is the number of non-forbidden words in the target part of the web page.

Table 1. Selected features for web page representation

Features     Description
nw-page      Weighted sum of forbidden words that occur in the page
wfw-page     Weighted frequency of forbidden words that occur in the page
nw-body      Weighted sum of forbidden words that occur in the body
wfw-body     Weighted frequency of forbidden words that occur in the body
nw-title     Weighted sum of forbidden words that occur in the title
wfw-title    Weighted frequency of forbidden words that occur in the title
n-url        Number of URLs in the page
nw-url       Weighted sum of forbidden words that occur in the URLs
n-link       Number of links in the page
nw-link      Weighted sum of forbidden words that occur in the links
wfw-link     Weighted frequency of forbidden words that occur in the links
n-image      Number of images in the page
nw-image     Weighted sum of forbidden words that occur in the images
wfw-image    Weighted frequency of forbidden words that occur in the images
nw-src       Weighted sum of forbidden words that occur in the src attribute of the img tag
nw-alt       Weighted sum of forbidden words that occur in the alt attribute of the img tag
nw-meta      Weighted sum of forbidden words that occur in the meta
wfw-meta     Weighted frequency of forbidden words that occur in the meta

For example, the following text is the content part of the Meta tag of a forbidden page. Words of the text that are in the forbidden words dictionary are underlined, and the rank of each word is given in the table next to it.

Brand New! We have reviewed Shemale Sex Dates and it was awesome. Horny Shemale Lovers Take the Free Tour and see for yourself!

The weighted sum and frequency of this text are calculated as follows.

Forbidden keywords: Shemale, Sex, Horny
Rank: [...]
Weighted Sum = [...]
Weighted Frequency = [...]

The words extracted from the text other than the forbidden words, shown in italics after removing stop words, number 7. The weighted sum and frequency for the texts of the remaining parts of the web page are calculated in the same way as in this sample, and a vector of 18 attributes is formed for each web page.

6. Experimental Results

6.1 Dataset description

To evaluate the proposed method, 2643 random samples of ODP (Open Directory Project) links were selected from allowed and illegal categories. Among them, 1311 web pages belong to the allowed category and 1332 pages belong to the forbidden category. Among the selected samples, 933 legal web pages and 918 forbidden web pages were randomly chosen as the training dataset. Moreover, 792 web pages, comprising 393 legal web pages and 399 forbidden web pages, were selected to assess system accuracy and efficacy.

6.2 Performance measure

Blocking rate and over-blocking rate are usually used for performance measurement in filtering systems. Blocking rate measures the percentage of forbidden pages that the filtering system manages to block, and over-blocking rate is the rate of normal pages misclassified as forbidden pages. They are defined by the following equations:

$$\text{Blocking Rate} = \frac{TP}{TP + FN}, \qquad \text{Over-Blocking Rate} = \frac{FP}{FP + TN} \tag{4}$$

where TP is the number of test web pages correctly classified as forbidden web pages, FP is the number of test web pages incorrectly classified as forbidden web pages, TN is the number of test web pages correctly classified as normal web pages, and FN is the number of test web pages incorrectly classified as normal web pages. These definitions are shown in Table 2.

Table 2. The global contingency table

                              Expert Judgment
                              Yes      No
Classifier Judgments   Yes    TP       FP
                       No     FN       TN

Another commonly used measure in filtering systems is accuracy, which is defined in equation (5):

$$\text{Accuracy} = \frac{TP + TN}{N} \tag{5}$$

where N is the total number of web pages.

7. Comparison Analysis

To evaluate the proposed method, the training web pages were concatenated, the words in the pages were extracted and, after removing stop words, the remaining words were stemmed. The Porter algorithm [16] is used for word stemming. The number of stemmed words in the vocabulary was [...] after stemming. Forbidden keywords were selected using the method described in section 5.

In the next stage, the corresponding vectors of web pages were formed in three ways. In the first way (TFIDF), a web page was represented as a vector of words, where the words are selected by the CHI word selection method [5]. In the second way (RSF), a web page was represented as a vector of the numbers and frequencies of forbidden keywords in different parts of the web page. In the third way, a web page was represented as we proposed (RWSF), as introduced in section 5.

Different classifiers including Support Vector Machine, k-Nearest Neighbor, Artificial Neural Network and Decision Tree are used to evaluate all types of representations. In our experiments, all the classifiers were obtained from the Weka framework (Witten and Frank 1999).

We evaluate the performance of the TFIDF method by varying the number of features from 100 to 1000. The results of our experiments are shown in Figure 3. As seen in the figure, using the TFIDF method, SMO (a version of SVM implemented in Weka) reaches its highest accuracy with 120 words, Neural Networks with 160 words, and k-Nearest Neighbor with 100 words.
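The RWSF representation compared here rests on ranked keywords. The sketch below shows how one pair of features could be computed for one part of a page, under stated assumptions: ranks are min-max normalized into the 1 to 40 range mentioned in section 5, the weighted sum adds Rank(t) once per occurrence of a forbidden keyword, and the weighted frequency is assumed to divide that sum by the sum plus m, the number of non-forbidden words. Function names are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize_ranks(scores: Dict[str, float],
                    low: float = 1.0, high: float = 40.0) -> Dict[str, float]:
    """Min-max normalize raw keyword scores into the interval [low, high]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {t: high for t in scores}  # all scores equal
    return {t: low + (s - lo) * (high - low) / (hi - lo) for t, s in scores.items()}


def weighted_sum_and_frequency(tokens: List[str],
                               rank: Dict[str, float]) -> Tuple[float, float]:
    """Weighted sum and weighted frequency for one part of a page.

    tokens: words of the target part (title, body, meta, ...) after
            stop word removal.
    rank:   normalized rank of each forbidden keyword.
    """
    # Adding Rank(t) per occurrence equals sum over t of Rank(t) * n(t).
    weighted_sum = sum(rank[t] for t in tokens if t in rank)
    m = sum(1 for t in tokens if t not in rank)  # non-forbidden words
    denom = weighted_sum + m
    weighted_freq = weighted_sum / denom if denom else 0.0
    return weighted_sum, weighted_freq
```

Setting every rank to 1 reduces both values to the plain count and frequency of forbidden keywords, which is the unweighted case that RSF corresponds to under this reading.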

Figure 3. Comparison of the classifiers in the TFIDF approach

Figure 4. Comparison of the classifiers in the proposed approach

Figure 4 presents the comparison of the classifiers for different numbers of keywords in the dictionary. According to the results of the experiments, the SMO classifier reaches its highest accuracy of [...] percent with 1000 keywords. The kNN classifier with k = 10 reaches at best an accuracy of [...] percent. The Neural Network classifier reaches its highest accuracy of [...] percent with a dictionary of 700 keywords. The Decision Tree classifier reaches its highest accuracy of [...] percent with a dictionary of 800 forbidden keywords.

To compare our method with TFIDF, we selected the best result of the two methods for each classifier (shown in Table 3) and calculated the percentage of increase or decrease in accuracy, training execution time, and memory usage for saving the training data after the preprocessing step by the following formula:

$$\frac{\mathrm{Result}_{new} - \mathrm{Result}_{old}}{\mathrm{Result}_{old}} \times 100 \tag{6}$$

Table 3. Comparing different classifiers in each method of web page representation at their best accuracy. Columns: Accuracy, Training Time (S) and Memory Usage (KB), each for TFIDF and RWSF. Rows: SMO, DT, ANN, k-NN.

The results of our experiments are shown in Figure 5. Although our approach does not have much effect on increasing the accuracy of the system compared to the TFIDF method, it is very effective in decreasing the training time and memory usage. The results of comparing the RSF and RWSF methods are shown in Figure 6. As shown, the ranking dictionary was evaluated with various classifiers to measure its effect on achieving higher accuracy.

Figure 5. Percentage increase in accuracy, training time and memory usage for saving data in the proposed method compared to TFIDF using different classifiers.
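The evaluation measures of section 6.2 and the relative change of equation (6) are simple enough to write out. The sketch below assumes the usual reading of the contingency table (blocking rate as TP over all forbidden pages, over-blocking rate as FP over all normal pages); the function names are hypothetical and any counts passed in are up to the caller, not results from the paper.

```python
def blocking_rate(tp: int, fn: int) -> float:
    """Share of forbidden pages that the filter manages to block."""
    return tp / (tp + fn)


def over_blocking_rate(fp: int, tn: int) -> float:
    """Share of normal pages that are wrongly blocked."""
    return fp / (fp + tn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all test pages that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def percent_change(new: float, old: float) -> float:
    """Relative change of a result with respect to the old one, in percent.

    Positive values mean an increase, negative values a decrease.
    """
    return (new - old) / old * 100
```

With `percent_change`, a drop in training time from 40 s to 30 s (made-up numbers) comes out as -25.0, so a negative value is the desirable direction for time and memory and a positive one for accuracy.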

Figure 6. Comparing the filtering accuracy of forbidden keywords and weighted forbidden keywords with different classifiers

8. Conclusion

The invention of the Web has made it the main place to publish any kind of information. This information is of various types and includes a large number of inappropriate web pages, which are useless for some groups of people. Some organizations need to filter their community's access to erotic pages. Recently, some intelligent techniques based on text classification methods have been proposed to prevent users from accessing forbidden web pages. In this paper, we have proposed a new intelligent, automatic way to build a forbidden keywords dictionary. We represented web pages using various features obtained from the forbidden keywords dictionary. Then we assessed our filtering system using different classification techniques such as Decision Tree, Support Vector Machine, k-Nearest Neighbor and Artificial Neural Network. The results of all classifiers show that the proposed method has high efficiency. In this paper, we filter web pages using only their textual information. The accuracy needs to be further improved by analyzing the various multimedia in the web pages, including audio, images and videos.

9. References

[1] Du R, Safavi-Naini R, Susilo W. Web Filtering Using Text Classification. In: Networks 2003 (ICON2003), The 11th IEEE International Conference on; 28 Sept.-1 Oct. 2003.

[2] He Z, Li X, Hu W. A boosted semi-supervised learning framework for web page filtering. In: Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics (SMC 09); Oct. 2009; IEEE Press, Piscataway, NJ, USA.
[3] Lee PY, Hui SC, Fong ACM. Neural Networks for Web Content Filtering. IEEE Intelligent Systems 2002; 17.
[4] Guermazi R, Hammami M, Hamadou AB. Combining Classifiers for Web Violent Content Detection and Filtering. ICCS '07, Proceedings of the 7th International Conference on Computational Science; 2007.
[5] Baharudin B, Lee LH, Khan K. A Review of Machine Learning Algorithms for Text Documents Classification. Journal of Advances in Information Technology 2010; 1(1).
[6] Harish B, Guru D, Manjunath S. Representation and Classification of Text Documents: A Brief Review. IJCA, Special Issue on RTIPPR; 2010; 2.
[7] Mitchell TM. Machine Learning. Annual Review of Computer Science 1997; 4.
[8] Mitra V, Wang CJ, Banerjee S. Text classification: A least square support vector machine approach. Applied Soft Computing Journal 2007; 7(3).
[9] Sebastiani F. Machine Learning in Automated Text Categorization. ACM Computing Surveys 2001; 34(1).
[10] Wu O, Hu W. Web Sensitive Text Filtering by Combining Semantics and Statistics. IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE '05); 30 Oct.-1 Nov. 2005.
[11] Chen Z, Wu O, Zhu W, Hu W. A Novel Web Page Filtering System by Combining Texts and Images. In: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 06); Dec. 2006; IEEE Computer Society, Washington, DC, USA.
[12] Ahmadi A, Fotouhi M, Khaleghi M. Intelligent classification of web pages using contextual and visual features. Applied Soft Computing 2011; 11(2).
[13] Wu Y, She K, Zhu W, Yue X, Luo H. A Web Text Filter Based on Rough Set Weighted Bayesian. In: Proceedings of the 2009 Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing (DASC 09); IEEE Computer Society, Washington, DC, USA.
[14] Hammami M, Chahir Y, Chen L. Combining Text and Image Analysis in the Web Filtering System Webguard. International Association for Development of the Information Society (IADIS); November 2003.
[15] Guermazi R, Hammami M, Hamadou AB. Using a Semi-automatic Keyword Dictionary for Improving Violent Web Site Filtering. Third International IEEE Conference on Signal Image Technologies and Internet Based System; Dec. 2007.
[16] Porter M. An algorithm for suffix stripping. Automated Library and Information Systems 1980; 14(3).
[17] Ramasundaram S, Victor SP. Algorithms for Text Categorization: A Comparative Study. World Applied Sciences Journal; 22(9).
[18] Zhao Y. Chapter 10 - Text Mining. In: Zhao Y, editor. R and Data Mining. Academic Press; 2013.
[19] Ma H. Fast Blocking of Undesirable Web Pages on Client PC by Discriminating URL Using Neural Networks. Expert Systems With Applications (ESWA); vol. 34, no. 2, February.

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

The Research of Support Vector Machine in Agricultural Data Classification
