Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies

Size: px
Start display at page:

Download "Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies"

Transcription

Dikan Xing 1, Gui-Rong Xue 1, Qiang Yang 2, Yong Yu 1
1 Shanghai Jiao Tong University, Shanghai, China {xiaobao,grxue,yyu}@apex.sjtu.edu.cn
2 Hong Kong University of Science and Technology, Hong Kong, China qyang@cse.ust.hk

ABSTRACT
Organizing Web search results into hierarchical categories facilitates users' browsing of Web search results, especially for ambiguous queries whose potential results are mixed together. Previous methods for search-result classification are usually based on pre-training a classification model on fixed and shallow hierarchical categories, where only the top two levels of a Web taxonomy are used. Such classification may be too coarse for users to browse, since most search results are classified into only two or three shallow categories. Instead, a deep hierarchical classifier must provide many more categories. However, the performance of such classifiers is usually limited because their classification effectiveness can deteriorate rapidly at the third or fourth level of a hierarchy. In this paper, we propose a novel algorithm, known as Deep Classifier, to classify search results into detailed hierarchical categories with higher effectiveness than previous approaches. Given the search results in response to a query, the algorithm first prunes a wide-ranging hierarchy into a narrow one with the help of Web directories. Different strategies are proposed to select the training data by utilizing the hierarchical structure. Finally, a discriminative naïve Bayesian classifier is developed to perform efficient and effective classification. As a result, the algorithm can provide more meaningful and specific class labels for search-result browsing than shallow classification. We conduct experiments to show that the Deep Classifier can achieve significant improvement over state-of-the-art algorithms. In addition, with sufficient off-line preparation, the efficiency of the proposed algorithm is suitable for online application.
Categories and Subject Descriptors
H.4.m [Information Systems Applications]: Miscellaneous; I.5.4 [Pattern Recognition]: Applications - Text processing

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. WSDM'08, February 11-12, 2008, Palo Alto, California, USA. Copyright 2008 ACM /08/ $5.00.

General Terms
Algorithms, Experimentation

Keywords
Deep Classifier, Search Result Mining, Hierarchical Classification, Hierarchy Pruning

1. INTRODUCTION
With the increasing prevalence of Web technologies, Web search has become essential in everyday life. Current search engines typically return a long list of search results in response to a user-issued query. Although the most authoritative results may be ranked high among all results under a proper ranking algorithm (e.g., PageRank [20]), it remains a question whether the most authoritative pages are what the user actually wants. In particular, when a query is inherently ambiguous and the first pages are not the intended results, users may have difficulty in subsequently browsing the rest of the pages. For example, if a user wants to find the benefits of the apple fruit and issues the query "apple", all results on the first page focus on the computer company Apple, rather than the intended topic of the fruit. This is because the Apple company is so well known that Web pages about that sense are more likely to be well linked and thus ranked higher. To make matters worse, the results about the fruit are scattered widely throughout the list of search results.
For the fruit sense of "apple", the results are placed at positions 14, 37, 38, 62, 63, 66, 67, 71, 75, 76, 82, 84 and 88 among the top 100 search results from the Google search engine. A solution is to perform classification on the search results [5, 8]. As described in [5, 8], automatic category organization brings many advantages for users: the search results are automatically compiled into a hierarchy according to their different potential meanings. As pointed out in these two papers, users preferred the category interface much more than the list interface; in fact, they were 50% faster in finding information that was organized into categories. Generally, there are two types of models for classification: shallow classification and deep classification. Shallow classification [5, 8] trains the model on the top level or the top two levels of a hierarchy, while deep classification [15] learns the classification models on an entire large-scale hierarchy. However, the shallow classification scheme may be too coarse for the given search results when all the resulting items are placed in only two or three classes. In contrast, placing a search result in a category of a deep hierarchy provides more content information about the result than a shallow hierarchy can. Among previous research, Liu et al. [15] introduced a top-down classification strategy on a deep but wide target hierarchy. The width leads to the large size of the hierarchy, resulting in a performance decline as the hierarchical depth increases: the Macro-F1 measure was near 20% at the second level, and decreased to near 10% at the fifth. In addition to the accuracy decrease, training the model on a global hierarchy may not be a good choice and can yield poor performance. For example, the classification results of the query "apple" are expected to be distributed among categories such as Health and Computers (if these categories are available), while results of the query "Saturn" are expected to be distributed among categories like Science, Games and Cars. A classifier over the two categories Health and Computers is expected to outperform a classifier over more categories such as Health, Computers, Science, Games and Cars on search results for the query "apple". This implies that it is desirable to employ an adaptive method for creating classifiers, one that uses different target categories for different user queries.

According to the above observations, in this paper we propose a novel algorithm, known as Deep Classifier, to classify search results into a large and deep target hierarchy adaptively by utilizing existing hierarchies on the Web. We do this as follows. We first prune a large hierarchy into one with a smaller size while keeping the original hierarchical structure. This is accomplished by first querying an online Web directory with the specified query and then retrieving all ancestors of the returned deep categories. This results in a deep but narrow target hierarchy derived from the originally large and wide one.
In this way, a different target hierarchy is created adaptively for each user query. The leaf nodes in such a hierarchy are taken as the category candidates for search-result categorization. Based on the narrow and deep hierarchy, we propose different strategies for training data selection with the help of the hierarchical structure. The classification models are learned online using an efficient implementation of the discriminative naïve Bayesian classifier. Experimental results show that our Deep Classifier algorithm can achieve significant improvement over state-of-the-art algorithms. Furthermore, with sufficient off-line preprocessing, the efficiency of the proposed algorithm is suitable for online application. As a result, the entire algorithm can provide more meaningful and specific class labels for search-result browsing than its shallow counterparts. It is worth mentioning that online Web directories such as ODP and the Yahoo! Directory are straightforward solutions to the problem described above. Searching these Web directories, users receive search results attached with category attributes, which have already been assigned by human editors and are of course hierarchical. However, the obvious deficiency of these solutions is that the number of assembled Web pages is limited in any human-maintained Web directory. The pages one can search in these Web directories are far fewer than those indexed by a search engine.

[Figure 1: Overview of Deep Classifier. The query is sent both to ODP, which yields a pruned hierarchy and training data selection, and to a search engine, which yields search results; the classification step combines the two to produce classified results.]

Our solution combines Web directories and search engines. From the former, we gain a rich (large and deep) concept hierarchy, and from the latter, we are able to search among the billions of pages indexed by the search engine. The remainder of this paper is organized as follows. Section 2 gives a brief overview of the Deep Classifier algorithm. In Section 3, we propose different strategies for training data selection. In Section 4, a discriminative naïve Bayesian classifier is presented.
The implementation issues are described in Section 5. We report and analyze the results of a series of experiments on our proposed algorithm in Section 6. Related work is discussed in Section 7. In Section 8, we give a conclusion and discuss future work.

2. OVERVIEW OF DEEP CLASSIFIER
In this section, we give an overview of our Deep Classifier algorithm. Figure 1 illustrates the flowchart of our system. A user first issues a query, which is submitted to an online Web directory (e.g., the Open Directory Project (ODP)) and a search engine (e.g., Google) simultaneously, to get the category information and the search results, respectively. The online Web directory responds with a list of categories that are relevant to the query. For example, when the query "Saturn" is issued to ODP, ODP returns five categories (the categories in bold font in the right part of Figure 2). After creating the pruned hierarchy from these five categories, only twenty-four ancestors remain. Compared with the entire hierarchy shown on the left of Figure 2, this narrowing-down procedure greatly reduces the number of target category candidates. The leaf nodes in the pruned hierarchy are regarded as our target category candidates. These nodes may have offspring in the original large hierarchy, but those offspring are now pruned away. In other words, we may also pick internal nodes of the original large hierarchy, instead of only leaf nodes, as our target category candidates.

[Figure 2: Pruning the large and deep hierarchy down to a decent size. The left side shows the full ODP top level (Arts, Business, Computers, Games, Health, Home, Kids and Teens, News, Recreation, Reference, Science, Shopping, Society, Sports, with their document counts). The right side shows the hierarchy pruned for the query "Saturn": five candidate categories in bold remain, such as Recreation/Autos/Makes and Models/Saturn, a Saturn category under Science (planets), one under Games (Sega Saturn), and Arts/Animation/.../Sailor Saturn, together with their retained ancestors.]

Next, based on the structure of the pruned hierarchy, three training data selection strategies are proposed in Section 3 by utilizing the hierarchical structure. This step is important since the labeled Web pages under one ODP category are usually too few to train a reliable classifier. Then, based on the training data, we perform classification model learning. Since our algorithm runs online, it is important for it to be efficient. To satisfy this goal, we propose a simple classifier based on naïve Bayes, which is described in Section 4. We will also compare our results with a Support Vector Machine in the experiments, to see how long one would have to wait to obtain a possibly better classification from an SVM. Finally, we classify the search results and present them through a hierarchical search-result interface. Compared with a top-level categorization or a top-two-level categorization proposed in [5, 8], such as

Arts; Games; Kids and Teens; Recreation; Science

or

Arts/Animation; Games/Video Games; Kids and Teens/School Time; Recreation/Autos; Science/Astronomy

our proposed Deep Classifier can (1) provide an interface with more meaningful class labels, shown in the right part of Figure 2, than the top-level or top-two-level style, and (2) classify search results more delicately.
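The narrowing-down step described above can be sketched in a few lines. The sketch below is illustrative only (the function name and path representation are our own, not the paper's implementation); it keeps each candidate category returned by the Web directory plus all of its ancestors, and treats the leaves of the retained set as the target category candidates.

```python
# Sketch of hierarchy pruning: retain each candidate category and all
# of its ancestors; the leaves of the retained set become the targets.

def prune_hierarchy(candidate_paths):
    """candidate_paths: list of category paths from the root, e.g.
    [["Top", "Science", "Astronomy", "Planets", "Saturn"], ...].
    Returns (retained nodes, leaf nodes); each node is its full path."""
    retained = set()
    for path in candidate_paths:
        # Add the candidate and every ancestor along its path.
        for depth in range(1, len(path) + 1):
            retained.add(tuple(path[:depth]))
    # A retained node is a leaf if no other retained node extends it.
    leaves = {node for node in retained
              if not any(other[:len(node)] == node and len(other) > len(node)
                         for other in retained)}
    return retained, leaves

retained, leaves = prune_hierarchy([
    ["Top", "Science", "Astronomy", "Planets", "Saturn"],
    ["Top", "Recreation", "Autos", "Makes and Models", "Saturn"],
])
# The two candidate paths share only "Top", so 9 nodes are retained
# and the 2 deepest nodes become the category candidates.
```

For the query "Saturn" in Figure 2, the same procedure applied to the five ODP candidates yields the twenty-four retained ancestors mentioned above.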
The second point is not very obvious in this case, since the number of category candidates does not change between shallow and deep classification, but we will show examples of this kind in the experiment section later (in Table 1).

3. TRAINING DATA SELECTION
In this section, we discuss strategies for training data selection, a very important issue in our task. In Figure 2, the number after each category name is the number of training documents attached to the category (and its offspring). One can see that some category candidates contain very few labeled Web pages, which is insufficient to build a reliable classifier. We propose three different strategies for selecting the training data; the best one is used in our final design of the Deep Classifier.

3.1 Flat Strategy
The flat strategy is a simple method for training data selection, in which we transform the hierarchical classification task into a flat classification task. From the viewpoint of hierarchical classification, this strategy places all the category candidates (those in bold font in Figure 3) directly at the root, as shown in Figure 4. Then we directly classify the search results into the category candidates with a flat classifier. As shown in Figure 4, we can directly train the model using the data from categories 44, 85, 205, 66, 874, 902, 42, 677 and 707. This method is simple to use, but it does not consider the hierarchical structure of the Web directory.

3.2 Hierarchical Strategy
Since the hierarchy is pruned to a manageable size, the existing top-down style can be tried even for an online application.

[Figure 3: An example of a pruned hierarchy.]

We now describe the algorithm according to the tree structure shown in Figure 3. The category candidates are shown in bold font, and the hierarchical strategy first classifies (estimates the probabilities of) the search results into the categories marked 17, 23, 66 and 27. Thus a classifier is created by selecting training data under these nodes (and their offspring). The estimated probability of each non-candidate category is propagated to its candidate offspring.
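This top-down scoring, in which per-level conditional probabilities are multiplied along the path from the root to a candidate, can be sketched as follows. The helpers `children` (node to list of child nodes in the pruned tree) and `cond_prob` (the per-level classifier's estimate of P(child | parent, x)) are hypothetical stand-ins for the per-level classifiers described in the text.

```python
# Sketch of the hierarchical strategy: P(c|x) is the product of the
# per-level conditional probabilities down the path from the root.

def path_probability(path, x, cond_prob, children):
    """path: [c_1, ..., c_l] from the root to a candidate; returns P(c|x)."""
    prob = 1.0  # P(c_1|x) = 1 at the root
    for parent, child in zip(path, path[1:]):
        if len(children[parent]) == 1:
            # Only one child in the pruned tree: probability 1 is
            # assigned directly, no classifier needs to be trained.
            continue
        prob *= cond_prob(parent, child, x)
    return prob
```

A search result x is then assigned to the candidate whose path maximizes this product, as formalized below.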
The rest of the algorithm is similar to that at the top level. For example, another classifier is created by selecting the training data from categories 102 and 203 and then classifying the search results into them.

[Figure 4: Flat strategy.]

[Figure 5: Ancestor-Assistant strategy.]

Formally, we represent the category candidate c together with its ancestors, denoted by c_1, c_2, ..., c_l, where l is the length of the path from the root to the category candidate c; c_1 is the root node and c_l is the category candidate itself. The probability P(c_1 | x) equals 1. We formalize the probability P(c | x) that a test case x belongs to c as:

P(c | x) = P(c_1, c_2, ..., c_l | x)
         = P(c_l | x, c_1, c_2, ..., c_{l-1}) P(c_1, ..., c_{l-1} | x)
         = ...
         = ∏_{k=1}^{l} P(c_k | x, c_1, ..., c_{k-1})

Each conditional probability in the product is estimated by a classifier at the k-th level under the category c_k. If there is only one child under c_k in the pruned tree, the probability 1 is assigned directly, without the cost of training a classifier. Search results are classified by finding the c that maximizes the a posteriori probability P(c | x). This strategy differs from the other two in that it requires learning more than one classifier before making a final classification decision. Several state-of-the-art algorithms work in a similar hierarchical way [5, 8, 15, 3]; presenting this strategy allows us to compare against such work.

3.3 Ancestor-Assistant Strategy
The ancestor-assistant strategy is guided by the following two considerations. First, the training data from the category candidate itself may be insufficient in size, especially for a deep category, so we need to obtain more data elsewhere. Second, the training data from its high ancestors may be too general to reflect the characteristics of the deep category candidate. In a word, we want to borrow some training data from the ancestors, but should not go too high up. Hence, we propose a trade-off between the hierarchical strategy and the flat strategy: we combine the training data from the category candidate itself with those from its ancestors and siblings.
More precisely, we find the farthest ancestor of the category candidate that is not an ancestor of any other category candidate, and pick up the Web pages at this ancestor and all of its offspring. In Figure 3, the training data for category 874 come from category 834 and all offspring of 834, while the training data for category 902 come from category 854 and all offspring of 854, since the common ancestor is category 24. From the viewpoint of hierarchical classification, we attach to the root the maximal subtree containing one and only one category candidate. The tree in Figure 5 illustrates this strategy.

4. CLASSIFICATION MODEL
Due to the adaptive nature of the problem, a classifier that is fast to learn is preferred, because the classifiers are trained online at query time. If a classifier such as an SVM were employed, the long training time might prevent us from delivering the results in a timely manner. To this end, we prefer the naïve Bayesian classifier.

4.1 Event Model
There are different event models for the inputs of a naïve Bayesian text classifier. [16] mentioned two: the multi-variate Bernoulli model and the multinomial model. These two models regard each document as a whole, and the joint distribution of whole documents and target categories is considered. In this paper, we use the event model introduced in [17], which makes it easy to interpret our later modification of the model. We regard each document as a sequence of random variables A_1, ..., A_n, where n is the length of the sequence (document). Each A_i corresponds to the i-th word in the document. These random variables are independently, identically distributed, as the naïve assumption says. This means that an observation at any position in a document can be used in the estimates for all positions. The support of these variables is the set of distinct word IDs. In the later discussion, we do not explicitly attach a position subscript to A. We then model the joint distribution not between documents and categories directly, but between words and categories, and a document is the intersection of a sequence of word events.
In other words, if one takes an N-faced die, where N denotes the vocabulary size, and tosses it a finite number of times, this process generates a document by recording the result of each toss.

4.2 Naïve Bayesian Classifier
Under the above event model, the classifier estimates P(A = w_j | c_i) for all i and j, where j iterates over all words and i iterates over all target candidates. The classifier estimates the probability that a document (a sequence of word events) belongs to a category by computing

P(c_i | v) ∝ P(v | c_i) P(c_i) = P(c_i) ∏_j P(A = w_j | c_i)^{v_j}    (1)

where c_i is a category, v is the document, N is the vocabulary size, w_j is the j-th word in the vocabulary, and v_j is the corresponding count of word w_j in v. P(c_i) is the a priori probability that a document v belongs to category c_i, and it is usually estimated by counting the training data of the different categories. However, in our setting, the training data are taken from manually created hierarchies such as ODP, while the test examples come from search engines such as Google. The obtained top 100 (or 200) search results (test cases) may be incomplete, which means the distribution of categories may not be the same as that in a manually created hierarchy. We observed this inconsistency for most of our evaluated queries. Therefore, we believe it is preferable to weaken the a priori probabilities and remove the P(c_i) term from the formulation. This can also be regarded as the assumption that the a priori probability of each category is flattened to the same value, 1/n, where n is the number of categories, according to the maximum entropy principle. Besides, [12] assumed that the a priori probability did not actually have a great impact.

4.3 Discriminative Naïve Bayesian Classifier
Applying Bayes' theorem, we obtain

P(A = w_j | c_i) = P(A = w_j, c_i) / P(c_i) = P(c_i | A = w_j) P(A = w_j) / P(c_i).

Under the assumption that all P(c_i) are equal, we can rewrite

∏_j P(A = w_j | c_i)^{v_j} = ∏_j ( P(c_i | A = w_j) P(A = w_j) / P(c_i) )^{v_j}
                           ∝ ∏_j P(c_i | A = w_j)^{v_j}

since all the P(A = w_j) are identical across different categories given the document. We thus arrive at

P(c_i | v) ∝ ∏_j P(c_i | A = w_j)^{v_j}    (2)

This form facilitates another way of estimating the parameters. In the form of (1), each term P(A = w_j | c_i)^{v_j} in the score function indicates that if w_j is common in a category c_i, then a test case with a high occurrence count of w_j will receive a high score. Likewise, if w_j occurs very rarely in a category c_i, then a test case with a low occurrence count of w_j will receive little penalty from the occurrence of w_j. Thus, we can regard each P(A = w_j | c_i) as a vote on the similarity between the category c_i and the test case v. The ultimate score is the combination of the votes on similarity from all occurring words.
In the form of (2), we can instead regard each word as a vote on discrimination. If a word occurs in only one category, it can be regarded as a discriminative word for that category: it contributes the maximum vote, 1, regardless of how many other words occur in that category, whereas in the standard naïve Bayesian classifier discussed in Section 4.2 those other words would influence or weaken the vote from this word. Therefore, we refer to this classifier as the discriminative naïve Bayesian classifier.

4.4 Parameter Estimation
If we denote the number of occurrences of the event (A = w_j, c_i) in the training data as δ_ij, then for (1) a maximum likelihood estimate yields

P(A = w_j | c_i) = δ_ij / Σ_j δ_ij

and for (2),

P(c_i | A = w_j) = δ_ij / Σ_i δ_ij

So, for the standard naïve Bayesian classifier, the discriminant function for c_i is

f_i(v) = Σ_j v_j ln( (δ_ij + ε) / Σ_j (δ_ij + ε) )

and for the discriminative naïve Bayesian classifier, the discriminant function for c_i is

g_i(v) = Σ_j v_j ln( (δ_ij + ε) / Σ_i (δ_ij + ε) )

where ε is used for smoothing.

5. IMPLEMENTATION
We employ the ODP hierarchy in this work. This directory contains 17 top-level categories such as Business, Computers, Games, Health, etc. There are in total 712,548 categories and 4,800,870 Web pages in our dumped version. To reduce the whole hierarchy to a manageable size and limit ourselves to English content only, the three top-level categories Adult, Regional and World are not used in our system; if category candidates under these three categories are returned from ODP for a query, we simply ignore them. Thus 1,946,361 documents and 170,198 categories remain.

5.1 Off-line Cache
To create classifiers efficiently for a given set of categories, we first try to download all the Web pages registered in ODP. The crawled Web pages are somewhat fewer than those registered in the ODP hierarchy, since some network errors occurred during crawling. In total we crawled 1,297,222 Web pages distributed over 157,927 categories. From the crawled pages we then build a vocabulary list. The original vocabulary contains 6,387,537 distinct words.
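The two discriminant functions above can be computed directly from the cached counts. The following is a minimal sketch only, not the paper's implementation: the dictionary-of-dictionaries layout `delta[i][j]` and the function names are our own, and for simplicity the denominator of f sums over the words stored for category i rather than the full vocabulary.

```python
import math

# Sketch of the two discriminant functions of Section 4.4, over
# precomputed counts delta[i][j] = occurrences of word j in category i.

def f_standard(delta, i, v, eps=1.0):
    # f_i(v) = sum_j v_j * ln( (delta_ij + eps) / sum_j' (delta_ij' + eps) )
    # Denominator sums over the words observed for category i.
    denom = sum(count + eps for count in delta[i].values())
    return sum(v_j * math.log((delta[i].get(j, 0.0) + eps) / denom)
               for j, v_j in v.items())

def g_discriminative(delta, i, v, eps=1.0):
    # g_i(v) = sum_j v_j * ln( (delta_ij + eps) / sum_i' (delta_i'j + eps) )
    # Denominator sums over categories for each word j instead.
    score = 0.0
    for j, v_j in v.items():
        denom = sum(delta[k].get(j, 0.0) + eps for k in delta)
        score += v_j * math.log((delta[i].get(j, 0.0) + eps) / denom)
    return score
```

Classification then assigns a snippet (its word-count dictionary v) to the category with the highest discriminant score; note that g weights a word occurring almost exclusively in one category highly regardless of how many other words that category contains.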
We perform a global feature selection in which words with low occurrence counts are removed (here a word means a sequence of letters, numbers and hyphens, neither starting nor ending with a hyphen): words with fewer than 4 occurrences over the whole hierarchy are removed from our vocabulary. This reduces the vocabulary by 80%, down to 1,287,715 words. We pre-count and save δ_ij for all words in all categories, so that a classifier can be quickly created by reading the δ_ij of the involved words in the involved categories. We use the Ancestor-Assistant strategy and the discriminative naïve Bayesian classifier in the final design of the Deep Classifier, which will be shown later to be the best among the alternatives.
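The global feature-selection step can be sketched as a single counting pass. The function name and threshold parameter below are illustrative; the paper's threshold is 4 occurrences over the whole hierarchy.

```python
from collections import Counter

# Sketch of global feature selection: drop words whose total occurrence
# count over the whole collection falls below a threshold.

def prune_vocabulary(documents, min_occurrences=4):
    """documents: iterable of token lists; returns the retained vocabulary."""
    totals = Counter()
    for tokens in documents:
        totals.update(tokens)
    return {word for word, count in totals.items() if count >= min_occurrences}

docs = [["apple", "fruit", "apple"], ["apple", "mac"], ["apple", "fruit"]]
vocab = prune_vocabulary(docs, min_occurrences=2)
# "apple" (4 occurrences) and "fruit" (2) survive; "mac" (1) is dropped.
```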

[Figure 6: Number of queries vs. number of top-level categories their results are distributed over.]

5.2 Space and Time Complexity
The Deep Classifier requires a two-dimensional table of δ_ij. Each row corresponds to a category and each column to a word in the vocabulary. The space complexity is O(M N), where N is the size of the vocabulary and M is the number of categories. Not all columns are loaded at initialization, for space reasons; the Second Chance strategy [22] is employed to swap rows in and out. When testing a search result, only the words occurring in the snippet are considered, so the time complexity for testing is O(n log N + n m + K), where n is the length of the snippet, m is the number of category candidates and N is the size of the whole vocabulary. The first term is the time to convert the snippet into word IDs, and the second term is the time to classify. K is the time for memory swapping, incurred only when a required column has not been loaded previously, so it is optional.

6. EXPERIMENTS
In this section, we first show the collected statistics about the category distributions of queries. Then we introduce the data sets used for evaluation, and validate on Dataset I our statement in Section 3 that training data are usually insufficient at deep categories. Finally, the performance of the different strategies for collecting training data proposed in Section 3 and of the different classifier models proposed in Section 4 is compared on the two datasets.

6.1 Category Distribution
We reported in the Introduction that the results of a query are usually distributed among a few concentrated categories. We collected 1000 popular queries from a well-known search engine and counted how many top-level categories the ODP results of these queries are distributed over. The resulting distribution is shown in Figure 6. About 94.7% of the queries are distributed over fewer than six categories, and about 74.2% of the queries over three or fewer. The two most widely distributed queries are "games" (in 14 top-level categories) and "books" (in 12 top-level categories).
This justifies two claims: (1) directly classifying search results into top-level categories may be too coarse for many queries, and arranging search results into deep categories can separate them into more distinct categories; (2) adaptively learning a classifier over different categories for different queries is preferable to a universal classifier over all categories, since the results of a query are not distributed over too many categories. This is especially obvious at the top level.

[Figure 7: Distribution of documents over the different levels, for the three collections.]

6.2 Data Set
To the best of our knowledge, there is no standard data set for our evaluation. In order to evaluate the performance of the Deep Classifier, we prepared two datasets, a large one and a small one. In each dataset, we collected the top 100 search results for each query.

6.2.1 Dataset I: Randomly Selected 100 Queries
The ODP data used in our Deep Classifier implementation (discussed in Section 5) for classifier training amounts to about 1.3 million pages; another roughly 0.6 million pages are not included. We used half of them (the 0.6 million pages are those which could not be downloaded the first time; this half is those successfully downloaded on a second try) to build a search engine indexed with Lucene. This engine was built to simulate the role of a search engine like Google. We randomly selected about 100 queries from query logs and collected the top 100 search results of each query from this simulated search engine. Since each search result comes from ODP, its category attribute is already present. This greatly saved the human labor of annotating a large number of search results with categories, and made it possible to prepare such a large dataset with limited time and labor. We also compared the distributions of the three collections: the full 1.9 million pages registered in ODP, the 1.3 million pages mentioned in Section 5 used for Deep Classifier training, and the 0.3 million pages of the simulated search engine. Both the percentage of documents and the percentage of categories at each level are close across these three collections. The comparison of document numbers is shown in Figure 7; that of category numbers is similar. This ensures that we do not introduce a significant bias of search results towards some levels of categories, deep or shallow. Likewise, it also ensures that the network failures mentioned in Section 5 did not cause the distribution of Web pages and categories to become greatly inconsistent with the full version. On this dataset, we also validated the assumption that training data at deep categories are very few: the flat strategy obtained only 21.6 training examples per query and category candidate on average, and this figure increased significantly when the Ancestor-Assistant strategy was employed.

6.2.2 Dataset II: Ambiguous Queries
Ambiguous queries are limited in number, and queries randomly selected in the manner described above are mostly unambiguous. So the first dataset can only reflect the performance of our system on facet categorization of unambiguous queries. To make up for this deficiency, we asked users to label the top 100 search results from the real Google for 9 selected ambiguous queries. Each search result was labeled by two users, and if their labeled categories did not agree, we ignored that search result for that query. Table 1 lists all the queries used in our evaluation. For each query, we also list the number of category candidates returned from ODP, the number of distinct top-level categories, and the maximal depth of the category candidates.

[Table 1: Selected ambiguous queries (ajax, apple, dell, jaguar, java, saturn, subway, trec, ups), with the number of category candidates, the number of top-level categories, and the maximum depth for each; the numeric values are not recoverable from this copy.]

The difference between the second and third columns supports the claim in Section 2 that classifying over a large and deep hierarchy can be more delicate.

6.3 Overall Performance
In this section, we introduce the experiments conducted to validate three hypotheses:
1. The Ancestor-Assistant strategy may outperform the hierarchical strategy and the flat strategy.
2. The discriminative naïve Bayesian classifier may outperform the standard one.
3. The discriminative naïve Bayesian classifier is expected to be much faster than an SVM, although some accuracy may be lost.

6.3.1 Strategies for Selecting Training Data
In this experiment, we fix the classifier to the discriminative naïve Bayesian classifier, which will be shown below to achieve higher performance, and vary the strategy for selecting the training data. The Micro-F1, Macro-Precision, Macro-Recall and Macro-F1 [23] on Dataset I, averaged over all queries, are reported in Figure 8. For each individual query in Dataset II, we report these measures separately in Table 2. As shown, the Ancestor-Assistant strategy for training data selection achieves the highest performance. There is about 11.3%, 3.4%, 13.4% and 15.7% improvement on Dataset I, and about 53.3%, 52.4%, 43.1% and 60.0% improvement on Dataset II, over the hierarchical strategy on the measures of Micro-F1, Macro-Precision, Macro-Recall and Macro-F1, respectively.

[Figure 8: Dataset I: different strategies for training data selection (flat, hierarchical, Ancestor-Assistant), measured by Micro-F1, Macro-Precision, Macro-Recall and Macro-F1.]

[Figure 9: Dataset I: different classifiers (standard NB vs. discriminative NB), measured by Micro-F1, Macro-Precision, Macro-Recall and Macro-F1.]

There is about 21.3%, 10.3%, 19.5% and 24.7% improvement on Dataset I, and about 14.1%, 14.3%, 68.8% and 67.7% improvement on Dataset II, over the flat strategy on these measures. The performance of the proposed Ancestor-Assistant strategy improves significantly over both the hierarchical and flat strategies. This validates the first hypothesis: the Ancestor-Assistant strategy, which employs training data from both the category candidate itself and its ancestors and siblings, and converts hierarchical classification into a flat mode, is the best. One can also see that the flat strategy is not stable. When the training data from the category candidate itself are very insufficient, its performance is very poor, especially on Macro-Recall; but when they are sufficient, it achieves performance comparable to the Ancestor-Assistant strategy. We attribute the low performance of the hierarchical strategy, which is similar to what is adopted in some existing work [8, 5, 15, 3], to the following two factors: The top-down scheme accumulates the error rates at each level, which gradually reach an unbearable amount at some deep level of the hierarchy. This problem is overcome in our flat and Ancestor-Assistant strategies, where classification is performed in a flat mode.

[Table 2: Dataset II: different strategies for training data selection. Per-query Micro-F1, Macro-Precision, Macro-Recall and Macro-F1 for the flat, hierarchical and Ancestor-Assistant strategies on the queries ajax, apple, dell, jaguar, java, saturn, subway, trec and ups, with averages; the numeric values are not recoverable from this copy.]

[Table 3: Dataset II: different classifiers. Per-query Micro-F1, Macro-Precision, Macro-Recall and Macro-F1 for the standard and discriminative naïve Bayesian classifiers on the same queries, with averages; the numeric values are not recoverable from this copy.]

In TAPER [3], the lengths of all paths from the root to the leaves are equal, but this does not hold in general. The probability estimated by this strategy is a product of different numbers of conditional probabilities, which necessarily favors leaves at shallow levels.

6.3.2 Standard vs. Discriminative Naïve Bayesian Classifier
In this section, we conduct experiments to validate the second hypothesis. Since the Ancestor-Assistant strategy has been validated as the best, we only report the subsequent experiments with this strategy. We compared the performance of the standard naïve Bayesian classifier and the proposed discriminative naïve Bayesian classifier. The results are shown in Figure 9 for Dataset I and Table 3 for Dataset II. The proposed discriminative naïve Bayesian classifier achieves about 2.8%, 2.3%, 3.9% and 4.0% improvement on Dataset I, and about 7.3%, 28.8%, 25.6% and 24.0% improvement on Dataset II, over the standard naïve Bayesian classifier on the measures of Micro-F1, Macro-Precision, Macro-Recall and Macro-F1, respectively. We attribute the higher performance of the discriminative naïve Bayesian classifier to the following observation: when two categories deep in the hierarchy need to be distinguished and they also have a common ancestor deep in the hierarchy (e.g., categories 874 and 902 in Figure 3 and their common ancestor 24), the two categories share many common words in their main content and have only a few discriminative words capable of distinguishing them.
In such a situation, in the standard naïve Bayesian classifier the contribution of these discriminative words is shrunk by a large denominator, while the discriminative naïve Bayesian classifier assigns higher weights to them. The experimental results thus also validate the second hypothesis: the discriminative naïve Bayesian classifier outperforms the standard one.

Naïve Bayesian Classifier vs. SVM

We also compared the naïve Bayesian classifiers with the more sophisticated SVM on Dataset I to validate our third hypothesis. We use the LIBSVM package [4] for the SVM implementation; it employs the one-against-rest strategy to support multi-class classification. We failed to observe statistically significant improvements by SVM, but the time used by SVM was about 20 times the time used by the naïve Bayesian classifiers. The time recorded for SVM included the time to fetch training examples in vector form via disk I/O, model training, and classification. For each query, we cleared the memory cache before classification, so the time for the naïve Bayesian classifiers also included the I/O cost of fetching the saved δ_ij; in other words, this is the maximum time needed by the naïve Bayesian classifiers. If the time spent waiting for responses from the ODP and the search engine is excluded, the time used by the discriminative naïve Bayesian classifier is on average less than 1 second, including any optional I/O cost, which makes it suitable for online application.

Significance Test

We performed one-tailed paired t-tests to check whether the improvements mentioned above are statistically significant. The p-values are shown in Table 4.

[Table 4: P-values of t-tests — for each dataset (I and II) and each pair (dis vs. std, aa vs. hie, aa vs. flat), the p-values on Micro-F1, Macro-Precision, Macro-Recall and Macro-F1]

7. RELATED WORK

In this section, we discuss the main research related to our problem, including search result categorization, classification on hierarchical categories, Web directory based applications, and classification models.

7.1 Search Result Categorization and Clustering

Although it is a retrieval system, TAPER [3] also included search result classification. The classification was performed off-line rather than at query time. The standard naïve Bayesian classifier combined with the hierarchical strategy for training data selection proposed in Section 3.2 is very similar to the classification model employed in TAPER [3]. Search result classification algorithms have been proposed in [5, 8], which correspond to the shallow classification algorithms in this work. Dumais and Chen [8] developed a system called SWISH in which the search results were automatically categorized based on the top two levels of the LookSmart directory, and Web pages in the same category were clustered together.
As described in their work, the category organization brings significant advantages for Web users: participants liked the category interface much better than the list interface, and they were 50% faster at finding information that was organized into categories. In [13], researchers used the top-level ODP categories as the training examples. For the end user, the most notable consequence of using a classification technique is the quality of the category names. Because the documents are classified into an existing taxonomy, the class names are predefined and can be carefully selected to convey the intended meanings optimally. Thus the naming problem associated with clustering methods is avoided. However, the classification scheme may be too coarse for the given data set, resulting in a top-level categorization where all data items are placed in one or two classes. Clustering is a way to show the results in more detailed topics, and many algorithms for search result clustering have been proposed, e.g. Vivisimo. As mentioned in [11], besides the lack of quality of the resulting clusters, the disadvantages include the lack of predictability of the clustering results and the diverse mix of the obtained sub-cluster hierarchies. This problem exists even in state-of-the-art search engine systems such as Vivisimo.

7.2 Classification on Hierarchical Categories

Several other studies have investigated the problem of classification over hierarchical taxonomies [2, 10, 21, 24]. Most of their findings were over testing data sets that numbered only in the hundreds, or at most a few thousand, categories. Liu et al. [15] conducted a large-scale analysis on the entire Yahoo! categories to show the performance of classification on a large-scale taxonomy. As stated in that paper, there are about 246,279 categories in the Yahoo! Directory. The performance of classification on the top-level categories is about 72% on the Micro-F1 measure. However, when classifying documents into deeper categories, the performance decreases quickly.
As shown in [15], the performance is about 30% lower on the Micro-F1 measure at the 4th level and deeper. Directly building large-scale taxonomy classifiers therefore cannot solve this problem, due to the poor classification performance.

7.3 Web Directory Application

Web directories, such as Yahoo!, the Open Directory Project (ODP), Gimpsy and others, ask human editors to review Web sites for inclusion in the directories. The most notable Web directory is the Open Directory Project, which is one of the largest efforts to manually annotate Web pages. Over 70,430 editors are engaged in keeping the directory reasonably up to date. As a result, the ODP now provides access to over 4 million Web pages in its catalogs. These Web directories are organized in a tree structure, and each category is labeled with an understandable name. The ODP has been successfully utilized in much related work, such as classification [8, 5, 18], personalized Web search [6, 14], and evaluation [9, 1].

7.4 Classifier Model

The naïve Bayesian classifier has been used for a few decades. Several works (e.g. [7]) have tried to explain its success, although the naïve independence assumption usually does not hold. Although the names share the word discriminative, our discriminative naïve Bayesian classifier is different from the discriminative models discussed in [19], which compared discriminative and generative classifiers; our method focuses on discriminative features rather than on the classifier itself. The reversed naïve Bayesian classifier mentioned in [12] is very similar to our discriminative naïve Bayesian classifier, but the event models are different, and we also propose a discrimination-based explanation that was not presented in [12].

In that paper, the author claimed that the reversed version and the original version are the same under the assumption that each category has the same number of training examples, which is not true in our situation.

8. CONCLUSIONS AND FUTURE WORK

The existing algorithms for search result classification are based on shallow levels of topic hierarchies; therefore, they are too coarse to provide the information needed for user browsing. Classification algorithms on large and deep hierarchies can provide much more detailed categories, but have traditionally suffered from low classification performance. In this paper, we proposed a novel algorithm known as Deep Classifier to classify search results into deep topic hierarchies in an adaptive manner. By pruning the original hierarchy to a reasonable size while preserving its hierarchical structure and depth, we introduced different strategies for collecting sufficient training data in order to build reliable classifiers. Discriminative naïve Bayesian classifiers are employed for efficiency and effectiveness. Experimental results showed that the Ancestor-Assistant strategy achieves the highest performance compared to the other strategies, while the discriminative naïve Bayesian classifier achieves higher performance than the standard naïve Bayesian classifier. Furthermore, the search results are presented in a deep hierarchical style, which provides more informative class labels for user browsing.

Considering that using the full page content rather than only the snippet may further improve performance, in the future we will verify the performance on the whole content of Web pages instead of snippets. As mentioned in Section 6, the number of top-level categories corresponding to a query is usually less than five. We omitted queries with no categories returned from the ODP, due to the limited coverage of the human-labeled Web directory. In such a situation, the method in this paper will have difficulties, and further research is required to handle it.
For example, query expansion may be a solution to this. At present, we simply employ a flat classifier on all top-level categories to classify the search results for such queries.

9. REFERENCES

[1] S. M. Beitzel, E. C. Jensen, A. Chowdhury, D. Grossman and O. Frieder. Using manually-built web directories for automatic evaluation of known-item retrieval. In Proc. of SIGIR, Toronto, Canada.
[2] L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. In Proc. of CIKM, pages 78-87.
[3] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan. Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies. VLDB Journal, 7(3).
[4] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[5] H. Chen and S. T. Dumais. Bringing order to the web: Automatically categorizing search results. In Proc. of CHI '00, August 2000.
[6] P. Chirita, W. Nejdl, R. Paiu and C. Kohlschuetter. Using ODP metadata to personalize search. In Proc. of SIGIR, Salvador, August.
[7] P. Domingos and M. J. Pazzani. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29(2-3).
[8] S. T. Dumais and H. Chen. Hierarchical classification of web content. In Proc. of SIGIR, August.
[9] P. Ganesan, H. Garcia-Molina and J. Widom. Exploiting hierarchical domain structure to compute similarity. Technical report, Stanford Computer Science Dept., June.
[10] M. Granitzer. Hierarchical text classification using methods from machine learning. Master's Thesis, Graz University of Technology.
[11] M. Hearst. Clustering versus faceted categories for information exploration. Communications of the ACM, 49(4):59-61.
[12] A. Juan and H. Ney. Reversing and smoothing the multinomial naive Bayes text classifier. In Proc. of PRIS, Alicante, Spain.
[13] B. Kules and B. Shneiderman. Categorized graphical overviews for web search results: An exploratory study using U.S. government agencies as a meaningful and stable structure. Technical report HCIL, CS-TR-4715, UMIACS-TR, ISR-TR.
[14] F. Liu, C. Yu and W. Meng. A personalized web search by mapping user queries to categories. In Proc. of CIKM.
[15] T.-Y. Liu, Y.-M. Yang, H. Wan, H.-J. Zeng, Z. Chen and W.-Y. Ma. Support vector machines classification with a very large-scale taxonomy. SIGKDD Explorations, 7(1):36-43.
[16] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.
[17] T. M. Mitchell. Machine Learning. McGraw Hill.
[18] D. Mladenic. Turning Yahoo into an automatic web page classifier. In Proc. of ECAI, Brighton, UK.
[19] A. Ng and M. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In NIPS 14, Cambridge, MA: MIT Press.
[20] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7).
[21] A. Sun and E. Lim. Hierarchical text classification and evaluation. In Proc. of ICDM.
[22] A. S. Tanenbaum. Modern Operating Systems (second edition), Section 4.4.4. New Jersey: Prentice-Hall.
[23] Y. Yang. An evaluation of statistical approaches to text categorization. Journal of Information Retrieval, 1(1/2):67-88.
[24] Y. Yang, J. Zhang and B. Kisiel. A scalability analysis of classifiers in text categorization. In Proc. of SIGIR.

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status Internatonal Journal of Appled Busness and Informaton Systems ISSN: 2597-8993 Vol 1, No 2, September 2017, pp. 6-12 6 Implementaton Naïve Bayes Algorthm for Student Classfcaton Based on Graduaton Status

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Query classification using topic models and support vector machine

Query classification using topic models and support vector machine Query classfcaton usng topc models and support vector machne Deu-Thu Le Unversty of Trento, Italy deuthu.le@ds.untn.t Raffaella Bernard Unversty of Trento, Italy bernard@ds.untn.t Abstract Ths paper descrbes

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

A User Selection Method in Advertising System

A User Selection Method in Advertising System Int. J. Communcatons, etwork and System Scences, 2010, 3, 54-58 do:10.4236/jcns.2010.31007 Publshed Onlne January 2010 (http://www.scrp.org/journal/jcns/). A User Selecton Method n Advertsng System Shy

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Impact of a New Attribute Extraction Algorithm on Web Page Classification Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17th European Symposium on Computer Aided Process Engineering ESCAPE17 V. Plesu and P.S. Agachi (Editors) 2007 Elsevier B.V. All rights reserved.

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching Zhiqiang Hou hou_zhiq@sohu.com Chongzhao Han czhan@mail.xjtu.edu.cn Lin Zheng Abstract: A fast visual tracking algorithm based on circle pixels matching

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers

Investigating the Performance of Naïve-Bayes Classifiers and K-Nearest Neighbor Classifiers Journal of Convergence Information Technology Volume 5, Number 2, April 2010 Mohammed J. Islam *, Q. M. Jonathan Wu,

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database Hyunki Kim, Su-Shing Chen Computer and Information Science Engineering Department, University of Florida, Gainesville, Florida 32611, USA With the explosion

Web-supported Matching and Classification of Business Opportunities

Web-supported Matching and Classification of Business Opportunities DIRO Université de Montréal C.P. 628, succursale Centre-ville Montréal, Québec, H3C 3J7, Canada Jing Bai, François Paradis,2, Jian-Yun Nie {baijing, paradifr,

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) Levente Buttyán Problem formulation Alice wants to obtain information from a database, but she does not want the database to learn which information she wanted e.g., Alice is an investor querying a stock-market

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example University of British Columbia CPSC, Intro to Computation Jan-Apr Tamara Munzner News Assignment corrections to ASCIIArtiste.java posted definitely read WebCT bboards Arrays Lecture, Tue Feb based on slides by Kurt

Audio Content Classification Method Research Based on Two-step Strategy

Audio Content Classification Method Research Based on Two-step Strategy (IJACSA) International Journal of Advanced Computer Science and Applications, Sumei Liang Department of Computer Science and Technology Chongqing

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision Javier Civera, University of Zaragoza Andrew J. Davison, Imperial College London J.M.M. Montiel, University of Zaragoza. josemari@unizar.es, jcivera@unizar.es,

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communications, 2015, 3, 88-93 Published Online May 2015 in SciRes. http://www.scirp.org/journal/jcc http://dx.doi.org/10.4236/jcc.2015.350 Lin Lu Lin Lu, School

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Guihong Cao Stephen Robertson Jian-Yun Nie Dept. of Computer Science and Operations Research Microsoft Research at Cambridge Dept. of Computer

Fast Feature Value Searching for Face Detection

Fast Feature Value Searching for Face Detection Vol., No. 2 Computer and Information Science Yunyang Yan Department of Computer Engineering Huaiyin Institute of Technology Huai an 22300, China E-mail: areyyyke@163.com

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Jiannan Wang Guoliang Li Jianhua Feng Department of Computer Science and Technology, Tsinghua National Laboratory for Information

Improvement of Spatial Resolution Using Block-Matching Based Motion Estimation and Frame Integration

Improvement of Spatial Resolution Using Block-Matching Based Motion Estimation and Frame Integration Danya Suga and Takayuki Hamamoto Graduate School of Engineering, Tokyo University of Science, 6-3-1, Niijuku, Katsushika-ku,

Problem Set 3 Solutions

Problem Set 3 Solutions Introduction to Algorithms October 4, 2002 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik Demaine and Shafi Goldwasser Handout 14 (Exercises were not to be turned in,

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA-MANSO AND P. GIL-JIMENEZ Departamento de Teoría
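For reference, the classic (unmodified) median filter that this paper builds on replaces each sample with the median of its neighborhood, so isolated impulse outliers rarely survive. A 1-D stdlib sketch (the paper's SVM-based modification is not reproduced here; the function name is illustrative):

```python
import statistics

def median_filter_1d(signal, window=3):
    """Plain sliding-window median filter; edges are padded by
    replicating the first and last samples."""
    half = window // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    # Each output sample is the median of its window in the padded signal.
    return [statistics.median(padded[i:i + window])
            for i in range(len(signal))]
```

A single impulse spike in an otherwise flat signal is removed entirely, which is exactly the behavior that motivates median (rather than mean) filtering for impulse noise.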

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Science (5): 8-, 7 ISSN 59-66 7 Science Publications Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahliyyah Amman University,

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Noraswaliza Abdullah, Yue Xu, Shlomo Geva, and Mark Looi Discipline of Computer Science Faculty of Science and Technology Queensland

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17th European Symposium on Computer Aided Process Engineering ESCAPE17 V. Plesu and P.S. Agachi (Editors) 2007 Elsevier B.V. All rights reserved.

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorina Purcaru Faculty of Automation, Computers and Electronics University of Craiova 13 Al. I. Cuza Street, Craiova RO-1100 ROMANIA E-mail: dpurcaru@electronics.ucv.ro
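The k-nearest neighbor rule named in this title can be stated in a few lines: classify a query point by majority vote among its k closest training samples. A toy Python version (the names and the squared-Euclidean metric are assumptions for illustration, not taken from the paper):

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """k-NN rule: majority vote among the k nearest training samples.
    train: list of (feature_vector, label) pairs."""
    # Squared Euclidean distance; the square root does not change ordering.
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda fl: dist(fl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With two well-separated groups of training points, queries near each group receive that group's label.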

Support Vector Machines

Support Vector Machines Decision surface is a hyperplane (line in 2D) in feature space (similar to the Perceptron) Arguably, the most important recent discovery in machine learning In a nutshell: map the data to a predetermined

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Chris Fleizach Department of Computer Science and Engineering University of California, San Diego San Diego, CA 92093 cfleizac@cs.ucsd.edu Satoru

Learning to Classify Documents with Only a Small Positive Training Set

Learning to Classify Documents with Only a Small Positive Training Set Xiao-Li Li 1, Bing Liu 2, and See-Kiong Ng 1 1 Institute for Infocomm Research, Heng Mui Keng Terrace, 119613, Singapore 2 Department of Computer

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. ISSN : 0974-7435 Volume 10 Issue BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ 10() 2014 [684-689] Review on China's sports industry financing market based on market-oriented

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms Chapter 4: Trees BST Text: Read Weiss, 4.3 Izmir University of Economics 1 The Search Tree ADT Binary Search Trees An important application of binary trees is in searching. Let us assume
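The binary search tree ADT this excerpt introduces supports insertion and search by comparing keys and descending left or right, so each comparison discards an entire subtree. A minimal Python sketch (illustrative only; a CE 221 course would typically use C++):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert key into a BST; smaller keys go left, larger go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                      # duplicate keys are ignored

def contains(root, key):
    """Search: one comparison per level discards a whole subtree."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False
```

On a reasonably balanced tree this gives O(log n) search, which is the point of the BST application discussed in the chapter.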

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Chapter 5. More efficient methods of numerical solution Euler's method is quite inefficient, because the error is essentially proportional to the
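The excerpt's claim that Euler's method error is essentially proportional to the step size is easy to check numerically: halving the step size should roughly halve the error. A small Python experiment (the test equation y' = -y is my choice for illustration, not taken from the course notes):

```python
import math

def euler(f, y0, t0, t1, steps):
    """Fixed-step Euler method for the initial value problem y' = f(t, y)."""
    h, t, y = (t1 - t0) / steps, t0, y0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y

# y' = -y with y(0) = 1 has the exact solution y(t) = e^{-t}.
f = lambda t, y: -y
err_coarse = abs(euler(f, 1.0, 0.0, 1.0, 100) - math.exp(-1))
err_fine = abs(euler(f, 1.0, 0.0, 1.0, 200) - math.exp(-1))
# First-order method: doubling the number of steps roughly halves the error.
```

The ratio err_coarse / err_fine comes out close to 2, consistent with a first-order (error proportional to h) method.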

A Novel Term_Class Relevance Measure for Text Categorization

A Novel Term_Class Relevance Measure for Text Categorization D S Guru, Mahamad Suhil Department of Studies in Computer Science, University of Mysore, Mysore, India Abstract: In this paper, we introduce a new measure

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Science and Information Engineering Tamkang University 5 Ying-chuan

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Vassil Halatchev Department of Electrical Engineering and Computer Science York University, Toronto October 8, 2015 Outline Why it is important Introduction to Association Rule Mining

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Giorgio Valentini e-mail: valentini@dsi.unimi.it Clustering of Microarray Data. Clustering of gene expression profiles (rows) => discovery of co-regulated and functionally
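Agglomerative hierarchical clustering as described here — merging co-expressed profiles bottom-up — can be sketched with single linkage on scalar expression values. A deliberately tiny illustration (not the microarray pipeline from the slides; single linkage and the 1-D distance are assumptions):

```python
def single_linkage(points, k):
    """Agglomerative clustering: start with each point as its own
    cluster, repeatedly merge the two closest clusters (single
    linkage = minimum pairwise distance) until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-linkage distance between clusters i and j.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters
```

Cutting the merge process at k clusters corresponds to cutting the dendrogram at a given height, which is how co-regulated groups are read off in expression analysis.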

Synthesizer 1.0. User's Guide. A Varying Coefficient Meta-Analytic Tool. Z. Krizan. Employing Microsoft Excel 2007

Synthesizer 1.0 A Varying Coefficient Meta-Analytic Tool Employing Microsoft Excel 2007 User's Guide Z. Krizan 2009 Table of Contents 1. Introduction and Acknowledgments 3. Operational Functions

Course Introduction. Algorithm. 8/31/2017. COSC 320 Advanced Data Structures and Algorithms

Course Introduction Course Topics Exams, Labs, Projects A quick look at a few algorithms 1 Advanced Data Structures and Algorithms Description: We are going to discuss algorithm complexity analysis, algorithm design techniques

Chapter 6 Programming the finite element method I now turn to the main subject of this book: the implementation of the finite element algorithm in computer programs. In order to make my discussion as straightforward

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design Spring 2014 Register Allocation Sample Exercises and Solutions Prof. Pedro C. Diniz USC / Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina del Rey, California 90292 pedro@isi.edu Register

Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier

Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier Saket S.R. Mengle Information Retrieval Lab Computer Science Department Illinois Institute of Technology Chicago, Illinois, U.S.A

Parallel Implementation of Classification Algorithms Based on Cloud Computing Environment

Parallel Implementation of Classification Algorithms Based on Cloud Computing Environment TELKOMNIKA, Vol.10, No.5, September 2012, pp. 1087~1092 e-ISSN: 2087-278X accredited by DGHE (DIKTI), Decree No: 51/Dikti/Kep/2010 1087

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Dengsheng Zhang, Md Monirul Islam, Guojun Lu and Jin Hou 2 Gippsland School of Information Technology, Monash University Churchill, VIC 3842, Australia E-mail:

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Pseudocode is a way to describe how to accomplish tasks using basic steps like those a computer might perform. In this week's lab, you'll see how a form of pseudocode can be

The Codesign Challenge

The Codesign Challenge ECE 4530 Fall 2007 Hardware/Software Codesign Objectives In the codesign challenge, your task is to accelerate a given software reference implementation as fast as possible.

Data Mining: Model Evaluation

Data Mining: Model Evaluation April 16, 2013 1 Issues: Evaluating Classification Methods Accuracy classifier accuracy: predicting class label predictor accuracy: guessing value of predicted attributes Speed time to construct
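Classifier accuracy as defined in this excerpt — how well the model predicts the class label — is simply the fraction of predictions that match the true labels. A minimal Python helper (hypothetical names, for illustration):

```python
def accuracy(y_true, y_pred):
    """Classifier accuracy: fraction of predicted class labels
    that match the true labels."""
    assert len(y_true) == len(y_pred), "label lists must align"
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

Accuracy is only one of the evaluation criteria the excerpt lists; speed (time to construct and apply the model) is assessed separately.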

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes Chapter : Algorithm Design How should we present algorithms? Natural languages like English, Spanish, or French which are rich in interpretation and meaning are not

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks Jie Lu School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 jielu@cs.cmu.edu Jamie Callan School of Computer

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Hakan S. KUTOGLU, Turkey Key words: Coordinate systems; transformation; estimation, reliability. SUMMARY Advances in technologies and

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedings of 2012 4th International Conference on Machine Learning and Computing IPCSIT vol. 25 (2012) IACSIT Press, Singapore Hai-Li Liang 1+, Xian-Min