

Challenges and Solutions for Synthesis of Knowledge Regarding Collaborative Filtering Algorithms

Daniel Lowd, Oliver Godde, Matthew McLaughlin, Shuzhen Nong, Yun Wang, and Jonathan L. Herlocker
School of Electrical Engineering and Computer Science
Oregon State University
102 Dearborn Hall
Corvallis, OR
{,, mclaughm, nong, wangyun,

Abstract

Collaborative filtering (CF)-based recommender systems predict what items a user will like or find useful based on the recommendations (active or implicit) of other members of a networked community. In spite of more than ten years of research, there is little consensus on state-of-the-art knowledge regarding CF predictive algorithms. There are many barriers to synthesis of the significant quantity of available published research on CF algorithms. We present results from an empirical study that attempts synthesis on popular CF algorithms and use this study to illustrate some key challenges to synthesis in CF algorithm research. In response to these challenges, we propose the development of publicly maintained reference implementations of proposed CF algorithms and empirical evaluation procedures, and we introduce CoFE, a public software framework with the goal of jumpstarting the building of these reference implementations. Finally, we demonstrate how CoFE was used to implement a high-performance nearest-neighbor-based algorithm that scales to arbitrary numbers of users.

1 Introduction

"Read any good books lately?" Every day, people ask each other questions such as this in an attempt to sort through the plethora of options that both enrich and afflict modern living. In a world with vastly more books available than any of us has time to look at, much less read, we all need some way to decide among them. By passing along recommendations to friends with similar taste, people distribute the work of finding good books in order to spend more time reading books they enjoy. Unfortunately, not all of our friends share our tastes, limiting the number of useful counselors available.
Furthermore, those friends who do share our tastes probably haven't read every book we might like, and can only furnish recommendations for the few they know. Of course, the problem is more general than finding good books: people must make decisions about a great many things, including movies, restaurants, web sites, house plants, tropical resorts, and so on. How can we get better recommendations on such an ever-widening assortment of options?

Daniel Lowd is now at the Department of Computer Science and Engineering, University of Washington, Seattle, WA, 98195, lowd@cs.washington.edu. Oliver Godde can be reached at ogodde@yahoo.com.

1.1 Brief Introduction to Collaborative Filtering & Recommender Systems

Collaborative filtering-based recommender systems address precisely this problem by drawing upon the experiences of thousands or even millions of people. For example, a book recommendation web site using collaborative filtering technology could combine the ratings of millions of online users to give better and broader book recommendations than any of the users could get from friends. The work of finding good books is thus distributed, as in a circle of friends, but on a much larger scale. Amazon.com is one well-recognized example of a site that uses collaborative filtering in this manner.

In addition to helping users find items of interest, collaborative filtering has proven to benefit e-commerce retailers as well. Amazon.com reported that many more sales result from items recommended by collaborative filtering than from those shown on bestseller or featured items lists [15]. Another success story found that collaborative-filtering-based recommendation e-mails generated twice as many purchases as manual recommendations [27]. These systems are just two of the many recommender systems, commercial and academic, that have been developed using collaborative filtering. Other systems have included MovieLens and NetFlix.com for recommending movies, PHOAKS for recommending websites, Ringo for recommending music, Jester for recommending jokes, and many more [7,9,25,26].

1.2 Important Terminology Used in this Paper

Recommender systems are a relatively new area of study and standardized vocabulary is only beginning to emerge. Here we briefly provide our definitions for the terminology used in this paper.

Resnick et al. describe a recommender system as follows: "In a typical recommender system people provide recommendations as inputs which the system then aggregates and directs to appropriate recipients." [20] The most common technology used to implement recommender systems is collaborative filtering, which we may refer to as CF for short. Recommender systems may incorporate non-CF technology.
However, in this article, we focus exclusively on CF technology.

We refer to anything recommended by a recommender system as an item. This term could refer not only to books and movies, but also to restaurants, web pages, New Year's resolutions, and so on. The type of items being recommended and the context in which they are recommended is known as the content domain of recommendation (e.g. web-based book recommendation domain).

A user is an individual who interacts with a recommender system, providing the system with ratings in order to receive recommendations or predictions.

Ratings are statements of preference by users for items. At a minimum, a rating consists of three elements: a user, an item, and a rating value. The rating value may be binary, integer-valued, real-valued, or even unary (a single positive rating value, but no negative or ambivalent rating values). For integer- and real-valued ratings, low numbers generally indicate negative preference (the item was bad), middle numbers indicate ambivalence (the item was neither good nor bad), and high numbers indicate positive preference (the item was great!).

A prediction or predicted rating is a recommender system's estimate of the rating value that a user would assign to an item. We refer to a recommendation as an item with a high predicted

rating for a user that is recommended to the user. Recommendations are often called best bets.

A collaborative filtering algorithm is a procedure that examines ratings data from users, infers preference patterns among many users and many items, and computes a predicted rating for a given user (termed the active user) on a given item (termed the active item). A ranked list of recommended items can be generated as well, by predicting ratings for all items and listing the N items whose predicted ratings are the highest (see footnote 1). The accuracy of a CF algorithm is defined as how close an algorithm's predicted ratings are to the true ratings supplied by users.

Footnote 1: Since computing the predicted rating of every item is usually too computationally intensive, scientists have developed algorithms that approximate the process, generating a set of items that have predicted ratings above a threshold (e.g., predict items that are likely to be good but may not necessarily be the best recommendations).

Explicit ratings are evaluations that are directly entered by users; for example, a user's rating of 1-5 stars for an item at Amazon.com is an explicit rating. Implicit ratings, in contrast, are indications of preference that are derived from other user behavior, such as purchasing certain items from a catalog or visiting specific web sites. Algorithms that work with implicit user ratings generally operate differently to take into account a very different quality of information. For this study, we chose to focus solely on explicit ratings and algorithms designed to operate on them.

A dataset is a collection of preference ratings data from a community of users on a set of items in a particular target domain. The most popular publicly available datasets involve ratings for movies and videos: the EachMovie dataset [16], and the MovieLens dataset [1].

A metric is a computation applied to the output of a collaborative filtering algorithm to provide an evaluation of the quality of the collaborative filtering algorithm.

1.3 The Challenge of Finding the Best CF Algorithm

There have been many different collaborative filtering algorithms proposed to compute predictions of users' ratings. One algorithm will be more effective than the others, given specific circumstances. Given a choice, you would always want to use the most effective algorithm possible, since that ought to result in a better user experience, more e-commerce sales, less time wasted browsing through irrelevant items, and so on. For just this reason, a more effective algorithm has become the most popular research goal among collaborative filtering researchers in recent years. How one best determines if an algorithm is more effective is still open to debate, but most researchers use an algorithm's average accuracy when tested on an existing database of user ratings. Others may look at execution time and memory requirements as well.

So what are the best CF algorithms? With almost ten years of published scientific research on the development and evaluation of CF algorithms, we would expect to have solid recorded knowledge about which algorithms are best for which content domains. An examination of the published research indicates that we are far from that goal. What we find are many reports of empirical studies that are hard to generalize beyond the context of their published study. We find many published articles introducing new CF algorithms that follow the same template. The template begins by proposing an algorithm and then claiming experimental results showing its

empirical superiority over one or two baseline algorithms. Unfortunately, we find it hard to evaluate the strength of such results due to variations in the experimental procedures, different datasets, and different algorithm implementations used to evaluate those algorithms.

New work and new methodology are required in CF algorithms research to bring us closer to the goal of understanding what algorithms are best for which domains. Rather than continue to propose new algorithms using methodologies that inhibit cross-comparison, we need research that seeks to synthesize the diversity of work that has been done before into a coherent picture that can be directly applied by practitioners seeking to implement or employ recommender systems using collaborative filtering.

1.4 Contributions of this Article

This article presents some initial results of our attempt to unify the knowledge regarding the accuracy of collaborative filtering recommendation algorithms. In particular, our contributions are:

1. A specification of challenges faced by scientists attempting to synthesize the existing published research on collaborative filtering algorithms. These challenges are presented in Section 2.
2. A case study representing an attempt to synthesize existing published work on CF algorithm accuracy. This study consists of an empirical comparison of a collection of proposed collaborative filtering algorithms that previously have only been examined individually against non-comparable baseline algorithms. This case study in Section 3 is used to illustrate the challenges that we introduce. The case study itself has several key contributions to further research on CF algorithms:
   a. Evidence that contradicts previous accuracy claims for certain algorithms.
   b. Evidence showing that nearest neighbor algorithms (in particular, the Item-Item algorithm) are most accurate at predicting rating values on multi-valued rating data of entertainment.
   c. Enhancements to several existing algorithms that significantly improved accuracy in our experiments.
      Most notable is an adaptation of the Bayesian network approach suggested by [4] that uses normalized user ratings rather than discrete rating classes.
3. Proposals for specific research methodologies and research infrastructure that would enable future research to better face the previously described challenges, enabling more generically usable scientific results. These proposals are discussed in Section 4 of this article.
4. An infrastructure that we have developed and made freely available to the public as a first step towards facing the challenges, including
   a. A portable and highly extensible software framework for investigating collaborative filtering algorithms, enabling rapid design and evaluation of new algorithms. This software framework represents the first step towards collaborative filtering research infrastructure that enables more effective future CF research.

   b. Source code for reference implementations of a collection of collaborative filtering algorithms that have been proposed by prominent CF researchers. Most of the algorithms have been tuned to ensure that they can perform at least as well as their original creators claimed.
   This infrastructure is introduced in Section 5.
5. Finally, a high-performance, production-capable recommendation engine, built on top of the CF software framework. Supporting well-known nearest-neighbor methods, this engine can generate hundreds of recommendation lists per second, given millions of user ratings. The source code for this engine is freely available. This software, introduced in Section 5.1, should greatly increase the availability of collaborative filtering technology to all software developers.

2 Challenges to Synthesis

Greater scientific advances are possible when individual research contributions can be synthesized into a broader understanding through objective comparison and third-party evaluation. In this section, we introduce characteristics of collaborative filtering algorithms research that present challenges to achieving such an understanding. The list of challenges that we have itemized in this section is not intended to be complete. Rather, these are the challenges that we have found frequently obstruct our attempts at synthesis. In the following section, Section 3, we illustrate examples of these challenges in an empirical case study.

2.1 Challenge 1: Different Datasets

Different datasets have different characteristics that can significantly affect the outcome of empirical analysis of collaborative filtering algorithms. For example, datasets may have different numbers of ratings per user and thus different amounts of training data, or they may have different granularity of ratings: one dataset may include ten levels of preference, while the other may only include five. When two groups of researchers use entirely different datasets, it is difficult to synthesize their results into any form of greater understanding.
Proprietary datasets (ones that have not been released to the public) present additional challenges because scientists not affiliated with the original researchers are unable to reproduce or extend results without access to the data.

Even when researchers use the same, publicly available dataset, their results may be dangerous to compare. CF researchers often want to run hundreds or thousands of different CF algorithm variants against the dataset. Yet with very large datasets, this can take an unacceptable amount of time, so they create smaller subsets of the whole dataset. Scientists have taken a variety of approaches (or none at all) to ensure that the subsets are representative of the whole. The additional variance creates additional uncertainty when trying to synthesize results achieved on different subsets of the same dataset. For example, some researchers sample only users with a minimum number of rated items to ensure that learning algorithms have enough information to work with for each user [5,9]. Others may randomly sample users without regard for number of ratings.
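To make the effect of subset selection concrete, the following Python sketch contrasts the two sampling strategies just mentioned. It is an illustration only: the function names, the dict-of-dicts layout (user to {item: rating}), and the toy data are assumptions made for this example and are not taken from any published protocol.

```python
import random

def users_with_min_ratings(ratings, min_ratings):
    """Keep only users who have rated at least `min_ratings` items."""
    return {u: r for u, r in ratings.items() if len(r) >= min_ratings}

def sample_users(ratings, n_users, seed=0):
    """Pick `n_users` users uniformly at random, ignoring how much they rated."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(ratings), n_users)
    return {u: ratings[u] for u in chosen}

def mean_ratings_per_user(ratings):
    """Average number of ratings per user: one simple measure of density."""
    return sum(len(r) for r in ratings.values()) / len(ratings)

# Hypothetical toy data: three users with 1, 2, and 5 ratings.
toy = {
    "u1": {"a": 4},
    "u2": {"a": 3, "b": 5},
    "u3": {"a": 2, "b": 4, "c": 1, "d": 5, "e": 3},
}
dense_subset = users_with_min_ratings(toy, min_ratings=2)
random_subset = sample_users(toy, n_users=2)
```

Filtering by a minimum rating count raises the average number of ratings per user in the subset, so an algorithm evaluated on such a subset sees more training data per user than it would on a uniformly sampled one.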

2.2 Challenge 2: Different Evaluation Metrics

There are many different metrics that can be applied (for a more complete discussion of CF metrics, see [10]); however, for the purposes of this article, we consider only collaborative filtering accuracy metrics. Examples of accuracy metrics include mean absolute error [9], precision and recall [23], and the rank half-life metric [4].

When two experiments use different empirical evaluation metrics, the results are very challenging to synthesize. For example, one experiment may report that algorithm A has a mean absolute error of 0.7, while the other experiment may report a precision of 70%. How do these two algorithms compare? We cannot say based on this information from the two experiments; synthesis is not possible.

In particular, we can see this problem with ranked list evaluation of CF algorithms, where there is no emerging standard evaluation metric. Ranked list evaluation metrics attempt to measure the effectiveness of an algorithm at producing a useful list of top recommendations ranked by likely relevance or interest to the user. This contrasts with mean absolute error, which measures overall prediction error. As an example, different ranked list metrics have been used by Breese et al. [4], Karypis et al. [13], Sarwar et al. [21], and Schein [24]. In this study, we limited our experiments to measuring prediction error; we leave ranked list evaluation for future work.

2.3 Challenge 3: Different Experimental Protocols

An experimental protocol includes the procedures used to train an algorithm to learn preferences, and the exact procedure to apply the evaluation metric. At a high level, experimental protocols for analysis of offline data (data previously collected) all follow a common procedure. This procedure can roughly be described as withhold-and-predict. A ratings dataset is broken into two subsets, the learning set and the test set. The learning set is fed into the collaborative filtering algorithm as training data, which then predicts ratings or makes recommendations for items not in the training set.
The accuracy of those predictions or recommendations is evaluated based on the available ratings in the test set.

Aside from this basic organization, there are many ways that experimental protocol can vary that can inhibit synthesis. These variances include:

a. Treatment of recommendations for which test ratings are not available. If the test protocol involves having an algorithm generate a top-N best recommendations list for each test user, then the algorithm may recommend an item for which we have no rating in the test set. This issue can be handled in several ways. Most commonly the experimental protocol will simply evaluate the top N recommended items for which there are test ratings. However, another approach is to assume a default negative rating.

b. Treatment of missing or low confidence predictions. Some algorithms are unable to produce predictions or recommendations for items when insufficient ratings data for those items is available. Several methods have been applied. The most obvious method is to ignore failed predictions, as done by [3,9]. Alternatively, the set of test ratings can be restricted to only those that all algorithms could predict. Or a less accurate, less personalized algorithm, such as the average rating for an item, can be used to predict in situations where the primary algorithm fails to generate a prediction.
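The withhold-and-predict procedure, and two of the treatments of failed predictions listed in item (b), can be sketched in Python as follows. This is a minimal illustration under stated assumptions: ratings are (user, item, value) tuples, a predictor is any function returning a number or None, and all function names are hypothetical rather than part of any existing framework.

```python
import random
from collections import defaultdict

def split_ratings(ratings, test_fraction=0.2, seed=0):
    """Randomly withhold a fraction of (user, item, value) ratings as the test set."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (learning set, test set)

def item_means(train):
    """Mean rating per item, computed from the learning set only."""
    totals, counts = defaultdict(float), defaultdict(int)
    for _, item, value in train:
        totals[item] += value
        counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}

def mean_absolute_error(predict, test, fallback=None):
    """Return (MAE, coverage) of `predict` over the test set.

    `predict(user, item)` returns a number, or None if it cannot predict.
    Failed predictions are ignored unless a `fallback` predictor is given,
    in which case the fallback supplies the prediction instead.
    """
    errors = []
    for user, item, actual in test:
        p = predict(user, item)
        if p is None and fallback is not None:
            p = fallback(user, item)
        if p is not None:
            errors.append(abs(p - actual))
    mae = sum(errors) / len(errors) if errors else float("nan")
    coverage = len(errors) / len(test) if test else 0.0
    return mae, coverage
```

Reporting coverage alongside the error makes the protocol choice visible: the same predictor can show a lower error when its failed predictions are ignored than when a cruder fallback is substituted for them.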

2.4 Challenge 4: Variance in Algorithm Implementation

The last of the four challenges is that when different scientists implement what they believe to be the same algorithm, the implementations commonly provide different results. This variance can occur for many reasons, including:

a. Different interpretation of algorithm details. In certain research publications, such as conference proceedings, the need for brevity almost guarantees that there will be insufficient space to describe all the details required to completely specify how to implement a particular algorithm. Thus, scientists trying to re-implement a previously published algorithm often will apply their own interpretation of how details should be handled.

b. Algorithm tuning. Some algorithms have many parameters; adjusting the parameters can cause the algorithm to respond differently to particular inputs. Each scientist may tune the algorithm to meet a different need using a different set of values for controlling parameters. Furthermore, scientists rarely publish in detail how algorithm parameters were tuned.

c. Errors in the code implementing the algorithm. It is very hard to detect errors in implementations of collaborative filtering algorithms unless the errors cause the algorithm to generate results that are highly improbable.

d. Application of algorithm enhancements. Clean and simple abstract representations of algorithms communicate the best and are more readily accepted under peer review. Yet real world success often requires that these clean and simple algorithms be embellished, often with heuristics that cannot be justified theoretically. These enhancements are often not discussed in published work, yet certain enhancements are required to produce the optimal accuracy.

Variance in algorithm implementation is a considerable problem due to the dynamics of peer review. Scientific peer review culture rewards researchers who present new algorithms that outperform existing algorithms by some criteria.
In order to gain acceptance for their new algorithm, researchers must implement one or more of the previously existing algorithms, and then show that their new algorithm outperforms them. These researchers are highly motivated to ensure that their new algorithm has no errors, and that it has the best enhancements applied. However, they have less incentive to ensure that the implementations of the competing algorithms are optimally implemented.

3 Case Study: An Empirical Comparison of Popular Algorithms

In spite of the aforementioned challenges (and at first, in some ignorance of them), we set out to synthesize 10 years of recorded knowledge about collaborative filtering (CF) algorithms through empirical experimentation. This consisted of comparing the accuracy of many proposed CF algorithms in a common controlled experimental setup. The primary research questions of this activity were as follows:

   Could we replicate the good performance claimed by the authors of each algorithm?
   Could we replicate the claims of relative performance made by authors of each algorithm?

   Could we establish a global ranking of algorithm quality, with respect to mean absolute error?

We evaluated a set of algorithms on two subsets of the EachMovie dataset (one with increased sparsity) as well as the Jester joke dataset. Our metric of evaluation was mean absolute error. In this section, we describe our experiment, reporting both our empirical results and the examples of the challenges we observed.

3.1 Descriptions of Algorithms Evaluated

We chose to evaluate primarily algorithms that are frequently cited in the literature, specifically algorithms that work on explicit ratings data, as well as a few additional algorithms we constructed. In some cases, we made minor modifications to existing algorithms in order to improve performance. For the purposes of repeatability, we describe all the tested algorithms in some detail here. A summary is provided in Table 1.

Code   Name                          Implementation Reference            Modified
MIR    Mean Item Rating              [9] (as "average")                  No
AMUR   Adjusted Mean User Rating     [9] (as "bias-from-mean average")   No
AMIR   Adjusted Mean Item Rating     New                                 -
CORR   Pearson r Correlation         [9]                                 No
VSIM   Vector Similarity             [4]                                 Yes
HORT   Horting                       [3]                                 Yes
ITEM   Item-Item                     [23]                                No
BC     Bayesian Clustering           [4]                                 No
BN     Bayesian Network              [4]                                 No
CBN    Continuous Bayesian Network   New                                 -
PD     Personality Diagnosis         [18]                                No

Table 1: Summary of all algorithms included in study

Notation

In order to more precisely describe the operation of some of the algorithms, we define here a certain amount of mathematical notation for use in this section. In this paper, $U$ represents the set of all users and $I$ represents the set of all items. Let $U_i$ be the subset of $U$ consisting of all users who have rated item $i$. Analogously, let $I_u$ be the subset of $I$ consisting of the items rated by user $u$. Let $r_{u,i}$ be the rating of user $u$ on item $i$, if known. In this

document, the variables $i$ and $j$ refer to items and $u$ and $v$ refer to users. Let $\bar{r}_u$ be user $u$'s mean rating, and let $\bar{r}_i$ be the mean rating for item $i$. The algorithms we investigated all compute a predicted rating for a user $u$ on an item $i$, which we refer to as $p_{u,i}$. In this context, $u$ and $i$ may be termed the active user and active item, respectively.

Non-personalized Algorithms

One of the simplest recommendation techniques is to recommend those items that are most popular. We refer to such algorithms as non-personalized, since their predictions reflect the preferences of the entire user set more than those of the active user. The recommendations of these algorithms are analogous to the New York Times bestsellers list or weekly box office statistics. Since these algorithms are so simple and straightforward, we expect that more complicated algorithms should at least match their performance in predicting individual ratings.

Mean Item Rating (MIR)

The simplest algorithm we implemented uses the mean rating of the active item for its prediction, independent of which user is the active user:

$$p_{u,i} = \bar{r}_i$$

This algorithm has been used as a baseline by Breese et al. [1998], Herlocker et al. [1999], Goldberg et al. [2000], and others, sometimes under the name POP, short for popularity. We refer to this algorithm as Mean Item Rating to distinguish it from other non-personalized algorithms.

Adjusted Mean User Rating (AMUR)

In examining different CF rating data sets, we have found that each user has a different distribution of ratings across possible rating values. For example, 80% of one user's ratings may have the value 4, while 75% of another user's ratings may have the value 3. One possible explanation for these variations in rating distribution is that the two users described may have had different perceptions of the rating scale. For example, one user's rating of 4 may indicate the same underlying preference as another user's rating of 3. We can account for this by using the offsets from each user's mean rating rather than their raw ratings. Herlocker et al.
[1999] found that averaging these offsets and adding the active user's mean rating produced more accurate predictions than Mean Item Rating:

$$p_{u,i} = \bar{r}_u + \frac{\sum_{v \in U_i} (r_{v,i} - \bar{r}_v)}{|U_i|}$$

We refer to this as a normalization of ratings even though it is not a true normalization in the statistical sense (see footnote 2).

Footnote 2: That would require also dividing by the standard deviation of each user's rating distribution, yielding a standardized distribution with unit variance. However, adjusting for differences in the width of a distribution (the standard deviation) has not been shown to significantly improve prediction accuracy [Herlocker et al. 1999].

Algorithms that use this kind of normalization assume that each user's mean rating represents a neutral preference, and that set amounts above or below that mean represent the same preference for all users. One can think of examples where this is not the case: for

example, if some users only rate items they like, then a user's mean rating could be a poor indication of neutral preference.

Adjusted Mean Item Rating (AMIR)

An alternate normalization technique is to use mean item ratings rather than mean user ratings. By taking the Adjusted Mean User Rating algorithm and swapping users for items, we obtain a new algorithm with predictions generated by the following formula:

$$p_{u,i} = \bar{r}_i + \frac{\sum_{j \in I_u} (r_{u,j} - \bar{r}_j)}{|I_u|}$$

This algorithm assumes that each user rates all items some constant amount above or below those items' mean ratings. For example, in this model, one user might rate every item 1 higher than its average, while another might rate every item 1 below its average. Thus, all users are still assumed to have the same overall preferences, though their individual rating scales may differ.

Nearest Neighbor Algorithms

The first algorithms used in collaborative filtering systems were nearest neighbor algorithms. With one exception, algorithms of this class generate predictions by first computing the similarity of the active user to each potential neighbor and then computing a weighted average of the most similar neighbors' ratings for the active item. The underlying theory is that users who have rated items similarly in the past are likely to do so in the future. These algorithms are all analogous to asking like-minded friends for item recommendations. The one exception to this is the Item-Item algorithm, which forms a neighborhood of items rather than users, but is otherwise quite similar to the user-based algorithms [23].

Some researchers have referred to this class of algorithms as memory-based, because many of them require that all ratings be kept in memory in order to compute predictions [4,12,18].
However, as we show in Section 6, this is not always the case: variants of the Pearson r Correlation algorithm using sampling can be shown to be almost as effective as the original while using only a fraction of the memory.

Pearson r Correlation (CORR)

The Pearson r Correlation algorithm was used in some of the earliest collaborative filtering systems [19,25], yet it remains a popular baseline algorithm today, since it is easy to implement and fairly effective. In this algorithm, Pearson's r correlation coefficient is used to define the similarity of two users based on their ratings for common items:

$$sim(u,v) = \frac{\sum_{i \in I_u \cap I_v} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sigma_u \sigma_v}$$

$\sigma_u$ and $\sigma_v$ represent the standard deviations of the ratings of users $u$ and $v$, respectively. Both the rating averages ($\bar{r}_u$, $\bar{r}_v$) and standard deviations are taken over just the common items rated

by both users. In order to achieve the best possible implementation, we have used the modification suggested by Herlocker et al. [1999], which weights similarities by the number of item ratings in common between $u$ and $v$ when that number is less than some threshold parameter $\gamma$:

$$sim'(u,v) = sim(u,v) \cdot \frac{\min(|I_u \cap I_v|, \gamma)}{\gamma}$$

This adjustment avoids overestimating the similarity of users who happen to have rated a few items identically, but may not have similar overall preferences. Such correlations may be high, but due to the limited amount of data, we have little confidence in them. The adjusted similarity weights are used to select a neighborhood $V \subseteq U_i$, consisting of the $k$ users most similar to $u$ who have rated item $i$. If fewer than $k$ users have positive similarity to $u$, then only those users with positive similarity are used. The ratings of these neighbors are combined into a prediction as follows:

$$p_{u,i} = \bar{r}_u + \frac{\sum_{v \in V} sim'(u,v)\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in V} sim'(u,v)}$$

Vector Similarity (VSIM)

The Vector Similarity algorithm considers each user's set of ratings as a vector and uses the cosine of the angle between two users' ratings vectors as a measure of their similarity [4]. More precisely,

$$sim(u,v) = \frac{\sum_{i \in I_u \cap I_v} r_{u,i}\, r_{v,i}}{\sqrt{\sum_{i \in I_u} r_{u,i}^2}\,\sqrt{\sum_{i \in I_v} r_{v,i}^2}}$$

As in Pearson r Correlation, a neighborhood $V$ is formed consisting of the $k$ users most similar to the active user that have rated the active item. (Breese et al. [1998] did not limit the number of neighbors, but we found this step to be very helpful.) A prediction is then computed as follows:

$$p_{u,i} = \frac{\sum_{v \in V} sim(u,v)\, r_{v,i}}{\sum_{v \in V} sim(u,v)}$$

Breese et al. [1998] also proposed adjusting the similarity weight computation so that agreement about infrequently rated items would contribute more to two users' similarity than agreement about frequently rated items. However, we did not find this modification to be helpful, so we did not include it in our experiments.
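The user-based similarity and prediction steps described in this section can be sketched in Python as follows. This is an illustration under several assumptions, not the authors' implementation: ratings are stored as a dict of dicts (user to {item: rating}); the correlation is the standard Pearson coefficient over co-rated items, with the count normalization folded into the variance terms; the weighting factor is written as min(n, gamma)/gamma so that only pairs with fewer than gamma co-rated items are down-weighted; each neighbor's mean in the prediction is taken over all of that neighbor's ratings; and all function names and default parameter values are hypothetical.

```python
import math

def pearson(ru, rv):
    """Pearson r over items rated by both users; returns (r, number of common items)."""
    common = ru.keys() & rv.keys()
    n = len(common)
    if n < 2:
        return 0.0, n
    mu = sum(ru[i] for i in common) / n
    mv = sum(rv[i] for i in common) / n
    cov = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    var_u = sum((ru[i] - mu) ** 2 for i in common)
    var_v = sum((rv[i] - mv) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0, n  # correlation undefined for constant ratings
    return cov / math.sqrt(var_u * var_v), n

def weighted_sim(ru, rv, gamma):
    """Down-weight correlations based on fewer than `gamma` common items."""
    sim, n = pearson(ru, rv)
    return sim * min(n, gamma) / gamma

def cosine(ru, rv):
    """Vector similarity: cosine of the angle between two rating vectors."""
    dot = sum(ru[i] * rv[i] for i in ru.keys() & rv.keys())
    norm_u = math.sqrt(sum(x * x for x in ru.values()))
    norm_v = math.sqrt(sum(x * x for x in rv.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_rating(ratings, u, item, k=30, gamma=50):
    """Mean-offset prediction from the k most similar positively correlated raters."""
    ru = ratings[u]
    mean_u = sum(ru.values()) / len(ru)
    scored = []
    for v, rv in ratings.items():
        if v != u and item in rv:
            s = weighted_sim(ru, rv, gamma)
            if s > 0:
                scored.append((s, v))
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    if not neighbors:
        return None  # no usable neighbor: the algorithm cannot predict
    num = den = 0.0
    for s, v in neighbors:
        rv = ratings[v]
        mean_v = sum(rv.values()) / len(rv)
        num += s * (rv[item] - mean_v)
        den += s
    return mean_u + num / den
```

Returning None when no neighbor qualifies leaves the treatment of failed predictions to the evaluation protocol. Predictions are not clipped to the rating scale in this sketch, so a value slightly outside the scale is possible.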

Horting (HORT)

One weakness Pearson r Correlation and Vector Similarity share is that in order for two users to be considered similar, they must have rated items in common. If only a few users have rated a given item, none of whom has much in common with the active user, then Pearson r Correlation and Vector Similarity might both be unable to produce a prediction. In theory, two users who rate items similarly to a third are likely to have similar taste, even though they may have rated no items in common. The Horting algorithm recognizes that indirect similarity by allowing neighbors to be acquired transitively [3].

In the Horting algorithm, each user is represented by a node in a directed graph, where a link from user $u$ to user $v$ means that user $v$ predicts user $u$. Also stored with each link are two integers, $s \in \{-1, +1\}$ and $t \in \mathbb{Z}$. These variables specify a linear transformation $L_{s,t}(r) = sr + t$ to normalize the target user's ratings with respect to the originating user's ratings. This allows users who rate items consistently higher, lower, or opposite of each other to predict each other. (Note that on a 0 to $n$ rating scale, all useful values of $t$ will actually lie between $-2n$ and $+2n$, since those offsets are sufficient to convert a minimal rating to a maximal rating, or vice versa, even when $s = -1$.) In practice, we did not find negative transformations ($s = -1$) to be helpful, so we did not use them.

Adjacencies in this directed graph are determined by two threshold requirements. The first establishes that the target user has rated a representative sample of the items rated by the originating user. The original authors called this requirement horting, a new word derived from cohorts, specific to this algorithm. User $u$ is said to hort another user $v$ if either $v$ has rated some fraction $\alpha$ of the items rated by $u$, or if $v$ has rated at least $\beta$ of the items rated by $u$. $\alpha$ and $\beta$ are both algorithm parameters. Mathematically, user $u$ horts user $v$ if $|I_u \cap I_v| / |I_u| \ge \alpha$ or $|I_u \cap I_v| \ge \beta$.
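The horting test and the linear transformation are simple enough to state directly in code. The Python sketch below is illustrative only: the function names are hypothetical, each user is reduced to the set of item identifiers they have rated, and the parameter values in the usage lines are chosen purely for demonstration.

```python
def transform(s, t, r):
    """Linear rating transformation L_{s,t}(r) = s*r + t."""
    return s * r + t

def horts(items_u, items_v, alpha, beta):
    """True if user u horts user v.

    `items_u` and `items_v` are the sets of items each user has rated.
    u horts v when v has rated at least a fraction `alpha` of u's items,
    or at least `beta` of them in absolute terms.
    """
    if not items_u:
        return False
    common = len(items_u & items_v)
    return common / len(items_u) >= alpha or common >= beta

# u rated 10 items; v rated the same 10 plus 100 others.
items_u = set(range(10))
items_v = set(range(110))
print(horts(items_u, items_v, alpha=0.2, beta=20))  # True
print(horts(items_v, items_u, alpha=0.2, beta=20))  # False
```

The two calls give different answers because the fraction is taken relative to the first user's item set: v covers all of u's items, while u covers only 10 of v's 110 and also falls short of the absolute count.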
Note that this is not symmetric: if user $u$ has rated 10 items, user $v$ has rated those same 10 items plus 100 more, $\alpha = 0.2$ and $\beta = 20$, then $u$ horts $v$ but $v$ does not hort $u$, since $u$ has not rated a sufficient sample of the items rated by $v$.

The second threshold establishes that the target and originating user tend to rate items similarly, after taking into account different rating scales via the linear transformation $L_{s,t}(r)$. The prediction error $e$ between two users is the average absolute difference of their common ratings. More precisely,

$$e(u, v, s, t) = \frac{\sum_{i \in I_u \cap I_v} |r_{u,i} - L_{s,t}(r_{v,i})|}{|I_u \cap I_v|}$$

If there exist $s$ and $t$ such that $e(u, v, s, t) < \delta$ for some prediction error parameter $\delta$ and user $u$ horts user $v$, then user $v$ is said to predict user $u$. This means that there is a link from user $u$'s node to user $v$'s with the variables $s$ and $t$ set to minimize $e(u, v, s, t)$. These optimal values for $s$ and $t$ can be found by calculating the prediction error $e$ for each possible value of $s$ and $t$.

The predicted rating for user $u$ on item $i$ is computed by searching through the graph at each distance level $l = 1, \ldots, k$ and determining if there is at least one user in the graph within distance $l$ of the user $u$ that has rated item $i$. The predicted rating, $p_{u,i}$, is the average transformed rating given by all users at distance $l$ who have rated item $i$, for minimum distance $l$. For users more

than one step away, transforms are composed. If no user at a distance less than or equal to k from the active user has rated item i, then no prediction can be computed. This method should tend to use ratings from better predictors if possible, but will use worse ones as necessary.

On some datasets, we found that accuracy could be improved by adding two additional parameters: m, the minimum number of neighbors required, and M, the maximum number of neighbors allowed. Here, neighbors are those users whose ratings are aggregated to compute a prediction for a given item. While traversing the graph to make a prediction, if fewer than m neighbors have been found at a distance level of l or less, the algorithm will continue searching at the next level. Note that in this case, the neighbors whose transformed ratings are averaged to produce a prediction could come from two or more different levels. Once M neighbors have been found, the algorithm will average the transformed ratings of those M neighbors and terminate. Neighbors with lower prediction errors were considered first, to ensure that the best M predictors were used. If m is greater than 1, then these M predictors could be distributed over two or more distance levels. Note that the modified algorithm is equivalent to the original when m = 1 and M = ∞.

Item-Item (ITEM)

Each of the nearest-neighbor algorithms discussed so far finds users who have rated the active item and are similar to the active user. An alternate approach is to find items rated by the active user that are similar to the active item. Sarwar et al. [23] proposed several different algorithms that used similarities between items, rather than users, to compute predictions. These algorithms all assume that the active user's ratings for items related to the active item are a good indication of the active user's preference for the active item. Of the algorithms proposed by Sarwar et al., we only implemented adjusted cosine similarity, the algorithm Sarwar et al.
[23] found to be most accurate; here we refer to it as Item-Item, since it is the only algorithm we tested that computes similarities between items. In this algorithm, the cosine of the angle between the item rating vectors is computed, after adjusting each rating by subtracting the rating user's mean rating. Specifically, with $U_i$ denoting the set of users who have rated item i and $\bar{r}_u$ the mean rating of user u,

$$\mathrm{sim}(i, j) = \frac{\sum_{u \in U_i \cap U_j} (r_{u,i} - \bar{r}_u)(r_{u,j} - \bar{r}_u)}{\sqrt{\sum_{v \in U_i} (r_{v,i} - \bar{r}_v)^2}\, \sqrt{\sum_{w \in U_j} (r_{w,j} - \bar{r}_w)^2}}$$

Note that, unlike Pearson r Correlation, means are taken over all ratings for a user or item, not a subset of ratings shared with any other user or item. We found it helpful to adjust similarity weights based on the number of users in common, if the number of common users was below a certain threshold γ:

$$\mathrm{sim}'(i, j) = \mathrm{sim}(i, j) \cdot \frac{\min(\gamma, |U_i \cap U_j|)}{\gamma}$$
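The similarity computation above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the implementation used in the study: the nested-dictionary layout of `ratings`, the function name `adjusted_cosine`, the choice to return 0 when a norm is zero, and the min-based damping factor for item pairs with fewer than `gamma` common users are all assumptions.

```python
from math import sqrt


def adjusted_cosine(ratings, i, j, gamma=None):
    """Adjusted cosine similarity between items i and j.

    ratings: dict user -> dict item -> rating (hypothetical layout).
    gamma: optional threshold on the number of common users.
    """
    # Each user's mean is taken over all of that user's ratings.
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items() if r}
    users_i = [u for u, r in ratings.items() if i in r]
    users_j = [u for u, r in ratings.items() if j in r]
    common = [u for u in users_i if j in ratings[u]]

    num = sum((ratings[u][i] - means[u]) * (ratings[u][j] - means[u]) for u in common)
    norm_i = sqrt(sum((ratings[u][i] - means[u]) ** 2 for u in users_i))
    norm_j = sqrt(sum((ratings[u][j] - means[u]) ** 2 for u in users_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # assumption: an undefined similarity is treated as 0
    sim = num / (norm_i * norm_j)
    if gamma:
        # Assumed damping: shrink similarities supported by few common users.
        sim *= min(gamma, len(common)) / gamma
    return sim


ratings = {
    "a": {"x": 5, "y": 4, "z": 1},
    "b": {"x": 4, "y": 5, "z": 2},
    "c": {"x": 1, "y": 2, "z": 5},
}
print(adjusted_cosine(ratings, "x", "y"))           # positive: rated alike
print(adjusted_cosine(ratings, "x", "z"))           # negative: rated oppositely
print(adjusted_cosine(ratings, "x", "y", gamma=6))  # half the undamped value
```

In this toy data every pair of items has three users in common, so a threshold of 6 halves each similarity.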

The predicted rating for a given user u and item i is computed using a neighborhood of items $J \subseteq I_u$ consisting of the k items rated by u that are most similar to i. If there are fewer than k items with positive similarity to i, then just those are used.

$$p_{u,i} = \bar{r}_i + \frac{\sum_{j \in J} \mathrm{sim}'(i, j)\,(r_{u,j} - \bar{r}_j)}{\sum_{j \in J} \mathrm{sim}'(i, j)}$$

Probabilistic Algorithms

An alternate approach to the nearest neighbor methods is to learn a probabilistic model of the data, and use this model to predict ratings. Probabilistic algorithms tend to have more direct mathematical justification than nearest neighbor methods, given their assumptions of user behavior. Probabilistic algorithms have also been referred to as model-based algorithms [4,12,18], but we prefer the term probabilistic algorithms. Nearest neighbor methods, such as the Horting algorithm, may build models as well, if only to represent neighbors.

Bayesian Clustering (BC)

Breese et al. [1998] proposed a simple probabilistic model for collaborative filtering, based on the assumption that there are distinct groups of users, each with fairly homogeneous taste throughout. For example, types of users who watch movies might include those who love action movies, those who love romantic comedies, those who love art films, and so on. Using machine learning methods, these different user groups can be learned automatically from the data. Then, in order to predict the rating for a particular user on a particular movie, we could simply average each user group's mean rating for that movie, weighted by the probability that this particular user is a member of that group.

We implemented the proposed Bayesian clustering algorithm as a naïve Bayes classifier, where each item rating is conditionally independent given user class, a hidden variable representing the user's preference type.³ In this model, we store each probability that a user of a given class will assign a given rating to a given item. With the application of Bayes' rule, these probabilities are also sufficient to determine the probability that a user is a member of a given class.
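To make the naïve Bayes reading concrete, the sketch below computes class membership probabilities with Bayes' rule and then a class-weighted expected rating. It is only an illustration: the table layout `cond[c][item][rating]`, the explicit class prior `prior`, the function names, and the fallback to the prior when every joint probability is zero are assumptions, and the numbers are invented for the example.

```python
def class_posterior(user_ratings, prior, cond):
    """P(class | observed ratings) under the naive Bayes assumption.

    user_ratings: dict item -> rating for the active user.
    prior: list with P(class = c) for each class c (assumed to be stored).
    cond: cond[c][item][rating] = P(rating | class = c, item).
    """
    joint = []
    for c, p_c in enumerate(prior):
        p = p_c
        for item, rating in user_ratings.items():
            p *= cond[c][item][rating]  # ratings are independent given the class
        joint.append(p)
    total = sum(joint)
    if total == 0:
        return list(prior)  # assumption: fall back to the prior
    return [p / total for p in joint]


def predict(user_ratings, item, prior, cond):
    """Average of each class's expected rating, weighted by class membership."""
    post = class_posterior(user_ratings, prior, cond)
    expected = [sum(r * p for r, p in cond[c][item].items()) for c in range(len(prior))]
    return sum(w * e for w, e in zip(post, expected))


# Two invented classes and a two-value rating scale {1, 5}.
prior = [0.5, 0.5]
cond = [
    {"action": {1: 0.1, 5: 0.9}, "romance": {1: 0.8, 5: 0.2}},
    {"action": {1: 0.7, 5: 0.3}, "romance": {1: 0.2, 5: 0.8}},
]
user = {"action": 5}
print(class_posterior(user, prior, cond))     # approximately [0.75, 0.25]
print(predict(user, "romance", prior, cond))  # approximately 2.4
```

With many rated items the product of probabilities can underflow, so a practical version would likely work with sums of logarithms instead.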
Note that this is only one of several probabilistic clustering models that have been proposed for collaborative filtering; for other models, see [27] and [11,12].

These probabilities are learned from the training data using a gradient ascent approach with a fixed number of iterations. First, the model is randomly initialized. Then, in each iteration, each user is assigned to the most probable class based on previously rated items. Since the membership of each user class may have changed, the user class probability distributions must be recomputed. Of course, once the user class probability distributions have changed, some users may no longer be in their most probable class, so the process repeats. The predicted rating for an item is the average of the expected values for each preference class, weighted by the probability that the active user is a member of that class.

³ See [Mitchell 1997] for a more thorough explanation of naïve Bayes classifiers and training through gradient ascent.
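The training loop described above (random initialization, reassignment of each user to the most probable class, recomputation of the class distributions) might look roughly like the following. This is a hedged sketch, not the authors' code: the add-one smoothing, the fixed seed, the data layout, and the helper names `estimate`, `most_probable_class`, and `train` are assumptions introduced for illustration.

```python
import random


def estimate(ratings, assign, n_classes, items, values):
    """Recompute class priors and P(rating | class, item) from hard assignments."""
    # Assumption: add-one smoothing keeps every probability non-zero.
    counts = [{i: {v: 1.0 for v in values} for i in items} for _ in range(n_classes)]
    sizes = [1.0] * n_classes
    for user, c in assign.items():
        sizes[c] += 1
        for item, value in ratings[user].items():
            counts[c][item][value] += 1
    prior = [s / sum(sizes) for s in sizes]
    cond = [
        {i: {v: n / sum(dist.values()) for v, n in dist.items()} for i, dist in cls.items()}
        for cls in counts
    ]
    return prior, cond


def most_probable_class(user_ratings, prior, cond):
    """Class with the largest prior times likelihood for this user's ratings."""
    def score(c):
        p = prior[c]
        for item, value in user_ratings.items():
            p *= cond[c][item][value]
        return p
    return max(range(len(prior)), key=score)


def train(ratings, n_classes, items, values, iterations=20, seed=0):
    """Alternate between re-estimating the model and reassigning users."""
    rng = random.Random(seed)
    assign = {u: rng.randrange(n_classes) for u in ratings}  # random initialization
    for _ in range(iterations):
        prior, cond = estimate(ratings, assign, n_classes, items, values)
        assign = {u: most_probable_class(r, prior, cond) for u, r in ratings.items()}
    prior, cond = estimate(ratings, assign, n_classes, items, values)
    return prior, cond, assign


ratings = {
    "a": {"x": 5, "y": 1}, "b": {"x": 5, "y": 1},
    "c": {"x": 1, "y": 5}, "d": {"x": 1, "y": 5},
}
prior, cond, assign = train(ratings, 2, ["x", "y"], [1, 5])
print(assign)
```

Because the loop uses hard assignments and a random start, different seeds can end in different clusterings, including degenerate ones where a class is empty.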

Bayesian Network (BN)

Breese et al. [1998] proposed using Bayesian networks for collaborative filtering. Each node in this model is a categorical variable representing an item, whose states cover every legal rating and No Rating. The inclusion of a No Rating state allows the model to be learned with complete data even if no user has rated every item. The probability distribution for each item is modeled by a decision tree. We used Microsoft Research's WinMine Toolkit to build all of our Bayesian networks [6].

To generate a prediction, all nodes in the network are instantiated with the ratings, or lack thereof, for the active user. The probability of the No Rating state is clamped to zero for the item in question, and the probability distribution over all legal ratings is generated using Markov-blanket inference [4]. In Markov-blanket inference, the probability that a given variable has a given state depends on its parents (variables in the given variable's decision tree), its children (variables in whose decision trees the given variable appears), and its children's parents. The predicted rating is the resulting expected value.

Unlike all previously discussed algorithms, the Bayesian Network algorithm directly assumes that a missing rating (marked by the No Rating state) is an indication of preference (negative preference in this case). This is an interesting approach, with some logic behind it: the fact that you haven't watched a certain movie, for example, could indicate that you wouldn't be interested in watching other, similar movies. This algorithm makes some additional assumptions regarding how user preferences may be effectively modeled: it assumes that each different rating for an item represents a distinct preference class (e.g., the algorithm doesn't know that 4 is closer to 5 than 1 is), that a user's rating for any given item depends only on that user's ratings of a few specific items in the dataset, and that this dependence can be effectively represented using decision trees.

Continuous Bayesian Network (CBN)

One of the weaknesses of the Bayesian Network algorithm used by Breese et al.
[1998] is that it treats the training data as classification examples rather than numerical ratings. The decision trees it builds depend on having many training examples with identical ratings of several items in order to build the probability distribution at each leaf. It is difficult, however, to find many users who have given three or four items identical rating values, and thus most of the splits in the decision trees are on the No Rating state (about 97% in models built from our EachMovie dataset, described in Section 3.2). In other words, users' predictions are based largely on what they choose to rate, ignoring most of the actual ratings given. While the resulting model may be interesting to analyze, since it shows many simple relationships between items, it fails to take advantage of much of the information in the original data.

An alternative approach is to represent each item rating as a numerical, not categorical, variable in the network. To do this, we represented each item's rating as a binary Gaussian variable, either having the value No Rating, or a real number representing an offset from a user's mean rating. The model is trained not on the raw ratings themselves (i.e., $r_{u,i}$), but on each rating's offset from its user's mean rating (i.e., $r_{u,i} - \bar{r}_u$). This algorithm has two advantages: first, it works with normalized ratings rather than raw ratings, to take into account differences between users' rating distributions; second, it treats ratings as interrelated numbers, rather than distinct classes. We call this modified algorithm Continuous Bayesian Network, since it closely

resembles the Bayesian Network algorithm in assumptions and implementation, but represents user ratings as continuous variables. Using this revised algorithm on the EachMovie dataset, we found that fewer than 55% of the decision tree splits were on No Rating.

Personality Diagnosis (PD)

The Personality Diagnosis algorithm works on the assumption that the active user has the same true preferences as some other user, though the observed ratings may differ by Gaussian noise [18]. Unique to this algorithm is the idea that users have true ratings for each movie, with differing observed ratings due to temporary moods and impulses. This algorithm is also unique in that it uses both a probabilistic approach and a nearest-neighbor framework: though it never computes a neighborhood directly, it does compute similarities and perform a weighted average over all ratings for the item. We include it among other probabilistic algorithms because of the methods it uses for computing the similarity.

The similarity between the active user u and some neighbor v is the probability that u's true ratings are identical to v's observed ratings. This is fairly straightforward to compute given the assumption that observed ratings differ from true ratings according to Gaussian noise with some variance $\sigma^2$. To predict a rating for the active user on the active item, the probability of each valid rating value is computed by summing the probabilities of all users who have given that rating value to the active item. The predicted rating value is the one with the highest probability, not the expected value, as in other probabilistic approaches.

Other Algorithms

There are algorithms we did not fully implement and investigate. Some were omitted due to time constraints, others because they claimed no improvement in predictive accuracy on explicit ratings data. These algorithms include RecTree [5], Dependency Networks [8], Eigentaste [7], Singular Value Decomposition [22], and Probabilistic Latent Semantic Analysis [12].
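Returning to Personality Diagnosis, the prediction step described above can be sketched as follows. This is an assumption-laden illustration rather than the published algorithm: the unnormalized Gaussian likelihood over co-rated items, the default `sigma`, the data layout, and the name `pd_predict` are choices made for the example, and each neighbor's probability is added only to the exact rating value that neighbor gave, following the description in this section.

```python
from math import exp


def pd_predict(ratings, active, item, values, sigma=1.0):
    """Most probable rating value for `active` on `item`, or None.

    ratings: dict user -> dict item -> rating (hypothetical layout).
    values: the valid rating values; neighbor ratings are assumed to be among them.
    """
    scores = {x: 0.0 for x in values}
    for v, r_v in ratings.items():
        if v == active or item not in r_v:
            continue
        # Likelihood, up to a constant factor, that the active user's true
        # ratings equal v's observed ratings on the items both have rated.
        sq = sum((ratings[active][j] - r_v[j]) ** 2 for j in ratings[active] if j in r_v)
        scores[r_v[item]] += exp(-sq / (2 * sigma ** 2))
    if not any(scores.values()):
        return None  # nobody else has rated the item
    return max(scores, key=scores.get)


ratings = {
    "active": {"m1": 5, "m2": 1},
    "v1": {"m1": 5, "m2": 1, "m3": 4},
    "v2": {"m1": 1, "m2": 5, "m3": 2},
    "v3": {"m1": 4, "m2": 2, "m3": 4},
}
print(pd_predict(ratings, "active", "m3", values=[1, 2, 3, 4, 5]))  # 4
```

Returning the highest-scoring value rather than an expected value means the prediction is always one of the discrete rating values.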
3.2 Experimental Methods Used in the Empirical Study

Datasets

We performed our experiments on three different datasets in order to cover some of the variation present in different collaborative filtering systems. The datasets we selected were those that were most available and most commonly used by other researchers. They are summarized in Table 2 and described in further detail in the subsections that follow.

Name  Users  Items  Ratings  Ratings/User  Density  Rating Scale
EachMovie  6,185  1,  ,  0 to 5, integer
Sparse EachMovie  12,144  1,  ,  0 to 5, integer
Jester  17,  ,  -10.00 to 10.00, real

Table 2: Summary of datasets included in our investigation.

EachMovie

One of the largest datasets of explicit user preferences is EachMovie, a movie rating database collected over a period of 18 months by the Compaq corporation [16]. EachMovie contains the ratings of approximately 60,000 users for a set of 1,800 movies, 2.8 million ratings in all, or an average of 46 ratings per user. The rating scale ranges from 0 to 5. Each rating also has a weight associated with it, which indicates whether the user saw the movie or merely clicked a button reading "That looks awful." In our analysis, we only used ratings for movies the user actually saw, which reduced the average number of ratings per user. Finally, we used a subset consisting of all ratings for a random 10% of the users. We used a representative sample of the entire dataset to enable us to analyze and cross-validate more algorithm variants in the timeline of our project with the computation power available.

Sparse EachMovie

To better study the effects of sparsity on each algorithm, we artificially increased the sparsity of a subset of EachMovie data by selecting a random 50% of the ratings of a random 20% of the users in the complete EachMovie dataset. The resulting dataset has approximately the same number of ratings as the first, but spread out over twice as many users, resulting in half the density. As before, only explicit ratings were included.

Jester

The Jester dataset consists of ratings on 100 jokes by almost 18,000 users, over 900,000 ratings in all [7]. The ratings scale goes from -10.00 to 10.00, with increments of 0.01, yielding 200 distinct possible ratings in all. Each joke was rated immediately after being read by the user, using an image map with one extreme representing strong liking and the other strong dislike. Before receiving any recommendations, all users were required to first rate a gauge set of 10 jokes. For a collaborative filtering dataset, this dataset is exceptionally dense. Since it takes fairly little time to read and rate a joke, and since only 100 jokes are available, a user could rate every joke in less than an hour.
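The Sparse EachMovie construction and the notion of density used to compare these datasets can be sketched briefly. The code below is a minimal illustration on synthetic data, not the procedure used in the study: the helper names `sparsify` and `density`, the fixed seed, the rounding of sample sizes, and the toy ratings are assumptions.

```python
import random


def sparsify(ratings, user_fraction, rating_fraction, seed=0):
    """Keep a random fraction of the ratings of a random fraction of the users."""
    rng = random.Random(seed)
    users = sorted(ratings)
    chosen = rng.sample(users, round(len(users) * user_fraction))
    sparse = {}
    for u in chosen:
        items = sorted(ratings[u])
        kept = rng.sample(items, round(len(items) * rating_fraction))
        sparse[u] = {i: ratings[u][i] for i in kept}
    return sparse


def density(ratings, n_items):
    """Fraction of the user-by-item matrix that holds a rating."""
    n_ratings = sum(len(r) for r in ratings.values())
    return n_ratings / (len(ratings) * n_items)


# Synthetic data: 100 users who each rated 20 of 50 items.
full = {u: {i: 3 for i in range(20)} for u in range(100)}
sparse = sparsify(full, user_fraction=0.2, rating_fraction=0.5)
print(len(sparse), density(full, 50), density(sparse, 50))  # 20 0.4 0.2
```

Halving each selected user's ratings halves the density regardless of how many users are sampled, which is the effect the sparse dataset is meant to isolate.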
The average user actually rated about 50 jokes, but 50% density is much higher than the 2.6% density for EachMovie. While importing the ratings into our database, we discovered that over 900 of the ratings were outside the allowed range of -10.00 to 10.00. On the advice of Goldberg [2003], these ratings were removed and not considered in our experiments.

The almost-continuous nature of these ratings could present difficulties for some algorithms. Bayesian Clustering and Bayesian Network both build models that compute the probabilities of a sub-population of users assigning each discrete rating to a given movie. For a dataset with 200 discrete ratings this is impractical: each probability would be very low, and many would be zero. The best way to use these algorithms would be to group the ratings into a smaller number of rating ranges, and then use the algorithms on the ranges rather than the raw ratings. Due to the time required to adapt these algorithms and select the optimal rating ranges, as well as their poor performance on the movie datasets, we chose not to test these algorithms on the Jester dataset.

The Horting and Personality Diagnosis algorithms were also designed to work on discrete data, but were included in the experiment anyway. Specifically, the Horting algorithm uses an integer offset t for normalizing one user's ratings with respect to another's; if we were to let t be

fractional instead, perhaps the increased fineness would improve this algorithm's performance on the dataset. The Personality Diagnosis algorithm computes the probability that a user's rating is each integer, and recommends the integer rating with the highest probability. This method thus introduces artificial coarseness when used on almost continuous data, and might perform better if it returned the expected value instead.

Metrics

Here we briefly describe our choice of evaluation metric. A complete discussion of appropriate metrics for collaborative filtering is beyond the scope of this article. We refer readers to [10] for a substantial discussion of the evaluation of collaborative filtering systems. In this experiment, we applied two variants of the most popular accuracy metric, mean absolute error (MAE). MAE measures how close predicted ratings are to the true ratings. For a set of ratings in the test set, T, MAE is defined as follows:

$$\mathrm{MAE} = \frac{1}{|T|} \sum_{(u, i, r_{u,i}) \in T} \left| p_{u,i} - r_{u,i} \right|$$

There are usually some ratings in T for which a given algorithm is unable to furnish a prediction. For example, when using the Pearson r Correlation predictive algorithm, if none of the users who rated the active item had rated any items in common with the active user, no prediction can be computed. In this situation, most scientists choose to simply remove that prediction from consideration, so that it doesn't affect the MAE of that algorithm; however, in the extremes, this can lead to algorithms that avoid making errors by never predicting unless the evidence is overwhelming. We have assumed in this study that high coverage (the percentage of ratings for which an algorithm can supply a prediction) is important. Thus we require that every algorithm provide a prediction for every item. To achieve this goal, whenever an algorithm cannot produce a prediction (usually because the computation does not consider all the data in the dataset), we instead supply the population average, i.e., the Adjusted Mean User Rating.
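The two ways of scoring an algorithm that sometimes declines to predict can be compared directly. The following is a small sketch with invented numbers: the dictionary layout keyed by (user, item), the use of `None` for a missing prediction, and the function names `mae`, `mae_with_fallback`, and `coverage` are assumptions made for the example.

```python
def mae(predictions, truth):
    """MAE over the test ratings that received a prediction; None if there are none."""
    errors = [abs(p - truth[k]) for k, p in predictions.items() if p is not None]
    return sum(errors) / len(errors) if errors else None


def mae_with_fallback(predictions, truth, fallback):
    """MAE over every test rating, substituting fallback[k] for missing predictions."""
    errors = [
        abs((p if p is not None else fallback[k]) - truth[k])
        for k, p in predictions.items()
    ]
    return sum(errors) / len(errors) if errors else None


def coverage(predictions):
    """Fraction of test ratings for which a prediction was produced."""
    return sum(p is not None for p in predictions.values()) / len(predictions)


truth = {("u1", "a"): 4, ("u1", "b"): 2, ("u2", "a"): 5}
predictions = {("u1", "a"): 3.5, ("u1", "b"): None, ("u2", "a"): 4.0}
fallback = {("u1", "b"): 3.0}  # e.g. a mean-based prediction for the gap

print(mae(predictions, truth))                          # 0.75
print(mae_with_fallback(predictions, truth, fallback))  # about 0.833
print(coverage(predictions))                            # about 0.667
```

In this toy case the fallback prediction is worse than the algorithm's own predictions, so forcing full coverage raises the error from 0.75 to roughly 0.83.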
To distinguish this evaluation approach from the traditional one, we refer to it as Augmented MAE, since it computes the Mean Absolute Error of a given algorithm after extending its coverage with an alternate algorithm. We also implemented the more traditional approach of omitting failed predictions from the average, which we continue to refer to as MAE. In the extreme case, if an algorithm refused to produce any predictions, its Augmented MAE would be equal to that of the Adjusted Mean User Rating algorithm, while its MAE would be undefined.

Another consideration is that we can only evaluate the accuracy for the active user on items for which the active user has provided a rating. Thus, it is possible that the item with the highest predicted rating may not be considered in the evaluation because we do not have the active user's true rating to compare against. This weakness will exist in any offline experiment where users have not rated all items.

Tuning Algorithm Parameters

One of the challenges of comparing so many different algorithms was ensuring that they were performing as well as reasonably possible. This required both correct implementation and careful tuning of parameters for each algorithm. Ensuring correct implementation can be very challenging, since a minor bug or oversight could easily disadvantage an algorithm in a slight,


More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Improving Web Image Search using Meta Re-rankers

Improving Web Image Search using Meta Re-rankers VOLUME-1, ISSUE-V (Aug-Sep 2013) IS NOW AVAILABLE AT: www.dcst.com Improvng Web Image Search usng Meta Re-rankers B.Kavtha 1, N. Suata 2 1 Department of Computer Scence and Engneerng, Chtanya Bharath Insttute

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap Int. Journal of Math. Analyss, Vol. 8, 4, no. 5, 7-7 HIKARI Ltd, www.m-hkar.com http://dx.do.org/.988/jma.4.494 Emprcal Dstrbutons of Parameter Estmates n Bnary Logstc Regresson Usng Bootstrap Anwar Ftranto*

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

Greedy Technique - Definition

Greedy Technique - Definition Greedy Technque Greedy Technque - Defnton The greedy method s a general algorthm desgn paradgm, bult on the follong elements: confguratons: dfferent choces, collectons, or values to fnd objectve functon:

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Feature-Based Matrix Factorization

Feature-Based Matrix Factorization Feature-Based Matrx Factorzaton arxv:1109.2271v3 [cs.ai] 29 Dec 2011 Tanq Chen, Zhao Zheng, Quxa Lu, Wenan Zhang, Yong Yu {tqchen,zhengzhao,luquxa,wnzhang,yyu}@apex.stu.edu.cn Apex Data & Knowledge Management

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

The Effect of Similarity Measures on The Quality of Query Clusters

The Effect of Similarity Measures on The Quality of Query Clusters The effect of smlarty measures on the qualty of query clusters. Fu. L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Informaton Scence, 30(5) 396-407 The Effect of Smlarty Measures on The Qualty of

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

AP PHYSICS B 2008 SCORING GUIDELINES

AP PHYSICS B 2008 SCORING GUIDELINES AP PHYSICS B 2008 SCORING GUIDELINES General Notes About 2008 AP Physcs Scorng Gudelnes 1. The solutons contan the most common method of solvng the free-response questons and the allocaton of ponts for

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

A Novel Distributed Collaborative Filtering Algorithm and Its Implementation on P2P Overlay Network*

A Novel Distributed Collaborative Filtering Algorithm and Its Implementation on P2P Overlay Network* A Novel Dstrbuted Collaboratve Flterng Algorthm and Its Implementaton on P2P Overlay Network* Peng Han, Bo Xe, Fan Yang, Jajun Wang, and Rumn Shen Department of Computer Scence and Engneerng, Shangha Jao

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Intro. Iterators. 1. Access

Intro. Iterators. 1. Access Intro Ths mornng I d lke to talk a lttle bt about s and s. We wll start out wth smlartes and dfferences, then we wll see how to draw them n envronment dagrams, and we wll fnsh wth some examples. Happy

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information