Assessing the Value of Unrated Items in Collaborative Filtering

Jérôme Kunegis, Andreas Lommatzsch, Martin Mehlitz, Şahin Albayrak
Technische Universität Berlin, DAI-Labor
Ernst-Reuter-Platz 7, 10587 Berlin, Germany
{kunegis,andreas.lommatzsch,martin.mehlitz,sahin.albayrak}@dai-labor.de

Abstract

In collaborative filtering systems, a common technique is default voting: unknown ratings are filled with a default value to alleviate the sparsity of rating databases. We show that the choice of that default value represents an assumption about the underlying prediction algorithm and dataset. In this paper, we empirically analyze the effect of a varying default value of unrated items on various memory-based collaborative rating prediction algorithms on different rating corpora, in order to understand the assumptions these algorithms make about the rating database and to recommend default values for them.

1. Introduction

In collaborative filtering systems, users rate items they encounter. These items can be documents, movies, songs, etc. Ratings can be given by users explicitly, such as with the five star scale [7] used by some websites, or can be collected implicitly by monitoring the user's actions, such as recording the number of times a user has listened to a song [16].

The task of a collaborative filtering system usually consists of recommending unseen items that are potentially interesting to users. To implement a recommendation system, unknown ratings have to be predicted for user-item pairs.

Depending on the collaborative filtering system, ratings can have different meanings. Sometimes, ratings represent a grade of the regarded item on a scale, and there is a neutral rating value somewhere in the middle of that scale. In other cases, ratings are just counts, for instance the number of times a web document was viewed. In these two cases, a missing rating has different meanings: In the first case, a missing rating cannot be interpreted with confidence as any specific value, because the rating is simply not known. In the second case, a missing rating is equivalent to a rating of zero.
Usually, rating databases are only sparsely filled. For this reason, many rating prediction algorithms use default voting: They fill missing ratings with a default value. This default value represents an assumption about the rating database: the assumption that missing ratings are best modeled by this specific value [6, 7].

The collaborative prediction algorithms analyzed in this paper are memory-based algorithms. They either work directly on the rating database, or on a preprocessed rating database: the normalized ratings. We analyze the impact of the default value on the accuracy of the resulting prediction algorithm. The empirical results of the analysis are then interpreted according to the underlying algorithm and corpus in order to validate the assumption made in the default voting algorithms.

The remainder of this paper is organized as follows. Section 2 introduces related background and an example case of a rating database. Section 3 gives an overview of common approaches to collaborative rating prediction. The algorithms used in the evaluation of this paper are presented in detail in Section 4. Section 5 presents the results of our evaluation, and a summary and an outlook on future work conclude this contribution in Section 6.

2. Definitions

Let $U = \{U_1, U_2, \ldots, U_m\}$ be the set of users and $I = \{I_1, I_2, \ldots, I_n\}$ the set of items. Let $I_i$ be the set of items rated by user $U_i$ and $U_j$ the set of users that rated item $I_j$. Let $R$ be the sparse rating matrix, where $r_{ij}$ is user $U_i$'s rating of item $I_j$ if present and is undefined otherwise. $\bar{r}_i$ is the mean of user $U_i$'s ratings. If the mean applies to a subset of a user's ratings, this will be mentioned separately.

A rating is always calculated for a specific user and a specific item. These will be called the active user and the active item. Without loss of generality we will assume the active user is $U_1$ and the active item is $I_1$. Thus $r_{11}$ is undefined and must be predicted. We will call the prediction $\tilde{r}_{11}$.

The range of possible ratings varies from dataset to dataset. In this paper all ratings will be scaled to continuous values in the range $[-1,+1]$, so that the accuracy of predictions is comparable across the datasets. This scaling is necessary to compare relative errors of prediction algorithms on different rating datasets, assuming that the minimum and maximum possible ratings in each dataset are equivalent.

Table 1. A small example rating database. The rating $(U_1, I_1)$ is undefined and must be predicted. This example has only rating values of $+1$ and $-1$, but in general, ratings can take on any value in $[-1,+1]$.

I_1 I_2 I_3 I_4 I_5
U_1: ? +1 +1 +1
U_2: +1 +1
U_3: +1

Table 1 shows an example of a rating database. Ratings not given are indicated by an empty cell. The rating to predict is indicated by a question mark.

3. Related Work

This section gives an overview of collaborative prediction algorithms. The algorithms used in this paper will be described in detail in the next section.

The most popular collaborative filtering algorithms predict ratings by averaging over other related ratings [5]. These memory-based algorithms keep part of the rating database itself in memory to calculate predictions. In contrast, model-based algorithms first preprocess the rating database to a condensed form [9, 4].

Memory-based algorithms can furthermore be classified into user-based and item-based algorithms. User-based algorithms take the user as the primary entity and regard items as relations between users. Item-based algorithms work analogously using items.

A variant of memory-based collaborative filtering consists of preprocessing the rating database according to a specific system. One possible methodology, which is used in this paper, consists of normalizing the ratings: adapting them linearly such that values like the mean and variance take on the same values for all users or all items.

The third approach analyzed in this paper is defined in [13] and is an alternative to normalization: Instead of adapting all ratings to normal values, we adapt them only pairwise when comparing two users (or two items in item-based filtering).
The approach described in [13] uses linear regression for this purpose.

Other approaches that have been used in collaborative filtering include simple variations on the methods cited above [3, 5, 2], clustering and building neighborhoods [19], graph-theoretic approaches [1, 15, 10, 11], linear algebraic approaches [2, 8, 9], and probabilistic methods [20, 8, 7].

4. Collaborative Prediction Algorithms

This section describes the collaborative prediction algorithms implemented for this paper's evaluation, and gives a very brief overview of other collaborative filtering approaches.

The simplest approach to predicting ratings is to take existing ratings of the active item by other users, and average them. The average is weighted by the similarity between users. The standard algorithm for predicting ratings is the memory-based prediction using the Pearson correlation [3, 5]. It consists of searching for other users that have rated the active item, and calculating the weighted mean of their ratings of the active item. Let $w(a,b)$ be a weighting function depending on the ratings of users $U_a$ and $U_b$. We then predict $r_{11}$ by:

$$\tilde{r}_{11} = \Big( \sum_i w(1,i) \Big)^{-1} \sum_i w(1,i)\, r_{i1} \tag{1}$$

The sums are over all users $U_i$ that have rated item $I_1$ and have also rated items in common with user $U_1$. The weight $w(1,i)$ must be high if users $U_1$ and $U_i$ are similar and low if they are different. A function fulfilling this is the Pearson correlation between the two users' ratings [5]: It is 1 when the ratings of both users correlate perfectly, zero when they do not correlate and negative when they correlate negatively. The correlation between both users' ratings is calculated by considering the ratings of items they have both rated:

$$w(a,b) = \frac{\sum_j (r_{aj} - \bar{r}_a)(r_{bj} - \bar{r}_b)}{\sqrt{\sum_j (r_{aj} - \bar{r}_a)^2 \sum_j (r_{bj} - \bar{r}_b)^2}} \tag{2}$$

The sums are taken over $I_{ab} = I_a \cap I_b$, the set of items rated by both users. $\bar{r}_a$ and $\bar{r}_b$ are the mean ratings for users $U_a$ and $U_b$ taken over $I_{ab}$.

4.1. Default Value

Rating databases are usually sparse, therefore the correlation between two users may only be based on a small number of items rated in common.
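As a minimal sketch, the plain weighted mean of Equations (1) and (2), which relies only on the co-rated items just mentioned, could look as follows in Python. The dictionary layout, the function names `pearson` and `predict`, the toy ratings and the handling of degenerate cases (weight 0 for zero variance, `None` when the weights cancel out) are illustrative assumptions, not part of the paper. Note that Equation (1) divides by the plain sum of weights, so mixed-sign weights can make the denominator very small; many implementations divide by the sum of absolute weights instead.

```python
from math import sqrt

def pearson(a, b):
    """Equation (2): Pearson correlation over the items rated by both users."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    mean_a = sum(a[j] for j in common) / len(common)
    mean_b = sum(b[j] for j in common) / len(common)
    num = sum((a[j] - mean_a) * (b[j] - mean_b) for j in common)
    den = sqrt(sum((a[j] - mean_a) ** 2 for j in common)
               * sum((b[j] - mean_b) ** 2 for j in common))
    # Assumption: a pair with zero variance on the common items gets weight 0.
    return num / den if den > 0 else 0.0

def predict(ratings, user, item):
    """Equation (1): correlation-weighted mean of other users' ratings of `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(ratings[user], theirs)
        num += w * theirs[item]
        den += w
    # Assumption: no prediction when there is no neighbour or the weights cancel.
    return num / den if abs(den) > 1e-12 else None

# Hypothetical toy ratings in [-1, +1]; U1's rating of I1 is the one to predict.
ratings = {
    "U1": {"I2": 1.0, "I3": -1.0, "I4": 0.5},
    "U2": {"I1": 1.0, "I2": 1.0, "I3": -1.0, "I4": 0.5},
    "U3": {"I1": 0.0, "I2": 0.5, "I3": -0.5, "I4": 0.25},
}
print(predict(ratings, "U1", "I1"))  # both neighbours correlate perfectly, so this is close to 0.5
```

In this toy case both neighbours have only three items in common with the active user, which is exactly the situation in which the correlation estimate is fragile.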
To overcome this problem, a variant of the weighted mean algorithm bases the correlation on any item that has been rated by at least one of the two users, filling in missing ratings. Given the default rating value $\rho$, we define the correlation as:

$$w_\rho(a,b) = \frac{\sum_j (r^\rho_{aj} - \bar{r}_a)(r^\rho_{bj} - \bar{r}_b)}{\sqrt{\sum_j (r^\rho_{aj} - \bar{r}_a)^2 \sum_j (r^\rho_{bj} - \bar{r}_b)^2}} \tag{3}$$

The sums go over all items that at least one of the two users has rated. $r^\rho_{ab}$ is defined as

$$r^\rho_{ab} = \begin{cases} r_{ab} & \text{when } r_{ab} \text{ is defined} \\ \rho & \text{otherwise} \end{cases}$$

This method to overcome the sparsity problem of rating databases is sometimes called default voting.

4.2. Normalization

A common addition to the weighted mean rating prediction is to normalize the rating database before making predictions. For each user, the ratings are scaled linearly to a mean of zero and unit variance. The predicted values are then scaled back to match the user's original ratings. This approach is justified by observing that different users have different rating habits. On a scale from 1 to 5 for instance, one user may give the value 5 in half of his ratings, while another may reserve that value for a very few selected items [5].

Let $\bar{r}_a$ be the mean and $\sigma_a$ the standard deviation of all ratings of the form $r_{ab}$. We define the normalized rating value $\hat{r}_{ab}$:

$$\hat{r}_{ab} = \frac{r_{ab} - \bar{r}_a}{\sigma_a} \tag{4}$$

$\hat{r}_{ab}$ is then used instead of $r_{ab}$ in Equation (2). Since normalization makes the original rating scale invisible to the prediction algorithm, it is suitable as an evaluation method that gives insight into the rating database independently of the behaviour of specific users.

4.3. Pair Regression

The pair regression-based variant of the weighted mean algorithm [13, 4] consists of adapting the ratings of other users to the rating scale of the active user. It makes normalization unnecessary, and can be thought of as a special form of user-specific normalization.

4.4. User-based vs Item-based

All of the algorithms described above are user-based in that they take the point of view of the active user and compare the ratings of other users to the active user's ratings. In all cases, we can construct an analogous item-based prediction algorithm by taking the average of other items' ratings by the active user.

5. Evaluation

We evaluate the variation of the default rating value on six memory-based collaborative filtering algorithms. For each algorithm, we implement both the user-based and the item-based variant. Using the two rating corpora MovieLens (http://movielens.umn.edu/) and Jester (http://www.ieor.berkeley.edu/~goldberg/jester-data/), we calculate two error measures: the mean absolute error and the root mean squared error.

5.1. Methodology

The evaluation is done as follows: A user-item pair for which a rating is known is chosen at random from the corpus. This rating is removed from the corpus. We then run all algorithms on the remaining corpus, letting each predict the value of the rating removed. This procedure is repeated a certain number of times for each corpus. The number of tests was chosen to depend on the dataset, with a minimum of 1,500 runs.

The error measures are those common in evaluating collaborative rating prediction algorithms [3, 4]. Let $(U_{a(i)}, I_{b(i)})$ be the user-item pair in test run $i$ for $i \in \{1, \ldots, n\}$. The error measures are the mean absolute error (MAE) and the root mean squared error (RMSE):

$$\mathrm{MAE} = \frac{1}{n} \sum_i \left| r_{a(i)b(i)} - \tilde{r}_{a(i)b(i)} \right| \tag{5}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_i \left( r_{a(i)b(i)} - \tilde{r}_{a(i)b(i)} \right)^2} \tag{6}$$

The six algorithms used are:

Pu, Pi: The plain Pearson correlation-weighted average of other users' or items' ratings. We use both the user-based and the item-based variant of Equation (1).

Nu, Ni: The same algorithm as Pu and Pi, but with normalized ratings as defined in Equation (4). In the user-based case, the ratings are scaled linearly to zero mean and unit variance. The item-based algorithm works analogously using items.

Ru, Ri: The Pearson correlation-weighted mean of other users' or items' ratings, where ratings are adapted to the active user's rating scale using multiple linear regression as defined in [13].
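The evaluation procedure and the error measures of Equations (5) and (6) can be sketched as follows. This is only an illustration under stated assumptions: the function name `evaluate`, the `predict(ratings, user, item)` callback signature, the fixed random seed, the toy corpus, and the choice to skip runs in which the predictor returns `None` are not taken from the paper.

```python
import random
from math import sqrt

def evaluate(ratings, predict, runs, seed=0):
    """Hide one known rating per run, predict it, and accumulate MAE and RMSE."""
    rng = random.Random(seed)
    known = [(u, i) for u, items in ratings.items() for i in items]
    abs_err, sq_err = [], []
    for _ in range(runs):
        user, item = rng.choice(known)
        truth = ratings[user].pop(item)      # remove the rating from the corpus
        try:
            guess = predict(ratings, user, item)
        finally:
            ratings[user][item] = truth      # restore it for the next run
        if guess is None:                    # assumption: skip failed predictions
            continue
        abs_err.append(abs(truth - guess))
        sq_err.append((truth - guess) ** 2)
    if not abs_err:
        return None
    n = len(abs_err)
    return sum(abs_err) / n, sqrt(sum(sq_err) / n)   # Equations (5) and (6)

def always_zero(ratings, user, item):
    """Hypothetical baseline predictor: always answers with the neutral rating 0."""
    return 0.0

# Hypothetical toy corpus with ratings of +1 and -1 only.
toy = {"U1": {"I1": 1.0, "I2": -1.0}, "U2": {"I1": -1.0}}
mae, rmse = evaluate(toy, always_zero, runs=20)
print(mae, rmse)  # every error has magnitude 1, so this prints 1.0 1.0
```

Any of the six algorithms can be plugged in as the `predict` argument, as long as it follows the assumed callback signature.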

The parameter $\rho$, which is used as the default value for ratings in all of the algorithms, is varied in the interval $[-1,+1]$ at a granularity depending on the rating corpus.

The corpora we considered in our tests are MovieLens and Jester:

MovieLens contains 75,700 ratings of 1,543 movies by 943 users. MovieLens ratings are integers between 1 and 5. The rating matrix is filled to about 5%.

Jester contains 617,000 ratings of 100 jokes by 24,900 users. Jester ratings range from -10 to +10 with a granularity of 0.01. The rating matrix is filled to about 25%.

In order to compare the test results on both datasets, we ignored any of the additional movie information provided by MovieLens such as movie genres.

5.2. Evaluation Results

The test results are shown in Figure 1, with one figure per algorithm-corpus combination. Table 2 summarizes the best values for $\rho$ as a function of the error measure used. There are several interesting observations:

The simple algorithms Pu/Pi give the best result for $\rho \approx 0$ with good accuracy. The performance is however not convex, and gets better with values farther from 0 in some cases. In the case of the MovieLens corpus, we observe a peak in the error values at a default value of about 0.2. This peak can also be observed in the Ri-MovieLens case, although it is less marked.

Some combinations do not show a dependence on $\rho$: Nu/Ni (except Nu-MovieLens). This result suggests that the performance of this algorithm is independent of the particular optimization involving a default value. We explain this result by the fact that the normalization step will partly overcome the bias induced into the user's ratings by using default values far from zero.

Ru/Ri shows mostly convex errors as a function of $\rho$. The optimal value for $\rho$ is zero or slightly positive. This result goes against the suggestion in [5] to use slightly negative values for $\rho$.

Comparing the results of the Pu/Pi and Nu/Ni algorithms shows that normalizing the rating values is not always a gain. In all cases the performance with an optimal $\rho$ is higher than in the non-normalized case.

6. Conclusion

We evaluated three common collaborative rating prediction algorithms that use default voting under the variation of the default rating parameter, which represents an assumption about missing rating values. This rating value is a parameter of all algorithms, and is used to fill missing data in rating databases at different points in the rating prediction algorithm, depending on the algorithm.

We showed that depending on the choice of this parameter, the algorithms tested vary highly in their accuracy. In contradiction to previous results, we found that two out of three algorithms tested performed better when this value is slightly positive, whereas previous implementations [5] and publications recommended zero or a negative value.

As future work, we suggest the following tasks:

Use more prediction algorithms. Any prediction algorithm that makes an assumption about default values can potentially be used.

Determine the influence of the default value on other error measures, especially those that can be applied to recommendation algorithms and not just rating prediction algorithms.

References

[1] C. C. Aggarwal, J. L. Wolf, K.-L. Wu, and P. S. Yu. Horting hatches an egg: a new graph-theoretic approach to collaborative filtering. In Proc. Int. Conf. on Knowledge Discovery and Data Mining, pages 201-212, 1999.
[2] J. Basilico and T. Hofmann. Unifying collaborative and content-based filtering. In Proc. Int. Conf. on Machine Learning, page 9. ACM Press, 2004.
[3] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proc. Conf. Uncertainty in Artificial Intelligence, pages 43-52, 1998.
[4] Y. Chuan, X. Jieping, and D. Xiaoyong. Recommendation algorithm combining the user-based classified regression and the item-based filtering. In Proc. Int. Conf. on Electronic Commerce, pages 574-578, 2006.
[5] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In Proc. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 230-237, 1999.
[6] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In Computer Supported Cooperative Work, pages 241-250, 2000.
[7] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst., 22(1):5-53, 2004.
[8] T. Hofmann. Collaborative filtering via gaussian probabilistic latent semantic analysis. In Proc. Int. Conf. on Research and Development in Information Retrieval, pages 259-266, 2003.
[9] T. Hofmann. Latent semantic models for collaborative filtering. ACM Trans. Inf. Syst., 22(1):89-115, 2004.
[10] Z. Huang, W. Chung, T.-H. Ong, and H. Chen. A graph-based recommender system for digital library. In Proc. Int. Conf. on Digital Libraries, pages 65-73, 2002.

[Figure 1 panels: MovieLens Pu, Nu, Ru, Pi, Ni, Ri and Jester Pu, Nu, Ru, Pi, Ni, Ri; each panel title names the RMSE.]

Figure 1. The results of all tests, displayed as a function of the algorithm and of the corpus.

Table 2. The optimal $\rho$ as a function of corpus, algorithm and error measure.

Corpus     Error measure   Pu      Nu      Ru      Pi      Ni      Ri
MovieLens  RMSE
                           +0.02   -0.46   +0.24   +0.02   +0.08   +0.20
                           +0.02   -0.46   +0.48   +0.02   +0.08   +0.44
Jester     RMSE
                           +0.0    +0.30   -0.0    +0.00   +0      +0.00
                           +0.0    +0.00   -0.0    +0.00   -0.0    +0.00
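Optimal values such as those in Table 2 come from sweeping $\rho$ over a grid and keeping the value with the lowest error. The following Python sketch illustrates this for the default-voting correlation of Equation (3). Several choices here are assumptions rather than the paper's method: the means in `w_rho` are taken over the filled-in ratings, the leave-one-out loop is exhaustive instead of randomly sampled, only the MAE is computed, and all function names and toy ratings are made up for illustration.

```python
from math import sqrt

def w_rho(a, b, rho):
    """Equation (3): correlation over items rated by at least one of the two
    users, with missing ratings replaced by the default value rho."""
    items = sorted(set(a) | set(b))
    if not items:
        return 0.0
    va = [a.get(j, rho) for j in items]
    vb = [b.get(j, rho) for j in items]
    # Assumption: the means are computed over the filled-in ratings.
    ma, mb = sum(va) / len(va), sum(vb) / len(vb)
    num = sum((x - ma) * (y - mb) for x, y in zip(va, vb))
    den = sqrt(sum((x - ma) ** 2 for x in va) * sum((y - mb) ** 2 for y in vb))
    return num / den if den > 0 else 0.0

def predict(ratings, user, item, rho):
    """Equation (1) with the weight w replaced by w_rho."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other != user and item in theirs:
            w = w_rho(ratings[user], theirs, rho)
            num += w * theirs[item]
            den += w
    return num / den if abs(den) > 1e-12 else None

def loo_mae(ratings, rho):
    """Leave-one-out MAE over every known rating (exhaustive for this sketch)."""
    errors = []
    for user in ratings:
        for item in list(ratings[user]):
            truth = ratings[user].pop(item)
            guess = predict(ratings, user, item, rho)
            ratings[user][item] = truth
            if guess is not None:
                errors.append(abs(truth - guess))
    return sum(errors) / len(errors) if errors else float("inf")

def sweep(ratings, grid):
    """Return the rho with the lowest error, plus the error for every rho."""
    errors = {rho: loo_mae(ratings, rho) for rho in grid}
    return min(errors, key=errors.get), errors

# Hypothetical toy ratings in [-1, +1].
toy = {
    "U1": {"I1": 1.0, "I2": -1.0, "I3": 0.5},
    "U2": {"I1": 0.5, "I2": -0.5, "I4": 1.0},
    "U3": {"I2": 1.0, "I3": -1.0, "I4": -0.5},
}
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
best, errors = sweep(toy, grid)
print(best, errors[best])
```

The sketch also makes the modelling assumption visible: with $\rho = 0$, two users who rated disjoint items the same way come out as perfectly anti-correlated, because each one's real rating is compared against the other's filled-in zero.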

[11] Z. Huang and D. D. Zeng. Why does collaborative filtering work? Recommendation model validation and selection by analyzing bipartite random graphs. In Proc. Workshop of Information Technologies and Systems, 2005.
[12] R. Jin, L. Si, C. Zhai, and J. Callan. Collaborative filtering with decoupled models for preferences and ratings. In Proc. Int. Conf. on Information and Knowledge Management, pages 309-316, 2003.
[13] J. Kunegis and S. Albayrak. Adapting ratings in memory-based collaborative filtering using linear regression. In Proc. Int. Conf. on Information Reuse and Integration, 2007.
[14] M. R. McLaughlin and J. L. Herlocker. A collaborative filtering algorithm and evaluation metric that accurately model the user experience. In Proc. Int. Conf. on Research and Development in Information Retrieval, pages 329-336, 2004.
[15] B. J. Mirza, B. J. Keller, and N. Ramakrishnan. Studying recommendation algorithms by graph analysis. J. of Intelligent Information Systems, 20(2):131-160, 2001.
[16] D. Nichols. Implicit rating and filtering. In Proc. DELOS Workshop on Filtering and Collaborative Filtering, pages 31-36, 1998.
[17] D. Pennock, E. Horvitz, S. Lawrence, and C. L. Giles. Collaborative filtering by personality diagnosis: A hybrid memory- and model-based approach. In Proc. Conf. on Uncertainty in Artificial Intelligence, pages 473-480, 2000.
[18] L. Ungar and D. Foster. A formal statistical approach to collaborative filtering. In Proc. Conf. on Automated Learning and Discovery, 1998.
[19] G.-R. Xue, C. Lin, Q. Yang, W. Xi, H.-J. Zeng, Y. Yu, and Z. Chen. Scalable collaborative filtering using cluster-based smoothing. In Proc. Int. Conf. on Research and Development in Information Retrieval, pages 114-121, 2005.
[20] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56-69, 2004.