Adapting Ratings in Memory-Based Collaborative Filtering using Linear Regression


Jérôme Kunegis, Şahin Albayrak
Technische Universität Berlin, DAI-Labor
Franklinstraße 28/29, 10587 Berlin, Germany
{kunegis,sahin.albayrak}@dai-labor.de

Abstract

We show that the standard memory-based collaborative filtering rating prediction algorithm using the Pearson correlation can be improved by adapting user ratings using linear regression. We compare several variants of the memory-based prediction algorithm with and without adapting the ratings. We show that on two well-known publicly available rating datasets, the mean absolute error and the root mean squared error are reduced by as much as 20% in all variants of the algorithm tested.

1. Introduction

In collaborative filtering systems, users are asked to rate items they encounter. These items can be documents to read, movies, songs, etc. Ratings can be given by users explicitly, such as with the five-star scale used by some websites, or can be collected implicitly by monitoring the users' actions, such as recording the number of times a user has listened to a song. Once enough ratings are known to the system, one wants to predict the rating a user would give to an item he has not rated. This can be useful for implementing recommender systems: finding items a user has not seen that he would rate positively. The ratings collected by the system are usually single numerical values, each associated with a user-item pair. The database of ratings is typically sparse, since each user rates only a small part of all items. Rating prediction algorithms take as input a user, an item the user has not rated, and a database of ratings, and output the rating the user would give if he had rated the item. Collaborative filtering algorithms are usually divided into memory-based and model-based algorithms [2, 11]. Memory-based algorithms work directly on the rating database, whereas model-based algorithms first transform the rating matrix into a condensed format with the goal of representing the essential information about the ratings.
Such model-based algorithms must preprocess the data into a workable format, which is often an expensive operation. The resulting compact model then allows the actual predictions to be calculated quickly. Memory-based prediction algorithms can be made faster by considering only a subset of the available ratings [4, 5]. This paper analyzes a variant of a well-known memory-based collaborative filtering algorithm: the Pearson correlation-based weighted mean of user ratings. This algorithm was already described in GroupLens [12], one of the first collaborative filtering systems. The algorithm variant described in this paper can be found in [3], where it is alluded to but not analyzed further.

Paper overview. Section 2 gives the mathematical definitions used in this paper. Section 3 presents a small example rating matrix showing how a prediction is made by comparing user ratings. Section 4 describes the standard memory-based rating prediction algorithm and some of its variants that will be used later. Section 5 introduces the modification to that algorithm alluded to in [3] and discusses the performance of the modified algorithm. Section 6 evaluates all algorithms presented, and Section 7 concludes the analysis.

2. Definitions

Let U = {U_1, U_2, ..., U_m} be the set of users and I = {I_1, I_2, ..., I_n} the set of items. Let I_i be the set of items rated by user U_i, and U_j

the set of users that rated item I_j. Let R be the sparse rating matrix, where r_ij is user U_i's rating of item I_j if present, and is undefined otherwise. r̄_i is the mean of user U_i's ratings. If the mean applies to a subset of a user's ratings, this will be mentioned in the text. A rating is always calculated for a specific user and a specific item. These will be called the active user and the active item. Without loss of generality we will assume that the active user is U_1 and the active item is I_1. Thus r_11 is undefined and must be predicted. We will call the prediction r̂_11. The range of possible ratings varies from dataset to dataset. In this paper they will be scaled to the range [−1, +1], in order for the accuracy of predictions to be comparable across the datasets. Predicting a rating of 7 instead of 9 on a scale from 0 to 10 is better than being off by one point in a system having only the three possible ratings −1, 0 and +1.

3. Example

We will now give as an example a small rating matrix. Users are U_1 to U_3 and items are I_1 to I_5. Ratings are +1, −1 or undefined. For simplicity, no ratings between −1 and +1 are included. This corresponds to a system where users can only rate items as good or bad. The rating matrix can be seen in Table 1.

Table 1. The rating (U_1, I_1) is undefined and must be predicted.

        I_1   I_2   I_3   I_4   I_5
U_1      ?    +1    +1    +1    −1
U_2     −1    +1          +1    −1
U_3     +1          −1    −1    −1

Comparing U_1 with U_2, we observe that the two users have given the same ratings to the items they have both rated. Both users' ratings correlate positively. The comparison of U_1 with U_3 is less clear-cut: the users agree on one item and disagree on two items. The correlation between users U_1 and U_3 is therefore negative. The rating given to item I_1 is negative for U_2 and positive for U_3. A rating prediction algorithm should therefore predict a somewhat negative rating for the pair (U_1, I_1).

4. Related Work

The standard algorithm for predicting ratings [2, 6] is the so-called memory-based prediction using the Pearson correlation.
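As a concrete check of the Table 1 example, the following sketch (hypothetical code, not from the paper) computes the Pearson correlation between U_1 and U_2 over the items they have both rated; the rating dictionaries mirror the table layout reconstructed above:

```python
import math

# Ratings from Table 1 (only the items each user has actually rated).
u1 = {"I2": +1, "I3": +1, "I4": +1, "I5": -1}
u2 = {"I1": -1, "I2": +1, "I4": +1, "I5": -1}

def correlation(ra, rb):
    """Pearson correlation over the items rated by both users."""
    common = sorted(set(ra) & set(rb))
    xa = [ra[j] for j in common]
    xb = [rb[j] for j in common]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    num = sum((x - ma) * (y - mb) for x, y in zip(xa, xb))
    den = math.sqrt(sum((x - ma) ** 2 for x in xa) *
                    sum((y - mb) ** 2 for y in xb))
    return num / den if den else 0.0  # treat undefined correlation as 0

print(correlation(u1, u2))  # ≈ +1.0: U1 and U2 agree on every co-rated item
```

U_1 and U_2 give identical ratings on their three co-rated items (I_2, I_4, I_5), so the correlation reaches its maximum of 1.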
The algorithm consists of searching for other users that have rated the active item, and calculating the weighted mean of their ratings of the active item. Let w(a, b) be a weighting function depending on users U_a's and U_b's ratings; then we predict r_11 by:

$$\hat r_{11} = \Big(\sum_i w(i,1)\Big)^{-1} \sum_i w(i,1)\, r_{i1} \qquad (1)$$

where the sums are over all users i that have rated item I_1 and have also rated items in common with user U_1. The weight w(i, 1) must be high if users U_i and U_1 are similar and low if they are different. A function fulfilling this is the Pearson correlation between the two users' ratings [6]: it is 1 when the ratings of both users correlate perfectly, zero when they don't correlate, and negative when they correlate negatively. The correlation between both users' ratings is calculated by considering the ratings of the items they have both rated:

$$w(a,b) = \frac{\sum_j (r_{aj} - \bar r_a)(r_{bj} - \bar r_b)}{\sqrt{\sum_j (r_{aj} - \bar r_a)^2}\,\sqrt{\sum_j (r_{bj} - \bar r_b)^2}} \qquad (2)$$

where the sums are taken over I_ab = I_a ∩ I_b, the set of items rated by both users. r̄_a and r̄_b are the mean ratings for users U_a and U_b, taken over I_ab. The prediction is not defined when the sum of correlations is zero.

4.1. Variations

Many variations of the basic memory-based prediction formula exist [2, 3, 9, 12]. This subsection presents those that will be used in the evaluation of this paper.

4.1.1. Default Voting

In Equation (2) the correlation is calculated over I_ab = I_a ∩ I_b, the items rated by both users. A variation presented in [2] is to calculate the correlation over all items rated by at least one of the two users. For the missing ratings a default value is used. Empirically, the best default value was determined to be zero. The modified correlation becomes:

$$w^0(a,b) = \frac{\sum_j (r^0_{aj} - \bar r_a)(r^0_{bj} - \bar r_b)}{\sqrt{\sum_j (r^0_{aj} - \bar r_a)^2}\,\sqrt{\sum_j (r^0_{bj} - \bar r_b)^2}} \qquad (3)$$

where the sums are over I⁰_ab = I_a ∪ I_b, and r⁰_ij = r_ij when r_ij is defined and 0 otherwise. r̄_a is taken over I⁰_ab.

4.1.2. Weight Factor

The weight factor variant consists of multiplying the correlation by a weight that depends on the number n of items rated in common by both users. Two variations are used:

$$w_n(a,b) = n\, w(a,b) \qquad (4)$$

$$w_{n^2}(a,b) = n^2\, w(a,b) \qquad (5)$$

In [6] a similar technique is used, where the factor n is capped at an arbitrary value of 50 common ratings.

4.2. Other Approaches

Many approaches other than a weighted mean of ratings exist [1, 2], such as principal component analysis (PCA) [5], latent semantic analysis [8, 10] and probabilistic models [14]. They have in common a high complexity and runtime. Another variation is to consider only a subset of the available ratings [4, 5], with the goal of reducing the runtime of the actual prediction algorithm and possibly improving the predictions, as only similar users are considered. This method can be combined with most prediction methods.

5. Adapting Ratings using Linear Regression

Why use the correlation as a weight instead of just the inner product of both users' rating vectors? Because the means of the users' ratings may not be zero, and the standard deviations may differ from one. Thus, the correlation is used because one expects each user to have his own rating habits. For instance, some users may only use part of the available rating scale, while others give only the highest and lowest possible ratings. Also, the mean rating may vary from user to user. Therefore, taking the mean of other users' ratings may not be optimal. Instead of averaging over other users' ratings, we should average over other users' ratings adapted to the current user's ratings. We take Equation (1) and replace user U_i's rating r_i1 with r_(i→1)1, user U_i's rating of item I_1 adapted to user U_1's ratings:

$$\hat r_{11} = \Big(\sum_i w(i,1)\Big)^{-1} \sum_i w(i,1)\, r_{(i\to 1)1} \qquad (6)$$

We will assume there is an affine relationship between two users' ratings and set:

$$r_{(i\to 1)1} = \alpha\, r_{i1} + \beta$$

where α and β depend on the user pair (U_1, U_i).
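The adapted prediction of Equation (6) can be sketched as follows. This is a hypothetical minimal implementation, not the authors' code: for each neighbour, an affine map α·r + β is fitted by ordinary least squares on the co-rated items, and the neighbour's rating of the active item is passed through that map before entering the Pearson-weighted mean.

```python
import math

def pearson(ra, rb, items):
    """Pearson correlation of two users' ratings over the given items."""
    xa = [ra[j] for j in items]
    xb = [rb[j] for j in items]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    num = sum((x - ma) * (y - mb) for x, y in zip(xa, xb))
    den = math.sqrt(sum((x - ma) ** 2 for x in xa) *
                    sum((y - mb) ** 2 for y in xb))
    return num / den if den else 0.0

def fit_affine(x, y):
    """Simple least-squares fit of y ~ alpha * x + beta."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    d = n * sxx - sx * sx
    if d == 0:
        return 1.0, 0.0  # degenerate case: leave the ratings unchanged
    return (n * sxy - sx * sy) / d, (sxx * sy - sx * sxy) / d

def predict_adapted(ratings, active_user, active_item):
    """Weighted mean of the other users' ratings of the active item,
    each rating first adapted to the active user as alpha * r + beta."""
    num = den = 0.0
    for user, r in ratings.items():
        if user == active_user or active_item not in r:
            continue
        common = sorted(set(r) & set(ratings[active_user]))
        if len(common) < 2:
            continue
        w = pearson(ratings[active_user], r, common)
        x = [r[j] for j in common]                     # neighbour's ratings
        y = [ratings[active_user][j] for j in common]  # active user's ratings
        alpha, beta = fit_affine(x, y)
        num += w * (alpha * r[active_item] + beta)
        den += w
    return num / den if den else None  # undefined when the weights sum to 0
```

For example, if a neighbour's ratings are consistently half a point below the active user's, the fit yields α ≈ 1 and β ≈ 0.5, so the neighbour's rating of the active item is shifted up before averaging.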
These factors must be chosen in a way that minimizes the error made by the transformation on existing ratings. The error is defined, as in linear regression, as the sum of squared errors over the items. For item I_j, the error is ε_j:

$$r_{1j} = \alpha\, r_{ij} + \beta + \varepsilon_j$$

The total error is then Σ_j ε_j². The values of the factors minimizing this error can be found by performing linear regression. As before, there are two variations: use only items rated by both users, or use items rated by either user and fill missing values with zero. Let X and Y be the column vectors containing the ratings of users U_i and U_1 respectively. We define the two-column matrix X̄ = (X 1) as containing the variables subject to regression. The factors α and β are then given by:

$$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = (\bar X^T \bar X)^{-1} \bar X^T Y \qquad (7)$$

In the following, the sums are over the entries of X and Y, and n = Σ_j 1 is the number of items. We have:

$$
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
= (\bar X^T \bar X)^{-1} \bar X^T Y
= \begin{pmatrix} \sum x^2 & \sum x \\ \sum x & n \end{pmatrix}^{-1} \bar X^T Y
= \frac{1}{n \sum x^2 - \left(\sum x\right)^2}
  \begin{pmatrix} n & -\sum x \\ -\sum x & \sum x^2 \end{pmatrix} \bar X^T Y
= \frac{1}{n \sum x^2 - \left(\sum x\right)^2}
  \begin{pmatrix} \sum_j y_j \left(n x_j - \sum x\right) \\ \sum_j y_j \left(\sum x^2 - x_j \sum x\right) \end{pmatrix}
$$

5.1. Runtime

The regression factors α and β can be calculated in two passes over the vectors X and Y. In the first pass, n, Σx and Σx² are calculated. The second pass performs the outer sums over j. Calculating the correlation usually takes two passes, where the first is used to calculate the mean ratings and the second to calculate the correlation itself. Therefore, the first passes of both calculations can be merged, resulting in a three-pass algorithm. The adapted algorithm needs three passes instead of two, increasing the runtime of this part of the algorithm by half, but not changing the runtime class of the

algorithm. In the case where only a part of the dataset is analyzed, this adaptation can be used as well without significantly increasing the total runtime.

6. Evaluation

We test the proposed variation of the memory-based Pearson correlation rating prediction algorithm by running it on two datasets, using twelve variations of the algorithm (of which six use the new method), and calculating two error measures. The tests follow the procedure described in [7]: a rating is chosen at random from the corpus. It is then removed, and an algorithm is used to predict the rating, using all other ratings as input. The rating is then compared to the prediction. The corpora used are MovieLens¹ and Jester². MovieLens contains 75,700 ratings of 1,543 movies by 943 users. MovieLens ratings are integers between 1 and 5. The rating matrix is filled to about 5%. Jester contains 617,000 ratings of 100 jokes by 24,900 users. Jester ratings range from −10 to +10 with a granularity of 0.01. The rating matrix is filled to about 25%. In order to compare the test results on both datasets, we ignored the additional movie information provided by MovieLens, such as movie genres.

The two error measures used are those described in [7]:

Mean absolute error (MAE): the mean absolute difference between the rating and the prediction [13, 7].

Root mean squared error (RMSE): the square root of the mean of the squared differences between the ratings and the predictions [7].

Let (U_a(i), I_b(i)) be the user-item pair in test run i, for i ∈ {1, ..., n}; then the error measures are defined as:

$$\mathrm{MAE} = \frac{1}{n} \sum_i \left| \hat r_{a(i)b(i)} - r_{a(i)b(i)} \right| \qquad (8)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_i \left( \hat r_{a(i)b(i)} - r_{a(i)b(i)} \right)^2} \qquad (9)$$

For both measures, smaller values indicate more accurate predictions. These errors are calculated on rating values scaled to the range [−1, +1]. Therefore, predicting 0 in all cases would give MAEs and RMSEs not greater than 1.

¹ http://movielens.umn.edu/
² http://www.ieor.berkeley.edu/~goldberg/jester-data/

We used all twelve combinations of the following variants of the Pearson-correlation memory-based prediction algorithm.
The base algorithm will be called P, with further suffixes denoting variations:

- with and without the adapted ratings (P, PR)
- using only items rated in common, or filling missing ratings with the default value 0 (P, P0)
- multiplying the correlation by a factor of 1, n or n² (P, Pn, Pn²)

giving the following twelve algorithms: P, Pn, Pn², P0, P0n, P0n², PR, PRn, PRn², PR0, PR0n, PR0n².

For all algorithms, we map predictions greater than +1 to +1 and predictions smaller than −1 to −1. We ran 1,200 trials for each case. The results are shown in Figures 1 and 2. In all cases, the adapted algorithm yielded better predictions than the non-adapted variant. The mean absolute error decreased by 0.05 to 0.15 units depending on the algorithm, and the root mean squared error by 0.10 to 0.15 units. The relative accuracy gains on the two corpora were different, suggesting that the prediction precision depends on the data used. For the MovieLens data, the best algorithm overall for both error measures was PRn², the adapted mean weighted by the square of the number of common ratings, without default ratings. On the Jester data, the adapted mean algorithms yielded better results, but varying the other algorithm parameters did not change the error as much as with the MovieLens corpus. In general, errors were smaller on the Jester data than on the MovieLens data.

7. Conclusion

We proposed a modification to the class of memory-based collaborative rating prediction algorithms based on the Pearson correlation between users. The modification consists of adapting the ratings of other users using linear regression between user pairs before averaging them to calculate a prediction. We tested several variations of the basic algorithm, all with and without the modification, and found that

Figure 1. MovieLens test results (MAE and RMSE per algorithm, without and with adaptation).

Figure 2. Jester test results (MAE and RMSE per algorithm, without and with adaptation).

in all cases, the adaptation improved the prediction accuracy. The exact prediction accuracy, however, was found to depend on the dataset analyzed, suggesting that additional tests are needed using other datasets. We showed that this modification does not affect the runtime complexity of the algorithm. In particular, this variation is faster to calculate than other methods based on linear algebraic³ or probabilistic methods.

7.1. Future Work

The following questions remain open and may guide future work on the topic:

- May it be useful to apply multiple linear regression on all other users' ratings? This might increase accuracy, but also the runtime. It also remains to be seen whether a multiple linear regression approach might be mathematically related to other linear algebraic methods.

- On which datasets are the adapted ratings not an improvement? As proposed in Section 6, performing the tests with other datasets would be interesting. Unfortunately, not many rating datasets are available.

- Since many variations of the basic Pearson correlation-based algorithm exist [2, 3], a comparison with more of them would be possible.

References

[1] J. Basilico and T. Hofmann. Unifying collaborative and content-based filtering. In ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning, page 9, New York, NY, USA, 2004. ACM Press.

[2] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Uncertainty in Artificial Intelligence: Proceedings of the Fourteenth Conference, pages 43–52. Morgan Kaufmann Publishers, 1998.

[3] Y. Chuan, X. Jieping, and D. Xiaoyong. Recommendation algorithm combining the user-based classified regression and the item-based filtering. In ICEC '06: Proceedings of the 8th International Conference on Electronic Commerce, pages 574–578, New York, NY, USA, 2006. ACM Press.

[4] M. Connor and J. Herlocker. Clustering items for collaborative filtering, 2001.

[5] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151, 2001.

[6] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In SIGIR '99: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 230–237, New York, NY, USA, 1999. ACM Press.

[7] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5–53, 2004.

[8] T. Hofmann. Collaborative filtering via Gaussian probabilistic latent semantic analysis. In SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 259–266, New York, NY, USA, 2003. ACM Press.

[9] H.-S. Huang and C.-N. Hsu. Smoothing of recommenders' ratings for collaborative filtering. In Proceedings of the Fifth Conference on Artificial Intelligence and Applications (TAAI-2001), Kaohsiung, Taiwan, November 2001.

[10] N. Kawamae and K. Takahashi. Information retrieval based on collaborative filtering with latent interest semantic map. In KDD '05: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 618–623, New York, NY, USA, 2005. ACM Press.

[11] D. Pennock, E. Horvitz, S. Lawrence, and C. L. Giles. Collaborative filtering by personality diagnosis: A hybrid memory- and model-based approach. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, UAI 2000, pages 473–480, Stanford, CA, 2000.

[12] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, pages 175–186, Chapel Hill, North Carolina, 1994. ACM.

[13] U. Shardanand and P. Maes. Social information filtering: Algorithms for automating "word of mouth". In Proceedings of the ACM CHI '95 Conference on Human Factors in Computing Systems, volume 1, pages 210–217, 1995.

[14] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.

³ While linear regression uses tools from linear algebra, it does not involve the quadratic (or higher) complexity associated with problems such as SVD decomposition or matrix inversion.