Refining searches

Refine initially: the query
- Commonly, query expansion: add synonyms
- Improves recall; may hurt precision
- Sometimes done automatically

Modify based on prior searches
- Not automatic
- All prior searches vs. your prior searches
- Example: Yahoo; Google does this too

Refining after the search
- Use user feedback
- Approximate feedback with the first results: pseudo-feedback
- Example: Yahoo assist
- Two options: change the ranking of the current results, or search again with a modified query

Explicit user feedback
- User must participate
- User marks (some) relevant results
- User changes the order of results
- Pros and cons?

Explicit user feedback (continued)
- Reordering can be more nuanced than relevant / not relevant
- It can also be less accurate than relevant / not relevant
- Example: the user moves the 10th item to first place
  - This says the 10th is better than the first 9
  - It does not say which, if any, of the first 9 are relevant

User feedback in the classic vector model
- User marks the top p documents for relevance (p = 10 to 20 is typical)
- Construct new weights for the terms in the query vector
- This modifies the query
- Could be applied just to the initial results, to re-rank them
Deriving a new query in the vector model
- For a collection C of n documents, let C_r denote the set of all relevant docs in the collection (perfect knowledge)
- Goal: the optimal query vector is the difference of the two centroids
  q_opt = 1/|C_r| · Σ_{d_j in C_r} d_j − 1/(n − |C_r|) · Σ_{d_k not in C_r} d_k

Deriving a new query: the Rocchio algorithm
- Given a query q and relevance judgments for a subset of the retrieved docs
- Let D_r denote the set of docs judged relevant, and D_nr the set judged not relevant
- Modified query, for tunable weights α, β, γ:
  q_new = α·q + β/|D_r| · Σ_{d_j in D_r} d_j − γ/|D_nr| · Σ_{d_k in D_nr} d_k

Remarks on the new query
- α: importance of the original query
- β: importance of the effect of terms in relevant docs
- γ: importance of the effect of terms in docs judged not relevant
- Usually the terms of non-relevant docs are least important; reasonable values are α = 1, β = 0.75, γ = 0.25
- Reweighting terms leads to long queries: many more non-zero elements in the query vector q_new; can reweight only the most important (most frequent?) terms
- Most useful for improving recall
- Users don't like it: work + waiting for new results

Simple example of user feedback in the vector model
- Four-term query q; documents d1, d2 judged relevant and d3 judged not relevant; with α = β = γ = 1:
  q_new = q + (d1 + d2)/2 − d3
- Term weights change, and new terms enter the query
- Observe: weights can become negative

Re-ranking
- Example status? Google experiment: it only affects repeats of the same search; this is now the SearchWiki feature for Google accounts
- Re-ranking algorithms are usually based on machine learning: learn the ranking function that best matches the partial ranking given

Comparing rankings
- Kendall's tau compares two orderings of the same set of n objects
- Let A = number of pairs whose orders agree in the two orderings
- Let I = number of pairs whose orders disagree in the two orderings (= number of inversions)
- Kendall's tau(order1, order2) = (A − I)/(A + I), an ordinal correlation
- Since A + I = ½·n(n−1), this equals 1 − I/(¼·n(n−1))
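The Rocchio update above can be sketched in a few lines of Python. This is a minimal illustration on invented 4-term vectors (the query and documents here are made up for the example; only the α, β, γ defaults follow the values given above):

```python
# Minimal sketch of the Rocchio relevance-feedback update.
# Vectors are plain lists of term weights; all numbers are invented.

def rocchio(q, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Return the modified query vector q_new = a*q + b*centroid(rel) - g*centroid(nonrel)."""
    n = len(q)
    q_new = [alpha * q[k] for k in range(n)]
    for d in relevant:                      # add the relevant centroid
        for k in range(n):
            q_new[k] += beta * d[k] / len(relevant)
    for d in nonrelevant:                   # subtract the non-relevant centroid
        for k in range(n):
            q_new[k] -= gamma * d[k] / len(nonrelevant)
    return q_new

q = [1, 0, 0, 0]                         # original one-term query
rel = [[1, 1, 0, 0], [1, 0, 1, 0]]       # docs judged relevant
nonrel = [[0, 0, 0, 1]]                  # doc judged not relevant
q_new = rocchio(q, rel, nonrel)
print(q_new)  # terms 2 and 3 gain weight; term 4 goes negative
```

Note how the output reproduces the two observations from the slide: new terms enter the query, and terms that appear only in non-relevant documents get negative weights.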
Implicit user feedback
- Click-throughs
- Use them as relevance judgments
- Use them for reranking: when the user clicks a result, move it ahead of all unclicked results that come before it
- Problems? Better? Single-user feedback vs. group feedback

Compare: Recommender Systems
- Items and users; recommend items to users
- Recommend new items based on similarity to items that:
  - the user liked in the past: content-based
  - were liked by other users similar to this user: collaborative filtering
  - were just liked by other users: an easier case
- Documents matching a search = items?

Recommender system attributes
- Need explicit or implicit ratings by the user; a purchase is a 0/1 rating (movie tickets, books)
- Examples have focused categories: music, courses, restaurants
  - hard to cross categories with content-based methods
  - easier to cross categories with collaborative methods
  - do users share tastes across categories?

Content-based recommendation
- Items must have characteristics that the user values
- Model each item as a vector of weights of its characteristics, much like vector-based IR
- The user can give explicit preferences for certain characteristics

Content-based example
- The user bought book 1 and book 2 (what if the user had actually rated them?)
- (Table of characteristic weights — 1st person, romance, mystery, sci-fi — for book 1, book 2, new book A, new book B.)
- Average the books bought into a profile vector
- Score new books by dot product with the profile: score(A), score(B)
- Decide a threshold for recommendation

Example with explicit user preferences
- How to use the scores of the books bought? Try a preference vector p where component k is:
  - the user's explicit preference for characteristic k, if one is given
  - the average of component k over the books bought, when the user's preference is "don't care"
- Preferences may be negative (dislike)
- (Table of user preferences and characteristic weights — 1st person, romance, mystery, sci-fi — for book 1, book 2, new A, new B.)
- New scores: p·A, p·B
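The content-based scoring just described can be sketched as follows. All characteristic weights and the four characteristics (1st person, romance, mystery, sci-fi) are invented for illustration; the rule for building p — explicit preference where given, average of the bought items where the user says "don't care" — follows the description above:

```python
# Sketch of content-based scoring with a preference vector.
# Characteristics: [1st-person, romance, mystery, sci-fi]; numbers invented.

def preference_vector(bought, explicit):
    """Explicit preference (+1 like, -1 dislike) where given;
    otherwise fall back to the average weight over items bought (0 = don't care)."""
    n = len(explicit)
    avg = [sum(item[k] for item in bought) / len(bought) for k in range(n)]
    return [explicit[k] if explicit[k] != 0 else avg[k] for k in range(n)]

def score(p, item):
    """Dot product of preference vector and item vector."""
    return sum(pk * ik for pk, ik in zip(p, item))

book1 = [1.0, 0.8, 0.0, 0.0]
book2 = [1.0, 0.0, 0.6, 0.0]
# User likes 1st-person narration, dislikes sci-fi, doesn't care otherwise.
p = preference_vector([book1, book2], explicit=[1, 0, 0, -1])

new_a = [0.9, 0.4, 0.5, 0.0]
new_b = [0.0, 0.0, 0.2, 0.9]
print(score(p, new_a), score(p, new_b))  # recommend items scoring above a threshold
```

Here new book B scores negatively because of the explicit dislike of sci-fi, so only new book A would clear any non-negative threshold.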
Content-based: issues
- Vector-based is one alternative; the major alternatives are based on machine learning
- For the vector-based approach:
  - how to build a preference vector
  - how to combine the vectors of items rated by the user (our example used only 0/1 ratings)
  - how to include explicit user preferences
  - what metric to use for similarity between new items and the preference vector
  - normalization? threshold?

Limitations of content-based methods
- Can only recommend items similar to those the user rated highly
- New users: insufficient number of rated items
- Only considers features explicitly associated with items; attributes of the user are not included

Collaborative filtering
- Recommend new items liked by other users similar to this user
- Need items already rated by this user and by other users
- Don't need characteristics of the items: each rating by an individual user becomes a characteristic
- Can combine with item characteristics: hybrid content/collaborative

Method types (see the Adomavicius and Tuzhilin paper)
- Memory-based
  - similar to the vector model
  - use the (user, item) matrix and a similarity function
  - prediction based on previously rated items
- Model-based
  - machine-learning methods
  - model of the probabilities of (user, item) ratings

Memory-based: preliminaries
- Notation:
  - r(u,i) = rating of item i by user u
  - I_u = set of items rated by user u
  - I_u,v = set of items rated by both users u and v
- Adjust scales for user differences using the average rating of user u:
  r_avg(u) = (1/|I_u|) · Σ_{i in I_u} r(u,i)
- Adjusted ratings: r_adj(u,i) = r(u,i) − r_avg(u)

One memory-based method: user similarities
- Similarity between users u and v: the Pearson correlation coefficient
  sim(u,v) = Σ_{i in I_u,v} r_adj(u,i)·r_adj(v,i) / [ Σ_{i in I_u,v} r_adj(u,i)² · Σ_{i in I_u,v} r_adj(v,i)² ]^(1/2)
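The user-similarity computation above can be sketched directly. The ratings dictionary and user/book names below are invented; the averages, adjusted ratings, and Pearson formula follow the definitions just given:

```python
# Sketch of memory-based user similarity (Pearson correlation on
# mean-adjusted ratings). Ratings dict {user: {item: rating}} is invented.
from math import sqrt

def user_avg(ratings, u):
    """Average rating r_avg(u) over all items rated by u."""
    items = ratings[u]
    return sum(items.values()) / len(items)

def sim(ratings, u, v):
    """Pearson correlation over the items rated by both u and v."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    ru, rv = user_avg(ratings, u), user_avg(ratings, v)
    num = sum((ratings[u][i] - ru) * (ratings[v][i] - rv) for i in common)
    den = sqrt(sum((ratings[u][i] - ru) ** 2 for i in common)
               * sum((ratings[v][i] - rv) ** 2 for i in common))
    return num / den if den else 0.0

ratings = {
    "u1": {"b1": 5, "b2": 3, "b3": 4},
    "u2": {"b1": 4, "b2": 2, "b3": 3},
    "u3": {"b1": 1, "b2": 5, "b3": 2},
}
print(sim(ratings, "u1", "u2"))  # identical tastes after mean-adjustment -> 1.0
print(sim(ratings, "u1", "u3"))  # opposed tastes -> negative
```

Note how the mean adjustment makes u1 and u2 perfectly similar even though u2 rates everything one point lower: it is exactly the scale correction the preliminaries call for.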
One memory-based method: item similarities
- Similarity between items i and j: cosine measure on the vectors of adjusted ratings by the users in U_i,j
  sim(i,j) = Σ_{u in U_i,j} r_adj(u,i)·r_adj(u,j) / [ Σ_{u in U_i,j} r_adj(u,i)² · Σ_{u in U_i,j} r_adj(u,j)² ]^(1/2)

Predicting a user's rating of a new item: user-based
- For an item i not rated by user u:
  r_pred(u,i) = r_avg(u) + Σ_{v in S} sim(u,v)·r_adj(v,i) / Σ_{v in S} |sim(u,v)|
- S can be all users, or just the users most similar to u

Predicting a user's rating of a new item: item-based
- For an item i not rated by user u:
  r_item-pred(u,i) = Σ_{j in T} sim(i,j)·r(u,j) / Σ_{j in T} |sim(i,j)|
- T can be all items, or just the items most similar to i
- The prediction uses only u's ratings, but the similarities use other users' ratings

Collaborative filtering example
- (Table of raw and adjusted ratings: users 1-4 × books 1-4; user 4's rating of book 4 is unknown and is to be predicted.)
- sim(u1,u4) = (6+2)/(10·8)^(1/2) = 0.894
- sim(u2,u4) = (−2)/(5·4)^(1/2) = −0.447
- sim(u3,u4) is computed the same way
- predict r(u4, book4) = r_avg(u4) + [r_adj(u1,b4)·0.894 + r_adj(u2,b4)·(−0.447) + r_adj(u3,b4)·sim(u3,u4)] / (0.894 + 0.447 + |sim(u3,u4)|)

Limitations
- May not have enough ratings for new users
- New items may not be rated by enough users
- Need a critical mass of users
- All similarities are based on user ratings
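The user-based prediction formula can be sketched with the similarities treated as already computed. The neighbor similarities and adjusted ratings below are invented; only the formula — target user's average plus a similarity-weighted average of the neighbors' adjusted ratings, normalized by the sum of absolute similarities — comes from the slide:

```python
# Sketch of the user-based prediction r_pred(u,i).
# neighbors: list of (sim(u,v), r_adj(v,i)) pairs; all values invented.

def predict(u_avg, neighbors):
    """r_pred(u,i) = r_avg(u) + sum(sim * r_adj) / sum(|sim|)."""
    num = sum(s * r_adj for s, r_adj in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return (u_avg + num / den) if den else u_avg

# User u averages 3.0. Two similar users rated item i above their own
# averages; one dissimilar user rated it below theirs.
neighbors = [(0.9, 1.0), (0.8, 0.5), (-0.4, -1.0)]
print(predict(3.0, neighbors))  # pushed above u's average of 3.0
```

The absolute values in the denominator matter: a strongly *negative* neighbor still contributes its full weight to the normalization, while its disagreement pulls the numerator in the opposite direction.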
Recommendation techniques and search: content-based
- Content-based recommendation is analogous to query refinement with user feedback:
  - item ↔ document
  - characteristic ↔ term
  - user preferences ↔ initial query
  - ratings for previous items ↔ user ratings of the relevance of initial results

Recommendation techniques and search: collaborative filtering
- Analogy with product recommendation: users' behavior on the same search (i.e., the same query)
  - item ↔ search result
  - rating ↔ clicked / not clicked
- Predict whether a user will click on a result based on the behavior of similar users
  - user similarity based on what both users have clicked on for this search
- More general: predict the best results based on broader notions of user similarity
- Hybrid content and collaborative approaches