# Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets


On the other hand, within an item-based type of recommendation [5, 12], the target item $i$ is associated with the score

$$\hat{r}^I_{ui} = \sum_{j \in I} w_{ij} r_{uj} = \sum_{j \in I(u)} w_{ij}, \qquad (2)$$

and hence the score is proportional to the similarities between item $i$ and the other items already purchased by the user $u$ (the $j \in I(u)$).

### 2.3 MF and MBCF unified

We end the section by noticing that, in both user-based and item-based MBCF, the user and item contributions are decomposed, that is, we can write

$$\hat{r}^U_{ui} = x_u^\top y_i, \quad \text{with } x_u, y_i \in \mathbb{R}^n,$$

and

$$\hat{r}^I_{ui} = x_u^\top y_i, \quad \text{with } x_u, y_i \in \mathbb{R}^m.$$

In other words, similarly to the MF case, we are implicitly defining an embedding for users and items. The embedding is onto the $n$-dimensional space of users in user-based recommendation ($k = n$), and onto the $m$-dimensional space of items in item-based recommendation ($k = m$).

## 3. Similarities for CF

The cosine similarity is undoubtedly the most popular measure of correlation between two vectors. An important characteristic of this similarity measure is that it is symmetric, a feature that can be important for certain applications. However, in our opinion, there are cases where the symmetry of the similarity measure does not exactly match the features of a domain. As we will see in the following, asymmetries are very common in CF applications. We now generalize the cosine similarity and suggest a nice probabilistic interpretation for binary valued vectors.

### 3.1 Asymmetric Cosine (AsymC) Similarity

Consider three sets $A$, $B$, $C$ with $|C| = q$, and assume two relations exist, namely $R_A \subseteq A \times C$ and $R_B \subseteq B \times C$. Now, let $a \in A$ and $b \in B$. We can define binary $q$-dimensional vector representations $a \in \{0,1\}^q$ and $b \in \{0,1\}^q$ on the basis of the corresponding relations. Specifically, each position in $a$ and $b$ corresponds to a specific element $c \in C$ and is set to 1 whenever $(a, c) \in R_A$ and $(b, c) \in R_B$, respectively.
Now we can formally define the conditional probability $P(a|b)$, that is, the probability that, given $c \in C$ such that $(b, c) \in R_B$, then $(a, c) \in R_A$ also holds. Similarly, we can define $P(b|a)$. These probabilities can be computed with simple vector operations:

$$P(a|b) = \frac{a^\top b}{b^\top b} \quad \text{and} \quad P(b|a) = \frac{a^\top b}{a^\top a},$$

where $a^\top b$ counts the number of times $(a, c) \in R_A$ and $(b, c) \in R_B$ co-occur, while $a^\top a = \|a\|^2$ and $b^\top b = \|b\|^2$ count the number of times $(a, c) \in R_A$ occurs and the number of times $(b, c) \in R_B$ occurs, respectively. Nicely, with the definitions above, the cosine similarity can be seen as the product of the square roots of the two conditional probabilities:

$$S_{1/2}(a, b) := \cos(a, b) = \frac{a^\top b}{\|a\|\,\|b\|} = P(a|b)^{\frac{1}{2}}\, P(b|a)^{\frac{1}{2}}. \qquad (3)$$

The idea behind the asymmetric cosine similarity is to give asymmetric weights to the conditional probabilities in the formula above:

$$S_\alpha(a, b) = P(b|a)^{\alpha}\, P(a|b)^{1-\alpha} = \frac{a^\top b}{\|a\|^{2\alpha}\, \|b\|^{2(1-\alpha)}},$$

where $0 \le \alpha \le 1$. Using the notation of sets, and setting $R(x) = \{c \in C \mid (x, c) \in R\}$, we can simply write:

$$S_\alpha(a, b) = \frac{|R_A(a) \cap R_B(b)|}{|R_A(a)|^{\alpha}\, |R_B(b)|^{1-\alpha}}.$$

Finally, note that eq. (3) still holds for real valued (non binary) vector representations. Clearly, in this case the strict probabilistic interpretation cannot be applied, but we can still regard $P(a|b)$ as a function indicating how much information knowing that $(b, c) \in R_B$ gives for predicting whether $(a, c) \in R_A$.

### 3.2 User-based and Song-based similarity

Cosine and Pearson's are standard measures of correlation in CF applications. To the best of our knowledge, not much has been done so far to adapt these similarities to specific problems. In our opinion, no single similarity measure can fit all possible domains where collaborative filtering is used. To bridge this gap and fit different problems, we consider a parametric family of similarities obtained by instantiating the asymmetric cosine in both the user-based and the item-based setting.
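To make the probabilistic reading concrete, the following is a minimal sketch (not the paper's code; plain NumPy with invented indicator vectors) of the two conditional probabilities, the asymmetric cosine, and the identity of eq. (3):

```python
import numpy as np

def cond_prob(a, b):
    """P(a|b): fraction of b's positive positions that are also positive in a."""
    return float(a @ b) / float(b @ b)

def asym_cos(a, b, alpha):
    """Asymmetric cosine over binary indicator vectors: alpha = 1/2 gives the
    plain cosine, while the endpoints alpha = 0, 1 give the two conditionals."""
    return float(a @ b) / (float(a @ a) ** alpha * float(b @ b) ** (1 - alpha))

# toy indicator vectors over q = 5 elements of C (assumed non-empty)
a = np.array([1, 1, 1, 0, 0])
b = np.array([1, 1, 0, 0, 0])

# eq. (3): the cosine is the geometric mean of the two conditional probabilities
cosine = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
assert abs(cosine - np.sqrt(cond_prob(a, b) * cond_prob(b, a))) < 1e-12
assert abs(cosine - asym_cos(a, b, 0.5)) < 1e-12
```

The sketch assumes both vectors contain at least one 1; varying `alpha` interpolates between normalizing by one support size or the other.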
The MSD data have no relevance grades, since the ratings are binary. This is a first simplification that, as we have seen, we can exploit in the definition of the similarity functions. In the binary case, the cosine similarity can be computed as follows. Let $I(u)$ be the set of items rated by a generic user $u$; then the cosine similarity between two users $u$, $v$ is defined by

$$w_{uv} = S_{1/2}(u, v) = \frac{|I(u) \cap I(v)|}{|I(u)|^{\frac{1}{2}}\, |I(v)|^{\frac{1}{2}}}$$

and, similarly for items, setting $U(i)$ to be the set of users who have rated item $i$, we obtain:

$$w_{ij} = S_{1/2}(i, j) = \frac{|U(i) \cap U(j)|}{|U(i)|^{\frac{1}{2}}\, |U(j)|^{\frac{1}{2}}}.$$

Note that, especially in the item case, we are more interested in computing how likely it is that an item will be appreciated by a user when we already know that the same user likes another item. Clearly, this definition is not symmetric. For example, say we know that a user has already listened to U2's song "Party Girl", which we know is not so popular. Then, probably, that user is a fan of U2, and we can easily predict that she/he would also like something far more popular by U2, like "Sunday Bloody Sunday" for instance. Clearly, the opposite is not true. As an extreme alternative to the cosine similarity, we can resort to the conditional probability measure, which can be estimated with the following formulas:

$$w_{uv} = S_0(u, v) = P(u|v) = \frac{|I(u) \cap I(v)|}{|I(v)|}$$

and

$$w_{ij} = S_0(i, j) = P(i|j) = \frac{|U(i) \cap U(j)|}{|U(j)|}.$$
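The same weights can be computed directly from the sets $U(\cdot)$ (or $I(\cdot)$). The sketch below is illustrative, not the challenge code, and the two song audiences are invented; it shows how the conditional-probability endpoint treats a popular and an unpopular song very differently, while the cosine does not:

```python
def sim(Ua, Ub, alpha):
    """|Ua ∩ Ub| / (|Ua|^alpha * |Ub|^(1-alpha)): the cosine for alpha = 1/2,
    a pure conditional probability at the endpoints alpha = 0 and alpha = 1."""
    inter = len(Ua & Ub)
    if inter == 0:
        return 0.0
    return inter / (len(Ua) ** alpha * len(Ub) ** (1 - alpha))

# an unpopular song listened to by 2 users, a popular one listened to by 20,
# with the unpopular song's audience contained in the popular one's
rare = {0, 1}
hit = set(range(20))

symmetric = sim(rare, hit, 0.5)                      # same in both directions
one_way, other_way = sim(rare, hit, 0.0), sim(hit, rare, 0.0)
```

Here `one_way` and `other_way` differ by a factor of ten: knowing a user likes the rare song is far more informative about the popular one than vice versa, which is exactly the "Party Girl" asymmetry described above.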

Previous works (see [8] for example) pointed out that the conditional probability measure of similarity, $P(i|j)$, has the limitation that items which are purchased frequently tend to have high values not because of their co-occurrence frequency but because of their popularity. As we have seen, this might not always be a limitation in a recommendation setting like ours. The experimental results in this paper confirm that emphasizing one of the conditionals asymmetrically helps in general. In the rest of the paper we use the parametric generalization of the above similarity measures, called the asymmetric cosine similarity. The parametrization on $\alpha$ permits ad-hoc optimization of the similarity function for the domain of interest; in our case, this is done by validating on the available data. Summarizing, we have:

$$w_{uv} = S_\alpha(u, v) = \frac{|I(u) \cap I(v)|}{|I(u)|^{\alpha}\, |I(v)|^{1-\alpha}} \qquad (4)$$

$$w_{ij} = S_\alpha(i, j) = \frac{|U(i) \cap U(j)|}{|U(i)|^{\alpha}\, |U(j)|^{1-\alpha}} \qquad (5)$$

where $\alpha \in [0, 1]$ is a parameter to tune. Note that this similarity function generalizes the one given in [5], where a similar parameterization is proposed for the item-based case with strictly empirical motivations. Here we give a formal justification of the rescaling factor they used, by defining the similarity as an asymmetric product of conditional probabilities.

### 3.3 Locality of the Scoring Function

In Section 2.2 we have seen how the final recommendation is computed by a scoring function that combines the scores obtained using individual users or items. It is therefore important to determine how much each individual scoring component influences the overall score. MBCF algorithms typically approach this problem by restricting the computation to nearest neighbors.
Alternatively, we propose to apply a monotone non-decreasing function $f(w)$ to the weights, so as to emphasize or de-emphasize similarity contributions and thereby adjust the locality of the scoring function, that is, how many, and how much, of the nearest users/items really matter in the computation. As we will see, a correct setting of this function turned out to be very useful with the challenge data. In particular, we use the family of functions $f(w) = w^q$ with $q \in \mathbb{N}$. The effect of this exponentiation in both eq. (1) and eq. (2) is the following: when $q$ is high, smaller weights drop towards zero while higher ones are (relatively) emphasized; at the other extreme, when $q = 0$, the aggregation simply adds up the ratings. Note that, in the user-based scoring function, this corresponds to taking the popularity of an item as its score, while in the item-based scoring function it yields a constant for all items (the number of ratings made by the active user).

## 4. Recommendation

In this section we describe a novel recommendation technique based on the asymmetric cosine combination of user and item embeddings. Furthermore, we describe additional techniques, such as calibration and aggregation, that improve the final recommendation.

### 4.1 Asymmetric Cosine based CF

In Section 2.3 it is shown how the scoring function of a memory-based model can be formulated as $\hat{r}_{ui} = x_u^\top y_i$, where $x_u$ and $y_i$ are appropriate representations of users and items, respectively.
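The effect of the locality exponent $q$ of Section 3.3 on the item-based score of eq. (2) can be seen on a toy example (the similarity values below are invented for illustration):

```python
def item_score(i, rated, sim, q):
    """Item-based score of eq. (2) with the locality function f(w) = w**q
    applied to each similarity weight."""
    return sum(sim[(i, j)] ** q for j in rated)

# candidate 'x' has one strong and one weak neighbour among the rated items,
# candidate 'y' has two medium ones; the raw sums (q = 1) tie exactly
sim = {('x', 'a'): 0.9, ('x', 'b'): 0.1,
       ('y', 'a'): 0.5, ('y', 'b'): 0.5}
rated = ['a', 'b']

s1 = (item_score('x', rated, sim, 1), item_score('y', rated, sim, 1))
s3 = (item_score('x', rated, sim, 3), item_score('y', rated, sim, 3))
s0 = (item_score('x', rated, sim, 0), item_score('y', rated, sim, 0))
```

With $q = 3$ the single strong neighbour dominates and 'x' overtakes 'y'; with $q = 0$ every candidate scores $|I(u)| = 2$, reproducing the degenerate constant-score case noted above.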
To further increase the flexibility of the prediction, we can resort again to the asymmetric cosine similarity defined in Section 3.1 and replace the scoring function with the following:

$$\hat{r}_{ui} = S_\beta(u, i) = \frac{x_u^\top y_i}{\|x_u\|^{2\beta}\, \|y_i\|^{2(1-\beta)}} \qquad (6)$$

One can easily note that, when $\beta = 1$, the order induced by this scoring function over different songs is the same as the one induced by the standard scoring function, since for a fixed user the two differ only by a (positive) constant. Since we are focusing on top-N recommendation, and hence we are only interested in the ranking among songs, the function above can be considered a generalization of the standard memory-based CF scoring function. We can now analyze this new scoring function in more depth for the user-based and item-based prediction cases.

**User-based prediction.** In this case, the embedding is performed in $\mathbb{R}^n$, the space of users, according to

$$x_u^{(v)} = w_{uv}, \quad \|x_u\|^2 = \|w_u\|^2, \qquad y_i^{(v)} = r_{vi}, \quad \|y_i\|^2 = |U(i)|,$$

and the user-based scoring function becomes:

$$\hat{r}_{ui} = \frac{\sum_{v \in U(i)} w_{uv}}{\|w_u\|^{2\beta}\, |U(i)|^{1-\beta}}.$$

**Item-based prediction.** In this case, the embedding is performed in $\mathbb{R}^m$, the space of items, according to

$$x_u^{(j)} = r_{uj}, \quad \|x_u\|^2 = |I(u)|, \qquad y_i^{(j)} = w_{ij}, \quad \|y_i\|^2 = \|w_i\|^2,$$

and the item-based scoring function becomes:

$$\hat{r}_{ui} = \frac{\sum_{j \in I(u)} w_{ij}}{|I(u)|^{\beta}\, \|w_i\|^{2(1-\beta)}}.$$

Note that, in the item case, the normalization is defined in terms of the weight norms $\|w_i\|$, whose computation requires computing the weights $w_{ij}$ for every $j \in I$, and this is quite inefficient. A further preprocessing step, in which this norm is estimated for each item $i$, is therefore needed in this case. In our implementation, this step is performed by drawing items $j$ from a subsample of the items.

### 4.2 Calibration

The focus of the present work is on very large datasets. For this reason, up to now we have proposed memory- and time-efficient algorithms that avoid any time- and space-consuming model training.
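A compact sketch of the user-based predictor, combining the weights of eq. (4) with the AsymC scoring of eq. (6), could look as follows (illustrative only: toy dictionary data structures, brute-force norm computation, no subsampling):

```python
def user_based_score(u, i, items_of, users_of, alpha=0.5, beta=1.0):
    """AsymC user-based score: sum_{v in U(i)} w_uv over the eq. (6)
    normalizer ||w_u||^(2*beta) * |U(i)|^(1-beta)."""
    Iu = items_of[u]

    def w(v):  # eq. (4) weight between the target user u and user v
        inter = len(Iu & items_of[v])
        if inter == 0:
            return 0.0
        return inter / (len(Iu) ** alpha * len(items_of[v]) ** (1 - alpha))

    num = sum(w(v) for v in users_of[i])
    norm2 = sum(w(v) ** 2 for v in items_of if v != u)   # ||w_u||^2
    denom = (norm2 ** beta) * (len(users_of[i]) ** (1 - beta))
    return num / denom if denom else 0.0

# toy data: items_of maps user -> set of songs, users_of maps song -> listeners
items_of = {'u': {'a'}, 'v1': {'a', 'b'}, 'v2': {'b'}}
users_of = {'a': {'u', 'v1'}, 'b': {'v1', 'v2'}}
```

With $\beta = 1$ the denominator is $\|w_u\|^2$, a per-user constant, so the induced ranking is the standard memory-based one; smaller $\beta$ trades that off against the popularity term $|U(i)|$. Nothing here is trained, which is the point of the approach.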
However, simple statistics of a dataset can still be estimated efficiently and can potentially be useful for recommendation. In this section we present a simple technique to learn a basic model for songs. The model is trained using positive feedback only, and is hence quite efficient given the sparsity of the data. Basically, the idea is to calibrate the obtained scores for each item by comparing them to the average score over positive user-item pairs. For each song, the mean and the maximum value of the scoring function over positive pairs are estimated from data. More formally, let $R$ be the relation for which we want to compute the statistics. Then we define the positive feedback mean and the positive feedback maximum as follows:

$$m_i = \mathbb{E}_{(u,i) \in R}[\hat{r}_{ui}] \quad \text{and} \quad M_i = \max_{(u,i) \in R} \hat{r}_{ui}.$$

For the MSD data, for example, these statistics are estimated by taking the $\hat{r}_{ui}$ values for a sample of training users such that $i \in I(u)$; that is, we estimate the average score obtained by a song over the users who listened to it. Now, given $0 < \theta < 1$, the calibrated scoring function is defined as a simple piece-wise linear function:

$$C(\hat{r}_{ui}) = \begin{cases} \dfrac{\hat{r}_{ui}}{m_i}\,\theta & \text{if } \hat{r}_{ui} \le m_i \\[2mm] \theta + \dfrac{\hat{r}_{ui} - m_i}{M_i - m_i}\,(1-\theta) & \text{if } m_i < \hat{r}_{ui} \le M_i \\[2mm] 1 & \text{otherwise.} \end{cases}$$

In our experiments we used a fixed value of $\theta$.

### 4.3 Aggregation

There are many sources of information available regarding songs. For example, it could be useful to consider the additional metadata that is also available, and to construct alternative rankings based on it. In the following we propose three simple strategies. We assume we have a probability distribution $p_k$, with $\sum_k p_k = 1$, that determines the weight to give to predictor $k$ in the aggregated prediction. In our approach the best $p_k$ values are simply determined by validation on training data.

#### 4.3.1 Stochastic Aggregation

Different recommendation strategies are usually individually precision oriented, meaning that each strategy is able to correctly recommend a few of the correct songs with high confidence, while other songs the user likes cannot be suggested by that particular ranker. Hopefully, if the rankers are diverse, then the rankers can recommend different songs.
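The piece-wise linear calibration of Section 4.2 is straightforward to implement. The sketch below is illustrative: it assumes $m_i > 0$, and since the paper's actual $\theta$ value is not reproduced here, 0.5 is used as an arbitrary placeholder:

```python
def calibrate(score, m_i, M_i, theta=0.5):
    """Map a raw score to [0, 1]: theta at the positive-pair mean m_i,
    1 at (or beyond) the positive-pair maximum M_i."""
    if score <= m_i:
        return (score / m_i) * theta          # linear ramp from 0 to theta
    if score <= M_i:
        return theta + (score - m_i) / (M_i - m_i) * (1 - theta)
    return 1.0                                # scores above M_i saturate
```

For instance, with $m_i = 2$ and $M_i = 10$, a raw score of 2 maps to $\theta$ and a raw score of 6 maps half-way between $\theta$ and 1. Calibrated scores for different songs, or different predictors, then live on a common $[0, 1]$ scale, which is what score-level aggregation relies on.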
When the rankers are indeed diverse, a possible solution is to build a final recommendation containing the songs in which the individual strategies are most confident. The stochastic aggregation strategy used in the challenge can be described as follows. We assume we are provided with the lists of songs not yet rated by the active user, given in order of confidence, for each of the basic strategies. At each step, the recommender randomly chooses one of the lists according to a probability distribution $p_k$ over the predictors, and recommends the best scored item of that list which has not yet been inserted in the current recommendation.

#### 4.3.2 Linear Aggregation

Another very simple aggregation rule is presented here, where the scores given by the different predictors are simply averaged. Clearly, in this case, the scores should be comparable. When they are not, we can assume the scores $\hat{r}^{(k)}_{ui}$ are proportional to the probability of observing item $i$ given the user $u$, as estimated by the $k$-th predictor. In this case, the item scores of each predictor can be normalized in such a way that $\|\hat{r}^{(k)}_u\|_1 = \sum_i \hat{r}^{(k)}_{ui} = 1$.

#### 4.3.3 Borda Aggregation

The last aggregation method we propose is a slight variant of the well-known Borda Count method. In this case, each item $i$ gets the score $\sum_k p_k (|I| - q_k(i) + 1)$, where $q_k(i) \in [1, |I|]$ is the position of item $i$ in the list produced by ranker $k$.

### 4.4 Optimizations

The computational complexity of the proposed technique mainly depends on the number of weight computations. In the item-based strategy, for each test user, the number of weights to compute is proportional to $\bar{I}\,|I|$, where $\bar{I} = \mathbb{E}_u[|I(u)|]$ and the expectation is taken over the visible songs of the test users. In the user-based strategy, for each test user, the number of weights to compute is proportional to $\bar{U}\,|I|$, where $\bar{U} = \mathbb{E}_i[|U(i)|]$ and the expectation is taken over the available songs.
In the user-based case, however, we can compute all the user-user weights for a test user $u$ once, and accumulate the contribution of each weight $w_{uv}$ onto the score of every item in $I(v)$. This strategy empirically reduced the time required by a factor of 12 on the MSD data.

## 5. Experiments and Results

In the MSD challenge we have: i) the full listening history for about 1M users, and ii) half of the listening history for 110K users (a 10K validation set and a 100K test set), and we have to predict the missing half. We use a "home-made" random validation split (HV) of the original training data with about 900K training users (HVtr, with full listening history). The remaining 100K user histories have been split into two halves (HVvi, the visible one, and HVhi, the hidden one). The experiments presented in this section are based on these HV data and compare different similarities and different approaches. The baseline is the simple popularity-based method, which recommends the most popular songs not yet listened to by the user. Besides the baseline, we report experiments on both the user-based and song-based scoring functions, and an example of the application of ranking aggregation. Given the size of the datasets involved, we do not dwell on the statistical significance of the presented results; indeed, they do not differ significantly from the results obtained over the independent set of users used as the test set in the challenge. In the following results, when not explicitly indicated, we assume $\beta = 1$ in eq. (6), thus obtaining the standard scoring function for memory-based CF.

### 5.1 Taste Profile Subset Stats

For completeness, we report some statistics about the original challenge training data: the minimum, maximum, average, and median number of users per song and of songs per user. (Table of values omitted.)
We can see that the large majority of songs have been listened to by only a few users (fewer than 13 users for half of the songs), and that the large majority of users have listened to few songs (fewer than 27 for half of the users). These characteristics of the dataset make the top-N recommendation task quite challenging.
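Before turning to evaluation, the stochastic aggregation of Section 4.3 can be sketched as follows (an illustrative reimplementation, not the challenge code; it assumes strictly positive $p_k$ for the lists still holding candidates):

```python
import random

def stochastic_aggregate(lists, p, n, seed=0):
    """Merge ranked lists: repeatedly draw a list k with probability
    proportional to p[k] and emit its best not-yet-recommended song."""
    rng = random.Random(seed)
    pools = [list(l) for l in lists]          # working copies of the rankings
    rec, seen = [], set()
    while len(rec) < n:
        alive = [k for k in range(len(pools)) if pools[k]]
        if not alive:
            break                             # every list is exhausted
        k = rng.choices(alive, weights=[p[j] for j in alive])[0]
        while pools[k] and pools[k][0] in seen:
            pools[k].pop(0)                   # skip songs already recommended
        if pools[k]:
            song = pools[k].pop(0)
            rec.append(song)
            seen.add(song)
    return rec
```

Because already-emitted songs are skipped, diverse rankers contribute their own confident picks, while heavily overlapping rankers degenerate to a single list.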

### 5.2 Truncated Mean Average Precision

In conformity with the challenge, we used the truncated mAP (mean average precision) as the evaluation metric [10]. Let $y$ denote a ranking over items, where $y(p) = i$ means that item $i$ is ranked at position $p$. The mAP metric emphasizes the top recommendations. For any $k \in \mathbb{N}$, the precision at $k$, $\pi_k$, is defined as the proportion of correct recommendations within the top-$k$ of the predicted ranking (assuming the ranking $y$ does not contain the visible songs):

$$\pi_k(u, y) = \frac{1}{k} \sum_{p=1}^{k} r_{u\,y(p)}.$$

For each user, the (truncated) average precision is the average of the precision at each recall point:

$$AP(u, y) = \frac{1}{N_u} \sum_{p=1}^{N} \pi_p(u, y)\, r_{u\,y(p)},$$

where $N_u$ is the smaller of $N$ and the number of user $u$'s positively associated songs. Finally, averaging $AP(u, y_u)$ over all users gives the mean average precision (mAP).

### 5.3 Results on MSD data

The result obtained on the HV data with the baseline (recommendation by popularity) is presented in Table 1. With this strategy, each song $i$ simply gets a score proportional to the number of users $|U(i)|$ who have listened to it.

*(Table 1: Results obtained by the baseline (recommendation by popularity) and by the item-based (IS) and user-based (US) CF methods, varying the locality parameter (exponent $q$) of the similarity function. Numeric entries omitted.)*

**Effect of locality (q).** In Table 1 we report experiments that show the effect of the locality parameter $q$ for the different strategies, item-based and user-based, using the conditional probability ($\alpha = 0$) and the cosine version ($\alpha = 0.5$). As we can see, except for the IS case with cosine similarity (Table 1c), a correct setting of the parameter $q$ dramatically improves the effectiveness on HV data. The best performance is reached with the conditional probability on an item-based strategy (Table 1b).

**Effect of using Asymmetric Cosine (α).**
In Figure 1 we report the results obtained by fixing the parameter $q$ and varying the parameter $\alpha$, for both the user-based and item-based recommendation strategies. In the item-based case, the results improve when setting a non-trivial $\alpha$; in fact, the best result has been obtained for $\alpha = 0.15$.

**Effect of Asymmetric Prediction (β).** In Table 2 we report the results of the user-based AsymC memory-based collaborative filtering algorithm of Section 4.1 with $\alpha = 0.5$. Interestingly, as we can see from the table, the best parameter $\beta$ always improves (sometimes dramatically) on the baseline ($\beta = 1$, results in Table 1(e)).

**Effect of Calibration.** To test the effect of calibration on item-based recommendation, we started with the best parameter setting for uncalibrated recommendation ($\alpha = 0.15$, $q = 3$) and calibrated using the technique described in Section 4.2. The calibration indeed improves the result, yielding the best result obtained for this dataset so far. Concerning the effect of calibration on user-based recommendation, we started again with the best setting for uncalibrated user-based recommendation ($\alpha = 0.5$, $q = 4$, $\beta = 0.7$) and applied the same calibration.

*(Figure 1: Results obtained by the item-based (IS, $0 \le \alpha \le 0.5$, $q = 3$, best $\alpha = 0.15$) and user-based (US, $0 \le \alpha \le 1$, $q = 5$, best $\alpha = 0.6$) CF methods, varying the $\alpha$ parameter.)*
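The truncated mAP of Section 5.2 is easy to compute directly from its definition; a reference sketch (illustrative, with hypothetical rankings and positive sets):

```python
def truncated_map(rankings, positives, N):
    """Mean over users of the truncated average precision at cut-off N."""
    total = 0.0
    for u, y in rankings.items():
        pos = positives[u]
        n_u = min(N, len(pos))              # the normalizer N_u
        hits, ap = 0, 0.0
        for p, song in enumerate(y[:N], start=1):
            if song in pos:                 # r_{u, y(p)} = 1
                hits += 1
                ap += hits / p              # precision at this recall point
        total += ap / n_u if n_u else 0.0
    return total / len(rankings)
```

For example, ranking the two positives of a user first and second gives an AP of 1; burying one of them one position lower immediately lowers the score, which is how the metric emphasizes the top of the list.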

*(Table 2: User-based AsymC MBCF ($\alpha = 0.5$). Results obtained varying the parameter $q$ and selecting the best parameter $\beta$. Numeric entries omitted.)*

*(Table 3: Results obtained by aggregating the rankings of two different strategies, item-based (IS, $\alpha = 0.15$, $q = 3$) and user-based (US, $\alpha = 0.3$, $q = 5$), by stochastic aggregation with different combinations. The (0.8, 0.2) combination corresponds exactly to the winning configuration used in the MSD challenge (see Figure 2).)*

**Stochastic Aggregation.** Concerning stochastic aggregation, we report results obtained using the exact configuration that allowed us to win the MSD challenge (see Figure 2). In particular, in Table 3, two rankers are combined and their recommendations aggregated using the stochastic algorithm described in Section 4.3. In order to maximize the diversity of the two rankers, we aggregated an item-based ranker and a user-based ranker. We can see that the combination improves the performance with respect to the individual rankers on validation data.

**Linear Aggregation.** In Table 4 we report the results obtained by linearly combining the best performing uncalibrated item-based ranker ($\alpha = 0.15$, $q = 3$, $\beta = 1$) with the best performing uncalibrated user-based ranker ($\alpha = 0.5$, $q = 4$, $\beta = 0.7$), varying the combination parameters. The aggregation is performed according to the algorithm given in Section 4.3.

### 5.4 Comparison with other approaches

We end this section by comparing ours with other approaches that have been used in the challenge.
The best ranked teams all used variants of memory-based CF, except the 5th ranked team, which used the Absorption algorithm by YouTube [2], a graph-based method that performs a random walk on the rating graph to propagate preference information over the graph.

*(Table 4: Results obtained by linearly aggregating the rankings of two different strategies, item-based (IS, $\alpha = 0.15$, $q = 3$) and user-based (US, $\alpha = 0.5$, $q = 4$, $\beta = 0.7$) AsymC MBCF, with different combinations. Numeric entries omitted.)*

*(Figure 2: Screenshot of the final MSD challenge leaderboard.)*

The team placed second used an approach similar to ours to derive 17 user-item joint features for a learning-to-rank method (RankNet) [14]. On the other side, matrix factorization based techniques showed very poor performance on this task, and the people working on them faced serious memory and time efficiency problems. Finally, some teams tried to inject metadata information into the prediction process, with poor results. In our opinion, this may be due to the fact that the implicit information contained in the user's listening history is much richer than the explicit information one can get from metadata. We conclude that metadata information can be used more effectively in a cold-start setting.

### 5.5 Results on MovieLens Data

To demonstrate the generality of the approach, we present results on a different dataset, MovieLens1M. This dataset originally consists of more than 1 million ratings for 3,883 movies by 6,040 users. The dataset has been adapted to implicit feedback by considering the roughly 226,000 5-star ratings as positive feedback and ignoring all the other available ratings, thus obtaining a very sparse dataset. Similarly to the MSD dataset, the data have been split into 90% training users and 10% test users. The test user histories have again been split into two halves (visible and hidden).
Finally, the task consists of predicting the 5-star movies in the hidden half using the 5-star movies in the visible half and the complete history of the training users.
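As a final implementation note, the Section 4.4 optimization for user-based scoring — compute each user-user weight once and scatter it onto that user's items, instead of recomputing it per candidate item — can be sketched as follows (illustrative toy data; the weight is the eq. (4) similarity with the locality exponent applied):

```python
from collections import defaultdict

def user_based_scores(test_items, train_items, alpha=0.5, q=1):
    """All item scores for one test user in a single pass over training
    users: each weight w_uv is computed once and accumulated on every
    item in I(v)."""
    scores = defaultdict(float)
    for v, Iv in train_items.items():
        inter = len(test_items & Iv)
        if not inter:
            continue                      # no overlap, zero weight
        w = inter / (len(test_items) ** alpha * len(Iv) ** (1 - alpha))
        for j in Iv:
            scores[j] += w ** q           # locality function f(w) = w**q
    return scores

# toy run: one test user who listened to song 'a'
train = {'v1': {'a', 'b'}, 'v2': {'c'}}
scores = user_based_scores({'a'}, train)
```

Items the user has already listened to would be filtered out before ranking; the single pass over training users is what makes the per-user cost proportional to the number of nonzero weights rather than to the full item catalogue.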

### Performance Comparison of Algorithms for Movie Rating Estimation

Performance Comparison of Algorithms for Movie Rating Estimation Alper Köse, Can Kanbak, Noyan Evirgen Research Laboratory of Electronics, Massachusetts Institute of Technology Department of Electrical

### Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University {tedhong, dtsamis}@stanford.edu Abstract This paper analyzes the performance of various KNNs techniques as applied to the

### Recommendation Systems

Recommendation Systems CS 534: Machine Learning Slides adapted from Alex Smola, Jure Leskovec, Anand Rajaraman, Jeff Ullman, Lester Mackey, Dietmar Jannach, and Gerhard Friedrich Recommender Systems (RecSys)

### A Time-based Recommender System using Implicit Feedback

A Time-based Recommender System using Implicit Feedback T. Q. Lee Department of Mobile Internet Dongyang Technical College Seoul, Korea Abstract - Recommender systems provide personalized recommendations

### Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem

Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem Stefan Hauger 1, Karen H. L. Tso 2, and Lars Schmidt-Thieme 2 1 Department of Computer Science, University of

### Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions

### CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING Recommender Systems II Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Recommender Systems Recommendation via Information Network Analysis Hybrid Collaborative Filtering

### Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp

Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp Sumedh Sawant sumedh@stanford.edu Team 38 December 10, 2013 Abstract We implement a personal recommendation

### Matrix Co-factorization for Recommendation with Rich Side Information and Implicit Feedback

Matrix Co-factorization for Recommendation with Rich Side Information and Implicit Feedback ABSTRACT Yi Fang Department of Computer Science Purdue University West Lafayette, IN 47907, USA fangy@cs.purdue.edu

### New user profile learning for extremely sparse data sets

New user profile learning for extremely sparse data sets Tomasz Hoffmann, Tadeusz Janasiewicz, and Andrzej Szwabe Institute of Control and Information Engineering, Poznan University of Technology, pl.

### Experiences from Implementing Collaborative Filtering in a Web 2.0 Application

Experiences from Implementing Collaborative Filtering in a Web 2.0 Application Wolfgang Woerndl, Johannes Helminger, Vivian Prinz TU Muenchen, Chair for Applied Informatics Cooperative Systems Boltzmannstr.

### Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

Part 12: Advanced Topics in Collaborative Filtering Francesco Ricci Content Generating recommendations in CF using frequency of ratings Role of neighborhood size Comparison of CF with association rules

### BordaRank: A Ranking Aggregation Based Approach to Collaborative Filtering

BordaRank: A Ranking Aggregation Based Approach to Collaborative Filtering Yeming TANG Department of Computer Science and Technology Tsinghua University Beijing, China tym13@mails.tsinghua.edu.cn Qiuli

### Property1 Property2. by Elvir Sabic. Recommender Systems Seminar Prof. Dr. Ulf Brefeld TU Darmstadt, WS 2013/14

Property1 Property2 by Recommender Systems Seminar Prof. Dr. Ulf Brefeld TU Darmstadt, WS 2013/14 Content-Based Introduction Pros and cons Introduction Concept 1/30 Property1 Property2 2/30 Based on item

### Mining Web Data. Lijun Zhang

Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

### Query Likelihood with Negative Query Generation

Query Likelihood with Negative Query Generation Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 ylv2@uiuc.edu ChengXiang Zhai Department of Computer

### Bayesian Personalized Ranking for Las Vegas Restaurant Recommendation

Bayesian Personalized Ranking for Las Vegas Restaurant Recommendation Kiran Kannar A53089098 kkannar@eng.ucsd.edu Saicharan Duppati A53221873 sduppati@eng.ucsd.edu Akanksha Grover A53205632 a2grover@eng.ucsd.edu

### A Scalable, Accurate Hybrid Recommender System

A Scalable, Accurate Hybrid Recommender System Mustansar Ali Ghazanfar and Adam Prugel-Bennett School of Electronics and Computer Science University of Southampton Highfield Campus, SO17 1BJ, United Kingdom

### Introduction. Chapter Background Recommender systems Collaborative based filtering

Abstract Recommender systems are used extensively today in many areas to help users and consumers with making decisions. Amazon recommends books based on what you have previously viewed and purchased,

### NLMF: NonLinear Matrix Factorization Methods for Top-N Recommender Systems

1 NLMF: NonLinear Matrix Factorization Methods for Top-N Recommender Systems Santosh Kabbur and George Karypis Department of Computer Science, University of Minnesota Twin Cities, USA {skabbur,karypis}@cs.umn.edu

### ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

### Performance of Recommender Algorithms on Top-N Recommendation Tasks

Performance of Recommender Algorithms on Top-N Recommendation Tasks Paolo Cremonesi Politecnico di Milano Milan, Italy paolo.cremonesi@polimi.it Yehuda Koren Yahoo! Research Haifa, Israel yehuda@yahoo-inc.com

### Web Personalization & Recommender Systems

Web Personalization & Recommender Systems COSC 488 Slides are based on: - Bamshad Mobasher, Depaul University - Recent publications: see the last page (Reference section) Web Personalization & Recommender

### Neighborhood-Based Collaborative Filtering

Chapter 2 Neighborhood-Based Collaborative Filtering When one neighbor helps another, we strengthen our communities. Jennifer Pahlka 2.1 Introduction Neighborhood-based collaborative filtering algorithms,

### BPR: Bayesian Personalized Ranking from Implicit Feedback

452 RENDLE ET AL. UAI 2009 BPR: Bayesian Personalized Ranking from Implicit Feedback Steffen Rendle, Christoph Freudenthaler, Zeno Gantner and Lars Schmidt-Thieme {srendle, freudenthaler, gantner, schmidt-thieme}@ismll.de

### Recommender Systems. Collaborative Filtering & Content-Based Recommending

Recommender Systems Collaborative Filtering & Content-Based Recommending 1 Recommender Systems Systems for recommending items (e.g. books, movies, CD s, web pages, newsgroup messages) to users based on

### amount of available information and the number of visitors to Web sites in recent years

Collaboration Filtering using K-Mean Algorithm Smrity Gupta Smrity_0501@yahoo.co.in Department of computer Science and Engineering University of RAJIV GANDHI PROUDYOGIKI SHWAVIDYALAYA, BHOPAL Abstract:

### A Recursive Prediction Algorithm for Collaborative Filtering Recommender Systems

A Recursive Prediction Algorithm for Collaborative Filtering Recommender Systems ABSTRACT Jiyong Zhang Human Computer Interaction Group, Swiss Federal Institute of Technology (EPFL), CH-1015, Lausanne, Switzerland

### Kernels and representation

Kernels and representation Corso di AA, anno 2017/18, Padova Fabio Aiolli 20 Dicembre 2017 Fabio Aiolli Kernels and representation 20 Dicembre 2017 1 / 19 (Hierarchical) Representation Learning Hierarchical

### Matrix Co-factorization for Recommendation with Rich Side Information HetRec 2011 and Implicit 1 / Feedb 23

Matrix Co-factorization for Recommendation with Rich Side Information and Implicit Feedback Yi Fang and Luo Si Department of Computer Science Purdue University West Lafayette, IN 47906, USA fangy@cs.purdue.edu

### Yelp Recommendation System

Yelp Recommendation System Jason Ting, Swaroop Indra Ramaswamy Institute for Computational and Mathematical Engineering Abstract We apply principles and techniques of recommendation systems to develop

### Collaborative Filtering for Netflix

Collaborative Filtering for Netflix Michael Percy Dec 10, 2009 Abstract The Netflix movie-recommendation problem was investigated and the incremental Singular Value Decomposition (SVD) algorithm was implemented

### SUGGEST. Top-N Recommendation Engine. Version 1.0. George Karypis

SUGGEST Top-N Recommendation Engine Version 1.0 George Karypis University of Minnesota, Department of Computer Science / Army HPC Research Center Minneapolis, MN 55455 karypis@cs.umn.edu Last updated on

### ExcUseMe: Asking Users to Help in Item Cold-Start Recommendations

ExcUseMe: Asking Users to Help in Item Cold-Start Recommendations Michal Aharon Yahoo Labs, Haifa, Israel michala@yahoo-inc.com Dana Drachsler-Cohen Technion, Haifa, Israel ddana@cs.technion.ac.il Oren

### Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

### OBJECT-CENTERED INTERACTIVE MULTI-DIMENSIONAL SCALING: ASK THE EXPERT

OBJECT-CENTERED INTERACTIVE MULTI-DIMENSIONAL SCALING: ASK THE EXPERT Joost Broekens Tim Cocx Walter A. Kosters Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Email: {broekens,

### Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

### AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj}@cs.uiuc.edu,

### Factor in the Neighbors: Scalable and Accurate Collaborative Filtering

1 Factor in the Neighbors: Scalable and Accurate Collaborative Filtering YEHUDA KOREN Yahoo! Research Recommender systems provide users with personalized suggestions for products or services. These systems

### Application of Additive Groves Ensemble with Multiple Counts Feature Evaluation to KDD Cup 09 Small Data Set

Application of Additive Groves Application of Additive Groves Ensemble with Multiple Counts Feature Evaluation to KDD Cup 09 Small Data Set Daria Sorokina Carnegie Mellon University Pittsburgh PA 15213

### PERSONALIZED TAG RECOMMENDATION

PERSONALIZED TAG RECOMMENDATION Ziyu Guan, Xiaofei He, Jiajun Bu, Qiaozhu Mei, Chun Chen, Can Wang Zhejiang University, China Univ. of Illinois/Univ. of Michigan 1 Booming of Social Tagging Applications

### Available online at ScienceDirect. Procedia Technology 17 (2014 )

Available online at www.sciencedirect.com ScienceDirect Procedia Technology 17 (2014 ) 528 533 Conference on Electronics, Telecommunications and Computers CETC 2013 Social Network and Device Aware Personalized

### EigenRank: A Ranking-Oriented Approach to Collaborative Filtering

EigenRank: A Ranking-Oriented Approach to Collaborative Filtering ABSTRACT Nathan N. Liu Department of Computer Science and Engineering Hong Kong University of Science and Technology, Hong Kong, China

### COMP 465: Data Mining Recommender Systems

//0 movies COMP 6: Data Mining Recommender Systems Slides Adapted From: www.mmds.org (Mining Massive Datasets) movies Compare predictions with known ratings (test set T)????? Test Data Set Root-mean-square

### A Document-centered Approach to a Natural Language Music Search Engine

A Document-centered Approach to a Natural Language Music Search Engine Peter Knees, Tim Pohle, Markus Schedl, Dominik Schnitzer, and Klaus Seyerlehner Dept. of Computational Perception, Johannes Kepler

### Mixture Models and the EM Algorithm

Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is

### Jeff Howbert Introduction to Machine Learning Winter

Collaborative Filtering Nearest es Neighbor Approach Jeff Howbert Introduction to Machine Learning Winter 2012 1 Bad news Netflix Prize data no longer available to public. Just after contest t ended d

### A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University of Economics

### Predictive Indexing for Fast Search

Predictive Indexing for Fast Search Sharad Goel, John Langford and Alex Strehl Yahoo! Research, New York Modern Massive Data Sets (MMDS) June 25, 2008 Goel, Langford & Strehl (Yahoo! Research) Predictive

### arxiv: v1 [cs.ir] 27 Jun 2016

The Apps You Use Bring The Blogs to Follow arxiv:1606.08406v1 [cs.ir] 27 Jun 2016 ABSTRACT Yue Shi Yahoo! Research Sunnyvale, CA, USA yueshi@acm.org Liang Dong Yahoo! Tumblr New York, NY, USA ldong@tumblr.com

### A Formal Approach to Score Normalization for Meta-search

A Formal Approach to Score Normalization for Meta-search R. Manmatha and H. Sever Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst, MA 01003

### WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,

### Nearly-optimal associative memories based on distributed constant weight codes

Nearly-optimal associative memories based on distributed constant weight codes Vincent Gripon Electronics and Computer Enginering McGill University Montréal, Canada Email: vincent.gripon@ens-cachan.org

### Predicting Popular Xbox games based on Search Queries of Users

1 Predicting Popular Xbox games based on Search Queries of Users Chinmoy Mandayam and Saahil Shenoy I. INTRODUCTION This project is based on a completed Kaggle competition. Our goal is to predict which

### A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University

### Similarity Ranking in Large- Scale Bipartite Graphs

Similarity Ranking in Large- Scale Bipartite Graphs Alessandro Epasto Brown University - 20 th March 2014 1 Joint work with J. Feldman, S. Lattanzi, S. Leonardi, V. Mirrokni [WWW, 2014] 2 AdWords Ads Ads

### Parallel Evaluation of Hopfield Neural Networks

Parallel Evaluation of Hopfield Neural Networks Antoine Eiche, Daniel Chillet, Sebastien Pillement and Olivier Sentieys University of Rennes I / IRISA / INRIA 6 rue de Kerampont, BP 818 2232 LANNION,FRANCE

### Music Recommendation Engine

International Journal of Mathematics and Computational Science Vol. 1, No. 6, 2015, pp. 352-363 http://www.aiscience.org/journal/ijmcs ISSN: 2381-7011 (Print); ISSN: 2381-702X (Online) Music Recommendation

### A Survey on Postive and Unlabelled Learning

A Survey on Postive and Unlabelled Learning Gang Li Computer & Information Sciences University of Delaware ligang@udel.edu Abstract In this paper we survey the main algorithms used in positive and unlabeled

### Improved Neighborhood-based Collaborative Filtering

Improved Neighborhood-based Collaborative Filtering Robert M. Bell and Yehuda Koren AT&T Labs Research 180 Park Ave, Florham Park, NJ 07932 {rbell,yehuda}@research.att.com ABSTRACT Recommender systems

### Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

### The Open University s repository of research publications and other research outputs. Search Personalization with Embeddings

Open Research Online The Open University s repository of research publications and other research outputs Search Personalization with Embeddings Conference Item How to cite: Vu, Thanh; Nguyen, Dat Quoc;

### User Profiling for Interest-focused Browsing History

User Profiling for Interest-focused Browsing History Miha Grčar, Dunja Mladenič, Marko Grobelnik Jozef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia {Miha.Grcar, Dunja.Mladenic, Marko.Grobelnik}@ijs.si

### General Instructions. Questions

CS246: Mining Massive Data Sets Winter 2018 Problem Set 2 Due 11:59pm February 8, 2018 Only one late period is allowed for this homework (11:59pm 2/13). General Instructions Submission instructions: These

### Recommendation with Differential Context Weighting

Recommendation with Differential Context Weighting Yong Zheng Robin Burke Bamshad Mobasher Center for Web Intelligence DePaul University Chicago, IL USA Conference on UMAP June 12, 2013 Overview Introduction

### A Deep Relevance Matching Model for Ad-hoc Retrieval

A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese

### Semi supervised clustering for Text Clustering

Semi supervised clustering for Text Clustering N.Saranya 1 Assistant Professor, Department of Computer Science and Engineering, Sri Eshwar College of Engineering, Coimbatore 1 ABSTRACT: Based on clustering

### MetaData for Database Mining

MetaData for Database Mining John Cleary, Geoffrey Holmes, Sally Jo Cunningham, and Ian H. Witten Department of Computer Science University of Waikato Hamilton, New Zealand. Abstract: At present, a machine

### Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

### Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) (707.003) Data Matrices and Vector Space Model Denis Helic KTI, TU Graz Nov 6, 2014 Denis Helic (KTI, TU Graz) KDDM1 Nov 6, 2014 1 / 55 Big picture: KDDM Probability

### Distribution-free Predictive Approaches

Distribution-free Predictive Approaches The methods discussed in the previous sections are essentially model-based. Model-free approaches such as tree-based classification also exist and are popular for

### Real-time Video Recommendation Exploration

Real-time Video Recommendation Exploration Yanxiang Huang, Bin Cui Jie Jiang Kunqiang Hong Wenyu Zhang Yiran Xie Key Lab of High Confidence Software Technologies (MOE), School of EECS, Peking University

### Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

### Fast or furious? - User analysis of SF Express Inc

CS 229 PROJECT, DEC. 2017 1 Fast or furious? - User analysis of SF Express Inc Gege Wen@gegewen, Yiyuan Zhang@yiyuan12, Kezhen Zhao@zkz I. MOTIVATION The motivation of this project is to predict the likelihood

### A Model of Machine Learning Based on User Preference of Attributes

1 A Model of Machine Learning Based on User Preference of Attributes Yiyu Yao 1, Yan Zhao 1, Jue Wang 2 and Suqing Han 2 1 Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada

### ORT EP R RCH A ESE R P A IDI! " #\$\$% &' (# \$!"

R E S E A R C H R E P O R T IDIAP A Parallel Mixture of SVMs for Very Large Scale Problems Ronan Collobert a b Yoshua Bengio b IDIAP RR 01-12 April 26, 2002 Samy Bengio a published in Neural Computation,

### arxiv: v1 [cs.mm] 12 Jan 2016

Learning Subclass Representations for Visually-varied Image Classification Xinchao Li, Peng Xu, Yue Shi, Martha Larson, Alan Hanjalic Multimedia Information Retrieval Lab, Delft University of Technology

### Feature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.

CS 188: Artificial Intelligence Fall 2007 Lecture 26: Kernels 11/29/2007 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit your

### Density estimation. In density estimation problems, we are given a random from an unknown density. Our objective is to estimate

Density estimation In density estimation problems, we are given a random sample from an unknown density Our objective is to estimate? Applications Classification If we estimate the density for each class,

### CS294-1 Assignment 2 Report

CS294-1 Assignment 2 Report Keling Chen and Huasha Zhao February 24, 2012 1 Introduction The goal of this homework is to predict a users numeric rating for a book from the text of the user s review. The

### Some Applications of Graph Bandwidth to Constraint Satisfaction Problems

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Ramin Zabih Computer Science Department Stanford University Stanford, California 94305 Abstract Bandwidth is a fundamental concept

### SOMSN: An Effective Self Organizing Map for Clustering of Social Networks

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks Fatemeh Ghaemmaghami Research Scholar, CSE and IT Dept. Shiraz University, Shiraz, Iran Reza Manouchehri Sarhadi Research Scholar,

### Object and Action Detection from a Single Example

Object and Action Detection from a Single Example Peyman Milanfar* EE Department University of California, Santa Cruz *Joint work with Hae Jong Seo AFOSR Program Review, June 4-5, 29 Take a look at this:

### Personalized Web Search

Personalized Web Search Dhanraj Mavilodan (dhanrajm@stanford.edu), Kapil Jaisinghani (kjaising@stanford.edu), Radhika Bansal (radhika3@stanford.edu) Abstract: With the increase in the diversity of contents

### Towards Better User Preference Learning for Recommender Systems

Towards Better User Preference Learning for Recommender Systems by Yao Wu M.Sc., Chinese Academy of Sciences, 2012 B.Sc., University of Science and Technology of China, 2009 Dissertation Submitted in Partial

### A Collaborative Method to Reduce the Running Time and Accelerate the k-nearest Neighbors Search

A Collaborative Method to Reduce the Running Time and Accelerate the k-nearest Neighbors Search Alexandre Costa antonioalexandre@copin.ufcg.edu.br Reudismam Rolim reudismam@copin.ufcg.edu.br Felipe Barbosa

### Clustering Part 4 DBSCAN

Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

### Slope One Predictors for Online Rating-Based Collaborative Filtering

Slope One Predictors for Online Rating-Based Collaborative Filtering Daniel Lemire Anna Maclachlan February 7, 2005 Abstract Rating-based collaborative filtering is the process of predicting how a user

### From Ensemble Methods to Comprehensible Models

From Ensemble Methods to Comprehensible Models Cèsar Ferri, José Hernández-Orallo, M.José Ramírez-Quintana {cferri, jorallo, mramirez}@dsic.upv.es Dep. de Sistemes Informàtics i Computació, Universitat

### Online Social Networks and Media

Online Social Networks and Media Absorbing Random Walks Link Prediction Why does the Power Method work? If a matrix R is real and symmetric, it has real eigenvalues and eigenvectors: λ, w, λ 2, w 2,, (λ

### Stable Matrix Approximation for Top-N Recommendation on Implicit Feedback Data

Proceedings of the 51 st Hawaii International Conference on System Sciences 2018 Stable Matrix Approximation for Top-N Recommendation on Implicit Feedback Data Dongsheng Li, Changyu Miao, Stephen M. Chu

### on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 Vector visual representation Fixed-size image representation High-dim (100 100,000) Generic, unsupervised: BoW,

### Multiple Query Optimization for Density-Based Clustering Queries over Streaming Windows

Worcester Polytechnic Institute DigitalCommons@WPI Computer Science Faculty Publications Department of Computer Science 4-1-2009 Multiple Query Optimization for Density-Based Clustering Queries over Streaming

### Justified Recommendations based on Content and Rating Data

Justified Recommendations based on Content and Rating Data Panagiotis Symeonidis, Alexandros Nanopoulos, and Yannis Manolopoulos Aristotle University, Department of Informatics, Thessaloniki 54124, Greece

### An Improvement of Centroid-Based Classification Algorithm for Text Classification

An Improvement of Centroid-Based Classification Algorithm for Text Classification Zehra Cataltepe, Eser Aygun Istanbul Technical Un. Computer Engineering Dept. Ayazaga, Sariyer, Istanbul, Turkey cataltepe@itu.edu.tr,

### A Neural Network for Real-Time Signal Processing

248 MalkofT A Neural Network for Real-Time Signal Processing Donald B. Malkoff General Electric / Advanced Technology Laboratories Moorestown Corporate Center Building 145-2, Route 38 Moorestown, NJ 08057

### A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence

2nd International Conference on Electronics, Network and Computer Engineering (ICENCE 206) A Network Intrusion Detection System Architecture Based on Snort and Computational Intelligence Tao Liu, a, Da