Community-Based Recommendations: a Solution to the Cold Start Problem

Shaghayegh Sahebi
Intelligent Systems Program, University of Pittsburgh
sahebi@cs.pitt.edu

William W. Cohen
Machine Learning Department, Carnegie Mellon University
wcohen@cs.cmu.edu

ABSTRACT

The cold-start problem is a well-known issue in recommendation systems: there is relatively little information about each user, which results in an inability to draw inferences to recommend items to users. In this paper, we try to give a solution to this problem based on homophily in social networks: we can use social network information to fill the gap that exists in the cold-start problem and find similarities between users. In this study, we use communities, extracted from different dimensions of social networks, to capture the similarities of these different dimensions and, accordingly, help recommendation systems work based on the latent similarities found. By different dimensions, we mean the friendship network, item similarity network, commenting network, etc.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - information filtering

Keywords
Recommendation, Cold-Start, Community Detection, Social Media

1. INTRODUCTION

Recommendation systems have been developed as one of the possible solutions to the information overload problem. The cold start problem [7] is a typical problem in recommendation systems. In recent years, some studies have tried to address this problem. For example, in [6] and [7], hybrid recommendation approaches that combine content and usage data are proposed, and in [1], a new similarity measure considering impact, popularity, and proximity is introduced as a solution to this problem.
Most of these approaches consider content information or demographic data, and not connection information, when performing recommendations; however, in some cases this information might not be available. In this paper, we suggest user connections and ratings in social networks as a replacement. With the advance of the OpenID protocol and the emergence of new social networks, user activities, connections, and ratings in various networks are now more accessible.

Social networks offer connections of different dimensions: people may be friends with each other, they might have similar interests, and they may rate content similarly. These different dimensions can be used to detect communities among people. Using community detection techniques, the collective behavior of users becomes predictable. For example, in [5], a comparison has been made between familiarity-network-based and similarity-network-based recommendations. In [4], a typical traditional collaborative filtering (CF) approach is compared to a social recommender/social filtering approach. These studies do not utilize latent community detection techniques to address the cold start problem. This study aims to use different dimensions of social networks to extract latent communities and use these communities to provide a solution to the cold start problem.

In this paper, we first give a brief introduction to community detection methods. Then, we describe the Principal Modularity Maximization method [8] in section 2.1.
After that, we propose our approaches to utilizing the community detection algorithm in section 2.2, describe the dataset used in section 3, and discuss the experiments in section 4.

2. COMMUNITY DETECTION

With the growth of social network web sites, the number of subjects within these networks has been growing rapidly. Community detection in social media analysis [3] helps us understand more of users' collective behavior. Community detection techniques aim to find subgroups among subjects such that the amount of interaction within a group is greater than the interaction outside it. Multiple statistical and graph-based methods have recently been used for community detection. Bayesian generative models [2], graph clustering approaches, hierarchical clustering, and modularity-based methods [3] are a few examples.

While existing social networks consist of multiple types of subjects and interactions among them, most of these techniques focus on only one dimension of these interactions. Consider the example of blog networks, in which people can connect to each other, comment on each other's posts, link to other posts in their own blog posts, or blog about similar subjects. By considering only one of these dimensions, e.g. the connections network, we lose important information about the other dimensions in the network, and the resulting communities will represent only a part of the existing ones. In this paper, we use the modularity-based community detection method for multi-dimensional networks presented by Tang et al. [8], as briefly described in the following subsection.

2.1 Principal Modularity Maximization

Modularity-based methods consider the strength of a community partition for real-world networks by taking into account the degree distribution of nodes. The modularity measure is defined based on how far the within-group interaction of the found communities deviates from a uniform random graph with the same degree distribution. It is defined as follows:

    Q = \frac{1}{2m} \mathrm{Tr}(S^T B S)    (1)

    B = A - \frac{d d^T}{2m}    (2)

where S is a matrix indicating community membership (S_{ij} = 1 if node i belongs to community j and 0 otherwise) and B is the modularity matrix defined in equation 2. In equation 2, which measures the deviation of network interactions from a random graph, A represents the sparse interaction matrix between actors of the network, d is the vector of node degrees, and m is the total number of edges. The goal in modularity-based methods is to maximize Q, the strength of the community partition. By relaxing S to a matrix with continuous elements, the optimal S can be computed as the top k eigenvectors of the modularity matrix B [8].

As said before, communities can span multiple dimensions, such as the friendship dimension, co-rating dimension, commenting dimension, etc. Principal Modularity Maximization (PMM) [8] is a modularity-based method for finding hidden communities in multi-dimensional networks. The idea is to integrate the network information of multiple dimensions in order to discover cross-dimension group structures. The method is a two-phase strategy to identify the hidden structures shared across dimensions.
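As a minimal sketch of equations 1 and 2 and of the spectral relaxation for a single network dimension, the following NumPy code may help. It assumes a dense, symmetric, unweighted adjacency matrix; the function names and the six-node toy graph are hypothetical and are not taken from [8].

```python
import numpy as np

def modularity_matrix(A):
    """B = A - d d^T / (2m) for a symmetric adjacency matrix A (equation 2)."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)      # degree of each node
    two_m = d.sum()        # 2m: each edge is counted once from each endpoint
    return A - np.outer(d, d) / two_m

def modularity(A, S):
    """Q = Tr(S^T B S) / (2m) for a membership matrix S (equation 1)."""
    B = modularity_matrix(A)
    two_m = np.asarray(A, dtype=float).sum()
    return np.trace(S.T @ B @ S) / two_m

def top_k_eigenvectors(B, k):
    """Relaxed solution: eigenvectors of B with the k largest eigenvalues."""
    _, eigenvectors = np.linalg.eigh(B)    # eigh returns ascending eigenvalues
    return eigenvectors[:, ::-1][:, :k]

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

# One community per triangle versus an arbitrary alternating assignment.
S_good = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)
S_bad = np.array([[1, 0], [0, 1]] * 3, dtype=float)

q_good = modularity(A, S_good)
q_bad = modularity(A, S_bad)
leading = top_k_eigenvectors(modularity_matrix(A), 1)[:, 0]
```

On this toy graph the partition into the two triangles scores higher than the alternating one, and the signs of the leading eigenvector separate the two triangles, which is the intuition behind using the top eigenvectors as structural features.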
In the first phase, the structural features of each dimension of the network are extracted via modularity analysis (structural feature extraction); in the second phase, the features are integrated to find a community structure among nodes (cross-dimension integration). The assumption behind this cross-dimension integration is that the structures of all dimensions in the network should be similar to each other. In the first phase, structural features are defined as the network-extracted dimensions that are indicative of community structure. They can be computed by a low-dimensional embedding using the top eigenvectors of the modularity matrix. Minimizing the difference among the features of the various dimensions in cross-dimension integration is equivalent to performing Principal Component Analysis (PCA) on them. This results in a continuous community membership matrix S, which shows how much each node belongs to each community. To assign each node a discrete community membership based on these features, a simple clustering algorithm such as K-means is applied to S. As a result of this clustering, each node belongs to just one community.

2.2 Cold Start Problem and Community Detection in Recommendation Systems

The cold start problem [7] arises in recommendation systems due to a lack of information on users or items. Usage-based recommendation systems work based on the similarity of a user's taste to that of other users, and content-based recommendation systems take into account the similarity of the items a user has consumed to other existing items. When a user is a newcomer to a system, or has not yet rated enough items, there is not enough evidence for the recommendation system to build the user profile based on his/her taste, and the user profile will not be comparable to other users or items. As a result, the recommendation system cannot recommend any items to such a user.
Regarding the cold start problem for items: when an item is new in a usage-based recommendation system, no users have rated that item, so it does not exist in any user profile. Since in collaborative filtering the items consumed in similar user profiles are recommended to the user, this new item cannot be considered for recommendation to anyone.

In this paper, we concentrate on the cold start problem for new users. We propose that if a user is new in one system but has a history in another system, we can use his/her external profile to recommend relevant items in the new system to this user. As an example, consider a new user of YouTube whose Facebook profile is known to us. A comprehensive profile of the user can be produced from the movies he/she posted, liked, or commented on in Facebook, and this profile can be used to recommend relevant movies in YouTube to the same user. In this example, the recommended items are of the same type: movies.

Another hypothesis is that a user's interest in specific items might reveal his/her interest in other items. This is the same hypothesis that underlies multi-dimensional network community detection: we expect multiple dimensions of a network to have a similar structure. As an example, if a user is new to the books section of a system but has a profile in the movies section, we can expect users who are similar to him/her in terms of movie ratings to have a similar taste in books as well; or, if two users are friends, we expect them to behave more similarly in the system. Utilizing user profiles in other dimensions to predict interests in another dimension can thus serve as a solution to the cold start problem. Community detection can provide us with a group of users similar to the target user considering multiple dimensions. We can use this information in multiple ways, as suggested in the following.
In traditional collaborative filtering, the predicted rating of active user a on each item j is calculated as a weighted sum of similar users' ratings on the same item, as in equation 3:

    p_{a,j} = \bar{v}_a + \alpha \sum_{i=1}^{n} w(a, i) (v_{i,j} - \bar{v}_i)    (3)

where n is the number of similar users we would like to take into account, \alpha is a normalizer, v_{i,j} is the vote of user i on item j, \bar{v}_i is the average rating of user i, and w(a, i) is the weight of each of these n similar users. The value of w(a, i) can be calculated in many ways. Common methods are cosine similarity, Euclidean similarity, or Pearson correlation on user profiles.

We proposed and tried multiple approaches in community-based collaborative filtering to predict user ratings. Once we have found latent communities in the data, we need to use this information to help with the recommendation of content to users. Our assumption is that users within the same latent community are better representatives of user interests than all users taken together. We propose approaches that consist of combinations of the following:

1. Using a community-based similarity measure to calculate w(a, i): This is specifically useful with the PMM community detection algorithm. Here, an N x K matrix S is produced, which is an indicator of multi-dimensional community membership. It shows how much each user belongs to each community. We define the community-based similarity measure among users of the system as an N x N matrix W in equation 4 and use it as the weight function in equation 3. Here, N is the total number of users, K is the number of communities, and each element of W shows the similarity between two users based on the communities they belong to.

    W = S S^T    (4)

2. Using co-community users (users within the active user's community) instead of k-nearest neighbors: We define the predicted rating as in equation 5, in which community(a) indicates the community assigned to the active user by the community detection algorithm. Based on that, only users within the active user's community are considered in the CF algorithm.

    p_{a,j} = \bar{v}_a + \alpha \sum_{i \in community(a)} w(a, i) (v_{i,j} - \bar{v}_i)    (5)

In addition to using the proposed methods to address the cold-start problem, we believe that the second case is useful where there is a large number of users and, as a result, the traditional collaborative filtering approach requires a lot of space and time. Instead, we can detect the community the user belongs to and use that community's members to find relevant items for the user.

Figure 1: Log-log plot of number of book ratings per user

3. DATASET

The dataset used in this study is based on an online Russian social network called imhonet (footnote 1). This web site contains many aspects of a social network, including friendships, comments, and ratings on items.
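Returning briefly to the community-based approaches of Section 2.2, equations 3 to 5 can be sketched as below. This is only an illustration under stated assumptions: ratings are held in a dense matrix where 0 means "not rated", the normalizer \alpha is taken as one over the sum of absolute weights (the text does not define it), and all names and the three-user toy data are hypothetical.

```python
import numpy as np

def community_similarity(S):
    """W = S S^T (equation 4) from a soft membership matrix S of shape N x K."""
    return S @ S.T

def predict_rating(R, W, a, j, candidates):
    """Predicted rating of user a on item j (equations 3 and 5).

    R is an N x M rating matrix with 0 meaning "not rated". candidates holds
    the users summed over: the n nearest neighbors for equation 3, or the
    members of community(a) for equation 5.
    """
    rated = R > 0
    counts = rated.sum(axis=1)
    # Mean of each user's observed ratings (0 for users with no ratings).
    means = np.where(counts > 0, R.sum(axis=1) / np.maximum(counts, 1), 0.0)
    users = [i for i in candidates if i != a and rated[i, j]]
    if not users:
        return means[a]
    weights = W[a, users]
    norm = np.abs(weights).sum()
    if norm == 0:
        return means[a]
    alpha = 1.0 / norm   # assumed normalizer; not specified in the text
    return means[a] + alpha * np.sum(weights * (R[users, j] - means[users]))

# Hypothetical toy data: 3 users, 3 items, 2 communities.
R = np.array([[5.0, 4.0, 0.0],    # user 0 has not rated item 2
              [4.0, 5.0, 5.0],
              [1.0, 2.0, 1.0]])
S = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9]])
labels = np.array([0, 0, 1])      # discrete community of each user

W = community_similarity(S)
same_community = [i for i in range(len(labels)) if labels[i] == labels[0]]
p_all = predict_rating(R, W, a=0, j=2, candidates=range(3))          # equation 3 with W
p_community = predict_rating(R, W, a=0, j=2, candidates=same_community)  # equation 5
```

Restricting the sum to community members only changes which users are passed as candidates, so the same function covers both the global and the within-community variants.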
We use a dataset that includes the connections between users of this web site and their ratings of books and movies. The friendship network contains approximately 240,000 connections among around 65,000 users. The average number of friends per user is about 3.5. Additionally, the dataset contains about 16 million ratings on about 50,000 movies and more than 11.5 million ratings on about 195,000 books. Figure 1 shows a log-log plot of the number of book ratings per user, and Figure 2 shows the number of ratings for each book. As can be seen, the number of users per book follows a power law distribution. The number of book ratings per user, however, does not follow a single power law; it looks like a combination of two power law distributions. That is because imhonet asked its users to rate at least 20 books in order to build more complete user profiles.

Footnote 1: www.imhonet.ru

Figure 2: Log-log plot of number of ratings per book

If we look at the movie rating distribution (omitted due to space restrictions), we see the same behavior: at imhonet's request, many users rated around 20 movies. Friendship connections between users follow a power law distribution too. To reduce the volume of the data, we used only the ratings of users who had at least one connection in the dataset. The resulting dataset contains about 9 million movie ratings by 48,000 users on 50,000 movies and 1.2 million book ratings by 13,000 users on 140,000 books. For the experiments, we picked 10,000 random users among these users.

4. EXPERIMENTS

We separated 10% of users as test users and the remainder as training users. To simulate the cold start problem, we removed all the book ratings of test users from the dataset and tried to predict these book ratings for them. We performed 10-fold cross-validation on this data. To apply PMM to the problem at hand, we need to define the various network dimensions.
The first is obvious: we can simply use the friendship network itself. Then, we need a method to construct a similarity graph of users using their book and movie ratings. To do so, we define an edge weight s(r_i, r_j) between each pair of users as follows. Let r_i be the rating vector of user i, let \sigma_x be the standard deviation of the non-zero elements of a vector x, and let covar(x, y) be the covariance over the positions where both x and y are non-zero. Then, the similarity function is

    s(r_i, r_j) = \frac{covar(r_i, r_j)}{\sigma_{r_i} \sigma_{r_j}}    (6)

provided that r_i and r_j overlap in at least 3 positions, and 0 otherwise. A similarity score of 0 indicates that no edge should be added. This function is a modified version of Pearson's correlation coefficient that takes into account the standard deviation of all of a user's ratings instead of just the standard deviation over the overlap with another user. As such, it is no longer constrained to the interval [-1, 1] and does not have a direct interpretation, but it better represents the similarity between users. We can then use this function to create graphs from the book and movie ratings.

Once we have the different dimensions of the network, we can run PMM on the friendship, book, and movie graphs to obtain the latent communities. We set both the number of communities and the number of neighbors in the collaborative filtering approach to 30 in this experiment. Graphical results of performing PMM are shown in Figures 3 and 4, which were created with the Gephi software (footnote 2).

Figure 3: Communities detected by PMM shown in a graph sketched by Gephi software. Each community forms a square. Nodes are imhonet users and links are their friendship connections.

Figure 4: Pie chart of number of users in each community. Each color represents a community.

Footnote 2: www.gephi.org

We considered different combinations of the approaches proposed in Section 2.2 as follows:

1. We consider a vector space model for book and movie ratings and build user profiles by concatenating these two vectors in a combined space; we then perform traditional collaborative filtering using Pearson correlation on the concatenated vector (CF).
2. As described in case 1 of Section 2.2, we perform collaborative filtering for all users, using their community memberships as a similarity measure (CF with Community Simil).
3. As described in case 2 of Section 2.2, we perform traditional collaborative filtering within the community (CF within Community).
4. We perform collaborative filtering using the community-based similarity measure within the community, a combination of cases 1 and 2 of Section 2.2 (CF with Community Simil within Community).

The performance of these combinations is reported in Figure 5 in terms of ndcg at the top k recommendations, for k ranging from one to ten. Notice that collaborative filtering within the members of a community works slightly better than the other methods. Also, performing CF within a community, with either the community-based similarity measure or the Pearson correlation, works better than performing CF with a constant number of nearest neighbors. On the other hand, using community-based similarity reduces ndcg for both the within-community and the global CF methods. While this means that using only community members in CF helps in recommending more interesting items to users, it also means that Pearson correlation works better as a similarity measure for CF than the community-based similarity measure. Generally, the ndcg results we obtain for the cold start problem are reasonable, since the problem is simulated in such a way that, without information about the other dimensions, recommending items to users would be impossible.

5. CONCLUSIONS AND FUTURE WORKS

We showed that performing collaborative filtering within community members is more effective than running collaborative filtering on all users. Also, we showed that using other dimensions of user interests or user connections helps in achieving a reasonable ndcg in the cold-start problem.
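For readers unfamiliar with the measure, the ndcg at top k referred to above can be sketched as follows. The paper does not state which gain and discount variant was used, so the linear gain and log2 discount here are assumptions, and the function names are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k entries of a ranked list."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    """DCG of the produced ranking divided by the DCG of the ideal ranking.

    ranked_relevances lists the true relevance (e.g. the held-out rating) of
    every candidate item, in the order the recommender ranked them.
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg

perfect = ndcg_at_k([3, 2, 1], k=3)    # items already in the ideal order
reversed_order = ndcg_at_k([1, 2, 3], k=3)
```

A ranking that places the highest-rated held-out items first scores 1, and any other ordering of the same items scores lower.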
Based on our experiments, the number of members in each community follows a power law. As a result, it would be interesting to examine the performance of the proposed community-based recommendation methods on communities of different sizes and see whether these methods help in small, mid-size, or big communities. Another interesting study is to consider the effect of the number of neighbors in the simple collaborative filtering approach on the results. In other words, it would be interesting to see which number of neighbors works best in collaborative filtering and whether this number is related to the average detected community size. Another direction for future work would be to consider Bayesian generative models of community detection and study how grouping the connections of a user and assigning them to each of the dimensions of the network would help recommendation quality.

Figure 5: ndcg at top k recommendations

6. ACKNOWLEDGMENTS

We would like to thank the administration of imhonet, who kindly provided anonymized data for our study. Also, we would like to thank Dr. Peter Brusilovsky and Daniel Mills for their help during this study. This research is partially supported by the National Science Foundation under Grants No. 1059577 and 1138094.

7. REFERENCES

[1] H. J. Ahn. A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Information Sciences, 178(1):37-51, 2008.
[2] C. Delong and K. Erickson. Social topic models for community extraction. October 2008.
[3] S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75-174, 2010.
[4] G. Groh and C. Ehmig. Recommendations in taste related domains: collaborative filtering vs. social filtering. In Proc. of the 2007 international ACM conf. on Supporting group work, GROUP '07, pages 127-136, New York, NY, USA, 2007. ACM.
[5] I. Guy, N. Zwerdling, D. Carmel, I. Ronen, E. Uziel, S. Yogev, and S. Ofek-Koifman. Personalized recommendation of social software items based on social relations. In Proc. of the third ACM conf. on Recommender systems, RecSys '09, pages 53-60, New York, NY, USA, 2009. ACM.
[6] S.-T. Park and W. Chu. Pairwise preference regression for cold-start recommendation. In Proc. of the third ACM conf. on Recommender systems, RecSys '09, pages 21-28, New York, NY, USA, 2009. ACM.
[7] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock. Methods and metrics for cold-start recommendations. In Proc. of ACM SIGIR conf. on Research and Development in Information Retrieval, pages 253-260. ACM Press, 2002.
[8] L. Tang, X. Wang, and H. Liu. Uncovering groups via heterogeneous interaction analysis. In ICDM, 2009.