New user profile learning for extremely sparse data sets

Tomasz Hoffmann, Tadeusz Janasiewicz, and Andrzej Szwabe

Institute of Control and Information Engineering, Poznan University of Technology, pl. Marii Curie-Skladowskiej 5, 60-965 Poznan, Poland
{tomasz.hoffmann,tadeusz.janasiewicz,andrzej.szwabe}@put.poznan.pl
http://www.put.poznan.pl

Abstract. We propose a new method of online user profile learning for recommender systems that deals effectively with extreme sparsity of behavioral data. The proposed method enhances the singular values rescaling method and uses a pair of vectors to represent both positive and neutral user preferences. A list of discarded elements is used in a simple implementation of negative relevance feedback. We experimentally show the negative impact of dimensionality reduction on the accuracy of recommendations based on extremely sparse data. We introduce a new method for recommendation quality evaluation that measures F1 iteratively during a simulated session. The combined use of singular value rescaling and a user profile representation based on two complementary vectors has been compared with well-known recommendation methods, showing the superiority of our method in the online user profile updating scenario.

Keywords: Recommender systems, user profile learning, collaborative data sparsity, vector space model, cold-start problem, relevance feedback

1 Introduction

The main purpose of many recommender systems is to recommend items to users in the interactive web environment [6], [7]. Behavioral data sparsity makes effective online interaction between users and a recommender system an especially challenging task [3]. To our knowledge, there are only a few algorithms for new user profile learning that are oriented towards dealing with extremely sparse data sets. As shown in [2], data sparsity severely limits the effectiveness of methods based on dimensionality reduction [6].
In the classical vector space model a user profile is represented by a vector that aggregates the vectors of all items selected by the user [1], [6]. In that case no additional information about unselected items is used, i.e., only positive preferences are stored. Such an approach to user profile modeling has a significant impact on recommendation accuracy. We assume that the purpose of personalized recommendation is to identify the top-n products that are the most relevant to the user [8]. Following this assumption, in this paper we investigate a double vector representation of a user profile,

that takes into account the sparsity of the data set [3]. We compare the proposed method to a few widely used methods, such as collaborative filtering, ratings prediction and popularity-based item recommendation. We propose to estimate item relevance as a dot product between a user vector and an item vector, weighted by means of a probability model. Finally, we evaluate the presented method by using the F1 measure [6].

2 Evaluation of iterative user profile updating methods

We propose a binary representation of the ratings [6]. Taking the perspective of the find-good-items task [3], we assume that what is important is not how much the user likes a given product, but the fact that she or he was interested in it. To our knowledge, there has been no research into the direct impact of the dimensionality reduction process on the recovered matrix. We propose applying concentration curves [9] to visualize the ratings distribution before and after dimensionality reduction.

We evaluate the quality of recommendations by performing an F1 measurement [6] after each user action. The parameter denoted as x determines the number of ratings in the training set [6]. The interaction with a new user is simulated by iteratively shifting the user's ratings from the training set to the test set. Initially, the most popular items are recommended for all compared methods. Next, the user selects the first item and the system generates a recommendation list by performing the following steps: 1) items that were discarded by the user are added to DL (Discarded List, a list of discarded items), 2) the user profile is updated according to the evaluated method, and 3) a recommendation list is generated using the evaluated method.

3 Evaluated recommendation methods

We compare our approach to a few well-known recommendation methods. Firstly, we evaluate the most popular item method (MP), which, as shown in [2], can effectively cope with data-set sparsity.
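The simulated session of Section 2 can be sketched in a few lines of Python, here with MP plugged in as the evaluated recommender. This is a minimal illustration under stated assumptions, not the evaluation code used in the experiments: the function names, the toy data, the list length n = 2, and the rule that a recommended item counts as discarded when it is not among the simulated user's items are all hypothetical choices.

```python
from collections import Counter

def f1_at_n(recommended, relevant):
    """F1 of a recommendation list against the still-hidden relevant items."""
    hits = len(set(recommended) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

def most_popular(train, profile, discarded, n):
    """MP baseline: rank items by how many training users selected them."""
    counts = Counter(item for items in train.values() for item in items)
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return [i for i in ranked if i not in profile and i not in discarded][:n]

def simulate_session(train, user_items, recommend, n=2):
    """Reveal the new user's items one by one; measure F1 after each action."""
    profile, discarded, scores = [], set(), []
    hidden = list(user_items)
    recs = recommend(train, profile, discarded, n)  # initial list, empty profile
    while hidden:
        selected = hidden.pop(0)                    # simulated user action
        # 1) ASSUMPTION: recommended items outside the user's items are discarded
        discarded.update(i for i in recs if i not in user_items)
        # 2) update the user profile (here: remember the selected item)
        profile.append(selected)
        # 3) generate a new list and score it against the items still hidden
        recs = recommend(train, profile, discarded, n)
        if hidden:
            scores.append(f1_at_n(recs, hidden))
    return scores

# Hypothetical toy data: four training users and one new user.
train = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["a", "d"], "u4": ["b", "e"]}
scores = simulate_session(train, ["a", "c", "d"], most_popular)
```

Any other method can be evaluated the same way by passing a different `recommend` function with the same four parameters.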
Secondly, we use collaborative filtering (SVD-CF) that is based on the vector space model [3], [6]. The method uses SVD (Singular Value Decomposition) to obtain user and item vectors. When applying this method, we use the first 2 dimensions (k = 2) to find latent correlations between users, and to identify the 3 nearest neighbors (knn = 3). Moreover, we compare our approach to the rating prediction method (SVD-RP) [6] as well as to a variant with average values removed from the input matrix (SVD-RPav) [6].

The solution proposed in this paper is referred to as the complementary spaces method (CSM). The first step of the algorithm is to decompose the binary input matrix $A_{m \times n}$. As a result of this decomposition, three matrices $U$, $S$ and $V$ are obtained, where $U$ is a matrix containing the user vectors $u_i$, $V$ is a matrix containing the item vectors $v_i$ and $S$ is a diagonal matrix of the singular values of $A$, denoted as $\sigma_i$. Our approach is based on representing a user profile

by means of two vectors containing the user's positive and neutral preferences. As shown in [1], an extension of user profile representation may improve the recommendation quality. In the case of our method, the vectors representing a user profile are built as sums of the vectors of the rated items and of the unrated items, respectively:

$$u_{p+} = \sum_{i \in I_R} v_i, \qquad u_{p-} = \sum_{i \in I_{NR}} v_i,$$

where $v_i$ denotes the i-th item vector, $I_R$ is the set of items rated by the user and $I_{NR}$ is the set of items unrated by the user.

We propose using a simple probabilistic model based on the one proposed in [8] in order to weight the importance of each part of a user profile. Each dimension of the vector space is assigned a probability value proportional to the square of the respective singular value $\sigma_i$. For all vectors in the space, we compute the value of the probability based on the following assumptions:

1) the probability distribution is defined as $d = [d_i]$, where $d_i = \sigma_i^2 / \sum_{j \in I} \sigma_j^2$, $I = \{1, 2, \ldots, \min(m, n)\}$, $i \in I$ and $\sum_{i \in I} d_i = 1$,

2) the probability value related to an item vector is equal to $P(v_j) = \sum_{i \in I} v_{j,i}^2 d_i$, where $\sum_{j=1 \ldots m} v_{i,j}^2 = 1$.

This model is based on the quantum probability framework proposed in [4]. It permits us to weight parts of the user profile by using appropriate probability values, determined by means of the distribution of singular values.

We implemented negative relevance feedback [5] based on the assumption that elements recommended by the system and discarded by the user are no longer useful during the session. All the discarded items are stored on a list denoted as DL.

Our singular values rescaling method is based on the probabilistic interpretation of vector coordinates. Firstly, the distribution $d$ is prepared. Secondly, we compute a superposition of the squared vectors representing items selected or rated by the user, called the user square profile: $u_{sqp} = \sum_{i \in I_R} v_i^2$.
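The quantities defined so far can be illustrated with a short Python sketch. It is only an illustration under assumptions: the function names, the three singular values and the three item vectors are hypothetical, and the item vectors are chosen with unit length so that the resulting probability values stay between 0 and 1.

```python
def spectrum_distribution(singular_values):
    """d_i = sigma_i^2 / sum_j sigma_j^2, so that the d_i sum to 1."""
    squares = [s * s for s in singular_values]
    total = sum(squares)
    return [sq / total for sq in squares]

def vector_probability(vector, d):
    """P(v) = sum_i v_i^2 * d_i."""
    return sum(x * x * di for x, di in zip(vector, d))

def square_profile(item_vectors, rated):
    """u_sqp: coordinate-wise sum of the squared vectors of the rated items."""
    dims = len(item_vectors[0])
    return [sum(item_vectors[i][k] ** 2 for i in rated) for k in range(dims)]

# Hypothetical inputs: singular values and unit-length item vectors (rows).
sigma = [3.0, 2.0, 1.0]
items = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]

d = spectrum_distribution(sigma)
probabilities = [vector_probability(v, d) for v in items]
u_sqp = square_profile(items, rated=[0, 1])
```

In this toy setting an item vector aligned with the dimension of the largest singular value receives the highest probability, which is the weighting effect the model is meant to provide.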
Next, the user square profile is used to scale $d$ and to obtain a new distribution $d_{new} = \mathrm{mul}(u_{sqp}, d)$, where mul denotes an element-by-element multiplication. The relation between $d_{new}$ and $d$ is represented by a vector of coefficients (each corresponding to a particular dimension), denoted as $w_{scale} = \mathrm{div}(d_{new}, d)$, where div denotes a coordinate-by-coordinate division; this vector is used to scale the coordinates of the item vectors from matrix $V$. Respectively, for the complementary (neutral) part we compute $w_{scale-} = \mathrm{div}(d_{new-}, d)$, where $d_{new-} = \mathrm{sub}(d, d_{new})$ and sub is a subtraction of vector coordinates.

Next, these coefficients are used to scale the user profile vectors,

$$u_{new+} = \mathrm{mul}(w_{scale}, u_{p+}), \qquad u_{new-} = \mathrm{mul}(w_{scale-}, u_{p-}),$$

and the item vectors,

$$V_{new} = V \, \mathrm{diag}(w_{scale}), \qquad V_{new-} = V \, \mathrm{diag}(w_{scale-}),$$

where diag denotes the diagonal matrix in which a given vector forms the diagonal. According to the user profile representation, we obtain two lists, denoted as $r_1 = \mathrm{sqr}(u_{new+} V_{new})$ and $r_2 = \mathrm{sqr}(u_{new-} V_{new-})$. Next, we obtain two probabilities, $p_1 = \mathrm{mul}(u_{sqp}, d)$ and $p_2 = 1 - p_1$, one for each profile vector. These probabilities are used as weights for the similarity vectors $r_1$ and $r_2$. Thus, the final form of the similarity vector is as follows:

$$r = p_1 r_1 + p_2 r_2.$$

As a result of our algorithm, the system is able to recommend items from both the positive and the neutral list, applying an appropriate proportional weighting.
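The scoring steps of this section can be put together in a small Python sketch. It is one possible reading of the formulas rather than the implementation used in the experiments, and it rests on several assumptions that the text does not state: item vectors are unit-length rows of V, all singular values are non-zero, the user square profile is averaged over the rated items so that its coordinates stay in [0, 1], sqr is read as coordinate-wise squaring of the dot products, and p1 is taken as the scalar sum of mul(u_sqp, d). The function name and the toy inputs are hypothetical.

```python
def csm_scores(V, sigma, rated, discarded=()):
    """Score all items for one user; V is a list of item vectors (rows)."""
    dims = len(sigma)
    items = range(len(V))
    unrated = [i for i in items if i not in rated]

    # distribution d over dimensions, proportional to squared singular values
    total = sum(s * s for s in sigma)
    d = [s * s / total for s in sigma]

    # positive and neutral profiles: sums of rated / unrated item vectors
    u_pos = [sum(V[i][k] for i in rated) for k in range(dims)]
    u_neu = [sum(V[i][k] for i in unrated) for k in range(dims)]

    # ASSUMPTION: average (not plain sum) so each coordinate stays in [0, 1];
    # requires at least one rated item
    u_sqp = [sum(V[i][k] ** 2 for i in rated) / len(rated) for k in range(dims)]

    d_new = [u * di for u, di in zip(u_sqp, d)]           # mul(u_sqp, d)
    w_pos = [dn / di for dn, di in zip(d_new, d)]         # div(d_new, d)
    w_neu = [(di - dn) / di for dn, di in zip(d_new, d)]  # div(sub(d, d_new), d)

    def similarity(u, w):
        # scale both the profile and the item vectors by w, then square
        # their dot product (ASSUMPTION: sqr means squaring)
        return [sum(u[k] * w[k] * V[i][k] * w[k] for k in range(dims)) ** 2
                for i in items]

    r1, r2 = similarity(u_pos, w_pos), similarity(u_neu, w_neu)

    # ASSUMPTION: p1 is the scalar sum of mul(u_sqp, d), so that p1 + p2 = 1
    p1 = sum(d_new)
    p2 = 1.0 - p1

    scores = {i: p1 * r1[i] + p2 * r2[i] for i in items}
    # rated items and items on the discarded list (DL) are not recommended
    return {i: s for i, s in scores.items()
            if i not in rated and i not in discarded}

# Hypothetical toy input: four unit-length item vectors in three dimensions.
V = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scores = csm_scores(V, sigma=[3.0, 2.0, 1.0], rated=[0], discarded=[3])
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because the neutral profile is a plain sum over all unrated items, its contribution can dominate for users with very few ratings; whether and how the profiles should be normalized is left open here.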

4 Experiments

We used the well-known MovieLens ML100k data set, which is accompanied by widely referenced experimental results, e.g., [6], [7]. To analyze the characteristics of the data set we used concentration curves [9] and applied SVD at different k-cut values. As shown in Fig. 1, in the case of extremely sparse data sets, dimensionality reduction has a negative impact on the number of ratings appearing in recovered data sets. In such a case, each dimension corresponds to one of a number of disjoint subsets, which reduces the number of item/user subsets that may appear in recommendation lists.

Fig. 1. Rating concentration curves for ML100k, x = .4 (on the left), x = .8 (on the right). Axes: cumulative % of users, cumulative % of ratings; one curve per k-cut value, up to k = 943.

Fig. 2. Recommendation accuracy for x = .4. Axes: the number of iterations, F1@1. Compared methods: CSM, SVD-RPav, MP, SVD-RP, SVD-CF.

5 Conclusions

The results of the experiments show that, as far as the online user profile updating scenario is concerned, the proposed method performed better than several widely used methods. In the analyzed online sessions (for both x = .4 and x = .8), the CSM method achieved a gain of as much as 1 percent in recommendation accuracy over the second best method, as shown in Fig. 2 and Fig. 3. The method based on item popularity (MP) provided comparatively good recommendations when there was a higher amount of behavioral data in the training set: for x = .4 MP

performed similarly to SVD-RPav, while for x = .8 the difference between the quality of MP and the quality of SVD-RPav was much more visible. The SVD-CF method was the worst one in both analyzed cases. An important contribution of this paper is the demonstration of the strong negative impact that dimensionality reduction has on recommendation quality when it is applied to extremely sparse data sets, as shown in Fig. 1.

Fig. 3. Recommendation accuracy for x = .8. Axes: the number of iterations, F1@1. Compared methods: CSM, MP, SVD-RPav, SVD-RP, SVD-CF.

6 Acknowledgments

This work is supported by the Polish Ministry of Science and Higher Education, grant N N516 196737.

References

1. Berry, M., Dumais, S. and O'Brien, G.: Using linear algebra for intelligent information retrieval, SIAM Rev. 37, 573-595, (1995)
2. Gedikli, F. and Jannach, D.: Recommending based on rating frequencies, 4th ACM Conference on Recommender Systems, RecSys '10, Spain, (2010)
3. Herlocker, J.L., Konstan, J.A., Terveen, L.G. and Riedl, J.T.: Evaluating Collaborative Filtering Recommender Systems, ACM Trans. Inf. Syst., 22, 1, 5-53, (2004)
4. Rijsbergen, C. J. van: The Geometry of Information Retrieval. Cambridge University Press, New York, NY, USA, (2004)
5. Sandler, M. and Muthukrishnan, S.: Monitoring algorithms for negative feedback systems, WWW '10, Raleigh, North Carolina, USA, (2010)
6. Sarwar, B. M., Karypis, G., Konstan, J. A. and Riedl, J.: Application of dimensionality reduction in recommender system - a case study, WebKDD, (2000)
7. Shani, G. and Gunawardana, A.: Evaluating Recommender Systems, November, Microsoft Research, Redmond, USA, (2009)
8. Varshavsky, R., Gottlieb, A., Linial, M. and Hornl, D.: Information extraction novel unsupervised feature filtering of biological data, Bioinformatics, (2006)
9. Zhang, M. and Hurley, N.: Niche Product Retrieval in Top-N Recommendation, WI-IAT '10, Washington, DC, USA, (2010)
10. Zhang, M. and Hurley, N.: Novel Item Recommendation by User Profile Partitioning, WI-IAT '09, Washington, DC, USA, (2009)