Hybrid Tag Recommendation for Social Annotation Systems


Jonathan Gemmell, Thomas Schimoler, Bamshad Mobasher, Robin Burke
Center for Web Intelligence, School of Computing, DePaul University, Chicago, Illinois, USA
{jgemmell, tschimo1, mobasher,

ABSTRACT

Social annotation systems allow users to annotate resources with personalized tags and to navigate large and complex information spaces without the need to rely on predefined hierarchies. These systems help users organize and share their own resources, as well as discover new ones annotated by other users. Tag recommenders in such systems assist users in finding appropriate tags for resources and help consolidate annotations across all users and resources. But the size and complexity of the data, as well as the inherent noise and inconsistencies in the underlying tag vocabularies, have made the design of effective tag recommenders a challenge. Recent efforts have demonstrated the advantages of integrative models that leverage all three dimensions of a social annotation system: users, resources and tags. Among these approaches are recommendation models based on matrix factorization. However, these models tend to lack scalability and often hide the underlying characteristics, or information channels, of the data that affect recommendation effectiveness. In this paper we propose a weighted hybrid tag recommender that blends multiple recommendation components drawing separately on complementary dimensions, and evaluate it on six large real-world datasets. In addition, we attempt to quantify the strength of the information channels in these datasets and use these results to explain the performance of the hybrid.
We find that our approach is not only competitive with state-of-the-art techniques in terms of accuracy, but also has the added benefits of being scalable to large real-world applications, extensible to incorporate a wide range of recommendation techniques, easily updateable, and more scrutable than other leading methods.

Categories and Subject Descriptors
H.2 [Database Management]: H.2.8 Database application: Data mining; H.3 [Information Storage and Retrieval]: H.3.3 Information Search and Retrieval: Search process

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIKM'10, October 26-30, 2010, Toronto, Ontario, Canada. Copyright 2010 ACM /10/10...$

General Terms
Algorithms, Experimentation, Performance

Keywords
Social Annotation, Information Channels, Hybrid Recommenders, Recommender Systems

1. INTRODUCTION

In social annotation systems, information access functions such as search, navigation and resource sharing are supported by annotations: arbitrary tags applied to resources by individual users. Delicious¹ supports users as they bookmark URLs. Citeulike² enables researchers to manage scholarly references. Bibsonomy³ allows users to annotate both. Social annotation systems are quickly becoming ubiquitous in a variety of domains. For example, Amazon⁴ and others have incorporated social annotations into their web sites. The popularity of social annotation systems is driven in part by the low entry barrier and the freedom to annotate resources with any tag. The aggregated connections between users, resources and tags provide a rich information space for users to explore.
However, the benefits of social annotation systems do not come without a cost. The size, noise and dimensionality of the data make navigation and information access difficult. Recommender systems are therefore a critical component of these applications. In this work we focus on tag recommenders, which assist users during the annotation process by recommending tags for a selected resource.

Recent efforts in tag recommendation have shown that integrative models that leverage all three dimensions of a social annotation system (users, resources, tags) produce superior results. Graph-based models [7] and a variety of latent variable techniques [13, 14, 18, 19] have been investigated. These approaches tend to be computationally intensive and scale poorly. Previous work on hybrid tag recommenders [3, 4] has shown that hybrids combining several components, each exploiting different dimensions of the data, offer competitive results while maintaining the simplicity, computational efficiency and explanatory capacity of the component recommenders. However, these results have focused on hybrid models specifically designed for a particular dataset or as a means to augment another integrative technique.

¹ delicious.com  ² citeulike.org  ³ bibsonomy.org  ⁴ amazon.com

This paper proposes a framework for constructing linear weighted hybrids that combine component recommenders into a single integrated model. No individual component is required to cover all dimensions of the data, but when taken together the components complement one another. The hybrid is therefore able to produce results superior to what the components produce alone.

To help understand our experimental results, we explore the notion of an information channel: the power one dimension possesses in predicting or modeling another dimension of the social annotation system. To quantify the strength of these information channels, we develop a family of metrics based on conditional entropy. These metrics reveal marked differences in the characteristics of the datasets, which are reflected in the performance of the recommendation components.

The rest of the paper is organized as follows. In Section 2 we present related work. Section 3 introduces tag recommendation. Results of the recommendation techniques are provided in Section 4. In Section 5 we introduce the notion of information channels and use them to evaluate the results of the tag recommenders. Finally, we conclude the paper with a summary of our findings.

2. RELATED WORK

One of the first techniques to demonstrate the value of an integrative approach for tag recommendation in social annotation systems was a graph-based variant [7] of the well-known PageRank algorithm. The computational burden of computing the PageRank values of each user, resource and tag for every recommendation makes the algorithm ill-suited for large-scale deployment.

Tensor factorization is another integrative technique for making tag recommendations. Tucker decomposition is one such example; it factors the three-dimensional tagging data into three feature spaces and a core residual tensor [18, 19]. Unlike the graph-based model, online computation of recommendations is highly efficient.
However, the offline computation required to build the model is not scalable to the demands of real-world applications. A pair-wise interaction tensor factorization model has also been proposed, which offers far more reasonable run times in both the construction of the model and the generation of recommendations [13, 14]. It optimizes the ranking of tags given user-resource pairs in the training data. Tags may then be recommended for a new user-resource pair. This approach represents the current state-of-the-art in tag recommendation, providing both a high degree of accuracy and computational efficiency.

Our previous work in tag recommendation has demonstrated the benefits of hybrid recommenders [3, 4]. One approach demonstrated that the graph-based models may be improved by incorporating item-based collaborative filtering. Another effort resulted in a hybrid recommender for Bibsonomy in the context of the PKDD-ECML 2009 challenge [10]. In this paper we extend those efforts, proposing a more general framework for constructing linear weighted hybrid tag recommenders. The hybrid is constructed from component recommenders and produces results competitive with state-of-the-art techniques.

3. TAG RECOMMENDATION

This section begins with a discussion of the data models for a social annotation system. We then present our proposed framework for the linear weighted hybrid tag recommender and discuss the individual components that may be incorporated into the hybrid. For comparative purposes we also describe the state-of-the-art pair-wise interaction tensor factorization algorithm.

3.1 Data Model

The foundation of a social annotation system is the annotation: the record of a user labeling a resource with one or more tags. A collection of annotations results in a complex network of interrelated users, resources and tags [11].
A social annotation system can be described as a four-tuple $\langle U, R, T, A \rangle$, where $U$ is a set of users; $R$ is a set of resources; $T$ is a set of tags; and $A$ is a set of annotations. An annotation contains a user, a resource and all tags the user applied to the resource. A social annotation system can also be viewed as a three-dimensional matrix, $URT$, in which an entry $URT(u, r, t)$ is 1 if $u$ tagged $r$ with $t$.

Aggregate projections of the data can be constructed, reducing the dimensionality but sacrificing information [12]. For example, the relation between resources and tags can be defined as $RT(r, t)$. In this work, we calculate $RT(r, t)$ as the number of users that have applied $t$ to $r$. This notion strongly resembles the bag-of-words vector space model [15] and is analogous to the idea of term frequency common in information retrieval. A similar two-dimensional projection can be constructed for $UT$, in which an entry contains the number of times a user has applied a tag to any resource. Finally, $UR$ is a binary matrix indicating whether or not a user has annotated a resource. An alternative approach would be to define an entry in the matrix as the number of tags a user has applied to a resource. Our previous work and continued experimentation have shown that the binary model for $UR$ produces better results.

Each resource, $r$, may be modeled as a vector over the multi-dimensional space of tags, where a weight, $w(t_i)$, in dimension $i$ corresponds to the importance of a particular tag, $t_i$:

$$\vec{r}_t = \langle w(t_1), w(t_2), \ldots, w(t_{|T|}) \rangle \qquad (1)$$

Similarly, a resource can be modeled as a vector over the space of users, where each weight, $w(u_i)$, corresponds to the importance of a particular user, $u_i$, to produce $\vec{r}_u$. Analogous vector models can be constructed for users ($\vec{u}_r$, $\vec{u}_t$) and tags ($\vec{t}_u$, $\vec{t}_r$). We draw the weights directly from the previously constructed aggregate projections $UR$, $UT$ and $RT$. The model of a user, resource or tag is defined as a row or column taken from one of the projections.
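The projections described in this section can be illustrated with a small sketch. It assumes annotations arrive as `(user, resource, tags)` records; the function names, the dictionary-based sparse representation and the toy data are illustrative choices, not part of the paper.

```python
from collections import defaultdict

def build_projections(annotations):
    """Build sparse RT, UT and UR projections from (user, resource, tags) records."""
    rt = defaultdict(int)  # RT(r, t): number of users who applied t to r
    ut = defaultdict(int)  # UT(u, t): number of resources u tagged with t
    ur = {}                # UR(u, r): 1 if u annotated r (binary model)
    seen = set()           # guards against counting a duplicate (u, r, t) twice
    for user, resource, tags in annotations:
        ur[(user, resource)] = 1
        for tag in tags:
            if (user, resource, tag) in seen:
                continue
            seen.add((user, resource, tag))
            rt[(resource, tag)] += 1
            ut[(user, tag)] += 1
    return dict(rt), dict(ut), ur

def resource_tag_vector(rt, resource, tag_order):
    """Row of RT for one resource, laid out over a fixed (hypothetical) tag ordering."""
    return [rt.get((resource, t), 0) for t in tag_order]

# Toy data, invented for illustration only.
annotations = [
    ("alice", "page1", {"python", "web"}),
    ("bob", "page1", {"python"}),
    ("alice", "page2", {"python"}),
]
RT, UT, UR = build_projections(annotations)
page1_vector = resource_tag_vector(RT, "page1", ["python", "web", "java"])
```

Here `page1_vector` plays the role of the resource model in Equation 1, with the weights taken directly from the RT projection.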
3.2 Linear Weighted Hybrid

Our proposed framework aggregates the results of several component recommenders in linear combination [1]. The components are freed from the burden of covering all the available dimensions of the data and instead specialize in only a few. A successful hybrid creates a synergistic blend of its component parts, producing results superior to what they could achieve alone.

We can view each component of a tag recommendation system as a function $\psi : U \times R \times T \to \mathbb{R}$, which, given a user $u \in U$ and a resource $r \in R$, produces a real-valued result $p$ as the predicted relevance of a tag $t$ for that particular user-resource pair: $\psi(u, r, t) = p$.

In the most common settings tag recommenders are used to produce a ranked list of suggested tags for a particular user and a specific resource. To do so using the above formulation, for a given user $u$ and resource $r$, we iterate over all tags, sort them by their corresponding relevance scores, and return the top $n$ tags:

$$rec(u, r) = \mathop{\mathrm{TOP}^{n}}_{t \in T} \; \psi(u, r, t) \qquad (2)$$

In our proposed hybrid framework the relevance score for a tag is calculated using several component tag recommenders. These scores are then combined in a linear model. Specifically, given a set of component tag recommenders $C$, a linear weighted hybrid tag recommender will accept a user $u$ and resource $r$. It will then query each of its component recommenders, $c \in C$, for a tag, $t$, and combine the results in the linear model:

$$\psi_h(u, r, t) = \sum_{c \in C} \alpha_c \, \psi_c(u, r, t) \qquad (3)$$

where $\psi_h(u, r, t)$ is the linear weighted relevance score of the tag and $\alpha_c$ is the weight given to the component $c$. It should be noted that the scores from one component may be drawn from a different distribution than those of the other components. In order to ensure that the relevance scores for all component recommenders are on the same scale, we normalize the scores so that each $\psi_c(u, r, t)$ falls in the interval [0, 1].

As additional recommenders are added to the hybrid, its complexity grows. The challenge then becomes how to ascertain the correct $\alpha$ for each component in order to maximize the effectiveness of the hybrid. We use a hill climbing technique because of its speed and simplicity. The $\alpha$ vector is initialized with random positive numbers constrained such that the sum of the vector equals 1. The vector is then randomly modified and tested against a holdout set to ascertain if it achieves better results.
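As a minimal sketch of Equations 2 and 3, the snippet below combines per-component scores and returns the top n tags. The text does not say how scores are normalized into [0, 1]; dividing by the component's maximum score (for non-negative scores) is an assumption made here, as are the component interface, the component names and the toy scores.

```python
def normalize(scores):
    """Scale a {tag: score} dict into [0, 1] by dividing by the maximum.

    Assumes non-negative scores; this max-scaling is one plausible choice,
    not necessarily the normalization used in the paper.
    """
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {t: 0.0 for t in scores}
    return {t: s / top for t, s in scores.items()}

def hybrid_scores(components, weights, user, resource):
    """Equation 3: weighted sum of normalized component scores per tag."""
    combined = {}
    for name, score_fn in components.items():
        for tag, s in normalize(score_fn(user, resource)).items():
            combined[tag] = combined.get(tag, 0.0) + weights[name] * s
    return combined

def recommend(components, weights, user, resource, n=5):
    """Equation 2: return the n tags with the highest hybrid score."""
    scores = hybrid_scores(components, weights, user, resource)
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]

# Hypothetical components returning fixed scores, for illustration only.
components = {
    "pop_r": lambda u, r: {"python": 4, "web": 2},
    "knn_ut": lambda u, r: {"web": 0.9, "flask": 0.3},
}
weights = {"pop_r": 0.4, "knn_ut": 0.6}
scores = hybrid_scores(components, weights, "alice", "page1")
top_two = recommend(components, weights, "alice", "page1", n=2)
```

In this toy case "web" receives support from both components and ranks first, which is the kind of complementary effect the hybrid is meant to capture.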
The holdout set may be evaluated for recall, precision or F-measure. In this work we rely on the F-measure since it incorporates both recall and precision. If the result is improved, the change is accepted; otherwise it is usually rejected. Occasionally a change to the $\alpha$ vector is accepted even when it does not improve the results, in order to more fully explore the $\alpha$ space. Modifications continue until the vector stabilizes. To reduce the risk of stopping at a poor local maximum, the experiment is repeated 20 times from different random starting points.

With this integrative model any tag recommender can be incorporated into the hybrid. We focus on relatively simple component recommenders due to their speed and scrutability. We now present those components.

Popularity Models

Perhaps the simplest recommendation strategy is merely to recommend the most commonly used tags in the system. Alternatively, given a user-resource pair, a recommender may ignore the user and recommend the most popular tags for that particular resource. This strategy is strictly resource dependent and does not take into account the tagging habits of the user. We define $\psi(u, r, t)$ for the resource-based popularity recommender, pop_r, as:

$$\psi(u, r, t) = \sum_{v \in U} \theta(v, r, t) \qquad (4)$$

We define $\theta(v, r, t)$ as 1 if $v$ tagged $r$ with $t$ and 0 otherwise. In a similar fashion a recommender may ignore the resource and recommend the most popular tags for that particular user. While such an algorithm would include tags frequently applied by the user, it does not consider the resource information and may recommend tags irrelevant to the current resource. We define $\psi(u, r, t)$ for the user-based popularity recommender, pop_u, as:

$$\psi(u, r, t) = \sum_{s \in R} \theta(u, s, t) \qquad (5)$$

While popularity models are not necessarily the most effective techniques, they do serve as a baseline and may benefit the hybrid. Popularity-based recommenders require little online computation.
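The hill climbing search for the component weights, described earlier in this section, can be sketched as follows. Several details are assumptions rather than the paper's settings: the fixed number of steps standing in for "until the vector stabilizes", the perturbation size, the probability of accepting a non-improving change, and the toy objective that replaces the holdout F-measure.

```python
import random

def _to_simplex(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

def hill_climb(evaluate, n_components, steps=200, step_size=0.1,
               accept_worse=0.05, restarts=20, seed=0):
    """Search for a weight vector maximizing evaluate(alpha).

    evaluate is a caller-supplied function, e.g. the F-measure of the hybrid
    on a holdout set. The best vector seen across all restarts is returned.
    """
    rng = random.Random(seed)
    best_alpha, best_score = None, float("-inf")
    for _ in range(restarts):
        # Random positive starting point constrained to sum to 1.
        alpha = _to_simplex([rng.random() for _ in range(n_components)])
        score = evaluate(alpha)
        if score > best_score:
            best_alpha, best_score = alpha, score
        for _ in range(steps):
            # Randomly perturb one weight, then renormalize.
            candidate = list(alpha)
            i = rng.randrange(n_components)
            candidate[i] = max(0.0, candidate[i] + rng.uniform(-step_size, step_size))
            candidate = _to_simplex(candidate)
            cand_score = evaluate(candidate)
            # Accept improvements; occasionally accept a non-improving move.
            if cand_score > score or rng.random() < accept_worse:
                alpha, score = candidate, cand_score
                if score > best_score:
                    best_alpha, best_score = alpha, score
    return best_alpha, best_score

# Toy objective standing in for holdout F-measure (illustrative only).
target = [0.7, 0.2, 0.1]

def toy_objective(alpha):
    return -sum((a - t) ** 2 for a, t in zip(alpha, target))

alpha, score = hill_climb(toy_objective, n_components=3)
```

In practice `evaluate` would run the hybrid on the held-out fold with the candidate weights and return its F-measure; everything else in the loop stays the same.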
They are easily built offline and can be incrementally updated.

User-Based Collaborative Filtering

User-based collaborative filtering [5, 9, 17] works under the assumption that users who have agreed in the past are likely to agree in the future. A neighborhood, $N_r$, of the $k$ most similar users to $u$ is identified through a similarity metric such that all the neighbors have tagged $r$. For any given resource the weighted sum can then be calculated as:

$$\psi(u, r, t) = \sum_{v \in N_r} \sigma(u, v) \, \theta(v, r, t) \qquad (6)$$

where $\sigma(u, v)$ is the similarity between the users $u$ and $v$. In this work we rely on cosine similarity of the user models. As before, $\theta(v, r, t)$ is 1 if $v$ has annotated $r$ with $t$ and 0 otherwise. When users are modeled as vectors over resources we call this approach KNN_ur. When users are modeled as vectors over tags we call this technique KNN_ut.

Since the algorithm will only populate the neighborhood with users that have annotated $r$, the number of similarities to calculate can be quite small. The popularity of resources in social annotation systems follows a power law; the great majority of resources will benefit from this reduced computation, while a few will require additional computational effort. As a result the algorithm scales well with large datasets. Similarities may even be computed offline.

This approach relies on the collaboration of other users. It may be the case that an appropriate tag cannot be recommended because it does not appear in a neighbor's profile. While the personalization offered by user-based filtering is an important benefit, it lacks the ability to reflect the habits and patterns of the larger crowd.

Item-Based Collaborative Filtering

Item-based collaborative filtering [2, 16] relies on discovering similarities among resources rather than among users. We may model the resources as a vector over the user space.
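The user-based score in Equation 6 can be sketched as below. The sparse dictionary vectors, the `tagged` lookup keyed by `(user, resource)`, and the toy profiles are assumptions made for illustration; here the users are modeled over tags, which corresponds to KNN_ut.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_user_scores(user, resource, profiles, tagged, k=30):
    """Equation 6: similarity-weighted votes from neighbors who tagged the resource.

    profiles: {user: sparse vector over tags (KNN_ut) or resources (KNN_ur)}
    tagged:   {(user, resource): set of tags that user applied to that resource}
    """
    # Only users who have annotated the query resource can be neighbors.
    candidates = [v for (v, r) in tagged if r == resource and v != user]
    sims = {v: cosine(profiles[user], profiles[v]) for v in candidates}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    scores = {}
    for v in neighbors:
        for tag in tagged[(v, resource)]:
            scores[tag] = scores.get(tag, 0.0) + sims[v]
    return scores

# Toy data, invented for illustration only.
profiles = {
    "alice": {"python": 2, "web": 1},
    "bob": {"python": 1},
    "carol": {"cooking": 3},
}
tagged = {
    ("bob", "page3"): {"python", "tutorial"},
    ("carol", "page3"): {"recipes"},
}
user_scores = knn_user_scores("alice", "page3", profiles, tagged)
```

Because only users who annotated the query resource are considered, the candidate list is usually short, which is the source of the reduced computation noted above.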

We call this model KNN_ru. When relying on tags, the vector contains the frequency with which a resource has been annotated with the tags. We call this model KNN_rt. We define $N_u$ as the $k$ nearest resources to $r$ drawn from the user profile, $u$, and then define the relevance score of a tag for a user-resource pair as:

$$\psi(u, r, t) = \sum_{s \in N_u} \sigma(r, s) \, \theta(u, s, t) \qquad (7)$$

If a user has annotated resources similar to $r$ with $t$, then $\psi(u, r, t)$ will be high. Otherwise the relevance score will be correspondingly low. The strength of this approach is that it can draw the most relevant tags from the user profile. Its weakness is that it cannot recommend tags from outside the user profile. Similarity metrics need only be calculated with resources in the user profile. If the user profile is not exceptionally large, this computation can be quickly done in real time. Otherwise, similarities can be calculated offline.

3.3 Pair-wise Interaction Tensor Factorization

For the sake of comparison, we have chosen a tag recommender based on pair-wise interaction tensor factorization [14], which formed the basis for the winning submission of the PKDD 2009 Tag Recommendation Challenge [10]. This model-based approach generates a set of factor matrices which resembles a special case of the Tucker decomposition of a tensor. The tensor itself is not directly induced by the data (this could be achieved by regarding each $(u, r, t)$ triple as a binary cell of a tensor), but rather reflects a ranking over the tags for each user-resource pair.

The model is built by first considering observations in the data of the form $(u, r, t^{+}, t^{-})$, where $(u, r, t^{+})$ is a triple which is found in the data (a positive example of tag selection) and $(u, r, t^{-})$ is a triple not found in the data (a negative example of tag selection). An iterative gradient-descent algorithm is employed to optimize a ranking function (based on Bayesian conditionals) that prefers positive examples in the data over negative ones.
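The item-based score in Equation 7 can be sketched in the same style. The data layout (sparse dictionary vectors for resources and a per-user dictionary of tagged resources) and the toy values are assumptions for illustration; the resources here are modeled over tags, as in KNN_rt.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_item_scores(resource, resource_vectors, user_profile, k=30):
    """Equation 7: score tags by similarity of the user's own resources to r.

    resource_vectors: {resource: sparse vector over tags (KNN_rt) or users (KNN_ru)}
    user_profile:     {resource: set of tags the user applied to it}
    """
    target = resource_vectors[resource]
    # Similarities are only needed for resources already in the user profile.
    sims = {s: cosine(target, resource_vectors[s])
            for s in user_profile if s != resource}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    scores = {}
    for s in neighbors:
        for tag in user_profile[s]:
            scores[tag] = scores.get(tag, 0.0) + sims[s]
    return scores

# Toy data, invented for illustration only.
resource_vectors = {
    "new_page": {"python": 3, "web": 1},
    "page_a": {"python": 2},
    "page_b": {"cooking": 4},
}
user_profile = {
    "page_a": {"python", "code"},
    "page_b": {"recipes"},
}
item_scores = knn_item_scores("new_page", resource_vectors, user_profile)
```

Note that every tag in `item_scores` comes from the user's own profile, which mirrors the stated weakness: tags outside the profile can never be recommended by this component.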
Each of four related matrices is updated until convergence. The matrices represent the factor-reduced components of the specialized tensor factorization $M = U_k T_k^U + R_k T_k^R$, where $U_k$ is the user factor matrix, $R_k$ is the resource factor matrix, $T_k^U$ is the tag factor matrix with respect to users, $T_k^R$ is the tag factor matrix with respect to resources, $k$ is the selected number of factors, and $M$ is the personalized tag-ranking tensor.

Generating a tag recommendation for a given user $u$ and resource $r$ is simply a matter of referring to the appropriate user-resource column of the ranking tensor $M$. The relevance score of a tag given a user-resource pair is calculated as:

$$\psi(u, r, t) = \sum_{i=1}^{k} \left( U_k[u][i] \cdot T_k^U[t][i] + R_k[r][i] \cdot T_k^R[t][i] \right) \qquad (8)$$

4. EXPERIMENTAL EVALUATION

In this section we describe the methods used to gather and pre-process our six datasets. Following an outline of our methodology, we examine the results of our proposed linear weighted hybrid tag recommender along with its components and the pair-wise interaction tensor factorization model. Finally we draw some general conclusions.

4.1 Datasets

Our experiments are conducted using data from six large real-world social annotation systems. On all datasets we generate p-cores [8]. Users, resources and tags are removed from the dataset in order to produce a residual dataset that guarantees each user, resource and tag occurs in at least p annotations. We define an annotation to include a user, a resource, and every tag the user has applied to the resource. For the larger datasets we use 20-cores. In the smaller datasets 5-cores are used.

Several reasons exist to construct p-cores. By eliminating infrequent items, the size of the data is dramatically reduced, allowing the application of recommendation techniques that would otherwise be computationally impractical. By removing rarely occurring users, resources or tags, noise in the data can be dramatically reduced.
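One way to compute a p-core under the definition above is iterative pruning: repeatedly drop users, resources and tags that occur in fewer than p annotations until nothing changes. The sketch below follows that reading; the record format, the function name and the toy data are assumptions, and the procedure in [8] may differ in detail.

```python
from collections import Counter

def p_core(annotations, p):
    """Return the annotations in which every user, resource and tag
    occurs in at least p of the remaining annotations.

    annotations: iterable of (user, resource, tags) records.
    """
    current = [(u, r, set(tags)) for u, r, tags in annotations]
    while True:
        users, resources, tag_counts = Counter(), Counter(), Counter()
        for u, r, tags in current:
            users[u] += 1
            resources[r] += 1
            tag_counts.update(tags)
        pruned, changed = [], False
        for u, r, tags in current:
            # Drop the whole annotation if its user or resource is too rare.
            if users[u] < p or resources[r] < p:
                changed = True
                continue
            # Strip rare tags; drop the annotation if no tags remain.
            kept = {t for t in tags if tag_counts[t] >= p}
            if len(kept) != len(tags):
                changed = True
            if not kept:
                changed = True
                continue
            pruned.append((u, r, kept))
        if not changed:
            return current
        current = pruned

# Toy data, invented for illustration only.
toy = [
    ("u1", "r1", {"a", "b"}),
    ("u1", "r2", {"a"}),
    ("u2", "r1", {"a"}),
    ("u2", "r2", {"a", "c"}),
    ("u3", "r3", {"a"}),
]
core = p_core(toy, p=2)
```

Removing one item can push another below the threshold, which is why the pruning repeats until a pass makes no change.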
Because of their scarcity, these are the very items likely to confound recommenders. Recommendation in the so-called long tail is a valid area of exploration, but it lies outside the scope of this paper.

Bibsonomy enables its users to annotate both URL bookmarks and journal articles. The dataset was gathered on 1 January 2009, encompassing the entire system. This data set has been made available online by the system administrators [6]. They have pre-processed the data to remove anomalies. A 5-core was taken. It contains 13,909 annotations with 357 users, 1,738 resources and 1,573 tags.

Citeulike is a popular online tool used by researchers specifically to manage and catalog journal articles. The site owners make their dataset freely available to download. We use a snapshot taken as of 17 February. Once a 5-core was computed, the remaining dataset contained 2,051 users, 5,376 resources, 3,343 tags and 105,873 annotations.

MovieLens is a data set gathered from the corresponding MovieLens Web site and is administered by the GroupLens research lab at the University of Minnesota. It contains users, ratings of movies, and tags. A 5-core was generated from the data, resulting in 35,366 annotations with 819 users, 2,445 resources and 2,309 tags.

Delicious is a popular Web site in which users annotate URLs. On 19 October 2008, 198 of the most popular tags were taken from the user interface and the site was recursively explored. From 20 October to 15 December, the complete profiles of 524,790 users were collected. Due to memory and time constraints, 10% of the user profiles were randomly selected, and a 20-core was taken for experiments. The dataset is our largest, containing 7,665 users, 15,612 resources and 5,746 tags. It contains 720,788 annotations.

Amazon is one of the world's largest retailers.
The site includes a myriad of ways for users to express and discover opinions of the products: ratings, editorial reviews, customer reviews, product details, and customer purchasing habits. Recently, Amazon has added social tagging to this list. Beginning on 1 July 2009 we recursively explored the site to gather 1.5 million user profiles. Many users had extremely small profiles or used idiosyncratic tags. After taking a 20-core of the data it contained 498,217 annotations with 8,802 users, 10,679 resources and 5,559 tags.

LastFM users upload their music profile, create playlists and share their musical tastes online. We selected 100 random users from the system and recursively explored the friend network. Only about 20% of the users had annotated a resource. Users have the option to tag songs, artists or albums. The tagging data here is limited to album annotations. Experimentation on artist and song data reveals similar trends. A p-core of 20 was drawn from the data. It contains 2,368 users, 2,350 resources, 1,141 tags and 172,177 annotations.

4.2 Methodology

Each user's annotations were divided equally among five folds. Four folds were used as training data to build the recommenders. The fifth was used to train the model parameters and ascertain the optimal weights of the components in the hybrids. The results of the fifth fold were then discarded and we performed four-fold cross validation on the remaining folds. The results were averaged over each user, then over the final four folds.

The recommenders are evaluated on their ability to recommend tags given a user-resource pair. The user and resource for each annotation were submitted to the recommenders and the recommenders returned a set of tags, $T_r$. These were then evaluated against the tags in the holdout annotation, $T_h$.

Recall is a common metric for evaluating the utility of recommendation algorithms. It measures the percentage of items in the holdout set that appear in the recommendation set. Recall is a measure of completeness and is defined as:

$$\text{recall} = \frac{|T_h \cap T_r|}{|T_h|} \qquad (9)$$

Precision is another common metric for measuring the usefulness of recommendation algorithms. It measures the percentage of items in the recommendation set that appear in the holdout set. Precision measures the exactness of the recommendation algorithm and is defined as:

$$\text{precision} = \frac{|T_h \cap T_r|}{|T_r|} \qquad (10)$$

The recall and precision will vary depending on the size of the recommendation set. In the following experiments we present the metrics with recommendation sets of size one through ten.

4.3 Experimental Results

In this section we offer some general observations about the experimental results reported in Figure 1. We then examine each dataset individually before offering a summary of our conclusions. After tuning the variables we chose a k of 30 for all collaborative filtering techniques.
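The recall and precision definitions in Equations 9 and 10 can be sketched in a few lines. The F-measure is not defined in the text; the balanced harmonic mean used below is an assumption, and the ranked list and holdout tags are toy values.

```python
def recall(recommended, holdout):
    """Equation 9: share of holdout tags that were recommended."""
    holdout = set(holdout)
    if not holdout:
        return 0.0
    return len(holdout & set(recommended)) / len(holdout)

def precision(recommended, holdout):
    """Equation 10: share of recommended tags that are in the holdout set."""
    recommended = set(recommended)
    if not recommended:
        return 0.0
    return len(set(holdout) & recommended) / len(recommended)

def f_measure(r, p):
    """Balanced harmonic mean of recall and precision (assumed F1 form)."""
    return 2 * r * p / (r + p) if (r + p) > 0 else 0.0

# Toy example, invented for illustration only.
ranked = ["python", "web", "java", "code"]
holdout = {"python", "code", "tutorial"}
r = recall(ranked, holdout)
p = precision(ranked, holdout)
f = f_measure(r, p)

# Recall/precision pairs for recommendation sets of size one through ten.
curve = [(recall(ranked[:n], holdout), precision(ranked[:n], holdout))
         for n in range(1, 11)]
```

Truncating the ranked list at each size n gives the points that a recall-precision plot such as Figure 1 is drawn from.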
The trend was for the recall and precision to steadily increase as k was increased and then suffer from diminishing returns. PITF, the pair-wise interaction tensor factorization model, was built with 64 features and a learning rate of 0.03 [14]. It was trained until convergence. We did experiments with 10 to 100 features. The results exhibited a sharp increase and then leveled out as the number of features approached 50. The hybrid reported in Figure 1 is composed of the two popularity based recommenders and four collaborative filtering recommenders. We have purposely constructed the hybrid with simple recommenders in order to permit insights into the datasets that might otherwise be obscured. By observing the importance of a component to the hybrid, we may infer the importance of the dimensions covered by that component. The composition of the hybrids is reported in Table 1. The hybrids do not draw upon PITF. A motivation of this paper is to demonstrate that hybrid recommenders can integrate multiple dimensions of the data by exploiting simple components. If PITF had been included in the hybrid it would not be clear if the success of the hybrid was owed to PITF or the ability of the hybrid to produce a synergistic blend of its constituent parts. Instead, we report the PITF results because it represents the state-of-the-art tag recommender and therefore offers an important point of comparison. While not evaluated in this paper, experimentation has revealed that incorporating PITF into the hybrid produces a small improvement over both PITF and the linear weighted hybrid. In all six datasets the hybrid outperforms its constituent parts, revealing that a linear weighted hybrid can exploit multiple dimensions of the data through its components. These components are not individually required to cover all dimensions of the data, and may instead focus on a particular dimension such as the relationship between tags and resources. 
When aggregated into a single framework, the components provide complementary information while maintaining their simplicity, speed and insights into the data.

The hybrid is competitive with PITF, often surpassing it. In MovieLens PITF proves marginally better. In Bibsonomy, Citeulike and LastFM the results are very similar. In Delicious and Amazon the hybrid is clearly superior.

What separates Delicious and Amazon from the other datasets is the diversity of the user profiles. Citeulike and Bibsonomy users focus on their area of expertise. MovieLens and LastFM users gravitate toward particular genres of movies and music. In Delicious, however, the users are able to tag web pages from across the entire Internet. Consequently, the user profiles often contain numerous unrelated topics. Similarly, Amazon users do not restrict their annotations to particular categories. The user profiles reflect the diversity one might expect of a consumer visiting the world's largest online retailer.

These diverse user profiles are difficult to characterize with a feature space model, the foundation of PITF. When recommending tags, PITF cannot draw upon particular features while ignoring others. PITF may recommend a tag not relevant to the particular context. In contrast, user-based and item-based collaborative filtering are able to focus on the most relevant parts of the user profile. User-based collaborative filtering only recommends tags applied to the query resource, narrowing the focus of the recommendation regardless of the diversity in the user profile. Item-based collaborative filtering techniques construct a neighborhood of resources from the user profile most similar to the query resource, effectively ignoring parts of the user profile that are not relevant to the immediate recommendation task. Our proposed linear weighted hybrid inherits the capacity to focus on specific aspects of the user profile.

The hybrid offers additional benefits.
When constructed from simple yet fast components, the hybrid itself maintains these properties offering a highly scalable and easily updatable solution for tag recommendation. It is possible to explain the results from the component recommenders and consequently the hybrid itself. In contrast PITF is a black box with little explanatory capacity.

[Figure 1: Recall (x-axis) and precision (y-axis) plotted for recommendation sets of size one through ten on the six datasets.]

The hybrid also offers extensibility. In this work we focused on recommenders that draw primarily on the URT data model. Other recommenders could be incorporated into the hybrid based on recency, context or content. A recommender based on recency might favor tags recently added to the user profile over tags that have not been used lately. A recommender might interpret context in a myriad of ways: recent queries, recently visited resources, or even routinely visited user profiles. Content-based recommenders might propose authors' names, movie genres or product information. Other recommendation techniques are possible and can be easily included as a component in the proposed framework. Other state-of-the-art techniques are often unable to accommodate this information.

[Table 1: Contribution of the individual components in the hybrids for each of the six data sets. Columns: pop_u, pop_r, KNN_ur, KNN_ut, KNN_ru, KNN_rt. Rows: Bibsonomy, Citeulike, MovieLens, Delicious, Amazon, LastFM.]

Generally, the hybrid draws little strength from the two popularity-based algorithms in favor of the collaborative filtering methods, as shown in Table 1. KNN_ur appears to be universally important across all six datasets, accounting for as much as 47% of the hybrid. In most cases KNN_rt is also extremely important. We now turn our attention to the six datasets and discuss each individually with respect to the performance of the individual components and the performance of the integrative models.

Bibsonomy

The performance of the tag recommenders is presented in Figure 1. In Bibsonomy both pop_u and KNN_ru perform poorly. These techniques recommend tags drawn from the user profile. KNN_rt also recommends tags from the user profile but performs far better; it relies on tags rather than users to model the resource. The methods that rely on the resource-tag information (pop_r, KNN_ur, KNN_ut, KNN_rt) are tightly grouped and are among the top performing components. This analysis suggests that for the purpose of tag recommendation in Bibsonomy, the interaction between the resource and tag dimensions is dominant over the interaction between the user and tag dimensions.

The integrative techniques offer a large improvement over the individual recommenders. The hybrid and PITF produce nearly identical results, marking the hybrid as a viable alternative. The hybrid relies most strongly on KNN_ut and KNN_rt, as shown in Table 1. The user-based and item-based collaborative filtering methods appear to complement one another. This improvement may be explained through an analysis of the application. Bibsonomy is designed for researchers to share and organize scholarly references and web sites.
When annotating journal articles, users often focus on their area of expertise and use domain-driven tags. In this case, KNN_rt may be particularly relevant, as reflected in its performance in Citeulike, an application which focuses entirely on publications. When tagging web pages, users may exhibit broader interests and employ more user-specific tags. For the purpose of tagging web pages, KNN_ut demonstrates efficacy, as it does in Delicious, a site devoted to web pages. In an application that permits the tagging of both types of resources, the hybrid can achieve maximum effectiveness when combining these two complementary components.

Citeulike

In Citeulike we observe a social annotation system that is entirely focused on scholarly publications. Its users are often interested in a narrow field and employ tags taken from their respective research communities. In this context it is not surprising that KNN_rt performs so well. It creates a neighborhood of resources drawn from the user profile and recommends tags which the user applied to these similar resources. Because the users are often interested in a narrow domain, it is relatively easy to find similar resources in the user profile. Because the user is motivated to organize resources for later retrieval (perhaps when citing research in their own publications), the tags applied to the neighbors are a good indicator of which tags should be applied to the new resource.

The utility of KNN_rt is also demonstrated by its dominance in the hybrid. Its α is more than 50%, as shown in Table 1. As usual, KNN_ut plays an important role in the hybrid, promoting tags that have been applied to the resource by other users. The hybrid outperforms PITF for smaller recommendation sets, but as the size of the recommendation set increases the results become nearly identical.

MovieLens

MovieLens exhibits similar patterns to Citeulike. The ordering of the components is nearly identical.
The hybrid and PITF produce a modest improvement over KNN_rt, and the hybrid is composed primarily of KNN_rt and secondly of KNN_ut. This may be due to the similarity of how users interact with the two systems. MovieLens users will gravitate toward particular types of movies; in Citeulike users will focus on their area of research. In MovieLens users might be influenced by the labels often attributed to movies ("action", "horror", "romance"); likewise, Citeulike users often employ labels taken from their area of expertise. The similarity in how users interact with the systems results in datasets with similar underlying characteristics. The composition of the hybrid is mostly KNN_ut and KNN_rt. In this dataset, PITF outperforms the hybrid by a small but statistically significant amount. PITF appears able to identify important latent features unattainable by the component recommenders and consequently by the hybrid.

Delicious

Delicious is our largest and most diverse dataset. It contains 720,788 annotations, in which users tag web pages. The worst recommender is pop_u. In no other dataset does it perform so poorly. This indicates that users in Delicious are not as likely to reuse tags as users in other systems, perhaps because the resource space is much broader, encompassing the entire Internet. On the other hand, pop_r does remarkably well for such a simple recommender, revealing that the users are arriving at a consensus on how to label resources.

The two user-based collaborative filtering methods perform similarly. Drawing upon a neighbor's opinion about a web site appears to do well whether that neighbor was discovered by modeling users as resources or as tags. In contrast, there is a marked difference between the two item-based methods: KNN_rt does far better than KNN_ru, suggesting once again that, in the confines of tag recommendation, resources are better modeled by tags than by users. PITF outperforms all the individual recommenders. The hybrid offers a clear improvement over the other methods, including PITF. As with most of the datasets, it relies strongly on KNN_ut and KNN_rt.

Amazon

Amazon presents one of the easier targets for tag recommendation. The two integrative models achieve better than 95% recall for a recommendation set with ten tags. The hybrid clearly outperforms PITF. In this dataset KNN_ur and KNN_ut are relatively close and run parallel to one another. Likewise, KNN_ru and KNN_rt do the same. This congruence suggests that multiple dimensions of the dataset contain valuable information. Given the task of tag recommendation, however, it appears that it is marginally better to model users and resources over the tag space.

Amazon users tag products for later retrieval. Very often they use tags drawn from the product space, such as "action" or "dvd". This behavior is similar to that observed in Citeulike. In contrast, Amazon users rarely limit themselves to a narrow range of items. They may freely label books, electronics or clothing. As a result, user-based collaborative filtering is more competitive. It selects tags already applied to the resource rather than relying on tags applied by the user to similar items. Unlike in Citeulike, it is not as likely that the user profile will contain these similar items.

LastFM

LastFM is another easy target for tag recommenders, offering more than 90% recall.
The results of the two integrative approaches are so similar that the recall-precision lines obscure one another. LastFM users appear to reuse tags to a high degree, as made evident by the success of pop_u. In contrast, the poor results of pop_r show that users do not often agree on how to label a resource. In LastFM, item-based collaborative filtering does very well, drawing upon the user's tendency to tag similar items in a similar manner. User-based filtering, which relies on the opinions of others, does poorly.

The composition of the hybrid reveals a sharp departure from the other datasets. It favors KNN_ru over KNN_rt even though KNN_rt does marginally better as an individual recommender. The importance of modeling resources as users in the hybrid may be due to the interaction of users within the social annotation system. An important focus of the application is the sharing and discovery of resources through the user space.

4.4 Summary

These results underscore the importance of an integrative approach to tag recommendation in social annotation systems. Social annotation systems vary in how users interact with the system. The differences between datasets make the performance of individual recommenders unpredictable. For example, KNN_ru does well in LastFM but performs poorly in Delicious. In contrast, the integrative techniques perform well regardless of the characteristics of the data.

The proposed linear weighted hybrid offers additional benefits. It is easily extensible. In this work we constructed the hybrid with popularity-based and collaborative filtering techniques, but the hybrid could be augmented with recommendation techniques that draw from different approaches, such as recency, content or context-based recommenders. When constructed from individual components, the hybrid is easily updatable and suitable for large scale deployment.
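The way a linear weighted hybrid blends its components can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names, the sum-to-one normalization of each component's scores, and the example weights and tags are all assumptions made for the sake of the example.

```python
from collections import defaultdict


def normalize(scores):
    """Scale one component's tag scores so they sum to 1 (assumed normalization)."""
    total = sum(scores.values())
    if total <= 0:
        return {}
    return {tag: value / total for tag, value in scores.items()}


def hybrid_recommend(component_scores, alphas, n=10):
    """Blend per-component tag scores with weights alpha and return the top n tags.

    component_scores: {component name: {tag: raw score}} for one (user, resource) query
    alphas:           {component name: weight}; hypothetical values, assumed to sum to 1
    """
    combined = defaultdict(float)
    for name, scores in component_scores.items():
        weight = alphas.get(name, 0.0)
        for tag, value in normalize(scores).items():
            combined[tag] += weight * value
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Hypothetical scores from three components for a single query.
example_scores = {
    "knn_ut": {"python": 3.0, "web": 1.0},
    "knn_rt": {"python": 1.0, "tutorial": 1.0},
    "pop_r": {"web": 2.0, "tutorial": 2.0},
}
example_alphas = {"knn_ut": 0.5, "knn_rt": 0.4, "pop_r": 0.1}
top_tags = hybrid_recommend(example_scores, example_alphas, n=3)
```

Because each component only contributes a dictionary of scores, adding a recency or content-based recommender in this sketch amounts to adding one more entry to both dictionaries, and the weights themselves show how much each component contributes.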
The use of individual components also permits the examination of the underlying characteristics of the data through an analysis of the contributions of the components and the dimensions of the data which they exploit. Furthermore, individual recommendations can be explained, a capability not shared by black-box recommenders such as PITF.

In many cases the hybrid outperforms the pair-wise interaction tensor factorization model. In Delicious and Amazon, where the user models are most diverse, the benefit is most noticeable. This marks our proposed linear weighted hybrid as a viable state-of-the-art tag recommender.

5. USING INFORMATION CHANNELS TO EXPLAIN THE PERFORMANCE OF TAG RECOMMENDERS

Our results have demonstrated a difference in how individual component recommenders perform. In this section we turn our attention to why these differences may occur. To that end we introduce the notion of information channels. An information channel models the relationship between the underlying dimensions of an annotation system: users, resources and tags. A strong information channel between two dimensions means that information in the first dimension will be useful in building a predictor for the second dimension. For example, a strong information channel between users and tags means that user characteristics will be a good basis on which to predict tags.

We first define information channels in terms of conditional entropy. We then explore the impact of information channels on the previously presented component recommenders. Finally, we offer a summary of our findings.

5.1 Quantifying Information Channels

We propose entropy and conditional entropy for the evaluation of information channels. Entropy measures the amount of uncertainty associated with a dimension, in this case the user, resource or tag dimension. It relies heavily on probabilities; however, the notion of probability in social annotation systems can be ambiguous.
The probability of a resource might be its likelihood of occurring in a user profile, a tag profile or an annotation. We define the probability of a resource r as:

$$p(r) = \frac{\sum_{u \in U} \sum_{t \in T} \mathrm{URT}(u, r, t)}{y} \tag{11}$$

where y is defined as the number of non-zero entries in URT. We may then define the entropy as:

$$H(R) = -\sum_{r \in R} p(r) \log_y p(r) \tag{12}$$
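As a rough illustration, Equations 11 and 12 can be computed directly from a list of (user, resource, tag) annotations. The sketch below also computes the conditional entropy H(R|T), which the text defines next in Equations 13 and 14, since it reuses the same counts. The function name, the tuple representation of URT, and the toy data are assumptions for this example and do not come from the paper.

```python
import math
from collections import Counter


def entropy_metrics(triples):
    """Return (H(R), H(R|T)) with logarithms taken in base y.

    triples: iterable of (user, resource, tag) annotations, i.e. the
    non-zero entries of URT. y is the number of distinct annotations.
    """
    triples = set(triples)  # URT is binary, so repeated annotations collapse
    y = len(triples)
    if y < 2:
        return 0.0, 0.0  # base-1 logarithm is undefined; there is no uncertainty

    count_r = Counter(r for _, r, _ in triples)         # numerator of p(r) * y
    count_t = Counter(t for _, _, t in triples)         # numerator of p(t) * y
    count_rt = Counter((r, t) for _, r, t in triples)   # numerator of p(r, t) * y
    log_y = math.log(y)

    # H(R) = -sum_r p(r) log_y p(r)
    h_r = -sum((c / y) * math.log(c / y) for c in count_r.values()) / log_y

    # H(R|T) = -sum_{r,t} p(r,t) log_y (p(r,t) / p(t)); the factor y cancels in the ratio
    h_r_given_t = -sum(
        (c / y) * math.log(c / count_t[t]) for (_, t), c in count_rt.items()
    ) / log_y
    return h_r, h_r_given_t


# Toy annotations: two users, two resources, two tags (hypothetical data).
toy_triples = [
    ("u1", "r1", "t1"),
    ("u1", "r2", "t2"),
    ("u2", "r1", "t1"),
    ("u2", "r2", "t1"),
]
h_r, h_r_given_t = entropy_metrics(toy_triples)
```

Dividing the natural logarithm by log(y) is simply a change of base, so the result stays within [0, 1] and can be compared across datasets of different sizes. Swapping the roles of the tuple positions gives the analogous quantities for users and tags.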

Entropy calculations often use a log base of 2, 10 or e. In this work we use a base of y. Doing so bounds the maximum entropy to 1. This will not change the relative values within a dataset, but it will permit the comparison of values across datasets.

Conditional entropy measures the uncertainty of a dimension given another dimension. The conditional entropy of the resource space given the tag space is defined as:

$$H(R|T) = -\sum_{r \in R} \sum_{t \in T} p(r, t) \log_y \frac{p(r, t)}{p(t)} \tag{13}$$

where p(r, t) is the likelihood of r and t occurring together in URT, or more formally:

$$p(r, t) = \frac{\sum_{u \in U} \mathrm{URT}(u, r, t)}{y} \tag{14}$$

The conditional entropy of resources given users, H(R|U), can be calculated similarly. Once H(R), H(R|T) and H(R|U) have been calculated, it is possible to evaluate the information channels. If H(R|T) is roughly equal to H(R), tags are not offering additional information about the resource space; it might then be difficult to predict a resource given a tag. On the other hand, if H(R|T) is less than H(R), tags may be a good predictor of resources. Comparing the H(R|T) and H(R|U) values may suggest which information channel is most useful.

Analogous definitions can be constructed for the entropy and conditional entropy of the user and tag spaces. It is important to note that H(R|T) is not equal to H(T|R). It may be the case that tags are good predictors of resources, but resources are not good predictors of tags.

Table 2: The entropy and conditional entropy of users, resources and tags, H(U), H(U|R), H(U|T), H(R), H(R|U), H(R|T), H(T), H(T|U), H(T|R), across all six datasets (Bibsonomy, Citeulike, MovieLens, Delicious, Amazon, LastFM).

5.2 Information Channels and Component Recommenders

The metrics are reported in Table 2. The entropy of the tag space appears to coincide with the difficulty in recommending tags. The largest H(T) is found in MovieLens, where the top recommenders achieve a precision of just over 50%.
The next largest values occur in Citeulike and Bibsonomy, where precision reaches 60% and 70% respectively. Amazon and LastFM produce the lowest values and allow precisions of more than 80%. In general, the higher the entropy of the tag space, the more difficult it is to recommend tags. The exception to this trend is Delicious, which appears to have low entropy but presents a more difficult target. This is explainable by the higher H(T|U) and H(T|R); the user and resource dimensions do not offer the same utility as they do in Amazon and LastFM.

The two user-based collaborative filtering methods build a neighborhood of similar users. This neighborhood is restricted to users that have annotated the query resource. In this respect both KNN_ur and KNN_ut draw from the user-resource channel. Both algorithms recommend tags applied to the input resource, emphasizing the resource-tag channel. The algorithms differ in the way they model users. KNN_ur models users over the resource space, reusing the user-resource channel. KNN_ut, on the other hand, models users over the tag space, adding a new dimension to the algorithm. This fundamental analysis based on information channels suggests that KNN_ut should outperform KNN_ur. In all six cases presented in Figure 1 it does.

Quantifying the strength of the information channels permits further insights. In Delicious, KNN_ur comes closest to KNN_ut. H(U|T) is compared to the H(U) of 0.551; it appears that tags are not adding a great deal of new information. The resource-user channel, on the other hand, is stronger; the H(U|R) of suggests that resources are much better than tags at modeling users. In this case, it is advantageous to reuse the user-resource channel. Bibsonomy, Citeulike and Amazon show similar trends. The most extreme difference between KNN_ur and KNN_ut occurs in LastFM.
In this case H(U|R) and H(U|T) show that resources are not any better than tags at modeling users, and the additional dimension covered by KNN_ut allows the better results. MovieLens displays similar characteristics.

In the item-based collaborative filtering methods the recommended tags are drawn directly from the user profile, stressing the user-tag channel. The user-resource channel is exploited by focusing on resources from the user's profile. The two methods differ in how the resource is represented. KNN_ru models the resource as a vector over the user space and KNN_rt models the resource as a vector over the tag space. Since KNN_rt adds an additional information channel to the approach, we expect it to outperform KNN_ru. In all six cases we observe this to be true.

This theoretical analysis based on information channels is once again augmented by an examination of the metrics. In LastFM KNN_ru performs nearly as well as KNN_rt. This is due to the fact that H(T|U) is so low in LastFM; users reuse tags with such consistency that it matters little how the resources are modeled. Likewise, the congruence of the two models in Amazon is owed to the low overall H(T) and the ability to represent resources as tags demonstrated by H(T|R). In the remaining datasets, where KNN_ru is clearly outperformed by KNN_rt, H(T) is larger and it appears to be more difficult to model users with resources.

These results point toward a framework for understanding the structure of social annotation data. These systems vary in the way users interact with the application, producing underlying characteristics which draw upon different dimensions of the data. Our information channel metrics
based on entropy and conditional entropy attempt to reveal these characteristics and explain the performance of tag recommenders across several datasets.

6. CONCLUSIONS

This paper has explored the problem of tag recommendation in social annotation systems and proposed a weighted linear hybrid incorporating simple popularity and collaborative filtering components. The success of the hybrid over the lower-dimensional components demonstrates clearly the importance of an integrative approach that exploits multiple dimensions of the data. Evaluations also show that the hybrid matches or outperforms a state-of-the-art model-based algorithm based on tensor factorization (PITF), particularly when the user profiles are diverse. The weighted hybrid has the additional advantages of being more efficient, scalable, extensible and explainable than PITF.

Experiments across six real-world datasets reveal interesting differences between social annotation applications, a result of the widely-varying user populations, resource types and application characteristics found in these applications. These differences are revealed most clearly in the performance of the individual components of the hybrid, which vary widely from dataset to dataset. By measuring characteristics of the data via the metrics of entropy and conditional entropy, we show that it is possible to explain in qualitative terms the reasons for these differences in recommender performance.

ACKNOWLEDGMENTS

This work was supported in part by the National Science Foundation Grant IIS and a grant from the Department of Education, Graduate Assistance in the Area of National Need, P200A

Tag-Based Resource Recommendation in Social Annotation Applications

Tag-Based Resource Recommendation in Social Annotation Applications Tag-Based Resource Recommendation in Social Annotation Applications Jonathan Gemmell, Thomas Schimoler, Bamshad Mobasher, and Robin Burke Center for Web Intelligence, School of Computing, DePaul University

More information

A Fast Effective Multi-Channeled Tag Recommender

A Fast Effective Multi-Channeled Tag Recommender A Fast Effective Multi-Channeled Tag Recommender Jonathan Gemmell, Maryam Ramezani, Thomas Schimoler, Laura Christiansen, and Bamshad Mobasher Center for Web Intelligence School of Computing, DePaul University

More information

Content-based Dimensionality Reduction for Recommender Systems

Content-based Dimensionality Reduction for Recommender Systems Content-based Dimensionality Reduction for Recommender Systems Panagiotis Symeonidis Aristotle University, Department of Informatics, Thessaloniki 54124, Greece symeon@csd.auth.gr Abstract. Recommender

More information

Repositorio Institucional de la Universidad Autónoma de Madrid.

Repositorio Institucional de la Universidad Autónoma de Madrid. Repositorio Institucional de la Universidad Autónoma de Madrid https://repositorio.uam.es Esta es la versión de autor de la comunicación de congreso publicada en: This is an author produced version of

More information

Collaborative Tag Recommendations

Collaborative Tag Recommendations Collaborative Tag Recommendations Leandro Balby Marinho and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Samelsonplatz 1, University of Hildesheim, D-31141 Hildesheim, Germany

More information

Collaborative Filtering based on User Trends

Collaborative Filtering based on User Trends Collaborative Filtering based on User Trends Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos Papadopoulos, and Yannis Manolopoulos Aristotle University, Department of Informatics, Thessalonii 54124,

More information

Relational Classification for Personalized Tag Recommendation

Relational Classification for Personalized Tag Recommendation Relational Classification for Personalized Tag Recommendation Leandro Balby Marinho, Christine Preisach, and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Samelsonplatz 1, University

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Automatically Building Research Reading Lists

Automatically Building Research Reading Lists Automatically Building Research Reading Lists Michael D. Ekstrand 1 Praveen Kanaan 1 James A. Stemper 2 John T. Butler 2 Joseph A. Konstan 1 John T. Riedl 1 ekstrand@cs.umn.edu 1 GroupLens Research Department

More information

AMAZON.COM RECOMMENDATIONS ITEM-TO-ITEM COLLABORATIVE FILTERING PAPER BY GREG LINDEN, BRENT SMITH, AND JEREMY YORK

AMAZON.COM RECOMMENDATIONS ITEM-TO-ITEM COLLABORATIVE FILTERING PAPER BY GREG LINDEN, BRENT SMITH, AND JEREMY YORK AMAZON.COM RECOMMENDATIONS ITEM-TO-ITEM COLLABORATIVE FILTERING PAPER BY GREG LINDEN, BRENT SMITH, AND JEREMY YORK PRESENTED BY: DEEVEN PAUL ADITHELA 2708705 OUTLINE INTRODUCTION DIFFERENT TYPES OF FILTERING

More information

Query Likelihood with Negative Query Generation

Query Likelihood with Negative Query Generation Query Likelihood with Negative Query Generation Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 ylv2@uiuc.edu ChengXiang Zhai Department of Computer

More information

Top-N Recommendations from Implicit Feedback Leveraging Linked Open Data

Top-N Recommendations from Implicit Feedback Leveraging Linked Open Data Top-N Recommendations from Implicit Feedback Leveraging Linked Open Data Vito Claudio Ostuni, Tommaso Di Noia, Roberto Mirizzi, Eugenio Di Sciascio Polytechnic University of Bari, Italy {ostuni,mirizzi}@deemail.poliba.it,

More information

arxiv: v1 [cs.ir] 29 Sep 2013

arxiv: v1 [cs.ir] 29 Sep 2013 Improving tag recommendation by folding in more consistency Modou Gueye 1,2, Talel Abdessalem 1, and Hubert Naacke 3 arxiv:1309.7517v1 [cs.ir] 29 Sep 2013 1 Institut Telecom - Telecom ParisTech 46, rue

More information

Recommender Systems New Approaches with Netflix Dataset

Recommender Systems New Approaches with Netflix Dataset Recommender Systems New Approaches with Netflix Dataset Robert Bell Yehuda Koren AT&T Labs ICDM 2007 Presented by Matt Rodriguez Outline Overview of Recommender System Approaches which are Content based

More information

Web Personalization & Recommender Systems

Web Personalization & Recommender Systems Web Personalization & Recommender Systems COSC 488 Slides are based on: - Bamshad Mobasher, Depaul University - Recent publications: see the last page (Reference section) Web Personalization & Recommender

More information

PERSONALIZED TAG RECOMMENDATION

PERSONALIZED TAG RECOMMENDATION PERSONALIZED TAG RECOMMENDATION Ziyu Guan, Xiaofei He, Jiajun Bu, Qiaozhu Mei, Chun Chen, Can Wang Zhejiang University, China Univ. of Illinois/Univ. of Michigan 1 Booming of Social Tagging Applications

More information

Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem

Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem Stefan Hauger 1, Karen H. L. Tso 2, and Lars Schmidt-Thieme 2 1 Department of Computer Science, University of

More information

Part 11: Collaborative Filtering. Francesco Ricci

Part 11: Collaborative Filtering. Francesco Ricci Part : Collaborative Filtering Francesco Ricci Content An example of a Collaborative Filtering system: MovieLens The collaborative filtering method n Similarity of users n Methods for building the rating

More information

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks Archana Sulebele, Usha Prabhu, William Yang (Group 29) Keywords: Link Prediction, Review Networks, Adamic/Adar,

More information

Understanding the user: Personomy translation for tag recommendation

Understanding the user: Personomy translation for tag recommendation Understanding the user: Personomy translation for tag recommendation Robert Wetzker 1, Alan Said 1, and Carsten Zimmermann 2 1 Technische Universität Berlin, Germany 2 University of San Diego, USA Abstract.

More information

A Constrained Spreading Activation Approach to Collaborative Filtering

A Constrained Spreading Activation Approach to Collaborative Filtering A Constrained Spreading Activation Approach to Collaborative Filtering Josephine Griffith 1, Colm O Riordan 1, and Humphrey Sorensen 2 1 Dept. of Information Technology, National University of Ireland,

More information

CS224W Project: Recommendation System Models in Product Rating Predictions

CS224W Project: Recommendation System Models in Product Rating Predictions CS224W Project: Recommendation System Models in Product Rating Predictions Xiaoye Liu xiaoye@stanford.edu Abstract A product recommender system based on product-review information and metadata history

More information

Web Personalization & Recommender Systems

Web Personalization & Recommender Systems Web Personalization & Recommender Systems COSC 488 Slides are based on: - Bamshad Mobasher, Depaul University - Recent publications: see the last page (Reference section) Web Personalization & Recommender

More information

Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract

Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD Abstract There are two common main approaches to ML recommender systems, feedback-based systems and content-based systems.

More information

A PROPOSED HYBRID BOOK RECOMMENDER SYSTEM

A PROPOSED HYBRID BOOK RECOMMENDER SYSTEM A PROPOSED HYBRID BOOK RECOMMENDER SYSTEM SUHAS PATIL [M.Tech Scholar, Department Of Computer Science &Engineering, RKDF IST, Bhopal, RGPV University, India] Dr.Varsha Namdeo [Assistant Professor, Department

More information

Collaborative Filtering using Euclidean Distance in Recommendation Engine

Collaborative Filtering using Euclidean Distance in Recommendation Engine Indian Journal of Science and Technology, Vol 9(37), DOI: 10.17485/ijst/2016/v9i37/102074, October 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Collaborative Filtering using Euclidean Distance

More information

A Recommender System Based on Improvised K- Means Clustering Algorithm

A Recommender System Based on Improvised K- Means Clustering Algorithm A Recommender System Based on Improvised K- Means Clustering Algorithm Shivani Sharma Department of Computer Science and Applications, Kurukshetra University, Kurukshetra Shivanigaur83@yahoo.com Abstract:

More information

Music Recommendation with Implicit Feedback and Side Information

Music Recommendation with Implicit Feedback and Side Information Music Recommendation with Implicit Feedback and Side Information Shengbo Guo Yahoo! Labs shengbo@yahoo-inc.com Behrouz Behmardi Criteo b.behmardi@criteo.com Gary Chen Vobile gary.chen@vobileinc.com Abstract

More information

Joining Collaborative and Content-based Filtering
