Predicting user preference for movies using movie lens dataset


Tanvi Bhagat 1, Megharani Patil 2
1 M.E. Student, Department of Computer Engineering, Thakur College of Engineering, Mumbai.
2 Asst. Prof., Department of Computer Engineering, Thakur College of Engineering, Mumbai.

Abstract- With the vast amount of data available today, institutions are looking for increasingly accurate ways of using it. Companies like Amazon use their huge amounts of data to give recommendations to users. Based on similarities among items, systems can predict the rating of a new item. Recommender systems use user, item, and rating information to predict how other users will like a particular item. Recommender systems are now pervasive and seek to make a profit from customers or to meet their needs. To reach this goal, however, systems need to parse a lot of data, sometimes collected from different sources, and predict how the user will like a product or item. Recommendation algorithms can generally be classified into three types: Non-Personalized, Content-Based, and Collaborative Filtering algorithms. Traditional collaborative filtering methods are inefficient, especially when the user-rating data is extremely sparse. To solve this problem, we propose an approach that computes user similarity from the types of items that users have rated, and we develop a collaborative filtering algorithm based on this approach. Furthermore, we put forward an improved collaborative filtering algorithm based on user similarity combination, which combines the user similarity based on user-rated items with the user similarity based on the types of user-rated items.

Keywords- user based CF; recommendation; Pearson correlation; ratings; neighborhood formation

I. INTRODUCTION

A recommender system is a special type of information filtering technique that attempts to present items (such as movies, music, or news) that are likely to be of interest to the user. Recommender systems help users navigate large product assortments, make decisions in an e-commerce scenario, and overcome information overload. A user would obviously prefer a website that recommends something useful over one that simply requires users to navigate through the site to find the products they need.

Recommender systems are now pervasive in consumers' lives. They aim to help users find items that they would like to buy or consider, based on huge amounts of collected data. Amazon, Facebook, LinkedIn, and other commercial and social networking websites use these systems. Parsing a huge amount of data to predict a user's preference, or his or her similarity to other groups of users, is the core of a recommender system. There are two main recommendation techniques: Personalized and Non-Personalized Systems.

A. PERSONALIZED RECOMMENDER SYSTEM

E-commerce has been growing rapidly, keeping pace with the web. Its rapid growth has made both companies and customers face a new situation. The need for new marketing strategies such as one-to-one marketing and customer relationship management (CRM) has been stressed both by researchers and by practitioners. One way to realize these strategies is the personalized recommender. In Personalized Systems, a separate list of products that the customer would like to purchase is generated for every customer.

DOI : 10.23883/IJRTER.2017.3018.K3LXM 156

The personal preferences of the users are taken into account while making the recommendations. For example, the IMDb website suggests movies to users based on their choices.

B. NON-PERSONALIZED RECOMMENDER SYSTEMS

Non-personalized recommender systems are the simplest type of recommender system. As the name suggests, these systems do not take into account the personal preferences of the users. The recommendations they produce are identical for each customer. On e-commerce websites, the online retailer can either manually select the recommendations based on the popularity of items, or the recommendations can be the top-n new products. For example, if we go to amazon.com as an anonymous user, it shows items that are currently being viewed by other members. These systems recommend items to consumers based on what other consumers have said about the items or on how they have rated them on average. As seen earlier, recommendations are simply suggestions or lists of items that a user might like, and here these recommendations are independent of the consumer. Non-personalized recommender systems mainly use two types of algorithms: the aggregated opinion recommender and the basic product association recommender.

C. ROLE OF RECOMMENDER SYSTEMS IN E-COMMERCE

E-commerce sites use recommender systems to suggest products to their customers. Products can be recommended based on the top overall sellers on a site, on the demographics of the customer, or on an analysis of the customer's past buying behavior as a prediction of future buying behavior. Broadly, these techniques are part of personalization on a site, because they help the site adapt itself to each customer. Recommender systems automate personalization on the Web, enabling individual personalization for each customer. Thus, recommender systems enhance e-commerce sales in three ways:
1. Browsers into buyers: the recommender system helps customers buy products they wish to purchase.
2. Cross sell: the system suggests additional products.
3. Loyalty: the relationship between the user and the website is maintained.

II. RECOMMENDATION PROCESS

The figure below shows the recommendation process with its three basic tasks:
1. Representation
2. Neighborhood Formation
3. Recommendation Generation
We will discuss the above tasks in the next section in detail.

Figure 1: The Recommendation Process

For the system to predict accurate ratings and generate high-quality recommendations, we need to study the types of filtering in depth, especially collaborative filtering. The next sections present a detailed study of content based filtering and collaborative filtering. In content based filtering, the items that are most similar to the ones a user has rated positively are recommended to that user.

@IJRTER-2017, All Rights Reserved 157
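The three tasks of the recommendation process in Section II (representation, neighborhood formation, recommendation generation) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the toy ratings, the function names, and the use of cosine similarity over co-rated items are all choices made for this sketch, not the system described in the paper.

```python
from math import sqrt

# Representation: each user is a mapping item -> rating (toy data, illustrative only).
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}

def cosine(u, v):
    """Cosine similarity over the items both users rated (assumed measure)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def neighbors(ratings, active, k=2):
    """Neighborhood formation: the k users most similar to the active user."""
    sims = [(cosine(ratings[active], other_ratings), other)
            for other, other_ratings in ratings.items() if other != active]
    return sorted(sims, reverse=True)[:k]

def recommend(ratings, active, k=2, n=3):
    """Recommendation generation: similarity-weighted average rating
    for items the active user has not rated yet, best first."""
    totals, weights = {}, {}
    for sim, other in neighbors(ratings, active, k):
        if sim <= 0:
            continue
        for item, rating in ratings[other].items():
            if item in ratings[active]:
                continue
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: totals[item] / weights[item] for item in totals}
    return sorted(predicted.items(), key=lambda p: p[1], reverse=True)[:n]

top = recommend(ratings, "alice")
```

Here `top` holds the single unseen item for the hypothetical user `alice`, with a predicted rating that lies between the ratings given by her two neighbors and is pulled toward the more similar one.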

A. CONTENT BASED FILTERING

Items recommended by content-based filtering are often textual, such as news pages and documents, and are usually described by keywords and their weights. Content-based recommender systems work with user profiles that are created at the beginning. A profile holds information about a user and his or her taste, where taste is derived from how the user rated items. The content based algorithm is as follows:
Step 1: Users register and create a profile with a login and a password, which is protected using MD5.
Step 2: Information is stored in the database, which is queried to display a list of available items to the user.
Step 3: Once the user selects the desired category, for example books, the database is queried and returns the records that match the chosen category.
Step 4: The average of all the recommendations is calculated, and only those above a certain threshold (which is not fixed) are displayed to the user.
Step 5: The final recommendations are displayed to the user based on Step 4.

Content-based recommendation methods use extra information from the user's profile or the item's profile in the computation. To give recommendations to a user, the profile of the target user is analyzed and the items that match the profile are selected. Content based algorithms work best with items that carry a lot of information, such as documents or news websites. On the Google website, a content based algorithm is used to give users news and information based on their location. When we log in to Google and go to the news page, we can see the news from the city where we live.

B. COLLABORATIVE FILTERING

Collaborative recommendation is probably the most familiar, most widely implemented, and most mature of the technologies. Collaborative recommender systems aggregate ratings or recommendations of objects, recognize commonalities between users on the basis of their ratings, and generate new recommendations based on inter-user comparisons.
The greatest strength of collaborative techniques is that they are completely independent of any machine-readable representation of the objects being recommended, and they work well for complex objects such as music and movies. The critical step of the collaborative filtering approach is finding customers whose preferences are similar to those of the active customer. After finding similar customers, the system presents recommendations to the active customer according to the preferences of the similar ones.

Figure 2: The basic collaborative Filtering Process

Figure 2 shows the schematic diagram of the collaborative filtering process. CF algorithms represent the entire $m \times n$ user-item data as a ratings matrix $A$. Each entry $a_{i,j}$ in $A$ represents the preference score (rating) of the $i$-th user on the $j$-th item. Each individual rating is within a numerical scale, and it can also be 0, indicating that the user has not yet rated that item. Researchers have devised a number of collaborative filtering algorithms that can be divided into two main categories: memory-based (user-based) and model-based (item-based) algorithms [6]. In this section we provide a detailed analysis of CF-based recommender system algorithms. Typically, the workflow of a collaborative filtering system is:
1. A user's ratings are viewed as an approximate representation of the user's interest in the corresponding domain.
2. The system matches this user's ratings against other users' ratings and finds the people with the most similar tastes.
3. From these similar users, the system recommends items that they have rated highly but that this user has not yet rated.

III. PROPOSED IDEA

In this system, a basic movie recommendation website is created. The user needs to create an account to be able to rate movies and get recommendations according to his or her liking. The input is the ratings and comments that a user gives to a particular movie, and the output is the suggested rating for a movie that the user has not seen or rated. The movie recommendations are calculated using user based and item based collaborative filtering techniques. Before the result is computed, like-minded users are grouped together using the Pearson correlation formula.

Combined recommendations of user based and item based CF: We use a combined recommendation, i.e. the outputs of both user based and item based CF are used to provide recommendations. In many scenarios the system is unable to find similar users because the user data is sparse; in these situations item based recommendation gives better results. Thus, whenever there are too few similar users to produce N recommendations, the item based recommendation is used to fill the gap.

Figure 3: Combined user and item based CF flowchart

IV. BACKGROUND TECHNOLOGIES USED

User Based Collaborative Filtering: User-based CF algorithms use the entire user-item database, or a sample of it, to generate a prediction. Every user is part of a group of people with similar interests. By identifying the so-called neighbors of a new user (or active user), a prediction of his or her preferences on new items can be produced. The similarity calculation is a very important component of a CF recommendation algorithm. We have proposed a hybrid user-based similarity, which combines the traditional user based similarity (Pearson correlation) with the user-rating item type similarity. The hybrid user based similarity between two users can be computed as follows:
Step 1: Input the user rating data and the movie type data, that is, the data in table 1;
Step 2: From the data in Step 1, compute each user's average rating (4 for user a and 4 for user k);
Step 3: Based on the user-rating data and the movie types, obtain the number of times each user has rated each type. For example, user a rates Avatar, Titanic 3D, and Inception, so user a rates an Action type movie 1 time, a Sci-Fi type movie 2 times, an Adventure type movie 1 time, and so on. The results are shown in table 2;
Step 4: First, find the item set Ci = {Titanic 3D, Inception} that is rated by both user k and user a; second, apply equation 2 to compute the Pearson correlation similarity between user a and user k. The Pearson similarity = -0.5;
Step 5: From the user-rating item type data in table 2 (Step 3), find the common types Si = {Sci-Fi, Drama, History, Love, Suspense}, and then apply equation 3 to calculate the user-rating item type similarity = 0.7893;
Step 6: Choose a reasonable similarity hybrid ratio = 0.5 (assumed) and use the results from Steps 4 and 5 to compute the hybrid user similarity = 0.1447.

The following characteristics of the hybrid similarity calculation method can be drawn from the above example:
1) It makes more comprehensive use of the users' rating data and avoids the problem of insufficient common rated items in the Pearson correlation similarity calculation;
2) It alleviates the user-rating data sparsity problem and is better suited to sparse user-rating data than traditional similarity calculation approaches;
3) It can overcome the negative impact on recommendation precision caused by the inaccuracy of a single similarity calculation method.

V. IMPLEMENTATION

The system is implemented in JSP and uses a MySQL database to store all the relevant data. To validate the recommendation algorithms proposed by this study, this paper uses the classic MovieLens dataset for its experiments. The MovieLens dataset is provided for free by the GroupLens research group and can be obtained from the GroupLens official website, which also provides various other datasets for collaborative filtering research.
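The hybrid similarity of Section IV (Steps 4 to 6) can be sketched as below. Equations 2 and 3 are not reproduced in this text, so the Pearson function here is the standard form centred on each user's overall average rating (as Step 2 suggests), and the hybrid is assumed to be a linear blend weighted by the hybrid ratio. That assumed blend is consistent with the worked numbers, since 0.5 × (-0.5) + 0.5 × 0.7893 ≈ 0.1447. The function names are hypothetical.

```python
from math import sqrt

def mean_rating(user):
    """Average of all ratings a user has given (Step 2)."""
    return sum(user.values()) / len(user)

def pearson(u, v):
    """Pearson correlation over co-rated items, centred on each user's
    overall mean rating (assumed form; the paper's equation 2 is not shown)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    mean_u, mean_v = mean_rating(u), mean_rating(v)
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    den = (sqrt(sum((u[i] - mean_u) ** 2 for i in common))
           * sqrt(sum((v[i] - mean_v) ** 2 for i in common)))
    return num / den if den else 0.0

def hybrid_similarity(pearson_sim, type_sim, ratio):
    """Assumed linear blend: ratio weights the Pearson term,
    (1 - ratio) weights the item type similarity."""
    return ratio * pearson_sim + (1.0 - ratio) * type_sim

# Values taken from Steps 4 to 6 of the worked example.
hybrid = hybrid_similarity(-0.5, 0.7893, 0.5)  # about 0.14465, reported as 0.1447
```

With a ratio of 0.5 both terms are weighted equally, so the example cannot tell which term the ratio is meant to weight; the assignment in `hybrid_similarity` is therefore a guess.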

The MovieLens dataset used here contains 100,000 ratings from 943 users on 1682 movies, and each rating value is between 1 and 5.

The first step of the implementation is the login page, where every new user needs to register and an already registered user needs to log in to get access to the list of movies. The main page of the website has login and registration links. The login link redirects the user to a login form with the usual username and password inputs.

Figure 4: Login/Registration page

After a normal user logs in, several tabs are displayed at the top: Home, Movie list, Watch list, and Logout. If the logged-in user is an admin, an additional Add movie tab is shown, as only the admin can add new movies to the database.

Figure 5: Demonstrates a normal user login page

Figure 6: An admin login page

On e-commerce websites, users have different preferences and purchase commodities of different brands, styles, colors, and so on, so their preferences are reflected in the types of commodities they purchase. Similarly, films are divided into action, crime, comedy, romance, animation, and so on, and the types of movies a user has rated can reflect the user's preferences and tastes in movies. That is, users who prefer to watch action movies will rate more action movies than other users do. In this system, 17 movie genres are taken into consideration.
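The idea that rated genres reflect a user's taste (Step 3 of Section IV) amounts to counting, per user, how often each genre appears among the rated movies. The sketch below illustrates this. The genre assignments are hypothetical, chosen only to agree with the counts quoted in Step 3 (Action 1, Sci-Fi 2, Adventure 1 for user a), and the second user's movie list is likewise assumed. The cosine comparison of two count vectors is a stand-in, not the paper's equation 3, which is not reproduced in this text.

```python
from collections import Counter
from math import sqrt

# Hypothetical genre assignments, consistent with the counts in Step 3.
MOVIE_GENRES = {
    "Avatar": ["Action", "Sci-Fi", "Adventure"],
    "Titanic 3D": ["Drama", "History", "Love"],
    "Inception": ["Sci-Fi", "Suspense"],
}

def genre_counts(rated_titles, movie_genres):
    """Count how many rated movies fall under each genre."""
    counts = Counter()
    for title in rated_titles:
        counts.update(movie_genres.get(title, []))
    return counts

def cosine(c1, c2):
    """Cosine similarity of two genre-count vectors (assumed stand-in measure)."""
    dot = sum(c1[g] * c2[g] for g in set(c1) & set(c2))
    norm1 = sqrt(sum(x * x for x in c1.values()))
    norm2 = sqrt(sum(x * x for x in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

user_a = genre_counts(["Avatar", "Titanic 3D", "Inception"], MOVIE_GENRES)
user_k = genre_counts(["Titanic 3D", "Inception"], MOVIE_GENRES)  # assumed list
type_sim = cosine(user_a, user_k)
```

Because there are far fewer genres than movies, two users' count vectors almost always overlap even when the users share few co-rated movies, which is the property the paper relies on to reduce sparsity.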

Figure 7: Displaying movie genres

On clicking a particular movie genre, a few top rated movies in that genre are displayed. If the user clicks on a particular movie, the movie is added to the watch-list (explained in the next image). When a user selects a particular movie, a few details of the movie, such as the release year and genres, are displayed. Along with these details, the suggested rating (i.e. on a scale of 1 to 5, how likely the user is to like the movie) is also shown. There is also a submit rating and a submit comment section where users can submit their reviews. The movies displayed below are the suggested (similar) movies, chosen on the basis of the selected movie's genre.

Figure 8: Displaying the suggested rating for an unseen movie and the similar movies the user would be interested in.

Clicking on a particular movie suggests that the user is interested in watching it, if not immediately then maybe sometime later. So whichever movie the user opens is added to the watch-list. When the user logs in again after a few days, that movie is still in the watch-list and is easy to find, which saves the user the time of searching for it again.

Figure 9: Movies that get added to the watch-list

VI. DISCUSSIONS AND CONCLUSION

This paper puts forward the user-rating item type similarity calculation method. The experiment shows that this similarity calculation method clearly outperforms the traditional methods in accuracy. Because the number of item types is far smaller than the number of items, the method can more easily avoid the rating data sparsity problem in similarity calculation and improves the scalability of the similarity calculation. To yield even better recommendation results, this work proposes an improved user-based collaborative filtering algorithm based on similarity fusion.
The experiment has demonstrated that the method is superior to the traditional user-based collaborative filtering algorithm and to the collaborative filtering algorithm based on user-rating item type alone. The improved collaborative filtering algorithm can overcome the effect on similarity measurement of the inaccuracy of one similarity method and takes advantage of each similarity calculation method. The best hybrid ratio varies across datasets.

REFERENCES

1. Badrul Sarwar, George Karypis, Joseph Konstan, John Riedl, Analysis of Recommendation Algorithms for E-Commerce, GroupLens Research Group/Army HPC Research Center, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455.
2. Badrul Sarwar, George Karypis, Joseph Konstan, John Riedl, Item-Based Collaborative Filtering Recommendation Algorithms, GroupLens Research Group/Army HPC Research Center, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455.
3. Gediminas Adomavicius and Alexander Tuzhilin, Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 6, June 2005.
4. Rohini Nair and Kavita Kelkar, Implementation of Item and Content based Collaborative Filtering Techniques based on Ratings Average for Recommender Systems, International Journal of Computer Applications (0975-8887), Volume 65, No. 24, March 2013.
5. Prem Melville and Vikas Sindhwani, Recommender Systems, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598.
6. Dr. Bharat Bhasker, K. Srikumar, Recommender Systems in e-commerce: Methodologies and Applications of Data Mining, July 29, 2010.
7. Meisam Shabanpoor and Mehregan Mahdavi, Implementation of a Recommender System on Medical Recognition and Treatment, International Journal of e-Education, e-Business, e-Management and e-Learning, Vol. 2, No. 4, August 2012.
8. Ricci, F.; Rokach, L.; Shapira, B.; Kantor, P.B. (Eds.), Recommender Systems Handbook, 2011.
9. Hill, W., Stead, L., Rosenstein, M., and Furnas, G. (1995) An Algorithmic Framework for Performing Collaborative Filtering, in Proceedings of ACM SIGIR 99. ACM Press.
10. Aggarwal, C. C., Wolf, J.L., Wu, K., and Yu, P.S. (1999) Horting Hatches an Egg: A New Graph-theoretic Approach to Collaborative Filtering, in Proceedings of the ACM KDD 99 Conference, San Diego, CA. pp. 201-212.
11. Andreas Geyer-Schulz and Michael Hahsler, Comparing two Recommender Algorithms with Help of Recommendations by Peers.
12. Gleb Beliakov, Tomasa Calvo and Simon James, Aggregation of preferences in recommender systems.
13. Manisha Hiralall, Recommender systems for e-shops, Vrije Universiteit, Amsterdam.
14. Ayhan Demiriz, Enhancing Product Recommender Systems on Sparse Binary Data.
15. Debajyoti Mukhopadhyay, Ruma Dutta, Anirban Kundu and Rana Dattagupta, A Product Recommendation System using Vector Space Model and Association Rule.
16. Robin Burke, Integrating Knowledge-based and Collaborative-filtering Recommender Systems, Recommender.com, Inc. and Information and Computer Science, University of California, Irvine, CA 92697.
17. Marcel Caraciolo, Non-Personalized Recommender Systems with Pandas and Python, Artificial Intelligence in Motion, a blog about scientific Python, data, machine learning and recommender systems (aimotion.blogspot.in).