By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth. Under The Guidance of Dr. Richard Maclin


1 By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth Under The Guidance of Dr. Richard Maclin

2 Outline Problem Statement Background Proposed Solution Experiments & Results Related Work Future Work Conclusion Q & A

3

4 Problem Statement Given a set of users with their previous ratings for a set of movies, can we predict the rating they will assign to a movie they have not previously rated? Netflix puts it as: "The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win one (or more) Prizes. Winning the Netflix Prize improves our ability to connect people to the movies they love." So what do they want? A 10% improvement over their existing system. They are paying $1 Million for it.

5 Problem Statement Similarly, which movie will you like given that you have seen X-Men, X-Men II, and X-Men: The Last Stand, and users who saw these movies also liked X-Men Origins: Wolverine? Answer: ?

6 Dataset Background for the problem Background for the Solution

7 Background - Dataset Netflix Prize Dataset Netflix released data for this competition Contains nearly 100 Million ratings Number of users (anonymized) = 480,189 Number of movies rated by them = 17,770 Training data is provided per movie To verify the developed model without submitting predictions to Netflix, probe.txt is provided To submit predictions for the competition, qualifying.txt is used

8 Background - Dataset Data in the training file is organized per movie. Each movie's block looks like this:
Movie#:
Customer#,Rating,Date of Rating
Customer#,Rating,Date of Rating
Customer#,Rating,Date of Rating
Example: the block for movie 4 lists three customers, with ratings of 3, 1, and 5.
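A minimal parsing sketch for this layout (Python; the function name and file path are hypothetical, and this is not the code used in the project):

```python
from collections import defaultdict

def load_training_file(path):
    """Parse a Netflix-style training file: a 'MovieID:' header line
    followed by 'CustomerID,Rating,Date' lines for that movie."""
    ratings = defaultdict(dict)          # ratings[movie_id][customer_id] = rating
    movie_id = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.endswith(":"):       # movie header line, e.g. "4:"
                movie_id = int(line[:-1])
            else:                        # rating line, e.g. "123,3,2005-09-06"
                customer_id, rating, _date = line.split(",")
                ratings[movie_id][int(customer_id)] = int(rating)
    return ratings
```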

9 Background - Dataset Data points in probe.txt look like this (we have the answers in the training data):
Movie#:
Customer#
Customer#
Data in qualifying.txt looks like this (no answers):
Movie#:
Customer#,Date of Rating
Customer#,Date of Rating

10 Background - Dataset stats Total ratings possible = 480,189 (users) * 17,770 (movies) ≈ 8.5 Billion Total available = 100 Million The Users x Movies matrix therefore has about 8.4 Billion entries missing: Sparse Data
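Spelling out the arithmetic behind these figures:

$$
480{,}189 \times 17{,}770 \approx 8.53 \times 10^{9}, \qquad
8.53 \times 10^{9} - 10^{8} \approx 8.43 \times 10^{9} \ \text{missing}, \qquad
\frac{10^{8}}{8.53 \times 10^{9}} \approx 1.2\% \ \text{filled}.
$$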

11 Background of the problem Recommender Systems Examples: Yahoo, Google, YouTube, Amazon. They recommend items that you might like. The recommendation is made based on past behavior. Collaborative Filtering [Gábor, 2009] What is it? Who collaborates and what is filtered? How can it be applied in this contest?

12 Background of the problem Earlier systems were implemented in the 1990s. GroupLens (Usenet articles) [Resnick, 1997] Siteseer (cross-linking technical papers) [Resnick, 1997] Tapestry (email filtering) [Goldberg, 1992] These earlier solutions required users to rate the items. Two major divisions of methods: Model based - fit a model to the training data. Memory based - Nearest Neighbor methods.

13 Background for the Solution K-Nearest Neighbor (K-NN) method. A memory-based method. Measures the distance between the query instance and every instance in the training set. Finds the K training instances with the least distance from the query instance. The prediction is the average of the ratings these K instances gave to the movie. Distances can be measured using the following formulae.

14 Background for the Solution Distance formulae.
Manhattan Distance: $d(x_i, x_j) = \sum_{f=1}^{F} \lvert x_{i,f} - x_{j,f} \rvert$
Euclidean Distance: $d(x_i, x_j) = \sqrt{\sum_{f=1}^{F} (x_{i,f} - x_{j,f})^2}$
Minkowski Distance: $d(x_i, x_j) = \left( \sum_{f=1}^{F} \lvert x_{i,f} - x_{j,f} \rvert^{p} \right)^{1/p}$
Mahalanobis Distance: $d(x_i, x_j) = \sqrt{(x_i - x_j)^{T} \Sigma^{-1} (x_i - x_j)}$
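As an illustration of these formulae, a small NumPy sketch (not part of the presentation's own code; the covariance matrix for the Mahalanobis case is passed in by the caller):

```python
import numpy as np

def manhattan(xi, xj):
    """Sum of absolute feature differences."""
    return np.sum(np.abs(xi - xj))

def euclidean(xi, xj):
    """Square root of the sum of squared feature differences."""
    return np.sqrt(np.sum((xi - xj) ** 2))

def minkowski(xi, xj, p=3):
    """General L_p distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return np.sum(np.abs(xi - xj) ** p) ** (1.0 / p)

def mahalanobis(xi, xj, cov):
    """Distance that accounts for feature scales and correlations
    via the inverse covariance matrix of the data."""
    diff = xi - xj
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
```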

15 Background for the Solution How important is the distance measure? Curse of Dimensionality. Example: what if we were to characterize a movie by its actors, directors, writers, genre, and then its entire CREW? What is the problem? What if some attributes are more dominant than others? Example: home prices are much larger quantities than a person's height.

16 Background of the Solution What if I was very conservative about my ratings and someone else was too generous? I rate the movie I like the most as 3 and the least as 1; someone else rates his or her highest at 5 and lowest at 3. So am I like this person? Difficult to say. We are comparing two people with very strong personal biases, which results in an obviously flawed similarity measure. Solution? Normalization of the data.

17 Background for the Solution Normalization What is that? How do we do it? How will it change my ratings? Won't I lose the original rating? We calculate the mean rating for every user over the movies he or she has rated, and also the standard deviation of that user's ratings. From every rating we subtract the user's mean rating and divide by their standard deviation.
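A minimal sketch of this per-user z-score normalization, assuming each user's ratings are held in a dict mapping movie to rating (illustrative, not the project code):

```python
import statistics

def normalize_user_ratings(user_ratings):
    """Z-score normalize one user's ratings: subtract the user's mean
    and divide by the user's standard deviation."""
    values = list(user_ratings.values())
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0   # guard against a zero standard deviation
    normalized = {m: (r - mean) / std for m, r in user_ratings.items()}
    return normalized, mean, std

def denormalize(prediction, mean, std):
    """Map a normalized prediction back to the user's original rating scale."""
    return prediction * std + mean
```

Keeping the per-user mean and standard deviation around is what later lets a normalized prediction be mapped back to the user's own 1-5 scale.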

18 Background for the Solution Should all members of the neighborhood contribute equally to the prediction? Not always; we can argue that people who are similar to you, i.e. have the least distance from you, should contribute more than those farther away. This is done by weighting each neighbor's contribution by its distance from the query instance.
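The slide does not fix a particular weighting formula; inverse distance is one common choice, sketched here for illustration:

```python
def weighted_prediction(neighbors):
    """neighbors: list of (distance, rating) pairs for the K nearest users.
    Closer neighbors get larger weights via w = 1 / (distance + eps)."""
    eps = 1e-6                      # avoid division by zero for identical users
    weights = [1.0 / (d + eps) for d, _ in neighbors]
    total = sum(weights)
    return sum(w * r for w, (_, r) in zip(weights, neighbors)) / total
```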

19 Background for the Solution Clustering The idea is to group items together based on their attributes. Data is typically unlabeled. Similarity is measured using the distance between two points. Example: consider going into a comic book shop and grouping together the comics from a pile that are similar. Types: Partitional clustering: K-Means Hierarchical clustering: Agglomerative clustering

20 Background for the Solution K-Means clustering [MacQueen, 1967] Randomly select K instances as cluster centers. Label every data point with its nearest cluster center. Re-compute the cluster centers. Repeat the last two steps until no instances change clusters or a fixed number of iterations has passed. How is it related to our discussion today?
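A compact NumPy sketch of exactly this loop (illustrative only):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """X: (n_samples, n_features) array. Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iters):
        # assign every point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                  # no instance changed clusters
        labels = new_labels
        # recompute each center as the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```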

21 K Nearest Neighbor Algorithm Clustering Based Nearest Neighbor Algorithm

22 Proposed Solution K-Nearest Neighbor approach (Overview) Given a query instance q(MovieId, UserId), normalize the data before processing. Find the distance of this instance from all the users who rated this movie. Of these users, select the K users that are nearest to the query instance as its neighborhood. Average the ratings given to this particular movie by the users in this neighborhood. This average is the predicted rating for the query instance.
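Putting the steps together, a rough sketch of this standard K-NN predictor. It assumes the normalized ratings and per-user (mean, std) values from the normalization sketch earlier, and computes the Euclidean distance over the movies both users rated, which is one plausible way to handle missing entries; it is not the project's C/C++ implementation:

```python
import math

def knn_predict(query_user, movie, normalized, stats, k=20):
    """Standard K-NN prediction sketch.
    normalized: user -> {movie: z-scored rating}
    stats: user -> (mean, std) saved from the normalization step."""
    q_vec = normalized[query_user]
    candidates = []
    for user, ratings in normalized.items():
        if user == query_user or movie not in ratings:
            continue                                  # only users who rated this movie
        common = set(q_vec) & set(ratings)            # movies both users rated
        if not common:
            continue
        dist = math.sqrt(sum((q_vec[m] - ratings[m]) ** 2 for m in common))
        candidates.append((dist, ratings[movie]))
    mean, std = stats[query_user]
    if not candidates:
        return mean                                   # fall back to the user's own mean
    neighbors = sorted(candidates)[:k]                # the K nearest users
    norm_pred = sum(r for _, r in neighbors) / len(neighbors)
    return norm_pred * std + mean                     # back to the user's rating scale
```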

23 Proposed Solution - Example Example (representative data, not real): a Users x Movies rating matrix for Jim, Sean, John, Sidd, Penny, and Pete over the movies Matrix, Star Wars, Dark Knight, Rocky, Sita Aur Gita, Star Trek, Cliffhanger, A.I., MI, and X-Men. Pete has rated a few of these movies (a 5 and two 4s), and his rating for Sita Aur Gita is the unknown '?' we want to predict.

24 Proposed Solution - Example Calculate the mean and standard deviation vectors: the meanRating and standardDeviation of each user's ratings, for Jim, Sean, John, Sidd, Penny, and Pete.

25 Proposed Solution - Example Normalized data: the same Users x Movies matrix after each rating has had the user's mean subtracted and been divided by the user's standard deviation (Pete's known ratings become values such as 1.15, and his rating for Sita Aur Gita is still the unknown '?').

26 Proposed Solution - Example So now we have a query instance q(Pete, Sita Aur Gita), i.e. we wish to evaluate how much Pete will like the movie Sita Aur Gita on a scale of 1-5. To do this we need to identify Pete's two neighbors who rated this movie (the 2-NN case). The users who rated the movie Sita Aur Gita are the candidate users: Jim, Sidd, and Penny.

27 Proposed Solution - Example The candidate users with their distances from Pete are Jim, Sidd, and Penny. The two nearest neighbors are Jim and Sidd.

28 Proposed Solution - Example We take the average of the normalized ratings that Jim and Sidd gave to the movie Sita Aur Gita. So is our prediction done? Not yet. This prediction is in normalized form, and we need to bring it back to Pete's rating scale. How? Multiply it by the standard deviation of Pete's ratings, then add Pete's mean rating to the product: predicted rating = (normalized average * Pete's standard deviation) + Pete's mean rating. The result is our predicted rating for Pete.

29 Proposed Solution C-K-NN Clustering based Nearest Neighbor approach Obtain every movie's genre from external sources (IMDB in our case). Create for every user a vector with one cell per genre; in that cell we count the number of movies the user has rated in that genre. (We have one such vector for each user.) Cluster the users according to the genres of the movies they have rated. The cluster centers of these clusters represent the collective opinion of the users in the cluster about movies of that genre. We call them Super Users.
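A sketch of building these per-user genre-count vectors, assuming a movie_genres lookup (movie to list of genres, e.g. obtained via IMDB); the names are illustrative:

```python
GENRES = ["Action", "Adventure", "Crime", "Drama",
          "Fantasy", "Sci-Fi", "Sport", "Thriller"]

def user_genre_vector(user_ratings, movie_genres):
    """user_ratings: {movie: rating} for one user.
    movie_genres: {movie: [genre, ...]} from an external source such as IMDB.
    Returns a count of rated movies per genre, in GENRES order."""
    counts = dict.fromkeys(GENRES, 0)
    for movie in user_ratings:
        for genre in movie_genres.get(movie, []):
            if genre in counts:
                counts[genre] += 1
    return [counts[g] for g in GENRES]
```

These vectors are what the K-Means step sketched earlier would cluster to obtain the super users.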

30 Proposed Solution C-K-NN For each super user we predict a rating for every movie of that genre as the average of the ratings of the users in the cluster who rated the movie. When presented with a query point q(MovieId, UserId), we find all the genres of that movie. For each genre we calculate the distance of the user from the cluster centers for that genre. We select the nearest K cluster centers and average their ratings for the movie to get the predicted rating for this genre. We then average the per-genre predictions to get the predicted rating for q.
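A rough sketch of this prediction step. It assumes super_users maps each genre to a list of (cluster center vector, {movie: average rating}) pairs built from the clusters, reuses Euclidean distance, and simply skips a genre when no super user of that genre has a rating for the movie; that is a simplification of the handling in the worked example, and it is not the Perl/Matlab pipeline used in the experiments:

```python
import math

def cknn_predict(user_vec, movie, movie_genres, super_users, k=1):
    """user_vec: the query user's genre-count vector.
    super_users: {genre: [(center_vector, {movie: rating}), ...]}."""
    per_genre = []
    for genre in movie_genres.get(movie, []):
        # distance of the user from every cluster center of this genre
        scored = []
        for center, ratings in super_users.get(genre, []):
            if movie not in ratings:
                continue
            dist = math.dist(user_vec, center)
            scored.append((dist, ratings[movie]))
        if not scored:
            continue                      # no super user of this genre rated the movie
        nearest = sorted(scored)[:k]      # K nearest super users
        per_genre.append(sum(r for _, r in nearest) / len(nearest))
    return sum(per_genre) / len(per_genre) if per_genre else None
```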

31 Proposed Solution Example (C-K-NN) We use the data from our previous example (recap): the same Users x Movies rating matrix, with Pete's rating for Sita Aur Gita still the unknown '?'.

32 Proposed Solution Example (C-K-NN) We find the genres of every movie, giving a Movies x Genres indicator table over Action, Adventure, Crime, Drama, Fantasy, Sci-Fi, Sport, and Thriller (for example, Rocky, Sita Aur Gita, and Star Trek each fall under two genres and X-Men under three).

33 Proposed Solution Example (C-K-NN) Convert the User x Movie data into a User x Genre table: for each of Jim, Sean, John, Sidd, Penny, and Pete we count the rated movies per genre (Action, Adventure, Crime, Drama, Fantasy, Sci-Fi, Sport, Thriller).

34 Proposed Solution Example (C-K-NN) We cluster the users into two clusters based on these genre-count vectors.

35 Proposed Solution Example (C-K-NN) The query point, as last time, is q(Pete, Sita Aur Gita). For each of Sita Aur Gita's genres (Adventure and Drama), the per-genre clusters are shown with the two cluster centers' ratings for the movies of that genre (Matrix, Star Wars, Sita Aur Gita, Star Trek, Cliffhanger, A.I., and MI appear among them).

36 Proposed Solution Example (C-K-NN) We compute Pete's distance from the two cluster centers of the Adventure genre. Pete's distance from the cluster centers of Drama is not applicable, as Pete has not rated any movie from that genre. We look for the single (K=1) nearest cluster for the Adventure genre: that is cluster two.

37 Proposed Solution Example (C-K-NN) Hence, the rating for the query point q(Pete, Sita Aur Gita) is calculated by taking the rating that cluster two of the Adventure genre gives this movie. Our prediction is 2 for this movie. What if Pete had rated one of the movies from the Drama genre? We would predict a rating for the Drama genre as well, then average the two per-genre predictions to get the final rating.

38

39 Experiments Setup Dataset used: Netflix Prize Dataset. Experiments were performed on 1121 randomly selected movies, covering a corresponding set of users. These data instances were chosen from the probe file of the Netflix dataset, so we have the ratings for these instances in the training data. These instances are treated as the hold-out set in the experiments.

40 Experiments Setup We normalize the data for the K-NN method; the predictions are then converted back to the denormalized form. We test the same set of (movie, user) pairs on both methods: Standard K-Nearest Neighbor Clustered K-Nearest Neighbor

41 Experiments - Setup This is a regression problem, so when we are off from the expected value we want to know by how much. Hence, the test metrics used are Root Mean Square Error (RMSE): $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(p_i - a_i)^2}$, Absolute Average Error (AAE): $\frac{1}{N}\sum_{i=1}^{N}\lvert p_i - a_i \rvert$ (with $p_i$ the predicted and $a_i$ the actual rating), and the time taken.
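Both metrics are simple to compute from paired predicted and actual ratings; a short illustrative sketch:

```python
import math

def rmse(predicted, actual):
    """Root Mean Square Error over paired prediction/actual lists."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def aae(predicted, actual):
    """Absolute Average Error (mean absolute error)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Example: rmse([3.2, 4.1], [3, 5]) ≈ 0.65 and aae([3.2, 4.1], [3, 5]) = 0.55
```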

42 Experiments - Implementation K-NN Implemented in C/C++; classes were converted to structs. It is difficult to manage the massive dataset in memory, and the size of the program makes it hard to run in C++. Comparing against every user needs a lot of fine tuning of the code to achieve reasonable performance, which is K-NN's inherent problem. Ease of implementation vs. speed is an important trade-off: using maps and vectors only adds storage overhead, and whatever speed they gain is negated by it.

43 Experiments - Implementation C-K-NN Implemented using Perl, Matlab, Python, and MySQL. Perl's hashes of hashes came to the rescue; its ease of token and string processing was most helpful, and the complex logic was easy to express in Perl (regexes help). Python interfaces with IMDB (IMDbPY), and MySQL holds a local copy of the IMDB database. Matlab does the clustering (K-Means). Fine tuning of the algorithm and ample available memory negate the slow, interpreted nature of these languages.

44 Experiments - Results Results on the described dataset: a table comparing the methods on Absolute Average Error, Root Mean Square Error, and time in minutes, for K-NN, C-K-NN, the Netflix leaderboard topper (time not applicable), and the current Netflix system (time not applicable).

45 Experiments - Results Charts: comparison of RMSE and Absolute Average Error for K-NN, C-K-NN, Netflix (current topper), and Netflix (current system), and of the time taken in minutes by K-NN and C-K-NN.

46 Experiments - Results Charts: distribution of the absolute error, showing the number of movies at each error level for the standard K-NN method and for the C-K-NN method.

47

48 Related Work Methods already applied to this problem include Matrix Factorization Methods Regularized Singular Value Decomposition [Paterek, 2007][Webb, 2007] Biases with Regularized SVD [Paterek, 2007] Probabilistic Latent Semantic Analysis (pLSA) [Hofmann, 2004] Nearest Neighbor Methods [Bell and Koren, 2007] Alternating Least Squares [Bell and Koren, 2007] Post-processing of SVD features [Paterek, 2007]

49

50 Future Work K-NN method Different values of K could be experimented with Distributed processing of this problem Distance-weighting the contributions from neighbors C-K-NN Trying different numbers of clusters The dates provided with the ratings could be used in clustering along with genre More information from IMDB or other sources might be included Clustering the movies first and then predicting ratings for users is also possible

51

52 Conclusions We presented results of two methods for the Netflix Prize problem, including a novel clustering-based method The first method, a standard K-Nearest Neighbor method, achieves a lower RMSE value but is very slow at prediction, a consequence of comparing against every user who rated the movie The second method clusters the users based on the genres of the movies they rated and creates super users from these clusters

53 Conclusions The standard K-NN method performs slightly better than the clustering-based method on the Root Mean Square Error metric but is extremely slow Our clustering-based method has a higher Root Mean Square Error than the standard K-NN method but is extremely fast and practical for large-scale implementations It also shows promise of being accurate for many predictions

54

55 Atul S Kulkarni kulka053@d.umn.edu
