By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth. Under The Guidance of Dr. Richard Maclin
2 Outline: Problem Statement, Background, Proposed Solution, Experiments & Results, Related Work, Future Work, Conclusion, Q & A
4 Problem Statement: Given a set of users with their previous ratings for a set of movies, can we predict the rating they will assign to a movie they have not previously rated? Netflix puts it as: "The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win one (or more) Prizes. Winning the Netflix Prize improves our ability to connect people to the movies they love." So what do they want? A 10% improvement over their existing system, for which they are paying $1 million.
5 Problem Statement: Similarly, which movie will you like, given that you have seen X-Men, X-Men II and X-Men: The Last Stand, and users who saw these movies also liked X-Men Origins: Wolverine? Answer: ?
6 Background: Dataset, Background for the Problem, Background for the Solution
7 Background - Dataset: Netflix released the Netflix Prize dataset for this competition. It contains nearly 100 million ratings from 480,189 anonymous users over 17,770 movies. Training data is provided per movie. probe.txt is provided to verify the model developed without submitting predictions to Netflix; qualifying.txt is used to submit predictions for the competition.
8 Background - Dataset: Data in the training file is grouped per movie, in the form
Movie#:
Customer#,Rating,DateOfRating
Customer#,Rating,DateOfRating
Example for movie 4: three customers with ratings 3, 1 and 5.
9 Background - Dataset: Data points in probe.txt (answers known) look like
Movie#:
Customer#
Customer#
Data in qualifying.txt (answers unknown) looks like
Movie#:
Customer#,DateOfRating
Customer#,DateOfRating
10 Background - Dataset stats: Total ratings possible = 480,189 users × 17,770 movies ≈ 8.5 billion; total available = 100 million. The user × movie matrix therefore has roughly 8.4 billion entries missing: sparse data.
11 Background of the Problem - Recommender Systems: Examples: Yahoo!, Google, YouTube, Amazon. They recommend items that you might like, based on your past behavior. Collaborative Filtering [Gábor, 2009]: What is it? Who collaborates, and what is filtered? How can it be applied in this contest?
12 Background of the Problem: Earlier systems were implemented in the 1990s: GroupLens (Usenet articles) [Resnick, 1997], Siteseer (cross-linking technical papers) [Resnick, 1997], Tapestry (e-mail filtering) [Goldberg, 1992]. These earlier solutions required users to rate the items. Two major divisions of methods: model-based (fit a model to the training data) and memory-based (nearest neighbor methods).
13 Background for the Solution - K-Nearest Neighbor (K-NN), a memory-based method: Measure the distance between the query instance and every instance in the training set. Find the K training instances with the least distance from the query instance. Average those K instances' ratings for this movie. Distances can be measured using the following formulae.
14 Background for the Solution - Distance formulae:
Manhattan distance: $d(x_i, x_j) = \sum_f |x_{i,f} - x_{j,f}|$
Euclidean distance: $d(x_i, x_j) = \sqrt{\sum_f (x_{i,f} - x_{j,f})^2}$
Minkowski distance: $d(x_i, x_j) = \left( \sum_f |x_{i,f} - x_{j,f}|^p \right)^{1/p}$
Mahalanobis distance: $d(x_i, x_j) = \sqrt{(x_i - x_j)^T S^{-1} (x_i - x_j)}$, where $S$ is the covariance matrix.
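As a sketch, the first three of these measures can be written directly over dense feature vectors (Mahalanobis additionally needs the inverse covariance matrix and is omitted here; function names are illustrative):

```python
import math

def manhattan(x, y):
    # Sum of absolute per-feature differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    # Square root of the sum of squared per-feature differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def minkowski(x, y, p):
    # Generalizes Manhattan (p = 1) and Euclidean (p = 2).
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
```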
15 Background for the Solution: How important is the distance measure? Curse of dimensionality. Example: what if we were to characterize a movie by its actors, directors, writers, genre, and then all of its crew? What is the problem? What if some attributes dominate others? Example: home prices are much larger quantities than a person's height.
16 Background of the Solution: What if I am very conservative about my ratings and someone else is too generous? I rate the movie I like most as 3 and the one I like least as 1; someone else rates their highest at 5 and their lowest at 3. So am I like this person? Difficult to say: we are comparing two people with very different personal biases, which results in an obviously flawed similarity measure. Solution? Normalization of the data.
17 Background for the Solution - Normalization: What is that? How do we do it? How will it change my ratings? Won't I lose the original rating? We calculate the mean rating for every user over the movies he/she has rated, and also the standard deviation of that user's ratings. From every rating we subtract the user's mean rating and divide by their standard deviation.
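A minimal sketch of this per-user z-score normalization (assuming each user has rated at least two movies, so the standard deviation is defined):

```python
from statistics import mean, stdev

def normalize_user(ratings):
    # ratings: one user's raw ratings on the 1-5 scale.
    # Subtract the user's mean rating, then divide by their standard deviation.
    mu = mean(ratings)
    sigma = stdev(ratings)
    return [(r - mu) / sigma for r in ratings]
```

The original scale is not lost: keeping each user's mean and standard deviation lets us invert the transformation later.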
18 Background for the Solution: Should all members of the neighborhood contribute equally to the prediction? Not always; we can argue that people who are most similar to you, i.e. at the least distance from you, should contribute more than farther ones. This is done by weighting each contribution by the instance's distance from the query instance.
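One common way to realize this is inverse-distance weighting (a sketch; the epsilon is an assumption of mine to guard against a zero distance):

```python
def weighted_prediction(neighbors, eps=1e-6):
    # neighbors: list of (distance, rating) pairs for the K nearest users.
    # Closer neighbors get larger weights via the inverse of their distance.
    weights = [1.0 / (d + eps) for d, _ in neighbors]
    ratings = [r for _, r in neighbors]
    return sum(w * r for w, r in zip(weights, ratings)) / sum(weights)
```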
19 Background for the Solution - Clustering: The idea is to group items together based on their attributes. The data is typically unlabeled; similarity is measured using the distance between two points. Example: going into a comic book shop and, from a pile of comics, putting together the ones that are similar. Types: partitional clustering (K-Means) and hierarchical clustering (agglomerative clustering).
20 Background for the Solution - K-Means clustering [MacQueen, 1967]: Randomly select K instances as cluster centers. Label every data point with its nearest cluster center. Re-compute the cluster centers. Repeat the last two steps until no instances change clusters or a set number of iterations has passed. How is it related to our discussion today?
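The steps above can be sketched compactly on tuples of coordinates (illustrative only; the project itself used Matlab's K-Means):

```python
import random

def kmeans(points, k, max_iters=100):
    # 1. Randomly select k instances as the initial cluster centers.
    centers = random.sample(points, k)
    for _ in range(max_iters):
        # 2. Label every point with its nearest cluster center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        # 3. Re-compute each center as the mean of its cluster.
        new_centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                       for i, cl in enumerate(clusters)]
        # 4. Stop once no center moves (no instance changed cluster).
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```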
21 Algorithms: K-Nearest Neighbor Algorithm; Clustering-Based Nearest Neighbor Algorithm
22 Proposed Solution - K-Nearest Neighbor approach (overview): Given a query instance q(MovieId, UserId), normalize the data before processing. Find the distance of this instance to every user who rated this movie. Of these users, select the K nearest to the query instance as its neighborhood. Average the ratings of the users from this neighborhood for this particular movie; this is the predicted rating for the query instance.
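These steps can be sketched as follows (assuming the normalized ratings fit in a nested dict; the Euclidean distance is computed over the movies both users have rated):

```python
import math

def predict_normalized(query_user, query_movie, norm, k):
    # norm[user][movie] -> that user's normalized rating of the movie.
    # Candidates: every other user who rated the query movie.
    candidates = [u for u in norm if u != query_user and query_movie in norm[u]]

    def distance(u):
        # Euclidean distance over the movies rated by both users.
        common = norm[query_user].keys() & norm[u].keys()
        if not common:
            return float("inf")
        return math.sqrt(sum((norm[query_user][m] - norm[u][m]) ** 2
                             for m in common))

    # Keep the K nearest users and average their normalized ratings.
    neighborhood = sorted(candidates, key=distance)[:k]
    return sum(norm[u][query_movie] for u in neighborhood) / len(neighborhood)
```

The returned value is still in normalized form and must be converted back to the user's own rating scale, as the example below shows.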
23 Proposed Solution - Example (representative data, not real): a user × movie rating matrix over the movies Matrix, Star Wars, Dark Knight, Rocky, Sita Aur Gita, Star Trek, Cliffhanger, A.I., MI and X-Men, and the users Jim, Sean, John, Sidd, Penny and Pete. Pete has rated some movies (5, 4, 4), but his rating for Sita Aur Gita is the unknown "?".
24 Proposed Solution - Example: calculate the mean-rating and standard-deviation vectors for each user (Jim, Sean, John, Sidd, Penny, Pete).
25 Proposed Solution - Example: the normalized data matrix over the same movies and users; Pete's entry for Sita Aur Gita remains the unknown "?".
26 Proposed Solution - Example: So now we have a query instance q(Pete, Sita Aur Gita), i.e. we wish to evaluate how much Pete will like the movie Sita Aur Gita on a scale of 1-5. To do this we need to identify Pete's two neighbors who rated this movie (2-NN case). The candidate users who rated Sita Aur Gita are Jim, Sidd and Penny.
27 Proposed Solution - Example: Computing the distances of Jim, Sidd and Penny from Pete, the two nearest neighbors are Jim and Sidd.
28 Proposed Solution - Example: Average the ratings by Jim and Sidd for Sita Aur Gita. So is this our prediction? Not yet: this prediction is in normalized form, and we need to bring it back to Pete's rating scale. How? Multiply by the standard deviation of Pete's ratings, then add Pete's mean rating to the product; the result is the predicted rating for Pete.
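The back-conversion is just the inverse of the z-score normalization (a sketch):

```python
def denormalize(normalized_prediction, user_mean, user_stdev):
    # Invert the normalization: scale by the user's standard deviation,
    # then shift by the user's mean rating.
    return normalized_prediction * user_stdev + user_mean
```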
29 Proposed Solution - C-K-NN (Clustering-based Nearest Neighbor approach): Obtain every movie's genres from an external source (IMDb in our case). Create for every user a vector with one cell per genre; in each cell, count the number of movies the user has rated in that genre (one such vector per user). Cluster the users according to the genres of the movies they have rated. The cluster centers of these clusters represent the collective opinion of the users in each cluster about movies of that particular genre. We call them super users.
30 Proposed Solution - C-K-NN: For each super user, we predict the rating of every movie of that genre as the average of the ratings of the cluster's users who rated the movie. When presented with a query point q(MovieId, UserId), we find all genres for that movie. For each genre we calculate the distance of the user from the cluster centers for that genre, select the nearest K cluster centers, and average their ratings for the movie to predict the rating for this genre. We then average the per-genre predicted ratings to get the predicted rating for q.
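A sketch of this per-genre prediction step (all names are illustrative assumptions; `math.dist` gives the Euclidean distance between the user's genre vector and a cluster center):

```python
import math

def predict_ckknn(user_vector, movie_genres, centers, center_rating, k=1):
    # user_vector: the query user's genre-count vector (list of floats).
    # movie_genres: the genres of the query movie.
    # centers[g]: cluster-center vectors ("super users") for genre g.
    # center_rating[g][i]: super user i's predicted rating for the query movie.
    per_genre = []
    for g in movie_genres:
        if g not in centers:
            continue  # genre not applicable, e.g. the user rated nothing in it
        order = sorted(range(len(centers[g])),
                       key=lambda i: math.dist(user_vector, centers[g][i]))
        nearest = order[:k]
        per_genre.append(sum(center_rating[g][i] for i in nearest) / len(nearest))
    if not per_genre:
        return None  # no applicable genre for this user
    # Final prediction: average of the per-genre predictions.
    return sum(per_genre) / len(per_genre)
```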
31 Proposed Solution - Example (C-K-NN): We reuse the data from the previous example: the same user × movie rating matrix, with Pete's rating of Sita Aur Gita unknown.
32 Proposed Solution - Example (C-K-NN): We find the genres for every movie, giving a movie × genre indicator matrix over the genres Action, Adventure, Crime, Drama, Fantasy, Sci-Fi, Sport and Thriller.
33 Proposed Solution - Example (C-K-NN): Convert the user × movie data to a user × genre count matrix (users Jim, Sean, John, Sidd, Penny, Pete; genres Action through Thriller).
34 Proposed Solution - Example (C-K-NN): We cluster the users into two clusters over their genre vectors.
35 Proposed Solution - Example (C-K-NN): The query point, as last time, is q(Pete, Sita Aur Gita). The per-genre cluster centers for the genres of Sita Aur Gita (Adventure and Drama) carry the super users' ratings for the movies of those genres.
36 Proposed Solution - Example (C-K-NN): Compute the distance of Pete from the cluster centers of Adventure. The distance from the cluster centers of Drama is not applicable, as Pete has not rated any movie from that genre. We find the one (K=1) nearest cluster for the Adventure genre: that is cluster two.
37 Proposed Solution - Example (C-K-NN): Hence the rating for the query point q(Pete, Sita Aur Gita) is calculated by taking the rating of cluster two of the Adventure genre; our prediction is 2 for this movie. What if Pete had rated a movie from the Drama genre? We would predict a rating for the Drama genre as well, then average the predicted ratings for the two genres to get the final rating.
39 Experiments - Setup: Dataset used: the Netflix Prize dataset. Experiments were performed on 1,121 randomly selected movies and the users who rated them. These data instances were chosen from the probe file of the Netflix dataset; we have the ratings for these instances in the training data, so they are treated as a hold-out set in the experiments.
40 Experiments - Setup: We normalize the data for the K-NN method; the resulting predictions are then converted back to denormalized form. We test the same set of (movie, user) pairs on both methods: standard K-Nearest Neighbor and Clustered K-Nearest Neighbor.
41 Experiments - Setup: This is a regression problem, so when we are off the expected value we want to know how far off. The test metrics used are Root Mean Square Error, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(p_i - r_i)^2}$, and Absolute Average Error, $\mathrm{AAE} = \frac{1}{N}\sum_{i=1}^{N}|p_i - r_i|$, where $p_i$ is the predicted and $r_i$ the actual rating; we also measure the time taken.
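Both metrics are straightforward to compute over paired prediction/actual lists (a sketch):

```python
import math

def rmse(predicted, actual):
    # Root Mean Square Error: squaring penalizes large errors more heavily.
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def aae(predicted, actual):
    # Absolute Average Error: the mean of the absolute differences.
    n = len(predicted)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n
```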
42 Experiments - Implementation (K-NN): Implemented in C/C++; classes were converted to structures. It is difficult to manage the massive dataset in memory, and the size of the program makes it difficult to run in C++. Comparing against every user needs a lot of fine-tuning of the code to achieve reasonable performance, which is K-NN's inherent problem; ease of implementation vs. speed is an important trade-off. Using maps and vectors only adds storage, and the speed they add is negated by this.
43 Experiments - Implementation (C-K-NN): Implemented using Perl, Matlab, Python and MySQL. Perl's hashes of hashes came to the rescue: its ease of token/string processing was most helpful, and the complex logic was easy to express in Perl (regex help). Python interfaces with IMDb (IMDbPY); MySQL holds a local copy of the IMDb database; Matlab does the clustering (K-Means). Fine-tuning of the algorithm and ample available memory negate the slow, interpreted nature of these languages.
44 Experiments - Results on the described dataset: a table comparing K-NN and C-K-NN on Absolute Average Error, Root Mean Square Error and time (minutes), alongside the RMSE of the Netflix leaderboard topper and of Netflix's current system (for which AAE and time are not available).
45 Experiments - Results: charts comparing RMSE and Absolute Average Error for K-NN, C-K-NN, the Netflix leaderboard topper and Netflix's current system, and the time taken in minutes for K-NN and C-K-NN.
46 Experiments - Results: distribution of the absolute error, as the number of movies at each error level, for the standard K-NN and C-K-NN methods.
48 Related Work: Methods already applied to this problem include matrix factorization methods: Regularized Singular Value Decomposition [Paterek, 2007][Webb, 2007], biases with regularized SVD [Paterek, 2007], and probabilistic Latent Semantic Analysis (pLSA) [Hofmann, 2004]; nearest neighbor methods [Bell and Koren, 2007]; alternating least squares [Bell and Koren, 2007]; and post-processing of SVD features [Paterek, 2007].
50 Future Work: For the K-NN method, different values of K could be experimented with, the problem could be processed in a distributed fashion, and neighbors' contributions could be distance-weighted. For C-K-NN, we could try different numbers of clusters, use the dates provided with the ratings in clustering along with genre, and include more information from IMDb or other sources. Clustering the movies and then predicting ratings for users is also possible.
52 Conclusions: We presented the results of two methods for solving the Netflix Prize problem, including a novel clustering-based method. The first method, a standard K-Nearest Neighbor method, achieves a lower RMSE but is very slow at prediction, a consequence of comparing against every user who rated the movie. The second method clusters the users based on the genres of the movies they rated and creates super users from those clusters.
53 Conclusions: The standard K-NN method performs slightly better than the clustering-based method on the Root Mean Square Error metric, but is extremely slow. Our clustering-based method has a higher RMSE than standard K-NN but is extremely fast and practical for large-scale implementations; it also shows promise of being accurate for many predictions.
55 Atul S Kulkarni kulka053@d.umn.edu
More informationCSE 454 Final Report TasteCliq
CSE 454 Final Report TasteCliq Samrach Nouv, Andrew Hau, Soheil Danesh, and John-Paul Simonis Goals Your goals for the project Create an online service which allows people to discover new media based on
More informationRecommender Systems: User Experience and System Issues
Recommender Systems: User Experience and System ssues Joseph A. Konstan University of Minnesota konstan@cs.umn.edu http://www.grouplens.org Summer 2005 1 About me Professor of Computer Science & Engineering,
More informationRecommender Systems. Collaborative Filtering & Content-Based Recommending
Recommender Systems Collaborative Filtering & Content-Based Recommending 1 Recommender Systems Systems for recommending items (e.g. books, movies, CD s, web pages, newsgroup messages) to users based on
More informationNear Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri
Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri Scene Completion Problem The Bare Data Approach High Dimensional Data Many real-world problems Web Search and Text Mining Billions
More informationUsing Social Networks to Improve Movie Rating Predictions
Introduction Using Social Networks to Improve Movie Rating Predictions Suhaas Prasad Recommender systems based on collaborative filtering techniques have become a large area of interest ever since the
More informationData Mining Concepts & Tasks
Data Mining Concepts & Tasks Duen Horng (Polo) Chau Georgia Tech CSE6242 / CX4242 Sept 9, 2014 Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos Last Time
More informationDocument Information
Horizon 2020 Framework Programme Grant Agreement: 732328 FashionBrain Document Information Deliverable number: D5.3 Deliverable title: Early Demo for Trend Prediction Deliverable description: This early
More informationRecommender Systems 6CCS3WSN-7CCSMWAL
Recommender Systems 6CCS3WSN-7CCSMWAL http://insidebigdata.com/wp-content/uploads/2014/06/humorrecommender.jpg Some basic methods of recommendation Recommend popular items Collaborative Filtering Item-to-Item:
More informationPart 12: Advanced Topics in Collaborative Filtering. Francesco Ricci
Part 12: Advanced Topics in Collaborative Filtering Francesco Ricci Content Generating recommendations in CF using frequency of ratings Role of neighborhood size Comparison of CF with association rules
More informationInternational Journal of Advance Engineering and Research Development. A Facebook Profile Based TV Shows and Movies Recommendation System
Scientific Journal of Impact Factor (SJIF): 4.72 International Journal of Advance Engineering and Research Development Volume 4, Issue 3, March -2017 A Facebook Profile Based TV Shows and Movies Recommendation
More information5/13/2009. Introduction. Introduction. Introduction. Introduction. Introduction
Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007) Two types of technologies are widely used to overcome
More informationCPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017
CPSC 340: Machine Learning and Data Mining Recommender Systems Fall 2017 Assignment 4: Admin Due tonight, 1 late day for Monday, 2 late days for Wednesday. Assignment 5: Posted, due Monday of last week
More informationChapter 2 Basic Structure of High-Dimensional Spaces
Chapter 2 Basic Structure of High-Dimensional Spaces Data is naturally represented geometrically by associating each record with a point in the space spanned by the attributes. This idea, although simple,
More informationGlobal Journal of Engineering Science and Research Management
A NOVEL HYBRID APPROACH FOR PREDICTION OF MISSING VALUES IN NUMERIC DATASET V.B.Kamble* 1, S.N.Deshmukh 2 * 1 Department of Computer Science and Engineering, P.E.S. College of Engineering, Aurangabad.
More informationVoronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013
Voronoi Region K-means method for Signal Compression: Vector Quantization Blocks of signals: A sequence of audio. A block of image pixels. Formally: vector example: (0.2, 0.3, 0.5, 0.1) A vector quantizer
More informationClustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York
Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity
More informationCSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies.
CSE 547: Machine Learning for Big Data Spring 2019 Problem Set 2 Please read the homework submission policies. 1 Principal Component Analysis and Reconstruction (25 points) Let s do PCA and reconstruct
More informationEntry Name: "INRIA-Perin-MC1" VAST 2013 Challenge Mini-Challenge 1: Box Office VAST
Entry Name: "INRIA-Perin-MC1" VAST 2013 Challenge Mini-Challenge 1: Box Office VAST Team Members: Charles Perin, INRIA, Univ. Paris-Sud, CNRS-LIMSI, charles.perin@inria.fr PRIMARY Student Team: YES Analytic
More informationNon-negative Matrix Factorization for Multimodal Image Retrieval
Non-negative Matrix Factorization for Multimodal Image Retrieval Fabio A. González PhD Machine Learning 2015-II Universidad Nacional de Colombia F. González NMF for MM IR ML 2015-II 1 / 54 Outline 1 The
More informationNon-negative Matrix Factorization for Multimodal Image Retrieval
Non-negative Matrix Factorization for Multimodal Image Retrieval Fabio A. González PhD Bioingenium Research Group Computer Systems and Industrial Engineering Department Universidad Nacional de Colombia
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Recommender Systems II Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Recommender Systems Recommendation via Information Network Analysis Hybrid Collaborative Filtering
More informationA Spectral-based Clustering Algorithm for Categorical Data Using Data Summaries (SCCADDS)
A Spectral-based Clustering Algorithm for Categorical Data Using Data Summaries (SCCADDS) Eman Abdu eha90@aol.com Graduate Center The City University of New York Douglas Salane dsalane@jjay.cuny.edu Center
More informationNearest Neighbor Classification. Machine Learning Fall 2017
Nearest Neighbor Classification Machine Learning Fall 2017 1 This lecture K-nearest neighbor classification The basic algorithm Different distance measures Some practical aspects Voronoi Diagrams and Decision
More informationMachine Learning and Data Mining. Collaborative Filtering & Recommender Systems. Kalev Kask
Machine Learning and Data Mining Collaborative Filtering & Recommender Systems Kalev Kask Recommender systems Automated recommendations Inputs User information Situation context, demographics, preferences,
More informationMovie Recommender System - Hybrid Filtering Approach
Chapter 7 Movie Recommender System - Hybrid Filtering Approach Recommender System can be built using approaches like: (i) Collaborative Filtering (ii) Content Based Filtering and (iii) Hybrid Filtering.
More informationScope of Recommenders. Recommender Systems: User Experience and System Issues. Wide Range of Algorithms. About me. Classic Collaborative Filtering
Recommender Systems: User Experience and System ssues Joseph A. Konstan University of Minnesota konstan@cs.umn.edu http://www.grouplens.org Scope of Recommenders Purely Editorial Recommenders Content Filtering
More informationLecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017
Lecture 27: Review Reading: All chapters in ISLR. STATS 202: Data mining and analysis December 6, 2017 1 / 16 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last
More informationOverview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer
Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What
More informationCSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction
CSE 255 Lecture 5 Data Mining and Predictive Analytics Dimensionality Reduction Course outline Week 4: I ll cover homework 1, and get started on Recommender Systems Week 5: I ll cover homework 2 (at the
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationCluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010
Cluster Analysis Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX 7575 April 008 April 010 Cluster Analysis, sometimes called data segmentation or customer segmentation,
More informationMatrix-Vector Multiplication by MapReduce. From Rajaraman / Ullman- Ch.2 Part 1
Matrix-Vector Multiplication by MapReduce From Rajaraman / Ullman- Ch.2 Part 1 Google implementation of MapReduce created to execute very large matrix-vector multiplications When ranking of Web pages that
More information