COMP 465: Data Mining Recommender Systems

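Slides adapted from: (Mining Massive Datasets)

Evaluating predictions

The ratings matrix (users x movies) is split into a training data set and a test data set T whose ratings are withheld (shown as "?"). Compare predictions with the known ratings in T using root-mean-square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(x,i) \in T} \left(\hat{r}_{xi} - r_{xi}\right)^2}, \qquad N = |T|$$

where $\hat{r}_{xi}$ is the predicted rating and $r_{xi}$ is the actual rating of user x on item i.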
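The RMSE definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `rmse`, the dictionary layout keyed by (user, item), and the two sample ratings are all assumptions made for the example.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error over a test set T.

    Both arguments map (user, item) pairs to ratings. Only the pairs in
    `actual` (the withheld test ratings) are scored, so N = len(actual).
    """
    if not actual:
        raise ValueError("test set is empty")
    squared_error = sum((predicted[key] - r) ** 2 for key, r in actual.items())
    return math.sqrt(squared_error / len(actual))


# Hypothetical test set: two withheld ratings and the matching predictions.
actual = {("joe", "signs"): 2.0, ("ann", "amadeus"): 5.0}
predicted = {("joe", "signs"): 3.0, ("ann", "amadeus"): 4.0}

print(rmse(predicted, actual))  # both errors are exactly 1 star, so this prints 1.0
```

Because the errors are squared before averaging, one large miss raises RMSE more than several small ones.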

Problems with error measures

Narrow focus on accuracy sometimes misses the point:
- Prediction diversity
- Prediction context
- Order of predictions

In practice, we care only to predict high ratings: RMSE might penalize a method that does well for high ratings and badly for others.
Alternative: precision at top k, the percentage of predictions in the user's top k withheld ratings.

The Netflix Prize

Training data:
- 100 million ratings, 480,000 users, 17,770 movies
- 6 years of data: 2000-2005
Test data:
- Last few ratings of each user (2.8 million)
- Evaluation criterion: root-mean-square error,

$$\mathrm{RMSE} = \sqrt{\frac{1}{|R|} \sum_{(i,x) \in R} \left(\hat{r}_{xi} - r_{xi}\right)^2}$$

- Netflix's system RMSE: 0.9514
Competition:
- 2,700+ teams
- $1 million prize for a 10% improvement on Netflix

Matrix R: 17,770 movies by 480,000 users.
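Precision at top k, mentioned above as an alternative to RMSE, can be sketched as follows. The slides give only the one-line definition, so this reading is an assumption: compare the k items with the highest predicted ratings against the k items the user actually rated highest among the withheld ratings. The function name, the sample data, and the handling of ties (left to sort order) are hypothetical.

```python
def precision_at_k(predicted, actual, k):
    """Fraction of the k highest-predicted items that are also among
    the user's k highest actual (withheld) ratings.

    Both arguments map item -> rating for a single user.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_predicted = sorted(predicted, key=predicted.get, reverse=True)[:k]
    top_actual = set(sorted(actual, key=actual.get, reverse=True)[:k])
    hits = sum(1 for item in top_predicted if item in top_actual)
    return hits / k


# Hypothetical withheld ratings for one user and the model's predictions.
actual = {"a": 5.0, "b": 4.0, "c": 2.0, "d": 1.0}
predicted = {"a": 4.5, "b": 2.0, "c": 4.0, "d": 1.0}

# Top-2 predicted: a, c. Top-2 actual: a, b. One hit out of two.
print(precision_at_k(predicted, actual, k=2))  # 0.5
```

Unlike RMSE, this score ignores how well low ratings are predicted, which matches the point that only the high end of the ranking matters in practice.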

Matrix R (17,770 movies by 480,000 users) is split into a training data set and a test data set whose entries are withheld (shown as "?"). Predictions are scored on the test entries:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|R|} \sum_{(i,x) \in R} \left(\hat{r}_{xi} - r_{xi}\right)^2}$$

where $\hat{r}_{xi}$ is the predicted rating and $r_{xi}$ is the true rating of user x on item i.

The winner of the Netflix Challenge!

Multi-scale modeling of the data: combine top-level, "regional" modeling of the data with a refined, local view:
- Global: overall deviations of users/movies
- Factorization: addressing "regional" effects
- Collaborative filtering: extract local patterns

Modeling local and global effects

Global:
- Mean movie rating: 3.7 stars
- The Sixth Sense is 0.5 stars above avg.
- Joe rates 0.2 stars below avg.
- Baseline estimation: Joe will rate The Sixth Sense 4 stars
Local neighborhood (CF/NN):
- Joe didn't like related movie Signs
- Final estimate: Joe will rate The Sixth Sense 3.8 stars
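The baseline step in the Sixth Sense example is plain addition, sketched below. The function name is hypothetical, and the numbers are illustrative assumptions: a mean of 3.7 stars, a movie offset of +0.5, and a user offset of -0.2.

```python
def baseline_estimate(global_mean, user_offset, item_offset):
    """Baseline rating: global mean plus the user's and the item's deviation."""
    return global_mean + user_offset + item_offset


# Illustrative values (assumed): mean rating 3.7, The Sixth Sense is
# 0.5 stars above average, Joe rates 0.2 stars below average.
estimate = baseline_estimate(3.7, user_offset=-0.2, item_offset=0.5)
print(round(estimate, 2))
```

The local neighborhood step then moves this baseline up or down according to how the user rated related movies.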

Collaborative filtering (item-item)

Earliest and most popular collaborative filtering method. Derive unknown ratings from those of "similar" movies (item-item variant):
- Define similarity measure $s_{ij}$ of items i and j
- Select k nearest neighbors, compute the rating
- N(i; x): items most similar to i that were rated by x

$$\hat{r}_{xi} = \frac{\sum_{j \in N(i;x)} s_{ij} \, r_{xj}}{\sum_{j \in N(i;x)} s_{ij}}$$

where $s_{ij}$ is the similarity of items i and j, $r_{xj}$ is the rating of user x on item j, and N(i; x) is the set of items similar to item i that were rated by x.

In practice we get better estimates if we model deviations:

$$\hat{r}_{xi} = b_{xi} + \frac{\sum_{j \in N(i;x)} s_{ij} \, (r_{xj} - b_{xj})}{\sum_{j \in N(i;x)} s_{ij}}$$

where $b_{xi} = \mu + b_x + b_i$ is the baseline estimate for $r_{xi}$:
- $\mu$ = overall mean rating
- $b_x$ = rating deviation of user x = (avg. rating of user x) - $\mu$
- $b_i$ = (avg. rating of movie i) - $\mu$

Problems/Issues:
1) Similarity measures are "arbitrary"
2) Pairwise similarities neglect interdependencies among users
3) Taking a weighted average can be restricting
Solution: instead of $s_{ij}$, use $w_{ij}$ that we estimate directly from data.

RMSE by method:
- Global average: 1.1296
- User average: 1.0651
- Movie average: 1.0533
- Netflix: 0.9514
- Basic collaborative filtering: 0.94
- CF + biases + learned weights: 0.91
- Grand Prize: 0.8563

Goal: make good recommendations
- Quantify goodness using RMSE: lower RMSE means better recommendations
- Want to make good recommendations on items that the user has not yet seen. Can't really do this!
- Let's build a system such that it works well on known (user, item) ratings, and hope the system will also predict well the unknown ratings.
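The deviation-based item-item prediction above can be sketched as follows. This is a minimal illustration under stated assumptions: the similarities $s_{ij}$ are supplied precomputed, since the slides do not fix a similarity measure, and the function name, data layout and sample ratings are hypothetical.

```python
def predict_item_item(user, item, ratings, similarity, k=2):
    """Predict r_xi as baseline b_xi plus the similarity-weighted average
    deviation (r_xj - b_xj) over the k nearest rated neighbors N(i; x).

    ratings:    dict user -> dict item -> rating
    similarity: dict (i, j) -> s_ij, assumed precomputed for the target item
    """
    all_ratings = [r for items in ratings.values() for r in items.values()]
    mu = sum(all_ratings) / len(all_ratings)  # overall mean rating

    def user_deviation(x):
        values = list(ratings[x].values())
        return sum(values) / len(values) - mu

    def item_deviation(i):
        values = [items[i] for items in ratings.values() if i in items]
        return sum(values) / len(values) - mu

    def baseline(x, i):
        return mu + user_deviation(x) + item_deviation(i)

    # N(i; x): the k items most similar to `item` among those rated by `user`.
    rated = [j for j in ratings[user] if j != item]
    neighbors = sorted(rated, key=lambda j: similarity[(item, j)], reverse=True)[:k]
    weight_sum = sum(similarity[(item, j)] for j in neighbors)
    if weight_sum == 0:
        return baseline(user, item)  # no usable neighbors: fall back to baseline
    deviation = sum(
        similarity[(item, j)] * (ratings[user][j] - baseline(user, j))
        for j in neighbors
    )
    return baseline(user, item) + deviation / weight_sum


# Hypothetical data: predict user "x" on item "c".
ratings = {
    "x": {"a": 4.0, "b": 2.0},
    "y": {"a": 5.0, "b": 3.0, "c": 4.0},
    "z": {"a": 3.0, "c": 2.0},
}
similarity = {("c", "a"): 0.8, ("c", "b"): 0.2}
prediction = predict_item_item("x", "c", ratings, similarity, k=2)
print(round(prediction, 3))
```

Here the baseline for (x, c) is 19/7, about 2.714, and the weighted deviation adds 1.3/7, about 0.186, which gives a prediction of 2.9.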

Latent factor models

"SVD" on Netflix data: $R \approx Q \cdot P^T$
- For now let's assume we can approximate the rating matrix R as a product of "thin" matrices $Q \cdot P^T$.
- R has missing entries, but let's ignore that for now! Basically, we will want the reconstruction error to be small on known ratings, and we don't care about the values on the missing ones.

SVD: $A = U \Sigma V^T$

[Figure: movies placed in a two-dimensional latent space. One axis runs from "females" to "males", the other from "serious" to "funny". Movies shown: The Color Purple, Sense and Sensibility, The Princess Diaries, Amadeus, Ocean's 11, The Lion King, Braveheart, Independence Day, Lethal Weapon, Dumb and Dumber.]

How to estimate the missing rating of user x for item i?

$$\hat{r}_{xi} = q_i \cdot p_x = \sum_f q_{if} \, p_{xf}$$

where $q_i$ = row i of Q and $p_x$ = column x of $P^T$.
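The rating estimate above is a dot product over the latent factors, sketched below. The names Q and P^T follow the usual latent-factor notation and are assumed here, and the two factor vectors are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def predict_rating(item_factors, user_factors):
    """Estimate r_xi as the dot product q_i . p_x, summing over factors f."""
    if len(item_factors) != len(user_factors):
        raise ValueError("q_i and p_x must have the same number of factors")
    return sum(q * p for q, p in zip(item_factors, user_factors))


# Hypothetical vectors with two latent factors.
q_i = [1.0, -0.5]  # row i of Q (item factors)
p_x = [2.0, 1.0]   # column x of P^T (user factors)

print(predict_rating(q_i, p_x))  # 1.0*2.0 + (-0.5)*1.0 = 1.5
```

An item and a user that point the same way along a factor raise the estimate. Opposite signs lower it.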

[Figure: the same movies plotted against two latent factor axes, one running from "females" to "males" and the other from "serious" to "funny".]

How to estimate the missing rating of user x for item i?

$$\hat{r}_{xi} = q_i \cdot p_x = \sum_f q_{if} \, p_{xf}$$

where $q_i$ = row i of Q and $p_x$ = column x of $P^T$.

SVD: $A = U \Sigma V^T$
- A: input data matrix
- U: left singular vectors
- V: right singular vectors
- $\Sigma$: singular values

So in our case, "SVD" on Netflix data is $R \approx Q \cdot P^T$ with

$$A = R, \qquad Q = U, \qquad P^T = \Sigma V^T, \qquad \hat{r}_{xi} = q_i \cdot p_x$$

SVD gives minimum reconstruction error (sum of squared errors):

$$\min_{U, V, \Sigma} \sum_{ij \in A} \left(A_{ij} - [U \Sigma V^T]_{ij}\right)^2$$

Note two things:
- SSE and RMSE are monotonically related: $\mathrm{RMSE} = \frac{1}{c}\sqrt{\mathrm{SSE}}$. Great news: SVD is minimizing RMSE.
- Complication: the sum in the SVD error term is over all entries (no rating is interpreted as a zero rating). But our R has missing entries!

SVD isn't defined when entries are missing! Use specialized methods to find P, Q:

$$\min_{P, Q} \sum_{(i,x) \in R} \left(r_{xi} - q_i \cdot p_x\right)^2, \qquad \hat{r}_{xi} = q_i \cdot p_x$$

Note:
- We don't require the columns of P, Q to be orthogonal or of unit length
- P, Q map users/movies to a latent space
- The most popular model among Netflix contestants

Temporal biases of users

Sudden rise in the average movie rating (early 2004):
- Improvements in Netflix
- GUI improvements
- Meaning of rating changed
Movie age:
- Users prefer new movies without any reasons
- Older movies are just inherently better than newer ones

Y. Koren, Collaborative filtering with temporal dynamics, KDD '09
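The slides do not say which "specialized method" is used to minimize the error over known ratings only. One common choice, stochastic gradient descent, is swapped in here as a sketch. The function names, learning rate, epoch count, initialization range and sample ratings are all assumptions, and no regularization is applied because the objective above has none.

```python
import random


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def sum_squared_error(ratings, P, Q):
    """SSE over the known (user, item, rating) triples only."""
    return sum((r - dot(Q[i], P[x])) ** 2 for x, i, r in ratings)


def factorize(ratings, num_users, num_items, num_factors=2,
              learning_rate=0.02, epochs=2000, seed=0):
    """Fit P (users x factors) and Q (items x factors) by SGD so that
    q_i . p_x approximates r_xi on the known ratings; missing entries
    never enter the loss."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(num_factors)]
         for _ in range(num_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(num_factors)]
         for _ in range(num_items)]
    initial_sse = sum_squared_error(ratings, P, Q)
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, i, r in order:
            err = r - dot(Q[i], P[x])
            for f in range(num_factors):
                p_old, q_old = P[x][f], Q[i][f]
                # Gradient step on (r - q.p)^2; the factor 2 is folded
                # into the learning rate.
                P[x][f] += learning_rate * err * q_old
                Q[i][f] += learning_rate * err * p_old
    return P, Q, initial_sse


# Hypothetical known ratings as (user, item, rating); 4 of the 9 cells are missing.
known = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
P, Q, initial_sse = factorize(known, num_users=3, num_items=3)
final_sse = sum_squared_error(known, P, Q)
print(final_sse < initial_sse)
```

After fitting, `dot(Q[i], P[x])` also gives an estimate for the cells that had no rating, which is the actual goal. In practice a regularization term is usually added so that the fit on known ratings does not overfit.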

[Figure: RMSE versus millions of parameters for CF (no time bias), Basic Latent Factors, CF (time bias), Latent Factors w/ Biases, + Linear time, + Per-day user biases, and + CF.]

RMSE by method:
- Global average: 1.1296
- User average: 1.0651
- Movie average: 1.0533
- Netflix: 0.9514
- Basic collaborative filtering: 0.94
- Collaborative filtering++: 0.91
- Latent factors: 0.90
- Latent factors + biases: 0.89
- Latent factors + biases + time: 0.876
- Grand Prize: 0.8563

Still no prize! Getting desperate. Try a "kitchen sink" approach!

June 26th submission triggers 30-day "last call".

Ensemble team formed:
- Group of other teams on the leaderboard forms a new team
- Relies on combining their models
- Quickly also get a qualifying score over 10%
BellKor:
- Continue to get small improvements in their scores
- Realize that they are in direct competition with Ensemble
Strategy:
- Both teams carefully monitoring the leaderboard
- Only sure way to check for improvement is to submit a set of predictions
- This alerts the other team of your latest score

Submissions limited to 1 a day:
- Only 1 final submission could be made in the last 24h

24 hours before deadline:
- BellKor team member in Austria notices (by chance) that Ensemble posts a score that is slightly better than BellKor's

Frantic last 24 hours for both teams:
- Much computer time on final optimization
- Carefully calibrated to end about an hour before deadline

Final submissions:
- BellKor submits a little early (on purpose), 40 mins before deadline
- Ensemble submits their final entry 20 mins later
- ... and everyone waits

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu Training data 00 million ratings, 80,000 users, 7,770 movies 6 years of data: 000 00 Test data Last few ratings of

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 6 - Section - Spring 7 Lecture Jan-Willem van de Meent (credit: Andrew Ng, Alex Smola, Yehuda Koren, Stanford CS6) Project Project Deadlines Feb: Form teams of - people 7 Feb:

More information

Recommendation Systems

Recommendation Systems Recommendation Systems CS 534: Machine Learning Slides adapted from Alex Smola, Jure Leskovec, Anand Rajaraman, Jeff Ullman, Lester Mackey, Dietmar Jannach, and Gerhard Friedrich Recommender Systems (RecSys)

More information

Yelp Recommendation System

Yelp Recommendation System Yelp Recommendation System Jason Ting, Swaroop Indra Ramaswamy Institute for Computational and Mathematical Engineering Abstract We apply principles and techniques of recommendation systems to develop

More information

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University {tedhong, dtsamis}@stanford.edu Abstract This paper analyzes the performance of various KNNs techniques as applied to the

More information

General Instructions. Questions

General Instructions. Questions CS246: Mining Massive Data Sets Winter 2018 Problem Set 2 Due 11:59pm February 8, 2018 Only one late period is allowed for this homework (11:59pm 2/13). General Instructions Submission instructions: These

More information

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions

More information

By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth. Under The Guidance of Dr. Richard Maclin

By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth. Under The Guidance of Dr. Richard Maclin By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth Under The Guidance of Dr. Richard Maclin Outline Problem Statement Background Proposed Solution Experiments & Results Related Work Future

More information

Performance Comparison of Algorithms for Movie Rating Estimation

Performance Comparison of Algorithms for Movie Rating Estimation Performance Comparison of Algorithms for Movie Rating Estimation Alper Köse, Can Kanbak, Noyan Evirgen Research Laboratory of Electronics, Massachusetts Institute of Technology Department of Electrical

More information

Progress Report: Collaborative Filtering Using Bregman Co-clustering

Progress Report: Collaborative Filtering Using Bregman Co-clustering Progress Report: Collaborative Filtering Using Bregman Co-clustering Wei Tang, Srivatsan Ramanujam, and Andrew Dreher April 4, 2008 1 Introduction Analytics are becoming increasingly important for business

More information

Collaborative Filtering for Netflix

Collaborative Filtering for Netflix Collaborative Filtering for Netflix Michael Percy Dec 10, 2009 Abstract The Netflix movie-recommendation problem was investigated and the incremental Singular Value Decomposition (SVD) algorithm was implemented

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

Collaborative Filtering with Temporal Dynamics

Collaborative Filtering with Temporal Dynamics Collaborative Filtering with Temporal Dynamics Yehuda Koren Yahoo! Research, Haifa, Israel yehuda@yahoo-inc.com ABSTRACT Customer preferences for products are drifting over time. Product perception and

More information

Jeff Howbert Introduction to Machine Learning Winter

Jeff Howbert Introduction to Machine Learning Winter Collaborative Filtering Nearest es Neighbor Approach Jeff Howbert Introduction to Machine Learning Winter 2012 1 Bad news Netflix Prize data no longer available to public. Just after contest t ended d

More information

Factor in the Neighbors: Scalable and Accurate Collaborative Filtering

Factor in the Neighbors: Scalable and Accurate Collaborative Filtering 1 Factor in the Neighbors: Scalable and Accurate Collaborative Filtering YEHUDA KOREN Yahoo! Research Recommender systems provide users with personalized suggestions for products or services. These systems

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Recommender Systems II Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Recommender Systems Recommendation via Information Network Analysis Hybrid Collaborative Filtering

More information

Extension Study on Item-Based P-Tree Collaborative Filtering Algorithm for Netflix Prize

Extension Study on Item-Based P-Tree Collaborative Filtering Algorithm for Netflix Prize Extension Study on Item-Based P-Tree Collaborative Filtering Algorithm for Netflix Prize Tingda Lu, Yan Wang, William Perrizo, Amal Perera, Gregory Wettstein Computer Science Department North Dakota State

More information

Predicting Popular Xbox games based on Search Queries of Users

Predicting Popular Xbox games based on Search Queries of Users 1 Predicting Popular Xbox games based on Search Queries of Users Chinmoy Mandayam and Saahil Shenoy I. INTRODUCTION This project is based on a completed Kaggle competition. Our goal is to predict which

More information

Feature Selection Using Modified-MCA Based Scoring Metric for Classification

Feature Selection Using Modified-MCA Based Scoring Metric for Classification 2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification

More information

Improved Neighborhood-based Collaborative Filtering

Improved Neighborhood-based Collaborative Filtering Improved Neighborhood-based Collaborative Filtering Robert M. Bell and Yehuda Koren AT&T Labs Research 180 Park Ave, Florham Park, NJ 07932 {rbell,yehuda}@research.att.com ABSTRACT Recommender systems

More information

Introduction. Chapter Background Recommender systems Collaborative based filtering

Introduction. Chapter Background Recommender systems Collaborative based filtering ii Abstract Recommender systems are used extensively today in many areas to help users and consumers with making decisions. Amazon recommends books based on what you have previously viewed and purchased,

More information

Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp

Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp Sumedh Sawant sumedh@stanford.edu Team 38 December 10, 2013 Abstract We implement a personal recommendation

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Football result prediction using simple classification algorithms, a comparison between k-nearest Neighbor and Linear Regression

Football result prediction using simple classification algorithms, a comparison between k-nearest Neighbor and Linear Regression EXAMENSARBETE INOM TEKNIK, GRUNDNIVÅ, 15 HP STOCKHOLM, SVERIGE 2016 Football result prediction using simple classification algorithms, a comparison between k-nearest Neighbor and Linear Regression PIERRE

More information

arxiv: v4 [cs.ir] 28 Jul 2016

arxiv: v4 [cs.ir] 28 Jul 2016 Review-Based Rating Prediction arxiv:1607.00024v4 [cs.ir] 28 Jul 2016 Tal Hadad Dept. of Information Systems Engineering, Ben-Gurion University E-mail: tah@post.bgu.ac.il Abstract Recommendation systems

More information

Predicting Gene Function and Localization

Predicting Gene Function and Localization Predicting Gene Function and Localization By Ankit Kumar and Raissa Largman CS 229 Fall 2013 I. INTRODUCTION Our data comes from the 2001 KDD Cup Data Mining Competition. The competition had two tasks,

More information

Dimension Reduction CS534

Dimension Reduction CS534 Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of

More information

Performance of Recommender Algorithms on Top-N Recommendation Tasks

Performance of Recommender Algorithms on Top-N Recommendation Tasks Performance of Recommender Algorithms on Top- Recommendation Tasks Paolo Cremonesi Politecnico di Milano Milan, Italy paolo.cremonesi@polimi.it Yehuda Koren Yahoo! Research Haifa, Israel yehuda@yahoo-inc.com

More information

Cluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010

Cluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010 Cluster Analysis Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX 7575 April 008 April 010 Cluster Analysis, sometimes called data segmentation or customer segmentation,

More information

DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li

DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Time: 6:00pm 8:50pm Thu Location: AK 232 Fall 2016 High Dimensional Data v Given a cloud of data points we want to understand

More information

Application of Additive Groves Ensemble with Multiple Counts Feature Evaluation to KDD Cup 09 Small Data Set

Application of Additive Groves Ensemble with Multiple Counts Feature Evaluation to KDD Cup 09 Small Data Set Application of Additive Groves Application of Additive Groves Ensemble with Multiple Counts Feature Evaluation to KDD Cup 09 Small Data Set Daria Sorokina Carnegie Mellon University Pittsburgh PA 15213

More information

BordaRank: A Ranking Aggregation Based Approach to Collaborative Filtering

BordaRank: A Ranking Aggregation Based Approach to Collaborative Filtering BordaRank: A Ranking Aggregation Based Approach to Collaborative Filtering Yeming TANG Department of Computer Science and Technology Tsinghua University Beijing, China tym13@mails.tsinghua.edu.cn Qiuli

More information

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017)

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017) 1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci Part 12: Advanced Topics in Collaborative Filtering Francesco Ricci Content Generating recommendations in CF using frequency of ratings Role of neighborhood size Comparison of CF with association rules

More information

CS570: Introduction to Data Mining

CS570: Introduction to Data Mining CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.

More information

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which

More information

Hybrid Recommendation Models for Binary User Preference Prediction Problem

Hybrid Recommendation Models for Binary User Preference Prediction Problem JMLR: Workshop and Conference Proceedings 18:137 151, 2012 Proceedings of KDD-Cup 2011 competition Hybrid Recommation Models for Binary User Preference Prediction Problem Siwei Lai swlai@nlpr.ia.ac.cn

More information

Improving the Accuracy of Top-N Recommendation using a Preference Model

Improving the Accuracy of Top-N Recommendation using a Preference Model Improving the Accuracy of Top-N Recommendation using a Preference Model Jongwuk Lee a, Dongwon Lee b,, Yeon-Chang Lee c, Won-Seok Hwang c, Sang-Wook Kim c a Hankuk University of Foreign Studies, Republic

More information

Topic 7 Machine learning

Topic 7 Machine learning CSE 103: Probability and statistics Winter 2010 Topic 7 Machine learning 7.1 Nearest neighbor classification 7.1.1 Digit recognition Countless pieces of mail pass through the postal service daily. A key

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

Support Vector Machines + Classification for IR

Support Vector Machines + Classification for IR Support Vector Machines + Classification for IR Pierre Lison University of Oslo, Dep. of Informatics INF3800: Søketeknologi April 30, 2014 Outline of the lecture Recap of last week Support Vector Machines

More information

CS294-1 Assignment 2 Report

CS294-1 Assignment 2 Report CS294-1 Assignment 2 Report Keling Chen and Huasha Zhao February 24, 2012 1 Introduction The goal of this homework is to predict a users numeric rating for a book from the text of the user s review. The

More information

Information Retrieval: Retrieval Models

Information Retrieval: Retrieval Models CS473: Web Information Retrieval & Management CS-473 Web Information Retrieval & Management Information Retrieval: Retrieval Models Luo Si Department of Computer Science Purdue University Retrieval Models

More information

Observations. Basic iteration Line estimated from 2 inliers

Observations. Basic iteration Line estimated from 2 inliers Line estimated from 2 inliers 3 Observations We need (in this case!) a minimum of 2 points to determine a line Given such a line l, we can determine how well any other point y fits the line l For example:

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

Clustering. Bruno Martins. 1 st Semester 2012/2013

Clustering. Bruno Martins. 1 st Semester 2012/2013 Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 Motivation Basic Concepts

More information

Part I: Data Mining Foundations

Part I: Data Mining Foundations Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

More information

Property1 Property2. by Elvir Sabic. Recommender Systems Seminar Prof. Dr. Ulf Brefeld TU Darmstadt, WS 2013/14

Property1 Property2. by Elvir Sabic. Recommender Systems Seminar Prof. Dr. Ulf Brefeld TU Darmstadt, WS 2013/14 Property1 Property2 by Recommender Systems Seminar Prof. Dr. Ulf Brefeld TU Darmstadt, WS 2013/14 Content-Based Introduction Pros and cons Introduction Concept 1/30 Property1 Property2 2/30 Based on item

More information

2007 Canadian Computing Competition: Senior Division. Sponsor:

2007 Canadian Computing Competition: Senior Division. Sponsor: 2007 Canadian Computing Competition: Senior Division Sponsor: Canadian Computing Competition Student Instructions for the Senior Problems 1. You may only compete in one competition. If you wish to write

More information

Recommender system techniques applied to Netflix movie data

Recommender system techniques applied to Netflix movie data Recommender system techniques applied to Netflix movie data Research Paper Business Analytics Steven Postmus (s.h.postmus@student.vu.nl) Supervisor: Sandjai Bhulai (s.bhulai@vu.nl) Vrije Universiteit Amsterdam,

More information

Online Social Networks and Media

Online Social Networks and Media Online Social Networks and Media Absorbing Random Walks Link Prediction Why does the Power Method work? If a matrix R is real and symmetric, it has real eigenvalues and eigenvectors: λ, w, λ 2, w 2,, (λ

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 13: The bootstrap (v3) Ramesh Johari ramesh.johari@stanford.edu 1 / 30 Resampling 2 / 30 Sampling distribution of a statistic For this lecture: There is a population model

More information

Parallel machine learning using Menthor

Parallel machine learning using Menthor Parallel machine learning using Menthor Studer Bruno Ecole polytechnique federale de Lausanne June 8, 2012 1 Introduction The algorithms of collaborative filtering are widely used in website which recommend

More information

Distribution-free Predictive Approaches

Distribution-free Predictive Approaches Distribution-free Predictive Approaches The methods discussed in the previous sections are essentially model-based. Model-free approaches such as tree-based classification also exist and are popular for

More information

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI

More information

CSE152 Introduction to Computer Vision Assignment 3 (SP15) Instructor: Ben Ochoa Maximum Points : 85 Deadline : 11:59 p.m., Friday, 29-May-2015

CSE152 Introduction to Computer Vision Assignment 3 (SP15) Instructor: Ben Ochoa Maximum Points : 85 Deadline : 11:59 p.m., Friday, 29-May-2015 Instructions: CSE15 Introduction to Computer Vision Assignment 3 (SP15) Instructor: Ben Ochoa Maximum Points : 85 Deadline : 11:59 p.m., Friday, 9-May-015 This assignment should be solved, and written

More information

Slides based on those in:

Slides based on those in: Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org A 3.3 B 38.4 C 34.3 D 3.9 E 8.1 F 3.9 1.6 1.6 1.6 1.6 1.6 2 y 0.8 ½+0.2 ⅓ M 1/2 1/2 0 0.8 1/2 0 0 + 0.2 0 1/2 1 [1/N]

More information

Pairwise Sequence Alignment: Dynamic Programming Algorithms. COMP Spring 2015 Luay Nakhleh, Rice University

Pairwise Sequence Alignment: Dynamic Programming Algorithms. COMP Spring 2015 Luay Nakhleh, Rice University Pairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 - Spring 2015 Luay Nakhleh, Rice University DP Algorithms for Pairwise Alignment The number of all possible pairwise alignments (if

More information

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)

More information

Rating Prediction Using Preference Relations Based Matrix Factorization

Rating Prediction Using Preference Relations Based Matrix Factorization Rating Prediction Using Preference Relations Based Matrix Factorization Maunendra Sankar Desarkar and Sudeshna Sarkar Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur,

More information

A Recommender System. John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center. Copyright 2018

A Recommender System. John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center. Copyright 2018 A Recommender System John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center Copyright 2018 Obvious Applications We are now advanced enough that we can aspire to a serious application.

More information

Neural Network Weight Selection Using Genetic Algorithms

Neural Network Weight Selection Using Genetic Algorithms Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Frequency Distributions and Descriptive Statistics in SPS

Frequency Distributions and Descriptive Statistics in SPS 230 Combs Building 859.622.3050 studentcomputing.eku.edu studentcomputing@eku.edu Frequency Distributions and Descriptive Statistics in SPSS In this tutorial, we re going to work through a sample problem

More information

Data clustering & the k-means algorithm

Data clustering & the k-means algorithm April 27, 2016 Why clustering? Unsupervised Learning Underlying structure gain insight into data generate hypotheses detect anomalies identify features Natural classification e.g. biological organisms

More information

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018 MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge

More information

Does Wikipedia Information Help Netflix Predictions?

Does Wikipedia Information Help Netflix Predictions? Does Wikipedia Information Help Netflix Predictions? John Lees-Miller, Fraser Anderson, Bret Hoehn, Russell Greiner University of Alberta Department of Computing Science {leesmill, frasera, hoehn, greiner}@cs.ualberta.ca

More information

ExcUseMe: Asking Users to Help in Item Cold-Start Recommendations

ExcUseMe: Asking Users to Help in Item Cold-Start Recommendations ExcUseMe: Asking Users to Help in Item Cold-Start Recommendations Michal Aharon Yahoo Labs, Haifa, Israel michala@yahoo-inc.com Dana Drachsler-Cohen Technion, Haifa, Israel ddana@cs.technion.ac.il Oren

More information

Data Mining Lab 2: A Basic Tree Classifier

1 Introduction In this lab we are going to look at the Titanic data set, which provides information on the fate of passengers on the maiden voyage of the ocean

CSE 5243 INTRO. TO DATA MINING

Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/28/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han Chapter 10.

Linear and Quadratic Least Squares

Prepared by Stephanie Quintal, graduate student Dept. of Mathematical Sciences, UMass Lowell in collaboration with Marvin Stick Dept. of Mathematical Sciences, UMass

Applications Video Surveillance (On-line or off-line)

Face Recognition: Dimensionality Reduction Biometrics CSE 190-a Lecture 12 CSE190a Fall 06 Face Recognition Face is the most common biometric used by humans Applications range from

Robert Collins CSE486, Penn State. Lecture 09: Stereo Algorithms

Recall: Simple Stereo System (figure: left camera located at (0,0,0), right camera located at (T_x,0,0), image coords of point (X,Y,Z))

In-class activities: Sep 25, 2017

Activities and group work this week function the same way as our previous activity. We recommend that you continue working with the same 3-person group. We suggest that

CHAPTER 2 DESCRIPTIVE STATISTICS

1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is how the data is spread or distributed over the range of the data values. This is one of

CPSC 340: Machine Learning and Data Mining. More Regularization Fall 2017

Admin Assignment 3: Out soon, due Friday of next week. Midterm: You can view your exam during instructor office hours or after class

CPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017

Admin Assignment 3: Due Friday. Midterm: Can view your exam during instructor office hours or after class this week. Digression: the other

Predicting Results of a Biological Experiment Using Matrix Completion Algorithms

by Trevor Sabourin A research paper presented to the University of Waterloo in partial fulfillment of the requirement for

CHAPTER 4 K-MEANS AND UCAM CLUSTERING ALGORITHM

4.1 Introduction Clustering has been used in a number of applications such as engineering, biology, medicine and data mining. The most popular clustering

CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo

CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Do Something..

Landslide Monitoring Point Optimization. Deployment Based on Fuzzy Cluster Analysis.

Journal of Geoscience and Environment Protection, 2017, 5, 118-122 http://www.scirp.org/journal/gep ISSN Online: 2327-4344 ISSN Print: 2327-4336

University of Wisconsin-Madison Spring 2018 BMI/CS 776: Advanced Bioinformatics Homework #2

Assignment goals: use mutual information to reconstruct gene expression networks; evaluate classifier predictions; examine Gibbs sampling for a Markov random field; control for multiple hypothesis testing

Predicting Bus Arrivals Using One Bus Away Real-Time Data

Catherine M. Baker, Department of Computer Science, University of Washington; Alexander C. Nied, Department of Computer Science, University

Coding for Random Projects

CS 584: Big Data Analytics Material adapted from Li's talk at ICML 2014 (http://techtalks.tv/talks/coding-for-random-projections/61085/) Random Projections for High-Dimensional

Collaborative Filtering based on User Trends

Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos Papadopoulos, and Yannis Manolopoulos Aristotle University, Department of Informatics, Thessaloniki 54124,

Spatial Variation of Sea-Level Sea level reconstruction

Biao Chang Multimedia Environmental Simulation Laboratory School of Civil and Environmental Engineering Georgia Institute of Technology Advisor:

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

SIFT: Scale Invariant Feature Transform; transform image

Feature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.

CS 188: Artificial Intelligence Fall 2007 Lecture 26: Kernels 11/29/2007 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit your

Neighborhood-Based Collaborative Filtering

Chapter 2 "When one neighbor helps another, we strengthen our communities." Jennifer Pahlka 2.1 Introduction Neighborhood-based collaborative filtering algorithms,

Bayesian Personalized Ranking for Las Vegas Restaurant Recommendation

Kiran Kannar A53089098 kkannar@eng.ucsd.edu Saicharan Duppati A53221873 sduppati@eng.ucsd.edu Akanksha Grover A53205632 a2grover@eng.ucsd.edu

11/17/2009 Comp 590/Comp Fall

11/17/2009 Comp 590/Comp 790-90 Fall 2009 Lecture 20: Clustering and Evolution Study Chapter 10.4-10.8 Problem Set #5 will be available tonight Clique Graphs A clique is a graph with every vertex connected

Router placement. Problem statement for Final Round, Hash Code 2017

Introduction Who doesn't love wireless Internet? Millions of people rely on it for productivity and fun in countless cafes, railway stations

Data Mining and Knowledge Discovery Practice notes Numeric prediction and descriptive DM

Practice notes 4..9 Practice plan Data Mining and Knowledge Discovery Knowledge Discovery and Knowledge Management in e-science Petra Kralj Novak Petra.Kralj.Novak@ijs.si Practice, 9//4 9//: Predictive

Clustering. Distance Measures Hierarchical Clustering. k-Means Algorithms

The Problem of Clustering Given a set of points, with a notion of distance between points, group the points into some number of

Unsupervised Learning. Supervised learning vs. unsupervised learning. What is Cluster Analysis? Applications of Cluster Analysis

Supervised learning vs unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with a target (class) attribute. These patterns are then

June 15, Abstract. 2. Methodology and Considerations. 1. Introduction

Organizing Internet Bookmarks using Latent Semantic Analysis and Intelligent Icons Note: This file is a homework produced by two students for UCR CS235, Spring 06. In order to fully appreciate it, it may

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017

Admin Assignment 3: 2 late days to hand in tonight. Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP

Using Machine Learning to Optimize Storage Systems

Dr. Kiran Gunnam Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation