Inf2b Learning and Data

Inf2b Learning and Data
http://www.inf.ed.ac.uk/teaching/courses/inf2b/
Lecture 2: Similarity and Recommender Systems
Iain Murray, 2013
School of Informatics, University of Edinburgh

The confection: m&m's (185g), Jelly Belly (100g), Chocolate Raisins (200g)

Recommender systems
What makes recommendations good?

The Netflix million dollar prize
C = 480,189 users/critics
M = 17,770 movies
C × M matrix of ratings in {1, 2, 3, 4, 5} (ordinal values)
Full matrix: about 10 billion cells; roughly 1% of cells filled (100,480,507 ratings available)
Also available: dates of ratings; possibly movie information
We'll start with a smaller, simpler setup.
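
A quick back-of-the-envelope check of these numbers in Python (a sketch; the variable names are just for illustration):

C = 480_189              # users
M = 17_770               # movies
full_cells = C * M       # about 8.5e9, i.e. on the order of 10 billion cells
ratings = 100_480_507
print(full_cells, ratings / full_cells)   # fraction of cells filled is roughly 1%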

Today's Schedule:
Distances between entities
Similarity and recommendations
Normalization, Pearson Correlation
And a trick: transpose your data matrix and run your code again. The result is sometimes interesting.

The Films

The Critics: David Denby, Todd McCarthy, Joe Morgenstern, Claudia Puig, Peter Travers, Kenneth Turan

The Data

             Australia  Body of Lies  Burn After  Hancock  Milk  Rev Road
  Denby          3           7            4          9       9      7
  McCarthy       7           5            5          3       8      8
  M'stern        7           5            5          0       8      4
  Puig           5           6            8          5       9      8
  Travers        5           8            8          8      10      9
  Turan          7           7            8          4       7      8

A two-dimensional review space
[Scatter plot: each critic plotted by their score for Hancock (x-axis, 0-10) against their score for Revolutionary Road (y-axis, 0-10). Labelled points: Denby, McCarthy, Morgenstern, Puig, Travers, Turan, plus an extra point marked "User 1".]

Euclidean distance

Distance between 2D vectors (x, y) and (x', y'):

    $r^2 = (x - x')^2 + (y - y')^2$

Distance between D-dimensional vectors x and x':

    $r^2(\mathbf{x}, \mathbf{x}') = \sum_{d=1}^{D} (x_d - x'_d)^2$

Measures similarities between feature vectors, i.e., similarities between digits, critics, movies, genes, ...
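
A minimal NumPy sketch of this squared-distance computation (the helper name sq_dist is illustrative, not from the course code):

import numpy as np

def sq_dist(x, y):
    # Squared Euclidean distance r^2 between two equal-length vectors
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum((x - y) ** 2)

# Denby's and McCarthy's rows of the ratings table:
print(sq_dist([3, 7, 4, 9, 9, 7], [7, 5, 5, 3, 8, 8]))   # 59.0, so r = sqrt(59) ≈ 7.7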

Distances between critics

              Denby  McCarthy  M'stern  Puig  Travers  Turan
  Denby                7.7      10.6     6.2    5.2     7.9
  McCarthy     7.7               5.0     4.4    7.2     3.9
  M'stern     10.6     5.0               7.5   10.7     6.8
  Puig         6.2     4.4       7.5            3.9     3.2
  Travers      5.2     7.2      10.7     3.9            5.6
  Turan        7.9     3.9       6.8     3.2    5.6

Distances measured in a 6-dimensional space.
The closest pair is Puig and Turan.
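
A short NumPy sketch that reproduces this whole table from the data matrix at once (illustrative; variable names are not from the course code):

import numpy as np

c_scores = np.array([[3, 7, 4, 9, 9, 7],    # Denby
                     [7, 5, 5, 3, 8, 8],    # McCarthy
                     [7, 5, 5, 0, 8, 4],    # Morgenstern
                     [5, 6, 8, 5, 9, 8],    # Puig
                     [5, 8, 8, 8, 10, 9],   # Travers
                     [7, 7, 8, 4, 7, 8]])   # Turan

diffs = c_scores[:, None, :] - c_scores[None, :, :]   # C x C x M array of differences
critic_dists = np.sqrt((diffs ** 2).sum(axis=2))      # C x C matrix of Euclidean distances
print(np.round(critic_dists, 1))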

Transposed problem

Distance between Movies

                Australia  Body of Lies  Burn After  Hancock  Milk  Rev Road
  Australia                    5.8          5.3       10.9     8.9    7.2
  Body of Lies     5.8                      3.7        6.6     5.9    4.0
  Burn After       5.3         3.7                     8.9     7.0    4.5
  Hancock         10.9         6.6          8.9               10.9    8.4
  Milk             8.9         5.9          7.0       10.9            4.8
  Rev. Road        7.2         4.0          4.5        8.4     4.8

To run the same code as for distances between critics, simply transpose the data matrix first.
The transpose of data in NumPy is data.T; in Matlab/Octave it's data'.
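
The movie-by-movie table comes from exactly the same computation run on the transposed matrix (continuing the sketch above, which defined c_scores):

m_scores = c_scores.T                                   # M x C: one row per movie
diffs = m_scores[:, None, :] - m_scores[None, :, :]
movie_dists = np.sqrt((diffs ** 2).sum(axis=2))         # M x M distance matrix
print(np.round(movie_dists, 1))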

Today's Schedule:
Distances between entities
Similarity and recommendations
Normalization, Pearson Correlation
And a trick: transpose your data matrix and run your code again. The result is sometimes interesting.

User 2

            Body of Lies  Burn After Reading  Revolutionary Road
  User 2         6                 9                   6

Now measuring distances in 3D:

  Critic        r(critic, User 2)
  Denby         √27 = 5.2
  McCarthy      √21 = 4.6
  Morgenstern   √21 = 4.6
  Puig          √5  = 2.2
  Travers       √14 = 3.7
  Turan         √6  = 2.4

User 2 seems most similar to Claudia Puig.
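
A two-line check of these distances, continuing the NumPy sketch above (movie columns 1, 2 and 5 of c_scores are Body of Lies, Burn After Reading and Revolutionary Road, zero-based):

u2 = np.array([6, 9, 6])
r = np.sqrt(((c_scores[:, [1, 2, 5]] - u2) ** 2).sum(axis=1))
print(np.round(r, 1))   # [5.2 4.6 4.6 2.2 3.7 2.4], Denby through Turan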

Weighted averages

The mean or average of critic scores for movie z:

    $\frac{1}{C} \sum_{c=1}^{C} \mathrm{sc}_c(z)$

Weighted average for user u, based on similarity to critics:

    $\mathrm{sc}_u(z) = \frac{1}{\sum_{c=1}^{C} \mathrm{sim}\big(\mathbf{x}^{(u)}, \mathbf{x}^{(c)}\big)} \sum_{c=1}^{C} \mathrm{sim}\big(\mathbf{x}^{(u)}, \mathbf{x}^{(c)}\big)\, \mathrm{sc}_c(z)$

The normalization outside the sum means that if every critic has the same score, the (weighted) average will report that score.
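
A tiny illustration of that normalization property (hypothetical similarities, not from the lecture):

import numpy as np

sim = np.array([0.2, 0.5, 0.9])       # similarities of user u to three critics
scores = np.array([8.0, 8.0, 8.0])    # every critic gives movie z the same score
print(np.dot(sim, scores) / np.sum(sim))   # 8.0: the weighted average reports that score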

Simple recommender system

Predicted score: average critic score, weighted by similarity.

Similarity measures: there's a choice. For example:

    $\mathrm{sim}(\mathbf{x}, \mathbf{y}) = \frac{1}{1 + r^2(\mathbf{x}, \mathbf{y})}$

Can now predict scores for User 2 (see notes).

Good measure? Consider distances 0, ∞, and in between.
What if not all critics have seen the same movies?
What if some critics rate more highly than others?
What if some critics have a wider spread than others?
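
A quick sketch of how this similarity behaves at the extremes of the distance range (using the 1/(1 + r²) form above; the function name is illustrative):

def sim_from_sq_dist(r2):
    # 1 when the squared distance is 0, tending to 0 as it grows
    return 1.0 / (1.0 + r2)

print(sim_from_sq_dist(0.0))     # 1.0: identical raters get full weight
print(sim_from_sq_dist(100.0))   # ~0.01: very different raters get almost no weight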

Today's Schedule:
Distances between entities
Similarity and recommendations
Normalization, Pearson Correlation
And a trick: transpose your data matrix and run your code again. The result is sometimes interesting.

Normalization

Different means and spreads make reviewers look different.
Create a standard score with mean zero and standard deviation 1.

Mean and standard deviation of critic c's scores:

    $\mu^{(c)} = \frac{1}{M} \sum_{m=1}^{M} x_m^{(c)}; \qquad \sigma^{(c)} = \sqrt{\frac{1}{M-1} \sum_{m=1}^{M} \big(x_m^{(c)} - \mu^{(c)}\big)^2}$

Standardized score:

    $z_m^{(c)} = \frac{x_m^{(c)} - \mu^{(c)}}{\sigma^{(c)}}$

Many learning systems work better with standardized features/outputs.
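
A minimal NumPy sketch of this standardization applied to each critic's row of the ratings matrix (illustrative, not the course code):

import numpy as np

c_scores = np.array([[3, 7, 4, 9, 9, 7],
                     [7, 5, 5, 3, 8, 8],
                     [7, 5, 5, 0, 8, 4],
                     [5, 6, 8, 5, 9, 8],
                     [5, 8, 8, 8, 10, 9],
                     [7, 7, 8, 4, 7, 8]], dtype=float)

mu = c_scores.mean(axis=1, keepdims=True)             # each critic's mean score
sigma = c_scores.std(axis=1, ddof=1, keepdims=True)   # std dev with the 1/(M-1) convention
z = (c_scores - mu) / sigma                           # standardized scores: mean 0, std 1 per row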

Pearson correlation

Estimate of the correlation between critics c and d:

    $\rho(c, d) = \frac{1}{M-1} \sum_{m=1}^{M} z_m^{(c)} z_m^{(d)}$

Tends to one value as M → ∞.
Based on standard scores (a shift and stretch of a reviewer's scale makes no difference).
Used in the mix by the winning Netflix teams, e.g. http://public.research.att.com/~yehuda/pubs/cf-workshop.pdf
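
Continuing the standardization sketch above, the full critic-by-critic correlation matrix is then one line (np.corrcoef on the raw scores gives the same values):

M = c_scores.shape[1]
rho = z @ z.T / (M - 1)          # C x C matrix of Pearson correlations between critics
# Equivalently: rho = np.corrcoef(c_scores)
print(np.round(rho, 2))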

Summary

Rating prediction: fill in entries of a C × M matrix.
A row is a feature vector of a critic.
Guess cells based on a weighted average of similar rows.
Similarity based on distance and Pearson correlation.
Could transpose the matrix and run the same code!

NumPy programming example

from numpy import *

c_scores = array([[3, 7, 4, 9, 9, 7],
                  [7, 5, 5, 3, 8, 8],
                  [7, 5, 5, 0, 8, 4],
                  [5, 6, 8, 5, 9, 8],
                  [5, 8, 8, 8, 10, 9],
                  [7, 7, 8, 4, 7, 8]])    # C,M
u2_scores = array([6, 9, 6])
u2_movies = array([1, 2, 5])              # zero-based indices
r2 = sqrt(sum((c_scores[:, u2_movies] - u2_scores)**2, 1).T)   # C,
sim = 1/(1 + r2)                          # C,
pred_scores = dot(sim, c_scores) / sum(sim)
print(pred_scores)
# The predicted scores array has predictions for all movies,
# including ones where we know the true rating from User 2.

Matlab/Octave version

c_scores = [3 7 4  9  9 7;
            7 5 5  3  8 8;
            7 5 5  0  8 4;
            5 6 8  5  9 8;
            5 8 8  8 10 9;
            7 7 8  4  7 8];    % CxM
u2_scores = [6 9 6];
u2_movies = [2 3 6];           % one-based indices
% The next line is complicated. See also the next slide:
d2 = sum(bsxfun(@minus, c_scores(:,u2_movies), u2_scores).^2, 2)';
r2 = sqrt(d2);
sim = 1./(1 + r2);             % 1xC
pred_scores = (sim * c_scores) / sum(sim)   % 1xM = 1xC * CxM

Matlab/Octave square distances

Other ways to get square distances:

% The next line is like the Python, but not valid Matlab.
% Works in recent builds of Octave.
d2 = sum((c_scores(:,u2_movies) - u2_scores).^2, 2)';

% Old-school Matlab way to make sizes match:
d2 = sum((c_scores(:,u2_movies) - ...
          repmat(u2_scores, size(c_scores,1), 1)).^2, 2)';

% Sq. distance is common; I have a general routine at:
% homepages.inf.ed.ac.uk/imurray2/code/imurray-matlab/square_dist.m
d2 = square_dist(u2_scores', c_scores(:,u2_movies)');

Or you could write a for loop and do it as you might in Java. Worth doing to check your code.