IMP: A Message-Passing Algorithm for Matrix Completion


Henry D. Pfister (joint with Byung-Hak Kim and Arvind Yedla)
Electrical and Computer Engineering, Texas A&M University
ISTC 2010, Brest, France, September 9, 2010

Outline

1. Matrix Completion via Iterative Information Processing
2. Factor Graph Model for Matrix Completion
3. IMP Algorithm and Performance Results
4. Conclusions and Open Problems

Matrix Completion Problem

"In its simplest form, the problem is to recover a matrix from a small sample of its entries. ... Imagine now that we only observe a few entries of a data matrix. Then is it possible to accurately or even exactly guess the entries that we have not seen?" (Candès and Plan '09)

Motivation: The Netflix Challenge ('06-'09)

An overwhelming portion of the user-movie matrix (e.g., 99%) is unknown, and the few observed movie ratings are noisy!

Recent Work on Matrix Completion

Efficient models and practical algorithms:
- Low-rank matrix models and clustering models
- Convex relaxation (SDP) and Bayesian approaches
- Exploration of the fundamental limits

Main differences in this work:
- The relationship between sparse observations and the recovery of missing entries
- The cold-start problem in recommender systems

1. Focus on matrices whose entries (drawn from a finite alphabet) are modeled by a factor graph
2. A message-passing (MP) based algorithm to learn the missing information and eventually estimate the missing entries

Our Goal: Matrix Completion via Iterative Processing

Basic approach:
- Establish a factor-graph model
- Estimate the parameters of the model
- Use it for inference via message-passing (IMP)

Benefits of our factor-graph model:
- Establishes a generative model for data matrices
- Sparse observations reduce complexity

Drawbacks of this approach:
- Mixes clustering and message-passing
- Difficult to analyze the behavior

Factor Graph Model (1): Conditional Independence

[Figure: bipartite factor graph with movie-group nodes V_0, ..., V_M on one side and user-group nodes U_0, ..., U_N on the other, connected through the observed ratings R via permutations of the edges.]

Assumption: the movie ratings $R_{nm}$ (user $n$, movie $m$) are conditionally independent given the user group $U_n$ and the movie group $V_m$:

$$\Pr(R_O \mid U, V) = \prod_{(n,m) \in O} w(R_{nm} \mid U_n, V_m),$$

where $w(r \mid u, v) \triangleq \Pr(R_{nm} = r \mid U_n = u, V_m = v)$.
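Since the slides emphasize that this is a generative model for data matrices, here is a minimal sketch of that view. The function name, the priors `p_u` and `p_v`, the observation rate, and the dense loop over entries are our illustration, not details from the talk: latent groups are drawn from the priors and each observed rating is drawn from the kernel $w$.

```python
import numpy as np

def sample_matrix(N, M, p_u, p_v, w, obs_frac=0.01, rng=None):
    """Sample a sparse rating matrix from the factor-graph model (a sketch).

    p_u: length-Ku prior over user groups
    p_v: length-Kv prior over movie groups
    w:   (R, Ku, Kv) array, w[r, u, v] = Pr(rating r | groups u, v)

    Returns a dict (n, m) -> rating index for the observed entries O.
    """
    rng = np.random.default_rng(rng)
    U = rng.choice(len(p_u), size=N, p=p_u)   # latent user groups
    V = rng.choice(len(p_v), size=M, p=p_v)   # latent movie groups
    ratings = {}
    for n in range(N):
        for m in range(M):
            if rng.random() < obs_frac:       # sparse observation set O
                # Ratings are conditionally independent given (U_n, V_m).
                ratings[(n, m)] = int(rng.choice(w.shape[0], p=w[:, U[n], V[m]]))
    return ratings
```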

Factor Graph Model (2): Computational Tree

[Figure: computational tree unrolled from the factor graph; the legend distinguishes unknown ratings, known ratings, movie groups, and user groups.]

Factor Graph Model (3): Matrix Decomposition

The MMSE estimates can be written as a matrix factorization. In contrast to the standard low-rank matrix model, this adds non-negativity and normalization constraints.
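To make the factorization explicit, here is one way to write it out (our derivation, using the independence approximation implicit in the message-passing prediction formula of Step III below):

$$\hat{R}_{nm} = \mathbb{E}[R_{nm} \mid R_O] \approx \sum_{u,v} \Pr(U_n = u \mid R_O) \Big( \sum_{r} r\, w(r \mid u, v) \Big) \Pr(V_m = v \mid R_O).$$

Defining $Y_{nu} = \Pr(U_n = u \mid R_O)$, $\bar{W}_{uv} = \sum_r r\, w(r \mid u, v)$, and $X_{mv} = \Pr(V_m = v \mid R_O)$ gives $\hat{R} = Y \bar{W} X^{\mathsf{T}}$, a factorization in which the rows of $X$ and $Y$ are probability vectors; this is exactly where the non-negativity and normalization constraints come from.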

IMP Algorithm (1)

Step I. Initialize the group rating probabilities w(r | u, v):
- Cluster the users (and movies) using variable-dimension vector quantization (VDVQ), a variant of VQ in which the codebook vectors are full but the training vectors are sparse. Training is based on the generalized Lloyd algorithm (GLA), with the distance computed only on overlapping entries (a sketch of this step follows below).
- For user clustering, the K codebook vectors can be thought of as K critics who have rated every movie.
- After clustering the users and movies each into user/movie groups, estimate w(r | u, v) from the frequencies of each user-group / movie-group / rating triple.
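The following is a minimal sketch of this clustering step, assuming a dense rating array with a boolean observation mask; the function name, the initialization, and the fixed iteration count are our choices rather than details from the talk.

```python
import numpy as np

def vdvq_cluster(R, mask, K, n_iters=20, rng=None):
    """Variable-dimension VQ via a generalized-Lloyd-style loop (a sketch).

    R:    (N, M) rating matrix (values ignored where mask is False)
    mask: (N, M) boolean array, True where a rating was observed
    K:    number of clusters ("critics" with a rating for every movie)

    Returns (assignments, codebook); each codebook vector is full-length,
    and every training row is compared to it only on its observed entries.
    """
    rng = np.random.default_rng(rng)
    N, M = R.shape
    # Initialize codebook vectors near the per-movie mean rating.
    counts = np.maximum(mask.sum(axis=0), 1)
    col_mean = (R * mask).sum(axis=0) / counts
    codebook = col_mean + 0.1 * rng.standard_normal((K, M))
    assign = np.zeros(N, dtype=int)
    for _ in range(n_iters):
        # Assignment step: squared error averaged over overlapping entries only.
        for n in range(N):
            obs = mask[n]
            if not obs.any():
                continue
            d = ((codebook[:, obs] - R[n, obs]) ** 2).mean(axis=1)
            assign[n] = int(np.argmin(d))
        # Centroid step: column-wise mean of the observed ratings of the
        # users assigned to each cluster; unseen columns keep their value.
        for k in range(K):
            members = assign == k
            cnt = mask[members].sum(axis=0)
            tot = (R[members] * mask[members]).sum(axis=0)
            codebook[k] = np.where(cnt > 0, tot / np.maximum(cnt, 1), codebook[k])
    return assign, codebook
```

Running the same routine on the transpose clusters the movies, after which w(r | u, v) can be estimated by counting how often each rating r occurs between user group u and movie group v.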

IMP Algorithm (2)

[Figure: the factor graph again, with messages passed along the observed-rating edges between movie-group nodes V_m and user-group nodes U_n.]

Notation: let $T^{(i)}_{m \to n}$ be the $m \to n$ computation tree after $i$ iterations. Then the message from movie $m$ to user $n$ is
$$x^{(i)}_{m \to n}(v) \triangleq \Pr\!\big(V_m = v \mid T^{(i)}_{m \to n}\big),$$
and the message from user $n$ to movie $m$ is
$$y^{(i)}_{n \to m}(u) \triangleq \Pr\!\big(U_n = u \mid T^{(i)}_{n \to m}\big).$$

IMP Algorithm (3)

Step II. Message-passing update of the group vectors:

1. Initialize the user/movie messages to the prior group probabilities:
$$x^{(0)}_m(v) = p_V(v), \qquad y^{(0)}_n(u) = p_U(u).$$

2. Recursively update the user/movie group probabilities:
$$x^{(i+1)}_{m \to n}(v) \propto x^{(0)}_m(v) \prod_{k \in U_m \setminus n} \sum_{u} w(r_{km} \mid u, v)\, y^{(i)}_{k \to m}(u),$$
$$y^{(i+1)}_{n \to m}(u) \propto y^{(0)}_n(u) \prod_{k \in V_n \setminus m} \sum_{v} w(r_{nk} \mid u, v)\, x^{(i)}_{k \to n}(v),$$
where $U_m$ is the set of all users who rated movie $m$ and $V_n$ is the set of all movies whose rating by user $n$ was observed.

3. Update w(r | u, v) for the group ratings.
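A minimal sketch of one round of these updates is given below, in the linear probability domain with no damping or numerical safeguards; the data layout (messages in dictionaries keyed by edge, ratings indexed 0..R-1) and all names are our assumptions.

```python
import numpy as np

def imp_update(x0, y0, x_msg, y_msg, w, ratings):
    """One round of the Step II message updates (a sketch).

    x0:      (M, Kv) movie-group priors p_V, one row per movie
    y0:      (N, Ku) user-group priors p_U, one row per user
    x_msg:   dict (m, n) -> length-Kv message x_{m->n}
    y_msg:   dict (n, m) -> length-Ku message y_{n->m}
    w:       (R, Ku, Kv) array, w[r, u, v] = Pr(rating r | groups u, v)
    ratings: dict (n, m) -> observed rating index r
    """
    new_x, new_y = {}, {}
    # Movie-to-user messages: product over all other users k who rated m.
    for (m, n) in x_msg:
        msg = x0[m].copy()
        for (k, mm), r in ratings.items():
            if mm == m and k != n:
                # sum_u w(r_{km} | u, v) y_{k->m}(u), as a vector over v
                msg *= y_msg[(k, m)] @ w[r]
        new_x[(m, n)] = msg / msg.sum()
    # User-to-movie messages: product over all other movies k rated by n.
    for (n, m) in y_msg:
        msg = y0[n].copy()
        for (nn, k), r in ratings.items():
            if nn == n and k != m:
                # sum_v w(r_{nk} | u, v) x_{k->n}(v), as a vector over u
                msg *= w[r] @ x_msg[(k, n)]
        new_y[(n, m)] = msg / msg.sum()
    return new_x, new_y
```

Scanning the whole rating dictionary for every message is quadratic and only for clarity; a practical implementation would index the observations by row and by column and work in the log domain.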

IMP Algorithm (4)

Step III. Approximate matrix completion. After $i$ iterations of the message-passing update, output the probability of rating $R_{nm}$ given the observed ratings,
$$\hat{p}^{(i+1)}_{R_{nm} \mid R_O}(r) = \sum_{u,v} x^{(i+1)}_{m \to n}(v)\, w(r \mid u, v)\, y^{(i+1)}_{n \to m}(u),$$
and minimize the mean-squared prediction error with
$$\hat{r}^{(i+1)}_{n,m} = \sum_{r \in \mathcal{R}} r\, \hat{p}^{(i+1)}_{R_{nm} \mid R_O}(r).$$
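Continuing the sketch above, the Step III prediction for a single (user, movie) pair might look like this; `np.einsum` contracts over the group labels exactly as the double sum in the formula, and `rating_values` (our assumption about the data representation) maps rating indices back to actual scores.

```python
import numpy as np

def imp_predict(x_mn, y_nm, w, rating_values):
    """Rating posterior and MMSE estimate for one (user n, movie m) pair,
    following Step III (a sketch; inputs match imp_update above).

    x_mn:          length-Kv message x_{m->n} over movie groups
    y_nm:          length-Ku message y_{n->m} over user groups
    w:             (R, Ku, Kv) group-rating probabilities
    rating_values: length-R array of actual rating values (e.g., 1..5)
    """
    # p_hat[r] = sum_{u,v} y(u) * w(r | u, v) * x(v)
    p_hat = np.einsum('u,ruv,v->r', y_nm, w, x_mn)
    p_hat /= p_hat.sum()                        # normalize over the alphabet
    return p_hat, float(rating_values @ p_hat)  # posterior and MMSE estimate
```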

IMP Improves the Cold-Start Problem

[Figure: RMSE versus the average number of observed ratings per user (5 to 30) on Netflix Data Matrix 2, comparing IMP, OptSpace, SET, SVT, and the movie-average baseline; RMSE ranges from about 0.9 to 1.4.]

Dataset description:
- Netflix submatrix of 5,035 movies and 5,017 users, obtained by avoiding movies and users with fewer than 3 ratings
- 16% of the users and 41% of the movies have fewer than 10 ratings

Generation of Synthetic Datasets

[Figure: RMSE versus the average number of observed ratings per user (5 to 30) on Synthetic Dataset 2, comparing IMP, the movie-average baseline, and a lower bound; RMSE ranges from about 0.6 to 1.1.]

Dataset description: the synthetic dataset was generated after learning the Netflix data matrix with 16 movie/user groups, and then randomly subsampled.

Conclusions and Open Problems

Main contributions:
1. Factor graph model: a probabilistic low-rank matrix model
2. IMP algorithm: combines clustering with message-passing

Avenues for future research:
1. Perform the clustering itself via message-passing to get a fully iterative approach
2. Obtain a performance analysis of the IMP algorithm via density evolution

Thank you