Outlier Pursuit: Robust PCA and Collaborative Filtering

Transcription:

Outlier Pursuit: Robust PCA and Collaborative Filtering Huan Xu Dept. of Mechanical Engineering & Dept. of Mathematics National University of Singapore Joint w/ Constantine Caramanis, Yudong Chen, Sujay Sanghavi

Outline
- PCA and outliers: why SVD fails; corrupted features vs. corrupted points
- Our idea + algorithms
- Results: full observation; missing data
- Framework for robustness in high dimensions

Principal Components Analysis. Given points that lie on or near a lower-dimensional subspace, find this subspace. Classical technique: 1. organize the points as a matrix; 2. take the SVD; 3. the top singular vectors span the subspace (the data matrix is approximately low-rank).
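
As a concrete sketch of this classical recipe (my own illustration in numpy, not code from the talk; the function name pca_subspace is made up):

import numpy as np

def pca_subspace(X, r):
    # X: d x n data matrix, one point per column.
    # Returns a d x r orthonormal basis spanning the best-fit r-dimensional subspace.
    Xc = X - X.mean(axis=1, keepdims=True)        # center the points
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :r]                               # top r left singular vectors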

Fragility. Gross errors in even one or a few points can completely throw off PCA. Reason: classical PCA minimizes the sum of squared errors, which is highly susceptible to gross outliers.
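
In symbols, the classical objective the slide refers to (standard background, not transcribed from the slides) is

    \min_{\mathrm{rank}(L) \le r} \| M - L \|_F^2 ,

whose solution is the truncated SVD (Eckart-Young). Because the error is squared, a single grossly corrupted column can dominate the objective and rotate the recovered subspace arbitrarily.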

Two types of gross errors:
- Corrupted features: individual entries corrupted
- Outliers: entire columns corrupted
...and missing-data versions of both.

PCA with Outliers. Objective: find the identities of the outliers (and hence the column space of the true matrix).
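
Concretely, the observation model (my notation, following the published Outlier Pursuit setup; treat the details as a reconstruction) is

    M = L_0 + C_0 ,

where L_0 is low-rank with column space U_0, C_0 is nonzero only on a fraction \gamma of the columns (the outliers), and the remaining "authentic" columns of M coincide with the corresponding columns of L_0. The goal is to recover U_0 and the outlier column support, not the outlier values themselves.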

Outlier Pursuit - Idea [figure: data points with outliers; the subspace found by standard PCA]

Outlier Pursuit - Method. We propose: a convex surrogate for the rank penalty plus a convex surrogate for column-sparsity.
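
The surrogates themselves did not survive transcription; the convex program from the Outlier Pursuit paper is, in standard notation,

    \min_{L, C} \; \|L\|_* + \lambda \|C\|_{1,2} \quad \text{s.t.} \quad M = L + C ,

where \|L\|_* (the nuclear norm, i.e., the sum of singular values) is the convex surrogate for rank and \|C\|_{1,2} = \sum_j \|C_{:,j}\|_2 (the sum of column l2 norms) is the convex surrogate for column-sparsity.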

When does it (not) work? It can fail when certain directions of the column space of the true matrix are poorly represented across the columns: such a direction is carried by only a few columns, so the corresponding singular vector has a large inner product with a few coordinate axes and the incoherence parameter is large.

Results. Assumption: the columns of the true matrix are incoherent. First consider the noiseless case.

Results. Assumption: the columns of the true matrix are incoherent. Theorem (noiseless case): our convex program can identify up to a certain fraction of outliers, as long as the stated condition holds. Outer bound: beyond that fraction, the problem becomes unidentifiable.
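
The incoherence condition and the bound are images in the original slides; one common way to write them (my reconstruction, so treat the exact normalization as an assumption) is: with the authentic part of the true matrix written as U \Sigma V^\top (rank r, with (1-\gamma)n authentic columns), incoherence requires

    \max_i \| V^\top e_i \|_2^2 \le \frac{\mu r}{(1-\gamma) n} ,

i.e., no single column carries much more than its fair share of the column space, and the guarantee tolerates an outlier fraction \gamma on the order of 1/(\mu r).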

Proof Technique. A point is the optimum of a convex function iff zero lies in the (sub)gradient of the function at that point. Steps: 1. guess a nice point (via an oracle problem); 2. show it is the optimum by showing that zero is in the subgradient there.

Proof Technique: guessing a nice optimum. (Note: because a column-sparse matrix is also low-rank, the nice points are non-unique; in problems like matrix completion, compressed sensing, or RPCA with sparse corruption, this is not an issue.) Oracle problem: its solution is, by definition, a nice point. Rest of proof: showing it is the optimum of the original program, under our assumption.
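
The oracle problem is also a formula in the slides; a plausible reconstruction (my notation, and the precise constraints are an assumption on my part) restricts the decomposition to the true column space and the true outlier support:

    \min_{L, C} \; \|L\|_* + \lambda \|C\|_{1,2} \quad \text{s.t.} \quad M = L + C, \;\; \mathcal{P}_{U_0}(L) = L, \;\; \mathcal{P}_{\mathcal{I}_0}(C) = C ,

where U_0 is the true column space and \mathcal{I}_0 the true set of outlier columns. Any optimum of this restricted program is "nice" by construction.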

Proof Technique: constructing the dual certificate. We need a Q satisfying a set of equality and inequality conditions. A first guess works when the corrupted columns are mutually orthogonal, but fails otherwise. To satisfy the equalities, use a corrected Q; then show the inequalities hold under the conditions given in the theorem.
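
What the certificate must certify follows from the standard subdifferentials of the two norms (standard convex-analysis facts, not transcribed from the slides). With a candidate pair \hat{L} = U \Sigma V^\top and \hat{C}, optimality for the equality-constrained program is equivalent to the existence of a single Q with

    Q \in \partial \|\hat{L}\|_* = \{ U V^\top + W : U^\top W = 0, \; W V = 0, \; \|W\|_2 \le 1 \} ,
    Q \in \lambda \, \partial \|\hat{C}\|_{1,2} : \quad Q_{:,j} = \lambda \, \hat{C}_{:,j} / \|\hat{C}_{:,j}\|_2 \text{ on outlier columns}, \quad \|Q_{:,j}\|_2 \le \lambda \text{ elsewhere} ,

so the equalities pin Q down on the "support" parts, and the inequalities (\|W\|_2 \le 1 and \|Q_{:,j}\|_2 \le \lambda) are what must be verified under the incoherence assumption.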

(More detailed) Proof Technique. One can verify that with the initial choice of Q, conditions (1) and (2) are satisfied. To satisfy (3), we need a correction term, which can be constructed as a least-squares solution; the required inverse operator is expressed as an infinite series, provided the series converges. The corrected Q satisfies all three equalities; then show the inequalities hold.

Noisy case. We observe the sum plus unknown noise. Relax the equality constraint to an inequality, i.e., solve the program with a tolerance at the noise level. Under essentially the same condition as the noiseless case, noisy outlier pursuit succeeds: it finds a solution close to a pair with the correct column space and column support.
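
The noisy program is again a formula in the slides; a standard way to write it (my reconstruction, with the noise level \varepsilon as an assumed parameter) is: observe M = L_0 + C_0 + N with \|N\|_F \le \varepsilon and solve

    \min_{L, C} \; \|L\|_* + \lambda \|C\|_{1,2} \quad \text{s.t.} \quad \|M - L - C\|_F \le \varepsilon .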

Implementation Issue. Outlier Pursuit can be cast as an SDP, but interior-point methods may not scale. Since we are interested in the structure (column space and outlier identities), we can sacrifice accuracy for scalability: use proximal gradient algorithms.
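
A minimal sketch of such a proximal gradient scheme (my own illustration, not the authors' code; it solves a penalized surrogate of the constrained program, and the names mu, lam, outlier_pursuit_pg are mine):

import numpy as np

def svt(X, tau):
    # Prox of tau * nuclear norm: singular value thresholding.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def col_shrink(X, tau):
    # Prox of tau * sum of column l2 norms: column-wise shrinkage.
    norms = np.linalg.norm(X, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale

def outlier_pursuit_pg(M, lam, mu=1e-3, n_iter=500):
    # Proximal gradient for  min ||L||_* + lam*||C||_{1,2} + (1/(2*mu))*||M - L - C||_F^2 .
    L = np.zeros_like(M, dtype=float)
    C = np.zeros_like(M, dtype=float)
    step = mu / 2.0                       # 1 / Lipschitz constant of the smooth term
    for _ in range(n_iter):
        R = (L + C - M) / mu              # gradient of the smooth term (same for L and C)
        L = svt(L - step * R, step)
        C = col_shrink(C - step * R, step * lam)
    return L, C

A call like L, C = outlier_pursuit_pg(M, lam=0.5) returns the low-rank estimate and a column-sparse matrix whose nonzero columns flag the outliers; an accelerated (FISTA-style) variant and continuation in mu are the usual practical refinements.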

Performance: the L + C formulation vs. the L + S formulation (the latter from [Chandrasekaran et al.], [Candes et al.]).

More experiments: 220 samples of digit 1 and 11 samples of digit 7, unlabeled.

Another view. The mean is the solution of a least-squares problem, and standard PCA of M is likewise a least-squares fit; both are fragile and can be easily skewed by one or a few points. The median is the solution of a least-absolute-deviations problem, and our method is (a convex relaxation of) the matrix analogue; both are robust, since skewing them requires errors in a constant fraction of the points.
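
The missing formulas can be filled in from the analogy (the point estimates are standard identities; the matrix version is my paraphrase of the slide's intent):

    \text{mean} = \arg\min_{\mu} \sum_i \|x_i - \mu\|_2^2 , \qquad \text{median} = \arg\min_{\mu} \sum_i \|x_i - \mu\|_2 ,

and, analogously, classical PCA minimizes the squared Frobenius error \|M - L\|_F^2 over low-rank L, while Outlier Pursuit is a convex relaxation of minimizing the un-squared column error \sum_j \|M_{:,j} - L_{:,j}\|_2 over low-rank L.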

Collaborative Filtering w/ Adversaries

In an incident that highlights the pitfalls of online recommendation systems, Amazon.com on Friday removed a link to a sex manual that appeared next to a listing for a spiritual guide by well-known Christian televangelist Pat Robertson. http://news.cnet.com/2100-1023-976435.html

Manipulation in RS. Recommendation systems use ALL users' ratings to predict their future preferences. Malicious users try to skew predictions by injecting fake ratings. Some popular algorithms (e.g., k-NN) are vulnerable to manipulation.

RS via Matrix Completion (conventional wisdom). Preference matrix, partial observations: given the observed entries, recover L_0?
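
The program behind this conventional wisdom is the standard nuclear-norm matrix completion formulation (well-known background, not transcribed from the slide):

    \min_{L} \; \|L\|_* \quad \text{s.t.} \quad \mathcal{P}_{\Omega}(L) = \mathcal{P}_{\Omega}(M) ,

where \Omega is the set of observed entries and \mathcal{P}_{\Omega} keeps the observed entries and zeroes out the rest.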

RS via Matrix Completion. Assumption: user preferences form a low-rank matrix (Users x Products, factored through a small set of features). User 3's overall preference for Product 2 is the inner product of User 3's feature vector with Product 2's feature vector. E.g., Product = laptop with features performance and portability; the user features are the preference for performance and the preference for portability.
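
In symbols (my notation for the picture on the slide; which index runs over users vs. products is an assumption, though later slides treat users as columns):

    L_0 = B A^\top , \qquad (L_0)_{\text{product } p,\, \text{user } u} = b_p^\top a_u ,

where a_u is user u's feature vector (e.g., preference for performance, preference for portability) and b_p is product p's feature vector; with r such features, L_0 has rank at most r.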

Robust Matrix Completion (new idea). M = L_0 (low-rank; authentic columns) + C_0 (column-sparse; arbitrarily corrupted columns), observed only partially. Problem: given the partial observations, find L_0 and identify the columns of C_0.

Collaborative Filtering w/ Adversaries. The users form a low-rank matrix that is partially observed and has some corrupted columns == outliers with missing data! Our setting: good users == random sampling of an incoherent matrix (as in matrix completion); manipulators == completely arbitrary sampling and values.

Outlier Pursuit with Missing Data: enforce the decomposition only on the observed entries. Now we also need the row space to be incoherent, since we are doing both matrix completion and manipulator identification.
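
The corresponding program is again an image in the slides; a natural reconstruction (my notation; treat the exact constraint as an assumption) enforces the decomposition only on the observed entries:

    \min_{L, C} \; \|L\|_* + \lambda \|C\|_{1,2} \quad \text{s.t.} \quad \mathcal{P}_{\Omega}(L + C) = \mathcal{P}_{\Omega}(M) .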

Our Result. Theorem: the optimum of the convex program is such that the low-rank part has the correct column space and the column support of the other part is exactly the set of manipulators, w.h.p., provided the sampling density is sufficiently large relative to the fraction of users that are manipulators. Note: no assumptions on the manipulators.

Corollaries. The algorithm succeeds if (a) there is a constant fraction of corrupted columns and the fraction of observed entries is also a constant; or (b) there is a growing number of corrupted columns and the fraction of observed entries goes to zero.

Roadmap of the Proof. We need a Q satisfying a list of conditions. Use the least-squares solution to satisfy (1), (2), and (5)? Problem: the least-squares solution involves an infinite series, which breaks independence, while to get sharp inequalities we need to exploit the i.i.d. nature of the sampling.

Roadmap of the Proof - 2. Rethinking the optimality condition: the optimality certificate ensures the objective does not decrease under any feasible perturbation. The equality condition comes from the need to show this inequality for every feasible perturbation; however, that is not necessary. One can show that we can relax the equality condition to an inequality while tightening the other inequalities. This makes it possible to avoid the infinite series.

Robust Collaborative Filtering. Algo: partially observed low-rank + column-sparse. Algo: partially observed sparse + low-rank.

More generally. Several methods in high-dimensional statistics take the form: loss function + regularizer. Our approach: (the same) loss function + a weighted sum of regularizers. This yields robustness plus flexibility in several settings. Today: PCA with outliers + missing data.
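
Schematically (my notation for the template on the slide):

    \min_{\theta} \; \mathcal{L}(\theta) + \lambda R(\theta) \quad \longrightarrow \quad \min_{\theta_1, \theta_2} \; \mathcal{L}(\theta_1 + \theta_2) + \lambda_1 R_1(\theta_1) + \lambda_2 R_2(\theta_2) ,

which for today's problem specializes to the nuclear norm plus the sum of column norms, with missing data handled by restricting the loss to the observed entries.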

Papers can be found on my website: http://guppy.mpe.nus.edu.sg/~mpexuh/