Variational Methods for Discrete-Data Latent Gaussian Models

Size: px

Start display at page:

Download "Variational Methods for Discrete-Data Latent Gaussian Models"

Opal Davidson
5 years ago
Views:

1 Variational Methods for Discrete-Data Latent Gaussian Models University of British Columbia Vancouver, Canada March 6, 2012

2 The Big Picture Joint density models for data with mixed data types Bayesian models principled and robust approach Algorithms that are not only accurate and fast, but are also easy to tune, implement, and intuitive (speed- accuracy tradeoffs) Variational Methods for Discrete-Data Latent Gaussian Models Slide 2 of 46

3 Sources of Discrete Data User rating data Survey/voting data and blogs for sentiment analysis Health data Consumer choice data Sports/game data tag correlation. Slide 3 of 46

4 Motivation: Recommendation system Movie rating dataset Missing values Different types of data User1 User2 User3 User4 User5 User6. Movie Movie Movie Movie Movie Movie Slide 4 of 46

Missing Ratings From Wikipedia on Netflix-prize dataset The training set is such that the average user rated over 200 movies, and the average movie was rated by over 5000

5 Missing Ratings From Wikipedia on Netflix-prize dataset The training set is such that the average user rated over 200 movies, and the average movie was rated by over 5000 users. But there is wide variance in the data some movies in the training set have as few as 3 ratings, while one user rated over 17,000 movies. Movielens Dataset Slide 6 of 46

6 Sources of Discrete Data User rating data Survey/voting data and blogs for sentiment analysis Health data Consumer choice data Sports/game data tag correlation. Slide 8 of 46

7 What we need! For these datasets, we need a method of analysis which Handles missing values efficiently Makes efficient use of the data by weighting reliable data vectors more than the unreliable ones Makes efficient use of the data by fusing different types of data efficiently (binary, ordinal, categorical, count, text) Slide 9 of 46

8 Factor Model movies D Y D movies W L factors N users Z N users L factors ExpFamily: Gaussian: Slide 10 of 46 Collins et. al. 2002, Khan et. al.2010, Yu et.al. 2009

9 Bayesian Learning µ, Σ x z n y n n=1:n W This talk: Lower bound maximization Slide 11 of 46

10 Variational Methods Design tractable bounds to reduce approximation error Efficient optimization since lower bounds are concave : good convergence rates and easy convergence diagnostics Efficient expectation-maximization (EM) algorithms for parameter leaning Comparable performance to MCMC, but much faster Algorithms with a wide range of speed-accuracy trade-offs Slide 12 of 46

11 Outline Latent Gaussian models Bounds for binary data Bounds for categorical data Results Future work and conclusions Slide 13 of 46

12 Outline Latent Gaussian models Definition and examples Problem with parameter learning Bounds for binary data Bounds for categorical data Results Future work and conclusions Slide 14 of 46

13 Latent Gaussian Model (LGM) y 11 y 12 y 13 η 11 η 12 η 13 z 11 z 12 z 13 y 21 y 23 y 32 y 32 η 21 η 23 w η 32 η 32 N L z 21 z 22 z 23 D y D1 y D2 y D3 N η D1 η D2 η D3 D L µ, Σ z n W y n n=1:n Slide 15 of 46

14 Likelihood Examples Data type Real Count Binary Distribution Gaussian Poisson Bernoulli-Logit 0.9 Categorical Ordinal Multinomial-Logit Proportional-odds 2 Slide 16 of 46

15 Parameter Estimation µ, Σ z n W y n Bernoulli-logit n=1:n x Slide 19 of 46

16 Jensen s Lower Bound Slide 20 of 46

17 Variational Lower Bound Generalized EM algorithm E-step involves minimizing convex function Early stopping in E-step (Almost) no tuning parameters Easy convergence diagnostics Slide 21 of 46

18 Outline Latent Gaussian models Bounds for binary data Bernoulli-logistic likelihood The Bohning bound (Khan, Marlin, Bouchard, Murphy, NIPS 2010) Piecewise bounds (Marlin, Khan, Murphy, ICML 2011) Bounds for categorical data Results Future work and conclusions Slide 22 of 46

19 Bernoulli-Logit Likelihood some other tractable terms in m and V Slide 23 of 46

20 Local Variational Bounds + some other tractable terms in m and V x Bohning s bound (Khan, Marlin, Bouchard, Murphy 2010) Jaakola s bound (Jaakkola and Jordan1996) Piecewise quadratic bounds (Marlin, Khan, Murphy 2011) Slide 24 of 46

21 Bohning Bound is Faster Slide 25 of 46 O(L 3 ND) For n = 1:N V n = (W T A n W + I) -1 m n =.. end O(L 2 ND) V = (W T AW + I) -1 For n = 1:N m n =.. end

22 Piecewise bounds are more accurate Bohning Jaakkola Piecewise Q 1 (x) Q 3 (x) Q 2 (x) Slide 27 of 46

23 Details of Piecewise bounds Find cut points and parameters of each piece by minimizing maximum error Linear pieces (Hsiung, Kim and Boyd, 2008) Quadratic Pieces (Nelder- Mead method) Fixed Piecewise Bounds! Increase accuracy by increasing the number of pieces Slide 28 of 46

24 Outline Latent Gaussian models Bounds for binary data Bounds for categorical data Multinomial-logistic likelihood and local variational bounds Stick-breaking likelihood (Khan, Mohamed, Marlin, Murphy, AI-Stats 2012) Results Future work and conclusions Slide 29 of 46

25 Multinomial-Logit Likelihood Slide 30 of 46

26 Local Variational bounds The Bohning bound Fast and closed form updates The log bound (Blei and Lafferty 2006) More accurate than the Bohning bound, but slower The product of sigmoid bound (Bouchard 2007) Slide 31 of 46

27 Stick-Breaking Likelihood 0 1 Slide 32 of 46

28 Outline Latent Gaussian models Bounds for binary data Bounds for categorical data Results Future work and conclusions Slide 33 of 46

29 Speed Accuracy Trade-offs Binary FA : UCI voting dataset (D=15, N=435) Bohning Jaakkola Piecewise Linear with 3 pieces Piecewise Quad with 3 pieces Piecewise Quad with 10 pieces Slide 34 of 46

30 Comparison with EP Binary Gaussian Process : Ionosphere dataset (D=200) Σ ij =σ exp[- x i -x j 2 /s] µ, Σ z n W y n n=1:n Slide 35 of 46

31 EP vs PW : Posterior Distribution (Neg) KL-Lower Bound to MargLik Approximation To MargLik Pred Error EP Piece ewise Bound Slide 38 of 46

32 Comparison with EP Both methods give very similar results for GPs Our approach can be easily extended to factor models Variational EM objective function is well-defined and can be obtained by solving minimization of convex functions Numerically stable Slide 39 of 46

33 MultiClass Gaussian Process MCMC Glass dataset (D=143, K=6) Bohning Log VB-probit Stick-PW Prediction Error Ne egloglik Slide 40 of 46

34 Categorical Factor Analysis Glass dataset (D = 10, N = 958, sum of K = 29) Logit-log Stick-PW Slide 41 of 46

35 Outline Latent Gaussian models Bounds for binary data Bounds for categorical data Results Future work and conclusions Slide 42 of 46

36 Future Work Large-scale collaborative filtering Use convexity to design approximate gradient methods Sparse Gaussian Posterior Distribution Tuning HMC using Bayesian optimization methods Latent Sparse-factor model Conditional models (e.g. to model for tag-image correlation) Slide 43 of 46

37 Conclusions Variational methods show comparable performance with existing approaches The main sources of errors is the bounding error Design of piecewise bounds to control these errors A good control over speed-accuracy trade-offs can be obtained Variational lower bounds can be optimized efficiently Use of convex optimization methods to get fast convergence rates and easy convergence diagnostics Design of efficient expectation-maximization (EM) algorithms Slide 44 of 46

38 Collaborators Kevin Murphy UBC Benjamin Marlin U. Mass-Amherst Guillaume Bouchard XRCE, France Shakir Mohamed U. Cambridge, now at UBC Slide 45 of 46

39 Thank You Slide 46 of 46

08 An Introduction to Dense Continuous Robotic Mapping

NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy