Score function estimator and variance reduction techniques
Wilker Aziz, University of Amsterdam. May 24, 2018.
Outline
Variational inference for belief networks

Generative model with a NN likelihood. (Graphical model: z generates x; plate over N observations; parameters \lambda, \theta.) Let z \in \{0,1\}^d and

q_\lambda(z|x) = \prod_{i=1}^{d} q_\lambda(z_i|x) = \prod_{i=1}^{d} \mathrm{Bern}(z_i \mid \mathrm{sigmoid}(f_\lambda(x))_i)    (1)

Jointly optimise the generative model p_\theta(x|z) and the inference model q_\lambda(z|x) under the same objective (the ELBO).

Mnih and Gregor (2014)
Objective

\log p_\theta(x) \geq \underbrace{\mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x, z)] + \mathbb{H}(q_\lambda(z|x))}_{\text{ELBO}} = \mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}(q_\lambda(z|x) \parallel p(z))

Parameter estimation:

\arg\max_{\theta,\lambda}\; \mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}(q_\lambda(z|x) \parallel p(z))
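The two forms of the ELBO above are algebraically identical; a quick numerical check, by exact enumeration over z \in \{0,1\} for a single Bernoulli latent (all three distributions here are illustrative stand-ins for q_\lambda(z|x), p(z) and p_\theta(x|z)):

```python
import numpy as np

q = np.array([0.7, 0.3])          # q_lambda(z|x) for z = 0, 1
prior = np.array([0.5, 0.5])      # p(z)
log_lik = np.array([-2.0, -0.5])  # log p_theta(x|z) for z = 0, 1

log_joint = log_lik + np.log(prior)   # log p_theta(x, z)
entropy = -np.sum(q * np.log(q))      # H(q)
kl = np.sum(q * np.log(q / prior))    # KL(q || p)

form1 = np.sum(q * log_joint) + entropy   # E_q[log p(x, z)] + H(q)
form2 = np.sum(q * log_lik) - kl          # E_q[log p(x|z)] - KL(q || p)
print(form1, form2)  # identical up to floating point
```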
Generative Network Gradient

\nabla_\theta \Big( \mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z)] - \underbrace{\mathrm{KL}(q_\lambda(z|x) \parallel p(z))}_{\text{constant wrt } \theta} \Big) = \underbrace{\mathbb{E}_{q_\lambda(z|x)}[\nabla_\theta \log p_\theta(x|z)]}_{\text{expected gradient :)}}

\overset{\text{MC}}{\approx} \frac{1}{K} \sum_{k=1}^{K} \nabla_\theta \log p_\theta(x|z^{(k)}) \quad \text{where } z^{(k)} \sim q_\lambda(Z|x)

Note: q_\lambda(z|x) does not depend on \theta.
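A minimal sketch of this MC estimator on a toy model (the model p_\theta(x|z) = \mathrm{Bern}(x \mid \mathrm{sigmoid}(\theta z)) with scalar \theta and binary z, and all names here, are illustrative assumptions, not the lecture's model):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_log_p(theta, x, z):
    # Gradient of log Bern(x | sigmoid(theta * z)) wrt theta.
    return (x - sigmoid(theta * z)) * z

def mc_generative_grad(theta, x, q_prob, K=100_000):
    """(1/K) sum_k grad_theta log p_theta(x|z^(k)), z^(k) ~ Bern(q_prob)."""
    z = (rng.random(K) < q_prob).astype(float)  # samples from q_lambda(z|x)
    return grad_log_p(theta, x, z).mean()

theta, x, q_prob = 0.5, 1.0, 0.3
# Exact expectation by enumeration: the z = 0 term contributes 0.
exact = q_prob * grad_log_p(theta, x, 1.0)
approx = mc_generative_grad(theta, x, q_prob)
print(exact, approx)  # the MC estimate should be close to the exact value
```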
Inference Network Gradient

\nabla_\lambda \Big( \mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}(q_\lambda(z|x) \parallel p(z)) \Big) = \nabla_\lambda \mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z)] - \underbrace{\nabla_\lambda \mathrm{KL}(q_\lambda(z|x) \parallel p(z))}_{\text{analytical computation}}

The first term again requires approximation by sampling, but there is a problem.
Inference Network Gradient

\nabla_\lambda \mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z)] = \nabla_\lambda \int q_\lambda(z|x) \log p_\theta(x|z) \, \mathrm{d}z = \int \underbrace{\nabla_\lambda (q_\lambda(z|x)) \log p_\theta(x|z)}_{\text{not an expectation}} \, \mathrm{d}z

- The MC estimator is non-differentiable: we cannot sample first.
- Differentiating the expression does not yield an expectation: we cannot approximate it via MC.
- The original VAE employed a reparameterisation, but...
Bernoulli pmf

\mathrm{Bern}(z_j \mid b_j) = b_j^{z_j} (1 - b_j)^{1 - z_j} = \begin{cases} b_j & \text{if } z_j = 1 \\ 1 - b_j & \text{if } z_j = 0 \end{cases}

Can we reparameterise a Bernoulli variable?
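A quick check that the exponent form of the pmf reduces to the two cases above (b = 0.3 is an arbitrary illustrative parameter):

```python
# Bernoulli pmf in exponent form: b^z * (1-b)^(1-z).
b = 0.3
pmf = lambda z: b**z * (1 - b)**(1 - z)
print(pmf(1), pmf(0))  # 0.3 0.7
```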
Reparameterisation requires a Jacobian matrix

Not really :(

q(z; \lambda) = \phi(\epsilon = h(z, \lambda)) \underbrace{\left| \det J_{h(z,\lambda)} \right|}_{\text{change of density}}    (2)

The elements of the Jacobian matrix, J_{h(z,\lambda)}[i,j] = \frac{\partial h_i(z,\lambda)}{\partial z_j}, are not defined for non-differentiable functions.
Outline
We can again use the log identity for derivatives

\nabla_\lambda \mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z)] = \nabla_\lambda \int q_\lambda(z|x) \log p_\theta(x|z) \, \mathrm{d}z
= \int \underbrace{\nabla_\lambda (q_\lambda(z|x)) \log p_\theta(x|z)}_{\text{not an expectation}} \, \mathrm{d}z
= \int q_\lambda(z|x) \nabla_\lambda (\log q_\lambda(z|x)) \log p_\theta(x|z) \, \mathrm{d}z
= \underbrace{\mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z) \nabla_\lambda \log q_\lambda(z|x)]}_{\text{expected gradient :)}}
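The identity can be sanity-checked numerically. Below, a toy f stands in for \log p_\theta(x|z), and lam plays the role of a scalar variational parameter for a single Bernoulli variable, so the exact gradient of \mathbb{E}[f(Z)] = \lambda f(1) + (1-\lambda) f(0) is f(1) - f(0); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary "reward" values for z = 1 and z = 0.
f = lambda z: np.where(z == 1.0, -1.0, -3.0)

lam = 0.4
exact = float(f(1.0) - f(0.0))  # exact gradient wrt lam: 2.0

# Score-function MC estimate: (1/K) sum_k f(z) d/dlam log Bern(z|lam).
K = 200_000
z = (rng.random(K) < lam).astype(float)
score = z / lam - (1.0 - z) / (1.0 - lam)  # grad of log Bern(z|lam)
estimate = np.mean(f(z) * score)
print(exact, estimate)  # the two should agree closely
```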
Score function estimator: high variance

We can now build an MC estimator:

\nabla_\lambda \mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z)] = \mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z) \nabla_\lambda \log q_\lambda(z|x)] \overset{\text{MC}}{\approx} \frac{1}{K} \sum_{k=1}^{K} \log p_\theta(x|z^{(k)}) \nabla_\lambda \log q_\lambda(z^{(k)}|x) \quad \text{where } z^{(k)} \sim q_\lambda(Z|x)

It is fully differentiable! But:
- the magnitude of \log p_\theta(x|z) varies widely;
- the model likelihood does not contribute to the direction of the gradient;
- there is too much variance for the estimator to be useful.
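The magnitude problem can be seen directly: a constant shift in f (think: the overall scale of \log p_\theta(x|z)) leaves the true gradient unchanged but inflates the estimator's spread. The toy f and all names below are illustrative, continuing the single-Bernoulli example:

```python
import numpy as np

rng = np.random.default_rng(1)

lam, K, R = 0.4, 1_000, 200  # K samples per estimate, R repeated estimates

def score_fn_estimate(shift):
    z = (rng.random(K) < lam).astype(float)
    f = np.where(z == 1.0, -1.0, -3.0) + shift   # shifted "log-likelihood"
    score = z / lam - (1.0 - z) / (1.0 - lam)    # grad of log Bern(z|lam)
    return np.mean(f * score)

std_small = np.std([score_fn_estimate(0.0) for _ in range(R)])
std_large = np.std([score_fn_estimate(-100.0) for _ in range(R)])
print(std_small, std_large)  # the shifted version is far noisier
```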
When variance is high we can:
- sample more (won't scale)
- use variance reduction techniques, e.g. baselines and control variates (excellent idea, and now it's time for it!)
Example Model

Let us consider a latent factor model for topic modelling:
- a document x = (x_1, \ldots, x_n) consists of n i.i.d. categorical draws from the model;
- the categorical distribution in turn depends on binary latent factors z = (z_1, \ldots, z_k), which are also i.i.d.

z_j \sim \mathrm{Bernoulli}(\phi) \quad (1 \leq j \leq k)
x_i \sim \mathrm{Categorical}(g_\theta(z)) \quad (1 \leq i \leq n)

Here \phi specifies a Bernoulli prior and g_\theta(\cdot) is a function computed by a neural network with a softmax output.
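Ancestral sampling from this model is straightforward: draw the factors, then draw words. In the sketch below, g_\theta is a single softmax layer; the sizes, \phi and the weight matrix W are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(t):
    e = np.exp(t - t.max())
    return e / e.sum()

k, vocab, n = 3, 10, 20
phi = 0.5
W = rng.normal(size=(vocab, k))  # parameters theta of g_theta

z = (rng.random(k) < phi).astype(float)  # z_j ~ Bernoulli(phi), i.i.d.
probs = softmax(W @ z)                   # g_theta(z): categorical over vocab
x = rng.choice(vocab, size=n, p=probs)   # n i.i.d. word draws given z
print(z, x)
```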
Example Model

(Graphical model: latent factors z_1, z_2, z_3 jointly generate the words x_1, x_2, x_3, x_4.)

At inference time the latent variables are marginally dependent. For our variational distribution we are going to assume that they are not (recall: the mean field assumption).
Inference Network

(Graphical model: \lambda and x feed independent factors z_1, z_2, z_3.)

The inference network needs to predict k Bernoulli parameters \psi; any neural network with a sigmoid output will do that job:

q_\lambda(z|x) = \prod_{j=1}^{k} \mathrm{Bern}(z_j \mid \psi_j) \quad \text{where } \psi_1^k = f_\lambda(x)    (3)
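A minimal sketch of such an inference network: any map from x to k numbers, squashed through a sigmoid, yields valid Bernoulli parameters \psi. The single linear layer, the bag-of-words input and all sizes are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

vocab, k = 10, 3
W = rng.normal(scale=0.1, size=(k, vocab))  # parameters lambda

def infer_psi(x_counts):
    """psi = f_lambda(x): k Bernoulli parameters for q_lambda(z|x)."""
    return sigmoid(W @ x_counts)

def log_q(z, psi):
    """log prod_j Bern(z_j | psi_j) under the mean-field posterior."""
    return np.sum(z * np.log(psi) + (1 - z) * np.log(1 - psi))

x_counts = rng.integers(0, 5, size=vocab).astype(float)  # bag-of-words doc
psi = infer_psi(x_counts)
z = (rng.random(k) < psi).astype(float)                  # sample z ~ q
print(psi, log_q(z, psi))
```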
Computation Graph

(Computation graph: the inference model maps x, with parameters \lambda, to \psi; z \sim \mathrm{Bernoulli}(\psi); the generative model, with parameters \theta, consumes z and x; the graph exposes \log q_\lambda(z \mid \psi), \log p_\theta(x \mid z) and the KL term.)
Reparametrisation Gradient

(For contrast, the Gaussian computation graph: the inference model maps x to \mu and \sigma; z is built deterministically from \mu, \sigma and noise \epsilon \sim \mathcal{N}(0, 1), so gradients flow back to \lambda through z; the generative model and the KL term consume z.)
Pros and Cons

Pros:
- applicable to all distributions
- many libraries come with samplers for common distributions

Cons:
- high variance!
Outline
Control variates

Suppose we want to estimate \mathbb{E}[f(Z)] and we know the expected value of another function \psi(z) on the same support. Then it holds that

\mathbb{E}[f(Z)] = \mathbb{E}[f(Z) - \psi(Z)] + \mathbb{E}[\psi(Z)]    (4)

If \psi(z) = f(z) and we estimate the expected value of f(z) - \psi(z), then we have reduced the variance to 0. In general,

\mathrm{Var}(f - \psi) = \mathrm{Var}(f) - 2\,\mathrm{Cov}(f, \psi) + \mathrm{Var}(\psi)    (5)

If f and \psi are strongly correlated, so that 2\,\mathrm{Cov}(f, \psi) exceeds \mathrm{Var}(\psi), then we improve on the original estimation problem.

Greensmith et al. (2004)
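Identity (4) in action on a small example: we estimate \mathbb{E}[f(Z)] for Z \sim \mathcal{N}(0,1), using \psi(z) = 0.3z (whose mean is known to be 0) as a control variate correlated with f. Both functions and the Gaussian are illustrative, not part of the lecture's model:

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda z: np.exp(0.3 * z)  # target; correlated with z
psi = lambda z: 0.3 * z        # control variate with known E[psi] = 0

K, R = 1_000, 500
naive, controlled = [], []
for _ in range(R):
    z = rng.normal(size=K)
    naive.append(np.mean(f(z)))
    # E[f] = E[f - psi] + E[psi], and E[psi] = 0 is known exactly.
    controlled.append(np.mean(f(z) - psi(z)) + 0.0)

print(np.std(naive), np.std(controlled))  # controlled should be smaller
```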
Reducing variance of the score function estimator

Back to the score function estimator:

\mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z) \nabla_\lambda \log q_\lambda(z|x)] = \mathbb{E}_{q_\lambda(z|x)}\big[\underbrace{\log p_\theta(x|z) \nabla_\lambda \log q_\lambda(z|x)}_{f(z)} - \underbrace{C(x) \nabla_\lambda \log q_\lambda(z|x)}_{\psi(z)}\big] + \mathbb{E}_{q_\lambda(z|x)}\big[\underbrace{C(x) \nabla_\lambda \log q_\lambda(z|x)}_{\psi(z)}\big]

The last term is very simple!
E[\psi(Z)]

\mathbb{E}_{q_\lambda(z|x)}[C(x) \nabla_\lambda \log q_\lambda(z|x)]
= C(x) \, \mathbb{E}_{q_\lambda(z|x)}[\nabla_\lambda \log q_\lambda(z|x)]
= C(x) \, \mathbb{E}_{q_\lambda(z|x)}\left[\frac{\nabla_\lambda q_\lambda(z|x)}{q_\lambda(z|x)}\right]
= C(x) \int q_\lambda(z|x) \frac{\nabla_\lambda q_\lambda(z|x)}{q_\lambda(z|x)} \, \mathrm{d}z
= C(x) \int \nabla_\lambda q_\lambda(z|x) \, \mathrm{d}z
= C(x) \, \nabla_\lambda \int q_\lambda(z|x) \, \mathrm{d}z
= C(x) \, \nabla_\lambda 1
= 0
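The key fact used above, that the expected score \mathbb{E}_q[\nabla_\lambda \log q_\lambda(z|x)] is zero, can be verified numerically; here for a single Bernoulli with parameter lam, an illustrative stand-in for q_\lambda:

```python
import numpy as np

rng = np.random.default_rng(0)

lam, K = 0.3, 1_000_000
z = (rng.random(K) < lam).astype(float)
score = z / lam - (1.0 - z) / (1.0 - lam)  # d/dlam log Bern(z|lam)
print(score.mean())  # should be close to 0
```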
Improved estimator

Back to the score function estimator:

\mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z) \nabla_\lambda \log q_\lambda(z|x)]
= \mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z) \nabla_\lambda \log q_\lambda(z|x) - C(x) \nabla_\lambda \log q_\lambda(z|x)] + \mathbb{E}_{q_\lambda(z|x)}[C(x) \nabla_\lambda \log q_\lambda(z|x)]
= \mathbb{E}_{q_\lambda(z|x)}[(\log p_\theta(x|z) - C(x)) \nabla_\lambda \log q_\lambda(z|x)]

C(x) is called a baseline.
Baselines

Baselines can be constant

\mathbb{E}_{q_\lambda(z|x)}[(\log p_\theta(x|z) - C) \nabla_\lambda \log q_\lambda(z|x)]    (6)

or input-dependent

\mathbb{E}_{q_\lambda(z|x)}[(\log p_\theta(x|z) - C(x)) \nabla_\lambda \log q_\lambda(z|x)]    (7)

or both

\mathbb{E}_{q_\lambda(z|x)}[(\log p_\theta(x|z) - C - C(x)) \nabla_\lambda \log q_\lambda(z|x)]    (8)

Williams (1992)
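The effect of a constant baseline on the toy single-Bernoulli example: the mean of the estimator is unchanged while its spread shrinks. The toy f and the choice C = \mathbb{E}[f] are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

lam, K, R = 0.4, 1_000, 300
C = 0.4 * (-1.0) + 0.6 * (-3.0)  # E[f]: a good constant baseline

def estimate(baseline):
    z = (rng.random(K) < lam).astype(float)
    f = np.where(z == 1.0, -1.0, -3.0)           # stand-in for log p(x|z)
    score = z / lam - (1.0 - z) / (1.0 - lam)    # grad of log Bern(z|lam)
    return np.mean((f - baseline) * score)

plain = [estimate(0.0) for _ in range(R)]
based = [estimate(C) for _ in range(R)]
print(np.mean(plain), np.mean(based))  # both close to the true gradient 2.0
print(np.std(plain), np.std(based))    # baseline version is tighter
```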
Full power of control variates

If we design C(\cdot) to depend on the variable of integration z, we exploit the full power of control variates, but designing and using such control variates requires more careful treatment.

Blei et al. (2012); Ranganath et al. (2014); Gregor et al. (2014)
Learning baselines

Baselines are predicted by a regression model (e.g. a neural net). One idea is to centre the learning signal, in which case we train the baseline with an L_2 loss:

\rho = \arg\min_\rho \, (C_\rho(x) - \log p(x|z))^2

Gu et al. (2015)
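A sketch of fitting such a baseline with the L_2 loss, here a linear regressor C_\rho(x) = \rho^\top x trained by SGD on synthetic (x, signal) pairs, where the signal stands in for \log p(x|z); the shapes, learning rate, and synthetic target are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5
rho = np.zeros(d)  # baseline parameters

def baseline(x):
    return rho @ x  # C_rho(x)

true_w = rng.normal(size=d)  # generates the synthetic learning signal
lr = 0.05
for _ in range(2_000):
    x = rng.normal(size=d)
    signal = true_w @ x + rng.normal(scale=0.1)  # stand-in for log p(x|z)
    err = baseline(x) - signal
    rho -= lr * err * x  # gradient step on 0.5 * (C_rho(x) - signal)^2

print(np.abs(rho - true_w).max())  # rho should approach true_w
```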
Putting it together

Parameter estimation:

\arg\max_{\theta,\lambda}\; \mathbb{E}_{q_\lambda(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}(q_\lambda(z|x) \parallel p(z))
\arg\min_\rho \, (C_\rho(x) - \log p(x|z))^2

Generative gradient:

\mathbb{E}_{q_\lambda(z|x)}[\nabla_\theta \log p_\theta(x|z)]

Inference gradient:

\mathbb{E}_{q_\lambda(z|x)}[(\log p_\theta(x|z) - C(x)) \nabla_\lambda \log q_\lambda(z|x)]
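The three updates can be combined in one loop. The sketch below runs them on a deliberately tiny model (scalar binary x and z, a learned constant baseline, and no KL/prior term, which is omitted here for brevity); the model, data distribution and learning rate are all illustrative assumptions, not the lecture's topic model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

data = (rng.random(20_000) < 0.8).astype(float)  # observed binary x's

theta = np.zeros(2)  # generative params: logit of p(x=1|z), one per z
lam_logit = 0.0      # inference param: logit of q(z=1)
C = 0.0              # learned constant baseline
lr = 0.1

for x in data:
    lam = sigmoid(lam_logit)
    z = int(rng.random() < lam)  # sample z ~ q
    p = sigmoid(theta[z])
    log_p = x * np.log(p) + (1.0 - x) * np.log(1.0 - p)
    theta[z] += lr * (x - p)               # generative gradient step
    score = z - lam                        # d/d lam_logit of log q(z)
    lam_logit += lr * (log_p - C) * score  # score-function step w/ baseline
    C += lr * (log_p - C)                  # L2-style baseline update

print(sigmoid(theta[0]), sigmoid(theta[1]), C)
```

Both conditional likelihoods should drift towards the empirical frequency of x, while C tracks the running average of the learning signal.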
Summary

- Reparametrisation is not available for discrete variables.
- Use the score function estimator.
- It has high variance.
- Always use baselines for variance reduction!
Literature

- David M. Blei, Michael I. Jordan, and John W. Paisley. Variational Bayesian inference with stochastic search. In ICML, 2012.
- Evan Greensmith, Peter L. Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5(Nov), 2004.
- Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, and Daan Wierstra. Deep autoregressive networks. In ICML, 2014.
- Shixiang Gu, Sergey Levine, Ilya Sutskever, and Andriy Mnih. MuProp: Unbiased backpropagation for stochastic neural networks. arXiv preprint, 2015.
- Andriy Mnih and Karol Gregor. Neural variational inference and learning in belief networks. arXiv preprint, 2014.
- Rajesh Ranganath, Sean Gerrish, and David Blei. Black box variational inference. In AISTATS, 2014.
- Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 1992.
More informationPIXELCNN++: IMPROVING THE PIXELCNN WITH DISCRETIZED LOGISTIC MIXTURE LIKELIHOOD AND OTHER MODIFICATIONS
PIXELCNN++: IMPROVING THE PIXELCNN WITH DISCRETIZED LOGISTIC MIXTURE LIKELIHOOD AND OTHER MODIFICATIONS Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma {tim,karpathy,peter,dpkingma}@openai.com
More informationStructured Attention Networks
Structured Attention Networks Yoon Kim Carl Denton Luong Hoang Alexander M. Rush HarvardNLP ICLR, 2017 Presenter: Chao Jiang ICLR, 2017 Presenter: Chao Jiang 1 / Outline 1 Deep Neutral Networks for Text
More informationEfficient Feature Learning Using Perturb-and-MAP
Efficient Feature Learning Using Perturb-and-MAP Ke Li, Kevin Swersky, Richard Zemel Dept. of Computer Science, University of Toronto {keli,kswersky,zemel}@cs.toronto.edu Abstract Perturb-and-MAP [1] is
More informationThe Amortized Bootstrap
Eric Nalisnick 1 Padhraic Smyth 1 Abstract We use amortized inference in conjunction with implicit models to approximate the bootstrap distribution over model parameters. We call this the amortized bootstrap,
More informationMultimodal Prediction and Personalization of Photo Edits with Deep Generative Models
Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models Ardavan Saeedi Matthew D. Hoffman Stephen J. DiVerdi CSAIL, MIT Google Brain Adobe Research Asma Ghandeharioun Matthew
More informationMixture Models and the EM Algorithm
Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is
More informationClustering and The Expectation-Maximization Algorithm
Clustering and The Expectation-Maximization Algorithm Unsupervised Learning Marek Petrik 3/7 Some of the figures in this presentation are taken from An Introduction to Statistical Learning, with applications
More informationPerceptron: This is convolution!
Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image
More informationMondrian Forests: Efficient Online Random Forests
Mondrian Forests: Efficient Online Random Forests Balaji Lakshminarayanan (Gatsby Unit, UCL) Daniel M. Roy (Cambridge Toronto) Yee Whye Teh (Oxford) September 4, 2014 1 Outline Background and Motivation
More informationLatent Variable Models for the Analysis, Visualization and Prediction of Network and Nodal Attribute Data. Isabella Gollini.
z i! z j Latent Variable Models for the Analysis, Visualization and Prediction of etwork and odal Attribute Data School of Engineering University of Bristol isabella.gollini@bristol.ac.uk January 4th,
More informationA Brief Introduction to Reinforcement Learning
A Brief Introduction to Reinforcement Learning Minlie Huang ( ) Dept. of Computer Science, Tsinghua University aihuang@tsinghua.edu.cn 1 http://coai.cs.tsinghua.edu.cn/hml Reinforcement Learning Agent
More informationarxiv: v2 [cs.lg] 17 Dec 2018
Lu Mi 1 * Macheng Shen 2 * Jingzhao Zhang 2 * 1 MIT CSAIL, 2 MIT LIDS {lumi, macshen, jzhzhang}@mit.edu The authors equally contributed to this work. This report was a part of the class project for 6.867
More informationDEEP UNSUPERVISED CLUSTERING WITH GAUSSIAN MIXTURE VARIATIONAL AUTOENCODERS
DEEP UNSUPERVISED CLUSTERING WITH GAUSSIAN MIXTURE VARIATIONAL AUTOENCODERS Nat Dilokthanakul,, Pedro A. M. Mediano, Marta Garnelo, Matthew C. H. Lee, Hugh Salimbeni, Kai Arulkumaran & Murray Shanahan
More informationDeep Learning. Yee Whye Teh (Oxford Statistics & DeepMind)
Deep Learning Yee Whye Teh (Oxford Statistics & DeepMind) http://csml.stats.ox.ac.uk/people/teh What is Machine Learning? Information Structure Prediction Decisions Actions data What is Machine Learning?
More informationGAN Frontiers/Related Methods
GAN Frontiers/Related Methods Improving GAN Training Improved Techniques for Training GANs (Salimans, et. al 2016) CSC 2541 (07/10/2016) Robin Swanson (robin@cs.toronto.edu) Training GANs is Difficult
More informationRegularization and model selection
CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial
More informationNon-Linearity of Scorecard Log-Odds
Non-Linearity of Scorecard Log-Odds Ross McDonald, Keith Smith, Matthew Sturgess, Edward Huang Retail Decision Science, Lloyds Banking Group Edinburgh Credit Scoring Conference 6 th August 9 Lloyds Banking
More informationLearning to generate with adversarial networks
Learning to generate with adversarial networks Gilles Louppe June 27, 2016 Problem statement Assume training samples D = {x x p data, x X } ; We want a generative model p model that can draw new samples
More information732A54/TDDE31 Big Data Analytics
732A54/TDDE31 Big Data Analytics Lecture 10: Machine Learning with MapReduce Jose M. Peña IDA, Linköping University, Sweden 1/27 Contents MapReduce Framework Machine Learning with MapReduce Neural Networks
More informationReplacing Neural Networks with Black-Box ODE Solvers
Replacing Neural Networks with Black-Box ODE Solvers Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud University of Toronto, Vector Institute Resnets are Euler integrators Middle layers
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim,
More informationCOMP 551 Applied Machine Learning Lecture 16: Deep Learning
COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all
More informationK-means and Hierarchical Clustering
K-means and Hierarchical Clustering Xiaohui Xie University of California, Irvine K-means and Hierarchical Clustering p.1/18 Clustering Given n data points X = {x 1, x 2,, x n }. Clustering is the partitioning
More informationVulnerability of machine learning models to adversarial examples
Vulnerability of machine learning models to adversarial examples Petra Vidnerová Institute of Computer Science The Czech Academy of Sciences Hora Informaticae 1 Outline Introduction Works on adversarial
More informationScene Grammars, Factor Graphs, and Belief Propagation
Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and
More informationThe Multi-Entity Variational Autoencoder
The Multi-Entity Variational Autoencoder Charlie Nash 1,2, S. M. Ali Eslami 2, Chris Burgess 2, Irina Higgins 2, Daniel Zoran 2, Theophane Weber 2, Peter Battaglia 2 1 Edinburgh University 2 DeepMind Abstract
More informationA derivative-free trust-region algorithm for reliability-based optimization
Struct Multidisc Optim DOI 10.1007/s00158-016-1587-y BRIEF NOTE A derivative-free trust-region algorithm for reliability-based optimization Tian Gao 1 Jinglai Li 2 Received: 3 June 2016 / Revised: 4 September
More informationWarped Mixture Models
Warped Mixture Models Tomoharu Iwata, David Duvenaud, Zoubin Ghahramani Cambridge University Computational and Biological Learning Lab March 11, 2013 OUTLINE Motivation Gaussian Process Latent Variable
More informationProgramming Exercise 4: Neural Networks Learning
Programming Exercise 4: Neural Networks Learning Machine Learning Introduction In this exercise, you will implement the backpropagation algorithm for neural networks and apply it to the task of hand-written
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More informationCS229 Final Project: Predicting Expected Response Times
CS229 Final Project: Predicting Expected Email Response Times Laura Cruz-Albrecht (lcruzalb), Kevin Khieu (kkhieu) December 15, 2017 1 Introduction Each day, countless emails are sent out, yet the time
More informationScene Grammars, Factor Graphs, and Belief Propagation
Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and
More informationCOMP 551 Applied Machine Learning Lecture 14: Neural Networks
COMP 551 Applied Machine Learning Lecture 14: Neural Networks Instructor: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise noted, all material posted for this course
More informationWhen Variational Auto-encoders meet Generative Adversarial Networks
When Variational Auto-encoders meet Generative Adversarial Networks Jianbo Chen Billy Fang Cheng Ju 14 December 2016 Abstract Variational auto-encoders are a promising class of generative models. In this
More informationarxiv: v2 [stat.ml] 21 Oct 2017
Variational Approaches for Auto-Encoding Generative Adversarial Networks arxiv:1706.04987v2 stat.ml] 21 Oct 2017 Mihaela Rosca Balaji Lakshminarayanan David Warde-Farley Shakir Mohamed DeepMind {mihaelacr,balajiln,dwf,shakir}@google.com
More informationA trust-region method for stochastic variational inference with applications to streaming data
with applications to streaming data Lucas Theis Centre for Integrative Neuroscience, Otfried-Müller-Str. 25, BW 72076 Germany Matthew D. Hoffman Adobe Research, 601 Townsend St., San Francisco, CA 94103
More informationConditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C,
Conditional Random Fields and beyond D A N I E L K H A S H A B I C S 5 4 6 U I U C, 2 0 1 3 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative
More informationHomework. Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression Pod-cast lecture on-line. Next lectures:
Homework Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression 3.0-3.2 Pod-cast lecture on-line Next lectures: I posted a rough plan. It is flexible though so please come with suggestions Bayes
More information