Score function estimator and variance reduction techniques


Wilker Aziz, University of Amsterdam. May 24, 2018.

Outline

Variational inference for belief networks

Generative model with a neural network likelihood (graphical model: latent z and observed x, repeated over N data points; inference parameters λ, generative parameters θ). Let z ∈ {0,1}^d and

$$ q_\lambda(z \mid x) = \prod_{i=1}^{d} q_\lambda(z_i \mid x) = \prod_{i=1}^{d} \mathrm{Bern}\big(z_i \mid \mathrm{sigmoid}(f_\lambda(x))_i\big) \tag{1} $$

We jointly optimise the generative model p_θ(x|z) and the inference model q_λ(z|x) under the same objective (the ELBO). (Mnih and Gregor, 2014)

Objective

$$ \log p_\theta(x) \;\geq\; \underbrace{\mathbb{E}_{q_\lambda(z\mid x)}[\log p_\theta(x, Z)] + \mathbb{H}\big(q_\lambda(z\mid x)\big)}_{\text{ELBO}} \;=\; \mathbb{E}_{q_\lambda(z\mid x)}[\log p_\theta(x\mid Z)] - \mathrm{KL}\big(q_\lambda(z\mid x)\,\|\,p(z)\big) $$

Parameter estimation:

$$ \arg\max_{\theta,\lambda}\; \mathbb{E}_{q_\lambda(z\mid x)}[\log p_\theta(x\mid Z)] - \mathrm{KL}\big(q_\lambda(z\mid x)\,\|\,p(z)\big) $$

Generative Network Gradient

The KL term is constant with respect to θ, so

$$ \nabla_\theta \Big( \mathbb{E}_{q_\lambda(z\mid x)}[\log p_\theta(x\mid z)] - \underbrace{\mathrm{KL}\big(q_\lambda(z\mid x)\,\|\,p(z)\big)}_{\text{constant wrt } \theta} \Big) = \mathbb{E}_{q_\lambda(z\mid x)}\big[\underbrace{\nabla_\theta \log p_\theta(x\mid z)}_{\text{expected gradient :)}}\big] $$

which we approximate by Monte Carlo:

$$ \overset{\text{MC}}{\approx} \frac{1}{K} \sum_{k=1}^{K} \nabla_\theta \log p_\theta(x\mid z^{(k)}), \qquad z^{(k)} \sim q_\lambda(Z\mid x) $$

Note: q_λ(z|x) does not depend on θ, so the samples act as constants for this gradient (see the sketch below).
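A minimal PyTorch sketch of this estimator, under assumptions not in the slides: a hypothetical generative network `log_px_given_z(x, z)` that returns log p_θ(x|z) for differentiable parameters θ, and Bernoulli means `q_probs` produced by the inference network.

```python
import torch

def generative_gradient(x, q_probs, log_px_given_z, K=8):
    """Accumulate the MC estimate (1/K) sum_k grad_theta log p_theta(x | z^(k))
    in the .grad fields of the generative parameters."""
    samples = []
    for _ in range(K):
        # z^(k) ~ q_lambda(Z|x); the sample is a constant with respect to theta
        z = torch.bernoulli(q_probs).detach()
        samples.append(log_px_given_z(x, z))
    # backward() through the average yields the MC gradient estimate
    torch.stack(samples).mean().backward()
```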

Inference Network Gradient

$$ \nabla_\lambda \Big( \mathbb{E}_{q_\lambda(z\mid x)}[\log p_\theta(x\mid z)] - \underbrace{\mathrm{KL}\big(q_\lambda(z\mid x)\,\|\,p(z)\big)}_{\text{analytical}} \Big) = \nabla_\lambda \mathbb{E}_{q_\lambda(z\mid x)}[\log p_\theta(x\mid z)] - \underbrace{\nabla_\lambda \mathrm{KL}\big(q_\lambda(z\mid x)\,\|\,p(z)\big)}_{\text{analytical computation}} $$

The first term again requires approximation by sampling, but there is a problem.
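For the fully factorised Bernoulli case of this lecture, the KL term is indeed available in closed form. As a sketch, writing ψ = sigmoid(f_λ(x)) for the posterior parameters and assuming a factorised Bern(φ) prior (the prior used later in the example model):

$$ \mathrm{KL}\big(q_\lambda(z\mid x)\,\|\,p(z)\big) = \sum_{i=1}^{d} \left[ \psi_i \log\frac{\psi_i}{\varphi} + (1-\psi_i)\log\frac{1-\psi_i}{1-\varphi} \right] $$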

Inference Network Gradient

$$ \nabla_\lambda \mathbb{E}_{q_\lambda(z\mid x)}[\log p_\theta(x\mid z)] = \nabla_\lambda \int q_\lambda(z\mid x)\, \log p_\theta(x\mid z)\, \mathrm{d}z = \int \underbrace{\nabla_\lambda \big(q_\lambda(z\mid x)\big)\, \log p_\theta(x\mid z)}_{\text{not an expectation}}\, \mathrm{d}z $$

- The MC estimator is non-differentiable: we cannot sample first and then differentiate.
- Differentiating the expression does not yield an expectation: we cannot approximate it via MC.
- The original VAE employed a reparameterisation, but...

Bernoulli pmf

$$ \mathrm{Bern}(z_j \mid b_j) = b_j^{z_j} (1 - b_j)^{1 - z_j} = \begin{cases} b_j & \text{if } z_j = 1 \\ 1 - b_j & \text{if } z_j = 0 \end{cases} $$

Can we reparameterise a Bernoulli variable?

Reparameterisation requires a Jacobian matrix

Not really :(

$$ q(z; \lambda) = \phi\big(\epsilon = h(z, \lambda)\big)\,\underbrace{\big|\det J_{h(z,\lambda)}\big|}_{\text{change of density}} \tag{2} $$

The elements of the Jacobian matrix

$$ J_{h(z,\lambda)}[i, j] = \frac{\partial h_i(z, \lambda)}{\partial z_j} $$

are not defined for non-differentiable functions.

Outline

We can again use the log identity for derivatives

Using the identity ∇_λ q_λ(z|x) = q_λ(z|x) ∇_λ log q_λ(z|x):

$$ \begin{aligned} \nabla_\lambda \mathbb{E}_{q_\lambda(z\mid x)}[\log p_\theta(x\mid z)] &= \nabla_\lambda \int q_\lambda(z\mid x)\, \log p_\theta(x\mid z)\, \mathrm{d}z \\ &= \int \underbrace{\nabla_\lambda\big(q_\lambda(z\mid x)\big)\, \log p_\theta(x\mid z)}_{\text{not an expectation}}\, \mathrm{d}z \\ &= \int q_\lambda(z\mid x)\, \nabla_\lambda\big(\log q_\lambda(z\mid x)\big)\, \log p_\theta(x\mid z)\, \mathrm{d}z \\ &= \mathbb{E}_{q_\lambda(z\mid x)}\big[\underbrace{\log p_\theta(x\mid z)\, \nabla_\lambda \log q_\lambda(z\mid x)}_{\text{expected gradient :)}}\big] \end{aligned} $$

Score function estimator: high variance

We can now build an MC estimator:

$$ \nabla_\lambda \mathbb{E}_{q_\lambda(z\mid x)}[\log p_\theta(x\mid z)] = \mathbb{E}_{q_\lambda(z\mid x)}\big[\log p_\theta(x\mid z)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] \overset{\text{MC}}{\approx} \frac{1}{K} \sum_{k=1}^{K} \log p_\theta(x\mid z^{(k)})\, \nabla_\lambda \log q_\lambda(z^{(k)}\mid x), \qquad z^{(k)} \sim q_\lambda(Z\mid x) $$

but:

- the magnitude of log p_θ(x|z) varies widely
- the model likelihood does not contribute to the direction of the gradient
- there is too much variance for the estimator to be useful

but: it is fully differentiable! (A toy numerical illustration follows.)
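As a sketch only (a toy problem, not the lecture's model): we estimate ∇_λ E_{Bern(z|sigmoid(λ))}[f(z)] with the score function estimator and compare its spread to the exact gradient. Here f is a made-up stand-in for log p_θ(x|z) with a large z-independent magnitude, which is exactly what makes the estimator noisy.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.3                        # a single Bernoulli logit (illustrative)
p = 1.0 / (1.0 + np.exp(-lam))   # sigmoid(lam)
f = lambda z: z - 100.0          # stand-in for log p_theta(x|z): big offset, small z-dependence

# exact gradient: d/dlam E[f(Z)] = (f(1) - f(0)) * p * (1 - p)
exact = (f(1.0) - f(0.0)) * p * (1.0 - p)

def score_function_estimate(K=10):
    z = rng.binomial(1, p, size=K).astype(float)
    # grad_lam log Bern(z | sigmoid(lam)) = z - p
    return np.mean(f(z) * (z - p))

estimates = [score_function_estimate() for _ in range(2000)]
print(f"exact {exact:.3f}   MC mean {np.mean(estimates):.3f}   MC std {np.std(estimates):.2f}")
```

The estimator is unbiased, but its standard deviation is orders of magnitude larger than the gradient it estimates.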

When variance is high we can

- sample more: won't scale
- use variance reduction techniques (e.g. baselines and control variates): excellent idea, and now it's time for it!

Example Model

Let us consider a latent factor model for topic modelling:

- a document x = (x_1, ..., x_n) consists of n i.i.d. categorical draws from the model
- the categorical distribution in turn depends on binary latent factors z = (z_1, ..., z_k), which are also i.i.d.

$$ z_j \sim \mathrm{Bernoulli}(\varphi) \quad (1 \leq j \leq k) \qquad\qquad x_i \sim \mathrm{Categorical}\big(g_\theta(z)\big) \quad (1 \leq i \leq n) $$

Here φ specifies the Bernoulli prior and g_θ(·) is a function computed by a neural network with a softmax output. (A small sampling sketch follows.)
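A minimal numpy sketch of ancestral sampling from this model, assuming a made-up linear map followed by a softmax in place of the real neural network g_θ; the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n, vocab = 3, 5, 10            # latent factors, document length, vocabulary size
phi = 0.5                         # Bernoulli prior parameter
W = rng.normal(size=(vocab, k))   # stand-in for the parameters of g_theta

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

z = rng.binomial(1, phi, size=k)        # z_j ~ Bernoulli(phi), i.i.d.
probs = softmax(W @ z)                  # g_theta(z): categorical parameters
x = rng.choice(vocab, size=n, p=probs)  # x_i ~ Categorical(g_theta(z)), i.i.d.
print(z, x)
```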

Example Model

(Figure: directed graphical model with latent factors z_1, z_2, z_3 and observed words x_1, ..., x_4.)

At inference time the latent variables are marginally dependent. For our variational distribution we are going to assume that they are not (recall: the mean field assumption).

Inference Network

(Figure: inference network with parameters λ mapping x_1, ..., x_4 to the factors z_1, z_2, z_3.)

The inference network needs to predict k Bernoulli parameters ψ. Any neural network with a sigmoid output will do that job:

$$ q_\lambda(z \mid x) = \prod_{j=1}^{k} \mathrm{Bern}(z_j \mid \psi_j), \qquad \text{where } \psi_1^k = f_\lambda(x) \tag{3} $$
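A minimal PyTorch sketch of such an inference network, with illustrative sizes (a bag-of-words input of dimension `vocab`, one hidden layer); none of these architectural choices come from the slides.

```python
import torch
import torch.nn as nn

class InferenceNet(nn.Module):
    """Predicts k Bernoulli parameters psi = f_lambda(x) with a sigmoid output."""
    def __init__(self, vocab=10, hidden=64, k=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(vocab, hidden), nn.Tanh(),
                                 nn.Linear(hidden, k), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)  # psi, shape (batch, k)

def log_q(z, psi):
    """log q_lambda(z|x) = sum_j log Bern(z_j | psi_j)."""
    return torch.distributions.Bernoulli(probs=psi).log_prob(z).sum(-1)
```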

Computation Graph

(Figure: computation graph. The inference model (parameters λ) maps x to ψ; z ~ Bernoulli(ψ) is sampled; the generation model (parameters θ) evaluates log p_θ(x|z); the objective combines log q_λ(z|ψ), log p_θ(x|z) and the KL term.)

Reparametrisation Gradient

(Figure, for comparison: the Gaussian reparametrisation graph. The inference model (parameters λ) predicts μ and σ, noise ε ~ N(0, 1) is combined with them to give z, and the generation model (parameters θ) and the KL term close the graph, so gradients can flow back to λ.)

Pros and Cons

Pros
- applicable to all distributions
- many libraries come with samplers for common distributions

Cons
- high variance!

Outline

Control variates

Suppose we want to estimate E[f(Z)] and we know the expected value of another function ψ(z) on the same support. Then it holds that

$$ \mathbb{E}[f(Z)] = \mathbb{E}[f(Z) - \psi(Z)] + \mathbb{E}[\psi(Z)] \tag{4} $$

If ψ(z) = f(z) and we estimate the expected value of f(z) − ψ(z), then we have reduced the variance to 0. In general,

$$ \mathrm{Var}(f - \psi) = \mathrm{Var}(f) - 2\,\mathrm{Cov}(f, \psi) + \mathrm{Var}(\psi) \tag{5} $$

so if f and ψ are strongly correlated, with 2 Cov(f, ψ) > Var(ψ), then we improve on the original estimation problem. (Greensmith et al., 2004)
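A sketch only, with made-up f and ψ: estimating E[f(Z)] for Z ~ N(0, 1), using the linear function ψ(z) = 1 + 0.5z (whose expectation under the standard normal is known to be 1) as a control variate.

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda z: np.exp(0.5 * z)     # function whose expectation we want (illustrative)
psi = lambda z: 1.0 + 0.5 * z     # control variate, correlated with f, E[psi(Z)] = 1

def plain(K=100):
    z = rng.standard_normal(K)
    return f(z).mean()

def with_cv(K=100):
    z = rng.standard_normal(K)
    return (f(z) - psi(z)).mean() + 1.0   # E[f - psi] estimated by MC, plus E[psi]

plain_runs = [plain() for _ in range(2000)]
cv_runs = [with_cv() for _ in range(2000)]
print(f"plain: mean {np.mean(plain_runs):.3f}  std {np.std(plain_runs):.4f}")
print(f"cv   : mean {np.mean(cv_runs):.3f}  std {np.std(cv_runs):.4f}")
```

Both estimators target the same expectation; the control-variate version has a visibly smaller spread across runs.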

Reducing variance of the score function estimator

Back to the score function estimator:

$$ \begin{aligned} \mathbb{E}_{q_\lambda(z\mid x)}\big[\log p_\theta(x\mid z)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] ={}& \mathbb{E}_{q_\lambda(z\mid x)}\big[\underbrace{\log p_\theta(x\mid z)\, \nabla_\lambda \log q_\lambda(z\mid x) - C(x)\, \nabla_\lambda \log q_\lambda(z\mid x)}_{f(z)-\psi(z)}\big] \\ &+ \mathbb{E}_{q_\lambda(z\mid x)}\big[\underbrace{C(x)\, \nabla_\lambda \log q_\lambda(z\mid x)}_{\psi(z)}\big] \end{aligned} $$

The last term is very simple!

E[ψ(Z)]

$$ \begin{aligned} \mathbb{E}_{q_\lambda(z\mid x)}\big[C(x)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] &= C(x)\, \mathbb{E}_{q_\lambda(z\mid x)}\big[\nabla_\lambda \log q_\lambda(z\mid x)\big] \\ &= C(x)\, \mathbb{E}_{q_\lambda(z\mid x)}\left[\frac{\nabla_\lambda q_\lambda(z\mid x)}{q_\lambda(z\mid x)}\right] \\ &= C(x) \int q_\lambda(z\mid x)\, \frac{\nabla_\lambda q_\lambda(z\mid x)}{q_\lambda(z\mid x)}\, \mathrm{d}z \\ &= C(x) \int \nabla_\lambda q_\lambda(z\mid x)\, \mathrm{d}z \\ &= C(x)\, \nabla_\lambda \int q_\lambda(z\mid x)\, \mathrm{d}z \\ &= C(x)\, \nabla_\lambda 1 \\ &= 0 \end{aligned} $$
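A quick numerical sanity check of this identity, sketched on a single Bernoulli with logit λ, for which ∇_λ log q_λ(z) = z − sigmoid(λ):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 0.7
p = 1.0 / (1.0 + np.exp(-lam))
z = rng.binomial(1, p, size=1_000_000).astype(float)
# score of a Bernoulli with logit parameterisation: grad_lam log q(z) = z - p
print((z - p).mean())  # close to 0, as the derivation predicts
```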

Improved estimator

Back to the score function estimator:

$$ \begin{aligned} \mathbb{E}_{q_\lambda(z\mid x)}\big[\log p_\theta(x\mid z)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] &= \mathbb{E}_{q_\lambda(z\mid x)}\big[\log p_\theta(x\mid z)\, \nabla_\lambda \log q_\lambda(z\mid x) - C(x)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] + \mathbb{E}_{q_\lambda(z\mid x)}\big[C(x)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] \\ &= \mathbb{E}_{q_\lambda(z\mid x)}\big[\log p_\theta(x\mid z)\, \nabla_\lambda \log q_\lambda(z\mid x) - C(x)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] \\ &= \mathbb{E}_{q_\lambda(z\mid x)}\big[\big(\log p_\theta(x\mid z) - C(x)\big)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] \end{aligned} $$

C(x) is called a baseline. (A variance comparison appears below.)
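Continuing the toy Bernoulli example from earlier (all quantities illustrative): subtracting a constant baseline, here simply E[f(Z)], leaves the estimator unbiased but removes most of its variance, because the large z-independent part of f no longer multiplies the score.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, K = 0.3, 10
p = 1.0 / (1.0 + np.exp(-lam))
f = lambda z: z - 100.0                 # same stand-in log-likelihood as before
C = p * f(1.0) + (1.0 - p) * f(0.0)     # constant baseline: here simply E[f(Z)]

def estimate(baseline=0.0):
    z = rng.binomial(1, p, size=K).astype(float)
    return np.mean((f(z) - baseline) * (z - p))   # score of the Bernoulli is z - p

plain = [estimate() for _ in range(2000)]
based = [estimate(C) for _ in range(2000)]
print(f"no baseline:   mean {np.mean(plain):.3f}   std {np.std(plain):.3f}")
print(f"with baseline: mean {np.mean(based):.3f}   std {np.std(based):.3f}")
```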

Baselines

Baselines can be constant,

$$ \mathbb{E}_{q_\lambda(z\mid x)}\big[\big(\log p_\theta(x\mid z) - C\big)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] \tag{6} $$

input-dependent,

$$ \mathbb{E}_{q_\lambda(z\mid x)}\big[\big(\log p_\theta(x\mid z) - C(x)\big)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] \tag{7} $$

or both:

$$ \mathbb{E}_{q_\lambda(z\mid x)}\big[\big(\log p_\theta(x\mid z) - C - C(x)\big)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] \tag{8} $$

(Williams, 1992)

Full power of control variates

If we design C(·) to depend on the variable of integration z, we exploit the full power of control variates, but designing and using such control variates requires more careful treatment. (Blei et al., 2012; Ranganath et al., 2014; Gregor et al., 2014)

Learning baselines

Baselines are predicted by a regression model (e.g. a neural network). One idea is to centre the learning signal, in which case we train the baseline with an L2 loss:

$$ \rho^{\star} = \arg\min_{\rho}\; \big(C_\rho(x) - \log p_\theta(x\mid z)\big)^2 $$

(Gu et al., 2015)

Putting it together

Parameter estimation:

$$ \arg\max_{\theta,\lambda}\; \mathbb{E}_{q_\lambda(z\mid x)}[\log p_\theta(x\mid Z)] - \mathrm{KL}\big(q_\lambda(z\mid x)\,\|\,p(z)\big) \qquad\qquad \arg\min_{\rho}\; \big(C_\rho(x) - \log p_\theta(x\mid z)\big)^2 $$

Generative gradient:

$$ \mathbb{E}_{q_\lambda(z\mid x)}\big[\nabla_\theta \log p_\theta(x\mid z)\big] $$

Inference gradient:

$$ \mathbb{E}_{q_\lambda(z\mid x)}\big[\big(\log p_\theta(x\mid z) - C(x)\big)\, \nabla_\lambda \log q_\lambda(z\mid x)\big] $$
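The sketch below puts these pieces into a single PyTorch training step. It is only an illustration of the estimators, not the lecture's exact model: `inf_net`, `gen_net` and `baseline_net` are assumed user-defined modules (the inference net outputs Bernoulli probabilities, the generative net returns log p_θ(x|z) per datapoint, the baseline net returns a scalar C_ρ(x)), the prior is factorised Bern(φ), and `optimizer` covers the parameters of all three networks.

```python
import torch

def training_step(x, inf_net, gen_net, baseline_net, optimizer, phi=0.5):
    """One update: score function estimator with a baseline for lambda,
    plain backprop for theta, and an L2 loss for the baseline parameters rho."""
    optimizer.zero_grad()

    psi = inf_net(x)                                  # Bernoulli parameters psi = f_lambda(x)
    q = torch.distributions.Bernoulli(probs=psi)
    z = q.sample()                                    # z ~ q_lambda(Z|x); no gradient through sampling

    log_px_z = gen_net(x, z)                          # log p_theta(x|z), shape (batch,)
    prior = torch.distributions.Bernoulli(probs=torch.full_like(psi, phi))
    kl = torch.distributions.kl_divergence(q, prior).sum(-1)   # analytical KL per datapoint

    C = baseline_net(x).squeeze(-1)                   # input-dependent baseline C_rho(x)
    learning_signal = (log_px_z - C).detach()         # centred signal, treated as a constant

    # Surrogate whose gradients reproduce the estimators above:
    #   theta:  grad of -log p_theta(x|z)
    #   lambda: -(log p - C) * grad log q  (score function estimator) + grad KL
    #   rho:    grad of the L2 baseline loss
    log_qz = q.log_prob(z).sum(-1)
    loss = (-log_px_z - learning_signal * log_qz + kl
            + (C - log_px_z.detach()) ** 2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```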

Summary

- Reparametrisation is not available for discrete variables.
- Use the score function estimator instead.
- It has high variance.
- Always use baselines for variance reduction!

Literature I

David M. Blei, Michael I. Jordan, and John W. Paisley. Variational Bayesian inference with stochastic search. In ICML, 2012.

Evan Greensmith, Peter L. Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5, 2004.

Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, and Daan Wierstra. Deep autoregressive networks. In ICML, 2014.

Shixiang Gu, Sergey Levine, Ilya Sutskever, and Andriy Mnih. MuProp: unbiased backpropagation for stochastic neural networks. arXiv preprint, 2015.

Literature II

Andriy Mnih and Karol Gregor. Neural variational inference and learning in belief networks. arXiv preprint, 2014.

Rajesh Ranganath, Sean Gerrish, and David Blei. Black box variational inference. In AISTATS, 2014.

Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 1992.
