Non-parametric Approximate Bayesian Computation for Expensive Simulators


Steven Laan

Master's thesis, 42 EC
Master's programme Artificial Intelligence
University of Amsterdam

Supervisor: Ted Meeds

A thesis submitted in conformity with the requirements for the degree of MSc. in Artificial Intelligence

August 26, 2014

Acknowledgements

I would like to express my gratitude to my supervisor Ted Meeds for his patience and advice every time I stumbled into his office. Furthermore, I would like to thank Max Welling for his helpful suggestions and remarks along the way. I would like to thank my thesis committee, Henk, Max and Ted, for agreeing on a defence date on rather short notice. I would like to thank Henk Zeevat in particular, for chairing the committee just after his vacation. Finally, I would like to thank my girlfriend and family for their endless support.

Abstract

This study investigates new methods for Approximate Bayesian Computation (ABC). The main goal is to create ABC methods that are easy to use by researchers in other fields and that perform well with expensive simulators. Kernel ABC methods have been used to approximate non-Gaussian densities; however, these methods are inefficient in their use of simulations in two ways. First, they do not assess the required number of simulations: for some parameter settings only a small number of simulations is needed. Second, all known methods throw away the entire history of simulations, ignoring the valuable information it holds. This thesis addresses these problems by introducing three new algorithms: Adaptive KDE Likelihood ABC (AKL-ABC), Projected Synthetic Surrogate ABC (PSS-ABC) and Projected KDE Surrogate ABC (PKS-ABC). The first, AKL-ABC, performs a kernel density estimation at each parameter location and adaptively chooses the number of simulations at each location. The other two algorithms take advantage of the simulation history to estimate the simulator outcomes at the current parameter location. The difference between them is that PSS-ABC assumes a Gaussian conditional probability and hence is a parametric method, whereas PKS-ABC is non-parametric, performing a kernel density estimate instead. Experiments demonstrate the advantages of these algorithms, in particular with respect to the number of simulations needed. Additionally, the flexibility of non-parametric methods is illustrated.

List of Abbreviations

ABC      Approximate Bayesian Computation
MCMC     Markov chain Monte Carlo
KDE      Kernel density estimate
GP       Gaussian process
MH       Metropolis-Hastings
LOWESS   Locally weighted linear regression
(A)SL    (Adaptive) Synthetic Likelihood
(A)KL    (Adaptive) KDE Likelihood
PSS      Projected Synthetic Surrogate
PKS      Projected KDE Surrogate

List of Symbols

θ                    Vector of parameters
y*                   Vector of observed values
E(·)                 Expected value of a random variable
U(a, b)              Uniform distribution on the interval [a, b]
N(µ, Σ)              Normal distribution with mean vector µ and covariance matrix Σ
Gam(α, β)            Gamma distribution with shape α and rate β
k_h(·)               A kernel function with bandwidth h
π(x | y)             Probability of x given y
KDE(x | k, h, w, X)  The kernel density estimate at location x, using kernel k with bandwidth h and data points X, each with a weight from weight vector w

Contents

1 Introduction
  1.1 Goals
  1.2 Contributions
  1.3 Thesis Contents
2 Approximate Bayesian Computation
  2.1 Rejection ABC
  2.2 MCMC Methods
      Marginal and Pseudo-marginal Samplers
      Synthetic Likelihood
      Adaptive Synthetic Likelihood
  2.3 KDE Likelihood ABC
  2.4 Population Methods
  2.5 Surrogate Methods
      2.5.1 Simulator Surrogate ABC
      2.5.2 Likelihood Surrogate ABC
3 Kernel ABC Methods
  3.1 Overview of ABC methods
  3.2 Adaptive KDE Likelihood ABC
  3.3 Kernel Surrogate Methods
      Projected Synthetic Surrogate
      Projected Kernel Density Estimated Surrogate
  Ease of Use
4 Experiments
  Exponential Toy Problem
  Multimodal Problem
  Blowfly Problem
5 Conclusion and Discussion

A Kernel Methods
  A.1 Kernel Density Estimation
  A.2 Bandwidth Selection
  A.3 Adaptive Kernel Density Estimation
  A.4 Kernel regression
B Pseudocode for SL and ASL
C Kernel choice and bandwidth selection
  C.1 Kernel choice
  C.2 Bandwidth selection

1 Introduction

Since the beginning of mankind, we have wanted to understand our surroundings. Traditionally, this was done by observing and interacting with the world. A scientist or philosopher has an idea or model of how things work and tries to test this hypothesis in the real world. As our understanding grew, the concepts and models involved in explaining the world became more complex. These models of nature can often be viewed as a black box with certain parameters or knobs, accounting for different situations. For example, a rabbit population model can have knobs for the number of initial organisms in the population, the number of female rabbits, and maybe others to account for predators or forces of nature. If someone comes up with such a model, it has to be verified on some real-world data; we must be certain that it is a good model for the intended purpose. To test whether a model is adequate, observations or measurements in the real world are gathered. To fit the model to observations, its parameters are adjusted so that the model outputs best match the observations. From a probabilistic viewpoint the problem becomes: which parameter settings are likely to result in the observed values?

In most cases, scientists have intuitions about the ranges of the different parameters. Sometimes they might even know the exact setting. However, as the models become more complex, it also becomes harder to set these parameters. Thus it would be convenient if there were methods to do this automatically. These methods are called inference methods, with Bayesian inference being the largest area within this topic. Bayesian inference relies on Bayes' theorem, which states that the posterior distribution of parameters given the observed values can be calculated using the likelihood function, which tells you how likely a value is to be obtained given a certain parameter setting, and a prior over the parameters. More specifically: the posterior is proportional to the prior times the likelihood.

One problem of Bayesian inference for simulation-based science is that it needs to evaluate the likelihood function. For most complex models this

likelihood is either unknown or intractable to compute. This is where Approximate Bayesian Computation (ABC) comes in, the main subject of this thesis. Approximate Bayesian Computation, sometimes called likelihood-free methods, performs an approximation to Bayesian inference without using the likelihood function. Different methods for ABC already exist. However, these are often targeted at fast simulators or make assumptions about the underlying likelihood. This thesis introduces non-parametric alternatives to existing parametric methods that aim to overcome the shortcomings of their counterparts.

1.1 Goals

The main focus of this work is on algorithms for expensive simulators, where it is desirable to keep the number of simulation calls to a minimum. An example of an expensive simulator is the whole-cell simulation [19], which has 30 parameters and where a single simulation takes core-hours to run. For these kinds of simulators more sophisticated methods are required, methods that take every single simulation result into account. Hence our research question is:

Can we build non-parametric ABC methods that take advantage of the simulation history and have similar performance to existing methods?

It should be noted that there are two ways of interpreting the term non-parametric. The first interpretation is that non-parametric methods do not rely on assumptions that data belong to any specific distribution. The second interpretation covers methods that adapt the complexity of the model to the complexity of the data. In these kinds of models individual variables are often still assumed to belong to a particular distribution; the structure of the data is however not assumed to be fixed and can be adapted by adding variables. Our goal is to create non-parametric methods that do not rely on any specific underlying distribution. Hence we use the first interpretation of non-parametric.

Another goal is to make the methods as easy to use as possible, preferably plug and play. Hence there should be as few algorithm parameters as possible.

The simulation-based science community should not have to worry about the details of the algorithm, only about the results it obtains. Hence the augmented research question becomes:

Can we build non-parametric ABC methods that take advantage of the simulation history, have similar performance to existing methods and are easy to use?

1.2 Contributions

The main contributions of this thesis are three new algorithms. The first algorithm is an adaptive version of the existing kernel ABC. This algorithm, AKL-ABC, keeps track of the Metropolis-Hastings acceptance error. It adaptively chooses the number of simulation calls such that the acceptance error is below some threshold. The other two algorithms take advantage of the whole simulation history. Hence these methods are much cheaper than existing algorithms in terms of simulation calls. Because simulation results from all locations are incorporated, instead of only simulations at the current location, we call these methods global methods. The second algorithm is called Projected Synthetic Surrogate ABC and assumes a Gaussian conditional distribution. All points in the simulation history are projected onto the current parameter location and the weighted sample mean and variance are estimated. The third method, Projected KDE Surrogate ABC, does not make the Gaussian assumption and is therefore better equipped to model non-Gaussian conditional distributions, such as multimodal or heavy-tailed distributions. Instead of computing a weighted sample mean and variance, it performs a weighted kernel density estimate.

1.3 Thesis Contents

This thesis is organized as follows. An introduction to Approximate Bayesian Computation, along with descriptions of the different known algorithms and their weaknesses, is given in section 2. Section 3 describes the main contributions of this thesis. These contributions are then tested and compared

with the known algorithms on different problems in section 4. Finally, the experimental results are discussed and an outlook is given in section 5.

Throughout this thesis we assume the reader is familiar with probability theory, calculus and kernel methods. We think this last subject is less well-known, hence a short introduction to kernel methods and kernel density estimation is provided in appendix A.

2 Approximate Bayesian Computation

Approximate Bayesian Computation (ABC) or likelihood-free methods are employed to compute the posterior distribution when there is no likelihood function available. This can either be because it is intractable to compute the likelihood, or because a closed form simply does not exist. Although the likelihood is unavailable, it is assumed that there does exist a simulator that returns simulated samples. More formally, we can write a simulation call as if it were a draw from a distribution π(x | θ):

    x ~sim π(x | θ)    (1)

The problem that ABC solves is also known as parameter inference: given observations y*, what is the distribution of the parameters θ of the simulator that could have generated these observations? The idea behind ABC is that this simulator can be used to bypass the likelihood function, by augmenting the posterior with a draw from the simulator:

    π(θ | y*) ∝ π(y* | θ) π(θ)
    π(θ, x | y*) ∝ π(y* | θ, x) π(x | θ) π(θ)    (2)

where π(y* | θ, x) is a weighting function or similarity measure of how close the simulated x is to the observed data y*. In this augmented system an approximation of the true posterior can be obtained by integrating out the auxiliary variable x:

    π̂(θ | y*) ∝ π(θ) ∫ π(y* | θ, x) π(x | θ) dx    (3)

Sometimes this approximation is called π_LF(θ | y*) [39].
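To make equation (3) concrete, the integral over x can be approximated by Monte Carlo for each value of θ on a grid. The sketch below does this for a hypothetical Gaussian-mean simulator, using a Gaussian similarity kernel on the sample mean; the simulator, the kernel and all settings are illustrative assumptions, not part of the thesis.

```python
import numpy as np

# Illustration of equation (3): pi_hat(theta | y*) ∝ pi(theta) * ∫ pi(y*|theta,x) pi(x|theta) dx,
# with the integral estimated by Monte Carlo at each theta on a grid.
rng = np.random.default_rng(0)
y_obs = rng.normal(1.5, 1.0, size=100)   # "observed" data y*
s_obs = y_obs.mean()                     # summary statistic S(y*)
eps = 0.1                                # width of the similarity kernel

def kernel(s_sim):
    # pi(y* | theta, x): Gaussian similarity between simulated and observed statistic
    return np.exp(-0.5 * ((s_sim - s_obs) / eps) ** 2)

thetas = np.linspace(-3, 3, 61)          # flat prior over the grid, so pi(theta) is constant
post = np.empty_like(thetas)
for i, th in enumerate(thetas):
    sims = rng.normal(th, 1.0, size=(200, 100)).mean(axis=1)  # 200 draws x ~ pi(x | theta)
    post[i] = kernel(sims).mean()        # Monte Carlo estimate of the integral
post /= post.sum() * (thetas[1] - thetas[0])   # normalize on the grid
print(thetas[np.argmax(post)])           # peaks near the data mean
```

The approximate posterior concentrates around parameter values whose simulations reproduce the observed statistic; shrinking eps sharpens it at the cost of a noisier Monte Carlo estimate.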

2.1 Rejection ABC

The first and most simple ABC method was introduced in a thought experiment by Rubin [34] and is now known as the ABC rejection algorithm. At each iteration a sample x is drawn from the simulator and is kept only if it is equal to the observed y*. The algorithm iterates until it has reached a fixed number of samples N, where N is usually set to some high number. It is rather trivial that you end up with the exact posterior distribution using this method. However, the algorithm is very inefficient, as it is rejecting samples most of the time. This is where distance measures and (sufficient) statistics come in.

Instead of only keeping the sample if it is exactly equal to the observed value, an error margin ε is introduced, sometimes called the epsilon tube. Samples within this tube are accepted. Another technique that is often employed in higher-dimensional models is the notion of sufficient statistics [12, 18, 29]. Instead of matching the exact output of the real world, both the simulator output and the observed value are summarized using statistics. This reduces dimensionality while, if the statistics are sufficient, still retaining the same information.

The pseudocode for Rejection ABC with these two additions is shown in algorithm 1. Here ρ(·) is the distance measure and S(·) is the statistics function.

Algorithm 1 Rejection ABC with ε-tube and sufficient statistics
 1: procedure Reject-ABC(y*, ε, N, π(θ), π(x|θ))
 2:   n ← 1
 3:   while n < N do
 4:     repeat
 5:       θ_n ~ π(θ)
 6:       x ~sim π(x | θ_n)
 7:     until ρ(S(x), S(y*)) ≤ ε
 8:     n ← n + 1
 9:   end while
10:   return θ_1, ..., θ_N
11: end procedure
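Algorithm 1 can be sketched in a few lines. The toy Gaussian-mean simulator, the function names, and the choice of the sample mean as the summary statistic are illustrative assumptions, not part of the thesis.

```python
import numpy as np

def rejection_abc(y_obs, eps, n_samples, prior_draw, simulate, stat, rng):
    """Rejection ABC with an epsilon-tube on a summary statistic (algorithm 1)."""
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_draw(rng)                  # theta_n ~ pi(theta)
        x = simulate(theta, rng)                 # x ~ pi(x | theta_n)
        if abs(stat(x) - stat(y_obs)) <= eps:    # rho(S(x), S(y*)) <= eps
            accepted.append(theta)
    return np.array(accepted)

# Toy problem: infer the mean of a unit-variance Gaussian
rng = np.random.default_rng(1)
y_obs = rng.normal(2.0, 1.0, size=100)
post = rejection_abc(y_obs, eps=0.1, n_samples=200,
                     prior_draw=lambda r: r.uniform(-5, 5),
                     simulate=lambda th, r: r.normal(th, 1.0, size=100),
                     stat=np.mean, rng=rng)
print(post.mean())  # close to the observed sample mean
```

Even on this easy problem most proposals are rejected, which is exactly the inefficiency that motivates the MCMC methods of the next section.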

2.2 MCMC Methods

While rejection ABC is an effective method when simulation is very cheap and the space low-dimensional, when either the simulator is computationally expensive or the output is high-dimensional, the rejection sampler is hopeless: almost every sample from the prior will be rejected. This is because the proposals from the prior will often be in regions of low posterior probability. To address the inefficiency of rejection sampling, several Markov chain Monte Carlo (MCMC) ABC methods exist [2, 3, 24].

A well-known algorithm that implements the MCMC scheme is the Metropolis-Hastings (MH) algorithm. A Metropolis-Hastings scheme can be constructed to sample from the posterior distribution. Hence, we need a Markov chain with π(θ | y*) as stationary distribution. As before, the state of the chain is augmented to (θ, x), where x is generated from the simulator with parameter setting θ. Consider the following factorization of the proposal for the chain with augmented states:

    q((θ′, x′) | (θ, x)) = q(θ′ | θ) π(x′ | θ′)    (4)

That is, when in state (θ, x), first propose a new parameter location θ′ using the proposal distribution: θ′ ~ q(θ′ | θ). Then a simulator output x′ is simulated at location θ′: x′ ~sim π(x | θ′). Now the acceptance probability α can be formulated as [39]:

    α = min(1, [π(θ′, x′ | y*) q((θ, x) | (θ′, x′))] / [π(θ, x | y*) q((θ′, x′) | (θ, x))])
      = min(1, [π(y* | θ′, x′) π(x′ | θ′) π(θ′) q(θ | θ′) π(x | θ)] / [π(y* | θ, x) π(x | θ) π(θ) q(θ′ | θ) π(x′ | θ′)])
      = min(1, [π(y* | θ′, x′) π(θ′) q(θ | θ′)] / [π(y* | θ, x) π(θ) q(θ′ | θ)])    (5)

The resulting MCMC procedure is shown in algorithm 2. We will later refer to lines 4 to 11 as the MH-step of the algorithm, where a new parameter location

is proposed and either accepted or rejected. Pseudocode of algorithms later in this thesis will only contain their MH-step procedure, since it is the only part that is different.

Algorithm 2 Pseudocode for likelihood-free MCMC
 1: procedure LF-MCMC(T, q, π(θ), π(x|θ), y*)
 2:   Initialize θ_0, x_0
 3:   for t ← 0 to T do
 4:     θ′ ~ q(θ′ | θ_t)
 5:     x′ ~sim π(x | θ′)
 6:     Set α using equation (5)
 7:     if U(0, 1) ≤ α then
 8:       (θ_{t+1}, x_{t+1}) ← (θ′, x′)
 9:     else
10:       (θ_{t+1}, x_{t+1}) ← (θ_t, x_t)
11:     end if
12:   end for
13: end procedure

Marginal and Pseudo-marginal Samplers

Given the definition of equation (5), an unbiased estimate of the marginal likelihood can be obtained by using Monte Carlo integration [3]:

    π(y* | θ) ≈ (1/S) Σ_{s=1}^{S} π(y* | θ, x_s)    (6)

where x_s is an independent sample from the simulator. Then in the acceptance probability the division by S cancels out, to obtain:

    α = min(1, [π(θ′) Σ_{s=1}^{S} π(y* | θ′, x′_s) q(θ | θ′)] / [π(θ) Σ_{s=1}^{S} π(y* | θ, x_s) q(θ′ | θ)])    (7)

Note that the denominator does not have to be re-evaluated at every iteration: it can be carried over from the previous iteration. If this estimate is plugged into algorithm 2, we obtain what is called the pseudo-marginal sampler. It is known to suffer from poor mixing [37], which

means that the chain is slow in converging to the desired distribution. This is due to the fact that the denominator is not recomputed: if we once get a lucky draw that has high posterior probability, it will be hard to accept any other sample and hence the chain can get stuck in that location.

Therefore, the marginal sampler was proposed. While it does not have the same guarantees as the pseudo-marginal sampler, it does mix better. Next to the numerator it also re-estimates the denominator. As a result a single draw has less influence on the acceptance rate, because in the next iteration it is thrown away and a new sample is drawn. Hence, the chain is less likely to get stuck this way. It is however more costly in terms of simulation calls.

Synthetic Likelihood

Instead of approximating the likelihood term with a Monte Carlo integration, Wood proposed to model the simulated values with a Gaussian distribution [46]. This Gaussian can be estimated by calculating the sample mean and sample variance:

    µ̂_θ = (1/S) Σ_{s=1}^{S} x_s    (8)

    Σ̂_θ = (1/(S−1)) Σ_{s=1}^{S} (x_s − µ̂_θ)(x_s − µ̂_θ)^T    (9)

The probability of a pseudo-datum x given our parameter location θ is set to the proposed Gaussian: π(x | θ) = N(µ̂_θ, Σ̂_θ). If this is plugged into equation (3) and we choose to use a Gaussian kernel for π(y* | x), the integral can be computed analytically. This leads to the following probability for y*

given θ:

    π(y* | θ) = ∫ π(y* | x) π(x | θ) dx
              = ∫ k_h(y*, x) π(x | θ) dx
              = ∫ N(y* − x; 0, ε²I) N(x; µ̂_θ, Σ̂_θ) dx
              = N(y*; µ̂_θ, Σ̂_θ + ε²I)

Hence assuming that the pseudo-data at each parameter location are distributed normally is equivalent to assuming that the underlying likelihood function is Gaussian. If this is plugged into the Metropolis-Hastings algorithm, the acceptance probability becomes:

    α = min(1, [N(y*; µ̂_θ′, Σ̂_θ′ + ε²I) π(θ′) q(θ | θ′)] / [N(y*; µ̂_θ, Σ̂_θ + ε²I) π(θ) q(θ′ | θ)])    (10)

The pseudo-code for the resulting Synthetic Likelihood ABC algorithm is given in appendix B.

Adaptive Synthetic Likelihood

One downside to Synthetic Likelihood ABC (SL-ABC) is that there is no parameter to set the accuracy of the method. There is of course the number of simulations at each location, but this number gives no guarantee in terms of any error criterion. Moreover, at some parameter locations the method may be more prone to making errors in accepting/rejecting than at others. Hence, sampling the same number of times at each location may not be optimal. Therefore, instead of retrieving an equal number of samples at each location, the idea is to initially call the simulator S_0 times. Additional simulations are only performed when needed. More formally: keep track of the probability of making a Metropolis-Hastings error in accepting/rejecting the sample. When this error is larger than some threshold value ξ, more simulations are

required. The first example of an adaptive method is the Adaptive Synthetic Likelihood algorithm (ASL-ABC) introduced by Meeds and Welling [25].

To be able to estimate the probability of an acceptance error, an estimate of the acceptance distribution is needed. This can be done by creating M sets of S samples and calculating the acceptance probability for each set. To obtain the samples, the simulator could be called S·(M−1) extra times. However, because the synthetic likelihood assumes a normal distribution for each slice, we can derive that the variance of the mean is proportional to 1/S. Thus instead of calling the simulator, sample means can be drawn from a normal distribution. Each sample mean µ_m can be used to calculate one acceptance probability α_m using equation (10). With these probabilities α_m the acceptance error can be computed. There are two possibilities for making an error:

1. Reject, while we should accept.
2. Accept, while we should reject.

Hence, the total error, conditioned on the uniform sample u ~ U(0, 1), becomes:

    E_u(α) = (1/M) Σ_m [α_m < u]    if u ≤ τ
    E_u(α) = (1/M) Σ_m [α_m ≥ u]    if u > τ    (11)

where τ is the decision threshold. The total unconditional error can then be obtained by integrating out u:

    E(α) = ∫ E_u(α) U(u; 0, 1) du    (12)

This can be done by Monte Carlo integration or using grid values for u. Equation (12) is known as the mean absolute deviation of p(α), which is minimized for τ = median(α) [25]. Hence, in the algorithm the decision threshold is set to the median of the α samples. The pseudocode for the algorithm is given in appendix B.
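The error computation of equations (11) and (12) can be sketched directly: take M sampled acceptance probabilities α_m, set the threshold τ to their median, and average the conditional error over a grid of u values (the grid being one of the integration options mentioned above). The function name and settings are illustrative.

```python
import numpy as np

def mh_error(alphas, n_grid=200):
    """Estimated probability of an MH accept/reject error, equations (11)-(12).

    The decision threshold tau is the median of the sampled acceptance
    probabilities, which minimises the expected error [25]."""
    alphas = np.asarray(alphas)
    tau = np.median(alphas)
    u = np.linspace(0.0, 1.0, n_grid)        # grid approximation of the integral over u
    reject_err = (alphas[None, :] < u[:, None]).mean(axis=1)   # reject while we should accept
    accept_err = (alphas[None, :] >= u[:, None]).mean(axis=1)  # accept while we should reject
    return np.where(u <= tau, reject_err, accept_err).mean()

# Tightly clustered alphas: the accept/reject decision is clear-cut
print(mh_error(np.full(50, 0.9)))                            # 0.0
# Widely spread alphas: the decision is uncertain, so the error is large
print(mh_error(np.random.default_rng(0).uniform(0, 1, 50)))  # roughly 0.25
```

In ASL-ABC this error is compared against the threshold ξ; while it is too large, more simulations are requested and the α_m are re-estimated.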

2.3 KDE Likelihood ABC

The SL-ABC and ASL-ABC algorithms assume that the likelihood function at each parameter location is a Gaussian, which may not be the case. If the underlying density has, for example, multiple modes or a heavy tail, the resulting Gaussian fit can be very poor. Hence, instead of assuming a certain form of the likelihood, a non-parametric estimate can be advantageous in certain scenarios [27, 41, 44]. Moreover, Turner [41] shows that when using a kernel density estimate (KDE) there is no need for sufficient statistics. The acceptance probability when using a KDE becomes:

    α = min(1, [Σ_s k_h(y* − x′_s) π(θ′) q(θ | θ′)] / [Σ_s k_h(y* − x_s) π(θ) q(θ′ | θ)])    (13)

where k_h(·) is a kernel function with bandwidth h. The resulting Metropolis-Hastings step is shown in algorithm 3. Note that at each parameter location the bandwidth is re-estimated. The algorithm performs simulations at both the current and the proposed location each iteration; hence this is the marginal version of the algorithm.

Algorithm 3 The MH-step for KL-ABC
 1: procedure KL-ABC MH-step(q, θ, π(θ), π(x|θ), y*, S)
 2:   θ′ ~ q(θ′ | θ)
 3:   for s ← 1 to S do
 4:     x_s ~sim π(x | θ)
 5:     x′_s ~sim π(x | θ′)
 6:   end for
 7:   Set α using equation (13)
 8:   if U(0, 1) ≤ α then
 9:     return θ′
10:   end if
11:   return θ
12: end procedure
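The KDE likelihood in equation (13) is easy to evaluate with a Gaussian kernel. The sketch below computes it in log space for numerical stability, on a bimodal set of simulated values where a single Gaussian fit would be poor; Silverman's rule of thumb is an assumed bandwidth selector, and all names and data are illustrative.

```python
import numpy as np

def kde_loglik(y_stat, xs, h):
    """log of sum_s k_h(y* - x_s) with a Gaussian kernel, via log-sum-exp."""
    logs = -0.5 * ((y_stat - xs) / h) ** 2 - np.log(h * np.sqrt(2 * np.pi))
    m = logs.max()
    return m + np.log(np.exp(logs - m).sum())

def silverman_bandwidth(xs):
    """Silverman's rule of thumb, a common default for Gaussian KDE."""
    return 1.06 * xs.std(ddof=1) * len(xs) ** (-1 / 5)

rng = np.random.default_rng(5)
# Bimodal simulated values: a Gaussian (synthetic likelihood) fit would put
# much of its mass in the empty valley between the two modes
xs = np.concatenate([rng.normal(-2, 0.3, 25), rng.normal(2, 0.3, 25)])
h = silverman_bandwidth(xs)
ll_mode = kde_loglik(2.0, xs, h)     # at one of the modes
ll_valley = kde_loglik(0.0, xs, h)   # in the valley between the modes
print(ll_mode > ll_valley)           # True: the KDE sees the two modes
```

In the MH-step of algorithm 3, the acceptance ratio of equation (13) is then the exponential of the difference of two such log-likelihoods, times the prior and proposal ratios.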

2.4 Population Methods

Instead of working with one sample of θ at a time, it is also possible to keep a population of samples, sometimes called particles. These are the population Monte Carlo (PMC) and sequential Monte Carlo (SMC) approaches, and they have the advantage that the resulting samples are individually independent. This is in contrast to single-chain ABC-MCMC methods, where thinning is often used to obtain more independent samples.¹ Another advantage of population methods is that they are quite easily parallelized and therefore may result in speedups [13].

A disadvantage of population methods is that they require more simulation calls: for each particle at each iteration a simulation call is needed. Therefore, if you have N particles, population methods need N more simulation calls per iteration compared to the single-chain algorithms. For costly simulators the number of simulation samples needs to be minimized. For this reason, we will only focus on single-chain methods. It should be noted that the ideas behind the proposed algorithms could also be viable for PMC or SMC algorithms.

2.5 Surrogate Methods

For fast simulators, local methods such as SL-ABC and marginal ABC are perfectly fine. If however it is costly to simulate, it is desirable to incorporate the information gained by simulations at all previous locations. Therefore the question becomes: how can the information of the entire simulation history be used in the estimate for the current location? The idea is to aggregate all information of the samples in a surrogate function. This can be done in two ways: either you model the simulator, or you model the likelihood function with the surrogate. Both approaches are described in the next sections. Surrogate methods have been implemented in different research areas before [20, 21, 32], but are relatively new in the field of ABC [25, 45].

¹ Thinning is the process of only keeping every Nth sample. However, to keep a reasonable number of samples the chain then needs to be run for a longer time.

2.5.1 Simulator Surrogate ABC

In the first case, the surrogate function tries to model the simulator, but it should be computationally cheaper to run. Instead of calling the simulator for the current θ location directly, the surrogate function is first queried for its estimate. If the surrogate has small uncertainty about the queried value, the value returned by the surrogate is treated as if it came from the real simulator. Otherwise, if the uncertainty is too high, additional samples are obtained from the simulator. After enough samples, the surrogate should be certain enough in all locations and hence not require any additional simulator calls.

The first surrogate method of this kind is the Gaussian Process Surrogate (GPS-ABC) by Meeds and Welling [25]. As the name suggests, they fitted a Gaussian process to the acquired samples as a surrogate. They showed that a huge reduction in simulator calls is possible, while retaining similar performance. A nice property of modelling the simulator is that previously acquired samples can be used to train the model. Moreover, the results of different runs of the algorithm can be merged to obtain a better approximation.

While Gaussian processes are a potent candidate for surrogate functions, there are a couple of problems that surface in practice. The computational complexity of GPS-ABC is high. Moreover, to be able to run the algorithm, the hyperparameters of the Gaussian process need to be set, which can be difficult to elicit beforehand. Another downside of GPS-ABC is that for simulators with J output dimensions the algorithm models J independent GPs instead of one J-dimensional output. Hence the different outputs are assumed to be independent, which may not be the case. Finally, for multimodal and other non-Gaussian conditional probabilities, the Gaussian assumption of GPS-ABC is incorrect, which leads to incorrect posterior distributions.

2.5.2 Likelihood Surrogate ABC

Instead of modelling the simulator, the (log-)likelihood surface can be modelled. This is much easier than modelling a high-dimensional simulator, as the likelihood is a one-dimensional function of θ. When a decent model of the likelihood surface has been obtained, regions of unlikely space can be ruled out. That is, no more samples should be accepted in regions of low likelihood.

Wilkinson first proposed this approach using Gaussian processes [45]. To discard areas of low likelihood he employs a technique called sequential history matching [9]. This is a hierarchical approach to rule out regions of space. At each step a new model is built, modelling only the region that was labelled plausible by the previous models. This way, each model gives a more detailed view of the log-likelihood surface.

An advantage of this approach over the GPS-ABC algorithm that models the simulator is that only a single Gaussian process is fitted, whereas GPS-ABC has a GP for each dimension of the output. On the other hand, at each sequential history step a new GP is fit and hence needs to be tuned. A downside of this approach is that the resulting likelihood surface is tightly connected to the parameter settings of the experiment. For example, changing the error criterion ε changes the entire likelihood surface, and hence the results of one experiment are not directly useful for another.

3 Kernel ABC Methods

In this section we propose three new algorithms. However, first we give short descriptions of both the existing algorithms and the newly proposed algorithms. A table that provides an overview of the different properties of the various algorithms is also presented.

3.1 Overview of ABC methods

A short description of each algorithm, where the ones denoted with an asterisk (*) will be proposed later in this thesis:

Synthetic Likelihood assumes a Gaussian conditional distribution, which is estimated using a fixed number of local samples.

Adaptive Synthetic Likelihood assumes a Gaussian conditional distribution, which is estimated using an adaptively chosen number of local samples.

KDE Likelihood approximates the conditional distribution using a KDE based on a fixed number of local samples.

Adaptive KDE Likelihood* approximates the conditional distribution using a KDE based on an adaptively chosen number of local samples.

Projected Synthetic Surrogate* assumes a Gaussian conditional distribution. It is estimated using a weighted estimate of mean and variance based on all samples from the simulation history, which are projected onto the current parameter location.

Projected KDE Surrogate* approximates the conditional distribution using a weighted KDE of projected samples from the simulation history.

There are different properties that ABC algorithms can have. We list the existing algorithms as well as the ones that will be proposed later in table 1. For reading convenience the different abbreviations are stated in the Abbreviation column. The column labelled Local states whether the algorithm

is local or global in nature. Local algorithms only use samples at the current parameter location, whereas global methods incorporate the samples from other locations as well. The Parametric column states whether the algorithm makes any assumptions about underlying structures. Finally, whether the algorithm adapts the number of samples to the parameter location is stated in the last column, labelled Adaptive.

Algorithm                      Abbreviation  Local  Parametric  Adaptive
Synthetic Likelihood           SL            Yes    Yes         No
Adaptive Synthetic Likelihood  ASL           Yes    Yes         Yes
KDE Likelihood                 KL            Yes    No          No
Adaptive KDE Likelihood        AKL           Yes    No          Yes
Projected Synthetic Surrogate  PSS           No     Yes         Yes
Projected KDE Surrogate        PKS           No     No          Yes

Table 1: Different ABC algorithms and their properties.

3.2 Adaptive KDE Likelihood ABC

In the same way the adaptive synthetic likelihood differs from the original synthetic likelihood, we built an adaptive version of the KDE Likelihood algorithm. As before, a distribution over acceptance probabilities is needed. Hence we need the variance of the estimator. One solution for this problem is to just obtain additional sets of S simulations, but this is too costly and other methods are preferred.

The first solution is to use well-known asymptotic results for the variance of a KDE. The approximation for the variance is [14]:

    σ²(x) = (π̂(x) / (Nh)) ∫ k(u)² du    (14)

where π̂(x) is the KDE at location x, N is the number of training points, h is the bandwidth, and the integral is known as the roughness of the kernel, which can be computed analytically for the commonly used kernels.
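Equation (14) is cheap to evaluate for a Gaussian kernel, whose roughness ∫ k(u)² du equals 1/(2√π). The sketch below computes a KDE value and its asymptotic variance at a single point; the data and settings are illustrative.

```python
import numpy as np

def kde_and_asymptotic_var(x, data, h):
    """Gaussian-kernel density estimate at x and its asymptotic variance, equation (14)."""
    n = len(data)
    u = (x - data) / h
    f_hat = np.mean(np.exp(-0.5 * u ** 2)) / (h * np.sqrt(2 * np.pi))
    roughness = 1.0 / (2.0 * np.sqrt(np.pi))   # ∫ k(u)^2 du for the Gaussian kernel
    return f_hat, f_hat * roughness / (n * h)

rng = np.random.default_rng(8)
data = rng.normal(0.0, 1.0, 500)
f_hat, var = kde_and_asymptotic_var(0.0, data, h=0.3)
print(f_hat)  # near the true density value 1/sqrt(2*pi) ≈ 0.399
```

As argued next, this variance estimate is only trustworthy for large N, which is precisely the regime an expensive simulator rules out.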

When the variance has been computed, several samples from the normal distribution N(µ(x), σ²(x)) can be drawn to create an acceptance distribution, where µ(x) = π̂(x) is the KDE at location x. There are however two problems with this approach. The asymptotic theory from which equation (14) is derived is based on the scenario that N → ∞, whereas we are in a regime where we specifically want to limit the number of simulation calls. Hence equation (14) is only valid for large values of N. The other problem is that we are now introducing a parametric estimate into the non-parametric algorithm. The goal was to have as few assumptions as possible, so any method that does not assume a Gaussian likelihood is preferred.

One such method is bootstrapping. A bootstrap sample of the simulated values is taken and a new kernel density estimate is performed. This new KDE yields another acceptance probability. Hence, this bootstrap process can be repeated M times to obtain a distribution over acceptance probabilities. This is the approach that is used in our new Adaptive KDE Likelihood (AKL-ABC) algorithm. The pseudocode of AKL-ABC is very similar to that of ASL-ABC; the only difference is how the acceptance probabilities are estimated. The pseudocode is shown in algorithm 4.

3.3 Kernel Surrogate Methods

We propose two new surrogate methods that aim to address the shortcomings of the Gaussian process surrogate method described in section 2.5.1. The reason we chose to model the simulator instead of the likelihood function is that, when modelling the simulator, results for one parameter setting of the algorithm can be reused for a different setting, as the modelled simulator remains the same. This property is especially desirable when working with simulators that are expensive. When modelling the log-likelihood this is not necessarily the case, as the likelihood surface is tightly connected to, for example, the ε parameter.
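The bootstrap step used by AKL-ABC can be sketched as follows: resample the simulated values at the current and proposed locations with replacement, recompute the two KDE values at the observed statistic, and repeat M times to obtain a distribution over acceptance probabilities. This minimal sketch assumes a symmetric proposal and a flat prior, so the ratio in equation (13) reduces to the ratio of the two KDEs; all names and settings are illustrative.

```python
import numpy as np

def gaussian_kde_at(y, xs, h):
    """Gaussian KDE of the samples xs, evaluated at the single point y."""
    return np.mean(np.exp(-0.5 * ((y - xs) / h) ** 2)) / (h * np.sqrt(2 * np.pi))

def bootstrap_alphas(y_stat, xs_cur, xs_prop, h, M, rng):
    """Bootstrap a distribution over MH acceptance probabilities (flat prior, symmetric q)."""
    alphas = np.empty(M)
    for m in range(M):
        bs_cur = rng.choice(xs_cur, size=len(xs_cur), replace=True)
        bs_prop = rng.choice(xs_prop, size=len(xs_prop), replace=True)
        alphas[m] = min(1.0, gaussian_kde_at(y_stat, bs_prop, h)
                        / gaussian_kde_at(y_stat, bs_cur, h))
    return alphas

rng = np.random.default_rng(6)
xs_cur = rng.normal(0.0, 1.0, 20)    # simulated statistics at the current location
xs_prop = rng.normal(0.5, 1.0, 20)   # simulated statistics at the proposed location
alphas = bootstrap_alphas(0.4, xs_cur, xs_prop, h=0.5, M=200, rng=rng)
print(alphas.std())  # the spread measures the uncertainty in alpha
```

The spread of these α_m is exactly what equations (11) and (12) turn into an error estimate: if it is too large, AKL-ABC requests more simulations before deciding.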

Algorithm 4 The MH-step for AKL-ABC.
 1: procedure AKL-ABC-MH-step(q, θ, π(θ), π(x|θ), y, S₀, ΔS, ξ, M)
 2:   θ' ← q(θ'|θ)
 3:   S ← 0
 4:   c ← S₀
 5:   repeat
 6:     for s ← S to S + c do
 7:       x_s ∼ π(x|θ)
 8:       x'_s ∼ π(x|θ')
 9:     end for
10:     S ← S + c
11:     c ← ΔS
12:     for m ← 1 to M do
13:       Obtain bootstrap samples of x₁, …, x_S and x'₁, …, x'_S
14:       Set α_m using equation (13)
15:     end for
16:     τ ← median(α)
17:     Set E(α) using equation (12)
18:   until E(α) ≤ ξ
19:   if U(0, 1) ≤ τ then
20:     return θ'
21:   end if
22:   return θ
23: end procedure

3.3.1 Projected Synthetic Surrogate

The GPS-ABC algorithm uses a Gaussian process as a surrogate function. This GP provides a mean and standard deviation for each parameter setting. The main idea behind Projected Synthetic Surrogate ABC (PSS-ABC) is that we can compute these statistics using kernel regression. The mean and standard deviation are computed with kernel regression using equations (15) and (16):

\hat{f}(\theta') = \hat{\mu}(\theta') = \frac{\sum_{n=1}^{N} k_h(\theta_n - \theta')\, y_n}{\sum_{n=1}^{N} k_h(\theta_n - \theta')}    (15)

Figure 1: Linear corrected projection versus orthogonal projection with different numbers of training points.

\hat{\sigma}^2(\theta') = \frac{\sum_{n=1}^{N} k_h(\theta_n - \theta')\, (y_n - \hat{\mu}(\theta'))^2}{\sum_{n=1}^{N} k_h(\theta_n - \theta')}    (16)

where θ' is the proposed parameter location and θ_n is the nth parameter location from the simulation history, with its corresponding simulator output y_n. The derivation of equation (15) can be found in appendix A.4.

Kernel regression is a local constant estimator. This can be viewed as an orthogonal projection of neighbouring points onto the vertical line at the parameter location θ. A weighted average of the projected points is calculated using the kernel weights; this is the kernel regression estimate \hat{f}(\theta) for that location.

However, projecting orthogonally can lead to overly dispersed estimates, which is illustrated in figure 1. Thus, instead of always projecting orthogonally, we first perform a local linear regression [7] and then project along this line, or hyperplane in higher dimensions. This idea of linear correction has been used in the ABC framework before [4, 5]. The difference is that, up to now, it has been used as a post-processing step,

whereas here it is an integral part of the algorithm. Moreover, because it is a post-processing step, they project onto the θ axis, whereas we project onto a line perpendicular to that.

The locally weighted linear regression (LOWESS) [7] assumes the following regression model:

y = \Theta\beta + \epsilon    (17)

where ε is a vector of zero-mean Gaussian noise, Θ is the N-by-D design matrix consisting of N data points θ_n of dimension D, and β is the D-dimensional vector of regression coefficients. Note that we set θ₀ = 1, and hence β₀ is the intercept.

To compute the estimated value ŷ for a proposed location θ', locally weighted regression first computes a kernel weight w_n for each data point θ_n:

w_n = k_h(\theta_n - \theta')    (18)

Cleveland suggests using the tricube kernel for these weights [7]. Then the following system of equations needs to be solved to obtain the regression coefficients:

\begin{bmatrix}
\sum w & \sum w\Theta_1 & \cdots & \sum w\Theta_D \\
\sum w\Theta_1 & \sum w\Theta_1^2 & \cdots & \sum w\Theta_1\Theta_D \\
\vdots & \vdots & \ddots & \vdots \\
\sum w\Theta_D & \sum w\Theta_D\Theta_1 & \cdots & \sum w\Theta_D^2
\end{bmatrix}
\beta =
\begin{bmatrix}
\sum wy \\ \sum yw\Theta_1 \\ \vdots \\ \sum yw\Theta_D
\end{bmatrix}    (19)

where w is the vector of kernel weights for the N training points and Θ_d denotes the dth column vector of Θ, containing the dth entry of every training point, i.e. Θ₂ = [Θ₁₂, Θ₂₂, …, Θ_N₂]. The resulting solution² for β is the vector of regression coefficients of the linear equation that describes the trend at location θ'. Therefore, this β provides a hyperplane along which the data can be projected.

² Which can easily be solved by most linear algebra packages.
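The local constant estimate of equations (15) and (16) and the local linear fit of equation (19) can be sketched together in one dimension as follows. The names are hypothetical; the Epanechnikov and tricube kernels are used as in the text, and the weighted normal equations are solved directly:

```python
import numpy as np

def epanechnikov(u):
    # Epanechnikov kernel, compactly supported on |u| <= 1
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def kernel_regression(theta_star, thetas, ys, h):
    """Local constant estimate: weighted mean and std, equations (15)-(16)."""
    w = epanechnikov((thetas - theta_star) / h)
    mu = np.sum(w * ys) / np.sum(w)
    var = np.sum(w * (ys - mu) ** 2) / np.sum(w)
    return mu, np.sqrt(var)

def tricube(u):
    # Tricube kernel, as Cleveland suggests for LOWESS weights
    return np.clip(1.0 - np.abs(u) ** 3, 0.0, None) ** 3

def local_linear_beta(theta_star, thetas, ys, h):
    """Solve the weighted normal equations (Theta^T W Theta) beta = Theta^T W y,
    the 1-D case of equation (19); beta = [intercept, slope]."""
    w = tricube((thetas - theta_star) / h)
    X = np.column_stack([np.ones_like(thetas), thetas])  # theta_0 = 1 column
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * ys))

rng = np.random.default_rng(3)
thetas = rng.uniform(-1.0, 1.0, size=300)               # simulation history
ys = 2.0 + 3.0 * thetas + rng.normal(0.0, 0.05, 300)    # linear trend + noise
mu, sigma = kernel_regression(0.0, thetas, ys, h=0.4)
beta = local_linear_beta(0.0, thetas, ys, h=0.4)
```

On a linear trend the local linear β recovers intercept and slope almost exactly, while the local constant σ̂ is inflated by the trend itself: exactly the over-dispersion that motivates the linear correction.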

When β is the zero vector, one is projecting along a flat hyperplane, and thus orthogonally onto the θ slice.

There are border cases where linear correction can get overconfident, for example when there are few samples near the proposed location. In the extreme case the regression is based on only two samples, so in effect it is the line through these two points, which can be very different from the actual trend. To overcome this problem, the algorithm needs to recognise these situations, or assess the uncertainty in them.

An elegant solution is to use smoothed bootstrapping [11]. Smoothed bootstrapping resamples from a set of samples, but unlike ordinary bootstrapping it adds (Gaussian) noise to the newly obtained samples. The standard deviation of this noise depends on the size of the original set of samples, usually σ = 1/√N. As a result there is more variance in resampled values when there are few samples, and hence more uncertainty in the local regression: with a small pool of samples, the computed hyperplanes will vary much more.

The pseudocode for PSS-ABC is shown in algorithm 5. Note that a diagonal covariance matrix is computed, which is equivalent to assuming independence between the output variables. This is because a full-rank covariance matrix is much harder to estimate with few points. The acquisition of new training points at line 21 happens at either the current parameter location θ or the proposed parameter location θ', each with probability 0.5. We note that more sophisticated methods could be implemented, but this simple procedure worked fine in our experiments.

3.4 Projected Kernel Density Estimated Surrogate

Instead of assuming that the conditional distribution π(y|θ) is Gaussian, this conditional distribution can also be approximated using a kernel density estimate. As with PSS-ABC, the points from the simulation history are projected onto the θ slice.
Then, instead of computing the weighted mean and variance of a Gaussian, a weighted kernel density estimate is performed.
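These two ingredients, the smoothed bootstrap of [11] and a weighted KDE evaluated at the observed statistic, can be sketched as follows (hypothetical names; a Gaussian y-kernel is assumed):

```python
import numpy as np

def smoothed_bootstrap(ys, rng):
    """Resample with replacement and add Gaussian noise with sigma = 1/sqrt(N),
    so that fewer original samples yield more variable resamples."""
    ys = np.asarray(ys, dtype=float)
    n = len(ys)
    resampled = rng.choice(ys, size=n, replace=True)
    return resampled + rng.normal(0.0, 1.0 / np.sqrt(n), size=n)

def weighted_kde(y_star, ys, weights, h):
    """Weighted Gaussian KDE of the projected outputs, evaluated at y_star."""
    u = (y_star - np.asarray(ys)) / h
    k = np.exp(-0.5 * u ** 2) / (h * np.sqrt(2.0 * np.pi))
    return np.sum(weights * k) / np.sum(weights)

rng = np.random.default_rng(4)
ys = rng.normal(5.0, 1.0, size=100)   # stand-in for projected simulator outputs
w = np.ones_like(ys)                  # kernel weights, as in equation (18)
boot = smoothed_bootstrap(ys, rng)
density = weighted_kde(5.0, boot, w, h=0.4)
```

In PKS-ABC the weights would come from equation (18) and the values from the projected simulation history; uniform weights and synthetic outputs are used here only to keep the example self-contained.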

Algorithm 5 The MH-step for PSS-ABC.
 1: procedure PSS-ABC-MH-step(q, θ, π(θ), π(x|θ), y, ΔS, M, ξ)
 2:   θ' ← q(θ'|θ)
 3:   repeat
 4:     Compute weights using equation (18)
 5:     for m ← 1 to M do
 6:       Get smoothed bootstrap samples of y
 7:       for j ← 1 to J do
 8:         Compute β_j and β'_j using a local regression at θ and θ'
 9:         Project bootstrapped y_j along hyperplane β_j
10:         Compute μ̂_j, σ̂_j using equations (15) and (16)
11:         Project bootstrapped y_j along hyperplane β'_j
12:         Compute μ̂'_j, σ̂'_j using equations (15) and (16)
13:       end for
14:       μ̂ ← [μ̂₁, …, μ̂_J],  Σ̂ ← diag(σ̂₁, …, σ̂_J)
15:       μ̂' ← [μ̂'₁, …, μ̂'_J],  Σ̂' ← diag(σ̂'₁, …, σ̂'_J)
16:       Set α_m using equation (10)
17:     end for
18:     τ ← median(α)
19:     Set E(α) using equation (12)
20:     if E(α) > ξ then
21:       Acquire ΔS new training points
22:     end if
23:   until E(α) ≤ ξ
24:   if U(0, 1) ≤ τ then
25:     return θ'
26:   end if
27:   return θ
28: end procedure

The resulting MH-step is shown in algorithm 6.

Algorithm 6 The MH-step for PKS-ABC.
 1: procedure PKS-ABC-MH-step(q, θ, ΔS, M, ξ, y*)
 2:   θ' ← q(θ'|θ)
 3:   repeat
 4:     Compute weights using equation (18)
 5:     for m ← 1 to M do
 6:       Get smoothed bootstrap samples of y
 7:       p ← 1,  p' ← 1
 8:       for j ← 1 to J do
 9:         Compute β_j and β'_j using a local regression at θ and θ'
10:         Z_j ← projection of bootstrapped y_j along hyperplane β_j
11:         Z'_j ← projection of bootstrapped y_j along hyperplane β'_j
12:         p ← p · KDE(y*_j, h, Z_j)
13:         p' ← p' · KDE(y*_j, h, Z'_j)
14:       end for
15:       α_m ← min(1, [q(θ|θ') π(θ') p'] / [q(θ'|θ) π(θ) p])
16:     end for
17:     τ ← median(α)
18:     Set E(α) using equation (12)
19:     if E(α) > ξ then
20:       Acquire ΔS new training points
21:     end if
22:   until E(α) ≤ ξ
23:   if U(0, 1) ≤ τ then
24:     return θ'
25:   end if
26:   return θ
27: end procedure

3.5 Ease of Use

The proposed methods aim to be as simple to use as possible. In contrast to Gaussian processes, there is no difficult hyperparameter tuning task, such as choosing the length scales and covariance function. Moreover, GPs have to be monitored for degeneracies over time, which means the algorithm cannot be run as a black box.

The kernel methods described earlier do not have these problems; they are nearly plug-and-play. The kernels and the bandwidth selection method are the only parameters that need to be set. Some motivation for how to set these parameters is given in appendix C. We suggest setting the kernel in the y direction to a kernel with infinite support, such as the Gaussian kernel. For the bandwidth selection, it is known that Silverman's rule of thumb overestimates the bandwidth for non-Gaussian densities. Hence, better performance may be achieved by consistently dividing the computed bandwidth by a fixed value.
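As a sketch of this suggestion, here is one common form of Silverman's rule, h = 1.06 σ̂ N^(−1/5) (the constant in equation (35) of appendix A may differ), with an optional fixed division factor:

```python
import numpy as np

def silverman_bandwidth(samples, shrink=1.0):
    """Silverman's rule of thumb, optionally divided by a fixed factor
    to counter oversmoothing on non-Gaussian densities."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    h = 1.06 * samples.std(ddof=1) * n ** (-0.2)
    return h / shrink

rng = np.random.default_rng(5)
draws = rng.exponential(10.0, size=500)            # a skewed, non-Gaussian sample
h_plain = silverman_bandwidth(draws)
h_shrunk = silverman_bandwidth(draws, shrink=2.0)  # deliberately undersmooth
```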

4 Experiments

We perform three sets of experiments:

1. Exponential problem: a toy Bayesian inference problem.
2. Multimodal problem: a toy Bayesian inference problem with multiple modes at some parameter locations.
3. Blowfly problem: inference of parameters in a chaotic ecological system.

The first two are mathematical toy problems that show the correctness of our proposed algorithms. The goal of the second problem is to illustrate differences between parametric and non-parametric ABC methods. The last experiment is more challenging and gives a view of the performance in a more realistic setting.

4.1 Exponential Toy Problem

The exponential problem is to estimate the posterior distribution of the rate parameter of an exponential distribution. The simulator in this case consists of drawing N = 500 samples from an exponential distribution parametrized by the rate θ. The only statistic is the mean of the N draws. The observed value y = 9.42 was generated using 500 draws from an exponential distribution with θ = 0.1 and a fixed random seed. The prior on θ is a Gamma distribution with parameters α = β = 0.1; note that this is quite a broad prior. We used a log-normal proposal distribution with σ = 0.1. The exponential problem has also been used to test other approaches [25, 42].

In figure 2 the convergence of KL-ABC compared to SL-ABC is shown. On the vertical axis the total variation distance to the true posterior is shown. The total variation distance is the integrated absolute difference between the two probability density functions [22] and is computed as:

D(f, g) = \frac{1}{2} \int |f(x) - g(x)| \, dx    (20)

We approximated the integral using a binned approach.
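The binned approximation of equation (20) can be sketched as follows; the helper names and the Gaussian test densities are illustrative only:

```python
import numpy as np

def total_variation(f, g, lo, hi, bins=2000):
    """Binned approximation of D(f, g) = 0.5 * integral |f - g|, equation (20)."""
    xs = np.linspace(lo, hi, bins)
    dx = xs[1] - xs[0]
    return 0.5 * np.sum(np.abs(f(xs) - g(xs))) * dx

def normal_pdf(mu, sigma):
    # Returns the density function of a Normal(mu, sigma^2)
    return lambda x: np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

d_same = total_variation(normal_pdf(0, 1), normal_pdf(0, 1), -8, 8)
d_diff = total_variation(normal_pdf(0, 1), normal_pdf(3, 1), -8, 12)
```

Two identical densities give distance zero, and the distance approaches one as the densities' supports become disjoint.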

Figure 2: Convergence of KL-ABC vs SL-ABC on the exponential problem. For both algorithms S was set to 20. The ε of SL-ABC was set to zero. Bandwidth h was computed using Silverman's rule of thumb. Results are averaged over 10 runs. Shaded areas denote 2 standard deviations.

The results in figure 2 are averaged over 10 runs and show that both approaches have similar performance. The marginal versions of the algorithms were used, which means both numerator and denominator are re-estimated each iteration. KL-ABC has a better approximation after fewer samples; however, after 10K samples the SL-ABC algorithm has slightly lower bias. The S parameter, which controls the number of simulations at each location, was fixed to 20 for both algorithms. In the right plot of figure 2 it can be seen that both algorithms used the same number of simulation calls.

MCMC runs were also performed for the adaptive variants of both algorithms, ASL-ABC and AKL-ABC. The results are shown in figure 3. The error-controlling parameter ξ was set to 0.1 for both algorithms. The initial number of simulations S₀ was set to 10 and ΔS to 2. Bandwidths were estimated using Silverman's rule (equation (35) in appendix A). After 10K samples AKL performs slightly better on average than ASL, but the errors are quite close. Therefore it is more informative to look at the numbers of simulations needed, which are

Figure 3: Convergence of AKL-ABC vs ASL-ABC on the exponential problem. For both algorithms S₀ was set to 10. The ε of ASL-ABC was set to zero. The ξ was set to 0.1. Results are averaged over 10 runs.

shown in the left subplot of figure 3. It can be seen that AKL does need more simulations than ASL to obtain the 10K samples: approximately 5 simulation calls more per sample on average.

The results of the global algorithms PSS-ABC and PKS-ABC are shown in figure 4. Recall that PSS-ABC is the parametric algorithm and the global counterpart of ASL-ABC, whereas PKS-ABC is non-parametric and the global version of AKL-ABC. The same ξ was used for both algorithms. For the horizontal kernel, on the θ axis, the Epanechnikov kernel (equation (30) in appendix A) was used for both algorithms; PKS-ABC employs a Gaussian kernel in the y direction. The initial number of simulations was set to 10 for both algorithms, and Silverman's rule was used to set bandwidths. For both algorithms we projected orthogonally, i.e. no local regression was performed. This is because we want to show the sensitivity of PSS to outliers; with a linear correction there are fewer outliers and the effect is less pronounced.

It can be seen that the PSS algorithm performs quite poorly in terms of total variation distance. The posterior distribution that it obtains is, however,

Figure 4: Convergence of PKS-ABC versus PSS-ABC on the exponential problem. For both algorithms S₀ was set to 10 and ΔS = 2. Results are averaged over 10 runs.

closer to the true posterior than the total variation distance might imply. This is illustrated in figure 5. It can be seen that the PSS posterior is overly dispersed and shifted. The reason for this is illustrated in figure 6. From this we can see that the kernel regression always overestimates the true function. Since the estimate of the kernel regression is used as the mean of the Gaussian in PSS-ABC, it will always overestimate the mean, and as a result the obtained posterior is shifted.

The conditional distributions of both PSS-ABC and PKS-ABC for θ = 0.1 are also shown. Note that the Gaussian distribution of PSS-ABC is very flat. This is because there are some projected samples, from locations close to zero, with y ≈ 500, which cause the computed standard deviation to increase rapidly. A higher standard deviation leads to a wider posterior. The PKS-ABC algorithm, which does not employ a Gaussian approximation, does not suffer from these problems. The conditional distribution it infers has a heavy tail, as can be seen in figure 6. The points projected from locations close to zero, i.e. θ = 0.002, only cause small bumps in the distribution. For example, in the conditional distribution in figure 6 there is

Figure 5: The posterior distribution obtained by PSS-ABC versus the true posterior of the exponential problem. The vertical line is the true setting θ = 0.1.

Figure 6: The failure of kernel regression to model the exponential mean function properly. The vertical line is the true setting θ = 0.1, the horizontal line is the observed value y = 9.42. The conditional distributions of both PSS and PKS are also shown. These are computed on the same data points and both share the same outliers.

a bump at the location y = 513, but this does not influence the mode at y.³

A notable result is that the number of simulation calls keeps increasing. We think the recalculation of the bandwidth causes this. In general, the more training points, the smaller the optimal bandwidth.⁴ As additional training samples are obtained, the optimal bandwidth will (in general) become smaller. With a smaller bandwidth, fewer points are effectively included in an estimate and hence the variance of the estimate increases. This increased variance leads to increased uncertainty, and hence to the acquisition of additional training points. It should be noted that the right plot in figure 4 is shown on a log-log scale, and hence the rate at which the number of simulation calls increases is itself decreasing.

Note that both global methods require far fewer simulation calls than their local counterparts: after 10K samples PSS has performed 1999 simulation calls and PKS 485. Compared to the number of calls made by AKL or ASL, this is a big gain.

4.2 Multimodal Problem

Multimodal distributions are generally poorly modelled by a Gaussian distribution. A multimodal distribution will therefore be used to illustrate the shortcomings of some of the algorithms. For this experiment, the simulator consists of a mixture of two functions. The resulting function can be described formally as:

Multimodal(\theta) =
\begin{cases}
\sin(\theta) + \epsilon & \text{with probability } \rho \\
-\sin(\theta) + \epsilon & \text{with probability } 1 - \rho
\end{cases}    (21)

where ε is Gaussian noise with σ = 0.5 and ρ controls the level of multimodality. We used a value of 0.5 for ρ. The observed value y was set to 0.7.

³ Note that this is not shown in figure 6, because otherwise the figure would become too cluttered.
⁴ This is also reflected in Silverman's rule (equation (35) in appendix A) in the division by the number of training points.
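A sketch of the simulator of equation (21), assuming the second mixture branch flips the sign of sin(θ), which is what makes the output bimodal (the names and defaults are illustrative):

```python
import numpy as np

def multimodal_simulator(theta, rho=0.5, sigma=0.5, size=1, rng=None):
    """Mixture simulator of equation (21): sin(theta) + eps with probability rho,
    -sin(theta) + eps with probability 1 - rho, where eps ~ N(0, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    signs = np.where(rng.random(size) < rho, 1.0, -1.0)
    return signs * np.sin(theta) + rng.normal(0.0, sigma, size)

rng = np.random.default_rng(6)
draws = multimodal_simulator(np.pi / 2, size=4000, rng=rng)  # modes near +1 and -1
```

At θ = π/2 the two branches put the modes near ±1, so the draws are symmetric about zero with a standard deviation well above the noise level σ.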


More information

Topics in Machine Learning-EE 5359 Model Assessment and Selection

Topics in Machine Learning-EE 5359 Model Assessment and Selection Topics in Machine Learning-EE 5359 Model Assessment and Selection Ioannis D. Schizas Electrical Engineering Department University of Texas at Arlington 1 Training and Generalization Training stage: Utilizing

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Overview of Part Two Probabilistic Graphical Models Part Two: Inference and Learning Christopher M. Bishop Exact inference and the junction tree MCMC Variational methods and EM Example General variational

More information

The Curse of Dimensionality

The Curse of Dimensionality The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more

More information

Approximate Bayesian Computation using Auxiliary Models

Approximate Bayesian Computation using Auxiliary Models Approximate Bayesian Computation using Auxiliary Models Tony Pettitt Co-authors Chris Drovandi, Malcolm Faddy Queensland University of Technology Brisbane MCQMC February 2012 Tony Pettitt () ABC using

More information

Using the DATAMINE Program

Using the DATAMINE Program 6 Using the DATAMINE Program 304 Using the DATAMINE Program This chapter serves as a user s manual for the DATAMINE program, which demonstrates the algorithms presented in this book. Each menu selection

More information

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave.

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave. LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave. http://en.wikipedia.org/wiki/local_regression Local regression

More information

10701 Machine Learning. Clustering

10701 Machine Learning. Clustering 171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions

More information

INLA: an introduction

INLA: an introduction INLA: an introduction Håvard Rue 1 Norwegian University of Science and Technology Trondheim, Norway May 2009 1 Joint work with S.Martino (Trondheim) and N.Chopin (Paris) Latent Gaussian models Background

More information

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Performance Estimation and Regularization Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Bias- Variance Tradeoff Fundamental to machine learning approaches Bias- Variance Tradeoff Error due to Bias:

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR Detecting Missing and Spurious Edges in Large, Dense Networks Using Parallel Computing Samuel Coolidge, sam.r.coolidge@gmail.com Dan Simon, des480@nyu.edu Dennis Shasha, shasha@cims.nyu.edu Technical Report

More information

Statistical techniques for data analysis in Cosmology

Statistical techniques for data analysis in Cosmology Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction

More information

STAT 725 Notes Monte Carlo Integration

STAT 725 Notes Monte Carlo Integration STAT 725 Notes Monte Carlo Integration Two major classes of numerical problems arise in statistical inference: optimization and integration. We have already spent some time discussing different optimization

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup Random Variables: Y i =(Y i1,...,y ip ) 0 =(Y i,obs, Y i,miss ) 0 R i =(R i1,...,r ip ) 0 ( 1

More information

Time Series Analysis by State Space Methods

Time Series Analysis by State Space Methods Time Series Analysis by State Space Methods Second Edition J. Durbin London School of Economics and Political Science and University College London S. J. Koopman Vrije Universiteit Amsterdam OXFORD UNIVERSITY

More information

Model validation T , , Heli Hiisilä

Model validation T , , Heli Hiisilä Model validation T-61.6040, 03.10.2006, Heli Hiisilä Testing Neural Models: How to Use Re-Sampling Techniques? A. Lendasse & Fast bootstrap methodology for model selection, A. Lendasse, G. Simon, V. Wertz,

More information

Integration. Volume Estimation

Integration. Volume Estimation Monte Carlo Integration Lab Objective: Many important integrals cannot be evaluated symbolically because the integrand has no antiderivative. Traditional numerical integration techniques like Newton-Cotes

More information

The Plan: Basic statistics: Random and pseudorandom numbers and their generation: Chapter 16.

The Plan: Basic statistics: Random and pseudorandom numbers and their generation: Chapter 16. Scientific Computing with Case Studies SIAM Press, 29 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit IV Monte Carlo Computations Dianne P. O Leary c 28 What is a Monte-Carlo method?

More information

Clustering Relational Data using the Infinite Relational Model

Clustering Relational Data using the Infinite Relational Model Clustering Relational Data using the Infinite Relational Model Ana Daglis Supervised by: Matthew Ludkin September 4, 2015 Ana Daglis Clustering Data using the Infinite Relational Model September 4, 2015

More information

Probabilistic Robotics

Probabilistic Robotics Probabilistic Robotics Discrete Filters and Particle Filters Models Some slides adopted from: Wolfram Burgard, Cyrill Stachniss, Maren Bennewitz, Kai Arras and Probabilistic Robotics Book SA-1 Probabilistic

More information

Section 4 Matching Estimator

Section 4 Matching Estimator Section 4 Matching Estimator Matching Estimators Key Idea: The matching method compares the outcomes of program participants with those of matched nonparticipants, where matches are chosen on the basis

More information

9.1. K-means Clustering

9.1. K-means Clustering 424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific

More information

Bagging for One-Class Learning

Bagging for One-Class Learning Bagging for One-Class Learning David Kamm December 13, 2008 1 Introduction Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one

More information

Image analysis. Computer Vision and Classification Image Segmentation. 7 Image analysis

Image analysis. Computer Vision and Classification Image Segmentation. 7 Image analysis 7 Computer Vision and Classification 413 / 458 Computer Vision and Classification The k-nearest-neighbor method The k-nearest-neighbor (knn) procedure has been used in data analysis and machine learning

More information

Motivation. Technical Background

Motivation. Technical Background Handling Outliers through Agglomerative Clustering with Full Model Maximum Likelihood Estimation, with Application to Flow Cytometry Mark Gordon, Justin Li, Kevin Matzen, Bryce Wiedenbeck Motivation Clustering

More information

Use of Extreme Value Statistics in Modeling Biometric Systems

Use of Extreme Value Statistics in Modeling Biometric Systems Use of Extreme Value Statistics in Modeling Biometric Systems Similarity Scores Two types of matching: Genuine sample Imposter sample Matching scores Enrolled sample 0.95 0.32 Probability Density Decision

More information

Economics Nonparametric Econometrics

Economics Nonparametric Econometrics Economics 217 - Nonparametric Econometrics Topics covered in this lecture Introduction to the nonparametric model The role of bandwidth Choice of smoothing function R commands for nonparametric models

More information

Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013

Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013 Your Name: Your student id: Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013 Problem 1 [5+?]: Hypothesis Classes Problem 2 [8]: Losses and Risks Problem 3 [11]: Model Generation

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup For our analysis goals we would like to do: Y X N (X, 2 I) and then interpret the coefficients

More information

Random projection for non-gaussian mixture models

Random projection for non-gaussian mixture models Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,

More information

ECSE-626 Project: An Adaptive Color-Based Particle Filter

ECSE-626 Project: An Adaptive Color-Based Particle Filter ECSE-626 Project: An Adaptive Color-Based Particle Filter Fabian Kaelin McGill University Montreal, Canada fabian.kaelin@mail.mcgill.ca Abstract The goal of this project was to discuss and implement a

More information

The Bootstrap and Jackknife

The Bootstrap and Jackknife The Bootstrap and Jackknife Summer 2017 Summer Institutes 249 Bootstrap & Jackknife Motivation In scientific research Interest often focuses upon the estimation of some unknown parameter, θ. The parameter

More information

Chapter 6: Examples 6.A Introduction

Chapter 6: Examples 6.A Introduction Chapter 6: Examples 6.A Introduction In Chapter 4, several approaches to the dual model regression problem were described and Chapter 5 provided expressions enabling one to compute the MSE of the mean

More information

Missing Data Missing Data Methods in ML Multiple Imputation

Missing Data Missing Data Methods in ML Multiple Imputation Missing Data Missing Data Methods in ML Multiple Imputation PRE 905: Multivariate Analysis Lecture 11: April 22, 2014 PRE 905: Lecture 11 Missing Data Methods Today s Lecture The basics of missing data:

More information

Understanding Clustering Supervising the unsupervised

Understanding Clustering Supervising the unsupervised Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data

More information

1

1 Zeros&asymptotes Example 1 In an early version of this activity I began with a sequence of simple examples (parabolas and cubics) working gradually up to the main idea. But now I think the best strategy

More information

Bootstrapping Methods

Bootstrapping Methods Bootstrapping Methods example of a Monte Carlo method these are one Monte Carlo statistical method some Bayesian statistical methods are Monte Carlo we can also simulate models using Monte Carlo methods

More information

Clustering Lecture 5: Mixture Model

Clustering Lecture 5: Mixture Model Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics

More information

Artificial Intelligence for Robotics: A Brief Summary

Artificial Intelligence for Robotics: A Brief Summary Artificial Intelligence for Robotics: A Brief Summary This document provides a summary of the course, Artificial Intelligence for Robotics, and highlights main concepts. Lesson 1: Localization (using Histogram

More information

Bayesian Estimation for Skew Normal Distributions Using Data Augmentation

Bayesian Estimation for Skew Normal Distributions Using Data Augmentation The Korean Communications in Statistics Vol. 12 No. 2, 2005 pp. 323-333 Bayesian Estimation for Skew Normal Distributions Using Data Augmentation Hea-Jung Kim 1) Abstract In this paper, we develop a MCMC

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Chapters 5-6: Statistical Inference Methods

Chapters 5-6: Statistical Inference Methods Chapters 5-6: Statistical Inference Methods Chapter 5: Estimation (of population parameters) Ex. Based on GSS data, we re 95% confident that the population mean of the variable LONELY (no. of days in past

More information

Robotics. Lecture 5: Monte Carlo Localisation. See course website for up to date information.

Robotics. Lecture 5: Monte Carlo Localisation. See course website  for up to date information. Robotics Lecture 5: Monte Carlo Localisation See course website http://www.doc.ic.ac.uk/~ajd/robotics/ for up to date information. Andrew Davison Department of Computing Imperial College London Review:

More information

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013 Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork

More information

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential

More information

Tree-GP: A Scalable Bayesian Global Numerical Optimization algorithm

Tree-GP: A Scalable Bayesian Global Numerical Optimization algorithm Utrecht University Department of Information and Computing Sciences Tree-GP: A Scalable Bayesian Global Numerical Optimization algorithm February 2015 Author Gerben van Veenendaal ICA-3470792 Supervisor

More information

Theoretical Concepts of Machine Learning

Theoretical Concepts of Machine Learning Theoretical Concepts of Machine Learning Part 2 Institute of Bioinformatics Johannes Kepler University, Linz, Austria Outline 1 Introduction 2 Generalization Error 3 Maximum Likelihood 4 Noise Models 5

More information

Chapter 6: DESCRIPTIVE STATISTICS

Chapter 6: DESCRIPTIVE STATISTICS Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling

More information

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit IV Monte Carlo

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit IV Monte Carlo Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit IV Monte Carlo Computations Dianne P. O Leary c 2008 1 What is a Monte-Carlo

More information

Ranking Clustered Data with Pairwise Comparisons

Ranking Clustered Data with Pairwise Comparisons Ranking Clustered Data with Pairwise Comparisons Alisa Maas ajmaas@cs.wisc.edu 1. INTRODUCTION 1.1 Background Machine learning often relies heavily on being able to rank the relative fitness of instances

More information