Bootstrapping Methods


1 Bootstrapping Methods: an example of a Monte Carlo method. Bootstrapping is one Monte Carlo statistical method; some Bayesian statistical methods are also Monte Carlo methods; and we can also simulate models using Monte Carlo methods.

2 Monte Carlo Methods: Resampling Methods (Bootstrapping, Jackknife, Permutation Tests); Simulation; MCMC; Stochastic Diff Eq

3 EXAMPLE: Are these data generated by the same psychometric function or not? We've talked about how to do this using maximum likelihood methods. [figure: accuracy vs. motion coherence, plotted separately for luminance #1 and luminance #2]

4 full model vs. restricted model: G = 2[ln L_F - ln L_R]. G is distributed as a χ² with df = (# parameters in full model) - (# parameters in restricted model). What does this mean?

5 full model vs. restricted model: G = 2[ln L_F - ln L_R]. G is distributed as a χ² with df = (# parameters in full model) - (# parameters in restricted model). This actually means that G is asymptotically distributed as a χ² as the number of observations goes to infinity (recall the central limit theorem, where the mean of any distribution is normally distributed as the number of observations goes to infinity).

6 full model vs. restricted model: G = 2[ln L_F - ln L_R]. G is distributed as a χ² with df = (# parameters in full model) - (# parameters in restricted model). This actually means that G is asymptotically distributed as a χ² as the number of observations goes to infinity. It can actually deviate considerably from a χ² when the number of observations is low, and what's low depends a lot on the functional form of the models being compared (unlike the central limit theorem, which works well even when the number of observations is really low).

7 full model vs. restricted model: G = 2[ln L_F - ln L_R]. Imagine that the data were actually generated by the restricted model. We would expect by chance that the full model, with more free parameters, would fit better than the restricted model (even though the data were actually generated by the restricted model). Asymptotic statistical theory tells us that G should be distributed as a χ² with df given by the number of constrained parameters in the restricted model.

8 full model vs. restricted model: G = 2[ln L_F - ln L_R]. [figure: χ² density with upper 5% tail shaded] This is the χ² based on asymptotic statistical theory; there is only a 5% chance of getting a G in this range if the data were generated by the restricted model. That depends on the χ² being the right distribution.

9 full model vs. restricted model: G = 2[ln L_F - ln L_R]. [figure: χ² density with upper 5% tail shaded] This is the χ² based on asymptotic statistical theory; there is only a 5% chance of getting a G in this range if the data were generated by the restricted model, and that depends on the χ² being the right distribution. Is there another way to generate a distribution of expected G values under the assumption that the data were generated by the restricted model?

10 Parametric Bootstrapping Technique 1. Fit the restricted and full model to the observed data like always, calculating G.

11 Parametric Bootstrapping Technique 1. Fit the restricted and full model to the observed data like always, calculating G. 2. Now, use the best-fitting parameters from the restricted model. Use a Monte Carlo simulation to generate simulated data produced by the restricted model.

12 Parametric Bootstrapping Technique 1. Fit the restricted and full model to the observed data like always, calculating G. 2. Now, use the best-fitting parameters from the restricted model. Use a Monte Carlo simulation to generate simulated data produced by the restricted model. 3. Fit the restricted and full model to the simulated data, calculating G. The simulated data came from the restricted model, and you're fitting both the restricted and full model, so this is the distribution expected in G by chance.

13 Parametric Bootstrapping Technique 1. Fit the restricted and full model to the observed data like always, calculating G. 2. Now, use the best-fitting parameters from the restricted model. Use a Monte Carlo simulation to generate simulated data produced by the restricted model. 3. Fit the restricted and full model to the simulated data, calculating G. The simulated data came from the restricted model, and you're fitting both the restricted and full model, so this is the distribution expected in G by chance. 4. Do #2 and #3 thousands of times. Generate a histogram of G.

14 Parametric Bootstrapping Technique 1. Fit the restricted and full model to the observed data like always, calculating G. 2. Now, use the best-fitting parameters from the restricted model. Use a Monte Carlo simulation to generate simulated data produced by the restricted model. 3. Fit the restricted and full model to the simulated data, calculating G. The simulated data came from the restricted model, and you're fitting both the restricted and full model, so this is the distribution expected in G by chance. 4. Do #2 and #3 thousands of times. Generate a histogram of G. 5. Is the observed G in the upper 5% tail?
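In code, the whole procedure is one loop. Here is a minimal MATLAB sketch; fitRestricted and fitFull (each returning a maximized ln L) and simulateRestricted (generating one fake data set from the restricted model's best-fitting parameters) are hypothetical helper functions, not functions from the course code.

    % Parametric bootstrap of the G statistic
    [lnLR, paramsR] = fitRestricted(data);      % step 1: fit both models to the observed data
    lnLF = fitFull(data);
    Gobs = 2*(lnLF - lnLR);                     % observed G

    nBoot = 5000;
    Gboot = zeros(nBoot,1);
    for i = 1:nBoot
        simData = simulateRestricted(paramsR);  % step 2: Monte Carlo data from the restricted fit
        lnLRsim = fitRestricted(simData);       % step 3: fit both models to the simulated data
        lnLFsim = fitFull(simData);
        Gboot(i) = 2*(lnLFsim - lnLRsim);       % step 4: accumulate the chance distribution of G
    end

    pValue = mean(Gboot >= Gobs);               % step 5: is the observed G in the upper 5% tail?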

15 [figure: accuracy vs. motion coherence for luminance #1 and luminance #2] Full Model: α1, β1, γ1, λ1 for the luminance #1 curve; α2, β2, γ2, λ2 for the luminance #2 curve. Restricted Model: α, β, γ, λ for the luminance #1 curve; the same α, β, γ, λ for the luminance #2 curve.

16 see week7_psychometric_function.m

17 [figure]

18 Confidence Intervals on Parameter Estimates What is the confidence interval on a model parameter estimate?

19 The previous discussion was about inferences about models. Now we'll talk a bit about inferences about parameter values.

20 Confidence Intervals on Parameter Estimates if you repeated the experiment, 95% of the time the parameter estimate would fall in that interval

21 Confidence Intervals on Parameter Estimates: if you repeated the experiment, 95% of the time the parameter estimate would fall in that interval. We don't want to run the experiment 1000's of times.

22 Confidence Intervals and Standard Errors of Parameter Estimates: the confidence interval is related to the steepness or shallowness of the error surface. Steep = small confidence interval; shallow = large confidence interval.

23 Confidence Intervals and Standard Errors of Parameter Estimates [figure: error surface over a parameter w1]

24 [figure: joint distributions of parm 1 and parm 2] Small SE's = small confidence intervals; large SE's = large confidence intervals.

25 How do we calculate confidence intervals and standard errors of parameter estimates? * We could do a full grid search around the best-fitting values.

26 How do we calculate confidence intervals and standard errors of parameter estimates? * We could do a full grid search around the best-fitting values. With 2-3 parameters, that's feasible: 100 x 100 x 100 = 1,000,000 iterations. With 50 parameters, it's impossible: 100^50 = 10^100 = a googol; it would take longer than the life of the universe.

27 How do we calculate confidence intervals and standard errors of parameter estimates? What measure lets us tell the difference between these? [figure: two ln L curves over a parameter, one sharply peaked, one shallow]

28 How do we calculate confidence intervals and standard errors of parameter estimates? * We could use the Hessian. What measure lets us tell the difference between these? [figure: the same two ln L curves]

29 How do we calculate confidence intervals and standard errors of parameter estimates? * We could use the Hessian. What measure lets us tell the difference between these? Not the first derivative: at the maximum, ∂ln L/∂p = 0 for both curves.

30 How do we calculate confidence intervals and standard errors of parameter estimates? * We could use the Hessian. What measure lets us tell the difference between these? The second derivative: ∂²ln L/∂p² << 0 for the steep curve, ∂²ln L/∂p² < 0 for the shallow curve.

31 How do we calculate confidence intervals and standard errors of parameter estimates? * We could use the Hessian (the matrix of partial 2nd derivatives). The SE is an inverse function of the second derivative. [figure: ∂²ln L/∂p² << 0 for the steep curve; ∂²ln L/∂p² < 0 for the shallow curve]

32-36 How do we calculate confidence intervals and standard errors of parameter estimates? * We could use the Hessian. [figure: ln L surface over parm 1 and parm 2, annotated with its curvatures] For two parameters p1 and p2, the relevant second derivatives are ∂²ln L/∂p1², ∂²ln L/∂p2², and the cross derivative ∂²ln L/∂p1∂p2 = ∂/∂p1(∂ln L/∂p2). Collected into a matrix, this is the Hessian:

H(ln L) = [ ∂²ln L/∂p1²    ∂²ln L/∂p1∂p2 ]
          [ ∂²ln L/∂p2∂p1  ∂²ln L/∂p2²   ]

37 How do we calculate confidence intervals and standard errors of parameter estimates? * We could use the Hessian. Evaluate the Hessian at the maximum of ln L, then take the inverse of the Hessian, H⁻¹; that gives the variance/covariance matrix of the parameter estimates.

38 How do we calculate confidence intervals and standard errors of parameter estimates? * We could use the Hessian. Evaluate the Hessian at the maximum of ln L; the SE's come from the variance/covariance matrix, and CI(p1) = p1 ± zα SE(p1).
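A minimal numerical sketch of this recipe, assuming a hypothetical function logLik(p) that returns ln L for a parameter column vector p, with pHat holding the maximum-likelihood estimates already found (in practice, an optimizer such as fminunc can also return a Hessian for you):

    % Finite-difference Hessian of ln L at the maximum, inverted to get SEs
    n = numel(pHat);
    h = 1e-4;                                % finite-difference step size
    H = zeros(n);
    for i = 1:n
        for j = 1:n
            ei = zeros(n,1); ei(i) = h;
            ej = zeros(n,1); ej(j) = h;
            % central second difference approximating d2 lnL / dpi dpj
            H(i,j) = (logLik(pHat+ei+ej) - logLik(pHat+ei-ej) ...
                    - logLik(pHat-ei+ej) + logLik(pHat-ei-ej)) / (4*h^2);
        end
    end
    covMat = inv(-H);                        % inverse of the negative Hessian at the max
    SE = sqrt(diag(covMat));                 % standard errors of the parameter estimates
    CI = [pHat - 1.96*SE, pHat + 1.96*SE];   % 95% intervals, z_alpha = 1.96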

39 [figure: joint distributions of parm 1 and parm 2] Small SE's = small confidence intervals; large SE's = large confidence intervals. What factors might lead to small or large SE's/CI's?

40 What factors might lead to small or large SE's/CI's? The confidence interval is the range of values reflecting the true parameter value with 95% certainty.

41 What factors might lead to small or large SE's/CI's? * A property of the model: how changes in parameter values produce changes in predictions. * A property of the data itself: how confident you are of the data, which is a function of the number of observations.

42 see week7_psychometric_function.m

43 [figure]

44 As was the case for testing G using the χ² test, use of the Hessian to calculate confidence intervals on parameter estimates assumes that the number of datapoints is large. Maximum likelihood estimators are asymptotically normal. Asymptotic normality is violated when the number of datapoints is small. From the inverse of the Hessian: CI(p1) = p1 ± zα SE(p1), assuming a normal distribution to calculate 95% confidence intervals.

45 As was the case for testing G using the χ² test, use of the Hessian to calculate confidence intervals on parameter estimates assumes that the number of datapoints is large. Maximum likelihood estimators are asymptotically normal. Asymptotic normality is violated when the number of datapoints is small. Confidence Intervals via Nonparametric Bootstrapping - or - Parametric Bootstrapping

46 Bootstrapping: If the model is the right model of cognition or perception, then the parameters found by maximum likelihood methods are estimates of the true parameters underlying behavior.

47 Bootstrapping: If the model is the right model of cognition or perception, then the parameters found by maximum likelihood methods are estimates of the true parameters underlying behavior. Since these are estimates, the best-fitting parameter values are within some range of the true parameter values. (Compare the sample mean: x̄ = Σxᵢ/n estimates the true mean with standard error σ_M = σ/√n.)

48 Bootstrapping: If the model is the right model of cognition or perception, then the parameters found by maximum likelihood methods are estimates of the true parameters underlying behavior. Since these are estimates, the best-fitting parameter values are within some range of the true parameter values. One way to figure out what the true parameter values really are would be to do many, many replications of the same experiment under the same conditions with the same subjects. But that's way too costly.

49 Bootstrapping: If the model is the right model of cognition or perception, then the parameters found by maximum likelihood methods are estimates of the true parameters underlying behavior. Since these are estimates, the best-fitting parameter values are within some range of the true parameter values. One way to figure out what the true parameter values really are would be to do many, many replications of the same experiment under the same conditions with the same subjects. But that's way too costly. But is there some way to use the data itself to simulate replications of the original experiment, at least to get an estimate of the variability of the estimates?

50 That's what Nonparametric Bootstrapping does. The first thing you do is fit the model to the observed data and calculate the maximum likelihood parameter estimates, just as we've been doing all along.

51 That's what Nonparametric Bootstrapping does. The first thing you do is fit the model to the observed data and calculate the maximum likelihood parameter estimates, just as we've been doing all along. The next thing you do is create replications of the experiment, using the observed data, and calculate the maximum likelihood estimates for each replication.

52 That's what Nonparametric Bootstrapping does. The first thing you do is fit the model to the observed data and calculate the maximum likelihood parameter estimates, just as we've been doing all along. The next thing you do is create replications of the experiment, using the observed data, and calculate the maximum likelihood estimates for each replication. For each replication, you have the best-fitting parameter estimates. The distribution of parameter estimates across all these replications gives you the variability of the parameter estimates. The 95% confidence interval for each parameter is found by examining the distribution of parameter values in the range from the 2.5% to 97.5% values.

53 What are the replications? nonparametric bootstrapping. Original Data: motion #1: Y Y Y Y Y Y Y Y Y Y N N N N N N; motion #2: Y Y Y Y Y N N N N N N N N N N N. Randomly sample from the original data with replacement.

54-67 [animation: a bootstrap simulated replication is built one sample at a time by drawing from the original data with replacement, first filling the motion #1 trials, then the motion #2 trials] Bootstrap Simulated Replication: motion #1: Y N Y Y N Y N Y Y Y N Y Y Y N N; motion #2: Y N N Y N N N N N N N N Y N Y N.

68 Fit the model to the bootstrap simulated replication.

69-72 [animation: a second bootstrap simulated replication is drawn the same way] Bootstrap Simulated Replication: motion #1: Y Y N Y Y Y N N N N Y Y Y Y Y N; motion #2: N Y Y N N N N N N N N Y N Y N N.

73 Why with replacement??? (Sampling with replacement makes each replication an independent draw from the empirical distribution of the data; sampling without replacement would just reproduce the original data set every time.)

74 Fit the model to the bootstrap simulated replication.

75 *** Do this about 5000 times ***
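As a minimal MATLAB sketch, assuming a hypothetical fitModel function that returns the best-fitting parameter vector for a set of trials, a trials array holding the raw trial-level data, and a known parameter count:

    nBoot = 5000;
    nTrials = numel(trials);
    nParams = 4;                               % e.g., alpha, beta, gamma, lambda
    bootParams = zeros(nBoot, nParams);
    for i = 1:nBoot
        idx = randi(nTrials, nTrials, 1);      % sample trial indices WITH replacement
        bootParams(i,:) = fitModel(trials(idx)); % refit the model to the replication
    end
    % 95% CI: the 2.5th to 97.5th percentiles of each parameter's distribution
    ciLow  = prctile(bootParams, 2.5);
    ciHigh = prctile(bootParams, 97.5);

In practice you would resample within each stimulus condition separately, as in the slides, so every replication keeps the same number of trials per condition.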

76 [figure]

77 see week7_psychometric_function.m

78 What are the replications? parametric bootstrapping. Original Data: motion #1: Y Y Y Y Y Y Y Y Y Y N N N N N N; motion #2: Y Y Y Y Y N N N N N N N N N N N. With nonparametric bootstrapping, the simulated replications come from the data (hence nonparametric). With parametric bootstrapping, the simulated replications instead come from simulating the best-fitting model (hence parametric). Bootstrap Simulated Replication: motion #1: Y N Y Y N Y N Y Y Y N Y Y Y N N; motion #2: Y N N Y N N N N N N N N Y N Y N.

79 What are the replications? parametric bootstrapping. Use the best-fitting model to generate data. Fit the model to the bootstrap simulated replication. Bootstrap Simulated Replication: motion #1: Y N Y Y N Y N Y Y Y N Y Y Y N N; motion #2: Y N N Y N N N N N N N N Y N Y N.

80 What are the replications? parametric bootstrapping. Use the best-fitting model to generate data. *** Do this about 5000 times *** Bootstrap Simulated Replication: motion #1: Y N Y Y N Y N Y Y Y N Y Y Y N N; motion #2: Y N N Y N N N N N N N N Y N Y N.
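The loop differs from the nonparametric sketch above in only one line: replications are simulated from the fitted model rather than resampled from the data. Here simulateModel is a hypothetical helper that generates nTrials fake trials from a fitted parameter vector, and nBoot, bootParams, and prctile are reused from the sketch above.

    pHat = fitModel(trials);                      % fit the observed data once
    for i = 1:nBoot
        simTrials = simulateModel(pHat, nTrials); % Monte Carlo data from the best fit
        bootParams(i,:) = fitModel(simTrials);    % refit the model to the simulated replication
    end
    ciLow  = prctile(bootParams, 2.5);            % percentile CI, exactly as before
    ciHigh = prctile(bootParams, 97.5);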

81 [figure]

82 see week7_psychometric_function.m

83 [figure]

84 [diagram of a model hierarchy] Null Model; Prototype Model; Exemplar Model; Mixture Model; Saturated Model. Prototype vs. exemplar: compare nonnested models. Exemplar vs. mixture, and mixture vs. saturated: compare nested models.

85 forgetting curves

86 [figure: an exponential and a power forgetting curve fit to the same data, each with its maximized ln L] p(t | a,b,c) = a*exp(-b*t) + c (exponential); p(t | a,b) = a*t^(-b) (power)

87 [figure: the two fits and their ln L values] p(t | a,b,c) = a*exp(-b*t) + c; p(t | a,b) = a*t^(-b). Why do I say these are nonnested?

88 [figure: the two fits and their ln L values] p(t | a,b,c) = a*exp(-b*t) + c; p(t | a,b) = a*t^(-b). Wouldn't you just pick the model that fits best?

89 [figure: the two fits and their ln L values] p(t | a,b,c) = a*exp(-b*t) + c; p(t | a,b) = a*t^(-b)

90 p(t | a,b,c) = a*exp(-b*t) + c: 3 parameters. p(t | a,b) = a*t^(-b): 2 parameters. Akaike Information Criterion: AIC = -2 ln L + 2M, where M = # parameters.

91 p(t | a,b,c) = a*exp(-b*t) + c: 3 parameters. p(t | a,b) = a*t^(-b): 2 parameters. AIC = -2 ln L + 2*3 = ... for the exponential model; AIC = -2 ln L + 2*2 = ... for the power model. Favor the model with the lower AIC statistic.

92 p(t | a,b,c) = a*exp(-b*t) + c: 3 parameters. p(t | a,b) = a*t^(-b): 2 parameters. Bayesian Information Criterion: BIC = -2 ln L + M*ln(N), where M = # parameters and N = total # observations.

93 p(t | a,b,c) = a*exp(-b*t) + c: 3 parameters. p(t | a,b) = a*t^(-b): 2 parameters. BIC = -2 ln L + 3*ln(600) = ... for the exponential model; BIC = -2 ln L + 2*ln(600) = 66.7 for the power model. Favor the model with the lower BIC statistic.

94 p(t | a,b,c) = a*exp(-b*t) + c: 3 parameters. p(t | a,b) = a*t^(-b): 2 parameters. We may talk about more sophisticated ways of assessing model complexity later in the course.
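These criteria are one line each in MATLAB. In this sketch, lnLexp and lnLpow are assumed variables holding the two maximized log likelihoods (the slide's numerical values), and N = 600 comes from the slides.

    M_exp = 3;  M_pow = 2;               % number of free parameters in each model
    N = 600;                             % total number of observations
    AICexp = -2*lnLexp + 2*M_exp;
    AICpow = -2*lnLpow + 2*M_pow;
    BICexp = -2*lnLexp + M_exp*log(N);   % log() is the natural log in MATLAB
    BICpow = -2*lnLpow + M_pow*log(N);
    % favor the model with the lower AIC / lower BIC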

95 [figure]

96 PROCESS MODELS OF RESPONSE TIME AND CHOICE PROBABILITY. We've talked about discrete behaviors: identification, categorization, yes/no. We've talked a bit about continuous behaviors: selecting responses from a ring a la Zhang and Luck. Response time is another basic behavioral measure.

97 PROCESS MODELS OF RESPONSE TIME AND CHOICE PROBABILITY. These give us an opportunity to learn more about Monte Carlo simulation methods: fitting choice and RT simultaneously; special techniques for fitting RT distributions, and why you might care; using diffusion models as data analysis devices.

98 [figure]

99 Stochastic Accumulation of Evidence Models

100-103 Stochastic Accumulation of Evidence Models [animation: after an initial perceptual stage, noisy evidence accumulates over time]

104 Stochastic Accumulation of Evidence Models [figure: perceptual stage and accumulation over time] A variety of stochastic accumulator models have been proposed and tested: race and counter models (e.g., Smith & Van Zandt, 2000); random walk models (e.g., Link, 1975; Nosofsky & Palmeri, 1997; Palmeri, 1997); diffusion models (e.g., Ratcliff & Smith, 2004; Ratcliff & McKoon, 2008); competitive models (e.g., Usher & McClelland, 2001). These models account for precise behavioral details of accuracy and response time.

105 Stochastic Accumulation of Evidence Models [figure: perceptual stage, drift, threshold a, starting point z, action stage, with residual times T_R and T_M along the time axis] These models have free parameters that define the time for perceptual processing, the starting point and threshold for evidence, and the rate of accumulation.

106 [figure: accumulation over time]

107 [figure]

108 [diffusion model figure: evidence evolves between an upper boundary for decision A and a lower boundary for decision B]

109 [diffusion model figure: decision A and decision B boundaries, evidence, diffusion coefficient (noise), starting point]

110 Move the boundaries in to stress speed over accuracy.

111 Move the starting point to bias one response over another response.

112 [figure] Drift rate determined by the stimulus and knowledge; diffusion coefficient (noise).

113 [figure: RT decomposed into residual time T_R plus decision time]

114 [figure]

115 [figure]

116-117 [figure: random walk from starting point z toward boundary a, with step size dx and time step dt]

evidence = z; t = 0;
while (evidence < a & evidence > 0)
    t = t + dt;
    r = rand;
    if r < f(mu,sigma)   % f(mu,sigma) is the up-step probability p, defined on the next slide
        evidence = evidence + dx;
    else
        evidence = evidence - dx;
    end
end

118 dx = σ√dt; p = ½(1 + (μ/σ)√dt); q = ½(1 - (μ/σ)√dt)

119 dx = σ√dt; p = ½(1 + (μ/σ)√dt) = ½ + ½(μ√dt/σ); q = ½(1 - (μ/σ)√dt) = ½ - ½(μ√dt/σ). μ = drift rate, σ = noise, so μ/σ is a signal-to-noise ratio. Whether the walk moves up by dx (with probability p) or down by dx (with probability q = 1 - p) is a simple function of the signal-to-noise ratio scaled by the time increment.
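Putting this step rule together with the loop from slides 116-117 gives a runnable one-trial simulator. The parameter values below are illustrative assumptions, not values from the course:

    mu = 0.3; sigma = 1;                   % drift rate and noise (assumed values)
    a = 1; z = 0.5;                        % upper boundary and starting point
    dt = 0.001;                            % time step
    dx = sigma*sqrt(dt);                   % step size
    p = 0.5*(1 + (mu/sigma)*sqrt(dt));     % probability of an up step

    evidence = z; t = 0;
    while (evidence < a && evidence > 0)
        t = t + dt;
        if rand < p
            evidence = evidence + dx;
        else
            evidence = evidence - dx;
        end
    end
    choice = (evidence >= a);              % 1 = decision A (upper), 0 = decision B (lower)
    RT = t;                                % decision time; add residual time for the full RT

Looping this thousands of times builds up simulated choice probabilities and RT distributions that can be compared with data.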

120 [figure]

121 see week7.m

122 [figure]
