PARAMETRIC MODEL SELECTION TECHNIQUES


GARY L. BECK, STEVE FROM, PH.D.

Abstract. Many parametric statistical models are available for modelling lifetime data. Given a data set of lifetimes, which may or may not be censored, which parametric model should be used to conduct statistical tests? In only a few cases can analytical expressions be found to answer this question in some optimal fashion. Various measures of discrepancy and other functionals of the distribution function will be considered for a finite number of competing parametric statistical models. Utilizing techniques developed by Linhart and Zucchini, survival data from pediatric patients who have received stem cell transplants will be analyzed to determine whether models fitted to random samples identify the actual model for the population.

The author would like to give special thanks to John Maloney, Ph.D., for his invaluable assistance in programming Maple.

1. Introduction

Probability models are useful for providing information about observations of seemingly random variables. In controlled settings, various parametric models may be chosen which appear to fit the data. However, it is highly unlikely that investigators will have a complete data set on which to base assumptions; they are therefore resigned to using a battery of tests to confirm how well the chosen model fits the observed pattern of data [1]. Linhart and Zucchini [2] lay the groundwork for more accurate means of model selection. In simple random sampling, observations may be regarded as independent and identically distributed random variables with a non-negative probability density function (pdf). This particular pdf, denoted f(x), may be regarded as the model which gave rise to the observations and is referred to as the operating model [3]. In practice, estimations about f(x) must be made

even though they are based on only a sample of the population. It is only in exceptional cases that sufficient information is available to identify the operating model, and it is rarer still to have a complete data set with which to do so. Therefore, it is important to have an understanding of the information under investigation in order to circumscribe a family of models to best represent the pdf. The size of the family of models is determined by the number of its independent parameters. These parameters are estimated from observations, and the accuracy of the estimates depends on the amount of data available relative to the number of parameters to be estimated; accuracy improves as either the sample size increases or the number of parameters decreases. Attempting to estimate too many parameters is referred to as overfitting, which leads to instability: repeated samples collected under comparable conditions lead to widely varying fitted models. Successful model fitting is consequently a matter of ensuring a data set sufficient to achieve a desired level of accuracy.

Traditionally, histograms have been used to summarize a set of data graphically. The behavior of the histogram may be used to estimate the probability density function, whether as a final estimate or as an intermediate estimate in the search for a smoother model. However, the properties of the histogram as an estimator of the pdf depend strongly on the sample size and on the number of intervals. As the number of intervals grows relative to the sample size, the histogram displays greater variability; the difficulty is in identifying whether these graphical variations represent the population as a whole or merely characteristics of the sample.

A common approach to fitting a model is to select the simplest approximating family which is consistent with the data. This is a matter of selecting from a catalog of models those which match the general features apparent in the data, features which may be recognized using histograms. It is assumed

that the family of models represents the true situation, an assumption which must then be tested by means of a hypothesis test to determine whether the model is consistent with specific aspects of the data. The advantage of this method is that relatively simple models may be selected to analyze the data. In so doing, the assumption is made that the chosen models are valid, and estimators are chosen and decisions made accordingly. The drawback is that assumptions about unknown parameters become the focus, when in reality the parameters are only of interest if they can be naturally and usefully interpreted in the context of the observations. Instead, Linhart and Zucchini [2] suggest using estimators that are robust to deviations from the assumed family selected for fitting. Their approach is to choose the family of models estimated to be the most appropriate under the circumstances, where the circumstances comprise the background assumptions, the sample size, and the specific requirements of the user.

Discrepancies

Before the performances of competing models can be compared, a measure for assessing fit, or the lack thereof, must be chosen. Such a measure of lack of fit is referred to as a discrepancy, denoted $\Delta(f, g_\theta)$. The discrepancy between the operating model and the best approximating model is referred to as the discrepancy due to approximation. It constitutes the lower bound for the discrepancy of models in the approximating family and does not depend on the data, the sample size, or the method of estimation employed. The discrepancy between the fitted model and the best approximating model is called the discrepancy due to estimation; it does depend on the sample values and changes from sample to sample, so the discrepancy due to estimation is a random variable. Finally, the overall discrepancy is defined as the discrepancy between the operating model and the fitted model, which takes both components into account. It is necessary to consider both when comparing approximating families of different complexity. The best model in a more complex family is typically closer to the

operating model than the best model in a simpler family. However, the fitted model in the complex family tends to be further from its best model than is the case in the simpler family. Thus, complex families have more potential but tend to perform below it. The overall discrepancy, as the sum of its two component discrepancies, therefore allows for an appropriate compromise between the two opposing properties [3]. All of this is possible when the actual operating model is known; in practice, this is rarely the case, and the discrepancies, though they exist, cannot be calculated. Since the operating model, and hence the overall discrepancy, is unknown, an estimator of the expected discrepancy, $E_F\,\Delta(f, g_{\hat\theta})$, is used instead; such an estimator is called a criterion. For a histogram density on $[0, 100]$ with $I$ equal intervals, the expected discrepancy given by Linhart and Zucchini is

(1.1) $E_F\,\Delta(\hat\theta) = E_F \int_0^{100} \big(f(x) - g^{(I)}_{\hat\theta}(x)\big)^2\,dx$

where $g^{(I)}_{\hat\theta}(x) = \frac{n_i I}{100n}$ for $\frac{100(i-1)}{I} < x \le \frac{100i}{I}$, $i = 1, 2, \ldots, I$, and $n_i$ is the frequency of the $i$th interval. Taking the expectation of this integral after suitable subdividing yields

(1.2) $E_F\,\Delta(\hat\theta) = \int_0^{100} f(x)^2\,dx + \frac{I}{100n}\Big(1 - (n+1)\sum_{i=1}^{I}\pi_i^2\Big)$

where $\pi_i = \int_{100(i-1)/I}^{100i/I} f(x)\,dx$. The first term does not depend on the approximating family and can therefore be ignored. The second term is the essential term, and an unbiased estimator of it, a criterion, is given by

(1.3) $\frac{I}{100n}\Big[1 - \frac{n+1}{n-1}\Big(\frac{1}{n}\sum_{i=1}^{I} n_i^2 - 1\Big)\Big]$

This procedure does not single out any one approximating family of models except on the basis of its criterion. Some situations merit a simple model and others a complex one, depending on the behavior of the data set.
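
Criterion (1.3) is simple to compute directly. The following is a minimal Python sketch, assuming (as in the age example below) that the observations lie in [0, 100] and reading the constant 100 in (1.3) as the length of the support; the data vector is a hypothetical placeholder, not the table from the text.

    import numpy as np

    def histogram_criterion(x, I, upper=100.0):
        # Unbiased criterion (1.3) for a histogram with I equal cells on [0, upper].
        n = len(x)
        counts, _ = np.histogram(x, bins=I, range=(0.0, upper))
        s = np.sum(counts.astype(float) ** 2)
        return (I / (upper * n)) * (1.0 - (n + 1) / (n - 1) * (s / n - 1.0))

    rng = np.random.default_rng(1)
    ages = rng.uniform(0, 100, size=200)        # hypothetical placeholder data
    for I in (2, 4, 6, 10, 25, 50):
        # The most appropriate I is the one with the smallest criterion value.
        print(I, histogram_criterion(ages, I))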

In some instances, it may also be possible to construct a test of the hypothesis that a particular approximating family has a smaller expected discrepancy than another. In most cases, however, such tests are difficult to construct, meriting reliance on simple comparisons of estimated expected discrepancies at a selected level of significance. Consider the random age distributions from the Federal Republic of Germany in 1974 (Statistisches Bundesamt, 1976, p. 58). Figures 1 and 2 display the distribution of ages in histogram format, with the data grouped into I = 10 and I = 50 intervals. The number of intervals determines the criterion value. Figure 3 shows the criterion values, of which the smallest occurs at approximately I = 6, thereby identifying that as the most appropriate number of intervals.

Table 1. Random Age Distributions, n=

This leads to the primary issue of model selection; namely, identifying an approximating family of models and constructing a discrepancy. The choice of family is typically determined by the type of analysis one intends to complete, whether it is hypothesis testing, regression analysis, analysis of variance, etc.

Figure 1. Age histogram, I = 10

The next step is to choose a discrepancy. For this, one must determine the use to which the fitted model will be put and which aspects of it must conform to the operating model.

Important Discrepancies

Discrepancies should be selected to match the objectives of the analysis [3]. A natural estimator to use with a particular discrepancy is the minimum discrepancy estimator, or minimum distance estimator: the estimator that minimizes the empirical discrepancy between the approximating model and the proposed operating model. The method of maximum likelihood estimation (MLE), developed by R. A. Fisher, selects as the desired probability distribution the one that makes the observed data most likely [1].

Figure 2. Age histogram, I = 50

Using MLE is an important general-purpose method for calculating discrepancies. MLEs are asymptotically normally distributed, of asymptotically minimum variance, and asymptotically unbiased (as n approaches infinity) [1]. The Kullback-Leibler discrepancy is notated

$\Delta_{KL}(f, g_\theta) = -E_F\,\log(g_\theta(x)) = -\int \log(g_\theta(x))\,f(x)\,dx$

where $g_\theta$ is the pdf characterizing the approximating family of models. The minimum discrepancy estimator associated with this discrepancy is the maximum likelihood estimator. This discrepancy focuses on the expected log-likelihood when the approximating model is $g_\theta$; the higher the expected log-likelihood, the better the model. Another possible discrepancy is the Cramér-von Mises discrepancy, which is notated

$\Delta_{CM}(\theta) = E_{G_\theta}\big(F(x) - G_\theta(x)\big)^2$

Figure 3. Graph of criterion values for the age data

For discrete or grouped data sets, the Pearson chi-squared or Neyman chi-squared discrepancies are suitable. They are notated, respectively,

$\Delta_P(\theta) = \sum_{x:\,g_\theta(x) \neq 0} \frac{(f(x) - g_\theta(x))^2}{g_\theta(x)}$

and

$\Delta_N(\theta) = \sum_{x:\,f(x) \neq 0} \frac{(f(x) - g_\theta(x))^2}{f(x)}$

where $f$ and $g_\theta$ are the pdfs characterizing the operating model and the approximating family. Discrepancies need not depend on every detail of the distribution; rather, they may be based on some specific aspect of the distributions. For example, in regression analysis, only certain expectations are of particular interest. Thus the method, albeit complex, is very flexible and applicable to any aspect of data analysis.
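
As an illustration, here is a minimal Python sketch of the two chi-squared discrepancies for grouped data; the cell-probability vectors f and g are hypothetical stand-ins for an operating model and an approximating model.

    import numpy as np

    def pearson_discrepancy(f, g):
        # Sum over cells with g(x) != 0, per the Pearson formula above.
        mask = g > 0
        return np.sum((f[mask] - g[mask]) ** 2 / g[mask])

    def neyman_discrepancy(f, g):
        # Same form, but weighted by the operating-model probabilities.
        mask = f > 0
        return np.sum((f[mask] - g[mask]) ** 2 / f[mask])

    f = np.array([0.20, 0.50, 0.30])   # operating cell probabilities (hypothetical)
    g = np.array([0.25, 0.45, 0.30])   # approximating cell probabilities
    print(pearson_discrepancy(f, g), neyman_discrepancy(f, g))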

Derivation of Criteria

Each operating model and discrepancy requires its own method of analysis. Having decided which approximating families are to be considered, which methods will be used to estimate the parameters, and which discrepancy will assess the fit, a criterion must be found, that is, an estimator of the expected discrepancy $E_F\,\Delta(\hat\theta)$. This expectation is taken with respect to the operating model F. The derivation can be straightforward or exceptionally complex (the appendix of Linhart and Zucchini [2] details the derivations of these criteria). When the expected discrepancy is too complex to derive exactly, asymptotic methods will sometimes work; i.e., one works with its limiting values as the sample size increases indefinitely. By approximating $\Delta(\hat\theta)$ by the first terms of its Taylor expansion about the point $\theta_0$, the expectation can then be calculated. Thus, as the sample size increases, the expected discrepancy approaches the form

(1.4) $E_F\,\Delta(\hat\theta) \approx \Delta(\theta_0) + \frac{K}{2n}$

where $K = \mathrm{tr}(\Omega^{-1}\Sigma)$, the trace of the product of two matrices.

Bootstrap methods provide a simple and effective means of circumventing the technical problems encountered in deriving expected discrepancies and their estimators. With this method, one generates repeated samples of size n from a fixed $F_n$, the empirical distribution function, which plays the role of the operating model. Each sample yields a new estimate $\hat\theta$ for the approximating family, and hence a new discrepancy value; the average of these converges to the expected discrepancy.
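
The following is a minimal sketch of the bootstrap idea for the Kullback-Leibler discrepancy with a normal approximating family. It follows the resampling scheme described above rather than Linhart and Zucchini's own implementation, and the input data are hypothetical.

    import numpy as np

    def kl_empirical(x, mu, var):
        # Delta_n(theta) = -(1/n) * sum(log g_theta(x_i)) for the normal pdf.
        return 0.5 * np.log(2 * np.pi * var) + np.mean((x - mu) ** 2) / (2 * var)

    def bootstrap_criterion(x, B=1000, seed=0):
        rng = np.random.default_rng(seed)
        n, total = len(x), 0.0
        for _ in range(B):
            xb = rng.choice(x, size=n, replace=True)   # a sample from F_n
            mu, var = xb.mean(), xb.var()              # MLEs on the bootstrap sample
            total += kl_empirical(x, mu, var)          # assess the fit against F_n
        return total / B                               # estimates the expected discrepancy

    x = np.random.default_rng(2).gamma(2.0, 10.0, size=50)   # hypothetical data
    print(bootstrap_criterion(x))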

Cross-validation is a technique to assess the validity of a statistical analysis. With this method, the data are subdivided into a calibration sample (of size n-m) used to fit the model and a validation sample (of size m) used to validate it, and the procedure of fitting and validating is repeated, once for each subdivision. There is a problem in deciding how to select m without limiting the number of observations available to fit the model. In practice one may use a small m and follow these steps for all possible calibration samples of size n-m: fit the model to the calibration sample, then estimate the expected discrepancy of the fitted model using the validation sample. The cross-validation criterion is the average over these repetitions of the estimates obtained in the second step.

As shown in Figures 1 and 2, histogram densities are universally applicable. However, lower discrepancies may be achieved by fitting smoother approximating densities that depend on fewer parameters. Histograms are primarily useful as a means of selecting approximating models; smoother models such as the normal, lognormal, and gamma provide more concise descriptions. Unless certain distributional properties of the estimator are available, it is not possible to derive exact expressions for the expected discrepancy. Since finite-sample distributions are usually too difficult to derive, one must rely on asymptotic, bootstrap, or cross-validatory methods. This study concentrated on asymptotic methods applied to a complete data set acquired from the University of Nebraska Medical Center.

In order to obtain asymptotic criteria, it is necessary to obtain a trace term, $\mathrm{tr}(\Omega^{-1}\Sigma)$, derived from a functional on a k-dimensional stochastic process. This may be estimated from the data using estimators for $\Omega$ and $\Sigma$. The criterion is then $\Delta_n(\hat\theta) + \frac{\mathrm{tr}(\Omega_n^{-1}\Sigma_n)}{n}$, wherein $\Omega_n$ and $\Sigma_n$ are the estimates so derived. Alternatively, if the operating model is a member of the approximating family, the trace term becomes significantly simpler: for a number of discrepancies it is simply a multiple of p, the number of free parameters in the approximating family. It should be noted that the approximations on which the simpler criteria are based will be inaccurate whenever the discrepancy due to approximation is large [2].

The Kullback-Leibler discrepancy is one of the most important general-purpose discrepancies. It is an essential part of the expected log-likelihood ratio and is related to entropy, a fundamental quantity in information theory. This discrepancy and its asymptotic criteria give rise to criteria for a number

of standard distributions. The discrepancy is notated

$\Delta(\theta) = \Delta(G_\theta, F) = -E_F\,\log(g_\theta(x))$

with empirical discrepancy

$\Delta_n(\theta) = \Delta(G_\theta, F_n) = -\frac{1}{n}\sum_{i=1}^{n}\log(g_\theta(x_i))$

The asymptotic criterion is notated

$\Delta_n(\hat\theta) + \frac{\mathrm{tr}(\Omega_n^{-1}\Sigma_n)}{n}$

and the simpler criterion is

$\Delta_n(\hat\theta) + \frac{p}{n}$

The minimum discrepancy estimator, which here is the maximum likelihood estimator, is

$\hat\theta = \operatorname{argmin}\{\Delta_n(\theta) : \theta \in \Theta\}$

Given this framework, Linhart and Zucchini have identified maximum likelihood estimators and criteria for a variety of probability distributions. For the purposes of this study, the normal, lognormal, and gamma distributions were used. Based on the methodology of Linhart and Zucchini (Appendix, [2]), the Kullback-Leibler criterion for each model is indicated below. Sample moments and sample moments about the sample mean are denoted $m_h[\,\cdot\,]$ and $m'_h[\,\cdot\,]$, $h = 1, 2, \ldots$; for example, $m_2[\log(x)] = \frac{1}{n}\sum_{i=1}^{n}\log^2(x_i)$ is the second sample moment of $\log(x)$. For the normal distribution, the criterion is

(1.5) $\frac{1 + \log(2\pi m'_2)}{2} + \frac{m'_4 + m'^2_2}{2n\,m'^2_2}$

with trace term equalling $\frac{m'_4 + m'^2_2}{2n\,m'^2_2}$. The estimators for the normal distribution are

$\hat\lambda = m'_2, \qquad \hat\mu = m_1$

It should be noted that $\hat\lambda$ is the same as $\hat\sigma^2$ for this distribution and for the lognormal distribution. For the lognormal distribution, the criterion is

(1.6) $m_1[\log(x)] + \frac{1 + \log(2\pi\,m'_2[\log(x)])}{2} + \frac{m'_4[\log(x)] + m'^2_2[\log(x)]}{2n\,m'^2_2[\log(x)]}$

with trace term equalling $\frac{m'_4[\log(x)] + m'^2_2[\log(x)]}{2n\,m'^2_2[\log(x)]}$. The maximum likelihood estimators for the lognormal are

$\hat\mu = m_1[\log(x)], \qquad \hat\lambda = m'_2[\log(x)]$

Finally, the criterion for the gamma distribution is notated

(1.7) $\log(\Gamma(\hat\nu)) - \hat\nu\big(\log(\hat\lambda) - 1\big) - (\hat\nu - 1)\,m_1[\log(x)] + \frac{\mathrm{tr}(\Omega_n^{-1}\Sigma_n)}{n}$

The trace term for the gamma distribution is the last term in equation (1.7), where

$\Omega_n = \begin{pmatrix} m_1^2/\hat\nu & m_1/\hat\nu \\ m_1/\hat\nu & \psi'(\hat\nu) \end{pmatrix}, \qquad \Sigma_n = \begin{pmatrix} m'_2 & m'_{11}[x, \log(x)] \\ m'_{11}[x, \log(x)] & m'_2[\log(x)] \end{pmatrix}$

The estimators for the gamma distribution satisfy $\hat\nu/\hat\lambda = m_1$ and $\log(\hat\lambda) - \psi(\hat\nu) = -m_1[\log(x)]$, where $\psi$ is the digamma function.
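
To make the three criteria concrete, the following is a minimal Python sketch of equations (1.5)-(1.7) under the reconstruction above (central moments for the normal and lognormal, rate parameterization for the gamma). It is an illustration rather than the authors' Fortran program, and the test data are hypothetical.

    import numpy as np
    from scipy.special import digamma, polygamma, gammaln
    from scipy.optimize import brentq

    def normal_criterion(x):
        n, v = len(x), x.var()                    # v = m2' (central second moment)
        m4 = np.mean((x - x.mean()) ** 4)         # m4'
        return (1 + np.log(2 * np.pi * v)) / 2 + (m4 + v**2) / (2 * n * v**2)

    def lognormal_criterion(x):
        lx = np.log(x)
        return lx.mean() + normal_criterion(lx)   # criterion (1.6)

    def gamma_criterion(x):
        n, lx = len(x), np.log(x)
        m1, m1log = x.mean(), lx.mean()
        d = np.log(m1) - m1log                    # >= 0 by Jensen's inequality
        nu = brentq(lambda v: digamma(v) - np.log(v) + d, 1e-8, 1e8)  # shape MLE
        lam = nu / m1                             # rate, from nu/lambda = m1
        delta = gammaln(nu) - nu * (np.log(lam) - 1) - (nu - 1) * m1log
        omega = np.array([[m1**2 / nu, m1 / nu], [m1 / nu, polygamma(1, nu)]])
        sigma = np.cov(x, lx, bias=True)          # [[m2', m11'], [m11', m2'[log x]]]
        return delta + np.trace(np.linalg.solve(omega, sigma)) / n

    x = np.random.default_rng(3).gamma(2.0, 10.0, size=20)   # hypothetical sample
    for name, c in [("normal", normal_criterion), ("lognormal", lognormal_criterion),
                    ("gamma", gamma_criterion)]:
        print(name, c(x))   # smallest criterion identifies the chosen model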

2. Methods

Data Collection

Data for this study were obtained from the University of Nebraska Medical Center Clinical Cancer Trials office. Pediatric patients who have undergone stem cell transplants are tracked by this office. Information pertaining to the date of diagnosis, date of treatment, and date of expiration is collected and stored in a Microsoft Access database. The Institutional Review Board granted this study an exemption because no subjects would be identifiable.

Analysis

The entire population of pediatric patients who have received stem cell transplants at the University of Nebraska Medical Center is 289, so it is possible to compare the operating model based on the entire population with repeated random samples of size 20. Using equation (1.3), the criterion for the data set was examined. Figures 4 and 5 show histograms of the survival data for I = 10 and I = 50 intervals. It was found that the criterion was smallest at I = 6, which is thus the optimal number of intervals for this histogram. Using the Kullback-Leibler criteria of the normal, lognormal, and gamma distributions, the study examines how well this method selects the true operating model, as determined by the mean square error. Since the population is known, the technique should select the correct best model from among the approximating models, where "correct best" is measured by the fraction of samples of size 20 in which certain functionals of the cumulative distribution function of the population are most closely estimated.

Let $Y_0$ be a positive real number and let $M = P(Y \le Y_0)$, where Y is a population value. For a given sample of n data points (n = 20), each competing model (normal, lognormal, gamma) provides an estimate of M; denote these by $\hat M_N$ (normal), $\hat M_L$ (lognormal), and $\hat M_G$ (gamma). Now let $\hat M_N(i)$ equal the value of $\hat M_N$ for sample number i; this is computed from the MLEs of the model parameters. Similar definitions hold for $\hat M_L(i)$ and $\hat M_G(i)$.
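
The model-based estimates of M are simply fitted CDFs evaluated at $Y_0$. Here is a minimal sketch, assuming $Y_0 = 50.0$ (the value used later in the text) and a hypothetical sample; note that scipy's gamma scale is the reciprocal of the rate parameter used above.

    import numpy as np
    from scipy import stats

    def m_hat(sample, y0=50.0):
        # M-hat for each family, computed from that family's MLEs.
        mu, sd = sample.mean(), sample.std()
        lmu, lsd = np.log(sample).mean(), np.log(sample).std()
        a, loc, scale = stats.gamma.fit(sample, floc=0)   # shape, loc fixed at 0, scale = 1/rate
        return {"normal":    stats.norm.cdf(y0, mu, sd),
                "lognormal": stats.lognorm.cdf(y0, s=lsd, scale=np.exp(lmu)),
                "gamma":     stats.gamma.cdf(y0, a, scale=scale)}

    sample = np.random.default_rng(4).gamma(2.0, 20.0, size=20)   # hypothetical
    print(m_hat(sample))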

Figure 4. Histogram of survival data, I = 10

Let $R_n$ equal the number of random samples of size n = 20 that are generated. The absolute errors are

$E_N(i) = |M - \hat M_N(i)|, \quad i = 1, 2, \ldots, R_n$

$E_L(i) = |M - \hat M_L(i)|, \quad i = 1, 2, \ldots, R_n$

$E_G(i) = |M - \hat M_G(i)|, \quad i = 1, 2, \ldots, R_n$

By generating 5,000 random samples of size n, approximate mean square errors (MSEs) are computed. The MSE equals the mean of the squares of the deviations from the target [4]; i.e.,

$MSE_N = \frac{1}{5000}\sum_{i=1}^{5000}\big(M - \hat M_N(i)\big)^2 = \frac{1}{5000}\sum_{i=1}^{5000} E_N(i)^2$

$MSE_L = \frac{1}{5000}\sum_{i=1}^{5000}\big(M - \hat M_L(i)\big)^2 = \frac{1}{5000}\sum_{i=1}^{5000} E_L(i)^2$

Figure 5. Histogram of survival data, I = 50

$MSE_G = \frac{1}{5000}\sum_{i=1}^{5000}\big(M - \hat M_G(i)\big)^2 = \frac{1}{5000}\sum_{i=1}^{5000} E_G(i)^2$

The actual best model is the one with the smallest MSE value among the normal, lognormal, and gamma models. Upon completion of these computations, the Kullback-Leibler criterion was calculated for each model to ascertain whether this method matched the determination based on the actual operating model. For this, the asymptotic Kullback-Leibler criterion values for the three models, notated $A_N(i)$, $A_L(i)$, and $A_G(i)$, were computed using equations (1.5), (1.6), and (1.7); the chosen parametric model corresponds to the smallest of these three values. For each sample of size n = 20, the asymptotic criteria were computed to determine a concluded best model. It should be noted that this method of comparing asymptotic criteria with MSEs is valid for this study only because the complete population is known.
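
The simulation as described reduces to a short loop. Below is a sketch of it, not the co-author's Fortran program; the population vector is a placeholder for the n = 289 survival times, which are not reproduced here, and samples of size 20 are drawn without replacement.

    import numpy as np
    from scipy import stats

    def mse_study(population, y0=50.0, n=20, reps=5000, seed=5):
        rng = np.random.default_rng(seed)
        M = np.mean(population <= y0)             # the true M from the population
        sq_err = {"normal": 0.0, "lognormal": 0.0, "gamma": 0.0}
        for _ in range(reps):
            s = rng.choice(population, size=n, replace=False)
            a, loc, scale = stats.gamma.fit(s, floc=0)
            est = {"normal":    stats.norm.cdf(y0, s.mean(), s.std()),
                   "lognormal": stats.lognorm.cdf(y0, s=np.log(s).std(),
                                                  scale=np.exp(np.log(s).mean())),
                   "gamma":     stats.gamma.cdf(y0, a, scale=scale)}
            for name, m_hat in est.items():
                sq_err[name] += (M - m_hat) ** 2
        # The actual best model is the one with the smallest MSE.
        return {name: total / reps for name, total in sq_err.items()}

    population = np.random.default_rng(6).gamma(1.5, 30.0, size=289)   # placeholder
    print(mse_study(population))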

3. Discussion

The entire population of pediatric stem cell recipients was included in this data set, n = 289. A co-author (SF) wrote a program in Fortran to generate random samples of size 20 from the data set and to calculate the mean square errors and asymptotic criteria. To ensure the accuracy of this methodology, 5,000 random samples were generated for each MSE and asymptotic criterion. Not all of these results can be presented in this report; Table 2 includes a representative sample of n = 20 from the data set.

Table 2. Random Sample of Survival Data of Stem Cell Recipients, n = 20

From this random sample, maximum likelihood estimators for the normal, lognormal, and gamma distributions were calculated. Based on these results, the Kullback-Leibler criterion, estimated standard deviation, and trace values were calculated. These results are shown in Table 3.

Table 3. Random Sample Criterion Results

              $\hat\mu$    $\hat\lambda$    K-L Criterion    Est. Std. Dev.    Trace
Normal
Lognormal
Gamma

From these results, it is evident that the lognormal distribution has the smallest asymptotic criterion, followed closely by the gamma distribution. According to the Kullback-Leibler criterion, the lognormal would

then be the best model to use to analyze the population. It is interesting to note that, for a sample of this size, much of the medical literature would assume a normal distribution. By this method, however, the normal distribution produced the largest criterion and would therefore be a less than optimal choice for this population.

The real data were used to obtain the actual value of $M = P(Y \le Y_0)$, where $Y_0$ is a given positive real number and Y is a population value, via

$M = \frac{\#\{\text{population values} \le Y_0\}}{N}$

where N = 289 is the population size. Based on the smallest MSE from 5,000 random samples of size n = 20, the best parametric model was selected. In calculating the mean square errors, $Y_0 = 50.0$ was used, giving $M = P(Y \le Y_0) =$

The MSEs for each distribution were as follows:

Normal =
Lognormal =
Gamma =

Judging by the MSEs over the entire population, the gamma distribution appeared to be the optimal distribution for analyzing the population, although given the close values of the lognormal and gamma MSEs, the gamma distribution is only slightly better. In both the asymptotic criterion and MSE comparisons, the normal distribution is clearly not an appropriate means of analysis. When all 5,000 random samples were analyzed, the model deemed most appropriate should have the smallest criterion value the majority of the time. For this simulation, the asymptotic criterion correctly picked the best model in 76.4% of the samples of size n = 20 at this value of $Y_0$. Notably, over the middle third of the distribution, the asymptotic criterion correctly

picked the best model. Where $Y_0$ is small or large, the asymptotic criterion does not appear to be the optimal selection method.

Figure 6. $Y_0$ versus percentage of correct selections

4. Conclusion

The Kullback-Leibler criterion seems to work well over the middle of this population distribution (see Figure 6). It appears that the normal distribution fit the left tail better when $Y_0$ was small and that the lognormal distribution fit the right tail better for large $Y_0$. This implies that another criterion, one which emphasizes a better fit in the tails, is needed. The small population size, and the fact that the MSEs were extremely close for the gamma and lognormal distributions where the above percentages were small, are also contributing factors to this phenomenon.

Limitations:

The population size for this study was relatively small, n = 289. To better test the Kullback-Leibler criterion, acquiring a much larger population would be ideal, allowing a greater range of random samples from which to draw. A serious limitation of this study was also the author's (GLB) inability to write Fortran code. This type of intensive random sampling and calculation necessitates the ability to generate unique code not found in standard software packages. It should also be noted that even acquiring the assistance of an expert programmer for the Maple software proved inadequate for generating the necessary results. For example, criteria calculated using equation (1.3) tended toward I = 10 for both the example and real data sets. Additional manual computations were done for the samples in Table 1 to ensure that Maple's output was accurate.

Future Considerations:

Given the tail effects observed in the random sampling, it is evident that a single parametric model is insufficient to describe this population as a whole. Rather than take the traditional approach of partitioning the data for analysis with the three best-fitting models, considering a hybrid probability distribution may prove a better approach for this analysis. Such a distribution would take the smaller and larger values into consideration so that a single distribution could be used to model the population. It may also be possible to obtain an even larger data set from the University of Nebraska Medical Center. With a larger sample size, some of the tail-effect phenomena may be diminished, and a more accurate assessment of the comparisons between the MSEs and the Kullback-Leibler criterion would be possible.

References

[1] Myung IJ. Maximum Likelihood Estimation. (Submitted for publication 11/21/01)

[2] Linhart H, Zucchini W. Model Selection. Wiley Series in Probability and Mathematical Statistics. John Wiley and Sons, New York, 1986.

[3] Zucchini W. An Introduction to Model Selection. J Math Psych 2000; 44: 41-61.

[4] Battaglia GJ. Mean Square Error. AMP J Tech 1996; 6.

Department of Pediatrics, University of Nebraska Medical Center, Omaha, NE

Department of Mathematics, University of Nebraska at Omaha, Omaha, NE
