COPULA MODELS FOR BIG DATA USING DATA SHUFFLING


Krish Muralidhar, Rathindra Sarathy

Department of Marketing & Supply Chain Management, Price College of Business, University of Oklahoma, Norman, OK
Department of Management Science & Information Systems, Spears School of Business, Oklahoma State University, Stillwater, OK

ABSTRACT

Big data often involves complex relationships among variables. Copula models offer a simple, effective approach for modeling such complex relationships. Traditional implementations of copula models require the identification of the marginal distribution of each variable, which makes it difficult to automate the modeling process. In this study, we provide a non-parametric implementation using a new procedure called Data Shuffling that allows the entire modeling process to be easily automated.

INTRODUCTION

With the recent focus on the use of big data for marketing purposes, the development of models that can be used for prediction and inference with big data has gained importance. While traditional statistical models are useful for big data, there is a need to develop more sophisticated models. When the underlying distribution of the data is not normal and the relationships between the variables are non-linear, traditional statistical models are not appropriate. To overcome this problem, Danaher and Smith (2011) recently introduced the use of copula models for marketing applications. Using different examples from the marketing literature, they also demonstrated the versatility, flexibility, and accuracy of copulas for modeling marketing problems.
The approach proposed by Danaher and Smith (2011) involves the following steps: (1) estimation of the marginal distribution of the individual variables, (2) estimation of the dependence parameters (the correlation matrix), (3) generation of multiple samples using Markov Chain Monte Carlo (MCMC) simulation, and (4) estimation of Bayesian point estimates of the parameters and other metrics of interest from the MCMC-generated samples. The procedure suggested by Danaher and Smith (2011) is an excellent one for the purposes of model construction, validation, and hypothesis testing in the context of marketing. However, in the context of big data, it would be difficult to use this procedure.

Big data has three main characteristics: volume (quantity of data), variety (types of data), and velocity (speed of data collection). When implementing prediction models for big data, it is important that we consider these characteristics. Since the data is high volume, the prediction models must be scalable to large-scale data sets. Since the data has considerable variety, the prediction models must be capable of incorporating both numerical (continuous and discrete) and categorical variables. Since the data has high velocity, the implementation should be automated and require no manual intervention. Unfortunately, the Danaher and Smith (2011) procedure does not adequately satisfy any of these requirements, as we now discuss.

Identification of the marginal distribution

The identification of the marginal distribution of the individual variables is an important step in modeling using copulas. In many cases, there may be a clear theoretical basis for using a particular marginal distribution to model a particular variable. For example, the log-normal distribution is used as the appropriate marginal distribution for duration of visits based on the previous study by Danaher and Smith (2007); the beta binomial distribution is used as the appropriate marginal distribution for the eggs and bacon data and for exposure to magazine advertisements based on prior studies (Chandon 1986, Danaher and Hardie 2005, Rust 1986); and the negative binomial distribution is used for modeling the number of page views by web sites based on prior studies by Danaher (2007) and Huang and Lin (2006). Thus, when there is prior information about the characteristics of the marginal distribution of the individual variables, it would be appropriate to use this information. In situations where no theoretical basis exists for the use of a particular distribution for a particular variable, the decision becomes more difficult.
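When no theoretical basis points to a particular marginal distribution, one common automation strategy is to fit several candidate distributions by maximum likelihood and compare a goodness-of-fit statistic. The sketch below illustrates this with SciPy; the candidate list and the synthetic data are assumptions for illustration, and this is not the procedure used by Danaher and Smith (2011).

```python
# Illustrative sketch: fit candidate marginal distributions by maximum
# likelihood and compare them with the Kolmogorov-Smirnov statistic.
# The candidates and the synthetic data below are made up for the demo.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.lognormal(mean=1.0, sigma=0.5, size=2000)  # e.g., visit durations

candidates = {
    "lognorm": stats.lognorm,
    "gamma": stats.gamma,
    "expon": stats.expon,
}

fits = {}
for name, dist in candidates.items():
    params = dist.fit(data)                           # maximum likelihood fit
    ks = stats.kstest(data, dist.cdf, args=params).statistic
    fits[name] = ks                                   # smaller KS = better fit

best = min(fits, key=fits.get)
```

As the text notes, different goodness-of-fit criteria (KS, Anderson-Darling, AIC, ...) can disagree, which is precisely why fully automating this step is hard.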
The estimation of the marginal distribution of an individual variable has received considerable attention in the statistical literature. As observed by Danaher and Smith (2011), many approaches for estimating the marginal distribution of a variable exist, including maximum likelihood, Bayesian, and method-of-moments approaches. With each of these approaches, there are also multiple criteria for assessing the goodness of fit of the estimated marginal to the observed data. Unfortunately, the diversity of approaches also presents a problem: it is difficult to identify one particular approach as being superior to all others. Even within a given approach, the diversity of criteria used to assess the goodness of fit creates doubt as to which distribution and estimated parameters provide the best fit for the observed data. Hence, for a given dataset, there may be reasonable disagreement as to which marginal distribution is best suited to model a particular variable. The problem is magnified when the problem under consideration has many marginal variables. In the Danaher and Smith (2011) study, the web site page views data consists of 45 variables. If previous studies had not established the beta binomial distribution as the appropriate model for web site page views, then it would be necessary to identify the marginal distribution of each of these variables, which can be a difficult task. In addition, if the variables under consideration were not of similar measures (web site views), but represented completely different characteristics (for example, age, income, etc.), the task of identifying the marginal distribution of each of these different characteristics can be imposing.

Computational complexity

The Danaher and Smith (2011) procedure is a considerable improvement over other procedures previously used for modeling complex relationships in marketing. However, their model requires considerable computational effort to compute estimates due to the use of MCMC simulation. For instance, even for a relatively small problem, implementing their procedure requires approximately 55 minutes of computational time. This is a considerable improvement over other models, which require over 6 hours of computational time, but in the context of big data this computational effort would still be considered burdensome. Thus, we need to evaluate alternative approaches for implementing copula models for big data. In this study, we propose a modified version of the copula approach called data shuffling. However, it should be noted that the Danaher and Smith (2011) procedure should be the preferred approach for modeling in research scenarios. Prior to describing the data shuffling approach, we briefly describe the concept of copulas.

COPULA MODELS

Consider a set of random variables A_1, A_2, ..., A_m with marginal cumulative distribution functions (CDFs) u_i = F_i(a_i), i = 1, 2, ..., m, and joint CDF F(a_1, a_2, ..., a_m). Sklar (1959) showed that the joint CDF can be written as:

F(a_1, a_2, ..., a_m) = C[u_1, u_2, ..., u_m],  (1)

where C[u_1, u_2, ..., u_m] is a joint copula CDF with uniform marginal distributions. In addition, the joint probability density function (pdf) can be written in product form as:

f(a_1, a_2, ..., a_m) = [∏_{i=1}^{m} f_i(a_i)] c[u_1, u_2, ..., u_m],  (2)

where c is called the copula density and f_i(a_i) are the marginal densities of A_i.
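Equation (1) rests on the probability integral transform: for any continuous marginal F, u = F(a) is uniform on [0, 1]. A minimal sketch, using the empirical CDF (scaled ranks) in place of a fitted marginal, as a nonparametric implementation would:

```python
# Sketch of the probability integral transform underlying equation (1):
# u = F(a) is (approximately) uniform on [0, 1], whatever the marginal F.
# Here F is taken as the empirical CDF, which reduces to scaled ranks.
import numpy as np

rng = np.random.default_rng(1)
a = rng.gamma(shape=2.0, scale=3.0, size=5000)  # arbitrary non-normal marginal

# Empirical CDF evaluated at the sample itself = rank / (n + 1).
ranks = a.argsort().argsort() + 1               # 1-based ranks of each value
u = ranks / (len(a) + 1.0)

# A uniform [0, 1] variable has mean 1/2 and variance 1/12.
mean_u, var_u = u.mean(), u.var()
```

Because the u values depend only on ranks, this transform needs no knowledge of the true marginal, which is the property the data shuffling approach described later exploits.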
The joint density shown in (2) also provides the ability to derive the conditional density of one or more of the random variables with respect to the other variables. The primary applications of copulas have been to combine specified (arbitrary) marginal distributions into joint distributions that exhibit certain specified dependence or joint behavior. Joe (1997), Nelsen (1995, 1999), and Schweizer (1991) serve as good introductions to the theory and application of copulas. A wide variety of copula functions have been investigated for combining non-normal distributions, both discrete and continuous. The selection of a copula function depends on the specific problem under consideration. The characteristics of the joint distribution also vary depending on the specific type of copula selected.

In this study, we use the multivariate normal copula for illustration purposes. The normal copula parameterized with product moment correlation matrix ρ can be written as:

C_ρ(u) = Φ^m_ρ(Φ^{-1}(u_1), Φ^{-1}(u_2), ..., Φ^{-1}(u_m)),  (3)

where Φ^m_ρ represents the joint CDF of an m-variate standard multivariate normal distribution with correlation matrix ρ and Φ^{-1} represents the inverse of the CDF of the univariate standard normal distribution. In addition, for the multivariate normal distribution, the relationship between the rank order and product moment correlations can be expressed as follows:

ρ_ij = 2 sin(π r_ij / 6),  (4)

where ρ_ij and r_ij are the product moment and rank order correlations, respectively, between variables (i, j). Hence, the rank order correlation matrix R of the original data can be used to compute the product moment correlation ρ of the transformed normal variables. Thus, the multivariate normal copula model allows us to express the relationship between a set of variables with arbitrary marginal distributions using the relatively simple multivariate normal distribution. The application of copulas as described above requires the identification of the marginal distribution of each of the variables, as described in Danaher and Smith (2011). As discussed earlier, the identification of the marginal distribution is often complicated and requires human involvement in the identification and selection of the best-fit distribution. This may prevent automation of the modeling process, which is essential for big data applications. A nonparametric approach based only on the empirically observed data, called Data Shuffling, may provide a viable approach in these cases.

DATA SHUFFLING

Data shuffling was originally proposed by Muralidhar and Sarathy (2006) in the context of statistical disclosure limitation.
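Equations (3) and (4) can be sketched numerically: draw from an m-variate standard normal whose Pearson correlation is obtained from a target rank order correlation via equation (4), then transform each coordinate by the standard normal CDF to obtain a copula sample with uniform marginals. The target value and sample size here are arbitrary.

```python
# Sketch of equations (3) and (4): sampling from a normal copula whose
# Pearson correlation is derived from a target Spearman correlation.
import numpy as np
from scipy import stats

def spearman_to_pearson(r):
    """Equation (4): rank order -> product moment correlation (normal copula)."""
    return 2.0 * np.sin(np.pi * r / 6.0)

target_spearman = 0.60
rho = spearman_to_pearson(target_spearman)        # ~0.618

rng = np.random.default_rng(2)
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=20000)
u = stats.norm.cdf(z)                             # copula sample: uniform marginals

# The rank order correlation of the sample should be near the target.
r_hat, _ = stats.spearmanr(u[:, 0], u[:, 1])
```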
The purpose of data shuffling was to generate a new data set that preserved the characteristics of the original data without disclosing information about individual records. Data shuffling is implemented as follows: (1) identify the rank order correlation of the original data, (2) construct the copula model using the rank order correlation, (3) generate a new data set using the normalized copula model, and (4) reverse map the original data onto the generated normalized values. Note that the data shuffling process does not require the marginal distribution to be modeled and it does not require MCMC simulation, the two steps that make the implementation of the Danaher and Smith (2011) procedure problematic in the big data scenario. The key to data shuffling is the concept of reverse mapping. It is the ability to reverse map the normalized values back to the original data using ranks that allows the entire procedure to be implemented without having to identify the original marginal distribution. In traditional copula implementations, once the normalized values have been generated, it would be necessary to map these values back to the original distribution via the marginal distribution of the original data. In reverse mapping, we instead treat the generated normalized values as another random realization of the original data, take the original data itself as the empirical marginal distribution, and reverse map the realization back to that empirical marginal distribution. Note that for large data sets, there exists a practically infinite number of potential combinations that result in the same rank order correlation matrix. The only exception to this rule is the scenario where there is complete collinearity in the data (that is, the rank order correlation is either +1 or -1). For brevity, we do not go into the details of the theoretical derivations relating to data shuffling; we refer the interested reader to Muralidhar and Sarathy (2006). Consider the following simple illustration involving two variables and 20 observations with a rank order correlation of 0.60 (see Table 1). We use a multivariate normal copula to model this scenario (although data shuffling is not necessarily limited to this particular copula). The table shows the original data, the generated normalized copula values, and the reverse mapped values. It is important to note that when reverse mapping is performed, the rank order correlations of the normalized copula values and of the reverse mapped data are exactly the same. This is extremely important since the key to the copula model is the preservation of the rank order correlation. To highlight the process of reverse mapping, consider the first normalized copula value (the first value in the Y_1 column). In the traditional copula approach, we would first compute the normal distribution probability of this value.
Then we would use this probability to compute the original value as the inverse of the cumulative probability of the marginal distribution of the variable. Obviously, this would require that we identify the marginal distribution of the variable X_1. With reverse mapping, we avoid this process. We simply find the rank of the normalized copula value (which is 1) and replace this value with the value from X_1 with a rank of 1 (0.2317). We repeat this for every record and every variable. As observed earlier, the rank order correlations of the normalized copula values and of the reverse mapped values are identical, preserving the relationship between the variables. The illustration in Table 1 provides the basic conceptual idea behind data shuffling. In the following section, we provide the results of a comprehensive simulation experiment conducted to assess the effectiveness of data shuffling.

EMPIRICAL EVALUATION OF DATA SHUFFLING

In this section, we describe a simulation experiment conducted to evaluate the effectiveness of data shuffling. To be realistic, we intentionally chose the relationships between the variables to be complex. We consider a data set with three variables. Figures 1-3 show the relationships among these variables and their rank order correlations.
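The four data shuffling steps and the reverse mapping rule described above can be sketched end-to-end as follows. This is an independent re-implementation for illustration, assuming a normal copula; the synthetic two-variable data set and the dependence injected into it are made up.

```python
# Sketch of data shuffling: (1) rank order correlation, (2) copula model
# via rho = 2*sin(pi*r/6), (3) generate normalized values, (4) reverse map.
import numpy as np
from scipy import stats

def shuffle_data(X, rng):
    """Return a shuffled data set with exactly the original marginal values
    and approximately the original rank order correlation."""
    n, m = X.shape
    # Step 1: rank order (Spearman) correlation of the original data.
    R = np.corrcoef(stats.rankdata(X, axis=0), rowvar=False)
    # Step 2: normal copula parameter, equation (4).
    P = 2.0 * np.sin(np.pi * R / 6.0)
    np.fill_diagonal(P, 1.0)
    # Step 3: generate normalized values from the copula model.
    Y = rng.multivariate_normal(np.zeros(m), P, size=n)
    # Step 4: reverse map - the normalized value of rank k in each column
    # is replaced by the original value of rank k in the same column.
    Z = np.empty_like(X)
    for j in range(m):
        ranks = Y[:, j].argsort().argsort()      # 0-based ranks of Y column
        Z[:, j] = np.sort(X[:, j])[ranks]
    return Z

rng = np.random.default_rng(0)
n = 5000
x1 = rng.lognormal(size=n)                       # arbitrary non-normal marginal
x2 = rng.gamma(2.0, size=n) + 0.5 * np.log1p(x1)  # hypothetical dependence
X = np.column_stack([x1, x2])
Z = shuffle_data(X, rng)
```

No marginal distribution is fitted anywhere, and no MCMC is required; because Z reuses the original values column by column, the marginals are preserved exactly while the ranks carry the dependence.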

Table 1. Small illustrative example of data shuffling: the original data set (X_1, X_2), the copula generated normalized data set (Y_1, Y_2) with ranks, and the shuffled data set (Z_1, Z_2) sharing those ranks (Y_1 & Z_1, Y_2 & Z_2), together with the rank order correlation of each data set.

Figure 1. Scatter plot of variables 1 and 2 for the original data (Rank order correlation = 0.70)

Figure 2. Scatter plot of variables 1 and 3 for the original data (Rank order correlation = 0.60)

Figure 3. Scatter plot of variables 2 and 3 for the original data (Rank order correlation = 0.75)

For illustrative purposes, Figures 4-6 provide the values generated using the data shuffling approach for a single set of simulated values. Comparing Figures (1 and 4), (2 and 5), and (3 and 6), we observe that the data generated using the data shuffling procedure closely approximate the original data. This result provides basic visual verification of the effectiveness of the data shuffling procedure. We conducted a simulation experiment to evaluate the performance of the data shuffling procedure. The purpose of the experiment was to assess the extent to which the values generated using the data shuffling approach were able to approximate the original relationship. The process of generating new samples was replicated 100 times. For each sample, we computed the rank order correlation between the variables. Using this rank order correlation, we computed the bias (the difference between the rank order correlation of the generated values and that of the original data).

Figure 4. Scatter plot of variables 1 and 2 for the shuffled data

Figure 5. Scatter plot of variables 1 and 3 for the shuffled data

Figure 6. Scatter plot of variables 2 and 3 for the shuffled data
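The replication experiment described above can be sketched as follows, assuming a normal copula shuffling step (re-implemented inline) and a simple synthetic two-variable data set; the three-variable data used in the paper is not reproduced here.

```python
# Sketch of the replication experiment: shuffle the same data set 100 times
# and record the bias in the rank order correlation of each shuffled set.
import numpy as np
from scipy import stats

def shuffle_once(X, rng):
    """One data shuffling replication (normal copula, reverse mapping)."""
    n, m = X.shape
    R = np.corrcoef(stats.rankdata(X, axis=0), rowvar=False)
    P = 2.0 * np.sin(np.pi * R / 6.0)                 # equation (4)
    np.fill_diagonal(P, 1.0)
    Y = rng.multivariate_normal(np.zeros(m), P, size=n)
    Z = np.empty_like(X)
    for j in range(m):
        Z[:, j] = np.sort(X[:, j])[Y[:, j].argsort().argsort()]
    return Z

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=2000)
r_orig = stats.spearmanr(X[:, 0], X[:, 1])[0]

# Bias = rank order correlation of shuffled data minus that of the original.
biases = []
for _ in range(100):
    Z = shuffle_once(X, rng)
    biases.append(stats.spearmanr(Z[:, 0], Z[:, 1])[0] - r_orig)

mean_bias = float(np.mean(biases))
sd_bias = float(np.std(biases))
```

The mean bias should be near zero and its standard deviation small, mirroring the pattern reported in Table 2 for the paper's data set.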

Table 2 provides the mean and standard deviation of this bias. The results indicate that the mean of this measure is very close to zero, indicating that data shuffling is unbiased in estimating the rank order correlation. The standard deviation of the bias is also very small, indicating that for this data set, data shuffling is very effective in providing a close approximation of the true relationship between the variables. Figure 7 provides the frequency distribution of the bias in the rank order correlation between variables 1 and 2 across all 100 replications. The figure indicates that the differences between the simulated and original values are extremely small and, in most practical scenarios, would be considered negligible. In addition, the frequency distribution also provides the decision maker with the ability to perform simple hypothesis tests regarding the relationship between the variables. In summary, these results provide strong evidence of the effectiveness of data shuffling as a simple but effective procedure for using copula models in a big data environment.

Table 2. Bias in correlation for shuffled data: average bias and standard deviation of the bias for variables 1 and 2, variables 1 and 3, and variables 2 and 3.

Figure 7. Frequency distribution of shuffled rank order correlation for variables 1 and 2

CONCLUSIONS

The characteristics of big data (volume, variety, velocity) make it necessary that existing analytical approaches be modified to suit big data. Specifically, it is necessary that the modeling approaches be automated (requiring little or no human intervention) and scalable. When modeling complex relationships among variables, copulas offer a simple but effective approach. Traditional copula implementations require considerable effort in identifying the marginal distributions, which makes them a difficult option for big data.
In this study, we offer data shuffling as an alternative to the traditional copula models to overcome this problem. Our experimental results indicate that data shuffling is capable of effectively modeling complex relationships. By using reverse mapping, data shuffling eliminates the need to identify the marginal distributions of the variables when implementing copula models. In addition, data shuffling is asymptotically equivalent to the traditional approach: as the size of the data set increases, the difference between the reverse mapping approach and the traditional marginal distribution approach becomes negligibly small. A more comprehensive investigation using more variables, different sample sizes, and different relationships is currently being conducted.

REFERENCES

Chandon, J.-L. J. (1986) A Comparative Study of Media Exposure Models. Garland, New York.
Danaher, P. J. (2007) Modeling page views across multiple websites with an application to Internet reach and frequency prediction. Marketing Science, 26(3).
Danaher, P. J., and Hardie, B. G. S. (2005) Bacon with your eggs? Applications of a new bivariate beta-binomial distribution. American Statistician, 59(4).
Danaher, P. J. and Smith, M. S. (2011) Modeling Multivariate Distributions Using Copulas: Applications in Marketing. Marketing Science, 30(1).
Huang, C.-Y. and Lin, C.-S. (2006) Modeling the audience's banner ad exposure for Internet advertising planning. Journal of Advertising, 35(2).
Joe, H. (1997) Multivariate Models and Dependence Concepts. Chapman & Hall, London.
Muralidhar, K. and Sarathy, R. (2006) Data Shuffling: A New Masking Approach for Numerical Data. Management Science, 52(5).
Nelsen, R. B. (1995) Copulas, Characterization, Correlation and Counterexamples. Mathematics Magazine.
Rust, R. T. (1986) Advertising Media Models: A Practical Guide. Lexington Books, Lexington, MA.
Schweizer, B. (1991) Thirty Years of Copulas. In G. Dall'Aglio, S. Kotz, G. Salinetti (eds.), Advances in Probability Distributions with Given Marginals, Kluwer, Dordrecht, Netherlands.
Sklar, A. (1959) Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris.


More information

A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM

A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM Jayawant Mandrekar, Daniel J. Sargent, Paul J. Novotny, Jeff A. Sloan Mayo Clinic, Rochester, MN 55905 ABSTRACT A general

More information

Monte Carlo Methods and Statistical Computing: My Personal E

Monte Carlo Methods and Statistical Computing: My Personal E Monte Carlo Methods and Statistical Computing: My Personal Experience Department of Mathematics & Statistics Indian Institute of Technology Kanpur November 29, 2014 Outline Preface 1 Preface 2 3 4 5 6

More information

Bootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping

Bootstrapping Method for  14 June 2016 R. Russell Rhinehart. Bootstrapping Bootstrapping Method for www.r3eda.com 14 June 2016 R. Russell Rhinehart Bootstrapping This is extracted from the book, Nonlinear Regression Modeling for Engineering Applications: Modeling, Model Validation,

More information

Uniform Fractional Part Algorithm And Applications

Uniform Fractional Part Algorithm And Applications Proceedings of the 2011 International Conference on Industrial Engineering and Operations Management Kuala Lumpur, Malaysia, January 22 24, 2011 Uniform Fractional Part Algorithm And Applications Elham

More information

Why is Statistics important in Bioinformatics?

Why is Statistics important in Bioinformatics? Why is Statistics important in Bioinformatics? Random processes are inherent in evolution and in sampling (data collection). Errors are often unavoidable in the data collection process. Statistics helps

More information

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER A.Shabbir 1, 2 and G.Verdoolaege 1, 3 1 Department of Applied Physics, Ghent University, B-9000 Ghent, Belgium 2 Max Planck Institute

More information

Bland-Altman Plot and Analysis

Bland-Altman Plot and Analysis Chapter 04 Bland-Altman Plot and Analysis Introduction The Bland-Altman (mean-difference or limits of agreement) plot and analysis is used to compare two measurements of the same variable. That is, it

More information

BRANDING AND STYLE GUIDELINES

BRANDING AND STYLE GUIDELINES BRANDING AND STYLE GUIDELINES INTRODUCTION The Dodd family brand is designed for clarity of communication and consistency within departments. Bold colors and photographs are set on simple and clean backdrops

More information

MCMC Methods for Bayesian Mixtures of Copulas

MCMC Methods for Bayesian Mixtures of Copulas Ricardo Silva Department of Statistical Science University College London ricardo@stats.ucl.ac.uk Robert B. Gramacy Statistical Laboratory University of Cambridge bobby@statslab.cam.ac.uk Abstract Applications

More information

Approximate Bayesian Computation. Alireza Shafaei - April 2016

Approximate Bayesian Computation. Alireza Shafaei - April 2016 Approximate Bayesian Computation Alireza Shafaei - April 2016 The Problem Given a dataset, we are interested in. The Problem Given a dataset, we are interested in. The Problem Given a dataset, we are interested

More information

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea Chapter 3 Bootstrap 3.1 Introduction The estimation of parameters in probability distributions is a basic problem in statistics that one tends to encounter already during the very first course on the subject.

More information

Scalable Multidimensional Hierarchical Bayesian Modeling on Spark

Scalable Multidimensional Hierarchical Bayesian Modeling on Spark Scalable Multidimensional Hierarchical Bayesian Modeling on Spark Robert Ormandi, Hongxia Yang and Quan Lu Yahoo! Sunnyvale, CA 2015 Click-Through-Rate (CTR) Prediction Estimating the probability of click

More information

Time Series Analysis by State Space Methods

Time Series Analysis by State Space Methods Time Series Analysis by State Space Methods Second Edition J. Durbin London School of Economics and Political Science and University College London S. J. Koopman Vrije Universiteit Amsterdam OXFORD UNIVERSITY

More information

Model validation through "Posterior predictive checking" and "Leave-one-out"

Model validation through Posterior predictive checking and Leave-one-out Workshop of the Bayes WG / IBS-DR Mainz, 2006-12-01 G. Nehmiz M. Könen-Bergmann Model validation through "Posterior predictive checking" and "Leave-one-out" Overview The posterior predictive distribution

More information

A MECHANICAL APPROACH OF MULTIVARIATE DENSITY FUNCTION APPROXIMATION

A MECHANICAL APPROACH OF MULTIVARIATE DENSITY FUNCTION APPROXIMATION A MECHANICAL APPROACH OF MULTIVARIATE DENSITY FUNCTION APPROXIMATION László Mohácsi (a), Orsolya Rétallér (b) (a) Department of Computer Science, Corvinus University of Budapest (b) MTA-BCE Lendület Strategic

More information

A Random Number Based Method for Monte Carlo Integration

A Random Number Based Method for Monte Carlo Integration A Random Number Based Method for Monte Carlo Integration J Wang and G Harrell Department Math and CS, Valdosta State University, Valdosta, Georgia, USA Abstract - A new method is proposed for Monte Carlo

More information

In a two-way contingency table, the null hypothesis of quasi-independence. (QI) usually arises for two main reasons: 1) some cells involve structural

In a two-way contingency table, the null hypothesis of quasi-independence. (QI) usually arises for two main reasons: 1) some cells involve structural Simulate and Reject Monte Carlo Exact Conditional Tests for Quasi-independence Peter W. F. Smith and John W. McDonald Department of Social Statistics, University of Southampton, Southampton, SO17 1BJ,

More information

Webinar Parameter Identification with optislang. Dynardo GmbH

Webinar Parameter Identification with optislang. Dynardo GmbH Webinar Parameter Identification with optislang Dynardo GmbH 1 Outline Theoretical background Process Integration Sensitivity analysis Least squares minimization Example: Identification of material parameters

More information

An Interval-Based Tool for Verified Arithmetic on Random Variables of Unknown Dependency

An Interval-Based Tool for Verified Arithmetic on Random Variables of Unknown Dependency An Interval-Based Tool for Verified Arithmetic on Random Variables of Unknown Dependency Daniel Berleant and Lizhi Xie Department of Electrical and Computer Engineering Iowa State University Ames, Iowa

More information

Simulation Modeling and Analysis

Simulation Modeling and Analysis Simulation Modeling and Analysis FOURTH EDITION Averill M. Law President Averill M. Law & Associates, Inc. Tucson, Arizona, USA www. averill-law. com Boston Burr Ridge, IL Dubuque, IA New York San Francisco

More information

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &

More information

Descriptive and Graphical Analysis of the Data

Descriptive and Graphical Analysis of the Data Descriptive and Graphical Analysis of the Data Carlo Favero Favero () Descriptive and Graphical Analysis of the Data 1 / 10 The first database Our first database is made of 39 seasons (from 1979-1980 to

More information

Applications of the k-nearest neighbor method for regression and resampling

Applications of the k-nearest neighbor method for regression and resampling Applications of the k-nearest neighbor method for regression and resampling Objectives Provide a structured approach to exploring a regression data set. Introduce and demonstrate the k-nearest neighbor

More information

Probability and Statistics for Final Year Engineering Students

Probability and Statistics for Final Year Engineering Students Probability and Statistics for Final Year Engineering Students By Yoni Nazarathy, Last Updated: April 11, 2011. Lecture 1: Introduction and Basic Terms Welcome to the course, time table, assessment, etc..

More information

Two-dimensional Totalistic Code 52

Two-dimensional Totalistic Code 52 Two-dimensional Totalistic Code 52 Todd Rowland Senior Research Associate, Wolfram Research, Inc. 100 Trade Center Drive, Champaign, IL The totalistic two-dimensional cellular automaton code 52 is capable

More information

Direct Sequential Co-simulation with Joint Probability Distributions

Direct Sequential Co-simulation with Joint Probability Distributions Math Geosci (2010) 42: 269 292 DOI 10.1007/s11004-010-9265-x Direct Sequential Co-simulation with Joint Probability Distributions Ana Horta Amílcar Soares Received: 13 May 2009 / Accepted: 3 January 2010

More information

A noninformative Bayesian approach to small area estimation

A noninformative Bayesian approach to small area estimation A noninformative Bayesian approach to small area estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu September 2001 Revised May 2002 Research supported

More information

Multivariate Capability Analysis

Multivariate Capability Analysis Multivariate Capability Analysis Summary... 1 Data Input... 3 Analysis Summary... 4 Capability Plot... 5 Capability Indices... 6 Capability Ellipse... 7 Correlation Matrix... 8 Tests for Normality... 8

More information

Probability Models.S4 Simulating Random Variables

Probability Models.S4 Simulating Random Variables Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard Probability Models.S4 Simulating Random Variables In the fashion of the last several sections, we will often create probability

More information

Box-Cox Transformation for Simple Linear Regression

Box-Cox Transformation for Simple Linear Regression Chapter 192 Box-Cox Transformation for Simple Linear Regression Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a dataset containing a pair of variables that are

More information

Monte Carlo Simulations

Monte Carlo Simulations Monte Carlo Simulations DESCRIPTION AND APPLICATION Outline Introduction Description of Method Cost Estimating Example Other Considerations Introduction Most interesting things are probabilistic (opinion)

More information

Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications

Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications D.A. Karras 1 and V. Zorkadis 2 1 University of Piraeus, Dept. of Business Administration,

More information

Statistical techniques for data analysis in Cosmology

Statistical techniques for data analysis in Cosmology Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction

More information

Lecture 7: Linear Regression (continued)

Lecture 7: Linear Regression (continued) Lecture 7: Linear Regression (continued) Reading: Chapter 3 STATS 2: Data mining and analysis Jonathan Taylor, 10/8 Slide credits: Sergio Bacallado 1 / 14 Potential issues in linear regression 1. Interactions

More information

ECONOMIC DESIGN OF STATISTICAL PROCESS CONTROL USING PRINCIPAL COMPONENTS ANALYSIS AND THE SIMPLICIAL DEPTH RANK CONTROL CHART

ECONOMIC DESIGN OF STATISTICAL PROCESS CONTROL USING PRINCIPAL COMPONENTS ANALYSIS AND THE SIMPLICIAL DEPTH RANK CONTROL CHART ECONOMIC DESIGN OF STATISTICAL PROCESS CONTROL USING PRINCIPAL COMPONENTS ANALYSIS AND THE SIMPLICIAL DEPTH RANK CONTROL CHART Vadhana Jayathavaj Rangsit University, Thailand vadhana.j@rsu.ac.th Adisak

More information

A Class of Symmetric Bivariate Uniform Distributions

A Class of Symmetric Bivariate Uniform Distributions A Class of Symmetric Bivariate Uniform Distributions Thomas S. Ferguson, 7/8/94 A class of symmetric bivariate uniform distributions is proposed for use in statistical modeling. The distributions may be

More information

On Kernel Density Estimation with Univariate Application. SILOKO, Israel Uzuazor

On Kernel Density Estimation with Univariate Application. SILOKO, Israel Uzuazor On Kernel Density Estimation with Univariate Application BY SILOKO, Israel Uzuazor Department of Mathematics/ICT, Edo University Iyamho, Edo State, Nigeria. A Seminar Presented at Faculty of Science, Edo

More information

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing Generalized Additive Model and Applications in Direct Marketing Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing Abstract Logistic regression 1 has been widely used in direct marketing applications

More information

Today s outline: pp

Today s outline: pp Chapter 3 sections We will SKIP a number of sections Random variables and discrete distributions Continuous distributions The cumulative distribution function Bivariate distributions Marginal distributions

More information

SPSS Basics for Probability Distributions

SPSS Basics for Probability Distributions Built-in Statistical Functions in SPSS Begin by defining some variables in the Variable View of a data file, save this file as Probability_Distributions.sav and save the corresponding output file as Probability_Distributions.spo.

More information

Large Scale Data Analysis Using Deep Learning

Large Scale Data Analysis Using Deep Learning Large Scale Data Analysis Using Deep Learning Machine Learning Basics - 1 U Kang Seoul National University U Kang 1 In This Lecture Overview of Machine Learning Capacity, overfitting, and underfitting

More information

Statistical Matching using Fractional Imputation

Statistical Matching using Fractional Imputation Statistical Matching using Fractional Imputation Jae-Kwang Kim 1 Iowa State University 1 Joint work with Emily Berg and Taesung Park 1 Introduction 2 Classical Approaches 3 Proposed method 4 Application:

More information

Instability, Sensitivity, and Degeneracy of Discrete Exponential Families

Instability, Sensitivity, and Degeneracy of Discrete Exponential Families Instability, Sensitivity, and Degeneracy of Discrete Exponential Families Michael Schweinberger Pennsylvania State University ONR grant N00014-08-1-1015 Scalable Methods for the Analysis of Network-Based

More information

Quantitative Biology II!

Quantitative Biology II! Quantitative Biology II! Lecture 3: Markov Chain Monte Carlo! March 9, 2015! 2! Plan for Today!! Introduction to Sampling!! Introduction to MCMC!! Metropolis Algorithm!! Metropolis-Hastings Algorithm!!

More information

For the hardest CMO tranche, generalized Faure achieves accuracy 10 ;2 with 170 points, while modied Sobol uses 600 points. On the other hand, the Mon

For the hardest CMO tranche, generalized Faure achieves accuracy 10 ;2 with 170 points, while modied Sobol uses 600 points. On the other hand, the Mon New Results on Deterministic Pricing of Financial Derivatives A. Papageorgiou and J.F. Traub y Department of Computer Science Columbia University CUCS-028-96 Monte Carlo simulation is widely used to price

More information

QQ normality plots Harvey Motulsky, GraphPad Software Inc. July 2013

QQ normality plots Harvey Motulsky, GraphPad Software Inc. July 2013 QQ normality plots Harvey Motulsky, GraphPad Software Inc. July 213 Introduction Many statistical tests assume that data (or residuals) are sampled from a Gaussian distribution. Normality tests are often

More information

Multivariate Standard Normal Transformation

Multivariate Standard Normal Transformation Multivariate Standard Normal Transformation Clayton V. Deutsch Transforming K regionalized variables with complex multivariate relationships to K independent multivariate standard normal variables is an

More information

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview Chapter 888 Introduction This procedure generates D-optimal designs for multi-factor experiments with both quantitative and qualitative factors. The factors can have a mixed number of levels. For example,

More information

Calculus Limits Images in this handout were obtained from the My Math Lab Briggs online e-book.

Calculus Limits Images in this handout were obtained from the My Math Lab Briggs online e-book. Calculus Limits Images in this handout were obtained from the My Math Lab Briggs online e-book. A it is the value a function approaches as the input value gets closer to a specified quantity. Limits are

More information

Subject. Creating a diagram. Dataset. Importing the data file. Descriptive statistics with TANAGRA.

Subject. Creating a diagram. Dataset. Importing the data file. Descriptive statistics with TANAGRA. Subject Descriptive statistics with TANAGRA. The aim of descriptive statistics is to describe the main features of a collection of data in quantitative terms 1. The visualization of the whole data table

More information

Shading II. CITS3003 Graphics & Animation

Shading II. CITS3003 Graphics & Animation Shading II CITS3003 Graphics & Animation Objectives Introduce distance terms to the shading model. More details about the Phong model (lightmaterial interaction). Introduce the Blinn lighting model (also

More information

Analysis of Panel Data. Third Edition. Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS

Analysis of Panel Data. Third Edition. Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS Analysis of Panel Data Third Edition Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS Contents Preface to the ThirdEdition Preface to the Second Edition Preface to the First Edition

More information

Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018

Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018 Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018 Contents Introduction... 1 Start DIONE... 2 Load Data... 3 Missing Values... 5 Explore Data... 6 One Variable... 6 Two Variables... 7 All

More information

Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer

Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer Ludwig Fahrmeir Gerhard Tute Statistical odelling Based on Generalized Linear Model íecond Edition. Springer Preface to the Second Edition Preface to the First Edition List of Examples List of Figures

More information

Excel 2010 with XLSTAT

Excel 2010 with XLSTAT Excel 2010 with XLSTAT J E N N I F E R LE W I S PR I E S T L E Y, PH.D. Introduction to Excel 2010 with XLSTAT The layout for Excel 2010 is slightly different from the layout for Excel 2007. However, with

More information

Bayesian Analysis of Extended Lomax Distribution

Bayesian Analysis of Extended Lomax Distribution Bayesian Analysis of Extended Lomax Distribution Shankar Kumar Shrestha and Vijay Kumar 2 Public Youth Campus, Tribhuvan University, Nepal 2 Department of Mathematics and Statistics DDU Gorakhpur University,

More information

The Basics of Graphical Models

The Basics of Graphical Models The Basics of Graphical Models David M. Blei Columbia University September 30, 2016 1 Introduction (These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan.

More information

Kernel Density Estimation (KDE)

Kernel Density Estimation (KDE) Kernel Density Estimation (KDE) Previously, we ve seen how to use the histogram method to infer the probability density function (PDF) of a random variable (population) using a finite data sample. In this

More information

A Bayesian approach to parameter estimation for kernel density estimation via transformations

A Bayesian approach to parameter estimation for kernel density estimation via transformations A Bayesian approach to parameter estimation for kernel density estimation via transformations Qing Liu,, David Pitt 2, Xibin Zhang 3, Xueyuan Wu Centre for Actuarial Studies, Faculty of Business and Economics,

More information

MCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24

MCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24 MCMC Diagnostics Yingbo Li Clemson University MATH 9810 Yingbo Li (Clemson) MCMC Diagnostics MATH 9810 1 / 24 Convergence to Posterior Distribution Theory proves that if a Gibbs sampler iterates enough,

More information

SAS High-Performance Analytics Products

SAS High-Performance Analytics Products Fact Sheet What do SAS High-Performance Analytics products do? With high-performance analytics products from SAS, you can develop and process models that use huge amounts of diverse data. These products

More information