Simulate and Reject Monte Carlo Exact Conditional Tests for Quasi-independence

Peter W. F. Smith and John W. McDonald
Department of Social Statistics, University of Southampton, Southampton, SO17 1BJ, United Kingdom

1 Introduction

In a two-way contingency table, the null hypothesis of quasi-independence (QI) usually arises for two main reasons: 1) some cells involve structural zeros or 2) interest is focused on part of the table, e.g., the off-diagonal cells. Consider Table 1, analyzed by Becker (1990), which cross-classifies two independent interpretations of sputum cytology slides for lung cancer. Since the two interpretations tend to agree, most of the observations lie on the main diagonal and the hypothesis of independence is rejected. The hypothesis of QI for the off-diagonal cells, i.e., that the interpretations are independent given that they differ, is considered. However, the sparseness of the off-diagonal cells causes concern about the validity of using asymptotic tests, and an exact conditional test is used.

Table 1: Cross-classification of first and second independent interpretations of sputum cytology slides for lung cancer (Source: Archer et al., 1966). Categories for both interpretations: N = Negative, A = Ambiguous cells, S = Suspect, P = Positive, T = Technically unsatisfactory.

In order to perform an exact test of quasi-independence the null distribution of an appropriate test statistic must be calculated or simulated. For both independence and quasi-independence, calculating the required distribution is often computationally infeasible. So simulation is used and a Monte Carlo exact conditional test is performed.

A Monte Carlo exact conditional test for independence is described by Agresti, Wackerly and Boyett (1979), Kreiner (1987) and Whittaker (1990). Briefly, one generates a random sample of tables according to the conditional distribution of the table counts given the marginal totals. For each generated table, an appropriate test statistic is calculated and the exact conditional p-value is estimated by the proportion of generated tables which are at least as discrepant from the null as the observed. The accuracy of this unbiased estimate may be evaluated using binomial confidence intervals.

The problem, when using this approach to test for quasi-independence, is how to generate a random sample of tables from the null distribution, since, as shown by Smith and McDonald (1993), the null distribution has a normalizing constant which is very difficult to evaluate. In the next section, we introduce a simulate and reject procedure based on simulating tables under independence. We then suggest some modifications which dramatically reduce the rejection rate and so make the procedure viable.

2 Simulate and Reject Procedure

Let $X = \{x_{ij} : ij \in \mathcal{I} = (1, \dots, r) \times (1, \dots, c)\}$ be an $r \times c$ contingency table, and let $I$ be a proper subset of the index set $\mathcal{I}$. We call the cells in $I$ the cells of interest and the cells not in $I$ fixed. For the $5 \times 5$ Table 1, $I$ refers to the off-diagonal cells. The saturated log-linear model for $m_{ij} = E(X_{ij})$ has the form

$$\log m_{ij} = \lambda + \lambda^{1}_{i} + \lambda^{2}_{j} + \lambda^{12}_{ij}.$$

The hypothesis of quasi-independence over $I$ corresponds to $\lambda^{12}_{ij} = 0$ for $ij \in I$. Now $\lambda^{12}_{ij}$ for $ij \notin I$ are nuisance parameters with sufficient statistics $x_{ij}$, $ij \notin I$. Therefore, an exact conditional test for QI is constructed using the conditional distribution of the table counts, given the margins and the observed counts in the fixed cells. Hence, tables under QI can be generated by simulating tables under independence and only retaining those where the counts in the fixed cells match the observed values.
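
As a rough illustration, the naive simulate and reject procedure and the p-value estimate described above might be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: tables are drawn under independence by shuffling column labels against row labels (one simple way to obtain the multivariate hypergeometric distribution given both margins), the confidence interval uses a normal approximation to the binomial, and the function names and the small 3 x 3 margins are hypothetical.

```python
import math
import random


def sample_table(row_totals, col_totals, rng):
    """Draw one table with the given margins under independence.

    Shuffling the column labels against the row labels gives the
    multivariate hypergeometric distribution of the counts given both margins.
    """
    row_labels = [i for i, n in enumerate(row_totals) for _ in range(n)]
    col_labels = [j for j, n in enumerate(col_totals) for _ in range(n)]
    rng.shuffle(col_labels)
    table = [[0] * len(col_totals) for _ in row_totals]
    for i, j in zip(row_labels, col_labels):
        table[i][j] += 1
    return table


def simulate_and_reject(row_totals, col_totals, fixed, n_keep, rng,
                        max_tries=1_000_000):
    """Retain only tables whose fixed cells equal the required counts.

    `fixed` maps (row, col) -> required count, using 0-based indices.
    Returns (retained tables, number of tables simulated).
    """
    kept, tries = [], 0
    while len(kept) < n_keep and tries < max_tries:
        tries += 1
        table = sample_table(row_totals, col_totals, rng)
        if all(table[i][j] == v for (i, j), v in fixed.items()):
            kept.append(table)
    return kept, tries


def p_value_with_ci(null_stats, observed_stat, z=2.576):
    """Proportion of simulated statistics at least as large as the observed one,
    with a normal-approximation binomial interval (z = 2.576 is roughly 99%)."""
    n = len(null_stats)
    p_hat = sum(s >= observed_stat for s in null_stats) / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (max(0.0, p_hat - half), min(1.0, p_hat + half))


# Hypothetical toy example: 3 x 3 table with the diagonal treated as fixed.
rng = random.Random(0)
kept, tries = simulate_and_reject([4, 3, 3], [3, 4, 3],
                                  {(0, 0): 1, (1, 1): 1, (2, 2): 1},
                                  n_keep=20, rng=rng)
print(f"retained {len(kept)} of {tries} simulated tables")
```

Every retained table has the required margins and fixed cells; the ratio of retained to simulated tables shows directly how wasteful the naive procedure becomes when matches are rare.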

For Table 1, we simulate tables from a multivariate hypergeometric distribution, thus maintaining the margins, and reject all tables which do not match the diagonal (26,11,6,4,2). Methods for simulating from a multivariate hypergeometric distribution are given by Agresti, Wackerly and Boyett (1979) and Patefield (1981). Alas, this naive simulate and reject procedure is not computationally viable, since we failed to simulate under independence a table with a matching diagonal in over one billion attempts!

All is not lost. Smith and McDonald (1993) show that the distribution of the cells of interest under quasi-independence does not depend on the observed values of the fixed cells. By replacing the values in the fixed cells with any counts and adjusting the margins accordingly, the simulate and reject procedure yields the correct null distribution. Therefore, the rejection rate can be significantly reduced by replacing the counts in the fixed cells by those closest to independence, based on the adjusted margins. For Table 1 we replace the diagonal with (3,13,1,0,1). Note that the row and column margins for this adjusted table are $(x_{i+}) = (30, 27, 8, 1, 3)$ and $(x_{+i}) = (6, 34, 7, 9, 13)$, respectively, and that now $x_{ii}$ equals the nearest integer to $x_{i+} x_{+i} / x_{++}$, where $+$ denotes summation over a subscript. Using this adjusted table, in order to obtain 2000 tables with matching diagonal, 234,595 tables were simulated under independence. The rejection rate of 99.15% is very large, but this adjusted-margins simulate and reject procedure is now computationally feasible.

Patefield (1981) simulates the required multivariate hypergeometric distribution by simulating cell by cell and row by row from univariate hypergeometrics, based on a factorization of the multiple hypergeometric mass function. Note that each $r \times c$ table requires $(r - 1)(c - 1)$ simulated counts (the others obtained by subtraction). For Table 1, 234,595 × 16 = 3,753,520 simulations were required to obtain 2000 tables with matching diagonal, i.e., an average of 1877 simulations per retained table. We now propose various ways of reducing the average number of simulated cell counts required per retained table, by modifying Patefield's algorithm.

2.1 Rejecting Partly Simulated Tables

Patefield's algorithm simulates tables cell by cell, so a mismatch can be identified immediately after the count for a fixed cell has been simulated, thus eliminating unnecessary simulation of the remaining cell counts in the table. For Table 1, after adjusting the diagonal and margins, we would repeatedly simulate the (1,1) cell count until a match of 3 occurs, then simulate the (1,2) to (1,4) cell counts and obtain the (1,5) cell count by subtraction.

However, since the number of rejections does not affect the distribution of the tables retained, the (1,1) cell count can be set at its observed value and the rest of the row obtained as described. Next the (2,1) and (2,2) cell counts are simulated. If the simulated (2,2) cell count matches the observed value of 13, the rest of the row can be simulated. If not, the whole table must be rejected and a new table started. Once we have a successful match for the (2,2) cell count, we can continue simulating the table until the count in the next fixed cell is simulated, the (3,3) cell here. Again, if we have a match, we continue simulating the table; a mismatch means that we must reject the table and start again. We continue in this manner until we have simulated a table with the required matching counts for all fixed cells, remembering to check for matches where the count is obtained by subtraction.

Partly simulated tables are now rejected, so efficiency is measured by the number of cell counts simulated per retained table. By fixing the first cell count and rejecting partly simulated tables, 481,605 simulations were required to obtain 2000 tables, an average of 241 per retained table (versus 1877 without these improvements).
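
The early-rejection idea of this section can be sketched as follows, using the adjusted margins and diagonal quoted for Table 1. The sketch draws each cell from a univariate hypergeometric distribution, row by row, as a stand-in for Patefield's algorithm; it is not a transcription of Algorithm AS 159, the function names are hypothetical, and the urn-style hypergeometric draw is chosen for clarity rather than speed.

```python
import random


def hypergeom_draw(successes, population, draws, rng):
    """Number of 'success' items among `draws` items taken without replacement."""
    picked = rng.sample(range(population), draws)
    return sum(1 for item in picked if item < successes)


def simulate_with_early_rejection(row_totals, col_totals, fixed, rng):
    """Simulate one table row by row, stopping at the first fixed-cell mismatch.

    `fixed` maps (row, col) -> required count (0-based). The (0, 0) cell, if
    fixed, is set rather than simulated. Returns (table or None, random draws used).
    """
    n_rows, n_cols = len(row_totals), len(col_totals)
    col_left = list(col_totals)  # column totals not yet used by earlier rows
    table, draws_used = [], 0
    for i in range(n_rows):
        row, row_left = [], row_totals[i]
        for j in range(n_cols):
            if i == 0 and j == 0 and (0, 0) in fixed:
                count = fixed[(0, 0)]            # set, not simulated
            elif i == n_rows - 1:
                count = col_left[j]              # last row: by subtraction
            elif j == n_cols - 1:
                count = row_left                 # last column: by subtraction
            else:
                count = hypergeom_draw(col_left[j], sum(col_left[j:]), row_left, rng)
                draws_used += 1
            if fixed.get((i, j), count) != count:
                return None, draws_used          # reject the partly simulated table
            row.append(count)
            row_left -= count
        col_left = [c - x for c, x in zip(col_left, row)]
        table.append(row)
    return table, draws_used


def sample_matching_tables(row_totals, col_totals, fixed, n_keep, rng,
                           max_attempts=1_000_000):
    """Repeat until `n_keep` tables match; also report the total random draws."""
    kept, total_draws = [], 0
    for _ in range(max_attempts):
        table, used = simulate_with_early_rejection(row_totals, col_totals, fixed, rng)
        total_draws += used
        if table is not None:
            kept.append(table)
            if len(kept) == n_keep:
                break
    return kept, total_draws


# Adjusted margins and diagonal for Table 1, as quoted in the text.
row_totals = [30, 27, 8, 1, 3]
col_totals = [6, 34, 7, 9, 13]
diagonal = [3, 13, 1, 0, 1]
n = sum(row_totals)
# The adjusted diagonal is the nearest integer to x_{i+} x_{+i} / x_{++}.
adjusted = [round(r * c / n) for r, c in zip(row_totals, col_totals)]
fixed = {(i, i): d for i, d in enumerate(diagonal)}
kept, total_draws = sample_matching_tables(row_totals, col_totals, fixed, 5,
                                           random.Random(1))
print(adjusted, total_draws / len(kept))
```

The average number of draws per retained table printed here is for a handful of tables only and is not expected to reproduce the figures reported above.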

2.2 Changing the Order of Cell Count Simulation

A further improvement is to permute the rows and columns of the table in order to attempt to match the counts in the fixed cells as early as possible, hence reducing, on average, the number of wasted simulations. For example, if the only fixed cell in an $r \times c$ table is the $(r, c)$ cell, we must simulate the whole table before checking for a match for the last cell. By permuting the rows and columns so the fixed cell becomes the (1,1) cell, we can set the count in the (1,1) cell at its observed value and simulate the rest of the table. Therefore, no rejection is required. McDonald and Smith (1994) extend this idea to triangular tables and propose an algorithm where no rejection is necessary. However, when simulating cell counts row by row, no such permutation is possible for tables with only diagonal fixed cells.

We now discuss the important and common situation of testing for off-diagonal QI in an $r \times r$ table. Recall that Patefield's algorithm simulates cell by cell, row by row. However, one can show that in order to simulate the $(i, j)$ cell only cells above and to the left need to have been simulated, i.e., the cells $(k, l)$; $k = 1, \dots, i$; $l = 1, \dots, j$; $k \neq l$. Note that these cells plus the cell whose count is being simulated form a rectangle. Therefore, we can change the order in which the cells are simulated so as to attempt to match the counts in the fixed cells as early as possible. When matching on the diagonal, we can set the (1,1) cell count to the (adjusted) observed value. The next fixed count to match on is in the (2,2) cell, so we need only simulate cell counts above and to the left before checking that the simulated count for the (2,2) cell equals the (adjusted) observed value. Here we have only simulated 3 cell counts before checking for a match. If we have a mismatch, we have saved $r - 3$ unnecessary simulations for the first row.
If we have a match, we continue by simulating the counts of the cells above and to the left of the (3,3) cell, which reduces the number of simulations required before the second match is attempted. After each successful match we continue through the table in this manner. We call this the expanding-rectangle algorithm. For Table 1, this algorithm reduced the average number of simulations per retained table to 172 (from 241 when simulating row by row).

For an $r \times r$ table with fixed diagonal, the counts in the fixed cells can be reordered by permuting the rows and columns using the same permutation. For the $r!$ possible reorderings, the average number of simulations per retained table varies. In our experience, attempting the "hardest" matches first reduces the average number of simulations per retained table. For example, if the $(r, r)$ cell count is the "hardest", we would simulate the whole table only to have to reject the table frequently because the final match is the "hardest". On the other hand, if the "hardest" match is the (1,1) cell count, this count is set to the (adjusted) observed value and the "hardest" match never attempted. Our measure of hardness of match for the $(i, i)$ cell count is the conditional probability of a match, given that we have matched on the $(k, k)$, $k = 1, \dots, i - 1$, cell counts.
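
The visiting order used by the expanding-rectangle algorithm described above can be sketched as follows. The order inside each new "ring" (new column first, then new row, then the diagonal cell) is an assumption; any order in which every cell is preceded by all cells above and to its left would serve. The sketch only lists the order and checks that requirement; it does not sample counts, and it ignores the fact that some counts are obtained by subtraction. Function names are hypothetical and cells are 1-based, as in the text.

```python
def expanding_rectangle_order(r):
    """Visiting order for an r x r table whose diagonal cells are fixed."""
    order = [(1, 1)]                                  # set to its required value
    for k in range(2, r + 1):
        order += [(i, k) for i in range(1, k)]        # new column, above the diagonal
        order += [(k, j) for j in range(1, k)]        # new row, left of the diagonal
        order.append((k, k))                          # fixed cell: check for a match
    return order


def respects_rectangle_rule(order):
    """True if each cell is visited after all cells above and to the left of it."""
    seen = set()
    for (i, j) in order:
        needed = {(k, l) for k in range(1, i + 1) for l in range(1, j + 1)}
        needed.discard((i, j))
        if not needed <= seen:
            return False
        seen.add((i, j))
    return True


order = expanding_rectangle_order(5)
# With (1,1) set, only (1,2), (2,1) and (2,2) are visited before the first check.
print(order[:4])
print(respects_rectangle_rule(order))
```

Listing the order this way makes it easy to count how many cells are visited before each diagonal check for a given table size.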

When trying to determine the optimum permutation of the diagonal, the problem is how to calculate the conditional probability of a match, i.e., the hardness of a match. However, our experience suggests that the conditional probability of a match is approximately equal to the "marginal" probability of a match, i.e., the probability of a match if we were simulating the whole table before checking for matches. This is easily calculated for each diagonal cell since, as shown by Patefield's factorization and used in his algorithm, the marginal probability of a match is hypergeometric. For Table 1 with diagonal (3,13,1,0,1), the marginal probabilities of a match are , , , , , respectively. We permute the rows (and columns) using the permutation (2,1,5,3,4) so that these probabilities are in increasing order for the rearranged table. Now using the expanding-rectangle algorithm on the permuted table, the average number of simulations per retained table is reduced to 104 (from 172 before permuting).

2.3 Estimated P-values

The likelihood ratio test statistic for quasi-independence for Table 1 is with estimated exact p-value of and associated 99% confidence interval of ( , ), based on 20,000 tables generated under QI. While the observed test statistic and associated p-value are extreme, note that the rejection rate does not depend on their values.

3 Discussion

In this paper, we propose improvements to a naive simulate and reject procedure for generating $r \times c$ tables under quasi-independence for an arbitrary pattern of fixed cells. Although some of the algorithmic improvements are described for generating under QI for the off-diagonal cells of a square table, the ideas are applicable to other patterns of fixed cells. Apart from complete enumeration, which is only viable for small tables, the simulate and reject procedure is currently the only method for generating independent tables from the exact null distribution under QI.
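
Returning to the permutation step of Section 2.2, the marginal match probabilities can be recomputed from the adjusted margins and diagonal quoted for Table 1. The sketch below assumes that, given the margins, the count in cell $(i, i)$ is hypergeometric with population $x_{++}$, $x_{+i}$ "successes" and $x_{i+}$ draws, which is the standard marginal distribution of a single cell under independence. The function name is hypothetical and the computed values are not taken from the source.

```python
from math import comb


def match_probability(row_total, col_total, grand_total, value):
    """Hypergeometric probability that a cell count equals `value`,
    given its row total, column total and the grand total."""
    return (comb(col_total, value)
            * comb(grand_total - col_total, row_total - value)
            / comb(grand_total, row_total))


# Adjusted margins and diagonal for Table 1, as quoted in the text.
row_totals = [30, 27, 8, 1, 3]
col_totals = [6, 34, 7, 9, 13]
diagonal = [3, 13, 1, 0, 1]
n = sum(row_totals)

probs = [match_probability(r, c, n, d)
         for r, c, d in zip(row_totals, col_totals, diagonal)]

# Sort the diagonal cells (1-based labels) so the least likely match comes first.
hardest_first = sorted(range(1, len(diagonal) + 1), key=lambda i: probs[i - 1])
print(hardest_first)
```

Sorting by increasing marginal probability yields the ordering (2,1,5,3,4), in agreement with the permutation stated in Section 2.2.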
Our improvements to the naive procedure greatly increase its efficiency.

Smith, McDonald and Forster (1994) discuss another method for generating tables under QI using a Gibbs sampling approach, based on theoretical results in Forster, McDonald and Smith (1994). However, the generated tables are not necessarily independent and are only realizations from an approximation to the exact null distribution. When using a single Markov chain, the observed table is the obvious starting value. For multiple chains, obtaining other starting values with the same sufficient statistics for the nuisance parameters as the observed data is problematic. A possible solution is to generate a small number of independent starting values using the simulate and reject algorithms proposed.

Acknowledgements

This work was supported by Economic and Social Research Council award H as part of the Analysis of Large and Complex Datasets Programme.

References

Agresti, A., Wackerly, D. and Boyett, J. M. (1979). Exact conditional tests for cross-classifications: approximation of attained significance levels. Psychometrika, 44, 75-83.
Archer, P. G., Koprowska, I., McDonald, J. R., Naylor, B., Papanicolaou, G. N. and Umiker, W. O. (1966). A study of variability in the interpretation of sputum cytology slides. Cancer Res., 26, 2122-2144.
Becker, M. P. (1990). Quasisymmetric models for the analysis of square contingency tables. J. R. Statist. Soc. B, 52, 369-378.
Forster, J. J., McDonald, J. W. and Smith, P. W. F. (1994). Monte Carlo exact conditional tests for log-linear and logistic models. Working Paper, University of Southampton.
Kreiner, S. (1987). Analysis of multi-dimensional contingency tables by exact conditional tests: techniques and strategies. Scand. J. Statist., 14, 97-112.
McDonald, J. W. and Smith, P. W. F. (1994). Exact conditional tests of quasi-independence for triangular contingency tables: estimating attained significance levels. Appl. Statist., (to appear).
Patefield, W. M. (1981). Algorithm AS 159: An efficient method of generating random R × C tables with given row and column totals. Appl. Statist., 30, 91-97.
Smith, P. W. F. and McDonald, J. W. (1993). Exact conditional tests for incomplete contingency tables: estimating attained significance levels. Working Paper, University of Southampton.
Smith, P. W. F., McDonald, J. W. and Forster, J. J. (1994). Monte Carlo exact conditional tests for quasi-independence using Gibbs sampling. Working Paper, University of Southampton.
Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Chichester: Wiley.
Journal of Global Optimization, 10, 1{40 (1997) c 1997 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. A Discrete Lagrangian-Based Global-Search Method for Solving Satisability Problems
More informationParameterized Complexity of Independence and Domination on Geometric Graphs
Parameterized Complexity of Independence and Domination on Geometric Graphs Dániel Marx Institut für Informatik, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany. dmarx@informatik.hu-berlin.de
More informationDistances between intuitionistic fuzzy sets
Fuzzy Sets and Systems 4 (000) 505 58 www.elsevier.com/locate/fss Distances between intuitionistic fuzzy sets Eulalia Szmidt, Janusz Kacprzyk Systems Research Institute, Polish Academy of Sciences, ul.
More informationEvaluating Classifiers
Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with
More informationFast Fuzzy Clustering of Infrared Images. 2. brfcm
Fast Fuzzy Clustering of Infrared Images Steven Eschrich, Jingwei Ke, Lawrence O. Hall and Dmitry B. Goldgof Department of Computer Science and Engineering, ENB 118 University of South Florida 4202 E.
More informationLower Bounds for Insertion Methods for TSP. Yossi Azar. Abstract. optimal tour. The lower bound holds even in the Euclidean Plane.
Lower Bounds for Insertion Methods for TSP Yossi Azar Abstract We show that the random insertion method for the traveling salesman problem (TSP) may produce a tour (log log n= log log log n) times longer
More informationStatsMate. User Guide
StatsMate User Guide Overview StatsMate is an easy-to-use powerful statistical calculator. It has been featured by Apple on Apps For Learning Math in the App Stores around the world. StatsMate comes with
More informationPhysics 736. Experimental Methods in Nuclear-, Particle-, and Astrophysics. - Statistical Methods -
Physics 736 Experimental Methods in Nuclear-, Particle-, and Astrophysics - Statistical Methods - Karsten Heeger heeger@wisc.edu Course Schedule and Reading course website http://neutrino.physics.wisc.edu/teaching/phys736/
More informationForward Error Correction Codes
Appendix 6 Wireless Access Networks: Fixed Wireless Access and WLL Networks Ð Design and Operation. Martin P. Clark Copyright & 000 John Wiley & Sons Ltd Print ISBN 0-471-4998-1 Online ISBN 0-470-84151-6
More information[5] R. A. Dwyer. Higher-dimensional Voronoi diagrams in linear expected time. Discrete
[5] R. A. Dwyer. Higher-dimensional Voronoi diagrams in linear expected time. Discrete and Computational Geometry, 6(4):343{367, 1991. [6] J. H. Friedman and L. C. Rafsky. Multivariate generalizations
More informationPredicting Popular Xbox games based on Search Queries of Users
1 Predicting Popular Xbox games based on Search Queries of Users Chinmoy Mandayam and Saahil Shenoy I. INTRODUCTION This project is based on a completed Kaggle competition. Our goal is to predict which
More informationMarkov Chain Monte Carlo (part 1)
Markov Chain Monte Carlo (part 1) Edps 590BAY Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2018 Depending on the book that you select for
More informationThe Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University September 30, 2016 1 Introduction (These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan.
More informationISyE 6416: Computational Statistics Spring Lecture 13: Monte Carlo Methods
ISyE 6416: Computational Statistics Spring 2017 Lecture 13: Monte Carlo Methods Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Determine area
More informationEcient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines
Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Zhou B. B., Brent R. P. and Tridgell A. y Computer Sciences Laboratory The Australian National University Canberra,
More informationEVALUATION OF THE NORMAL APPROXIMATION FOR THE PAIRED TWO SAMPLE PROBLEM WITH MISSING DATA. Shang-Lin Yang. B.S., National Taiwan University, 1996
EVALUATION OF THE NORMAL APPROXIMATION FOR THE PAIRED TWO SAMPLE PROBLEM WITH MISSING DATA By Shang-Lin Yang B.S., National Taiwan University, 1996 M.S., University of Pittsburgh, 2005 Submitted to the
More informationThe ctest Package. January 3, 2000
R objects documented: The ctest Package January 3, 2000 bartlett.test....................................... 1 binom.test........................................ 2 cor.test.........................................
More informationUsing the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection
Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection Hyunghoon Cho and David Wu December 10, 2010 1 Introduction Given its performance in recent years' PASCAL Visual
More informationMarkov Random Fields and Gibbs Sampling for Image Denoising
Markov Random Fields and Gibbs Sampling for Image Denoising Chang Yue Electrical Engineering Stanford University changyue@stanfoed.edu Abstract This project applies Gibbs Sampling based on different Markov
More information1 Methods for Posterior Simulation
1 Methods for Posterior Simulation Let p(θ y) be the posterior. simulation. Koop presents four methods for (posterior) 1. Monte Carlo integration: draw from p(θ y). 2. Gibbs sampler: sequentially drawing
More informationBESTFIT, DISTRIBUTION FITTING SOFTWARE BY PALISADE CORPORATION
Proceedings of the 1996 Winter Simulation Conference ed. J. M. Charnes, D. J. Morrice, D. T. Brunner, and J. J. S\vain BESTFIT, DISTRIBUTION FITTING SOFTWARE BY PALISADE CORPORATION Linda lankauskas Sam
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationNOTATION AND TERMINOLOGY
15.053x, Optimization Methods in Business Analytics Fall, 2016 October 4, 2016 A glossary of notation and terms used in 15.053x Weeks 1, 2, 3, 4 and 5. (The most recent week's terms are in blue). NOTATION
More informationMonte Carlo Methods. Lecture slides for Chapter 17 of Deep Learning Ian Goodfellow Last updated
Monte Carlo Methods Lecture slides for Chapter 17 of Deep Learning www.deeplearningbook.org Ian Goodfellow Last updated 2017-12-29 Roadmap Basics of Monte Carlo methods Importance Sampling Markov Chains
More informationThe Global Standard for Mobility (GSM) (see, e.g., [6], [4], [5]) yields a
Preprint 0 (2000)?{? 1 Approximation of a direction of N d in bounded coordinates Jean-Christophe Novelli a Gilles Schaeer b Florent Hivert a a Universite Paris 7 { LIAFA 2, place Jussieu - 75251 Paris
More informationInference for loglinear models (contd):
Stat 504, Lecture 25 1 Inference for loglinear models (contd): Loglinear/Logit connection Intro to Graphical Models Stat 504, Lecture 25 2 Loglinear Models no distinction between response and explanatory
More informationDynamic Programming. Outline and Reading. Computing Fibonacci
Dynamic Programming Dynamic Programming version 1.2 1 Outline and Reading Matrix Chain-Product ( 5.3.1) The General Technique ( 5.3.2) -1 Knapsac Problem ( 5.3.3) Dynamic Programming version 1.2 2 Computing
More informationSimulation. Monte Carlo
Simulation Monte Carlo Monte Carlo simulation Outcome of a single stochastic simulation run is always random A single instance of a random variable Goal of a simulation experiment is to get knowledge about
More informationBayesian Robust Inference of Differential Gene Expression The bridge package
Bayesian Robust Inference of Differential Gene Expression The bridge package Raphael Gottardo October 30, 2017 Contents Department Statistics, University of Washington http://www.rglab.org raph@stat.washington.edu
More informationOn the Number of Tilings of a Square by Rectangles
University of Tennessee, Knoxville Trace: Tennessee Research and Creative Exchange University of Tennessee Honors Thesis Projects University of Tennessee Honors Program 5-2012 On the Number of Tilings
More informationCondence Intervals about a Single Parameter:
Chapter 9 Condence Intervals about a Single Parameter: 9.1 About a Population Mean, known Denition 9.1.1 A point estimate of a parameter is the value of a statistic that estimates the value of the parameter.
More informationMissing Data Analysis for the Employee Dataset
Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup Random Variables: Y i =(Y i1,...,y ip ) 0 =(Y i,obs, Y i,miss ) 0 R i =(R i1,...,r ip ) 0 ( 1
More informationExact Sampling for Hardy- Weinberg Equilibrium
Exact Sampling for Hardy- Weinberg Equilibrium Mark Huber Dept. of Mathematics and Institute of Statistics and Decision Sciences Duke University mhuber@math.duke.edu www.math.duke.edu/~mhuber Joint work
More informationLevel-set MCMC Curve Sampling and Geometric Conditional Simulation
Level-set MCMC Curve Sampling and Geometric Conditional Simulation Ayres Fan John W. Fisher III Alan S. Willsky February 16, 2007 Outline 1. Overview 2. Curve evolution 3. Markov chain Monte Carlo 4. Curve
More informationHandbook of Statistical Modeling for the Social and Behavioral Sciences
Handbook of Statistical Modeling for the Social and Behavioral Sciences Edited by Gerhard Arminger Bergische Universität Wuppertal Wuppertal, Germany Clifford С. Clogg Late of Pennsylvania State University
More informationMachine Learning and Data Mining. Clustering (1): Basics. Kalev Kask
Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of
More informationNearest Neighbor Predictors
Nearest Neighbor Predictors September 2, 2018 Perhaps the simplest machine learning prediction method, from a conceptual point of view, and perhaps also the most unusual, is the nearest-neighbor method,
More informationMultidimensional Scaling Methods For Many-Object Sets: a. Review. L. Tsogo, M.H. Masson. Universite de Technologie de Compiegne
Multidimensional Scaling Methods For Many-Object Sets: a Review L. Tsogo, M.H. Masson Centre de Recherches de Royallieu UMR CNRS 6599 Universite de Technologie de Compiegne B.P. 20529-60205 Compiegne cedex
More informationStatistical Physics of Community Detection
Statistical Physics of Community Detection Keegan Go (keegango), Kenji Hata (khata) December 8, 2015 1 Introduction Community detection is a key problem in network science. Identifying communities, defined
More information