Section 4.2: Discrete Data Histograms

Similar documents
Discrete-Event Simulation:

Section 2.3: Monte Carlo Simulation

Package simed. November 27, 2017

Programming Assignment #4 Arrays and Pointers

Probability Models.S4 Simulating Random Variables

SIMULATION FUNDAMENTALS

BASIC SIMULATION CONCEPTS

Section 5.2: Next Event Simulation Examples

Section 6.5: Random Sampling and Shuffling

You ve already read basics of simulation now I will be taking up method of simulation, that is Random Number Generation

C Functions. 5.2 Program Modules in C

Ch6: The Normal Distribution

Scientific Computing: An Introductory Survey

Sets MAT231. Fall Transition to Higher Mathematics. MAT231 (Transition to Higher Math) Sets Fall / 31

Descriptive and Graphical Analysis of the Data

Probability and Statistics for Final Year Engineering Students

Using the DATAMINE Program

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea

ACCURACY AND EFFICIENCY OF MONTE CARLO METHOD. Julius Goodman. Bechtel Power Corporation E. Imperial Hwy. Norwalk, CA 90650, U.S.A.

So..to be able to make comparisons possible, we need to compare them with their respective distributions.

Lecture Notes 3: Data summarization

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution

Tutorial 12 Craps Game Application: Introducing Random Number Generation and Enumerations

Chapter 2 Random Number Generators

CHAPTER 2: Describing Location in a Distribution

IT 403 Practice Problems (1-2) Answers

Section 9: One Variable Statistics

Testing Random- Number Generators

Math 574 Review Exam #1

What We ll Do... Random

Discrete Mathematics 2 Exam File Spring 2012

Chapter 4: Implicit Error Detection

Review Chapter 6 in Bravaco. Short Answers 1. This type of method does not return a value. a. null b. void c. empty d. anonymous

Model validation T , , Heli Hiisilä

Functions. Computer System and programming in C Prentice Hall, Inc. All rights reserved.

1. To condense data in a single value. 2. To facilitate comparisons between data.

Model selection and validation 1: Cross-validation

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

CHAPTER 5 NEXT-EVENT SIMULATION

Week 3. Sun Jun with slides from Hans Petter Langtangen

STA 570 Spring Lecture 5 Tuesday, Feb 1

Integrated Math I. IM1.1.3 Understand and use the distributive, associative, and commutative properties.

Discrete-Event Simulation:

Lecture 19. Topics: Chapter 9. Simulation and Design Moving to graphics library Unit Testing 9.5 Other Design Techniques

ECLT 5810 Data Preprocessing. Prof. Wai Lam

Last Time: Rolling a Weighted Die

Distributions of Continuous Data

Student Responsibilities. Mat 2170 Week 9. Notes About Using Methods. Recall: Writing Methods. Chapter Six: Objects and Classes

We extend SVM s in order to support multi-class classification problems. Consider the training dataset

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017

UNIT 9A Randomness in Computation: Random Number Generators

Frequency Distributions

Stochastic Algorithms

Quantitative - One Population

Chapter 1. Introduction

Chapter 6: DESCRIPTIVE STATISTICS

Problem One: A Quick Algebra Review

Generating random samples from user-defined distributions

2016 Stat-Ease, Inc. & CAMO Software

Activity 7. Modeling Exponential Decay with a Look at Asymptotes. Objectives. Introduction. Modeling the Experiment: Casting Out Sixes

Section 3.2: Multi Stream Lehmer RNGs

CSE 230 Computer Science II (Data Structure) Introduction

Design of Experiments

Blocking Combinatorial Games

Random Numbers. Chapter 14

VARIANCE REDUCTION TECHNIQUES IN MONTE CARLO SIMULATIONS K. Ming Leung

CPSC 427: Object-Oriented Programming

Dealing with Categorical Data Types in a Designed Experiment

1.1 - Introduction to Sets

Simulation. Chapter Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall

Line Arrangement. Chapter 6

Monte Carlo Simula/on and Copula Func/on. by Gerardo Ferrara

BIOL Gradation of a histogram (a) into the normal curve (b)

Simulation: Solving Dynamic Models ABE 5646 Week 12, Spring 2009

WELCOME! Lecture 3 Thommy Perlinger

Data Collection, Preprocessing and Implementation

EE109 Lab Need for Speed

Chapter Four: Loops II

Spatial Analysis and Modeling (GIST 4302/5302) Guofeng Cao Department of Geosciences Texas Tech University

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

RSM Split-Plot Designs & Diagnostics Solve Real-World Problems

Chpt 3. Data Description. 3-2 Measures of Central Tendency /40

Practical Questions CSCA48 Winter 2018 Week 11

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Dr M Kasim A Jalil. Faculty of Mechanical Engineering UTM (source: Deitel Associates & Pearson)

lecture 19: monte carlo methods

Random Walks. Dieter W. Heermann. Monte Carlo Methods. Dieter W. Heermann (Monte Carlo Methods) Random Walks / 22

Mat 2170 Week 9. Spring Mat 2170 Week 9. Objects and Classes. Week 9. Review. Random. Overloading. Craps. Clients. Packages. Randomness.

Data Handling. Moving from A to A* Calculate the numbers to be surveyed for a stratified sample (A)

Lecture 04 FUNCTIONS AND ARRAYS

UNIT 1A EXPLORING UNIVARIATE DATA

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview

Chapter Goals. Contents LOOPS

Using AIMMS for Monte Carlo Simulation

Sampling and Monte-Carlo Integration

CSE123. Program Design and Modular Programming Functions 1-1

Objective 1: To simulate the rolling of a die 100 times and to build a probability distribution.

Lecture 6: Chapter 6 Summary

hp calculators HP 9g Probability Random Numbers Random Numbers Simulation Practice Using Random Numbers for Simulations

Transcription:

Section 4.2: Discrete Data Histograms Discrete-Event Simulation: A First Course c 2006 Pearson Ed., Inc. 0-13-142917-5 Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 1/ 21

Section 4.2: Discrete-Data Histograms Given a discrete-data sample multiset S = {x 1,x 2,...,x n } with possible values X, the relative frequency is ˆf (x) = the number of x i S with x i = x n A discrete-data histogram is a graphical display of ˆf (x) versus x If n = S is large relative to X then values will appear multiple times Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 2/ 21

Example 4.2.1 Program galileo was used to replicate n = 1000 rolls of three dice Discrete-data sample is S = {x 1,x 2,...,x 1000 } Each x i is an integer between 3 and 18, so X = {3,4,...,18} Theoretical probabilities: s (limit as n ) ˆf(x) 0.15 0.10 0.05 0.00 3 8 13 18 x 3 8 13 18 x Since X is known a priori, use an array Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 3/ 21

Example 4.2.2 Suppose 2n = 2000 balls are placed at random into n = 1000 boxes: Example 4.2.2 n = 1000; for (i = 1; i<= n; i++) { /* i counts boxes */ x i = 0; } for (j = 1; j<= 2*n; j++) { /* j counts balls */ i = Equilikely(1, n); /* pick a box at random */ x i ++; /* then put a ball in it */ } return x 1, x 2,...,x n ; Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 4/ 21

Example 4.2.2 S = {x 1,x 2,...,x n }, x i is the number of balls placed in box i x = 2.0 Some boxes will be empty, some will have one ball, some will have two balls, etc. For seed 12345: 0.3 ˆf(x) 0.2 0.1 0.0 0 1 2 3 4 5 6 7 8 9 x 0 1 2 3 4 5 6 7 8 9 x Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 5/ 21

Histogram Mean and Standard Deviation The discrete-data histogram mean is x = x X xˆf (x) The discrete-data histogram standard deviation is ( ) s = (x x) 2ˆf (x) or s = x 2ˆf (x) x 2 x x The discrete-data histogram variance is s 2 Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 6/ 21

Histogram Mean versus Sample Mean By definition, ˆf (x) 0 for all x X and ˆf (x) = 1 From the definition of S and X, x n x i = x xnˆf (x) and n (x x) 2 nˆf (x) i=1 i=1(x i x) 2 = x The sample mean/standard deviation is mathematically equivalent to the discrete-data histogram mean/standard deviation If frequencies ˆf ( ) have already been computed, x and s should be computed using discrete-data histogram equations Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 7/ 21

Example 4.2.3 For the data in Example 4.2.1 (three dice): v 18 X x = xˆf (x) ux = 10.609 and s = t 18 (x x) 2ˆf (x) = 2.925 x=3 x=3 For the data in Example 4.2.2 (balls placed in boxes) v 9X ux x = xˆf (x) = 2.0 and s = t 9 (x x) 2ˆf (x) = 1.419 x=0 x=0 Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 8/ 21

Algorithm 4.2.1 Given integers a,b and integer-valued data x 1,x 2,... the following computes a discrete-data histogram: Algorithm 4.2.1 long count[b-a+1]; n = 0; for (x = a; x <= b; x++) count[x-a] = 0; outliers.lo = 0; outliers.hi = 0; while ( more data ) { x = GetData(); n++; if ((a <= x) and (x <= b)) count[x-a]++; else if (a > x) outliers.lo++; else outliers.hi++; } return n, count[], outliers /* ˆf (x) is (count[x-a] / n) */ Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 9/ 21

Outliers Algorithm 4.2.1 allows for outliers Occasional x i outside the range a x i b Necessary due to array structure Outliers are common with some experimentally measured data Generally, valid discrete-event simulations should not produce any outlier data Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 10/ 21

General-Purpose Discrete-Data Histogram Algorithm Algorithm 4.2.1 is not a good general purpose algorithm: For integer-valued data, a and b must be chosen properly, or else Outliers may be produced without justification The count array may needlessly require excessive memory For data that is not integer-valued, algorithm 4.2.1 is not applicable We will use a linked-list histogram Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 11/ 21

Linked-List Discrete-Data Histograms head value count next value count next...... value count next Algorithm 4.2.2 Initialize the first list node, where value = x 1 and count = 1 For all sample data x i, i = 2,3,...,n Traverse the list to find a node with value == x i If found, increase corresponding count by one Else add a new node, with value = x i and count = 1 Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 12/ 21

Example 4.2.4 Discrete data sample S = {3.2, 3.7, 3.7, 2.9, 3.7, 3.2, 3.7, 3.2} Algorithm 4.2.2 generates the linked list: head 3.2 3 next 3.7 4 next...... 2.9 1 next Node order is determined by data order in the sample Alternatives: Use count to maintain the list in order of decreasing frequency Use value to sort the list by data value Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 13/ 21

Program ddh Generates a discrete-data histogram, based on algorithm 4.2.2 and the linked-list data structure Valid for integer-valued and real-valued input No outlier checks No restriction on sample size Supports file redirection Assumes X is small, a few hundred or less Requires O( X ) computation per sample value Should not be used on continuous samples (see Section 4.3) Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 14/ 21

Example 4.2.5 Construct a histogram of the inventory level (prior to review) for the simple inventory system Remove summary statistics from sis2 Print the inventory within the while loop in main: Printing Inventory. index++; printf( % ld \ n, inventory); /* This line is new */ if (inventory < MINIMUM) {. If the new executable is sis2mod, the command sis2mod ddh > sis2.out will produce a discrete-data histogram file sis2.out Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 15/ 21

Example 4.2.6 Using sis2mod to generate 10 000 weeks of sample data, the inventory level histogram from sis2.out can be constructed ˆf(x) 0.020 0.015 0.010 0.005 0.000 x ± 2s 30 20 10 0 10 20 30 40 50 60 70 x x denotes the inventory level prior to review x = 27.63 and s = 23.98 About 98.5% of the data falls within x ± 2s Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 16/ 21

Example 4.2.7 Change the demand to Equilikely(5, 25)+Equilikely(5, 25) in sis2mod and construct the new inventory level histogram ˆf(x) 0.020 0.015 0.010 0.005 0.000 x ± 2s 30 20 10 0 10 20 30 40 50 60 70 x More tapered at the extreme values x = 27.29 and s = 22.59 About 98.5% of the data falls within x ± 2s Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 17/ 21

Example 4.2.8 Monte Carlo simulation was used to generate 1000 point estimates of the probability of winning the dice game craps Sample is S = {p 1,p 2,...,p 1000 } with p i = # wins in N plays N X = {0/N,1/N,...,N/N} Compare N = 25 and N = 100 plays per estimate X is small, can use discrete-data histograms Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 18/ 21

Histograms of Probability of Winning Craps 0.20 0.15 ˆf(p) 0.10 0.05 0.00 0.10 N = 25 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 p ˆf(p) 0.05 N = 100 0.00 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 p N = 25: p = 0.494 s = 0.102 N = 100: p = 0.492 s = 0.048 Four-fold increase in number of replications produces two-fold reduction in uncertainty Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 19/ 21

Empirical Cumulative Distribution Functions Histograms estimate distributions Sometimes, the cumulative version is preferred: Useful for quantiles ˆF(x) = the number of x i S with x i x n Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 20/ 21

Example 4.2.9 The empirical cumulative distribution function for N = 100 from Example 4.2.8 (in four different styles): ˆF(p) 1.0 0.5 0.0 0.3 0.4 0.5 0.6 0.7 p ˆF(p) 1.0 0.5 0.0 0.3 0.4 0.5 0.6 0.7 p 1.0 1.0 ˆF(p) 0.5 ˆF(p) 0.5 0.0 0.3 0.4 0.5 0.6 0.7 p 0.0 0.3 0.4 0.5 0.6 0.7 p Discrete-Event Simulation: A First Course Section 4.2: Discrete Data Histograms 21/ 21