What We'll Do...
- Random-number generation
- Generating random variates
- Nonstationary Poisson processes
- Variance reduction
- Sequential sampling
- Designing and executing simulation experiments

Random-Number Generators (RNGs)
- Algorithm to generate independent, identically distributed draws from the continuous UNIF(0, 1) distribution
- These are called random numbers in simulation
- Basis for generating observations from all other distributions and random processes: transform random numbers in a way that depends on the desired distribution or process
- It's essential to have a good RNG; there are a lot of bad RNGs — this is very tricky, and both the methods and the coding are tricky

Properties of a Good RNG
- Numbers produced should appear to be distributed uniformly on [0, 1] and should not exhibit any correlation with each other
- Generator should be fast
- Should be able to reproduce a given stream of random numbers: for debugging, and to use identical random numbers in simulating different systems in order to obtain a more precise comparison
- Should be a provision in the generator to easily reproduce separate streams of random numbers
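The reproducibility and streams properties can be illustrated with Python's built-in `random` module as a stand-in for a simulation RNG (the module itself is not part of the source; it is just a convenient example of seeded, independent generator instances):

```python
# Reproducibility and separate streams, sketched with Python's random module.
import random

a = random.Random(42)   # fixed seed -> a reproducible stream
b = random.Random(42)   # same seed  -> the identical stream (for debugging, CRN)
c = random.Random(43)   # different seed -> a separate stream

print([a.random() for _ in range(3)])
print([b.random() for _ in range(3)])   # same three numbers as the first line
print([c.random() for _ in range(3)])   # a different stream
```

Dedicated generator instances like these are the usual way to give each source of randomness in a model its own stream.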
The Nature of RNGs
- Recursive formula (algorithm): start with a seed (or seed vector), do something to the seed to get the next value, repeat
- Starting from the same seed generates the same sequence
- Will eventually repeat (cycle) — want a long cycle length
- Not really random in the sense of unpredictable
- Want to design RNGs with: long cycle length; good statistical properties (uniformity, independence — verified by tests); speed; many long streams (subsegments, for variance reduction later)

Linear Congruential Generators (LCGs)
- The most common of several different methods
- Generate a sequence of integers Z_1, Z_2, Z_3, ... via the recursion
      Z_i = (a Z_{i-1} + c) mod m
- If c > 0: mixed LCG; if c = 0: multiplicative LCG
- a, c, and m are carefully chosen constants
- Specify a seed Z_0 to start off
- "mod m" means take the remainder of dividing by m as the next Z_i
- All the Z_i's are between 0 and m - 1
- Return the i-th random number as U_i = Z_i / m
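The recursion above is a one-liner in code. A minimal sketch (parameter names mirror the text: a, c, m, and seed Z_0; this is for illustration, not a production generator):

```python
# Minimal LCG sketch: Z_i = (a*Z_{i-1} + c) mod m, U_i = Z_i / m.
def lcg(a, c, m, seed):
    """Yield U_1, U_2, ... from the LCG recursion, starting from seed Z_0."""
    z = seed
    while True:
        z = (a * z + c) % m   # next integer state
        yield z / m           # return it scaled into [0, 1)
```

For example, with a = 22, c = 4, m = 63, and seed Z_0 = 19, the first draw is (22*19 + 4) mod 63 = 422 mod 63 = 44, so U_1 = 44/63 ≈ 0.6984.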
Example of an LCG
- Parameters m = 63, a = 22, c = 4, Z_0 = 19: Z_i = (22 Z_{i-1} + 4) mod 63, seeded with Z_0 = 19
- First step: Z_1 = (22*19 + 4) mod 63 = (418 + 4) mod 63 = 422 mod 63 = 44, so U_1 = 44/63 = 0.6984

      i    22 Z_{i-1} + 4    Z_i    U_i
      0         -             19     -
      1        422            44    0.6984
      2        972            27    0.4286
      3        598            31    0.4921
      4        686            56    0.8889
      :         :              :      :
      61       158            32    0.5079
      62       708            15    0.2381
      63       334            19    0.3016
      64       422            44    0.6984
      65       972            27    0.4286
      66       598            31    0.4921
      :         :              :      :

- Cycling: the sequence will repeat forever; cycle length is at most m (could be much less than m, depending on the parameters)
- Pick m BIG — but that alone might not be enough for good statistical properties

Another Example of an LCG
- m = 16, a = 5, c = 3, Z_0 = 7: Z_i = (5 Z_{i-1} + 3) mod 16, seeded with Z_0 = 7

      i      0      1      2      3      4
      Z_i    7      6      1      8      11
      U_i    -    0.375  0.063  0.500  0.688

Issues with LCGs
- Cycle length: at most m; typically m = 2.1 billion (= 2^31) or more — which used to be a lot, but not anymore; the other parameters are chosen so that the cycle length is m or m - 1
- Statistical properties: uniformity, independence; there are many tests of RNGs — empirical tests and theoretical tests (lattice structure, next slide)
- Speed, storage: both are usually fine, but the generator must be carefully, cleverly coded (BIG integers)
- Reproducibility: streams (long internal subsequences) with fixed seeds

Issues with LCGs (cont'd.)
- Regularity of LCGs (and other kinds of RNGs): for the earlier LCG, plot U_i vs. i, and plot U_{i+1} vs. U_i (successive pairs)
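The two examples above are small enough to reproduce directly (reconstructing the moduli as m = 16 and m = 63, consistent with the tabulated values):

```python
# First example: Z_i = (5*Z_{i-1} + 3) mod 16, Z_0 = 7.
z, zs = 7, [7]
for _ in range(4):
    z = (5 * z + 3) % 16
    zs.append(z)
print(zs)   # Z_0..Z_4 = [7, 6, 1, 8, 11]; U_i = 0.375, 0.0625, 0.5, 0.6875

# Second example: cycle length of Z_i = (22*Z_{i-1} + 4) mod 63 from Z_0 = 19,
# found by stepping until the seed recurs.
z, n = 19, 0
while True:
    z = (22 * z + 4) % 63
    n += 1
    if z == 19:
        break
print(n)    # 63 -- this (a, c, m) achieves the full period m
```

The second loop confirms the table: the sequence returns to 19 at step 63 and then repeats, so the cycle length equals m here.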
"Random numbers fall mainly in the planes" — George Marsaglia
- Design RNGs so that the lattice is dense in high dimensions
- Other kinds of RNGs: longer memory in the recursion, combinations of several RNGs

The Current Arena RNG
- Uses some of the same ideas as LCGs — modulo division, recursion on earlier values — but is not an LCG
- Combines two separate component generators; the recursion involves more than just the preceding value
- Combined multiple recursive generator (CMRG):
      A_n = (1403580 A_{n-2} - 810728 A_{n-3}) mod 4294967087
      B_n = (527612 B_{n-1} - 1370589 B_{n-3}) mod 4294944443
      Z_n = (A_n - B_n) mod 4294967087      (combines the two)
      U_n = Z_n / 4294967088 if Z_n > 0;  U_n = 4294967087 / 4294967088 if Z_n = 0
- Seed = a six-vector: the first three A_n's and the first three B_n's (two simultaneous recursions)

The Current Arena RNG: Properties
- Extremely good statistical properties; good uniformity
- Cycle length ≈ 3.1 × 10^57 — to cycle, all six seed values must match up
- Only slightly slower than the old LCG, and the RNG is usually a minor part of overall computing time
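The two recursions above translate almost directly into code. A sketch follows; the constants are those given in the text (they match L'Ecuyer's MRG32k3a), while the seeding convention (a list of the three most recent values per component) is an illustrative choice, not the Arena implementation:

```python
# Sketch of the combined multiple recursive generator (CMRG) described above.
M1 = 4294967087   # modulus for the A recursion (and the combination)
M2 = 4294944443   # modulus for the B recursion

def cmrg(seed_a, seed_b):
    """seed_a = [A_{n-3}, A_{n-2}, A_{n-1}], seed_b likewise; yields U_1, U_2, ..."""
    a, b = list(seed_a), list(seed_b)
    while True:
        new_a = (1403580 * a[1] - 810728 * a[0]) % M1    # uses A_{n-2}, A_{n-3}
        new_b = (527612 * b[2] - 1370589 * b[0]) % M2    # uses B_{n-1}, B_{n-3}
        a = [a[1], a[2], new_a]                          # shift the A state
        b = [b[1], b[2], new_b]                          # shift the B state
        z = (new_a - new_b) % M1                         # combine the two
        yield z / (M1 + 1) if z > 0 else M1 / (M1 + 1)
```

As with any RNG, the same six-vector seed reproduces the same stream exactly.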
Other RNGs

Mixed Generators
- Choose m = 2^b, where b is the number of value bits in the computer word; a 32-bit system has b = 31 (the leftmost bit is the sign bit)
- That gives m = 2^31 > 2.1 billion

Multiplicative Generators
- Addition of c is not needed
- Instead of m = 2^b, let m be a large prime number < 2^31

Combined LCGs
- Combine 2 or more multiplicative LCGs so as to have good statistical properties and a longer period

Tests for Random Numbers
- Need to test uniformity and independence
- Frequency test: Kolmogorov-Smirnov (K-S) test or chi-square test to compare the distribution of the set of numbers generated to a uniform distribution
- Runs test: uses a chi-square test to compare the runs above and below the mean, comparing actual counts with expected counts
- Autocorrelation test: tests the correlation between numbers, comparing the sample correlation with the expected correlation of zero
- Gap test: counts the number of digits that appear between repetitions of a particular digit, then uses a K-S test to compare with the expected gap sizes

Generating Random Variates

Generating from Discrete Distributions
- How to transform a uniform distribution between 0 and 1 into draws from the input probability distributions you want for your model?
- Have: desired input distribution for the model (fitted or specified in some way), and an RNG (UNIF(0, 1))
- Want: transform UNIF(0, 1) random numbers into draws from the desired input distribution
- Method: mathematical transformations of the random numbers to deform them to the desired distribution; the specific transform depends on the desired distribution
- Treat discrete and continuous distributions separately

Example: Discrete Distribution
- Given a probability mass function, divide [0, 1] into subintervals of length 0.1, 0.5, 0.4
- Generate U ~ UNIF(0, 1), see which subinterval it's in, and return X = the corresponding value

Discrete Generation: Another View — Inverting the CDF
- Plot the cumulative distribution function; generate U and plot it on the vertical axis; read across and down
- Equivalent to the earlier method

Example
- The discrete RV X has the following probability function:

      k         1      2      3      4      5
      P(X=k)   0.30   0.35   0.20   0.10   0.05

- Use the uniform random number U = 0.79 to generate an observation of X:
      0.00 to 0.30  ->  X = 1
      0.30 to 0.65  ->  X = 2
      0.65 to 0.85  ->  X = 3
- U = 0.79, so X = 3
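Inverting a tabulated CDF like the one above amounts to a running-sum search. A minimal sketch (`discrete_inverse` is a hypothetical helper name, not from the source):

```python
# Discrete inverse-CDF generation: return the value whose cumulative
# probability interval contains u.
def discrete_inverse(u, values, probs):
    cum = 0.0
    for v, p in zip(values, probs):
        cum += p          # running CDF: 0.30, 0.65, 0.85, 0.95, 1.00
        if u <= cum:
            return v
    return values[-1]     # guard against floating-point round-off at u near 1

x = discrete_inverse(0.79, [1, 2, 3, 4, 5], [0.30, 0.35, 0.20, 0.10, 0.05])
print(x)   # 3, matching the example: 0.65 < 0.79 <= 0.85
```

In practice the same idea is used with a sorted lookup (binary search) when the support is large.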
Continuous Distributions: Exponential
- The exponential probability density function (PDF) is

      f(x) = λ e^(-λx)  for x >= 0;    f(x) = 0  for x < 0

- The cumulative distribution function (CDF) is

      F(x) = 1 - e^(-λx)  for x >= 0;    F(x) = 0  for x < 0

- For any i, E(X_i) = 1/λ = mean interarrival time; λ = mean number of occurrences per time unit (arrival rate)
- Goal: develop a procedure for generating values X_1, X_2, ...
- Example: the EXPO(5) distribution (figure: its density (PDF) and distribution (CDF))

General Algorithm: Inverse Transform Technique
1. Generate a random number U ~ UNIF(0, 1)
2. Set U = F(X) and solve for X = F^(-1)(U)
- Solving analytically for X may or may not be simple (or even possible); sometimes a numerical approximation is used

Example (cont'd.): EXPO(5)
- Set U = F(X) = 1 - e^(-X/5)
- e^(-X/5) = 1 - U
- -X/5 = ln(1 - U)
- X = -5 ln(1 - U)
- Picture: invert the CDF, as in the discrete case

Uniform Distribution
- PDF:  f(x) = 1/(b - a)  for a <= x <= b;  0 otherwise
- CDF:  F(x) = 0 for x < a;  (x - a)/(b - a) for a <= x <= b;  1 for x > b
- Set F(X) = (X - a)/(b - a) = U, so X = a + (b - a) U
- X is uniformly distributed on the interval [a, b]

Triangular Distribution
- PDF:  f(x) = x for 0 <= x <= 1;  2 - x for 1 < x <= 2;  0 otherwise
- CDF:  F(x) = 0 for x <= 0;  x^2/2 for 0 < x <= 1;  1 - (2 - x)^2/2 for 1 < x <= 2;  1 for x > 2
- For 0 <= X <= 1:  U = X^2/2, so X = sqrt(2U)
- For 1 < X <= 2:  U = 1 - (2 - X)^2/2, so X = 2 - sqrt(2(1 - U))

Nonstationary Poisson Processes
- Many systems have externally originating events affecting them, e.g., arrivals of customers
- If the process is stationary over time, usually specify a fixed interevent-time distribution
- But the process could vary markedly in its rate: fast-food lunch rush, freeway rush hours
- Ignoring nonstationarity can lead to serious model and output errors
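Before going further with nonstationary processes, the three inverse-transform examples above (exponential, uniform, triangular) can be collected into short sketches. The function names `expo`, `unif`, and `tria01` are hypothetical helpers; `u` is assumed to come from any UNIF(0, 1) generator:

```python
import math

def expo(u, mean):
    # X = -mean * ln(1 - U), where mean = 1/lambda (e.g. mean = 5 for EXPO(5))
    return -mean * math.log(1 - u)

def unif(u, a, b):
    # X = a + (b - a) * U, uniform on [a, b]
    return a + (b - a) * u

def tria01(u):
    # Triangular on [0, 2] with mode 1, as in the CDF above.
    if u <= 0.5:                          # U = X^2/2        =>  X = sqrt(2U)
        return math.sqrt(2 * u)
    return 2 - math.sqrt(2 * (1 - u))     # U = 1-(2-X)^2/2  =>  X = 2 - sqrt(2(1-U))
```

Each function is the algebraic inversion worked out above, so, for instance, tria01(0.5) returns the mode, 1.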
Nonstationary Poisson Processes: Definition
- Usual model: the nonstationary Poisson process, with a rate function λ(t)
- Number of events in [t_1, t_2] ~ Poisson, with mean equal to the integral of λ(t) over [t_1, t_2]
- Issues: how to estimate the rate function? Given an estimate, how to generate during the simulation?

Nonstationary Poisson Processes: Estimating the Rate Function
- Probably the most practical method is piecewise-constant: decide on time intervals within which the rate is fixed, estimate the (constant) rate during each interval from data, and be careful to get the units right
- Other (more complicated) methods exist in the literature

Variance Reduction
- Random input, random output (RIRO) — in other words, the output has variance
- Higher output variance means less precise results; would like to eliminate or reduce output variance
- One (bad) way to eliminate it: replace all input random variables by constants (like their means) — will get rid of random output, but will also invalidate the model
- Thus, the best hope is to reduce output variance
- Easy (brute-force) variance reduction: just simulate some more — terminating: additional replications; steady-state: additional replications or a longer run

Variance Reduction (cont'd.)
- But sometimes variance can be reduced without more runs
- Key: unlike physical experiments, you can control the randomness in computer-simulation experiments by manipulating the RNG
- Re-use the same random numbers: either as they were, in some opposite sense, or for a similar but simpler model
- Induce certain kinds of correlations that can be exploited to reduce the variance of the output
- Several different variance-reduction techniques, classified into categories: common random numbers, antithetic variates, control variates, indirect estimation, ...
- Usually requires a thorough understanding of the model and code
- Will look only at common random numbers in detail

Common Random Numbers (CRN)
- Applies when the objective is to compare two (or more) alternative configurations or models
- Interest is in the difference(s) of performance measure(s) across alternatives
- Example:
      A. Base case (as is)
      B. 3.5% increase in business (interarrival-time mean falls from 3 to 2.56 minutes)
      Same run conditions

The Usual Comparison
- Run case A, make the change to get case B and run it, then compare means via the Output Analyzer: the difference is not statistically significant
- Were the runs of A and B statistically independent? Did we use the same random numbers running A and B?
CRN: Intuition
- Get a sharper comparison if you subject all alternatives to the same conditions; then observed differences are due to model differences rather than random differences in the conditions
- For both the A and B runs, cause: the same parts to arrive at the same times, be assigned the same attributes (job type), and have the same process times at each step
- Then observed differences will be attributable to system differences, not random bounce — there isn't any random bounce

Synchronization of Random Numbers in CRN
- One approach: dedicate a stream of random numbers to each place in the model where variates are needed (an extra parameter in the random-variate calls); fairly simple, but might not ensure complete synchronization
- Another approach: assign to each entity, immediately upon its arrival, attribute values for all possible processing times, branching decisions, etc.; when the entity needs one of these values, read it out of the appropriate attribute instead of generating it on the spot; takes a lot of computer memory

Sequential Sampling
- Always try to quantify the imprecision in results; if the imprecision is small enough, you're done
- If not, need to do something to increase precision — just saw one way: variance-reduction techniques
- Obvious way to increase precision: keep simulating, one more step at a time, and quit when you achieve the desired precision
- Terminating models: step = another replication (cannot extend the length of replications — that's part of the model)
- Steady-state models: step = another replication if using truncated replications, or step = some extension of the run if using batch means
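To close the CRN discussion with something concrete: a toy illustration (not the Arena example itself) in which each "replication" is just the average of 100 exponential process times, with configuration A at mean 3 and configuration B at mean 2.56 minutes, as in the earlier example. All names and sample sizes here are illustrative. Reusing the same uniforms for both configurations within a replication dramatically shrinks the variance of the observed difference, compared with independent random numbers:

```python
# Toy CRN demonstration: same random numbers vs. independent random numbers
# when estimating the A-minus-B difference.
import math, random, statistics

def replicate(mean, us):
    # One replication's "output": average of inverse-transformed exponentials.
    return sum(-mean * math.log(1 - u) for u in us) / len(us)

rng = random.Random(123)
diffs_crn, diffs_indep = [], []
for _ in range(200):
    us = [rng.random() for _ in range(100)]
    vs = [rng.random() for _ in range(100)]          # a fresh, independent set
    diffs_crn.append(replicate(3.0, us) - replicate(2.56, us))    # same numbers
    diffs_indep.append(replicate(3.0, us) - replicate(2.56, vs))  # different numbers

print(statistics.stdev(diffs_crn))     # noticeably smaller spread...
print(statistics.stdev(diffs_indep))   # ...than with independent numbers
```

Both estimators target the same true difference (0.44 minutes here); CRN changes only the variance, which is exactly why the same comparison can become statistically significant without any additional runs.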