Physics 736. Experimental Methods in Nuclear-, Particle-, and Astrophysics. - Statistical Methods -


Physics 736 Experimental Methods in Nuclear-, Particle-, and Astrophysics - Statistical Methods - Karsten Heeger heeger@wisc.edu

Course Schedule and Reading
course website: http://neutrino.physics.wisc.edu/teaching/phys736/
This week:
- final homework #8 assigned this week, due April 13, 2011
- define and outline course projects this week: title, abstract, plans for course project work (what do you plan to calculate?)

Statistics & Numerical Methods Topics Monte Carlo techniques random numbers review maximum likelihood least square method hypothesis testing goodness of fit

Course Projects
Define your course project this week. Are you interested in detectors? Data analysis and statistical techniques? Numerical methods?
Goals for the course project:
- address a physics problem and provide the physics background (e.g. Higgs search, DAMA)
- apply one or more of the topics from this course (detectors, background, statistics)
Timeline:
- choose topic by April 4, 2011
- LaTeX outline by April 8, 2011: title + abstract, bullet points of what you are going to do, some literature references
See instructions on the course website. Please email and contact me with any questions!

Review of Homework Karsten Heeger, Univ. of Wisconsin NUSS, July 13, 2009

Statistical Distributions: binomial, Poisson, Gaussian, chi-square

Statistical Distributions
Gaussian: $P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}$
Binomial: $P(r) = \frac{N!}{r!(N-r)!}\, p^r (1-p)^{N-r}$
Poisson: $P(r) = \frac{\mu^r e^{-\mu}}{r!}$
Chi-square: $P(u)\,du = \frac{(u/2)^{\nu/2-1}\, e^{-u/2}}{2\,\Gamma(\nu/2)}\,du$

Probability and Statistics Poisson distribution cumulative distribution

Error Propagation for linearly independent variables

Error Propagation

Standard Deviation of the Mean

Central Limit Theorem
If the $x_i$ are a set of $n$ independent variables of mean $\mu$ and variance $\sigma^2$, then for large $n$, $y = \frac{1}{n}\sum_i x_i$ will tend to a Gaussian with mean $\mu$ and variance $\sigma^2/n$. This is true even if the $x_i$ come from distributions with different means $\mu_i$ and variances $\sigma_i^2$; in this case mean $= \frac{1}{n}\sum_i \mu_i$ and variance $= \frac{1}{n^2}\sum_i \sigma_i^2$.
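A minimal Python sketch (not from the lecture) that checks the theorem numerically with uniform deviates on [0, 1] (mean 1/2, variance 1/12); the sample sizes and seed are arbitrary choices.

```python
# Monte Carlo check of the central limit theorem with uniform deviates.
import numpy as np

rng = np.random.default_rng(42)
n = 12                      # number of variables averaged per trial
trials = 100_000

# y = (1/n) * sum(x_i); CLT predicts mean 1/2 and variance (1/12)/n
y = rng.random((trials, n)).mean(axis=1)

print(f"mean     = {y.mean():.4f}   (expected {0.5:.4f})")
print(f"variance = {y.var():.6f} (expected {1/12/n:.6f})")
```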

Random Numbers Monte Carlo Techniques Karsten Heeger, Univ. of Wisconsin NUSS, July 13, 2009

Random Numbers & Monte Carlo Techniques Who has used a Monte Carlo before? Who has written a Monte Carlo before? what are elements of Monte Carlo?

Random Numbers & Monte Carlo Techniques Monte Carlo (MC) refers to any procedure that makes use of random numbers MC methods are used in simulation of natural phenomena simulation of experimental apparatus numerical analysis (e.g. integration of many variables)

Random Numbers & Monte Carlo Techniques Simulating Data or Experiment

Random Numbers & Monte Carlo Techniques Simulating Radioactive Decay

Random Numbers & Monte Carlo Techniques Estimating the Area of a Circle 1.00 0.20 4.20 ** * ** ++?+ ++ + + ++ 't-.r o o F o z -o.60-1.00-1.00 4.60 4.20 0.20 0.60 1.00 x 2.s 2.7 3.0 3.2 3.5 3.7 Area of circle hits from 100 pairs of random numbers uniformly distributed between -1 and +1 # of hits inside circle give area estimate circle area estimates obtained from 100 MC runs, each with 100 pairs of random numbers. Gaussian curve based on mean and standard deviation of 100 estimated areas.

Random Numbers & Monte Carlo Techniques Random Numbers What is a random number? Is 3 a random number?

Random Numbers & Monte Carlo Techniques Random Numbers What is a random number? Is 3 a random number? No such thing as a single random number. A sequence of random numbers or a set of numbers that have nothing to do with the other numbers in the sequence. In a uniform distribution of random numbers in the range of [0,1] every number has the sam chance of turning up. 00001 is as likely as 0.5.

Random Numbers & Monte Carlo Techniques How to Generate Random Numbers chaotic system e.g. lottery random process radioactive decay thermal noise cosmic ray arrival random number tables computer code

Random Numbers & Monte Carlo Techniques Random Number Tables

Random Numbers & Monte Carlo Techniques How to Generate Random Numbers all algorithms produce a periodic sequence of numbers sequence of numbers in a uniform distribution in the range [0,1] algorithms generate integers between 0 and M and return a real value x n = I n /M to obtain effectively random values, use small subset of a single period e.g. Mersenne twister algorithm = long period 2 19937 1

Random Numbers & Monte Carlo Techniques How to Generate Random Numbers Middle Square, Von Neumann, 1946 generate a sequence of 10 digit integers, start with one, square it, and then take the middle 10 digits from answer as next number in sequence sequence is not random since each number is completely determined from previous one. but it appears random. a more complex algorithm does not lead to a better random sequence. it is better to use an algorithm that is well understood.

Random Numbers & Monte Carlo Techniques RANDU from IBM in 1960s RANDU 2D I n+1 =(65539 I n )mod2 31 RANDU 3D

Random Numbers & Monte Carlo Techniques How to Generate Random Numbers not all random number generators are good! For example, in ROOT TRandom3 recommended by ROOT TRandom too short of a period For example, in Numerical Recipes authors have admitted that RAN1 and RAN2 in first edition are mediocre generators ran0, ran1, ran2 are much better in second edition

Random Numbers & Monte Carlo Techniques How to Improve Generators improve behavior and increase period ny modifying algorithms I n =(a I n 1 + b I n 2 )mod m this has 2 initial seeds and can have a period greater than m RABMAR generator in CERNLIB requires 103 seeds. the ultimate random number generator.

Random Numbers & Monte Carlo Techniques Simulating Distributions so far we have only considered random number in [0,1] more complicated problems generally require random numbers generated according to specific distributions we can generate random numbers according to certain distributions (e.g Poisson for radioactive decay) Goal: obtain a random deviate x from any probability density distribution function f(x) can use special purpose algorithms. use numerical libraries and routines. we will discuss 2 techniques here...

Acceptance/Rejection Method (von Neumann)
Problem: generate a series of random numbers $x_i$ which follow a distribution f(x).
Method: choose a trial value $x_\mathrm{trial}$ and accept it with probability proportional to $f(x_\mathrm{trial})$. The trial x is chosen with a random number $\lambda_1$: $x_\mathrm{trial} = x_\mathrm{min} + (x_\mathrm{max} - x_\mathrm{min})\,\lambda_1$. Random points are chosen inside the box and rejected if the ordinate exceeds f(x).

Acceptance/Rejection Method (von Neumann)
Random points are chosen inside the box and rejected if the ordinate exceeds f(x). A tighter bounding region is a way to increase efficiency: the efficiency of the method is the ratio of the areas, so keep the envelope $C\,h(x)$ as close as possible to f(x). The method is applicable if f(x) is too complex for other techniques but f(x) can be computed. Beware of normalization.

Acceptance/Rejection Method (von Neumann)
The rejection algorithm is not efficient if the distribution has one or more large peaks (or poles); in this case trial events are seldom accepted. The algorithm does not work when the range of x is $[-\infty, +\infty]$.
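A minimal Python sketch of the basic accept/reject loop with a flat bounding box; the target density and its bounds are illustrative assumptions, not from the lecture.

```python
# Acceptance/rejection sampling of f(x) on [x_min, x_max] using a flat
# bounding box of height f_max (f need not be normalized).
import math
import random

def rejection_sample(f, x_min, x_max, f_max):
    while True:
        x_trial = x_min + (x_max - x_min) * random.random()  # lambda_1
        if random.random() * f_max <= f(x_trial):            # lambda_2
            return x_trial

# example target: f(x) = x^2 exp(-x) on [0, 10], which peaks at x = 2
f = lambda x: x**2 * math.exp(-x)
samples = [rejection_sample(f, 0.0, 10.0, f(2.0)) for _ in range(1000)]
print(f"sample mean = {sum(samples)/len(samples):.3f} (approx. 3 for Gamma(3))")
```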

Inverse Transform Method
Applicable for simple distribution functions.
Method: the probability density function is f(x) on $[-\infty, +\infty]$; the integrated probability up to a point a is $F(a) = P(x \le a)$. F is itself a random variable, which occurs with uniform probability density on [0,1]. We can find a unique x for a given u from $u = F(x)$, provided we can find the inverse $x = F^{-1}(u)$.

Inverse Transform Method Use of a random number u chosen from a uniform distribution [0,1] to find a random number x from a distribution with cumulative distribution function F(x) PDG

Inverse Transform Method: Practical Method
1. Normalize the distribution function so that it becomes a probability density function (PDF).
2. Integrate the PDF from $x_\mathrm{min}$ to an arbitrary x. This is the probability of choosing a value less than x.
3. Equate this to a uniform random number and solve for x. The resulting x will be distributed according to the PDF.
In other words, solve the following equation for x, given a uniform random number $\lambda$:
$\frac{\int_{x_\mathrm{min}}^{x} f(x')\,dx'}{\int_{x_\mathrm{min}}^{x_\mathrm{max}} f(x')\,dx'} = \lambda$

Inverse Transform Method
Convenient when you can calculate the inverse function, e.g. for $\exp(x)$, $(1-x)^n$, $1/(1+x^2)$. There are some packages that do this for you, e.g. UNU.RAN in ROOT.
Examples: generate x between 0 and 4 according to $f(x) = x^{0.5}$; generate x between 0 and $\infty$ according to $f(x) = e^{-x}$.
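A minimal Python sketch of the two examples, assuming the distributions as stated above; each inverse is worked out in the comments.

```python
# Inverse transform sampling for the two example distributions.
import math
import random

def sample_sqrt():
    # f(x) = x^0.5 on [0, 4]: F(x) = x^1.5 / 8, so x = (8*lambda)^(2/3)
    return (8.0 * random.random()) ** (2.0 / 3.0)

def sample_exponential():
    # f(x) = exp(-x) on [0, inf): F(x) = 1 - exp(-x), so x = -ln(1 - lambda)
    return -math.log(1.0 - random.random())

print(sample_sqrt(), sample_exponential())
```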

Random Numbers & Monte Carlo Techniques What if rejection technique is impractical and you cannot invert the integral of the distribution function? Replace the distribution function f(x) by an approximate form f* (x)for which the inversion technique can be applied Generate trial values for x with inversion technique according to f*(x), and accept trial value with probability proportional to weight w = f(x)/f (x) f (x) rejection technique = special case where f*(x) is constant

Random Numbers & Monte Carlo Techniques Multidimensional Simulation (simulating a distribution in more than one dimension) if distribution is separable then variables are uncorrelated, each can be generated as before f(x, y) =g(x)h(y) generate x according to g(x) and y according to h(y) otherwise, distribution along each dimension needs to be calculated ymax D x (x) = f(x, y)dy y min f (x, y)dx find approximate distribution so that f (x, y)dy are invertible weights for trial events are given by w = f(x, y) f (x, y)

Monte Carlo Numbering Scheme To facilitate interfacing between event generators, detector simulators, and analysis packages used in particle physics

Maximum Likelihood Method of Least Squares Karsten Heeger, Univ. of Wisconsin NUSS, July 13, 2009

Maximum Likelihood Estimation
A general method of parameter estimation when the functional form of the parent distribution is known. For large samples the ML estimators are normally distributed, and hence the variances are easy to determine; for small samples, ML estimators still possess most of the desirable properties. Given n measurements $x_i$ of a quantity with probability density function $f(x|\theta)$: $L(x_1, x_2, \ldots | \theta) = \prod_i f(x_i|\theta)$. The estimate $\hat\theta$ is the value which maximizes L.

Maximum Likelihood Estimation
$L(x_1, x_2, \ldots | \theta) = \prod_i f(x_i|\theta)$. Since L and ln L attain their maximum values at the same point, one usually uses ln L, since sums are easier to work with than products: $\ln L = \sum_i \ln f(x_i|\theta)$. Normally the point of maximum likelihood is found numerically.
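A minimal Python sketch (not from the lecture) of a numerical ML fit of an exponential lifetime; the analytic answer (the sample mean) provides a cross-check. Data, true lifetime, and bounds are illustrative assumptions.

```python
# Unbinned ML fit: minimize -lnL for f(t|tau) = exp(-t/tau)/tau.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
t = rng.exponential(2.0, size=500)        # pseudo-data, true tau = 2

def neg_log_likelihood(tau):
    # lnL = -n*ln(tau) - sum(t)/tau, so -lnL is as below
    return len(t) * np.log(tau) + t.sum() / tau

res = minimize_scalar(neg_log_likelihood, bounds=(0.1, 10.0), method="bounded")
print(f"tau_hat = {res.x:.3f}  (analytic MLE = sample mean = {t.mean():.3f})")
```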

Maximum Likelihood Estimation
Properties of ML estimators:
- invariant under parameter transformation
- consistent: estimators converge on the true parameter
- unbiased, though sometimes biased for finite samples: $\hat\theta$ may be unbiased while a function $u(\hat\theta)$ may be biased
- efficient
- if a sufficient estimator exists, the ML method will produce it

Examples of Likelihood Distributions: central values and 1σ intervals
The uncertainty is deduced from the position where ln L is reduced by 1/2: $\ln L(\hat\theta \pm \sigma(\hat\theta)) = \ln L_\mathrm{max} - 0.5$. This even applies for a non-Gaussian likelihood.
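A short Python sketch of the Δln L = −1/2 rule for the same exponential example as above; the bracketing intervals for the root finder are ad hoc choices.

```python
# Scan lnL around tau_hat and find where it drops by 0.5 on each side.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
t = rng.exponential(2.0, size=500)
tau_hat = t.mean()                                  # analytic MLE

def lnL(tau):
    return -len(t) * np.log(tau) - t.sum() / tau

target = lnL(tau_hat) - 0.5
lo = brentq(lambda x: lnL(x) - target, 0.5 * tau_hat, tau_hat)
hi = brentq(lambda x: lnL(x) - target, tau_hat, 2.0 * tau_hat)
print(f"tau = {tau_hat:.3f} +{hi - tau_hat:.3f} -{tau_hat - lo:.3f}")
```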

Examples of Likelihood Distributions: central values and 1σ intervals, asymmetric errors, e.g. $4.0^{+2.5}_{-1.25}$

Likelihood for Two Parameters
Given $L(x|\theta_1, \theta_2)$, plot contours of constant likelihood in the $(\theta_1, \theta_2)$ plane. To find the uncertainty, plot the contour with $\ln L = \ln L_\mathrm{max} - 0.5$ and look at the projection of the contour onto the two axes. With this method the quoted uncertainties are valid regardless of the correlation of the variables.

Likelihood Function and Binned Data: Application of the ML Method to Binned Data
If the sample is very large and $f(x|\theta)$ is complex, computation can be reduced by grouping the sample into bins and writing L as the product of the probabilities of finding $n_i$ entries in each bin: $L(n_1, n_2, \ldots | \theta) = n!\,\prod_i (n_i!)^{-1}\, p_i^{n_i}$, where $p_i$ is the probability for bin i, so that $\ln L = \sum_i n_i \ln p_i(\theta) + \mathrm{const}$. There will be some loss of information by binning the data, but as long as the variation in f across each bin is small there should be no great loss in the precision of $\hat\theta$.

Method of Least Squares
Relates data and model; a frequently used method for parameter estimation, but with no general optimal properties to recommend it. If the parameter dependence is linear, the method of least squares (LS) produces unbiased estimators of minimum variance: $S = \sum_i \left(\frac{y_i - f(x_i|\theta_j)}{\sigma_i}\right)^2$. If the data are Gaussian distributed, then LS is equivalent to the ML method; if in addition the observables are linear functions of the parameters, then S will follow a $\chi^2$ distribution.
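A minimal Python sketch of an LS fit of a straight line, computing S explicitly; the model, uncertainties, and pseudo-data are illustrative assumptions.

```python
# Least squares: minimize S = sum(((y_i - f(x_i; a, b)) / sigma_i)^2)
# for a straight-line model f(x) = a + b*x.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 20)
sigma = np.full_like(x, 0.5)
y = 1.0 + 2.0 * x + rng.normal(0, sigma)            # pseudo-data

model = lambda x, a, b: a + b * x
popt, pcov = curve_fit(model, x, y, sigma=sigma, absolute_sigma=True)
S = np.sum(((y - model(x, *popt)) / sigma) ** 2)
print(f"a = {popt[0]:.2f}, b = {popt[1]:.2f}, S = {S:.1f} for {len(x)-2} d.o.f.")
```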

Method of Least Squares: Degrees of Freedom
If the data are Gaussian distributed, then S follows a $\chi^2$ distribution with N degrees of freedom: N data points give N degrees of freedom in general; N data points fit with m parameters of a linear model give N − m degrees of freedom.

Data with Error Bars: for ±1σ error bars, about 1/3 of the data points should lie outside the fit.

Method of Least Squares Degrees of Freedom

Method of Least Squares: Application of the LS Method to Binned Data
If the data are split into N bins, with $n_i$ entries in bin i, and $p_i(\theta)$ is the probability for an event to populate bin i, then the expected number of events in each bin is $f_i = n\, p_i$, where $n = \sum_{i=1}^{N} n_i$. If the number of bins is large enough, the error matrix is diagonal and the LS method reduces to minimizing $X^2 = \sum_{i=1}^{N} (n_i - f_i)^2/\sigma_i^2 = \sum_{i=1}^{N} (n_i - f_i)^2/f_i$ (taking $\sigma_i^2 = f_i$), which can be done numerically.
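A short Python sketch of this binned $X^2$ with $\sigma_i^2 \approx f_i$, using an assumed exponential model and equal-width bins.

```python
# Binned chi^2: X^2 = sum((n_i - f_i)^2 / f_i) over N bins.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
data = rng.exponential(1.0, size=1000)
edges = np.linspace(0, 5, 21)                       # 20 equal-width bins
n_i, _ = np.histogram(data, bins=edges)

# expected counts from the model, normalized to the observed total
p_i = np.exp(-edges[:-1]) - np.exp(-edges[1:])      # bin probabilities
f_i = n_i.sum() * p_i / p_i.sum()

X2 = np.sum((n_i - f_i) ** 2 / f_i)
ndof = len(n_i) - 1              # one d.o.f. lost to the normalization
print(f"X^2 = {X2:.1f} for {ndof} d.o.f., p = {chi2.sf(X2, ndof):.3f}")
```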

Method of Least Squares: Application of the LS Method to Binned Data
Sometimes $\sigma_i^2$ is approximated by $n_i$ instead of $f_i$, but the estimates found this way are more sensitive to statistical fluctuations. (For large sample sizes the two choices give the same result.) Since 1 degree of freedom has been lost due to the normalisation condition $\sum_i n_i = n$, $X^2_\mathrm{min}$ will follow $f(\chi^2; N - 1 - L)$ if the model contains L independent parameters.

Method of Least Squares: Binned Data
Two common choices of binning: equal width, or equal probability. One must not choose the binning so as to make S as small as possible; in that case it would no longer follow a $\chi^2$ distribution. It is necessary to have several entries per bin to approximate Gaussian statistics (e.g. more than 5 entries).

Method of Least Squares Goodness of Fit Least squares is a measure of the agreement between the fitted quantities and the measurements.

Method of Least Squares: the $\chi^2$ Distribution
[Figures: probability density function and cumulative distribution function of the $\chi^2$ distribution.]

Method of Least Squares: Example of a $\chi^2$ Test
[Figure: histogram of a simulated data sample in 20 bins over [0, 2], with the fitted curve overlaid.] A simulated data sample is shown with a distribution function that was not used to generate the data. There are 20 bins. The distribution function was normalized to match the number of events in the data sample. Does the model fit the data? The value of $\chi^2$ for this distribution is 25.2 for 19 d.o.f., resulting in $P_{\chi^2} = 0.16$.

Goodness of Fit Tests: Example of a $\chi^2$ Test
[Same figure and numbers as above: $\chi^2 = 25.2$ for 19 d.o.f., $P_{\chi^2} = 0.16$.] BUT... does this look right to you? We will get back to this question.
