Random Variables and Probability Distributions

IE231 - Lecture Notes 5
Mar 14, 2017

Some Preliminary Information

Scales of Measurement

Nominal scale: Categorical values that have no relationship of order or rank among them (e.g. colors, species).

Ordinal scale: Categorical values that have a relationship of order or rank among them (e.g. military ranks, competition results). The relative order has no defined magnitude, though (e.g. the champion can get 40 points, the runner-up 39 and third place 30).

Interval scale: There is a numerical order, but differences can only be defined in intervals, since there is no absolute minimum. Values cannot be compared as ratios. For instance, we cannot say that 10 degrees Celsius is twice as hot as 5 degrees Celsius; what about -5 vs +5?

Ratio scale: A scale with an absolute minimum (e.g. if I have 50 TL and my friend has 100 TL, I can say that she has twice the money that I have). Height, weight and age are similar examples.

See more at https://en.wikipedia.org/wiki/level_of_measurement.

Infinity

The concept of infinity is very broad. For now, you just need to keep the distinction between countable and uncountable infinities in mind.

Countably infinite: 1, 2, 3, 4, ... (e.g. natural numbers, integers, rational numbers)

Uncountably infinite: the values cannot be enumerated in a single list; think of 1, 1.01, 1.001, 1.0001, 1.00001, ... together with every other real number around them (e.g. real numbers). How many real numbers are there between 0 and 1?

Descriptive Statistics

Here are brief descriptions of mean (expectation), median, mode, variance, standard deviation and quantile.

Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Median: Let's say the values $X_k$ are ordered from smallest to largest and there are n values in the sample. Median(X) = $X_{(n+1)/2}$ if n is odd and (usually) Median(X) = $\frac{X_{n/2} + X_{n/2+1}}{2}$ if n is even.

Quantile: On an ordered list of values, the quantile ($\alpha$) provides the ($\alpha n$)-th smallest value of the list. For instance, if $\alpha = 70\% = 0.7$, the quantile value is the 7th smallest value in a list of 10 values. $\alpha = 1$ means the maximum. Quantiles are important parameters, especially in statistics.

Mode: The value $X_k$ with the highest frequency in the sample. In a sample of (1, 2, 2, 3, 4, 5), 2 is the mode.

Variance: $V(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$

Standard Deviation: $\sigma(X) = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$

    set.seed(231)
    # Let's pick 10 values from the numbers between 1 and 50.
    # (The outputs below are as printed in the notes. R >= 3.6.0 changed the
    # default sample() algorithm, so a newer R may draw different values.)
    numbers <- sample(1:50, 10, replace = TRUE)
    # The sorted version of the numbers
    sort(numbers)
    ## [1] 1 9 15 16 16 18 26 31 32 35
    # The mean value of the numbers
    sum(numbers) / 10
    ## [1] 19.9
    # or in R
    mean(numbers)
    ## [1] 19.9
    # Median of the numbers
    median(numbers)
    ## [1] 17
    # Quantile 7/9 of the numbers
    quantile(numbers, 7/9)
    ## 77.77778%
    ## 31
    # Quantile 0 of the numbers (also the min)
    quantile(numbers, 0)
    ## 0%
    ## 1
    # Quantile 1 of the numbers (also the max)
    quantile(numbers, 1)
    ## 100%
    ## 35
    # There is no built-in function for the mode in R; use a frequency table
    freq_table <- table(numbers)
    freq_table
    ## numbers
    ##  1  9 15 16 18 26 31 32 35
    ##  1  1  1  2  1  1  1  1  1
    names(freq_table[which.max(freq_table)])
    ## [1] "16"
    # Sample variance of the numbers
    sum((numbers - mean(numbers))^2) / (10 - 1)
    ## [1] 118.7667
    # For large n you can take n - 1 ~ n
    # in R
    var(numbers)

    ## [1] 118.7667
    # Sample standard deviation of the values
    sqrt(sum((numbers - mean(numbers))^2) / (10 - 1))
    ## [1] 10.89801
    # in R
    sd(numbers)
    ## [1] 10.89801

Random Variables

Random variables are abstractions of uncertain events, so that we can generalize events in formal functions instead of explicitly enumerating the outcomes. For instance, assume X is the number of tails in 2 coin tosses. X can take the values 0, 1 and 2. X is a discrete random variable.

$$P(X = 0) = P(\{(H, H)\}) = 1/4 \tag{1}$$
$$P(X = 1) = P(\{(H, T), (T, H)\}) = 2/4 \tag{2}$$
$$P(X = 2) = P(\{(T, T)\}) = 1/4 \tag{3}$$

There are also continuous random variables. Continuous random variables are usually defined over intervals instead of individual values. For instance, define Y as any real number between 0 and 1, where all values within the interval are equally probable (i.e. uniform distribution).

$$P(Y \le 0.25) = 1/4 \tag{5}$$
$$P(Y \le 0.5) = 2/4 \tag{6}$$
$$P(Y \le 0.75) = 3/4 \tag{7}$$

Fundamental Concepts

There are several fundamental concepts to keep in mind.

Probability Mass Function (pmf): The pmf is the point probability for discrete distributions (i.e. $f(x) = P(X = x)$). For instance, for a single coin toss, $P(X = H) = 1/2$ and $P(X = T) = 1/2$. The point probabilities sum to one: $\sum_{i=1}^{n} f(x_i) = 1$.

Probability Density Function (pdf): The pdf gives interval probabilities for continuous distributions (i.e. $P(a < X < b) = \int_a^b f(x)\,dx$). Since almost all point probabilities in continuous distributions are 0 (due to infinity), probabilities are defined over intervals. The density integrates to one: $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
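The two examples above (tails in two coin tosses, and the uniform variable Y) can be checked numerically. The following is only a sketch; the variable names (`outcomes`, `n_tails`, `pmf_tails`, `p_interval`, `total_area`) are illustrative and not part of the original notes.

```r
# Enumerate the four equally likely outcomes of two coin tosses.
outcomes <- expand.grid(toss1 = c("H", "T"), toss2 = c("H", "T"),
                        stringsAsFactors = FALSE)

# X = number of tails in each outcome
n_tails <- rowSums(outcomes == "T")

# pmf of X: share of outcomes that give each value of X
pmf_tails <- table(n_tails) / nrow(outcomes)
pmf_tails          # 0 -> 0.25, 1 -> 0.50, 2 -> 0.25

# A pmf sums to 1
sum(pmf_tails)

# Y ~ Uniform(0, 1): P(Y <= y) from the built-in cdf
punif(c(0.25, 0.5, 0.75), min = 0, max = 1)

# Interval probability P(0.25 < Y < 0.75) as the integral of the pdf
p_interval <- integrate(dunif, lower = 0.25, upper = 0.75)$value

# The pdf integrates to 1 over the whole interval
total_area <- integrate(dunif, lower = 0, upper = 1)$value
```

Enumeration only works for small discrete sample spaces; for the continuous case, numerical integration of the density plays the role of the sum.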

Figure 1:

Figure 2:

Cumulative Distribution Function (cdf): The cdf is the cumulative probability of all values smaller than or equal to x (i.e. $F(x) = P(X \le x)$). For the coin toss, an example cdf value would be the probability of two or fewer tails ($P(X \le 2)$). The main relationship between the cdf and the pdf is $F(a) = P(X \le a) = \int_{-\infty}^{a} f(x)\,dx$.

Expected Value (E[X]): The expected value of a probability distribution is calculated as follows.

$\mu = E[X] = \sum_{i=1}^{n} x_i f(x_i)$ for discrete distributions.

$\mu = E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$ for continuous distributions.

Example: Calculate the expected value of the number of tails in two coin tosses.

$$E[X] = \sum_{i=1}^{n} x_i f(x_i) = 0 \cdot P(X = 0) + 1 \cdot P(X = 1) + 2 \cdot P(X = 2) \tag{9}$$
$$= 0 \cdot 1/4 + 1 \cdot 1/2 + 2 \cdot 1/4 \tag{10}$$
$$= 1 \tag{11}$$

Variance (V(X)): Variance is calculated as follows.

$V(X) = E[(X - \mu)^2] = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i)$ for discrete distributions.

$V(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$ for continuous distributions.

Variance can also be calculated as $V(X) = E[X^2] - (E[X])^2$.

Some Discrete Distributions

Bernoulli Distribution

It can also be called the single coin toss distribution. For a single event with probability of success p and failure q = 1 - p, the distribution is called Bernoulli.

pmf: $f(x = 0; p) = q$, $f(x = 1; p) = p$

$E[X] = 0 \cdot (1 - p) + 1 \cdot p = p$

$V(X) = pq$

Example: Coin toss, p = 0.5, q = 1 - p = 0.5

pmf: $f(x = 0) = 0.5$, $f(x = 1) = 0.5$

$E[X] = 0 \cdot (1 - 0.5) + 1 \cdot 0.5 = 0.5$

$V(X) = 0.5 \cdot 0.5 = 0.25$

Binomial Distribution

Think of multiple Bernoulli trials (e.g. several coin tosses).

pmf: $f(x; p, n) = \binom{n}{x} p^x q^{(n-x)}$

$E[X] = np$

$V(X) = npq$

cdf: $F(x) = P(X \le x) = \sum_{i=0}^{x} f(i)$

Example: Multiple coin tosses (5 coins, p = 0.5)

pmf: $f(x = 3; n = 5) = \binom{5}{3} (0.5)^3 (1 - 0.5)^{(5-3)} = 0.3125$

    # R way: (d)ensity of the (binom)ial
    dbinom(x = 3, size = 5, prob = 0.5)
    ## [1] 0.3125

$E[X] = 5 \cdot 0.5 = 2.5$

$V(X) = 5 \cdot 0.5 \cdot 0.5 = 1.25$

cdf: $F(3) = P(X \le 3; n = 5) = \sum_{i=0}^{3} f(i) = 0.8125$

    # R way
    pbinom(q = 3, size = 5, prob = 0.5)
    ## [1] 0.8125

Multinomial Distribution

Now suppose there is not one probability (p) but many probabilities ($p_1, p_2, \dots, p_k$).

pmf: $f(x_1, \dots, x_k; p_1, \dots, p_k; n) = \binom{n}{x_1, \dots, x_k} p_1^{x_1} \cdots p_k^{x_k}$

where $\binom{n}{x_1, \dots, x_k} = \frac{n!}{x_1! \cdots x_k!}$, $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p_i = 1$.

Example: Customers of a coffee shop prefer Turkish coffee with probability 0.4, espresso with probability 0.25 and filter coffee with probability 0.35. What is the probability that, out of the first 10 customers, 3 will prefer Turkish coffee, 5 will prefer espresso and 2 will prefer filter coffee?

$f(3, 5, 2; 0.4, 0.25, 0.35; 10) = \binom{10}{3, 5, 2} \cdot 0.4^3 \cdot 0.25^5 \cdot 0.35^2 = 2520 \cdot 7.65625 \times 10^{-6} \approx 0.0193$

    # Explicit form
    factorial(10) / (factorial(3) * factorial(5) * factorial(2)) * 0.4^3 * 0.25^5 * 0.35^2
    ## [1] 0.01929375
    # Density of the multinomial
    dmultinom(x = c(3, 5, 2), prob = c(0.4, 0.25, 0.35))
    ## [1] 0.01929375

The binomial distribution is a special case of the multinomial distribution.
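As a sanity check on the binomial formulas and on the special-case remark, the moments and the cdf can be recomputed directly from the pmf values. This is a sketch using the 5-coin example; the variable names are illustrative, not from the notes.

```r
n <- 5
p <- 0.5
q <- 1 - p
x <- 0:n

# pmf values f(0), ..., f(n)
f <- dbinom(x, size = n, prob = p)

# E[X] as the sum of x * f(x); should agree with n * p
mu <- sum(x * f)

# V(X) from the definition and from E[X^2] - (E[X])^2; both should agree with n * p * q
v <- sum((x - mu)^2 * f)
v_alt <- sum(x^2 * f) - mu^2
c(mean = mu, np = n * p, var = v, var_alt = v_alt, npq = n * p * q)

# cdf at 3 as a partial sum of the pmf; compare with pbinom(3, 5, 0.5)
cdf_3 <- sum(f[x <= 3])

# Binomial as a multinomial with two categories (success, failure)
binom_3 <- dbinom(3, size = n, prob = p)
multinom_3 <- dmultinom(x = c(3, n - 3), prob = c(p, q))
c(binomial = binom_3, multinomial = multinom_3)
```

The two-category multinomial needs both counts (successes and failures), while `dbinom` takes only the number of successes and the number of trials.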

Hypergeometric Distribution

The hypergeometric distribution can be used when the population is divided in two groups, such as defective/nondefective, white/black, Ankara/Istanbul. Suppose there are a total of N items, k of them from group 1 and N - k of them from group 2. We want to know the probability of getting x items from group 1 and n - x items from group 2 in a sample of n items.

pmf: $f(x, n; k, N) = \frac{\binom{k}{x} \binom{N-k}{n-x}}{\binom{N}{n}}$

$E[X] = \frac{nk}{N}$

$V(X) = \frac{N-n}{N-1} \cdot n \cdot \frac{k}{N} \left(1 - \frac{k}{N}\right)$

Example: Suppose we have a group of 20 people, 12 from Istanbul and 8 from Ankara. If we randomly select 5 people from it, what is the probability that 1 of them is from Ankara and 4 of them are from Istanbul?

$f(1, 5; 8, 20) = \frac{\binom{8}{1} \binom{20-8}{5-1}}{\binom{20}{5}} \approx 0.255$

    # Explicit form
    x <- 1
    n <- 5
    k <- 8
    N <- 20
    (choose(k, x) * choose(N - k, n - x)) / choose(N, n)
    ## [1] 0.255418
    # Density of the hypergeometric, see ?dhyper for explanations
    dhyper(x = 1, m = 8, n = 12, k = 5)
    ## [1] 0.255418

Negative Binomial Distribution

The negative binomial distribution answers the question "What is the probability that the k-th success occurs in the n-th trial?". Unlike the binomial case, the last trial is fixed as a success.

pmf: $f(x; p, n) = \binom{n-1}{x-1} p^x q^{(n-x)}$, where x is the required number of successes (k in the question above).

Example: Suppose I'm repeatedly tossing coins. What is the probability that the 3rd heads comes in the 5th toss?

$f(3; 0.5, 5) = \binom{5-1}{3-1} \cdot 0.5^3 \cdot 0.5^{(5-3)} = 0.1875$

    # Explicit form
    choose(5 - 1, 3 - 1) * 0.5^3 * 0.5^(5 - 3)
    ## [1] 0.1875
    # Binomial way: 2 successes in the first 4 tosses, then a success
    dbinom(3 - 1, 5 - 1, 0.5) * 0.5
    ## [1] 0.1875
    # Negative binomial way: x is the number of failures before the 3rd success
    dnbinom(x = 5 - 3, size = 3, prob = 0.5)
    ## [1] 0.1875
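The negative binomial pmf above is written in terms of the trial number n, whereas R's `dnbinom` is parameterised by the number of failures before the target success. The sketch below wraps the formula in a hypothetical helper, `kth_success_pmf` (a name introduced here for illustration only), and compares the two forms.

```r
# Hypothetical helper: probability that the k-th success occurs exactly on trial n
kth_success_pmf <- function(n, k, p) {
  choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)
}

# 3rd heads on the 5th toss of a fair coin
p_formula <- kth_success_pmf(n = 5, k = 3, p = 0.5)

# Same probability via dnbinom: failures = n - k, size = k
p_builtin <- dnbinom(x = 5 - 3, size = 3, prob = 0.5)
c(formula = p_formula, builtin = p_builtin)

# The pmf over n = k, k + 1, ... should sum to (almost exactly) 1;
# the sum is truncated at 200 trials, where the remaining tail is negligible.
trials <- 3:200
total_prob <- sum(kth_success_pmf(trials, k = 3, p = 0.5))
total_prob
```

Keeping the helper in terms of n and k mirrors the question asked in the text; the conversion to "number of failures" is only needed when calling the built-in functions.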

Geometric Distribution

The geometric distribution answers the question "What is the probability that the first success comes in the n-th trial?"

pmf: $f(n; p) = q^{(n-1)} p$

$E[X] = 1/p$

$V(X) = \frac{1-p}{p^2}$
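The geometric formulas can be checked numerically in the same way. In the sketch below, the success probability p = 0.2 is an arbitrary value chosen for illustration, and the infinite sums for the mean and variance are truncated at 2000 trials, where the remaining tail is negligible.

```r
p <- 0.2        # assumed success probability, for illustration only
q <- 1 - p

# pmf in terms of the trial number n of the first success
n <- 1:10
f_n <- q^(n - 1) * p

# dgeom counts the failures before the first success, i.e. n - 1
f_builtin <- dgeom(n - 1, prob = p)
all.equal(f_n, f_builtin)

# Approximate E[X] and V(X) by truncating the infinite sums
n_long <- 1:2000
f_long <- q^(n_long - 1) * p
mean_n <- sum(n_long * f_long)                 # close to 1 / p
var_n <- sum((n_long - mean_n)^2 * f_long)     # close to (1 - p) / p^2
c(mean = mean_n, one_over_p = 1 / p, var = var_n, formula = (1 - p) / p^2)
```

Note the shift by one: the notes count trials up to and including the first success, while `dgeom` counts only the failures before it.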