Biostatistics & SAS programming. Kevin Zhang

Random variables and distributions
Kevin Zhang, February 27, 2017

Simulation study

Data analysis: apply existing methodologies to your collected samples, in the hope of finding useful conclusions.
- Check assumptions
- Apply the PROC
- Interpret results

Development: try to develop new methodologies or enhance existing methods to draw conclusions.
- Derive formulas
- Program using IML and MACRO
- Run a simulation study to verify the results

Simulation procedure:
1. Generate datasets from the assumed distribution.
2. Apply your algorithm to each dataset and collect the results.
3. Interpret the results: are they close to what you expected?
4. Assess the accuracy and compare with existing methods.
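The procedure above can be sketched in SAS as follows. This is a minimal illustration rather than the lecture's own code: the normal data with mean 3 and standard deviation 4, the 500 replicates of size 30, the use of the sample mean as the "algorithm", and the data set names sim and est are all assumptions made for the example.

```sas
/* Step 1: generate 500 datasets of size 30 (illustrative sizes) */
data sim(keep = rep x);
    call streaminit(123);              /* set random number seed */
    do rep = 1 to 500;                 /* rep identifies the dataset */
        do i = 1 to 30;
            x = rand("Normal", 3, 4);  /* mean 3, standard deviation 4 */
            output;
        end;
    end;
run;

/* Step 2: apply the "algorithm" (here, the sample mean) to each dataset */
proc means data = sim noprint;
    by rep;
    var x;
    output out = est mean = xbar;
run;

/* Steps 3 and 4: summarize the 500 estimates and compare with theory */
proc means data = est mean std;
    var xbar;
run;
```

If the simulation behaves as expected, the average of xbar should be close to 3 and its standard deviation close to 4/sqrt(30), about 0.73.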

Distribution
- A distribution defines the rule by which probability is assigned to values.
- It lets us evaluate probabilities.
- It lets us generate random values that follow the specified probabilities.
- In fact, each variable in your sample is a sequence of random values (in most cases we don't know the distribution).

Mathematical expression
- f(x), the density function (PDF) or mass function (PMF): describes the probability assigned to each possible value. For a discrete variable, f(a) = P(X = a), i.e. the probability assigned to the value a. For a continuous variable, f is a density: P(X = a) = 0, and probabilities are areas under f.
- F(x), the cumulative distribution function (CDF): gives the probability accumulated from the very beginning up to a given threshold, F(a) = P(X ≤ a).
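As a small illustration, the SAS functions PDF() and CDF() evaluate f and F directly. The sketch below assumes X ~ B(10, 0.5) and the threshold 3; the variable names pmf3 and cdf3 are made up for the example.

```sas
data _null_;
    /* X ~ B(10, 0.5): number of Heads in 10 flips of a fair coin */
    pmf3 = pdf("Binomial", 3, 0.5, 10);   /* f(3) = P(X = 3)  = 120/1024 */
    cdf3 = cdf("Binomial", 3, 0.5, 10);   /* F(3) = P(X <= 3) = 176/1024 */
    put pmf3= cdf3=;                      /* write both values to the log */
run;
```

Note that F(3) is just f(0) + f(1) + f(2) + f(3), which is how the CDF accumulates the PMF for a discrete variable.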

Commonly used distributions

Discrete:
- Bernoulli (also called 0-1), B(1, p)
- Binomial, B(n, p)
- Poisson, P(λ)
- Geometric, G(p)
- Discrete uniform, DU(a, b)

Continuous:
- Continuous uniform, U(a, b)
- Normal, N(μ, σ²)
- Student's t, t(df)
- Exponential, Exp(β)
- Chi-square, χ²(df)
- F distribution, F(df1, df2)

Bernoulli distribution
- Models cases with only two outcomes.
- F(x) and f(x)
- Numerical characteristics
- Example: flip a coin; is it a Head?
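For reference, the standard textbook forms for X ~ B(1, p) are given below. The coding of the two outcomes as 1 (success, probability p) and 0 (failure) is the usual convention and is assumed here.

```latex
f(x) = p^{x}(1-p)^{1-x}, \qquad x \in \{0, 1\}

F(x) =
\begin{cases}
0,     & x < 0 \\
1 - p, & 0 \le x < 1 \\
1,     & x \ge 1
\end{cases}

E(X) = p, \qquad \operatorname{Var}(X) = p(1-p)
```

For the coin example with a fair coin, p = 0.5, so E(X) = 0.5 and Var(X) = 0.25.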

Binomial distribution
- Repeat the same Bernoulli trial n times independently and count the successes.
- F(x) and f(x)
- Numerical characteristics
- Example: flip a coin 10 times; how many Heads did you get?
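For reference, the standard textbook forms for X ~ B(n, p), assuming n independent trials that each succeed with probability p, are:

```latex
f(k) = \binom{n}{k} p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \dots, n

F(x) = \sum_{k=0}^{\lfloor x \rfloor} \binom{n}{k} p^{k} (1-p)^{n-k}, \qquad 0 \le x \le n

E(X) = np, \qquad \operatorname{Var}(X) = np(1-p)
```

For 10 flips of a fair coin, n = 10 and p = 0.5, so E(X) = 5 and Var(X) = 2.5.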

Poisson distribution
- How many desired results will be obtained during a given period of time?
- F(x) and f(x)
- Numerical characteristics
- Example: how many customers enter the local Walmart between 8 am and 10 am?
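For reference, the standard textbook forms for X ~ P(λ), where λ > 0 is the expected number of events in the given period, are:

```latex
f(k) = \frac{e^{-\lambda} \lambda^{k}}{k!}, \qquad k = 0, 1, 2, \dots

F(x) = \sum_{k=0}^{\lfloor x \rfloor} \frac{e^{-\lambda} \lambda^{k}}{k!}, \qquad x \ge 0

E(X) = \lambda, \qquad \operatorname{Var}(X) = \lambda
```

The mean and the variance are equal, which gives a quick sanity check on simulated Poisson data.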

Geometric distribution
- How many trials are needed to get the first desired result?
- F(x) and f(x)
- Numerical characteristics
- Example: how many rolls of a fair die are needed to get the first 1? (Waiting for the fifth 1 would instead follow a negative binomial distribution.)
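For reference, the standard textbook forms for X ~ G(p) are given below. They assume the convention in which X counts the number of trials up to and including the first success; some references count the failures before the first success instead, which shifts X by one.

```latex
f(k) = (1-p)^{k-1} p, \qquad k = 1, 2, 3, \dots

F(k) = 1 - (1-p)^{k}, \qquad k = 1, 2, 3, \dots

E(X) = \frac{1}{p}, \qquad \operatorname{Var}(X) = \frac{1-p}{p^{2}}
```

For a fair die and the outcome 1, p = 1/6, so on average 6 rolls are needed to see the first 1.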

Discrete uniform distribution
- Models cases in which all possible results are equally likely.
- F(x) and f(x)
- Characteristics
- Example: rolling a fair die
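For reference, the standard textbook forms for X ~ DU(a, b), assuming the possible values are the integers a, a+1, ..., b, are:

```latex
f(k) = \frac{1}{b - a + 1}, \qquad k = a, a+1, \dots, b

F(x) = \frac{\lfloor x \rfloor - a + 1}{b - a + 1}, \qquad a \le x \le b

E(X) = \frac{a + b}{2}, \qquad \operatorname{Var}(X) = \frac{(b - a + 1)^{2} - 1}{12}
```

For a fair die, a = 1 and b = 6, so f(k) = 1/6, E(X) = 3.5 and Var(X) = 35/12.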

Random numbers in the computer
- Random generator: in fact it is an algorithm that picks numbers from a fixed, deterministic sequence that only looks random (a pseudo-random sequence).
- When no seed is given, the starting point is typically derived from things such as the time, date, computer name, IP address, hardware IDs, etc. This makes the numbers differ from run to run and from computer to computer.
- Random seed: a number that fixes where in the sequence the generator starts.
- The same generator will produce EXACTLY the SAME random sequence if you set the same random seed.
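The effect of the seed can be checked with a short SAS sketch. The data set names a and b, the seed 123 and the choice of five uniform draws are assumptions made for the illustration.

```sas
/* Two DATA steps that use the same seed */
data a(keep = x);
    call streaminit(123);       /* set random number seed */
    do i = 1 to 5;
        x = rand("Uniform");
        output;
    end;
run;

data b(keep = x);
    call streaminit(123);       /* same seed as in data set a */
    do i = 1 to 5;
        x = rand("Uniform");
        output;
    end;
run;

/* Compare the two data sets value by value */
proc compare base = a compare = b;
run;
```

PROC COMPARE should report no unequal values. Changing the seed in one of the two steps should make the sequences differ.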

DATA step
SAS: use a DATA step together with a loop.

    /* Bernoulli experiment */
    data bino1(keep = x);       /* keep lists the variables you wish to keep in the data set */
        p = 0.5; n = 1;         /* parameters of the distribution */
        call streaminit(123);   /* random seed: set random number seed */
        do i = 1 to 1000;       /* loop 1000 times, thus you get 1000 values */
            x = rand("binomial", p, n);   /* random generator: x ~ Bernoulli(0.5) */
            output;
        end;
    run;

More: Poisson distribution

    /* --- Poisson random numbers --- */
    data pos(keep = x);
        call streaminit(123);   /* set random number seed */
        lambda = 4;
        do i = 1 to 1000;
            x = rand("poisson", lambda);   /* x ~ Pois(4) */
            output;
        end;
    run;

Full list
- The SAS manual for the RAND() function is here: https://support.sas.com/documentation/cdl/en/lefunctionsref/69762/HTML/default/viewer.htm#p0fpeei0opypg8n1b06qe4r040lv.htm
- RAND() can generate random numbers from all of these distributions once you provide the distribution name and its parameters.

PROC IML
- We can also use the IML procedure to program it. IML stands for Interactive Matrix Language: https://support.sas.com/rnd/app/iml/
- It allows us to define vectors and matrices and to calculate results, just as in other programming languages (such as MATLAB, R, Python).
- Example:

    proc iml;
        call randseed(123);           /* set random number seed */
        x = j(10, 1);                 /* allocate a vector with 10 values in it */
        call randgen(x, "Uniform");   /* x ~ U(0,1) */
        print x;
    quit;
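A slightly longer sketch shows the "calculate some results" part: once the random vector exists, summary statistics are one function call away. The vector length 1000 and the names m and v are assumptions made for the illustration.

```sas
proc iml;
    call randseed(123);           /* set random number seed */
    x = j(1000, 1);               /* allocate a 1000 x 1 vector */
    call randgen(x, "Uniform");   /* fill x with U(0,1) draws */
    m = mean(x);                  /* sample mean, should be near 1/2 */
    v = var(x);                   /* sample variance, should be near 1/12 */
    print m v;
quit;
```

The theoretical mean and variance of U(0,1) are 1/2 and 1/12, so the printed values give a quick check that the generated numbers behave as intended.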

HW
Try to generate the following random sequences:
- Normal, with mean 3 and standard deviation 4
- Chi-square with 5 degrees of freedom
- Student's t with 10 degrees of freedom
- Geometric distribution with p = 0.3
- Exponential distribution
- F distribution with n = 3 and d = 10

Examine the histograms of the above random sequences, together with the /normal option, and see what happens.
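The histogram step can be sketched as below, assuming the /normal option refers to the NORMAL option of the HISTOGRAM statement in PROC UNIVARIATE. The example deliberately uses a uniform sample, which is not on the homework list; the data set name unif is made up for the illustration.

```sas
/* Generate 1000 values from U(0,1) */
data unif(keep = x);
    call streaminit(123);       /* set random number seed */
    do i = 1 to 1000;
        x = rand("Uniform");
        output;
    end;
run;

/* Histogram of x with a fitted normal curve overlaid */
proc univariate data = unif;
    histogram x / normal;
run;
```

The same PROC UNIVARIATE step can be pointed at any of the homework data sets by changing the data= name and the variable.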