Statistical Tests for Variable Discrimination
- Alexis Hart
- 6 years ago
1 Statistical Tests for Variable Discrimination
University of Trento - FBK, 26 February 2015
(UNITN-FBK) Statistical Tests for Variable Discrimination, 26 February 2015, 1 / 31
2 General statistics
Descriptive: describing the statistical properties of samples.
Mathematical: studying probability distributions. Questions about the samples start from a known distribution.
  Knowing that 50% of the population reads books, what is the probability that 70 out of a sample of 100 subjects read books?
Inferential: starting from the samples, what can we say about the statistical distribution?
  In a sample of 100 subjects, 65 read books. May I infer that more than 50% of the general population reads books? What is the probability of an error?
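The two questions above can be sketched in R as a binomial calculation and a binomial test. This is a minimal sketch, assuming each subject reads books independently with the same probability; the variable names are illustrative and not part of the original slides.

```r
# Mathematical question: with X ~ Binomial(100, 0.5), how likely are
# exactly 70 readers, and 70 or more readers?
p_exactly_70 <- dbinom(70, size = 100, prob = 0.5)
p_70_or_more <- pbinom(69, size = 100, prob = 0.5, lower.tail = FALSE)

# Inferential question: 65 readers out of 100, is the true proportion > 0.5?
# The p-value is the probability, under p = 0.5, of seeing 65 or more readers.
reading_test <- binom.test(65, 100, p = 0.5, alternative = "greater")

print(c(exactly_70 = p_exactly_70,
        at_least_70 = p_70_or_more,
        p_value = reading_test$p.value))
```

A small p-value here is what allows the inference that more than half of the population reads books, with the p-value bounding the chance of that conclusion being an error under the null.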
3 Relative frequencies and Percentage
Example: given the birthwt dataset, where n = 189, what are the relative frequencies for the race variable?
Relative frequencies can be computed as $n_c / n$, where $n_c$ is the number of observations in category c.
    head(birthwt)
    ## low age lwt race smoke ptl ht ui ftv bwt
    ## African-American
    ## Other
    ## White
    ## White
    ## White
    ## Other
    table(birthwt$race) ## Frequencies
    ##
    ## White African-American Other
    ##
    (table(birthwt$race) / nrow(birthwt))*100 # Relative Frequencies
    ##
    ## White African-American Other
    ##
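The slide output shows labelled race categories, so the data were presumably recoded beforehand. A minimal sketch, assuming the birthwt data frame from the MASS package, where race is stored as 1 = white, 2 = black, 3 = other; the labels below mirror the slide and the variable names are illustrative.

```r
library(MASS)  # provides the birthwt data frame

# Recode the numeric race codes into labelled categories
# (assumed mapping from the MASS documentation: 1 = white, 2 = black, 3 = other).
race <- factor(birthwt$race, levels = 1:3,
               labels = c("White", "African-American", "Other"))

freq <- table(race)                  # absolute frequencies n_c
rel_freq <- freq / nrow(birthwt)     # relative frequencies n_c / n
percent <- round(100 * rel_freq, 1)  # percentages

print(freq)
print(percent)
```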
4 Statistical Inference
Definition: the process of using the data to draw conclusions about the whole population.
Example: let's say I want to test a hypothesis about the average normal body temperature.
1 Get the body temperature of the whole population: NOT FEASIBLE
2 Study a sample of representative members selected from the population
  Samples should be chosen randomly
  Samples are assumed to be independent
3 Try to estimate the unknown population average
NB The real population average remains unknown. The estimate depends on our observations. There is always some uncertainty.
5 How to choose the population? More on sampling
How do we select samples from a population?
SRS: Simple Random Sampling. The most straightforward sampling procedure.
  Give a number 1, ..., N to each member of the population
  Randomly extract n numbers
  The chance of being selected is the same for any group of n members of the population
SS: Stratified Sampling. The sample should be comparable to the whole population with respect to representative groups.
  No subgroup in the observations should be overrepresented
CS: Cluster Sampling.
  Start the sampling by grouping the population into clusters
  Sample from the clusters
  Subsample some or all members of each selected cluster
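The three schemes can be sketched on a toy population. This is only an illustration under stated assumptions: the population data frame, its stratum and cluster columns, the 10% sampling fraction and the choice of 4 clusters are all hypothetical.

```r
set.seed(1)

# Hypothetical population: N members, each in one stratum and one cluster.
N <- 1000
population <- data.frame(
  id = 1:N,
  stratum = sample(c("A", "B", "C"), N, replace = TRUE, prob = c(0.5, 0.3, 0.2)),
  cluster = sample(1:20, N, replace = TRUE)
)

# SRS: every group of n members has the same chance of being selected.
n <- 100
srs <- population[sample(N, n), ]

# Stratified sampling: draw the same fraction (10%) inside each stratum,
# so that no stratum is over-represented.
ids_by_stratum <- split(population$id, population$stratum)
ss_ids <- unlist(lapply(ids_by_stratum,
                        function(ids) sample(ids, round(0.1 * length(ids)))))
ss <- population[ss_ids, ]

# Cluster sampling: draw 4 clusters at random, then keep all their members.
chosen <- sample(unique(population$cluster), 4)
cs <- population[population$cluster %in% chosen, ]

print(table(ss$stratum) / table(population$stratum))  # about 0.1 in each stratum
```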
6 Population vs Samples
Population parameters and their sample estimates:
Population (size N)
  Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
  Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample (size n)
  Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
  Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
$\bar{x}$ is an estimator of $\mu$ (the true population mean). In particular, $\bar{x} \to \mu$ for $n \to \infty$.
    mean(birthwt$smoke) ## Smoking mothers mean
    ## [1]
    var(birthwt$smoke) ## Smoking mothers variance
    ## [1]
    mean(birthwt$smoke) * (1 - mean(birthwt$smoke)) ## See the Bernoulli dist.
    ## [1]
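The last two R expressions above give slightly different numbers because var() divides by n - 1 while the Bernoulli formula p(1 - p) divides by n. A minimal sketch of that relationship, assuming the birthwt data frame from the MASS package; variable names are illustrative.

```r
library(MASS)  # provides the birthwt data frame

x <- birthwt$smoke            # 0/1 indicator of smoking during pregnancy
n <- length(x)

p_hat <- mean(x)              # sample proportion of smoking mothers
s2 <- var(x)                  # sample variance, denominator n - 1
bern_var <- p_hat * (1 - p_hat)  # Bernoulli variance estimate, denominator n

# For 0/1 data the two differ only by the factor n / (n - 1).
print(c(s2 = s2, bernoulli = bern_var, rescaled = bern_var * n / (n - 1)))
```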
7 Law of Large Numbers
If the sample size is large enough, the mean estimator $\hat{\mu}$ converges to the population mean.
[Figure: mean estimation for n -> Inf from N(0,1); x-axis: number of extractions (ticks up to 10^5); y-axis: $\hat{\mu}$]
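A plot of this kind can be approximated with a short simulation. This is a sketch, not the code behind the original figure; the seed and the grid of sample sizes are assumptions.

```r
set.seed(42)

# Draw once from N(0, 1) and look at the mean of the first n values
# for increasing n.
sizes <- 10^(1:5)
draws <- rnorm(max(sizes))
running_means <- sapply(sizes, function(n) mean(draws[1:n]))

# The estimates approach the true mean 0 as n grows.
plot(sizes, running_means, log = "x", type = "b",
     xlab = "Number of extractions", ylab = "Estimated mean")
abline(h = 0, lty = 2)
print(data.frame(n = sizes, mean_estimate = running_means))
```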
8 Sample distributions
Probability distributions of estimators are called sampling distributions.
Assumptions
  Assume the random variable X has a normal distribution $N(\mu, \sigma^2)$
  Assume $\sigma^2$ is known
  We use $\bar{X}$ to estimate $\mu$
What is the sampling distribution of $\bar{X}$?
Extract n samples from the population: $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, with $X_1, \dots, X_n$ independent.
The sum of n independent, identically distributed normal variables is itself normally distributed:
  $X_1 + X_2 + \dots + X_n = \sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$
Given the sample mean estimator $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, its mean and variance are $n\mu/n = \mu$ and $n\sigma^2/n^2 = \sigma^2/n$, so:
  $\bar{X} \sim N(\mu, \sigma^2/n)$
9 Sample distributions II
Example: consider the random variable $X \sim N(125, 15^2)$ representing systolic blood pressure.
Extract 100 samples $X_1, \dots, X_{100} \sim N(125, 15^2)$; then $\bar{X} \sim N(125, 15^2/100)$.
Estimators depend on the specific sample selected from the population.
Repeating the sampling leads to different values of the estimator.
[Figure, two panels: "Theoretical Distribution" (density of x) and "Sample mean probability distribution" (density of $\bar{X}$)]
10 Hints on how to compute those plots
Draw the population density distribution
  Extract 100 samples from the population distribution
  Create the probability distribution
  Plot everything
Draw the sample mean distribution
  Extract 100 samples from the distribution
  Estimate the mean of the distribution
  Repeat the same operation 1000 times
  Plot everything
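The hints above can be sketched as follows for the systolic blood pressure example, $N(125, 15^2)$ with samples of size 100. This is one possible implementation under those assumptions, not the code used for the original figures; the seed and plotting choices are illustrative.

```r
set.seed(123)
mu <- 125; sigma <- 15   # population parameters from the example
n <- 100                 # sample size
reps <- 1000             # number of repeated samples

# Population panel: one sample of size n from the population distribution.
pop_sample <- rnorm(n, mean = mu, sd = sigma)

# Sample-mean panel: extract n values, take their mean, repeat reps times.
sample_means <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

# Histograms with the theoretical densities overlaid.
op <- par(mfrow = c(1, 2))
hist(pop_sample, freq = FALSE, main = "Theoretical Distribution", xlab = "x")
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE)
hist(sample_means, freq = FALSE,
     main = "Sample mean probability distribution", xlab = "sample mean")
curve(dnorm(x, mean = mu, sd = sigma / sqrt(n)), add = TRUE)
par(op)

# The spread of the sample means should be close to sigma / sqrt(n) = 1.5.
print(c(mean = mean(sample_means), sd = sd(sample_means)))
```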
11 Confidence Intervals
Definition: the variation of the estimators if different members of the population were selected.
Example: consider the systolic blood pressure example.
We know the sample mean distribution is $\bar{X} \sim N(\mu, \sigma^2/n)$.
Since the 68-95-99.7% rule applies, with probability 0.95: $\mu - 3 \le \bar{X} \le \mu + 3$ (here $2\sigma/\sqrt{n} = 2 \cdot 15/\sqrt{100} = 3$).
We want to estimate the true population $\mu$. With probability 0.95, $\bar{X} - 3 \le \mu \le \bar{X} + 3$, that is, $\mu$ falls within $[\bar{X} - 3, \bar{X} + 3]$.
We could repeatedly draw samples of size n, find the sample mean and determine the interval each time.
In reality we have only one sample, so with 0.95 probability the true $\mu$ is in $[\bar{x} - 3, \bar{x} + 3]$.
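The "repeatedly sample and determine the interval" idea can be checked by simulation. A minimal sketch, assuming the blood pressure setting $N(125, 15^2)$ with n = 100 and the multiplier 2 from the rule of thumb; the seed and number of repetitions are arbitrary.

```r
set.seed(2015)
mu <- 125; sigma <- 15; n <- 100
reps <- 10000

half_width <- 2 * sigma / sqrt(n)   # 2 standard errors = 3

# For each repeated sample, build [xbar - 3, xbar + 3] and check
# whether it contains the true mean.
xbar <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))
covered <- (xbar - half_width <= mu) & (mu <= xbar + half_width)
coverage <- mean(covered)

print(coverage)  # fraction of intervals containing mu, expected near 0.95
```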
12 Confidence intervals for the Population Proportion
Suppose we want to find the 95% CI for the population proportion of mothers who smoke during pregnancy.
Using the birthwt dataset, $\bar{x} = p = 0.39$:
    sum(birthwt$smoke)/189
    ## [1]
Estimate the variance, $s^2 = p(1 - p) = 0.24$:
    s <- (sum(birthwt$smoke)/189) * (1-sum(birthwt$smoke)/189)
    ## [1]
The Standard Error (SE) for the sample mean is $\sigma/\sqrt{n} = \sqrt{p(1 - p)/n} \approx 0.036$:
    SE <- sqrt(s/189)
The 95% CI is $[p - z_{crit} \cdot SE,\ p + z_{crit} \cdot SE] \approx [0.32, 0.46]$.
Therefore we can define the Margin of Error as $e = z_{crit} \cdot \sigma/\sqrt{n}$.
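The steps above can be put together in a few lines. A minimal sketch, assuming the birthwt data frame from the MASS package and the normal approximation with $z_{crit}$ taken from qnorm; variable names are illustrative.

```r
library(MASS)  # provides the birthwt data frame

n <- nrow(birthwt)
p_hat <- sum(birthwt$smoke) / n           # sample proportion of smokers
se <- sqrt(p_hat * (1 - p_hat) / n)       # standard error of the proportion

z_crit <- qnorm(0.975)                    # multiplier for a 95% interval
margin <- z_crit * se                     # margin of error e
ci <- c(lower = p_hat - margin, upper = p_hat + margin)

print(round(c(p_hat = p_hat, se = se, margin = margin), 3))
print(round(ci, 2))
```

Using the rule-of-thumb multiplier 2 instead of qnorm(0.975) changes the limits only in the third decimal place.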
13 The 68-95-99.7% rule
The 68-95-99.7% rule for normally distributed values:
  68% of values fall within 1 standard deviation of the mean: $P(\mu - \sigma < X \le \mu + \sigma) \approx 0.68$
  95% of values fall within 2 standard deviations of the mean: $P(\mu - 2\sigma < X \le \mu + 2\sigma) \approx 0.95$
  99.7% of values fall within 3 standard deviations of the mean: $P(\mu - 3\sigma < X \le \mu + 3\sigma) \approx 0.997$
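The three probabilities can be computed exactly from the standard normal distribution function, and the calculation can be inverted to get the multiplier for a given coverage. A short sketch; the variable names are illustrative.

```r
# Exact coverage of mean +/- k standard deviations for a normal variable.
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)
names(coverage) <- paste0("within_", k, "_sd")
print(round(coverage, 4))

# Going the other way: the multiplier that gives exactly 95% coverage
# is slightly below 2.
z_95 <- qnorm(0.975)
print(z_95)
```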
14 Check the 68-95-99.7% rule with R
For a sufficient number of samples we can estimate the typical ranges:
    n <- ...
    mynorm <- rnorm(n) # Extract n samples from N(0,1)
    sum(mynorm > mean(mynorm) - sd(mynorm) & mynorm <= mean(mynorm) + sd(mynorm)) / n
    ## [1]
    sum(mynorm > mean(mynorm) - 2*sd(mynorm) & mynorm <= mean(mynorm) + 2*sd(mynorm)) / n
    ## [1]
    sum(mynorm > mean(mynorm) - 3*sd(mynorm) & mynorm <= mean(mynorm) + 3*sd(mynorm)) / n
    ## [1]
15 What the rule looks like
[Figure, two panels: "68% Interval" (density of x, area between $-\sigma$ and $+\sigma$) and "95% Interval" (density of x, area between $-2\sigma$ and $+2\sigma$)]
16 Exercises
Recall the 68-95-99.7% rule and find the multipliers for the confidence intervals at 70, 80 and 90% for a normally distributed variable.
We assume that blood pressure has the probability distribution $X \sim N(\mu, \sigma^2)$. Suppose we know that $\sigma = 6$. To estimate $\mu$, we randomly selected 9 people and measured their blood pressure. The sample mean is $\bar{x} =$ ...
1 Write down the sampling distribution of the sample mean $\bar{X}$ and find its standard deviation.
2 Find the 75% CI estimate for $\mu$.
17 Case-Control study
Example: we want to study the effect of smoking on lung cancer.
Retrospective: select a group of patients with lung cancer and survey them to determine whether they have smoked in the past.
Prospective: select a group of smokers and observe them over time without influencing the natural process.
To draw reasonable conclusions we need to compare the patients in the study with subjects without lung cancer who have the same habits and are similar in all other respects.
Compare cases (lung cancer patients) with controls (no lung cancer).
Individuals in the case group should not be related to individuals in the control group.
18 Hypothesis Testing
Assumptions
Idea: starting with a hypothesis, we want to test whether it is true.
In the Body Temperature dataset the hypothesis is that, on average, body temperature is less than 98.6 degrees F.
The statement can be expressed as $\mu < 98.6$.
We can now create a hypothesis which invalidates the previous one: $\mu \ge 98.6$.
This is called the null hypothesis $H_0$.
The null hypothesis reflects the "nothing of interest" situation.
We can define the alternative hypothesis, denoted $H_A$ or $H_1$, which is what we want to investigate.
The procedure of evaluating the hypothesis is called hypothesis testing.
Examine the evidence the data provides against the null hypothesis. If the evidence is strong, we reject $H_0$.
19 Testing the mean
In particular we want to test: $\bar{X} \mid H_0 \sim N(\mu, \sigma^2/n)$
Example, from the body temperature dataset:
We have $H_0: \mu = 98.6$ and $H_a: \mu < 98.6$.
Select 25 healthy patients, with $\sigma^2 = 1$; thus $\bar{X} \mid H_0 \sim N(98.6, 1/25)$.
From the 25 samples we have only one $\bar{x}$. Suppose $\bar{x} = 98.4$.
We want to evaluate the lower tail probability for $\bar{x} = 98.4$.
The observed significance level is the p-value, defined as $p_{obs} = P(\bar{X} \le \bar{x} \mid H_0)$.
20 Visualizing the hypothesis testing
See the probability $p_{obs}$ and the observed $\bar{x}$: $p_{obs} = P(\bar{X} \le \bar{x} \mid H_0)$
[Figure: density of $\bar{X}$ under $H_0$, with $p_{obs}$ and $\bar{x}$ marked]
$p_{obs} = P(\bar{X} \le 98.4)$
    pnorm(98.4,mean=m,sd=s) ## Compute the above probability
    ## [1]
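The call above leaves m and s undefined. A minimal sketch of the full computation, assuming the values from the body temperature example ($\mu_0 = 98.6$, $\sigma^2 = 1$, n = 25, $\bar{x} = 98.4$); the names mu0, sigma2 and xbar are illustrative.

```r
mu0 <- 98.6     # mean under the null hypothesis
sigma2 <- 1     # known population variance
n <- 25         # sample size
xbar <- 98.4    # observed sample mean

m <- mu0                # mean of the sampling distribution under H0
s <- sqrt(sigma2 / n)   # its standard deviation, here 0.2

# Lower-tail probability of a sample mean at or below the observed one.
p_obs <- pnorm(xbar, mean = m, sd = s)
print(p_obs)
```

The observed mean lies one standard error below 98.6, so the p-value is the lower-tail area of a standard normal at -1, roughly 0.16, which is not strong evidence against $H_0$.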
21 Hypothesis testing: One-sided vs Two-sided
One-sided test
  $H_0: \mu = \mu_0$ against $H_1: \mu < \mu_0$
  The departure from the mean is in one direction.
  Example with body temperature: $H_0: \mu = 98.6$ and $H_1: \mu < 98.6$
  Computing: $p_{obs} = P(Z \le z)$, where $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$ and $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Two-sided test
  We might be indifferent to the direction, thus $H_0: \mu = \mu_0$ and $H_1: \mu \ne \mu_0$
  Example with body temperature: $H_0: \mu = 98.6$ and $H_1: \mu \ne 98.6$
  Computing: $p_{obs} = P(Z \le -|z|) + P(Z \ge |z|) = 2\,P(Z \ge |z|)$
22 Two-sided hypothesis tests
Distribution of the standardized variable Z, with z = 1.
[Figure: "Z distribution", density against x, with $p_{obs}$ and z marked]
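The one-sided and two-sided computations can be wrapped in a small helper. This is a sketch: z_test is a hypothetical function written for this illustration (base R has no built-in z-test), and the inputs are the body temperature values used earlier.

```r
# Hypothetical helper: z-test for a mean with known sigma.
z_test <- function(xbar, mu0, sigma, n,
                   alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (xbar - mu0) / (sigma / sqrt(n))   # standardized test statistic
  p <- switch(alternative,
    less      = pnorm(z),                                # P(Z <= z)
    greater   = pnorm(z, lower.tail = FALSE),            # P(Z >= z)
    two.sided = 2 * pnorm(abs(z), lower.tail = FALSE))   # 2 P(Z >= |z|)
  list(z = z, p.value = p)
}

one_sided <- z_test(98.4, 98.6, sigma = 1, n = 25, alternative = "less")
two_sided <- z_test(98.4, 98.6, sigma = 1, n = 25, alternative = "two.sided")

print(c(z = one_sided$z,
        one_sided = one_sided$p.value,
        two_sided = two_sided$p.value))
```

With |z| = 1 the two-sided p-value is twice the one-sided one, which matches the two shaded tails in a plot of the Z distribution.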
23 Hypothesis Testing
Aim: answering questions about the distribution of a variable in the general population, starting from the collected samples.
From populations A and B we have the sample averages $m_A$ and $m_B$.
Hypothesis: the means $\mu_A$ and $\mu_B$ of populations A and B, respectively, are equal ($H_0$, null hypothesis).
Alternatively, and of more interest: $\mu_A \ne \mu_B$ ($H_1$ = not $H_0$).
Result: whether to accept or reject $H_0$, minimizing the type I error.
24 Hypothesis testing: T-test
Example T-test:
1 Assumptions:
  Observations are independent
  Observations come from Gaussian variables with means $\mu_a$ and $\mu_b$ and variances $\sigma_a^2$ and $\sigma_b^2$
  $\sigma_a = \sigma_b$
2 Null hypothesis $H_0$: $\mu_a = \mu_b$
3 Compute the T variable
  $y = \frac{m_a - m_b}{s_p \sqrt{\frac{1}{n_a} + \frac{1}{n_b}}}$, with $s_p = \sqrt{\frac{(n_a - 1)s_a^2 + (n_b - 1)s_b^2}{n_a + n_b - 2}}$
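The pooled statistic can be computed by hand and compared with t.test(..., var.equal = TRUE). A minimal sketch on simulated data; the two groups, their sizes and their parameters are made up for illustration.

```r
set.seed(7)

# Hypothetical samples from two Gaussian populations with equal variance.
a <- rnorm(30, mean = 98.4, sd = 0.7)
b <- rnorm(35, mean = 98.1, sd = 0.7)
na <- length(a); nb <- length(b)

# Pooled standard deviation and T variable, as in the formulas above.
sp <- sqrt(((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2))
t_manual <- (mean(a) - mean(b)) / (sp * sqrt(1 / na + 1 / nb))

# Two-sided p-value from the t distribution with na + nb - 2 degrees of freedom.
p_manual <- 2 * pt(-abs(t_manual), df = na + nb - 2)

# The built-in test with equal variances gives the same numbers.
builtin <- t.test(a, b, var.equal = TRUE)
print(c(t_manual = t_manual, t_builtin = unname(builtin$statistic)))
print(c(p_manual = p_manual, p_builtin = builtin$p.value))
```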
25 Examples in R
One-sided: using the Pima.tr dataset to test $H_0: \mu = 30$ against $H_1: \mu > 30$
    t.test(Pima.tr$bmi, alternative="greater", mu=30, conf.level=0.95)
    ##
    ## One Sample t-test
    ##
    ## data: Pima.tr$bmi
    ## t = , df = 199, p-value = 1.331e-07
    ## alternative hypothesis: true mean is greater than 30
    ## 95 percent confidence interval:
    ## Inf
    ## sample estimates:
    ## mean of x
    ##
Two-sided, two-sample: use the BodyTemperature dataset to test whether there are differences in body temperature between genders
    t.test(Temperature~Gender, data=bt, var.equal=TRUE)
    ##
    ## Two Sample t-test
    ##
    ## data: Temperature by Gender
    ## t = , df = 98, p-value =
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##
    ## sample estimates:
    ## mean in group F mean in group M
    ##
26 Paired t-test
Until now we assumed that the variables in the two groups are independent.
Example: what if the variables are dependent? Is the t.test still valid?
1 Test the effect of a diet on blood pressure
  A subject can have a lower blood pressure before starting the experiment
  There can be differences due to the age of the subjects
How can we avoid the effect of these issues?
A possible solution is to assign the same subjects to each diet group.
  Each subject follows the prescribed diet and we measure the blood pressure; then they are asked to follow another diet for six months and we measure the blood pressure again.
NB Individuals in the two groups are paired
27 Paired t-test Examples
Example: to show the use of the paired version of the t.test we use the study by Levine on the effect of tobacco smoke on platelet function.
Hypothesis: the higher frequency of arterial thrombosis in cigarette smokers could be partially explained by increased platelet aggregation caused by smoking.
Study: in a group of eleven people, he measured platelet aggregation before and after smoking a cigarette.
Testing: test the difference in platelet aggregation, $H_0: \mu = 0$ against $H_1: \mu < 0$
    t.test(pt$before,pt$after, paired=TRUE)
    ##
    ## Paired t-test
    ##
    ## data: pt$before and pt$after
    ## t = , df = 10, p-value =
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##
    ## sample estimates:
    ## mean of the differences
    ##
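A paired t-test is a one-sample t-test on the within-subject differences, and the one-sided alternative $H_1: \mu < 0$ can be requested explicitly. A minimal sketch on simulated measurements; the before and after vectors are hypothetical and are not the Levine data.

```r
set.seed(11)

# Hypothetical measurements on the same 11 subjects.
before <- rnorm(11, mean = 40, sd = 15)
after  <- before + rnorm(11, mean = 10, sd = 8)

# Paired test of H0: mean(before - after) = 0 against H1: mean < 0.
paired <- t.test(before, after, paired = TRUE, alternative = "less")

# Equivalent one-sample test on the differences.
one_sample <- t.test(before - after, mu = 0, alternative = "less")

print(c(t_paired = unname(paired$statistic),
        t_one_sample = unname(one_sample$statistic)))
print(c(p_paired = paired$p.value, p_one_sample = one_sample$p.value))
```

The call shown on the slide omits the alternative argument, so it runs the two-sided version, as its output line "not equal to 0" indicates.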
28 Testing for normality
Everything we have seen so far assumes that the variables are normally distributed. How do we check this?
1: Visual inspection
Test normality for Body Mass Index
    qqnorm(Pima.tr$bmi)
[Figure: Normal Q-Q Plot; x-axis: Theoretical Quantiles; y-axis: Sample Quantiles]
Test normality for Age
    qqnorm(Pima.tr$age)
[Figure: Normal Q-Q Plot; x-axis: Theoretical Quantiles; y-axis: Sample Quantiles]
29 Testing for normality
2: Normality tests
Shapiro-Wilk test for checking normality. It evaluates the null hypothesis that the distribution of a random variable is normal.
Test normality for Body Mass Index
    shapiro.test(Pima.tr$bmi)
    ##
    ## Shapiro-Wilk normality test
    ##
    ## data: Pima.tr$bmi
    ## W = 0.991, p-value =
Test normality for Age
    shapiro.test(Pima.tr$age)
    ##
    ## Shapiro-Wilk normality test
    ##
    ## data: Pima.tr$age
    ## W = , p-value = 1.853e-12
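Both checks can be run together. A minimal sketch, assuming the Pima.tr data frame from the MASS package; the added qqline reference lines and the variable names are illustrative.

```r
library(MASS)  # provides the Pima.tr data frame

# Visual inspection: points close to the reference line suggest normality.
op <- par(mfrow = c(1, 2))
qqnorm(Pima.tr$bmi, main = "Normal Q-Q Plot: BMI"); qqline(Pima.tr$bmi)
qqnorm(Pima.tr$age, main = "Normal Q-Q Plot: Age"); qqline(Pima.tr$age)
par(op)

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw_bmi <- shapiro.test(Pima.tr$bmi)
sw_age <- shapiro.test(Pima.tr$age)

print(c(W_bmi = unname(sw_bmi$statistic), p_bmi = sw_bmi$p.value))
print(c(W_age = unname(sw_age$statistic), p_age = sw_age$p.value))
```

The very small p-value for age leads to rejecting normality for that variable, while BMI stays much closer to the normal reference.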
30 Testing for Homoscedasticity
Null hypothesis $H_0$: the variances of the groups are equal.
Parametric tests:
  Bartlett test: bartlett.test(x, y)
  Levene test (from the car library): leveneTest(y ~ x)
(Non) Parametric tests:
  Fligner-Killeen test: fligner.test(y ~ x)
    bartlett.test(bt$temperature,bt$gender)
    ##
    ## Bartlett test of homogeneity of variances
    ##
    ## data: bt$temperature and bt$gender
    ## Bartlett's K-squared = 2.189, df = 1, p-value =
[Figure: density plot; N = 51]
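The base R tests can be tried on simulated data. A minimal sketch: the values and group vectors are hypothetical stand-ins for the body temperature data, and the Levene test is left out because it needs the car package.

```r
set.seed(30)

# Hypothetical measurements for two groups with the same true variance.
values <- c(rnorm(50, mean = 98.4, sd = 0.7),
            rnorm(50, mean = 98.1, sd = 0.7))
group <- factor(rep(c("F", "M"), each = 50))

# Bartlett test (parametric, sensitive to departures from normality).
bartlett_res <- bartlett.test(values, group)

# Fligner-Killeen test (rank-based, more robust to non-normal data).
fligner_res <- fligner.test(values ~ group)

# Large p-values mean there is no evidence against equal variances.
print(c(bartlett_p = bartlett_res$p.value, fligner_p = fligner_res$p.value))
```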
31 Exercise I
1 We assume that blood pressure has the probability distribution $X \sim N(\mu, \sigma^2)$. Suppose that we did not know $\sigma$ and estimated it using the sample standard deviation s = 6.
  1 Find the standard error of the sample mean as the estimator of the population mean.
  2 Find the 80% CI estimate for $\mu$ based on this sample.
2 Given a distribution with 20 degrees of freedom, compute the confidence interval at 0.99, 0.95 and 0.90 probability.
3 Using the bodytemperature dataset, find the point estimate and the 78% confidence interval estimate for the population means of heart rate and normal body temperature.
4 Suppose that we interviewed a random sample of 2000 people and found that 320 of them smoke regularly. Find the 90% confidence interval for the population proportion of smokers.
5 With the Pima.tr dataset, suppose a BMI greater than 30 denotes obesity. We know obesity and diabetes are related. Suppose the sample size is n = 100 and $\sigma^2 = 6^2$. How can you test whether this population is obese? Write the formulas and test it using R.
6 Use the Pima.tr dataset to find the difference between the sample means of diastolic blood pressure for diabetic and nondiabetic Pima Indian women. Is the difference between the means of diastolic blood pressure statistically significant at the 0.01 level?
7 Answer the above question for the number of pregnancies and BMI.
8 Use the birthwt dataset to examine the relationship between hypertension history (ht) and the risk of having a low-birthweight baby (low).
9 Use the birthwt dataset to examine the effect of smoking on birth weight. Is there any significant difference? What is the p-value?
Regression Analysis and Linear Regression Models
Regression Analysis and Linear Regression Models University of Trento - FBK 2 March, 2015 (UNITN-FBK) Regression Analysis and Linear Regression Models 2 March, 2015 1 / 33 Relationship between numerical
More informationChapter 2 Data Exploration
Chapter 2 Data Exploration 2.1 Data Visualization and Summary Statistics After clearly defining the scientific question we try to answer, selecting a set of representative members from the population of
More informationSTA215 Inference about comparing two populations
STA215 Inference about comparing two populations Al Nosedal. University of Toronto. Summer 2017 June 22, 2017 Two-sample problems The goal of inference is to compare the responses to two treatments or
More informationSTAT 113: Lab 9. Colin Reimer Dawson. Last revised November 10, 2015
STAT 113: Lab 9 Colin Reimer Dawson Last revised November 10, 2015 We will do some of the following together. The exercises with a (*) should be done and turned in as part of HW9. Before we start, let
More informationZ-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown
Z-TEST / Z-STATISTIC: used to test hypotheses about µ when the population standard deviation is known and population distribution is normal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses
More informationPackage distdichor. R topics documented: September 24, Type Package
Type Package Package distdichor September 24, 2018 Title Distributional Method for the Dichotomisation of Continuous Outcomes Version 0.1-1 Author Odile Sauzet Maintainer Odile Sauzet
More informationUnit 5: Estimating with Confidence
Unit 5: Estimating with Confidence Section 8.3 The Practice of Statistics, 4 th edition For AP* STARNES, YATES, MOORE Unit 5 Estimating with Confidence 8.1 8.2 8.3 Confidence Intervals: The Basics Estimating
More informationStat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution
Stat 528 (Autumn 2008) Density Curves and the Normal Distribution Reading: Section 1.3 Density curves An example: GRE scores Measures of center and spread The normal distribution Features of the normal
More informationInterval Estimation. The data set belongs to the MASS package, which has to be pre-loaded into the R workspace prior to use.
Interval Estimation It is a common requirement to efficiently estimate population parameters based on simple random sample data. In the R tutorials of this section, we demonstrate how to compute the estimates.
More informationThe Bootstrap and Jackknife
The Bootstrap and Jackknife Summer 2017 Summer Institutes 249 Bootstrap & Jackknife Motivation In scientific research Interest often focuses upon the estimation of some unknown parameter, θ. The parameter
More informationCondence Intervals about a Single Parameter:
Chapter 9 Condence Intervals about a Single Parameter: 9.1 About a Population Mean, known Denition 9.1.1 A point estimate of a parameter is the value of a statistic that estimates the value of the parameter.
More informationDescriptive Statistics, Standard Deviation and Standard Error
AP Biology Calculations: Descriptive Statistics, Standard Deviation and Standard Error SBI4UP The Scientific Method & Experimental Design Scientific method is used to explore observations and answer questions.
More informationChapter 8. Interval Estimation
Chapter 8 Interval Estimation We know how to get point estimate, so this chapter is really just about how to get the Introduction Move from generating a single point estimate of a parameter to generating
More informationLab #9: ANOVA and TUKEY tests
Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for
More informationUnit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users
BIOSTATS 640 Spring 2018 Review of Introductory Biostatistics STATA solutions Page 1 of 13 Key Comments begin with an * Commands are in bold black I edited the output so that it appears here in blue Unit
More informationContinuous Improvement Toolkit. Normal Distribution. Continuous Improvement Toolkit.
Continuous Improvement Toolkit Normal Distribution The Continuous Improvement Map Managing Risk FMEA Understanding Performance** Check Sheets Data Collection PDPC RAID Log* Risk Analysis* Benchmarking***
More informationQuantitative - One Population
Quantitative - One Population The Quantitative One Population VISA procedures allow the user to perform descriptive and inferential procedures for problems involving one population with quantitative (interval)
More informationTHIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010
THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE
More informationThe Normal Distribution. John McGready, PhD Johns Hopkins University
The Normal Distribution John McGready, PhD Johns Hopkins University General Properties of The Normal Distribution The material in this video is subject to the copyright of the owners of the material and
More informationChapter 2 Modeling Distributions of Data
Chapter 2 Modeling Distributions of Data Section 2.1 Describing Location in a Distribution Describing Location in a Distribution Learning Objectives After this section, you should be able to: FIND and
More informationMAT 110 WORKSHOP. Updated Fall 2018
MAT 110 WORKSHOP Updated Fall 2018 UNIT 3: STATISTICS Introduction Choosing a Sample Simple Random Sample: a set of individuals from the population chosen in a way that every individual has an equal chance
More informationINTRODUCTION to SAS STATISTICAL PACKAGE LAB 3
Topics: Data step Subsetting Concatenation and Merging Reference: Little SAS Book - Chapter 5, Section 3.6 and 2.2 Online documentation Exercise I LAB EXERCISE The following is a lab exercise to give you
More informationIn this computer exercise we will work with the analysis of variance in R. We ll take a look at the following topics:
UPPSALA UNIVERSITY Department of Mathematics Måns Thulin, thulin@math.uu.se Analysis of regression and variance Fall 2011 COMPUTER EXERCISE 2: One-way ANOVA In this computer exercise we will work with
More informationThings you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.
1 2 Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 2. How to construct (in your head!) and interpret confidence intervals.
More informationRegression. Dr. G. Bharadwaja Kumar VIT Chennai
Regression Dr. G. Bharadwaja Kumar VIT Chennai Introduction Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called
More informationCHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers HW 34. Sketch
More informationCHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves
More informationAnalysis of variance - ANOVA
Analysis of variance - ANOVA Based on a book by Julian J. Faraway University of Iceland (UI) Estimation 1 / 50 Anova In ANOVAs all predictors are categorical/qualitative. The original thinking was to try
More information23.2 Normal Distributions
1_ Locker LESSON 23.2 Normal Distributions Common Core Math Standards The student is expected to: S-ID.4 Use the mean and standard deviation of a data set to fit it to a normal distribution and to estimate
More informationChapters 5-6: Statistical Inference Methods
Chapters 5-6: Statistical Inference Methods Chapter 5: Estimation (of population parameters) Ex. Based on GSS data, we re 95% confident that the population mean of the variable LONELY (no. of days in past
More informationPredicting Diabetes using Neural Networks and Randomized Optimization
Predicting Diabetes using Neural Networks and Randomized Optimization Kunal Sharma GTID: ksharma74 CS 4641 Machine Learning Abstract This paper analysis the following randomized optimization techniques
More informationThe Normal Distribution & z-scores
& z-scores Distributions: Who needs them? Why are we interested in distributions? Important link between distributions and probabilities of events If we know the distribution of a set of events, then we
More informationCorrectly Compute Complex Samples Statistics
SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample
More informationThe Normal Distribution & z-scores
& z-scores Distributions: Who needs them? Why are we interested in distributions? Important link between distributions and probabilities of events If we know the distribution of a set of events, then we
More informationNonparametric and Simulation-Based Tests. Stat OSU, Autumn 2018 Dalpiaz
Nonparametric and Simulation-Based Tests Stat 3202 @ OSU, Autumn 2018 Dalpiaz 1 What is Parametric Testing? 2 Warmup #1, Two Sample Test for p 1 p 2 Ohio Issue 1, the Drug and Criminal Justice Policies
More informationEcon 3790: Business and Economics Statistics. Instructor: Yogesh Uppal
Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal Email: yuppal@ysu.edu Chapter 8: Interval Estimation Population Mean: Known Population Mean: Unknown Margin of Error and the Interval
More informationBIOL Gradation of a histogram (a) into the normal curve (b)
(التوزيع الطبيعي ( Distribution Normal (Gaussian) One of the most important distributions in statistics is a continuous distribution called the normal distribution or Gaussian distribution. Consider the
More informationUse of Extreme Value Statistics in Modeling Biometric Systems
Use of Extreme Value Statistics in Modeling Biometric Systems Similarity Scores Two types of matching: Genuine sample Imposter sample Matching scores Enrolled sample 0.95 0.32 Probability Density Decision
More informationCHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves
More informationMachine Learning A WS15/16 1sst KU Version: January 11, b) [1 P] For the probability distribution P (A, B, C, D) with the factorization
Machine Learning A 708.064 WS15/16 1sst KU Version: January 11, 2016 Exercises Problems marked with * are optional. 1 Conditional Independence I [3 P] a) [1 P] For the probability distribution P (A, B,
More informationConfidence Intervals. Dennis Sun Data 301
Dennis Sun Data 301 Statistical Inference probability Population / Box Sample / Data statistics The goal of statistics is to infer the unknown population from the sample. We ve already seen one mode of
More informationSelected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13.
Minitab@Oneonta.Manual: Selected Introductory Statistical and Data Manipulation Procedures Gordon & Johnson 2002 Minitab version 13.0 Minitab@Oneonta.Manual: Selected Introductory Statistical and Data
More informationCpk: What is its Capability? By: Rick Haynes, Master Black Belt Smarter Solutions, Inc.
C: What is its Capability? By: Rick Haynes, Master Black Belt Smarter Solutions, Inc. C is one of many capability metrics that are available. When capability metrics are used, organizations typically provide
More informationIQR = number. summary: largest. = 2. Upper half: Q3 =
Step by step box plot Height in centimeters of players on the 003 Women s Worldd Cup soccer team. 157 1611 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180 Determine the 5 number
More informationChapter 6: DESCRIPTIVE STATISTICS
Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling
More informationLearner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display
CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &
More informationLab 5 - Risk Analysis, Robustness, and Power
Type equation here.biology 458 Biometry Lab 5 - Risk Analysis, Robustness, and Power I. Risk Analysis The process of statistical hypothesis testing involves estimating the probability of making errors
More informationAcquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.
Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting
More informationAssumption 1: Groups of data represent random samples from their respective populations.
Tutorial 6: Comparing Two Groups Assumptions The following methods for comparing two groups are based on several assumptions. The type of test you use will vary based on whether these assumptions are met
More informationWeek 7: The normal distribution and sample means
Week 7: The normal distribution and sample means Goals Visualize properties of the normal distribution. Learning the Tools Understand the Central Limit Theorem. Calculate sampling properties of sample
More informationMAT 142 College Mathematics. Module ST. Statistics. Terri Miller revised July 14, 2015
MAT 142 College Mathematics Statistics Module ST Terri Miller revised July 14, 2015 2 Statistics Data Organization and Visualization Basic Terms. A population is the set of all objects under study, a sample
So, to make comparisons possible, we need to compare them with their respective distributions.
Unit 3 ~ Modeling Distributions of Data 1 ***Section 2.1*** Measures of Relative Standing and Density Curves (ex) Suppose that a professional soccer team has the money to sign one additional player and
LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA
LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to
E-Campus Inferential Statistics - Part 2
E-Campus Inferential Statistics - Part 2 Group Members: James Jones Question 4 - Is there a significant difference in the mean prices of the stores? New Textbook Prices New Price Descriptives 95% Confidence
Stat 427/527: Advanced Data Analysis I
Stat 427/527: Advanced Data Analysis I Chapter 3: Two-Sample Inferences September, 2017 1 / 44 Stat 427/527: Advanced Data Analysis I Chapter 3: Two-Sample Inferences September, 2017 2 / 44 Topics Suppose
The Normal Distribution & z-scores
& z-scores Distributions: Who needs them? Why are we interested in distributions? Important link between distributions and probabilities of events If we know the distribution of a set of events, then we
Soci Statistics for Sociologists
University of North Carolina Chapel Hill Soci708-001 Statistics for Sociologists Fall 2009 Professor François Nielsen Stata Commands for Module 7 Inference for Distributions For further information on
Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data
Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated
Chapter 2: The Normal Distribution
Chapter 2: The Normal Distribution 2.1 Density Curves and the Normal Distributions 2.2 Standard Normal Calculations 1 2 Histogram for Strength of Yarn Bobbins 15.60 16.10 16.60 17.10 17.60 18.10 18.60
Chapter 2 Description of samples and populations. 2.1 Introduction.
Chapter 2 Description of samples and populations. 2.1 Introduction. Statistics = science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that
Table Of Contents. Table Of Contents
Statistics Table Of Contents Table Of Contents Basic Statistics... 7 Basic Statistics Overview... 7 Descriptive Statistics Available for Display or Storage... 8 Display Descriptive Statistics... 9 Store
Exploring Persuasiveness of Just-in-time Motivational Messages for Obesity Management
Exploring Persuasiveness of Just-in-time Motivational Messages for Obesity Management Megha Maheshwari 1, Samir Chatterjee 1, David Drew 2 1 Network Convergence Lab, Claremont Graduate University http://ncl.cgu.edu
One Factor Experiments
One Factor Experiments 20-1 Overview Computation of Effects Estimating Experimental Errors Allocation of Variation ANOVA Table and F-Test Visual Diagnostic Tests Confidence Intervals For Effects Unequal
The problem we have now is called variable selection or perhaps model selection. There are several objectives.
STAT-UB.0103 NOTES for Wednesday 01.APR.04 One of the clues on the library data comes through the VIF values. These VIFs tell you to what extent a predictor is linearly dependent on other predictors. We
Notes on Simulations in SAS Studio
Notes on Simulations in SAS Studio If you are not careful about simulations in SAS Studio, you can run into problems. In particular, SAS Studio has a limited amount of memory that you can use to write
Math 120 Introduction to Statistics Mr. Toner's Lecture Notes 3.1 Measures of Central Tendency
Math 120 Introduction to Statistics Mr. Toner's Lecture Notes 3.1 Measures of Central Tendency midrange = (lowest value + highest value) / 2 The word average: is very ambiguous and can actually refer to the mean,
Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.
Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good
Confidence Intervals: Estimators
Confidence Intervals: Estimators Point Estimate: a specific value that estimates a parameter, e.g., the best estimator of the population mean (μ) is a sample mean. The problem is that there is no way to determine how close
Mixed Effects Models. Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC.
Mixed Effects Models Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC March 6, 2018 Resources for statistical assistance Department of Statistics
CHAPTER 2: Describing Location in a Distribution
CHAPTER 2: Describing Location in a Distribution 2.1 Goals: 1. Compute and use z-scores given the mean and sd 2. Compute and use the p th percentile of an observation 3. Intro to density curves 4. More
Machine Learning A W 1sst KU. b) [1 P] For the probability distribution P (A, B, C, D) with the factorization
Machine Learning A 708.064 13W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence a) [1 P] For the probability distribution P (A, B, C, D) with the factorization P (A, B,
WHO STEPS Surveillance Support Materials. STEPS Epi Info Training Guide
STEPS Epi Info Training Guide Department of Chronic Diseases and Health Promotion World Health Organization 20 Avenue Appia, 1211 Geneva 27, Switzerland For further information: www.who.int/chp/steps WHO
Equivalence Tests for Two Means in a 2x2 Cross-Over Design using Differences
Chapter 520 Equivalence Tests for Two Means in a 2x2 Cross-Over Design using Differences Introduction This procedure calculates power and sample size of statistical tests of equivalence of the means of
Using R. Liang Peng Georgia Institute of Technology January 2005
Using R Liang Peng Georgia Institute of Technology January 2005 1. Introduction Quote from http://www.r-project.org/about.html: R is a language and environment for statistical computing and graphics. It
Resampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016
Resampling Methods Levi Waldron, CUNY School of Public Health July 13, 2016 Outline and introduction Objectives: prediction or inference? Cross-validation Bootstrap Permutation Test Monte Carlo Simulation
MICROSOFT EXCEL BASIC FORMATTING
MICROSOFT EXCEL BASIC FORMATTING To create a new workbook: [Start All Programs Microsoft Office - Microsoft Excel 2010] To rename a sheet(1): Select the sheet whose tab you want to rename (the selected
Advanced Statistical Computing Week 2: Monte Carlo Study of Statistical Procedures
Advanced Statistical Computing Week 2: Monte Carlo Study of Statistical Procedures Aad van der Vaart Fall 2012 Contents Sampling distribution Estimators Tests Computing a p-value Permutation Tests 2 Sampling
SD 372 Pattern Recognition
SD 372 Pattern Recognition Lab 2: Model Estimation and Discriminant Functions 1 Purpose This lab examines the areas of statistical model estimation and classifier aggregation. Model estimation will be
STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.
STAT 2607 REVIEW PROBLEMS 1 REMINDER: On the final exam 1. Word problems must be answered in words of the problem. 2. "Test" means that you must carry out a formal hypothesis testing procedure with H0,
Nonparametric and Simulation-Based Tests. STAT OSU, Spring 2019 Dalpiaz
Nonparametric and Simulation-Based Tests STAT 3202 @ OSU, Spring 2019 Dalpiaz 1 What is Parametric Testing? 2 Warmup #1, Two Sample Test for p 1 p 2 Ohio Issue 1, the Drug and Criminal Justice Policies
Regression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:
Regression Lab The data set cholesterol.txt available on your thumb drive contains the following variables: Field Descriptions ID: Subject ID sex: Sex: 0 = male, = female age: Age in years chol: Serum
CHAPTER 6. The Normal Probability Distribution
The Normal Probability Distribution CHAPTER 6 The normal probability distribution is the most widely used distribution in statistics as many statistical procedures are built around it. The central limit
Section 2.3: Simple Linear Regression: Predictions and Inference
Section 2.3: Simple Linear Regression: Predictions and Inference Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.4 1 Simple
Confidence Interval of a Proportion
Confidence Interval of a Proportion FPP 20-21 Using the sample to learn about the box Box models and CLT assume we know the contents of the box (the population). In real-world problems, we do not. In random
Distributions of random variables
Chapter 3 Distributions of random variables 3.1 Normal distribution Among all the distributions we see in practice, one is overwhelmingly the most common. The symmetric, unimodal, bell curve is ubiquitous
The ctest Package. January 3, 2000
The ctest Package January 3, 2000 R objects documented: bartlett.test, binom.test, cor.test
This is a good time to refresh your memory on double-integration. We will be using this skill in the upcoming lectures.
Chapter 5: JOINT PROBABILITY DISTRIBUTIONS Part 1: Sections 5-1.1 to 5-1.4 For both discrete and continuous random variables we will discuss the following... Joint Distributions (for two or more r.v.'s)
Data Statistics Population. Census Sample Correlation... Statistical & Practical Significance. Qualitative Data Discrete Data Continuous Data
Data Statistics Population Census Sample Correlation... Voluntary Response Sample Statistical & Practical Significance Quantitative Data Qualitative Data Discrete Data Continuous Data Fewer vs Less Ratio
for statistical analyses
Using for statistical analyses Robert Bauer Warnemünde, 05/16/2012 Day 6 - Agenda: non-parametric alternatives to t-test and ANOVA (incl. post hoc tests) Wilcoxon Rank Sum/Mann-Whitney U-Test Kruskal-Wallis
Chapter Two: Descriptive Methods 1/50
Chapter Two: Descriptive Methods 1/50 2.1 Introduction 2/50 2.1 Introduction We previously said that descriptive statistics is made up of various techniques used to summarize the information contained
i2itracks Population Health Analytics (ipha) Custom Reports & Dashboards
i2itracks Population Health Analytics (ipha) Custom Reports & Dashboards 377 Riverside Drive, Suite 300 Franklin, TN 37064 707-575-7100 www.i2ipophealth.com Table of Contents Creating ipha Custom Reports
PART III APPLICATIONS
S. Vieira PART III APPLICATIONS Fuzz IEEE 2013, Hyderabad India 1 Applications Finance Value at Risk estimation based on a PFS model for density forecast of a continuous response variable conditional on
Box-Cox Transformation for Simple Linear Regression
Chapter 192 Box-Cox Transformation for Simple Linear Regression Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a dataset containing a pair of variables that are
BIOS: 4120 Lab 11 Answers April 3-4, 2018
BIOS: 4120 Lab 11 Answers April 3-4, 2018 In today's lab we will briefly revisit Fisher's Exact Test, discuss confidence intervals for odds ratios, and review for quiz 3. Note: The material in the first
Week 4: Simple Linear Regression II
Week 4: Simple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Algebraic properties
Introduction to hypothesis testing
Introduction to hypothesis testing Mark Johnson Macquarie University Sydney, Australia February 27, 2017 1 / 38 Outline Introduction Hypothesis tests and confidence intervals Classical hypothesis tests
For our example, we will look at the following factors and factor levels.
In order to review the calculations that are used to generate the Analysis of Variance, we will use the statapult example. By adjusting various settings on the statapult, you are able to throw the ball
Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242
Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Creation & Description of a Data Set * 4 Levels of Measurement * Nominal, ordinal, interval, ratio * Variable Types
Chapter 6. THE NORMAL DISTRIBUTION
Chapter 6. THE NORMAL DISTRIBUTION Introducing Normally Distributed Variables The distributions of some variables like thickness of the eggshell, serum cholesterol concentration in blood, white blood cells