Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.


 Marilyn Ross
 5 years ago
 Views:
Transcription
1 Resources for statistical assistance Quantitative covariates and regression analysis Carolyn Taylor Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC January 24, 2017 Department of Statistics at UBC: SOS Program  An hour of free consulting to UBC graduate students. Funded by the Provost and VP Research Office. STAT Stat grad students taking this course offer free statistical advice. Fall semester every academic year. Short Term Consulting Service  Advice from Stat grad students. Feeforservice on small projects (less than 15 hours). Hourly Projects  ASDa professional staff. Feeforservice consulting. Outline Methods for predicting continuous outcomes The language of statistics is not as standardized as you might like! Correlation analysis Simple linear regression Different terms can be used for essentially the same model Statisticians consider regression as the general approach Multiple linear regression ANOVA and ANCOVA
2 Correlation Measures the direction and strength of relationship between numeric variables ρ = 0 ρ = 0.4 ρ = 0.8 Data format Rectangular data, fits in a rectangle where each row represents a sampled unit and each column is a characteristic observed on that unit Example: Each row represents a different car model and the columns are various measured features ρ = 0.9 ρ = 0.95 ρ = 1 ## mpg cyl wt am gear ## Mazda RX ## Mazda RX4 Wag ## Datsun ## Hornet 4 Drive ## Hornet Sportabout ## Valiant ## Duster ## Merc 240D ## Merc ## Merc Association between car weight and mileage 1974 Motor Trends car data (32 different models of cars) Mileage (mpg) Summary statistics from the data ## Mean Variance SD Correlation ## mpg ## wt We can test the correlation between mpg and weight ## Estimate t value p value ## Correlation e10 ## 95% confidence interval: ## Squared Correlation = Weight (in 1000lbs)
3 Data (2 numeric variables) Estimating the parameters from the sample From a random sample of n independent units from a population we measure 2 different variables (data) Each variable has a mean and a variance and the two variables have a correlation Let s call our variables X and Y {X, Y } = {x i, y i } for i = 1,..., n Correlation between x i and x j is 0 Correlation between y i and y j is 0 Correlation between x i and y i is ρ The population parameters: The means are denoted by µ x and µ y The variances are denoted by σ 2 x and σ 2 y The covariance of the two variables is denoted by σ xy The correlation is ρ = σxy σ xσ y is a number between 1 and 1 Formulas for the sample estimates of the population parameters: ˆµ x = x = n 1 x i /n ˆσ 2 x = s 2 x = n 1 (x i x) 2 /(n 1) ˆσ xy = s xy = n 1 (x i x)(y i ȳ)/(n 1) ˆρ xy = r xy = s xy /(s x s y ) Correlation analysis By correlation we usually mean the Pearson productmoment correlation coefficient. It measures the linear relationship between X and Y. ρ is estimated by r = sxy s xs y Hypothesis of interest is usually H 0 : ρ = 0 versus H 1 : ρ 0 and we use the test statistic r n 2 1 r 2 t n 2 We can also use Fisher s Transformation (F(r) = r )) to test if ρ equals any value including 0 and to construct confidence intervals for ρ ln( 1+r Neither method is overly sensitive to the normality assumption. While 1 r 1, < F(r) < Spearman s rank correlation coefficient Is a rank version of Pearson s correlation computed by converting the data for each variable into a rank before computing the correlation. It uses same test statistics as Pearson s correlation. Is closely related to the Pearson s correlation coefficient except the relationship need not be linear in nature Is robust to outliers If the relationship is linear without any extreme points, Spearman s and Pearson s will be similar
4 Two prediction scenarios The prediction line Mileage is random, Weight is not Weight is random, Mileage is not Mileage is random, Weight is not Weight is random, Mileage is not Mileage (mpg) Weight (in 1000lbs) Mileage (mpg) Weight (in 1000lbs) y Weight (in 1000lbs) Mileage (mpg) Weight (in 1000lbs) Mileage (mpg) Determining the prediction line Minimize the error (residuals), the vertical distance between the observed values and the predicted line Fitted values, residuals and mean squared error We use the estimated slope and intercept with each x i to compute a fitted value for y i ŷ i = b 0 + b 1 x i We compute the residual by taking the difference between the observed data and the fitted value ˆε i = y i ŷ i We estimate the variance of the residuals Minimize the sum of the squared errors, method called Least Squares Two parameters determine the line: slope and intercept x ˆσ 2 ε = s 2 ε = ˆε 2 /(n 2) Note: The average of the residuals is 0. We use n 2 because we estimated 2 parameters (b 0 and b 1 ).
5 Least Squares estimates for the slope and intercept The slope is primarily determined by the correlation between Y and X. It s magnitude is limited by the ratio of the standard deviation of Y and X. s slope: b 1 = r y xy s x = sxy sx 2 We estimate the intercept by picking a point on the line then using our estimate of the slope, solve for the intercept. { x, ȳ} is always on the line. intercept: b 0 = ȳ b 1 x With some math it can be shown that both b 0 and b 1 can be computed as a weighted average of Y. This means both will follow a normal distribution if n is large enough by the central limit theorem of statistics (CLT). Example revisited Summary statistics from the data ## Mean SD Correlation ## mpg ## wt Regression coefficients predicting mpg by weight ## (Intercept) wt ## ## * / = ## * = Estimated line to predict mpg from weight t value Pr(> t ) ## (Intercept) e19 ## wt e10 ## R squared = Estimated line to predict weight from mgp t value Pr(> t ) ## (Intercept) e18 ## mpg e10 Compare the 3 results ## Estimate t value p value ## Correlation e10 t value Pr(> t ) ## wt e10 t value Pr(> t ) ## mpg e10 All three have the same t value and p value ## R squared = 0.753
6 Fitted values and residuals ŷ i = b 0 + b 1 x i Fit i = ( 5.34wt i ) ˆε i = y i ŷ i Res i = wt i Fit i ## wt mpg Fit Res ## Mazda RX ## Mazda RX4 Wag ## Datsun ## Hornet 4 Drive ## Hornet Sportabout ## MSE = The coefficient of determination Better prediction with regression line compared to red mean line? Mileage (mpg) Weight (in 1000lbs) Variation line doesn t explain (sum square of errors to line) = 278 Total variation in y (sum square of errors to ȳ) = 1126 % variation NOT explained by the line = 278 / 1126 = % variation explained by the line = = The coefficient of determination Tells how much better at predicting Y the regression model is compared to just using ȳ proportion of the variance of Y explained by the model depends on the size of the errors (residuals) For simple linear regression R 2 = ρ 2 xy The R 2 alone doesn t tell the whole story: The more variables in the model the higher the R 2 BUT also the higher the variability in the predictions made from the model (due to having to estimate more coefficients for the model) With large sample sizes, the R 2 value could still be low even with highly significant model coefficients With small sample sizes, the R 2 value could still be high even with insignificant model coefficients The error of the fitted value Decreases as x i moves closer to x fitted values at the endpoints depend greatly on the slope of the line fitted values at the middle are relatively insensitive to the slope Decreases as the variance in x increases since the slope is determined by the endpoints The distribution of the fitted value can be approximated by a normal distribution because of the CLT. This means we can compute accurate confidence bounds for the fitted line.
7 95% Confidence Interval (green) for the fitted line y y Prediction Interval High Intercept High Slope Low Slope The prediction estimate is the same as the fitted value The uncertainty (variability) in the prediction includes the uncertainty in the fitted line (model) plus the variability in the y data There is a 95% chance that the interval at a given x i covers the true mean for y i x 95% Prediction Interval (blue dashed) Confidence Interval for mean yi Prediction Interval for yi There is a 95% chance that the interval at a given x i covers the true y i x Simple Linear Regression A line that summarizes the relationship between two quantitative variables and is used to make predictions Regression assumes that Y is random but X is not Model the mean of Y as a function of X µ yi = β 0 + β 1 x i Each observation is described as being some distance (error ε i ) from the estimated mean where ε i N(0, σ 2 ) Assumptions: normality is not critical homoscedasticity (constant variance) is important independent errors is critical
8 Diagnostics for Regression Example Revisited Before fitting the model plot y i versus x i Does the spread of Y depend on X? What does the relationship look like? Are there any extreme X or Y values? Mileage (mpg) Raw Data Residuals Chrysler Imperial Residuals vs Fitted Fiat 128 Toyota Corolla Plot the residuals versus the quantiles of a normal distribution Weight (in 1000lbs) Fitted values Plot residuals (ε i ) versus fitted values (ŷ i ). You should see an uncorrelated oval of data. If the data can be ordered over time or space, check the residuals for indications of serial correlation Examine the influence of each observation using Cook s distance Standardized residuals Normal Q Q Theoretical Quantiles Fiat 128 Chrysler Toyota Imperial Corolla Cook's distance Cook's distance Chrysler Imperial Toyota Corolla Fiat Obs. number Heteroscedasticity Serial Correlation
9 Effect of Outliers or high leverage points y y y y Regession with 2 predictors Motor Trends car data (32 different models of cars) mpg x x x x hp wt Regression predicting mpg by hp and weight (wt) Summary statistics from the data ## Mean Variance SD ## mpg ## hp ## wt ## correlations between the variables ## mpg hp wt ## mpg ## hp ## wt t value Pr(> t ) ## (Intercept) e20 ## wt e06 ## hp e03 ## R squared = ## wt hp mpg Fit Res ## Mazda RX ## Mazda RX4 Wag ## Datsun ## MSE = 6.726
10 Model Diagnostics Multiple Linear Regression (2 predictors) Residuals Chrysler Imperial Residuals vs Fitted Fiat Toyota 128Corolla Standardized residuals Normal Q Q Toyota Fiat 128 Chrysler Corolla Imperial The assumptions are the same but the equation (model) contains more parameters: µ Y = β 0 + β 1 X 1 + β 2 X 2 computing the estimates of the coefficients is more complicated but can be easily expressed in matrix form (not presented here) Cook's distance Fitted values Cook's distance Chrysler Imperial Maserati Bora Toyota Corolla 0 5 Obs. number Theoretical Quantiles The key concern is how much correlation exists between X 1 and X 2 since severe multicollinearity can increase the variance of the coefficient estimates can complicate or prevent the identification of an optimal set of explanatory variables for a statistical model assess using the variance inflation factor (VIF), 1/(1 ρ 2 x 1,x 2 ) if VIF >= 5, coefficients are poorly estimated and one should be wary of their pvalues doesn t affect how well the model fits, a model with severe multicollinearity can produce great predictions Example 1 Random data Y, X 1, X 2 Example 2 Random data Y, X 1, X 2 True model: Y = X 1 with ρ x1,x 2 = 0 ## X ## X ## X ## X SE multiplier = 1.0 True model: Y = X 1 with ρ x1,x 2 = 0.5 ## X ## X ## X ## X SE multiplier = / = 1.15
11 Example 3 Random data Y, X 1, X 2 Example 4 Random Data Y, X 1, X 2 True model: Y = X 1 with ρ x1,x 2 = 0.9 ## X ## X ## X ## X SE multiplier = / = 2.28 True model: Y = X 1 /2 + X 2 /2 with ρ x1,x 2 = 0 ## X ## X ## X ## X Example 5 Random data Y, X 1, X 2 Example 6 Random Data Y, X 1, X 2 True model: Y = X 1 /2 + X 2 /2 with ρ x1,x 2 = 0.5 ## X ## X ## X ## X True model: Y = X 1 /2 + X 2 /2 with ρ x1,x 2 = 0.9 ## X ## X ## X ## X
12 Dealing with multicollinearity What makes a regession linear? Possible solutions: standardize the predictors (subtract the mean) remove highly correlated predictors linearly combine predictors (add them together) do nothing if the pvalues aren t important If you can live with less precise coefficient estimates, or a model that has a high Rsquared but few significant predictors, doing nothing can be the correct decision because it won t impact the fit Linear regression refers to the coefficients in the model. It has nothing to do with the way X is included in the model or the relationship between X and Y. It is the form of β: Linear regression models: µ y = β 0 + β 1 X + β 2 X 2 µ y = β 0 + β 1 X 1 + β 2 X 2 + β 3 X 1 X 2 + β 4 X β 5X 2 2 Not a linear regression model because of β 2 1 : µ y = β 0 + β 1 X 1 + β 2 1 X 2 This is a (linear) Regression The observed versus the model fit µ y = 2 + x/2 + x 2 /5 Linear Regression Fit Linear Regression Fit y Observed Fitted x
13 Regression with k predictors Like the two predictor case, we can compute all the parameter estimates easily using matrix algebra. The model includes more terms but otherwise the issues are similar to the two predictor case. We need to look at the correlation matrix of all the predictors to determine if colinearity is a problem. While a predictor may not be highly correlated to any one other predictor, it may be highly correlated with a set of other predictors. If some variables are excluded, all the other predictors in the model will cooperate to act as surrogates for the excluded variables depending on the strength of the correlation between the covariates If more than two predictors are included in the model then the VIF is unique for each predictor and is computed by regressing each predictor on the other predictors in the model and looking at the coefficient of determination ANOVA as a regression model (categorical predictors) In order to fit an ANOVA model as a regression model, the p level factor variable is converted into p 1 indicator variables Example: The 3 level cylinder predictor is converted into 2 indicator variables, one for 6 cylinders and one for 8 cylinders ## mpg cyl ## Mazda RX ## Mazda RX4 Wag ## Datsun ## Hornet 4 Drive ## Hornet Sportabout ## mpg cyl6 cyl8 ## Mazda RX ## Mazda RX4 Wag ## Datsun ## Hornet 4 Drive ## Hornet Sportabout Example mpg ## cyl N Mean SD ## ## ## t value Pr(> t ) ## (Intercept) e22 ## cyl e04 ## cyl e # of cylinders
14 What about the constant variance assumption? Residuals Y Y ANCOVA continuous and categorical predictor Residuals vs Fitted Toyota Corolla Fiat 128 Volvo 142E Standardized residuals Volvo 142E Normal Q Q Toyota Corolla Fiat 128 This is the common names for a model that contains both categorical and continuous predictors. We include categorical predictors by converting them into indicator variables then proceed with a multiple regression. Only difference is this time the model also includes a continuous predictor X Fitted values Theoretical Quantiles Example: Treatment versus placebo with a covariate Cook's distance Cook's distance Toyota Corolla Fiat 128 Volvo 142E 0 5 Obs. number µ Y = β 0 + β 1 I(trt) + β 2 X b 0 = intercept for the placebo group b 1 = change in intercept from placebo to treatment b 2 = common slope for the covariate This model fits parallel lines µ y = 5 + 5I(trt) + X Regression estimates ## Placebo ## TRTPLB ## Intercept ## Slope ## Placebo ## TRTPLB ## Slope Placebo Treatment X
15 Model with different slopes µ y = 5 + 5I(treat) + X XI(treat) The slopes of our continuous predictor need not be the same for each level of our categorical predictor. We can model this with an interaction term. µ Y = β 0 + β 1 I(trt) + β 2 X + β 3 XI(trt) b 0 = intercept for the placebo group b 1 = change in intercept from placebo to treatment b 2 = slope for the placebo group b 3 = change in slope from placebo to treatment Lines are no longer parallel meaning the difference between the groups changes with the value of X Y X Regression estimates ## Intercept ## Slope ## Placebo ## TRTPLB ## Slope ## Int(Placebo) ## Int(TRTPLB) ## Slp(Placebo) ## Slp(TRTPLB) Sample Size and Power calculations α (or Significance) is the chance of accepting what is false. It does not depend on n, the sample size, but is chosen. Want it to be small and is commonly set at β (or Power) is the chance of accepting what is true. It depends on n and increases as n increases. To calculate Power, you need to specify each parameter in the model (based on previous studies, educated guesses or some other source). The more complicated the model the more parameters you need to specify. Java Applets to calculate sample size and power G*Power for Mac or Windows
16 Questions? Department of Statistics at UBC: SOS Program  An hour of free consulting to UBC graduate students. Funded by the Provost and VP Research Office. STAT Stat grad students taking this course offer free statistical advice. Fall semester every academic year. Short Term Consulting Service  Advice from Stat grad students. Feeforservice on small projects (less than 15 hours). Hourly Projects  ASDa professional staff. Feeforservice consulting.
Regression Models Course Project Vincent MARIN 28 juillet 2016
Regression Models Course Project Vincent MARIN 28 juillet 2016 Executive Summary "Is an automatic or manual transmission better for MPG" "Quantify the MPG difference between automatic and manual transmissions"
More informationMixed Effects Models. Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC.
Mixed Effects Models Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC March 6, 2018 Resources for statistical assistance Department of Statistics
More informationModelling Proportions and Count Data
Modelling Proportions and Count Data Rick White May 4, 2016 Outline Analysis of Count Data Binary Data Analysis Categorical Data Analysis Generalized Linear Models Questions Types of Data Continuous data:
More informationModelling Proportions and Count Data
Modelling Proportions and Count Data Rick White May 5, 2015 Outline Analysis of Count Data Binary Data Analysis Categorical Data Analysis Generalized Linear Models Questions Types of Data Continuous data:
More informationRegression Analysis and Linear Regression Models
Regression Analysis and Linear Regression Models University of Trento  FBK 2 March, 2015 (UNITNFBK) Regression Analysis and Linear Regression Models 2 March, 2015 1 / 33 Relationship between numerical
More informationIntroduction for heatmap3 package
Introduction for heatmap3 package Shilin Zhao April 6, 2015 Contents 1 Example 1 2 Highlights 4 3 Usage 5 1 Example Simulate a gene expression data set with 40 probes and 25 samples. These samples are
More informationChapter 4: Analyzing Bivariate Data with Fathom
Chapter 4: Analyzing Bivariate Data with Fathom Summary: Building from ideas introduced in Chapter 3, teachers continue to analyze automobile data using Fathom to look for relationships between two quantitative
More informationWill Landau. January 24, 2013
Iowa State University January 24, 2013 Iowa State University January 24, 2013 1 / 30 Outline Iowa State University January 24, 2013 2 / 30 statistics: the use of plots and numerical summaries to describe
More informationTHIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010
THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE
More informationLecture 7: Linear Regression (continued)
Lecture 7: Linear Regression (continued) Reading: Chapter 3 STATS 2: Data mining and analysis Jonathan Taylor, 10/8 Slide credits: Sergio Bacallado 1 / 14 Potential issues in linear regression 1. Interactions
More informationQuick Guide for pairheatmap Package
Quick Guide for pairheatmap Package Xiaoyong Sun February 7, 01 Contents McDermott Center for Human Growth & Development The University of Texas Southwestern Medical Center Dallas, TX 75390, USA 1 Introduction
More informationBasic R QMMA. Emanuele Taufer. 2/19/2018 Basic R (1)
Basic R QMMA Emanuele Taufer file:///c:/users/emanuele.taufer/google%20drive/2%20corsi/5%20qmma%20%20mim/0%20classes/13_basic_r.html#(1) 1/21 Preliminary R is case sensitive: a is not the same as A.
More informationIntroduction to R: Day 2 September 20, 2017
Introduction to R: Day 2 September 20, 2017 Outline RStudio projects Base R graphics plotting one or two continuous variables customizable elements of plots saving plots to a file Create a new project
More informationApplied Regression Modeling: A Business Approach
i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming
More informationCDAA No. 4  Part Two  Multiple Regression  Initial Data Screening
CDAA No. 4  Part Two  Multiple Regression  Initial Data Screening Variables Entered/Removed b Variables Entered GPA in other high school, test, Math test, GPA, High school math GPA a Variables Removed
More informationSTAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.
STAT 2607 REVIEW PROBLEMS 1 REMINDER: On the final exam 1. Word problems must be answered in words of the problem. 2. "Test" means that you must carry out a formal hypothesis testing procedure with H0,
More informationMultiple Regression White paper
+44 (0) 333 666 7366 Multiple Regression White paper A tool to determine the impact in analysing the effectiveness of advertising spend. Multiple Regression In order to establish if the advertising mechanisms
More informationMinitab 17 commands Prepared by Jeffrey S. Simonoff
Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save
More informationStudy Guide. Module 1. Key Terms
Study Guide Module 1 Key Terms general linear model dummy variable multiple regression model ANOVA model ANCOVA model confounding variable squared multiple correlation adjusted squared multiple correlation
More informationST512. Fall Quarter, Exam 1. Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false.
ST512 Fall Quarter, 2005 Exam 1 Name: Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false. 1. (42 points) A random sample of n = 30 NBA basketball
More informationLearner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display
CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &
More informationData Mining. ❷Chapter 2 Basic Statistics. Asso.Prof.Dr. Xiaodong Zhu. Business School, University of Shanghai for Science & Technology
❷Chapter 2 Basic Statistics Business School, University of Shanghai for Science & Technology 20162017 2nd Semester, Spring2017 Contents of chapter 1 1 recording data using computers 2 3 4 5 6 some famous
More informationTHE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA. Forrest W. Young & Carla M. Bann
Forrest W. Young & Carla M. Bann THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA CB 3270 DAVIE HALL, CHAPEL HILL N.C., USA 275993270 VISUAL STATISTICS PROJECT WWW.VISUALSTATS.ORG
More informationRegression. Dr. G. Bharadwaja Kumar VIT Chennai
Regression Dr. G. Bharadwaja Kumar VIT Chennai Introduction Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called
More informationDescriptive Statistics, Standard Deviation and Standard Error
AP Biology Calculations: Descriptive Statistics, Standard Deviation and Standard Error SBI4UP The Scientific Method & Experimental Design Scientific method is used to explore observations and answer questions.
More informationProduct Catalog. AcaStat. Software
Product Catalog AcaStat Software AcaStat AcaStat is an inexpensive and easytouse data analysis tool. Easily create data files or import data from spreadsheets or delimited text files. Run crosstabulations,
More informationOne Factor Experiments
One Factor Experiments 201 Overview Computation of Effects Estimating Experimental Errors Allocation of Variation ANOVA Table and FTest Visual Diagnostic Tests Confidence Intervals For Effects Unequal
More informationRSM SplitPlot Designs & Diagnostics Solve RealWorld Problems
RSM SplitPlot Designs & Diagnostics Solve RealWorld Problems Shari Kraber Pat Whitcomb Martin Bezener StatEase, Inc. StatEase, Inc. StatEase, Inc. 221 E. Hennepin Ave. 221 E. Hennepin Ave. 221 E.
More informationApplied Regression Modeling: A Business Approach
i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easytouse
More informationSLStats.notebook. January 12, Statistics:
Statistics: 1 2 3 Ways to display data: 4 generic arithmetic mean sample 14A: Opener, #3,4 (Vocabulary, histograms, frequency tables, stem and leaf) 14B.1: #3,5,8,9,11,12,14,15,16 (Mean, median, mode,
More informationStatistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte
Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pretreatment 1. Normalization 2. Centering,
More informationWeek 4: Simple Linear Regression III
Week 4: Simple Linear Regression III Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Goodness of
More informationSection E. Measuring the Strength of A Linear Association
This work is licensed under a Creative Commons AttributionNonCommercialShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationCrossvalidation and the Bootstrap
Crossvalidation and the Bootstrap In the section we discuss two resampling methods: crossvalidation and the bootstrap. These methods refit a model of interest to samples formed from the training set,
More informationTHE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)
THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533 MIDTERM EXAMINATION: October 14, 2005 Instructor: Val LeMay Time: 50 minutes 40 Marks FRST 430 50 Marks FRST 533 (extra questions) This examination
More informationTwoStage Least Squares
Chapter 316 TwoStage Least Squares Introduction This procedure calculates the twostage least squares (2SLS) estimate. This method is used fit models that include instrumental variables. 2SLS includes
More informationIntroductory Applied Statistics: A Variable Approach TI Manual
Introductory Applied Statistics: A Variable Approach TI Manual John Gabrosek and Paul Stephenson Department of Statistics Grand Valley State University Allendale, MI USA Version 1.1 August 2014 2 Copyright
More informationProblem set for Week 7 Linear models: Linear regression, multiple linear regression, ANOVA, ANCOVA
ECL 290 Statistical Models in Ecology using R Problem set for Week 7 Linear models: Linear regression, multiple linear regression, ANOVA, ANCOVA Datasets in this problem set adapted from those provided
More informationCorrelation and Regression
How are X and Y Related Correlation and Regression Causation (regression) p(y X=) Correlation p(y=, X=) http://kcd.com/552/ Correlation Can be Induced b Man Mechanisms # of Pups 1 2 3 4 5 6 7 8 Inbreeding
More informationFurther Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables
Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables
More informationBluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition
Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Online Learning Centre Technology StepbyStep  Minitab Minitab is a statistical software application originally created
More informationSTA 570 Spring Lecture 5 Tuesday, Feb 1
STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row
More informationRobust Linear Regression (Passing Bablok MedianSlope)
Chapter 314 Robust Linear Regression (Passing Bablok MedianSlope) Introduction This procedure performs robust linear regression estimation using the PassingBablok (1988) medianslope algorithm. Their
More informationCHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY
23 CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 3.1 DESIGN OF EXPERIMENTS Design of experiments is a systematic approach for investigation of a system or process. A series
More informationGoals of the Lecture. SOC6078 Advanced Statistics: 9. Generalized Additive Models. Limitations of the Multiple Nonparametric Models (2)
SOC6078 Advanced Statistics: 9. Generalized Additive Models Robert Andersen Department of Sociology University of Toronto Goals of the Lecture Introduce Additive Models Explain how they extend from simple
More informationOutline. Topic 16  Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model
Topic 16  Other Remedies Ridge Regression Robust Regression Regression Trees Outline  Fall 2013 Piecewise Linear Model Bootstrapping Topic 16 2 Ridge Regression Modification of least squares that addresses
More informationSecondary 1 Vocabulary Cards and Word Walls Revised: June 27, 2012
Secondary 1 Vocabulary Cards and Word Walls Revised: June 27, 2012 Important Notes for Teachers: The vocabulary cards in this file match the Common Core, the math curriculum adopted by the Utah State Board
More informationCCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 912
Tool 1: Standards for Mathematical ent: Interpreting Functions CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 912 Name of Reviewer School/District Date Name of Curriculum Materials:
More informationReference
Leaning diary: research methodology 30.11.2017 Name: Juriaan Zandvliet Student number: 291380 (1) a short description of each topic of the course, (2) desciption of possible examples or exercises done
More informationFor our example, we will look at the following factors and factor levels.
In order to review the calculations that are used to generate the Analysis of Variance, we will use the statapult example. By adjusting various settings on the statapult, you are able to throw the ball
More informationIQR = number. summary: largest. = 2. Upper half: Q3 =
Step by step box plot Height in centimeters of players on the 003 Women s Worldd Cup soccer team. 157 1611 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180 Determine the 5 number
More informationChapter 5snow year.notebook March 15, 2018
Chapter 5: Statistical Reasoning Section 5.1 Exploring Data Measures of central tendency (Mean, Median and Mode) attempt to describe a set of data by identifying the central position within a set of data
More informationPsychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding
Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding In the previous lecture we learned how to incorporate a categorical research factor into a MLR model by using
More information8.NS.1 8.NS.2. 8.EE.7.a 8.EE.4 8.EE.5 8.EE.6
Standard 8.NS.1 8.NS.2 8.EE.1 8.EE.2 8.EE.3 8.EE.4 8.EE.5 8.EE.6 8.EE.7 8.EE.7.a Jackson County Core Curriculum Collaborative (JC4) 8th Grade Math Learning Targets in Student Friendly Language I can identify
More informationBIOL 458 BIOMETRY Lab 10  Multiple Regression
BIOL 458 BIOMETRY Lab 10  Multiple Regression Many problems in science involve the analysis of multivariable data sets. For data sets in which there is a single continuous dependent variable, but several
More informationStatCalc User Manual. Version 9 for Mac and Windows. Copyright 2018, AcaStat Software. All rights Reserved.
StatCalc User Manual Version 9 for Mac and Windows Copyright 2018, AcaStat Software. All rights Reserved. http://www.acastat.com Table of Contents Introduction... 4 Getting Help... 4 Uninstalling StatCalc...
More informationFrequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS
ABSTRACT Paper 19382018 Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS Robert M. Lucas, Robert M. Lucas Consulting, Fort Collins, CO, USA There is confusion
More informationData Management  50%
Exam 1: SAS Big Data Preparation, Statistics, and Visual Exploration Data Management  50% Navigate within the Data Management Studio Interface Register a new QKB Create and connect to a repository Define
More informationRegression on SAT Scores of 374 High Schools and Kmeans on Clustering Schools
Regression on SAT Scores of 374 High Schools and Kmeans on Clustering Schools Abstract In this project, we study 374 public high schools in New York City. The project seeks to use regression techniques
More informationFathom Dynamic Data TM Version 2 Specifications
Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other
More informationBivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin  La Crosse Updated: October 04, 2017
Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin  La Crosse Updated: October 4, 217 PDF file location: http://www.murraylax.org/rtutorials/regression_intro.pdf HTML file location:
More informationSection 2.3: Simple Linear Regression: Predictions and Inference
Section 2.3: Simple Linear Regression: Predictions and Inference Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.4 1 Simple
More informationThis is a simple example of how the lasso regression model works.
1 of 29 5/25/2016 11:26 AM This is a simple example of how the lasso regression model works. save.image("backup.rdata") rm(list=ls()) library("glmnet") ## Loading required package: Matrix ## ## Attaching
More informationMissing Data Analysis for the Employee Dataset
Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup For our analysis goals we would like to do: Y X N (X, 2 I) and then interpret the coefficients
More informationSTATISTICS (STAT) Statistics (STAT) 1
Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).
More informationUsing Multivariate Adaptive Regression Splines (MARS ) to enhance Generalised Linear Models. Inna Kolyshkina PriceWaterhouseCoopers
Using Multivariate Adaptive Regression Splines (MARS ) to enhance Generalised Linear Models. Inna Kolyshkina PriceWaterhouseCoopers Why enhance GLM? Shortcomings of the linear modelling approach. GLM being
More informationAnalysis of variance  ANOVA
Analysis of variance  ANOVA Based on a book by Julian J. Faraway University of Iceland (UI) Estimation 1 / 50 Anova In ANOVAs all predictors are categorical/qualitative. The original thinking was to try
More informationBivariate (Simple) Regression Analysis
Revised July 2018 Bivariate (Simple) Regression Analysis This set of notes shows how to use Stata to estimate a simple (twovariable) regression equation. It assumes that you have set Stata up on your
More informationIAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram
IAT 355 Visual Analytics Data and Statistical Models Lyn Bartram Exploring data Example: US Census People # of people in group Year # 1850 2000 (every decade) Age # 0 90+ Sex (Gender) # Male, female Marital
More informationAn introduction to SPSS
An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible
More informationLecture 1: Statistical Reasoning 2. Lecture 1. Simple Regression, An Overview, and Simple Linear Regression
Lecture Simple Regression, An Overview, and Simple Linear Regression Learning Objectives In this set of lectures we will develop a framework for simple linear, logistic, and Cox Proportional Hazards Regression
More informationCREATING THE ANALYSIS
Chapter 14 Multiple Regression Chapter Table of Contents CREATING THE ANALYSIS...214 ModelInformation...217 SummaryofFit...217 AnalysisofVariance...217 TypeIIITests...218 ParameterEstimates...218 ResidualsbyPredictedPlot...219
More informationData Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski
Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...
More informationSPSS INSTRUCTION CHAPTER 9
SPSS INSTRUCTION CHAPTER 9 Chapter 9 does no more than introduce the repeatedmeasures ANOVA, the MANOVA, and the ANCOVA, and discriminant analysis. But, you can likely envision how complicated it can
More informationSection 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business
Section 3.2: Multiple Linear Regression II Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Multiple Linear Regression: Inference and Understanding We can answer new questions
More informationHeteroskedasticity and Homoskedasticity, and HomoskedasticityOnly Standard Errors
Heteroskedasticity and Homoskedasticity, and HomoskedasticityOnly Standard Errors (Section 5.4) What? Consequences of homoskedasticity Implication for computing standard errors What do these two terms
More informationWeek 5: Multiple Linear Regression II
Week 5: Multiple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Adjusted R
More informationLecture 20: Outliers and Influential Points
Lecture 20: Outliers and Influential Points An outlier is a point with a large residual. An influential point is a point that has a large impact on the regression. Surprisingly, these are not the same
More informationSection 2.1: Intro to Simple Linear Regression & Least Squares
Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:
More informationChapters 56: Statistical Inference Methods
Chapters 56: Statistical Inference Methods Chapter 5: Estimation (of population parameters) Ex. Based on GSS data, we re 95% confident that the population mean of the variable LONELY (no. of days in past
More informationCorrelation. January 12, 2019
Correlation January 12, 2019 Contents Correlations The Scattterplot The Pearson correlation The computational rawscore formula Survey data Fun facts about r Sensitivity to outliers Spearman rankorder
More informationLecture 25: Review I
Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,
More information14.2 The Regression Equation
14.2 The Regression Equation Tom Lewis Fall Term 2009 Tom Lewis () 14.2 The Regression Equation Fall Term 2009 1 / 12 Outline 1 Exact and inexact linear relationships 2 Fitting lines to data 3 Formulas
More informationThe Truth behind PGA Tour Player Scores
The Truth behind PGA Tour Player Scores Sukhyun Sean Park, Dong Kyun Kim, Ilsung Lee May 7, 2016 Abstract The main aim of this project is to analyze the variation in a dataset that is obtained from the
More informationSASEG 9B Regression Assumptions
SASEG 9B Regression Assumptions (Fall 2015) Sources (adapted with permission) T. P. Cronan, Jeff Mullins, Ron Freeze, and David E. Douglas Course and Classroom Notes Enterprise Systems, Sam M. Walton
More informationMiddle School Math Course 2
Middle School Math Course 2 Correlation of the ALEKS course Middle School Math Course 2 to the Indiana Academic Standards for Mathematics Grade 7 (2014) 1: NUMBER SENSE = ALEKS course topic that addresses
More informationStat 241 Review Problems
1 Even when things are running smoothly, 5% of the parts produced by a certain manufacturing process are defective. a) If you select parts at random, what is the probability that none of them are defective?
More informationCHAPTER 2: SAMPLING AND DATA
CHAPTER 2: SAMPLING AND DATA This presentation is based on material and graphs from Open Stax and is copyrighted by Open Stax and Georgia Highlands College. OUTLINE 2.1 StemandLeaf Graphs (Stemplots),
More informationExploring and Understanding Data Using R.
Exploring and Understanding Data Using R. Loading the data into an R data frame: variable
More informationWeek 4: Simple Linear Regression II
Week 4: Simple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Algebraic properties
More informationLab 5  Risk Analysis, Robustness, and Power
Type equation here.biology 458 Biometry Lab 5  Risk Analysis, Robustness, and Power I. Risk Analysis The process of statistical hypothesis testing involves estimating the probability of making errors
More informationMeasures of Dispersion
Lesson 7.6 Objectives Find the variance of a set of data. Calculate standard deviation for a set of data. Read data from a normal curve. Estimate the area under a curve. Variance Measures of Dispersion
More informationAP Statistics Summer Assignment:
AP Statistics Summer Assignment: Read the following and use the information to help answer your summer assignment questions. You will be responsible for knowing all of the information contained in this
More informationNuts and Bolts Research Methods Symposium
Organizing Your Data Jenny Holcombe, PhD UT College of Medicine Nuts & Bolts Conference August 16, 3013 Topics to Discuss: Types of Variables Constructing a Variable Code Book Developing Excel Spreadsheets
More informationError Analysis, Statistics and Graphing
Error Analysis, Statistics and Graphing This semester, most of labs we require us to calculate a numerical answer based on the data we obtain. A hard question to answer in most cases is how good is your
More informationObjects, Class and Attributes
Objects, Class and Attributes Introduction to objects classes and attributes Practically speaking everything you encounter in R is an object. R has a few different classes of objects. I will talk mainly
More informationRecall the expression for the minimum significant difference (w) used in the Tukey fixedrange method for means separation:
Topic 11. Unbalanced Designs [ST&D section 9.6, page 219; chapter 18] 11.1 Definition of missing data Accidents often result in loss of data. Crops are destroyed in some plots, plants and animals die,
More informationLab #7  More on Regression in R Econ 224 September 18th, 2018
Lab #7  More on Regression in R Econ 224 September 18th, 2018 Robust Standard Errors Your reading assignment from Chapter 3 of ISL briefly discussed two ways that the standard regression inference formulas
More informationGeneralized least squares (GLS) estimates of the level2 coefficients,
Contents 1 Conceptual and Statistical Background for TwoLevel Models...7 1.1 The general twolevel model... 7 1.1.1 Level1 model... 8 1.1.2 Level2 model... 8 1.2 Parameter estimation... 9 1.3 Empirical
More informationUNIT 1A EXPLORING UNIVARIATE DATA
A.P. STATISTICS E. Villarreal Lincoln HS Math Department UNIT 1A EXPLORING UNIVARIATE DATA LESSON 1: TYPES OF DATA Here is a list of important terms that we must understand as we begin our study of statistics
More information