OLS Assumptions and Goodness of Fit
- Lisa Fisher
1 OLS Assumptions and Goodness of Fit
2 A little warm-up. Assume I am a poor free-throw shooter. To win a contest I can choose to attempt one of the following two challenges: A. Make three out of four free throws. B. Make six out of eight. Which should I choose? Why?
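One way to explore the warm-up numerically is a binomial calculation. The sketch below assumes that "make three out of four" means at least three, that shots are independent, and that a "poor" shooter has a success rate of 0.3; all three are assumptions, not part of the slide.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k successes in n independent attempts with success rate p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p = 0.3  # hypothetical success rate for a "poor" shooter
prob_a = prob_at_least(3, 4, p)  # challenge A: at least 3 of 4
prob_b = prob_at_least(6, 8, p)  # challenge B: at least 6 of 8
print(f"A: {prob_a:.4f}  B: {prob_b:.4f}")
```

Under these assumptions the shorter challenge is more likely to succeed, because a small sample leaves more room for a lucky streak. Changing `p` to a value above 0.75 shows how the comparison shifts for a good shooter.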
3 Gauss-Markov Assumptions. These are the full ideal conditions. If these are met, OLS is BLUE, i.e. efficient and unbiased. Your data will rarely meet these conditions. This class helps you understand what to do about this.
4 Pop Quiz Take out a sheet of paper and write down all the Gauss-Markov assumptions.
5 Assumptions of Classical Linear Regression. A1: The regression model is linear in parameters. It may not be linear in variables. $Y_i = B_1 + B_2 X_i$
6 Assumptions of Classical Linear Regression. A1: The regression model is linear in parameters. It may not be linear in variables. $Y_i = B_1 + B_2 X_i + B_3 X_i^2$
7 Assumptions of Classical Linear Regression. A2: X values are fixed in repeated sampling. Think about an experiment with different dosages assigned to different groups. We can also do this if X values vary in repeated sampling, as long as $\mathrm{cov}(X_i, u_i) = 0$. See chapter 13 if you're curious about the details.
8 What if we violate linearity? If you have a non-linear relationship between X and Y and you don't include an X-squared or X-cubed term, what is the problem? A true relationship may exist between X and Y that you fail to detect.
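A small sketch of this failure, using made-up noiseless data where Y depends on X only through X squared. The data and variable names are illustrative assumptions. A model that is linear in the variable finds a slope of zero, while a model with an X-squared term (still linear in parameters) recovers the relationship.

```python
import numpy as np

x = np.linspace(-3, 3, 61)  # values symmetric around zero
y = x**2                    # exact quadratic relationship, no noise

# Model linear in the variable: y = b1 + b2*x
X_lin = np.column_stack([np.ones_like(x), x])
b_lin = np.linalg.lstsq(X_lin, y, rcond=None)[0]

# Model with a squared term: y = b1 + b2*x + b3*x^2 (linear in parameters)
X_quad = np.column_stack([np.ones_like(x), x, x**2])
b_quad = np.linalg.lstsq(X_quad, y, rcond=None)[0]

print("slope on x, linear model:   ", round(b_lin[1], 6))
print("coef on x^2, quadratic model:", round(b_quad[2], 6))
```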
9 A2: X values are fixed in repeated sampling. Think about an experiment with different dosages assigned to different groups. We can also do this if X values vary in repeated sampling, as long as $\mathrm{cov}(X_i, u_i) = 0$. Think about this as requiring random sampling.
10 Expected Value of Errors is Zero. A3: Mean value of $u_i$ = 0. $E[u_i \mid X_i] = 0$; $E[u_i] = 0$ if X is fixed (non-stochastic). It's OK to have big errors, but we can't be wrong systematically. We call that bias.
11 What if the expected value of the errors is not zero? This would indicate specification error: omitted variable bias, for example.
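Omitted variable bias can be illustrated with a short simulation. Everything here is an assumed setup: the true model is y = 1 + 2*x1 + 3*x2 + u, and x2 is correlated with x1 (x2 = 0.8*x1 + noise). Leaving x2 out pushes its effect into the coefficient on x1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)  # x2 is correlated with x1
u = rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + u         # assumed true model

def ols(X, y):
    """Least-squares coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_long = ols(np.column_stack([ones, x1, x2]), y)  # correctly specified
b_short = ols(np.column_stack([ones, x1]), y)     # x2 omitted

print("coef on x1, full model:", round(b_long[1], 3))   # close to 2
print("coef on x1, x2 omitted:", round(b_short[1], 3))  # close to 2 + 3*0.8 = 4.4
```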
12 Assumptions of Classical Linear Regression. A4: Homoskedasticity, or constant variance of $u_i$.
13 What happens if we violate homoskedasticity? This is called heteroskedasticity. Model uncertainty varies from observation to observation. Often true in cross-sectional data due to omitted variable bias. See chapter 13 if you're curious about the details of heteroskedasticity.
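The practical consequence for standard errors can be sketched with simulated data whose error spread grows with |x|. The data-generating process is an assumption, and the robust formula used is the White (HC0) sandwich estimator, a technique the slide does not itself name.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
u = x * rng.normal(size=n)  # error variance depends on x (heteroskedastic)
y = 1 + 2 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical standard errors assume one common error variance
sigma2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("slope SE, classical:", round(se_classical[1], 4))
print("slope SE, HC0:      ", round(se_robust[1], 4))
```

In this setup the classical formula understates the uncertainty in the slope, while the coefficient estimate itself stays close to the true value.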
14 No Autocorrelation. A5: No autocorrelation between disturbances: $\mathrm{cov}(u_i, u_j \mid X_i, X_j) = 0$ for $i \neq j$. The observations are sampled independently.
15 What if we have autocorrelation? More or less always the case in panel data, so we have panel-corrected standard errors, etc. Also sometimes the case if we sample multiple children from the same family, or multiple regions from the same country, etc. Clustered standard errors.
16 Degrees of Freedom. A6: The number of observations n must be greater than the number of parameters to be estimated (n > number of explanatory variables). The difference between n and the number of estimated parameters is the degrees of freedom.
17 Not Enough Degrees of Freedom. If you don't have enough degrees of freedom, you can't estimate your parameters. The smaller your sample size, the less precise your estimates (i.e. large standard errors). Unable to reject the null hypothesis of no difference even if the true effect is large.
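A minimal sketch of why estimation fails without enough observations, assuming a made-up design with 3 observations and 5 regressors: the matrix X'X is rank deficient, so the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 3, 5                  # fewer observations than parameters
X = rng.normal(size=(n, k))  # hypothetical design matrix

rank = np.linalg.matrix_rank(X.T @ X)
print(f"X'X is {k}x{k} but has rank {rank}")
```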
18 Variation but no Outliers A7: X must vary, but there must not be any outliers
19 What if there are outliers? Our model works too hard to fit these values, giving them effectively too much weight. This is the squared errors problem.
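The pull of a single outlier can be seen in a toy data set (the numbers are invented for illustration): ten points lie exactly on y = 2x, then the last y value is replaced with 100.

```python
import numpy as np

x = np.arange(10.0)
y = 2 * x                              # points exactly on a line with slope 2
slope_clean = np.polyfit(x, y, 1)[0]   # polyfit returns [slope, intercept]

y_out = y.copy()
y_out[-1] = 100.0                      # one extreme value at the largest x
slope_out = np.polyfit(x, y_out, 1)[0]

print("slope without outlier:", round(slope_clean, 3))
print("slope with outlier:   ", round(slope_out, 3))
```

Because the fit minimizes squared errors, one point far from the line at an extreme X value moves the slope much more than a typical observation would.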
20 Correct specification. A8: Regression model is correctly specified. The correct variables are included. We have the correct functional form. Correct assumptions about the probability distributions of $Y_i$, $X_i$ and $u_i$.
21 No perfect multicollinearity. A9: With multiple regression, we add the assumption of no perfect multicollinearity. No x variable is an exact linear combination of the others; in particular, the absolute correlation between any two x variables is < 1.
22 No perfect multicollinearity. With perfect collinearity, we have to drop one x variable to even estimate our betas. With near-perfect collinearity, variance is inflated, but estimates are not biased.
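Both cases can be sketched with simulated regressors (the data and names are assumptions). With perfect collinearity the design matrix loses a rank. With near-perfect collinearity the variance inflation factor, VIF = 1/(1 - R²), becomes very large; with only two regressors R² is the squared correlation between them.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)

# Perfect collinearity: x2 is an exact multiple of x1
X_perfect = np.column_stack([np.ones(n), x1, 2 * x1])
rank_perfect = np.linalg.matrix_rank(X_perfect)  # 2, not 3

# Near-perfect collinearity: x2 is x1 plus a little noise
x2_near = x1 + 0.05 * rng.normal(size=n)
r = np.corrcoef(x1, x2_near)[0, 1]
vif = 1 / (1 - r**2)

print("rank with perfect collinearity:", rank_perfect)
print("VIF with near-perfect collinearity:", round(vif, 1))
```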
23 Gauss-Markov Theorem. When all those assumptions hold, OLS is BLUE: Best Linear Unbiased Estimator. Best means least variance (most efficient). Unbiased means: $E[\hat{B}_2] = B_2$
24 How well does it fit? To measure reduction in errors we need a benchmark for comparison. The mean of the dependent variable is a relevant and tractable benchmark for comparing predictions. The mean of Y represents our best guess at the value of $Y_i$ absent other information.
25 Sums of Squares. This gives us the following 'sum-of-squares' measures: Total Variation (TSS) = Explained Variation (ESS) + Unexplained Variation (USS)
26 How well does our model perform? R squared statistic = (TSS - USS)/TSS = ESS/TSS. Bounded between 0 and 1. Higher values indicate a better fit.
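The sums of squares and R squared can be worked through on a tiny invented data set. The five data points are an assumption chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

slope, intercept = np.polyfit(x, y, 1)  # OLS line: slope 0.6, intercept 2.2
y_hat = intercept + slope * x

tss = np.sum((y - y.mean()) ** 2)      # total variation around the mean of Y
ess = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the line
uss = np.sum((y - y_hat) ** 2)         # unexplained (residual) variation

r2 = ess / tss                         # equals (tss - uss) / tss
print(f"TSS={tss:.2f} ESS={ess:.2f} USS={uss:.2f} R2={r2:.2f}")
```

Here TSS = 6, ESS = 3.6 and USS = 2.4, so the line explains 60 percent of the variation around the mean.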
27 Questions How do the fitted values of Y change if we multiply X by a constant? What if we add a constant to X?
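These questions can be checked empirically. The sketch below reuses a small invented data set and refits the line after multiplying X by 10 and after adding 10 to X; the constants are arbitrary assumptions.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def fit(x, y):
    """Return (slope, intercept, fitted values) of the OLS line."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept, intercept + slope * x

s0, i0, fitted0 = fit(x, y)
s_mul, i_mul, fitted_mul = fit(10 * x, y)  # X multiplied by a constant
s_add, i_add, fitted_add = fit(x + 10, y)  # constant added to X

print("same fitted values after scaling: ", np.allclose(fitted0, fitted_mul))
print("same fitted values after shifting:", np.allclose(fitted0, fitted_add))
print("slopes:", round(s0, 3), round(s_mul, 3), round(s_add, 3))
```

In this example the coefficients adjust (the slope is divided by 10, or the intercept shifts), and the fitted values of Y stay the same.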
28 Why do we have an error term? The error term includes the effect of all X variables not in our model that still affect Y. Parsimony, intrinsic randomness of humans, vague theory, measurement error, wrong functional form.
29 What does an error term imply? If we run our project multiple times, we will estimate a slightly different regression line every time.
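A quick simulation makes this concrete. The true model (intercept 1, slope 2, standard normal errors) and the sample size are assumptions. Each simulated "project" draws a fresh sample and re-estimates the slope.

```python
import numpy as np

rng = np.random.default_rng(4)
true_slope = 2.0   # assumed population slope
n, reps = 100, 2000

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = 1 + true_slope * x + rng.normal(size=n)  # new errors every run
    slopes[r] = np.polyfit(x, y, 1)[0]

print("mean of estimated slopes:", round(slopes.mean(), 3))
print("std of estimated slopes: ", round(slopes.std(), 3))
```

The estimates differ from run to run, but in this setup they center on the true slope, which is what unbiasedness means.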
30 How do we know if our test statistic is any good? OLS is an estimator. It calculates the slope of the sample regression line (i.e. the SRF). It gives us a test statistic and an associated p-value. What does that mean? IF AND ONLY IF the assumptions of OLS are met, and the true slope of the population regression line is 0, there is an x percent chance we would estimate a slope at least this large in our sample regression.
31 Can we test that? YES! First, estimate our regression line, and calculate the critical value (p = .05). Second, let's make there be no relationship: shuffle the data. Third, re-estimate the regression line. Is the slope steeper than our critical value? Repeat steps 2 and 3 10,000 times. How often should the slope of the regression line be greater than the critical value I've
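The shuffle procedure can be sketched as follows. Several details are assumptions: the simulated data, a two-sided test, and comparing the t-statistic of each shuffled fit against an approximate 5 percent critical value for 98 degrees of freedom (1.984) rather than comparing raw slopes.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 100, 10_000
t_crit = 1.984  # approx. two-sided 5% critical value, t with n - 2 = 98 df

x = rng.normal(size=n)
y = 1 + 0.5 * x + rng.normal(size=n)  # hypothetical original data

def slope_t(x, y):
    """t-statistic for the slope of a simple OLS regression of y on x."""
    xc = x - x.mean()
    sxx = xc @ xc
    slope = xc @ y / sxx
    resid = y - y.mean() - slope * xc
    se = np.sqrt(resid @ resid / (len(x) - 2) / sxx)
    return slope / se

rejections = 0
for _ in range(reps):
    y_shuffled = rng.permutation(y)  # shuffling breaks any x-y relationship
    if abs(slope_t(x, y_shuffled)) > t_crit:
        rejections += 1

type1_rate = rejections / reps
print("share of shuffles that reject the null:", type1_rate)
```

If the test behaves as advertised, the share of shuffled data sets that cross the critical value should be close to 5 percent.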
32 What does that tell us? It tells us our Type 1 error rate: how often we would reject the null when we shouldn't (i.e. when the null is true). What about Type 2 errors? How often would we fail to reject the null when the true value of beta is actually B1? To calculate that, we need the sample size, the variance of x and y, and all the OLS assumptions. Or we can simulate it. Validity and Power
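Simulating power might look like the sketch below. The sample size, error variance, candidate slopes, and the approximate critical value 2.011 (two-sided 5 percent, 48 degrees of freedom) are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

def slope_t(x, y):
    """t-statistic for the slope of a simple OLS regression of y on x."""
    xc = x - x.mean()
    sxx = xc @ xc
    slope = xc @ y / sxx
    resid = y - y.mean() - slope * xc
    se = np.sqrt(resid @ resid / (len(x) - 2) / sxx)
    return slope / se

def rejection_rate(true_slope, n=50, reps=2000, t_crit=2.011):
    """Share of simulated samples in which the null of zero slope is rejected."""
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = true_slope * x + rng.normal(size=n)
        hits += abs(slope_t(x, y)) > t_crit
    return hits / reps

rate_null = rejection_rate(0.0)   # Type 1 error rate (null is true)
rate_small = rejection_rate(0.3)  # power against a modest true slope
rate_large = rejection_rate(0.6)  # power against a larger true slope
print(rate_null, rate_small, rate_large)
```

The Type 2 error rate for a given true slope is one minus the corresponding rejection rate.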