- Doris Bryan
- 5 years ago
STAT-UB.0103 NOTES for Wednesday 01.APR.04

One of the clues on the library data comes through the VIF values. These VIFs tell you to what extent a predictor is linearly dependent on the other predictors. We like these to be close to 1, and we certainly get upset when they exceed 10. VIF stands for variance inflation factor. The lowest possible value is 1.0, which is considered good. High values represent trouble, in that a variable with a high VIF is likely to be strongly linearly dependent on the other independent variables.

A related concept is the TOLERANCE, which is reported by some other software. The quantities are related as

$$\text{VIF} = \frac{1}{\text{TOLERANCE}}$$

TOLERANCE values are between 0 (bad) and 1 (good). For a problem in which we regress Y on (A, B, C, D), the TOLERANCE for variable A is defined as

$$\text{TOLERANCE}(A) = 1 - R^2(\text{regression of } A \text{ on } \{B, C, D\})$$

Similarly, the TOLERANCE for variable B is

$$\text{TOLERANCE}(B) = 1 - R^2(\text{regression of } B \text{ on } \{A, C, D\})$$

Thus, we've got a good regression with some problems. The likely possibility is that there are some strong dependencies among A, B, C, and D, since each has a large VIF number. You should observe that the VIF numbers are computed without reference to the dependent variable Y. That is, these VIFs are comments only about the independent variables.

The problem we have now is called variable selection or perhaps model selection. There are several objectives.

(1) We'd like to get a model in which all the variables play an active role. That is, each variable has a small p-value.
(2) We'd like to get a model that's very good for prediction.
(3) We'd like to get a model in which we can figure out the roles of the independent variables in determining Y.

Objectives (1) and (2) are achievable. Objective (3) cannot always be achieved.
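To make the TOLERANCE and VIF definitions concrete, here is a minimal sketch in Python with NumPy. It regresses each predictor on all the others, exactly as in the TOLERANCE(A) formula, and never touches Y. The function names and the simulated data (D built as nearly A + B) are illustrative assumptions, not the library data.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an ordinary least-squares fit of y on X, with an intercept added."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def tolerance_and_vif(X):
    """For each column j, regress it on the other columns.
    TOLERANCE_j = 1 - R_j^2 and VIF_j = 1 / TOLERANCE_j.
    The dependent variable Y is not involved at all."""
    k = X.shape[1]
    tol = np.empty(k)
    for j in range(k):
        others = np.delete(X, j, axis=1)
        tol[j] = 1.0 - r_squared(X[:, j], others)
    return tol, 1.0 / tol

# Hypothetical data: A, B, C independent; D is almost exactly A + B.
rng = np.random.default_rng(0)
n = 200
A, B, C = rng.normal(size=(3, n))
D = A + B + 0.1 * rng.normal(size=n)
X = np.column_stack([A, B, C, D])

tol, vif = tolerance_and_vif(X)
for name, t, v in zip("ABCD", tol, vif):
    print(f"{name}: TOLERANCE = {t:.3f}  VIF = {v:.1f}")
```

In a run like this, A, B, and D should all show VIFs far above 10 (each is nearly a linear combination of the other two), while C stays near 1.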
There are several strategies that can be used. One is to remove independent variables one at a time until the VIF values for the remaining variables are all acceptable. This works sometimes, but many people like to use an automated method. Minitab employs two automated methods, stepwise regression and best subsets regression.

In terms of selecting which variables to use, let's look at the automated procedures best subsets and stepwise regression. Let's start with best subsets. This method will list the best (in terms of R²) model for each number of independent variables to be tried. The best subsets option will by default list the two best models for each level of complexity. You will likely find it easier to just list one. You can fix this by Stat > Regression > Best Subsets > Options > Models of each size to print: (then select 1).

In the best subsets regression, the program will show the best model(s) at each level of complexity. Note this: quality of fit is measured by R², R²adj, and sε. Also given is the Cp statistic, discussed just below. The number of models of each level of complexity to be shown is specified by the user. The Minitab default is 2, but most users just want to see the single best model.

The Cp statistic is frequently used as a measure of fit of any particular model. The p here is 1 + the number of independent variables used in the test model. (The phrase "test model" refers to the set of independent variables currently being tried out.) The statistic is defined as

$$C_p = \frac{\text{Residual SUM of squares for test model}}{\text{Residual MEAN square using all the independent variables}} - (n - 2p)$$

It always happens that the model with all the variables has Cp = p exactly. For test models, a good fit is indicated by Cp ≈ p, with Cp < p even better. It should be pointed out that Cp measures the quality of a test model relative to the model that uses all available independent variables.
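The best subsets idea and the Cp formula can be sketched together in a few lines of Python. This is a minimal illustration, assuming an exhaustive search over all subsets (feasible only for a handful of predictors) and keeping only the single best model per size, as with the "select 1" option. The function names and the simulated data, in which only x1 and x2 matter, are hypothetical; this is not Minitab's own algorithm.

```python
from itertools import combinations
import numpy as np

def fit_sse(y, X):
    """Residual sum of squares from OLS of y on X plus an intercept."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    return float(resid @ resid)

def best_subsets(y, X, names):
    """For each size, keep the subset with the highest R^2 (lowest SSE)
    and report p, R^2, R^2 adj, s_eps and Cp."""
    n, k = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    mse_full = fit_sse(y, X) / (n - (k + 1))  # residual MEAN square, all predictors
    rows = []
    for size in range(1, k + 1):
        best = None
        for cols in combinations(range(k), size):
            sse = fit_sse(y, X[:, list(cols)])
            if best is None or sse < best[1]:
                best = (cols, sse)
        cols, sse = best
        p = size + 1                           # 1 + number of predictors
        r2 = 1 - sse / sst
        r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
        s_eps = (sse / (n - p)) ** 0.5
        cp = sse / mse_full - (n - 2 * p)
        rows.append(([names[c] for c in cols], p, r2, r2_adj, s_eps, cp))
    return rows

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 5))
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

rows = best_subsets(y, X, ["x1", "x2", "x3", "x4", "x5"])
for vars_, p, r2, r2a, s, cp in rows:
    print(f"p={p}  Cp={cp:7.2f}  R2={r2:.3f}  R2adj={r2a:.3f}  s={s:.3f}  {vars_}")
```

The last row (all five predictors) comes out with Cp = p exactly, as stated above, because its SSE divided by its own mean square is n − p.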
It could easily happen that one has a very bad model even while using all the available independent variables. Here's a crude layout:
Cp >> p (much bigger): test model is definitely not acceptable
Cp > p: judgment call
Cp ≈ p: test model is acceptable
Cp < p: test model is excellent

Why does this work? Here's a short digression.

Case 1. The proposed test model is adequate (at least compared to the model that uses all available predictors). That is, the variables that appear among the K "all predictors" in the full model but do not appear in the proposed test model are irrelevant. In this case, the residual mean square in the analysis of variance table would still estimate σ². Thus,

Residual mean square in test model estimates σ²

or (what is the same thing)

$$\frac{\text{Residual sum of squares in test model}}{n - p} \text{ estimates } \sigma^2$$

We can rewrite the statement above as

Residual sum of squares in test model estimates (n − p)σ².

This final statement corresponds to the numerator of Cp. Thus, when the proposed test model is adequate,

the numerator of Cp estimates (n − p)σ²
the denominator of Cp estimates σ²
overall, Cp estimates

$$\frac{(n - p)\,\sigma^2}{\sigma^2} - (n - 2p) = p$$

Thus Cp should be close to p for any adequate test model.
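The Case 1 argument can be checked by simulation. The sketch below is an illustration under assumed conditions (n = 50, K = 4 candidate predictors, only x1 truly matters, σ = 1, all simulated). It averages Cp over many data sets for an adequate test model {x1} and an inadequate one {x2}, both with p = 2. The adequate model's average should land close to p, while the inadequate one should be far larger.

```python
import numpy as np

def sse(y, X):
    """Residual sum of squares from OLS of y on X plus an intercept."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    r = y - Xc @ beta
    return float(r @ r)

def simulate_cp(n=50, k=4, reps=2000, seed=0):
    rng = np.random.default_rng(seed)
    p = 2  # each test model: one predictor plus the intercept
    cp_adequate, cp_inadequate = [], []
    for _ in range(reps):
        X = rng.normal(size=(n, k))
        y = 1 + 2 * X[:, 0] + rng.normal(size=n)   # only x1 matters; sigma = 1
        mse_full = sse(y, X) / (n - (k + 1))       # denominator of Cp
        cp_adequate.append(sse(y, X[:, [0]]) / mse_full - (n - 2 * p))
        cp_inadequate.append(sse(y, X[:, [1]]) / mse_full - (n - 2 * p))
    return np.mean(cp_adequate), np.mean(cp_inadequate)

good, bad = simulate_cp()
print(f"adequate test model:   average Cp = {good:.2f}  (p = 2)")
print(f"inadequate test model: average Cp = {bad:.2f}")
```

The inadequate model leaves the x1 signal in its residuals, which inflates the numerator, just as Case 2 below describes.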
Case 2. The proposed test model is not adequate. That is, the variables that appear among the K "all predictors" in the full model but do not appear in the proposed test model are relevant to the relationship with the dependent variable. In this case, the residual mean square in the analysis of variance table estimates something larger than σ². Why? The residual sum of squares is

$$\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2$$

where Ŷi is the ith fitted value. If the proposed test model is not adequate, then Ŷi is far away from its best value, and this sum of squares is inflated. Following the logic of Case 1, we see now that Cp will estimate something larger than p.

Thus,
for adequate test models, Cp estimates p
for inadequate test models, Cp estimates something larger than p

In order to select a model, we choose the simplest model (smallest number of predictors) for which Cp is near p. Of course, using the "near p" statement requires some judgment. Finding Cp < p is certainly an indication of a good fit.

You are not morally compelled to obey the dictates of the Cp statistic. It's a very helpful suggestion. You might find the R² column or the sε column more compelling.

This will be illustrated (first) with the library data and then with the low birth weight example. In the latter, we will keep together the race indicators. These are presented on separate handouts.

We looked at best subsets as a method for screening potential regression models. This will list the best model at each level of complexity and leave us with a relatively easy selection job. Stepwise regression pushes this one step further, and actually selects a model for us. Stepwise regression, as performed by Minitab, will start with an empty model (no predictors) and then sequentially add variables to the model as long as it seems that the quality of fit is being improved.
Actually, there is a formal inferential-type step involved in this, requiring that any variable added to the model must do so with a t statistic whose p-value is less than or equal to some threshold, called alpha-to-enter, set by default to 0.15. Stepwise regression can even remove a variable from a regression model, if it fails the t
criterion; the corresponding threshold on the p-value, called alpha-to-remove, is also set by default to 0.15. Here we'll recommend that these values be set to 0.05, so that the stepwise regression decisions will be more likely to agree with decisions made through best subsets regression. Here is the set of Minitab commands:

Stat > Regression > Stepwise > Methods > Use alpha values

You might wish to reset the alpha values from 0.15 to 0.05. This tends to make stepwise easier to compare to best subsets.

Before we do this with the low birth weight data set, let's look at the original fitted equation with all variables. This was

The regression equation is
BWT = AGE LWT SMOKE - 49 PTL HT UI FTV AfAmer OtherRace

The interpretation on −489 (for AfAmer) must be that, all other things equal, AfAmer babies would be predicted to be 489 g lighter than White babies. Here the White indicator was not used. Similarly, the OtherRace babies would be predicted to be 357 g lighter than White babies.

If you had run this with indicators for White and AfAmer (but not OtherRace), you'd get this:

The regression equation is
BWT = AGE LWT SMOKE - 49 PTL HT UI FTV White AfAmer

Now the coefficient on White is 357. This says that, everything else equal, White babies would be predicted to be 357 g heavier than OtherRace babies. Note also that AfAmer babies would be predicted to be [(−133) − 357] = −490, that is, 490 g lighter than White babies. There are internal consistencies here. Thus, in using a procedure like Stepwise or Best Subsets, we should keep these indicator sets together!

Here now is the Best Subsets run on the low birth weight data. This is on a separate handout.
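The stepwise logic described above (add a variable only if its t-test p-value is at most alpha-to-enter, drop one whose p-value exceeds alpha-to-remove) can be sketched as follows. This is a simplified forward-then-backward loop in Python, assuming NumPy and SciPy (`scipy.stats.t.sf` for the two-sided p-value). It is not Minitab's exact implementation, the function names are hypothetical, and it treats each column separately, so it does not keep indicator sets together the way the notes recommend.

```python
import numpy as np
from scipy import stats

def ols_pvalues(y, X):
    """Two-sided t-test p-values for each slope in OLS of y on X plus an intercept."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    df = n - (k + 1)
    mse = resid @ resid / df
    cov = mse * np.linalg.inv(Xc.T @ Xc)
    t = beta[1:] / np.sqrt(np.diag(cov)[1:])
    return 2 * stats.t.sf(np.abs(t), df)

def stepwise(y, X, names, alpha_enter=0.05, alpha_remove=0.05, max_steps=100):
    selected = []                       # start from the empty model
    for _ in range(max_steps):          # max_steps guards against cycling
        changed = False
        # Forward step: among excluded variables, find the smallest entry p-value.
        best_j, best_p = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            p = ols_pvalues(y, X[:, selected + [j]])[-1]  # candidate j is the last column
            if p < best_p:
                best_j, best_p = j, p
        if best_j is not None and best_p <= alpha_enter:
            selected.append(best_j)
            changed = True
        # Backward step: drop the weakest variable if it fails alpha-to-remove.
        if selected:
            pvals = ols_pvalues(y, X[:, selected])
            worst = int(np.argmax(pvals))
            if pvals[worst] > alpha_remove:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return [names[j] for j in selected]

# Hypothetical data: only x1 and x3 drive y.
rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 6))
y = 1 + 2 * X[:, 0] - 1 * X[:, 2] + rng.normal(size=n)
chosen = stepwise(y, X, [f"x{j + 1}" for j in range(6)])
print(chosen)
```

Setting both thresholds to 0.05 mirrors the recommendation above. With the 0.15 defaults, more noise variables would tend to slip in.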
The methods illustrated here, best subsets and stepwise, have some great advantages and disadvantages.

Advantages of best subsets regression and stepwise regression:

- The procedures are automated, so that the user does not have to think about correlations, VIF numbers, or residual sums of squares.
- The procedures actually make choices. They are bold enough to actually select a model. (Well, best subsets regression only goes as far as selecting the best model for each size, but the user's role thereafter is pretty easy.)
- The procedures do not care about collinearity.
- The procedures (especially stepwise) can be used in cases where there is a great excess of independent variables. Indeed, you can use stepwise regression even when n is less than the number of independent variables! (Minitab will not allow you to do best subsets in this case.)

Disadvantages of best subsets regression and stepwise regression:

- The procedures sometimes select the wrong variables. For example, if A is really the variable that drives Y, you would like the regression to use variable A. If B is a correlated proxy for A, it could very well happen that the procedure uses B and omits A.
- The fit is often too good, in that sε for the selected model may be rather smaller than σε, the true-but-unknown noise standard deviation. This occurs because the procedures choose among models which fluctuate around the truth, favoring models with low sε.
- The statistical inferential calculations (t, p-values, F) are bogus. They were obtained after several steps of data-torturing and simply do not have the statistical properties of regressions done without all these steps.
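The "fit is often too good" point can be seen in a small simulation, sketched here under assumed conditions: Y is pure noise with σε = 1, there are 10 candidate predictors that are all irrelevant, and n = 40. For each simulated data set, the sketch compares sε for a 3-predictor model fixed in advance with sε for the best 3-predictor subset found by exhaustive search. The function names and settings are hypothetical choices for illustration.

```python
from itertools import combinations
import numpy as np

def s_eps(y, X):
    """Residual standard deviation sqrt(SSE / (n - p)), p = 1 + number of predictors."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    r = y - Xc @ beta
    return float(np.sqrt(r @ r / (n - Xc.shape[1])))

def selection_optimism(n=40, k=10, size=3, reps=200, seed=0):
    rng = np.random.default_rng(seed)
    fixed, selected = [], []
    for _ in range(reps):
        X = rng.normal(size=(n, k))
        y = rng.normal(size=n)  # pure noise: true sigma_eps = 1, no predictor matters
        fixed.append(s_eps(y, X[:, :size]))  # subset chosen before seeing the data
        # Best subset of this size = the one with the smallest s_eps (highest R^2).
        selected.append(min(s_eps(y, X[:, list(c)])
                            for c in combinations(range(k), size)))
    return np.mean(fixed), np.mean(selected)

fixed_mean, sel_mean = selection_optimism()
print(f"pre-specified model: average s_eps = {fixed_mean:.3f}")
print(f"selected model:      average s_eps = {sel_mean:.3f}  (true sigma_eps = 1)")
```

The selected model's average sε should fall noticeably below 1 even though none of its predictors is related to Y. The same search also makes its t and F statistics look better than they should.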
CPSC 340: Machine Learning and Data Mining Feature Selection Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. Admin Assignment 3: Due Friday Midterm: Feb 14 in class
More informationx 2 + 3, r 4(x) = x2 1
Math 121 (Lesieutre); 4.2: Rational functions; September 1, 2017 1. What is a rational function? It s a function of the form p(x), where p(x) and q(x) are both polynomials. In other words, q(x) something
More informationMAT128A: Numerical Analysis Lecture Two: Finite Precision Arithmetic
MAT128A: Numerical Analysis Lecture Two: Finite Precision Arithmetic September 28, 2018 Lecture 1 September 28, 2018 1 / 25 Floating point arithmetic Computers use finite strings of binary digits to represent
More informationChapter 1 Operations With Numbers
Chapter 1 Operations With Numbers Part I Negative Numbers You may already know what negative numbers are, but even if you don t, then you have probably seen them several times over the past few days. If
More informationTHE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)
THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533 MIDTERM EXAMINATION: October 14, 2005 Instructor: Val LeMay Time: 50 minutes 40 Marks FRST 430 50 Marks FRST 533 (extra questions) This examination
More informationWhat is a Fraction? Fractions. One Way To Remember Numerator = North / 16. Example. What Fraction is Shaded? 9/16/16. Fraction = Part of a Whole
// Fractions Pages What is a Fraction? Fraction Part of a Whole Top Number? Bottom Number? Page Numerator tells how many parts you have Denominator tells how many parts are in the whole Note: the fraction
More informationSection 2.3: Simple Linear Regression: Predictions and Inference
Section 2.3: Simple Linear Regression: Predictions and Inference Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.4 1 Simple
More informationLecture notes on the simplex method September We will present an algorithm to solve linear programs of the form. maximize.
Cornell University, Fall 2017 CS 6820: Algorithms Lecture notes on the simplex method September 2017 1 The Simplex Method We will present an algorithm to solve linear programs of the form maximize subject
More informationThis chapter is intended to take you through the basic steps of using the Visual Basic
CHAPTER 1 The Basics This chapter is intended to take you through the basic steps of using the Visual Basic Editor window and writing a simple piece of VBA code. It will show you how to use the Visual
More information5.5 Completing the Square for the Vertex
5.5 Completing the Square for the Vertex Having the zeros is great, but the other key piece of a quadratic function is the vertex. We can find the vertex in a couple of ways, but one method we ll explore
More information6.1 Evaluate Roots and Rational Exponents
VOCABULARY:. Evaluate Roots and Rational Exponents Radical: We know radicals as square roots. But really, radicals can be used to express any root: 0 8, 8, Index: The index tells us exactly what type of
More informationDecision Trees Dr. G. Bharadwaja Kumar VIT Chennai
Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target
More informationLecture on Modeling Tools for Clustering & Regression
Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into
More informationGenerating Functions
6.04/8.06J Mathematics for Computer Science Srini Devadas and Eric Lehman April 7, 005 Lecture Notes Generating Functions Generating functions are one of the most surprising, useful, and clever inventions
More informationMissing Data Analysis for the Employee Dataset
Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup For our analysis goals we would like to do: Y X N (X, 2 I) and then interpret the coefficients
More informationOLS Assumptions and Goodness of Fit
OLS Assumptions and Goodness of Fit A little warm-up Assume I am a poor free-throw shooter. To win a contest I can choose to attempt one of the two following challenges: A. Make three out of four free
More information