Statistical foundations of Machine Learning INFO-F-422 TP: Linear Regression
Catharina Olsen and Gianluca Bontempi
March 12,
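1 Repetition

1.1 Estimation using the mean square error

Assume we have N observation pairs $(x_i, y_i)$ generated by the following stochastic process

$$y_i = \beta_0 + \beta_1 x_i + w_i,$$

where the $w_i$ are iid realisations of a random variable $w$ with mean zero and constant variance $\sigma_w^2$. The $x_i$ can be seen as fixed; the only random component in the sample set $D_N$ is therefore contained in the $y_i$ (which are random due to the $w_i$).

The coefficients $\beta_0$ and $\beta_1$ can be estimated using the least squares method. This method consists of taking those estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ which minimize

$$R_{emp} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \tag{1}$$

where

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i. \tag{2}$$

This is equivalent to

$$\{\hat{\beta}_0, \hat{\beta}_1\} = \arg\min_{b_0, b_1} \sum_{i=1}^{N} (y_i - b_0 - b_1 x_i)^2. \tag{3}$$

The solution is given by

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \tag{4}$$

where

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}, \qquad \bar{y} = \frac{\sum_{i=1}^{N} y_i}{N}, \qquad S_{xy} = \sum_{i=1}^{N} (x_i - \bar{x})\, y_i, \qquad S_{xx} = \sum_{i=1}^{N} (x_i - \bar{x})^2. \tag{5}$$

1.2 Properties of the estimator

$$E_{D_N}[\hat{\beta}_1] = \beta_1, \qquad \mathrm{Var}[\hat{\beta}_1] = \frac{\sigma_w^2}{S_{xx}}$$

$$E[\hat{\beta}_0] = \beta_0, \qquad \mathrm{Var}[\hat{\beta}_0] = \sigma_w^2 \left( \frac{1}{N} + \frac{\bar{x}^2}{S_{xx}} \right)$$

$$\hat{\sigma}_w^2 = \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N - 2}$$

is an unbiased estimator of $\sigma_w^2$.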
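As an illustration of equations (4) and (5), the following R sketch computes the least squares estimates by hand and cross-checks them against `lm`. The data are simulated stand-ins (the seed, the grid of x values and the noise level are assumptions chosen for the example, not values from the course data).

```r
# Simulated stand-in data (hypothetical values, chosen only for illustration)
set.seed(1)
x <- seq(-10, 10, by = 1)
y <- -1 + 1 * x + rnorm(length(x), mean = 0, sd = 5)
N <- length(x)

# Equation (5): sample means and the sums S_xy and S_xx
x.bar <- mean(x)
y.bar <- mean(y)
S.xy <- sum((x - x.bar) * y)
S.xx <- sum((x - x.bar)^2)

# Equation (4): least squares estimates of slope and intercept
beta.hat.1 <- S.xy / S.xx
beta.hat.0 <- y.bar - beta.hat.1 * x.bar

# Equation (2): fitted values, then the unbiased noise variance estimate (section 1.2)
y.hat <- beta.hat.0 + beta.hat.1 * x
var.hat.w <- sum((y - y.hat)^2) / (N - 2)

# Cross-check against R's built-in linear model fit
fit <- lm(y ~ x)
print(c(beta.hat.0, beta.hat.1))
print(coef(fit))
```

The hand-computed pair and `coef(fit)` should agree up to floating-point error, and `var.hat.w` should match `summary(fit)$sigma^2`.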
1.3 Partitioning the variability

The variability of the response $y_i$ can be expressed as follows:

$$\sum_{i=1}^{N} (y_i - \bar{y})^2 = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \tag{6}$$

that is

$$SS_{tot} = SS_{mod} + SS_{res}. \tag{7}$$

1.4 The F-test

Goal: test whether the variable y is really influenced by the variable x. This can be formulated as a test of the hypothesis $\beta_1 = 0$. If the hypothesis is rejected, it can be deduced that x influences y significantly. It can be shown that, given a normally distributed w,

$$\frac{SS_{mod}}{SS_{res}/(N-2)} \sim F_{1,N-2} \tag{8}$$

if the hypothesis $\beta_1 = 0$ is true.

1.5 The t-test

It can be shown that, given a normally distributed w,

$$\hat{\beta}_1 \sim \mathcal{N}\left(\beta_1, \sigma_w^2 / S_{xx}\right) \tag{9}$$

and

$$\frac{(\hat{\beta}_1 - \beta_1)\sqrt{S_{xx}}}{\hat{\sigma}_w} \sim T_{N-2}. \tag{10}$$

This can be used to test the hypothesis $\beta_1 = \beta$ for a given value $\beta$.

1.6 Confidence intervals

With probability $1 - \alpha$, the true parameter $\beta_1$ lies in the interval

$$\hat{\beta}_1 \pm t_{\alpha/2,\,N-2} \sqrt{\frac{\hat{\sigma}_w^2}{S_{xx}}}. \tag{11}$$

1.7 Variance of the response

Let

$$\hat{y}(x) = \hat{\beta}_0 + \hat{\beta}_1 x. \tag{12}$$

We can show that for all x:

$$E_{D_N}[\hat{y}(x)] = E[y(x)] \tag{13}$$

and

$$\mathrm{Var}[\hat{y}(x)] = \sigma_w^2 \left[ \frac{1}{N} + \frac{(x - \bar{x})^2}{S_{xx}} \right]. \tag{14}$$
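The test statistics and the interval of sections 1.3 to 1.6 can be sketched in R as follows. The data are again simulated (an assumption made so that the example is self-contained), and the t-test p-value is computed as two-sided, which is a choice made here rather than something the text fixes.

```r
# Simulated stand-in data (hypothetical values, chosen only for illustration)
set.seed(2)
x <- seq(-10, 10, by = 1)
y <- -1 + 1 * x + rnorm(length(x), sd = 5)
N <- length(x)

# Least squares fit by hand, equations (4) and (5)
S.xx <- sum((x - mean(x))^2)
beta.hat.1 <- sum((x - mean(x)) * y) / S.xx
beta.hat.0 <- mean(y) - beta.hat.1 * mean(x)
y.hat <- beta.hat.0 + beta.hat.1 * x

# Equations (6)-(7): partition of the variability
SS.tot <- sum((y - mean(y))^2)
SS.mod <- sum((y.hat - mean(y))^2)
SS.res <- sum((y - y.hat)^2)

# Equation (8): F statistic and its upper tail probability under beta_1 = 0
F.value <- SS.mod / (SS.res / (N - 2))
F.pr <- pf(F.value, df1 = 1, df2 = N - 2, lower.tail = FALSE)

# Equation (10) with hypothesised value beta_1 = 0; two-sided p-value (assumption)
sigma.hat <- sqrt(SS.res / (N - 2))
t.value <- beta.hat.1 * sqrt(S.xx) / sigma.hat
t.pr <- 2 * pt(abs(t.value), df = N - 2, lower.tail = FALSE)

# Equation (11): confidence interval for beta_1 at level 1 - alpha
alpha <- 0.05
half.width <- qt(1 - alpha / 2, df = N - 2) * sigma.hat / sqrt(S.xx)
conf.interval <- c(beta.hat.1 - half.width, beta.hat.1 + half.width)

print(c(F.value = F.value, F.pr = F.pr))
print(c(t.value = t.value, t.pr = t.pr))
print(conf.interval)
```

In simple linear regression the F statistic equals the square of the t statistic for $\beta_1 = 0$, so the two p-values coincide; this gives a quick consistency check on a hand computation.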
2 Linear regression exercises

2.1 Exercise 1

Compare with the theoretical part of this course (slides 7 and 28 of the chapter Regression Modelling).

The goal of this exercise is to investigate the link between two variables originating from medical data by studying the ventricular shortening velocity as a function of blood glucose.

    # data preparation
    library(ISwR)
    data(thuesen)
    I <- !is.na(thuesen[, "short.velocity"])
    Y <- thuesen[I, "short.velocity"]
    X <- thuesen[I, "blood.glucose"]

(a) Apply the least squares method by hand, using equations (4) and (5), to compute the coefficients β̂_0 and β̂_1 of a linear model for our data.

    print(paste("beta.hat.0 = ", beta.hat.0))
    ## [1] "beta.hat.0 = "
    print(paste("beta.hat.1 = ", beta.hat.1))
    ## [1] "beta.hat.1 = "

(b) Test the hypothesis β_1 = 0 with an F-test, using equation (8) and the F distribution function pf, followed by a t-test, using equation (10) and the t distribution function pt.

    print(paste("F-test result: F.value= ", F.value))
    ## [1] "F-test result: F.value= "
    print(paste(" Pr[F >= F.value]= ", F.pr))
    ## [1] " Pr[F >= F.value]= "
    print(paste("t-test result: t.value= ", t.value))
    ## [1] "t-test result: t.value= "
    print(paste("; Pr[ T >= t.value]= ", t.pr))
    ## [1] "; Pr[ T >= t.value]= "

(c) Compute the confidence interval for β_1 using equation (11) and the function qt.

    print(paste("Confidence interval for beta1="))
    ## [1] "Confidence interval for beta1="
    print(paste("(", conf.interval.min, ",", conf.interval.max, ")"))
    ## [1] "( , )"

(d) Use the function lm to obtain the same results automatically and compare these with the ones obtained earlier.

(e) Visualize the data and the regression line.

[Figures: "Histogram of Y" and a plot of Y against X.]
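For part (d), a minimal sketch of how `lm` exposes the same quantities is given below. To stay self-contained it uses simulated X and Y (hypothetical values, not the thuesen data); with the real data, the vectors from the data preparation step would be used instead.

```r
# Hypothetical stand-in for the prepared vectors X and Y (not the thuesen data)
set.seed(3)
X <- runif(20, min = 4, max = 20)
Y <- 1.1 + 0.02 * X + rnorm(20, sd = 0.2)

fit <- lm(Y ~ X)

# Estimates, standard errors, t values and two-sided p-values in one table
coef.table <- summary(fit)$coefficients

# F statistic with its numerator and denominator degrees of freedom
f.stat <- summary(fit)$fstatistic

# 95% confidence interval for the slope only
ci.beta1 <- confint(fit, "X", level = 0.95)

print(coef.table)
print(f.stat)
print(ci.beta1)
```

The row `"X"` of `coef.table` corresponds to β̂_1 and its t-test, `f.stat["value"]` to the F-test of equation (8), and `ci.beta1` to the interval of equation (11).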
2.2 Exercise 2

The goal of this exercise is to experimentally study the bias and the variance of β̂_0, β̂_1, σ̂ and ŷ(x_i). See also the theoretical part of this course (slide 27 of the chapter Regression Modelling).

    ## Fix model, data and number of iterations
    rm(list = ls())
    X <- seq(-10, 10, by = 1)  # the x_i are fixed
    beta0 <- -1                # y_i = -1 + x_i + Normal(0,5)
    beta1 <- 1
    sd.w <- 5
    N <- length(X)
    R <- 100                   # number of iterations for the simulation

    ## Initialize
    beta.hat.1 <- numeric(R)
    beta.hat.0 <- numeric(R)
    var.hat.w <- numeric(R)
    Y.hat <- array(NA, c(R, N))

(a) Compute β̂_0, β̂_1 and σ̂ and plot their distributions.

[Figure: three histograms titled "Distribution of beta.hat.1: beta1= 1", "Distribution of beta.hat.0: beta0= -1" and "Distribution of var.hat.w: var w= "; x-axes beta.hat.1, beta.hat.0 and var.hat.w.]

(b) Illustrate the theorem

$$\mathrm{Var}[\hat{y}(x)] = \sigma_w^2 \left( \frac{1}{N} + \frac{(x - \bar{x})^2}{S_{xx}} \right).$$

    ## [1] "Theoretical var predic= "
    ## [1] "Observed = "
    ## [1] " "
    ## [1] "Theoretical var predic= "
    ## [1] "Observed = "
    ## [1] " "
    ## [1] "Theoretical var predic= "
    ## [1] "Observed = "
    ## [1] " "
    ## [1] "Theoretical var predic= "
    ## [1] "Observed = "
    ## [1] " "
    ## [1] "Theoretical var predic= "
    ## [1] "Observed = "
    ## [1] " "
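One possible way to carry out the simulation of Exercise 2 is sketched below. It reuses the model settings from the exercise; the seed and the larger number of iterations (R = 2000 instead of 100) are assumptions made here to reduce Monte Carlo noise, and the loop structure is only one of several reasonable designs.

```r
set.seed(4)                       # arbitrary seed, for reproducibility only
X <- seq(-10, 10, by = 1)         # the x_i are fixed
beta0 <- -1
beta1 <- 1
sd.w <- 5
N <- length(X)
R <- 2000                         # assumption: more iterations than in the exercise

beta.hat.1 <- numeric(R)
beta.hat.0 <- numeric(R)
var.hat.w <- numeric(R)
Y.hat <- array(NA, c(R, N))

x.bar <- mean(X)
S.xx <- sum((X - x.bar)^2)

for (r in 1:R) {
  # New noise realisation at every iteration; X stays fixed
  Y <- beta0 + beta1 * X + rnorm(N, sd = sd.w)
  beta.hat.1[r] <- sum((X - x.bar) * Y) / S.xx          # equation (4)
  beta.hat.0[r] <- mean(Y) - beta.hat.1[r] * x.bar
  Y.hat[r, ] <- beta.hat.0[r] + beta.hat.1[r] * X       # equation (12) at each x_i
  var.hat.w[r] <- sum((Y - Y.hat[r, ])^2) / (N - 2)
}

# (a) Empirical means, to compare with beta0, beta1 and sd.w^2
print(c(mean(beta.hat.0), mean(beta.hat.1), mean(var.hat.w)))

# (b) Equation (14) against the empirical variance of the predictions at each x_i
var.theory <- sd.w^2 * (1 / N + (X - x.bar)^2 / S.xx)
var.observed <- apply(Y.hat, 2, var)
print(cbind(x = X, theoretical = var.theory, observed = var.observed))
```

Calling `hist()` on `beta.hat.0`, `beta.hat.1` and `var.hat.w` would then give the distributions asked for in part (a).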
3 Multiple regression exercise

This example is taken from the theoretical part of this course (slide 36 of the chapter Regression Modelling). Multiple linear dependence occurs when the variable x is a vector instead of a scalar.

The goal of this exercise is to verify the theoretical results for the estimators σ̂² and β̂ obtained with the least squares method (no bias, and the analytical results concerning Var[β̂]).

    ## Initialize
    rm(list = ls())
    library(MASS)

    # initial values for n, (sigma_w) and beta
    n <- 3                 # number of input variables
    p <- n + 1
    beta <- seq(2, p + 1)  # beta = (2, 3, ..., n + 2)
    sd.w <- 5

    # generating data D_N
    N <- 100               # number of samples
    X <- array(runif(N * n, min = -20, max = 20), c(N, n))
    X <- cbind(array(1, c(N, 1)), X)  # prepend a column of ones for the intercept

    R <- 100               # number of iterations
    beta.hat <- array(0, c(p, R))
    var.hat.w <- numeric(R)
    Y.hat <- array(NA, c(R, N))

(a) Compute Ŷ, β̂ and σ̂ following the equations in the course slides 33, 35 and 37.

(b) Plot the histograms for σ̂ and for each β̂.

[Figure: histogram titled "Distribution of var.hat.w: var w= 25" (x-axis var.hat.w) and four histograms titled "Distribution of beta.hat. i : beta i = " for i = 1, ..., 4 (x-axis beta.hat[i, ]).]
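A possible sketch of this simulation follows. The course slides are not reproduced here, so the formulas used are the standard least squares results, assumed to match them: β̂ = (XᵀX)⁻¹XᵀY, σ̂² = SS_res/(N − p) and Var[β̂] = σ_w²(XᵀX)⁻¹. The seed and R = 2000 are choices made for this example, and base R's `solve` is used in place of anything from MASS.

```r
set.seed(5)                       # arbitrary seed, for reproducibility only
n <- 3                            # number of input variables
p <- n + 1
beta <- seq(2, p + 1)
sd.w <- 5
N <- 100
X <- array(runif(N * n, min = -20, max = 20), c(N, n))
X <- cbind(array(1, c(N, 1)), X)  # column of ones for the intercept
R <- 2000                         # assumption: more iterations than in the exercise

beta.hat <- array(0, c(p, R))
var.hat.w <- numeric(R)

# X is fixed across iterations, so (X'X)^{-1} is computed once
XtX.inv <- solve(t(X) %*% X)

for (r in 1:R) {
  Y <- X %*% beta + rnorm(N, sd = sd.w)
  beta.hat[, r] <- XtX.inv %*% t(X) %*% Y        # least squares estimate
  Y.hat <- X %*% beta.hat[, r]                   # fitted values
  var.hat.w[r] <- sum((Y - Y.hat)^2) / (N - p)   # noise variance estimate
}

# Bias check: empirical means against the true values
print(rowMeans(beta.hat))
print(mean(var.hat.w))

# Variance check: diagonal of sd.w^2 * (X'X)^{-1} against empirical variances
var.theory <- sd.w^2 * diag(XtX.inv)
var.observed <- apply(beta.hat, 1, var)
print(rbind(var.theory, var.observed))
```

The histograms of part (b) could then be drawn with `hist(var.hat.w)` and `hist(beta.hat[i, ])` for each i.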
Session Info

- R version ( ), x86_64-apple-darwin9.8.0
- Base packages: base, datasets, grDevices, graphics, methods, stats, utils
- Other packages: ISwR 2.0-6, MASS , knitr 1.1
- Loaded via a namespace (and not attached): digest 0.5.2, evaluate 0.4.3, formatR 0.6, plyr 1.8, stringr 0.6.1, tools