Chapitre 2 : modèle linéaire généralisé
|
|
- Margery Wilkerson
- 5 years ago
- Views:
Transcription
1 Chapitre 2 : modèle linéaire généralisé
2 Introduction et jeux de données
3 Avant de commencer Faire pointer R vers votre répertoire setwd("~/dropbox/evry/m1geniomhe/cours/") source(file = "fonction_illustration_logistique.r") ## ## Attaching package: 'dplyr' ## The following objects are masked from 'package:stats': ## ## filter, lag ## The following objects are masked from 'package:base': ## ## intersect, setdiff, setequal, union (Installer et) charger les librairies ISLR, ggplot2,dplyr subsampling # sum(default == "Yes") # sample(which(default == "No"), size = 500, replace =FALSE) # Default_small = slice(default,c(which(default == "Yes"),sample(which(default = # write.table(default_small,file = 'Default_small')
4 les données Default ## [1] ## Observations: 833 ## Variables: 4 ## $ default <fctr> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes... ## $ student <fctr> Yes, Yes, Yes, No, Yes, Yes, No, No, No, No, No, No,... ## $ balance <dbl> , , , , ,... ## $ income <dbl> , , , , ,
5 scatterplot pairs(default, main="simple Scatterplot Matrix") Simple Scatterplot Matrix default student balance income
6 histogram ggplot(default, aes(x=default)) + geom_bar( fill = "steelblue") + theme_bw() count No default Yes
7 pour les variables balance et income ggplot(default, aes(balance, income)) + geom_point(aes(colour = default)) income default No Yes balance
8 ggplot(default, aes(default, balance)) + geom_boxplot(aes(fill = default)) 2000 balance default No Yes No default Yes
9 ggplot(default, aes(default, income)) + geom_boxplot(aes(fill = default)) income default No Yes No default Yes
10 ggplot(default, aes(balance, as.numeric(default)-1)) + geom_point() + geom_smooth(method = "glm", method.args = list(family = "gaussian")) 1.0 as.numeric(default) balance
11 plt = illustration_logistique(default,default$balance,default$default) plt probability of Default$default $(Default, balance)
12 plt = illustration_logistique(default,default$balance,default$default, log_reg = plt probability of Default$default $(Default, balance)
13 Les données crab crab = read.table("crab.txt",header = TRUE) glimpse(crab) ## Observations: 173 ## Variables: 6 ## $ Obs <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,... ## $ C <int> 2, 3, 3, 4, 2, 1, 4, 2, 2, 2, 1, 3, 2, 2, 3, 3, 2, 2, 2, 2... ## $ S <int> 3, 3, 3, 2, 3, 2, 3, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3... ## $ W <dbl> 28.3, 26.0, 25.6, 21.0, 29.0, 25.0, 26.2, 24.9, 25.7, ## $ Wt <dbl> 3.05, 2.60, 2.15, 1.85, 3.00, 2.30, 1.30, 2.10, 2.00, ## $ Sa <int> 8, 4, 0, 0, 1, 3, 0, 0, 8, 6, 5, 4, 3, 4, 3, 5, 8, 3, 6, 4... crab = mutate(crab, C = factor(c)) crab = mutate(crab, S = factor(s)) glimpse(crab) ## Observations: 173 ## Variables: 6 ## $ Obs <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,... ## $ C <fctr> 2, 3, 3, 4, 2, 1, 4, 2, 2, 2, 1, 3, 2, 2, 3, 3, 2, 2, 2,... ## $ S <fctr> 3, 3, 3, 2, 3, 2, 3, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3,... ## $ W <dbl> 28.3, 26.0, 25.6, 21.0, 29.0, 25.0, 26.2, 24.9, 25.7, ## $ Wt <dbl> 3.05, 2.60, 2.15, 1.85, 3.00, 2.30, 1.30, 2.10, 2.00, ## $ Sa <int> 8, 4, 0, 0, 1, 3, 0, 0, 8, 6, 5, 4, 3, 4, 3, 5, 8, 3, 6, 4... attach(crab)
14 scatterplot pairs(crab[-1], main="simple Scatterplot Matrix") Simple Scatterplot Matrix C S W Wt Sa
15 histogram ggplot(crab, aes(x=sa)) + geom_bar( fill = "steelblue") + theme_bw() count Sa
16 pour la variable W plt = illustration_poisson(crab,w,sa) plt mean of Sa W
17 plt = illustration_poisson(crab,w,sa, log_pois = TRUE) plt mean of Sa W
18 Régression logistique
19 Estimation des coefficients fit_logistic = glm(default ~ student + balance + income, family = "binomial", data=default) summary(fit_logistic) ## ## Call: ## glm(formula = default ~ student + balance + income, family = "binomial", ## data = Default) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## ## ## Coefficients: ## Estimate Std. Error z value Pr(> z ) ## (Intercept) e e <2e-16 *** ## studentyes e e ## balance 5.330e e <2e-16 *** ## income 1.165e e ## --- ## Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: on 832 degrees of freedom ## Residual deviance: on 829 degrees of freedom
20 Prédictions # Prédictions predictions = predict(fit_logistic,type = "response") predictions_01 = predict(fit_logistic,type = "response") > 1/2 # Matrice de confusion table(predictions_01,default) ## default ## predictions_01 No Yes ## FALSE ## TRUE
21 ## ## Attaching package: 'gplots' ## The following object is masked from 'package:stats': ## ## lowess
22 Courbe ROC pred <- prediction( predictions, Default$default) perf <- performance( pred, "tpr", "fpr" )
23 plot( perf ) True positive rate False positive rate
24 AUC ROC_auc <- performance( pred,"auc") AUC <- print(auc) ## [1]
25 Test de nullité simultanée des coefficients fit_null = glm(default ~ 1, family = "binomial", data=default) anova(fit_null,fit_logistic,test="chisq") ## Analysis of Deviance Table ## ## Model 1: default ~ 1 ## Model 2: default ~ student + balance + income ## Resid. Df Resid. Dev Df Deviance Pr(>Chi) ## ## < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
26 Résidus de déviance / observations aberrantes which(abs(resid(fit_logistic,type="deviance"))>2) ## ## ## ##
27 Attention pas de diagnostic gaussien! n = 100 x <-rnorm(n) p<-exp(x)/(1+exp(x)) #simulate response y <- rbinom(n,1,p) #fit model model = glm(y~x,family="binomial") qqnorm(resid(model, type="deviance")) abline(0,1) Normal Q Q Plot Sample Quantiles Theoretical Quantiles
28 Leviers d observations influences = influence(fit_logistic) hat = influences$hat which(hat > 3*ncol(Default) /nrow(default)) ## ## ## ##
29 Diagnostic sur X library(car) ## ## Attaching package: 'car' ## The following object is masked from 'package:dplyr': ## ## recode vif(fit_logistic) ## student balance income ##
30 Régression de Poisson
31 Estimation des coefficients glimpse(crab) ## Observations: 173 ## Variables: 6 ## $ Obs <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,... ## $ C <fctr> 2, 3, 3, 4, 2, 1, 4, 2, 2, 2, 1, 3, 2, 2, 3, 3, 2, 2, 2,... ## $ S <fctr> 3, 3, 3, 2, 3, 2, 3, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3,... ## $ W <dbl> 28.3, 26.0, 25.6, 21.0, 29.0, 25.0, 26.2, 24.9, 25.7, ## $ Wt <dbl> 3.05, 2.60, 2.15, 1.85, 3.00, 2.30, 1.30, 2.10, 2.00, ## $ Sa <int> 8, 4, 0, 0, 1, 3, 0, 0, 8, 6, 5, 4, 3, 4, 3, 5, 8, 3, 6, 4... fit_poisson = glm(sa ~ C + S + W + Wt, family = "poisson", data=crab) summary(fit_poisson) ## ## Call: ## glm(formula = Sa ~ C + S + W + Wt, family = "poisson", data = crab) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## ## ## Coefficients: ## Estimate Std. Error z value Pr(> z ) ## (Intercept)
32 Prédictions # Prédictions predictions = predict(fit_poisson,type = "response")
33 Test de nullité simultanée des coefficients fit_null = glm(sa ~ 1, family = "poisson", data=crab) anova(fit_null,fit_poisson,test="chisq") ## Analysis of Deviance Table ## ## Model 1: Sa ~ 1 ## Model 2: Sa ~ C + S + W + Wt ## Resid. Df Resid. Dev Df Deviance Pr(>Chi) ## ## e-15 *** ## --- ## Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
34 Résidus de déviance / observations aberrantes which(abs(resid(fit_poisson,type="deviance"))>2) ## ## ## ## ## ##
35 Leviers d observations influences = influence(fit_poisson) hat = influences$hat which(hat > 3*(ncol(crab)-1) /nrow(crab)) ## ## ## ##
36 Diagnostic sur X library(car) vif(fit_poisson) ## GVIF Df GVIF^(1/(2*Df)) ## C ## S ## W ## Wt
Discriminant analysis in R QMMA
Discriminant analysis in R QMMA Emanuele Taufer file:///c:/users/emanuele.taufer/google%20drive/2%20corsi/5%20qmma%20-%20mim/0%20labs/l4-lda-eng.html#(1) 1/26 Default data Get the data set Default library(islr)
More informationPredictive Checking. Readings GH Chapter 6-8. February 8, 2017
Predictive Checking Readings GH Chapter 6-8 February 8, 2017 Model Choice and Model Checking 2 Questions: 1. Is my Model good enough? (no alternative models in mind) 2. Which Model is best? (comparison
More informationSTENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015
STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................
More informationPoisson Regression and Model Checking
Poisson Regression and Model Checking Readings GH Chapter 6-8 September 27, 2017 HIV & Risk Behaviour Study The variables couples and women_alone code the intervention: control - no counselling (both 0)
More informationStat 4510/7510 Homework 4
Stat 45/75 1/7. Stat 45/75 Homework 4 Instructions: Please list your name and student number clearly. In order to receive credit for a problem, your solution must show sufficient details so that the grader
More information1 The SAS System 23:01 Friday, November 9, 2012
2101f12HW9chickwts.log Saved: Wednesday, November 14, 2012 6:50:49 PM Page 1 of 3 1 The SAS System 23:01 Friday, November 9, 2012 NOTE: Copyright (c) 2002-2010 by SAS Institute Inc., Cary, NC, USA. NOTE:
More informationGeneralized Additive Models
Generalized Additive Models Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Additive Models GAMs are one approach to non-parametric regression in the multiple predictor setting.
More informationRegression on the trees data with R
> trees Girth Height Volume 1 8.3 70 10.3 2 8.6 65 10.3 3 8.8 63 10.2 4 10.5 72 16.4 5 10.7 81 18.8 6 10.8 83 19.7 7 11.0 66 15.6 8 11.0 75 18.2 9 11.1 80 22.6 10 11.2 75 19.9 11 11.3 79 24.2 12 11.4 76
More informationBinary Regression in S-Plus
Fall 200 STA 216 September 7, 2000 1 Getting Started in UNIX Binary Regression in S-Plus Create a class working directory and.data directory for S-Plus 5.0. If you have used Splus 3.x before, then it is
More informationMultinomial Logit Models with R
Multinomial Logit Models with R > rm(list=ls()); options(scipen=999) # To avoid scientific notation > # install.packages("mlogit", dependencies=true) # Only need to do this once > library(mlogit) # Load
More informationSubsetting, dplyr, magrittr Author: Lloyd Low; add:
Subsetting, dplyr, magrittr Author: Lloyd Low; Email add: wai.low@adelaide.edu.au Introduction So you have got a table with data that might be a mixed of categorical, integer, numeric, etc variables? And
More informationBen Baumer Instructor
MULTIPLE AND LOGISTIC REGRESSION What is logistic regression? Ben Baumer Instructor A categorical response variable ggplot(data = hearttr, aes(x = age, y = survived)) + geom_jitter(width = 0, height =
More informationLecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010
Lecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010 Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 704: Data Analysis I, Fall 2010 1 / 26 Additive predictors
More informationSTAT:5201 Applied Statistic II
STAT:5201 Applied Statistic II Two-Factor Experiment (one fixed blocking factor, one fixed factor of interest) Randomized complete block design (RCBD) Primary Factor: Day length (short or long) Blocking
More informationMath 263 Excel Assignment 3
ath 263 Excel Assignment 3 Sections 001 and 003 Purpose In this assignment you will use the same data as in Excel Assignment 2. You will perform an exploratory data analysis using R. You shall reproduce
More information1 Simple Linear Regression
Math 158 Jo Hardin R code 1 Simple Linear Regression Consider a dataset from ISLR on credit scores. Because we don t know the sampling mechanism used to collect the data, we are unable to generalize the
More informationMore data analysis examples
More data analysis examples R packages used library(ggplot2) library(tidyr) library(mass) library(leaps) library(dplyr) ## ## Attaching package: dplyr ## The following object is masked from package:mass
More informationNina Zumel and John Mount Win-Vector LLC
SUPERVISED LEARNING IN R: REGRESSION Logistic regression to predict probabilities Nina Zumel and John Mount Win-Vector LLC Predicting Probabilities Predicting whether an event occurs (yes/no): classification
More informationrun ld50 /* Plot the onserved proportions and the fitted curve */ DATA SETR1 SET SETR1 PROB=X1/(X1+X2) /* Use this to create graphs in Windows */ gopt
/* This program is stored as bliss.sas */ /* This program uses PROC LOGISTIC in SAS to fit models with logistic, probit, and complimentary log-log link functions to the beetle mortality data collected
More informationDiscussion Notes 3 Stepwise Regression and Model Selection
Discussion Notes 3 Stepwise Regression and Model Selection Stepwise Regression There are many different commands for doing stepwise regression. Here we introduce the command step. There are many arguments
More informationSTAT Statistical Learning. Predictive Modeling. Statistical Learning. Overview. Predictive Modeling. Classification Methods.
STAT 48 - STAT 48 - December 5, 27 STAT 48 - STAT 48 - Here are a few questions to consider: What does statistical learning mean to you? Is statistical learning different from statistics as a whole? What
More informationUnit 5 Logistic Regression Practice Problems
Unit 5 Logistic Regression Practice Problems SOLUTIONS R Users Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004. Exercises
More informationAnalysis of variance - ANOVA
Analysis of variance - ANOVA Based on a book by Julian J. Faraway University of Iceland (UI) Estimation 1 / 50 Anova In ANOVAs all predictors are categorical/qualitative. The original thinking was to try
More informationNEURAL NETWORKS. Cement. Blast Furnace Slag. Fly Ash. Water. Superplasticizer. Coarse Aggregate. Fine Aggregate. Age
NEURAL NETWORKS As an introduction, we ll tackle a prediction task with a continuous variable. We ll reproduce research from the field of cement and concrete manufacturing that seeks to model the compressive
More informationLab #13 - Resampling Methods Econ 224 October 23rd, 2018
Lab #13 - Resampling Methods Econ 224 October 23rd, 2018 Introduction In this lab you will work through Section 5.3 of ISL and record your code and results in an RMarkdown document. I have added section
More informationStatistics Lab #7 ANOVA Part 2 & ANCOVA
Statistics Lab #7 ANOVA Part 2 & ANCOVA PSYCH 710 7 Initialize R Initialize R by entering the following commands at the prompt. You must type the commands exactly as shown. options(contrasts=c("contr.sum","contr.poly")
More informationSolution to Series 7
Dr. Marcel Dettling Applied Statistical Regression AS 2015 Solution to Series 7 1. a) We begin the analysis by plotting histograms and barplots for all variables. > ## load data > load("customerwinback.rda")
More informationLecture 3 - Object-oriented programming and statistical programming examples
Lecture 3 - Object-oriented programming and statistical programming examples Björn Andersson (w/ Ronnie Pingel) Department of Statistics, Uppsala University February 1, 2013 Table of Contents 1 Some notes
More informationTHE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)
THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533 MIDTERM EXAMINATION: October 14, 2005 Instructor: Val LeMay Time: 50 minutes 40 Marks FRST 430 50 Marks FRST 533 (extra questions) This examination
More informationPractice in R. 1 Sivan s practice. 2 Hetroskadasticity. January 28, (pdf version)
Practice in R January 28, 2010 (pdf version) 1 Sivan s practice Her practice file should be (here), or check the web for a more useful pointer. 2 Hetroskadasticity ˆ Let s make some hetroskadastic data:
More informationNon-Linear Regression. Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel
Non-Linear Regression Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel Today s Lecture Objectives 1 Understanding the need for non-parametric regressions 2 Familiarizing with two common
More informationThis is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or
STA 450/4000 S: February 2 2005 Flexible modelling using basis expansions (Chapter 5) Linear regression: y = Xβ + ɛ, ɛ (0, σ 2 ) Smooth regression: y = f (X) + ɛ: f (X) = E(Y X) to be specified Flexible
More informationDynamic Network Regression Using R Package dnr
Dynamic Network Regression Using R Package dnr Abhirup Mallik July 26, 2018 R package dnr enables the user to fit dynamic network regression models for time variate network data available mostly in social
More informationStat 5303 (Oehlert): Response Surfaces 1
Stat 5303 (Oehlert): Response Surfaces 1 > data
More informationURLs identification task: Istat current status. Istat developed and applied a procedure consisting of the following steps:
ESSnet BIG DATA WorkPackage 2 URLs identification task: Istat current status Giulio Barcaroli, Monica Scannapieco, Donato Summa Istat developed and applied a procedure consisting of the following steps:
More informationReliability Coefficients
Reliability Coefficients Introductory notes That data used for these computations is the pre-treatment scores of all subjects. There are three items in the SIAS (5, 9 and 11) that require reverse-scoring.
More informationR Graphics. SCS Short Course March 14, 2008
R Graphics SCS Short Course March 14, 2008 Archeology Archeological expedition Basic graphics easy and flexible Lattice (trellis) graphics powerful but less flexible Rgl nice 3d but challenging Tons of
More informationSYS 6021 Linear Statistical Models
SYS 6021 Linear Statistical Models Project 2 Spam Filters Jinghe Zhang Summary The spambase data and time indexed counts of spams and hams are studied to develop accurate spam filters. Static models are
More informationRegression Analysis and Linear Regression Models
Regression Analysis and Linear Regression Models University of Trento - FBK 2 March, 2015 (UNITN-FBK) Regression Analysis and Linear Regression Models 2 March, 2015 1 / 33 Relationship between numerical
More informationLogistic Regression. (Dichotomous predicted variable) Tim Frasier
Logistic Regression (Dichotomous predicted variable) Tim Frasier Copyright Tim Frasier This work is licensed under the Creative Commons Attribution 4.0 International license. Click here for more information.
More informationAn introduction to SPSS
An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible
More informationEXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression
EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression OBJECTIVES 1. Prepare a scatter plot of the dependent variable on the independent variable 2. Do a simple linear regression
More informationChapter 10: Extensions to the GLM
Chapter 10: Extensions to the GLM 10.1 Implement a GAM for the Swedish mortality data, for males, using smooth functions for age and year. Age and year are standardized as described in Section 4.11, for
More informationRegression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:
Regression Lab The data set cholesterol.txt available on your thumb drive contains the following variables: Field Descriptions ID: Subject ID sex: Sex: 0 = male, = female age: Age in years chol: Serum
More informationStat 5303 (Oehlert): Unbalanced Factorial Examples 1
Stat 5303 (Oehlert): Unbalanced Factorial Examples 1 > section
More informationTutorials Case studies
1. Subject Three curves for the evaluation of supervised learning methods. Evaluation of classifiers is an important step of the supervised learning process. We want to measure the performance of the classifier.
More informationCentering and Interactions: The Training Data
Centering and Interactions: The Training Data A random sample of 150 technical support workers were first given a test of their technical skill and knowledge, and then randomly assigned to one of three
More informationOrange Juice data. Emanuele Taufer. 4/12/2018 Orange Juice data (1)
Orange Juice data Emanuele Taufer file:///c:/users/emanuele.taufer/google%20drive/2%20corsi/5%20qmma%20-%20mim/0%20labs/l10-oj-data.html#(1) 1/31 Orange Juice Data The data contain weekly sales of refrigerated
More informationLab #7 - More on Regression in R Econ 224 September 18th, 2018
Lab #7 - More on Regression in R Econ 224 September 18th, 2018 Robust Standard Errors Your reading assignment from Chapter 3 of ISL briefly discussed two ways that the standard regression inference formulas
More informationSolution to Bonus Questions
Solution to Bonus Questions Q2: (a) The histogram of 1000 sample means and sample variances are plotted below. Both histogram are symmetrically centered around the true lambda value 20. But the sample
More informationWork through the sheet in any order you like. Skip the starred (*) bits in the first instance, unless you re fairly confident.
CDT R Review Sheet Work through the sheet in any order you like. Skip the starred (*) bits in the first instance, unless you re fairly confident. 1. Vectors (a) Generate 100 standard normal random variables,
More information610 R12 Prof Colleen F. Moore Analysis of variance for Unbalanced Between Groups designs in R For Psychology 610 University of Wisconsin--Madison
610 R12 Prof Colleen F. Moore Analysis of variance for Unbalanced Between Groups designs in R For Psychology 610 University of Wisconsin--Madison R is very touchy about unbalanced designs, partly because
More informationAA BB CC DD EE. Introduction to Graphics in R
Introduction to Graphics in R Cori Mar 7/10/18 ### Reading in the data dat
More informationAnnexes : Sorties SAS pour l'exercice 3. Code SAS. libname process 'G:\Enseignements\M2ISN-Series temp\sas\';
Annexes : Sorties SAS pour l'exercice 3 Code SAS libname process 'G:\Enseignements\M2ISN-Series temp\sas\'; /* Etape 1 - Création des données*/ proc iml; phi={1-1.583 0.667-0.083}; theta={1}; y=armasim(phi,
More informationClassification. Data Set Iris. Logistic Regression. Species. Petal.Width
Classification Data Set Iris # load data data(iris) # this is what it looks like... head(iris) Sepal.Length Sepal.Width 1 5.1 3.5 1.4 0.2 2 4.9 3.0 1.4 0.2 3 4.7 3.2 1.3 0.2 4 4.6 3.1 0.2 5 5.0 3.6 1.4
More informationMinitab 17 commands Prepared by Jeffrey S. Simonoff
Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save
More informationbook 2014/5/6 15:21 page v #3 List of figures List of tables Preface to the second edition Preface to the first edition
book 2014/5/6 15:21 page v #3 Contents List of figures List of tables Preface to the second edition Preface to the first edition xvii xix xxi xxiii 1 Data input and output 1 1.1 Input........................................
More informationStat 8053, Fall 2013: Additive Models
Stat 853, Fall 213: Additive Models We will only use the package mgcv for fitting additive and later generalized additive models. The best reference is S. N. Wood (26), Generalized Additive Models, An
More informationEXPLORATORY DATA ANALYSIS. Introducing the data
EXPLORATORY DATA ANALYSIS Introducing the data Email data set > email # A tibble: 3,921 21 spam to_multiple from cc sent_email time image 1 not-spam 0 1 0 0
More information9.1 Random coefficients models Constructed data Consumer preference mapping of carrots... 10
St@tmaster 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE Module 9: R 9.1 Random coefficients models...................... 1 9.1.1 Constructed data........................
More informationTHIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010
THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE
More informationPackage citools. October 20, 2018
Type Package Package citools October 20, 2018 Title Confidence or Prediction Intervals, Quantiles, and Probabilities for Statistical Models Version 0.5.0 Maintainer John Haman Functions
More informationCSC 328/428 Summer Session I 2002 Data Analysis for the Experimenter FINAL EXAM
options pagesize=53 linesize=76 pageno=1 nodate; proc format; value $stcktyp "1"="Growth" "2"="Combined" "3"="Income"; data invstmnt; input stcktyp $ perform; label stkctyp="type of Stock" perform="overall
More informationOutline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model
Topic 16 - Other Remedies Ridge Regression Robust Regression Regression Trees Outline - Fall 2013 Piecewise Linear Model Bootstrapping Topic 16 2 Ridge Regression Modification of least squares that addresses
More informationMACHINE LEARNING TOOLBOX. Logistic regression on Sonar
MACHINE LEARNING TOOLBOX Logistic regression on Sonar Classification models Categorical (i.e. qualitative) target variable Example: will a loan default? Still a form of supervised learning Use a train/test
More informationAn Introductory Guide to R
An Introductory Guide to R By Claudia Mahler 1 Contents Installing and Operating R 2 Basics 4 Importing Data 5 Types of Data 6 Basic Operations 8 Selecting and Specifying Data 9 Matrices 11 Simple Statistics
More informationBernt Arne Ødegaard. 15 November 2018
R Bernt Arne Ødegaard 15 November 2018 To R is Human 1 R R is a computing environment specially made for doing statistics/econometrics. It is becoming the standard for advanced dealing with empirical data,
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 22: Model Selection In model selection, the idea is to find the smallest set of variables which provides an adequate description of the data. We will consider
More informationExperiments in Imputation
Experiments in Imputation Mean substitution ############################################################################### # Imputation of missing values can inflate Type I error probability. # Imputation
More informationAssignment 5.5. Nothing here to hand in
Assignment 5.5 Nothing here to hand in Load the tidyverse before we start: library(tidyverse) ## Loading tidyverse: ggplot2 ## Loading tidyverse: tibble ## Loading tidyverse: tidyr ## Loading tidyverse:
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 22: Model Selection In model selection, the idea is to find the smallest set of variables which provides an adequate description of the data. We will consider
More informationSAS data statements and data: /*Factor A: angle Factor B: geometry Factor C: speed*/
STAT:5201 Applied Statistic II (Factorial with 3 factors as 2 3 design) Three-way ANOVA (Factorial with three factors) with replication Factor A: angle (low=0/high=1) Factor B: geometry (shape A=0/shape
More information1 Lab 1. Graphics and Checking Residuals
R is an object oriented language. We will use R for statistical analysis in FIN 504/ORF 504. To download R, go to CRAN (the Comprehensive R Archive Network) at http://cran.r-project.org Versions for Windows
More informationETC 2420/5242 Lab Di Cook Week 5
ETC 2420/5242 Lab 5 2017 Di Cook Week 5 Purpose This lab is to practice fitting linear models. Reading Read the material on fitting regression models in Statistics online textbook, Diez, Barr, Cetinkaya-
More informationMultiple Linear Regression
Multiple Linear Regression Rebecca C. Steorts, Duke University STA 325, Chapter 3 ISL 1 / 49 Agenda How to extend beyond a SLR Multiple Linear Regression (MLR) Relationship Between the Response and Predictors
More informationCH9.Generalized Additive Model
CH9.Generalized Additive Model Regression Model For a response variable and predictor variables can be modeled using a mean function as follows: would be a parametric / nonparametric regression or a smoothing
More informationUsing the SemiPar Package
Using the SemiPar Package NICHOLAS J. SALKOWSKI Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA salk0008@umn.edu May 15, 2008 1 Introduction The
More informationOrganizing data in R. Fitting Mixed-Effects Models Using the lme4 Package in R. R packages. Accessing documentation. The Dyestuff data set
Fitting Mixed-Effects Models Using the lme4 Package in R Deepayan Sarkar Fred Hutchinson Cancer Research Center 18 September 2008 Organizing data in R Standard rectangular data sets (columns are variables,
More informationSection 2.1: Intro to Simple Linear Regression & Least Squares
Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:
More informationFrom logistic to binomial & Poisson models
From logistic to binomial & Poisson models Ben Bolker October 17, 2018 Licensed under the Creative Commons attribution-noncommercial license (http: //creativecommons.org/licenses/by-nc/3.0/). Please share
More informationK-fold cross validation in the Tidyverse Stephanie J. Spielman 11/7/2017
K-fold cross validation in the Tidyverse Stephanie J. Spielman 11/7/2017 Requirements This demo requires several packages: tidyverse (dplyr, tidyr, tibble, ggplot2) modelr broom proc Background K-fold
More informationQuantitative Methods in Management
Quantitative Methods in Management MBA Glasgow University March 20-23, 2009 Luiz Moutinho, University of Glasgow Graeme Hutcheson, University of Manchester Exploratory Regression The lecture notes, exercises
More informationComparing Fitted Models with the fit.models Package
Comparing Fitted Models with the fit.models Package Kjell Konis Acting Assistant Professor Computational Finance and Risk Management Dept. Applied Mathematics, University of Washington History of fit.models
More informationIntroducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone
Introducing Microsoft SQL Server 2016 R Services Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone SQL Server 2016: Everything built-in built-in built-in built-in built-in built-in $2,230
More informationProbabilistic Classifiers DWML, /27
Probabilistic Classifiers DWML, 2007 1/27 Probabilistic Classifiers Conditional class probabilities Id. Savings Assets Income Credit risk 1 Medium High 75 Good 2 Low Low 50 Bad 3 High Medium 25 Bad 4 Medium
More informationSome issues with R It is command-driven, and learning to use it to its full extent takes some time and effort. The documentation is comprehensive,
R To R is Human R is a computing environment specially made for doing statistics/econometrics. It is becoming the standard for advanced dealing with empirical data, also in finance. Good parts It is freely
More informationApplied Regression Modeling: A Business Approach
i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming
More informationLoglinear and Logit Models for Contingency Tables
Chapter 8 Loglinear and Logit Models for Contingency Tables Loglinear models comprise another special case of generalized linear models designed for contingency tables of frequencies. They are most easily
More informationR Programming Basics - Useful Builtin Functions for Statistics
R Programming Basics - Useful Builtin Functions for Statistics Vectorized Arithmetic - most arthimetic operations in R work on vectors. Here are a few commonly used summary statistics. testvect = c(1,3,5,2,9,10,7,8,6)
More informationStatistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Exploratory regression and model selection
Statistical Modelling for Social Scientists Manchester University January 20, 21 and 24, 2011 Graeme Hutcheson, University of Manchester Exploratory regression and model selection The lecture notes, exercises
More informationGelman-Hill Chapter 3
Gelman-Hill Chapter 3 Linear Regression Basics In linear regression with a single independent variable, as we have seen, the fundamental equation is where ŷ bx 1 b0 b b b y 1 yx, 0 y 1 x x Bivariate Normal
More informationCH5: CORR & SIMPLE LINEAR REFRESSION =======================================
STAT 430 SAS Examples SAS5 ===================== ssh xyz@glue.umd.edu, tap sas913 (old sas82), sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm CH5: CORR & SIMPLE LINEAR REFRESSION =======================================
More informationEvaluating Classifiers
Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts
More informationISyE 6416 Basic Statistical Methods - Spring 2016 Bonus Project: Big Data Analytics Final Report. Team Member Names: Xi Yang, Yi Wen, Xue Zhang
ISyE 6416 Basic Statistical Methods - Spring 2016 Bonus Project: Big Data Analytics Final Report Team Member Names: Xi Yang, Yi Wen, Xue Zhang Project Title: Improve Room Utilization Introduction Problem
More informationFactorial ANOVA with SAS
Factorial ANOVA with SAS /* potato305.sas */ options linesize=79 noovp formdlim='_' ; title 'Rotten potatoes'; title2 ''; proc format; value tfmt 1 = 'Cool' 2 = 'Warm'; data spud; infile 'potato2.data'
More informationStatistical Models for Management. Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon. February 24 26, 2010
Statistical Models for Management Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon February 24 26, 2010 Graeme Hutcheson, University of Manchester Exploratory regression and model
More informationSTAT - Edit Scroll up the appropriate list to highlight the list name at the very top Press CLEAR, followed by the down arrow or ENTER
Entering/Editing Data Use arrows to scroll to the appropriate list and position Enter or edit data, pressing ENTER after each (including the last) Deleting Data (One Value at a Time) Use arrows to scroll
More informationMultistat2 1
Multistat2 1 2 Multistat2 3 Multistat2 4 Multistat2 5 Multistat2 6 This set of data includes technologically relevant properties for lactic acid bacteria isolated from Pasta Filata cheeses 7 8 A simple
More informationSome methods for the quantification of prediction uncertainties for digital soil mapping: Universal kriging prediction variance.
Some methods for the quantification of prediction uncertainties for digital soil mapping: Universal kriging prediction variance. Soil Security Laboratory 2018 1 Universal kriging prediction variance In
More informationDATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R
DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 25 Overview 1 Background to the book 2 A motivating example from my own
More information