Linear model in R. Valeska Andreozzi
Valeska Andreozzi
valeska.andreozzi at fc.ul.pt
Centro de Estatística e Aplicações da Universidade de Lisboa
Faculdade de Ciências da Universidade de Lisboa
Lisboa, 2012

Contents
1 Pearson correlation
2 Spearman correlation
3 Linear model
3.1 Example
3.2 Fitting the linear model
3.3 Formula
3.4 Summary
3.5 Confidence interval
3.6 Model comparison
3.7 Covariate selection
3.8 Residual analysis
3.9 Prediction
3.10 Effect plots
3.11 Extracting values
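1 Pearson correlation

> library(ISwR)
> data(thuesen)
> View(thuesen)
> plot(thuesen)

[Figure: scatterplot of the thuesen data, with axes blood.glucose and short.velocity]

> cor(thuesen$blood.glucose, thuesen$short.velocity)
[1] NA

The result is NA because the data contain a missing value. With use="complete.obs" only the complete pairs of observations enter the calculation:

> cor(thuesen$blood.glucose, thuesen$short.velocity, use="complete.obs")

Another example

> ?longley
> library(car)
> scatterplotMatrix(longley)
> cor(longley)

[Figure: scatterplot matrix of the longley variables GNP.deflator, GNP, Unemployed, Armed.Forces, Population, Year, Employed]
[Output: correlation matrix of the same seven variables]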
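The effect of use="complete.obs" can be reproduced on a small made-up example. This is only a sketch: the vectors x and y below are hypothetical values, not the thuesen measurements, and the manual formula is the textbook definition of the Pearson coefficient applied to the complete pairs.

```r
# Toy data (hypothetical values, not the thuesen measurements)
x <- c(1.2, 2.4, 3.1, 4.8, 5.0, 6.3)
y <- c(2.0, 2.9, NA, 5.1, 4.7, 6.8)

r_default  <- cor(x, y)                        # NA: one pair is incomplete
r_complete <- cor(x, y, use = "complete.obs")  # drops the incomplete pair

# Same coefficient from the definition, using only the complete pairs
ok <- complete.cases(x, y)
dx <- x[ok] - mean(x[ok])
dy <- y[ok] - mean(y[ok])
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

print(c(default = r_default, complete = r_complete, manual = r_manual))
```

The last two entries should agree, which suggests that complete.obs simply restricts the calculation to rows without missing values.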
2 Spearman correlation

> cor(thuesen$blood.glucose, thuesen$short.velocity,
+     use="complete.obs", method="spearman")

3 Linear model

Online references

3.1 Example

To identify factors associated with birth weight, researchers collected the variables listed in the table below. Use these data to fit a multiple linear regression and address the objective of the study.

> bp <- read.table("lowbwtdata.dat", header = TRUE)
> dim(bp)
> names(bp) <- tolower(names(bp))
> head(bp)
Description                                             Codes/Values                       Variable
Identification code                                     ID number                          ID
Low birth weight                                        1 = BWT <= 2500 g, 0 = BWT > 2500 g   LOW
Age of mother                                           Years                              AGE
Weight of mother at last menstrual period               Pounds                             LWT
Race                                                    1 = White, 2 = Black, 3 = Other    RACE
Smoking status during pregnancy                         0 = No, 1 = Yes                    SMOKE
History of premature labor                              0, 1, 2, ...                       PTL
History of hypertension                                 0 = No, 1 = Yes                    HT
Presence of uterine irritability                        0 = No, 1 = Yes                    UI
Number of physician visits during the first trimester   0, 1, 2, ...                       FTV
Birth weight                                            Grams                              BWT

[Output of head(bp): first rows of the columns id, low, age, lwt, race, smoke, ptl, ht, ui, ftv, bwt]

Telling R that the variables are categorical

> bp$race <- factor(bp$race)
> bp$smoke <- factor(bp$smoke)
> bp$ht <- factor(bp$ht)
> bp$ui <- factor(bp$ui)
> bp$low <- factor(bp$low)

To see which levels are the reference categories, use

> contrasts(bp$race)
> contrasts(bp$smoke)
> contrasts(bp$ht)
> contrasts(bp$ui)
> contrasts(bp$low)

Rescaling the response variable to kg

> bp$bwt <- bp$bwt/1000

Summary of the data

> summary(bp)

id:     Min. 4.0, Median 123.0, Mean 121.1, Max. 226.0
low:    0: 130, 1: 59
age:    Min. 14.00, Median 23.00, Mean 23.24, Max. 45.00
lwt:    3rd Qu. 140.0, Max. 250.0
race:   1: 96, 2: 26, 3: 67
smoke:  0: 115, 1: 74
ht:     0: 177, 1: 12
ui:     0: 161, 1: 28
bwt:    1st Qu. 2.414, Median 2.977, 3rd Qu. 3.475, Max. 4.990

> as.data.frame(table(bp$ptl))

[Output: frequency table of ptl, with columns Var1 and Freq]
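The factor() and contrasts() calls used earlier in this section can be tried on a small made-up variable. The sketch below assumes R's default treatment contrasts; the vector race is hypothetical and only mimics the 1/2/3 coding of the birth-weight data, and relevel() is shown as one possible way to pick a different reference category.

```r
# Hypothetical coded variable, mimicking race = 1, 2, 3
race <- factor(c(1, 2, 3, 1, 3, 2, 1))

levels(race)        # the first level is the reference under treatment contrasts
cm <- contrasts(race)
print(cm)           # rows = levels, columns = dummy variables; the all-zero row is the reference

# Make level "3" the reference category instead
race3ref <- relevel(race, ref = "3")
cm3 <- contrasts(race3ref)
print(cm3)
```

With level "3" as the reference, the coefficients of a fitted model would compare levels 1 and 2 against level 3 rather than levels 2 and 3 against level 1.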
> as.data.frame(table(bp$ftv))

[Output: frequency table of ftv, with columns Var1 and Freq]

3.2 Fitting the linear model

> bp.lm1 <- lm(bwt ~ age+lwt+race+ftv, data = bp)
> bp.lm1

Call:
lm(formula = bwt ~ age + lwt + race + ftv, data = bp)

[Output: coefficient estimates for (Intercept), age, lwt, race2, race3, ftv]

3.3 Formula

+    includes main effects, A+B
:    includes interactions, A:B
*    includes main effects and interactions, A*B = A+B+A:B
I()  includes mathematical terms, I(A^2)

Examples

> fit1 <- lm(bwt ~ age*race, data = bp)
> fit1

Call:
lm(formula = bwt ~ age * race, data = bp)

[Output: coefficient estimates for (Intercept), age, race2, race3, age:race2, age:race3]
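One way to see what each formula operator does is to look at the columns of the design matrix it generates. The sketch below uses a hypothetical data frame d with a numeric A and a two-level factor B; nothing here comes from the birth-weight data.

```r
# Hypothetical data: A numeric, B a two-level factor
d <- data.frame(A = c(1, 2, 3, 4), B = factor(c("u", "v", "u", "v")))

colnames(model.matrix(~ A + B, d))        # main effects only
colnames(model.matrix(~ A:B, d))          # interaction only
cols_star <- colnames(model.matrix(~ A * B, d))
print(cols_star)                          # main effects plus interaction
colnames(model.matrix(~ A + I(A^2), d))   # squared term protected by I()

# A * B and A + B + A:B expand to the same terms
same <- identical(attr(terms(~ A * B), "term.labels"),
                  attr(terms(~ A + B + A:B), "term.labels"))
print(same)
```

Without I(), the expression A^2 would be read as formula syntax (crossing A with itself) rather than as arithmetic, which is why the wrapper is needed for polynomial terms.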
> fit2 <- lm(bwt ~ age + I(age^2), data = bp)
> fit2

Call:
lm(formula = bwt ~ age + I(age^2), data = bp)

[Output: coefficient estimates for (Intercept), age, I(age^2)]

3.4 Summary

> summary(bp.lm1)

Call:
lm(formula = bwt ~ age + lwt + race + ftv, data = bp)

[Output: residual quantiles (Min, 1Q, Median, 3Q, Max) and coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)]

Coefficient   Signif.
(Intercept)   ***   (Pr(>|t|) of order 1e-13)
age
lwt           *
race2         **
race3         *
ftv
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error on 183 degrees of freedom
F-statistic on 5 and 183 DF

3.5 Confidence interval

> confint(bp.lm1)

[Output: 2.5 % and 97.5 % limits for (Intercept), age, lwt, race2, race3, ftv]
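The limits returned by confint() for a linear model are the usual t-based intervals, estimate plus or minus a t quantile times the standard error. The sketch below checks this on the built-in mtcars data, which is an assumption made only so the example runs without the birth-weight file.

```r
# mtcars ships with R; it stands in for the birth-weight data here
fit <- lm(mpg ~ wt + hp, data = mtcars)

est <- coef(fit)                          # point estimates
se  <- sqrt(diag(vcov(fit)))              # standard errors
tq  <- qt(0.975, df = df.residual(fit))   # t quantile for a 95% interval

manual  <- cbind(lower = est - tq * se, upper = est + tq * se)
builtin <- confint(fit)

print(manual)
print(all.equal(unname(manual), unname(builtin)))
```

A different confidence level can be requested with the level argument of confint(), in which case the quantile in the manual calculation changes accordingly.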
3.6 Model comparison

> fit0 <- lm(bwt ~ age+race, data = bp)
> fit1 <- lm(bwt ~ age*race, data = bp)
> anova(fit0, fit1, test="F")

Analysis of Variance Table

Model 1: bwt ~ age + race
Model 2: bwt ~ age * race

[Output: table with columns Res.Df, RSS, Df, Sum of Sq, F, Pr(>F)]

3.7 Covariate selection

Stepwise procedure

> bw.mod1 <- glm(bwt ~ age+lwt+race+smoke+ht+ftv, data = bp)
> summary(bw.mod1)

Call:
glm(formula = bwt ~ age + lwt + race + smoke + ht + ftv, data = bp)

[Output: deviance residual quantiles (Min, 1Q, Median, 3Q, Max) and coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)]

Coefficient   Signif.
(Intercept)   ***   (Pr(>|t|) of order 1e-15)
age
lwt           **
race2         **
race3         **
smoke1        ***
ht1           *
ftv
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family is estimated from the residuals)
Null deviance on 188 degrees of freedom
Residual deviance on 181 degrees of freedom
Number of Fisher Scoring iterations: 2
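Two points from this page can be checked on built-in data: the F statistic reported by anova() for nested models, and the fact that glm() with its default gaussian family gives the same coefficients as lm(). This is a sketch on mtcars, chosen only as a stand-in because the birth-weight file is not assumed to be available; the model formulas are hypothetical.

```r
# Nested models on mtcars (stand-in data)
small <- lm(mpg ~ wt + cyl, data = mtcars)
big   <- lm(mpg ~ wt * cyl, data = mtcars)
tab   <- anova(small, big, test = "F")

# F statistic from the residual sums of squares and degrees of freedom
rss0 <- deviance(small); df0 <- df.residual(small)
rss1 <- deviance(big);   df1 <- df.residual(big)
F_manual <- ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)
print(c(anova = tab$F[2], manual = F_manual))

# glm() defaults to family = gaussian, which reproduces the lm() fit
g <- glm(mpg ~ wt * cyl, data = mtcars)
print(all.equal(coef(g), coef(big)))
```

Because the gaussian glm is the same model as lm, the step() calls that follow would select the same terms if they were started from an lm object instead.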
> mod.both <- step(bw.mod1, direction="both")

At each step, step() lists the candidate moves in increasing order of AIC and takes the first one; it stops when <none> comes first.

Start:
bwt ~ age + lwt + race + smoke + ht + ftv
Candidate moves: - ftv, - age, <none>, - ht, - lwt, - smoke, - race

Step:
bwt ~ age + lwt + race + smoke + ht
Candidate moves: - age, <none>, + ftv, - ht, - lwt, - smoke, - race

Step:
bwt ~ lwt + race + smoke + ht
Candidate moves: <none>, + age, + ftv, - ht, - lwt, - smoke, - race

> mod.both

Call:
glm(formula = bwt ~ lwt + race + smoke + ht, data = bp)

[Output: coefficient estimates for (Intercept), lwt, race2, race3, smoke1, ht1]

Degrees of Freedom: 188 Total (i.e. Null); 183 Residual
Backward procedure

> mod.back <- step(bw.mod1, direction="backward")

Start:
bwt ~ age + lwt + race + smoke + ht + ftv
Candidate moves: - ftv, - age, <none>, - ht, - lwt, - smoke, - race

Step:
bwt ~ age + lwt + race + smoke + ht
Candidate moves: - age, <none>, - ht, - lwt, - smoke, - race

Step:
bwt ~ lwt + race + smoke + ht
Candidate moves: <none>, - ht, - lwt, - smoke, - race

> mod.back

Call:
glm(formula = bwt ~ lwt + race + smoke + ht, data = bp)

[Output: coefficient estimates for (Intercept), lwt, race2, race3, smoke1, ht1]

Degrees of Freedom: 188 Total (i.e. Null); 183 Residual
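The stepwise and backward searches above can be rerun on any data set. The sketch below uses mtcars as a stand-in (an assumption, since the birth-weight file is not bundled with R) and a hypothetical starting formula; trace = 0 suppresses the step-by-step listing so that only the selected model is inspected.

```r
# Full model on stand-in data; the choice of covariates is hypothetical
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

back <- step(full, direction = "backward", trace = 0)  # only removals
both <- step(full, direction = "both", trace = 0)      # removals and re-additions

print(formula(back))
print(formula(both))

# extractAIC() returns the criterion that step() minimises (second element)
aic <- c(full = extractAIC(full)[2],
         backward = extractAIC(back)[2],
         both = extractAIC(both)[2])
print(aic)
```

The selected models can never have a larger AIC than the starting model, because step() only moves when a candidate improves the criterion.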
Forward procedure

> bw.nulo <- glm(bwt ~ 1, data = bp)
> mod.forw <- step(bw.nulo, scope=list(upper=~age+lwt+race+smoke+ht+ftv),
+                  direction="forward")

Start:
bwt ~ 1
Candidate moves: + race, + smoke, + lwt, + ht, <none>, + age, + ftv

Step:
bwt ~ race
Candidate moves: + smoke, + lwt, + ht, <none>, + age, + ftv

Step:
bwt ~ race + smoke
Candidate moves: + lwt, + ht, <none>, + ftv, + age

Step:
bwt ~ race + smoke + lwt
Candidate moves: + ht, <none>, + age, + ftv

Step:
bwt ~ race + smoke + lwt + ht
Candidate moves: <none>, + age, + ftv

> mod.forw

Call:
glm(formula = bwt ~ race + smoke + lwt + ht, data = bp)

[Output: coefficient estimates for (Intercept), race2, race3, smoke1, lwt, ht1]

Degrees of Freedom: 188 Total (i.e. Null); 183 Residual

3.8 Residual analysis

Computing the residuals

> res <- rstandard(mod.both, type="deviance")
> layout(matrix(c(1,2,3,4),2,2))
> plot(mod.both)

3.9 Prediction

Consider the model

> fit <- lm(bwt~lwt+race+smoke+ht, data=bp)
> summary(fit)

Call:
lm(formula = bwt ~ lwt + race + smoke + ht, data = bp)

[Output: residual quantiles (Min, 1Q, Median, 3Q, Max) and coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)]

Coefficient   Signif.
(Intercept)   ***   (Pr(>|t|) < 2e-16)
lwt           **
race2         **
race3         **
smoke1        ***
ht1           *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error on 183 degrees of freedom
F-statistic on 5 and 183 DF, p-value: 9.859e-07

Consider a new observation with the following characteristics:

lwt=170
race=3
smoke=1
ht=1

What is the expected birth weight of a child whose mother has the characteristics above?

> new <- data.frame(lwt=170, race="3", smoke="1", ht="1")
> predict(fit, new, se.fit = TRUE)

[Output: list with components $fit, $se.fit, $df and $residual.scale]

$df
[1] 183

Obtain a 95% prediction interval for this new child.

> pred.w.plim <- predict(fit, new, interval="prediction")
> pred.w.plim

[Output: columns fit, lwr, upr]
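The call above uses interval="prediction", which targets a single new observation and is therefore wider than an interval for the mean response at the same covariate values. The sketch below shows both kinds of interval and rebuilds them from the components returned with se.fit = TRUE. It uses mtcars and a hypothetical new car with wt = 3 as stand-ins for the birth-weight model.

```r
fit <- lm(mpg ~ wt, data = mtcars)   # stand-in model on built-in data
new <- data.frame(wt = 3)            # hypothetical new observation

cint <- predict(fit, new, interval = "confidence")  # mean response
pint <- predict(fit, new, interval = "prediction")  # single new observation

# Rebuild both intervals from the pieces returned by se.fit = TRUE
p  <- predict(fit, new, se.fit = TRUE)
tq <- qt(0.975, df = p$df)
ci_manual <- as.numeric(p$fit) + c(-1, 1) * tq * p$se.fit
pi_manual <- as.numeric(p$fit) + c(-1, 1) * tq * sqrt(p$se.fit^2 + p$residual.scale^2)

print(rbind(confidence = cint[1, ], prediction = pint[1, ]))
```

The prediction interval adds the residual variance to the variance of the fitted mean, so it stays wide even when the mean is estimated very precisely.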
Obtain a 95% confidence interval for the mean birth weight of children whose mothers have these same characteristics.

> pred.w.clim <- predict(fit, new, interval="confidence")
> pred.w.clim

[Output: columns fit, lwr, upr]

3.10 Effect plots

> library(effects)
> plot(effect("lwt", fit))

[Figure: lwt effect plot, bwt against lwt]

> plot(effect("race", fit))

[Figure: race effect plot, bwt against race]

> plot(allEffects(fit), ask=FALSE)

[Figure: effect plots for lwt, race, smoke and ht, each showing bwt against the covariate]
3.11 Extracting values

> fitted(fit)        # fitted values
> coefficients(fit)  # model coefficients
> names(fit)         # lists the names of the components of the model object fit
> is.list(fit)
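Since a fitted model is a list, its components can be reached either through extractor functions or with the $ operator. The sketch below illustrates this on the built-in trees data, used here only as a stand-in for the birth-weight model; the formula is hypothetical.

```r
fit <- lm(Volume ~ Girth + Height, data = trees)  # stand-in model

print(is.list(fit))   # a fitted lm object is a list
print(names(fit))     # component names, e.g. "coefficients", "residuals"

# Extractor functions and direct component access give the same values
same_coef <- isTRUE(all.equal(coef(fit), fit$coefficients))

# Fitted values plus residuals reproduce the observed response
rebuilt <- unname(fitted(fit) + residuals(fit))
same_y  <- isTRUE(all.equal(rebuilt, trees$Volume))

# Quantities computed by summary() are also stored in a list
r2 <- summary(fit)$r.squared
print(c(same_coef = same_coef, same_y = same_y))
print(r2)
```

Extractor functions such as coef(), fitted() and residuals() are generally preferable to $ access, because they keep working across model classes.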
More informationVariable selection is intended to select the best subset of predictors. But why bother?
Chapter 10 Variable Selection Variable selection is intended to select the best subset of predictors. But why bother? 1. We want to explain the data in the simplest way redundant predictors should be removed.
More informationApplied Statistics and Econometrics Lecture 6
Applied Statistics and Econometrics Lecture 6 Giuseppe Ragusa Luiss University gragusa@luiss.it http://gragusa.org/ March 6, 2017 Luiss University Empirical application. Data Italian Labour Force Survey,
More informationTHE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)
THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533 MIDTERM EXAMINATION: October 14, 2005 Instructor: Val LeMay Time: 50 minutes 40 Marks FRST 430 50 Marks FRST 533 (extra questions) This examination
More informationIntroduction to R, Github and Gitlab
Introduction to R, Github and Gitlab 27/11/2018 Pierpaolo Maisano Delser mail: maisanop@tcd.ie ; pm604@cam.ac.uk Outline: Why R? What can R do? Basic commands and operations Data analysis in R Github and
More informationSection 2.1: Intro to Simple Linear Regression & Least Squares
Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:
More informationStatistical Analysis in R Guest Lecturer: Maja Milosavljevic January 28, 2015
Statistical Analysis in R Guest Lecturer: Maja Milosavljevic January 28, 2015 Data Exploration Import Relevant Packages: library(grdevices) library(graphics) library(plyr) library(hexbin) library(base)
More informationR Workshop Guide. 1 Some Programming Basics. 1.1 Writing and executing code in R
R Workshop Guide This guide reviews the examples we will cover in today s workshop. It should be a helpful introduction to R, but for more details, you can access a more extensive user guide for R on the
More informationLearn Sphinx Documentation Documentation
Learn Sphinx Documentation Documentation Release 0.0.1 Lucas Simon Rodrigues Magalhaes January 31, 2014 Contents 1 Negrito e italico 1 2 Listas 3 3 Titulos 5 4 H1 Titulo 7 4.1 H2 Sub-Titulo.............................................
More information1. Introduction. Ciampi 45
From: KDD-95 Proceedings. Copyright 1995, AAAI (www.aaai.org). All rights reserved. Designing Neural Networks from Statistical Models: A new approach to data exploration Antonio Ciampi* and Yves Lechevallier**
More information