Linear model in R


Valeska Andreozzi
valeska.andreozzi at fc.ul.pt
Centro de Estatística e Aplicações da Universidade de Lisboa
Faculdade de Ciências da Universidade de Lisboa
Lisbon, 2012

Contents
1 Pearson correlation
2 Spearman correlation
3 Linear model
  3.1 Example
  3.2 Fitting the linear model
  3.3 Formula
  3.4 Summary
  3.5 Confidence interval
  3.6 Model comparison
  3.7 Covariate selection
  3.8 Residual analysis
  3.9 Prediction
  3.10 Effect plots
  3.11 Extracting values

1 Pearson correlation

> library(ISwR)
> data(thuesen)
> View(thuesen)
> plot(thuesen)

[Figure: scatterplot of short.velocity against blood.glucose]

> cor(thuesen$blood.glucose, thuesen$short.velocity)
[1] NA
> cor(thuesen$blood.glucose, thuesen$short.velocity, use="complete.obs")
[1]

The first call returns NA because the data contain missing values; use="complete.obs" computes the correlation from the complete pairs only.

Another example

> ?longley
> library(car)
> scatterplotMatrix(longley)

[Figure: scatterplot matrix of the longley variables GNP.deflator, GNP, Unemployed, Armed.Forces, Population, Year and Employed]

> cor(longley)

[Output: 7 × 7 correlation matrix of GNP.deflator, GNP, Unemployed, Armed.Forces, Population, Year and Employed]

2 Spearman correlation

> cor(thuesen$blood.glucose, thuesen$short.velocity,
+     use="complete.obs", method="spearman")
[1]

3 Linear model

3.1 Example

With the aim of identifying factors associated with birth weight, researchers collected the information described in the variable table that follows. Use these data to fit a multiple linear regression model and answer the study's question.

> bp <- read.table("lowbwtdata.dat", header = TRUE)
> dim(bp)
[1] 189  11
> names(bp) <- tolower(names(bp))
> head(bp)
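If lowbwtdata.dat is not at hand, a possible substitute (an assumption, not the author's setup) is the birthwt data frame from the recommended MASS package, which comes from the same Hosmer and Lemeshow birth-weight study. It has the same variables in lower case but no id column, and we have not checked that every value matches the tutorial's file. The sketch below builds an equivalent bp object and applies the same preparation steps used later in the text.

```r
# Sketch, assuming the recommended package MASS is installed: its birthwt
# data frame comes from the same Hosmer and Lemeshow birth-weight study.
library(MASS)

bp <- birthwt                      # no 'id' column in this version

# Declare the categorical variables as factors
for (v in c("race", "smoke", "ht", "ui", "low")) {
  bp[[v]] <- factor(bp[[v]])
}

bp$bwt <- bp$bwt / 1000            # grams -> kilograms

dim(bp)                            # 189 rows, 10 columns
contrasts(bp$race)                 # level "1" (White) is the reference
```

With treatment contrasts (R's default for unordered factors), each non-reference level gets its own indicator column, which is why the fitted models below report coefficients named race2 and race3.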

Variable descriptions

Description                                  Codes/Values               Variable
Identification Code                          ID Number                  ID
Low Birth Weight                             1 = BWT <= 2500 g          LOW
                                             0 = BWT > 2500 g
Age of Mother                                Years                      AGE
Weight of Mother at Last Menstrual Period    Pounds                     LWT
Race                                         1 = White, 2 = Black,      RACE
                                             3 = Other
Smoking Status During Pregnancy              0 = No, 1 = Yes            SMOKE
History of Premature Labor                   0, 1, 2, ...               PTL
History of Hypertension                      0 = No, 1 = Yes            HT
Presence of Uterine Irritability             0 = No, 1 = Yes            UI
Number of Physician Visits During            0, 1, 2, ...               FTV
  the First Trimester
Birth Weight                                 Grams                      BWT

Telling R that the variables are categorical

> bp$race <- factor(bp$race)
> bp$smoke <- factor(bp$smoke)
> bp$ht <- factor(bp$ht)
> bp$ui <- factor(bp$ui)
> bp$low <- factor(bp$low)

To see which levels are the reference categories:

> contrasts(bp$race)
> contrasts(bp$smoke)

> contrasts(bp$ht)
> contrasts(bp$ui)
> contrasts(bp$low)

Rescaling the response variable to kg

> bp$bwt <- bp$bwt/1000

Data summary

> summary(bp)
       id          low         age             lwt           race   smoke
 Min.   :  4.0   0:130   Min.   :14.00   Min.   :         1:96   0:115
 1st Qu.:        1: 59   1st Qu.:        1st Qu.:         2:26   1: 74
 Median :123.0           Median :23.00   Median :         3:67
 Mean   :121.1           Mean   :23.24   Mean   :
 3rd Qu.:                3rd Qu.:        3rd Qu.:140.0
 Max.   :226.0           Max.   :45.00   Max.   :250.0
      ptl          ht      ui          ftv              bwt
 Min.   :      0:177   0:161   Min.   :         Min.   :
 1st Qu.:      1: 12   1: 28   1st Qu.:         1st Qu.:2.414
 Median :                      Median :         Median :2.977
 Mean   :                      Mean   :         Mean   :
 3rd Qu.:                      3rd Qu.:         3rd Qu.:3.475
 Max.   :                      Max.   :         Max.   :4.990

> as.data.frame(table(bp$ptl))

  Var1 Freq

> as.data.frame(table(bp$ftv))
  Var1 Freq

3.2 Fitting the linear model

> bp.lm1 <- lm(bwt ~ age+lwt+race+ftv, data = bp)
> bp.lm1

Call:
lm(formula = bwt ~ age + lwt + race + ftv, data = bp)

(Intercept)          age          lwt        race2        race3          ftv

3.3 Formula

Operator  Use                                       Example
+         include main effects                      A + B
:         include interactions                      A:B
*         include main effects and interactions     A*B = A + B + A:B
I()       include arithmetic (mathematical) terms   I(A^2)

Examples

> fit1 <- lm(bwt ~ age*race, data = bp)
> fit1

Call:
lm(formula = bwt ~ age * race, data = bp)

(Intercept)          age        race2        race3    age:race2    age:race3
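One way to check how these operators expand is to look at the columns of the design matrix that lm() builds. The sketch below assumes MASS::birthwt as a stand-in for lowbwtdata.dat; it confirms that age*race fits the same model as age + race + age:race, and shows why a power must be wrapped in I().

```r
# Sketch, assuming MASS::birthwt as a stand-in for lowbwtdata.dat.
library(MASS)
bp <- birthwt
bp$race <- factor(bp$race)
bp$bwt  <- bp$bwt / 1000

# A*B is shorthand for A + B + A:B
fit_star     <- lm(bwt ~ age * race, data = bp)
fit_explicit <- lm(bwt ~ age + race + age:race, data = bp)
colnames(model.matrix(fit_star))
all.equal(coef(fit_star), coef(fit_explicit))   # TRUE: same design matrix

# Inside a formula ^ means crossing, so age^2 collapses to age;
# I() makes R evaluate the power arithmetically instead
fit_plain <- lm(bwt ~ age + age^2, data = bp)
fit_quad  <- lm(bwt ~ age + I(age^2), data = bp)
names(coef(fit_plain))   # "(Intercept)" "age"
names(coef(fit_quad))    # "(Intercept)" "age" "I(age^2)"
```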

> fit2 <- lm(bwt ~ age + I(age^2), data = bp)
> fit2

Call:
lm(formula = bwt ~ age + I(age^2), data = bp)

(Intercept)          age     I(age^2)

3.4 Summary

> summary(bp.lm1)

Call:
lm(formula = bwt ~ age + lwt + race + ftv, data = bp)

Residuals:
    Min      1Q  Median      3Q     Max

            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-13 ***
age
lwt                                             *
race2                                          **
race3                                           *
ftv
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 183 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 5 and 183 DF,  p-value:

3.5 Confidence interval

> confint(bp.lm1)
               2.5 %   97.5 %
(Intercept)
age
lwt
race2
race3
ftv
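For an lm fit, each interval returned by confint() is the estimate plus or minus the 97.5% quantile of a t distribution with n - p residual degrees of freedom times the standard error. A minimal check of that formula, again assuming MASS::birthwt as a stand-in for the tutorial's file:

```r
# Sketch, assuming MASS::birthwt as a stand-in for lowbwtdata.dat.
library(MASS)
bp <- birthwt
bp$race <- factor(bp$race)
bp$bwt  <- bp$bwt / 1000

bp.lm1 <- lm(bwt ~ age + lwt + race + ftv, data = bp)

est <- coef(bp.lm1)
se  <- sqrt(diag(vcov(bp.lm1)))               # standard errors
tq  <- qt(0.975, df = df.residual(bp.lm1))    # 189 - 6 = 183 residual df
manual <- cbind(lower = est - tq * se, upper = est + tq * se)

manual
confint(bp.lm1)                               # same numbers
```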

3.6 Model comparison

> fit0 <- lm(bwt ~ age+race, data = bp)
> fit1 <- lm(bwt ~ age*race, data = bp)
> anova(fit0, fit1, test="F")
Analysis of Variance Table

Model 1: bwt ~ age + race
Model 2: bwt ~ age * race
  Res.Df RSS Df Sum of Sq F Pr(>F)

3.7 Covariate selection

Stepwise procedure

> bw.mod1 <- glm(bwt ~ age+lwt+race+smoke+ht+ftv, data = bp)
> summary(bw.mod1)

Call:
glm(formula = bwt ~ age + lwt + race + smoke + ht + ftv, data = bp)

Deviance Residuals:
    Min      1Q  Median      3Q     Max

            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-15 ***
age
lwt                                             **
race2                                           **
race3                                           **
smoke1                                         ***
ht1                                              *
ftv
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be )

    Null deviance:  on 188 degrees of freedom
Residual deviance:  on 181 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 2
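bw.mod1 is fitted with glm() rather than lm(). With the default gaussian family and identity link, glm() gives the same least-squares estimates as lm(), its deviance is the residual sum of squares, and the dispersion parameter in its summary is the residual variance (the square of lm's residual standard error). A sketch checking this, assuming MASS::birthwt as a stand-in for the tutorial's data:

```r
# Sketch, assuming MASS::birthwt as a stand-in for lowbwtdata.dat.
library(MASS)
bp <- birthwt
for (v in c("race", "smoke", "ht")) bp[[v]] <- factor(bp[[v]])
bp$bwt <- bp$bwt / 1000

f <- bwt ~ age + lwt + race + smoke + ht + ftv
bw.mod1 <- glm(f, data = bp)     # family = gaussian() by default
bw.lm   <- lm(f, data = bp)

all.equal(coef(bw.mod1), coef(bw.lm))
c(dispersion = summary(bw.mod1)$dispersion, sigma2 = sigma(bw.lm)^2)
deviance(bw.mod1)                # residual sum of squares
df.residual(bw.mod1)             # 189 - 8 = 181
```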

> mod.both <- step(bw.mod1, direction="both")
Start:  AIC=
bwt ~ age + lwt + race + smoke + ht + ftv

- ftv
- age
<none>
- ht
- lwt
- smoke
- race

Step:  AIC=
bwt ~ age + lwt + race + smoke + ht

- age
<none>
+ ftv
- ht
- lwt
- smoke
- race

Step:  AIC=
bwt ~ lwt + race + smoke + ht

<none>
+ age
+ ftv
- ht
- lwt
- smoke
- race

> mod.both

Call:  glm(formula = bwt ~ lwt + race + smoke + ht, data = bp)

(Intercept)          lwt        race2        race3       smoke1          ht1

Degrees of Freedom: 188 Total (i.e. Null);  183 Residual
Null Deviance:
Residual Deviance:   AIC:
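The AIC that step() reports for these gaussian glm fits is -2 times the maximized log-likelihood plus 2k, where the variance is estimated by maximum likelihood as RSS/n and k counts the regression coefficients plus that variance. The sketch below recomputes it for the selected model, assuming MASS::birthwt as a stand-in for the tutorial's data.

```r
# Sketch, assuming MASS::birthwt as a stand-in for lowbwtdata.dat.
library(MASS)
bp <- birthwt
for (v in c("race", "smoke", "ht")) bp[[v]] <- factor(bp[[v]])
bp$bwt <- bp$bwt / 1000

fit <- glm(bwt ~ lwt + race + smoke + ht, data = bp)

n   <- nobs(fit)
rss <- deviance(fit)                 # gaussian deviance = residual sum of squares
p   <- length(coef(fit))             # 6 regression coefficients
# -2 log-likelihood at sigma^2 = RSS/n, plus 2 per parameter (+1 for the variance)
aic_manual <- n * (log(2 * pi * rss / n) + 1) + 2 * (p + 1)

c(manual = aic_manual, AIC = AIC(fit), extractAIC = extractAIC(fit)[2])
```

extractAIC() is the function step() uses internally; for a glm it returns the same value as AIC().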

Backward procedure

> mod.back <- step(bw.mod1, direction="backward")
Start:  AIC=
bwt ~ age + lwt + race + smoke + ht + ftv

- ftv
- age
<none>
- ht
- lwt
- smoke
- race

Step:  AIC=
bwt ~ age + lwt + race + smoke + ht

- age
<none>
- ht
- lwt
- smoke
- race

Step:  AIC=
bwt ~ lwt + race + smoke + ht

<none>
- ht
- lwt
- smoke
- race

> mod.back

Call:  glm(formula = bwt ~ lwt + race + smoke + ht, data = bp)

(Intercept)          lwt        race2        race3       smoke1          ht1

Degrees of Freedom: 188 Total (i.e. Null);  183 Residual
Null Deviance:
Residual Deviance:   AIC:
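Backward elimination as shown above can be written as a short loop: at each stage drop1() gives the AIC of every single-term deletion, the term whose removal yields the lowest AIC is dropped, and the loop stops when keeping the current model (<none>) is best. The helper backward_aic below is a hypothetical, simplified sketch (no tie handling), assuming MASS::birthwt as a stand-in for the tutorial's data; step() remains the tool to use in practice.

```r
# Sketch, assuming MASS::birthwt as a stand-in for lowbwtdata.dat.
library(MASS)
bp <- birthwt
for (v in c("race", "smoke", "ht")) bp[[v]] <- factor(bp[[v]])
bp$bwt <- bp$bwt / 1000

# Hypothetical helper: greedy backward elimination on AIC via drop1()
backward_aic <- function(fit) {
  repeat {
    tab  <- drop1(fit)                        # rows: "<none>" and each droppable term
    best <- rownames(tab)[which.min(tab$AIC)]
    if (best == "<none>") return(fit)         # no deletion lowers the AIC
    fit <- update(fit, as.formula(paste(". ~ . -", best)))
  }
}

bw.mod1  <- glm(bwt ~ age + lwt + race + smoke + ht + ftv, data = bp)
mod.loop <- backward_aic(bw.mod1)
mod.step <- step(bw.mod1, direction = "backward", trace = 0)

formula(mod.loop)
formula(mod.step)
```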

Forward procedure

> bw.nulo <- glm(bwt ~ 1, data = bp)
> mod.forw <- step(bw.nulo, scope=list(upper=~age+lwt+race+smoke+ht+ftv),
+                  direction="forward")
Start:  AIC=
bwt ~ 1

+ race
+ smoke
+ lwt
+ ht
<none>
+ age
+ ftv

Step:  AIC=
bwt ~ race

+ smoke
+ lwt
+ ht
<none>
+ age
+ ftv

Step:  AIC=
bwt ~ race + smoke

+ lwt
+ ht
<none>
+ ftv
+ age

Step:  AIC=
bwt ~ race + smoke + lwt

+ ht
<none>
+ age
+ ftv

Step:  AIC=
bwt ~ race + smoke + lwt + ht

<none>
+ age
+ ftv

> mod.forw

Call:  glm(formula = bwt ~ race + smoke + lwt + ht, data = bp)

(Intercept)        race2        race3       smoke1          lwt          ht1

Degrees of Freedom: 188 Total (i.e. Null);  183 Residual
Null Deviance:
Residual Deviance:   AIC:

3.8 Residual analysis

Computing the residuals

> res <- rstandard(mod.both, type="deviance")
> layout(matrix(c(1,2,3,4),2,2))
> plot(mod.both)

3.9 Prediction

Consider the model

> fit <- lm(bwt~lwt+race+smoke+ht, data=bp)
> summary(fit)

Call:
lm(formula = bwt ~ lwt + race + smoke + ht, data = bp)

Residuals:
    Min      1Q  Median      3Q     Max

            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              < 2e-16 ***
lwt                                           **
race2                                         **
race3                                         **
smoke1                                       ***
ht1                                            *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 183 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 5 and 183 DF,  p-value: 9.859e-07

Consider a new observation with the following characteristics: lwt = 170, race = 3, smoke = 1, ht = 1. What is the expected birth weight of a child whose mother has these characteristics?

> new <- data.frame(lwt=170, race="3", smoke="1", ht="1")
> predict(fit, new, se.fit = TRUE)
$fit

$se.fit
[1]

$df
[1] 183

$residual.scale
[1]

Obtain a 95% prediction interval for this new child.

> pred.w.plim <- predict(fit, new, interval="prediction")
> pred.w.plim
  fit lwr upr
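The components returned by predict(..., se.fit = TRUE) are enough to rebuild both intervals used in this section. The confidence interval for the mean uses only se.fit, while the prediction interval for a single new child also adds the residual variance, giving a standard error of sqrt(se.fit^2 + residual.scale^2). A sketch with the new observation from the text, assuming MASS::birthwt as a stand-in for the tutorial's data:

```r
# Sketch, assuming MASS::birthwt as a stand-in for lowbwtdata.dat.
library(MASS)
bp <- birthwt
for (v in c("race", "smoke", "ht")) bp[[v]] <- factor(bp[[v]])
bp$bwt <- bp$bwt / 1000

fit <- lm(bwt ~ lwt + race + smoke + ht, data = bp)
new <- data.frame(lwt = 170, race = "3", smoke = "1", ht = "1")

p  <- predict(fit, new, se.fit = TRUE)
tq <- qt(0.975, df = p$df)                          # 183 residual df

se_pred   <- sqrt(p$se.fit^2 + p$residual.scale^2)  # one new child
pi_manual <- p$fit + c(-1, 1) * tq * se_pred
ci_manual <- p$fit + c(-1, 1) * tq * p$se.fit       # mean of such children

pi_manual
predict(fit, new, interval = "prediction")
ci_manual
predict(fit, new, interval = "confidence")
```

The extra residual.scale^2 term is why the prediction interval is always wider than the confidence interval for the same covariate values.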

Obtain a 95% confidence interval for the mean birth weight of children whose mothers have these same characteristics.

> pred.w.clim <- predict(fit, new, interval="confidence")
> pred.w.clim
  fit lwr upr

3.10 Effect plots

> library(effects)
> plot(effect("lwt", fit))

[Figure: lwt effect plot (bwt against lwt)]

> plot(effect("race", fit))

[Figure: race effect plot (bwt against race)]

> plot(allEffects(fit), ask=FALSE)

[Figure: effect plots for lwt, race, smoke and ht, each with bwt on the vertical axis]
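To see what the lwt effect plot displays, the sketch below computes that curve by hand for the additive model, varying lwt over a grid of five values (an arbitrary choice) while holding every other column of the model matrix at its sample mean, so the factor indicators act as level proportions. That this matches the effects package's default is our understanding, not something verified here; check ?Effect. MASS::birthwt is again assumed as a stand-in for the tutorial's data.

```r
# Sketch, assuming MASS::birthwt as a stand-in for lowbwtdata.dat and
# (unverified) that effects holds other regressors at their means by default.
library(MASS)
bp <- birthwt
for (v in c("race", "smoke", "ht")) bp[[v]] <- factor(bp[[v]])
bp$bwt <- bp$bwt / 1000

fit <- lm(bwt ~ lwt + race + smoke + ht, data = bp)

X    <- model.matrix(fit)
xbar <- colMeans(X)                          # other regressors held at their means
grid <- seq(min(bp$lwt), max(bp$lwt), length.out = 5)

# One design row per grid value, varying only the lwt column
Xg <- matrix(xbar, nrow = length(grid), ncol = length(xbar),
             byrow = TRUE, dimnames = list(NULL, names(xbar)))
Xg[, "lwt"] <- grid
eff <- drop(Xg %*% coef(fit))                # fitted bwt along the lwt axis

cbind(lwt = grid, bwt = eff)
```

Because the model has no interactions, the curve is a straight line with slope coef(fit)["lwt"], and at the mean of lwt it passes through the mean birth weight.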

3.11 Extracting values

> fitted(fit)        # fitted values
> coefficients(fit)  # model coefficients
> names(fit)         # lists the names of the components of the model object fit
> is.list(fit)
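A few more accessors in the same spirit, with checks of how they relate: the fitted values plus the residuals reproduce the response, and since an lm object is a list, the components listed by names(fit) can also be reached with $. As elsewhere, MASS::birthwt is assumed as a stand-in for the tutorial's data.

```r
# Sketch, assuming MASS::birthwt as a stand-in for lowbwtdata.dat.
library(MASS)
bp <- birthwt
for (v in c("race", "smoke", "ht")) bp[[v]] <- factor(bp[[v]])
bp$bwt <- bp$bwt / 1000

fit <- lm(bwt ~ lwt + race + smoke + ht, data = bp)

head(fitted(fit))            # fitted values
coefficients(fit)            # same as coef(fit) and fit$coefficients
head(residuals(fit))         # raw residuals, y - fitted
sigma(fit)                   # residual standard error
summary(fit)$r.squared       # R-squared, taken from the summary object
summary(fit)$coefficients    # estimates, SEs, t values and p-values as a matrix

is.list(fit)                 # TRUE, so fit$residuals etc. also work
```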


More information

CH5: CORR & SIMPLE LINEAR REFRESSION =======================================

CH5: CORR & SIMPLE LINEAR REFRESSION ======================================= STAT 430 SAS Examples SAS5 ===================== ssh xyz@glue.umd.edu, tap sas913 (old sas82), sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm CH5: CORR & SIMPLE LINEAR REFRESSION =======================================

More information

ITSx: Policy Analysis Using Interrupted Time Series

ITSx: Policy Analysis Using Interrupted Time Series ITSx: Policy Analysis Using Interrupted Time Series Week 5 Slides Michael Law, Ph.D. The University of British Columbia COURSE OVERVIEW Layout of the weeks 1. Introduction, setup, data sources 2. Single

More information

BayesFactor Examples

BayesFactor Examples BayesFactor Examples Michael Friendly 04 Dec 2015 The BayesFactor package enables the computation of Bayes factors in standard designs, such as one- and two- sample designs, ANOVA designs, and regression.

More information

Unit 5 Logistic Regression Practice Problems

Unit 5 Logistic Regression Practice Problems Unit 5 Logistic Regression Practice Problems SOLUTIONS R Users Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004. Exercises

More information

Repeated Measures Part 4: Blood Flow data

Repeated Measures Part 4: Blood Flow data Repeated Measures Part 4: Blood Flow data /* bloodflow.sas */ options linesize=79 pagesize=100 noovp formdlim='_'; title 'Two within-subjecs factors: Blood flow data (NWK p. 1181)'; proc format; value

More information

run ld50 /* Plot the onserved proportions and the fitted curve */ DATA SETR1 SET SETR1 PROB=X1/(X1+X2) /* Use this to create graphs in Windows */ gopt

run ld50 /* Plot the onserved proportions and the fitted curve */ DATA SETR1 SET SETR1 PROB=X1/(X1+X2) /* Use this to create graphs in Windows */ gopt /* This program is stored as bliss.sas */ /* This program uses PROC LOGISTIC in SAS to fit models with logistic, probit, and complimentary log-log link functions to the beetle mortality data collected

More information

Discussion Notes 3 Stepwise Regression and Model Selection

Discussion Notes 3 Stepwise Regression and Model Selection Discussion Notes 3 Stepwise Regression and Model Selection Stepwise Regression There are many different commands for doing stepwise regression. Here we introduce the command step. There are many arguments

More information

Nina Zumel and John Mount Win-Vector LLC

Nina Zumel and John Mount Win-Vector LLC SUPERVISED LEARNING IN R: REGRESSION Logistic regression to predict probabilities Nina Zumel and John Mount Win-Vector LLC Predicting Probabilities Predicting whether an event occurs (yes/no): classification

More information

Quantitative Methods in Management

Quantitative Methods in Management Quantitative Methods in Management MBA Glasgow University March 20-23, 2009 Luiz Moutinho, University of Glasgow Graeme Hutcheson, University of Manchester Exploratory Regression The lecture notes, exercises

More information

Organizing data in R. Fitting Mixed-Effects Models Using the lme4 Package in R. R packages. Accessing documentation. The Dyestuff data set

Organizing data in R. Fitting Mixed-Effects Models Using the lme4 Package in R. R packages. Accessing documentation. The Dyestuff data set Fitting Mixed-Effects Models Using the lme4 Package in R Deepayan Sarkar Fred Hutchinson Cancer Research Center 18 September 2008 Organizing data in R Standard rectangular data sets (columns are variables,

More information

GxE.scan. October 30, 2018

GxE.scan. October 30, 2018 GxE.scan October 30, 2018 Overview GxE.scan can process a GWAS scan using the snp.logistic, additive.test, snp.score or snp.matched functions, whereas snp.scan.logistic only calls snp.logistic. GxE.scan

More information

Model selection. Peter Hoff. 560 Hierarchical modeling. Statistics, University of Washington 1/41

Model selection. Peter Hoff. 560 Hierarchical modeling. Statistics, University of Washington 1/41 1/41 Model selection 560 Hierarchical modeling Peter Hoff Statistics, University of Washington /41 Modeling choices Model: A statistical model is a set of probability distributions for your data. In HLM,

More information

More data analysis examples

More data analysis examples More data analysis examples R packages used library(ggplot2) library(tidyr) library(mass) library(leaps) library(dplyr) ## ## Attaching package: dplyr ## The following object is masked from package:mass

More information

The theory of the linear model 41. Theorem 2.5. Under the strong assumptions A3 and A5 and the hypothesis that

The theory of the linear model 41. Theorem 2.5. Under the strong assumptions A3 and A5 and the hypothesis that The theory of the linear model 41 Theorem 2.5. Under the strong assumptions A3 and A5 and the hypothesis that E(Y X) =X 0 b 0 0 the F-test statistic follows an F-distribution with (p p 0, n p) degrees

More information

IQR = number. summary: largest. = 2. Upper half: Q3 =

IQR = number. summary: largest. = 2. Upper half: Q3 = Step by step box plot Height in centimeters of players on the 003 Women s Worldd Cup soccer team. 157 1611 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180 Determine the 5 number

More information

Optimization Models for Capacitated Clustering Problems

Optimization Models for Capacitated Clustering Problems Optimization Models for Capacitated Clustering Problems Marcos Negreiros, Pablo Batista, João Amilcar Rodrigues Universidade Estadual do Ceará (UECE) Mestrado Profissional em Computação Aplicada MPCOMP/UECE-IFCE

More information

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business Section 3.4: Diagnostics and Transformations Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Regression Model Assumptions Y i = β 0 + β 1 X i + ɛ Recall the key assumptions

More information

Variable selection is intended to select the best subset of predictors. But why bother?

Variable selection is intended to select the best subset of predictors. But why bother? Chapter 10 Variable Selection Variable selection is intended to select the best subset of predictors. But why bother? 1. We want to explain the data in the simplest way redundant predictors should be removed.

More information

Applied Statistics and Econometrics Lecture 6

Applied Statistics and Econometrics Lecture 6 Applied Statistics and Econometrics Lecture 6 Giuseppe Ragusa Luiss University gragusa@luiss.it http://gragusa.org/ March 6, 2017 Luiss University Empirical application. Data Italian Labour Force Survey,

More information

THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)

THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions) THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533 MIDTERM EXAMINATION: October 14, 2005 Instructor: Val LeMay Time: 50 minutes 40 Marks FRST 430 50 Marks FRST 533 (extra questions) This examination

More information

Introduction to R, Github and Gitlab

Introduction to R, Github and Gitlab Introduction to R, Github and Gitlab 27/11/2018 Pierpaolo Maisano Delser mail: maisanop@tcd.ie ; pm604@cam.ac.uk Outline: Why R? What can R do? Basic commands and operations Data analysis in R Github and

More information

Section 2.1: Intro to Simple Linear Regression & Least Squares

Section 2.1: Intro to Simple Linear Regression & Least Squares Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:

More information

Statistical Analysis in R Guest Lecturer: Maja Milosavljevic January 28, 2015

Statistical Analysis in R Guest Lecturer: Maja Milosavljevic January 28, 2015 Statistical Analysis in R Guest Lecturer: Maja Milosavljevic January 28, 2015 Data Exploration Import Relevant Packages: library(grdevices) library(graphics) library(plyr) library(hexbin) library(base)

More information

R Workshop Guide. 1 Some Programming Basics. 1.1 Writing and executing code in R

R Workshop Guide. 1 Some Programming Basics. 1.1 Writing and executing code in R R Workshop Guide This guide reviews the examples we will cover in today s workshop. It should be a helpful introduction to R, but for more details, you can access a more extensive user guide for R on the

More information

Learn Sphinx Documentation Documentation

Learn Sphinx Documentation Documentation Learn Sphinx Documentation Documentation Release 0.0.1 Lucas Simon Rodrigues Magalhaes January 31, 2014 Contents 1 Negrito e italico 1 2 Listas 3 3 Titulos 5 4 H1 Titulo 7 4.1 H2 Sub-Titulo.............................................

More information

1. Introduction. Ciampi 45

1. Introduction. Ciampi 45 From: KDD-95 Proceedings. Copyright 1995, AAAI (www.aaai.org). All rights reserved. Designing Neural Networks from Statistical Models: A new approach to data exploration Antonio Ciampi* and Yves Lechevallier**

More information