addesc Add a variable description to the key file CCDmanual0.docx


The function adds a variable description to the key file. This is useful when a new variable is created whose description is not yet in the key file. The description is then available for use in dools output.

    addesc(nvbs,nvbsdes,dsn=NULL)

    nvbs      name of variable
    nvbsdes   description of nvbs
    dsn       name of data set the variable is based upon (EA, LRB, SCCS, WNAI)

The function appends the description to the key file.

CSVwrite Write object to *.csv file

The function writes an object, with elements capable of being coerced to a dataframe, to a csv file. It is used to write the output from dools to a file that can be read by a spreadsheet.

    CSVwrite(a1,a2,a3=FALSE)

    a1   Object to be written, typically output from function dools
    a2   The base name of the *.csv file (do not include the .csv extension)
    a3   Should the object be appended to the existing file (default=FALSE)

No values are returned in the R environment; the only changes occur in the specified *.csv file.

Set the option a3=TRUE to append the output of object a1 to an existing file with base name a2. The default simply overwrites any existing csv file with base name a2.

CSVwrite is like the write.csv function, except that it can append values to an existing csv file, and it can write the elements of a list to a csv file.
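
The append and list-writing behaviour described for CSVwrite can be sketched in base R. This is only an illustration under stated assumptions: the helper name csv_write_sketch and its internals are hypothetical, and the actual CSVwrite source is not shown in this manual.

```r
# Hypothetical helper illustrating the described behaviour; NOT the CSVwrite source.
csv_write_sketch <- function(obj, base, append = FALSE) {
  path <- paste0(base, ".csv")
  # Treat a data frame (or any non-list object) as a one-element list.
  if (is.data.frame(obj) || !is.list(obj)) obj <- list(obj)
  # Default behaviour: start from an empty file, i.e. overwrite.
  if (!append && file.exists(path)) file.remove(path)
  for (i in seq_along(obj)) {
    nm <- names(obj)[i]
    # Write the element name as a label line, when there is one.
    if (!is.null(nm) && nzchar(nm)) cat(nm, "\n", sep = "", file = path, append = TRUE)
    # write.table warns when appending column names; that is intended here.
    suppressWarnings(write.table(as.data.frame(obj[[i]]), file = path, sep = ",",
                                 append = TRUE, col.names = NA, qmethod = "double"))
    cat("\n", file = path, append = TRUE)  # blank line between elements
  }
  invisible(NULL)
}

# Usage: write a small list, then append the same list to the same file.
base <- tempfile()
out <- list(A = data.frame(x = 1:2), B = c(p = 0.5))
csv_write_sketch(out, base)
n_first <- length(readLines(paste0(base, ".csv")))
csv_write_sketch(out, base, append = TRUE)
n_appended <- length(readLines(paste0(base, ".csv")))
```

The sketch uses write.table rather than write.csv because write.csv does not allow the append argument to be changed.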

domi Produce multiple imputed data sets

The function produces multiple imputed data sets from SCCS data, using methods from the mice package.

    smi<-domi(EAvs=NULL,LRBvs=NULL,SCCSvs=NULL,WNAIvs=NULL,nimp=10,maxit=7)

    EAvs     character string containing names of variables from EA dataset
    LRBvs    character string containing names of variables from LRB dataset
    SCCSvs   character string containing names of variables from SCCS dataset
    WNAIvs   character string containing names of variables from WNAI dataset
    nimp     the number of imputed data sets to create (default=10)
    maxit    the number of iterations used to estimate imputed data (default=7)

The function domi returns a dataframe containing the number of imputed datasets specified by the nimp option. The datasets are stacked one atop the other, and indexed by the variable .imp.

This function imputes several new datasets, using covariates for each variable to create a conditional distribution of estimates for each missing value, and then replacing the missing value with a draw from that distribution. As a result, each of the imputed datasets will typically have slightly different values for the estimated cells.

The key to successful imputation is to have good covariates for each variable. The function domi begins the search for good covariates by grouping each variable in a cluster of collinear variables. For each cluster, the best covariates are selected from a set of variables with no missing values, including both network lag variables (based on geographic distance, language, and ecology) and climate and ecology variables.

The first four arguments are lists of variable names from the four ethnographic data sets (EA, LRB, SCCS, and WNAI). These will be the data used in model building. One should include all data one thinks might be useful, but nothing beyond that, since additional variables add to the time the procedure takes to run.

The fifth argument is the number of imputed datasets to create: between 5 and 10 imputed data sets are considered adequate, but there is no harm in choosing more; the default is 10. The final argument is the number of iterations to perform in creating each imputed dataset; the default is 7.

It is not usually necessary to examine the returned dataframe: it is used in estimating the model, but is not in itself that interesting. Nevertheless, some output is automatically written to the console as the function executes, to provide information about the clusters to which the variables have been assigned and the covariates selected for each cluster. For each cluster, the names of the members are printed, along with the method used for imputation (in most cases pmm, predictive mean matching; variables without missing values are indicated by empty quotes). Prefixes l, e, and d indicate spatial lags for, respectively, linguistic, ecological, and geographic proximity. Additionally, those variables that could not be imputed, due to perfect multicollinearity, are indicated as each cluster is processed.

Squared terms are then created for those variables with at least three unique values, and with maximum values below [...]. The squared variables are indicated by the sq suffix on the original variable name (e.g., SCCS.v72sq is the square of SCCS.v72).

The last step is to identify those variables that are perfectly collinear with a linear combination of other variables. Users should consider dropping some of these, so that the problem of perfect multicollinearity does not crop up during estimation.

Based on the methods proposed by Malcolm M. Dow and E. [...]
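
The idea behind predictive mean matching, and the stacked layout indexed by .imp, can be sketched in base R. This is a deliberately simplified illustration, not the domi or mice implementation: the function names pmm_once and stack_imputations, the donor count k, and the toy data are all assumptions, and unlike mice the sketch does not draw the regression parameters from their posterior distribution.

```r
# Simplified predictive mean matching (PMM) for one variable; illustrative only.
pmm_once <- function(y, X, k = 5) {
  obs <- !is.na(y)
  # Regress the observed values on the (complete) covariates.
  fit <- lm(y ~ ., data = data.frame(y = y, X)[obs, , drop = FALSE])
  pred <- predict(fit, newdata = data.frame(X))
  for (i in which(!obs)) {
    # Donors: the k observed cases whose predictions are closest to case i's.
    nearest <- order(abs(pred[obs] - pred[i]))[seq_len(min(k, sum(obs)))]
    donors <- which(obs)[nearest]
    # Replace the missing value with the observed value of a random donor.
    y[i] <- y[donors[sample.int(length(donors), 1)]]
  }
  y
}

# Stack m completed copies of the data, indexed by .imp.
stack_imputations <- function(dat, target, covars, m = 5) {
  do.call(rbind, lapply(seq_len(m), function(j) {
    out <- dat
    out[[target]] <- pmm_once(dat[[target]], dat[covars])
    cbind(.imp = j, out)
  }))
}

# Toy data (hypothetical): y depends on two complete covariates.
set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(50)
dat$y[sample(50, 10)] <- NA
smi_sketch <- stack_imputations(dat, "y", c("x1", "x2"), m = 5)
```

Because each missing cell is filled by a random draw among donors, the five stacked copies generally differ in their imputed cells, which is the variation the later estimation step relies on.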

dools Estimate OLS model on multiply imputed data

The function estimates an unrestricted and a restricted OLS model, with network lag term, providing common diagnostics.

    h<-dools(smi, depvar, indpv, rindpv=NULL, othexog=NULL, dw=TRUE, lw=TRUE, stepw=FALSE, relimp=FALSE, slmtests=FALSE)

    smi        a multiply imputed dataset, created by the function domi
    depvar     the name of the dependent variable (must be in smi)
    indpv      the names of the independent variables for the unrestricted model (must be in smi)
    rindpv     names of restricted model independent variables (must be in indpv; when the default of NULL is used, the restricted model independent variables will be the same as those of the unrestricted model, minus the last variable)
    othexog    names of additional exogenous variables (must be in smi; will be added to a list of 21 variables; default is NULL)
    dw         Should geographic proximity be used in constructing composite weight matrix (default=TRUE)
    lw         Should linguistic proximity be used in constructing composite weight matrix (default=TRUE)
    stepw      Should stepwise regression be done to show most-selected variables from unrestricted model (default=FALSE)
    relimp     Should relative importance be calculated for independent variables of restricted model (default=FALSE)
    slmtests   Should spatial lag tests be run for the four weight matrices (default=FALSE)

Returns a list with 11 elements:

    DependVarb     Identification of dependent variable
    URmodel        Coefficient estimates from the unrestricted model (includes standardized coefficients and VIFs)
    Rmodel         Coefficient estimates from the restricted model
    RmodelRobust   Coefficient estimates from the restricted model with robust SEs
    Diagnostics    Regression diagnostics for the restricted model (RESET test; Wald test on model restrictions; Breusch-Pagan heteroskedasticity test; Shapiro-Wilk test for normality of residuals; Hausman tests for endogeneity of independent variables)
    OtherStats     Other statistics: composite weight matrix weights (see details); R² for all models (model creating instrument for network lag term; restricted model; unrestricted model); number of imputations; number of observations
    DescripStats   Descriptive statistics for variables in unrestricted model
    dfbetas        Influential observations for dfbetas (see details)
    totry          Character string of variables that were most significant in the unrestricted model, as well as additional variables that proved significant using the add1 function on the restricted model
    didwell        Character string of variables that were most significant in the unrestricted model
    interacts      Character string of interaction variables that proved significant using the add1 function on the restricted model

Users can choose two kinds of proximity/similarity weight matrices for constructing a network lag term: geographic and linguistic. In most cases, users should choose both (the defaults). The optimal composite weight matrix, constructed as the weighted sum of the weight matrices, is the one that maximizes unrestricted model R². The network lag term is entered in each model as the variable Wy.

The dfbetas are scaled changes in coefficient estimates caused by adding an observation to the model. Only the most influential dfbetas are output.

The stepwise procedure can provide additional insight on which independent variables provide the best model fit. Since the imputed datasets differ slightly from each other, the variables selected by a stepwise procedure typically differ slightly for each imputed dataset. If the stepw=TRUE option is chosen, a column labeled stepkept will be added to the table reporting unrestricted model results. The column reports the number of times the independent variable was retained in the model by a stepwise procedure using both forward and backward selection.

The add1 function tests whether the members of a list of variables prove significant when added singly to a model. The list of variables includes all numeric variables in the imputed dataset, as well as squared terms of variables currently in the unrestricted regression. Variables proving significant in over 80 percent of the imputations are returned in the character string totry.

Relative importance is a method of assigning R² to each independent variable. The method repeatedly estimates a model, first with one independent variable, then with two, etc., and calculates the change in R² as each variable is introduced. The order of entry is changed, and the process repeated, to consider all possible orders of entry. The relative importance measure is the average change in R² across all these different models. With large numbers of independent variables, the calculations are prohibitively slow. Setting relimp=TRUE will calculate the relative importance of independent variables in the restricted model, and report these in the column labeled relimp.

Based on the methods proposed by Malcolm M. Dow and E. [...]

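
The relative importance calculation described above (average change in R² over all orders of entry) can be written out directly for a small model. The following base-R sketch is an illustration only: the function names perms, r2, and rel_importance are hypothetical, the built-in mtcars data stands in for an imputed dataset, and dools itself presumably relies on the relaimpo package rather than on code like this.

```r
# All orderings of a vector (p! of them, so only usable for a few predictors).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}

# R^2 of an OLS model with the given predictors; 0 for the empty model.
r2 <- function(dat, dv, ivs) {
  if (length(ivs) == 0) return(0)
  summary(lm(reformulate(ivs, response = dv), data = dat))$r.squared
}

# Average gain in R^2 when each predictor enters, over all orders of entry.
rel_importance <- function(dat, dv, ivs) {
  orders <- perms(ivs)
  gain <- setNames(numeric(length(ivs)), ivs)
  for (ord in orders) {
    for (j in seq_along(ord)) {
      gain[ord[j]] <- gain[ord[j]] +
        r2(dat, dv, ord[seq_len(j)]) - r2(dat, dv, ord[seq_len(j - 1)])
    }
  }
  gain / length(orders)
}

ri <- rel_importance(mtcars, "mpg", c("wt", "hp", "qsec"))
full_r2 <- r2(mtcars, "mpg", c("wt", "hp", "qsec"))
```

Within each ordering the gains add up to the full-model R², so the averaged shares in ri also sum to full_r2. The number of orderings grows as p!, which is why the calculation becomes prohibitively slow with many independent variables.
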
    library(mice)
    library(foreign)
    library(stringr)
    library(psych)
    library(AER)
    library(relaimpo)
    library(geosphere)
    library(spdep)
    # --bring in functions and data--
    load(url("..."))
    ls() #-can see the objects contained in DEz2.Rdata
    #--list and modify variables for use in model--
    # --make new variables--
    xcd$SCCS.valchild<-(xcd$SCCS.v473+xcd$SCCS.v474+xcd$SCCS.v475+xcd$SCCS.v476)
    # --create descriptions for new variables--
    addesc("SCCS.valchild","degree to which society values children")
    addesc("Wy","network lag term")
    # --create new dummy variables--
    xcd<-cbind(xcd,mkdummy("SCCS","v899",1))
    # --identify variables to keep for model building--
    ev<-c("v30","v78")
    lv<-c("group2","hunting","gatherin","fishing","huntfil2",
          "war1","reven","nomov","dismov","store","subdiv2")
    sv<-c("v1685","v72","v234","v236","v238","v1648","v899d1",
          "valchild","v1260","v79","v80","v81","v872","v871")
    wv<-c("v284","v285","v286","v288","v289","v135")
    # --make imputed data--
    smi<-domi(EAvs=ev,LRBvs=lv,SCCSvs=sv,WNAIvs=wv,nimp=5,maxit=5)
    names(smi) #--can see which variables are available
    smi$LRB.lngroup2<-log(smi$LRB.group2)
    xcd$LRB.lngroup2<-log(xcd$LRB.group2)
    addesc("LRB.lngroup2","natural log of LRB.group2")
    # --identify role of variables in model--
    dv<-"LRB.lngroup2"
    riv<-uiv<-c("SCCS.v21","WNAI.v135","LRB.revensq","LRB.subdiv2","LRB.war1sq")
    h<-dools(smi,depvar=dv,indpv=c("SCCS.v1260",uiv),
             rindpv=riv,othexog=NULL,dw=TRUE,lw=TRUE,
             stepw=TRUE,relimp=TRUE,slmtests=FALSE)
    print(h)
    # --print output to csv file--
    CSVwrite(h,"myOutput",FALSE)

keyf keyfile dataset

The data.frame keyf contains information about variables from four ethnographic datasets: EA, LRB, SCCS, and WNAI.

Format

    rownames      Variable names from the data.frame xcd
    variable      Variable names as given within the ethnographic dataset (EA, LRB, SCCS, or WNAI)
    type          Variable type (ordinal or categorical)
    description   Variable description
    NOTmissing    Number of non-missing values for variable
    class         Variable class (character or numeric)
    nuniqvals     Number of unique data values for variable
    FNOTmissing   For the factor version of the ethnographic dataset: Number of non-missing values for variable
    Fclass        For the factor version of the ethnographic dataset: Variable class (character, factor, integer, or numeric)
    FnUniqVals    For the factor version of the ethnographic dataset: Number of unique data values for variable
    db            Source ethnographic dataset (EA, LRB, SCCS, or WNAI). GIS data is indicated as gisx.
    levels        Factor levels for variables defined as factors in the factor version (and with fewer than 20 factor levels)

    head(keyf)

mkdummy Make dummy variable and store a description in key file

The function makes a dummy variable from a variable in the data.frame xcd, and creates a description stored in the data.frame keyf.

    mkdummy(dsn,vv,val)

    dsn   name of an ethnographic dataset (EA, LRB, SCCS, or WNAI)
    vv    name of a variable from the specified ethnographic dataset
    val   the value of variable vv for which the dummy equals one

The function returns a variable named dsn.vvdval, which equals one when xcd$dsn.vv==val, and equals zero otherwise.

The main reason to use this function is that it automatically appends a description for the dummy variable to the key file, which is then available for use in dools output. The description is created using the variable name from the key file and the description of the value from the levels variable in the data.frame keyf.
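
The naming and coding rule for the dummy variable can be illustrated with a small stand-in. This is a sketch under assumptions, not the mkdummy source: the function mkdummy_sketch and the toy data frame are hypothetical, the data frame is passed explicitly instead of being read from xcd, nothing is written to keyf, and missing values are simply left missing (the manual does not say how mkdummy treats them).

```r
# Hypothetical stand-in for the naming and coding rule; it does not touch keyf.
mkdummy_sketch <- function(dat, dsn, vv, val) {
  src <- paste0(dsn, ".", vv)                 # e.g. "SCCS.v899"
  out <- data.frame(as.integer(dat[[src]] == val))
  names(out) <- paste0(src, "d", val)         # e.g. "SCCS.v899d1"
  out
}

# Toy data (hypothetical values), bound the same way as in the example above.
toy <- data.frame(SCCS.v899 = c(1, 2, 1, NA, 3))
toy <- cbind(toy, mkdummy_sketch(toy, "SCCS", "v899", 1))
```

Returning a one-column data frame rather than a bare vector keeps the generated name when the result is passed to cbind.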

mkwtmat Make and format three weight matrices for the societies in data.frame xcd

The function makes and formats three weight matrices (geographic, linguistic, and ecological) for the societies in data.frame xcd.

    mkwtmat()

The function returns three matrices:

    ddm   Geographic proximity, based on the latitude and longitude fields in data.frame xcd. Each cell is the inverted squared distance between the row society and column society. The diagonal is set to zero, and then the rows are normalized so that their sum equals one.
    eem   Ecological proximity, based on the Euclidean distance between societies in the 22-dimensional space defined by 19 climate variables, two altitude variables, and one measure of net primary productivity (all variables scaled to standard normal before distances are calculated). Each cell is exp(-d), where d is the distance between the row society and column society. The diagonal is set to zero, and then the rows are normalized so that their sum equals one.
    llm   Linguistic proximity between each row and column society. This matrix is not created, but only row-normalized.

Since the geographic and ecological matrices are relatively fast to compute, but very large, it is more efficient to create them than to load an already constructed matrix. The linguistic matrix, on the other hand, takes a very long time to compute, but is small (many fewer unique values) and is therefore loaded with the other data and only row-normalized in this function.

The function is run one time in the domi function, making the matrices available both in the function and in the general environment.

xcd Cross cultural dataset

The data.frame xcd contains the variables from four ethnographic datasets: EA, LRB, SCCS, and WNAI. The number of societies represented in each of the datasets is 1267 (EA), 339 (LRB), 186 (SCCS), and 172 (WNAI), for a total of 1964 records in the four datasets. However, some societies appear in more than one dataset (1090 appear only in one; 257 appear in two; 108 appear in three; and nine appear in all four), so there are 1464 unique societies. The data.frame xcd therefore contains 1464 observations and 2916 variables: 111 from EA; 262 from LRB; 2055 from SCCS; 440 from WNAI; and 48 that are drawn from GIS data.

Format

For each variable drawn from an ethnographic dataset, the variable name is XX.vv, where XX is the name of the ethnographic dataset and vv is the name of the variable in that dataset. For example, variable v207 from SCCS is named SCCS.v207.

    dim(xcd)
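
The construction of the geographic and ecological matrices described for mkwtmat (inverse squared distance or exp(-d), zero diagonal, rows normalized to sum to one) can be sketched in base R. Everything here is an assumption made for illustration: the function names, the haversine distance (the manual's example loads geosphere, which may be what mkwtmat actually uses), the toy coordinates, and the last step, which forms a network lag as the matrix product of the weights and the dependent variable, the usual spatial-lag definition.

```r
# Great-circle distance in km via the haversine formula; coordinates in degrees.
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  rad <- pi / 180
  a <- sin((lat2 - lat1) * rad / 2)^2 +
    cos(lat1 * rad) * cos(lat2 * rad) * sin((lon2 - lon1) * rad / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Zero the diagonal, then scale each row so that it sums to one.
row_normalize <- function(w) {
  diag(w) <- 0
  w / rowSums(w)
}

# Geographic proximity: inverse squared distance.
# Assumes no two societies share identical coordinates.
geo_weights <- function(lat, lon) {
  idx <- seq_along(lat)
  d <- outer(idx, idx, function(i, j) haversine_km(lat[i], lon[i], lat[j], lon[j]))
  row_normalize(1 / d^2)  # the infinite diagonal is zeroed before normalizing
}

# Ecological proximity: exp(-d), with d the Euclidean distance between
# standardized environmental variables.
eco_weights <- function(env) {
  d <- as.matrix(dist(scale(env)))
  row_normalize(exp(-d))
}

# Toy example with four societies (hypothetical coordinates and values).
lat <- c(10, 12, -5, 40)
lon <- c(20, 25, 100, -75)
W <- geo_weights(lat, lon)
set.seed(2)
E <- eco_weights(matrix(rnorm(12), nrow = 4))

# Network lag: for each society, a weighted average of y over the other societies.
y <- c(2, 4, 6, 8)
Wy <- as.vector(W %*% y)
```

Row normalization makes each element of Wy a weighted mean of the other societies' values, so the lag term stays on the same scale as the dependent variable.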

new [[.Dow- Eff Functions - DEf]] blue- colored link to go to there and click one of the five models above listed at that page: e.g.

new [[.Dow- Eff Functions - DEf]] blue- colored link to go to there and click one of the five models above listed at that page: e.g. Make your own DEf model http://intersci.ss.uci.edu/wiki/pdf/make_your_own_def_model.pdf Read: http://intersci.ss.uci.edu/wiki/pdf/wileych5ccrnetsofvarsmodels2blackdrw.pdf This will become part of Wiley

More information

How to Deal with Missing Data and Galton s Problem in Cross-Cultural Survey Research: A Primer for R

How to Deal with Missing Data and Galton s Problem in Cross-Cultural Survey Research: A Primer for R How to Deal with Missing Data and Galton s Problem in Cross-Cultural Survey Research: A Primer for R E. Anthon Eff Malcolm Dow An Article Submitted to Structure and Dynamics: ejournal of Anthropological

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

Labor Economics with STATA. Estimating the Human Capital Model Using Artificial Data

Labor Economics with STATA. Estimating the Human Capital Model Using Artificial Data Labor Economics with STATA Liyousew G. Borga December 2, 2015 Estimating the Human Capital Model Using Artificial Data Liyou Borga Labor Economics with STATA December 2, 2015 84 / 105 Outline 1 The Human

More information

BIOL 458 BIOMETRY Lab 10 - Multiple Regression

BIOL 458 BIOMETRY Lab 10 - Multiple Regression BIOL 458 BIOMETRY Lab 0 - Multiple Regression Many problems in biology science involve the analysis of multivariate data sets. For data sets in which there is a single continuous dependent variable, but

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use

More information

AMELIA II: A Program for Missing Data

AMELIA II: A Program for Missing Data AMELIA II: A Program for Missing Data Amelia II is an R package that performs multiple imputation to deal with missing data, instead of other methods, such as pairwise and listwise deletion. In multiple

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

Example 1 of panel data : Data for 6 airlines (groups) over 15 years (time periods) Example 1

Example 1 of panel data : Data for 6 airlines (groups) over 15 years (time periods) Example 1 Panel data set Consists of n entities or subjects (e.g., firms and states), each of which includes T observations measured at 1 through t time period. total number of observations : nt Panel data have

More information

Lab 07: Multiple Linear Regression: Variable Selection

Lab 07: Multiple Linear Regression: Variable Selection Lab 07: Multiple Linear Regression: Variable Selection OBJECTIVES 1.Use PROC REG to fit multiple regression models. 2.Learn how to find the best reduced model. 3.Variable diagnostics and influential statistics

More information

GETTING STARTED WITH STATA. Sébastien Fontenay ECON - IRES

GETTING STARTED WITH STATA. Sébastien Fontenay ECON - IRES GETTING STARTED WITH STATA Sébastien Fontenay ECON - IRES THE SOFTWARE Software developed in 1985 by StataCorp Functionalities Data management Statistical analysis Graphics Using Stata at UCL Computer

More information

Session 8. Statistical analysis Using Gauss Applications

Session 8. Statistical analysis Using Gauss Applications Session 8 Statistical analysis Using Gauss Applications page 1. Descriptive Statistics 8-2 Example: Frequencies 8-2 Example: Histogram 8-2 2. Linear Regression 8-3 Linear regression Options 8-3 Practical

More information

Introduction to Mixed Models: Multivariate Regression

Introduction to Mixed Models: Multivariate Regression Introduction to Mixed Models: Multivariate Regression EPSY 905: Multivariate Analysis Spring 2016 Lecture #9 March 30, 2016 EPSY 905: Multivariate Regression via Path Analysis Today s Lecture Multivariate

More information

OLS Assumptions and Goodness of Fit

OLS Assumptions and Goodness of Fit OLS Assumptions and Goodness of Fit A little warm-up Assume I am a poor free-throw shooter. To win a contest I can choose to attempt one of the two following challenges: A. Make three out of four free

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics PASW Complex Samples 17.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

Package fso. February 19, 2015

Package fso. February 19, 2015 Version 2.0-1 Date 2013-02-26 Title Fuzzy Set Ordination Package fso February 19, 2015 Author David W. Roberts Maintainer David W. Roberts Description Fuzzy

More information

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression analysis. Analysis of Variance: one way classification,

More information

Data Management - 50%

Data Management - 50% Exam 1: SAS Big Data Preparation, Statistics, and Visual Exploration Data Management - 50% Navigate within the Data Management Studio Interface Register a new QKB Create and connect to a repository Define

More information

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course: DATA SCIENCE About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst/Analytics Manager/Actuarial Scientist/Business

More information

Package midastouch. February 7, 2016

Package midastouch. February 7, 2016 Type Package Version 1.3 Package midastouch February 7, 2016 Title Multiple Imputation by Distance Aided Donor Selection Date 2016-02-06 Maintainer Philipp Gaffert Depends R (>=

More information

Regression on SAT Scores of 374 High Schools and K-means on Clustering Schools

Regression on SAT Scores of 374 High Schools and K-means on Clustering Schools Regression on SAT Scores of 374 High Schools and K-means on Clustering Schools Abstract In this project, we study 374 public high schools in New York City. The project seeks to use regression techniques

More information

Generalized least squares (GLS) estimates of the level-2 coefficients,

Generalized least squares (GLS) estimates of the level-2 coefficients, Contents 1 Conceptual and Statistical Background for Two-Level Models...7 1.1 The general two-level model... 7 1.1.1 Level-1 model... 8 1.1.2 Level-2 model... 8 1.2 Parameter estimation... 9 1.3 Empirical

More information

Stat 5100 Handout #14.a SAS: Logistic Regression

Stat 5100 Handout #14.a SAS: Logistic Regression Stat 5100 Handout #14.a SAS: Logistic Regression Example: (Text Table 14.3) Individuals were randomly sampled within two sectors of a city, and checked for presence of disease (here, spread by mosquitoes).

More information

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Minitab 17 commands Prepared by Jeffrey S. Simonoff Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Using SPSS with The Fundamentals of Political Science Research

Using SPSS with The Fundamentals of Political Science Research Using SPSS with The Fundamentals of Political Science Research Paul M. Kellstedt and Guy D. Whitten Department of Political Science Texas A&M University c Paul M. Kellstedt and Guy D. Whitten 2009 Contents

More information

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview Chapter 888 Introduction This procedure generates D-optimal designs for multi-factor experiments with both quantitative and qualitative factors. The factors can have a mixed number of levels. For example,

More information

Week 7 Picturing Network. Vahe and Bethany

Week 7 Picturing Network. Vahe and Bethany Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups

More information

Recognizing Handwritten Digits Using the LLE Algorithm with Back Propagation

Recognizing Handwritten Digits Using the LLE Algorithm with Back Propagation Recognizing Handwritten Digits Using the LLE Algorithm with Back Propagation Lori Cillo, Attebury Honors Program Dr. Rajan Alex, Mentor West Texas A&M University Canyon, Texas 1 ABSTRACT. This work is

More information

This electronic supporting information S4 contains the main steps for fitting a response surface model using Minitab 17 (Minitab Inc.).

This electronic supporting information S4 contains the main steps for fitting a response surface model using Minitab 17 (Minitab Inc.). This electronic supporting information S4 contains the main steps for fitting a response surface model using Minitab 17 (Minitab Inc.). This process was used in Predicting instrumental mass fractionation

More information

Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

Historical Data RSM Tutorial Part 1 The Basics

Historical Data RSM Tutorial Part 1 The Basics DX10-05-3-HistRSM Rev. 1/27/16 Historical Data RSM Tutorial Part 1 The Basics Introduction In this tutorial you will see how the regression tool in Design-Expert software, intended for response surface

More information

Exercise Set Decide whether each matrix below is an elementary matrix. (a) (b) (c) (d) Answer:

Exercise Set Decide whether each matrix below is an elementary matrix. (a) (b) (c) (d) Answer: Understand the relationships between statements that are equivalent to the invertibility of a square matrix (Theorem 1.5.3). Use the inversion algorithm to find the inverse of an invertible matrix. Express

More information

IBM SPSS Categories. Predict outcomes and reveal relationships in categorical data. Highlights. With IBM SPSS Categories you can:

IBM SPSS Categories. Predict outcomes and reveal relationships in categorical data. Highlights. With IBM SPSS Categories you can: IBM Software IBM SPSS Statistics 19 IBM SPSS Categories Predict outcomes and reveal relationships in categorical data Highlights With IBM SPSS Categories you can: Visualize and explore complex categorical

More information

Gov Troubleshooting the Linear Model II: Heteroskedasticity

Gov Troubleshooting the Linear Model II: Heteroskedasticity Gov 2000-10. Troubleshooting the Linear Model II: Heteroskedasticity Matthew Blackwell December 4, 2015 1 / 64 1. Heteroskedasticity 2. Clustering 3. Serial Correlation 4. What s next for you? 2 / 64 Where

More information

Step-by-Step Guide to Relatedness and Association Mapping Contents

Step-by-Step Guide to Relatedness and Association Mapping Contents Step-by-Step Guide to Relatedness and Association Mapping Contents OBJECTIVES... 2 INTRODUCTION... 2 RELATEDNESS MEASURES... 2 POPULATION STRUCTURE... 6 Q-K ASSOCIATION ANALYSIS... 10 K MATRIX COMPRESSION...

More information

Package gwrr. February 20, 2015

Package gwrr. February 20, 2015 Type Package Package gwrr February 20, 2015 Title Fits geographically weighted regression models with diagnostic tools Version 0.2-1 Date 2013-06-11 Author David Wheeler Maintainer David Wheeler

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

MPhil computer package lesson: getting started with Eviews

MPhil computer package lesson: getting started with Eviews MPhil computer package lesson: getting started with Eviews Ryoko Ito (ri239@cam.ac.uk, itoryoko@gmail.com, www.itoryoko.com ) 1. Creating an Eviews workfile 1.1. Download Wage data.xlsx from my homepage:

More information
