addesc Add a variable description to the key file CCDmanual0.docx
- Delilah Walton
- 5 years ago
addesc: Add a variable description to the key file

The function adds a variable description to the key file. This is useful when a new variable is created whose description is not yet in the key file. The description is then available for use in dools output.

Usage: addesc(nvbs, nvbsdes, dsn = NULL)

Arguments:
nvbs: name of the variable
nvbsdes: description of nvbs
dsn: name of the dataset the variable is based upon (EA, LRB, SCCS, or WNAI)

Value: The function appends the description to the key file.

CSVwrite: Write an object to a *.csv file

The function writes an object to a csv file, provided the object's elements can be coerced to a data frame. It is used to write the output from dools to a file that a spreadsheet can read.

Usage: CSVwrite(a1, a2, a3 = FALSE)

Arguments:
a1: object to be written, typically output from the function dools
a2: base name of the *.csv file (do not include the .csv extension)
a3: should the object be appended to the existing file? (default = FALSE)

Value: No values are returned in the R environment. The only change is to the specified *.csv file.

Details: Set a3 = TRUE to append the output of object a1 to an existing file with base name a2. By default, any existing csv file with base name a2 is overwritten. CSVwrite works like the write.csv function, with two differences: it can append values to an existing csv file, and it can write the elements of a list to a csv file.
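Two behaviors set CSVwrite apart from write.csv: it can append, and it can write each element of a list. The sketch below shows one way to get both in base R. The helper name `csv_write_sketch` is hypothetical and is not the package's code. The output layout, with one header row before each element, is also an assumption. The sketch uses write.table because write.csv ignores any attempt to set `append`.

```r
# Hypothetical sketch, not the package's CSVwrite: write a data frame, or
# each element of a list, to <base>.csv, optionally appending.
csv_write_sketch <- function(obj, base, append = FALSE) {
  path <- paste0(base, ".csv")
  # Treat a single data frame as a one-element list so both cases share code
  parts <- if (is.data.frame(obj)) list(obj) else obj
  for (i in seq_along(parts)) {
    df <- as.data.frame(parts[[i]])
    # Only the first element may overwrite; later elements always append
    app <- append || i > 1
    # write.csv ignores `append`, so use write.table with CSV settings;
    # the warning about appending column names is expected and silenced
    suppressWarnings(
      write.table(df, file = path, sep = ",", append = app,
                  col.names = NA, row.names = TRUE)
    )
  }
  invisible(path)
}
```

For example, `csv_write_sketch(h, "myOutput")` would overwrite myOutput.csv, and `csv_write_sketch(h, "myOutput", TRUE)` would add to it.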
domi: Produce multiple imputed datasets

The function produces multiple imputed datasets from SCCS data, using methods from the mice package.

Usage: smi <- domi(EAvs = NULL, LRBvs = NULL, SCCSvs = NULL, WNAIvs = NULL, nimp = 10, maxit = 7)

Arguments:
EAvs: character vector containing names of variables from the EA dataset
LRBvs: character vector containing names of variables from the LRB dataset
SCCSvs: character vector containing names of variables from the SCCS dataset
WNAIvs: character vector containing names of variables from the WNAI dataset
nimp: number of imputed datasets to create (default = 10)
maxit: number of iterations used to estimate the imputed data (default = 7)

Value: The function domi returns a data frame containing the number of imputed datasets specified by the nimp option. The datasets are stacked one atop the other and indexed by the variable .imp.

Details: The function imputes several new datasets. Covariates for each variable are used to create a conditional distribution of estimates for each missing value, and the missing value is then replaced with a draw from that distribution. As a result, each imputed dataset will typically have slightly different values in the imputed cells. The key to successful imputation is to have good covariates for each variable. The function domi begins the search for good covariates by grouping each variable into a cluster of collinear variables. For each cluster, the best covariates are selected from a set of variables with no missing values. This set includes network lag variables (based on geographic distance, language, and ecology) as well as climate and ecology variables.

The first four arguments are character vectors of variable names from the four ethnographic datasets (EA, LRB, SCCS, and WNAI). These are the data used in model building. Include all the variables you think might be useful, but no others, because each additional variable adds to the time the procedure takes to run.
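The manual does not say how domi forms its clusters of collinear variables. One plausible approach is hierarchical clustering on the distance 1 - |r|, where r is the pairwise correlation. The sketch below shows that approach; it is an assumption, not domi's documented method. The function name `cluster_vars_sketch` and the cut height `h` are hypothetical.

```r
# Hypothetical sketch: group variables into clusters of mutually
# correlated (collinear) variables via hierarchical clustering.
cluster_vars_sketch <- function(dat, h = 0.5) {
  # Pairwise correlations, tolerating missing values in the raw data
  r <- cor(dat, use = "pairwise.complete.obs")
  # Strong positive or negative correlation means a small distance
  d <- as.dist(1 - abs(r))
  tree <- hclust(d, method = "average")
  # Named integer vector: cluster id for each variable
  cutree(tree, h = h)
}
```

A variable and its near-duplicate land in the same cluster, while an unrelated variable gets its own cluster id.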
The fifth argument is the number of imputed datasets to create. Between 5 and 10 imputed datasets are considered adequate, but there is no harm in choosing more; the default is 10. The final argument is the number of iterations to perform in creating each imputed dataset; the default is 7.

It is not usually necessary to examine the returned data frame. It is used in estimating the model but is not very interesting in itself. Nevertheless, domi writes some output to the console as it executes, describing the clusters to which the variables have been assigned and the covariates selected for each cluster. For each cluster, the names of the members are printed, along with the method used for imputation. In most cases this is pmm (predictive mean matching); variables without missing values are indicated by empty quotes. The prefixes l, e, and d indicate spatial lags for linguistic, ecological, and geographic proximity, respectively. Variables that could not be imputed because of perfect multicollinearity are also reported as each cluster is processed. Squared terms are then created for variables with at least three unique values and with maximum values below […]. Squared variables carry the suffix sq on the original variable name (e.g., SCCS.v72sq is the square of SCCS.v72). The last step identifies variables that are perfectly collinear with a linear combination of other variables. Users should consider dropping some of these so that perfect multicollinearity does not crop up during estimation.

Based on the methods proposed by Malcolm M. Dow and E. …
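The squared-term step and the collinearity check can be sketched in base R. Both helper names below are hypothetical. The argument `max_cutoff` stands in for the maximum-value threshold, which the text above does not give. Perfect collinearity is detected here with a pivoted QR decomposition, the mechanism lm() uses to flag aliased coefficients. This is an assumed technique; domi may do it differently.

```r
# Hypothetical sketch: add <var>sq columns for numeric variables with at
# least three unique values and a maximum below max_cutoff (assumed).
add_sq_sketch <- function(dat, max_cutoff) {
  for (v in names(dat)) {  # names are fixed when the loop starts
    x <- dat[[v]]
    if (is.numeric(x) && length(unique(na.omit(x))) >= 3 &&
        max(x, na.rm = TRUE) < max_cutoff) {
      dat[[paste0(v, "sq")]] <- x^2
    }
  }
  dat
}

# Hypothetical sketch: return names of columns that are exact linear
# combinations of earlier columns (including an intercept).
collinear_sketch <- function(dat) {
  m <- as.matrix(dat[complete.cases(dat), , drop = FALSE])
  m <- cbind("(Intercept)" = 1, m)
  q <- qr(m)
  if (q$rank == ncol(m)) return(character(0))
  # Pivoting moves rank-deficient columns past position q$rank
  colnames(m)[q$pivot[(q$rank + 1):ncol(m)]]
}
```

For instance, if v4 is defined as v1 + v3, collinear_sketch() reports "v4". That is the kind of variable users may want to drop.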
dools: Estimate an OLS model on multiply imputed data

The function estimates an unrestricted and a restricted OLS model, each with a network lag term, and provides common diagnostics.

Usage: h <- dools(smi, depvar, indpv, rindpv = NULL, othexog = NULL, dw = TRUE, lw = TRUE, stepw = FALSE, relimp = FALSE, slmtests = FALSE)

Arguments:
smi: a multiply imputed dataset, created by the function domi
depvar: name of the dependent variable (must be in smi)
indpv: names of the independent variables for the unrestricted model (must be in smi)
rindpv: names of the independent variables for the restricted model (must be in indpv). With the default of NULL, the restricted model uses the unrestricted model's independent variables minus the last one.
othexog: names of additional exogenous variables (must be in smi; they are added to a list of 21 variables; default = NULL)
dw: should geographic proximity be used in constructing the composite weight matrix? (default = TRUE)
lw: should linguistic proximity be used in constructing the composite weight matrix? (default = TRUE)
stepw: should stepwise regression be done to show the most-selected variables from the unrestricted model? (default = FALSE)
relimp: should relative importance be calculated for the independent variables of the restricted model? (default = FALSE)
slmtests: should spatial lag tests be run for the four weight matrices? (default = FALSE)

Value: A list with 11 elements:
DependVarb: identification of the dependent variable
URmodel: coefficient estimates from the unrestricted model (includes standardized coefficients and VIFs)
Rmodel: coefficient estimates from the restricted model
RmodelRobust: coefficient estimates from the restricted model with robust SEs
Diagnostics: regression diagnostics for the restricted model: RESET test; Wald test on model restrictions; Breusch-Pagan heteroskedasticity test; Shapiro-Wilk test for normality of residuals; Hausman tests for endogeneity of independent variables
OtherStats: other statistics: composite weight matrix weights (see Details); R² for all models (the model creating the instrument for the network lag term, the restricted model, and the unrestricted model); number of imputations; number of observations
DescripStats: descriptive statistics for variables in the unrestricted model
dfbetas: influential observations according to dfbetas (see Details)
totry: character string of variables that were most significant in the unrestricted model, plus additional variables that proved significant using the add1 function on the restricted model
didwell: character string of variables that were most significant in the unrestricted model
interacts: character string of interaction variables that proved significant using the add1 function on the restricted model

Details: Users can choose two kinds of proximity/similarity weight matrices for constructing a network lag term: geographic and linguistic. In most cases, users should choose both (the defaults). The composite weight matrix is the weighted sum of the chosen weight matrices; the optimal one is the weighting that maximizes the unrestricted model R². The network lag term is entered in each model as the variable Wy.

The dfbetas are scaled changes in coefficient estimates caused by deleting an observation from the model. Only the most influential dfbetas are output.

The stepwise procedure can give additional insight into which independent variables provide the best model fit. Because the imputed datasets differ slightly from each other, the variables selected by a stepwise procedure typically differ slightly for each imputed dataset. If stepw = TRUE, a column labeled stepkept is added to the table reporting the unrestricted model results. This column reports the number of times each independent variable was retained in the model by a stepwise procedure using both forward and backward selection.

The add1 function tests whether each member of a list of variables proves significant when added singly to a model. The list includes all numeric variables in the imputed dataset, as well as squared terms of variables currently in the unrestricted regression. Variables proving significant in over 80 percent of the imputations are returned in the character string totry.

Relative importance is a method of assigning a share of R² to each independent variable. The method repeatedly estimates a model, first with one independent variable, then with two, and so on, and calculates the change in R² as each variable is introduced. The order of entry is then changed and the process repeated, until all possible orders of entry have been considered. The relative importance measure for a variable is its average change in R² across all these models. With large numbers of independent variables, the calculations are prohibitively slow. Setting relimp = TRUE calculates the relative importance of the independent variables in the restricted model and reports it in the column labeled relimp.

Based on the methods proposed by Malcolm M. Dow and E. …
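Two of the computations described for dools can be sketched in base R. The first is Ramsey's RESET test, which asks whether powers of the fitted values improve the fit. The second is the relative importance measure just described: each variable's increase in R², averaged over all orders of entry (often called the LMG measure). The example code that follows loads relaimpo, whose calc.relimp() offers this measure as type "lmg". The manual does not say how dools computes either quantity internally. Both function names below are hypothetical, and the RESET sketch assumes complete data.

```r
# Hypothetical sketch of Ramsey's RESET test; assumes no missing values.
reset_sketch <- function(formula, data) {
  base <- lm(formula, data = data)
  data$yhat2 <- fitted(base)^2
  data$yhat3 <- fitted(base)^3
  aug <- lm(update(formula, . ~ . + yhat2 + yhat3), data = data)
  # p-value of the nested-model F-test; small values suggest misspecification
  anova(base, aug)[2, "Pr(>F)"]
}

# Hypothetical sketch of relative importance: average R^2 increase when
# each predictor enters, over every ordering (feasible only for small p).
relimp_sketch <- function(y, X) {
  X <- as.data.frame(X)
  p <- ncol(X)
  r2 <- function(cols) {
    if (length(cols) == 0) return(0)
    summary(lm(y ~ ., data = X[, cols, drop = FALSE]))$r.squared
  }
  # All permutations of a vector, built recursively
  perms <- function(v) {
    if (length(v) <= 1) return(list(v))
    out <- list()
    for (i in seq_along(v)) {
      for (rest in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], rest)
    }
    out
  }
  orders <- perms(seq_len(p))
  share <- numeric(p)
  for (ord in orders) {
    prev <- 0
    for (k in seq_len(p)) {
      cur <- r2(ord[seq_len(k)])
      share[ord[k]] <- share[ord[k]] + (cur - prev)  # increment for entrant
      prev <- cur
    }
  }
  setNames(share / length(orders), names(X))
}
```

Each ordering's increments telescope to the full-model R², so the relative importances always sum to that R². The number of orderings grows factorially with the number of predictors, which is why the text warns that the calculation becomes prohibitively slow.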
Examples:
library(mice)
library(foreign)
library(stringr)
library(psych)
library(AER)
library(relaimpo)
library(geosphere)
library(spdep)
# --bring in functions and data--
load(url("…"))
ls()  #-can see the objects contained in DEz2.Rdata
#--list and modify variables for use in model--
# --make new variables--
xcd$SCCS.valchild <- (xcd$SCCS.v473 + xcd$SCCS.v474 + xcd$SCCS.v475 + xcd$SCCS.v476)
# --create descriptions for new variables--
addesc("SCCS.valchild", "degree to which society values children")
addesc("Wy", "network lag term")
# --create new dummy variables--
xcd <- cbind(xcd, mkdummy("SCCS", "v899", 1))
# --identify variables to keep for model building--
ev <- c("v30", "v78")
lv <- c("group2", "hunting", "gatherin", "fishing", "huntfil2",
        "war1", "reven", "nomov", "dismov", "store", "subdiv2")
sv <- c("v1685", "v72", "v234", "v236", "v238", "v1648", "v899d1",
        "valchild", "v1260", "v79", "v80", "v81", "v872", "v871")
wv <- c("v284", "v285", "v286", "v288", "v289", "v135")
# --make imputed data--
smi <- domi(EAvs = ev, LRBvs = lv, SCCSvs = sv, WNAIvs = wv, nimp = 5, maxit = 5)
names(smi)  #--can see which variables are available
smi$LRB.lngroup2 <- log(smi$LRB.group2)
xcd$LRB.lngroup2 <- log(xcd$LRB.group2)
addesc("LRB.lngroup2", "natural log of LRB.group2")
# --identify role of variables in model--
dv <- "LRB.lngroup2"
riv <- uiv <- c("SCCS.v21", "WNAI.v135", "LRB.revensq", "LRB.subdiv2", "LRB.war1sq")
h <- dools(smi, depvar = dv, indpv = c("SCCS.v1260", uiv),
           rindpv = riv, othexog = NULL, dw = TRUE, lw = TRUE,
           stepw = TRUE, relimp = TRUE, slmtests = FALSE)
print(h)
# --print output to csv file--
CSVwrite(h, "myOutput", FALSE)

keyf: Key file dataset

The data frame keyf contains information about variables from four ethnographic datasets: EA, LRB, SCCS, and WNAI.

Format:
rownames: variable names from the data frame xcd
variable: variable name as given within the ethnographic dataset (EA, LRB, SCCS, or WNAI)
type: variable type (ordinal or categorical)
description: variable description
NOTmissing: number of non-missing values for the variable
class: variable class (character or numeric)
nuniqvals: number of unique data values for the variable
FNOTmissing: for the factor version of the ethnographic dataset, number of non-missing values for the variable
Fclass: for the factor version of the ethnographic dataset, variable class (character, factor, integer, or numeric)
FnUniqVals: for the factor version of the ethnographic dataset, number of unique data values for the variable
db: source ethnographic dataset (EA, LRB, SCCS, or WNAI); GIS data is indicated as gisx
levels: factor levels for variables defined as factors in the factor version (and with fewer than 20 factor levels)

Examples: head(keyf)

mkdummy: Make a dummy variable and store a description in the key file

The function makes a dummy variable from a variable in the data frame xcd and creates a description stored in the data frame keyf.

Usage: mkdummy(dsn, vv, val)

Arguments:
dsn: name of an ethnographic dataset (EA, LRB, SCCS, or WNAI)
vv: name of a variable from the specified ethnographic dataset
val: the value of variable vv for which the dummy equals one

Value: The function returns a variable named dsn.vvdval, which equals one when xcd$dsn.vv == val and zero otherwise.

Details: The main reason to use this function is that it automatically appends a description of the dummy variable to the key file, which is then available for use in dools output. The description is built from the variable name in the key file and the description of the value from the levels variable in the data frame keyf.
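The naming rule for mkdummy can be shown with a minimal base-R stand-in. `mkdummy_sketch` is a hypothetical name. Unlike mkdummy, it takes the data frame as an argument and does not write a description to keyf. It also keeps missing source values as NA rather than zero; that is an assumption, because the manual does not say how mkdummy treats missing values.

```r
# Hypothetical stand-in for mkdummy(): returns a one-column data frame named
# <dsn>.<vv>d<val> that is 1 where the variable equals val, 0 elsewhere,
# and NA where the source value is missing (assumed behavior).
mkdummy_sketch <- function(dat, dsn, vv, val) {
  x <- dat[[paste0(dsn, ".", vv)]]
  out <- data.frame(as.numeric(x == val))
  names(out) <- paste0(dsn, ".", vv, "d", val)
  out
}
```

Used as `cbind(xcd, mkdummy_sketch(xcd, "SCCS", "v899", 1))`, it would add a column SCCS.v899d1. This matches the "v899d1" entry used in the example code.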
mkwtmat: Make and format three weight matrices for the societies in data frame xcd

The function makes and formats three weight matrices (geographic, linguistic, and ecological) for the societies in data frame xcd.

Usage: mkwtmat()

Value: The function returns three matrices:
ddm: geographic proximity, based on the latitude and longitude fields in data frame xcd. Each cell is the inverse squared distance between the row society and the column society. The diagonal is set to zero, and then the rows are normalized so that each row sums to one.
eem: ecological proximity, based on the Euclidean distance between societies in the 22-dimensional space defined by 19 climate variables, two altitude variables, and one measure of net primary productivity. All variables are scaled to standard normal before distances are calculated. Each cell is exp(-d), where d is the distance between the row society and the column society. The diagonal is set to zero, and then the rows are normalized so that each row sums to one.
llm: linguistic proximity between each row and column society. This matrix is not created by the function; it is only row-normalized.

Details: The geographic and ecological matrices are relatively fast to compute but very large, so it is more efficient to create them than to load already constructed matrices. The linguistic matrix, on the other hand, takes a very long time to compute but is small (it has many fewer unique values). It is therefore loaded with the other data and only row-normalized in this function. The function is run once inside domi, which makes the matrices available both within that function and in the general environment.

xcd: Cross-cultural dataset

The data frame xcd contains the variables from four ethnographic datasets: EA, LRB, SCCS, and WNAI. The number of societies represented in each dataset is 1267 (EA), 339 (LRB), 186 (SCCS), and 172 (WNAI), for a total of 1964 records across the four datasets.
However, some societies appear in more than one dataset: 1090 appear in only one, 257 in two, 108 in three, and nine in all four. There are therefore 1464 unique societies. The data frame xcd contains 1464 observations and 2916 variables: 111 from EA, 262 from LRB, 2055 from SCCS, 440 from WNAI, and 48 drawn from GIS data.

Format:
For each variable drawn from an ethnographic dataset, the variable name is XX.vv, where XX is the name of the ethnographic dataset and vv is the name of the variable in that dataset. For example, variable v207 from SCCS is named SCCS.v207.

Examples: dim(xcd)
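The mkwtmat entry above describes how the geographic and ecological weight matrices are built. The sketch below follows that description: zero the diagonal, then normalize the rows, using inverse squared distance for geography and exp(-d) on standardized variables for ecology. It then forms a network lag Wy = W y. Several details are assumptions: the function names are hypothetical, and distance is measured as great-circle (haversine) distance in kilometres, which the manual does not specify. Societies with identical coordinates would produce infinite weights in this sketch.

```r
# Hypothetical sketches of the ddm and eem constructions described above.
row_normalize <- function(w) {
  diag(w) <- 0          # no society is its own neighbour
  w / rowSums(w)        # each row now sums to one
}

# Geographic: inverse squared great-circle distance (haversine, km; assumed)
geo_weights_sketch <- function(lat, lon) {
  rad <- pi / 180
  n <- length(lat)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    dlat <- (lat - lat[i]) * rad
    dlon <- (lon - lon[i]) * rad
    a <- sin(dlat / 2)^2 + cos(lat[i] * rad) * cos(lat * rad) * sin(dlon / 2)^2
    d[i, ] <- 2 * 6371 * asin(pmin(1, sqrt(a)))
  }
  w <- 1 / d^2          # diagonal is Inf here; row_normalize sets it to 0
  row_normalize(w)
}

# Ecological: exp(-d) on Euclidean distances between standardized variables
eco_weights_sketch <- function(env) {
  z <- scale(env)                 # each column to mean 0, sd 1
  d <- as.matrix(dist(z))
  row_normalize(exp(-d))
}
```

Given a row-normalized W and an outcome vector y, `W %*% y` gives each society the weighted average of its neighbours' values. This is the kind of network lag term that dools enters as Wy.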
New: follow the blue link [[.Dow- Eff Functions - DEf]] and click one of the five models listed on that page, e.g.:
Make your own DEf model: http://intersci.ss.uci.edu/wiki/pdf/make_your_own_def_model.pdf
Read: http://intersci.ss.uci.edu/wiki/pdf/wileych5ccrnetsofvarsmodels2blackdrw.pdf (This will become part of Wiley …)
Introduction to the R Statistical Computing Environment R Programming: Exercises John Fox (McMaster University) ICPSR 2014 1. A straightforward problem: Write an R function for linear least-squares regression.
More information[spa-temp.inf] Spatial-temporal information
[spa-temp.inf] Spatial-temporal information VI Table of Contents for Spatial-temporal information I. Spatial-temporal information........................................... VI - 1 A. Cohort-survival method.........................................
More informationRUDIMENTS OF STATA. After entering this command the data file WAGE1.DTA is loaded into memory.
J.M. Wooldridge Michigan State University RUDIMENTS OF STATA This handout covers the most often encountered Stata commands. It is not comprehensive, but the summary will allow you to do basic data management
More informationLocal Minima in Regression with Optimal Scaling Transformations
Chapter 2 Local Minima in Regression with Optimal Scaling Transformations CATREG is a program for categorical multiple regression, applying optimal scaling methodology to quantify categorical variables,
More informationMultivariate Capability Analysis
Multivariate Capability Analysis Summary... 1 Data Input... 3 Analysis Summary... 4 Capability Plot... 5 Capability Indices... 6 Capability Ellipse... 7 Correlation Matrix... 8 Tests for Normality... 8
More informationRegression. Dr. G. Bharadwaja Kumar VIT Chennai
Regression Dr. G. Bharadwaja Kumar VIT Chennai Introduction Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called
More information[/TTEST [PERCENT={5}] [{T }] [{DF } [{PROB }] [{COUNTS }] [{MEANS }]] {n} {NOT} {NODF} {NOPROB}] {NOCOUNTS} {NOMEANS}
MVA MVA [VARIABLES=] {varlist} {ALL } [/CATEGORICAL=varlist] [/MAXCAT={25 ** }] {n } [/ID=varname] Description: [/NOUNIVARIATE] [/TTEST [PERCENT={5}] [{T }] [{DF } [{PROB }] [{COUNTS }] [{MEANS }]] {n}
More informationSubset Selection in Multiple Regression
Chapter 307 Subset Selection in Multiple Regression Introduction Multiple regression analysis is documented in Chapter 305 Multiple Regression, so that information will not be repeated here. Refer to that
More informationSTATA TUTORIAL B. Rabin with modifications by T. Marsh
STATA TUTORIAL B. Rabin with modifications by T. Marsh 5.2.05 (content also from http://www.ats.ucla.edu/stat/spss/faq/compare_packages.htm) Why choose Stata? Stata has a wide array of pre-defined statistical
More information- 1 - Fig. A5.1 Missing value analysis dialog box
WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation
More informationDM and Cluster Identification Algorithm
DM and Cluster Identification Algorithm Andrew Kusiak, Professor oratory Seamans Center Iowa City, Iowa - Tel: 9-9 Fax: 9- E-mail: andrew-kusiak@uiowa.edu Homepage: http://www.icaen.uiowa.edu/~ankusiak
More informationThe Amelia Package. March 25, 2007
The Amelia Package March 25, 2007 Version 1.1-23 Date 2007-03-24 Title Amelia II: A Program for Missing Data Author James Honaker , Gary King , Matthew Blackwell
More informationPSS718 - Data Mining
Lecture 3 Hacettepe University, IPS, PSS October 10, 2016 Data is important Data -> Information -> Knowledge -> Wisdom Dataset a collection of data, a.k.a. matrix, table. Observation a row of a dataset,
More informationExcel to R and back 1
Excel to R and back 1 The R interface in RegressIt allows the user to transfer data from an Excel file to a new data frame in RStudio, load packages, and run regression models with customized table and
More informationBasics of Multivariate Modelling and Data Analysis
Basics of Multivariate Modelling and Data Analysis Kurt-Erik Häggblom 9. Linear regression with latent variables 9.1 Principal component regression (PCR) 9.2 Partial least-squares regression (PLS) [ mostly
More informationChapter 13 Multivariate Techniques. Chapter Table of Contents
Chapter 13 Multivariate Techniques Chapter Table of Contents Introduction...279 Principal Components Analysis...280 Canonical Correlation...289 References...298 278 Chapter 13. Multivariate Techniques
More informationStudy Guide. Module 1. Key Terms
Study Guide Module 1 Key Terms general linear model dummy variable multiple regression model ANOVA model ANCOVA model confounding variable squared multiple correlation adjusted squared multiple correlation
More informationA model is built by VORSIM from this Model Builder control screen that loads when the VORSIM desktop icon is clicked. One starts by defining a new
A model is built by VORSIM from this Model Builder control screen that loads when the VORSIM desktop icon is clicked. One starts by defining a new model and creating a model definition workbook. When the
More informationTable 1 below illustrates the construction for the case of 11 integers selected from 20.
Q: a) From the first 200 natural numbers 101 of them are arbitrarily chosen. Prove that among the numbers chosen there exists a pair such that one divides the other. b) Prove that if 100 numbers are chosen
More informationA quick introduction to STATA
A quick introduction to STATA Data files and other resources for the course book Introduction to Econometrics by Stock and Watson is available on: http://wps.aw.com/aw_stock_ie_3/178/45691/11696965.cw/index.html
More informationDetecting and Circumventing Collinearity or Ill-Conditioning Problems
Chapter 8 Detecting and Circumventing Collinearity or Ill-Conditioning Problems Section 8.1 Introduction Multicollinearity/Collinearity/Ill-Conditioning The terms multicollinearity, collinearity, and ill-conditioning
More informationMultiple imputation using chained equations: Issues and guidance for practice
Multiple imputation using chained equations: Issues and guidance for practice Ian R. White, Patrick Royston and Angela M. Wood http://onlinelibrary.wiley.com/doi/10.1002/sim.4067/full By Gabrielle Simoneau
More informationITS Introduction to R course
ITS Introduction to R course Nov. 29, 2018 Using this document Code blocks and R code have a grey background (note, code nested in the text is not highlighted in the pdf version of this document but is
More informationFMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More informationMachine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013
Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork
More informationMath 263 Excel Assignment 3
ath 263 Excel Assignment 3 Sections 001 and 003 Purpose In this assignment you will use the same data as in Excel Assignment 2. You will perform an exploratory data analysis using R. You shall reproduce
More informationXES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework
XES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework Demo Paper Joerg Evermann 1, Jana-Rebecca Rehse 2,3, and Peter Fettke 2,3 1 Memorial University of Newfoundland 2 German Research
More informationComputer Experiments: Space Filling Design and Gaussian Process Modeling
Computer Experiments: Space Filling Design and Gaussian Process Modeling Best Practice Authored by: Cory Natoli Sarah Burke, Ph.D. 30 March 2018 The goal of the STAT COE is to assist in developing rigorous,
More informationANNOUNCING THE RELEASE OF LISREL VERSION BACKGROUND 2 COMBINING LISREL AND PRELIS FUNCTIONALITY 2 FIML FOR ORDINAL AND CONTINUOUS VARIABLES 3
ANNOUNCING THE RELEASE OF LISREL VERSION 9.1 2 BACKGROUND 2 COMBINING LISREL AND PRELIS FUNCTIONALITY 2 FIML FOR ORDINAL AND CONTINUOUS VARIABLES 3 THREE-LEVEL MULTILEVEL GENERALIZED LINEAR MODELS 3 FOUR
More informationFurther processing of estimation results: Basic programming with matrices
The Stata Journal (2005) 5, Number 1, pp. 83 91 Further processing of estimation results: Basic programming with matrices Ian Watson ACIRRT, University of Sydney i.watson@econ.usyd.edu.au Abstract. Rather
More informationQuantitative Methods in Management
Quantitative Methods in Management MBA Glasgow University March 20-23, 2009 Luiz Moutinho, University of Glasgow Graeme Hutcheson, University of Manchester Exploratory Regression The lecture notes, exercises
More informationData input for secr. Murray Efford May 5, 2010
Data input for secr Murray Efford May 5, 2010 Data for analysis in secr must be prepared as an object of class capthist which includes both the detector layout and the capture data. The structure of a
More informationIntroduction to the R Statistical Computing Environment R Programming: Exercises
Introduction to the R Statistical Computing Environment R Programming: Exercises John Fox (McMaster University) ICPSR Summer Program 2010 1. A challenging problem: Iterated weighted least squares (IWLS)
More informationLisp Basic Example Test Questions
2009 November 30 Lisp Basic Example Test Questions 1. Assume the following forms have been typed into the interpreter and evaluated in the given sequence. ( defun a ( y ) ( reverse y ) ) ( setq a (1 2
More informationIntro to E-Views. E-views is a statistical package useful for cross sectional, time series and panel data statistical analysis.
Center for Teaching, Research & Learning Research Support Group at the CTRL Lab American University, Washington, D.C. http://www.american.edu/provost/ctrl/ 202-885-3862 Intro to E-Views E-views is a statistical
More informationIterative Algorithms I: Elementary Iterative Methods and the Conjugate Gradient Algorithms
Iterative Algorithms I: Elementary Iterative Methods and the Conjugate Gradient Algorithms By:- Nitin Kamra Indian Institute of Technology, Delhi Advisor:- Prof. Ulrich Reude 1. Introduction to Linear
More information