Organizing Your Data
Jenny Holcombe, PhD
UT College of Medicine
Nuts & Bolts Conference
August 16, 2013

Topics to Discuss:
- Types of Variables
- Constructing a Variable Code Book
- Developing Excel Spreadsheets
- Data Entry
- Descriptive vs. Inferential Statistics
- Parametric vs. Nonparametric Statistics

Variables
- A characteristic or condition that changes or has different values for different individuals
- Anything that can be measured

TYPES OF VARIABLES

Qualitative Variables
- Differ in kind rather than amount
- Differ in quality, not quantity or magnitude
- Also referred to as categorical or nominal
- Examples: favorite color, treatment group, gender, race

Quantitative Variables
- Assigned number values that represent differing quantities of the characteristic
- Examples: medication dosage, # of doctor visits, annual income
- Quantitative data can be either:
  - Discrete: a countable set of separate values (e.g., # of doctor visits last year)
  - Continuous: an infinite continuum of possible real-number values (e.g., # of minutes it takes to finish a book)
Quantitative Variables
Three types of quantitative variables:
- Ordinal: categorical scales that have a natural ordering of values (e.g., SES class: low, middle, high)
- Interval: distances between adjacent scores are equal & consistent throughout the scale, with no absolute zero point (e.g., IQ scores, temperature in °F or °C)
- Ratio: same as interval, but with a true zero point (e.g., length, distance, time)

Variables: Final Points
- It is possible to measure data on more than one scale
- Variables should always be measured on the highest scale possible
- Scales from highest to lowest: Ratio, Interval, Ordinal, Nominal

NAMING VARIABLES
- The first row should include variable names; this makes transfer to other programs easier (e.g., SPSS, SAS)
- Variable names can be up to 32 characters in length, but anything more than 8-12 becomes very cumbersome to manage
- Each variable name must be unique; duplication is not allowed & names are not case sensitive
- Variable names should begin with a letter
- Avoid periods, #, @, and $; use underscores only within the variable name (not at the beginning or end)
- No spaces are allowed in variable names
- Use meaningful names for variables
  - Makes variables more self-explanatory
  - Some exceptions: balance length/meaning

Acceptable Names            Unacceptable Names
Q1; Q_1                     Q 1; 1Q; Q-1
Question1; Question_1       Question 1; Question-1
Q1_food                     Q1 food; Q1-food
Food                        _Food_
DRS1; DRS_1                 DiabetesRiskScale1

The main thing is to be consistent when naming variables.
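
The naming rules above can be checked automatically before a spreadsheet is handed to an analyst. The following is a minimal sketch, not a rule set taken from SPSS or SAS: the function names, the 12-character "recommended" cutoff, and the sample header row are assumptions chosen to mirror the slide.

```python
import re

# Letters, digits, and underscores only; must start with a letter and
# must not end with an underscore (rules taken from the slide above).
NAME_PATTERN = re.compile(r"[A-Za-z](?:[A-Za-z0-9_]*[A-Za-z0-9])?")


def check_variable_name(name, max_len=32, recommended_len=12):
    """Return a list of problems found in one variable name (empty if none)."""
    problems = []
    if len(name) > max_len:
        problems.append("longer than %d characters" % max_len)
    elif len(name) > recommended_len:
        problems.append("longer than the recommended %d characters" % recommended_len)
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must start with a letter and use only letters, "
                        "digits, and inner underscores")
    return problems


def find_duplicates(names):
    """Return the names that collide with an earlier name when case is ignored."""
    seen = set()
    duplicates = []
    for name in names:
        key = name.lower()
        if key in seen:
            duplicates.append(name)
        seen.add(key)
    return duplicates


# Hypothetical header row mixing acceptable and unacceptable names.
header = ["Q1", "Q_1", "Question_1", "Q1_food", "DRS1",
          "Q 1", "1Q", "Q-1", "_Food_", "DiabetesRiskScale1", "q1"]

for name in header:
    issues = check_variable_name(name)
    print(name, "->", "ok" if not issues else "; ".join(issues))
print("duplicates:", find_duplicates(header))
```

Length is reported separately from the character rules because a long name such as DiabetesRiskScale1 is legal but cumbersome, which is a different kind of problem from a name a statistics package would reject.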
http://www.ciser.cornell.edu/images/excel2sasa.gif
What is wrong with this file?

CONSTRUCTING A VARIABLE CODE BOOK

Purpose: Variable Code Books
- To create a data entry system
- To assist with data entry
- For statistical analysis
- When archiving data files for follow-up

Code Book Construction
Elements to include:
- Variable Name
- Variable Label: describes the variable and/or includes the question of interest
- Value Labels: give labels for each possible numeric value of the variable
Example:
  Variable Name:   Age
  Variable Label:  Age of participant at time of survey
  Value Labels:    1=20-29, 2=30-39, 3=40-49, 4=50-59

Code Book Construction (continued)
- Word or Excel format is acceptable
- A columned list or table is acceptable
- All variables should be included with appropriate labeling information
- Variable labels can be any length, but keeping them to 256 characters or fewer is recommended
- Variable labels can contain spaces & characters not allowed in variable names

Code Book Examples
- Polit Data Files
- Swedish Institute for Social Research
- ACHA NCHA II
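
The three code book elements described above (variable name, variable label, value labels) map naturally onto a small data structure. The sketch below is one possible layout, not a required format: the Age entry comes from the slide, while the Gender entry, its codes, and the function names are hypothetical.

```python
import csv
import io

# One entry per variable: name, label, and value labels.
# "Age" is the example from the slide; "Gender" and its codes are hypothetical.
CODE_BOOK = [
    {"name": "Age",
     "label": "Age of participant at time of survey",
     "values": {1: "20-29", 2: "30-39", 3: "40-49", 4: "50-59"}},
    {"name": "Gender",
     "label": "Self-reported gender (hypothetical example)",
     "values": {1: "Female", 2: "Male"}},
]


def code_book_to_csv(code_book):
    """Render the code book as a three-column table in CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Variable Name", "Variable Label", "Value Labels"])
    for entry in code_book:
        values = ", ".join("%s=%s" % (code, text)
                           for code, text in sorted(entry["values"].items()))
        writer.writerow([entry["name"], entry["label"], values])
    return buffer.getvalue()


def decode(code_book, name, code):
    """Return the label for a numeric code, or None if the code is not listed."""
    for entry in code_book:
        if entry["name"] == name:
            return entry["values"].get(code)
    raise KeyError(name)


print(code_book_to_csv(CODE_BOOK))
print(decode(CODE_BOOK, "Age", 3))
```

The CSV text opens directly in Excel as a columned table, so the same structure can serve both data entry and the analyst.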
Code Book: Final Points
- Be consistent in your coding!
- Update the code book as you enter your data: if you make a change while entering your data, make sure you update your code book as well
- Check & double-check: your code book acts as a form of communication between you & your data analyst

DEVELOPING EXCEL SPREADSHEETS

Excel Basics
- Each individual row of data is known as a record, an observation, or a case
- Do not leave any blank rows
- Information about an item cannot be spread over more than one row
- Each column is a field, labeled to identify the data it contains
- All data in each column should be formatted the same way
- Do not leave blank columns in the table

Excel Basics (continued)
Once a database is created, you can use Excel tools to manage the data:
- Sorting Data
- Filtering Data

DATA ENTRY

Missing Values
- Should be entered consistently: use 9 or 99 or 999
- The value should be something that cannot represent a real numeric value for the variable in question
- Excel will treat these missing-value codes as real values, so be careful if you are using Excel for analysis
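
The warning about missing-value codes is easy to demonstrate. In this sketch the column of doctor visits and the choice of 999 as the missing code are made-up illustrations; the point is only that a summary computed without filtering treats the code as a real observation.

```python
from statistics import mean

MISSING_CODE = 999  # assumed code; it must not be a plausible value for the variable

# Hypothetical entries for a "# of doctor visits" column, two of them missing.
doctor_visits = [2, 0, 999, 5, 1, 999, 4]


def drop_missing(values, missing_code=MISSING_CODE):
    """Return only the values that are not the missing-data code."""
    return [v for v in values if v != missing_code]


valid = drop_missing(doctor_visits)
naive_mean = mean(doctor_visits)  # what a plain spreadsheet average would report
clean_mean = mean(valid)          # average over the real observations only

print("with missing codes included:", round(naive_mean, 2))
print("with missing codes excluded:", round(clean_mean, 2))
print("valid cases:", len(valid), "of", len(doctor_visits))
```

Reporting the number of valid cases alongside the mean makes it clear how many observations the summary actually rests on.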
Additional Points
- Ensure rows below the data are not activated, so they are not mistaken for additional cases/observations during transfer
- Numeric values are always best to use for data entry, regardless of the type of variable (quantitative vs. qualitative)
- Values/labels can always be assigned in a code book or data analysis program

DESCRIPTIVE VS. INFERENTIAL STATISTICS

Descriptive vs. Inferential
Descriptive Statistics
- Used to summarize, organize, and simplify data for better understanding
- Means, standard deviations, percents, frequencies, proportions, etc.
Inferential Statistics
- Statistical procedures that allow researchers to study samples & then make generalizations about the population from which they were selected
- Allow the researcher to draw conclusions

PARAMETRIC VS. NONPARAMETRIC STATISTICS

Parametric Statistics
- A class of inferential statistical tests that involves (a) assumptions about the distribution of the variables, (b) the estimation of a parameter, and usually (c) the use of interval or ratio measures
- Statistical tests designed to be used when data have certain characteristics: when they approximate a normal distribution & are measured with interval or ratio scales

Bivariate
- One-sample test
- Two-sample test
- Analysis of variance (ANOVA)
- Repeated measures ANOVA
- Pearson's product moment correlation (r)

Multivariate
- Multiple correlation/regression
- ANCOVA
- MANOVA
- MANCOVA
- Mixed design RM-ANOVA
- Canonical analysis
- Discriminant analysis
- Logistic regression
- Factor analysis
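
To make the descriptive/inferential distinction concrete, the sketch below first summarizes two groups (descriptive) and then computes a two-sample t statistic with pooled variance (a parametric, inferential step). The scores and group names are invented, and Student's pooled-variance form is an assumption about which two-sample test is meant; converting the statistic to a p-value requires a t table or a statistics package and is left out.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical interval-scale scores for two independent groups.
treatment = [12.0, 15.0, 11.0, 14.0, 13.0]
control = [10.0, 9.0, 12.0, 8.0, 11.0]


def describe(values):
    """Descriptive statistics: sample size, mean, and standard deviation."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}


def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


print("treatment:", describe(treatment))
print("control:  ", describe(control))

t, df = pooled_t(treatment, control)
print("t = %.3f on %d degrees of freedom" % (t, df))
```

The descriptive output describes only these ten scores; the t statistic is what supports a statement about the populations the samples were drawn from, provided the parametric assumptions listed above hold.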
Nonparametric Statistics
- A general class of inferential statistical tests that does not involve rigorous assumptions about the distribution of the variables; most often used with small samples, when data are measured on the nominal or ordinal scales, or when a distribution is severely skewed
- Statistical tests that are designed to be used when the data being analyzed depart from the distributions that can be analyzed with parametric statistics

Nonparametric tests:
- Chi-square goodness-of-fit test
- Chi-square test of independence
- Fisher's exact test
- McNemar test
- Cochran's Q test
- Mann-Whitney U test
- Kruskal-Wallis test
- Wilcoxon signed ranks test
- Friedman test
- Spearman's rank order correlation
- Kendall's tau
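
Many of the tests listed above work on ranks rather than raw values, which is why they suit ordinal data. As one illustration, this sketch computes the Mann-Whitney U statistics for two independent groups, using average ranks for ties. The ratings are invented and the function names are illustrative; turning U into a p-value needs a table or a statistics package and is not shown.

```python
def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b) computed from the rank sum of group a."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Hypothetical ordinal ratings (1 = lowest, 5 = highest) for two groups.
group_a = [1, 2, 2, 3, 5]
group_b = [3, 4, 4, 5, 5]

u_a, u_b = mann_whitney_u(group_a, group_b)
print("U_a =", u_a, "U_b =", u_b)
```

Because only the ordering of the ratings enters the calculation, the result does not depend on the distances between scale points, which is the property that makes the test appropriate for ordinal measures.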