Basic Medical Statistics Course
S0 SPSS Intro
November 2013
Wilma Heemsbergen
w.heemsbergen@nki.nl

This Afternoon
13.00 ~ 15.30
- Database (20 min)
- SPSS (40 min)
- Short break
- Exercise (60 min)
During the course there will be several practicals. Answers will be provided afterwards, including SPSS syntax.

Research
- General research question
- Objective / hypothesis
- Study design
- Data collection
- Database
- Data analysis
- Discussion / conclusions
A valid data analysis can only take place when all the previous steps have been performed adequately.

Database Example

Types of data
Type                     Example
Continuous               Age
Categorical - binary     Treatment arm
Categorical - ordinal    T stage
Categorical - nominal    Hospital
Text                     Remarks
Date                     Date of birth

Types of data: special cases
- Identifiers. A unique code / number to identify an individual patient. Key variable (for merging data, patient file research, etc.).
- Censored data. Most common is right-censored: the event will occur, but we do not know when, e.g. death. Interval-censored: the event occurred in a certain time interval, but we do not know exactly when.
- Derived data. E.g.: age at start of treatment, derived from birth date and treatment date.
- Imputed data. A way of handling missing data. E.g. estimation of start of treatment, based on blood values.
- Missing data. Missing data are often coded with a special value (e.g. 99 = missing). Beware of these values when you start analyzing data.
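
The warning about coded missing values can be made concrete with a short syntax sketch. It assumes a variable called t_stage (a hypothetical name) in which 99 was entered for unknown. Until the code is declared as missing, SPSS treats 99 as a regular value.

```spss
* Sketch only: t_stage is a hypothetical variable name.
* Declare 99 as a user-missing value so it is excluded from valid percentages.
MISSING VALUES t_stage (99).
* The frequency table now reports 99 under Missing instead of Valid.
FREQUENCIES VARIABLES=t_stage.
```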

Date and Time Variables
To calculate the time between two dates, you can subtract the dates from each other. E.g.:
(date start therapy) - (birth date) = (age at start therapy).
Beware of the unit of the calculated age. In SPSS, it will be calculated in seconds (using the option compute).
Age at start (in days) = ( (date start) - (birth date) ) / (60*60*24)
Age at start (in years) = ( (date start) - (birth date) ) / (60*60*24*365.25)
SPSS also contains a date and time wizard, in which you can indicate the desired unit for calculations.

Code / Labels
Two or more categories (not ordinal)
- Two: male, female. Codes: 1, 2 or 0, 1
- More: Hospital A, B, C, D. Codes: whatever is convenient, e.g. 1, 2, 3, 4 or 11, 17, 22, 33
Categories, ordinal
- Age: <40, 40-60, >60. Codes: 1, 2, 3
- Risk factor: present, not present. Codes: 1, 0
- Prior surgery: yes, no. Codes: 1, 0
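
The date arithmetic from the Date and Time Variables slide could be written in syntax roughly as follows. The variable names birth_date and date_start are assumptions for illustration, and both are assumed to be stored as SPSS date variables. DATEDIFF is shown as an alternative that returns whole units only.

```spss
* Sketch only: birth_date and date_start are hypothetical date variables.
* Subtracting two SPSS dates gives a difference in seconds.
COMPUTE age_days = (date_start - birth_date) / (60*60*24).
COMPUTE age_years = (date_start - birth_date) / (60*60*24*365.25).
* DATEDIFF returns whole years only, so the fractional part is dropped.
COMPUTE age_years_whole = DATEDIFF(date_start, birth_date, "years").
EXECUTE.
```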

Building a Database
- Keep a short paper file per patient (study forms).
- Enter original data preferably in a database environment (not Excel).
- Construct a code book (next slide).
- Keep your original data well-organized.
- Save + backup original data, apart from derived data.
- Include in your data file name: date, version, ref to study.
- Use a text field to comment (and update) for every patient (e.g.: emigrated, lost to follow-up, no visit at 2 years follow-up).
- Check and double-check the data.

Code Book
Define each variable (prior to data entry) in a code book:
- name of variable,
- type (e.g. numerical, text, date),
- length, decimals,
- label / extended variable name (e.g. date of diagnosis in referring hospital),
- values (e.g. 1=male, 2=female),
- missing values: list of defined missing values (e.g. 99=unknown).
The code book can also be used to construct an electronic data form for data entry (to minimize errors).
Variable names should be reasonably short + well-organized, also to avoid problems when exported to other programs.
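
Once the data are in SPSS, the code book entries can be attached to the variables themselves. The sketch below assumes variables named sex, age and date_diag (hypothetical names); the labels and codes are the examples from the Code Book slide.

```spss
* Sketch only: sex, age and date_diag are hypothetical variable names.
VARIABLE LABELS date_diag 'Date of diagnosis in referring hospital'.
VALUE LABELS sex 1 'male' 2 'female' 99 'unknown'.
MISSING VALUES sex (99).
* Length and decimals: three digits, no decimals.
FORMATS age (F3.0).
* Print the resulting dictionary to compare it with the paper code book.
DISPLAY DICTIONARY.
```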

Electronic Data Form
Example of a simple data entry form in ACCESS

Error Checking
- Range/outliers: are outliers true values, or errors?
- Missings: are missing values really missing?
- Dates: are dates within the expected range?
- Queries (logical rules): e.g. stop date must be between x and y weeks after start date.
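
The range check and the logical rule from the Error Checking slide might look like this in syntax. All variable names except age are assumptions, and the limits of 4 and 8 weeks are placeholders for the x and y of the slide, not values from the course.

```spss
* Sketch only: patient_id, start_date and stop_date are hypothetical names.
* The limits of 4 and 8 weeks are placeholders for x and y.
* Range check: minimum and maximum reveal impossible values.
DESCRIPTIVES VARIABLES=age
  /STATISTICS=MIN MAX.
* Logical rule: stop date must fall 4 to 8 weeks after the start date.
COMPUTE weeks_treated = (stop_date - start_date) / (60*60*24*7).
COMPUTE date_error = (weeks_treated < 4 OR weeks_treated > 8).
EXECUTE.
* List only the suspect records, without changing the working file.
TEMPORARY.
SELECT IF (date_error = 1).
LIST VARIABLES=patient_id start_date stop_date weeks_treated.
```

Records with a missing start or stop date get a missing date_error and are not listed, so they need a separate check.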

Describing continuous data
- Descriptives (mean, sd, range, percentiles, min, max, ...)
- Histogram (distribution of data)
- Box plot (range / variation, outliers)
- Stem-and-Leaf plot (range, outliers, exact values)
- Scatter (2 continuous variables)

Describing categorical/ordinal data
Data can be described in absolute values (numbers) and/or in relative values (%).
Data can be described with or without missing values.
- Frequency tables
- Crosstabs (at least 2 variables)
- Graphs: bars, pie charts, ...

Handling & Describing Data in SPSS

SPSS
- SPSS windows: Data, Variables, Output, Syntax.
- Import / export data, output files, syntax files.
- Transform data (compute, recode, ...).
- Describing data (tables, graphs, ...).
SPSS can import/export other formats (e.g. Excel).

Windows in SPSS
Open windows are shown in the tab Windows.
To open new windows (data, syntax, output), go to (menu): File - New

Import Data in SPSS
Using the Paste button, the corresponding syntax is pasted (ready to run).
*.dbf, *.xls, *.txt, ...

Get Data
Menu: File - Open - Data
Use the Paste button to get the syntax in the syntax window.
It is also possible to start by opening a syntax file, which will read / open the data (without using the menu). To run: (select and) hit the Run button.
    GET FILE='U:\data_statcursus\trial_rt.sav'.
    DATASET NAME DataSet1 WINDOW=FRONT.

Variable View

Data File Information

Compute
Menu: Transform - Compute
    DATASET ACTIVATE DataSet1.
    COMPUTE duur_rt=tend - tstart.
    EXECUTE.

Displaying Data (Graph)

Histogram
Menu: Graphs - Legacy Dialogs - Histogram
    GRAPH /HISTOGRAM=duur_rt.

Reports, Describing

Case Summaries
Menu: Analyse - Reports - Case Summaries
Uses: overview, error checking, summary

Descriptives
Menu: Analyse - Descriptive statistics - Descriptives
    DESCRIPTIVES VARIABLES=age
      /STATISTICS=MEAN STDDEV MIN MAX.
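
For comparison with the Descriptives syntax, the Case Summaries menu pastes a SUMMARIZE command. The sketch below lists the variables age and arm for the first 100 cases; the choice of variables and the limit of 100 are assumptions for illustration.

```spss
* Sketch only: lists age and arm for at most 100 cases, with a total count.
SUMMARIZE
  /TABLES=age arm
  /FORMAT=VALIDLIST NOCASENUM TOTAL LIMIT=100
  /TITLE='Case Summaries'
  /MISSING=VARIABLE
  /CELLS=COUNT.
```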

Recode
Menu: Transform - Recode
    RECODE age (45 thru 69.99=0) (70 thru 90=1) INTO age70.
    EXECUTE.

Syntax

Data List Free
Analyzing data without creating a data table first (one row per combination of naam1 and naam2, with n the number of cases):
    data list free / naam1, naam2, n.
    begin data.
    1 1 18
    0 1 162
    1 0 21
    0 0 159
    end data.
    weight by n.

Other Options (exercise)

Merge Data
Menu: Data - Merge Files - Add Cases / Add Variables

Split File / Select Cases
Menu: Data - Split File
Menu: Data - Select Cases
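
The four menu actions on the two slides above correspond to syntax along the following lines. This is a sketch under assumptions: the file names second_file.sav and extra_variables.sav and the key variable patient_id are hypothetical, and the cut-off of 70 years is only an example.

```spss
* Sketch only: file names and patient_id are hypothetical.
* Add Cases: append the patients of a second file with the same variables.
ADD FILES /FILE=*
  /FILE='second_file.sav'.
EXECUTE.
* Add Variables: both files must be sorted on the key variable first.
SORT CASES BY patient_id.
MATCH FILES /FILE=*
  /FILE='extra_variables.sav'
  /BY patient_id.
EXECUTE.
* Split File: repeat the following analyses per treatment arm.
SORT CASES BY arm.
SPLIT FILE LAYERED BY arm.
DESCRIPTIVES VARIABLES=age /STATISTICS=MEAN STDDEV MIN MAX.
SPLIT FILE OFF.
* Select Cases: analyse patients aged 70 or older only, then restore all cases.
COMPUTE sel70 = (age >= 70).
FILTER BY sel70.
DESCRIPTIVES VARIABLES=age /STATISTICS=MEAN STDDEV MIN MAX.
FILTER OFF.
USE ALL.
```

FILTER hides the unselected cases without deleting them, which is the safer choice while exploring data.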

Save Subset
It is possible to save a subset of the variables: Save as, option Variables.
Menu: Data - Save as
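
In syntax, saving a subset of the variables could look like the sketch below. The output file name and patient_id are hypothetical; the other variables appear elsewhere in this course.

```spss
* Sketch only: the output file name and patient_id are hypothetical.
* Only the variables listed after KEEP are written to the new file.
SAVE OUTFILE='trial_rt_subset.sav'
  /KEEP=patient_id arm age duur_rt.
```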

Crosstabs
Menu: Analyse - Descriptive statistics - Crosstabs

Explore
Menu: Analyse - Descriptive statistics - Explore
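
As a sketch of what the Crosstabs menu pastes, the following tabulates treatment arm against the age group variable age70 created in the Recode example. Asking for row percentages is an assumption made here to show both absolute and relative values.

```spss
* Sketch only: counts and row percentages of age group per treatment arm.
CROSSTABS
  /TABLES=arm BY age70
  /CELLS=COUNT ROW.
```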

Explore: factor (by group)
Menu: Analyse - Descriptive statistics - Explore
    EXAMINE VARIABLES=age BY arm
      /PLOT BOXPLOT STEMLEAF
      /COMPARE GROUPS
      /STATISTICS DESCRIPTIVES
      /CINTERVAL 95
      /MISSING LISTWISE
      /NOTOTAL.
(= default, you can change it)
The output also contains a stem-and-leaf plot and a boxplot.

Stem-and-Leaf
A Stem-and-Leaf diagram is a special type of histogram.
First, stem and leaf must be defined.
Example data: 23, 26, 26, 27, 28, 30, 31, 45, 45, 45
With a stem unit of 10 and a leaf unit of 1, the Stem-and-Leaf plot then typically looks like this:
    2 | 3 6 6 7 8    (stem = 2, leaves are 3 6 6 7 8)
    3 | 0 1
    4 | 5 5 5
SPSS: a Stem-and-Leaf plot is generated when the option Explore is used (descriptive statistics).

Box Plot
Visualizes:
- distribution (normal? skew?)
- full range of variation
- outliers
SPSS: a Box plot is generated when the option Explore is used (descriptive statistics).

Scatter
Menu: Graphs - Legacy Dialogs - Scatter/Dot

Pie Chart & Freq Table
Variable: cause of death (COD)
- display missing data, or not?
- numbers or %?
Menu: Graphs - Legacy Dialogs - Pie Charts (option: summaries for groups of cases)
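
The Scatter and Pie Chart menus paste GRAPH commands similar to the sketch below. The variable name cod for cause of death is an assumption, and pairing age with duur_rt in the scatter plot is only an example of two continuous variables.

```spss
* Sketch only: cod is a hypothetical name for the cause-of-death variable.
* Scatter plot of two continuous variables.
GRAPH /SCATTERPLOT(BIVAR)=age WITH duur_rt.
* Pie chart with numbers of cases per cause of death.
GRAPH /PIE=COUNT BY cod.
* The same pie chart with percentages.
GRAPH /PIE=PCT BY cod.
* Frequency table with and without missing values.
FREQUENCIES VARIABLES=cod.
```

In the frequency table, the Percent column includes missing values and the Valid Percent column excludes them, which answers both questions on the slide in one output.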

SPSS Help
There are helpful SPSS manuals / guides available on the internet.
- http://www.sussex.ac.uk/its/pdfs/spss_brief_guide_20.pdf (English)
- http://www.ats.ucla.edu/stat/spss/modules/ (English)
- http://www.onderzoekenspss.nl/index.html/ (Dutch)
SPSS has an extensive Help function.
Demo on YouTube about types of data: http://www.youtube.com/watch?v=hzxnzfnt5v8&nr=1&feature=endscreen