Basic Medical Statistics Course S0 SPSS Intro December 2014 Wilma Heemsbergen w.heemsbergen@nki.nl
This Afternoon: 13.00 - 15.00 SPSS lecture, short break, exercise
Database Example
Types of data
Type                     Example
Continuous               Age
Categorical - binary     Treatment arm
Categorical - ordinal    T stage
Categorical - nominal    Hospital
Text                     Remarks
Date                     Date of birth
SPSS
- SPSS can import/export other formats (e.g. Excel).
- You can open multiple databases simultaneously.
- You can copy output to other programs.
SPSS windows: Data View
SPSS View: Variable View
Windows in SPSS: open windows are listed under the Window menu.
Windows in SPSS: to open a new window (data, syntax, output), go to the menu: File - New - Data / Syntax / Output / Script.
Output Window
Syntax Window: the syntax window is a script / command window.
Menu: File
Import Data in SPSS: other file types can be imported (*.dbf, *.xls, *.txt, ...). Using the Paste button, the corresponding syntax is pasted into the syntax window (ready to run).
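As an illustration, pasted import syntax for an Excel file typically looks like the sketch below. The file name and sheet name are hypothetical; the exact subcommands that SPSS pastes can differ per version.

```spss
* Assumption: trial_rt.xls and the sheet name 'Sheet1' are hypothetical.
GET DATA
  /TYPE=XLS
  /FILE='U:\data_statcursus\trial_rt.xls'
  /SHEET=name 'Sheet1'
  /CELLRANGE=full
  /READNAMES=on.
EXECUTE.
```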
Get Data
Menu: File - Open - Data
Use the Paste button to get the syntax in the syntax window. It is also possible to start by opening a syntax file, which reads / opens the data (without using the menu). To run: select the commands and hit the Run button.
GET FILE='U:\data_statcursus\trial_rt.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
Data File Information
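The data file information can also be requested from syntax. A minimal sketch, assuming the data file is already open as the active dataset:

```spss
* Lists names, labels, formats, value labels and missing values of all variables.
DISPLAY DICTIONARY.
```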
Save Subset: you can save a subset of the variables with the Variables option in the Save As dialog. Menu: File - Save As
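In syntax, a subset of variables can be saved with the KEEP subcommand. This is a sketch only: the output file name and the variable patnr are hypothetical.

```spss
* Assumption: the output file name and patnr are hypothetical.
* Only the variables listed after KEEP are written to the new file.
SAVE OUTFILE='U:\data_statcursus\trial_rt_subset.sav'
  /KEEP=patnr age arm.
```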
Menu: Data
Sort Cases. Menu: Data - Sort Cases
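A minimal syntax sketch for sorting, using the variables arm and age from the example data:

```spss
* (A) = ascending, (D) = descending.
SORT CASES BY arm (A) age (D).
```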
Split File / Select Cases. Menu: Data - Split File; Data - Select Cases
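Both options can be sketched in syntax as follows. The selection criterion (age 70 or older) and the filter variable name sel70 are assumptions chosen for illustration.

```spss
* Split File: repeat each analysis per treatment arm.
SORT CASES BY arm.
SPLIT FILE LAYERED BY arm.
DESCRIPTIVES VARIABLES=age.
SPLIT FILE OFF.

* Select Cases: analyze only patients aged 70 or older (hypothetical criterion).
COMPUTE sel70 = (age >= 70).
FILTER BY sel70.
DESCRIPTIVES VARIABLES=age.
FILTER OFF.
```

Filtering keeps the unselected cases in the file; they are only left out of the analyses until FILTER OFF is run.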
Merge Data. Menu: Data - Merge Files - Add Cases / Add Variables
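A sketch of both merge types in syntax. The two file names and the key variable patnr are hypothetical.

```spss
* Assumption: both file names and the key variable patnr are hypothetical.
* Add Cases: append the patients of a second file with the same variables.
ADD FILES /FILE=* /FILE='U:\data_statcursus\trial_rt_part2.sav'.
EXECUTE.

* Add Variables: match on the patient identifier.
* Both files must be sorted on the key variable before matching.
SORT CASES BY patnr.
MATCH FILES /FILE=* /FILE='U:\data_statcursus\trial_rt_lab.sav'
  /BY patnr.
EXECUTE.
```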
Menu: Transform
Compute
Menu: Transform - Compute
DATASET ACTIVATE DataSet1.
COMPUTE duur_rt=tend - tstart.
EXECUTE.
Recode
Menu: Transform - Recode
RECODE age (45 thru 69.99=0) (70 thru 90=1) INTO age70.
EXECUTE.
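Note that the ranges above leave ages between 69.99 and 70 (and ages outside 45 to 90) uncoded, so age70 is system-missing for them. A possible alternative is sketched below, assuming every age of 70 or above belongs to group 1; the value labels are illustrative.

```spss
* RECODE uses the first specification that matches, so 70 itself is coded 1.
* Ages outside 45-90 are no longer left missing, so check the range of age first.
RECODE age (70 thru HIGHEST=1) (LOWEST thru 70=0) INTO age70.
VALUE LABELS age70 0 '<70' 1 '>=70'.
EXECUTE.
```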
Menu: Analyze
Case Summaries. Menu: Analyze - Reports - Case Summaries. Uses: overview, error checking, summary.
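A sketch of the corresponding syntax, listing the first 100 cases. The identifier patnr is a hypothetical variable name.

```spss
* Assumption: patnr is a hypothetical patient identifier.
SUMMARIZE
  /TABLES=patnr age arm
  /FORMAT=VALIDLIST NOCASENUM TOTAL LIMIT=100
  /TITLE='Case Summaries'
  /MISSING=VARIABLE
  /CELLS=COUNT.
```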
Describing continuous data
- Mean and standard deviation
- Median
- Range, min, max, percentiles
- Stem-and-leaf
- Box plot
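Most of these statistics can be requested in one command. A minimal sketch for the variable age; the choice of the 25th and 75th percentiles is illustrative.

```spss
* NOTABLE suppresses the frequency table, which is rarely useful for continuous data.
FREQUENCIES VARIABLES=age
  /FORMAT=NOTABLE
  /STATISTICS=MEAN STDDEV MEDIAN RANGE MINIMUM MAXIMUM
  /PERCENTILES=25 75.
```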
Reports, Describing
Descriptives
Menu: Analyze - Descriptive Statistics - Descriptives
DESCRIPTIVES VARIABLES=age
  /STATISTICS=MEAN STDDEV MIN MAX.
Explore. Menu: Analyze - Descriptive Statistics - Explore
Explore: factor (by group)
Menu: Analyze - Descriptive Statistics - Explore
EXAMINE VARIABLES=age BY arm
  /PLOT BOXPLOT STEMLEAF
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
(= default, you can change it)
Stem-and-Leaf
A Stem-and-Leaf diagram is a special type of histogram. First, the stem and leaf must be defined.
Example data: 23, 26, 26, 27, 28, 30, 31, 45, 45, 45
With a stem unit of 10 and a leaf unit of 1, a Stem-and-Leaf plot of these data typically looks like this:
2 | 3 6 6 7 8    (stem = 2, leaves are 3 6 6 7 8)
3 | 0 1
4 | 5 5 5
SPSS: a Stem-and-Leaf plot is generated when the Explore option is used (Descriptive Statistics).
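To try this in SPSS, the example data can be typed directly into a syntax window. A minimal sketch; the variable name score is hypothetical, and SPSS chooses its own stem width, so the layout may differ from the plot above.

```spss
* Assumption: score is a hypothetical variable name for the example data.
DATA LIST FREE / score.
BEGIN DATA
23 26 26 27 28 30 31 45 45 45
END DATA.

EXAMINE VARIABLES=score
  /PLOT STEMLEAF
  /STATISTICS NONE.
```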
Box Plot
Visualizes:
- distribution (normal? skewed?)
- full range of variation
- outliers
SPSS: a Box plot is generated when the Explore option is used (Descriptive Statistics).
Describing categorical/ordinal data
Data can be described in absolute values (numbers) and/or in relative values (%).
- Frequency tables
- Crosstabs (at least 2 variables)
Frequency Tables. Menu: Analyze - Descriptive Statistics - Frequencies
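A minimal syntax sketch for a frequency table of the treatment arm:

```spss
* Shows counts and percentages per category of arm.
FREQUENCIES VARIABLES=arm.
```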
Crosstabs. Menu: Analyze - Descriptive Statistics - Crosstabs
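A minimal syntax sketch for a cross table of treatment arm by age group, assuming the variable age70 has been created with Recode:

```spss
* COUNT gives absolute numbers, ROW gives percentages within each arm.
CROSSTABS
  /TABLES=arm BY age70
  /CELLS=COUNT ROW.
```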
SPSS Help
There are helpful SPSS manuals / guides available on the internet:
http://www.sussex.ac.uk/its/pdfs/spss_brief_guide_20.pdf (English)
http://www.ats.ucla.edu/stat/spss/modules/ (English)
http://www.onderzoekenspss.nl/index.html/ (Dutch)
SPSS has an extensive Help function.
Demo on YouTube about types of data: http://www.youtube.com/watch?v=hzxnzfnt5v8&nr=1&feature=endscreen
Addendum data / databases
Types of data: special cases
- Identifiers. A unique code / number to identify an individual patient. Key variable (for merging data, patient file research, etc.).
- Censored data. Most common is right-censored: the event (e.g. death) had not yet occurred at last follow-up, so we only know that it occurs later. Interval-censored: the event occurred in a certain time interval, but we do not know exactly when.
- Derived data. E.g. age at start of treatment, derived from birth date and treatment date.
- Imputed data. A way of handling missing data. E.g. estimation of start of treatment, based on blood values.
- Missing data. Missing data are often coded with a special value (e.g. 99 = missing). Beware of these values when you start analyzing data.
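A coded missing value only stays out of the analysis if SPSS knows about it. A minimal sketch, assuming a hypothetical variable tstage in which 99 was entered for unknown:

```spss
* Assumption: tstage is a hypothetical variable with 99 = unknown.
* After this declaration, 99 is reported as missing and excluded from statistics.
MISSING VALUES tstage (99).
FREQUENCIES VARIABLES=tstage.
```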
Date and Time Variables
To calculate the time between two dates, you can subtract one date from the other. E.g.: (date start therapy) - (birth date) = (age at start therapy).
Beware of the unit of the calculated age. In SPSS, it will be calculated in seconds (using the Compute option).
Age at start (in days) = ( (date start) - (birth date) ) / (60*60*24)
Age at start (in years) = ( (date start) - (birth date) ) / (60*60*24*365.25)
SPSS also contains a Date and Time Wizard, in which you can indicate the desired unit for calculations.
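The conversions above can be sketched in syntax as follows. The variable names date_start and birth_date are hypothetical; DATEDIFF is a built-in SPSS function that returns whole units only.

```spss
* Assumption: date_start and birth_date are hypothetical date variables.
* Subtracting two dates gives seconds, so divide to get days or years.
COMPUTE age_days = (date_start - birth_date) / (60*60*24).
COMPUTE age_years = (date_start - birth_date) / (60*60*24*365.25).
* DATEDIFF drops the fractional part, giving completed years.
COMPUTE age_years_whole = DATEDIFF(date_start, birth_date, "years").
EXECUTE.
```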
Code / Labels
Category                                 Code
Two or more categories (not ordinal)
  Two: male, female                      1, 2 or 0, 1
  More: Hospital A, B, C, D              Whatever is convenient, e.g. 1, 2, 3, 4 or 11, 17, 22, 33
Categories, ordinal
  Age: <40, 40-60, >60                   1, 2, 3
Risk factor: present, not present        1, 0
Prior surgery: yes, no                   1, 0
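Codes become readable in the output once labels are attached. A minimal sketch; the variable names sex, hospital and agegroup are hypothetical.

```spss
* Assumption: sex, hospital and agegroup are hypothetical variable names.
VALUE LABELS sex 1 'male' 2 'female'.
VALUE LABELS hospital 1 'Hospital A' 2 'Hospital B' 3 'Hospital C' 4 'Hospital D'.
VALUE LABELS agegroup 1 '<40' 2 '40-60' 3 '>60'.
* Record the measurement level so SPSS treats the codes appropriately.
VARIABLE LEVEL sex hospital (NOMINAL) /agegroup (ORDINAL).
```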
Building a Database
- Keep a short paper file per patient (study forms).
- Enter original data preferably in a database environment (not Excel).
- Construct a code book (next slide).
- Keep your original data well-organized.
- Save + back up original data, apart from derived data.
- Include in your data file name: date, version, reference to study.
- Use a text field to comment (and update) for every patient (e.g. emigrated, lost to follow-up, no visit at 2 years follow-up).
- Check and double-check the data.
Code Book
Define each variable in a code book before data entry:
- name of the variable,
- type (e.g. numerical, text, date), length, decimals,
- label / extended variable name (e.g. date of diagnosis in referring hospital),
- values (e.g. 1=male, 2=female),
- missing values: list of defined missing values (e.g. 99=unknown).
The code book can also be used to construct an electronic data form for data entry (to minimize errors).
Variable names should be reasonably short and well-organized, also to avoid problems when the data are exported to other programs.
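Part of the code book can be stored in the SPSS file itself. A minimal sketch for the extended name and the type / length / decimals; the variable name datdiag is hypothetical.

```spss
* Assumption: datdiag is a hypothetical date variable.
VARIABLE LABELS datdiag 'date of diagnosis in referring hospital'.
* DATE11 displays dd-mmm-yyyy; F3.0 displays 3 digits without decimals.
FORMATS datdiag (DATE11) age (F3.0).
```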
Electronic Data Form: example of a simple data entry form in Access
Error Checking
- Range/outliers: are outliers true values, or errors?
- Missings: are missing values really missing?
- Dates: are dates within the expected range?
- Queries (logical rules): e.g. stop date must be between x and y weeks after start date.
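A logical rule like the last one can be sketched in syntax as follows. The bounds of 5 and 8 weeks stand in for x and y and are purely hypothetical, as is the identifier patnr; tstart and tend are the date variables used in the Compute example.

```spss
* Assumption: 5 and 8 weeks are hypothetical bounds; patnr is a hypothetical identifier.
COMPUTE rt_weeks = (tend - tstart) / (60*60*24*7).
COMPUTE date_error = (rt_weeks < 5 OR rt_weeks > 8).
EXECUTE.

* List only the flagged patients; TEMPORARY limits the selection to the next procedure.
TEMPORARY.
SELECT IF (date_error = 1).
LIST VARIABLES=patnr tstart tend rt_weeks.
```

Patients with a missing start or stop date are not flagged by this rule and need a separate check.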