PPMI Data Access
Chelsea Caspell-Garcia, Eric Foster
The University of Iowa
PPMI Annual Meeting, May 12-13, 2015
Outline
- Navigating the LONI website
  - Requesting data access
  - Downloading the data
- Structure of the data
  - Grouping of tables
  - Documents to answer common data questions
- PPMI Populations
  - Finding enrolled subjects
- Example: Finding the mean change from baseline to 1 year by gender
Navigating the LONI Website: How to request data access
- Go to www.loni.usc.edu and click on PPMI, or go directly to www.ppmi-info.org
- Click on Download Data on the right sidebar
- Click on Apply for Data Access and fill out the request form
- After your request is approved, you will receive an email with a link to create your password and finish setting up your account on LONI
Requesting Data Access
Navigating the LONI Website: How to download the data
- Log in to LONI and click on the Download tab at the top
- To see all available data files and other files, click on ALL at the bottom of the left sidebar
- Check the boxes next to the specific files you want to download, or check the two Select ALL boxes at the top to download everything
- Click on the Download>> button at the top right
- Multiple files will be downloaded in a .zip file
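Once the .zip file has been downloaded, it can help to check which data tables it contains before extracting anything. The following is a minimal sketch using only the Python standard library. The file names are hypothetical stand-ins (the real archive will use whatever names LONI assigns), and a small in-memory archive is built here so the example runs on its own.

```python
import io
import zipfile


def list_csv_tables(zip_source):
    """Return the sorted names of the .csv members in a downloaded archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".csv")
        )


# Build a tiny in-memory archive as a stand-in for the real download.
# Both member names and their contents are made up for illustration.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("SCREEN.csv", "PATNO,APPRDX\n1001,1\n")
    archive.writestr("Data_Dictionary.pdf", "placeholder")
demo.seek(0)

csv_tables = list_csv_tables(demo)
print(csv_tables)
```

With a real download, the path to the .zip file can be passed to `list_csv_tables` in place of the in-memory buffer.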
Downloading the PPMI Data
Data Structure
- Currently there are 72 data tables (.csv files) comprising both clinical and lab data
- Clinical Data Tables
  - Each data table has a file name and a Page Name (variable: PAG_NAME)
    - Screening/Demographics = SCREEN
    - Adverse Event Log = AE
  - Data Dictionary refers to tables by their Page Name
Data Structure: Lab Data Tables
- Includes DaTSCAN SBR results, CSF biospecimen results, blood chemistry and hematology, etc.
- Lab data are often structured differently than clinical data
  - May not have a Page Name in the data file
  - Subjects may have multiple observations for each visit because multiple types of tests are done
  - Variables may have different names and/or may be coded differently than those in the clinical data tables
    - Gender (values of 0, 1, 2 in clinical data; values of Female, Male in certain lab data tables)
    - Diagnosis (variable = APPRDX in clinical data; variable = DIAGNOSIS in lab data)
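Because gender is coded differently in clinical and lab tables, it is usually easiest to map both codings to one set of labels before combining tables. The sketch below is one possible approach, not an official PPMI routine. It uses the mapping given in this deck's worked example (0 or 1 is female, 2 is male), and the function name `harmonize_gender` is made up for illustration.

```python
def harmonize_gender(value):
    """Map a clinical code (0, 1, 2) or a lab label (Female, Male) to one label."""
    text = str(value).strip().lower()
    if text in {"0", "1", "female"}:
        return "Female"
    if text in {"2", "male"}:
        return "Male"
    return None  # missing or unexpected value


# Clinical tables use numeric codes; certain lab tables use text labels.
clinical_codes = [0, 1, 2, "2"]
lab_labels = ["Female", "Male", " male ", ""]

harmonized_clinical = [harmonize_gender(v) for v in clinical_codes]
harmonized_lab = [harmonize_gender(v) for v in lab_labels]
print(harmonized_clinical)
print(harmonized_lab)
```

Returning `None` for anything unrecognized makes unexpected codings visible instead of silently assigning them to a group.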
Data Structure: Useful files to figure out the variables
- Data Dictionary: lists all variables in each data table with a brief description
- CRFs (do not list variable names)
- Code List: lists every categorical variable with each possible response (CODE) and what each category means (DECODE)
  - Example: CODE = 1; DECODE = Yes
- PPMI Derived Variable Definitions and Score Calculations
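The Code List can be turned into a lookup so that coded values are translated to their meanings programmatically. This is a rough sketch under assumptions: CODE and DECODE come from the slide, but the column that names the variable is called `VARIABLE` here purely as a placeholder, and the actual Code List file may label it differently. The sample rows reuse codes described elsewhere in this deck.

```python
import csv
import io

# Stand-in for the Code List file. "VARIABLE" is a hypothetical column name.
CODE_LIST_CSV = """VARIABLE,CODE,DECODE
GENDER,2,Male
APPRDX,1,Parkinson's Disease
APPRDX,2,Healthy Control
"""


def build_decoder(code_list_text):
    """Return a dict mapping (variable, code) pairs to their DECODE text."""
    decoder = {}
    for row in csv.DictReader(io.StringIO(code_list_text)):
        decoder[(row["VARIABLE"], row["CODE"])] = row["DECODE"]
    return decoder


decoder = build_decoder(CODE_LIST_CSV)
label = decoder[("APPRDX", "2")]
print(label)
```

Codes are kept as strings because that is how they are read from a .csv file. Convert them only if the data tables store them as numbers.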
PPMI Populations: Finding enrolled subjects
- Need two tables: SCREEN (Screening/Demographics) and RANDOM (Randomization table)
- Enrolled subjects must appear in both tables
- Enrolled subjects must also have a non-missing enrollment date (ENROLLDT from the RANDOM table)
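The enrollment rule above amounts to an inner join of SCREEN and RANDOM followed by a filter on ENROLLDT. The sketch below shows one way to do this with the Python standard library. The subject identifier column is assumed to be named `PATNO`, which is not stated on the slide, and all rows and date values are made up.

```python
import csv
import io

# Tiny stand-ins for the SCREEN and RANDOM tables (all values are made up).
# PATNO as the subject identifier column is an assumption.
SCREEN_CSV = """PATNO,APPRDX,GENDER
1001,1,2
1002,1,0
1003,2,1
1004,1,2
"""
RANDOM_CSV = """PATNO,ENROLLDT
1001,03/2011
1002,
1003,05/2011
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def enrolled_subjects(screen_rows, random_rows):
    """Keep SCREEN rows whose subject is in RANDOM with a non-missing ENROLLDT."""
    enroll_dates = {
        row["PATNO"]: row["ENROLLDT"].strip()
        for row in random_rows
        if row["ENROLLDT"] and row["ENROLLDT"].strip()
    }
    return [
        {**row, "ENROLLDT": enroll_dates[row["PATNO"]]}
        for row in screen_rows
        if row["PATNO"] in enroll_dates
    ]


enrolled = enrolled_subjects(read_rows(SCREEN_CSV), read_rows(RANDOM_CSV))
enrolled_ids = [row["PATNO"] for row in enrolled]
print(enrolled_ids)
```

In this toy data, subject 1002 is dropped for a missing ENROLLDT and subject 1004 is dropped for not appearing in RANDOM.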
PPMI Populations: APPRDX variable (from the SCREEN table)

APPRDX Value   Subject Cohort
1              Parkinson's Disease
2              Healthy Control
3              SWEDD
4              Prodromal
5              Genetic Cohort PD
6              Genetic Cohort Unaffected
7              Genetic Registry PD
8              Genetic Registry Unaffected
Example: Find Mean One-Year Change in α-synuclein for Enrolled PDs by Gender
- Start with the enrolled subject data set
  - Subjects must be in both the SCREEN and RANDOM tables
  - Subjects must have a non-missing ENROLLDT
- The PD population will have a diagnosis value of APPRDX = 1
- Two ways to identify gender:
  - From the SCREEN table, create a new variable for gender:
    - Gender = 0 or 1 is female
    - Gender = 2 is male
  - From the BIOSPECIMEN_ANALYSIS_RESULTS table, use the gender variable (specific to the BIOSPECIMEN_ANALYSIS_RESULTS table; not available in all lab data tables!)
Example Continued
- Find α-synuclein measures from data table BIOSPECIMEN_ANALYSIS_RESULTS
  - TESTNAME = CSF Alphasynuclein
  - The variable TESTVALUE is the α-synuclein measure
- Merge together the enrolled PD subjects and the α-synuclein measures
Example Continued
- We will need both the baseline and 1-year events
  - CLINICAL_EVENT = Baseline Collection
  - CLINICAL_EVENT = Visit 04
- Take the difference between the variable TESTVALUE from the two sets of records
- Calculate the mean of the differences while stratifying by the new gender variable
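The steps of this example can be sketched end to end as follows. This is an illustration under stated assumptions rather than the presenters' own code. The subject identifier `PATNO` is assumed, all values are made up (including the "Some Other Test" rows), and each subject is assumed to have at most one numeric TESTVALUE per event, which may not hold in the real table.

```python
from collections import defaultdict
from statistics import mean

BASELINE = "Baseline Collection"
ONE_YEAR = "Visit 04"
TEST = "CSF Alphasynuclein"

# Enrolled PD subjects (APPRDX = 1) mapped to their SCREEN gender code.
enrolled_pd = {"1001": 2, "1002": 0, "1003": 1, "1004": 2}

# Made-up rows shaped like BIOSPECIMEN_ANALYSIS_RESULTS:
# (PATNO, TESTNAME, CLINICAL_EVENT, TESTVALUE)
biospecimen_rows = [
    ("1001", TEST, BASELINE, "1800"),
    ("1001", TEST, ONE_YEAR, "1700"),
    ("1001", "Some Other Test", BASELINE, "50"),  # different test, ignored
    ("1002", TEST, BASELINE, "2000"),
    ("1002", TEST, ONE_YEAR, "2100"),
    ("1003", TEST, BASELINE, "1500"),
    ("1003", TEST, ONE_YEAR, "1450"),
    ("1004", TEST, BASELINE, "1600"),  # no 1-year value, so no change score
    ("2001", TEST, BASELINE, "1900"),  # not an enrolled PD subject, ignored
    ("2001", TEST, ONE_YEAR, "1950"),
]


def mean_change_by_gender(enrolled, rows, testname=TEST):
    """Mean (1-year minus baseline) TESTVALUE, stratified by gender."""
    values = defaultdict(dict)  # PATNO -> {event: value}
    for patno, test, event, value in rows:
        if test != testname or patno not in enrolled:
            continue
        if event in (BASELINE, ONE_YEAR):
            values[patno][event] = float(value)

    changes = defaultdict(list)
    for patno, events in values.items():
        if BASELINE in events and ONE_YEAR in events:
            # SCREEN coding: 0 or 1 is female, 2 is male.
            gender = "Male" if enrolled[patno] == 2 else "Female"
            changes[gender].append(events[ONE_YEAR] - events[BASELINE])
    return {gender: mean(diffs) for gender, diffs in changes.items()}


result = mean_change_by_gender(enrolled_pd, biospecimen_rows)
print(result)
```

Only subjects with both a baseline and a Visit 04 value contribute to the means, so the group sizes can be smaller than the enrolled PD population.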
Example: Special Considerations
- ST Visits
  - These are visits where symptomatic therapy was begun
  - These did not necessarily align with planned study visits
  - They did, however, take the place of some study visits
- The SIGNATURE_FORM (PAG_NAME = SIG) contains all of the planned study visits for each subject
  - Any study visits that were replaced by ST visits will have a note in the comments on this table
- DATSCAN SBR Data: no ST visits (all have been accounted for within the DATSCAN SBR data table)