Usage
Getting the data
First things first: you need to get the data! Once you have access to the PPMI database, log in to the database and follow these instructions:
1. Select "Download" from the navigation bar at the top.
2. Select "Study Data" from the options that appear in the navigation bar.
3. Select "ALL" at the bottom of the left-hand navigation bar on the new page.
4. Click "Select ALL tabular data (csv) format" and then press "Download>>" in the top right-hand corner of the page.
5. Unzip the downloaded directory and save it somewhere on your computer.
Alternatively, you can use the pypmi module to download the data programmatically:
>>> import pypmi
>>> files = pypmi.fetch_studydata('all', user='username', password='password')
Fetching authentication key for data download...
Requesting 113 datasets for download...
Downloading PPMI data: 17.3MB [00:33, 519kB/s]
By default, the data will be downloaded to your current directory, making it easy to load them in the future, but you can optionally provide a path argument to pypmi.fetch_studydata() to specify where you would like the data to go. (Alternatively, you can set the environment variable $PPMI_PATH to specify where they should be downloaded to; this takes precedence over the current directory.)
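The lookup order described above can be sketched in a few lines of plain Python. This is only an illustration of the precedence, not pypmi's actual implementation: the helper name resolve_download_dir is hypothetical, and the rule that an explicit path wins over $PPMI_PATH is an assumption (the text above only states that $PPMI_PATH wins over the current directory).

```python
import os
from pathlib import Path


def resolve_download_dir(path=None):
    """Hypothetical helper (not part of pypmi): pick a download directory."""
    if path is not None:
        # Assumption: an explicitly supplied path wins over everything else
        return Path(path).expanduser().resolve()
    env_path = os.environ.get('PPMI_PATH')
    if env_path:
        # $PPMI_PATH takes precedence over the current directory
        return Path(env_path).expanduser().resolve()
    # Fall back to the current working directory
    return Path.cwd()


print(resolve_download_dir())
```

Setting $PPMI_PATH once (for example in your shell profile) means you should not need to repeat the location every time you fetch or load the data.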
Loading and working with the data
Once you have the data downloaded, you can use pypmi's loading functions to read various portions of it into tidy data frames.
For example, we can generate a number of clinical-behavioral measures:
>>> behavior = pypmi.load_behavior()
>>> behavior.columns
Index(['participant', 'visit', 'date', 'benton', 'epworth', 'gds',
       'hvlt_recall', 'hvlt_recognition', 'hvlt_retention', 'lns', 'moca',
       'pigd', 'quip', 'rbd', 'scopa_aut', 'se_adl', 'semantic_fluency',
       'stai_state', 'stai_trait', 'symbol_digit', 'systolic_bp_drop',
       'tremor', 'updrs_i', 'updrs_ii', 'updrs_iii', 'updrs_iii_a', 'updrs_iv',
       'upsit'],
      dtype='object')
The call to pypmi.load_behavior() may take a few seconds to run; there’s a lot of data to import and wrangle!
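To give a feel for what a tidy frame like this lets you do, here is a minimal sketch using a tiny synthetic stand-in for the output of pypmi.load_behavior(). The values are made up, the layout of one row per participant per visit is an assumption, and the visit code "V01" is hypothetical (only "BL" appears in this guide).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for pypmi.load_behavior(); all values are made up
behavior = pd.DataFrame({
    'participant': [1, 1, 2, 2, 3],
    'visit': ['BL', 'V01', 'BL', 'V01', 'BL'],  # 'V01' is a hypothetical code
    'moca': [28.0, 27.0, np.nan, 25.0, 30.0],
    'updrs_iii': [20.0, 22.0, 15.0, np.nan, np.nan],
})

# Count the non-missing scores for each measure at each visit
available = behavior.groupby('visit')[['moca', 'updrs_iii']].count()
print(available)

# Keep only the baseline rows
baseline = behavior[behavior['visit'] == 'BL']
print(len(baseline))
```

Because every measure lives in its own column, a check like this is a quick way to see how much data is missing before running an analysis.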
If we want to query the data with regard to, say, subject diagnosis, it might be useful to load in some demographic information:
>>> demographics = pypmi.load_demographics()
>>> demographics.columns
Index(['participant', 'diagnosis', 'date_birth', 'date_diagnosis',
       'date_enroll', 'status', 'family_history', 'age', 'gender', 'race',
       'site', 'handedness', 'education'],
      dtype='object')
Now we can perform some interesting queries! As an example, let’s just ask how many individuals with Parkinson’s disease have a baseline UPDRS III score. We’ll have to use information from both data frames to answer the question:
>>> import pandas as pd
>>> updrs = (behavior.query('visit == "BL" & ~updrs_iii.isna()')
...                  .get(['participant', 'updrs_iii']))
>>> parkinsons = demographics.query('diagnosis == "pd"').get('participant')
>>> len(pd.merge(parkinsons, updrs, on='participant'))
423
And the same for healthy individuals:
>>> healthy = demographics.query('diagnosis == "hc"').get('participant')
>>> len(pd.merge(healthy, updrs, on='participant'))
195
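Both counts can also be obtained in one pass by merging first and grouping by diagnosis afterwards. The sketch below uses small synthetic frames in place of the real pypmi output (the values are made up and will not reproduce the numbers above); it assumes only the column names shown earlier.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for pypmi.load_behavior() and pypmi.load_demographics()
behavior = pd.DataFrame({
    'participant': [1, 1, 2, 3, 4],
    'visit': ['BL', 'V01', 'BL', 'BL', 'BL'],  # 'V01' is a hypothetical code
    'updrs_iii': [20.0, 22.0, np.nan, 3.0, 18.0],
})
demographics = pd.DataFrame({
    'participant': [1, 2, 3, 4],
    'diagnosis': ['pd', 'pd', 'hc', 'pd'],
})

# Baseline rows that actually have a UPDRS III score
baseline = behavior[(behavior['visit'] == 'BL') & behavior['updrs_iii'].notna()]

# Attach the diagnosis to each baseline score (inner join on participant)
merged = pd.merge(demographics[['participant', 'diagnosis']],
                  baseline[['participant', 'updrs_iii']],
                  on='participant')

# Number of distinct participants per diagnosis
counts = merged.groupby('diagnosis')['participant'].nunique()
print(counts)
```

Grouping after the merge avoids building a separate participant list for every diagnosis, which is convenient when there are more than two groups.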
There’s a lot of power gained in leveraging the pandas DataFrame objects, so take a look at the pandas documentation to see what more you can do!