Autism classification (resting-state fMRI)

This example uses data from 20 participants of the Autism Brain Imaging Data Exchange (ABIDE) preprocessed connectomes dataset. The goal is to predict an autism diagnosis from static functional connectivity estimates. Please note that the results may overestimate the true effect because of statistical issues when performing prediction with tens of thousands of features and relatively few observations.

The multiverse is is similar to the classification multiverse perfomed by Dafflon et al. 2022 with the main difference being that we here implement a slightly reduced decision space for the connectivity and pracellation measures, but include an additional decision point for the regularisation strength of the classifier to also cover the statistical model.

Data Download

The data is accessed through nilearn. For quicker testing purposes, a subset of 20 subjects was provided by adding SUB_ID=SUB_IDS to the datasets.fetch_abide_pcp() arguments in the data download as well as in the multiverse analysis code cell. If you wish to run the analysis for the whole dataset to reproduce the included figures, please remove this argument from both functions.

Please note that downloading the full data will take a few hours and requires ~20 GB of memory. If the download crashes or the code cell was aborted, you can simply re-run the code cell and it will continue with only the missing data.

[ ]:

from tqdm import tqdm
from itertools import product
from nilearn import datasets

pipelines = ["cpac", "ccs", "dparsf", "niak"]
band_pass = [True, False]
global_signal = [True, False]
parcellations = ["rois_aal", "rois_cc200", "rois_dosenbach160"]

# Subset of subjects to download
SUB_IDS = [50012, 50014, 50015, 50016, 50020, 50022, 50023, 50024, 50025, 50027, # controls
           50030, 50031, 50032, 50033, 50034, 50035, 50036, 50037, 50038, 50040] # autism

def fetch_data(pipe, bp, gsr, parc):
    bunch = datasets.fetch_abide_pcp(SUB_ID=SUB_IDS, data_dir="./abide_data", verbose=0,
                                     pipeline=pipe, derivatives=parc, band_pass_filtering=bp, global_signal_regression=gsr)
    return (pipe, bp, gsr, parc), bunch

all_combinations = list(product(pipelines, band_pass, global_signal, parcellations))
abide_dataset = {}

for combo in tqdm(all_combinations, desc="Fetching ABIDE data"):
    key, bunch = fetch_data(*combo)
    abide_dataset[key] = bunch

print(f"Available pipelines: {list(abide_dataset.keys())}")
print(f"Number of subjects:  {len(abide_dataset[('cpac', True, True, 'rois_aal')].phenotypic)}")
print(f"Class distribution:  {abide_dataset[('cpac', True, True, 'rois_aal')].phenotypic['DX_GROUP'].value_counts()}")

Fetching ABIDE data: 100%|██████████| 48/48 [00:28<00:00,  1.70it/s]

Available pipelines: [('cpac', True, True, 'rois_aal'), ('cpac', True, True, 'rois_cc200'), ('cpac', True, True, 'rois_dosenbach160'), ('cpac', True, False, 'rois_aal'), ('cpac', True, False, 'rois_cc200'), ('cpac', True, False, 'rois_dosenbach160'), ('cpac', False, True, 'rois_aal'), ('cpac', False, True, 'rois_cc200'), ('cpac', False, True, 'rois_dosenbach160'), ('cpac', False, False, 'rois_aal'), ('cpac', False, False, 'rois_cc200'), ('cpac', False, False, 'rois_dosenbach160'), ('ccs', True, True, 'rois_aal'), ('ccs', True, True, 'rois_cc200'), ('ccs', True, True, 'rois_dosenbach160'), ('ccs', True, False, 'rois_aal'), ('ccs', True, False, 'rois_cc200'), ('ccs', True, False, 'rois_dosenbach160'), ('ccs', False, True, 'rois_aal'), ('ccs', False, True, 'rois_cc200'), ('ccs', False, True, 'rois_dosenbach160'), ('ccs', False, False, 'rois_aal'), ('ccs', False, False, 'rois_cc200'), ('ccs', False, False, 'rois_dosenbach160'), ('dparsf', True, True, 'rois_aal'), ('dparsf', True, True, 'rois_cc200'), ('dparsf', True, True, 'rois_dosenbach160'), ('dparsf', True, False, 'rois_aal'), ('dparsf', True, False, 'rois_cc200'), ('dparsf', True, False, 'rois_dosenbach160'), ('dparsf', False, True, 'rois_aal'), ('dparsf', False, True, 'rois_cc200'), ('dparsf', False, True, 'rois_dosenbach160'), ('dparsf', False, False, 'rois_aal'), ('dparsf', False, False, 'rois_cc200'), ('dparsf', False, False, 'rois_dosenbach160'), ('niak', True, True, 'rois_aal'), ('niak', True, True, 'rois_cc200'), ('niak', True, True, 'rois_dosenbach160'), ('niak', True, False, 'rois_aal'), ('niak', True, False, 'rois_cc200'), ('niak', True, False, 'rois_dosenbach160'), ('niak', False, True, 'rois_aal'), ('niak', False, True, 'rois_cc200'), ('niak', False, True, 'rois_dosenbach160'), ('niak', False, False, 'rois_aal'), ('niak', False, False, 'rois_cc200'), ('niak', False, False, 'rois_dosenbach160')]
Number of subjects:  20
Class distribution:  DX_GROUP
1    10
2    10
Name: count, dtype: int64

Multiverse Analysis

Available decision points for the preprocessed fMRI time series data are the following:

Preprocessing pipeline ('cpac', 'ccs', 'dparsf', 'niak')
Parcellation atlas ('rois_aal', 'rois_cc200', 'rois_dosenbach160')
Band pass filtering (True or False)
Global signal regression (True or False) -> If false, standard motion regression was performed

For the connectivity measure, the two methods from the comet toolbox are included:

Pearson correlation (comet.connectivity.Static_Pearson)
Partial correlation (comet.connectivity.Static_Partial)

And for the statistical model we include the regularisation strength (C=0.25, C=1.0)

[ ]:

from comet import multiverse

forking_paths = {
    "pipeline": ["cpac", "ccs", "dparsf", "niak"],                          # Preprocessing pipelines
    "parcellation": ["rois_aal", "rois_cc200", "rois_dosenbach160"],        # Parcellated time series data
    "band_pass": [True, False],                                             # Band-pass filtering
    "global_signal": [True, False],                                         # Global signal regression
    "connectivity":[                                                        # Functional connectivity method
        {"name": "pearson", "func": "comet.connectivity.Static_Pearson(ts).estimate()"},
        {"name": "partial", "func": "comet.connectivity.Static_Partial(ts).estimate()"}],
    "regularisation": [0.25, 1.0]                                           # Regularisation strength for the classifier
}

def analysis_template():
    import comet
    import numpy as np
    from nilearn import datasets
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Subset of subjects do use
    SUB_IDS = [50012, 50014, 50015, 50016, 50020, 50022, 50023, 50024, 50025, 50027, # controls
               50030, 50031, 50032, 50033, 50034, 50035, 50036, 50037, 50038, 50040] # autism

    # Get data (if available, it will be loaded from disk)
    data = datasets.fetch_abide_pcp(SUB_ID=SUB_IDS, data_dir="./abide_data", verbose=0,
                                    pipeline={{pipeline}},
                                    derivatives={{parcellation}},
                                    band_pass_filtering={{band_pass}},
                                    global_signal_regression={{global_signal}})

    time_series = data[{{parcellation}}]
    diagnosis = data["phenotypic"]["DX_GROUP"]

    # Calculate FC
    tri_ix = None
    features = []

    for ts in time_series:
        FC = {{connectivity}}

        if tri_ix == None:
            tri_ix = np.triu_indices_from(FC, k=1)

        feat_vec = FC[tri_ix]
        features.append(feat_vec)

    # Prepare features (FC estimates) and target (autism/control)
    X = np.vstack(features)
    X[np.isnan(X)] = 0.0
    y = np.array(diagnosis)

    # Classification model
    model = Pipeline([('scaler', StandardScaler()), ('reg', LogisticRegression(penalty='l2', C={{regularisation}}, tol=1e-3))])
    cv = StratifiedKFold(n_splits=5)
    accuracies = cross_val_score(model, X, y, cv=cv, scoring='accuracy')

    # Save the results
    comet.utils.save_universe_results({"accuracy": accuracies})

# Create and run the multiverse analysis
mverse = multiverse.Multiverse(name="example_mv_abide")
mverse.create(analysis_template, forking_paths)
mverse.run(parallel=8)

[3]:

mverse.summary()
mverse.specification_curve("accuracy", height_ratio=(1,1), figsize=(12,9), baseline=0.5, ci=95, line_pad=0.1)

	Universe	Decision 1	Value 1	Decision 2	Value 2	Decision 3	Value 3	Decision 4	Value 4	Decision 5	Value 5	Decision 6	Value 6
0	Universe_1	pipeline	cpac	parcellation	rois_aal	band_pass	True	global_signal	True	connectivity	pearson	regularisation	0.25
1	Universe_2	pipeline	cpac	parcellation	rois_aal	band_pass	True	global_signal	True	connectivity	pearson	regularisation	1.00
2	Universe_3	pipeline	cpac	parcellation	rois_aal	band_pass	True	global_signal	True	connectivity	partial	regularisation	0.25
3	Universe_4	pipeline	cpac	parcellation	rois_aal	band_pass	True	global_signal	True	connectivity	partial	regularisation	1.00
4	Universe_5	pipeline	cpac	parcellation	rois_aal	band_pass	True	global_signal	False	connectivity	pearson	regularisation	0.25
...	...	...	...	...	...	...	...	...	...	...	...	...	...
187	Universe_188	pipeline	niak	parcellation	rois_dosenbach160	band_pass	False	global_signal	True	connectivity	partial	regularisation	1.00
188	Universe_189	pipeline	niak	parcellation	rois_dosenbach160	band_pass	False	global_signal	False	connectivity	pearson	regularisation	0.25
189	Universe_190	pipeline	niak	parcellation	rois_dosenbach160	band_pass	False	global_signal	False	connectivity	pearson	regularisation	1.00
190	Universe_191	pipeline	niak	parcellation	rois_dosenbach160	band_pass	False	global_signal	False	connectivity	partial	regularisation	0.25
191	Universe_192	pipeline	niak	parcellation	rois_dosenbach160	band_pass	False	global_signal	False	connectivity	partial	regularisation	1.00

192 rows × 13 columns

Warning: Only 5 samples were available for the CI.

../../_images/sections_notebooks_example_mv_abide_4_2.png