{ "cells": [ { "cell_type": "markdown", "id": "f62f2e91", "metadata": {}, "source": [ "# Autism classification (resting-state fMRI)\n", "\n", "This example uses data from 20 participants of the [Autism Brain Imaging Data Exchange (ABIDE)](http://preprocessed-connectomes-project.org/abide/) preprocessed connectomes dataset. The goal is to predict an autism diagnosis from static functional connectivity estimates. Please note that the results may overestimate the true effect because of the statistical issues that arise when predicting from many thousands of features with relatively few observations.\n", "\n", "The multiverse is similar to the classification multiverse performed by [Dafflon et al. 2022](https://www.nature.com/articles/s41467-022-31347-8). The main differences are that we use a slightly reduced decision space for the connectivity and parcellation choices, and that we add a decision point for the regularisation strength of the classifier so that the statistical model is covered as well.\n", "\n", "## Data Download\n", "\n", "The data is accessed through [nilearn](https://nilearn.github.io/dev/modules/generated/nilearn.datasets.fetch_abide_pcp.html). For quicker testing, only a subset of 20 subjects is used, selected by passing `SUB_ID=SUB_IDS` to `datasets.fetch_abide_pcp()` in both the data download cell and the multiverse analysis cell. To run the analysis on the whole dataset and reproduce the included figures, remove this argument from both calls.\n", "\n", "Please note that downloading the full data takes a few hours and requires ~20 GB of disk space. If the download crashes or the cell is interrupted, simply re-run it; only the missing data will be downloaded." 
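To make the dimensionality problem concrete, the following sketch builds a feature matrix the same way as the analysis template (upper triangle of each subject's connectivity matrix, diagonal excluded), but from random synthetic time series. It assumes a hypothetical 200-region parcellation, 20 subjects and 150 time points, and uses `numpy.corrcoef` as a stand-in for comet's Pearson estimator. Because the labels are unrelated to the data, accuracies should only scatter around chance, and with 4 test subjects per fold they can scatter widely. In scikit-learn, `C` is the inverse of the regularisation strength, so `C=0.25` penalises the weights more strongly than `C=1.0`.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Hypothetical dimensions: 20 subjects, 150 time points, 200 regions
n_subjects, n_timepoints, n_regions = 20, 150, 200

# Upper-triangle indices (diagonal excluded) of an n_regions x n_regions matrix
tri_ix = np.triu_indices(n_regions, k=1)
n_features = len(tri_ix[0])  # n_regions * (n_regions - 1) / 2

features = []
for _ in range(n_subjects):
    ts = rng.standard_normal((n_timepoints, n_regions))  # time x regions
    fc = np.corrcoef(ts, rowvar=False)                   # regions x regions Pearson FC
    features.append(fc[tri_ix])                          # vectorised upper triangle

X = np.vstack(features)
y = np.repeat([1, 2], n_subjects // 2)  # balanced labels, unrelated to X

for C in (0.25, 1.0):
    # Default L2 penalty; smaller C means stronger regularisation
    model = Pipeline([("scaler", StandardScaler()),
                      ("clf", LogisticRegression(C=C, tol=1e-3))])
    acc = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5), scoring="accuracy")
    print(f"C={C}: {n_features} features, {n_subjects} subjects, fold accuracies {acc}")
```

The ratio of roughly 1,000 features per subject in this setting is what makes cross-validated estimates noisy.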
] }, { "cell_type": "code", "execution_count": null, "id": "3f88ba91", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fetching ABIDE data: 100%|██████████| 48/48 [00:28<00:00, 1.70it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Available pipelines: [('cpac', True, True, 'rois_aal'), ('cpac', True, True, 'rois_cc200'), ('cpac', True, True, 'rois_dosenbach160'), ('cpac', True, False, 'rois_aal'), ('cpac', True, False, 'rois_cc200'), ('cpac', True, False, 'rois_dosenbach160'), ('cpac', False, True, 'rois_aal'), ('cpac', False, True, 'rois_cc200'), ('cpac', False, True, 'rois_dosenbach160'), ('cpac', False, False, 'rois_aal'), ('cpac', False, False, 'rois_cc200'), ('cpac', False, False, 'rois_dosenbach160'), ('ccs', True, True, 'rois_aal'), ('ccs', True, True, 'rois_cc200'), ('ccs', True, True, 'rois_dosenbach160'), ('ccs', True, False, 'rois_aal'), ('ccs', True, False, 'rois_cc200'), ('ccs', True, False, 'rois_dosenbach160'), ('ccs', False, True, 'rois_aal'), ('ccs', False, True, 'rois_cc200'), ('ccs', False, True, 'rois_dosenbach160'), ('ccs', False, False, 'rois_aal'), ('ccs', False, False, 'rois_cc200'), ('ccs', False, False, 'rois_dosenbach160'), ('dparsf', True, True, 'rois_aal'), ('dparsf', True, True, 'rois_cc200'), ('dparsf', True, True, 'rois_dosenbach160'), ('dparsf', True, False, 'rois_aal'), ('dparsf', True, False, 'rois_cc200'), ('dparsf', True, False, 'rois_dosenbach160'), ('dparsf', False, True, 'rois_aal'), ('dparsf', False, True, 'rois_cc200'), ('dparsf', False, True, 'rois_dosenbach160'), ('dparsf', False, False, 'rois_aal'), ('dparsf', False, False, 'rois_cc200'), ('dparsf', False, False, 'rois_dosenbach160'), ('niak', True, True, 'rois_aal'), ('niak', True, True, 'rois_cc200'), ('niak', True, True, 'rois_dosenbach160'), ('niak', True, False, 'rois_aal'), ('niak', True, False, 'rois_cc200'), ('niak', True, False, 'rois_dosenbach160'), ('niak', False, True, 'rois_aal'), ('niak', False, True, 
'rois_cc200'), ('niak', False, True, 'rois_dosenbach160'), ('niak', False, False, 'rois_aal'), ('niak', False, False, 'rois_cc200'), ('niak', False, False, 'rois_dosenbach160')]\n", "Number of subjects: 20\n", "Class distribution: DX_GROUP\n", "1 10\n", "2 10\n", "Name: count, dtype: int64\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "from tqdm import tqdm\n", "from itertools import product\n", "from nilearn import datasets\n", "\n", "pipelines = [\"cpac\", \"ccs\", \"dparsf\", \"niak\"]\n", "band_pass = [True, False]\n", "global_signal = [True, False]\n", "parcellations = [\"rois_aal\", \"rois_cc200\", \"rois_dosenbach160\"]\n", "\n", "# Subset of subjects to download\n", "SUB_IDS = [50012, 50014, 50015, 50016, 50020, 50022, 50023, 50024, 50025, 50027, # controls\n", " 50030, 50031, 50032, 50033, 50034, 50035, 50036, 50037, 50038, 50040] # autism\n", "\n", "def fetch_data(pipe, bp, gsr, parc):\n", " bunch = datasets.fetch_abide_pcp(SUB_ID=SUB_IDS, data_dir=\"./abide_data\", verbose=0, \n", " pipeline=pipe, derivatives=parc, band_pass_filtering=bp, global_signal_regression=gsr)\n", " return (pipe, bp, gsr, parc), bunch\n", "\n", "all_combinations = list(product(pipelines, band_pass, global_signal, parcellations))\n", "abide_dataset = {}\n", "\n", "for combo in tqdm(all_combinations, desc=\"Fetching ABIDE data\"):\n", " key, bunch = fetch_data(*combo)\n", " abide_dataset[key] = bunch\n", "\n", "print(f\"Available pipelines: {list(abide_dataset.keys())}\")\n", "print(f\"Number of subjects: {len(abide_dataset[('cpac', True, True, 'rois_aal')].phenotypic)}\")\n", "print(f\"Class distribution: {abide_dataset[('cpac', True, True, 'rois_aal')].phenotypic['DX_GROUP'].value_counts()}\")" ] }, { "cell_type": "markdown", "id": "96ff16b5", "metadata": {}, "source": [ "## Multiverse Analysis\n", "\n", "Available decision points for the preprocessed fMRI time series data are the following:\n", "\n", "- Preprocessing pipeline (`'cpac'`, 
`'ccs'`, `'dparsf'`, `'niak'`)\n", "- Parcellation atlas (`'rois_aal'`, `'rois_cc200'`, `'rois_dosenbach160'`)\n", "- Band-pass filtering (`True` or `False`)\n", "- Global signal regression (`True` or `False`; if `False`, standard motion regression was performed)\n", "\n", "For the connectivity measure, two methods from the comet toolbox are included:\n", "\n", "- Pearson correlation (`comet.connectivity.Static_Pearson`)\n", "- Partial correlation (`comet.connectivity.Static_Partial`)\n", "\n", "For the statistical model, we vary the regularisation parameter `C` of the L2-penalised logistic regression classifier (`C=0.25`, `C=1.0`). Note that in scikit-learn `C` is the inverse of the regularisation strength, so `C=0.25` regularises more strongly than `C=1.0`." ] }, { "cell_type": "code", "execution_count": null, "id": "f285b420", "metadata": {}, "outputs": [], "source": [ "from comet import multiverse\n", "\n", "forking_paths = {\n", " \"pipeline\": [\"cpac\", \"ccs\", \"dparsf\", \"niak\"], # Preprocessing pipelines\n", " \"parcellation\": [\"rois_aal\", \"rois_cc200\", \"rois_dosenbach160\"], # Parcellated time series data\n", " \"band_pass\": [True, False], # Band-pass filtering\n", " \"global_signal\": [True, False], # Global signal regression\n", " \"connectivity\":[ # Functional connectivity method\n", " {\"name\": \"pearson\", \"func\": \"comet.connectivity.Static_Pearson(ts).estimate()\"},\n", " {\"name\": \"partial\", \"func\": \"comet.connectivity.Static_Partial(ts).estimate()\"}],\n", " \"regularisation\": [0.25, 1.0] # Inverse regularisation strength (C) of the classifier\n", "}\n", "\n", "def analysis_template():\n", " import comet\n", " import numpy as np\n", " from nilearn import datasets\n", " from sklearn.pipeline import Pipeline\n", " from sklearn.preprocessing import StandardScaler\n", " from sklearn.linear_model import LogisticRegression\n", " from sklearn.model_selection import StratifiedKFold, cross_val_score\n", "\n", " # Subset of subjects to use\n", " SUB_IDS = [50012, 50014, 50015, 50016, 50020, 50022, 50023, 50024, 50025, 50027, # controls\n", " 50030, 50031, 50032, 50033, 50034, 50035, 50036, 50037, 50038, 50040] # 
autism\n", "\n", " # Get data (if available, it will be loaded from disk)\n", " data = datasets.fetch_abide_pcp(SUB_ID=SUB_IDS, data_dir=\"./abide_data\", verbose=0,\n", " pipeline={{pipeline}},\n", " derivatives={{parcellation}},\n", " band_pass_filtering={{band_pass}},\n", " global_signal_regression={{global_signal}})\n", "\n", " time_series = data[{{parcellation}}]\n", " diagnosis = data[\"phenotypic\"][\"DX_GROUP\"]\n", "\n", " # Calculate FC and vectorise its upper triangle (diagonal excluded)\n", " tri_ix = None\n", " features = []\n", "\n", " for ts in time_series:\n", " FC = {{connectivity}}\n", "\n", " if tri_ix is None:\n", " tri_ix = np.triu_indices_from(FC, k=1)\n", "\n", " feat_vec = FC[tri_ix]\n", " features.append(feat_vec)\n", "\n", " # Prepare features (FC estimates) and target (autism/control)\n", " X = np.vstack(features)\n", " X[np.isnan(X)] = 0.0 # replace any NaN connectivity values with 0\n", " y = np.array(diagnosis)\n", "\n", " # Classification model\n", " model = Pipeline([('scaler', StandardScaler()), ('reg', LogisticRegression(penalty='l2', C={{regularisation}}, tol=1e-3))])\n", " cv = StratifiedKFold(n_splits=5)\n", " accuracies = cross_val_score(model, X, y, cv=cv, scoring='accuracy')\n", "\n", " # Save the results\n", " comet.utils.save_universe_results({\"accuracy\": accuracies})\n", "\n", "# Create and run the multiverse analysis\n", "mverse = multiverse.Multiverse(name=\"example_mv_abide\")\n", "mverse.create(analysis_template, forking_paths)\n", "mverse.run(parallel=8)" ] }, { "cell_type": "code", "execution_count": 3, "id": "1dfd1d03", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | Universe | Decision 1 | Value 1 | Decision 2 | Value 2 | Decision 3 | Value 3 | Decision 4 | Value 4 | Decision 5 | Value 5 | Decision 6 | Value 6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Universe_1 | pipeline | cpac | parcellation | rois_aal | band_pass | True | global_signal | True | connectivity | pearson | regularisation | 0.25 |
| 1 | Universe_2 | pipeline | cpac | parcellation | rois_aal | band_pass | True | global_signal | True | connectivity | pearson | regularisation | 1.00 |
| 2 | Universe_3 | pipeline | cpac | parcellation | rois_aal | band_pass | True | global_signal | True | connectivity | partial | regularisation | 0.25 |
| 3 | Universe_4 | pipeline | cpac | parcellation | rois_aal | band_pass | True | global_signal | True | connectivity | partial | regularisation | 1.00 |
| 4 | Universe_5 | pipeline | cpac | parcellation | rois_aal | band_pass | True | global_signal | False | connectivity | pearson | regularisation | 0.25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 187 | Universe_188 | pipeline | niak | parcellation | rois_dosenbach160 | band_pass | False | global_signal | True | connectivity | partial | regularisation | 1.00 |
| 188 | Universe_189 | pipeline | niak | parcellation | rois_dosenbach160 | band_pass | False | global_signal | False | connectivity | pearson | regularisation | 0.25 |
| 189 | Universe_190 | pipeline | niak | parcellation | rois_dosenbach160 | band_pass | False | global_signal | False | connectivity | pearson | regularisation | 1.00 |
| 190 | Universe_191 | pipeline | niak | parcellation | rois_dosenbach160 | band_pass | False | global_signal | False | connectivity | partial | regularisation | 0.25 |
| 191 | Universe_192 | pipeline | niak | parcellation | rois_dosenbach160 | band_pass | False | global_signal | False | connectivity | partial | regularisation | 1.00 |
192 rows × 13 columns
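The 192 universes follow directly from the forking paths: 4 pipelines × 3 parcellations × 2 band-pass options × 2 global signal options × 2 connectivity methods × 2 regularisation values = 192. The sketch below reproduces this count and the row ordering shown in the table with `itertools.product`. It only illustrates the combinatorics, with the connectivity options reduced to their names; it is not meant to show how comet builds the universes internally.

```python
from itertools import product

# Decision points and options, mirroring the forking_paths dictionary above
# (connectivity options reduced to their names for this illustration)
forking_paths = {
    "pipeline": ["cpac", "ccs", "dparsf", "niak"],
    "parcellation": ["rois_aal", "rois_cc200", "rois_dosenbach160"],
    "band_pass": [True, False],
    "global_signal": [True, False],
    "connectivity": ["pearson", "partial"],
    "regularisation": [0.25, 1.0],
}

names = list(forking_paths)
# Cartesian product over all options; the last decision point varies fastest
universes = [dict(zip(names, values)) for values in product(*forking_paths.values())]

print(len(universes))  # 192
print(universes[0])    # first universe (Universe_1 in the table)
print(universes[-1])   # last universe (Universe_192 in the table)
```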
\n", "