Exercise IX: The Haxby Experiment (2001)¶

Thie exercise is based on an example from nilearn’s documentation.

The Haxby Experiment (2001) [2] shows that the representation of objects from different categories is distributed and overlaps over brain regions that were broadly considered to be “specialized” to particular stimuli categories (i.e. the fusiform area for faces and the parahippocampal place area for places).

Cats Faces [Decoding] Mask Classification

For the purposes of this exercise, we will predict stimulus categories based on the signal acquired from the ventral temporal cortex alone. This demonstration will show that while there was some expected variation, classification performance was relatively high across categories.

Let’s start things off by using nilearn’s datasets module to download a single subjects dataset:

from nilearn import datasets

haxby_data = datasets.fetch_haxby()

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/nilearn/datasets/__init__.py:96: FutureWarning: Fetchers from the nilearn.datasets module will be updated in version 0.9 to return python strings instead of bytes and Pandas dataframes instead of Numpy arrays.
  "Numpy arrays.", FutureWarning)

Dataset created in /home/runner/nilearn_data/haxby2001

Downloading data from https://www.nitrc.org/frs/download.php/7868/mask.nii.gz ...

 ...done. (0 seconds, 0 min)

Downloading data from http://data.pymvpa.org/datasets/haxby2001/MD5SUMS ...

 ...done. (0 seconds, 0 min)

Downloading data from http://data.pymvpa.org/datasets/haxby2001/subj2-2010.01.14.tar.gz ...

Downloaded 124116992 of 291168628 bytes (42.6%,    1.3s remaining)

Downloaded 281567232 of 291168628 bytes (96.7%,    0.1s remaining) ...done. (2 seconds, 0 min)

Extracting data from /home/runner/nilearn_data/haxby2001/f33ff337e914bf7fded743c7107979f9/subj2-2010.01.14.tar.gz...

.. done.

haxby_data

{'anat': ['/home/runner/nilearn_data/haxby2001/subj2/anat.nii.gz'],
 'func': ['/home/runner/nilearn_data/haxby2001/subj2/bold.nii.gz'],
 'session_target': ['/home/runner/nilearn_data/haxby2001/subj2/labels.txt'],
 'mask_vt': ['/home/runner/nilearn_data/haxby2001/subj2/mask4_vt.nii.gz'],
 'mask_face': ['/home/runner/nilearn_data/haxby2001/subj2/mask8b_face_vt.nii.gz'],
 'mask_house': ['/home/runner/nilearn_data/haxby2001/subj2/mask8b_house_vt.nii.gz'],
 'mask_face_little': ['/home/runner/nilearn_data/haxby2001/subj2/mask8_face_vt.nii.gz'],
 'mask_house_little': ['/home/runner/nilearn_data/haxby2001/subj2/mask8_house_vt.nii.gz'],
 'mask': '/home/runner/nilearn_data/haxby2001/mask.nii.gz',
 'description': b'Haxby 2001 results\n\n\nNotes\n-----\nResults from a classical fMRI study that investigated the differences between\nthe neural correlates of face versus object processing in the ventral visual\nstream. Face and object stimuli showed widely distributed and overlapping\nresponse patterns.\n\nContent\n-------\nThe "simple" dataset includes\n    :\'func\': Nifti images with bold data\n    :\'session_target\': Text file containing session data\n    :\'mask\': Nifti images with employed mask\n    :\'session\': Text file with condition labels\n\n\nThe full dataset additionally includes\n    :\'anat\': Nifti images with anatomical image\n    :\'func\': Nifti images with bold data\n    :\'mask_vt\': Nifti images with mask for ventral visual/temporal cortex\n    :\'mask_face\': Nifti images with face-reponsive brain regions\n    :\'mask_house\': Nifti images with house-reponsive brain regions\n    :\'mask_face_little\': Spatially more constrained version of the above\n    :\'mask_house_little\': Spatially more constrained version of the above\n\n\nReferences\n----------\nFor more information see:\nPyMVPA provides a tutorial using this dataset :\nhttp://www.pymvpa.org/tutorial.html\n\nMore information about its structure :\nhttp://dev.pymvpa.org/datadb/haxby2001.html\n\n\n`Haxby, J., Gobbini, M., Furey, M., Ishai, A., Schouten, J.,\nand Pietrini, P. (2001). Distributed and overlapping representations of\nfaces and objects in ventral temporal cortex. Science 293, 2425-2430.`\n\n\nLicence: unknown.\n'}

Review the dataset’s description:

print(haxby_data.description.decode())

Haxby 2001 results


Notes
-----
Results from a classical fMRI study that investigated the differences between
the neural correlates of face versus object processing in the ventral visual
stream. Face and object stimuli showed widely distributed and overlapping
response patterns.

Content
-------
The "simple" dataset includes
    :'func': Nifti images with bold data
    :'session_target': Text file containing session data
    :'mask': Nifti images with employed mask
    :'session': Text file with condition labels


The full dataset additionally includes
    :'anat': Nifti images with anatomical image
    :'func': Nifti images with bold data
    :'mask_vt': Nifti images with mask for ventral visual/temporal cortex
    :'mask_face': Nifti images with face-reponsive brain regions
    :'mask_house': Nifti images with house-reponsive brain regions
    :'mask_face_little': Spatially more constrained version of the above
    :'mask_house_little': Spatially more constrained version of the above


References
----------
For more information see:
PyMVPA provides a tutorial using this dataset :
http://www.pymvpa.org/tutorial.html

More information about its structure :
http://dev.pymvpa.org/datadb/haxby2001.html


`Haxby, J., Gobbini, M., Furey, M., Ishai, A., Schouten, J.,
and Pietrini, P. (2001). Distributed and overlapping representations of
faces and objects in ventral temporal cortex. Science 293, 2425-2430.`


Licence: unknown.

Visualization¶

fMRI data is generally represented as a 4D matrix of 3D EPI scans over time.

To read the fMRI volume from a .nii.gz file we’ll use nibabel’s load() function:

from nilearn import plotting
import nibabel as nib

fmri_path = haxby_data.func[0]
fmri_data = nib.load(fmri_path)
fmri_data.shape

(40, 64, 64, 1452)

To plot a 3D image representing the mean fMRI signal over time for the single subject in the dataset:

from nilearn.image import mean_img

fmri_mean_epi = mean_img(fmri_data)
plotting.view_img(fmri_mean_epi, threshold=None)

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/nilearn/plotting/html_stat_map.py:217: FutureWarning: Default resolution of the MNI template will change from 2mm to 1mm in version 0.10.0
  bg_img = load_mni152_template()

Masking¶

In order to analyze the data, we need to produce a matrix containing the measured signal in the ventral temporal (VT) cortex over time.

First, let’s have a look at the provided mask:

import matplotlib.pyplot as plt

mask = haxby_data.mask_vt[0]
anatomical = haxby_data.anat[0]

figure = plt.figure(figsize=(14, 5))
_ = plotting.plot_roi(mask,
                      bg_img=anatomical,
                      cmap="spring",
                      black_bg=False,
                      figure=figure)

Now, we will use this ROI in order to mask our 4D fMRI data:

Masking illustration

from nilearn.input_data import NiftiMasker

masker = NiftiMasker(mask_img=mask, standardize=True)
masked_data = masker.fit_transform(fmri_data)

The result containts time points as rows and voxels as columns:

masked_data.shape

(1452, 464)

For example, to plot the change in the measured signal in some 5 voxels over time:

START_VOXEL, END_VOXEL = 0, 5
SIGNAL_PLOT_CONFIGURATION = {
    "title": "Voxel Time Series",
    "xlabel": "Scan Number",
    "ylabel": "Normalized Signal",
    "xlim": (0, len(masked_data))
}

fig, ax = plt.subplots(figsize=(15, 6))
ax.plot(masked_data[:, START_VOXEL:END_VOXEL], linewidth=0.75)
_ = ax.set(**SIGNAL_PLOT_CONFIGURATION)

Prediction¶

Let’s try and predict which stimulus category the subject was viewing based on the masked fMRI data.

Reading the labels¶

The session_target file contains a labels column, describing the stimulus category at each presentation block, as well as the session (chunks) out of 12 sessions presented to each subjects.

import pandas as pd

LABEL_COLUMNS = {'labels': 'Label', 'chunks': 'Session'}

labels_path = haxby_data.session_target[0]
labels = pd.read_csv(labels_path, delimiter=' ')
labels.rename(columns=LABEL_COLUMNS, inplace=True)
labels

	Label	Session
0	rest	0
1	rest	0
2	rest	0
3	rest	0
4	rest	0
...	...	...
1447	rest	11
1448	rest	11
1449	rest	11
1450	rest	11
1451	rest	11

1452 rows × 2 columns

import numpy as np

LABEL_COLORS = {
    "rest": "white",
    "scissors": "grey",
    "face": "red",
    "cat": "orange",
    "shoe": "black",
    "house": "cyan",
    "scrambledpix": "blue",
    "bottle": "green",
    "chair": "brown"
}


def legend_without_duplicate_labels(ax):
    handles, labels = ax.get_legend_handles_labels()
    unique = [(h, l) for i, (h, l) in enumerate(zip(handles, labels))
              if l not in labels[:i]]
    ax.legend(*zip(*unique), bbox_to_anchor=(1.15, 1))


category_fig, category_ax = plt.subplots(figsize=(14, 3))
for label in labels.Label.unique():
    y1 = np.zeros(len(labels))
    y2 = np.ones(len(labels)) * (labels.Label == label)
    category_ax.fill_between(labels.index,
                             y1,
                             y2,
                             color=LABEL_COLORS[label],
                             label=label)
for session_id in labels.Session.unique():
    color = "black" if session_id % 2 == 0 else "white"
    y1 = np.ones(len(labels))
    y2 = np.ones(len(labels))
    y2[labels.Session == session_id] = 2
    category_ax.fill_between(labels.index, y1, y2, color=color)
legend_without_duplicate_labels(category_ax)
_ = category_ax.set(title="Stimulus Categories and Sessions Over Time",
                    yticks=[],
                    xlabel="Time")

Let’s create a couple of references to the given data for comfort and readability.

is_task = labels["Label"] != 'rest'

task_data = masked_data[is_task]
task_labels = labels["Label"][is_task]
task_sessions = labels["Session"][is_task]

categories = task_labels.unique()

Logistic Regression¶

To grid search across \(\ell_1\) and \(\ell_2\) with various C values and score based on the ROC AUC:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut

SOLVER = "liblinear"  # See note 1
model = LogisticRegression(random_state=0, solver=SOLVER)
C = [10**i for i in range(-2, 3)]
PARAM_GRID = {"C": C, "penalty": ["l1", "l2"]}
SCORING = "roc_auc"

searcher = GridSearchCV(model,
                        param_grid=PARAM_GRID,
                        scoring=SCORING,
                        cv=LeaveOneGroupOut())  # See note 2

Note

sklearn offers a variety of different solvers for logistic regression, in our case we need to make sure we use one that supports both \(\ell_1\) and \(\ell_2\) regularization. For more information, see the LogistricRegression model’s user guide.

Note

The fMRI data is acquired by sessions, and the noise is autocorrelated in a given session. Hence, it is better to predict across sessions when doing cross-validation.

Our goal now it to evaluate how well we manage to classify the target category of stimuli based on the signal from the verntral temporal cortex alone:

# Create a DataFrame to store the results
scores = pd.DataFrame(index=categories, columns=["Penalty", "C", "AUC"])

# Iterate categories and updates best scores
for category in categories:
    target = task_labels == category
    searcher.fit(
        X=task_data,
        y=target,
        groups=task_sessions,
    )
    cv_results = searcher.cv_results_
    scores.loc[category, "Penalty"] = searcher.best_estimator_.penalty
    scores.loc[category, "C"] = searcher.best_estimator_.C
    scores.loc[category, "AUC"] = searcher.best_score_

scores

	Penalty	C	AUC
scissors	l2	10	0.912551
face	l1	1	0.986919
cat	l2	100	0.961199
shoe	l2	1	0.925779
house	l1	0.1	0.998089
scrambledpix	l1	10	0.989418
bottle	l1	1	0.898883
chair	l1	10	0.93107

Not bad! Clearly, the ventral temporal cortex has some significant value in decoding the represenation of the various stimulus categories (and not just faces).

Machine Learning for Neuroscience