
1. INTRODUCTION

1.4 Neuroimaging by fMRI

1.4.5 fMRI analysis methods

fMRI data can be analyzed in many different ways; below, we briefly introduce the univariate and multivariate methods that we will be using in this work.

1.4.5.1 Univariate methods

Univariate analyses are commonly conducted in fMRI studies in order to compare different conditions or groups of subjects. These analyses test for activity differences in the time course of individual voxels, in a “functional localization” approach (Cox and Savoy 2003). Briefly, these methods try to find voxels that show a statistically significant response to an experimental condition. The most commonly used univariate model in fMRI analysis is the General Linear Model (GLM). The GLM is a regression-based model that aims to predict the functional activity of each voxel as a linear combination of regressors (i.e., hypothesized response models) taken from the design matrix. Typically, the design matrix comprises variables of interest, confounds (e.g., motion parameters) and error, which altogether constitute the sources of variance of the observed neurophysiological response (Price, Crinion et al. 2006).
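
To make the regression concrete, here is a minimal sketch of a GLM fit on simulated data, using plain NumPy (in practice, dedicated neuroimaging packages such as SPM, FSL or Nilearn perform these steps). One beta weight per regressor is estimated for each voxel by ordinary least squares:

import numpy as np

rng = np.random.default_rng(0)
n_scans, n_voxels = 200, 1000

# Hypothetical design matrix: a task regressor (standing in for a convolved
# stimulus time course), a motion confound, and an intercept.
task = rng.random(n_scans)
motion = rng.standard_normal(n_scans)
X = np.column_stack([task, motion, np.ones(n_scans)])

# Simulated BOLD time courses, one column per voxel.
Y = rng.standard_normal((n_scans, n_voxels))

# OLS estimate beta = (X'X)^-1 X'Y: one column of beta weights per voxel.
beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
residuals = Y - X @ beta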

Two different statistical tests can be used to make inferences about the size and statistical significance of the activation: the F-statistic, which tests the null hypothesis that all of the effects are zero, and the T-statistic, which tests whether a particular contrast (i.e., difference between two conditions) is zero.
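
Reusing the variables from the sketch above, a T-statistic map for a hypothetical contrast (the task regressor against zero; the contrast vector is an illustrative choice) could be computed as follows:

import numpy as np

# Contrast over the columns of X: weight 1 on the task regressor.
c = np.array([1.0, 0.0, 0.0])

dof = n_scans - np.linalg.matrix_rank(X)      # residual degrees of freedom
sigma2 = (residuals ** 2).sum(axis=0) / dof   # residual variance per voxel
XtX_inv = np.linalg.inv(X.T @ X)

# t = c'beta / sqrt(sigma^2 * c'(X'X)^-1 c), one value per voxel.
t_map = (c @ beta) / np.sqrt(sigma2 * (c @ XtX_inv @ c))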


These tests are calculated on several thousands of voxels in the brain, resulting in a statistical parametric map. There are two stages to the analysis: the first-level analysis, which estimates significant activation within a subject, and the second-level analysis, which performs between-subjects comparisons of activation. The first-level analysis is a fixed-effects analysis, because the subject effect is treated as fixed. The second-level analysis, however, requires the activation of each subject to be modelled independently in order to account for the between-subjects variance; this is called the random-effects model, as the between-subjects variance is assumed to correspond to the variance of the general population.
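
As a rough illustration of the random-effects logic, the sketch below (simulated first-level contrast estimates, SciPy) performs the second-level analysis as a one-sample t-test across subjects; all values are hypothetical:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_voxels = 20, 1000

# One first-level contrast map per subject (simulated, with a small effect).
contrast_maps = rng.standard_normal((n_subjects, n_voxels)) + 0.3

# Random-effects inference: subjects are treated as a sample from the
# general population, so variance is estimated across subjects.
t_vals, p_vals = stats.ttest_1samp(contrast_maps, popmean=0.0, axis=0)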

1.4.5.2 Multivariate methods

A variety of multivariate techniques have been applied to fMRI data, including pattern recognition and partial least squares (PLS), on which we will elaborate below. In contrast to univariate measures, multivariate analyses allow testing a hypothesis on many voxels of an image simultaneously, thereby bypassing the issue of multiple comparisons. Importantly, these methods also allow exploiting the joint information contained in distributed patterns of activation (Cox and Savoy 2003), which boosts their sensitivity.

1.4.5.2.1 PLS

PLS is a data-driven statistical technique that was first introduced to neuroimaging by McIntosh and collaborators (McIntosh, Bookstein et al. 1996) to extract shared information between two matrices. When applied to fMRI data, PLS is commonly used to analyze the relationship between measures of brain activity and measures of behavior or experimental design (Krishnan, Williams et al. 2011). There are two main types of PLS: PLS Correlation (PLSC) and PLS Regression. While the latter is commonly used to predict behavior/design from brain activity, PLSC comes in different variants depending on the type of data that is correlated with brain activity: behavior PLSC relates brain activity to behavioral data, task PLSC links brain activity to an experimental design, and seed PLSC analyzes connectivity patterns between brain regions.
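
The core computation behind PLSC can be sketched as follows: after z-scoring, the brain-behavior correlation matrix is decomposed by singular value decomposition, and each pair of singular vectors forms one latent variable. This is only a schematic rendering of the behavior-PLSC logic on simulated data, not the exact implementation used in the studies cited above:

import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_voxels, n_behav = 30, 500, 3

X = rng.standard_normal((n_subjects, n_voxels))  # brain activity
Y = rng.standard_normal((n_subjects, n_behav))   # behavioral scores

# Z-score each column, then correlate every behavioral measure with every voxel.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
Yz = (Y - Y.mean(axis=0)) / Y.std(axis=0)
R = Yz.T @ Xz / n_subjects                       # behavior-by-voxel correlations

# Each pair (U[:, k], V[k, :]) is one latent variable; s[k] reflects the
# cross-block covariance it explains.
U, s, V = np.linalg.svd(R, full_matrices=False)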

PLSC has also been used for dimensionality reduction and feature extraction prior to the use of a classifier (Ramirez, Gorriz et al. 2010, Andersen, Rayens et al. 2012, Segovia, Górriz et al. 2012). It has been suggested that PLS may be superior to principal component analysis (PCA) when the goal is between-group discrimination (Barker and Rayens 2003).
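
As a minimal sketch of this feature-extraction use, assuming scikit-learn's PLSRegression as a stand-in and simulated data (the cited studies used PLS variants and pipelines of their own), brain features can be projected onto a few PLS components before a standard classifier is trained in the reduced space:

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 500))   # hypothetical subjects-by-voxels matrix
y = np.repeat([0, 1], 20)            # two groups of 20 subjects

# Reduce 500 voxels to 5 PLS components that covary with the group labels...
pls = PLSRegression(n_components=5).fit(X, y)
X_reduced = pls.transform(X)

# ...then classify in the 5-dimensional space.
clf = LogisticRegression().fit(X_reduced, y)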


1.4.5.2.2 Pattern recognition

Pattern recognition techniques have generated much interest for the analysis of fMRI data in the past few years, as they allow new kinds of information to be extracted from neuroimaging data, such as decoding mental states or behavior from patterns of brain activity (Pereira, Mitchell et al. 2009). In a seminal experiment, Haxby and collaborators trained a classifier to identify the category of the object that a subject was viewing from patterns of brain activity in the ventral temporal cortex (Haxby, Gobbini et al. 2001). In contrast to univariate analyses, which aim to predict activity from a design matrix, classifiers predict the design matrix from the fMRI activity.

Pattern recognition has raised much interest in the field of early detection of individuals at risk for a disease. Contrary to statistical analyses that draw comparisons at the group level (e.g., MCI/AD patients vs. EC), pattern recognition makes predictions at the individual level, a feature that holds great clinical value. In the case of MCI/AD, the early diagnosis of MCI patients and their prognosis (conversion to AD or not) have been thoroughly investigated using many different modalities. Below, we describe how pattern recognition works for classifying between two classes (or groups), but the same logic applies to the decoding of two mental states or experimental conditions.

Briefly, pattern recognition algorithms train classifiers to learn statistical regularities in a dataset and to predict unseen data (Cox and Savoy 2003). Classification is the prediction of a discrete variable, in contrast to regression, which aims to predict a continuous variable. The data is first separated into training data and testing data; this is a crucial step to obtain a good estimate of the classifier’s true accuracy, and to ensure the generalizability of the classifier to unseen data. This can be done in a number of ways, one of the most common being leave-one-out cross-validation, which consists of splitting the dataset N times into a training set containing N-1 subjects and a test set containing the remaining subject. During training, a classifier is fed a subset of labeled patterns of brain activity, from which it must learn a function that maps voxel activity patterns to experimental labels (see Figure 5). Then, given a new, previously unseen pattern of brain activity, the trained classifier must predict the label associated with that pattern. The rationale is that if the classifier has truly captured the relationship between the features and the classes, then it should be able to predict the classes of examples that it has not seen before.
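
A minimal leave-one-out cross-validation sketch, assuming scikit-learn, a linear support vector machine, and simulated subject data (all choices are illustrative):

import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 200))   # subjects x voxel features
y = np.repeat([0, 1], 15)            # labels (e.g., patients vs. controls)

# Each of the N folds trains on N-1 subjects and tests on the one left out.
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=LeaveOneOut())
print("cross-validated accuracy:", scores.mean())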

Figure 5. Pattern recognition pipeline. The classifier learns from a labeled training dataset, and must predict labels for a test dataset. The accuracy is then computed by comparing the predicted labels to the true labels. Adapted from (Pereira, Mitchell et al. 2009) by permission of Elsevier.

Classification performance can suffer if the number of features (e.g., voxels) exceeds the number of examples (e.g., subjects). To prevent this, dimensionality reduction can be applied to the feature matrix using different approaches, such as principal component analysis, downsampling, or feature selection (Pereira, Mitchell et al. 2009). For the latter, there are different ways of selecting features, such as a priori selection of regions of interest (ROIs), or univariate measures (such as clusters or peaks of activation). It is crucial that features be selected either from an independent dataset or from a subpart of the dataset within a cross-validation scheme; otherwise the accuracy estimate will be overly optimistic, as illustrated below.
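
One common way to enforce this, sketched here with scikit-learn on simulated data, is to nest the univariate selection step inside the cross-validation pipeline, so that it is refit on each training fold and never sees the test subject:

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.standard_normal((30, 1000))
y = np.repeat([0, 1], 15)

# SelectKBest keeps the 100 voxels with the strongest univariate group
# effect, computed on the training fold only.
pipe = make_pipeline(SelectKBest(f_classif, k=100), SVC(kernel="linear"))
scores = cross_val_score(pipe, X, y, cv=LeaveOneOut())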

The performance of a classifier is usually assessed by its accuracy, i.e., the number of correctly classified examples divided by the total number of examples in the test set. When classifying two classes (e.g., two groups of individuals), sensitivity and specificity rates can be obtained by computing the classifier’s accuracy separately for each class. Moreover, a receiver operating characteristic (ROC) curve can be obtained, which reflects the trade-off between sensitivity and specificity. A related measure, the area under the ROC curve (AUC), gives the probability that a randomly chosen pair of one patient and one control would be correctly distinguished by the classifier.
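
These measures can be computed as in the following sketch (scikit-learn metrics applied to a small, made-up set of predictions and decision scores):

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # 1 = patient, 0 = control
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0])   # predicted labels
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.2, 0.3, 0.6, 0.1])  # decision values

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # accuracy on the patient class
specificity = tn / (tn + fp)   # accuracy on the control class
auc = roc_auc_score(y_true, scores)
print(accuracy_score(y_true, y_pred), sensitivity, specificity, auc)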

Where the information is located in the brain, i.e., which voxels drive the classifier towards a good performance, can also provide valuable information (Pereira, Mitchell et al. 2009). For this purpose, the voxels that are consistently selected across folds, and the weight given to each voxel, can be estimated for each fold. Notably, pattern recognition can distinguish between several classes even if the information used for decoding is contained in overlapping voxels, because the spatial distribution of the activation patterns can differ.
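
Returning to the weight maps mentioned above: for a linear classifier, one simple way to obtain one, sketched below on simulated data, is to collect the weight vector learned on each fold and average it across folds (the averaging is an illustrative choice, not a method prescribed by the cited work):

import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.standard_normal((30, 200))
y = np.repeat([0, 1], 15)

weights = []
for train_idx, _ in LeaveOneOut().split(X):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    weights.append(clf.coef_.ravel())   # one weight per voxel for this fold

# Voxels with consistently large absolute weights drive the classification.
mean_weight_map = np.mean(weights, axis=0)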

A large number of classifiers are available nowadays, each with its own specific characteristics. They can be divided into linear (e.g., the linear support vector machine) and nonlinear (e.g., decision trees) classifiers. Another distinction can be made between supervised and unsupervised methods: supervised classifiers are given the labels of the classes that they have to identify, while unsupervised methods are not.
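
For concreteness, the sketch below instantiates one example of each family with scikit-learn; the specific models and the simulated data are illustrative:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
X = rng.standard_normal((30, 50))
y = np.repeat([0, 1], 15)

SVC(kernel="linear").fit(X, y)            # supervised, linear
DecisionTreeClassifier().fit(X, y)        # supervised, nonlinear
KMeans(n_clusters=2, n_init=10).fit(X)    # unsupervised: labels never used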