ABSTRACT
Thanks to the advent of functional brain-imaging technologies, cognitive neuroscience is accumulating maps of neural activity responses to specific tasks or stimuli, or of spontaneous activity. In this work, we consider data from functional Magnetic Resonance Imaging (fMRI), which we study in a machine learning setting: we learn a model of brain activity that should generalize to unseen data. After reviewing standard fMRI data analysis techniques, we propose new methods and models that benefit from the recently released large fMRI data repositories. Our goal is to learn richer representations of brain activity. We first focus on unsupervised analysis of terabyte-scale fMRI data acquired on subjects at rest (resting-state fMRI). We perform this analysis using matrix factorization. We present new methods for running sparse matrix factorization/dictionary learning on hundreds of fMRI records in reasonable time. Our leading approach relies on introducing randomness in stochastic optimization loops and provides a speed-up of an order of magnitude on a variety of settings and datasets. We provide an extended empirical validation of our stochastic subsampling approach on datasets from fMRI, hyperspectral imaging, and collaborative filtering. We derive convergence properties for our algorithm in a theoretical analysis that reaches beyond the matrix factorization problem. We then turn to fMRI data acquired on subjects undergoing behavioral protocols (task fMRI). We investigate how to aggregate data from many source studies, acquired with many different protocols, in order to learn more accurate and interpretable decoding models that predict stimuli or tasks from brain maps. Our multi-study shared-layer model learns to reduce the dimensionality of input brain images while simultaneously learning to decode these images from their reduced representation. This fosters transfer learning between studies, as we learn the undocumented common cognitive aspects that the many fMRI studies share. As a consequence, our multi-study model performs better than single-study decoding. Our approach identifies universally relevant representations of brain activity, supported by a few task-optimized networks learned during model fitting.


$\sum_{i=1}\bigl[(\theta_i + \lambda)^{-1}\sum_{j=1}^{n}\langle z_{ij}, y_j\rangle\, z_i\bigr]$
To put our algorithm into context, we recall that a crucial question about the applicability of functional data is how one can find an appropriate space and a basis in which the functions can be decomposed in a computationally feasible way, while taking into account the functional nature of the data. This is exactly what Algorithm 1 does. In contrast to parametric FDA methods, the basis functions here are not fixed in advance but implicitly defined by choosing a reproducing operator-valued kernel acting on both input and output data. The spectral decomposition of the block operator kernel matrix naturally assigns an appropriate basis function to the learning process for representing input and output functions. Moreover, the formulation is flexible enough to be used with different operators and thus adapted to various applications involving functional data. Also, in the context of nonparametric FDA, where the notion of semi-metric plays an important role in modeling functional data, we note that Algorithm 1 is based on computing and choosing a finite number of eigenfunctions. This is strongly related to the semi-metric building scheme in Ferraty and Vieu (2006), which is based on, for example, functional principal components or successive derivatives. Operator-valued kernels constructed from the covariance operator (Kadri et al., 2013b) or the derivative operator make it possible to design semi-metrics similar to those just mentioned. In this sense, the eigendecomposition of the block operator kernel matrix offers a new way of producing semi-metrics.
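As a scalar-kernel sketch of the basis construction just described (the operator-valued Algorithm 1 itself is not reproduced here), the eigendecomposition of a Gram matrix can supply a data-driven basis and, from its truncation, a semi-metric. The data, kernel width, and truncation level below are hypothetical:

```python
import numpy as np

# Scalar-kernel analogue of the construction above: the eigendecomposition of
# a Gram matrix yields a data-driven basis (kernel-PCA style), playing the
# role the block operator kernel matrix plays for operator-valued kernels.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))                     # 50 discretized functional samples
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
K = np.exp(-sq / (2 * 1.0 ** 2))                     # Gaussian-kernel Gram matrix

theta, Z = np.linalg.eigh(K)          # eigenvalues theta_i, eigenvectors z_i
order = np.argsort(theta)[::-1]
theta, Z = theta[order], Z[:, order]

r = 10                                # keep a finite number of eigenfunctions
basis = Z[:, :r]                      # data-driven basis for the learning step

# A semi-metric between samples a and b from the truncated eigenbasis.
def semi_metric(a, b, r=10):
    pa, pb = K[a] @ Z[:, :r], K[b] @ Z[:, :r]
    return float(np.linalg.norm(pa - pb))

print(semi_metric(0, 0) == 0.0)  # identical samples are at distance zero -> True
```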

CHAPTER 1. INTRODUCTION
a review). For example, in object recognition tasks, deep convolutional neural networks have largely superseded the manually engineered SIFT features. The representations learned by these deep convolutional architectures were shown to capture concepts such as edges or textures in the first layers, and objects or parts of objects such as eyes or cats in deeper layers. In NLP, representation learning techniques have also imposed their supremacy over more traditional models such as N-grams or latent semantic analysis (LSA). Today, Word2Vec [Mikolov et al., 2013] yields among the best vector representations for words. It refers to one of two models: the continuous bag-of-words (CBOW) model or the skip-gram model. Both are linear models whose architecture contains one single hidden layer (no non-linearity applied). The CBOW model predicts which word is most likely to appear given a certain number of words that precede and follow it in a sentence. By contrast, the skip-gram model predicts the words that are likely to surround a given word in a sentence. Both models take as input the one-hot encoding of words and implement a softmax regression to predict the output. The new vector representation of words learned by these models is the representation of words in the hidden layer. This representation was shown to be interesting since it encodes not only syntactic similarities between words but also semantic similarities. For example, a simple arithmetic operation such as Vec('King') - Vec('Man') + Vec('Woman') yields a new vector which is closest to the vector that represents the word 'Queen' in the database. The representation learned with Word2Vec has been successfully used in various NLP tasks.
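The famous analogy above can be illustrated with toy vectors. The 3-dimensional embeddings below are hand-picked for illustration and are not actual Word2Vec outputs:

```python
import math

# Toy 3-d word vectors (hypothetical values for illustration only;
# real Word2Vec embeddings have hundreds of dimensions).
vec = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.1],
    "man":   [0.1, 0.9, 0.2],
    "woman": [0.1, 0.1, 0.2],
    "apple": [0.0, 0.5, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# vec('king') - vec('man') + vec('woman')
target = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]

# Nearest word by cosine similarity, excluding the three input words.
candidates = {w: v for w, v in vec.items() if w not in {"king", "man", "woman"}}
nearest = max(candidates, key=lambda w: cosine(target, candidates[w]))
print(nearest)  # -> queen
```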


Our results on HCP data showed that the solutions found by using functional and diffusion MRI identify fewer cortical regions while still explaining the M/EEG data. It is important to recall that, because there are fewer MEG measurements than cortical sources to estimate, recovering brain activity from MEG measurements is an ill-posed problem. As a direct consequence, infinitely many source configurations will explain the observed measurements. To resolve this ambiguity, our approach makes use of prior information from other modalities to select the single source configuration which is closest to the priors. In doing so, our approach also estimates white matter information flow, understood to be the posterior likelihood of a connection being active given the MEG measurements. With the addition of functional MRI, the information flow between cortical regions known to be involved in visuomotor tasks was identified with no manual selection of these regions of interest.
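The role of the prior in this ill-posed setting can be sketched on a toy underdetermined linear system. The forward matrix, source weights, and dimensions below are hypothetical stand-ins for the MEG lead field and multimodal priors:

```python
import numpy as np

# Toy ill-posed inverse problem: 3 MEG-like measurements, 10 candidate sources.
rng = np.random.default_rng(0)
G = rng.standard_normal((3, 10))          # forward (lead-field) matrix
x_true = np.zeros(10)
x_true[[2, 7]] = 1.0                      # two truly active sources
y = G @ x_true                            # observed measurements

# Minimum-norm estimate: one of infinitely many solutions fitting y exactly.
x_mne = G.T @ np.linalg.solve(G @ G.T, y)

# Prior-weighted estimate: other modalities (e.g. fMRI) favour sources 2 and 7.
w = np.full(10, 0.1)
w[[2, 7]] = 1.0                           # hypothetical prior source weights
R = np.diag(w ** 2)                       # prior source covariance
x_prior = R @ G.T @ np.linalg.solve(G @ R @ G.T, y)

# Both solutions fit the data exactly, but the prior concentrates the
# estimate on the favoured sources.
assert np.allclose(G @ x_mne, y) and np.allclose(G @ x_prior, y)
print(abs(x_prior[[2, 7]]).sum() > abs(x_mne[[2, 7]]).sum())  # -> True
```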

Figure 3.18: Classification accuracy w.r.t. different maximum considered subpath lengths P. SBoSK is computed on the Strasbourg Spot-4 image with D = 4096.
known techniques for spatial/spectral remote sensing image classification. The spatial-spectral kernel [61] was introduced to take into account pixel spectral values and spatial information through accessing the nesting region. We thus implement the spatial-spectral kernel based on the multiscale segmentation commonly used in this paper, and select the best level (determined by a cross-validation strategy) to extract spatial information. The attribute profile [48] is considered one of the most powerful techniques to describe image content through context features. We use full multi-spectral bands with automatic level selection for the area attribute and the standard deviation attribute, as detailed in [75]. The stacked vector was adopted in [96, 25, 113] and relies on features extracted from a hierarchical representation. We use a Gaussian kernel with a stacked vector that concatenates all nodes from ascending paths generated from our multiscale segmentation. The comparison is done by randomly choosing n = [50, 100, 200, 400] samples for training and keeping the rest for testing. The classification accuracies of the different methods are shown in Tab. 3.3. Three common accuracy assessment measures in the remote sensing community [15, 40] are reported here: overall accuracy, average accuracy, and the kappa coefficient.
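The three measures named above can be computed from a confusion matrix as follows. The matrix below is hypothetical (rows are true classes, columns are predicted classes):

```python
import numpy as np

# Confusion matrix for a 3-class toy problem (hypothetical counts).
C = np.array([[50,  5,  0],
              [ 4, 40,  6],
              [ 1,  2, 42]], dtype=float)

n = C.sum()
overall = np.trace(C) / n                          # overall accuracy
average = (np.diag(C) / C.sum(axis=1)).mean()      # mean per-class accuracy
pe = (C.sum(axis=0) * C.sum(axis=1)).sum() / n**2  # chance agreement
kappa = (overall - pe) / (1 - pe)                  # Cohen's kappa coefficient

print(round(overall, 3), round(average, 3), round(kappa, 3))  # -> 0.88 0.881 0.82
```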


Summary
Information retrieval is generally concerned with answering questions such as: is this document relevant to this query? How similar are two queries or two documents? How can query and document similarity be used to enhance relevance estimation? In order to answer these questions, it is necessary to access computational representations of documents and queries. For example, similarities between documents and queries may correspond to a distance or a divergence defined on the representation space. It is generally assumed that the quality of the representation has a direct impact on the bias with respect to the true similarity, estimated by means of human intervention. Building useful representations for documents and queries has always been central to information retrieval research. The goal of this thesis is to provide new ways of estimating such representations and the relevance relationship between them. We present four articles that have been published in international conferences and one published in an information retrieval evaluation forum. The first two articles can be categorized as feature engineering approaches, which transduce a priori knowledge about the domain into the features of the representation. We present a novel retrieval model that compares favorably to existing models in terms of both theoretical originality and experimental effectiveness. The remaining two articles mark a significant change in our vision and originate from the widespread interest in deep learning research that took place during the time they were written. They therefore naturally belong to the category of representation learning approaches, also known as feature learning. Unlike previous approaches, the learning model discovers on its own the most important features for the task at hand, given a considerable amount of labeled data. We propose to model the semantic relationships between documents and queries and between queries themselves. The models presented have also shown improved effectiveness on standard test collections. These last articles are among the first applications of representation learning with neural networks for information retrieval. This series of research leads to the following observation: future improvements in information retrieval effectiveness will have to rely on representation learning techniques instead of manually defined representation spaces.
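As a minimal illustration of the representation-based relevance estimation discussed above, here is a bag-of-words sketch scored with cosine similarity. The documents and query are made up:

```python
import math
from collections import Counter

# A feature-engineered baseline of the kind the representation-learning
# models are compared against: bag-of-words vectors plus cosine relevance.
docs = {
    "d1": "neural networks learn document representations",
    "d2": "boolean retrieval with inverted index",
}
query = "learning document representations"

def bow(text):
    """Term-frequency vector of a whitespace-tokenized text."""
    return Counter(text.split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

q = bow(query)
ranked = sorted(docs, key=lambda d: cosine(q, bow(docs[d])), reverse=True)
print(ranked[0])  # -> d1
```

Note that "learning" and "learn" do not match here; closing exactly this kind of lexical gap is what the semantic representation-learning models are for.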


Whatever the type of task considered (supervised, unsupervised), machine-learning algorithms generally rest upon the computation of statistical quantities such as averages or linear combinations of the observed features, which represent the data efficiently. However, summarizing ranking variability is far from straightforward, and extending simple concepts such as that of an average or median to preference data, i.e. ranking aggregation, raises a number of deep mathematical and computational problems, on which we focused in Part I. Regarding dimensionality reduction, it is far from straightforward to adapt traditional techniques such as Principal Component Analysis and its numerous variants to the ranking setup, the main barrier being the absence of a vector space structure on the set of permutations. In this chapter, we develop a novel framework for representing the distribution of ranking data in a simple manner, which is shown to extend, in some sense, consensus ranking. The rationale behind the approach we promote is that, in many situations encountered in practice, the set of instances may be partitioned into subsets/buckets such that, with high probability, objects belonging to a certain bucket are either all ranked higher or all ranked lower than objects lying in another bucket. In such a case, the ranking distribution can be described in a sparse fashion by: 1) a partial ranking structure (related to the buckets) and 2) the marginal ranking distributions associated with each bucket. Precisely, optimal representations are defined here as those associated with a bucket order minimizing a certain distortion measure we introduce, based on a mass transportation metric on the set of ranking distributions. In this chapter, we also establish rate bounds describing the generalization capacity of bucket order representations obtained by minimizing an empirical version of the distortion, and we address model selection issues related to the choice of the bucket order size/shape. Numerical results are also displayed, providing strong empirical evidence of the relevance of the notion of sparsity on which the introduced dimensionality reduction technique is based.
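The bucketing rationale can be checked on a toy sample of rankings. The data and the candidate bucket order below are hypothetical:

```python
from itertools import product

# Toy ranking sample over items 0..3; earlier position in the list = ranked higher.
rankings = [
    [0, 1, 2, 3],
    [1, 0, 2, 3],
    [0, 1, 3, 2],
    [1, 0, 3, 2],
    [0, 2, 1, 3],   # one "noisy" ranking violating the bucket order
]
buckets = [{0, 1}, {2, 3}]  # candidate bucket order: {0, 1} above {2, 3}

def concordance(rankings, hi, lo):
    """Fraction of (i, j) pairs with i in `hi` ranked above j in `lo`."""
    hits = total = 0
    for r in rankings:
        pos = {item: k for k, item in enumerate(r)}
        for i, j in product(hi, lo):
            total += 1
            hits += pos[i] < pos[j]
    return hits / total

# High concordance supports describing the distribution sparsely by this
# bucket order plus the within-bucket marginals.
print(concordance(rankings, buckets[0], buckets[1]))  # -> 0.95
```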


3.2 Qualitative Evaluation
Figure 3 shows a qualitative comparison of our prediction results, a 2-layer DNN as in [8], and a single cGAN (corresponding to the sketcher in our approach), together with the corresponding input multimodal MRI and the true [11C]PIB PET DVR parametric map. The 2-layer DNN failed to find the non-linear mapping between the multimodal MRI and the myelin content in PET. In particular, some anatomical or structural traces that are not present in the ground truth can still be found in the 2-layer-DNN predicted PET. This highlights that the relationship between myelin content and multimodal MRI data is complex, and two layers alone are not powerful enough to encode-decode it.

Materials and Methods.
We studied 34 patients with early-onset Alzheimer's disease (EOAD), 49 with late-onset AD (LOAD), 39 with frontotemporal dementia (FTD) and 24 with depression from the pre-existing cohort CLIN-AD. Support vector machine (SVM) automatic classifiers using 3D T1 MRI were trained to distinguish: LOAD vs depression, FTD vs LOAD, EOAD vs depression, and EOAD vs FTD. We extracted SVM weight maps, which are three-dimensional representations of the discriminant atrophy patterns used by the classifier to make its decisions, and we printed posters of these maps. Four radiologists (2 senior neuroradiologists and 2 unspecialized junior radiologists) performed a visual classification of the 4 diagnostic pairs using 3D T1 MRI. Classifications were performed twice: first with standard radiological reading, and then using the SVM weight maps as a guide.

1. Introduction
Deep learning's success involves supervised learning, where massive amounts of labeled data are used to learn useful representations from raw data. As labeled data are not always accessible, unsupervised learning algorithms have been proposed to learn useful data representations that are easily transferable to downstream tasks. A desirable property of these algorithms is to perform dimensionality reduction while keeping the most important attributes of the data. For instance, methods based on deep neural networks have been proposed using autoencoder approaches [13, 16, 17] or generative models [1, 6, 10, 18, 20, 25]. Nevertheless, learning high-dimensional data can be challenging: autoencoders have difficulty dealing with multimodal data distributions, and generative models rely on computationally demanding models [9, 15, 24] which are particularly complicated to train.
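A minimal sketch of the dimensionality-reduction property mentioned above: for squared loss, the optimal linear autoencoder coincides with PCA, so it can be computed in closed form with an SVD. The data below are synthetic rank-3 observations, chosen so the compression is lossless:

```python
import numpy as np

# Linear autoencoder via SVD: encoder W (20 -> 3) and decoder W.T (3 -> 20)
# spanning the top principal subspace. Dimensions are hypothetical.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 3))                # 3 latent factors
X = Z @ rng.standard_normal((3, 20))             # 200 samples in 20 dimensions
X = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:3]                                       # encoder weights
codes = X @ W.T                                  # 3-dim compressed representation
X_hat = codes @ W                                # reconstruction from the code

# The 3-dim code retains essentially all the variance of this rank-3 data.
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(err < 1e-8)  # -> True
```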

Random Matrix Theory Proves that Deep Learning Representations of GAN-Data Behave as Gaussian Mixtures
Concentrated random vectors thus provide an appropriate statistical model of realistic data.
Targeting classification applications by assuming a mixture of concentrated random vectors model, this article studies the spectral behavior of Gram matrices G in the large n, p regime. Precisely, we show that these matrices have asymptotically (as n, p → ∞ with p/n → c < ∞) the same first-order behavior as for a Gaussian Mixture Model (GMM). As a result, by generating images using the BigGAN model (Brock et al., 2018) and considering different commonly used deep representation models, we show that the spectral behavior of the Gram matrix computed on these representations is the same as on a GMM with the same p-dimensional means and covariances. A surprising consequence is that, for GAN data, the aforementioned sufficient statistics to characterize the quality of a given representation network are only the first- and second-order statistics of the representations. Simulations show that this behavior extends beyond random GAN data to real images from the ImageNet dataset (Deng et al., 2009). The rest of the paper is organized as follows. In Section 2, we introduce the notion of concentrated vectors and their main properties. Our main theoretical results are provided in Section 3. In Section 4 we present experimental results. Section 5 concludes the article.
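The first-order equivalence can be illustrated numerically at a small scale. The Lipschitz map (tanh of a Gaussian) and the dimensions below are hypothetical stand-ins for deep GAN representations:

```python
import numpy as np

# Gram spectrum of a "concentrated vector" sample (Lipschitz image of a
# Gaussian) vs. that of Gaussian data with the same empirical mean/covariance.
rng = np.random.default_rng(0)
n, p = 400, 200
A = rng.standard_normal((p, p)) / np.sqrt(p)
X = np.tanh(rng.standard_normal((n, p)) @ A)      # non-Gaussian, concentrated

mu, C = X.mean(axis=0), np.cov(X, rowvar=False)
Y = rng.multivariate_normal(mu, C, size=n)        # Gaussian, matched moments

spectrum = lambda M: np.sort(np.linalg.eigvalsh(M @ M.T / p))
sx, sy = spectrum(X), spectrum(Y)
rel = np.linalg.norm(sx - sy) / np.linalg.norm(sx)

# At finite n, p the sorted spectra should already be close.
print(rel < 0.5)  # -> True
```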

We briefly review previous work we believe to be most relevant to the presented method. [1] proposed a popular model that has been frequently used in speech processing: a PCA-guided articulatory model is built to control tongue shapes in 2D. [2] developed active contours that use a shape model defined by a PCA; the curve evolves locally based on image gradients and curvature, and globally towards the MAP estimate of the position and shape of the object. [3] adopted an implicit representation of the segmenting curve and calculated pose parameters to minimize a region-based energy functional. In this paper, we introduce a robust variational framework for segmentation. Following the work in [4] and [5], we construct a total energy including both global and local image statistics. Shape priors are incorporated into segmentation via a PCA model. We describe this framework in Section 2. The implementation details are discussed in Section 3. In Section 4, we present results obtained using the proposed framework and make comparisons with other approaches. We conclude in Section 5.
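A minimal sketch of a PCA shape prior of the kind referred to above, on hypothetical elliptical landmark shapes: training shapes define a mean and principal modes of variation, and a noisy candidate shape is regularized by projecting it onto the first modes:

```python
import numpy as np

# PCA shape model on 32-landmark shapes (flattened to 64-d vectors).
# Training shapes are hypothetical ellipses with random radii.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 32, endpoint=False)

def ellipse(a, b):
    return np.concatenate([a * np.cos(t), b * np.sin(t)])

train = np.stack([ellipse(1 + 0.3 * rng.standard_normal(),
                          1 + 0.3 * rng.standard_normal()) for _ in range(50)])
mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
modes = Vt[:2]                           # first two modes (radius variations)

noisy = ellipse(1.2, 0.8) + 0.2 * rng.standard_normal(64)
coeffs = modes @ (noisy - mean)
regularized = mean + modes.T @ coeffs    # shape constrained to the PCA model

# The projection removes most of the off-model noise.
clean = ellipse(1.2, 0.8)
print(np.linalg.norm(regularized - clean) < np.linalg.norm(noisy - clean))  # -> True
```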


$(b^h_i)_j$, for $j = 1, 2, \dots, m$; $h = 1, 2, \dots, k$; $i = 1, 2, \dots, L_h$.
When evaluating a certain quantity or measure with a regular (crisp) interval, there are two extreme cases, both of which we should try to avoid. It is possible to make a pessimistic evaluation, but then the interval will be wide. It is also possible to make an optimistic evaluation, but then there is a risk that the output measure falls outside the limits of the resulting narrow interval, so that the reliability of the obtained results will be doubtful. Fuzzy intervals do not have these problems: they allow having simultaneously both pessimistic and optimistic representations of the studied measure [22]. This is why we introduce the thresholds d1
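A trapezoidal membership function is one common way to combine the two readings, with a wide pessimistic support and a narrow optimistic core. This parameterization is an assumption for illustration, not necessarily the one used in [22]:

```python
# Trapezoidal fuzzy interval: pessimistic support [a, d], optimistic core
# [b, c], with membership rising linearly between them. Thresholds are
# hypothetical.
def trapezoid(x, a, b, c, d):
    """Membership degree of x in the fuzzy interval (a <= b <= c <= d)."""
    if x < a or x > d:
        return 0.0          # outside the pessimistic interval
    if b <= x <= c:
        return 1.0          # inside the optimistic interval
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Full membership inside the core; partial membership on the slopes.
print(trapezoid(2.5, 1, 2, 4, 5), trapezoid(4.5, 1, 2, 4, 5))  # -> 1.0 0.5
```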

Although hand-crafted features and metric learning achieve promising performance for uncontrolled face recognition, it remains cumbersome to improve the design of hand-crafted local features (such as SIFT [25]) and their aggregation mechanisms (such as Fisher vectors [38]). This is because the experimental evaluation results of the features cannot be automatically fed back to improve robustness to nuisance factors such as pose, illumination and expression. The major advantage of CNNs is that all processing layers, starting from the raw pixel-level input, have configurable parameters that can be learned from data. This obviates the need for manual feature design and replaces it with supervised data-driven feature learning. Learning the large number of parameters in CNN models (millions of parameters are the rule rather than the exception) requires very large training datasets. For example, the CNNs which achieve state-of-the-art performance on the LFW benchmark are trained using datasets with millions of labeled faces: Facebook's DeepFace [54] and Google's FaceNet [40] were trained using 4 million and 200 million training samples, respectively.

The relationships between structural equation modeling and conditional correlation have been the topic of much research and involve graph-theoretic concepts like morality and d-separation (Whittaker, 1990; Lauritzen, 1996; Pearl, 2001). Theoretical considerations led us to hypothesize that partial and, more generally, conditional correlation coefficients could extract the (undirected) structure of effective connectivity from the data (Marrelec et al., 2005a). The analysis developed in this paper strongly supports this assumption. Indeed, while we demonstrated that a lack of partial correlation between two regions can potentially be related to a lack of underlying anatomical connection, the example used suggests that a strong and significant partial correlation can be interpreted as the presence of an effective connection. Whether this behavior is a general property of fMRI data or only incidental remains to be investigated. Nonetheless, we believe that partial correlation will prove essential to effective connectivity investigation, for it can compensate for some of the most important drawbacks from which SEM analysis suffers (i.e., the difficulty of providing a structural model a priori and the lack of control over the SEM algorithms and results).
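The argument can be illustrated on simulated data (not fMRI): for a chain x → y → z, the marginal correlation between x and z is strong, but the partial correlation, read off the precision matrix, nearly vanishes, reflecting the absence of a direct connection:

```python
import numpy as np

# Chain x -> y -> z: z depends on x only through y.
rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal(n)
y = x + 0.5 * rng.standard_normal(n)
z = y + 0.5 * rng.standard_normal(n)

X = np.stack([x, y, z])
P = np.linalg.inv(np.cov(X))              # precision (inverse covariance) matrix
d = np.sqrt(np.diag(P))
partial = -P / np.outer(d, d)             # partial correlation matrix (off-diagonal)

marginal_xz = np.corrcoef(X)[0, 2]
# Strong marginal correlation, near-zero partial correlation between x and z.
print(abs(marginal_xz) > 0.7, abs(partial[0, 2]) < 0.1)  # -> True True
```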

Fig. 3: Screenshots of the different visualizations introduced in [2]
to highlight the proximity, in terms of chemical composition, of the current voxel with voxels previously inspected and whose content is known (see Figure 3). It is thus not a question of recognizing the materials present in the container, but of displaying visually close and known containers in order to deduce the contents. This approach has the advantage of not requiring learning or parameterization, since it relies on the manual selection of a neighborhood. In Figure 3, we can also see two classical representations that have been used in conjunction with this method. These are projections of the current voxel onto two triangles: a triangle called the "materials triangle" indicates the proximity of the voxel to metals, ceramics and organic materials, while a so-called "alert triangle" presents the ratios between carbon, nitrogen and oxygen. This last triangle makes it possible to distinguish between drugs and explosives. The main drawback of the visual analytics approach is that operators must be able to interpret the different representations themselves.

periodic blocked design. However, this interleaved partial k-space sampling cannot be exploited in aperiodic dynamic acquisition schemes such as resting-state fMRI (rs-fMRI) or fast event-related fMRI paradigms [22, 23]. In rs-fMRI, spontaneous brain activity is recorded without any experimental design in order to probe intrinsic functional connectivity [22, 24, 25]. In fast event-related designs, the presence of jittering combined with random delivery of stimuli introduces a trial-varying delay between the stimulus and acquisition time points [26]. This prevents the use of an interleaved k-space sampling strategy between successive scans, since there is no guarantee that the BOLD response is quasi-periodic. Because the vast majority of fMRI studies in neuroscience use either rs-fMRI or fast event-related designs [26, 27], the most reliable acquisition strategy in such contexts remains the "scan and repeat" approach, although it is suboptimal. To our knowledge, only one kt-contribution (kt-GRAPPA [19]) has claimed the ability to accurately reconstruct fMRI images in aperiodic paradigms.

Laurent Perrinet, Institut de Neurosciences de la Timone, UMR 7289 CNRS / Aix-Marseille Université, France. E-mail: Laurent.Perrinet@univ-amu.fr. http://invibe.net/LaurentPerrinet/Publications/Perr[r]

Keywords: neural motion planning, obstacle avoidance, representation learning
1 Introduction
Motion planning is a fundamental robotics problem [2, 3] with numerous applications in mobile robot navigation [4], industrial robotics [5], humanoid robotics [6] and other domains. Sampling-based methods such as Rapidly-exploring Random Trees (RRT) [7] and Probabilistic Roadmaps (PRM) [8] have proven successful for finding collision-free paths in complex environments with many obstacles. Such methods are able to solve the so-called piano mover problem [9] and typically assume static environments and prior knowledge about the shape and location of obstacles. In many practical applications, however, it is often difficult or even impossible to obtain detailed a priori knowledge about the real state of the environment. It is therefore desirable to design methods that rely on partial observations obtained from sensor measurements and enable motion planning in unknown and possibly dynamic environments. Moreover, given the high computational cost of exploration in sampling-based methods, it is also desirable to design more efficient methods that use prior experience to quickly find solutions for motion planning in new environments. To address the above challenges, several works [10, 11, 12, 13, 14, 15] adopt neural networks to learn motion planning from previous observations. Such Neural Motion Planning (NMP) methods either improve the exploration strategies of sampling-based approaches [13] or learn motion policies with imitation learning [12, 15] and reinforcement learning [14]. In this work we follow the NMP paradigm and propose a new learnable obstacle representation for motion planning.
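A minimal 2-D RRT sketch of the sampling-based strategy described above. The start, goal, step size, goal bias, and the single circular obstacle are hypothetical:

```python
import math
import random

# Grow a tree from the start by stepping toward random samples (with a small
# goal bias), rejecting steps that hit the obstacle, until the goal is reached.
random.seed(0)
start, goal, step = (0.1, 0.1), (0.9, 0.9), 0.05
obstacle = ((0.5, 0.5), 0.15)             # center, radius

def collides(p):
    (cx, cy), r = obstacle
    return math.hypot(p[0] - cx, p[1] - cy) < r

tree = {start: None}                      # node -> parent
path = None
for i in range(5000):
    sample = goal if i % 10 == 0 else (random.random(), random.random())
    near = min(tree, key=lambda q: math.dist(q, sample))
    d = math.dist(near, sample)
    if d == 0.0:
        continue
    new = (near[0] + step * (sample[0] - near[0]) / d,
           near[1] + step * (sample[1] - near[1]) / d)
    if collides(new):
        continue
    tree[new] = near
    if math.dist(new, goal) < step:       # goal reached: recover the path
        path = [new]
        while tree[path[-1]] is not None:
            path.append(tree[path[-1]])
        break

print(path is not None and path[-1] == start and math.dist(path[0], goal) < step)
```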
