• Aucun résultat trouvé

A method for statistical learning in large databases of heterogeneous imaging, cognitive and behavioral data.

N/A
N/A
Protected

Academic year: 2021

Partager "A method for statistical learning in large databases of heterogeneous imaging, cognitive and behavioral data."

Copied!
3
0
0

Texte intégral

(1)

HAL Id: hal-01827389

https://hal.inria.fr/hal-01827389

Submitted on 2 Jul 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

A method for statistical learning in large databases of

heterogeneous imaging, cognitive and behavioral data.

Luigi Antelmi, Marco Lorenzi, Valeria Manera, Philippe Robert, Nicholas

Ayache

To cite this version:

Luigi Antelmi, Marco Lorenzi, Valeria Manera, Philippe Robert, Nicholas Ayache. A method for statistical learning in large databases of heterogeneous imaging, cognitive and behavioral data.. EPICLIN 2018 - 12ème Conférence Francophone d’Epidémiologie Clinique / CLCC 2018 - 25èmes Journées des statisticiens des Centre de Lutte Contre le Cancer, May 2018, Nice, France. Else-vier, Revue d’Épidémiologie et de Santé Publique, 66 (3), pp.S180, 2018, 12e Conférence franco-phone d’Épidémiologie clinique 25e Journée des statisticiens des Centres de lutte contre le cancer. �10.1016/j.respe.2018.03.306�. �hal-01827389�

(2)

EPICLIN 2018

A method for statistical learning in large databases of heterogeneous imaging, cognitive and behavioural data.

Authors: Luigi Antelmi1,*, Marco Lorenzi1, Valeria Manera2,3, Philippe Robert3,4, Nicholas Ayache1

* corrisponding author: [email protected] Affiliations:

1 UCA, Inria Sophia Antipolis, Epione Research Project. 2 UCA, Inria Sophia Antipolis, Stars Research Project. 3 CoBTeK, University of Nice Sophia Antipolis. 4 Centre Memoire, CHU de Nice.

The authors declare no conflict of interest

Abstract

Introduction The aim of this study is to develop a generative and probabilistic statistical learning model

for the joint analysis of heterogeneous biomedical data. The model will be applied to the investigation of neurological disorders from collections of brain imaging, body sensors, biological and clinical data available in current large-scale health databases. The resulting methodological framework will be tested on the UK Biobank, as well as on pathology-specific clinical data, as provided by the ADNI, or INSIGHT initiatives.

Methods We propose a variational approximation of Bayesian Canonical Correlation Analysis (CCA).

The proposed formulation is inspired by current advanced in variational learning, and offers the potential to scale to high-dimensional observations, such as medical images and arrays of biological data.. We proved that the variational lower bound can be optimized through modern learning libraries such as Torch and TensorFlow.

Results We currently benchmarked the method with respect to classical CCA on both synthetic data and

on the classical benchmarking datasets in machine learning (IRIS dataset). With respect to the synthetic dataset (Fig. 1A), we observed a strong agreement between the score components computed with classical CCA and our method. Moreover, the classification results on IRIS showed that the two methods essentially provide the same latent representation (Fig. 1B).

Conclusion Our method shows promising results for the future application to medical data. The method

is computationally efficient and scalable, hence able to process complex multivariate multidimensional datasets. We expect to highlight meaningful relationship among biomarkers that could be used to develop optimal strategies for disease classification, quantification, and prediction. In the future the proposed approach will be tested in several experimental settings:

1) Classification/stratification;

2) Prediction and imputation from a set of observed data (e.g. predict biological and clinical output from medical imaging information).

(3)

(A)

(B)

Fig 1. (A) Comparison of score components computed with classical CCA and our method from the synthetic dataset. The perfect linear relationship between two methods applied to this datasets means that they both capture the same latent representation. (B) Classification of the IRIS dataset. Classical CCA on the left; our method on the right.

Figure

Fig 1. (A) Comparison of score components computed with classical CCA and our method from the   synthetic   dataset

Références

Documents relatifs

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

Sont inclus dans l’étude les patients atteints de la maladie de Behçet présentant des lésions cutanées avec des symptômes cliniques associés malgré leurs

Dimension bildet die Beziehung ab, in der das Individuum zu seiner Gesellschaft steht, also ob die Menschen enge Beziehungen mit anderen eingebunden sind

The contributions of the present manuscript to the analysis of multi- omics datasets are twofolds: firstly, we have proposed three unsupervised kernel learning approaches to

Data and Thread Placement in NUMA Architectures: A Statistical Learning Approach techniques for the applications, based either on hardware counters or on information obtained

Les organochlorés sont les plus anciens insecticides et aussi les plus puissants et toxiques. Ils furent utilisés de manière très importante à partir de la seconde

In this context, we first designed a new soft clustering algorithm for the mode seeking framework using such random walks and we provided a way to bound the 2-Wasserstein

Other than identifying and correcting several important consequences of semi-supervised Laplacian regularization algorithms, the results of Chapter 3 point out that even though it