
Multiway SIR for biological data integration




HAL Id: hal-02791870

https://hal.inrae.fr/hal-02791870

Submitted on 5 Jun 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Valérie Sautron, Marie Chavent, Nathalie Viguerie, Nathalie Villa-Vialaneix

To cite this version:

Valérie Sautron, Marie Chavent, Nathalie Viguerie, Nathalie Villa-Vialaneix. Multiway SIR for biological data integration. Statistical Methods for Post Genomic Data, Feb 2016, Lille, France. 2016.

⟨hal-02791870⟩


Valérie Sautron¹, Marie Chavent², Nathalie Viguerie³, Nathalie Villa-Vialaneix⁴

1 INRA, GenPhySE (Génétique Physiologie et Systèmes d’Elevage), F-31326 Castanet-Tolosan cedex

2 IMB - Institut de Mathématiques de Bordeaux

3 Inserm - UMR1048, Obesity Research Laboratory, Institute of Metabolic and Cardiovascular Diseases (I2MD), Toulouse

4 INRA UR0875 MIAT Mathématiques et Informatique Appliquées Toulouse, 24 chemin de Borde-Rouge - Auzeville CS 52627, 31326 Castanet-Tolosan cedex

1 Abstract

High-throughput sequencing techniques are increasingly used and allow large quantities of data to be measured at different levels of the living organism under study. As such techniques become more affordable, experimental designs grow more complex, and a biological process can now be observed dynamically. In this case, repeated measurements of one or several omics are acquired in order to monitor the evolution of the process. To integrate such data and obtain a global overview of the time-course evolution, the biologist faces two difficulties:

i) the longitudinal nature of the data has to be taken into account; ii) data coming from different sources have to be related.

In the present paper, we address the issue of predicting a given numerical target variable from time-course multivariate data. The proposed method is an extension of the STATIS approach [L’Hermier des Plantes, 1976] to a framework which is similar to SIR (Sliced Inverse Regression [Li, 1991]).
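For readers unfamiliar with SIR, the classical single-table procedure of [Li, 1991] can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `sir_directions`, the number of slices and the equal-size slicing rule are assumptions made for the example.

```python
import numpy as np

def sir_directions(X, y, n_slices=5, n_dirs=1):
    """Classical SIR sketch: directions estimated from slice means of X given y."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                  # center the predictors
    sigma = Xc.T @ Xc / n                    # covariance of X
    # Slice the sample into groups of nearly equal size along sorted y
    slices = np.array_split(np.argsort(y), n_slices)
    # Between-slice covariance: weighted covariance of the slice means
    gamma = np.zeros((p, p))
    for idx in slices:
        m = Xc[idx].mean(axis=0)
        gamma += (len(idx) / n) * np.outer(m, m)
    # Solve gamma b = lambda sigma b through the symmetric form
    # sigma^{-1/2} gamma sigma^{-1/2}
    vals, U = np.linalg.eigh(sigma)
    inv_sqrt = U @ np.diag(vals ** -0.5) @ U.T
    evals, evecs = np.linalg.eigh(inv_sqrt @ gamma @ inv_sqrt)
    order = np.argsort(evals)[::-1][:n_dirs]
    # Map the eigenvectors back to the original scale of X
    return inv_sqrt @ evecs[:, order], evals[order]
```

The returned columns span the estimated dimension-reduction space; they are defined up to sign and scale, so only the subspace is meaningful.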

Given a real variable of interest Y and a collection (X_t)_{t=1,...,T} of T tables of p variables measured on the same n individuals, the proposed method proceeds in two steps. First, we analyze the between-table similarity structure and derive a set of optimal weights to construct a compromise matrix, designed to capture the stable structure, over time, of the differences between the means of X_t conditional on Y. Second, we perform a generalized PCA on the compromise matrix, which estimates an EDR (effective dimension reduction) space for longitudinal data.
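The two steps can be sketched in code under explicit assumptions, since the abstract does not give the formulas. The sketch assumes that each table is summarized by its between-slice covariance matrix, that table similarity is an RV-like normalized trace inner product, that the weights come from the leading eigenvector of the similarity matrix (as in STATIS), and that the generalized PCA uses the inverse of the average covariance as its metric. The names `slice_cov` and `multiway_sir_sketch` are hypothetical.

```python
import numpy as np

def slice_cov(X, y, n_slices):
    """Between-slice covariance of the means of X within slices of y."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    gamma = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Xc[idx].mean(axis=0)
        gamma += (len(idx) / n) * np.outer(m, m)
    return gamma

def multiway_sir_sketch(tables, y, n_slices=5, n_dirs=1):
    """Assumed STATIS-like compromise of per-table SIR matrices, then generalized PCA."""
    gammas = [slice_cov(X, y, n_slices) for X in tables]
    # Step 1a (assumption): RV-like similarity between tables
    S = np.array([[np.sum(a * b) for b in gammas] for a in gammas])
    norms = np.sqrt(np.diag(S))
    rv = S / np.outer(norms, norms)
    # Step 1b (assumption): weights from the leading eigenvector, scaled to sum to 1
    _, vecs = np.linalg.eigh(rv)
    w = np.abs(vecs[:, -1])
    w /= w.sum()
    compromise = sum(wt * g for wt, g in zip(w, gammas))
    # Step 2 (assumption): generalized PCA with the inverse average covariance as metric
    sigma = sum(np.cov(X, rowvar=False, bias=True) for X in tables) / len(tables)
    vals, U = np.linalg.eigh(sigma)
    inv_sqrt = U @ np.diag(vals ** -0.5) @ U.T
    evals, evecs = np.linalg.eigh(inv_sqrt @ compromise @ inv_sqrt)
    order = np.argsort(evals)[::-1][:n_dirs]
    return inv_sqrt @ evecs[:, order], w
```

Here `tables` is a list of T arrays of shape (n, p) whose rows are aligned on the same individuals. Tables whose slice-mean structure agrees with the others receive larger weights, which is how the compromise favors the structure that is stable over time.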

We illustrate our approach on data collected in the DIOGenes project (www.diogenes-eu.org). These data contain thirteen clinical variables collected on 135 obese female patients at 3 stages of a specific diet. The goal of this exploratory analysis is to investigate the relationship between these clinical data and the patients' weight evolution rate between the beginning and the end of the study.


References

[L’Hermier des Plantes, 1976] L’Hermier des Plantes, H. (1976). Structuration des tableaux à trois indices de la statistique. PhD thesis, Université de Montpellier. Thèse de troisième cycle.

[Li, 1991] Li, K. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–342.
