• Aucun résultat trouvé

Multiple hot-deck imputation for network inference from RNA sequencing data

N/A
N/A
Protected

Academic year: 2021

Partager "Multiple hot-deck imputation for network inference from RNA sequencing data"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-02788524

https://hal.inrae.fr/hal-02788524

Submitted on 5 Jun 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Multiple hot-deck imputation for network inference from

RNA sequencing data

Alyssa Imbert, Armand Valsesia, Claudia Armenise, Gregory Lefebvre,

Pierre-Antoine Gourraud, Nathalie Viguerie, Nathalie Vialaneix

To cite this version:

Alyssa Imbert, Armand Valsesia, Claudia Armenise, Gregory Lefebvre, Pierre-Antoine Gourraud,

et al.. Multiple hot-deck imputation for network inference from RNA sequencing data. European

Conference on Computational Biology (ECCB 2018), Sep 2018, Athènes, Greece. �hal-02788524�

(2)

Description (150 words max.)

In recent years, high-throughput sequencing technology has become an es-sential tool for genomic studies. Furthermore, more and more studies include gene expression data measured by RNA-sequencing. Networks inferred from such data provide a global view of the relations existing between gene expres-sion in a given transcriptomic experiment. However, the number of samples is still very small compared to the number of genes and that makes network inference an unrealistic objective for most RNA-seq experiments.

In order to take advantage of external information measured simultaneously to RNA-seq data, we propose a novel imputation method, hot-deck multiple imputation (hd-MI), that artificially increases the sample size and thus improves the reliability of network inference. hd-MI is able to impute samples which are missing for RNA-seq data but are observed on a secondary dataset (here, we used quantitative PCR data) by assuming gene expression similarity between samples that are similar for the secondary dataset.

Justification (250 words max.)

hd-MI was compared to direct inference and to other state-of-the art im-putation methods on real world datasets. More stable and accurate networks were obtained, with an improvement in precision of edge prediction up to 30% of missing individuals.

Specifically focusing on a dataset acquired for the clinical-research project DiOGenes http://www.diogenes-eu.org/, we also found out that networks inferred by using hd-MI were coherent with previous findings based on other datasets and/or on a different subset of individuals. We were also able to dis-cover novel links between genes that was coherent with functional knowledge.

More interestingly, since DiOGenes is a study based on a dietary intervention that aimed at investigating the effect of a low-calorie diet on adipose tissue in obese individuals, data have been acquired on the same individuals at two different time steps (before and after the diet). hd-MI was used to impute individuals missing in one time step before performing independent inference of two networks. The results showed an improved comparability between the networks inferred for the two steps of the nutritional intervention.

hd-MI is implemented in the R package RNAseqNet, released on CRAN and forthcoming research includes developing an inference method that can jointly estimate evolving network corresponding to distinct steps of a longitudinal anal-ysis. Our purpose is to simultaneously handle missing individuals and explicitly model the relationships between expressions measured on the same individuals along the longitudinal study.

Références

Documents relatifs

This work features 4 important novelties: i) we propose a statistically sound nonparametric estimation of the inherent mean-variance relationship in RNA- seq data; ii) we present

Results are presented for the Gaussian graphical model on log-transformed data (blue), the log-linear Poisson graphical model on power-transformed data (red) and the

In addition to the intersection of independent per-study analyses (where genes were declared to be differentially expressed if identified in more than half of the studies with

observed with a size ≥ 5000 bp leading to scattered dots. The dotted line separates the red regression line corresponding to transcripts < 600 bp and the blue one to transcripts

Differential expression studies were until now restricted to gene level studies, i.e., ignoring the transcript level information, to cases where the set of expressed transcripts

We show experimentally, on simulated and real data, that FlipFlop is more accurate than simple pooling strategies and than other existing methods for isoform deconvolution from

3.3 Illustration of the interest on complete DiOGenes data Finally, two networks (one for CID1 and one for CID2) were inferred using all individuals from RNA-seq datasets for

In the specialized taxonomic journals (4113 studies), molecular data were widely used in mycology, but much less so in botany and zoology (Supplementary Appendix S7 available on