• Aucun résultat trouvé

The R package FANet: sparse factor analysis model for high dimensional gene co-expression networks

N/A
N/A
Protected

Academic year: 2021

Partager "The R package FANet: sparse factor analysis model for high dimensional gene co-expression networks"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-01211038

https://hal.archives-ouvertes.fr/hal-01211038

Submitted on 3 Jun 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

The R package FANet: sparse factor analysis model for high dimensional gene co-expression networks

Anne Blum, Magalie Houee, Sandrine Lagarrigue, David Causeur

To cite this version:

Anne Blum, Magalie Houee, Sandrine Lagarrigue, David Causeur. The R package FANet: sparse

factor analysis model for high dimensional gene co-expression networks. The International R Users

Conference, Jun 2014, Los Angeles, United States. 2014, UserR contributed abstracts. �hal-01211038�

(2)

The R package FANet: sparse Factor Analysis model for high dimensional gene co-expression Networks

Yuna Blum 1

?

, Magalie Hou´ee-Bigot 2 Sandrine Lagarrigue 3 David Causeur 2

1. Department of Human Genetics, University of California, Los Angeles CA. USA 2. Agrocampus Ouest- Applied Mathematics Department, Rennes, FRANCE 3. Agrcoampus Ouest - UMR PEGASE INRA, Agrocampus Ouest, Rennes, FRANCE.

?

Contact author: [email protected]

Keywords: co-expression networks, factor analysis, high dimension, sparsity, R

Inference on gene regulatory networks from high-throughput expression data turns out to be one of the main current challenges in systems biology. Such interaction networks are very insightful for the deep understanding of biological relationships between genes. In particular, a functional characterization of gene modules of highly interacting genes enables the identification of biological processes underlying complex traits as diseases. Inference on this dependence structure shall account for both the high dimension of the data and the sparsity of the interaction network.

The R package FANet provides a powerful method for estimating high dimensional co-expression networks.

Extending the idea introduced for differential analysis by Blum et al. [1] and Friguet et al. [2] we suggest to take advantage of a low-dimensional latent linear structure of dependence to improve the stability of correlation estimations. We propose an EM algorithm to fit a sparse factor model for correlations and demonstrate how it helps extracting modules of genes and more generally improves the gene clustering performance. Two functions are available in FANet package in order to introduce sparsity in the network estimation. One function is based on a LASSO estimation using a cyclic coordinate descent algorithm. As an alternative, the second function is based on biological knowledge integration as Gene Ontology annotation.

Finally, FANet results can serve as an input for WGCNA (Langfelder and Horvath [3]) procedure for gene modules detection.

References

[1] Blum, Y., G. Le Mignon, S. Lagarrigue, and D. Causeur (2010). A factor model to analyze heterogeneity in gene expression. BMC bioinformatics 11(1), 368.

[2] Friguet, C., M. Kloareg, and D. Causeur (2009). A factor model approach to multiple testing under dependence. Journal of the American Statistical Association 104(488), 1406–1415.

[3] Langfelder, P. and S. Horvath (2008). Wgcna: an r package for weighted correlation network analysis.

BMC bioinformatics 9(1), 559.

Références

Documents relatifs

Three versions of the MCEM algorithm were proposed for comparison with the SEM algo- rithm, depending on the number M (or nsamp) of Gibbs iterations used to approximate the E

Inferring gene networks using a sparse factor model approach, Statistical Learning and Data Science.. Yuna Blum, Magali Houée, Chloé Friguet, Sandrine Lagarrigue,

HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data.. Laurent Bergé, Charles Bouveyron,

We develop the minimax lower bounds for the quadratic risk and construct an adaptive estimator which attains those lower bounds within a constant (if all time-dependent covariates

In order to investigate the effect of the sample size on clustering results in high- dimensional spaces, we try to cluster the data for different dataset sizes as this phenomenon

SparseEmfaBio: this function uses the biological constraines matrix to introduce sparsity in the factor model for correlation. biomat: constructed by the user or by using

Multiple factor analysis [2] deals with a multiple table, composed of sets of either quantitative or categorical variables balancing the influence of the different sets on

A second class of methods aims at aggregating interaction tests performed at the SNP level by the minimum p value ( minP , Emily 2016), a gene-based association test using