• Aucun résultat trouvé

Reconstruction quality of a biological network when its constituting elements are partially observed

N/A
N/A
Protected

Academic year: 2022

Partager "Reconstruction quality of a biological network when its constituting elements are partially observed"

Copied!
1
0
0

Texte intégral

(1)

Reconstruction quality of a biological network when its constituting elements are partially observed

Victor Picheny, Matthieu Vignes, Nathalie Villa-Vialaneix INRA, UR875 MIA-T, France & Massey University, IFS, NZ -

firstname.lastname@toulouse.inra.fr

Application framework: impact of selecting genes on network inference Ideal network inference

−→ −→

true biological network observations transcriptomic data estimation predicted network

(unknown) (from full data)

Inference with missing nodes

−→ −→

projected biological network observations transcriptomic data estimation ??? predicted network ???

with gene selection (incomplete) (from partial data)

red edges:

false positive

blue edges:

false negative

Research questions: Evaluate the impact of gene sampling on the estimation of network Theoretical framework: Gaussian Graphical Model framework

Gene expression: X ∼ N (0, Σ), sample size: n, number of genes: p and X = (XO, XH), with XH not observed.

non-zero entries of S = Σ−1 ⇔ edges of full graph

Questions:

Influence of:

• ratio of missing variables r

• missing node context: ran- dom or peculiar nodes (e.g.

big/small degree or large/small betweenness)

How to estimate the errors?

• compare to a graph whose links reflect path existence in the “true” graph (induced )

• compare to a graph inherited from edges of the “true” graph only (projected )

• compare to a graph learnt from complete data

Method 1 (naive approach)

graphical Lasso [1] on observed data

Σ−1OO = SOO

| {z }

to be estimated

− SOH(SHH )−1SHO

| {z }

biais

Method 2: CPW-S+L [2]

Question of identifiability of the 2 components of Σ−1OO:

sparse SOO and

low-rank SOO(SHH)−1SHO

→ via an algebraic study of sparse and low-rank matrix varieties.

More specifically: transversality of tangent spaces T(SOO) and T?(SOH(SHH)−1SHO) statistical identifiability.

Assumptions:

• sparsity = few non-zeros per column/row no dense subgraph.

• SOH(SHH)−1SHO has row/column spaces not too aligned with coordi- nate axes marginalisation effect over XH’s is “spread out” over many XO’s.

penalised likelihood method leads to consistent estimate via: (SOO, SOH\(SHH)−1SHO) = argmin

(S,L),S−L0,L0

−l(S −L, ΣOO)+λ [γkSkl1 + tr(L)]

with l(S, Σ) = log det(S) − tr(SΣ) + c, the GGM log-likelihood.

Experimental setup

Tests on simulated data sets:

• data simulated according to a GGM with p = 100 genes

• sample size: n = 100 and 1, 000

• ratio of missing variables: r = 0 (full graph), 5%, 10%, 20%

and 30%

• missing node “context”: with large/low degree, high/low or at random.

(10 replicate networks)

Selected results: precision vs. recall curves

(precision = T PT P+F P , recall = T PT P+F N )

References

[1] J. Friedman, T. Hastie, and R. Tibshirani (2007) Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.

[2] V. Chandrasekaran, P.A. Parillo, and A.S. Willsky (2012). Latent variable graphical model selection via convex optimization. Annals of Statistics 40:1935–1967.

Références

Documents relatifs

We then describe a sparse estimator based on the Dantzig selector and upper bound its non-asymptotic error, showing that it achieves the optimal convergence rate for most of

Keywords: Asymptotic normality, Consistency, Missing data, Nonparametric maximum likeli- hood, Right-censored failure time data, Stratified proportional intensity model, Variance

This paper considers the problem of estimation in the stratified proportional intensity regression model for survival data, when the stratum information is missing for some

espèce grignote les feuilles de tremble de façon très particulière (trou central dans les feuilles), ce qui permet de la repérer plus facilement dans l ' arbre

The thesis is probably correct that totalitarian regimes, which are intolerant of ideologies (ideas of a political, economic and cultural sort) at odds with their own, will

In the present paper we show that multiple stationary black hole configurations can- not be possible were the solution everywhere (in the domain of outer communications) locally

Left: Ratios of number of function evaluations performed in single runs to reach a precision 10 −10 close to the global optimal f value of the proposed approach over CMA-ES depending

With an emphasis on partial-observed data, our main contributions are as follows: (i) a versatile end-to-end framework for solving inverse problems by jointly learning the