Statistical and computational approaches for network inference and comparison.
Application to regulation perturbation in cancer.
Etienne Birmelé
Analyse du génome tumoral, 2014
Motivations
Discrete approach
A continuous approach: Gaussian Graphical Models
Perturbation analysis
Take home messages
Cancer is a wound that does not heal
- Stat3 activity is associated with both wound healing and cancer in lungs (Dauer et al., Oncogene, 2005).
- 77% of genes differentially expressed in both renal regeneration and repair and renal cell carcinoma are concordantly regulated (Riss et al., Cancer Research, 2006).
- The expression pattern of genes involved in wound healing influences survival in breast cancer (Chang et al., PNAS, 2005).
Bladder cancer
Similar observations can be made on bladder tumor data (F. Radvanyi, personal communication).
- 179 samples of tumoral cells.
- 30 samples of normal human urothelium cells (J. Southgate, York).
- 7962 genes, among which 432 putative regulators.
Main question
Is it possible to determine the elements of the cell’s regulation processes that differ in tumorous cells?
Which computational and statistical methods can be used to deal with genome-wide data?
Why consider gene regulation networks
Only a very small component of the heritability of common complex diseases is explained by GWAS-identified allelic variants.
Similarly, somatic alterations account for only a small fraction of all cancer subtype cases. The majority of these phenotypes are either associated with extremely rare variants/alterations that are difficult to identify and validate, or show no statistical association with any genetic event.
Recent approaches suggest that the unexplained variance may be accounted for by the ability of master regulator genes, within cell regulatory networks, to integrate an entire spectrum of genetic and epigenetic variants where, in isolation, any one variant may not be statistically significant in a GWAS analysis.
in Lefebvre, Rieckhof and Califano, 2012.
Regulatory network zoology
Type of regulation
- Transcriptional regulation
- Alternative splicing regulation
- Post-transcriptional and post-translational regulation

Type of network
- Influence (or causal) maps
- Physical regulatory maps
- Kinetic models
Preliminary remarks and choices
- Regulation networks are context-specific.
- What does a node represent in the network? (easy)
- What does an edge represent in the network? (more tricky)
- Should we work in a discrete or continuous framework?
Motivations
Discrete approach
A continuous approach: Gaussian Graphical Models
Perturbation analysis
Take home messages
Discretization
The data is discretized, for example by using a Z-score:

    Z_ik = (X_ik − X̄_i•) / σ(X_i•)

and

    X_ik^discrete = +1 if Z_ik > c,  −1 if Z_ik < −c,  0 otherwise.
- We suppose that the putative regulators (TFs) are known, as opposed to their targets.
- Each gene is regulated by a GRN composed of a set of regulators and a truth table.

[Figure: example GRN in which regulators A and B jointly determine the discrete state of target C through a truth table over {−1, 0, 1}.]
Computational problems
1. Combinatorial explosion of the number of possible GRNs:
   - m regulators form 2^m sets in general, and Ω(m^k) sets of size at most k;
   - n regulated genes yield respectively 2^mn and Ω(m^kn) regulation graphs.

(m TFs, n targets)
Computational problems
1. Combinatorial explosion
2. How to compare two candidate regulation graphs?

A discrepancy is a GRN and a set of two experiments such that the inputs of the GRN are the same but the output is different.

[Figure: two samples with identical regulator inputs but different target outputs.]

Computing the minimum number of 'mistakes' to correct in order to solve all discrepancies for a given graph is NP-complete (Karlebach and Shamir, 2012).
Alternative strategy 1: LICORN (Elati et al., 2007)
Assumption
The space of possible GRNs is restricted: every GRN is a pair (A, I) of respectively co-activators and co-inhibitors.
Main steps of the Licorn algorithm

- Candidate co-regulator sets are sets of regulators which simultaneously have value +1 or −1 in a given fraction T_s (≈ 20%) of the samples.
- Co-activator and co-inhibitor sets for a gene g are those candidate sets which are over(under)-expressed in a sample with probability at least T_o (≥ 20%) when g is over(under)-expressed.

  Remark: the problem of discrepancies is solved at this step.

- These sets are ranked by their prediction score: for A ∈ A(g), I ∈ I(g) with A ∩ I = ∅,

      h_g(A, I) = Σ_{s∈S} |g_s − ĝ_s(A, I)|

- Generation of p-values for the GRNs by randomizing the samples.
Subgraph for the normal bladder data
[Network figure: inferred regulatory subgraph including GATA3, FOXO3, E2F5, BCL6, IRX2, SIX1, MEIS2, HOXA13, NR3C1 and other regulators and targets.]
Alternative strategy 2: Hidden variable model
Idea
The observed data follows a random variable depending on the real (but non-observed) discrete status.
If Z ∈ {−1, 0, 1} denotes the real status, X | Z = i ~ N(µ_i, σ_i).

[Figure: three Gaussian densities over the observed expression values.]

Hidden (true state) sample: 0, −1, 0, 0, 1, 1, 0
Observed sample: x1, x2, x3, x4, x5, x6, x7, x8

Problem
Genome-wide studies are not tractable for the moment (Karlebach and Shamir, 2012).
Motivations
Discrete approach
A continuous approach: Gaussian Graphical Models
- Conditional independence and partial correlation
- Graphical models
- The model
- Inference
Perturbation analysis
Take home messages
Canonical model settings
Notations
1. a set P = {1, …, p} of p variables: these are typically the genes (could be proteins, exons, …);
2. a sample N = {1, …, n} of individuals associated to the variables: these are typically the microarrays (could be sequence counts).

Basic statistical model
This can be viewed as
- a random vector X in R^p, whose j-th entry is the j-th variable,
- an n-size sample (X^1, …, X^n), such that X^i is the i-th microarray,
- assuming a Gaussian probability distribution for X.
Independence
Definition (Independence of events)
Two events A and B are independent if and only if P(A, B) = P(A)P(B), which is usually denoted by A ⊥ B. Equivalently,
- A ⊥ B ⇔ P(A|B) = P(A),
- A ⊥ B ⇔ P(A|B) = P(A|B^c).

Example (class vs party)

    Joint probability           |  Conditional probability P(party | class)
    class \ party  Labour Tory  |  class \ party  Labour Tory
    working        0.42   0.28  |  working        0.60   0.40
    bourgeoisie    0.06   0.24  |  bourgeoisie    0.20   0.80

Table: joint probability (left) vs. conditional probability (right)

Conditional independence
Definition (Conditional independence of events)
Two events A and B are conditionally independent given C if and only if P(A, B | C) = P(A|C)P(B|C), which is usually denoted by A ⊥ B | C.

Example (Do reading skills depend on weight?)
Consider the events A = "reading slowly" and B = "having low weight". Estimating P(A, B), P(A) and P(B) in a sample would lead to

    P(A, B) ≠ P(A)P(B).

But in fact, introducing C = "having a given age",

    P(A, B | C) = P(A|C)P(B|C).
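The reading/weight example can be simulated: below, age is the (hypothetical) common driver of both variables, so the marginal correlation is strong while the partial correlation given age vanishes. The partial-correlation formula used here is the standard one, which reappears later in these slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
age = rng.uniform(5, 12, n)               # C: age, the common cause
weight = age + rng.normal(0, 1.0, n)      # B depends on age only
reading = age + rng.normal(0, 1.0, n)     # A depends on age only

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

r = corr(reading, weight)                 # marginal: clearly non-zero
r_ra, r_wa = corr(reading, age), corr(weight, age)
# partial correlation of reading and weight given age
partial = (r - r_ra * r_wa) / np.sqrt((1 - r_ra**2) * (1 - r_wa**2))
print(round(r, 2), round(partial, 3))
```

The marginal correlation comes out around 0.8, while the partial correlation is indistinguishable from 0.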
The univariate Gaussian distribution
The Gaussian distribution is the natural model for the expression level of a gene (noisy data).
We note X ~ N(µ, σ²), so that E X = µ, Var X = σ²,

    f_X(x) = 1/(σ√(2π)) · exp(−(x − µ)² / (2σ²)),

and

    log f_X(x) = −log(σ√(2π)) − (x − µ)² / (2σ²).

Studying genes one by one does not allow studying their interactions.
One step forward: bivariate Gaussian distribution
Let X, Y be two real random variables.

Definitions

    Cov(X, Y) = E[(X − E X)(Y − E Y)] = E(XY) − E(X)E(Y).

    ρ_XY = cor(X, Y) = Cov(X, Y) / √(Var(X) · Var(Y)).

Proposition
- Cov(X, X) = Var(X),
- Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y),
- X ⊥ Y ⇒ Cov(X, Y) = 0,
- X ⊥ Y ⇔ Cov(X, Y) = 0 when (X, Y) is jointly Gaussian.
The bivariate Gaussian distribution
    f_XY(x, y) = 1/(2π √(det Σ)) · exp{ −½ (x − µ₁, y − µ₂) Σ⁻¹ (x − µ₁, y − µ₂)ᵀ }

where Σ is the variance/covariance matrix, which is symmetric and positive definite:

    Σ = [ Var(X)     Cov(X, Y) ]
        [ Cov(Y, X)  Var(Y)    ]

If standardized,

    Σ = [ 1     ρ_XY ]
        [ ρ_XY  1    ]

and

    f_XY(x, y) = 1/(2π √(1 − ρ²_XY)) · exp{ −(x² + y² − 2ρ_XY xy) / (2(1 − ρ²_XY)) },

where ρ_XY is the correlation between X and Y and describes the interaction between them.
The bivariate Gaussian distribution

The Covariance Matrix
Let X ~ N(0, Σ) with unit variances. For ρ_XY = 0,

    Σ = [ 1  0 ]
        [ 0  1 ]

while for ρ_XY = 0.9,

    Σ = [ 1    0.9 ]
        [ 0.9  1   ]

The shape of the 2-D distribution evolves accordingly.
Full generalization: multivariate Gaussian vector
Let X, Y, Z be real random variables.

Definitions

    Cov(X, Y | Z) = Cov(X, Y) − Cov(X, Z) Cov(Y, Z) / Var(Z).

    ρ_XY|Z = (ρ_XY − ρ_XZ ρ_YZ) / ( √(1 − ρ²_XZ) · √(1 − ρ²_YZ) ).

These give the interaction between X and Y once the effect of Z is removed.

Proposition
When X, Y, Z are jointly Gaussian,

    Cov(X, Y | Z) = 0 ⇔ cor(X, Y | Z) = 0 ⇔ X ⊥ Y | Z.
Conditional Independence Graphs

Definition
The conditional independence graph of a set of random variables X_1, …, X_p is the undirected graph G = {P, E} with node set P = {1, …, p} and where

    (i, j) ∉ E ⇔ X_i ⊥ X_j | X_{P\{i,j}}.

Property
It satisfies the Markov property: any two subsets of variables separated by a third one are independent conditionally on the variables in the third set.
Conditional Independence Graphs: an example
Graphical representation

[Figure: an example conditional independence graph on four nodes 1, 2, 3, 4.]

- X_1 and X_4 are conditionally independent given X_2.
- X_1 and X_4 are not conditionally independent given X_3.
Scheme for steady-state data
≈ 10s of microarrays, ≈ 1000s of probes ("genes")
Inference
Which interactions?
Modeling the underlying distribution
Model for data generation
- A microarray can be represented as a multivariate vector X = (X_1, …, X_p) ∈ R^p.
- Consider n biological replicates in the same condition, which form a usual n-size sample (X^1, …, X^n).

Gaussian Graphical Model
- X ~ N(µ, Σ), with X^1, …, X^n i.i.d. copies of X.
- Θ = (θ_ij)_{i,j∈P} = Σ⁻¹ is called the concentration matrix.
- −θ_ij / √(θ_ii θ_jj) = cor(X_i, X_j | X_{P\{i,j}}) = ρ_ij|P\{i,j}.
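The relation between the concentration matrix and the partial correlations can be illustrated on a hypothetical 3-gene covariance in which genes 1 and 3 interact only through gene 2 (σ₁₃ = σ₁₂ σ₂₃, so ρ₁₃|₂ vanishes):

```python
import numpy as np

# Hypothetical 3-gene covariance: sigma_13 = sigma_12 * sigma_23
Sigma = np.array([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])
Theta = np.linalg.inv(Sigma)            # concentration matrix

# rho_ij|rest = -theta_ij / sqrt(theta_ii * theta_jj)
d = np.sqrt(np.diag(Theta))
partial = -Theta / np.outer(d, d)
print(abs(partial[0, 2]) < 1e-12)       # no edge 1-3 in the graph
```

The zero in Θ (and hence the missing edge) is invisible in Σ itself, where σ₁₃ = 0.25 ≠ 0: this is exactly why the graph is read off the concentration matrix and not the covariance.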
Modeling the underlying distribution
Graphical Interpretation
The matrix Θ = (θ_ij)_{i,j∈P} encodes the network G we are looking for:

    conditional dependency between X_i and X_j
    ⇔ non-null partial correlation between X_i and X_j
    ⇔ θ_ij ≠ 0
    ⇔ edge i — j in G.

Remark
G is the conditional independence graph:
- It is undirected for steady-state data (only time-course data or biological knowledge allow retrieving the directions).
- If X_i and X_j are conditionally independent given a variable Z which is not present in the data, they will appear dependent: the links in G may not correspond to biochemical interactions.
The Maximum likelihood estimator
Let X be a random vector with distribution defined by f_X(x; Θ), where Θ are the model parameters.

Maximum likelihood estimator

    Θ̂ = argmax_Θ L(Θ; X),

where L is the log-likelihood, a function of the parameters:

    L(Θ; X) = log Π_{k=1}^n f_X(x_k; Θ),

where x_k is the k-th row of X.

Remarks
- This is a convex optimization problem.
- We just need to detect the non-zero coefficients in Θ.

The likelihood for steady-state model
ΘThe likelihood for steady-state model
Let
S= n
−1X|Xbe the empirical variance-covariance matrix:
Sis a sufficient statistic of
Θ.The log-likelihood
Liid(Θ;
S) =n
2 log det(Θ)
−n
2
Trace(SΘ) +n
2 log(2π).
The MLE =
S−1of
Θis not defined for n < p and never sparse.
The need for regularization is huge.
The penalized likelihood approach
Let Θ be the parameters to infer (the edges).

A penalized likelihood approach

    Θ̂_λ = argmax_Θ L(Θ; X) − λ · pen_ℓ1(Θ),

- L is the model log-likelihood,
- pen_ℓ1 is a penalty function tuned by λ > 0.

It performs
1. regularization (needed when n ≪ p),
2. selection (sparsity induced by the ℓ1-norm).
A geometric intuition of penalisation
The ℓ1-norm of a vector is the sum of the absolute values of its coordinates.
Among vectors with a given ℓ2-norm (Euclidean distance to 0), those with the smallest ℓ1-norm are those with all coordinates null but one.
Intuitively, the penalization therefore favors sparse values of Θ: θ_ij is chosen non-null only if the gain in likelihood is greater than the cost of the corresponding penalty.
Solving the penalized problem
- Tibshirani (1996) showed that solving the penalized likelihood problem is equivalent to the Ordinary Least Squares problem on an ℓ1-bounded area.

[Figure (caption translated from the French original): comparison of ℓ1- and ℓ2-regularized solutions. With an ℓ1 constraint, the elliptical contours of the quadratic loss typically hit the admissible region at a corner lying on an axis, so some components of β̂_ℓ1 are exactly zero; the circular ℓ2 region does not push coefficients to zero. For an orthogonal design X, the ridge estimator shrinks each least-squares coefficient proportionally, β̂_ℓ2,m = β_ls,m / (1 + λ), and can be zero only if β_ls,m is itself zero; the lasso applies "soft" thresholding, β̂_ℓ1,m = sign(β_ls,m)(|β_ls,m| − λ)₊ with [u]₊ = max(0, u), shrinking coefficients by λ and setting them to zero when |β_ls,m| ≤ λ. Breiman (1996) calls a procedure unstable if small perturbations of the training set yield very different estimators; Bousquet and Elisseeff (2002) formalized several such notions of stability.]
    minimize_{β∈R²}  ‖y − Xβ‖²₂   subject to   ‖β‖₁ = |β₁| + |β₂| ≤ c

is equivalent to

    minimize_{β∈R²}  ‖y − Xβ‖²₂ + λ ‖β‖₁.
- The LARS algorithm (Efron et al., 2004) solves the problem efficiently.
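The two orthogonal-design formulas quoted in the figure caption can be checked directly (the vector of least-squares coefficients below is arbitrary):

```python
import numpy as np

def ridge_coef(b_ls, lam):
    """Orthogonal design: ridge shrinks each OLS coefficient proportionally."""
    return b_ls / (1 + lam)

def lasso_coef(b_ls, lam):
    """Orthogonal design: lasso applies soft thresholding."""
    return np.sign(b_ls) * np.maximum(np.abs(b_ls) - lam, 0.0)

b_ls = np.array([3.0, -0.4, 0.8])
print(ridge_coef(b_ls, 1.0))   # every coefficient halved, none exactly zero
print(lasso_coef(b_ls, 1.0))   # small coefficients set exactly to zero
```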
Example: prostate cancer
[Figure: LASSO regularization path on the prostate cancer data — standardized coefficients plotted against |beta|/max|beta|; variables enter the model one at a time as the constraint is relaxed.]
Choice of the tuning parameter λ

Model selection criteria

    BIC(λ) = ‖y − Xβ̂_λ‖²₂ + df(β̂_λ) · (log n)/2
    AIC(λ) = ‖y − Xβ̂_λ‖²₂ + df(β̂_λ)

where df(β̂_λ) is the number of non-zero entries in β̂_λ; λ is chosen to minimize the criterion.

Cross-validation
1. split the data into K folds,
2. use successively each fold as the testing set,
3. compute the test error on that fold,
4. average to obtain the CV estimation of the test error.

λ is chosen to minimize the CV test error.
Many variations
Group-Lasso
Activate the variables by groups (given by the user).

Adaptive/Weighted-Lasso
Adjust the penalty level for each variable, according to prior knowledge or with data-driven weights.

BoLasso
Bootstrapped version that removes false positives and stabilizes the estimate.

etc.

+ many theoretical results.
Motivations
Discrete approach
A continuous approach: Gaussian Graphical Models
Perturbation analysis
Take home messages
Back to our initial problem
Let X and Y be two gene expression matrices corresponding to the same genes in different conditions.
What can we say about the differences in the regulation structure?
The naïve approach
Natural idea
It seems natural to infer a regulation network on normal data, a regulation network on tumoral data and then to compare.
Condition 1: a sample (X^(1)_1, …, X^(1)_{n₁}) with X^(1)_i ∈ R^p, and its inferred network.
Condition 2: a sample (X^(2)_1, …, X^(2)_{n₂}) with X^(2)_i ∈ R^p, and its inferred network.
Variability of the inference procedure
Problem
The variability of the inference procedure is greater than the biological perturbation we want to detect.
[Figure: histograms of edge loss probabilities across inference runs — left: 4 targets and 14 regulators; right: 20 targets and 59 regulators.]
The joint inference procedure
Another solution is to learn jointly the two regulation networks, by penalizing the choice of non-common edges.
Condition 1: (X^(1)_1, …, X^(1)_{n₁}), X^(1)_i ∈ R^p
Condition 2: (X^(2)_1, …, X^(2)_{n₂}), X^(2)_i ∈ R^p
→ one joint inference.
Advantages
- the sample size is larger;
- even if the network is noisy, the differences may be relevant.

Methods
- Chiquet et al., 2011
- Mohan et al., 2012
- Vialaneix and SanCristobal, 2013
The joint inference procedure: applications
- CXCL13 is a regulator which is not frequently mutated in brain cancer; however, its role is perturbed in tumorous cells (Mohan et al., 2012).
- Comparison of ER+ and ER− breast cancer (Jeanmougin et al., 2011).

[Figure: ER+ vs ER− specific regulations around ESR1, including ERBB3, ERBB4, IGF1R, EGFR, BCL2, MAPT and CDK6, annotated by molecule type (kinase, ligand-dependent nuclear receptor, transmembrane receptor, other), cellular compartment (extracellular space, plasma membrane, cytoplasm, nucleus) and interaction type (activation, repression, binding).]
The perturbation model
- Learn a reference network on the union of all samples, integrating information of different kinds;
- In a given tumoral condition, list the genes not behaving as they should according to the reference;
- Introduce a perturbation model and learn its parameters to explain the previous differences.

(Regulators → Targets)
Motivations
Discrete approach
A continuous approach: Gaussian Graphical Models
Perturbation analysis
Take home messages
Take home messages
1. Large data, including NGS, allow studying (epi)genomic regulation at the genome scale
   → theoretical and computational developments in Systems Biology.
2. A statistical link may not represent a biological link.
3. The dimension problem (n ≪ p) is crucial.
   - It implies modeling choices.
   - It implies instability: statistical/computational discoveries therefore have to be validated from a biological point of view.
   - Enlarge your samples as much as possible!
Thanks for your attention!
Thanks also to Julien Chiquet for his slides on GGMs and to François Radvanyi's group for their data.
Bibliography
D. Dauer, Oncogene, 2005.
Stat3 regulates genes common to both wound healing and cancer.
J. Riss et al., Cancer Research, 2006.
Cancers as Wounds that Do Not Heal: Differences and Similarities between Renal Regeneration/Repair and Renal Cell Carcinoma
HY. Chang et al. , PNAS, 2005.
Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival.
C. Lefebvre et al., WIRE Syst Biol Med, 2012.
Reverse-engineering human regulatory networks.
M. Elati et al., Bioinformatics, 2007.
LICORN: learning cooperative regulation networks from gene expression data.
G. Karlebach and R. Shamir, J. Comp. Bio., 2012.
Constructing logical models of gene regulatory networks by integrating transcription factor-DNA interactions with expression data: an entropy based approach.
R. Tibshirani, JRSS B, 1996.
Regression Shrinkage and Selection via the Lasso.
B. Efron, Ann. Stat., 2004.
Least Angle Regression.
J. Chiquet et al., Statistics and Computing, 2011.
Inferring multiple graphical structures.
K. Mohan and al., NIPS, 2012.
Structured sparse learning of multiple Gaussian graphical models.
N. Villa-Vialaneix and M. SanCristobal, SFdS, 2013.
Consensus LASSO: inférence conjointe de réseaux de gènes dans des conditions expérimentales multiples.
M. Jeanmougin et al., personal communication.
Network Inference in Breast Cancer with Gaussian Graphical Models and Extensions