Combining tree-based and dynamical systems for the inference of gene regulatory networks


(1)

Combining tree-based and dynamical systems
for the inference of gene regulatory networks

Vân Anh Huynh-Thu and Guido Sanguinetti

“Network Inference: New Methods and New Data”

3rd September, 2016

(2)

Inferring regulatory networks is a challenging problem

(3)
(4)

There are two main families of methods

Score-based: compute statistical dependencies between pairs of expression profiles (e.g. linear correlation)

                         Target gene
                 gene 1   gene 2   · · ·   gene p
 Regulating
 gene 1            -       0.05    · · ·    0.56
 gene 2           0.19      -      · · ·    0.03
 · · ·           · · ·    · · ·    · · ·   · · ·
 gene p           0.11     0.42    · · ·     -

→ Fast, but cannot make predictions
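As an illustration of the score-based family, a minimal sketch (toy data, all values hypothetical) that fills such a correlation score matrix with NumPy:

```python
import numpy as np

# Toy expression data: rows = time points, columns = genes (values hypothetical)
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 4))  # 20 time points, 4 genes

# Score matrix: absolute Pearson correlation between every pair of profiles
scores = np.abs(np.corrcoef(expr, rowvar=False))
np.fill_diagonal(scores, 0.0)  # no self-edges, mirroring the "-" diagonal
```

Each entry scores[i, j] plays the role of one cell of the table above.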

Model-based: learn a model capturing the dynamics of the network (e.g. differential equations)

(5)

Hybrid approach: Jump3

- Model for gene expression

- Tree-based method for network reconstruction

(6)

Hybrid approach: Jump3

- Model for gene expression

- Tree-based method for network reconstruction

Results

(7)

Hybrid approach: Jump3

- Model for gene expression

- Tree-based method for network reconstruction

Results

(8)

We use the on/off model of gene expression

For each gene i :

    dx_i = (A_i μ_i(t) + b_i − λ_i x_i) dt + σ dw(t)

x_i(t): gene expression
μ_i(t): promoter activity state (0/1)
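The on/off model can be simulated with a simple Euler–Maruyama scheme; a sketch with illustrative parameter values (A, b, λ, σ here are placeholders, not values from the paper):

```python
import numpy as np

def simulate_onoff(mu, A=1.0, b=0.1, lam=0.5, sigma=0.05, dt=0.01, x0=0.0, seed=0):
    """Euler-Maruyama simulation of dx = (A*mu(t) + b - lam*x) dt + sigma dw(t).

    `mu` is an array of promoter states (0/1), one per time step.
    Parameter values are illustrative, not fitted."""
    rng = np.random.default_rng(seed)
    x = np.empty(len(mu) + 1)
    x[0] = x0
    for k, m in enumerate(mu):
        dw = rng.normal(scale=np.sqrt(dt))  # Brownian increment
        x[k + 1] = x[k] + (A * m + b - lam * x[k]) * dt + sigma * dw
    return x

# Promoter switches on halfway through the simulation
mu = np.r_[np.zeros(500), np.ones(500)]
x = simulate_onoff(mu)
```

With the promoter switching on halfway through, the expression relaxes from the "off" steady state b/λ toward the "on" steady state (A + b)/λ.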

(9)

We model the expression x_i as a Gaussian process

- x_i is completely described by its mean m_i and covariance K_i
- For every finite set of time points: x_i ∼ N(m_i, K_i)
- x_i is observed with i.i.d. Gaussian noise: x̂_i ∼ N(m_i, K_i + σ²_obs I)
- We can compute the likelihood:

    log p(x̂_i) = −(1/2) (x̂_i − m_i)^⊤ (K_i + σ²_obs I)^{−1} (x̂_i − m_i) + c_i
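The expression above is the standard multivariate Gaussian log-density; a minimal sketch of its evaluation, using a hypothetical covariance on a small time grid for illustration:

```python
import numpy as np

def gp_loglik(x_obs, m, K, sigma_obs):
    """Log-likelihood of noisy observations x_obs ~ N(m, K + sigma_obs^2 I),
    up to the constant term c_i of the slide."""
    S = K + sigma_obs**2 * np.eye(len(x_obs))
    r = x_obs - m
    return -0.5 * r @ np.linalg.solve(S, r)

# Illustrative covariance on a small time grid (values hypothetical)
t = np.linspace(0.0, 1.0, 5)
K = 0.1 * np.exp(-2.0 * np.abs(t[:, None] - t[None, :]))
m = np.zeros(5)
ll_close = gp_loglik(m + 0.01, m, K, sigma_obs=0.1)
ll_far = gp_loglik(m + 1.0, m, K, sigma_obs=0.1)
```

Observations close to the mean trajectory score a higher likelihood than distant ones, which is what drives the split selection described next.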

(10)

The likelihood depends on the promoter state μ

Model:

    dx_i = (A_i μ_i(t) + b_i − λ_i x_i) dt + σ dw(t)

Likelihood:

    log p(x̂_i) = −(1/2) (x̂_i − m_i)^⊤ (K_i + σ²_obs I)^{−1} (x̂_i − m_i) + c_i

Goals (for each gene i):

1. Find the trajectory μ_i that maximises the likelihood

(11)

Hybrid approach: Jump3

- Model for gene expression

- Tree-based method for network reconstruction

(12)

Tree-based methods have several advantages

Bagging, Random Forests, Extra-Trees, ...

- Can deal with interacting features
- Non-parametric
- Work well with high-dimensional datasets

(13)

Decision trees are used to predict promoter states

Each interior node tests the expression of a regulator.
Each leaf is a prediction of the promoter state of the target gene.
Promoter states are not observed.

(14)

Jump trees are learned through maximisation

of the likelihood

Start with μ_1(t) = 0, ∀t

(15)

Jump trees are learned through maximisation of the likelihood

Candidate split on the expression of a candidate regulator (here gene 2, with threshold c):

    μ_1(t) = 0,  if x̂_2(t) < c
    μ_1(t) = 1,  if x̂_2(t) ≥ c

Each candidate split defines a trajectory μ_1(t), whose likelihood is evaluated (e.g. L = −2.35).

(22)

Jump trees are learned through maximisation of the likelihood

[Tree figure: root split x̂_4 < 0.8; leaves predict μ_1 = 0 and μ_1 = 1 over the time-point subsets T_A and T_B]

Select the split with the highest likelihood.

(23)

Jump trees are learned through maximisation of the likelihood

[Tree figure: one child node is further split on x̂_2 < 0.3; leaves predict μ_1 = 0 or μ_1 = 1 over the subsets T_A, T_C, T_D]

Repeat the procedure for each child node.

(24)

Jump trees are learned through maximisation of the likelihood

[Tree figure: a further split on x̂_5 < 0.4 is added; splits x̂_4 < 0.8, x̂_5 < 0.4, x̂_2 < 0.3, with leaves predicting μ_1 over the subsets T_C, T_D, T_E, T_F]

Repeat the procedure for each child node.

(25)

Jump trees are learned through maximisation of the likelihood

[Tree figure: final tree with splits x̂_4 < 0.8, x̂_5 < 0.4, x̂_2 < 0.3 and leaf predictions μ_1 ∈ {0, 1}]

Stop when the likelihood cannot be increased.
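The greedy split search of the preceding slides can be sketched as follows, with a stand-in scoring function in place of the GP likelihood (all names and values hypothetical):

```python
import numpy as np

def best_split(expr, score):
    """Greedy search over (regulator, threshold) candidate splits.

    expr: (n_times, n_genes) regulator expressions.
    score: callable mapping a binary trajectory mu (n_times,) to a score
           (standing in for the GP likelihood of the slides).
    Returns the best (gene, threshold, mu, value)."""
    best = (None, None, None, -np.inf)
    for g in range(expr.shape[1]):
        for c in np.unique(expr[:, g]):
            mu = (expr[:, g] >= c).astype(int)  # mu(t) = 1 iff x_g(t) >= c
            s = score(mu)
            if s > best[3]:
                best = (g, c, mu, s)
    return best

# Toy target: the promoter "truly" follows gene 1 with threshold 0.5
rng = np.random.default_rng(1)
expr = rng.uniform(size=(50, 3))
true_mu = (expr[:, 1] >= 0.5).astype(int)
score = lambda mu: -np.sum((mu - true_mu) ** 2)  # stand-in for the likelihood
g, c, mu, s = best_split(expr, score)
```

On this toy problem the search recovers gene 1 as the splitting regulator, because a threshold reproducing the true trajectory exists among the candidate cut points.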

(26)

An ensemble of randomised trees is constructed

Randomise x_i and c:

    μ_1(t) = 0,  if x̂_i(t) < c
    μ_1(t) = 1,  if x̂_i(t) ≥ c

Extra-Trees (Geurts et al., Machine Learning, 2006):

- At each node, the best split is chosen among K random splits.
- The prediction of μ(t) is averaged over the trees.
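The Extra-Trees randomisation (best of K random splits at each node) can be sketched as follows; this is a simplification with a stand-in score, not the authors' implementation:

```python
import numpy as np

def extra_trees_split(expr, score, K=5, rng=None):
    """Extra-Trees-style node split: draw K random (regulator, threshold)
    candidates and keep the highest-scoring one (sketch only)."""
    if rng is None:
        rng = np.random.default_rng()
    best = (None, None, -np.inf)
    for _ in range(K):
        g = rng.integers(expr.shape[1])                       # random candidate regulator
        c = rng.uniform(expr[:, g].min(), expr[:, g].max())   # random threshold
        s = score((expr[:, g] >= c).astype(int))              # score the implied mu(t)
        if s > best[2]:
            best = (g, c, s)
    return best

# Toy usage with a stand-in score (the real method scores the GP likelihood)
expr = np.random.default_rng(2).uniform(size=(30, 4))
score = lambda mu: float(mu.sum())
g, c, s = extra_trees_split(expr, score, K=10, rng=np.random.default_rng(3))
```

An ensemble is then grown by repeating this randomised split at every node of every tree, and the predictions of μ(t) are averaged across trees.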

(27)

The tree-based model is informative

The learned model can be used to find the most relevant inputs.

[Bar chart: importance (y-axis, 0–0.25) of each input variable (x-axis, 1–20)]

(28)

The variable importance is based on likelihood increase

At each tree node N:

    I(N) = L_after − L_before

L_after: likelihood after the split
L_before: likelihood before the split

Importance of regulator x_i: sum of I(N) values over the nodes where x_i appears

Weight of edge gene i → gene j:
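The importance computation reduces to accumulating likelihood gains per regulator; a sketch assuming hypothetical (regulator, L_before, L_after) node records from one jump tree:

```python
from collections import defaultdict

def variable_importances(nodes):
    """Sum likelihood increases I(N) = L_after - L_before over the tree
    nodes where each regulator appears."""
    imp = defaultdict(float)
    for regulator, l_before, l_after in nodes:
        imp[regulator] += l_after - l_before
    return dict(imp)

# Hypothetical node records (regulator, L_before, L_after)
nodes = [("gene4", -10.0, -6.0), ("gene2", -6.0, -5.0), ("gene4", -5.0, -4.5)]
imp = variable_importances(nodes)
```

Note that the importances in one tree sum to the total likelihood gain of that tree, which is what motivates the normalisation discussed in the appendix.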

(29)

Jump3 predicts the states and the network topology

[Workflow figure. For each target gene G_i:
1. learn a tree-based model predicting μ_i from the expression of the potential regulators G_1, …, G_p
2. predict the trajectory μ_i(t)
3. identify the regulators of μ_i, yielding a score per candidate regulator (e.g. G_1: 0.56, G_2: 0.01, …, G_p: 0.32)]

(30)

Hybrid approach: Jump3

- Model for gene expression

- Tree-based method for network reconstruction

(31)

Jump3 is competitive with existing methods

On/off model

[Bar chart: AUPR (0–0.5) on five networks (Net1–Net5) for Jump3, GENIE3-lag, CLR-lag, Inferelator, TSNI, and G1DBN]

Jump3 is competitive with existing methods

DREAM4 model

[Bar chart: AUPR (0–0.4) on five networks (Net1–Net5) for Jump3, GENIE3-lag, CLR-lag, Inferelator, TSNI, and G1DBN]

(33)

We used Jump3 to infer the IFNγ network

[Network figure: inferred IFNγ regulatory network over several hundred genes]

Hub TFs include interferon genes, one gene associated with virus infection, and cancer-associated genes.

(34)

Summary and future work

Summary

- Jump3: semi-parametric, model-based method for network inference and modelling
- Can be applied to large-scale networks
- Performs well on artificial data
- Can generate biologically meaningful hypotheses

Future work

- Incorporation of model-based prior knowledge (i.e. a dynamical parametric model) within the tree-based model.

(35)

References

V. A. Huynh-Thu and G. Sanguinetti. Combining tree-based and dynamical systems for the inference of gene regulatory networks. Bioinformatics 31, 2015.

Software:
(36)
(37)

Mean and variance of the Gaussian process

SDE:

    dx = (Aμ(t) + b − λx) dt + σ dw(t)

Solution:

    x(t) = x(0) e^{−λt} + A ∫_0^t e^{−λ(t−τ)} μ(τ) dτ + (b/λ)(1 − e^{−λt}) + σ ∫_0^t e^{−λ(t−τ)} dw(τ)

Mean:

    m(t) = x(0) e^{−λt} + A ∫_0^t e^{−λ(t−τ)} μ(τ) dτ + (b/λ)(1 − e^{−λt})

Covariance:

    Cov(x(t), x(t′)) = (σ² / (2λ)) (e^{−λ|t−t′|} − e^{−λ(t+t′)})
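These closed forms can be checked numerically; a sketch with illustrative parameter values (A, b, λ, σ and the switching time are placeholders, not fitted values):

```python
import numpy as np

def mean_trajectory(mu, t, A=1.0, b=0.2, lam=0.5, x0=0.0):
    """Trapezoidal-rule evaluation of
    m(t) = x(0) e^{-lam t} + A * int_0^t e^{-lam (t-tau)} mu(tau) dtau
           + (b/lam) (1 - e^{-lam t})
    on a grid t (parameter values illustrative)."""
    m = np.empty_like(t)
    for k, tk in enumerate(t):
        y = np.exp(-lam * (tk - t[: k + 1])) * mu[: k + 1]
        integral = np.sum((y[1:] + y[:-1]) * np.diff(t[: k + 1])) / 2.0
        m[k] = x0 * np.exp(-lam * tk) + A * integral + (b / lam) * (1.0 - np.exp(-lam * tk))
    return m

def cov(t1, t2, lam=0.5, sigma=0.1):
    """Cov(x(t), x(t')) = sigma^2 / (2 lam) * (e^{-lam |t-t'|} - e^{-lam (t+t')})."""
    return sigma**2 / (2.0 * lam) * (np.exp(-lam * abs(t1 - t2)) - np.exp(-lam * (t1 + t2)))

t = np.linspace(0.0, 10.0, 1001)
mu = (t >= 5.0).astype(float)  # promoter switches on at t = 5
m = mean_trajectory(mu, t)
```

The mean starts at x(0), relaxes toward b/λ while the promoter is off, and toward (A + b)/λ after it switches on; the variance Cov(x(t), x(t)) grows from 0 toward σ²/(2λ).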

(38)

Normalisation

For a single tree:

    Σ_{i≠j} w_{i→j} = L_fin − L_init

w_{i→j}: importance of gene i for the prediction of gene j
L_init: likelihood when μ_j(t) = 0, ∀t
L_fin: likelihood with the learned μ_j(t)

Positive bias for edges towards genes for which L_fin − L_init is high

Normalisation:

    w_{i→j}
