Combining tree-based and dynamical systems
for the inference of gene regulatory networks
Vˆ
an Anh Huynh-Thu and Guido Sanguinetti
“Network Inference: New Methods and New Data”
3rd September, 2016
Inferring regulatory networks is a challenging problem
There are two main families of methods
Score-based: compute statistical dependencies
between pairs of expression profiles
(e.g. linear correlation)
Target gene
gene 1
gene 2
· · ·
gene p
gene 1
-
0.05
· · ·
0.56
Regulating
gene 2
0.19
-
· · ·
0.03
gene
· · ·
· · ·
· · ·
· · ·
· · ·
gene p
0.11
0.42
· · ·
-→ Fast, but can not make predictions
Model-based: learn a model capturing the dynamics
of the network
(e.g. differential equations)
Hybrid approach: Jump3
- Model for gene expression
- Tree-based method for network reconstruction
Hybrid approach: Jump3
- Model for gene expression
- Tree-based method for network reconstruction
Results
Hybrid approach: Jump3
- Model for gene expression
- Tree-based method for network reconstruction
Results
We use the on/off model of gene expression
For each gene i :
dx
i
= (A
i
µ
i
(t) + b
i
− λ
i
x
i
)dt + σdw (t)
x
i
(t): gene expression
µ
i
(t): promoter activity state (0/1)
We model the expression x
i
as a Gaussian process
- x
i
is completely described by its mean m
i
and covariance K
i
- For every finite set of time points: x
i
∼ N (m
i
, K
i
)
- x
i
is observed with i.i.d. Gaussian noise: ˆ
x
i
∼ N (m
i
, K
i
+ σ
obs
2
I )
- We can compute the likelihood:
log p(ˆ
x
i
) = −
1
2
(ˆ
x
i
− m
i
)
>
(K
i
+ σ
2
obs
I )
−1
(ˆ
x
i
− m
i
) + c
i
The likelihood depends on the promoter state µ
Model:
dx
i
= (A
i
µ
i
(t) + b
i
− λ
i
x
i
)dt + σdw (t)
Likelihood:
log p(ˆ
x
i
) = −
1
2
(ˆ
x
i
− m
i
)
>
(K
i
+ σ
2
obs
I )
−1
(ˆ
x
i
− m
i
) + c
i
Goals (for each gene i ):
1. Find the trajectory µ
i
that maximises the likelihood
Hybrid approach: Jump3
- Model for gene expression
- Tree-based method for network reconstruction
Tree-based methods have several advantages
Bagging
Random Forests
Extra-Trees
...
Can deal with interacting features
Non-parametric
Work well with high-dimensional
datasets
Decision trees are used to predict promoter states
Each interior node tests
the expression of a regulator.
Each
leaf
is
a
prediction
of the promoter state of
the target gene.
Promoter states are not observed.
Jump trees are learned through maximisation
of the likelihood
Start with µ
1
(t) = 0, ∀t
Jump trees are learned through maximisation
of the likelihood
Candidate split:
µ
1
(t) =
(
0,
if ˆ
x
2
(t) < c
1,
if ˆ
x
2
(t) ≥ c
0
1
x
2
Jump trees are learned through maximisation
of the likelihood
Candidate split:
µ
1
(t) =
(
0,
if ˆ
x
2
(t) < c
1,
if ˆ
x
2
(t) ≥ c
0
1
c
Jump trees are learned through maximisation
of the likelihood
Candidate split:
µ
1
(t) =
(
0,
if ˆ
x
2
(t) < c
1,
if ˆ
x
2
(t) ≥ c
0
1
µ
1
Jump trees are learned through maximisation
of the likelihood
Candidate split:
µ
1
(t) =
(
0,
if ˆ
x
2
(t) < c
1,
if ˆ
x
2
(t) ≥ c
0
1
µ
1
Jump trees are learned through maximisation
of the likelihood
Candidate split:
µ
1
(t) =
(
0,
if ˆ
x
2
(t) < c
1,
if ˆ
x
2
(t) ≥ c
0
1
µ
1
Jump trees are learned through maximisation
of the likelihood
Candidate split:
µ
1
(t) =
(
0,
if ˆ
x
2
(t) < c
1,
if ˆ
x
2
(t) ≥ c
Jump trees are learned through maximisation
of the likelihood
Candidate split:
µ
1
(t) =
(
0,
if ˆ
x
2
(t) < c
1,
if ˆ
x
2
(t) ≥ c
L = −2.35
Jump trees are learned through maximisation
of the likelihood
μ
1
=0
x
4
<0.8
μ
1
=1
T
A
T
B
Select the split with
the highest likelihood
Jump trees are learned through maximisation
of the likelihood
μ
1
=0
x
4
<0.8
x
2
<0.3
μ
1
=1
μ
1
=0
T
C
T
D
T
A
Repeat the procedure
for each child node
Jump trees are learned through maximisation
of the likelihood
μ
1
=0
x
4
<0.8
x
5
<0.4
x
2
<0.3
μ
1
=1
μ
1
=0
μ
1
=1
T
E
T
F
T
C
T
D
Repeat the procedure
for each child node
Jump trees are learned through maximisation
of the likelihood
μ
1
=0
x
4
<0.8
x
5
<0.4
x
2
<0.3
μ
1
=1
μ
1
=0
μ
1
=1
Stop when the likelihood
can not be increased
An ensemble of randomised trees is constructed
Randomise x
i
and c
µ
1
(t) =
(
0,
if ˆ
x
i
(t) < c
1,
if ˆ
x
i
(t) ≥ c
Extra-Trees
(Geurts et al., Machine Learning, 2006)
:
- At each node, the best split is chosen among K random splits.
- The prediction of µ(t) is averaged over the trees.
The tree-based model is informative
The learned model can be used to find the most relevant inputs.
1
10
20
0
0.25
Input variables
Importance
The variable importance is based on likelihood increase
At each tree node N :
I (N ) = L
after
− L
before
L
after
: likelihood after the split
L
before
: likelihood before the split
Importance of regulator x
i
:
sum of I values over the nodes where x
i
appears
Weight of edge gene i → gene j :
Jump3 predicts the states and the network topology
0 100 200 300 400 500 600 700 800 900 1000 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9time
E
xp
re
ss
io
n
for each
target gene G
i
Learn tree-based
model predicting μ
i
time
μi
Predict μ
i
(t)
Identify regulators of μ
i
Potential regulators
Score
G
1
G
1
G
2
...
G
p
G
2
G
p
G
i
...
0.56
0.01
...
0.32
Hybrid approach: Jump3
- Model for gene expression
- Tree-based method for network reconstruction
Jump3 is competitive with existing methods
On/off model
Net1
Net2
Net3
Net4
Net5
0
0.1
0.2
0.3
0.4
0.5
AUPR
Jump3
GENIE3−lag
CLR−lag
Inferelator
TSNI
G1DBN
Jump3 is competitive with existing methods
DREAM4 model
Net1
Net2
Net3
Net4
Net5
0
0.1
0.2
0.3
0.4
AUPR
Jump3
GENIE3−lag
CLR−lag
Inferelator
TSNI
G1DBN
We used Jump3 to infer the IFNγ network
pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pppp pppp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pppp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pppp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pppp pp pp pp pp pp pp pp pp pp pp pppp pp pp pppp pp pppp pp pp pp pp pp pp pp pp pp pp pp pppp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pppp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pppp pp pp pp pp pp pp pppp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pppp pp pp pppp pp pp pp pp pp pp pp pp pp pppppp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pppp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp Ptchd3 2810433K01Rik Cks2 Bmyc Pola1 Chrne Camk1g Olfr914 Ms4a2 EG330963 Lrrn2 Psd3 1700007N14Rik Umps Nola3 Atg3 Nap1l3 Nucks1 Sh2b1 Ilkap Olfr17 Grb14 Ppm1l Mtus1 Klf9 Rasgef1a Mki67 Spert Hspa1a Plscr4 Nfil3 Lrrcc1 Map3k8 Sgk Entpd3 Rgs1 Flrt3 Olfr26 Klhl18 Ovch2 Ms4a4b Bst2 Aif1 Olfr1472 Atcay Lce5aPak6 Olfr906Ubap2l Zc3h10 Hrasls5 Spint4 Sorbs3 Timp2 Tbl2 Usp11 Myod1 Crtc3 Rabl2a Olfr1048 Il2rg Arg2 Cd83 Fdft1 Spag5 Socs3 2010106G01Rik Clec3b Ms4a6c Ly6i Madcam1 Egr2 Nkx1−2Usp24 Gdap10 Irf1 Ribc1 Drd3 Oxct1 4930431B09Rik Insig1 D630004A14Rik Wnk1 1700106J16Rik Gmpr Tmem158 Pde4b Lyl1 H2−DMb1 Ctla2a Tnfrsf14 Pdgfrl Gsh1 1810009J06Rik Itgb1bp3 Serpinb6b Plxna1 1500031H01Rik 2210418O10Rik Olfr166 Nsun2 2410127E16Rik Cd47 Nxph4 Olfr218 Atp7b 1700011L22Rik A830039N20Rik Crhr2 Adam3 Dock1 Tuba−rs1 MGC58416 Mag Nr5a1 Egr1 Sema4d Ankrd25 Cyp4x1 Cep68 Tspyl1 Cish Il27ra Stmn3 Tagap1 Foxh1 1700020L24Rik 4930597L12Rik Duox1 Depdc1b H2−T22 Casc5 Fgd4 Itpr1 Pik3ap1 Fbxo27 9030418K01Rik Plcl1 Clstn3 6330416L11Rik H2−Q7 Rnd3 Hpd Il18bp H2−Q10 Psors1c2 Gabrd Krt85 Tlx2 Ube2m Malt1 Klf4 Csh2 1500041B16Rik Sesn2 Sprr2e Epb4.9 Asxl2 Actc1 Serpina3m Slc25a22 4921510H08Rik TymsOlfr187 C1qtnf4 Scn10a AI118078 Mrps7 Fbxo32 Il17rePpp1r3d Acot4 Ugt2b34 Ngb Ccnd1 Fat4 Sod2 Eif2ak2 Fcgr3a Daxx C1r Irf7 Ifi204 1810063B05Rik BC010584 Adrb1 Srgap3 Dusp5Rest Cd69Peli1 Slco4a1 A630077B13Rik Ccrl2 1810029B16Rik Fcgr2b Il15 2410012M07Rik Zcwpw2 Bhmt Slc16a2 9230112D13Rik Nptx1 Nr3c1 Avpr2 Tmem71 Rsnl2 Mbl2 Csprs Nsg2 Olfr1426 Atoh1 Sox13 Sp100 Cenpp Gbp2 Lhcgr Chst7 Slamf8 Fezf2 Nod1 AW214353 Olfr845 Ifit1 Msh2 Acsl3 4930431B11Rik Tcfec Tubb2b Olfr1341 Tor3a L3mbtl2 Serpina3g Ccnd2 Spinlw1 Ccl12 Tmc7 Pcnx Plekha4 Rapgef5 C3 Mertk LOC667373 Tmcc3 Slc2a6 D14Ertd500e Slc15a3 C130026I21Rik Lmx1a Fcgr1 LOC630437 Isg15 5830484A20Rik Ifi44 AI447904 Adamts15 BC011209 Faah Odf2 Il13ra1 Tnfaip8l3 Samd9l Slamf7 Lnpep Cspg2 Mx1 Ifih1 Lrrc4 Parp8 Bace1 Pbk Iigp1 Hoxc6 5031414D18Rik 4933426K21Rik 2410025L10Rik Asb13 Lhx2 BC013672 Ddx4 Phf11 Ccl5 Ncapg2 Msi2 Irf2 Iigp2 Prss27 Cxcl9 Pou3f1 Il1b Trex1 Irg1 Ch25h Tnfsf10 Tyki Slc16a6 Tbc1d15 Ctsc Krtap13−1 Bgn 2310047D13Rik Olfr1329 Gca Nfe2l1 Psmb9 Klrk1 Usp18 Mx2 Il6 Serpine1 Nfxl1 Cab39l ORF9 Cxcl2 Stat2 Tcf19 Ifnb1 Ncapd3 Katna1 Fnip1 Ptgs2 Sgol1 Dpf2 Gpr84 Mmp14 Trpc1 Acsl1 Bcl2a1b Gpr65 Fabp2 9030625A04Rik Edg2 Lonrf3 Mxd1Olfr190 Lrp12 Ppm1k 1700040I03Rik Tap1 B430201A12Rik Cald1 Abcg1 Mobp 2600005O03Rik R3hdm1 Olfr981 Vip Hs3st1 Nr1h4 Olfr1223 Dnahc10 Olfr354 Mcm3 Zc3h15 Hectd3 Sox10 UngRsad2 Cd163LOC671302 2310034C09Rik Adarb1 Extl3 Ppp2r5e Jak2 Gvin1 Cxcl16Brca2 Trim30 Tfdp2 1200002N14Rik 9930105H17Rik Oasl1 Parp12 D11Ertd759e March1 Cdca5 Rnase6Pnp2700019D07Rik Parp111110038F14RikRhbdf2 D4Ertd429e Ifit2 Egr4 Zfp503 Mmp2 Gsr Olfr174 Nat13 Bach1 Tnfrsf18 Dio3as Gbp1 Ifi47 Msh6 Flcn Recc1 Trim47 Phf17 Tlr2 Irf5 Pou3f2 Dlx4 Mt2 Ccl7Irgm 1110002H13Rik Rap1gap Mc2r Nedd4 A030007E19Rik 2010300C02RikC030045D06Rik 9930032O22Rik Zbtb26 5430435G22Rik Clec9a Aldh1b1 Psrc1 D10627 Tmhs Cd74 Pcgf1 Ahr Zbtb38 2410146L05Rik Slc29a3 Disp2 Nfe2l2 Dbc1 Six6 Slfn4 Ifrg15 Ifit3 Oasl2Lrrc28 Ccdc13 Zyx H28 Pmpca March5 Cnksr3 Mtss1 Speer4d Pbx2 Chst10 C230093N12Rik Tnf BC057170 Zbp1 Gbp4 LOC668139 Ccl8 Fbxo5 Olr1 AU020206 Kif5b Cln8 Ly6c Hspa2 Fgl2 Chl1 Mrg2 Olfr33 Tapbpl