Computational Methods for Systems and Synthetic Biology
François Fages
Inria Saclay – Ile de France
Lifeware project-team
http://lifeware.inria.fr/
Overview of the Lecture
The Boolean semantics of reaction systems potentially applies to
• Very large systems
• Without any knowledge of the reaction rates
How to query the asynchronous non-deterministic Boolean dynamics?
1. Example of Kohn’s map of the mammalian cell cycle
2. Computation Tree Logic CTL query language
3. State-based model-checking algorithm
4. Symbolic model-checking algorithm
5. Model reduction preserving CTL properties
Cell Cycle: G0 → G1 → Synthesis → G2 → Mitosis
G1: Cdk4-CycD, Cdk6-CycD
S: Cdk2-CycA
G2, M: Cdk1-CycA, Cdk1-CycB (MPF)
Sir Paul Nurse Nobel prize 2001
for his work on Cyclins
Mammalian Cell Cycle Control Map [Kohn 99]
Kohn’s map detail for Cdk2
Complexation of cdk2 with cycA and cycE
Biocham reaction patterns:
cdk2~$P + cycA-$C => cdk2~$P-cycA-$C where $C in {_, cks1}.
cdk2~$P + cycE~$Q-$C => cdk2~$P-cycE~$Q-$C where $C in {_, cks1}.
p57 + cdk2~$P-cycA-$C => p57-cdk2~$P-cycA-$C where $C in {_, cks1}.
cycE-$C =[cdk2~{p2}-cycE-$S]=> cycE~{T380}-$C
where $S in {_, cks1} and $C in {_, cdk2~?, cdk2~?-cks1}.
→ 732 reactions, 165 proteins and genes, 532 variables
How to analyze a transition system over 2^532 states, and 2^(2^532) sets of states?
Symbolic model-checking! Represent a set of states by a Boolean constraint:
True: full set of 2^532 states; False: empty set; M: set of 2^531 states with M present; M ∨ ¬N: set of 3·2^530 states with M present or N absent; etc.
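The state counts quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the helper `count_states` and its arguments are hypothetical names, and the constraint is enumerated over the variables it mentions, each other variable simply doubling the count.

```python
from itertools import product

def count_states(constraint, mentioned, n_vars):
    """Number of Boolean states over n_vars variables satisfying `constraint`.

    `constraint` only reads the variables listed in `mentioned`; every
    other variable is free and multiplies the count by 2.
    """
    k = len(mentioned)
    models = sum(
        1
        for values in product([False, True], repeat=k)
        if constraint(dict(zip(mentioned, values)))
    )
    return models * 2 ** (n_vars - k)

N = 532  # number of Boolean variables in the Kohn's map model
full = count_states(lambda v: True, [], N)
m_present = count_states(lambda v: v["M"], ["M"], N)
m_or_not_n = count_states(lambda v: v["M"] or not v["N"], ["M", "N"], N)
```

The point of the symbolic representation is that the constraint M ∨ ¬N stays two literals long even though it denotes 3·2^530 states.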
Computation Tree Logic CTL
Temporal logics extend classical logic with modal operators for time and non-determinism.
Introduced for program verification by [Pnueli 77]
Kohn’s Map Model-Checking
BIOCHAM with the NuSMV symbolic model-checker, time in seconds [Chabrier Fages 03 cmsb]
Initial state G2
Query                                                                      Time (s)
compiling                                                                  29
Reachability G1: EF CycE                                                   2
Reachability G1: EF CycD                                                   1.9
Checkpoint for mitosis complex: ¬E(¬Cdc25~{Nterm} U Cdk1~{Thr161}-CycB)    2.2
Oscillations CycA: EG((EF ¬CycA) ∧ (EF CycA))                              31.8
Oscillations CycB: EG((EF ¬CycB) ∧ (EF CycB))                              6
    false! (omission of CycB synthesis in Kohn’s map)
Kripke Semantics of CTL*
A Kripke structure K = (S,R) is a set S of states with a total relation R ⊆ S × S.
The truth of a formula f in a state s or on a path p of K is defined by:
s ⊨ f if f is a proposition true in s
s ⊨ E f if there is a path p starting from s such that p ⊨ f
s ⊨ A f if every path p starting from s satisfies p ⊨ f
p ⊨ f for a state formula f if s ⊨ f where s is the first state of p
p ⊨ X f if p^1 ⊨ f where p^1 is the suffix of p without its first state
p ⊨ F f if ∃ k ≥ 0 such that p^k ⊨ f where p^k is the kth suffix of p
p ⊨ G f if ∀ k ≥ 0, p^k ⊨ f
p ⊨ f1 U f2 if ∃ k ≥ 0 (p^k ⊨ f2 ∧ ∀ j < k, p^j ⊨ f1)
p ⊨ f1 R f2 if ∀ k ≥ 0 (p^k ⊨ f2 ∨ ∃ j < k, p^j ⊨ f1)
Duality: ¬Ef = A¬f, ¬Ff = G¬f, ¬Xf = X¬f, ¬(f1 U f2) = ¬f1 R ¬f2
Minimal Set of CTL* Operators
• Logical connectives: ∨, ¬
• Path quantifier: E “exists”
• Temporal operators: X “next”
U “until”
Abbreviations:
Ff = true U f “finally”
Af = ¬ E ¬ f “always”
Gf = ¬ F ¬ f “globally”
f1 R f2 = ¬ ( ¬ f1 U ¬ f2) “release”
CTL Fragment of CTL*
In CTL, each temporal operator must be preceded by a path quantifier.
Any CTL formula is thus a state formula and can be identified with the set of states that satisfy it: f ≃ {s ∈ S : s ⊨ f} [Emerson 90]
Basis of three operators: EX, EG, EU
• EF f = E(true U f)
• AX f = ¬ EX ¬ f
• AF f = ¬ EG ¬ f
• AG f = ¬ EF ¬ f
• Etc…
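The abbreviations above can be read as rewrite rules into the basis EX, EG, EU. A minimal sketch, assuming formulas are encoded as nested tuples such as ("AG", ("EF", "P")); this encoding and the function name `to_basis` are illustrative choices, not Biocham or NuSMV syntax.

```python
def to_basis(f):
    """Rewrite a CTL formula into the basis EX, EG, EU (plus not/or/and)."""
    if isinstance(f, str):           # atomic proposition or the constant "true"
        return f
    op, *args = f
    a = [to_basis(x) for x in args]  # rewrite sub-formulas first
    if op in ("not", "or", "and", "EX", "EG", "EU"):
        return (op, *a)
    if op == "EF":                   # EF f = E(true U f)
        return ("EU", "true", a[0])
    if op == "AX":                   # AX f = not EX not f
        return ("not", ("EX", ("not", a[0])))
    if op == "AF":                   # AF f = not EG not f
        return ("not", ("EG", ("not", a[0])))
    if op == "AG":                   # AG f = not EF not f
        return ("not", ("EU", "true", ("not", a[0])))
    raise ValueError(f"unsupported operator: {op}")

example = to_basis(("AG", ("EF", "P")))
```

A model checker then only needs procedures for the three basis operators.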
LTL Fragment of CTL*
Linear Time Logic (LTL) formulae are of the form Af where f contains no path quantifier, only temporal operators.
Basis of two operators: X, U
• The LTL formula A(FG f) is not expressible in CTL:
  AF(AG f) is stronger, e.g. false on [diagram: p ¬p p]
  AF(EG f) is weaker, e.g. true on [diagram: p ¬p]
• The CTL formula EF(AG f) is not expressible in LTL
• LTL and CTL are strict fragments of CTL*
Biological Properties in CTL (1/3)
About reachability:
• Can the cell produce some protein P? reachable(P) == EF(P)
• Can the cell produce P, Q and not R? reachable(P ∧ Q ∧ ¬R)
• Can the cell always produce P? AG(reachable(P))
About pathways:
• Can the cell reach a set s of (partially described) states while passing through another set of states s2? EF(s2 ∧ EF s)
• Is it possible to produce P without Q? E(¬Q U P)
• Is the set of states s2 a necessary checkpoint for reaching the set of states s? checkpoint(s2,s) == ¬E(¬s2 U s)
• Is s2 always a checkpoint for s? AG(¬s -> checkpoint(s2,s))
Biological Properties in CTL (2/3)
About stability:
• Is a set of states s a stable state? stable(s) == AG(s)
• Is s a steady state (with possibility of escaping)? steady(s) == EG(s)
• Can the cell reach a stable state s? EF(stable(s))
  This alternation of path quantifiers, EF AG f, is not expressible in Linear Time Logic LTL (fragment without path quantifiers).
• Must the cell reach a stable state s? AF(stable(s))
• What are the stable states? Not expressible in CTL: this needs to combine CTL with search [Chan 00, Calzone-Chabrier-Fages-Soliman 05].
Biological Properties in CTL (3/3)
About durations:
• How long does it take for a molecule to become activated?
• In a given time, how many Cyclins A can be accumulated?
• What is the duration of a given cell cycle’s phase?
CTL operators abstract from durations. Time intervals can be modeled in FOL by adding numerical constraints for start times and durations.
About oscillations:
• Can the system exhibit a cyclic behavior w.r.t. the presence of P?
  oscil(P) == EG((F ¬P) ∧ (F P))
  The temporal operators F are not preceded by a path quantifier: this is a CTL* formula.
  Approximation in CTL: oscil(P) == EG((EF ¬P) ∧ (EF P))
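The named queries of these three slides (reachable, stable, steady, checkpoint, oscil) are plain abbreviations, so they can be sketched as functions that expand into CTL formula strings. The ASCII syntax used here (`!` for negation, `&` for conjunction, `->` for implication) is an assumption loosely modelled on the Biocham transcripts in this lecture, and `always_checkpoint` is a hypothetical name for the last query of slide 1/3.

```python
def reachable(p):
    return f"EF({p})"

def stable(s):
    return f"AG({s})"

def steady(s):
    return f"EG({s})"

def checkpoint(s2, s):
    # s2 is a necessary checkpoint for s: no path reaches s while avoiding s2
    return f"!E(!({s2}) U {s})"

def oscil(p):
    # CTL approximation of the CTL* oscillation property
    return f"EG(EF(!({p})) & EF({p}))"

def always_checkpoint(s2, s):
    return f"AG(!({s}) -> {checkpoint(s2, s)})"

query = reachable(stable("s"))  # "can the cell reach a stable state s?"
```

Nesting the helpers makes the alternation of path quantifiers visible, e.g. `reachable(stable("s"))` expands to an EF AG formula.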
Oscillations in CTL ?
EG((EF ¬P) ∧ (EF P))
• Necessary but not sufficient condition for oscillations without fairness:
[Diagram: transition system over the states P and ¬P]
• also with weak fairness (no rule stays continuously fireable without being fired)
[Diagram: transition system over the states P, ¬P and P,Q]
• Needs strong fairness (no rule is infinitely often fireable without being fired)
MAPK Signaling Cascade Model
RAF + RAFK <=> RAF-RAFK.
RAF-RAFK => RAFK + RAF~{p1}.
RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH.
RAF~{p1}-RAFPH => RAF + RAFPH.
MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1}
where p2 not in $P.
MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}.
MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}.
MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH.
MEK~{p1}-MEKPH => MEK + MEKPH.
MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH.
MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2}
where p2 not in $P.
MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH.
MAPK~{p1}-MAPKPH => MAPK + MAPKPH.
MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.
MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}.
Temporal Logic Querying of MAPK Cascade
MEK~{p1} is a checkpoint for the cascade, i.e. producing MAPK~{p1,p2}
biocham: checkpoint(MEK~{p1} , MAPK~{p1,p2})
!E(!MEK~{p1} U MAPK~{p1,p2}) is True
The PH complexes are not checkpoints
biocham: checkpoint(MEK~{p1}-MEKPH , MAPK~{p1,p2})
!E(!MEK~{p1}-MEKPH U MAPK~{p1,p2}) is false
Step 1 rule 15
Step 2 rule 1 RAF-RAFK present
Step 3 rule 21 RAF~{p1} present
Step 4 rule 5 MEK-RAF~{p1} present
Step 5 rule 24 MEK~{p1} present
Step 6 rule 7 MEK~{p1}-RAF~{p1} present
Step 7 rule 23 MEK~{p1,p2} present
Step 8 rule 13 MAPK-MEK~{p1,p2} present
Step 9 rule 27 MAPK~{p1} present
Step 10 rule 15 MAPK~{p1}-MEK~{p1,p2} present
Step 11 rule 28 MAPK~{p1,p2} present
Automatic Generation of CTL Properties
biocham: add_genCTL.
Ei(reachable(RAF))
Ei(reachable(!(RAF)))
Ai(oscil(RAF))
Ei(steady(RAF))
Ai(AG(RAF->checkpoint(RAFK,!(RAF))))
…
Ei(reachable(MAPK~{p1,p2}))
Ei(reachable(!(MAPK~{p1,p2})))
Ai(oscil(MAPK~{p1,p2}))
Ei(steady(!(MAPK~{p1,p2})))
Ai(AG(MAPK~{p1,p2}->checkpoint(MAPKPH,!(MAPK~{p1,p2}))))
Model Reduction preserving CTL Properties
biocham: reduce_model.
1: deleting RAF-RAFK=>RAF+RAFK
2: deleting RAFPH-RAF~{p1}=>RAFPH+RAF~{p1}
3: deleting MEK-RAF~{p1}=>MEK+RAF~{p1}
4: deleting MEKPH-MEK~{p1}=>MEKPH+MEK~{p1}
6: deleting MAPKPH-MAPK~{p1}=>MAPKPH+MAPK~{p1}
7: deleting MEK~{p1}-RAF~{p1}=>MEK~{p1}+RAF~{p1}
8: deleting MEKPH-MEK~{p1,p2}=>MEKPH+MEK~{p1,p2}
9: deleting MAPK~{p1}-MEK~{p1,p2}=>MAPK~{p1}+MEK~{p1,p2}
10: deleting MAPKPH-MAPK~{p1,p2}=>MAPKPH+MAPK~{p1,p2}
The reverse reactions are not necessary to satisfy the CTL specification of reachability, checkpoint and oscillations.
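The idea of deleting reactions while a specification stays true can be sketched as a greedy loop. This is not Biocham's `reduce_model` implementation: it is a minimal illustration that assumes a reachability-only specification, a Boolean semantics in which each reactant may or may not be consumed, and hypothetical helper names (`successors`, `reachable_states`, `satisfies`, `reduce_reactions`).

```python
from itertools import combinations

def successors(state, reactions):
    """Boolean semantics: products appear, each reactant may or may not be consumed."""
    for reactants, products in reactions:
        if reactants <= state:
            r = sorted(reactants)
            for k in range(len(r) + 1):
                for consumed in combinations(r, k):
                    yield (state - frozenset(consumed)) | products

def reachable_states(init, reactions):
    seen, todo = {init}, [init]
    while todo:
        s = todo.pop()
        for t in successors(s, reactions):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

def satisfies(init, reactions, targets):
    """Specification assumed here: every target molecule is reachable."""
    states = reachable_states(init, reactions)
    return all(any(m in s for s in states) for m in targets)

def reduce_reactions(init, reactions, targets):
    """Greedily delete each reaction whose removal keeps the specification true."""
    kept = list(reactions)
    for r in list(reactions):
        trial = [x for x in kept if x is not r]
        if satisfies(init, trial, targets):
            kept = trial
    return kept

# First three rules of the MAPK model, with the reversible rule split in two.
r1 = (frozenset({"RAF", "RAFK"}), frozenset({"RAF-RAFK"}))
r2 = (frozenset({"RAF-RAFK"}), frozenset({"RAF", "RAFK"}))   # reverse of r1
r3 = (frozenset({"RAF-RAFK"}), frozenset({"RAFK", "RAF~{p1}"}))
init = frozenset({"RAF", "RAFK"})
reduced = reduce_reactions(init, [r1, r2, r3], ["RAF-RAFK", "RAF~{p1}"])
```

On this fragment only the reverse reaction r2 is deleted, in line with the first deletion reported above. The result of a greedy reduction depends on the order in which reactions are tried.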
Tyson 91’s Model of the Cell Cycle
k1 for _=>Cyclin.
k2*[Cyclin] for Cyclin=>_.
k3*[Cyclin]*[Cdc2~{p1}] for
Cyclin+Cdc2~{p1}=>Cdc2-Cyclin~{p1,p2}.
k4p*[Cdc2-Cyclin~{p1,p2}] for
Cdc2-Cyclin~{p1,p2}=>Cdc2-Cyclin~{p1}.
k4*[Cdc2-Cyclin~{p1}]^2*[Cdc2-Cyclin~{p1,p2}] for
Cdc2-Cyclin~{p1,p2}=[Cdc2-Cyclin~{p1}]=>Cdc2-Cyclin~{p1}.
k5*[Cdc2-Cyclin~{p1}] for Cdc2-Cyclin~{p1}=>Cdc2-Cyclin~{p1,p2}.
k6*[Cdc2-Cyclin~{p1}] for Cdc2-Cyclin~{p1}=>Cyclin~{p1}+Cdc2.
k7*[Cyclin~{p1}] for Cyclin~{p1}=>_.
k8*[Cdc2] for Cdc2=>Cdc2~{p1}.
k9*[Cdc2~{p1}] for Cdc2~{p1}=>Cdc2.
Automatic Generation of CTL Properties
biocham: add_genCTL.
Ei(reachable(Cyclin)).
Ei(reachable(!(Cyclin))).
Ai(oscil(Cyclin)).
Ei(steady(!(Cyclin))).
…
Ei(reachable(Cyclin~{p1})).
Ei(reachable(!(Cyclin~{p1}))).
Ai(oscil(Cyclin~{p1})).
Ei(steady(!(Cyclin~{p1}))).
Ai(AG(!(Cyclin~{p1})->checkpoint(Cdc2-Cyclin~{p1},Cyclin~{p1}))).
Model Reduction preserving CTL Properties
biocham: reduce_model.
1: deleting Cyclin=>_
2: deleting Cdc2-Cyclin~{p1,p2}=[Cdc2-Cyclin~{p1}]=>Cdc2-Cyclin~{p1}
3: deleting Cdc2-Cyclin~{p1}=>Cdc2-Cyclin~{p1,p2}
4: deleting Cdc2~{p1}=>Cdc2
Some degradation, phosphorylation, dephosphorylation and the autocatalysis reaction
are not necessary for the CTL specification
of reachability, checkpoints and pseudo-oscillations.
Basic CTL Model-Checking Algorithm
Dynamic programming algorithm for computing, in a finite Kripke structure K, the set of states satisfying a CTL formula: {s ∈ K : s ⊨ f}.
Represent K explicitly as a finite graph and iteratively label the nodes with the subformulas of f that are true in that node:
• Add f to the states satisfying f
• Add EF f (EX f) to the (immediate) predecessors of states labeled by f
• Add E(f1 U f2) to the predecessor states of f2 while they satisfy f1
• Add EG f to the states for which there exists a path, within the subgraph of states satisfying f, leading to a non-trivial strongly connected component of that subgraph.
Space and time in O(|K|*|f|), CTL model-checking is Ptime-complete
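The labelling procedure can be sketched as a recursive function over the formula. This is a simplified illustration, not the linear-time algorithm: EG is computed here by repeatedly discarding states with no successor left in the set, instead of the strongly connected component analysis, and E(f1 U f2) scans all states for predecessors. The tuple encoding of formulas and the names `check`, `succ`, `label` are assumptions.

```python
def check(f, states, succ, label):
    """Return the set of states of a finite Kripke structure satisfying f."""
    sat = lambda g: check(g, states, succ, label)
    if isinstance(f, str):  # atomic proposition or the constant "true"
        return set(states) if f == "true" else {s for s in states if f in label[s]}
    op, *args = f
    if op == "not":
        return set(states) - sat(args[0])
    if op == "or":
        return sat(args[0]) | sat(args[1])
    if op == "and":
        return sat(args[0]) & sat(args[1])
    if op == "EX":          # some immediate successor satisfies the argument
        target = sat(args[0])
        return {s for s in states if succ[s] & target}
    if op == "EU":          # backward propagation from f2 through f1 states
        allowed, result = sat(args[0]), sat(args[1])
        todo = list(result)
        while todo:
            t = todo.pop()
            for s in states:
                if t in succ[s] and s in allowed and s not in result:
                    result.add(s)
                    todo.append(s)
        return result
    if op == "EG":          # discard states with no successor left in the set
        result = sat(args[0])
        while True:
            dead = {s for s in result if not (succ[s] & result)}
            if not dead:
                return result
            result -= dead
    raise ValueError(f"unknown operator: {op}")

# Two-state structure: state "p" (P present) loops or moves to "np", which loops.
states = ["p", "np"]
succ = {"p": {"p", "np"}, "np": {"np"}}
label = {"p": {"P"}, "np": set()}
EF = lambda g: ("EU", "true", g)
oscil = ("EG", ("and", EF(("not", "P")), EF("P")))
oscil_states = check(oscil, states, succ, label)
```

On this structure the oscillation query holds in state "p" although P can never come back once lost, which illustrates why the CTL formula is only a necessary condition.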
Symbolic CTL Model-Checking Algorithm
• Represent sets of states as a Boolean constraint c(V)
• Represent the transition relation as a Boolean constraint r(V,V’)
  Let ex(c) = ∃V’ (r(V,V’) ∧ c(V’))
• Least fixed point operator µ(f) = ⋁_i f^i(false), greatest fixed point ν(f) = ⋀_i f^i(true)
• Dynamic programming: for each subformula f of the formula to prove, associate the constraint [f] which represents the set of states where it is true
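The fixed-point view can be illustrated with a small sketch in which a constraint c(V) is stood in for by the set of valuations that satisfy it. A real symbolic model checker would manipulate Boolean constraints (e.g. OBDDs) rather than explicit sets. The two-variable example, the reaction A => B, the self-loops added to keep r total, and the names `ex`, `lfp`, `gfp` are all assumptions made for the illustration.

```python
from itertools import product

STATES = set(product((0, 1), repeat=2))   # valuations of V = (A, B)

# r(V, V'): Boolean semantics of a single reaction A => B.
# When A is present, B becomes present and A may or may not be consumed.
R = {((1, b), (a2, 1)) for b in (0, 1) for a2 in (0, 1)}
R |= {(s, s) for s in STATES if s[0] == 0}   # self-loops keep r total

def ex(c):
    """States having at least one successor in c."""
    return {v for (v, v2) in R if v2 in c}

def lfp(f):
    """Least fixed point of a monotone f, iterated from the empty set (false)."""
    z = set()
    while f(z) != z:
        z = f(z)
    return z

def gfp(f):
    """Greatest fixed point of a monotone f, iterated from all states (true)."""
    z = set(STATES)
    while f(z) != z:
        z = f(z)
    return z

A = {s for s in STATES if s[0] == 1}
B = {s for s in STATES if s[1] == 1}

EF_B = lfp(lambda z: B | ex(z))   # [EF B] = least Z with Z = B or ex(Z)
EG_A = gfp(lambda z: A & ex(z))   # [EG A] = greatest Z with Z = A and ex(Z)
```

Only the state where neither A nor B is present cannot reach B, and A can be kept forever from any state where it is present because consumption is optional.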
Exercise
• Write down the transition constraint for [diagram: P ¬P]
  r(P,P’) = {(1,1), (1,0), (0,0)}
• Apply the algorithm to 𝜑 = EG((EF ¬P) ∧ (EF P))
• Apply the algorithm to AG(P)
Sub-formulae 𝜓 of 𝜑      Constraint [𝜓] ~ set of states where 𝜓 is true
P                        P(x)
¬P                       ¬P(x)
EF P                     µZ.(P(x) ∨ ex(Z)) = false ∨ P(x) ∨ ∃x’(r(x,x’) ∧ P(x’)) ∨ … = P(x)
EF ¬P                    µZ.(¬P(x) ∨ ex(Z)) = false ∨ ¬P(x) ∨ ∃x’(r(x,x’) ∧ ¬P(x’)) ∨ … = ¬P(x) ∨ true = true
EF ¬P ∧ EF P             P(x)
EG(EF ¬P ∧ EF P)         νZ.(P(x) ∧ ex(Z)) = true ∧ P(x) ∧ ∃x’(r(x,x’) ∧ P(x’)) ∧ … = P(x)
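The table does not cover the last question of the exercise, AG(P). One possible derivation, reusing the duality AG f = ¬EF ¬f and the row for EF ¬P, is sketched below; the bracket notation [·] follows the table.

```latex
\begin{align*}
[\mathrm{AG}\,P] &= \neg\,[\mathrm{EF}\,\neg P] \\
                 &= \neg\,\mu Z.\bigl(\neg P(x) \lor ex(Z)\bigr) \\
                 &= \neg\,\mathit{true} \\
                 &= \mathit{false}
\end{align*}
```

Under this reading, AG(P) holds in no state: from either state of the structure there is a path on which P is eventually absent.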
Ordered Binary Decision Diagrams OBDD
Ordered Binary Decision Diagrams OBDD [Bryant 85] are decision graphs
• With fixed ordering of variables by levels
• And compressed in binary graphs with maximum sharing of common subtrees
Example: (x⋁¬y)⋀(y⋁¬z)⋀(z⋁¬x)
OBDD(x,y,z): [figure: OBDD of the example]
(x⋁¬z)⋀(z⋁¬y)⋀(y⋁¬x) has the same OBDD(x,y,z) and is indeed equivalent
• OBDD provide canonical forms for Boolean formulas
• OBDD decide both SAT in NP and TAUT in co-NP
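The canonicity claim can be tried out on the example with a deliberately naive construction. The sketch below builds the diagram by Shannon expansion over all valuations, which takes exponential time, whereas OBDD packages combine diagrams incrementally. Nodes are nested tuples (variable, low, high), redundant tests are dropped, and structural equality of tuples plays the role of sharing. The function name `obdd` and this encoding are assumptions.

```python
def obdd(f, order, env=None):
    """Reduced ordered decision diagram of the Boolean function f.

    Terminals are True/False; an internal node is (variable, low, high).
    """
    env = {} if env is None else env
    if len(env) == len(order):            # all variables assigned: terminal
        return bool(f(env))
    var = order[len(env)]                 # next variable in the fixed ordering
    low = obdd(f, order, {**env, var: False})
    high = obdd(f, order, {**env, var: True})
    return low if low == high else (var, low, high)   # drop redundant tests

order = ["x", "y", "z"]
f1 = lambda v: (v["x"] or not v["y"]) and (v["y"] or not v["z"]) and (v["z"] or not v["x"])
f2 = lambda v: (v["x"] or not v["z"]) and (v["z"] or not v["y"]) and (v["y"] or not v["x"])

d1 = obdd(f1, order)
d2 = obdd(f2, order)
same = d1 == d2                # equivalence check by comparing canonical forms
satisfiable = d1 is not False  # SAT: the diagram is not the False terminal
tautology = d1 is True         # TAUT: the diagram is the True terminal
```

Both formulas state that x, y and z are all equal, so they yield the same diagram. Once the diagram is built, satisfiability and validity are read off its root.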
Logical Paradigm for Systems Biology
Use of model-checking algorithms [Lincoln et al. 02] [Chabrier Fages 03] [Bernot et al. 04]…
Biological process model = State Transition System K
Biological property = Temporal Logic Formula φ
Model validation = model-checking K, s ⊨? φ
Model reduction = model-checking K’? ⊂ K, K’, s ⊨ φ
Static experiment design = model-checking K, s? ⊨ φ
Model functions = true formulae enumeration K, s ⊨ φ?
Model inference, dyn. exp. design = constraint solving K?, s? ⊨ φ
Generalizations to quantitative temporal logics
• FO-LTL(Rlin) [Rizk, Batt, F, Soliman 09, Donze Maler 12] parameter search, robustness
• SAT modulo ODE [Gao Clarke 2012] formal verification on parameter range
• Mixed logical-analog specification K?, s? ⊨ reachable(stable(y ≈ 12
3412 )