Using selective pressure to improve protein tridimensional structure prediction

28  Download (0)

Texte intégral

(1)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Using selective pressure to improve protein tridimensional structure prediction

Aude GRELAUD 1 , 2 Jean-Michel MARIN 3 , Christian P.

ROBERT 1 , François RODOLPHE 2

1

Cérémade, Université Paris Dauphine et Laboratoire de statistique, CREST-INSEE

2

Unite Mathématique, Informatique et Génome, INRA

3

INRIA Saclay

MIEP

Hameau de l’Etoile, june 2008

(2)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(3)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Aim

Predict the tridimensional structure of the protein

Knowning amino acid sequence

Met Thr Gln Cys · · · ·

(4)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Existing methods

Experimental methods :

• X-ray cristallography

• Nuclear magnetic resonance spectroscopy

• Cryomicroscopy

# Expensive and slow, but provide the exact 3D structure

Computational methods :

• Based on homologies with proteins of known structure : methods based on sequence similarity, protein threading

De novo prediction

# Gives several possible 3D structures, with no criterion of

choice

(5)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Purpose : build a ranking method based on a phylogenetic stability criterion

• In a 3D structure, amino acids in contact frequently have similar modification tolerances

Criterion : selective pressure sequence

(6)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Data

First step : Estimate the selective pressure sequence

ω 1 , ...ω

n

on a multiple aligment of homologs

Seq : caa agg tgc tta H1 : cat agg tgc gta H2 : cat tgg tgc cta H3 : aat tgg tgc ctg

ω

1

ω

2

ω

3

ω

4

m folding candidates

or

(7)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Statistical tools

• Markov random fields

• ABC (Approximate Bayesian Computation)

• Bayesian model choice

(8)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(9)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Definition

• Markov chain :

• Markov random field : Markov chain generalisation

(10)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Definition (2)

State at a point i only depends on the state of its neighbours n ( i ) :

π( x

i

= j | x

i

) = π( x

i

= j | z

n

(

i

) )

• Hammersley-Clifford theorem :

P ( X = x ) = 1

Z exp (− U ( x )) with

U ( x ) : potential

U ( x ) = ∑

c

C

V

c

( x )

U ( x ) = −θ ∑

(

i

,

j

):

i

sj

1 {

xi

=

xj

}

Z : normalizing constant Z = ∑

x

exp (− U ( x ))

(11)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(12)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution

Model choice Simulations Conclusion

Bayesian modelisation

• Prior distribution :

• (θ) ∼ π(θ)

• Likelihood : ( X |θ) ∼ MRF (θ)

f ( x |θ) = 1

Z θ exp (θ ∑

(

i

,

j

):

i

j

1 {

xi

=

xj

} )

# Target : Posterior distribution of θ

(13)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution

Model choice Simulations Conclusion

Parameter posterior distribution

MCMC methods :

Hastings-Metropolis algorithm :

• Proposal : θ 0p0(

t

) )

• θ (

t

+

1

) = θ 0 with probability min { 1 ,

1 Zθ0

q θ

0

( X )

1

Zθ(t)

q θ

(t)

( X )

p(

t

)0 ) p0(

t

) )

π(θ 0 ) π(θ (

t

) ) }

# Ratio involves intractable normalizing constants Z θ

0

and Z θ

(t)

(14)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution

Model choice Simulations Conclusion

ABC : Approximate Bayesian Computation

• Bayesian inference without using likelihood

Idea : Data sufficiently close provide similar parameter posterior distribution

• What we need :

Simulate data given parameter values

Summary statistics (sufficient)

Calculate closeness between our data (X

0

) and simulated

data (X

i

) : distance between summary statistics

(15)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution

Model choice Simulations Conclusion

ABC Algorithm

Sufficient statistic : S ( X ) = ∑ (

i

,

j

):

i

j

1 {

xi

=

xj

}

Distance : d ( S ( X

0

), S ( X

i

)) = ( S ( X

0

) − S ( X

i

))

2

Algorithm :

• Generate θ

i

∼ π(θ

i

)

• Generate ( X

i

) ∼ MRF

i

)

Calculate d

i

= d ( S ( X

0

), S ( X

i

))

• Accept θ

i

if d

i

< ε

Result : sample of independent draws from f (θ| d < ε)

# Good approximation of f (θ| X 0 )

• In practice, ε is a 1 % quantile of d

(16)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution

Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(17)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice

Simulations Conclusion

Bayesian hierarchical modelisation

1 model ←→ 1 neighborhood / 3D structure

• Prior distributions :

s ∼ π( s )

• (θ

s

| s ) ∼ π

s

s

)

• Likelihood :

( X

s

, s ) ∼ MRF

s

, s ) f

s

( x

s

) = 1

Z θ

s

,

s

exp

s

(

i

,

j

):

i

sj

1 {

xi

=

xj

} )

(18)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice

Simulations Conclusion

Bayes factor definition

BF 0 / 1 =

P

(

s

= 0 |

X

)

P

(

s

= 1 |

X

)

P

(

s

= 0 )

P

(

s

= 1 )

=

R f 0 ( X |θ 0 )π 0 (θ 0 ) d θ 0

R f 1 ( X |θ 1 )π 1 (θ 1 ) d θ 1

Interpretation :

BF > 1 : Model 0

BF < 1 : Model 1

• Jeffreys scale :

< 10

2

[ 10

2

, 10

3/2

] [ 10

3/2

, 10

1

] [ 10

1

, 10

1/2

]

> 10

2

[ 10

3/2

, 10

2

] [ 10

1

, 10

3/2

] [ 10

1/2

, 10

1

]

decisive very hard hard substantial

(19)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice

Simulations Conclusion

Another way to write the Bayes factor

P ( S

i

( X ) = s

i

) = ∑

X

:

Si

(

X

)=

s

f

i

( X

i

)

= 1

Z θ

i

,

i

exp

i

s ) card { X : S

i

( X ) = s }

BF

0

/

1

= card { X : S

1

( X ) = s

1

} card { X : S

0

( X ) = s

0

}

R P ( S

0

( X ) = s

0

0

0

0

) d θ

0

R P ( S

1

( X ) = s

1

1

1

1

) d θ

1

(20)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice

Simulations Conclusion

ABC algorithm

Vector of summary statistics : S ( X ) = S

1

( X ), .. S

m

( X ) avec S

s

( X ) = ∑ (

i

,

j

):

i

sj

1 {

xi

=

xj

}

Distance : d ( S ( X

0

), S ( X

i

)) = ∑

s

( S

s

( X

0

) − S

s

( X

i

))

2

Algorithm :

• Generate s

i

∼ π( s )

• Generate (θ

si

| s

i

) ∼ π

si

si∗

)

• Generate ( X

si∗

,

i

) ∼ MRF

si∗

,

i

)

Calculate d

i

= d ( S ( X

0

), S ( X

i

))

• Accept ( s

i

si∗

) if d

i

< ε

(21)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice

Simulations Conclusion

• Result : (( s

i

∗ , θ

si∗

)

i

)

# card ( s

i

= 0 )

card ( s

i

= 1 ) estimate of

R P ( S

0

( X ) = s

0

0

0

0

) d θ

0

R P ( S

1

( X ) = s

1

1

1

1

) d θ

1

• Calculate card { X : S 1 ( X ) = s 1 }

card { X : S 0 ( X ) = s 0 } to obtain BF c

(22)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(23)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Models

M0 : iid case, Bernouilli ( p )

f

0

( X |θ, m = 0 ) =

Z1

θ,0

exp ( θ∑

i

1 {

xi

=

1

} )

p =

1

+

expexp

(θ) (θ)

M1 : Markov chain with transition matrix P

f ( X |θ, 1 ) =

exp

(

2

θ ∑

n−1

i=11{xi=xi+1}

)

2

(

1

+

exp

(

2

θ))

n−1

P =

exp

(

2

θ)

1

+

exp

(

2

θ)

1

+

exp1

(

2

θ)

1

1

+

exp

(

2

θ)

exp

(

2

θ)

1

+

exp

(

2

θ)

!

(24)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Parameter estimation

• Comparison ML/ABC estimates : | b θ

abc

− b θ

mv

|

1stQu .

Median Mean

3rdQu .

Var

1 . 68e − 03 3 . 28e − 03 3 . 27e − 03 5 . 001e − 03 3 . 89e − 06

(25)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Model choice

BF c = C

s1 n

− 1

C

ns0

card ( s

i

∗ = 0 ) card ( s

i

= 1 )

BF =

R

exp

0S0

(

X

))

( 1 +

exp

0

))

n

π 0 (θ 0 ) d θ 0

R

exp

( 2 θ

1S1

(

X

))

( 1 +

exp

( 2 θ

1

))

n−1

π 1 (θ 1 ) d θ 1

• Comparison on 10.000 simulated data : BF

BF c

M0 ? M1

M0 4684 2 30

? 158 339 53

M1 5 261 4464

(26)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(27)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Conclusion

Conclusion

• Parameter estimation in MRF is not too expensive using ABC

• BF is a good way to choose between some neighborhoods

Perspectives :

• Estimate / calculate the number of configurations given a value of S

• Apply on biological data

(28)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Figure

Updating...

Références

Sujets connexes :