• Aucun résultat trouvé

Using selective pressure to improve protein tridimensional structure prediction

N/A
N/A
Protected

Academic year: 2022

Partager "Using selective pressure to improve protein tridimensional structure prediction"

Copied!
28
0
0

Texte intégral

(1)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Using selective pressure to improve protein tridimensional structure prediction

Aude GRELAUD 1 , 2 Jean-Michel MARIN 3 , Christian P.

ROBERT 1 , François RODOLPHE 2

1

Cérémade, Université Paris Dauphine et Laboratoire de statistique, CREST-INSEE

2

Unite Mathématique, Informatique et Génome, INRA

3

INRIA Saclay

MIEP

Hameau de l’Etoile, june 2008

(2)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(3)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Aim

Predict the tridimensional structure of the protein

Knowning amino acid sequence

Met Thr Gln Cys · · · ·

(4)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Existing methods

Experimental methods :

• X-ray cristallography

• Nuclear magnetic resonance spectroscopy

• Cryomicroscopy

# Expensive and slow, but provide the exact 3D structure

Computational methods :

• Based on homologies with proteins of known structure : methods based on sequence similarity, protein threading

De novo prediction

# Gives several possible 3D structures, with no criterion of

choice

(5)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Purpose : build a ranking method based on a phylogenetic stability criterion

• In a 3D structure, amino acids in contact frequently have similar modification tolerances

Criterion : selective pressure sequence

(6)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Data

First step : Estimate the selective pressure sequence

ω 1 , ...ω

n

on a multiple aligment of homologs

Seq : caa agg tgc tta H1 : cat agg tgc gta H2 : cat tgg tgc cta H3 : aat tgg tgc ctg

ω

1

ω

2

ω

3

ω

4

m folding candidates

or

(7)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Statistical tools

• Markov random fields

• ABC (Approximate Bayesian Computation)

• Bayesian model choice

(8)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(9)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Definition

• Markov chain :

• Markov random field : Markov chain generalisation

(10)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Definition (2)

State at a point i only depends on the state of its neighbours n ( i ) :

π( x

i

= j | x

i

) = π( x

i

= j | z

n

(

i

) )

• Hammersley-Clifford theorem :

P ( X = x ) = 1

Z exp (− U ( x )) with

U ( x ) : potential

U ( x ) = ∑

c

C

V

c

( x )

U ( x ) = −θ ∑

(

i

,

j

):

i

sj

1 {

xi

=

xj

}

Z : normalizing constant Z = ∑

x

exp (− U ( x ))

(11)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(12)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution

Model choice Simulations Conclusion

Bayesian modelisation

• Prior distribution :

• (θ) ∼ π(θ)

• Likelihood : ( X |θ) ∼ MRF (θ)

f ( x |θ) = 1

Z θ exp (θ ∑

(

i

,

j

):

i

j

1 {

xi

=

xj

} )

# Target : Posterior distribution of θ

(13)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution

Model choice Simulations Conclusion

Parameter posterior distribution

MCMC methods :

Hastings-Metropolis algorithm :

• Proposal : θ 0p0(

t

) )

• θ (

t

+

1

) = θ 0 with probability min { 1 ,

1 Zθ0

q θ

0

( X )

1

Zθ(t)

q θ

(t)

( X )

p(

t

)0 ) p0(

t

) )

π(θ 0 ) π(θ (

t

) ) }

# Ratio involves intractable normalizing constants Z θ

0

and Z θ

(t)

(14)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution

Model choice Simulations Conclusion

ABC : Approximate Bayesian Computation

• Bayesian inference without using likelihood

Idea : Data sufficiently close provide similar parameter posterior distribution

• What we need :

Simulate data given parameter values

Summary statistics (sufficient)

Calculate closeness between our data (X

0

) and simulated

data (X

i

) : distance between summary statistics

(15)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution

Model choice Simulations Conclusion

ABC Algorithm

Sufficient statistic : S ( X ) = ∑ (

i

,

j

):

i

j

1 {

xi

=

xj

}

Distance : d ( S ( X

0

), S ( X

i

)) = ( S ( X

0

) − S ( X

i

))

2

Algorithm :

• Generate θ

i

∼ π(θ

i

)

• Generate ( X

i

) ∼ MRF

i

)

Calculate d

i

= d ( S ( X

0

), S ( X

i

))

• Accept θ

i

if d

i

< ε

Result : sample of independent draws from f (θ| d < ε)

# Good approximation of f (θ| X 0 )

• In practice, ε is a 1 % quantile of d

(16)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution

Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(17)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice

Simulations Conclusion

Bayesian hierarchical modelisation

1 model ←→ 1 neighborhood / 3D structure

• Prior distributions :

s ∼ π( s )

• (θ

s

| s ) ∼ π

s

s

)

• Likelihood :

( X

s

, s ) ∼ MRF

s

, s ) f

s

( x

s

) = 1

Z θ

s

,

s

exp

s

(

i

,

j

):

i

sj

1 {

xi

=

xj

} )

(18)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice

Simulations Conclusion

Bayes factor definition

BF 0 / 1 =

P

(

s

= 0 |

X

)

P

(

s

= 1 |

X

)

P

(

s

= 0 )

P

(

s

= 1 )

=

R f 0 ( X |θ 0 )π 0 (θ 0 ) d θ 0

R f 1 ( X |θ 1 )π 1 (θ 1 ) d θ 1

Interpretation :

BF > 1 : Model 0

BF < 1 : Model 1

• Jeffreys scale :

< 10

2

[ 10

2

, 10

3/2

] [ 10

3/2

, 10

1

] [ 10

1

, 10

1/2

]

> 10

2

[ 10

3/2

, 10

2

] [ 10

1

, 10

3/2

] [ 10

1/2

, 10

1

]

decisive very hard hard substantial

(19)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice

Simulations Conclusion

Another way to write the Bayes factor

P ( S

i

( X ) = s

i

) = ∑

X

:

Si

(

X

)=

s

f

i

( X

i

)

= 1

Z θ

i

,

i

exp

i

s ) card { X : S

i

( X ) = s }

BF

0

/

1

= card { X : S

1

( X ) = s

1

} card { X : S

0

( X ) = s

0

}

R P ( S

0

( X ) = s

0

0

0

0

) d θ

0

R P ( S

1

( X ) = s

1

1

1

1

) d θ

1

(20)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice

Simulations Conclusion

ABC algorithm

Vector of summary statistics : S ( X ) = S

1

( X ), .. S

m

( X ) avec S

s

( X ) = ∑ (

i

,

j

):

i

sj

1 {

xi

=

xj

}

Distance : d ( S ( X

0

), S ( X

i

)) = ∑

s

( S

s

( X

0

) − S

s

( X

i

))

2

Algorithm :

• Generate s

i

∼ π( s )

• Generate (θ

si

| s

i

) ∼ π

si

si∗

)

• Generate ( X

si∗

,

i

) ∼ MRF

si∗

,

i

)

Calculate d

i

= d ( S ( X

0

), S ( X

i

))

• Accept ( s

i

si∗

) if d

i

< ε

(21)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice

Simulations Conclusion

• Result : (( s

i

∗ , θ

si∗

)

i

)

# card ( s

i

= 0 )

card ( s

i

= 1 ) estimate of

R P ( S

0

( X ) = s

0

0

0

0

) d θ

0

R P ( S

1

( X ) = s

1

1

1

1

) d θ

1

• Calculate card { X : S 1 ( X ) = s 1 }

card { X : S 0 ( X ) = s 0 } to obtain BF c

(22)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(23)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Models

M0 : iid case, Bernouilli ( p )

f

0

( X |θ, m = 0 ) =

Z1

θ,0

exp ( θ∑

i

1 {

xi

=

1

} )

p =

1

+

expexp

(θ) (θ)

M1 : Markov chain with transition matrix P

f ( X |θ, 1 ) =

exp

(

2

θ ∑

n−1

i=11{xi=xi+1}

)

2

(

1

+

exp

(

2

θ))

n−1

P =

exp

(

2

θ)

1

+

exp

(

2

θ)

1

+

exp1

(

2

θ)

1

1

+

exp

(

2

θ)

exp

(

2

θ)

1

+

exp

(

2

θ)

!

(24)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Parameter estimation

• Comparison ML/ABC estimates : | b θ

abc

− b θ

mv

|

1stQu .

Median Mean

3rdQu .

Var

1 . 68e − 03 3 . 28e − 03 3 . 27e − 03 5 . 001e − 03 3 . 89e − 06

(25)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Model choice

BF c = C

s1 n

− 1

C

ns0

card ( s

i

∗ = 0 ) card ( s

i

= 1 )

BF =

R

exp

0S0

(

X

))

( 1 +

exp

0

))

n

π 0 (θ 0 ) d θ 0

R

exp

( 2 θ

1S1

(

X

))

( 1 +

exp

( 2 θ

1

))

n−1

π 1 (θ 1 ) d θ 1

• Comparison on 10.000 simulated data : BF

BF c

M0 ? M1

M0 4684 2 30

? 158 339 53

M1 5 261 4464

(26)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

1 Context

2 Markov random fields

3 Parameter posterior distribution

4 Model choice

5 Simulations

6 Conclusion

(27)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Conclusion

Conclusion

• Parameter estimation in MRF is not too expensive using ABC

• BF is a good way to choose between some neighborhoods

Perspectives :

• Estimate / calculate the number of configurations given a value of S

• Apply on biological data

(28)

improve protein tridimensional structure prediction

Aude GRELAUD

Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Références

Documents relatifs

In section 2 we describe the template model based on secondary structure definition, the scoring function based on knowledge-based potentials and pseudo-energies, the search

Several methodologies can be carried out to obtain an LTI model. In this work, special focus is given on improved frequency analysis [6], for obtaining

melles plus complexes qui sont soit en zigzag, dans le sens de la profondeur (= dans la profondeur de la gomme) avec un effet autoblocant, soit en Y favorisant l'adhérence du

indiquant la durée de chaque régime. b- Déduire la valeur de E. c- Déterminer la valeur de la constante de temps   par deux méthodes. d- En déduire la valeur de capacité

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

The nuclear modification factor RAA relative to p+p collisions shows a strong suppression in central Au+Au collisions, indicating substantial energy loss of heavy quarks in the

It appears that, for those companies or their clients that employ welding and related technologies in mining, about 48% of them are involved in weld surfacing for repair,

maintenance for municipal roads, says Mike Sheflin, who chairs the technical committee on municipal roads and sidewalks for the National Guide to Sustainable Municipal