
Some physical approaches to protein folding


HAL Id: jpa-00246718

https://hal.archives-ouvertes.fr/jpa-00246718

Submitted on 1 Jan 1993


J. Bascle, T. Garel, Henri Orland

To cite this version:

J. Bascle, T. Garel, Henri Orland. Some physical approaches to protein folding. Journal de Physique I, EDP Sciences, 1993, 3 (2), pp.259-275. 10.1051/jp1:1993128. jpa-00246718

J. Phys. I France 3 (1993) 259-275, February 1993, page 259

Classification
Physics Abstracts
05.90 61.40D 87.10

Some physical approaches to protein folding

J. Bascle, T. Garel and H. Orland

Service de Physique Théorique (*), CE-Saclay, 91191 Gif-sur-Yvette Cedex, France

(Received 15 May 1992, accepted 5 June 1992)

Résumé (translated from the French). Protein folding is a problem with many biological implications. In this article we present a physicist's point of view in two different ways. We first introduce simple statistical mechanics models which exhibit, in the thermodynamic limit, folding transitions. These models can be divided into (i) spin glasses (possibly à la Mattis), where one can look for correlations between the intrachain interactions and the folded structure, and (ii) glasses, where the emphasis is on the geometrical competition between one- or two-dimensional local order (which models α-helix or β-sheet structures) and the global constraint of compactness. Both types of model are too simple for the study of real proteins, but they should apply to the glass transition, collapsed polymers, ... The second line of study is a Monte-Carlo method in which the protein is grown atom by atom (or residue by residue), using a given form of the total energy of the protein (CHARMM, ...). This method can then be compared with other numerical methods; we thus compare our results with molecular dynamics calculations for the case of poly-alanines. This twofold approach is a good illustration of the difficulties encountered in the protein folding problem (many metastable states, ...).

Abstract. To understand how a protein folds is a problem which has important biological implications. In this article, we would like to present a physics-oriented point of view, which is twofold. First of all, we introduce simple statistical mechanics models which display, in the thermodynamic limit, folding and related transitions. These models can be divided into (i) crude spin glass-like models (with their Mattis analogs), where one may look for possible correlations between the chain self-interactions and the folded structure, (ii) glass-like models, where one emphasizes the geometrical competition between one- or two-dimensional local order (mimicking α-helix or β-sheet structures), and the requirement of global compactness. Both models are too simple to predict the spatial organization of a realistic protein, but are useful for the physicist and should have some feedback in other glassy systems (glasses, collapsed polymers, ...). These remarks lead us to the second physical approach, namely a new Monte-Carlo method, where one grows the protein atom-by-atom (or residue-by-residue), using a standard form (CHARMM, ...) for the total energy. A detailed comparison with other Monte-Carlo schemes, or Molecular Dynamics calculations, is then possible; we will sketch such a comparison for poly-alanines. Our twofold approach illustrates some of the difficulties one encounters in the protein folding problem, in particular those associated with the existence of a large number of metastable states.

(*) Laboratoire de la Direction des Sciences de la Matière du Commissariat à l'Energie Atomique

1. Introduction.

Proteins are weakly branched polymers, built out of twenty species of monomers (amino acids). They have the property of folding into an (almost) unique compact native structure, which is the biologically interesting object [1]. The compactness is largely due to the existence of the hydrophobic amino acid residues, since these biological objects are usually designed to work in water. Both the compactness and the chemical heterogeneity of a given protein tend to slow down dynamical processes, and the question periodically arises as to whether the protein folding problem is under thermodynamic or kinetic control. This question is not unfamiliar in the physics of glassy systems, where the same problem of a very rugged phase space is present.

In physical terms, the frustration in a protein can be naively described in two different ways:

(i) the energy of a protein is the sum of bonded (geometrical) and non-bonded (Coulomb, van der Waals) terms, which cannot be simultaneously satisfied;

(ii) experimentally (crystallography, NMR), a folded protein has a local order due to hydrogen bonds. This order is, roughly speaking, of one-dimensional (α-helix) or two-dimensional (β-sheet) nature and is therefore incompatible with the requirement of global compactness.

In this chapter, we shall follow the thermodynamic approach to folding; simple statistical mechanics models will be studied for points (i) and (ii). For the former, one is naturally led to draw a parallel with the spin glass problem (the quenched disorder being linked to the primary structure), whereas the latter is more akin to glasses. Both of these approaches have interesting outputs. The spin glass case [2] and its Mattis analogs [3] suggest a connection with the physics [4] of neural networks, the Hopfield model, ..., for the major unsolved problem of the coding of the tertiary structure in the primary structure. The "glassy glass" case [5] points towards interesting differences between helices and sheets, and revives Flory-like models of polymer melting, as well as the Gibbs-DiMarzio theory of the glass transition [6, 7].

On a more realistic level, we also wish to benefit from the biologists' experience with their complicated systems. It is therefore necessary to go beyond the above qualitative picture and study well-defined entities. We have therefore devised a new Monte-Carlo (MC) method to generate a Boltzmann ensemble of configurations of a protein. This method uses an empirical form for the total energy of the protein, and may be loosely described as an atom-by-atom growth of the protein, in marked contrast to other MC methods [8] or to Molecular Dynamics (MD) calculations [9]. This growth procedure was introduced [10] to explore efficiently the rugged landscape of a protein phase space, and may be coupled to more traditional techniques (simulated annealing [11], minimization procedures, ...). We have tested [12] the method on peptides with a small number $N$ of atoms (alanine dipeptide ($N = 22$), penta-alanine ($N = 53$)), by comparing the energy minima with those obtained by MD simulations (CHARMM). A short review of the same comparison [13a] with hepta-alanine ($N = 73$) is given below, together with preliminary results [13b] on twenty-alanine ($N = 203$).

At this point a caveat seems in order: all the methods, including ours, are faced with the problem of the solvent, and use at best an effective energy function that accounts only crudely for the effect of the water molecules (for instance, in the collapsed regime, a one-hundred-residue protein has something like half of its atoms on the surface). Our simulations are usually done with a dielectric constant equal to unity (vacuum-type calculations).

The layout of the paper is as follows. Section 2 briefly deals with the spin glass-like approach to the folding transition, as well as the related Mattis models (coding à la Hopfield, ...). The "glassy glass" case is studied in section 3, where the link with the Flory-Gibbs-DiMarzio theory of the glass transition is discussed. We emphasize, in this context, the existence of a disorder point. Section 4 describes the new MC growth method and its application to poly-alanines (Sect. 5).

2. Spin glasses and folding transitions.

We model a protein as a chain of $N$ links (residues), where link $i$ (at $\mathbf{r}_i$) and link $j$ (at $\mathbf{r}_j$) interact through a potential $v_{ij}(\mathbf{r}_i - \mathbf{r}_j)$, which depends on the chemical nature of links $i$ and $j$. Physically, $v_{ij}$ is expected to be relatively short-ranged (screened Coulomb or van der Waals interactions, ...). We take for simplicity

$$v_{ij}(\mathbf{r}_i - \mathbf{r}_j) = v_{ij}\,\delta(\mathbf{r}_i - \mathbf{r}_j) \tag{1}$$

Two types of model can be studied [2, 3].

(i) The spin glass model: the interactions $\{v_{ij}\}$ are taken as independent random variables distributed, for instance, according to a Gaussian distribution

$$P(v_{ij}) = \frac{1}{\sqrt{2\pi v^2}}\,\exp\left(-\frac{(v_{ij} - v_0)^2}{2v^2}\right) \tag{2}$$

In equation (2), $v_0$ denotes the excluded volume effect (in appropriate units) and $v$ is the width of the distribution. A "biological" interpretation of this approach is the following: the interactions $\{v_{ij}\}$ between the same couples of residues, but at different places in the primary sequence, are totally uncorrelated because of their different environment. (One may also link this approach to the travelling salesman optimization problem [14].)

(ii) The separable model: the interactions $\{v_{ij}\}$ are taken as a sum of $M$ separable terms

$$v_{ij} = v_0 + \sum_{p=1}^{M} \xi_i^p\,\xi_j^p \tag{3}$$

where the $\{\xi_i^p\}$ are taken, for instance, as independent random variables with Gaussian distribution. Apart from the excluded volume effect, the "charges" ($\xi_i^p$; $p = 1, \ldots, M$) can represent the Coulomb charge, the hydrophobicity, the helix-forming or breaking tendency, ... The "biological" interpretation is opposite to the previous one: here, each residue is defined by $M$ independent "charges", or characters, which depend only on its chemical nature and not on its position along the primary sequence.

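In the continuous limit, the partition function of these models reads [16]

$$Z = \int \mathcal{D}\mathbf{r}(s)\,\exp\left(-\frac{d}{2}\int_0^S ds\,\dot{\mathbf{r}}^2(s) - \frac{\beta}{2}\int_0^S\!\!\int_0^S ds\,ds'\,v(s,s')\,\delta(\mathbf{r}(s) - \mathbf{r}(s'))\right) \times \exp\left(-\frac{w}{6}\int_0^S\!\!\int_0^S\!\!\int_0^S ds\,ds'\,ds''\,\delta(\mathbf{r}(s) - \mathbf{r}(s'))\,\delta(\mathbf{r}(s') - \mathbf{r}(s''))\right) \tag{4}$$

The last term, a three-body repulsion of strength $w$, is included to avoid a total collapse of the chain. As usual, the parameters in equation (4) are the space dimension $d$, the inverse temperature $\beta = 1/T$, and $S = Na^2$ (where $a$ is the common length of the links).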

Introducing replicas [4] to perform the quenched averaging over the disordered interactions $\{v(s,s')\}$, we get the following results [2, 3]: there are three phases, namely a high temperature coil state, an intermediate temperature collapsed phase with a macroscopic entropy (similar to a polymer below the $\theta$ point), and finally a low temperature collapsed frozen phase. Detailing the above models, we have:

(i) The spin glass model: the low temperature phase is a Potts glass with $p \to \infty$ states [16], at least for high enough dimensions. The (mean field) order parameter of the freezing transition is

$$Q_{\alpha\beta}(\mathbf{r}, \mathbf{r}') = 2\beta v \int_0^S ds\,\langle \delta(\mathbf{r} - \mathbf{r}_\alpha(s))\,\delta(\mathbf{r}' - \mathbf{r}_\beta(s)) \rangle \tag{5}$$

where $1 \le \alpha < \beta \le n$ (and $n \to 0$) label replicas (the $\beta$ in the prefactor is the inverse temperature), and $\langle \cdot \rangle$ denotes a thermal average with respect to the replicated Hamiltonian of equation (4). Alternatively, the freezing transition can be studied by the overlaps of two (real) copies of the system [17]. As in Ising spin glasses [4], one may argue that there are few dominant states in the system, which could be interpreted in terms of a few dominant folded structures. Numerical calculations along these lines, including dynamics, have been recently reported [18]. Note that, by construction, these "protein models" possess ultrametric properties [19]. For a more realistic case, see section 4.

To conclude on this type of approach, it is interesting to point out that a variational function introduced by Shakhnovich and Gutin [20] in the context of proteins has been used in other disordered solid state situations [21].

(ii) The separable model: when one performs the quenched average over the $\{\xi_p\}$ in equation (4), some mean field order parameters appear naturally in the system, such as

$$m_{p,\alpha}(\mathbf{r}) = \int_0^S ds\,\overline{\xi_p(s)\,\langle \delta(\mathbf{r} - \mathbf{r}_\alpha(s)) \rangle} \tag{6}$$

In equation (6), $\alpha$ is an unimportant replica index (to be omitted from now on); $\langle \cdot \rangle$ and the overbar denote respectively thermal and disorder averages. These order parameters à la Hopfield [22] are a measure of the correlation between the chemical nature of a link (characterized by $\{\xi_p(s)\}$) and its position $\mathbf{r}$ in space. There is a Mattis-like freezing phase transition where some $\{m_p(\mathbf{r});\ p = 1, 2, \ldots, M_0\}$ condense. For these $M_0$ characters, the primary sequence codes for the spatial structure of the chain. If there is only one "charge" or character (e.g. hydrophobicity), the folding transition will translate into a spatial separation between hydrophobic and hydrophilic links. In general, a Mattis-like transition implies a single dominant spatial structure, with a large number of metastable states [22(b)].

Note that when $M$ increases, at fixed $N$, we expect a smooth crossover from the separable to the spin glass model. From simple qualitative arguments [23], it can be inferred that in real proteins, one should have $M = 8$ relevant (and independent) characters for each residue. Thus for $\alpha = M/N$ small (i.e. long chains), one deals with the separable case, whereas for $\alpha$ larger (short chains), the glassy model is more appropriate. The critical $\alpha$ is of order [22, 23] $\alpha_c \simeq 0.1$ (which gives, in this model, a critical length of $N_c \simeq 80$). Similar coding schemes using Protein Data Banks have been studied by Wolynes and coworkers [24].

3. Glasses and folding transitions.

3.1 THE MODEL. The energetic frustration described above is not quite satisfactory, since there is no real disorder in proteins. We will see in section 4 that a commonly used form of the total energy of a protein is the sum of a bonded (geometric) part and of a non-bonded (Coulomb, van der Waals) part. In particular, the Coulomb part is responsible, in this formalism, for the formation of hydrogen bonds [25] that tend to locally stabilize one-dimensional (α-helix) or two-dimensional (β-sheet) structures (see figure 1). Since we know that the biologically active protein is compact [1], we are typically faced with the problem of geometrical frustration, where local and global orders are incompatible. This approach is familiar in glasses, where one tries to solve this contradiction by a mapping onto a curved space [26].

We choose here a thermodynamic approach and model, as an example, the α-helix case in the following way: we consider a $d$-dimensional hypercubic lattice of $N = L^d$ sites, with periodic boundary conditions, and its associated Hamiltonian paths. We recall that a Hamiltonian path visits all sites of the lattice once and only once. Hamiltonian paths have often been used to model collapsed polymer globules [15]. Following Flory [6a], we take each link of the Hamiltonian path to represent a helical turn. Since hydrogen bonds have a tendency to favor long helices, that is to align the links of our model, we attribute an energy penalty $\varepsilon$ to the breaking of a helix, that is, whenever the Hamiltonian path makes a turn (corner). This model has attracted a lot of attention in the theory of polymer melting [7, 27].

Fig. 1. Schematic representation of hydrogen bonds in (a) an α-helix, (b) an (antiparallel) β-sheet.

For simplicity, we consider closed paths, but, as is well known in polymer theory [16], boundary conditions play a role only in subdominant terms of the free energy. The partition function of the system, at inverse temperature $\beta = 1/T$, reads

$$Z = \sum_{\{\mathcal{H}\}} e^{-\beta\varepsilon N_c(\mathcal{H})} \tag{7}$$

where $\{\mathcal{H}\}$ denotes the ensemble of all Hamiltonian paths, and $N_c(\mathcal{H})$ denotes the number of corners present in path $\mathcal{H}$.
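As a concrete illustration of the sum in equation (7), the following Python sketch enumerates Hamiltonian paths by brute force and weights each one by its number of corners. It is only a toy under stated assumptions: it uses a small two-dimensional L x L grid with open boundaries and open, directed paths, rather than the closed paths on a periodic d-dimensional lattice considered in the text, and the function names (`hamiltonian_paths`, `corners`, `partition_function`) are introduced here purely for illustration.

```python
import math


def hamiltonian_paths(L):
    """Yield every directed Hamiltonian path on an open L x L square grid.

    Assumption for illustration: open boundaries and open paths,
    not the periodic lattice and closed paths of the text.
    """
    n_sites = L * L

    def neighbours(site):
        x, y = site
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < L and 0 <= ny < L:
                yield (nx, ny)

    def extend(path, visited):
        # Depth-first search: a path is complete once it visits every site.
        if len(path) == n_sites:
            yield tuple(path)
            return
        for nb in neighbours(path[-1]):
            if nb not in visited:
                path.append(nb)
                visited.add(nb)
                yield from extend(path, visited)
                path.pop()
                visited.remove(nb)

    for x in range(L):
        for y in range(L):
            start = (x, y)
            yield from extend([start], {start})


def corners(path):
    """Number of turns: places where two consecutive steps differ in direction."""
    count = 0
    for a, b, c in zip(path, path[1:], path[2:]):
        if (b[0] - a[0], b[1] - a[1]) != (c[0] - b[0], c[1] - b[1]):
            count += 1
    return count


def partition_function(L, beta, eps=1.0):
    """Z = sum over paths of exp(-beta * eps * N_c), as in equation (7)."""
    return sum(math.exp(-beta * eps * corners(p)) for p in hamiltonian_paths(L))
```

Each undirected path is generated once per orientation, which only multiplies Z by two. At β = 0 the function simply counts paths; as β grows, paths with many corners are suppressed and the sum is dominated by the straightest ("frozen") paths. Exhaustive enumeration is only feasible for very small grids, which is why the text resorts to field-theoretic approximations.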

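Following reference [28], one may rewrite $Z$ as

$$Z = \lim_{n\to 0}\frac{1}{n}\,\frac{\displaystyle\int \prod_{\mathbf{r}}\prod_{\alpha=1}^{d} d\boldsymbol{\varphi}_\alpha(\mathbf{r})\; e^{-A_G}\prod_{\mathbf{r}}\left(\frac{1}{2}\sum_{\alpha}\boldsymbol{\varphi}_\alpha^2(\mathbf{r}) + e^{-\beta\varepsilon}\sum_{\alpha<\alpha'}\boldsymbol{\varphi}_\alpha(\mathbf{r})\cdot\boldsymbol{\varphi}_{\alpha'}(\mathbf{r})\right)}{\displaystyle\int \prod_{\mathbf{r}}\prod_{\alpha=1}^{d} d\boldsymbol{\varphi}_\alpha(\mathbf{r})\; e^{-A_G}} \tag{8a}$$

with

$$A_G = \frac{1}{2}\sum_{\alpha=1}^{d}\sum_{\mathbf{r},\mathbf{r}'}\boldsymbol{\varphi}_\alpha(\mathbf{r})\,(\Delta^{\alpha})^{-1}_{\mathbf{r}\mathbf{r}'}\,\boldsymbol{\varphi}_\alpha(\mathbf{r}') \tag{8b}$$

where $\boldsymbol{\varphi}_\alpha(\mathbf{r})$ is an $n$-component ($n = 0$) real field, defined in each direction $\alpha = 1, \ldots, d$, attached to all points $\mathbf{r}$ of the lattice. The operator $\Delta^{\alpha}_{\mathbf{r}\mathbf{r}'}$ is 1 if $\mathbf{r}$ and $\mathbf{r}'$ are nearest neighbours in direction $\alpha$ and 0 otherwise; $(\Delta^{\alpha})^{-1}_{\mathbf{r}\mathbf{r}'}$ denotes its inverse. Using Wick's theorem and extracting, through the $n = 0$ trick [29], the contribution of all connected paths, it is easily shown that (8a) and (7) are equal.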

Note that in the above description, one does not consider the primary sequence anymore, in marked contrast to the approach of section 2. In the non-weighted problem ($\varepsilon = 0$), the saddle-point (SP) method of reference [28] yields

$$Z_{SP}(\varepsilon = 0) = \left(\frac{q}{e}\right)^N \tag{9}$$

where $q = 2d$ is the lattice coordination number and $e \simeq 2.71828\ldots$ Equation (9) is in excellent agreement with numerical data, in marked contrast to the "old" Flory theory [6a], which gives

$$Z_F(\varepsilon = 0) = \left(\frac{q - 1}{e}\right)^N \tag{10}$$

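3.2 THE HIGH TEMPERATURE ISOTROPIC APPROACH. We have extended the SP approach to the model defined in equations (8). We get

$$\sum_{\mathbf{r}'}(\Delta^{\alpha})^{-1}_{\mathbf{r}\mathbf{r}'}\,\boldsymbol{\varphi}_\alpha(\mathbf{r}') = \frac{\boldsymbol{\varphi}_\alpha(\mathbf{r}) + e^{-\beta\varepsilon}\sum_{\alpha'\neq\alpha}\boldsymbol{\varphi}_{\alpha'}(\mathbf{r})}{\frac{1}{2}\sum_{\gamma}\boldsymbol{\varphi}_\gamma^2(\mathbf{r}) + e^{-\beta\varepsilon}\sum_{\gamma<\gamma'}\boldsymbol{\varphi}_\gamma(\mathbf{r})\cdot\boldsymbol{\varphi}_{\gamma'}(\mathbf{r})} \tag{11}$$

At high temperature, it is natural to look for a homogeneous and isotropic solution, $\boldsymbol{\varphi}_\alpha(\mathbf{r}) = \boldsymbol{\varphi}$. We break the $O(n)$ symmetry by choosing $\boldsymbol{\varphi}$ in a given "direction", say 0, and obtain

$$\varphi_0^2 = \frac{4}{d} \tag{12a}$$

and

$$Z_{SP}(\varepsilon) = \left(\frac{q(\beta)}{e}\right)^N \tag{12b}$$

with

$$q(\beta) = 2 + 2(d - 1)\,e^{-\beta\varepsilon} \tag{12c}$$

The "old" Flory theory [6a] would yield similar results with

$$Z_F(\varepsilon) = \left(\frac{q_F(\beta)}{e}\right)^N \tag{13a}$$

where

$$q_F(\beta) = 1 + 2(d - 1)\,e^{-\beta\varepsilon} \tag{13b}$$

Both approaches have the following properties:

(i) there exists a temperature $T_G$ where the entropy vanishes. This remark, in the framework of the Flory theory, is the basis of the Gibbs-DiMarzio theory of the glass transition [6b];

(ii) before one reaches $T_G$, there is a first order freezing transition at $T_c$, such that

$$q(\beta_c) = e \tag{14}$$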

Fig. 2. Various approximations to the free energy of the glass model of [7] as a function of temperature. Curve (1) is the "old" Flory theory. Curve (2) is the low temperature anisotropic saddle point result (with the disorder point at $T_D \simeq 2.24\,\varepsilon$). Curve (3) is the high temperature isotropic saddle point result. In all cases, the transition occurs when the free energy vanishes.

The low temperature phase is frozen (Fig. 2), since it consists of fully stretched paths making turns at the surface. Using (12c) and (13b), we get, for $d = 3$,

$$T_c^{SP} \simeq 0.58\,\varepsilon \tag{15a}$$

for the SP approach to model (8) and

$$T_c^{F} \simeq 1.18\,\varepsilon \tag{15b}$$

for Flory's theory.

However, as pointed out by Gujrati and coworkers [7], such a freezing transition cannot be entirely correct, since the free energy may be shown to be strictly negative at low temperatures. This (slight) correction to the Flory freezing scenario comes from one-dimensional excitations that are not well treated in an isotropic SP approach.
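The two freezing temperatures quoted above follow from solving the condition $q(\beta_c) = e$ in closed form. The short Python sketch below does this, assuming the forms $q(\beta) = 2 + 2(d-1)e^{-\beta\varepsilon}$ for the saddle-point approach and $q_F(\beta) = 1 + 2(d-1)e^{-\beta\varepsilon}$ for Flory's theory, with temperatures measured in units where $\beta = 1/T$. The function name `freezing_temperature` and its arguments are chosen here for illustration only.

```python
import math


def freezing_temperature(d, flory=False, eps=1.0):
    """Solve q(beta_c) = e for T_c = 1 / beta_c.

    q(beta) = c + 2 (d - 1) exp(-beta * eps), with c = 2 for the
    saddle-point approach and c = 1 for the "old" Flory theory.
    """
    if d < 2:
        raise ValueError("the corner model needs d >= 2")
    c = 1.0 if flory else 2.0
    # x = exp(-eps / T_c) from c + 2 (d - 1) x = e
    x = (math.e - c) / (2.0 * (d - 1))
    if not 0.0 < x < 1.0:
        raise ValueError("no finite freezing temperature for these parameters")
    return -eps / math.log(x)


if __name__ == "__main__":
    print(f"T_c (SP, d=3)    = {freezing_temperature(3):.3f} eps")
    print(f"T_c (Flory, d=3) = {freezing_temperature(3, flory=True):.3f} eps")
```

For $d = 3$ this gives values that round to the $0.58\,\varepsilon$ and $1.18\,\varepsilon$ quoted in the text, which is a useful consistency check on the expressions for $q(\beta)$ and $q_F(\beta)$.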

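3.3 THE LOW TEMPERATURE ANISOTROPIC APPROACH. Considering the above-mentioned criticism of the isotropic SP approach, we have considered [30] an anisotropic approach to the model described in (8a) and (8b): we treat exactly one direction of the lattice, say 1, and treat the $(d - 1)$ remaining directions in a mean field (saddle-point) approach. Using the fact that the denominator of equation (8a) goes to one when $n$ goes to zero, we rewrite (8a) as

$$Z = \lim_{n\to 0}\frac{1}{n}\int \prod_{\mathbf{r}}\prod_{\alpha=1}^{d} d\boldsymbol{\varphi}_\alpha(\mathbf{r})\; e^{-A_G}\prod_{\mathbf{r}}\left(\frac{1}{2}\sum_{\alpha}\boldsymbol{\varphi}_\alpha^2(\mathbf{r}) + e^{-\beta\varepsilon}\sum_{\alpha<\alpha'}\boldsymbol{\varphi}_\alpha(\mathbf{r})\cdot\boldsymbol{\varphi}_{\alpha'}(\mathbf{r})\right) \tag{16}$$

which we approximate by

$$Z_1 \simeq \lim_{n\to 0}\frac{1}{n}\int \prod_{\mathbf{r}} d\boldsymbol{\varphi}_1(\mathbf{r})\; e^{-A_1 - \frac{(d-1)}{4}N\boldsymbol{\varphi}^2}\prod_{\mathbf{r}}\left(\frac{1}{2}\boldsymbol{\varphi}_1^2(\mathbf{r}) + A\,\boldsymbol{\varphi}_1(\mathbf{r})\cdot\boldsymbol{\varphi} + C\,\boldsymbol{\varphi}\cdot\boldsymbol{\varphi}\right) \tag{17}$$

where

$$A_1 = \frac{1}{2}\sum_{\mathbf{r},\mathbf{r}'}\boldsymbol{\varphi}_1(\mathbf{r})\,(\Delta^{1})^{-1}_{\mathbf{r}\mathbf{r}'}\,\boldsymbol{\varphi}_1(\mathbf{r}') \tag{18}$$

and

$$A = (d - 1)\,e^{-\beta\varepsilon} \tag{19}$$

and

$$C = \frac{d - 1}{2}\left(1 + (d - 2)\,e^{-\beta\varepsilon}\right) \tag{20}$$

In (17), $\boldsymbol{\varphi}$ is the (mean-field) value of $\boldsymbol{\varphi}_\alpha$, $\alpha \neq 1$.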

Integrating exactly (17) yields a free energy per site

$$\beta f_1 = \frac{(d - 1)}{4}\,\varphi^2 - \operatorname{Log}\left[\frac{1 + C\varphi^2 + \left((1 + C\varphi^2)^2 - 4(C - A^2)\varphi^2\right)^{1/2}}{2}\right] \tag{21}$$

Equation (21) exhibits, at $T' \simeq 0.68\,\varepsilon$ ($d = 3$), a first order transition (Fig. 2) between a frozen phase (crystal) with $\varphi = 0$ and a high temperature (liquid) phase with $\varphi \neq 0$. At this order, the free energy is zero in the frozen phase, but becomes negative if fluctuations (in $\varphi$) are taken into account [30]. In any case, the corrections to Flory's freezing picture are weak.

In the high temperature phase however, we have found [30] a disorder point of the second kind [31], where the nature of the correlations along direction 1 changes. The disorder point $T_D$ is given by

$$C = A^2 \tag{22}$$

For $d = 3$, we get $T_D \simeq 2.24\,\varepsilon$; in a polymeric chain, such a disorder point is likely to have more severe dynamic implications than in usual spin systems [32].
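The value of $T_D$ can be checked directly from the condition $C = A^2$. The Python sketch below assumes the coefficient forms $A = (d-1)e^{-\beta\varepsilon}$ and $C = \frac{d-1}{2}\left(1 + (d-2)e^{-\beta\varepsilon}\right)$, which is how equations (19) and (20) are read here, and $\beta = 1/T$. With $y = e^{-\varepsilon/T_D}$ the condition becomes the quadratic $2(d-1)y^2 - (d-2)y - 1 = 0$. The function name `disorder_temperature` is introduced for illustration.

```python
import math


def disorder_temperature(d, eps=1.0):
    """Temperature T_D at which C = A^2.

    Assumed forms (see lead-in):
        A = (d - 1) y,  C = (d - 1) / 2 * (1 + (d - 2) y),  y = exp(-eps / T).
    C = A^2 reduces to 2 (d - 1) y^2 - (d - 2) y - 1 = 0.
    """
    if d < 2:
        raise ValueError("the corner model needs d >= 2")
    # Positive root of the quadratic; it always lies in (0, 1) for d >= 2.
    y = ((d - 2) + math.sqrt((d - 2) ** 2 + 8 * (d - 1))) / (4 * (d - 1))
    return -eps / math.log(y)


if __name__ == "__main__":
    print(f"T_D (d=3) = {disorder_temperature(3):.3f} eps")
```

For $d = 3$ the root is $y = (1 + \sqrt{17})/8$, which gives a temperature that rounds to the $2.24\,\varepsilon$ quoted in the text.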

3.4 CONCLUSION. We have also considered [33] the case of β sheets and found similar conclusions. The isotropic SP approach should be "better" in this case, since two-dimensional long range order may exist at finite temperature: we get a first order freezing transition à la Flory.

The results of these geometrically frustrated models should be relevant for other thermodynamic systems, such as glasses [34], polyelectrolytes in a bad solvent [35], chiral liquid crystals [36], ... For instance, it is rather tempting, in the case of glasses, to identify the low temperature phase as the (unreachable) crystal phase, and to link the disorder point with the glass transition. In the case of proteins however, one deals with finite systems: we thus cautiously identify the low temperature phase as the native structure, whereas the high temperature phase looks like a "molten globule" [37]. We now consider a more "realistic" approach, which will allow us to benefit from the biologists' experience with these rather complicated systems.

4. The Monte Carlo growth method.

4.1 INTRODUCTION. As previously mentioned, one of the main difficulties of the protein folding problem is the existence, in phase space, of a large number of local minima. Traditional single-move MC methods are therefore doomed to fail, as large collective motions will be necessary to "untrap" the chain. One may improve these methods by using simulated annealing procedures, or any other minimization scheme. We have chosen to devise a new MC method, where one grows an ensemble of chains atom-by-atom (or residue-by-residue), replicating and deleting chains so as to generate an ensemble that obeys Boltzmann statistics. (Note that there are other methods of growing chains atom by atom [38].) A central idea in this method is to avoid going over large energy barriers (as in MC methods where the chain is completed), and to go around them instead. As far as comparison with MD calculations is concerned, our method does not assume any particular guess for the initial state. We will illustrate the method for the case of linear polymers [10] and its application to poly-alanines [13a, b].

Fig. 3. Ramachandran plots for the third residue of a hepta-alanine chain: (a) MC results, (b) MD results.

4.2 DESCRIPTION FOR THE CASE OF LINEAR POLYMERS. In this section we recall the principles on which the method is based. For simplicity, we shall illustrate it on the case of linear polymers [10].

Our aim is to construct a Boltzmann ensemble of chains, that is, a statistical ensemble of $M$ chains such that the probability of finding a chain of energy $E$ in the ensemble is proportional to its Boltzmann weight $e^{-\beta E}/Z$, where $\beta = 1/k_B T$ and $Z$ is a normalization factor, i.e. the partition function of the ensemble. In other words, the number of chains of energy $E$ in the ensemble should be $M e^{-\beta E}/Z$. Since $M/Z$ is a constant independent of $E$, we shall say that a chain of energy $E$ should be replicated a number of times proportional to $e^{-\beta E}$ in the ensemble.

To generate these chains, we use a recursive procedure. Assume that we have a Boltzmann population of chains of size $n$. In order to obtain a Boltzmann population of chains of size $n + 1$, we add one atom to each of the previously generated chains of size $n$, and replicate the new chain a number of times proportional to $e^{-\beta\Delta E}$, where $\Delta E$ is the energy cost of adding the last atom.

To illustrate the method in more detail, we assume that the partition function of the chain is

$$Z = \int \prod_{i=2}^{N} d^3 r_i\,\exp\left(-\frac{\beta k_b}{2}\sum_{i=1}^{N-1}\left(|\mathbf{r}_{i+1} - \mathbf{r}_i| - a\right)^2 - \frac{\beta}{2}\sum_{i\neq j} v(\mathbf{r}_i, \mathbf{r}_j)\right) \tag{23}$$

where $\mathbf{r}_1 = 0$, $\mathbf{r}_i$ being the position of the $i$-th atom in the chain. The first term represents the elastic energy of a link (of average length $a$ and elastic constant $k_b$), and $v$ is a 2-body potential acting between the atoms. We have deliberately used a simple form for the energy in (23), but the generalization to a peptide chain is easily performed, as discussed in section 5 below.

4.3 REPLICATION-DELETION PROCEDURE. We start with an ensemble of $M_1$ atoms ($n = 1$) at $\mathbf{r}_1 = 0$. Each of these is a seed for a chain.

To build chains of length $n = 2$, for each of the $M_1$ seeds, we draw randomly a position $\mathbf{r}_2$. The Boltzmann weight associated with the configuration $(\mathbf{r}_1, \mathbf{r}_2)$ is proportional to

$$w_2(\mathbf{r}_2 | \mathbf{r}_1) = \exp\left(-\frac{\beta k_b}{2}\left(|\mathbf{r}_2 - \mathbf{r}_1| - a\right)^2 - \beta\,v(\mathbf{r}_2, \mathbf{r}_1)\right) \tag{24}$$

In order to obtain a population of chains obeying the Boltzmann distribution, we must replicate each $(\mathbf{r}_1, \mathbf{r}_2)$-chain a number $w_2(\mathbf{r}_2 | \mathbf{r}_1)$ of times. Since $w_2$ is not an integer, the replication is actually done in the following way:

Define $i_2 = \mathrm{Int}(w_2)$, the integer part of $w_2$, and $\rho_2 = w_2 - i_2 < 1$, the rest. Then, replicating statistically $w_2$ times means replicating $i_2$ times, plus one additional time with probability $\rho_2$. That is to say, one randomly generates a number $0 < r < 1$. If $r > \rho_2$, the chain is replicated $i_2$ times. Otherwise, it is replicated $(i_2 + 1)$ times. Since $w_2$ can be smaller than 1, the replication can in fact amount to a deletion, in which case the chain is no longer considered in future calculations. For this reason we call this a replication-deletion procedure (RDP). Once the RDP has been applied to each chain, we obtain a Boltzmann-distributed population of $M_2$ chains of two atoms.
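The statistical replication step described above is a form of stochastic rounding, and can be sketched in a few lines of Python. The function name `replication_count` and its `rng` argument are introduced here for illustration; the only assumption is that the weight is a non-negative real number.

```python
import random


def replication_count(w, rng=random):
    """Number of copies for a chain with replication weight w >= 0.

    The integer part of w gives the guaranteed copies; one extra copy is
    added with probability equal to the fractional part, so that the
    expected number of copies is exactly w. A result of 0 deletes the chain.
    """
    if w < 0:
        raise ValueError("replication weights must be non-negative")
    whole = int(w)          # integer part of w
    rest = w - whole        # fractional remainder, 0 <= rest < 1
    # rng.random() is uniform on [0, 1): add the extra copy when it falls below rest.
    return whole + 1 if rng.random() < rest else whole


if __name__ == "__main__":
    rng = random.Random(0)
    copies = [replication_count(0.4, rng) for _ in range(10)]
    print(copies)  # a mix of 0 (deletion) and 1 (survival)
```

Because the expected number of copies equals the weight, applying this to every chain reproduces the Boltzmann proportions on average, while weights below 1 translate into a survival probability rather than a fractional chain.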

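We can now iterate the procedure as follows.

Assume that we have a Boltzmann population of $M_n$ chains of size $n$. The number $\mathcal{M}_n(\mathbf{r}_1, \ldots, \mathbf{r}_n)$ of chains $(\mathbf{r}_1, \ldots, \mathbf{r}_n)$ in the ensemble is proportional, within statistical errors, to its Boltzmann weight:

$$\mathcal{M}_n(\mathbf{r}_1, \ldots, \mathbf{r}_n) = A_n \exp\left(-\beta E_n(\mathbf{r}_1, \ldots, \mathbf{r}_n)\right) \tag{25}$$

For each chain of the ensemble, we draw the $(n + 1)$-st atom randomly at the point $\mathbf{r}_{n+1}$. We compute the weight:

$$w_{n+1}(\mathbf{r}_{n+1} | \mathbf{r}_1, \ldots, \mathbf{r}_n) = \exp\left(-\frac{\beta k_b}{2}\left(|\mathbf{r}_{n+1} - \mathbf{r}_n| - a\right)^2 - \beta\sum_{i=1}^{n} v(\mathbf{r}_{n+1}, \mathbf{r}_i)\right) \tag{26}$$

We replicate the new chain $w_{n+1}(\mathbf{r}_{n+1} | \mathbf{r}_1, \ldots, \mathbf{r}_n)$ times. Then the number of $(\mathbf{r}_1, \ldots, \mathbf{r}_n, \mathbf{r}_{n+1})$-chains is:

$$\begin{aligned}\mathcal{M}_{n+1}(\mathbf{r}_1, \ldots, \mathbf{r}_n, \mathbf{r}_{n+1}) &= w_{n+1}(\mathbf{r}_{n+1} | \mathbf{r}_1, \ldots, \mathbf{r}_n)\,\mathcal{M}_n(\mathbf{r}_1, \ldots, \mathbf{r}_n) \\ &= A_n \exp\left(-\beta\left[E_n(\mathbf{r}_1, \ldots, \mathbf{r}_n) + \frac{k_b}{2}\left(|\mathbf{r}_{n+1} - \mathbf{r}_n| - a\right)^2 + \sum_{i=1}^{n} v(\mathbf{r}_{n+1}, \mathbf{r}_i)\right]\right)\end{aligned} \tag{27}$$

The quantity in square brackets in the exponential is just the total energy $E_{n+1}(\mathbf{r}_1, \ldots, \mathbf{r}_{n+1})$ of the chain $(\mathbf{r}_1, \ldots, \mathbf{r}_{n+1})$. We thus have:

$$\mathcal{M}_{n+1}(\mathbf{r}_1, \ldots, \mathbf{r}_{n+1}) = A_n \exp\left(-\beta E_{n+1}(\mathbf{r}_1, \ldots, \mathbf{r}_{n+1})\right)$$

and the new ensemble of chains of length $(n + 1)$ is again Boltzmann distributed.

By iterating the procedure, we see that at each stage of the process we construct a Boltzmann-distributed ensemble of chains of increasing size. We stop when the required length is obtained.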

The procedure can be modified without alteration of the Boltzmann character of the statistics if we allow $\mathbf{r}_{n+1}$ to be drawn several times for each chain.

Although the method seems applicable as it is, one immediately encounters a major problem, namely, an exponential increase (or decrease) of the population of chains. Indeed, if we deal, for example, with a model of a polymer with steric repulsion, the potential $v(r)$ is repulsive (positive) at short distances and thus the replication weight $w_n$ is smaller than unity. Thus, iteration of the process will result in a decrease in the total population of chains, and eventually we may end up at some stage with an empty ensemble of chains. Conversely, if the interaction $v$ is attractive (e.g., a polymer chain in a bad solvent), the replication weight $w_n$ is larger than 1, leading to an exponential increase of the population. This also causes a computational problem, since the available computer memory is finite.

However, the problem can be easily handled if one recalls that all one needs is a population in which each chain is replicated proportionally to its Boltzmann weight.

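4.4 POPULATION CONTROL. Instead of replicating each chain with a factor $w_{n+1}(\mathbf{r}_{n+1} | \mathbf{r}_1, \ldots, \mathbf{r}_n)$ (Eq. (27)), it is perfectly legitimate to replicate it with a factor $g_{n+1} w_{n+1}(\mathbf{r}_{n+1} | \mathbf{r}_1, \ldots, \mathbf{r}_n)$, where $g_{n+1}$ is an arbitrary scaling factor which can be adjusted so as to keep the population of chains under control. Equation (27) becomes

$$\mathcal{M}_{n+1}(\mathbf{r}_1, \ldots, \mathbf{r}_n, \mathbf{r}_{n+1}) = g_{n+1}\,w_{n+1}(\mathbf{r}_{n+1} | \mathbf{r}_1, \ldots, \mathbf{r}_n)\,\mathcal{M}_n(\mathbf{r}_1, \ldots, \mathbf{r}_n) \tag{28}$$

The new population of chains has the size

$$M_{\mathrm{tot}} = \sum_{\mathrm{chains}}\mathcal{M}_{n+1}(\mathbf{r}_1, \ldots, \mathbf{r}_n, \mathbf{r}_{n+1}) = g_{n+1}\sum_{\mathrm{chains}} w_{n+1}(\mathbf{r}_{n+1} | \mathbf{r}_1, \ldots, \mathbf{r}_n)\,\mathcal{M}_n(\mathbf{r}_1, \ldots, \mathbf{r}_n) \tag{29}$$

From this we see in which way one should choose $g_n$ so as to keep the population under control. The iteration of equation (29) for a chain of size $N$ yields:

$$\mathcal{M}_N(\mathbf{r}_1, \ldots, \mathbf{r}_N) = g_1 g_2 g_3 \cdots g_N\,\exp\left(-\frac{\beta k_b}{2}\sum_{i=1}^{N-1}\left(|\mathbf{r}_{i+1} - \mathbf{r}_i| - a\right)^2 - \beta\sum_{1\le i<j\le N} v(\mathbf{r}_i, \mathbf{r}_j)\right) \tag{30}$$

(where we set $g_1 = 1$). Equation (30) proves that the final population is indeed Boltzmann-distributed. Note that the product of the $g_i$ provides a simple evaluation of the free energy.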

Indeed, sumlring equation (30)

over all chains of the

ensemble,

we obtain

N

MN "

fl

g;Z MI

(31)

;=1

and the free energy is

given by

~~ N

F =

j (log<

+

Slog

g;

(321

,=1

In practice, the scaling factors $g_n$ can be determined in two ways:

(i) for simple problems (polymers in good or bad solvents), one can use $g_{n+1} = g_n$ and adjust (increase or decrease) $g_{n+1}$ whenever the population $M_{n+1}$ gets out of some fixed range $M_{\min} < M_{n+1} < M_{\max}$;

(ii) for more complicated problems (e.g., proteins), it is preferable to make a trial run of adding the $(n+1)$-st atom at each stage with the scaling $g_{n+1} = \gamma_{n+1}$, where $\gamma_{n+1}$ is a properly chosen factor, count the total population of chains $M'_{n+1}$, and then make the actual run with

$$g_{n+1} = \gamma_{n+1}\, \frac{M_1}{M'_{n+1}}$$

so as to conserve approximately the initial population $M_1$. In this work, we chose $\gamma_{n+1} = g_n$. Thus, every time we add an atom, we adjust the scaling factor so as to conserve the total number of chains.
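The trial-run rescaling described above can be sketched as follows. The text does not say how non-integer replication factors are turned into integer numbers of copies, so the stochastic rounding used here is an assumption, as are the function names and the randomly generated weights.

```python
import numpy as np


def replicate_counts(weights, g, rng):
    """Integer copy numbers whose expectation is g * weights.

    Assumption: stochastic rounding, i.e. floor(g*w) copies plus one extra
    copy with probability equal to the fractional part of g*w.
    """
    target = g * np.asarray(weights, dtype=float)
    base = np.floor(target)
    extra = rng.random(target.shape) < (target - base)
    return (base + extra).astype(int)


def rescaled_factor(gamma, m_initial, m_trial):
    """Actual-run factor g = gamma * M_1 / M'_{n+1} after a trial run with gamma."""
    return gamma * m_initial / m_trial


rng = np.random.default_rng(0)
weights = rng.uniform(0.2, 3.0, size=2000)  # hypothetical weights w_{n+1}, one per chain
gamma = 1.0                                  # trial scaling factor
m_initial = weights.size                     # initial population M_1

m_trial = replicate_counts(weights, gamma, rng).sum()   # trial population M'_{n+1}
g = rescaled_factor(gamma, m_initial, m_trial)
m_actual = replicate_counts(weights, g, rng).sum()      # population of the actual run
```

With these made-up weights the trial population is noticeably larger than the initial one, and the rescaled factor brings the actual population back close to `m_initial`, up to statistical fluctuations.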

4.5 THE GUIDING FIELD. Assume that the elastic constant $k_b$ in equation (23) is large. Then, if we distribute $\mathbf{r}_{n+1}$ uniformly, the factor $k_b \left( |\mathbf{r}_{n+1} - \mathbf{r}_n| - a \right)^2$ will be large, and the replication weight $w_{n+1}$ in Eq. (26) small. Thus, the sampling will be very inefficient, since it will assign a very large scaling factor $g_{n+1}$ to the rare configuration for which $|\mathbf{r}_{n+1} - \mathbf{r}_n| \simeq a$, leaving a very small weight to other configurations. In other words, if one configuration is such that $|\mathbf{r}_{n+1} - \mathbf{r}_n| \simeq a$, then it will be replicated a large number of times, while the others will be deleted from the ensemble. This results in a deterioration of the quality of the ensemble and a buildup of correlations among the chains, i.e., many chains redundantly follow similar paths in configurational space.

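This difficulty can be avoided. The replication weight for the atom $(n+1)$ is of the form

$$w_{n+1}(\mathbf{r}_{n+1} \mid \mathbf{r}_1, \dots, \mathbf{r}_n) = g_{n+1} \exp\left( -\beta\, \Delta E(\mathbf{r}_{n+1} \mid \mathbf{r}_1, \dots, \mathbf{r}_n) \right) \tag{33}$$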

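where $\Delta E$ is the energy cost of adding the atom $(n+1)$. This equation can be factorized as follows:

$$w_{n+1}(\mathbf{r}_{n+1} \mid \mathbf{r}_1, \dots, \mathbf{r}_n) = P_{n+1}(\mathbf{r}_{n+1})\; g_{n+1}\, \frac{\exp\left( -\beta\, \Delta E(\mathbf{r}_{n+1} \mid \mathbf{r}_1, \dots, \mathbf{r}_n) \right)}{P_{n+1}(\mathbf{r}_{n+1})} \tag{34}$$

where the function $P_{n+1}(\mathbf{r}_{n+1} \mid \mathbf{r}_1, \dots, \mathbf{r}_n)$ is an arbitrary probability distribution. The procedure is now simple: draw $\mathbf{r}_{n+1}$ with the probability distribution $P_{n+1}$, and replicate it $g_{n+1} \exp(-\beta\, \Delta E) / P_{n+1}$ times. In what follows, we write $P_{n+1} \propto \exp(-\beta V_{n+1})$, and we call $V_{n+1}$ the guiding field.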
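The draw-then-replicate rule can be illustrated on a toy problem with a handful of candidate positions. This is only a sketch: the discrete candidates, the energy costs and the function name are invented for illustration and do not come from the paper.

```python
import numpy as np


def replication_factor(delta_e, p, beta, g=1.0):
    """Residual factor g * exp(-beta * dE) / P for a candidate drawn with probability P."""
    return g * np.exp(-beta * np.asarray(delta_e)) / np.asarray(p)


beta = 1.0
delta_e = np.array([0.0, 0.5, 2.0, 5.0])   # hypothetical energy costs of 4 candidates

uniform = np.full(4, 0.25)                 # unguided draw: all candidates equally likely
boltz = np.exp(-beta * delta_e)
guided = boltz / boltz.sum()               # guiding distribution matched to the energy cost

w_uniform = replication_factor(delta_e, uniform, beta)
w_guided = replication_factor(delta_e, guided, beta)

# Expected multiplicity of each candidate = (probability of drawing it) * (factor).
expected_uniform = uniform * w_uniform
expected_guided = guided * w_guided
```

In both cases the expected multiplicity of a candidate equals its Boltzmann factor, so the choice of the guiding distribution does not bias the ensemble. What changes is the spread of the replication factors: they vary widely for the uniform draw and are all equal for the matched one.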

It can be easily seen that this procedure indeed conserves the Boltzmann distribution. It is also clear that statistical independence is best achieved when the replication factor is close to unity. For example, for the linear polymer chain it seems natural to take

$$P_{n+1}(\mathbf{r}_{n+1}) \propto \exp\left( -\beta\, \frac{k_b}{2} \left( |\mathbf{r}_{n+1} - \mathbf{r}_n| - a \right)^2 \right) \tag{35}$$

that is, to draw $\mathbf{r}_{n+1}$ with the correct Gaussian distribution. Then the $(n+1)$-st atom is sampled at a correct distance from $\mathbf{r}_n$, and the residual replication weight will be closer to one.
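A rough sketch of such a guided draw in three dimensions is given below. It rests on an assumption that is ours, not the paper's: the bond length is drawn from a Gaussian of mean $a$ and variance $1/(\beta k_b)$ and the direction is isotropic, which ignores the $\rho^2$ Jacobian of the exact density and is therefore reasonable only when $\beta k_b a^2$ is large. The numerical values are illustrative.

```python
import numpy as np


def draw_guided_position(r_prev, a, k_b, beta, rng):
    """Draw r_{n+1} close to the sphere of radius a centred on r_prev.

    Assumption: Gaussian bond length (mean a, variance 1/(beta*k_b)) and
    isotropic direction; the rho^2 Jacobian of the exact 3D density is ignored.
    """
    length = rng.normal(a, 1.0 / np.sqrt(beta * k_b))
    direction = rng.normal(size=3)            # isotropic direction from a normalized
    direction /= np.linalg.norm(direction)    # vector of independent normal components
    return np.asarray(r_prev, dtype=float) + length * direction


rng = np.random.default_rng(1)
r_prev = np.zeros(3)
# Hypothetical parameters: a = 3.8, k_b = 100, beta = 1 (arbitrary units).
bonds = np.array([
    np.linalg.norm(draw_guided_position(r_prev, 3.8, 100.0, 1.0, rng) - r_prev)
    for _ in range(5000)
])
```

Whatever density is actually used for the draw, the residual replication factor has to be computed with that same density, so an approximate guiding distribution costs efficiency but does not bias the result.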

The ideal choice for the sampling function would be

$$P_{n+1}(\mathbf{r}_{n+1}) \propto \exp\left( -\beta\, \Delta E(\mathbf{r}_{n+1} \mid \mathbf{r}_1, \dots, \mathbf{r}_n) \right) \tag{36}$$

which would lead to unit replication factors, and thus a completely uncorrelated statistical ensemble. However, in the presence of two-body interactions, there are no known techniques for sampling distributions like (36).

The optimal choice for the sampling function $P_n$ is

$$P_{n+1}(\mathbf{r}_{n+1}) \propto \exp\left( -\beta\, U_{n+1}(\mathbf{r}_{n+1}) \right) \tag{37}$$

where $U_{n+1}(\mathbf{r}_{n+1})$ is the mean potential seen by the atom $(n+1)$. But in general, the determination of this mean field is difficult, and one must resort to intuition in the choice of $P_n$.

At this point, it is of interest to note that another use of the guiding field is to introduce an extra potential term to bias the MC procedure if one wishes to directly incorporate experimental (or other) information into the search. One may, for instance, guide the sampling using Ramachandran plot information [1].

4.6 THE RESCALING PROCEDURE. Even if the choice of $P_n$ is nearly optimal, the replication factors are not strictly equal to one, and following the argument given in section 4.3 above, correlations between the chains build up in the statistical ensemble. This effect becomes more important as the chains become longer.

The final number of uncorrelated chains in the ensemble is proportional to the population of chains. Depending on the temperature, the form of the interaction, and the size of the chain, it may be necessary to consider very large ensembles to get sufficient configurational sampling. In reference [35] for instance, it was shown that, for the case of polyelectrolytes in good or bad solvents, good statistics are achieved when the population $M$ is of the order of 10 times the length of the chain.

It can be seen that algorithmically (not taking into account the possibilities of vectorization or parallelization) the computational time scales as $M N^2$. Thus, for short chains, it is possible to use large populations, whereas for large chains, one can only use small populations
