HAL Id: jpa-00246718
https://hal.archives-ouvertes.fr/jpa-00246718
Submitted on 1 Jan 1993
J. Bascle, T. Garel, Henri Orland
To cite this version:
J. Bascle, T. Garel, Henri Orland. Some physical approaches to protein folding. Journal de Physique I, EDP Sciences, 1993, 3 (2), pp.259-275. 10.1051/jp1:1993128. jpa-00246718
J. Phys. I France 3 (1993) 259-275, FEBRUARY 1993, PAGE 259

Classification
Physics Abstracts
05.90 61.40D 87.10

Some physical approaches to protein folding

J. Bascle, T. Garel and H. Orland

Service de Physique Théorique (*), CE-Saclay, 91191 Gif-sur-Yvette Cedex, France

(Received 15 May 1992, accepted 5 June 1992)
Résumé. Protein folding is a problem with many biological implications. In this article we present a physicist's point of view in two different ways. We first introduce simple statistical mechanics models which display, in the thermodynamic limit, folding transitions. These models can be divided into (i) spin glasses (possibly à la Mattis), where one may look for correlations between the intrachain interactions and the folded structure, and (ii) glasses, where one emphasizes the geometrical competition between one- or two-dimensional local order (which models α helix or β sheet structures) and the global constraint of compactness. Both types of model are too simple for the study of real proteins, but they should apply to the glass transition, collapsed polymers, ... The second line of study is a Monte-Carlo method in which the protein is grown atom by atom (or residue by residue), using a given form of the total energy of the protein (CHARMM, ...). This method can then be compared with other numerical methods; we thus compare our results with molecular dynamics calculations for poly-alanines. This twofold approach is a good illustration of the difficulties one encounters in the protein folding problem (many metastable states, ...).

Abstract. To understand how a protein folds is a problem which has important biological implications. In this article, we would like to present a physics-oriented point of view, which is twofold. First of all, we introduce simple statistical mechanics models which display, in the thermodynamic limit, folding and related transitions. These models can be divided into (i) crude spin glass-like models (with their Mattis analogs), where one may look for possible correlations between the chain self-interactions and the folded structure, (ii) glass-like models, where one emphasizes the geometrical competition between one- or two-dimensional local order (mimicking α helix or β sheet structures), and the requirement of global compactness. Both models are too simple to predict the spatial organization of a realistic protein, but are useful for the physicist and should have some feedback in other glassy systems (glasses, collapsed polymers, ...). These remarks lead us to the second physical approach, namely a new Monte-Carlo method, where one grows the protein atom-by-atom (or residue-by-residue), using a standard form (CHARMM, ...) for the total energy. A detailed comparison with other Monte-Carlo schemes, or Molecular Dynamics calculations, is then possible; we will sketch such a comparison for poly-alanines. Our twofold approach illustrates some of the difficulties one encounters in the protein folding problem, in particular those associated with the existence of a large number of metastable states.
(*) Laboratoire de la Direction des Sciences de la Matière du Commissariat à l'Energie Atomique

1. Introduction.
Proteins are weakly branched polymers, built out of twenty species of monomers (aminoacids). They have the property of folding into an (almost) unique compact native structure, which is the biologically interesting object [1]. The compactness is largely due to the existence of the hydrophobic aminoacid residues, since these biological objects are usually designed to work in water. Both the compactness and the chemical heterogeneity of a given protein tend to slow down dynamical processes, and the question periodically arises as to whether the protein folding problem is under thermodynamic or kinetic control. This question is not unfamiliar in the physics of glassy systems where the same problem of a very rugged phase space is present.

In physical terms, the frustration in a protein can be naively described in two different ways:

(i) the energy of a protein is the sum of bonded (geometrical) and non-bonded (Coulomb, Van der Waals) terms, which cannot be simultaneously satisfied.

(ii) experimentally (crystallography, NMR), a folded protein has a local order due to hydrogen bonds. This order is, roughly speaking, of one-dimensional (α helix) or two-dimensional (β sheet) nature and is therefore incompatible with the requirement of global compactness.

In this chapter, we shall follow the thermodynamic approach to folding; simple statistical mechanics models will be studied for points (i) and (ii). For the former, one is naturally led to draw a parallel with the spin glass problem (the quenched disorder being linked to the primary structure), whereas the latter is more akin to glasses. Both of these approaches have interesting outputs. The spin glass [2] case and its Mattis analogs [3] suggest a connection with the physics [4] of neural networks, Hopfield model, ..., for the major unsolved problem of the coding of the tertiary structure in the primary structure. The "glassy glass" case [5] points towards interesting differences between helices and sheets, and revives Flory-like models of polymer melting, as well as the Gibbs-Di Marzio theory of the glass transition [6, 7].
On a more realistic level, we also wish to benefit from the biologists' experience with their complicated systems. It is therefore necessary to go beyond the above qualitative picture and study well-defined entities. We have therefore devised a new Monte-Carlo (MC) method to generate a Boltzmannian ensemble of configurations of a protein. This method uses an empirical form for the total energy of the protein, and may be loosely described as an atom-by-atom growth of the protein, in marked contrast to other MC methods [8] or to Molecular Dynamics [9] (MD) calculations. This growth procedure was introduced [10] to try to efficiently explore the rugged landscape of a protein phase space and may be coupled to more traditional techniques (simulated annealing [11], minimization procedures, ...). We have tested [12] the method on peptides with a small number N of atoms (alanine dipeptide (N = 22), penta-alanine (N = 53)), by comparing the energy minima with those obtained by MD simulations (CHARMM). A short review of the same comparison [13a] with hepta-alanine (N = 73) is given below, together with preliminary results [13b] on twenty-alanine (N = 203).

At this point a caveat seems in order: all the methods, including ours, are faced with the problem of the solvent, and use at best an effective energy function taking into account in a crude way the effect of the water molecules (for instance in the collapsed regime, a one-hundred residue protein has something like half of its atoms on the surface). Our simulations are usually done with a dielectric constant equal to unity (vacuum-type calculations).
The layout of the paper is as follows. Section 2 briefly deals with the spin glass-like approach to the folding transition, as well as the related Mattis models (coding à la Hopfield, ...). The "glassy glass" case is studied in section 3, where the link with the Flory-Gibbs-Di Marzio theory of the glass transition is discussed. We emphasize, in this context, the existence of a disorder point. Section 4 describes the new MC growth method and its application to poly-alanines (Sect. 5).
2. Spin glasses and folding transitions.

We model a protein as a chain of N links (residues), where link i (at $\mathbf{r}_i$) and link j (at $\mathbf{r}_j$) interact through a potential $v_{ij}(\mathbf{r}_i - \mathbf{r}_j)$, which depends on the chemical nature of links i and j. Physically, $v_{ij}$ is expected to be relatively short-ranged (screened Coulomb or van der Waals interactions, ...). We take for simplicity

$$v_{ij}(\mathbf{r}_i - \mathbf{r}_j) = v_{ij}\,\delta(\mathbf{r}_i - \mathbf{r}_j) \tag{1}$$

Two types of model can be studied [2, 3].

(i) the spin glass model: the interactions $(v_{ij})$ are taken as independent random variables distributed, for instance, according to a Gaussian distribution

$$P(v_{ij}) = \frac{1}{\sqrt{2\pi v^2}}\,\exp\left(-\frac{(v_{ij} - v_0)^2}{2v^2}\right) \tag{2}$$

In equation (2), $v_0$ denotes the excluded volume effect (in appropriate units). A "biological" interpretation of this approach is the following: the interactions $(v)$ between the same couples of residues, but at different places in the primary sequence, are totally uncorrelated because of their different environment. (One may also link this approach to the travelling salesman optimization problem [14]).
(ii) the separable model: the interactions $(v_{ij})$ are taken as a sum of M separable terms

$$v_{ij} = v_0 + \sum_{p=1}^{M} \xi_i^p\,\xi_j^p \tag{3}$$

where the $(\xi_i^p)$ are taken, for instance, as independent random variables with Gaussian distribution. Apart from the excluded volume effect, the "charges" $(\xi_i^p;\ p = 1, \dots, M)$ can represent the Coulomb charge, the hydrophobicity, the helix-forming or breaking tendency, ... The "biological" interpretation is opposite to the previous one: here, each residue is defined by M independent "charges", or characters, which depend only on its chemical nature and not on its position along the primary sequence.

In the continuous limit, the partition function of these models reads [16]

$$Z = \int \mathcal{D}\mathbf{r}(s)\,\exp\left(-\frac{d}{2}\int_0^S ds\left(\frac{d\mathbf{r}}{ds}\right)^2 - \frac{\beta}{2}\int_0^S\!\!\int_0^S ds\,ds'\,v(s,s')\,\delta(\mathbf{r}(s) - \mathbf{r}(s')) - \frac{w}{6}\int_0^S\!\!\int_0^S\!\!\int_0^S ds\,ds'\,ds''\,\delta(\mathbf{r}(s) - \mathbf{r}(s'))\,\delta(\mathbf{r}(s') - \mathbf{r}(s''))\right) \tag{4}$$

The last term, a three-body repulsion (its positive coefficient is written $w$ here), is included to avoid a total collapse of the chain. As usual, the parameters in equation (4) are the space dimension d, the inverse temperature $\beta = 1/T$, and $S = Na^2$ (where a is the common length of the links). Introducing replicas [4] to perform quenched averaging over the disordered interactions $(v(s,s'))$, we get the following results [2, 3]: there are three phases, namely, a high temperature coil state, an intermediate temperature collapsed phase with a macroscopic entropy (similar to a polymer below the θ point), and finally a low temperature collapsed frozen phase.
Detailing the above models, we have:

(i) the spin glass model: the low temperature phase is a Potts glass with $p \to \infty$ states [16], at least for high enough dimensions. The (mean field) order parameter of the freezing transition is:

$$Q_{\alpha\beta}(\mathbf{r}, \mathbf{r}') = 2\beta v \int_0^S ds\,\big\langle \delta(\mathbf{r} - \mathbf{r}_\alpha(s))\,\delta(\mathbf{r}' - \mathbf{r}_\beta(s)) \big\rangle \tag{5}$$

where $1 \le \alpha < \beta \le n$ (and $n \to 0$), and $\langle \dots \rangle$ denotes a thermal average with respect to the replicated Hamiltonian of equation (4). Alternatively, the freezing transition can be studied by the overlaps of two (real) copies of the system [17]. As in Ising spin glasses [4], one may argue that there are few dominant states in the system, which could be interpreted in terms of a few dominant folded structures. Numerical calculations along these lines, including dynamics, have been recently reported [18]. Note that, by construction, these "protein models" possess ultrametric properties [19]. For a more realistic case, see section 4.

To conclude on this type of approach, it is interesting to point out that a variational function introduced by Shakhnovich and Gutin [20] in the context of proteins has been used in other disordered solid state situations [21].
(ii) the separable model: when one performs the quenched average over the $(\xi_p)$ in equation (4), some mean field order parameters appear naturally in the system, such as

$$m_{p,\alpha}(\mathbf{r}) = \overline{\int_0^S ds\,\xi_p(s)\,\big\langle \delta(\mathbf{r} - \mathbf{r}_\alpha(s)) \big\rangle} \tag{6}$$

In equation (6), α is an unimportant replica index (to be omitted from now on); $\langle \dots \rangle$ and the overbar denote respectively thermal and disorder averages. These order parameters à la Hopfield [22] are a measure of the correlation between the chemical nature of a link (characterized by $\xi_p(s)$) and its position $\mathbf{r}$ in space. There is a Mattis-like freezing phase transition where some $(m_p(\mathbf{r});\ p = 1, 2, \dots, M_0)$ condense. For these $M_0$ characters, the primary sequence codes for the spatial structure of the chain. If there is only one "charge" or character (e.g. hydrophobicity), the folding transition will translate into a spatial separation between hydrophobic and hydrophilic links. In general, a Mattis-like transition implies a single dominant spatial structure, with a large number of metastable states [22(b)].

Note that when M increases, at fixed N, we expect a smooth crossover from the separable to the spin glass model. From simple qualitative arguments [23], it can be inferred that in real proteins, one should have M = 8 relevant (and independent) characters for each residue. Thus for $\alpha = M/N$ small (i.e. long chains), one deals with the separable case, whereas for α larger (short chains), the glassy model is more appropriate. The critical α is of order [22, 23] $\alpha_c \simeq 0.1$ (which gives, in this model, a critical length of $N_c \simeq 80$). Similar coding schemes using Protein Data Banks have been studied by Wolynes and coworkers [24].

3. Glasses and folding transitions.

3.1 THE MODEL. The energetical frustration described above is not quite satisfactory, since there is no real disorder in proteins. We will see in section 4 that a commonly used form of the total energy of a protein is the sum of a bonded (geometric) part and of a non-bonded (Coulomb, Van der Waals) part. In particular, the Coulomb part is responsible, in this formalism, for the formation of hydrogen bonds [25] that tend to locally stabilize one-dimensional (α helix), or two-dimensional (β sheets) structures. See figure 1. Since we know that the biologically active protein is compact [1], we are typically faced with the problem of geometrical frustration, where local and global orders are incompatible. This approach is familiar in glasses where one tries to solve this contradiction by a mapping onto a curved space [26].
We choose here a thermodynamic approach and model, as an example, the α helix case in the following way: we consider a d-dimensional hypercubic lattice of $N = L^d$ sites, with periodic boundary conditions, and its associated Hamiltonian paths. We recall that a Hamiltonian path visits all sites of the lattice once and only once. Hamiltonian paths have been often used to model collapsed polymer globules [15]. Following Flory [6a], we take each link of the Hamiltonian path to represent a helical turn. Since hydrogen bonds have a tendency to favor long helices, that is to align the links of our model, we attribute an energy penalty ε to the breaking of a helix, that is whenever the Hamiltonian path makes a turn (corner). This model has attracted a lot of attention in the theory of polymer melting [7, 27]. For simplicity, we consider closed paths, but, as is well known in polymer theory [16], boundary conditions play a role only in subdominant terms of the free energy. The partition function of the system, at inverse temperature $\beta = 1/T$, reads

$$Z = \sum_{\{\mathcal{H}\}} e^{-\beta\varepsilon N_c(\mathcal{H})} \tag{7}$$

where $\{\mathcal{H}\}$ denotes the ensemble of all Hamiltonian paths, and $N_c(\mathcal{H})$ denotes the number of corners present in path $\mathcal{H}$.

[Figure 1, panels (a) and (b).]

Fig. 1. Schematic representation of hydrogen bonds in (a) α-helix, (b) (antiparallel) β-sheet.

Following
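reference [28], one may rewrite Z as

$$Z = \lim_{n\to 0}\frac{1}{n}\,\frac{\displaystyle\int \prod_{\mathbf{r}}\prod_{\alpha=1}^{d} d\boldsymbol{\varphi}_\alpha(\mathbf{r})\; e^{-A_G}\,\prod_{\mathbf{r}}\Big(\frac{1}{2}\sum_{\alpha}\boldsymbol{\varphi}_\alpha^2(\mathbf{r}) + e^{-\beta\varepsilon}\sum_{\alpha<\gamma}\boldsymbol{\varphi}_\alpha(\mathbf{r})\cdot\boldsymbol{\varphi}_\gamma(\mathbf{r})\Big)}{\displaystyle\int \prod_{\mathbf{r}}\prod_{\alpha=1}^{d} d\boldsymbol{\varphi}_\alpha(\mathbf{r})\; e^{-A_G}} \tag{8a}$$

with

$$A_G = \frac{1}{2}\sum_{\alpha=1}^{d}\sum_{\mathbf{r},\mathbf{r}'} \boldsymbol{\varphi}_\alpha(\mathbf{r})\,\big(\Delta^{\alpha}\big)^{-1}_{\mathbf{r}\mathbf{r}'}\,\boldsymbol{\varphi}_\alpha(\mathbf{r}') \tag{8b}$$

where $\boldsymbol{\varphi}_\alpha(\mathbf{r})$ is an n-component (n = 0) real field, defined in each direction α = 1, ..., d, attached to all points $\mathbf{r}$ of the lattice. The operator $\Delta^{\alpha}_{\mathbf{r}\mathbf{r}'}$ is 1 if $\mathbf{r}$ and $\mathbf{r}'$ are nearest neighbours in direction α and 0 otherwise; $(\Delta^{\alpha})^{-1}_{\mathbf{r}\mathbf{r}'}$ denotes its inverse. Using Wick's theorem and extracting, through the n = 0 trick [29], the contribution of all connected paths, it is easily shown that (8a) and (7) are equal.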
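For very small lattices, the sum over Hamiltonian paths in equation (7) can be evaluated by brute force, which gives a concrete picture of what $N_c(\mathcal{H})$ counts. The sketch below is only an illustration under simplifying assumptions that are not those of the text: it uses an open two-dimensional L x L grid and open, directed paths (the paper considers closed paths on a periodic hypercubic lattice). The function names `hamiltonian_paths`, `count_corners` and `partition_function` are hypothetical.

```python
import math


def hamiltonian_paths(L):
    """Yield every directed Hamiltonian path on an open L x L square grid."""
    n_sites = L * L
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def extend(path, visited):
        # Depth-first extension of a self-avoiding path until all sites are used.
        if len(path) == n_sites:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in steps:
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < L and 0 <= nxt[1] < L and nxt not in visited:
                path.append(nxt)
                visited.add(nxt)
                yield from extend(path, visited)
                visited.remove(nxt)
                path.pop()

    for x in range(L):
        for y in range(L):
            yield from extend([(x, y)], {(x, y)})


def count_corners(path):
    """Number of sites where two consecutive links point in different directions."""
    corners = 0
    for a, b, c in zip(path, path[1:], path[2:]):
        if (b[0] - a[0], b[1] - a[1]) != (c[0] - b[0], c[1] - b[1]):
            corners += 1
    return corners


def partition_function(L, beta_eps):
    """Z = sum over paths of exp(-beta * eps * N_c), as in equation (7)."""
    return sum(math.exp(-beta_eps * count_corners(p)) for p in hamiltonian_paths(L))
```

On a 2 x 2 grid every path has exactly two corners, so `partition_function(2, beta_eps)` reduces to `8 * exp(-2 * beta_eps)`; the number of paths grows very quickly with L, which is why the field-theoretic treatment is needed for the thermodynamic limit.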
Note that in the above description, one does not consider the primary sequence anymore, in marked contrast to the approach of section 2. In the nonweighted problem (ε = 0), the saddle-point (SP) method of reference [28] yields

$$Z_{SP}(\varepsilon = 0) = \left(\frac{q}{e}\right)^N \tag{9}$$

where q = 2d is the lattice coordination number and $e \simeq 2.71828\ldots$ Equation (9) is in excellent agreement with numerical data, in marked contrast to the "old" Flory theory [6a] which gives

$$Z_F(\varepsilon = 0) = \left(\frac{q-1}{e}\right)^N \tag{10}$$
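3.2 THE HIGH TEMPERATURE ISOTROPIC APPROACH. We have extended the SP approach to the model defined in equations (8). We get

$$\sum_{\mathbf{r}'}\big(\Delta^{\alpha}\big)^{-1}_{\mathbf{r}\mathbf{r}'}\,\boldsymbol{\varphi}_\alpha(\mathbf{r}') = \frac{\boldsymbol{\varphi}_\alpha(\mathbf{r}) + e^{-\beta\varepsilon}\sum_{\gamma\neq\alpha}\boldsymbol{\varphi}_\gamma(\mathbf{r})}{\frac{1}{2}\sum_{\gamma}\boldsymbol{\varphi}_\gamma^2(\mathbf{r}) + e^{-\beta\varepsilon}\sum_{\gamma<\delta}\boldsymbol{\varphi}_\gamma(\mathbf{r})\cdot\boldsymbol{\varphi}_\delta(\mathbf{r})} \tag{11}$$

At high temperature, it is natural to look for a homogeneous and isotropic solution, $\boldsymbol{\varphi}_\alpha(\mathbf{r}) = \boldsymbol{\varphi}$. We break the O(n) symmetry by choosing $\boldsymbol{\varphi}$ in a given "direction", say 0, and obtain

$$\varphi = \frac{2}{\sqrt{d}} \tag{12a}$$

and

$$Z_{SP}(\varepsilon) = \left(\frac{q(\beta)}{e}\right)^N \tag{12b}$$

with

$$q(\beta) = 2 + 2(d-1)\,e^{-\beta\varepsilon} \tag{12c}$$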
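As a consistency check, the effective coordination number $q(\beta) = 2 + 2(d-1)e^{-\beta\varepsilon}$ can be recovered by inserting a uniform, isotropic field into equations (8). The short derivation below is a sketch that assumes the normalization in which the Gaussian action carries a factor 1/2 and the on-site weight is $\frac12\sum_\alpha\varphi_\alpha^2 + e^{-\beta\varepsilon}\sum_{\alpha<\gamma}\varphi_\alpha\varphi_\gamma$; with a different normalization the saddle-point value of the field changes, but the result for $Z_{SP}$ does not.

```latex
% Uniform ansatz \varphi_\alpha(\mathbf r) = \varphi, with x = e^{-\beta\varepsilon}.
% Each \Delta^\alpha has eigenvalue 2 on a uniform field (two neighbours in direction \alpha).
\begin{align*}
A_G &= \tfrac12 \cdot d \cdot N \cdot \tfrac{\varphi^2}{2} = \tfrac{N d}{4}\,\varphi^2, \\
\text{weight per site} &= \tfrac{d}{2}\,\varphi^2 + x\,\tfrac{d(d-1)}{2}\,\varphi^2
   = \tfrac{d}{2}\bigl(1 + (d-1)x\bigr)\varphi^2, \\
\tfrac{1}{N}\log Z_{SP} &= -\tfrac{d}{4}\,\varphi^2
   + \log\Bigl[\tfrac{d}{2}\bigl(1 + (d-1)x\bigr)\varphi^2\Bigr], \\
\text{stationarity in } \varphi^2:\quad
   -\tfrac{d}{4} + \tfrac{1}{\varphi^2} &= 0
   \;\Longrightarrow\; \varphi^2 = \tfrac{4}{d}, \\
\tfrac{1}{N}\log Z_{SP} &= -1 + \log\bigl[2 + 2(d-1)x\bigr]
   \;\Longrightarrow\;
   Z_{SP} = \Bigl(\tfrac{2 + 2(d-1)\,e^{-\beta\varepsilon}}{e}\Bigr)^{N}.
\end{align*}
```

At ε = 0 this gives $q = 2d$, in agreement with equation (9).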
The "old" Flory theory [6a] would yield similar results with

$$Z_F(\varepsilon) = \left(\frac{q_F(\beta)}{e}\right)^N \tag{13a}$$

where

$$q_F(\beta) = 1 + 2(d-1)\,e^{-\beta\varepsilon} \tag{13b}$$

Both approaches have the following properties:

(i) there exists a temperature $T_G$ where the entropy vanishes. This remark, in the framework of the Flory theory, is the basis of the Gibbs-Di Marzio theory of the glass transition [6b].

(ii) before one reaches $T_G$, there is a first order freezing transition at $T_c$, such that

$$q(\beta_c) = e \tag{14}$$
[Figure 2: free energy as a function of temperature, curves (1) to (3).]

Fig. 2. Various approximations to the free energy of the glass model of [7] as a function of temperature. Curve (1) is the "old" Flory theory. Curve (2) is the low temperature anisotropic saddle point result (with the disorder point at $T_D \simeq 2.24\,\varepsilon$). Curve (3) is the high temperature isotropic saddle point result. In all cases, the transition occurs when the free energy vanishes.

The low temperature phase is frozen (Fig. 2), since it consists of fully stretched paths making turns at the surface. Using (12c) and (13b), we get, for d = 3

$$T_c^{SP} \simeq 0.58\,\varepsilon \tag{15a}$$

for the SP approach to model (8) and

$$T_c^{F} \simeq 1.18\,\varepsilon \tag{15b}$$

for Flory's theory.
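The two freezing temperatures quoted above follow from solving $q(\beta_c) = e$ with the effective coordination numbers $q(\beta) = 2 + 2(d-1)e^{-\beta\varepsilon}$ (saddle point) and $q_F(\beta) = 1 + 2(d-1)e^{-\beta\varepsilon}$ (Flory). A minimal numerical sketch, with temperatures measured in units of ε and hypothetical function names, could look as follows; bisection is used here simply because q decreases monotonically with βε.

```python
import math


def q_saddle_point(beta_eps, d):
    """Effective coordination number of the isotropic saddle point."""
    return 2.0 + 2.0 * (d - 1) * math.exp(-beta_eps)


def q_flory(beta_eps, d):
    """Effective coordination number of the 'old' Flory theory."""
    return 1.0 + 2.0 * (d - 1) * math.exp(-beta_eps)


def freezing_temperature(q_func, d, lo=1e-6, hi=50.0, tol=1e-12):
    """Solve q(beta_c) = e for beta_c * eps by bisection and return T_c / eps.

    Assumes d >= 2, so that q > e at high temperature and q < e at low temperature.
    """
    def f(beta_eps):
        return q_func(beta_eps, d) - math.e

    if not (f(lo) > 0.0 > f(hi)):
        raise ValueError("q(beta) = e has no root in the bracketing interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 1.0 / (0.5 * (lo + hi))


if __name__ == "__main__":
    print("T_c (saddle point) / eps:", freezing_temperature(q_saddle_point, 3))
    print("T_c (Flory) / eps:", freezing_temperature(q_flory, 3))
```

For d = 3 the roots can also be written in closed form, $T_c/\varepsilon = 1/\ln[4/(e-2)]$ and $1/\ln[4/(e-1)]$ respectively.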
However, as pointed out by Gujrati and coworkers [7], such a freezing transition cannot be thoroughly correct, since the free energy may be shown to be strictly negative at low temperatures. This (slight) correction to the Flory freezing scenario comes from one dimensional excitations that are not well treated in an isotropic SP approach.
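3.3 THE LOW TEMPERATURE ANISOTROPIC APPROACH. Considering the above mentioned criticism of the isotropic SP approach, we have considered [30] an anisotropic approach to the model described in (8a) and (8b): we treat exactly one direction of the lattice, say 1, and treat the (d − 1) remaining directions in a mean field (saddle-point) approach. Using the fact that the denominator of equation (8a) goes to one when n goes to zero, we rewrite (8a) as

$$Z = \lim_{n\to 0}\frac{1}{n}\int \prod_{\mathbf{r}}\prod_{\alpha=1}^{d} d\boldsymbol{\varphi}_\alpha(\mathbf{r})\; e^{-A_G}\,\prod_{\mathbf{r}}\Big(\frac{1}{2}\sum_{\alpha}\boldsymbol{\varphi}_\alpha^2(\mathbf{r}) + e^{-\beta\varepsilon}\sum_{\alpha<\gamma}\boldsymbol{\varphi}_\alpha(\mathbf{r})\cdot\boldsymbol{\varphi}_\gamma(\mathbf{r})\Big) \tag{16}$$

which we approximate by

$$Z_1 \simeq \lim_{n\to 0}\frac{1}{n}\int \prod_{\mathbf{r}} d\boldsymbol{\varphi}_1(\mathbf{r})\; e^{-A_1 - N(d-1)\varphi^2/4}\,\prod_{\mathbf{r}}\Big(\frac{1}{2}\boldsymbol{\varphi}_1^2(\mathbf{r}) + A\,\boldsymbol{\varphi}_1(\mathbf{r})\cdot\boldsymbol{\varphi} + C\,\varphi^2\Big) \tag{17}$$

with

$$A_1 = \frac{1}{2}\sum_{\mathbf{r},\mathbf{r}'} \boldsymbol{\varphi}_1(\mathbf{r})\,\big(\Delta^{1}\big)^{-1}_{\mathbf{r}\mathbf{r}'}\,\boldsymbol{\varphi}_1(\mathbf{r}') \tag{18}$$

and

$$A = (d-1)\,e^{-\beta\varepsilon} \tag{19}$$

and

$$C = \frac{d-1}{2}\,\big(1 + (d-2)\,e^{-\beta\varepsilon}\big) \tag{20}$$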
In (17), $\boldsymbol{\varphi}$ is the (mean-field) value of $\boldsymbol{\varphi}_\alpha$, α ≠ 1. Integrating exactly (17) yields a free energy per site

$$\beta f = \frac{d-1}{4}\,\varphi^2 - \mathrm{Log}\left(\frac{(1 + C\varphi^2) + \big((1 + C\varphi^2)^2 - 4(C - A^2)\,\varphi^2\big)^{1/2}}{2}\right) \tag{21}$$

Equation (21) exhibits, at $T' \simeq 0.68\,\varepsilon$ (d = 3), a first order transition (Fig. 2) between a frozen phase (crystal) with φ = 0 and a high temperature (liquid) phase with φ ≠ 0. At this order, the free energy is zero in the frozen phase, but becomes negative if fluctuations (in φ) are taken into account [30]. In any case, the corrections to Flory's freezing picture are weak.

In the high temperature phase however, we have found [30] a disorder point of the second kind [31], where the nature of the correlations along direction 1 changes. The disorder point $T_D$ is given by

$$C = A^2 \tag{22}$$
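For d = 3 the disorder-point condition can be solved in closed form. The worked step below uses only $A = (d-1)e^{-\beta\varepsilon}$ and $C = \frac{d-1}{2}\bigl(1 + (d-2)e^{-\beta\varepsilon}\bigr)$, with the shorthand $x = e^{-\beta\varepsilon}$ introduced here for convenience.

```latex
\begin{align*}
d = 3:\quad A &= 2x, \qquad C = 1 + x, \\
C = A^2 \;&\Longleftrightarrow\; 4x^2 - x - 1 = 0
   \;\Longrightarrow\; x_D = \frac{1 + \sqrt{17}}{8} \approx 0.640, \\
\frac{T_D}{\varepsilon} &= \frac{1}{\ln(1/x_D)}
   = \frac{1}{\ln\bigl[8/(1+\sqrt{17})\bigr]} \approx 2.24 .
\end{align*}
```

Only the positive root is kept, since $x = e^{-\beta\varepsilon}$ must lie between 0 and 1.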
For d = 3, we get $T_D \simeq 2.24\,\varepsilon$; in a polymeric chain, such a disorder point is likely to have more severe dynamic implications than in usual spin
systems [32].

3.4 CONCLUSION. We have also considered [33] the case of β sheets and found similar conclusions. The isotropic SP approach should be "better" in this case since two-dimensional long range order may exist at finite temperature: we get a first order freezing transition à la Flory.

The results of these geometrically frustrated models should be relevant for other thermodynamic systems, such as glasses [34], polyelectrolytes in a bad solvent [35], chiral liquid crystals [36], ... For instance, it is rather tempting, in the case of glasses, to identify the low temperature phase as the (unreachable) crystal phase, and to link the disorder point with the glass transition. In the case of proteins however, one deals with finite systems: we thus cautiously identify the low temperature phase as the native structure, whereas the high temperature phase looks like a "molten globule" [37]. We now consider a more "realistic" approach, which will allow us to benefit from the biologists' experience with these rather complicated systems.

4. The Monte Carlo growth method.

4.1 INTRODUCTION. As
previously mentioned, one of the main difficulties of the protein folding problem is the existence, in phase space, of a large number of local minima. Traditional single move MC methods are therefore doomed to fail, as large collective motions will be necessary to "untrap" the chain. One may improve these methods by using simulated annealing procedures, or any other minimization scheme. We have chosen to devise a new MC method, where one grows an ensemble of chains atom-by-atom (or residue by residue), replicating and deleting chains so as to generate an ensemble that obeys the Boltzmann statistics. (Note that there are other methods of growing chains atom by atom [38]). A central idea in this method is to avoid going over large energy barriers (as in MC methods where the chain is completed), but to go around them. As far as comparison with MD calculations is concerned, our method does not assume any particular guess for the initial state. We will illustrate the method for the case of linear polymers [10] and its application to poly-alanines [13a, b].

[Figure 3: Ramachandran plots, panels (a) and (b); both axes run from −180 to 180 degrees.]

Fig. 3. Ramachandran's plots for the third residue of a hepta-alanine chain: (a) MC results, (b) MD results.
4.2 DESCRIPTION FOR THE CASE OF LINEAR POLYMERS. In this section we recall the principles on which the method is based. For simplicity, we shall illustrate it on the case of linear polymers [10].

Our aim is to construct a Boltzmann ensemble of chains, that is, a statistical ensemble of M chains such that the probability to find a chain of energy E in the ensemble should be proportional to its Boltzmann weight $e^{-\beta E}/Z$, where $\beta = 1/T$ and Z is a normalization factor, i.e., the partition function of the ensemble. In other words, the number of chains of energy E in the ensemble should be $M e^{-\beta E}/Z$. Since M/Z is a constant independent of E, we shall say that a chain of energy E should be replicated a number of times proportional to $e^{-\beta E}$ in the ensemble.

To generate these chains, we use a recursive procedure. Assume that we have a Boltzmann population of chains of size n. In order to obtain a Boltzmann population of chains of size n + 1, we add one atom to each of the previously generated chains of size n, and replicate the new chain a number of times proportional to $e^{-\beta\Delta E}$, where ΔE is the energy cost of adding the last atom.
To illustrate the method in more detail, we assume that the partition function of the chain is

$$Z = \int \prod_{i=2}^{N} d^3\mathbf{r}_i\,\exp\left(-\beta\,\frac{k_b}{2}\sum_{i=1}^{N-1}\big(|\mathbf{r}_{i+1} - \mathbf{r}_i| - a\big)^2 - \frac{\beta}{2}\sum_{i\neq j} v(\mathbf{r}_i, \mathbf{r}_j)\right) \tag{23}$$

where $\mathbf{r}_1 = 0$, $\mathbf{r}_i$ being the position of the i-th atom in the chain. The first term represents the elastic energy of a link (of average length a and elastic constant $k_b$), and v is a 2-body potential acting between the atoms.

We have deliberately used a simple form for the energy in (23), but the generalization to a peptide chain is easily performed as discussed in section 5 below.

4.3 REPLICATION-DELETION PROCEDURE. We start with the ensemble of $M_1$ atoms n = 1 at $\mathbf{r}_1 = 0$. Each of these is a seed for a chain.
To build chains of length n = 2, for each of the $M_1$ seeds, we draw randomly a position $\mathbf{r}_2$.

The Boltzmann weight associated with the configuration $(\mathbf{r}_1, \mathbf{r}_2)$ is proportional to

$$w_2(\mathbf{r}_2|\mathbf{r}_1) = \exp\left(-\beta\,\frac{k_b}{2}\big(|\mathbf{r}_2 - \mathbf{r}_1| - a\big)^2 - \beta\,v(\mathbf{r}_2, \mathbf{r}_1)\right) \tag{24}$$

In order to obtain a population of chains obeying the Boltzmann distribution, we must replicate each $(\mathbf{r}_1, \mathbf{r}_2)$-chain a number $w_2(\mathbf{r}_2|\mathbf{r}_1)$ of times. Since $w_2$ is not an integer, the replication is actually done in the following way:

Define $i_2 = \mathrm{Int}(w_2)$, the integer part of $w_2$, and $r_2 = w_2 - i_2 < 1$ the rest (a scalar, not to be confused with the position $\mathbf{r}_2$). Then, replicating statistically $w_2$ times means replicating $i_2$ times, plus one additional time with probability $r_2$.

That is to say, one randomly generates a number 0 < r < 1. If $r > r_2$, the chain is replicated $i_2$ times. Otherwise, it is replicated $(i_2 + 1)$ times. Since $w_2$ can be smaller than 1, the replication can in fact amount to a deletion, and the chain is no longer considered in future calculations.

For this reason we call this a replication-deletion procedure (RDP).
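The statistical replication rule described above amounts to a stochastic rounding of the weight: the integer part is always kept and one extra copy is added with probability equal to the fractional part, so that the expected number of copies is exactly the weight. A minimal sketch, with hypothetical function names and Python's standard `random.Random` standing in for whatever generator is actually used, is:

```python
import random


def replicate_count(weight, rng):
    """Return Int(weight), plus one with probability equal to the fractional part."""
    integer_part = int(weight)      # weights are non-negative
    rest = weight - integer_part    # 0 <= rest < 1
    return integer_part + 1 if rng.random() < rest else integer_part


def apply_rdp(chains, weights, rng):
    """Replication-deletion step: copy each chain replicate_count(w) times.

    A count of zero deletes the chain. Chains are assumed to be immutable
    (e.g. tuples), so that copies can safely share the same object.
    """
    new_population = []
    for chain, weight in zip(chains, weights):
        new_population.extend([chain] * replicate_count(weight, rng))
    return new_population


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [replicate_count(2.3, rng) for _ in range(10000)]
    print("mean number of copies for w = 2.3:", sum(counts) / len(counts))
```

Averaged over many draws, a weight of 2.3 yields two copies about 70% of the time and three copies about 30% of the time.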
Once the RDP has been applied to each chain, we obtain a Boltzmann-distributed population of $M_2$ chains of two atoms.

We can now iterate the procedure as follows.

Assume that we have a Boltzmann
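population of $M_n$ chains of size n. The number $\mathcal{M}_n(\mathbf{r}_1, \dots, \mathbf{r}_n)$ of chains $(\mathbf{r}_1, \dots, \mathbf{r}_n)$ in the ensemble is proportional, within statistical errors, to its Boltzmann weight:

$$\mathcal{M}_n(\mathbf{r}_1, \dots, \mathbf{r}_n) = A_n \exp\big(-\beta E_n(\mathbf{r}_1, \dots, \mathbf{r}_n)\big) \tag{25}$$

For each chain of the ensemble, we draw the (n + 1)-st atom randomly at the point $\mathbf{r}_{n+1}$.

We compute the weight:

$$w_{n+1}(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n) = \exp\left(-\beta\,\frac{k_b}{2}\big(|\mathbf{r}_{n+1} - \mathbf{r}_n| - a\big)^2 - \beta\sum_{i=1}^{n} v(\mathbf{r}_{n+1}, \mathbf{r}_i)\right) \tag{26}$$

We replicate the new chain $w_{n+1}(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n)$ times. Then the number of $(\mathbf{r}_1, \dots, \mathbf{r}_n, \mathbf{r}_{n+1})$-chains is:

$$\mathcal{M}_{n+1}(\mathbf{r}_1, \dots, \mathbf{r}_n, \mathbf{r}_{n+1}) = w_{n+1}(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n)\,\mathcal{M}_n(\mathbf{r}_1, \dots, \mathbf{r}_n) = A_n \exp\left(-\beta\Big[E_n(\mathbf{r}_1, \dots, \mathbf{r}_n) + \frac{k_b}{2}\big(|\mathbf{r}_{n+1} - \mathbf{r}_n| - a\big)^2 + \sum_{i=1}^{n} v(\mathbf{r}_{n+1}, \mathbf{r}_i)\Big]\right) \tag{27}$$

The expression in square brackets in the exponential is just the total energy $E_{n+1}(\mathbf{r}_1, \dots, \mathbf{r}_{n+1})$ of the chain $(\mathbf{r}_1, \dots, \mathbf{r}_{n+1})$. We thus have:

$$\mathcal{M}_{n+1}(\mathbf{r}_1, \dots, \mathbf{r}_{n+1}) = A_n \exp\big(-\beta E_{n+1}(\mathbf{r}_1, \dots, \mathbf{r}_{n+1})\big)$$

and the new ensemble of chains of length (n + 1) is again Boltzmann distributed.

By iterating the procedure, we see that at each stage of the process we construct a Boltzmann-distributed ensemble of chains of increasing size. We stop when the required length is obtained.

The procedure can be modified without alteration of the Boltzmann character of the statistics if we allow $\mathbf{r}_{n+1}$ to be drawn several times for each chain.

Although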
the method seems applicable as it is, one immediately encounters a major problem, namely, an exponential increase (or decrease) of the population of chains.

Indeed, if we deal, for example, with a model of a polymer with steric repulsion, the potential $v(\mathbf{r})$ is repulsive (positive) at short distances and thus the replication weight $w_n$ is smaller than unity. Thus, iteration of the process will result in a decrease in the total population of chains, and eventually we may end up at some stage with an empty ensemble of chains. Conversely, if the interaction v is attractive (e.g., a polymer chain in a bad solvent), the replication weight $w_n$ is larger than 1, leading to an exponential increase of the population. This also causes a computational problem, since the available computer memory is finite.

However, the problem can be easily handled if one recalls that all one needs is a population in which each chain is replicated proportionally to its Boltzmann weight.
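4.4 POPULATION CONTROL. Instead of replicating each chain with a factor $w_{n+1}(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n)$ (Eq. (27)), it is perfectly legitimate to replicate it with a factor $g_{n+1}\,w_{n+1}(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n)$, where $g_{n+1}$ is an arbitrary scaling factor which can be adjusted so as to keep the population of chains under control. Equation (27) becomes

$$\mathcal{M}_{n+1}(\mathbf{r}_1, \dots, \mathbf{r}_n, \mathbf{r}_{n+1}) = g_{n+1}\,w_{n+1}(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n)\,\mathcal{M}_n(\mathbf{r}_1, \dots, \mathbf{r}_n) \tag{28}$$

The new population of chains has the size of

$$M_{tot} = \sum_{\text{chains}} \mathcal{M}_{n+1}(\mathbf{r}_1, \dots, \mathbf{r}_n, \mathbf{r}_{n+1}) = g_{n+1}\sum_{\text{chains}} w_{n+1}(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n)\,\mathcal{M}_n(\mathbf{r}_1, \dots, \mathbf{r}_n) \tag{29}$$

From this we see in which way one should choose $g_n$ so as to keep the population under control. The iteration of equation (28) for a chain of size N yields:

$$\mathcal{M}_N(\mathbf{r}_1, \dots, \mathbf{r}_N) = g_1 g_2 g_3 \cdots g_N\,\exp\left(-\beta\,\frac{k_b}{2}\sum_{i=1}^{N-1}\big(|\mathbf{r}_{i+1} - \mathbf{r}_i| - a\big)^2 - \beta\sum_{1\le i<j\le N} v(\mathbf{r}_i, \mathbf{r}_j)\right) M_1 \tag{30}$$

(where we set $g_1 = 1$). Equation (30) proves that the final population is indeed Boltzmann-distributed.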
Note that the product of the $g_i$ provides a simple evaluation for the free energy. Indeed, summing equation (30) over all chains of the ensemble, we obtain

$$M_N = \prod_{i=1}^{N} g_i\; Z\; M_1 \tag{31}$$

and the free energy is given by

$$F = \frac{1}{\beta}\left(\log\frac{M_1}{M_N} + \sum_{i=1}^{N}\log g_i\right) \tag{32}$$
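The relation between the population sizes, the scaling factors and the partition function can be checked on a toy problem that is small enough to enumerate exactly. The sketch below is an illustration under assumptions that are not in the text: chains live on a one-dimensional lattice, each new atom is placed at ±1 from the previous one with equal probability, the energy is a contact term `eps` per pair of atoms sharing a site, and the scaling factor is chosen at each stage as $g = M_1/\sum w$ (one simple way of conserving the population). With a normalized proposal, $M_N/(M_1\prod g_i)$ estimates $Z/Z_0$, where $Z_0 = 2^{N-1}$ is the number of unweighted walks. All function names are hypothetical.

```python
import itertools
import math
import random


def contact_energy_increment(chain, new_site, eps):
    """Energy cost of adding an atom: eps for each earlier atom on the same site."""
    return eps * sum(1 for site in chain if site == new_site)


def grow_population(n_atoms, m1, beta, eps, rng):
    """Grow chains atom by atom with replication-deletion and population control.

    Returns the final list of chains and the scaling factors g_2, ..., g_N.
    """
    chains = [(0,)] * m1
    g_factors = []
    for _ in range(n_atoms - 1):
        trial = []
        for chain in chains:
            site = chain[-1] + rng.choice((-1, 1))
            w = math.exp(-beta * contact_energy_increment(chain, site, eps))
            trial.append((chain + (site,), w))
        g = m1 / sum(w for _, w in trial)   # keeps the expected population at m1
        g_factors.append(g)
        chains = []
        for chain, w in trial:
            gw = g * w
            copies = int(gw) + (1 if rng.random() < gw - int(gw) else 0)
            chains.extend([chain] * copies)
    return chains, g_factors


def estimated_log_z_ratio(n_final, m1, g_factors):
    """log(M_N / (M_1 * prod g_i)), an estimate of log(Z / Z_0)."""
    return math.log(n_final / m1) - sum(math.log(g) for g in g_factors)


def exact_log_z_ratio(n_atoms, beta, eps):
    """Exact log(Z / Z_0) by enumerating all 2^(N-1) walks."""
    total = 0.0
    for steps in itertools.product((-1, 1), repeat=n_atoms - 1):
        sites = [0]
        for s in steps:
            sites.append(sites[-1] + s)
        contacts = sum(1 for i, j in itertools.combinations(range(n_atoms), 2)
                       if sites[i] == sites[j])
        total += math.exp(-beta * eps * contacts)
    return math.log(total / 2 ** (n_atoms - 1))


if __name__ == "__main__":
    rng = random.Random(1)
    chains, g = grow_population(8, 4000, beta=1.0, eps=0.5, rng=rng)
    print("estimate:", estimated_log_z_ratio(len(chains), 4000, g))
    print("exact:   ", exact_log_z_ratio(8, 1.0, 0.5))
```

The free energy difference with respect to the non-interacting chain then follows as $-\frac{1}{\beta}\log(Z/Z_0)$; the agreement is statistical and improves with the population size.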
In practice, the scaling factors $g_n$ can be determined in two ways:

(i) for simple problems (polymers in good or bad solvents), one can use $g_{n+1} = g_n$ and adjust (increase or decrease) $g_{n+1}$ in the case when the population $M_{n+1}$ gets out of some fixed range $M_{min} < M_{n+1} < M_{max}$.

(ii) for more complicated problems (e.g., proteins), it is preferable to make a trial run of adding the (n + 1)-st atom at each stage with the scaling $g_{n+1} = \gamma_{n+1}$, where $\gamma_{n+1}$ is a properly chosen factor, count the total population of chains $M'_{n+1}$, and then make the actual run with

$$g_{n+1} = \gamma_{n+1}\,\frac{M_1}{M'_{n+1}}$$

so as to conserve approximately the initial population $M_1$. In this work, we chose $\gamma_{n+1} = g_n$.

Thus, every time we add an atom, we adjust the scaling factor so as to conserve the total number of chains.

4.5 THE GUIDING FIELD. Assume that the elastic constant $k_b$ in
factor so as to conserve the total number of chains.4.5 THE GUIDING FIELD. Assume that the elastic constant kb in
equation (23)
islarge.
Then,
if we distribute rn+iuniformly,
the factor kb((rn+i rn[ a)~
will belarge,
and thereplication weight
wn+i inEq.(26)
small.Thus,
thesampling
will be veryinefficient,
since it willassign
a verylarge scaling
factor gn+i to the rareconfiguration
for which[rn+i
rn~ a,
leaving
a very smallweight
to otherconfigurations.
In otherwords,
if oneconfiguration
is such that[rn+i rn[
~ a, then it will bereplicated
alarge
number oftimes,
while the others will be deleted from the ensemble. This results in a deterioration of thequality
of the ensembleand a
buildup
of correlations among thechains,
I-e- many chainsredundantly
follow similarpaths
inconfigurational
space.This
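difficulty can be avoided. The replication weight for the atom (n + 1) is of the form

$$w_{n+1}(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n) = g_{n+1}\exp\big(-\beta\,\Delta E(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n)\big) \tag{33}$$

where ΔE is the energy cost of adding the atom (n + 1). This equation can be factorized as follows:

$$w_{n+1}(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n) = P_{n+1}(\mathbf{r}_{n+1})\; g_{n+1}\,\frac{\exp\big(-\beta\,\Delta E(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n)\big)}{P_{n+1}(\mathbf{r}_{n+1})} \tag{34}$$

where the function $P_{n+1}(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n)$ is an arbitrary probability distribution. The procedure is now simple: draw $\mathbf{r}_{n+1}$ with the probability distribution $P_{n+1}$, and replicate it $g_{n+1}\,e^{-\beta\Delta E}/P_{n+1}$ times. In what follows, we write $P_{n+1} \propto \exp(-\beta V_{n+1})$, and we call $V_{n+1}$ the guiding field.

It can be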
easily seen that this procedure indeed conserves the Boltzmann distribution. It is also clear that statistical independence is best achieved when the replication factor is close to unity. For example, for the linear polymer chain it seems natural to take

$$P_{n+1}(\mathbf{r}_{n+1}) \propto \exp\left(-\beta\,\frac{k_b}{2}\big(|\mathbf{r}_{n+1} - \mathbf{r}_n| - a\big)^2\right) \tag{35}$$

that is, to draw $\mathbf{r}_{n+1}$ with the correct Gaussian distribution. Then the (n + 1)-st atom is sampled at a correct distance from $\mathbf{r}_n$, and the residual replication weight will be closer to one.

The ideal choice for the sampling function would be

$$P_{n+1}(\mathbf{r}_{n+1}) \propto \exp\big(-\beta\,\Delta E(\mathbf{r}_{n+1}|\mathbf{r}_1, \dots, \mathbf{r}_n)\big) \tag{36}$$

which would lead to unit replication factors, and thus a completely uncorrelated statistical ensemble. However, in the presence of two-body interactions, there are no known techniques for sampling distributions like (36). The optimal choice for the sampling function $P_n$ is

$$P_{n+1}(\mathbf{r}_{n+1}) \propto \exp\big(-\beta\,U_{n+1}(\mathbf{r}_{n+1})\big) \tag{37}$$

where $U_{n+1}(\mathbf{r}_{n+1})$ is the mean potential seen by the atom (n + 1). But in general, the determination of this mean field is difficult, and one must resort to intuition in the choice of $P_n$.
At this point, it is of interest to note that another use of the guiding field is to introduce an extra potential term to bias the MC procedure if one wishes to directly incorporate experimental (or other) information into the search. One may, for instance, guide the sampling, using Ramachandran's plot information [1].

4.6 THE RESCALING PROCEDURE. Even if the choice of $P_n$ is
nearly optimal, the replication factors are not strictly equal to one, and following the argumentation given in (4.3) above, correlations between the chains build up in the statistical ensemble. This effect becomes more important as the chains become longer.

The final number of uncorrelated chains in the ensemble is proportional to the population of chains. Depending on the temperature, the form of the interaction, and the size of the chain, it may be necessary to consider very large ensembles to get sufficient configurational sampling. In reference [35] for instance, it was shown that, for the case of polyelectrolytes in good or bad solvents, good statistics are achieved when the population M is of the order of 10 times the length of the chain.

It can be seen that algorithmically (not taking into account the possibilities of vectorization or