HAL Id: jpa-00211125
https://hal.archives-ouvertes.fr/jpa-00211125
Submitted on 1 Jan 1989
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Statistical mechanics model of protein folding : short
and long chains have different folding transitions
J.-R. Garel, T. Garel, Henri Orland
To cite this version:
J.-R. Garel, T. Garel, Henri Orland.
Statistical mechanics model of protein folding :
short
and long chains have different folding transitions. Journal de Physique, 1989, 50, pp.3067-3074.
�10.1051/jphys:0198900500200306700�. �jpa-00211125�
Statistical mechanics model of
protein
folding :
short and
long
chains have
different
folding
transitions
J.-R. Garel
(1),
T. Garel(2, *)
and H. Orland(2)
(1)
Laboratoired’Enzymologie,
C.N.R.S.,
91190Gif-sur-Yvette,
France(2)
Service dePhysique Théorique,
Institut de RechercheFondamentale,
CEA-CENSaclay,
91191 Gif-sur-YvetteCedex,
France(Reçu
le 17 avril 1989,accepté
sousforme
définitive
le 30juin
1989)
Résumé. 2014
Nous considdrons un modèle de
mécanique
statistique
pour simuler lerepliement
d’une chaîne
polypeptidique.
Chaque
résidu de la cheine est défini par un ensemble de caractèresindependants.
Les simulations Monte-Carlo pour unegéométrie
dechamp
moyen(dimension
infinie)
montrent que(i)
les chaînes courtes ne sereplient
pas,(ii)
la transition derepliement
deschaînes de
longueur
intermédiaire est du type « verre despins »
(où
tous les caractères doivent êtrepartiellement
satisfaits), (iii)
la transition derepliement
des chaîneslongues
est du type « Mattis »(où
un seul caractère dominant est presquecompletement
satisfait).
Ce modèle estpeut-être applicable
au cas deprotéines
multidomaines.Abstract. 2014 A statistical mechanics model of a
polypeptide
chain is used to simulate thefolding
process. Each residue is definedby
a set ofindependent
characters within agiven
sequence.Simulations of this model in a mean-field
(infinite-dimensional)
geometry show that(i)
short chains do not fold,(ii)
medium chains foldaccording
to aspin-glass-like
« transition »(where
most characters must be
partially
satisfied)
and(iii)
long
chains foldaccording
to a Mattis-like « transition »(where
only
one dominant character is almostcompletely
satisfied).
Thischange
infolding
mechanism associated to chainlength
may be relevant to the existence of multidomainproteins.
Classification
Physics
Abstracts05.90 - 61.40D - 87.10
1. Introduction.
There is
enough
information in the amino-acid sequence of aprotein
to direct itsfolding
into its native conformation[1].
Deciphering
the stereochemical code which transforms theone-dimensional information contained in the amino-acid sequence into a
unique
three-dimen-sional structure has become a
major challenge,
since this would allow topredict
the conformation of aprotein
from its sequence. Thefolding
code appears to belargely
degenerate
since(i)
very different sequences can fold into almost identicalconformations,
and
(ii)
only
a limited subset of allpossible
sequences is able to fold into a stable conformation(*)
On leave of absence fromPhysique
des Solides,Orsay,
France.3068
[2].
Thefolding
of apolypeptide
chain is athermodynamically
driven process in favorableconditions : the chain minimizes its energy in
reaching
an ordered conformation. It is the stereochemical information coded inside the chain sequence which determines the energyassociated with a
given configuration
of the chain. Thisconfiguration
results from the balancebetween the
(attractive
as well asrepulsive)
interactions that the chain has with itself and itssurrounding
solvent. This balance may havedynamical
implications
due to thepossible
existence oflong-lived
metastable states. In this article we refine a statistical mechanics modelof a chain that we have
recently
studied[3],
andperform
Monte Carlo simulations of itsthermodynamics,
in asimplified
mean-fieldgeometry.
This model allows us tostudy
how the informationpresent
in a linear sequence is utilized to direct thefolding
of the chain into astable and
organized
structure, and itpredicts
that short andlong
chains will follow differentfolding
transitions.2. The model of a
polypeptide
chain.The chain consists of N
links,
with a contact interaction between links iat ri
and j
atrj
given by vij
8(rai
-rj)
where iand j
specify
thepositions
of the links in the sequence. Thiscontact interaction is an extreme limit of the more realistic situation where links i
and j
interact
only through
short rangeinteractions,
such assteric,
screenedCoulombic,
induceddipolar,
etc., interactions.Furthermore,
the absence ofangular dependent
terms in the interaction suppresses the existence ofsecondary
structures(helices,
sheets,
...).
Since the sequence of links isgiven,
thevij
arequenched
variables relative to thegeometry
of the chain. The Hamiltonian reads :Two extreme limits can be
studied,
depending
on theprobability
distribution of thevli :
1)
theVij
may be taken asindependent
random variables : this supposes that the interactions between apair
oflinks, i
andj,
is not related to the interactions between anotherpair, i’ and j’,
even when these twopairs
share a common link or when thesepairs
are madeby
the sametypes
of links but at differentpositions.
This case, which cannot take the nature of links iand j into
account and considersonly
theirpositions,
leads to aspin-glass-like
transitionbetween unfolded and folded states
[3a].
Thisanalogy
withspin-glasses
is consistent[5a]
withlong-lived
metastable states, slowrelaxations,
etc. A morephenomenological
model[5b],
based on arandom-energy
model,
leads to similarconclusions ;
2)
theVij
may bedecomposed
into a sum ofseparable
(Mattis-like)
interactions[3b] :
where
the ( §f)
areindependent
random variables. In thefollowing,
wetake ( §f =
±1 }
withequal probability.
Note that thefactor 1
inequation
(2)
ispresent
to avoid acomplete
collapse
of thechain,
due to the absence of hard-corerepulsion.
This form ofvij
supposes that the interactions of agiven
link i with the othersdepend
on someparticular
features of this(1)
associated with agiven
link,
and this set of( )f)
corresponds
to the « nature » of link i. Links of a different chemicaltype
(such
as the differentaminoacids)
will be describedby
different sets of M values of the( )f) .
In this case, the nature of the linkpresent
atposition
i in the sequence isexplicitly
consideredthrough
a set of Mindependent
characters ;
these characters can be viewed as the Van der Waalsvolume,
partial charge,
hydrogen-bonding
ability, hydrophobicity,
solvatation,
etc., of link i. Note that both the attractive andrepulsive
interactions of a
given
link i are consideredthrough
thesigns
of thecorresponding
M valuesvp :
for instance thehydrophobic
characteryields
apositive vp
(see
theappendix),
whereas theshort-range
Coulombic characteryields
anegative vp
since the interaction isgiven
qiq J 8
(rai
-rj).
With the twosimplifications
mentionned above(zero-range
and noangular
dependence
of theinteractions),
the frustration of thesystem
isclearly
decreased.However,
our model still has someimportant
frustration effects in itsinteractions,
and can be viewed as a firststep
towards a betterunderstanding
of thefolding
process.3. An
example :
thesingle
character chain(M = 1 ).
In order to illustrate the
physical
content of this model we willbriefly
describe a chain in which the residues are definedby
asingle
character,
i.e. M = 1. Forinstance,
this chain canbe viewed as
composed
ofhydrophilic
( 03BE = - 1 )
andhydrophobic
(g
= +1 )
residues. The interactions between the chain and the solvent molecules are describedby
amicroscopic
Hamiltonian which
leads,
through
integration
over thepositions
of solvent molecules(see
theappendix),
to an intra-chain interaction which is indeed describedby equation
(1).
Although
the solvent does not appear
explicitly
in theexpression
ofvij
given by equation
(1),
it is taken into accountthrough
thevariables ei
-In the
thermodynamic
limit N -> oo, the «folding
» transition of such a chaincorresponds
to a
simple
condensationgoverned by
thehydrophobic
character ;
ahydrophilic
residue likesto be close to another
hydrophilic
residue or to a solventmolecule,
while anhydrophobic
residueprefers
to be away from anhydrophilic
residue or a solvent molecule(this
simple
picture
breaks down if one considers the chain with hard-corerepulsion.
In that case, oneexpects
microdomainformation).
This transition can bedescribed by
an orderparameter
(à
laMattis) :
where (...)
and ... denoterespectively
thermal and disorder averages. This orderparameter
m (r)
is a measure of the correlation between the distribution ofthe 03BEi along
the chain(its
«
sequence »)
and that ofthe ri
(its
«conformation »).
Assuch,
it is a measure of thequality
of the
coding
of this « conformation »by
this « sequence ».4. The
physical
définition of a chain.A
key
parameter
of this model is the number M of characters needed to describe the interactions of agiven
type
of residue in the chain. In a broader view of statistical mechanicsmodels,
the Mindependent
characters associated with a link bear somesimilarity
to thepatterns
defined in neural networks theories[4].
As Mincreases,
oneexpects
thefolding
transition of a chain of N links to evolve from a Mattis-like to aspin-glass-like
behavior[5].
So(1)
This definition of M isslightly
different from the one used in reference[3b] ;
see the discussion in3070
far,
the model has been defined in any dimension. From now on, we will restrict ourselves tothe
following
(high-dimension
ormean-field)
approximation :
the chain is constructed on a set(simplex)
such that the distance between anypair
ofpoints
is constant(a
simplex
with threepoints
(resp.
fourpoints)
is anequilateral triangle
(resp.
atetrahedron), etc.).
Note that forM
= 1,
a chain on asimplex always
folds on at most fourpoints
(due
to the chainconstraint),
and different chemical sequencescorrespond
to differentpaths going through
these fourpoints.
One could alsostudy
model(2)
with M = aN in thethermodynamic
limit,
as in reference[4b].
Theresulting equations being
ratherintricate,
we have turned to a differentapproach.
For agiven
value ofM,
thefolding
of a «long »
chain willcorrespond
to a Mattis-liketransition,
and that of a « short » chain to aspin-glass-like
transition. We have firstchosen the value M = 8 for the number of characters
using
asemi-biological
argument,
andwe have then studied the
possible folding
transitions of chains of variouslengths
(20
N100 )
on thesimplex.
5. 8 characters may be sufficient to describe a
polypeptide
chain.Each
type
of residuebeing
definedby
a set of Mvariables (§f)
with values ±1,
the r.h.s. ofequation
(2)
can take2M
differentvalues,
andtherefore,
there are2M
differentpossible
values ofvij.
With 20 amino-acids there are 210possible
differentpairs
of residues(20
x21/2)
andthus 210 different interaction terms
vj,
if we assume that the interaction between residues iand j depends only
on the « nature » of these residues and not on theirposition
in thesequence. To obtain the 210 interaction terms needed for the 210 different
pairs,
we must take M so that2M >
210. The smallestinteger
valuepossible
is M = 8(a
similarargument
in thecase of RNA or DNA chains would lead to M = 4
independent
characters for eachbase).
Is M = 8 a sufficient number of characters to describe all the interactions between the 20 amino-acids ? Individual amino-acids canprobably
be discriminatedusing
more than 8features :
polar
orapolar,
neutral orcharged, large
orsmall,
helix former orbreaker,
hydrogen-bond
donor oracceptor,
etc., but these features do notcorrespond
totruely
independent
characters : forinstance,
properties
such ashydrogen-bonding
abilities,
polarity,
secondary
structurepreference,
and sidechain bulkiness arelikely
to be related. It is difficultto
specify
moreprecisely
what these 8independent
characters couldbe,
but this value of 8 issufficient to
distinguish
all interactions betweenpairs
of amino-acids. The simulationsgiven
belowcorrespond
to the case M =8,
a fixed set of 8characters,
and to chains of variablelengths
(20
N100 ).
6. Simulations.
We have
performed
Monte Carlo simulation for chains of N links(20
N100 )
constructedon a
simplex
of Npoints. Starting
with a random initialconfiguration
of thechain,
weperformed single
link moves(cost
in energyAE) :
anelementary
Monte Carlostep
moves alink i from its actual
position ri
on thesimplex
to a newposition
rneW (keeping
the rest of the chainunchanged),
where the newposition
is differentfrom ri -
1and ri +
1 to
satisfy
the chainconstraint. These moves are
accepted according
to theMetropolis algorithm
[6].
The outcomeof the calculation is :
(i)
the internal energy U =(H)
with H defined inequation
(1).
which is a measure of the correlation of the chain conformation with character p. It is
technically
more convenient to useglobal
orderparameters :
Since on a
given
chain,
the chemical sequence isfixed,
we have notaveraged equation
(4)
and
(5)
over the disorder.We have taken vl
= 1, V2
=0,5,
v3 =
0.2,
v4 = 0.15 andVi +4 Vi
(i
= 1,
..., 4 )
inequation
(2),
and the number of Monte Carlosteps
per link was 40 for N =40, 60, 100
(which
turned out to be sufficient to reach thermalequilibrium ;
this was checkedby using
different initialconditions),
and 100 for N = 20. In the last case, it will become clear that one has to increase this factor to reach thermalequilibrium (see below).
The internal energy U is shown infigure
1 as a function of thetemperature
T. A finite size « criticaltemperature
» can bedefined for N =
40, 60, 100,
corresponding
to thefolding
of the chain(this
is not the case for N =20,
due to finite sizefluctuations).
This criticaltemperature
is defined as thetemperature
at which some orderparameters
Sp
start to grow. The theoretical criticaltemperatures
(Ref. [3b])
are shown infigure
1,
forcomparison ;
theagreement
can be considered as reasonable. The orderparameters Sp
have been measuredjust
below the(finite
size)
transition(Tl,
Ti,
Tr)
and at low(finite size)
temperature
(To, Tô, TÕ)
for N =40, 60, 100.
We have also studied the case N = 20 at two similartemperatures.
In thatcase, there is a
progressive freezing
of thesystem,
due to the presence of many metastable states(since
one is in aglassy
situation)
separated by
small(since
N issmall)
barriers. Theresults are
displayed
in tableI,
whereonly
the Sp
larger
than10- 2
are shown.According
to thesimulations,
thefolding
« transitions » seem to be first order.For N
= 100,
it is clear that one is almost in the M = 1 case, at least close to the transition. There is asingle
charactershowing
up at the transition( Tl ).
As Ndecreases,
more and morecharacters appears at the transition
(if any).
At lowtemperatures,
the frustration grows andFig. 1.
-Internal energy U vs. temperature T for various chain
length
N. The stars denote the3072
Table 1. - For each value
of
N(N
=20, 40,
60,100 ),
the variousSp
(p
= 1,
...,8 )
areshown at two
different
temperatures.
For N =20,
where there is no « transition», the
Sp
are seen to growsimultaneously
as thetemperature
T is decreased. For(N
=40,
60,100 ),
thefirst
temperature(Tl, Tl, Tr)
is chosen to be close to thefolding
« transition », while(To, To, TÕ)
illustrates how the other characters grow at low temperature. A bar(-)
denotes a valueof
Sp
smaller than10- 2.
other dominated characters appear. In an
analytical
calculation such as the one mentionned insection
4,
one would say that forM >
a c, thefolding
transition is of thespin-glass
type,
N
whereas it is of the Mattis
type
forM
a C. Our simulations are inqualitative
agreement
withn
this
picture.
7. Conclusions.
The
present
model has considered chains of N links of 20 differenttypes,
the « amino-acids »,each one defined
by
8independent
characters. For very short chains(N 20),
the chaincannot fold into an ordered state. For medium chains
(20
N60 ),
folding
into an orderedstate involves the condensation of several characters.
Many
(if
notall)
different interactionsbetween
pairs
of links contribute to the decrease in internal energy associated withfolding.
Thecooperative
unit over which the chain finds anenergetic compromise
which minimized its frustrationcorresponds
to the entire chain itself :folding
resembles aspin-glass
transition.For
long
chains(N > 100),
the condensation of asingle
dominant character is sufficient topromote
folding
into an ordered state.Folding
resembles a Mattis-liketransition,
and there is astrong
correlation between the « conformation » of the chain and the distribution of thisdominant character
along
its « sequence ». Thechange
in theregime
which govems thefolding
of chains ofincreasing
lengths
may be relevant to the behavior of realproteins :
smallerproteins
fold into asingle
compact
structure upon ahighly cooperative
transition,
whereas in
large proteins,
differentsegments
along
the chain behave asindependent
folding
corresponds
to anoversimplified image
of apolypeptide
chain. In realchains,
links of the same chemicaltype
may not have identical sets of( )f)
if they
are at differentpositions along
thechain ; indeed,
although
twoglycine
residues arechemically
identical,
they
may have different abilities to form turns becausethey
arepreceeded
or followedby
different sequences. The definition of a « character »assigned
to agiven
link is openenough
toaccomodate not
only
intrinsicphysicochemical
properties
of individualamino-acids,
but alsosemi-empirical
data derived from theanalysis
of known structures[8].
Suchsimple
models may behelpful
toadapt
theconcepts
of statistical mechanics to the correlation between sequence and conformation ofproteins
and to elucidate the stereochemicalfolding
code[9].
Note added : After submission of this
work,
we have receivedpreprints by
E. I.Shakhnovich and A. M. Gutin
dealing
with similar models.Appendix.
Let the
short-range
interaction between link i of the chain(at ri)
and a water molecule(at
Ra)
be 03BEi
5 (ri - Ra ).
Thus § = - 1
(resp. e =
+1)
denotes anhydrophilic
(resp.
hyd-rophobic)
link. Thegrand partition
function of the chainplus
watersystem
reads :where À is the
fugacity
of a water molecule. We have :that is : o o
Expanding
(A3)
to second order in/3
yields :
of the form of
equation (2)
with VI =Nk,6
/2 (the
fact that the interaction has the same value for twohydro-philic
or-phobic
links is due to the zero range of theinteraction).
3074
References