HAL Id: jpa-00247827
https://hal.archives-ouvertes.fr/jpa-00247827
Submitted on 1 Jan 1993
Formation and stability of secondary structures in globular proteins
J. Bascle, T. Garel, Henri Orland
To cite this version:
J. Bascle, T. Garel, Henri Orland. Formation and stability of secondary structures in globular proteins.
Journal de Physique II, EDP Sciences, 1993, 3 (2), pp.245-253. ⟨10.1051/jp2:1993126⟩. ⟨jpa-00247827⟩
Classification Physics Abstracts
87.15B 36.20 05.90 61.40D 87.10
Formation and stability of secondary structures in globular proteins
J. Bascle, T. Garel and H. Orland (*)

Service de Physique Théorique (**), CE-Saclay, 91191 Gif-sur-Yvette Cedex, France

(Received 10 August 1992, accepted 27 October 1992)
Abstract. We study two models for the formation and packing of helices and sheets in globular (compact) proteins. These models, based on weighted Hamiltonian paths on a regular lattice, both exhibit a first order transition between a compact high temperature phase, with no extended secondary structures, and a quasi-frozen compact phase, with secondary structures invading the whole lattice. The quasi-frozen phase, with very weak temperature dependence, is identified as the native phase of proteins, whereas the high-temperature phase may be relevant to the so-called molten globule state of proteins.
1 Introduction.
Proteins are weakly branched polymers, built out of twenty species of monomers (amino acids); they have the property of folding into an (almost) unique compact native structure. This native structure is of great interest, since it is intimately related to the biological function of the protein [1]. The generic formula of amino acids (except for proline) is NH2-CHR-COOH, where the variable part R is called the residue. Polycondensation of $N_0$ amino acids leads to the formation of proteins, the typical size of which is in the range $N_0 \sim 60$-$1000$ (e.g. sperm-whale myoglobin has $N_0 = 153$).

(*) Also at Groupe de Physique Théorique Statistique, Université de Cergy-Pontoise, 95806 Cergy-Pontoise Cedex, France.
(**) Laboratoire de la Direction des Sciences de la Matière du Commissariat à l'Energie Atomique.
Several levels of complexity may be defined in the folding problem. One may for instance consider the relevant interactions in proteins: at a microscopic level, one only deals with Coulomb interactions. On a longer scale, these microscopic interactions give rise to covalent bonding, effective Coulomb interactions (with partial charges or screening), hydrogen bonding, Van der Waals interactions and hydrophobic effects with the solvent. It is clear that all these interactions are interdependent.

The complexity among the interactions is also reflected in the multi-level structure of the folded proteins. Roughly speaking, the primary structure (i.e. the chemical sequence of the protein) is due to covalent bonding, whereas, as shown by Pauling [2], hydrogen bonding is responsible for the existence of secondary structures, i.e. α-helices or β-sheets (see Fig. 1). Finally, the tertiary structure is mostly driven by the hydrophobic effect: the native protein is compact so that apolar residues may hide away from the solvent.
Fig. 1. Schematic representation of hydrogen bonds in (a) an α-helix, (b) a β-sheet.

A physical approach to the study of low energy compact structures can be stated in the following way: the number of hydrogen bonds (H-bonds) with the solvent is proportional to the surface of the protein. This surface being constant, one is left with a balance between intraglobular H-bonds and the requirement of globular compacity. Fully saturated H-bonds imply a one dimensional (α-helix) or two dimensional (β-sheets) local structure, both being incompatible with a compact three dimensional structure. This situation therefore bears some resemblance to the situation one encounters in glasses, where local order (e.g. five-fold symmetry) is not compatible with the global space-filling constraint [3].

Indeed, one of the most striking features of the native structure of proteins is its almost frozen character. On the experimental side, the folding (or denaturation) transition seems, in some cases, to proceed in two steps [4], the intermediate phase being the so-called molten globule.
In this paper, we study the statistical mechanics of (bulk) H-bonds in the compact phase. Two models, describing respectively the formation of α-helices and β-sheets, are introduced. Both models are formulated in terms of weighted Hamiltonian paths; in a mean field approach, we get a first order transition between a high temperature compact liquid phase (which we interpret as the molten globule) and a low temperature crystal phase (which we interpret as the native state). Note that the crystal phase has (almost) fully saturated H-bonds and is thus (almost) frozen.

2 Physical models.

Since we are only concerned with the globular state of the protein, we will consider compact chains, represented as Hamiltonian paths on a lattice [5]. This representation, which is usual in the modelling of collapsed polymer chains, can be summarized as follows: consider a hypercubic lattice in a d-dimensional space, with $N = L^d$ sites, and periodic boundary conditions. A Hamiltonian path on the lattice is a walk which goes through all sites once and only once. Such a walk satisfies both the compacity and self-avoidance requirements. Moreover, we will restrict ourselves to closed paths, since boundary conditions play a role only in subdominant terms [5].

We will now introduce two different models, to mimic the formation of α-helices or β-sheets in such compact structures.
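As a concrete illustration of the definition above, the following Python sketch checks whether a given sequence of sites is a closed Hamiltonian path on a periodic hypercubic lattice. The function name `is_closed_hamiltonian_path` and the representation of sites as integer tuples are choices made for this example only; they are not part of the original formalism.

```python
from itertools import product


def is_closed_hamiltonian_path(path, L, d):
    """Return True if `path` visits every site of the periodic L^d lattice
    exactly once and consecutive sites (including last -> first) are
    nearest neighbours."""
    sites = set(product(range(L), repeat=d))
    if len(path) != len(sites) or set(path) != sites:
        return False  # a site is missing or visited twice

    def are_neighbours(a, b):
        # Nearest neighbours differ by +1 or -1 (mod L) along exactly one axis.
        diffs = [(x - y) % L for x, y in zip(a, b)]
        moved = [k for k in diffs if k != 0]
        return len(moved) == 1 and moved[0] in (1, L - 1)

    n = len(path)
    return all(are_neighbours(path[i], path[(i + 1) % n]) for i in range(n))


# A 4 x 4 example (d = 2): a "snake" through the lattice that comes back
# to its starting site along column 0.
snake = [(0, 0), (0, 1), (0, 2), (0, 3),
         (1, 3), (1, 2), (1, 1),
         (2, 1), (2, 2), (2, 3),
         (3, 3), (3, 2), (3, 1), (3, 0),
         (2, 0), (1, 0)]
print(is_closed_hamiltonian_path(snake, L=4, d=2))
```

The check enforces the two requirements stated in the text: every site is visited once and only once (compacity and self-avoidance), and the walk closes on itself.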
2.1 α-helices. In this model, each link of the Hamiltonian path represents a helical turn. We recall that, in real proteins, a helical turn corresponds to 3.6 amino acids on average, and that H-bonds stabilize extended helical structures. Thus, in our model, we attribute an energy loss ε to the breaking of a "helix", that is whenever the Hamiltonian path makes a turn (corner). The partition function of the system at inverse temperature $\beta = 1/(k_B T)$ reads:

$$Z = \sum_{\{\mathcal{H}\}} e^{-\beta\varepsilon N_c(\mathcal{H})} \qquad (1)$$

where $\{\mathcal{H}\}$ denotes the ensemble of all Hamiltonian paths and $N_c(\mathcal{H})$ denotes the number of corners present in path $\mathcal{H}$. In figure 2, we show an example of a Hamiltonian path.

Fig. 2. A Hamiltonian path in d = 2 with $N_c$ = 14 corners.
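For very small lattices the sum in Eq. (1) can be evaluated by brute force. The Python sketch below enumerates the closed Hamiltonian paths of an L x L square grid, counts the corners of each, and sums the Boltzmann weights. It is only an illustration: it assumes d = 2 and open (non-periodic) boundaries to keep the enumeration simple, whereas the text uses periodic boundary conditions, and the function names are hypothetical.

```python
import math


def hamiltonian_cycles(L):
    """Yield each undirected Hamiltonian cycle of the open L x L grid once,
    as a list of sites starting at (0, 0)."""
    start = (0, 0)
    n_sites = L * L

    def neighbours(site):
        x, y = site
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < L and 0 <= ny < L:
                yield (nx, ny)

    def extend(path, visited):
        if len(path) == n_sites:
            # Close the loop and keep only one of the two orientations.
            if start in neighbours(path[-1]) and path[1] < path[-1]:
                yield list(path)
            return
        for nxt in neighbours(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                yield from extend(path, visited)
                path.pop()
                visited.remove(nxt)

    yield from extend([start], {start})


def count_corners(cycle):
    """Number of sites N_c where the closed path changes direction."""
    n = len(cycle)
    corners = 0
    for i in range(n):
        prev, here, nxt = cycle[i - 1], cycle[i], cycle[(i + 1) % n]
        step_in = (here[0] - prev[0], here[1] - prev[1])
        step_out = (nxt[0] - here[0], nxt[1] - here[1])
        if step_in != step_out:
            corners += 1
    return corners


def partition_function(L, beta_eps):
    """Z of Eq. (1) by explicit enumeration; beta_eps = beta * epsilon."""
    return sum(math.exp(-beta_eps * count_corners(c))
               for c in hamiltonian_cycles(L))


cycles = list(hamiltonian_cycles(4))
print(len(cycles), sorted(count_corners(c) for c in cycles))
for t in (0.5, 1.0, 2.0):  # reduced temperature t = k_B T / epsilon
    print(t, partition_function(4, 1.0 / t))
```

The number of paths grows exponentially with the number of sites, so this approach is limited to toy sizes; it is meant only to make the weight $e^{-\beta\varepsilon N_c}$ tangible.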
This model has been studied in the context of the melting of semi-flexible polymer chains [6, 7].

In order to calculate Z, we follow references [7, 8]. We introduce on each site r and for each direction α = 1, ..., d an n-component real field $\vec\varphi_\alpha(r)$. The partition function Z can be rewritten:

$$Z = \lim_{n\to 0}\frac{1}{n}\int \prod_{r}\prod_{\alpha=1}^{d} d\vec\varphi_\alpha(r)\; e^{-A_G}\; \prod_{r}\left(\frac{1}{2}\sum_{\alpha=1}^{d}\vec\varphi_\alpha^{\,2}(r) + e^{-\beta\varepsilon}\sum_{1\le\alpha<\gamma\le d}\vec\varphi_\alpha(r)\cdot\vec\varphi_\gamma(r)\right) \qquad (2a)$$

with

$$A_G = \frac{1}{2}\sum_{\alpha=1}^{d}\sum_{r,r'}\vec\varphi_\alpha(r)\,(\Delta^{\alpha})^{-1}_{rr'}\,\vec\varphi_\alpha(r') \qquad (2b)$$

where the operator $\Delta^{\alpha}_{rr'}$ is 1 if r and r' are nearest neighbours in direction α, and 0 otherwise.

In order to prove the equivalence of (2) and (1), we will use Wick's theorem: we define, from (2), the elementary contraction:

$$\langle \varphi^{u}_\alpha(r)\,\varphi^{v}_\gamma(r')\rangle = \delta_{uv}\,\delta_{\alpha\gamma}\,\Delta^{\alpha}_{rr'} \qquad (3)$$

where u and v are component indices of $\vec\varphi_\alpha(r)$, running from 1 to n.

Expanding the product over r in (2a), we must choose at each site r either a term of the form $\frac{1}{2}\vec\varphi_\alpha^{\,2}(r)$ or one of the form $e^{-\beta\varepsilon}\,\vec\varphi_\alpha(r)\cdot\vec\varphi_\gamma(r)$, corresponding respectively to a path going straight through r in direction α, or to a path making a turn at r, from direction α to direction γ. By contracting the fields according to (3), we construct a sum over all self-avoiding compact closed paths with appropriate weights, with an additional factor n (due to the summation over component index u) for each closed loop. As usual [9], we extract the single chain contribution by taking the limit n = 0. This concludes the proof.

From this proof, it is clear that vacancies (empty sites) can easily be included in the model, by adding to the terms in brackets in (2a) a term of the form $e^{\beta\mu}$, where μ is the chemical potential for the vacancies. In that case, equation (4) (see below) is slightly modified, giving rise to the possibility of phase separation between vacancy-rich and vacancy-poor (globular) phases.
In order to evaluate (2a), we will resort to a mean-field theory, that is, a saddle point method.

The ground state of the model consists of straight paths, making turns on the surface of the lattice (their free energy being non-extensive, of order $L^{d-1}$). A correct mean-field treatment should, a priori, take this one-dimensional character into account, but we have shown elsewhere [7] that excellent results are obtained by using a straightforward isotropic homogeneous mean-field. We have also shown in [7] that fluctuations around this isotropic mean field should not be included, since this spoils the quality of the results.

The mean field equations (saddle point on (2a)) read:

$$\sum_{r'}(\Delta^{\alpha})^{-1}_{rr'}\,\vec\varphi_\alpha(r') = \frac{\vec\varphi_\alpha(r) + e^{-\beta\varepsilon}\sum_{\gamma\neq\alpha}\vec\varphi_\gamma(r)}{\frac{1}{2}\sum_{\gamma}\vec\varphi_\gamma^{\,2}(r) + e^{-\beta\varepsilon}\sum_{\gamma<\gamma'}\vec\varphi_\gamma(r)\cdot\vec\varphi_{\gamma'}(r)} \qquad (4)$$

The isotropic homogeneous mean-field assumes $\vec\varphi_\alpha(r) = \vec\varphi$ for any α = 1, ..., d and r. We further break the O(n) symmetry by choosing $\vec\varphi$ in a given direction, say 1. From (4), we obtain:

$$\varphi^{2} = \frac{4}{d} \qquad (5a)$$

At this mean-field level, the free energy per site reads:

$$f = -k_B T \log\left(\frac{q(\beta)}{e}\right) \qquad (5b)$$

with

$$q(\beta) = 2 + 2(d-1)\,e^{-\beta\varepsilon} \qquad (5c)$$

and e = 2.71828... Note that q(β) plays the role of an effective coordination number (see Ref. [8]).

On the other hand, the free energy is a decreasing function of temperature (since the entropy is positive), and as mentioned above, the ground state free energy per site vanishes (it is of order 1/L). Thus, the free energy per site remains negative (or zero) at all temperatures.

There is a temperature $T_F$ for which the effective coordination number q(β) is equal to e, and f vanishes. For d = 3, $k_B T_F = 0.58\,\varepsilon$. Below this temperature, the free energy remains equal to zero (Fig. 3).

Fig. 3. Plot of the free energy per site (Eq. (5b)) versus temperature, for d = 3. t is the reduced temperature: $t = k_B T/\varepsilon$.
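The numbers quoted above follow directly from Eqs. (5b) and (5c). The short Python sketch below evaluates them in reduced units (energies in units of ε, temperature t = k_B T/ε). The function names are hypothetical, and the internal energy is obtained here by differentiating log(q/e) with respect to βε, which is an assumption of this sketch about how the mean-field expression is to be used.

```python
import math


def q_eff(beta_eps, d=3):
    """Effective coordination number q(beta) of Eq. (5c)."""
    return 2.0 + 2.0 * (d - 1) * math.exp(-beta_eps)


def free_energy(t, d=3):
    """Mean-field free energy per site f/eps at reduced temperature t.
    Eq. (5b) is used where it is negative; f is set to zero otherwise,
    as in the frozen phase described in the text."""
    if t <= 0.0:
        return 0.0
    return min(0.0, -t * math.log(q_eff(1.0 / t, d) / math.e))


def freezing_temperature(d=3):
    """Reduced temperature t_F = k_B T_F / eps at which q(beta) = e."""
    # q = e  <=>  exp(-beta*eps) = (e - 2) / (2 (d - 1)); needs d >= 2.
    return 1.0 / math.log(2.0 * (d - 1) / (math.e - 2.0))


def internal_energy(t, d=3):
    """U/eps per site in the liquid phase: -d log(q/e) / d(beta*eps)."""
    x = math.exp(-1.0 / t)
    return 2.0 * (d - 1) * x / (2.0 + 2.0 * (d - 1) * x)


t_f = freezing_temperature(3)
print("t_F =", round(t_f, 4))
print("f(t=1)/eps =", round(free_energy(1.0), 4))
print("eps / U(t_F) =", round(1.0 / internal_energy(t_f), 3))
```

For d = 3 this gives t_F close to 0.58, in line with the value quoted in the text. The last line prints ε/U at t_F, which is the mean number of links per corner at the transition.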
Physically, there is a competition between the entropy gain of making turns, and the corresponding energy loss. At high temperature, the corners are mobile in the bulk, leading to a liquid-like structure, whereas at low temperature, the system is frozen, in stretched walks, with the corners expelled on the surface. The two regions are separated by a first order phase transition at $T_F$. The average length of a helix is given by

$$\ell_F = \frac{\varepsilon}{U(\beta_F)} \qquad (6)$$

where $U(\beta) = -\frac{1}{N}\frac{\partial}{\partial\beta}\log Z$ is the internal energy per site. At the freezing point, in d = 3, the length is equal to $\ell_F = 3.78$, and it is infinite ($\ell = L$) in the low temperature phase. Note that $\ell_F$ corresponds to a typical number of amino acids per α-helix of the order of 15.

In a more elaborate treatment [7], one finds a very weak temperature dependence of the low temperature phase, but the overall freezing picture remains correct. Indeed, in the frozen phase, the typical length of a helix is

$$\ell \simeq \frac{e^{6\beta\varepsilon}}{12}$$

yielding $\ell = 2592$ just below $T_F$ (corresponding to about 9330 residues). This length scale is clearly out of reach in any realistic protein or polymer system. The (first order) complete freezing picture is thus adequate for any practical purpose.

2.2 β-sheets. We again use the Hamiltonian path formalism, with the slight modification that a link of a path is now to be interpreted as an amino acid.

The model can be described in the following way. Consider a Hamiltonian path. To mimic the formation of CO-HN, i.e. an H-bond, in β-sheets, we allow an H-bond (energy gain ε) whenever two pairs of aligned links belong to two non-intersecting neighbouring strands (see Fig. 4). The partition function reads:

$$Z = \sum_{\{\mathcal{H}\}}\;\sum_{\{\text{H-bonds}\}} e^{\beta\varepsilon N_H(\mathcal{H})} \qquad (7)$$

Fig. 4. The two different possible types of H-bonds in the β-sheet model.
The summation runs over all possible Hamiltonian paths $\mathcal{H}$, and over all possible sets of H-bonds compatible with the path (see below). We show in figure 5 such a Hamiltonian path.

Fig. 5. A Hamiltonian path in d = 2 with $N_H$ = 4 (out of five possible) hydrogen bonds.
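In order to give an integral representation of Z, in addition to the previous fields $\vec\varphi_\alpha(r)$ which generate the paths, we need to introduce two scalar fields $\psi^{*}_\alpha(r)$ and $\psi_\alpha(r)$ which respectively initiate and terminate an H-bond at site r in direction α. We obtain

$$Z = \lim_{n\to 0}\frac{1}{n}\int \prod_{\alpha,r} d\vec\varphi_\alpha(r)\, d\psi_\alpha(r)\, d\psi^{*}_\alpha(r)\; e^{-A_G}\; \prod_{r} D(r) \qquad (8a)$$

with

$$A_G = \sum_{\alpha=1}^{d}\sum_{r,r'}\left[\frac{1}{2}\,\vec\varphi_\alpha(r)\,(\Delta^{\alpha})^{-1}_{rr'}\,\vec\varphi_\alpha(r') + \psi^{*}_\alpha(r)\,(\Delta^{\alpha+})^{-1}_{rr'}\,\psi_\alpha(r')\right] \qquad (8b)$$

and

$$D(r) = \sum_{\alpha}\frac{1}{2}\,\vec\varphi_\alpha^{\,2}(r)\,G_\alpha(r) + \sum_{\alpha<\gamma}\vec\varphi_\alpha(r)\cdot\vec\varphi_\gamma(r) \qquad (8c)$$

and

$$G_\alpha(r) = 1 + e^{\beta\varepsilon/2}\sum_{\gamma(\neq\alpha)}\left(\psi^{*}_\gamma(r) + \psi_\gamma(r)\right) + e^{\beta\varepsilon}\sum_{\gamma(\neq\alpha)}\psi^{*}_\gamma(r)\,\psi_\gamma(r) \qquad (8d)$$

The operator $\Delta^{\alpha}_{rr'}$ is the one defined in 2.1, whereas $\Delta^{\alpha+}_{rr'} = \delta(r' - (r + e_\alpha))$, where $e_\alpha$ is the unit vector in direction α.

Expanding the product over r in (8a), at each site r, we must choose either (i) a term of the form $\frac{1}{2}\vec\varphi_\alpha^{\,2}(r)\,G_\alpha(r)$ or (ii) a term of the form $\vec\varphi_\alpha(r)\cdot\vec\varphi_\gamma(r)$. The latter (ii) represents a corner and does not allow for an H-bond. The former (i) represents a path going straight through r in direction α, and according to (8d), allows for four possibilities, namely, no bond with weight 1, one bond entering or leaving r in direction γ (≠ α) with weight $e^{\beta\varepsilon/2}$, and finally, one bond entering and one leaving at r in direction γ (≠ α) with weight $e^{\beta\varepsilon}$.

As in section 2.1, the identification of (7) and (8a) proceeds through the use of Wick's theorem, with the following elementary contractions:

$$\langle \varphi^{u}_\alpha(r)\,\varphi^{v}_\gamma(r')\rangle = \delta_{\alpha\gamma}\,\delta_{uv}\,\Delta^{\alpha}_{rr'}$$
$$\langle \psi^{*}_\alpha(r)\,\psi_\gamma(r')\rangle = \delta_{\alpha\gamma}\,\Delta^{\alpha+}_{rr'} \qquad (9)$$
$$\langle \psi_\alpha(r)\,\psi_\gamma(r')\rangle = \langle \psi^{*}_\alpha(r)\,\psi^{*}_\gamma(r')\rangle = 0$$

These contractions indeed generate the required partition function. As mentioned in the previous section, vacancies can be included in the model by a slight modification of D(r).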
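As in the previous section, we study this model in a mean-field approximation. The mean-field equations read:

$$\sum_{r'}(\Delta^{\alpha})^{-1}_{rr'}\,\vec\varphi_\alpha(r') = \frac{\vec\varphi_\alpha(r)\,G_\alpha(r) + \sum_{\gamma\neq\alpha}\vec\varphi_\gamma(r)}{D(r)} \qquad (10a)$$

$$\sum_{r'}(\Delta^{\alpha+})^{-1}_{rr'}\,\psi_\alpha(r') = \frac{1}{D(r)}\sum_{\gamma\neq\alpha}\frac{1}{2}\,\vec\varphi_\gamma^{\,2}(r)\left(e^{\beta\varepsilon/2} + e^{\beta\varepsilon}\,\psi_\alpha(r)\right) \qquad (10b)$$

$$\sum_{r'}\psi^{*}_\alpha(r')\,(\Delta^{\alpha+})^{-1}_{r'r} = \frac{1}{D(r)}\sum_{\gamma\neq\alpha}\frac{1}{2}\,\vec\varphi_\gamma^{\,2}(r)\left(e^{\beta\varepsilon/2} + e^{\beta\varepsilon}\,\psi^{*}_\alpha(r)\right) \qquad (10c)$$

Restricting ourselves to an isotropic homogeneous solution, $\vec\varphi_\alpha(r) = \vec\varphi$, $\psi^{*}_\alpha(r) = \psi_\alpha(r) = \psi$, we get:

$$\varphi^{2} = \frac{4}{d} \qquad (11a)$$

and ψ satisfies the equation:

$$\psi = \frac{2(d-1)\,e^{\beta\varepsilon/2}\left(1 + e^{\beta\varepsilon/2}\,\psi\right)}{d\,D} \qquad (11b)$$

with

$$D = 2d + 4(d-1)\,e^{\beta\varepsilon/2}\,\psi + 2(d-1)\,e^{\beta\varepsilon}\,\psi^{2} \qquad (11c)$$

The free energy per site reads:

$$f = T\left(1 + d\,\psi^{2} - \log D\right) \qquad (12)$$

As in the previous model, the ground state configuration corresponds to frozen fully stretched configurations saturated with transverse H-bonds (stacked β-sheets), with energy per site equal to −ε. Therefore, we have

$$f \le -\varepsilon \qquad (13)$$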
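The mean-field problem defined by Eqs. (11b), (11c) and (12) is easy to explore numerically. The Python sketch below works in reduced units (ε = 1, k_B = 1, t = T/ε) and takes Eq. (11b) to be the stationarity condition ∂f/∂ψ = 0 of Eq. (12); that reading, the bisection brackets and the function names are assumptions of this example, not statements about how the original computation was carried out.

```python
import math


def d_eff(psi, beta_eps, d=3):
    """D of Eq. (11c), with x = exp(beta*eps/2)."""
    x = math.exp(0.5 * beta_eps)
    return 2.0 * d + 4.0 * (d - 1) * x * psi + 2.0 * (d - 1) * (x * psi) ** 2


def solve_psi(beta_eps, d=3, tol=1e-12):
    """Positive root of d*psi*D(psi) = 2(d-1) x (1 + x psi), by bisection.
    The difference of the two sides is a cubic in psi whose coefficients
    change sign once, so there is exactly one positive root."""
    x = math.exp(0.5 * beta_eps)

    def g(psi):
        lhs = d * psi * d_eff(psi, beta_eps, d)
        rhs = 2.0 * (d - 1) * x * (1.0 + x * psi)
        return lhs - rhs

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:  # enlarge the bracket until the root is enclosed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def free_energy(t, d=3):
    """f/eps from Eq. (12) evaluated at the stationary psi."""
    beta_eps = 1.0 / t
    psi = solve_psi(beta_eps, d)
    return t * (1.0 + d * psi ** 2 - math.log(d_eff(psi, beta_eps, d)))


def freezing_temperature(d=3, t_lo=0.3, t_hi=2.0, tol=1e-10):
    """Reduced temperature at which the liquid branch reaches f = -eps.
    Assumes f + eps > 0 at t_lo and f + eps < 0 at t_hi (true for d = 3)."""
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        if free_energy(mid, d) + 1.0 > 0.0:
            t_lo = mid
        else:
            t_hi = mid
    return 0.5 * (t_lo + t_hi)


print("t_F (d = 3) =", round(freezing_temperature(3), 3))
```

Below the crossing temperature the mean-field expression (12) would violate the bound (13), which is why the system is taken to be frozen there. With these assumptions the crossing for d = 3 should fall close to the value of $k_B T_F$ quoted in the text.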
Solving numerically equation (11b), we find again a first order freezing transition at a temperature $T_F$ where f = −ε. At d = 3, we get $k_B T_F = 0.86\,\varepsilon$.

The physics of this model is very similar to that of the previous model (2.1), namely, a liquid-like high temperature phase with no definite β-sheet structure, and a low temperature frozen phase, consisting of stacks of parallel β-sheets.

A non-isotropic type of mean-field, as in reference [7], would probably again induce a very weak temperature dependence in the low temperature phase, but the biological robustness of the two dimensional β-sheets should make the temperature dependence even weaker than in the previous case. Furthermore, due to the higher dimensionality of the β-sheets, we believe the isotropic mean-field approximation to be even better than in the previous case.

3 Conclusion.
We have studied, in a mean-field approximation, two models for the formation of secondary structures in globular proteins (in the thermodynamic limit).

These models exhibit a first order transition between a high-temperature liquid-like compact phase and a quasi-frozen low-temperature phase. In this quasi-frozen phase, secondary structures span the whole system.
As far as comparison with proteins is concerned, several caveats are in order.

(i) The lattice approach, where one link is taken as a helix turn or an amino acid, may be too crude.

(ii) The models are studied in the thermodynamic limit ($N = L^d$ going to infinity). Choosing arbitrarily a size of $N = 10^2$-$10^3$ amino acids for a protein, the quasi-frozen phase is in fact completely frozen in the bulk, since thermal fluctuations would be present only in much larger systems, $N \sim 10^4$ amino acids (see Sect. 2). However, surface modes should not be frozen, and a more detailed study of finite size effects is in order.

(iii) Since we have considered only compact structures, we have implicitly assumed the collapse energy $E_c$ to be much larger than the intramolecular H-bond energy $E_H$. In real systems, the ratio $E_c/E_H$ seems rather large, of order 20 [10]. A different limit is currently studied, where $E_c \ll E_H$, and the formation of secondary structures is initiated in the unfolded (non-compact) phase.

(iv) Finally, let us note that amino acids have given probabilities to be present in α-helices, β-sheets or turns (i.e. no secondary structure). This is reflected in the Ramachandran plots [1]; a more elaborate approach should include these weights.

In our models, the denaturation proceeds in two steps. In a first step, secondary structures unfreeze and become mobile; this phase seems closely related to the molten globule of reference [4]. In a second step (not described by the above models), the compact globule would unfold into a swollen coil.

References
[1] (a) Creighton T.E., Proteins (W.H. Freeman, New York, 1984); (b) Levitt M., Current Opinion in Structural Biology 1 (1991) 224.
[2] (a) Pauling L. and Corey R.B., P.N.A.S. 37 (1951) 235, 251, 272, 729; (b) Richardson J.S., Adv. Prot. Chem. 34 (1981) 167.
[3] (a) Kléman M. and Sadoc J.F., J. Phys. 40 (1974) L569; (b) Nelson D.R. and Spaepen F.S., Solid State Phys. 42 (1988) and references therein.
[4] (a) Ptitsyn O.B., J. Protein Chem. 6 (1987) 273; (b) Stigter D., Alonso D.O.V. and Dill K.A., P.N.A.S. 88 (1991) 4176.
[5] des Cloizeaux J. and Jannink G., Les Polymères en Solution (Editions de Physique, Les Ulis, 1987).
[6] Flory P.J., Proc. R. Soc. London Ser. A 234 (1956) 60.
[7] Bascle J., Garel T. and Orland H., J. Phys. A 25 (1992) L1323.
[8] Orland H., Itzykson C. and De Dominicis C., J. Phys. Lett. 46 (1985) L353.
[9] de Gennes P.G., Phys. Lett. A 38 (1972) 339.
[10] Garnier J., private communication.