Formation and stability of secondary structures in globular proteins


HAL Id: jpa-00247827

https://hal.archives-ouvertes.fr/jpa-00247827

Submitted on 1 Jan 1993



J. Bascle, T. Garel, Henri Orland

To cite this version:

J. Bascle, T. Garel, Henri Orland. Formation and stability of secondary structures in globular proteins. Journal de Physique II, EDP Sciences, 1993, 3 (2), pp. 245-253. DOI: 10.1051/jp2:1993126. HAL: jpa-00247827.


Classification Physics Abstracts

87.15B 36.20 05.90 61.40D 87.10


J. Bascle, T. Garel and H. Orland (*)

Service de Physique Théorique (**), CE-Saclay, 91191 Gif-sur-Yvette Cedex, France

(Received 10 August 1992, accepted 27 October 1992)

Résumé. We study two models for the formation and packing of helices or sheets in the globular (compact) phase of proteins. These models, based on weighted Hamiltonian paths on a lattice, have a first order phase transition between (i) a compact high-temperature phase, with non-extended secondary structures, and (ii) a quasi-frozen compact phase, where secondary structures invade the whole lattice. The quasi-frozen phase, which has a very weak temperature dependence, is identified with the native phase of proteins; the high-temperature phase is possibly related to the "molten globule" phase of proteins.

Abstract. We study two models for the formation and packing of helices and sheets in globular (compact) proteins. These models, based on weighted Hamiltonian paths on a regular lattice, both exhibit a first order transition between a compact high-temperature phase, with no extended secondary structures, and a quasi-frozen compact phase, with secondary structures invading the whole lattice. The quasi-frozen phase, with very weak temperature dependence, is identified as the native phase of proteins, whereas the high-temperature phase may be relevant to the so-called molten globule state of proteins.

1 Introduction.

Proteins are weakly branched polymers, built out of twenty species of monomers (amino acids); they have the property of folding into an (almost) unique compact native structure. This native structure is of great interest, since it is intimately related to the biological function of the protein [1].

The generic formula of amino acids (except for proline) is NH₂–CHR–COOH, where the variable part R is called the residue.

Polycondensation of $N_0$ amino acids leads to the formation of proteins, the typical size of which is in the range $N_0 \simeq 60$–$1000$ (e.g. sperm-whale myoglobin has $N_0 = 153$).

(*) Also at Groupe de Physique Théorique Statistique, Université de Cergy-Pontoise, 95806 Cergy-Pontoise Cedex, France.

(**) Laboratoire de la Direction des Sciences de la Matière du Commissariat à l'Energie Atomique.

Several levels of complexity may be defined in the folding problem. One may for instance consider the relevant interactions in proteins: at a microscopic level, one only deals with Coulomb interactions. On a longer scale, these microscopic interactions give rise to covalent bonding, effective Coulomb interactions (with partial charges or screening), hydrogen bonding, Van der Waals interactions and hydrophobic effects with the solvent. It is clear that all these interactions are interdependent.

The complexity among the interactions is also reflected in the multi-level structure of folded proteins. Roughly speaking, the primary structure (i.e. the chemical sequence of the protein) is due to covalent bonding, whereas, as shown by Pauling [2], hydrogen bonding is responsible for the existence of secondary structures, i.e. α-helices or β-sheets (see Fig. 1). Finally, the tertiary structure is mostly driven by the hydrophobic effect: the native protein is compact so that apolar residues may hide away from the solvent.

Fig. 1. Schematic representation of hydrogen bonds in (a) an α-helix, (b) a β-sheet.

A physical approach to the study of low energy compact structures can be stated in the following way: the number of hydrogen bonds (H-bonds) with the solvent is proportional to the surface of the protein. This surface being constant, one is left with a balance between intraglobular H-bonds and the requirement of globular compactness. Fully saturated H-bonds imply a one-dimensional (α-helix) or two-dimensional (β-sheet) local structure, both being incompatible with a compact three-dimensional structure. This situation therefore bears some resemblance to the one encountered in glasses, where local order (e.g. five-fold symmetry) is not compatible with the global space-filling constraint [3].

Indeed, one of the most striking features of the native structure of proteins is its almost frozen character. On the experimental side, the folding (or denaturation) transition seems, in some cases, to proceed in two steps [4], the intermediate phase being the so-called molten globule.

In this paper, we study the statistical mechanics of (bulk) H-bonds in the compact phase. Two models, describing respectively the formation of α-helices and β-sheets, are introduced. Both models are formulated in terms of weighted Hamiltonian paths; in a mean-field approach, we get a first order transition between a high temperature compact liquid phase (which we interpret as the molten globule) and a low temperature crystal phase (which we interpret as the native state). Note that the crystal phase has (almost) fully saturated H-bonds and is thus (almost) frozen.

2 Physical models.

Since we are only concerned with the globular state of the protein, we will consider compact chains, represented as Hamiltonian paths on a lattice [5]. This representation, which is usual in the modelling of collapsed polymer chains, can be summarized as follows: consider a hypercubic lattice in a $d$-dimensional space, with $N = L^d$ sites and periodic boundary conditions. A Hamiltonian path on the lattice is a walk which goes through all sites once and only once. Such a walk satisfies both the compactness and the self-avoidance requirements. Moreover, we will restrict ourselves to closed paths, since boundary conditions play a role only in subdominant terms [5].

We will now introduce two different models, to mimic the formation of α-helices or β-sheets in such compact structures.

2.1 α-helices. In this model, each link of the Hamiltonian path represents a helical turn. We recall that, in real proteins, a helical turn corresponds to 3.6 amino acids on average, and that H-bonds stabilize extended helical structures. Thus, in our model, we attribute an energy loss $\varepsilon$ to the breaking of a "helix", that is, whenever the Hamiltonian path makes a turn (corner).

The partition function of the system at inverse temperature $\beta = 1/k_BT$ reads:

$$ Z = \sum_{\{\mathcal{C}\}} e^{-\beta\varepsilon N_c(\mathcal{C})} \qquad (1) $$

where $\{\mathcal{C}\}$ denotes the ensemble of all Hamiltonian paths and $N_c(\mathcal{C})$ denotes the number of corners present in path $\mathcal{C}$. In figure 2, we show an example of a Hamiltonian path.

Fig. 2. A Hamiltonian path in $d = 2$ with $N_c = 14$ corners.
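Equation (1) can be evaluated exactly only on very small lattices. The Python sketch below is our own illustration, not the authors' method: it enumerates by brute force every closed Hamiltonian path on an $L \times L$ square lattice with periodic boundary conditions ($d = 2$), records its number of corners $N_c$, and sums the weights $e^{-\beta\varepsilon N_c}$. The function names and the restriction $3 \le L \le 4$ are assumptions made for illustration; the search time grows very quickly with $N = L^2$.

```python
import math
from collections import Counter


def hamiltonian_cycles_corner_histogram(L):
    """Return {N_c: number of closed Hamiltonian paths with N_c corners}
    on an L x L square lattice with periodic boundary conditions (d = 2).

    Exhaustive depth-first search from site (0, 0); only practical for L <= 4.
    Requires L >= 3 so that the two neighbours along each axis are distinct.
    """
    n = L * L
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # step k lies along axis k // 2
    visited = [[False] * L for _ in range(L)]
    visited[0][0] = True
    dirs = []                # direction index of every link taken so far
    directed = Counter()     # corner histogram of directed cycles

    def n_corners(seq):
        # A corner is a change of axis between consecutive links,
        # including the closing link followed by the first one.
        m = len(seq)
        return sum(seq[i] // 2 != seq[(i + 1) % m] // 2 for i in range(m))

    def dfs(x, y, n_visited):
        for k, (dx, dy) in enumerate(steps):
            nx, ny = (x + dx) % L, (y + dy) % L
            if n_visited == n:
                if (nx, ny) == (0, 0):      # all sites visited: close the loop
                    dirs.append(k)
                    directed[n_corners(dirs)] += 1
                    dirs.pop()
            elif not visited[nx][ny]:
                visited[nx][ny] = True
                dirs.append(k)
                dfs(nx, ny, n_visited + 1)
                dirs.pop()
                visited[nx][ny] = False

    dfs(0, 0, 1)
    # Each undirected cycle is found once in each orientation.
    return Counter({nc: c // 2 for nc, c in directed.items()})


def partition_function(hist, beta_eps):
    """Eq. (1): Z = sum over closed paths of exp(-beta * eps * N_c)."""
    return sum(c * math.exp(-beta_eps * nc) for nc, c in hist.items())


if __name__ == "__main__":
    hist = hamiltonian_cycles_corner_histogram(4)
    print(sorted(hist.items()))            # corner-number distribution
    print(partition_function(hist, 1.0))   # Z at beta * eps = 1
```

Every closed path has an even number of corners, since each corner switches the path between the two lattice axes, so the histogram is supported on even $N_c$ only.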

This model has been studied in the context of the melting of semi-flexible polymer chains [6, 7].

In order to calculate $Z$, we follow references [7, 8]. We introduce on each site $r$ and for each direction $\alpha = 1, \dots, d$ an $n$-component real field $\vec\varphi_\alpha(r)$.

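The partition function $Z$ can be rewritten:

$$ Z = \lim_{n \to 0} \int \prod_{r} \prod_{\alpha=1}^{d} d\vec\varphi_\alpha(r)\; e^{-\frac{1}{2}\varphi G^{-1}\varphi} \prod_{r} \Big[ \sum_{\alpha=1}^{d} \tfrac{1}{2}\vec\varphi_\alpha^{\,2}(r) + e^{-\beta\varepsilon} \sum_{1 \le \alpha < \gamma \le d} \vec\varphi_\alpha(r)\cdot\vec\varphi_\gamma(r) \Big] \qquad (2a) $$

with

$$ \varphi G^{-1}\varphi = \sum_{\alpha=1}^{d} \sum_{r,r'} \vec\varphi_\alpha(r)\,(\Delta_\alpha^{-1})_{rr'}\,\vec\varphi_\alpha(r') \qquad (2b) $$

where the operator $\Delta_\alpha(r, r')$ is 1 if $r$ and $r'$ are nearest neighbours in direction $\alpha$, and 0 otherwise.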

In order to prove the equivalence of (2a) and (1), we will use Wick's theorem: we define, from (2b), the elementary contraction:

$$ \langle \varphi^u_\alpha(r)\, \varphi^v_\gamma(r') \rangle = \delta_{uv}\, \delta_{\alpha\gamma}\, \Delta_\alpha(r, r') \qquad (3) $$

where $u$ and $v$ are component indices of $\vec\varphi_\alpha(r)$, running from 1 to $n$.

Expanding the product over $r$ in (2a), we must choose at each site $r$ either a term of the form $\frac12\vec\varphi_\alpha^{\,2}(r)$ or one of the form $e^{-\beta\varepsilon}\,\vec\varphi_\alpha(r)\cdot\vec\varphi_\gamma(r)$, corresponding respectively to a path going straight through $r$ in direction $\alpha$, or to a path making a turn at $r$, from direction $\alpha$ to direction $\gamma$. By contracting the fields according to (3), we construct a sum over all self-avoiding compact closed paths with appropriate weights, with an additional factor $n$ (due to the summation over the component index $u$) for each closed loop. As usual [9], we extract the single-chain contribution by taking the limit $n \to 0$. This concludes the proof.
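The local counting behind this proof can be checked site by site. The sketch below is our own illustration (function names are hypothetical): it enumerates the unordered pairs of lattice links meeting at one site of a $d$-dimensional hypercubic lattice and compares them with the number of Wick contractions of each term of (2a). The term $\frac12\vec\varphi_\alpha^{\,2}$ can be contracted with the two neighbours along $\alpha$ in 2 ways, times $\frac12$, i.e. one straight passage per axis, while $\vec\varphi_\alpha\cdot\vec\varphi_\gamma$ contracts in $2 \times 2 = 4$ ways, i.e. four distinct corners per pair of axes.

```python
import itertools
import math


def geometric_vertex_counts(d):
    """Count the ways a path can pass through one site of a d-dimensional
    hypercubic lattice, as unordered pairs of distinct incident links."""
    links = [(axis, sign) for axis in range(d) for sign in (+1, -1)]
    n_straight = n_corner = 0
    for (axis1, _), (axis2, _) in itertools.combinations(links, 2):
        if axis1 == axis2:
            n_straight += 1   # in and out along the same axis: straight passage
        else:
            n_corner += 1     # two different axes: a corner
    return n_straight, n_corner


def wick_vertex_counts(d):
    """The same counts, read off the bracket of (2a) with the contraction (3)."""
    straight = d                        # (1/2) phi_alpha^2: 2 contractions x 1/2 = 1 per axis
    corner = math.comb(d, 2) * 2 * 2    # phi_alpha . phi_gamma: 2 x 2 contractions per pair
    return straight, corner


def local_weight(d, beta_eps):
    """Total local weight: 1 per straight passage, exp(-beta eps) per corner."""
    s, c = geometric_vertex_counts(d)
    return s + c * math.exp(-beta_eps)


if __name__ == "__main__":
    for d in (2, 3):
        print(d, geometric_vertex_counts(d), wick_vertex_counts(d))
```

For $d = 3$ this gives 3 straight passages and 12 corners, i.e. all 15 pairs of the 6 links meeting at a site.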

From this proof, it is clear that vacancies (empty sites) can easily be included in the model, by adding to the terms in brackets in (2a) a term of the form $e^{-\beta\mu}$, where $\mu$ is the chemical potential for the vacancies. In that case, equation (4) (see below) is slightly modified, giving rise to the possibility of phase separation between vacancy-rich and vacancy-poor (globular) phases.

In order to evaluate (2a), we will resort to a mean-field theory, that is, a saddle-point method.

The ground state of the model consists of straight paths, making turns on the surface of the lattice (their free energy being non-extensive, of order $L^{d-1}$). A correct mean-field treatment should, a priori, take this one-dimensional character into account, but we have shown elsewhere [7] that excellent results are obtained by using a straightforward isotropic homogeneous mean field. We have also shown in [7] that fluctuations around this isotropic mean field should not be included, since they spoil the quality of the results.

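The mean-field equations (saddle point on (2a)) read:

$$ \sum_{r'} (\Delta_\alpha^{-1})_{rr'}\,\vec\varphi_\alpha(r') = \frac{\vec\varphi_\alpha(r) + e^{-\beta\varepsilon}\sum_{\gamma \neq \alpha}\vec\varphi_\gamma(r)}{\sum_{\gamma=1}^{d}\frac12\vec\varphi_\gamma^{\,2}(r) + e^{-\beta\varepsilon}\sum_{1\le\gamma<\delta\le d}\vec\varphi_\gamma(r)\cdot\vec\varphi_\delta(r)} \qquad (4) $$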

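The isotropic homogeneous mean field assumes $\vec\varphi_\alpha(r) = \vec\varphi$ for any $\alpha = 1, \dots, d$ and $r$. We further break the $O(n)$ symmetry by choosing $\vec\varphi$ in a given direction, say 1, with magnitude $\varphi$. From (4), we obtain:

$$ \varphi^2 = \frac{4}{d} \qquad (5a) $$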

At this mean-field level, the free energy per site reads:

$$ f = -k_BT \log\frac{q(\beta)}{e} \qquad (5b) $$

with

$$ q(\beta) = 2 + 2(d-1)\,e^{-\beta\varepsilon} \qquad (5c) $$

and $e = 2.71828\ldots$ Note that $q(\beta)$ plays the role of an effective coordination number (see Ref. [8]).
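The step from (4) to (5a)–(5c) can be spelled out as follows. This is our reconstruction of the intermediate algebra, assuming (as on the periodic lattice) that $\Delta_\alpha$ acting on a uniform field gives a factor 2 (two neighbours along $\alpha$), so that $\Delta_\alpha^{-1}$ gives $1/2$, and that $\beta f = \frac{1}{N}\big[\frac12\varphi G^{-1}\varphi - \sum_r \log(\cdot)\big]$ is evaluated at the saddle point, the logarithm being taken of the bracket in (2a).

```latex
\begin{aligned}
&\text{Uniform field } \vec\varphi_\alpha(r) = (\varphi, 0, \dots, 0) \text{ for all } \alpha, r:\\
&\quad \text{l.h.s. of (4)} = \tfrac12\,\varphi, \qquad
\text{r.h.s. of (4)} = \frac{\varphi\,[1 + (d-1)e^{-\beta\varepsilon}]}{\tfrac{d}{2}\varphi^2\,[1 + (d-1)e^{-\beta\varepsilon}]} = \frac{2}{d\,\varphi}
\;\Longrightarrow\; \varphi^2 = \frac{4}{d},\\[4pt]
&\beta f = \frac{d\,\varphi^2}{4} - \log\Big(\tfrac{d}{2}\varphi^2\,[1 + (d-1)e^{-\beta\varepsilon}]\Big)
= 1 - \log\big(2 + 2(d-1)e^{-\beta\varepsilon}\big) = -\log\frac{q(\beta)}{e}.
\end{aligned}
```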

On the other hand, the free energy is a decreasing function of temperature (since the entropy is positive), and, as mentioned above, the ground-state free energy per site vanishes (it is of order $1/L$). Thus, the free energy per site remains negative (or zero) at all temperatures. There is a temperature $T_F$ for which the effective coordination number $q(\beta)$ is equal to $e$, and $f$ vanishes. For $d = 3$, $k_BT_F = 0.58\,\varepsilon$. Below this temperature, the free energy remains equal to zero (Fig. 3).

Fig. 3. Plot of the free energy per site $f/\varepsilon$ (Eq. (5b)) versus the reduced temperature $t = k_BT/\varepsilon$, for $d = 3$.
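The freezing condition $q(\beta_F) = e$ can be solved in closed form, $e^{-\beta_F\varepsilon} = (e-2)/[2(d-1)]$. The minimal Python sketch below (function names are ours) evaluates (5b)–(5c) in reduced units $t = k_BT/\varepsilon$ and, as described in the text, sets $f = 0$ in the frozen phase below $T_F$. For $d = 3$ it gives $t_F \approx 0.582$, consistent with the value $0.58$ quoted above.

```python
import math


def q_eff(t, d=3):
    """Effective coordination number, eq. (5c), with t = k_B T / eps."""
    return 2.0 + 2.0 * (d - 1) * math.exp(-1.0 / t)


def freezing_temperature(d=3):
    """Reduced temperature t_F at which q = e, i.e. where f of (5b) vanishes.
    From 2 + 2(d-1) exp(-1/t_F) = e:  exp(-1/t_F) = (e - 2) / (2(d - 1))."""
    return -1.0 / math.log((math.e - 2.0) / (2.0 * (d - 1)))


def free_energy_per_site(t, d=3):
    """f / eps: eq. (5b) above t_F; zero (frozen phase) at and below t_F."""
    if t <= freezing_temperature(d):
        return 0.0
    return -t * math.log(q_eff(t, d) / math.e)


if __name__ == "__main__":
    print(f"t_F(d=3) = {freezing_temperature(3):.3f}")
    for t in (0.4, 0.8, 1.2, 1.6, 2.0):
        print(t, free_energy_per_site(t))
```

The result is continuous at $t_F$ but its slope (the entropy) jumps there, which is the first order character discussed below.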

Physically, there is a competition between the entropy gain of making turns and the corresponding energy loss. At high temperature, the corners are mobile in the bulk, leading to a liquid-like structure, whereas at low temperature the system is frozen in stretched walks, with the corners expelled to the surface. The two regions are separated by a first order phase transition at $T_F$. The average length of a helix at the freezing point is given by

$$ \ell_F = \frac{N\varepsilon}{U(\beta_F)} \qquad (6) $$

where $U(\beta) = -\partial \log Z/\partial\beta$ is the internal energy, so that $U/\varepsilon$ is the mean number of corners.

At the freezing point, in $d = 3$, this length is $\ell_F \simeq 3.78$, and it is infinite ($\ell = L$) in the low temperature phase. Note that $\ell_F$ corresponds to a typical number of amino acids per α-helix of order 15. In a more elaborate treatment [7], one finds a very weak temperature dependence of the low temperature phase, but the overall freezing picture remains correct.
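The value $\ell_F \simeq 3.78$ can be checked from the mean-field free energy. The sketch below is our illustration (names are hypothetical): it uses $\beta f = 1 - \log q(\beta)$ from (5b)–(5c), obtains the internal energy per site as $u = \partial(\beta f)/\partial\beta$ by a central finite difference, and returns $\ell_F = \varepsilon/u$, i.e. (6) per site. The conversion to residues assumes 3.6 amino acids per helical turn, as stated in Sect. 2.1.

```python
import math


def beta_f(beta_eps, d=3):
    """beta * f per site in the high-temperature phase: 1 - log q, from (5b)-(5c)."""
    return 1.0 - math.log(2.0 + 2.0 * (d - 1) * math.exp(-beta_eps))


def helix_length_at_freezing(d=3, h=1e-6):
    """l_F = N eps / U(beta_F) of eq. (6), from the mean-field free energy.

    The internal energy per site, in units of eps, is d(beta f)/d(beta eps);
    it is estimated here by a central finite difference at beta_F.
    """
    b_f = -math.log((math.e - 2.0) / (2.0 * (d - 1)))  # beta_F * eps, from q = e
    u = (beta_f(b_f + h, d) - beta_f(b_f - h, d)) / (2.0 * h)
    return 1.0 / u


RESIDUES_PER_TURN = 3.6  # one link = one helical turn in the model of Sect. 2.1

if __name__ == "__main__":
    l_f = helix_length_at_freezing(3)
    print(f"l_F = {l_f:.2f} turns, about {RESIDUES_PER_TURN * l_f:.1f} residues")
```

Analytically, $u/\varepsilon = 2(d-1)e^{-\beta\varepsilon}/q$, which at $q = e$ reduces to $(e-2)/e$ for any $d$; within this approximation $\ell_F = e/(e-2) \simeq 3.78$, or about 13.6 residues, in line with the "order 15" estimate.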

Indeed, in the frozen phase, the typical length of a helix is

$$ \ell \simeq \frac{e^{6\beta\varepsilon}}{12} $$

yielding $\ell \simeq 2592$ just below $T_F$ (corresponding to $\simeq 9330$ residues). This length scale is clearly out of reach in any realistic protein or polymer system. The (first order) complete freezing picture is thus adequate for any practical purpose.
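A quick numerical check of this frozen-phase estimate, assuming the form $\ell \simeq e^{6\beta\varepsilon}/12$ given above and 3.6 residues per turn (the function name is ours). Evaluated at the rounded value $k_BT_F = 0.58\,\varepsilon$ it reproduces $\ell \approx 2.59 \times 10^3$ turns and about $9.33 \times 10^3$ residues; at the unrounded $t_F \approx 0.582$ it gives roughly $2.5 \times 10^3$ turns, so the quoted figure is sensitive to the rounding of $T_F$, though not its order of magnitude.

```python
import math

RESIDUES_PER_TURN = 3.6  # amino acids per helical turn (Sect. 2.1)


def frozen_helix_length(t):
    """Typical helix length in the frozen phase, l ~ exp(6 beta eps) / 12,
    with t = k_B T / eps (estimate quoted from ref. [7])."""
    return math.exp(6.0 / t) / 12.0


if __name__ == "__main__":
    t_exact = -1.0 / math.log((math.e - 2.0) / 4.0)  # unrounded t_F for d = 3
    for t in (0.58, t_exact):
        length = frozen_helix_length(t)
        print(f"t = {t:.4f}: l = {length:.0f} turns, "
              f"about {RESIDUES_PER_TURN * length:.0f} residues")
```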

2.2 β-sheets. We again use the Hamiltonian path formalism, with the slight modification that a link of a path is now to be interpreted as an amino acid. The model can be described in the following way. Consider a Hamiltonian path. To mimic the formation of CO–HN bonds, i.e. H-bonds, in β-sheets, we allow an H-bond (energy gain $\varepsilon$) whenever two pairs of aligned links belong to two non-intersecting neighbouring strands (see Fig. 4). The partition function reads:

$$ Z = \sum_{\{\mathcal{C}\}}\ \sum_{\{\text{H-bonds}\}} e^{\beta\varepsilon N_H(\mathcal{C})} \qquad (7) $$

Fig. 4. The two different possible types of H-bonds, (a) and (b), in the β-sheet model.

The summation runs over all possible Hamiltonian paths $\mathcal{C}$, and over all possible sets of H-bonds compatible with the path (see below); $N_H(\mathcal{C})$ is the number of H-bonds. We show such a Hamiltonian path in figure 5.

Fig. 5. A Hamiltonian path in $d = 2$ with $N_H = 4$ (out of five possible) hydrogen bonds.

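In order to give an integral representation of $Z$, in addition to the fields $\vec\varphi_\alpha(r)$ which generate the paths, we need to introduce two scalar fields $\psi_\alpha(r)$ and $\bar\psi_\alpha(r)$, which respectively initiate and terminate an H-bond at site $r$ in direction $\alpha$. We obtain

$$ Z = \lim_{n\to 0} \int \prod_{\alpha, r} d\vec\varphi_\alpha(r)\, d\psi_\alpha(r)\, d\bar\psi_\alpha(r)\; e^{-\frac12 \varphi G^{-1}\varphi \,-\, \psi G'^{-1}\bar\psi} \prod_r D(r) \qquad (8a) $$

with

$$ \varphi G^{-1}\varphi = \sum_{\alpha=1}^{d}\sum_{r,r'} \vec\varphi_\alpha(r)\,(\Delta_\alpha^{-1})_{rr'}\,\vec\varphi_\alpha(r'), \qquad \psi G'^{-1}\bar\psi = \sum_{\alpha=1}^{d}\sum_{r,r'} \psi_\alpha(r)\,(\Delta'^{-1}_\alpha)_{rr'}\,\bar\psi_\alpha(r') \qquad (8b) $$

$$ D(r) = \sum_{\alpha=1}^{d} \tfrac12\vec\varphi_\alpha^{\,2}(r)\,G_\alpha(r) + \sum_{1\le\alpha<\gamma\le d}\vec\varphi_\alpha(r)\cdot\vec\varphi_\gamma(r) \qquad (8c) $$

and

$$ G_\alpha(r) = 1 + e^{\beta\varepsilon/2}\sum_{\gamma(\neq\alpha)}\big(\psi_\gamma(r) + \bar\psi_\gamma(r)\big) + e^{\beta\varepsilon}\sum_{\gamma(\neq\alpha)}\psi_\gamma(r)\,\bar\psi_\gamma(r) \qquad (8d) $$

The operator $\Delta_\alpha$ is the one defined in Sect. 2.1, whereas $\Delta'_\alpha(r, r') = \delta\big(r', r + e_\alpha\big)$, where $e_\alpha$ is the unit vector in direction $\alpha$.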

Expanding the product over $r$ in (8a), at each site $r$ we must choose either (i) a term of the form $\frac12\vec\varphi_\alpha^{\,2}(r)\,G_\alpha(r)$, or (ii) a term of the form $\vec\varphi_\alpha(r)\cdot\vec\varphi_\gamma(r)$. The latter (ii) represents a corner and does not allow for an H-bond. The former (i) represents a path going straight through $r$ in direction $\alpha$ and, according to (8d), allows for four possibilities, namely: no bond, with weight 1; one bond entering, or one bond leaving, $r$ in a direction $\gamma\,(\neq\alpha)$, each with weight $e^{\beta\varepsilon/2}$; and finally, one bond entering and one leaving at $r$ in direction $\gamma\,(\neq\alpha)$, with weight $e^{\beta\varepsilon}$.
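The bookkeeping of (8c)–(8d) can be made explicit. The sketch below is our illustration (names are hypothetical): it lists the H-bond options at a straight site, one term of $G_\alpha$ per option, and evaluates $D(r)$ for uniform fields, $\psi_\gamma = \bar\psi_\gamma = \psi$ and $\vec\varphi^{\,2} = 4/d$. The result should coincide with the homogeneous form $D = 2d + 4(d-1)e^{\beta\varepsilon/2}\psi + 2(d-1)e^{\beta\varepsilon}\psi^2$ used in the mean-field analysis of this model.

```python
import math


def straight_site_options(d, alpha=0):
    """H-bond options at a site where the path runs straight along axis `alpha`,
    read off eq. (8d). Each entry is (description, transverse axis, k), where k
    is the number of half-bonds, weighted by exp(k * beta * eps / 2)."""
    options = [("no bond", None, 0)]
    for gamma in range(d):
        if gamma == alpha:
            continue
        options.append(("bond initiated (psi)", gamma, 1))
        options.append(("bond terminated (psi-bar)", gamma, 1))
        options.append(("initiated and terminated", gamma, 2))
    return options


def G_homogeneous(d, beta_eps, psi):
    """G_alpha of (8d) at psi_gamma = psi-bar_gamma = psi for every gamma:
    1 + 2(d-1) a psi + (d-1) a^2 psi^2, with a = exp(beta eps / 2)."""
    a = math.exp(beta_eps / 2.0)
    return sum((a * psi) ** k for _, _, k in straight_site_options(d))


def D_homogeneous(d, beta_eps, psi):
    """D(r) of (8c) for a uniform field with phi^2 = 4/d:
    (d phi^2 / 2) G + (d (d - 1) / 2) phi^2."""
    phi2 = 4.0 / d
    return 0.5 * d * phi2 * G_homogeneous(d, beta_eps, psi) + 0.5 * d * (d - 1) * phi2


if __name__ == "__main__":
    for desc, gamma, k in straight_site_options(3):
        print(desc, gamma, k)
```

For $d = 3$ there are $1 + 3 \times 2 = 7$ terms: the four kinds of possibility described above, the last three being available in each of the two transverse directions.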

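As in Sect. 2.1, the identification of (7) and (8a) proceeds through the use of Wick's theorem, with the following elementary contractions:

$$ \langle \varphi^u_\alpha(r)\,\varphi^v_\gamma(r')\rangle = \delta_{\alpha\gamma}\,\delta_{uv}\,\Delta_\alpha(r,r'), \qquad \langle \psi_\alpha(r)\,\bar\psi_\gamma(r')\rangle = \delta_{\alpha\gamma}\,\Delta'_\alpha(r,r'), \qquad \langle \psi_\alpha(r)\,\psi_\gamma(r')\rangle = \langle \bar\psi_\alpha(r)\,\bar\psi_\gamma(r')\rangle = 0 \qquad (9) $$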

These contractions indeed generate the required partition function. As mentioned in the previous section, vacancies can be included in the model by a slight modification of $D(r)$.

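As in the previous section, we study this model in a mean-field approximation. The mean-field equations read:

$$ \sum_{r'} (\Delta_\alpha^{-1})_{rr'}\,\vec\varphi_\alpha(r') = \frac{\vec\varphi_\alpha(r)\,G_\alpha(r) + \sum_{\gamma\neq\alpha}\vec\varphi_\gamma(r)}{D(r)} \qquad (10a) $$

$$ \sum_{r'} (\Delta'^{-1}_\alpha)_{rr'}\,\bar\psi_\alpha(r') = \frac{\frac12\sum_{\gamma\neq\alpha}\vec\varphi_\gamma^{\,2}(r)\,\big(e^{\beta\varepsilon/2} + e^{\beta\varepsilon}\,\bar\psi_\alpha(r)\big)}{D(r)} \qquad (10b) $$

$$ \sum_{r'} \psi_\alpha(r')\,(\Delta'^{-1}_\alpha)_{r'r} = \frac{\frac12\sum_{\gamma\neq\alpha}\vec\varphi_\gamma^{\,2}(r)\,\big(e^{\beta\varepsilon/2} + e^{\beta\varepsilon}\,\psi_\alpha(r)\big)}{D(r)} \qquad (10c) $$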

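Restricting ourselves to an isotropic homogeneous solution, $\vec\varphi_\alpha(r) = \vec\varphi$, $\psi_\alpha(r) = \bar\psi_\alpha(r) = \psi$, we get:

$$ \varphi^2 = \frac{4}{d} \qquad (11a) $$

and $\psi$ satisfies the equation:

$$ \psi = \frac{2(d-1)\big(e^{\beta\varepsilon/2} + e^{\beta\varepsilon}\psi\big)}{d\,D} \qquad (11b) $$

with

$$ D = 2d + 4(d-1)\,e^{\beta\varepsilon/2}\psi + 2(d-1)\,e^{\beta\varepsilon}\psi^2 \qquad (11c) $$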

The free energy per site reads:

$$ f = k_BT\,\big(1 + d\,\psi^2 - \log D\big) \qquad (12) $$
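Equation (11b) is exactly the stationarity condition of (12) with respect to $\psi$ at fixed $\beta$, as the short calculation below (our own, using the homogeneous form of $D$) shows; this is a useful consistency check on the mean-field solution.

```latex
\begin{aligned}
\beta f(\psi) &= 1 + d\,\psi^2 - \log D(\psi), \qquad
D(\psi) = 2d + 4(d-1)e^{\beta\varepsilon/2}\psi + 2(d-1)e^{\beta\varepsilon}\psi^2,\\
\frac{\partial(\beta f)}{\partial \psi} = 0
&\iff 2d\,\psi = \frac{4(d-1)\big(e^{\beta\varepsilon/2} + e^{\beta\varepsilon}\psi\big)}{D}
\iff \psi = \frac{2(d-1)\big(e^{\beta\varepsilon/2} + e^{\beta\varepsilon}\psi\big)}{d\,D}.
\end{aligned}
```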

As in the previous model, the ground-state configuration corresponds to frozen, fully stretched configurations saturated with transverse H-bonds (stacked β-sheets), with energy per site equal to $-\varepsilon$. Therefore, we have

$$ f \le -\varepsilon \qquad (13) $$

Solving equation (11b) numerically, we again find a first order freezing transition at a temperature $T_F$ where $f = -\varepsilon$. At $d = 3$, we get $k_BT_F \simeq 0.86\,\varepsilon$.
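A sketch of this numerical solution, assuming the forms of (11b), (11c) and (12) given above (function names and the bracketing interval are our choices). It solves (11b) for $\psi$ at each temperature, then locates by bisection the temperature where $f = -\varepsilon$. With these equations it returns $k_BT_F \approx 0.85\,\varepsilon$ for $d = 3$, close to the quoted $0.86\,\varepsilon$; whether the small difference comes from rounding or from details of the original equations is not settled here.

```python
import math


def solve_psi(d, beta_eps, iters=200):
    """Positive root psi of (11b), with D from (11c), by bisection.

    g(psi) = d psi D(psi) - 2(d-1)(a + a^2 psi), a = exp(beta eps / 2), is negative
    at psi = 0 and convex for psi > 0, so it has exactly one positive root.
    """
    a = math.exp(beta_eps / 2.0)

    def D(p):
        return 2 * d + 4 * (d - 1) * a * p + 2 * (d - 1) * a * a * p * p

    def g(p):
        return d * p * D(p) - 2 * (d - 1) * (a + a * a * p)

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:       # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    psi = 0.5 * (lo + hi)
    return psi, D(psi)


def beta_f(d, beta_eps):
    """beta * f from (12) at the self-consistent psi."""
    psi, D = solve_psi(d, beta_eps)
    return 1.0 + d * psi * psi - math.log(D)


def freezing_temperature_sheets(d=3, lo=0.05, hi=5.0, iters=100):
    """t_F = k_B T_F / eps at which f = -eps, i.e. beta f + beta eps = 0.
    Bisection in beta * eps; assumes a single sign change inside [lo, hi]."""
    def h(b):
        return beta_f(d, b) + b

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 1.0 / (0.5 * (lo + hi))


if __name__ == "__main__":
    print(f"k_B T_F / eps (d = 3) = {freezing_temperature_sheets(3):.3f}")
```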

The physics of this model is very similar to that of the model of Sect. 2.1, namely a liquid-like high temperature phase with no definite β-sheet structure, and a low temperature frozen phase, consisting of stacks of parallel β-sheets. A non-isotropic type of mean field, as in reference [7], would probably again induce a very weak temperature dependence in the low temperature phase, but the biological robustness of the two-dimensional β-sheets should make the temperature dependence even weaker than in the α-helix case. Furthermore, due to the higher dimensionality of the β-sheets, we believe the isotropic mean-field approximation to be even better than in the α-helix case.

In summary, we have studied, in a mean-field approximation, two models for the formation of secondary structures in globular proteins (in the thermodynamic limit).

These models exhibit a first order transition between a high-temperature liquid-like compact phase and a quasi-frozen low-temperature phase. In this quasi-frozen phase, secondary structures span the whole system.

As far as comparison with proteins is concerned, several caveats are in order.

(i) The lattice approach, where one link is taken as a helix turn or an amino acid, may be too crude.

(ii) The models are studied in the thermodynamic limit ($N = L^d$ going to infinity). Choosing arbitrarily a size of $N = 10^2$–$10^3$ amino acids for a protein, the quasi-frozen phase is in fact completely frozen in the bulk, since thermal fluctuations would be present only in much larger systems, $N \sim 10^4$ amino acids (see Sect. 2). However, surface modes should not be frozen, and a more detailed study of finite size effects is in order.

(iii) Since we have considered only compact structures, we have implicitly assumed the collapse energy $E_c$ to be much larger than the intramolecular H-bond energy $E_H$. In real systems, the ratio $E_c/E_H$ seems rather large, of order 20 [10]. A different limit is currently being studied, where $E_c \ll E_H$ and the formation of secondary structures is initiated in the unfolded (non-compact) phase.

(iv) Finally, let us note that each amino acid has a given probability of being found in α-helices, β-sheets or turns (i.e. no secondary structure). This is reflected in the Ramachandran plots [1]; a more elaborate approach should include these weights.

In our models, the denaturation proceeds in two steps: a first step, where secondary structures unfreeze and become mobile (this phase seems closely related to the molten globule of reference [4]), and a second step (not described by the above models), where the compact globule would unfold into a swollen coil.

References

[1] (a) Creighton T.E., Proteins (W.H. Freeman, New York, 1984); (b) Levitt M., Current Opinion in Structural Biology 1 (1991) 224.

[2] (a) Pauling L. and Corey R.B., P.N.A.S. 37 (1951) 235, 251, 272, 729; (b) Richardson J.S., Adv. Prot. Chem. 34 (1981) 167.

[3] (a) Kléman M. and Sadoc J.F., J. Phys. 40 (1974) L569; (b) Nelson D.R. and Spaepen F.S., Solid State Phys. 42 (1988), and references therein.

[4] (a) Ptitsyn O.B., J. Protein Chem. 6 (1987) 273; (b) Stigter D., Alonso D.O.V. and Dill K.A., P.N.A.S. 88 (1991) 4176.

[5] des Cloizeaux J. and Jannink G., Les Polymères en Solution (Éditions de Physique, Les Ulis, 1987).

[6] Flory P.J., Proc. R. Soc. London Ser. A 234 (1956) 60.

[7] Bascle J., Garel T. and Orland H., J. Phys. A 25 (1992) L1323.

[8] Orland H., Itzykson C. and De Dominicis C., J. Phys. Lett. 46 (1985) L353.

[9] de Gennes P.G., Phys. Lett. A 38 (1972) 339.

[10] Garnier J., private communication.