HAL Id: jpa-00247149
https://hal.archives-ouvertes.fr/jpa-00247149
Submitted on 1 Jan 1995
From Minimal Models to Real Proteins: Time Scales for Protein Folding Kinetics
D. Thirumalai
To cite this version:
D. Thirumalai. From Minimal Models to Real Proteins: Time Scales for Protein Folding Kinetics. Journal de Physique I, EDP Sciences, 1995, 5 (11), pp.1457-1467. 10.1051/jp1:1995209. jpa-00247149
Classification
Physics Abstracts
42.66-p 43.64+r 43.70+1
From Minimal Models to Real Proteins: Time Scales for Protein Folding Kinetics
D. Thirumalai
Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, USA
(Received 20 February 1995, revised 29 May 1995, accepted 25 July 1995)
Abstract. The multipathway mechanism discovered using minimal protein models, in conjunction with scaling arguments, is used to obtain time scales for the various processes in the folding of real proteins. We consider pathways involving low energy native-like structures as well as direct pathways that proceed via a nucleation mechanism. The average activation barrier separating the low energy structures and the native state is predicted to scale as $\sqrt{N}$, where N is the number of amino acids in the protein. In addition, estimates of folding times for direct pathways in which collapse and folding are (almost) synchronous are given. It is argued that sequences whose folding transition temperature is very close to the collapse transition temperature are likely to reach the native conformation rapidly.
1. Introduction
The study of minimal models for proteins has produced a novel (and unconventional) framework for understanding general aspects of the kinetics of in vitro protein folding. The theoretical concepts underlying this framework have recently been referred to as the "new view" [1] of protein folding. A small but representative list of references espousing the "new view" is given in [2-8]. According to the conventional viewpoint [9], protein folding is thought to occur via a sequential mechanism, i.e., successive stages in the self-assembly process (characterized, perhaps, by discrete intermediates) are more fully folded and closer to the folded state. More importantly, the pathway leading the protein from a denatured state to the fully folded state is thought to be unique.
In contrast, the "new view" starts from the premise that the underlying energy landscape in proteins is rough. This means that there are many minima separated by barriers of differing heights. Thus one is forced to use statistical mechanical methods to understand the dynamics leading to the native state even when considering the folding of a single monomeric protein. In this scenario the protein folding problem amounts to searching for the native state (presumed to be unique) in the complex landscape.
The description based on the rugged energy landscape implies that, generically, a heteropolymer formed randomly from the pool of twenty amino acids would exhibit kinetic glassy behavior by becoming trapped in a deep non-native minimum. A possible way out of this conundrum is to assume that in the process of evolution sequences have been designed that have an effective free energy bias toward the native state such that, as the unfolding-folding reaction progresses, these deep minima are effectively avoided. This general postulate led Bryngelson and Wolynes (BW) to suggest that natural proteins are "minimally frustrated" [2]. Onuchic et al. [10] have used these ideas further to characterize the energy landscape in real proteins in terms of the parameters introduced by BW.
© Les Editions de Physique 1995
The mechanism by which a protein reaches the native state has been studied using minimal models and a variety of correlation functions [11]. These studies, which have correlated the folding kinetics with the underlying landscape for foldable sequences, have shown that the approach to the native state occurs in three distinct stages and involves multiple parallel pathways. The purpose of this note is to utilize the results of these model studies, which have only considered small system sizes, and to propose scaling laws for extrapolating the time scales and mechanism to larger single domain proteins. This work builds on our earlier suggestion that such scaling laws can be formulated in analogy with the principles of polymer physics [11]. This, together with simulations of minimal models, can be used to analyze kinetic protein folding experiments.
We would like to emphasize that several aspects of this study are tentative. They have been obtained from the physical picture of folding which emerged from previous studies of minimal models. These models are caricatures of proteins and differ from each other in many ways. Despite the variations in these models, certain features of folding kinetics appear to be robust. For example, both the thermodynamic and kinetic aspects of all these models are qualitatively the same, and this is perhaps due to the sequence heterogeneity inherent in proteins. It should also be emphasized that all these models and other realistic representations of proteins could lead to different predictions for certain aspects of the problem. For example, the precise ordering of the formation of secondary structures will crucially depend on the details of the models. One expects on general grounds that these features should not affect the overall tertiary structure formation, to which this paper is most directly relevant. In other words, the N (number of amino acids) dependence of the kinetic laws obtained here should not significantly depend on these details.
2. Three Stage Multipathway Kinetics
Before postulating the scaling laws for the time scales in the three stages of folding as a function of the number of amino acids, N, comprising a protein, we briefly discuss the origin of the multiphasic multipathway kinetics. This basic mechanism was originally discovered using Monte Carlo simulations of minimal models of the sort introduced by Dill and coworkers [12]. In these [11,13] and subsequent off-lattice studies [14] only the alpha carbon representation of the protein is utilized. These models are minimal representations of proteins because they contain some, but not all, of the features that play a role in imparting stability to globular proteins. Despite the seeming simplicity of the minimal models, it has been shown that the underlying energy landscape is complex, sharing many features in common with real proteins [11,13]. Simulations using minimal protein models have revealed two distinct mechanisms for a protein to reach the native state. The first one is an indirect route that is described by a three stage multipathway mechanism, and the second is a direct pathway which is characterized by a nucleation mechanism. It is crucial to appreciate that between the characteristic temperatures $T_\theta$ and $T_f$, namely the collapse transition temperature and the folding transition temperature, respectively, both mechanisms are simultaneously operative. The amplitudes of the two mechanisms depend on temperature [14], chemical conditions [15], and sequence, and will be very sensitive to mutations.
We first consider indirect pathways that invariably produce misfolded structures and are described by a three stage multipathway mechanism (TSMM). The TSMM, thought to be suitable for the folding of single domain proteins, was first observed in a class of lattice [11] and off-lattice models [14] using the following technique. The model protein is quenched from a high temperature (infinite in lattice models) to a temperature close to the folding transition temperature. The approach to the native state is then monitored using several statistical mechanical correlation functions. The ensuing kinetics, averaged over an ensemble of distinct initial conditions, signalling the formation of the native state clearly occurs in three distinct stages. A brief description of each of the phases is given below.
i) Non-specific Collapse: in the first phase the model protein collapses to a compact conformation, driven perhaps by the effective attractive interactions between the hydrophobic residues. The relaxation of the various correlation functions in this stage is quite complex and may in fact be consistent with a stretched exponential behavior. In contrast to homopolymers, the non-specific collapse in proteins is not totally random. The structures obtained at this stage depend on various factors like loop formation probability, internal motions dominated by dihedral angle transitions, etc. Thus in proteins even at this stage there is some specificity that is not expected in homopolymers.
ii) Kinetic Ordering: in the second phase the foldable chain effectively discriminates between the exponentially large number of compact conformations to attain a large fraction of native-like contacts. The motions of the various segments in this regime are highly cooperative, and in this phase the effect of the biasing forces inherent in foldable sequences, encapsulated in the principle of minimal frustration, becomes operative. At the end of this stage the molecule finds one of the basins corresponding to the minimum energy structures. Although these structures have many native-like contacts they also differ from the native state in a significant way.
iii) All or None: the final stage of folding corresponds to activated transitions from one of the many native-like minimum energy structures to the native state. In this stage the few incorrect contacts are broken and the native contacts established. This necessarily involves chain expansion and significant unravelling before the correct contacts can be established. A detailed analysis of several independent trajectories for both lattice and off-lattice simulations suggests that there are multiple pathways that lead to the structures found at the end of the second stage [11,14]. There are relatively few paths that connect the native state and the numerous native-like conformations located at the end of the second stage [11,16].
A summary of the three stage multipathway mechanism is given in Figure 1.
3. Time Scales in Protein Folding
In order to make the above scenario applicable to proteins it is necessary to obtain the dependence of the time constants for the three stages as a function of N. Proteins are mesoscopic evolved heteropolymer systems with considerable branching. Here we adopt the point of view that, in a coarse-grained sense, one can represent a protein with N effective amino acid residues, with N up to roughly 200. Many single domain proteins fall in this category. Given this, we wish to make plausible scaling arguments that give explicit expressions for the approximate time scales involved in the distinct stages of folding. The kinetic laws proposed in this paper are applicable when the refolding process is initiated by changing the solvent conditions.
[Figure 1: schematic of the folding pathways. Panel labels: DIRECT PATHWAY (NUCLEATION); DIFFUSIVE SEARCH; "COMPACT" STRUCTURES (MULTIPLE PATHS); NATIVE-LIKE STRUCTURES (FEW PATHS).]
Fig. 1. This figure summarizes the mechanisms by which proteins reach their native state starting from random coil conformations. This is based on simulation studies of several classes of minimal models. The mechanisms illustrated are applicable for sequences which fold in finite times. In the first process a fraction, $\Phi$, of the initial population of molecules reaches the native state directly following collapse. This involves the formation of a critical nucleus. Clearly $\Phi$ depends on temperature and ambient conditions, and is also sensitive to mutations. The remaining fraction $(1 - \Phi)$ reaches the native conformation via a three stage multipathway mechanism (TSMM). It should be emphasized that this kinetic partitioning between nucleation pathways and TSMM is simultaneously present for a generic protein. The kinetic partition factor depends on intrinsic factors such as sequence as well as on the environment. By changing the conditions one can change $\Phi$, as has been done recently in Cytochrome c [15]. The TSMM involves a nonspecific collapse followed by a search among the compact conformations. The search among compact conformations leads the protein to one of the many native-like conformations. Those conformations typically have between (60-80)% of native tertiary contacts. The native-like conformations are often stabilized by incorrect tertiary interactions. The last stage consists of activated transitions from one of the native-like conformations to the native state. There are multiple paths leading to the native state and relatively few paths in the final stage of folding. The numbers of conformations available in the three stages as a function of N are given. Their values for N = 27 are also shown. The number of minimum energy (native-like) structures should be viewed as a conjecture. This conjecture is based on results reported elsewhere [17]. The last line in the figure gives the time scale for the three processes for N = 100 using estimates for physical quantities given in the text.
This is usually accomplished by diluting the concentration of the denaturant so that the folding process can begin. One can also initiate the folding by changing the temperature so that $T \approx T_f$.
3.1. NON-SPECIFIC COLLAPSE. This stage is analogous to the homopolymer collapse. An approximate time scale for this phase can be obtained using the arguments used by de Gennes in his treatment of the kinetics of the coil to globule transition [17]. It is possible that more recent models of homopolymer collapse could be used to get better estimates for the collapse process itself [18]. This seems worth pursuing. Here we restrict ourselves to the simpler description [17]. The time scale for non-specific collapse is largely determined by the average surface tension between the (hydrophobic) residues and water. In order to utilize de Gennes' arguments we imagine that the protein undergoes a temperature jump from slightly above the theta temperature $T_\theta$ to a value close to the folding transition temperature $T_f$. By generalizing slightly the arguments set forth in reference [17]
it is easy to show that $\tau_c$ can be written as
$$\tau_c \approx \left(\frac{\eta a}{\gamma}\right) \left(\frac{T_\theta - T_f}{T_\theta}\right)^{3} N^{2} \qquad (1)$$
where
$\eta$ is the solvent viscosity and a is roughly the persistence length of the protein. Notice that $\tau_c$ does not depend explicitly on the average attractive interactions, $\epsilon_h$, between the hydrophobic residues which are thought to be responsible for the collapse. The value of the surface tension $\gamma$ clearly depends on $\epsilon_h$ and as a result $\tau_c$ depends implicitly on $\epsilon_h$. Using minimal lattice models of proteins we have found that equation (1) is roughly obeyed, with the exponent of N lying between 2 and 2.2.
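As a rough numerical check of the collapse time scale, the sketch below converts the parameter ranges quoted later in this section ($\eta$ in Poise, a in Å, $\gamma$ in cal/Å² mole) to SI units and evaluates $\tau_c$. The function names, the unit conversion constants, and the cubic power on the reduced temperature $(T_\theta - T_f)/T_\theta$ are assumptions made for illustration; only the $N^2$ dependence and the prefactor $\eta a/\gamma$ are taken directly from the text.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
CAL_TO_J = 4.184          # joules per thermochemical calorie
ANGSTROM = 1e-10          # metres per angstrom
POISE = 0.1               # Pa*s per Poise


def collapse_prefactor(eta_poise, a_angstrom, gamma_cal_per_A2_mol):
    """Return eta * a / gamma in seconds (all inputs in the units used in the text)."""
    eta = eta_poise * POISE                       # Pa*s
    a = a_angstrom * ANGSTROM                     # m
    # cal / (A^2 mol) -> J / m^2 per molecule-scale interface
    gamma = gamma_cal_per_A2_mol * CAL_TO_J / AVOGADRO / ANGSTROM**2
    return eta * a / gamma


def collapse_time(n_residues, prefactor_s, reduced_temp, temp_power=3.0, n_power=2.0):
    """Collapse time in seconds; temp_power=3 is an assumed exponent, not a fitted one."""
    return prefactor_s * reduced_temp**temp_power * n_residues**n_power


if __name__ == "__main__":
    low = collapse_prefactor(0.01, 5.0, 50.0)
    print(f"eta*a/gamma for eta=0.01 P, a=5 A, gamma=50: {low:.2e} s")
    print(f"tau_c for N=200 at T_f = T_theta/2: {collapse_time(200, low, 0.5):.2e} s")
```

Keeping the two exponents as parameters makes it easy to see how sensitive the estimate is to the N exponent, which the lattice simulations place between 2 and 2.2.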
3.2. KINETIC ORDERING. As mentioned before, in this stage the protein molecule searches among the space of compact structures in such a manner that, as the forward reaction proceeds, biases in the sequence direct it to structures that are native-like. In our previous study we had shown that these low energy structures could correspond to one of the many minimum energy structures [11,19]. These structures have considerable native character as measured by the degree of overlap introduced elsewhere [11]. The degree of overlap between a given structure and the native conformation is assessed quantitatively by comparing the distances between non-bonded residues (residues separated by at least three beads) and the corresponding ones in the native conformation. If the distance is less than the bond length then it is presumed that the contact between the specified residues is a native contact. If the fraction of native contacts in a given structure is greater than about 50%, these structures are classified as native-like. In fact we typically find that the minimum energy structures have (60-80)% of native contacts [11].
The search among the compact structures leading to one of the minimum energy structures (see Fig. 1) has been previously argued to proceed by a diffusive process in the rugged energy landscape [19]. Based on extensive numerical studies we had suggested that the formation of native-like structures from an ensemble of compact structures takes place via a series of (largely) local cooperative motions. Furthermore this process was argued to be analogous to reptation in polymer melts [19]. We should emphasize that the local cooperative motions in a monomeric protein are reminiscent of reptation and consequently one may expect a similar N dependence to be obtained. With this analogy the approximate time scale for diffusion in the subspace of compact structures can be written as [11]
$$\tau_{KO} \approx \tau_D N^{\zeta} \qquad (2)$$
where $\zeta$ is a dynamical folding exponent, and $\tau_D$ is an undetermined time constant. We have estimated from limited numerical data that $\zeta$ should be approximately 3. This value is consistent with the notion that the diffusive search among a manifold of compact structures, leading ultimately to one of the numerous minimum energy structures, takes place via a reptation-like mechanism. This estimate for $\zeta$ should be regarded as a conjecture, and thus it would be desirable to have better estimates of it using minimal models. The time scale $\tau_{KO}$ is somewhat reminiscent of the arguments given by Grosberg et al. [20b] in their description of the second stage of the kinetics of approach to the compact phase of homopolymers following a temperature quench. In our case the physical process that takes place is very different from that of homopolymers. In proteins the second stage leads the system to native-like structures driven mostly by formation of favorable free energy contacts, while in homopolymers the compact globule formation in the second stage is dictated by geometrical factors and entanglements [20b].
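The power law for the kinetic ordering time is simple to evaluate and, given simulation data for several chain lengths, to fit. The following sketch evaluates $\tau_{KO} = \tau_D N^{\zeta}$ and estimates $\zeta$ from a log-log least-squares slope. The function names are hypothetical, and the default $\tau_D = 10^{-8}$ s is the typical dihedral transition time quoted later in this section; the fitting routine is a generic illustration, not the procedure used to obtain the estimate $\zeta \approx 3$.

```python
import math


def kinetic_ordering_time(n_residues, tau_d=1e-8, zeta=3.0):
    """Return tau_KO = tau_D * N**zeta in seconds."""
    return tau_d * n_residues**zeta


def estimate_zeta(chain_lengths, times):
    """Least-squares slope of log(time) versus log(N), i.e. the exponent zeta."""
    xs = [math.log(n) for n in chain_lengths]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


if __name__ == "__main__":
    print(f"tau_KO for N=100: {kinetic_ordering_time(100):.3g} s")
    lengths = [27, 46, 100]
    synthetic = [kinetic_ordering_time(n) for n in lengths]  # synthetic data, not measurements
    print(f"recovered zeta: {estimate_zeta(lengths, synthetic):.3f}")
```

With measured ordering times in place of the synthetic list, the same slope estimate would give a direct test of the conjectured value of $\zeta$.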
3.3. ALL OR NONE. The last stage of the folding process corresponds to an activated transition from one of the many minimum energy native-like structures to the native state. Since there are many native-like structures it follows that the transition state, which occurs late in the folding process for those molecules that reach the native state via the TSMM, is not unique. Because of the number of precursors to the native state it follows that the major driving force even in the late stages of folding is entropic in origin. This concept has been explicitly illustrated using lattice models of disulfide bonded proteins [21]. From a theoretical perspective this is in harmony with similar notions introduced to describe activated transitions in glasses [22] based on a detailed study of Potts glass models [22]. This analogy to Potts glasses can in fact be utilized to obtain an estimate of the average barrier height separating the minimum energy structures and the native state as a function of N. Here we give the arguments based on our previous study of the temperature dependence of relaxation times in systems with rough energy landscapes [21, 23]. The diffusive folding time for the last stage may be written as
$$\tau_F \approx \tau_0 e^{\Delta F^{\ddagger}/k_B T} \qquad (3)$$
where $\Delta F^{\ddagger}$ is the average free energy barrier separating the native state and the precursors, and $\tau_0$ is an appropriate time constant. The estimate of $\Delta F^{\ddagger}$ for a 46-mer minimal model with a $\beta$-barrel structure as the native state is $6 k_B T_f$, where $T_f$ is the folding transition temperature [24]. Apparently a smaller estimate ($\Delta F^{\ddagger} \approx 2.4 k_B T_f$) has been made for a 27-mer three letter code lattice model of proteins [9].
The scaling of $\Delta F^{\ddagger}$ with N can be anticipated by using the following argument. We assume that the free energy distribution of the low energy native-like structures is given by a Gaussian distribution. Since there is an ensemble of independent transition states connecting these native-like conformations and the native state, it is natural to assume that the barrier height distribution is also roughly Gaussian with a dispersion $\langle \Delta F^2 \rangle$ that scales as N. It should be emphasized that this distribution results only after averaging over the free energies of the native-like structures. Since the barrier height distribution is essentially a Gaussian it follows that $\Delta F^{\ddagger} \sim \langle \Delta F^2 \rangle^{1/2} \sim \sqrt{N}$.
The physical picture given above and the analogy with relaxation processes in proteins and glasses can be further used to obtain a more precise scaling behavior of $\Delta F^{\ddagger}$. The relaxation time for glasses is often written as [22]
$$\tau \sim e^{\Delta F^{\ddagger}/k_B (T - T_K)} \qquad (4)$$
where $T_K$ is the Kauzmann temperature signalling the vanishing of the configurational entropy. (It should be pointed out that $T_K$ is the same as $T_g$ in the random energy model in that both are computed from the vanishing of entropy, and hence are equilibrium glass transition temperatures [25]. From here on we will use $T_K$ and $T_g$ interchangeably.) The formula in equation (4) was derived using scaling arguments based on the physics of droplet excitations in Potts glass models [22, 23]. In reference [20a] it was shown that sufficiently close to $T_K$ the activation barrier scales as
$$\Delta F^{\ddagger}/k_B T_K \sim t^{-1} \qquad (5)$$
where $t = (T - T_K)/T_K$. We will assume that this behavior persists at temperatures close to $T_f$. In order to obtain the N dependence of $\Delta F^{\ddagger}$ we note that the activation physically corresponds to cooperative structural rearrangement on a scale which becomes increasingly large as $T \to T_K$. For proteins this implies that as $T \to T_g$ (or $T_K$) such a length scale $\xi$ would essentially be the whole protein molecule, and hence the folding times near $T_g$ would essentially be infinite or, more precisely, much greater than the biologically relevant time scales. Using the relation $t^{-1} \sim \xi^{1/\nu} \sim N^{1/d\nu}$ we get
$$\Delta F^{\ddagger}/k_B T_K \sim N^{1/d\nu} \qquad (6)$$
where $\nu$ is a correlation length exponent. Because of the possible mapping between the minimal protein models and Potts glasses and the REM we take $\nu = 2/d$ [22a]. This yields
$$\Delta F^{\ddagger} \sim k_B T_K \sqrt{N}. \qquad (7)$$
By letting $\alpha = T_f/T_g$ the barrier height can be written as
$$\Delta F^{\ddagger} \sim \alpha^{-1} k_B T_f \sqrt{N}. \qquad (8)$$
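The square-root law for the barrier and the resulting activated folding time can be tabulated with a few lines of code. The sketch below treats the scaling relation as an equality with unit proportionality constant, which is an assumption; the function names are hypothetical, $\alpha = T_f/T_g$ is an input, and the prefactor $\tau_0 = 0.126$ ms used in the example is the estimate adopted in the comments that follow in this section.

```python
import math


def barrier_in_kBTf(n_residues, alpha=1.6):
    """Barrier height in units of k_B*T_f, taken as sqrt(N)/alpha (prefactor of 1 assumed)."""
    return math.sqrt(n_residues) / alpha


def folding_time(n_residues, tau0_s, inv_alpha=1.0 / 1.6):
    """Activated folding time at T = T_f: tau_0 * exp(sqrt(N)/alpha), in seconds."""
    return tau0_s * math.exp(inv_alpha * math.sqrt(n_residues))


if __name__ == "__main__":
    for n in (27, 46):
        print(f"N={n}: barrier = {barrier_in_kBTf(n):.2f} k_B T_f")
    tau0 = 1.26e-4  # seconds; the 0.126 ms prefactor quoted in the text
    for n in (100, 300):
        print(f"N={n}: tau_F = {folding_time(n, tau0, inv_alpha=0.6):.3g} s")
```

Passing `inv_alpha` separately from `alpha` lets one reproduce estimates made with the rounded value 0.6 as well as with $1/1.6 = 0.625$; the exponential makes the folding time noticeably sensitive to this rounding at large N.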
There are a few comments about the scaling laws for the three stage multipathway kinetics that are worth making.
a) The estimate of the time scale for non-specific collapse in equation (1) can be made using realistic parameters. Typical ranges of $\eta$, a, and $\gamma$ are (0.01-0.1) Poise, (5-10) Å, and (40-60) cal/Å² mole. This yields $\eta a/\gamma$ in the range ($10^{-10}$ - $10^{-11}$) s. Thus at $T_f = T_\theta/2$, $\tau_c$ is found to be between (0.1-10) μs for N = 200. This time is very short and it therefore is reasonable that non-specific collapse would, at subsequent times, invariably lead to misfolded structures.
b) The time constant $\tau_D$ in equation (2) corresponds roughly to the time scale for local dihedral angle transitions [26]. Our previous simulation studies suggest that to a large extent a series of local dihedral angle transitions are responsible for the diffusive search [24]. Typically $\tau_D \approx 10^{-8}$ s. This is also consistent with similar estimates based on lattice MC simulations [26]. Thus $\tau_{KO} \approx 10$ ms for N = 100.
c) Since there is a manifold of different kinds of states in a protein it is worthwhile being specific about the nature of the states for which equation (8) is expected to be valid. We expect the barrier height separating any one of the manifold of low energy native-like states and the native state to be given by equation (8). The barrier to go from the native state to one of the many low energy structures is given by $\Delta F^{\ddagger} + \Delta_s$, where $\Delta_s$ is the stability gap. If the analogy with Potts glasses is further utilized then it follows that the relative free energy difference between the distinct low energy states can differ at most by $\sqrt{N}$. Thus for the native state to be stable it follows that the stability gap should obey the inequality
$$\Delta_s / k_B T \gtrsim \beta \sqrt{N} \qquad (9)$$
where $\beta$ is an unspecified constant. The relative free energy stability of the native conformation compared to that of the manifold of higher free energy states at T = 37 °C (physiological temperature) for N = 150 is roughly 10 kcal/mole according to equation (9), assuming $\beta \approx O(1)$. It is known that native conformations of single domain proteins (N < 200) are stable with respect to the other states by only (5-15) kcal/mole, which is in approximate agreement with equation (9).
d) It is amusing that the estimate given in equation (8) gives $\Delta F^{\ddagger} \approx 3.25 k_B T_f$ for N = 27 and $4.24 k_B T_f$ for N = 46 using a value of $\alpha \approx 1.6$. These values are in rough accord with the numerically estimated barrier heights given previously. This agreement should be considered very good because equation (8) is not expected to be accurate for small values of N. We believe that equation (8) would provide good estimates for real proteins.
e) It is interesting to calculate the folding times $\tau_F$ in equation (3) for values of N corresponding to typical values in a single domain protein. These roughly range from N = 50 to 200. In order to obtain the approximate time scale $\tau_F$ an estimate for $\tau_0$ is needed. It is quite difficult to obtain an estimate of $\tau_0$ because in general $\tau_0$ is a function of solvent viscosity and internal viscosity of proteins. Here we follow the suggestions of Onuchic et al. [10] and set $\tau_0 \approx 2\pi\tau_{corr}$, where $\tau_{corr}$ is the correlation time for harmonic fluctuations in a suitably defined reaction coordinate. Onuchic et al. estimate $\tau_{corr} \approx 20,000$ Monte Carlo steps (MCS) for a three letter 27-mer model [10], which yields $\tau_0 \approx 1.26 \times 10^{-4}$ s assuming 1 MCS = $10^{-9}$ s. With this estimate one obtains an expression for $\tau_F$ at $T = T_f$ which is given by
$$\tau_F \approx 0.126\, e^{0.6\sqrt{N}} \ \mathrm{ms} \qquad (10)$$
using $\alpha^{-1} = 0.6$. Equation (10) yields $\tau_F$ in the range (0.05-4) s for N between 100 and 300. These values are consistent with typical experimental findings which estimate that the time constants for the slow phase of folding are roughly on the order of seconds. In obtaining equation (10) we have assumed that $\tau_{corr}$ is independent of N.
f) It is interesting to compare $\tau_{KO}$ (cf. Eq. (2)) and $\tau_F$ (cf. Eq. (10)) for the much studied case of N = 27. In this case $\tau_{KO} \approx 2$ ms, i.e., the time scales for the second and third stages are practically the same. Folding of this model peptide will appear to be roughly two stage like. In order to clearly distinguish between the various stages, simulations for a range of values of N should be performed.
g) The value of $\alpha = T_f/T_g$ has been set to 1.6 in getting equations (8) and (9). These estimates should really be viewed as tentative even though they seem to be supported by simulations of minimal models. In general for foldable sequences it is necessary that $\alpha > 1$. Equation (8) shows that as $\alpha$ becomes large the effective activation barrier decreases. This would necessarily lead to smaller values of the diffusive folding times $\tau_F$ (cf. Eq. (3)). Large values of $\alpha$ imply $T_f/T_g \gg 1$, and the above arguments indicate that fast folding sequences maximize $\alpha$, as anticipated by Bryngelson and Wolynes [27]. The upper bound for $T_f$ is clearly $T_\theta$, the collapse transition temperature, because above $T_\theta$ the protein behaves as a denatured species with large values of the radius of gyration. (Here it should be stressed that even near strongly denaturing conditions many protein molecules could retain some elements of secondary structure, and hence the radius of gyration may not correspond to the Flory estimate for a self-avoiding walk.) Thus fast folding sequences should be characterized by having $T_f$ as close to $T_\theta$ as possible. From the bound
$$\max (T_f/T_g) = T_\theta/T_g \qquad (11)$$
it follows that there should be correlations between folding rates and
$$\sigma_T = (T_\theta - T_f)/T_\theta. \qquad (12)$$
In fact in our earlier simulation studies [11] on lattice models we had discovered that fast folding sequences are characterized by small values of $\sigma_T$. In other words, the smaller the value of $\sigma_T$, the faster the folding rate. Although minimizing $\sigma_T$ has been derived from the crucial observation of BW that $\alpha$ should be maximized, for practical applications such as de novo design of proteins the criterion of having small $\sigma_T$ may be easier to implement because both $T_f$ and $T_\theta$ can be calculated using equilibrium statistical mechanics.
A note about $T_\theta$ is in order. It might appear that one may not easily apply the concept of a $\theta$ point to a purely random heteropolymer by using the vanishing of the effective two body interaction. The reason for this argument is that even if the value of the effective two body interaction is obtained by suitable renormalization of the energy scales, its value may be positive, implying that the system is in a good solvent. However, proteins are not random heteropolymers. The fraction of hydrophobic residues is in slight excess of others so that the effective two body interaction is only positive under strongly denaturing conditions. Minimal model studies have clearly shown that for a given foldable sequence $T_\theta$ may be obtained by conventional methods (by monitoring the temperature dependence of energy fluctuations) and that the collapse transition is a finite sized second order phase transition.
4. Specific Collapse, Direct Pathways, Nucleation Mechanism
If in fact $\sigma_T \ll 1$ it follows that the process of folding would occur simultaneously with collapse itself or, in other words, it would be very difficult to differentiate between collapse and folding in typical experiments. In these situations specific collapse is almost synchronous with folding.
Let us first emphasize that for proteins, which are characterized by several energy scales, $\sigma_T$ cannot be identically zero. The physical reason is that a protein at $T = T_\theta$ is not compact, and at $T_\theta$ the chain behaves like an ideal one. It is known that proteins are compact, with the radius of gyration satisfying the inequality $aN^{1/3} < R < aN^{1/2}$. It follows that even though $\sigma_T$ cannot be identically zero one can engineer sequences (in principle) such that $\sigma_T \ll 1$. In this case, as mentioned above, at least experimentally collapse and folding cannot be easily separated.
Two recent studies in fact suggest that in general there are pathways that directly reach the native state without forming any detectable intermediates. Let us briefly discuss both these studies.
1) The first study probed the kinetics of approach to the native state in a model $\beta$-barrel structure. It was shown [14, 23] that depending on the temperature there exists a fraction of initial trajectories in which the native state is reached rapidly following collapse. This process was shown to occur via the formation of a critical nucleus [14, 24, 28]. In Figure 1 the direct pathway proceeding by a nucleation mechanism is shown as dashed lines.
2) More importantly, experiments on Cytochrome c [15] clearly show that these direct pathways exist such that the native state is reached almost simultaneously with the collapse itself. Furthermore these authors have shown that by suitably modifying the chemical conditions, indirect pathways leading to the formation of low energy misfolded structures can often be eliminated. This would imply that under suitable chemical and temperature conditions folding would occur in essentially a single step. If this occurs, as the authors of reference [24] have shown for Cytochrome c, then one would infer using our scenario that $T_f \approx T_\theta$. Clearly more experiments would be needed to establish the general validity of these findings.
For the specific collapse, which ensures the correct formation of the native structures and loops, the folding kinetics would be exponential. An estimate for the time constant for specific collapse may be obtained by generalizing de Gennes' arguments to include large loop formation probability.
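This changes equation (1) to
$$\tau_{sc} \approx \left(\frac{\eta a}{\gamma}\right) \left(\frac{T_\theta - T_f}{T_\theta}\right)^{3} N^{2+\omega} \qquad (13)$$
where $\omega$ approximately accounts for the probability of long range (approximately the size of the protein) tertiary interactions. In three dimensions $1.8 < \omega < 2.2$, and using the estimate of $10^{-10}$ s for $(\eta a/\gamma)$, $\tau_{sc} \approx 3$ ms for N = 100 at $T_f = T_\theta/2$.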
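The specific collapse estimate can be checked numerically in the same spirit as the other time scales. The sketch below is a minimal illustration under stated assumptions: the function name, the symbol `omega` for the loop formation exponent, and the cubic power on the reduced temperature $(T_\theta - T_f)/T_\theta$ are choices made for this example, while the prefactor $10^{-10}$ s, the range 1.8 to 2.2 for the exponent, N = 100 and $T_f = T_\theta/2$ are taken from the text.

```python
def specific_collapse_time(n_residues, prefactor_s=1e-10, reduced_temp=0.5,
                           omega=2.2, temp_power=3.0):
    """Specific collapse time in seconds.

    prefactor_s  : eta*a/gamma in seconds
    reduced_temp : (T_theta - T_f) / T_theta
    omega        : assumed name for the loop formation exponent (1.8 to 2.2 in 3D)
    temp_power   : assumed exponent on the reduced temperature
    """
    return prefactor_s * reduced_temp**temp_power * n_residues ** (2.0 + omega)


if __name__ == "__main__":
    for omega in (1.8, 2.0, 2.2):
        tau_ms = 1e3 * specific_collapse_time(100, omega=omega)
        print(f"omega={omega}: tau_sc = {tau_ms:.2f} ms")
```

Under these assumptions the upper end of the exponent range, $\omega = 2.2$, gives a value close to the 3 ms quoted above for N = 100, while the lower end gives a time several times shorter, which shows how strongly the estimate depends on the loop formation exponent.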
Conclusions
Analogies
between thefolding
ofproteins
and otherproblems
inpolymer physics
and other disordered systems have been used to obtain expressions for the time constants for the variousstages
in thepathways
thatinevitably (for entropic reason)
lead theprotein
to one of the lowenergy misfolded structures. The
scaling
lawspresented
here can be verifiedexpenmentally.
It has been shown using pulsed exchange NMR methods on a number of proteins that the protection kinetics exhibits a fast phase and a slow phase after a crucial transient time of about 5 ms [29].
This has been shown to arise from a combination of the nucleation pathway and pathways leading to misfolded structures [14] (see Fig. 1).
The fraction $(1 - \Phi)$ of molecules follows the TSMM en route to the native state.
For sufficiently large $N$ the third stage would be the rate determining step, whose time scale is determined by the estimated barrier height given in equation (8).
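The partitioning of molecules between the direct and indirect pathways suggests a two-phase relaxation. As a minimal sketch, assume that the fraction $\Phi$ folding by the nucleation pathway relaxes with a single fast time constant and the remaining $(1 - \Phi)$ with a single slow one. The single-exponential form of each phase, the function name `unfolded_fraction`, and the numerical values are illustrative assumptions, not results from the text.

```python
import math


def unfolded_fraction(t, phi, tau_fast, tau_slow):
    """Fraction of molecules not yet native at time t.

    t, tau_fast and tau_slow must share the same time unit.
    phi is the assumed fraction following the direct (nucleation) pathway.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie between 0 and 1")
    fast = phi * math.exp(-t / tau_fast)
    slow = (1.0 - phi) * math.exp(-t / tau_slow)
    return fast + slow


if __name__ == "__main__":
    # Hypothetical values: 30% direct pathway, 3 ms fast phase, 300 ms slow phase.
    for t_ms in (0.0, 5.0, 50.0, 500.0):
        print(t_ms, round(unfolded_fraction(t_ms, 0.3, 3.0, 300.0), 3))
```

In such a picture the amplitude of the fast phase would give an estimate of $\Phi$, and the slow time constant would carry the barrier-height information.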
If the scenario presented in Figure 1 is valid then the time scale for the slow phase in refolding experiments would follow Arrhenius behavior, with the time being given by $\tau_F \approx \tau_0 e^{\sqrt{N}}$.
If these experiments are performed as a function of temperature then the activation barrier heights can be inferred from the temperature dependence of the slow phase [14].
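Inferring a barrier height from temperature-dependent data can be sketched as a linear fit of $\ln \tau$ against $1/k_B T$. The example assumes a simple Arrhenius form $\tau = \tau_0 \exp(\Delta F / k_B T)$ with a temperature-independent barrier. The synthetic data, the function name `fit_arrhenius`, and the value $\tau_0 = 1\,\mu$s are hypothetical and only illustrate the procedure.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def fit_arrhenius(temperatures, taus):
    """Least-squares fit of ln(tau) = ln(tau0) + barrier / (k_B * T).

    Returns (tau0, barrier), with the barrier in joules.
    """
    xs = [1.0 / (K_B * t) for t in temperatures]
    ys = [math.log(tau) for tau in taus]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    barrier = sxy / sxx  # slope of the fit
    tau0 = math.exp(y_mean - barrier * x_mean)  # from the intercept
    return tau0, barrier


if __name__ == "__main__":
    # Synthetic slow-phase times built with a barrier of sqrt(N) * k_B * T_ref.
    n_residues, t_ref, tau0_true = 100, 300.0, 1e-6
    barrier_true = math.sqrt(n_residues) * K_B * t_ref
    temps = [280.0, 290.0, 300.0, 310.0, 320.0]
    taus = [tau0_true * math.exp(barrier_true / (K_B * t)) for t in temps]
    tau0, barrier = fit_arrhenius(temps, taus)
    print(f"barrier / (k_B T_ref) = {barrier / (K_B * t_ref):.2f}")
    print(f"tau0 = {tau0:.2e} s")
```

Repeating such a fit for proteins of different length would give barrier heights as a function of $N$, which could then be compared with the predicted $\sqrt{N}$ scaling.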
Since the various proteins studied span a reasonable range of $N$, the kinetic laws proposed here can be tested.
A very
important theme in the mechanism given in Figure 1 is that natural proteins can fold extremely rapidly on the time scale of collapse itself.
In fact suitable modification of the environment can (almost) eliminate the indirect pathways involving kinetic traps.
The time scale for this process is on the order of several milliseconds, which roughly coincides with the "burst phase" in pulsed hydrogen exchange labeling experiments.
Thus in order to fully understand the detailed mechanism in the direct pathway one needs experiments that probe initial events, namely loop formation between distant residues, which happen on the order of tens of microseconds [30, 31].
It appears that experiments covering time scales from microseconds to seconds are needed to verify the full consequences of the kinetics of folding as predicted by the "new view".
Acknowledgments
I wish to thank Zhuyan Guo and Peter Wolynes for illuminating discussions on an earlier version of the manuscript; I am grateful to T. R. Kirkpatrick for an unforgettable discussion regarding barrier heights in proteins.
I am indebted to Tobin Sosnick for helpful discussions on the experiments on Cytochrome c.
John Straub and George Lorimer provided useful comments for which I am thankful.
I thank the referees for several pertinent comments.
This work was supported in part by a grant from the National Science Foundation and the Air Force Office of Scientific Research.
References
[1] Baldwin R.L., Nature 369 (1994) 183.
[2] Bryngelson J.D. and Wolynes P.G., Proc. Natl. Acad. Sci. USA 84 (1987) 7524-7528.
[3] Garel T. and Orland H., Europhys. Lett. 6 (1988) 307.
[4] Honeycutt J.D. and Thirumalai D., Proc. Natl. Acad. Sci. USA 87 (1990) 3526-3529.
[5] Shakhnovich E., Farztdinov G., Gutin A.M. and Karplus M., Phys. Rev. Lett. 67 (1991) 1665-1668.
[6] Miller R., Danko C.A., Fasolka M.J., Balazs A.C., Chan H.S. and Dill K.A., J. Chem. Phys. 96 (1992) 768-780.
[7] Leopold P.E., Montal M. and Onuchic J.N., Proc. Natl. Acad. Sci. USA 89 (1992) 8721-8725.
[8] Sali A., Shakhnovich E. and Karplus M.K., J. Mol. Biol. 235 (1994) 1614.
[9] Kim P.S. and Baldwin R.L., Ann. Rev. Biochem. 51 (1982) 459-489.
[10] Onuchic J.N., Wolynes P.G., Luthey-Schulten Z. and Socci N.D., (Preprint, Jan. 1995).
[11] Camacho C.J. and Thirumalai D., Proc. Natl. Acad. Sci. USA 90 (1993) 6369-6372.
[12] Lau K. and Dill K.A., Macromolecules 22 (1989) 3986-3997.
[13] Chan H.S. and Dill K.A., J. Chem. Phys. 100 (1994) 9238-9257.
[14] Thirumalai D. and Guo Z., Biopolymers Research Communications 35 (1995) 137-140.
[15] Sosnick T., Mayne L., Hiller R. and Englander S.W., Nature Struct. Biol. 1 (1994) 149-156.
[16] Bryngelson J.D., Onuchic J.N., Socci N.D. and Wolynes P.G., Proteins: Structure, Function and Genetics (1995) in press.
[17] de Gennes P.G., J. Phys. Lett. 46 (1985) L639-L642.
[18] Ostrovsky B. and Bar-Yam Y., Europhys. Lett. 25 (1994) 409.
[19] Camacho C.J. and Thirumalai D., Phys. Rev. Lett. 71 (1993) 2505-2508.
[20] a) de Gennes P.G., J. Chem. Phys. 55 (1971) 572; b) Grosberg A., Nechaev S. and Shakhnovich E., J. Phys. 49 (1988) 2095.
[21] Camacho C.J. and Thirumalai D., Proteins: Structure, Function, and Genetics 22 (1995) 27.
[22] a) Kirkpatrick T.R., Thirumalai D. and Wolynes P.G., Phys. Rev. A 40 (1989) 1045-1054; b) Kirkpatrick T.R. and Thirumalai D., Phys. Rev. B 37 (1988) 5342-5350.
[23] Kirkpatrick T.R. and Thirumalai D., J. Phys. I France 5 (1995) 777.
[24] Guo Z. and Thirumalai D., Biopolymers 36 (1995) 83.
[25] Derrida B., Phys. Rev. B 24 (1981) 2613-2626.
[26] Thirumalai D., in Statistical Mechanics, Protein Structure, and Protein Substrate Interactions, S. Doniach, Ed. (Plenum Press, 1994) 115-134.
[27] Bryngelson J.D. and Wolynes P.G., J. Phys. Chem. 93 (1989) 6902-6915.
[28] Abkevich V.I., Gutin A.M. and Shakhnovich E.I., Biochemistry 33 (1994) 10026-10036.
[29] Radford S.E., Dobson C.M. and Evans P.A., Nature 358 (1992) 302-307.
[30] Jones C.M., Henry E.R., Hu Y., Chan C.K., Luck S.D., Bhuyan A., Roder H., Hofrichter J. and Eaton W.A., Proc. Natl. Acad. Sci. USA 90 (1993) 11860-11864.
[31] Camacho C.J. and Thirumalai D., Proc. Natl. Acad. Sci. USA 92 (1995)
1277.