HAL Id: jpa-00246711
https://hal.archives-ouvertes.fr/jpa-00246711
Submitted on 1 Jan 1993
RNA secondary structure: a comparison of real and random sequences
Paul Higgs
To cite this version:
Paul Higgs. RNA secondary structure: a comparison of real and random sequences. Journal de Physique I, EDP Sciences, 1993, 3 (1), pp.43-59. 10.1051/jp1:1993116. jpa-00246711
Classification Physics Abstracts
87.15, 36.20
RNA secondary structure: a comparison of real and random sequences
Paul G. Higgs (*)
Service de Physique Théorique (**) de Saclay, F-91191 Gif-sur-Yvette Cedex, France
(Received ?3 July 1992, accepted in final form 23 September 1992)
Abstract. A sample of transfer RNA molecules is compared to a sample of random sequences having the same length and the same percentage composition of the different bases. For each sequence all possible secondary structures are constructed and a distribution of free energies for the states is obtained. It is found that the ground state free energies of tRNA molecules are significantly lower than for random sequences, and that tRNA molecules have significantly fewer alternative secondary structures at energies close to the ground state than do random sequences. A distance D is defined which measures the average difference between molecular configurations and the ground state configuration. At realistic temperatures of order 300 K this distance is much larger for random sequences than for tRNA sequences. Thus the secondary structure of tRNA molecules at finite temperature is more stable than for random sequences. Sequences are considered which differ by a small number of mutations from real tRNA sequences. On average mutations destabilize the secondary structure. This suggests that a stable secondary structure is one of the factors selected for by natural selection. The thermodynamic behaviour of RNA sequences is compared to models for random heteropolymers which have a low temperature frozen phase.
1. Introduction.
The secondary structure of ribonucleic acid molecules is known to be a complex set of "stems" and "loops" formed by base pairing between complementary regions of the chain (Saenger [1], and see examples in Fig. 1). It is now commonplace to use computational methods to predict the secondary structure of particular RNA sequences [2-7] or to confirm experimental evidence for the structure.

(*) Address from Oct. 92: Dept. of Physics, University of Sheffield, Hounsfield Road, Sheffield S3 7RH, G.B.
(**) Laboratoire de la Direction des Sciences de la Matière du Commissariat à l'Energie Atomique.

Typically a program will consider the many possible ways of folding a chain into a secondary structure and select the structure which minimizes the free energy, and possibly a few alternative structures of slightly higher free energy. The lowest free energy state is generally assumed to be "stable", and should correspond to the biological structure if the parameters used to calculate the energy are known sufficiently accurately.
We will call this most favourable state the ground state. From a thermodynamical point of view it is by no means sufficient to know what the ground state of a physical system is, in order to know whether this state is stable. There will be an exponentially large number of other configurations of the system, and even though the ground state is more favourable than any one of these other states, the probability of finding the system in its ground state configuration may be negligible.
In this article we look at the thermodynamics of RNA folding. We will use a method which can calculate the distribution of energies of all possible configurations, and hence the probability of finding the molecule in its ground state. We will focus on transfer RNAs, since these are among the shortest RNA sequences (approx. 76 bases) and since the secondary structure is well known. Each species has a series of tRNA molecules, each of which is able to bind to one of the 20 amino acids. The tRNA sequences for the different amino acids are known for many species, and have been catalogued (Sprintzl et al. [8]). Although the sequences differ from one another they can all be arranged into the clover-leaf secondary structure (Fig. 1a) and the evidence suggests that this is the naturally occurring structure [1]. Base-paired regions of the chain are called stems. In tRNA the acceptor stem contains typically 6 or 7 pairs, whilst the other three stems contain between 3 and 5 pairs, depending on the sequence.

[Figure 1: panels a), b) and c); labelled features include the D loop, the TΨ loop and the anticodon loop.]

Fig. 1. a) Example of clover-leaf structure: the ground state of a tRNA from T. utilis. b) and c) The ground state structure for two sequences each differing from this tRNA by only one base, at positions 26 and 69 respectively. Small changes in the sequence can lead to large scale reorganization of the structure.
If a biological molecule is to play its proper role in a living cell, it needs to be able to recognize and interact with other biological molecules. Such interactions will usually only be possible if the molecule is in a particular configuration. An important property of a useful bio-molecule is to possess a stable structure (or possibly a small number of alternative stable structures) in which the molecule is almost always to be found.

Physicists have been led to draw a parallel between the folding of biological molecules (particularly proteins) and models for strongly disordered systems such as spin glasses (Bryngelson and Wolynes [9], Garel and Orland [10], Shakhnovich and Gutin [11]).
In these articles, molecules are treated as random monomer sequences. The random disorder is shown to lead to the freezing of the molecule into a small number of low energy configurations in some cases.

In the present paper we will compare a sample of real tRNA sequences with a sample of randomly generated base sequences. It is clear that real sequences are in no way random. Natural selection has been acting on biological molecules, tuning them to have certain required properties. A stable secondary structure is likely to be one of these properties, and so we should not be surprised to find significant differences between the structures of real and random sequences. On the other hand, a glance at the tRNA catalogue (Sprintzl et al. [8]) shows that the primary sequences of the different molecules differ from one another a great deal, with no apparent pattern visible.
Although there is a small number of conserved bases in the primary sequence [1, 8], the similarity of the molecules is much more apparent when comparing the secondary structures rather than the primary base sequences. There are many other classes of molecule for which the secondary structure is also well preserved despite changes in the sequence, e.g. 5S ribosomal RNA (Rogers et al. [12]) and viral RNA (Ahlquist et al. [13]). In the case of tRNA, since there are a large number of known sequences, we are led to look at their properties in a statistical manner.

2. Description of the program.

There are now many computational algorithms which attempt to predict RNA structure by searching for the lowest free energy configuration [2-7]. Thermodynamic parameters measured in experiment are incorporated into the programs. Contributions to the free energy are basically of two kinds: a free energy gain for each correctly matching base pair added to a stem ("stacking"), and a free energy penalty for every loop closed. Favourable secondary structures will therefore have a relatively small number of stems which are as long as possible, rather than a large number of very short stems.

The stacking free energies depend on the base pair added and on the previous pair in the stem. Allowed base pairs are AU, CG, and the non-standard pair GU. The values used in our program are those conveniently tabulated by Jacobson et al. [5]. They vary between -0.3 kcal/mole and -4.8 kcal/mole depending on the stacked pair. These values are more or less standard in the literature.

Much less standard is the treatment of loops. This is partly because experimental values for loop parameters are not always available. Thus even the more recent works involve a considerable amount of approximation in the values assigned to loops (Jaeger et al. [7]).
We have decided to treat loops in a very simplified way. Our object here is not to provide the most accurate prediction of the secondary structure for one particular sequence, but to look at general features of the folding behaviour in a statistical manner. We therefore note that every time a new stem is added to the structure a new loop is also formed (possibly a hairpin, an interior loop, a bulge loop, etc.). Hence the total number of loops is equal to the total number of stems. We will make the simplification of assigning a penalty of +4.5 kcal/mole to all loops, irrespective of their type and length. This value appears to be typical of the values given by Jacobson et al. [5], particularly for the hairpin loops of 4 to 8 bases which are present in the tRNA clover-leaf structure.
With this simplification we can assign a net free energy to a sequence equal to the stacking energy + 4.5 kcal/mole. A stem is stable relative to the coil state if its net free energy (including the penalty for the loop which it closes) is negative. A stem always contributes equally to the free energy of a structure in this model, regardless of the topology of the loops.
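The net free energy rule above can be illustrated with a minimal sketch. The function names and the stacking values in the two example stems are assumptions made for illustration only; they are not the entries tabulated by Jacobson et al. [5]. Only the +4.5 kcal/mole loop penalty and the sign criterion come from the text.

```python
# Illustrative sketch only: the stacking terms below are placeholders,
# not the values tabulated by Jacobson et al. [5].
LOOP_PENALTY = 4.5  # kcal/mole, the single penalty applied to every loop


def net_free_energy(stacking_terms, loop_penalty=LOOP_PENALTY):
    """Net free energy of a stem: its stacking terms plus one loop penalty."""
    return sum(stacking_terms) + loop_penalty


def is_stable(stacking_terms):
    """A stem is stable relative to the coil state if its net free energy is negative."""
    return net_free_energy(stacking_terms) < 0.0


weak_stem = [-0.3, -1.2]          # hypothetical stacking terms (kcal/mole)
strong_stem = [-2.1, -3.0, -4.8]  # hypothetical stacking terms (kcal/mole)
```

With these placeholder numbers the weak stem does not repay its loop penalty and would be rejected, whereas the strong stem would be kept.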
We will be interested not just in the lowest free energy structure, but in the distribution of energies of these structures (density of states). Our program is rather similar to that of Pipas and McMahon [2] for this reason. It therefore requires a large amount of storage, and is less suitable for long sequences than alternative methods [4-7] due to the exponential number of possible structures. The program proceeds as follows.

SEARCH FOR POSSIBLE STEMS. All points in the sequence are checked for complementary pairs, and a list of possible stems is made. A stem is added to the list if it contains at least 3 base pairs, and if its net free energy, including the penalty for loop closure, is negative.
There must be a minimum of 3 unpaired bases in a hairpin loop, hence if bases i and j are paired within a stem, then j ≥ i + 4. We have followed Pipas and McMahon [2] in the treatment of the GU pair, i.e. GU pairs are allowed within a stem, but not as the terminal pair in a stem.

CREATION OF A COMPATIBILITY MATRIX. A matrix C is created such that the elements C_AB = 1 if stems A and B are compatible, and C_AB = 0 otherwise. Two stems are compatible if they do not overlap (i.e. a base cannot be bonded in more than one stem at once), and if they satisfy the "no knots" rule. This requires that if bases i and j are paired in one stem, and k and l are paired in another stem, then either i < j < k < l, or i < k < l < j. The other possibility i < k < j < l is forbidden (see Sankoff et al. [4]).
A consequence of the no knots rule is that all allowed secondary structures can be drawn in 2d without the line of the chain crossing over itself.

COMPILATION OF ALL POSSIBLE STRUCTURES. A structure is a set of compatible stems taken from the list. Each stem A represents a possible structure in itself. The program then creates a list of all structures containing a pair of compatible stems A and B. To prevent double counting we require B > A. For each pair A and B the program then searches for all structures containing three stems ABC, all of which are compatible with each other, and requiring C > B. Structures containing 4, 5, 6... stems can be built up in this way. For the moderate length chains considered here there were a very large number of structures containing 3 or 4 stems, but structures with more than 6 stems were almost impossible due to the restrictions of compatibility.

Once the set of stems contained in a structure is known, the free energy of the structure is simply the sum of the free energies of the stems. We note that in a more realistic treatment of the loops this would not be true, and it would be necessary to test the topology of the loops in a given structure to calculate its free energy. Thus the program would be much longer.
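The compatibility test and the enumeration of structures described in this section can be sketched as follows. This is an illustration under stated assumptions, not the program used in the paper: stems are represented as lists of (i, j) base pairs with i < j, the four example stems and their net free energies are invented, and the recursion is one possible way of building sets of stems in increasing index order (B > A, C > B, and so on).

```python
from itertools import product


def pairs_cross(p, q):
    """True if two base pairs violate the no knots rule (i < k < j < l)."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l


def compatible(stem_a, stem_b):
    """Stems are compatible if they share no base and no pair of theirs crosses."""
    bases_a = {b for pair in stem_a for b in pair}
    bases_b = {b for pair in stem_b for b in pair}
    if bases_a & bases_b:
        return False
    return not any(pairs_cross(p, q) for p, q in product(stem_a, stem_b))


def enumerate_structures(stems, energies):
    """List every set of mutually compatible stems with its summed free energy."""
    n = len(stems)
    C = [[compatible(stems[a], stems[b]) for b in range(n)] for a in range(n)]
    results = []

    def extend(chosen, start):
        # Only stems with a larger index are added, which prevents double counting.
        for c in range(start, n):
            if all(C[c][a] for a in chosen):
                new = chosen + [c]
                results.append((tuple(new), sum(energies[s] for s in new)))
                extend(new, c + 1)

    extend([], 0)
    return results


# Hypothetical stems (lists of paired positions) and net free energies (kcal/mole).
stems = [
    [(0, 10), (1, 9)],    # stem 0
    [(2, 8), (3, 7)],     # stem 1, nested inside stem 0
    [(5, 15), (6, 14)],   # stem 2, crosses stems 0 and 1
    [(20, 30), (21, 29)], # stem 3, disjoint from the others
]
energies = [-3.0, -2.0, -4.0, -1.0]
structures = enumerate_structures(stems, energies)
ground_state = min(structures, key=lambda s: s[1])
```

In this toy case the enumeration yields nine structures, and the lowest free energy belongs to the set of stems 0, 1 and 3. Because the free energy is a plain sum over stems, no loop topology has to be inspected.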
3. Comparison of real and random sequences.

A sample of tRNA sequences was taken from the compilation of Sprintzl et al. [8]. This is the same source as used by Ninio [3] in a previous investigation of tRNA structure. Two sequences were taken from the list for each amino acid (where more than one is given). The tRNAs for Leucine, Serine and Tyrosine were excluded from the sample since they contain an extra arm, and are significantly longer than the rest. The result was a sample of 32 tRNAs with lengths in the range 74-77 bases and mean length close to 76. All of these can be arranged in the clover-leaf pattern shown in figure 1a.

One characteristic distinguishing tRNA from most other RNA is the presence of modified bases in addition to the four standard bases A, C, G and U (Saenger [1]). Some of these are modified in such a way as to prevent base pairing, so it is necessary to introduce a class of non-bonding bases into the program. Following Ninio [3] we have treated the following bases as non-bonding: D, m~G, m(G, m~G, Q, Y, and m~C. All other modified bases were treated as the standard base which they most resemble. In particular T and Ψ were treated as U, and I was treated as G. The proportions of the five different types of base in the sample of tRNAs studied were: 20.2% A, 27.0% C, 28.4% G, 20.0% U and 4.4% non-bonding.
Properties of tRNA sequences were compared with a sample of random sequences of length 76 bases. Each base in the random sequences was chosen to be A, C, G, U or non-bonding with a probability equal to the probability of occurrence in the real sequences. It is known that free energies depend on the C + G content of the chains, hence we wished to be sure that the random sequences had the same composition as the tRNA sequences.

As a test of the model we calculated the ground state structure for the 32 tRNA sequences. Of these, 25 were found to have the clover-leaf structure as ground state, a further 5 were found to have the clover-leaf structure except for a missing D stem, and the remaining 2 had ground states other than the clover-leaf. These results are typical of those obtained by Pipas and McMahon [2] and Ninio [3]. It would therefore seem that the simplified treatment of the loop energies does not seriously affect the results. In fact Ninio has considered a large number of slight variations in the stacking energies and loop energies. The degree of "successful" prediction of the clover-leaf structure varies slightly with the parameters used, but is always fairly high.
We state again that the object of this paper is to compare real tRNA with random sequences on a statistical basis, and not to look at the precise details of the secondary structure of any one sequence. The model defined above appears perfectly adequate for this purpose, without introducing any further complications and special cases.

In figure 2 we show the histogram of ground state free energies for the tRNA samples compared to that for a sample of 1000 random sequences of length 76. There is clearly a large difference between the two, with the real sequences having much lower ground state free energies than the random sequences. The ground states for the tRNA samples are in the range -45 to -15 kcal/mole, in agreement with the results of Pipas and McMahon [2]. If we take a typical tRNA sequence of free energy -30 kcal/mole, we may calculate from the histogram that the probability of finding a random sequence with a ground state less than or equal to -30 is only about 2%.

Since the program calculates all possible secondary structures we can calculate the density of states, i.e. the distribution of free energies for the different structures. For each individual sequence the distribution is rather irregular, and there are large fluctuations from sequence to sequence (see Pipas and McMahon [2]). In figure 3a we show the average distribution for the tRNA samples and for the random samples. These are fairly smooth curves. The column heights in the figure represent the average number of structures per sequence in each kcal/mole interval.

It will be seen that the total number of structures (area under the curve) is much larger for the tRNA sequences than for the random sequences, and that the tail of the distribution representing the most favourable structures extends to much lower free energies.
The reason that the real samples have a larger number of structures is because they have stems with relatively long complementary sequences. For every stem of 5 base pairs, for example, there are shorter stems with lengths 3 or 4 formed by partially unzipping the 5 pair stems. Thus sequences with the possibility of forming relatively long stems automatically have a larger number of stems in total and hence a larger number of structures. (Only structures with correctly matched base pairs have been considered here. It would also have been possible to permit pairing between any regions of the chain and to assign large unfavourable free energies to incorrectly matched pairs. In this case there would be the same number of structures for all chains of the same length. The density of states would then extend to positive free energies with respect to the unfolded states, but would differ very little at energies close to the ground state. We have not done this since it would increase enormously the total number of states.)

[Figure 2: histogram; horizontal axis: free energy (kcal/mole); curves labelled tRNA and random.]

Fig. 2. Histogram of ground state free energies for tRNA (32 sequences) compared to random sequences (1000 sequences of length 76). The range has been divided into boxes of width 3 kcal/mole.
Table I. Average values of some parameters compared for tRNA and random sequences, and for tRNA where the non-bonding bases were replaced by standard bases. Close competitors are structures (or local minima) within 5 kcal/mole of the ground state.

                                            tRNA     random        tRNA (no non-bonding bases)
Ground state energy (kcal/mole)             -29.7    -16.5         -30.1
Mean number of structures per sequence      1544     489 ± 10      3081
    close competitors                       11.0     33.5 ± …      21.6
Mean number of local minima per sequence    152      73 ± 3        277
    close competitors                       3.8      14.0 ± 0.5    6.9

The free energy distributions in figure 3a are of course measured relative to the completely unfolded state with no base pairs. It is also of interest to measure the average distribution of free energies relative to the ground state. Figure 3b shows the same data as figure 3a, but the density of states for each sequence has been shifted so that the ground state is at zero, and the shifted distributions have been averaged. We see that even though the random sequences have fewer structures in total, they have more structures at energies close to the ground state than do the tRNA sequences. Note that figures 3a and 3b do not have the same shape, since each of the densities of states contributing to the average has been shifted relative to its particular ground state.

[Figure 3: panels a) to d); horizontal axes: free energy; curves labelled tRNA and random, with 5S rRNA in panel d).]

Fig. 3. a) Average density of states for tRNA compared to random sequences (same samples as Fig. 2). b) Average density of states for same samples, with free energy measured relative to ground state. Whilst the total number of structures is larger for the tRNA samples, the number of structures close to the ground state is smaller. c) Average density of local minima states measured relative to ground state for same samples. d) Density of local minima states relative to the ground state for E. coli 5S rRNA compared to the average density for 20 random sequences of length 120.
Some statistics are presented in table I (columns 1 and 2). We see that the tRNA sequences have roughly three times as many folded configurations as the random sequences, whilst only about one third as many close competitors with the ground state. We have taken the number of close competitors as the number of structures within 5 kcal/mole of the ground state (including the ground state itself).
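The two quantities used here, the density of states shifted to the ground state and the number of close competitors, can be sketched for a single sequence as below. The function names, the 1 kcal/mole bin width, the inclusive treatment of the 5 kcal/mole boundary and the example energies are assumptions made for illustration.

```python
import math
from collections import Counter


def shifted_histogram(energies, bin_width=1.0):
    """Histogram of free energies measured relative to the ground state."""
    ground = min(energies)
    return Counter(math.floor((g - ground) / bin_width) for g in energies)


def count_close_competitors(energies, window=5.0):
    """Number of structures within `window` of the ground state, itself included.

    Treating the boundary as inclusive is an assumption of this sketch.
    """
    ground = min(energies)
    return sum(1 for g in energies if g - ground <= window)


# Hypothetical free energies (kcal/mole) of the structures of one sequence.
example_energies = [-30.0, -28.5, -25.0, -24.9, -10.0]
example_hist = shifted_histogram(example_energies)
example_close = count_close_competitors(example_energies)
```

Averaging such shifted histograms over a sample of sequences would give a curve of the kind plotted relative to the ground state.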
This trend is enhanced if we look only at structures which are local free energy minima. A local minimum is a structure to which it is not possible to add any further base pairs without breaking some of the pairs which are already present.

A structure consisting of the set of stems (A, B, ..., K) is a local minimum if
(i) there is no stem L not a member of the set, which is compatible with all members of the set, and,
(ii) none of the members of the set can grow into a longer stem without becoming incompatible with another member of the set.

The first condition is straightforward. If there were another stem which could be added without disrupting the original set of stems, then the original set cannot be a local minimum of free energy. The second condition requires more explanation. There may be, for example, a 3 base pair stem which can "grow" into a 4 base pair stem by addition of a further complementary pair on the end. These two stems would be defined as incompatible, since a structure may contain either one or the other, but not both. For every structure containing the 3 pair stem there will usually be a structure containing the 4 pair stem instead and having a lower free energy. The first structure is therefore not a free energy minimum. However, it may be that whilst the 3 pair stem was compatible with the other stems in the structure, the 4 pair stem would not be. In this case the structure with the 3 pair stem would be a local free energy minimum.
It is clear that there will always be a certain number of structures close to the ground state in which some of the stems of the ground state have become partially "unzipped" at the ends. These states cannot be considered as true alternative secondary structures to the ground state. On the other hand the local minima defined above represent real alternative structures because they contain base pairs which are not present in the ground state. Thus if we want to know how many alternative structures are in close proximity to the ground state, it is interesting to look at the distribution of local minima (Fig. 3c).

The effect observed in figure 3b is enhanced: whilst the total number of local minima is larger for the tRNA sequences than the random sequences, the number of local minima close to the ground state is smaller (see also Tab. I).
This implies that the ground state of the tRNA sequences is more stable than the ground state of a typical random sequence, both relative to the unfolded state, and relative to alternative competing structures.

In order to see if this trend was found in other types of RNA molecules we looked at the following sequences: 5S ribosomal RNA (Rogers et al. [12]), plant viral RNA (Ahlquist et al. [13]) and fragments from the Tetrahymena intervening sequence (Cech et al. [14], Williams and Tinoco [6]). In each case the real sequences were compared to random sequences of the same length and the same base composition. The base composition is different for the different classes of molecule, but none of them contains any of the non-bonding bases present in tRNA.

In most of the cases clear differences between random and real sequences were observed with the same features apparent as for tRNA. The distribution of local minima for E. coli 5S ribosomal RNA is shown in figure 3d as an example.
We have not analysed sufficient of these longer sequences to have reliable statistics, and therefore in the rest of the paper only tRNA will be considered.

4. Thermodynamic behaviour.

Having calculated the density of states it is possible to obtain any thermodynamic quantities required. Firstly the partition function Z is

$$Z = \sum_{\alpha} e^{-G(\alpha)/kT} \qquad (1)$$

where G(α) is the free energy of structure α. The sum is to be taken over all states, not just the local minima states. One quantity of interest is the probability W0 that the molecule is in its ground state. If G0 is the ground state free energy, then the weight of the ground state is

$$W_0 = \frac{1}{Z}\, e^{-G_0/kT} \qquad (2)$$

Figure 4a shows the sample average value ⟨W0⟩ as a function of temperature for tRNA and random sequences.

The G(α) are free energies containing both entropic and enthalpic parts. Each state α is not a true microstate, but may be thought of as a sum over all microstates with a given set of bonds. The G(α) are thus functions of temperature. We have used the experimental values measured at temperatures close to 300 K. In figure 4 we have assumed these values to be fixed independent of temperature. The temperature scale in figure 4 is artificial and determines the relative weight given to the ground state and its competitors.
As T → 0 only the ground state is selected and as T → ∞ all structures are present with equal probability. Only the temperature T = 300 K, at which point kT ≈ 0.6 kcal/mole, corresponds to a situation occurring naturally. A real molecule at T ≫ 300 K will of course unfold completely since the unfolded state (with no bonds) has the largest entropy. This is not equivalent to the high temperature limit in figure 4. To plot quantities as a function of "real" temperature would require experimental data at many different temperatures.
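The ground state weight defined by equations (1) and (2) can be evaluated directly from a list of free energies. The sketch below is illustrative: the function name and example energies are assumptions, and subtracting G0 before exponentiating is a numerical convenience that leaves W0 unchanged. Which states appear in the list (for example whether the unfolded state at zero free energy is included) is left to the caller.

```python
import math


def ground_state_weight(energies, kT=0.6):
    """W0 = exp(-G0/kT) / Z, computed with energies measured from the ground state.

    energies: free energies G(alpha) in kcal/mole; kT in the same units.
    """
    g0 = min(energies)
    z_shifted = sum(math.exp(-(g - g0) / kT) for g in energies)
    return 1.0 / z_shifted


# Hypothetical three-state example (kcal/mole).
example_energies = [-30.0, -29.4, 0.0]
w0_at_300K = ground_state_weight(example_energies, kT=0.6)
```

The two limits quoted in the text follow directly: for very small kT the weight tends to 1, and for very large kT it tends to one over the number of states.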
We see in figure 4a that ⟨W0⟩ is significantly higher for tRNA than for random sequences, and is close to 1 for tRNA at 300 K, indicating that a real molecule will almost always be in its ground state. In fact ⟨W0⟩ = 0.83 for tRNA and 0.52 for random sequences at 300 K.

[Figure 4: panels a) and b); horizontal axes: kT (logarithmic scale, 0.1 to 100); curves labelled tRNA and random.]

Fig. 4. a) Average weight W0 of the ground state as a function of temperature for tRNA, random, and mirror image sequences (N = 76 in each case). The dotted line at kT = 0.6 corresponds to 300 K. See text for meaning of temperature scale.
The thermal energy kT = 0.6 is rather small compared with the bond free energies (in the range 1.2-4.8 for Watson-Crick pairs, and 0.3 for GU) hence excitations from the ground state are rather costly, and W0 is fairly large even for the random sequences.

A special class of sequences which will have a particularly stable ground state are mirror image sequences capable of folding into a single hairpin loop. Mirror image sequences were generated by choosing the first half of the molecule to be a random sequence, and setting the second half to be the exact complementary sequence to the first half. As shown in figure 4a, ⟨W0⟩ is much larger for mirror image sequences than for random sequences. At 300 K ⟨W0⟩ = 0.97 for mirror images. Real tRNA has a behaviour intermediate between the two extremes. (Note that the mirror image sequences in figure 4 contained no non-bonding bases, whilst the random sequences contained the same fraction of non-bonding bases as tRNA, to allow proper comparison.)
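The two kinds of artificial sequence can be generated as in the sketch below. Several choices are assumptions made for illustration: the symbol "N" for a non-bonding base, the uniform choice of A, C, G and U in the first half of a mirror image sequence (the text does not state the composition used), and the reversed order of the second half, which is what lets base i pair with base N - 1 - i in a single hairpin. The composition weights are the percentages quoted in section 3.

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}
BASES = "ACGUN"                          # "N" stands for a non-bonding base (assumed symbol)
WEIGHTS = [20.2, 27.0, 28.4, 20.0, 4.4]  # percentage composition quoted in section 3


def random_sequence(length, rng):
    """Random chain with the same average base composition as the tRNA sample."""
    return "".join(rng.choices(BASES, weights=WEIGHTS, k=length))


def mirror_image_sequence(length, rng):
    """Random first half followed by its reverse complement (no non-bonding bases)."""
    half = length // 2
    first = [rng.choice("ACGU") for _ in range(half)]
    second = [COMPLEMENT[b] for b in reversed(first)]
    return "".join(first + second)


rng = random.Random(0)  # fixed seed so the example is reproducible
rand_seq = random_sequence(76, rng)
mirror_seq = mirror_image_sequence(76, rng)
```

Passing the generator explicitly keeps the sampling reproducible, which is convenient when the same random sample has to be reused for several measurements.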
It is also interesting to measure the typical difference of the configuration from the ground state configuration at finite temperature. The configuration α is defined by the bond variables b^α(i) in the following way. If bases i and j are paired then set b^α(i) = j and b^α(j) = i. If i is unpaired then set b^α(i) = 0. We will define the distance D^{αβ} between configurations α and β as simply the number of bases i for which b^α(i) ≠ b^β(i). D^{αβ} is a generalization of the Hamming distance often used for sequence comparison [4]. In figure 4b we show the average distance D from the ground state as a function of kT.

$$D = \frac{1}{Z} \sum_{\alpha} D^{\alpha 0}\, e^{-G(\alpha)/kT} \qquad (3)$$

where D^{α0} is the distance of configuration α from the ground state. D is thus sensitive to alternative structures differing widely from the ground state.

As expected we find that D is significantly larger for random sequences than tRNA (Fig. 4b). At 300 K D ≈ 1.7 for tRNA and D ≈ 8.2 for random sequences. The figure shows D/N with N = 76 in each case. A D of around 2 indicates simple excitations of the ground state such as unzipping one base pair from the end of a stem. When D ≈ 8 significant changes in the secondary structure are present: loss of a 3 base-pair stem would give D = 6 for example.
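The bond variables, the distance between two configurations and the thermal average of equation (3) can be sketched as follows. The representation of a configuration as a list of (i, j) pairs with bases numbered from 1, the function names and the example structures are assumptions made for illustration.

```python
import math


def bond_vector(pairs, n):
    """b(i) = partner of base i, or 0 if unpaired; bases are numbered 1..n."""
    b = [0] * (n + 1)  # index 0 is unused padding
    for i, j in pairs:
        b[i] = j
        b[j] = i
    return b[1:]


def distance(pairs_a, pairs_b, n):
    """Number of bases whose bond variable differs between two configurations."""
    ba, bb = bond_vector(pairs_a, n), bond_vector(pairs_b, n)
    return sum(1 for x, y in zip(ba, bb) if x != y)


def mean_distance(structures, n, kT=0.6):
    """Boltzmann-weighted average distance from the ground state, as in eq. (3).

    structures: list of (pairs, free_energy) tuples for one sequence.
    """
    ground_pairs, g0 = min(structures, key=lambda s: s[1])
    weights = [math.exp(-(g - g0) / kT) for _, g in structures]
    dists = [distance(pairs, ground_pairs, n) for pairs, _ in structures]
    return sum(w * d for w, d in zip(weights, dists)) / sum(weights)


# Hypothetical 3 pair stem on a chain of 12 bases.
ground = [(1, 10), (2, 9), (3, 8)]
unzipped = [(2, 9), (3, 8)]  # one pair lost from the end of the stem
```

Unzipping one pair changes the bond variable of two bases, and removing the whole 3 pair stem changes six, in line with the values of D quoted above.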
The behaviour of mirror image sequences is also shown in figure 4b. A rather abrupt change in D is visible in this case as the temperature is increased. At 300 K D ≈ 0.067, indicating almost no excitation from the ground state. This is simply because there are so few accessible excited states for mirror image molecules.

In the thermodynamic limit (N → ∞) mirror image sequences behave very differently from random sequences. This can be seen by comparing sequences of three different lengths (30, 50 and 76) in figure 5. For random sequences W0 decreases as N increases over the whole of the temperature range. This is because as N increases the number of competing structures close to the ground state will also increase, and the weight of the ground state will decrease at all non-zero temperatures.
On the other hand, for mirror image sequences the curves for W0 superpose at low temperatures, and decrease with N at high temperatures. This indicates the presence of a phase transition in the limit N → ∞. From figure 5b the transition temperature is approximately kT_c ≈ 10-12. Below T_c the ground state has a finite weight, independent of N, whilst above T_c, W0 is a function of N and decreases to zero as N → ∞. W0 is finite at T < T_c since the number of states at accessible energy levels does not increase with N.
The corresponding behaviour for D/N is also shown in figure 5. For random chains we expect D ~ N for all temperatures, and so for large N the curves of D/N should superpose. The fact that the three curves in figure 5a do not superpose is presumably due to finite size effects in these rather short chains. For mirror image sequences at T < T_c, D is a function of temperature only and not of N, thus D/N → 0 as N → ∞. At T > T_c we expect D ~ N for large N. Finite size effects are again rather large in figure 5b. The crossing of the three curves for D/N is an indication of the phase transition for large N. The transition is first order, i.e. quantities such as W0, D and the energy of the system will change discontinuously at T_c in the limit N → ∞.
[Figure 5: panels for random and mirror image sequences; horizontal axes: kT (logarithmic scale, 0.1 to 100); curves for N = 30, 50 and 76.]

Fig. 5. Dependence of W0 and D on N for random and mirror image sequences. Figures show W0 (decreasing curves) and D/N (increasing curves) as functions of temperature for chains of length N = 30, 50 and 76. For the mirror image molecules there is a low temperature phase for which W0 is finite even as N → ∞, whereas for random sequences there is no phase transition and W0 decreases with N at all temperatures.
As stated above, the temperature scale used here is artificial because we have treated the free energies of the states as simple energies which do not change with temperature. The thermodynamic behaviour would be the same if the temperature dependence of the states were treated properly. For the mirror image molecules there would be a phase transition at the temperature where the ground state free energy is equal to the free energy for the sum of the other states. For a typical random sequence there would be no such transition, and thermodynamic quantities would change smoothly.
Thus real tRNA molecules have a ground state which is considerably more stable than a typical random sequence, but less stable than the extreme case of the mirror image. The mirror image molecules are a simple example of a system with a low temperature phase which is ground state dominated. Other examples are discussed in section 6. The low temperature phase may be termed frozen, since the configurational entropy is not extensive at T < T_c. A typical random sequence has an extensive entropy at all temperatures.

5. Effect of small changes in the sequence.

In order to investigate the role of the modified bases in tRNA structure, we calculated the ground state for the same set of tRNA sequences, but the modified bases which had previously been treated as non-bonding were treated as the equivalent unmodified base. In most cases this did not affect the prediction of the ground state, however in two cases where the clover-leaf was predicted successfully before, an alternative lower free energy structure was found when the modified bases were replaced by standard bases.

We see in table I that replacement of non-bonding bases by standard bases leads to only a very small decrease in the mean ground state energy. However there is a much greater change of number of structures. There are now almost twice as many structures and close competitor structures as before. Thus it would appear that the non-bonding bases may play an important role in eliminating alternative structures to the clover-leaf. Ninio [3] has also looked at the effects of non-bonding bases and finds that the predictability of the clover-leaf is significantly reduced if the non-bonding bases are treated as bonding.
One example where a modified base was found to be important was in the tRNA from T. utilis shown in figure 1 (sequence 092 from the catalogue [8]). When the base m(G at position 26 was treated as non-bonding the clover-leaf structure was predicted as ground state (Fig. 1a), however when it was treated as a standard base G, the alternative structure figure 1b was found. Changing the base permits an alternative lower free energy structure to form.

In general the ground state configuration is extremely sensitive to small changes in the sequence. If in the same tRNA molecule the base G at position 69 is replaced by an A, then the resulting sequence has the ground state shown in figure 1c. Changing base 69 disrupts the acceptor stem, but this leads to a change in the configuration of a large fraction of the molecule, not just the acceptor stem itself. Only the TΨ loop is conserved in all three examples.
We have carried out a systematic study of the effect of mutations in the sequence on the resulting secondary structure. Firstly, we looked at 1 point mutations. For each of the 32 tRNA sequences in the sample, mutated sequences were generated which differed by one base from the real sequence. The mutated base was chosen to be either A, C, G, U or non-bonding with the same probabilities as given in section 3 (but was forced to be different from the original base). One mutated sequence was formed for each base on the chain, i.e. a total of approximately 76 x 32 sequences were analyzed. Each mutated sequence was classed according to whether its ground state had increased, decreased, or remained the same relative to the original sequence. The probabilities p_inc, p_dec, p_same of these three possibilities are given in figure 6a. Two and three point mutations were also analyzed. For each tRNA sequence 76 mutant sequences were generated differing at 2 (or 3) randomly chosen points.
°' tRNA b)
random
8° 8°
30 random
% % w
60 60
)
I
I
Psame P;n~/P £
20 p 20 ~~ tRNA
dec
~
0
3 2 3 0 12 3
N° mutations N° mutations N° mutations
Fig. 6.
a)
Comparison of tRNA with mutant sequences. pjnc > pd~~, indicating that mutant sequences have less stable ground state structures than tRNA.b)
Mutations made to random sequences leave the chains statistically equivalent, therefore pjnc" 1Jdec for random sequences,
c)
The number ofclose competitor structures ,vitbin 5
kcal/mole
of the grouitd state increases as mutations are made totRNA sequences. The original tRNA sequences are shown at 0 mutations. The dotted line indicates
the value for raitdom sequences.
In each case p_inc is