RNA secondary structure: a comparison of real and random sequences


HAL Id: jpa-00246711

https://hal.archives-ouvertes.fr/jpa-00246711

Submitted on 1 Jan 1993




Paul Higgs

To cite this version:

Paul Higgs. RNA secondary structure: a comparison of real and random sequences. Journal de Physique I, EDP Sciences, 1993, 3 (1), pp.43-59. ⟨10.1051/jp1:1993116⟩. ⟨jpa-00246711⟩


Classification Physics Abstracts: 87.15, 36.20


Paul G. Higgs (*)

Service de Physique Théorique (**) de Saclay, F-91191 Gif-sur-Yvette Cedex, France

(Received ?3 July 1992, accepted in final form 23 September 1992)

Abstract. A sample of transfer RNA molecules is compared to a sample of random sequences having the same length and the same percentage composition of the different bases. For each sequence all possible secondary structures are constructed and a distribution of free energies for the states is obtained. It is found that the ground state free energies of tRNA molecules are significantly lower than those of random sequences, and that tRNA molecules have significantly fewer alternative secondary structures at energies close to the ground state than do random sequences. A distance D is defined which measures the average difference between molecular configurations and the ground state configuration. At realistic temperatures of order 300 K this distance is much larger for random sequences than for tRNA sequences. Thus the secondary structure of tRNA molecules at finite temperature is more stable than that of random sequences. Sequences are considered which differ by a small number of mutations from real tRNA sequences. On average, mutations destabilize the secondary structure. This suggests that a stable secondary structure is one of the factors selected for by natural selection. The thermodynamic behaviour of RNA sequences is compared to models for random heteropolymers which have a low temperature frozen phase.

1. Introduction.

The secondary structure of ribonucleic acid molecules is known to be a complex set of "stems" and "loops" formed by base pairing between complementary regions of the chain (Saenger [1]; see examples in Fig. 1). It is now commonplace to use computational methods to predict the secondary structure of particular RNA sequences [2-7] or to confirm experimental evidence for the structure.

(*) Address from Oct. 92: Dept. of Physics, University of Sheffield, Hounsfield Road, Sheffield S3 7RH, G.B.

(**) Laboratoire de la Direction des Sciences de la Matière du Commissariat à l'Energie Atomique.


Typically a program will consider the many possible ways of folding a chain into a secondary structure and select the structure which minimizes the free energy, and possibly a few alternative structures of slightly higher free energy. The lowest free energy state is generally assumed to be "stable", and should correspond to the biological structure if the parameters used to calculate the energy are known sufficiently accurately.

We will call this most favourable state the ground state. From a thermodynamic point of view it is by no means sufficient to know what the ground state of a physical system is in order to know whether this state is stable. There will be an exponentially large number of other configurations of the system, and even though the ground state is more favourable than any one of these other states, the probability of finding the system in its ground state configuration may be negligible.
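To make the last point concrete, here is a small worked example with illustrative numbers of our own choosing (not data from this paper), using the Boltzmann weight that is introduced formally in Sect. 4. Suppose the ground state has free energy G_0 and there are M other structures, each lying δ above it. The probability of finding the ground state is then

```latex
W_0 = \frac{e^{-G_0/kT}}{e^{-G_0/kT} + M\,e^{-(G_0+\delta)/kT}}
    = \frac{1}{1 + M\,e^{-\delta/kT}}
```

With M = 1000, δ = 3 kcal/mole and kT ≈ 0.6 kcal/mole (about 300 K), δ/kT = 5 and W_0 = 1/(1 + 1000 e^{-5}) ≈ 1/7.74 ≈ 0.13. Each competitor is individually about 150 times less likely than the ground state, yet together they hold most of the probability.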

In this article we look at the thermodynamics of RNA folding. We will use a method which can calculate the distribution of energies of all possible configurations, and hence the probability of finding the molecule in its ground state. We will focus on transfer RNAs, since these are among the shortest RNA sequences (approx. 76 bases) and since their secondary structure is well known. Each species has a series of tRNA molecules, each of which is able to bind to one of the 20 amino acids. The tRNA sequences for the different amino acids are known for many species, and have been catalogued (Sprinzl et al. [8]). Although the sequences differ from one another, they can all be arranged into the clover-leaf secondary structure (Fig. 1a), and the evidence suggests that this is the naturally occurring structure [1].

Base-paired regions of the chain are called stems. In tRNA the acceptor stem typically contains 6 or 7 pairs, whilst the other three stems contain between 3 and 5 pairs, depending on the sequence.

[Figure 1: panels a), b) and c), secondary structure diagrams; the D loop, TΨ loop and anticodon loop are labelled.]

Fig. 1. a) Example of clover-leaf structure: the ground state of tRNA[…] from T. utilis. b) and c) The ground state structure for two sequences each differing from tRNA[…] by only one base, at positions 26 and 69 respectively. Small changes in the sequence can lead to large-scale reorganization of the structure.

If a biological molecule is to play its proper role in a living cell, it needs to be able to recognize and interact with other biological molecules. Such interactions will usually only be possible if the molecule is in a particular configuration. An important property of a useful bio-molecule is to possess a stable structure (or possibly a small number of alternative stable structures) in which the molecule is almost always to be found.

Physicists have been led to draw a parallel between the folding of biological molecules (particularly proteins) and models for strongly disordered systems such as spin glasses (Bryngelson and Wolynes [9], Garel and Orland [10], Shakhnovich and Gutin [11]). In these articles, molecules are treated as random monomer sequences. In some cases the random disorder is shown to lead to the freezing of the molecule into a small number of low energy configurations.

In the present paper we will compare a sample of real tRNA sequences with a sample of randomly generated base sequences. It is clear that real sequences are in no way random. Natural selection has been acting on biological molecules, tuning them to have certain required properties. A stable secondary structure is likely to be one of these properties, and so we should not be surprised to find significant differences between the structures of real and random sequences. On the other hand, a glance at the tRNA catalogue (Sprinzl et al. [8]) shows that the primary sequences of the different molecules differ from one another a great deal, with no apparent pattern visible. Although there is a small number of conserved bases in the primary sequence [1, 8], the similarity of the molecules is much more apparent when comparing the secondary structures rather than the primary base sequences. There are many other classes of molecule for which the secondary structure is also well preserved despite changes in the sequence, e.g. 5S ribosomal RNA (Rogers et al. [12]) and viral RNA (Ahlquist et al. [13]). In the case of tRNA, since there are a large number of known sequences, we are led to look at their properties in a statistical manner.

2. Description of the program.

There are now many computational algorithms which attempt to predict RNA structure by searching for the lowest free energy configuration [2-7]. Thermodynamic parameters measured in experiment are incorporated into the programs. Contributions to the free energy are basically of two kinds: a free energy gain for each correctly matching base pair added to a stem ("stacking"), and a free energy penalty for every loop closed. Favourable secondary structures will therefore have a relatively small number of stems which are as long as possible, rather than a large number of very short stems.

The stacking free energies depend on the base pair added and on the previous pair in the stem. Allowed base pairs are A-U, C-G and the non-standard pair G-U. The values used in our program are those conveniently tabulated by Jacobson et al. [5]. They vary between -0.3 kcal/mole and -4.8 kcal/mole depending on the stacked pair. These values are more or less standard in the literature.

Much less standard is the treatment of loops. This is partly because experimental values for loop parameters are not always available. Thus even the more recent works involve a considerable amount of approximation in the values assigned to loops (Jaeger et al. [7]). We have decided to treat loops in a very simplified way. Our object here is not to provide the most accurate prediction of the secondary structure for one particular sequence, but to look at general features of the folding behaviour in a statistical manner. We therefore note that every time a new stem is added to the structure a new loop is also formed (possibly a hairpin, an interior loop, a bulge loop, etc.). Hence the total number of loops is equal to the total number of stems. We will make the simplification of assigning a penalty of +4.5 kcal/mole to all loops, irrespective of their type and length. This value appears to be typical of the values given by Jacobson et al. [5], particularly for the hairpin loops of 4 to 8 bases which are present in the tRNA clover-leaf structure.


With this simplification we can assign to each stem a net free energy equal to its stacking energy plus 4.5 kcal/mole. A stem is stable relative to the coil state if its net free energy (including the penalty for the loop which it closes) is negative. In this model a stem always contributes the same amount to the free energy of a structure, regardless of the topology of the loops.
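The additive energy model just described can be sketched in a few lines of Python. This is a minimal illustration, not the author's program: the function names are hypothetical, and a stem is represented simply by the list of its nearest-neighbour stacking free energies (kcal/mole), which in the paper would come from the table of Jacobson et al. [5]. The stacking values in the example are made up for illustration.

```python
LOOP_PENALTY = 4.5  # kcal/mole, the same for every loop in this model


def stem_net_free_energy(stacking_energies, loop_penalty=LOOP_PENALTY):
    """Net free energy of a stem: its stacking free energies plus one loop penalty."""
    return sum(stacking_energies) + loop_penalty


def is_stable_stem(stacking_energies, loop_penalty=LOOP_PENALTY):
    """A stem is stable relative to the coil state if its net free energy is negative."""
    return stem_net_free_energy(stacking_energies, loop_penalty) < 0


def structure_free_energy(stems):
    """Free energy of a structure = sum of the net free energies of its stems."""
    return sum(stem_net_free_energy(s) for s in stems)


if __name__ == "__main__":
    # Illustrative stacking values only (kcal/mole), not taken from [5].
    strong = [-3.0, -2.0]  # net -0.5: stable
    weak = [-1.0, -1.5]    # net +2.0: unstable, so it would not be listed
    print(is_stable_stem(strong), is_stable_stem(weak))
```

Because every stem carries exactly one loop penalty, the free energy of a structure is just the sum of its stems' net free energies, with no dependence on loop type or size.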

We will be interested not just in the lowest free energy structure, but in the distribution of energies of all these structures (density of states). For this reason our program is rather similar to that of Pipas and McMahon [2]. It therefore requires a large amount of storage, and is less suitable for long sequences than alternative methods [4-7], owing to the exponential number of possible structures. The program proceeds as follows.

SEARCH FOR POSSIBLE STEMS. All points in the sequence are checked for complementary pairs, and a list of possible stems is made. A stem is added to the list if it contains at least 3 base pairs, and if its net free energy, including the penalty for loop closure, is negative. There must be a minimum of 3 unpaired bases in a hairpin loop; hence if bases i and j are paired within a stem, then j ≥ i + 4. We have followed Pipas and McMahon [2] in the treatment of the GU pair, i.e. GU pairs are allowed within a stem, but not as the terminal pair of a stem.
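The stem search can be sketched as follows. This is an illustrative Python version under our own assumptions, not the original program: bases are numbered from 0, a stem is stored as a hypothetical tuple (i, j, n) meaning the n pairs (i, j), (i+1, j-1), ..., and any character other than A, C, G, U (for example a non-bonding base) simply never pairs. The negative-net-free-energy filter is left out, since it needs the stacking table of [5]; it could be applied to the returned list using the energy model described earlier in this section.

```python
# Allowed pairs: Watson-Crick A-U, C-G and the wobble pair G-U (both orientations).
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def find_stems(seq, min_pairs=3, min_loop=3):
    """List every helix (i, j, n): pairs (i+k, j-k) for k = 0..n-1, with n >= min_pairs.

    GU pairs may occur inside a stem but not at either end, and every pair must
    enclose at least min_loop unpaired bases (so j >= i + min_loop + 1).
    """
    stems = []
    length = len(seq)
    for i in range(length):
        for j in range(i + min_loop + 1, length):
            n = 0
            # Extend the helix inwards from the outer pair (i, j).
            while True:
                a, b = i + n, j - n
                if b - a - 1 < min_loop or (seq[a], seq[b]) not in PAIRS:
                    break
                n += 1
                outer_ok = (seq[i], seq[j]) not in WOBBLE
                inner_ok = (seq[a], seq[b]) not in WOBBLE
                if n >= min_pairs and outer_ok and inner_ok:
                    stems.append((i, j, n))
    return stems


if __name__ == "__main__":
    print(find_stems("GGGGAAAACCCC"))
```

Every contiguous sub-helix of 3 or more pairs is listed separately, so partially unzipped versions of a long stem appear as distinct stems, which is what the counting of structures in Sect. 3 relies on.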

CREATION OF A COMPATIBILITY MATRIX. A matrix C is created such that the elements $C_{AB} = 1$ if stems A and B are compatible, and $C_{AB} = 0$ otherwise. Two stems are compatible if they do not overlap (i.e. a base cannot be bonded in more than one stem at once), and if they satisfy the "no knots" rule. This requires that if bases i and j are paired in one stem, and k and l are paired in another stem (with i < k), then either i < j < k < l, or i < k < l < j. The other possibility, i < k < j < l, is forbidden (see Sankoff et al. [4]). A consequence of the no knots rule is that all allowed secondary structures can be drawn in 2d without the line of the chain crossing over itself.
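A possible Python rendering of this test follows. It is a sketch under our own conventions (stems as hypothetical 0-based (i, j, n) tuples, function names invented for illustration), not the paper's code.

```python
def stem_pairs(stem):
    """Base pairs (i+k, j-k), k = 0..n-1, of a stem stored as (i, j, n)."""
    i, j, n = stem
    return [(i + k, j - k) for k in range(n)]


def compatible(s1, s2):
    """True if two stems share no base and form no knot."""
    p1, p2 = stem_pairs(s1), stem_pairs(s2)
    bases1 = {x for pair in p1 for x in pair}
    bases2 = {x for pair in p2 for x in pair}
    if bases1 & bases2:
        return False  # a base would be bonded in two stems at once
    for i, j in p1:
        for k, l in p2:
            # Forbidden interleaving i < k < j < l, or the same with roles swapped.
            if i < k < j < l or k < i < l < j:
                return False
    return True


def compatibility_matrix(stems):
    """C[A][B] = 1 if stems A and B are compatible, else 0 (diagonal set to 0)."""
    m = len(stems)
    return [[1 if a != b and compatible(stems[a], stems[b]) else 0
             for b in range(m)] for a in range(m)]


if __name__ == "__main__":
    print(compatibility_matrix([(0, 9, 3), (10, 19, 3), (5, 14, 3)]))
```

Checking every pair of base pairs is more work than strictly necessary, since two non-overlapping helices either nest, lie side by side or cross as a whole, but it keeps the code a direct transcription of the rule.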

COMPILATION OF ALL POSSIBLE STRUCTURES. A structure is a set of compatible stems taken from the list. Each stem A represents a possible structure in itself. The program then creates a list of all structures containing a pair of compatible stems A and B. To prevent double counting we require B > A. For each pair A and B the program then searches for all structures containing three stems ABC, all of which are compatible with each other, requiring C > B. Structures containing 4, 5, 6, ... stems can be built up in this way. For the moderate length chains considered here there were a very large number of structures containing 3 or 4 stems, but structures with more than 6 stems were almost impossible due to the restrictions of compatibility.
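The construction amounts to listing every set of mutually compatible stems exactly once, by only ever appending a stem whose index is larger than the last one. The Python sketch below does this depth first rather than level by level (all pairs, then all triples, ...), which is our own choice; it produces the same collection of structures. The input is a 0/1 compatibility matrix such as $C_{AB}$ above, and the unfolded chain (no stems) is not included in the output.

```python
def enumerate_structures(compat):
    """All sets of mutually compatible stems, each as an increasing tuple of indices."""
    m = len(compat)
    structures = []

    def extend(current, last):
        # Only try stems with a larger index than the last one (C > B > A),
        # so every set is generated exactly once.
        for c in range(last + 1, m):
            if all(compat[c][x] for x in current):
                new = current + (c,)
                structures.append(new)
                extend(new, c)

    extend((), -1)
    return structures


if __name__ == "__main__":
    # Hypothetical stems: 0 is compatible with 1 and 2, but 1 and 2 exclude each other.
    C = [[0, 1, 1],
         [1, 0, 0],
         [1, 0, 0]]
    print(enumerate_structures(C))
```

For the example matrix the five structures are (0,), (0, 1), (0, 2), (1,) and (2,); the pair (1, 2) is never formed because those two stems are incompatible.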

Once the set of stems contained in a structure is known, the free energy of the structure is simply the sum of the free energies of the stems. We note that in a more realistic treatment of the loops this would not be true, and it would be necessary to test the topology of the loops in a given structure to calculate its free energy. The program would then be much longer.

3. Comparison of real and random sequences.

A sample of tRNA sequences was taken from the compilation of Sprinzl et al. [8]. This is the same source as used by Ninio [3] in a previous investigation of tRNA structure. Two sequences were taken from the list for each amino acid (where more than one is given). The tRNAs for Leucine, Serine and Tyrosine were excluded from the sample since they contain an extra arm and are significantly longer than the rest. The result was a sample of 32 tRNAs with lengths in the range 74-77 bases and mean length close to 76. All of these can be arranged in the clover-leaf pattern shown in figure 1a.


One characteristic distinguishing tRNA from most other RNA is the presence of modified bases in addition to the four standard bases A, C, G and U (Saenger [1]). Some of these are modified in such a way as to prevent base pairing, so it is necessary to introduce a class of non-bonding bases into the program. Following Ninio [3] we have treated the following bases as non-bonding: D, m…G, m…G, m…G, Q, Y and m…C. All other modified bases were treated as the standard base which they most resemble. In particular T and Ψ were treated as U, and I was treated as G. The proportions of the five different types of base in the sample of tRNAs studied were: 20.2% A, 27.0% C, 28.4% G, 20.0% U and 4.4% non-bonding.

Properties of tRNA sequences were compared with a sample of random sequences of length 76 bases. Each base in the random sequences was chosen to be A, C, G, U or non-bonding with a probability equal to its frequency of occurrence in the real sequences. It is known that free energies depend on the C + G content of the chains, hence we wished to be sure that the random sequences had the same composition as the tRNA sequences.
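A minimal way to generate such composition-matched random sequences in Python, using the base proportions quoted above. The letter N standing for a non-bonding base and the function name are our own hypothetical choices.

```python
import random

# Base proportions in the tRNA sample (Sect. 3); "N" marks a non-bonding base.
TRNA_COMPOSITION = {"A": 0.202, "C": 0.270, "G": 0.284, "U": 0.200, "N": 0.044}


def random_sequence(length, rng, composition=TRNA_COMPOSITION):
    """Draw each base independently with the given probabilities."""
    bases = list(composition)
    weights = [composition[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [random_sequence(76, rng) for _ in range(1000)]
    print(sample[0])
```

Each position is drawn independently, so an individual sequence matches the target composition only on average; fixing the exact base counts in every sequence would be a different null model.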

As a test of the model we calculated the ground state structure for the 32 tRNA sequences. Of these, 25 were found to have the clover-leaf structure as ground state, a further 5 were found to have the clover-leaf structure except for a missing D stem, and the remaining 2 had ground states other than the clover-leaf. These results are typical of those obtained by Pipas and McMahon [2] and Ninio [3]. It would therefore seem that the simplified treatment of the loop energies does not seriously affect the results. In fact Ninio has considered a large number of slight variations in the stacking energies and loop energies. The degree of "successful" prediction of the clover-leaf structure varies slightly with the parameters used, but is always fairly high.

We state again that the object of this paper is to compare real tRNA with random sequences on a statistical basis, and not to look at the precise details of the secondary structure of any one sequence. The model defined above appears perfectly adequate for this purpose, without introducing any further complications and special cases.

In figure 2 we show the histogram of ground state free energies for the tRNA sample compared to that for a sample of 1000 random sequences of length 76. There is clearly a large difference between the two, with the real sequences having much lower ground state free energies than the random sequences. The ground states for the tRNA sample are in the range -45 to -15 kcal/mole, in agreement with the results of Pipas and McMahon [2]. If we take a typical tRNA sequence of free energy -30 kcal/mole, we may calculate from the histogram that the probability of finding a random sequence with a ground state less than or equal to -30 kcal/mole is only about 2%.

Since the program calculates all possible secondary structures we can calculate the density of states, i.e. the distribution of free energies for the different structures. For each individual sequence the distribution is rather irregular, and there are large fluctuations from sequence to sequence (see Pipas and McMahon [2]). In figure 3a we show the average distribution for the tRNA sample and for the random sample. These are fairly smooth curves. The column heights in the figure represent the average number of structures per sequence in each 1 kcal/mole interval.
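The averaging described here, and the shifted version used for figure 3b (where each sequence's ground state is moved to zero before averaging), can be sketched in Python as follows. The input is assumed to be, for each sequence, the list of free energies of all its folded structures; the 1 kcal/mole bin width follows the text, while the function names and the convention that bin b covers [b, b + 1) kcal/mole are our own.

```python
import math
from collections import Counter


def density_of_states(energies, bin_width=1.0, relative_to_ground=False):
    """Histogram of structure free energies for one sequence: bin index -> count."""
    g0 = min(energies) if relative_to_ground else 0.0
    return Counter(math.floor((g - g0) / bin_width) for g in energies)


def average_density(per_sequence_energies, **kwargs):
    """Average number of structures per sequence in each bin."""
    total = Counter()
    for energies in per_sequence_energies:
        total.update(density_of_states(energies, **kwargs))
    n = len(per_sequence_energies)
    return {b: c / n for b, c in sorted(total.items())}


if __name__ == "__main__":
    toy = [[-3.2, -1.5, -1.1], [-2.0]]  # made-up structure energies for two sequences
    print(average_density(toy))
    print(average_density(toy, relative_to_ground=True))
```

Summing the averaged column heights gives the mean number of structures per sequence, i.e. the area under the curve discussed next.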

It will be seen that the total number of structures (area under the curve) is much larger for the tRNA sequences than for the random sequences, and that the tail of the distribution representing the most favourable structures extends to much lower free energies. The reason that the real samples have a larger number of structures is that they have stems with relatively long complementary sequences. For every stem of 5 base pairs, for example, there are shorter stems with lengths 3 or 4 formed by partially unzipping the 5-pair stem. Thus sequences with the possibility of forming relatively long stems automatically have a larger number of stems in total, and hence a larger number of structures.
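The size of this effect is easy to count. If every contiguous piece of an n-pair stem that is at least 3 pairs long is itself an allowed stem (an idealization on our part: in the program a piece is dropped if it ends on a GU pair or has non-negative net free energy), the number of stems generated by one n-pair helix is

```latex
\sum_{L=3}^{n} (n - L + 1) = \frac{(n-1)(n-2)}{2}
```

so a 5-pair stem yields 6 stems (itself, two of length 4 and three of length 3), and a 7-pair stem yields 15.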

(Only structures with correctly matched base pairs have been considered here. It would also have been possible to permit pairing between any regions of the chain and to assign large unfavourable free energies to incorrectly matched pairs. In this case there would be the same number of structures for all chains of the same length. The density of states would then extend to positive free energies with respect to the unfolded state, but would differ very little at energies close to the ground state. We have not done this since it would enormously increase the total number of states.)

[Figure 2: histograms for tRNA and random sequences; horizontal axis: free energy (kcal/mole), from -40 to 0; vertical axis from 0 to 0.3.]

Fig. 2. Histogram of ground state free energies for tRNA (32 sequences) compared to random sequences (1000 sequences of length 76). The range has been divided into boxes of width 3 kcal/mole.

Table I. Average values of some parameters compared for tRNA and random sequences, and for tRNA where the non-bonding bases were replaced by standard bases. Close competitors are structures (or local minima) within 5 kcal/mole of the ground state.

                                            tRNA     random       tRNA (no non-bonding bases)
Ground state energy (kcal/mole)             -29.7    -16.5        -30.1
Mean number of structures per sequence      1544     489 ± 10     3081
   of which close competitors               11.0     33.5 ± …     21.6
Mean number of local minima per sequence    152      73 ± 3       277
   of which close competitors               3.8      14.0 ± 0.5   6.9

The free energy distributions in figure 3a are of course measured relative to the completely unfolded state with no base pairs. It is also of interest to measure the average distribution of free energies relative to the ground state. Figure 3b shows the same data as figure 3a, but the density of states for each sequence has been shifted so that the ground state is at zero, and the shifted distributions have been averaged. We see that even though the random sequences have fewer structures in total, they have more structures at energies close to the ground state than do the tRNA sequences. Note that figures 3a and 3b do not have the same shape, since each of the densities of states contributing to the average has been shifted relative to its particular ground state.

[Figure 3: panels a)-d), densities of states for tRNA and random sequences (a-c) and for 5S rRNA and random sequences (d), plotted against free energy.]

Fig. 3. a) Average density of states for tRNA compared to random sequences (same samples as Fig. 2). b) Average density of states for the same samples, with free energy measured relative to the ground state. Whilst the total number of structures is larger for the tRNA samples, the number of structures close to the ground state is smaller. c) Average density of local minima states measured relative to the ground state for the same samples. d) Density of local minima states relative to the ground state for E. coli 5S rRNA compared to the average density for 20 random sequences of length 120.

Some statistics are presented in table I (columns 1 and 2). We see that the tRNA sequences have roughly three times as many folded configurations as the random sequences, whilst having only about one third as many close competitors with the ground state. We have taken the number of close competitors to be the number of structures within 5 kcal/mole of the ground state (including the ground state itself).

This trend is enhanced if we look only at structures which are local free energy minima. A local minimum is a structure to which it is not possible to add any further base pairs without breaking some of the pairs which are already present. A structure consisting of the set of stems (A, B, ..., K) is a local minimum if

(i) there is no stem L, not a member of the set, which is compatible with all members of the set, and

(ii) none of the members of the set can grow into a longer stem without becoming incompatible with another member of the set.

The first condition is straightforward. If there were another stem which could be added without disrupting the original set of stems, then the original set cannot be a local minimum of free energy. The second condition requires more explanation. There may be, for example, a 3 base pair stem which can "grow" into a 4 base pair stem by addition of a further complementary pair on the end. These two stems would be defined as incompatible, since a structure may contain either one or the other, but not both. For every structure containing the 3-pair stem there will usually be a structure containing the 4-pair stem instead and having a lower free energy. The first structure is therefore not a free energy minimum. However, it may be that whilst the 3-pair stem was compatible with the other stems in the structure, the 4-pair stem would not be. In this case the structure with the 3-pair stem would be a local free energy minimum.
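The two conditions translate directly into a test on a candidate structure. In this Python sketch (our own illustration, with stems again as hypothetical 0-based (i, j, n) tuples and a structure given as a list of indices into the stem list), "growing" is interpreted as being replaced by a listed stem whose base pairs strictly contain those of the member. The compatibility test is repeated so that the block stands alone.

```python
def stem_pairs(stem):
    """Set of base pairs (i+k, j-k), k = 0..n-1, of a stem (i, j, n)."""
    i, j, n = stem
    return {(i + k, j - k) for k in range(n)}


def compatible(s1, s2):
    """No shared base and no knot between the two stems."""
    p1, p2 = stem_pairs(s1), stem_pairs(s2)
    if {x for p in p1 for x in p} & {x for p in p2 for x in p}:
        return False
    return not any(a < c < b < d or c < a < d < b
                   for a, b in p1 for c, d in p2)


def is_local_minimum(structure, stems):
    members = set(structure)
    # (i) no stem outside the set is compatible with every member.
    for L, stem in enumerate(stems):
        if L not in members and all(compatible(stem, stems[A]) for A in members):
            return False
    # (ii) no member can grow into a longer listed stem that stays
    #      compatible with all the other members.
    for A in members:
        pa = stem_pairs(stems[A])
        others = members - {A}
        for stem in stems:
            if pa < stem_pairs(stem) and all(compatible(stem, stems[X]) for X in others):
                return False
    return True


if __name__ == "__main__":
    stems = [(0, 20, 3), (0, 20, 4), (3, 10, 3)]
    print(is_local_minimum([0, 2], stems), is_local_minimum([0], stems))
```

In the example, stem 1 is stem 0 extended by one pair, and stem 2 uses the base that the extra pair would need. The structure {0, 2} is therefore a local minimum (stem 0 cannot grow without clashing with stem 2), whereas {0} alone is not, because stem 2 could still be added.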

It is clear that there will always be a certain number of structures close to the ground state in which some of the stems of the ground state have become partially "unzipped" at the ends. These states cannot be considered as true alternative secondary structures to the ground state. On the other hand, the local minima defined above represent real alternative structures because they contain base pairs which are not present in the ground state. Thus if we want to know how many alternative structures are in close proximity to the ground state, it is interesting to look at the distribution of local minima (Fig. 3c).

The effect observed in figure 3b is enhanced: whilst the total number of local minima is larger for the tRNA sequences than for the random sequences, the number of local minima close to the ground state is smaller (see also Tab. I). This implies that the ground state of the tRNA sequences is more stable than the ground state of a typical random sequence, both relative to the unfolded state and relative to alternative competing structures.

In order to see whether this trend is found in other types of RNA molecule we looked at the following sequences: 5S ribosomal RNA (Rogers et al. [12]), plant viral RNA (Ahlquist et al. [13]) and fragments from the Tetrahymena intervening sequence (Cech et al. [14], Williams and Tinoco [6]). In each case the real sequences were compared to random sequences of the same length and the same base composition. The base composition is different for the different classes of molecule, but none of them contains any of the non-bonding bases present in tRNA.

In most of the cases clear differences between random and real sequences were observed, with the same features apparent as for tRNA. The distribution of local minima for E. coli 5S ribosomal RNA is shown in figure 3d as an example. We have not analysed enough of these longer sequences to have reliable statistics, and therefore in the rest of the paper only tRNA will be considered.

4. Thermodynamic behaviour.

Having calculated the density of states, it is possible to obtain any thermodynamic quantities required. Firstly, the partition function Z is

$$Z = \sum_{\alpha} e^{-G(\alpha)/kT} \tag{1}$$

where $G(\alpha)$ is the free energy of structure $\alpha$. The sum is to be taken over all states, not just the local minima states. One quantity of interest is the probability $W_0$ that the molecule is in its ground state. If $G_0$ is the ground state free energy, then the weight of the ground state is

$$W_0 = \frac{1}{Z} e^{-G_0/kT} \tag{2}$$

Figure 4a shows the sample average value $\langle W_0 \rangle$ as a function of temperature for tRNA and random sequences.

The $G(\alpha)$ are free energies containing both entropic and enthalpic parts. Each state $\alpha$ is not a true microstate, but may be thought of as a sum over all microstates with a given set of bonds. The $G(\alpha)$ are thus functions of temperature. We have used the experimental values measured at temperatures close to 300 K. In figure 4 we have assumed these values to be fixed, independent of temperature. The temperature scale in figure 4 is therefore artificial: it simply determines the relative weight given to the ground state and its competitors. As $T \to 0$ only the ground state is selected, and as $T \to \infty$ all structures are present with equal probability. Only the temperature T = 300 K, at which point $kT \approx 0.6$ kcal/mole, corresponds to a situation occurring naturally. A real molecule at $T \gg 300$ K will of course unfold completely, since the unfolded state (with no bonds) has the largest entropy. This is not equivalent to the high temperature limit in figure 4. To plot quantities as a function of "real" temperature would require experimental data at many different temperatures.
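Equations (1) and (2) with fixed $G(\alpha)$, as used for figure 4, can be evaluated as in the Python sketch below (an illustration, not the author's code). Two choices are our own assumptions: the unfolded chain is included as a state with G = 0, and all energies are shifted by $G_0$ before exponentiating, which leaves $W_0$ unchanged but avoids overflow at small kT. The three-state energy list in the example is made up.

```python
import math


def ground_state_weight(energies, kT):
    """W0 = exp(-G0/kT) / Z, with Z = sum over states of exp(-G/kT), eqs. (1)-(2).

    energies: free energies G(alpha) in kcal/mole of all states (assumed to
    include the unfolded chain at G = 0); kT in kcal/mole.
    """
    g0 = min(energies)
    # Multiplying numerator and denominator by exp(G0/kT) gives
    # W0 = 1 / sum_alpha exp(-(G(alpha) - G0)/kT), where every exponent is <= 0.
    return 1.0 / sum(math.exp(-(g - g0) / kT) for g in energies)


if __name__ == "__main__":
    # Unfolded chain, a ground state at -10 and one competitor at -9.4 kcal/mole.
    toy = [0.0, -10.0, -9.4]
    for kT in (0.1, 0.6, 10.0, 100.0):
        print(f"kT = {kT:6.1f}  W0 = {ground_state_weight(toy, kT):.3f}")
```

As kT grows, $W_0$ for this toy list falls from almost 1 towards 1/3, the equal-weight limit for three states described in the text.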

We see in figure 4a that $\langle W_0 \rangle$ is significantly higher for tRNA than for random sequences, and is close to 1 for tRNA at 300 K, indicating that a real molecule will almost always be in its ground state. In fact $\langle W_0 \rangle \approx 0.83$ for tRNA and 0.52 for random sequences at 300 K.

[Figure 4: panels a) and b), curves for tRNA and random sequences plotted against kT on a logarithmic axis from 0.1 to 100.]

Fig. 4. a) Average weight $W_0$ of the ground state as a function of temperature for tRNA, random, and mirror image sequences (N = 76 in each case). The dotted line at kT = 0.6 corresponds to 300 K. See text for meaning of temperature scale.

The thermal energy kT = 0.6 kcal/mole is rather small compared with the bond free energies (in the range 1.2-4.8 kcal/mole for Watson-Crick pairs, and 0.3 kcal/mole for GU); hence excitations from the ground state are rather costly, and $W_0$ is fairly large even for the random sequences.

A special class of sequences which will have a particularly stable ground state are mirror image sequences capable of folding into a single hairpin loop. Mirror image sequences were generated by choosing the first half of the molecule to be a random sequence, and setting the second half to be the exact complementary sequence to the first half. As shown in figure 4a, $\langle W_0 \rangle$ is much larger for mirror image sequences than for random sequences. At 300 K $\langle W_0 \rangle \approx 0.97$ for mirror images. Real tRNA has a behaviour intermediate between the two extremes. (Note that the mirror image sequences in figure 4 contained no non-bonding bases, whilst the random sequences contained the same fraction of non-bonding bases as tRNA, to allow proper comparison.)
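Mirror image sequences of this kind can be generated as below. This Python sketch reflects our reading of the construction: for the chain to close into a single hairpin, the second half must be the complement of the first half read in reverse order (the reverse complement), with Watson-Crick partners A-U and C-G. The function name and the extra random middle base for odd lengths are our own choices. No non-bonding bases are used, as in figure 4.

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def mirror_image_sequence(n, rng, bases="ACGU"):
    """Random first half followed by its reverse complement (hypothetical helper)."""
    half = [rng.choice(bases) for _ in range(n // 2)]
    middle = [rng.choice(bases)] if n % 2 else []  # only needed for odd n
    second = [COMPLEMENT[b] for b in reversed(half)]
    return "".join(half + middle + second)


if __name__ == "__main__":
    print(mirror_image_sequence(76, random.Random(1)))
```

Base k (counting from 0) is then complementary to base n - 1 - k. The innermost few of these pairs cannot actually form, because a hairpin loop needs at least 3 unpaired bases, so the folded hairpin has a small loop at its tip.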

It is also interesting to measure the typical difference of the configuration from the ground state configuration at finite temperature. The configuration $\alpha$ is defined by the bond variables $b^\alpha(i)$ in the following way. If bases i and j are paired then set $b^\alpha(i) = j$ and $b^\alpha(j) = i$. If i is unpaired then set $b^\alpha(i) = 0$. We define the distance $D^{\alpha\beta}$ between configurations $\alpha$ and $\beta$ as simply the number of bases i for which $b^\alpha(i) \neq b^\beta(i)$. $D^{\alpha\beta}$ is a generalization of the Hamming distance often used for sequence comparison [4]. In figure 4b we show the average distance D from the ground state as a function of kT,

$$D = \frac{1}{Z} \sum_{\alpha} D^{\alpha 0} e^{-G(\alpha)/kT} \tag{3}$$

where $D^{\alpha 0}$ is the distance of configuration $\alpha$ from the ground state. D is thus sensitive to alternative structures differing widely from the ground state.
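The bond variables, the distance $D^{\alpha\beta}$ and the thermal average (3) can be written out as in this Python sketch. It is illustrative only: a configuration is assumed to be given as a list of 1-based base pairs (i, j), the ground state is taken to be the lowest-energy entry of the list, and the function names are hypothetical.

```python
import math


def bond_variables(pairs, n):
    """b(i) = j if base i is paired with j, b(i) = 0 if unpaired; bases 1..n."""
    b = [0] * (n + 1)  # index 0 is unused so that b[i] refers to base i
    for i, j in pairs:
        b[i], b[j] = j, i
    return b


def distance(b_alpha, b_beta):
    """D^{alpha beta}: number of bases i with b_alpha(i) != b_beta(i)."""
    return sum(1 for x, y in zip(b_alpha[1:], b_beta[1:]) if x != y)


def mean_distance(configs, energies, kT):
    """Eq. (3): Boltzmann average of the distance from the ground state."""
    g0 = min(energies)
    ground = configs[energies.index(g0)]
    # Weights shifted by G0 as for W0; the shift cancels in the ratio.
    weights = [math.exp(-(g - g0) / kT) for g in energies]
    z = sum(weights)
    return sum(w * distance(c, ground) for c, w in zip(configs, weights)) / z


if __name__ == "__main__":
    n = 20
    ground = bond_variables([(1, 20), (2, 19), (3, 18)], n)
    unzipped = bond_variables([(2, 19), (3, 18)], n)  # end pair removed
    unfolded = bond_variables([], n)
    print(distance(ground, unzipped), distance(ground, unfolded))  # 2 6
```

With this definition, removing a single terminal base pair changes two bond variables (D = 2), and losing a whole 3 base-pair stem changes six (D = 6), the reference values used in the discussion of figure 4b.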

As expected, we find that D is significantly larger for random sequences than for tRNA (Fig. 4b). At 300 K, D ≈ 1.7 for tRNA and D ≈ 8.2 for random sequences. The figure shows D/N, with N = 76 in each case. A D of around 2 indicates simple excitations of the ground state, such as unzipping one base pair from the end of a stem. When D ≈ 8, significant changes in the secondary structure are present: loss of a 3 base-pair stem would give D = 6, for example.

The behaviour of mirror image sequences is also shown in figure 4b. A rather abrupt change in D is visible in this case as the temperature is increased. At 300 K, D ≈ 0.067, indicating almost no excitation from the ground state. This is simply because there are so few accessible excited states for mirror image molecules.

In the thermodynamic limit ($N \to \infty$), mirror image sequences behave very differently from random sequences. This can be seen by comparing sequences of three different lengths (30, 50 and 76) in figure 5. For random sequences $W_0$ decreases as $N$ increases over the whole temperature range. This is because, as $N$ increases, the number of competing structures close to the ground state also increases, and the weight of the ground state decreases at all non-zero temperatures.
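The argument for random sequences rests on the ground state weight $W_0$, the normalized Boltzmann weight of the lowest-energy state. A minimal sketch, assuming an explicit list of temperature-independent state energies (the values below are hypothetical), shows that adding competitor states while leaving the ground state unchanged lowers $W_0$ at every non-zero temperature, because each extra state adds a positive term to the partition sum.

```python
import math

def ground_state_weight(energies, kT):
    """W_0 = exp(-E_0/kT) / sum_a exp(-E_a/kT), with E_0 the lowest energy.

    Energies are shifted by E_0 before exponentiating, which leaves W_0 unchanged.
    """
    e0 = min(energies)
    return 1.0 / sum(math.exp(-(e - e0) / kT) for e in energies)

few = [-10.0, -8.5, -7.0]          # hypothetical state energies
many = few + [-9.0, -8.0, -7.5]    # same ground state plus extra competitors

for kT in (0.3, 0.6, 1.0, 3.0):
    print(kT, round(ground_state_weight(few, kT), 3), round(ground_state_weight(many, kT), 3))
```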

On the other hand, for mirror image sequences the curves for $W_0$ superpose at low temperatures and decrease with $N$ at high temperatures. This indicates the presence of a phase transition in the limit $N \to \infty$. From figure 5b the transition temperature is approximately $kT_c = 10$–$12$. Below $T_c$ the ground state has a finite weight, independent of $N$, whilst above $T_c$, $W_0$ is a function of $N$ and decreases to zero as $N \to \infty$. $W_0$ is finite at $T < T_c$ since the number of states at accessible energy levels does not increase with $N$.

The corresponding behaviour of $D/N$ is also shown in figure 5. For random chains we expect $D \sim N$ at all temperatures, and so for large $N$ the curves of $D/N$ should superpose. The fact that the three curves in figure 5a do not superpose is presumably due to finite size effects in these rather short chains. For mirror image sequences at $T < T_c$, $D$ is a function of temperature only and not of $N$; thus $D/N \to 0$ as $N \to \infty$. At $T > T_c$ we expect $D \sim N$ for large $N$. Finite size effects are again rather large in figure 5b. The crossing of the three curves for $D/N$ is an indication of the phase transition for large $N$. The transition is first order, i.e. quantities such as $W_0$, $D$ and the energy of the system will change discontinuously at $T_c$ in the limit $N \to \infty$.
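The first-order character of the transition can be illustrated with a deliberately crude toy model, which is an assumption introduced here and not the model behind figure 5: a ground state of energy $-aN$ competes with an ensemble of $e^{sN}$ excited states of energy $-bN$ ($b < a$), each differing from the ground state at a fraction $f$ of the bases. Then $W_0 = 1/(1 + e^{N(s - (a-b)/kT)})$ and $D/N = f(1 - W_0)$, and $kT_c = (a-b)/s$ is the temperature at which the ground state free energy equals the free energy of the excited ensemble. All parameter values below are hypothetical.

```python
import math

def two_phase_toy(n, kT, a=1.0, b=0.7, s=0.05, f=0.5):
    """Return (W_0, D/N) for the toy model described above.

    x is the log of [total weight of the excited ensemble / weight of the ground state].
    """
    x = n * (s - (a - b) / kT)
    if x > 0:                       # numerically stable logistic function
        w0 = math.exp(-x) / (1.0 + math.exp(-x))
    else:
        w0 = 1.0 / (1.0 + math.exp(x))
    return w0, f * (1.0 - w0)

kT_c = (1.0 - 0.7) / 0.05           # transition temperature for the default parameters
for kT in (3.0, kT_c, 12.0):
    row = [round(two_phase_toy(n, kT)[0], 3) for n in (30, 50, 76, 1000)]
    print(round(kT, 2), row)
```

Below $kT_c$ the toy $W_0$ grows toward 1 with $N$, above it $W_0$ falls toward 0, and at $kT_c$ every chain length gives $W_0 = 1/2$, so the curves for different $N$ cross there and sharpen into a step as $N \to \infty$. Unlike the real mirror image molecules, the toy has no low-lying excitations, so its $W_0$ tends to 1 rather than to a finite value below 1.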


[Figure 5: panels a) random and b) mirror; $W_0$ and $D/N$ (vertical axis, 0.0 to 1.0) versus $kT$ (horizontal axis, logarithmic, 0.1 to 100) for $N$ = 30, 50 and 76.]

Fig. 5. Dependence of $W_0$ and $D$ on $N$ for random and mirror image sequences. The figures show $W_0$ (decreasing curves) and $D/N$ (increasing curves) as functions of temperature for chains of length $N$ = 30, 50 and 76. For the mirror image molecules there is a low temperature phase for which $W_0$ is finite even as $N \to \infty$, whereas for random sequences there is no phase transition and $W_0$ decreases with $N$ at all temperatures.

As stated above, the temperature scale used here is artificial because we have treated the free energies of the states as simple energies which do not change with temperature. The thermodynamic behaviour would be the same if the temperature dependence of the states were treated properly. For the mirror image molecules there would be a phase transition at the temperature where the ground state free energy is equal to the free energy for the sum of the other states. For a typical random sequence there would be no such transition, and thermodynamic quantities would change smoothly.

Thus real tRNA molecules have a ground state which is considerably more stable than that of a typical random sequence, but less stable than that of the extreme case of the mirror image. The mirror image molecules are a simple example of a system with a low temperature phase which is ground state dominated. Other examples are discussed in section 6. The low temperature phase may be termed frozen, since the configurational entropy is not extensive at $T < T_c$. A typical random sequence has an extensive entropy at all temperatures.

5. Effect of small changes in the sequence.

In order to investigate the role of the modified bases in tRNA structure, we recalculated the ground state for the same set of tRNA sequences, but this time the modified bases, which had previously been treated as non-bonding, were treated as the equivalent unmodified base. In most cases this did not affect the prediction of the ground state; however, in two cases where the clover-leaf had previously been predicted successfully, an alternative structure of lower free energy was found when the modified bases were replaced by standard bases.

We see in table I that replacing non-bonding bases by standard bases leads to only a very small decrease in the mean ground state energy. However, there is a much greater change in the number of structures: there are now almost twice as many structures and close competitor structures as before. Thus it would appear that the non-bonding bases may play an important role in eliminating alternative structures to the clover-leaf. Ninio [3] has also looked at the effects of non-bonding bases and finds that the predictability of the clover-leaf is significantly reduced if the non-bonding bases are treated as bonding.
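A rough way to see why non-bonding bases remove alternative structures is to count the secondary structures a sequence admits. The sketch below is an assumption-laden stand-in, not the paper's enumeration: it counts every nested set of single base pairs (Watson-Crick and G-U, closing hairpin loops of at least 3 unpaired bases), ignores energies and stem lengths, and writes a non-bonding base as the hypothetical symbol "X", which never pairs. Any structure valid with "X" remains valid when "X" is replaced by a standard base, so the count can only stay the same or grow.

```python
from functools import lru_cache

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble; "X" (non-bonding) pairs with nothing.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3   # assumed minimum number of unpaired bases closed by a pair

def count_structures(seq):
    """Number of nested secondary structures of seq, including the open chain."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Structures on seq[i..j] inclusive; ranges too short to close a pair give 1.
        if j - i < MIN_HAIRPIN + 1:
            return 1
        total = count(i + 1, j)                      # base i left unpaired
        for k in range(i + MIN_HAIRPIN + 1, j + 1):  # base i paired with base k
            if (seq[i], seq[k]) in CAN_PAIR:
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, len(seq) - 1)

print(count_structures("GGGAAAACCC"))   # all three G's may pair
print(count_structures("GGXAAAACCC"))   # third G replaced by a non-bonding base
```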

One example where a modified base was found to be important was tRNA~~~'~ from T. utilis, shown in figure 1 (sequence 092 from the catalogue [8]). When the base m(G at position 26 was treated as non-bonding, the clover-leaf structure was predicted as the ground state (Fig. 1a); however, when it was treated as a standard base G, the alternative structure of figure 1b was found. Changing the base permits an alternative structure of lower free energy to form.

In general, the ground state configuration is extremely sensitive to small changes in the sequence. If, in the same tRNA molecule, the base G at position 69 is replaced by an A, then the resulting sequence has the ground state shown in figure 1c. Changing base 69 directly disrupts only the acceptor stem, but it leads to a change in the configuration of a large fraction of the molecule, not just the acceptor stem itself. Only the TΨC loop is conserved in all three examples.

We have carried out a systematic study of the effect of mutations in the sequence on the resulting secondary structure. Firstly, we looked at single point mutations. For each of the 32 tRNA sequences in the sample, mutated sequences were generated which differed by one base from the real sequence. The mutated base was chosen to be either A, C, G, U or non-bonding, with the same probabilities as given in section 3 (but was forced to be different from the original base). One mutated sequence was formed for each base on the chain, i.e. a total of approximately 76 × 32 ≈ 2400 sequences were analyzed.
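The mutant-generation procedure described above can be sketched as follows. The symbol "X" for a non-bonding base, the uniform `PROBS` values, and all function names are hypothetical placeholders; the composition actually used is the one given in section 3. Drawing from the allowed bases with the remaining weights is equivalent to redrawing until the new base differs from the original one.

```python
import random

BASES = ("A", "C", "G", "U", "X")   # "X" stands for a non-bonding base (assumed symbol)
# Hypothetical placeholder probabilities; substitute the composition given in section 3.
PROBS = {"A": 0.2, "C": 0.2, "G": 0.2, "U": 0.2, "X": 0.2}

def mutate_at(seq, pos, rng):
    """Replace seq[pos] by a base drawn from PROBS, restricted to bases that differ
    from the current one (equivalent to redrawing until the base changes)."""
    choices = [b for b in BASES if b != seq[pos]]
    new = rng.choices(choices, weights=[PROBS[b] for b in choices], k=1)[0]
    return seq[:pos] + new + seq[pos + 1:]

def single_point_mutants(seq, rng):
    """One mutant per position of the chain (the single point study)."""
    return [mutate_at(seq, pos, rng) for pos in range(len(seq))]

def k_point_mutants(seq, k, count, rng):
    """`count` mutants, each changed at k distinct randomly chosen positions."""
    mutants = []
    for _ in range(count):
        mutant = seq
        for pos in rng.sample(range(len(seq)), k):
            mutant = mutate_at(mutant, pos, rng)
        mutants.append(mutant)
    return mutants

rng = random.Random(1)
seq = "GCGGAUUUAGCUCAGUUGGGAGAGC"   # hypothetical fragment, not from the paper's sample
print(len(single_point_mutants(seq, rng)))   # 25
print(k_point_mutants(seq, 2, 3, rng))
```

For the two and three point study, `k_point_mutants(seq, 2, 76, rng)` and `k_point_mutants(seq, 3, 76, rng)` would produce the 76 mutants per sequence.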

Each mutated sequence was classed according to whether its ground state free energy had increased, decreased, or remained the same relative to the original sequence. The probabilities $p_{\mathrm{inc}}$, $p_{\mathrm{dec}}$ and $p_{\mathrm{same}}$ of these three possibilities are given in figure 6a. Two and three point mutations were also analyzed: for each tRNA sequence, 76 mutant sequences were generated differing at 2 (or 3) randomly chosen points.
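Once a ground state free energy has been computed for the original sequence and for each mutant (by whatever folding procedure is used; none is implemented here), the classification into $p_{\mathrm{inc}}$, $p_{\mathrm{dec}}$ and $p_{\mathrm{same}}$ is a simple count. The function name and the tolerance `tol` used to decide that two energies are "the same" are assumptions of this sketch.

```python
def classify(original_energy, mutant_energies, tol=1e-6):
    """Fractions (p_inc, p_dec, p_same) of mutants whose ground state free energy is
    higher than, lower than, or equal (within tol) to that of the original sequence."""
    if not mutant_energies:
        raise ValueError("need at least one mutant energy")
    inc = dec = same = 0
    for e in mutant_energies:
        if e > original_energy + tol:
            inc += 1              # ground state destabilized by the mutation
        elif e < original_energy - tol:
            dec += 1              # ground state stabilized by the mutation
        else:
            same += 1
    n = len(mutant_energies)
    return inc / n, dec / n, same / n

p_inc, p_dec, p_same = classify(-20.0, [-19.0, -21.0, -20.0, -18.5])
print(p_inc, p_dec, p_same)  # 0.5 0.25 0.25
```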

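[Figure 6: a) tRNA and b) random: $p_{\mathrm{same}}$, $p_{\mathrm{inc}}$ and $p_{\mathrm{dec}}$ (%) versus number of mutations (1 to 3); c) number of close competitor structures versus number of mutations (0 to 3) for tRNA, with the random-sequence value for comparison.]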

Fig. 6. a) Comparison of tRNA with mutant sequences. $p_{\mathrm{inc}} > p_{\mathrm{dec}}$, indicating that mutant sequences have less stable ground state structures than tRNA. b) Mutations made to random sequences leave the chains statistically equivalent; therefore $p_{\mathrm{inc}} \approx p_{\mathrm{dec}}$ for random sequences. c) The number of close competitor structures within 5 kcal/mole of the ground state increases as mutations are made to tRNA sequences. The original tRNA sequences are shown at 0 mutations. The dotted line indicates the value for random sequences.

In each case $p_{\mathrm{inc}}$ is significantly greater than $p_{\mathrm{dec}}$, indicating that making mutations tends to destabilize the ground state on average. The mutant sequences also have a reduced total