
RNA secondary structure: a comparison of real and random sequences


HAL Id: jpa-00246711

https://hal.archives-ouvertes.fr/jpa-00246711

Submitted on 1 Jan 1993

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


RNA secondary structure: a comparison of real and random sequences

Paul Higgs

To cite this version:

Paul Higgs. RNA secondary structure: a comparison of real and random sequences. Journal de Physique I, EDP Sciences, 1993, 3 (1), pp.43-59. 10.1051/jp1:1993116. jpa-00246711


Classification Physics Abstracts: 87.15, 36.20

RNA secondary structure: a comparison of real and random sequences

Paul G. Higgs (*)

Service de Physique Théorique (**) de Saclay, F-91191 Gif-sur-Yvette Cedex, France

(Received ?3 July 1992, accepted in final form 23 September 1992)

Abstract. A sample of transfer RNA molecules is compared to a sample of random sequences having the same length and the same percentage composition of the different bases. For each sequence all possible secondary structures are constructed and a distribution of free energies for the states is obtained. It is found that the ground state free energies of tRNA molecules are significantly lower than for random sequences, and that tRNA molecules have significantly fewer alternative secondary structures at energies close to the ground state than do random sequences. A distance D is defined which measures the average difference between molecular configurations and the ground state configuration. At realistic temperatures of order 300 K this distance is much larger for random sequences than for tRNA sequences. Thus the secondary structure of tRNA molecules at finite temperature is more stable than for random sequences. Sequences are considered which differ by a small number of mutations from real tRNA sequences. On average mutations destabilize the secondary structure. This suggests that a stable secondary structure is one of the factors selected for by natural selection. The thermodynamic behaviour of RNA sequences is compared to models for random heteropolymers which have a low temperature frozen phase.

1. Introduction.

The secondary structure of ribonucleic acid molecules is known to be a complex set of "stems" and "loops" formed by base pairing between complementary regions of the chain (Saenger [1], and see examples in Fig. 1). It is now commonplace to use computational methods to predict the secondary structure of particular RNA sequences [2-7] or to confirm experimental evidence for the structure.

(*) Address from Oct. 92: Dept. of Physics, University of Sheffield, Hounsfield Road, Sheffield S3 7RH, G.B.

(**) Laboratoire de la Direction des Sciences de la Matière du Commissariat à l'Energie Atomique.


Typically a program will consider the many possible ways of folding a chain into a secondary structure and select the structure which minimizes the free energy, and possibly a few alternative structures of slightly higher free energy. The lowest free energy state is generally assumed to be "stable", and should correspond to the biological structure if the parameters used to calculate the energy are known sufficiently accurately. We will call this most favourable state the ground state. From a thermodynamical point of view it is by no means sufficient to know what the ground state of a physical system is, in order to know whether this state is stable. There will be an exponentially large number of other configurations of the system, and even though the ground state is more favourable than any one of these other states, the probability of finding the system in its ground state configuration may be negligible.

In this article we look at the thermodynamics of RNA folding. We will use a method which can calculate the distribution of energies of all possible configurations, and hence the probability of finding the molecule in its ground state. We will focus on transfer RNAs, since these are among the shortest RNA sequences (approx. 76 bases) and since the secondary structure is well known. Each species has a series of tRNA molecules, each of which is able to bind to one of the 20 amino acids. The tRNA sequences for the different amino acids are known for many species, and have been catalogued (Sprintzl et al. [8]). Although the sequences differ from one another they can all be arranged into the clover-leaf secondary structure (Fig. 1a) and the evidence suggests that this is the naturally occurring structure [1]. Base-paired regions of the chain are called stems. In tRNA the acceptor stem contains typically 6 or 7 pairs, whilst the other three stems contain between 3 and 5 pairs, depending on the sequence.

[Figure 1, panels a), b) and c): secondary structure diagrams; labelled features include the D loop, the TΨC loop and the anticodon loop.]

Fig. 1. a) Example of clover-leaf structure: the ground state of a tRNA from T. utilis. b) and c) The ground state structure for two sequences each differing from this tRNA by only one base, at positions 26 and 69 respectively. Small changes in the sequence can lead to large scale reorganization of the structure.

If a biological molecule is to play its proper role in a living cell, it needs to be able to recognize and interact with other biological molecules. Such interactions will usually only be possible if the molecule is in a particular configuration. An important property of a useful bio-molecule is to possess a stable structure (or possibly a small number of alternative stable structures) in which the molecule is almost always to be found.

Physicists have been led to draw a parallel between the folding of biological molecules (particularly proteins) and models for strongly disordered systems such as spin glasses (Bryngelson and Wolynes [9], Garel and Orland [10], Shakhnovich and Gutin [11]). In these articles, molecules are treated as random monomer sequences. The random disorder is shown to lead, in some cases, to the freezing of the molecule into a small number of low energy configurations.

In the present paper we will compare a sample of real tRNA sequences with a sample of randomly generated base sequences. It is clear that real sequences are in no way random. Natural selection has been acting on biological molecules, tuning them to have certain required properties. A stable secondary structure is likely to be one of these properties, and so we should not be surprised to find significant differences between the structures of real and random sequences. On the other hand, a glance at the tRNA catalogue (Sprintzl et al. [8]) shows that the primary sequences of the different molecules differ from one another a great deal, with no apparent pattern visible. Although there is a small number of conserved bases in the primary sequence [1, 8], the similarity of the molecules is much more apparent when comparing the secondary structures rather than the primary base sequences. There are many other classes of molecule for which the secondary structure is also well preserved despite changes in the sequence, e.g. 5S ribosomal RNA (Rogers et al. [12]) and viral RNA (Ahlquist et al. [13]). In the case of tRNA, since there are a large number of known sequences, we are led to look at their properties in a statistical manner.

2. Description of the program.

There are now many computational algorithms which attempt to predict RNA structure by searching for the lowest free energy configuration [2-7]. Thermodynamic parameters measured in experiment are incorporated into the programs. Contributions to the free energy are basically of two kinds: a free energy gain for each correctly matching base pair added to a stem ("stacking"), and a free energy penalty for every loop closed. Favourable secondary structures will therefore have a relatively small number of stems which are as long as possible, rather than a large number of very short stems.

The stacking free energies depend on the base pair added and on the previous pair in the stem. Allowed base pairs are A-U, C-G, and the non-standard pair G-U. The values used in our program are those conveniently tabulated by Jacobson et al. [5]. They vary between -0.3 kcal/mole and -4.8 kcal/mole depending on the stacked pair. These values are more or less standard in the literature.

Much less standard is the treatment of loops. This is partly because experimental values for loop parameters are not always available. Thus even the more recent works involve a considerable amount of approximation in the values assigned to loops (Jaeger et al. [7]). We have decided to treat loops in a very simplified way. Our object here is not to provide the most accurate prediction of the secondary structure for one particular sequence, but to look at general features of the folding behaviour in a statistical manner. We therefore note that every time a new stem is added to the structure a new loop is also formed (possibly a hairpin, an interior loop, a bulge loop, etc.). Hence the total number of loops is equal to the total number of stems. We will make the simplification of assigning a penalty of +4.5 kcal/mole to all loops, irrespective of their type and length. This value appears to be typical of the values given by Jacobson et al. [5], particularly for the hairpin loops of 4 to 8 bases which are present in the tRNA clover-leaf structure.


With this simplification we can assign to each stem a net free energy equal to its stacking energy plus 4.5 kcal/mole. A stem is stable relative to the coil state if its net free energy (including the penalty for the loop which it closes) is negative. A stem always contributes equally to the free energy of a structure in this model, regardless of the topology of the loops.

We will be interested not just in the lowest free energy structure, but in the distribution of energies of these structures (density of states). Our program is rather similar to that of Pipas and McMahon [2] for this reason. It therefore requires a large amount of storage, and is less suitable for long sequences than alternative methods [4-7] due to the exponential number of possible structures. The program proceeds as follows.

SEARCH FOR POSSIBLE STEMS. All points in the sequence are checked for complementary pairs, and a list of possible stems is made. A stem is added to the list if it contains at least 3 base pairs, and if its net free energy, including the penalty for loop closure, is negative. There must be a minimum of 3 unpaired bases in a hairpin loop, hence if bases $i$ and $j$ are paired within a stem, then $j \geq i + 4$. We have followed Pipas and McMahon [2] in the treatment of the GU pair, i.e. GU pairs are allowed within a stem, but not as the terminal pair in a stem.
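The stem search can be illustrated with a minimal sketch. It is not the program used in the paper: the function name `find_stems`, the 0-based indexing and the tuple layout `(i, j, n)` (outermost pair `(i, j)` and `n` stacked pairs) are assumptions made for illustration, and the free-energy filter is omitted because the stacking table of Jacobson et al. [5] is not reproduced here. Any character outside A, C, G, U (for instance a hypothetical `N`) never pairs.

```python
# Allowed pairs from the text: A-U, C-G and the non-standard pair G-U.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
MIN_STEM = 3      # a stem must contain at least 3 base pairs
MIN_HAIRPIN = 3   # a hairpin loop must contain at least 3 unpaired bases


def is_gu(a, b):
    """True if the two bases form a G-U pair."""
    return {a, b} == {"G", "U"}


def find_stems(seq):
    """List candidate stems as (i, j, n): pairs (i+k, j-k) for k = 0..n-1.

    Every sub-stem of a longer helix is listed separately, provided it has
    at least MIN_STEM pairs and neither of its terminal pairs is G-U.
    The net-free-energy test described in the text is NOT applied here.
    """
    stems = []
    length = len(seq)
    for i in range(length):
        for j in range(i + MIN_HAIRPIN + 1, length):
            n = 0
            # Extend the helix inwards while bases pair and the loop stays legal.
            while (j - n) - (i + n) > MIN_HAIRPIN and (seq[i + n], seq[j - n]) in PAIRS:
                n += 1
                inner = (seq[i + n - 1], seq[j - n + 1])
                if n >= MIN_STEM and not is_gu(seq[i], seq[j]) and not is_gu(*inner):
                    stems.append((i, j, n))
    return stems
```

In a fuller version each candidate would additionally be kept only if its stacking energy plus the 4.5 kcal/mole loop penalty is negative.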

CREATION OF A COMPATIBILITY MATRIX. A matrix $C$ is created such that the elements $C_{AB} = 1$ if stems A and B are compatible, and $C_{AB} = 0$ otherwise. Two stems are compatible if they do not overlap (i.e. a base cannot be bonded in more than one stem at once), and if they satisfy the "no knots" rule. This requires that if bases $i$ and $j$ are paired in one stem, and $k$ and $l$ are paired in another stem, then either $i < j < k < l$, or $i < k < l < j$. The other possibility $i < k < j < l$ is forbidden (see Sankoff et al. [4]). A consequence of the no knots rule is that all allowed secondary structures can be drawn in 2d without the line of the chain crossing over itself.
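The two compatibility conditions can be sketched as follows. This is an illustrative reading of the rule, not the original code: the helper names and the stem representation `(i, j, n)` (outermost pair `(i, j)`, `n` stacked pairs) are assumptions.

```python
def stem_pairs(stem):
    """Expand a stem (i, j, n) into its list of base pairs."""
    i, j, n = stem
    return [(i + k, j - k) for k in range(n)]


def compatible(stem_a, stem_b):
    """True if the stems share no base and obey the 'no knots' rule."""
    pairs_a, pairs_b = stem_pairs(stem_a), stem_pairs(stem_b)
    bases_a = {base for pair in pairs_a for base in pair}
    bases_b = {base for pair in pairs_b for base in pair}
    if bases_a & bases_b:          # a base bonded in both stems
        return False
    for i, j in pairs_a:
        for k, l in pairs_b:
            # Forbidden interleaving: i < k < j < l (or the same with roles swapped).
            if i < k < j < l or k < i < l < j:
                return False
    return True


def compatibility_matrix(stems):
    """C[a][b] = 1 if stems a and b are compatible, 0 otherwise."""
    return [[1 if compatible(a, b) else 0 for b in stems] for a in stems]
```

A stem is never compatible with itself under this definition, so the diagonal of the matrix is zero.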

COMPILATION OF ALL POSSIBLE STRUCTURES. A structure is a set of compatible stems taken from the list. Each stem A represents a possible structure in itself. The program then creates a list of all structures containing a pair of compatible stems A and B. To prevent double counting we require B > A. For each pair A and B the program then searches for all structures containing three stems ABC, all of which are compatible with each other, and requiring C > B. Structures containing 4, 5, 6, ... stems can be built up in this way. For the moderate length chains considered here there were a very large number of structures containing 3 or 4 stems, but structures with more than 6 stems were almost impossible due to the restrictions of compatibility.
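The build-up of structures from one stem to two, three and more can be written as a short recursion. The sketch below assumes a 0/1 compatibility matrix indexed by stem number and a list of net stem free energies; the function names and the example numbers are hypothetical.

```python
def enumerate_structures(compat):
    """Return every non-empty set of mutually compatible stems.

    Each structure is a list of stem indices in increasing order, which
    enforces B > A, C > B, ... and so prevents double counting.
    """
    n_stems = len(compat)
    structures = []

    def extend(current, start):
        for s in range(start, n_stems):
            if all(compat[s][t] for t in current):
                new = current + [s]
                structures.append(new)
                extend(new, s + 1)

    extend([], 0)
    return structures


def structure_energy(structure, stem_energies):
    """In the simplified loop model the free energy is the sum over stems."""
    return sum(stem_energies[s] for s in structure)
```

The completely unfolded state (no stems, free energy zero) is not produced by this recursion and would have to be added separately if it is to be counted as a state.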

Once the set of stems contained in a structure is known, the free energy of the structure is simply the sum of the free energies of the stems. We note that in a more realistic treatment of the loops this would not be true, and it would be necessary to test the topology of the loops in a given structure to calculate its free energy. Thus the program would be much longer.

3. Comparison of real and random sequences.

A sample of tRNA sequences was taken from the compilation of Sprintzl et al. [8]. This is the same source as used by Ninio [3] in a previous investigation of tRNA structure. Two sequences were taken from the list for each amino acid (where more than one is given). The tRNAs for Leucine, Serine and Tyrosine were excluded from the sample since they contain an extra arm, and are significantly longer than the rest. The result was a sample of 32 tRNAs with lengths in the range 74-77 bases and mean length close to 76. All of these can be arranged in the clover-leaf pattern shown in figure 1a.


One characteristic distinguishing tRNA from most other RNA is the presence of modified bases in addition to the four standard bases A, C, G and U (Saenger [1]). Some of these are modified in such a way as to prevent base pairing, so it is necessary to introduce a class of non-bonding bases into the program. Following Ninio [3] we have treated the following bases as non-bonding: D, m~G, m(G, m~G, Q, Y, and m~C. All other modified bases were treated as the standard base which they most resemble. In particular T and Ψ were treated as U, and I was treated as G. The proportions of the five different types of base in the sample of tRNAs studied were: 20.2% A, 27.0% C, 28.4% G, 20.0% U and 4.4% non-bonding.

Properties of tRNA sequences were compared with a sample of random sequences of length 76 bases. Each base in the random sequences was chosen to be A, C, G, U or non-bonding with a probability equal to the probability of occurrence in the real sequences. It is known that free energies depend on the C + G content of the chains, hence we wished to be sure that the random sequences had the same composition as the tRNA sequences.
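Drawing random sequences with the quoted composition is straightforward; the following is a minimal sketch, assuming each base is drawn independently. The symbol `N` for a non-bonding base and the function name are hypothetical choices.

```python
import random

# Base proportions (per cent) quoted in the text; "N" is a hypothetical
# placeholder symbol for the class of non-bonding bases.
COMPOSITION = {"A": 20.2, "C": 27.0, "G": 28.4, "U": 20.0, "N": 4.4}


def random_sequence(length=76, rng=None):
    """Draw each base independently with the tRNA sample's composition."""
    rng = rng or random.Random()
    bases = list(COMPOSITION)
    weights = list(COMPOSITION.values())
    return "".join(rng.choices(bases, weights=weights, k=length))
```

Passing a seeded `random.Random` instance makes a sample reproducible, which is convenient when comparing statistics over a fixed set of random sequences.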

As a test of the model we calculated the ground state structure for the 32 tRNA sequences. Of these, 25 were found to have the clover-leaf structure as ground state, a further 5 were found to have the clover-leaf structure except for a missing D stem, and the remaining 2 had ground states other than the clover-leaf. These results are typical of those obtained by Pipas and McMahon [2] and Ninio [3]. It would therefore seem that the simplified treatment of the loop energies does not seriously affect the results. In fact Ninio has considered a large number of slight variations in the stacking energies and loop energies. The degree of "successful" prediction of the clover-leaf structure varies slightly with the parameters used, but is always fairly high.

We state again that the object of this paper is to compare real tRNA with random sequences on a statistical basis, and not to look at the precise details of the secondary structure of any one sequence. The model defined above appears perfectly adequate for this purpose, without introducing any further complications and special cases.

In figure 2 we show the histogram of ground state free energies for the tRNA samples compared to that for a sample of 1000 random sequences of length 76. There is clearly a large difference between the two, with the real sequences having much lower ground state free energies than the random sequences. The ground states for the tRNA samples are in the range -45 to -15 kcal/mole, in agreement with the results of Pipas and McMahon [2]. If we take a typical tRNA sequence of free energy -30 kcal/mole, we may calculate from the histogram that the probability of finding a random sequence with a ground state less than or equal to -30 is only about 2%.

Since the program calculates all possible secondary structures we can calculate the density of states, i.e. the distribution of free energies for the different structures. For each individual sequence the distribution is rather irregular, and there are large fluctuations from sequence to sequence (see Pipas and McMahon [2]). In figure 3a we show the average distribution for the tRNA samples and for the random samples. These are fairly smooth curves. The column heights in the figure represent the average number of structures per sequence in each kcal/mole interval. It will be seen that the total number of structures (area under the curve) is much larger for the tRNA sequences than for the random sequences, and that the tail of the distribution representing the most favourable structures extends to much lower free energies.

The reason that the real samples have a larger number of structures is that they have stems with relatively long complementary sequences. For every stem of 5 base pairs, for example, there are shorter stems with lengths 3 or 4 formed by partially unzipping the 5 pair stem. Thus sequences with the possibility of forming relatively long stems automatically have a larger number of stems in total and hence a larger number of structures. (Only structures with correctly matched base pairs have been considered here. It would also have been possible to permit pairing between any regions of the chain and to assign large unfavourable free energies to incorrectly matched pairs. In this case there would be the same number of structures for all chains of the same length. The density of states would then extend to positive free energies with respect to the unfolded states, but would differ very little at energies close to the ground state. We have not done this since it would increase enormously the total number of states.)

[Figure 2: histogram; horizontal axis: free energy (kcal/mole); series: tRNA and random.]

Fig. 2. Histogram of ground state free energies for tRNA (32 sequences) compared to random sequences (1000 sequences of length 76). The range has been divided into boxes of width 3 kcal/mole.

Table I. Average values of some parameters compared for tRNA and random sequences, and for tRNA where the non-bonding bases were replaced by standard bases. Close competitors are structures (or local minima) within 5 kcal/mole of the ground state.

Quantity                                    tRNA     random        tRNA (no non-bonding bases)
Ground state energy (kcal/mole)             -29.7    -16.5         -30.1
Mean number of structures per sequence      1544     489 ± 10      3081
  of which close competitors                11.0     33.5 ± [...]  21.6
Mean number of local minima per sequence    152      73 ± 3        277
  of which close competitors                3.8      14.0 ± 0.5    6.9

The free energy distributions in figure 3a are of course measured relative to the completely unfolded state with no base pairs. It is also of interest to measure the average distribution of free energies relative to the ground state. Figure 3b shows the same data as figure 3a, but the density of states for each sequence has been shifted so that the ground state is at zero, and the shifted distributions have been averaged. We see that even though the random sequences have fewer structures in total, they have more structures at energies close to the ground state than do the tRNA sequences. Note that figures 3a and 3b do not have the same shape, since each of the densities of states contributing to the average has been shifted relative to its particular ground state.

[Figure 3: four histogram panels a) to d); horizontal axis: free energy; panels a) to c) compare tRNA with random sequences, panel d) compares 5S rRNA with random sequences.]

Fig. 3. a) Average density of states for tRNA compared to random sequences (same samples as Fig. 2). b) Average density of states for same samples, with free energy measured relative to ground state. Whilst the total number of structures is larger for the tRNA samples, the number of structures close to the ground state is smaller. c) Average density of local minima states measured relative to ground state for same samples. d) Density of local minima states relative to the ground state for E. coli 5S rRNA compared to the average density for 20 random sequences of length 120.

Some statistics are presented in table I (columns 1 and 2). We see that the tRNA sequences have roughly three times as many folded configurations as the random sequences, whilst only about one third as many close competitors with the ground state. We have taken the number of close competitors as the number of structures within 5 kcal/mole of the ground state (including the ground state itself).

This trend is enhanced if we look only at structures which are local free energy minima. A local minimum is a structure to which it is not possible to add any further base pairs without breaking some of the pairs which are already present. A structure consisting of the set of stems (A, B, ..., K) is a local minimum if

(i) there is no stem L, not a member of the set, which is compatible with all members of the set, and,

(ii) none of the members of the set can grow into a longer stem without becoming incompatible with another member of the set.

The first condition is straightforward. If there were another stem which could be added without disrupting the original set of stems, then the original set cannot be a local minimum of free energy. The second condition requires more explanation. There may be, for example, a 3 base pair stem which can "grow" into a 4 base pair stem by addition of a further complementary pair on the end. These two stems would be defined as incompatible, since a structure may contain either one or the other, but not both. For every structure containing the 3 pair stem there will usually be a structure containing the 4 pair stem instead and having a lower free energy. The first structure is therefore not a free energy minimum. However, it may be that whilst the 3 pair stem was compatible with the other stems in the structure, the 4 pair stem would not be. In this case the structure with the 3 pair stem would be a local free energy minimum.

It is clear that there will always be a certain number of structures close to the ground state in which some of the stems of the ground state have become partially "unzipped" at the ends. These states cannot be considered as true alternative secondary structures to the ground state. On the other hand the local minima defined above represent real alternative structures because they contain base pairs which are not present in the ground state. Thus if we want to know how many alternative structures are in close proximity to the ground state, it is interesting to look at the distribution of local minima (Fig. 3c). The effect observed in figure 3b is enhanced: whilst the total number of local minima is larger for the tRNA sequences than the random sequences, the number of local minima close to the ground state is smaller (see also Tab. I). This implies that the ground state of the tRNA sequences is more stable than the ground state of a typical random sequence, both relative to the unfolded state, and relative to alternative competing structures.

In order to see if this trend was found in other types of RNA molecules we looked at the following sequences: 5S ribosomal RNA (Rogers et al. [12]), plant viral RNA (Ahlquist et al. [13]) and fragments from the Tetrahymena intervening sequence (Cech et al. [14], Williams and Tinoco [6]). In each case the real sequences were compared to random sequences of the same length and the same base composition. The base composition is different for the different classes of molecule, but none of them contains any of the non-bonding bases present in tRNA.

In most of the cases clear differences between random and real sequences were observed with the same features apparent as for tRNA. The distribution of local minima for E. coli 5S ribosomal RNA is shown in figure 3d as an example. We have not analysed enough of these longer sequences to have reliable statistics, and therefore in the rest of the paper only tRNA will be considered.

4. Thermodynamic behaviour.

Having calculated the density of states it is possible to obtain any thermodynamic quantities required. Firstly the partition function $Z$ is

$$Z = \sum_{\alpha} e^{-G(\alpha)/kT} \qquad (1)$$

where $G(\alpha)$ is the free energy of structure $\alpha$. The sum is to be taken over all states, not just the local minima states. One quantity of interest is the probability $W_0$ that the molecule is in its ground state. If $G_0$ is the ground state free energy, then the weight of the ground state is

$$W_0 = \frac{1}{Z}\, e^{-G_0/kT} \qquad (2)$$

Figure 4a shows the sample average value $\langle W_0 \rangle$ as a function of temperature for tRNA and random sequences.
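Equations (1) and (2) can be evaluated directly from a list of structure free energies. The sketch below is an illustration, not the original program; the function name and the default `kT = 0.6` kcal/mole (the value the text associates with 300 K) are assumptions. Energies are shifted by $G_0$ before exponentiating, which leaves $W_0$ unchanged but avoids overflow for strongly negative free energies.

```python
import math


def ground_state_weight(energies, kT=0.6):
    """Return W0 = exp(-G0/kT) / Z for a list of free energies (kcal/mole).

    The list should contain every state included in the sum of eq. (1),
    for instance the unfolded state at 0.0 if it is counted as a state.
    """
    g0 = min(energies)
    # Z * exp(G0/kT): each term is exp(-(G - G0)/kT) <= 1, so no overflow.
    z_shifted = sum(math.exp(-(g - g0) / kT) for g in energies)
    return 1.0 / z_shifted
```

For example, a single competitor lying exactly $kT$ above the ground state reduces $W_0$ from 1 to $1/(1 + e^{-1}) \approx 0.73$.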

The $G(\alpha)$ are free energies containing both entropic and enthalpic parts. Each state $\alpha$ is not a true microstate, but may be thought of as a sum over all microstates with a given set of bonds. The $G(\alpha)$ are thus functions of temperature. We have used the experimental values measured at temperatures close to 300 K. In figure 4 we have assumed these values to be fixed independent of temperature. The temperature scale in figure 4 is artificial and determines the relative weight given to the ground state and its competitors.

As $T \to 0$ only the ground state is selected and as $T \to \infty$ all structures are present with equal probability. Only the temperature $T = 300$ K, at which point $kT \approx 0.6$ kcal/mole, corresponds to a situation occurring naturally. A real molecule at $T \gg 300$ K will of course unfold completely since the unfolded state (with no bonds) has the largest entropy. This is not equivalent to the high temperature limit in figure 4. To plot quantities as a function of "real" temperature would require experimental data at many different temperatures.

We see in figure 4a that $\langle W_0 \rangle$ is significantly higher for tRNA than for random sequences, and is close to 1 for tRNA at 300 K, indicating that a real molecule will almost always be in its ground state. In fact $\langle W_0 \rangle = 0.83$ for tRNA and 0.52 for random sequences at 300 K.

[Figure 4: two panels plotted against kT on a logarithmic axis from 0.1 to 100, with curves for tRNA, random and mirror image sequences.]

Fig. 4. a) Average weight $W_0$ of the ground state as a function of temperature for tRNA, random, and mirror image sequences ($N = 76$ in each case). The dotted line at $kT = 0.6$ corresponds to 300 K. See text for meaning of temperature scale.

The thermal energy $kT = 0.6$ is rather small compared with the bond free energies (in the range 1.2-4.8 for Watson-Crick pairs, and 0.3 for GU), hence excitations from the ground state are rather costly, and $\langle W_0 \rangle$ is fairly large even for the random sequences.

A special class of sequences which will have a particularly stable ground state are mirror image sequences capable of folding into a single hairpin loop. Mirror image sequences were generated by choosing the first half of the molecule to be a random sequence, and setting the second half to be the exact complementary sequence to the first half. As shown in figure 4a, $\langle W_0 \rangle$ is much larger for mirror image sequences than for random sequences. At 300 K $\langle W_0 \rangle = 0.97$ for mirror images. Real tRNA has a behaviour intermediate between the two extremes. (Note that the mirror image sequences in figure 4 contained no non-bonding bases, whilst the random sequences contained the same fraction of non-bonding bases as tRNA, to allow proper comparison.)
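A mirror image sequence of this kind can be generated as in the following sketch. Two points are assumptions, since the text does not state them: the first half is drawn uniformly from A, C, G, U, and the length is taken to be even. The second half is the complement of the first half read backwards, so that base $i$ faces its complement at the mirrored position and the chain can fold into a single hairpin.

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def mirror_image_sequence(length=76, rng=None):
    """Random first half followed by its mirrored complement (even length)."""
    rng = rng or random.Random()
    half = [rng.choice("ACGU") for _ in range(length // 2)]
    # Reverse the first half so that position i pairs with position length-1-i.
    second = [COMPLEMENT[base] for base in reversed(half)]
    return "".join(half + second)
```

The innermost few positions cannot actually pair because a hairpin loop needs at least 3 unpaired bases; the folding model, not the generator, enforces that.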

It is also interesting to measure the typical difference of the configuration from the ground state configuration at finite temperature. The configuration $\alpha$ is defined by the bond variables $b^{\alpha}(i)$ in the following way. If bases $i$ and $j$ are paired then set $b^{\alpha}(i) = j$ and $b^{\alpha}(j) = i$. If $i$ is unpaired then set $b^{\alpha}(i) = 0$. We will define the distance $D^{\alpha\beta}$ between configurations $\alpha$ and $\beta$ as simply the number of bases $i$ for which $b^{\alpha}(i) \neq b^{\beta}(i)$. $D^{\alpha\beta}$ is a generalization of the Hamming distance often used for sequence comparison [4]. In figure 4b we show the average distance $D$ from the ground state as a function of $kT$,

$$D = \frac{1}{Z} \sum_{\alpha} D^{\alpha 0}\, e^{-G(\alpha)/kT} \qquad (3)$$

where $D^{\alpha 0}$ is the distance of configuration $\alpha$ from the ground state. $D$ is thus sensitive to alternative structures differing widely from the ground state.
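The bond variables and the thermal average of equation (3) can be sketched as below. This is an illustration under stated assumptions: configurations are given as lists of base pairs with 1-based positions (so that 0 can mean "unpaired", as in the text), and the function names and the default `kT = 0.6` are hypothetical choices.

```python
import math


def bond_vector(pairs, length):
    """b(i) = partner of base i (1-based), or 0 if base i is unpaired."""
    b = [0] * (length + 1)
    for i, j in pairs:
        b[i] = j
        b[j] = i
    return b[1:]


def distance(pairs_a, pairs_b, length):
    """Number of bases whose bond variable differs between two configurations."""
    b_a = bond_vector(pairs_a, length)
    b_b = bond_vector(pairs_b, length)
    return sum(1 for x, y in zip(b_a, b_b) if x != y)


def mean_distance(structures, energies, length, kT=0.6):
    """Boltzmann-weighted average distance from the lowest-energy structure."""
    g0 = min(energies)
    ground = structures[energies.index(g0)]
    weights = [math.exp(-(g - g0) / kT) for g in energies]  # shifted by G0
    z = sum(weights)
    return sum(w * distance(s, ground, length) for s, w in zip(structures, weights)) / z
```

With this definition, unzipping one pair from the end of a stem gives a distance of 2 and losing a whole 3 base-pair stem gives 6, since each broken pair changes the bond variable of two bases.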

As expected we find that $D$ is significantly larger for random sequences than tRNA (Fig. 4b). At 300 K $D \approx 1.7$ for tRNA and $D \approx 8.2$ for random sequences. The figure shows $D/N$ with $N = 76$ in each case. A $D$ of around 2 indicates simple excitations of the ground state such as unzipping one base pair from the end of a stem. When $D \approx 8$ significant changes in the secondary structure are present: loss of a 3 base-pair stem would give $D = 6$ for example.

The behaviour of mirror image sequences is also shown in figure 4b. A rather abrupt change in $D$ is visible in this case as the temperature is increased. At 300 K $D \approx 0.067$, indicating almost no excitation from the ground state. This is simply because there are so few accessible excited states for mirror image molecules.

In the thermodynamic limit ($N \to \infty$) mirror image sequences behave very differently from random sequences. This can be seen by comparing sequences of three different lengths (30, 50 and 76) in figure 5. For random sequences $W_0$ decreases as $N$ increases over the whole of the temperature range. This is because as $N$ increases the number of competing structures close to the ground state will also increase, and the weight of the ground state will decrease at all non-zero temperatures. On the other hand, for mirror image sequences the curves for $W_0$ superpose at low temperatures, and decrease with $N$ at high temperatures. This indicates the presence of a phase transition in the limit $N \to \infty$. From figure 5b the transition temperature is approximately $kT_c \approx 10$-$12$. Below $T_c$ the ground state has a finite weight, independent of $N$, whilst above $T_c$, $W_0$ is a function of $N$ and decreases to zero as $N \to \infty$. $W_0$ is finite at $T < T_c$ since the number of states at accessible energy levels does not increase with $N$.

The corresponding behaviour for $D/N$ is also shown in figure 5. For random chains we expect $D \sim N$ for all temperatures, and so for large $N$ the curves of $D/N$ should superpose. The fact that the three curves in figure 5a do not superpose is presumably due to finite size effects in these rather short chains. For mirror image sequences at $T < T_c$, $D$ is a function of temperature only and not of $N$, thus $D/N \to 0$ as $N \to \infty$. At $T > T_c$ we expect $D \sim N$ for large $N$. Finite size effects are again rather large in figure 5b. The crossing of the three curves for $D/N$ is an indication of the phase transition for large $N$. The transition is first order, i.e. quantities such as $W_0$, $D$ and the energy of the system will change discontinuously at $T_c$ in the limit $N \to \infty$.

[Figure 5: two panels, a) random and b) mirror; horizontal axis $kT$ from 0.1 to 100, vertical axis from 0 to 1; curves labelled by $N$.]

Fig. 5. Dependence of $W_0$ and $D$ on $N$ for random and mirror image sequences. Figures show $W_0$ (decreasing curves) and $D/N$ (increasing curves) as functions of temperature for chains of length $N$ = 30, 50 and 76. For the mirror image molecules there is a low temperature phase for which $W_0$ is finite even as $N \to \infty$, whereas for random sequences there is no phase transition and $W_0$ decreases with $N$ at all temperatures.

As stated above, the temperature scale used here is artificial because we have treated the free energies of the states as simple energies which do not change with temperature. The thermodynamic behaviour would be the same if the temperature dependence of the states were treated properly. For the mirror image molecules there would be a phase transition at the temperature where the ground state free energy is equal to the free energy for the sum of the other states. For a typical random sequence there would be no such transition, and thermodynamic quantities would change smoothly.
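The condition for the transition can be illustrated with a deliberately crude two-level model. This model is an assumption introduced here for illustration and is not the calculation performed on the RNA sequences: a single ground state of energy $-\epsilon N$ competes with $e^{sN}$ excited states at zero energy, so that $W_0 = 1/(1 + e^{N(s - \epsilon/kT)})$. The two free energies are equal at $kT_c = \epsilon/s$, and $W_0$ sharpens into a step there as $N$ grows. The parameter values `eps` and `s` are hypothetical and chosen only to place $kT_c$ at 10.

```python
import math


def ground_state_weight(n, kT, eps=1.0, s=0.1):
    """W0 for one ground state (energy -eps*n) against exp(s*n) states at energy 0.

    eps and s are illustrative parameters, not values fitted to any sequence.
    """
    # x is the log of (total excited-state weight / ground-state weight).
    x = n * (s - eps / kT)
    if x > 700:  # math.exp would overflow; the weight is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


if __name__ == "__main__":
    for kT in (5.0, 8.0, 10.0, 12.0, 20.0):
        row = ", ".join(
            f"N={n}: {ground_state_weight(n, kT):.3f}" for n in (30, 50, 76)
        )
        print(f"kT={kT:5.1f}  {row}")
```

In this toy model the three curves cross at $kT_c$ and fall with $N$ above it. It reproduces only the sharpening into a step, not the detailed low temperature behaviour of the mirror image sequences.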

Thus real tRNA molecules have a ground state which is considerably more stable than a typical random sequence, but less stable than the extreme case of the mirror image. The mirror image molecules are a simple example of a system with a low temperature phase which is ground state dominated. Other examples are discussed in section 6. The low temperature phase may be termed frozen, since the configurational entropy is not extensive at $T < T_c$. A typical random sequence has an extensive entropy at all temperatures.

5. Effect of small changes in the sequence.

In order to investigate the role of the modified bases in tRNA structure, we calculated the ground state for the same set of tRNA sequences, but the modified bases which had previously been treated as non-bonding were treated as the equivalent unmodified base. In most cases this did not affect the prediction of the ground state; however, in two cases where the clover-leaf was predicted successfully before, an alternative lower free energy structure was found when the modified bases were replaced by standard bases.

We see in table I that replacement of non-bonding bases by standard bases leads to only a very small decrease in the mean ground state energy. However, there is a much greater change in the number of structures. There are now almost twice as many structures and close competitor structures as before. Thus it would appear that the non-bonding bases may play an important role in eliminating alternative structures to the clover-leaf. Ninio [3] has also looked at the effects of non-bonding bases and finds that the predictability of the clover-leaf is significantly reduced if the non-bonding bases are treated as bonding.

One example where a modified base was found to be important was in the tRNA from T. utilis shown in figure 1 (sequence 092 from the catalogue [8]). When the methylated G at position 26 was treated as non-bonding, the clover-leaf structure was predicted as the ground state (Fig. 1a); however, when it was treated as a standard base G, the alternative structure of figure 1b was found. Changing the base permits an alternative lower free energy structure to form.

In general the ground state configuration is extremely sensitive to small changes in the sequence. If in the same tRNA molecule the base G at position 69 is replaced by an A, then the resulting sequence has the ground state shown in figure 1c. Changing base 69 disrupts the acceptor stem, but this leads to a change in the configuration of a large fraction of the molecule, not just the acceptor stem itself. Only the TΨ loop is conserved in all three examples.

We have carried out a systematic study of the effect of mutations in the sequence on the resulting secondary structure. Firstly, we looked at one-point mutations. For each of the 32 tRNA sequences in the sample, mutated sequences were generated which differed by one base from the real sequence. The mutated base was chosen to be either A, C, G, U or non-bonding with the same probabilities as given in section 3 (but was forced to be different from the original base). One mutated sequence was formed for each base on the chain, i.e. a total of approximately 76 × 32 sequences were analyzed. Each mutated sequence was classed according to whether its ground state energy had increased, decreased, or remained the same relative to the original sequence. The probabilities $p_{inc}$, $p_{dec}$, $p_{same}$ of these three possibilities are given in figure 6a. Two- and three-point mutations were also analyzed. For each tRNA sequence 76 mutant sequences were generated differing at 2 (or 3) randomly chosen points.
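The one-point mutation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the program used in the study: the symbol `X` for a non-bonding base, the uniform probabilities in `UNIFORM` (a placeholder for the composition given in section 3), the function names, and the `energy` callable that returns a ground state energy for a sequence are all hypothetical.

```python
import random

# Placeholder probabilities; 'X' stands for a non-bonding base (label chosen here).
UNIFORM = {symbol: 0.2 for symbol in "ACGUX"}


def point_mutants(seq, probs, rng):
    """Return one single-point mutant per position of seq.

    The replacement base is drawn from probs and redrawn until it differs
    from the original base at that position.
    """
    symbols = list(probs)
    weights = [probs[s] for s in symbols]
    mutants = []
    for i, original in enumerate(seq):
        new = original
        while new == original:
            new = rng.choices(symbols, weights=weights, k=1)[0]
        mutants.append(seq[:i] + new + seq[i + 1:])
    return mutants


def classify(seq, mutants, energy):
    """Fractions of mutants whose ground state energy rose, fell, or stayed equal."""
    e0 = energy(seq)
    counts = {"inc": 0, "dec": 0, "same": 0}
    for m in mutants:
        e = energy(m)
        key = "inc" if e > e0 else "dec" if e < e0 else "same"
        counts[key] += 1
    return {k: v / len(mutants) for k, v in counts.items()}
```

Passing a seeded `random.Random` instance as `rng` makes the set of mutants reproducible. Any structure prediction routine can be supplied as `energy`, since the classification only compares the returned values.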

[Figure 6: three panels; a) tRNA and b) random show $p_{same}$, $p_{inc}$ and $p_{dec}$ in % against the number of mutations; c) is plotted against the number of mutations from 0 to 3.]

Fig. 6. a) Comparison of tRNA with mutant sequences. $p_{inc} > p_{dec}$, indicating that mutant sequences have less stable ground state structures than tRNA. b) Mutations made to random sequences leave the chains statistically equivalent, therefore $p_{inc} = p_{dec}$ for random sequences. c) The number of close competitor structures within 5 kcal/mole of the ground state increases as mutations are made to tRNA sequences. The original tRNA sequences are shown at 0 mutations. The dotted line indicates the value for random sequences.

In each case $p_{inc}$ is significantly greater than $p_{dec}$, indicating that making mutations tends to destabilize the ground state on average. The mutant sequences also have a reduced total
