
Codon Usage in the terminal region of E. coli genes


Article

Reference

Codon Usage in the terminal region of E. Coli genes

STEINBERGER, Cynthia

Abstract

A comparison of codon usage in the region close to the termination codon in E. coli genes with the average E. coli codon usage shows that those codons which differ from termination codons by one base change, called pretermination codons, appear more frequently at the end of the gene. The higher frequency of pretermination codons in this region might be due to single base mutations of previously existing multiple termination codons. In addition, a comparison is made of termination codon usage, tandem termination frequency, and termination context in E. coli, H. sapiens and bacteriophage T4.

STEINBERGER, Cynthia. Codon Usage in the terminal region of E. Coli genes. Life Science Advances Molecular Genetics, 1988, vol. 7, p. 141-145

Available at:

http://archive-ouverte.unige.ch/unige:127566

Disclaimer: layout of this document may differ from the published version.


Mol. Gen. (Life Sci. Adv.) 1988, 7: 141-145

Codon Usage in the terminal region of E. coli genes

C. Alff-Steinberger

Department of Molecular Biology, University of Geneva, 30 quai Ernest-Ansermet, 1211 Genève 4, Switzerland.

ABSTRACT

A comparison of codon usage in the region close to the termination codon in E. coli genes with the average E. coli codon usage shows that those codons which differ from termination codons by one base change, called pretermination codons, appear more frequently at the end of the gene. The higher frequency of pretermination codons in this region might be due to single base mutations of previously existing multiple termination codons. In addition, a comparison is made of termination codon usage, tandem termination frequency, and termination context in E. coli, H. sapiens and bacteriophage T4.

INTRODUCTION

The appearance of multiple termination codons in prokaryotes has been observed and commented upon for over a decade (Watson, 1976). It is not clear to what extent the tandem stops observed are still functional, or whether they are evolutionary remainders from a period when translation termination was less efficient than at present. In this study, the codon usage of 201 Escherichia coli genes was studied, and in particular, the usage of codons in the region close to the termination codon was compared to the average usage. It is found that in a small region 5' to the termination codon, there is a significant increase in the appearance of pretermination codons (by which term are denoted the 18 codons which differ from the termination codons by only one base). If use of multiple termination codons were a common feature of the ancestor of E. coli, but no longer required by the translation system of the modern prokaryote, single base substitution mutations of some of the redundant termination codons would lead to an enrichment of pretermination codons at the end of the gene.

This seems to be a likely explanation of the effect observed in the present study. On the 3' side of the termination codon, the triplet observed is also frequently a pretermination codon, or a termination codon, or another triplet beginning with U, confirming the importance of context on translation termination efficiency (Fluck et al., 1977; Kohli and Grosjean, 1981).
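The definition of a pretermination codon given above can be checked mechanically. The following is a minimal sketch, not the author's program: it assumes the three standard termination codons UAA, UAG and UGA, and the function names are chosen here for illustration only. It enumerates every sense codon that lies one base substitution away from a termination codon.

```python
BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard termination codons (mRNA alphabet)


def one_base_neighbours(codon):
    """Return every codon that differs from `codon` at exactly one position."""
    neighbours = set()
    for i, original in enumerate(codon):
        for base in BASES:
            if base != original:
                neighbours.add(codon[:i] + base + codon[i + 1:])
    return neighbours


def pretermination_codons():
    """Sense codons that are one substitution away from at least one stop codon."""
    candidates = set()
    for stop in STOP_CODONS:
        candidates |= one_base_neighbours(stop)
    # A stop codon that neighbours another stop codon is not a sense codon.
    return candidates - STOP_CODONS


PRETERMINATION = pretermination_codons()
print(len(PRETERMINATION))
print(sorted(PRETERMINATION))
```

The resulting set has 18 members, in agreement with the count stated in the text, and it is the same set of codons that carries an asterisk in Tables 1-4.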

MATERIALS AND METHODS

Sequence Selection

The sequence data were obtained from the Nucleotide Sequence Data Library, Release 7, of the European Molecular Biology Laboratory (EMBL), Heidelberg, West Germany. Complete E. coli coding regions for known genes were selected. In the cases where more than one sequence was given for the same gene, one of the sequences was arbitrarily chosen to be included in the tabulation. Genes were rejected if there were gaps in the sequence, if the gene product was not identified, if the gene was plasmid related, if the initiation or termination codon was missing, if the length of the coding region was not a multiple of three, or if there were inconsistencies in the data library entry. In this way, 201 complete genes were selected.
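The structural part of these rejection criteria could be expressed as a simple filter. This is only a sketch under stated assumptions: the paper does not say which initiation codons were accepted, so the set ATG, GTG, TTG used here is an assumption, and the function name and its two flags are hypothetical stand-ins for annotation that would come from the data library entry.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # DNA alphabet, as stored in the library
# Assumption: the accepted initiation codons are not listed in the paper.
START_CODONS = {"ATG", "GTG", "TTG"}


def passes_selection(coding_dna, product_known=True, plasmid_related=False):
    """Return True if a coding region meets the criteria described above.

    `product_known` and `plasmid_related` are hypothetical flags that a
    caller would derive from the annotation of the library entry.
    """
    seq = coding_dna.upper()
    if not product_known or plasmid_related:
        return False
    if set(seq) - set("ACGT"):            # gaps or ambiguity symbols
        return False
    if len(seq) < 6 or len(seq) % 3 != 0:  # must hold whole codons
        return False
    if seq[:3] not in START_CODONS:        # initiation codon missing
        return False
    if seq[-3:] not in STOP_CODONS:        # termination codon missing
        return False
    return True
```

The check for "inconsistencies in the data library entry" is not modelled, since the text does not specify what counted as an inconsistency.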

Analysis

The selection, tabulation, and statistical analyses of the data were done using the CDC Cyber computer of the Cantonal Hospital in Geneva, using programs written by the author. The EMBL Data Library usually contains, for DNA entries, the sequence of the non-coding strand, which is homologous to the mRNA transcribed from the coding strand. The codon frequencies shown in Tables 1-4 are those of the mRNA.
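The original programs are not reproduced in the paper. As an illustration of the kind of tabulation described, the sketch below converts a library strand to mRNA and counts codons over the whole gene and over the last few codons before the termination codon. All function names and the two toy sequences are hypothetical.

```python
from collections import Counter


def to_mrna(dna):
    """The library strand is homologous to the mRNA, so only T -> U is needed."""
    return dna.upper().replace("T", "U")


def split_codons(mrna):
    """Split an in-frame coding sequence into consecutive triplets."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


def whole_gene_counts(genes):
    """Codon counts over entire coding regions, stop codon included."""
    counts = Counter()
    for dna in genes:
        counts.update(split_codons(to_mrna(dna)))
    return counts


def terminal_counts(genes, k):
    """Codon counts for the k codons immediately 5' of the stop codon."""
    counts = Counter()
    for dna in genes:
        codons = split_codons(to_mrna(dna))
        # codons[-1] is the stop codon; the slice takes up to k codons before it.
        counts.update(codons[-1 - k:-1])
    return counts


# Toy sequences for illustration only (not taken from the data library).
toy_genes = ["ATGAAAGAATAA", "ATGTTGTGGTAG"]
print(whole_gene_counts(toy_genes))
print(terminal_counts(toy_genes, 2))
```

Calling `terminal_counts` with `k=3` and `k=10` on the selected genes would give tallies of the same form as the terminal-region tables.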


RESULTS AND DISCUSSION

This study is concerned with codon usage in the terminal region of the gene. For this purpose, the codons are numbered in the 5' direction, starting with "0" for the termination codon, "1" for the next codon upstream, etc. The codon frequencies for the 201 E. coli genes are given in Table 1, for the entire gene. In Table 2, the codon frequencies are given for the 3 codons adjacent to the termination codon in the 5' direction, which are called codons (1-3), and in Table 3, for the 10 codons adjacent to the termination codon in the 5' direction, which are called codons (1-10). In Table 4 the distribution of the triplet following the termination codon in the 3' direction is given. Of our original sample of 201 genes, only 185 sequences extended beyond the termination codon. Tables 1-3 therefore contain entries from 201 genes, and Table 4 contains entries from 185 genes. The 18 pretermination codons are marked with an asterisk (*). The remaining codons, which are neither termination nor pretermination codons, are called here non-pretermination codons.

Table 1: Codon distribution for 201 complete E. coli genes

UUU  1039 PHE    UCU   842 SER    UAU*  912 TYR    UGU*  279 CYS
UUC  1458 PHE    UCC   803 SER    UAC* 1056 TYR    UGC*  365 CYS
UUA*  549 LEU    UCA*  331 SER    UAA   153 TERM   UGA    38 TERM
UUG*  660 LEU    UCG*  474 SER    UAG    10 TERM   UGG*  692 TRP
CUU   536 LEU    CCU   381 PRO    CAU   609 HIS    CGU  2045 ARG
CUC   598 LEU    CCC   201 PRO    CAC   816 HIS    CGC  1451 ARG
CUA   146 LEU    CCA   489 PRO    CAA*  820 GLUN   CGA*  148 ARG
CUG  4136 LEU    CCG  1777 PRO    CAG* 2210 GLUN   CGG   164 ARG
AUU  1635 ILEU   ACU   776 THR    AAU   866 ASPN   AGU   376 SER
AUC  2273 ILEU   ACC  1695 THR    AAC  1907 ASPN   AGC   998 SER
AUA   162 ILEU   ACA   318 THR    AAA* 2804 LYS    AGA*  102 ARG
AUG  1854 MET    ACG   687 THR    AAG*  870 LYS    AGG    55 ARG
GUU  1690 VAL    GCU  1493 ALA    GAU  2073 ASP    GGU  2340 GLY
GUC   861 VAL    GCC  1520 ALA    GAC  1751 ASP    GGC  2151 GLY
GUA   957 VAL    GCA  1463 ALA    GAA* 3243 GLU    GGA*  333 GLY
GUG  1643 VAL    GCG  2302 ALA    GAG* 1352 GLU    GGG   496 GLY

Total number of codons = 69234

Table 2: Codon distribution for codons (1-3) on the 5' side of the termination codon for 201 E. coli genes.

UUU     8 PHE    UCU     8 SER    UAU*    5 TYR    UGU*    2 CYS
UUC    10 PHE    UCC     8 SER    UAC*    8 TYR    UGC*    2 CYS
UUA*    6 LEU    UCA*    3 SER    UAA     0 TERM   UGA     0 TERM
UUG*   13 LEU    UCG*    2 SER    UAG     0 TERM   UGG*   11 TRP
CUU     5 LEU    CCU     2 PRO    CAU     9 HIS    CGU    14 ARG
CUC     2 LEU    CCC     2 PRO    CAC     1 HIS    CGC    12 ARG
CUA     0 LEU    CCA     2 PRO    CAA*   11 GLUN   CGA* [illegible] ARG
CUG    28 LEU    CCG     5 PRO    CAG*   28 GLUN   CGG     3 ARG
AUU     7 ILEU   ACU     6 THR    AAU     9 ASPN   AGU     5 SER
AUC    11 ILEU   ACC     6 THR    AAC    11 ASPN   AGC  [illegible] SER
AUA     2 ILEU   ACA     1 THR    AAA*   52 LYS    AGA* [illegible] ARG
AUG     9 MET    ACG     4 THR    AAG*   30 LYS    AGG     1 ARG
GUU    18 VAL    GCU    21 ALA    GAU    11 ASP    GGU    15 GLY
GUC     8 VAL    GCC     8 ALA    GAC    13 ASP    GGC    12 GLY
GUA     5 VAL    GCA     9 ALA    GAA*   37 GLU    GGA*    4 GLY
GUG    10 VAL    GCG    20 ALA    GAG*   17 GLU    GGG    14 GLY

In Tables 5, 6, and 7, the numbers of pretermination and non-pretermination codons in the terminal and other regions are compared. It is seen that for the codons closest to the termination codon, in positions [1-3], 40% are pretermination codons, while in positions [4-10] and in positions [11 up to but not including initiation], this fraction is 25%. To see if this difference is statistically significant, a chi-square test of independence is used. In Table 5, the numbers of pretermination and non-pretermination codons found close to the termination codon (positions [1-3]) are compared to those found upstream nearby (positions [4-10]). The chi-square of 41.6 found for this table indicates that the probability that the row and column variables are independent is very small, i.e., the higher frequency of pretermination codons in

positions [1-3], compared with positions [4-10], is statistically significant. In Table 6, the numbers of pretermination and non-pretermination codons found close to the termination codon, in positions [1-3], are compared with those found upstream, positions [4 up to but not including initiation]. The chi-square of 71.2 found for this table indicates that the higher frequency of pretermination codons in positions [1-3], compared to positions upstream, is statistically significant. In Table 7, the numbers of pretermination and non-pretermination codons in positions [4-10], which are toward the end of the gene, are compared to those found in the region upstream, positions [11 up to but not including initiation]. The chi-square of 0.26 found for this table indicates that the frequency of pretermination codons found in positions [4-10] is not significantly different from that found in the upstream region. The choice of position 3 for the cut-off point for the chi-square calculation was made by examining the frequency of pretermination codons in positions [1-10], which are, respectively, 76, 86, 78, 46, 55, 56, 57, 38, 67, 39, out of a total of 201 codons at each position. Thus, it is seen that in a small region of length three codons, upstream from the termination codon, there is a significant increase in the frequency of pretermination codons.

Table 3: Codon distribution for codons (1-10) on the 5' side of the termination codon for 201 E. coli genes.

UUU    30 PHE    UCU    19 SER    UAU*   20 TYR    UGU*    5 CYS
UUC    43 PHE    UCC    18 SER    UAC*   25 TYR    UGC*   11 CYS
UUA*   15 LEU    UCA*   12 SER    UAA     0 TERM   UGA     0 TERM
UUG*   27 LEU    UCG*   18 SER    UAG     0 TERM   UGG*   25 TRP
CUU    21 LEU    CCU    13 PRO    CAU    27 HIS    CGU    46 ARG
CUC    14 LEU    CCC     4 PRO    CAC     7 HIS    CGC    44 ARG
CUA     4 LEU    CCA    13 PRO    CAA*   25 GLUN   CGA*   14 ARG
CUG   106 LEU    CCG    34 PRO    CAG*   62 GLUN   CGG    13 ARG
AUU    44 ILEU   ACU    14 THR    AAU    26 ASPN   AGU    16 SER
AUC    52 ILEU   ACC    37 THR    AAC    50 ASPN   AGC    21 SER
AUA     4 ILEU   ACA     7 THR    AAA*  126 LYS    AGA*   10 ARG
AUG    44 MET    ACG    20 THR    AAG*   50 LYS    AGG     4 ARG
GUU    60 VAL    GCU    68 ALA    GAU    49 ASP    GGU    55 GLY
GUC    37 VAL    GCC    35 ALA    GAC    39 ASP    GGC    43 GLY
GUA    25 VAL    GCA    63 ALA    GAA*  100 GLU    GGA*   11 GLY
GUG    50 VAL    GCG    63 ALA    GAG*   42 GLU    GGG    30 GLY

Table 4: Codon distribution for the triplet adjacent to the termination codon on the 3' side for 185 E. coli genes

UUU    11 PHE    UCU     5 SER    UAU*    4 TYR    UGU*    2 CYS
UUC     7 PHE    UCC     3 SER    UAC*    1 TYR    UGC*    5 CYS
UUA*    5 LEU    UCA*    4 SER    UAA    17 TERM   UGA     7 TERM
UUG*    3 LEU    UCG*    8 SER    UAG     6 TERM   UGG*    3 TRP
CUU     0 LEU    CCU     1 PRO    CAU     2 HIS    CGU     2 ARG
CUC     0 LEU    CCC     3 PRO    CAC     2 HIS    CGC     4 ARG
CUA     0 LEU    CCA     2 PRO    CAA*    1 GLUN   CGA*    1 ARG
CUG     2 LEU    CCG     3 PRO    CAG*    1 GLUN   CGG     1 ARG
AUU     1 ILEU   ACU     1 THR    AAU     2 ASPN   AGU     1 SER
AUC     0 ILEU   ACC     4 THR    AAC     2 ASPN   AGC     0 SER
AUA     5 ILEU   ACA     4 THR    AAA*    3 LYS    AGA*    2 ARG
AUG     3 MET    ACG     1 THR    AAG*    4 LYS    AGG     0 ARG
GUU     2 VAL    GCU     0 ALA    GAU     1 ASP    GGU     3 GLY
GUC     1 VAL    GCC     5 ALA    GAC     0 ASP    GGC     1 GLY
GUA     3 VAL    GCA     3 ALA    GAA*    1 GLU    GGA*    7 GLY
GUG     0 VAL    GCG     2 ALA    GAG*    3 GLU    GGG     4 GLY

It can be seen from Table 4 that the frequencies of the triplets adjacent to the termination codon on the 3' side are not random: 30 of the 185 genes have a double stop codon; 58 of the remaining 155 triplets are pretermination codons, which is significantly larger than would be expected if the triplet frequencies followed the genome average, as in Table 1; 91 of the 185 triplets begin with U, in contrast to average codon usage, where U is the least frequent base in the first position. All three stop codons tend to be followed by U. Of the 185 genes in Table 4, 143 terminate in UAA, of which 69 have U on the 3' side, 34 terminate

in UGA, of which 18 have U on the 3' side, and 8 terminate in UAG, of which 4 have U on the 3' side.

A possible explanation for the increased pretermination codon frequencies in the terminal region is that they are remnants of a past usage of multiple termination codons. If the use of multiple termination codons were common in the past, perhaps due to a less efficient translation termination system, and if an increase in the efficiency of translation termination has relaxed the requirement for multiple terminations, then it might be expected that single base mutations occurring in the group of multiple terminations would lead to an increase in the frequency of pretermination codons on both sides of the termination codon, as is observed in the current sample. The frequent appearance of U immediately after the termination codon, as observed previously by Kohli and Grosjean, 1981, suggests that it may have some function in translation termination.

The terminal regions of small samples of Homo sapiens and bacteriophage T4 genes have been investigated. Significant differences are seen between H. sapiens and E. coli. The H. sapiens sample used was that whose codon usage was tabulated by Alff-Steinberger, 1987. The relative frequency of the three stop codons is different in the two species. In the 35 gene H. sapiens sample, the frequencies of UAA, UAG, and UGA were 14, 10, and 11. 33 of these sequences extended beyond the termination codon. Of these 33, only 6 have U on the 3' side of the stop codon, and there is only 1 double termination. Thus, the context of H. sapiens termination is significantly different from that of E. coli on the 3' side. The number of H. sapiens codons in this sample is not large enough to permit a significant measurement of pretermination codon frequency in the terminal region.

Tables 5, 6, and 7: Comparison of the numbers of pretermination and non-pretermination codons in the terminal and other regions.

Table 5

                             Positions [1-3]    Positions [4-10]
Pretermination codons              240                358
Non-pretermination codons          363               1049

Chi-square = 41.6, probability < .0001

Table 6

                             Positions [1-3]    Positions [4 up to but not including initiation]
Pretermination codons              240              16960
Non-pretermination codons          363              51269

Chi-square = 71.2, probability < .0001

Table 7

                             Positions [4-10]   Positions [11 up to but not including initiation]
Pretermination codons              358              16602
Non-pretermination codons         1049              50220

Chi-square = 0.26, probability < 0.61
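The chi-square values reported for Tables 5, 6, and 7 can be recomputed from the four counts in each table. The sketch below assumes the ordinary Pearson statistic for a 2x2 table without continuity correction; the paper does not state which variant was used, but this one appears to agree with the reported figures. It also checks that the per-position counts quoted in the text add up to the pretermination entries of Table 5. The function names are chosen here for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Pretermination codons at positions 1..10, out of 201 codons per position.
PER_POSITION = [76, 86, 78, 46, 55, 56, 57, 38, 67, 39]
print(sum(PER_POSITION[:3]), sum(PER_POSITION[3:]))  # entries of Table 5, first row

# (pretermination, non-pretermination) for the first region, then the second.
TABLES = {
    "Table 5": (240, 363, 358, 1049),
    "Table 6": (240, 363, 16960, 51269),
    "Table 7": (358, 1049, 16602, 50220),
}

for name, (a, b, c, d) in TABLES.items():
    chi2 = chi_square_2x2(a, b, c, d)
    frac_first = a / (a + b)
    frac_second = c / (c + d)
    print(f"{name}: {frac_first:.1%} vs {frac_second:.1%}, "
          f"chi-square = {chi2:.2f}, p = {p_value_df1(chi2):.2g}")
```

The fractions printed for each table correspond to the 40% and 25% figures quoted in the discussion of the terminal and upstream regions.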

Sequences of 37 bacteriophage T4 genes were provided by R. Epstein and collaborators from EMBL and other sources. Analysis of the terminal region of these genes shows that T4 is in some aspects similar to its host E. coli. The frequencies of the three stop codons UAA, UAG, and UGA were 30, 2, and 5, which is consistent with 153, 10, and 38 from E. coli, in Table 1. 36 of the 37 T4 sequences extended beyond the stop codon. Of these 36, 18 have U on the 3' side of the termination codon, of which 6 are double terminations. The double termination frequency is consistent with that observed in E. coli. The T4 genes have more UA content than those of E. coli, so that an increase in U or A in any given position is to be expected. There are 18 U's and 10 A's on the 3' side of the T4 stop codons. Clearly better statistics are needed before one can conclude that the increase in U is significant. As in the case of the H. sapiens sample, the number of T4 codons is not large enough to permit a significant observation of pretermination codon frequency in the terminal region.

Comparing Tables 1, 2, and 3, and looking at each amino acid separately, it is seen that the codon usage for most amino acids in the terminal region is consistent with the usage in the earlier part of the gene, the exceptions being histidine, leucine, arginine, lysine and alanine. For histidine, the codon CAU is much preferred in the terminal region. For leucine and arginine, some of the rarer codons appear more frequently. For lysine, AAG usage is slightly increased in the extreme terminal region. For alanine, the relative usage of its 4 codons is different, with GCU rather than GCG being most frequent. The GC content of the terminal region does not differ from the average GC content of the gene, unlike the region following the initiation codon, where modified codon usage results in lower GC content (Rodier et al., 1982).
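The per-amino-acid comparison can be made concrete by converting counts into relative usage within each synonymous family. The sketch below uses counts transcribed from Table 1 (whole gene) and Table 2 (positions 1-3) for histidine, lysine and alanine; the dictionary layout and function name are illustrative choices, not the author's. The terminal-region counts are small, so the resulting fractions are only indicative.

```python
# Counts transcribed from Table 1 (whole gene) and Table 2 (positions 1-3).
WHOLE_GENE = {
    "HIS": {"CAU": 609, "CAC": 816},
    "LYS": {"AAA": 2804, "AAG": 870},
    "ALA": {"GCU": 1493, "GCC": 1520, "GCA": 1463, "GCG": 2302},
}
TERMINAL_1_3 = {
    "HIS": {"CAU": 9, "CAC": 1},
    "LYS": {"AAA": 52, "AAG": 30},
    "ALA": {"GCU": 21, "GCC": 8, "GCA": 9, "GCG": 20},
}


def relative_usage(codon_counts):
    """Fraction of an amino acid's occurrences encoded by each of its codons."""
    total = sum(codon_counts.values())
    return {codon: n / total for codon, n in codon_counts.items()}


for amino_acid in WHOLE_GENE:
    whole = relative_usage(WHOLE_GENE[amino_acid])
    terminal = relative_usage(TERMINAL_1_3[amino_acid])
    for codon in whole:
        print(f"{amino_acid} {codon}: whole gene {whole[codon]:.2f}, "
              f"positions 1-3 {terminal[codon]:.2f}")
```

With these counts, CAU rises from under half of the histidine codons in the whole gene to nine of the ten in positions 1-3, and GCU overtakes GCG as the most frequent alanine codon, which is the pattern described in the text.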

ACKNOWLEDGMENTS

I am grateful to L. Caro, R. Epstein and H.M. Krisch for discussions of this work. I thank the Computer Department of the Cantonal Hospital of Geneva for offering computer facilities and for user-friendly expertise, and H. Rieben for advice in using the Cyber.

This work was supported in part by grant No. 3.516-0.83 to L. Caro from the Swiss National Science Foundation.

REFERENCES

Alff-Steinberger, C. 1987. J. Theor. Biol., 124:89-95.

Fluck, M.M., Salser, W. & Epstein, R. 1977. Mol. Gen. Genet., 151:137-149.

Kohli, J. & Grosjean, H. 1981. Mol. Gen. Genet., 182:430-439.

Rodier, F., Gabarro-Arpa, J., Ehrlich, R. & Reiss, C. 1982. Nucleic Acids Res., 10:391-401.

Watson, J.D. 1976. Molecular Biology of the Gene, published by W.A. Benjamin, Menlo Park, California.
