
A transposon with an unusual LTR arrangement from Chlamydomonas reinhardtii contains an internal tandem array of 76 bp repeats


DAY, Anil, ROCHAIX, Jean-David


DAY, Anil, ROCHAIX, Jean-David. A transposon with an unusual LTR arrangement from Chlamydomonas reinhardtii contains an internal tandem array of 76 bp repeats. Nucleic Acids Research, 1991, vol. 19, no. 6, p. 1259-1266

DOI : 10.1093/nar/19.6.1259

Available at:

http://archive-ouverte.unige.ch/unige:141773

Disclaimer: layout of this document may differ from the published version.


Nucleic Acids Research, Vol. 19, No. 6 1259


Anil Day* and Jean-David Rochaix

Departments of Molecular and Plant Biology, Sciences II, University of Geneva, 30 Quai Ernest Ansermet, CH-1211 Geneva 4, Switzerland

Received December 13, 1990; Revised and Accepted February 22, 1991

ABSTRACT

TOC1, a transposable element from Chlamydomonas reinhardtii, is 5662 bases long. The 217 and 237 base long terminal repeat sequences of TOC1 are unusually arranged around the 4600 and 123 base unique regions:

[217]-4600-[237][217]-123-[237].

Although TOC1 contains long terminal repeats and most TOC1 elements are complete, features shared with virus-like retroposons, its unique 4600 base region is more similar to the structure of the L1 family of non-virus retroposons: first, 11 3/4 tandemly repeated copies of a 76 base repeat are found 813 bases from the left end of TOC1, and second, using the universal genetic code, large open reading frames were not found in TOC1. The relationship between TOC1, virus-like retroposons and the L1 family of non-virus retroposons is unclear and may be very distant since only poor similarity was found between the TOC1 encoded ORFs and retrovirus polypeptides. The length of the tandem array of 76 base repeat sequences was conserved in most TOC1 elements and solo 76 base repeat sequences were not found outside TOC1 elements in the C. reinhardtii genome. Nucleotide substitutions allow all copies of the 76 base repeat to be distinguished from one another.

EMBL accession no. X56231

INTRODUCTION

Dispersed repetitive DNA sequences that transpose via an RNA intermediate have been called retroposons [1]. One subclass of retroposons, virus-like retroposons or retrotransposons [2, 3, 4], contain long terminal repeats or LTRs and resemble the integrated provirus DNA stage of animal retroviruses [5]. A second subclass of retroposons, nonvirus-like retroposons, is composed of a diverse set of DNA sequences that can be grouped on the basis that they lack LTRs, produce a target site duplication of varying length and are often flanked by an oligo dA tract at the 3' end of their sense strand. Some members of this subgroup, e.g. L1 elements from mammalian genomes [3, 6], encode a putative reverse transcriptase. Although retroposons were first identified in animals, examples of both virus-like retroposons [7, 8, 9, 10, 11, 12, 13] and nonvirus-like retroposons have been found in plants [14].

We have previously isolated a 5.7 kbp transposon, named TOC1, from the nuclear genome of the unicellular green alga Chlamydomonas reinhardtii [15]. TOC1 causes mutations and was discovered as an insertion in the oxygen evolving enhancer polypeptide one (OEE1) gene. The classification of TOC1 poses a problem because, although it contains LTRs with a 5-bp inverted repeat located 4 bp from the end that follows the 5' TG 3' CA rule of retrotransposons, the arrangement of these LTRs is unlike that of any other retrotransposon: the left end of TOC1 contains part of the LTR (217 bp repeat), the remainder of which (237 bp repeat) is present at its extreme right end and is separated from the complete right LTR (tandem 237-/217 bp repeats) by a unique 123 bp sequence (see Figure 4).
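The arrangement described above can be laid out as coordinates. The following is a minimal sketch only: it assumes the six segments are contiguous, that the repeat and 123 base lengths are exact, and that the "4600" base unique region is a rounded figure whose true length follows from the 5662 base total. The names `SEGMENTS` and `layout` are illustrative and not from the paper.

```python
# Segment lengths in bases, left to right, as quoted in the text.
# The large unique region is quoted as "4600" (apparently rounded), so it is
# left as None and derived from the total length: this is an assumption.
TOTAL_LENGTH = 5662
SEGMENTS = [
    ("217 repeat (left)", 217),
    ("large unique region", None),
    ("237 repeat", 237),
    ("217 repeat", 217),
    ("123 unique region", 123),
    ("237 repeat (right)", 237),
]


def layout(segments, total):
    """Return (name, first_base, last_base) for each segment, 1-based inclusive."""
    known = sum(length for _, length in segments if length is not None)
    missing = total - known  # length assigned to the single unknown segment
    coords, start = [], 1
    for name, length in segments:
        size = missing if length is None else length
        coords.append((name, start, start + size - 1))
        start += size
    return coords


for name, first, last in layout(SEGMENTS, TOTAL_LENGTH):
    print(f"{name:22s} {first:5d}-{last:5d}")
```

Under these assumptions the large unique region comes out at 4631 bases, which is consistent with the rounded value of 4600 used in the text.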

A simple model for generating the unusual split LTR arrangement of TOC1 is shown in Figure 1. In this model, TOC1 elements are produced from a hypothetical transposon with the usual LTR structure of retroviral proviruses and virus-like retroposons. The LTR can be divided into the U5, U3 and R sequences [5] according to the regions of the LTR that are unique to the 5' (U5), 3' (U3), and repeated (R) at both 5' and 3' regions of the proviral transcript (see step 1 of Figure 1). The hypothetical retrovirus-like TOC1 element is transcribed and converted into a linear double-stranded DNA molecule by the normal retrotransposon replication machinery [16, 17, 18, 19]. During or after replication either a DNA copy of the primer for minus-strand synthesis (reverse transcription) or another foreign nucleic acid species is linked to the ends of the linear double-stranded DNA molecule. When this molecule circularizes by ligation of its ends the 2-LTR circular DNA molecule shown in step 2 of Figure 1 is produced. Integration within the U3 region of the LTR produces either of the structures shown in step 3A or 3B.

Therefore, in this model, the unique 123 base sequence of TOC1 is assumed to be derived from either the minus-strand primer or a foreign nucleic acid species. If a foreign nucleic acid species forms the 123 base sequence then the boundaries of the 237 and 217 bp repeats are shown in Figure 1 steps 3A (i) and 3B (i).
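The circularization and integration steps of this model can be mimicked with a list rotation. This is a toy sketch, not the authors' analysis: it assumes that a conventional LTR consists of the 237 base part followed by the 217 base part, that the 123 base sequence sits between the two LTRs in the circle, and that integration opens the circle exactly at the 237/217 boundary of one LTR copy. Which cut corresponds to step 2A and which to step 2B of Figure 1 is not modelled here; the labels "copy 1" and "copy 2" are hypothetical.

```python
# Circular 2-LTR molecule written as an ordered list of segments.
# Reading around the circle: LTR copy 1 (237 + 217), the large unique region,
# LTR copy 2 (237 + 217), then the 123 base sequence joining the two ends.
CIRCLE = ["237", "217", "4600", "237", "217", "123"]


def integrate(circle, cut_index):
    """Linearize the circle by opening it immediately before cut_index."""
    return circle[cut_index:] + circle[:cut_index]


def show(segments):
    """Format a segment list in the bracket notation used in the abstract."""
    return "-".join(f"[{s}]" if s in ("217", "237") else s for s in segments)


# Opening LTR copy 1 between its 237 and 217 parts.
print(show(integrate(CIRCLE, 1)))
# Opening LTR copy 2 between its 237 and 217 parts.
print(show(integrate(CIRCLE, 4)))
```

The first cut reproduces the order of segments reported for TOC1 (217, 4600, 237, 217, 123, 237); the second gives the alternative outcome in which the 123 base sequence lies near the left end.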

*Present address: Genetics Laboratory, Biochemistry Department, South Parks Road, Oxford OX1 3QU, UK

© 1991 Oxford University Press

If the minus-strand primer forms the 123 base sequence then the primer-binding site would also contribute to the 237 bp repeat (step 3A ii) or 217 bp repeat (step 3B ii). Aberrant replication of the minus-strand primer has been described during Moloney murine leukemia virus DNA replication; a DNA copy of the primer is found sandwiched between the two LTRs [20]. Of the two integration events drawn in Figure 1 steps 2A and 2B, integration in the right LTR (step 2B) produces an outcome (Figure 1 step 3B) which is consistent with the location of a putative promoter in the left 217 bp repeat of TOC1 [21]. This integration event predicts locations for the priming sites for minus-strand (-PB) and plus-strand (+PB) synthesis and the orientation which should contain open reading frames (ORFs) that resemble the polypeptides of virus-like retroposons.

In order to gain a better understanding of the relationship between TOC1 elements and other transposable elements we determined the complete nucleotide sequence of a recently transposed copy of TOC1. This copy of TOC1, designated TOC1.1, was isolated from the OEE1 gene [15].

MATERIALS AND METHODS

Media, strains and crosses

cc1373 and cc407 were from Dr. E.H. Harris at the Chlamydomonas culture collection, Duke University. cc1952 [22] was from Dr. P.A. Lefebvre (University of Minnesota). The FUD44-R2 revertant of the photosynthetic mutant FUD44 has been described [15]. The cell wall-less mutants cw15 [23] of both mating types (mt+ and mt-) were from P. Bennoun (Paris). Chlamydomonas cells were grown in a medium containing acetate [24].

DNA manipulations

Methods for DNA extraction, restriction enzyme digestion, DNA electrophoresis, DNA blotting [25], DNA hybridization and sizing [26] were as previously described [15]. Before reprobing blots, hybridization probes were removed by heating in distilled water at 80°C for 2-10 minutes. DNA restriction fragments purified on agarose gels were used as hybridization probes.

Hybridization probes B and C of Figure 7 were generated by SphI and BamHI digestion of the cloned 4.1 kbp HindIII-PstI internal fragment of TOC1 [pTOC1.1R1 in 15]. To remove contamination of the gel-purified 233 base SphI + BamHI (probe B in Figure 7) and 349 base SphI fragments (probe C in Figure 7) with the 1.3 kbp BamHI-SphI fragment (see Figure 7), 32P-labelled hybridization probes were pre-hybridized with single-stranded 76 base repeat-specific plasmid (probe A in Figure 7) bound to nitrocellulose. This 76 base repeat specific fragment (probe A) was subcloned from pTOC1.1R1 by using DNAseI and Klenow enzyme [27], followed by BamHI digestion; the DNAseI generated end is located at base 67 of the tenth copy of the 76 base tandemly repeated sequence in TOC1. To estimate contamination of probes B and C with 76 base repeat sequences, these probes were hybridized to filters bearing a 76 base repeat plasmid (probe A) and dilutions of pTOC1.1R1. No contamination of the 233 base fragment (probe B) with the 1.3 kbp BamHI-SphI fragment was detected, while the 349 base fragment (probe C) hybridized to itself with a fifty fold higher intensity than it did to the 76 base repeat plasmid (probe A).

Restriction fragments were labelled by the random primer method [28].

DNA cloning

Internal TOC1.1 restriction endonuclease fragments with BamHI, BglI, HindIII, NcoI, PstI and SacI ends were cloned into M13mp18, M13mp19 [29], and Bluescribe M13+ and M13- phagemid vectors (Vector Cloning Systems, San Diego). For directed nested deletions, DNA fragments bearing one DNaseI generated end [27] and one BamHI or PstI end were cloned into SmaI + BamHI or SmaI + PstI digested Bluescribe M13+.

Ligation mixes were used to transform competent cells according to the standard transformation protocol of Hanahan [30]. Phage or phagemids were grown in E. coli JM109 [29] or E. coli XL1-blue [31]. M13 recombinant phage particles were prepared according to Messing [32]. For Bluescribe phagemids, phage particles were prepared using M13K07 helper phage according to the supplier (Vector Cloning Systems). Single-stranded template DNA was isolated from phage according to Dente and co-workers [33].

DNA sequencing

Single-stranded DNA template was sequenced using the modified T7 DNA polymerase according to the supplier (Sequenase, United States Biochemicals Ltd.) with 35S-dATP (Amersham, UK). Out of a total of 42 single-stranded templates sequenced using the standard dGTP reaction mixes, 24 were repeated using dITP to resolve compressed bands. Chemical sequencing was according to Maxam and Gilbert [34]. Sequence ladders were fractionated on denaturing 6 or 7% (w/v) polyacrylamide gels [35] and visualized by autoradiography.

Figure 1. A model depicting the possible origin of TOC1 elements. The model suggests the possible origins of the 123 base unique region, and 217 and 237 base repeat regions of TOC1 from the U3, R and U5 LTR regions of a hypothetical progenitor of TOC1 with conventional LTRs. The locations of the promoter (P), primer binding site for minus strand synthesis (-PB) and priming site for plus-strand synthesis (+PB) are shown. The primer for minus-strand synthesis is usually a tRNA [5]. The 3' OH priming plus-strand synthesis is formed by RNAse H action on the original RNA template base-paired to the newly synthesized minus-strand DNA [5]. 1. A hypothetical TOC1 element with a standard LTR arrangement is transcribed to produce a full-length transcript indicated by a wavy line. 2. Reverse transcription and second strand DNA synthesis via the standard retrovirus replication mechanism produce a linear double stranded DNA molecule. During replication a copy of the primer for minus-strand synthesis (reverse transcription) or a foreign DNA species is attached to this linear dsDNA molecule. Circularization via ligation of the ends of this molecule produces the circle shown in step 2. 3. Integration in the left (2A) or right (2B) LTR of the circle produces the outcomes shown in 3A and 3B. (i) represents the consequence if the 123 base sequence is a foreign DNA species; (ii) if the primer for minus-strand synthesis forms the 123 base sequence then part of the 237 base repeat (3A) or 217 base repeat (3B) will be composed of the original minus primer binding site.

Figure 2. Nucleotide sequence of TOC1.1. Numbering starts from base one of the left 217 base repeat. Locations of the 217, 76 and 237 base repeats and 123 base unique region are indicated on the right. The 76 bp repeats and four truncated copies (rpt nos. 12, 13, 14, 15) are numbered in order of distance from base one at the left end of TOC1. Also indicated are the 5' (base 165) and 3' ends (base 5648) of transcripts that map within TOC1 [21]. The amino acid sequences of ORF7A (47 amino acids) and most of the ORFs greater than 200 bases, classified as coding (ORF9, 92%) or no opinion (40%, ORF5, ORF6, ORF16, ORF23, ORF31; 77%, ORF19, ORF29) according to Fickett [40], are shown. ATGs that precede ORFs that score positive or no opinion according to Fickett [40] are underlined; ORFs truncated at their 5' end by ATG need not give the same score as the complete ORF. Unique BamHI (G GATCC) and ScaI (AGT ACT) cleavage sites that lie adjacent to repeat sequences are indicated by open triangles above the sequence.

Sequence analysis

DNA sequences were manipulated using the PC Gene programmes written by A. Bairoch, Department of Medical Biochemistry, University of Geneva. Sequence comparisons with release 17 of the EMBL databank were made using the seqft and seqh programmes of Kanehisa [36, 37].

Autoradiography and fluorography

Hybridization patterns were visualized on Kodak XS5 films by autoradiography at room temperature (Fig. 7A) or a mixture of autoradiography and fluorography in cassettes containing intensifying screens (Ilford, fast tungstate) at -70°C (Figs. 7B and 7C).

RESULTS

Both DNA strands of TOC1.1 were sequenced. The sequences of the 217 base and 237 base repeats, and 123 base unique region at the right end of TOC1 had already been determined [15].

Figure 2 presents the nucleotide sequence of the 5662 base TOC1.1 element and is annotated with the positions of repeated DNA sequences and open reading frames (ORFs) discussed below. TOC1 is 59% G+C rich, which is similar to the 62% G+C value for C. reinhardtii nuclear DNA [38].
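A G+C percentage of this kind is a simple base count. The sketch below shows one way to compute it; the input is an arbitrary toy sequence rather than TOC1.1, and the function name `gc_fraction` is illustrative.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


toy = "GGATCCGCAT"  # toy input, not taken from the element
print(f"G+C = {100 * gc_fraction(toy):.0f}%")
```

Applied to the full 5662 base sequence deposited under the EMBL accession given above, the same count would be expected to give a value close to the 59% quoted in the text.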

Open-reading frames in TOC1

Figure 3 shows the positions of ATG start codons and UAA, UAG and UGA stop codons in TOC1.1. Since the 5' to 3' orientation of the near-full length TOC1 transcript [21] is left to right with respect to the TOC1 orientation in Figure 3, we expected larger ORFs in this sense orientation. However, translation using the universal genetic code shows that TOC1.1 does not contain large ORFs (Figures 3 and 4). The largest ORFs in TOC1.1 are ORF5 (180 amino acids), ORF24 (196 amino acids), ORF29 (175 amino acids) and ORF35 (207 amino acids). The shortest ORF shown is ORF7A (47 amino acids), followed by ORF22 (71 amino acids). ORFs 2, 3, 4, 14, 15 and 25, which encode a 75 amino acid long peptide, and ORF16, which encodes a 96 amino acid peptide, all resemble each other because they are translated from an internal tandemly repeated 76 base sequence in TOC1 (see Figure 4 and below). None of the ORFs follow the general pattern of codon usage in C. reinhardtii nuclear genes [39]. ORF9 scored positive as a protein coding sequence, ORFs 5, 6, 16, 19, 23, 27, 29, 30, 31, 34 and 37 got a no opinion score of 40-77%, while all the other ORFs in Figure 4 were non-coding according to Fickett's analysis [40].
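A six-frame scan for start and stop codons of the kind summarized in Figure 3 can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the figures: an ORF is taken here to run from an ATG to the next in-frame stop codon of the universal code, which may differ from the definition behind the ORF numbering in Figure 4, and the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG and UGA
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) 0-based half-open spans from an ATG to the next stop."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            if (i - start) // 3 >= min_codons:
                yield (start, i + 3)
            start = None


def six_frame_orfs(seq, min_codons=1):
    """List (strand, frame, start, end) for ORFs on both strands.

    Coordinates on the '-' strand refer to the reverse-complemented string.
    """
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            for a, b in orfs_in_frame(s, frame, min_codons):
                found.append((strand, frame, a, b))
    return found


print(six_frame_orfs("CCATGAAATGGTAACC"))  # toy sequence, not from TOC1
```

Raising `min_codons` plays the role of the length cut-offs mentioned in the legend to Figure 4.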

Visual inspection of TOC1 ORFs did not reveal extensive similarity to the amino acid sequence motifs normally found conserved between retroviruses and retrotransposons [41, 42, 43]; a finger peptide (ORF7A) and protease consensus (ORF29) were found (Figures 5C and 5E). The finger peptide is of the CC-HH type found in TFIIIA [44] and the carboxyl terminus of putative reverse transcriptases [45] rather than the CC-CH type found in the gag region of retroviruses and virus-like retroposons [46].

Two adjacent aspartic acid residues are found in the most highly conserved region of reverse transcriptases [42, 43] and are present in RNA dependent polymerases [47]. A DD motif is present in ORF29 (Figure 5F). However, the two threonines (residues 116 and 123) in the DD region of ORF29 (residues 111-124 of ORF29) are not found in other polymerase segments [47].

The absence of long ORFs in TOC1.1 suggests that it transposed into the OEE1 gene using functions provided by other loci. However, if TOC1 is derived from an ancestral element which once encoded proteins necessary for its transposition we may expect to find short ORFs which are remnants of these proteins. Furthermore, since there are no functional requirements for these ORFs we would not expect the pattern of conservation of these ORFs to correspond to the peptides normally found conserved between functional DNA or RNA-mediated transposons. To address this possibility, and thereby not limit ourselves to previously identified conserved regions of retroviruses and retrotransposons, we compared 42 ORFs in Figure 4 (all the ORFs except ORF7A) to a translation of around 17,000 DNA sequences present in release 17 of the EMBL nucleic acid data bank using the seqft program of Kanehisa [35, 36]. The sequence similarities found were weak and some of these corresponded to retroviral polypeptides (Figures 5A, 5B and 5D): 17/58 (29%) residues between ORF16 and the putative gag polypeptides of HTLV-I and II [48, 49]; 11/27 (41%) residues between ORF5 and the precursor envelope proteins of HIV-I and ALARV-2 [50, 51]; 15/69 (22%) between ORF31 and a polypeptide of unknown function (molecular mass of 11,000) that is encoded by URF PX-1 located after the env gene of HTLV-I [48]. The lengths of the polypeptide stretches compared are too short to be statistically significant [52]. However, it is interesting to note that the relative order of ORF16 (weak gag similarity) - ORF7A (finger peptide) - ORF29 (DD containing ORF, Figure 5F) with respect to the 5'-3' TOC1 encoded transcript is the same as the order (5' gag with finger peptide - reverse transcriptase 3') found in retroviruses and retrotransposons (see Figure 4). The protease region is closer to the reverse transcriptase region in ORF29 than found in virus-like retroposons. However, in another class of TOC1 element (TOC1.2) these regions are separated by an insertion of 28 bases and are in different reading frames [53].

Figure 3. Positions of possible ATG start codons, and UAA, UAG and UGA stop codons within TOC1.1. The six reading frames are indicated by horizontal lines. ATG codons are represented by vertical lines above the horizontal. Stop codons UAA, UAG and UGA are drawn as vertical lines below the horizontal.

Figure 4. Overall structural organization of TOC1. In addition to the 217 and 237 base repeats, the position of the 76 base tandemly repeated sequences that start immediately after the six base recognition site of BamHI is shown. The positions of the BamHI (Ba), HindIII (H) and two of the MluI (M) sites of TOC1.1 are indicated. The largest ORFs in Figure 3 are indicated and assigned numbers from 1 to 42. The cut off values for the ORFs shown are 200 bases reading left to right (except ORF7A which is 141 bases) and 290 bases reading right to left.

Normally the envelope precursor protein is found after the pol gene in retroviruses, is absent in many retrotransposons and is not well conserved in evolution [42]. Therefore, because ORF5 is located between ORF16 (weak resemblance to gag) and ORF29 (weak resemblance to pol) rather than to the right of ORF29, the weak similarity between ORF5 and the precursor envelope proteins of HIV-I and ALARV-2 may be spurious.

A mixture of 14-mer oligonucleotides complementary to DNA sequences that encode most of a pentapeptide that is conserved between reverse transcriptases, Y-M/V-DD-I/M, was previously found to hybridize strongly to an internal region of TOC1 that lies within a 500-bp stretch right of the unique BamHI site [15].

This hybridization is to the 76 base repeat region (see next section) and is misleading since the polypeptides encoded by the complementary sequence bear little resemblance to the pentapeptide of reverse transcriptases (Figure 5G). We were surprised that this duplex of 10 bases with one mismatch was stable under the hybridization conditions used [5×SSC / 30% (v/v) HCONH2 / 25°C].
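The extent of pairing between the oligonucleotide and the repeat can be checked by comparing the two coding-strand sequences position by position. The sketch below assumes the 14 base sequences as they can be read from panel G of Figure 5, taking one member of the oligonucleotide mixture; both strings are a reading of that panel and should be treated as assumptions. The helper `best_duplex` is illustrative.

```python
def best_duplex(a, b, max_mismatches=1):
    """Longest aligned window of a and b with at most max_mismatches differences.

    Returns (start, length) for 0-based positions in the equal-length inputs.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    best = (0, 0)
    left = 0
    mismatches = 0
    for right in range(len(a)):
        if a[right] != b[right]:
            mismatches += 1
        # Shrink the window from the left until it is within the limit again.
        while mismatches > max_mismatches:
            if a[left] != b[left]:
                mismatches -= 1
            left += 1
        if right - left + 1 > best[1]:
            best = (left, right - left + 1)
    return best


repeat_stretch = "ACAATGGACGGCAT"  # 76 bp repeat, as read from Figure 5G (assumed)
oligo_sense = "TACATGGACGACAT"     # sense of one oligo in the mixture (assumed)

start, length = best_duplex(repeat_stretch, oligo_sense)
print(start, length)
```

With these inputs the longest window allowing a single mismatch spans 11 positions, that is 10 paired bases around one mismatch, in line with the duplex described in the text.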

Figure 5A-F. Comparison between TOC1 encoded ORFs and retroviral polypeptides. The sequences of polypeptides are given in the single letter code for amino acids. Positions of identity are indicated by large capital letters and dashes were used to indicate gaps introduced to optimize alignment. Polypeptides were translated from: bases 1777-1941 (gag) and 6921-7127 (URF PX-1) in HTLV-I [48], bases 1800-1964 (gag) in HTLV-II [49], bases 382-462 (env) in HIV NY5 [50], bases 6103-6183 (env) in ALARV-2 [51]. The 'zinc finger' motif was from the seventh 30 amino acid tandem repeat in transcription factor IIIA or TFIIIA [44]. Protease and reverse transcriptase consensus sequences were from Fuetterer and Hohn [41]. The polymerase consensus was from Argos [47]. HTLV-I and II are human T-cell leukemia viruses I and II. HIV NY5 and ALARV-2 are different isolates of the human immune deficiency retrovirus. RSV = Rous sarcoma retrovirus, 17.6 = retrotransposon from Drosophila melanogaster, Ty1 = retrotransposon from Saccharomyces cerevisiae. F) Polymerase consensus: hy = hydrophobic amino acid and a dash indicates any amino acid. G) The basis of the observed hybridization of an oligonucleotide encoding the YMDDI/M conserved pentapeptide of reverse transcriptases to the 76 bp repeat region of TOC1 [15].

Panel G:
5' ACAATGGACGGCAT 3'   76 bp rpt (nos. 1, 2, 3, 4, 5 & 6), encoding T M D G I
3' ATGTACCTGCTGTA 5'   oligo, reverse transcriptase
5' TACATGGACGACAT 3'   encoding Y M D D I/M

Rpt no.   Identity   Position
3         76/76      965-1040
5         75/76      1117-1192
2         74/76      889-964
4         74/76      1041-1116
1         73/76      813-888
6         73/76      1193-1268
7         70/76      1269-1344
10        69/76      1497-1572
8         67/76      1345-1420
9         67/76      1421-1496
11        66/76      1573-1646
12        49/59      1647-1705
15        36/48      1839-1886
14        27/33      1783-1815
13        18/21      1735-1755

Figure 6. Heterogeneity in the 76 base repeat of TOC1. Repeat sequences have been compared to repeat no. 3. Positions of identity are indicated by dots. Base substitutions are indicated by the single letter code for bases. Deletions are indicated by a dash and insertions by an arrow. The boundaries of the truncated versions of the 76 base repeats (repeat nos. 12, 13, 14 and 15) are marked by arrow-heads. The identity column indicates the number of matches scored over the number of bases compared. The numbering scheme for bases is relative to base 1 of the left 217 base repeat of TOC1. The DNA sequence shown is from the top strand of TOC1.1.
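The positions listed in Figure 6 can be compared with what a perfectly regular array would give. The sketch below assumes an array that starts at base 813 with a period of exactly 76 bases; the reported spans are a reading of the position column of Figure 6, and the pairing of each span with a repeat number is inferred from its coordinates. `unit_span` and `REPORTED` are illustrative names.

```python
ARRAY_START = 813  # first base of repeat no. 1, from the text
UNIT = 76          # length of one repeat unit in bases


def unit_span(n):
    """Expected (first, last) base of repeat n in a perfectly regular array."""
    first = ARRAY_START + UNIT * (n - 1)
    return first, first + UNIT - 1


# Spans as read from the position column of Figure 6 (assumed reading).
REPORTED = {
    1: (813, 888), 2: (889, 964), 3: (965, 1040), 4: (1041, 1116),
    5: (1117, 1192), 6: (1193, 1268), 7: (1269, 1344), 8: (1345, 1420),
    9: (1421, 1496), 10: (1497, 1572), 11: (1573, 1646),
}

for n, span in REPORTED.items():
    expected = unit_span(n)
    note = "regular" if span == expected else f"regular array would give {expected}"
    print(f"repeat {n:2d}: {span[0]}-{span[1]}  ({note})")
```

Under this reading, repeats 1 to 10 fall exactly on the 76 base grid, while repeat 11 ends two bases short of it.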

Figure 7. The 76 base repeat is not found outside TOC1 elements. Southern blot of 1% (w/v) agarose gel containing SphI digested total DNA from the indicated strains run alongside copies/genome lanes consisting of dilutions of a plasmid containing a full-length copy of TOC1.1 digested with SphI. For copies/genome estimations the haploid C. reinhardtii genome size was taken as 100,000 kbp, which is an approximation because the nuclear genome is 70,000 kbp [62] and total DNA can contain up to 15% (by mass) of chloroplast DNA. The locations of the probes used in A (756 bp), B (233 bp) and C (349 bp) are shown below the map of TOC1. The blot probed in A was reprobed in B and C. Six months passed before the blot was reprobed with probe B. In the intervening period the blot was hybridized with a TOC1 left junction probe which gives a different hybridization pattern from that given by probe A (not shown). The hybridized probe B was stripped off before probing with probe C. Wash conditions: in A, 0.1×SSC, 0.1% SDS, 60°C; in B and C, 0.1×SSC, 0.1% SDS at 50°C (1×SSC is 0.15 M NaCl, 0.015 M trisodium citrate, pH 7.5).

Repeated sequences in TOC1

TOC1.1 contains a 217 base repeat at each end and two copies of a 237 base repeat at its right end. No sequence differences are found between the two copies of each repeated sequence [15].

TOC1.1 also contains 11 3/4 copies of a tandemly repeated 76 base sequence that starts with guanine at base 813 of TOC1, adjacent to the unique BamHI site in TOC1 (see Figures 2 and 4); note that the 76 base sequence can also be considered to start with cytosine at base 812. Three truncated versions (repeat nos. 13, 14 and 15) of this 76 base sequence are found to the right of the tandem 76 base repeat array (Figure 4). Figure 6 is a comparison of the repeat units of the 76 bp tandem array and the 3 adjacent truncated versions with the third 76 bp repeat unit.

Nucleotide substitutions allow all the 76 base repeat units to be distinguished from each other. In general, repeat units located further away from repeat three show a higher degree of sequence divergence than the more proximal repeat units. Of 83 differences between repeat no. 3 and the other 76 bp repeat units, 56 were transitions, 24 were transversions, two were single base deletions and one was an insertion of a single base.
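The total of 83 differences can be cross-checked against the identity column of Figure 6. The sketch below assumes that column as it can be read from the figure (matches over bases compared, one pair per repeat unit, with repeat no. 3 compared to itself as the first entry) and simply tallies it against the breakdown by type given in the text. The variable names are illustrative.

```python
# (matches, bases compared) for each repeat unit relative to repeat no. 3,
# as read from the identity column of Figure 6 (assumed reading).
IDENTITY = [
    (76, 76), (75, 76), (74, 76), (74, 76), (73, 76),
    (73, 76), (70, 76), (69, 76), (67, 76), (67, 76),
    (66, 76), (49, 59), (36, 48), (27, 33), (18, 21),
]

# Breakdown of the differences by type, as stated in the text.
BY_TYPE = {
    "transitions": 56,
    "transversions": 24,
    "single base deletions": 2,
    "single base insertions": 1,
}

differences_from_figure = sum(total - same for same, total in IDENTITY)
differences_from_text = sum(BY_TYPE.values())
substitutions = BY_TYPE["transitions"] + BY_TYPE["transversions"]

print(differences_from_figure, differences_from_text)
print(f"transitions among substitutions: {BY_TYPE['transitions'] / substitutions:.2f}")
```

Both tallies come to 83, and transitions account for 70% of the 80 base substitutions.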

Arethe 76basetandemrepeatsfound outside TOCJ elements?

The 76 base tandemly repeated sequences are present on a 1.3 kbp SphI fragment. Southern blots of SphI digests of DNA from a number of C. reinhardtii strains (cc407, cw15+, FUD44-R2, cw15-) and DNA from cc1373 (C. smithii), a species interfertile with C. reinhardtii, produce a multicopy 1.3 kbp band when probed with a hybridization probe specific for the 76 base repeat (Figure 7A). Larger bands are found in the cc1952, cc1373, cc407, cw15+, FUD44-R2 and cw15- lanes. A smaller 1.2 kbp band is found in cc1373. Our previous restriction mapping studies have shown that a number of restriction enzyme sites conserved in TOC1 elements are not present in the two TOC1-related DNA sequences in strain cc1952 [53]. Therefore, the absence of the 1.3 kbp SphI band in this strain is not surprising (Figure 7A). Non-TOC1.1 SphI bands of 2.3 and 5.2 kbp in cc1952, and of 2.0, 2.2, 2.7 and 4.7 kbp in the other strains, that hybridize to probe A also hybridize to probe B (Figure 7B). Since probe B is adjacent to probe A in TOC1, this means that the bands which are larger than the 1.3 kbp SphI band are flanked at their left end by TOC1 sequences. This suggests that these SphI bands are probably derived from diverged TOC1 elements that have lost the left SphI site of TOC1.1 rather than from isolated 76 bp repeats unlinked to TOC1. A 0.35 kbp SphI fragment (probe C) that is located to the immediate right of the 1300 bp SphI fragment is present in all the strains tested except cc1952 (Figure 7C). This indicates that the SphI site that flanks the right end of the tandem array of 76 base repeats is conserved in most TOC1 elements. Probe C contains 34 bases of a truncated 76 bp repeat sequence (repeat no. 15 in Figure 6) and as a consequence hybridizes to the 1.3 kbp SphI fragment. The larger faint bands of 1.5, 3.3 and 4.4 kb in the 10 and 25 copies/genome lanes, which contain dilutions of a SphI digest of plasmid pTOC1.1, are due to contamination of probe C with other TOC1 sequences.

[Figure 8 diagram not reproduced. Recoverable labels: segment lengths 217, 237, 217, 123 and 237; alignment of 5'-TGGTGGCRCCCGTGGG (123 base sequence of TOC1) with 3'-RCCRCCGUCGCCRCCC (bovine leucine tRNA).]

Figure 8. A model for retrotransposition of TOC1. Step 1. Initiation of transcription is shown in the left 217 bp repeat. Polyadenylation takes place in the far-right 237 base sequence. Step 2. The primer for minus strand synthesis is postulated to bind to the 123 base sequence. Possible binding sites include a stretch of 16 bases from base 39 of the 123 base sequence which are complementary (with two mismatches) to the 3' end of bovine leucine tRNA [63]. Step 3. The primer is extended to the end of the 237 base repeat. Step 4. The first jump of newly synthesized minus strand DNA to the 3' end of the polyadenylated RNA; complementarity is provided by sequences of the 237 base repeat. Step 5. The minus strand is extended on the RNA template to the left 217 bp repeat. Step 6. Although plus strand synthesis is shown to begin within the 123 base sequence, initiation of plus strand synthesis anywhere left of the far-right 217 base repeat is compatible with this model. Plus strand synthesis continues to include the end of the 217 base repeat in the minus strand. Step 7. Second jump involving the plus strand to the 3' region of the minus strand. Minus strand synthesis continues to the 5' end of the plus strand and elongation of the plus strand takes place to a region close to or including the primer of minus strand synthesis to produce the linear double stranded DNA molecule shown in step 8. This linear molecule is shown as the substrate for integration. Only the junctions between the outermost 237 and 217 bp repeats direct integration.

DISCUSSION

TOC1 was devoid of large ORFs, and although none of the ORFs present showed strong similarity to the functional proteins involved in DNA or RNA mediated transposition, some ORFs contained short amino acid stretches that exhibited a weak resemblance to polypeptides encoded by retroviruses. The significance of these resemblances is unclear: they could be fortuitous or indicative of previously functional genes encoding retrovirus-like polypeptides that have diverged since becoming defective. Some of the TOC1 encoded ORFs may be functional and poorly related to the previously identified transposition enzymes of DNA and RNA mediated transposons. Non-functionality of TOC1 ORFs cannot be deduced from their unusual codon usage, which does not conform to the pattern of other C. reinhardtii nuclear genes [39], or from their lack of identification as protein coding regions according to Fickett's criteria [40]; the codon usage of the copia and 1731 retrotransposons of Drosophila melanogaster is different from that of other D. melanogaster genes [54], and although a positive Fickett's analysis score is usually conclusive, a classification of a protein-coding region as non-coding or no-opinion can result from unusual codon preferences [40]. Our analysis of TOC1.1 putative protein coding regions was relatively simple: we did not consider ribosome binding sites or initiation codons, nor did we raise the possibilities of splicing, frameshifts and suppression in the generation of open reading frames. Suppression of any one of the three nonsense codons does not reduce the number of ORFs in TOC1 to the large gag and pol ORFs found in retrotransposons.

In common with virus-like retroposons, TOC1 elements are repeat units of a discrete size bounded by LTRs which contain signals for initiation and termination of a full-length transcript [21, 53]. However, no sequences immediately adjacent to the 217 or 237 bp repeats were found that resembled the priming sites for first and second strand synthesis in the replication cycle of retroviruses [5]. Furthermore, the internal organization of TOC1 elements does not resemble that of virus-like retroposons and is distinguished by a lack of large open reading frames and the presence of a tandem array of 76 base repeats located 656 bases from the 5' end of the TOC1 transcript (813 bases from the left end of TOC1). These features are associated with the L1 family of non-viral retroposons found in mammals [6, 55]. L1 related elements have been found in fruit flies [56], trypanosomes [57] and maize [14]. The 5' region of L1 elements in the African bushbaby, a prosimian primate, was found to contain six to eight tandemly repeated units of 73 bp starting at 730 bases from their 5' end [55]. Although there is no sequence similarity between the 76 bp repeat of TOC1 and the 73 bp repeat of the L1 elements of the African bushbaby, the correspondence between their sizes and location is striking. Since neither 76 nor 73 is a multiple of three, this region cannot represent the repeated motif of a polypeptide. However, similar polypeptides encoded by ORFs that span three adjacent 76 bp repeat units are found in TOC1: namely ORFs 2, 3, 4, 14, 15, 16 and 25. It is interesting that these ORFs, and ORF16 in particular, show weak similarity with the gag region of HTLV I and II.
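The reading-frame argument can be checked with simple modular arithmetic. The Python sketch below (function names are illustrative, not from the paper) computes the codon phase at the start of each successive repeat unit and the smallest number of tandem units whose combined length is a multiple of three. It considers only the lengths of the units and ignores the sequence differences between individual repeats.

```python
CODON_LENGTH = 3


def frame_offsets(repeat_length, n_units):
    """Codon phase (0, 1 or 2) at the first base of each successive repeat unit."""
    return [(i * repeat_length) % CODON_LENGTH for i in range(n_units)]


def units_to_restore_frame(repeat_length):
    """Smallest number of tandem units whose total length is a multiple of three."""
    n = 1
    while (n * repeat_length) % CODON_LENGTH != 0:
        n += 1
    return n


if __name__ == "__main__":
    for length in (76, 73):  # TOC1 repeat and bushbaby L1 repeat, from the text
        n = units_to_restore_frame(length)
        print(length, frame_offsets(length, 4), n, n * length // CODON_LENGTH)
```

For both 76 and 73 bases the phase shifts by one position per unit and returns to its starting value after three units, so three adjacent 76 base units (228 bases) correspond to exactly 76 codons. This is consistent with the observation that ORFs spanning three adjacent repeat units encode similar polypeptides.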

The 76 bp repeat sequence is present in all TOC1 elements and the size of the 76 bp tandem array is conserved in the majority of TOC1 elements. If the tandem array arose by duplication of internal TOC1 sequences, the 76 bp repeat units must have been identical at their inception. Unequal crossing over together with replication slippage within the 76 bp tandem array of TOC1 would preserve the homogeneity of repeat units and give rise to tandem arrays of different length [58, 59]. Since the length of the 76 bp tandem array has been preserved in the majority of TOC1 elements from different C. reinhardtii strains and C. smithii, this indicates that the length of the tandem array must be important in TOC1 amplification or that the general mechanisms which change the length of a tandem array occur at a lower frequency compared to TOC1 amplification. Selection against unequal crossing-over and replication slippage may be expected to lead to sequence divergence of the 76 bp repeats through mutation. Replication slippage that preserves the length of the tandem array would be allowed and provides an explanation for the observation that adjacent 76 bp repeat units are in general more similar than non-adjacent 76 bp repeat units.

A model for TOC1 transposition

The model in Figure 1 provides one explanation for the evolutionary origin of TOC1 elements. However, our sequence analysis of the internal organization of TOC1 does not support this model and we think it unlikely that all present-day TOC1 elements are products of hypothetical elements with intact LTRs.

We believe that TOC1 elements with their split LTR structure are capable of transposition. Our view is supported by the observation that the majority of TOC1 elements in a number of strains contain a split LTR structure [15, 53]. As a working model we consider that TOC1 transposes by an RNA intermediate. This is consistent with the observation that TOC1 elements produce a near full-length transcript that is initiated in the left 217 bp sequence and polyadenylated in the far-right 237 bp sequence [21]. The source of reverse transcriptase involved in TOC1 transposition could be provided by an element related to TOC1 or a completely unrelated locus. Other retroposons have not been identified in C. reinhardtii, and a second group of transposable elements in C. reinhardtii resembles the DNA mediated transposons of higher plants [60]. Although an ORF encoding a reverse transcriptase-like protein has been identified in C. reinhardtii, it is located in the mitochondrial genome [61] and is unlikely to participate in the nucleus. Simple retroposition models based on the analysis of pseudogenes [1] and L1-related elements [14] would not regenerate a full-length TOC1 element from the near full-length TOC1 RNA, which lacks 5' and 3' terminal sequences, nor do these models provide a mechanism for maintaining the homogeneity of TOC1 LTR sequences. In order to regenerate a full-length TOC1 element from a near full-length TOC1 transcript we propose the model shown in Figure 8.

Although this model is speculative, it predicts replication intermediates and primers whose reality can be addressed by future experiments. Recent reports [64, 65] show that the PAT family of transposable elements in the nematode Panagrellus redivivus resemble TOC1 elements in terminal repeat organization. The split LTR arrangement is therefore not unique to TOC1 elements and raises the possibility that PAT and TOC1 elements may represent a new group of transposable elements that share a similar transposition mechanism.
