4.1.3 Transposable elements can induce transcription of adjacent sequences

Although the exact location of the 5’ end of the antisense RNA could not be determined, transcription appears to start within a 698-bp GalileoK element inserted inside the GalileoG element that originated inversion 2j (CÁCERES et al. 2001). Transcripts initiated within TEs may seem unusual, but increasing evidence shows that they are quite frequent in many species. For example, in wheat, the LTRs of the retrotransposon Wis 2-1A drive the synthesis of new transcripts from adjacent sequences, including the sense or antisense strands of known genes (KASHKUSH et al. 2003). These sense and antisense RNAs increase or decrease, respectively, the expression levels of the corresponding genes. A recent high-throughput analysis of TSSs revealed that 6-30% of mouse and human RNA transcripts initiate within repetitive elements (retrotransposons, satellite DNA, or simple repeats) (FAULKNER et al. 2009). Retrotransposons located 5’ of protein-coding genes can function as alternative promoters or express non-coding RNAs. In addition, more than 25% of mouse and human genes contain retrotransposons in their 3’ UTRs, and on average these show reduced expression levels compared with retrotransposon-free transcripts (FAULKNER et al. 2009). These two examples demonstrate that retrotransposons (LTR retrotransposons in plants and mainly LINEs and SINEs in mammals) can generate transcripts that affect the expression of adjacent genes, either by driving transcription of the coding region from an alternative promoter (which may result in expression in different tissues or with different timing) or by synthesizing regulatory RNAs.

However, Galileo, which seems to drive the transcription of the antisense RNA affecting CG13617 expression, is not a retrotransposon but a DNA transposon (MARZO et al. 2008). Another well-known example of a DNA transposon driving transcription into adjacent sequences is that of the Stellate (Ste) gene in D. melanogaster. In this case, the TE 1360 (also known as Hoppel) causes the synthesis of antisense transcripts in the bidirectionally transcribed Y-linked Suppressor of Stellate [Su(Ste)] repeats. These transcripts are able to silence Ste in the testes of wild-type males, even though the X-linked Ste gene and Su(Ste) share only 90% nucleotide identity. Transgenic constructs revealed that a short 102-bp sequence of Ste containing only 33 transcribed nucleotides is sufficient to confer Su(Ste)-dependent silencing on a LacZ reporter gene (ARAVIN et al. 2001). Su(Ste) is essential for male fertility, and its deletion leads to abnormalities in spermatogenesis. Antisense Su(Ste) RNA has three different start sites within the 1360 sequence, with the longest transcript containing 441 bp of the TE, and seems to be non-polyadenylated (ARAVIN et al. 2001). All the transcripts involved co-localize in the nuclei of spermatocytes, which indicates that they are expressed at the same time and are able to form dsRNA in vivo. The detection of short RNAs resulting from the processing of Ste and Su(Ste) dsRNA, together with the fact that mutations in genes of the RNAi pathway eliminate Ste silencing, suggests that a post-transcriptional mechanism regulates Ste expression in the D. melanogaster germ line (ARAVIN et al. 2004). Moreover, the P element has also been reported to induce transcription of adjacent sequences in certain alleles of the gene nup154, owing to an outward-reading promoter near its 3’ end (KIGER et al. 1999). As mentioned above, the recent identification of the Galileo transposase in D. buzzatii and other Drosophila species has allowed Galileo to be classified as a TE related to the 1360 and P elements that belongs to the P superfamily of DNA transposons (MARZO et al. 2008). It is therefore striking that the few known examples of regulatory transcripts initiated inside DNA transposons have all been found in Drosophila species and involve related TEs. This could suggest that, in spite of their different structures and sequences, Galileo and 1360 share some features that facilitate the production of transcripts extending into adjacent sequences.

Why are TEs capable of driving transcription of adjacent regions? Because they can provide promoters and other cis-regulatory regions, TEs have proved to be a rich source of regulatory elements that host organisms can use to evolve regulatory mechanisms for adjacent genes. In particular, TEs can contain promoters, enhancers, TFBSs, insulators, splice sites or polyadenylation signals within their sequences (MEDSTRAND et al. 2005, FESCHOTTE 2008). For example, 25% of experimentally characterized human promoters contain TE-derived sequences (JORDAN et al. 2003). The same study showed that ~8% of all proximal promoter regions (500 bp upstream of known TSSs) and 2.5% of known TFBSs are located within TEs. In addition, many promoters and polyadenylation signals in human and mouse genes are derived from primate-specific and rodent-specific TEs (VAN DE LAGEMAAT et al. 2003). These regulatory elements have two possible origins. On the one hand, TEs provide raw material from which cis-regulatory elements can evolve de novo through point mutations (BRITTEN 1996a, MEDSTRAND et al. 2005). This is the case for certain human Alu sequences that are able to bind PAX6, a critical transcription factor involved in the development of the eye, pancreas and central nervous system (ZHOU et al. 2002). On the other hand, TEs already carry ready-to-use regulatory elements (promoters and TFBSs) that control their own expression and that can be incorporated into the regulation of adjacent genes, either directly or after modifications of the surrounding environment. For example, B2 SINE elements carry an active RNA pol II promoter able to induce transcription (FERRIGNO et al. 2001), and the TFBSs that allow an LTR of the endogenous retrovirus ERV-L to act as a promoter for the human gene B3GALT5 were already present in the original consensus sequence of this kind of LTR (DUNN et al. 2005). In the case of the retrotransposon-initiated transcripts detected in mice and humans mentioned above, the vast majority started at previously unidentified promoters (FAULKNER et al. 2009).

The GalileoK element downstream of CG13617 is a small, defective 698-bp copy without any coding capacity. However, this copy could have retained the pre-existing promoter that Galileo must contain to control transcription of its own transposase gene, which is needed for transposition. To determine whether this copy of GalileoK includes the region where the natural promoter of Galileo should be located (upstream of the transposase gene), and to establish the orientation of this sequence relative to CG13617, the GalileoK sequence was aligned with the full-length copy of Galileo isolated in D. buzzatii (MARZO et al. 2008). The GalileoK sequence aligns entirely within the 1229-bp Galileo TIRs (results not shown) and does not seem to contain any of the internal sequences where the transposase gene and its promoter are located (in fact, the same alignment is obtained with the two possible orientations of the full-length Galileo). Therefore, no evidence could be found that this copy of GalileoK includes the original promoter of the TE or that such a promoter is responsible for the transcription of the antisense RNA.
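The orientation problem described above can be illustrated with a minimal sketch. It uses toy sequences, not the real Galileo data, and all names and sequences in it are hypothetical. Because the two TIRs of an element are reverse complements of each other, a fragment derived only from TIR sequence matches the full-length element equally well in either orientation, so the alignment cannot reveal which way the fragment points.

```python
# Toy illustration with hypothetical sequences (not real Galileo data).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(haystack, needle):
    """Return the 0-based start of every exact occurrence of needle."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


# Toy full-length element: left TIR + internal region + right TIR,
# where the right TIR is the reverse complement of the left one.
tir = "CACTGTTAGGCATTCGA"
internal = "ATGGCCAAATTTGGGCCC"  # stands in for promoter + transposase gene
element = tir + internal + reverse_complement(tir)

# Toy defective copy made only of TIR sequence.
fragment = tir[3:15]

# Search with the fragment as given and with its reverse complement,
# which is equivalent to flipping the full-length element.
forward_hits = find_all(element, fragment)
reverse_hits = find_all(element, reverse_complement(fragment))

print("forward orientation hits:", forward_hits)
print("reverse orientation hits:", reverse_hits)
```

In this toy case each orientation yields exactly one match, located symmetrically in the left and right TIR, and neither match overlaps the internal region. The real analysis relied on sequence alignment rather than exact matching, so this is only a simplified picture of why the result is the same for both orientations.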

Alternatively, a new promoter sequence could have evolved de novo in this particular copy through one or a few point mutations, becoming able to recruit RNA polymerase and initiate transcription when a particular combination of transcription factors is present in the cell. Most cis-regulatory elements, such as TFBSs, tend to be short and degenerate in sequence (WRAY et al. 2003), so it is not difficult to imagine that the high rate at which mutations accumulate in TEs (where most of the time there is no selective pressure to maintain the sequence) can give rise to such elements. In fact, putative promoter elements could be predicted in the TIRs of GalileoK using the NEURAL NETWORK PROMOTER PREDICTION tool (results not shown). However, the small size and degenerate nature of regulatory sequences and TFBSs, together with the lack of knowledge about many types of core promoter sequences, make promoters difficult to predict bioinformatically, and these results are not always reliable. For example, MCPROMOTER, a different program, does not predict any promoter in the GalileoK element where the antisense RNA is thought to initiate (results not shown), even though it detects putative promoters in other TEs inserted at the proximal breakpoint junction of line j-1. These TEs, however, are not found in all 2j chromosomes; they are specific insertions that occurred in this particular line and are probably unrelated to the production of the antisense RNA, a feature shared by all chromosomes carrying the inversion. In any case, a newly evolved promoter sequence could have been maintained by natural selection if the presence of the antisense RNA in embryos turned out to be useful for the individual. TE insertions that have acquired a function are usually conserved across species or present at high frequency within a species. In inversion 2j, complex insertions made up of multiple nested TEs are found at the breakpoints. However, as mentioned above, the TE that provides the promoter of the antisense transcript seems to be present in all 2j chromosomes (CÁCERES et al. 2001), which might suggest that it acquired a function useful to the host that caused its increase in frequency in D. buzzatii populations.
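The argument that short regulatory motifs can arise easily by point mutation can be made concrete with a back-of-the-envelope calculation. The sketch below rests on strong simplifying assumptions that are not part of the original analysis: a random sequence with equal base frequencies, independent positions, and a hypothetical 6-bp motif that is not its own reverse complement. Only the 698-bp length is taken from the text.

```python
from math import comb


def expected_windows(seq_len, motif_len, mismatches, both_strands=True):
    """Expected number of windows exactly `mismatches` substitutions away
    from one fixed motif, in a random sequence with equal base frequencies.

    Counting both strands assumes the motif is not its own reverse
    complement (otherwise the two strands would be double counted).
    """
    windows = seq_len - motif_len + 1
    # Binomial probability: `mismatches` positions differ (3/4 each),
    # the remaining positions match (1/4 each).
    p = (comb(motif_len, mismatches)
         * (3 / 4) ** mismatches
         * (1 / 4) ** (motif_len - mismatches))
    strands = 2 if both_strands else 1
    return strands * windows * p


GALILEOK_LEN = 698  # size of the GalileoK copy given in the text
MOTIF_LEN = 6       # hypothetical motif width, chosen for illustration

exact = expected_windows(GALILEOK_LEN, MOTIF_LEN, 0)
one_away = expected_windows(GALILEOK_LEN, MOTIF_LEN, 1)

print(f"expected exact matches:          {exact:.2f}")
print(f"expected one-substitution sites: {one_away:.2f}")
```

Under these assumptions, a 698-bp sequence is expected to contain about 0.34 exact copies of any given hexamer and about six windows that are a single substitution away from it. Real TIRs are not random sequence and real binding sites are degenerate, so the numbers should be read only as an order-of-magnitude illustration.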

Moreover, the promoter contained within the GalileoK copy seems to work mainly in embryos, the only developmental stage in which it generates the antisense transcript that causes the change in CG13617 expression relative to 2st individuals. FAULKNER et al. (2009) observed that transcripts starting inside retrotransposons in the human and mouse genomes are frequently tissue-specific, with 35% of all retrotransposon-associated transcripts showing spatially or temporally restricted expression, in contrast to the 17% observed for transcripts initiated in non-repetitive elements. VAN DE LAGEMAAT et al. (2003) also found many cases in which TE-derived promoters contribute to tissue-specific gene expression, such as the placenta-specific promoter of the human CYP19 gene or the erythroid-specific promoter of the carbonic anhydrase (CA1) gene, both of which lie within LTRs. The presence of specific TFBSs within TE sequences can explain these expression patterns, which are restricted to the tissues or stages in which the transcription factor able to bind them is expressed. For example, in a D. melanogaster strain, the insecticide resistance gene Cyp6g1 shows increased expression restricted to tissues important for detoxification in larvae, owing to cis-regulatory sequences located within an Accord retrotransposon (CHUNG et al. 2007), and in humans several genes adjacent to ERV retroelement copies containing a p53 binding site are expressed in response to DNA damage (WANG et al. 2007). In addition, some TEs are expressed only in certain tissues, so if their regulatory elements were co-opted by the host to express adjacent genes, these genes would be expected to display the same tissue-specific activity, at least in the initial stages of the process. For example, in D. melanogaster the P element is able to transpose only in germ cells, owing to a regulatory mechanism controlling the expression of the transposase gene (RIO 2002). Likewise, the tissue-specific expression of the retroelements roo, which is strongly expressed during embryogenesis in certain restricted regions of the embryo (BRONNER et al. 1995), and F, which is transcribed in specific cells of the female and male germ lines and in various tissues during embryogenesis of D. melanogaster (KERBER et al. 1996), is in both cases mediated by internal cis-acting elements contained within the transposon. Therefore, the predominantly embryonic expression of the CG13617 antisense RNA is not an unusual phenomenon.