
Despite the information gained on the origin of the 2j inversion, the underlying causes of the phenotypic differences between 2st and 2j individuals remain unknown. As mentioned above, one possibility is that the inversion breakpoints affect the expression of adjacent genes. In this particular case, the TE insertions at the breakpoint junctions could also by themselves modify the expression of flanking genes in 2j chromosomes. Since the location of the inversion breakpoints is known at the molecular level, these nearby genes can be easily identified (FIGURE 9 and TABLE 3) and a search for expression changes can be attempted.

FIGURE 9 | Genes flanking inversion 2j breakpoints in 2st (top) and 2j (bottom) chromosomes. The distal breakpoint is represented at the left and the proximal breakpoint at the right. The centromere (Cen) of D. buzzatii chromosome 2 is also indicated. Vertical black arrows mark the breakpoints and white boxes on the 2j chromosome correspond to the TE insertions found at the breakpoint junctions. The letters A, B, C and D indicate the single-copy sequences flanking each side of the two breakpoints (CÁCERES et al. 1999, CÁCERES et al. 2001). Note that the positions of B and C (located inside the inverted segment) are exchanged in the 2j chromosome with respect to the 2st non-inverted arrangement. Colored arrows represent the genes found close to the inversion breakpoints. The arrows indicate the direction of transcription of each coding region.

The distal breakpoint (AB) is located downstream of gene Rox8, which encodes an mRNA-binding protein that is part of the U1 ribonucleoprotein complex and that might be involved in the regulation of alternative mRNA splicing (MOUNT and SALZ 2000, KATZENBERGER et al. 2009). In region A, and therefore outside the inverted segment, the coding region of Rox8 ends approximately 1.5 kb away from the breakpoint in the sequenced 2st line and is located 2319 bp away from the Galileo inserted at the breakpoint in line j-1, although 841 bp of this distance correspond to an ISBu-1 element insertion (CÁCERES et al. 1999). Rox8 mRNA extends closer to the breakpoint (FIGURE 10a). To try to determine the length of the Rox8 3’ UTR, D. melanogaster and D. buzzatii non-coding sequences downstream of the Rox8 coding region were aligned and considerable sequence similarity was found between them, including an extraordinarily conserved 100-bp sequence that presents only five mismatches and one 1-bp indel between these two distantly related species. This relatively high similarity between species is lost abruptly 19 bp (in the D. buzzatii sequence) after the only conserved polyA signal (AAUAAA), which strongly suggests that this could be the end of the 3’ UTR (FIGURE 10a). This would then result in an 875-bp long 3’ UTR located 649 bp away from the distal breakpoint in the 2st arrangement, but separated by only 347 bp from the ISBu-1 insertion present in region A in inverted chromosomes (CÁCERES et al. 2001). In region B, also at the distal breakpoint but inside the inverted segment, 1 kb of D. buzzatii single-copy DNA was sequenced during the course of this and previous works without finding any significant similarity to any known coding region. However, recent data provided by the ongoing D. buzzatii sequencing project have allowed the inversion 2j distal breakpoint to be located in the 18-kb long contig 104, and gene kuk (CG5175) has been identified as the closest gene in region B (Alfredo Ruiz, personal communication). More precisely, the initial methionine of kuk is located 2584 bp away from the breakpoint in the 2st arrangement. It is interesting that, according to FlyBase, in D. melanogaster this gene has two alternative TSSs located 2468 and 307 bp upstream of the coding region, even though the resulting 5’ UTRs are shorter (161 and 151 bp, respectively) because both include introns that are spliced out. These two non-coding exons could not be found in the D. buzzatii sequence based on sequence conservation, and therefore the approximate location of the TSS in this species has not been determined.

However, the fact that in D. melanogaster the gene kuk possesses a promoter and a non-coding exon ~2.5 kb upstream of the coding region raises the possibility that the expression of this gene could be altered by the presence of the breakpoint or the TEs in chromosomes carrying inversion 2j, if such elements also exist in D. buzzatii. The protein encoded by kuk seems to be implicated in the cellularization and morphogenesis of the embryonic epithelium (PILOT et al. 2006), but its molecular function is unknown. Unlike the proximal breakpoint, which is located in a highly conserved block of genes, the gene downstream of Rox8 is a gene called spas in all the sequenced Drosophila species except D. virilis and D. mojavensis (CLARK et al. 2007), the two that are phylogenetically closest to D. buzzatii. In all these species the gene kuk is also located in chromosome arm 3R but in a more distant position with respect to Rox8. This change in gene order is probably due to one of the multiple inversions that have caused the complete reshuffling of genes within chromosome arms during the evolution of the Drosophila genus (RANZ et al. 2001).
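The reasoning used above to place the end of the Rox8 3’ UTR (a conserved polyA signal followed shortly by an abrupt loss of similarity) can be illustrated with a minimal Python sketch. The two aligned sequences, the window size and the identity threshold are invented for illustration and are not the D. buzzatii or D. melanogaster data; the function names are hypothetical, and this is not the procedure actually used in this work. The signal is written as AATAAA because the toy sequences are DNA.

```python
# Toy alignment (hypothetical): identical up to 19 columns past the polyA
# signal, then completely different.
PREFIX = "GCTTGACCGTTGCATCGGATCCTAGGCTAGCATGCCGTTC"  # conserved, no polyA signal
SIGNAL = "AATAAA"                                    # DNA form of AAUAAA
SHARED = "GCTAGCTTGCATGCCGTAC"                       # 19 conserved columns
ALN_A = PREFIX + SIGNAL + SHARED + "ACGTACGTACGTACGTACGT"
ALN_B = PREFIX + SIGNAL + SHARED + "TGCATGCATGCATGCATGCA"


def identity(a, b):
    """Fraction of aligned columns carrying the same base (gaps never match)."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def find_conserved_signal(aln_a, aln_b, signal=SIGNAL):
    """Column of the first ungapped copy of `signal` shared by both rows, or -1."""
    pos = aln_a.find(signal)
    while pos != -1:
        if aln_b.startswith(signal, pos):
            return pos
        pos = aln_a.find(signal, pos + 1)
    return -1


def conservation_drop(aln_a, aln_b, start, window=10, threshold=0.5):
    """First mismatching column of the first window at or after `start`
    whose identity is below `threshold`; None if no such window exists."""
    for s in range(start, len(aln_a) - window + 1):
        wa, wb = aln_a[s:s + window], aln_b[s:s + window]
        if identity(wa, wb) < threshold:
            for offset, (x, y) in enumerate(zip(wa, wb)):
                if x != y or x == "-":
                    return s + offset
    return None


signal_col = find_conserved_signal(ALN_A, ALN_B)
signal_end = signal_col + len(SIGNAL)
drop_col = conservation_drop(ALN_A, ALN_B, start=signal_end)
print("polyA signal at column", signal_col)
print("conservation lost", drop_col - signal_end, "columns after the signal")
```

With real alignments the window size and threshold would have to be chosen with care, since isolated mismatches inside a conserved UTR should not be mistaken for its end.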

With respect to the proximal breakpoint (CD), it is surprising that it is located in a relatively gene-rich region of chromosome 2. Two genes, Pp1α-96A and nAcRβ-96A, were originally thought to be flanking the proximal breakpoint in the 2st arrangement in regions C and D, respectively (CÁCERES et al. 1999). Gene Pp1α-96A is located inside the inverted segment in region C and thus changes its position in those chromosomes carrying inversion 2j (FIGURE 9). Pp1α-96A encodes a protein serine/threonine phosphatase involved in amino acid dephosphorylation (DOMBRÁDI et al. 1990) and only 667 bp separate its initial methionine from the proximal breakpoint. Its TSS could be as close as 140 bp away from the breakpoint based on sequence homology with the D. melanogaster Pp1α-96A 5’ UTR (data obtained from FlyBase) (FIGURE 10b). However, in this case sequence conservation of the non-coding sequences upstream of the Pp1α-96A coding region is much lower than for the sequences downstream of Rox8 and, although the putative D. melanogaster TSS is found within a short stretch of sequence (10 bp) conserved in D. buzzatii (which might suggest that this sequence is functionally important), the precise location of the TSS of this gene is not clear (FIGURE 10b). On the other side of the breakpoint, outside the inverted segment in region D, the initial methionine of the nAcRβ-96A coding region is located ~3.8 kb away from the breakpoint in the sequenced 2st line and ~4 kb in the 2j line (because of a 166-bp tandem duplication and several small indels). This gene encodes a subunit of a nicotinic acetylcholine-activated channel involved in ion transport (LITTLETON and GANETZKY 2000).

FIGURE 10 | Proximity of Rox8 and Pp1α-96A transcripts to inversion 2j breakpoints. (a) End of Rox8 coding region in j-1 line. (b) Beginning of Pp1α-96A coding region in st-1 line. Coding sequences are depicted as bright colored boxes while lighter colors represent UTRs. The solid black line corresponds to single-copy non-coding sequences. Colored arrows above each image indicate the direction of transcription and vertical black arrows mark the inversion breakpoints. The lengths of different parts of these sequences are shown above or below a thin bar spanning the corresponding distance. For each gene, the line in which this region has been most extensively sequenced is represented. Both gene sequences are partial, since only the last exon of Rox8 and the first two exons of Pp1α-96A have been sequenced in D. buzzatii and are therefore included in the figure. Part of the alignments (performed using MUSCLE) between D. buzzatii (Dbuz) and D. melanogaster (Dmel) non-coding sequences used to determine the end of the Rox8 3’ UTR and the TSS of Pp1α-96A are shown below each diagram. Identification of the putative end of transcription for the Rox8 gene is based on a sudden decrease in the level of sequence conservation together with the presence of a conserved polyA signal, but a shorter 3’ UTR is indicated for this gene in FlyBase. For Pp1α-96A we relied on the current information about the TSS available in FlyBase for this gene in D. melanogaster to determine a possible start point in D. buzzatii. See main text for additional details.


TABLE 3 | Genes adjacent to inversion 2j breakpoints. Each column in the table indicates the following information: BP, breakpoint adjacent to each gene in the 2st arrangement; Region, according to CÁCERES et al. (1999, 2001) single-copy sequence surrounding the breakpoints where each gene is situated; Position, location of each gene with respect to the inverted segment; Orientation, end of the gene closest to the corresponding breakpoint; Coding region, distance from the stop codon or initial methionine of a gene to the breakpoint; UTR, distance from the putative end of the 3’ or 5’ UTR to the breakpoint. When distances are different in the sequenced 2st and 2j lines the two values are given. The points of start and finish of the transcripts have been inferred from sequence conservation in alignments with D. melanogaster sequences (see main text and FIGURE 10). Question marks indicate unknown data.

                                                     Distance to BP
Gene        BP        Region  Position  Orientation  Coding region  UTR
nAcRβ-96A   Proximal  D       Outside   5'           3998 bp (2j)   3860 bp (2j)   nicotinic acetylcholine-activated cation-selective channel involved in ion transport
CG13617     Proximal  D       Outside   3'           12 bp          ?              unknown

1 The sequence includes an 841-bp ISBu-1 insertion. 2 Gene identified in contig 104 of the draft assembly of the D. buzzatii genome sequencing project (Alfredo Ruiz, personal communication).

A preliminary analysis of the expression of the two genes initially considered to be the closest to the breakpoints, Pp1α-96A and Rox8, had been previously performed by semi-quantitative RT-PCR and Northern blot using one 2st line and one 2j line (CÁCERES et al. 1999). In spite of the proximity of the breakpoints, no expression differences were detected between the two lines for either of these two genes, which led to the conclusion that the adaptive value of the inversion did not seem to be related to mutations caused by the inversion breakpoints. Nonetheless, in 2000, the genome sequence of D. melanogaster was completed (ADAMS et al. 2000), and a new putative ORF, CG13617, was predicted between genes Pp1α-96A and nAcRβ-96A. This novel gene would be located in region D with respect to the proximal breakpoint in D. buzzatii, and therefore outside the inverted segment (FIGURE 9 and TABLE 3). The comparison of the region D single-copy sequence with the D. melanogaster genome revealed that D. buzzatii region D indeed showed homology with the last exons of CG13617 (CÁCERES et al. 2001). A careful annotation of the available sequence revealed that the CG13617 stop codon was located only 12 bp away from the inversion 2j proximal breakpoint. Thus, gene CG13617 became the closest gene to any of the breakpoints, and its astonishing proximity made it a perfect candidate to search for position effects of either the inversion or the inserted TEs on its expression, reopening the possibility that one of the breakpoints somehow affected the expression of a gene and contributed to the evolutionary success of inversion 2j.
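The comparison that singles out CG13617 can be made explicit with a small Python sketch that ranks the five flanking genes by the distance between their coding region and the adjacent breakpoint in the 2st arrangement. The distances are those quoted in the text (for Rox8 and nAcRβ-96A, the rounded ~1.5 kb and ~3.8 kb figures); the FlankingGene class and its field names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlankingGene:
    """Hypothetical record for a gene adjacent to an inversion 2j breakpoint."""
    name: str
    breakpoint: str            # "distal" or "proximal"
    region: str                # single-copy region A, B, C or D
    inside_inversion: bool     # True for regions B and C
    coding_distance_bp: int    # coding region to breakpoint, 2st arrangement


GENES = [
    FlankingGene("Rox8", "distal", "A", False, 1500),        # "approximately 1.5 kb"
    FlankingGene("kuk", "distal", "B", True, 2584),
    FlankingGene("Pp1α-96A", "proximal", "C", True, 667),
    FlankingGene("CG13617", "proximal", "D", False, 12),
    FlankingGene("nAcRβ-96A", "proximal", "D", False, 3800),  # "~3.8 kb"
]

# Sort from nearest to farthest coding region.
ranked = sorted(GENES, key=lambda g: g.coding_distance_bp)
closest = ranked[0]

for gene in ranked:
    side = "inside" if gene.inside_inversion else "outside"
    print(f"{gene.name:10s} {gene.breakpoint:8s} region {gene.region} "
          f"({side}) {gene.coding_distance_bp:5d} bp")
```

The ranking only considers coding regions; using the putative UTR ends instead would bring Rox8 and Pp1α-96A closer to their breakpoints but would not change which gene is nearest.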

1.5 Objectives

The sequencing of an increasing number of complete genomes has revealed a great degree of structural variation, a type of genetic variation whose importance had previously been overlooked in many species. In this work, we have focused on chromosomal inversions, a type of chromosomal rearrangement that is known to be maintained by natural selection in Drosophila. The molecular mechanisms underlying the effects of inversions remain unknown and, even though several theories are able to explain the spread and maintenance of inversions in populations, evidence of what is actually happening in natural inversions is scarce. The main goal of this thesis is to investigate how inversions are able to affect the phenotype and become the subject of natural selection. In order to do this, we have studied a particular case: inversion 2j of D. buzzatii.

Inversion 2j has been extensively studied in our research group: from the determination of its phenotypic effects on size (RUIZ et al. 1991) and developmental time (BETRÁN et al. 1998), to the elucidation of the mechanism that originated the inversion (CÁCERES et al. 1999) and the molecular characterization of the TE insertions at the breakpoint junctions in several inverted chromosomes (CÁCERES et al. 2001). To date, there is not a single case of an inversion for which the whole story can be told: from the mechanisms responsible for its origin to the genetic basis of how it is able to affect phenotypic traits. This work represents the next step in the study of this well-known polymorphic inversion: the assessment of the presence of expression changes in the genes adjacent to the breakpoints. These position effects could be caused either by the TEs inserted at the breakpoint junctions or by the inversion of a substantial portion of the chromosome. Specifically, we will focus on CG13617, a novel gene whose coding region is the closest to one of the inversion 2j breakpoints.

This work is divided into two parts. In the first part we characterize gene CG13617 in D. buzzatii, we compare its expression between lines carrying inversion 2j and those with the 2st non-inverted arrangement to identify any differences that might be caused by the inversion, and finally we try to determine the molecular mechanisms responsible for the detected expression changes. In this first part we intend to answer the following questions: