Protein domains required for nuclear and sub-nuclear targeting of subunits of splicing factor SF3a


Thesis


FERFOGLIA, Fabio

Abstract

The aim of my thesis was to identify the signals responsible for targeting the splicing complex SF3a to the nucleus and to study the domains of its subunits that are important for correct localization of the complex. I demonstrated that nuclear import of SF3a120 is independent of its interactions with the smaller subunits, and I identified a nuclear localization sequence. I demonstrated the involvement of an 84-amino-acid region, containing the SAP domain, in targeting SF3a60 to Cajal bodies (CBs). This same region is also sufficient to target two reporter proteins to CBs, which shows that the region is necessary and sufficient for targeting the whole SF3a complex to CBs. In addition, the amino-terminal part of coilin shows sequence similarity to the SAP domain, which indicates a similar function for the sequences shared by SF3a60 and coilin.

FERFOGLIA, Fabio. Protein domains required for nuclear and sub-nuclear targeting of subunits of splicing factor SF3a. Doctoral thesis: Univ. Genève, 2007, no. Sc. 3926

URN : urn:nbn:ch:unige-13397

DOI : 10.13097/archive-ouverte/unige:1339

Available at:

http://archive-ouverte.unige.ch/unige:1339

Disclaimer: layout of this document may differ from the published version.



UNIVERSITY OF GENEVA, FACULTY OF SCIENCE

Department of Cell Biology, Professor A. Krämer

Protein domains required for nuclear and sub-nuclear targeting of subunits of splicing factor SF3a

THESIS

presented to the Faculty of Science of the University of Geneva for the degree of Doctor of Science in biology

by

Fabio Ferfoglia

from Trieste (Italy)

Thesis No. 3926

2007


ACKNOWLEDGEMENTS

I would like to thank Angela Krämer for accepting me in her laboratory and giving me the opportunity to work and learn during these five years of PhD training. I would like to thank Karla Neugebauer and Françoise Stutz for agreeing to be members of the thesis jury.

Additional thanks to Françoise Stutz for all the helpful discussions on my work and for the always interesting conversations about science during these years. Thanks to all present and former members of Krämer’s group for their help and support, and also for sharing pleasant after-work moments. Special thanks to Ching-Jung Huang for all his enthusiasm and for his generosity at work and outside the lab. Thanks to Silvia for her support and for her always useful and precious advice. Thanks to Julian for his contagious optimism.

Thanks to Nicolas Antih for his essential assistance with the French summary. Thanks to Flore Mulhaupt for her support and for the time spent listening to me during work and coffee breaks. Thanks to Madame Salamin for her help and support, and for the ‘Italian moments’. Thanks to my dear friend Mauro Ceol for suggestions, help, support and all the rest.

Thanks to my family: Luciana, Edi, and Marco for their continuous support. And of course, thank you Ruxi for being there next to me, always.


FRENCH SUMMARY

Splicing, the process by which introns are removed from the nascent transcript (pre-messenger RNA) to generate mature messenger RNA, is an essential mechanism of gene expression in eukaryotes. Intron excision consists of two biochemical reactions that are catalyzed by the spliceosome, a large complex composed of small nuclear RNAs (snRNAs) and proteins. The U1, U2, U4/U6 and U5 small nuclear ribonucleoprotein particles (snRNPs, i.e. snRNAs associated with common and specific proteins) assemble on the pre-mRNA together with numerous other proteins. The constitutive splicing factor SF3a consists of three subunits of 60, 66 and 120 kDa (SF3a60, SF3a66 and SF3a120), which are required for the formation of, and are part of, the 17S U2 snRNP. SF3a is also involved in spliceosome formation and binds the pre-mRNA upstream of the branch site in a sequence-independent manner, thus facilitating the anchoring of U2 snRNA for recognition of the branch site. Depletion of a single SF3a subunit by RNA interference inhibits splicing in vivo, and each subunit is essential for cell viability.

The aim of this work is to identify the signals responsible for targeting the SF3a complex to the nucleus, and to study the domains of the SF3a subunits that are important for correct sub-nuclear localization of the heterotrimer.

Interactions between the different SF3a subunits were studied previously, showing that SF3a60 and SF3a66 interact with two independent domains of SF3a120 but not with each other. Analyses of the interaction domains required for SF3a complex formation and in vivo localization studies suggested a model in which the SF3a heterotrimer is formed in the cytoplasm and then imported into the nucleus independently of association with the U2 snRNP, which is therefore a nuclear event.

During this thesis, I analysed the nuclear targeting of SF3a and demonstrated that nuclear import of SF3a120 is independent of its interactions with the smaller subunits. I identified a bipartite-type nuclear localization signal (NLS) in the carboxy-terminal region of SF3a120 that is sufficient to target a cytoplasmic reporter protein to the nucleus.

The final steps of snRNP biogenesis take place in the nucleus. Newly assembled particles localize transiently in Cajal bodies (CBs) before being released into the nucleoplasm to participate in splicing or being targeted to nuclear speckles, which are mainly involved in the storage, assembly and/or modification of pre-mRNA splicing factors. Cajal bodies are dynamic sub-nuclear structures that assemble and disassemble during the cell cycle. Their formation depends on the transcriptional status and growth rate of the cells.

Although many Cajal body components have been identified and classified in recent years, the targeting of proteins to these structures is still poorly understood. The fact that many, if not all, Cajal body components are also present in other nuclear domains could explain the difficulties encountered in elucidating the mechanisms involved in this specific localization.

Previous studies showed that endogenous SF3a subunits are exclusively nuclear, with the typical speckled pattern characteristic of other splicing factors. In contrast, each SF3a subunit, transiently expressed as a fusion with green fluorescent protein (GFP), accumulates in nuclear speckles but is also found in Cajal bodies. Moreover, transiently expressed mutants of SF3a60 and SF3a66 that cannot associate with the U2 snRNP accumulate in Cajal bodies and are absent from nuclear speckles. These observations, among others, indicate that the final steps of U2 snRNP biogenesis take place in Cajal bodies.

While analysing whether nuclear import of SF3a120 depends on its interaction with the smaller subunits, it became evident that an SF3a120-GFP mutant lacking the SF3a60-interaction domain had a diffuse nucleoplasmic localization and did not show the typical speckled pattern, whereas an SF3a120 mutant unable to interact with SF3a66 accumulated in Cajal bodies.

These observations strongly suggested that SF3a60 contains a signal required for targeting the SF3a complex to Cajal bodies.

Previous data showed that a GFP-SF3a60 fusion protein lacking the central 50-amino-acid region that contains the SAF-A/B, Acinus and PIAS (SAP) domain had a diffuse nucleoplasmic localization and did not localize to nuclear speckles.

Additional localization experiments with GFP-SF3a60 fusion mutants allowed me to demonstrate the involvement of an 84-amino-acid region containing the SAP domain in targeting SF3a60 to Cajal bodies. The same region is also sufficient to target two proteins not involved in splicing to Cajal bodies, which shows that the SAP-containing region of SF3a60 is necessary and sufficient for targeting the whole SF3a complex to Cajal bodies. Furthermore, the amino-terminal part of coilin, a commonly used Cajal body marker, shows sequence similarity to the SAP domain, which suggests a common function for the sequences shared by SF3a60 and coilin.


SUMMARY

Splicing, the removal of introns from a nascent transcript (pre-mRNA) to generate the mature messenger RNA (mRNA), is an essential mechanism of gene expression in eukaryotes. Intron excision consists of two biochemical reactions that are catalyzed by the spliceosome, a large complex composed of small nuclear RNAs (snRNAs) and proteins. The U1, U2, U4/U6 and U5 snRNPs (i.e. snRNAs associated with common and particle-specific proteins) assemble on the pre-mRNA together with many non-snRNP proteins. The constitutive splicing factor SF3a consists of three subunits of 60, 66 and 120 kDa (SF3a60, SF3a66 and SF3a120), which are required for the formation of, and are part of, the mature 17S U2 snRNP. SF3a is also required for spliceosome assembly and binds the pre-mRNA upstream of the branch site in a sequence-independent manner, thus facilitating the anchoring of U2 snRNA for recognition of the branch site. RNAi-mediated depletion of single SF3a subunits inhibits splicing in vivo, and each subunit is essential for cell viability.

The aim of this work was to identify the signals responsible for targeting the SF3a complex to the nucleus and to study domains in the SF3a subunits that are important for proper sub-nuclear localization of the heterotrimer. Interactions between SF3a60, SF3a66 and SF3a120 had previously been studied in vitro, showing that SF3a60 and SF3a66 interact with two independent domains of SF3a120 but not with each other. Previous analyses of the interaction domains (IDs) required for formation of the SF3a complex, together with in vivo localization studies, suggested a model in which the SF3a heterotrimer is formed in the cytoplasm and then imported into the nucleus independently of U2 snRNP association, which is consequently a nuclear event. In my thesis, I have analysed how SF3a is imported into the nucleus. I have demonstrated that nuclear import of SF3a120 is independent of its interactions with the smaller subunits, and I identified a bipartite-type nuclear localization signal (NLS) in the C-terminal region of SF3a120 that is sufficient to target a cytoplasmic reporter protein to the nucleus.
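A bipartite NLS consists of two clusters of basic residues separated by a short spacer. The minimal Python sketch below scans a protein sequence for a simplified version of the classical bipartite consensus: two adjacent basic residues (K or R), a spacer of 10-12 residues, and at least three basic residues among the next five. The function name `find_bipartite_nls`, its parameters and this exact rule are illustrative assumptions, not the criteria used to identify the SF3a120 signal; the example sequence is the well-known bipartite NLS of Xenopus nucleoplasmin, not a sequence from SF3a120.

```python
BASIC = frozenset("KR")  # lysine and arginine


def find_bipartite_nls(protein, spacer_min=10, spacer_max=12, window=5, min_basic=3):
    """Return (start, end) spans matching a simplified bipartite NLS rule.

    Assumed rule (a hypothetical simplification of the classical consensus):
    two adjacent basic residues, a spacer of spacer_min..spacer_max residues,
    then at least `min_basic` basic residues within the next `window` residues.
    """
    seq = protein.upper()
    hits = []
    for i in range(len(seq) - 1):
        if seq[i] not in BASIC or seq[i + 1] not in BASIC:
            continue  # first basic cluster not present at this position
        for spacer in range(spacer_min, spacer_max + 1):
            j = i + 2 + spacer          # first residue of the second cluster
            block = seq[j:j + window]   # may be shorter near the C-terminus
            if sum(aa in BASIC for aa in block) >= min_basic:
                hits.append((i, j + len(block)))
                break  # report only the shortest spacer that satisfies the rule
    return hits


if __name__ == "__main__":
    # Bipartite NLS of Xenopus nucleoplasmin, a classic textbook example
    print(find_bipartite_nls("KRPAATKKAGQAKKKK"))
```

Allowing the second window to be truncated at the end of the sequence lets a signal located at the very C-terminus still be detected. A pattern match of this kind only flags candidates; whether a candidate is functional has to be tested experimentally, for example by fusing it to a cytoplasmic reporter protein.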

The final steps in the biogenesis of snRNPs occur in the nucleus. Newly assembled snRNPs transiently localize in Cajal bodies (CBs) before they are released into the nucleoplasm to participate in splicing or are targeted to nuclear speckles, which are mainly involved in the storage, assembly and/or modification of pre-mRNA splicing factors. CBs are dynamic subnuclear organelles that assemble and disassemble during the cell cycle, and their formation depends on the transcriptional status and growth rate of the cell. Although many CB components have been identified and classified in recent years, the targeting of proteins to CBs is still poorly understood. The fact that many, if not all, components of CBs are also found in other nuclear domains could explain the difficulty of elucidating the dynamics of this specific localization.

Previous studies demonstrated that endogenous SF3a subunits are exclusively nuclear, with a speckled pattern characteristic of other splicing factors. In contrast, transiently expressed GFP-tagged versions of the SF3a subunits accumulate in nuclear speckles but are also found in CBs. In addition, transiently expressed mutants of SF3a60 or SF3a66 that cannot associate with the U2 snRNP accumulate in CBs and are absent from nuclear speckles. These and other observations indicated that the final steps of U2 snRNP biogenesis occur in CBs.

While analyzing whether nuclear import of SF3a120 depended on its interaction with the smaller subunits, it became evident that a GFP-tagged SF3a120 mutant lacking the SF3a60-ID had a diffuse nucleoplasmic localization and did not show the typical speckled pattern, whereas an SF3a120 mutant impaired in binding to SF3a66 accumulated in CBs. These observations strongly suggested that SF3a60 contains a signal required for targeting the SF3a complex to CBs.

Previous data revealed that GFP-tagged SF3a60 lacking a central region of 50 amino acids encompassing the SAF-A/B, Acinus and PIAS (SAP) domain showed a diffuse nucleoplasmic localization and failed to localize in speckles. I performed additional localization experiments with GFP-tagged SF3a60 mutants, which demonstrated a role for an 84-amino acid region containing the SAP domain in targeting SF3a60 to CBs. The same region is sufficient to target two unrelated proteins to CBs, which provides strong evidence that the SAP-containing region of SF3a60 is responsible for targeting the whole SF3a complex to CBs. In addition, the N-terminal part of coilin, a widely used marker for CBs, shows sequence similarity to the SAP domain, which may be indicative of a common function for the sequences shared between SF3a60 and coilin.


1. INTRODUCTION

1.1 The organization of the cell nucleus

The presence of membrane-bound compartments, specialized in a multitude of different metabolic activities, reflects the high level of organization of the eukaryotic cell and gives an indication of how simple life forms might have evolved into advanced multicellular organisms. The cell nucleus was the first intracellular structure to be observed, about two centuries ago. Separated from the cytoplasm by a double membrane, it contains most of the cell's genetic material, organized in one or more chromosomes. The function of the cell nucleus, an extremely dynamic organelle, is to maintain chromosomal integrity and to regulate all processes related to DNA replication and gene expression. The nucleus itself is highly compartmentalized and contains distinct substructures that are not bounded by membranes (Dundr and Misteli, 2001; Handwerger and Gall, 2006; Lamond and Earnshaw, 1998). The complexity of processes such as DNA replication, RNA transcription and processing, or the assembly of ribosomal subunits partly explains this nuclear compartmentalization and the need for accurate regulation. For example, gene expression depends strictly on the localization of chromosomes in specific territories of the interphase nucleus, and the spatial arrangement of the whole genome can directly affect DNA function (Misteli, 2007; Spector, 2003).

Many nuclear components involved in a multitude of different processes localize in well-defined nuclear bodies, supporting the idea that the cell nucleus is subdivided into specialized sub-nuclear non-membrane-bound structures. The study of the interchromatin space by electron microscopy revealed the intricate and highly organized nature of the cell nucleus.

Recent work on amphibian (Xenopus laevis) oocyte nuclei showed that nucleoli, speckles and Cajal bodies (CBs) are composed of a heterogeneous mixture of electron-dense particles that form a “sponge-like” network of components (Handwerger and Gall, 2006).

Nucleoli, speckles and CBs are probably the most studied subnuclear compartments, although the list of newly discovered bodies and domains is constantly growing, confirming that the nuclear space is more complex than previously thought (Spector, 2006).

Nucleoli are associated with ribosome biogenesis, from the synthesis and processing of rRNA to the assembly of the ribosomal subunits that are exported to the cytoplasm. The mammalian cell nucleus normally contains 1-5 nucleoli, whose diameter can vary from 0.5 up to 5.0 µm (reviewed in Spector, 2001). Recent proteomic studies on isolated nucleoli identified more than 700 protein components, only 30% of which are linked to ribosome biogenesis, suggesting that the nucleolus also has other functions. Multiple lines of investigation have recently confirmed roles for nucleoli in stress responses, in the control of the cell cycle, and in the biogenesis of other classes of functional snRNPs (reviewed in Boisvert et al., 2007).

Nuclear speckles, often referred to as interchromatin granule clusters (IGCs) because of their appearance in the electron microscope, are mainly involved in the storage, assembly and/or modification of pre-mRNA splicing factors. For instance, experiments in living cells show that after transcriptional activation of specific genes, splicing factors are recruited from speckles to the sites of transcription (Misteli et al., 1997). Conversely, splicing factors accumulate in IGCs when pre-mRNA splicing is inhibited (O'Keefe et al., 1994). Speckles range from one to several micrometers in diameter and consist of 20-25-nm granules connected by thin fibrils. In addition to splicing factors and speckle-associated proteins, which are involved in the release of splicing factors from these structures, various components of the transcription machinery have been found in nuclear speckles (Handwerger and Gall, 2006; Lamond and Spector, 2003).

CBs are ubiquitous subnuclear organelles found in many plant and animal cells. They were first described more than a hundred years ago in vertebrate neuronal tissues as “nucleolar accessory bodies” because of their frequent association with the nucleolus. More than sixty years later, “coiled bodies” were observed by electron microscopy in thin sections of mouse, rat and human cells and described as roughly spherical aggregates, 0.5 µm in diameter, composed of coiled threads. These nuclear structures were only recently renamed Cajal bodies in tribute to their discoverer, Santiago Ramón y Cajal (Cioce and Lamond, 2005; Gall, 2000).

CBs are dynamic structures that assemble and disassemble during the cell cycle, and their formation depends on the transcriptional status and growth rate of the cell (Carmo-Fonseca et al., 1993; Fernandez et al., 2002). CBs vary in size from less than 0.2 µm to more than 2 µm, and in number from one to six, depending on cell type and species. CB number also varies during the cell cycle and is maximal in mid-to-late G1 phase (Andrade et al., 1993). Normally, CBs are tethered within a confined nuclear space, probably through interactions with specific regions of chromatin (Platani et al., 2002), but they also move within the nucleus during interphase (Boudonck et al., 1999; Platani et al., 2000). CBs are enriched in factors involved in nuclear RNA processing, such as pre-mRNA splicing, pre-rRNA processing and histone mRNA 3’ end formation (reviewed in Cioce and Lamond, 2005; Gall, 2000; Handwerger and Gall, 2006; Matera and Shpargel, 2006).

Although many CB components have been identified and classified in recent years (Cioce and Lamond, 2005), the targeting of proteins to CBs is still poorly understood. The fact that many, if not all, components of CBs are also found in other nuclear domains could explain the difficulty of elucidating the dynamics of this specific localization.

p80 coilin (Andrade et al., 1991; Raska et al., 1991) is widely used as a marker for CBs, although its Xenopus homologue has been shown not to be an essential component of these structures (Bauer and Gall, 1997). In addition, coilin knockout mice are viable, and the coilin-/- mouse embryonic fibroblast cell line derived from them shows nuclear bodies named residual CBs (Tucker et al., 2001). Residual CBs are more numerous than coilin-positive bodies and vary in composition: only a subset of the components normally present in CBs is found in residual CBs (Jady et al., 2003; Tucker et al., 2001). These findings suggest that coilin is not required for the targeting or retention of all the factors that localize in CBs. The targeting of coilin itself to CBs has been studied, revealing that its N-terminal 92 amino acids are necessary and sufficient for self-interaction of the protein and for its localization in CBs (Hebert and Matera, 2000).


1.2 U snRNP biogenesis

Spliceosomal uridine-rich small nuclear ribonucleoproteins (U snRNPs) are key players in the removal of non-coding introns from the pre-messenger RNA (pre-mRNA) of eukaryotic genes. Before they participate in the splicing reaction, U snRNPs go through a complex process of maturation which takes place in part in the cytoplasm and in part in the nucleus (Figure 1).

Figure 1. The U snRNP biogenesis pathway. After transcription and capping, the pre-U snRNAs are exported to the cytoplasm, where pre-formed heteromeric complexes of Sm proteins interact with the Sm site to form the so-called core snRNPs. The PRMT5 and SMN complexes facilitate this assembly step. Following cap hypermethylation and 3’ end trimming, the core snRNPs are imported into the nucleus. In the nucleus, the snRNAs are internally modified, and snRNP-specific proteins, imported independently of the snRNA, associate with the core snRNPs to form the mature snRNP particle.

U6 snRNP biogenesis is an exception to this model and takes place exclusively in the nucleus, where the U6 snRNA is transcribed by RNA polymerase III (Will and Lührmann, 2001). The common assembly pathway of U1, U2, U4 and U5 snRNPs starts in the nucleus with the transcription of the U snRNA by RNA polymerase II and the acquisition of a monomethylated m⁷G cap. The U snRNA precursor is then exported to the cytoplasm, where a set of seven Sm proteins is added in a stepwise manner to form a ring surrounding the conserved Sm binding site on the RNA moiety. Assembly of this Sm core structure is facilitated by the survival of motor neurons protein (SMN), a component of a large multiprotein complex (Gubitz et al., 2004), and by the PRMT5 complex (Meister et al., 2002).

3’ end trimming and hypermethylation of the cap to a trimethylguanosine (m3G) cap complete the cytoplasmic modifications, and the snRNP core is re-imported into the nucleus. A properly assembled Sm core is required for nuclear import of the particle and, together with the m3G cap, forms a bipartite U snRNP nuclear localization signal (NLS) (Fischer et al., 1993; Narayanan et al., 2004). The precise cytoplasmic location of snRNP assembly is not known.

Nevertheless, a recent study using Drosophila ovaries as a model system identified novel cytoplasmic bodies (U bodies) that contain uridine-rich snRNPs and other essential factors involved in snRNP biogenesis (Liu and Gall, 2007).

The final steps in the biogenesis of snRNPs occur in the nucleus. Newly assembled snRNPs transiently localize in CBs before they are released to the nucleoplasm to participate in splicing or are targeted to nuclear speckles where they accumulate (Sleeman and Lamond, 1999). Analysis in vivo of cells depleted of U4/U6.U5 tri-snRNP specific proteins revealed an accumulation of stable U4/U6 di-snRNP, but not U5 snRNPs, in CBs (Schaffert et al., 2004).

Similarly, FRET experiments performed to study the subnuclear localization of specific snRNP intermediates also indicated an accumulation of U4/U6 di-snRNP complexes in CBs.

These observations strongly suggest that U4/U6 di-snRNP assembly is completed in CBs.

Interestingly, a mathematical model based on measurements of the nucleus in living HeLa cells supports the hypothesis that accumulation of specific snRNP components in CBs improves the efficiency of snRNP assembly in the nucleus (Klingauf et al., 2006). Additional work revealed that proteins involved in the assembly of U4/U6 and U4/U6.U5 snRNPs are enriched in CBs (Makarova et al., 2002; Stanek et al., 2003). A role for CBs in the biogenesis of U2 snRNP has also been proposed (Nesic et al., 2004). In addition, modifications of the snRNA itself also occur in CBs, where a new class of small CB-specific RNAs (scaRNAs) has been identified (Darzacq et al., 2002).

1.3 Pre-mRNA splicing

The removal of introns from nascent transcripts (pre-mRNAs) is an essential mechanism for the production of mRNA in eukaryotes and is catalyzed by a large complex, the spliceosome.

In the major spliceosome, five U snRNPs (U1, U2, U4, U5 and U6), together with more than 100 non-snRNP proteins, assemble on the pre-mRNA and represent the key components of this large complex (Jurica and Moore, 2003) (Figure 2). Formation of the early splicing complex (complex E) results from an RNA-RNA interaction between U1 snRNP and the 5’ splice site, which in higher eukaryotes corresponds to the consensus sequence AG/GURAGU (R = purine; the slash marks the exon-intron boundary). Simultaneously, protein factors recognize the 3’ splice site, characterized by the sequence YAG/G (Y = pyrimidine), the polypyrimidine tract and the intron branch point sequence (BPS). The U2 snRNP interacts with the pre-mRNA in complex E through weak interactions (Das et al., 2000), but only the ATP-dependent formation of the pre-spliceosome (complex A) results in stable base-pairing between U2 snRNA and the BPS, which is located 15 to 40 nucleotides upstream of the 3’ splice site. Complex B forms when the U4/U6.U5 tri-snRNP joins the spliceosome. U6, which base-pairs with the 5’ splice site, also interacts with U2 and plays a central role in the splicing reaction by juxtaposing the reactive sites of the pre-mRNA (Dybkov et al., 2006; Nilsen, 1998). Moreover, a catalytic role in splicing has been proposed for the U2 and U6 snRNAs, independently of associated proteins (Valadkhan, 2007; Valadkhan and Manley, 2001). Finally, a series of rearrangements based on snRNA-snRNA and snRNA-pre-mRNA interactions is required for the formation of complex C and for intron removal.
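The consensus notation used above can be made concrete with a short Python sketch that expands the IUPAC codes R (purine) and Y (pyrimidine) into regular expressions, reports exon-intron junctions matching AG/GURAGU and YAG/G, and lists candidate branch-point adenosines 15 to 40 nucleotides upstream of a 3’ splice site. The function names and the toy pre-mRNA are hypothetical. Strict consensus matching is a simplification, since natural splice sites often deviate from these sequences, and measuring the branch-site window back from the 3’ exon-intron junction is an assumption of this sketch.

```python
import re

# IUPAC nucleotide codes used in the consensus strings (RNA alphabet)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",    # purine
    "Y": "[CU]",    # pyrimidine
    "N": "[ACGU]",  # any nucleotide
}


def consensus_to_regex(consensus):
    """Compile a consensus such as 'AG/GURAGU'; '/' marks the exon-intron junction."""
    cut = consensus.index("/")
    body = "".join(IUPAC[base] for base in consensus.replace("/", ""))
    # Zero-width lookahead so that overlapping matches are all reported
    return re.compile(f"(?={body})"), cut


def find_splice_sites(rna, consensus):
    """Return junction positions (index of the first nucleotide after the cut)."""
    pattern, cut = consensus_to_regex(consensus)
    return [m.start() + cut for m in pattern.finditer(rna)]


def branch_window_adenosines(rna, acceptor, near=15, far=40):
    """Indices of adenosines lying near..far nt upstream of a 3' junction."""
    start = max(0, acceptor - far)
    stop = max(0, acceptor - near + 1)
    return [i for i in range(start, stop) if rna[i] == "A"]


if __name__ == "__main__":
    # Hypothetical toy pre-mRNA: exon 1 | intron with a UACUAAC-like region | exon 2
    pre_mrna = "GGGAG" + "GUAAGU" + "C" * 30 + "UACUAAC" + "C" * 20 + "CAG" + "GCC"
    donors = find_splice_sites(pre_mrna, "AG/GURAGU")
    acceptors = find_splice_sites(pre_mrna, "YAG/G")
    print("5' splice sites:", donors)
    print("3' splice sites:", acceptors)
    print("branch candidates:", branch_window_adenosines(pre_mrna, acceptors[-1]))
```

Because short motifs such as YAG/G also occur by chance in long sequences, a scan of this kind only yields candidate sites; in the cell, site choice depends on the combined recognition of the 5’ splice site, the branch point and the polypyrimidine tract by the splicing machinery.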

The low-abundance minor spliceosome catalyzes the removal of an atypical class of introns, with non-canonical consensus sequences, from eukaryotic pre-mRNAs. The minor spliceosome differs from the major spliceosome in its snRNP composition: the four snRNPs U11, U12, U4atac and U6atac replace U1, U2, U4 and U6, respectively. Because of these differences, minor-class introns are also referred to as U12-type introns. In contrast, U5 snRNP is present in both spliceosomes. Overall, the mechanisms of splicing of major and minor introns are very similar, indicating a common evolutionary origin (Patel and Steitz, 2003). A recent study comparing homologues of minor-spliceosome-specific proteins and snRNAs in different organisms points to an early origin of the minor spliceosome (Russell et al., 2006).

Figure 2. Spliceosome assembly (from Will and Lührmann, 2006). The various spliceosomal complexes are indicated. Exon and intron sequences are represented by boxes and lines, respectively. The first two and the last two intron nucleotides (GU, AG), as well as the branch site adenosine (A), are shown.

Splicing, which leads to the excision of the intron and ligation of the 5’ and 3’ exons, is essentially the result of two catalytic transesterification reactions (Figure 3).

The first biochemical step is a nucleophilic attack by the 2’ hydroxyl group of the conserved adenosine at the branch site on the 3’,5’-phosphodiester bond at the 5’ splice site.

As a result, the 5’ terminal guanosine of the intron becomes covalently linked to the branch site adenosine through a 2’,5’-phosphodiester bond, generating the cleaved-off 5’ exon and the intron-3’ exon intermediate in a lariat configuration. In the second step of splicing, the 3’ hydroxyl group of the 5’ exon attacks the phosphodiester bond at the 3’ splice site; the two exons are joined and the lariat intron is released (Krämer, 1996).
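The bookkeeping of the two transesterification steps can be illustrated with a toy Python model that tracks only sequence connectivity, not chemistry: step 1 releases the 5’ exon and produces a lariat intermediate whose first intron nucleotide is linked to the branch adenosine, and step 2 joins the exons and releases the lariat intron. The class and function names (`Lariat`, `first_step`, `second_step`, `splice`) and the toy sequence are hypothetical and serve only to illustrate the order of the reactions.

```python
from dataclasses import dataclass


@dataclass
class Lariat:
    """Branched RNA: the 5' G of `sequence` is 2',5'-linked to the A at `branch`."""
    sequence: str
    branch: int  # index of the branch-point adenosine within `sequence`


def first_step(pre_mrna, donor, branch_point):
    """Step 1: the 2'-OH of the branch adenosine attacks the 5' splice site."""
    if pre_mrna[donor] != "G" or pre_mrna[branch_point] != "A":
        raise ValueError("expected an intron starting with G and a branch-point A")
    exon1 = pre_mrna[:donor]  # released 5' exon, ending in a free 3'-OH
    intermediate = Lariat(pre_mrna[donor:], branch_point - donor)  # lariat intron-3' exon
    return exon1, intermediate


def second_step(exon1, intermediate, acceptor_offset):
    """Step 2: the 3'-OH of exon 1 attacks the 3' splice site of the intermediate."""
    intron = intermediate.sequence[:acceptor_offset]
    exon2 = intermediate.sequence[acceptor_offset:]
    return exon1 + exon2, Lariat(intron, intermediate.branch)


def splice(pre_mrna, donor, acceptor, branch_point):
    """Run both steps; donor and acceptor are junction indices in `pre_mrna`."""
    if not donor < branch_point < acceptor:
        raise ValueError("branch point must lie inside the intron")
    exon1, intermediate = first_step(pre_mrna, donor, branch_point)
    return second_step(exon1, intermediate, acceptor - donor)


if __name__ == "__main__":
    pre_mrna = "GGGAG" + "GUAAGU" + "C" * 30 + "UACUAAC" + "C" * 20 + "CAG" + "GCC"
    mrna, lariat = splice(pre_mrna, donor=5, acceptor=71, branch_point=46)
    print("mRNA:", mrna)
    print("lariat intron:", lariat.sequence[:2], "...", lariat.sequence[-2:],
          "branch at", lariat.branch)
```

Splitting the model into two functions mirrors the two chemically distinct steps: the lariat intermediate returned by the first step is the only input, together with the free 5’ exon, that the second step needs.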

Figure 3. The catalytic steps of pre-mRNA splicing (adapted from Krämer, 1996). Exons are shown as boxes, the intron as a line. Conserved nucleotides (in mammalian pre-mRNAs) and the phosphates at the splice sites are indicated. The dashed arrows represent the nucleophilic attack of the hydroxyl groups on the splice junctions.

1.4 U2 snRNP and splicing

The human U2 snRNP was first described as a 12S particle in which the U2 snRNA moiety is bound to the core Sm proteins, common to other snRNPs, and to two U2-specific proteins named A’ and B’’ (Behrens et al., 1993). Two additional heteromeric complexes, SF3a and SF3b, are present in the splicing-active 17S U2 snRNP and are incorporated in a stepwise manner in vitro (Brosi et al., 1993a). Both SF3a and SF3b are essential for the formation of the pre-spliceosomal complex A, and they assist in the selection of the BPS by stabilizing the U2 snRNP-BPS interaction (Gozani et al., 1996).

Work in different laboratories on the SF3b complex led to the identification and characterization of a total of eight proteins (SF3b155, SF3b145, SF3b130, SF3b49, SF3b14a/p14, SF3b14b, SF3b10, and SF3b125), seven of which are also present in the 17S U2 snRNP. SF3b125, in contrast, is absent from the mature U2 snRNP, presumably because it dissociates during the assembly process (Das et al., 1999; Krämer et al., 1999; Will et al., 2001; Will et al., 2002).

1.5 The SF3a complex

SF3a is a constitutive splicing factor, essential for splicing in vitro and in vivo (Brosi et al., 1993b; Krämer et al., 2005; Tanackovic and Krämer, 2005). The complex binds the pre-mRNA upstream of the branch site in a sequence-independent manner and facilitates the anchoring of U2 snRNA for the recognition of the branch site (Gozani et al., 1996; Query et al., 1996). SF3a consists of three subunits of 60, 66 and 120 kDa (SF3a60, SF3a66 and SF3a120) (Brosi et al., 1993b), which are all required for the formation of the active 17S U2 snRNP and pre-spliceosome assembly in HeLa cell extracts (Nesic and Krämer, 2001).

Moreover, RNAi-mediated depletion of SF3a subunits from HeLa cells inhibits splicing in vivo, and each subunit is essential for cell viability (Tanackovic and Krämer, 2005). Prp9, Prp11, and Prp21 represent essential yeast homologues of the human proteins and a recent proteomic analysis confirmed the composition of the yeast SF3a heterotrimer (Dziembowski et al., 2004).

Figure 4. Schematic representation of SF3a subunits and their interactions. The modular structure of the three SF3a subunits and the IDs are indicated. Known conserved domains are highlighted in black or gray. SAP, SAF-A/B, Acinus, and PIAS motif; Zn, Zn finger domain of the C2H2 type; S1 and S2, SURP1 and SURP2 domains; +/-, charged region; PxPP, proline-rich regions; UBL, ubiquitin-like region; GVHPPAP, proline-rich heptapeptide repeats.

The SF3a120 subunit (793 amino acids; Figure 4) contains two SURP modules in its N-terminal part (Krämer et al., 1995). This tandemly repeated motif of 43 amino acids has been found in other proteins involved in pre-mRNA splicing and was first described in the alternative splicing factor DmSWAP of Drosophila melanogaster (Spikes et al., 1994). The SURP domain is conserved among different organisms, and sequence similarity is high between the human, yeast (Saccharomyces cerevisiae) and nematode (Caenorhabditis elegans) counterparts (reviewed in Krämer, 1996). A short stretch of charged amino acids (amino acids 254-268) is located C-terminal to the SURP modules. Two proline-rich regions of 35 and 119 amino acids are found in the central and C-terminal parts of SF3a120, respectively. The protein ends with a ubiquitin-like domain (UBL), a 77-amino acid region with 29.6% identity and 54.5% similarity to the highly conserved 76-amino acid protein ubiquitin (Walters et al., 2004). The proline-rich regions and the UBL domain are present in the C. elegans homologue but not in the S. cerevisiae counterpart (Krämer, 1996; Rain et al., 1996). While human and yeast SF3a120 are constitutive splicing factors (Arenas and Abelson, 1993; Tanackovic and Krämer, 2005), an additional function in alternative splicing has been proposed for SF3a120 in D. melanogaster (Park et al., 2004). In addition to its well-characterized role in splicing as a component of the SF3a complex, two independent studies point to a role for SF3a120 in transcriptional regulation: it has been purified as part of the nuclear receptor corepressor (N-CoR) complex and has been found to interact directly with the transcription factor Sp1 (Gunther et al., 2000; Underhill et al., 2000).
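Percent identity and similarity values, such as those quoted above for the UBL domain, depend on an alignment and on how "similar" residues are defined. The Python sketch below computes both values from a pair of pre-aligned sequences. The grouping of similar residues is a simplified, hypothetical scheme (alignment tools typically count positive BLOSUM62 scores instead). The denominator used here, all alignment columns, is only one of several conventions, so the sketch will not necessarily reproduce published figures.

```python
# Simplified, hypothetical groups of "similar" residues (an assumption;
# real tools usually count positive substitution-matrix scores instead).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"),
                  set("ST"), set("NQ"), set("AG")]


def _similar(a: str, b: str) -> bool:
    """True if two different residues fall in the same similarity group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Percent identity and similarity of two equal-length aligned sequences.

    Gaps ('-') never count as identical or similar; every alignment column
    counts in the denominator.
    """
    if len(aln1) != len(aln2) or not aln1:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    identical = similar = 0
    for a, b in zip(aln1.upper(), aln2.upper()):
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif _similar(a, b):
            similar += 1
    n = len(aln1)
    return 100 * identical / n, 100 * similar / n


print(identity_similarity("MKVLA", "MRILG"))  # (40.0, 100.0)
```

For real comparisons, the aligned strings would come from an alignment program. Only the counting step is shown here.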

The SF3a66 subunit (464 amino acids; Figure 4) contains a Zn finger domain of the C2H2 type in its N-terminal half (amino acids 56-78). This domain is necessary for binding of the subunit to the 15S U2 snRNP intermediate and for formation of the mature 17S U2 snRNP (Nesic and Krämer, 2001). A similar Zn finger is present in the yeast homologue Prp11 and is essential for viability (Legrain and Chapon, 1993). Sequence similarity between the human and S. cerevisiae proteins is very low. By contrast, the N-terminal part of the protein is well conserved in other organisms such as mouse, the fruit fly D. melanogaster, the nematode C. elegans, the plant Arabidopsis thaliana and the fission yeast S. pombe. In the C-terminal half, the human, mouse, D. melanogaster and A. thaliana homologues contain proline-rich heptapeptide repeats (GVHPPAP); however, the function of these repeats has not been characterized (Bennett and Reed, 1993; Nesic and Krämer, 2001). A non-canonical function for SF3a66, independent of splicing, has recently been reported in neuronal cells (Takenaka et al., 2004). Neurite extension was observed after overexpression of SF3a66 in mouse neuroblastoma cells. Furthermore, pull-down assays identified β-tubulin as an SF3a66-associated protein, suggesting an unexpected role for SF3a66 as a microtubule-associated protein in the cytoplasm.

The SF3a60 subunit (501 amino acids; Figure 4) contains a Zn finger domain of the C2H2 type (amino acids 408-431) in its C-terminal region. The S. cerevisiae counterpart Prp9 contains a corresponding domain with very high sequence similarity, and the two motifs have been shown to be functionally exchangeable in vivo (Krämer et al., 1994). A central region of 50 amino acids in SF3a60 is well conserved in metazoans and A. thaliana (Meyer et al., 1998). This region encompasses a stretch of 35 amino acids with homology to the SAF-A/B, Acinus, and PIAS (SAP) motif (Aravind and Koonin, 2000). The function of this conserved region is unknown, although it has previously been shown to be dispensable for pre-spliceosome assembly (Nesic and Krämer, 2001).


1.6 SF3a interactions

In yeast, genetic and physical interactions between Prp9, Prp11, and Prp21 have been demonstrated (Legrain and Chapon, 1993; Ruby et al., 1993). These interactions correspond to those between the human homologues, which have been studied extensively in vitro (Krämer et al., 1995; Nesic and Krämer, 2001). These reports and work by C.-J. Huang and F. Mulhaupt (see section 2.2) indicate that SF3a60 and SF3a66 interact with two independent domains of SF3a120 but do not interact with each other. As summarized in Figure 4, the N-terminal part of SF3a60 (amino acids 1-107) mediates the interaction with a region of SF3a120 that contains the SURP2 domain (amino acids 145-243; Section 2.2 – Figure 2). A shorter sequence C-terminal to this domain in SF3a120 (amino acids 269-295) is required for the interaction with an N-terminal sequence of SF3a66 (amino acids 108-210; Section 2.2 – Figure 5). Similar results have been obtained with the homologous components of SF3a in S. cerevisiae (Rain et al., 1996).
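The interaction map summarized above can be recorded as a small table of residue intervals. In the Python sketch below, the record layout and helper names are illustrative assumptions. The sketch checks that the two SF3a120 binding regions do not overlap and that the charged region (amino acids 254-268) falls in the gap between them. Non-overlapping sites are consistent with SF3a60 and SF3a66 contacting independent domains of SF3a120. This check alone cannot establish independent binding, however; that conclusion rests on the pull-down data.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    partner: str
    partner_region: tuple[int, int]  # residues of the partner subunit (1-based, inclusive)
    sf3a120_region: tuple[int, int]  # residues of SF3a120 (1-based, inclusive)


# Values taken from the text above.
INTERACTIONS = [
    Interaction("SF3a60", (1, 107), (145, 243)),
    Interaction("SF3a66", (108, 210), (269, 295)),
]
CHARGED_REGION = (254, 268)  # charged stretch of SF3a120


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two inclusive residue intervals share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def residues_between(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of residues separating two intervals (0 if they touch or overlap)."""
    first, second = sorted([a, b])
    return max(0, second[0] - first[1] - 1)


sf3a60_site = INTERACTIONS[0].sf3a120_region
sf3a66_site = INTERACTIONS[1].sf3a120_region
print(overlaps(sf3a60_site, sf3a66_site))          # False
print(residues_between(sf3a60_site, sf3a66_site))  # 25
print(sf3a60_site[1] < CHARGED_REGION[0] and CHARGED_REGION[1] < sf3a66_site[0])  # True
```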

1.7 SF3a localization

Analysis of the intracellular localization of the SF3a complex indicated that the endogenous SF3a subunits are exclusively nuclear (Nesic et al., 2004) and display a typical speckled pattern characteristic of other splicing factors (Lamond and Spector, 2003). Indeed, each subunit can be detected by immunostaining in nuclear speckles, where they co-localize with SC35 (a marker for speckles), and throughout the nucleoplasm, but not in CBs (Figure 5).

Figure 5. Localization of endogenous SF3a subunits. HeLa cells were fixed with cold methanol and immunostained with antibodies against SF3a120, SF3a66 and SF3a60, as indicated (upper panel). HeLa cells transiently transfected with 120-FL were fixed in cold methanol and immunostained with anti-SC35 (lower panel). A computer-generated overlay is indicated (merge). Bar, 10 µm.

In contrast, transiently expressed GFP-tagged versions of SF3a components accumulate in nuclear speckles but are also found in CBs (Nesic et al., 2004). Interestingly, RNAi-mediated depletion of SF3a from cells reduced the levels of U2 snRNA and U2-B″ in the nucleoplasm, although their localization in CBs was not affected (Tanackovic and Krämer, 2005).

A model of SF3a assembly and import was proposed on the basis of in vitro experiments that determined the interaction domains required for formation of the SF3a complex, together with in vivo localization studies. In this model, the SF3a heterotrimer is formed in the cytoplasm and then imported into the nucleus independently of the U2 snRNP; association of SF3a with the U2 snRNP is therefore a nuclear event (Nesic and Krämer, 2001; Nesic et al., 2004). Once maturation of the U2 snRNP is complete, the particle is rapidly released from CBs and diffuses into the nucleoplasm. This behaviour can explain why endogenous SF3a is not detected in CBs at steady state. Interestingly, after deletion of the central region of 50 amino acids encompassing the SAP domain, the speckled localization of GFP-tagged SF3a60 is compromised, and the protein localizes diffusely in the nucleus (Nesic et al., 2004). The SAP domain has therefore been proposed to be involved in targeting the protein to nuclear speckles, although additional experiments are required to address this question. Deletion of the Zn finger motifs resulted in the accumulation of SF3a60 and SF3a66 in CBs, and no speckles were observed (Nesic et al., 2004). Given that the Zn finger motifs are required for incorporation of the two subunits into the mature 17S U2 snRNP (Nesic and Krämer, 2001), this observation indicated that the final steps of U2 snRNP biogenesis take place in CBs.

1.8 Nucleocytoplasmic Transport

The existence of nuclear and cytoplasmic compartments, separated by the nuclear envelope, is a major feature of all eukaryotic cells. A direct consequence of this compartmentalization is that macromolecules must be transported in both directions through nuclear pore complexes (NPCs) embedded in the nuclear envelope. Passive diffusion through the pores is possible only for ions and small proteins, whereas larger molecules require appropriate targeting signals and are transported actively (Nigg, 1997). Karyopherins (carrier proteins), termed importins and exportins, mediate nuclear import and export of macromolecular cargos, respectively (Radu et al., 1995). An important group of carrier proteins is the β-karyopherin superfamily, which mediates the shuttling of cargo between nucleus and cytoplasm either directly or via the adaptor protein importin α. Active nuclear transport requires energy, which is mainly provided by GTP hydrolysis by Ran, a member of the Ras superfamily of small GTPases (Koepp and Silver, 1996). Like other GTPases, Ran cycles between a GTP- and a GDP-bound state in a regulated manner, so that RanGTP is enriched in the nucleus and RanGDP in the cytoplasm.

This asymmetric distribution of the two forms of Ran controls cargo binding and release and thereby maintains the directionality of nuclear transport. The first step in nuclear import is the selective recognition of the substrate by an import receptor.

Localization signals consisting of short amino acid sequences have been identified in several classes of cargos and are referred to as nuclear localization signals (NLSs). The most common types of NLS contain basic residues in either one (monopartite) or two (bipartite) clusters. Two well-known examples are the monopartite NLS of the SV40 large T antigen (PKKKRRV), which was the first NLS to be characterized (Kalderon et al., 1984), and the bipartite NLS of nucleoplasmin (KRPAATKKAGQAKKKK) (Robbins et al., 1991). The two basic clusters of bipartite NLSs are usually separated by a spacer of 10-12 amino acids. Exceptions with a longer spacer between the two clusters of basic amino acids have been reported, however (Nath and Nayak, 1990).
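The following Python sketch shows how putative NLSs can be identified in silico. It scans a protein sequence with two deliberately simplified consensus patterns. Monopartite signals are detected with a K-(K/R)-X-(K/R) motif. Bipartite signals are detected as two basic residues, followed by a 10-12 residue spacer and at least three K/R among the next five residues. The patterns and function names are assumptions for illustration and are not necessarily the approach used in this work; dedicated predictors use more elaborate models. Any hit is only a candidate that still requires functional testing, for example by fusion to a reporter protein.

```python
import re

BASIC = "KR"
# Simplified monopartite consensus K-(K/R)-X-(K/R); the zero-width
# lookahead allows overlapping matches to be reported.
MONOPARTITE = re.compile(r"(?=K[KR].[KR])")


def find_monopartite(seq: str) -> list[int]:
    """0-based start positions of the simplified monopartite motif."""
    return [m.start() for m in MONOPARTITE.finditer(seq.upper())]


def find_bipartite(seq: str, spacers=range(10, 13), min_basic: int = 3) -> list[tuple[int, int]]:
    """(start, spacer) pairs: two basic residues, a spacer of 10-12 residues,
    then at least `min_basic` K/R among the next five residues."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 1):
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            for spacer in spacers:
                window = seq[i + 2 + spacer : i + 7 + spacer]
                if sum(aa in BASIC for aa in window) >= min_basic:
                    hits.append((i, spacer))
                    break  # report only the shortest qualifying spacer
    return hits


print(find_monopartite("PKKKRRV"))           # [1, 2]
print(find_bipartite("KRPAATKKAGQAKKKK"))    # [(0, 10)]
print(find_monopartite("KRPAATKKAGQAKKKK"))  # [12]
```

Applied to the two examples above, the monopartite pattern finds the SV40 signal, and the bipartite scan reports the nucleoplasmin signal with a 10-residue spacer. The monopartite pattern also matches the terminal KKKK cluster of nucleoplasmin, which illustrates why short consensus matches tend to overpredict.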

1.9 Aim of the thesis

The first aim of this work was to identify the signal or signals responsible for targeting the SF3a complex to the nucleus. As mentioned above, all three subunits have a modular structure with numerous conserved motifs. Interacting sequences required for formation of the complex have been identified in all subunits (section 2.2; Krämer et al., 1995; Nesic and Krämer, 2001), and their role in nuclear import of SF3a60 and SF3a66 has been tested (Nesic et al., 2004). In contrast, previous experiments (Dobrila Nesic, unpublished data) indicated that nuclear targeting of SF3a120 does not depend on its association with SF3a60 or SF3a66. These observations suggest that SF3a120 may contain an NLS responsible for targeting the entire SF3a complex to the nucleus. Alternatively, additional nuclear targeting signals could be formed upon assembly of the SF3a heterotrimer. Because nuclear import of SF3a120 had not been studied in detail, we investigated this issue further in the experiments presented in this thesis.

The second aim of this work was to identify which regions or domains in the SF3a subunits are important for proper sub-nuclear localization of the heterotrimer. The phenotypes observed after deletion of the Zn finger domains from SF3a60 and SF3a66 (i.e., accumulation of the two GFP-tagged mutants in CBs) provided strong evidence that CBs are sites of U2 snRNP maturation (Nesic et al., 2004). Nevertheless, SF3a60 and SF3a66 mutants unable to interact with the U2 snRNP still accumulated in CBs. This observation left open the question of how these proteins (and the whole SF3a complex) are targeted to CBs. Likewise, the loss of the speckled pattern after deletion of the SAP domain of SF3a60 raised the question of what role this evolutionarily conserved motif plays in the sub-nuclear localization of SF3a60.

Previous work indicated that distribution of the SF3a complex in the characteristic nuclear speckled pattern requires an intact heterotrimer (Nesic et al., 2004). The experiments in this thesis therefore aimed to establish which domains in the SF3a subunits are important for correct sub-nuclear localization of the heterotrimer.


2. THESIS

2.1 Publication: Structure-function analysis of the U2 snRNP-associated splicing factor SF3a

My contribution to the experimental work reviewed in this paper consisted of the in silico identification of putative NLSs in the SF3a subunits and the subsequent functional analysis of these signals in localization experiments (Section: Biogenesis of the mature U2 snRNP).


2.2 Manuscript: Analysis of domains in the SF3a subunits required for protein-protein interactions, nuclear targeting and proper intranuclear localization

Fabio Ferfoglia, Ching-Jung Huang, Flore Mulhaupt and Angela Krämer

Department of Cell Biology, Faculty of Sciences, University of Geneva, Switzerland

The authors contributed to the experiments as follows:

Fabio Ferfoglia: Analysis of SF3a subunits in localization studies. Sections: 2.2.3.3 (Figure 5D); 2.2.3.4; 2.2.3.5; 2.2.3.6; 2.2.3.7; 2.2.3.8; 2.2.3.9; 2.2.3.10; 2.2.3.11

Ching-Jung Huang: Interactions between SF3a subunits. Sections: 2.2.3.1; 2.2.3.2

Flore Mulhaupt: Interactions between SF3a subunits. Section 2.2.3.3


2.2.1 Abstract

The final steps in the biogenesis of small nuclear ribonucleoproteins (snRNPs) take place in the nucleus after import of the core snRNPs from the cytoplasm. Human splicing factor SF3a is a component of the mature 17S U2 snRNP and its three subunits of 60, 66 and 120 kDa are essential for splicing in vitro and in vivo. The SF3a complex forms in the cytoplasm and enters the nucleus independently of the core U2 snRNP. Previous work indicated that the association between SF3a and the U2 snRNP takes place in Cajal bodies (CBs). Here we have defined the minimal regions in SF3a66 and SF3a120 required for the formation of the SF3a heterotrimer and studied signals for nuclear import of SF3a. In addition, we identified an evolutionarily conserved sequence in SF3a60, encompassing the SAF-A/B, Acinus and PIAS (SAP) domain, which is necessary and sufficient for targeting proteins to CBs. This region shares sequence similarity with the N-terminal part of p80-coilin, which is also required for CB targeting.


2.2.2 Introduction

The mammalian cell nucleus hosts a diverse array of complex processes, such as DNA replication, RNA transcription and mRNA processing, and is therefore necessarily a highly compartmentalized and dynamic structure. Nuclear components involved in different processes localize to well-defined nuclear bodies/domains, supporting the idea that the nucleus is subdivided into specialized sub-nuclear structures that are not bounded by membranes. A long and steadily growing list of sub-nuclear bodies has been characterised (Spector, 2006).

This compartmentalization implies that the targeting of factors to this multitude of sub-nuclear structures is precisely regulated.

Nuclear speckles, often referred to as interchromatin granule clusters for their appearance in the electron microscope, are mainly involved in the storage, assembly and/or modification of splicing factors (Lamond and Spector, 2003). In addition, speckle-associated proteins, involved in the release of splicing factors from speckles, and components of the transcription machinery have been found in these structures (Handwerger and Gall, 2006).

CBs are ubiquitous sub-nuclear organelles found in plant and animal cells. They are dynamic structures that assemble and disassemble during the cell cycle, and their formation depends on the transcriptional status and growth rate of the cell (Carmo-Fonseca et al., 1993; Fernandez et al., 2002). CBs vary in size depending on cell type and species, and in number during the cell cycle (Cioce and Lamond, 2005). They are tethered within a confined nuclear space, probably through interactions with specific regions of chromatin (Platani et al., 2002), but also move within the nucleus during interphase (Boudonck et al., 1999; Platani et al., 2000). CBs are enriched in factors involved in the biogenesis of nuclear RNAs, including pre-mRNA splicing, pre-rRNA processing and histone mRNA 3’ end formation (Cioce and Lamond, 2005; Gall, 2000; Handwerger and Gall, 2006; Matera and Shpargel, 2006).

Although many CB components have been identified and classified in recent years, the targeting of proteins and snRNPs to CBs is still poorly understood (reviewed in Cioce and Lamond, 2005). Hebert and Matera (2000) demonstrated that the N-terminal 92 amino acids of p80-coilin, widely used as a CB marker (Andrade et al., 1991; Raska et al., 1991), are necessary and sufficient for its localization in CBs and for self-interaction.

Spliceosomal uridine-rich small nuclear ribonucleoproteins (U snRNPs) are key players in the removal of non-coding introns from the pre-mRNA of eukaryotic genes. Newly assembled snRNPs first localize transiently in CBs. They are then released into the nucleoplasm to participate in splicing or are targeted to nuclear speckles, where they are thought to be stored (Sleeman and Lamond, 1999). Maturation of the U1, U2, U4 and U5 snRNPs starts in the nucleus with transcription of the U snRNA by RNA polymerase II and acquisition of a monomethylated m⁷G cap (Will and Lührmann, 2001). The U snRNA precursor is then exported to the cytoplasm, where a set of seven Sm proteins is added in a stepwise manner to form a ring around the conserved Sm binding site of the snRNA. Assembly of the Sm core is facilitated by the survival of motor neurons (SMN) protein, a component of a large multiprotein complex (Gubitz et al., 2004), and by the PRMT5 complex (Meister et al., 2002). 3’ end trimming and hypermethylation of the cap complete the cytoplasmic modifications, and the snRNP core is re-imported into the nucleus.

The final steps in snRNP biogenesis occur in the nucleus. Recent work suggests that the assembly of the U2 snRNP and the U4/U6·U5 tri-snRNP is completed in CBs (Nesic et al., 2004; Schaffert et al., 2004; Stanek et al., 2003). Moreover, snRNAs are also modified in CBs, where a new class of small CB-specific RNAs (scaRNAs) has been identified (Darzacq et al., 2002). U6 snRNP biogenesis takes place exclusively in the nucleus, where U6 snRNA is transcribed by RNA polymerase III.

Splicing factor 3a (SF3a) is essential for splicing in vitro and in vivo (Brosi et al., 1993b; Tanackovic and Krämer, 2005). It consists of three subunits of 60, 66 and 120 kDa (SF3a60, SF3a66 and SF3a120), all of which are required for formation of the active 17S U2 snRNP and for pre-spliceosome assembly in HeLa cell extracts (Nesic and Krämer, 2001). In the early steps of pre-mRNA splicing, U2 snRNA base pairs with the branch site sequence, located 15-40 nucleotides upstream of the 3’ splice site, thus converting complex E into the pre-spliceosomal complex A. SF3a binds to the pre-mRNA upstream of the branch site in a sequence-independent manner and facilitates the anchoring of U2 snRNA for base pairing with the branch site (Gozani et al., 1996; Query et al., 1996).

In vitro, the functional U2 snRNP is assembled in a stepwise manner. Splicing factor SF3b binds to the 12S U2 snRNP to form a 15S intermediate, which is converted into the 17S U2 snRNP after binding of SF3a (Brosi et al., 1993a). All SF3a subunits are essential for this process (Nesic and Krämer, 2001).

Based on previous results, we proposed a model in which the SF3a complex assembles in the cytoplasm and enters the nucleus independently of the core U2 snRNP (Nesic et al., 2004).

Thus far, the role of single SF3a subunits in nuclear import of the complex is poorly understood and their contribution to subnuclear targeting to sites of snRNP maturation has not been investigated.

The endogenous SF3a subunits are detected in nuclear speckles and throughout the nucleoplasm by immunostaining. In addition, transiently expressed GFP-tagged versions of SF3a components accumulate in nuclear speckles but are also found in CBs (Nesic et al., 2004). C2H2-type Zn finger domains present in SF3a60 and SF3a66 are required for incorporation of SF3a into the U2 snRNP (Nesic and Krämer, 2001). GFP-tagged SF3a60 and SF3a66 lacking these domains accumulate in CBs and are not detected in speckles, which strongly suggested that the final steps of U2 snRNP biogenesis take place in CBs (Nesic et al., 2004).

The N-terminal ca. 100 amino acids of SF3a60 mediate the interaction with a region encompassing the second SURP domain of SF3a120 (Krämer et al., 1995; Nesic and Krämer, 2001) (Figure 1). The N-terminal half of SF3a66 binds to a region C-terminal to the second SURP domain of SF3a120. These regions in SF3a60 and SF3a66 have furthermore been shown to be necessary for nuclear targeting, suggesting that SF3a is imported as a preformed complex.

To define the determinants of SF3a heterotrimer formation and targeting to splicing compartments in further detail, we have analysed the minimal regions in SF3a120 necessary for SF3a60 and SF3a66 binding. We have defined a nuclear localization signal (NLS) in SF3a120. This signal is sufficient for nuclear import of a cytoplasmic protein and is most likely the major determinant for nuclear import of the SF3a heterotrimer. We also show that interactions between the SF3a subunits are required for correct sub-nuclear localization of the complex.

We have identified an 84-amino acid region in SF3a60, encompassing the SAF-A/B, Acinus and PIAS (SAP) domain (Aravind and Koonin, 2000), which is necessary and sufficient for targeting SF3a to CBs. This region is well conserved in humans, flies, worms and plants (Meyer et al., 1998; Nesic and Krämer, 2001) and shares sequence similarity with a region in coilin required for CB targeting.

Figure 1. Schematic representation of SF3a subunits and their interactions. The modular structure of the three SF3a subunits and the interaction domains (IDs) are indicated. Known conserved domains are highlighted in black or gray. SAP, SAF-A/B, Acinus, and PIAS motif; Zn, Zn finger domain of the C2H2 type; S1 and S2, SURP1 and SURP2 domains; +/-, charged region; PxPP, proline-rich regions; UBL, ubiquitin-like region; GVHPPAP, proline-rich heptapeptide repeats.

2.2.3 Results

2.2.3.1 The SF3a60-interaction domain (ID) of SF3a120

The region in SF3a120 required for interaction with SF3a60 was analyzed in GST pull-down assays. Full-length GST-tagged SF3a60 expressed in E. coli was bound to glutathione agarose beads and incubated with in vitro-translated, ³⁵S-methionine-labeled SF3a120 proteins. We had previously shown that the SF3a60-ID of SF3a120 lies between amino acids 151 and 242, encompassing the SURP2 domain (Krämer et al., 1995). To define the borders of the interaction domain in further detail, N- and C-terminal deletion mutants of SF3a120 (Figure 2A) were incubated with GST-SF3a60. Full-length SF3a120 (120-FL) and 120/1-224 interacted efficiently with SF3a60; however, further C-terminal truncation to amino acid 217 or 205 abolished binding (Figure 2C, lanes 2-5). The N-terminal deletion mutants 120/145-793, 120/157-793 and 120/164-793 bound SF3a60, but no interaction was seen with 120/180-793 (lanes 6-9). These results indicate that the N-terminal border of the SF3a60-ID in SF3a120 lies between amino acids 164 and 180 and the C-terminal border between amino acids 217 and 224. When we tested whether amino acids 164-224 of SF3a120 are also sufficient for SF3a60 binding, the interaction was reduced to almost background levels (Figure 2D, lane 10). Addition of seven amino acids at the N-terminus did not improve binding (120/157-224; lane 7). The interaction improved slightly after extending the protein by 19 amino acids at the N or C terminus (120/145-224 and 120/157-243, respectively; lanes 5 and 8). Finally, adding 19 amino acids at each side resulted in binding comparable to that of full-length SF3a120 or the truncated proteins 120/1-224 or 120/164-793 (compare lane 6 with lanes 2, 3 and 4). Thus, amino acids 145-243 of SF3a120, but not SURP2 alone, are sufficient for binding to SF3a60, and SURP1 is not required for the interaction. Moreover, although SURP1 is very similar in sequence to SURP2, it does not interact with SF3a60, as previously shown (Krämer et al., 1995).
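The bracketing logic used above to locate the borders of the SF3a60-ID can be written out explicitly. The Python sketch below uses a hypothetical function name. It takes the binding result for each C-terminal truncation (keyed by its last residue) and each N-terminal truncation (keyed by its first residue), and returns the intervals that must contain each border. The sketch assumes that binding is lost monotonically along each deletion series: once a truncation abolishes binding, all further truncations are assumed to do so too. A non-monotonic series would need closer inspection.

```python
def bracket_borders(c_series: dict[int, bool], n_series: dict[int, bool]):
    """Bracket the N- and C-terminal borders of an interaction domain.

    c_series: last residue of each construct 1..x -> still binds?
    n_series: first residue of each construct x..end -> still binds?
    Returns ((n_lo, n_hi), (c_lo, c_hi)); None marks an open bracket.
    """
    # The domain starts at or after the largest start that still binds
    # and before the smallest larger start that no longer binds.
    n_lo = max(pos for pos, binds in n_series.items() if binds)
    n_hi = min((pos for pos, binds in n_series.items() if not binds and pos > n_lo),
               default=None)
    # The domain ends at or before the smallest end that still binds
    # and after the largest smaller end that no longer binds.
    c_hi = min(pos for pos, binds in c_series.items() if binds)
    c_lo = max((pos for pos, binds in c_series.items() if not binds and pos < c_hi),
               default=None)
    return (n_lo, n_hi), (c_lo, c_hi)


# Binding results from Figure 2C, as described in the text.
c_terminal = {793: True, 224: True, 217: False, 205: False}
n_terminal = {1: True, 145: True, 157: True, 164: True, 180: False}
print(bracket_borders(c_terminal, n_terminal))  # ((164, 180), (217, 224))
```

The output reproduces the conclusion that the N-terminal border lies between residues 164 and 180 and the C-terminal border between residues 217 and 224. Sufficiency still has to be tested separately with internal fragments, as was done in Figure 2D.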

To further analyze the role of SURP2 in the binding of SF3a120 to SF3a60, single conserved amino acids in this domain in the context of full-length SF3a120 were mutated (Figure 3A).

Mutation of Ala171 to Leu abolished binding (Figure 3B, lane 5). Similarly, the double mutant Gly178Val/Tyr193Cys did not interact with SF3a60 (lane 9). Binding of mutant Val167Ser was decreased (lane 4) and slight effects were visible with mutants Val166Ala and Val174Ser (lanes 3 and 7). Mutations in other residues tested (Phe173Ala and Arg176Ile; lanes 6 and 8) did not affect the interaction. These results indicate that certain residues in SURP2 are necessary for contacts between SF3a120 and SF3a60 and confirm that the SURP2 domain is essential for the interaction.
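The point mutants above are named in the usual three-letter notation: wild-type residue, position, substituted residue, with the substitutions of a double mutant joined by "/". The Python sketch below illustrates how such substitutions can be specified and checked before site-directed mutagenesis. It parses the names and applies them to a protein sequence, and it refuses to proceed if the stated wild-type residue does not match. The function names are hypothetical. Because the SF3a120 sequence is not reproduced here, the example uses a toy sequence.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
# e.g. "Ala171Leu": wild-type residue, 1-based position, new residue
MUTATION_RE = re.compile(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def parse_mutation(name: str) -> tuple[str, int, str]:
    """Split a name like 'Ala171Leu' into ('A', 171, 'L')."""
    match = MUTATION_RE.fullmatch(name.strip())
    if match is None or match[1] not in THREE_TO_ONE or match[3] not in THREE_TO_ONE:
        raise ValueError(f"unrecognised mutation name: {name!r}")
    return THREE_TO_ONE[match[1]], int(match[2]), THREE_TO_ONE[match[3]]


def apply_mutations(sequence: str, names: str) -> str:
    """Apply one or more substitutions (e.g. 'Gly178Val/Tyr193Cys')."""
    residues = list(sequence)
    for name in names.split("/"):
        wild_type, position, new = parse_mutation(name)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{name}: position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{name}: expected {wild_type} at {position}, "
                             f"found {residues[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)


print(apply_mutations("MAGY", "Gly3Val/Tyr4Cys"))  # MAVC
```

Applied to the full-length SF3a120 sequence, the same call with "Gly178Val/Tyr193Cys" would describe the double mutant tested in lane 9. The wild-type check guards against off-by-one numbering errors.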

Figure 2. Amino acids 145-243 of SF3a120 are sufficient for SF3a60 binding. (A) Scheme of in vitro-translated SF3a120 proteins. Protein domains are as in Figure 1. Numbers refer to the N and C termini of SF3a120 mutants. (B) Coomassie staining of recombinant GST (lane 1) and N-terminally GST-tagged full-length SF3a60 (lane 2). (C) GST pull-down with N- and C-terminal SF3a120 deletion mutants. In vitro-translated proteins indicated above the figure were incubated with GST (lane 1) or GST-60-FL (lanes 2-9) bound to glutathione agarose. Only 120-FL was incubated with the GST control (lane 1, co.). Bound (top) and input proteins (bottom) were separated by SDS PAGE and visualized by autoradiography. The migration of protein markers is indicated in kDa to the right of each panel. (D) GST pull-down with internal fragments of SF3a120. The experiment was performed as in (C).

Figure 3. The SURP2 domain is necessary for SF3a60 binding. (A) Multiple sequence alignment of the SF3a120 SURP2 domain (top). Sequences were aligned with ClustalW (http://www.ebi.ac.uk/clustalw/). At1, Arabidopsis thaliana; At2, A. thaliana; Ca, Candida albicans; Ce, Caenorhabditis elegans; Dm, Drosophila melanogaster; Dr, Danio rerio; Hs, Homo sapiens; Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe. Blue and light blue shading indicate identical and conserved amino acids, respectively. Amino acid substitutions generated by site-directed mutagenesis are shown at the bottom. (B) GST pull-down with full-length SF3a120 proteins carrying point mutations in SURP2. The experiment was performed as in Figure 2C.

2.2.3.2 The SF3a66-ID of SF3a120

The site of SF3a66 binding had been roughly mapped to amino acids 243-373 of SF3a120 (Krämer et al., 1995). To analyse the SF3a66-ID in further detail, GST-66/1-216, which interacts with SF3a120 as efficiently as full-length SF3a66 (Nesic and Krämer, 2001), was incubated with truncated SF3a120 proteins as above. The C-terminal deletion mutants 120/1-295 and 120/1-289 bound to SF3a66 similarly to 120-FL (Figure 4B, lanes 2, 4 and 5), but binding was abolished when residues immediately C-terminal to the charged region were deleted (lane 3). Similarly, an N-terminal deletion of the charged region did not affect the interaction (lane 7), but further truncation to amino acid 295 or 307 completely inhibited binding (lanes 8 and 9). To test whether amino acids 269-295 were sufficient for SF3a66 binding, GST-tagged fragments of SF3a120 were used to aid detection of the small proteins. Figure 4C shows that in vitro-translated SF3a66/1-216 interacted equally well with GST-120/269-295 and GST-120/164-307. The region between amino acids 269 and 295 of SF3a120 is therefore sufficient for SF3a66 binding (Figure 4C, lane 4). As expected from the results with the N-terminal SF3a120 truncations, SF3a66 did not bind proteins comprising SURP1 or SURP2 (Figure 4C, lanes 2 and 3) or the charged region (data not shown).

Figure 4. A 27-amino acid motif immediately C-terminal to the charged region of SF3a120 is sufficient for SF3a66 binding. (A) Scheme of in vitro-translated and N-terminally GST-tagged SF3a120 proteins. Protein domains are as in Figure 1. Numbers refer to the N and C termini of SF3a120 mutants. (B) GST pull-down with N- and C-terminal SF3a120 deletion mutants. The experiment was performed as in Figure 2C. (C) GST pull-down with proteins corresponding to internal segments of SF3a120. GST (lane 1) or GST-tagged SF3a120 proteins (as indicated above lanes 2-5) were bound to glutathione agarose and incubated with in vitro-translated, full-length SF3a66 (input lane in top panel). Bound proteins were separated by SDS PAGE and detected by autoradiography (top). The GST-tagged proteins used were separated by SDS PAGE and stained with Coomassie blue (bottom). Full-length proteins are marked with black circles.

