A non-coding RNA network influenced by genetic polymorphism controls E-cadherin expression in human cancers


Thesis


PISIGNANO, Giuseppina

Abstract

Reduced expression of E-cadherin, encoded by the CDH1 gene, is frequent in epithelial tumours and is associated with the acquisition of invasive, stem cell-like and metastatic properties. However, the molecular mechanisms underlying the loss of E-cadherin expression are not fully understood. In this project, we uncover a complex network comprising a promoter-associated noncoding RNA (paRNA), microRNA and epigenetic regulators that controls transcription of E-cadherin in epithelial cancers. E-cadherin silencing relies on the formation of a complex between the paRNA and microRNA-guided Argonaute 1 that, together, recruit SUV39H1 and induce repressive chromatin modifications in the gene promoter.

Notably, we found that a single nucleotide polymorphism (rs16260) linked to increased cancer risk alters the secondary structure of the paRNA, with the risk allele facilitating the assembly of the microRNA-guided Argonaute 1 complex and gene silencing.

PISIGNANO, Giuseppina. A non-coding RNA network influenced by genetic polymorphism controls E-cadherin expression in human cancers. Thèse de doctorat : Univ. Genève, 2016, no. Sc. 5037

DOI : 10.13097/archive-ouverte/unige:93953 URN : urn:nbn:ch:unige-939533

Available at:

http://archive-ouverte.unige.ch/unige:93953


UNIVERSITÉ DE GENÈVE, FACULTÉ DES SCIENCES
Section des Sciences Pharmaceutiques
Professor Leonardo Scapozza

INSTITUTE OF ONCOLOGY RESEARCH (IOR)
Tumor Biology and Experimental Therapeutics Program
Professor Carlo V. Catapano

A non-coding RNA network influenced by genetic polymorphism controls E-cadherin expression in human cancers

THESIS

Presented to the Faculty of Sciences of the University of Geneva for the degree of Doctor of Sciences, specialization in Pharmaceutical Sciences

By

GIUSEPPINA PISIGNANO

from Bellano (Italy)

Thesis No. 5037

LECCO 2016


List of Abbreviations
Summary
Résumé
Introduction
1. EPIGENETIC REGULATORY MECHANISMS
1.1 DNA methylation
1.2 Histone modifications and histone code
1.3 DNA methylation and histone modifications crosstalk
2. GENETIC VARIATIONS, EPIGENETIC MECHANISMS AND CANCER
2.1 Non coding variants impact on genetic expression
2.2 Genetic variation in long non-coding RNAs
2.3 Cancer epigenetics
2.4 Epigenetics and therapy of cancer
3. PROMOTER-ASSOCIATED NON-CODING RNAs
3.1 Directionality of promoter transcription
3.2 Characterized promoter-associated RNAs in genes regulation
4. SHORT REGULATORY NON-CODING RNAS
4.1 RNA interference and post-transcriptional regulation
4.2 miRNAs biogenesis and regulatory pathways
4.3 Transcriptional gene regulation by promoter-targeted RNAs
4.4 Promoter-targeted RNAs require paRNAs
4.5 Interactions of long and short non-coding RNAs
5. E-CADHERIN
5.1 E-cadherin in signal transduction and cell physiology
5.2 E-cadherin silencing in human cancer
Overall hypothesis and experimental plan
Materials and methods
Results
1- Promoter-associated transcripts control CDH1 transcription
2- Mapping and characterization of promoter-associated transcript in CDH1 promoter
3- Sub-cellular distribution of CDH1 promoter-associated transcripts
4- Promoter-associated transcripts termination, stability and degradation
5- S-paRNA contributes to the E-cadherin repression
6- Promoter-associated transcripts regulate in cis E-cadherin
7- Knockdown of S-paRNA affects the cancer cell phenotype
8- AGO1 contributes to CDH1 transcriptional regulation
9- AGO1 binds to the CDH1 promoter and to the promoter-associated transcripts
10- AGO1 binds selectively the S-paRNA
11- SUV39H1 contributes to CDH1 gene silencing
12- S-paRNA secondary structure determination
13- Identification of a microRNA associated with the AGO1/S-paRNA complex
14- IsomiR-4534 guides AGO1 recruitment and CDH1 repression
15- Sequence specific microRNA/S-paRNA interaction
16- A single nucleotide polymorphism in the CDH1 promoter influences AGO1/S-paRNA interaction
17- The rs16260 (C/A) SNP influences recruitment of AGO1 to the CDH1 promoter
18- The rs16260 (C/A) SNP influences S-paRNA secondary structure
Discussion
Future Directions
Appendix
- Appendix I
- Appendix II
References
Curriculum vitae
Publications


List of abbreviations

agRNA       antigene RNA
AML         Acute Myeloid Leukemia
AR          Androgen Receptor
AS          Antisense
BER         Base Excision Repair
BS-seq      Bisulfite sequencing
CASP8       Caspase-8
CCND1       cyclin D1
CCR5        Chemokine Receptor 5
CBC         Cap Binding Complex
CDH1^L      CDH1 low-expressing cells
CDH1^H      CDH1 high-expressing cells
CDK         Cyclin-Dependent Kinase
CDR1        Cerebellar Degeneration-Related protein 1
ceRNA       competing endogenous RNA
CHD         Chromodomain-Helicase DNA-binding protein
ChIP        Chromatin Immunoprecipitation
ChIP-seq    Chromatin Immunoprecipitation followed by sequencing
circRNA     circular lncRNA
COX-2       Cyclooxygenase-2
CPA         Cleavage and Polyadenylation
CSC         Cancer Stem Cell
CUT         Cryptic Unstable Transcript
DHFR        Dihydrofolate Reductase
DNase-seq   DNase I-hypersensitive site identification by sequencing
DNMT        DNA Methyltransferase
dsRNA       double-stranded RNA
DUB         Deubiquitinase
eEF1a1      Elongation Factor 1-alpha 1
EMT         Epithelial-to-Mesenchymal Transition
eQTLs       Expression Quantitative Trait Loci
ESC         Embryonic Stem Cell
EZH2        Enhancer of Zeste Homolog 2
FDA         Food and Drug Administration
FRET        Fluorescence Resonance Energy Transfer
GAS5        Growth Arrest-Specific 5
GRO-seq     Global Run On sequencing
GRF         General Regulatory Factor
GTF         General Transcription Factor
GWAS        Genome-Wide Association Studies
HA          Hemagglutinin
HAT         Histone Acetyl-Transferase
HDAC        Histone Deacetylase
HDM         Histone Demethylase
HGMD        Human Gene Mutation Database
HMT         Histone Methyl-Transferase
HTT         Human Huntingtin
IGF2        Insulin-like Growth Factor-2
LINE        Long Interspersed Transposable Element
lncRNA      long non-coding RNA
LOI         Loss Of Imprinting
MBD         Methyl CpG-Binding Domain
MDS         Myelodysplastic Syndrome
MECP2       Methylcytosine-binding Protein 2
miRNA       microRNA
MLH1        MutL Homolog-1
MLL         Mixed-Lineage Leukemia
MRE         miRNA Response Element
MVP         Major Vault Protein
ncRNA       non-coding RNA
NEXT        Nuclear Exosome Targeting
NET-seq     Native Elongating Transcript sequencing
NF-κB       Nuclear Factor Kappa-B
NFR         Nucleosome-Free Region
NMIA        N-Methylisatoic Anhydride
NMR         Nuclear Magnetic Resonance
NNS         Nrd1-Nab3-Sen1
PALR        Promoter-Associated Long RNA
paRNA       promoter-associated RNA
PAS         Polyadenylation Signal
PASR        Promoter-Associated Small RNA
PcG         Polycomb Group
PCR         Polymerase Chain Reaction
PIC         Pre-Initiation Complex
piRNA       piwi-interacting RNA
poly(A)     polyadenylation
PR          Progesterone Receptor
PRC         Polycomb Repressive Complex
pri-miRNA   primary miRNA
pre-miRNA   precursor miRNA
PROMPT      PROMoter uPstream Transcript
RACE        Rapid Amplification of cDNA End
RASSF1A     Ras Association domain Family 1
RBP         RNA-Binding Protein
RIP         RNA-immunoprecipitation
RISC        RNA-Induced Silencing Complex
RITS        RNA-Induced Transcriptional gene Silencing
RNAa        RNA activation
RNA-ChIP    RNA Chromatin Immunoprecipitation
RNAi        RNA interference
RNAPII      RNA Polymerase II
rRNA        ribosomal RNA
RT-PCR      Reverse-Transcriptase Polymerase Chain Reaction
S           Sense
SAFA        Semi-Automated Footprinting Analysis
SAHA        Suberoylanilide Hydroxamic Acid
saRNA       small activating RNA
SHAPE       Selective 2'-Hydroxyl Acylation analyzed by Primer Extension
shRNA       short hairpin RNA
SINE        Short Interspersed Transposable Element
siRNA       small interfering RNA
snoRNA      small nucleolar RNA
SNP         Single Nucleotide Polymorphism
SUMO        Small Ubiquitin-related Modifier
SUT         Stable Unannotated Transcript
TASR        Termini-Associated Short RNA
TALR        Termini-Associated Long RNA
TES         Transcription End Site
TET         Ten Eleven Translocation
TF          Transcription Factor
TGA         Transcriptional Gene Activation
TGS         Transcriptional Gene Silencing
tiRNA       transcription-initiation RNA
tRNA        transfer RNA
TSS         Transcription Start Site
TSSa-RNA    Transcription Start Site associated RNA
TTS         Triplex Target DNA Site
uaRNA       upstream antisense RNA
UBC         Ubiquitin C
UNT         Upstream Non-coding Transcript
uPA         urokinase Plasminogen Activator
UTR         Untranslated Region
VEGF        Vascular Endothelial Growth Factor
VHL         Von Hippel-Lindau disease


Summary


Reduced expression of E-cadherin, encoded by the CDH1 gene, correlates with increased tumor invasion, metastasis and poor prognosis. Loss of E-cadherin triggers epithelial-to-mesenchymal transition (EMT) and a cancer stem cell (CSC) phenotype in epithelial cells.

In many human cancers E-cadherin is epigenetically silenced. However, the molecular mechanisms underlying the loss of E-cadherin expression are not fully understood.

Growing evidence shows that non-coding RNAs play significant regulatory roles in the epigenetic machinery and in disease in complex organisms.

In line with this, we uncovered a complex network of promoter-associated RNAs (paRNAs), transcribed in sense (S-paRNA) and antisense (AS-paRNA) orientation and associated with the CDH1 gene promoter, that dictated transcriptional activity and E-cadherin expression. In particular, we found that sense and antisense paRNAs were differentially expressed in CDH1 low-expressing (CDH1^L) and high-expressing (CDH1^H) cells, respectively. This correlation was also confirmed in additional experimental models with reduced CDH1 mRNA and increased S-paRNA levels. Moreover, the relationship between paRNA and CDH1 mRNA was observed in human prostate tumor samples, underlining how the relative abundance of S and AS transcripts characterizes distinct transcriptional states of the gene.

We found that S- and AS-paRNAs are differently located in cells. While the S-paRNA was almost exclusively associated with the chromatin fraction, a substantial portion of the AS-paRNA was also detected in the nucleoplasm and cytoplasm of CDH1^H cells, suggesting that this physically and temporally distinct non-coding transcription is related to their function in CDH1 regulation. Interestingly, we found that the S-paRNA played a central role in cis in this mechanism, recruiting the Argonaute protein AGO1 and the histone methyltransferase SUV39H1 to the promoter and thereby coordinating the transcriptional silencing of the gene. Depletion of the S-paRNA triggered CDH1 transcription, and the increased level of E-cadherin reverted the cancer cell phenotype. We identified a mediator of the AGO1 and S-paRNA interaction, isomiR-4534, a miRNA variant that derives from non-canonical editing and processing and participates in CDH1 transcriptional silencing. We also defined a minimal AGO1 binding region, located at the 3’ end of the S-paRNA (-280/-57), and characterized the structural motifs associated with the isomiR-4534 binding site.

Based on the role of the S-paRNA in CDH1 gene regulation, we investigated the potential impact of single nucleotide polymorphisms present in the gene promoter. The rs16260 (C/A) polymorphism at -160 bp from the CDH1 TSS had already been reported to affect the risk of developing several cancers, including prostate cancer. However, its association with CDH1 expression remained unexplained. In this study, we showed how the rs16260 (C/A) SNP modulated the function of the S-paRNA and thereby affected the epigenetic regulation of E-cadherin expression. This effect was likely mediated by allele-specific changes in S-paRNA folding and secondary structure.

We showed that the CDH1 promoter harbouring the S-paRNA TSS and the -160A variant had lower activity in luciferase reporter assays. Moreover, we found that the presence of rs16260 (A) affected the binding of AGO1 to the S-paRNA and, in turn, the recruitment of AGO1 to the promoter, which was also reflected in the differential recruitment of SUV39H1.

Using Selective 2'-Hydroxyl Acylation analyzed by Primer Extension (SHAPE), we determined and compared the RNA secondary structures of the S-paRNA carrying either the -160A or the -160C SNP. We found that the rs16260 SNP altered SHAPE reactivity, consistent with an RNA secondary structure change at and around the -160 (C/A) site in the S-paRNA. We propose that this structural change, within the AGO1-interacting region and adjacent to the isomiR-4534 binding site in the S-paRNA, explains the difference in AGO1 recruitment and the consequent differential activity of the CDH1 promoter between the two allelic variants.

Overall, our findings elucidate how the interplay between paRNAs at the CDH1 promoter modulates CDH1 transcription, and how genomic variation such as SNPs in non-protein-coding promoter regions affects epigenetic mechanisms and contributes to transcriptional regulation by altering the structure of paRNAs and their ability to recruit regulatory proteins.

These findings open new scenarios on the epigenetic regulation of CDH1 expression and provide a molecular explanation for the impact of non-coding genetic variants in malignant transformation and tumor progression.


Résumé

Reduced expression of E-cadherin, encoded by the CDH1 gene, correlates with increased tumor invasion, metastasis and poor prognosis. Loss of E-cadherin triggers epithelial-to-mesenchymal transition (EMT) and a cancer stem cell (CSC) phenotype in epithelial cells. In many human cancers, E-cadherin is epigenetically silenced. However, the molecular mechanisms underlying the loss of E-cadherin expression are not fully understood. It is increasingly evident that non-coding RNAs play important regulatory roles in the epigenetic machinery and in the development of disease in complex organisms.

In line with this, we discovered a complex network of promoter-associated RNAs (paRNAs), transcribed in sense (S-paRNA) and antisense (AS-paRNA) orientation and associated with the CDH1 gene promoter, that dictated transcriptional activity and E-cadherin expression. In particular, we found that sense and antisense paRNAs were differentially expressed in CDH1 low-expressing (CDH1^L) and high-expressing (CDH1^H) cells, respectively. This correlation was also confirmed in additional experimental models with reduced CDH1 mRNA and increased S-paRNA levels. Moreover, the relationship between paRNA and CDH1 mRNA was observed in human prostate tumor samples, underlining how the relative abundance of S and AS transcripts characterizes distinct transcriptional states of the gene.

We found that S- and AS-paRNAs are located in different cellular compartments. While the S-paRNA was almost always associated with the chromatin fraction, a substantial portion of the AS-paRNA was detected in the nucleoplasm and cytoplasm of CDH1^H cells, suggesting that this physically and temporally distinct non-coding transcription is related to their function in CDH1 regulation.

Interestingly, we found that the S-paRNA played a central role in cis in this mechanism, recruiting the Argonaute protein AGO1 and the histone methyltransferase SUV39H1 to the promoter and thereby coordinating the transcriptional silencing of the gene. Depletion of the S-paRNA triggered CDH1 transcription, and the increased level of E-cadherin reverted the cancer cell phenotype. We identified a mediator of the interaction between AGO1 and the S-paRNA: isomiR-4534, a miRNA variant that derives from non-canonical editing and processing and participates in CDH1 transcriptional silencing. We defined a minimal AGO1 binding region, located at the 3’ end of the S-paRNA (-280/-57), and characterized the structural motifs associated with the isomiR-4534 binding site.

Based on the role of the S-paRNA in CDH1 gene regulation, we investigated the potential impact of single nucleotide polymorphisms present in the gene promoter. The rs16260 (C/A) polymorphism at -160 bp from the CDH1 TSS had already been reported to affect the risk of developing several cancers, including prostate cancer. However, its association with CDH1 expression remained unexplained. In this study, we showed how the rs16260 (C/A) SNP modulated the function of the S-paRNA and thereby affected the epigenetic regulation of E-cadherin expression. This effect was likely mediated by allele-specific changes in S-paRNA folding and secondary structure.

We showed that the CDH1 promoter harbouring the S-paRNA TSS and the -160A variant had lower activity in luciferase reporter assays. Moreover, we found that the presence of rs16260 (A) affected the binding of AGO1 to the S-paRNA and, in turn, the recruitment of AGO1 to the promoter, which was also reflected in the differential recruitment of SUV39H1. Using Selective 2'-Hydroxyl Acylation analyzed by Primer Extension (SHAPE), we determined and compared the RNA secondary structures of the S-paRNA carrying either the -160A or the -160C SNP. We found that the rs16260 SNP altered SHAPE reactivity, consistent with an RNA secondary structure change around the -160 (C/A) site in the S-paRNA. We propose that this structural change, within the AGO1-interacting region and adjacent to the isomiR-4534 binding site in the S-paRNA, explains the difference in AGO1 recruitment and the consequent differential activity of the CDH1 promoter between the two allelic variants.

Overall, our findings explain how the interplay between paRNAs at the CDH1 promoter modulates CDH1 transcription, and how genomic variation such as SNPs in non-protein-coding promoter regions affects epigenetic mechanisms and contributes to transcriptional regulation by altering the structure of paRNAs and their ability to recruit regulatory proteins. These findings open new scenarios on the epigenetic regulation of CDH1 expression and provide a molecular explanation for the impact of non-coding genetic variants on malignant transformation and tumor progression.


Introduction


1 Epigenetic regulatory mechanisms

Nearly all cells of an organism share the same genome but show different phenotypes and carry out diverse functions. Individual cell types, which are characterized by distinct gene expression patterns, are generated during development and are then stably maintained. The chromatin state has marked effects on gene expression. Dynamic changes in chromatin states contribute to the establishment and the maintenance of cell identities and accompany developmental transitions. The assembly of the chromatin is regulated by multiple epigenetic mechanisms, heritable and reversible, that alter gene expression without altering the primary DNA sequence (Bird, 2007; Goldberg et al., 2007). Epigenetic mechanisms play an important role in various biological processes, and their disruption leads to a wide variety of pathologies including metabolic and autoimmune diseases, neurological disorders, and cancer (Morgan et al., 1999). The key processes responsible for epigenetic regulation are DNA methylation, histone modifications and non-coding RNAs (ncRNAs). The chromatin state is often the result of the integrated interaction between these processes.

1.1 DNA methylation

DNA methylation is a covalent modification in which a methyl group is added to the 5th carbon of the cytosine ring, typically at a cytosine located 5′ to a guanine in a CpG dinucleotide, using S-adenosyl methionine as the methyl donor. However, methylation in non-CG contexts has also been reported in both humans (Lister et al., 2009; Ramsahoye et al., 2000) and mice (Ichiyanagi et al., 2013).

CpG sites are not randomly distributed in the genome. Although the occurrence of CpG dinucleotides is predicted to be around 1% in mammalian genomes, there are CpG-rich regions (>200 bases), known as CpG islands. DNA hypermethylation, which principally refers to the gain of methylation at specific sites, occurs mainly in promoter CpG islands, where it serves as a transcriptional “OFF” switch associated with transcriptional repression. In humans, 50-70% of all CpGs are methylated, primarily in heterochromatin (inaccessible chromatin) regions. In contrast, in euchromatin (accessible chromatin) CpGs remain locally unmethylated, with the exception of genes involved in imprinting, X chromosome inactivation, and tissue-specific differentiation (Esteller, 2008; Feinberg and Tycko, 2004; Jones and Baylin, 2007; Laird, 2003).
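The notion of a CpG-rich region can be made concrete with a small calculation. The following is a minimal Python sketch, not a method used in this work: it scores a sequence by its G+C fraction and by the ratio of observed to expected CpG dinucleotides. The length threshold follows the ">200 bases" given above; the G+C and observed/expected thresholds (0.5 and 0.6) are assumptions taken from commonly used CpG island criteria, and the function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently along the sequence.
    expected = (c * g) / n
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq: str, min_len: int = 200,
                  min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the (assumed) thresholds to one candidate region."""
    if len(seq) <= min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp
```

A region made of G and C bases but with almost no CpG dinucleotides, such as a run of C followed by a run of G, fails the test even though its G+C fraction is maximal, which is why the observed/expected ratio is checked separately from base composition.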

DNA methylation is catalyzed by DNA methyltransferases (DNMTs).

Knockout of DNMTs in mouse models is embryonically lethal, indicating the importance of DNA methylation for mammalian embryonic development (Li et al., 1992; Okano et al., 1999). There are three main DNMTs: DNMT1, DNMT3a, and DNMT3b (Robertson, 2002). DNMT1 maintains the existing methylation patterns following DNA replication, whereas DNMT3a and DNMT3b are de novo enzymes that target unmethylated CpGs to initiate methylation and are highly expressed during embryogenesis and minimally expressed in adult tissues (Jones and Liang, 2009; Mohn et al., 2008).

Emerging evidence suggests that these functions are not exclusive and that functional overlap may exist, such that DNMT3a and DNMT3b may correct errors made by DNMT1 after DNA replication (Jones and Liang, 2009). Another family member, DNMT3L, lacks intrinsic methyltransferase activity and interacts with DNMT3a and DNMT3b to facilitate methylation of retrotransposons (Kinney and Pradhan, 2011).

In normal cells, DNA methylation occurs predominantly in repetitive genomic regions, including satellite DNA and parasitic elements such as long interspersed transposable elements (LINEs) and short interspersed transposable elements (SINEs), maintaining genomic integrity (Smith and Meissner, 2013). In this way, DNA methylation has a major role in maintaining genome stability by preventing the reactivation of transposable elements (Slotkin and Martienssen, 2007).

In addition, DNA methylation can inhibit gene expression directly by inhibiting the binding of specific transcription factors and indirectly by recruitment of methyl CpG-binding domain (MBD) proteins (Robertson, 2001). To date, six methyl-CpG-binding proteins, including methylcytosine-binding protein 2 (MECP2), MBD1, MBD2, MBD3, MBD4, and Kaiso, have been identified in mammals (Le Guezennec et al., 2006; Parry and Clarke, 2011).

Recently, a second modified base, 5-hydroxymethylcytosine (5hmC), was identified (Kriaucionis and Heintz, 2009). It is generated by oxidation of 5-methylcytosine (5mC), a reaction mediated by the ten eleven translocation (TET) family of proteins (family members TET1, TET2, TET3) in humans and mice (Ito et al., 2010; Lian et al., 2012; Tahiliani et al., 2009; Xu et al., 2011). The hydroxylation of 5mC by TET proteins is generally followed either by deamination by another DNA methylation eraser, the AID (activation-induced cytidine deaminase)/APOBEC proteins, or by carboxylation and entry into the subsequent base excision repair pathway (Bhutani et al., 2011; De Carvalho et al., 2010; Ko et al., 2010; Wu and Zhang, 2010). TDG, a glycosylase of the base excision repair (BER) pathway, was recently found to be involved in an alternative demethylation pathway, in which it repairs the 5-hydroxymethyluracil that results from deamination of 5hmC. Alternatively, 5hmC is not recognized by DNMT1 (Lao et al., 2010). In this case, replication of DNA containing this base would lead to the loss of the 5mC mark in the subsequent S phase.
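The replication-dependent loss described above can be illustrated with simple arithmetic. The sketch below assumes an idealized model that is not taken from this work: the mark is never copied onto newly synthesized strands, so only the original parental strands carry it and their share of all strands halves with every round of replication. The function name is hypothetical.

```python
def marked_strand_fraction(replications: int) -> float:
    """Fraction of DNA strands still carrying the original mark after the given
    number of replication rounds, assuming no maintenance methylation at all."""
    if replications < 0:
        raise ValueError("replications must be non-negative")
    # Each round doubles the number of strands; the marked ones are not renewed.
    return 0.5 ** replications
```

Under this assumption, half of the strands carry the mark after one S phase and one eighth after three, so the mark is progressively diluted without any enzyme actively removing it.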

Generally, 5hmC is one of the intermediates in the demethylation of 5mC at gene promoter sites. This process can be associated with impaired maintenance of DNA methylation during DNA replication, or can depend on the ability of one or more enzymes to hydroxylate, further oxidize, or deaminate 5mC. In contrast to 5mC, which is associated with gene repression, it is unclear whether 5hmC is truly an epigenetic mark. 5hmC has been hypothesized to facilitate transcription by “opening” chromatin (Mendonca et al., 2014). For these reasons, there is increasing appreciation for the role of 5hmC in development as well as its importance for tumor development and progression (Madzo et al., 2014; Moen et al., 2015; Rampal et al., 2014) (Figure 1).

Figure 1. Schematic representation of the DNA methylation mechanism. DNA methylation occurs at cytosines in CpG dinucleotides. The “writers” DNMT1 (maintenance) and DNMT3 (de novo), the “reader” MeCP2, and the “eraser” TET enzymes take part in the DNA methylation process.

1.2 Histone modifications and histone code

Within the cell’s nucleus, the DNA is packaged together with proteins, making a complex known as chromatin. Chromatin is a highly ordered structure that packages eukaryotic DNA into a higher-order fiber whose basic unit, the nucleosome, consists of about 147 base pairs of DNA wrapped around an octamer of histone proteins. Each octamer consists of two copies each of the H2A, H2B, H3, and H4 core histone proteins (Bhasin et al., 2006).

The flexible N-terminal tails of core histones, which protrude out of the nucleosome, are rich in positively charged amino acids and are subject to various reversible chemical post-translational modifications, involved in many cellular processes (Ruthenburg et al., 2007; Turner, 2005). These modifications influence not only how DNA strands are packaged but also their transcriptional activity. Generally, histone modifications that loosen the association of DNA with histones provide a permissive environment for transcription, whereas histone modifications that tightly package DNA and histones repress transcriptional activity (Rodriguez-Paredes and Esteller, 2011).

The common histone modifications include acetylation, methylation, phosphorylation, ubiquitylation and sumoylation. Less frequently, histones are subject to poly-ADP ribosylation, citrullination or deimination, proline isomerization and propionylation. These modifications also affect some histone variants, such as H2A.Z (a variant of H2A) and H3.3 (a variant of H3), which are mostly associated with unstable nucleosomes. Histone variants are known to replace one of the normal core histones and are involved in key cellular processes such as transcription, repair and replication (Boulard et al., 2007).

The hypothesis that combinations of post-translational modifications regulate chromatin structure and gene transcription was proposed by Strahl and Allis and is known as the ‘histone code’ theory (Chi et al., 2010; Strahl and Allis, 2000) (Figure 2).

Figure 2. The core histone tails are extensively modified. Histone post-translational modifications mainly occur at selected histone tail residues. [Figure: tail sequences of histones H2A, H2B, H3 and H4 annotated with their modification sites. Me, methylation; Ac, acetylation; P, phosphorylation; Ub, ubiquitination.]

The impact of a histone modification on transcriptional regulation depends on the type of modification and on the residue involved. Lysine acetylation almost always leads to increased chromatin accessibility and transcriptional activity. Lysine methylation, in contrast, can have different outcomes (Table 1): it can mark both transcriptionally active and inactive chromatin, depending on the residue that is methylated, the degree of methylation (mono-, di-, or trimethylation; referred to as me1, me2 or me3, respectively) and the position of the methylated nucleosome within the gene and the genome. For example, methylation of histone H3 lysine 4 (H3K4) and H3 lysine 36 (H3K36) is associated with transcriptionally active chromatin. In contrast, methylation of H3 lysine 9 (H3K9), H3 lysine 27 (H3K27), and H4 lysine 20 (H4K20) generally correlates with transcriptional repression. However, various combinations of modifications in specific genomic regions can lead to opening and closing of chromatin structure, responsible for activation or repression of gene expression (Rodriguez-Paredes and Esteller, 2011).
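The naming convention and the general associations given above can be captured in a small lookup. The following Python sketch is illustrative only: it parses a mark name such as H3K27me3 into histone, residue and methylation degree, and returns the usual association stated in the text for that residue. The function and table names are hypothetical, and the lookup deliberately ignores the degree of methylation and the genomic context, both of which the text notes can change the outcome.

```python
import re

# Usual association of the lysine methylation sites named in the text above.
LYSINE_METHYLATION_EFFECT = {
    ("H3", 4): "active",
    ("H3", 36): "active",
    ("H3", 9): "repressive",
    ("H3", 27): "repressive",
    ("H4", 20): "repressive",
}

# Histone name, "K" for lysine, residue number, "me" plus degree 1 to 3.
MARK_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)K(\d+)me([123])$")


def parse_mark(mark: str) -> tuple[str, int, int]:
    """Split a mark name into (histone, lysine position, methylation degree)."""
    match = MARK_PATTERN.match(mark)
    if match is None:
        raise ValueError(f"not a lysine methylation mark: {mark!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def usual_effect(mark: str) -> str:
    """Return the general association for the residue, or 'unknown'."""
    histone, residue, _degree = parse_mark(mark)
    return LYSINE_METHYLATION_EFFECT.get((histone, residue), "unknown")
```

Because the lookup is keyed on the residue alone, it reports H3K9me1 as repressive even though monomethylated H3K9 can accompany active chromatin; a fuller table would need to be keyed on the degree of methylation and the genomic region as well.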

Table 1. Histone lysine methylation and transcriptional outcomes. Genomic locations and usual effects on transcription of the corresponding histone lysine modifications. Right border: representation of the corresponding progressive chromatin status. LTR, long terminal repeat. Adapted from Mozzetta et al. (2015).

Transcription | Histone marks | Genomic regions
Active | H3K4me1, H3K4me2, H3K4me3, H3K9me1, H3K27me1, H3K36me1, H3K36me3, H4K20me1 | Promoters, enhancers, gene bodies
Poised | H3K4me3, H3K9me3, H3K27me3 | Promoters (e.g. development genes)
Repressed | H3K27me3 | Promoters, enhancers, CpG-rich regions
Repressed | H3K9me3, H3K27me3 | Enhancers, X-inactivation centre, repeats (e.g. pericentromeric and LTR)

Less is known about ubiquitination and sumoylation. The size of SUMO and ubiquitin (11 and 9 kDa, respectively) clearly distinguishes them from the other known post-translational modifications of histones, which are all small chemical groups.

Monoubiquitination of H2A and H2B has been clearly implicated in transcriptional regulation. H2A ubiquitination is more frequently correlated with gene silencing, while H2B ubiquitination is mostly associated with transcriptional activation. Mechanistically, sumoylation of histone H4 by the small ubiquitin-related modifier (SUMO) (Shiio and Eisenman, 2003) occurs through an enzyme cascade very similar to that of ubiquitination. The amino-terminal tail of histone H4 contains five lysines, all of which may be candidates for sumoylation. While ubiquitination has a role in protein degradation, sumoylation does not. Sumoylation may prevent chromosomal compaction and regulate transcriptional repression (Nathan et al., 2006; Shiio and Eisenman, 2003).

Histone modifications are executed by writers, interpreted by readers and removed by erasers; collectively, these changes are referred to as post-translational modifications (Figure 3).

“Writers”, which include histone acetyl-transferases (HATs), histone methyltransferases (HMTs), kinases, ubiquitin ligases (E1, E2 and E3 enzymes) and SUMO ligases, introduce chemical modifications onto a protein.

Figure 3. Histone writers, readers and erasers. Main histone post-translational modifications with their respective writer, reader and eraser enzymes. P, phosphorylation; Ac, acetylation; Me, methylation; Ub, ubiquitination; SUMO, sumoylation.

[Figure 3 panel: for the modifications P, Ac, Me, Ub and SUMO, the writers are kinases, acetyltransferases, methyltransferases, ubiquitin ligases and SUMO ligases, and the erasers are phosphatases, deacetylases, demethylases, deubiquitinases and proteases, respectively.]

“Readers” are proteins or complexes that specifically bind to a modified protein and recruit components of the nuclear signaling network to regions of chromatin, inciting gene transcription, DNA replication or recombination, DNA damage responses and chromatin remodeling (Musselman et al., 2012).

“Erasers”, such as histone deacetylases (HDACs and sirtuins), histone demethylases (HDMs), phosphatases, deubiquitinases (DUBs) and proteases, remove chemical modifications from proteins (Ruthenburg, 2007; Tarakhovsky, 2010).

These enzymes often exist in multi-subunit complexes. For example, the Polycomb group (PcG) of repressor proteins controls the accessibility of gene regulatory elements to the transcription machinery and is crucial for early development (Mills, 2010). This group of proteins is organized into two repressive complexes: Polycomb repressive complex 1 (PRC1) contains either ring finger protein 1A (RING1A) or 1B (RING1B), both of which catalyze the monoubiquitination of histone H2A at lysine 119 (H2AK119ub1), whereas PRC2 contains Enhancer of zeste homolog 2 (EZH2), which catalyzes the trimethylation of H3K27 (Cao and Zhang, 2004). Similarly, Trithorax group protein complexes contain the mixed-lineage leukemia (MLL) family of lysine methyltransferases (KMTs), which catalyze the formation of the transcriptionally activating H3K4me3 mark.

Beyond post-translational histone modifications, chromatin compaction is also affected by ATP-dependent chromatin-remodeling complexes that use energy from ATP hydrolysis to exchange histones and to reposition or evict nucleosomes. Approximately 30 genes that encode the ATPase subunits have been identified in mammals. On the basis of the sequence and structure of these ATPases, chromatin-remodeling complexes are divided into four main families: SWI/SNF, ISWI, chromodomain-helicase DNA-binding protein (CHD) and INO80 complexes (Ho and Crabtree, 2010). Many histone modifiers and chromatin remodelers have been implicated in stem cell pluripotency, cellular differentiation and development (Mattout and Meshorer, 2010; Meshorer et al., 2006).

1.3 DNA methylation and histone modifications crosstalk

Although histone modifications and DNA methylation are executed by different cellular machineries, it is now recognized that the two processes are dynamically linked. During early development, histone modifications have been shown to induce DNA methylation (Cedar and Bergman, 2009). Once established, methylated CpGs are recognized by methyl-CpG-binding proteins, which associate with histone deacetylase complexes to co-repress transcription (Feng and Zhang, 2001; Li, 2002). On the other hand, elevated histone acetylation can trigger DNA demethylation (Cervoni and Szyf, 2001; D'Alessio et al., 2007). Tet1 contains a DNA-binding motif similar to that of Cfp1, a component of the H3K4 methyltransferase complex, suggesting that both proteins target similar sites, in this case CpG islands, to maintain DNA demethylation (Tahiliani et al., 2009).

Although a direct connection between the two has yet to be shown, Tet1 does indeed localize to CpG islands, and its depletion results in increased methylation within those CpG islands in mouse embryonic stem cells (Ficz et al., 2011; Wu and Zhang, 2011). In addition, DNMT3L binds to H3 histone tails and recruits DNMT3a and DNMT3b to methylated DNA (Ooi et al., 2007). The direct binding of DNMT3a to the H3 histone tail, sometimes facilitated by H3K36 trimethylation, stimulates its methyltransferase activity (Dhayalan et al., 2010; Li et al., 2011a).

Conversely, the active histone modification H3K4 trimethylation (H3K4me3) impairs the binding of DNMT3a, DNMT3b and DNMT3L to H3 histone tails and prevents methylation of CpG islands (Mikkelsen et al., 2007; Ooi et al., 2007; Zhang et al., 2010c). Cfp1 has been found to target unmethylated CpG sites at murine CpG islands and may play a role in maintaining their hypomethylation (Lee and Skalnik, 2005; Thomson et al., 2010).

Moreover, DNMTs directly interact with enzymes that regulate histone modifications typically involved in gene repression. Both DNMT1 and DNMT3a are known to bind to the histone methyltransferase SUV39H1 that restricts gene expression by methylation on H3K9 (Fuks et al., 2003). Furthermore, DNMT1 and DNMT3b can both bind to histone deacetylases that remove acetylation from histones to make DNA pack more tightly and restrict access for transcription (Fuks et al., 2000; Geiman et al., 2004).

In addition, methyl-binding proteins serve as the strongest link between DNA methylation and histone modification. MBD family members in turn recruit histone-modifying enzymes and chromatin-remodeling complexes to methylated sites (Deaton and Bird, 2011). Both the MBDs and the UHRF proteins interact with methylated DNA and histones to enhance gene repression (Citterio et al., 2004; Karagianni et al., 2008; Nan et al., 1998; Ng et al., 1999; Sarraf and Stancheva, 2004). In particular, the UHRF protein family first binds to DNMT1 and then targets it to hemimethylated DNA in order to maintain DNA methylation, especially during DNA replication (Achour et al., 2008; Bostick et al., 2007; Sharif et al., 2007). MeCP2 recruits histone deacetylases to remove active histone modifications and repress gene transcription (Fuks et al., 2003; Jones et al., 1998; Nan et al., 1998). Furthermore, MeCP2 enhances the repressive chromatin state by recruiting histone methyltransferases that add repressive H3K9 methylation (Fuks et al., 2003). Overall, DNA methylation and histone modifications work closely together to regulate gene expression.


2 Genetic variation, epigenetic mechanisms and cancer

Gene expression occurs in a dynamic functional epigenomic landscape in which the majority of genomic sequence is proposed to have regulatory potential (Consortium, 2012). Both genetic and epigenetic changes, as well as their interactions, contribute to observable phenotypic differences in traits ranging from morphology, physiology and behavior to predisposition to many human diseases, including cancer.

2.1 Non-coding variants impact on genetic expression

The present estimate of human variation derived from large-scale whole-genome sequencing projects stands at over 60 million variants, including single nucleotide variations and small insertions or deletions (Genomes Project et al., 2010; International HapMap et al., 2010; Manolio and Collins, 2009).

Genome-wide association studies (GWAS) have identified a large number of these genetic variants as associated with human diseases (Hindorff et al., 2009; Manolio, 2010).

The majority of efforts to annotate functional variants to date have focused on variants that directly affect coding sequence, such as missense and nonsense mutations, or those that affect transcript splicing signals (Cooper and Shendure, 2011).

However, among the 7000 disease-associated single nucleotide polymorphisms (SNPs), only about 7% affect coding regions. The vast majority of genetic variants associated with complex traits lie in non-coding regions of the genome, and many of these lie some distance away from the nearest protein-coding locus (Hindorff et al., 2009; Lee et al., 2013; Ward and Kellis, 2012b) (Figure 4).

These observations imply that many variants that affect the risk of common, complex diseases are likely to exert their effect by altering gene expression and its regulation rather than by directly affecting gene and protein function (Ritchie et al., 2014).

Common heritable genetic variants associated with gene expression are generally mapped as expression quantitative trait loci (eQTLs) (Jansen and Nap, 2001; Wright et al., 2014).
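As a minimal illustration of how a single eQTL can be tested, the expression of a gene may be regressed on allele dosage (0, 1 or 2 copies of the alternative allele) across individuals. The sketch below is not taken from the cited studies: the function name `eqtl_slope` and the toy values are hypothetical, and real analyses additionally correct for covariates, population structure and multiple testing.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Return (slope, Pearson r) for expression regressed on allele dosage.

    dosages: alternative-allele counts per individual (0, 1 or 2).
    expression: expression level of one gene in the same individuals.
    """
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need matched vectors with at least 3 individuals")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variance")
    # The slope is the expression change per extra copy of the allele.
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


# Hypothetical data: expression falls as the allele dosage rises.
dosages = [0, 0, 1, 1, 2, 2]
expression = [10.0, 10.4, 8.1, 7.9, 6.0, 5.8]
slope, r = eqtl_slope(dosages, expression)
```

A negative slope with a correlation close to -1, as in this toy data set, would be the pattern expected for an allele that lowers expression of the gene under an additive model.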


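Figure 4. Genomic distribution (%) of SNPs in selected cancer types. For prostate, ovarian, lung, colon and breast cancer, the bars show the percentage of associated SNPs that are intergenic, intronic, in UTRs or coding. The majority of cancer-related SNPs are located in non-coding regions of the genome (intergenic or intronic) and only a small number are found in coding regions. Adapted from (Cheetham et al., 2013).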

Therefore, a local molecular signal, such as a region of open chromatin shown to co-vary across individuals with the presence of a SNP, can be interpreted in terms of a gene expression phenotype (Figure 5).

Most of the SNPs that influence gene expression have been shown to be significantly enriched for GWAS associations (Nicolae et al., 2010; Schadt et al., 2008; Stranger et al., 2007; Zhong et al., 2010). Non-coding variations that directly contribute to disease risk, often with large effect sizes, can act locally (cis eQTL) and at a distance (trans eQTL) to modulate a range of regulatory epigenetic processes in a highly context-specific manner (Grundberg et al., 2012; Westra and Franke, 2014).

Various approaches have been developed to identify variants that are likely to play an important biological role. In combination with computational methods, high-throughput functional assays such as chromatin immunoprecipitation followed by sequencing (ChIP-seq) and DNase I hypersensitive site identification by sequencing (DNase-seq) have been used extensively to show that the binding of proteins to DNA and the accessibility of chromatin can be allele-specific and variable between individuals (Boyle et al., 2008; Crawford et al., 2006; Degner et al., 2012; Gross and Garrard, 1988; Johnson et al., 2007; McDaniell et al., 2010; Robertson et al., 2007).

The presence of SNPs in gene promoter regions leads to differences in transcription factor (TF) binding among individuals, as recently demonstrated for nuclear factor kappa-B (NF-κB), an important transcription factor involved in the regulation of inflammation and autoimmune disease genes (Kasowski et al., 2010; McDaniell et al., 2010).

Figure 5. A cascade of regulatory mechanisms by which an eQTL SNP can affect gene expression. eQTLs might affect variation in mature mRNA through a variety of mechanisms. First, eQTL SNPs can impact epigenetic modifications and transcription initiation (transcription factor binding, histone modifications, enhancer activity and DNA methylation, in red). Second, transcriptional and cotranscriptional processes contribute to variation in gene expression levels and mRNA isoform diversity (transcriptional elongation, cotranscriptional splicing, and mRNA processing and modification). Third, eQTL SNPs both within and outside the transcript have been shown to influence posttranscriptional mRNA processing, which includes mechanisms such as general mRNA degradation, defects in polyadenylation, and targeting by miRNAs. Adapted from (Pai et al., 2015).

This approach has helped to interpret some of the genetic variants and relate them to cancer. For example, the presence of two intronic regulatory SNPs in the gene encoding the tyrosine kinase receptor FGFR2 alters the binding of the transcription factors Oct-1/Runx2 and C/EBPβ and leads to increased expression of FGFR2 in the rarer homozygous genotype, which is associated with increased breast cancer risk (Meyer et al., 2008).

Similarly, a SNP upstream of the proto-oncogene c-MYC has been shown to affect the binding of the transcription factor YY1, which may serve to regulate c-MYC expression in prostate cancers (Meyer et al., 2011). Two SNPs in the intronic promoter of MDM2, an oncogene that downregulates the tumor suppressor TP53, have shown independent and opposite effects on the binding of the transcription factor SP1. Interestingly, the combination of both SNPs into a commonly observed haplotype reduces SP1 binding in the MDM2 promoter and reduces breast and ovarian cancer risk, likely by reducing MDM2 expression (Bond et al., 2004; Knappskog and Lonning, 2011). These observations underline how a combination of two or more SNPs may sometimes act in concert to affect a trait (Kreimer and Pe'er, 2014). Moreover, functional variants involving enhancer elements can contribute to phenotypic differences between individuals and ancestral groups through heritable variation in histone modifications that is triggered by SNPs and in turn affects access to transcription factor binding sites (Kasowski et al., 2013; Pomerantz et al., 2009; Schodel et al., 2012; Zhang et al., 2012).

Spontaneous hydrolytic deamination of 5-methylcytosine to thymine rather than uracil seems to be responsible for the approximately 75% decrease in the frequency of CpG methyl acceptor sites. The resulting T:G mismatch is more difficult to repair, and about a third of all disease-causing familial mutations and single nucleotide polymorphisms or variants occur at methylated CpG sites (Rideout et al., 1990). DNA methylation profiling of SNP regions in human tissues using bisulfite sequencing (BS-seq) identified a correlation between some allele-specific methylation outside of imprinted regions and allele-specific expression of genes located proximal to the methylated region (Kerkel et al., 2008), as also found more recently in a study of peripheral blood leukocytes from individuals of three generations (Gertz et al., 2011).
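The depletion of CpG sites described above is commonly quantified as the ratio of observed to expected CpG dinucleotides in a sequence. The sketch below uses the widely used formula (number of CpG × sequence length) / (number of C × number of G); the function name `cpg_obs_exp` and the example sequences are hypothetical and serve only as an illustration.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio of a DNA sequence.

    A ratio near 1 means CpG occurs as often as expected from the base
    composition; values well below 1 indicate CpG depletion.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n_c == 0 or n_g == 0:
        return 0.0
    return n_cpg * len(seq) / (n_c * n_g)


# Hypothetical sequences: a CpG-rich stretch and a CpG-free one.
rich = cpg_obs_exp("CGCGCGCG")
depleted = cpg_obs_exp("CTGCAGCTGGCATGCA")
```

On real data the ratio is usually computed in sliding windows, so that CpG islands stand out against the CpG-poor bulk of the genome.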

Furthermore, a single nucleotide variant within the 5' untranslated region (UTR) of MLH1, a gene involved in DNA repair, has been shown to be responsible for monoallelic promoter methylation and cancer predisposition, likely by inducing the constitutional epimutations of this gene that occur among individuals (Banno et al., 2012; Hitchins et al., 2011).

Regulatory variants can affect histone modifications that have downstream consequences on chromatin status. Emerging studies have characterized interspecies and population-level variation in multiple aspects of gene regulation, integrating histone post-translational modifications with transcription factor and RNA polymerase II (RNAPII) occupancy. With this approach, hundreds of quantitative trait loci that affect histone modification or RNA polymerase II occupancy have been identified genome-wide (McVicker et al., 2013). A recent discovery shows that the majority of SNPs associated with breast cancer risk are enriched at FOXA1- and ER-binding sites, as well as at sites of H3K4me1 histone modification, in a cancer- and cell type-specific manner. The majority of these risk-associated SNPs seem to modulate the affinity of chromatin for FOXA1 at distal regulatory elements. Because FOXA1 is a pioneer factor central to the opening of compacted chromatin, nucleosome repositioning and ER function, this results in allele-specific gene expression (Cowper-Sal lari et al., 2012). Furthermore, emerging evidence shows that TF binding, histone modifications and transcription operate within the same allelic framework under genetic control, although histone modifications are more prone to stochastic, possibly transient effects and likely reflect, rather than define, coordinated regulatory interactions (Gutierrez-Arcelus et al., 2013; Kilpinen et al., 2013).

2.2 Genetic variation in long non-coding RNAs

Because a majority of the human genome is transcribed, growing interest has focused on the mechanisms by which non-coding variations associated with human diseases are translated into altered gene function through long non-coding RNAs (lncRNAs) and their regulatory networks (Almlof et al., 2014; Chung, 2010; Gong et al., 2012; Jin et al., 2011; Ling et al., 2013b).

Several studies show that regulatory SNPs can cause human genetic disease by directly affecting the expression of lncRNA genes (Hrdlickova et al., 2014b; Kumar et al., 2013; Li et al., 2013a; Shirasawa et al., 2004). SNPs can be located in promoter sequences and directly alter the expression of lncRNAs (loss-of-function, Figure 6a) or change the binding of inhibitory complexes, thereby allowing expression of lncRNAs that are not expressed under physiological circumstances (gain-of-function, Figure 6b).

Many SNPs located within or around the ANRIL lncRNA gene have been associated in GWAS with susceptibility to several diseases (Burd et al., 2010; Pasmant et al., 2011). It has been hypothesized that they may directly regulate the expression level of ANRIL or perturb the binding site of the transcription repressor STAT1, located in the ANRIL enhancer (Harismendy et al., 2011). Recently, the ubiquitous transcription factor YY1 has been found to bind specifically to the A allele of SNP rs356219, which is associated with Parkinson's disease. The SNP rs356219 is located 3' of the SNCA gene, which is involved in this disease, in a region that also corresponds to an intron of the antisense non-coding RNA RP11-115D19.1 (Mizuta et al., 2013). YY1 binding has been shown to stimulate the expression of RP11-115D19.1 and thereby to affect SNCA expression (Mizuta et al., 2013).

Secondly, SNPs within lncRNA genes may cause alternative splicing or altered processing of the transcript (Figure 6c). SNPs can affect the usage of alternative polyadenylation sites and thereby potentially influence the stability of non-coding RNAs (Zhernakova et al., 2013).

Thirdly, similarly to the multiple disease-associated mutations identified in the 3' UTRs of coding genes that alter mRNA structure, single nucleotide variations can affect lncRNA structure and thereby the binding of RNA-binding proteins and miRNAs (Doma and Parker, 2006; Halvorsen et al., 2010; Kazan et al., 2010; Ohanian et al., 2013; Parsons et al., 2011; Pavesi et al., 2006; Taft et al., 2010) (Figure 6d).

Figure 6. Functional consequences of mutations on lncRNAs and their function. Mutations located in regions involved in transcriptional control (e.g. promoters, enhancers) might result in: (a) direct alteration of the amount of lncRNA transcripts (loss-of-function). (b) Indirect alteration of lncRNA levels when enhancer or repressor sites are affected (gain-of-function). (c) Mutations located in exons of lncRNAs might result in alternative splicing leading to loss-of-function. (d) Mutations might also alter lncRNA function by affecting their secondary structure. Adapted from (Hrdlickova et al., 2014a).

[Figure 6 panel labels: SNP; lncRNA; protein complex and lncRNA; wild type versus mutated allele (panels a to d); active versus non-active conformation.]

In line with this, there is growing interest in developing web servers that compute the potential deleterious effects of SNPs on RNA structure and function (Barash and Churkin, 2011; Churkin et al., 2011; Sabarinathan et al., 2013; Salari et al., 2013).

Well-known examples are the mitochondrial transfer RNA (tRNA) mutations that disrupt tRNA structure and cause tRNA dysfunction, leading to a variety of severe diseases (Wittenhagen and Kelley, 2003). Mutations that alter the equilibrium between different conformational states of telomerase RNAs result in disease states such as dyskeratosis congenita (Chen and Greider, 2004), presumably through disruption of the RNA scaffold structure into which modular binding sites for telomeric regulatory proteins are plugged (Zappulla and Cech, 2004).

The rs11752942A>G site in the lincRNA-uc003opf.1 exon is associated with differing susceptibility to esophageal squamous cell carcinoma (ESCC) in Chinese populations, probably because of differential RNA structural folding at the microRNA-149* binding site, which affects the level of lincRNA-uc003opf.1 (Wu et al., 2013). Similarly, for the rs12325489C>T polymorphism in an exonic region of lincRNA-ENST00000515084, the C allele is associated with a significantly increased risk of breast cancer, and the change to a T base disrupts the binding site for miRNA-370 (Li et al., 2014c).

In this regard, a genome-wide analysis of all known disease-associated SNPs from the Human Gene Mutation Database (HGMD) that map to untranslated regions has been performed, and the structural conformational changes induced by the disease-associated mutations have been characterized in silico (Halvorsen et al., 2010).

Such predictions suggest that a 4-bp insertion/deletion polymorphism (rs10680577), located within the distal promoter of EGLN2 and within an intronic region of RERT-lncRNA, disrupts the structure of RERT-lncRNA and subsequently affects EGLN2 expression, potentially contributing to hepatocellular carcinoma (Zhu et al., 2012).

Recently, Parallel Analysis of RNA Structure (PARS) was applied to a family trio (father, mother and child) and revealed the impact of genetic sequence variation on RNA secondary structure and inherited structural patterns (Wan et al., 2014). Interestingly, approximately 15% of transcribed single nucleotide variants do in fact alter RNA secondary structure. These structure-altering elements, named riboSNitches, include genetic variants that have previously been linked to human diseases, suggesting that our current understanding of the impact of mutations in the non-protein-coding regions of the genome is still limited (Corley et al., 2015; Wan et al., 2014).

Moreover, it is likely that many more lncRNAs are transcribed from cancer loci but have not been examined in a targeted manner, or that they produce only low-abundance RNAs, which complicates their detection and characterization. Consistent with this, the 8q24 genomic region, which is frequently altered by amplification, deletion, viral integration or translocation in many types of human cancers, is a hotspot for cancer-associated SNPs. Several lncRNAs, including CCAT1 (Xiang et al., 2014), CCAT2 (Ling et al., 2013b), CARLo-5 (Kim et al., 2014), PVT1 (Tseng et al., 2014), PCAT1 (Prensner et al., 2011) and PRNCR1 (Yang et al., 2013), have recently been reported to be transcribed from this genomic region, which was previously considered a 'gene desert', opening the way to new potential functional explanations (Ling et al., 2015). In this regard, a recent study showed that the long ncRNA CCAT2, which overlaps the cancer risk-associated SNP rs6983267, regulates cancer metabolism in vitro and in vivo in an allele-specific manner by binding the Cleavage Factor I (CFIm) complex with distinct affinities for the two subunits (CFIm25 and CFIm68) (Redis et al., 2016).

Lastly, a new lncRNA, lnc13, which harbours a celiac disease-associated haplotype block and represses the expression of certain inflammatory genes under homeostatic conditions, has been identified and characterized. The repressive action of lnc13 depends on its binding to hnRNPD, a member of a family of ubiquitously expressed heterogeneous nuclear ribonucleoproteins. Interestingly, the disease-associated lnc13 variant binds hnRNPD less efficiently than its wild-type counterpart, which helps to explain how these single nucleotide polymorphisms contribute to celiac disease (Castellanos-Rubio et al., 2016).

Remarkably, because the structure of RNA can be changed with small drug-like molecules, functional riboSNitches also represent a new class of pharmaceutical targets and could be targeted, for example, by small molecules specific to the disease-associated riboSNitch. This reinforces the current therapeutic concept of moving towards personalized medicine (Guan and Disney, 2013; Halvorsen et al., 2010; Solem et al., 2015).
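To make the idea of a structure-altering variant concrete, the toy sketch below compares the maximum number of nested base pairs that a reference and a mutated RNA sequence can form, using a Nussinov-style dynamic programme. This is a deliberately simplified stand-in: it is not PARS, it is not the thermodynamic folding used by the web servers cited above, and the function names, the minimum loop length of 3 and the example sequence are all assumptions made for illustration.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style recursion).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def snv_pair_change(ref, pos, alt):
    """Change in maximum pair count after substituting alt at 0-based pos."""
    mutated = ref[:pos] + alt + ref[pos + 1:]
    return max_base_pairs(mutated) - max_base_pairs(ref)


# Hypothetical hairpin: three G-C pairs closing an AAA loop.
ref_pairs = max_base_pairs("GGGAAACCC")
delta = snv_pair_change("GGGAAACCC", 8, "A")  # the last C is replaced by A
```

In this example the substitution removes one of the three stem pairs. A pair count is only a crude proxy, since real riboSNitch prediction compares ensembles of structures and their energies rather than a single maximum.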

2.3 Cancer epigenetics

Cancer has traditionally been viewed as a disease driven by the accumulation of genetic mutations, which have been considered the major causes of neoplasia (Hanahan and Weinberg, 2011). However, this paradigm has now been expanded to incorporate the disruption of epigenetic regulatory mechanisms, which manifests as global alterations in chromatin packaging and as specific promoter changes that influence the transcription of associated genes (Baylin and Jones, 2011; Sandoval and Esteller, 2012). Additionally, genomic variations that affect epigenetic mechanisms add a layer of complexity to this disease.

Reports suggest that more than 300 genes and gene products are epigenetically altered in human cancers through various epigenetically regulated mechanisms (Miremadi et al., 2007).

DNA methylation alterations are probably the most widely studied epigenetic alterations in cancer. Cancer cells frequently display genome-wide hypomethylation and site-specific CpG island hypermethylation in gene promoter regions that usually remain unmethylated in normal cells (De Smet et al., 2004; Ehrlich, 2002b; Robertson and Wolffe, 2000).

As described in several tumor types, DNA hypermethylation occurs at promoter CpG sites of specific genes involved in major cellular pathways, including DNA repair, cell cycle control, apoptosis and metastasis (Bedford and van Helden, 1987; Ehrlich, 2002a; Ehrlich et al., 2002; Feinberg and Vogelstein, 1983; Kim et al., 2010; Netto et al., 2008; Smith et al., 2007; Wolff et al., 2010). Hypermethylation of CpG-rich promoter regions is thus one of the regulatory mechanisms used by tumor cells to silence tumor suppressor genes (Jones and Baylin, 2007), such as CDKN2A (cyclin-dependent kinase inhibitor 2A), MLH1 (mutL homolog-1), BRCA1 (breast cancer-associated-1) and VHL (von Hippel-Lindau tumor suppressor) (Esteller, 2008; Jones and Baylin, 2007). This observation has been extended to small non-coding RNAs with growth-inhibitory features, such as microRNAs, which can likewise be inactivated by silencing (Huang et al., 2009; Lujambio et al., 2007; Saito et al., 2006; Toyota et al., 2008). This provides tumor cells with a growth advantage and increases their genetic instability and aggressiveness (Esteller, 2007; Shen et al., 2007).

Conversely, hypomethylation occurring in repetitive regions of the genome results in genomic instability and is a hallmark of tumor cells (Hatziapostolou and Iliopoulos, 2011). Alternatively, hypomethylation at specific loci can activate aberrant expression of oncogenes and induce loss of imprinting (LOI), that is, the loss of parental allele-specific monoallelic expression of genes due to aberrant DNA hypomethylation (Markowitz and Bertagnolli, 2009), as for the IGF2 proto-oncogene (encoding insulin-like growth factor 2), leading to its activation in a wide range of cancers (Ogawa et al., 1993; Watt et al., 2000; Wilson et al., 2007).

Moreover, alterations in the DNA methylation machinery contribute to aberrant DNA methylation. DNMT1 mutations have been described in colorectal cancer (Kanai et al., 2003), while frequent DNMT3a mutations occur in myelodysplastic syndromes (MDS) and acute myeloid leukemia (AML) (Ley et al., 2010; Yamashita et al., 2010; Yan et al., 2011). Germline mutations in DNMT3b underlie immunodeficiency-centromeric instability-facial anomalies (ICF) syndrome and chromosome instability (Wijmenga et al., 2000), while DNMT3b SNPs have been suggested to be associated with the risk of several cancers, including breast cancer and lung adenocarcinoma (Shen et al., 2002). Additionally, recent studies uncovered a role for DNMT3a in silencing self-renewal genes in hematopoietic stem cells (HSCs) to permit efficient hematopoietic differentiation; its loss progressively impairs HSC differentiation (Challen et al., 2012; Trowbridge and Orkin, 2012).

On the other hand, the aberrant expression of DNMT1, DNMT3a and DNMT3b in various cancers possibly contributes to ectopic hypermethylation (Wu et al., 2007). Genetic mutations in MBD1 and MBD2 increase the risk of lung and breast cancer, respectively (Sansom et al., 2007).

Interestingly, the MLL gene, whose protein product introduces the active H3K4me3 mark and plays important roles in development, has been observed in MLL-TET1 fusions in some cases of AML and lymphocytic leukemia (Burmeister et al., 2009; Meyer et al., 2009).

Furthermore, TET2 has been found to be mutated in MDS and in myeloproliferative neoplasms (Tan and Manley, 2009).

Disruption of histone modifications has also been linked to all the hallmarks of cancer.

One of the most representative examples is the global reduction of H4K20me3 and H4K16Ac, along with DNA hypomethylation, at repeat sequences in many primary tumors (Fraga et al., 2005). Histone variants such as H2A.Z may affect the recruitment of TFs and of other components of the transcription machinery, thereby contributing to aberrant gene expression (Conerly et al., 2010; Mills, 2010; Sharma et al., 2010).

Mutations or translocations of HAT genes (p300, CBP and MYST4) are observed in colon, uterine and lung tumors and in leukemias (Esteller, 2007; Yang, 2004). Further, these


Table of contents

List of Abbreviations
Summary
Résumé
Introduction
1. EPIGENETIC REGULATORY MECHANISMS
1.1 DNA methylation
1.2 Histone modifications and histone code
1.3 DNA methylation and histone modifications crosstalk
2. GENETIC VARIATIONS, EPIGENETIC MECHANISMS AND CANCER
2.1 Non coding variants impact on genetic expression
2.2 Genetic variation in long non-coding RNAs
2.3 Cancer epigenetics
2.4 Epigenetics and therapy of cancer
3. PROMOTER-ASSOCIATED NON-CODING RNAs
3.1 Directionality of promoter transcription
3.2 Characterized promoter-associated RNAs in genes regulation
4. SHORT REGULATORY NON-CODING RNAS
4.1 RNA interference and post-transcriptional regulation
4.2 miRNAs biogenesis and regulatory pathways
4.3 Transcriptional gene regulation by promoter-targeted RNAs
4.4 Promoter-targeted RNAs require paRNAs
4.5 Interactions of long and short non-coding RNAs
5. E-CADHERIN
5.1 E-cadherin in signal transduction and cell physiology
5.2 E-cadherin silencing in human cancer
Overall hypothesis and experimental plan
Materials and methods
Results
1- Promoter-associated transcripts control CDH1 transcription
2- Mapping and characterization of promoter-associated transcript in CDH1 promoter
3- Sub-cellular distribution of CDH1 promoter-associated transcripts
4- Promoter-associated transcripts termination, stability and degradation
5- S-paRNA contributes to the E-cadherin repression
6- Promoter-associated transcripts regulate in cis E-cadherin
7- Knockdown of S-paRNA affects the cancer cell phenotype
8- AGO1 contributes to CDH1 transcriptional regulation

9- AGO1 binds to the CDH1 promoter and to the promoter-