
Dynamics of DNA methylation and genomic imprinting in arabidopsis


Academic year: 2021


Dynamics of DNA methylation and genomic imprinting in Arabidopsis

by

Colette Lafontaine Picard

B.Sc. Biological Sciences

B.Sc. Biometry and Statistics Cornell University (2012)

Submitted to the Program of Computational and Systems Biology in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy in Computational and Systems Biology

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY February 2019

© 2019 Massachusetts Institute of Technology. All rights reserved.

Author: (signature redacted)
Computational and Systems Biology Graduate Program
September 28, 2018

Certified by: (signature redacted)
Mary Gehring
Associate Professor of Biology
Thesis Supervisor

Accepted by: (signature redacted)
Christopher B. Burge
Professor of Biology and Biological Engineering
Director, Computational and Systems Biology Graduate Program


Dynamics of DNA methylation and genomic imprinting in Arabidopsis

by

Colette Lafontaine Picard

Submitted to the Program of Computational and Systems Biology February 2019, in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Abstract

DNA methylation is an epigenetic mark that is highly conserved and important in diverse cellular processes, ranging from transposon silencing to genomic imprinting. In plants, DNA methylation is both mitotically and meiotically heritable, and changes in DNA methylation can be generationally stable and have long-lasting consequences. This thesis aims to improve understanding of DNA methylation dynamics in plants, particularly across generations and during reproduction. In the first project, I present an analysis of the generational dynamics of gene body methylation using recombinant inbred lines derived from differentially methylated parents. I show that while gene body methylation is highly generationally stable, changes in methylation state occur nonrandomly and are enriched in regions of intermediate methylation. Important DNA methylation changes also occur during seed development in flowering plants, and these changes underlie genomic imprinting, the phenomenon of parent-of-origin specific gene expression. In plants, imprinting occurs in the endosperm, a seed tissue that functions analogously to the mammalian placenta. Imprinted expression is linked to DNA methylation patterns that serve to differentiate the maternally- and paternally-inherited alleles, but the mechanisms used to achieve imprinted expression are often unknown. I next explore imprinted expression and DNA methylation in Arabidopsis lyrata, a close relative of the model plant Arabidopsis thaliana. I find that the majority of imprinted genes in A. lyrata endosperm are also imprinted in A. thaliana, suggesting that imprinted expression is generally conserved. Surprisingly, a subset of A. lyrata imprinted genes are associated with a novel DNA methylation pattern and may be regulated by a different mechanism than their A. thaliana counterparts. I then explore the genetics of paternal suppression of the seed abortion phenotype caused by mutation of a maternally expressed imprinted gene. Finally, I present the first large single-nuclei RNA-seq dataset generated in plants, reporting data from 1,093 individual nuclei obtained from developing seeds. I find evidence of previously uncharacterized cell states in endosperm, and examine imprinted expression at the single-cell level. Together, these projects contribute to our understanding of DNA methylation and imprinting dynamics during plant development, and highlight the strong generational stability of certain DNA methylation patterns.

Thesis Supervisor: Mary Gehring Title: Associate Professor of Biology


Acknowledgements

I would like to thank my advisor Mary Gehring, for giving me the opportunity to pursue my research interests in such a fantastic environment. As someone who came from a background outside of plants but was very interested in epigenetics, working in this lab gave me the chance to try something completely new, and I learned so much in the process. Mary has been an amazing and supportive mentor through all of the many projects I've taken on, and always has a new idea or suggestion whenever I start getting bogged down.

I would also like to thank all current and former Gehring lab members for their insight, advice, friendship and support through the years. A shout-out to Ben, Satyaki, Becky, and Xiao-yu, who made the lab a fun place and who always had great ideas and suggestions when I needed them. I would also like to thank Maja, who was my first major collaborator, for giving me the opportunity to contribute to such a cool project. I was lucky to share my lab bay with my friend Rob, who had a passion for science and education that was always amazing to see. Last but not least, thank you to my friend and fellow lab member Deborah, who is not only a great person to talk to about science, but who is also somehow my long-lost identical twin who shares all the same interests as me. What are the odds? Thank you so much for everything - for learning to bake with me, for always having a new awesome book I should check out, for laughing out loud in lab at your audiobook the same way I do.

My thesis committee members, Chris Burge and Laurie Boyer, have watched my research evolve through the years and given me invaluable advice throughout. Thank you both for your generous guidance and support. Thank you also to Manolis Kellis, who served on my qualifying exam committee, and to Alexander Gimelbrant, who served on my thesis defense committee.

My classmates from the CSB program, Peter, Vincent, Mandy, Mariana, Nezar and Rotem, have been a constant source of support and friendship throughout the past six years. Thank you to all of you for the amazing experiences - from camping to birthdays to weddings. Grad school would have been a lot less fun without you all. Thank you also to Jacquie Carota, our program's fantastic administrator, who was a constantly helpful and cheerful presence.

I would also like to thank my undergraduate research advisor Siu Sylvia Lee, who gave me the opportunity to work in a biology lab and decide that this was where I wanted to make my career. In addition to Sylvia, two former PhD students from Sylvia's lab, Gizem Rizki and Ella Chang, were also amazing mentors who taught me the basics of working at the bench. The skills I learned in the Lee lab have been invaluable throughout my time in grad school.

Finally, I would like to thank my mom and dad, who have been a source of unwavering support through the past six years of grad school and long before. I couldn't have made it here without them. Thank you also to my extended family: aunts, uncles, cousins, and more. I love visiting


Table of Contents

1 Introduction
1.1 5-methylcytosine (5-mC) is a conserved epigenetic mark
1.1.1 Properties and function of 5-methylcytosine
1.1.2 Mechanisms for adding 5-mC
1.1.3 Mechanisms for removing 5-mC
1.1.4 Heritability of 5-mC across generations
1.2 Interplay between histone modifications and DNA methylation
1.2.1 Dimethylation of H3K9me2 is associated with non-CG methylation
1.2.2 H3K27me3 and the Polycomb Repressive Complex
1.2.3 H2A.Z is antagonized by 5-mC
1.3 Seed development in A. thaliana
1.3.1 Endosperm is composed of multiple distinct domains
1.4 Genomic imprinting in plants occurs in the endosperm
1.4.1 On the origin of genomic imprinting
1.4.2 Differential methylation of the maternal and paternal genomes underlies imprinting in endosperm
1.5 Overview of the thesis

2 Proximal methylation features associated with nonrandom changes in gene body methylation
2.1 Abstract
2.2 Introduction
2.3 Results
2.3.1 Cvi genes lack methylation at a subset of CG sites
2.3.2 Existing methylation states are stably transmitted for many generations
2.3.3 Differences in gene body methylation are not associated with gene expression differences
2.3.4 A small number of CG sites consistently fail to maintain the parental methylation state
2.3.5 Changes in CG methylation are not stochastic
2.3.6 Dynamic CGs are clustered and share local methylation features
2.3.7 Prediction of dynamic genic CGs using a logistic regression framework
2.4 Discussion
2.4.1 Conclusions
2.5 Supplementary Figures and Tables
2.6 Methods
2.6.1 Plant Material
2.6.2 Bisulfite sequencing
2.6.3 Determining RIL genotype
2.6.5 Calculating correlation between replicates
2.6.6 Estimating rate of gain and loss of genic methylation in the RILs
2.6.7 PCA of gene body methylation levels in 927 strains
2.6.8 Classifying CGs according to methylation variability across 927 A. thaliana strains
2.6.9 Physical clustering of RIL gain or RIL loss sites
2.6.10 Distribution of RIL gain or loss sites across gene bodies and intron/exon boundaries
2.6.11 Methylation profile plots
2.6.12 sRNA analysis
2.6.13 Comparison of [CG] and GC content at dynamic cytosines vs. background
2.6.14 Motif analysis
2.6.15 Locus-specific bisulfite-PCR
2.6.16 RNA-seq
2.6.17 PCA analysis of RNA-seq data
2.6.18 Logistic regression model fitting
2.6.19 Analysis of Z. mays and B. distachyon methylation data
2.6.20 List of abbreviations
2.6.21 Availability of data and materials

3 Conserved imprinting associated with unique epigenetic signatures in the Arabidopsis genus
3.1 Abstract
3.2 Introduction
3.3 Results
3.3.1 Identification of imprinted genes in A. lyrata
3.3.2 Imprinted expression is conserved between A. thaliana and A. lyrata
3.3.3 Shared DNA methylation signatures in A. lyrata vs. A. thaliana endosperm
3.3.4 Novel DNA methylation signatures in A. lyrata endosperm
3.3.5 Association between endosperm DNA methylation and imprinting
3.3.6 Conclusions
3.4 Supplementary Figures and Tables
3.5 Methods
3.5.1 Plant material
3.5.2 mRNA-seq and mapping
3.5.3 Updating gene annotations
3.5.4 Updating gene homology information
3.5.5 Identifying SNPs between MN and Kar
3.5.6 Differential expression analysis
3.5.7 Imprinting analysis
3.5.8 Comparison of imprinting status between A. thaliana and A. lyrata
3.5.10 Identifying differentially methylated regions
3.5.11 Plots of average methylation across features
3.5.12 Locus-specific bisulfite PCR
3.5.13 Testing the relationship between endosperm CHG methylation and gene expression
3.5.14 RT-qPCR analyses
3.5.15 List of abbreviations
3.5.16 Data availability

4 Paternal suppression of seed abortion caused by mutation of maternally expressed imprinted gene MEDEA
4.1 Introduction
4.2 Results
4.2.1 Several additional A. thaliana accessions can rescue mea
4.2.2 Accessions that can rescue mea do not share DNA methylation features
4.2.3 Rescue of PRC2 mutants fis2 and fie
4.2.4 Rescue of mea by Cvi does not require paternal MEA
4.2.5 Rescue of mea can be highly variable and depends on multiple external factors
4.3 Discussion
4.4 Supplementary Figures and Tables
4.5 Methods
4.5.1 Plant Material
4.5.2 Crosses and scoring seed abortion
4.5.3 Genotyping mea-3 and met1-6
4.5.4 List of primers used
4.5.5 DNA methylation analysis

5 Single-nuclei RNA-seq of A. thaliana endosperm reveals novel cell states and imprinting dynamics
5.1 Introduction
5.2 Results
5.2.1 Isolating single endosperm nuclei using FACS
5.2.2 RNA-seq of single nuclei from A. thaliana seeds
5.2.3 DAPI-based sorting of triploid nuclei does not yield a pure population
5.2.4 Clustering of single nuclei reveals known and novel endosperm structure
5.2.5 Characterizing novel cell states in endosperm and seed coat
5.2.6 Imprinted expression in snRNA-seq datasets
5.3 Discussion
5.4 Supplementary Figures and Tables
5.5 Methods
5.5.1 Plant Material and crossing
5.5.2 FACS sorting of leaf and endosperm nuclei
5.5.3 snRNA-seq library preparation and sequencing
5.5.4 snRNA-seq read mapping and preliminary analysis
5.5.5 Filtering out low-quality snRNA-seq libraries
5.5.6 Read profile metaplots
5.5.7 Clustering of nuclei using PCA and t-SNE
5.5.8 Clustering of nuclei using SC3
5.5.9 Expression of known endosperm and seed coat markers in SC3 clusters
5.5.10 Identifying imprinted genes
5.5.11 Abbreviations used

6 Conclusion
6.1 Summary
6.2 Future Directions

A1 Cvi has decondensed chromatin but no significant difference in nucleosome positioning vs. Col
A1.1 Introduction
A1.2 Results
A1.2.1 Comparison of chromocenter visibility in Col and Cvi nuclei
A1.2.2 Comparing chromatin organization in Col vs Cvi using MNase-seq and salt fractionation
A1.2.3 Genome-wide comparison of nucleosome occupancy in Col vs Cvi
A1.2.4 Nucleosome occupancy is decreased over TEs in Cvi relative to Col
A1.2.5 Genome-wide comparison of nucleosome positioning in Col vs Cvi
A1.3 Discussion
A1.4 Supplementary Figures and Tables
A1.5 Methods
A1.5.1 Plant Material
A1.5.2 Nuclei sorting and immunostaining from leaves or roots
A1.5.3 Fixed nuclei imaging and quantification of chromatin compaction
A1.5.4 Nuclei isolation and MNase treatment
A1.5.5 Salt fractionation, library prep and mapping
A1.5.6 Nucleosome occupancy and PCA analysis
A1.5.7 Nucleosome occupancy/read density metaplots
A1.5.8 Simulating MNase-seq reads
A1.5.9 Comparison of nucleosome positioning between Col and Cvi

References


Chapter 1: Introduction

The different cells of a multicellular organism are mostly genetically identical. Despite this, a huge amount of phenotypic variation exists between these cells, which are often highly specialized in both form and function. How can a single DNA sequence give rise to such variation? Both DNA and the protein scaffold into which it is organized can be marked by a variety of different modifications which affect the DNA without changing the actual DNA sequence. These marks, called epigenetic marks, act in concert with other cellular machinery to alter the accessibility of the DNA to proteins and other factors, effectively turning different regions of the genome on or off. They also protect against selfish genetic elements like transposable elements (TEs), which when unchecked can copy themselves and proliferate through the genome, disrupting normal cell function. How epigenetic marks influence gene expression and cell identity during gametogenesis and development, as well as how these marks can be transmitted to and affect offspring, are crucial building blocks for understanding a wide variety of biological questions ranging from human disease to crop improvement. The focus of this thesis is on a specific epigenetic mark: 5-methylcytosine (5-mC), a covalent modification of the DNA base cytosine that is also referred to as DNA methylation. DNA methylation is widespread among prokaryotic and eukaryotic lineages, including both mammals and land plants, and has many important functions in the genome. In addition, this mark is highly heritable across cell divisions, and in plants DNA methylation changes can even be transmitted across generations. In this thesis, I explore several aspects of 5-mC in flowering plants, including examining the transgenerational stability of DNA methylation in gene bodies, and the role of DNA methylation changes during plant reproduction and seed development. Accordingly, this introduction reviews what is already known about the establishment and function of DNA methylation in plants. Histone modifications, another family of epigenetic marks, are also discussed due to the high degree of crosstalk between these modifications and DNA methylation. Later parts of the thesis explore the link between DNA methylation and genomic imprinting, an epigenetic phenomenon that occurs during seed development in plants; therefore, this introduction also reviews reproduction, seed development and imprinting in

1.1 5-methylcytosine (5-mC) is a conserved epigenetic mark

This section discusses the function of 5-mC in flowering plants, as well as the mechanisms involved in adding and removing 5-mC. In addition to 5-mC, many other DNA modifications exist (notably 5-hmC and 6-mA, among others); however, because 5-mC is far more abundant in plants than these other types of DNA methylation, the phrase "DNA methylation" or "methylation" is used throughout this work to refer specifically to 5-mC.

1.1.1 Properties and function of 5-methylcytosine

5-methylcytosine (5-mC) is a modification of the DNA base cytosine created by the addition of a methyl group to the 5th carbon (Fig. 1-1a). 5-mC is widely conserved and found in most eukaryotes including both plants and animals, although the machinery required to establish and maintain genomic 5-mC has been lost in some lineages. In plants, 5-mC is found at cytosines in all sequence contexts, but these are usually separated into CG, CHG and CHH (Fig. 1-1b, H = A, T, or C) because 5-mC in these different contexts is added and recognized by different factors and pathways (also see section 1.1.2). 5-methylcytosine is mutagenic because it can spontaneously deaminate to thymine, which can lead to a C->T mutation (Fig. 1-1a) [1]. In mammalian genomes, where 5-mC occurs primarily in the CG context and most CGs are methylated, the higher rate of C->T mutations at 5-mC has caused CG dinucleotides to become underrepresented genome-wide [2]. Similarly, CG dinucleotides are depleted in methylated A. thaliana genes, because genic methylation in A. thaliana also occurs primarily in the CG context (Fig. 1-1c) [3]. The chemical properties of 5-mC also make it possible to assay the location of these marks genome-wide at very high resolution. 5-mC is resistant to treatment by sodium bisulfite, which converts unmethylated cytosines to uracil (Fig. 1-1a). This has been leveraged to assay methylation levels at single base pair resolution using bisulfite treatment followed by DNA sequencing (bisulfite-seq), although other methods using methylation-sensitive restriction enzymes or antibodies are also used [reviewed in 4].
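The two ideas in this paragraph, the CG/CHG/CHH context definitions and the effect of bisulfite treatment, can be illustrated with a minimal Python sketch. This is not code from the thesis: the function names `cytosine_context` and `bisulfite_convert` are hypothetical, only the top strand is considered, and the converted uracil is written as T on the assumption that it is read as thymine after amplification and sequencing.

```python
from typing import Optional, Set


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i of a top-strand sequence.

    Returns 'CG', 'CHG' or 'CHH' (H = A, T or C), or None if seq[i] is not
    a cytosine or the downstream bases are missing or ambiguous.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1 : i + 3]  # up to two bases 3' of the cytosine
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or downstream[0] not in "ACT" or downstream[1] not in "ACGT":
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Simulate bisulfite conversion of the top strand.

    Unmethylated cytosines are read as T after conversion; cytosines whose
    index is in `methylated` are protected and remain C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


if __name__ == "__main__":
    example = "ACGTCAGCTT"  # hypothetical sequence, cytosines at 1, 4 and 7
    for pos in (1, 4, 7):
        print(pos, cytosine_context(example, pos))
    # Only the CG cytosine at index 1 is methylated in this toy example.
    print(bisulfite_convert(example, {1}))
```

Comparing the converted read with the reference sequence recovers the methylation state at each cytosine: a retained C indicates 5-mC, while a C that reads as T indicates an unmethylated cytosine.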

DNA methylation was long considered mainly a transcriptionally repressive mark, and one of its key roles in plants is the silencing of transposable elements (TEs) and other repetitive regions, which are common in plant genomes. TEs are segments of the DNA with the ability to move or copy themselves to new regions of the genome [reviewed in 5,6]. Randomly transposing TEs can integrate into important regulatory regions or disrupt coding sequences, and so active transposition is highly undesirable to the organism. TEs were first discovered in maize by Barbara McClintock in 1950 [7], and account for roughly 85% of the maize genome, 20-40% in rice, and 15% in A. thaliana [8], as well as at least 45% of the human genome [9]. In A. thaliana and other flowering plants, most transposable elements are heavily methylated in all sequence contexts, and this methylation signature is usually associated with a repressed transcriptional state (Fig. 1-1c) [10,11]. Pericentromeric and heterochromatic regions, which contain more TEs and repetitive DNA than euchromatin, also tend to be highly methylated in all sequence contexts. In contrast, methylation over gene bodies is uncommon, and when it occurs it is usually restricted to the CG context (Fig. 1-1c) [10,11,12]. Somewhat at odds with its canonical role in transcriptional silencing, CG-only gene body methylation is not associated with transcriptional repression. Instead, CG-methylated genes tend to be moderately expressed, though the relationship between gene body methylation and expression remains correlative [12]. The function and stability of gene body methylation is explored in more depth in Chapter 2.


Figure 1-1: 5-methylcytosine in plants

(a) Structure of cytosine and 5-methylcytosine (5-mC). Cytosine is converted to 5-mC by various DNA methyltransferases (see section 1.1.2). 5-mC is occasionally converted to thymine through spontaneous deamination. Bisulfite treatment converts unmethylated cytosines to uracil, while 5-mC is protected and nonreactive. (b) The three sequence contexts in which 5-mC is found in plants. (c) Example of average methylation profiles over genes and transposable elements (TEs). The 3 top tracks indicate percent methylation in the CG, CHG and CHH contexts respectively, and the bottom tracks indicate position of genes and TEs. Methylation levels are only shown for loci with at least 5 overlapping reads. Region shown is Chr2:9113276-9131274. Data from Col replicate #1 from Chapter 2.
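The "percent methylation" tracks described in the caption are conventionally derived from bisulfite-seq read counts at each cytosine. The sketch below shows that standard ratio together with the minimum-coverage filter of 5 reads quoted in the caption. It is an assumption about how such values are typically computed, not the pipeline used in this thesis, and the function name `percent_methylation` is hypothetical.

```python
from typing import Optional

MIN_READS = 5  # coverage threshold quoted in the Figure 1-1 caption


def percent_methylation(
    c_reads: int, t_reads: int, min_reads: int = MIN_READS
) -> Optional[float]:
    """Percent methylation at one cytosine from bisulfite-seq read counts.

    c_reads: reads reporting C at this position (protected, i.e. methylated)
    t_reads: reads reporting T at this position (converted, i.e. unmethylated)
    Returns None when total coverage is below min_reads, so that
    poorly covered sites are not reported.
    """
    if c_reads < 0 or t_reads < 0:
        raise ValueError("read counts must be non-negative")
    depth = c_reads + t_reads
    if depth < min_reads:
        return None
    return 100.0 * c_reads / depth


if __name__ == "__main__":
    print(percent_methylation(8, 2))  # well covered site
    print(percent_methylation(2, 2))  # below the coverage threshold
```

Returning `None` rather than 0 for low-coverage sites keeps "no data" distinct from "unmethylated", which matters when averaging over a gene or TE.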


In addition to silencing TEs and other repetitive elements, DNA methylation is important for regulating the expression of a number of genes. In some cases, methylation at a TE or repetitive region appears to have been co-opted by the plant to regulate the expression of a nearby gene, in a type of transposon domestication [also see 5,6]. A striking example of this occurs at the ROS1 locus, which produces an enzyme that removes 5-mC (see section 1.1.3). Methylation at a TE upstream of ROS1 is required for its expression [13]. Methylation levels at that TE are dependent on both de novo methylation activity by various pathways (see section 1.1.2) and demethylation activity by ROS1 itself; by tuning ROS1 activity to match de novo methylation activity, this mechanism promotes DNA methylation homeostasis [13]. Other examples include RPP7, which produces a nonfunctional transcript when an intronic TE is methylated [14], and SDC, which is normally silenced by high methylation levels at a repetitive region in the promoter, but can be activated when that region is demethylated [15]. Regulatory DNA methylation is also not restricted to TEs and repetitive DNA: at the IBM1 locus, methylation in the long intron is required for proper mRNA splicing, and this methylation is not associated with any annotated TE or repetitive element [16]. In one example with huge economic ramifications, an attempt to clone the most productive oil palm trees to greatly increase yield met with disaster because a large proportion of the genetically identical clones produced sterile 'mantled' fruits that were useless for oil production [17]. This was unfortunately not discovered until years later - when the oil palms reached maturity and produced fruits - and was therefore very costly for farmers. The mantled phenotype was recently found to be due to loss of DNA methylation at a LINE retrotransposon in the intron of DEFICIENS, a gene involved in flower development [17]. The loss of methylation caused the DEFICIENS mRNA to be mis-spliced, producing a truncated and likely nonfunctional protein [17]. Hypomethylation of the LINE element likely occurred due to the cloning method, which has been previously shown to cause hypomethylation at TEs in Arabidopsis [18] and in rice [19] and maize [20]. Thus, aberrant DNA methylation patterns can have significant and long-lasting effects in plants.

Despite the clear importance of DNA methylation in regulating TE and gene expression, there is also significant natural variation in DNA methylation patterns within A. thaliana, and presumably other plant species. DNA methylation changes are predicted to be many times more frequent than DNA sequence mutations, suggesting that methylation changes can be a source of genetic variability over short timescales and lead to greater phenotypic diversity and adaptability [21]. A recent study examined the genome-wide DNA methylation patterns of 1,107 A. thaliana accessions obtained worldwide [22]. They found widespread methylation variability across accessions that correlated with their geographical origin and climate, consistent with DNA methylation changes acting as a potential mechanism by which sessile organisms like plants can adapt relatively quickly to their environment [22]. This huge amount of natural variation can also be leveraged to study a number of DNA methylation phenomena using GWAS or other approaches [22]. In particular, one of the strains studied in our lab is Cvi (from the Cape Verde Islands), which both we [23] and others [22] have noted has strongly depleted gene body methylation relative to other A. thaliana strains. Using this naturally occurring gene body methylation variant to improve our understanding of gene body methylation is the focus of Chapter 2.

Despite the important role that DNA methylation plays in the genome, plants are surprisingly robust to its loss over short timescales. In animals, acute genome-wide loss of DNA methylation is embryonic lethal [24], while first-generation plants that have lost nearly all of their DNA methylation are mostly phenotypically normal [25]. However, repeated selfing of these plants leads to sterility after a few generations due to the misexpression of homeotic genes that regulate flower formation and loss of floral identity [26,27]. Nonetheless, the ability of A. thaliana to tolerate large perturbations in the DNA methylation landscape has greatly facilitated the study of the different DNA methylation pathways and their interactions in plants. Much is now known about the pathways responsible for adding and removing DNA methylation in plants, and these are highlighted in the next two sections.

1.1.2 Mechanisms for adding 5-mC

Newly replicated DNA is hemimethylated: the newly synthesized daughter strand incorporates unmodified cytosines and thus lacks DNA methylation, while the template strand retains the original 5-mC marks. Thus, methylation information can be fully lost after as few as two cell divisions, highlighting the need for pathways dedicated to adding and maintaining 5-mC in the genome. Methylation in higher plants is added by several different DNA methyltransferases which act in distinct pathways and are often specific to a particular context [reviewed in 10,11,28, among others]. These pathways work collectively to maintain the current methylation state, in addition to establishing DNA methylation de novo when appropriate. Methylation in the symmetric CG context (mCG) is maintained by MET1, a homolog of the mammalian maintenance methyltransferase DNMT1. Like DNMT1, MET1 recognizes hemimethylated DNA in the CG context and methylates the daughter strand to match the template strand (Fig. 1-2a). met1 plants lose all CG and most non-CG methylation genome-wide [29]. Mutant plants also fail to regain methylation at most CG-only sites even after functional MET1 is re-introduced, consistent with its function primarily as a maintenance methyltransferase with little de novo activity [27]. Three 5-methylcytosine-binding proteins, VIM1, VIM2 and VIM3, are also required for CG methylation maintenance by MET1 [30].
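The logic of this paragraph (hemimethylation after replication, loss within two divisions when maintenance is absent, and restoration by a MET1-like activity that has no de novo function) can be sketched as a toy model. The representation of a CG site as a pair of booleans and the function names `replicate` and `maintain` are illustrative assumptions, not a description of any published model.

```python
from typing import List, Tuple

# One symmetric CG site: (top strand methylated, bottom strand methylated)
Site = Tuple[bool, bool]


def replicate(duplex: List[Site]) -> Tuple[List[Site], List[Site]]:
    """Semi-conservative replication.

    Each parental strand is paired with a newly synthesized strand that
    carries no methylation, so both daughter duplexes are at most
    hemimethylated.
    """
    from_top = [(top, False) for top, _ in duplex]           # old top + new bottom
    from_bottom = [(False, bottom) for _, bottom in duplex]  # new top + old bottom
    return from_top, from_bottom


def maintain(duplex: List[Site]) -> List[Site]:
    """MET1-like maintenance: hemimethylated CGs become fully methylated.

    Sites with no methylation on either strand are left unmethylated,
    reflecting the absence of de novo activity in this toy model.
    """
    return [(top or bottom, top or bottom) for top, bottom in duplex]


if __name__ == "__main__":
    start: List[Site] = [(True, True), (False, False)]

    # Without maintenance, follow the lineage that inherits the new strand.
    first, _ = replicate(start)       # hemimethylated at site 0
    _, second = replicate(first)      # template is now the unmethylated strand
    print("no maintenance:", second)  # all methylation lost after two divisions

    # With maintenance after each replication, the parental state is restored.
    print("with maintenance:", maintain(first))
```

In this sketch a site that starts unmethylated stays unmethylated even with maintenance, which mirrors the observation above that CG-only sites are not remethylated when MET1 is re-introduced.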

DNA methylation in the CHG context (mCHG) is maintained by a plant-specific methyltransferase, CHROMOMETHYLASE 3 (CMT3). CMT3 recognizes and binds to the histone modification H3K9me2 (also see section 1.2.1), methylating nearby CHG sites (Fig. 1-2b) [31]. The H3K9 methyltransferases KRYPTONITE (KYP/SUVH4), SUVH5 and SUVH6, in turn, recognize and bind to mCHG, adding H3K9me2 marks to nearby histones (Fig. 1-2b) [32]. This creates a strong positive feedback loop between mCHG and H3K9me2.

Methylation in the CHH context (mCHH) is asymmetric (Fig. 1-1b), and must therefore be re-established de novo after every round of DNA replication. This is primarily accomplished via the RNA-directed DNA methylation pathway (RdDM), which methylates DNA de novo in all contexts using a small RNA (sRNA) guided mechanism (Fig. 1-2c) [reviewed in 33,34; also see 10,11,28]. In the canonical RdDM pathway, a single-stranded RNA template is produced by the plant-specific RNA polymerase Pol IV, which is then converted to double-stranded RNA (dsRNA) by RDR2. The dsRNA is cleaved by DICER-LIKE 3 (DCL3) into short 24 nt sRNAs, and one strand of the sRNA is loaded into ARGONAUTE4 (AGO4). The AGO4-sRNA duplex binds to a complementary sequence on a scaffold produced by another plant-specific RNA polymerase, Pol V, and recruits a complex including the de novo methyltransferase DRM2. DRM2 is the plant homolog of mammalian de novo methyltransferases DNMT3a and DNMT3b, and


its target sites by SHH, which binds nucleosomes marked with H3K9me2 [35], while SUVH2 and SUVH9 recruit Pol V to methylated DNA [36], creating a feedback loop between existing DNA methylation and RdDM. Although the canonical RdDM pathway maintains methylation through this feedback loop, there are also multiple non-canonical versions of this pathway, including a variation to establish methylation de novo at a new TE [reviewed in 33]. RdDM preferentially acts in euchromatic regions, particularly at young transposons and other repeat sequences [11]. CHH methylation in heterochromatin and pericentromeric regions is instead primarily maintained by CHROMOMETHYLASE 2 (CMT2) in a feedback loop with H3K9me2 similar to the one between CHG methylation and H3K9me2 [37].

Even when all other elements of the pathway are present, a methyltransferase must be able to access the DNA in order to be able to add methylation. Thus, ATP-dependent chromatin remodelers, which can open up access to the DNA in otherwise highly compact heterochromatin, also play a key role in these different methylation pathways [reviewed in 38]. The chromatin remodeling factor DDM1 is required for methylation maintenance in pericentromeric regions and H1-containing heterochromatin in all contexts, while DRD1 and other members of the chromatin remodeling DDR complex are required for RdDM [11,37]. The double ddm1;drd1 mutant abolishes nearly all CHH methylation, as well as most CG and CHG methylation in TEs, suggesting that these factors are particularly important for TE silencing [37].


Figure 1-2: Mechanisms for adding 5-mC in plants

(a) MET1 recognizes and methylates hemi-methylated CGs produced after DNA replication. (b) CMT3 recognizes the histone modification H3K9me2 and methylates nearby CHG sites; in a positive feedback loop, the H3K9 methyltransferase KYP recognizes methylated CHG sites and adds H3K9me2 to nearby histones. (c) Schematic of the canonical RNA-directed DNA methylation (RdDM) pathway.


1.1.3 Mechanisms for removing 5-mC

Passive loss of 5-mC can occur after DNA replication if maintenance methylation fails. However, plants can also remove 5-mC from a DNA strand directly using a DNA glycosylase and replace it with an unmodified cytosine, in a process called active DNA demethylation [reviewed in 39]. Arabidopsis has four DNA glycosylases that can remove 5-mC in any sequence context: DEMETER (DME) [40], ROS1 (also known as DML1) [41,42], DML2 and DML3 [42]. After the 5-mC base has been removed by one of these enzymes, the abasic site is repaired via the standard base excision repair pathway. DME is only expressed during reproduction and is required for proper seed development (see section 1.4.2) [43]. ROS1, DML2 and DML3 are expressed in somatic tissues and particularly in meristems, which are composed of rapidly dividing cells (Ben Williams, unpublished data). ROS1 is much more highly expressed than DML2 or DML3 and is likely the primary demethylase in somatic tissues [13], though all three proteins share some overlapping function and the triple mutant ros1;dml2;dml3 has a more severe methylation phenotype than ros1 [39,42].

Many DNA methylation pathways, such as RdDM, are in strong positive feedback loops with existing methylation (see section 1.1.2). This, in combination with the compact nature of the A. thaliana genome, increases the odds of methylation spreading ectopically from sites where it is desired (e.g. TEs) to nearby sites where it is not (promoters, genes, etc.). Active DNA demethylation is thought to help prevent this spread, and ros1;dml2;dml3 mutants gain ectopic methylation around TEs that are near genes [29]. The expression of ROS1 is also tuned to match RdDM levels, balancing methylation and demethylation activity genome-wide [13]. Thus, the methylation level at a particular locus represents a balance between methylation and demethylation activity.

1.1.4 Heritability of 5-mC across generations

In mammals, the genome undergoes two waves of extensive demethylation followed by re-establishment of DNA methylation during the life cycle, once during gametogenesis and once during early embryogenesis [44]. Changes in DNA methylation acquired during the parents' lifetimes are therefore not generally passed on to their progeny. In plants, however, there is little evidence that any large-scale erasure and re-establishment of DNA methylation occurs in the embryonic lineage, so DNA methylation changes during a plant's lifetime can potentially be inherited by progeny. DNA methylation changes can therefore have long-lasting ramifications, and can be a rich source of phenotypic variation for applications such as crop improvement.

1.2 Interplay between histone modifications and DNA methylation

DNA is packaged into structures called nucleosomes, consisting of 147 base pairs (bp) of DNA wrapped around a core set of eight histone proteins. Nucleosomes form the basic organizational unit of eukaryotic chromatin, and can be compacted together to form inaccessible, silent heterochromatin, or loosened to form transcriptionally accessible euchromatin. There are four core histone subunits (H2A, H2B, H3, and H4), and each nucleosome contains two copies of each. Post-translational modifications of these core histone subunits, as well as specialized histone variants, influence the accessibility of the DNA and are important regulatory marks that often share a high degree of crosstalk with DNA methylation. Histones, their modifications, and the pathways that add these modifications are also highly conserved among eukaryotes. Here, I discuss some of the histone modifications and histone variants associated with DNA methylation in plants [reviewed in 45,46,47,48,49, among others].

1.2.1 Dimethylation of H3K9 is associated with non-CG methylation

In plants, dimethylation of the 9th lysine residue in the H3 subunit's N-terminal tail (H3K9me2) is important for the establishment of heterochromatin. H3K9me2 is deposited by KYP/SUVH4, although SUVH5 and SUVH6 can also deposit this mark and are partially redundant with KYP/SUVH4 [45]. As noted in section 1.1.2, H3K9me2 and non-CG methylation are mutually reinforcing, and this positive feedback loop helps maintain the heterochromatic state. Triple mutants of KYP and its paralogs SUVH5 and SUVH6 (kyp suvh5/6) show a strong loss of both CHG and CHH methylation, consistent with this link [50]. Thus, H3K9me2 acts in concert with DNA methylation to establish a silent heterochromatic state, and these marks tend to occur in broad domains in the genome, particularly in the pericentromere. H3K9me2 is removed by the histone demethylase IBM1. ibm1 mutants gain ectopic DNA methylation and H3K9me2 at a number of genes, suggesting that IBM1 is normally required to prevent the spread of these repressive marks into some actively transcribed areas [51]. Interestingly, in mammals the primary heterochromatic mark is H3K9me3, whereas H3K9me2 instead marks silent euchromatic genes [47].

1.2.2 H3K27me3 and the Polycomb Repressive Complex

H3K27me3 is another mark associated with silencing, although this mark is usually found in transcriptionally silent, euchromatic genes instead of heterochromatin. Unlike the broad heterochromatin domains established by H3K9me2/DNA methylation, H3K27me3 domains in plants tend to be shorter, covering the silenced gene or promoter but not nearby genes [48]. H3K27me3 often marks developmentally important genes that need to be turned on or off at specific times during the life cycle, and is particularly important for regulating developmental transitions [48,49]. One well-studied example is FLC, which is silenced by H3K27me3 to trigger flowering when developmentally appropriate [52]. H3K27me3 is added by the POLYCOMB REPRESSIVE COMPLEX 2 (PRC2), a highly-conserved complex of four different proteins. The PRC2 was first discovered in Drosophila, which has only one homolog of each of the four PRC2

components. Due to duplications of two of the four PRC2 components, there are at least three different PRC2 complexes in plants, and these have become partially specialized [49]. The FIS-PRC2 complex, composed of FIS2, MEA, FIE and MSI1, is required for proper female

gametophyte and endosperm development, and loss of this complex leads to seed abortion (also see section 1.4.2). The FIS-PRC2 complex is also important for establishing imprinted expression, and both FIS2 and MEA are themselves imprinted (see Chapter 4) [53,54]. The VRN-PRC2

complex is composed of VRN2, CLF or SWN, FIE and MSI1, and is required for silencing of FLC in response to vernalization [55]. Finally, the EMF-PRC2 complex is composed of EMF2, CLF or SWN, FIE and MSI1, and is required to prevent early flowering and to repress a set of homeotic genes that determine floral identity, among others [48]. None of the PRC2 components are predicted to bind DNA [56]. However, a number of transcription factors have been identified that interact with PRC2 components and help recruit PRC2 to its target loci [56,57]. H3K27me3


euchromatic genes, loss of DNA methylation in met1 leads to redistribution of H3K27me3 to previously DNA-methylated TEs [58]. This and other work [59,60] suggests that DNA methylation antagonizes H3K27me3. There is also some evidence that H3K9me2 and H3K27me3 (see section 1.2.3) can substitute for each other under some conditions, since H3K9me2 hypermethylation in met1 often occurs at PRC2 target genes [58]. Like H3K9me2, H3K27me3 can be removed by specific histone demethylases. Two H3K27me3 demethylases have been reported to date in Arabidopsis: ELF6 [61] and REF6 [62].

1.2.3 H2A.Z is antagonized by 5-mC

The H2A histone variant H2A.Z, which is involved in diverse processes including transcription and DNA repair [63], is anticorrelated with DNA methylation [64]. It was proposed that incorporation of H2A.Z into promoters may help promote gene expression by preventing accumulation of DNA methylation, and conversely that DNA methylation of promoters may silence expression in part by preventing H2A.Z incorporation [64]. Loss of DNA methylation in met1 causes gain of H2A.Z in areas that are usually methylated, suggesting that DNA methylation inhibits incorporation of H2A.Z [64]. However, loss of H2A.Z does not lead to gain of DNA methylation [65], suggesting that while DNA methylation can antagonize H2A.Z, the reverse is not true. In addition, H2A.Z profiles are largely similar over gene bodies in A. thaliana and E. salsugineum, a species that has lost gene body methylation, suggesting that gene body methylation alone does not antagonize H2A.Z [66].

1.3 Seed development in A. thaliana

Epigenetic modifications, including DNA methylation, are highly dynamic during plant

reproduction. Plant life cycles alternate between two phases: a haploid, multicellular stage called the gametophyte, and a diploid multicellular stage called the sporophyte. Some plants, including

mosses, exist predominantly as gametophytes, whereas flowering plants (angiosperms) and other vascular plants exist primarily as sporophytes. In A. thaliana, which is an angiosperm, the female gametophyte develops in the ovule and is composed of seven cells mitotically generated from a single haploid spore, including the haploid egg cell and diploid central cell [reviewed in 67] (Fig. 1-3a). The A. thaliana male gametophyte is even more reduced, consisting of only three

cells at maturity: two identical sperm cells contained within a support cell called the vegetative cell [reviewed in 68] (Fig. 1-3b). Once a pollen grain is deposited on a stigma, the vegetative cell grows into a pollen tube, transporting the two sperm cells to an ovule within the ovary of the mother plant. The pollen tube grows into and destroys one of the two synergid cells in the female gametophyte, ejecting the two sperm cells in its place [69]. In a process called double fertilization, one of the sperm cells fertilizes the egg cell to produce the diploid embryo, while the other sperm cell fertilizes the diploid central cell to produce the triploid endosperm (Fig. 1-3c). The embryo will give rise to the future plant (sporophyte), while the endosperm is a terminal tissue that mediates nutrient transfer from the mother plant to the embryo, much like the placenta in mammals. After fertilization, the embryo undergoes an initial asymmetric division. The smaller, apical cell gives rise to the embryo proper, while the larger basal cell becomes the suspensor, a tissue that connects the embryo to the endosperm to allow nutrient transfer (Fig.

1-3d). Embryo development in A. thaliana is generally separated into globular, heart, linear, bent,

and mature stages, all named after the shape of the embryo at that stage (Fig. 1-3d). Endosperm development also occurs in several distinct stages [70,71,72]. The newly fertilized endosperm initially undergoes repeated, rapid DNA replication and nuclear division without cytokinesis or cell wall deposition, forming a coenocyte. A large central vacuole forces most of these nuclei to the outer edges of the endosperm (Fig. 1-3d). As the embryo reaches the early heart stage, the endosperm begins to cellularize, starting from the micropylar pole and moving towards the chalazal pole of the seed (Fig. 1-3c,d). By the linear embryo stage, the endosperm is almost fully cellularized. As the embryo matures, it gradually depletes the endosperm and grows into the space left behind. In the mature A. thaliana seed, the embryo occupies nearly all the space, with only a single layer of endosperm cells remaining (Fig. 1-3d) [70]. In other species, such as wheat, maize, rice, coconut, and other crops, the endosperm is persistent, and the mature seed is composed primarily of this starchy, nutrient-rich tissue. Thus, endosperm is a primary source of calories consumed by both humans and animals, and is crucial for human survival.


Figure 1-3: Fertilization and endosperm development in Arabidopsis thaliana.

(a) Schematic of the mature A. thaliana female gametophyte, composed of six haploid cells, including the egg cell, and a diploid cell (the central cell) created by the fusion of two nuclei called polar nuclei during the final steps of gametophyte development. (b) Schematic of the mature A. thaliana male gametophyte, composed of three haploid cells. An initial mitotic division of a haploid progenitor creates the vegetative cell and a generative cell; the generative cell undergoes a second mitosis to form the two sperm cells. (c) Schematic of a developing seed post-fertilization. Seed polarity moves from the micropylar pole (MP, bottom left) to the chalazal pole (CP, bottom right), as indicated by the arrow. The seed is surrounded by a maternal tissue called the seed coat. (d) Diagram of seed development at five different stages. Globular: embryo is small and round, and endosperm is a syncytium consisting mainly of a central vacuole with nuclei along the outer edge. Heart: endosperm has begun to cellularize, starting from the MP and moving to the CP. Linear: cellularization is nearly complete; embryo begins to elongate. Bent: endosperm fully cellularized, displaced and consumed by growing embryo. Mature: full-grown embryo fully occupies space; only a single cell layer of endosperm remaining. Seed is dry and turns brown.


nutrient content remains a key part of improving crop performance and improving food availability worldwide.

1.3.1 Endosperm is composed of multiple distinct domains

The developing seed is divided into three main components: the embryo and endosperm, both products of fertilization, and the seed coat, which is a maternal tissue. However, each of these tissues gives rise to several specialized sub-domains during development. The zygote immediately divides to form the suspensor, which initially mediates nutrient transfer from endosperm to embryo, and the embryo proper (Fig. 1-4). Similarly, the triploid endosperm is also composed of several highly differentiated subdomains (Fig. 1-4), which can be distinguished as early as the 16-nuclei stage [73]. The bulk of the endosperm consists of peripheral endosperm, which includes a large central vacuole. This part of the endosperm acts as a nutrient sink, absorbing maternal nutrients and storing them until they are passed on to the embryo [74]. The micropylar domain of the endosperm develops in the area nearest the embryo (Fig. 1-4) and is the first endosperm region to cellularize. The last region to cellularize is the chalazal endosperm, also called the chalazal cyst (Fig. 1-4). The cyst is multinucleate, containing many large polyploid nuclei, and in other Brassicaceae the basal part of the cyst has been shown to grow into the nearby maternal tissues of the seed coat [75,76], likely to facilitate nutrient exchange between the mother and the endosperm. These three domains have all been well characterized

morphologically, and one recent study examined mRNA expression in these three domains using laser capture microdissection and microarrays [77]. This study found that the chalazal

endosperm is transcriptionally highly distinct from other endosperm subdomains, consistent with its specialized role in nutrient transfer [77]. However, much remains to be discovered about the function of these endosperm regions. A more detailed survey of the cell types and domains within endosperm using single-cell approaches is the focus of Chapter 5.


Figure 1-4: Domains within the Arabidopsis thaliana seed. Seed shown at the heart developmental stage. Seed coat and chalazal seed coat are maternal tissues, which surround the seed. Fertilization of the haploid egg cell with a sperm cell gives rise to the diploid embryo and suspensor. Fertilization of the diploid central cell with the second sperm cell gives rise to the endosperm, which consists of three domains: the micropylar endosperm, which surrounds the embryo; the peripheral endosperm; and the chalazal endosperm, which is the last part of the endosperm to cellularize.


1.4 Genomic imprinting in plants occurs in the endosperm

As part of its crucial role in providing nutrients to the developing embryo, the endosperm is the site of nearly all imprinted expression in angiosperms. Sexually reproducing species inherit two copies of every locus: one from their mother and one from their father. This arrangement greatly enhances fitness by providing built-in redundancy to the genome, so that a functional copy can potentially compensate for a second, nonfunctional copy. Most genes are expressed equally from the maternally inherited copy and the paternally inherited copy, and are therefore biallelically expressed. Imprinted genes, however, are expressed either only from the maternal copy (Maternally Expressed Genes, or MEGs) or only from the paternal copy (Paternally Expressed Genes, or PEGs). Thus, a mutation in the expressed copy of a MEG or PEG is sufficient to cause a mutant phenotype, even if the non-expressed copy is normal. Why would an organism willingly forgo the "safety net" of expressing both copies of a gene, in favor of expressing only one? And how can the organism "remember" which allele was inherited from which parent, when those two alleles may be genetically identical? This section discusses theories on the origin of genomic imprinting, an epigenetic phenomenon mediated by DNA methylation, and explores mechanisms by which imprinted expression occurs in the endosperm of angiosperms. Imprinting is also a primary focus of Chapters 3-5 of this thesis.
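Because the endosperm carries two maternal genome copies and one paternal copy (section 1.3), a biallelically expressed gene is expected to yield roughly two-thirds maternal transcripts there, rather than one half as in the embryo. The sketch below illustrates this expectation and a simple deviation-based call. The function names and the 0.2 margin are hypothetical choices made for illustration; they are not the criteria used to call imprinted genes in later chapters.

```python
from fractions import Fraction


def expected_maternal_fraction(maternal_copies: int, paternal_copies: int) -> Fraction:
    """Expected maternal share of transcripts if every genome copy is equally expressed."""
    return Fraction(maternal_copies, maternal_copies + paternal_copies)


def parental_bias(maternal_reads: int, paternal_reads: int,
                  expected: Fraction, margin: float = 0.2) -> str:
    """Label a gene by how far its observed maternal fraction departs from expectation.

    `margin` is an arbitrary illustrative cutoff, not a published threshold.
    """
    total = maternal_reads + paternal_reads
    if total == 0:
        raise ValueError("no allele-specific reads")
    observed = maternal_reads / total
    if observed >= float(expected) + margin:
        return "maternally biased"
    if observed <= float(expected) - margin:
        return "paternally biased"
    return "biallelic"


if __name__ == "__main__":
    endosperm = expected_maternal_fraction(2, 1)  # 2 maternal : 1 paternal
    embryo = expected_maternal_fraction(1, 1)     # 1 maternal : 1 paternal
    print(endosperm, embryo)
    print(parental_bias(95, 5, endosperm))
```

In practice, allele-specific read counts would come from crosses between polymorphic accessions and would be assessed statistically rather than with a fixed margin.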

1.4.1 On the origin of genomic imprinting

Some have proposed that genomic imprinting is simply an unintended consequence of endosperm methylation dynamics (see section 1.4.2). However, several arguments that imprinted expression in endosperm is beneficial and evolutionarily selected for have been proposed [78]. The parental conflict or kinship theory states that in species where the mother provisions offspring directly and may be fertilized by multiple fathers, both of which are true in angiosperms, there is a conflict between the mother and father over optimal resource allocation to the growing embryo [78,79]. Under these conditions, fathers maximize their reproductive fitness by favoring nutrient allocation to their own progeny at the expense of unrelated maternal half-siblings, while mothers maximize their fitness by allocating resources equally among all their progeny. This conflict may underlie the evolution of imprinted expression. For example, the


paternal genome of an embryo favors increased expression of growth-promoting genes, which the maternal genome counters by decreasing its own expression of those same genes. Over evolutionary time, the maternal alleles of these genes will become silenced, creating a PEG [78]. Accordingly, the kinship theory predicts that PEGs will tend to promote seed growth, while MEGs will inhibit growth [78]. Interestingly, the two conditions stated in the parental conflict theory are also met by therian mammals, which grow progeny internally and in many cases birth multiple young which may have different fathers. Consistent with these conditions favoring the evolution of imprinting, mammals are the only group other than angiosperms in which imprinted expression has been documented [80]. The mechanisms used to achieve imprinted expression are largely similar in both plants and animals [80]. Imprinting in mammals appears to have evolved along with the evolution of the placenta, and mammalian imprinted genes are often placental growth regulators [81]. Overall, the strong parallels between imprinting in angiosperms and mammals suggest that imprinted expression can be evolutionarily favorable under some conditions. The parental conflict theory and other explanations for the origin of genomic imprinting are reviewed in more detail in [78].

1.4.2 Differential methylation of the maternal and paternal genomes underlies imprinting in endosperm

The endosperm adopts a unique epigenetic state, characterized by decondensed chromocenters and unusual DNA methylation patterns [82]. Expression of the DNA demethylase DME (previously discussed in section 1.1.3) in the central cell prior to fertilization, combined with the downregulation of maintenance methylation pathways in both central cell and endosperm [83], leads to genome-wide loss of DNA methylation on the maternal alleles of endosperm, particularly at TEs (Fig. 1-5a) [84]. This hypomethylation is important for proper endosperm development, and dme mutant seeds abort after fertilization [85]. This parent-specific difference in methylation is also thought to underlie genomic imprinting, which requires an epigenetic mechanism to distinguish maternally and paternally inherited alleles.

Although differential methylation of the maternal and paternal alleles is important for establishing imprinted expression, the mechanisms by which this is accomplished are often


unknown. However, imprinting at a handful of MEGs has been well characterized [85]. At the

MEG SDC, a set of seven tandem repeats in the promoter is usually heavily methylated via RdDM

and KYP/CMT3, preventing expression [15]. Activity of DME in the central cell removes this methylation on the maternal alleles, allowing maternal-only expression in the endosperm [86]. A similar mechanism appears to establish imprinting at several other MEGs, including FWA [85], MOP9.5 [86], and FIS2 [53]. From these examples, the emerging picture is that the paternal alleles of MEGs are silenced by DNA methylation, perhaps at a TE or repetitive region at the transcriptional start site, and that removal of this methylation by DME allows expression of the maternal alleles (Fig. 1-5b) [85]. However, regulation of the maternally imprinted gene MEA, which is also a component of the PRC2 complex (see section 1.2.2), is more complex [85]. Expression of the maternal MEA alleles requires demethylation by DME, as expected, but the paternal MEA allele is silenced by PRC2, suggesting that maternal MEA protein silences its own paternal MEA allele in a type of self-regulation [40].

Establishment of paternally-biased expression in Arabidopsis thaliana also involves PRC2. H3K27me3 has been found to occupy the maternal (silent) alleles of most PEGs, as well as regions demethylated by DME [87]. Additionally, loss of the PRC2 component FIE causes derepression of the maternal alleles and loss of imprinting at H3K27me3-marked PEGs [87]. A number of studies have suggested that DNA methylation can inhibit PRC2 (see section 1.2.2). Thus, at PEGs, loss of DNA methylation on the maternal alleles is thought to uncover PRC2

binding sites, leading to silencing of the maternal alleles while the methylated paternal alleles remain expressed (Fig. 1-5c). This mechanism has been shown for the PEG PHE1 [87,88], and a similar mechanism is thought to silence the maternal alleles of the PEG HDG3 [89]. In Chapter 3, I report evidence that a subset of PEGs in A. lyrata, a close relative of A. thaliana, is regulated by a PRC2-independent mechanism in which the maternal alleles are instead repressed by CMT3/KYP.


Figure 1-5: Mechanisms to establish genomic imprinting in A. thaliana endosperm. (a) Activity of DME in the central cell prior to fertilization leads to hypomethylation of the maternal genomes relative to the paternal genomes in endosperm. (b) Mechanism of imprinted expression at SDC, a MEG, relies on demethylation of promoter tandem repeats on the maternal alleles. (c) Potential imprinting mechanism at PEGs, where methylation at nearby TEs is predicted to inhibit silencing of the paternal allele by PRC2.


1.5 Overview of the thesis

This thesis comprises several different projects that broadly focus on the dynamics of 5-mC in Arabidopsis, both across generations (Chapter 2) and within a single generation, particularly during reproduction (Chapters 3-5). In Chapter 2, I present a careful analysis of the generational dynamics of CG-only gene body methylation, a widespread and conserved DNA methylation signature that is nonetheless poorly understood. As an addendum to this project, which focused on gene body methylation in the A. thaliana accessions Col and Cvi, in Appendix A1 I present an analysis comparing global chromatin organization in these two accessions. In Chapters 3-4, I examine DNA methylation dynamics during the Arabidopsis life cycle, focusing on the relationship between DNA methylation and imprinting in endosperm. Chapter 3 presents an analysis of genomic imprinting and the associated DNA methylation signatures in A. lyrata, a close cousin of A. thaliana. By comparing these two species, we were able to discover both a conserved set of imprinted genes and a potential novel epigenetic regulatory mechanism for a subset of PEGs in A. lyrata. In Chapter 4, I explore the rescue of a specific maternally expressed gene, mea, by pollen from different A. thaliana accessions and explore a potential link to DNA methylation. Finally, in Chapter 5, I conduct a survey of gene expression in A. thaliana endosperm in fine detail using single-nuclei RNA-seq. I use these data to identify and characterize cell states within endosperm, as well as explore genomic imprinting at high resolution.


Chapter 2: Proximal methylation features associated with nonrandom changes in gene body methylation

The work presented in this chapter was published in Genome Biology (2017):

Picard CL and Gehring M. Proximal methylation features associated with nonrandom changes in gene body methylation. Genome Biology 2017; doi:https://doi.org/10.1186/s13059-017-1206-2.


2.1 Abstract

Background

Gene body methylation at CG dinucleotides is a widely conserved feature of methylated

genomes but remains poorly understood. The Arabidopsis thaliana strain Cvi has depleted gene body methylation relative to the reference strain Col. Here, we leverage this natural epigenetic difference to investigate gene body methylation stability.

Results

Recombinant inbred lines derived from Col and Cvi were used to examine the transmission of distinct gene body methylation states. The vast majority of genic CG methylation patterns are faithfully transmitted over nine generations according to parental genotype, with only 1-4% of CGs either losing or gaining methylation relative to the parent. Genic CGs that fail to maintain the parental methylation state are shared among independent lines, suggesting that these are not random occurrences. We use a logistic regression framework to identify features that best predict sites that fail to maintain parental methylation state. Intermediate levels of CG methylation around a dynamic CG site and high methylation variability across many A. thaliana strains at that site are the strongest predictors. These data suggest that the dynamic CGs we identify are not specific to the Col-Cvi recombinant inbred lines but have an epigenetic state that is inherently less stable within the A. thaliana species. Extending this, variably methylated genic CGs in maize and Brachypodium distachyon are also associated with intermediate local CG methylation.

Conclusions

These results provide new insights into the features determining the inheritance of gene body methylation and demonstrate that two different methylation equilibria can be maintained within single individuals.
