• Aucun résultat trouvé

Parental legacy, demography, and admixture influenced the evolution of the two subgenomes of the tetraploid Capsella bursa-pastoris (Brassicaceae)

N/A
N/A
Protected

Academic year: 2021

Partager "Parental legacy, demography, and admixture influenced the evolution of the two subgenomes of the tetraploid Capsella bursa-pastoris (Brassicaceae)"

Copied!
35
0
0

Texte intégral

(1)

HAL Id: hal-02369510

https://hal.archives-ouvertes.fr/hal-02369510

Submitted on 19 Nov 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

the evolution of the two subgenomes of the tetraploid Capsella bursa-pastoris (Brassicaceae)

Dmytro Kryvokhyzha, Adriana Salcedo, Mimmi C. Eriksson, Tianlin Duan, Nilesh Tawari, Jun Chen, Maria Guerrina, Julia Kreiner, Tyler V. Kent, Ulf

Lagercrantz, et al.

To cite this version:

Dmytro Kryvokhyzha, Adriana Salcedo, Mimmi C. Eriksson, Tianlin Duan, Nilesh Tawari, et al..

Parental legacy, demography, and admixture influenced the evolution of the two subgenomes of the

tetraploid Capsella bursa-pastoris (Brassicaceae). PLoS Genetics, Public Library of Science, 2019, 15

(2), pp.e1007949. �10.1371/journal.pgen.1007949�. �hal-02369510�

(2)

Parental legacy, demography, and admixture influenced the evolution of the two

subgenomes of the tetraploid Capsella bursa- pastoris (Brassicaceae)

Dmytro Kryvokhyzha

ID1☯

, Adriana Salcedo

2☯

, Mimmi C. Eriksson

ID1,3

, Tianlin Duan

ID1

, Nilesh Tawari

4

, Jun Chen

ID1

, Maria Guerrina

1

, Julia M. Kreiner

2

, Tyler V. Kent

ID2

, Ulf Lagercrantz

1

, John R. Stinchcombe

ID2

, Sylvain Gle´min

1,5

*, Stephen I. Wright

ID2

*, Martin Lascoux

ID1

*

1 Plant Ecology and Evolution, Department of Ecology and Genetics, Evolutionary Biology Centre and Science for Life Laboratory, Uppsala University, Uppsala, Sweden, 2 Department of Ecology and Evolution, University of Toronto, Toronto, Canada, 3 Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden, 4 Computational and Systems Biology Group, Genome Institute of Singapore, Agency for Science, Technology and Research (A*Star), Singapore, 5 CNRS, Universite´ de Rennes 1, ECOBIO (Ecosyste´ mes, biodiversite´, e´volution) - UMR 6553, F-35000 Rennes, France

☯These authors contributed equally to this work.

*sylvain.glemin@univ-rennes1.fr(SG);stephen.wright@utoronto.ca(SIW);martin.lascoux@ebc.uu.se(ML)

Abstract

Allopolyploidy is generally perceived as a major source of evolutionary novelties and as an instantaneous way to create isolation barriers. However, we do not have a clear understand- ing of how two subgenomes evolve and interact once they have fused in an allopolyploid species nor how isolated they are from their relatives. Here, we address these questions by analyzing genomic and transcriptomic data of allotetraploid Capsella bursa-pastoris in three differentiated populations, Asia, Europe, and the Middle East. We phased the two subge- nomes, one descended from the outcrossing and highly diverse Capsella grandiflora (Cbp

Cg

) and the other one from the selfing and genetically depauperate Capsella orientalis (Cbp

Co

). For each subgenome, we assessed its relationship with the diploid relatives, tem- poral changes of effective population size (N

e

), signatures of positive and negative selec- tion, and gene expression patterns. In all three regions, N

e

of the two subgenomes decreased gradually over time and the Cbp

Co

subgenome accumulated more deleterious changes than Cbp

Cg

. There were signs of widespread admixture between C. bursa-pastoris and its diploid relatives. The two subgenomes were impacted differentially depending on geographic region suggesting either strong interploidy gene flow or multiple origins of C. bursa-pastoris. Selective sweeps were more common on the Cbp

Cg

subgenome in Europe and the Middle East, and on the Cbp

Co

subgenome in Asia. In contrast, differences in expression were limited with the Cbp

Cg

subgenome slightly more expressed than Cbp

Co

in Europe and the Middle-East. In summary, after more than 100,000 generations of co-exis- tence, the two subgenomes of C. bursa-pastoris still retained a strong signature of parental legacy but their evolutionary trajectory strongly varied across geographic regions.

a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

OPEN ACCESS

Citation: Kryvokhyzha D, Salcedo A, Eriksson MC, Duan T, Tawari N, Chen J, et al. (2019) Parental legacy, demography, and admixture influenced the evolution of the two subgenomes of the tetraploid Capsella bursa-pastoris (Brassicaceae). PLoS Genet 15(2): e1007949.https://doi.org/10.1371/

journal.pgen.1007949

Editor: Juliette de Meaux, University of Cologne, GERMANY

Received: October 1, 2018 Accepted: January 9, 2019 Published: February 15, 2019

Copyright:©2019 Kryvokhyzha et al. This is an open access article distributed under the terms of theCreative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: DNA sequence data generated for 21 accessions of C. bursa-pastoris is submitted to the NCBI database under the sequence read archive number SRP126886.

Previously generated DNA sequence data for 10 accessions of C. bursa-pastoris, 10 accessions C.

orientalis and 13 accessions C. grandiflora is available in the NCBI (SRP050328, SRP041585, SRP044121). RNA-Seq data is also available in the NCBI (SRA320558). Both phased and unphased SNPs, phylogenetic trees, reconstructed ancestral

(3)

Author summary

Allopolyploid species have two or more sets of chromosomes that originate from hybrid- ization of different species. It remains largely unknown how the two genomes evolve in the same organism and how strongly their evolutionary trajectory depends on the initial differences between the two parental species and the specific demographic history of the newly formed allopolyploid species. To address these questions, we analyzed the genomic and gene expression variation of the shepherd’s purse, a recent allopolyploid species, in three regions of its natural range. After *100,000 generations of co-existence within the same species, the two subgenomes had still retained part of the initial difference between the two parental species in the number of deleterious mutations reflecting a history of mating system differences. This difference, as well as differences in patterns of positive selection and levels of gene expression, also strongly depended on the specific histories of the three regions considered. Most strikingly, and unexpectedly, the allopolyploid species showed signs of hybridization with different diploid relatives or multiple origins in differ- ent parts of its range. Regardless if it was hybridization or multiple origins, this profoundly altered the relationship between the two subgenomes in different regions. Hence, our study illustrates how both the genomic structure and ecological arena interact to deter- mine the evolutionary trajectories of allopolyploid species.

Introduction

Allopolyploidy, the origin of polyploids from two different ancestral lineages, poses serious evolutionary challenges since the presence of two divergent sub-genomes may lead to pertur- bation of meiosis, conflicts in gene expression regulation, protein-protein interactions, and transposable element suppression [1–3]. Whole genome duplication also masks new recessive mutations thereby decreasing selection efficacy [4, 5]. This relaxation of selection, together with the strong speciation bottleneck and shift to self-fertilization that often accompany poly- ploidy [6], ultimately increases the frequency of deleterious mutations retained in the genome [7, 8]. All of these consequences of allopolyploidy can have a negative impact on fitness and over evolutionary time may contribute to the patterns of duplicate gene loss, a process referred to as diploidization [5, 9, 10]. Yet, allopolyploid lineages often not only establish and persist but may even thrive and become more successful than their diploid progenitors and competi- tors, with larger ranges and higher competitive ability [11–20]. The success of allopolyploids is usually explained by their greater evolutionary potential. Having inherited two genomes that evolved separately, and sometimes under drastically different conditions, allopolyploids should have an increased genetic toolbox, assuming that the two genomes do not experience severe conflicts [21–23]. This greater evolutionary potential of allopolyploids can be further enhanced by genomic rearrangements, alteration of gene expression and epigenetic changes [4, 5, 24–30].

All of these specific features come into play during the demographic history of allopoly- ploids. Demographic processes occurring when a species extends its range, such as successive bottlenecks or periods of rapid population growth in the absence of competition, are expected to have a profound impact on evolutionary processes, especially in populations at the front of the expansion range. Species that went through repeated bottlenecks during their range expan- sion are expected to have reduced genetic variation and higher genetic load than more ancient central populations [31, 32]. Similarly, range expansions can also lead to contact and admix- ture with related species. Such admixture can in turn shift the evolutionary path of the focal

sequences, estimates ofπand Dxy with sliding window approach, results of PSMC and SMC++, SIFT4G annotations, CLR estimates of sweepFinder2, TWISST and HAPMIX outputs, homeologue-specific gene expression values are deposited to the Open Science Framework Repository (DOI:10.17605/OSF.IO/5VC34).

Funding: This study was supported by grants from the Swedish Research Council (VR,https://www.

vr.se, 2012-04999, 2015-03797) and the Erik Philip-So¨rensens Stiftelse (www.epss.se) to ML, and by NSERC discovery grants (www.nserc- crsng.gc.ca) to JRS and SIW. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

(4)

species. Finally, range expansion will expose newly formed allopolyploid populations to diver- gent selective pressures, providing the possibility of differentially exploiting duplicated genes, and creating asymmetrical patterns of adaptive evolution in different parts of the range.

In this paper, we aim to characterize the evolution of the genome of a recent allopolyploid species during its range expansion. In particular, we explore whether the two subgenomes have similar or different evolutionary trajectories in hybridization, selection and gene expres- sion. The widespread allopolyploid C. bursa-pastoris is a promising system for studying the evolution of polyploidy, with available information on its two progenitor diploid species and their current distribution. C. bursa-pastoris, a selfing species, originated from the hybridization of the Capsella orientalis and Capsella grandiflora / rubella lineages some 100-300 kya [10]. C.

orientalis is a genetically depauperate selfer occurring across the steppes of Central Asia and Eastern Europe. In contrast, C. grandiflora is an extremely genetically diverse obligate outcros- ser which is primarily confined to a small distribution range in the mountains of Northern Greece and Albania. The fourth relative, C. rubella, a selfer recently derived from C. grandi- flora, occurs around the Mediterranean Sea. There is evidence for unidirectional gene flow from C. rubella to C. bursa-pastoris [33]. Among all Capsella species, only C. bursa-pastoris has a worldwide distribution (Fig 1A), which may be partially due to extremely recent colonization and associated with human population movements [34]. A recent study reveals that in Eurasia, C. bursa-pastoris is divided into three genetic clusters—Middle East, Europe, and Asia—with low gene flow among them and strong differentiation both at the nucleotide and gene expres- sion levels [34, 35]. Reconstruction of the colonization history using unphased genomic data suggested that C. bursa-pastoris spread from the Middle East towards Europe and then into Eastern Asia. This colonization history resulted in a typical reduction of nucleotide diversity with the lowest diversity being in the most distant Asian population [34].

Whether adaptation on the two distinct non-recombining [36, 37] subgenomes of C. bursa- pastoris contributed to its rapid population expansion and how they were in return affected by it, remains unclear. Previous studies either ignored the population history of C. bursa-pastoris or failed to consider the two subgenomes separately. In a recent study that does not consider the population demographic history within C. bursa-pastoris, Douglas et al. [10] concluded that there is no strong sign of diploidization in C. bursa-pastoris and most of its variation is the result of the legacy from the parental lineages with some relaxation of purifying selection caused by both the transition to self-fertilization and the greater masking of deleterious muta- tions. Kryvokhyzha et al. [35] considered population history but did not separate the two sub- genomes, and showed that variation in gene expression among Asian, European and Middle Eastern accessions strongly reflects the population history with most of the differences among populations explained by genetic drift. We extend these previous studies by analyzing the genome-wide expression and polymorphism patterns of the two subgenomes of C. bursa-pas- toris in 31 accessions sampled across its natural range in Eurasia. We demonstrate that the two subgenomes follow distinct evolutionary trajectories in different populations and that these trajectories are influenced by both range expansion and hybridization with diploid relatives.

Our study illustrates the need to account for demographic and ecological differences among populations when studying the evolution of subgenomes of allopolyploid species.

Results

Phasing subgenomes

The disomic inheritance of C. bursa-pastoris [36, 37] allowed us to successfully phase most of

the heterozygous sites in the 31 samples analyzed in this study (Fig 1A, S1 Table). Out of 7.1

million high confidence SNPs, our phasing procedure produced an alignment of 5.4 million

(5)

Fig 1. Distribution ranges, sampling locations and phylogenetic relationships ofCapsellaspecies used in this study. (A) Approximative distribution ranges ofC. orientalis,C. grandiflora, andC. rubellaand sampling locations ofC. bursa-pastoris.C.

bursa-pastorishas a worldwide distribution, so its distribution range is not specifically depicted. ASI, EUR ME, CO, CG, CR indicate Asian, European and Middle Eastern populations ofC. bursa-pastoris,C. orientalis,C. grandiflora, andC. rubella, respectively. The distribution ranges are defined according to Hurka et al. [38]. (B) Whole genome NJ tree showing the absolute divergence between different populations ofC. bursa-pastorisat the level of subgenomes. TheCbpCoandCbpCgsubgenomes are marked with Co and Cg. The bootstrap support based on 100 replicates is shown only for the major clades. The rootN.

paniculatais not shown. (C) Density tree visualizing of 1002 NJ trees reconstructed with 100 Kb sliding windows.

https://doi.org/10.1371/journal.pgen.1007949.g001

(6)

phased polymorphic sites across the 31 accessions of C. bursa-pastoris . Scaling these phased SNPs to the whole genome resulted in the alignment of 80.6 Mb that had the same level of het- erozygosity as the unphased data. The alignment of these whole genome sequences of C.

bursa-pastoris with 13 sequences of C. grandiflora, 10 sequences of C. orientalis, one sequence of C. rubella (the reference), and one sequence of N. paniculata used here as an outgroup, yielded 12.8 million polymorphic sites that we used in all analyses. The information for each accession is provided in S1 Table.

We analyzed the structure of the phased data with phylogenetic analyses. The separation of the two subgenomes was strongly supported in the reconstructed whole genome tree (Fig 1B).

The grouping with a corresponding parental species was also maintained in the phylogenetic analyses of each subgenome separately (S1 Fig) as expected given assumptions of the phasing approach. The tree consisted of two highly supported (100% bootstrap) major clades grouping C. grandiflora and the C. grandiflora / rubella lineage descended subgenome of C. bursa-pas- toris (hereafter the Cbp

Cg

subgenome), on the one hand, and C. orientalis and the C. orientalis lineage descended subgenome of C. bursa-pastoris (hereafter the Cbp

Co

subgenome), on the other hand. We also analyzed phylogenetic signals at a finer genomic scale using a sliding win- dow approach with 100-kb window size (Fig 1C). Exclusive monophyly of C. orientalis with the Cbp

Co

subgenome, and C. grandiflora and C. rubella with Cbp

Cg

subgenome was detected in 95% and 83% of trees, respectively (S2 Fig).

Comparing linkage disequilibrium (LD) decay within and across subgenomes—e.g. com- paring r

2

between SNP1 and SNP2 in Cbp

Cg

and r

2

between SNP1 in Cbp

Cg

and SNP2 in Cbp

Co

—suggested largely consistent phasing within each subgenome after accounting for pop- ulation history (S3 Fig). Within homeologues (particularly Cbp

Cg

) we observed high LD in the Middle Eastern and European samples with gradual decay as genetic distance increases. LD decayed more rapidly in the Asian samples than in the other populations (possibly due to less population structure and expansion) but also showed the expected decay within both subge- nomes. In contrast, r

2

with SNPs from the other subgenome or randomly selected from either subgenome was substantially lower and did not show any decay within 10 KB, particularly in the Middle Eastern and European samples. High LD and gradual decay within, and not between, subgenomes suggested phasing was largely effective in separating the two subge- nomes and preserving signals of their demographic history.

The genomic data analyzed in this study were phased using the computational phasing of

reads mapped to the C. rubella reference (HapCUT method) and was validated with mapping

the same reads to the recent assembly of C. bursa-pastoris [37] (HomeoRoq method, see Mate-

rial and methods). The final alignment between the alternatively phased dataset comprised 800

K SNPs. This alignment was relatively small because mapping to two subgenomes resulted in

2x smaller coverage and subsequent genotype calling, filtering, phasing, and alignment cumu-

latively further reduced this size (S4 Fig). Phylogenetic trees reconstructed for 10 K sliding

windows showed 100% split between Cbp

Cg

and Cbp

Co

subgenomes (S5 Fig). Using a smaller

window of 1 K resulted in 83% of trees with mutual monophyly of Cbp

Cg

and Cbp

Co

subge-

nomes (S6 Fig). The other 17% still showed a split between Cbp

Cg

and Cbp

Co

for the majority

of samples, but some single samples were not consistent with the mutual monophyly of the

two subgenomes. This could be either due to incomplete lineage sorting or potential local

phasing errors. Nevertheless, the major clustering into two subgenomes and three populations

was largely maintained in the 17% of trees (S7 Fig). In all trees, the same samples phased with

two alternative methods (HapCUT and HomeoRoq) clustered together for most of the Euro-

pean samples. The Middle Eastern cluster showed slightly less consistent clustering, and the

Asian samples showed clustering into two groups according to the phasing method. The cause

of these discrepancies is difficult to define because errors are possible in both datasets. To

(7)

localize the problematic regions, we also compared the phased dataset analyzed in this study (HapCUT phased) with the assembly sequences of C. bursa-pastoris [37]. This comparison covered almost all positions of the dataset analyzed in this study (10.9Mb out of 12.7Mb).

Most of the discrepancies were located in the vicinity of pericentromeric regions (S8 Fig)), where assembling reads is problematic. In summary, there was a high concordance of the results obtained with the different phasing methods and those that differed are unlikely to have affected the main results of the study.

A geographically structured differentiation between subgenomes

For both subgenomes, the three C. bursa-pastoris populations, Asia (ASI), Europe (EUR) and Middle East (ME), constituted well-defined phylogenetic clusters (Fig 1B and 1C). However, the relationships of each subgenome with its parental species differed. The Cbp

Cg

subgenome formed a monophyletic clade with C. grandiflora at its base. In contrast, the Cbp

Co

subgenome was paraphyletic with C. orientalis that clustered within the ASI group instead of being outside of all C. bursa-pastoris Cbp

Co

subgenomes (this was observed in both phasing methods S9 Fig).

This clustering was unexpected and suggested potential gene flow between the ASI group and C. orientalis or multiple origins of the Cbp

Co

subgenome. Nucleotide diversity was higher on the Cbp

Cg

subgenome than on the Cbp

Co

subgenome for both EUR and ME (Fig 2, S10 Fig, S2 Table), though the difference was significant only for EUR (p-values: 0.005 and 0.154 for EUR and ME, respectively). The opposite pattern was observed for ASI (Fig 2): there the nucleotide diversity in the Cbp

Co

subgenome was significantly higher than in the Cbp

Cg

subgenome (p- value < 0.0001). Interestingly, the diversity of the Cbp

Co

subgenome in all populations was sig- nificantly higher than the diversity of its parental species, C. orientalis (p-value < 0.0001).

Fig 2. Variation in nucleotide diversity (π) between populations ofC. bursa-pastorisand parental species. This boxplot shows the distributions ofπestimated along the genome using 100 Kb sliding windows. Co and Cg indicateC.

orientalisandC. grandiflora / rubelladescendant subgenomes, respectively. ASI, EUR ME, CO and CG correspond to Asian, European and Middle Eastern populations ofC. bursa-pastoris,C. orientalis, andC. grandiflora, respectively.

https://doi.org/10.1371/journal.pgen.1007949.g002

(8)

A 10-fold or greater decrease in N

e

in all selfers but not in C. grandiflora To reconstruct the changes in effective population size (N

e

) over time in the three C. bursa-pas- toris populations and the two ancestral species, we used a pairwise sequentially Markovian coa- lescent model (PSMC). First, we reconstructed the demographic histories of C. orientalis and C. grandiflora (Fig 3). In C. grandiflora, N

e

was mostly constant with some slight decrease in the recent past, but the N

e

of C. orientalis decreased continuously towards the present. In C.

bursa-pastoris, despite a simultaneous rapid range expansion, N

e

of EUR and ME populations also gradually decreased starting from around 100-200 kya. The ASI population showed a sim- ilar pattern but with population size recovery around 5-10 kya and a subsequent decrease to the same N

e

as in EUR and ME. The N

e

patterns of the two subgenomes were similar within each population. Overall, the N

e

history of C. bursa-pastoris was most similar to that of its self- ing ancestor, C. orientalis. We also verified these PSMC results with SMC++, which can con- sider more than two haploid genomes and incorporates linkage disequilibrium (LD) in coalescent hidden Markov models [39]. The general trend was globally the same but the recent decline of C. orientalis was sharper and fluctuations in N

e

were more pronounced (S11 Fig). In summary, the overall pattern of N

e

change over time was mostly the same between the two subgenomes and between the three populations of C. bursa-pastoris and it was largely similar to the pattern observed for the diploid selfer C. orientalis.

Phylogenetic relationships between C. bursa-pastoris and its parental species vary among populations

To quantify the relationships between populations of C. bursa-pastoris and the two parental species, we applied a topology weighting method that calculates the contribution of each indi- vidual group topology to a full tree [40]. We looked at the topologies joining each subgenome of C. bursa-pastoris and a corresponding parental lineage. There are 15 possible topologies for three populations of C. bursa-pastoris, a parental species, and the root. We grouped these topologies into five main groups: species trees—topologies that place a parental lineage as a basal branch to C. bursa-pastoris; three groups that join one of the populations of C. bursa-pas- toris with a parental lineage and potentially signifies admixture; and all other trees that place a parental lineage within C. bursa-pastoris but do not relate it with a particular population of C.

bursa-pastoris (Fig 4A).

These topology weightings varied along the subgenomes and illustrated distinct patterns between the two subgenomes (Fig 4B). In the Cbp

Co

subgenome, the largest average weighting was for the topology grouping the ASI population of C. bursa-pastoris with C. orientalis (Fig 4C), and the species topology had the second largest average weighting. The difference between the average weighting in these two topology groups was statistically significant (S3 Table). In contrast, the species topologies weighting dominated in the Cbp

Cg

subgenome, regardless if C. rubella or C. grandiflora were used as a parental lineage (Fig 4C, S12 Fig, S4 and S5 Tables). The topology uniting the Cbp

Cg

subgenome of the EUR population with C.

rubella had the largest topology weighting among the topologies indicating admixture between these clusters (Fig 4C). Thus, the two subgenomes differed substantially in the pattern of topol- ogy weighting and there were signs of a potential admixture of EUR and ASI with C. rubella and C. orientalis, respectively.

C. bursa-pastoris subgenomes show signs of admixture with diploid relatives

The phylogenetic grouping of C. orientalis with the Asian Cbp

Co

subgenome, together with

topology weighting results and the relatively elevated nucleotide diversity in this subgenome,

(9)

Fig 3. Population size histories ofC. bursa-pastorisand its parental species. Effective population sizes were inferred withPSMCusing whole-genome sequences from a pair of haplotypes per population (thick lines) and 100 bootstrap replicates (thin lines). The estimates for different pairs were similar and shown inS11 Fig. Co and Cg specify theCbpCo

andCbpCgsubgenomes ofC. bursa-pastorisand corresponding parental species in the CO & CG plot. ASI, EUR, ME, CO & CG indicate Asian, European and Middle Eastern populations ofC. bursa-pastoris, andC. orientalisandC.

grandiflora, respectively. The axes are in log scale and the most recent times wherePSMCis less reliable were excluded.

https://doi.org/10.1371/journal.pgen.1007949.g003

(10)

suggested the possibility of gene flow between C. orientalis and C. bursa-pastoris in the ASI population. To test this hypothesis, and at the same time to check for possibilities of gene exchange between C. bursa-pastoris and other Capsella species, we conducted two complemen- tary tests of admixture.

We first used the ABBA-BABA test, a coalescent-based method that relies on the assump- tion that alleles under incomplete lineage sorting are expected to be equally frequent in two descendant populations in the absence of admixture between any of them and a third popula- tion that diverged earlier on from the same common ancestor [41, 42]. The deviation from equal frequency is measured with the D-statistic, which ranges between 0 and 1, with 0 indicat- ing no gene flow and 1 meaning complete admixture. The ABBA-BABA test also provides an estimate of the fraction of the genome that is admixed by comparing the observed difference in ABBA-BABA with the difference expected under a scenario of complete admixture (f-statis- tic). We estimated D and f for triplets including one diploid species and two populations of C.

bursa-pastoris represented by the most related subgenome to that species (Table 1). N. panicu- lata was the outgroup in all tests. The D-statistic were significantly different from 0 in most of the tests, so we considered all three combinations per test group (see Table 1) to determine the pairs that were the most likely to be admixed. The largest fraction of admixture was identified for the pair of the ASI Cbp

Co

subgenome and C. orientalis with an estimate of f indicating that at least 14% of the ASI Cbp

Co

subgenome is admixed. The second largest proportion of

Fig 4. Topology weighting of the three populations ofC. bursa-pastorisand parental species. (A) Fifteen possible rooted topologies for the three groups ofC. bursa-pastorisin one subgenome and the corresponding parental species. The topologies are grouped into five main groups.

TheCbpCoandCbpCgsubgenomes are marked with Co and Cg. ASI, EUR ME, CO, CR, N indicate Asian, European and Middle Eastern populations ofC. bursa-pastoris,C. orientalis,C. rubella, andN. paniculata, respectively. (B) Topology weightings for 100 SNP windows plotted along 8 main scaffolds with loess smoothing (span = 1Mb). The tentative centromeric regions are shaded and only eight major scaffolds (chromosomes) are shown. (C) Average weighting for the five main topology groups. The topology groups are in the same order (left-right and bottom-up) and colors in all three plots.

https://doi.org/10.1371/journal.pgen.1007949.g004

(11)

admixture was detected between C. rubella and the EUR Cbp

Cg

subgenome with f estimate of at least 8%. The estimates for tests with C. grandiflora were the smallest but similar to those obtained for C. rubella. The latter may reflect the strong genetic similarity between these two species rather than real gene flow between C. grandiflora and C. bursa-pastoris which, based on crosses, seems unlikely. Indeed, we crossed individuals from the three populations of C.

bursa-pastoris with their three diploid relatives to test for the presence of reproductive barriers.

Regardless of the direction of the crosses, all crosses between C. rubella and the three popula- tions of C. bursa-pastoris produced viable F1 seeds. In contrast, crosses involving C. grandiflora mostly failed and the abortion rate approached 100%. Details on these crosses are provided in S1 Appendix. The admixture with C. rubella in EUR and C. orientalis in ASI was also sup- ported by the unphased and complete (i.e. no missing genotypes) genomic data (S6 and S7 Tables). Finally, it should be pointed out that given that evidence for C. bursa-pastoris mono- phyly is weak, it is also possible that the signals of admixture with the parental species that we are detecting here actually reflect introgression from an independently-arisen C. bursa-pastoris into either Cbp

Co

or Cbp

Cg

subgenomes.

We then used HAPMIX, a haplotype-based method, which enables capture of both large and fine-scale admixture, as well as an absolute estimate of the proportion of the genome that was admixed. For the analysis of the Cbp

Cg

subgenome of C. bursa-pastoris, the highest levels of admixture were found consistently across regions to be with the diploid C. rubella. In Europe, genomic regions showed an average 18% of admixture with C. rubella, followed by 11% in the Middle East, and just 2% in Asia (S8 Table, S13A Fig). All three populations also showed signs of admixture with C. grandiflora but to a reduced extent compared to C. rubella (7% in Europe, 6% in the Middle East, 0.2% in Asia). C. rubella is highly similar to C. grandi- flora, and as noted above, we expect difficulties in discerning between the two, suggesting that much of the signal of admixture with C. grandiflora could in fact be due to admixture with C.

rubella. Of the regions putatively admixed with C. grandiflora, 78%-96% of sites called as admixed overlapped with those from C. rubella, none of which occurred in unique regions for C. grandiflora. Because of this, and in combination with the reduced genome-wide probability of admixture with the diploid C. grandiflora compared to C. rubella (e.g. 0.11 compared to 0.24 in Europe), we argue that the signals of admixture with the diploid C. grandiflora were likely an artifact of its similarity with the regions of C. rubella admixture. These findings in

Table 1. Results of the ABBA-BABA tests assessing admixture betweenC. bursa-pastorisandC. orientalis,C. grandifloraandC. rubella.

P1 P2 P3 D±error Z-score P-value f±error (%)

EUR_Co ASI_Co CO 0.29±0.03 8.62 <0.0001 22.9±2.5

ME_Co ASI_Co CO 0.18±0.04 4.80 <0.0001 14.0±2.8

EUR_Co ME_Co CO 0.17±0.03 5.70 <0.0001 11.7±2.4

ASI_Cg EUR_Cg CG 0.19±0.01 15.45 <0.0001 19.8±2.2

ASI_Cg ME_Cg CG 0.17±0.02 10.14 <0.0001 12.6±2.0

ME_Cg EUR_Cg CG 0.06±0.01 5.14 <0.0001 6.1±1.2

ASI_Cg EUR_Cg CR 0.61±0.02 26.74 <0.0001 20.1±2.1

ASI_Cg ME_Cg CR 0.49±0.03 14.55 <0.0001 10.6±1.6

ME_Cg EUR_Cg CR 0.26±0.05 4.84 <0.0001 7.9±1.7

P1, P2, and P3refer to the three populations used in the ABBA-BABA tests. A significantly positiveDindicates admixture between P2and P3.fprovides an estimate of the fraction admixture.Z-score andP-value were estimated with the block jack-knife method. The error term corresponds to a standard error. ASI, EUR and ME are the three populations ofC. bursa-pastoriswith _Co and _Cg indicating different subgenomes. CO, CG, and CR stand forC. orientalis,C. grandiflora,C. rubella, respectively.

Every test group is separated by a horizontal line.

https://doi.org/10.1371/journal.pgen.1007949.t001

(12)

accord with the ABBA-BABA results imply that the Cbp

Cg

subgenome has experienced signifi- cant admixture with C. rubella in Europe, and to a lesser extent in the Middle East.

For the analysis of the Cbp

Co

subgenome, signals of admixture with the diploid C. orienta- lis were present in all three populations. In the ME population, genomic regions showed an average 18-21% admixture depending on the reference populations used (S8 Table, S13B Fig). Using the Middle East population for the analysis of the Cbp

Co

subgenomes of EUR and ASI, since it was the least admixed in the HAPMIX results, yielded 15% C. orientalis admix- ture in Asia, and 14% in Europe. These findings suggest admixture of the diploid C. orientalis with the Cbp

Co

subgenome across all three geographic regions. Assuming these levels of admixture accurately reflect reality, we do not have a non-admixed reference population to use for HAPMIX, and are thus violating a key assumption of the method. HAPMIX inferences for the Cbp

Co

subgenome should therefore be taken with caution but we note that the results for ASI and ME are generally congruent with the admixture pattern obtained with ABBA- BABA.

Selection and gene expression

Deleterious mutation accumulation in subgenomes reflects parental legacy. We first estimated the nucleotide diversity at 0-fold (π

0

) and 4-fold (π

4

) degenerate sites and then the ratio π

0

4

as a measure of purifying selection. Low values of π

0

4

would indicate more effi- cient purifying selection [43]. As expected, π

0

4

was much lower in C. grandiflora than in C. orientalis. In C. bursa-pastoris, purifying selection was more efficient in the Cbp

Cg

subge- nome than in the Cbp

Co

subgenome in both EUR and ME. However, the opposite was observed in the ASI population. For both subgenomes, the ASI population had the highest value of π

0

4

even if compared with C. orientalis (S14 Fig).

We then investigated the differences in deleterious mutations among subgenomes and pop- ulations by classifying nonsynonymous mutations with the SIFT4G algorithm that uses site conservation across species to predict the selective effect of nonsynonymous changes [44]. In order to control for possible biases due to the unequal genetic distance between the different genomes and C. rubella, we used both C. rubella and A. thaliana SIFT4G annotation databases.

Because we were interested in the number of deleterious mutations that accumulated after spe- ciation of C. bursa-pastoris, we polarized the mutations of all three species with the recon- structed ancestral sequences of the common ancestors (see Material and methods).

Regardless of the SIFT4G database (C. rubella or A. thaliana), the proportion of deleterious

nonsynonymous sites among derived mutations was always significantly higher in C. orientalis

and the Cbp

Co

subgenomes than in C. grandiflora and the Cbp

Cg

subgenomes (Fig 5, S9 and

S10 Tables). Within C. bursa-pastoris, the proportion of deleterious mutations depended on

the population considered with the highest value in the ASI population and the smallest in

EUR. It is also noteworthy that the proportion of deleterious nonsynonymous sites of the

Cbp

Co

subgenome in EUR and ME was significantly smaller than that of C. orientalis suggest-

ing that a higher effective population size in the Cbp

Co

subgenome than in its ancestor led to

more efficient purifying selection in these two populations. On the other hand, the proportion

of deleterious nonsynonymous sites in the ASI Cbp

Co

subgenome was larger than in C. orienta-

lis, but this difference was only significant for the A. thaliana database. The Cbp

Cg

subgenome

also had a significantly higher proportion of deleterious sites in ASI than in EUR and ME in

all comparisons. In conclusion, the proportion of deleterious sites in the two subgenomes of

extant C. bursa-pastoris still reflected the differences between the parental species and the effi-

cacy of purifying selection in the different C. bursa-pastoris populations was associated to their

synonymous nucleotide diversity or, equivalently, to their effective population size.

(13)

Selective sweeps differ between subgenomes among populations of C. bursa-pastoris.

The three populations of C. bursa-pastoris also differ in patterns of positive selection. Selective sweeps were more significant on the Cbp

Cg

subgenome than on the Cbp

Co

subgenome in EUR and ME, whereas the opposite was true in the ASI population (Fig 6). The regions harboring significant sweep signals were also larger on the Cbp

Cg

subgenome than on the Cbp

Co

subge- nome in EUR and ME (total length 42 Mb, 50 Mb vs 9 Mb, 3 Mb), whereas in ASI the sweep regions were larger on Cbp

Co

than on Cbp

Cg

(total length 4 Mb vs 830 Kb). Although the loca- tions of the Cbp

Cg

sweep signals in EUR and ME largely overlap, the patterns differed between the two populations. For example, the strongest sweep signal in EUR was located on scaffold 1, whereas the strongest sweep signal in ME was on scaffold 6. In addition, EUR had many

Fig 5. Genetic load in the subgenomes ofC. bursa-pastorisand its parental species. The proportion of deleterious nonsynonymous changes was estimated withSIFT4Gon derived alleles, i.e. alleles accumulated after the speciation ofC. bursa-pastoris. The left plot shows the results obtained with theC. rubelladatabase and the right plot those obtained withA. thalianadatabase. Co and Cg are the two subgenomes ofC.

bursa-pastoris. ASI, EUR, ME, CO, CG indicate Asian, European and Middle Eastern populations ofC. bursa-pastoris, and parental speciesC.

orientalisandC. grandiflora, respectively.

https://doi.org/10.1371/journal.pgen.1007949.g005

Fig 6. Selective sweep differences between populations ofC. bursa-pastoris. Selective sweeps are detected with the composite likelihood ratio statistics (CLR) along theCbpCoandCbpCgsubgenomes in Asian (ASI), European (EUR) and Middle Eastern (ME) populations. The 0.01 significance level defined with data simulated under a standard neutral model (green dashed line). Pericentromeric regions are removed. Only eight major scaffolds (chromosomes) are shown.

https://doi.org/10.1371/journal.pgen.1007949.g006

(14)

sweeps in Cbp

Co

subgenome (109 in EUR_Cg, 128 in EUR_Co), but they all were small and hardly above the significance threshold (Fig 6). In the ME population, the sweep signals in the Cbp

Cg

subgenome were prevailing both in size and numbers (101 in ME_Cg, 22 in ME_Co).

The ASI population differed strongly from both EUR and ME not only because most of its sweep signals were on the Cbp

Co

subgenome but also because these sweeps regions were nar- rower and less pronounced (Fig 6). We also verified these selection signals with the most strin- gent filtering of phased fragments (phasing state of all SNPs was supported by fixed differences between parental species) and the major differences between population were maintained (S15 Fig). Thus, all three populations of C. bursa-pastoris were distinct in their selective sweeps pat- terns with the ASI population being the one least affected by positive selection.

Lack of a strong expression dominance of one subgenome. One can expect differences in selection patterns between subgenomes and populations of C. bursa-pastoris to also affect the relative expression of the two subgenomes, or homeologue-specific expression (HSE). In particular, biased adaptation towards one subgenome may select for decreased expression of the other subgenome. Hence, given the observed selection pattern, one would expect the Cbp

Cg

subgenome to be overexpressed in EUR and ME, and the Cbp

Co

subgenome to be over- expressed in ASI.

To assess HSE, we analyzed the RNA-Seq data of 24 accessions representing all three popu- lations of C. bursa-pastoris in a hierarchical Bayesian model that integrates information from both RNA and DNA data [45]. Overall, in agreement with Douglas et al. [10], one subgenome did not dominate the other in the 24 accessions considered together, though a few genes demonstrated a slight expression shift toward the Cbp

Cg

subgenome. On average, we assessed HSE in 13,589 genes per accession (range 12,808-15,340) and 18% of them showed significant HSE (posterior probability of HSE > 0.99). The expression ratios between subgenomes (defined here as Cbp

Co

/ Total) across all assayed genes in the DNA data were close to equal (mean = 0.495). Thus, there was no strong mapping bias.

We found some HSE variation among populations. The mean expression ratios for all genes were 0.494, 0.489, and 0.489 in the ASI, EUR, and ME accessions, respectively, and these mean ratios for genes with significant HSE were 0.487, 0.465, 0.468. The difference in mean ratio between EUR and ME was not significant, but both EUR and ME were significantly dif- ferent from ASI (S11 Table). In addition, the distribution of expression ratios between the two subgenomes was right-skewed in EUR and ME, whereas in the ASI population, the distribu- tion was more symmetrical (Fig 7). The difference between the populations was particularly evident in the grand mean values (Fig 7). Thus, the shift towards higher expression of the Cbp

Cg

subgenome was more prominent in Europe and the Middle East than in Asia.

Discussion

In the present study, we analyzed the genetic changes experienced by the recently formed allo-

polyploid C. bursa-pastoris since its founding, focusing on the evolutionary trajectories fol-

lowed by its two subgenomes in demographically and genetically distinct populations from

Europe, the Middle East, and Asia. The shift to selfing and polyploidy had a global impact on

the species, resulting in a sharp reduction of the effective population size in all populations

accompanied by relaxed selection and accumulation of deleterious mutations. However, the

two subgenomes were not similarly affected, with the magnitude of the subgenome-specific

differences depending on the population considered. The relative patterns of nucleotide diver-

sity, genetic load, selection and gene expression between the two subgenomes in the European

and the Middle Eastern populations were distinct from that observed in Asia. The differences

between populations were further enhanced by post-speciation hybridization of C. bursa-

(15)

pastoris with local parental lineages. Below, we discuss these global and local effects in more detail and their consequences for the history of the species.

But before this, a few words need to be said on the reliability of the phasing approach that was chosen here. Our results are based on the computationally phased (HapCUT) genomic data that was generated from the reads mapped to the C. rubella reference genome—the only reference genome available for the Capsella genus at the time of this study. This approach was possible due to the strict disomic inheritance of C. bursa-pastoris [36, 37]. Disomic inheritance and reproduction by selfing resulted in almost no variation within subgenomes and we there- fore were able to treat our data as diploid. The major challenge of our approach was to reduce mapping bias in favor of the Cbp

Cg

subgenome that is more closely related to C. rubella than the Cbp

Co

subgenome [10]. To minimize this bias, we performed tolerant read mapping, strin- gent SNP filtering and even more stringent filtering of gene expression data. Our survey of the frequency of non-reference alleles, consistent decay of linkage disequilibrium with a distance, distribution of coverage and heterozygosity across the genome (see Material and methods), and almost an equal ratio between homeologous alleles in the DNA data, indicated that there were no major phasing errors. We also verified the sweeps and admixture results by analyzing

Fig 7. Distributions of expression ratios between the two subgenomes ofC. bursa-pastoris. The homeologue specific expression (HSE) is estimated by the fraction theCbpCosubgenome relative to the total expression level. The upper part presents the distributions for DNA counts, the middle plots show the expression for all assayed gene and the lower plot shows only the distribution for genes with significant HSE. The histograms present the distribution of the allelic ratio, whereas the boxplots summarize these results with the grand mean for every sample. ASI, EUR, and ME indicate Asian, European and Middle Eastern populations, respectively.

https://doi.org/10.1371/journal.pgen.1007949.g007

(16)

only the phased fragments showing 100% consistency with fixed differences between parental species. Furthermore, we phased the same data using an alternative approach (HomeoRoq) and mapping our reads to the assembly of C. bursa-pastoris [37]. The two alternatively phased datasets were largely consistent in the separation between the subgenomes, with a perfect clus- tering between alternatively phased samples (HapCUT and HomeoRoq) in EUR and a cluster- ing more influenced by phasing in ASI samples. However, even in that case, ASI samples still were more related to each other than to any other cluster and larger-scale grouping remained the same. Given that the assembly of C. bursa-pastoris we used for this validation was done with short reads and that the classification of the scaffolds between homeologues was based only on the coding part and allowed up to 25% of discrepancy between parental species [37], the congruence of the results is encouraging. Overall, we hence believe that our phasing approach did not lead to biases that could invalidate the main conclusions of our study.

The effect of parental legacy is detectable even after *100,000 years of co- evolution of two initially different subgenomes

The genetic composition of parental species obviously has a strong impact on the evolution of subgenomes in allopolyploids. For example, the allopolyploid Arabidopsis kamchatica origi- nated from the outcrossing diploids Arabidopsis halleri and Arabidopsis lyrata [46] that have rather similar effective population sizes and the two subgenomes of A. kamchatica have thus similar effective population sizes and levels of negative selection [47]. In contrast, the effective population size of the diploid outcrossing ancestor of C. bursa-pastoris, C. grandiflora, is ten times larger than that of its selfing ancestor C. orientalis [10]. Any analysis of the difference in effective population size between the subgenomes of C. bursa-pastoris or of their evolutionary trajectories must therefore account for this initial difference. After the bottleneck associated with the origin of C. bursa-pastoris and the reduction in N

e

due to the shift to selfing [48], the effective population sizes of the two subgenomes are expected to progressively converge and decrease along the same trajectory.

While this was indeed the observed overall pattern, the trajectories followed by the two sub- genomes in the three populations differed: in Europe the initial level was similar to that in the Middle East but higher than in Asia and the decline of N

e

of the Cbp

Cg

subgenome was delayed compared to the sudden decline experienced by the Cbp

Co

subgenome. In contrast, in Asia the two subgenomes initially followed similar downwards trajectories but N

e

increased again in both subgenomes at around 40,000 ya. In the diploid C. orientalis, there was a period of stasis followed by a steeper decline than in the tetraploid. The difference in demography across the three regions could indicate multiple origins of C. bursa-pastoris as suggested by Douglas et al.

[10] and the difference between the diploid and the tetraploid could reflect a mixture of the population expansion experienced by the tetraploid and the buffering effect of tetraploidy against deleterious mutations.

There was a clearly noticeable difference between the two subgenomes in the number of

accumulated deleterious mutations. Based on the strong differences in N

e

, one would expect

the efficacy of selection to be much higher in C. grandiflora than in C. orientalis that has a

much smaller N

e

[49]. In the analysis of the genetic load, we indeed observed that C. orientalis

had a higher proportion of deleterious mutations than C. grandiflora. Hence, the amount of

genetic load most likely was different between the Cbp

Cg

and Cbp

Co

subgenomes of C. bursa-

pastoris at the time of the species emergence. Interestingly, hundreds of thousands of genera-

tions of selfing did not totally erase the differences between the two subgenomes and, today,

the Cbp

Co

subgenome still accumulates more deleterious mutations than the Cbp

Cg

subge-

nome. Despite being statistically significant, the difference between subgenomes is smaller

(17)

than between C. orientalis and C. grandiflora suggesting a convergence of the two subgenomes, which is also confirmed by analyses of genetic load dispersion and gene expression regulation [50].

Everything else being equal, genomic redundancy in polyploids is expected to lead to relax- ation of purifying selection, and to a higher rate of accumulation of deleterious mutations than in diploids. Douglas and coworkers [10], using forward simulations, concluded that indeed the Cbp

Cg

subgenome showed an excess of deleterious mutations compared to what would have been expected from mating system shift alone and ascribed this excess of deleterious mutations to genomic redundancy. Our study also suggests that both effective population size and geno- mic redundancy contribute to the observed pattern. The proportion of derived deleterious mutations in both subgenomes is, with the exception of the Cbp

Co

subgenome in the Asian population, lower than the same proportion in C. orientalis but much higher than in C. grandi- flora where it is extremely low (Fig 5). Furthermore purifying selection is weaker on the Cbp

Co

subgenome than on the Cbp

Cg

subgenome. These data can be explained as follows. First, puri- fying selection on the Cbp

Co

subgenome is stronger than in C. orientalis because the latter has a smaller effective population size than C. bursa-pastoris. Second, purifying selection is weaker on the Cbp

Co

subgenome than on the Cbp

Cg

because of genomic redundancy. Or, stated differ- ently, because of its highest initial genetic quality the Cbp

Cg

may be more sheltered from dele- terious mutations than the Cbp

Co

subgenome.

Nucleotide diversity also demonstrated the effect of parental legacy. The Cbp

Cg

subgenome inherited from the more variable outcrosser C. grandiflora was still more diverse in all popula- tions except the Asian one. The maintenance of part of the parental legacy in both cases sug- gests that, in spite of their initial differences, both subgenomes have experienced similar levels of fixation since the creation of the species. The Asian population is an exception in this regard because it was affected by secondary gene flow as discussed below. Variation in nucleotide diversity in the coding part of the genome also demonstrated similarity in the efficacy of puri- fying selection between the two subgenomes and their corresponding parental lineages, though the pattern in the ASI population was the reverse of that observed in the parental lineages. The effect of parental legacy in C. bursa-pastoris was also noted in Douglas et al. [10]. Thus, the effect of the genetic background of hybridizing species may not be as overwhelming as the effect of mating system but it still impacts the fate of the two subgenomes long after the species arose.

Admixture with diploid species and/or multiple origins

Based on coalescent simulations and the amount of shared variation between C. bursa-pastoris and its parental species Douglas et al. [10] ruled out a single founder but noted that it would be difficult to estimate the exact number of founding lineages. Douglas et al. [10] did not consider hybridization but an earlier study detected gene flow from C. rubella to the European C. bursa- pastoris using 12 nuclear loci and a coalescent-based isolation-with-migration model [51], a result which is in agreement with the general occurrence of abundant trans-specific polymor- phism in the Arabidopsis genus [52]. The present study adds two new twists to the story. First, our results indicate that shared polymorphisms were not symmetrical: namely, the Cbp

Cg

sub- genome showed signs of admixture with C. rubella in the EUR and ME populations, whereas the Cbp

Co

subgenome was admixed with C. orientalis in ASI. Second, in both the whole genome and density trees, C. orientalis appears as derived from the C. bursa-pastoris Cbp

Co

subgenome rather than the converse as one would have expected. No such anomaly was observed for C. grandiflora that, as expected, grouped at the root of the C. bursa-pastoris Cbp

Cg

subgenome. These results could be explained by a mixture of multiple origins and a recent

(18)

admixture. Multiple origins seem to be common in allotetraploids [27, 53, 54] and interploidy gene flow has already been inferred for Capsella [51] and other plant genera [55–57].

Our crossing results did not reject the possibility of ongoing gene flow between C. bursa- pastoris and parental lineages in both Europe and Asia. The distribution of European and Asian populations of C. bursa-pastoris partially overlap with the distribution ranges of C.

rubella and C. orientalis, respectively (Fig 1A). The exact proportion of admixture remains unclear at this stage. Taken at face value, the strongest admixture was between the ASI Cbp

Co

subgenome and C. orientalis. Considering the overlapping estimates of f-statistic and HAP- MIX, the proportion of admixture between the ASI Cbp

Co

subgenomes and C. orientalis was around 14%-23%. The admixture between the EUR Cbp

Cg

subgenome and C. rubella was also strong, being around 8-20%. There were also signs of minor admixture in the ME population with both C. orientalis and C. rubella. This lack of a non-admixed population posed a problem of correct estimation of the proportion of admixture for both the ABBA-BABA and HAPMIX approaches.

In the ABBA-BABA test, departures from the assumptions can lead to incorrect interpreta- tion of the results. We assumed monophyly of the three populations of C. bursa-pastoris, which may be wrong if these populations were of multiple origins. Thus, the observed shared polymorphism might be due to closer relatedness of our C. orientalis samples with the parent of the ASI population than with the parent of EUR and ME populations, but not admixture.

Departures from the assumptions of the ABBA-BABA test can lead to under- or overestimated admixture. In the present case, some proportion of the variation shared between P

3

and both P

1

and P

2

populations could be due to gene flow and not due to incomplete lineage sorting and this would lead to underestimating the amount of admixture. On the other hand, small N

e

and recent divergence of the populations used in the test can inflate estimates of D [58]. Further, the behavior of D in tests involving both selfing and outcrossing species has not been assessed yet. The D statistics were significantly different from zero in all our comparisons suggesting that admixture did indeed occur in all populations of C. bursa-pastoris. The f statistic is consid- ered less prone to be affected by these factors [58], and it was more reliable in our tests too. Its values were close to zero in the alternative combinations for the ABBA-BABA tests where we did not expect to find admixture, while D had high estimates (S12 Table). Thus, the f values are the closest to the real proportion of admixture we could obtain.

In HAPMIX, if the reference populations have non-negligible levels of admixture with each other, such that they have few differences, it will be difficult for HAPMIX to distinguish with which reference population the focal population is more likely to share ancestry, driving admixture probabilities to intermediate values. Therefore, we observed a discrepancy between the results of HAPMIX and ABBA-BABA in the estimates of admixture between the EUR Cbp

Co

subgenome and C. orientalis. However, the results for the Cbp

Cg

subgenome largely agreed between HAPMIX and ABBA-BABA and, together with the results by Slotte et al. [51]

and our crossing experiment bolsters the hypothesis of admixture between C. rubella and C.

bursa-pastoris in Europe. On balance, a scenario with a single origin of C. bursa-pastoris with later rampant admixture with C. orientalis in Asia and less extensive admixture with C. rubella in Europe is consistent with our data.

On the other hand, our results could also be obtained under a scenario of multiple origins.

Such a scenario seems particularly likely if one looks at Fig 4, where the histories of the Cbp

Co

and Cbp

Cg

subgenomes are strikingly different. If we assume that C. orientalis and C. grandi-

flora are indeed parental lineages and there was no unknown parental lineage that went

extinct, this picture can be only explained by a separate and more recent origin of the ASI pop-

ulation (Fig 8). However, the scenario of multiple origins and post-speciation admixture are

not mutually exclusive. The signs of gene flow between EUR and C. rubella are still best

(19)

explained by post-speciation admixture. The weak signs of admixture between C. bursa-pas- toris and C. orientalis in EUR and ME are also difficult to fit into a scenario involving only multiple origins. A possibility is that these signs of admixture resulted from gene flow from ASI to EUR and ME within C. bursa-pastoris. The ASI population is more related to C. orienta- lis and the presence of its alleles in EUR and ME could be spuriously recognized as intro- gressed from ASI. Regardless of whether a single or a multiple origins scenario is the true one, our results demonstrate that the history of C. bursa-pastoris is far more complex than previ- ously imagined.

Weak subgenome-specific expression differences

Many allopolyploid species show subgenome expression bias, where one subgenome tends to be overexpressed relative to the other one [59–62]. This expression dominance is often observed in synthetic allopolyploids [63–66] and thus the major part of such preferential sub- genome dominance is probably established immediately after allopolyploidization. The subge- nome expression dominance is also suggested to be largely defined by parental expression differences [67, 68]. Contradictory results on patterns of subgenome specific expression in C. bursa-pastoris have been obtained so far. Douglas et al. [10] concluded that there is no strong homeologue expression bias and those few genes showing HSE could be explained by parental expression differences. However, genes with HSE do show a slight bias towards over- expression of the Cbp

Cg

subgenome inherited from C. grandiflora / rubella lineage on the

Fig 8. A tentative scenario of multiple origins ofC. bursa-pastoris. The Asian population originated separately from otherC. bursa-pastoris populations. There is gene flow between the EuropeanC. bursa-pastorisandC. rubella(solid arrow), and there may be gene flow between the Asian population andC. orientalis(dashed arrow). ASI, EUR, and ME indicate Asian, European and Middle Eastern populations ofC. bursa- pastoris, respectively.

https://doi.org/10.1371/journal.pgen.1007949.g008

(20)

Figure 3B in Douglas et al. [10]. In contrast, Steige et al. [69] reported higher expression of the Cbp

Co

subgenome inherited from C. orientalis in three accessions, and Cbp

Cg

over-expression in a fourth one (CbpGR). Steige et al. [69] hypothesized that the over-expression of the Cbp

Co

subgenome might be related to a higher number of transposable elements in this subgenome, but they did not find any evidence of this and could not explain the down-regulation of the Cbp

Co

subgenome in the CbpGR accession and in the artificial hybrid between C. rubella and C. orientalis.

Considering the population histories of C. bursa-pastoris sheds some light on these discrep- ancies. The results of Douglas et al. [10] and Steige et al. [69] are consistent with the hypothesis that cis-regulatory differences between the C. orientalis and C. grandiflora / rubella genomes result in over-expression of the Cbp

Cg

subgenome in a hybrid comprising both genomes. Thus, in the absence of other factors, the slight over-expression of the Cbp

Cg

subgenome would be the default HSE pattern in C. bursa-pastoris. In accordance with this, we observed over-expres- sion of the Cbp

Cg

subgenome in the ME and EUR populations that are most likely the closest to the region of origin of C. bursa-pastoris [34]. The accessions that show over-expression of the Cbp

Cg

subgenome in Douglas et al. [10] (SE14 from Sweden) and in Steige et al. [69]

(CbpGR from Greece), as we now know belong to the EUR population [34]. Hence, their results are consistent with ours and expected if the HSE is defined primarily by the differences between the parental lineages. On the other hand, we observed that genes with HSE in the ASI population showed equal expression between the two subgenomes. The accessions showing over-expression of the Cbp

Co

subgenome in Steige et al. [69] also mostly belong to the ASI pop- ulation (CbpKMB and CbpGY, though not CbpDE that putatively originates from Germany).

Thus, the Asian accessions show a HSE that differs from the default pattern. This difference can be caused by the selection preference for the Cbp

Co

subgenome and/or by admixture with C. orientalis that enhanced the cis-regulatory elements of the Cbp

Co

subgenome. The ASI pop- ulation experienced a strong population bottleneck, so genetic drift played some role as well.

These explanations need to be confirmed because HSE can be influenced by many factors (e.g.

trans-regulatory elements, gene methylation, transposable elements), but it is clear that there are different directions of HSE in populations of C. bursa-pastoris and they are caused by the different evolutionary histories of those populations.

The reason we observed an equal expression between subgenomes in ASI, whereas Steige et al. [69] detected expression bias of the Cbp

Co

subgenome for Asian samples, could also be due to different approaches in our analyses. First, we extracted RNA from seedlings, whereas Steige et al. [69] obtained RNA from leaves and flower buds. Variation in HSE for different tis- sues of C. bursa-pastoris is not characterized yet, so the Cbp

Co

expression in seedlings may not be apparent yet. Second, we mapped reads to the C. rubella reference with masked polymor- phism, whereas Steige et al. [69] used the reconstructed reference of an F1 hybrid between C.

orientalis and C. rubella. The bias in our DNA data was not stronger than in Steige et al. [69], so which method is more appropriate remains to be found out.

Conclusion

Three salient, and sometimes unexpected, features of the evolution of the tetraploid shep-

herd’s purse that emerged from the present study, are its complex origin and possible admix-

ture with diploid relatives, the long-lasting effects of the difference between its two parental

species, and the importance of demography in shaping its current genomic diversity. Hence,

the present study suggests that understanding the evolution of tetraploid species without pay-

ing due attention to the historical and ecological backgrounds under which it occurred could

be misleading.

Références

Documents relatifs

In this context, the genome of rice cultivars can be thought of as a mosaic of segments with different origins and histories related to both primitive domesticates and occasional

Scenario 1, which assumes that American invasion started following an admixture between Fiji and the sampled South-East Asia population, obtained posterior probabilities

C’était le moment choisi par l’aïeul, […] pour réaliser le vœu si longtemps caressé d’ accroître son troupeau que les sècheresses, les épizoodies et la rouerie de

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

Bien qu’apparemment défavorable, le balayage du bassin versant par le système pluvieux a été trop rapide par rapport aux vitesses de propagation des écoulements dans le cours du

Since the first version of the WT admixture dating method (Pugach et al. 2011) is restricted to admixture events involving only two parental populations, we have modified the

While the differential accumulation of deleterious mutations between subgenomes could explain part of the differential expression between them, there were also

When the recessive load is high relative to the recombina- tion rate, the inverted haplotype never reaches a sufficient fre- quency to replace ancestral haplotypes, and thus