• Aucun résultat trouvé

A gene-based map of the Nod factor-independent Aeschynomene evenia sheds new light on the evolution of nodulation and legume genomes

N/A
N/A
Protected

Academic year: 2021

Partager "A gene-based map of the Nod factor-independent Aeschynomene evenia sheds new light on the evolution of nodulation and legume genomes"

Copied!
13
0
0

Texte intégral

(1)

HAL Id: hal-02410329

https://hal-cnrs.archives-ouvertes.fr/hal-02410329

Submitted on 12 Mar 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Distributed under a Creative Commons Attribution - NonCommercial| 4.0 International

License

A gene-based map of the Nod factor-independent

Aeschynomene evenia sheds new light on the evolution

of nodulation and legume genomes

Clémence Chaintreuil, Ronan Rivallan, David Bertioli, Christophe Klopp,

Jerome Gouzy, Brigitte Courtois, Philippe Leleux, Guillaume Martin,

Jean-Francois Rami, Djamel Gully, et al.

To cite this version:

Clémence Chaintreuil, Ronan Rivallan, David Bertioli, Christophe Klopp, Jerome Gouzy, et al.. A

gene-based map of the Nod factor-independent Aeschynomene evenia sheds new light on the evolution

of nodulation and legume genomes. DNA Research, Oxford University Press (OUP), 2016, 23 (4),

pp.365-376. �10.1093/dnares/dsw020�. �hal-02410329�

(2)

Full Paper

A gene-based map of the Nod

factor-independent Aeschynomene evenia sheds

new light on the evolution of nodulation and

legume genomes

Cle´mence Chaintreuil

1

, Ronan Rivallan

2

, David J. Bertioli

3

,

Christophe Klopp

4

, Je´roˆme Gouzy

5

, Brigitte Courtois

2

, Philippe Leleux

1,4

,

Guillaume Martin

2

, Jean-Franc¸ois Rami

2

, Djamel Gully

1

,

Hugues Parrinello

6

, Dany Se´verac

6

, Delphine Patrel

1,7

, Jo€

el Fardoux

1

,

William Ribie`re

7

, Marc Boursot

1

, Fabienne Cartieaux

1

, Pierre Czernic

1

,

Pascal Ratet

8,9

, Pierre Mournet

2

, Eric Giraud

1

, and

Jean-Franc¸ois Arrighi

1,

*

1

IRD, UMR LSTM, Campus International de Baillarguet, F-34398 Montpellier, France,

2

CIRAD, UMR AGAP, Campus

de Lavalette, F-34398 Montpellier, France,

3

University of Brasılia, Institute of Biological Sciences, Campus Darcy

Ribeiro, 70910-900 Brasılia, DF, Brazil,

4

INRA, Plateforme GenoToul Bioinfo, UR 875, INRA Auzeville, F-31326

Castanet-Tolosan, France,

5

INRA, UMR441 LIPM, INRA Auzeville, F-31326 Castanet-Tolosan, France,

6

MGX-Montpellier GenomiX, Institut de Ge´nomique Fonctionnelle, F-34094 MGX-Montpellier, France,

7

IRD, Centre IRD de

Montpellier France Sud, F-34394 Montpellier, France,

8

Institute of Plant Sciences Paris Saclay IPS2, CNRS, INRA,

Universite´ Paris-Sud, Universite´ Evry, Universite´ Paris-Saclay, 91405 Orsay, France and

9

Institute of Plant

Sciences Paris-Saclay IPS2, Paris Diderot, Sorbonne Paris-Cite´, 91405 Orsay, France

*To whom correspondence should be addressed. Tel.þ33 04 67 59 38 82. Fax. þ33 04 67 59 38 02. Email: jean-francois.arrighi@ird.fr

Edited by Dr Sachiko Isobe

Received 11 March 2016; Accepted 2 May 2016

Abstract

Aeschynomene evenia has emerged as a new model legume for the deciphering of the molecular

mechanisms of an alternative symbiotic process that is independent of the Nod factors. Whereas

most of the research on nitrogen-fixing symbiosis, legume genetics and genomics has so far

fo-cused on Galegoid and Phaseolid legumes, A. evenia falls in the more basal and understudied

Dalbergioid clade along with peanut (Arachis hypogaea). To provide insights into the symbiotic

genes content and the structure of the A. evenia genome, we established a gene-based genetic

map for this species. Firstly, an RNAseq analysis was performed on the two parental lines

se-lected to generate a F

2

mapping population. The transcriptomic data were used to develop

mo-lecular markers and they allowed the identification of most symbiotic genes. The resulting map

comprised 364 markers arranged in 10 linkage groups (2n

¼ 20). A comparative analysis with the

sequenced genomes of Arachis duranensis and A. ipaensis, the diploid ancestors of peanut,

indi-cated blocks of conserved macrosynteny. Altogether, these results provided important clues

re-garding the evolution of symbiotic genes in a Nod factor-independent context. They provide a

VCThe Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact

journals.permissions@oup.com 365

doi: 10.1093/dnares/dsw020 Advance Access Publication Date: 13 June 2016 Full Paper

(3)

basis for a genome sequencing project and pave the way for forward genetic analysis of

symbio-sis in A. evenia.

Key words: Aeschynomene evenia; symbiosis; legume; genetic map; comparative genomics

1. Introduction

Fabaceae, or legumes, account for 27% of the world’s primary crop production. The capacity of most legumes to form nitrogen fixing nod-ules with rhizobia underlies their importance as a source of protein for human and animal diet and of nitrogen in both natural and agricul-tural ecosystems.1Among the 20,000 legume species, around 14,000 fall into the Papilionoideae subfamily that is divided into four main clades Genistoids, Dalbergioids, Phaseoloids and Galegoids (Fig. 1).7

To advance in the understanding of how legumes have evolved to nitrogen-fixing symbiosis with rhizobia, two model species belonging to the Galegoid clade and with favorable genetic attributes, Medicago truncatula and Lotus japonicus, have been extensively studied. This study has revealed a sophisticated symbiotic process that is triggered by the recognition of rhizobial signal molecules called Nod factors (NF) and that involves both the formation of an infection thread guiding the bacteria inside the root and a distant in-duction of a nodule primordium. Furthermore, the genetic dissection of the nodulation process has allowed the identification and elucida-tion of the role of many genes that are essential for the different steps of nodule development, some of them being also involved in mycor-rhizal symbiosis.8Such mechanisms are likely to be highly conserved

in the Phaseolid and Galegoid clades that share similar symbiotic in-fection and nodule organogenesis processes (Fig. 1). In addition, since most of the legumes used as crops are members of the Phaseoloids and Galegoids, genetic and genomic studies, including genome sequencing and comparative genomics, have greatly ad-vanced in these two clades.9–17

Conversely, the Genistoid and the Dalbergioid clades, which are more basal in their divergence within the Papilionoideae, have lagged far behind, even though they contain important crop legumes such as the genistoid lupine (Lupinus angustifolius) and the dalbergioid pea-nut (Arachis hypogaea). This is partly due to the large and complex nature of their genomes that makes their study challenging. Hence, in the case of peanut, which is a recent allotetraploid species, the choice was made to sequence first the genomes of its two diploid pro-genitors, Arachis duranensis and A. ipaensis.18Similarly, genistoid

and dalbergioid legumes have been understudied for their symbiotic properties although they display a distinct infection process that is initiated in an intercellular fashion and by the formation of nodules that originate directly from dividing infected cells (Fig. 1).2But most

surprisingly, in some Aeschynomene species, which are phylogeneti-cally related to Arachis, an unconventional symbiotic process has also been described where some Bradyrhizobium strains are able to form nodules in the absence of NF synthesis (Fig. 1).19Unravelling

the molecular mechanisms of this NF-independent process would bring important insights on the evolution of rhizobium-legume sym-biosis.20For this purpose, Aeschynomene evenia was proposed as a

new model legume because of its advantageous genetic and develop-mental characteristics for molecular genetics.21,22

To provide insights into the symbiotic gene content and on the structure of the NF-independent A. evenia genome, we undertook the development of a gene-based genetic map. RNAseq data ob-tained for each parental line were mined for symbiotic gene discovery and molecular marker development. These markers were used to

Dalbergioids

Arachis hypogaea

Aeschynomene evenia

Genistoids Lupinus angustifolius

Phaseolids Glycine max Phaseolus vulgaris

Robinoids IRLC Galegoids Lotus japonicus Medicago truncatula Intercellular infection

Root hair infection thread Basal clades 55 MYA *NI* Rare occurrence of nodulation 50 MYA 58 MYA 54 MYA

Figure 1. Phylogeny of the Papilionoid lineage and evolution of the symbiotic infection process. The tree is a simplified representation with triangle represent-ing the major clades and the two subclades of the Galegoids: the Robinoids and the plastid DNA inverted repeat-lackrepresent-ing clade (IRLC). Known divergence times are indicated in MYA (million years ago) for some nodes (circles) and notable species are listed for each clade. Symbiotic infection processes encountered in the different clades are reported, excepted for the basal clades for which occurrence of nodulation is restricted to few understudied species. Note that Aeschynomene is the only legume genus where a Nod-independent symbiotic infection process (*NI*) has been reported for several species, including A. evenia. Phylogeny and drawings are adaptations, date estimates and nodulation properties come from previous publications.2-6

(4)

genotype a F2mapping population and construct a high-density

ge-netic map that was subjected to comparative analysis with the Arachis duranensis and A. ipaensis genomes.

2. Materials and methods

2.1. Plant material, culture, crossing and observation

Seed germination, plant culture and hybridizations were performed as indicated.21To develop the mapping population, individual seeds

of the A. evenia accessions CIAT8232 (Brazil) and CIAT22838 (Malawi) obtained from CIAT (Colombia) were selfed three times to generate inbred lines and then crossed manually. The hybrid nature of the F1 plants was confirmed with molecular markers as

de-scribed.22Two F1plants obtained by bi-directional hybridizations

were selfed to develop the F2mapping population in greenhouse. For

pollen viability analysis, buds were fixed in Carnoy’s fixative prior to anthesis, when pollen was mature but anthers non-dehiscent, and then stained using a simplified method of the Alexander’s stain as de-tailed.3Pollen viability was scored under the light microscope for at

least three flowers per plant by counting aborted pollen grains, which stained pale turquoise blue and nonaborted pollen grains, which stained dark blue or purple.

2.2 DNA and RNA isolation

Genomic DNA was extracted from young leaves using the CTAB method with the addition of b-mercaptoethanol 2% and PVPP 2% to the CTAB solution in order to limit polysaccharides and polyphe-nols co-extraction.21DNA quality and quantity were evaluated in

1% agarose gel electrophoresis and by spectrophotometer before DNA normalization to a concentration of 10 ng/ml.

For RNA extractions, tissue samples were collected from in vitro cultured plants: roots and leaves on 7-days old un-inoculated plants, nodules at 4, 7 and 14-days after inoculation with the Bradyrhizobium strain ORS278. Total RNA was extracted using the SV total RNA Isolation System (Promega) excepted for leaves for which RNA was extracted using a CTAB protocol.23 RNA was quantified using a

NanoDrop ND-1000 spectrophotometer and its quality verified using a 2100 Bioanalyzer RNA Nanochip (Agilent, Santa Clara, CA, USA). For each parental line, a total of 12 mg of RNA was pooled equally from the five tissues for Illumina library construction.

2.3 Development of Illumina transcriptomes

Two mRNA libraries were built and sequenced in one lane as de-scribed.24The Illumina paired-end sequencing technology generated 2 100 bp independent reads from the 200 bp insert libraries. RNA-seq data were processed for the de novo transcriptome assemblies us-ing the Velvet 1.2.07 and CAP3 08/06/13 softwares and orthologous relationships between contigs of the two transcriptome assemblies were identified by best reciprocal hit search. To evaluate the quality of the assembly, all usable reads were realigned to the contigs using BWA-MEM 0.7.12-r1039 and SAMtools idxstats 1.1. To assess the depth of gene coverage through comparative genome analysis, ORFs were searched in A. evenia transcripts with TransDcoder_r20140704 and used for BLASTX alignment (E value < 1020) with Blastall 2.2.26 onto the protein database of A. duranensis (Araip. V14167.a1.M1.peptide.fa file downloaded from www.peanut.org). Raw Illumina sequence data have been deposited in the NCBI Sequence Read Archive (SRA) under the accession numbers

SRR3276128 and SRR3285082, for the CIAT8232 and CIAT22838 lines respectively.

2.4 Sequence search and comparison

Symbiotic gene sequences were obtained from M. truncatula and L. japonicus by searching in the NCBI database. These sequences were used to find orthologous unigenes in an A. evenia local database devel-oped with a Galaxy work bench (INRA, Toulouse, France) (https:// bbric-pipelines.toulouse.inra.fr/galaxy/) and containing the CIAT8232 and CIAT22838 transcriptome data (Supplementary Tables S1, S3;

Files S1, S2).

For estimation of divergence time, single-copy genes identified in the CIAT8232 and CIAT22838 transcriptomes were used to ob-tain orthologous sequences for A. evenia IRLFL6945 from its tran-scriptome database (http://esttik.cirad.fr) and for A. duranensis and A. ipaensis from their genome sequence database (http://peanutbase. org/) (Supplementary Table S5). Coding sequences were cropped, aligned and used to calculate Ks values with the DnaSP 5.10 soft-ware (Supplementary Fig. S1). Divergence times were obtained based on a simple molecular clock assumption and by using the previously known 49 MYA divergence time estimate between Aeschynomene and Arachis.6

2.5 SSR and INDEL genotyping

Detection of microsatellites in the transcriptome data were performed using the SPUTNIK software by searching for di- to hexanucleotide SSRs with a minimum of three to six repetitions depending on the mo-tifs. Orthologous sequences in the two transcriptome assemblies com-prising different lengths of the same SSR motif, with a contig length over 500 nt and with at least 50 nt surrounding the SSR motifs were selected as in silico polymorphic SSRs (Supplementary Table S1). For specific gene mapping, SSR and INDELs were searched by sequence comparison from the two transcriptomes. Primer pairs were designed using Primer3 with tree parameters: (i) a primer length of 20 bp with an annealing temperature of 55C, (ii) three ranges of PCR product

size of 90–120, 150–180 and 220–250 bp, (iii) the addition of a 50-end

M13 tail (50-CACGACGTTGTAAAACGAC-30) on the forward

primer (Supplementary Table S2). SSR markers were checked for PCR-amplification and co-dominant behavior, organized in 12-SSR multiplexes by combining both the three ranges of fragment sizes and the four fluorochromes (FAM, VIC, NED and PET), and then used to genotype the F2mapping population as described.25

2.6 SNP genotyping

To detect single nucleotide polymorphisms for a set of target genes, sequences of orthologous contigs identified in the two transcriptome datasets were aligned with the MUSCLE program (Supplementary Table S3). Positions of the SNPs were determined by examining the intron/exon structure of orthologous genes in Arachis (http://peanut base.org/). Whenever necessary, additional SNPs were searched by intron sequencing. Up to three SNPs were selected for each target gene and flanking sequences 50 bp both upstream and downstream each target SNP were sent to KBioscience for primer design (Supplementary Table S4). Genotyping of the F2mapping population

was performed with the KASP genotyping chemistry (KBioscience Ltd., Hoddesdon, UK) on Fluidigm 96.96 Dynamic Arrays using the BioMark HD system (Fluidigm Corp, San Francisco, CA). On the basis of obtained fluorescence, allele call data were viewed

(5)

graphically as a scatter plot for each marker assayed by using Fluidigm SNP Genotyping Analysis software and checked manually.

2.7 Genetic map construction

Linkage analysis was performed on segregated genotypic data from the F2 mapping population using JoinMapVR version 4 (Plant

Research International BV, Wageningen, Netherlands). The marker loci were attributed to linkage groups using the JoinMapVR grouping

module with a logarithm of odds (LOD) scores of 6.0. Marker order and genetic distance (in cM) were calculated using a regression map-ping algorithm with the following parameters: Kosambi’s mapmap-ping function, recombination frequency 0.40, and LOD threshold of 1.0 (Supplementary Table S6). The segregation data were tested for devi-ation from the expected Mendelian ratio using a Chi-square test. LG numbering and orientation were attributed to best fit the A. duranen-sis and A. ipaenduranen-sis chromosomes.

2.8 Comparative genomics

Gene sequences underlying the molecular marker loci assigned to the linkage map were blasted against the A. duranensis and A. ipaensis genome sequences (http://peanutbase.org/) to detect significant matches at minimum threshold E values < 1060 (Supplementary

Table S6). For the purpose of graphic preparation, cM distance on the A. evenia linkage groups were scaled by factors calculated on the basis of the genome size of A. ipaensis, to provide matching chromo-somal lengths in base pairs. The graphical comparative maps were drawn using the Circos program (http://circos.ca/) and the SpiderMap software (JF Rami, unpublished).

3. Results and discussion

3.1 Selection of parental genotypes to develop a

mapping population

3.1.1 Choice of the reference line

The A. evenia species was proposed as a new model legume to inves-tigate the evolution of the nitrogen-fixing symbiosis because it pos-sesses a number of interesting characteristics for both classical and molecular genetics.21,22Key attributes of A. evenia include diploidy (2n ¼ 2 ¼ 20), self-pollination, a small genome for a legume, a short growth cycle and a high level of diversity. In addition it is efficiently nodulated by the well-characterized strains Bradyrhizobium ORS278 and ORS285 and it can be transformed using the hairy root system. First nodulation studies were performed on the cultivar IRFL6945 belonging to the A. evenia ssp. serrulata.21But subsequent analysis of the intra-specific diversity revealed that A. evenia ssp. evenia is more appropriate due to a smaller genome size (415 vs 465 Mb) and the presence of genotypes with a more rapid reproduc-tive cycle and a non-branching habit that facilitate plant manage-ment (Fig. 2A,B).22We made use of this knowledge to select from

within the subspecies evenia var. evenia the genotype Mbao CIAT22838, which displays these advantageous characteristics, as the reference genotype. Its nodulation properties were previously shown to be representative of the species with 11nodules/plant, and its ability to be transformed by Agrobacterium rhizogenes was found to be similar to what was described for the line IRLF6945 with a 60% root transformation rate and subsequent 90% nodula-tion rate upon inoculanodula-tion with Bradyrhizobium ORS278.21Even

though A. evenia is a self-pollinated plant and was previously shown

to be predominantly homozygous, a single seed from the reference genotype was selfed three times to ensure high homozygosity.

3.1.2 Choice of a mapping parent

The establishment of a mapping population required the identifica-tion of a second parent that has favorable polymorphism while the resulting F2population should not have developmental and genetic

abnormalities. Characterization of the diversity in A. evenia had pre-viously revealed that the genetic polymorphism was the highest be-tween the subspecies evenia and serrulata but that inter-subspecies crosses led to sterile F1hybrids. Conversely, intra-subspecific

hybridi-zations were shown to generate fully fertile F1 progenies.22

Therefore, to maximize genetic polymorphism, the genotype Bahia CIAT8232 that belongs to the subspecies evenia var. pauciciliata and that presents a distinct geographic origin (African for Mbao versus American for Bahia) was developed as an inbred line and used as the second mapping parent (Fig. 2A). The Mbao and Bahia genotypes present a similar DNA content (415 and 405 Mb, respectively) but they were shown to display a 32% SSR polymorphism.22This situa-tion is similar to the one observed in Medicago truncatula with the two mapping parents Jemalong and DZA315.16 and between three ecotypes of Arabidopsis thaliana, indicating it is convenient for gen-erating genetic maps.26In addition, the two genotypes display phe-notypic differences making them readily distinguishable such as the branching habit, the timing of flower openings and the presence of a dark throat on the upper petal (Fig. 2B).

3.1.3 Development of a F2 mapping population

Artificial hybridizations were performed using the genotype Mbao CIAT22838 as female and male alternatively and producing F1A

and F1B progenies respectively. These F1 plants showed hybrid

vigor, intermediate flower characteristics (opening time and shape) and were self-fertile (Fig. 2B). In particular, they displayed equiva-lent pod set (Bahia: 8.6 6 0.9, Mbao: 8.2 6 1, F1: 8.8 6 1.1 seeds/

pod). The resulting F2Aand F2Boffsprings had no albino or dwarf

individuals. Altogether, this suggested that Bahia CIAT8232 was a suitable crossing partner to Mbao CIAT22838 and that the ob-tained F2offsprings could be used as mapping population to

de-velop a genetic map.

3.2 Transcriptome sequencing for identification of

symbiotic genes and genetic markers

3.2.1 Illumina paired-end sequencing and de novo

transcriptome assembly

High-throughput transcriptomic sequencing allows generating a large transcript sequence dataset for gene discovery and development of molecular markers that serve for gene-based map construction. To be able to achieve a broad survey of genes associated with symbiosis, RNA was extracted and pooled from various organs: roots, leaves and nodules at 4, 7 and 14-dpi. In addition, to increase the efficiency of molecular marker development by prior in silico polymorphic analysis, a RNA library was developed for each parental genotype.

Using Illumina paired-end sequencing and after stringent quality check of the raw data, 72 million reads were kept for the Mbao ge-notype and 66 million for the Bahia gege-notype. Based on these high-quality reads, a total of 51,763 and 52,648 contigs were assembled respectively, with a common N50 of 1,800 bp and a consistent contig size distribution (Table 1). This indicated that a similar transcript content was obtained for both mapping parents. Furthermore,

(6)

realignment of all the usable reads to the contigs revealed that 35,000 contigs were remapped by more than 100 reads and, among them, 24,500 contigs were realigned with more than 500 reads, suggesting the contigs were well overlapped by the sequencing reads (Fig. 3A). A. evenia being a member of the Dalbergioid clade along with the recently sequenced A. duranensis and A. ipaensis, their genomic data could be also used to assess the quality and cover-age of the assembled transcriptomes.22Since A. duranensis has not

experienced the local gene duplications evidenced for A. ipaensis, the former species was better suited for this purpose. Thus, comparative analysis revealed that 60% of total contigs (31,850) and 83% of contigs over 500 bp in length (26,378) had BLAST matches (E-value < 105) onto the 36,700 predicted genes in the A. duranensis

genome (Fig. 3B). In addition, plotting the contigs onto top-hit ho-mologous coding sequences of A. duranensis (E-value < 1030)

indi-cated a high proportion of genes with complete coding sequence (Fig. 3C). Altogether, these data suggested that a comprehensive gene capture was achieved for A. evenia.

3.2.2 Identification of putative orthologs of symbiotic

genes

To determine which symbiotic genes can be identified in A. evenia, a BLAST search was performed in the Bahia and Mbao transcriptomes using the sequence of symbiotic genes characterized in the two model legumes M. truncatula and L. japonicus.8,27–33 This allowed the

identification of a single sequence in A. evenia for 36 out of 47 sym-biotic genes analysed, for which the putative orthology was checked by reciprocal BLAST and sequence alignment (Fig. 4; Files S1, S2). These genes completely covered the different steps of the nodulation process: Nod signaling, activation and regulation of transcription, nodule organogenesis and regulation of nodulation (Fig. 4). This coverage suggested that these steps were conserved in A. evenia, but it could not be excluded that some of these genes were only involved in the arbuscular mycorrhizal symbiosis or in other biological pro-cesses. So far, only the A. evenia SYMRK, CCaMK, CRE1 and DNF1 orthologs were effectively demonstrated to play a role in nodulation.34,35

A. duranensis

A. ipaiensis

Arachis

49 MYA

3.5 MYA

A. hypogaea

Goias IRFL6945

A. evenia ssp. serrulata

A. evenia ssp. evenia

var. pauciciliata

A. evenia ssp. evenia

var. evenia

Aeschynomene

3.3 MYA

0.4 MYA

Reference line

Mbao CIAT22838

Mapping parent

Bahia CIAT8232

n

o

i

s

s

e

c

c

A

s

e

i

c

e

p

S

s

u

n

e

G

A

B

CIAT22838

CIAT8232

F1 hybrid

Flower opening

F

l

o

w

e

r

s

h

a

p

e

P

o

d

t

i

b

a

h

t

n

a

l

P

Figure 2. Differentiation within the A. evenia species and its use for genomic and genetic studies. (A) Schematic phylogeny depicting the genetic differentiation within the A. evenia species and the relationships with Arachis sp.18,22Dash lines connect the tetraploid A. hypogaea to its most probable diploid genome

do-nors A. duranensis and A. ipaensis. Divergence times at certain nodes (circles) were estimated using a relaxed molecular clock (Table 3). (B) Phenotypic patterns which distinguish CIAT22838 and CIAT8232 lines and impact on the F1 hybrid obtained by cross-pollination: plant habit (ramified or not), time of flower opening (early and late), flower shapes (standard shape and dark throat indicated by an arrow), pod production. Bars¼ 5 mm.

(7)

Interestingly, for some other symbiotic genes, transcripts were not detected despite the use of root and nodule materials for the transcrip-tomes. Missing genes are involved in three symbiotic processes: bacte-rial recognition (i.e. LYK3 and LYR3), rhizobial infection (i.e. ANN1, EPR3, FLOT and RPG) as well as nodule functioning and immunity (i.e. DNF2, FEN1, SUNERGOS1, SymCRK and VAG1) (Fig. 4). In the absence of genomic data for A. evenia, it was not possible to deter-mine whether these genes were absent, not expressed or missed when mining the transcriptome data. But, it is worth noting that the concerned genes are involved in symbiotic processes that present sharp differences between A. evenia, M. truncatula and L. japonicus (Fig. 1).2,19As a consequence, these first observations represented

im-portant cues to further investigate the evolution of rhizobial symbiosis.

3.2.3 Development of EST-based molecular markers

To allow the development of molecular markers, the Bahia and Mbao transcriptomes were searched for microsatellites. These were subse-quently analysed for in silico polymorphism by comparing the orthol-ogous SSR-containing contigs of the two parental genotypes and filtered to allow primer design on flanking sequences. 1,500 microsat-ellites complying with these criteria were identified, the most abundant type of repeat motif being tri-nucleotides (46.4%), followed by di-nucleotides (35.9%) (Table 2). 500 SSR sites were selected for primer design and used for assessment of the polymorphism between the two parental genotypes and the F1progeny by capillary sequencing. Of the

412 successfully amplified SSRs, 335 (81%) were co-dominant, 50 (12%) were dominant and 27 (7%) were not polymorphic. Hence, thanks to the in silico analysis, the efficiency of polymorphic SSR de-velopment was increased by 2.5-fold compared to previous work with no prior filtering of the genic SSR tested on the same Bahia and Mbao genotypes.22Finally, a set of 318 co-dominant SSR markers was re-tained for the genotyping of the F2 mapping population

(Supplementary Tables S1, S2).

In order to map the symbiotic genes, polymorphisms were also searched for by comparing the orthologous sequences from the two parental genotypes and were converted into molecular markers. When no polymorphic SSR and no INDEL could be detected, Single Nucleotide Polymorphism (SNP) was used instead for KASP assay de-sign (Supplementary Tables S1–S4). Subsequently, at least one

molecular marker could be satisfactorily developed for each identified symbiotic gene, except for two genes, SEN1 and RSD, that lack of polymorphism. A few additional markers were also used, notably to locate the 5S and 45S loci that were previously identified on A. evenia chromosomes by cytogenetics (Supplementary Tables S1, S2).21

3.3 Construction of the genetic map and mapping of

symbiotic genes

3.3.1 Genotyping of the 220 F

2

mapping population

110 F2Aand the 110 F2B individuals, which were obtained by

bi-directional crossings of the Bahia and Mbao genotypes, formed the Table 1. Overview of the sequencing and assembly

Nucleotide length (pb) A. evenia Illumina libraries Bahia CIAT8232 Mbao CIAT22838

150–500 21,020 20,269 501–1,000 10,109 9,990 1,001–2,000 13,127 13,216 2,001–3,000 5,673 5,561 3,001–4,000 1,767 1,769 4,001–5,000 570 578 >5,000 382 379 Raw reads 68,945,277 75,745,315 Filtered reads 65,865,857 72,387,963 Number of contigs 51,763 52,648 min length (bp) 150 150 max length (bp) 16,895 16,886 N50 (bp) 1,799 1,803 Average length (bp) 1,092 1,082

Total nucleotide length (bp) 56,526,002 57,007,824

Figure 3. Assessment of assembly quality and gene coverage for the Illumina transcriptomes. (A) Distribution of mapped reads within the assem-bled contigs for the Illumina transcriptomes of the two parental genotypes Bahia and Mbao. B, Comparison of contig length between hit and no-hit con-tigs from the Mbao transcriptome assembly after tBLASTX analysis with the protein database of A. ipaensis. C, Comparison of Mbao contigs to putative orthologous A. ipaensis sequences by evaluating the % of alignment of cor-responding coding sequences.

(8)

mapping population. DNAs were extracted from the 220 F2

individ-uals and were genotyped for the selected 318 SSR markers by capil-lary sequencing. Multiplexes of 12 SSR markers were chosen so as to combine both three ranges of fragment sizes and four fluorochromes (FAM, VIC, NED and PET). 46 additional markers comprising SSRs, INDELs and SNPs, which target notable putative symbiotic genes, were also used for genotyping by capillary sequencing or KASPAR technology. This genotyping generated a matrix with <0.6% of missing data for the 364 co-dominant markers used.

3.3.2 Building of the genetic map

The A. evenia genetic map was built using the JOINMAP software with a minimum LOD score value of 6. The markers were assembled into 10 linkage groups, corresponding to the haploid chromosome set of A. evenia (2n ¼ 20). Interestingly, no marker remained unlinked and no link-age group could be split by increasing the LOD score value up to a LOD of 15. In addition, manual checking of the marker positions using color-mapping revealed no discrepancy in the genotyping data, thus corrobo-rating the robustness of the 10 linkage groups obtained. They were thus numbered AeLG1 to AeLG10 and orientated as explained below (Fig. 5).

The resulting genetic map spans 1,036 Kosambi cM, with an aver-age of 400 kb/cM. The 10 linkaver-age groups have different genetic sizes ranging from 82 to 130.8 cM and contained 13–49 markers (Table 3). This leads to average distances between two adjacent markers varying from 2.6 to 7.5 cM but with maximum interval sizes per linkage group ranging from 10.7 to 25.1 cM (Table 3). This situ-ation is reminiscent to what has been already observed in Vigna unguiculata and Lens culinaris for which genetic maps were also de-veloped with gene-based markers.36,37The gaps have been proposed

to correspond to gene-poor regions or to recombination hotspots. In addition, mapping of the 5S and 45S rDNA loci further enabled dis-tinguishing two linkage groups. Thus the 5S rDNA locus was found to locate on linkage group AeLG8 with a central position (at 55.6 cM on total 96.2 cM length) that is in accordance with the prox-imal localization previously observed on metaphase chromosomes by GISH.21Conversely, two 45S rDNA loci were shown to locate to in secondary constrictions of satellite chromosomes.21 Sequence

polymorphism for the ITS region allowed mapping only one of the two loci on the upper part of the linkage group AeLG10 (at 19.4 cM on a total 82 cM length) (Fig. 5).22Finally, gene-specific markers cor-responding to symbiotic genes were found to be distributed in all but one (AeLG2) linkage groups, with AeLG5, AeLG6, AeLG7 and AeLG9 containing the majority of them (Fig. 5).

3.3.3 Regions of segregating distortion (RSD)

Although almost all markers complied with the 1:2:1 expected Mendelian ratio for co-dominant genetic markers, two regions of segregation distortion were observed (Table 3,Fig. 6A). The AeLG2 RSD comprised up to 10 markers covering a map distance of up 70 cM and the AeLG9 RSD contained 11 markers (among which the symbiotic NSP1 and EFD genes) distributed over 45 cM, altogether representing 10% of the genetic map. In both cases, there was a strong overrepresentation of the Bahia allele and a near complete dis-appearance of the Mbao allele where the distortion culminated,

Infection Nodulation regulation Nod factor receptors Nod signaling Nodule organogenesis NFP (LYK3) (LYR3) SYMRK SIP1 NUP85 NUP133 Castor Pollux CCaMK Cyclops CRE1 NAP1 PIR1 LIN (RPG) NSP1 NSP2 NIN ERN1 ERF1 Transcription activation/ regulation VPY LNP ARPC1 LATD NOOT SUNN ASTRAY EFD SICKLE RDN1 (EPR3) Nodule differentiation/ functioning/immunity DNF1 (SymCRK) (SUNERGOS1) (VAG1) SIP2 (DNF2) SEN1 RSD (FEN1) bHLH476 (FLOT) PUB1 NF-YA (ANN1) SYMREM1

Figure 4. Simplified model for the symbiotic signaling pathway in legumes. Symbiotic genes identified in M. truncatula and L. japonicus are listed and classified in main symbiotic functions.8,27-33Putative orthologs were searched in the Illumina transcriptome of the two mapping parents. Those that could not be

identi-fied are between parentheses. Genes highlighted in bold characters are those which were demonstrated to be involved in the Nod-independent nodulation pro-cess in A. evenia.33,34

Table 2. Distribution of repeat types and number of repeats within the Mbao library

Repeat type Number of repeat units Total %

3 4 5 6 7 8 9 10 >10 Di – – – 64 105 95 107 68 120 559 35.9 Tri – 68 141 181 132 93 49 29 31 724 46.4 Tetra 12 58 53 29 11 2 2 1 – 168 10.8 Penta 25 51 27 1 1 1 – – – 106 6.8 Hexa – 2 – – – – – – – 2 0.1

(9)

whereas the proportion of heterozygotes remained unaltered. Judged from these skewed genotypic frequencies, a strong selection in favor of the Bahia alleles occurs at these RSD.

Distortions can be caused by chromosomal structural rearrange-ments, differences in DNA content, gene incompatibilities or deleteri-ous recessive alleles. Inversion or translocation events have been shown to be the cause of distortions in genetic maps of other legume such as M. truncatula, L. japonicus and L. culinaris.13,26,38–41Such

events are more likely to occur when performing interspecific crosses than when using related genotypes, but one cannot exclude that a chromosomal rearrangement occurred within the subspecies evenia to which both the Bahia and Mbao genotypes belong. In addition, even if they display a similar overall DNA content, this DNA content

may not be identical in the two RSD regions. Comparison of the ge-notypic data for the F2Aand F2Bmapping populations, revealed

simi-lar allele skewing in the two RSD regions, indicating that the observed distortions were independent of maternal and paternal ef-fects. However, pollen viability analysis showed that the F1progeny

contained 38% aborted pollen in contrast to the Bahia and Mbao parental genotypes that showed more than 99% pollen viability (Fig. 6B,C). This F1semisterility may account for the observed skewed

ge-netic frequencies, as it has been reported for some M. truncatula crosses.38Although, this RSD can alter genetic distances, it does not

compromise marker ordering in the SDR regions.

3.4 Comparative analysis with Arachis

3.4.1 Estimate of time divergences

With the aim of fostering genomic and symbiotic comparative analy-sis, we wanted to further our knowledge of the genetic differentiation in A. evenia and of the evolution within the Dalbergioid clade that contains Aeschynomene and Arachis. A 49 MYA divergence of Aeschynomene and Arachis was previously estimated from chloro-plast gene phylogenies (Fig 2A).6Using this date for calibration and

assuming a simple molecular clock, we made a rough estimate for the divergence times between different A. evenia genotypes and be-tween the two Arachis ancestral species of peanut. The coding read-ing frame of seven sread-ingle-copy nuclear genes was obtained from the transcript databases available for A. evenia genotypes Bahia (CIAT8232), Goias (IRFL6945) and Mbao (CIAT22838) and their Arachis orthologs in the genome sequences of A. duranensis and A. ipaensis (Fig 2A,Table 4,Supplementary Table S5). Synonymous substitution rates (Ks) were calculated for binned values and an

AeSSR256 0.0 AeSSR356 5.3 AeSSR204 6.2 AeSSR469 6.6 AeSSR164 7.1 AeSSR247 AeSSR452 8.7 AeSSR559 9.2 AeSSR140 14.8 AeSSR321 19.5 AeSSR271 AeSSR288 28.2 AeSSR396 41.7 ARPC1 52.5 AeSSR471 63.6 AeSSR348 66.7 AeSSR540 71.2 AeSSR158 71.7 AeSSR538 73.5 AeSSR167 74.0 AeSSR523 74.5 AeSSR195 75.1 AeSSR171 84.8 AeSSR459 AeSSR458 85.1 AeSSR370 94.4 AeSSR129 102.7 AeSSR156 AeSSR232 107.1 AeLG1 AeSSR283 0.0 AeSSR199 2.6 AeSSR476 7.1 NAP1 9.2 AeSSR510 9.6 AeSSR258 12.4 AeSSR192 13.1 AeSSR570 14.1 AeSSR462 15.1 AeSSR249 40.2 AeSSR318 45.0 AeSSR515 49.4 AeSSR287 53.1 AeSSR151 53.6 AeSNP1 55.3 AeSSR188 59.0 AeSSR543 59.2 AeSSR113 61.7 AeSSR361 68.1 AeSSR464 74.2 AeSNP2 76.8 AeSSR268 77.5 AeSSR118 77.8 AeSSR378 78.8 AeSSR169 81.0 AeSSR336 87.0 AeSSR236 AeSSR316 91.0 AeSSR219 91.7 AeSSR335 94.2 AeSSR444 95.4 AeSSR209 100.7 AeSSR600 104.0 AeSSR495 SYMREM1 104.5 AeSSR120 AeSSR214 105.2 AeSSR146 105.7 AeSSR168 108.3 AeSSR112 108.5 AeSSR467 110.2 AeSSR294 111.3 AeSSR102 111.6 AeSSR574 AeSSR587 AeSSR440 112.0 AeSSR229 112.3 CRE1 112.4 AeSSR518 113.1 AeLG3 AeSSR592 0.0 AeSSR296 3.2 AeSSR119 8.1 AeSSR443 8.6 AeSSR152 13.2 AeSSR409 AeSSR389 15.6 AeSSR243 16.4 AeSSR126 17.1 AeSSR175 AeSSR122 POLLUX 19.1 AeSSR520 AeSSR155 20.2 AeSSR563 26.0 AeSSR522 28.6 AeSSR451 29.3 AeSSR590 30.0 AeSSR576 30.5 AeSSR173 32.2 AeSSR373 32.9 AeSSR181 38.2 AeSSR165 AeSSR379 51.5 AeSSR594 53.9 AeSSR470 60.9 AeSSR109 64.5 AeSSR457 67.0 AeSSR395 68.6 AeSSR307 71.5 AeSSR280 73.7 AeSNP3 76.9 AeSSR483 82.6 AeSSR410 87.2 AeSSR419 91.3 AeSSR549 92.0 AeSSR292 93.7 AeSSR176 SUNN 95.9 AeSSR121 96.6 AeSSR325 100.8 AeSSR297 104.4 AeSSR115 104.5 AeSSR597 AeSSR497 AeSSR137 105.2 AeLG4 AeSSR344 0.0 AeSSR206 8.1 AeSSR585 25.6 AeSSR498 33.0 AeSSR366 37.1 AeSSR166 46.4 AeSSR412 48.9 AeSSR289 50.7 AeSSR542 52.8 AeSSR138 70.4 AeSSR416 92.5 AeSSR191 97.2 AeSSR547 98.1 AeLG2 SYMRK 0.0 AeSSR350 AeSSR161 2.5 AeSSR499 3.2 CYCLOPS 3.9 AeSSR450 AeSSR301 9.1 AeSSR573 10.4 AeSSR351 12.2 AeSSR501 13.9 NFP 16.4 AeSSR305 17.4 AeSSR322 18.5 AeSSR371 AeSSR177 23.4 BHLH476 25.6 AeSSR422 28.0 AeSSR311 31.8 AeSSR150 33.3 AeSSR466 33.8 AeSSR111 34.7 AeSSR465 AeSSR461 37.2 AeSSR564 39.6 AeSSR298 43.6 AeSSR162 48.2 AeSSR357 64.1 AeSSR114 64.3 VAPYRIN 67.9 AeSSR387 69.8 AeSSR480 71.9 AeSSR248 91.7 AeSSR330 96.8 AeSSR264 97.7 AeSSR170 100.6 AeSSR127 101.4 AeSSR571 103.1 AeSSR241 107.4 AeSSR491 115.6 AeSSR572 118.0 AeSSR418 118.2 AeSSR460 120.9 AeSSR101 123.0 AeSSR251 AeSSR575 123.5 AeSSR552 124.4 AeSSR200 AeSSR238 124.7 AeLG5 NIN 0.0 NUP133 0.4 AeSSR290 0.6 AeSSR492 0.8 AeSSR224 1.0 AeSSR388 2.7 AeSSR508 AeSSR580 5.2 RDN1 6.7 AeSNP5 7.6 AeSNP6 8.1 PUB1 AeSSR257 10.5 AeSSR125 10.8 AeSSR363 19.0 AeSSR441 20.1 AeSSR561 22.9 AeSSR524 24.6 AeSSR427 26.3 AeSSR159 35.1 AeSSR231 45.8 AeSSR235 47.0 AeSSR577 49.4 AeSSR349 NUP85 50.1 AeSNP7 51.3 NSP2 52.7 AeSSR323 54.8 LATD 62.8 AeSSR354 68.9 AeSSR554 69.5 AeSSR407 73.7 AeSSR453 78.3 AeSSR261 83.6 AeLG7 SIP2 0.0 AeSSR304 0.7 EFD 4.0 AeSSR160 17.8 AeSNP10 19.5 AeSSR139 19.9 AeSSR478 AeSSR397 23.7 AeSSR182 24.5 NSP1 28.1 AeSNP11 34.0 AeSSR180 34.5 AeSSR267 41.6 AeSSR545 45.5 AeSSR582 45.7 ERF1 53.4 AeSSR432 57.8 LIN 58.5 AeSSR225 62.3 AeSSR375 65.2 AeSSR584 65.9 AeSSR282 66.9 AeSSR202 67.4 AeSSR578 72.2 AeSSR488 77.9 AeSSR190 82.0 AeSSR568 84.0 AeSSR415 90.7 AeSSR408 91.5 AeSSR581 94.5 AeSSR503 AeSSR424 94.7 AeINDEL1 AeSSR380 95.0 AeSSR245 95.1 AeSSR293 95.3 AeLG9 AeSSR449 0.0 SICKLE AeSSR103 6.4 AeSSR490 8.9 AeSSR254 10.6 AeSSR430 14.1 AeSSR313 14.6 NFYA AeSSR586 16.2 AeSSR253 16.7 AeSSR174 17.0 AeSSR367 AeSSR222 17.4 AeSSR446 20.0 AeSSR207 20.2 AeSSR106 22.7 AeSSR136 23.8 AeSSR193 24.5 DNF1 28.5 AeSSR472 28.6 PIR1 31.8 AeSSR596 33.8 CASTOR 35.6 AeSSR244 36.7 AeSSR512 AeSSR107 37.0 AeSSR385 40.4 NOOT 47.4 AeSSR365 48.3 AeSSR132 51.9 AeSSR230 52.6 AeSSR382 61.7 AeSSR234 63.0 AeSSR228 76.3 AeSSR255 98.1 AeSNP4 112.3 AeSSR189 113.6 AeSSR414 117.6 AeSSR347 125.5 AeSSR286 126.8 AeSSR273 130.8 AeLG6 AeSSR504 AeSSR328 0.0 AeSSR212 0.7 SIP1 1.4 AeSSR239 1.6 AeSSR279 1.9 AeSSR539 AeSSR196 2.2 AeSSR327 2.8 AeSSR475 AeSSR516 6.4 AeSNP8 6.9 AeSSR266 8.1 AeSSR105 9.0 AeSSR556 11.3 AeSSR334 12.3 AeSSR527 14.6 AeSSR374 17.5 AeSSR536 20.0 AeSSR392 27.8 AeSSR579 32.2 AeSNP9 34.1 AeSSR310 AeSSR186 37.0 AeSSR429 38.5 AeSSR405 41.4 AeSSR346 42.1 AeSSR317 42.4 AeSSR131 46.0 CCAMK 53.1 AeSSR569 AeSSR312 54.0 AeSSR494 54.4 5S 55.6 AeSSR544 58.3 AeSSR270 70.8 AeSSR560 79.2 AeSSR284 83.2 AeSSR342 87.6 AeSSR205 AeSSR281 90.2 AeSSR369 93.7 AeSSR481 96.2 AeLG8 AeSSR599 0.0 AeSSR211 0.2 AeSSR123 12.7 ASTRAY 14.8 AeSSR240 17.7 ITS 19.4 AeSSR314 27.0 AeSSR420 27.8 AeSSR217 48.4 AeSSR300 50.8 AeSSR426 53.7 AeSSR532 54.6 AeSSR565 54.9 AeSSR431 56.4 AeSSR445 AeSSR226 63.3 AeSSR402 63.4 AeSSR513 65.5 AeSSR355 73.0 AeSSR541 76.3 AeSSR507 77.2 AeSSR485 79.1 AeSSR275 80.1 AeSSR394 80.8 AeSSR278 82.0 AeLG10

Figure 5. Genetic map of Aeschynomene evenia. The genetic map based on F2 mapping population CIAT22838 CIAT8232 is comprised of 364 gene based markers including SSR, INDEL and SNP (KASP) markers. The ten linkage groups are designated as AeLG1-AeLG10. The code to the right of the linkage groups refers to the marker or symbiotic gene name. The numbers to the left of the linkage groups refers to the genetic distances (Kosambi cM) from the top.

Table 3. Characteristics of the Aeschynomene evenia genetic map

LG Total length (cM) Number of loci Number of markers Average distance (cM) Min–max distances (cM) Distorted markers (P < 0.001) AeLG1 107.1 25 29 4.3 0–13.5 0 AeLG2 98.1 13 13 7.5 0–22.1 10 AeLG3 113.1 44 49 2.6 0–25.1 0 AeLG4 105.2 38 46 2.8 0–13.3 0 AeLG5 124.7 42 48 3.0 0–19.8 0 AeLG6 130.8 37 41 3.5 0–21.8 0 AeLG7 83.6 31 34 2.7 0–10.7 0 AeLG8 96.2 37 43 2.6 0–12.6 0 AeLG9 95.3 33 36 2.9 0–13.9 15 AeLG10 82 24 25 3.4 0–20.6 0

(10)

average mutation rate of 5.34  103substitutions/synonymous site/

Mya was obtained by using the 49 MYA value for the Aeschynomene–Arachis divergence.6

This enabled an estimate of divergence time between the two map-ping parents that belong to A. evenia ssp. evenia var. evenia and var. pauciciliata at 0.4 Mya, reflecting both the low difference observed in polymorphism and the cross-compatibility between the two varieties.22

In contrast, a much older divergence time of 3.3 Mya was deduced for A. evenia ssp. evenia–A. evenia ssp. serrulata. This is in agreement with the pronounced genetic differentiation of the two A. evenia sub-species evidenced by the difficulty of making successful crosses and the almost completely sterile hybrids.22This divergence time is similar to

the 3.5 Mya obtained for A. duranensis–A. ipaensis, which are geneti-cally isolated, thus supporting the idea that the two A. evenia subspecies are at the final stage of the speciation process. Interestingly, the value calculated for the Arachis species couple is in accordance with other es-timates (2.1–3.5 Mya).18,42Therefore, our approach seems to be valid

to provide a timeframe for A. evenia evolution.

3.4.2 Genome conservation and evolution with

Arachis ipaensis

Several studies have undertaken comparative genomics within the Papilionoids revealing that the degree of synteny is correlated with Figure 6. Segregation distortion and pollen viability of F1 hybrids. (A) Segregation distortion of the co-dominant markers along linkage groups AeLG2 and AeLG9 in the F2 mapping population. Triangles refer to the CIAT22838 genotype, lozenges to the CIAT8232 genotype and squares to the heterozygote. X-axis: genetic distance from the top of the linkage group in Kosambi cM. Y-axis: frequency (%) of segregation of the different genotypes. If no distortion occurs, the segregation values should be 25%/50%/25%. B, Proportion of non-aborted pollen grains in the mapping parents and the F1 hybrid. C, Anther from the CIAT22838 line (upper left panel) and pollen grains of the mapping parents and the F1 hybrid. Aborted pollen stains pale and nonaborted pollen stains dark. Bars¼ 50 mm.

Table 4. Estimation of divergence times between Aeschynomene and Arachis genomes

Evolutionary divergence Substitution per site (Ks) Average

(SR) Divergence APX2 (738 pb) AS2 (1,629 pb) b-Fructofuranosidase (1,536 pb) PIP2;7 (864 pb) Proline-tRNA ligase (1,098 pb) SPAG1 (861 pb) TIP1;1 (750 pb) (SR/Mya) (Mya)

A. duranensis–A. evenia Mbao 0.581 0.556 0.504 0.797 0.380 0.431 0.442 0.527 0.00534* 49* A. duranensis–A. evenia Mbao 0.569 0.576 0.531 0.758 0.367 0.421 0.419 0.520

A. duranensis–A. ipaensis 0.029 0.039 0.037 0.053 0.025 0.049 0.026 0.037 0.00534 3.5

A. evenia Goias-Bahia 0.035 0.056 0.039 0.024 0.029 0.038 0.026 0.035 0.00534 3.3

A. evenia Goias-Mbao 0.035 0.050 0.045 0.024 0.038 0.032 0.026 0.036 0.00534 3.3

A. evenia Bahia-Mbao 0.000 0.011 0.008 0.000 0.008 0.005 0.000 0.005 0.00534 0.4

*Average divergence rate obtained by using the known 49 MYA divergence time between Aeschynomene and Arachis.6Numbers in bold are inferred values.

(11)

phylogenetic distance of the legume species.9–17The first glimpses of genome evolution within the Dalbergioid clade were only recently obtained by the genome sequencing of the two putative genome do-nors of peanut. This revealed a mostly one-to-one correspondence between the two species, with the genome of A. duranensis, but not of A. ipaensis, having undergone several major rearrangements.22To

perform the Aeschynomene-Arachis genome comparison, we made use of the gene-based nature of the markers mapped on the genetic map of A. evenia (2n ¼ 20) to locate orthologous sequences on the chromosomes of A. duranensis and A. ipaensis (2n ¼ 20) (Supplementary Table S6). Although all the transcripts used to de-velop the markers did not necessarily contain predicted coding read-ing frames, 97% of them (353 out of 364) showed significant matches to Arachis sequences. To facilitate comparison, the Aeschynomene linkage groups have been numbered and oriented to best match the corresponding Arachis chromosomes (Supplementary Fig. S2A,B).

When the A. evenia linkage groups and the A. ipaensis chromo-somes were aligned, extensive stretches of shared colinearity between them were evident (Fig. 7A,Supplementary Fig. S2A). Some link-age groups of A. evenia showed hits mainly with one chromosome of A. ipaensis, notably AeLG1 that matches Araip.A01, AeLG2 with Araip.A02, AeLG5 with Araip.05, AeLG6 with Araip.06 and AeLG9 with Araip.09 (Fig. 7A). However, detailed analysis of mac-rosynteny between homologous chromosomes evidence internal rear-rangements such as translocations and inversions (Fig. 7B). In addition, even though A. evenia and A. ipaensis share the same base chromosome number of 10, one-to-one relationships do not hold true for the remaining A. evenia linkage groups. Instead, they appear to be composed of segmental syntenic blocks matching different chromosomal positions in A. ipaensis as exemplified by AeLG4 that contains three conserved blocks found in the chromosomes Araip. A03, A04 and A10. It is worth noting that some gaps observed

within Aeschynomene linkage groups are coincident with syntenic block junctions and thus may correspond to more variable regions prone to genome restructuring as suggested earlier.4 In addition, Aeschynomene loci mostly match and cover the euchromatic arms of Arachis chromosomes while only few loci map pericentromeric re-gions (Fig. 7A).18,22This observation is consistent with all the

eu-chromatic regions of the A. evenia genome being captured by the gene-based markers composing the genetic map.

3.4.3 Macrosynteny analysis for the symbiotic genes

Considering the divergence time between Aeschynomene and Arachis (49 Mya) relative to the isolation time of the Dalbergioid clade from the Phaseolids and Galegoids (55 Mya), along with the 3 to 3.5-fold larger genome of A. duranensis and A. ipaensis (1,250 and 1,560 Mb, respectively) compared to A. evenia (415 Mb), the high level of macrosynteny documented here between Arachis and Aeschynomene is remarkable. Much of the expansion in Arachis ge-nomes may be due to retroelements, but it has done little to disrupt macrosynteny that is dominated by large chromosome arms-size rearrangements. Such macrosynteny was found to be maintained for the symbiotic genes mapped onto the A. evenia linkage groups, sug-gesting that true orthologous relationships were identified (Fig. 7A). One noticeable exception concerns the CCaMK gene that is mapped onto AeLG8 and is positioned between two syntenic blocks, while its A. ipaensis ortholog locates on Araip.A01 chromosome within a block that is syntenic to AeLG1 (Fig. 7A,B). Although it is not possi-ble to determine the extent to which macrosyntenic relationships identified by genetic mapping are indicative of conserved microsyn-teny, it is interesting to note that a retrotransposon insertion site was previously identified 564-bp upstream of the AeCCamK gene.34 Since retrotransposons are known to be involved in genome restruc-turing and also to induce gene mobility, further studies may help

B

A

AeCCaMK

AeLG1

Araip.B01

AeLG8

Araip.B0

8

Araip.B07

*

Figure 7. Syntenic relationships of A. evenia with the sequenced Dalbergioid legume Arachis ipaiensis. (A) Comparison of the genetic map of A. evenia with the genome sequence of A. ipaensis. Circled bars on the left side correspond to the linkage groups of A. evenia while those on the right side represent the chromo-somes of A. ipaensis. Homologous loci are connected by lines and spots correspond to symbiotic genes. Spots indicate conserved macrosyntenic locations and asterisks non-conserved ones. (B) Detailed example of macrosynteny between A. evenia and A. ipaensis. Alignment of linkage groups AeLG1 and AeLG8 from the developed genetic map of A. evenia with the corresponding A1 and A8 chromosomes of A. ipaensis. Dot lines connect orthologous loci between the two species. The dot line link the non-macrosyntenic location of the AeCcaMK gene.

(12)

determining their potential impact on the genomic environment of this key symbiotic gene in A. evenia.

4. Conclusions and perspectives

To advance in our understanding of the NF-independent symbiosis in the model legume A. evenia, we developed a genetic map incorpo-rating expressed genes that are likely to be involved in nodulation. Previous characterization of different genotypes in A. evenia allowed us to select the Mbao genotype as a reference for molecular genetics and the Bahia genotype as a suitable crossing parent to develop a F2

mapping population. Transcriptome in silico analysis for the two pa-rental genotypes enabled an efficient selection of polymorphic molec-ular markers that were used to genotype the mapping population. The choice of genotyping methods, i.e. multiplexing and capillary se-quencing for SSR and INDEL markers along with the KASP assays for SNP markers resulted in 99.4% complete genotyping data. The genetic map was resolved in 10 robust linkage groups that most likely represented the 10 chromosome pairs in A. evenia. The avail-ability of a high-density map containing easy-to-use and cost-effective markers will facilitate gene and QTL mapping work in A. evenia.

Given the gene-based nature of the markers used, this genetic map provided the first insights into the structure of the genome gene-space and, in particular, uncovered the distribution of expressed orthologs of known symbiotic genes. Comparative genomic analysis with Arachis further evidenced that the Aeschynomene genome con-stituted of macrosyntenic blocks. Therefore, this genetic map of Aeschynomene could form the basis for future efforts in the scaffold anchoring and assembly in an Aeschynomene genome sequencing initiative. The availability of a whole genome sequence is now feasi-ble thanks to the next-generation sequencing technologies. This will allow much better comparison with Arachis to characterize genome evolution within the Dalbergioid clade. In particular, microsynteny analysis will allow investigating if chromosome rearrangements and transposable elements had an impact on certain symbiotic genes as suspected for the AeCCaMK gene and those that could not be found in the transcriptome datasets. Understanding how these symbiotic genes evolved (by changes of gene expression or by gene loss) in a NF-independent context would be an important advance in under-standing the evolution of nodulation.

In addition, an a priori identification of symbiotic genes in A. evenia using a mutagenesis approach should be a powerful tool. In this perspective, a strategy combining gene mapping with sequencing at gene or whole genome-level will allow an efficient identification of new alleles and new symbiotic loci among mutant lines, as has been successfully performed in M. truncatula and L. japonicus.31,43–46

New methods, such as SHOREmap and MutMap, now integrate si-multaneous mapping and mutation identification in a single step by deep whole-genome resequencing of pooled DNA from a F2mutant

population.47,48This population can be obtained either by crossing

the mutant to the mapping parent or to the original WT parental line used for mutagenesis. In the latter case, mapping relies on the detec-tion of mutagenesis-induced polymorphisms. The similar genetic backgrounds present the advantage of both simplifying segregation patterns and suppressing the occurrence of distortions. As a result, applying this strategy to A. evenia, the identification of new symbi-otic genes should be straightforward and should facilitate the deci-phering of the molecular mechanisms of the NF-independent process.

Acknowledgements

We thank A.M. Risterucci and F. de Bellis (AGAP, CIRAD, France) for valuable suggestions on genetic map construction and L. Legrand (LIPM, INRA/CNRS, France) for informatics assistance.

Accession numbers

SRR3276128, SRR3285082

Conflict of interest

None declared.

Supplementary data

Supplementary dataare available at www.dnaresearch.oxfordjournals.org.

Funding

This work was supported by Agropolis Fondation through the « Investissements d’avenir » programme (ANR-10-LABX-0001-01) under the reference ID “AeschyMap” AA1202-009, and by a grant from the French National Research Agency (ANR-AeschyNod-14-CE19-0005-01). Funding to pay the Open Access publication charges for this article was provided by the French National Research Agency.

References

1. Graham, P.H. and Vance, C.P. 2003, Legumes: importance and con-straints to greater use, Plant Physiol., 131, 872–7.

2. Sprent, J.I. 2007, Evolving ideas of legume evolution and diversity: a taxo-nomic perspective on the occurrence of nodulation. New Phytol., 174, 11–25.

3. Peterson, R., Slovin, J.P., Chen, C. 2010, A simplified method for differen-tial staining of aborted and non-aborted pollen grains. Int. J. Plant Biol., 1:e13.

4. Bertioli, D.J., Moretzsohn, M.C., Madsen, L.H. et al. 2009, An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes. BMC Genomics, 10, 45–55.

5. Held, M., Hossain, M.S., Yokota, K., et al. 2010, Common and not so common entry. Trends Plant Sci., 15, 540–5.

6. Lavin, M., Herendeen, P.S., Wojciechowski, M.F. 2005, Evolutionary rates analysis of Leguminosae implicates a rapid diversification of lineages during the Tertiary. Syst. Biol., 54, 575–594.

7. Doyle, J.J. and Luckow, M.A. 2003, The rest of the iceberg. Legume di-versity and evolution in a phylogenetic context, Plant Physiol., 131, 900–10.

8. Oldroyd, G.E.D., Murray J.D., Poole P.S., et al. 2011, The rules of en-gagement in the legume-rhizobial symbiosis, Annu. Rev. Genet., 45, 119–44.

9. Choi, H.K., Mun, J.H., Kim, D.J., et al. 2004, Estimating genome conser-vation between crop and model legume species, Proc. Natl. Acad. Sci., 101, 15289–94.

10. Isobe, S.N., Hisano, H., Sato S., et al. 2012, Comparative genetic map-ping and discovery of linkage disequilibrium across linkage groups in white clover (Trifolium repens L.). G3, 2, 607–17.

11. Sato, S., Nakamura, Y., Kaneko, T., et al. 2008, Genome structure of the legume, Lotus japonicus. DNA Res., 15, 227–39.

12. Schmutz, J., Cannon, S.B., Schlueter, J., et al. 2010, Genome sequence of the palaeopolyploid soybean, Nature, 463, 178–83.

(13)

13. Young, N.D., Debelle´, F., Oldroyd, G.E., et al. 2011. The Medicago ge-nome provides insight into the evolution of rhizobial symbioses. Nature, 480, 520–4.

14. Varshney, R.K., Chen, W., Li, Y., et al. 2012, Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farm-ers. Nat. Biotechnol., 30, 83–9.

15. Zhu, H., Choi, H.K., Cook, D.R., Shoemaker, R.C. 2005, Bridging model and crop legumes through comparative genomics. Plant Physiol., 137, 1189–96.

16. Varshney, R.K., Song, C., Saxena, R.K. et al. 2012, Draft genome se-quence of Chickpea (Cicer arietinum) provides a resource for trait im-provement. Nat. Biotechnol., 31, 240–6.

17. Schmutz, J., Mc Clean, P.E., Mamido, S. et al. 2014, A reference genome for common bean and genome wide analysis of dual domestications. Nature Genet., 46, 707–13.

18. Bertioli, D.J., Cannon, S.B., Froenicke, L. et al. 2016, The genome se-quences of Arachis duranensis and Arachis ipaensis the diploid ancestors of cultivated peanut. Nature Genet. doi:10.1038/ng.3517.

19. Giraud, E., Moulin, L., Vallenet, D. et al. 2007, Legumes symbioses: ab-sence of Nod genes in photosynthetic bradyrhizobia. Science, 316, 1307–1312.

20. Sprent, J.I. and James, E.K. 2008, Legume-rhizobial symbiosis: an an-orexic model? New Phytol., 179, 3–5.

21. Arrighi, J.F., Cartieaux, F., Brown, S.C., et al. 2012, Aeschynomene evenia, a model plant for studying the molecular genetics of the Nod-independent rhizobium-legume symbiosis. Mol. Plant-Microbe Interact., 25, 851–861.

22. Arrighi, J.F., Cartieaux, F., Chaintreuil, C., et al. 2013, Genotype delimi-tation in the Nod-independent model legume Aeschynomene evenia. PloS One, 8, e63836.

23. Rubio-Pi~na, J.A. and Zapata-Pe´rez, O. 2011, Isolation of total RNA from tissues rich in polyphenols and polysaccharides of mangrove plants. Plant Biotech., doi:10.2225/Vol02304-issue5-fulltext-8.

24. Combes, M.C., Dereeper, A., Severac, D., et al. 2013, Contribution of subgenomes to the transcriptomes and their intertwined regulation in the allopolyploid Coffea arabica grown at contrasted temperatures. New Phytol., 200, 251–60.

25. Allegre, M., Argoult, X., Boccara, M., et al. 2012, Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and sim-ple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L. DNA Res., 19, 23–35.

26. Thoquet, P., Ghe´rardi, M., Journet, E.P., et al. 2002, The molecular ge-netic linkage map of the model legume Medicago truncatula: an essential tool for comparative legume genomics and the isolation of agronomically important genes. BMC Plant Biology, 2, 1.

27. Gourion, B., Berrabah, F., Ratet., P., et al. 2014, Rhizobium-legume symbioses: the crucial role of plant immunity. Trends Plant Sci., 20, 186–94.

28. Reid, D.E., Ferguson, B.J., Hayashi, S., et al. 2011, Molecular mecha-nisms controlling legume autoregulation of nodulation. Ann. Bot., 108, 789–95.

29. Couzigou, J.M., Zhukov, V., Mondy, S., et al. 2012, NODULE ROOT and COCHLEATA maintain nodule development and are legume ortho-logs of Arabidopsis BLADE-ON-PETIOLE genes. Plant Cell., 24, 4498–510.

30. Suzaki, T., Ito, M., Yoro, E., et al. 2014, Endoreduplication-mediated ini-tiation of symbiotic organ development in Lotus japonicus. Development, 141, 2441–5.

31. Yoon, H.J., Hossain, M.S., Held, M., et al. 2014, Lotus japonicus SUNERGOS1 encodes a predicted subunit A of a DNA topoisomerase VI that is required for nodule differentiation and accommodation of rhizobial infection. Plant J., 78, 811–21.

32. Kawaharada, Y., Kelly, S., Nielsen, M.W., et al. 2015, Receptor-mediated exopolysaccharide perception controls bacterial infection. Nature, 523, 308–12.

33. Fliegmann, J., Canova, S., Lachaud, C., et al. 2013. Lipo-chitooligosac-charidic symbiotic signals are recognized by LysM receptor-like kinase LYR3 in the legume Medicago truncatula. ACS Chem. Biol., 8, 1900–6. 34. Fabre, S., Gully, D., Poitout, A., et al. 2015, The Nod factor-independent

nodulation in Aeschynomene evenia required the common plant-microbe symbiotic “toolkit”. Plant Physiol., 169, 2654–64.

35. Czernic, P., Gully, D., Cartieaux, F., et al. 2015, Convergent evolution of endosymbiont diffenrentiation in Dalbergioid and IRLC legumes medi-ated by nodule-specific cysteine-rich peptides. Plant Physiol., 169, 1254–65.

36. Xu, P., Wu, X., Wang, B., et al. 2011, A SNP and SSR based genetic map of Asparagus Bean (Vigna unguiculata ssp. sesquipedialis) and compari-son with the broader species. PLoS ONE, 6, e15952.

37. Sharpe, A.G., Ramsay, L, Sanderson, L.A., et al, Ancient orphan crop joins modern era: gene-based SNP discovery and mapping in lentil. BMC Genomics, 14, 192.

38. Kamphuis, L.G., Williams, A.H., D’Souza, N.K., et al. 2007, The Medicago truncatula reference accession A17 has an aberrant chromo-somal configuration. New Phytol., 174, 299–303.

39. Hayashi, M., Miyahara, A., Sato, S., et al. 2001, Construction of a genetic linkage map of the model legume Lotus japonicus using an intraspecific F2 population. DNA Res., 8, 301–10.

40. Sandal, N., Krusell. L., Radutoiu, S., et al. 2002, A genetic linkage map of the model legume Lotus japonicus and strategies for fast mapping of new loci. Genetics, 161, 1673–83.

41. Gujaria-Verma, N., Vail, S.L., Carrasquilla-Garcia, N., et al. 2014, Genetic mapping of legume orthologs reveals high conservation of synteny between lentil species and the sequenced genomes of Medicago and chick-pea. Front. Plant. Sci., 5, 676.

42. Nielen, S., Vigidal, B.S., Leal-Bertioli, S.C.M., et al. 2012, Matita, a new retroelement from peanut: characterization and evolutionary context in the light of the Arachis A-B genome divergence. Mol. Genet. Genomics, 287, 21–38.

43. Sandal, N., Petersen, T.R., Murray, J., et al. 2006, Genetics of symbiosis in Lotus japonicus: recombinant inbred lines, comparative genomic maps, and map position of 35 symbiotic loci. MPMI, 16, 80–91.

44. Sandal, N., Jin, H., Rodriguez-Navarro, D.N., et al. 2012, A set of Lotus japonicus Gifu x Lotus burttii recombinant inbred lines facilitates map-based cloning and QTL mapping. DNA Res., 19, 317–23.

45. Murray, J., Karas, B., Ross, L., et al. 2006. Genetic suppressors of the Lotus japonicus har1-1 hypernodulation phenotype. MPMI, 19, 1082–91. 46. Kim, M., Chen, Y., Xi, J., et al. An antimicrobial peptide essential for

bacte-rial survival in the nitrogen-fixing symbiosis. Proc. Natl. Acad. Sci. USA., 112, 15238–43.

47. Schneeberger, K., Ossowski, S., Lanz, C., et al. 2009, SHOREmap: simul-taneous mapping and mutation identification by deep sequencing. Nature Methods, 6, 550–1.

48. Aba, A., Kosugi, S., Yoshida, K., et al. 2012, Genome sequencing reveals ag-ronomically important loci in rice using MutMap. Nat. Biotech. 30, 174–9.

Figure

Figure 1. Phylogeny of the Papilionoid lineage and evolution of the symbiotic infection process
Figure 2. Differentiation within the A. evenia species and its use for genomic and genetic studies
Figure 3. Assessment of assembly quality and gene coverage for the Illumina transcriptomes
Figure 4. Simplified model for the symbiotic signaling pathway in legumes. Symbiotic genes identified in M
+4

Références

Documents relatifs

With the goal of discovering natural variation in the Nod-independent Aeschynomene legumes, we developed a large collection of 226 accessions that spans the geo- graphical

find an expression profile similar to that observed in M. Still, no straightforward hypothesis can be proposed concerning the absence of DNF1 up-regulation in A. In the case of

Finally, nodulation tests with different Bradyrhizobium strains revealed new nodulation behaviours and the diploid species outside of the Nod-independent clade were compared for

Two mRNA libraries were built and sequenced in one lane as de- scribed. 24 The Illumina paired-end sequencing technology generated 2 100 bp independent reads from the 200 bp

We evaluated the conservation of the signaling pathway common to other endosymbioses using three candidate genes: Ca 2+ /Calmodulin-Dependent Kinase (CCaMK), which plays a central

Our purpose is thus to study the plant molecular mechanisms governing this new interaction and in particular to determine whether the signalling pathway triggering nodulation

independant symbiosis, we have screened a large Tn5 mutant library of the Bradyrhizobium ORS278 strain for their inability to induce nodules on Aeschynomene.. No strict

In this study, we describe the polymorphism and geographical distribution of a 7-bp insertion in the 3’ untranslated region of the rabbit SRY that we pre- viously identified in only