
The bottle gourd genome provides insights into Cucurbitaceae evolution and facilitates mapping of a Papaya ring-spot virus resistance locus


RESOURCE


Shan Wu1,†, Md Shamimuzzaman2,†, Honghe Sun1,3,†, Jerome Salse4,†, Xuelian Sui2,5,6, Alan Wilder2, Zujian Wu5,6, Amnon Levi2, Yong Xu3, Kai-Shu Ling2,* and Zhangjun Fei1,7,*

1Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA,

2US Vegetable Laboratory, USDA-Agricultural Research Service, Charleston, SC, USA,

3Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China,

4Institut National de la Recherche Agronomique, Unites Mixtes de Recherche 1095, Genetics, Diversity and Ecophysiology of Cereals, Paleogenomics & Evolution (PaleoEvo) Group, Clermont-Ferrand, France,

5State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, Fujian Province Key Laboratory of Plant Virology, Institute of Plant Virology, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China,

6Department of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou, China, and

7USDA-Agricultural Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, NY, USA

Received 13 July 2017; revised 1 September 2017; accepted 7 September 2017; published online 23 October 2017. *For correspondence (e-mail zf25@cornell.edu or kai.ling@ars.usda.gov).

†These authors contributed equally to this work.

SUMMARY

Bottle gourd (Lagenaria siceraria) is an important vegetable crop as well as a rootstock for other cucurbit crops. In this study, we report a high-quality 313.4-Mb genome sequence of a bottle gourd inbred line, USVL1VR-Ls, with a scaffold N50 of 8.7 Mb and a longest scaffold of 19.0 Mb. About 98.3% of the assembled scaffolds are anchored to the 11 pseudomolecules. Our comparative genomic analysis identifies chromosome-level syntenic relationships between bottle gourd and other cucurbits, as well as lineage-specific gene family expansions in bottle gourd. We reconstructed the genome of the most recent common ancestor of Cucurbitaceae, which revealed that the ancestral Cucurbitaceae karyotype consisted of 12 protochromosomes with 18 534 protogenes. The 12 protochromosomes are largely retained in the modern melon genome, whereas they have undergone different degrees of shuffling in the other investigated cucurbit genomes. The 11 bottle gourd chromosomes derive from the ancestral Cucurbitaceae karyotype through 19 chromosomal fissions and 20 fusions. The bottle gourd genome sequence has facilitated the mapping of a dominant monogenic locus, Prs, conferring Papaya ring-spot virus (PRSV) resistance in bottle gourd, to a 317.8-kb region on chromosome 1. We have developed a cleaved amplified polymorphic sequence (CAPS) marker tightly linked to the Prs locus and demonstrated its potential application in marker-assisted selection of PRSV resistance in bottle gourd. This study provides insights into the paleohistory of Cucurbitaceae genome evolution, and the high-quality genome sequence of bottle gourd provides a useful resource for plant comparative genomics studies and cucurbit improvement.

Keywords: bottle gourd, Lagenaria siceraria, genome sequencing, cucurbit genome evolution, Papaya ring-spot virus resistance.


INTRODUCTION

Bottle gourd [Lagenaria siceraria (Molina) Standl.] (2n = 2x = 22) belongs to the genus Lagenaria in the Cucurbitaceae family. It is believed to have originated in sub-Saharan Africa (Decker-Walters et al., 2004), and consists of two subspecies: the African L. siceraria ssp. siceraria and the Asian L. siceraria ssp. asiatica (Schlumbaum and Vandorpe, 2012). As one of the most ancient crops cultivated by humans, bottle gourd is widely grown in the world today, particularly in East Asian countries (Kistler et al., 2014). Bottle gourd has many uses, largely because its fruits can serve as food, medicine, containers, musical instruments or decorative artefacts (Mashilo et al., 2017). Its immature fruits, which offer nutritional benefits to human health, are eaten as a vegetable in many tropical and temperate regions (Morimoto and Mvere, 2004). In recent years, bottle gourd has also been used as an important rootstock for grafting other cucurbit crops to improve their disease resistance and cold tolerance (Davis et al., 2008; King et al., 2008).

Plant diseases are a major constraint in bottle gourd production (Mashilo et al., 2017). Genetic sources of resistance to a number of economically important diseases have been identified in bottle gourd (Provvidenti, 1981, 1995; Ling and Levi, 2007; Kousik et al., 2008; Ling et al., 2013; Sarao et al., 2014), but the inheritance of these resistances and molecular markers useful for marker-assisted selection remain largely unstudied. Among plant diseases, viral diseases are a major limiting factor in cucurbit crop production. The most prevalent viruses with a major impact on cucurbit crop production worldwide are the aphid-transmitted viruses in the family Potyviridae, including Papaya ring-spot virus watermelon strain (PRSV-W), Watermelon mosaic virus (WMV), and Zucchini yellow mosaic virus (ZYMV) (Lecoq et al., 1998; Turechek et al., 2010; Ali et al., 2012). Commercially available bottle gourd cultivars are generally susceptible to viral diseases (Ling et al., 2013). We previously screened the United States Plant Introductions (PIs) of bottle gourd (L. siceraria) collected from Africa, Asia, and South and North America, which represent a wide genetic diversity (Levi et al., 2009), and identified numerous lines with resistance to several potyviruses (Ling and Levi, 2007; Ling et al., 2013). Two of these resistant PIs were further developed into inbred lines, USVL1VR-Ls and USVL5VR-Ls. In greenhouse and field trials, these two lines proved to possess broad resistance to potyviruses, including PRSV-W, WMV, ZYMV and Squash vein yellowing virus (SqVYV) (Ling et al., 2013).

Papaya ring-spot virus (PRSV) is one of the most destructive viruses infecting papaya and cucurbits worldwide (Gonsalves et al., 2010). Based on host range, PRSV is grouped into two major biotypes or strains: the papaya strain (PRSV-P), which infects both papaya and cucurbits, and the watermelon strain (PRSV-W), which infects only cucurbits. The successful control of the PRSV-P epidemic, which saved the papaya industry in Hawaii, was largely attributed to the first engineered fruit crop, commercialized since 1998 (Gonsalves, 1998). Conversely, owing to the greater diversity of viruses encountered in cucurbit crop production, a combination of transgenic technology and breeding for resistance to multiple viruses would be necessary to achieve effective viral disease control in the field (Gonsalves et al., 2010).

Recent advances in sequencing technologies have facilitated the improvement of cucurbit crops, including melon (Cucumis melo), cucumber (Cucumis sativus) and watermelon (Citrullus lanatus), by providing reference genomes (Huang et al., 2009; Garcia-Mas et al., 2012; Guo et al., 2013). In contrast to these major cucurbit crops, a reference genome for bottle gourd is not yet available. Recent studies on syntenic relationships among the sequenced cucurbit genomes have shown the complex history of chromosome structure changes leading to the genome of cultivated cucumber after its divergence from melon (Huang et al., 2009), and the chromosome rearrangements that shaped the watermelon genome from the 21-chromosome eudicot intermediate ancestor (Guo et al., 2013). However, the evolutionary history of modern cucurbit genomes from their most recent common ancestor genome remains largely unexplored. In the present study, we report a high-quality genome assembly of a bottle gourd inbred line, USVL1VR-Ls, which enabled us to characterize and map a dominant monogenic locus associated with PRSV-W resistance in bottle gourd. Using the currently available genome resources of extant cucurbit species, we reconstructed the genome of the most recent common ancestor of Cucurbitaceae, which provides insights into the evolutionary history of this family and represents a unique resource for studying conserved agronomical traits.

RESULTS AND DISCUSSION

Genome assembly, anchoring and quality evaluation

The genome of a bottle gourd inbred line, USVL1VR-Ls, was sequenced and assembled. High-quality cleaned sequences were generated from Illumina paired-end and mate-pair libraries, representing 395× coverage of the genome (Table S1), based on the estimated genome size of ~334 Mb (Achigan-Dako et al., 2008). Analysis of the K-mer distribution of the genome sequencing reads confirmed that USVL1VR-Ls was highly homozygous (Figure S1). The resulting de novo assembly had a total size of 313.4 Mb (~93.8% of the estimated genome size) and contained 444 scaffolds with an N50 of 8.7 Mb. The longest scaffold was 19.0 Mb and represented nearly an entire arm of chromosome 3 (Table 1 and Figure S2). About 95.0% (297.6 Mb) of the assembly consisted of bases without gaps (Table 1).

To anchor the assembled bottle gourd scaffolds, a genetic map was constructed using an F2 population derived from a cross between USVL10, a PRSV-W-susceptible line, and USVL5VR-Ls, a PRSV-W-resistant line. The resulting map contained 631 SNPs spanning 1731.7 cM across 11 linkage groups (LGs) (Table S2). In addition, we constructed a second, more saturated genetic map using a previously published RAD-Seq dataset (Xu et al., 2014). This genetic map consisted of 1397 SNPs over a total genetic distance of 1788.4 cM, and was largely collinear with the USVL map, with all 11 LGs having shared scaffolds (Figures 1 and S2 and Table S2). Three of the 441 initial scaffolds were identified as chimeric, on the basis of the same scaffold mapping to different LGs and the lack of mate-pair read support at the point of potential mis-joining. The chimeric scaffolds were split at the identified break points (Figure S3). Finally, based on the two genetic maps, 308.1 Mb (98.3%) of the assembled scaffolds were anchored to the 11 linkage groups, and 283.9 Mb (90.6%) were oriented (Table S3 and Figure S2).
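For readers less familiar with the N50 statistic quoted above, the following is a minimal Python sketch of how an Nxx value is conventionally computed: sort the sequences from longest to shortest and report the length at which the running total first reaches the chosen fraction of the assembly. The function name `nxx` and the toy lengths are illustrative assumptions; they are not the bottle gourd scaffolds or the software used in the study.

```python
def nxx(lengths, fraction=0.5):
    """Return (Nxx length, number of sequences needed to reach it).

    lengths  -- iterable of sequence lengths (e.g. scaffold sizes in bp)
    fraction -- 0.5 for N50, 0.9 for N90, and so on
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence that pushes the running total to the target
        # defines the Nxx length.
        if running >= target:
            return length, count
    return ordered[-1], len(ordered)


# Toy lengths (hypothetical, arbitrary units); total = 60.
toy_scaffolds = [19, 12, 9, 8, 5, 3, 2, 1, 1]
n50_length, n50_count = nxx(toy_scaffolds, 0.5)
n90_length, n90_count = nxx(toy_scaffolds, 0.9)
print(n50_length, n50_count)
print(n90_length, n90_count)
```

Read this way, the scaffold N50 row of Table 1 means that the 12 longest scaffolds, each at least 8 701 157 bp, together cover at least half of the 313.4-Mb assembly.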

To evaluate the quality of the assembly, we first aligned genomic DNA and RNA-Seq reads back to the genome. About 94.7% of the reads from the 500-bp insert library could be mapped back in a proper paired-end relationship, and 89.0–92.2% of the RNA-Seq reads from five different tissues (flower, fruit, leaf, stem and root) could be mapped to the genome (Table S4). The completeness of the assembly was further assessed in terms of gene content with the BUSCO software (Simão et al., 2015). Approximately 96.9% of the highly conserved plant orthologues were covered by the assembled bottle gourd genome, and 95.4% were found complete (Table S5). Together, these results support the high quality of the assembled bottle gourd genome.

Repeat sequence prediction and gene annotation

We constructed de novo repeat libraries from the bottle gourd genome assembly and used them as input to REPEATMASKER (http://www.repeatmasker.org/). This resulted in the masking of 46.9% of the bottle gourd genome (Table S6). Of the genome assembly, 39.8% was annotated as long terminal repeat (LTR) retrotransposons, predominantly of the copia type (23.2%) and the gypsy type (13.4%). Similar findings have been reported in the genomes of other Cucurbitaceae plants including cucumber, melon and watermelon (Huang et al., 2009; Garcia-Mas et al., 2012; Guo et al., 2013).

We constructed both genome-guided and de novo transcript assemblies using RNA-Seq data generated from diverse tissues of bottle gourd, including flower, fruit, leaf, stem and root. These assembled transcripts, combined with evidence from ab initio gene prediction and protein homology, were used in the MAKER pipeline (Cantarel et al., 2008) for gene prediction in the repeat-masked bottle gourd genome. We predicted 22 472 protein-coding genes in bottle gourd, comparable with the numbers of genes predicted in the genomes of cucumber (23 248) and watermelon (23 440) (Li et al., 2011; Guo et al., 2013). In total, 17 817 (79.3%) predicted bottle gourd genes were supported by transcript evidence from this study. As expected, the density of genes in the bottle gourd genome correlated negatively with the occurrence of repeat sequences (Figure 1b,c). Based on homology to proteins in the UniProt database and protein domains in the InterPro database, as well as gene ontology (GO) annotations, functional descriptions were assigned to 19 823 (88.2%) predicted bottle gourd genes. In total, 4901 GO terms were assigned to 16 786 (74.7%) bottle gourd genes (with an average of three GO terms per gene). We identified 1427 transcription factors in bottle gourd (Table S7). The numbers and classes of transcription factors in bottle gourd were comparable with those in watermelon, cucumber and melon (Table S7).

Disease resistance genes

The sequenced Cucurbitaceae genomes are known to have a relatively small number of genes encoding nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins (also referred to as R genes) (Lin et al., 2013). We identified 84 R genes in the bottle gourd genome. This number is higher than those in watermelon, cucumber and melon (Table S8) but considerably lower than those in most other plants such as the Rosaceae species (Jia et al., 2015), poplar (Kohler et al., 2008), Arabidopsis (Baumgarten et al., 2003) and rice (Goff et al., 2002). We classified the R genes into the Toll/interleukin-1 (TIR) and coiled-coil (CC) types according to their amino-terminal conserved domains. Each class was further divided into two groups based on the presence or absence of LRR domains. Bottle gourd had the highest numbers of both TIR and CC types of R genes compared with the other three cucurbits, whereas melon had the highest number of R genes without the TIR or CC domain. As found in other plant species, R genes in bottle gourd were mainly arranged in clusters. Although they were distributed across all 11 chromosomes, chromosomes 8 and 11 contained 21 and 19 R genes, respectively, and carried the two largest clusters (Figure S4).

Table 1 Summary statistics of bottle gourd genome assembly

           Scaffold                   Contig
           Size (bp)      Number      Size (bp)      Number
N90        1 498 876      47          8116           10 545
N80        2 863 114      31          12 964         7680
N70        5 301 321      22          17 722         5724
N60        5 988 221      17          22 829         4244
N50        8 701 157      12          28 343         3077
N25        11 949 146     5           48 832         1044
Longest    19 050 912     1           249 401        1
Total      313 397 697    444         297 572 241    18 083

Comparative genomics

We performed comparative analyses using the complete gene sets of bottle gourd and 10 sequenced plant species including four other cucurbits, namely, watermelon, cucumber, melon and bitter gourd (Momordica charantia), three additional Rosids, peach (Prunus persica), Arabidopsis (Arabidopsis thaliana) and grape (Vitis vinifera), one species from the Asterid clade, tomato (Solanum lycopersicum), one monocot species, rice (Oryza sativa), and the basal angiosperm Amborella (Amborella trichopoda). In total, 233 197 proteins from the 11 species (77.3% of the input sequences) were grouped into 15 894 families (ranging in size from 2 to 623 members), of which 9770 contained widespread genes in the selected species, 298 were eudicot-specific, 271 were Cucurbitaceae-specific and 176 were specific to the Benincaseae tribe, which includes bottle gourd, watermelon, cucumber and melon (Figure 2a). A phylogenetic tree was inferred from protein sequences of single-copy orthologous genes; it agreed with previous analyses in placing bottle gourd and watermelon in a clade sister to the Cucumis clade (Schaefer et al., 2009) (Figure 2a).

Figure 1. Genomic landscape of bottle gourd.

(a) Ideogram of the 11 bottle gourd pseudochromosomes (in Mb scale).

(b) Gene density represented by percentage of genomic regions covered by genes in 200-kb windows.

(c) Repeat density represented by percentage of genomic regions covered by repeat sequences in 200-kb windows.

(d) GC content in 200-kb windows.

Calibrated against the known divergence time between cucumber and melon (estimated at 8.4–11.8 million years ago) (Sebastian et al., 2010), the Ks distribution of orthologous genes between species suggested that bottle gourd diverged from watermelon around 10.4–14.6 million years ago (Mya), from Cucumis 17.3–24.3 Mya and from bitter gourd 29.2–41.0 Mya (Figure 2b). The Ks distribution pattern of paralogous gene pairs suggested the absence of a recent whole-genome duplication in bottle gourd (Figure 2c), as found in other sequenced cucurbit genomes such as cucumber (Huang et al., 2009), melon (Garcia-Mas et al., 2012) and watermelon (Guo et al., 2013).

Figure 2. Phylogenetic relationship and comparative genomics analyses.

(a) Phylogenetic tree of 11 plant species based on protein sequences of single-copy orthologous genes. The scale bar of branch length shows 0.1 substitutions per site. The bar graph shows the numbers of widespread genes found in 10 out of 11 species, eudicot-specific genes found in eight out of the nine eudicot species, Cucurbitaceae-specific genes found in all the five cucurbit species, Benincaseae-specific genes found in all the four species in the Benincaseae tribe, species-specific genes with no orthologues in other species in the tree, and the remaining genes.

(b,c) Distribution of Ks (substitutions per synonymous site) of orthologous (b) or paralogous (c) genes in the genomes of bottle gourd, watermelon, cucumber, melon and bitter gourd. Times of speciation events are labelled near the corresponding peaks.
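One common way to obtain divergence estimates of this kind is to assume a constant synonymous substitution rate, so that divergence time is proportional to the Ks peak of orthologous gene pairs, and then rescale a calibrated range by the ratio of two Ks peaks. The Python sketch below illustrates that proportional scaling only. It is not a description of the authors' exact procedure, the Ks peak values are not given in the text, and the values used here (`0.10` and `0.20`) and the function name `scale_divergence` are hypothetical.

```python
def scale_divergence(calibration_mya, ks_calibration, ks_target):
    """Rescale a calibrated divergence-time range by a ratio of Ks peaks.

    Assumes a constant synonymous substitution rate (a strict molecular
    clock), so that T_target / T_calibration = Ks_target / Ks_calibration.

    calibration_mya -- (low, high) range in million years ago for the
                       calibration split (here cucumber vs melon)
    ks_calibration  -- Ks peak of orthologues for the calibration split
    ks_target       -- Ks peak of orthologues for the split to be dated
    """
    if ks_calibration <= 0 or ks_target < 0:
        raise ValueError("Ks values must be positive")
    ratio = ks_target / ks_calibration
    low, high = calibration_mya
    return low * ratio, high * ratio


# Calibration range taken from the text (Sebastian et al., 2010).
CUCUMBER_MELON_MYA = (8.4, 11.8)

# Hypothetical Ks peaks, chosen only to show the arithmetic.
low, high = scale_divergence(CUCUMBER_MELON_MYA, ks_calibration=0.10, ks_target=0.20)
print(f"{low:.1f}-{high:.1f} Mya")
```

The reported ranges are at least consistent with such a scaling: both ends of the bottle gourd versus watermelon range are about 1.24 times the corresponding ends of the cucumber versus melon calibration (10.4/8.4 and 14.6/11.8).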

Synteny analyses between the genomes of bottle gourd and the three other Benincaseae species showed that four chromosomes of bottle gourd (chromosomes 1, 5, 6 and 8) each had a nearly one-to-one syntenic relationship with a melon chromosome (Figures 3 and S5), suggesting limited fission and fusion events of these chromosomes in the genomes of bottle gourd and melon since the divergence of Lagenaria and Cucumis. Bottle gourd chromosomes 5 and 6 were each contained entirely within one cucumber chromosome, but no one-to-one chromosome collinearity was observed between bottle gourd and cucumber. In watermelon, only chromosome 8 was collinear with a single bottle gourd chromosome (chromosome 8), indicating different patterns of interchromosomal rearrangements in the genomes of watermelon and bottle gourd after their divergence. These results also suggested that the watermelon genome experienced more rearrangements than that of bottle gourd, which was supported by our genome evolution analysis (see below).

Lineage-specific gene family expansion

We used the stochastic birth and death process modelling method implemented in CAFE v3.0 (Han et al., 2013) to analyse changes in gene family (orthologous group) size among bottle gourd and 10 other species over a phylogeny (the phylogenetic tree in Figure 2a). Gene families with large variance in size among species were identified with CAFE family-wide P-values <0.05, and those with an accelerated rate of expansion in bottle gourd were further determined with branch-specific Viterbi P-values <0.05. We detected 20 gene families involving 250 bottle gourd genes that were expanded significantly in bottle gourd in comparison with the 10 representative plant species (Table S9). The expanded families included five predicted to function in cell wall modifications, encoding β-galactosidase (OG0000100), polygalacturonase (OG0000122), carbohydrate esterase (OG0001026), pectinesterase (OG0001069) and β-glucosidase (OG0012259), respectively. The thaumatin-like proteins (OG0000457, pathogenesis-related protein family 5), serine/threonine protein kinases (OG0000645) and jasmonate-induced proteins (OG0001690) might play roles in plant defence, and their expansion could compensate for the low copy number of R genes in bottle gourd.
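The two-step selection described above (a family-wide P-value threshold, then a branch-specific Viterbi P-value threshold for the bottle gourd lineage) amounts to a simple filter. The Python sketch below shows that logic on made-up records. The `FamilyResult` structure, its field names and the example values are assumptions for illustration only; they do not reproduce the CAFE output format or any result from Table S9.

```python
from dataclasses import dataclass


@dataclass
class FamilyResult:
    """One gene family as summarized for this sketch (hypothetical layout)."""
    family_id: str
    family_p: float   # family-wide P-value
    branch_p: float   # branch-specific Viterbi P-value for the focal lineage
    expanded: bool    # True if the focal lineage gained copies


def select_expanded(results, alpha=0.05):
    """Keep families passing both thresholds that expanded in the focal lineage."""
    return [
        r.family_id
        for r in results
        if r.family_p < alpha and r.branch_p < alpha and r.expanded
    ]


# Made-up records; the identifiers and P-values are not from the study.
example = [
    FamilyResult("FAM_A", family_p=0.001, branch_p=0.010, expanded=True),
    FamilyResult("FAM_B", family_p=0.200, branch_p=0.010, expanded=True),   # fails family-wide test
    FamilyResult("FAM_C", family_p=0.001, branch_p=0.300, expanded=True),   # fails branch test
    FamilyResult("FAM_D", family_p=0.001, branch_p=0.010, expanded=False),  # contraction, not expansion
]
print(select_expanded(example))
```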

There were an additional 15 families that were expanded in bottle gourd compared with other investigated cucurbit species, and also expanded in lineages outside Cucurbitaceae (Table S9). One of them was a MADS-box transcription factor family (OG0000961). The seven bottle gourd MADS-box genes in OG0000961 were classified as type I (or M-type) MADS-box genes, which generally experience faster birth-and-death evolution than the type II MADS-box genes due to a higher rate of segmental duplications and weaker purifying selection (Nam et al., 2004); this could partly explain the expansion of this family in bottle gourd. The type I MADS-box genes play key roles in plant reproduction (Masiero et al., 2011).

Figure 3. Schematic representation of syntenies between the genomes of bottle gourd and watermelon, melon and cucumber. Syntenic regions between species are connected by lines.

Cucurbitaceae genome evolution

In order to assess the paleohistory of the Cucurbitaceae, we carried out a comparative analysis of the genomes of bottle gourd, watermelon, melon, cucumber and squash (Cucurbita moschata) (Sun et al., 2017), using the genome alignment parameters and ancestral genome reconstruction methods described in Salse (2016). Conserved gene adjacencies delivered an ancestral Cucurbitaceae karyotype (ACK) consisting of 12 protochromosomes (or Conserved Ancestral Regions, CARs) with 18 534 protogenes (Figure 4a). The complete dot plot-based deconvolution of the observed synteny and paralogy between ACK and the investigated species into 12 reconstructed CARs validates the 12 proposed protochromosomes as the origin of Cucurbitaceae (Figure 4b). Our evolutionary scenario, reconciling the modern genome structures with the founder ACK, established that melon has retained the ancestral genome structure, in contrast to the other investigated species, which experienced shuffling events. The 11 bottle gourd chromosomes derive from ACK through 19 chromosomal fissions and 20 fusions. The modern cucumber genome (seven chromosomes) has been shaped from ACK through 6 chromosomal fissions and 11 fusions, and watermelon (11 chromosomes) through 27 fissions and 28 fusions. Finally, the squash genome experienced chromosomal rearrangements and a specific whole-genome duplication (Sun et al., 2017) to reach its modern structure of 20 chromosomes.

Figure 4. Cucurbitaceae evolutionary history.

(a) Evolutionary scenario of the modern Cucurbitaceae (bottle gourd, squash, watermelon, melon and cucumber) genomes from the ancestral Cucurbitaceae karyotype (ACK). The modern genomes are illustrated at the bottom with different colours reflecting the origin from the 12 ancestral chromosomes from ACK. The shuffling events (fusions and fissions) are shown on the tree branches. The whole-genome duplication event is shown with the red circle.

(b) Complete dot plot-based deconvolution into 12 reconstructed Conserved Ancestral Regions (CARs) of the observed synteny and paralogy (dot plot diagonals) between ACK (dot plot y-axis) and the investigated species (dot plot x-axis). As a case example for ACK protochromosome 1, the paralogous (within the C. moschata genome) and orthologous (between modern genomes of different species as well as ACK) gene relationships are illustrated in green circles.
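The fission and fusion counts reported for bottle gourd, cucumber and watermelon can be checked against the modern chromosome numbers with simple bookkeeping. The Python sketch below assumes that each fission adds one chromosome and each fusion removes one, which is a simplification introduced here rather than a rule stated in the text. Squash is left out because its event counts are not given and its lineage includes a whole-genome duplication. The function name `expected_chromosomes` is illustrative.

```python
ANCESTRAL_PROTOCHROMOSOMES = 12  # ACK, as reported in the text

# species: (fissions, fusions, modern chromosome number), all from the text
REPORTED = {
    "bottle gourd": (19, 20, 11),
    "cucumber": (6, 11, 7),
    "watermelon": (27, 28, 11),
}


def expected_chromosomes(ancestral, fissions, fusions):
    """Chromosome count if each fission adds one and each fusion removes one."""
    return ancestral + fissions - fusions


for species, (fissions, fusions, modern) in REPORTED.items():
    predicted = expected_chromosomes(ANCESTRAL_PROTOCHROMOSOMES, fissions, fusions)
    status = "consistent" if predicted == modern else "inconsistent"
    print(f"{species}: 12 + {fissions} - {fusions} = {predicted} ({status} with {modern})")
```

Under this assumption all three reported scenarios balance: 12 + 19 − 20 = 11, 12 + 6 − 11 = 7 and 12 + 27 − 28 = 11.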

Our comparative genomics-based evolutionary scenario unravels the Cucurbitaceae paleohistory from the ACK, and delivers the complete catalogue of paralogous and orthologous gene relationships between the modern Cucurbitaceae genomes, which can now be used as a guide to perform translational research between the five investigated cucurbit species to accelerate the dissection of conserved agronomical traits.

Mapping of the PRSV-W resistance locus in bottle gourd

The bottle gourd USVL5VR-Ls line is resistant to PRSV-W (Ling et al., 2013). To map the locus (or loci) underlying the PRSV-W resistance in bottle gourd, we developed an F2 population by crossing a virus-susceptible line, USVL10, to USVL5VR-Ls (Figure 5a). All F1 individuals (n = 11) generated from this cross were resistant to PRSV-W infection, indicating that PRSV-W resistance from USVL5VR-Ls was inherited in a dominant fashion. Trait segregation for PRSV-W resistance in the USVL10 × USVL5VR-Ls F2 population was observed at a ratio close to 3:1 (77 resistant versus 25 susceptible). We further evaluated the ratio of homozygous versus heterozygous resistant F2 plants by testing the F3. Progeny of about two-thirds of the resistant F2 lines (53 out of 77) segregated for PRSV-W resistance, and those of the remaining one-third (24 out of 77) did not. These results suggested that PRSV-W resistance in USVL5VR-Ls was controlled by a dominant monogenic locus, which we named Prs. PRSV-W resistance is also controlled by a single dominant gene (Prsv-2) in a multi-virus resistant line of cucumber, TMG-1 (Wai and Grumet, 1995), while a linked recessive gene (prsv) confers the resistance in another cucumber cultivar, Surinam Local (Wai et al., 1997). In melon (Cucumis melo), monogenic PRSV-W resistance is governed by two different dominant alleles (Pitrat and Lecoq, 1983).

Figure 5. Mapping of the Prs locus in bottle gourd.

(a) Symptom expression on the susceptible and resistant parent lines of bottle gourd following PRSV-W infection.

(b) Interval mapping analysis on the 11 bottle gourd chromosomes.

(c) Interval mapping result on chromosome 1. Markers with their genetic distances are on the x-axis. The logarithm of the odds (LOD) score is shown on the y-axis. The black bar indicates the 1-LOD confidence interval.

(d) Genotypes and PRSV-W-resistance phenotypes of F2 individuals showing recombination at the Prs locus and the phenotypes of their F3 progenies. The colour bars show the genotypes along chromosome 1 inferred by SNPs for the seven recombinant F2 individuals between Chr01_7001785 and Chr01_7319558 (in bold font), and another five plants fixed for the USVL10 allele at SNPs upstream or downstream of the Prs confidence interval. The physical positions of the SNPs flanking each recombination are shown at the bottom. R and S indicate resistant and susceptible to PRSV-W infection, respectively. The black arrow indicates the relative location of Prs to a certain SNP. The colour arrows indicate the genotypes on the rest of chromosome 1.
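The segregation counts reported for the F2 and F3 generations can be compared with the ratios expected for a single dominant locus using a chi-square goodness-of-fit test. The text reports the counts but no test statistic, so the Python sketch below is an illustration added here, not a reproduction of the authors' analysis. The function names are assumptions, and the P-value formula applies only to one degree of freedom (two phenotype classes).

```python
import math


def chi_square_gof(observed, ratio):
    """Chi-square goodness-of-fit statistic for counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(statistic):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# F2: 77 resistant vs 25 susceptible, tested against 3:1.
f2_stat = chi_square_gof([77, 25], [3, 1])
# Resistant F2 plants: 53 segregating vs 24 non-segregating in F3, tested against 2:1.
f3_stat = chi_square_gof([53, 24], [2, 1])

for label, stat in (("F2 3:1", f2_stat), ("F3 2:1", f3_stat)):
    print(f"{label}: chi-square = {stat:.3f}, P = {p_value_df1(stat):.3f}")
```

In both cases the statistic is small and the P-value is well above 0.05, so the counts do not depart detectably from the 3:1 and 2:1 expectations of a dominant monogenic locus.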

The result of interval mapping indicated that Prs is on chromosome 1 (Figure 5b), flanked by markers Chr01_7001785 and Chr01_7319558 (Figure 5c). The most significant marker associated with Prs was Chr01_7223304 (P < 0.0001), explaining 64% of the phenotypic variation among the F2 individuals, and the degree of dominance of the USVL5VR-Ls allele was 1.08. The SNP Chr01_7223304 was converted to a cleaved amplified polymorphic sequence (CAPS) marker and tested in 93 USVL10 × USVL5VR-Ls F2 individuals. Phenotypes of 88 individuals (94.6%) upon infection with PRSV-W could be predicted by their genotypes scored with the CAPS marker (Figure S6), suggesting that this CAPS marker could be a useful tool for selection of PRSV-W resistance in bottle gourd breeding.
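Two of the figures in this part of the paper follow from simple arithmetic, sketched below in Python. The sketch assumes that the marker names encode base-pair positions on chromosome 1 (for example, Chr01_7001785 at position 7 001 785); the function names are illustrative.

```python
def marker_position(marker_name):
    """Extract the base-pair position from a name such as 'Chr01_7001785'.

    Assumes the part after the underscore is the physical position in bp.
    """
    return int(marker_name.split("_", 1)[1])


def interval_kb(left_marker, right_marker):
    """Physical distance between two markers, in kilobases."""
    return (marker_position(right_marker) - marker_position(left_marker)) / 1000


def concordance_percent(correct, tested):
    """Percentage of individuals whose phenotype matched the marker genotype."""
    return 100 * correct / tested


prs_interval = interval_kb("Chr01_7001785", "Chr01_7319558")
caps_accuracy = concordance_percent(88, 93)
print(f"Prs interval: {prs_interval:.1f} kb")
print(f"CAPS concordance: {caps_accuracy:.1f}%")
```

Under that assumption the flanking markers span 317 773 bp, which matches the 317.8-kb region reported for Prs, and 88 of 93 individuals corresponds to the stated 94.6%.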

Among the 102 F2 plants, five PRSV-W-resistant plants were fixed for the USVL10 allele at SNPs upstream or downstream of the Prs confidence interval. We also identified seven recombinant F2 individuals between Chr01_7001785 and Chr01_7319558: two between Chr01_7001785 and Chr01_7223304, and five between Chr01_7223304 and Chr01_7319558. The PRSV-W resistance phenotypes exhibited by the F3 progeny of these F2 plants confirmed the location of Prs between Chr01_7001785 and Chr01_7319558 (Figures 5d and S7).

This 317.8-kb region contained 39 annotated bottle gourd genes (Table S10). An NBS-LRR gene confers PRSV resistance in melon (Brotman et al., 2013). However, none of the 39 genes within the current Prs region is an R gene. One of these genes encodes an APETALA 2/ethylene response factor (AP2/ERF) (Lsi01G008690). The AP2/ERF transcription factors play important roles in plant response to various stresses, and have been reported to underlie defence mechanisms against various pathogens including viruses (Phukan et al., 2017), which makes the AP2/ERF gene a viable candidate gene for Prs. Interestingly, an ethylene-responsive transcription factor is one of the candidate genes in the prsv02245 locus on cucumber chromosome 6 (Tian et al., 2015), suggesting that ethylene signalling may be involved in a common PRSV-resistance mechanism in cucurbits.

Delimiting Prs to a smaller region will require additional experiments, such as resequencing the parental lines, USVL5VR-Ls and USVL10, to identify more variants within the current interval for fine mapping in larger segregating populations, and surveying a number of PRSV-resistant and PRSV-susceptible bottle gourd lines for association mapping. Expression analyses of genes in the Prs region would also provide evidence for determining the candidate genes.

EXPERIMENTAL PROCEDURES

Plant materials

The two multiple-virus-resistant inbred lines of bottle gourd, USVL5VR-Ls (derived from PI 381834) and USVL1VR-Ls (derived from PI 271360), and a susceptible inbred line, USVL10 (derived from PI 181948), were used in this study. Genetic populations (F1, F2, F3) were generated by crossing USVL10 with USVL5VR-Ls. Seeds were germinated in Metro-Mix 360 potting soil (Sun Gro Horticulture; http://www.sungro.com/) in plastic trays. Seedlings were supplemented with slow-release fertilizer (Osmocote 14-14-14; http://www.osmocote.co.za/) and grown in a greenhouse under 14–16 h of natural sunlight at 25–30°C.

Genomic and RNA-Seq library construction and sequencing

Healthy young leaves from USVL1VR-Ls were used for genomic DNA extraction. Tissues from USVL1VR-Ls for RNA-Seq included male flowers at anthesis, young fruits of 3 cm in length, young leaves, young stems from the first internode beneath the shoot apex and young roots. Genomic DNA was extracted using the QIAGEN DNeasy Plant Mini Kit, and total RNA was prepared using the QIAGEN RNeasy Plant Mini Kit (QIAGEN; https://www.qiagen.com/). We evaluated the quality of DNA and RNA via agarose gel electrophoresis, and measured the quantity by NanoDrop (Thermo Fisher Scientific; https://www.thermofisher.com/). We constructed the paired-end libraries with insert sizes ranging from 200 bp to 1 kb using Illumina's Genomic DNA Sample Preparation kit (https://www.illumina.com/). The mate-pair libraries with insert sizes ranging from 3 to 15 kb were prepared using the Nextera Mate Pair Sample Preparation kit (https://www.illumina.com/products/by-type/sequencing-kits/library-prep-kits/nextera-matepair.html). Strand-specific RNA-Seq libraries were constructed from total RNA using the protocol described in Zhong et al. (2011). Genomic and RNA-Seq libraries were sequenced on an Illumina HiSeq 2500 system with 2 × 150 paired-end reads.

De novo genome assembly

Duplicated read pairs in the raw Illumina reads were collapsed into unique pairs. We defined duplicated read pairs as those with identical bases at positions 14–90 in both the left and right reads. Adaptors and low-quality sequences were removed from the non-duplicated reads with Trimmomatic (Bolger et al., 2014). Read pairs from mate-pair libraries were processed with the ShortRead package (Morgan et al., 2009) to remove junction adaptors. The high-quality cleaned reads were assembled into scaffolds with SOAPdenovo2 (Luo et al., 2012), and gaps in the resulting scaffolds were filled with the GapCloser program in the SOAPdenovo2 package. Pilon (Walker et al., 2014) was used to correct base errors in the assembly, fix mis-assemblies and further fill gaps. Potential contaminations from microorganisms were detected by aligning the assemblies to the NCBI non-redundant nucleotide (nt) database using BLASTN with an E-value cut-off of 1e-5. Scaffolds that were largely similar to bacterial sequences (more than 90% of their lengths) were removed. Finally, we removed redundant scaffolds that were contained within other scaffolds with sequence identity >95%.

Annotation of transposable elements

We constructed a de novo long terminal repeat retrotransposon (LTR-RT) library and a miniature inverted-repeat transposable element (MITE) library by scanning the assembled bottle gourd genome using LTRharvest (Ellinghaus et al., 2008) and MITE-Hunter (Han and Wessler, 2010), respectively. The repeat sequences in the assembled genome were then masked with these LTR-RT and MITE libraries using RepeatMasker (http://www.repeatmasker.org/). We further searched for repeat elements in the unmasked sequences using RepeatModeler (http://www.repeatmasker.org/RepeatModeler.html). A single repeat library was generated by combining all the repetitive sequences generated above. This repeat library was then compared against the Swiss-Prot database (UniProt Consortium, 2011) to remove sequences that matched non-TE proteins. The TEs were classified with REPCLASS (Feschotte et al., 2009). The classified repeat libraries were used to identify TEs in the genome of bottle gourd with RepeatMasker (http://www.repeatmasker.org/).

Protein-coding gene prediction and functional annotation

Gene prediction was performed on the repeat-masked bottle gourd genome with MAKER (Cantarel et al., 2008), which combines evidence from transcript mapping, protein homology and ab initio gene prediction to define confident gene models. SNAP (Korf, 2004) and AUGUSTUS (Stanke et al., 2006) were used for ab initio gene predictions. Two transcript assemblies were generated using RNA-Seq data from different tissues of bottle gourd (flower, fruit, leaf, stem and root) with the de novo mode and the genome-guided mode, respectively, in Trinity (Grabherr et al., 2011). These two transcript assemblies were combined and aligned to the bottle gourd genome by the PASA2 pipeline (Haas et al., 2003). The resulting alignments were used as the transcript evidence. To provide the protein homology evidence, protein sequences from Arabidopsis, watermelon, cucumber and melon, as well as the UniProt (Swiss-Prot plant division) database, were aligned to the bottle gourd genome using Spaln (Iwata and Gotoh, 2012). The same annotation pipeline was applied to the bitter gourd genome (Urasaki et al., 2017), which predicted 20 778 protein-coding genes that were used in the comparative genomics analyses in this study.

For gene annotation, protein sequences of the predicted bottle gourd genes were compared against the UniProt (Swiss-Prot and TrEMBL; http://www.uniprot.org/) database using BLAST with an E-value cut-off of 1e-4, as well as the InterPro database using InterProScan (Jones et al., 2014). Gene ontology (GO) annotations were obtained using Blast2GO (Conesa et al., 2005) based on the BLAST results and the InterProScan analysis. These outputs were processed using AHRD (Automated assignment of Human Readable Descriptions; https://github.com/groupschoof/AHRD) for assigning functional descriptions to the predicted bottle gourd genes. Transcription factors were identified using the iTAK program (Zheng et al., 2016).

Identification and classification of disease resistance genes

To identify disease resistance (NBS-LRR) genes in the genomes of bottle gourd, watermelon, cucumber and melon, we performed hmmsearch (Eddy, 1998) using the Pfam database (http://pfam.xfam.org) for the NB-ARC, TIR and LRR domains with the ‘--cut_ga’ option. The CC domains in the NBS-LRR genes were identified using the MARCOIL program (Delorenzi and Speed, 2002) with a probability threshold of 90. The identified CC domains were further validated using PAIRCOIL2 (McDonnell et al., 2006) with a P score cut-off of 0.025.

Comparative genomics analysis, phylogeny and divergence time

Homology relationships among proteins from bottle gourd and ten other representative plant species, including Amborella trichopoda, Oryza sativa, Solanum lycopersicum, Vitis vinifera, Arabidopsis thaliana, Prunus persica, Momordica charantia, Cucumis sativus, Cucumis melo and Citrullus lanatus, were inferred using OrthoFinder (Emms and Kelly, 2015). In total, 1784 orthologous groups (OGs) containing only one gene from each of the 11 species were used to construct the phylogenetic tree. Protein sequences within each OG were aligned, and gapped regions were removed, using the ETE3 toolkit (Huerta-Cepas et al., 2016). The alignments were used to construct phylogenetic trees using PhyML 3.0 with default parameters (Guindon et al., 2010). A consensus tree was produced using the TreeAnnotator program in BEAST2 with default parameters (Bouckaert et al., 2014).
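The selection of OGs that contain exactly one gene from each species can be sketched as follows. This is a minimal illustration only: it assumes the orthogroups have already been loaded into a dictionary mapping an OG identifier to a list of (species, gene) pairs. The function name, the data layout and the toy identifiers are hypothetical and are not part of the OrthoFinder interface.

```python
from collections import Counter


def single_copy_orthogroups(orthogroups, species):
    """Return sorted IDs of orthogroups with exactly one gene from every species.

    orthogroups: dict mapping OG ID -> list of (species, gene_id) tuples
                 (a hypothetical in-memory layout, not an OrthoFinder file format).
    species:     iterable of all species names that must be represented.
    """
    required = set(species)
    selected = []
    for og_id, members in orthogroups.items():
        counts = Counter(sp for sp, _gene in members)
        # Keep the OG only if the species set matches exactly and
        # every species contributes a single gene.
        if set(counts) == required and all(n == 1 for n in counts.values()):
            selected.append(og_id)
    return sorted(selected)


# Toy example with three species; all identifiers are made up.
toy = {
    "OG1": [("bottle_gourd", "g1"), ("melon", "m1"), ("cucumber", "c1")],
    "OG2": [("bottle_gourd", "g2"), ("bottle_gourd", "g3"),
            ("melon", "m2"), ("cucumber", "c2")],   # duplicated in bottle gourd
    "OG3": [("bottle_gourd", "g4"), ("melon", "m3")],  # cucumber missing
}
single_copy = single_copy_orthogroups(toy, ["bottle_gourd", "melon", "cucumber"])
```

In this toy input only OG1 satisfies the single-copy criterion; OG2 is rejected because of a duplicate and OG3 because a species is absent.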

To identify syntenic regions within and between genomes, protein sequences of bottle gourd, watermelon, cucumber, melon and bitter gourd were aligned against themselves and each other using BLASTP, and high-confidence collinear blocks were determined using MCScanX with an E-value cut-off of 1e-10 (Wang et al., 2012). Ks values of homologous pairs were calculated using the Yang-Nielsen algorithm implemented in the PAML package (Yang, 1997). The time of divergence (T) was calculated as T = Ks/2r, where r is the evolutionary rate, which was set as 3.60 × 10⁻⁹ to 5.06 × 10⁻⁹ substitutions/site/year, according to the known divergence time between cucumber and melon, which is estimated at 8.4–11.8 million years ago (Sebastian et al., 2010).
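The conversion from Ks to divergence time can be illustrated with a short sketch. It assumes the rate bounds are read as 3.60 × 10⁻⁹ and 5.06 × 10⁻⁹ substitutions/site/year; the function names are ours, and the Ks value of 0.085 is an illustrative input chosen for this example, not a result reported in the paper.

```python
R_LOW = 3.60e-9   # slower rate bound, substitutions/site/year
R_HIGH = 5.06e-9  # faster rate bound, substitutions/site/year


def divergence_time_mya(ks, rate):
    """Divergence time in million years from T = Ks / (2r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate) / 1e6


def divergence_time_range(ks):
    """Return (younger, older) bounds in million years for a given Ks.

    The faster rate yields the younger bound and the slower rate the older one.
    """
    return divergence_time_mya(ks, R_HIGH), divergence_time_mya(ks, R_LOW)


# Illustrative Ks (our choice); it gives roughly 8.4 and 11.8 million years.
younger, older = divergence_time_range(0.085)
```

With this input the two bounds fall close to the 8.4–11.8 million year window used for calibration, which is a useful sanity check on the reading of the rate values.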

Detection of lineage-specific expansion of protein families

Lineage-specific gene family expansion was inferred from OGs using the software CAFE (v3.0; Han et al., 2013). Orthologous groups were constructed using OrthoFinder (Emms and Kelly, 2015) with protein sequences from bottle gourd and 10 other plant species, namely, Amborella trichopoda, Oryza sativa, Solanum lycopersicum, Vitis vinifera, Arabidopsis thaliana, Prunus persica, Momordica charantia, Cucumis sativus, Cucumis melo and Citrullus lanatus. The random gene birth and death rates were estimated using the maximum-likelihood method across the phylogenetic tree composed of the 11 species (constructed as described above). We first identified gene families (OGs) with a large variance in size by CAFE with a P-value cut-off of 0.05. The branches of the tree where the largest changes had taken place were further identified using the Viterbi method, which calculates P-values for transitions between the parent and child family sizes for all the branches of the phylogenetic tree (Han et al., 2013). Gene families with an accelerated rate of expansion in bottle gourd were determined with a branch-specific Viterbi P-value cut-off of 0.05. The functional description of each gene family was based on the AHRD annotations of bottle gourd genes in the family.

Evolutionary analysis of Cucurbitaceae genomes

The ACK, a ‘median’ or ‘intermediate’ genome consisting of a clean reference gene order common to the extant species investigated, and the derived evolutionary scenario were obtained following the method described in Salse (2016) based on the orthologous and paralogous relationships identified between Lagenaria siceraria (11 chromosomes, 22 472 genes), Cucurbita moschata (20 chromosomes, 32 205 genes) (Sun et al., 2017), Citrullus lanatus (11 chromosomes, 23 440 genes) (Guo et al., 2013), Cucumis melo (12 chromosomes, 27 427 genes) (Garcia-Mas et al., 2012) and Cucumis sativus (seven chromosomes, 23 248 genes) (Huang et al., 2009). Briefly, the investigated genomes were first aligned to identify conserved and duplicated gene pairs based on two alignment parameters: Cumulative Identity Percentage and Cumulative Alignment Length Percentage. Groups of conserved and duplicated genes were then clustered and chained into ancestral protochromosomes (also referred to as Contiguous Ancestral Regions) corresponding to independent sets of blocks sharing paralogous and/or orthologous relationships in modern species. Conserved gene pairs (or conserved groups of gene-to-gene adjacencies) between the investigated species were considered as potentially ancestral in the same order and orientation. From the reconstructed ACK, an evolutionary scenario that might have operated between ACK and the modern genomes was then inferred taking into account the fewest number of genomic rearrangements, which included deletions, inversions, translocations, fusions and fissions.

Virus isolates and inoculation

A PRSV isolate collected from infected watermelon in South Carolina was propagated and maintained on susceptible bottle gourd plants. Virus inoculum was prepared by macerating virus-infected leaves in ice-cold 0.1 M phosphate buffer (pH 7.0) with a Homex-6 tissue homogenizer (BioReba; http://www.bioreba.ch/) or in a plastic sampling bag using a hand-held roller, and maintained on ice during inoculation to preserve virus infectivity. Seedlings at the 1–2 leaf stage, lightly dusted with carborundum (320-grit, Thermo Fisher Scientific), were gently rub-inoculated with a cotton swab soaked in the prepared virus inoculum. To ensure virus infection on the tested plants without potential escape, a second inoculation using a freshly prepared inoculum was carried out within a week following the first inoculation. Each experiment also included mock-inoculated plants. Rating of resistance and susceptibility was based on the severity of symptom expression (mosaic or asymptomatic) on the inoculated plants 4 weeks post inoculation (wpi), in combination with enzyme-linked immunosorbent assay (ELISA) absorbance readings: 0, symptomless with no detectable level of virus in ELISA, or plant recovery, which displayed mild mosaic on the inoculated leaves followed by asymptomatic systemic leaves, with a low detectable level of the virus in ELISA (OD405 nm < 0.3); 1, persistent mosaic and leaf epinasty on the systemic leaves, with a detectable level of the virus in ELISA (OD405 nm > 0.3). ELISA for the virus was conducted following the manufacturer’s instructions (BioReba). Plants with a rating of 0 were considered resistant and those with a rating of 1 susceptible.
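The two-class rating rule can be written down as a small decision function. This is only a sketch of the rule as stated in the text: the function name and its boolean input are our own simplification, and cases the text does not cover (a reading exactly at the cut-off, or symptoms and ELISA pointing in opposite directions) are returned as undetermined rather than guessed.

```python
OD405_THRESHOLD = 0.3  # ELISA cut-off stated in the text


def rate_plant(systemic_mosaic, od405):
    """Return 0 (resistant), 1 (susceptible) or None (not covered by the rule).

    systemic_mosaic: True if persistent mosaic and epinasty are seen on
                     systemic leaves at 4 wpi; False for symptomless or
                     recovered plants.
    od405:           ELISA absorbance reading at 405 nm.
    """
    if systemic_mosaic and od405 > OD405_THRESHOLD:
        return 1
    if not systemic_mosaic and od405 < OD405_THRESHOLD:
        return 0
    # Conflicting evidence or a reading exactly at the threshold.
    return None


# Hypothetical readings for three plants.
ratings = [rate_plant(False, 0.05), rate_plant(True, 1.20), rate_plant(True, 0.10)]
```

Returning None for ambiguous plants keeps such cases visible so that they can be re-tested instead of being silently assigned to a class.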

Genotyping of the F2 population

Genomic DNA was extracted from young healthy leaf tissues of 102 F2 individuals and the parents, USVL10 and USVL5VR-Ls, using a DNeasy® Plant Mini Kit (Qiagen) and quantified using the dsDNA BR kit for Qubit (Invitrogen; https://www.thermofisher.com/). Genotyping of these plants was performed following the Genotyping-by-Sequencing (GBS) protocol (Elshire et al., 2011), using ApeKI as the restriction enzyme. The resulting libraries were multiplexed and sequenced on a HiSeq 2500 system (Illumina Inc.; https://www.illumina.com/) with a single-end read length of 101 bp. The GBS sequencing reads generated in this study and reads from a previously published RAD-Seq experiment on an F2 population derived from a cross between two bottle gourd cultivars, Hangzhou gourd (HZ) and J129 (Xu et al., 2014), were processed using the TASSEL-GBS (v4) pipeline (Glaubitz et al., 2014) for SNP calling. Briefly, for each population, the reads from all samples were combined and collapsed into a master tag list. Tags that occurred at least 10 times were mapped to the bottle gourd genome using BWA (v0.7.12) (Li and Durbin, 2009) with default parameters. Alignments with mapping quality ≥ 1 were used for SNP calling. The minimum genotype quality was set to 50. SNPs with a missing data rate >10% or a minor allele frequency (MAF) < 0.1 were excluded from the downstream analyses.
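The final SNP filter (missing data rate and MAF) can be sketched as below. The genotype encoding (0, 1 or 2 copies of the alternative allele, None for a missing call) and the function name are assumptions made for this illustration; they do not reflect the TASSEL-GBS data structures.

```python
def passes_filters(genotypes, max_missing=0.10, min_maf=0.10):
    """Return True if a SNP passes the missing-rate and MAF filters.

    genotypes: per-sample calls coded as 0, 1 or 2 alternative alleles,
               or None when the call is missing (hypothetical encoding).
    """
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n == 0 or not called:
        return False
    missing_rate = (n - len(called)) / n
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per sample
    maf = min(alt_freq, 1 - alt_freq)
    # SNPs are excluded when missing rate > 10% or MAF < 0.1.
    return missing_rate <= max_missing and maf >= min_maf


# Toy SNPs across ten samples (made-up calls).
good_snp = [0, 1, 2, 1, 0, 1, 2, 1, 0, 1]
rare_snp = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
sparse_snp = [0, 1, 2, None, None, 1, 0, 2, 1, 1]
results = [passes_filters(s) for s in (good_snp, rare_snp, sparse_snp)]
```

The first SNP is retained, the second fails on MAF (0.05) and the third fails on missing data (20%).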

Linkage map construction and quantitative trait locus (QTL) mapping

SNPs resulting from GBS were used to construct multipoint maximum-likelihood linkage maps using a Hidden Markov Model approach implemented in the R package onemap (https://cran.r-project.org/web/packages/onemap/). Assignment of markers to linkage groups was based on a logarithm of the odds (LOD) threshold of 6 and a recombination rate < 0.4. Recombination frequencies were converted into centiMorgan (cM) distances using the Kosambi function. The two resulting genetic maps were combined for pseudomolecule construction with ALLMAPS, which orders and orients scaffolds by optimizing agreement between the maps (Tang et al., 2015). Interval mapping of QTL was performed using the Haley-Knott regression method implemented in R/qtl (Broman et al., 2003). The effect of the locus on PRSV-W resistance was also evaluated by single-point one-way ANOVA. The degree of dominance of the alleles was estimated by the D/A ratio, where D = AB − (AA + BB)/2 and A = (AA − BB)/2 (AA, BB and AB represent the phenotypic values of the resistant homozygous, susceptible homozygous and heterozygous genotypes, respectively).
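Two of the calculations in this section, the Kosambi conversion and the D/A ratio, are simple enough to sketch directly. The Kosambi mapping function used here is the standard form, d = 25 ln[(1 + 2r)/(1 − 2r)] cM; the function names and the example phenotypic means are hypothetical and are not values from this study.

```python
import math


def kosambi_cm(r):
    """Map distance in cM from a recombination frequency r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def degree_of_dominance(aa, bb, ab):
    """D/A ratio with D = AB - (AA + BB)/2 and A = (AA - BB)/2.

    aa, bb, ab: mean phenotypic values of the resistant homozygous,
                susceptible homozygous and heterozygous classes.
    """
    additive = (aa - bb) / 2
    if additive == 0:
        raise ValueError("additive effect is zero; D/A is undefined")
    dominance = ab - (aa + bb) / 2
    return dominance / additive


# Hypothetical class means on the 0/1 rating scale.
fully_dominant = degree_of_dominance(aa=0.0, bb=1.0, ab=0.0)  # AB equals AA
additive_only = degree_of_dominance(aa=0.0, bb=1.0, ab=0.5)   # AB is the mid-parent
distance = kosambi_cm(0.1)
```

A ratio near 1 indicates that the heterozygote resembles the AA homozygote, a ratio near 0 indicates additive action, and a ratio near −1 indicates that it resembles the BB homozygote.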

Design and evaluation of the CAPS marker

The CAPS marker corresponding to the SNP, Chr01_7223304, was developed following the standard protocol (Konieczny and Ausubel, 1993). Primers 5′-TGAGCTCAAATGAGTATGTTTGC-3′ and 5′-CACGCTCCCCTTTTCAATAA-3′ were used to amplify a 348-bp DNA fragment, which yielded distinct banding patterns in different genotypes when digested with the restriction endonuclease HaeIII. The following polymerase chain reaction (PCR) program was used: 2 min initial denaturation at 95°C and 35 cycles of 95°C for 30 sec, 58°C for 30 sec and 72°C for 30 sec, followed by a final extension at 72°C for 5 min. Digested products were run in 3% agarose gels.
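The principle behind the CAPS assay, a SNP that creates or destroys a restriction site and thereby changes the fragment pattern, can be sketched with an in silico digest. HaeIII recognizes GGCC and cuts between the G and C (GG^CC). The sequences below are short made-up examples; the real amplicon sequence, the allele that carries the site and the expected fragment sizes are not given in the text, so none of them is assumed here.

```python
def digest(seq, site="GGCC", cut_offset=2):
    """Return fragment lengths after cutting a linear sequence at every site.

    site:       recognition sequence (HaeIII: GGCC).
    cut_offset: position of the cut within the site (HaeIII: GG^CC -> 2).
    """
    seq = seq.upper()
    cuts = []
    start = 0
    while True:
        i = seq.find(site, start)
        if i == -1:
            break
        cuts.append(i + cut_offset)
        start = i + 1
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Made-up 20-bp alleles differing by one base inside the recognition site.
allele_with_site = "A" * 10 + "GGCC" + "T" * 6
allele_without_site = "A" * 10 + "GACC" + "T" * 6
cut_fragments = digest(allele_with_site)
uncut_fragments = digest(allele_without_site)
```

One allele yields two fragments and the other remains intact, so a heterozygote would show all three bands on a gel.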

ACCESSION NUMBERS

This Whole-Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession NHZF00000000. The version described in this paper is version NHZF01000000. Raw genome and transcriptome sequence reads have been deposited into the NCBI sequence read archive (SRA) under accessions SRP107898 and SRP107894. The genome sequences and the annotations of bottle gourd are also available at the Cucurbit Genomics Database (http://cucurbitgenomics.org) (to be made publicly available upon acceptance of this manuscript).

ACKNOWLEDGEMENTS

We thank Andrea Gilliard for her excellent technical assistance. This research was supported by grants from the USDA-ARS Office of International Research Program from a grant provided by the USAID Feed-the-Future program (58–0210–3-012) and USDA National Institute of Food and Agriculture Specialty Crop Research Initiative (2015–51181–24285).

CONFLICT OF INTEREST

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article.

Figure S1. K-mer distribution of Illumina genomic sequencing reads of bottle gourd, USVL1VR-Ls.

Figure S2. Construction of the bottle gourd pseudomolecules.

Figure S3. Alignments of 15-kb insert size mate-pair reads to the three initial chimeric scaffolds identified based on the genetic maps.

Figure S4. Locations of R genes on bottle gourd chromosomes.

Figure S5. Syntenic dot plots showing the comparisons between the genomes of bottle gourd and watermelon, melon and cucumber.

Figure S6. Genotyping the inbred lines, F1 and F2 individuals by the CAPS marker.

Figure S7. Genotypes and PRSV-resistance phenotypes of Prs recombinant F2 plants and the phenotypes of their F3 progenies.

Table S1. Summary of the genomic sequencing reads.

Table S2. Summary of the bottle gourd genetic maps.

Table S3. Construction of pseudomolecules of bottle gourd based on the genetic maps.

Table S4. Mapping of genomic and RNA-Seq reads to the bottle gourd assembly.

Table S5. Assessing the completeness of the assembly by BUSCO.

Table S6. Summary of repeat annotation.

Table S7. Numbers of transcription factors identified in the genomes of bottle gourd and other plant species.

Table S8. Numbers of NBS-LRR genes in four cucurbit species.

Table S9. Significantly expanded gene families in bottle gourd.

Table S10. Annotated bottle gourd genes in the Prs locus.

REFERENCES

Achigan-Dako, E.G., Fuchs, J., Ahanchede, A. and Blattner, F.R. (2008) Flow cytometric analysis in Lagenaria siceraria (Cucurbitaceae) indicates correlation of genome size with usage types and growing elevation. Plant Syst. Evol. 276, 9–19.

Ali, A., Abdalla, O., Bruton, B., Fish, W., Sikora, E., Zhang, S. and Taylor, M. (2012) Occurrence of viruses infecting watermelon, other cucurbits, and weeds in the parts of southern United States. Plant Heal. Prog. https://doi.org/10.1094/php-2012-0824-01-rs

Baumgarten, A., Cannon, S., Spangler, R. and May, G. (2003) Genome-level evolution of resistance genes in Arabidopsis thaliana. Genetics, 165, 309–319.

Bolger, A.M., Lohse, M. and Usadel, B. (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30, 2114–2120.

Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C.-H., Xie, D., Suchard, M.A., Rambaut, A. and Drummond, A.J. (2014) BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 10, e1003537.

Broman, K.W., Wu, H., Sen, S. and Churchill, G.A. (2003) R/qtl: QTL mapping in experimental crosses. Bioinformatics, 19, 889–890.

Brotman, Y., Normantovich, M., Goldenberg, Z. et al. (2013) Dual resistance of melon to Fusarium oxysporum races 0 and 2 and to Papaya ring-spot virus is controlled by a pair of head-to-head-oriented NB-LRR genes of unusual architecture. Mol. Plant, 6, 235–238.

Cantarel, B.L., Korf, I., Robb, S.M.C., Parra, G., Ross, E., Moore, B., Holt, C., Sanchez Alvarado, A. and Yandell, M. (2008) MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196.

Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon, M. and Robles, M. (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 21, 3674–3676.

Davis, A.R., Perkins-Veazie, P., Sakata, Y. et al. (2008) Cucurbit grafting. CRC. Crit. Rev. Plant Sci. 27, 50–74.

Decker-Walters, D.S., Wilkins-Ellert, M., Chung, S.-M. and Staub, J.E. (2004) Discovery and genetic assessment of wild bottle gourd [Lagenaria siceraria (Mol.) Standley; Cucurbitaceae] from Zimbabwe. Econ. Bot. 58, 501–508.

Delorenzi, M. and Speed, T. (2002) An HMM model for coiled-coil domains and a comparison with PSSM-based predictions. Bioinformatics, 18, 617–625.

Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.

Ellinghaus, D., Kurtz, S. and Willhoeft, U. (2008) LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics, 9, 18.

Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S. and Mitchell, S.E. (2011) A robust, simple Genotyping-by-Sequencing (GBS) approach for high diversity species. PLoS ONE, 6, e19379.

Emms, D.M. and Kelly, S. (2015) OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157.

Feschotte, C., Keswani, U., Ranganathan, N., Guibotsy, M.L. and Levine, D. (2009) Exploring repetitive DNA landscapes using REPCLASS, a tool that automates the classification of transposable elements in eukaryotic genomes. Genome Biol. Evol. 1, 205–220.

Garcia-Mas, J., Benjak, A., Sanseverino, W. et al. (2012) The genome of melon (Cucumis melo L.). Proc. Natl Acad. Sci. USA. 109, 11872–11877.

Glaubitz, J.C., Casstevens, T.M., Lu, F., Harriman, J., Elshire, R.J., Sun, Q. and Buckler, E.S. (2014) TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS ONE, 9, e90346.

Goff, S.A., Ricke, D., Lan, T.-H. et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science, 296, 92–100.

Gonsalves, D. (1998) Control of Papaya ringspot virus in papaya: a case study. Annu. Rev. Phytopathol. 36, 415–437.

Gonsalves, D., Tripathi, S., Carr, J.B. and Suzuki, J.Y. (2010) Papaya ring-spot virus. Plant Health Instructor. https://doi.org/10.1094/PHI-I-2010-1004-01.

Grabherr, M.G., Haas, B.J., Yassour, M. et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652.

Guindon, S., Dufayard, J.F., Lefort, V., Anisimova, M., Hordijk, W. and Gascuel, O. (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321.

Guo, S., Zhang, J., Sun, H. et al. (2013) The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nat. Genet. 45, 51–58.

Haas, B.J., Delcher, A.L., Mount, S.M. et al. (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666.

Han, Y. and Wessler, S.R. (2010) MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199.

Han, M.V., Thomas, G.W.C., Lugo-Martinez, J. and Hahn, M.W. (2013) Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol. 30, 1987–1997.

Huang, S., Li, R., Zhang, Z. et al. (2009) The genome of the cucumber, Cucumis sativus L. Nat. Genet. 41, 1275–1281.

Huerta-Cepas, J., Serra, F. and Bork, P. (2016) ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638.

Iwata, H. and Gotoh, O. (2012) Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res. 40, e161.

Jia, Y., Yuan, Y., Zhang, Y., Yang, S. and Zhang, X. (2015) Extreme expansion of NBS-encoding genes in Rosaceae. BMC Genet. 16, 48.

Jones, P., Binns, D., Chang, H.-Y. et al. (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics, 30, 1236–1240.

King, S.R., Davis, A.R., Liu, W. and Levi, A. (2008) Grafting for disease resistance. HortScience, 43, 1673–1676.

Kistler, L., Montenegro, A., Smith, B.D., Gifford, J.A., Green, R.E., Newsom, L.A. and Shapiro, B. (2014) Transoceanic drift and the domestication of African bottle gourds in the Americas. Proc. Natl Acad. Sci. USA. 111, 2937–2941.

Kohler, A., Rinaldi, C., Duplessis, S., Baucher, M., Geelen, D., Duchaussoy, F., Meyers, B.C., Boerjan, W. and Martin, F. (2008) Genome-wide identification of NBS resistance genes in Populus trichocarpa. Plant Mol. Biol. 66, 619–636.

Konieczny, A. and Ausubel, F.M. (1993) A procedure for mapping Arabidopsis mutations using co-dominant ecotype-specific PCR-based markers. Plant J. 4, 403–410.

Korf, I. (2004) Gene finding in novel genomes. BMC Bioinformatics, 5, 59.

Kousik, C.S., Levi, A., Ling, K.-S. and Wechter, P. (2008) Potential sources of resistance to cucurbit powdery mildew in U.S. plant introductions of bottle gourd. HortScience, 43, 1359–1364.

Lecoq, H., Wisler, G. and Pitrat, M. (1998) Cucurbit viruses: the classic and the emerging. In Cucurbitaceae’98 Eval. Enhanc. cucurbit germplasm (McCreight, J.D. ed.). Alexandria, VA: ASHS press, pp. 126–142.

Levi, A., Thies, J., Ling, K., Simmons, A.M., Kousik, C. and Hassell, R. (2009) Genetic diversity among Lagenaria siceraria accessions containing resistance to root-knot nematodes, whiteflies, ZYMV or powdery mildew. Plant Genet. Resour. 7, 216–226.

Li, H. and Durbin, R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

Li, Z., Zhang, Z., Yan, P., Huang, S., Fei, Z. and Lin, K. (2011) RNA-Seq improves annotation of protein-coding genes in the cucumber genome. BMC Genom. 12, 540.

Lin, X., Zhang, Y., Kuang, H. and Chen, J. (2013) Frequent loss of lineages and deficient duplications accounted for low copy number of disease resistance genes in Cucurbitaceae. BMC Genom. 14, 1.

Ling, K.-S. and Levi, A. (2007) Sources of resistance to Zucchini yellow mosaic virus in Lagenaria siceraria germplasm. HortScience, 42, 1124–1126.

Ling, K.-S., Levi, A., Adkins, S., Kousik, C.S., Miller, G., Hassell, R. and Keinath, A.P. (2013) Development and field evaluation of multiple virus-resistant bottle gourd (Lagenaria siceraria). Plant Dis. 97, 1057–1062.

Luo, R., Liu, B., Xie, Y. et al. (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience, 1, 18.

Mashilo, J., Shimelis, H. and Odindo, A. (2017) Phenotypic and genotypic characterization of bottle gourd [Lagenaria siceraria (Molina) Standl.] and implications for breeding: a Review. Sci. Hortic. 222, 136–144.

Masiero, S., Colombo, L., Grini, P.E., Schnittger, A. and Kater, M.M. (2011) The emerging importance of type I MADS box transcription factors for plant reproduction. Plant Cell, 23, 865–872.

McDonnell, A.V., Jiang, T., Keating, A.E. and Berger, B. (2006) Paircoil2: improved prediction of coiled coils from sequence. Bioinformatics, 22, 356–358.

Morgan, M., Anders, S., Lawrence, M., Aboyoun, P., Pages, H. and Gentleman, R. (2009) ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data. Bioinformatics, 25, 2607–2608.

Morimoto, Y. and Mvere, B. (2004) Lagenaria siceraria. In Vegetables. Plant Resources of Tropical Africa 2. (Grubben, G.J.H. and Denton, O.A. eds). Wageningen/Leiden: Backhuys Publishers/CTA, pp. 353–358.

Nam, J., Kim, J., Lee, S., An, G., Ma, H. and Nei, M. (2004) Type I MADS-box genes have experienced faster birth-and-death evolution than type II MADS-box genes in angiosperms. Proc. Natl Acad. Sci. USA. 101, 1910–1915.

Phukan, U.J., Jeena, G.S., Tripathi, V. and Shukla, R.K. (2017) Regulation of Apetala2/Ethylene response factors in plants. Front. Plant Sci. 8, 150.

Pitrat, M. and Lecoq, H. (1983) Two alleles for Watermelon mosaic virus 1 resistance in muskmelon. Cucurbit Genet. Coop. Rep. 6, 52–53.

Provvidenti, R. (1981) Sources of resistance to viruses in Lagenaria siceraria. Cucurbit Genet. Coop. Rep. 4, 38–40.

Provvidenti, R. (1995) A multi-viral resistant cultivar of bottle gourd (Lagenaria siceraria from Taiwan). Cucurbit Genet. Coop. Rep. 18, 65–67.

Salse, J. (2016) Ancestors of modern plant crops. Curr. Opin. Plant Biol. 30, 134–142.

Sarao, N.K., Pathak, M. and Kaur, N. (2014) Microsatellite-based DNA fingerprinting and genetic diversity of bottle gourd genotypes. Plant Genet. Resour. 12, 156–159.

Schaefer, H., Heibl, C. and Renner, S.S. (2009) Gourds afloat: a dated phylogeny reveals an Asian origin of the gourd family (Cucurbitaceae) and numerous oversea dispersal events. Proc. R. Soc. London B Biol. Sci. 276, 843–851.

Schlumbaum, A. and Vandorpe, P. (2012) A short history of Lagenaria siceraria (bottle gourd) in the Roman provinces: morphotypes and archaeogenetics. Veg. Hist. Archaeobot. 21, 499–509.

Sebastian, P., Schaefer, H., Telford, I.R.H. and Renner, S.S. (2010) Cucumber (Cucumis sativus) and melon (C. melo) have numerous wild relatives in Asia and Australia, and the sister species of melon is from Australia. Proc. Natl Acad. Sci. USA. 107, 14269–14273.

Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V. and Zdobnov, E.M. (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31, 3210–3212.

Stanke, M., Tzvetkova, A. and Morgenstern, B. (2006) AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol. 7, S11.

Sun, H., Wu, S., Zhang, G. et al. (2017) Karyotype stability and unbiased fractionation in the paleo-allotetraploid Cucurbita genomes. Mol. Plant. 10, 1293–1306.

Tang, H., Zhang, X., Miao, C., Zhang, J., Ming, R., Schnable, J.C., Schnable, P.S., Lyons, E. and Lu, J. (2015) ALLMAPS: robust scaffold ordering based on multiple maps. Genome Biol. 16, 3.

Tian, G., Yang, Y., Zhang, S., Miao, H., Lu, H., Wang, Y., Xie, B. and Gu, X. (2015) Genetic analysis and gene mapping of papaya ring spot virus resistance in cucumber. Mol. Breed. 35, 110.

Turechek, W.W., Kousik, C.S. and Adkins, S. (2010) Distribution of four viruses in single and mixed infections within infected watermelon plants in Florida. Phytopathology, 100, 1194–1203.

UniProt Consortium (2011) Ongoing and future developments at the universal protein resource. Nucleic Acids Res. 39, D214–D219.

Urasaki, N., Takagi, H., Natsume, S. et al. (2017) Draft genome sequence of bitter gourd (Momordica charantia), a vegetable and medicinal plant in tropical and subtropical regions. DNA Res. 24, 51–58.

Wai, T. and Grumet, R. (1995) Inheritance of resistance to watermelon mosaic virus in the cucumber line TMG-1: tissue-specific expression and relationship to Zucchini yellow mosaic virus resistance. Theor. Appl. Genet. 91, 699–706.

Wai, T., Staub, J.E., Kabelka, E. and Grumet, R. (1997) Linkage analysis of potyvirus resistance alleles in cucumber. J. Hered. 88, 454–458.

Walker, B.J., Abeel, T., Shea, T. et al. (2014) Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE, 9, e112963.

Wang, Y., Tang, H., Debarry, J.D. et al. (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49.

Xu, P., Xu, S., Wu, X., Tao, Y., Wang, B., Wang, S., Qin, D., Lu, Z. and Li, G. (2014) Population genomic analyses from low-coverage RAD-Seq data: a case study on the non-model cucurbit bottle gourd. Plant J. 77, 430–442.

Yang, Z. (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Bioinformatics, 13, 555–556.

Zheng, Y., Jiao, C., Sun, H. et al. (2016) iTAK: a program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators, and protein kinases. Mol. Plant, 9, 1667–1670.

Zhong, S., Joung, J.-G., Zheng, Y., Chen, Y., Liu, B., Shao, Y., Xiang, J.Z., Fei, Z. and Giovannoni, J.J. (2011) High-throughput Illumina strand-specific RNA sequencing library preparation. Cold Spring Harb. Protoc. 2011, 940–949.

Figure

Figure 1. Genomic landscape of bottle gourd.
Figure 2. Phylogenetic relationship and comparative genomics analyses.
Figure 3. Schematic representation of syntenies between the genomes of bottle gourd and watermelon, melon and cucumber.
Figure 4. Cucurbitaceae evolutionary history.