RESOURCE

Development and validation of the Axiom® Apple480K SNP genotyping array

Luca Bianco1, Alessandro Cestaro1, Gareth Linsmith1, Helene Muranty2, Caroline Denance2, Anthony Theron3, Charles Poncet3, Diego Micheletti1, Emanuela Kerschbamer1, Erica A. Di Pierro4, Simone Larger1, Massimo Pindo1, Eric Van de Weg5, Alessandro Davassi6, François Laurens2, Riccardo Velasco1, Charles-Eric Durel2 and Michela Troggio1,*

1 Research and Innovation Centre, Fondazione Edmund Mach, via Edmund Mach 1, 38010 San Michele all’Adige, Trento, Italy,

2 Institut de Recherche en Horticulture et Semences – UMR1345, Institut National de la Recherche Agronomique (INRA), SFR 4207 QUASAV, 42 Rue Georges Morel, F-49071 Beaucouze, France,

3 Plateforme Gentyane, INRA UMR1095 Genetics, Diversity and Ecophysiology of Cereals, 5 Chemin de Beaulieu, 63039 Clermont-Ferrand, France,

4 Department of Biosciences, University of Milan, via Celoria 26, 20133 Milano, Italy,

5 Wageningen UR Plant Breeding, Wageningen University and Research Centre, PO Box 16, 6700 AA, Wageningen, The Netherlands, and

6 Affymetrix UK Ltd, Mercury Park, Wycombe Lane, Wooburn Green, High Wycombe HP10 0HH, UK

Received 9 December 2015; revised 3 February 2016; accepted 5 February 2016; published online 26 February 2016. *For correspondence (e-mail michela.troggio@fmach.it).

SUMMARY

Cultivated apple (Malus × domestica Borkh.) is one of the most important fruit crops in temperate regions, and has great economic and cultural value. The apple genome is highly heterozygous and has undergone a recent duplication which, combined with a rapid linkage disequilibrium decay, makes it difficult to perform genome-wide association (GWA) studies. Single nucleotide polymorphism arrays offer highly multiplexed assays at a relatively low cost per data point and can be a valid tool for the identification of the markers associated with traits of interest. Here, we describe the development and validation of a 487K SNP Affymetrix Axiom® genotyping array for apple and discuss its potential applications. The array has been built from the high-depth resequencing of 63 different cultivars covering most of the genetic diversity in cultivated apple. The SNPs were chosen by applying a focal points approach to enrich genic regions, but also to reach a uniform coverage of non-genic regions. A total of 1324 apple accessions, including the 92 progenies of two mapping populations, have been genotyped with the Axiom® Apple480K to assess the effectiveness of the array. A large majority of SNPs (359 994 or 74%) fell in the stringent class of poly high resolution polymorphisms. We also devised a filtering procedure to identify a subset of 275K very robust markers that can be safely used for germplasm surveys in apple. The Axiom® Apple480K has now been commercially released both for public and proprietary use and will likely be a reference tool for GWA studies in apple.

Keywords: SNP chip, Malus × domestica Borkh., genotyping, genome-wide association study, validation, linkage mapping.

INTRODUCTION

During the last decade, high-throughput genotyping has facilitated the dissection of complex traits in species with large and/or complex genomes and a high level of genetic diversity (Eckert et al., 2012; Chen et al., 2014; Evans et al., 2014; Unterseer et al., 2014; Falginella et al., 2015).

High-density genotyping data deliver high precision for the detection of quantitative trait loci (QTLs)/genes, and allow for the improvement of large-scale breeding programs and the performance of genomic selection (Kumar et al., 2012; Li et al., 2015; Perez-Enciso et al., 2015).


Array-based marker systems have been increasingly adopted for high-throughput genotyping, not only in model organisms but also in many non-model plant species for which genomic resources are now available. Among the many available array types for molecular marker genotyping, single nucleotide polymorphism (SNP) chips have become popular for genome-wide genotyping since they offer highly multiplexed assays at a relatively low cost per data point. The discovery of SNPs has also been facilitated by the reduced costs of resequencing using next-generation sequencing technologies (for a review see Gupta et al., 2013).

Illumina and Affymetrix platforms have become the most widely used SNP array technologies for high-density, high-throughput SNP genotyping. A number of medium-density SNP arrays for plants have been published and extensively used for genetic studies, such as linkage mapping, population structure and linkage disequilibrium (LD) (for a review see Ganal et al., 2014). More recently, Axiom® high-density genotyping arrays have been developed to perform genome-wide association (GWA) studies on strawberry, cotton, rose and soybean (Bassil et al., 2015; Hulse-Kemp et al., 2015; Koning-Boucoiran et al., 2015; Lee et al., 2015). Such genotyping tools are now starting to become available for apple as well, and we believe this will boost GWA studies in this species.

Apple (Malus × domestica Borkh.) is one of the most important fruit crops in temperate regions, with great economic and cultural value (Juniper and Mabberley, 2006). Apple, like other perennial crops, is clonally propagated, but an extensive germplasm collection is available around the world. Domesticated apple has in fact retained high genetic diversity throughout the domestication and improvement process, as reported in numerous studies (Velasco et al., 2010; Micheletti et al., 2011; Gross et al., 2014).

A large number of genomic resources are already publicly available for the domesticated apple, including a reference genome (Velasco et al., 2010), several genetic variants available through open-access journals (Chagne et al., 2012; Khan et al., 2012; Bianco et al., 2014; Gardner et al., 2014), more than 22 500 SNPs deposited in the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/) and two publicly available Infinium® SNP arrays, namely, the IRSC 8K and the 20K (Chagne et al., 2012; Bianco et al., 2014). A number of studies have reported the use of the two medium-density Infinium® arrays for the generation of saturated genetic linkage maps and QTL detection (Antanaviciute et al., 2012; Clark et al., 2014; Falginella et al., 2015) in bi-parental full-sib families, for pedigree-based analysis (PBA) on multiple pedigreed families (Guan et al., 2015; Allard et al., 2016) or for GWA analysis in a family-based design (Kumar et al., 2013). However, GWA studies using the 8K SNP array in collections of unrelated apple accessions (Kumar et al., 2014; Leforestier et al., 2015) highlighted the limitations of the arrays available to date, whose marker density is too low for GWA studies because LD decays quite quickly in apple. An r² below 0.2 within 100 kbp was in fact reported by Leforestier et al. (2015) in two core collections of unrelated accessions of dessert and cider apples, while Kumar et al. (2014) reported an r² below 0.13 within 100 kbp in a diverse population of apple accessions. Moreover, even the 20K apple Infinium® array was based on a restricted number of accessions (13) in the discovery panel, and although this tool perfectly suits the needs of PBA investigations (Allard et al., 2016), its application to GWA studies is limited by the restricted genetic background on which it is based (Gross et al., 2014). In contrast, the 8K Infinium® array was developed from a wider discovery panel comprising 27 accessions; it therefore covers more diversity, but its low SNP density limits its applicability to GWA studies.

We have developed a much denser SNP array based on the high-depth resequencing of 63 apple cultivars covering most of the genetic diversity in cultivated apple in order to overcome the limitations of both previously developed arrays and to provide a tool enabling GWA studies in apple. To maximize the usefulness of the array, we decided to incorporate a small percentage (7.5%) of rare SNPs (minor allele frequency <0.05) to allow genetic diversity studies of uncharacterized apple collections.

In this paper, we describe the development and validation of a 487K SNP Affymetrix Axiom genotyping array for the cultivated apple (Malus × domestica Borkh.) and discuss its potential application for GWA studies and for the improvement of the apple reference genome. By applying some rather stringent filters, we also provide a set of very robust and validated SNPs, selected from the 487K comprising this array, that can be safely used for germplasm surveys in apple.

The Axiom® Apple480K is currently the largest SNP array available for a fruit tree species and has now been commercially released for both public and proprietary use.

RESULTS AND DISCUSSION

Alignment and SNP detection

Alignment of the high-coverage whole-genome resequencing of 63 apple accessions plus two doubled haploids (X9273 and X9336) (Table 1, Figure 1) against the Apple Genome Reference v.3 (https://www.rosaceae.org/) and multi-sample SNP calling produced a total of 15 499 525 variants. A detailed description of all the steps leading to the final selection of the SNPs included in the array is given in Figure 2.


The decision to allow up to seven mismatches for each alignment was taken in accordance with previous studies on apple (Bianco et al., 2014) and considering the large genomic diversity of the apple accessions present in the discovery panel. Although quite conservative, the filtering out of multi-mapping reads is a necessary step in organisms like apple that have a highly heterozygous and recently duplicated genome (Velasco et al., 2010). The reason for this is that the consensus produced by reads coming from paralogous regions may result in false variants. In the present study, removal of non-unique alignments was performed at the contig level rather than at a pseudo-chromosome or scaffold level, as potential scaffolding errors of contigs might result in failed SNPs due to paralogy issues in the latter two cases. A proper pairing filter was applied, such that only read pairs uniquely aligning to the same contig were accepted regardless of their relative distance. This was a less conservative filter than the one applied for the Illumina 20K array in order to avoid excessive penalization of small structural rearrangements that might have occurred among the members of the wider SNP discovery panel.

The filtering out of variants that were heterozygous in at least one of the doubled haploids, a further measure to remove false-positive SNP calls, removed only 0.2% of the detected variants (leaving a total of 15 465 829 variants). This gives an indication of the strong effectiveness of the post-alignment filters of uniqueness and proper pairing in removing known repetitive and paralogous regions.
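The doubled-haploid filter can be illustrated with a minimal sketch. It assumes that each variant carries a mapping from sample name to a diploid genotype call, written as a pair of alleles or None for a missing call; this data layout, the function names and the toy variants are hypothetical and are not the pipeline used in the study.

```python
# Hypothetical data layout: variant name -> {sample name: (allele1, allele2) or None}.
DOUBLED_HAPLOIDS = ("X9273", "X9336")


def is_heterozygous(genotype):
    """Return True for a called genotype whose two alleles differ."""
    if genotype is None:  # missing call: cannot be declared heterozygous
        return False
    first, second = genotype
    return first != second


def passes_doubled_haploid_filter(calls, doubled_haploids=DOUBLED_HAPLOIDS):
    """Keep a variant only if no doubled haploid is called heterozygous."""
    return not any(is_heterozygous(calls.get(s)) for s in doubled_haploids)


# Toy variants (illustrative values only).
variants = {
    "snp_a": {"X9273": ("A", "A"), "X9336": ("G", "G"), "Fuji": ("A", "G")},
    "snp_b": {"X9273": ("A", "G"), "X9336": ("G", "G"), "Fuji": ("A", "G")},
    "snp_c": {"X9273": None, "X9336": ("C", "C"), "Fuji": ("C", "T")},
}
kept = [name for name, calls in variants.items()
        if passes_doubled_haploid_filter(calls)]
print(kept)
```

A doubled haploid is expected to be homozygous everywhere, so a heterozygous call in one of them most plausibly reflects reads from paralogous regions collapsing onto a single locus. How missing calls in the doubled haploids were treated is not stated in the text; the sketch keeps such variants.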

Table 1 The SNP detection panel. The complete resequencing panel consisting of 63 apple cultivars plus two doubled haploids (X9273 and X9336). The cultivar name, the geographical origin and the number of reads produced is reported for each sample

Accession name          Origin   Read pairs    Accession name              Origin   Read pairs
Abbondanza              ITA      90 321 460    Ijunskoe ranee              RUS      123 440 174
Ag Alma                 RUS      127 334 651   Jantarnoe                   RUS      90 682 815
Aivaniya                BGR      119 632 134   Jonathan                    USA      128 267 363
Ajmi                    TUN      82 950 154    Keswick Codlin              GBR      69 130 834
Akerö                   SWE      91 666 140    Kosíkove                    CZE      86 098 710
Alfred Jolibois         FRA      92 947 232    Kronprins                   NOR      62 159 897
Amadou                  FRA      96 043 319    Lady Williams               AUS      108 963 567
Annurca                 ITA      86 930 085    Macoun                      USA      88 835 758
Antonovka OB            RUS      122 361 758   Maikki                      FIN      38 025 283
Antonovka Pamtorutka    RUS      39 392 157    Malinove holovouske         CZE      81 050 254
Aport Kubanski          RUS      71 297 068    McIntosh                    CAN      126 063 436
Belle et Bonne          BEL      113 928 709   Mela Rosa                   ITA      95 875 984
Borowitsky              RUS      96 122 310    Mela Rozza                  ITA      79 694 154
Braeburn                NZL      113 833 008   Ovcí hubicka                RUS      114 799 463
Budimka                 SRB      210 405 926   Panenske ceske              CZE      76 470 501
Busiard                 ITA      84 486 599    Papirovka                   RUS      90 957 520
Cabarette               BEL      83 997 287    Patte de Loup               FRA      110 531 953
Chodske                 CZE      62 687 249    Pepino Jaune                ESP/FRA  144 185 284
Court-Pendu Henry       BEL      99 053 865    Precoce de Karaj            IRN      110 480 537
Cox’s Orange Pippin     GBR      92 500 881    President Roulin            BEL      106 767 106
De l’Estre              FRA      110 615 195   Priscilla-NL                USA      117 058 356
Delicious               USA      130 861 887   Reinette Clochard           FRA      111 511 186
Dr Oldenburg            DEU      125 373 896   Reinette Dubois             BEL      102 336 676
Durello di Forlì        ITA      46 088 993    Renetta Grigia di Torriana  ITA      64 416 556
F2-26829-2-2            USA      103 188 147   Rosa                        ITA      70 443 859
Filippa                 DNK      90 247 516    Skryzhapel                  RUS      78 536 012
Fuji                    JPN      109 365 440   Sonderskow                  CZE      74 020 116
Fyriki                  GRC      81 179 122    Spässerud                   SWE      84 123 712
Gelata                  ESP      83 186 975    Worcester Pearmain-USA      GBR      96 410 779
Godelieve Hegmans       BEL      98 149 222    Young America               USA      96 202 193
Golden Delicious        USA      80 333 013    X9273                       FRA      193 831 694
Heta                    FIN      44 608 537    X9336a                      FRA      184 325 907
Hetlina                 CZE      88 568 362

a Coded X9748 in Bianco et al. (2014).


SNP filtering and selection

One of the primary reasons for the failure of SNPs in genotyping arrays is the presence of additional variants in the probe. Although Affymetrix Axiom® technology is more tolerant to variants present in the probe than Illumina Infinium® (only the 35 bp closest to the SNP must be variant-free as opposed to the 50 bp for the Illumina Infinium®, according to the technical notes available at http://www.affymetrix.com/), a diverse SNP discovery panel of a highly heterozygous organism like apple is quite likely to produce variants that can be very close to each other (Micheletti et al., 2011). Ideally, only the most reliable SNPs should be incorporated into the array, and no other variants should be present in their probes. In other words, only true-positive SNPs should be incorporated, while minimizing the chance that false-negative SNPs (i.e. real SNPs not detected) affect the efficiency of the probe. A two-tier filtering approach was devised to meet both requirements (Figure 2). Firstly, a set of reliable variants (quality filter) was selected by applying some minimal quality filters. Secondly, more stringent criteria were applied to the quality-filtered variants to select only the high-confidence SNPs (Affymetrix filter) to be incorporated in the array. The quality filter is mainly focused on the removal of false-positive variants (of poor quality or present in potential paralogous regions) and potential errors in the reference genome [allele frequency (AF) equal to 1], while attempting not to discard true positives that might be present in the probe of some nearby SNPs. The quality filter removed 18% of the detected variants, leaving a total of 12 701 549 SNPs and insertions/deletions (Indels) that were subjected to the second, more stringent, level of filtering.

The procedure used to identify the high-confidence SNPs for the array removed Indels, triallelic polymorphisms, A/T and C/G transversions, entries having additional variants in the 35 bp closest to the SNP, entries with highly repetitive 16-mers in the probe and entries with a poor Affymetrix conversion score (pConvert). It discarded 78% of the reliable variants, leaving a total of 2 800 173 SNPs potentially suitable for incorporation.
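The percentages quoted for each filtering stage follow directly from the variant counts reported in the text. The short check below recomputes them; the stage labels are shorthand chosen here, and only the counts come from the paper.

```python
# Variant counts reported in the text after each stage of the workflow.
stages = [
    ("multi-sample SNP calling", 15_499_525),
    ("doubled-haploid filter", 15_465_829),
    ("quality filter", 12_701_549),
    ("Affymetrix filter", 2_800_173),
]


def removed_percentages(stages):
    """Percentage of variants removed by each stage relative to its input."""
    removed = {}
    for (_, before), (name, after) in zip(stages, stages[1:]):
        removed[name] = 100 * (before - after) / before
    return removed


removed = removed_percentages(stages)
for name, pct in removed.items():
    print(f"{name}: {pct:.1f}% of incoming variants removed")
```

Each percentage is relative to the number of variants entering that stage, not to the initial 15 499 525 calls, which is why the quality filter (18%) and the Affymetrix filter (78%) cannot simply be added together.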

Additional filters were applied for final selection of the SNPs for inclusion on the array. These filters considered the status of the SNPs in the discovery panel with regard to minor allele frequency (MAF), distortion in terms of Hardy–Weinberg equilibrium (HWE) chi-square and the number of missing individual genotype calls. A focal point approach was used that selects SNPs in pre-defined windows (20 kbp in this array) centered on anchors (i.e. focal points) placed at specific points of the genome. Moreover, a tagSNP approach (Stram, 2004) was developed to discard all SNPs that were too strongly correlated to each other within a focal point window, avoiding the retention of redundant genotypic information. A LD value of r² = 0.85 was used as a threshold.
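The idea of discarding strongly correlated SNPs within a window can be sketched as a simple greedy pruning. This is an illustration only: it is neither the method of Stram (2004) nor the procedure implemented for the array. It assumes genotypes are coded as allele dosages (0, 1 or 2, with None for missing calls), uses the squared Pearson correlation of dosages as a stand-in for r², and visits SNPs in the order given; all names and data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two dosage vectors, ignoring missing pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:  # monomorphic among the compared samples
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    return sxy * sxy / (sxx * syy)


def select_tag_snps(window, threshold=0.85):
    """Greedily keep SNPs whose r2 with every already kept SNP is below threshold."""
    kept = []
    for name, dosages in window:
        if all(r_squared(dosages, other) < threshold for _, other in kept):
            kept.append((name, dosages))
    return [name for name, _ in kept]


# Toy focal-point window with six samples (illustrative values only).
window = [
    ("snp1", [0, 1, 2, 1, 0, 2]),
    ("snp2", [0, 1, 2, 1, 0, 2]),  # identical to snp1, hence redundant
    ("snp3", [2, 1, 0, 1, 2, 0]),  # mirror image of snp1, also redundant
    ("snp4", [0, 0, 1, 2, 1, 1]),  # weakly correlated with snp1
]
print(select_tag_snps(window))
```

In this sketch the result depends on the order in which SNPs are visited, so a real implementation would first rank candidates, for example by design score or MAF. That ranking is not described in the text and is left out here.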

This final selection was performed in five steps in order to enrich the SNP content of predicted genic regions and to achieve, where possible, an even distribution of SNPs along the assembled genome. The first set of SNPs to be included in the array were selected from the entries belonging to focal points close to predicted genes with no more than 20% of missing genotypes and with a MAF greater than or equal to 0.1. This produced a total of 327 361 SNPs (70.3% of the total newly predicted ones) covering a total of 54 735 gene predictions. The aim of the second selection step was then to include markers for additional predicted genes; this was achieved by relaxing the constraint on the missing genotypes to no more than 50% of the resequenced accessions. Although arguably less reliable, these 8316 additional SNPs (1.8% of the total) targeted 4910 gene predictions not present in the previous step. The potential decrease in reliability of this small percentage of the total number of newly predicted variants is a small price to pay for being able to target nearly 5000 additional gene predictions. In total 72% of the newly predicted variants incorporated into the array were located in predicted genic regions and exhibited a rather high MAF, which fits in with the main goal of the array, i.e. GWA studies.

Figure 1. The neighbor-joining tree of the 63 accessions.

Genotypic data from 16 simple sequence repeats were used to compute the dissimilarities among accessions before generating the neighbor-joining tree.

Figure 2. The single nucleotide polymorphism (SNP) detection and selection workflow.

The choice of SNPs to be incorporated into the Axiom® Apple480K SNP array involved five steps: read alignment and processing, SNP detection, quality filtering, Affymetrix-specific filtering and SNP selection. The final design included 20K SNPs from the Illumina Infinium® apple 20K array (Bianco et al., 2014) and 1.4K SNPs identified from a genotyping-by-sequencing experiment (Gardner et al., 2014).

The third selection step was aimed at achieving a uniform coverage of the whole genome by selecting SNPs with a MAF greater than or equal to 0.1, no more than 20% of missing genotypes and located outside predicted genic regions. This provided a further 94 918 SNPs (20.4% of the total), the second biggest set of newly predicted variants included in the array. The final two sets of newly predicted SNPs were selected from the variants having a MAF ranging from 0.05 to 0.1 (29 730 SNPs, 6.3% of the total) and from 0.01 to 0.05 (5461 SNPs, 1.2% of the total). Although rare, these final 35K SNPs may still carry useful alleles for GWA studies if the studied sample size is large (e.g. 1000 individuals) and were therefore included in the array to reach the final 465 786 new SNPs. A set of 19 990 previously identified SNPs coming from the Illumina Apple 20K chip and 1473 SNPs produced by a genotyping-by-sequencing experiment (Gardner et al., 2014) completed the list of 487 249 SNPs present in the Axiom® Apple480K genotyping array (Table S1 in Supporting Information).
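The five selection steps can be summarized as a small decision function, sketched below under several assumptions that the text does not spell out: the boundary handling of the MAF ranges, the use of MAF of at least 0.1 in step 2, and the 20% missing-data limit in steps 4 and 5 (inferred from the later remark that these steps used the more stringent filters). The restriction of step 2 to genes not already covered in step 1 is not modeled. All function names are hypothetical.

```python
def minor_allele_frequency(dosages):
    """MAF from allele dosages (0, 1, 2); None marks a missing genotype."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def missing_rate(dosages):
    """Fraction of accessions without a genotype call."""
    return sum(d is None for d in dosages) / len(dosages)


def selection_step(near_gene, maf, missing):
    """Return the selection step (1-5) a SNP would fall into, or None."""
    if maf >= 0.1:
        if near_gene and missing <= 0.20:
            return 1  # genic, stringent missing-data limit
        if near_gene and missing <= 0.50:
            return 2  # genic, relaxed missing-data limit
        if not near_gene and missing <= 0.20:
            return 3  # non-genic, stringent missing-data limit
        return None
    if missing <= 0.20:
        if maf >= 0.05:
            return 4  # low MAF (assumed range 0.05 <= MAF < 0.1)
        if maf >= 0.01:
            return 5  # very low MAF (assumed range 0.01 <= MAF < 0.05)
    return None


# Counts per step as reported in the text.
new_snps = {1: 327_361, 2: 8_316, 3: 94_918, 4: 29_730, 5: 5_461}
total_new = sum(new_snps.values())
total_array = total_new + 19_990 + 1_473  # plus 20K-array and GBS SNPs
print(total_new, total_array)
```

The tally at the end reproduces the totals given in the text: the five steps sum to 465 786 newly predicted SNPs, and adding the 19 990 SNPs from the 20K array and the 1473 GBS SNPs gives the 487 249 SNPs on the array.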

Validation of the Axiom® Apple480K SNP array

A total of 1324 apple accessions, including the 92 progeny of two mapping populations (‘Golden Delicious’ × ‘Renetta Grigia di Torriana’ and ‘Fuji’ × ‘Pinova’), were genotyped with the Axiom® Apple480K. Only two samples were removed as they did not pass the dish quality control check (DQC < 0.82). Thirteen additional samples were removed due to a lower call rate (<97%), possibly due to triploidy in 10 of these samples as further determined based on simple sequence repeat (SSR) analysis. An initial performance validation of the array followed the Axiom® best practices genotyping workflow. SNPs were classified into six classes as described in Experimental Procedures and results are detailed in Table 2. Those SNPs belonging to the poly high resolution (PHR) class were subjected to additional filters implemented in the software SNPolisher for polyploid species. This further step of filtering identified eight additional SNP quality categories. For the sake of simplicity, classes BBvarianceX (BBvarX), BBvarianceY (BBvarY), ABvarianceX (ABvarX), ABvarianceY (ABvarY), AAvarianceX (AAvarX), AAvarianceY (AAvarY) and HomHomResolution (HomHomRes) were merged in the class ‘high variance’ (HVAR). The large majority of SNPs fell in the more thoroughly defined class of PHR polymorphisms (359 994, 74%). Although this classification is more exacting than the standard SNPolisher output, this result is in line with other published arrays, improving on the 60% of PHR variants obtained in the 180K SoyaSNP array (Lee et al., 2015), but falling short of the 92% of PHR variants featured by the Maize 600K array (Unterseer et al., 2014). However, it should be noted that in this latter case a more expensive approach had been followed with the production of two screening arrays and the final incorporation of the most reliable 50% of the variants. A summary of the distribution of all SNPs in the different classes is shown in the bar plot of Figure 3 and Table 2.
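The two sample-level quality checks can be sketched as follows. The thresholds are those quoted in the text (DQC of at least 0.82, call rate of at least 97%), while the record layout, the sample names and their values are hypothetical, and the order of the two checks is an assumption.

```python
DQC_THRESHOLD = 0.82        # dish quality control threshold quoted in the text
CALL_RATE_THRESHOLD = 0.97  # sample call rate threshold quoted in the text


def sample_qc(samples):
    """Split (name, dqc, call_rate) records into passed and failed groups."""
    passed, failed_dqc, failed_call_rate = [], [], []
    for name, dqc, call_rate in samples:
        if dqc < DQC_THRESHOLD:
            failed_dqc.append(name)        # excluded before genotype calling
        elif call_rate < CALL_RATE_THRESHOLD:
            failed_call_rate.append(name)  # e.g. possible triploids
        else:
            passed.append(name)
    return passed, failed_dqc, failed_call_rate


# Hypothetical samples for illustration.
samples = [
    ("sample_01", 0.95, 0.991),
    ("sample_02", 0.79, 0.940),
    ("sample_03", 0.90, 0.955),
]
passed, failed_dqc, failed_call_rate = sample_qc(samples)
print(passed, failed_dqc, failed_call_rate)

# Sample counts from the text: 1324 genotyped, 2 failed DQC, 13 failed call rate.
retained = 1324 - 2 - 13
print(retained)
```

With the numbers given in the text, 1309 of the 1324 genotyped samples remain after both checks.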

The representation of these classes among the five selection steps (Figure 3) revealed that genic regions (step 1) comprising SNPs predicted from resequencing data with few missing genotypes (i.e. less than 20%) and with a MAF greater than or equal to 0.1 exhibited the highest percentage of PHR SNPs (261 504, 80%). The performance of the array in genic regions with less stringent filtering on missing genotypes (i.e. less than 50%, step 2) was significantly degraded, as only about 18% (1514) of SNPs fell in the PHR class in this case. A large proportion (28%) of SNPs was classified as off-target variants (OTVs), indicating the presence of a null allele at a rather high frequency, which is consistent with the higher proportion of missing genotypes in the resequencing data (step 2). Nevertheless, this approach provided about 1.5K PHR SNPs falling in the proximity of genes that could not be targeted with the more stringent settings. Those SNPs detected outside of gene predictions with the more stringent set of filters (step 3) featured a high percentage of PHR polymorphisms (65 533, 69%). This figure is lower than that obtained in genic regions with the same settings, but much higher than that obtained with the less stringent filters. This result is in accordance with the observation by Chagne et al. (2012) that reliable calling of SNPs outside genic regions is harder. Those SNPs called with the more stringent filters but based on lower MAF values (steps 4 and 5) resulted in a good proportion of PHR SNPs [respectively 17 494 (59%) and 1760 (32%) in the two cases] and in an increase in no minor homozygous (NMH) and unexpected heterozygosity (UHET) polymorphisms, which is an expected effect of the lower MAF. The same analysis was performed for the 20K SNPs identified in the previous Illumina Apple 20K array (Bianco et al., 2014). Although these polymorphisms were designed for pedigree-based analysis from a resequencing panel of 13 major founders, the present study shows that such a SNP array has a good transferability of 63% to germplasm (12 689 out of 19 990 total SNPs; Table 2).

An additional step of filtering was finally applied to identify a set of very robust SNPs that can be safely used for germplasm surveys. A quality prediction model based on logistic regression provided a total of 275 223 SNPs (56% of the 487K; Table S1) that include 261 972 SNPs (95%) belonging to the PHR class, 9389 (3%) to the UHET class and 3862 (2%) to the NMH class. The representation of these classes among the five selection steps (Table 2) confirmed the features of the selection criteria described above. Despite the stringency of this additional filtering, an even coverage of the assembled genome was still reached (see Figures 4 and S1). In particular, the vast majority of these very robust SNPs were located within a few kilobases of each other. Only a few gaps between SNPs exceeded 100 kbp, making the Axiom® Apple480K SNP array ideal for GWA studies.
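The spacing of markers along the genome can be examined with a few lines of code. The sketch below assumes a hypothetical mapping from chromosome name to SNP positions in base pairs and counts the gaps between consecutive SNPs that exceed 100 kbp; the positions are invented for illustration.

```python
def consecutive_gaps(positions_by_chromosome):
    """Distances (bp) between neighbouring SNPs, computed per chromosome."""
    gaps = []
    for positions in positions_by_chromosome.values():
        ordered = sorted(positions)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return gaps


def count_large_gaps(gaps, limit=100_000):
    """Number of gaps strictly larger than the limit (100 kbp by default)."""
    return sum(gap > limit for gap in gaps)


# Hypothetical SNP positions for illustration.
positions = {
    "chr1": [1_000, 3_500, 250_000, 252_000],
    "chr2": [500, 90_000],
}
gaps = consecutive_gaps(positions)
print(sorted(gaps), count_large_gaps(gaps))
```

Gaps are computed within each chromosome only, so the distance between the last SNP of one chromosome and the first SNP of the next is never counted.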


Genotype call assessment

Reproducibility was high, when considering both the same Golden Delicious DNA replicated over all the plates (technical replicates) and repeated DNA extractions from the same genotypes (biological replicates). Over the 14 plates, a consensus genotype call was easily identified for Golden Delicious for each SNP. The error rate for technical replicates was very low, ranging from 0.001 to 0.04% for 11 plates; only three plates exhibited a higher error rate (0.30–0.56%), consistent with their slightly lower DQC linked to technical problems. The biological replicates were also very homogeneous, with an error rate ranging from 0.03 to 0.05%.
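One plausible way to obtain such replicate error rates is to build a consensus call per SNP across the replicates and count, for each replicate, the calls that disagree with it. The sketch below assumes a majority-vote consensus and ignores missing calls; the paper does not describe the exact procedure, and the function names and genotype strings are hypothetical.

```python
from collections import Counter


def consensus_calls(replicates):
    """Majority call per SNP across replicates; None where nothing was called."""
    consensus = []
    for calls in zip(*replicates):
        counted = Counter(c for c in calls if c is not None)
        consensus.append(counted.most_common(1)[0][0] if counted else None)
    return consensus


def error_rate(calls, consensus):
    """Fraction of called SNPs that disagree with the consensus."""
    compared = [(c, k) for c, k in zip(calls, consensus)
                if c is not None and k is not None]
    if not compared:
        return 0.0
    return sum(c != k for c, k in compared) / len(compared)


# Three hypothetical replicates of the same genotype over four SNPs.
plates = [
    ["AA", "AB", "BB", "AB"],
    ["AA", "AB", "BB", "AB"],
    ["AA", "AA", "BB", None],
]
consensus = consensus_calls(plates)
rates = [error_rate(plate, consensus) for plate in plates]
print(consensus, rates)
```

Missing calls are excluded from the denominator, so a replicate with many no-calls is not penalized twice, once by its call rate and once by its error rate.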

The genotype data generated from the Axiom® Apple480K SNP array for 347 805 PHR SNPs [i.e. not including the 20K and genotyping-by-sequencing (GBS) SNPs] were compared with the in silico variant calls from the high-depth resequencing. Forty-two accessions of the discovery panel present in both datasets were scrutinized for differences in allele calls. An average genotype concordance rate of 97% was observed, with a minimum of 96% for ‘Durello di Forlì’ and a maximum of 98% for ‘Ovcí hubicka’, which confirms the figure reported by Bianco et al. (2014). As expected, a lower average genotype concordance rate was observed among SNPs that were predicted with a greater number of missing data (87%, for step 2), and among SNPs showing a high departure (P < 0.0001) from the HWE (80%). The average genotype concordance rate was lower when resequencing genotype calls were homozygous (96% versus 99% for heterozygous), probably due to an undercall of heterozygosity in regions with a low read depth (Table S2).

The reliability of the 275 223 robust SNPs was also checked by considering the Mendelian error (ME) rate (i.e. mismatch rate) in 10 newly discovered two-parent–child relationships. The number of MEs varied between 214 and 711, corresponding to a ME rate in the range of 0.08–0.26%. This figure is similar to those reported in previous human studies using Affymetrix arrays (about 0.32%, giving an estimated genotyping error rate of 0.13%; Saunders et al., 2007).
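A Mendelian error at a biallelic SNP occurs when the child genotype cannot be formed by taking one allele from each parent. The sketch below shows a minimal check of this kind, with genotypes written as two-letter strings (for example "AB") and None for missing calls; this encoding and the function names are assumptions made for illustration.

```python
def is_mendelian_consistent(child, parent1, parent2):
    """True if the child could inherit one allele from each parent."""
    a, b = child
    return (a in parent1 and b in parent2) or (a in parent2 and b in parent1)


def mendelian_errors(trio_calls):
    """Return (error count, number of fully called SNPs) for one trio."""
    informative = [t for t in trio_calls if all(g is not None for g in t)]
    errors = sum(not is_mendelian_consistent(*t) for t in informative)
    return errors, len(informative)


# Hypothetical (child, parent1, parent2) calls for four SNPs.
trio = [
    ("AB", "AA", "BB"),  # consistent
    ("AA", "AB", "AB"),  # consistent
    ("BB", "AA", "AB"),  # error: no B allele available from parent1
    ("AB", None, "AB"),  # skipped: missing parental call
]
errors, n_called = mendelian_errors(trio)
print(errors, n_called)

# Range reported in the text, assuming all 275 223 robust SNPs were compared.
low = 100 * 214 / 275_223
high = 100 * 711 / 275_223
print(f"{low:.2f}% to {high:.2f}%")
```

The final lines reproduce the reported range of 0.08 to 0.26% under the assumption that the denominator is the full set of 275 223 robust SNPs for every trio; the text does not state whether SNPs with missing calls were excluded.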

To enable comparison across studies and the evaluation of data reproducibility, a total of 15 771 genetically mapped SNPs from the Illumina 20K array (Bianco et al., 2014) included in the Axiom® Apple480K were considered. Of these, 9434 were classified as PHR markers in the Axiom® Apple480K and further analyzed. For each SNP, the concordance rate of genotype calls across the 53 accessions characterized with both arrays was calculated. The average concordance rate on all the SNPs revealed a very high reproducibility of the genotype calls (98%). The discordant genotype calls (Table S3) between the two techniques might be caused by a different probe design, also considering that the Axiom® Apple480K SNP array was specifically constructed for polyploid species.

Table 2 Detailed description of the distribution of Axiom® single nucleotide polymorphism (SNP) markers among the eight quality classes. The fraction of robust SNPs for each selection step is also reported (in parentheses)

SNPs/Selection step     PHR (robust)        HVAR    UHET (robust)  NMH (robust)   MHR   CRBT    OTV   Other   Total    Total robust  Robust/total (%)
1st_genes               261 504 (195 886)   14 620  3109 (2505)    4075 (590)     3183  16 806  1727  22 337  327 361  198 981       60.8
2nd_add_32miss_genes    1514 (748)          1173    37 (20)        292 (31)       122   642     2319  2217    8316     799           9.6
3rd_add_no_genes        65 533 (41 747)     5900    773 (548)      1511 (166)     1377  7562    1194  11 068  94 918   42 461        44.7
4th_add_low_maf         17 494 (13 833)     764     5417 (4884)    2863 (1408)    458   538     422   1774    29 730   20 125        67.7
5th_add_very_low_maf    1760 (1383)         74      1161 (1070)    1817 (1318)    157   44      161   287     5461     3771          69.1
20K                     11 295 (7771)       1822    393 (330)      1001 (322)     878   1265    557   2779    19 990   8423          42.1
GBS                     894 (604)           136     43 (32)        73 (27)        67    84      26    150     1473     663           45.0
Total                   359 994             24 489  10 933         11 632         6242  26 941  6406  40 612  487 249  275 223       56.5
Robust                  (261 972) 73%               (9389) 86%     (3862) 33%

PHR, poly high resolution SNPs; HVAR, high variance; UHET, unexpected heterozygosity; NMH, no minor homozygosity; MHR, mono high resolution; CRBT, call rate below threshold; OTV, off-target variants; GBS, genotyping-by-sequencing.

Figure 3. Distribution of the 487 249 single nucleotide polymorphisms (SNPs) in the eight SNPolisher classes for each SNP selection step.

The relative proportion of SNPs in each selection step (described in Figure 2) is reported alongside the global distribution. Considering the whole array (487 249 SNPs): poly high resolution (PHR) SNPs amount to 74%, unexpected heterozygosity (UHET) to 2%, no minor homozygosity (NMH) to 2%, mono high resolution (MHR) to 1%, call rate below threshold (CRBT) to 6%, off-target variants (OTV) to 1%, high variance (HVAR) to 5% and Other to 8%. For selection step 1 (1st_genes; 327 361 SNPs): PHR amount to 80%, UHET to 1%, NMH to 1%, MHR to 1%, CRBT to 5%, OTV to 1%, HVAR to 4%, Other to 7%. For selection step 2 (2nd_add_32miss_genes; 8316 SNPs): PHR amount to 18%, UHET to 0%, NMH to 4%, MHR to 1%, CRBT to 8%, OTV to 28%, HVAR to 14%, Other to 27%. For selection step 3 (3rd_add_no_genes; 94 918 SNPs): PHR amount to 69%, UHET to 1%, NMH to 2%, MHR to 1%, CRBT to 8%, OTV to 1%, HVAR to 6%, Other to 12%. For selection step 4 (4th_add_low_maf; 29 730 SNPs): PHR amount to 59%, UHET to 18%, NMH to 10%, MHR to 1%, CRBT to 2%, OTV to 1%, HVAR to 3%, Other to 6%. For selection step 5 (5th_add_very_low_maf; 5461 SNPs): PHR amount to 32%, UHET to 21%, NMH to 33%, MHR to 3%, CRBT to 1%, OTV to 3%, HVAR to 2%, Other to 5%. For 20K and genotyping-by-sequencing (GBS) (21 463 SNPs): PHR amount to 57%, UHET to 2%, NMH to 5%, MHR to 4%, CRBT to 6%, OTV to 3%, HVAR to 9%, Other to 14%.

Figure 4. Distance distribution of consecutive single nucleotide polymorphisms (SNPs) in the apple genome.

The log-scaled histogram of the distance between consecutive SNPs is reported for all SNPs included in the Axiom® Apple480K (cross-hatched) and for the 275K robust SNPs (in black). Only a few gaps between SNPs exceeding 100 kbp are found.

The Axiom® Apple480K SNP array as a tool for improving the apple genome assembly

The Malus × domestica genome is an improved high-quality draft genome (Chain et al., 2009) and work is continuing to improve it. The Axiom® Apple480K chip introduced in this paper provides a wealth of marker information that will be a useful resource to improve the current version of the Malus × domestica genome assembly v.3 (https://www.rosaceae.org/). To this end, the progenies and parents of two mapping populations were genotyped using this new tool. Array data mining identified 174 426 (36% of the total of 487 249 included in the array) robust polymorphic SNPs belonging to either the ab×aa or aa×ab segregating type, which were retained to build maternal and paternal maps following the double pseudo-test cross model. Four dense single parental genetic maps were built using JoinMap software v.4.1 (Van Ooijen, 2006) with the multipoint maximum likelihood (ML) mapping algorithm approach (Van Ooijen, 2011) with a logarithm of odds (LOD) score >7 and ML mapping algorithm for the calculation of genetic distances with default parameters. A total of 173 082 SNPs were mapped in one of the 17 linkage groups (LGs) (more details on the genetic maps are provided in Table S4) and their genetic position can be used to provide a more reliable version of the genome sequence. Moreover, some of the unanchored contigs (i.e. those placed on LG 0) can be assigned to a LG thanks to the genetic map (Table S5). A preliminary analysis was performed to assess the potential impact of this high-density array on the current version of the genome (i.e. v.3). The two mapping populations provided 173K anchoring points (genetically mapped SNPs) for a total of 26 261 contigs which span 436 Mbp of the whole genome. More specifically, 18 934 of these contigs (for a total of 323 Mbp) are currently anchored in the draft genome and could therefore be validated by comparison with the genetic maps. Additionally, 7327 contigs (for a total of 113 Mbp) currently belong to LG 0 and can therefore be newly assigned to LGs (Tables S4 and S5). This information, combined with the physical links provided by long-jump libraries (e.g. mate-pairs, bacterial artificial chromosome ends), could be used to increase the anchored part of the genome and to resolve inconsistencies.
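In a double pseudo-test cross, a SNP is informative for one parental map when it is heterozygous in that parent and homozygous in the other. The sketch below classifies SNPs from the two parental genotypes and adds a simple chi-square statistic for the expected 1:1 segregation in the progeny. It assumes the first genotype in each cross is the maternal one, uses two-letter genotype strings, and is not the JoinMap procedure; names and counts are hypothetical.

```python
def segregation_type(mother, father):
    """Classify a biallelic SNP as 'ab×aa', 'aa×ab' or None (not retained)."""
    mother_het = mother[0] != mother[1]
    father_het = father[0] != father[1]
    if mother_het and not father_het:
        return "ab×aa"  # informative for the maternal map
    if father_het and not mother_het:
        return "aa×ab"  # informative for the paternal map
    return None         # both heterozygous or both homozygous


def chi_square_1_to_1(n_heterozygous, n_homozygous):
    """Chi-square statistic (1 d.f.) for departure from 1:1 segregation."""
    total = n_heterozygous + n_homozygous
    if total == 0:
        return 0.0
    expected = total / 2
    return ((n_heterozygous - expected) ** 2
            + (n_homozygous - expected) ** 2) / expected


print(segregation_type("AB", "AA"))  # maternal marker
print(segregation_type("AA", "AB"))  # paternal marker
print(segregation_type("AB", "AB"))  # excluded in this scheme

# Hypothetical progeny counts: 50 heterozygous and 42 homozygous offspring.
statistic = chi_square_1_to_1(50, 42)
print(round(statistic, 3))
```

SNPs heterozygous in both parents segregate 1:2:1 and are set aside in this scheme, which is why only a subset of the robust polymorphic SNPs ends up on the single-parent maps.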

CONCLUSIONS

A new high-density Affymetrix Axiom® SNP array has been built to gather more than 487K SNPs that are fairly well distributed over the 17 LGs making up the apple genome. This SNP array has been built from a large discovery panel (comprising 63 different cultivars) resequenced at a high depth, which allows for a very good and confident representation of the diversity of the apple genome. The SNPs to be incorporated into the array were chosen by applying (i) a tag SNP approach to avoid the incorporation of too many redundant SNPs and (ii) a focal points approach to enrich marker presence in genic regions, to evenly distribute markers along the genome and to facilitate SNP haploblock approaches. The majority of SNPs incorporated into the array (93%) exhibited a MAF greater than or equal to 0.1, but the remaining markers, although with lower MAF, might still carry useful information.

The Axiom® Apple480K array is well suited to GWA studies in apple collections thanks to the high percentage of well-distributed and robust SNPs observed after the screening of many diverse apple accessions. This new tool will first be used to finely dissect the genetic basis of important agronomic traits such as phenology, fruit quality, disease resistance or drought tolerance, which have been assessed for several years over hundreds of old apple cultivars in the frame of the European project FruitBreedomics (Laurens et al., 2012). The high SNP density will allow the fine mapping of candidate genes underlying already identified QTLs and will deliver SNP haplotypes useful for further marker-assisted breeding. The Axiom® Apple480K will likely be a reference tool for further GWA studies to be performed on other apple collections worldwide, and should also reveal many as yet unknown parent–offspring relationships among old apple cultivars. This array will help the construction of high-resolution linkage maps and the improvement of the current reference genome sequence.

EXPERIMENTAL PROCEDURES

Plant material and sequencing

The resequencing panel comprised 65 apple accessions which included the 13 apple major founders and two doubled haploids (X9273 and X9336) used for the Illumina 20K SNP chip (Bianco et al., 2014). This panel (see Table 1, Figure 1) was chosen to cover a very large genetic diversity encompassing European and Russian germplasms and including some Iranian, Tunisian and US cultivars. Briefly, SSR genotypic data (16 SSRs) available from the European project FruitBreedomics (Laurens et al., 2012) were used to select 52 new accessions exhibiting large genetic distances both among themselves and from the previously resequenced 13 accessions + 2 doubled haploids. The renown of the cultivars in various European countries was used as a second selection criterion. Leaf material was obtained from various institutions as described in Table S6. The neighbor-joining tree was built with the software DARWIN (Perrier and Jacquemoud-Collet, 2006).

DNA was extracted from freeze-dried young leaf material using the DNeasy Plant Mini Kit (Qiagen, http://www.qiagen.com/) and quantified with the Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific, https://www.thermofisher.com/) using the Nanodrop 3300 (Thermo Fisher Scientific). Sequencing libraries were constructed according to the NEBNext Ultra DNA Library Prep kit for Illumina (New England Biolabs, https://www.neb.com/) employing double-size selection steps. For each sample, 1 µg of genomic DNA was fragmented with a Bioruptor Plus (Diagenode, https://www.diagenode.com/) and size selected to 300–600 bp. Resulting fragments were end-repaired, adenylated and ligated to Illumina paired-end adaptors. Each library was amplified with an index barcode tag and the size was confirmed on the BioAnalyzer 2100 (Agilent, http://www.agilent.com/). All the libraries were quantified with the KAPA Library Quantification Kit – Illumina (Kapa Biosystems, https://www.kapabiosystems.com/) using the LightCycler 480 (Roche, http://www.roche.com/) and pooled on an equimolar basis.

Each pool was sequenced on an Illumina HiSeq2000 platform with paired-end runs of 2 × 101 + 7 bp. Base calling and quality control were performed through the Illumina RTA sequence analysis pipeline.

Alignment and SNP detection

Quality filtering was performed to remove poor-quality bases at the end of the Illumina reads by using fastx-toolkit’s fastq_quality_trimmer (http://hannonlab.cshl.edu/fastx_toolkit/index.html) with a minimum Phred quality score of 26 and a minimum length of 80 bp for trimmed reads. The remaining reads were then aligned to the 94 079 contigs present in the Apple Genome Reference v.3 (published at https://www.rosaceae.org/) with BFAST (Homer et al., 2009). Due to the genetic heterogeneity of the discovery panel, alignments with up to seven mismatches were allowed and were subsequently filtered by removing multi-mappers (i.e. reads aligning in multiple places of the genome) to avoid the pitfalls of false SNPs due to paralogous regions. Finally, in order not to over-penalize small rearrangements among the different apple accessions, a proper alignment filter was applied to keep pairs aligning on the same contig (without considering insert sizes) and in opposite orientations.
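The read-pair filters described above can be sketched as follows. Alignment records are represented here as plain tuples rather than parsed from SAM/BAM files; the field layout and the function name are assumptions made for illustration only and do not reproduce the scripts actually used.

```python
from collections import Counter


def filter_pairs(alignments):
    """Keep read pairs mapping uniquely, on one contig, in opposite orientations.

    alignments: list of (read_name, mate, contig, strand) tuples, where mate
                is 1 or 2 and strand is '+' or '-' (hypothetical layout).
    Insert size is deliberately ignored, as in the text.
    Returns the sorted names of the retained pairs.
    """
    # Count hits per individual read; more than one hit marks a multi-mapper.
    hits = Counter((name, mate) for name, mate, _, _ in alignments)

    # Index the uniquely mapped reads by pair name.
    pairs = {}
    for name, mate, contig, strand in alignments:
        if hits[(name, mate)] == 1:
            pairs.setdefault(name, {})[mate] = (contig, strand)

    kept = []
    for name, mates in pairs.items():
        if len(mates) != 2:
            continue  # one mate is missing or was a multi-mapper
        (contig1, strand1), (contig2, strand2) = mates[1], mates[2]
        if contig1 == contig2 and strand1 != strand2:
            kept.append(name)
    return sorted(kept)
```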

Detection of SNPs was performed using samtools and bcftools (Li et al., 2009) in a pooling approach whereby the SNPs were called on all the apple accessions together and single genotypes were called for each one of them through their read groups. Standard parameters were used and produced a total of nearly 15.5 million variants (Figure 2) stored in the standard variant calling format (VCF). Heterozygous variants in the two doubled haploids were considered as false positives and therefore removed from the analysis.

SNP filtering

The SNP filtering procedure consisted of a two-tier approach aimed at removing false positives first and then variants that were not suitable for Affymetrix Axiom® SNP arrays. The entire filtering procedure was done through a custom-made python (https://www.python.org/) script (available upon request) using standard VCF formatted files.

The first filtering step (quality filter) removed variants having a Phred scaled quality score lower than 19 (VCFs QUAL field), a read support higher than 4000 (DP field) and an AF equal to 1. The last two filters were used to remove variants detected in regions where the coverage distribution was too skewed, which might indicate repetitive regions or potential single-base errors in the reference genome.
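A minimal sketch of this first filtering tier, applied to one VCF data line at a time, is given below. It is not the authors' script: the function name is hypothetical, and the INFO key `AF` is an assumption, since the exact key written by the variant caller depends on its version.

```python
def passes_quality_filter(vcf_line, min_qual=19.0, max_depth=4000):
    """Return True if a VCF data line survives the first (quality) filter.

    Thresholds follow the text: a QUAL below 19, a DP above 4000 or an
    allele frequency equal to 1 cause removal. The INFO key name 'AF' is
    an assumption made for this sketch.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual = float(fields[5])  # column 6 of a VCF line is QUAL
    info = dict(
        item.split("=", 1) for item in fields[7].split(";") if "=" in item
    )
    depth = int(info.get("DP", 0))
    allele_freq = float(info.get("AF", "0").split(",")[0])

    if qual < min_qual:
        return False  # low-confidence call
    if depth > max_depth:
        return False  # excessive coverage, possibly a repetitive region
    if allele_freq == 1.0:
        return False  # every accession differs from the reference
    return True
```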

A second filtering step was performed in order to maximize the number of SNPs that could be included in the chip, following Affymetrix guidelines (http://www.affymetrix.com/). It removed indel variants, A/T and C/G transversions and tri-allelic predicted SNPs. All the variants meeting these filtering criteria were exported in the format specified in the Affymetrix Technical Note on Axiom® myDesign™ (http://www.affymetrix.com/) and submitted for their internal quality check alongside 3000 DQC sequences containing no variants. In order to maximize conversion rate and space on the array, Affymetrix-checked SNPs were subjected to a final filtering step to remove entries with additional predicted variants in the 35 up- or downstream bases, a k-mer count of the 16 bases on each side of the SNP higher than 300 in the whole genome and a pConvert score lower than 0.6 for both the forward and the reverse probes.
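The allele-based part of this second tier (indels, A/T and C/G SNPs, tri-allelic sites) can be illustrated with the short Python sketch below. The function name is hypothetical, and the flanking-variant, k-mer and pConvert criteria are not covered because they depend on data that are not described in detail here.

```python
def suitable_for_array(ref, alt):
    """Return True if a variant passes the allele-based array filters.

    ref: reference allele string from the VCF REF column.
    alt: VCF ALT column (comma-separated when several alternates exist).
    """
    alt_alleles = alt.split(",")
    if len(alt_alleles) != 1:
        return False  # tri-allelic (or higher) site
    alt_allele = alt_alleles[0]
    if len(ref) != 1 or len(alt_allele) != 1:
        return False  # indel or multi-base variant
    if {ref.upper(), alt_allele.upper()} in ({"A", "T"}, {"C", "G"}):
        return False  # A/T and C/G transversions
    return True
```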

SNP selection

The SNP selection procedure used custom-made python scripts and involved five distinct selection criteria. Similar to the method used for other SNP arrays (Chagne et al., 2012; Bianco et al., 2014) a focal points approach was used to pursue an enrichment of genic regions (Celton et al., 2014) first; this was then extended to non-genic regions in order to reach a uniform distribution of focal points along the entire genome.

Within each focal point, tag SNPs were identified by plink (Purcell et al., 2007) with the options --tag-r2 0.85 --list-all --show-tags on the following five SNP sets, and kept to be incorporated in the final chip design:

(i) MAF greater than or equal to 0.1, HWE chi-square P-value higher than 10⁻⁸, missing genotypes in no more than 14 accessions and belonging to a gene prediction,

(ii) MAF greater than or equal to 0.1, HWE chi-square P-value higher than 10⁻⁸, missing genotypes in 15–32 accessions and belonging to a gene prediction,

(iii) MAF greater than or equal to 0.1, HWE chi-square P-value higher than 10⁻⁸, missing genotypes in no more than 14 accessions and not belonging to a gene prediction,

(iv) MAF between 0.05 and 0.1, HWE chi-square P-value higher than 10⁻⁸, missing genotypes in no more than 14 accessions,

(v) MAF between 0.01 and 0.05, HWE chi-square P-value higher than 10⁻⁸, missing genotypes in no more than 14 accessions.
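The five SNP sets listed above can be expressed as a small classification function. This is only a sketch: the function name is hypothetical, the HWE threshold is taken as 10⁻⁸, and the handling of the MAF boundaries (0.05 is placed in set iv, 0.1 in sets i to iii) is an assumption, since the text does not state which side of each interval is closed.

```python
def assign_snp_set(maf, hwe_p, n_missing, in_gene):
    """Return the SNP set label ('i' to 'v'), or None if no set applies.

    maf:       minor allele frequency in the discovery panel.
    hwe_p:     Hardy-Weinberg equilibrium chi-square P-value.
    n_missing: number of accessions with a missing genotype.
    in_gene:   True if the SNP falls within a gene prediction.
    """
    if hwe_p <= 1e-8:
        return None  # fails the HWE criterion shared by all five sets
    if maf >= 0.1:
        if n_missing <= 14:
            return "i" if in_gene else "iii"
        if 15 <= n_missing <= 32 and in_gene:
            return "ii"
        return None
    if n_missing > 14:
        return None  # the low-MAF sets tolerate at most 14 missing genotypes
    if 0.05 <= maf < 0.1:
        return "iv"
    if 0.01 <= maf < 0.05:
        return "v"
    return None
```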

In addition to the aforementioned newly discovered SNPs, 19 990 markers from the Illumina 20K Apple Chip (Bianco et al., 2014) and 1473 markers obtained from a GBS experiment (Gardner et al., 2014) were added by default to complete the selection of SNPs to be included in the Axiom® Apple480K SNP array.

Validation of SNPs by genotyping

A total of 1324 apple accessions (mostly old apple cultivars) and two small mapping populations of 46 progenies each (‘Golden Delicious’ × ‘Renetta Grigia di Torriana’ and ‘Fuji’ × ‘Pinova’) were genotyped using the Axiom® Apple480K chip to evaluate the array performance. The same DNA sample from Golden Delicious was included 14 times as a technical replicate (one sample per plate). The error rate of each replicate was computed as the rate of departure from the consensus over the PHR + UHET + NMH SNPs. Two DNA extractions were also performed independently for three additional cultivars grown in two different countries. Such DNA samples were genotyped as biological replicates to assess the error rate for independent DNA extraction from independent trees. For each sample, 200 ng of genomic DNA was used; after whole-genome amplification (WGA) and fragmentation, 1500 µg was hybridized to arrays using the Affymetrix GeneTitan platform following Affymetrix’s guidelines available at http://www.affymetrix.com.
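One way to compute a per-replicate error rate as the departure from a consensus is sketched below. The text does not specify how the consensus was derived, so the majority call per SNP is used here as an assumption, and no-calls are simply skipped; the function name and input layout are also hypothetical.

```python
from collections import Counter


def replicate_error_rates(replicate_calls):
    """Per-replicate rate of departure from the consensus genotype.

    replicate_calls: list of lists; each inner list holds one replicate's
                     genotype calls ('AA', 'AB', 'BB' or None for no call),
                     in the same SNP order for every replicate.
    The consensus is taken as the most frequent call per SNP (assumption).
    """
    n_snps = len(replicate_calls[0])
    consensus = []
    for i in range(n_snps):
        calls = [rep[i] for rep in replicate_calls if rep[i] is not None]
        consensus.append(Counter(calls).most_common(1)[0][0] if calls else None)

    rates = []
    for rep in replicate_calls:
        compared = mismatches = 0
        for call, cons in zip(rep, consensus):
            if call is None or cons is None:
                continue  # no-calls do not count as errors
            compared += 1
            mismatches += call != cons
        rates.append(mismatches / compared if compared else 0.0)
    return rates
```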


The Affymetrix® Genotyping Console™ software (GTC) (v.4.2) was used for processing raw hybridization intensity data, clustering and genotype calling. Samples with a DQC value <0.82 and sample call rate <0.97 were excluded from further genotyping analysis.

The results from GTC were then post-processed using the SNPolisher R package (v.1.5.2) which classifies the SNPs into six major classes: PHR, mono high resolution, OTV, call rate below threshold, NMH and Other. A second run of SNPolisher further processed the PHR SNPs and identified the following eight sub-classes in addition to a new, more thoroughly defined PHR class: BBvarX, BBvarY, ABvarX, ABvarY, AAvarX, AAvarY, HomHomRes and UHET.

A quality prediction model was then added for the three most informative classes, i.e. PHR, UHET and NMH, to refine the selection of very robust SNPs. First, a set of about 1600 randomly selected SNPs drawn from these three classes was scored as 0 (poor) or 1 (good) by visual inspection of the quality of the discrimination between the three clusters of genotype calls (AA, AB, BB). Stringent criteria were adopted at this stage: for example, SNPs not showing a clear separation of clusters were discarded, as were SNPs with individuals placed in between two clusters, and those having genotype calls (generally AA or BB) too close to the OTV (null–null) cluster (i.e. showing no clear discontinuity between AA, or BB, and the obvious null–null cluster). The main metrics produced for each SNP by SNPolisher applied to the 1324 samples were then collected. These included Fisher’s linear discriminant (FLD), HomFLD (a version of FLD computed for the homozygous genotype clusters), heterozygous strength offset (HetSO), homozygote ratio offset (HomRO), MAF, the percentage of AA, AB, BB and NoCall individuals (percAA, etc.), and the variance of the AA, AB and BB clusters along the X and Y axes (AAvarX, AAvarY, ABvarX, ABvarY, BBvarX, BBvarY). All such values were fed to a logistic regression model implemented in R using a generalized linear model (glm) with the argument ‘family=binomial’, where a forward–backward stepwise process based on Akaike’s information criterion was used to determine the optimal inclusion of these parameters in the regression model. The best model identified (i.e. SNP ~ FLD + HetSO + HomRO + percAA + percAB + percBB + AAvarX + BBvarX + ABvarX + ABvarY) was checked for robustness by cross-validation and only the SNPs exhibiting a prediction value higher than 0.5 were retained.

The final list of very robust markers was obtained by applying some additional SNP filtering steps based on the presence of segregation errors in the two mapping populations, of Mendelian errors in allele transmission between Golden Delicious and its two doubled haploids or within other known parent–offspring situations and of discrepancies over technical and biological replicates.

Array genotype call assessment

Array genotype calls available for 42 out of the 63 accessions of the discovery panel were compared with those obtained through high-coverage resequencing. Genotyping data for the PHR SNPs were exported and compared with the calls from re-sequencing that had a support of at least eight reads.
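The comparison between array calls and resequencing calls can be sketched as a simple concordance computation. The dictionary layouts and the function name below are assumptions for illustration; only the minimum read support of eight comes from the text.

```python
def genotype_concordance(array_calls, seq_calls, min_reads=8):
    """Fraction of SNPs with identical array and resequencing genotypes.

    array_calls: {snp_id: genotype} from the array (hypothetical layout).
    seq_calls:   {snp_id: (genotype, read_depth)} from resequencing.
    Only SNPs called on both platforms with at least min_reads supporting
    reads are compared. Returns (concordance, number_of_compared_snps).
    """
    compared = matching = 0
    for snp_id, array_gt in array_calls.items():
        if array_gt is None or snp_id not in seq_calls:
            continue
        seq_gt, depth = seq_calls[snp_id]
        if seq_gt is None or depth < min_reads:
            continue
        compared += 1
        matching += array_gt == seq_gt
    concordance = matching / compared if compared else float("nan")
    return concordance, compared
```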

The genotype calls of 53 accessions genotyped with both the Axiom® Apple480K and the Illumina Infinium® apple 20K SNP arrays were also compared to assess the concordance of the two techniques. The Affymetrix data points to compare were limited to the PHR SNPs, while the corresponding Illumina genotypes were obtained from the mapped set of 15K SNPs described in Bianco et al. (2014).

Linkage maps construction

The genotype calls for robust polymorphic SNPs belonging to both the ab × aa and aa × ab segregation types from the two mapping populations were retained to build maternal and paternal maps following the double pseudo-test cross model (Grattapaglia and Sederoff, 1994). Parental genetic maps were built using JOINMAP software v.4.1 (Van Ooijen, 2006) with a ML mapping algorithm (Van Ooijen, 2011) for calculation of genetic distances with default parameters.
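As an illustration of how markers might be sorted into the two segregation types used for pseudo-test cross mapping, the sketch below classifies a SNP from its parental genotypes. The function name is hypothetical, and writing the mother first in the cross (so that the first type feeds the maternal map) is an assumption of this example.

```python
def segregation_type(mother_gt, father_gt):
    """Classify a SNP by parental genotypes for pseudo-test cross mapping.

    Genotypes are two-character strings such as 'AA', 'AB' or 'BB'.
    Returns 'abxaa' (heterozygous in the mother only, informative for the
    maternal map), 'aaxab' (heterozygous in the father only, informative
    for the paternal map) or None for any other configuration.
    """
    def is_het(genotype):
        return len(genotype) == 2 and genotype[0] != genotype[1]

    mother_het, father_het = is_het(mother_gt), is_het(father_gt)
    if mother_het and not father_het:
        return "abxaa"
    if father_het and not mother_het:
        return "aaxab"
    return None  # both heterozygous or both homozygous: not used here
```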

DATA AVAILABILITY

All python and R scripts are available upon request. Information regarding the 487 249 SNPs present in the Axiom® Apple480K array has been uploaded to the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/) and will be made publicly available from the next release of the database. Raw data of the 1324 accessions genotyped with the Apple480K array with encoded sample names are available upon request.

ACCESSION NUMBERS

Information regarding the 487 249 SNPs present in the Axiom® Apple480K array has been uploaded to the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/) and will be made publicly available from the next release of the database.

ACKNOWLEDGEMENTS

This work has been partly funded under the EU Seventh Framework Programme by FruitBreedomics project no. 265582: Integrated Approach for Increasing Breeding Efficiency in Fruit Tree Crop. The views expressed in this work are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission. We thank Matthew Ordidge (University of Reading, UK), Marc Lateur (CRA-W, Centre Wallon de Recherche Agronomique, Belgium), Frantizek Paprstein and Jiri Sedlak (RBIPH, Institute of Pomology Holovousy Ltd, Czech Republic), Hilde Nybom (SLU, Sveriges Lantbruksuniversitet, Sweden), Anna Pukinova and Nina Krasova (VNIIISPK, The All Russian Research Institute of Horticultural Breeding, Russia) and Ivan Suprun (NCRRIHV, North-Caucasian Regional Research Institute of Horticulture and Viticulture, Russia), Walter Guerra (Research Centre for Agriculture and Forestry, Italy) and Stefano Tartarini (University of Bologna, Italy) for providing leaf or DNA material. We thank Erika Stefani, Daniela Nicolini and Elisa Banchi for library preparation, Sylvain Gaillard for providing a formatted list of genome positions of apple expressed genes, Azeddine Si Ammour, Mirko Moser and Marco Moretto for providing the list of apple full transcripts, and Pietro Franceschi, Paolo Fontana and Federico Vaggi for helpful discussions on data analysis. A commercial organization, Affymetrix Inc., was involved in the development of the array and in the preparation of the manuscript.

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article.

Figure S1. Distribution of robust single nucleotide polymorphisms in the 17 apple chromosomes.


Table S1. Single nucleotide polymorphism information. The subset of 275K very robust markers is also reported.

Table S2. Genotype concordance for single nucleotide polymorphisms called with the Axiom® Apple480K and high-depth resequencing.

Table S3. List of single nucleotide polymorphisms with discordant genotypes.

Table S4. Summary of the genetic linkage maps.

Table S5. Genetic location of contigs.

Table S6. List of the institutions that provided leaf material for the single nucleotide polymorphism detection panel.

REFERENCES

Allard, A., Legave, J.M., Martinez, S., Kelner, J.J., Bink, M.C.A.M., Di Guardo, M., Di Pierro, E.A., Laurens, F., Van de Weg, W.E. and Costes, E. (2016) New insights on Chilling and Heat Requirement genetic determinisms in apple and putative candidate genes through a multi-family QTL discovery. J. Exp. Bot. In press.

Antanaviciute, L., Fernandez-Fernandez, F., Jansen, J., Banchi, E., Evans, K.M., Viola, R., Velasco, R., Dunwell, J.M., Troggio, M. and Sargent, D.J. (2012) Development of a dense SNP-based linkage map of an apple rootstock progeny using the Malus Infinium whole genome genotyping array. BMC Genom. 13, 203.

Bassil, N.V., Davis, T.M., Zhang, H. et al. (2015) Development and preliminary evaluation of a 90 K Axiom® SNP array for the allo-octoploid cultivated strawberry Fragaria × ananassa. BMC Genom. 16, 155.

Bianco, L., Cestaro, A., Sargent, D.J. et al. (2014) Development and validation of a 20K single nucleotide polymorphism (SNP) whole genome genotyping array for apple (Malus × domestica Borkh). PLoS ONE, 9, e110377.

Celton, J.M., Gaillard, S., Bruneau, M., Pelletier, S., Aubourg, S., Martin-Magniette, M.L., Navarro, L., Laurens, F. and Renou, J.P. (2014) Widespread anti-sense transcription in apple is correlated with siRNA production and indicates a large potential for transcriptional and/or post-transcriptional control. New Phytol. 203, 287–299.

Chagne, D., Crowhurst, R.N., Troggio, M. et al. (2012) Genome-wide SNP detection, validation, and development of an 8K SNP array for apple. PLoS ONE, 7, e31745.

Chain, P.S., Grafham, D.V., Fulton, R.S. et al. (2009) Genomics. Genome project standards in a new era of sequencing. Science, 326, 236–237.

Chen, H., Xie, W., He, H. et al. (2014) A high-density SNP genotyping array for rice biology and molecular breeding. Mol. Plant. 7, 541–553.

Clark, M.D., Schmitz, C.A., Rosyara, U.R., Luby, J.J. and Bradeen, J.M. (2014) A consensus ‘Honeycrisp’ apple (Malus × domestica) genetic linkage map from three full-sib progeny populations. Tree Genet. Genomes, 10, 627–639.

Eckert, A.J., Wegrzyn, J.L., Cumbie, W.P., Goldfarb, B., Huber, D.A., Tolstikov, V., Fiehn, O. and Neale, D.B. (2012) Association genetics of the loblolly pine (Pinus taeda, Pinaceae) metabolome. New Phytol. 193, 890–902.

Evans, L.M., Slavov, G.T., Rodgers-Melnick, E. et al. (2014) Population genomics of Populus trichocarpa identifies signatures of selection and adaptive trait associations. Nat. Genet. 46, 1089–1096.

Falginella, L., Cipriani, G., Monte, C., Gregori, R., Testolin, R., Velasco, R., Troggio, M. and Tartarini, S. (2015) A major QTL controlling apple skin russeting maps on the linkage group 12 of ‘Renetta Grigia di Torriana’. BMC Plant Biol. 15, 150.

Ganal, M.W., Wieseke, R., Luerssen, H., Durstewitz, G., Graner, E-M., Plieske, J. and Polley, A. (2014) High-throughput SNP profiling of genetic resources in crop plants using genotyping arrays. In Genomics of Plant Genetic Resources Volume 1. Managing, Sequencing and Mining Genetic Resources (Tuberosa, R., Graner, A. and Frison, E. eds). Dordrecht, Netherlands: Springer, pp. 113–130.

Gardner, K.M., Brown, P., Cooke, T.F., Cann, S., Costa, F., Bustamante, C., Velasco, R., Troggio, M. and Myles, S. (2014) Fast and cost-effective genetic mapping in apple using next-generation sequencing. G3, 4, 1681–1687.

Grattapaglia, D. and Sederoff, R. (1994) Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics, 137, 1121–1137.

Gross, B.L., Henk, A.D., Richards, C.M., Fazio, G. and Volk, G.M. (2014) Genetic diversity in Malus × domestica (Rosaceae) through time in response to domestication. Am. J. Bot. 101, 1770–1779.

Guan, Y., Peace, C., Rudell, D., Verma, S. and Evans, K. (2015) QTLs detected for individual sugars and soluble solids content in apple. Mol. Breed. 35, 135.

Gupta, P.K., Rustgi, S. and Mir, R.R. (2013) Array-based high-throughput DNA markers and genotyping platforms for cereal genetics and genomics. In Cereal Genomics II (Gupta, P.K. and Varshney, R.K., eds). Dordrecht, Netherlands: Springer, pp. 11–55.

Homer, N., Merriman, B. and Nelson, S.F. (2009) BFAST: an alignment tool for large scale genome resequencing. PLoS ONE, 4, e7767.

Hulse-Kemp, A.M., Lemm, J., Plieske, J. et al. (2015) Development of a 63K SNP array for cotton and high-density mapping of intraspecific and interspecific populations of Gossypium spp. G3, 5, 1187–1209.

Juniper, B.E. and Mabberley, D.J. (2006) The Story of the Apple. Portland, OR: Timber Press Inc.

Khan, M.A., Han, Y., Zhao, Y.F. and Korban, S.S. (2012) A high-throughput apple SNP genotyping platform using the GoldenGate (TM) assay. Gene, 494, 196–201.

Koning-Boucoiran, C.F.S., Esselink, G.D., Vukosavljev, M. et al. (2015) Using RNA-Seq to assemble a rose transcriptome with more than 13,000 full-length expressed genes and to develop the WagRhSNP 68k Axiom SNP array for rose (Rosa L.). Front. Plant Sci. 6, 249.

Kumar, S., Bink, M.C.A.M., Volz, R.K., Bus, V.G.M. and Chagne, D. (2012) Towards genomic selection in apple (Malus × domestica Borkh.) breeding programmes: prospects, challenges and strategies. Tree Genet. Genomes, 8, 1–14.

Kumar, S., Garrick, D.J., Bink, M.C.A.M., Whitworth, C., Chagne, D. and Volz, R.K. (2013) Novel genomic approaches unravel genetic architecture of complex traits in apple. BMC Genom. 14, 393.

Kumar, S., Raulier, P., Chagne, D. and Whitworth, C. (2014) Molecular-level and trait-level differentiation between the cultivated apple (Malus × domestica Borkh.) and its main progenitor Malus sieversii. Plant Genet. Res. 12, 330–340.

Laurens, F., Aranzana, M.J., Arus, P. et al. (2012) Review of fruit genetics and breeding programmes and a new European initiative to increase fruit breeding efficiency. Acta Hort. 929, 95–102.

Lee, Y.-G., Jeong, N., Kim, J.H. et al. (2015) Development, validation and genetic analysis of a large soybean SNP genotyping array. Plant J. 81, 625–636.

Leforestier, D., Ravon, E., Muranty, H., Cornille, A., Lemaire, C., Giraud, T., Durel, C.-E. and Branca, A. (2015) Genomic basis of the differences between cider and dessert apple varieties. Evol. Appl. 8, 650–661.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G. and Durbin, R.; 1000 Genome Project Data Processing Subgroup. (2009) The sequence alignment/map format and SAMtools. Bioinformatics, 25, 2078–2079.

Li, L., Long, Y., Zhang, L., Dalton-Morgan, J., Batley, J., Yu, L., Meng, J. and Li, M. (2015) Genome wide analysis of flowering time trait in multiple environments via high-throughput genotyping technique in Brassica napus L. PLoS ONE, 10, e0119425.

Micheletti, D., Troggio, M., Zharkikh, A., Costa, F., Malnoy, M., Velasco, R. and Salvi, S. (2011) Genetic diversity of the genus Malus and implications for linkage mapping with SNPs. Tree Genet. Genomes, 7, 857–868.

Perez-Enciso, M., Rincon, J.C. and Legarra, A. (2015) Sequence- vs. chip-assisted genomic selection: accurate biological information is advised. Genet. Sel. Evol. 47, 43.

Perrier, X. and Jacquemoud-Collet, J.P. (2006) DARwin Software. http://darwin.cirad.fr/darwin.

Purcell, S., Neale, B., Todd-Brown, K. et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575.


Saunders, I.W., Brohede, J. and Hannan, G.N. (2007) Estimating genotyping error rates from Mendelian errors in SNP array genotypes and their impact on inference. Genomics, 90, 291–296.

Stram, D.O. (2004) Tag SNP selection for association studies. Genet. Epidemiol. 27, 365–374.

Unterseer, S., Bauer, E., Haberer, G. et al. (2014) A powerful tool for genome analysis in maize: development and evaluation of the high density 600 k SNP genotyping array. BMC Genom. 15, 823.

Van Ooijen, J. (2006) Software for the Calculation of Genetic Linkage Maps in Experimental Populations. Wageningen: Kyazma BV.

Van Ooijen, J.W. (2011) Multipoint maximum likelihood mapping in a full-sib family of an outbreeding species. Genet. Res. 93, 343–349.

Velasco, R., Zharkikh, A., Affourtit, J. et al. (2010) The genome of the domesticated apple (Malus × domestica Borkh.). Nat. Genet. 42, 833–839.

Figure

Table 1 The SNP detection panel. The complete resequencing panel consisting of 63 apple cultivars plus two doubled haploids (X9273 and X9336)
Figure 1. The neighbor-joining tree of the 63 accessions.
Figure 2. The single nucleotide polymorphism (SNP) detection and selection workflow.
Figure 3. Distribution of the 487 249 single nucleotide polymorphisms (SNPs) in the eight SNPolisher classes for each SNP selection step.
