Advances in oil palm genomic selection
David CROS, Florence JACOB, Achille NYOUMA,
Billy TCHOUNKE, Dadang AFANDI, Indra SYAHPUTRA, Benoit COCHARD
1/8 - Principle of genomic selection
[Diagram: genomic selection workflow]
TRAINING POPULATION: phenotyping + high density genotyping
Prediction models: $y_i = \mu + \sum_{j=1}^{n} Z_{ij} \hat{m}_j + e_i$, …
VALIDATION POPULATION (related): high density genotyping, gives the validated model
APPLICATION POPULATION (selection candidates): high density genotyping, (pre-)selection on GEGV/GEBV
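The workflow on this slide (fit marker effects on the training population, check them on a validation population, then rank selection candidates) can be sketched as follows. This is only an illustration on simulated data: the ridge penalty `lam`, the population sizes and the simulated effects are assumptions, not values from the studies cited here, and accuracy is computed against the simulated true genetic values.

```python
import numpy as np

def fit_marker_effects(Z, y, lam):
    """Ridge estimate of marker effects m in y_i = mu + sum_j Z_ij m_j + e_i."""
    mu = y.mean()
    z_mean = Z.mean(axis=0)
    Zc = Z - z_mean                      # centre marker codes
    n_markers = Z.shape[1]
    m_hat = np.linalg.solve(Zc.T @ Zc + lam * np.eye(n_markers), Zc.T @ (y - mu))
    return mu, z_mean, m_hat

def predict(Z_new, mu, z_mean, m_hat):
    """Predicted values for genotyped individuals without phenotypes."""
    return mu + (Z_new - z_mean) @ m_hat

# Simulated data (hypothetical sizes and effects)
rng = np.random.default_rng(0)
n_train, n_valid, n_app, n_markers = 400, 100, 50, 200
n_total = n_train + n_valid + n_app
Z = rng.integers(0, 3, size=(n_total, n_markers)).astype(float)  # 0/1/2 codes
true_m = rng.normal(0.0, 0.1, size=n_markers)
g = Z @ true_m                                        # true genetic values
y = 10.0 + g + rng.normal(0.0, g.std(), size=n_total)  # phenotypes

train = slice(0, n_train)
valid = slice(n_train, n_train + n_valid)
app = slice(n_train + n_valid, n_total)

# 1) Training: phenotypes + genotypes
mu, z_mean, m_hat = fit_marker_effects(Z[train], y[train], lam=100.0)

# 2) Validation: correlation between predictions and (simulated) true values
pred_valid = predict(Z[valid], mu, z_mean, m_hat)
accuracy = np.corrcoef(pred_valid, g[valid])[0, 1]

# 3) Application: rank unphenotyped candidates and preselect the best ones
gebv = predict(Z[app], mu, z_mean, m_hat)
preselected = np.argsort(gebv)[::-1][:5]
print(f"validation accuracy: {accuracy:.2f}")
```

With real data the true genetic values are unknown, so accuracy is usually estimated against phenotypes or estimated breeding values of the validation individuals.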
2/8 - Information captured by markers
GS models can, depending on trait and population:
– predict the genetic value of unevaluated selection candidates, with prediction accuracies ranging from 0.14 to 0.73 for various yield components
– capture genetic differences within full-sib families and between families, enabling the selection of the best individuals of the best families
(Cros et al. 2015b, 2017; Kwong et al. 2017a, b)
3/8 - Molecular data
Initial empirical validations with SSRs, then high throughput SNP genotyping
• SSRs are suitable for validation, not for practical application
(Cros et al. 2015b; Marchal et al. 2016)
• High throughput SNP genotyping is required because:
- a large number of individuals must be genotyped
- high marker density maximizes GS accuracy (Kwong et al. 2017b)
• SNP genotyping:
- SNP arrays (Kwong et al. 2016, 2017a, b; Ithnin et al. 2017)
- genotyping by sequencing (GBS) (Cros et al. 2017; Nyouma et al. 2019b)
• Minimum marker density for GS is affected by marker type, marker sampling, trait and population, but a few thousand SNPs are enough
(Marchal et al. 2016; Kwong et al. 2017a; Cros et al. 2017; Nyouma et al. 2019b)
E.g., in empirical between-site hybrid cross value prediction with GBS:
(Cros et al. 2017, BMC Genomics)
• SNP filtering can reduce marker density while giving GS accuracies equal to or higher than with all the SNPs, e.g. by keeping:
- the GBS SNPs with the least missing data (Cros et al. 2017, BMC Genomics)
- the SNPs with the highest association scores (rrBLUP-B) (Kwong et al. 2017, Sci Rep)
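As an illustration of the first kind of filtering (keeping the SNPs with the least missing data), here is a minimal sketch. The function name, the 0/1/2 coding with `NaN` for missing calls and the threshold are assumptions chosen for the example, not the settings used in the cited studies.

```python
import numpy as np

def filter_snps_by_missing_rate(genotypes, max_missing):
    """Keep SNP columns whose share of missing calls is <= max_missing.

    genotypes: individuals x SNPs array, missing calls coded as NaN.
    Returns the filtered matrix and the boolean mask of kept SNPs.
    """
    missing_rate = np.isnan(genotypes).mean(axis=0)
    keep = missing_rate <= max_missing
    return genotypes[:, keep], keep

# Toy matrix: 4 individuals x 4 SNPs (hypothetical data)
genotypes = np.array([
    [0.0, 1.0, np.nan, 2.0],
    [1.0, np.nan, np.nan, 2.0],
    [2.0, 1.0, 0.0, np.nan],
    [0.0, 0.0, np.nan, 1.0],
])
filtered, keep = filter_snps_by_missing_rate(genotypes, max_missing=0.25)
print(keep)            # third SNP (75% missing) is dropped
print(filtered.shape)
```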
4/8 - Training and application populations
E.g., % oil in pulp in Deli oil palm (Cros et al. 2015, TAAG):
[Figure: GBLUP accuracy vs. maximum relationship between training and application individuals]
• GS accuracy increases with the relationship between training and application individuals (Cros et al. 2015b), so implementing GS in full-sibs or progenies of the training individuals will maximize GS efficiency
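One way to check how closely a set of candidates is related to the training population is to compute, for each application individual, its maximum relationship with any training individual. The sketch below uses a marker-based relationship matrix of the VanRaden (2008) type; this choice, the 0/1/2 coding and the toy data are assumptions for illustration and may differ from the relationship measure used in the cited study.

```python
import numpy as np

def genomic_relationship(M):
    """Marker-based relationship matrix from an individuals x SNPs 0/1/2 matrix."""
    p = M.mean(axis=0) / 2.0                 # allele frequencies
    W = M - 2.0 * p                          # centred marker codes
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def max_relationship_to_training(G, train_idx, app_idx):
    """For each application individual, its highest relationship with the training set."""
    return G[np.ix_(app_idx, train_idx)].max(axis=1)

# Toy data (hypothetical): 4 training and 2 application individuals
rng = np.random.default_rng(2)
M = rng.integers(0, 3, size=(6, 300))
M[4] = M[0]      # application individual 4 shares its genotype with training individual 0
G = genomic_relationship(M)
max_rel = max_relationship_to_training(G, train_idx=[0, 1, 2, 3], app_idx=[4, 5])
print(max_rel)   # first value (closely related candidate) is the larger one
```

Candidates with a low maximum relationship to the training set are the ones for which lower GS accuracy would be expected.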
5/8 - Statistical methods of predictions
A wide range of statistical methods has been applied to oil palm GS, and the choice of method did not significantly affect GS accuracy (although BayesB and SVM could be slightly better in some cases)
(Cros et al. 2015b; Kwong et al. 2017a, b; Ithnin et al. 2017)
GBLUP and RR-BLUP (widely used for GS predictions due to their simplicity and computational efficiency) are suitable for oil palm
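GBLUP and RR-BLUP are often presented together because, under matching assumptions, they give the same predictions. The sketch below checks this numerically on simulated data, assuming centred marker codes, an unscaled relationship matrix K = ZZ' and the same shrinkage parameter `lam` in both models; all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_markers, lam = 50, 200, 25.0

Z = rng.integers(0, 3, size=(n, n_markers)).astype(float)
Z -= Z.mean(axis=0)                 # centred marker codes
y = rng.normal(size=n)
y -= y.mean()                       # centred phenotypes

# RR-BLUP: estimate marker effects, then sum them per individual
m_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(n_markers), Z.T @ y)
g_rrblup = Z @ m_hat

# GBLUP: predict genetic values directly from the relationship matrix K = ZZ'
K = Z @ Z.T
g_gblup = K @ np.linalg.solve(K + lam * np.eye(n), y)

print(np.allclose(g_rrblup, g_gblup))
```

RR-BLUP solves a system of size markers x markers and GBLUP one of size individuals x individuals, which is one reason GBLUP is convenient when markers greatly outnumber individuals.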
6/8 - Data modeling
Prediction approach / Genotypes / Data records:
• Independent GS predictions in each parental group (genotypes = parents)
- data records = parent performances in hybrid crosses (GCAs, testcross phenotypic means): Cros et al. 2015b; Wong and Bernardo 2008
- data records = parent phenotypes: Ithnin et al. 2017; Kwong et al. 2017b
• Joint GS predictions in the 2 parental groups (data records = hybrid phenotypes)
- genotypes = hybrids, using parental origin of alleles: Cros et al. 2015a; Nyouma et al. 2019b
- genotypes = parents, using parental origin of alleles: Cros et al. 2015a, 2017, 2018; Marchal et al. 2016
- not using parental origin of alleles: Kwong et al. 2017a
Comparison of these different modeling approaches is lacking
7/8 - Annual genetic progress
accuracy (r) = key parameter to evaluate GS, as directly related to the annual genetic progress:
$\text{Annual response to selection} = \dfrac{i \times r_{\hat{a},a} \times \sigma_a}{L}$
…but comparing GS and conventional phenotypic selection must take into account r, i (selection intensity) and L (generation interval)
Simulations showed GS could largely increase annual selection
progress by decreasing the length of the breeding cycles and by increasing the selection intensity
(Wong and Bernardo 2008, Cros et al. 2015a, Cros et al. 2018)
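To see how a lower accuracy can be offset by a higher selection intensity and a shorter generation interval, the annual response i × r × σ_a / L can be evaluated for two schemes. All numbers below (candidates, accuracies, generation intervals) are hypothetical values chosen for illustration, not results from the cited simulations, and the selection intensity uses the standard large-population truncation-selection formula.

```python
from statistics import NormalDist

def selection_intensity(p):
    """Selection intensity i for a selected proportion p (truncation selection,
    normal distribution, large-population approximation)."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - p)          # truncation point
    return nd.pdf(z) / p

def annual_response(i, r, sigma_a, L):
    """Annual response to selection: i * r * sigma_a / L."""
    return i * r * sigma_a / L

sigma_a = 1.0                         # genetic standard deviation (arbitrary unit)

# Hypothetical conventional scheme: 12 selected out of 120, accurate but slow
ps = annual_response(selection_intensity(12 / 120), r=0.9, sigma_a=sigma_a, L=20)

# Hypothetical GS scheme: 12 selected out of 300, less accurate but faster
gs = annual_response(selection_intensity(12 / 300), r=0.6, sigma_a=sigma_a, L=12)

print(f"conventional: {ps:.3f} per year, GS: {gs:.3f} per year")
print(f"relative change: {100 * (gs / ps - 1):+.0f}%")
```

With these illustrative inputs the GS scheme gives the larger annual response despite its lower accuracy, which is the kind of trade-off the comparison between schemes has to account for.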
[Figure: simulated breeding schemes compared (Cros et al. 2015, BMC Genomics)]
RRS:
• 120 candidates / population / generation
• progeny tests every generation
RRGS2:
• Calibration:
- genotypes = progeny-tested individuals
- phenotypes = hybrid individuals
• 300 candidates
• progeny tests in 1 generation out of 2
RRGS1:
• Calibration:
- genotypes = same as RRGS2 + hybrids
- phenotypes = hybrid individuals
• 300 candidates
• progeny tests in 1 generation out of 4
Gains shown on the figure: +72% and +45%
Very promising simulation results…
but empirical studies showed GS can also have low accuracy (<0.2) and/or fail to capture within-family genetic differences in some traits / populations
(Cros et al. 2015b, 2017; Ithnin et al. 2017; Nyouma et al. 2019)
Field evaluations therefore remain necessary in all cycles
Current possible practical application = genomic preselection prior to field trials (progeny tests, clonal trials):
Current method for preselecting parents of hybrid crosses before progeny tests = phenotypic preselection on 1 or 2 traits with high h²
… but in an empirical dataset, FFB (fresh fruit bunch yield) in the hybrid crosses could have been more than 10% higher with genomic preselection in the parental populations prior to progeny tests, thanks to higher selection intensity (Cros et al. 2017, BMC Genomics)
New possible method = preselection on more yield components
Current method for preselecting hybrid individuals before clonal trials = phenotypic preselection on 1 or 2 traits with high H²
… but for most of the oil yield components, GS models estimate the individual genetic values better than conventional phenotyping (Nyouma et al., in prep)
[Figure: GS approaches vs. conventional phenotypic selection (PS)]
New possible methods = preselection on all yield components / early GS preselection
Oil palm GS simulations showed faster increase in inbreeding in the parental populations than conventional breeding (possibly detrimental for seed production and long term progress)
Hence better to combine GS and inbreeding management
Simulation results (Tchounke et al., in prep), scenarios compared:
- no inbreeding management
- max selected La Mé per full-sib family = 1
- mate selection
Mate selection reduces inbreeding in parents, but also increases
genetic progress (by optimizing matings between selected individuals)
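The simplest of the inbreeding-management options named here, a cap on the number of individuals selected per full-sib family, can be sketched as below. The function, the candidate names and the GEBV values are hypothetical; mate selection itself requires an optimization over possible matings and is not shown.

```python
def select_with_family_cap(candidates, n_select, max_per_family):
    """Select the best candidates by GEBV, with at most max_per_family per family.

    candidates: list of (individual_id, family_id, gebv) tuples.
    Returns the list of selected individual ids, best first.
    """
    selected = []
    per_family = {}
    for ind, fam, _gebv in sorted(candidates, key=lambda c: c[2], reverse=True):
        if len(selected) == n_select:
            break
        if per_family.get(fam, 0) < max_per_family:
            selected.append(ind)
            per_family[fam] = per_family.get(fam, 0) + 1
    return selected

# Hypothetical candidates from three full-sib families
candidates = [
    ("a1", "A", 3.0), ("a2", "A", 2.8),
    ("b1", "B", 2.5), ("b2", "B", 1.7),
    ("c1", "C", 1.9),
]
no_cap = select_with_family_cap(candidates, n_select=3, max_per_family=3)
capped = select_with_family_cap(candidates, n_select=3, max_per_family=1)
print(no_cap)   # two individuals come from family A
print(capped)   # one individual per family
```

The cap trades a little mean GEBV among the selected individuals for a broader family base.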
8/8 - Conclusions
GS has the potential to speed up oil palm genetic progress to an unprecedented level by:
- allowing better preselection before progeny tests and clonal trials
(more traits can be submitted to GS than to phenotypic preselection)
- allowing earlier preselection before progeny tests and clonal trials
(at the nursery stage)
- predicting the genetic value of (many) more selection candidates
- (in the future?) avoiding field trials in some cycles
Further studies will optimize oil palm GS:
- comparing the GS modeling approaches
- optimizing the design of training and application populations
- making predictions for new traits (e.g. resistance to diseases)
- modeling G × E interactions
- using multi-omics data
- quantifying variations of GS accuracy between families
- making inter-specific predictions
- comparing imputation methods (for GBS)
Etc.
For more information, see PIPOC article and Nyouma et al 2019 TGG (and 2020 articles!).
Pobè, Benin; Yaoundé, Cameroon; Montpellier, France; Indonesia