HAL Id: hal-02739431
https://hal.inrae.fr/hal-02739431
Submitted on 2 Jun 2020
Inferring the ancestral dynamics of population size from
genome wide molecular data - an ABC approach
Simon Boitard, Stanislas Sochacki
To cite this version:
Simon Boitard, Stanislas Sochacki. Inferring the ancestral dynamics of population size from genome
wide molecular data - an ABC approach. 10. World Congress on Genetics Applied to Livestock
Production (WCGALP), Aug 2014, Vancouver, Canada. American Society of Animal Science, 2014,
Proceedings 10th World Congress of Genetics Applied to Livestock Production. ⟨hal-02739431⟩
Simon Boitard (1,2), Stanislas Sochacki (1)
1: UMR 7205 ISYEB (EPHE - MNHN - CNRS - UPMC), Paris.
2: UMR 1313 GABI (INRA - AgroParisTech), Jouy-en-Josas.
WCGALP 2014
Motivation
Genome-wide sequence data contain rich information about population
size history, cf. PSMC (Li and Durbin, 2011).
Development of an ABC approach
Several estimation methods:
  Sequentially Markovian Coalescent: PSMC (Li and Durbin, 2011),
  diCal (Sheehan et al., 2013), MSMC (Schiffels and Durbin, 2014).
  Runs of homozygosity: MacLeod et al. (2013), Harris and Nielsen
  (2013).
So far limited to small sample sizes (n = 1 to ≈ 5 diploid individuals).
→ low accuracy for recent history estimation.
Approximate Bayesian Computation (ABC) could take advantage of
both genome-wide data and large sample sizes.
Outline
1 Methods
2 Simulation Results
3 Application to bovine NGS data
Methods
Principles of Approximate Bayesian Computation (ABC)
Model with (multi-dimensional) parameter θ, dataset D.
Approximate P(θ|D) by P(θ|S), for a set S of (meaningful!)
summary statistics.
Estimate P(θ|S) by intensive simulations:
1 Compute S = f(D).
2 For i from 1 to I:
  1 Sample a parameter value θ_i from the prior distribution of θ.
  2 Simulate a dataset D_i from the model with parameter value θ_i.
  3 Compute S_i = f(D_i).
  4 Keep θ_i if dist(S_i, S) < ε, for a small tolerance ε.
3 Estimate P(θ|S) from the selected θ_i values, by simple counting or
by regression approaches.
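The rejection steps above can be sketched in a few lines of Python. This is only an illustration on a toy model (estimating the mean of a normal distribution), not the coalescent model used in this work; the function name `abc_rejection`, the toy prior, the tolerance value and the number of simulations are all assumptions chosen for the example.

```python
import numpy as np

def abc_rejection(s_obs, sample_prior, simulate, summarize, distance,
                  n_sims, eps, rng):
    """Plain ABC rejection: keep theta_i whenever dist(S_i, S) < eps."""
    accepted = []
    for _ in range(n_sims):
        theta = sample_prior(rng)          # step 2.1: draw from the prior
        data = simulate(theta, rng)        # step 2.2: simulate a dataset
        s_sim = summarize(data)            # step 2.3: compute summaries
        if distance(s_sim, s_obs) < eps:   # step 2.4: accept or reject
            accepted.append(theta)
    return np.array(accepted)

# Toy setting (hypothetical): 50 draws from Normal(theta, 1),
# summarized by their sample mean, with a U(-5, 5) prior on theta.
rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=50)
s_obs = observed.mean()                    # step 1: S = f(D)

posterior = abc_rejection(
    s_obs,
    sample_prior=lambda g: g.uniform(-5.0, 5.0),
    simulate=lambda theta, g: g.normal(theta, 1.0, size=50),
    summarize=lambda d: d.mean(),
    distance=lambda a, b: abs(a - b),
    n_sims=20000,
    eps=0.1,
    rng=rng,
)

# Step 3, "simple counting": summarize the accepted values directly.
posterior_mean = posterior.mean()
```

The accepted values approximate draws from P(θ|S); a smaller ε gives a better approximation at the cost of fewer accepted simulations.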
Model
D = n diploid genomes.
Coalescent model with mutation and recombination.
One single panmictic population (no structure).
Piecewise constant effective population size, m fixed time windows.
Prior distributions
Per-generation, per-bp mutation rate: µ = 1e-8.
Per-generation, per-bp recombination rate: r ∼ U(0.1e-8, 1e-8).
Population size:
  log(N_0) ∼ U(1, 5).
  log(N_{i+1}) = log(N_i) + α, α ∼ U(−1, 1).
  1 ≤ log(N_i) ≤ 5.
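One way to draw a parameter set from these priors is sketched below. Two points are assumptions not stated on the slide: the logarithm is taken to be base 10, and the bounds 1 ≤ log(N_i) ≤ 5 are enforced by redrawing α until the new value falls inside the interval. The function names and the value of m used in the example are hypothetical.

```python
import numpy as np

def sample_log_sizes(m, rng, low=1.0, high=5.0, max_step=1.0):
    """Draw log(N_0), ..., log(N_{m-1}) for m time windows."""
    log_n = np.empty(m)
    log_n[0] = rng.uniform(low, high)            # log(N_0) ~ U(1, 5)
    for i in range(1, m):
        while True:
            # log(N_{i+1}) = log(N_i) + alpha, alpha ~ U(-1, 1)
            candidate = log_n[i - 1] + rng.uniform(-max_step, max_step)
            # Assumption: out-of-range proposals are simply redrawn.
            if low <= candidate <= high:
                break
        log_n[i] = candidate
    return log_n

def sample_prior(m, rng):
    """Return one draw of (mu, r, N) from the priors listed above."""
    return {
        "mu": 1e-8,                              # fixed mutation rate
        "r": rng.uniform(0.1e-8, 1e-8),          # recombination rate
        "N": 10.0 ** sample_log_sizes(m, rng),   # assumes base-10 logs
    }

rng = np.random.default_rng(1)
params = sample_prior(m=10, rng=rng)             # m = 10 is illustrative
```

The random-walk form of the prior links consecutive windows, so that population size changes by at most one log unit between adjacent windows.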
Summary statistics - Allele Frequency Spectrum (AFS)
Proportion of polymorphic sites over the genome.
Proportion of sites with i copies of the minor allele, for i from 1 to n.
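As an illustration, both statistics can be computed from a 0/1 haplotype matrix as sketched below. The input layout (2n rows, one column per site), the function name `folded_afs` and the choice to express the spectrum as proportions among polymorphic sites are assumptions made for this example.

```python
import numpy as np

def folded_afs(haplotypes, n_sites_total):
    """Folded allele frequency spectrum of n diploid genomes.

    haplotypes: integer array of shape (2n, n_observed_sites), alleles 0/1.
    n_sites_total: total number of sites (bp) in the genome considered.
    """
    n_hap = haplotypes.shape[0]
    n = n_hap // 2
    allele_count = haplotypes.sum(axis=0)
    # Minor allele count per site, between 0 (monomorphic) and n.
    minor = np.minimum(allele_count, n_hap - allele_count)
    # counts[i - 1] = number of sites with i copies of the minor allele.
    counts = np.bincount(minor, minlength=n + 1)[1:n + 1]
    n_poly = counts.sum()
    prop_polymorphic = n_poly / n_sites_total
    # Assumption: proportions are taken among polymorphic sites.
    afs = counts / n_poly if n_poly > 0 else np.zeros(n)
    return prop_polymorphic, afs

# Tiny example: n = 2 diploid genomes (4 haplotypes), 4 observed sites
# within a 10 bp region; the last site is monomorphic.
haps = np.array([
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])
prop_poly, afs = folded_afs(haps, n_sites_total=10)
# prop_poly is 0.3 and afs is [2/3, 1/3]
```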