• Aucun résultat trouvé

Inferring the population size history of 4 cattle breeds using NGS data and an approximate Bayesian computation approach

N/A
N/A
Protected

Academic year: 2021

Partager "Inferring the population size history of 4 cattle breeds using NGS data and an approximate Bayesian computation approach"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-01194066

https://hal.archives-ouvertes.fr/hal-01194066

Submitted on 5 Jun 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Inferring the population size history of 4 cattle breeds using NGS data and an approximate Bayesian

computation approach

Simon Boitard

To cite this version:

Simon Boitard. Inferring the population size history of 4 cattle breeds using NGS data and an approximate Bayesian computation approach. 23. Plant and Animal Genome (PAG), Jan 2015, San Diego, United States. �hal-01194066�

(2)

Inferring the population size history of 4 cattle breeds using NGS data and an approximate

Bayesian computation approach

Simon Boitard 1,2 (1) UMR 7205 Institut de Systématique Evolution et Biodiversité, Paris, France. (2) UMR 1313 Génétique Animale et Biologie Intégrative, Jouy en Josas, France.

S UMMARY

I present here a new Approximate Bayesian Computation (ABC) approach allowing to estimate the ancestral dynamics of effective size in a population from a sample of unphased diploid genomes from this population. This approach is based on summarizing the observed genomic sample using a set of statistics related to the Allele Frequency Spectrum (AFS) and the decay of Linkage Disequilibrium (LD) with physical distance. Using simulations, I show that this approach provides accurate estimations of population sizes from present time up to at least 20,000 generations before present. I then apply it to large genomic samples in 4 cattle breeds, which reveals interesting aspects of cattle history, from domestication to modern breed creation.

E STIMATION APPROACH

The history of effective size in a population is estimated using the following ABC [1] approach :

• The observed sample of n genomes is summarized using the following statistics : the overall proportion of polymorphic sites over the genome (1 statistic), the relative proportion of these polymorphic sites carrying i copies of the minor allele, for i from 1 to n/2 ( n /2 statistics), and the average zygotic r

2

for 18 bins of physical distance (from 300bp to 1.4Mbp) between SNP (18 statistics).

• 450,000 genomic samples of n diploid genomes are simulated under a coalescent model with mutation and recombination, assuming a piecewise constant population size with 21 fixed time windows. For each simu- lated sample, the recombination rate and the 21 population sizes are sampled from prior distributions, while the mutation rate is fixed to 10

−8

.

• Simulated samples are summarized using the same statistics as the observed sample, and the best 0.5% of population size histories (i.e. those providing statistics that are the most similar to the statistics of the observed sample) are selected. Population size history is finally estimated from these selected histories, using a neural network regression approach and the abc R package [2].

P REDICTION ERROR

0.00.20.40.60.81.0

generations before present (log scale)

population size prediction error

0 10 100 1,000 10,000 100,000

● ● ● ● ● ● ● ●

● ● ● ● ● ●

● ● ● ● ●

● ●

● ● ● ● ● ● ● ●

LD statistics

AFS statistics without the proportion of SNP AFS statistics

all statistics

Prediction errors computed from 2,000 simulated samples ( n = 25 ).

Random histories sampled from the prior result in an error of 1.

E STIMATED HISTORY FOR 5 DIFFERENT SIMULATED SCENARIOS

constant size

generations before present (log scale)

effective population size (log scale)

0 10 100 1,000 10,000 100,000

1001,00010,000100,000

decline

0 10 100 1,000 10,000 100,000

1001,00010,000100,000

true value

ABC estimation MSMC estimation

expansion

0 10 100 1,000 10,000 100,000

1001,00010,000100,000

zigzag

0 10 100 1,000 10,000 100,000

1001,00010,000100,000

For each scenario, 3 simulated samples ( n = 25 ) were analyzed with ABC or MSMC [3]. Reconstruction of recent history is more accurate with ABC.

E STIMATED HISTORY IN 4 CATTLE BREEDS

years before present (log scale)

effective population size (log scale)

0 10 100 1,000 10,000 100,000

1003001,0003,00010,00030,000100,000

domestication Holstein, n=25

Angus, n=25 Fleckvieh, n=25 Jersey, n=15

years before present (log scale)

effective population size (log scale)

0 1000 2000 3000 4000 5000 6000

0500010000150002000025000

Long term history (log scale) Short term history (natural scale)

A CKNOWLEDGMENTS

Cattle genomes were obtained from the 1000 bull genomes project (www.1000bullgenomes.com). Data analyzes were performed on the computer cluster of the Genotoul bioinformatics platform Toulouse Midi-Pyrénées. I would like to thank Willy Rodriguez, Flora Jay, Stefano Mona and Frédéric Austerlitz for their help on this study.

R EFERENCES

[1] Mark A. Beaumont, Wenyang Zhang, and David J. Balding. Approximate bayesian com- putation in population genetics. Genetics, 162(4):2025–2035, 2002.

[2] Katalin Csilléry, Olivier François, and Michael G. B. Blum. abc: an r package for approxi- mate bayesian computation (abc). Methods in Ecology and Evolution, 3(3):475–479, 2012.

[3] S. Schiffels and R. Durbin. Inferring human population size and separation history from

multiple genome sequences. Nature Genetics, 46:919–925, 2014.

Références

Documents relatifs

The standard way to model random genetic drift in population genetics is the Wright-Fisher model and, with a few exceptions, definitions of the effective

Simon Boitard, Willy Rodríguez, Flora Jay, Stefano Mona, Frédéric Austerlitz. Inferring Popula- tion Size History from Large Samples of Genome-Wide Molecular Data - An

Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Compu- tation Approach.. conférence Jaques Monod ”Coalescence des

Outside this window however, population size histo- ries estimated by MSMC often had a totally different trend than the true history, (see for instance the small constant size or

In the second case, no finite population size is assumed, but the gain from future patients is geometrically discounted, so that the gain from patient j if they receive treatment i is

Inferring the ancestral dynamics of population size from genome wide molecular data - an ABC approach..

Genome wide sequence data contains rich information about population size history, cf PSMC (Li and Durbin, 2011).... Development of an

In the framework of our approach, elicitating the prior comes down to determine the 2 J parameters of the Dirichlet distribution placed on the parameter of the saturated model,