
Joint inference of effective population size and genetic load from temporal population genomic data

Academic year: 2021


HAL Id: hal-03189306

https://hal.archives-ouvertes.fr/hal-03189306

Submitted on 3 Apr 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Joint inference of effective population size and genetic load from temporal population genomic data

Vitor Pavinato, Stéphane de Mita, Jean-Michel Marin, Miguel Navascués

To cite this version:

Vitor Pavinato, Stéphane de Mita, Jean-Michel Marin, Miguel Navascués. Joint inference of effective population size and genetic load from temporal population genomic data. Ancient Biomolecules of Plants, Animals, and Microbes 2021, Mar 2021, Virtual, United Kingdom. ⟨hal-03189306⟩


Joint inference of effective population size and genetic load from temporal population genomic data

Vitor A.C. Pavinato

Stéphane De Mita

Jean-Michel Marin

Miguel de Navascués

INRAE, Uppsala universitet, Université de Montpellier

miguel.navascues@inrae.fr

Common assumptions in population genetic inference

Most classical population genetic inference methods assume that genome-wide genetic diversity is shaped mostly by neutral processes (commonly referred to as “demography”), while selective processes affect only a few isolated loci. Thus, changes in allele frequencies through time at a large number of loci can be used to infer the amount of drift (i.e. the effective population size, Ne), and the presence of some selection is assumed to have a negligible impact on that estimate. Loci under selection can then be identified because they show allele frequency changes that are larger, or more often in the same direction, than expected under pure drift.

Figure: A naive view of population genetics. Simulation of allele frequency change through time in nine neutral loci (black) and one adaptive locus (red) due to drift (Ne = 500) and selection (s = 0.4). Axes: time (0 to 60) against allele frequency (0 to 1).
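Trajectories like those in the figure can be reproduced with a minimal Wright-Fisher sketch. The code below is an illustration rather than the authors' simulation: it assumes a diploid population with 2·Ne gene copies, genic selection (relative fitness 1 + s for the focal allele), and starting frequencies (0.3 for neutral loci, 0.05 for the adaptive locus) chosen arbitrarily. The function names are hypothetical.

```python
import random

def next_frequency(p, s, n_copies, rng):
    """One Wright-Fisher generation: selection shifts p, then drift resamples it."""
    # Deterministic change under genic selection (assumed fitness 1 + s).
    p_sel = p * (1.0 + s) / (1.0 + s * p)
    # Drift: binomial sampling of n_copies gene copies.
    count = sum(1 for _ in range(n_copies) if rng.random() < p_sel)
    return count / n_copies

def simulate_trajectory(p0, s, ne, generations, rng):
    """Allele frequency at generations 0..generations for one locus."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_frequency(trajectory[-1], s, 2 * ne, rng))
    return trajectory

rng = random.Random(1)
# Nine neutral loci and one adaptive locus, with Ne = 500 and s = 0.4 as in the figure.
# Starting frequencies are assumptions made for this illustration.
neutral = [simulate_trajectory(0.3, 0.0, 500, 60, rng) for _ in range(9)]
adaptive = simulate_trajectory(0.05, 0.4, 500, 60, rng)

print("adaptive locus, final frequency:", adaptive[-1])
print("neutral loci, final frequencies:", [round(t[-1], 2) for t in neutral])
```

With selection this strong, the adaptive locus moves steadily towards fixation while the neutral loci wander, which is the contrast that outlier-based methods rely on.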

Reality is more complex

In some cases these assumptions might not hold. If the action of selection is widespread along the genome or recurrent in time, the high proportion of loci affected by selection and hitch-hiking could significantly bias estimates of effective population size. Removing outlier loci would not solve this problem, because only loci with large effects are detected, while many other loci under weaker selection (e.g. selection on polygenic characters) could be present.

Figure: A proportion of neutral loci are affected by linked selection. Simulation of allele frequency change through time in nine neutral loci (black), one adaptive locus (red) and five hitch-hiking neutral loci (blue) due to drift (Ne = 500), selection (s = 0.4) and recombination (0.005 ≤ r ≤ 0.1). Axes: time (0 to 60) against allele frequency (0 to 1).

Figure: Loci under weak selection are difficult to distinguish from neutral loci. Simulation of allele frequency change through time in nine neutral loci (black) and nine adaptive loci (orange) due to drift (Ne = 500) and selection (0.03 < s < 0.13). Axes: time (0 to 60) against allele frequency (0 to 1).
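One way to see why weak selection is hard to detect is to compare the expected one-generation change due to selection with the standard deviation of the change due to drift. The sketch below assumes genic selection (relative fitness 1 + s) and a diploid population of effective size Ne; the function names are hypothetical and the comparison at frequency 0.5 is only an illustration.

```python
import math

def selection_step(p, s):
    """Expected one-generation frequency change under genic selection (assumed model)."""
    return s * p * (1.0 - p) / (1.0 + s * p)

def drift_sd(p, ne):
    """Standard deviation of the one-generation change caused by drift (diploid)."""
    return math.sqrt(p * (1.0 - p) / (2.0 * ne))

def selection_to_drift_ratio(p, s, ne):
    return selection_step(p, s) / drift_sd(p, ne)

# Selection coefficients taken from the figure captions, Ne = 500, frequency 0.5.
for s in (0.03, 0.13, 0.4):
    print(f"s = {s}: selection/drift ratio = {selection_to_drift_ratio(0.5, s, 500):.2f}")
```

Under these assumptions the ratio is below 1 for s = 0.03 and well above 1 for s = 0.4, so over a single generation the weakest selected loci change less than drift alone typically moves a neutral locus.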

Our proposal: joint inference using ABC

We propose to carry out population genetic inference with models that include both neutral and adaptive processes at the genome scale. This allows demographic and selection parameters to be estimated while taking their interaction into account. Because these models are difficult to address in a likelihood framework, we resort to approximate Bayesian computation via random forests (ABC-RF). ABC-RF uses simulations to generate a training data set from which the random forest (RF) learns, and the trained forest is then used to make inferences from real data.
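The simulate-then-learn logic can be sketched with the standard library alone. This is not ABC-RF: a nearest-neighbour rejection step stands in for the random forest, the simulator is a toy neutral model using a normal approximation to drift, and the prior range, number of loci, and all function names are assumptions made for illustration.

```python
import math
import random

def simulate_summary(log10_ne, n_loci, tau, rng):
    """Toy simulator: log10 of the mean standardized allele frequency change."""
    ne = 10.0 ** log10_ne
    # Fraction of heterozygosity lost to drift after tau generations.
    drift_var = 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** tau
    total = 0.0
    for _ in range(n_loci):
        p0 = rng.uniform(0.2, 0.8)
        # Normal approximation to drift (an assumption of this sketch).
        p1 = p0 + rng.gauss(0.0, math.sqrt(p0 * (1.0 - p0) * drift_var))
        p1 = min(max(p1, 0.0), 1.0)
        total += (p0 - p1) ** 2 / ((p0 + p1) / 2.0 - p0 * p1)
    return math.log10(total / n_loci)

def build_reference_table(n_sims, n_loci, tau, rng):
    """Draw parameters from the prior and simulate a summary statistic for each."""
    table = []
    for _ in range(n_sims):
        log10_ne = rng.uniform(1.5, 3.5)  # assumed prior on log10(Ne)
        table.append((log10_ne, simulate_summary(log10_ne, n_loci, tau, rng)))
    return table

def knn_estimate(table, observed_summary, k):
    """Average the parameters of the k simulations closest to the observation."""
    nearest = sorted(table, key=lambda row: abs(row[1] - observed_summary))[:k]
    return sum(param for param, _ in nearest) / k

rng = random.Random(42)
reference_table = build_reference_table(2000, 200, 10, rng)
true_log10_ne = 2.5
observed = simulate_summary(true_log10_ne, 200, 10, rng)  # pseudo-observed data
estimate = knn_estimate(reference_table, observed, 50)
print("true log10(Ne):", true_log10_ne, "estimate:", round(estimate, 2))
```

In ABC-RF the last step is replaced by a regression forest trained on the reference table, which can handle many summary statistics at once; the overall workflow of simulating from priors and learning the parameter from summaries is the same.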

Our model

We applied this approach to the case of a single population of size N. Advantageous mutations arise in the population at rate µb, with selection coefficients (s) drawn from a gamma distribution. We simulate this model with SLiM, and at each generation the effective population size (Ne) is calculated from the variance in reproductive success among individuals. We also calculate the genetic load (L) and the proportion of polymorphisms with Ns > 1 (P). Samples of individuals are taken at two different generations, separated by a period of time τ.
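The two per-generation quantities can be illustrated with textbook definitions. The sketch below is an assumption about how such values may be computed, not the authors' code: it uses the classical relation Ne = (4N − 2) / (Vk + 2) for a diploid population of constant size, where Vk is the variance in the number of gametes contributed per individual, and defines load as the relative fitness deficit of the population mean with respect to the fittest individual present. The function names are hypothetical.

```python
import statistics

def ne_from_reproductive_success(gametes_per_parent):
    """Ne from the variance in gamete contribution.

    Assumes a diploid population of constant size, i.e. a mean of two
    gametes contributed per individual.
    """
    n = len(gametes_per_parent)
    vk = statistics.pvariance(gametes_per_parent)
    return (4 * n - 2) / (vk + 2)

def genetic_load(fitnesses):
    """Load relative to the fittest individual present (assumed reference)."""
    w_max = max(fitnesses)
    return (w_max - statistics.fmean(fitnesses)) / w_max

# Equal contributions remove all variance in reproductive success.
print(ne_from_reproductive_success([2, 2, 2, 2]))
# Unequal contributions lower Ne for the same census size.
print(ne_from_reproductive_success([4, 0, 2, 2]))
print(genetic_load([1.0, 1.0, 0.5, 0.5]))
```

Selection increases the variance in reproductive success, so under this relation a population with many segregating advantageous mutations has a lower Ne than its census size would suggest.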

Results: Estimation of Ne

The performance of the method was assessed using the simulations from the training data. For each simulation, the Ne calculated during the simulation (true Ne) was compared to its estimate (the out-of-bag, OOB, prediction from ABC-RF). An estimate of Ne from temporal FST was also calculated, for comparison with a method that assumes all loci to be neutral.
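A neutral temporal estimator can be sketched as follows. The poster does not give the formula behind its FST-based estimate, so this example swaps in the classical moment estimator based on the standardized variance in allele frequency change (Fc), with a correction for sampling noise; it assumes diploid samples of s0 and s1 individuals taken tau generations apart, and the function names are hypothetical.

```python
def fc_statistic(freqs_t0, freqs_t1):
    """Mean standardized variance in allele frequency change across loci."""
    terms = []
    for x, y in zip(freqs_t0, freqs_t1):
        denom = (x + y) / 2.0 - x * y
        if denom > 0.0:  # skip loci fixed for the same allele in both samples
            terms.append((x - y) ** 2 / denom)
    if not terms:
        raise ValueError("no polymorphic loci to compare")
    return sum(terms) / len(terms)

def temporal_ne(freqs_t0, freqs_t1, tau, s0, s1):
    """Moment estimate of Ne, assuming every locus is neutral."""
    fc = fc_statistic(freqs_t0, freqs_t1)
    # Remove the part of Fc expected from sampling s0 and s1 diploid individuals.
    drift = fc - 1.0 / (2.0 * s0) - 1.0 / (2.0 * s1)
    if drift <= 0.0:
        return float("inf")  # observed change no larger than sampling noise
    return tau / (2.0 * drift)

# Hypothetical allele frequencies at three loci, sampled 10 generations apart.
print(temporal_ne([0.5, 0.2, 0.7], [0.6, 0.25, 0.6], 10, 100, 100))
```

Because every frequency change is attributed to drift, loci pushed by selection or hitch-hiking inflate Fc and pull the Ne estimate downwards.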

Figure: Performance of the Ne estimation. Left: Estimates from ABC-RF. Right: Estimates from temporal FST (assumes no selection). Panel annotations, in extracted order: MSE = 0.023, 0.032, 1.479, 0.299; R² = 0.659, 0.966, 0.972, 0.630.

Figure: Good Ne estimation despite pervasive selection. Error in the Ne estimate as a function of the prevalence of selection (measured as θb = 4Neµb). Estimates from FST show a dramatic increase in error when selection is frequent in the population, but ABC-RF estimates are only weakly affected. Axes: log10(θb) (−6 to 6) against local MSE of the log10(Ne) estimates (0 to 1); series: Ne from ABC-RF and Ne from temporal FST.

Results: Estimation of selection

Figure: Quantities measuring selection are harder to estimate. Left: Estimates of genetic load. Right: Estimates of proportion of polymorphism under selection. Panel annotations, in extracted order: MSE = 1.386, R² = 0.387; MSE = 1.596, R² = 0.784; MSE = 0.607, R² = 0.187; MSE = 0.116, R² = 0.516; MSE = 2.476, R² = 0.566.

Conclusions

- Including both neutral and adaptive processes in the model allows:
  - a good estimate of demography in the presence of pervasive selection;
  - estimates of quantities describing the action of selection, based exclusively on polymorphism data.
- Further work is needed to evaluate more complex models.

Acknowledgements

This project has received funding from the LabEx AGRO (convention ANR-10-LABX-0001-01), CEMEB (convention ANR-10-LABX-0004) and NUMEV (convention ANR-10-LABX-20) through the AAP Inter-LabEx (ABCSelection). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 791695 (TimeAdapt). S. De Mita was funded by INRAE (Projet Innovant EFPA).
