• Aucun résultat trouvé

Detecting past contraction in population size using haplotype homozygosity

N/A
N/A
Protected

Academic year: 2021

Partager "Detecting past contraction in population size using haplotype homozygosity"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-02932275

https://hal.inrae.fr/hal-02932275

Submitted on 7 Sep 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Detecting past contraction in population size using haplotype homozygosity

C Merle, Jean-Michel Marin, F. Rousset, Raphaël Leblois

To cite this version:

C Merle, Jean-Michel Marin, F. Rousset, Raphaël Leblois. Detecting past contraction in population size using haplotype homozygosity. Mathematical and Computational Evolutionnary Biology 2016, Jun 2016, Montpellier, France. �hal-02932275�

(2)

Detecting past contraction in population size using haplotype homozygosity

C. Merle1,2&3, J-M. Marin1&3, F. Rousset3&4 and R. Leblois2&3

1 - Institut de Montpeliérain Alexander Grothendieck (IMAG, UM); 2 - Centre de Biologie pour la Gestion des Populations (CBGP, INRA); 3 - Institut de biologie computationnelle (IBC); 4 - Institut des Sciences de l’Evolution (ISEM, CNRS)

Next generation genetic data

Classical inference methods of the demographic history (likelihood based methods (IS, MCMC), approximate bayesian methods and Site Frequency Spectrum), suitable for polymorphism data sets consisting in some loci and assuming the genealogies of different loci are independent, do not exploit the genetic recombination.

Take advantage of genome wide data with known genome

{ Consider the dependency of genealo-gies of adjacent positions in the genome.

Pairwise alignment

GT CATGAGCCGA GT

GA CATGAGCCGA

| {z }

TT

New inference approaches of demographic history based on the conserved sequence

lengths in the pairwise alignment within a diploid genome : [3], [5], [4], [2], [1].

The patterns of linkage disequilibrium (LD) between polymorphic markers are shaped by the ancestral population history as we can see on haplotype homozygosity curves which measure the LD.

0 2000 4000 6000 8000 10000 0.2 0.4 0.6 0.8 1.0

Constant population size

length HHth 500 1000 1500 2000 3000 4000 5000 7500 10000 0 2000 4000 6000 8000 10000 0.4 0.6 0.8 1.0

Contracting population size

length HHth (500,1500,5) (500,1500,10) (500,1500,20) (1000,1500,5) (1000,1500,10) (1000,1500,20) (2000,1500,5) (2000,1500,10) (2000,1500,20) 0 2000 4000 6000 8000 10000 0.2 0.4 0.6 0.8 1.0

Constant population size

length HHth 500 1000 1500 2000 3000 4000 5000 7500 10000 0 2000 4000 6000 8000 10000 0.4 0.6 0.8 1.0

Contracting population size

length HHth (500,1500,5) (500,1500,10) (500,1500,20) (1000,1500,5) (1000,1500,10) (1000,1500,20) (2000,1500,5) (2000,1500,10) (2000,1500,20)

M0: Constant population size θ0 = (Ne0) M1: Contracting population size θ1 = (Ne0, t1, f1)

Demographic history inference from genome wide sequence data

Haplotype Homozygosity: HH(i) : probability for i adjacent markers drown at random in the whole genome sequence to be homozygote.

Theoretical HH of [3], denoted HHth : coalescent

based computation, assuming the mutation and recom-bination rate known and constant along the genome.

Empirical HH, denoted HHd computed from the

observed data, is the segment proportion of at least i

homozygotes adjacent markers. 0 2000 6000 10000

0.85 0.90 0.95 1.00 Segment length HH 100 empirical HH Theoretical HH 0 2000 6000 10000 0.4 0.6 0.8 1.0 Segment length HH 100 empirical HH Theoretical HH 0 2000 6000 10000 0.75 0.85 0.95 Segment length HH 100 empirical HH Theoretical HH M0: θ0 = 500 M0: θ0 = 5000 M1: θ1 = (500, 1500, 10) We estimate θ0 and θ1 by : bθ0 ∈ arg min θ0 X i∈I       HHth0, i) − HHd(i) d HH(i)       2 bθ1 ∈ arg min θ1 X i∈I       HHth1, i) − HHd(i) d HH(i)       2

Evaluation of i 7→ HHth1, i) is time consuming { optimization with the blackbox R package.

Model choice criterion: embedded models

Penalized mean square criterion

arg min j∈{0,1} X i∈I wj(i)        HHth(bθj, i) − HHd(i) d HH(i)        2 w0(i) = bS (1) Ne0(i)/  b S (1)Ne 0(i) + bS (1) t1 (i) + bS (1) f1 (i) , w1(i) = 1. 0 5000 10000 15000 0.0 0.2 0.4 0.6 0.8 1.0 Penalisation weight Segment length S ^ N e0 S ^ N e0 + S ^ t1 + S ^ f1 b

S φ(1)(i) : estimate of Sobol’s sensitivity index of order one of a given parameter φ computed for i adjacent markers under the more complex model (sensitivity R package).

Avoid choosing the more complex model

0 5000 10000 15000 0.80 0.85 0.90 0.95 1.00

Jeu de données 9, Choix de modèle : modèle 1, péna modèle 0

Segment length

HH

Empirical HH, (500) Model 0 (500)

Model 1 (478,3525,6.4)

On this simulated data set with constant population size θ0 = 500, the non penalized mean square criterion would choose M1 whereas the penalized mean square criterion choose M0.

Detecting a contraction 0 5000 10000 15000 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00

Jeu de données 3, Choix de modèle : modèle 1, péna modèle 1

Segment length

HH

Empirical HH, (500,1500,10) Model 0 (896)

Model 1 (480,1310,10)

On this simulated data set with contracting population size θ1 = (500, 1500, 10), the penalized mean square criterion choose M1.

Numerical results

Data sets simulated under M0, θ0 = Ne0 = 500

npos 5000 10000 Locus 2000 200000 2000 200000 Data sets 100 1 100 1 Adjustment 0.74 1 0.59 0 Penalized Adj. 0.94 1 0.98 1 b θ0 Ned0 494 494 502 501 [478 − 510] . [496 − 509] −

Data sets simulated under M0, θ0 = Ne0 = 5000

npos 5000 10000 Locus 2000 200000 2000 200000 Data sets 100 1 100 1 Adjustment 0.41 0 0.54 0 Penalized Adj. 0.86 1 0.79 0 b θ0 Ned0 4998 5044 4915 4880 [4960 − 5036] − [4881 − 4949]

Data sets simulated under M1, θ1 = (Ne0, t1, f1) = (500, 1500, 10)

npos 10000 15000 20000 Locus 2000 200000 2000 200000 2000 200000 Data sets 100 1 100 1 100 1 Penalized Adj. 0.81 1 0.92 1 0.95 1 b θ1 d Ne0 [617 − 650]634 633 [547 − 576]561 529 [520 − 548]534 492 bt1 2573 2445 2031 1592 1881 1446 [2402 − 2744] − [1918 − 2144][1727 − 2036] − bf1 12.4 11.5 12.3 9.1 12.2 9.49 [11.8 − 13.0] − [11.7 − 12.8][11.6 − 12.7]

Holstein data sets

npos 10000 Locus 2000 200000 Data sets 100 1 CDR Penalized Adj. 0.93 1 b θ1 d Ne0 6690 6924 [6533 − 6847] − bt1 5965 6532 [5559 − 6372] − bf1 4.23 4.0 [3.90 − 4.57] −

References

[1] Sharon R Browning and Brian L Browning. Accurate non-parametric estimation of recent effective population size from segments of identity by descent. The American Journal of Human Genetics, 97(3):404–418, 2015.

[2] Kelley Harris and Rasmus Nielsen. Inferring demographic history from a spectrum of shared haplotype lengths. PLoS genetics, 9(6):e1003521, 2013.

[3] I. MacLeod, T. Meuwissen, B. Hayes, and M. Goddard. A novel predictor of multilocus haplotype homozygosity: comparison with existing predictors. Genetics research, 91(6):413–426, 2009.

[4] I. MacLeod, D. Larkin, H. Lewin, B. Hayes, and M. Goddard. Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors. Molecular Biology and Evolution, 30(9):2209–2223, 2013. [5] Pier Francesco Palamara, Todd Lencz, Ariel Darvasi, and Itsik Peer. Length distributions of identity by descent reveal fine-scale demographic history. The American Journal of Human Genetics, 91(5):809–822, 2012.

Références

Documents relatifs

Si certains travaux ont abordé le sujet des dermatoses à l’officine sur un plan théorique, aucun n’a concerné, à notre connaissance, les demandes d’avis

Using the Fo¨rster formulation of FRET and combining the PM3 calculations of the dipole moments of the aromatic portions of the chromophores, docking process via BiGGER software,

Later in the clinical course, measurement of newly established indicators of erythropoiesis disclosed in 2 patients with persistent anemia inappropriately low

Households’ livelihood and adaptive capacity in peri- urban interfaces : A case study on the city of Khon Kaen, Thailand.. Architecture,

3 Assez logiquement, cette double caractéristique se retrouve également chez la plupart des hommes peuplant la maison d’arrêt étudiée. 111-113) qu’est la surreprésentation

Et si, d’un côté, nous avons chez Carrier des contes plus construits, plus « littéraires », tendant vers la nouvelle (le titre le dit : jolis deuils. L’auteur fait d’une

la RCP n’est pas forcément utilisée comme elle devrait l’être, c'est-à-dire un lieu de coordination et de concertation mais elle peut être utilisée par certains comme un lieu

The change of sound attenuation at the separation line between the two zones, to- gether with the sound attenuation slopes, are equally well predicted by the room-acoustic diffusion