
Reconstructing haplotypes from genotypes in multiparental populations using Artificial Neural Networks




HAL Id: hal-02151672

https://hal.archives-ouvertes.fr/hal-02151672

Submitted on 12 Jun 2019



Jérémy Andréoletti, Luke Noble, Henrique Teotónio

To cite this version:

Jérémy Andréoletti, Luke Noble, Henrique Teotónio. Reconstructing haplotypes from genotypes in multiparental populations using Artificial Neural Networks. Poster session, Immersion Expérimentale, May 2019, Paris, France. 2019. ⟨hal-02151672⟩

Aim and methods

• Currently, the best haplotype imputation algorithms are based on Hidden Markov Models (HMM) [4]

• In recent years, geneticists have begun to harness the power of Convolutional Neural Networks for population genetic inference [5]

• Our goal is to explore the potential and robustness of Neural Networks for haplotype reconstruction, and secondarily for predicting the number of recombination events

• We worked on simulations before applying our model to the C. elegans Multiparental Experimental Evolution (CeMEE) dataset [6], using the Single Nucleotide Polymorphisms (SNPs) of chromosome I as a case study

Jérémy Andréoletti, Luke Noble and Henrique Teotónio, Team Experimental Evolutionary Genetics

Previous methods (HMMs) require the non-trivial computation of statistical indicators, whereas neural networks (NNs) only need the genotypes
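The following minimal textbook Viterbi decoder for founder reconstruction makes the contrast with HMMs concrete. It is a simplified illustration under our own assumptions, not the model of [4]. The hidden states are the founders, which are assumed homozygous with alleles coded 0/1. The founder switches between consecutive SNPs with the probability given by the genetic map. Each observed allele matches its founder's allele with probability `1 - error`. The function name `viterbi_haplotype` and the `error` rate are hypothetical. Even this toy version must be supplied with transition and emission probabilities, which is presumably the kind of statistical machinery the comparison refers to.

```python
import math


def viterbi_haplotype(descendant, founders, rec_prob, error=0.01):
    """Most likely founder at each SNP for one descendant (toy HMM, hypothetical).

    descendant: n alleles (0/1); founders: F >= 2 lists of n alleles;
    rec_prob: n - 1 switch probabilities between consecutive SNPs.
    """
    n_f, n = len(founders), len(descendant)

    def log_emit(f, i):
        # the observed allele matches the founder's allele with probability 1 - error
        return math.log(1 - error if founders[f][i] == descendant[i] else error)

    # uniform prior over founders at the first SNP
    score = [math.log(1 / n_f) + log_emit(f, 0) for f in range(n_f)]
    back = []  # back[i - 1][f]: best previous founder when at founder f on SNP i
    for i in range(1, n):
        r = min(max(rec_prob[i - 1], 1e-12), 1 - 1e-12)  # keep log() finite
        stay, move = math.log(1 - r), math.log(r / (n_f - 1))
        new_score, pointers = [], []
        for f in range(n_f):
            cand = [score[g] + (stay if g == f else move) for g in range(n_f)]
            best = max(range(n_f), key=cand.__getitem__)
            pointers.append(best)
            new_score.append(cand[best] + log_emit(f, i))
        score = new_score
        back.append(pointers)
    # trace the best path backwards from the last SNP
    state = max(range(n_f), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


founders = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
descendant = [0, 0, 0, 1, 1, 1]
print(viterbi_haplotype(descendant, founders, [0.1] * 5))  # [0, 0, 0, 1, 1, 1]
```

The decoder works in log space to avoid numerical underflow on long chromosomes, and its run time grows as n × F² for n SNPs and F founders.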

Training data: thousands of genotypes with their associated haplotypes, obtained by recombining simulated or real founders
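The following minimal sketch shows how one training example, a genotype plus its haplotype label, could be produced by recombining founders. It assumes homozygous founders with alleles coded 0/1 and treats the genetic map as a per-interval switch probability. It also builds the mosaic in a single pass, which ignores the multi-generation crossing design of [6]. The function `simulate_descendant` is hypothetical, not the authors' simulator.

```python
import random


def simulate_descendant(founders, rec_prob, rng):
    """Simulate one recombinant descendant as a mosaic of founder segments.

    founders: F founder sequences, each a list of 0/1 alleles at the same n SNPs.
    rec_prob: n - 1 probabilities of a switch between consecutive SNPs.
    Returns (genotype, haplotype): the alleles carried by the descendant and,
    for each SNP, the index of the founder it descends from (the training label).
    """
    n_founders, n_snps = len(founders), len(founders[0])
    if len(rec_prob) != n_snps - 1:
        raise ValueError("need one switch probability per interval between SNPs")
    current = rng.randrange(n_founders)  # founder carried at the first SNP
    haplotype = [current]
    for r in rec_prob:
        if rng.random() < r:
            # crossover: continue with a different founder, chosen uniformly
            current = rng.choice([f for f in range(n_founders) if f != current])
        haplotype.append(current)
    genotype = [founders[h][i] for i, h in enumerate(haplotype)]
    return genotype, haplotype


rng = random.Random(1)
founders = [[rng.randint(0, 1) for _ in range(50)] for _ in range(16)]
genotype, haplotype = simulate_descendant(founders, [0.02] * 49, rng)
```

Calling the function thousands of times, with fresh random draws, gives a training set of the size described.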

Input: a matrix built by alternating one descendant with each of the 16 founders, plus the genetic map (= probability of recombination between markers)
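The poster does not spell out the exact matrix layout. The sketch below encodes one reading of "alternating one descendant with the 16 founders", as an assumption: a copy of the descendant's genotype precedes each founder's genotype, and the genetic map forms a final row. Because the map has one value per interval between SNPs, it is left-padded with 0.0 so that every row has one column per SNP. The function name and the padding are hypothetical choices.

```python
def build_input_matrix(descendant, founders, rec_prob):
    """Rows: descendant, founder 1, descendant, founder 2, ..., then the genetic map.

    Hypothetical layout with 2 * F + 1 rows and one column per SNP. The map has
    n - 1 interval probabilities, so it is left-padded with 0.0 to length n.
    """
    n_snps = len(descendant)
    if any(len(f) != n_snps for f in founders):
        raise ValueError("all founders must be genotyped at the same SNPs")
    if len(rec_prob) != n_snps - 1:
        raise ValueError("need one recombination probability per interval")
    rows = []
    for founder in founders:
        rows.append([float(a) for a in descendant])  # repeat the descendant...
        rows.append([float(a) for a in founder])     # ...next to each founder
    rows.append([0.0] + [float(r) for r in rec_prob])
    return rows


descendant = [0, 1, 1, 0]
founders = [[0, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
matrix = build_input_matrix(descendant, founders, [0.1, 0.05, 0.2])
print(len(matrix), len(matrix[0]))  # 7 4
```

With 16 founders, this layout yields 33 rows per descendant. One plausible benefit of the interleaving is that each descendant/founder pair sits in adjacent rows, where a small convolution window can compare them directly. This rationale is our guess and is not stated in the poster.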

Artificial Neural Networks

• Types of neural networks employed:

1. CNN (Convolutional Neural Network) ⇢ suited to analysing an image piece by piece; reduces the dimensionality

2. LSTM (Long Short-Term Memory) ⇢ suited to sequence analysis; returns sequences and refines the results of the CNNs

3. Dense neural network ⇢ classical fully connected network, more efficient on a reduced input; returns a multi-category classification, i.e. the haplotype
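The sketch below gives toy, pure-Python versions of two ingredients from this list. The first is a single strided convolutional filter, which shows how a CNN layer can shorten the SNP axis. The second is the softmax step that turns a dense layer's raw output into a multi-category prediction. Both are illustrative assumptions: the names `conv1d` and `decode_haplotype` are hypothetical, there are no learned weights or pooling, and the LSTM is omitted. A real model would be built and trained with a deep-learning library. The poster also does not say whether the output has one position per SNP or per CNN window.

```python
import math


def conv1d(rows, kernel, stride):
    """Slide one convolutional filter along the SNP axis (no bias, no activation).

    rows: C input rows of length n; kernel: C rows of w weights.
    Returns (n - w) // stride + 1 values, so any stride > 1 shortens the sequence.
    """
    n, w = len(rows[0]), len(kernel[0])
    return [
        sum(kernel[c][k] * rows[c][start + k] for c in range(len(rows)) for k in range(w))
        for start in range(0, n - w + 1, stride)
    ]


def decode_haplotype(scores):
    """Map raw output scores (one list of F values per position) to the most
    probable founder at each position and its softmax probability."""
    haplotype, confidence = [], []
    for s in scores:
        m = max(s)
        exps = [math.exp(x - m) for x in s]  # subtract the max for numerical stability
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        haplotype.append(best)
        confidence.append(probs[best])
    return haplotype, confidence


print(conv1d([[1, 0, 1, 1, 0, 0], [0, 0, 1, 0, 1, 1]], [[1, 1], [1, 1]], stride=2))  # [1, 3, 2]
print(decode_haplotype([[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]])[0])  # [0, 1]
```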


References:

[1] Crow & Kimura. An Introduction to Population Genetics Theory, 1970 (Harper & Row).

[2] Wakeley. Coalescent Theory: An Introduction, 2009 (Roberts & Company Publishers).

[3] Rakshit S., Rakshit A. & Patil. Journal of Genetics, 2012 (vol. 91, 111).

[4] Zheng, Boer & van Eeuwijk. Genetics, 2015 (vol. 200, no. 4, 1073–1087) and 2018 (vol. 210, no. 1, 71–82).

[5] Flagel, Brandvain & Schrider. Molecular Biology and Evolution, 2019 (vol. 36, 220–238).

[6] Noble, Chelo et al. Genetics, 2017 (vol. 207, no. 4, 1663–1685).

Conclusion and prospects

• Neural networks are efficient at predicting haplotypes from genotypes, but the upcoming comparison with HMMs will be crucial to assess their value

• Within a limited range, our model is robust to variation in the mean or variance of genotype similarity between founders, so it is likely to give accurate predictions on other datasets

• A “scalable” model that requires less memory is also being developed

• Future models should be trainable without the founders, and could be optimised with a meta-modelling approach to further improve their accuracy

Results

Figure: predicted haplotypes (x-axis: SNPs along the chromosome; y-axis: 16 founders)

• Simulations (random founders): 87% accuracy after 100 epochs (one epoch = one full pass over the training data)

• Real data (similar founders): 43% accuracy after 100 epochs (48% if related founders are merged)
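The poster does not define its accuracy measure. The sketch below makes two assumptions. Accuracy is taken to be the fraction of SNPs whose predicted founder equals the true founder. "Merging related founders" is taken to mean that predicting any founder of a merged group counts as correct. The function `haplotype_accuracy` and its `merge` argument are hypothetical.

```python
def haplotype_accuracy(predicted, true, merge=None):
    """Fraction of SNPs at which the predicted founder equals the true founder.

    merge: optional dict mapping a founder index to a representative founder
    index, so that related founders are counted as interchangeable.
    """
    if len(predicted) != len(true) or not true:
        raise ValueError("haplotypes must be non-empty and of equal length")
    merge = merge or {}
    hits = sum(merge.get(p, p) == merge.get(t, t) for p, t in zip(predicted, true))
    return hits / len(true)


predicted = [0, 0, 1, 5, 5]
true = [0, 0, 1, 4, 4]
print(haplotype_accuracy(predicted, true))                # 0.6
print(haplotype_accuracy(predicted, true, merge={5: 4}))  # 1.0
```

Under this definition, merging can only raise the score, which is consistent with the reported increase from 43% to 48%.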

Figure: accuracies for varying genotype similarities between founders

Figure: comparison between a predicted haplotype (1) and the real one (2) (x-axis: SNPs along the chromosome)
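The poster does not define genotype similarity between founders either. A simple choice, assumed here, is the fraction of SNPs at which two founders carry the same allele. The mean and variance of this quantity over all founder pairs are then the two quantities that the robustness tests vary. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance


def similarity(a, b):
    """Fraction of SNPs at which two founders carry the same allele."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_summary(founders):
    """Mean and (population) variance of all pairwise founder similarities."""
    sims = [similarity(a, b) for a, b in combinations(founders, 2)]
    return mean(sims), pvariance(sims)


founders = [[0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 0]]
print(similarity_summary(founders))  # pairwise similarities are 0.75, 0.0 and 0.25
```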

Introduction

• A major goal in population genetics is to predict the genetic history of contemporary populations from sequence data [1] [2]

• In experimental and agricultural genetics there are many cases where multiple founders (of known genotypes) are combined to produce recombinant progeny [3]

• For each descendant, reconstructing its haplotype means finding which regions along the genome descend from which founder

• Limiting factors for haplotype reconstruction are the number of founders, relatedness between them, and the number of generations to the focal descendants
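In this sense, a haplotype can be stored as one founder index per SNP. The sketch below assumes that representation and uses hypothetical SNP `positions`. It collapses the per-SNP labels into contiguous blocks, each recording which region descends from which founder.

```python
def haplotype_segments(haplotype, positions):
    """Collapse one founder index per SNP into (start, end, founder) blocks."""
    segments = []
    start = 0
    for i in range(1, len(haplotype) + 1):
        # close the current block at the end or when the founder changes
        if i == len(haplotype) or haplotype[i] != haplotype[start]:
            segments.append((positions[start], positions[i - 1], haplotype[start]))
            start = i
    return segments


print(haplotype_segments([0, 0, 5, 5, 5, 2], [10, 20, 30, 40, 50, 60]))
# [(10, 20, 0), (30, 50, 5), (60, 60, 2)]
```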

Figure: simulation of haplotypes and genotypes, mimicking the experimental design [6]

Figure: architecture of our most accurate neural network model for haplotype reconstruction (A = input formatting, B = dimensionality reduction, C = core analysis, D = prediction checking)

Figure: prediction of the number of recombination events
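The poster does not say how the number of recombination events is predicted; it may be a dedicated network output. A simple baseline, assumed here, reads the number off a reconstructed haplotype as the count of founder switches between consecutive SNPs. This count is only a lower bound. A crossover between two segments from the same founder leaves no trace, and neither does a second crossover between the same pair of SNPs. The function name and the `positions` are hypothetical.

```python
def recombination_breakpoints(haplotype, positions):
    """Intervals (left SNP position, right SNP position) where the founder changes."""
    return [
        (positions[i], positions[i + 1])
        for i in range(len(haplotype) - 1)
        if haplotype[i] != haplotype[i + 1]
    ]


haplotype = [3, 3, 3, 7, 7, 3]
positions = [100, 250, 400, 900, 1200, 1500]
breaks = recombination_breakpoints(haplotype, positions)
print(len(breaks), breaks)  # 2 [(400, 900), (1200, 1500)]
```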
