
Improvement of the assembly of heterozygous genomes of non-model organisms


HAL Id: hal-01231793

https://hal.inria.fr/hal-01231793

Submitted on 20 Nov 2015



Anaïs Gouin, Anthony Bretaudeau, Emmanuelle d’Alençon, Claire Lemaitre, Fabrice Legeai

To cite this version:

Anaïs Gouin, Anthony Bretaudeau, Emmanuelle d’Alençon, Claire Lemaitre, Fabrice Legeai. Improvement of the assembly of heterozygous genomes of non-model organisms. Genome Informatics, Oct 2015, Cold Spring Harbor Laboratory, United States. 2015. ⟨hal-01231793⟩


Anaïs GOUIN (1), Anthony BRETAUDEAU (2), Emmanuelle d'Alençon (3), Claire LEMAITRE and Fabrice LEGEAI (2)

(1) INRIA/IRISA/GenScale, Campus de Beaulieu, 35042 Rennes cedex, France
(2) INRA, Institut de Génétique, Environnement et Protection des Plantes (IGEPP), Domaine de la Motte, 35653 Le Rheu
(3) INRA DGIMI, Université de Montpellier 1, 34000 Montpellier

Motivation: Some heterozygous regions diverge significantly between the two haplotypes, and the assembly process can then build two different contigs instead of one consensus sequence.

Objective: Set up a strategy to detect and correct false duplications in already-built assemblies.


METHOD

[Figure: read depth of scaffold_a, scaffold_b and superscaffold_c against the expected read depth, marking potential erroneous duplications. Histogram: number of scaffolds by coverage, with the expected coverage and the potential duplications.]

Inputs:

- Fasta file of the assembly (TEs masked)
- BAM file (reads mapped onto the assembly)
- GFF and Fasta file of annotated proteins

Workflow:

- Fast self whole genome alignment
- Re-alignment of selected pairs of scaffolds
- Identification of mis-assemblies
- Genome correction
- Re-annotation of lost genes

Outputs:

- Fasta file of the corrected assembly
- New GFF of gene annotations

Fast self whole genome alignment: pre-selection of pairs of “similar” scaffolds with at least one hit with:

- e-value ≤ 1e-100
- hit length ≥ 1 kb (or 80% of smallest scaffold)

Re-alignment of selected pairs of scaffolds:

- alignments + chaining of hits to get longer alignments
- filtering small chains: ≤ 1 kb (or 80% of smallest scaffold)

Identification of mis-assemblies, based on 3 main criteria:

- topology: “included” or “border”
- read depth
- uniqueness: filtering duplications by checking uniqueness of matches
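The pre-selection step can be illustrated with a minimal Python sketch. It only applies the two thresholds quoted on the poster (e-value ≤ 1e-100, hit length ≥ 1 kb or 80% of the smallest scaffold). The `Hit` record, the function names and the reading of "or 80% of smallest scaffold" as an alternative length test are assumptions made for illustration; they are not the authors' implementation, and the sketch does not parse the output of any real aligner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one self-alignment hit between two scaffolds."""
    query: str
    subject: str
    evalue: float
    length: int  # aligned length in bp


def passes_preselection(hit, scaffold_lengths,
                        max_evalue=1e-100, min_len=1000, min_frac=0.8):
    """Apply the poster's thresholds to a single hit.

    Assumption: a hit is long enough if it reaches 1 kb OR covers at
    least 80% of the smaller scaffold of the pair.
    """
    if hit.query == hit.subject:
        return False  # a scaffold aligned to itself is not a candidate pair
    smallest = min(scaffold_lengths[hit.query], scaffold_lengths[hit.subject])
    long_enough = hit.length >= min_len or hit.length >= min_frac * smallest
    return hit.evalue <= max_evalue and long_enough


def preselect_pairs(hits, scaffold_lengths):
    """Return the unordered scaffold pairs supported by at least one good hit."""
    pairs = set()
    for hit in hits:
        if passes_preselection(hit, scaffold_lengths):
            pairs.add(tuple(sorted((hit.query, hit.subject))))
    return pairs


# Toy data, invented for the example.
lengths = {"a": 50_000, "b": 1_000, "c": 20_000}
hits = [
    Hit("a", "c", 1e-120, 5_000),  # passes: significant and longer than 1 kb
    Hit("a", "b", 1e-150, 850),    # passes: 850 bp is at least 80% of scaffold b
    Hit("b", "c", 1e-130, 500),    # fails: too short on both length tests
    Hit("c", "a", 1e-50, 8_000),   # fails: e-value above the threshold
    Hit("a", "a", 0.0, 50_000),    # ignored: self hit
]
selected = preselect_pairs(hits, lengths)
```

Storing each pair as a sorted tuple means that hits reported in both directions (a against c, c against a) count as a single candidate pair for the re-alignment step.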

[Figure: thresholds used to classify a pair of scaffolds. dist1 ≤ A bp and dist2 ≤ A bp; alignment ≥ B% of query; lim1 ≤ cumulated read depth ≤ lim2.]

Genome correction, by topology:

- “included”: The smallest scaffold is deleted.
- “border”: The scaffolds are linked by their extremities, keeping the allele located on the longest scaffold of the pair.

Relocation and merging of supernumerary gene annotations: alignment of the impacted gene onto the remaining allele (Exonerate):

- NO => delete allele
- YES => 3 distinct cases: “synonymous”, “no intersection”, “intersection”

[Figure legend: segment in the corrected genome; deleted segment. Both alleles annotated: no need to re-annotate the lost gene. New prediction using Augustus.]

APPLICATION

Spodoptera frugiperda genome

Genome correction

                   Initial assembly (Allpaths)   Corrected assembly   Haplomerger
Total size (Mb)    526.0                         434.9                369.5
Nb. scaffolds      48,272                        41,577               37,797
N50 (kb)           39.6                          52.8                 58.4

Expected size: ~400 Mb

BUSCO statistics: Benchmarking sets of Universal Single-Copy Orthologs (2,675 for Arthropoda species) [6]

Tools: Plast [3], Lastz, AxtChain [2], Exonerate [4], Augustus [5]

Annotation stats

Previous release: 25,041 genes ==> 3,746 genes to re-annotate

Case                # genes   % success
“no alignment”      34        0
“synonymous”        747       100
“no intersection”   643       45.4
“intersection”      2,322     86.3

==> Overall success of 80% / New release: 21,578 genes

[Figure labels: addition of a new gene in the remaining region; modification of an already annotated gene.]

BUSCO categories (number of genes):

               Initial assembly   Corrected assembly   Haplomerger
Missing        363                336                  562
Single copy    1,246              1,586                1,242
Fragmented     476                457                  771
Duplicated     590                296                  100
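The figures above can be cross-checked with a short Python sketch. All counts and rates are copied from the poster; the dictionary layout and function names are illustrative assumptions. Weighting each re-annotation case by its gene count gives roughly 81%, in line with the overall success of about 80% quoted above, and each BUSCO column sums to the 2,675 Arthropoda orthologs.

```python
# Re-annotation cases: (number of genes, % success), copied from the poster.
REANNOTATION = {
    "no alignment": (34, 0.0),
    "synonymous": (747, 100.0),
    "no intersection": (643, 45.4),
    "intersection": (2322, 86.3),
}

# BUSCO counts per assembly, copied from the poster.
BUSCO = {
    "initial": {"missing": 363, "single": 1246, "fragmented": 476, "duplicated": 590},
    "corrected": {"missing": 336, "single": 1586, "fragmented": 457, "duplicated": 296},
    "haplomerger": {"missing": 562, "single": 1242, "fragmented": 771, "duplicated": 100},
}


def overall_success(cases):
    """Return (total genes, gene-weighted success rate in %)."""
    total = sum(n for n, _ in cases.values())
    recovered = sum(n * pct / 100.0 for n, pct in cases.values())
    return total, 100.0 * recovered / total


def busco_percentages(counts):
    """Express each BUSCO category as a percentage of the whole set."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


total_genes, success_pct = overall_success(REANNOTATION)
busco_totals = {name: sum(counts.values()) for name, counts in BUSCO.items()}
corrected_pct = busco_percentages(BUSCO["corrected"])

print(f"{total_genes} genes to re-annotate, {success_pct:.1f}% recovered")
for name, counts in BUSCO.items():
    pct = busco_percentages(counts)
    print(f"{name}: single copy {pct['single']:.1f}%, duplicated {pct['duplicated']:.1f}%")
```

Expressed as percentages, the drop in duplicated BUSCO genes after correction is easier to compare across the three assemblies than the raw counts.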

[Figure: read depth analysis, before/after correction]

[1] Huang S. et al., Genome Research, 2012
[2] Kent W.J. et al., Proceedings of the National Academy of Sciences, 2003
[3] Nguyen V.H. et al., BMC Bioinformatics, 2009
[4] Slater G.S. et al., BMC Bioinformatics, 2005
[5] Stanke M. et al., Genome Biol, 2006
[6] Waterhouse et al., Nucleic Acids Research, 2013

Comparison with another method: Haplomerger [1]

- Both methods improve the initial assembly.
- Haplomerger merged more regions, leading to a smaller final assembly.

Our method:

- Reduces the genome size (17%), increases the N50 and yields more single copies for important genes.
- Reduces the assembly size less than Haplomerger, with a gain of numerous BUSCO genes.
- Is more conservative, preserves genome consistency and allows easier re-annotation of impacted genes.
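The 17% size reduction can be recomputed from the assembly statistics reported on the poster. The Python sketch below uses only those reported values; the variable and function names are illustrative assumptions. It gives a reduction of about 17% for the corrected assembly and about 30% for Haplomerger, and an N50 increase of about one third for the corrected assembly.

```python
# Assembly statistics copied from the poster's genome correction table.
ASSEMBLIES = {
    "initial": {"size_mb": 526.0, "scaffolds": 48272, "n50_kb": 39.6},
    "corrected": {"size_mb": 434.9, "scaffolds": 41577, "n50_kb": 52.8},
    "haplomerger": {"size_mb": 369.5, "scaffolds": 37797, "n50_kb": 58.4},
}


def percent_change(before, after):
    """Signed relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


initial = ASSEMBLIES["initial"]
changes = {}
for name in ("corrected", "haplomerger"):
    stats = ASSEMBLIES[name]
    changes[name] = {
        "size": percent_change(initial["size_mb"], stats["size_mb"]),
        "n50": percent_change(initial["n50_kb"], stats["n50_kb"]),
    }
    print(f"{name}: size {changes[name]['size']:+.1f}%, N50 {changes[name]['n50']:+.1f}%")
```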

Note: on the poster, an asterisk (*) marks the best result by category.
