Reference guided genome assembly in metagenomic samples

HAL Id: hal-01934823

https://hal.archives-ouvertes.fr/hal-01934823

Submitted on 26 Nov 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Cervin Guyomar, Wesley Delage, Fabrice Legeai, Christophe Mougel, Jean-Christophe Simon, Claire Lemaitre

To cite this version:

Cervin Guyomar, Wesley Delage, Fabrice Legeai, Christophe Mougel, Jean-Christophe Simon, et al. Reference guided genome assembly in metagenomic samples. RECOMB 2018 - 22nd International Conference on Research in Computational Molecular Biology, Apr 2018, Paris, France. pp.1. ⟨hal-01934823⟩

Cervin Guyomar (1,2), Wesley Delage (2), Fabrice Legeai (1,2), Christophe Mougel (1), Jean-Christophe Simon (1), Claire Lemaitre (2)

1: INRA, UMR 1349 IGEPP, Le Rheu, France

2: INRIA/IRISA GenScale, Campus de Beaulieu, Rennes, France

Motivations

Metagenomics = a mixture of reads:
- from different genomes
- with polymorphism (SNPs and large structural polymorphism)
- with no close reference genome (most of the time)

Objectives:

Assemble a genome of interest
- from metagenomic reads
- using a remote reference genome
- detecting structural polymorphism

Existing methods:

Assembly first
De novo metagenomic assembly + taxonomic assignment of contigs
Ex: MEGAHIT [1] + BLAST
(+) Assembly is reference-free
(-) Time-consuming and challenging assembly
(-) Tricky contig filtering

Mapping first
Assembly of the reads mapped on the reference
Ex: BWA + Minia [2]
(+) Assembly time reduced by read selection
(-) Incomplete assembly if remote reference

Hybrid strategy
Reference guided assembly
Local assembly of the regions diverging from the reference
Ex: MITObim [3]
(+) Conserved regions easily assembled using reference
(+) Able to reconstruct diverging regions
(-) Requires a close reference genome
(-) Incomplete assembly if remote reference

No tool suited for metagenomic data:
- fail to detect structural polymorphism
- no scale-up with metagenomic datasets

→ Need for a tool dedicated to guided assembly in metagenomic context

MindTheGap assembly workflow

A hybrid approach:
- Mapping reads and assembly of conserved regions
- Gapfilling of diverging regions with all reads

Step 1: Reference based read recruiting and backbone contig assembly
- Reads mapping on the remote reference (BWA)
- Assembly (Minia)

Step 2: Reference free gapfilling between contigs using MindTheGap [4]
- Scaffolding contigs without any assumption on their order or orientation
- Able to return several alternative solutions (large structural polymorphism)

Output: super-contigs (fasta) + gfa format

[Figure: workflow diagram. Labels: metagenomic readset, reference genome, backbone contigs, all the reads.]

MindTheGap algorithm

- De Bruijn graph assembly starting from a contig end kmer
- Search target kmers in the contig graph

[Figure: gap-filling example. Start kmer (contig 1); target kmers (contig 2 and contig 3). 2 solutions between contig 1 and contig 2, 1 solution between contig 1 and contig 3.]

Results in the pea aphid holobiont

Successful assembly of a bacterial genome in one circular contig

Assembly of Buchnera aphidicola from 42 pea aphid metagenomic samples [5], using a Buchnera genome from another species

MindTheGap results:
- One-contig assembly for 30 samples
- Genome length close to the real reference
- On average 7 times faster than MEGAHIT

Discovery of unknown phage structural variants

Assembly of the phage APSE from 42 pea aphid metagenomic samples

Degnan 2008 [6]:
- 7 known variants, differing by a ~5 kb virulence cassette

Results:
- 3 new phage variants discovered in 5 samples
- Coabundant phage successfully assembled in 3 samples
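The idea listed under "MindTheGap algorithm" (de Bruijn graph assembly starting from a contig end kmer, then searching for target kmers) can be illustrated with a minimal Python sketch. This is not the MindTheGap implementation: the function names, the `max_len` bound and the toy reads are assumptions made for illustration, and the sketch ignores reverse complements, sequencing errors and kmer abundance filtering.

```python
from collections import defaultdict


def build_kmer_set(reads, k):
    """Collect every k-mer found in the reads (the nodes of the de Bruijn graph)."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def fill_gaps(reads, start_kmer, target_kmers, k, max_len=200):
    """Enumerate graph paths from start_kmer to each target k-mer.

    target_kmers maps a (hypothetical) contig name to its first k-mer.
    Returns {contig_name: sorted list of sequences}, each sequence running
    from the start k-mer to the target k-mer, both included.
    """
    kmers = build_kmer_set(reads, k)
    name_of = {kmer: name for name, kmer in target_kmers.items()}
    solutions = defaultdict(list)
    # Depth-first traversal; each stack entry keeps its own visited set
    # so that alternative paths through shared nodes are all reported.
    stack = [(start_kmer, start_kmer, {start_kmer})]
    while stack:
        node, seq, seen = stack.pop()
        if node in name_of:
            solutions[name_of[node]].append(seq)
            continue
        if len(seq) >= max_len:  # assumed bound on the gap-filling length
            continue
        for base in "ACGT":
            nxt = node[1:] + base
            if nxt in kmers and nxt not in seen:
                stack.append((nxt, seq + base, seen | {nxt}))
    return {name: sorted(seqs) for name, seqs in solutions.items()}


# Toy readset: two alternative paths towards contig 2, one towards contig 3.
example_reads = ["AACCGTAGGATC", "AACCGCTTGGATC", "AACCGAGTCTCA"]
example_targets = {"contig2": "GGATC", "contig3": "TCTCA"}
example_solutions = fill_gaps(example_reads, "AACCG", example_targets, k=5)
for name, seqs in sorted(example_solutions.items()):
    print(name, len(seqs), "solution(s)")
```

On this toy input the traversal reports two sequences between the start kmer and contig 2 and one towards contig 3, the same kind of situation as in the poster's figure.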

In development on GitHub:

https://github.com/GATB/MindTheGap/tree/contig_dev
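The workflow output is given as super-contigs (fasta) plus a gfa file. As a hedged illustration of what a graph of contigs and gap-fillings may look like in GFA 1.0, the Python sketch below writes contigs and gap-filling sequences as segments (`S` lines) joined by links (`L` lines). The node naming (`gapfill_1`, ...), the fixed `+` orientations and the overlap value are assumptions for the example and do not necessarily match the file that MindTheGap writes.

```python
def to_gfa(contigs, gapfillings, overlap):
    """Serialize contigs and gap-fillings as a GFA 1.0 string.

    contigs: {name: sequence}
    gapfillings: list of (source_contig, target_contig, sequence)
    overlap: number of bases shared between a gap-filling and its contigs
             (assumed here to be the kmer size)
    """
    lines = ["H\tVN:Z:1.0"]
    for name, seq in contigs.items():
        lines.append(f"S\t{name}\t{seq}")
    for i, (src, dst, seq) in enumerate(gapfillings, start=1):
        gap_name = f"gapfill_{i}"  # hypothetical naming scheme
        lines.append(f"S\t{gap_name}\t{seq}")
        # Forward orientation only: a real scaffold graph also needs '-'.
        lines.append(f"L\t{src}\t+\t{gap_name}\t+\t{overlap}M")
        lines.append(f"L\t{gap_name}\t+\t{dst}\t+\t{overlap}M")
    return "\n".join(lines) + "\n"


example_gfa = to_gfa(
    {"contig1": "AACCG", "contig2": "GGATC"},
    [("contig1", "contig2", "AACCGTAGGATC")],
    overlap=5,
)
print(example_gfa, end="")
```

Alternative solutions between the same pair of contigs would simply appear as several gap-filling segments linked to the same two contig nodes.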

References:

[1] Li et al. (2015) MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics

[2] Chikhi et al. (2013) Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithms for Molecular Biology

[3] Hahn et al. (2013) Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads - a baiting and iterative mapping approach. Nucleic Acids Research

[4] Rizk et al. (2014) MindTheGap: integrated detection and assembly of short and long insertions. Bioinformatics

[5] Guyomar et al. (2018) Multi-scale characterization of symbiont diversity in the pea aphid complex through metagenomic approaches (in revision, Microbiome)

[6] Degnan and Moran (2008) Diverse Phage-Encoded Toxins in a Protective Insect Endosymbiont. Applied and Environmental Microbiology
