• Aucun résultat trouvé

Whole metagenome analysis with metagWGS

N/A
N/A
Protected

Academic year: 2021

Partager "Whole metagenome analysis with metagWGS"

Copied!
6
0
0

Texte intégral

(1)

HAL Id: hal-03176836

https://hal.archives-ouvertes.fr/hal-03176836

Submitted on 22 Mar 2021

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Whole metagenome analysis with metagWGS

Joanna Fourquet, Céline Noirot, Christophe Klopp, Philippe Pinton, Sylvie

Combes, Claire Hoede, Géraldine Pascal

To cite this version:

(2)

What bioinformatics solution exists to process whole metagenome shotgun data?

ATLAS

[3]

→ Incomplete

✓ Metagenome assembly

×

Taxonomic affiliation

of reads and contigs

✓ Taxonomic affiliation of bins

✓ Functional annotation

→ Issues during use

✓ Automated with

[4]

→ Reproducible

✓ Installation and run with

→ Modular

✓ Different parameters

✓ Possibility to skip certain steps

→ Available

✓ https://github.com/metagenome-atlas/atlas

Whole metagenome analysis with metagWGS

Joanna FOURQUET

1

, Céline NOIROT

2

, Christophe KLOPP

2

,

Philippe PINTON

3

, Sylvie COMBES

1

, Claire HOEDE

2

, Géraldine PASCAL

1

1

GenPhySE, Université de Toulouse, INRAE, ENVT, F-31326, Castanet Tolosan, France

2

INRAE, UR875 MIAT, PF Bioinfo GenoToul, F-31326, Castanet-Tolosan, France

3

Toxalim, Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, F-31027 Toulouse, France

Corresponding Author:

geraldine.pascal@inrae.fr

Acknowledgements

ExpoMycoPig project and JF are funded by France Futur Elevage

We are grateful to the Genotoul bioinformatics platform Toulouse Occitanie (Bioinfo Genotoul, doi: 10.15454/1.5572369328961167E12) for providing computing and storage resources

Our solution: metagWGS

→ Complete

✓ Metagenome assembly

✓ Taxonomic affiliation

of reads, contigs and bins

✓ Functional annotation

→ Easy of use

✓ Automated with

→ Highly reproducible

✓ Comes with a Singularity container

→ Modular

✓ Different parameters

✓ Possibility to skip certain steps

→ Available

✓ https://forgemia.inra.fr/genotoul-bioinfo/metagwgs

nf-core/mag

→ Incomplete

✓ Metagenome assembly

✓ Taxonomic affiliation

of reads and bins

×

Taxonomic affiliation of contigs

×

Functional annotation

→ Easy of use

✓ Automated with

[1]

→ Highly reproducible

[2]

✓ Comes with a Singularity container

→ Modular

✓ Different parameters

✓ Possibility to skip certain steps

→ Available

(3)

metagWGS: preprocessing, assembly and annotation

Assembly

Taxonomic affiliation of reads

Quality control

FastQC

[10]

.html …

FastQC

.html …

Cleaning

Raw data

Adapter sequence

(entire or truncated)

High quality read

Low quality read

Host read

Contig 1.1

Contig 1.2

S1

S2

S3

Reads deduplication

Contig 1.1

Contig 1.2

S1

S2

Assembly filter

S1

S2

S3

Contig 1.1

Contig 1.2

CPM filter

Annotation of genes

Possible skipping

cutadapt

[5]

sickle

[6]

bwa mem

[7]

kaiju

[8]

kronaTools

[9]

metaspades

[11]

or megahit

[12]

S2

S3

Contig 1.2

Gene 1

Gene 2

Contig 1.3

Gene 1 bis

Gene 3

S1

S1

S2

S3

S1

S2

S3

prokka

[14]

S3

bwa mem, samtools markdup

[13]

(4)

Taxonomic affiliation of bins

Binning of contigs

Possible skipping

Input from part 1

Annotation of genes

Reads deduplication

Assembly filter

Taxonomic affiliation of contigs

Clustering of genes

Contig 1.2

Gene 1

Gene 2

Contig 1.3

″Gene 1″

Gene 3

Contig 2.1

Gene 4

″Gene 1″

Quantification of reads on genes

Reads

Reads

Count 2 2

Count 1 4

Reads

Count 3 1

S1

S2

Contig 2.1

Gene 4

″Gene 1″

Contig 1.2

Gene 1

Gene 2

Contig 1.3

″Gene 1″

Gene 3

S1

#contig

consensus taxid

consensus lineage

S1_1

1263

cellular organisms; Bacteria; Terrabacteria group; Firmicutes; Clostridia; Clostridiales

S1_10

84108

cellular organisms; Bacteria; Terrabacteria group; Actinobacteria; Coriobacteriia

S2

S3

Taxonomy

Taxonomy

Taxonomy

S1

S2

S3

Contig 1.2

Gene 1 Gene 2

Contig 1.3

Gene 1 bis

Gene 3

Contig 1.1

Contig 1.2

S1

S2

S3

S1

S2

S3

Contig 1.1

Contig 1.2

metagWGS: quantification and affiliation

cd-hit-est

[15]

bwa mem, featureCounts

[16]

(5)

Length and number of contigs

MultiQC

[21]

graphic outputs of metagWGS

Read count metrics

Assessment of bins

Taxonomic affiliation

of reads metrics

Quantification table of reads on genes

#contig

consensus

taxid

consensus lineage

first1

210

cellular organisms; Bacteria; Proteobacteria; delta/epsilon subdivisions; Epsilonproteobacteria; Campylobacterales;

Helicobacteraceae; Helicobacter; Helicobacter pylori

first10

1681

cellular organisms; Bacteria; Terrabacteria group; Actinobacteria; Actinobacteria; Bifidobacteriales; Bifidobacteriaceae;

Bifidobacterium; Bifidobacterium bifidum

first100

1358

cellular organisms; Bacteria; Terrabacteria group; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Lactococcus; Lactococcus

lactis

first1000

1322

cellular organisms; Bacteria; Terrabacteria group; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae; Blautia; Blautia hansenii

first10004

644

cellular organisms; Bacteria; Proteobacteria; Gammaproteobacteria; Aeromonadales; Aeromonadaceae; Aeromonas; Aeromonas

hydrophila

Taxonomic classification of contigs

Kronas

id_cluster

first.featureCounts.tsv second.featureCounts.tsv

first59.Prot_28193

844

887

first82.Prot_34066

847

891

first992.Prot_96159

4092

4389

first8.Prot_06503

5279

3702

first21.Prot_13879

584

611

(6)

Time

metagWGS will be used to analyze ExpoMycoPig data

How can we characterize microbiome digestive ecosystems for pig exposed to mycotoxins?

3 weeks

Fecal sampling

Fecal sampling

n = 8

Experimental strategy

DNA samples

Total DNA

extraction

Whole genome

shotgun

sequencing

.fastq.gz

metagWGS

Output files

n = 8

Control diet

Diet contaminated with FB1 mycotoxins

[22]

References

[1] Di Tommaso et al. Nextflow enables reproducible computational workflows. Nat Biotechnol., 2017. [2] Kurtzer et al. Singularity: Scientific containers for mobility of compute. PLoS One, 2017.

[3] Kieser et al., ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data, bioRxiv, 2019. [4] Köster et al., Snakemake - A scalable bioinformatics workflow engine. Bioinformatics, 2012.

[5] Martin. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 2011.

[6] Joshi and Fass. Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files [Software]. Available at https://github.com/najoshi/sickle, 2011. [7] Li and Durbin. Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 2009.

[8] Menzel et al. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun., 2016. [9] Ondov et al. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics, 2011. [10] Andrews. FastQC. Available at http://www.bioinformatics.babraham.ac.uk/projects/fastqc, 2010. [11] Nurk et al. MetaSPAdes: a new versatile metagenomic assembler. Genome Res., 2017.

[12] Li et al. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 2015. [13] Li at al. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 2009.

[14] Seemann. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 2014.

[15] Fu et al. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics, 2012.

[16] Liao et al. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics, 2014. [17] Buchfink et al. Fast and sensitive protein alignment using DIAMOND. Nature Methods, 2015.

[18] Langmead and Salzberg. Fast gapped-read alignment with Bowtie 2. Nature Methods, 2012.

[19] Kang et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ, 2019. [20] von Meijenfeldt et al. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biology, 2019. [21] Ewels et al. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 2016.

Références

Documents relatifs

Whole metagenome analysis with metagWGS Joanna Fourquet, Adeline Chaubet, Hélène Chiapello, Christine Gaspin, Marisa Haenni, Christophe Klopp, Agnese Lupo, Jean Mainguy, Celine

It should be noted that we have used the example of macaque social systems to describe the calculation of the complexity index, but data from a higher number of species would

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

CAG:237 Blautia Lachnospiraceae Clostridiales Clostridia Firmicutes Bacteria CAG:70 1578 genus NA Faecalibacterium Ruminococcaceae Clostridiales Clostridia Firmicutes Bacteria

1 a Short-axis view of transthoracic echocardiography showing the fistula (arrows) with aneurysm (dotted line) between the right pulmonary artery (RPA) and the left atrium (LA). b

The clustering phenomenon is due to the discreteness of birth and death events and cannot be explained in the framework of continuous model of population dynamics.We show by

We describe our efforts to develop a novel resource, the Model Organism Linked Database (MOLD 7 ), which uses Semantic Web technologies to make the knowledge of six model

In this model DNA sequences that do not encode proteins, which represent more than 98% of the human genome, are involved in the definition of DNA crossing points