Categorization of bacterial habitats linking 16S RNA
amplicon sequencing to biotope information
Introduction
Hiba Jabri, Georges Daube & Bernard Taminiau
University of Liège, Fundamental and Applied Research for Animal & Health, Food Microbiology, Food Science Department. B43b Boulevard de Colonster 20 b-4000 Liège
.
During the last decade we have witnessed a rapid expansion of DNA sequencing technologies and their applications, leading to the development of culture-independent methods for the identification and characterization of microbial ecosystems. The resulting massive amount of DNA sequence data forced us to revise the scale of estimated bacterial species diversity and of the diversity of biotopes colonized by bacteria. The aim of this work is to develop a database linking bacterial taxonomy (based on 16S RNA amplicon) and the species detection in ecosystems/biotope information. This database is intended to be fully compatible with amplicon sequencing and metagenetic designs.
Objective
Create a tool for cataloging the biotopes with their bacterial species and enrich the database by extracting pertinent information from sequence associated publications.
Results
Discussion and conclusion
5,353,236.00 6,073,181.00 719,945.00 Host 33% DOI/PUBMED 53% Titles and other information 36%
Raw SILVA database
sequences Data in amplicon sequences with no information related Data with information related to bacteria biotopes
Figure 3 : Compartmentalisation of information related
to our database human 39% soil 28% milk 1% water 11% Animal 12% foodstuff 0.04% fecal 2% plant7% aquaculture 0.03% uranium 0.03%
In this poster, we present our first approach based in syntactic analysis of taxonomic sequences by building a databases using 5,350,236 sequences with useful information. We have founded in the primary focus on host information which represent 33% form collected databases a distribution to human, animals, environment and food. the next stage of this work is to use text mining tools to enrich the database by extracting pertinent information from publication using ontology design and validate this database by exploring a target/specific ecosystems. The final phase of the project will consist of automating the updating of the database and rendering it accessible via the web interface.
Figure 4 :Host source bacterial from amplicon
sequences
Data with source related information
Data associated to publication information Data associated other useful information
• 16s DNA sequence • Accession number
Identity
• Environment • Animal
• Human (skin, gut)…
Biotope
• Bibliography sourceOntology
Figure 2 : size of collected references