Categorization of bacterial habitats linking 16S RNA amplicon sequencing to biotope information


Hiba Jabri, Georges Daube & Bernard Taminiau

University of Liège, Fundamental and Applied Research for Animal & Health, Food Microbiology, Food Science Department. B43b Boulevard de Colonster 20, B-4000 Liège.

Introduction

During the last decade, we have witnessed a rapid expansion of DNA sequencing technologies and their applications, leading to the development of culture-independent methods for the identification and characterization of microbial ecosystems. The resulting massive amount of DNA sequence data has forced us to revise the estimated scale of bacterial species diversity and of the diversity of biotopes colonized by bacteria. The aim of this work is to develop a database linking bacterial taxonomy (based on 16S rRNA amplicon sequences) to information on the ecosystems and biotopes in which species are detected. This database is intended to be fully compatible with amplicon sequencing and metagenetic designs.

Objective

Create a tool for cataloging biotopes together with their bacterial species, and enrich the database by extracting pertinent information from sequence-associated publications.
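As an illustration of what such a catalog could look like, the following is a minimal sketch, not the authors' implementation. The class `SequenceRecord`, its field names, and the function `catalog_by_biotope` are hypothetical; they only mirror the kinds of information named on this poster (sequence identity, biotope, bibliographic source).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SequenceRecord:
    """One 16S sequence entry. All field names are illustrative assumptions."""
    accession: str                    # sequence accession number
    taxonomy: str                     # taxonomic path assigned to the sequence
    biotope: Optional[str] = None     # e.g. "soil", "human gut"; None if unknown
    references: Tuple[str, ...] = ()  # DOI or PubMed identifiers, if any


def catalog_by_biotope(records):
    """Group taxonomic assignments by biotope.

    Records without biotope information are skipped.
    """
    catalog = defaultdict(set)
    for rec in records:
        if rec.biotope is not None:
            catalog[rec.biotope].add(rec.taxonomy)
    return dict(catalog)
```

Given a list of such records, `catalog_by_biotope(records)` returns a dictionary mapping each biotope to the set of taxa detected in it, which is the basic lookup the objective describes.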

Results

Database content:

• Identity: 16S DNA sequence, accession number
• Biotope: environment, animal, human (skin, gut)…
• Ontology: bibliography source

Figure 2: Size of collected references

• Raw SILVA database sequences: 6,073,181
• Amplicon sequences with no related information: 719,945
• Sequences with information related to bacterial biotopes: 5,353,236

Figure 3: Compartmentalisation of information related to our database

• Data with source-related information (host): 33%
• Data associated with publication information (DOI/PubMed): 53%
• Data associated with other useful information (titles and other information): 36%

Figure 4: Host source of bacteria from amplicon sequences

• Human: 39%
• Soil: 28%
• Animal: 12%
• Water: 11%
• Plant: 7%
• Fecal: 2%
• Milk: 1%
• Foodstuff: 0.04%
• Aquaculture: 0.03%
• Uranium: 0.03%

Discussion and conclusion

In this poster, we present our first approach, based on syntactic analysis of taxonomic sequences, in which we built a database from 5,353,236 sequences carrying useful information. Our primary focus was on host information, which represents 33% of the collected data and is distributed among human, animal, environmental and food sources. The next stage of this work is to use text-mining tools to enrich the database by extracting pertinent information from publications using an ontology design, and to validate the database by exploring a specific target ecosystem. The final phase of the project will consist of automating the updating of the database and making it accessible via a web interface.
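To make the idea of a syntactic analysis of host annotations more concrete, the sketch below assigns free-text annotations to broad source categories by keyword matching and computes each category's percentage share. It is an assumption-laden illustration, not the method used for this poster: the keyword lists, the `uncategorized` fallback, and the function names `categorize` and `category_shares` are all hypothetical.

```python
from collections import Counter

# Hypothetical keyword lists. The category names echo the host-source figure,
# but the keywords themselves are illustrative assumptions.
CATEGORY_KEYWORDS = {
    "human": ("human", "homo sapiens", "patient"),
    "soil": ("soil", "rhizosphere"),
    "water": ("water", "lake", "river", "marine"),
    "milk": ("milk",),
    "plant": ("plant", "leaf", "root"),
}


def categorize(annotation):
    """Return the first category whose keyword occurs in the annotation.

    Matching is a crude case-insensitive substring test; categories are
    tried in the order in which they are listed in CATEGORY_KEYWORDS.
    """
    text = annotation.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorized"


def category_shares(annotations):
    """Return the percentage of annotations that falls in each category."""
    counts = Counter(categorize(a) for a in annotations)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}
```

Substring matching of this kind is easy to audit but misclassifies ambiguous annotations, which is one reason an ontology-based text-mining step is a natural follow-up.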
