GATB: a software toolbox for genome assembly and analysis

HAL Id: hal-01088641

https://hal.archives-ouvertes.fr/hal-01088641

Submitted on 9 Dec 2014

Erwan Drezen, Guillaume Rizk, Rayan Chikhi, Charles Deltel, Claire Lemaitre, Pierre Peterlongo, Dominique Lavenier

To cite this version:

Erwan Drezen, Guillaume Rizk, Rayan Chikhi, Charles Deltel, Claire Lemaitre, et al. GATB: a software toolbox for genome assembly and analysis. Bio-IT World Conference, Apr 2014, Boston, United States. ⟨hal-01088641⟩

[Figure: the GATB software layers (GATB-CORE, GATB-TOOLS, GATB-PIPELINE) and third parties]

GATB's de Bruijn graph: a basis for families of tools

► Data error correction

► Assembly

► Biological motif detection

5. GATB helps you as an NGS user

Several tools based on GATB are already available:

Bloocoo: k-mer spectrum based read error corrector for large datasets

Minia: short read assembler based on a de Bruijn graph. Results are of similar contiguity and accuracy to other de Bruijn assemblers (e.g. Velvet)

DiscoSNP: discovers Single Nucleotide Polymorphisms (SNPs) from non-assembled reads

TakeABreak: detects inversion breakpoints without a reference genome by looking for fixed-size topological patterns in the de Bruijn graph

6. GATB helps you as an NGS developer

The GATB C++ library lets you quickly develop new NGS tools that fit your needs.

Major facts about the GATB C++ library

► Object-oriented design

► Simple and powerful de Bruijn graph API

► Simple and powerful multithreading model

► HDF5 usage for data storage

► Full Doxygen documentation including many code samples

► Test suite that checks the correct behavior of the library

Here is a typical workflow when working with GATB:

GATB-CORE transforms the reads into a de Bruijn graph and saves it in an HDF5 file that can be opened by other tools developed with the GATB-CORE API.
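As a rough illustration of the first step of this workflow, the sketch below shows how reads can be cut into k-mers that become the nodes of a de Bruijn graph. It is plain standard C++ and not the GATB-CORE API: the function names (reverse_complement, canonical, collect_nodes) are hypothetical, k-mers are kept as strings for readability, and abundance filtering and HDF5 storage are left out. Treating a k-mer and its reverse complement as the same node is a common convention that is assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Reverse complement of a DNA string over the alphabet {A, C, G, T}.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;  // other symbols (e.g. N) are left untouched
        }
    }
    return rc;
}

// Canonical form (assumed convention): the lexicographically smaller
// of a k-mer and its reverse complement.
std::string canonical(const std::string& kmer) {
    return std::min(kmer, reverse_complement(kmer));
}

// Hypothetical helper: collect the distinct canonical k-mers of all reads.
// Each k-mer is one node; edges are implicit, since two nodes are
// adjacent when they overlap by k-1 characters.
std::set<std::string> collect_nodes(const std::vector<std::string>& reads,
                                    std::size_t k) {
    assert(k > 0);
    std::set<std::string> nodes;
    for (const std::string& read : reads) {
        if (read.size() < k) continue;  // read too short to contain a k-mer
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            nodes.insert(canonical(read.substr(i, k)));
        }
    }
    return nodes;
}
```

For example, with k = 3 the read ACGT yields the 3-mers ACG and CGT, which are reverse complements of each other and therefore collapse into a single node.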

The core data structure of GATB is a de Bruijn graph that encodes the main information from the sequencing reads.

Strength of GATB

GATB makes this graph compact by storing it in a Bloom filter (a space-efficient probabilistic data structure) and by adding a Critical False Positive (CFP) structure, which avoids the false positive answers that the Bloom filter gives because of its probabilistic nature.
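The combination of a Bloom filter and a critical false positive set can be sketched as follows. This is a minimal illustration in standard C++, not GATB's implementation: the class names (BloomFilter, CompactGraph), the double-hashing scheme built on std::hash, and the string k-mers are all assumptions made for the example, and reverse complements are ignored. The sketch follows the idea of the Chikhi and Rizk paper listed under Publications: only false positives that are neighbors of real nodes are stored, because those are the only ones a traversal starting from a real node can ever query.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Minimal Bloom filter over strings (illustrative, not GATB's).
class BloomFilter {
public:
    BloomFilter(std::size_t bits, unsigned hashes)
        : bits_(bits, false), hashes_(hashes) {
        assert(bits > 0 && hashes > 0);
    }
    void insert(const std::string& key) {
        for (unsigned i = 0; i < hashes_; ++i) bits_[index(key, i)] = true;
    }
    // May return true for a key that was never inserted (false positive),
    // but never returns false for an inserted key.
    bool may_contain(const std::string& key) const {
        for (unsigned i = 0; i < hashes_; ++i)
            if (!bits_[index(key, i)]) return false;
        return true;
    }
private:
    // Double hashing: position_i = h1 + i * h2 (assumed scheme).
    std::size_t index(const std::string& key, unsigned i) const {
        const std::size_t h1 = std::hash<std::string>{}(key);
        const std::size_t h2 = std::hash<std::string>{}(key + "#") | 1;
        return (h1 + i * h2) % bits_.size();
    }
    std::vector<bool> bits_;
    unsigned hashes_;
};

// The 8 possible neighbors of a k-mer: 4 successors and 4 predecessors.
std::vector<std::string> neighbors(const std::string& kmer) {
    assert(!kmer.empty());
    std::vector<std::string> out;
    for (char c : std::string("ACGT")) {
        out.push_back(kmer.substr(1) + c);                   // successor
        out.push_back(c + kmer.substr(0, kmer.size() - 1));  // predecessor
    }
    return out;
}

// Bloom filter plus the set of critical false positives: k-mers that are
// neighbors of a real node and that the filter wrongly reports as present.
class CompactGraph {
public:
    CompactGraph(const std::unordered_set<std::string>& nodes,
                 std::size_t bits, unsigned hashes)
        : bloom_(bits, hashes) {
        for (const auto& n : nodes) bloom_.insert(n);
        for (const auto& n : nodes)
            for (const auto& m : neighbors(n))
                if (bloom_.may_contain(m) && nodes.count(m) == 0)
                    cfp_.insert(m);
    }
    // Exact for real nodes and for any neighbor of a real node.
    bool contains(const std::string& kmer) const {
        return bloom_.may_contain(kmer) && cfp_.count(kmer) == 0;
    }
private:
    BloomFilter bloom_;
    std::unordered_set<std::string> cfp_;
};
```

With this layout the original set of nodes does not need to be kept in memory after construction: membership queries made while walking the graph from a real node are answered exactly by the filter and the (usually much smaller) critical false positive set.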

The GATB philosophy proposes a 3-layer construction to analyze NGS datasets

GATB: The Open Source Toolbox for Genomic Assembly & Analysis

1. What is GATB?

Motivation

NGS technologies produce terabytes of data. Efficient and fast NGS algorithms are essential to analyze them.

Objective

The Genome Assembly Tool Box (GATB)

► is open-source software

► provides an easy way to develop efficient and fast NGS tools

► is based on a data structure with a very low memory footprint

► allows complex genomes to be processed on desktop computers

Erwan Drézen (1), Guillaume Rizk (1), Rayan Chikhi (2), Charles Deltel (1), Claire Lemaitre (1), Pierre Peterlongo (1) and Dominique Lavenier (1)

(1) INRIA/IRISA/GenScale, Campus de Beaulieu, 35042 Rennes cedex

(2) Department of Computer Science and Engineering, Pennsylvania State University, USA

2. Software Solution

3. Compact de Bruijn graph data structure

4. Workflow

Partners

Publications

1. GATB-CORE: a C++ library holding all the services needed for developing software dedicated to NGS data.

2. GATB-TOOLS: a set of elementary NGS tools mainly built upon the GATB library (k-mer counter, contiger, scaffolder, variant detection, etc.).

3. GATB-PIPELINE: a set of NGS pipelines that link together tools from the previous layer.

License & Web Site

GATB is released under the GNU Affero General Public License.

Proprietary licensing for software vendors or service providers is currently being studied.

For more details on GATB:

http://gatb.inria.fr

The sequencing reads of a whole human genome can be handled with 5 GBytes of memory.

How to analyze complex genomes on a simple desktop computer?

[Figure: sequencing reads (>read 1 ... >read 100.000.000) shown next to a de Bruijn graph whose nodes are labeled with 3-mers (CGC, GCT, CCG, TCC, CTA, TAT, ATC, CGA, AGC, ATT, GAG, AAA, GGA, TGG, TTG). Legend: real de Bruijn graph node; false positive; critical false positive (additional structure)]

[Figure: workflow. Reads (FASTA) are processed by the GATB-CORE binaries into a graph (HDF5), which Tool 1, Tool 2 and Tool 3 of GATB-TOOLS open through the GATB-CORE library API]

G. Rizk, D. Lavenier, R. Chikhi, DSK: k-mer counting with very low memory usage, Bioinformatics, 2013 Mar 1;29(5):652-3

R. Chikhi, G. Rizk, Space-efficient and exact de Bruijn graph representation based on a Bloom filter, Algorithms for Molecular Biology 2013, 8:22

G. Collet, G. Rizk, R. Chikhi, D. Lavenier, Minia on Raspberry Pi, assembling a 100 Mbp genome on a Credit Card Sized Computer, Poster at the JOBIM conference, 2013 Jul 1-4 (Toulouse). Best poster award.

K. Salikhov, G. Sacomoto, G. Kucherov, Using Cascading Bloom Filters to Improve the Memory Usage for de Brujin Graphs, Algorithms in Bioinformatics,
