
KLAST: a new high-performance sequence similarity search tool


HAL Id: hal-01088629

https://hal.archives-ouvertes.fr/hal-01088629

Submitted on 9 Dec 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Erwan Drezen, Patrick Durand, Dominique Lavenier

To cite this version:

Erwan Drezen, Patrick Durand, Dominique Lavenier. KLAST: a new high-performance sequence similarity search tool. Bio-IT World Conference, Apr 2014, Boston, United States. ⟨hal-01088629⟩


KoriLog Bioinformatics Solutions

4 rue Gustave Eiffel, 56230 Questembert, France

Phone: +33 960 368 038

www.korilog.com - klast@korilog.com

KLAST software development by

KLAST is a fast, accurate, and NGS-scalable bank-to-bank sequence similarity search tool […] BLAST suite of algorithms. Relying on a unique software architecture, KLAST takes full advantage of recent multi-core personal computers without requiring any additional hardware devices.

Tara Oceans benchmark

KLAST and BLAST were benchmarked by comparing 8,245 sequences (translated 454 reads) from Tara Oceans metagenomic data against 15 million proteins from UniProt. Both tools ran on 8 Intel Xeon cores. KLAST completed the comparison 18 times faster than BLAST while covering up to 96% of the results produced by BLAST.

Benchmark data courtesy of Thomas Vannier and Jean-Marc Aury research team (Genoscope/CEA).

More on this study is available at tinyurl.com/d54ahrb

Tools benchmark

Application on comparative bacterial genomics

The SSEARCH, BLAST, USEARCH software are considered […]

Comparison of 2,329 protein sequences from the bacterium A. hospitalis against the SwissProt databank. SSEARCH serves as the reference because it implements the rigorous Smith-Waterman algorithm and generates optimal alignments. Alignments are evaluated on a moderate-size dataset because of the long execution time of SSEARCH. The diagrams summarize the number of alignments found by each tool (A, B), the number of queries matching the reference databank (C), and the search time (D).

[…] red section reports other alignments. As can be seen, the total number of alignments can exceed the number of alignments found by SSEARCH. This is mainly due to the fragmentation of long alignments (including large gaps) into shorter ones by KLAST, BLAST, and USEARCH.

More benchmarks are available at koriscale.inria.fr
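The benchmark figure labels three levels of agreement with the SSEARCH reference: QUERY match, [QUERY,HIT] match, and ALIGNMENT match (overlap 80%). The poster does not say how the overlap is measured, so the following is only a minimal sketch under stated assumptions: the `Alignment` record, the `overlap_fraction` and `match_levels` helpers, and the choice to measure overlap on query coordinates as a fraction of the reference alignment are all hypothetical and are not taken from KLAST or from the benchmark scripts. The toy data also illustrates the fragmentation effect described above: two short fragments of one long reference alignment each fall below the 80% threshold and are therefore counted as "other" alignments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    # Hypothetical minimal record; real tool output carries many more fields.
    query: str
    hit: str
    q_start: int  # 1-based, inclusive start on the query
    q_end: int    # inclusive end on the query


def overlap_fraction(ref: Alignment, other: Alignment) -> float:
    """Fraction of the reference query interval covered by `other` (assumed metric)."""
    shared = min(ref.q_end, other.q_end) - max(ref.q_start, other.q_start) + 1
    return max(shared, 0) / (ref.q_end - ref.q_start + 1)


def match_levels(reference, candidates, min_overlap=0.8):
    """Count agreement with the reference at the three levels named in the figure."""
    ref_queries = {a.query for a in reference}
    ref_pairs = {(a.query, a.hit) for a in reference}
    cand_queries = {a.query for a in candidates}
    cand_pairs = {(a.query, a.hit) for a in candidates}

    # A candidate alignment is "common" if some reference alignment with the
    # same query and hit is covered by it to at least `min_overlap`.
    common = [
        c for c in candidates
        if any(
            r.query == c.query
            and r.hit == c.hit
            and overlap_fraction(r, c) >= min_overlap
            for r in reference
        )
    ]
    return {
        "query_match": len(ref_queries & cand_queries),
        "pair_match": len(ref_pairs & cand_pairs),
        "alignment_common": len(common),
        "alignment_other": len(candidates) - len(common),
    }


# Toy data (invented for illustration, not benchmark results).
reference = [
    Alignment("q1", "P1", 1, 100),
    Alignment("q2", "P2", 1, 200),
]
candidates = [
    Alignment("q1", "P1", 5, 100),    # covers 96% of the reference: common
    Alignment("q2", "P2", 1, 90),     # fragment, covers 45%: other
    Alignment("q2", "P2", 120, 200),  # fragment, covers 40.5%: other
    Alignment("q3", "P9", 1, 50),     # query absent from the reference: other
]

counts = match_levels(reference, candidates)
print(counts)
```

Here the heuristic tool reports four alignments against two in the reference, which is the same effect that lets the alignment totals of KLAST, BLAST, and USEARCH exceed the SSEARCH total.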

Command-line tool

Workflow and data analytics platform

Graphical platform

CLC Genomics Workbench

© Anaximandre 2014

[Figure: panels A) to D) comparing ssearch, blast, klast, and usearch, with search time in seconds; separate chart of search time in minutes for blast and klast]

KLAST integration

Professional version of PLAST (BMC Bioinformatics, 2009)

Optimized for bank-to-bank sequence comparisons

Provides high speed and high-quality results

[Figure: QUERY match, [QUERY,HIT] match, and ALIGNMENT match (overlap 80%) for ssearch, blast, klast, and usearch, split into common and distinct; unlabeled values: 326832, 317526, 1390]

