• Aucun résultat trouvé

MINIA on a Raspberry Pi, Assembling a 100 Mbp Genome on a Credit Card Sized Computer

N/A
N/A
Protected

Academic year: 2021

Partager "MINIA on a Raspberry Pi, Assembling a 100 Mbp Genome on a Credit Card Sized Computer"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-00842027

https://hal.inria.fr/hal-00842027

Submitted on 6 Jul 2013

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

MINIA on a Raspberry Pi, Assembling a 100 Mbp Genome on a Credit Card Sized Computer

Guillaume Collet, Guillaume Rizk, Rayan Chikhi, Dominique Lavenier

To cite this version:

Guillaume Collet, Guillaume Rizk, Rayan Chikhi, Dominique Lavenier. MINIA on a Raspberry Pi, Assembling a 100 Mbp Genome on a Credit Card Sized Computer. JOBIM - Journées Ouvertes en Biologie, Informatique et Mathématiques, Jul 2013, Toulouse, France. 2013. �hal-00842027�

(2)

Minia on Raspberry Pi

Assembling a 100 Mbp genome on a Credit Card Sized Computer

Guillaume Collet, Guillaume Rizk, Rayan Chikhi, Dominique Lavenier

References

[1] G. Rizk, D. Lavenier, and R. Chikhi (2013). DSK: k-mer counting with very low memory usage. Bioinformatics, 29(5), 652-653.

[2] R. Chikhi and G. Rizk (2012). Space-efficient and exact de Bruijn graph representation based on a Bloom filter. In Lecture Notes in Computer Sci- ence (Ed.), wabi (Vol. 7534, pp. 236–248).

[3] D. Zerbino and E. Birney (2008). Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 18, 821-829.

[4] R. Li et al. (2012). SOAPdenovo2: an empirical improved memory-efficient short-read de novo assembler. GigaScience, 1(1), 1-6.

[5] G. Alexey et al. (2013). QUAST: quality assessment tool for genome as- semblies. Bioinformatics, 29(8), 1072-1075.

References

Compact de Bruijn graph data structure

CGC GCT

AGC

CGA

GAG

AAA TGG

GGA

TTG ATT

CTA

ATC TAT TCC

CCG

Real Kmer

Critical False Positive

False Positive

Bloom Filter 10

00 01 10 00

k

i=1

aii mod 10 a1...ak

ATC CCG

TCC

CGC 0

0 5 6

Critical False Positives

ATT

CGA

AGC

+

Figure 2: The probabilistic de Bruijn graph representation is obtained by inserting all the k-mers in a Bloom filter. Querying the Bloom filter for the membership of a k-mer may return a false positive answer. To avoid false positives, and consequently false branching, we propose to store the critical false positives only, in a separate structure. Thus false positives are not reachable.

K-mer counting with DSK

First Pass

Second Pass

Sort in RAM

Write on disk Max in RAM

Max on disk

Kmer Count

>read01

>read02

Figure 1: K-mer counting is performed by the fixed-memory and fixed-disk space algorithm DSK (Disk Streaming of K-mers). The set of k-mers is divided in partitions (colored boxes). Each k-mer is written only once on disk. Then, each partition is sorted and k-mers are counted. The trade-off between memory, disk-space, and computation allows to use DSK on a very small system.

C. elegans assembly on Raspberry Pi

Method Minia SOAPdenovo Velvet

System Raspberry Pi 64GB/Xeon E5462 64GB/Xeon E5462

CPU Time (h) 18.9 6.25 13.5

Peak memory (GB) 0.2 29.6 30.6

Number of contigs (K) 29.5 29.5 28.2

Longest contig (Kbp) 75.2 90.9 62.6

Contig N50 (bp) 5741 5975 6031

Sum (Mbp) 86.4 88.3 90.4

Misassemblies 12 7 419

Genome fraction (%) 80.9 82.8 85.0

mismatches (per 100 kbp) 3.2 0.75 25.6

Table 1: De novo C elegant contigs assembled by Minia [2], SOAPdenovo2 [4], and Velvet [3]. Assembly quality was computed using the QUAST software [5].

MINIA and Velvet were single-threaded. For SOAPdenovo2, the CPU time is the sum for each thread.

Our experiment consists in assembling the nematode

C. elegans. We used

33 million unfiltered paired-end reads of length 100 bp (SRR065390), covering the genome at about 64x. Paired-end information was not used.

Picontigotron

Live demo during

the poster sessions

Raspberry Pi

®

35€ARM11 700MHz 512 MB RAM 16 GB sdcard

32 GB flash drive

MINIA applications

Human genome assembly with less than 6 GB RAM

http://kissplice.prabi.fr/

https://colibread.inria.fr/mapsembler2/

https://colibread.inria.fr/read2snps/

http://minia.genouest.org/

MINIA: contig de novo assembler

This work shows that the genome assembly program MINIA is able to assem- ble a 100 Mbp genome on a Raspberry Pi. The MINIA software was developed to drastically reduce the memory footprint needed for genome assembly, en- abling human genomes to be assembled on a desktop computer. The efficiency of MINIA is based on the DSK k-mer counting [1] and a compact de Bruijn graph data structure [2]. Here we show that it is also able to successfully as- semble a genome on a very low-end, low-power system with 512 MB RAM and a 32 GB flash drive.

Références

Documents relatifs

Capturing images and video using a Raspberry Pi Camera Module is a great way to get started with programming and making computing tangible:.. students can use button presses,

Pi model B and case, from PC Supplies LTD UP - PART 2 1 2 BACKING Keeping the SD card images safe and restoring backups GAFFER TAPE 1 6 BASH Learn some lashup scripts with

You’ve got yourself a brand new Raspberry Pi and you’re ready to start using it for learning code, creating amazing projects, or just simply to power a home theatreW. While

In order to build the project, you’ll need a Raspberry Pi, official Camera Module, and that fake CCTV housing.. “The Camera Module is the most expensive part,”

Most of the robot will be easy to take apart – just remove a few wires – but you’ll also have to snip those cable ties and probably unsolder the battery pack and extra wires on the

Open-source licences are designed to make it easy to share your software and hardware with the community of makers, allowing other people to have fun building your project,

Pour ne garder que les contours d'une image, on peut rendre celle-ci floue puis réaliser la différence entre l'image de base et sa version floue.. Ceci ne conserve que

Matériel utilisé : 1 LED, 1 Photodiode, 1 Résistance pour chaque, 1 Bouton Difficulté : Moyen (pour l’arrêt du programme avec le bouton). Démonstration de notre détecteur