• Aucun résultat trouvé

SVJedi : Structural variation genotyping using long reads

N/A
N/A
Protected

Academic year: 2021

Partager "SVJedi : Structural variation genotyping using long reads"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-02290884

https://hal.inria.fr/hal-02290884

Submitted on 18 Sep 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

SVJedi : Structural variation genotyping using long reads

Lolita Lecompte, Pierre Peterlongo, Dominique Lavenier, Claire Lemaitre

To cite this version:

Lolita Lecompte, Pierre Peterlongo, Dominique Lavenier, Claire Lemaitre. SVJedi : Structural vari- ation genotyping using long reads. HiTSeq 2019 - Conference on High Throughput Sequencing, Jul 2019, Basel, Switzerland. �hal-02290884�

(2)

SVJedi : Structural variation genotyping using long reads

Lolita Lecompte 1 , Pierre Peterlongo 1 , Dominique Lavenier 1 , Claire Lemaitre 1

1

Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France

Introduction

Structural variations (SVs) are DNA seg- ments that have been moved regarding another reference genome. The number of known SVs is increasing, especially with the development of third generation sequencing technologies, which produce long reads data.

Discovery vs. Genotyping

SV discovery

Given a reference genome, discover SVs be- tween a sequencing sample and a reference genome.

SVIM (2019), Sniffles (2018), pbsv (2018), NanoSV (2017)

SV genotyping

Given a set of already known variants for a species, assess the presence or absence of each variant in a given sequencing sample.

There is no known method for SV genotyping with long reads.

Method - SVJedi

Results on a simulated dataset

Simulated dataset

1,000 deletions selected on the human chromosome 1 from dbVar

(50 bp - 10k bp)

Equally distributed : 0/0, 0/1, 1/1

Long reads simulated with SimLoRD [1]

16 % error rate, 20X sequence depth

SVJedi

0/0 0/1 1/1 ./.

T ruth

0/0 330 3 0 0

0/1 19 305 6 4

1/1 3 10 311 9

Precision : 95.85 %

Does SV discovery tools correctly estimate genotypes?

Sniffles [2]

0/0 0/1 1/1

T ruth

0/1 3 89 0

1/1 0 254 60

Recall : 60.87 % Precision : 36.70 %

80.9 % of 1/1 deletions are wrongly genotyped as 0/1

Results on a real dataset

Dataset

Sample: NA12878

Data: Nanopore WGS consortium (37X)

svclassify [3]: 2,677 high confidence dele- tions

sv classify

SVJedi

0/0 0/1 1/1 ./.

0/1 194 1,289 202 28

1/1 8 75 877 4

Identity : 81.89 % Predicted genotypes = 98.81 %

Runtime : 2h35 (40 CPU)

Conclusion

▶ New method to genotype SVs using long reads data.

▶ Importance of dedicated genotyping methods.

▶ Currently only applied to deletions.

▶ Implementation: SVJedi

References

[1] Bianca K Stöcker, Johannes Köster, and Sven Rahmann. Simlord: simulation of long read data. Bioinformatics, 32(17):2704–2706, 2016.

[2] Fritz J Sedlazeck, Philipp Rescheneder, Moritz Smolka, Han Fang, Maria Nattestad, Arndt von Haeseler, and Michael C Schatz. Accurate detection of complex structural variations using single-molecule sequencing. Nature methods, 15(6):461, 2018.

[3] Hemang Parikh, Marghoob Mohiyuddin, Hugo YK Lam, Hariharan Iyer, Desu Chen, Mark Pratt, Gabor Bartha, Noah Spies, Wolfgang Losert, Justin M Zook, et al. svclassify: a method to establish benchmark structural variant calls. BMC genomics, 17(1):64, 2016.

https://github.com/llecompte/SVJedi

Contact: lolita.lecompte@inria.fr

Références

Documents relatifs

Hay que establecer políticas para cada uno de los sectores importantes de la COSUDE (Agencia Suiza para el Desarrollo y la Cooperación); por tanto también para el sector

We compared the accuracy and sensitivity of graph-based sequence variant genotyping using the Graphtyper software to two widely- used methods, i.e., GATK and SAMtools, which rely

MaskAl combines a fast preprocessing step on raw genomic data — filtering and masking — with established algorithms to align sanitized reads, from which sensitive parts have been

ELECTOR then compares three different versions of each read: the ‘uncorrected’ ver- sion, as provided by the sequencing experiment or by the read simulator, the ‘corrected’

To assess genetic diversity with an improved resolution, we have selected 65 accessions with diploid and triploid combinations of the A and/or B genomes including AAB plantains and

A lattice experiment with three replicates was established in two years for evaluating four important yield traits: plant height (in m), plot weight (in kg), sucrose content (or

Trajectories that are restricted to the edge via this algorithm eventually converge to the edge state, which can be a fixed point like in spatially periodic plane Couette flow [50],

This edit script can easily be retrieved by Daccord, as it relies on DALIGNER (Myers, 2014) to compute actual alignments between the long reads. However, as CONSENT relies on a