• Aucun résultat trouvé

LONG DNA TECHNOLOGIES EVALUATION FOR PLANT STRUCTURAL VARIATIONS DETECTION : NANOPORE ONT SEQUENCING vs BIONANO GENOMICS OPTICAL MAPPING

N/A
N/A
Protected

Academic year: 2021

Partager "LONG DNA TECHNOLOGIES EVALUATION FOR PLANT STRUCTURAL VARIATIONS DETECTION : NANOPORE ONT SEQUENCING vs BIONANO GENOMICS OPTICAL MAPPING"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-02925940

https://hal.inrae.fr/hal-02925940

Submitted on 31 Aug 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

LONG DNA TECHNOLOGIES EVALUATION FOR PLANT STRUCTURAL VARIATIONS DETECTION :

NANOPORE ONT SEQUENCING vs BIONANO GENOMICS OPTICAL MAPPING

Aurélie Canaguier, Erwan Denis, Romane Guilbaud, Ghislaine Magdelenat, Caroline Belser, Benjamin Istace, Cruaud Corinne, Arnaud Lemainque,

Patrick Wincker, Marie-Christine Le Paslier, et al.

To cite this version:

Aurélie Canaguier, Erwan Denis, Romane Guilbaud, Ghislaine Magdelenat, Caroline Belser, et al.. LONG DNA TECHNOLOGIES EVALUATION FOR PLANT STRUCTURAL VARIATIONS DE-TECTION : NANOPORE ONT SEQUENCING vs BIONANO GENOMICS OPTICAL MAPPING. Plant and Animal Genome XXVIII Conference, Jan 2020, San Diego, United States. �hal-02925940�

(2)

Discordance

3- SV ONT vs SV Bionano

LONG DNA TECHNOLOGIES EVALUATION

FOR PLANT STRUCTURAL VARIATIONS DETECTION :

NANOPORE ONT SEQUENCING vs BIONANO GENOMICS OPTICAL MAPPING

Plant Biology and Breeding Division

F-78026 Versailles, France

http://www.bap.inra.fr/

INTRODUCTION

Structural Variations (SVs) identification remains in infancy in plants mainly due to the limited useful genomic resources. Long-read sequencing obtained with the MinION instrument (Oxford Nanopore Technologies (ONT)) and DLE optical maps produced by the Saphyr device (Bionano Genomics), were evaluated on their ability to identify SVs, taking as study model two Arabidopsis thaliana ecotypes, Columbia-0 (Col-0) and Landsberg erecta-1 (Ler-1).

Aurélie Canaguier

1

, Erwan Denis

2

, Romane Guilbaud

1

, Ghislaine Magdelenat

3

, Caroline Belser

3

, Benjamin Istace

3

,

Corinne Cruaud

3

, Arnaud Lemainque

3

, Patrick Wincker

3

, Marie-Christine Le Paslier

1

, Patricia Faivre-Rampant

1

and Valérie Barbe

2

Technology ONT Bionano

Ecotype Col-0 Ler-1 Col-0 Ler-1

Cumula&ve size (Mb) sequences or molecules 13 036 8 174 577 460 610 944 N50 size (Kb) 12 14 153 135 Depth haploide genome of 130Mb X100 X63 X4 440 X4 700 Cumulated size (Mb) Trimmed seq / sampled molecules 9 817 6 141 90 000 75 000 N50 size (Kb) 12.7 16.5 259 300 Depth haploide genome of 130Mb X75 X47 X700 X580 Number of con=gs or cmaps 79 101 18 14 N50 size in Mb (number) 12.5 (5) 10.7 (5) 15.5 15.4 Assembly size (Mb) 117 117 133 131

Technology Ecotype Ref. # SV Cumulated Size (Mb) Median Size (bp) # INS # DEL # INV # Others

ON T Ler-1 Col-0 1 187 12.3 3 455 591 579 12 5 Col-0 Ler-1 1 047 5.4 3 180 537 496 10 4 Bi on an o Ler-1 Col-0 589 6.7 4 332 288 293 6 2 Col-0 Ler-1 520 3 3 741 280 236 4 0

CONCLUSION

99% of the SV detected were INS and DEL. Absence of duplication could be explained by absence or by elimination of alignments shorter than 10kb in ONT analysis. In Bionano analysis it may be due to the incapacity to catch such SV precision or too high divergence between the duplicated sequences.

ONT data analysis showed around two fold SV than Bionano, more over the spectra of size is larger. As expected, Bionano SV are bigger and fewer. In our study, 97% of SV detected with Bionano data are detected with ONT, more than 83 % of them are concordant. ONT and Bionano SV are respectively at 86% and 100% located around genes and/or TE. Whereas Bionano did no SV detection in complex and too divergent regions, MUMmer can detect some. ONT SV are more fragmented and this explains 10 to 14% of conflicts by number between both technologies. Description and relevance have to be carefully check.

With our draft ONT assembly we recovered 80% of SV published in Zapata et al (2016). When we tested the detection with several sampling of corrected and trimmed sequences, we demonstrated that the assembly

obtained with X20 sequences depth, allowed catching more than 90% of the ONT SV.

Where Bionano Access is an integrative way from molecules filtering to SV detection, applying quality control based on the coverage and the label density, the ONT SV detection is the combination result of alignment, filtering parameters and detector tool for the description. According to these results, taking into account cost and difficulties to develop standard protocol of HMW plant DNA extraction, ONT technology seems to be more affordable to detect SV to date.

* filters : not in reference gap, not in organelles, types insertion (INS) deletion (DEL) inversion (INV), SV size > 1kb.

Figure 1. Analysis pipeline from ONT and Bionano data to SV comparison

MATERIAL AND METHODS

Arabidopsis thaliana Col-0 (accession 186AV) and Ler-1 (accession 213AV) seeds were obtained from the

Versailles Arabidopsis Stock Center, INRA (France).

SV were obtained using Ler-1 and Col-0 assembly and cmap, respectively versus the Col-0 reference (Arabidopsis Genome Initiative, 2000) and the Ler-1 reference (Zapata et al 2016).

Table 1. ONT and Bionano metrics

Table 2. Filtered SV and other rearrangements (jump & translocation) description

Figure 3. Distribution of filtered Ler-1 SV along the Col-0 Chr4 ( centromer, 100-Kb windows) Figure 4. Filtered SV discordance and concordance between ONT and Bionano detection

1 Université Paris-Saclay, INRAE, Etude du Polymorphisme des Génomes Végétaux (EPGV), 91000, Evry, France, 1 IR INRAE Génomique ,

2 Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057 Evry, France , 3 Genoscope, Institut François Jacob, CEA, Université Paris-Saclay, 91057 Evry, France.

Chr4 Col-0 cmap SV Ler-1 cmaps B

2- SV description

No duplication was reported in this study.

MUMmer plot output

Forward alignment Reverse alignment Bionano Access output

Col-0 reference molecules Ler-1 query molecules

Links between common labels Common labels Divergent labels Deletion in query Insertion in query B A Chr4 Col-0 reference Unitig L er-1 SMAR T d e no vo a sse mb ly A Data Processing Data Alignment SV Detec=on SV Filter * Data Assembly

Molecules sampling with Bionano Access

Bionano Access

-  with reference, genome size of 115Mb

-  Without Extend and Split

Perl script: file2.vcf to file2.bed and SV selection

SV Comparison

bedtools intersect -wa -wb -a file1.bed/file2.bed -b file2.bed/file1.bed –loj

Two SV are considered as equivalent as soon their absolute positions overlap by one base.

Sequences correction and trimming with Canu (Default parameters)

MUMmer (parameters from Zapata et al, 2016) nucmer (-c 100 -b 500 -l 50 -g 100 -L 50) delta-filter (-1 -l 10000 -i 0.95)

MUMmer show-diff

Perl script: file1.diff to file1.bed and SV selection SMARTdenovo (-k 17 -c 1 -b)

Inversion

Zapata et al, 2016

SV ONT specific - Chr01:9,835,000..9,890,000 SV Bionano specific - Chr04:11,625,000..11,672,000 (INV)

Bi on an o cma ps Col-0 Ler-1 ONT Asse mb ly&ma pp in g

ONT-Bionano conflict of SV type

Chr05:13,420,000..13,468,000

Bionano-ONT conflict of SV number

Chr01:10,568,000..10,610,000 Concordance Bi on an o cma ps Col-0 Ler-1 ONT Asse mb ly&ma pp in g Bi on an o cma ps Col-0 Ler-1 ONT Asse mb ly&ma pp in g Ecotype Ref.

ONT ONT – Bionano Bionano Bionano – ONT

specific by number/by type conflict concord. specific by number/by type conflict concord. Ler-1 Col-0 25.0 % 27.0 % / 0.4 % 47.6 % 1.0 % 14.5 / 1.2 % 83.3 % Col-0 Ler-1 38.0 % 14.3 % / 0.4 % 47.3% 2.3 % 11.0 / 1.0 % 85.7 %

Table 3. Percent of filtered SV, discordant (specific, conflictual by number or by type) and concordant (concord.) between ONT and Bionano data analysis

RESULTS

1- Assembly & Alignment : metrics and visualization

EPGV group contact

Patricia.Faivre-Rampant@inra.fr

Poster Online

ONT DEL ONT INS

ONT INS

ONT DEL Bionano INS

ONT INS ONT DEL

Bionano INS

B A

ONT-Bionano DEL - Chr04:790,000..850,000 ONT-Bionano INS - Chr04:10,850,000..10,880,000

Bionano DEL

ONT DEL

Bionano INS

ONT INS Figure 2. Alignments of Ler-1 assembly and cmaps against the Chr4 Col-0 reference

Bionano INV # va ria nt s D LE si te s C ove ra ge # T E # g en es

Références

Documents relatifs

Each sequence stored in GreenPhylDB is accessible through a specific page that contains sequence information such as TIGR or TAIR sequence ID and annotations, cross references

I-GOST (Iterative GreenPhyl Orthologous Search Tool) is a fast and user friendly comparative genomic program using a phylogeny-based method to infer orthologs and paralogs in

Each sequence stored in GreenPhylDB is accessible through a specific page that contains sequence information such as TIGR or TAIR sequence ID and annotations, cross references

Many studies have highlighted that plant formations (i.e. vegetation units with similar physiognomy such as woods, shrubs, fens, and grasslands) can be mapped with high accuracy

We use Minimap2 [10] (version 2.16-r922), with default parameters, as it is a fast and accurate mapper, specifically designed for long erroneous reads... Two corresponding to

Method We propose a novel method that aims at assigning a genotype for a set of already known SVs in a given individual sample sequenced with long reads data. In other words, the

Then, the data structure is requested with the seed matches extracted from the reads, each read being assigned to a taxonomy level with respect to

Comparing error rates between ONT 1D RNA reads and the simulated reads shows that the simulator produces reads with realistic error profiles:. Metric ONT 1D RNA reads