
The Cacao Criollo Genome v2.0

An Improved Version of the Genome for Genetic and Functional Genomic Studies

X. Argout

Cacao Advanced Omics Workshop. PAG 2017

Introduction


Introduction : Genome v1.0

Strategy:

Assembly:

[Slide figures not recovered; only the unlabeled values 473.8 and 178 survive.]

Introduction

Why improve the Criollo genome?

Gene coverage : 98%

Genome anchored : 66.8%

Many genes (5269) located on the unknown chromosome

Important for genetic studies (GWAS, QTL resolution)

Candidate gene studies

Important for genomic selection


Introduction

Why was the Criollo genome fragmented?

[Diagram: Genome with repeated sequences (TEs = 35.4%) → Contig assembly → Contigs → Scaffolding with small insert size libraries → Assembly STOP]

Introduction

Solution

[Diagram: Genome with repeated sequences → Contig assembly → Contigs → Scaffolding with large insert size libraries, OR a second option not recovered from the slide]

Criollo genome V2

Materials

Assembly V1 contigs

Illumina mate-pair libraries :

• 3-5 kb : coverage 23x

• 5-8 kb : coverage 21x

• 8-11 kb : coverage 11x

• 11-15 kb : coverage 6x

8x error-corrected PacBio data

BAC ends (v1)


Criollo genome V2

Materials

Progeny UF676 x ICS95 held in French Guiana

Genotyping By Sequencing data for 450 individuals

4 857 SNPs for scaffold anchoring

Cocoa chloroplast genome (Kane et al., 2012)

Cotton mitochondrion genome (Liu et al., 2012)


Criollo genome V2

Step 1 : Chloroplast and mitochondrion contig identification

Chloroplast

Sequence homology search (> 80%)

37 v1 contigs removed

Mitochondrion

No cocoa mitochondrion genome available yet

Homology search against the cotton sequence

21 v1 contigs removed

Contigs < 1000 bp removed

25 527 contigs kept out of 25 912
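The filtering in Step 1 can be sketched as follows. This is a minimal illustration, not the pipeline used for the assembly: the `Contig` class, the `organelle_hit` flag and the toy data are hypothetical, and the homology search itself is assumed to have been run beforehand. If the three filters do not overlap, the slide's counts imply 25 912 - 37 - 21 - 25 527 = 327 contigs dropped for being shorter than 1000 bp.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int          # contig length in bp
    organelle_hit: bool  # True if a prior homology search flagged it (assumed input)


def filter_contigs(contigs, min_length=1000):
    """Keep contigs that are not organelle-like and are at least min_length bp long."""
    return [c for c in contigs if not c.organelle_hit and c.length >= min_length]


# Toy data, for illustration only.
contigs = [
    Contig("ctg1", 15000, False),
    Contig("ctg2", 800, False),    # shorter than 1000 bp, removed
    Contig("ctg3", 120000, True),  # chloroplast-like, removed
    Contig("ctg4", 1001, False),
]
kept = filter_contigs(contigs)
print([c.name for c in kept])
```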


Criollo genome V2

Step 2 : contigs v1 with PE 3-5kb

Scaffolding test with SSPACE

Scaffold validation with genetic data

Composite scaffolds (scaffolds with genetic markers located in different linkage groups)

Hypothesis : composite contigs v1?
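The validation idea above, flagging scaffolds whose markers fall in more than one linkage group, can be sketched like this. It is an illustrative check only and does not reproduce ScaffRemodler or SSPACE; the function name, the tuple layout and the `min_markers_per_group` threshold are assumptions made for the example.

```python
from collections import defaultdict


def find_composite_scaffolds(markers, min_markers_per_group=1):
    """Return scaffolds supported by markers from more than one linkage group.

    markers: iterable of (marker_id, scaffold_id, linkage_group) tuples.
    A linkage group only counts if at least min_markers_per_group markers
    on the scaffold belong to it (hypothetical guard against stray markers).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for _marker_id, scaffold_id, linkage_group in markers:
        counts[scaffold_id][linkage_group] += 1

    composite = {}
    for scaffold_id, per_group in counts.items():
        supported = {lg: n for lg, n in per_group.items() if n >= min_markers_per_group}
        if len(supported) > 1:
            composite[scaffold_id] = supported
    return composite


# Toy data, for illustration only.
markers = [
    ("snp1", "scaffold_1", "LG1"),
    ("snp2", "scaffold_1", "LG1"),
    ("snp3", "scaffold_2", "LG3"),
    ("snp4", "scaffold_2", "LG7"),  # second linkage group on the same scaffold
    ("snp5", "scaffold_2", "LG7"),
]
print(find_composite_scaffolds(markers))
```

Raising `min_markers_per_group` to 2 would require at least two markers per linkage group before calling a scaffold composite, which trades sensitivity for robustness to single genotyping errors.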


Criollo genome V2

Step 2 : contigs v1 with PE 3-5kb

ScaffRemodler


Criollo genome V2

Step 2 : contigs v1 with PE 3-5kb

Identification of 25 composite contigs


Criollo genome V2

Step 3 : Scaffolding PE 3-5kb

Consistent with genetic data

24 scaffolds to split (ScaffRemodler)

Inserted contig file:

Total number of contigs = 25527
Sum (bp) = 290549573
Total number of N's = 294
Sum (bp) no N's = 290549279
GC Content = 34.20%
Max contig size = 189922
Min contig size = 1001
Average contig size = 11382
N25 = 36708
N50 = 19777
N75 = 9714

After scaffolding lib3-5:

Total number of scaffolds = 4383
Sum (bp) = 303905568
Total number of N's = 13371700
Sum (bp) no N's = 290533868
GC Content = 34.20%
Max scaffold size = 2464003
Min scaffold size = 1004
Average scaffold size = 69337
N25 = 378444
N50 = 189145
N75 = 86916
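For readers unfamiliar with the N25, N50 and N75 figures reported at each scaffolding step, the sketch below shows the usual definition: the length of the sequence at which the running sum, taken from longest to shortest, first reaches the given fraction of the total assembly size. The function name `nx` is hypothetical, and the exact convention of the tool that produced the statistics (for example whether N's are counted) is not stated in the slides.

```python
def nx(lengths, fraction):
    """Return the Nx statistic (fraction=0.5 gives N50) for a list of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = sum(lengths) * fraction
    running = 0
    # Walk from the longest sequence down until the threshold is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Toy data, for illustration only (total = 300 bp).
lengths = [80, 70, 50, 40, 30, 20, 10]
print(nx(lengths, 0.25), nx(lengths, 0.50), nx(lengths, 0.75))
```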


Criollo genome V2

Step 4 : Scaffolding PE 5-8 kb

Consistent with genetic data

Inserted contig file:

Total number of contigs = 4407
Sum (bp) = 303893817
Total number of N's = 13367168
Sum (bp) no N's = 290526649
GC Content = 34.20%
Max contig size = 2464003
Min contig size = 1004
Average contig size = 68957
N25 = 378444
N50 = 188269
N75 = 86304

After scaffolding lib5-8:

Total number of scaffolds = 1906
Sum (bp) = 312317588
Total number of N's = 21790939
Sum (bp) no N's = 290526649
GC Content = 34.20%
Max scaffold size = 2803292
Min scaffold size = 1004
Average scaffold size = 163860
N25 = 894026
N50 = 439422
N75 = 226508

Criollo genome V2

Step 5 : Scaffolding PE 8-11kb

Consistent with genetic data

Inserted contig file:

Total number of contigs = 1910
Sum (bp) = 312312945
Total number of N's = 21786304
Sum (bp) no N's = 290526641
GC Content = 34.20%
Max contig size = 2803292
Min contig size = 1004
Average contig size = 163514
N25 = 894026
N50 = 439422
N75 = 226508

After scaffolding lib8-11:

Total number of scaffolds = 1271
Sum (bp) = 315916265
Total number of N's = 25389624
Sum (bp) no N's = 290526641
GC Content = 34.20%
Max scaffold size = 3771893
Min scaffold size = 1004
Average scaffold size = 248557
N25 = 1211276
N50 = 709384
N75 = 343075


Criollo genome V2

Step 6 : Scaffolding PE 11-15kb

Consistent with genetic data

Inserted contig file:

Total number of contigs = 1271
Sum (bp) = 315916265
Total number of N's = 25389624
Sum (bp) no N's = 290526641
GC Content = 34.20%
Max contig size = 3771893
Min contig size = 1004
Average contig size = 248557
N25 = 1211276
N50 = 709384
N75 = 343075

After scaffolding lib11-15:

Total number of scaffolds = 980
Sum (bp) = 318241244
Total number of N's = 27714603
Sum (bp) no N's = 290526641
GC Content = 34.20%
Max scaffold size = 4705272
Min scaffold size = 1004
Average scaffold size = 324735
N25 = 1580640
N50 = 906533
N75 = 467617

Criollo genome V2

Step7 : Scaffolding Bac Ends

Consistent with genetic data

Inserted contig file:

Total number of contigs = 981
Sum (bp) = 318235570
Total number of N's = 27708931
Sum (bp) no N's = 290526639
GC Content = 34.20%
Max contig size = 4705272
Min contig size = 1004
Average contig size = 324399
N25 = 1580640
N50 = 906533
N75 = 467617

After scaffolding Libsanger:

Total number of scaffolds = 554
Sum (bp) = 325168055
Total number of N's = 34641416
Sum (bp) no N's = 290526639
GC Content = 34.20%
Max scaffold size = 14867920
Min scaffold size = 1004
Average scaffold size = 586945
N25 = 9230816
N50 = 5324109
N75 = 2107648

Criollo genome V2

Step 8 : Gap closing

554 scaffolds = 325 Mb

Ns = 10.6% of the assembly before gap closing

Gap closing with a mix of PacBio + Illumina PE reads : Ns = 5.6% of the final assembly

Next step : anchor scaffolds to the 10 chromosomes
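The share of Ns quoted for the 554 scaffolds can be cross-checked against the Step 7 statistics (Sum (bp) = 325168055, Total number of N's = 34641416). The helper below is a hypothetical convenience function, not part of any tool named in the slides; it gives roughly 10.65%, in line with the figure on the slide.

```python
def n_fraction(total_bp, n_bp):
    """Fraction of an assembly made of unknown bases (N's)."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    if not 0 <= n_bp <= total_bp:
        raise ValueError("n_bp must be between 0 and total_bp")
    return n_bp / total_bp


# Values copied from the Step 7 statistics after BAC-end scaffolding.
step7_total_bp = 325_168_055
step7_n_bp = 34_641_416

pct_n = 100 * n_fraction(step7_total_bp, step7_n_bp)
print(f"Ns before gap closing: {pct_n:.2f}% of {step7_total_bp / 1e6:.0f} Mb")
```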


Criollo genome V2

Step 9 : Chromosome anchoring

4857 SNP markers

Grouping with JoinMap software

Pairwise data export and study of recombination frequencies between markers to anchor and orient scaffolds (ScaffHunter program)
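The pairwise recombination frequency used for anchoring can be illustrated with a deliberately simplified estimator. The sketch assumes phase-known markers with two genotype classes coded 0 and 1 (a backcross-like coding) and missing calls coded as None; the real UF676 x ICS95 progeny comes from two heterozygous parents, so JoinMap and ScaffHunter handle segregation types and phases that this example ignores. The function name and toy genotypes are hypothetical.

```python
def recombination_fraction(genotypes_a, genotypes_b):
    """Share of individuals whose genotype codes differ between two markers.

    Individuals with a missing call (None) at either marker are skipped.
    """
    informative = [
        (a, b)
        for a, b in zip(genotypes_a, genotypes_b)
        if a is not None and b is not None
    ]
    if not informative:
        raise ValueError("no individual genotyped at both markers")
    recombinants = sum(1 for a, b in informative if a != b)
    return recombinants / len(informative)


# Toy genotypes for ten individuals, for illustration only.
m1 = [0, 0, 1, 1, 0, 1, 0, 1, None, 1]
m2 = [0, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # tightly linked to m1
m3 = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]  # more loosely linked to m1

print(recombination_fraction(m1, m2), recombination_fraction(m1, m3))
```

Markers at the two ends of a scaffold that show low recombination with the markers of a neighbouring scaffold suggest adjacency, and comparing both ends gives a hint about orientation.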


Criollo genome V2

Step 9 : Chromosome anchoring

10 chromosomes


Comparison V1 - V2

Final assembly : 554 scaffolds (4,792 in V1)

N50 : 6.5 Mb (0.47 Mb in V1)

Genome anchored : 314.2 Mb (218.4 Mb in V1) = 96.7% anchored

Large reduction of the unknown chromosome : 10.5 Mb (108.5 Mb in V1)

Unknown sites (Ns) : 5.7% (10.8% in V1)

99% of genes anchored to chromosomes


Tc00 integration into v2


Criollo genome V2

Step 11 : Annotation

Structural annotation transferred from v1 to v2

New annotation carried out by NCBI (RefSeq)

Update of functional annotation : BLAST/Swiss-Prot, InterPro, KEGG

Availability

http://cocoa-genome-hub.southgreen.fr


Comparison V2 Criollo/Matina1-6


Guillaume MARTIN

Gaëtan DROC

Claire LANAUD

Karine LABADIE

Jean Marc AURY
