Project 2010
Aims:
GNPAnnot is a green genomics project that aims to develop a community system for structural and functional annotation, supported by comparative genomics and dedicated to plant and bio-aggressor genomes, allowing both automatic prediction and manual curation of genomic objects.
Gaëtan Droc1, Valentin Guignon1, Franc-Christophe Baurens1, Vincent Jouffe1, Claire Poiron1, Juliette Lengellé1, Olivier Garsmeur1, Mathieu Rouard2, Stéphanie Bocs1
2 CfL, Bioversity, Montpellier
Michael Alaux3, Leatitia Brigitte3, Delphine Steinbach Samson3, Erik Kimmel3, Cyril Pommier3, Isabelle Luyten3, Nancy Terrier5, Philippe Leroy4, Hadi Quesneville3
4 UMR GDEC, INRA, Clermont
5 UMR SPO, INRA, Montpellier
Joelle Amselem6, Baptiste Brault6, Adeline Simon6, Victoria Dominguez Del Angel6, Claire Hoede6, Sabine Fillinger6, Michel Meyer6, Thierry Rouxel6, Marc-Henri Lebrun6
6 UMR BIOGER, INRA, Versailles
7 UMR BIO3P, INRA, Le Rheu
8 UMR BIVI, INRA, Montpellier
References: http://www.gnpannot.org/ http://www.gmod.org
Contact: stephanie.sidibe-bocs@cirad.fr michael.alaux@versailles.inra.fr fabrice.legeai@rennes.inra.fr joelle.amselem@versailles.inra.fr
Done with plant, insect, fungal genomic sequences:
- Predictions of protein-coding genes and transposable elements
- CAS core roundtrips: Chado, GBrowse, Apollo, Artemis
- Feature, qualifier, value, annotation rule definitions
- Annotator training courses & manual curation of biological features
- GMOD report development
- Chado controller development to manage access rights, annotation inspector & history
- In collaboration with the GnpIntegr project, advanced search user interface / query builders: BioMart, Hibernate Search (Lucene)
- Communications (posters, talks, Web site)
1 UMR DAP, CIRAD, Montpellier
Fabrice Legeai7, Goulven Kerbellec9, Olivier Collin10, Jean-Pierre Gauthier7, Emmanuelle d’Alençon8, François Cousserans8, Philippe Fournier8, Denis Tagu7
9 Korilog SARL, Muzillac
10 IRISA, INRIA, Rennes
3 URGI, INRA, Evry
Per organism: unit, genomic size (Mb), then Gene nb and TE nb, each as predicted / curated / current. Rows that report fewer than six counts list them in the order given. Two cells of the table are marked "In progress".
Montpellier, South & Tropical plants:
- DAP, Banana: 7.13 Mb; genes 1378 / 441 / 1298; TEs 3836 / 1279 / 2095
- CfL, Palm tree: 0.27 Mb; genes 43 / 30 / 41; TEs 5 / 5 / 9
- Sugarcane: 1.30 Mb; 133, 113
Versailles, Wheat & grapevine (units URGI, SPO, GDEC):
- Grapevine: 480.00 Mb; 26346
- Wheat 3B: 18.21 Mb; 175, 175, 10782, 3222, 3222
Versailles, Fungi (unit BIOGER):
- Botrytis: 39.50 Mb; 16360, 1096, 32
- Leptosphaeria: 44.90 Mb; 12469, 0, 1850, 472
- Tuber: 124.90 Mb; 7496, 1307, 2520, 0
Rennes, Insects:
- BIO3P, Aphid: 460.00 Mb; genes 34821 / 1926 / 34547; TEs 498474 / ~800 / 498474
- BIVI, Lepidopteran: 4.00 Mb; genes 1086 / 70 / 1086; TEs 2027 / 0 / 2027
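The statistics above can be turned into a rough curation-progress figure. The following is a minimal sketch, assuming that curated genes are a subset of predicted genes (the poster does not state this) and using only the four organisms whose gene columns are all reported. The function name `curated_share` is introduced here for illustration.

```python
# Gene counts (predicted, curated) copied from the statistics rows that
# report every gene column. Assumption: "curated" is a subset of "predicted".
gene_counts = {
    "Banana": (1378, 441),
    "Palm tree": (43, 30),
    "Aphid": (34821, 1926),
    "Lepidopteran": (1086, 70),
}


def curated_share(predicted, curated):
    """Return the curated count as a percentage of the predicted count."""
    return 100.0 * curated / predicted


shares = {name: curated_share(p, c) for name, (p, c) in gene_counts.items()}

for organism, share in shares.items():
    print(f"{organism}: {share:.1f}% of predicted genes curated")
```

Under that assumption, the small plant regions (banana, palm tree) show a much higher curated share than the whole-genome insect projects, which is consistent with manual curation being region-focused.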
Component core (platforms: Montpellier, Versailles, Rennes)
Gene structure automatic annotation: EuGène, EuGène, TriAnnot
Gene function & genome comparison: in-house pipeline, Funannot pipeline, MAUVE
TE automatic annotation: REPET, REPET, REPET
DBMS: Postgres Chado, Postgres Chado, MySQL BioDBSeqFeat, Postgres Chado
Genome browser: GBrowse, GBrowse, GBrowse
Genome editor: Artemis, Apollo, Apollo
Synteny viewer: Apollo, CMap
Search & query builder: BioMart, Hibernate Search, Apache Lucene
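Chado stores every genomic object as a row of a `feature` table whose `type_id` points to a controlled-vocabulary term (`cvterm`), typically from the Sequence Ontology. The sketch below illustrates how per-organism counts like "Gene nb" and "TE nb" could be derived from such storage. It is only an illustration: it uses an in-memory SQLite database as a stand-in for PostgreSQL, keeps just a few columns of the real `organism`, `cvterm` and `feature` tables, and the inserted rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for a tiny subset of the Chado schema
# (assumption: the real platforms use PostgreSQL with the full schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (
    feature_id  INTEGER PRIMARY KEY,
    organism_id INTEGER REFERENCES organism(organism_id),
    uniquename  TEXT,
    type_id     INTEGER REFERENCES cvterm(cvterm_id)
);
""")

# Hypothetical example rows, not taken from any GNPAnnot database.
conn.execute("INSERT INTO organism VALUES (1, 'Musa', 'acuminata')")
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "gene"), (2, "transposable_element")])
conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)",
                 [(1, 1, "gene0001", 1),
                  (2, 1, "gene0002", 1),
                  (3, 1, "te0001", 2)])

# Count features per organism and per feature type.
query = """
SELECT o.genus, c.name, COUNT(*)
FROM feature f
JOIN organism o ON o.organism_id = f.organism_id
JOIN cvterm c ON c.cvterm_id = f.type_id
GROUP BY o.genus, c.name
ORDER BY o.genus, c.name
"""
rows = conn.execute(query).fetchall()
for genus, feature_type, count in rows:
    print(genus, feature_type, count)
```

Typing features through `cvterm` rather than through dedicated tables is what lets the same storage serve genes and transposable elements alike.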
Results:
Architecture of GNPAnnot CAS in three bioinformatics platforms
GNPAnnot CAS resource statistics
Database storage: Chado with controller
Gene databanks: UniProt (Swiss-Prot, TrEMBL), GenBank / EMBL / DDBJ, EST databanks, MSU (rice) / TAIR (Arabidopsis)
TE databanks: Repbase, TREP, Plant Repeat Database, Internal
Ontologies: Sequence Ontology, Gene Ontology, Feature Property
Prediction pipeline / Annotation storage / Annotation browser / Annotation editor
GFF3
Query builders: InterMine, BioMart, Hibernate Search, GnpIntegr
Comparative genomics viewers: CMap, GBrowse_syn, Apollo synteny viewer, Artemis Comparison Tool
Genome browsers: GBrowse with access rights, GMOD report, annotation history
Genome editors: Apollo, Artemis with inspector
Gene automatic annotation: EuGène
- Structure combiner (nucleotide): EugeneIMM, FGENESH, SpliceMachine, Gth, BLASTx
- Refinement, structure (nucleotide region) and function (protein): Gth, tBLASTn | prot4EST | frameDP, Exonerate, BLASTp / BBMH, InterProScan
Comparative genomics: Ensembl Compara, Greenphyl
Repeat automatic annotation: REPET
- Structure combiner (nucleotide): BLASTER / BLASTn, RepeatMasker, CENSOR, MATCHER, TRF, Mreps, BLASTER / tBLASTx
- Other repeat analyses: RepSeek, LTR_STRUC, LTR_Finder, LTRharvest, TE nest, FINDMITE
Other ISs (e.g. GnpIS): DDBJ / EMBL / GenBank, EST (GnpSeq, ESTtik), Marker (SIReGal, TropGene), Metabolism (BioCyc, KEGG)
Interoperability: faa, fna, GFF3, EMBL, db_xref, EC_number, nwk, clustalW
Team work: Wiki, Alfresco, JIRA, CVS/SVN, Drupal
Other storage: Ensembl, CMap, GBrowse_syn, InterMine, BioMart, flat files
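GFF3 appears throughout the architecture as the exchange format between prediction pipelines, storage, browsers and editors. As an illustration of what such a roundtrip manipulates, here is a minimal sketch of a GFF3 feature-line parser. It is not the parser used by GNPAnnot or by any GMOD tool; the function name `parse_gff3_line` and the example line (sequence name, identifiers, cross-reference) are hypothetical.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments or blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    # Column 9 holds tag=value pairs separated by ';'; a tag may carry
    # several comma-separated values, and special characters are URL-escaped.
    attributes = {}
    if attrs != ".":
        for pair in attrs.split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]

    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),      # 1-based, inclusive
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": None if phase == "." else int(phase),
        "attributes": attributes,
    }


# Hypothetical example line, not taken from any GNPAnnot data set.
example = "chr1\tEuGene\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=hypothetical%20gene;Dbxref=UniProtKB:P12345"
feature = parse_gff3_line(example)
print(feature["type"], feature["start"], feature["end"], feature["attributes"])
```

Keeping attribute values as lists preserves multi-valued tags such as `Dbxref`, which is where cross-references comparable to the EMBL `db_xref` qualifier would travel.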
Concept:
The Community Annotation System (CAS) is user-friendly, generic, modular, portable, sustainable, upgradable and compatible.
Figure panels: History, GBrowse, Artemis, GMOD report
Ongoing work:
- JBrowse
- Annotation extractors, reconcilers & updaters (new genomic sequence, new gene annotation, other gene annotation set, new assembly of a genomic sequence)
- Comparative genomics
- Bioinformatics platform exchanges
- Integration of annotation history in the GMOD report
- Interoperability with other systems
- Communication (CeCILL licences, publications)