Chado Controller
a database monitor for confidentiality, quality and tracking of genomic annotations
Valentin Guignon 1, 2, Gaëtan Droc 1, Claire Poiron 1, 3, Juliette Lengelle 1, Olivier Garsmeur 1, Franc-Christophe Baurens 1 and Stéphanie Bocs 1
Contact: valentin.guignon@cirad.fr
1. UMR DAP, CIRAD, Montpellier, FRANCE
2. Bioversity-France, Bioversity International, Montpellier, FRANCE
3. IMGT IGH, CNRS, Montpellier, FRANCE
1. Context
► Part of the GNPAnnot project [1]
■ plants, insects, fungi Community Annotation System (CAS)
► Integrated into the GMOD [2] framework:
■ Database: Chado [3] (PostgreSQL)
■ Visualisation: GBrowse [4] (Perl)
■ Editors: Artemis [5]/Apollo [6] (Java)
► Specific needs:
■ Feature confidentiality → Access Restriction
■ Manual annotation quality → Annotation Inspector
■ Manual annotation tracking → Annotation History
2. Access Restriction
► users and groups handled
► login/password management
(with PostgreSQL account sync.)
► scaffold to feature-level access management
► forbidden/read/write access levels supported
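The three access levels and the scaffold-to-feature granularity listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the poster does not describe how the Chado Controller stores or resolves rules, so the `Access` enum, the `rules` dictionary and the policy (a feature-level rule overrides a scaffold-level one, and the most permissive matching grant wins) are hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    """The three access levels named on the poster, ordered by permissiveness."""
    FORBIDDEN = 0
    READ = 1
    WRITE = 2


def effective_access(principals, feature, scaffold, rules, default=Access.FORBIDDEN):
    """Resolve the access level for a user.

    principals: the user's login followed by the groups the user belongs to.
    rules: dict mapping (principal, target) to an Access level, where target
           is a feature name or a scaffold name (hypothetical storage format).
    Assumed policy: the most specific target that has any rule decides, and
    among the matching principals the most permissive level is kept.
    """
    for target in (feature, scaffold):  # feature-level first, then scaffold-level
        levels = [rules[(p, target)] for p in principals if (p, target) in rules]
        if levels:
            return max(levels)
    return default


# Hypothetical rule set
rules = {
    ("annotators", "scaffold_0001"): Access.WRITE,
    ("public", "scaffold_0001"): Access.READ,
    ("alice", "Ma4001J14_g020"): Access.READ,
}

alice = effective_access(["alice", "annotators"], "Ma4001J14_g020", "scaffold_0001", rules)
bob = effective_access(["bob", "public"], "Ma4001J14_g020", "scaffold_0001", rules)
eve = effective_access(["eve"], "Ma4001J14_g020", "scaffold_0001", rules)
print(alice.name, bob.name, eve.name)
```

In this sketch alice is limited to read access on the gene despite her group's write access on the scaffold, because the feature-level rule is assumed to take precedence.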
3. Manual Annotation Inspector
► automated procedures
(auto-set qualifiers, transposable element structure)
► validation procedures
(structure, sequence content & length, introns, qualifiers and mandatory fields)
► generalisation of Controlled Vocabulary
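A validation procedure of the kind listed above might look like the following sketch. It is not the Annotation Inspector's implementation: the mandatory qualifier names, the minimum intron length and the `Gene` structure are assumptions chosen only to show structure, intron and mandatory-field checks side by side.

```python
from dataclasses import dataclass, field

# Hypothetical settings: the poster does not list the actual mandatory
# qualifiers or any intron-length threshold.
MANDATORY_QUALIFIERS = {"owner", "note"}
MIN_INTRON_LENGTH = 20


@dataclass
class Gene:
    name: str
    exons: list                                   # (start, end) pairs, 1-based inclusive
    qualifiers: dict = field(default_factory=dict)


def inspect_gene(gene):
    """Return a list of problems; an empty list means the gene passes."""
    problems = []
    exons = sorted(gene.exons)

    # Structure check: each exon must have start <= end.
    for start, end in exons:
        if start > end:
            problems.append(f"{gene.name}: exon {start}..{end} has start after end")

    # Intron check: the gap between consecutive exons must be long enough.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron_length = next_start - prev_end - 1
        if intron_length < 0:
            problems.append(f"{gene.name}: exons overlap near position {next_start}")
        elif intron_length < MIN_INTRON_LENGTH:
            problems.append(
                f"{gene.name}: intron of {intron_length} bp is shorter than {MIN_INTRON_LENGTH} bp"
            )

    # Mandatory-field check.
    for qualifier in sorted(MANDATORY_QUALIFIERS - set(gene.qualifiers)):
        problems.append(f"{gene.name}: mandatory qualifier '{qualifier}' is missing")

    return problems


gene = Gene("Ma4001J14_g020", [(1, 100), (110, 200)], {"owner": "annotator1"})
for problem in inspect_gene(gene):
    print(problem)
```

Returning a list of messages rather than raising on the first failure lets an editor show every problem in one commit message, in the spirit of the message shown in Figure 4d.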
References
1. GNPAnnot, http://www.gnpannot.org
2. GMOD, http://gmod.org
3. Chado, http://gmod.org/wiki/Chado
4. Stein LD et al. (2002) The generic genome browser: a building block for a model organism system database. Genome Res 12: 1599-610
5. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA and Barrell B (2000) Artemis: sequence visualization and annotation. Bioinformatics 16(10): 944-5. PUBMED: 11120685
6. Lewis SE, Searle SMJ, Harris N, Gibson M, Iyer V, Richter J, Wiel C, Bayraktaroglu L, Birney E, Crosby MA, Kaminker JS, Matthews B, Prochnik SE, Smith CD, Tupy JL, Rubin GM, Misra S, Mungall CJ, Clamp ME (2002) Apollo: a sequence annotation editor. Genome Biology 3(12): research0082
7. GMOD Report, http://www.aphidbase.com/aphidbase
8. Tripal, http://www.genome.clemson.edu/software/tripal
5. Integration & Compatibility
► GBrowse 1.70 [4]
► Artemis [5]
► Compatibility mode (GMOD scripts,...)
4. Annotation History
► keep track of any modification
► feature history report
► new perspectives
(Inspector, Chado undo,...)
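How a modification record could support both a feature history report and a later undo can be sketched as below. The functions and the qualifier values are hypothetical; the qualifier names are borrowed from the Figure 4 example, and the sketch assumes that a missing qualifier can be represented by `None`.

```python
def diff_qualifiers(before, after):
    """List (qualifier, old_value, new_value) for every qualifier that changed.

    None stands for "qualifier absent" (an assumption of this sketch).
    """
    changes = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes.append((key, old, new))
    return changes


def revert(after, changes):
    """Undo the recorded changes, giving back the earlier qualifier set."""
    restored = dict(after)
    for key, old, _new in changes:
        if old is None:
            restored.pop(key, None)   # qualifier was added: remove it again
        else:
            restored[key] = old       # qualifier was changed or removed: restore it
    return restored


# Hypothetical qualifier values for gene 'Ma4001J14_g020'
before = {"status": "automatic", "alternate_splicing": "no", "length": "1200"}
after = {"status": "curated", "annotator_comment": "exons merged"}

changes = diff_qualifiers(before, after)
for qualifier, old, new in changes:
    print(f"{qualifier}: {old!r} -> {new!r}")
```

Because each record keeps the old value, replaying the records in reverse is enough to rebuild an earlier state, which is the idea behind the "Chado undo" perspective mentioned above.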
6. Chado Controller in action!
► Currently used on more than 5 CAS
► 22 Mb of annotated genomic sequences
► 1982 curated genes out of 5004 predicted genes (40%)
► 2703 curated TEs out of 3819 predicted TEs (70%)
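The rounded percentages above follow directly from the counts. A short check, with a helper name (`curated_share`) that is only illustrative:

```python
def curated_share(curated, predicted):
    """Percentage of predicted features that have been manually curated."""
    return 100 * curated / predicted


genes = curated_share(1982, 5004)
tes = curated_share(2703, 3819)
print(f"genes: {genes:.1f}%")   # genes: 39.6%
print(f"TEs:   {tes:.1f}%")     # TEs:   70.8%
```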
7. Perspectives
► GMOD Report [7] integration
► Tripal [8] integration
► Apollo [6] integration
► GBrowse 2 [4] and next generations
► Chado undo/revert script
► Adaptation to other databases
Figure 1. Access Restriction - GBrowse login block.
Figure 2. Access Restriction administration interface - user management.
Figure 3. Access Restriction administration interface - contig management.
$ ./cc_compatibility.pl -h server.fr -p 5432 -d chado_db -U gnpannot_admin -on
Chado Controller Compatibility Manager v1.0.0
Please enter the password to connect to Chado (as gnpannot_admin):
Enable compatibility mode...WARNING: Chado Access Restriction module: set to compatibility mode! Access restriction DISABLED!
WARNING: Chado Annotation Inspector module: set to compatibility mode! Annotation Inspector DISABLED!
Done!
$ ... (calls to GMOD scripts for instance)
$ ./cc_compatibility.pl -h server.fr -p 5432 -d chado_db -U gnpannot_admin -off
Chado Controller Compatibility Manager v1.0.0
Please enter the password to connect to Chado (as gnpannot_admin):
Disable compatibility mode...INFO: Chado Access Restriction module: compatibility mode OFF, access restriction back to normal (enabled)!
INFO: Chado Annotation Inspector module: compatibility mode OFF, Annotation Inspector back to normal (enabled)!
Done!
Figure 6. Disabling the Chado Controller in order to run demanding scripts.
Note: Annotation History cannot be turned off.
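Since compatibility mode disables both Access Restriction and the Annotation Inspector, it is worth making sure it is switched off again even when a script fails. The sketch below wraps the two `cc_compatibility.pl` calls from Figure 6 in a Python context manager. Only the flags shown in the transcript are used; the wrapper itself, its parameter names and the injectable `run` argument are assumptions, and the script will still prompt for the password on the terminal.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def compatibility_mode(host, port, database, user,
                       script="./cc_compatibility.pl", run=subprocess.run):
    """Enable compatibility mode for the duration of a `with` block.

    `run` defaults to subprocess.run and can be replaced for dry runs.
    """
    base = [script, "-h", host, "-p", str(port), "-d", database, "-U", user]
    run(base + ["-on"], check=True)           # Access Restriction and Inspector disabled
    try:
        yield
    finally:
        run(base + ["-off"], check=True)      # always switched back, even after an error


# Dry run with a stand-in runner that only records the commands
recorded = []


def fake_run(command, check):
    recorded.append(command[-1])


with compatibility_mode("server.fr", 5432, "chado_db", "gnpannot_admin", run=fake_run):
    recorded.append("GMOD scripts run here")

print(recorded)
```

With the default `run`, a failure of the `-on` call raises `subprocess.CalledProcessError` before the block is entered, so the demanding scripts never run against a database that is still in normal mode.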
Figure 4a. GBrowse initial state
Eugene track:
- orange: automatic prediction
- magenta: curation in progress
- red: curation finished
Figure 4b. Artemis: start editing gene Ma4001J14_g020.
Figure 4c. Artemis Gene Builder: gene Ma4001J14_g020 initial status.
Figure 4d. Artemis: Annotation Inspector message on a commit.
Figure 4e. Artemis: re-opening the region after the commit.
Figure 4f. Artemis Gene Builder: re-opening the gene after the commit.
Figure 4g. GBrowse view of the result.
Figure 5a. GBrowse History: gene 'Ma4001J14_g020' before curation.
Figure 5b. GBrowse History: gene 'Ma4001J14_g020' after curation.
Figure 4. The predicted gene 'Ma4001J14_g020' was edited using Artemis to merge 2 exons, to remove the qualifiers 'alternate_splicing' and 'length', and to set the 'annotator_comment' qualifier. The 'owner', 'note' and 'color' qualifiers were auto-updated by the Annotation Inspector, which also added the qualifier 'transposable_element_gene' and replaced the CV term 'automatic' with 'curated'.
Contig description                      Average loading w/o Chado Controller (sec.)   Average loading w/ Chado Controller (sec.)   Loading time increase (%)
musa, scaffold_0001 (3624 feat.)        13.3                                          19.4                                         46
musa, scaffold_0009 (11934 feat.)       17.4                                          23.2                                         33
musa, scaffold_0021 (14872 feat.)       17.3                                          22.5                                         30
musa, scaffold_0023 (2643 feat.)        12.8                                          18.8                                         47
musatract, scaffold_0094 (11241 feat.)  13.8                                          17.5                                         27
musatract, scaffold_0116 (956 feat.)    9.5                                           13.8                                         45
sugarcane, scaffold_0027 (5321 feat.)   12.5                                          14.5                                         16
theobroma, scaffold_0002 (3724 feat.)   16                                            22                                           38
theobroma, scaffold_0047 (8040 feat.)   18.8                                          23.8                                         27
coffea, scaffold_0002 (6646 feat.)      11                                            11.8                                         8
Table I. Loading time increase related to the Chado Controller Access Restriction.
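The increase column of Table I is the relative difference between the two loading times. The sketch below recomputes it from the timings as printed; the variable and function names are illustrative only.

```python
# (contig, seconds without Chado Controller, seconds with it, increase in % as printed)
TABLE_I = [
    ("musa, scaffold_0001", 13.3, 19.4, 46),
    ("musa, scaffold_0009", 17.4, 23.2, 33),
    ("musa, scaffold_0021", 17.3, 22.5, 30),
    ("musa, scaffold_0023", 12.8, 18.8, 47),
    ("musatract, scaffold_0094", 13.8, 17.5, 27),
    ("musatract, scaffold_0116", 9.5, 13.8, 45),
    ("sugarcane, scaffold_0027", 12.5, 14.5, 16),
    ("theobroma, scaffold_0002", 16.0, 22.0, 38),
    ("theobroma, scaffold_0047", 18.8, 23.8, 27),
    ("coffea, scaffold_0002", 11.0, 11.8, 8),
]


def increase_percent(without_cc, with_cc):
    """Relative loading time increase, in percent."""
    return (with_cc - without_cc) / without_cc * 100


for contig, without_cc, with_cc, printed in TABLE_I:
    recomputed = increase_percent(without_cc, with_cc)
    print(f"{contig:26s} recomputed {recomputed:5.1f}%   printed {printed}%")
```

From the rounded timings, every row recomputes to within one percentage point of the printed value. The largest gap is the coffea row (about 7.3% against the printed 8%), possibly because the printed figures were derived from unrounded timings.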