• Aucun résultat trouvé

07/02/2013 1

N/A
N/A
Protected

Academic year: 2021

Partager "07/02/2013 1"

Copied!
5
0
0

Texte intégral

(1)

A community annotation system applied to sugarcane sequences

Olivier GARSMEUR ICSB workshop

International Plant & Animal Genome XXI January 12-16, 2013 - San Diego, CA, USA

Identify the different elements of the genome, (location and stucture) :

1 - structural annotation

Introduction to concepts and methods

Attribute a biological information to these elements : 2 - functional annotation Two main concepts:

Automatic annotation – Gene prediction

Intrinsec methods (ab-initio)

- Only based on computational analysis using statistical models. - Probabilistic models like Hidden Markov models (HMM) for discriminating coding and non-coding region of the genome.

-Need a training set genes. (learning)

The learning set is composed of several hundred of gene-sequences manually annotated derived from cDNAs / genomic alignments. Ideally, these genes represent the diversity of the genes that can be found in the genome.

The sequences of exons, splice sites and introns have different statistical properties:

- GC%  Introns are AT rich and

- splicing site is almost GT / AG (for plants 95%)

Extrinsec methods

Automatic annotation – Gene prediction

Based on comparative approaches

= sequence similarities

The sequence to annotate is compared with databases. ADNg Protéine Alignement ADNg - Protéine Alignement ADNg - ADNc ADNc ADNg Alignement ADNg - ADNg BLASTX Genome Threader Exonarate BLASTN

Automatic annotation : Extrinsic methods

database of predictive protein "signatures" can be used for the classification and automatic annotation of proteins.

Interproscan classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains and important sites.

conserved protein domains = signatures

Domain databases used by interproscan: Prosite patterns Pfam ProDom Superfamily TIGRFAMs GENE3D HAMAP PANTHER PIRSF http://www.blast2go.com/b2ghome

Automatic annotation : Pipeline for functional annotation

Blast2GO is a bioinformatics tool for functional annotation of sequences, primarily based on the gene ontology (GO) vocabulary.

Transfers the function from homologous sequences through an elaborate algorithm that considers similarity, the extension of the homology, the database of choice, the GO hierarchy. Its also supports InterPro, enzyme codes and KEGG pathways.

(2)

Helitrons LTR Non-LTR (LINEs) SINEs TIR MITEs Retrotransposons CLASS I

DNA transposons CLASS II

Transposable Elements (TEs)

Genes represent only a little part of the genome. Some regions can be gene-rich but some other can contain a majority of repeated elements.

Class Order Superfamily Family Code / Label

Copia opie RLC Gypsy maggy RLG Unclassified RLX L1 RIL Unclassified RIX Alu RSA Unclassified RSX CACTA DTC Mutator DTM Stowaway DTT Tourist DTH Helitron Helitron DHH

LTR, long terminal repeat; LINE, long interspersed nuclear element; SINE, short interspersed nuclear element; TIR , terminal inverted repeat. MITE, Miniature Inverted Transposable Element

DNA transposons TIR MITE LINE SINE Retrotransposon LTR

Class Order Superfamily Family Code / Label

Copia opie RLC Gypsy maggy RLG Unclassified RLX L1 RIL Unclassified RIX Alu RSA Unclassified RSX CACTA DTC Mutator DTM Stowaway DTT Tourist DTH Helitron Helitron DHH

LTR, long terminal repeat; LINE, long interspersed nuclear element; SINE, short interspersed nuclear element; TIR , terminal inverted repeat. MITE, Miniature Inverted Transposable Element

DNA transposons TIR MITE LINE SINE Retrotransposon LTR

Annotation of transposable elements : tools

Mreps RepeatMasker Blastn Censor tBlastx TRF MISA Findmite LTR_harvest LTR_STRUC LTR_finder Find_ltr

Repseek Tandem repeats Microsatellites MUST Class II Class I Intrinsic Extrinsic BLAST Cross_match

-Several ab-initio programs can be used to detect structures of TEs = intrinsec -Comparisons with databases can be used to classify the elements = extrinsec

- Pipeline to detect and classify Tes = REPET pipeline ( Flutre et al, 2011 PLOS one )

Integrative methods = combine ab-initio (intrinsec) + comparative approaches (extrinsec) improving significantly the annotation.

Integrative method – The combiner

GNPAnnot is a community system for structural and functional annotation dedicated to plants, insects and fungus genomes allowing both automatic predictions and manual curations of genomic objects.

GNPAnnot Project

http://southgreen.cirad.fr/

The system is currently being used for various plants, insect and fungus species.

Each specie has a personalized pipeline

GNPAnnot platform

INPUT sequence

Integration in Eugene combiner

FGenesH BLAST Threader Genome Machine Splice Eugene HMM

DNA sequence

Sugarcane Pipeline

1

2 Genomic sequence

(3)

Gene3 Blastx Functional annotation Gene1 Gene2 Region1 Region2 Region3 Refine structure 4 Gene3 Gene1 Gene2 Gene Models Genome-Threader Exonerate regions

Sugarcane Pipeline

BlastP TBlastN Interpro genomes BBMH Blast2GO 5

Gene3

Gene1 Gene2 Stucture

+ Function ADH

S6PDH S/T kinase

GNPAnnot on the web : http://southgreen.cirad.fr

Private databases

GNPAnnot Portal, secure access for private data

Tables with links to gbrowse

Click on the link to access to the Gbrowse

interactive gbrowse

BAC clone

Click on feature

(4)

From the Gbrowse , click and edit your sequences

Database User Password

Artemis annotation tool is loading

Connection to the chado database (private)

artemis : to see, to check, to correct, to validate, to store

commit

The gene builder

History report

A web site for sugarcane genomic resources

Sugarcane sequences (BACs and other sequence data, markers, DArTs ...) could be available from a private web portal.

Sugarcane sequences will be annotated with GNPAnnot and anchored on the sorghum sequence

example

Export results in excel sheet

(5)

Sorghum Gbrowse with R570 BAC clone locations

Link to sugarcane GNPAnnot Gbrowse with BAC annotations

O. Garsmeur C. Charron C. Hervouet S. Sidibe-Bocs G. Droc M. Ruiz A. D'Hont Bioalysis, bioinformatics & Experiments BAC Sequencing V. Guignon M. Rouard

Références

Documents relatifs

Staf-50 Is a Member of the Ring Finger Family-A search and analysis for amino acid sequence homologies in the Gen- Bank data base revealed that the complete Staf-50 protein shares

niveau d’ordre juridique dans la loi n° 04-20 du 25 décembre 2004 19 « relative à la prévention des risques majeurs et la gestion des catastrophes dans le cadre du

We will compare our results for the Copia, Gypsy and Bel-Pao superfamilies (Bel-Pao only for D. melanogaster because there is no annotation for it in A. thaliana) with those of

sified into two categories: classical noise, like the light source having fluctuation in intensity or beam direction, and quantum noise, like the photon shot noise at the de- tector

L’accès à ce site Web et l’utilisation de son contenu sont assujettis aux conditions présentées dans le site LISEZ CES CONDITIONS ATTENTIVEMENT AVANT D’UTILISER CE SITE WEB.

Abstract We report a new water-ice cloud feature observed during the Mars year 34 global dust storm: twilight cloud bands that routinely formed just past the evening terminator1. We

The logarithm of UTH was shown to be given by a linear model in which the regressors are functions of AMSU-B channel 18 and 19 brightness temper- atures, upper tropospheric water

Mutation rate estimates (measured in mutations per generation), obtained from combined meiosis data from 29 published studies (listed in supplementary table S2) and predicted from the