• Aucun résultat trouvé

Identifying distant homologous viral sequences in metagenomes using protein structure information

N/A
N/A
Protected

Academic year: 2021

Partager "Identifying distant homologous viral sequences in metagenomes using protein structure information"

Copied!
3
0
0

Texte intégral

(1)

HAL Id: hal-01090987

https://hal.archives-ouvertes.fr/hal-01090987

Submitted on 5 Dec 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Identifying distant homologous viral sequences in metagenomes using protein structure information

Martine Boccara, Mathilde Carpentier, Jacques Chomilier, François Coste, Clovis Galiez, Joël Pothier, Alaguraj Veluchamy

To cite this version:

Martine Boccara, Mathilde Carpentier, Jacques Chomilier, François Coste, Clovis Galiez, et al..

Identifying distant homologous viral sequences in metagenomes using protein structure information.

ECCB’14 Workshop on Recent Computational Advances in Metagenomics, Sep 2014, Strasbourg, France. �hal-01090987�

(2)

Identifying distant homologous viral sequences in metagenomes using protein structure information

M. Boccara M. Carpentier J. Chomilier F. Coste C. Galiez J. Pothier A. Veluchamy

June 3, 2014

Marine viruses have a huge impact on marine life: it is estimated that they daily kill about 20% of the ocean biomass. However, marine viruses are poorly known and characterized. One lever that could help to give better knowledge of marine viruses is marine metagenomics [Sut07].

Between 2009 and 2011, the TARA Ocean [EC11] expedition sampled oceans at different depths, for different sizes of organisms. This project gave rise to a huge amount of scientific data, and specifically to metagenomic sequences.

Standard annotation tools such as MEGAN [DAJS07], PhyloPythiaS [PRH+11]

and Phymm [AS09], are either sequence-homology or composition based and they largely depends on the already available reference sequences. As viral sequences are highly divergent, the annotation of such data is very poor (in previous marine viral metagenomic studies, over 90% of sequences have no rec- ognized homology [Sut07]).

It is therefore necessary to change the paradigm of current methods to improve the detection of viral sequences in metagenomes using other sources of informa- tion such as protein structure. Our goals are both to develop new methods and databases and to contribute measuring the impact of virus over marine ecosys- tems. We want first to adapt recently developed methods combining sequence and structure information.

A family of proteins, the capsid proteins, is present in every all viruses. Due to geometrical constrains on the viral capsid, these proteins are limited to a very small number of folds [KB11]. Moreover these folds are characteristic of the shape of the viral capsid [SC13]. Thus, identifying the fold of a protein present in a metagenomic dataset could help to identify the type of virus it is associated to.

In the PEPS VAG project, we aim at developing and adapting methods that allow designing robust sequential protein patterns taking advantage of structural information and applying them on a viral fraction of TARA Ocean metagenomic data. Aside from the novel developments, existing fold recognition methods like FROST will be adapted to our goals. FROST has been developed in the team of JF Gibrat with whom we collaborated [JRJ+02]. Given a protein structure library, it assigns its most probable fold to a given sequence. We will build a library of capsid folds by comparing the protein structures of similar capsids.

We will also add the information of MIRs (Most Interacting Residues) developed by one of the participating teams [PEB+04] to improve its performance. MIRs

1

(3)

are positions dispersed along the sequence, predicted from folding simulations on a lattice. They allow to give a signature of a fold, because they statistically correspond to residues compulsory on a structural point of view.

We are also developing a novel methodology in the PEPS VAG. The idea is to extract portions of proteins – called contact motifs – capturing strong structural conservation along with relevant sequential signals. We are heading to develop short signatures at the sequential level of characteristic structures, such as capsids, that can be used to find their occurrences on sequenced data. Our method identifies characteristic structural motifs of capsid proteins of a same fold from which we deduce a characteristic signal of the underlying sequence.

We present here the first experiment we made on the viral fraction of station 23 of TARA Oceans at the Deep Chlorophyll Maximum depth.

References

[AS09] Brady A and Salzberg SL. Phymm and phymmbl: metagenomic phylogenetic classification with interpolated markov models. Nat Methods., 2009.

[DAJS07] Huson DH, Auch AF, Qi J, and Schuster SC. Megan analysis of metagenomic data. Genome Res., 2007.

[EC11] Karsenti E and Tara Oceans Consortium. A holistic approach to marine eco-systems biology. PLoS Biol., 2011.

[JRJ+02] Pley J., Andonov R., Gibrat JF., Marin A., and Poirriez V. Par- all´elisations d’une m´ethode de reconnaissance de repliements de prot´eines (frost). JOBIM, 2002.

[KB11] Mart Krupovic and Dennis H Bamford. Double-stranded dna viruses:

20 families and only five different architectural principles for virion assembly. Current Opinion in Virology, 2011.

[PEB+04] N. Papandreou, E. Eliopoulos, I Berezovsky, A. Lopes, E. Eliopoulos, and J. Chomilier. Universal positions in globular proteins : observa- tion to simulation. Eur. J. Biochem., 2004.

[PRH+11] Patil, Kaustubh R., Haider, Peter, Pope, Phillip B., Turnbaugh, Peter J., Morrison, Mark, Scheffer, Tobias, McHardy, and Alice C.

Taxonomic metagenome sequence assignment with structured output models. Nat Methods, 2011.

[SC13] Cheng S and Brooks CL. 3rd viral capsid proteins are segregated in structural fold space. PLoS Comput Biol, 2013.

[Sut07] Curtis A. Suttle. Marine viruses — major players in the global ecosystem. Nature Reviews Microbiology, 2007.

2

Références

Documents relatifs

The rate of nasal carriage of Staphylococcus aureus and as- sociated risk factors were determined in a cross-sectional study involving Swiss children’s hospitals.. In a

We also introduce a second anytime algorithm MCTSExtent that pushes further the idea of a better trade-off between exploration and exploitation of a sampling strategy over the

Indeed structural changes in disordered proteins have been implicat- ed in disease processes, with evidence for D➔O transitions triggered by disease-causing mutations in

Its applications on two commercial compound collections, and its assessment on the experimental screening results of 11 different PPI systems shows a robust behavior in identifying

Nowadays, the adoption of XML (eXtensible Markup Language) [164] as a stan- dard for the interchange and publication of structured data has created the oppor- tunity to design

I assessed the following predictor variables: partner rank (1-14, where 1 is high- ranking and 14 is low-ranking), copulation length (continuous, seconds), female parity

IP was analyzed with the Wechsler Intelligence Scale for Children-III (WISC-III) in 25 patients and by the Kaufman Assessment Battery for Children in two patients. MP was evaluated

L’Association paritaire pour la santé et la sécurité du travail du secteur affaires sociales (ASSTSAS), fondée en 1979, est une association sectorielle paritaire qui compte