
Probabilistic context-free grammars for classification of helix-helix contact sites and recognition of amyloidogenic peptides


HAL Id: hal-00937763

https://hal.inria.fr/hal-00937763

Submitted on 28 Jan 2014



Witold Dyrka, Florence Thirion, Jean-Christophe Nebel, Malgorzata Kotulska

To cite this version:

Witold Dyrka, Florence Thirion, Jean-Christophe Nebel, Malgorzata Kotulska. Probabilistic context-free grammars for classification of helix-helix contact sites and recognition of amyloidogenic peptides. 11th Workshop on Bioinformatics and 6th Symposium of the Polish Bioinformatics Society, Sep 2013, Wroclaw, Poland. hal-00937763


Poster

Wrocław, 27-29 September 2013


Witold Dyrka (a, b, *), Florence Thirion (a), Jean-Christophe Nebel (c), Malgorzata Kotulska (a)

a Institute of Biomedical Engineering and Instrumentation, Wroclaw University of Technology, Poland

b Inria Centre de Research Sud-Ouest, Bordeaux, France

c School of Computing and Information Systems, Faculty of Science, Engineering and Computing, Kingston University, London

*e-mail: witold.dyrka@pwr.wroc.pl

Keywords: probabilistic context-free grammar, grammar inference, helix-helix pairs, amyloidogenic peptides

Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While these methods of protein analysis excel at their tasks, they do not directly convey information on medium- and long-range residue-residue interactions. Capturing such interactions requires the expressive power of at least context-free grammars. However, the application of more powerful grammar formalisms to protein analysis has been surprisingly limited. We have developed a probabilistic grammatical framework for problem-specific protein languages, which has already been successfully applied to the recognition of ligand binding sites. The core of the model is a probabilistic context-free grammar (PCFG), automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training sequences. Here, we show that the PCFG approach matches state-of-the-art performance in two other tasks: classification of transmembrane helix-helix pairs and recognition of amyloidogenic peptides. First, the framework was applied to produce grammar descriptors of four classes of transmembrane helix-helix contact sites. The best of these classifiers reached an AUC ROC of 0.70. Second, an analogous approach was used to distinguish between amyloidogenic and non-amyloidogenic protein fragments. It yielded good results whether these fragments were isolated or embedded within an entire protein (AUC ROC up to 0.80). Finally, an attempt to model the pairing of amyloidogenic fragments resulted in classifiers reaching an AUC ROC of 0.70. A significant feature of the PCFG method is that grammar rules and parse trees are human-readable, and thus could provide biologically meaningful information.
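To make the scoring idea concrete, the following minimal Python sketch shows how a PCFG can assign a probability to a sequence with the inside algorithm, and how such scores could be summarised as an AUC ROC. It is an illustration under stated assumptions, not the framework described above: the toy grammar (`BINARY`, `LEXICAL`), its probabilities, the example peptides, and the function names `inside_probability` and `auc_roc` are all hypothetical, and the genetic-algorithm inference step is not shown. The toy grammar pairs each residue in the first half of a sequence with one in the second half, which is the kind of nested long-range dependency that a context-free grammar can express.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (NOT the inferred grammar
# from the poster). It generates nested pairs: {V,I}^n {L,F}^n.
BINARY = {            # (lhs, left child, right child) -> rule probability
    ("S", "A", "T"): 0.4,
    ("S", "A", "B"): 0.6,
    ("T", "S", "B"): 1.0,
}
LEXICAL = {           # (lhs, terminal residue) -> rule probability
    ("A", "V"): 0.5, ("A", "I"): 0.5,
    ("B", "L"): 0.5, ("B", "F"): 0.5,
}


def inside_probability(seq, binary=BINARY, lexical=LEXICAL, start="S"):
    """Return P(start derives seq), summed over all parse trees."""
    n = len(seq)
    if n == 0:
        return 0.0
    # chart[(i, j)][X] = probability that nonterminal X derives seq[i:j]
    chart = defaultdict(lambda: defaultdict(float))
    for i, residue in enumerate(seq):
        for (lhs, terminal), p in lexical.items():
            if terminal == residue:
                chart[(i, i + 1)][lhs] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                left, right = chart[(i, k)], chart[(k, j)]
                for (lhs, b, c), p in binary.items():
                    if b in left and c in right:
                        chart[(i, j)][lhs] += p * left[b] * right[c]
    return chart[(0, n)].get(start, 0.0)


def auc_roc(pos_scores, neg_scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive scores higher, with ties counted as one half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical usage: score made-up "positive" and "negative" peptides.
positives = [inside_probability(s) for s in ("VL", "VILF")]
negatives = [inside_probability(s) for s in ("VV", "LV")]
toy_auc = auc_roc(positives, negatives)
```

In practice, raw sequence probabilities depend strongly on sequence length, so a real classifier would likely normalise the score or compare it against a null model before ranking; the sketch omits this for brevity.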
