Indexing and Retrieval of Multiple 3D Protein Structure Representations for the Protein Data Bank


Academic year: 2021


(1)

NRC Publications Archive / Archives des publications du CNRC

NRC Publications Archive Record: https://nrc-publications.canada.ca/eng/view/object/?id=bc1259a5-bba1-4902-a316-6b01f43af6d3

Access and use of this website and the material on it are subject to the Terms and Conditions set forth at https://nrc-publications.canada.ca/eng/copyright. READ THESE TERMS AND CONDITIONS CAREFULLY BEFORE USING THIS WEBSITE.

Questions? Contact the NRC Publications Archive team at PublicationsArchive-ArchivesPublications@nrc-cnrc.gc.ca. If you wish to email the authors directly, please see the first page of the publication for their contact information.

(2)

Indexing and Retrieval of Multiple 3D Protein Structure Representations for the Protein Data Bank

Eric Paquet, Senior Research Officer¹, Adjunct Professor²
Herna L. Viktor, Associate Professor²

¹National Research Council of Canada
²University of Ottawa, Canada

(3)

Problem statement

• Research suggests that a protein's function often depends on the shape and physical properties of its active sites
• Estimates indicate that the number of newly discovered protein structures will grow "exponentially"
– (PDB: ~60,000 structures)
• Our solution:
– 2D and 3D indexing and similarity search system, which eliminates the need for prior structure alignment
• Our goal:
– to aid molecular biologists in labelling new structures, finding family members, addressing the docking problem, etc.

(4)

Overview

• We developed a new indexing and similarity search system to retrieve protein structures based on their 3D shape and 2D appearance
• Our system:
– is translation, scale and rotation invariant, which eliminates the need for prior structure alignment
– handles various representations
• Tested against 45,000 protein structures from the PDB (real-time retrieval)
• Suitable for the docking problem

(5)

Representations

• Shape
– Backbone: structural
– Van der Waals
– Envelope: interaction
• Encoding of physicochemical properties (color code)
– Secondary structures: structural
– Residue types: interaction

(6)

Our System: An Overview

[Figure: system architecture diagram. Labels: Application, Queries, Results, Indexing, Capri/MR System, Representation Generation, 2D Indexing, 3D Indexing, Similarity Search, Protein Data Bank (PDB), Database, 2D Indexes, 3D Indexes]

(7)

Retrieval System: Query by Example
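Query by example over compact, fixed-size indexes can be served by a plain exhaustive scan. With the figures quoted in these slides (120 bytes per 3D index, 45,000 structures), the 3D indexes occupy about 120 × 45,000 = 5.4 MB in total, which fits in memory on a laptop. The following is a minimal sketch of such a scan, not the authors' implementation: the descriptor values, the identifiers and the choice of Euclidean distance are all assumptions made for illustration.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_example(index, query_id, k=3, distance=euclidean):
    """Rank every indexed structure by its distance to the query descriptor."""
    query = index[query_id]
    scored = (
        (distance(query, descriptor), structure_id)
        for structure_id, descriptor in index.items()
        if structure_id != query_id
    )
    # nsmallest scans the whole index once and keeps the k best matches.
    return [structure_id for _, structure_id in heapq.nsmallest(k, scored)]


# Hypothetical descriptors keyed by made-up identifiers (not real PDB data).
index = {
    "query": [0.10, 0.40, 0.50],
    "near": [0.12, 0.38, 0.50],
    "mid": [0.30, 0.30, 0.40],
    "far": [0.90, 0.05, 0.05],
}
print(query_by_example(index, "query", k=2))
```

The `distance` parameter is left pluggable so that other metrics can be swapped in without changing the scan itself.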

(8)

Comparison of Indexes

• Comparison of the indexes with various metrics
• Geometry of the feature space:
– Riemannian: geodesic distance [3DIP 2010]
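To make the contrast between a flat and a curved feature space concrete, the sketch below compares two metrics on normalised histograms. The slide does not say which manifold or geodesic the authors use (see [3DIP 2010] for that), so the geodesic here is an assumed stand-in: histograms are mapped onto the unit sphere by taking square roots, and the great-circle angle between them is measured.

```python
import math


def normalize(hist):
    """Scale a non-negative histogram so that its entries sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]


def euclidean(p, q):
    """Straight-line distance in the flat feature space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def sphere_geodesic(p, q):
    """Great-circle angle between the square-root images of two histograms.

    The square roots of a normalised histogram form a unit vector, so the
    angle between two such vectors is a geodesic distance on the unit sphere.
    """
    cosine = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, cosine)))  # clamp rounding errors


# Hypothetical index contents, for illustration only.
p = normalize([4, 3, 2, 1])
q = normalize([1, 2, 3, 4])
print("euclidean:", euclidean(p, q))
print("geodesic :", sphere_geodesic(p, q))
```

Two histograms with no overlapping bins are at the maximum geodesic distance of π/2 under this assumed metric, whatever their Euclidean distance.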

(9)

Experimental design

• Tested against 45,000 protein structures from the Protein Data Bank (PDB)
• Implemented using Java and Java 3D
• Experiments on workstations with two dual-core Xeon processors and 32 GB memory
• Search engine runs on a tablet or a laptop: retrieval in real time
• Evaluation:
– Precision/recall of nearest structures
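For reference, precision is the fraction of retrieved structures that are relevant, and recall is the fraction of relevant structures that were retrieved. A minimal sketch follows; the identifiers in the example are placeholders, not real PDB entries.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers returned by the search, in any order.
    relevant:  identifiers that truly belong to the query's family.
    """
    relevant = set(relevant)
    hits = sum(1 for item in retrieved if item in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Placeholder identifiers: 2 of the 4 results are relevant, out of 3 relevant items.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(precision, recall)
```

A result such as "80 out of 80" with no false positives corresponds to precision 1.0 and recall 1.0 under this definition.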

(10)

3D Indexing

• Tensor of inertia of the distribution
• Frame: eigenvectors
• Labeling of the axes: eigenvalues
• Weighted angular and radial distribution of surface elements relative to the frame
• Translation, scale (if needed) and rotation invariant
• 120 bytes
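The steps listed on this slide can be sketched as follows, under stated assumptions. The surface elements are reduced to weighted 3D points, the eigen-decomposition uses a textbook cyclic Jacobi iteration, and the histogram layout (4 radial by 4 angular bins, angle measured against the first axis only) is invented for illustration. None of this is claimed to match the authors' 120-byte index.

```python
import math


def inertia_tensor(points, weights):
    """Return the weighted centroid and the 3x3 inertia tensor about it."""
    total = sum(weights)
    centroid = [sum(w * p[k] for p, w in zip(points, weights)) / total for k in range(3)]
    tensor = [[0.0] * 3 for _ in range(3)]
    for p, w in zip(points, weights):
        r = [p[k] - centroid[k] for k in range(3)]
        r2 = sum(x * x for x in r)
        for i in range(3):
            for j in range(3):
                tensor[i][j] += w * ((r2 if i == j else 0.0) - r[i] * r[j])
    return centroid, tensor


def jacobi_eigen(matrix, sweeps=50, tol=1e-24):
    """Eigenvalues (ascending) and eigenvectors (one per row) of a symmetric matrix."""
    a = [row[:] for row in matrix]
    n = len(a)
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * sum(a[i][i] ** 2 for i in range(n)):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):  # rotate columns p and q of a, then of v
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # rotate rows p and q of a
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def shape_descriptor(points, weights, radial_bins=4, angular_bins=4):
    """Weighted joint histogram of normalised radius and |cos(angle)| to the first axis."""
    centroid, tensor = inertia_tensor(points, weights)
    _, axes = jacobi_eigen(tensor)  # axes[0]: smallest moment, i.e. the longest direction
    centred = [[p[k] - centroid[k] for k in range(3)] for p in points]
    radii = [math.sqrt(sum(x * x for x in r)) for r in centred]
    r_max = max(radii) or 1.0  # dividing by r_max gives scale invariance
    hist = [[0.0] * angular_bins for _ in range(radial_bins)]
    for r, radius, w in zip(centred, radii, weights):
        cos_angle = abs(sum(x * e for x, e in zip(r, axes[0]))) / radius if radius else 0.0
        i = min(int(radius / r_max * radial_bins), radial_bins - 1)
        j = min(int(cos_angle * angular_bins), angular_bins - 1)
        hist[i][j] += w
    total = sum(weights)
    return [h / total for row in hist for h in row]


# Toy point set elongated along x (not a real protein surface).
points = [(2, 0, 0), (-2, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 0.5), (0, 0, -0.5)]
weights = [1.0] * len(points)
values, axes = jacobi_eigen(inertia_tensor(points, weights)[1])
print(values)
print(len(shape_descriptor(points, weights)))
```

Centring on the centroid removes translation, and expressing the distribution in the eigenvector frame removes rotation. Because an eigenvector's sign is arbitrary, this sketch uses the absolute cosine. For the toy point set the eigenvalues are 2.5, 8.5 and 10, which label the three axes.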

(11)

Homo Sapiens Hemoglobin: Query with the 1r1y structure

(12)

Homo Sapiens Hemoglobin: Query with the 1flq structure

(13)

2D Indexing of Proteins

• Color encoding of physicochemical properties: secondary structures, residue names and residue types (e.g. hydrophobic, which is important for docking)
• Characteristic views: efficient with only four (4) views
• Each view is indexed or described according to its visual appearance
• ~1000 bytes per index (for 4 views)

(14)

2D Indexing

• Tensor of inertia of the distribution
• Frame: eigenvectors and eigenvalues
• Isotropic sampling: 4 views
• For each view:
– Structuring element
– Random motion
– Joint distribution: color and proportion within the element
– Accumulation
• 250 bytes
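The per-view steps above (structuring element, random motion, joint distribution, accumulation) can be sketched as below. This is an illustration under assumptions, not the authors' descriptor: the view is a small grid of already quantised colour indices, the structuring element is a square window, and the window size, sample count and number of proportion bins are arbitrary choices.

```python
import random


def view_descriptor(image, n_colors, element=3, samples=500, bins=4, seed=0):
    """Accumulate a joint (color, proportion) histogram over random window placements.

    image:    2D list of quantised color indices in range(n_colors).
    element:  side length of the square structuring element.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    hist = [[0] * bins for _ in range(n_colors)]
    for _ in range(samples):
        # Random motion: drop the structuring element at a random position.
        top = rng.randrange(height - element + 1)
        left = rng.randrange(width - element + 1)
        counts = [0] * n_colors
        for y in range(top, top + element):
            for x in range(left, left + element):
                counts[image[y][x]] += 1
        # Joint distribution: which colors are present, and in what proportion.
        for color, count in enumerate(counts):
            if count:
                proportion = count / (element * element)
                hist[color][min(int(proportion * bins), bins - 1)] += 1
    total = sum(sum(row) for row in hist)
    return [[h / total for h in row] for row in hist]


# Toy 6x6 view: left half color 0, right half color 1.
image = [[0] * 3 + [1] * 3 for _ in range(6)]
for row in view_descriptor(image, n_colors=2):
    print([round(h, 3) for h in row])
```

Because the histogram records how much of each colour falls inside the element rather than where it falls, the descriptor depends on local composition and not on absolute pixel positions.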

(15)

3D versus 2D Retrieval

• The 2D indexing approach utilizes the colour, texture and composition of the images
• The 3D indexing method is based on shape: more precise; no ambiguity
• Research question:
– The colours of protein structures provide us with a semantic key to their functionality
– Therefore, 2D image retrieval should provide us with a complementary view, in contrast to when we apply a shape-based description

(16)

3D and 2D Query results for Glutamyl-tRNA (GluRS) family members

(17)

Envelope Representation and 3D Indexing

• Important for docking
• Complex shape

Phage T4 lysozyme from Bacteriophage T4 (142l)

(18)

Retrieval of all members of the Phage T4 lysozyme from Bacteriophage T4 conformation, using the 142l structure as a query, with precision 100% and recall 100% (80 out of 80)

(19)

Envelope: P22 tailspike protein from Salmonella phage P22 (1tyv)

(20)

Retrieval of all members of the P22 tailspike protein from Salmonella phage P22 conformation, using the 1tyv structure as a query, with precision 100% and recall 100% (9 out of 9)

(21)

Docking

• Goal of protein-ligand docking:
– To predict the position and orientation of a ligand when bound to a protein receptor

(22)

Docking through virtual fragmentation

• Random, or constrained, fragmentation of the proteins
• 3D indexing of each fragment (as previously described)
• For a given fragment or receptor, retrieval of the closest fragments or ligands from a 3D shape point of view: query by example
• Flexibility taken into account by using various conformations of the same protein, when …
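The fragmentation-and-retrieval loop described on this slide can be sketched as below. Everything specific in it is an assumption made for illustration: the protein is reduced to a chain of 3D points, fragments are contiguous runs of random length, and each fragment is indexed with a simple radial histogram rather than the 3D index presented earlier.

```python
import math
import random


def random_fragments(chain, min_len=4, max_len=8, seed=0):
    """Cut a chain of 3D points into contiguous fragments of random length.

    The last fragment may be shorter than min_len if the chain runs out.
    """
    rng = random.Random(seed)
    fragments, start = [], 0
    while start < len(chain):
        length = rng.randint(min_len, max_len)
        fragments.append(chain[start:start + length])
        start += length
    return fragments


def radial_histogram(points, bins=4):
    """Histogram of distances to the centroid, normalised by the largest one."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    radii = [math.dist(p, centroid) for p in points]
    r_max = max(radii) or 1.0
    hist = [0.0] * bins
    for r in radii:
        hist[min(int(r / r_max * bins), bins - 1)] += 1.0 / n
    return hist


def closest_fragments(query, fragments, k=3):
    """Query by example: the k fragments whose descriptors are nearest the query's."""
    q = radial_histogram(query)
    ranked = sorted(fragments, key=lambda f: math.dist(q, radial_histogram(f)))
    return ranked[:k]


# Synthetic helix-like chain standing in for a protein backbone.
chain = [(math.cos(0.6 * i), math.sin(0.6 * i), 0.15 * i) for i in range(40)]
fragments = random_fragments(chain)
best = closest_fragments(fragments[0], fragments, k=3)
print(len(fragments), [len(f) for f in best])
```

A constrained fragmentation would replace `random_fragments` with cuts placed at chosen positions; the indexing and retrieval steps would stay the same.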

(23)

Docking through virtual fragmentation…

(24)

Conclusions

• 3D and 2D indexing
• Exhaustive search in real time
• Can describe various representations
• Similarity search, query by example
• Very good precision and recall
• Very large databases
• Address the docking problem: virtual fragmentation
• Help to reduce the number of clinical trials

Eric: eric.paquet@nrc-cnrc.gc.ca
Herna: hlviktor@site.uottawa.ca
