NRC Publications Archive Record:
https://nrc-publications.canada.ca/eng/view/object/?id=bc1259a5-bba1-4902-a316-6b01f43af6d3
https://publications-cnrc.canada.ca/fra/voir/objet/?id=bc1259a5-bba1-4902-a316-6b01f43af6d3
Indexing and Retrieval of Multiple 3D Protein Structure Representations for the Protein Data Bank
Eric Paquet, Senior Research Officer (1), Adjunct Professor (2)
Herna L. Viktor, Associate Professor (2)
(1) National Research Council of Canada
(2) University of Ottawa, Canada
Problem statement
Research suggests that a protein’s function often
depends on the shape and physical properties of the active sites
Estimates indicate that the number of newly discovered
protein structures will grow “exponentially”
– PDB: ~60,000 structures
Our solution:
– 2D and 3D Indexing and Similarity Search System
that eliminates the need for prior structure alignment
Our goal:
– to help molecular biologists label new structures, find
family members, address the docking problem, etc.
Overview
We developed a new indexing and similarity search
system to retrieve protein structures, based on their 3D shape and 2D appearance
Our system:
– is translation, scale and rotation invariant, which
eliminates the need for prior structure alignment
– handles various representations
Tested against 45,000 protein structures from the
PDB (real-time retrieval)
Suitable for the docking problem
Representations
Shape
– Backbone: structural
– Van der Waals
– Envelope: interaction
Encoding of physicochemical properties (color code)
– Secondary structures: structural
– Residue types: interaction
Our System: An Overview
[Figure: system diagram with components labeled Application, Queries, Results, Indexing, Capri/MR System, Representation Generation, 2D Indexing, 3D Indexing, Similarity Search, Protein Data Bank (PDB), 2D/3D Database, 2D Indexes, 3D Indexes]
Retrieval System: Query by Example
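Query by example can be pictured as an exhaustive nearest-neighbour scan over precomputed index vectors. The sketch below is only an illustration in Python (the presented system was implemented in Java); the function names, the toy database and the choice of Euclidean distance are assumptions, not details taken from the slides.

```python
import heapq
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length index vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_by_example(database, query_id, k=3, distance=euclidean):
    """Return the ids of the k entries closest to the query structure, nearest first.

    Every entry is compared with the query (exhaustive search).
    """
    query = database[query_id]
    candidates = (
        (distance(query, vec), sid)
        for sid, vec in database.items()
        if sid != query_id
    )
    return [sid for _, sid in heapq.nsmallest(k, candidates)]

# Hypothetical toy database: structure id -> precomputed index vector.
toy_db = {
    "A": [0.10, 0.20, 0.70],
    "B": [0.12, 0.18, 0.70],
    "C": [0.60, 0.30, 0.10],
    "D": [0.15, 0.25, 0.60],
}
print(query_by_example(toy_db, "A", k=2))  # → ['B', 'D']
```

Because each index is a short fixed-length vector, a linear scan of this kind stays cheap even for tens of thousands of structures.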
Comparison of Indexes
Comparison of the indexes with various metrics
Geometry of the feature space:
– Riemannian: geodesic distance
[3DIP 2010]
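To make the idea of comparing indexes under different metrics concrete, here is a minimal sketch. The slide does not say which Riemannian geometry is used, so the geodesic below is an assumption: indexes are treated as normalized histograms, mapped to the unit sphere by taking square roots, and compared by arc length on that sphere. The function names are hypothetical.

```python
import math

def l1_distance(p, q):
    """Sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean_distance(p, q):
    """Straight-line distance in the flat feature space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def geodesic_distance(p, q):
    """Arc length between sqrt(p) and sqrt(q) on the unit sphere.

    Assumes p and q are non-negative and each sums to 1, so their
    element-wise square roots are unit vectors.
    """
    inner = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] to guard against rounding before acos.
    return math.acos(min(1.0, max(0.0, inner)))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
for metric in (l1_distance, euclidean_distance, geodesic_distance):
    print(metric.__name__, metric(p, q))
```

Under this assumed geometry, two histograms with no overlapping bins are always a quarter circle (π/2) apart, whatever their individual shapes.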
Experimental design
Tested against 45,000 protein structures
from the Protein Data Bank (PDB)
Implemented using Java and Java 3D
Experiments on workstations with two dual-core
Xeon processors and 32 GB of memory
Search engine runs on a tablet or a laptop:
retrieval in real time
Evaluation:
– Precision/recall of nearest structures
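Precision and recall for a single query can be computed as in the sketch below. The structure ids and the helper name are hypothetical; only the definitions of precision (fraction of retrieved structures that are relevant) and recall (fraction of relevant structures that were retrieved) are standard.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids returned by the search engine.
    relevant:  ids that truly belong to the query's family.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: a family of four structures, three results returned.
family = {"s1", "s2", "s3", "s4"}
results = ["s1", "s2", "x9"]
print(precision_recall(results, family))
```

A query that returns exactly the members of a family, and nothing else, scores 100% on both measures.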
3D Indexing
Tensor of inertia of the distribution
Frame: eigenvectors
Labeling of the axes: eigenvalues
Weighted angular and radial distribution of
surface elements relative to the frame
Translation, scale (if needed) and rotation
invariant
120 bytes
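The steps above can be sketched roughly as follows. This is not the authors' implementation: the bin count, the use of weighted points in place of surface elements, the normalization by the largest radius and every function name are assumptions made for illustration. The frame comes from the eigenvectors of the inertia tensor, and the index is a pair of weighted histograms (radial and angular) measured relative to that frame.

```python
import math
import random

def inertia_frame(points, weights):
    """Centroid, eigenvalues and eigenvectors of the inertia tensor.

    Eigenpairs are sorted by increasing eigenvalue; the 3x3 symmetric
    tensor is diagonalized with cyclic Jacobi rotations.
    """
    total = sum(weights)
    c = [sum(w * p[k] for p, w in zip(points, weights)) / total for k in range(3)]
    rel = [[p[k] - c[k] for k in range(3)] for p in points]
    a = [[sum(w * ((i == j) * sum(x * x for x in r) - r[i] * r[j])
              for r, w in zip(rel, weights)) / total
          for j in range(3)] for i in range(3)]
    v = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(30):  # fixed number of sweeps, ample for a 3x3 matrix
        for i, j in ((0, 1), (0, 2), (1, 2)):
            if a[i][j] == 0.0:
                continue
            d = a[j][j] - a[i][i]
            theta = math.pi / 4 if d == 0.0 else 0.5 * math.atan(2.0 * a[i][j] / d)
            cs, sn = math.cos(theta), math.sin(theta)
            for m in (a, v):  # column update: M <- M J
                for k in range(3):
                    mi, mj = m[k][i], m[k][j]
                    m[k][i], m[k][j] = cs * mi - sn * mj, sn * mi + cs * mj
            for k in range(3):  # row update: A <- J^T A
                ai, aj = a[i][k], a[j][k]
                a[i][k], a[j][k] = cs * ai - sn * aj, sn * ai + cs * aj
    pairs = sorted((a[i][i], [v[k][i] for k in range(3)]) for i in range(3))
    return c, [val for val, _ in pairs], [vec for _, vec in pairs]

def shape_index(points, weights=None, bins=8):
    """Radial and angular histograms relative to the inertia frame.

    Assumes at least two distinct points. Radii are divided by the largest
    radius (scale invariance); angles are measured against the axis of
    smallest inertia, using |cos| so the sign of the eigenvector is irrelevant.
    """
    weights = weights or [1.0] * len(points)
    centroid, _, axes = inertia_frame(points, weights)
    main = axes[0]
    rel = [[p[k] - centroid[k] for k in range(3)] for p in points]
    radii = [math.sqrt(sum(x * x for x in r)) for r in rel]
    r_max, total = max(radii), sum(weights)
    radial, angular = [0.0] * bins, [0.0] * bins
    for r, rad, w in zip(rel, radii, weights):
        radial[min(bins - 1, int(bins * rad / r_max))] += w / total
        cos = abs(sum(x * m for x, m in zip(r, main))) / rad if rad > 0 else 0.0
        angular[min(bins - 1, int(bins * cos))] += w / total
    return radial + angular

def rigid_and_scale(p, a=0.7, b=0.3, s=2.5, t=(10.0, -4.0, 3.0)):
    """Rotate about z then x, scale by s, translate by t (test transform)."""
    x, y, z = p
    x, y = math.cos(a) * x - math.sin(a) * y, math.sin(a) * x + math.cos(a) * y
    y, z = math.cos(b) * y - math.sin(b) * z, math.sin(b) * y + math.cos(b) * z
    return (s * x + t[0], s * y + t[1], s * z + t[2])

# Hypothetical elongated point cloud and a moved, rescaled copy of it.
rng = random.Random(1)
cloud = [(5 * rng.uniform(-1, 1), 2 * rng.uniform(-1, 1), rng.uniform(-1, 1))
         for _ in range(60)]
moved = [rigid_and_scale(p) for p in cloud]
diff = max(abs(x - y) for x, y in zip(shape_index(cloud), shape_index(moved)))
print(diff)  # largest difference between the two indexes
```

Since the histograms are built in a frame attached to the shape itself, the two indexes should agree up to rounding, which is why no prior alignment of the structures is needed.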
Homo Sapiens Hemoglobin: Query with the lrly structure
Homo Sapiens Hemoglobin: Query with the 1flq structure
2D Indexing of Proteins
Color encoding of physicochemical properties: secondary structures, residue names and residue types (e.g. hydrophobic, which is important for docking)
Characteristic views: efficient with only four (4) views
Each view is indexed or described according to its visual appearance
~1000 bytes per index (for 4 views)
2D Indexing
Tensor of inertia of the distribution
Frame: eigenvectors and eigenvalues
Isotropic sampling: 4 views
For each view:
– Structuring element
– Random motion
– Joint distribution: color and proportion within
the element
– Accumulation
250 bytes
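The per-view procedure can be illustrated with a small sketch. It assumes, without confirmation from the slides, that a view is a grid of quantized color labels, that the structuring element is a square window, and that "random motion" means placing the window at random positions; the function name, window size and bin count are hypothetical.

```python
import random

def view_index(image, n_colors, element=3, prop_bins=4, samples=500, seed=0):
    """Joint histogram of (color, proportion of that color inside the element).

    image: 2D list of integer color labels in range(n_colors).
    The square structuring element is dropped at random positions; for each
    color present inside it, the bin for that color and its area proportion
    is incremented. The accumulated histogram is normalized to sum to 1.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    area = element * element
    hist = [[0.0] * prop_bins for _ in range(n_colors)]
    for _ in range(samples):
        top = rng.randrange(height - element + 1)
        left = rng.randrange(width - element + 1)
        counts = [0] * n_colors
        for y in range(top, top + element):
            for x in range(left, left + element):
                counts[image[y][x]] += 1
        for color, count in enumerate(counts):
            if count:
                # Proportion count/area falls in (0, 1]; a full window goes in the last bin.
                bin_ = min(prop_bins - 1, count * prop_bins // area)
                hist[color][bin_] += 1
    total = sum(map(sum, hist))
    return [[value / total for value in row] for row in hist]

# Hypothetical 6x6 view: left half color 0, right half color 1.
view = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
for row in view_index(view, n_colors=2):
    print(row)
```

Unlike a plain color histogram, the joint distribution also records how colors are mixed locally, so two views with the same overall colors but different layouts receive different indexes.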
3D versus 2D Retrieval
2D indexing approach utilizes the colour, texture and
composition of the images
3D indexing method is based on shape: more
precise; no ambiguity
Research question:
– The colours of protein structures provide us with a
semantic key to the functionality thereof.
– Therefore, the 2D image retrieval should provide
us with a complementary view, in contrast to when we apply a shape-based description.
3D and 2D query results for
Glutamyl-tRNA synthetase (GluRS) family members
Envelope Representation and 3D
Indexing
Important for docking
Complex shape
Phage T4 lysozyme from Bacteriophage T4: 142l
Retrieval of all members of the Phage T4 lysozyme from Bacteriophage T4 conformation, using the
142l structure as a query, with 100% precision and 100% recall (80 out of 80)
Envelope: P22 tailspike protein from
Salmonella phage P22 - 1tyv
Retrieval of all members of the P22 tailspike
protein from Salmonella phage P22 conformation,
using the 1tyv structure as a query, with
100% precision and 100% recall (9 out of 9)
Docking
Goal of protein-ligand docking:
– To predict the position and orientation of a ligand when
bound to a protein receptor
Docking through virtual fragmentation
Random, or constrained, fragmentation of the
proteins
3D indexing of each fragment (as previously
described)
For a given fragment or receptor, retrieval of the
closest fragments or ligands from a 3D shape
point of view: query by example
Flexibility taken into account by using various
conformations
of the same protein, when
Docking through virtual fragmentation…
Conclusions
3D and 2D indexing
Exhaustive search in real time
Can describe various representations
Similarity search, query by example
Very good precision and recall
Very large databases
Address the docking problem: virtual fragmentation
Help to reduce the number of clinical trials
Eric: eric.paquet@nrc-cnrc.gc.ca
Herna: hlviktor@site.uottawa.ca