Conformational characterization of disulfide bonds: A tool for protein classification


HAL Id: hal-00634004

https://hal.archives-ouvertes.fr/hal-00634004

Submitted on 20 Oct 2011


José Rui Ferreira Marques, Rute R. da Fonseca, Brett Drury, André Melo

To cite this version:

José Rui Ferreira Marques, Rute R. da Fonseca, Brett Drury, André Melo. Conformational characterization of disulfide bonds: A tool for protein classification. Journal of Theoretical Biology, Elsevier, 2010, 267 (3), pp.388. 10.1016/j.jtbi.2010.09.012. hal-00634004


www.elsevier.com/locate/yjtbi

Author’s Accepted Manuscript


PII: S0022-5193(10)00478-9

DOI: doi:10.1016/j.jtbi.2010.09.012

Reference: YJTBI 6152

To appear in: Journal of Theoretical Biology

Received date: 16 June 2010

Revised date: 29 August 2010

Accepted date: 8 September 2010

Cite this article as: José Rui Ferreira Marques, Rute R. da Fonseca, Brett Drury and André Melo, Conformational characterization of disulfide bonds: A tool for protein classification, Journal of Theoretical Biology, doi:10.1016/j.jtbi.2010.09.012

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


José Rui Ferreira Marques¹ (zerui.marques@fc.up.pt), Rute R. da Fonseca² (rute.r.da.fonseca@gmail.com), Brett Drury³ (brett.drury@gmail.com), André Melo¹,# (asmelo@fc.up.pt)

¹ REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências da Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 Porto, Portugal

² CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal

³ LIAAD-INESC, Rua de Ceuta, 118, 6º, 4050-190 Porto, Portugal

# Corresponding author (Email address: asmelo@fc.up.pt, Tel: +351 220402503, Fax: +351 220402659).


Abstract

Background

Throughout evolution, mutations in particular regions of some protein structures have resulted in extra covalent bonds that increase the overall robustness of the fold: disulfide bonds. The two strategically placed cysteines can also play a more direct role in protein function, either by assisting thiol/disulfide exchange or through allosteric effects. In this work, we examined how the structural similarities between disulfides can reflect functional and evolutionary relationships between different proteins. We analyzed the conformational patterns of the disulfide bonds in a set of disulfide-rich proteins that included twelve SCOP superfamilies: thioredoxin-like and eleven superfamilies containing small disulfide-rich proteins (SDP).

Results

The twenty conformations considered in the present study were characterized by both structural and energetic parameters. The corresponding frequencies present diverse patterns for the different superfamilies. The least strained conformations are more abundant in the SDP superfamilies, while the “catalytic” +/-RHHook is dominant in the thioredoxin-like superfamily. The “allosteric” -RHStaple is moderately abundant in the BBI, Crisp and thioredoxin-like superfamilies and less frequent in the remaining superfamilies. Using a hierarchical clustering analysis, we found that the twelve superfamilies were grouped into biologically significant clusters.

Conclusions

In this work, we carried out an extensive statistical analysis of the conformational motifs of the disulfide bonds present in a set of disulfide-rich proteins. We show that the conformational patterns observed in disulfide bonds are sufficient to group proteins that share both functional and structural patterns and can therefore be used as a criterion for protein classification.

Keywords: Disulfide bond, conformer, cluster analysis, protein classification.


Introduction

Disulfide bonds are a common motif in Nature. These structural elements have a significant role in the thermal stability and function of proteins (Bhattacharyya et al., 2004; Creighton, 1988; Hogg, 2003; Klink et al., 2000; Sardiu et al., 2007). From an evolutionary perspective, these bonds are a relatively recent addition to protein structure (Brooks and Fresco, 2002; Brooks et al., 2002; Jordan et al., 2005; Schmidt and Hogg, 2007). According to their respective functions, disulfide bonds can be classified as structural, catalytic or allosteric (Schmidt et al., 2006; Schmidt and Hogg, 2007). Schmidt et al. (2006) performed a thorough analysis of the disulfides present in the X-ray structures of the PDB database and found that both catalytic and allosteric disulfides fell into particular structural categories. These two groups had a higher average potential energy, reflecting a functional role that requires easy bond breaking (Schmidt et al., 2006).

The three-dimensional structure of disulfides is highly conserved in Nature and has been used for protein clustering (Cheek et al., 2006; Chuang et al., 2003; Harrison and Sternberg, 1996; Thangudu et al., 2007). Different schemes have been introduced to classify disulfide conformers (Harrison and Sternberg, 1996; Hutchinson and Thornton, 1996; Ozhogina and Bominaar, 2009; Schmidt et al., 2006; Srinivasan et al., 1990), and in this work we adopted the scheme proposed by Schmidt et al. (2006). We analyzed a sample of disulfide bonds associated with a protein set extracted from the SCOP database (Andreeva et al., 2004; Andreeva et al., 2008; Murzin et al., 1995). The protein set included eleven superfamilies of small disulfide-rich proteins (SDP) and the thioredoxin-like superfamily. Each superfamily selected for the protein set had to fit the following criteria: (i) contain a minimum of thirty disulfide bonds, (ii) have a minimum of five PDB structures available, (iii) have X-ray structures with a resolution better than 2.5 Å and (iv) have only uncomplexed structures. To understand whether the structure of the disulfides reflected functional or evolutionary relationships between the different proteins, we grouped the 12 superfamilies into clusters according to their disulfide conformations, using a Hierarchical Clustering Analysis (HCA) and a structure-based distance protocol. The results demonstrate that the clusters aggregate superfamilies sharing both functional and structural patterns; we therefore conclude that the conformational patterns of disulfide bonds are a valid protein classification criterion.
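Criteria (i) to (iv) amount to a simple filter. The sketch below is illustrative only: the `PDBEntry` record and `select_superfamily` function are hypothetical, "resolution better than 2.5 Å" is read as a resolution value below 2.5 Å, and applying the bond and structure counts after the per-structure filters is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PDBEntry:
    """Hypothetical record for one PDB structure of a superfamily."""
    resolution: float   # X-ray resolution in Å
    complexed: bool     # True if the deposited structure is a complex
    n_disulfides: int   # number of disulfide bonds in the structure


def select_superfamily(entries, min_bonds=30, min_structures=5, max_resolution=2.5):
    """Return the retained entries if criteria (i)-(iv) hold, otherwise None."""
    # (iii) and (iv): keep uncomplexed structures with resolution below the cut-off
    kept = [e for e in entries if e.resolution < max_resolution and not e.complexed]
    # (ii): enough PDB structures
    if len(kept) < min_structures:
        return None
    # (i): enough disulfide bonds in total
    if sum(e.n_disulfides for e in kept) < min_bonds:
        return None
    return kept


entries = [PDBEntry(2.0, False, 6) for _ in range(5)]     # toy superfamily
print(select_superfamily(entries) is not None)           # True: 5 structures, 30 bonds
```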

Methodology

The scheme used in this work to classify the disulfide conformers was based on five relevant torsion angles (Figure 1). The disulfide species were treated as symmetrical; in this context, only twenty conformational categories had to be considered (Table 1). For example, the -RHHook conformational category can be obtained from either of the torsion-angle sign combinations (-, +, +, -, -) or (-, -, +, +, -), given in the order (χ1, χ2, χ3, χ2', χ1'). This classification was based on structural patterns (Schmidt et al., 2006) that included main, orientational and peripheral motifs (Table 2).
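The sign-based scheme can be sketched as follows. This is a minimal illustration, not the Disulph program: the function name `classify` is hypothetical, an angle of exactly 0º is arbitrarily treated as positive, and hooks are assumed to be oriented so that χ2 shares the sign of χ3 before the peripheral label sign(χ1)/sign(χ1') is read. Of the 2^5 = 32 sign patterns, 8 read the same in both directions, so merging reversed bonds leaves (32 + 8)/2 = 20 categories.

```python
from itertools import product


def _sign(angle):
    # An angle of exactly 0 degrees is treated as positive (arbitrary choice).
    return "+" if angle >= 0 else "-"


def classify(chi1, chi2, chi3, chi2p, chi1p):
    """Map five torsion angles (degrees) to a label such as '-RHHook'."""
    s1, s2, s3, s2p, s1p = (_sign(a) for a in (chi1, chi2, chi3, chi2p, chi1p))
    hand = "RH" if s3 == "+" else "LH"          # orientational motif: sign of chi3
    if s2 == s3 == s2p:
        motif = "Spiral"
    elif s2 == s2p:                              # chi2 and chi2' agree, chi3 differs
        motif = "Staple"
    else:
        motif = "Hook"
        if s2 != s3:                             # read the bond from the other end
            s1, s1p = s1p, s1
    if s1 == s1p:
        peripheral = s1
    elif motif == "Hook":
        peripheral = f"{s1}/{s1p}"               # hooks keep the order chi1, chi1'
    else:
        peripheral = "+/-"                       # spirals and staples are symmetric
    return f"{peripheral}{hand}{motif}"


# Every sign pattern (+1/-1) collapses onto one of the categories.
categories = {classify(*pattern) for pattern in product((1, -1), repeat=5)}
print(len(categories))  # 20
```

Both sign combinations quoted for -RHHook map to the same label, which is the symmetry the text relies on.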

[Insert Figure 1]

[Insert Table 1]

[Insert Table 2]

Representative structures for the different conformational categories are presented in Tables 3 to 5.

[Insert Table 3]

[Insert Table 4]

[Insert Table 5]

The protein set under study is characterized in Table 6. We determined the five relevant torsion angles (χ1, χ2, χ3, χ2' and χ1') for each disulfide bond. In addition, the d(Cα-Cα') and d(Cβ-Cβ') distances and the dihedral strain energy (DSE) were evaluated.

[Insert Table 6]

The DSE quantity was expressed as a function of the five above-mentioned torsion angles by the empirical equation (Katz and Kossiakoff, 1986; Weiner et al., 1984):

$$\mathrm{DSE}\ (\mathrm{kJ\,mol^{-1}}) = 8.37\,[1+\cos(3\chi_1)] + 8.37\,[1+\cos(3\chi_1')] + 4.18\,[1+\cos(3\chi_2)] + 4.18\,[1+\cos(3\chi_2')] + 14.64\,[1+\cos(2\chi_3)] + 2.51\,[1+\cos(3\chi_3)] \qquad (1)$$

The DSE quantity provided a useful ranking of the most favored disulfide conformations. The minimum (2.5 kJ mol⁻¹) and maximum (84.5 kJ mol⁻¹) values of DSE correspond to the torsion angle combinations (60º, 60º, ±83º, 60º, 60º) and (0º, 0º, 0º, 0º, 0º), respectively (Schmidt et al., 2006). Despite its simplicity, this equation has been successfully applied for a semi-quantitative evaluation of the strain energy in disulfide bonds (Schmidt et al., 2006; Schmidt and Hogg, 2007).
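Equation (1) translates directly into code. The sketch below (hypothetical function name `dse`, angles in degrees) reproduces the 84.5 kJ mol⁻¹ maximum for the (0º, 0º, 0º, 0º, 0º) geometry; it is an illustration of the formula, not the authors' implementation.

```python
import math


def dse(chi1, chi2, chi3, chi2p, chi1p):
    """Dihedral strain energy of equation (1), in kJ/mol; angles in degrees."""
    def term(multiplicity, angle):
        return 1.0 + math.cos(math.radians(multiplicity * angle))

    return (8.37 * (term(3, chi1) + term(3, chi1p))
            + 4.18 * (term(3, chi2) + term(3, chi2p))
            + 14.64 * term(2, chi3)
            + 2.51 * term(3, chi3))


print(round(dse(0, 0, 0, 0, 0), 1))            # 84.5, the maximum quoted in the text
print(round(dse(-60, -60, 90, -60, -60), 2))   # 2.51
```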

Representative conformations of the different types of disulfide bonds (structural, catalytic or allosteric) are identified in Table 7. We will refer to bonds with the +/-RHHook and -RHStaple conformations as “catalytic” and “allosteric”, respectively, because these two types of bonds were found to be intimately associated with those conformational categories (Schmidt et al., 2006).

A computer program, named Disulph, was developed to perform the calculations. The disulfide bond propensity, $\mathrm{Pr}_A$, of a superfamily $A$ with $np_A$ PDB structures was calculated as

$$\mathrm{Pr}_A = \frac{1}{np_A}\sum_{k=1}^{np_A} 100\,\frac{nss_k}{nres_k} \qquad (2)$$

where $nss_k$ and $nres_k$ were, respectively, the number of disulfide bonds and the number of coded residues in PDB structure $k$. This quantity evaluates the frequency of disulfide bonds within a superfamily; it is calculated as the average frequency over the corresponding sample of PDB structures.
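Equation (2) is a plain average over the PDB structures of a superfamily. In this sketch, the input format (a list of `(nss_k, nres_k)` pairs) and the function name `propensity` are assumptions made for illustration.

```python
def propensity(structures):
    """Pr_A of equation (2), in percent.

    `structures` holds one (nss_k, nres_k) pair per PDB structure k of
    superfamily A: number of disulfide bonds and number of coded residues.
    """
    if not structures:
        raise ValueError("a superfamily needs at least one PDB structure")
    per_structure = [100.0 * nss / nres for nss, nres in structures]
    return sum(per_structure) / len(per_structure)


# Two hypothetical structures: 3 bonds in 60 residues (5%) and 4 in 50 (8%).
print(propensity([(3, 60), (4, 50)]))  # 6.5
```

Averaging per-structure percentages (rather than pooling all bonds and residues) weights every PDB structure equally, which is what equation (2) prescribes.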

The frequencies associated with all the conformational categories defined in Table 1 were then evaluated for each superfamily and for the whole sample. These quantities were used to build a squared Euclidean distance matrix, whose elements, $d^2_{\mathrm{Euclidean}}(A,B)$, were defined as:

$$d^2_{\mathrm{Euclidean}}(A,B) = \sum_{i=1}^{20}\left[\mathrm{freq}(i,A) - \mathrm{freq}(i,B)\right]^2; \quad A = 1,\dots,12 \text{ and } B = 1,\dots,12 \qquad (3)$$

In equation (3), freq(i,A) and freq(i,B) are, respectively, the frequency of conformational category i in superfamilies A and B. The squared Euclidean distance matrix provides a measure of dissimilarity between objects in an n-dimensional space and can therefore be used in cluster analysis.
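The frequency profiles and equation (3) can be sketched as follows. The helper names and the input format (one list of category labels per superfamily) are hypothetical, the category list is truncated to three entries for brevity, and frequencies are kept as fractions in [0, 1].

```python
from collections import Counter


def frequency_profile(labels, categories):
    """Fraction of a superfamily's disulfides in each category (order of `categories`)."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in categories]


def squared_euclidean_matrix(profiles):
    """Equation (3): d2[A][B] = sum over categories i of (freq(i,A) - freq(i,B))**2."""
    n = len(profiles)
    return [[sum((fa - fb) ** 2 for fa, fb in zip(profiles[a], profiles[b]))
             for b in range(n)]
            for a in range(n)]


cats = ["-LHSpiral", "-RHStaple", "+/-RHHook"]          # truncated category list
p1 = frequency_profile(["-LHSpiral", "-LHSpiral", "-RHStaple", "+/-RHHook"], cats)
p2 = frequency_profile(["+/-RHHook", "+/-RHHook"], cats)
print(p1, p2)                              # [0.5, 0.25, 0.25] [0.0, 0.0, 1.0]
print(squared_euclidean_matrix([p1, p2]))  # [[0.0, 0.875], [0.875, 0.0]]
```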


In order to represent this matrix, we adopted the intuitive formalism introduced by Xie et al. (2000). The coordinates of the original objects (the twelve superfamilies) were projected into 3D Cartesian space by minimizing the squared deviation cost function SD:

$$SD = \sum_{A=1}^{11}\sum_{B=A+1}^{12}\left[d_{\mathrm{Euclidean}}(A,B) - d(A,B)\right]^2 \qquad (4)$$

where $d(A,B)$ was the distance between the projections of superfamilies A and B in 3D Cartesian space. We used the Newton method to carry out the iterative minimization. The procedure associated with equation (4) was introduced for visualizing large chemical databases (Xie et al., 2000), where the minimization of this cost function provided an appropriate representation of the original high-dimensional space of chemical descriptors in a low-dimensional (2D or 3D) space.
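A minimal sketch of the projection step, assuming the target distances in equation (4) are the square roots of the equation (3) elements. Note a deliberate substitution: it uses plain gradient descent with a fixed step size (NumPy only) instead of the Newton minimization used in this work, so it illustrates the cost function rather than reproducing the original optimizer. The function names are hypothetical.

```python
import numpy as np


def sd_cost(x, delta):
    """Equation (4): sum over pairs A < B of (delta[A, B] - d(A, B))**2."""
    delta = np.asarray(delta, dtype=float)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    iu = np.triu_indices(len(x), k=1)
    return float(((delta[iu] - d[iu]) ** 2).sum())


def project_3d(delta, n_iter=5000, step=0.01, seed=0):
    """Return (n, 3) coordinates whose pairwise distances approximate `delta`."""
    delta = np.asarray(delta, dtype=float)
    n = len(delta)
    scale = delta.mean() if delta.mean() > 0 else 1.0
    x = np.random.default_rng(seed).normal(scale=scale, size=(n, 3))  # random start
    for _ in range(n_iter):
        diff = x[:, None, :] - x[None, :, :]                 # x_A - x_B, shape (n, n, 3)
        d = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(d, 1.0)                             # avoid 0/0 on the diagonal
        coef = (d - delta) / d
        np.fill_diagonal(coef, 0.0)
        grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1)   # gradient of SD w.r.t. x_A
        x -= step * grad
    return x


# Toy target: four objects all at distance 1 (a regular tetrahedron fits exactly in 3D).
target = np.ones((4, 4)) - np.eye(4)
coords = project_3d(target)
print(sd_cost(coords, target))   # close to zero
```

With twelve superfamilies the same code applies unchanged; a residual SD above zero then measures how much of the 20-dimensional structure is lost in three dimensions.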

The squared Euclidean distance matrix was then used in an HCA procedure (Johnson and Wichern, 2007), which provided a classification of the superfamilies into different clusters. We evaluated the consistency of the HCA partitioning by computing the squared Euclidean distance matrix in the cluster space. The elements of this matrix were the mean squared distances between a cluster $C_i$ with $n_{C_i}$ superfamilies and a cluster $C_j$ with $n_{C_j}$ superfamilies, $MSd^2_{\mathrm{Euclidean}}(C_i,C_j)$, and within a cluster $C_i$, $MSd^2_{\mathrm{Euclidean}}(C_i)$:

$$MSd^2_{\mathrm{Euclidean}}(C_i,C_j) = \frac{1}{n_{C_i}\,n_{C_j}}\sum_{A=1}^{n_{C_i}}\sum_{B=1}^{n_{C_j}} d^2_{\mathrm{Euclidean}}(A,B) \qquad (5)$$

$$MSd^2_{\mathrm{Euclidean}}(C_i) = \frac{2}{n_{C_i}\,(n_{C_i}-1)}\sum_{A=1}^{n_{C_i}-1}\sum_{B=A+1}^{n_{C_i}} d^2_{\mathrm{Euclidean}}(A,B) \qquad (6)$$

In equation (5), A and B run over the superfamilies of $C_i$ and $C_j$, respectively; in equation (6), both run over the superfamilies of $C_i$.

This matrix was defined according to the mean (average) linkage criterion of the HCA procedure (Johnson and Wichern, 2007). The dissimilarity between two clusters $C_i$ and $C_j$ increases with the corresponding off-diagonal element, $MSd^2_{\mathrm{Euclidean}}(C_i,C_j)$. Conversely, the similarity within a cluster $C_i$ increases as the corresponding diagonal element, $MSd^2_{\mathrm{Euclidean}}(C_i)$, decreases.
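Equations (5) and (6) can be computed from any squared-distance matrix and a partition. In this sketch (hypothetical function `cluster_msd`), clusters are lists of object indices, and singleton clusters, for which equation (6) is undefined, are assigned a within-cluster value of 0.

```python
from itertools import combinations


def cluster_msd(d2, clusters):
    """Mean squared distances between clusters (eq. 5, off-diagonal) and within them (eq. 6)."""
    k = len(clusters)
    m = [[0.0] * k for _ in range(k)]
    for i, ci in enumerate(clusters):
        pairs = list(combinations(ci, 2))            # each unordered pair A < B once
        m[i][i] = sum(d2[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
        for j in range(i + 1, k):
            cj = clusters[j]
            between = sum(d2[a][b] for a in ci for b in cj) / (len(ci) * len(cj))
            m[i][j] = m[j][i] = between
    return m


xs = [0.0, 1.0, 10.0, 11.0]                          # toy 1D objects
d2 = [[(p - q) ** 2 for q in xs] for p in xs]
print(cluster_msd(d2, [[0, 1], [2, 3]]))             # [[1.0, 100.5], [100.5, 1.0]]
```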

In this work we used the divisive HCA method, which successively partitions an initial set of n objects into finer clusters. The corresponding algorithm was the following:

(i) Assign the n objects to a single cluster.

(ii) Compute a distance matrix in the cluster space using an appropriate metric. As mentioned above, we adopted a squared Euclidean metric in this work.

(iii) Find the least similar objects and separate them into different clusters.

(iv) Repeat steps (ii) and (iii) until the diagonal elements of this matrix are significantly smaller than the off-diagonal ones.
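Steps (i) to (iv) can be illustrated with the short divisive procedure below. It is one possible reading of the algorithm, not the authors' code: the two least similar objects (largest d²) inside any current cluster seed a split, every other member joins the nearer seed, and, as a simplification of step (iv), splitting stops once a requested number of clusters is reached. The function name `divisive_hca` is hypothetical.

```python
def divisive_hca(d2, n_clusters):
    """Split objects 0..n-1 into `n_clusters` groups using a squared-distance matrix."""
    clusters = [list(range(len(d2)))]                    # step (i)
    while len(clusters) < n_clusters:
        # steps (ii)-(iii): least similar pair inside any current cluster
        best = None
        for idx, members in enumerate(clusters):
            for pos, a in enumerate(members):
                for b in members[pos + 1:]:
                    if best is None or d2[a][b] > best[0]:
                        best = (d2[a][b], idx, a, b)
        if best is None or best[0] == 0:                 # nothing left to separate
            break
        _, idx, a, b = best
        members = clusters.pop(idx)
        near_a = [m for m in members if d2[m][a] <= d2[m][b]]
        near_b = [m for m in members if d2[m][a] > d2[m][b]]
        clusters += [near_a, near_b]
    return clusters


xs = [0.0, 0.1, 5.0, 5.1, 20.0]                          # toy 1D data
d2 = [[(p - q) ** 2 for q in xs] for p in xs]
print(sorted(sorted(c) for c in divisive_hca(d2, 3)))   # [[0, 1], [2, 3], [4]]
```

A stopping rule closer to step (iv) would recompute the cluster-space matrix after each split and stop once its diagonal is clearly below the off-diagonal entries.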

Results

The characterization of the disulfide conformational categories found in our sample is presented in Table 7. The -LHSpiral is the most frequently observed category (28.9%) and has the lowest DSE (11.5 kJ mol⁻¹). Additionally, the six least strained categories (-LHSpiral, +/-RHSpiral, +/-LHSpiral, -RHSpiral, +RHSpiral and -/+RHHook) are clearly prevalent (63.1%) relative to the remaining, more strained categories (36.9%).
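The 63.1% figure can be checked against Table 7. The snippet below, a hypothetical helper operating on values transcribed from that table, ranks the twenty categories by DSE, takes the six lowest and sums their frequencies; the unweighted mean DSE of each group reproduces the 15.6 and 28.6 kJ mol⁻¹ entries of the "Least strained" and "Most strained" rows.

```python
# (frequency %, mean DSE in kJ/mol) per category, transcribed from Table 7
TABLE7 = {
    "-LHSpiral": (28.9, 11.5), "-RHHook": (9.9, 25.0), "+/-RHSpiral": (8.6, 14.5),
    "+/-LHSpiral": (7.9, 17.9), "-RHSpiral": (7.0, 18.9), "+/-RHHook": (6.1, 19.4),
    "+RHSpiral": (6.0, 12.8), "-LHHook": (5.2, 37.0), "-/+RHHook": (4.7, 17.9),
    "-RHStaple": (4.0, 21.1), "+/-LHHook": (2.2, 26.8), "-/+LHHook": (1.9, 32.7),
    "+/-LHStaple": (1.6, 30.3), "-LHStaple": (1.5, 31.4), "+LHSpiral": (1.4, 20.8),
    "+LHHook": (1.2, 29.3), "+RHHook": (0.7, 30.7), "+/-RHStaple": (0.6, 32.3),
    "+LHStaple": (0.4, 39.3), "+RHStaple": (0.1, 24.9),
}


def strain_groups(table, k=6):
    """Split categories into the k least strained (lowest DSE) and the rest."""
    ranked = sorted(table, key=lambda cat: table[cat][1])
    least, most = ranked[:k], ranked[k:]

    def mean_dse(cats):
        return sum(table[c][1] for c in cats) / len(cats)

    share = sum(table[c][0] for c in least)
    return least, share, mean_dse(least), mean_dse(most)


least, share, dse_least, dse_most = strain_groups(TABLE7)
print(least)
print(round(share, 1), round(dse_least, 1), round(dse_most, 1))   # 63.1 15.6 28.6
```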

The representative conformations for catalytic (+/-RHHook) and allosteric (-RHStaple) disulfide bonds have moderate DSE values (19.4 and 21.1 kJ mol⁻¹, respectively). We found the d(Cα-Cα') distances to be more relevant for disulfide conformational specificity than the d(Cβ-Cβ') distances (Table 7). The d(Cβ-Cβ') distances were quite insensitive to the nature of the conformational category (varying from 3.3 to 4.1 Å), whereas the d(Cα-Cα') distances varied significantly over the series (from 4.4 to 6.2 Å). For instance, in agreement with Schmidt et al. (2006), the -RHStaple conformation was characterized by significantly shorter d(Cα-Cα') distances than the other conformational categories.

[Insert Table 7]

[Insert Table 8]

[Insert Figure 2]

The frequencies of the different conformational categories, calculated for each superfamily, are presented in Table 8 and Figure 2. This figure shows that the thioredoxin-like and SDP superfamilies exhibit very distinct conformational patterns. The least strained conformations are highly abundant in the SDP superfamilies (from 43.4% to 86.5%) but occur at a very low frequency in the thioredoxin-like superfamily (13.8%). This is most evident for the most stable conformation (-LHSpiral), for which the SDP superfamilies present frequencies about four or more times larger than the thioredoxin-like frequency (from 12.1% to 43.8%, against 3.1%; Table 8 and Figure 2). Most of the disulfide bonds of the thioredoxin-like superfamily (50.8%) are associated with the “catalytic” +/-RHHook conformation, whereas this conformation is relatively rare (from 0.0% to 7.7%) in the SDP superfamilies (Table 8 and Figure 2). On the other hand, the “allosteric” -RHStaple is moderately abundant in the BBI (24.2%), Crisp (24.1%) and thioredoxin-like (16.9%) superfamilies and scarce (from 0.0% to 5.7%) in the remaining superfamilies.

[Insert Figure 3]

Further insight into how the structural similarities between disulfides can reflect relationships between different proteins was obtained with an HCA procedure, whose dendrogram (Murtagh, 1984) is presented in Figure 3. The 3D Cartesian projection of the corresponding squared Euclidean distance matrix is represented in Figure 4, together with the six clusters identified by this analysis. Four clusters reflect the main structural and functional motifs identified in the sample:

o Cluster 1 includes the catalytic proteins of the thioredoxin-like superfamily, with the lowest disulfide propensities and a dominant α/β secondary structure;

o Cluster 4 includes most of the metabolic superfamilies (Cystine-Knot, EGF-Laminin and Plant lectins), with a dominant β secondary structure;

o Cluster 5 includes most of the toxin/defense superfamilies (Defensin-like, Omega toxins, Small snake toxins and Scorpion-like toxins), with moderate to high disulfide propensities and a dominant β secondary structure;

o Cluster 2 includes the plant protease inhibitors of the BBI superfamily, with high disulfide propensities and a dominant β secondary structure.

[Insert Figure 4]

The remaining two clusters reflect divergences from the motifs mentioned above:

o Cluster 3 includes the Crisp superfamily and is a divergence from cluster 5. This cluster includes toxin/defense proteins with low disulfide propensities and a dominant α secondary structure.

o Cluster 6 includes the BPTI-like and Kringle-like superfamilies. This cluster is the least well characterized and includes proteins with small disulfide propensities and different biological functions. The elements of this cluster share more diffuse properties: (i) they are constrained by three disulphide bonds with the same disulfide topology (1-6, 2-4 and 3-5) and (ii) they are associated with the regulation of similar biological processes (binding mediation, proteolytic activity, blood clotting, etc.).

We present the squared Euclidean distance matrix for the cluster space in Table 9. This table shows that the mean squared distances between clusters are significantly larger than those within clusters. These results strongly indicate that the HCA partitioning is consistent.
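The consistency argument can be made explicit with a simple check on Table 9. Because the text does not quantify "significantly", the rule below (each diagonal entry compared with the smallest off-diagonal entry in its row, reported as a ratio) is an assumed, deliberately loose criterion; the helper name is hypothetical.

```python
# Table 9 (percent): diagonal = within-cluster, off-diagonal = between-cluster
TABLE9 = [
    [0.00, 34.89, 40.29, 45.79, 32.03, 44.03],
    [34.89, 0.00, 8.77, 25.38, 13.42, 21.20],
    [40.29, 8.77, 0.00, 12.00, 11.24, 12.72],
    [45.79, 25.38, 12.00, 6.18, 19.99, 24.61],
    [32.03, 13.42, 11.24, 19.99, 3.98, 19.42],
    [44.03, 21.20, 12.72, 24.61, 19.42, 1.83],
]


def separation_ratios(m):
    """For each cluster: smallest between-cluster value divided by the within-cluster value."""
    ratios = []
    for i, row in enumerate(m):
        nearest = min(v for j, v in enumerate(row) if j != i)
        ratios.append(float("inf") if row[i] == 0 else nearest / row[i])
    return ratios


ratios = separation_ratios(TABLE9)
print(all(r > 1 for r in ratios))   # True: every diagonal entry is below its row minimum
```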

[Insert Table 9]

Conclusions

In this work, we carried out an extensive statistical analysis of the conformational motifs of the disulfide bonds found in a set of disulfide-rich proteins from twelve SCOP superfamilies.

The frequencies of the twenty conformational categories provided a near-spectral representation of the 12-dimensional hyperspace under study. The general trends observed in this sample were quite consistent with the results obtained by other authors (Schmidt et al., 2006; Schmidt and Hogg, 2007) for three different protein sets.

We calculated the root mean square deviations between our frequencies and those obtained previously; the three values were all lower than 2.6%.
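The comparison with earlier frequency sets reduces to a root mean square deviation over the conformational categories. A sketch, assuming both profiles list the categories in the same order and are given in percent; the function name `rmsd` and the toy profiles are hypothetical.

```python
import math


def rmsd(profile_a, profile_b):
    """Root mean square deviation between two frequency profiles (percent)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must list the same categories")
    sq = [(a - b) ** 2 for a, b in zip(profile_a, profile_b)]
    return math.sqrt(sum(sq) / len(sq))


print(rmsd([30.0, 10.0, 60.0], [28.0, 12.0, 60.0]))   # toy profiles
```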

The HCA partitioning of the data using a squared Euclidean distance matrix resulted in six clusters, most of which aggregate superfamilies sharing both functional and structural patterns. The only exception is cluster 6, whose elements present more diffuse connections. We therefore suggest the use of disulfide bond conformational patterns as a criterion in SDP classification, as well as to recognize the main divergences between SDP and other disulfide-rich superfamilies. However, the generalized application of this methodology to protein classification requires further investigation.

Acknowledgements:

We thank the Fundação para a Ciência e a Tecnologia (FCT) for a doctoral scholarship granted to José Rui Ferreira Marques. Rute R. da Fonseca was funded by FCT (SFRH/BPD/26769/2006). We thank the Universidade do Porto for an electric wheelchair and a TrackerPro (a computer input device that takes the place of a mouse for people with no hand movement) granted to José Rui Ferreira Marques.

References

Andreeva, A., Howorth, D., Brenner, S.E., Hubbard, T.J.P., Chothia, C., Murzin, A.G., 2004. SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Research 32, D226-D229.

Andreeva, A., Howorth, D., Chandonia, J.M., Brenner, S.E., Hubbard, T.J.P., Chothia, C., Murzin, A.G., 2008. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Research 36, D419-D425.

Bhattacharyya, R., Pal, D., Chakrabarti, P., 2004. Disulfide bonds, their stereospecific environment and conservation in protein structures. Protein Engineering Design and Selection 17, 795-808.

Brooks, D.J., Fresco, J.R., 2002. Increased frequency of cysteine, tyrosine, and phenylalanine residues since the last universal ancestor. Molecular & Cellular Proteomics 1, 125-131.

Brooks, D.J., Fresco, J.R., Lesk, A.M., Singh, M., 2002. Evolution of amino acid frequencies in proteins over deep time: Inferred order of introduction of amino acids into the genetic code. Molecular Biology and Evolution 19, 1645-1655.

Cheek, S., Krishna, S.S., Grishin, N.V., 2006. Structural classification of small, disulfide-rich protein domains. Journal of Molecular Biology 359, 215-237.

Chuang, C.C., Chen, C.Y., Yang, J.M., Lyu, P.C., Hwang, J.K., 2003. Relationship between protein structures and disulfide-bonding patterns. Proteins: Structure, Function, and Genetics 53, 1-5.

Creighton, T.E., 1988. Disulfide bonds and protein stability. Bioessays 8, 57-63.

Harrison, P.M., Sternberg, M.J.E., 1996. The disulphide beta-cross: From cystine geometry and clustering to classification of small disulphide-rich protein folds. Journal of Molecular Biology 264, 603-623.

Hogg, P.J., 2003. Disulfide bonds as switches for protein function. Trends in Biochemical Sciences 28, 210-214.

Hutchinson, E.G., Thornton, J.M., 1996. PROMOTIF: A program to identify and analyze structural motifs in proteins. Protein Science 5, 212-220.

Johnson, R.A., Wichern, D.W., 2007. Applied Multivariate Statistical Analysis. Prentice Hall, New Jersey.

Jordan, I.K., Kondrashov, F.A., Adzhubei, I.A., Wolf, Y.I., Koonin, E.V., Kondrashov, A.S., Sunyaev, S., 2005. A universal trend of amino acid gain and loss in protein evolution. Nature 435, 528.

Katz, B.A., Kossiakoff, A., 1986. The crystallographically determined structures of atypical strained disulfides engineered into subtilisin. Journal of Biological Chemistry 261, 5480-5485.

Klink, T.A., Woycechowsky, K.J., Taylor, K.M., Raines, R.T., 2000. Contribution of disulfide bonds to the conformational stability and catalytic activity of ribonuclease A. European Journal of Biochemistry 267, 566-572.

Murtagh, F., 1984. Counting dendrograms: A survey. Discrete Applied Mathematics 7, 191-199.

Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C., 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology 247, 536-540.

Ozhogina, O.A., Bominaar, E.L., 2009. Characterization of the kringle fold and identification of a ubiquitous new class of disulfide rotamers. Journal of Structural Biology 168, 223-233.

Sardiu, M.E., Cheung, M.S., Yu, Y.K., 2007. Cysteine-cysteine contact preference leads to target-focusing in protein folding. Biophysical Journal 93, 938-951.

Schmidt, B., Ho, L., Hogg, P.J., 2006. Allosteric disulfide bonds. Biochemistry 45, 7429-7433.

Schmidt, B., Hogg, P.J., 2007. Search for allosteric disulfide bonds in NMR structures. BMC Structural Biology 7, 49.

Srinivasan, N., Sowdhamini, R., Ramakrishnan, C., Balaram, P., 1990. Conformations of disulfide bridges in proteins. International Journal of Peptide and Protein Research 36, 147-155.

Thangudu, R.R., Sharma, P., Srinivasan, N., Offmann, B., 2007. Analycys: A database for conservation and conformation of disulphide bonds in homologous protein domains. Proteins: Structure, Function, and Bioinformatics 67, 255-261.

Weiner, S.J., Kollman, P.A., Case, D.A., Singh, U.C., Ghio, C., Alagona, G., Profeta, S., Weiner, P., 1984. A new force field for molecular mechanical simulation of nucleic acids and proteins. Journal of the American Chemical Society 106, 765-784.

Xie, D.X., Tropsha, A., Schlick, T., 2000. An efficient projection protocol for chemical databases: Singular value decomposition combined with truncated-Newton minimization. Journal of Chemical Information and Computer Sciences 40, 167-177.

Figure captions

Figure 1: Graphical representation of the five torsion angles used to classify the disulphide conformers.

Figure 2: Frequencies for the disulfide conformational categories.

Figure 3: Dendrogram for the hierarchical clustering analysis. The following notation was adopted: (1) Crisp, (2) Cystine-Knot, (3) Defensin-like, (4) EGF-Laminin, (5) Omega toxins, (6) Plant lectins, (7) Small snake toxins, (8) Scorpion-like toxins, (9) BBI, (10) BPTI-like, (11) Kringle-like and (12) Thioredoxin-like.

Figure 4: Projected 3D Cartesian representation of the squared Euclidean distance matrix and the clusters obtained by the hierarchical clustering analysis. The following notation was adopted: (1) Crisp, (2) Cystine-Knot, (3) Defensin-like, (4) EGF-Laminin, (5) Omega toxins, (6) Plant lectins, (7) Small snake toxins, (8) Scorpion-like toxins, (9) BBI, (10) BPTI-like, (11) Kringle-like and (12) Thioredoxin-like.

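Table 1. Classification of disulphide bonds into conformational categories (Schmidt et al., 2006).

Disulphide category# | χ1 | χ2 | χ3 | χ2' | χ1'
-LHSpiral | - | - | - | - | -
-RHHook | - | + | + | - | -
+/-RHSpiral | + | + | + | + | -
+/-LHSpiral | + | - | - | - | -
-RHSpiral | - | + | + | + | -
+/-RHHook | + | + | + | - | -
+RHSpiral | + | + | + | + | +
-LHHook | - | - | - | + | -
-/+RHHook | - | + | + | - | +
-RHStaple | - | - | + | - | -
+/-LHHook | + | - | - | + | -
-/+LHHook | - | - | - | + | +
+/-LHStaple | + | + | - | + | -
-LHStaple | - | + | - | + | -
+LHSpiral | + | - | - | - | +
+LHHook | + | - | - | + | +
+RHHook | + | + | + | - | +
+/-RHStaple | + | - | + | - | -
+LHStaple | + | + | - | + | +
+RHStaple | + | - | + | - | +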

# LH: Left-handed oriented; RH: Right-handed oriented; -: Negative value for the respective torsion angle; +: Positive value for the respective torsion angle.

Table 2. Characteristic conformational motifs used for disulphide classification.

Main motifs | χ2 | χ3 | χ2'
Spiral | + | + | +
Spiral | - | - | -
Staple | + | - | +
Staple | - | + | -
Hook | + | + | -
Hook | - | - | +

Orientational motifs | χ3
LH | -
RH | +

Peripheral motifs | χ1 | χ1'
- | - | -
+ | + | +
+/- | + | -
-/+ | - | +


Table 3. Representative structures for the spiral conformational categories.

-LHSpiral -RHSpiral

+LHSpiral +RHSpiral

+/-LHSpiral +/-RHSpiral


Table 4. Representative structures for the staple conformational categories.

-LHStaple -RHStaple

+LHStaple +RHStaple

+/-LHStaple +/-RHStaple


Table 5. Representative structures for the hook conformational categories.

-LHHook -RHHook

+LHHook +RHHook

+/-LHHook +/-RHHook

-/+LHHook -/+RHHook


Table 6. Characterization of the protein set under study. The sample used in the statistical analyses is considered to include all the disulphide bonds identified in this protein set.

Superfamily | Dominant secondary structure | Propensity# | No. of PDB structures | No. of disulphide bonds | Function
Crisp | α | 5.3% | 6 | 54 | Toxins/defense
Cystine-Knot | β | 3.7% | 13 | 112 | Metabolic
Defensin-like | β | 7.4% | 15 | 47 | Toxins/defense
EGF-Laminin | β | 6.4% | 27 | 121 | Metabolic
Omega toxins | β | 8.9% | 28 | 88 | Toxins/defense
Plant lectins | β | 9.9% | 8 | 100 | Metabolic
Small snake toxins | β | 6.5% | 40 | 209 | Toxins/defense
Scorpion-like toxins | β | 7.9% | 70 | 247 | Toxins/defense
BBI (Bowman-Birk inhibitors) | β | 9.6% | 5 | 33 | Protease inhibition
BPTI-like | α/β | 5.1% | 12 | 42 | Protease inhibition
Kringle-like | β | 3.7% | 12 | 53 | Metabolic
Thioredoxin-like | α/β | 0.8% | 43 | 66 | Isomerase catalysis

# Calculated by equation (2).


Table 7. Average parameters for the disulphide bond conformational categories in the sample under study. Representative conformations for structural (-LHSpiral), catalytic (+/-RHHook) and allosteric (-RHStaple) disulphide bonds are represented in bold.

Conformational category | Frequency | DSE/kJ mol⁻¹ | d(Cα-Cα')/Å | d(Cβ-Cβ')/Å
-LHSpiral | 28.9% | 11.5 | 5.7 | 3.7
-RHHook | 9.9% | 25.0 | 5.7 | 4.0
+/-RHSpiral | 8.6% | 14.5 | 5.9 | 3.8
+/-LHSpiral | 7.9% | 17.9 | 6.0 | 3.7
-RHSpiral | 7.0% | 18.9 | 6.0 | 3.8
+/-RHHook | 6.1% | 19.4 | 5.3 | 3.8
+RHSpiral | 6.0% | 12.8 | 5.8 | 3.7
-LHHook | 5.2% | 37.0 | 5.7 | 4.1
-/+RHHook | 4.7% | 17.9 | 5.5 | 3.9
-RHStaple | 4.0% | 21.1 | 4.4 | 4.0
+/-LHHook | 2.2% | 26.8 | 5.9 | 4.0
-/+LHHook | 1.9% | 32.7 | 6.1 | 4.0
+/-LHStaple | 1.6% | 30.3 | 5.0 | 3.7
-LHStaple | 1.5% | 31.4 | 5.5 | 3.9
+LHSpiral | 1.4% | 20.8 | 6.2 | 3.9
+LHHook | 1.2% | 29.3 | 5.9 | 3.8
+RHHook | 0.7% | 30.7 | 6.1 | 4.1
+/-RHStaple | 0.6% | 32.3 | 5.9 | 4.1
+LHStaple | 0.4% | 39.3 | 5.4 | 3.3
+RHStaple | 0.1% | 24.9 | 5.9 | 3.3
Least strained# | 63.1% | 15.6 | 5.8 | 3.8
Most strained | 36.9% | 28.6 | 5.6 | 3.9

# The six conformational categories with the smallest DSE have a grey background.


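Table 8. Frequencies for the different conformational categories. Superfamilies are numbered as in Figures 3 and 4; Sample refers to the whole sample.

Category / Superfamily | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Sample
-LHSpiral | 35.2% | 43.8% | 26.4% | 46.1% | 13.8% | 29.0% | 27.3% | 28.3% | 12.1% | 28.6% | 32.7% | 3.1% | 28.9%
-RHHook | 0.0% | 5.4% | 18.9% | 8.7% | 20.7% | 24.0% | 6.7% | 9.3% | 0.0% | 4.8% | 3.8% | 10.8% | 9.9%
+/-RHSpiral | 7.4% | 14.3% | 7.5% | 5.2% | 8.0% | 20.0% | 3.8% | 12.1% | 3.0% | 4.8% | 3.8% | 1.5% | 8.6%
+/-LHSpiral | 11.1% | 4.5% | 3.8% | 9.6% | 2.3% | 22.0% | 4.3% | 2.8% | 0.0% | 19.0% | 30.8% | 6.2% | 7.9%
-RHSpiral | 14.8% | 9.8% | 3.8% | 2.6% | 3.4% | 0.0% | 17.2% | 3.2% | 21.2% | 4.8% | 1.9% | 1.5% | 7.0%
+/-RHHook | 0.0% | 3.6% | 3.8% | 3.5% | 2.3% | 0.0% | 2.9% | 7.7% | 0.0% | 2.4% | 0.0% | 50.8% | 6.1%
+RHSpiral | 0.0% | 3.6% | 1.9% | 3.5% | 4.6% | 4.0% | 12.9% | 9.7% | 0.0% | 0.0% | 1.9% | 1.5% | 6.0%
-LHHook | 0.0% | 1.8% | 7.5% | 6.1% | 5.7% | 1.0% | 5.3% | 10.1% | 6.1% | 2.4% | 3.8% | 1.5% | 5.2%
-/+RHHook | 5.6% | 0.0% | 0.0% | 1.7% | 11.5% | 0.0% | 2.9% | 4.9% | 12.1% | 23.8% | 15.4% | 0.0% | 4.7%
-RHStaple | 24.1% | 0.0% | 5.7% | 1.7% | 4.6% | 0.0% | 2.4% | 0.4% | 24.2% | 0.0% | 0.0% | 16.9% | 4.0%
+/-LHHook | 0.0% | 0.0% | 9.4% | 1.7% | 5.7% | 0.0% | 2.9% | 2.0% | 0.0% | 0.0% | 1.9% | 3.1% | 2.2%
-/+LHHook | 0.0% | 1.8% | 5.7% | 0.0% | 3.4% | 0.0% | 3.8% | 2.4% | 0.0% | 0.0% | 0.0% | 0.0% | 1.9%
+/-LHStaple | 1.9% | 0.9% | 0.0% | 0.0% | 10.3% | 0.0% | 1.4% | 0.0% | 6.1% | 0.0% | 1.9% | 3.1% | 1.6%
-LHStaple | 0.0% | 0.9% | 3.8% | 0.9% | 2.3% | 0.0% | 1.0% | 1.6% | 6.1% | 9.5% | 0.0% | 0.0% | 1.5%
+LHSpiral | 0.0% | 8.9% | 0.0% | 0.0% | 1.1% | 0.0% | 1.0% | 1.2% | 0.0% | 0.0% | 0.0% | 0.0% | 1.4%
+LHHook | 0.0% | 0.0% | 1.9% | 2.6% | 0.0% | 0.0% | 2.4% | 1.2% | 6.1% | 0.0% | 0.0% | 0.0% | 1.2%
+RHHook | 0.0% | 0.9% | 0.0% | 1.7% | 0.0% | 0.0% | 0.5% | 1.2% | 0.0% | 0.0% | 1.9% | 0.0% | 0.7%
+/-RHStaple | 0.0% | 0.0% | 0.0% | 0.9% | 0.0% | 0.0% | 1.4% | 0.8% | 3.0% | 0.0% | 0.0% | 0.0% | 0.6%
+LHStaple | 0.0% | 0.0% | 0.0% | 2.6% | 0.0% | 0.0% | 0.0% | 0.8% | 0.0% | 0.0% | 0.0% | 0.0% | 0.4%
+RHStaple | 0.0% | 0.0% | 0.0% | 0.9% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.1%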


Table 9. Squared Euclidean distance matrix for the cluster space.

Cluster 1 2 3 4 5 6

1 0.00% 34.89% 40.29% 45.79% 32.03% 44.03%

2 34.89% 0.00% 8.77% 25.38% 13.42% 21.20%

3 40.29% 8.77% 0.00% 12.00% 11.24% 12.72%

4 45.79% 25.38% 12.00% 6.18% 19.99% 24.61%

5 32.03% 13.42% 11.24% 19.99% 3.98% 19.42%

6 44.03% 21.20% 12.72% 24.61% 19.42% 1.83%

[Figure 1: atoms N, Cα, Cβ, Sγ, Sγ', Cβ', Cα' and N', with the torsion angles χ1, χ2, χ3, χ2' and χ1']

[Figure 2: frequency (0% to 55%) of each disulphide conformational category in the Crisp, Cystine-Knot, Defensin-like, EGF-Laminin, Omega toxins, Plant lectins, Small snake toxins, Scorpion-like toxins, BBI, BPTI-like, Kringle-like and Thioredoxin-like superfamilies; the catalytic and allosteric categories are labelled]

[Figure 3: dendrogram of the divisive HCA (Steps 1 to 5); final clusters: Cluster 1 = {12}, Cluster 2 = {9}, Cluster 3 = {1}, Cluster 4 = {2, 4, 6}, Cluster 5 = {3, 5, 7, 8}, Cluster 6 = {10, 11}]

[Figure 4: 3D projection of superfamilies 1 to 12, grouped into Clusters 1 to 6]