HAL Id: hal-00634004
https://hal.archives-ouvertes.fr/hal-00634004
Submitted on 20 Oct 2011
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Conformational characterization of disulfide bonds: A tool for protein classification
José Rui Ferreira Marques, Rute R. da Fonseca, Brett Drury, André Melo
To cite this version:
José Rui Ferreira Marques, Rute R. da Fonseca, Brett Drury, André Melo. Conformational characterization of disulfide bonds: A tool for protein classification. Journal of Theoretical Biology, Elsevier, 2010, 267 (3), pp.388. 10.1016/j.jtbi.2010.09.012. hal-00634004
www.elsevier.com/locate/yjtbi
Author’s Accepted Manuscript
PII: S0022-5193(10)00478-9
DOI: doi:10.1016/j.jtbi.2010.09.012
Reference: YJTBI 6152
To appear in: Journal of Theoretical Biology
Received date: 16 June 2010
Revised date: 29 August 2010
Accepted date: 8 September 2010
This is a PDF file of an unedited manuscript that has been accepted for publication. As
a service to our customers we are providing this early version of the manuscript. The
manuscript will undergo copyediting, typesetting, and review of the resulting galley proof
before it is published in its final citable form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that apply
to the journal pertain.
Conformational characterization of disulfide bonds: a tool for protein classification
José Rui Ferreira Marques 1 (zerui.marques@fc.up.pt), Rute R. da Fonseca 2 (rute.r.da.fonseca@gmail.com), Brett Drury 3 (brett.drury@gmail.com), André Melo 1,# (asmelo@fc.up.pt)
1 REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências da Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 Porto, Portugal
2 CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal
3 LIAAD-INESC, Rua de Ceuta, 118, 6º, 4050-190 Porto, Portugal
# Corresponding author (Email address: asmelo@fc.up.pt, Tel: +351 220402503, Fax: +351 220402659).
Abstract
Background
Throughout evolution, mutations in particular regions of some protein structures have resulted in extra covalent bonds that increase the overall robustness of the fold: disulfide bonds. The two strategically placed cysteines can also have a more direct role in protein function, either by assisting thiol or disulfide exchange, or through allosteric effects. In this work, we verified how the structural similarities between disulfides can reflect functional and evolutionary relationships between different proteins. We analyzed the conformational patterns of the disulfide bonds in a set of disulfide-rich proteins that included twelve SCOP superfamilies: thioredoxin-like and eleven superfamilies containing small disulfide-rich proteins (SDP).
Results
The twenty conformations considered in the present study were characterized by both structural and energetic parameters. The corresponding frequencies present diverse patterns for the different superfamilies. The least-strained conformations are more abundant for the SDP superfamilies, while the "catalytic" +/-RHHook is dominant for the thioredoxin-like superfamily. The "allosteric" -RHStaple is moderately abundant for the BBI, Crisp and thioredoxin-like superfamilies and less frequent for the remaining superfamilies. Using a hierarchical clustering analysis, we found that the twelve superfamilies were grouped in biologically significant clusters.
Conclusions
In this work, we carried out an extensive statistical analysis of the conformational motifs for the disulfide bonds present in a set of disulfide-rich proteins. We show that the conformational patterns observed in disulfide bonds are sufficient to group proteins that share both functional and structural patterns and can therefore be used as a criterion for protein classification.
Keywords: Disulfide bond, conformer, cluster analysis, protein classification.
Introduction
Disulfide bonds are a common motif in Nature. These structural elements have a significant role in the thermal stability and function of proteins (Bhattacharyya et al., 2004; Creighton, 1988; Hogg, 2003; Klink et al., 2000; Sardiu et al., 2007). From an evolutionary perspective, these bonds are a relatively recent addition to protein structure (Brooks and Fresco, 2002; Brooks et al., 2002; Jordan et al., 2005; Schmidt and Hogg, 2007). According to their respective functions, disulfide bonds can be classified as structural, catalytic or allosteric (Schmidt et al., 2006; Schmidt and Hogg, 2007). Schmidt et al. (2006) performed a thorough analysis of the disulfides present in the X-ray structures of the PDB database, and found that both catalytic and allosteric disulfides fell into particular structural categories. The two groups had a higher average potential energy, which reflected a functional role that implies easy bond breaking (Schmidt et al., 2006).
The disulfide three-dimensional structure is highly conserved in Nature and has been used for protein clustering (Cheek et al., 2006; Chuang et al., 2003; Harrison and Sternberg, 1996; Thangudu et al., 2007). Different schemes have been introduced to classify the disulfide conformers (Harrison and Sternberg, 1996; Hutchinson and Thornton, 1996; Ozhogina and Bominaar, 2009; Schmidt et al., 2006; Srinivasan et al., 1990) and in this work we adopted the scheme proposed by Schmidt et al. (2006). We analyzed a sample of disulfide bonds associated with a protein set extracted from the SCOP database (Andreeva et al., 2004; Andreeva et al., 2008; Murzin et al., 1995). The protein set included eleven superfamilies of small disulfide-rich proteins (SDP) and the thioredoxin-like superfamily. Each superfamily selected for the protein set had to fit the following criteria: (i) contain a minimum of thirty disulfide bonds, (ii) have a minimum of five PDB structures available, (iii) have X-ray structures with a resolution better than 2.5 Å and (iv) have only uncomplexed structures. In order to understand whether or not the structure of the disulfides reflected functional or evolutionary relationships between the different proteins, we grouped the disulfides from the 12 superfamilies into different clusters using a Hierarchical Clustering Analysis (HCA) and a structure-based distance protocol. The results demonstrate that the clusters aggregate superfamilies that share both functional and structural patterns; we therefore conclude that the use of disulfide bond conformational patterns is a valid protein classification criterion.
Methodology
The scheme used in this work to classify the disulfide conformers was based on five relevant torsion angles (Figure 1). The disulfide species were treated as symmetrical. In this context, only twenty conformational categories had to be considered (Table 1). For example, the -RHHook conformational category can be obtained by either of the torsion angle sign combinations (-,+,+,-,-) or (-,-,+,+,-). This classification was based on structural patterns (Schmidt et al., 2006) that included main, orientational and peripheral motifs (Table 2).
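The symmetric treatment described above can be illustrated with a minimal Python sketch. The function name `classify` is hypothetical, and the rule used to orient hook conformations (reading the bond from the end whose χ2 shares the sign of χ3) is an assumption made for illustration; the source only states that both sign combinations map to the same category.

```python
from itertools import product

def classify(chi1, chi2, chi3, chi2p, chi1p):
    """Name a disulfide conformation from the signs of its five torsion angles (degrees)."""
    sign = lambda x: '+' if x >= 0 else '-'
    s1, s2, s3, s2p, s1p = (sign(a) for a in (chi1, chi2, chi3, chi2p, chi1p))
    hand = 'RH' if s3 == '+' else 'LH'        # orientational motif: sign of chi3
    if s2 == s2p:                             # main motif: signs of chi2, chi3, chi2'
        motif = 'Spiral' if s2 == s3 else 'Staple'
    else:
        motif = 'Hook'
        # Assumed convention: orient the hook so that chi2 has the sign of chi3.
        if s2 != s3:
            s1, s1p = s1p, s1
    if s1 == s1p:                             # peripheral motif: signs of chi1, chi1'
        prefix = s1
    elif motif == 'Hook':
        prefix = s1 + '/' + s1p               # hooks are asymmetric: +/- differs from -/+
    else:
        prefix = '+/-'                        # spirals and staples are symmetric
    return prefix + hand + motif

# Enumerating all 32 sign patterns yields the twenty categories of the scheme.
categories = {classify(*angles) for angles in product((-60.0, 60.0), repeat=5)}
```

Under these assumptions, both sign combinations quoted in the text give `-RHHook`, and the 32 possible sign patterns collapse to 20 distinct names.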
[Insert Figure 1]
[Insert Table 1]
[Insert Table 2]
Representative structures for the different conformational categories are presented in Tables 3 to 5.
[Insert Table 3]
[Insert Table 4]
[Insert Table 5]
The protein set under study is characterized in Table 6. We determined the five relevant torsion angles (χ1, χ2, χ3, χ2' and χ1') for each disulfide bond. Additionally, the (Cα-Cα' and Cβ-Cβ') distances and the dihedral strain energy (DSE) were also evaluated.
[Insert Table 6]
The DSE quantity was expressed, as a function of the five above-mentioned torsion angles, by the empirical equation (Katz and Kossiakoff, 1986; Weiner et al., 1984):
$$
\begin{aligned}
\mathrm{DSE}\ (\mathrm{kJ\,mol^{-1}}) ={}& 8.37\,(1+\cos 3\chi_1) + 8.37\,(1+\cos 3\chi_1') \\
&+ 4.18\,(1+\cos 3\chi_2) + 4.18\,(1+\cos 3\chi_2') \\
&+ 14.64\,(1+\cos 2\chi_3) + 2.51\,(1+\cos 3\chi_3)
\end{aligned}
\tag{1}
$$
The DSE quantity provided a useful ranking of the most favored disulfide conformations. The minimum (2.5 kJ mol⁻¹) and the maximum (84.5 kJ mol⁻¹) values of DSE correspond to the torsion angle combinations (60º, 60º, ±83º, 60º, 60º) and (0º, 0º, 0º, 0º, 0º), respectively (Schmidt et al., 2006). Despite its simplicity, this equation has been successfully applied for a semi-quantitative evaluation of the strain energy in disulfide bonds (Schmidt et al., 2006; Schmidt and Hogg, 2007).
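As an illustration, equation (1) can be evaluated with a few lines of Python. This is a minimal sketch rather than the authors' Disulph program; the function name `dse` and the choice of degrees as input unit are assumptions.

```python
import math

def dse(chi1, chi2, chi3, chi2p, chi1p):
    """Dihedral strain energy (kJ/mol) from five torsion angles given in degrees."""
    # term(n, angle) evaluates 1 + cos(n * angle) with the angle in degrees
    term = lambda n, angle: 1.0 + math.cos(math.radians(n * angle))
    return (8.37 * term(3, chi1) + 8.37 * term(3, chi1p)
            + 4.18 * term(3, chi2) + 4.18 * term(3, chi2p)
            + 14.64 * term(2, chi3) + 2.51 * term(3, chi3))

# Fully eclipsed geometry: every cosine equals 1, so each term is doubled.
dse_max = dse(0.0, 0.0, 0.0, 0.0, 0.0)
```

With all five angles at 0º every cosine term equals 1, so the sum is 2 × (8.37 + 8.37 + 4.18 + 4.18 + 14.64 + 2.51) = 84.5 kJ mol⁻¹, the maximum quoted above. The expression is unchanged when the two halves of the bond are exchanged, which is consistent with treating the disulfide as symmetrical.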
Representative conformations of the different types of disulfide bonds (structural, catalytic or allosteric) are identified in Table 7. We will be referring to bonds with the conformations +/-RHHook as "catalytic", and -RHStaple as "allosteric", because these two types of bonds were found to be intimately associated with those conformational categories (Schmidt et al., 2006).
A computer program, designated Disulph, was developed to perform the calculations. The disulfide bond propensity Pr_A, for a superfamily A with np_A PDB structures, was calculated as
$$
Pr_A = 100 \times \frac{1}{np_A} \sum_{k=1}^{np_A} \frac{nss_k}{nres_k}, \tag{2}
$$
where nss_k and nres_k were, respectively, the number of disulfide bonds and the number of coded residues in the PDB structure k. This quantity evaluates the frequency of the disulfide bonds within a superfamily. It is calculated as the average frequency associated with the corresponding sample of PDB structures.
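The propensity of equation (2) is a plain average of per-structure ratios, as the following Python sketch shows. The function name `propensity` and the input numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def propensity(structures):
    """Disulfide propensity (%) of a superfamily.

    structures: list of (nss_k, nres_k) pairs, one per PDB structure, giving the
    number of disulfide bonds and the number of coded residues.
    """
    ratios = [nss / nres for nss, nres in structures]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical superfamily with two structures: 3 bonds in 60 residues, 4 bonds in 50.
example = propensity([(3, 60), (4, 50)])
```

For this made-up input the two ratios are 0.05 and 0.08, so the propensity is about 6.5%. Because the ratios are averaged per structure, a large structure does not weigh more than a small one.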
The frequencies associated with all the conformational categories, defined in Table 1, were then evaluated for each superfamily and for the sample. These quantities were used to build a square Euclidean distances matrix, whose elements (d²_Euclidean(A,B)) were defined as:
$$
d^2_{Euclidean}(A,B) = \sum_{i=1}^{20} \bigl(freq(i,A) - freq(i,B)\bigr)^2; \quad A = 1,\dots,12 \ \text{and}\ B = 1,\dots,12 \tag{3}
$$
In equation (3), freq(i,A) and freq(i,B) are, respectively, the frequencies of conformational category i in the superfamilies A and B. The square Euclidean distances matrix defines a metric for evaluating the similarities between objects in n-dimensional spaces and therefore can be used in cluster analysis.
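A minimal Python sketch of the distance in equation (3) follows. The function names and the short frequency vectors are hypothetical; in the study each vector would hold the twenty category frequencies of one superfamily.

```python
def sq_euclidean(freq_a, freq_b):
    """Squared Euclidean distance between two frequency vectors of equal length."""
    if len(freq_a) != len(freq_b):
        raise ValueError("frequency vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(freq_a, freq_b))

def distance_matrix(freqs):
    """Full symmetric matrix of squared distances for a list of frequency vectors."""
    return [[sq_euclidean(fa, fb) for fb in freqs] for fa in freqs]

# Hypothetical two-category example, frequencies expressed as fractions.
d2 = distance_matrix([[0.5, 0.5], [0.25, 0.75]])
```

In this toy case each category differs by 0.25, so the off-diagonal element is 0.25² + 0.25² = 0.125 and the diagonal is zero.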
In order to represent this matrix, we adopted the intuitive formalism introduced by Xie et al. (2000). The coordinates of the original objects (the twelve superfamilies) were projected in the 3D Cartesian space by minimizing the square deviation cost function SD:
$$
SD = \sum_{A=1}^{12} \sum_{B=1}^{A-1} \bigl(d^2(A,B) - d^2_{Euclidean}(A,B)\bigr)^2, \tag{4}
$$
where d(A,B) was the distance between the projections of the superfamilies A and B in the 3D Cartesian space. We used the Newton method to carry out the iterative minimization process. The procedure associated with equation (4) was introduced for visualizing large chemical databases (Xie et al., 2000). The minimization of this equation provided an appropriate representation of the original high-dimensional space of the chemical descriptors in a low-dimensional space (2D or 3D).
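The projection step can be sketched in Python as follows. Two points are assumptions rather than details taken from the source: the cost compares squared projected distances with the squared Euclidean distances, and plain gradient descent with an adaptive step replaces the Newton method used by the authors. The names `stress` and `project` are hypothetical.

```python
import random

def stress(coords, d2):
    """Square deviation cost: sum over pairs of (projected d^2 - original d^2)^2."""
    total = 0.0
    for a in range(len(coords)):
        for b in range(a):
            proj = sum((p - q) ** 2 for p, q in zip(coords[a], coords[b]))
            total += (proj - d2[a][b]) ** 2
    return total

def project(d2, dim=3, steps=500, seed=0):
    """Low-dimensional coordinates whose squared distances approximate d2."""
    rng = random.Random(seed)
    n = len(d2)
    coords = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n)]
    cost, rate = stress(coords, d2), 0.1
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for a in range(n):
            for b in range(n):
                if a != b:
                    proj = sum((p - q) ** 2 for p, q in zip(coords[a], coords[b]))
                    for k in range(dim):
                        grad[a][k] += 4.0 * (proj - d2[a][b]) * (coords[a][k] - coords[b][k])
        trial = [[x - rate * g for x, g in zip(row, grow)]
                 for row, grow in zip(coords, grad)]
        trial_cost = stress(trial, d2)
        if trial_cost < cost:          # accept only moves that lower the cost
            coords, cost, rate = trial, trial_cost, rate * 1.2
        else:                          # otherwise shrink the step and retry
            rate *= 0.5
    return coords, cost
```

Because a step is accepted only when it lowers the cost, the returned cost can never exceed that of the random starting coordinates. A Newton-type minimizer would typically converge in fewer iterations.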
The square Euclidean distances matrix was then used for a HCA procedure (Johnson and Wichern, 2007), which provided a classification of the superfamilies in different clusters. We evaluated the consistency of the HCA partitioning by evaluating the square Euclidean distances matrix in the cluster space. The elements of this matrix were the mean square distances between a cluster C_i with n_Ci superfamilies and a cluster C_j with n_Cj superfamilies (MSd²_Euclidean(C_i,C_j)) and within a cluster C_i (MSd²_Euclidean(C_i)):
$$
MSd^2_{Euclidean}(C_i,C_j) = \frac{1}{n_{C_i} \times n_{C_j}} \sum_{A=1}^{n_{C_i}} \sum_{B=1}^{n_{C_j}} d^2_{Euclidean}(A,B) \tag{5}
$$
$$
MSd^2_{Euclidean}(C_i) = \frac{2}{n_{C_i} \times n_{C_i}} \sum_{A=1}^{n_{C_i}} \sum_{B=1}^{A-1} d^2_{Euclidean}(A,B) \tag{6}
$$
This matrix was defined according to the mean linkage criterion within the HCA procedure (Johnson and Wichern, 2007). The dissimilarity between two clusters C_i and C_j increases with the corresponding non-diagonal element (MSd²_Euclidean(C_i,C_j)). On the other hand, the similarity within a cluster C_i increases as the corresponding diagonal element (MSd²_Euclidean(C_i)) decreases.
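The cluster-space matrix can be sketched in Python as below. The within-cluster normalisation is taken here to be 2/(n × n), which gives zero for single-member clusters as in Table 9; this reading, like the function names, is an assumption.

```python
def msd_between(d2, ci, cj):
    """Mean square distance between clusters ci and cj (lists of object indices)."""
    return sum(d2[a][b] for a in ci for b in cj) / (len(ci) * len(cj))

def msd_within(d2, ci):
    """Mean square distance within cluster ci, assuming a 2 / (n * n) normalisation."""
    n = len(ci)
    pair_sum = sum(d2[ci[x]][ci[y]] for x in range(n) for y in range(x))
    return 2.0 * pair_sum / (n * n)

# Hypothetical 3-object squared-distance matrix.
toy = [[0.0, 4.0, 9.0],
       [4.0, 0.0, 1.0],
       [9.0, 1.0, 0.0]]
between = msd_between(toy, [0], [1, 2])   # (4 + 9) / 2
within = msd_within(toy, [1, 2])          # 2 * 1 / 4
```

For the toy matrix, the between-cluster value is 6.5 and the within-cluster value is 0.5, so the partition {0} and {1, 2} would count as consistent in the sense used in the text.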
In this work we used the divisive HCA method, which successively partitioned an initial set of n objects into finer clusters. The corresponding algorithm was the following:
(i) Assign the n objects to a single cluster.
(ii) Compute a distance matrix in the cluster space using an appropriate metric. As mentioned above, we adopted a square Euclidean metric in this work.
(iii) Find the least similar objects and separate them into different clusters.
(iv) Repeat steps (ii) and (iii) until the diagonal elements of this matrix are significantly smaller than the non-diagonal ones.
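The steps above leave the exact splitting rule open, so the following Python sketch fills it with a DIANA-style rule: the object with the largest mean distance to the rest seeds a new cluster, and objects closer on average to the new cluster follow it. This rule, the use of a target number of clusters as the stopping criterion, and all function names are assumptions, not the authors' procedure.

```python
def mean_dist(d2, obj, group):
    """Mean squared distance from obj to the other members of group."""
    others = [g for g in group if g != obj]
    return sum(d2[obj][g] for g in others) / len(others) if others else 0.0

def split(d2, cluster):
    """Split one cluster into (remainder, splinter)."""
    rest = list(cluster)
    seed = max(rest, key=lambda o: mean_dist(d2, o, rest))  # least similar object
    rest.remove(seed)
    splinter = [seed]
    moved = True
    while moved and len(rest) > 1:
        moved = False
        for obj in list(rest):
            if len(rest) > 1 and mean_dist(d2, obj, splinter) < mean_dist(d2, obj, rest):
                rest.remove(obj)
                splinter.append(obj)
                moved = True
    return rest, splinter

def divisive(d2, n_clusters):
    """Divide the objects 0..n-1 until n_clusters clusters exist."""
    clusters = [list(range(len(d2)))]                       # step (i)
    while len(clusters) < n_clusters:
        # Split the cluster with the largest internal squared distance.
        target = max(clusters, key=lambda c: max(d2[a][b] for a in c for b in c))
        if len(target) < 2:
            break
        clusters.remove(target)
        clusters.extend(split(d2, target))                  # step (iii)
    return clusters
```

In practice the loop could instead stop when the diagonal elements of the cluster-space matrix become much smaller than the off-diagonal ones, which is the criterion stated in step (iv).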
Results
The characterization of the disulfide conformational categories found in our sample is presented in Table 7. The -LHSpiral is the most frequently observed category (28.9%) and has the lowest DSE (11.5 kJ mol⁻¹). Additionally, the six least-strained categories (-LHSpiral, +/-RHSpiral, +/-LHSpiral, -RHSpiral, +RHSpiral and -/+RHHook) are clearly prevalent (63.1%) relative to the remaining, more strained categories (36.9%). The representative conformations for catalytic (+/-RHHook) and allosteric (-RHStaple) disulfide bonds have moderate DSE values. We found the d(Cα-Cα') distances to be more relevant for disulfide conformational specificities than the d(Cβ-Cβ') distances (Table 7). The d(Cβ-Cβ') distances were quite insensitive to the nature of the conformational categories (varying from 3.3 to 4.0 Å), while the d(Cα-Cα') distances had a significant variation over the series (from 4.4 to 6.0 Å). For instance, in agreement with Schmidt et al. (2006), the -RHStaple conformation was characterized by significantly lower d(Cα-Cα') distances than the other conformational categories.
[Insert Table 7]
[Insert Table 8]
[Insert Figure 2]
The frequencies for the different conformational categories, calculated for each superfamily, are presented in Table 8 and Figure 2. From this figure, it is evident that the thioredoxin-like and SDP superfamilies exhibit very distinct conformational patterns. The least-strained conformations are significantly abundant in the SDP superfamilies (from 43.4% to 86.5%), but occur at a very low frequency in the thioredoxin-like superfamily (13.8%). This is obvious for the most stable conformation (-LHSpiral), for which the SDP superfamilies present frequencies at least four times larger than the thioredoxin-like frequency (from 12.1% to 43.8% against 3.1%; Table 8 and Figure 2). Most of the disulfide bonds of the thioredoxin-like superfamily (50.8%) are associated with the "catalytic" +/-RHHook conformation, whereas this conformation is relatively rare (from 0.0% to 7.7%) for the SDP superfamilies (Table 8 and Figure 2). On the other hand, the "allosteric" -RHStaple is moderately abundant for the BBI (24.2%), Crisp (24.1%) and thioredoxin-like (16.9%) superfamilies and scarce (from 0.0% to 5.7%) for the remaining superfamilies.
[Insert Figure 3]
Further insight into how the structural similarities between disulfides can reflect relationships between different proteins was obtained with a HCA procedure, whose dendrogram (Murtagh, 1984) is presented in Figure 3. The 3D Cartesian projection of the respective square Euclidean distances matrix is represented in Figure 4 together with the six clusters identified by this analysis. Four clusters reflect the main structural and functional motifs identified in the sample:
o Cluster 1 includes the catalytic proteins of the thioredoxin-like superfamily, with the lowest disulfide propensities and a dominant α/β secondary structure;
o Cluster 4 includes most of the metabolic superfamilies (Cystine-Knot, EGF-Laminin and Plant lectins), with a dominant β secondary structure;
o Cluster 5 includes most of the toxin/defense superfamilies (Defensin-like, Omega toxins, Small snake toxins and Scorpion-like toxins), with moderate to high disulfide propensities and a dominant β secondary structure;
o Cluster 2 includes the plant protease inhibitors of the BBI superfamily, with high disulfide propensities and a dominant β secondary structure.
[Insert Figure 4]
The remaining two clusters reflect divergences from the mentioned motifs:
o Cluster 3 includes the Crisp superfamily and is a divergence from cluster 5. This cluster includes toxin/defense proteins with low disulfide propensities and a dominant α secondary structure.
o Cluster 6 includes the BPTI-like and Kringle-like superfamilies. This cluster is the least well characterized and includes proteins with small disulfide propensities and different biological functions. The elements of this cluster share more diffuse properties, as (i) they are constrained by three disulphide bonds with the same disulfide topology (1-6, 2-4 and 3-5) and (ii) they are associated with the regulation of similar biological processes (binding mediation, proteolytic activity, blood clotting, etc.).
We represent the square Euclidean distances matrix for the cluster space in Table 9. From the analysis of this table, we can verify that the mean square distances between the clusters are significantly larger than those within the clusters. These results strongly indicate that the HCA partitioning is consistent.
[Insert Table 9]
Conclusions
In this work, we carried out an extensive statistical analysis of the conformational motifs for the disulfide bonds found in a set of disulfide-rich proteins from twelve SCOP superfamilies.
The frequencies of the twenty conformational categories provided a near-spectral representation of the 12-dimension hyperspace under study. The general trends observed in this sample were quite consistent with the results obtained by other authors (Schmidt et al., 2006; Schmidt and Hogg, 2007) for three different protein sets. We calculated the root mean square deviations between our frequencies and those previously obtained. The three values obtained were all lower than 2.6%.
The HCA partitioning of the data using a square Euclidean distances matrix resulted in a number of clusters, the majority of which aggregate superfamilies sharing both functional and structural patterns. The only exception is cluster 6, whose elements presented more diffuse connections. We therefore suggest the use of disulfide bond conformational patterns as a criterion in SDP classification, as well as to recognize the main divergences between SDP and other disulfide-rich superfamilies. However, the generalized application of this methodology for protein classification has to be subjected to further investigation.
Acknowledgements:
We thank the Fundação para a Ciência e a Tecnologia (FCT) for a doctoral scholarship granted to José Rui Ferreira Marques. Rute R. da Fonseca was funded by FCT (SFRH/BPD/26769/2006). We thank the Universidade do Porto for an electric wheelchair and a Tracker Pro (a computer input device that takes the place of a mouse for people with no hand movement) granted to José Rui Ferreira Marques.
References
Andreeva, A., Howorth, D., Brenner, S.E., Hubbard, T.J.P., Chothia, C., Murzin, A.G., 2004. SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Research 32, D226-D229.
Andreeva, A., Howorth, D., Chandonia, J.M., Brenner, S.E., Hubbard, T.J.P., Chothia, C., Murzin, A.G., 2008. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Research 36, D419-D425.
Bhattacharyya, R., Pal, D., Chakrabarti, P., 2004. Disulfide bonds, their stereospecific environment and conservation in protein structures. Protein Engineering Design and Selection 17, 795-808.
Brooks, D.J., Fresco, J.R., 2002. Increased frequency of cysteine, tyrosine, and phenylalanine residues since the last universal ancestor. Molecular & Cellular Proteomics 1, 125-131.
Brooks, D.J., Fresco, J.R., Lesk, A.M., Singh, M., 2002. Evolution of amino acid frequencies in proteins over deep time: Inferred order of introduction of amino acids into the genetic code. Molecular Biology and Evolution 19, 1645-1655.
Cheek, S., Krishna, S.S., Grishin, N.V., 2006. Structural classification of small, disulfide-rich protein domains. Journal of Molecular Biology 359, 215-237.
Chuang, C.C., Chen, C.Y., Yang, J.M., Lyu, P.C., Hwang, J.K., 2003. Relationship between protein structures and disulfide-bonding patterns. Proteins Structure Function and Genetics 53, 1-5.
Creighton, T.E., 1988. Disulfide bonds and protein stability. Bioessays 8, 57-63.
Harrison, P.M., Sternberg, M.J.E., 1996. The disulphide beta-cross: From cystine geometry and clustering to classification of small disulphide-rich protein folds. Journal of Molecular Biology 264, 603-623.
Hogg, P.J., 2003. Disulfide bonds as switches for protein function. Trends in Biochemical Sciences 28, 210-214.
Hutchinson, E.G., Thornton, J.M., 1996. PROMOTIF - A program to identify and analyze structural motifs in proteins. Protein Science 5, 212-220.
Johnson, R.A., Wichern, D.W., 2007. Applied Multivariate Statistical Analysis. Prentice Hall, New Jersey.
Jordan, I.K., Kondrashov, F.A., Adzhubei, I.A., Wolf, Y.I., Koonin, E.V., Kondrashov, A.S., Sunyaev, S., 2005. A universal trend of amino acid gain and loss in protein evolution. Nature 435, 528-528.
Katz, B.A., Kossiakoff, A., 1986. The crystallographically determined structures of atypical strained disulfides engineered into subtilisin. Journal of Biological Chemistry 261, 5480-5485.
Klink, T.A., Woycechowsky, K.J., Taylor, K.M., Raines, R.T., 2000. Contribution of disulfide bonds to the conformational stability and catalytic activity of ribonuclease A. European Journal of Biochemistry 267, 566-572.
Murtagh, F., 1984. Counting dendrograms - A survey. Discrete Applied Mathematics 7, 191-199.
Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C., 1995. SCOP - A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology 247, 536-540.
Ozhogina, O.A., Bominaar, E.L., 2009. Characterization of the kringle fold and identification of a ubiquitous new class of disulfide rotamers. Journal of Structural Biology 168, 223-233.
Sardiu, M.E., Cheung, M.S., Yu, Y.K., 2007. Cysteine-cysteine contact preference leads to target-focusing in protein folding. Biophysical Journal 93, 938-951.
Schmidt, B., Ho, L., Hogg, P.J., 2006. Allosteric disulfide bonds. Biochemistry 45, 7429-7433.
Schmidt, B., Hogg, P.J., 2007. Search for allosteric disulfide bonds in NMR structures. BMC Structural Biology 7, 49.
Srinivasan, N., Sowdhamini, R., Ramakrishnan, C., Balaram, P., 1990. Conformations of disulfide bridges in proteins. International Journal of Peptide and Protein Research 36, 147-155.
Thangudu, R.R., Sharma, P., Srinivasan, N., Offmann, B., 2007. Analycys: A database for conservation and conformation of disulphide bonds in homologous protein domains. Proteins Structure Function and Bioinformatics 67, 255-261.
Weiner, S.J., Kollman, P.A., Case, D.A., Singh, U.C., Ghio, C., Alagona, G., Profeta, S., Weiner, P., 1984. A new force field for molecular mechanical simulation of nucleic acids and proteins. Journal of the American Chemical Society 106, 765-784.
Xie, D.X., Tropsha, A., Schlick, T., 2000. An efficient projection protocol for chemical databases: Singular value decomposition combined with truncated Newton minimization. Journal of Chemical Information and Computer Sciences 40, 167-177.
Figure captions
Figure 1: Graphical representation of the five torsion angles used to classify the disulphide conformers.
Figure 2: Frequencies for the disulfide conformational categories.
Figure 3: Dendrogram for the hierarchical clustering analysis. The following notation was adopted: (1) Crisp, (2) Cystine-Knot, (3) Defensin-like, (4) EGF-Laminin, (5) Omega toxins, (6) Plant lectins, (7) Small snake toxins, (8) Scorpion-like toxins, (9) BBI, (10) BPTI-like, (11) Kringle-like and (12) Thioredoxin-like.
Figure 4: Projected 3D Cartesian representation of the square Euclidean distances matrix and clusters obtained by the hierarchical clustering analysis. The following notation was adopted: (1) Crisp, (2) Cystine-Knot, (3) Defensin-like, (4) EGF-Laminin, (5) Omega toxins, (6) Plant lectins, (7) Small snake toxins, (8) Scorpion-like toxins, (9) BBI, (10) BPTI-like, (11) Kringle-like and (12) Thioredoxin-like.
Table 1. Classification of disulphide bonds in conformational categories (Schmidt et al., 2006).
Disulphide category#   χ1 χ2 χ3 χ2' χ1'
-LHSpiral
-RHHook        + +
+/-RHSpiral    + + + +
+/-LHSpiral    +
-RHSpiral      + + +
+/-RHHook      + + +
+RHSpiral      + + + + +
-LHHook        +
-/+RHHook      + + +
-RHStaple      +
+/-LHHook      + +
-/+LHHook      + +
+/-LHStaple    + + +
-LHStaple      + +
+LHSpiral      + +
+LHHook        + + +
+RHHook        + + + +
+/-RHStaple    + +
+LHStaple      + + - + +
+RHStaple      + - + - +
# LH: Left-handed oriented; RH: Right-handed oriented; -: Negative value for the respective torsion angle; +: Positive value for the respective torsion angle.
Table 2. Characteristic conformational motifs used for disulphide classification.
Main motifs (χ2 χ3 χ2')
  Spiral   + + +   or   - - -
  Staple   + - +   or   - + -
  Hook     + + -,  - + +,  + - -   or   - - +
Orientational motifs (χ3)
  LH   -
  RH   +
Peripheral motifs (χ1 χ1')
  +     + +
  -     - -
  +/-   + -
  -/+   - +
Table 3. Representative structures for the spiral conformational categories.
-LHSpiral     -RHSpiral
+LHSpiral     +RHSpiral
+/-LHSpiral   +/-RHSpiral
Table 4. Representative structures for the staple conformational categories.
-LHStaple     -RHStaple
+LHStaple     +RHStaple
+/-LHStaple   +/-RHStaple
Table 5. Representative structures for the hook conformational categories.
-LHHook     -RHHook
+LHHook     +RHHook
+/-LHHook   +/-RHHook
-/+LHHook   -/+RHHook
Table 6. Characterization of the protein set under study. The sample used in the statistical analyses is considered to include all the disulphide bonds identified in this protein set.
Superfamily                     Dominant secondary structure   Propensity#   No. of PDB structures   No. of disulphide bonds   Function
Crisp                           α                              5.3%          6                       54                        Toxins/defense
Cystine-Knot                    β                              3.7%          13                      112                       Metabolic
Defensin-like                   β                              7.4%          15                      47                        Toxins/defense
EGF-Laminin                     β                              6.4%          27                      121                       Metabolic
Omega toxins                    β                              8.9%          28                      88                        Toxins/defense
Plant lectins                   β                              9.9%          8                       100                       Metabolic
Small snake toxins              β                              6.5%          40                      209                       Toxins/defense
Scorpion-like toxins            β                              7.9%          70                      247                       Toxins/defense
BBI (Bowman-Birk Inhibitors)    β                              9.6%          5                       33                        Protease inhibition
BPTI-like                       α/β                            5.1%          12                      42                        Protease inhibition
Kringle-like                    β                              3.7%          12                      53                        Metabolic
Thioredoxin-like                α/β                            0.8%          43                      66                        Isomerase catalysis
# Calculated by equation 2.
Table 7. Average parameters for the disulphide bond conformational categories in the sample under study. Representative conformations for structural (-LHSpiral), catalytic (+/-RHHook) and allosteric (-RHStaple) disulphide bonds are represented in bold.
Conformational category   Frequency   DSE / kJ mol⁻¹   d(Cα-Cα') / Å   d(Cβ-Cβ') / Å
-LHSpiral                 28.9%       11.5             5.7             3.7
-RHHook                   9.9%        25.0             5.7             4.0
+/-RHSpiral               8.6%        14.5             5.9             3.8
+/-LHSpiral               7.9%        17.9             6.0             3.7
-RHSpiral                 7.0%        18.9             6.0             3.8
+/-RHHook                 6.1%        19.4             5.3             3.8
+RHSpiral                 6.0%        12.8             5.8             3.7
-LHHook                   5.2%        37.0             5.7             4.1
-/+RHHook                 4.7%        17.9             5.5             3.9
-RHStaple                 4.0%        21.1             4.4             4.0
+/-LHHook                 2.2%        26.8             5.9             4.0
-/+LHHook                 1.9%        32.7             6.1             4.0
+/-LHStaple               1.6%        30.3             5.0             3.7
-LHStaple                 1.5%        31.4             5.5             3.9
+LHSpiral                 1.4%        20.8             6.2             3.9
+LHHook                   1.2%        29.3             5.9             3.8
+RHHook                   0.7%        30.7             6.1             4.1
+/-RHStaple               0.6%        32.3             5.9             4.1
+LHStaple                 0.4%        39.3             5.4             3.3
+RHStaple                 0.1%        24.9             5.9             3.3
Least strained#           63.1%       15.6             5.8             3.8
Most strained             36.9%       28.6             5.6             3.9
# The six conformational categories with the smallest DSE have a grey background.
Table 8. Frequencies for the different conformational categories.
Category      Superfamily
              1      2      3      4      5      6      7      8      9      10     11     12     Sample
-LHSpiral     35.2%  43.8%  26.4%  46.1%  13.8%  29.0%  27.3%  28.3%  12.1%  28.6%  32.7%  3.1%   28.9%
-RHHook       0.0%   5.4%   18.9%  8.7%   20.7%  24.0%  6.7%   9.3%   0.0%   4.8%   3.8%   10.8%  9.9%
+/-RHSpiral   7.4%   14.3%  7.5%   5.2%   8.0%   20.0%  3.8%   12.1%  3.0%   4.8%   3.8%   1.5%   8.6%
+/-LHSpiral   11.1%  4.5%   3.8%   9.6%   2.3%   22.0%  4.3%   2.8%   0.0%   19.0%  30.8%  6.2%   7.9%
-RHSpiral     14.8%  9.8%   3.8%   2.6%   3.4%   0.0%   17.2%  3.2%   21.2%  4.8%   1.9%   1.5%   7.0%
+/-RHHook     0.0%   3.6%   3.8%   3.5%   2.3%   0.0%   2.9%   7.7%   0.0%   2.4%   0.0%   50.8%  6.1%
+RHSpiral     0.0%   3.6%   1.9%   3.5%   4.6%   4.0%   12.9%  9.7%   0.0%   0.0%   1.9%   1.5%   6.0%
-LHHook       0.0%   1.8%   7.5%   6.1%   5.7%   1.0%   5.3%   10.1%  6.1%   2.4%   3.8%   1.5%   5.2%
-/+RHHook     5.6%   0.0%   0.0%   1.7%   11.5%  0.0%   2.9%   4.9%   12.1%  23.8%  15.4%  0.0%   4.7%
-RHStaple     24.1%  0.0%   5.7%   1.7%   4.6%   0.0%   2.4%   0.4%   24.2%  0.0%   0.0%   16.9%  4.0%
+/-LHHook     0.0%   0.0%   9.4%   1.7%   5.7%   0.0%   2.9%   2.0%   0.0%   0.0%   1.9%   3.1%   2.2%
-/+LHHook     0.0%   1.8%   5.7%   0.0%   3.4%   0.0%   3.8%   2.4%   0.0%   0.0%   0.0%   0.0%   1.9%
+/-LHStaple   1.9%   0.9%   0.0%   0.0%   10.3%  0.0%   1.4%   0.0%   6.1%   0.0%   1.9%   3.1%   1.6%
-LHStaple     0.0%   0.9%   3.8%   0.9%   2.3%   0.0%   1.0%   1.6%   6.1%   9.5%   0.0%   0.0%   1.5%
+LHSpiral     0.0%   8.9%   0.0%   0.0%   1.1%   0.0%   1.0%   1.2%   0.0%   0.0%   0.0%   0.0%   1.4%
+LHHook       0.0%   0.0%   1.9%   2.6%   0.0%   0.0%   2.4%   1.2%   6.1%   0.0%   0.0%   0.0%   1.2%
+RHHook       0.0%   0.9%   0.0%   1.7%   0.0%   0.0%   0.5%   1.2%   0.0%   0.0%   1.9%   0.0%   0.7%
+/-RHStaple   0.0%   0.0%   0.0%   0.9%   0.0%   0.0%   1.4%   0.8%   3.0%   0.0%   0.0%   0.0%   0.6%
+LHStaple     0.0%   0.0%   0.0%   2.6%   0.0%   0.0%   0.0%   0.8%   0.0%   0.0%   0.0%   0.0%   0.4%
+RHStaple     0.0%   0.0%   0.0%   0.9%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.1%
Table 9. Square Euclidean distances matrix for the cluster space.
Cluster   1        2        3        4        5        6
1         0.00%    34.89%   40.29%   45.79%   32.03%   44.03%
2         34.89%   0.00%    8.77%    25.38%   13.42%   21.20%
3         40.29%   8.77%    0.00%    12.00%   11.24%   12.72%
4         45.79%   25.38%   12.00%   6.18%    19.99%   24.61%
5         32.03%   13.42%   11.24%   19.99%   3.98%    19.42%
6         44.03%   21.20%   12.72%   24.61%   19.42%   1.83%
[Figure 1: disulphide bond with atoms N, Cα, Cβ, Sγ, Sγ', Cβ', Cα', N' and torsion angles χ1, χ2, χ3, χ2', χ1']
[Figure 2: frequency (0% to 55%) versus disulphide conformational category for the twelve superfamilies, with the catalytic and allosteric categories marked]
[Figure 3: dendrogram of the divisive clustering, Step 1 to Step 5, ending in Clusters 1 to 6]
[Figure 4: 3D projection of superfamilies 1 to 12 grouped into Clusters 1 to 6]