HAL Id: tel-00481898
https://tel.archives-ouvertes.fr/tel-00481898
Submitted on 7 May 2010
Structure prediction of P1-type ATPases and molecular
dynamics simulations on their Metal Binding Domains
Karthik Arumugam
To cite this version:
Karthik Arumugam. Structure prediction of P1-type ATPases and molecular dynamics simulations
on their Metal Binding Domains. Modeling and Simulation. Université Joseph-Fourier - Grenoble I,
2009. English. ⟨tel-00481898⟩
École Doctorale Chimie et Sciences du Vivant
THESIS
submitted for the degree of
DOCTOR OF THE UNIVERSITÉ JOSEPH FOURIER
Specialty: Structural Biology and Nanobiology
presented and publicly defended on 12 November 2009
by
Karthik ARUMUGAM
Structure prediction of P1-type ATPases and molecular dynamics
simulations on their Metal Binding Domains
JURY
Pr. Arthur SOUCEMARIANADIN President
Dr. Dorothée BERTHOMIEU Reviewer
Dr. Norbert GARNIER Reviewer
Dr. Patrice CATTY Examiner
Dr. Stéphane REDON Examiner
Dr. Serge CROUZY Thesis supervisor
LABORATOIRE DE CHIMIE ET BIOLOGIE DES MÉTAUX UMR 5249
Institut de Recherches en Technologies et Sciences pour le Vivant
Commissariat à l'Énergie Atomique (CEA), Grenoble
NANO-D - INRIA Grenoble - Rhône-Alpes
I would like to thank my advisor, Dr. Serge Crouzy, for his great guidance and support throughout my research. It was a privilege to be a member of his team. I had a wonderful experience working in his lab, where I enjoyed working on scientific projects and learned how to become a scientist. During the years we worked together he was much more than an advisor to me. He taught me a lot about life as well as science. Everything I have learned from him will stay with me throughout my life. This thesis would never have been possible without him.
I would like to thank my co-supervisor Dr. Stéphane Redon, Dr. Patrice Catty, Dr. Elisabeth Mintz, Dr. Florent Guillain and Dr. Michel Ferrand for their valuable advice and support during my PhD thesis.
Many thanks to Dr. Michel Vivaudou and Dr. Christophe Moreau for their useful discussions with me about the Hm2Kir6 project and for allowing me to do research on it.
I gratefully acknowledge the efforts of my dissertation committee members, Dr. Dorothée Berthomieu and Dr. Norbert Garnier. I owe my personal thanks to Prof. Arthur Soucemarianadin for his kind help during the initial period of my thesis.
I would also like to thank all past and present members of my lab. I am deeply indebted to the people at University Joseph Fourier and at CEA for their help during my stay.
I gratefully acknowledge the financial support that I received from the CEA Direction des Relations Internationales (DRI) via EGIDE, and from the LCBM laboratory.
I would like to express my gratitude towards Prof. R. Nayak and Prof. Shaila from the Indian Institute of Science, India, for their guidance, encouragement and support. I especially thank my Deputy Registrar, Mr. Paneerselvam, for his continuous encouragement and trust in me.
Special thoughts go to my friends who have made my life in Grenoble so enjoyable both inside and outside the lab, including Prem, Sandeep, Tarun, Peng, Mahesh, Cheickna, Marine, Mouna, Pankaj, Abhinav, Robert and Shikha.
Finally I would like to thank my family. The biggest thanks, which have been saved till last, have to go
1 Introduction . . . 1
1.1 Historical background . . . 1
1.2 From sequence to structure to function . . . 2
1.3 The cell membrane . . . 2
1.4 Computational biology and Bioinformatics . . . 4
1.4.1 Membrane proteins . . . 4
1.4.2 Alpha (α)-helical membrane proteins . . . 4
1.4.3 Beta (β)-barrel membrane proteins . . . 5
1.4.4 Monotopic and peripheral membrane proteins . . . 5
1.5 Metals in Biological Systems . . . 6
1.5.1 Metal homeostasis and trafficking . . . 6
1.5.2 The human copper metallochaperone: Atox1 (or Hah1) . . . 7
1.6 What this Thesis is About? . . . 9
2 Theory and Methods . . . 11
2.1 Structure prediction using NOE-like restraints with X-PLOR . . . 11
2.1.1 Introduction . . . 11
2.1.2 Distance Geometry . . . 12
2.1.3 NOE-like and dihedral angle restraints . . . 13
2.1.4 Structure determination . . . 14
2.2 Homology modelling . . . 15
2.2.1 Introduction to Homology modelling . . . 15
2.2.2 Classical steps in Homology modeling . . . 15
2.2.2.1 Template recognition and initial alignment . . . 16
2.2.2.2 Alignment correction . . . 16
2.2.2.3 Backbone generation . . . 17
2.2.2.4 Loop modeling . . . 17
2.2.2.5 Side-chain modeling . . . 17
2.2.2.6 Model optimization . . . 18
2.2.2.7 Model validation . . . 18
2.2.3 The program MODELLER . . . 19
2.3 Classical Mechanics and Dynamics . . . 19
2.3.2.1 Verlet Algorithm . . . 20
2.3.2.2 Leap-Frog Algorithm . . . 21
2.3.2.3 Langevin Dynamics (LD) Simulation . . . 22
2.4 Adaptive Molecular Dynamics . . . 23
2.4.1 Introduction . . . 23
2.4.2 Divide and Conquer Algorithm . . . 23
2.5 Statistical Mechanics . . . 25
2.5.1 Introduction to Statistical Mechanics . . . 25
2.5.2 Definitions . . . 25
2.5.3 Ensemble Averages and Time Averages . . . 25
2.6 Force Fields . . . 26
2.6.1 General Features of Molecular Mechanics Force Fields . . . 26
2.6.2 Bonded terms . . . 27
2.6.2.1 Bond Stretching . . . 27
2.6.2.2 Angle Bending . . . 29
2.6.2.3 Torsional Terms . . . 30
2.6.2.4 Improper Torsions / Out-of-Plane Bending . . . 31
2.6.3 Non-Bonded terms . . . 32
2.6.3.1 Electrostatic Interactions . . . 32
2.6.3.2 Van der Waals Interactions . . . 32
2.7 The CHARMM program and force field . . . 33
2.7.1 The CHARMM force field . . . 33
2.7.2 Data Structures . . . 34
2.7.2.1 Residue Topology File (RTF) . . . 34
2.7.2.2 Parameter File (PARAM) . . . 34
2.7.2.3 Protein Structure File (PSF) . . . 35
2.7.2.4 Coordinate File (CRD) . . . 35
2.7.3 Energy Minimization in CHARMM . . . 35
2.7.3.1 Description of Minimization . . . 35
2.7.3.2 Minimization methods . . . 35
2.8 Solvation . . . 36
2.8.1 Explicit solvation . . . 36
2.8.1.1 Periodic boundaries . . . 37
2.8.2 Implicit solvation . . . 38
2.8.2.1 Accessible surface area based method . . . 38
2.8.2.2 Poisson-Boltzmann . . . 39
2.8.2.3 Generalized Born . . . 39
2.8.2.4 GBSA . . . 40
2.8.2.5 Ad-hoc fast solvation models . . . 40
2.8.3 Hybrid implicit/explicit solvation models . . . 40
2.8.4.2 Viscosity . . . 41
2.8.4.3 Hydrogen-bonds with water . . . 41
3 Metal-Binding Membrane Proteins . . . 43
3.1 Ionic Channels and Transporters . . . 43
3.1.1 Overview . . . 43
3.1.2 Special ATP-binding cassette transporters . . . 44
3.1.3 KATP channels . . . 44
3.1.4 Biomolecular sensors based on SUR . . . 44
3.1.5 Model of an ICCR . . . 46
3.2 Generalities about P-Type ATPases . . . 51
3.2.1 Introduction . . . 51
3.2.2 Catalytic Mechanism of P-type ATPases . . . 51
3.3 SERCA: A P-type ATPase of known tri-dimensional structure . . . 52
3.3.1 Structure of SERCA . . . 52
3.4 P1B-type ATPases . . . 53
3.4.1 Structural Features of P1B-type ATPases . . . 53
3.4.2 Physiological Roles of P1B-Type ATPases . . . 55
3.4.3 Catalytic Mechanism of P1B-type ATPases . . . 55
3.4.4 Transmembrane Metal Binding Sites and Classification . . . 56
3.4.5 Cadmium ATPases . . . 57
3.4.6 Cytoplasmic Metal Binding Domains . . . 58
3.4.6.1 Structure of cytoplasmic MBDs . . . 58
3.4.6.2 Regulatory Roles of cytoplasmic MBDs . . . 58
3.4.7 The ATP Binding (ATP-BD) and Actuator (A) Domains . . . 60
3.4.8 Human Copper ATPases . . . 60
3.4.8.1 Overview . . . 60
3.4.8.2 ATP7A and ATP7B are representatives of the P1B-family of ion-transporting ATPases . . . 63
3.4.8.3 Functional activity of human Cu-ATPases is coupled to their ability to traffic . . . 63
3.4.8.4 Domain organization of human Cu-ATPases . . . 65
3.4.8.5 ATP7A and ATP7B have distinct functional properties . . . 66
3.4.9 Characteristic transmembrane hairpin TMS1,2 in P1B-ATPases . . . 66
3.4.10 The diverse roles of the metal-binding sites within the N-terminal domain of Cu-ATPases . . . 66
4 Structure and dynamics of The TransMembrane region of a Cd²⁺ ATPase . . . 69
4.1 Introduction . . . 69
4.2 Secondary structure prediction . . . 69
4.2.3 By homology with CopA . . . 72
4.3 3D structure prediction by homology with other ATPases . . . 72
4.3.1 Homology with Serca . . . 73
4.3.2 Homology with CopA . . . 73
4.4 Ab initio Structure prediction . . . 75
4.4.1 Core TM bundle responsible for metal binding . . . 78
4.4.2 Topology of CadA TM bundle: program buildtopo . . . 78
4.4.3 Derivation of model restraints for X-PLOR . . . 80
4.4.4 Model building . . . 82
4.4.5 Model refinement with CHARMM . . . 82
4.5 Model checking using standard methods . . . 86
4.5.1 Using Procheck . . . 86
4.5.2 Using stride . . . 86
4.6 Model checking using energy minimization and MD simulations . . . 90
4.6.1 Validation using bR and Serca . . . 90
4.6.1.1 Bacteriorhodopsin (BR) . . . 90
4.6.1.2 Serca . . . 92
4.6.2 Results on CadA . . . 94
4.6.2.1 Energy . . . 94
4.6.2.2 RMSD . . . 94
4.6.2.3 TMH tilt angles . . . 98
4.6.2.4 Cadmium sites . . . 98
4.6.2.5 Role of K181 (eqv. K671) . . . 99
4.6.2.6 Role of D202 (eqv. D692) . . . 100
4.7 3D-models of CadA . . . 101
4.8 Discussion and perspectives . . . 101
5 Dynamics and stability of the Metal Binding Domains of the Menkes ATPase . . . 105
5.1 Introduction . . . 105
5.2 Models and simulation details . . . 105
5.2.1 Simulation parameters . . . 105
5.2.2 Model structures . . . 106
5.2.3 Simulation details . . . 107
5.2.4 Restraints . . . 108
5.2.5 Additional sessions and restraints . . . 109
5.3 Results . . . 110
5.3.1 Sequence alignment . . . 110
5.3.2 Time evolution of energy and box dimensions . . . 110
5.3.2.1 Energy . . . 110
5.3.2.2 Box dimensions . . . 111
5.3.3.2 Following sessions: Apo . . . 111
5.3.3.3 Following sessions: Holo . . . 114
5.3.4 Conservation of secondary structure . . . 116
5.3.4.1 Session 1 . . . 116
5.3.4.2 Following sessions . . . 121
5.3.5 Root Mean Square Fluctuations . . . 121
5.3.6 Radial Distribution Function of water around Copper . . . 125
5.3.7 Average structures . . . 127
5.4 Summary, discussion and perspectives . . . 128
6 Conclusion . . . 135
7 Supplementary Material . . . 137
7.1 Program for finding topological models of CadA: buildtopo . . . 139
7.2 First page of output of buildtopo . . . 139
7.3 All 2D-grid models of CADA . . . 141
Introduction
1.1 Historical background
Four billion years ago, the first living cell appeared. The universe had been waiting patiently for some ten billion years for this to happen, and ever since that day nothing has been quite the same. The cells replicated and replicated, and finally a group of cells wrote this thesis.
In order to have a functional, living cell, some of the basic things that are needed are: some way of converting and utilizing energy, for instance chemical energy or the rays of the sun; self-replication; and self-assembly. The first cells might have used RNA both as the information carrier and as molecular machines. Modern cells use DNA as the information carrier and RNA as an intermediate step towards proteins, but RNA sometimes also functions as a molecular machine.
One less obvious requirement is that molecules need to be located at the right place at the right time. One thing that aids this is compartmentalisation, where molecules are kept separated by different kinds of barriers. For instance, the DNA molecule of a cell needs to be located inside the cell; if it is outside, it will rapidly be degraded.
My research has focused on the proteins that are located in the membranes surrounding cells and compartments. These proteins are responsible for transport through the membrane, they are involved in cell-to-cell signalling, and they are crucial for generating energy and converting chemical energy and sunlight into useful forms. In order to understand these processes in detail, a 3D structure of the protein is needed. Unfortunately, structure determination of membrane proteins is very time-consuming and difficult, so considerably fewer structures are available than for water-soluble proteins.
There are millions of proteins with known amino acid sequence and tens of thousands of proteins with known structure, but only a few hundred of these structures are of membrane proteins. Therefore, theoretical approaches that give structural insight from the amino acid sequence hold great potential, since computing time is very cheap. However, since computational predictions are just predictions, it
The revolutionary discovery of the molecular structure of DNA by Watson and Crick (1953) had a tremendous impact on science [1]. Never before had it been possible to study and understand the molecular details of how genetic information was stored. It could be seen that the DNA double helix was stabilized by hydrogen bonds between complementary bases, adenine (A) pairing with thymine (T) and guanine (G) with cytosine (C). The problem was that no one knew how DNA sequences were translated into proteins.
The genetic code was finally broken by the Nirenberg laboratory in 1961-66 [2]. It was discovered that each nucleotide triplet (codon) stands either for one of the 20 different amino acids that are the building blocks of peptides and proteins or for a stop signal. These events led to the birth of molecular biology.
There are two main events involved in going from DNA to proteins. The first step is transcription, the copying of DNA into complementary RNA by the enzyme RNA polymerase. This enzyme is essential for life and is present in all organisms and also in some viruses. The second step is translation of the messenger RNA (mRNA) into the amino acid chain that forms the protein. This step is performed by the ribosome, which translates the genetic code into an amino acid chain.
One of the remaining unsolved problems of molecular biology is how the linear amino acid chain folds into a three-dimensional protein. All the information about what a protein structure will look like is present in its amino acid sequence [3].
A water-soluble protein folds within fractions of a second, which makes it impossible for it to explore all the different conformations available to its amino acids. The Levinthal paradox states that, due to the large number of degrees of freedom in the protein chain, the molecule has an astronomical number of possible conformations, estimated at 10^143 [4]. If the protein had to explore all conformations, it would take longer than the universe has existed. Thus, folding must follow folding pathways, yet the details of these pathways remain unclear.
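The order of magnitude behind the Levinthal argument can be checked with a few lines of arithmetic. This is a minimal sketch: the 10^143 figure comes from the text, while the sampling rate of one conformation every 10^-13 s is a hypothetical round number chosen for illustration, and the age of the universe is the approximate value of 13.8 billion years. Logarithms keep the huge numbers readable.

```python
import math

LOG10_CONFORMATIONS = 143       # Levinthal estimate quoted in the text [4]
LOG10_SAMPLES_PER_SECOND = 13   # hypothetical: one conformation every 1e-13 s
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # approximate

# Work in log10: time = conformations / sampling rate.
log10_seconds = LOG10_CONFORMATIONS - LOG10_SAMPLES_PER_SECOND
log10_years = log10_seconds - math.log10(SECONDS_PER_YEAR)
log10_ratio = log10_years - math.log10(AGE_OF_UNIVERSE_YEARS)

print(f"exhaustive search: about 10^{log10_years:.1f} years")
print(f"about 10^{log10_ratio:.1f} times the age of the universe")
```

With these assumptions the exhaustive search overshoots the age of the universe by more than a hundred orders of magnitude, so even a sampling rate many orders of magnitude faster would not rescue a random search.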
The key to a detailed understanding of protein function lies in its structure. A high-resolution structure is needed to study protein function at the molecular level. The most common method to determine the structure is X-ray crystallography.
1.3 The cell membrane
Membranes are a key element for life since they act as a physical barrier separating the interior of a cell from the outside world. They also serve to confine its different compartments, providing the means for a cell to be kept out of equilibrium with its surroundings and allowing important processes to occur in the interior. In addition to being crucial for cell integrity, cell membranes serve as a matrix and support for many types of proteins involved in important cell functions and are therefore essential for cell function.
Biological membranes consist of organized assemblies of lipids and proteins. Current knowledge of how they are structured is still scarce because of the difficulties associated with the experimental techniques required to investigate their properties. The modern view of biological membranes is essentially based on the fluid mosaic model proposed in the seventies [5], in which lipids are arranged in bilayers where proteins are embedded and subject to free lateral diffusion. At that time, however, the
membrane proteins, the accepted vision has become more protein-centered and the crucial role of lipids diluted. It is only more recently, with a better understanding of lipid-protein interactions, that the role of lipids in protein function has again been receiving increasing attention [6, 7]. The current vision of membranes is more of a mosaic than a fluid: lipids organize a matrix where proteins are distributed in regions of biased composition with varying protein environment [8].
Figure 1.1: The cell membrane
Membranes consist of a complex mixture of lipids in which different proteins, both integral and peripheral, are embedded (see Fig. 1.1). Protein content varies greatly among the different kinds of membranes, typically ranging between 15 and 75% depending on the functions that they must carry out [9]. Furthermore, lipid composition changes from one membrane to another because of the enormous structural diversity of lipids, which can be associated with the different roles and properties of each membrane or region. The most widely found lipids consist of a fatty acid linked by an ester bond to an alcohol such as glycerol or cholesterol, or through amide bonds to a sphingoid base or to other amines. Most lipids have a highly polar head group and two hydrocarbon tails.
In a typical membrane, approximately half of the lipids are phospholipids, mainly phosphatidylcholines (PC), phosphatidylethanolamines (PE) and phosphatidylserines (PS). Other major components, in decreasing order of importance, are sphingolipids, glycolipids and cholesterol [10]. Interestingly, since the two sides of the membrane bilayer face different surroundings, the two leaflets of a membrane typically exhibit an asymmetric composition. The lateral distribution of lipids is also non-homogeneous, providing regions with different levels of molecular order that are known as rafts. These seem to be important for the function of membrane proteins [11]. Finally, the presence
1.4 Computational biology and Bioinformatics
The amount of available genetic information continues to increase at a steady pace. Ever since GenBank, a sequence database, started in 1982, it has doubled in size every 18 months. The latest release (v160) contains 77,248,690,945 bases. To put this enormous number into perspective, the number of bases sequenced from April to June 2007 was 42,880 each second. DNA sequencing technology also keeps getting faster; e.g. a 454 gene sequencer can sequence 25 million bases with 99% accuracy in 4 hours, which makes it possible to sequence a bacterial genome in the course of a day [13].
There is clearly a need for methods that can make sense of this vast amount of genetic information, and this is where bioinformatics and computational biology step in. The difference between the two terms is that computational biology refers to hypothesis-driven investigations of biological problems using computers, while bioinformatics is more technique-driven and concerns the development of algorithms and of computational and statistical techniques. The two terms are often used interchangeably.
The unifying topics of this thesis are membrane proteins and computational biophysics techniques. I will continue with a brief overview of membrane proteins and then focus on metal-transporting proteins.
1.4.1 Membrane proteins
Membrane proteins embedded in the membrane have many diverse functions. They are responsible for selecting which substances are allowed to pass through the membrane, and they are important for transmitting signals, for cell-to-cell communication, and for numerous other functions. More than half of all drugs target membrane proteins [14, 15].
The amount of membrane protein inside the lipid bilayer varies from organism to organism and from cell type to cell type. For instance, membrane proteins make up 75% of the mass of the membrane in E. coli but only 30% in human myelin sheath cells.
There are three main classes of membrane proteins:
1.4.2 Alpha (α)-helical membrane proteins
α-helical membrane proteins have their amino acids arranged in tightly packed helices in the transmembrane regions (Fig. 1.2A). The α-helical arrangement exposes as large a hydrophobic outer surface as possible to the hydrophobic fatty acid chains of the lipids while, at the same time, satisfying all of its internal backbone hydrogen bonds.
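Because a transmembrane helix must present a hydrophobic surface to the lipids, stretches of about twenty hydrophobic residues are classic candidates for TM helices (at about 1.5 Å rise per residue, 20 residues span roughly 30 Å). The sketch below is not the prediction method used in this thesis; it is a minimal hydropathy scan with the Kyte-Doolittle scale, using the commonly quoted 19-residue window and 1.6 threshold. The function names and the toy sequence are hypothetical, and non-standard residue letters are not handled.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; list index = 0-based window start."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, h in enumerate(profile) if h > threshold]


# Hypothetical toy sequence: polar flanks around a 21-residue hydrophobic core.
toy = "DEKRSNQ" + "L" * 21 + "NQSRKED"
print(candidate_tm_windows(toy))
```

Windows lying entirely within the leucine core average 3.8 and are flagged, while the first window, which still contains the seven polar flanking residues, averages about 1.2 and is not.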
α-helical membrane proteins typically make up 20-30% of the proteins encoded by an organism [16]. Most often they are located in the inner membrane, but recently one example, a polysaccharide
Figure 1.2: A. An α-helical membrane protein with 7 transmembrane helices, like bacteriorhodopsin, studied in section 4.6.1.1. B. A β-barrel membrane protein with 8 transmembrane β-strands. C. A monotopic membrane protein.
1.4.3 Beta (β)-barrel membrane proteins
The other way of spanning the membrane while satisfying internal backbone hydrogen bonds and maintaining a large hydrophobic surface area is the β-barrel fold (Fig. 1.2B). Since a β-strand always hydrogen bonds to another β-strand, these proteins consist of β-hairpins crossing the membrane. Thus, this class always has an even number of β-strands. Every second residue in the transmembrane region faces the inside of the barrel, whereas the others face the outside, towards the lipids.
This alternating pattern is reflected in the amino acid sequence, where one amino acid is non-polar and hydrophobic (and directed towards the lipids), the next one is more polar (and directed towards the interior of the barrel), and so on. Since a β-strand is extended compared to the coiled α-helix, its transmembrane regions are shorter and typically consist of 8 to 15 amino acids [18]. β-barrel membrane proteins can be found in the outer membranes of bacteria, mitochondria and chloroplasts.
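The alternating hydrophobic/polar pattern described above can be measured directly on a candidate segment. This is a minimal sketch under a simplifying assumption: residues are split into only two classes, with A, I, L, M, F, V, W and Y treated as hydrophobic. The function names and toy sequences are hypothetical, and real β-barrel predictors use far richer statistics.

```python
# Simplified, assumed classification of residues treated as hydrophobic.
HYDROPHOBIC = set("AILMFVWY")


def alternation_fraction(segment):
    """Fraction of adjacent residue pairs that switch hydrophobic <-> polar."""
    classes = [aa in HYDROPHOBIC for aa in segment.upper()]
    if len(classes) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(classes, classes[1:]))
    return switches / (len(classes) - 1)


def lipid_facing_parity(segment):
    """Guess which positions ('odd' or 'even', 1-based) face the lipids."""
    seq = segment.upper()
    odd = sum(aa in HYDROPHOBIC for aa in seq[0::2])   # positions 1, 3, 5, ...
    even = sum(aa in HYDROPHOBIC for aa in seq[1::2])  # positions 2, 4, 6, ...
    return "odd" if odd >= even else "even"


# Hypothetical 10-residue strand and a uniformly hydrophobic, helix-like stretch.
strand = "VTLSVNYKVD"
helix_like = "LLLLLLLLLL"
print(alternation_fraction(strand), lipid_facing_parity(strand))
print(alternation_fraction(helix_like))
```

The toy strand alternates perfectly (score 1.0, with its odd positions on the lipid side), whereas the uniform hydrophobic stretch, typical of a TM helix, scores 0.0.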
Around 2-3% of a genome encodes β-barrel membrane proteins [19], but that number is uncertain given the difficulty of identifying them. The main reason for this is that β-strands are more difficult to predict than α-helices, since they consist of fewer residues and involve more long-range interactions. There are also fewer known structures of β-barrel membrane proteins than of α-helical ones.
1.4.4 Monotopic and peripheral membrane proteins
Monotopic membrane proteins are associated with only one of the two leaflets of the membrane. Prostaglandin H2 synthase-1 is one example and is shown in Fig. 1.2C [20]. The helices that are associated with the membrane are amphiphilic [21], and the protein can only be removed from the membrane by detergents, organic solvents or denaturants that interfere with the hydrophobic interactions.
ele -domains of other integral membrane proteins. Peripheral membrane proteins can be dissociated from the membrane by treatment with solutions of high ionic strength or elevated pH.
1.5 Metals in Biological Systems
1.5.1 Metal homeostasis and trafficking
Organisms require essential heavy metals, including Cu, Zn, Mn, Fe, Co, Ni and Mo, to carry out biological functions. Heavy metals act as cofactors in enzyme reactions including group transfer, redox and hydrolysis [22]. Other metals, like Na, K and Ca, are involved in physiological processes and/or in maintaining structure, as well as in controlling the function of cell walls. It has been estimated that over a quarter of all known enzymes require a particular metal ion for function. These enzymes can be divided into two groups: metal-activated enzymes and metalloenzymes. Metal-activated enzymes require the addition of metal ions to become stimulated [23]. Kinases are an example of this type of enzyme because they use the Mg-ATP complex as a phosphoryl-group-donating substrate. Metalloenzymes need metal ions to function properly: the metal ion is firmly bound to the enzyme and is frequently recycled after protein degradation. Heme groups in hemoglobin or cytochromes bind an Fe²⁺ ion tightly. The chlorophyll that sustains most ecosystems also needs a Mg²⁺ ion. The Cu-Zn superoxide dismutases (SOD) require metal ions to protect the cell from dangerous superoxide radicals. These are only some of the many examples of the diverse roles that metal ions play. Metal ions are essential for life, yet they are toxic to cells at high concentrations or in the free form [22, 24]. They can generate free radicals, which are highly reactive oxidant molecules that can react with any biomolecule, such as nucleic acids. This interaction with biomolecules may eventually lead to cell death. Cells have developed highly complex systems for sensing, trafficking and transporting metals. Many of these systems are not yet well understood [25, 26].
In biological systems, these metals are mostly bound to proteins. In these metalloproteins, they have catalytic and structural roles:
1) as constituents of enzyme active sites;
2) stabilizing enzyme tertiary or quaternary structure;
3) forming weak bonds with substrates, contributing to their orientation to support chemical reactions; and
4) stabilizing charged transition states [27].
In their oxidized states, Cu, Fe and Mn have unpaired electrons that allow their participation in redox reactions in enzyme active sites [27]. For instance, Cu mediates the reduction of one superoxide anion to hydrogen peroxide and the oxidation of a second superoxide anion to molecular oxygen in the active site of cytoplasmic superoxide dismutase [28]. Zn has no unpaired electrons in the Zn²⁺ state, and it has been proposed to prevent the formation of harmful free radicals by competing with redox-active metals such as Fe and Cu in enzyme active sites [29]. Zn²⁺ is also a cofactor of a number of enzymes, including RNA polymerase, carbonic anhydrase and Cu/Zn superoxide dismutase [27]. Other heavy metals, including Cd, Pb, Cr, Hg and As, have no known physiological activity and are non-essential [30]. Elevated levels of both essential and non-essential heavy metals result in toxicity symptoms mostly associated with the formation of reactive oxygen species (ROS) that
homeostatic mechanisms to maintain physiological concentrations of essential heavy metals in different cellular compartments and to minimize the damage from exposure to non-essential ones.
The main mechanisms of heavy metal homeostasis include transport, chelation, and detoxification by efflux or sequestration into organelles. Heavy metals are transported into cells by various transmembrane metal carriers. It has been shown that although total cellular Zn²⁺ and Cu²⁺ concentrations are in the millimolar and micromolar range respectively, the cytosolic free Zn²⁺ concentration is in the femtomolar range, while free Cu⁺ is in the zeptomolar range, i.e. less than one free atom per cell [25, 32]. This indicates that heavy metals are immediately complexed by molecules or peptides upon entry into the cell. Chelators buffer cytosolic metal concentrations; they include molecules such as phosphates, phytates, polyphenols and glutathione, small peptides such as phytochelatins, and some proteins like metallothioneins [27, 33]. Some of these chelators are thought to be involved in metal transport into subcellular organelles. For instance, it has been shown that the cadmium ATPase from Listeria monocytogenes, the subject of chapter 4 of this work, transports Cd²⁺ complexes into the
plasma. Chaperones are proteins that bind specific essential heavy metals and deliver them to particular target metalloproteins, where they function as part of the enzymatic activity. Similarly, they traffic the metal to specific membrane transporters that efflux the metal to the extracellular space and to the lumen of subcellular organelles [34]. Fig. 1.3 summarizes the interplay of Cu⁺ with different metallochaperones in yeast.
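The statement that a zeptomolar free Cu⁺ concentration means less than one free atom per cell follows from converting concentration to a number of atoms, N = c × V × N_A. This is a back-of-the-envelope sketch that assumes a hypothetical cell volume of 1 fL (10^-15 L), on the order of a bacterial cell; a yeast cell is tens of femtolitres, which shifts the numbers but not the conclusion.

```python
AVOGADRO = 6.02214076e23  # mol^-1 (exact SI value)


def one_atom_concentration(volume_litres):
    """Molar concentration corresponding to a single atom in the given volume."""
    return 1.0 / (AVOGADRO * volume_litres)


def atoms_in_volume(concentration_molar, volume_litres):
    """Expected number of free atoms at a given molar concentration."""
    return concentration_molar * AVOGADRO * volume_litres


V_CELL = 1e-15  # hypothetical cell volume of 1 fL

c_one = one_atom_concentration(V_CELL)  # about 1.7e-9 M, i.e. nanomolar
n_fM = atoms_in_volume(1e-15, V_CELL)   # femtomolar (free Zn2+ range)
n_zM = atoms_in_volume(1e-21, V_CELL)   # zeptomolar (free Cu+ range)

print(f"one atom per cell = {c_one:.2e} M")
print(f"femtomolar: {n_fM:.1e} atoms/cell, zeptomolar: {n_zM:.1e} atoms/cell")
```

A single atom in such a volume already corresponds to a nanomolar concentration, so a zeptomolar value is a statistical average far below one atom per cell.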
1.5.2 The human copper metallochaperone: Atox1 (or Hah1)
The redox potential of copper makes it a useful cofactor for many enzymes, but it also requires copper to be sequestered at all times to avoid oxidative damage. In yeast, it has been demonstrated that almost no copper exists free in solution; instead, all copper is bound to proteins or small molecules such as glutathione [35]. When copper enters the cell through the transporter Ctr1, it is thought to pass immediately to another molecule with the help of a copper chaperone (see Fig. 1.3). In humans, the cytosolic protein Atox1 (HAH1) has been shown to play a key role in the delivery of copper to specialized membrane proteins called Cu-ATPases (see section 3.4 for a complete description of these proteins). Deletion of the Atox1 gene in mice leads both to intracellular copper accumulation and to a decrease in the activity of secreted copper-dependent enzymes [36], indicative of diminished Cu-ATPase transport activity. In yeast, the Atox1 ortholog Atx1 has been shown to facilitate the function of Ccc2, the P-type ATPase that transports copper into the late Golgi compartment [37]. The evolutionary conservation of the functional interactions between Atox1 and Cu-ATPases emphasizes the role of the chaperone in regulating Cu-ATPase function.
Structurally, Atox1 is a 68-residue protein that has the βαββαβ fold and a single MxCxxC copper-binding motif in common with the N-terminal Metal Binding Domains (MBDs) of Cu-ATPases [38]. Like the N-terminal MBDs, Atox1 binds copper with a linear, two-coordinate geometry, but other data, including work done in the laboratory [39], are highly suggestive of a transfer mechanism where copper moves from Atox1 to a metal-binding domain of Cu-ATPase through a
transported into yeast cells with high affinity, following reduction from Cu²⁺ to Cu⁺ by the Fre1 and Fre2 plasma membrane Cu(II)/Fe(III) ion reductases. Then, the high-affinity Cu transporters Ctr1 and Ctr3 mediate the passage of Cu across the plasma membrane. Once inside the cell, CCS ensures the distribution of copper to the Cu/Zn superoxide dismutase (SOD), Cox17 is a key element for the incorporation of copper into mitochondria, and Atx1 supplies copper to the P-type ATPase Ccc2 in the Golgi apparatus. Once Cu⁺ reaches Ccc2, it is transported to the lumen of the Golgi/endosome compartment. There, four Cu atoms are assembled with Fet3. The Fet3-Ftr1 high-affinity Fe-transport complex assembles at the plasma membrane. Ctr2 is a vacuolar copper conveyor making it possible
may look [38]. In solution, the intermediate is transient and cannot be visualized [40].
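The MxCxxC copper-binding motif that Atox1 shares with the MBDs (Met, any residue, Cys, any two residues, Cys) can be located in a sequence with a regular expression. This is only an illustrative sketch: the function name and the toy sequence are hypothetical (the toy is not Atox1), and a motif match by itself says nothing about whether the site actually binds copper. A zero-width lookahead is used so that overlapping occurrences are also reported.

```python
import re

# MxCxxC: Met, any residue, Cys, any two residues, Cys.
# The lookahead makes the match zero-width, so overlapping sites are found.
MXCXXC = re.compile(r"(?=(M.C..C))")


def find_mxcxxc(sequence):
    """Return (1-based start position, motif) for each MxCxxC occurrence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in MXCXXC.finditer(seq)]


# Hypothetical toy sequence for illustration only (not a real protein).
toy = "GSHMTCGSCAAKLMACKDCV"
print(find_mxcxxc(toy))
```

Applied to the toy sequence, the scan reports two sites, starting at positions 4 and 14.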
1.6 What this Thesis is About?
Both chemistry and most other experimental sciences usually rely on a top-down approach. That is, measurements are gradually refined to observe smaller structures and faster processes until technical limits are reached. If we had access to very, very powerful computers, one could try to reverse this approach and do bottom-up modelling instead. This is the basic idea behind the approach I have used for the research summarized in this thesis: starting from simple (almost trivial) pairwise interactions of atoms, computers can be used to simulate what happens in complex biological molecules on longer timescales. In this way, it is actually possible to see atomic motions at a level usually not accessible to experiments. The knowledge gained can then be used to return to the drawing table and formulate better models for the phenomena observed, in order to understand and perhaps even manipulate these systems, e.g. to transport drug molecules or metals through cellular membranes. Further, when the simulations reach time and length scales where it is also possible to perform experiments (and agree with them), the chemistry and physics of the molecules can be traced all the way from individual atoms up to real-world macroscopic systems. (Of course, that is the nice, ideal picture; in practice it means hours, days and months of programming, debugging, running dynamics and waiting for those long runs to finish. But at least you never have to do any wet laboratory work!)
More precisely, and as the title of this thesis indicates, I have been interested in the structure prediction of heavy metal ATPases and in the dynamics of their Metal Binding Domains using an in silico approach. After this introduction on membrane proteins and metals in biological systems, the thesis is divided into 4 main chapters:
- Theory and Methods: I will present computational techniques used to predict the three-dimensional structure of proteins when it is unknown experimentally. Then, I will present molecular mechanics force fields and molecular dynamics simulation algorithms, including adaptive molecular dynamics, that I have used to study the function of these proteins.
- Metal-Binding Membrane Proteins: In this chapter, I will detail the current knowledge on heavy metal transporting ATPases, focusing on Cd²⁺ and Cu⁺ ATPases. I will also show how we could model the structure of an ion channel coupled to a 7-transmembrane-helix receptor, opening the way for the study of signal transduction in this system.
- Structure and dynamics of the TransMembrane region of a Cd²⁺ ATPase: I have applied computational techniques to obtain several possible models of the TM region of a Cd²⁺ ATPase that is also studied experimentally in the laboratory.
- Dynamics and stability of the Metal Binding Domains of the Menkes ATPase: In this chapter, I have used the known 3D structure of the N-terminal soluble part of a Cu⁺ ATPase as a starting point for molecular dynamics simulations of the metal binding domains of this protein in the
function of two particular heavy metal ATPases. When the structure of these ATPases was unknown, structure prediction methods were used. Molecular dynamics simulations were then used to approach the function of these proteins. The final goal is to understand how the metal is passed from soluble chaperones, which bind the metal as soon as it enters the cell, to their partner ATPases, whose role is to distribute this metal where it is needed or to eliminate it. My work may also be used to predict the effect of, or to propose, amino acid mutations that impair the normal behavior of these proteins and
Theory and Methods
When attempting to study biological and chemical mechanisms in proteins using computational
techniques, the first requirement is the structure of the macromolecule involved. When this structure is
not available from databases, we can either try to predict it ab initio, that is without any knowledge
about other existing structures, or use homology modeling when the structure of a similar protein is
known.
I will start by describing a new method that we have developed to build structural models of the
transmembrane part of proteins when no homology is present. Then I will present homology modelling
and the program MODELLER that we have used to build models of an ABC transporter. Finally,
with the structure of the protein known, I will present the techniques of molecular mechanics and
dynamics and the program CHARMM that I have used to study the dynamics of ATPases.
2.1 Structure prediction using NOE-like restraints with X-PLOR
2.1.1 Introduction
X-PLOR [41] is a program used to determine and refine solution NMR structures based on interproton
distance estimates (from Nuclear Overhauser Effects, or NOE), coupling constant measurements,
and other information, such as known hydrogen-bonding patterns. Which such measures can we
theoretically derive or predict when building models of the transmembrane (TM) part of an
α-helical membrane protein?
- Each TM helix can be considered as a regular α-helix. We then know that the distance between the
O atom of residue i and the H atom of residue i + 4 in the helix is close to 1.5 Å.
- We also know that in such a regular helix, the φ and ψ dihedral angles have values close to -60
and -50 degrees, respectively.
- TM helices form a bundle in which the center of one helix is close to the centers of 3 or 4 neighbouring
helices; the distance between centers lies around 10 Å.
- In metal transporting TM proteins, residues in 3 or 4 TM helices are known to bind the
transported metal; this knowledge allows us to set up some new distance restraints between metal
eral alternative methods possible in X-PLOR: full-structure distance geometry, substructure distance
geometry, and ab initio simulated annealing (SA) starting from template structures or random
coordinates. The choice of protocol depends on the desired efficiency and sampling of conformational space.
In our case of theoretical distance restraints, substructure embedding and regularization followed by
SA refinement have been used. Simulated annealing is so powerful that it can convert a random array
of atoms into a well-defined structure through distance restraints. Structures obtained by this protocol
have to be regularized and refined.
Figure 2.1: Overview of structure calculations
2.1.2 Distance Geometry
Many types of structural information (distances, J-coupling data, chemical cross-linking, neutron
scattering, predicted secondary structures, etc.) can be conveniently expressed as intra- or intermolecular
distances. In our case, we define NOE-like distances, which are distance restraints imposed on our
models and similar to NOE results. In the absence of J-coupling data, we define dihedral angle
restraints. The distance geometry formalism allows these distance restraints to be assembled and
three-dimensional structures consistent with them to be calculated. The distance geometry routines
in X-PLOR begin by translating the bond lengths, bond angles, dihedral angles, improper angles, and
van der Waals radii in the current molecular structure into a (sparse) matrix of distance bounds
between the bonded atoms, atoms that are bonded to a common third atom, or atoms that are connected
Experimental constraints can be added by the NOE assign statements and the restraints dihedral
statements. These lists of restraints are automatically read, translated into distance constraints, and
entered into the bounds matrix. They have the following forms:
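NOE-like distance restraints:
The total distance restraint energy $E_{NOE}$ is a sum over all distance restraints:
$$E_{NOE} = \sum_{\text{restraints}} e_{NOE}$$
where
$$e_{NOE} = \min(1000, S)\,\Delta^2 \tag{2.1}$$
and $\Delta$ is defined as
$$\Delta = \begin{cases} R - (d + d^{\text{plus}}) & \text{if } d + d^{\text{plus}} < R \\ 0 & \text{if } d - d^{\text{minus}} < R < d + d^{\text{plus}} \\ (d - d^{\text{minus}}) - R & \text{if } R < d - d^{\text{minus}} \end{cases}$$
$S$ is the scale factor, $d$, $d^{\text{plus}}$ and $d^{\text{minus}}$ are the average target distance and the ranges, and $R$
is the distance between the two atoms. This defines a flat-bottomed harmonic (square-well) function with a value of 0 for
$R$ larger than $d - d^{\text{minus}}$ and smaller than $d + d^{\text{plus}}$.
Dihedral Angle Restraints:
The functional form of the effective dihedral restraint energy $E_{CDIH}$ is given by
$$E_{CDIH} = \sum \mathrm{well}\left(\mathrm{modulo}_{2\pi}(\phi - \phi_0),\ \Delta\phi\right)^2 \tag{2.2}$$
where the sum extends over all restrained dihedral angles and the square-well potential
$\mathrm{well}(a, b)$ is given by
$$\mathrm{well}(a, b) = \begin{cases} a - b & \text{if } a > b \\ 0 & \text{if } -b < a < b \\ a + b & \text{if } a < -b \end{cases}$$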
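The two restraint energies above can be sketched in a few lines of Python. This is a minimal illustration of Eqs. (2.1) and (2.2) under our own assumptions, not X-PLOR's code. The function names are hypothetical, angles are in radians, and the modulo-2π step is implemented by wrapping φ - φ0 into [-π, π).

```python
import math


def noe_energy(R, d, d_minus, d_plus, S):
    """Flat-bottomed NOE-like restraint energy e_NOE of Eq. (2.1)."""
    if R > d + d_plus:            # too far: beyond the upper bound
        delta = R - (d + d_plus)
    elif R < d - d_minus:         # too close: below the lower bound
        delta = (d - d_minus) - R
    else:                         # inside the allowed range: no penalty
        delta = 0.0
    return min(1000.0, S) * delta ** 2


def well(a, b):
    """Square-well function well(a, b) of Eq. (2.2)."""
    if a > b:
        return a - b
    if a < -b:
        return a + b
    return 0.0


def wrap_angle(x):
    """Map an angle difference (radians) into [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def dihedral_energy(phi, phi0, dphi):
    """One term of E_CDIH in Eq. (2.2); all angles in radians."""
    return well(wrap_angle(phi - phi0), dphi) ** 2
```

Take a helical restraint φ0 = -60° with Δφ = 10°. It leaves φ = -65° unpenalized, while φ = -75° costs (5° expressed in radians)². Because of the wrapping, a difference of 350° is treated as -10°.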
Plane Restraints:
Planar restraints have also been added in some simulations to maintain atoms in a common plane,
mimicking the phospholipid membrane. Restraints on an individual atom are based on its distance
from a plane defined by the normal vector $\vec z$. An atom for which plane restraints are defined experiences
restraints only in the direction of $\vec z$. Plane restraints are defined from the vector difference between
present ($\vec r_i$) and reference ($\vec r_i^{\,ref}$) coordinates by:
$$E_{Plane} = \sum_i \left[\frac{\vec z}{|\vec z|}\cdot\left(\vec r_i - \vec r_i^{\,ref}\right)\right]^2 \tag{2.3}$$
Structure determination of extended polypeptides or DNA/RNA double strands is usually
underdetermined. In particular, the overall shape or bend of the molecule is a free parameter. Thus, neither
distance geometry nor ab initio simulated annealing will produce unique structures. The problem can
be avoided by including additional restraints.
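The plane restraint energy (2.3) can be sketched in the same style. The function name is hypothetical and coordinates are plain (x, y, z) tuples. Only the component of each displacement along the unit normal z/|z| is penalized, so restrained atoms remain free to slide within the membrane plane.

```python
import math


def plane_energy(coords, ref_coords, z):
    """Plane restraint energy of Eq. (2.3).

    coords, ref_coords: sequences of (x, y, z) tuples (present and reference);
    z: normal vector of the plane (need not be normalized).
    """
    norm = math.sqrt(sum(c * c for c in z))
    n = [c / norm for c in z]                      # unit normal z/|z|
    energy = 0.0
    for r, r_ref in zip(coords, ref_coords):
        # projection of the displacement r - r_ref onto the normal
        disp = sum(n_k * (r_k - ref_k) for n_k, r_k, ref_k in zip(n, r, r_ref))
        energy += disp ** 2
    return energy
```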
The first step of structure determination using X-PLOR involves providing the program with
the information it needs about the molecular structure, the NOE-like distance bounds and the dihedral angle
restraints. The molecular information of the macromolecule has to be generated using the all-hydrogen
topology and parameter files "topallhdg.pro" and "parallhdg.pro" for proteins.
Template Structure
The next step involves generation of a template coordinate set. The template coordinate set can be
any conformation of the macromolecule with good local geometry and no nonbonded contacts. It
can be generated by using most molecular modeling graphics programs or, preferably, by using the
X-PLOR protocol described below. The purpose of the template coordinate set is to provide distance
geometry information about the local geometry of the macromolecule.
The protocol we used initially places the atoms of the macromolecule along the x-axis, with y
and z set to random numbers. The coordinates are then regularized using simulated annealing (see
Fig. 2.2).
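The following is a minimal sketch of this starting configuration. The spacing of 1.5 Å along x and the random y, z offsets of at most 0.5 Å are illustrative values of our own choice, not taken from the X-PLOR protocol, and the function name is hypothetical. The subsequent regularization by simulated annealing is not shown.

```python
import random


def initial_template(n_atoms, spacing=1.5, spread=0.5, seed=None):
    """Place atoms along the x-axis with random y and z (hypothetical helper).

    spacing: distance between consecutive atoms along x (Angstrom, illustrative);
    spread: half-width of the uniform random y/z offsets (Angstrom, illustrative).
    """
    rng = random.Random(seed)
    return [
        (i * spacing, rng.uniform(-spread, spread), rng.uniform(-spread, spread))
        for i in range(n_atoms)
    ]
```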
Generally, when too many covalent links are present, the structure may get entangled in a knot,
which will result in poor local geometry. Some experimentation may therefore be required to find out
whether certain covalent links have to be removed; the goal is to obtain an energy below 1000 kcal/mol for
the final step of minimization. We have taken advantage of the adaptive dynamics program (AMD)
(see Section 2.4), which allows us to easily manipulate structures and untangle the knots by setting
the van der Waals force to zero in the process.
Substructure embedding
A family of embedded substructures is produced using distance geometry. The substructures are
regularized after embedding using minimization against the DG energy term. (This distance geometry
term is yet another harmonic potential, meant to maintain a given distance between a lower and an
upper bound.) Covalent links should be treated in the same fashion as in the prior template generation.
The removed covalent links are reintroduced as distance restraints.
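The distance geometry steps can be illustrated with NumPy: bound smoothing, the choice of trial distances within the bounds, and embedding. This is a generic sketch of the textbook metric-matrix approach under our own assumptions, not X-PLOR's implementation. In substructure embedding, X-PLOR handles only a subset of the atoms. All function names are hypothetical.

```python
import numpy as np


def smooth_bounds(lower, upper):
    """Triangle-inequality bound smoothing of lower/upper distance bounds.

    For every triple (i, k, j): u_ij <= u_ik + u_kj and l_ij >= l_ik - u_kj.
    """
    L, U = np.array(lower, dtype=float), np.array(upper, dtype=float)
    n = len(U)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if U[i, j] > U[i, k] + U[k, j]:
                    U[i, j] = U[i, k] + U[k, j]
                if L[i, j] < L[i, k] - U[k, j]:
                    L[i, j] = L[i, k] - U[k, j]
                if L[i, j] < L[j, k] - U[k, i]:
                    L[i, j] = L[j, k] - U[k, i]
    return L, U


def random_distances(L, U, rng):
    """Choose a symmetric trial distance matrix uniformly inside the bounds."""
    n = len(U)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = rng.uniform(L[i, j], U[i, j])
    return D


def embed(D, dim=3):
    """Metric-matrix embedding: coordinates whose distances best match D."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    G = -0.5 * J @ (D ** 2) @ J              # metric (Gram) matrix about the centroid
    w, v = np.linalg.eigh(G)                 # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:dim]          # keep the `dim` largest eigenvalues
    return v[:, top] * np.sqrt(np.clip(w[top], 0.0, None))
```

Distances picked at random inside the bounds are generally not exactly embeddable in three dimensions. This is why the embedded coordinates still need the regularization described next.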
SA-Regularization of DG-Structures
Embedded distance geometry (substructure) coordinates require extensive regularization. The next
protocol uses template fitting followed by simulated annealing to regularize the coordinates. The
protocol is close in spirit to the one published by Nilges et al. [43]. The starting coordinates have to
be defined for at least three atoms in each residue. Covalent links are now treated as real bonds in
this protocol, in contrast to template generation.
Simulated Annealing Refinement
Structures are finally refined using a protocol of the slow-cooling type, reminiscent of the protocol used
in crystallographic refinement, with softening of the van der Waals repulsions. This enables atoms to
move through each other.
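The slow-cooling idea can be sketched with a generic simulated annealing loop. As a simplification, Metropolis Monte Carlo moves stand in for the molecular dynamics used by X-PLOR, and the cooling schedule is geometric. The function name and parameters are hypothetical.

```python
import math
import random


def simulated_annealing(energy, x0, step, t_start, t_end, n_steps, seed=None):
    """Generic slow-cooling simulated annealing with Metropolis acceptance.

    energy: function of a list of floats; x0: starting point;
    step: maximum random displacement per coordinate;
    the temperature decreases geometrically from t_start to t_end.
    Returns the lowest-energy point visited and its energy.
    """
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    cool = (t_end / t_start) ** (1.0 / max(1, n_steps - 1))
    t = t_start
    for _ in range(n_steps):
        trial = [xi + rng.uniform(-step, step) for xi in x]
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
        t *= cool
    return best_x, best_e
```

At high temperature, uphill moves are frequently accepted and the search crosses barriers. As the temperature drops, the walk settles into a low-energy basin, which is the role slow cooling plays in the refinement protocol.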
Figure 2.2: Flowchart for simulated annealing
2.2 Homology modelling
We have also used a more standard method for structure prediction when a homology exists between the
protein of interest and other proteins of known 3D structure:
2.2.1 Introduction to Homology modelling
During evolution, sequence changes much faster than structure. It is therefore possible to infer the
3D structure of a protein by looking at a molecule with sufficient sequence identity. Fig. 2.3 shows how much sequence
identity is needed, for a given number of aligned residues, to reach the safe homology modeling
zone. For a sequence of 100 residues, for example, a sequence identity of 40% is sufficient for structure
prediction. When the sequence identity falls in the safe homology modeling zone, we can assume that
the 3D structures of both sequences are the same.
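Sequence identity itself is simple to compute from an alignment. The sketch below assumes gapped sequences of equal length, with '-' as the gap character. It normalizes by the number of gap-free aligned columns. Other normalizations exist, which is one reason why identity values reported by different tools can differ. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Sequence identity (%) of two aligned sequences of equal length ('-' = gap).

    Identity is computed over aligned columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)
```

For example, "ACDE-FG" aligned with "ACDQHFG" has 5 identical residues out of 6 gap-free columns, i.e. about 83% identity.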
2.2.2 Classical steps in Homology modeling
The known structure is called the template, the unknown structure is called the target. Homology
structure when their sequence identity falls in the safe homology zone, the upper part of the picture
(from work by Sander and Schneider [44]).
2.2.2.1 Template recognition and initial alignment
In this step you compare the sequence of the unknown structure with all the known structures stored
in the Protein Data Bank (PDB). A search with BLAST against this database will give a list of known
protein structures that match the sequence. If BLAST cannot find a template, a more sophisticated
technique might be necessary to identify the structure of a molecule. BLAST uses a residue exchange
matrix to define a score for every hit. Residues that are easily exchanged (for example Ile to Leu)
get a better score than residues that have different properties (for example Glu to Trp). Conserved
residues with a specific function get the best score (for example Cys to Cys). Every hit is scored
using this matrix, and BLAST will provide a list of possible templates for the unknown structure. To
make the best initial alignment, BLAST uses an alignment matrix based on the residue exchange
matrix and adds extra penalties for opening and extension of a gap between residues. In practice
the target sequence is sent to a BLAST server, which searches the PDB to obtain a list of possible
templates and their alignments. Subsequently the best hit has to be chosen, which is not necessarily
the first one. In doing so, one has to keep in mind the resolution, missing parts, different functional states and
possible ligands of the molecules.
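The following sketch shows how a substitution matrix and gap penalties combine into an alignment score. The substitution values are toy numbers (not BLOSUM62 or any published matrix), and unlisted pairs score -1. A gap of length k is charged gap_open + (k - 1) * gap_extend; BLAST's own parameterization of gap costs may differ. All names are hypothetical.

```python
# Toy substitution scores (hypothetical values, NOT an actual BLOSUM/PAM matrix):
# similar residues (I/L) score higher than dissimilar ones (E/W); Cys-Cys scores best.
TOY_MATRIX = {
    ("C", "C"): 9, ("I", "I"): 4, ("L", "L"): 4, ("E", "E"): 5, ("W", "W"): 11,
    ("I", "L"): 2, ("E", "W"): -3,
}


def pair_score(a, b, matrix, default=-1):
    """Symmetric lookup in a sparse substitution table."""
    return matrix.get((a, b), matrix.get((b, a), default))


def alignment_score(aligned_a, aligned_b, matrix, gap_open=11, gap_extend=1):
    """Score an alignment ('-' = gap) with affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    score, in_gap_a, in_gap_b = 0, False, False
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            gap_in_a = a == "-"
            continuing = in_gap_a if gap_in_a else in_gap_b
            score -= gap_extend if continuing else gap_open
            in_gap_a, in_gap_b = gap_in_a, not gap_in_a
        else:
            score += pair_score(a, b, matrix)
            in_gap_a = in_gap_b = False
    return score
```

With these toy values, "CI--L" against "CIEWL" scores 9 + 4 - 11 - 1 + 4 = 5. The single two-residue gap is charged one opening and one extension.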
2.2.2.2 Alignment correction
It is possible that the alignment has to be corrected. A change of Ala to Glu is possible but unlikely
to happen in a hydrophobic core, so this Ala and Glu should not be aligned. Using a multiple sequence
less likely to be changed than the residues at the outside. Insertions and deletions can be made in
widely divergent parts of the molecule, and a multiple sequence alignment can be helpful to find these
places. Gaps have to be shifted around until they are as small as possible. Fig. 2.4 shows that,
after a deletion of 3 residues, a big gap occurs in the red structure, which was the best alignment.
After shifting several residues, the gap is much smaller (blue structure) and more likely to be correct.
Correction of the alignment is typically done by hand.
Figure 2.4: Template structure (green) with the best aligned target (red) with a large gap, and the
target after shifting several residues (blue). The gap is much smaller now.
2.2.2.3 Backbone generation
When the alignment is correct, the backbone of the target can be created. The coordinates of the
template backbone are copied to the target. When the residues are identical, the side-chain coordinates
are also copied. Because a PDB file can always contain some errors, it can be useful to make use of
multiple templates.
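The bookkeeping of this copying step can be sketched as a function that walks through the alignment. For each aligned pair, it records whether the backbone alone or the backbone plus side chain is taken from the template. The names are hypothetical. Target residues facing a template gap are simply returned as missing, to be built during loop modelling.

```python
def copy_plan(target_aln, template_aln):
    """From an alignment ('-' = gap), list which template residues supply coordinates.

    Returns (plan, missing): plan holds (target_index, template_index,
    copy_side_chain) for every aligned pair, with side chains copied only for
    identical residues; missing lists target residues aligned to template gaps.
    """
    plan, missing = [], []
    i_t = i_m = 0  # residue counters in target and template
    for a, b in zip(target_aln, template_aln):
        if a != "-" and b != "-":
            plan.append((i_t, i_m, a == b))
        elif a != "-":
            missing.append(i_t)
        if a != "-":
            i_t += 1
        if b != "-":
            i_m += 1
    return plan, missing
```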
2.2.2.4 Loop modeling
Often the alignment will contain gaps as a result of deletions and insertions. When the target sequence
contains a gap, one can simply delete the corresponding residues in the template. This creates a hole
in the model, which has already been discussed in the previous step. When there is an insertion in the target,
the template will contain a gap and there are no backbone coordinates known for these residues in the
model. The backbone from the template has to be cut to insert these residues. Such large changes
cannot be modeled in secondary structure elements and therefore have to be placed in loops and
strands. Surface loops are, however, flexible and difficult to predict. One way to handle loops is to
take some residues before and after the insertion as "anchor" residues and search the PDB for loops
with the same anchor residues. The best loop is simply copied into the model. In the MODELLER
program, which we have used (see Section 2.2.3), loop modelling is treated with special care.
2.2.2.5 Side-chain modeling
The next step is to add side-chains to the backbone of the model. Conserved residues were already
to predict the rotamer because many backbone configurations strongly prefer a specific rotamer, as
shown in Fig. 2.5 in the case of a tyrosine residue. There are libraries based upon the backbone of the
residues flanking the residue of interest. By using these libraries, the best rotamer can be predicted.
Figure 2.5: Preferred rotamers of tyrosine exemplified with different positions of TYR52 in the 10 NMR
models of the first Metal Binding Domain of the Menkes ATPase (PDB code 1KVJ).
2.2.2.6 Model optimization
The model has to be optimized because the changed side-chains can affect the backbone, and a
changed backbone will have an effect on the predicted rotamers. Optimization can be done by performing
refinements using Molecular Dynamics simulations of the model. The model is placed in a force field
and the movements of the molecules are followed in time, which lets the structure relax toward a nearby low-energy conformation. The
big errors like bumps will be removed, but new smaller errors can be introduced. The calculated energy
should be as low as possible.
2.2.2.7 Model validation
Every model contains errors. The model with the lowest force field energy might still be folded
completely wrong. That is why the model should be checked for bumps and for whether the bond angles, torsion
angles and bond lengths are within normal ranges. Other properties, like the distribution of
polar/apolar residues, can be compared with real structures. This can be done by using PROCHECK, for
example. The output can help in the identification of errors in the model. When an error occurs far
away from the active site, it does not have to be bad. But when an error occurs in the active site, one
MODELLER [45] is a computer program that models three-dimensional structures of proteins and
their assemblies by satisfaction of spatial restraints. MODELLER is most frequently used for
homology or comparative protein structure modeling: the user provides an alignment of a sequence
to be modeled with known related structures, and MODELLER will automatically calculate a model
with all non-hydrogen atoms. More generally, the inputs to the program are restraints on the spatial
structure of the amino acid sequence(s) and ligands to be modeled. The output is a 3D structure that
satisfies these restraints as well as possible. Restraints can in principle be derived from a number of
different sources. These include related protein structures (comparative modeling), NMR experiments
(NMR refinement), rules of secondary structure packing (combinatorial modeling), etc. The restraints
can operate on distances, angles, dihedral angles, pairs of dihedral angles and some other spatial
features defined by atoms or pseudo-atoms. Presently, MODELLER automatically derives the restraints
only from the known related structures and their alignment with the target sequence. A 3D model
is obtained by optimization of a molecular probability density function (pdf). The molecular pdf for
comparative modeling is optimized with the variable target function procedure in Cartesian space that
employs methods of conjugate gradients and molecular dynamics with simulated annealing.
MODELLER can also perform multiple comparison of protein sequences and/or structures, clustering of
proteins, and searching of sequence databases.
2.3 Classical Mechanics and Dynamics
With the 3D structure of the molecules known, we are now faced with the significant complexity of
the structures and interactions that we want to simulate for a realistic model. Two possible strategies
are generally used: we can either increase the processing power (e.g. use costly parallel
supercomputers), or use simplified representations of the geometry or of the dynamics of the involved molecules.
Frequently, these simplification methods involve representations as mechanical models or in reduced
coordinates (e.g. modelling the molecule as an articulated body), where subsets of atoms are replaced
by idealized structures [46][47], or performing normal-mode or principal components analysis in order
to determine the essential dynamics of the system. Because they contain fewer degrees of freedom,
these simplified representations allow us to accelerate the computation of the molecular dynamics and
facilitate the study of the molecular interactions. In the following, I will describe how a molecular
mechanics model is built, with atoms represented as charged spheres and covalent bonds as springs. This
mechanical model can consequently be studied using the classical equations of motion.
2.3.1 Newton's Se ond Law
Molecular Dynamics simulations are based on Newton's second law, the equation of motion [48], [49]:
$$\vec F_i = m_i\,\vec a_i = m_i\,\frac{d\vec v_i}{dt} \tag{2.4}$$
It describes the motion of a particle of mass $m_i$ along the coordinate $x_i$, with $F_i$ being the force on $m_i$
in that direction. This is used to calculate the motion of a finite number of atoms or molecules, which requires a potential
energy function $V(\vec x)$, where $\vec x$ corresponds to the coordinates of all atoms in the system. The relationship
between the potential energy function and Newton's second law is given by
$$\vec F(x_i) = -\nabla_i V(\vec x), \tag{2.5}$$
with $\vec F(x_i)$ being the force acting on a particle due to the potential $V(\vec x)$. Combining these two equations gives
$$\frac{dV(\vec x)}{dx_i} = -m_i\,\frac{d^2 x_i}{dt^2}, \tag{2.6}$$
which relates the derivative of the potential energy to the changes of the atomic coordinates in
time. As the potential energy is a complex multidimensional function, this equation can only be solved
numerically, with some approximations.
With the acceleration being $a_i = -\frac{1}{m_i}\frac{dV}{dx_i}$, we can then calculate the changes of the system in time by just
knowing (i) the potential energy $V(\vec x)$, (ii) the initial coordinates $x_{i0}$ and (iii) an initial distribution of velocities
$v_{i0}$. Thus, this method is deterministic, meaning we can predict the state of the system at any point of time in the future or the past.
The initial distribution of velocities is usually randomly chosen from a Gaussian or
Maxwell-Boltzmann distribution [49], which gives the probability of atom $i$ having the velocity $v_{i,x}$ in the direction
of $x$ at the temperature $T$ by:
$$p(v_{i,x}) = \left(\frac{m_i}{2\pi k_B T}\right)^{1/2} \exp\left(-\frac{1}{2}\,\frac{m_i\,v_{i,x}^2}{k_B T}\right).$$
Velocities are then corrected so that the overall momentum of the system equals a zero vector:
$$\vec P = \sum_{i=1}^{N} m_i\,\vec v_i = \vec 0.$$
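The following is a minimal sketch of this initialization. It assumes that each Cartesian component is drawn independently from a Gaussian of variance k_B T / m_i, and units are left to the caller (k_b = 1 gives reduced units). The function name is hypothetical. Removing the centre-of-mass velocity slightly changes the instantaneous temperature, which may then be corrected by rescaling.

```python
import math
import random


def initial_velocities(masses, temperature, k_b=1.0, seed=None):
    """Draw 3D velocities from a Maxwell-Boltzmann distribution, then remove
    the net momentum so that sum_i m_i v_i = 0.

    Each Cartesian component is Gaussian with variance k_b * T / m_i.
    """
    rng = random.Random(seed)
    v = [[rng.gauss(0.0, math.sqrt(k_b * temperature / m)) for _ in range(3)]
         for m in masses]
    total_mass = sum(masses)
    # Velocity of the centre of mass, component by component.
    v_com = [sum(m * vi[k] for m, vi in zip(masses, v)) / total_mass for k in range(3)]
    return [[vi[k] - v_com[k] for k in range(3)] for vi in v]
```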
2.3.2 Integration Algorithms
The solution of the equation of motion given above is a rather simple one, which is only sufficiently
good over a very short period of time, in which the velocities and accelerations can be regarded as
constant. So algorithms were introduced that repeatedly perform small time steps, thus propagating the
system's properties (positions, velocities and accelerations) in time. Time steps are typically chosen
in the range of 1 fs [49]. It is necessary to use such a small time step, as many molecular processes
occur on such short time scales that they cannot be resolved with larger time steps; for instance, vibrations
of bonds involving hydrogen atoms have periods of about 10 fs. A time series
of coordinate sets calculated this way is referred to as a trajectory, and a single coordinate set as a
frame.
2.3.2.1 Verlet Algorithm
All algorithms assume that the system's properties can be approximated by a Taylor series expansion:
$$\vec x(t + \delta t) = \vec x(t) + \delta t\,\vec v(t) + \tfrac{1}{2}\delta t^2\,\vec a(t) + \dots$$
$$\vec v(t + \delta t) = \vec v(t) + \delta t\,\vec a(t) + \tfrac{1}{2}\delta t^2\,\vec b(t) + \dots$$
$$\vec a(t + \delta t) = \vec a(t) + \delta t\,\vec b(t) + \tfrac{1}{2}\delta t^2\,\vec c(t) + \dots$$
with $\vec x$, $\vec v$ and $\vec a$ being the positions, the velocities and the accelerations of the system (and $\vec b$, $\vec c$ the
higher time derivatives). The series expansion is usually truncated after the quadratic term. Probably the most widely used algorithm
for integrating the equations of motion in MD simulations is the Verlet algorithm (1967) [48], [49]. It
can be derived by simply summing the Taylor expansions for the coordinates at the times $(t + \delta t)$ and $(t - \delta t)$:
$$\vec x(t + \delta t) = \vec x(t) + \delta t\,\vec v(t) + \tfrac{1}{2}\delta t^2\,\vec a(t) + \dots$$
$$\vec x(t - \delta t) = \vec x(t) - \delta t\,\vec v(t) + \tfrac{1}{2}\delta t^2\,\vec a(t) - \dots$$
$$\Rightarrow\ \vec x(t + \delta t) = 2\,\vec x(t) - \vec x(t - \delta t) + \delta t^2\,\vec a(t).$$
Thus, it uses the position $\vec x(t)$ and acceleration $\vec a(t)$ at time $t$ and the positions from the previous step $\vec x(t - \delta t)$
to calculate new positions $\vec x(t + \delta t)$. In this algorithm, velocities are not explicitly calculated but can be obtained in
several ways. One is to calculate mean velocities between the positions $\vec x(t + \delta t)$ and $\vec x(t - \delta t)$:
$$\vec v(t) = \frac{1}{2\delta t}\left[\vec x(t + \delta t) - \vec x(t - \delta t)\right]$$
The advantages of this algorithm are that it is straightforward and has modest storage requirements,
comprising only two sets of positions [$\vec x(t)$ and $\vec x(t - \delta t)$] and the accelerations $\vec a(t)$. The disadvantage,
however, is its moderate precision, because the positions are obtained by adding a small term
[$\delta t^2\,\vec a(t)$] to the difference of two much larger terms [$2\vec x(t) - \vec x(t - \delta t)$]. This results in rounding errors due to
numerical limitations of the computer.
Furthermore, this is obviously not a self-starting algorithm. New positions $\vec x(t + \delta t)$ are obtained from the current
positions $\vec x(t)$ and the positions at the previous step $\vec x(t - \delta t)$. So at $t = 0$ there are no positions for $(t - \delta t)$,
and therefore it is necessary to provide another way to calculate them. One way is to use the Taylor expansion truncated
after the first-order term:
$$\vec x(t - \delta t) = \vec x(t) - \delta t\,\vec v(t) + \dots$$
$$\Rightarrow\ \vec x(-\delta t) = \vec x(0) - \delta t\,\vec v(0)$$
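The Verlet scheme can be sketched for a single coordinate, including the first-order start x(-δt) = x(0) - δt v(0) and the central-difference velocities. The function name and the harmonic test force used below are our own choices.

```python
def verlet(x0, v0, accel, dt, n_steps):
    """Position Verlet integration for one coordinate.

    x(t+dt) = 2 x(t) - x(t-dt) + dt^2 a(t); the missing x(-dt) is taken from
    the first-order Taylor expansion x(-dt) = x(0) - dt v(0).
    Velocities are central differences v(t) = [x(t+dt) - x(t-dt)] / (2 dt).
    Returns positions x(0..n_steps) and velocities v(1..n_steps-1).
    """
    x_prev = x0 - dt * v0          # x(-dt)
    xs = [x0]
    for _ in range(n_steps):
        x_new = 2.0 * xs[-1] - x_prev + dt * dt * accel(xs[-1])
        x_prev = xs[-1]
        xs.append(x_new)
    vs = [(xs[i + 1] - xs[i - 1]) / (2.0 * dt) for i in range(1, len(xs) - 1)]
    return xs, vs
```

Consider the harmonic oscillator a = -x, started with `verlet(1.0, 0.0, lambda x: -x, 0.01, 628)`, which covers about one period. The energy 0.5(v² + x²) computed from the returned positions and velocities stays close to its initial value of 0.5. Using the second-order start x(-δt) = x(0) - δt v(0) + ½δt² a(0) removes most of the small error introduced by the first step.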
2.3.2.2 Leap-Frog Algorithm
There are several variations of the Verlet algorithm trying to avoid its disadvantages. One example is
the leap-frog algorithm:
$$\vec v\left(t + \tfrac{1}{2}\delta t\right) = \vec v\left(t - \tfrac{1}{2}\delta t\right) + \delta t\,\vec a(t)$$
$$\vec x(t + \delta t) = \vec x(t) + \delta t\,\vec v\left(t + \tfrac{1}{2}\delta t\right),$$
where $\vec a(t)$ is obtained using $\vec a(t) = -\frac{1}{m}\frac{dV}{d\vec x}$.
First, the velocities $\vec v\left(t + \tfrac{1}{2}\delta t\right)$ are calculated from the velocities at $t - \tfrac{1}{2}\delta t$ and the accelerations $\vec a(t)$.
Then the positions $\vec x(t + \delta t)$ are deduced from the velocities just calculated and the positions at time $t$. In this
way the velocities first 'leap-frog' over the positions and then the positions leap
over the velocities. The leap-frog algorithm's advantages over the Verlet algorithm are the inclusion
of the explicit velocities and the lack of the need to calculate the differences between large numbers.
An obvious disadvantage, however, is that the positions and velocities are not synchronized. This
means it is not possible to calculate the contribution of the kinetic energy (from the velocities) and
the potential energy (from the positions) to the total energy simultaneously.
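The leap-frog scheme can be sketched in the same style. In this sketch the on-step velocity v(t) is estimated as the average of the two neighbouring half-step velocities. This is one common way around the synchronization problem just described. The function name is hypothetical.

```python
def leapfrog(x0, v_half_minus, accel, dt, n_steps):
    """Leap-frog integration for one coordinate.

    v(t + dt/2) = v(t - dt/2) + dt a(t)
    x(t + dt)   = x(t) + dt v(t + dt/2)
    v_half_minus is v(-dt/2). Returns positions at integer steps, velocities at
    half steps, and on-step velocities v(t) = [v(t - dt/2) + v(t + dt/2)] / 2.
    """
    xs, v_half = [x0], [v_half_minus]
    v_on_step = []
    for _ in range(n_steps):
        v_new = v_half[-1] + dt * accel(xs[-1])
        v_on_step.append(0.5 * (v_half[-1] + v_new))  # velocity at the time of xs[-1]
        xs.append(xs[-1] + dt * v_new)
        v_half.append(v_new)
    return xs, v_half, v_on_step
```

With the synchronized velocities, kinetic and potential energy can be evaluated at the same instant. For a harmonic oscillator, their sum stays close to constant.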
2.3.2.3 Langevin Dynamics (LD) Simulation
The Langevin equation is a stochastic differential equation in which two force terms have been added to
Newton's second law to approximate the effects of neglected degrees of freedom. One term represents
a frictional force, the other a random force $\vec R$. For example, the effects of solvent molecules not explicitly
present in the system being simulated would be approximated in terms of a frictional drag
on the solute, as well as random kicks associated with the thermal motions of the solvent molecules.
Since friction opposes motion, the first additional force is proportional to the particle's velocity and
oppositely directed. Langevin's equation for the motion of atom $i$ is:
$$\vec F_i - \gamma_i\,\vec v_i + \vec R_i(t) = m_i\,\vec a_i,$$
where $\vec F_i$ is still the sum of all forces exerted on atom $i$ by other atoms explicitly present in the system. This
equation is often expressed in terms of the 'collision frequency' $\zeta = \gamma/m$.
The friction coefficient is related to the fluctuations of the random force by the
fluctuation-dissipation theorem:
$$\langle \vec R_i(t)\rangle = 0, \qquad \int_{-\infty}^{\infty} \langle \vec R_i(0)\cdot\vec R_i(t)\rangle\,dt = 6\,k_B T\,\gamma_i.$$
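A Langevin integration can be sketched for a single coordinate. The sketch uses a simple semi-implicit Euler-type discretization, chosen for readability; it is not the integrator used in CHARMM. The random force is drawn as uncorrelated Gaussian noise with variance 2γk_BT/δt per step. This is the per-component discrete form of the fluctuation-dissipation relation above, where 6k_BTγ covers the three components together. The function name and parameters are hypothetical.

```python
import math
import random


def langevin_trajectory(x0, v0, force, mass, gamma, kT, dt, n_steps, seed=None):
    """Integrate m a = F - gamma v + R(t) for one coordinate (Euler-type scheme).

    R(t) is modelled as uncorrelated (white) Gaussian noise with variance
    2 gamma kT / dt per step, i.e. 2 gamma kT per Cartesian component.
    """
    rng = random.Random(seed)
    sigma_r = math.sqrt(2.0 * gamma * kT / dt)   # std. dev. of the random force
    x, v = x0, v0
    xs, vs = [x], [v]
    for _ in range(n_steps):
        r = sigma_r * rng.gauss(0.0, 1.0)
        a = (force(x) - gamma * v + r) / mass    # friction + random kick + systematic force
        v = v + dt * a
        x = x + dt * v                           # uses the updated velocity
        xs.append(x)
        vs.append(v)
    return xs, vs
```

Consider a free particle (F = 0) with m = k_BT = 1. The time average of v² settles near 1, i.e. ½k_BT of kinetic energy per degree of freedom. This shows how the balance between γ and R(t) acts as a thermostat.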
In simulations it is often assumed that the random force is completely uncorrelated at different
times. That is, the above equation takes the form:
$$\langle \vec R_i(t)\cdot\vec R_i(t')\rangle = 6\,k_B T\,\gamma_i\,\delta(t - t')$$
The temperature of the system being simulated is maintained via this relationship between $\vec R(t)$ and $\gamma$.
The jostling of a solute by solvent can expedite barrier crossing, and hence Langevin dynamics can
search conformations better than Newtonian molecular dynamics ($\gamma = 0$).

2.4 Adaptive Molecular Dynamics
2.4.1 Introduction
In the previous simple mechanical model, the only way to reduce the computational cost of a
calculation, knowing that some part of the system is less important in the mechanism under study, is to fix
the corresponding atoms. Thus, we must have some prior knowledge about the function of the system,
and we completely neglect the dynamics of this less important part and its possible interactions with
the reaction site. In other words, there was no method that automatically determines which parts
of the molecule must be precisely simulated, and which parts can be simplified without affecting the
study of the molecular interaction. Researchers in Stéphane Redon's team (Nano-D, at INRIA
Grenoble) have recently introduced adaptive torsion-angle quasi-statics and Adaptive Molecular Dynamics
(AMD), a general technique to rigorously and automatically determine the most important regions
in a simulation of molecules represented as articulated bodies. At each time step, the adaptive
algorithm determines the set of joints that should be simulated in order to best approximate the motion
that would be obtained if all degrees of freedom were simulated, based on the current state of the
simulation and user-defined precision or time constraints. They built on previous research on
adaptive articulated-body simulation [51] and proposed novel data structures and algorithms for adaptive
update of molecular forces and energies.
2.4.2 Divide and Conquer Algorithm
The starting point for the study of the dynamics of an articulated body is a Divide-And-Conquer
Algorithm proposed by Roy Featherstone [52]. Featherstone recursively defines an articulated body
by assembling two (rigid or articulated) bodies together. A complete articulated body is thus
represented by a binary tree: the root node describes the whole articulated body, while each leaf node is
a rigid body with a set of handles, i.e. locations attached to some other rigid bodies. Let C be an
articulated body with m handles; Featherstone defines the articulated-body equation:
$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix} = \begin{pmatrix} \Phi_1 & \Phi_{12} & \cdots & \Phi_{1m} \\ \Phi_{21} & \Phi_2 & \cdots & \Phi_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \Phi_{m1} & \Phi_{m2} & \cdots & \Phi_m \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} \tag{2.7}$$
where $a_i$ is the spatial acceleration of handle $i$, $f_i$ is the spatial force applied to handle $i$, $b_i$ the bias acceleration
of handle $i$, $\Phi_i$ the inverse articulated-body inertia of handle $i$ and $\Phi_{ij}$ the cross-coupling inverse inertia
between handles $i$ and $j$. This equation is the equivalent of the classical equation of motion:
$$a_i = \frac{1}{m_i}\,f_i$$
We thus consider the molecules as articulated bodies: every rigid body is one atom or a group of
atoms. Joints (handles) between rigid bodies are covalent bonds around which rotation is possible.
The dynamics is calculated in the space of the dihedral angles of the system, like all the $\phi$, $\psi$ and $\chi$
angles in proteins. If at a given time parts of the system are considered as rigid, they form a subtree
of the complete assembly tree where forces need not be recalculated. The rest is constituted of active
joints forming an active region (see Fig. 2.6).
Figure 2.6: Active and rigid regions. The figure corresponds to 5 active joints.
The usefulness of the tree representation will now be made clear. The Featherstone algorithm is
accomplished in two stages: (i) a bottom-up stage, from the leaves to the root, where the articulated-body
coefficients $b_i$ and $\Phi_{ij}$ of each composite object C are calculated from the coefficients of its children A and B;
(ii) a top-down stage, from the root to the leaves, which yields the joint accelerations $\ddot q_i$ and forces. The joint
accelerations $\ddot q_i$ are the second derivatives of the motion variables, such as $\ddot\phi_i$ if we are interested in the
movement around dihedral angle $\phi_i$.
Now, a theorem states that the sum of the squares of the accelerations,
$$A = \sum_i \ddot q_i^{\,2},$$
of the joints in one node of the tree can be calculated without knowing the values of the individual
$\ddot q_i$ of its child nodes. This means that the algorithm can decide on its own how to partition the tree into
active and rigid regions, the latter being those where the metric A is the lowest and thus where the
dynamics may be considered as less important for the mechanism under study.
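The idea of expanding only the parts of the tree where the acceleration metric is large can be illustrated with a toy greedy selection. This is a deliberately simplified sketch under our own assumptions, not the published AMD algorithm:
- A is computed here directly from given joint accelerations; the theorem above is precisely what lets AMD avoid this.
- Each internal node holds one joint.
- A fixed budget of active joints replaces the user-defined precision or time constraints.

All names are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Assembly-tree node: a leaf is a rigid body; an internal node joins its two
    children through one joint with acceleration qdd (hypothetical structure)."""
    name: str
    qdd: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def subtree_metric(node: Node) -> float:
    """A = sum of squared joint accelerations in the subtree rooted at node."""
    if node.is_leaf():
        return 0.0
    return node.qdd ** 2 + subtree_metric(node.left) + subtree_metric(node.right)


def select_active_joints(root: Node, budget: int) -> List[str]:
    """Greedily activate up to `budget` joints, always expanding the subtree with
    the largest metric A; subtrees never expanded are treated as rigid."""
    active: List[str] = []
    heap = []
    counter = 0  # tie-breaker so that heapq never compares Node objects
    if not root.is_leaf():
        heap.append((-subtree_metric(root), counter, root))
    while heap and len(active) < budget:
        _, _, node = heapq.heappop(heap)
        active.append(node.name)
        for child in (node.left, node.right):
            if not child.is_leaf():
                counter += 1
                heapq.heappush(heap, (-subtree_metric(child), counter, child))
    return active
```

A subtree is expanded when its total metric is large, even if its own top joint moves little. This mirrors the fact that the decision is made from node-level sums rather than from individual joints.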
I participated in this work, which was published in Bioinformatics in 2007 [53]. I have used AMD
many times during my thesis, both for the interactive visualization of molecules and in attempts to
forces ($f_i$) on the system with the computer mouse and let the system relax to a new equilibrium position.

2.5 Statistical Mechanics
2.5.1 Introduction to Statistical Mechanics
MD simulations provide information at the microscopic level. Statistical mechanics is then required
to convert this microscopic information to macroscopic observables such as pressure, energy, heat
capacities, etc. Statistical mechanics relates these macroscopic observables to the distribution of
molecular positions and motions. Therefore, time-independent statistical averages are introduced.
For a better understanding, some definitions are reviewed here [54]:
2.5.2 Definitions
The mechanical or microscopic state of a system is defined by the atomic positions $x_i$ and the momenta
$p_i = m_i v_i$. They can be considered as the coordinates of a multidimensional space with 6N coordinates, to which
positions and momenta each contribute 3N coordinates. This space is called phase space.
The thermodynamic or macroscopic state of a system is defined by a set of parameters that
completely describes all thermodynamic properties of the system. An example would be the temperature
T, the pressure P, and the number of particles N. All other properties can be derived from the
fundamental thermodynamic equations.
An ensemble is the collection of all possible systems which have different microscopic states but have
the same macroscopic or thermodynamic state. Ensembles can be defined by fixed thermodynamic
properties, as already stated before. Examples of ensembles with different characteristics are: NVE,
NVT, NPT, µVT (E = total energy, P = pressure, V = volume, µ = chemical potential).

2.5.3 Ensemble Averages and Time Averages
In an experiment one examines a macroscopic sample with an enormously high number of atoms or
molecules. So the measured thermodynamic properties reflect an extremely large number
of different conformations of the system, representing a subset of the ensemble. We have to say subset,
because an ensemble is the complete collection of microscopic systems and a macroscopic sample can
only consist of a finite number of systems. A sufficiently big sample, however, can be seen as a good
approximation to an ensemble. That is why statistical mechanics defines averages corresponding to
experimentally measured thermodynamic properties as ensemble averages [54]. The ensemble average
is given by:
$$\langle A\rangle_{ensemble} = \iint d\vec p^{\,N}\,d\vec x^{\,N}\;A\left(\vec p^{\,N}, \vec x^{\,N}\right)\rho\left(\vec p^{\,N}, \vec x^{\,N}\right), \tag{2.8}$$
where $A$ is the observable of interest, stated as a function of the momenta $\vec p_i$ and the positions
$\vec x_i$, and $\langle A\rangle$ is its measured (average) value. The quantity $\rho(\vec p^{\,N}, \vec x^{\,N})$ is the probability density for the ensemble, and the integration
is performed over all momenta and positions of the system, $d\vec p^{\,N}$ and $d\vec x^{\,N}$. So, the ensemble average is
the average value of an observable weighted by its probability. This integral is extremely difficult to