Structure prediction of P1-type ATPases and molecular dynamics simulations on their Metal Binding Domains


HAL Id: tel-00481898

https://tel.archives-ouvertes.fr/tel-00481898

Submitted on 7 May 2010

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Karthik Arumugam

To cite this version:

Karthik Arumugam. Structure prediction of P1-type ATPases and molecular dynamics simulations on their Metal Binding Domains. Modeling and Simulation. Université Joseph-Fourier - Grenoble I, 2009. English. ⟨tel-00481898⟩


École Doctorale Chimie et Sciences du Vivant

THESIS

submitted for the degree of

DOCTOR OF THE UNIVERSITÉ JOSEPH FOURIER

Speciality: Structural Biology and Nanobiology

presented and publicly defended on 12 November 2009

by

Karthik ARUMUGAM

Structure prediction of P1-type ATPases and molecular dynamics simulations on their Metal Binding Domains

JURY

Pr. Arthur SOUCEMARIANADIN, President

Dr. Dorothée BERTHOMIEU, Reviewer

Dr. Norbert GARNIER, Reviewer

Dr. Patrice CATTY, Examiner

Dr. Stéphane REDON, Examiner

Dr. Serge CROUZY, Thesis supervisor

LABORATOIRE DE CHIMIE ET BIOLOGIE DES MÉTAUX UMR 5249

Institut de Recherches en Technologies et Sciences pour le Vivant

Commissariat à l'Énergie Atomique (CEA), Grenoble

NANO-D - INRIA Grenoble - Rhône-Alpes


I would like to thank my advisor, Dr. Serge Crouzy, for his great guidance and support throughout my research. It was a privilege to be a member of his team. I had a wonderful experience working in his lab, where I enjoyed working on scientific projects and learned how to become a scientist. During the years we worked together he was much more than an advisor to me. He taught me a lot about life as well as science. Everything I have learned from him will stay with me throughout my life. This thesis would never have been possible without him.

I would like to thank my co-supervisor Dr. Stéphane Redon, Dr. Patrice Catty, Dr. Elisabeth Mintz, Dr. Florent Guillain and Dr. Michel Ferrand for their valuable advice and support during my PhD thesis.

Many thanks to Dr. Michel Vivaudou and Dr. Christophe Moreau for their useful discussions with me about the Hm2Kir6 project and for allowing me to do research on it.

I gratefully acknowledge the efforts of my dissertation committee members, Dr. Dorothée Berthomieu and Dr. Norbert Garnier. I owe my personal thanks to Prof. Arthur Soucemarianadin for his kind help during the initial period of my thesis.

I would also like to thank all the past and present members of my lab. I am deeply indebted to the people at University Joseph Fourier and at CEA for their help during my stay.

I gratefully acknowledge the financial support that I received from the CEA Direction des Relations Internationales (DRI) via EGIDE, and from the LCBM laboratory.

I would like to express my gratitude towards Prof. R. Nayak and Prof. Shaila from the Indian Institute of Science, India, for their guidance, encouragement and support. I especially thank my Deputy Registrar, Mr. Paneerselvam, for his continuous encouragement and trust in me.

Special thoughts go to my friends who have made my life in Grenoble so enjoyable both inside and outside the lab, including Prem, Sandeep, Tarun, Peng, Mahesh, Cheickna, Marine, Mouna, Pankaj, Abhinav, Robert and Shikha.

Finally, I would like to thank my family. The biggest thanks, which has been saved till last, has to go


1 Introduction
1.1 Historical background
1.2 From sequence to structure to function
1.3 The cell membrane
1.4 Computational biology and Bioinformatics
1.4.1 Membrane proteins
1.4.2 Alpha (α)-helical membrane proteins
1.4.3 Beta (β)-barrel membrane proteins
1.4.4 Monotopic and peripheral membrane proteins
1.5 Metals in Biological Systems
1.5.1 Metal homeostasis and trafficking
1.5.2 The human copper metallochaperone: Atox1 (or Hah1)
1.6 What this Thesis is About
2 Theory and Methods
2.1 Structure prediction using NOE-like restraints with X-PLOR
2.1.1 Introduction
2.1.2 Distance Geometry
2.1.3 NOE-like and dihedral angle restraints
2.1.4 Structure determination
2.2 Homology modelling
2.2.1 Introduction to Homology modelling
2.2.2 Classical steps in Homology modeling
2.2.2.1 Template recognition and initial alignment
2.2.2.2 Alignment correction
2.2.2.3 Backbone generation
2.2.2.4 Loop modeling
2.2.2.5 Side-chain modeling
2.2.2.6 Model optimization
2.2.2.7 Model validation
2.2.3 The program MODELLER
2.3 Classical Mechanics and Dynamics


2.3.2.1 Verlet Algorithm
2.3.2.2 Leap-Frog Algorithm
2.3.2.3 Langevin Dynamics (LD) Simulation
2.4 Adaptive Molecular Dynamics
2.4.2 Divide and Conquer Algorithm
2.5 Statistical Mechanics
2.5.1 Introduction to Statistical Mechanics
2.5.2 Definitions
2.5.3 Ensemble Averages and Time Averages
2.6 Force Fields
2.6.1 General Features of Molecular Mechanics Force Fields
2.6.2 Bonded terms
2.6.2.1 Bond Stretching
2.6.2.2 Angle Bending
2.6.2.3 Torsional Terms
2.6.2.4 Improper Torsions / Out-of-Plane Bending
2.6.3 Non-Bonded terms
2.6.3.1 Electrostatic Interactions
2.6.3.2 Van der Waals Interactions
2.7 The CHARMM program and force field
2.7.1 The CHARMM force field
2.7.2 Data Structures
2.7.2.1 Residue Topology File (RTF)
2.7.2.2 Parameter File (PARAM)
2.7.2.3 Protein Structure File (PSF)
2.7.2.4 Coordinate File (CRD)
2.7.3 Energy Minimization in CHARMM
2.7.3.1 Description of Minimization
2.7.3.2 Minimization methods
2.8 Solvation
2.8.1 Explicit solvation
2.8.1.1 Periodic boundaries
2.8.2 Implicit solvation
2.8.2.1 Accessible surface area based method
2.8.2.2 Poisson-Boltzmann
2.8.2.3 Generalized Born
2.8.2.4 GBSA
2.8.2.5 Ad-hoc fast solvation models
2.8.3 Hybrid implicit/explicit solvation models


2.8.4.2 Viscosity
2.8.4.3 Hydrogen-bonds with water
3 Metal-Binding Membrane Proteins
3.1 Ionic Channels and Transporters
3.1.1 Overview
3.1.2 Special ATP-binding cassette transporters
3.1.3 KATP channels
3.1.4 Biomolecular sensors based on SUR
3.1.5 Model of an ICCR
3.2 Generalities about P-Type ATPases
3.2.2 Catalytic Mechanism of P-type ATPases
3.3 SERCA: A P-type ATPase of known tri-dimensional structure
3.3.1 Structure of SERCA
3.4 P1B-type ATPases
3.4.1 Structural Features of P1B-type ATPases
3.4.2 Physiological Roles of P1B-Type ATPases
3.4.3 Catalytic Mechanism of P1B-type ATPases
3.4.4 Transmembrane Metal Binding Sites and Classification
3.4.5 Cadmium ATPases
3.4.6 Cytoplasmic Metal Binding Domains
3.4.6.1 Structure of cytoplasmic MBDs
3.4.6.2 Regulatory Roles of cytoplasmic MBDs
3.4.7 The ATP Binding (ATP-BD) and Actuator (A) Domains
3.4.8 Human Copper ATPases
3.4.8.1 Overview
3.4.8.2 ATP7A and ATP7B are representatives of the P1B-family of ion-transporting ATPases
3.4.8.3 Functional activity of human Cu-ATPases is coupled to their ability to traffic
3.4.8.4 Domain organization of human Cu-ATPases
3.4.8.5 ATP7A and ATP7B have distinct functional properties
3.4.9 Characteristic transmembrane hairpin TMS1,2 in P1B-ATPases
3.4.10 The diverse roles of the metal-binding sites within the N-terminal domain of Cu-ATPases
4 Structure and dynamics of The TransMembrane region of a Cd²⁺ ATPase
4.1 Introduction
4.2 Secondary structure prediction


4.2.3 By homology with CopA
4.3 3D structure prediction by homology with other ATPases
4.3.1 Homology with Serca
4.3.2 Homology with CopA
4.4 Ab initio Structure prediction
4.4.1 Core TM bundle responsible for metal binding
4.4.2 Topology of CadA TM bundle: program buildtopo
4.4.3 Derivation of model restraints for X-PLOR
4.4.4 Model building
4.4.5 Model refinement with CHARMM
4.5 Model checking using standard methods
4.5.1 Using Procheck
4.5.2 Using stride
4.6 Model checking using energy minimization and MD simulations
4.6.1 Validation using bR and Serca
4.6.1.1 Bacteriorhodopsin (BR)
4.6.1.2 Serca
4.6.2 Results on CadA
4.6.2.1 Energy
4.6.2.2 RMSD
4.6.2.3 TMH tilt angles
4.6.2.4 Cadmium sites
4.6.2.5 Role of K181 (eqv. K671)
4.6.2.6 Role of D202 (eqv. D692)
4.7 3D-models of CadA
4.8 Discussion and perspectives
5 Dynamics and stability of the Metal Binding Domains of the Menkes ATPase
5.1 Introduction
5.2 Models and simulation details
5.2.1 Simulation parameters
5.2.2 Model structures
5.2.3 Simulation details
5.2.4 Restraints
5.2.5 Additional sessions and restraints
5.3 Results
5.3.1 Sequence alignment
5.3.2 Time evolution of energy and box dimensions
5.3.2.1 Energy
5.3.2.2 Box dimensions


5.3.3.2 Following sessions: Apo
5.3.3.3 Following sessions: Holo
5.3.4 Conservation of secondary structure
5.3.4.1 Session 1
5.3.4.2 Following sessions
5.3.5 Root Mean Square Fluctuations
5.3.6 Radial Distribution Function of water around Copper
5.3.7 Average structures
5.4 Summary, discussion and perspectives
6 Conclusion
7 Supplementary Material
7.1 Program for finding topological models of CadA: buildtopo
7.2 First page of output of buildtopo
7.3 All 2D-grid models of CADA


Introduction

1.1 Histori al ba kground

Four billion years ago, the first living cell appeared. The universe had been waiting patiently for roughly ten billion years for this to happen, and ever since that day nothing has been quite the same. The cells replicated and replicated, and finally a group of cells wrote this thesis.

For a cell to be functional and alive, a few basic things are needed: some way of converting and utilizing energy (for instance chemical energy or sunlight), self-replication and self-assembly. The first cells might have used RNA both as the information carrier and as molecular machines. Modern cells use DNA as the information carrier and RNA as an intermediate step towards proteins, although RNA sometimes also functions as a molecular machine.

A less obvious requirement is that molecules need to be located at the right place at the right time. One thing that aids this is compartmentalisation, in which molecules are kept separated by different kinds of barriers. For instance, the DNA molecule of a cell needs to be located inside the cell; if it is outside, it will rapidly be degraded.

My research has focused on the proteins located in the membranes that surround cells and their compartments. These proteins are responsible for transport through the membrane, they are involved in cell-to-cell signalling, and they are crucial for generating useful energy from chemical sources and sunlight. To understand these processes in detail, a 3D structure of the protein is needed. Unfortunately, structure determination of membrane proteins is very time-consuming and difficult, so considerably fewer structures are available than for water-soluble proteins.

There are millions of proteins with known amino acid sequence and tens of thousands of proteins with known structure, but only a few hundred of these structures are of membrane proteins. Therefore, theoretical approaches that give structural insight from the amino acid sequence hold great potential, since computing time is very cheap. However, since computational predictions are just predictions, it


The revolutionary discovery of the molecular structure of DNA by Watson and Crick (1953) made a tremendous impact on science [1]. Never before had it been possible to study and understand the molecular details of how genetic information was stored. It could be seen that the DNA double helix was stabilized by hydrogen bonds between complementary bases: adenine (A) pairs with thymine (T), and guanine (G) with cytosine (C). The problem was that no one knew how the DNA sequences were translated into proteins.

The genetic code was finally broken by the Nirenberg laboratory in 1961-66 [2]. It was discovered that each DNA nucleotide triplet (codon) stands either for one of the 20 different amino acids that are the building blocks of peptides and proteins, or for a stop signal. These events led to the birth of molecular biology.
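Why triplets? Simple counting gives the answer: with four bases, two-nucleotide words give only 4² = 16 combinations, too few for 20 amino acids plus a stop signal, whereas triplets give 4³ = 64. A minimal Python check of this argument:

```python
N_BASES = 4            # A, C, G, T
N_MEANINGS = 20 + 1    # 20 amino acids plus a "stop" signal

# Smallest word length k for which 4**k distinct nucleotide words cover every meaning
codon_length = next(k for k in range(1, 10) if N_BASES ** k >= N_MEANINGS)
print(codon_length, N_BASES ** codon_length)  # 3 64
```

The surplus of 64 codons over 21 meanings is why the code is degenerate, with several codons mapping to the same amino acid.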

There are two main events involved in going from DNA to proteins. The first step is the transcription (copying) of DNA into complementary RNA by the enzyme RNA polymerase. This enzyme is essential for life and present in all organisms and also in some viruses. The second step is the translation of the messenger RNA (mRNA) into the chain of amino acids that forms the protein. This step is performed by the ribosome, which translates the genetic code into an amino acid chain.
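The two steps can be sketched in a few lines, under stated simplifications: the DNA template strand is given 5'→3', the codon table below contains only a handful of entries of the standard genetic code (it is deliberately partial, so unknown codons raise an error), and splicing and other RNA processing are ignored.

```python
# Base-pairing rules used when reading a DNA template strand into RNA
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Deliberately partial codon table (standard genetic code entries only); '*' marks a stop codon
CODON_TABLE = {
    "AUG": "M", "UGG": "W", "UUU": "F", "UUC": "F", "AAA": "K", "AAG": "K",
    "GGC": "G", "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe_from_template(template_5to3):
    """mRNA (5'->3') complementary to a DNA template strand written 5'->3'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(template_5to3))


def translate(mrna):
    """Read codons from the first base until a stop codon; unknown codons raise KeyError."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe_from_template("TTAAAATTTCAT")
print(mrna, translate(mrna))  # AUGAAAUUUUAA MKF
```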

One of the remaining unsolved problems of molecular biology is how the linear amino acid chain folds into a three-dimensional protein. All the information that determines what a protein structure will look like is present in its amino acid sequence [3].

A water-soluble protein folds within fractions of a second, which makes it impossible for it to explore all the different conformations available to its amino acid chain. The Levinthal paradox states that, due to the large number of degrees of freedom in the protein chain, the molecule has an astronomical number of possible conformations, estimated at 10^143 [4]. If the protein had to explore all conformations, it would take longer than the universe has existed. Thus, folding pathways must exist, yet their details remain unclear.
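The Levinthal argument is plain arithmetic. The sketch below assumes, purely for illustration, that the chain could sample 10^13 conformations per second (roughly one per bond-vibration period); the 10^143 figure is the estimate quoted above.

```python
import math

LOG10_CONFORMATIONS = 143     # ~10^143 conformations, the estimate quoted in the text
LOG10_RATE = 13               # assumption: 10^13 conformations sampled per second
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600   # ~4.4e17 s

# Work with exponents (log10) directly: time = conformations / rate
log10_search_time = LOG10_CONFORMATIONS - LOG10_RATE
log10_ratio = log10_search_time - math.log10(AGE_OF_UNIVERSE_S)

print(f"exhaustive search: ~10^{log10_search_time} s")
print(f"= ~10^{log10_ratio:.0f} times the age of the universe")
```

Even with a sampling rate many orders of magnitude faster, the conclusion would not change, which is the point of the paradox.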

The key to a detailed understanding of protein function lies in its structure. A high-resolution structure is needed to be able to study the protein function at a molecular level. The most common method to determine the structure is X-ray crystallography.

1.3 The cell membrane

Membranes are a key element for life, since they act as a physical barrier separating the interior of a cell from the outside world. They also serve to confine its different compartments, providing the means for a cell to stay out of equilibrium with its surroundings and allowing important processes to occur in the interior. In addition to being crucial for cell integrity, cell membranes also serve as a matrix and support for many types of proteins involved in important cell functions and are therefore essential for cell function.

Biological membranes consist of organized assemblies of lipids and proteins. Current knowledge of how they are structured is still scarce, due to the difficulties associated with the experimental techniques required to investigate their properties. The modern view of biological membranes is essentially based on the fluid mosaic model proposed in the 1970s [5], in which lipids are arranged in bilayers where proteins are embedded and diffuse freely in the lateral direction. At that time, however, the


membrane proteins, the accepted view has become more protein-centred and the crucial role of lipids has been diluted. Only more recently, with a better understanding of lipid-protein interactions, has the role of lipids in protein function again been receiving increasing attention [6,7]. The current view of membranes is more of a mosaic than a fluid: lipids organize a matrix in which proteins are distributed in regions of biased composition with varying protein environments [8].

Figure 1.1: The cell membrane

Membranes consist of a complex mixture of lipids in which different proteins, both integral and peripheral, are embedded (see Fig. 1.1). Protein content varies greatly among the different kinds of membranes, typically ranging between 15 and 75%, depending on the functions that they must carry out [9]. Furthermore, lipid composition changes from one membrane to another, owing to an enormous structural diversity that can be associated with the differential roles and properties of each membrane or region. The most widely found lipids consist of a fatty acid linked by an ester bond to an alcohol such as glycerol or cholesterol, or through an amide bond to a sphingoid base or to other amines. Most lipids have a highly polar head group and two hydrocarbon tails.

In a typical membrane, approximately half of the lipids are phospholipids, mainly phosphatidylcholines (PC), -ethanolamines (PE) and -serines (PS). Other major components, in order of importance, are sphingolipids, glycolipids and cholesterol [10]. Interestingly, since the two sides of the membrane bilayer face different surroundings, the two leaflets of a membrane typically exhibit an asymmetric composition. The lateral distribution of lipids is also nonhomogeneous with regard to lipid components, providing regions with different levels of molecular order that are known as rafts. These seem to be important for the function of membrane proteins [11]. Finally, the presence


1.4 Computational biology and Bioinformatics

The amount of available genetic information continues to increase at a steady pace. Ever since GenBank, a sequence database, started in 1982, it has doubled in size every 18 months. The latest release (v160) contains 77,248,690,945 bases. To put this enormous number into perspective, 42,880 bases were sequenced each second from April to June 2007. DNA sequencing technology also keeps getting faster; for example, a 454 sequencer can sequence 25 million bases with 99% accuracy in 4 hours, which makes it possible to sequence a bacterial genome in the course of a day [13].
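These rates can be turned into quick estimates. In the sketch below, the 18-month doubling time and the 454 throughput come from the text, while the 4.6 Mb genome (about the size of the E. coli genome) and the 20-fold coverage target are illustrative assumptions.

```python
DOUBLING_TIME_MONTHS = 18     # GenBank doubling time quoted in the text


def growth_factor(months, doubling_time=DOUBLING_TIME_MONTHS):
    """Size multiplier after `months` of exponential growth with a constant doubling time."""
    return 2.0 ** (months / doubling_time)


# 454 throughput quoted in the text: 25 million bases in 4 hours
BASES_PER_HOUR = 25e6 / 4
GENOME_SIZE = 4.6e6           # assumption: an E. coli-sized bacterial genome (bases)
COVERAGE = 20                 # assumption: target average sequencing depth

hours_needed = GENOME_SIZE * COVERAGE / BASES_PER_HOUR

print(f"GenBank growth over 10 years: x{growth_factor(120):.0f}")
print(f"454 run time for {COVERAGE}x coverage: {hours_needed:.1f} h")
```

An 18-month doubling time thus means roughly a hundredfold growth per decade, and even at twenty-fold coverage a bacterial genome fits comfortably within a day of sequencing.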

There is clearly a need for methods that can make sense of this vast amount of genetic information, and this is where bioinformatics and computational biology step in. The difference between the two terms is that computational biology refers to hypothesis-driven investigations of biological problems using computers, while bioinformatics is more technique-driven and concerns the development of algorithms and of computational and statistical techniques. The two terms are often used interchangeably.

The unifying topics of this thesis are membrane proteins and computational biophysics. I will continue with a brief overview of membrane proteins and then focus on metal-transporting proteins.

1.4.1 Membrane proteins:

Membrane proteins embedded in the bilayer carry out many diverse functions. They are responsible for selecting which substances are allowed to pass through the membrane, and they are important for transmitting signals, for cell-to-cell communication and for numerous other functions. More than half of all drugs target membrane proteins [14,15].

The proportion of membrane proteins in the lipid bilayer varies from organism to organism and from cell type to cell type. For instance, membrane proteins make up 75% of the mass of the membrane in E. coli, but only 30% in human myelin.

There are three main classes of membrane proteins:

1.4.2 Alpha (α)-Helical membrane proteins:

α-helical membrane proteins have their amino acids arranged in tightly packed helices in the transmembrane regions (Fig. 1.2A). The α-helical arrangement exposes as large a hydrophobic outer surface as possible to interact with the hydrophobic fatty acid chains of the lipids, while at the same time satisfying all of its internal backbone hydrogen bonds.
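Because transmembrane helices must present a hydrophobic surface to the lipids, they can often be spotted directly in the sequence with a hydropathy plot. The sketch below uses the Kyte-Doolittle scale with the 19-residue window and 1.6 threshold suggested by Kyte and Doolittle (1982); it is a generic illustration, not the secondary structure prediction method used later in this thesis, and the reported segment edges are only approximate because they cover whole windows.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982)
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i is the window starting at residue i."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping above-threshold windows into (start, end) ranges, 0-based and end-exclusive."""
    segments = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h <= threshold:
            continue
        if segments and i <= segments[-1][1]:
            segments[-1][1] = i + window       # window overlaps the current segment: extend it
        else:
            segments.append([i, i + window])   # start a new candidate segment
    return [tuple(s) for s in segments]


# Synthetic test sequence: a 22-residue leucine stretch flanked by lysines
print(candidate_tm_segments("K" * 10 + "L" * 22 + "K" * 10))
```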

α-helical membrane proteins typically make up 20-30% of the encoded proteins of an organism [16]. Most often they are located in the inner membrane, but recently one example, a polysaccharide


Figure 1.2: A. An α-helical membrane protein with 7 transmembrane helices, like bacteriorhodopsin studied in section 4.6.1.1. B. A β-barrel membrane protein with 8 transmembrane β-strands. C. A monotopic membrane protein.

1.4.3 Beta (β)-barrel membrane proteins:

The other way of spanning the membrane while satisfying internal backbone hydrogen bonds and maintaining a large hydrophobic surface area is the β-barrel fold (Fig. 1.2B). Since a β-strand always hydrogen bonds to another β-strand, these proteins cross the membrane as a series of β-hairpins. Thus, this class generally has an even number of β-strands. Every second residue in the transmembrane region faces the inside of the barrel, whereas the other faces the outside, towards the lipids. This alternating pattern is reflected in the amino acid sequence, where one amino acid is non-polar and hydrophobic (and directed towards the lipids) and the next one more polar (and directed towards the interior of the barrel), and so on. Since a β-strand is extended compared to the coiled α-helix, the transmembrane regions are shorter and typically consist of 8 to 15 amino acids [18]. β-barrel membrane proteins are found in the outer membranes of bacteria, mitochondria and chloroplasts.
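The statement that β-strands need fewer residues than α-helices to cross the membrane follows from their rise per residue. A rough sketch, assuming a hydrophobic core of about 30 Å, a rise of 1.5 Å per residue along an α-helix and about 3.3 Å along an extended strand; the 45° strand tilt is an illustrative value for barrel strands, not a measured one.

```python
import math

HYDROPHOBIC_CORE_A = 30.0      # assumption: approximate thickness of the bilayer's hydrophobic core (Å)
RISE_PER_RESIDUE_A = {
    "alpha-helix": 1.5,        # rise per residue along the helix axis (Å)
    "beta-strand": 3.3,        # approximate rise per residue along an extended strand (Å)
}


def residues_to_span(element, thickness=HYDROPHOBIC_CORE_A, tilt_deg=0.0):
    """Residues a straight element, tilted tilt_deg from the membrane normal, needs to cross `thickness`."""
    rise_along_normal = RISE_PER_RESIDUE_A[element] * math.cos(math.radians(tilt_deg))
    return math.ceil(thickness / rise_along_normal)


print("helix, perpendicular:", residues_to_span("alpha-helix"))
print("strand, perpendicular:", residues_to_span("beta-strand"))
print("strand, tilted 45 deg:", residues_to_span("beta-strand", tilt_deg=45.0))
```

Both strand estimates fall inside the 8 to 15 residue range quoted above, while a helix needs about twice as many residues.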

Around 2-3% of a genome encodes β-barrel membrane proteins [19], but this number is uncertain given the difficulty of identifying them. The main reason is that β-strands are more difficult to predict than α-helices, since they consist of fewer residues and involve more long-range interactions. There are also fewer known structures of β-barrel membrane proteins than of α-helical ones.

1.4.4 Monotopic and peripheral membrane proteins

Monotopic membrane proteins are associated with only one of the two leaflets of the membrane. Prostaglandin H2 synthase-1 is one example and is shown in Fig. 1.2C [20]. The helices associated with the membrane are amphiphilic [21], and the protein can only be removed from the membrane by detergents, organic solvents or denaturants that interfere with the hydrophobic interactions.


ele -domains of other integral membrane proteins. Peripheral membrane proteins can be dissociated from the membrane by treatment with solutions of high ionic strength or elevated pH.

1.5 Metals in Biological Systems

1.5.1 Metal homeostasis and trafficking

Organisms require essential heavy metals, including Cu, Zn, Mn, Fe, Co, Ni and Mo, to carry out biological functions. Heavy metals act as cofactors in enzyme reactions including group transfer, redox and hydrolysis [22]. Other metals such as Na, K and Ca are involved in physiological processes and/or in maintaining structure, as well as in controlling the function of cell walls. It has been estimated that over a quarter of all known enzymes require a particular metal ion for function. These enzymes can be divided into two groups: metal-activated enzymes and metalloenzymes. Metal-activated enzymes require the addition of metal ions to be stimulated [23]. Kinases are an example of this type of enzyme, because they use the Mg-ATP complex as a phosphoryl-group donor. Metalloenzymes need metal ions to function properly: the metal ion is firmly bound to the enzyme and is frequently recycled after protein degradation. Heme groups in hemoglobin or cytochromes tightly bind an Fe²⁺ ion. The chlorophyll that sustains most ecosystems also needs an Mg²⁺ ion. The Cu-Zn superoxide dismutases (SOD) require metal ions to rid the cell of dangerous free radicals. These are just some of the many examples of the diverse roles that metal ions play. Metal ions are essential for life, yet they are toxic to cells at high concentrations or in the free form [22,24]. They can generate free radicals, which are highly reactive, oxidizing molecules that can react with any biomolecule, such as nucleic acids. This interaction with biomolecules may eventually lead to cell death. Cells have developed highly complex systems for sensing, trafficking and transporting metals. Many of these systems are not yet well understood [25,26].

In biological systems, these metals are mostly bound to proteins. In these metalloproteins, they have catalytic and structural roles: 1) as constituents of enzyme active sites; 2) stabilizing enzyme tertiary or quaternary structure; 3) forming weak bonds with substrates, orienting them to support chemical reactions; and 4) stabilizing charged transition states [27]. In their oxidized states, Cu, Fe and Mn have unpaired electrons that allow their participation in redox reactions in enzyme active sites [27]. For instance, Cu mediates the reduction of one superoxide anion to hydrogen peroxide and the oxidation of a second superoxide anion to molecular oxygen in the active site of cytoplasmic superoxide dismutase [28]. Zn has no unpaired electrons in the Zn²⁺ state, and it has been proposed to prevent the formation of harmful free radicals by competing with redox-active metals such as Fe and Cu for enzyme active sites [29]. Zn²⁺ is also a cofactor of a number of enzymes, including RNA polymerase, carbonic anhydrase and Cu/Zn superoxide dismutase [27]. Other heavy metals, including Cd, Pb, Cr, Hg and As, have no known physiological activity and are non-essential [30]. Elevated levels of both essential and non-essential heavy metals result in toxicity symptoms, mostly associated with the formation of reactive oxygen species (ROS) that


homeostatic mechanisms to maintain physiological concentrations of essential heavy metals in different cellular compartments and to minimize the damage from exposure to non-essential ones.
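The superoxide dismutase reaction mentioned above, in which copper alternately oxidizes and reduces superoxide, can be written as two half-reactions (standard textbook scheme; the structural Zn²⁺ ion is omitted):

```latex
\begin{align*}
\mathrm{Cu^{2+}\text{-}SOD} + \mathrm{O_2^{\bullet-}} &\rightarrow \mathrm{Cu^{+}\text{-}SOD} + \mathrm{O_2} \\
\mathrm{Cu^{+}\text{-}SOD} + \mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^{+}} &\rightarrow \mathrm{Cu^{2+}\text{-}SOD} + \mathrm{H_2O_2} \\
\text{net:}\quad 2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^{+}} &\rightarrow \mathrm{O_2} + \mathrm{H_2O_2}
\end{align*}
```

The copper ion is regenerated at the end of each cycle, so it acts catalytically while one superoxide is oxidized and the other reduced.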

The main mechanisms of heavy metal homeostasis include transport, chelation, and detoxification by efflux or sequestration into organelles. Heavy metals are transported into cells by various transmembrane metal carriers. It has been shown that, although total cellular Zn²⁺ and Cu²⁺ concentrations are in the millimolar and micromolar range respectively, the cytosolic free Zn²⁺ concentration is in the femtomolar range, while free Cu⁺ is in the zeptomolar range, i.e. less than one free atom per cell [25,32]. This indicates that heavy metals are immediately complexed with molecules or peptides upon entry into the cell. Chelators buffer cytosolic metal concentrations; they include molecules such as phosphates, phytates, polyphenols and glutathione, small peptides such as phytochelatins, and some proteins like metallothioneins [27,33]. Some of these chelators are thought to be involved in metal transport into subcellular organelles. For instance, it has been shown that the cadmium ATPase from Listeria monocytogenes, the subject of chapter 4 of this work, transports Cd²⁺ complexes into the plasma. Chaperones are proteins that bind specific essential heavy metals and deliver them to particular target metalloproteins, where the metals function as part of the enzymatic activity. Similarly, they traffic the metal to specific membrane transporters that efflux it to the extracellular space and to the lumen of subcellular organelles [34]. Fig. 1.3 summarizes the interplay of Cu⁺ with different metallochaperones in yeast.
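The statement that a zeptomolar free concentration means less than one free ion per cell is easy to check: the expected number of ions is concentration × volume × Avogadro's number. The cell volume of about 1 fL used below is an assumption of the order of a bacterial cell, not a value taken from [25, 32].

```python
AVOGADRO = 6.02214076e23      # mol^-1
CELL_VOLUME_L = 1e-15         # assumption: ~1 femtolitre, the order of a bacterial cell


def ions_per_cell(conc_molar, volume_l=CELL_VOLUME_L):
    """Expected number of free ions in one cell at a given free concentration (mol/L)."""
    return conc_molar * volume_l * AVOGADRO


for label, conc in [("free Zn2+, femtomolar", 1e-15), ("free Cu+, zeptomolar", 1e-21)]:
    print(f"{label}: {ions_per_cell(conc):.1e} ions per cell")

# Free concentration that corresponds to exactly one ion in the cell
one_ion_molar = 1.0 / (AVOGADRO * CELL_VOLUME_L)
print(f"one free ion per cell = {one_ion_molar:.1e} M")
```

A single free ion in a femtolitre cell already corresponds to about 1.7 nM, so femtomolar and zeptomolar values describe a vanishingly small probability of finding any free ion at a given moment.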

1.5.2 The human copper metallochaperone: Atox1 (or Hah1)

The redox potential of copper makes it a useful cofactor for many enzymes, but it also requires copper to be sequestered at all times to avoid oxidative damage. In yeast, it has been demonstrated that almost no copper exists free in solution; instead, all copper is bound to proteins or to small molecules such as glutathione [35]. When copper enters the cell through the transporter Ctr1, it is thought to pass immediately to another molecule with the help of a copper chaperone (see Fig. 1.3). In humans, the cytosolic protein Atox1 (HAH1) has been shown to play a key role in the delivery of copper to specialized membrane proteins called Cu-ATPases (see section 3.4 for a complete description of these proteins). Deletion of the Atox1 gene in mice leads both to intracellular copper accumulation and to a decrease in the activity of secreted copper-dependent enzymes [36], indicative of diminished Cu-ATPase transport activity. In yeast, the Atox1 ortholog Atx1 has been shown to facilitate the function of Ccc2, the P-type ATPase that transports copper into the late Golgi compartment [37]. The evolutionary conservation of the functional interactions between Atox1 and Cu-ATPases emphasizes the role of the chaperone in regulating Cu-ATPase function.

Structurally, Atox1 is a 68-residue protein with the βαββαβ fold and a single MxCxxC copper-binding motif, features it shares with the N-terminal Metal Binding Domains (MBDs) of Cu-ATPases [38]. Like the N-terminal MBDs, Atox1 binds copper with a linear, two-coordinate geometry, but other data, including work done in the laboratory [39], are highly suggestive of a transfer mechanism in which copper moves from Atox1 to a metal-binding domain of the Cu-ATPase through a


transported into yeast cells with high affinity, following reduction from Cu²⁺ to Cu⁺ by the Fre1 and Fre2 plasma membrane Cu(II)/Fe(III) reductases. Then, the high-affinity Cu transporters Ctr1 and Ctr3 mediate the passage of Cu across the plasma membrane. Once inside the cell, CCS ensures the distribution of copper to the Cu/Zn superoxide dismutase (SOD), Cox17 is a key element for the incorporation of copper into mitochondria, and Atx1 supplies copper to the P-type ATPase Ccc2 in the Golgi apparatus. Once Cu⁺ reaches Ccc2, it is transported to the lumen of the Golgi/endosome compartment. There, four Cu atoms are assembled with Fet3. The Fet3-Ftr1 high-affinity Fe-transport complex assembles at the plasma membrane. Ctr2 is a vacuolar copper conveyor making it possible


may look [38]. In solution, the intermediate is transient and cannot be visualized [40].
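Because the MxCxxC motif shared by Atox1 and the MBDs is short and well defined, candidate copper-binding sites can be located in a sequence with a regular expression. A minimal sketch; the test fragment is hypothetical, and a motif match alone does not prove metal binding.

```python
import re

# MxCxxC: Met, any residue, Cys, any two residues, Cys.
# The zero-width lookahead lets overlapping occurrences be reported too.
MXCXXC = re.compile(r"(?=(M.C..C))")


def find_mxcxxc(seq):
    """Return (0-based start, matched motif) for every MxCxxC occurrence in a protein sequence."""
    return [(m.start(), m.group(1)) for m in MXCXXC.finditer(seq.upper())]


# Hypothetical fragment, used only to illustrate the search
print(find_mxcxxc("GHMTCGGCVNSIEG"))
```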

1.6 What this Thesis is About

Chemistry and most other experimental sciences usually rely on a top-down approach. That is, measurements are gradually refined to observe smaller structures and faster processes until technical limits are reached. If we had access to very, very powerful computers, one could try to reverse this approach and do bottom-up modelling instead. This is the basic idea behind the approach I have used for the research summarized in this thesis: starting from simple (almost trivial) pairwise interactions of atoms, computers can be used to simulate what happens in complex biological molecules on longer timescales. In this way it is actually possible to see atomic motions at a level usually not accessible to experiments. The knowledge gained can then be used to return to the drawing board and formulate better models for the observed phenomena, in order to understand and perhaps even manipulate the systems, e.g. to transport drug molecules or metals through cellular membranes. Further, when the simulations reach time and length scales where experiments are also possible (and agree with them), the chemistry and physics of the molecules can be traced all the way from individual atoms up to real-world macroscopic systems. (Of course, that is the nice, ideal picture; in practice it means hours, days and months of programming, debugging, running dynamics and waiting for those long runs to finish. But at least you never have to do any wet laboratory work!)
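The bottom-up idea, pairwise atomic interactions integrated in time, can be illustrated on the smallest possible system. The sketch below integrates two Lennard-Jones atoms on a line with the velocity Verlet scheme in reduced units (ε = σ = m = 1); all parameters are illustrative assumptions, and the simulations in this work were run with CHARMM, not with code like this.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy U(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dU/dr; positive values push the two atoms apart."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def simulate_dimer(r0=1.5, steps=2000, dt=0.001, mass=1.0):
    """Velocity-Verlet run of two atoms on the x axis, starting at rest; returns total energy per step."""
    x1, x2 = 0.0, r0
    v1 = v2 = 0.0

    def forces(a, b):
        f = lj_force(b - a)
        return -f, f  # equal and opposite forces on atoms 1 and 2

    f1, f2 = forces(x1, x2)
    energies = []
    for _ in range(steps):
        v1 += 0.5 * dt * f1 / mass          # first half-step velocity update
        v2 += 0.5 * dt * f2 / mass
        x1 += dt * v1                        # full-step position update
        x2 += dt * v2
        f1, f2 = forces(x1, x2)              # forces at the new positions
        v1 += 0.5 * dt * f1 / mass          # second half-step velocity update
        v2 += 0.5 * dt * f2 / mass
        kinetic = 0.5 * mass * (v1 * v1 + v2 * v2)
        energies.append(kinetic + lj_energy(x2 - x1))
    return energies


energies = simulate_dimer()
print(f"total energy fluctuation over the run: {max(energies) - min(energies):.2e}")
```

Because velocity Verlet is time-reversible and symplectic, the total energy fluctuates slightly but does not drift systematically for small time steps, which is why it (and the equivalent leap-frog form) is widely used in MD codes.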

More precisely, and as the title of this thesis indicates, I have been interested in the structure prediction of heavy metal ATPases and in the dynamics of their Metal Binding Domains using an in silico approach. After this introduction on membrane proteins and metals in biological systems, the thesis is divided into 4 main chapters:

- Theory and Methods: I will present computational techniques used to predict the three-dimensional structure of proteins when it is unknown experimentally. Then, I will present molecular mechanics force fields and molecular dynamics simulation algorithms, including adaptive molecular dynamics, that I have used to study the function of these proteins.

- Metal-Binding Membrane Proteins: In this chapter, I will detail the current knowledge on heavy-metal-transporting ATPases, focusing on Cd²⁺ and Cu⁺ ATPases. I will also show how we could model the structure of an ion channel coupled to a 7-transmembrane-helix receptor, opening the way for the study of signal transduction in this system.

- Structure and dynamics of the TransMembrane region of a Cd²⁺ ATPase: I have applied computational techniques to produce several possible models of the TM region of a Cd²⁺ ATPase that is also studied experimentally in the laboratory.

- Dynamics and stability of the Metal Binding Domains of the Menkes ATPase: In this chapter, I have used the known 3D structure of the N-terminal soluble part of a Cu⁺ ATPase as a starting point for molecular dynamics simulations of the metal binding domains of this protein in the


function of two particular heavy metal ATPases. When the structure of these ATPases was unknown, structure prediction methods were used. Molecular dynamics simulations were then used to approach the function of these proteins. The final goal is to understand how the metal is passed from the soluble chaperones, which bind the metal as soon as it enters the cell, to their partner ATPases, whose role is to distribute this metal where it is needed or to eliminate it. My work may also be used to predict the effect of, or to propose, amino acid mutations that impair the normal behavior of these proteins and


Theory and Methods

When attempting to study biological and chemical mechanisms in proteins using computational techniques, the first requirement is the structure of the macromolecule involved. When this structure is not available from databases, we can either try to predict it ab initio, that is without any knowledge about other existing structures, or use homology modeling when the structure of a similar protein is known.

I will start by describing a new method that we have developed to build structural models of the transmembrane part of proteins when no homology is present. Then I will present homology modelling and the program MODELLER, which we have used to build models of an ABC transporter. Finally, the structure of the protein being known, I will present the techniques of molecular mechanics and dynamics and the program CHARMM, which I have used to study the dynamics of ATPases.

2.1 Structure prediction using NOE-like restraints with X-PLOR

2.1.1 Introduction

X-PLOR [41] is a program used to determine and refine solution NMR structures based on interproton distance estimates (from Nuclear Overhauser Effects, or NOE), coupling constant measurements, and other information, such as known hydrogen-bonding patterns. Which such measures can we theoretically derive or predict when building models of the transmembrane (TM) part of an α-helical membrane protein?

- Each TM helix can be considered as a regular α-helix. We then know that, because of the backbone hydrogen bond between residues i and i + 4, the distance between the carbonyl O atom of residue i and the amide H atom of residue i + 4 in the helix is close to 1.5 Å.

- We also know that in such a regular helix, the φ and ψ dihedral angles have values close to -60 and -50 degrees, respectively.

- TM helices form a bundle in which the center of one helix is close to the centers of 3 or 4 neighbouring helices; the distance between centers lies around 10 Å.

- In metal transporting TM proteins, residues in 3 or 4 TM helices are known to bind the transported metal; this knowledge allows us to set up some new distance restraints between metal


eral alternative methods possible in X-PLOR: full-structure distance geometry, substructure distance geometry, and ab initio simulated annealing (SA) starting from template structures or random coordinates. The choice of protocol depends on the desired efficiency and on the sampling of conformational space. In our case of theoretical distance restraints, substructure embedding and regularization followed by SA refinement have been used. Simulated annealing is so powerful that it can convert a random array of atoms into a well-defined structure through distance restraints. Structures obtained by this protocol have to be regularized and refined.

Figure 2.1: Overview of structure calculations

2.1.2 Distance Geometry

Many types of structural information (distances, J-coupling data, chemical cross-linking, neutron scattering, predicted secondary structures, etc.) can be conveniently expressed as intra- or intermolecular distances. In our case, we define NOE-like distances, which are distance restraints imposed on our models and similar to NOE results. In the absence of J-coupling data, we define dihedral angle restraints. The distance geometry formalism allows these distance restraints to be assembled and three-dimensional structures consistent with them to be calculated. The distance geometry routines in X-PLOR begin by translating the bond lengths, bond angles, dihedral angles, improper angles, and van der Waals radii in the current molecular structure into a (sparse) matrix of distance bounds between the bonded atoms, atoms that are bonded to a common third atom, or atoms that are connected


Experimental constraints can be added by the NOE assign statements and the restraints dihedral statements. These lists of restraints are automatically read, translated into distance constraints, and entered into the bounds matrix. They have the following forms:

NOE-like distance restraints:

The total distance restraint energy $E_{NOE}$ is a sum over all distance restraints:

$$E_{NOE} = \sum_{\text{restraints}} e_{NOE}$$

where:

$$e_{NOE} = \min(1000, S)\,\Delta^2 \qquad (2.1)$$

and $\Delta$ is defined as

$$\Delta = \begin{cases} R - (d + d^{plus}) & \text{if } d + d^{plus} < R \\ 0 & \text{if } d - d^{minus} < R < d + d^{plus} \\ d - d^{minus} - R & \text{if } R < d - d^{minus} \end{cases}$$

$S$ is the scale factor, $d$ is the average target distance, $d^{plus}$ and $d^{minus}$ are the upper and lower ranges, and $R$ is the distance between the two atoms. This defines a flat-bottomed harmonic function, with a value of 0 for $R$ larger than $d - d^{minus}$ and smaller than $d + d^{plus}$, and growing quadratically with the violation outside this range.
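To make Eq. (2.1) concrete, the sketch below evaluates the flat-bottomed NOE-like energy for a few restraints in Python with NumPy. It illustrates the formula only and is not X-PLOR's own code; the function name `noe_energy` and the default scale factor `S = 50` are assumptions chosen for the example.

```python
import numpy as np

def noe_energy(r, d, d_plus, d_minus, scale=50.0):
    """Total NOE-like restraint energy of Eq. (2.1).

    r, d, d_plus and d_minus are scalars or arrays (one entry per restraint).
    The default scale factor S = 50 is a hypothetical choice.
    """
    r = np.asarray(r, dtype=float)
    upper = np.asarray(d, dtype=float) + d_plus
    lower = np.asarray(d, dtype=float) - d_minus
    # Delta: violation above the upper bound or below the lower bound, else 0.
    delta = np.where(r > upper, r - upper, np.where(r < lower, lower - r, 0.0))
    return float(np.sum(min(1000.0, scale) * delta ** 2))

# Three restraints with target 1.5 A and allowed interval [1.0, 2.0] A.
print(noe_energy([1.5, 2.5, 0.5], 1.5, 0.5, 0.5))  # prints 25.0
```

Vectorising over restraints keeps the evaluation cheap even for many theoretical restraints, and the `min(1000, S)` cap is applied exactly as written in Eq. (2.1).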

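Dihedral Angle Restraints:

The functional form of the effective dihedral restraint energy $E_{CDIH}$ is given by

$$E_{CDIH} = \sum \mathrm{well}\left(\mathrm{modulo}_{2\pi}(\phi - \phi_0), \Delta\phi\right)^2 \qquad (2.2)$$

where the sum extends over all restrained dihedral angles $\phi$ (with target value $\phi_0$ and allowed range $\Delta\phi$) and the square-well potential $\mathrm{well}(a, b)$ is given by

$$\mathrm{well}(a, b) = \begin{cases} a - b & \text{if } a > b \\ 0 & \text{if } -b < a < b \\ a + b & \text{if } a < -b \end{cases}$$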
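The only subtle point in Eq. (2.2) is the modulo-2π reduction, which makes an angle just below 180° count as close to one just above -180°. A minimal Python sketch, assuming angles in radians and a hypothetical force constant `k` (Eq. (2.2) as written has none, i.e. k = 1):

```python
import math

def well(a, b):
    """Square-well offset of Eq. (2.2): zero for -b < a < b, linear outside."""
    if a > b:
        return a - b
    if a < -b:
        return a + b
    return 0.0

def wrap(angle):
    """Reduce an angle difference (radians) modulo 2*pi onto [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def cdih_energy(phis, phi0s, dphis, k=1.0):
    """Dihedral restraint energy of Eq. (2.2), summed over all restraints.

    k is a hypothetical force constant (k = 1 reproduces Eq. (2.2)).
    """
    return sum(k * well(wrap(p - p0), dp) ** 2
               for p, p0, dp in zip(phis, phi0s, dphis))

# Target -170 deg with range 10 deg: phi = 170 deg is only 20 deg away
# once the 2*pi periodicity is taken into account, i.e. 10 deg of violation.
e = cdih_energy([math.radians(170)], [math.radians(-170)], [math.radians(10)])
print(math.degrees(math.sqrt(e)))  # about 10
```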

Plane Restraints:

Planar restraints have also been added in some simulations to maintain atoms in a common plane, mimicking the phospholipid membrane. Restraints on an individual atom are based on its distance from a plane defined by the normal vector $\vec z$. An atom for which plane restraints are defined experiences restraints only in the direction of $\vec z$. Plane restraints are defined from the vector difference between the present ($\vec r_i$) and reference ($\vec r_i^{\,ref}$) coordinates by:

$$E_{Plane} = \sum_i \left[\frac{\vec z}{|\vec z|} \cdot \left(\vec r_i - \vec r_i^{\,ref}\right)\right]^2 \qquad (2.3)$$
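Eq. (2.3) projects each displacement onto the unit normal before squaring it, so motion within the plane costs nothing. A short NumPy sketch; the name `plane_energy` and the constant `k` are illustrative assumptions, not X-PLOR parameters:

```python
import numpy as np

def plane_energy(coords, ref_coords, normal, k=1.0):
    """Planar restraint energy of Eq. (2.3).

    Only the displacement along the unit normal z/|z| is penalised, so atoms
    can still slide freely within the plane. k is a hypothetical constant.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    disp = np.asarray(coords, dtype=float) - np.asarray(ref_coords, dtype=float)
    return float(k * np.sum((disp @ n) ** 2))

# Membrane normal along z: only the z components of the displacements count.
ref = [[0.0, 0.0, 1.0], [5.0, 5.0, 0.0]]
cur = [[1.0, 2.0, 3.0], [0.0, 0.0, -1.0]]
print(plane_energy(cur, ref, [0.0, 0.0, 2.0]))  # (3-1)^2 + (-1-0)^2 = 5.0
```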


Structure determination of extended polypeptides or DNA/RNA double strands is usually underdetermined. In particular, the overall shape or bend of the molecule is a free parameter. Thus, neither distance geometry nor ab initio simulated annealing will produce unique structures. The problem can be avoided by including additional restraints.

The first step of structure determination using X-PLOR involves providing the program with the information it needs about the molecular structure, the NOE-like distance bounds and the dihedral angle restraints. The molecular information of the macromolecule has to be generated using the all-hydrogen force field files "topallhdg.pro" and "parallhdg.pro" for proteins.

Template Structure

The next step involves the generation of a template coordinate set. The template coordinate set can be any conformation of the macromolecule with good local geometry and no nonbonded contacts. It can be generated by using most molecular modeling graphics programs or, preferably, by using the X-PLOR protocol described below. The purpose of the template coordinate set is to provide distance geometry with information about the local geometry of the macromolecule.

The protocol we used initially places the atoms of the macromolecule along the x-axis, with y and z set to random numbers. The coordinates are then regularized using simulated annealing (see Fig. 2.2).

Generally, when too many covalent links are present, the structure may get entangled in a knot, which results in poor local geometry. Some experimentation may therefore be required to find out whether certain covalent links have to be removed; the goal is to obtain an energy below 1000 kcal/mol for the final step of minimization. We have taken advantage of the adaptive dynamics program (AMD, see section 2.4), which allows us to easily manipulate structures and disentangle the knots, setting the van der Waals force to zero in the process.

Substructure embedding

A family of embedded substructures is produced using distance geometry. The substructures are regularized after embedding by minimization against the DG energy term. (This Distance Geometry term is yet another harmonic potential, meant to maintain a given distance between a lower and an upper bound.) Covalent links should be treated in the same fashion as in the prior template generation. The removed covalent links are reintroduced as distance restraints.

SA-Regularization of DG-Structures

Embedded distance geometry (substructure) coordinates require extensive regularization. The next protocol uses template fitting followed by simulated annealing to regularize the coordinates. The protocol is close in spirit to the one published by Nilges et al. [43]. The starting coordinates have to be defined for at least three atoms in each residue. Covalent links are now treated as real bonds in this protocol, in contrast to template generation.

Simulated Annealing Refinement

Structures are finally refined using a protocol of the slow-cooling type, reminiscent of the protocol used in crystallographic refinement, with softening of the van der Waals repulsions. This enables atoms to move through each other.
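The template-generation and slow-cooling ideas can be illustrated on a toy problem. The sketch below is a stand-in built on stated assumptions rather than the X-PLOR protocol: it anneals with Metropolis Monte Carlo moves instead of molecular dynamics, uses only a flat-bottomed distance-restraint energy (no bonded or van der Waals terms), and all names, schedules and parameters are hypothetical. Atoms start along the x-axis with random y and z, as described above.

```python
import math
import numpy as np

def restraint_energy(x, pairs, lower, upper):
    """Sum of squared violations of flat-bottomed distance bounds."""
    r = np.linalg.norm(x[pairs[:, 0]] - x[pairs[:, 1]], axis=1)
    viol = np.maximum(r - upper, 0.0) + np.maximum(lower - r, 0.0)
    return float(np.sum(viol ** 2))

def anneal(n_atoms, pairs, lower, upper, n_steps=5000,
           t_start=1.0, t_end=1e-4, step=0.2, seed=0):
    """Metropolis Monte Carlo simulated annealing on the restraint energy."""
    rng = np.random.default_rng(seed)
    # Template-like start: atoms spread along x, random y and z.
    x = np.column_stack([np.arange(n_atoms, dtype=float),
                         rng.random(n_atoms), rng.random(n_atoms)])
    e = e_start = restraint_energy(x, pairs, lower, upper)
    best_x, best_e = x.copy(), e
    for k in range(n_steps):
        # Geometric slow-cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        trial = x.copy()
        trial[rng.integers(n_atoms)] += rng.normal(scale=step, size=3)
        e_trial = restraint_energy(trial, pairs, lower, upper)
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if e_trial <= e or rng.random() < math.exp((e - e_trial) / t):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x.copy(), e
    return best_x, best_e, e_start

# Four atoms restrained to the corners of a unit square (+/- 0.05).
pairs = np.array([[0, 1], [1, 2], [2, 3], [3, 0], [0, 2], [1, 3]])
target = np.array([1.0, 1.0, 1.0, 1.0, math.sqrt(2), math.sqrt(2)])
coords, e_final, e_initial = anneal(4, pairs, target - 0.05, target + 0.05)
print(e_initial, e_final)
```

The high early temperature lets the structure cross barriers (much like the softened repulsions described above), while the low final temperature only accepts small improvements.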


Figure 2.2: Flowchart for simulated annealing

2.2 Homology modelling

We have also used a more standard method for structure prediction when a homology exists between the protein of interest and other proteins of known 3D structure:

2.2.1 Introduction to Homology modelling

During evolution, sequences change much faster than structures. It is therefore possible to infer the 3D structure of a molecule by looking at a molecule of known structure with sufficient sequence identity. Fig. 2.3 shows how much sequence identity is needed, for a given number of aligned residues, to reach the safe homology modeling zone. For a sequence of 100 residues, for example, a sequence identity of 40% is sufficient for structure prediction. When the sequence identity falls in the safe homology modeling zone, we can assume that both sequences share essentially the same 3D structure.

2.2.2 Classical steps in Homology modeling

The known structure is called the template; the unknown structure is called the target. Homology


structure when their sequence identity falls in the safe homology zone, the upper part of the picture (from work by Sander and Schneider [44]).

2.2.2.1 Template recognition and initial alignment

In this step you compare the sequence of the unknown structure with all the known structures stored in the Protein Data Bank (PDB). A search with BLAST against this database will give a list of known protein structures that match the sequence. If BLAST cannot find a template, a more sophisticated technique might be necessary to identify the structure of a molecule. BLAST uses a residue exchange matrix to define a score for every hit. Residues that are easily exchanged (for example Ile to Leu) get a better score than residues that have different properties (for example Glu to Trp). Conserved residues with a specific function get the best score (for example Cys to Cys). Every hit is scored using this matrix, and BLAST will provide a list of possible templates for the unknown structure. To make the best initial alignment, BLAST uses an alignment matrix based on the residue exchange matrix and adds extra penalties for opening and extending a gap between residues. In practice, the target sequence is sent to a BLAST server, which searches the PDB to obtain a list of possible templates and their alignments. Subsequently the best hit has to be chosen, which is not necessarily the first one. In doing so, one has to keep in mind the resolution, missing parts, different states of action and possible ligands of the molecules.
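The scoring scheme described above (a substitution score per aligned pair plus separate gap-opening and gap-extension penalties) can be sketched as follows. The substitution values and gap costs are hypothetical toy numbers chosen only to reproduce the ordering of the examples in the text (Cys-Cys > Ile-Leu > Glu-Trp); they are not BLOSUM62 entries or BLAST defaults, and this function scores a given alignment rather than searching for one as BLAST does.

```python
# Toy substitution scores (hypothetical values mirroring the examples in the
# text; NOT taken from BLOSUM62 or any other real matrix).
TOY_SCORES = {("C", "C"): 9, ("I", "L"): 2, ("E", "W"): -3}

def subst(a, b):
    """Symmetric lookup: listed pairs first, then +4 identity / -1 mismatch."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    if (b, a) in TOY_SCORES:
        return TOY_SCORES[(b, a)]
    return 4 if a == b else -1

def score_alignment(seq_a, seq_b, score=subst, gap_open=10, gap_extend=1):
    """Score a gapped pairwise alignment ('-' marks a gap).

    A gap run of length L costs gap_open + (L - 1) * gap_extend
    (affine gap penalty; the default costs are hypothetical).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    gap_in_a = gap_in_b = False
    for a, b in zip(seq_a, seq_b):
        if a == "-":
            total -= gap_extend if gap_in_a else gap_open
            gap_in_a, gap_in_b = True, False
        elif b == "-":
            total -= gap_extend if gap_in_b else gap_open
            gap_in_a, gap_in_b = False, True
        else:
            total += score(a, b)
            gap_in_a = gap_in_b = False
    return total

print(score_alignment("ACIE", "ACLW"))      # 4 + 9 + 2 - 3 = 12
print(score_alignment("CIL--E", "CLLAAE"))  # 9 + 2 + 4 - 10 - 1 + 4 = 8
```

Charging the opening of a gap more than its extension favours one long gap over several short ones, which is why gaps are later shifted and merged during alignment correction.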

2.2.2.2 Alignment correction

It is possible that the alignment has to be corrected. A change of Ala to Glu is possible but unlikely to happen in a hydrophobic core, so this Ala and Glu should not be aligned. Using a multiple sequence


less likely to be changed than the residues on the outside. Insertions and deletions can be made in widely divergent parts of the molecule, and a multiple sequence alignment can be helpful to find these places. Gaps have to be shifted around until they are as small as possible. Fig. 2.4 shows that after a deletion of 3 residues a big gap occurs in the red structure, which was the best alignment. After shifting several residues, the gap is much smaller (blue structure) and more likely to be correct. Correction of the alignment is typically done by hand.

Figure 2.4: Template structure (green) with the best aligned target (red) with a large gap, and the target after shifting several residues (blue). The gap is much smaller now.

2.2.2.3 Backbone generation

When the alignment is correct, the backbone of the target can be created. The coordinates of the template backbone are copied to the target. When the residues are identical, the side-chain coordinates are also copied. Because a PDB file can always contain some errors, it can be useful to make use of multiple templates.

2.2.2.4 Loop modeling

Often the alignment will contain gaps as a result of deletions and insertions. When the target sequence contains a gap, one can simply delete the corresponding residues in the template. This creates a hole in the model; this has already been discussed in the previous step. When there is an insertion in the target, the template will contain a gap and no backbone coordinates are known for these residues in the model. The backbone from the template has to be cut to insert these residues. Such large changes cannot be modeled in secondary structure elements and therefore have to be placed in loops and strands. Surface loops are, however, flexible and difficult to predict. One way to handle loops is to take some residues before and after the insertion as "anchor" residues and search the PDB for loops with the same anchor residues. The best loop is simply copied into the model. In the MODELLER program that we have used (see section 2.2.3), loop modelling is treated with special care.

2.2.2.5 Side-chain modeling

The next step is to add side-chains to the backbone of the model. Conserved residues were already


to predict the rotamer, because many backbone configurations strongly prefer a specific rotamer, as shown in Fig. 2.5 in the case of a tyrosine residue. There are libraries based upon the backbone of the residues flanking the residue of interest. By using these libraries, the best rotamer can be predicted.

Figure 2.5: Preferred rotamers of tyrosine, exemplified with different positions of TYR52 in the 10 NMR models of the first Metal Binding Domain of the Menkes ATPase (PDB code 1KVJ).

2.2.2.6 Model optimization

The model has to be optimized because the changed side-chains can affect the backbone, and a changed backbone will have an effect on the predicted rotamers. Optimization can be done by performing refinements using Molecular Dynamics simulations of the model. The model is placed in a force field and the movements of the molecules are followed in time, which lets the structure relax towards a lower-energy conformation. Big errors like bumps will be removed, but new smaller errors can be introduced. The calculated energy should be as low as possible.

2.2.2.7 Model validation

Every model contains errors. The model with the lowest force field energy might still be folded completely wrong. That is why the model should be checked for bumps and for whether the bond angles, torsion angles and bond lengths are within normal ranges. Other properties, like the distribution of polar/apolar residues, can be compared with real structures. This can be done by using Procheck, for example. The output can help in the identification of errors in the model. When an error occurs far away from the active site, it is not necessarily a problem. But when an error occurs in the active site, one


MODELLER [45] is a computer program that models three-dimensional structures of proteins and their assemblies by satisfaction of spatial restraints. MODELLER is most frequently used for homology or comparative protein structure modeling: the user provides an alignment of a sequence to be modeled with known related structures, and MODELLER will automatically calculate a model with all non-hydrogen atoms. More generally, the inputs to the program are restraints on the spatial structure of the amino acid sequence(s) and ligands to be modeled. The output is a 3D structure that satisfies these restraints as well as possible. Restraints can in principle be derived from a number of different sources. These include related protein structures (comparative modeling), NMR experiments (NMR refinement), rules of secondary structure packing (combinatorial modeling), etc. The restraints can operate on distances, angles, dihedral angles, pairs of dihedral angles and some other spatial features defined by atoms or pseudo atoms. Presently, MODELLER automatically derives the restraints only from the known related structures and their alignment with the target sequence. A 3D model is obtained by optimization of a molecular probability density function (pdf). The molecular pdf for comparative modeling is optimized with the variable target function procedure in Cartesian space, which employs methods of conjugate gradients and molecular dynamics with simulated annealing. MODELLER can also perform multiple comparison of protein sequences and/or structures, clustering of proteins, and searching of sequence databases.

2.3 Classical Mechanics and Dynamics

The 3D structure of the molecules being known, we are now faced with the significant complexity of the structures and interactions that we want to simulate for a realistic model. Two possible strategies are generally used: we can either increase the processing power (e.g. use costly parallel supercomputers), or use simplified representations of the geometry or of the dynamics of the involved molecules. Frequently, these simplification methods involve representations as mechanical models or in reduced coordinates (e.g. modelling the molecule as an articulated body), where subsets of atoms are replaced by idealized structures [46] [47], or performing normal-mode or principal component analysis in order to determine the essential dynamics of the system. Because they contain fewer degrees of freedom, these simplified representations allow us to accelerate the computation of the molecular dynamics and facilitate the study of the molecular interactions. In the following, I will describe how a molecular mechanics model is built, with atoms represented as charged spheres and covalent bonds as springs. This mechanical model can then be studied using the classical equations of motion of mechanics.

2.3.1 Newton's Second Law

Molecular Dynamics simulations are based on Newton's second law, the equation of motion [48], [49]:

$$\vec F_i = m_i \vec a_i = m_i \frac{d\vec v_i}{dt} \qquad (2.4)$$

It describes the motion of a particle of mass $m_i$ along the coordinate $x_i$, with $F_i$ being the force on $m_i$ in that direction. This is used to calculate the motion of a finite number of atoms or molecules,


a potential energy function, $V(\vec x)$, where $\vec x$ corresponds to the coordinates of all atoms in the system.

The relationship between the potential energy function and Newton's second law is given by

$$\vec F(\vec x_i) = -\nabla_i V(\vec x), \qquad (2.5)$$

with $\vec F(\vec x_i)$ being the force acting on particle $i$ due to the potential $V(\vec x)$. Combining these two equations gives

$$\frac{\partial V(\vec x)}{\partial x_i} = -m_i \frac{d^2 x_i}{dt^2}, \qquad (2.6)$$

which relates the derivative of the potential energy to the changes of the atomic coordinates in time. As the potential energy is a complex multidimensional function, this equation can only be solved numerically, with some approximations.
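Eq. (2.5) can be checked numerically: the analytic force of a potential must match minus its finite-difference gradient. The sketch below uses a single harmonic bond, $V = k(r - r_0)^2$, with hypothetical values of `k` and `r0`; it is a consistency check, not part of any MD package.

```python
import numpy as np

def bond_energy(x, k=300.0, r0=1.53):
    """Harmonic bond V = k (r - r0)^2 between atoms 0 and 1.

    k (kcal/mol/A^2) and r0 (A) are hypothetical example values.
    """
    r = np.linalg.norm(x[1] - x[0])
    return k * (r - r0) ** 2

def bond_forces(x, k=300.0, r0=1.53):
    """Analytic forces F_i = -dV/dx_i; the two forces are equal and opposite."""
    d = x[1] - x[0]
    r = np.linalg.norm(d)
    f1 = -2.0 * k * (r - r0) * d / r
    return np.array([-f1, f1])

def numerical_forces(energy, x, h=1e-6):
    """Central finite-difference estimate of -grad V, one coordinate at a time."""
    f = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        xp, xm = x.copy(), x.copy()
        xp[idx] += h
        xm[idx] -= h
        f[idx] = -(energy(xp) - energy(xm)) / (2.0 * h)
    return f

x = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.5]])   # r = 1.5 A, slightly compressed
print(bond_forces(x))
print(numerical_forces(bond_energy, x))
```

Such a finite-difference comparison is a common way to validate hand-written force routines before running dynamics.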

With the acceleration being $a = -\frac{1}{m}\frac{dV}{dx}$, we can then calculate the changes of the system in time just by knowing (i) the potential energy $V(\vec x)$, (ii) the initial coordinates $x_{i0}$ and (iii) an initial distribution of velocities $v_{i0}$. Thus, this method is deterministic, meaning that we can predict the state of the system at any point of time in the future or the past.

The initial distribution of velocities is usually randomly chosen from a Gaussian or Maxwell-Boltzmann distribution [49], which gives the probability that atom $i$ has the velocity $v_{i,x}$ in the direction $x$ at the temperature $T$:

$$p(v_{i,x}) = \left(\frac{m_i}{2\pi k_B T}\right)^{1/2} \exp\left(-\frac{1}{2}\frac{m_i v_{i,x}^2}{k_B T}\right).$$

Velocities are then corrected so that the overall momentum of the system equals a zero vector:

$$\vec P = \sum_{i=1}^{N} m_i \vec v_i = \vec 0.$$
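A minimal NumPy sketch of this initialisation, assuming SI units (masses converted from atomic mass units to kg, velocities in m/s); the function names are illustrative:

```python
import numpy as np

KB = 1.380649e-23         # Boltzmann constant (J/K)
AMU = 1.66053906660e-27   # atomic mass unit (kg)

def initial_velocities(masses_amu, temperature, seed=None):
    """Maxwell-Boltzmann velocities in m/s with zero total momentum."""
    rng = np.random.default_rng(seed)
    m = np.asarray(masses_amu, dtype=float)[:, None] * AMU   # shape (N, 1)
    # Every Cartesian component is Gaussian with variance kB T / m_i.
    v = rng.normal(size=(m.shape[0], 3)) * np.sqrt(KB * temperature / m)
    # Subtract the centre-of-mass velocity so that sum_i m_i v_i = 0.
    v -= np.sum(m * v, axis=0) / np.sum(m)
    return v

def instantaneous_temperature(masses_amu, v):
    """T = sum_i m_i v_i^2 / (3 N kB), counting 3N degrees of freedom."""
    m = np.asarray(masses_amu, dtype=float)[:, None] * AMU
    return float(np.sum(m * v ** 2) / (3 * m.shape[0] * KB))

masses = np.full(20000, 12.011)          # e.g. 20000 carbon atoms
v = initial_velocities(masses, 300.0, seed=1)
print(instantaneous_temperature(masses, v))   # close to 300 K
```

Strictly, removing the centre-of-mass motion leaves 3N - 3 degrees of freedom; the difference is negligible for large N.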

2.3.2 Integration Algorithms

The solution of the equation of motion given above is a rather simple one, which is only sufficiently good over a very short period of time in which the velocities and accelerations can be regarded as constant. Algorithms were therefore introduced that repeatedly perform small time steps, thus propagating the system's properties (positions, velocities and accelerations) in time. Time steps are typically chosen in the range of 1 fs [49]. It is necessary to use such a small time step, as many molecular processes occur in such short periods of time that they cannot be resolved with larger time steps. A time series of coordinate sets calculated this way is referred to as a trajectory, and a single coordinate set as a frame.

2.3.2.1 Verlet Algorithm

All algorithms assume that the system's properties can be approximated by a Taylor series expansion


$$\vec x(t + \delta t) = \vec x(t) + \delta t\,\vec v(t) + \frac{1}{2}\delta t^2\,\vec a(t) + \dots$$

$$\vec v(t + \delta t) = \vec v(t) + \delta t\,\vec a(t) + \frac{1}{2}\delta t^2\,\vec b(t) + \dots$$

$$\vec a(t + \delta t) = \vec a(t) + \delta t\,\vec b(t) + \frac{1}{2}\delta t^2\,\vec c(t) + \dots$$

with $\vec x$, $\vec v$ and $\vec a$ being the positions, the velocities and the accelerations of the system, and $\vec b$ and $\vec c$ the third and fourth time derivatives of the positions. The series expansion is usually truncated after the quadratic term. Probably the most widely used algorithm for integrating the equations of motion in MD simulations is the Verlet algorithm (1967) [48], [49]. It can be derived by simply summing the Taylor expressions for the coordinates at the times $(t + \delta t)$ and $(t - \delta t)$:

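$$\vec x(t + \delta t) = \vec x(t) + \delta t\,\vec v(t) + \frac{1}{2}\delta t^2\,\vec a(t) + \dots$$

$$\vec x(t - \delta t) = \vec x(t) - \delta t\,\vec v(t) + \frac{1}{2}\delta t^2\,\vec a(t) - \dots$$

$$\Rightarrow \vec x(t + \delta t) = 2\vec x(t) - \vec x(t - \delta t) + \delta t^2\,\vec a(t).$$

The first- and third-order terms cancel in the sum, so the truncation error of this position update is of order $\delta t^4$.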

Thus, it uses the position $\vec x(t)$ and acceleration $\vec a(t)$ at time $t$ and the positions from the previous step, $\vec x(t - \delta t)$, to calculate the new positions $\vec x(t + \delta t)$. In this algorithm velocities are not explicitly calculated, but they can be obtained in several ways. One is to calculate mean velocities between the positions $\vec x(t + \delta t)$ and $\vec x(t - \delta t)$:

$$\vec v(t) = \frac{1}{2\delta t}\left[\vec x(t + \delta t) - \vec x(t - \delta t)\right]$$

The advantages of this algorithm are that it is straightforward and has modest storage requirements, comprising only two sets of positions [$\vec x(t)$ and $\vec x(t - \delta t)$] and the accelerations $\vec a(t)$. The disadvantage, however, is its moderate precision, because the positions are obtained by adding a small term [$\delta t^2\,\vec a(t)$] to the difference of two much larger terms [$2\vec x(t) - \vec x(t - \delta t)$]. This results in rounding errors due to the numerical limitations of the computer.

Furthermore, this is obviously not a self-starting algorithm. New positions $\vec x(t + \delta t)$ are obtained from the current positions $\vec x(t)$ and the positions at the previous step, $\vec x(t - \delta t)$. So at $t = 0$ there are no positions for $(t - \delta t)$, and it is therefore necessary to provide another way to calculate them. One way is to use the Taylor expansion truncated after the first-order term:

$$\vec x(t - \delta t) = \vec x(t) - \delta t\,\vec v(t) + \dots$$

$$\Rightarrow \vec x(-\delta t) = \vec x(0) - \delta t\,\vec v(0)$$
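The Verlet recursion, its Taylor-based start and the central-difference velocities can be sketched for a one-dimensional harmonic oscillator in reduced units (m = k = 1), where the exact solution x = cos t is known. Names and parameters are illustrative assumptions:

```python
import numpy as np

def verlet(x0, v0, accel, dt, n_steps):
    """Position Verlet: x(t+dt) = 2 x(t) - x(t-dt) + dt^2 a(t).

    x(-dt) is bootstrapped with the truncated Taylor step x(0) - dt v(0).
    Velocities are central differences [x(t+dt) - x(t-dt)] / (2 dt), so
    v(t) only becomes available once x(t+dt) has been computed.
    """
    x_prev, x = x0 - dt * v0, x0
    xs, vs = [x0], []
    for _ in range(n_steps):
        x_next = 2.0 * x - x_prev + dt ** 2 * accel(x)
        vs.append((x_next - x_prev) / (2.0 * dt))  # velocity at the current t
        x_prev, x = x, x_next
        xs.append(x)
    return np.array(xs), np.array(vs)

# Harmonic oscillator, m = k = 1: exact solution x = cos t, v = -sin t.
dt, n = 0.01, 628
xs, vs = verlet(1.0, 0.0, lambda q: -q, dt, n)
t = dt * np.arange(n + 1)
print(np.max(np.abs(xs - np.cos(t))))
```

Because x(-δt) comes from a first-order expansion, the numerical trajectory starts with a small phase offset of order δt; with δt = 0.01 it stays within about 0.01 of the exact solution over one period.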

2.3.2.2 Leap-Frog Algorithm

There are several variations of the Verlet algorithm that try to avoid its disadvantages. One example is


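$$\vec v\left(t + \tfrac{1}{2}\delta t\right) = \vec v\left(t - \tfrac{1}{2}\delta t\right) + \delta t\,\vec a(t)$$

$$\vec x(t + \delta t) = \vec x(t) + \delta t\,\vec v\left(t + \tfrac{1}{2}\delta t\right),$$

where $\vec a(t)$ is obtained using

$$\vec a(t) = -\frac{1}{m}\frac{dV}{d\vec x}.$$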

First, the velocities $\vec v(t + \frac{1}{2}\delta t)$ are calculated from the velocities at $t - \frac{1}{2}\delta t$ and the accelerations $\vec a(t)$. Then the positions $\vec x(t + \delta t)$ are deduced from the velocities just calculated and the positions at time $t$. In this way the velocities first 'leap-frog' over the positions and then the positions leap over the velocities. The leap-frog algorithm's advantages over the Verlet algorithm are the inclusion of explicit velocities and the absence of any need to calculate differences between large numbers. An obvious disadvantage, however, is that the positions and velocities are not synchronized. This means it is not possible to calculate the contributions of the kinetic energy (from the velocities) and of the potential energy (from the positions) to the total energy simultaneously.
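A sketch of the leap-frog scheme for a one-dimensional harmonic oscillator (m = k = 1, exact solution x = cos t), assuming the half-step velocity v(-δt/2) is known. Because positions and velocities are half a step apart, on-step velocities (needed for the kinetic energy) are interpolated here as the mean of the two neighbouring half-step values, which is one common choice; names and parameters are illustrative:

```python
import numpy as np

def leapfrog(x0, v_minus_half, accel, dt, n_steps):
    """Leap-frog: v(t+dt/2) = v(t-dt/2) + dt a(t); x(t+dt) = x(t) + dt v(t+dt/2)."""
    x, v = x0, v_minus_half
    xs, v_half = [x0], []
    for _ in range(n_steps):
        v = v + dt * accel(x)   # velocities leap over the positions ...
        x = x + dt * v          # ... then positions leap over the velocities
        v_half.append(v)
        xs.append(x)
    return np.array(xs), np.array(v_half)

dt, n = 0.01, 628
xs, v_half = leapfrog(1.0, np.sin(dt / 2), lambda q: -q, dt, n)  # v(-dt/2) = sin(dt/2)
t = dt * np.arange(n + 1)

# On-step velocities v(t) for t = dt, ..., (n-1) dt, by averaging half steps.
v_on_step = 0.5 * (v_half[:-1] + v_half[1:])
etot = 0.5 * v_on_step ** 2 + 0.5 * xs[1:-1] ** 2
print(etot.max() - etot.min())  # total energy fluctuates only slightly
```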

2.3.2.3 Langevin Dynamics (LD) Simulation

The Langevin equation is a stochastic differential equation in which two force terms have been added to Newton's second law to approximate the effects of neglected degrees of freedom. One term represents a frictional force, the other a random force $\vec R$. For example, the effects of solvent molecules not explicitly present in the system being simulated would be approximated in terms of a frictional drag on the solute as well as random kicks associated with the thermal motions of the solvent molecules. Since friction opposes motion, the first additional force is proportional to the particle's velocity and oppositely directed. Langevin's equation for the motion of atom $i$ is:

$$\vec F_i - \gamma_i \vec v_i + \vec R_i(t) = m_i \vec a_i,$$

where $\vec F_i$ is still the sum of all forces exerted on atom $i$ by other atoms explicitly present in the system, and $\gamma_i$ is the friction coefficient. This equation is often expressed in terms of the 'collision frequency' $\zeta = \gamma/m$.

The friction coefficient is related to the fluctuations of the random force by the fluctuation-dissipation theorem:

$$\langle \vec R_i(t)\rangle = 0, \qquad \int \langle \vec R_i(0) \cdot \vec R_i(t)\rangle\,dt = 6 k_B T \gamma_i.$$

In simulations it is often assumed that the random force is completely uncorrelated at different times. That is, the above equation takes the form:

$$\langle \vec R_i(t) \cdot \vec R_i(t')\rangle = 6 k_B T \gamma_i\,\delta(t - t').$$


The temperature of the system being simulated is maintained via this relationship between $\vec R(t)$ and $\gamma$.

The jostling of a solute by solvent can expedite barrier crossing, and hence Langevin dynamics can search conformations better than Newtonian molecular dynamics ($\gamma = 0$).
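 
The sketch below integrates Langevin's equation with a deliberately simple first-order update; the scheme, names and parameters are assumptions for illustration, not the integrator of CHARMM or any other package. It checks the temperature control discussed above: starting from rest, an ensemble of harmonic oscillators should approach equipartition, with both the mean of m v² and the mean of k x² tending to k_B T.

```python
import numpy as np

def langevin(x0, v0, force, mass, gamma, kT, dt, n_steps, seed=0):
    """Integrate m a = F(x) - gamma v + R(t) with a simple first-order scheme.

    R(t) is drawn independently at every step (delta-correlated noise) with
    per-component variance 2 gamma kT / dt, the discrete form of the
    fluctuation-dissipation relation. This Euler-type update is a teaching
    sketch; production codes use more accurate Langevin integrators.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    v = np.array(v0, dtype=float)
    sigma_r = np.sqrt(2.0 * gamma * kT / dt)
    for _ in range(n_steps):
        r = sigma_r * rng.standard_normal(x.shape)
        a = (force(x) - gamma * v + r) / mass
        v = v + dt * a
        x = x + dt * v
    return x, v

# 20000 independent 1D harmonic oscillators (reduced units m = k = kT = 1),
# all started at rest in the minimum.
x, v = langevin(np.zeros(20000), np.zeros(20000), lambda q: -q,
                mass=1.0, gamma=1.0, kT=1.0, dt=0.01, n_steps=3000)
# Equipartition: <m v^2> and <k x^2> should both approach kT = 1.
print(np.mean(v ** 2), np.mean(x ** 2))
```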

2.4 Adaptive Molecular Dynamics

2.4.1 Introduction

In the previous simple mechanical model, the only way to reduce the computational cost of a calculation, knowing that some part of the system is less important in the mechanism under study, is to fix the corresponding atoms. Thus, we must have some prior knowledge about the function of the system, and we completely neglect the dynamics of this less important part and its possible interactions with the reaction site. In other words, there was no method that automatically determines which parts of the molecule must be precisely simulated and which parts can be simplified without affecting the study of the molecular interaction. Researchers in Stéphane Redon's team (Nano-D, at INRIA Grenoble) have recently introduced adaptive torsion-angle quasi-statics and Adaptive Molecular Dynamics (AMD), a general technique to rigorously and automatically determine the most important regions in a simulation of molecules represented as articulated bodies. At each time step, the adaptive algorithm determines the set of joints that should be simulated in order to best approximate the motion that would be obtained if all degrees of freedom were simulated, based on the current state of the simulation and on user-defined precision or time constraints. They built on previous research on adaptive articulated-body simulation [51] and proposed novel data structures and algorithms for the adaptive update of molecular forces and energies.

2.4.2 Divide and Conquer Algorithm

The starting point for the study of the dynamics of an articulated body is a Divide-And-Conquer Algorithm proposed by Roy Featherstone [52]. Featherstone recursively defines an articulated body by assembling two (rigid or articulated) bodies together. A complete articulated body is thus represented by a binary tree: the root node describes the whole articulated body, while each leaf node is a rigid body with a set of handles, i.e. locations attached to some other rigid bodies. Let C be an articulated body with m handles; Featherstone defines the articulated-body equation:

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix} = \begin{pmatrix} \Phi_1 & \Phi_{12} & \cdots & \Phi_{1m} \\ \Phi_{21} & \Phi_2 & \cdots & \Phi_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \Phi_{m1} & \Phi_{m2} & \cdots & \Phi_m \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} \qquad (2.7)$$

where $a_i$ is the spatial acceleration of handle $i$, $f_i$ is the spatial force applied to handle $i$, $b_i$ is the bias acceleration of handle $i$, $\Phi_i$ is the inverse articulated-body inertia of handle $i$ and $\Phi_{ij}$ is the cross-coupling inverse inertia between handles $i$ and $j$. This equation is the equivalent of the classical equation of motion:

$$a_i = \frac{1}{m_i} f_i$$

We thus consider the molecules as articulated bodies: every rigid body is one atom or a group of atoms. Joints (handles) between rigid bodies are covalent bonds around which rotation is possible. The dynamics is calculated in the space of the dihedral angles of the system, such as all the φ, ψ and χ angles in proteins. If, at a given time, parts of the system are considered as rigid, they form a subtree of the complete assembly tree where forces need not be recalculated. The rest is constituted of active joints forming an active region (see Fig. 2.6).

Figure 2.6: Active and rigid regions. The figure corresponds to 5 active joints.

The usefulness of the tree representation will now be made clear. The Featherstone algorithm is accomplished in two stages: i) a bottom-up stage, from the leaves to the root, in which the articulated-body coefficients $b_i$ and $\Phi_{ij}$ of each composite object C are calculated from the coefficients of its two children A and B; ii) a top-down stage, from the root to the leaves, which yields the joint accelerations $\ddot q_i$ and forces. The joint accelerations $\ddot q_i$ are the second derivatives of the motion variables, such as $\ddot\phi_i$ if we are interested in the movement around the dihedral angle $\phi_i$.

Now, a theorem states that the sum of the squares of the accelerations,

$$A = \sum_i \ddot q_i^{\,2},$$

of the joints in one node of the tree can be calculated without knowing the values of the individual $\ddot q_i$ of its children nodes. This means that the algorithm can decide on its own how to partition the tree into active and rigid regions, the latter being those where the metric $A$ is the lowest and where the dynamics may thus be considered as less important for the mechanism under study.
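The selection step can be illustrated with a toy binary assembly tree. In this hypothetical sketch each internal node is a joint carrying a known acceleration q̈, A is accumulated bottom-up, and joints are activated top-down, always expanding the subtree with the largest A until a budget of active joints is reached. Unlike the real AMD algorithm, which obtains A without computing the individual q̈ values, the toy computes them explicitly; it only illustrates the partition into active and rigid regions, and all names are assumptions.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """Node of a binary assembly tree (hypothetical toy structure).

    Leaves are rigid bodies; each internal node stands for the joint that
    connects its two children and stores that joint's acceleration qdd.
    """
    name: str
    qdd: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    A: float = field(default=0.0, init=False)

def compute_metric(node: Optional[Node]) -> float:
    """Bottom-up pass: A = sum of squared joint accelerations in the subtree."""
    if node is None:
        return 0.0
    node.A = node.qdd ** 2 + compute_metric(node.left) + compute_metric(node.right)
    return node.A

def select_active(root: Node, max_active: int) -> List[str]:
    """Top-down greedy choice of at most max_active joints.

    The subtree with the largest A is expanded first; subtrees that are
    never expanded are kept rigid.
    """
    compute_metric(root)
    heap = [(-root.A, 0, root)]
    counter = 1                      # tie-breaker: Node objects are never compared
    active = []
    while heap and len(active) < max_active:
        node = heapq.heappop(heap)[2]
        if node.left is None and node.right is None:
            continue                 # a leaf is a rigid body, not a joint
        active.append(node.name)
        for child in (node.left, node.right):
            if child is not None:
                heapq.heappush(heap, (-child.A, counter, child))
                counter += 1
    return active

# Root joint J1 joins a fast-moving part (J2) and a nearly frozen part (J3, J4).
j4 = Node("J4", 0.01, Node("B3"), Node("B4"))
j3 = Node("J3", 0.05, j4, Node("B5"))
j2 = Node("J2", 2.0, Node("B1"), Node("B2"))
root = Node("J1", 0.1, j2, j3)
print(select_active(root, 2))   # ['J1', 'J2']: the J3/J4 subtree stays rigid
```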

I participated in this work, which was published in Bioinformatics in 2007 [53]. I have used AMD many times during my thesis, both for the interactive visualization of molecules and in attempts to


forces ($f_i$) on the system with the computer mouse and let the system relax to a new equilibrium position.

2.5 Statistical Mechanics

2.5.1 Introduction to Statistical Mechanics

MD simulations provide information at the microscopic level. Statistical mechanics is then required to convert this microscopic information into macroscopic observables such as pressure, energy, heat capacities, etc. Statistical mechanics relates these macroscopic observables to the distribution of molecular positions and motions; to this end, time-independent statistical averages are introduced. For a better understanding, some definitions are reviewed here [54]:

2.5.2 Definitions

The mechanical or microscopic state of a system is defined by the atomic positions $x_i$ and the momenta $p_i = m_i v_i$. They can be considered as a multidimensional space with 6N coordinates, to which the positions and the momenta each contribute 3N coordinates. This space is called phase space.

The thermodynamic or macroscopic state of a system is defined by a set of parameters that completely describes all thermodynamic properties of the system. An example would be the temperature T, the pressure P and the number of particles N. All other properties can be derived from the fundamental thermodynamic equations.

An ensemble is the collection of all possible systems which have different microscopic states but the same macroscopic or thermodynamic state. Ensembles can be defined by fixed thermodynamic properties, as stated above. Examples of ensembles with different characteristics are NVE, NVT, NPT and μVT (E = total energy, P = pressure, V = volume, μ = chemical potential).

2.5.3 Ensemble Averages and Time Averages

In an experiment one examines a macroscopic sample with an enormously high number of atoms or molecules. The measured thermodynamic properties therefore reflect an extremely large number of different conformations of the system, representing a subset of the ensemble. We have to say subset, because an ensemble is the complete collection of microscopic systems, whereas a macroscopic sample can only consist of a finite number of systems. A sufficiently big sample, however, can be seen as a good approximation to an ensemble. That is why statistical mechanics defines averages corresponding to experimentally measured thermodynamic properties as ensemble averages [54]. The ensemble average is given by:

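$$\langle A \rangle_{ensemble} = \iint d\vec p^{\,N}\, d\vec x^{\,N}\; A\left(\vec p^{\,N}, \vec x^{\,N}\right)\, \rho\left(\vec p^{\,N}, \vec x^{\,N}\right), \qquad (2.8)$$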

where $\langle A \rangle$ is the measured observable, which is stated as a function of the momenta $\vec p_i$ and the positions $\vec x_i$. The quantity $\rho(\vec p^{\,N}, \vec x^{\,N})$ is the probability density of the ensemble, and the integration is performed over all momenta and positions of the system, $d\vec p^{\,N}$ and $d\vec x^{\,N}$. So, the ensemble average is the average value of an observable weighted by its probability. This integral is extremely difficult to