• Aucun résultat trouvé

DNA partitions into triplets under tension in the presence of organic cations, with sequence evolutionary age predicting the stability of the triplet phase

N/A
N/A
Protected

Academic year: 2021

Partager "DNA partitions into triplets under tension in the presence of organic cations, with sequence evolutionary age predicting the stability of the triplet phase"

Copied!
34
0
0

Texte intégral

(1)

DNA partitions into triplets under tension in the presence of organic cations, with

sequence evolutionary age predicting the stability of the triplet phase

--Manuscript

Draft--Manuscript Number: QRBP-D-17-00009

Full Title: DNA partitions into triplets under tension in the presence of organic cations, with sequence evolutionary age predicting the stability of the triplet phase

Short Title: Triplet Phase of Ancient DNA Sequences

Article Type: Report

Corresponding Author: Joshua Berryman LUXEMBOURG Corresponding Author's Institution:

First Author: Amirhossein Taghavi

Order of Authors: Amirhossein Taghavi Paul van der Schoot Joshua Berryman

Abstract: Using atomistic simulations of GC-rich DNA duplexes we show the formation of stable triplet structure when extended in solution over a timescale of hundreds of

nanoseconds, in the

presence of organic salt. We present planar-stacked triplet disproportionated DNA (Σ DNA) as a solution phase of the double helix under tension, subject to the presence of stabilising co-factors. Considering the partitioning of the duplexes into triplets of base-pairs as the first step of operation of recombinase enzymes like RecA, we emphasize the structure-function relationship in Σ DNA. We supplement atomistic calculations with thermodynamic arguments to show that codons for 'phase one' amino acids (those appearing early in evolution) are more likely than a lower entropy GC-rich sequence to form triplets under tension. We further observe that the four amino acids supposed (in the 'GADV' world hypothesis) to constitute the minimal set to produce functional globular proteins have the strongest triplet-forming propensity within the phase one set, showing a series of decreasing triplet propensity with evolutionary newness.

Keywords: dna stretching; S-DNA; sigma DNA; triplet disproportionation; evolution; origin of life

(2)

Triplet phase of ancient DNA sequence 1 Title page 1 2 3 4

DNA partitions into triplets under tension in the presence of organic cations, 5

with evolutionary age predicting the stability of the triplet phase 6

7

Authors’ names: 8

9

Amirhossein Taghavi1, Paul van der Schoot2 and Joshua T. Berryman1*

10 11

1Department of Physics and Materials Science, University of Luxembourg, 162A Avenue de la 12

Faïencerie, Luxembourg City, Luxembourg 13

14

2Department of Applied Physics, Theory of Polymers and Soft matter, Technische Universiteit 15

Eindhoven P.O. Box 513 5600 MB Eindhoven 16 17 18 19 Correspondence: 20 Joshua T. Berryman, 21 22

Address : Campus Limpertsberg, Université du Luxembourg 162 A, avenue de la Faïencerie L-1511

(3)

2 1 2 3 Abstract 4

Using atomistic simulations of GC-rich DNA duplexes we show the formation of stable triplet

5

structure when extended in solution over a timescale of hundreds of nanoseconds, in the

6

presence of organic salt. We present planar-stacked triplet disproportionated DNA (Σ DNA) as

7

a solution phase of the double helix under tension, subject to the presence of stabilising

co-8

factors. Considering the partitioning of the duplexes into triplets of base-pairs as the first step

9

of operation of recombinase enzymes like RecA, we emphasize the structure-function

10

relationship in Σ DNA. We supplement atomistic calculations with thermodynamic arguments

11

to show that codons for ‘phase one’ amino acids (those appearing early in evolution) are more

12

likely than a lower entropy GC-rich sequence to form triplets under tension. We further observe

13

that the four amino acids supposed (in the ‘GADV’ world hypothesis) to constitute the minimal

14

set to produce functional globular proteins have the strongest triplet-forming propensity within

15

the phase one set, showing a series of decreasing triplet propensity with evolutionary newness.

(4)

3 INTRODUCTION

27

Under tension in aqueous solution with small or monatomic counterions, the DNA duplex

28

stretches, unwinding if not topologically constrained, and eventually denatures. The extension

29

against force shows a jump by a factor of ~1.5−1.7 (depending on sequence, pulling geometry

30

and solution) at ~ 65 pN (Williamset al., 2002; Vlassakis et al., 2008; Liu et al., 2010; Bosaeus 31

et al., 2012). Several models have been proposed to explain the sudden increase in length,

32

which is widely agreed to be the signal of a collective structural transition. The formation of

33

regions of single stranded DNA (ssDNA) (Williams et al., 2001) or of ladder-like stretched

34

and untwisted double stranded DNA (dsDNA) have been suggested (Cluzel et al., 1996;

35

Konrad and Bolonick, 1996; Lebrun and Lavery, 1996a; Smith et al., 1996). At modest

36

extensionsof sequences not dominated by AT base pairs, we expectto see a partly untwisted

37

ladder-like structure, in which the base pairs remain intact but the rise per base pair is

38

equilibrated to a new value of ~ 5.8 Å, compared to the rise in unstretched B-DNA of 3.4 Å.

39

This stretched phase or phases is known by the umbrella label of ‘S-DNA’. For

GC-40

rich structures having strong hydrogen bonding the base pairing is preserved in the S-DNA

41

structure, and the base stacking may also be somewhat preserved by tilting and sliding of the

42

base pairs or by opening of ‘bubbles’ between base-pairs. Reorientation of the base pairs

43

increases the solvent-exposed area while permitting them to remain in contact such that a

44

complete water gap does not open between them. The detailed S-DNA structure, particularly

45

the inclination, depends on the pulling scheme. When the 5´ ends of each strand are pulled, tilt

46

angle increases gradually until the terminal H-bonds are disrupted, while in the 3´3´ pulling

47

regime the tilt angle is decreased and no early breakage of H-bonds is seen (Lavery et al., 2002; 48

Li andGisler, 2009; Danilowicz et al., 2009; Bag et al., 2016).

49

The most readily available information on DNA under tension is the empirically

50

measured force-extension curve (Smith et al., 1992; Rief et al., 1999), which provides the clear

(5)

4

signal of some transition, but no atomistic-level information. This is supplemented by

52

fluorescence and polarised-light studies (Nordén et al., 1992; van Mameren et al., 2009; King 53

et al., 2013), and by atomistic simulationswhich are able to provide explicit descriptions ofthe

54

DNA but which are limited in the accessible timescalesand system sizes (Lebrun and Lavery,

55

1996b; Konrad and Bolonick, 1996; Li and Gisler, 2009). Simple energy minimisation of

56

d(GCG)4 DNA under extension (without thermal fluctuations or explicit solvent) yields

57

partition into four base-stacked triplets (Bertucat et al., 1998), however subsequent fully

58

dynamic simulations have shown instead the irregular formation of ‘denaturation bubbles’

59

(Harris et al., 2005; Rezác et al., 2010), different from the formation of regular triplets both in

60

the irregularity of the spacing and in the large disruption of base planarity and base-pairing

61

near to the solvent filled cavities formed.

62

DNA is often subjected to tension in its biological context, for purposes including

63

transport, transcription and tertiary structure manipulation (Nicklas, 1998) . A striking example

64

of this is the crystal structure (pdb: 3cmt) of DNA bound to the RecA protein (Chen et al.,

65

2008), a snapshot of the fundamental process of sexual reproduction: the recombination of

66

homologous DNA from two parent organisms. In this structure the extended protein-bound

67

DNA duplex does not adopt a recognised S-like configuration, but rather disproportionates into

68

groups of three bases, with orderly planar base stacking retained within each triplet. This triplet

69

disproportionation has been observed in solution when bound to RecA (Nordén et al.,1992), in

70

crystallogrphy structure of RecA-DNA complex (Chen et al., 2008) and also has been

71

suggested as a stable phase even without co-factors (Bosaeus et al., current work).

72

Orderly triplet formation when complexed is in contrast to current general

73

understanding of the structural behaviour when extended in solution, which leads us to examine

74

whether the triplet phase can be stabilised in solution and if it could in this case be considered

75

a canonical biologically active structure of DNA on the same footing as the A, B and Z forms.

(6)

5

Using molecular dynamics simulations of duplex DNA with an applied force, we do not

77

observe stable triplet structure in an aqueous solution of monatomic counterions, but do find

78

that it is stable without specific complex to a structured enzyme, forming triplets in a solution

79

either of terminus capped monomeric Arginine peptides (Ac-Arg+-NHMe Cl-) or (more

80

weakly) of the well-known intercalant Ethidium Bromide (Et+Br-).

81

The presence of intercalators has been observed to drive significant alterations in the 82

quasiequilibrium force vs. extension curve, with an effect at high concentrations of smoothing 83

over the B to S transition and possibly of modifying the structure of the S phase (Vladescu et 84

al., 2008). By carrying out simulated stretching experiments in the presence of DNA-binding 85

cofactors, we intend to reduce the barrier and collectivity associated 86

with the B-S transition, thus increasing the likelihood that the sub-microsecond simulation 87

timescale can describe the real (millisecond) process, and also to investigate the structural role 88

of biologically relevant moieties (Arg) in relation to DNA under tension. 89

We discuss planar-stacked triplet disproportionated DNA as a solution phase of the

90

double helix under tension, and refer to it as ‘Σ DNA’, with the three right-facing points of the

91

Σ character serving as a mnemonic for the three grouped base pairs. In the same way as for

92

unstretched Watson-Crick base paired DNA structures, we remark that the structure of the Σ

93

phase ones linked to function: the partitioning of bases into codons of three base-pairs each is

94

the first phase of operation of recombinase enzymes such as RecA, facilitating alignment of

95

homologous or near homologous sequences. By showing that this process does not require any

96

very sophisticated manipulation of the DNA, we position it as potentially appearing as an early

97

step in the development of life, and correlate the postulated sequence of incorporation of amino

98

acids (phase zero (the GADV world) (Ikehara et al., 2002), phase one and phase two (Wong,

99

1975; Wong, 2005;Koonin and Novozhilov, 2009), into molecular biology with the ease of

Σ-100

formation for sequences including the associated codons. We also note that the machinery of

(7)

6

nucleotide to peptide translation occurs necessarily with reference to triplets of bases, so that

102

further investigation into the Σ phase of single and double strands of RNA and DNA might be

103

a valuable source of insight into the origins not only of recombination, but also of gene

104 expression. 105 106 METHODS 107

Molecular structures were prepared using the Nucleic Acid Builder (NAB) (Macke and Case,

108

1998). Salt and water were represented using the Joung-Cheatham (Joung and Cheatham III,

109

2008) and TIP3P (Jorgensen et al., 1983) parameters, with the AMBER15 forcefield (Ivani et 110

al., 2015) used for DNA and the AMBER14SB forcefield used for peptides (Maier et al.,2015).

111

The Ethidium molecule was represented using the GAFF (Wang et al., 2004) with partial

112

charges and bond parameters assigned via the ANTECHAMBER tool (Wang et al., 2006).

113

Simulations were run using the GPU-accelerated implementation of pmemd (Götz et al., 2012)

114

in the AMBER16 package (Case et al., 2016). For each calculation, 16 independent replicas

115

were preparedand equilibrated in the B conformation for 10 ns. Pulling of the DNA then took

116

place using steered molecular dynamics, over a time period of 150 ns (giving a pulling rate of

117

0.68 Å ns-1). DNA was pulled using force applied to the centres of geometry of the top and

118

bottom base-pairs, such that no topological restraint was applied and force was distributed

119

equally between 3´ and 5´ strand ends. Averaging of angles (for study of DNA structural

120

parameters) was carried out by taking the mean cosine and sine, then the arctangent of the mean

121

values. Structures were analysed using CURVES+ (Lavery et al., 2009).

122

In order to present a dimensionless relative extension the unstretched contour length for

123

24 bp was estimated giving values of 3.46 Å (Arg Cl), 3.43 Å (EtBr) and 3.43 Å (NaCl) for

124

the [G12C12] sequence and 3.57 Å (Arg Cl), 3.41 Å (EtBr) and 3.38 Å (NaCl) for the

125

[(GGC)4(GAC)4].[(GTC)4(GCC)4] sequence.

(8)

7 RESULTS

127

Sequence-Dependence of Disproportionation 128

It is not clear what form the original genetic code had, as it is likely to have co-evolved to some

129

extent with the associated enzymes of transcription and translation. We can make a guess about

130

the history of the genetic code by considering the metabolic networks leading to the different

131

amino acids: it is hypothesised that a list of so-called ‘phase one’ amino acids were present

132

earlier in evolution than the ‘phase two’ amino acids, based on the complexity of the cellular

133

machinery used in current organisms to synthesize, for example, Methionine (M) from

134

Threonine (T) (Wongand Bronskill, 1979; Koonin and Novozhilov, 2009). If the genetic code

135

in the epoch of a much simplified amino-acid alphabet already had the current structure of three

136

basepairs per codon it was therefore highly redundant at this time.

137

In the current triplet code, the ‘phase one’ amino acids supposed to have been

138

incorporated earliest into biology (a list of ADEGILPSTV) are coded by triplets which have a

139

specific physical tendency: the energetic cost to break base-stacking at the triplet boundary is

140

low, relative to the complete modern genetic code. In this paper we first motivate the statistical

141

observation of preferential triplet disproportionation in the phase one genetic code. We then

142

analyse atomistic simulation data to show that disproportionation into coding triplets occurs

143

spontaneously under tension for appropriate sequences and solution conditions.

144

The weak form of our observation provides a physical mechanism to minimise

read-145

frame and recombination alignment errors in the early evolution of the genetic code. We further

146

motivate this to make the stronger claim of a possible route for the origin of the triplet genetic

147

code, with the three base-pair structure arising from simple physical conditions in the absence

148

of the sophisticated enzymatic machinery which later evolved to maintain the triplet code in

149

modern organisms with a full alphabet of amino-acids.

(9)

8

The free energy of base-stacking in duplex DNA was long ago calculated

151

combinatorically for the various pairs of bases, by Friedman and Honig (Friedman and Honig,

152

1995). Although free energy calculations for nucleic acids remain subtle and difficult two

153

decades after this initial work, as the results are in accordance with chemical intuition we can

154

be confident in the ranking of the different pairs, and the absolute values are in any case less

155

important for the following discussion. From this tabulated data we can see that the weakest

156

step is CG (-4.36 kcal/mol), and the strongest is GG (-7.79). If we approximate the stacking

157

energy for complementary duplex DNA as the sum of the stacking energies for the two

base-158

steps, the weakest step remains CG-CG (-8.72 kcal/mol) and the strongest is GG-CC, with an

159

energy of -12.3 kcal/mol.

160

Based on the stacking energy of complementary pairs, it is possible to arrive at the

161

energetic cost to separate a given codon from its neighbours as a triplet of stacked base-pairs.

162

If we consider the duplex xGGCy-pGCCq (coding for GLY on strand 1, ALA on the

163

complementary strand, where x,y,p,q are bases from the adjacent codons), then the stacks

164

needed to find the triplet disproportionation energy Gτ are xG, Cy, pG, Cq. In order to select 165

x, y we randomly choose two amino acids from the set of phase one residues ADEGILPSTV,

166

and then randomly choose a codon for the given amino acid subject to the constraint that if a

167

codon beginning in G or ending in C (or both) is available in the genetic code, this codon is

168

preferred. The bases p, q are then selected as complementary to x, y. After sampling 106 amino

169

acid pairs, it is then possible to tabulate the average energy to partition into a triplet associated

170

with a given codon (Gτ). 171

If we assume that the DNA is subject to tension such that it must partition in some way,

172

and that the partitions must be somewhat evenly spread (here we arbitrarily assume two breaks

173

per five base-pairs) then we can present a relative free energy ΔGτ by comparing against the 174

alternative pairs of sites at which to break base-stacking. Combinatorics gives 1/2 (N2 – N)

(10)

9

such site combinations for a stretch of N steps, where N is 1 less than the number of base pairs.

176

Here 1/2 (N2 - N) = 6, leaving 5 site pairs for step breakage not including the pair which defines

177

a ‘Σ’ triplet. To get a probability, the comparison should be to a Boltzmann-weighted sum of

178

all alternative energies, i.e.:

179 1..5 / / /

e

e

( )

e

B B B i k T i G k T G G k T

p

 

   

180

Tabulating this information for the 20 canonical amino acids, we can see a clear pattern

181

of reduced triplet disproportionation energy for the primordial ‘stage one’ amino acids (Table

182 1). 183

Table 1.

184 185

The dramatic pattern evident in the tabulated partitioning energies is that stage one

186

amino acids overwhelmingly have relatively favourable free energies to partition into triplets

187

aligned to their codons. The exception to this pattern is interesting: Arginine (R) is not listed

188

in Wong and Bronskill’s 1979 tabulation of stage one amino acids (Wong and Bronskill, 1979),

189

possibly due to the large energetic cost needed to synthesize it from citrulline in modern

190

organisms (Ratnerand Petrack, 1953).

191

It has been advanced the CGN and AGN (where N = ‘anything’) codons which yield 192

Arginine in the modern genetic code previously coded for the chemically similar non-canonical 193

amino acid Ornithine (Jukes, 1973), and that the function of this codon was usurped in a 194

presumably dramatic evolutionary event when selection advantage was found in having access 195

to the more strongly basic Arginine molecule. The original list ADEGILPSTV contains no 196

(11)

10 range of folded proteins, and the replacement of Ornithine with Arginine a beneficial 198

evolutionary step in giving access to a stronger base. 199

Arginine stands out for a second reason: the DNA-binding recombinase RecA achieves

200

triplet disproportionation by cradling the negatively charged DNA in a large number of

201

positively charged R side-chains (and some K). Thus we should perhaps not be surprised if a

202

phase of biochemical evolution in which control of triplet disproportionation is important

203

should have some means to produce either Arginine or a similar moderately bulky basic

204

residues.

205

The weaker statement of this work, that the stage one part of the genetic code is

206

structured so as to support a minimisation of read-frame errors by physically favouring the

207

partition into aligned triplets under tension, is related to a known subtle and remarkable

208

property of the genetic code. This property is that its redundancy is structured almost-optimally

209

so as to support overlapping codes orthogonal to the primary code specifying amino acids

210

(Itzkovitz andAlon, 2007), allowing the evolution of sequence changes altering DNA structure

211

and interactions even within protein coding regions, without changing the coded protein. The

212

overall flexibility of the genetic code in allowing arbitrary steganographic codes is not however

213

sufficient to explain the strong pattern which we observe: Table 2 shows that codons for phase

214

one amino acids are significantly more able to encode this partitioning than those in phase 2.

215

We further observe that the residues advanced by Ikehara et al. (Ikehara et al. 2002) as forming

216

the minimal set for a functional proteome (marked ☥ in Table 2) are also those which partition

217

most naturally into triplets.

218

Table 2.

219

220

(12)

11 Spontaneous Triplet Disproportionation Under Tension, Amplified in the Presence of 222

Organic Cations 223

Beyond the pairwise hydrophobic and electrostatic interactions of base stacking (covered by

224

the classic calculation used to generate Tables 1 and 2) the potential importance of complex

225

entropic, structural and solvent effects makes it necessary to carry out a full atomistic molecular

226

dynamics investigation of DNA under tension. Given the expected importance of sequence

227

effects, simulations were run both with a low-entropy sequence of d[G12C12] (encoding 4

228

glycines and 4 prolines) and a sequence chosen to show strong triplet disproportionation based

229

on Table 1, d[(GGC)4(GAC)4], encoding four repeats each of the high-scoring amino acids Gly

230

and Asp on the first strand, then Val and Ala on the complementary strand (the GADV set of

231

(Ikehara et al. 2002). The DNA duplexes were stretched by an additional 100 Å from their

232

relaxed lengths, over a time period of 150 ns, giving a stretching rate of 0.029 Å ns-1 bp-1.

233

Because of the apparent importance of Arginine, based on Table 1 and on the RecA structure

234

(Chen et al., 2008), simulations were run both in NaCl and in a solution of Ac-Arg-NHMe Cl,

235

with the capped Arginine molecule replacing sodium as the positive counterion.

236

We find that for the GC-rich sequence encoding phase one amino acids, the

triplet-237

disproportionated Σ-phase of DNA is observed, with the strongest triplet formation taking

238

place in the presence of the terminus-capped Arginine residues (Ac-Arg-NHMe). Fig. 1 shows

239

a regular pattern of vertical bases with spacing 3 bp, over a large range of extensions. The

low-240

entropy sequence in the presence of Arginine shows some weak structure at high extensions,

241

due to exclusion effects which disfavour binding of cations to adjacent sites. In the

high-242

entropy sequence, some structure of period three is seen, even in the absence of Arginine,

243

however this is relatively weak (as suggested by the order1 𝑘𝐵𝑇 free energies of

244

disproportionation in Table 1).

245

Fig. 1

(13)

12

The triplet-disproportionated structures show the essential features of Σ-DNA (Fig. 2) as seen

247

in the RecA bound crystal structure: preserved Watson-Crick base-pairing, approximately

248

planar orientation of the bases (Fig. 3), and a large cavity every third base pair.

249

250

Fig. 2

251

Extending beyond approximately 3 Å/bp leads to breakup of the Σ phase and also to loss of

252

Watson-Crick hydrogen bonding, as the bases interdigitate with each other and hydrogen bond

253

to the backbone.

254

The average base-pair inclination in the high entropy sequence d[(GCC)4(GAC)4] in

255

the presence and absence of intercalators up to the extension point of 1.5 follow the same

256

pattern and remain flat (Fig. 3b,d,f) which indicates base-pair perpendicularity with respect to

257

the helix axis (Nordén et al., 1978; Edmondson and Johnson, 1986, Bosaeus et al., 2012). In

258

the presence of intercalators this trend continues after extension point of 1.5 but shows a sudden

259

drop for the duplexes in NaCl after a relative extension of 1.7. For the low entropy sequence

260

d[(G)12(C)12], the change of average inclination up to an extension of 1.5 is the same as for the

261

high entropy sequence. The bare sequence and the one in the presence of Arginine reach a

262

maximum inclination at a relative extension of 1.6-1.7 and drop afterwards (Fig. 3a,c) but in

263

the presence of EtBr a continuous increase is observed after extension 1.5, followed by a second

264

flat region after 1.6 (Fig. 3e). That average inclination tends to be small during DNA extension

265

for the high entropy sequence with or without organic cations is consistent with the results put

266

forward from experiment (Bosaeuset al. Current work) as suggesting Σ formation even in free

(14)

13 271

DISCUSSION

272 273

DNA behaviour under tension is affected by factors like the counterions (Vlassakis et al.,

274

2008), sequence (Riefet al., 1999) and temperature (Fu et al. 2010). Molecular combing results

275

show that DNA in its stretched form is not denatured, with a double-helix structure which is

276

characterized by a diameter of 1.2 nm (Maaloum et al., 2011). X-ray diffractions of stretched

277

cross-link films of a mixed sequence of DNA show gaps of ~8 Å (André et al., 2008), nearly

278

the same size as seen in the DNA-RecA complex. These experimental results suggest the

279

existence of a stretched form of DNA with preserved base-pair stacking.

280

In order to relate the physics of DNA stretching with its function in storing and copying

281

information, we have estimated the sequence dependence of the free energy cost in water at

282

moderate ionic strength to separate a given codon from its neighbours as a triplet of stacked

283

base-pairs, and found that although the aqueous solution environment does not strongly drive

284

triplet partitioning, that a distinct hierarchy of triplet formation energies exists with respect to

285

sequence features. The triplet formation energy estimates showed that sequences coding for

286

‘stage one’ amino acids hypothesized to have appeared early in evolution (plus Arginine) are

287

more likely than otherwise to partition into triplets at the codon boundaries when under tension.

288

In order to investigate this phenomenon we carried out pulling simulations of DNA duplexes

289

encoding stage one amino acids, in the presence of Arginine and also of Ethidium Bromide, as

290

well as control simulations using low-entropy sequences, and in aqueous conditions with

291

monatomic salt only.

292

In order to observe strong triplet disproportionation both a bulky organic cation (i.e.

293

Arginine or Ethidium) and a sequence selected from codons yielding phase-one amino acids

294

was required, with the combination of these two factors operating in a non-additive way to

295

produce a solution structure of stacked base-pair triplets. Overstretching the Σ-duplex led to

(15)

14

formation of interdigitated zipper DNA, stretching without cofactors or appropriate sequence

297

led to disordered but not fully denatured structure consistent with experiments (Balaeff et al., 298

2011; Bosaeuset al. Current work) and simulations (Konrad et al. 1996).

299

Ikehara and co-workers have shown that codons matching the pattern GNC (where N 300

signifies “anything”) probably constituted the original, minimal functional genetic code. These

301

authors argue based on multiple strands of reasoning that the amino acids GADV, translated

302

from these codons, constitute the unique minimal adequate set to generate functional globular

303

proteins (Ikehara et al. 2002). We argue that it is no coincidence that the same GNC codons

304

are those which drive maximal triplet disproportionation, allowing recombination and possibly

305

protein synthesis to operate without the sophisticated enzymatic machinery which exists today.

306

We hypothesize that a bootstrap process took place, with crude triplet disproportionation

307

facilitating recombination, driving accelerated evolution and leading to more sophisticated

308

protein-DNA interactions, in turn allowing expansion in stages of the genetic code.

309 310 311 312 313 314 315 Speculative box 316

We speculate that Ornithine (instead of Arginine) may play the role of triplet promoter 317

in some organisms, or have done so earlier in the evolutionary process. The usurpation 318

of Ornithine codons to instead signify Arginine may have led to a jump in the efficiency 319

of recombination and an evolutionary explosion. 320

(16)

15 322

Acknowledgements 323

This project was financially supported by the University of Luxembourg. Computer

324

simulations presented in this paper were carried out using the HPC facility of the University of

325

Luxembourg (Varrette et al., 2014). We thank Bengt Nordén for considerable guidance,

326

criticism and inspiration.

327

328

329

Financial support 330

This work was supported by the University of Luxembourg (grant number R-AGR-0136).

(17)

16 References

BAG, S., MOGURAMPELLY, S., GODDARD III, W. A., AND MAITI, P. K. (2016). Dramatic changes in DNA conductance with stretching: structural polymorphism at a critical extension.

Nanoscale, 8(35):16044–16052.

BALAEFF, A., CRAIG, S. L., BERATAN, D. N. (2011). B-DNA to Zip-DNA: Simulating a DNA transition to a novel structure with enhanced charge-transport characteristics. Journal of Physical Chemistry A 115(34), 9377–9391.

BERTUCAT, G., LAVERY, R., AND PRÉVOST, C. (1998). A model for parallel triple helix formation by RecA: Single-strand association with a homologous duplex via the minor groove. Journal of

Biomolecular Structure and Dynamics, 16(3), 535–546.

BOSAEUS, N., EL-SAGHEER, A. H., BROWN, T., SMITH, S. B., ÅKERMAN, B., BUSTAMANTE, C., AND NORDÉN, B. (2012). Tension induces a base-paired overstretched DNA conformation.

Proceedings of the National Academy of Sciences, 109(38), 15179–15184.

CASE, D. A., BETZ, R. M., CERUTTI, D. S., CHEATHAM, III, T. E., DARDEN, T. A., DUKE, R. E., GIESE, T. J., GOHLKE, H., GOETZ, A. W., HOMEYER, N., IZADI, S., JANOWSKI, P., KAUS, J., A. KOVALENKO, LEE, T. S., LEGRAND, S., LI, P., LIN, C., LUCHKO, T., LUO, R., MADEJ, B., MERMELSTEIN, D., MERZ, K. M., MONARD, G., NGUYEN, H., NGUYEN, H. T., OMELYAN, I., ONUFRIEV, A., ROE, D. R., ROITBERG, A., SAGUI, C., SIMMERLING, C. L., BOTELLO-SMITH, W. M., SWAILS, J., WALKER, R. C., WANG, J., WOLF, R. M., WU, X., XIAO, L. AND KOLLMAN P. A. (2016), AMBER 2016, University of California, San Francisco.

CHEN, Z., YANG, H., AND PAVLETICH, N. P. (2008). Mechanism of homologous recombination from the RecA-ssDNA/dsDNA structures. Nature, 453(7194):489–494.

CLUZEL, P., LEBRUN, A., HELLER, C., LAVERY, R. (1996). DNA: an extensible molecule.

Science, 271(5250):792.

(18)

17

structure of DNA overstretched from the 3´3´ ends. Proceedings of the National Academy of Sciences, 106(32), 13196–13201.

EDMONDSON, S. P., JOHNSON JR, W. C. (1986). Base tilt of B-form poly[d(G)]-poly[d(C)] and the B- and Z-conformations of poly[d(GC)]-poly[d(GC)] in solution, Biopolymers, 25, 2335–2348. FRIEDMAN, R. A. AND HONIG, B. (1995). A free energy analysis of nucleic acid base stacking in aqueous solution. Biophysical Journal, 69(4):1528–1535.

FU, H., CHEN, H., MARKO, J. F., YAN, J. (2010) Two distinct overstretched DNA states. Nucleic Acids Research, 38(16), 5594-600.

GÖTZ, A. W., WILLIAMSON, M. J., XU, D., POOLE, D., LE GRAND, S., AND WALKER, R. C. (2012). Routine microsecond molecular dynamics simulations with amber on gpus. 1. generalized born. Journal of Chemical Theory and Computation, 8(5), 1542–1555.

HARRIS, S. A., SANDS, Z. A., AND LAUGHTON, C. A. (2005). Molecular dynamics simulations of duplex stretching reveal the importance of entropy in determining the biomechanical properties of DNA. Biophysical Journal, 88(3), 1684–1691.

IKEHARA, K., OMORI, Y., ARAI, R., AKIKO HIROSE, A. (2002). A novel theory on the origin of the genetic code: A GNC-SNS Hypothesis. Journal of Molecular Evolution, 54:530–538.

ITZKOVITZ, S. AND ALON, U. (2007). The genetic code is nearly optimal for allowing additional information within protein coding sequences. Genome Research, 17(4), 405–412.

IVANI, I.; DANS, P. D.; NOY, A.; PÉREZ, A.; FAUSTINO, I.; HOSPITAL, A.; WALTHER, J.; ANDRIO, P.; GOÑI, R.; BALACEANU, A.; PORTELLA, G.; BATTISTINI, F.; GELPÍ, J. L.; GONZÁLEZ, C.; VENDRUSCOLO, M.; LAUGHTON, C. A.; HARRIS, S. A.; CASE, D. A.; OROZCO, M. Parmbsc1: A Refined Force Field for DNA Simulations. Nature Methods 2015, 13, 55– 58.

(19)

18 JOUNG, I. S. AND CHEATHAM III, T. E. (2008). Determination of alkali and halide monovalent ion parameters for use in explicitly solvated biomolecular simulations. The Journal of Physical Chemistry B, 112(30), 9020–9041.

JUKES, T. H. (1973) Arginine as an evolutionary intruder into protein synthesis. Biochemical and Biophysical Research Communications. 53(3):709-14.

KING, G. A., GROSS, P., BOCKELMANN, U., MODESTI, M., WUITE, G. J. L., AND PETERMAN, E. J. G. (2013). Revealing the competition between peeled ssDNA, melting bubbles, and S-DNA during DNA overstretching using fluorescence microscopy. Proceedings of the National Academy of Sciences, 110(10), 3859–3864.

KONRAD, M. W. AND BOLONICK, J. I. (1996). Molecular dynamics simulation of DNA stretching is consistent with the tension observed for extension and strand separation and predicts a novel ladder structure. Journal of the American Chemical Society, 118(45):10989–10994.

KOONIN, E. V. AND NOVOZHILOV, A. S. (2009). Origin and evolution of the genetic code: the

universal enigma. International Union of Biochemistry and Molecular Biology Life, 61(2):99–

111.

LAVERY, R., LEBRUN, A., ALLEMAND, J.F., BENSIMON, D., AND CROQUETTE, V. (2002). Structure and mechanics of single biomolecules: experiment and simulation. Journal of Physics:

Condensed Matter, 14(14):R383.

LAVERY, R., MOAKHER M., MADDOCKS, J. H., PETKEVICIUTE, D., ZAKRZEWSKA, K. (2009). Curves+ web server for analyzing and visualizing the helical, backbone and groove parameters of nucleic acid structures. Nucleic Acids Research, 37:5917-5929.

LEBRUN, A. AND LAVERY, R. (1996). Modelling extreme stretching of DNA. Nucleic Acids

Research, 24(12):2260–2267.

LI, H. AND GISLER, T. (2009). Overstretching of a 30 bp DNA duplex studied with steered molecular dynamics simulation: Effects of structural defects on structure and force-extension relation. The

(20)

19

LIU, N., BU, T., SONG, Y., ZHANG, W., LI, J., ZHANG, W., SHEN, J., AND LI, H. (2010). The nature of the force-induced conformation transition of dsDNA studied by using single molecule force spectroscopy. Langmuir, 26(12), 9491–9496.

MACKE, T. AND CASE. D. A. Modeling unusual nucleic acid structures. In Molecular Modeling of Nucleic Acids, N.B. LEONTES AND J. SANTALUCIA, JR., eds. (Washington, DC: American Chemical Society, 1998), pp. 379-393.

JAMES A. MAIER, J. A., MARTINEZ, C., KASAVAJHALA, K., WICKSTROM, L., HAUSER, K. E., AND SIMMERLING C. (2015) ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. Journal of Chemical Theory and Computation. 11 (8), pp 3696– 3713.

NORDÉN, B., SETH, S., TJERNELD, F. (1978) Renaturation of DNA in ethanol-methanol solvent induced by complexation with methyl green. Biopolymers, 17, 523–525.

NORDÉN, B., ELVINGSON, C., KUBISTA, M., SJÖBERG, B., RYBERG, H., RYBERG, M., MORTENSEN, K., AND TAKAHASHI, M. (1992). Structure of RecA-DNA complexes studied by combination of linear dichroism and small-angle neutron scattering measurements on flow-oriented samples. Journal of Molecular Biology, 226(4), 1175–1191.

NICKLAS, R. B. (1988). The forces that move chromosomes in mitosis. Annual Review of Biophysics and Biophysical Chemistry, 17(1):431–449.

RATNER, S. AND PETRACK, B. (1953). The mechanism of Arginine synthesis from citrulline in kidney. Journal of Biological Chemistry, 200(1):175–185.

ŘEZÁČ, J., HOBZA, P., AND HARRIS, S. A. (2010). Stretched DNA investigated using molecular-dynamics and quantum-mechanical calculations. Biophysical journal, 98(1), 101–110.

RIEF, M., CLAUSEN-SCHAUMANN, H., AND GAUB, H. E. (1999). Sequence-dependent

mechanics of single DNA molecules. Nature Structural & Molecular Biology, 6(4), 346–349.

SMITH, S., FINZI, L., AND BUSTAMANTE, C. (1992). Direct mechanical measurements of the elasticity of single DNA molecules by using magnetic beads. Science, 258(5085), 1122–1126.

(21)

20

VAN MAMEREN, J., GROSS, P., FARGE, G., HOOIJMAN, P., MODESTI, M., FALKENBERG, M., WUITE, G. J. L., AND PETERMAN, E. J. G. (2009). Unraveling the structure of DNA during over-stretching by using multicolor, single-molecule fluorescence imaging. Proceedings of the National

Academy of Sciences, 106(43), 18231–18236.

VARRETTE, S., BOUVRY, P., CARTIAUX, H., AND GEORGATOS, F. (2014). Management of an academic hpc cluster: The UL experience. In Proc. of the 2014 Intl. Conf. on High Performance Computing & Simulation (HPCS 2014), pages 959– 967, Bologna, Italy. IEEE.

VLADESCU, I. D., MCCAULEY, M. J., ROUZINA, I., WILLIAMS, M. C. (2005). Mapping the phase diagram of single DNA molecule force-induced melting in the presence of ethidium. Physical Review Letters. 95(15), 158102.

VLASSAKIS, J., WILLIAMS, J., HATCH, K., DANILOWICZ, C., COLJEE, V. W., AND PRENTISS, M. (2008). Probing the mechanical stability of DNA in the presence of monovalent cations.

Journal of the American Chemical Society, 130(15), 5004–5005.

WANG, J., WANG, W., KOLLMAN, P. A., AND CASE, D. A. (2006). Automatic atom type and bond type perception in molecular mechanical calculations. Journal of Molecular Graphics and Modelling, 25(2), 247–260.

WANG, J., WOLF, R. M., CALDWELL, J. W., KOLLMAN, P. A., AND CASE, D. A. (2004). Development and testing of a general amber force field. Journal of Computational Chemistry, 25(9):1157–1174.

WILLIAMS, M. C., ROUZINA, I., AND BLOOMFIELD, V. A. (2002). Thermodynamics of DNA interactions from single molecule stretching experiments. Accounts of Chemical Research, 35(3), 159– 166.

WILLIAMS, M. C., WENNER, J. R., ROUZINA, I., AND BLOOMFIELD, V. A. (2001). Effect of pH on the overstretching transition of double-stranded DNA: evidence of force-induced DNA melting.

Biophysical Journal, 80(2), 874–881.

(22)

21

WONG, J.T.F. (1975). A co-evolution theory of the genetic code. Proceedings of the National Academy

of Sciences, 72(5), 1909-1912.

(23)

22 Amirhossein Taghavi, Paul van der Schoot and Joshua T. Berryman

DNA partitions into triplets under tension….

(24)

23 Amirhossein Taghavi, Paul van der Schoot and Joshua T. Berryman

DNA partitions into triplets under tension….

(25)

24 Amirhossein Taghavi, Paul van der Schoot and Joshua T. Berryman

DNA partitions into triplets under tension….

(26)

25 Amirhossein Taghavi, Paul van der Schoot and Joshua T. Berryman

DNA partitions into triplets under tension….

(27)

26 Amirhossein Taghavi, Paul van der Schoot and Joshua T. Berryman

DNA partitions into triplets under tension….

(28)

27 Amirhossein Taghavi, Paul van der Schoot and Joshua T. Berryman

DNA partitions into triplets under tension….

Figure 1.

Kymographs of rise per bp-step under imposed whole- DNA extension. Triplet disproportionation is strongly evident in (b), while the strain is spread most evenly in (c). Presence of Arginine in a homogenous sequence (a) or presence of CG steps in the absence of Arginine (d) induce only weakly structured disproportionation.

Figure 2.

The ‘primordial’ sequence partitions under tension predominantly at the CG steps, forming triplets (a),

with Watson-Crick hydrogen bonding and planar base stacking preserved subject to some thermal disorder (a,b). Triplets are stabilised by one or two Arginines intercalating the stretched base steps (b,c) with non-specific binding that tends to place the charged end of the side-chain close to the phosphate, and partially or entirely excludes water from between the bases.

(c) is an axial view of the highlighted cavity in (b).

Figure 3.

(29)

28 Amirhossein Taghavi, Paul van der Schoot and Joshua T. Berryman

DNA partitions into triplets under tension….

Table 1.

Phase one amino acids (*,☥) tend to have (at least one) codon associated with them that partitions favourably at its boundaries. The most favourable partitioning is for the phase zero (DAGV) amino acids (☥). X indicates a stop codon, other letters are standard amino acid abbreviations. Energy units are kcal/mol.

Table 2.

(30)
(31)
(32)
(33)
(34)

Références

Documents relatifs

Conversely, it is easy to check that the latter condition implies that T a is bounded.. Moreover Λ is composed of eigenvalues λ associated with finite dimensional vector

ϭ /EZ͕hZϰϬϳWĂƚŚŽůŽŐŝĞsĠŐĠƚĂůĞ͕ϴϰϭϰϯDŽŶƞĂǀĞƚĞĚĞdž͕&ƌĂŶĐĞ

This diversity of sensitivity to antimicrobial compounds produced by biocontrol agents was also observed among different anastomosis groups within a species or different species

Prior to the development of the Nob Hill complex on what was the east side of the Riker Hill quarry, the exposed section below the Hook Mountain Basalt consisted of the uppermost

phase difference across the barrier becomes linear in time in the same way as the phase difference of the order parameter of a superconductor or of a superfluid in the presence of

R&um6 : Les modkles thermodynamiques classiques ne permettent pas une description rkaliste de la skgrkgation intergranulaire, car ils ne prennent pas en compte les

and in [1, 2] effectively reduces to the model of the superconducting chain array with a Josephson cou- pling, and we can use the advantages of this model for the

But at the intensive margin, participants' EDA rises in their own ODA only for those who face their own honesty rst ( 0.16 − 0.23 is not signicantly dierent from 0 in Model (5) of