HAL Id: hal-02402186
https://hal.archives-ouvertes.fr/hal-02402186
Preprint submitted on 10 Dec 2019
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Neo-formation of chromosomes in bacteria.
Olivier Poirion, Bénédicte Lafay
To cite this version:
Olivier Poirion, Bénédicte Lafay. Neo-formation of chromosomes in bacteria.. 2019. �hal-02402186�
Neo-formation of chromosomes in bacteria
1
Olivier B. Poirion
1,2†& Bénédicte Lafay
1,2,3*2
1
Université de Lyon, F-69134 Lyon, France 3
2
CNRS (French National Center for Scientific Research) UMR5005, Laboratoire 4
Ampère, École Centrale de Lyon, 36 avenue Guy de Collongue, 69134 Écully, France 5
3
CNRS (French National Center for Scientific Research) UMR5558, Laboratoire de 6
Biométrie et Biologie Évolutive, Université Claude Bernard – Lyon 1, 43 boulevard 7
du 11 novembre 1918, 69622 Villeurbanne cedex, France 8
†
Current address: Center for Epigenomics, Department of Cellular and Molecular 9
Medicine, University of California, San Diego, School of Medicine, 9500 Gilman Drive, 10
La Jolla, CA 92093, USA 11
* Author for correspondence: benedicte.lafay@univ-lyon1.fr 12
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
A BSTRACT 13
Although the bacterial secondary chromosomes/megaplasmids/chromids, first noticed 14
about forty years ago, are commonly held to originate from stabilized plasmids, their 15
true nature and definition are yet to be resolved. On the premise that the integration of a 16
replicon within the cell cycle is key to deciphering its essential nature, we show that the 17
content in genes involved in the replication, partition and segregation of the replicons 18
and in the cell cycle discriminates the bacterial replicons into chromosomes, plasmids, 19
and another class of essential genomic elements that function as chromosomes. These 20
latter do not derive directly from plasmids. Rather, they arise from the fission of a multi- 21
replicon molecule corresponding to the co-integrated and rearranged ancestral 22
chromosome and plasmid. All essential replicons in a distributed genome are thus neo- 23
chromosomes. Having a distributed genome appears to extend and accelerate the 24
exploration of the bacterial genome evolutionary landscape, producing complex 25
regulation and leading to novel eco-phenotypes and species diversification.
26
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
I NTRODUCTION 27
Chromosomes are the only components of the genome that encode the necessary 28
information for replication and life of the cell/organism under normal growth conditions.
29
Their number varies across taxa, a single chromosome being the standard in bacteria 30
(Krawiec and Riley, 1990). Evidence accumulated over the past forty years is proving 31
otherwise: bacterial genomes can be distributed on more than one chromosome-like 32
autonomously replicating genomic element (replicon) (Casjens, 1998; diCenzo and 33
Finan, 2017; Mackenzie et al., 2004). The largest, primary, essential replicon (ER) in a 34
multipartite genome corresponds to a bona fide chromosome and the additional, 35
secondary, ERs (SERs) are expected to derive from accessory replicons (plasmids 36
(Lederberg, 1998)). The most popular model of SER formation posits that a plasmid 37
acquired by a mono-chromosome progenitor bacterium is stabilized in the genome 38
through the transfer from the chromosome of genes essential to the cell viability 39
(diCenzo and Finan, 2017; diCenzo et al., 2013; Slater et al., 2009). The existence in 40
SERs of plasmid-like replication and partition systems (Dubarry et al., 2006; Egan and 41
Waldor, 2003; Livny et al., 2007; MacLellan et al., 2004, 2006; Slater et al., 2009;
42
Yamaichi et al., 2007) as well as experimental results (diCenzo et al., 2014) support this 43
view. Yet, the duplication and maintenance processes of SERs contrast with the typical 44
behaviour of plasmids for which both the timing of replication initiation and the 45
centromere movement are random (Million-Weaver and Camps, 2014; Reyes-Lamothe 46
et al., 2014). Indeed, the SERs share many characteristic features with chromosomes:
47
enrichment in Dam methylation sites of the replication origin (Egan and Waldor, 2003;
48
Gerding et al., 2015), presence of initiator titration sites (Egan and Waldor, 2003;
49
Venkova-Canova and Chattoraj, 2011), synchronization of the replication with the cell 50
cycle (De Nisco et al., 2014; Deghelt et al., 2014; Egan and Waldor, 2003; Egan et al., 51
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
2004; Fiebig et al., 2006; Frage et al., 2016; Kahng and Shapiro, 2003; Rasmussen et al., 52
2007; Srivastava et al., 2006; Stokke et al., 2011), KOPS-guided FtsK translocation (Val 53
et al., 2008), FtsK-dependent dimer resolution system (Val et al., 2008), MatP/matS 54
macrodomain organisation system (Demarre et al., 2014), and similar fine-scale 55
segregation dynamics (Fiebig et al., 2006; Frage et al., 2016). Within a multipartite 56
genome, the replication of the chromosome and that of the SER(s) are initiated at 57
different time points (De Nisco et al., 2014; Deghelt et al., 2014; Fiebig et al., 2006;
58
Frage et al., 2016; Rasmussen et al., 2007; Srivastava et al., 2006; Stokke et al., 2011), 59
and use replicon-specific systems (Drevinek et al., 2008; Egan and Waldor, 2003;
60
Galardini et al., 2013; MacLellan et al., 2004, 2006; Slater et al., 2009). Yet, they are 61
coordinated, hence maintaining the genome stoichiometry (Deghelt et al., 2014; Egan et 62
al., 2004; Fiebig et al., 2006; Frage et al., 2016; Stokke et al., 2011). In the few species 63
where this was studied, the replication of the SER is initiated after that of the 64
chromosome (De Nisco et al., 2014; Deghelt et al., 2014; Fiebig et al., 2006; Frage et al., 65
2016; Rasmussen et al., 2007; Srivastava, 2006; Stokke et al., 2011) under various 66
modalities. In the Vibrionaceae, the replication of a short region of the chromosome 67
licenses the SER duplication (Baek and Chattoraj, 2014; Kemter et al., 2018), and the 68
advancement of the SER replication and segregation triggers the divisome assembly 69
(Galli et al., 2016). In turn, the altering of the chromosome replication does not affect 70
the replication initiation control of the SER in α-proteobacterium Ensifer/Sinorhizobium 71
meliloti (Frage et al., 2016).
72
Beside the exploration of the replication/segregation mechanistic, studies of multipartite 73
genomes, targeting a single bacterial species or genus (diCenzo et al., 2013, 2014;
74
Dubarry et al., 2006; Mackenzie et al., 2004; Slater et al., 2009) or using a more 75
extensive set of taxa (diCenzo and Finan, 2017; Harrison et al., 2010), relied on 76
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
inadequate (replicon size, nucleotide composition, coding of core essential genes for 77
growth and survival (diCenzo and Finan, 2017; Harrison et al., 2010; Liu et al., 2015);
78
Figure 1) and/or oriented (presence of plasmid-type systems for genome maintenance 79
and replication initiation (Harrison et al., 2010)) criteria to characterize the SERs.
80
81
Figure 1. Structural features of the replicons 82
Boxplots of the lengths (base pairs) and numbers of genes (ORFs), protein-coding genes (CDS), pseudogenes,
83
ribosomal RNA genes and transfer RNA genes for the 2016 chromosomes (blue), 129 SERs (orange), and 2783
84
plasmids (green) included in the final dataset (4928 replicons).
85
10M
1 10 100 1000 10k 100k 1M
Log-transformed numberper replicon
Size (bp) ORFs CDS
120
0 20 40 60 80 100
Number per replicon
140 160
rRNA genes tRNA genes 0
200 400 600 800 1000
Number per replicon
Peudogenes
Chromosome SER Plasmid All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
While clarifying the functional and evolutionary contributions of each type of replicon 86
to a multipartite genome in given bacterial lineages (Galardini et al., 2013; Harrison et 87
al., 2010; MacLellan et al., 2004; Slater et al., 2009), these studies produced no absolute 88
definition of SERs (diCenzo and Finan, 2017; Harrison et al., 2010) or universal model 89
for their emergence (diCenzo and Finan, 2017; diCenzo et al., 2013, 2014; Galardini et 90
al., 2013; Harrison et al., 2010). We thus set out investigating the nature(s) and origin(s) 91
of these replicons using as few assumptions as possible.
92
R ESULTS
93
Replicon inheritance systems as diagnostic features 94
We did not limit our study to a particular multipartite genome or a unique gene family.
95
Rather, we performed a global analysis encompassing all bacterial replicons whose 96
complete sequence was available in public sequence databases (Figure 2). We reasoned 97
that the key property discriminating the chromosomes from the plasmids is their 98
transmission from mother to daughter cells during the bacterial cell cycle. The functions 99
involved in the replication, partition and maintenance of a replicon, i.e., its inheritance 100
systems (ISs), thence are expected to reflect the replicon degree of integration into the 101
host cycle.
102
We first faced the challenge of identifying all IS functional homologues. The inheritance 103
of genetic information requires functionally diverse and heterogeneous actuators 104
depending on the replicon type and the characteristics of the organism. Also, selecting 105
sequence orthologues whilst avoiding false positives (e.g., sequence paralogues) can be 106
tricky since remote sequence homology most likely prevails among 107
chromosome/plasmid protein-homologue pairs.
108
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
Statistical + biological evaluation
UNSUPERVISEDMODEL Structuration = fstructure(V)+error
MODEL:
replicon stabilisation = f(IS) + error
4928 replicon vectors SUPERVISEDMODEL Prediction = fstructure({Vf ,prediction}training,Vf)+errorACLAMEKEGGRefSeq Query set 47,604 proteins5125 replicons 6,903,452 proteins 358,624 sequence homologues
BLAST annotation
92 families71 groups 127,853 Pfam domains 1,174,018 Pfam domains
HMMER HMMER TRIBE-MCL 7013 clustersQuality assessment 6096 clusters Ci 267,497 proteins 117 functionalities fi
917 noisy clusters 91,127 proteins
Cleaning
4928 replicon vectors Best BLAST hit
vr=(NC1r,...,NC6069r)
197 IS-null replicons vfr=(Nf1r,...,Nf117r) Supervised classification
Visual exploration Bipartite graph SER-SPECIFICFUNCTIONALITIESVfR=(vfr1,...,vf|R|) vfg=(Nf1g r∈g
∑
,...,Nfng r∈g∑
)VfG =(vfg1,...,vf|G| )withVR =(vr1,...,v|R| ) PCA + WARD ERT
INFOMAP SERIDENTIFICATION
Regression
Logit
Unsupervised classification
Fi gu re 2 . An al yt ic al p ro ce d u re
All rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
Starting from an initial dataset of 5125 replicons, we identified 358,624 putative IS 110
functional homologues, overall corresponding to 1711 Pfam functional domains (Figure 111
3a), using a query set of 47,604 chromosomal and plasmidic IS-related proteins selected 112
from the KEGG and ACLAME databases (Tables 1 and 2).
113
Table 1. ACLAME families used in the building of the query set 114
P
ROCESSF
AMILYP
ROTEIN DESCRIPTIONReplication
32
RepB, pi, initiator protein, RepE, RepA76
Rep, RepB, Rep of rolling circle initiator, RepA, RepU, OrfB, Rep2107
RepC, RepCa1, RepCa2, RepCd114
Helicase, UrvD rep helicase, helicase super family 1, Yga2F, helicase II118
CdsE, CdsJ133
RepA, W0005, RepA1/A2171
RepA, RepB, putative theta replicative protein207
replicative DNA helicase, DnaB, pGP1208
RepA, W0013, W0041, RepFIB224
long form TrfA, TrfA1, TrfA2, S-TrfA, plasmid initiation protein237
RepA, putative RepA, truncated RepA244
RepA, RepB, CopB, repA1/A2, w0004294
Rop regulatory protein, RNAI modulator, RNA modulator, plasmid copy number control297
primase activity/DNA initiation, LtrC/LtrC-like hypothetical protein, PcfD330
DNA repair/ DNA helicase, type III restriction enzyme, res subunit, DEAD/DEAH box helicase377
replicase, replication initiation, RepC, RepJ, RepE, RepL383
RepA, Rb100404
RepA,RepB,RepW412
Rep, RepA423
truncated RCR replication, RepRC, RepB, OrfA426
cell division control protein 6 homolog440
Rep 14-4, rm protein, RepA hypothetical protein451
RepA, host type : Corynebacterium477
Rep, RepS, RepE, host type : Bacillus, RepS, RepR612
RepL, replication initiation775
DNA helicase activity, RepA, putative helicase854
DNA helicase activity, RepC, putative initiator protein921
RepA931
DNA replication initiation, putative protein, CdsD1005
helicase activity, putative protein, hypothetical helicase1055
RNA polymerase σ factor, σ 70 family, bacteriocin uviA, sigF/V/G, tetR, host type : Clostridium1095
DNA repair/helicase, RuvB, DNA pol III γ and τ subunits, DNA pol δ subunit1099
putative theta replicase, RepB, Rep21187
DNA replication, RepH, RepI1288
RepA1345
DNA primase activity, DNA primase , primase CHC2 family1398
helicase activity, GcrE, GcrCAll rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
1652
DNA repair/exonuclease activity, DNA exonuclease protein, SbcCD related protein1837
putative replication protein2881
RepC-like, PifPartition
4
plasmid partition protein, ParA, ParA IncC protein, ParA InC1/ IncC2, SopA, virC114
RepB, RepB partitionning, KorB repressor and partitionning, ParB-like domain, YefA, YdeB, ParB, ParB-like102
DNA binding, partitionning protein, control protein, ParB, VirB, partition protein B128
DNA segregation/DNA translocase activity, cell division FtsK/ SpoIIIE, SpoI, TraB289
ParM family, go : translocase, hypothetical protein, rode shape protein, putative ATPase of class HSP70316
microfilament motor activity, ParM family, StbA protein, stable inheritance protein, ParA318
ATPase, regulation of cell division, chromosome patition, GumC, ExoP related protein, EpsB, MPA1 family427
ATPase family, ParR family, ParB, StbB, mediator of plasmid stability875
DNA binding, partitionning protein family ParB/Spo0J, YPMT1.28c876
DNA binding, partitionning protein family ParB/Spo0J, YPMT1.29c983
DNA binding, ParB, CopG1227
DNA plasmid copy number control, CopG2158
RepC2894
DNA bindingDimer resolution
5
serine based recombinase activity, ylb, resolvase, second invertase, TniR, ParA10
tyrosine-based recombinase, integrase, putative integrase, Xer, recombinase-like SAM101
plasmid dimer resolution, tyrosine-based recombinase, yld, SAM-like protein170
tyrosine-based recombinase, OrfA, hypothetical protein589
tyrosine based protein, Fis protein688
tyrosine based protein, SAM like protein, XerDMaintenance
100
Postsegregational killing system vapBC/vag136
Postsegregational killing system parDE156
Postsegregational killing system epsilon-zeta201
Postsegregational killing system higBA212
Postsegregational killing system parDE293
Postsegregational killing system mazEF319
Postsegregational killing system relBE326
Postsegregational killing system mazEF335
Postsegregational killing system HOK/SOK338
Postsegregational killing system parDE356
Postsegregational killing system parDE366
Postsegregational killing system vapBC/vag380
Postsegregational killing system phD-doc428
Postsegregational killing system ccd470
Postsegregational killing system yacA474
Postsegregational killing system relBE515
Postsegregational killing system relBE556
Postsegregational killing system higBA563
Postsegregational killing system ccd588
Postsegregational killing system higBA677
Postsegregational killing system higBA798
Postsegregational killing system mazEF916
Postsegregational killing system relBE1031
Postsegregational killing system HOK/SOK1180
Postsegregational killing system vapXD1308
Postsegregational killing system HicAB1559
Postsegregational killing system epsilon-zeta1927
Postsegregational killing system mazEFAll rights reserved. No reuse allowed without permission.
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
3357
Postsegregational killing system, plasmid maintenance4776
Postsegregational killing system, parC4777
Postsegregational killing system parDE, parD16584
Postsegregational killing system vapXDTable 2. KEGG “Prokaryotic-type chromosome” orthology groups used in the building of the 115
query set 116
BRITE
HIERARCHYKEGG
E
NTRYN
AMED
EFINITIONChromosome replication
Initiation factors (bacterial)
K02313
DnaA chromosomal replication initiator proteinK02314
DnaB replicative DNA helicase [EC:3.6.4.12]K03346
DnaB2, DnaB replication initiation and membrane attachment proteinK02315
DnaC DNA replication factor, helicase loaderK02316
DnaG DNA primase [EC:2.7.7.-]K11144
DnaI primosomal protein DnaIK05787
HupA DNA-binding protein HU-alphaK03530
hupB DNA-binding protein HU-betaK04764
IhfA, HimA integration host factor subunit alphaK05788
IhfB, HimD integration host factor subunit betaK03111
ssb single-strand DNA-binding protein Terminus site-bindingprotein
K10748
Tus, Tau DNA replication terminus site-binding proteinDNA methylation enzymeK06223 Dam DNA adenine methylase [EC:2.1.1.72]
Prevention of re- replication factors
K10763 Hda DnaA-homolog protein
K03645 SeqA negative modulator of initiation of replication
Chromosome partition
MukBEF complex
K03632 MukB chromosome partition protein MukB
K03804 MukE chromosome partition protein MukE
K03633 MukF chromosome partition protein MukF
Condensin-like complex
K03529 Smc chromosome segregation protein
K05896 ScpA segregation and condensation protein A
K06024 ScpB segregation and condensation protein B
Divisome proteins
K03585 AcrA membrane fusion protein
K01448
AmiA,AmiB,AmiC N-acetylmuramoyl-L-alanine amidase [EC:3.5.1.28]
K13052
DivIC, DivA cell division protein DivICK03590
FtsA cell division protein FtsAK05589
FtsB cell division protein FtsBK09812
FtsE cell division transport system ATP-binding proteinK03587
FtsI cell division protein FtsI [EC:2.4.1.129]K03466
FtsK, SpoIIIE DNA segregation ATPase FtsK/SpoIIIE, S-DNA-T familyK03586
FtsL cell division protein FtsLK03591
FtsN cell division protein FtsNK03589
FtsQ cell division protein FtsQK03588
FtsW, SpoVE cell division protein FtsWK09811
FtsX cell division transport system permease proteinK03531
FtsZ cell division protein FtsZK09888
ZapA cell division protein ZapAK03528
ZipA cell division protein ZipAInhibitors of FtsZ assembly
K04074
DivIVA cell division initiation proteinK06286
EzrA septation ring formation regulator All rights reserved. No reuse allowed without permission.(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
K03610
MinC septum site-determining protein MinCK03609
MinD septum site-determining protein MinDK03608
MinE cell division topological specificity factorK05501
SlmA, Ttk TetR/AcrR family transcriptional regulatorK09772
SepF cell division inhibitor SepFK13053
SulA cell division inhibitor, FtsZ assembly inhibitorOther chromosome partitioning proteins
K04095
Fic cell filamentation proteinK04094
Gid, TrmFO methylenetetrahydrofolate--tRNA-[uracil-5-)-methyltransferase [EC:2.1.1.74]K03495
GidA, MnmG,MTO1 tRNA uridine 5-carboxymethylaminomethyl modification enzyme
K03501
GidB, RsmG 16S rRNA [guanine527-N7)-methyltransferase [EC:2.1.1.170]K03569
MreB rod shape-determining protein MreB and related proteinsK03570
MreC rod shape-determining protein MreCK03571
MreD rod shape-determining protein MreDK03593
Mrp ATP-binding protein involved in chromosome partitioningK03496
ParA, Soj chromosome partitioning proteinK03497
ParB, Spo0J chromosome partitioning protein, ParB familyK02621
ParC topoisomerase IV subunit A [EC:5.99.1.-]K02622
ParE topoisomerase IV subunit B [EC:5.99.1.-]K11686
RacA chromosome-anchoring protein RacAK05837
RodA, MrdB rod shape determining protein RodAK03645
SeqA negative modulator of initiation of replicationK03733
XerC integrase/recombinase XerCK04763
XerD integrase/recombinase XerDNucleoid
HNS (histone-like nucleoid structuring
protein)
K03746
H-NS DNA-binding protein H-NSK11685
StpA DNA-binding protein StpAHU (heat unstable protein)
K05787
HupA DNA-binding protein HU-alphaK03530
HupB DNA-binding protein HU-betaIHF (integration host factor)
K04764
IhfA, HimA integration host factor subunit alphaK05788
IhfB, HimD integration host factor subunit betaOther nucleoid associated proteins
K05516
CbpA curved DNA-binding proteinK12961
DiaA chromosomal replication initiator proteinK02313
DnaA DnaA initiator-associating proteinK04047
Dps starvation-inducible DNA-binding proteinK03557
Fis Fis family transcriptional regulator, factor for inversion stimulation proteinK03666
Hfq host factor-I proteinK05596
IciA chromosome initiation inhibitor, LysR family transcriptional regulatorK03719
Lrp leucine-responsive regulatory protein, Lrp/AsnC family transcriptional regulatorK05804
Rob right origin-binding protein, AraC family transcriptional regulator All rights reserved. No reuse allowed without permission.(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint .
http://dx.doi.org/10.1101/264945 doi:
bioRxiv preprint first posted online Feb. 13, 2018;
117
Figure 3. Properties of the IS clustering 118
(a) Frequency distribution of the 358,624 putative IS protein homologues according to their number of functional domains (0 to
119
69) per protein (left), and occurrences of the 1711 functional Pfam domains (right). The 20 top most frequently encountered
120
functional domains are indicated. (b) Size distribution of the 7013 clusters, each comprising from a single to 1990 proteins. (c)
121
Percentage distribution of the most frequent annotation per cluster among all clusters (left) and among clusters with multiple
122
annotations (right). (d) Distribution of the most frequent annotation per cluster among the 917 excluded clusters (left) and the
123
6096 clusters retained for the analysis (right).
124
FtsE
Rob
AmiA, AmiB, AmiC
XerD IciA AcrA XerC
ParA, Soj Xer-like tyrosine
recombinase ParB, Spo0J Lrp ParA/ParM serine recombinase
CbpA Helicase
RuvB HupB
MreD FtsQ MinD FtsK, SpoIIIE FtsI DnaC Ssb
ATPase, TyrK, ExoP DNA helicase RepA,E,B SlmA, Ttk DnaG Mrp Smc ZapA Dam DnaB FtsX ParB MreC other FtsE
AmiA, AmiB, AmiC FtsK, SpoIIIE
MinD Smc GidB, RsmG ParB, Spo0J Mrp
FtsX DnaB ScpA ParA, Soj ZipA other
0 1000 2000 3000 4000 5000 6000 7000
1-5% 20-25% 45-50% 70-75% 95-100%
Number of clusters
% of the most frequent annotation per cluster 0 20 40 60 80 100 120
1-5% 20-25% 45-50% 70-75% 95-100%
Number of clusters with multiple annotations
% of the most frequent annotation per cluster
b c
d
6096 retained clusters 917 excluded clusters
0 20 000 40 000 60 000 80 000 100 000 120 000 140 000 160 000 180 000
2OG-FeII_Oxy 5_3_exonucA2M_comp AAA_10 AAA_15 AAA_2 AAA_25 AAA_3 AAA_35 ABC_membrane ABC_tran_2 ABC2_membrane_4 Abhydrolase_6 Acetyltransf_7 Actin Acyltransferase AFG1_ATPase Aldedh AlkA_N Amidase Amidase02_C AMP-binding_C_2 Ank_2 Anticodon_1 Apc4_WD40 Arabinose_bd Arch_ATPase Arm AsmAATG16 ATP-grasp_3 ATPgrasp_Ter B-block_TFIIIC Bac_DnaA_C BcrAD_BadFG BetaGal_dom 4_5 Big_4 BiPBP_C Borrelia_orfABRCTC-term_anchor CALCOCO1 Carboxype
pD_reg Cast CbiQ CBM_20 CBM_X2 CCDC144C CDC48_2 CENP-C_C CHB_HEX_C Chrome_Resist CMAS CobT_C Collar
Coupl
e_hipA CPT CTDII Cu_amine_oxidN3 Cupin_3 CW_binding_1 Cytochrom_C Cytochrome-c551 DAO DcpS_C DDE_Tnp_IS66 DDRGK DinB_2 DLH DNA_gyraseB_C DNA_pol_B DNA_pol3_delta DNA_primase_S DnaB_2 DnaJ DPBB_1 DRP DUF1049 DUF1175 DUF1510 DUF1664 DUF1778 DUF187 DUF1989 DUF2232 DUF2442 DUF2730 DUF29 DUF3099 DUF3258 DUF3380 DUF3435 DUF3491 DUF3620 DUF3701 DUF3794 DUF3863 DUF4047 DUF4101 DUF4135 DUF4163 DUF4201 DUF4347 DUF4368 DUF4411 DUF4480 DUF488 DUF559 DUF736 DUF830 DUF927 DUF977 EAL EcoR124_C EFG_C Endonuc_subdomERAP1_C Esterase_phd EzrA FAD_binding_3 Fascin
FeoB_N
Fer4_12 Fer4_17 Fer4_6 Ferritin FGGY_N Fic_N FixG_C FliJ Fn3_assoc Ftsk_gamma FTSW_RODA_SPOVE G5 GAS GATase_6 Germane GIDA_assoc_3 Gly_transf_sug Glyco_hydro_18 Glyco_hydro_20 Glyco_hydro_3 Glyco_hydro_39 Glyco_hydro_66 Glyco_hydro_cc Glyco_trans_1_2 Glyco_trans_4_4 Glycos_transf_2 GNVR GrpE Guanylate_cyc HAD HATPase_c HD HEAT_2 Helicase_C_3 Heme_oxygenase Hexapep_2 HHH_6 Hint_2 HisKAHlyD HNH HRDC HtaAHTH_15 HTH_21 HTH_26 HTH_30 HTH_37 HTH_5
HTH_AsnC-type HTH_Mga HU-DNA_bdg Hydrolase HYR IFT57 IncA Integrase_AP2 Jacalin KdpD Kelch_6 Kinase-like L_lactis_RepB_C LacY_symp Laminin_G_3 Lectin_legB Lipase_GDSL_2 Lon_2 LRR_1 LRR_9 Lyase_8_C Lysozyme_like LZ_Tnp_IS66 MAD MbeD_MobD MDMPI_N MerR Metallopep Methyltransf_11 Methyltransf_1NMethyltransf_26 Methyltransf_4 MFS_1 MgsA_C Miro MMR_HSR1 MoCF_biosynth MreB_Mbl MscS_porin MukE Mur_ligase_M MVIN NACHT NapD NERD Nitro_F
eMo-Co NodS NPV_P10 NTP_transf_5 NUMOD1 OEP OpcA_G6PD_assem P22_Coa tProtein ParA ParG PAS_8 PBP_dimer PD40 Pec_lyase PemK PepSY
Peptidase_C26 Peptidase_C69 Peptidase_M14 Peptidase_M20 Peptidase_M28 Peptidase_M6 Peptidase_S24 Peptidase_U35_2 Peripla_BP_4 PfkB PGI Phage_int_SAM_1 Phage_integ_NPhage_rep_O PhdYeFM_antitox PHP PIN Plasmid_stab_B PLDc_N PMT_2 Polysacc_deac_1 Porphobil_de amC PPC PQQ_3 Prenyltrans Pribosyltran Prim-Pol Propeptide_C25 PSD4 PT-TG Pyr_redox_2 Rad50_z
n_hook RCC1 Recombinase Reo_sigmaC Rep_trans RepC Response_reg Rhomboid Ribonuc_red_lgC Ribosomal_L22 Ribosomal_S30AE Rieske_2 RNA_helicase RNA_pol_Rpb1_5 RnfC_NRotamase_3 RQC
Rubre
rythrin rve S4 SbcD_C SBP_bac_5 SdrG_C_C Secretin SeqA SH3_2 SH3_9 Sigma54_activat Sigma70_r2 SIR2_2 SLT SNase Spc7 SpoU_methylase SSB Sugar_tr SurA_N_3 T2SS-T3SS_pil_N TBPIP Tektin Terminase_1 Terminase_GpA TetR_C_6 TGS ThiS Thymidylat_synt TIR_2 TOBE_3 Toprim_2 Toxi n_60 TPR_10 TPR_16 TPR_6 TraG-D_C Transglut_core3 Transpeptidase TrfA
tRNA_lig_kinase tRNA-synt_1e Trp_repressor Trypan_PARP Tubulin Ubie_methyltran UN_NPL4 UPF0114 UPF0547 UvrD_C VanW VirE VWA_3 WHG Y_phosphatase YadA_head YcgR_2 YHS YPEB Zeta_toxin zf-IS66 ZipA_C ZU5
Number of proteins
Pfam domain AAA_21
ABC transporter
SMC N
HTH Ara
C
HTH 1 LysR su
bstrate
Biotin lipoyl 2 ABC
me mb
rane
CbiA DNA gyra
se A C
oligo HPY
Ph age integrase AAAdoma
ins
HlyD doma
ins ABC domains
0 10 000 20 000 30 000 40 000 50 000 60 000 70 000 80 000
Number of proteins
Number of Pfam domains per protein
40 69
10
0 20 30 50 60
a
0 500 1 000 1 500 2 000 2 500 3 000 3 500 4 000 4 500 5 000
1-20 80-100 180-200 280-300 380-400 480-500 580-600 680-700 780-800 8980-900 980-1000 1080-1100 1180-1200 1280-1300 1380-1400 1480-1500 1580-1600 1680-1700 1780-1800 1880-1900 1980-2000
Number of clusters
Number of proteins per cluster (Cluster size)