HAL Id: hal-01300043
https://hal.sorbonne-universite.fr/hal-01300043
Submitted on 8 Apr 2016
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives| 4.0
Network-Thinking: Graphs to Analyze Microbial
Complexity and Evolution
Eduardo Corel, Philippe Lopez, Raphaël Méheust, Eric Bapteste
To cite this version:
Eduardo Corel, Philippe Lopez, Raphaël Méheust, Eric Bapteste. Network-Thinking: Graphs to
Analyze Microbial Complexity and Evolution. Trends in Microbiology, Elsevier, 2016, 24 (3),
pp.224-237. �10.1016/j.tim.2015.12.003�. �hal-01300043�
Review
Network-Thinking:
Graphs
to
Analyze
Microbial
Complexity
and
Evolution
Eduardo
Corel,
1,*
Philippe
Lopez,
1Raphaël
Méheust,
1and
Eric
Bapteste
1Thetree model andtree-based methods haveplayed a major,fruitful rolein evolutionarystudies.However,withtheincreasingrealizationofthequantitative and qualitative importance of reticulate evolutionary processes, affecting all levels of biological organization,complementary network-based models and methods are now flourishing, inviting evolutionary biology to experience a network-thinking era. We show how relatively recent comers in this field of study,thatis,sequence-similaritynetworks,genomenetworks,andgene fami-lies–genomesbipartitegraphs,alreadyallowforasignificantlyenhancedusage of molecular datasets in comparative studies. Analyses of these networks provide tools fortackling a multitude of complex phenomena, including the evolutionof genetransfer,compositegenes andgenomes,evolutionary tran-sitions,andholobionts.
NewMethods forStudying theWeb ofLife
ThetreemodelhasbeenlargelyandrightlyusedinevolutionaryanalysessinceDarwin'sseminal work[1].Thegenealogicalrelationshipsbetweenevolvingobjectsareindeedcriticaltoexplain life'sdiversity,notonlyfromaprocessualperspective(wherecommonancestryexplainssome similarities),butalsoasapowerfulpatterntoclassifyallrelatedevolvedforms[2].However,the treestructure,especiallywhenassumedtobeuniversal,stronglyconstrainsourdescriptionof theevolutionoflife[3–5].Bydefinition,atreecanonlydescribedivergencefromalastcommon ancestor (often with dichotomies, or with polytomies describing fast radiations). In vertical descent,thegeneticmaterialofaparticularevolutionaryunitispropagatedbyreplicationinside itsownlineage.Whensuchlineagessplit,andbecomegeneticallyisolatedfromoneanother,this produces a tree. By contrast, in introgressive descent, the genetic materialof a particular evolutionaryunitpropagatesintodifferenthoststructuresandisreplicatedwithinthesehost structures[4].However,atreewithasingleancestorforeachobjectcannotrepresentsucha mergingofdistinctlineagesintoanovelcommonhoststructure.Typically,organismsproduced bysexualreproductionineukaryotesoriginatefromtwoparentswhich mergedtheirgenetic material.Genealogicaltreeswithasingleancestordonotdescriberelationshipswithin eukary-oticsexualpopulations.Indeed,thisgenuinegenealogicalrelationshipcannotbedepictedwitha traditionaltreerepresentationsincethispatternwouldimposethatoneconsidersanoffspring either more closely related to only one of its parents, or to be the progenitor of its own ascendants[4].
Thedistinctionbetweenverticalandintrogressivedescentisnotaminorone;introgression (see Glossary)affects alllevelsof biological organization:from molecules,when sequences legitimatelyorillegitimatelyrecombine,togenomes,whensequencesentergenomesbylateral
Trends
Introgressive processes shape the microbial world at all levels of organisation.
Thisreticulatedevolutionisincreasingly studiedby sequence-similaritynetworks. Theyprovideaninclusiveaccurate mul-tilevelframeworktostudythewebof life.
Networksenhanceanalysesof micro-bial genes, genomes, communities, andofsymbiosis.
1
EquipeAIRE,UMR7138,Laboratoire EvolutionParis-Seine,UniversitéPierre etMarieCurie,7quaiStBernard 75005Paris,France
*Correspondence:
Glossary
Articulationpoint(orcut-vertex): nodeinagraphwhoseremoval increasesthenumberofconnected componentsoftheresultinggraph. Betweenness:centralitymeasurefor anodeinagraph,namely,the proportionofshortestpathsbetween allpairsofnodesthatpassthrough thisspecificnode.Nodeshavinga betweennesscloseto1aresaidto bemorecentral,andthosecloseto 0,moreperipheral.
Bipartitegraph:agraphwithtwo typesofnode(topnodesandbottom nodes)suchthatanedgeonly connectsnodesofonetypewith nodesoftheothertype. Clubofgenomes:acoalitionof entitiesreplicatinginseparateevents andexploitingsomecommongenetic materialthatdoesnotnecessarily tracebacktoasinglelastcommon ancestor.
Community:ingraphtheory,groups ofnodesthataremoreconnected betweenthemselvesthanwiththe restofthegraph.Thistechnical meaningshouldnotbeconfusedwith itsuseinexpressionssuchas ‘microbialcommunities’. Connectedcomponent:setof nodesinagraphforwhichthereis alwaysaninterconnectingpath. Degree:numberofincidentedgesto agivennode.
Introgression:descentprocess throughwhichthegeneticmaterialof aparticularevolutionaryunit propagatesintodifferenthost structuresandisreplicatedwithin thesehoststructures.
Multiplexgraph:agraphhaving possiblyseveraledgesofdifferent typesbetweentwonodes. Neighbors:nodesthataredirectly connectedbyanedge.
Publicgeneticgoods:thecommon geneticmaterialsharedbyaclubof phylogeneticallydistantgenomes. Quotientgraph:simplifiedgraph whosenodesrepresentdisjoint subsetsofnodesoftheoriginal graph;anedgeinthisnewgraph connectstwosuchnewnodes wheneveranedgeintheoriginal graphconnectsatleastoneelement ofanewnodewithatleastonefrom theother.
Support:thecommonsetof neighborsofatwinclass. Twins:nodesinagraphthathave exactlythesamesetofneighbors.
genetransfer,andtoholobionts,whenorganismsformacollectivesystem(suchasthetight associationobservedbetweenhostandendosymbionts)[6–8](Box1).Introgressivedescent doesnotalwaysimplylateralgenetransfer:forexample,independentlyreplicatedgenefamilies, eachhavingtheirowntree,canmerge,andthisresultsinanovelcompositegenefamily.Since evenintrogressivedescentisdescent,itencompassesaverticaldimension.Thetree represen-tationemphasizeshowentitiesevolveexunibusplurum,whereasthenetworkrepresentation emphasizeshowentities evolveex pluribusunum. Of course,evolution progressesinboth dimensions.Thus,thetreeoflifeandthenetworkoflifearenotmutuallyexclusivemodels.When lineagesthatevolvedinatree-likefashionmerge,thiscreatesreticulationbetweenbranchesof trees;likewise,afterareticulationevent,phylogeneticallycompositeentitiescanundergoa tree-likeevolution:atreestartsgrowingonthegroundofaninitialreticulation.Consequently,future syntheticrepresentationscouldaimatdisplayingsimultaneouslybothverticalandlateralpartsof biologicalevolution.
Box1.MosaicismofLife
Introgression,themergingofentitiesfromdifferentlineages,affectsmultiplelevelsofbiologicalorganization.For example,FigureIAdescribestheintrogressionofgenesfromdistinctgenefamilies,whichresultsincompositegenes, suchasmultidomaingenes[46].Thesesequencescancomefromwithinagivengenome,orwhentheycomefrom differentgenomes,suchasthegenomes ofanendosymbiontand ofits host,theresultingcompositegene, composedpartlyofmaterialfromanendosymbiont,isthereforeasymbiogeneticgene.FigureIBdescribesthe introgressionofageneintoahostgenome,occurringforinstanceduringalateraloranendosymbioticgenetransfer, whichresultsinacompositegenome[63],orwhensequencestransferacrossmobilegeneticelements,producing mosaicmobileelements[37].Ofnote,morethanonegenecanbesoacquiredbyagenome[59,60].FigureIC describestheintrogressionofagenomeintoanothergenome,occurringforinstancewhenthegenomeofWolbachia pipiensisbecomesinsertedintothegenomeofaDrosophilaananassae,orwhenthegenomeofavirophagesuchas SputnikbecomesinsertedintothegenomeofagiantvirussuchasMimivirus,whichresultsinacompositegenome
[87–89].FigureIDdescribestheintrogressionofamobileelement,suchasaplasmid,withinahostcell(ororganism), occurringfor instancewhen a symbioticplasmidcarrying hypermutagenesisdeterminants(e.g., the imuABC cassettes)invadessoilbacteria,enhancingtheexplantaphenotypicdiversificationofthesenovelcompositecells
[90,91].FigureIEdescribestheintrogressionofagenomeintoahostcell,occurringforinstanceduringeventsof kleptoplasty[92,93]orasaresultofanextremereductiveevolutionaftersecondaryortertiaryplastidacquisition, whichresultsina(transientorpersistent)compositeorganism[7].FigureIFdescribesintrogressionofcellsor organisms,occurringforinstanceduringtheevolutionandgrowthofmultispeciesbiofilms[94],endosymbiosis[8,95], duringthedevelopmentandspeciationofanimals[82,83].Typically,sequence-similaritynetworkscanbeusedto investigateforA;genomenetworksforB,C,andE;multiplexgenomenetworksforB,C,andE;andbipartitenetworks forB,D,E,andF. Composite sequence Sequence Sequence Composite genome Genome Genome Genome Genome Composite genome
Composite cell or organism Holobiont Holobiont
Mobile genec element Cell or organism Cell or organism Cell or organism Cell or organism
(A) (B) (C)
(D) (E) (F)
FigureI.SeveralIllustrationsofMosaicismthroughMergingEvents.(A)Compositegenesresultfromthefusion ofdifferentgenedomains.(B)Compositegenomescanresultfromtheintrogressionofageneintoagenome,or(C)from theintrogressionofagenomeintoagenome.(D)Compositeorganismscanarisefromtheintrogressionofamobile geneticelement.Holobiontsresultfromtheintrogressionofagenome(E)orofanothercell(F)intoacell.
Importantly,notallevolvingobjectsentertaingenealogicalrelationships:forinstance,virusesand cells,bothcriticalplayersofbiologicalevolution,arenotassumedtoberelatedinthisway[9–14], nor are plasmid and plasmid-like transferable objects, as integrative-conjugative elements (ICEs).Cells, viruses, plasmids, andICEs lack recognized genealogicalrelationships, either because they genuinely evolved from separate roots, or because their putative common ancestor(s) cannot be inferred from the data, for example, if other descendants of such ancestorsbecameextinct,orhavenotbeensequencedtodate.This apparentgenealogical disconnectiondoesnotexcludeverticalevolutionwithinlineagesofmobileelements.Thereis, for example, evidencefor both verticaland introgressive descent inplasmids and ICEs of firmicutes[15].Butitmeans thatonegenealogicaltreecannot representalltheevolutionary history[6].Therefore,atraditionalapproachtoanalyzingevolutionincurs theriskofmissing explananda(many phenomena that arenot describedby a genealogical tree) andmissing explanans(manyevolutionaryprocessesresponsibleforlife'sdiversity).Treesandnetworksare representationsthatallowforscientificanalysis.Consistently,tree-thinkinghasalreadylargely beenexploited, andit isnow timely and heuristic to turn to network-thinking to illuminate additionalandcomplexaspectsofthebiology.Inthisreview,wearguethatsequence-similarity networks,alreadyusedtoinvestigatetheevolutionofproteincodinggenes,canalsobeusedto analyzemanymosaicismsoflife,suchasbacterialgenomeevolution,prokaryotes’andprotists’ organismalevolution,andtheevolutionofholobiontsandcommunitiesinwhichmicrobesplaya role,inparticularassymbionts(Box1).Weexplainintrogressionresultsinatleastthreemajor phenomena:(i) microbial social life,understood hereas genetic transfers between different genomes,(ii)chimerism(occasionallyimplyingmajorevolutionarytransitions),and(iii)holobionts. Allthreeexamplesresistclassictree-basedanalysesandchallengeourevolutionaryknowledge. Atreemodelalonedoesnotdescribetheseintrogressiveprocesses,thatis,thefactthatthey involvemultiplelineages, andtheir outcomes,that is, thefact thatthey produce collective, composite,entities.Wedescribehowandwhythesephenomenacanbestudiedusingthree classesofnetworks[sequence-similaritynetworks,genomenetworks,andbipartitegraphs (Figure1,KeyFigure)],enlargingtheanalyticaltoolkitofevolutionarymicrobiologists.Ontheone hand,thedisplayoflargenetworkswillconstituteachallengeforthefuturedevelopment of network-thinking.On the otherhand, interms of interpretation,even very largeanddense networkscanbeeffectivelysimplified, forexample,usingtwinanalyses. Thus,weexpecta network-thinkingeratosoonbeattheforefrontinmicrobiology.
InvestigatingMicrobialSocial LifewithGenomeNetworks
Genetransferbetweenprokaryoticorganismsandmobilegeneticelements(i.e.,virusesand plasmids) has largely shaped cellular genome content,as illustrated by the observation of prokaryoticpangenomes[15,16],forwhichthecollectionofgenefamiliesusedbythemembers ofagivenspeciesislargerthanthenumberofgenefamiliespresentinanyindividualgenome fromthatspecies. Theflow ofgenesbetween genomes,oftenmediated bymobilegenetic elements,explainsthisobservation,butcomplicatesclassicinferencesaboutthepast(suchas genomereconstructionattempts)[4,5,17–20].Foragivenlineage,thecontentsofancestral genomes may be largely different from the union of extant genomes because prokaryotic genomesactas ‘read–write’storage organellesrather than‘read–only’ memories[21],and genomescanlosegenes.Thus,describingevolutionrequiresnotonlythetrackingofmutations thataccumulatewithingenefamilies,orlossofgenefamilies[22],butalsogenesthataregained byintrogression[23].Thelatterencouragesexploringhorizontalgenetransferwithinprokaryotic communities.This bringsforwarddifficult questions[20,24–30]sincetherearemanyroutes throughwhichgenespassfromonemicrobialhosttotheother,thatis,multiplechannels[31]for gene transmission. For example,is gene transmission random interms of cellular,viral, or plasmidictargets(howeverproducingasymmetricalresultsduetosomefurtherhostselection actingon the incoming genetic material)? Is it random in terms of what gene familiesare transmitted?Canwefindgroupsofcotransmittedgenes?
Sharedgenenetworkswereintroducedpreciselytotackletheseissues(Figure1)[17,19,32]. Thesenetworksrepresentwhichgenomessharegeneticmaterial,withoutprejudiceregarding theprocesses involved(verticaldescent, butalsointrogressions[19,33,34]).Ingenome net-works,allentitiesarenotnecessarilygenealogicallyrelated,allowingforsimultaneousanalysisof mobilegeneticelementsandcellularevolution.Inthatrespect,thesocialmicrobialnetworkis moreinclusivethanthetreeoflife,whichisrestrictedtoonetypeofrelationshipbetweenone
KeyFigure
Different
Graph
Representations
of
the
Same
Gene
Sharing
among
Genomes
CC 2 CC 3 CC 1 B A A C B C C B A 1 1 1 1 1 2 1 1 1 1 2 2 2 2 1 3 1 2 3 (A) C C C A B B C A B B A B (B) (C) (D)Figure1.(A)Sequence-similaritynetwork(SSN):eachnode(circle)representsaprotein-codinggenesequence;thecolor andthelabelofthenoderepresentthegenomewherethegeneisfound.Twonodesareconnectedbyanedge(alinelinking twonodes)ifthepairofsequencesfulfillsgivensimilaritycriteriasuchasaminimumpercentageidentityandcoverage(i.e., theratiobetweenthelengthofthematchingpartsandthetotallengthofanytwosequences).Sequence-similaritynetworks areanalyzedasapartitionintoconnectedcomponents(CCs,highlightedascolorhalos).Thispartitiondefinesgroupsof putativegenefamilies,whenreciprocalsequencecoverageandidentitypercentagearehigh[68]:forinstance,wecan interpretCC1asagenefamilyforwhichtwocopiesarepresentbothingenomesAandB.(B)Genomenetworks(GNs)can beobtainedfromSSNs:nodesaregenomes(describedbycolorandlabel);edgesconnectgenomesthatshareatleastone genefamily;GNscanbeweighted:weightscountthenumberofgenefamiliessharedbythetwogenomes.Intheexample, AandBsharethreegenefamilies,butthegraphdoesnotspecifywhichones.(C)Multiplexednetworks(MNs)canbe,in turn,obtainedfromGNsbylabellingedgesinordertoidentifywhatgenefamiliesareshared:nodesrepresentgenomes; multi-edgesrepresentdistinctsharedgenefamilies(samecolorcodeastheCCsintheSSN);weightscountthenumberof sharedgenesineachfamily:theblueedgebetweenAandBcorrespondstoCC1in(A)andhasthereforeweight2.(D) BipartitegraphscanalsobeobtainedfromSSNs;topnodesaregenomes;bottomnodesaregenefamilies;edgesconnect agenometoagenefamilyifthatgenomecontainsatleastonerepresentativeofthecorrespondinggenefamily;weights countthenumberofgenesofthatfamilypresentinthatgenome:intheexample,node1correspondstoCC1in(A),andhas thereforeedgesincidenttogenomesAandB,eachofweight2.
fractionofthebiologicaldiversity[6].Twogenomeswithadirectconnectioninsuchagraphare similarinthesensethattheyshareatleastonegenefamily,whereastwogenomesconnectedonly byanindirectpatharenotsimilarintermsofgenecontent.Thesegenomenetworksdisplaysome structure.Firstofall,plasmidsaremorecentral(higherbetweenness[35]foragivendegree)and virusesmoreperipheral,testifyingthatplasmidsaregeneralcouriersforgenetransmissionamongst microbes[19].Second,genomenetworkshaveseveralconnectedcomponents,thatis,several setsofgenomesforwhichthere isalwaysaninterconnectingpath.Eachoftheseconnected componentsgroupsgenomeswithexclusive,non-overlappingsets ofgenefamilies,andthus correspondstopoolsofgenesuniquelyassociatedwiththesegenomes[19].Theexistenceof differentconnectedcomponentssuggeststheexistenceofrestrictionstointrogression.
Withinaconnectedcomponent,agenomenetworkonlyshowsthatgenomessharegenes,but notwhatthesharedgenesare.Typically,atriangleofthreeconnectedgenomes(A,B,C)may resultfromthesharingofdifferentgenesforeachpair(AB,BC,AC)withinthistriangle[4](see
Figure1B,C).Thus,genomesmayformtightlyclustered communities[20]inthese graphs whilesharingdifferentgenes.Genomenetworksprovidegeneralinformationaboutbarriersto transmissionandaboutgenetic partnerships,suggestingclubsofgenomesenjoyingpublic geneticgoods[4,20].Thesegenomenetworksrequire, however,furtherspecifications(for example,ontheiredges)toaddressdetailedquestionsaboutgenetransmissionanditsbarriers. Amoreinformativerepresentationdisplaystheidentityofsharedgenesalongeachedgeofa genomenetwork,likein[36],whichshowedsomegenesharingbetweenbacteriophages(as early as 1999), or as in [37] that unraveled genetic transmission between mobile genetic elementsofgiantviruses(asrecentlyas2013).Suchmultiplexgraphsareunquestionably attractiveandrathernaturalrepresentationsofgeneticsharing.However,theirdisplaybecomes rapidlycomplexforlargedatasets,andfromananalyticalpointofview,othergraphscanoffer practicaladvantagestoanalyzegenetransmissionbeyondthegenomenetworkframework.
IntroducingBipartiteGraphsinEvolutionaryStudies
Theinformationontheidentityofsharededges(here,genefamilies)canbeconservedinaless clutteredfashionbyusingbipartite‘genefamilies–genomes’graphs.Inthesegraphs,theprecise informationregardinggenesharingisdirectlyencodedasedgesbetweenthesetwokindsof nodes.Multiplexgenomenetworkscanbeseenasunimodalprojections[38]ofsuchbipartite ‘genefamilies–genomes’graphs (Figure 1D).Bipartite graphs include the same diversity of genomesasthegenomenetworksdescribedabove,buttheyaremoreaccurate.Importantly, simplespecificbioinformatictreatmentsofthesemultilevelgraphsallowonetorapidlyidentify which groups of genes areshared by which groups ofgenomes [39],and to display and comparedifferentchannelsofgenetransmission,thatis,theroutesacrossgenerationsthrough whichhereditaryresourcesorinformationpassfromparenttooffspring[31].
Asingenomenetworks,connectedcomponentsproduceaninformativepartitionofthedata. Thispartitioncanmoreoverbeexaminedatdifferentlevelsofsimilaritybytuning,forexample,the sequenceidentitypercentage.Whenthedataconsistofalltheproteinsequencesfromallthe completeviral(3749),plasmidic(4350),andarchaeal(152)genomes,togetherwitha repre-sentativesubsampleoftheeubacteria(230)fromNCBI,wegetthenumbersshowninTable1. Assuming a rough molecular clock, these thresholds are useful for investigating events of differentages.Sequenceswith90%identityhavearelativelyweakdivergencewithrespect tosequenceswith30%identity;indeed,theselatterhavelikelydivergedfasterorforalonger periodoftime.
Thisrepresentationofgenefamilies–genomesbipartitegraphsisexplicitlymultilevel. Interest-ingly,itsanalysisdoesnotrequireanygraphclusteringalgorithm(whoseresultstendtovary
considerablywiththeirimplementation).Genetictransmissionamongmicrobescanbe investi-gated by simple topological notions of bipartite graphs that result in biologically relevant observations:twinsandarticulationpoints[40]thatwedetailbelow.
Weapplyherethesenotionsonlytogenefamilynodes.‘Twin’isanotionofgraphtheory;applied togenefamilies–genomesgraphs,itsinglesout‘fellowtravellers’:genefamiliesaretwinswhen theyarepresentinexactlythesamesetofgenomes.Inthelanguageintroducedin[34],the supportofsuchatwindefinesaclubofgenomes.Clubsofgenomes,whencomposedof individualspertainingtodifferentspecies,couldencouragefurtherstudiesof‘kin-coevolution’, forexample,thefactthatgeneticdivergenceaffectingmultipleecologicallycoexistinglineages, thatexchangedgenesatsomepointoftheirevolution,producesmultilineagepersistentclubs. Thebipartitegraphcanbesimplifiedbygroupingtogethersetsofgenefamiliesthatareshared byexclusivegroupsofgenomes,andbyreplacingeachsuchgroupofgenefamiliesbya super-node.Nodesthatremainuntouchedbythisreductionprocessareconsideredastrivialtwin classes(andresultintrivialsuper-nodes).Technically,thereisnodifferencebetweentrivialand non-trivialtwins,although,fromthebiologicalperspective,thelattercorrespondtogroupsof genefamiliesthataremorelikelytobetransmittedtogether.Theresultingquotientgraphis reduced,becauseeveryclubofgenomesisnowdefinedbyonesuper-node(individualgene familyorgroupofgenefamilieshostedinthisclubofgenomes) whilenoinformationislost (Figure2).Thispropertymeansthatevenverylargegraphscanbeinvestigated.Inthedataset presentedin Table 1,we typically find clubs, such as the one composed of the firmicute Enterococcusfaecalisandnineplasmids(present inlactococciorenterococci)that simulta-neouslyandexclusivelysharethefollowinggenefamilies(at90%identity):ribose5-phosphate isomerase RpiB, galactose mutarotase and related enzymes, b-glucosidase/6-phospho-b-glucosidase/b-galactosidase,andphosphotransferasesystemcellobiose-specificcomponent IIA.Thesesharedmobilizedgenefamiliesareinvolvedinneighborpathwaysofsugar metab-olisms(specificallyinglycolysisandinthepentosephosphatemetabolicpathways),whichlikely explainstheircollectivemobilizationinplasmids.
Articulation points ina gene families–genomesbipartite graph correspond to gene families sharedbymanygenomeswithotherwise totallydistinctgene contents(foragivensimilarity threshold).Althoughstrictlytopological,thenotionofanarticulationpointisthusexpectedto helpdetectpublicgeneticgoods[34],thatis,geneticmaterialthatisbeingsharedby taxonomi-callydistantgenomes,whichpossiblybenefitfromthepropertiestheyconfer,forsomereason otherthangenealogy(i.e.,genescodingforenvironmentaladaptationorhitch-hikingwiththose). However,anarticulationpointcanalsodetectselfishgenes,suchastheabundanttransposases
[41],whicharespreadingacrossmultipledistantlyrelatedgenomes(Box2). Table1.StatisticsoftheProkaryote–Virus–PlasmidGeneFamilies–GenomesBipartiteGraphsa
Minimalidentitypercentagetoconnectsequences 30% 60% 90% Numberofconnectedcomponents(CC) 156 375 488 NumberofCChavingonlyplasmids 25 73 155 NumberofCChavingonlyviruses 130 299 297 Sizeofthegiantconnectedcomponent(numberofnodes) 6362 5143 2769
a
Forreciprocal80%lengthcover,anddifferentidentitythresholds.
Thedataconsistofallproteinsequencesfromallcompleteplasmidic,viral,andarchaealgenomesfromNCBI(asof11/ 2013),aswellasonecompleteeubacterialcompletegenomeforeachfamily.Theidentitypercentagedescribesthe similarity,intermsoftheconservationofprimarysequences,betweenpairsof molecules.Thehigherthis‘identity threshold’themoresimilarpairsofsequencesmustbetobedirectlyconnectedinasequence-similaritynetwork.For high‘identitythreshold’,connectedcomponentsconsistofhighlyconservedsequences.Inafirstmolecularclock-like approximation,higher‘identitythresholds’definegroupsofsequencesthatdivergedmorerecentlyfromoneanotherthan groupsdefinedwithlower‘identitythreshold’.
InvestigatingCompositeGenes, Organisms,andEvolutionaryTransitions
withSequence-Similarity Networks
Introgressioncanalsobeinvestigatedbelowthegenelevelandabovetheorganismallevel.For instance,compositegenes,suchasthegenesproducedbyevolutionarytinkering[42],famous forencodingmultidomainproteins,arewelldocumentedincellulargenomes[43–45],andhave beenreportedinvirusesandplasmids[46,47].Suchgenesarecomposedofgeneticfragments (e.g.,components,whichcanbedomainsorfullgenes)thatareotherwisefoundinseparate genefamilies [48].The fusionof areceptor-binding protein witha tail fiberprotein inthe lactococcalbacteriophagebIBB29,producingacompositegeneinvolvedinhostspecificity, offersagoodexampleofthissortofmolecularmosaic[49].Whilemanysubstitutionmodels havebeen developed toaccount for gradual evolution by point mutation inphylogenetic inferences, modelsdescribing the rules andrates of emergence (or fission)of composite genesarestillrare[50–52],especiallyforunicellularorganismsandmobilegeneticelements
[46,47].However,manygenefamiliesarenotjustevolvinggraduallywithintheboundariesofa single genefamily [53].The accretionof twoprotein domainsinto a novelhost structure constitutesacaseofsaltatorymolecularevolutionbyintrogression.Therulesofevolutionand fragmentcombinationlargelyremaintobediscovered[54,55].Sequence-similaritynetworks couldcontributetothistask.Indeed,thesegraphscan:(i)provideasystematicdescriptionof bothcompositeandcomponentgenesingenomes(andmetagenomes);(ii)beusedtopolarize fusionandfissionevents(bycomparingthetaxonomicaldistributionofgeneshostsin associ-atedcomponentandcompositegenefamilies);(iii)bedirectlyusedtocomparetherelative conservationofoverlappingcomponentandcompositesequences,forexample,todetermine whetherdomainsfoundindifferentcombinationhavedifferentratesofevolution.Thedetection ofcompositegenesusingsequence-similaritynetworkscanfurthercontributeto understand-ingthe rulesof evolutionof other biologicalnetworks, suchas protein–protein interaction networks[56].Forinstance,when,asaresultofexon-ordomain-shuffling,compositegenes producenovelcombinationsofdomainsofinteraction,compositegenescanintroducenovel nodesandedgesinprotein–proteininteractions.Likewise,compositegenescanimpactthe robustnessofprotein–proteininteractionnetworks,whengenescodingforseparateproteins involvedinafunctionalinteractionbecomefused,‘crystallizing’anedgeoftheprotein–protein interactionnetwork. Support 2 Support 2 Support 3 Support 1 Support 3 Support 1
Twin 1 Twin 2 Twin 3
Super-node 1 Super-node 2 Arculaon point
Super-node 3
(A) (B)
Figure2.TwinsandArticulationPointsinaBipartiteGraph.(A)Topnodesinthisbipartitegrapharegenomesand bottomnodesgenefamilies.Nodesineachcoloredellipseatthebottomformatwinclass,sincetheirsetsofneighbors (supportsencircledbysimilarlycoloredellipsesonthetoplevel)areidentical(ashighlightedbythecoloringoftheirincident edges).(B)Collapsingtwinnodesintosuper-nodesyieldsareducedgraph,withoutfurtherbottomtwinnodes.The supportedgroupsofhostgenomesareunchanged,andarenowdefinedastheneighborsofasinglesuper-node.Dueto thegraphreduction,thegreensuper-nodeisnowanarticulationpoint,sinceitsremovaldisconnectsthenodesinthepink andbrownsupports.
This issue takes on particularly fundamental importance in organisms hosting genes from multipleorigins.Theseintrogressedgeneshavedistinctevolutionarypasthistoriesand,hence, possiblydifferentfutureevolvabilities.Forexample,eukaryoticgenomes[57,58]aswellasmajor archaeallineages arecomposed of genesfromboth bacterialand archaealorigins [59,60]. Somestudieshave focusedontheevolution ofcomplete genesofdistinctorigins inthese mosaictaxa[i.e.,contrastingtheessentialityorcentralityofgenesfrombacterialandarchaeal originsinregulatoryormetaboliceukaryoticnetworks[61],orsimplyperformingclassic phylo-geneticanalysesofthesegenestoidentifyendosymbioticgenetransfer(orEGT)[8]].Acommon fateforproteinsderivedfromsuchtransferredorganellargenesistobetargetedbacktothe compartmentof originto performtheir original function,but not only[62].Regardingthese proteinsandgenes,thestudy ofcompositeorganisms hasopenedthedoorto anexciting evolutionaryquestionthat,weargue,networkscannowbetteraddress:whathappensafter distinctgeneticmaterialbecomesintegratedintoanewhost?Genesfromdistinctoriginscould havedifferentpropensitiestobelostortodivergeduringsubsequentevolutionoftheirnovel composite host lineage [63]. Likewise, at the infragenic level, the evolutionary impact of introgression deserves consideration. Do composite organisms host novel symbiogenetic compositegeneswithcomponentsfromdifferentphylogeneticoriginsthatcouldonlybeborn insuchgeneticmelting-potsas aresultoftheoriginalmixingofgene fragments?Apositive answer,thatis,thedetectionofsuchnovelcompositegenesincompositeorganisms,could
Box2.ArticulationPointsRevealPotentialPublicGeneticGoods
Inaprokaryote–virus–plasmiddataset,wetypicallyfindclubsofgenomes,suchastheone(representedinFigureI) composedoftwomesophilicsulphur-reducingacetate-metabolizingProteobacteria(Geobactersulfurreducensand Desulfobaccaacetoxidans)andtwothermophilichydrogenotrophicmethanogenEuryarchaeota(Methanocellaconradii andMethanocellapaludicola).Thesetaxaarelinkedbyanarticulationpoint,whichindicatesthesharingofaconserved genefamily(at>90%identity),functionallyannotatedasatRNA(1-methyladenosine)methyltransferase.Thiskindof associationbetweensulphate-reducerandmethanogensiswell-documentedintheliterature[96,97].Thesharingof genesbetweendifferentprokaryotessuggestedbythisnetworkanalysismakessense,sincetheseprokaryotesare foundincommonanoxicenvironments,suchasricepaddysoils[98].Also,G.sulfurreducensandM.paludicolacontaina laterally-transferredtwo-genecluster,hgcAB,relatedtotheabilitytomethylatemercury[99].Thus,agraphanalysis producesanoveltestablehypothesis,namely,toseeifthesharedtRNAmethyltransferaseisinvolvedintheadaptationto theenvironmentofthesetaxa,orifithitch-hikedwithothergenestransferredbetweenthesetaxa,suchasthehcgAB cluster. Geobacter sulfurreducens Desulfobacca acetoxidans Methanocella conradii Methanocella paludicola tRNA methyltransferase
FigureI.ExcerptofaTypicalReducedGeneFamiles–GenomesBipartiteGrapharoundanArticulation Point.ThetopnodescomposetheclubdefinedbythesharingofaconservedtRNAmethyltransferase(bottomnodein yellow).Forsimplicity,onlythedirectneighborsofthemembersoftheclubhavebeenincludedinthepictureofthegraph. Theremovalofthearticulationpoint(inyellow)isolatesthetwotaxonomicallyhomogeneousgroupsfromeachother.
>80% reciprocal coverage Paral coverage
Related to endosymbiont
Composite in host Related to host
(A)
(B)
(C)
(D)
Figure3.Typical PatternsforCandidateEndosymbioticGeneTransfer(EGT)and CompositeGenesin Sequence-SimilarityNetworks.(A)Sequence-similaritynetworkscanbeusedforthedetectionofdistanthomologuesin eukaryoticgenomes.Complete(left)andpartial(right)sequencesimilarity,andhowtheyaretranslatedasdifferenttypesof edgesinthesequence-similaritynetwork(SSN).Inblack,thepercentageofreciprocalcoverishigh;thesequencesare homologousovertheirentirelength.Inpurple,thecoverpercentageislow;thesequencesareonlypartlysimilar,thatis,they shareahomologousdomain.(B)Shortest-pathanalysisinasequence-similaritygraphcanbeusedfordetectingpossible endosymbioticgenetransfer(EGT).Indeed,EGTresultsinacharacteristicnetworkpattern:anindirectshortpathalong whichalledgesindicatehomology,connectingtwonodescorrespondingtodivergedsequencespresentinagivenhost organism.Greennodesrepresenteukaryoticsequences;red,bacterialsequences;andyellow,archaealsequences.Black edgesdenotecompletesequencesimilarity(>80%length).Allshortestpathsbetweeneukaryoticsequencesthatpass throughthebacterialandarchaealcomponentsarelikelycandidatesforEGT,becausethisindicatesthatafirsttypeof eukaryoticsequencehasaffinitiestobacterialsequenceswhileasecondtypehasaffinitiestoarchaealones.(C) Sequence-similaritynetworkswithedgesforcompleteandpartialcoveragearealsousefulforthedetectionofcompositegenes.The
revolutionizeourunderstandingoftheoriginsofbiologicaltraits.Anegativeanswer,thatis,the lackofnovelcompositegeneticmaterialfromdifferentorganismalsourcesdespitetheirnew physicalproximity,wouldindicatestrongselectivepressurespreventingthebirthofnovelgene families in spite of changes in their genomic context. Thus, it would be worth testing if introgressionatonelevelofbiologicalorganization(i.e.,betweencells)canfavorintrogression atanotherlevel(i.e.,between genes).Forexample,organismswith compositegenomes,or holobionts, might be composed of morecomposite symbiogenetic genes than organisms devoidofendosymbionts,orlesssubjectedtogenetransfer.
Sequence-similaritynetworksareidealtoolsforinvestigatingtheseissues(Figure3).Thesevery inclusivegraphs[47,53]allowforcomparativeanalysesofmassivedatasetswithouttheneedfor multiplesequencealignments[4,64–66].SimilarityistypicallydetectedinaBLASTall-versus-all analysistoproduceatableofpairwisehits[67].Sequence-similaritynetworksaredisplayedand analyzedas asetof connectedcomponents(Figure 1A)[68].When thecoveragebetween sequencesishigh,thispartitionofthenodesdefinesgroupsofputativehomologoussequences orgenefamilies.Thus,sequence-similaritynetworkshavebeenusedwithrelativelystringent criteria(i.e., hits between two sequences must show >30% identity, cover 80% of both sequenceslength,andhaveamaximalE-valueof10–5inBLASTcomparativeanalyses)coupled with clustering methods to identify clusters of nodes corresponding to homologous gene families[69–71].Inthepast20years,sequence-similaritynetworkshaveindeedmainlybeen usedto investigate theevolution ofprotein-coding genes [4,64–66,71–75],and to perform functional annotation. For instance, the COG categories correspond to groups of similar sequences(withremarkabletopologicalpropertiesinsequence-similaritynetworks)thathave likelyevolvedfromasingleancestralgene.Incomparativeanalyses,COGareoftenusedas proxyforfunctionalannotationsbecausetheirremarkableconservationsuggeststhat sequen-cesfrom thesameCOG mayhave preservedsomecommonfunctions [71].This standard approach, however, would not readily detect composite genes [76]. Using less stringent thresholdsformutualsequencecoverage(Figure3A)oridentitypercentage,sequence-similarity networkscanbeusedtodetectsuperfamilies[66,77–79],divergenthomologues,orcomposite genes[when,forexample,thelengthcoverageconditionisrelaxedtotakeintoaccount(partial) similarity(Figure3C),suchasdomainsharing,betweensequences][46].
Thesekindsofanalyseswithflexibledefinitionsconfirmthatnotalleukaryoticgenefamilieshave homologsinprokaryotes.Whenthey do,sequence-similaritynetworkanalysesindicate that eukaryoticgenefamilieshomologoustothoseofbacteria(forwhichsequencesofeukaryotes exclusivelyclusterwithsequencesfrombacteria[63])andeukaryoticgenefamilieshomologous tothoseofarchaea(forwhichsequencesofeukaryotesexclusivelyclusterwithsequencesfrom archaea)havedifferentratesofevolution.Forexample,eukaryoticgenefamilieswithbacterial originsaremoreeasilyexpandedorlostwheneukaryoticgenomesexpandorshrink,whilethe numberofeukaryoticgenefamilieswitharchaealoriginsismuchmorestable[22,63].
figureshowsapatternassociatedwiththedetectionofcompositegenes.Blackedgesdenotecomplete(>80%cover)and purpleedgedenotepartial(<80%cover)sequencesimilarity.Thegreenfamilyisacandidatesymbiogeneticcomposite gene,derivedfromendosymbioticlateralgenetransfer,sinceitdisplaysonepartwithsimilaritytohost-relatedsequences (yellow)andanotherpartwithsimilaritytoendosymbiont-related(blue)genes.(D)AconcreteexampleofapossibleEGT: archaealsequencesarerepresentedinblue,eubacterialinred,andeukaryoticgenesingreen(thereisalsoasingle plasmidicsequenceinblue-greenontheright).Eukaryoticsequencesclearlyformtwogroups,oneclosertoarchaea,one morerelatedtoeubacteria.AllthesequenceshaveagenericannotationasRNA-pseudouridinesynthase,butwhilethe eubacterial(andrelatedeukaryotic)sequencesareexclusivelytRNAsynthases(thusputativelyofmitochondrialorigin),on thearchaealside(thuspossiblyofhostorigin)wefindtRNA-aswellasrRNA-pseudouridinesynthases.Itindeedturnsout thatthisfamilycontainstwopseudouridinesynthasegenesthatarebothpresentinSaccharomycescerevisiae,havinga similarfunctionbutactingonadifferentsubstrate:oneonthearchaealside,codingforCbf5pthatactsonlargeandsmall rRNA[100,101],andtheotherontheeubacterialside,codingforPus4,thatactsonmitochondrialandcytoplasmic tRNA-uridine[102].
Moreover,sequence-similaritynetworksdemonstratedtheirefficiencytounraveldistant homo-loguesineukaryoticgenomes,thatis,genefamiliesforwhich somepresent-dayeukaryotes possessaversionthatoriginatedfromabacterialprogenitor,whileotherpresent-day eukar-yotespossessanhomologousversionthatoriginatedfromanarchaealprogenitor,orwhenthe sameeukaryotespossessbothdivergedversionsinitsnucleargenome,onefromabacterial origin,theotherfromanarchaealorigin[63](Figure3B).Thelatterpresenceofsuchdistant homologuescharacterizestheoccurrenceofEGT[7,59],anintrogressiveprocesswhereagene fromanorganelle(suchas mitochondria orplastids)has beenimportedinto theeukaryotic nuclear DNA, where an homologous nuclear copy from archaebacterial origin was already present(Figure3D).Thesenetworksarepromisingtolookforpossiblystill-hiddenEGT,andpast endosymbioseswhentheyareappliedtonewgenomicdata.
Sequence-similaritynetworksarealsomostusefulforidentifyingcompositegenes(Figure3C), andtheirusefordetectinggenescomposedofpartsfromdifferentoriginswilllikelysoonaid reticulateevolutionanalyses[46,47,53].Indeed,thelevelofmolecularintricacybetweenhosts andsymbiontsmaywellexceedwholegeneintrogressioninthegenomeofcomposite organ-isms. Preliminary results show that photosynthetic eukaryotes contain some novel nuclear composite genes, featuring unique couplings of domains from plastid origin, without any counterpartintheprokaryoticworld.Forexample,photosyntheticdinophytescontaina com-positegenecodingforaproteinconsistingoftwodomains:oneSufEdomainofcyanobacterial origin(i.e., probablyoriginating from the chloroplastgenome) anda tRNA (5-methylamino-methyl-2-thiouridylate)-methyltransferaseofproteobacterialorigin.Interestingly,SufEdisplays desulfurylaseactivity[80],whilethetRNA (5-methylaminomethyl-2-thiouridylate)-methyltrans-feraseposessesathiolgroup(R-S-H)containingasulfuratom.Itispossiblethatthesulfuratom requiredforthethiolgroupisprovidedbytheactivityofSufEtothenewphysicalcouplingof these domains ina symbiogenetic gene.Such findings encourage experimentalstudies to establish whether and which biological properties emerged from the physical coupling of domainsinanoveleukaryoticgenewithendosymbioticorigin.
Understandingtheentanglementofmolecularbuildingblocks,belowandabovethegenelevel, isprobablythenextsteprequiredtoanalyzemolecularprocessesgoingonduringevolutionary transitionsmediatedbythemergingoflineages[4,57–60].
ConcludingRemarks:NetworksEnhanceOurComprehensionof Life's
Complexity
Thecomplexityanddiversityofphenomena acknowledgedandinvestigatedbyevolutionary biologistsisstriking,andgrowing:itnowgoeswellbeyondtheidentificationoflineage diver-gence from a single common ancestor, enhancing what is considered as the Darwinian paradigm.When pushedtoits limits,introgression mightresultintheintegrationoflaterally acquiredfeaturesintoasustainablestructure,controllablebyregulatorysystems,whichmay themselvesbetheresultofintrogression.Atechnicalandtheoreticaltransitionhasaccompanied this broadening of scope within the evolutionary paradigm. Namely, network models and methods,nevertrulyabsentinbiologicalstudies[81],havebeendevelopedandimplemented. Hence,theynowofferpowerfulcomplementaryapproachestoevolutionarystudies,whichwill enhancetheexploitationofmoleculardatasetsinmultipledirections.Theroutesandgenetic goodsofmicrobialsociallife,theoriginsandcombinationrulesofcompositegenes,andthe genetictransformationcoupledwithmajorevolutionarytransitions,canreadilybeinvestigated usingpowerful,inclusive,comparativenetwork-basedtools.Thediversityofsuchtoolsisitself constantlyincreasing:themulti-thresholdedsequence-similaritynetworks,(multiplex)genome networks, and the bipartite graphs presented here, allow one to perform multi-agent and multilevelcomparative analyses, and may become as familiar to evolutionary biologists as phylogenetictreesinthenearfuture.Importantly,thesenetworktoolshavenotyetbeenused
OutstandingQuestions
Whataretherulesofdomainandgene shufflinginmicrobes? Sequence-simi-laritynetworksprovidefastand effec-tivemeansforsystematicanalysesof theevolutionofcompositegenes,by simultaneously detecting families of componentscontributingtocomposite genefamilies.Thephylogeneticorigins andthefunctionalcategoriesofthese components will show whether microbesareusingtransferredgenes tocreatenewcompositegenesintheir genomes.Forexample,dothe notori-ously mosaic haloarchaeal genomes harbor compositegenesof bacterial origin?Doestheproportion of com-positegenesinmicrobeschangewith theenvironment?Canoneintroduce modelsofnucleotidesubstitutioninto sequence-similaritynetworksinorder tomakethemmorerealisticwithregard tosequenceevolution?
Iseverygeneeverywhere? Gene-simi-laritynetworksapplied tolarge-scale metagenomicdataandgene-sharing networks featuring environments insteadofgenomesastheirnodeswill provide inclusive novel ways to addressthisimportantquestion.These graphs will show whether similar sequencesarefoundingeographically or ecologically similar environments, and serve todetect ubiquitous and endemicgenessets.
Whatphenotypes inholobionts have multipleorigins,thatis,didnotevolve withina single phylum butemerged froma biological collective?Bipartite graphswithmicrobialtaxaormicrobial gene families as bottom nodes and with animalor humanhosts as top nodeswill immediately allowfor the identification of phylogenetically het-erogeneous groups of microbes, or groupsof genefamiliesin microbes, always associated with a particular host-levelphenotype.
Howdoprocessesofmolecular evo-lution occurring at the level of the microbiota affect eukaryotic hosts? Themicrobialgenefamilies–eukaryotic hostbipartitegraphsdescribedabove can berefined totake into account informationaboutthemolecular evolu-tionofthegenefamilies(e.g.,theirrate ofevolution,orwhethertowhatextent and by whatmobile elements each genefamilywaseventuallytransferred). This adds an explicit evolutionary dimensiontothebottom-levelnodes, allowingonetoevaluate,forexample,
attheirfullpotential(seeOutstandingQuestions).Inparticular,theycouldalsobeusedtoanalyze theevolutionofcommunitiesofsyntheticmicroorganisms,biofilms,andholobionts.Theselatter collectivesystems encompassa challengingcomplexity. Forexample, holobionts relyon a multiplicityofinteractingtransmissionsystemsandchannelsfortheirdevelopmentandevolution thatdifferinthemicrobesandintheirhosts.Thisheterogeneitycomplicatestheunderstandingof thecausesofholobionts’collectivephenotypesbytraditionalmethods,eveninthemetazoan world[82].Applyinganetworkanalyticalframeworktoholobiontstudiesmaybeaninnovative way to decipher what traits, long held as characteristic of a single animal (i.e., species incompatibility,self-immunity,orpossiblybehavior[83,84]),orofanindividualorganism/biofilm (i.e., health conditions [85,86] or drug resistance), originate from complex interactions, at multiplebiological levels,andhow theseinvolvemicrobes andtheir genes. Moregenerally, network-thinkinghaslotstocontributetomicrobiology.
Acknowledgments
E.C.andE.B.arefundedbyFP7/2007-2013GrantAgreement#615274.
References
1. Darwin,C.(1859)OntheOriginofSpeciesbyMeansofNatural Selection,JohnMurray
2. O’Hara,R.J.(1997)Populationthinking andtreethinkingin systematics.Zool.Scr.26,323–329
3. Doolittle,W.F.andBapteste,E.(2007)Patternpluralism andtheTree ofLifehypothesis.Proc.Natl.Acad.Sci.U.S.A.104,2043–2049 4. Bapteste,E.etal.(2012)Evolutionaryanalysesof
non-genea-logicalbondsproducedbyintrogressivedescent.Proc.Natl. Acad.Sci.U.S.A.109,18266–18272
5. Doolittle,W.F.(1999)Phylogeneticclassificationandthe univer-saltree.Science284,2124–2129
6. Bapteste,E.(2014)Theoriginsofmicrobialadaptations:How introgressivedescent,egalitarianevolutionarytransitionsand expandedkinselectionshapethenetworkoflife.Front. Micro-biol.5,1–4
7. Archibald,J.M.(2015)Genomicperspectivesonthebirthand spreadofplastids:Fig1.Proc.Natl.Acad.Sci.U.S.A.112, 10147–10153
8. Lane,C.E.andArchibald,J.M.(2008)Theeukaryotictreeoflife: endosymbiosistakesitsTOL.TrendsEcol.Evol.23,268–275 9. Claverie,J-M.andOgata,H.(2009)Tengoodreasonsnotto
excludegirusesfromtheevolutionarypicture.Nat.Rev. Micro-biol.7,615
10. Koonin,E.V.etal.(2009)Compellingreasonswhyvirusesare relevantfortheoriginofcells.Nat.Rev.Microbiol.7,615 11. Moreira,D.andLópez-García,P.(2009)Tenreasonstoexclude
virusesfromthetreeoflife.Nat.Rev.Microbiol.7,306–311 12. Navas-Castillo,J.(2009)Sixcommentsonthetenreasonsfor
thedemotionofviruses.Nat.Rev.Microbiol.7,615 13. Raoult,D.(2009)Thereisnosuchthingasatreeoflife(andof
coursevirusesareout!).Nat.Rev.Microbiol.7,615 14. Villarreal,L.P.andWitzany,G.(2010)Virusesareessentialagents
withintherootsandstemofthetreeoflife.J.Theor.Biol.262, 698–710
15. Lukjancenko,O.etal.(2010)Comparisonof61sequenced Escherichiacoligenomes.Microb.Ecol.60,708–720 16. Tettelin,H.etal.(2005)Genomeanalysisofmultiplepathogenic
isolatesofStreptococcusagalactiae:Implicationsforthemicrobial “pan-genome”.Proc.Natl.Acad.Sci.U.S.A.102,13950–13955 17. Dagan,T.etal.(2008)Modularnetworksandcumulativeimpact oflateraltransferinprokaryotegenomeevolution.Proc.Natl. Acad.Sci.U.S.A.105,10039–10044
18. Dagan,T.andMartin,W.(2007)Ancestralgenomesizesspecify theminimumrateoflateralgenetransferduringprokaryote evolution.Proc.Natl.Acad.Sci.U.S.A.104,870–875 19. Halary,S.etal.(2010)Networkanalysesstructuregenetic
diver-sityinindependentgeneticworlds.Proc.Natl.Acad.Sci.U.S.A. 107,127–132
20. Skippington,E.andRagan,M.A.(2011)Lateralgenetictransfer andtheconstructionofgeneticexchangecommunities.FEMS Microbiol.Rev.35,707–735
21. Shapiro,J.A.(2007)Bacteriaaresmallbutnotstupid:cognition, naturalgeneticengineeringandsocio-bacteriology.Stud.Hist. Philos.Biol.Biomed.Sci.38,807–819
22. Ku,C.etal.(2015)Endosymbioticoriginanddifferentiallossof eukaryoticgenes.Nature524,427–432
23. Lobkovsky,A.E.etal.(2014)Estimationofprokaryotic super-genomesizeandcompositionfromgenefrequencydistributions. BMCGenomics15,S14
24. Kloesges,T.etal.(2011)Networksofgenesharingamong329 proteobacterialgenomesrevealdifferencesinlateralgene trans-ferfrequencyatdifferentphylogeneticdepths.Mol.Biol.Evol.28, 1057–1074
25. Popa,O.andDagan,T.(2011)Trendsandbarrierstolateralgene transferinprokaryotes.Curr.Opin.Microbiol.14,615–623 26. Popa,O.etal.(2011)Directednetworksrevealgenomicbarriers
andDNArepairbypassestolateralgenetransferamong prokar-yotes.GenomeRes.21,599–609
27. Jain,R.etal.(1999)Horizontalgenetransferamonggenomes: thecomplexityhypothesis.Proc.Natl.Acad.Sci.U.S.A.96, 3801–3806
28. Park,C.andZhang,J.(2012)Highexpressionhampers hori-zontalgenetransfer.GenomeBiol.Evol.4,523–532 29. Sorek,R.etal.(2007)Genome-wideexperimentaldetermination
ofbarrierstohorizontalgenetransfer.Science318,1449–1452 30. Cohen,O.etal.(2011)Thecomplexityhypothesisrevisited: connectivityratherthanfunctionconstitutesabarriertohorizontal genetransfer.Mol.Biol.Evol.28,1481–1489
31. Lamn,E.(2014)Inheritancesystems.InTheStanford Encyclo-pediaofPhilosophy(Winter2014)(Zalta,E.N.,ed.), http://plato. stanford.edu/archives/win2014/entries/inheritance-systems 32. Lima-Mendez,G.etal.(2008)Reticulaterepresentationof
evo-lutionaryandfunctionalrelationshipsbetweenphagegenomes. Mol.Biol.Evol.25,762–777
33. Halary,S.etal.(2013)EGN:awizardforconstructionofgeneand genomesimilaritynetworks.BMCEvol.Biol.13,146 34. McInerney,J.O.etal.(2011)Thepublicgoodshypothesisforthe
evolutionoflifeonEarth.Biol.Direct6,41
35. Brandes,U.(2008)Onvariantsofshortest-pathbetweenness centralityandtheirgenericcomputation.Soc.Netw.30,136–145 36. Hendrix,R.W.etal.(1999)Evolutionaryrelationshipsamong diversebacteriophagesandprophages:Alltheworld'saphage. Proc.Natl.Acad.Sci.U.S.A.96,2192–2197
37. Yutin,N.etal.(2013)Virophages,polintons,andtranspovirons:a complexevolutionarynetworkofdiverseselfishgeneticelements withdifferentreproductionstrategies.Virol.J.10,158
the impact of lateral gene transfer, operating at the microbial level, on thephenotypesoftheeukaryotichost. Forexample,itbecomeseasytotest whether laterally transferred genes, mobilizedbyabroaderrangeofmobile elements,aremorelargelydistributed inhumanhoststhanareresidentgene familiesofthemicrobiome.
Can one extend the methods from bipartitetotripartitegraphs,toaccount formorelevelsofbiological organiza-tion?Thisdefines,asarealistic objec-tive, the implementation of genes– genomes–environments tripartite graphs,whichcanthenbeclustered toprovideaglobalyetaccurate repre-sentationof thestructure of genetic diversityonEarthinasingle compara-tiveanalysis.
38. Ahn,Y-Y.etal.(2011)Flavornetworkandtheprinciplesoffood pairing.Sci.Rep.1,196
39. Rivera,C.G.etal.(2010)NeMo:networkmoduleidentificationin cytoscape.BMCBioinformatics11(Suppl.1),S61 40. Diestel,R.(2006)GraphTheory,SpringerScience&Business
Media
41. Aziz,R.K.etal.(2010)Transposasesarethemostabundant,most ubiquitousgenesinnature.NucleicAcidsRes.38,4207–4217 42. Derouiche,A.etal.(2015)Evolutionandtinkering:whatdoa
proteinkinase,atranscriptionalregulatorandchromosome seg-regation/celldivisionproteinshaveincommon?Curr.Genet. PublishedonlineAugust19,2015.http://dx.doi.org/10.1007/ s00294-015-0513-y
43. Kawashima,T.etal.(2009)Domainshufflingandtheevolutionof vertebrates.GenomeRes.19,1393–1403
44. Chothia,C.(2003)Evolutionoftheproteinrepertoire.Science 300,1701–1703
45. deSouza,S.J.(2012)Domainshufflingandtheincreasing com-plexityofbiologicalnetworks.Bioessays34,655–657 46. Jachiet,P.A.etal.(2013)MosaicFinder:Identificationoffused
genefamiliesinsequencesimilaritynetworks.Bioinformatics29, 837–844
47. Jachiet,P.etal.(2014)Extensivegeneremodelingintheviral world:newevidencefornon-gradualevolutioninthemobilome network.GenomeBiol.Evol.6,2195–2205
48. Cheng,S.etal.(2014)Sequencesimilaritynetworkrevealsthe imprintsofmajordiversificationeventsintheevolutionof micro-biallife.Front.Ecol.Evol.2,1–13
49. Hejnowicz,M.S.etal.(2009)Analysisofthecompletegenome sequenceofthelactococcalbacteriophagebIBB29.Int.J.Food Microbiol.131,52–61
50. Pasek,S.etal.(2006)Genefusion/fissionisamajorcontributor toevolutionofmulti-domainbacterialproteins.Bioinformatics22, 1418–1423
51. Kummerfeld,S.K.andTeichmann,S.A.(2005)Relativeratesof genefusionandfissioninmulti-domainproteins.TrendsGenet. 21,25–30
52. Snel,B.etal.(2000)Genomeevolution.TrendsGenet.16,9–11 53. Haggerty,L.S.etal.(2014)Apluralisticaccountofhomology: adaptingthemodelstothedata.Mol.Biol.Evol.31,501–516 54. Patthy,L.(2003)Modularassemblyofgenesandtheevolutionof
newfunctions.Genetica118,217–231
55. Nakamura,Y.etal.(2007)Rateandpolarityofgenefusionand fissioninOryzasativaandArabidopsisthaliana.Mol.Biol.Evol. 24,110–121
56. Dohrmann,J.etal.(2015)Globalmultipleprotein–protein inter-actionnetworkalignmentbycombiningpairwisenetwork align-ments.BMCBioinformatics16,S11
57. McInerney,J.O.etal.(2014)ThehybridnatureoftheEukaryotaand aconsilientviewoflifeonEarth.Nat.Rev.Microbiol.12,449–455 58. Williams,T.A.etal.(2013)Anarchaealoriginofeukaryotes supportsonlytwoprimarydomainsoflife.Nature504,231–236 59. Nelson-Sathi,S.etal.(2012)Acquisitionof1,000eubacterial genesphysiologicallytransformedamethanogenattheoriginof Haloarchaea.Proc.Natl.Acad.Sci.U.S.A.109,20537–20542 60. Nelson-Sathi,S.etal.(2015)Originsofmajorarchaealclades correspondtogeneacquisitionsfrom bacteria.Nature 517, 77–80
61. Alvarez-Ponce, D. and McInerney,J.O. (2011) Thehuman genomeretainsrelicsofitsprokaryoticancestry:humangenes ofarchaebacterialandeubacterialoriginexhibitremarkable differ-ences.GenomeBiol.Evol.3,782–790
62. Deane,J.A.etal.(2000)Evidencefornucleomorphtohost nucleusgenetransfer:light-harvestingcomplexproteinsfrom cryptomonadsandchlorarachniophytes.Protist151,239–252 63. Alvarez-Ponce,D.etal.(2013)Genesimilaritynetworksprovide toolsforunderstandingeukaryoteoriginsandevolution.Proc. Natl.Acad.Sci.U.S.A.110,E1594–E1603
64. Yona,G.etal.(2000)ProtoMap:automaticclassificationof proteinsequencesandhierarchyofproteinfamilies.Nucleic AcidsRes.28,49–55
65. Frickey,T.andLupas,A.(2004)CLANS:aJavaapplicationfor visualizingproteinfamiliesbasedonpairwisesimilarity. Bioinfor-matics20,3702–3704
66. Atkinson,H.J.etal.(2009)Usingsequencesimilaritynetworksfor visualizationofrelationshipsacrossdiverseproteinsuperfamilies. PLoSONE4,e4345
67. Forster,D.etal.(2014)Testingecologicaltheorieswithsequence similaritynetworks:marineciliatesexhibitsimilargeographic dispersalpatternsasmulticellularorganisms.ISMEJ.13,1–16 68. Bittner,L.etal.(2010)Someconsiderationsforanalyzing biodi-versityusingintegrativemetagenomicsandgenenetworks.Biol. Direct5,47
69. Altenhoff,A.M.etal.(2015)TheOMAorthologydatabasein 2015:functionpredictions,betterplantsupport,syntenyview andotherimprovements.NucleicAcidsRes.43,D240–D249 70. Sayers,E.W.etal.(2011)DatabaseresourcesoftheNational
CenterforBiotechnologyInformation.NucleicAcidsRes.39, D38–D51
71. Tatusov,R.L.etal.(1997)Agenomicperspectiveonprotein families.Science278,631–637
72. Bapteste,E.etal.(2013)Networks:expandingevolutionary thinking.TrendsGenet.29,439–441
73. Enright,A.J.andOuzounis,C.A.(2000)GeneRAGE:arobust algorithmforsequenceclusteringanddomaindetection. Bioin-formatics16,451–457
74. Li,L.etal.(2003)OrthoMCL:identificationoforthologgroupsfor eukaryoticgenomes.GenomeRes.13,2178–2189 75. Enright,A.J.etal.(2003)ProteinfamiliesandTRIBESingenome
sequencespace.NucleicAcidsRes.31,4632–4638 76. Song,N.etal.(2008)Sequencesimilaritynetworkreveals
com-monancestryofmultidomainproteins.PLoSComput.Biol.4, e1000063
77. Sasson,O.etal.(2003)ProtoNet:hierarchicalclassificationofthe proteinspace.NucleicAcidsRes.31,348–352
78. Matsui,M.etal.(2013)Comprehensivecomputationalanalysisof bacterialcrp/fnrsuperfamilyanditstargetmotifsrevealsstepwise evolutionoftranscriptionalnetworks.GenomeBiol.Evol.5,267– 282
79. Rappoport,N.etal.(2013)ProtoNet:chartingtheexpanding universeofproteinsequences.Nat.Biotechnol.31,290–292 80. Ollagnier-de-Choudens,S.etal.(2003)Mechanisticstudiesof
theSufS-SufEcysteinedesulfurase:evidenceforsulfurtransfer fromSufStoSufE.FEBSLett.555,263–267
81. Ragan,M.A.(2009)TreesandnetworksbeforeandafterDarwin. Biol.Direct4,43
82. Selosse,M-A.etal.(2014)Microbialprimingofplantandanimal immunity:symbiontsasdevelopmentalsignals.Trends Micro-biol.22,607–613
83. Brucker,R.M.andBordenstein,S.R.(2012)Speciationby sym-biosis.TrendsEcol.Evol.27,443–451
84. Brucker,R.M.andBordenstein,S.R.(2013)Thehologenomic basisofspeciation:gutbacteriacausehybridlethalityinthe genusNasonia.Science341,667–669
85. Hur,K.Y.andLee,M-S.(2015)Gutmicrobiotaandmetabolic disorders.DiabetesMetab.J.39,198–203
86. Gilbert,S.F.etal.(2012)Asymbioticviewoflife:wehavenever beenindividuals.Q.Rev.Biol.87,325–341
87. DunningHotopp,J.C.etal.(2007)Widespreadlateralgene transferfromintracellularbacteriatomulticellulareukaryotes. Science317,1753–1756
88. LaScola,B.etal.(2008)Thevirophageasauniqueparasiteof thegiantmimivirus.Nature455,100–104
89. Boyer,M.etal.(2011)Mimivirusshowsdramaticgenome reduc-tionafterintraamoebalculture.Proc.Natl.Acad.Sci.U.S.A.108, 10296–10301
90. Lanza,V.F.etal.(2015)TheplasmidomeofFirmicutes:impacton theemergenceandthespreadofresistancetoantimicrobials. Microbiol.Spectr.3, PLAS-0039-2014
91. Remigi,P.etal.(2014)Transienthypermutagenesisaccelerates theevolutionoflegumeendosymbiontsfollowinghorizontalgene transfer.PLoSBiol.12,e1001942
92. Serôdio,J.etal.(2014)Photophysiologyofkleptoplasts: photo-syntheticuseoflightbychloroplastslivinginanimalcells.Philos. Trans.R.Soc.Lond.B:Biol.Sci.369,20130242 93. Rauch,C.etal.(2015)Whyitistimetolookbeyondalgalgenesin
photosyntheticslugs.GenomeBiol.Evol.7,2602–2607 94. Ereshefsky,M.andPedroso,M.(2015)Rethinkingevolutionary
individuality.Proc.Natl.Acad.Sci.U.S.A.112,10126–10132 95. Martin,W.F.etal.(2015)Endosymbiotictheoriesfor eukary-oteorigin.Philos.Trans.R.Soc.Lond.B:Biol.Sci.370, 20140330
96. Orphan,V.J.etal.(2001)Comparativeanalysisof methane-oxidizingarchaeaandsulfate-reducingbacteriainanoxicmarine sediments.Appl.Environ.Microbiol.67,1922–1934 97. Ozuolmez,D.etal.(2015)Methanogenicarchaeaandsulfate
reducingbacteriaco-culturedonacetate:teamworkor coexis-tence?Front.Microbiol.6,492
98. Sun,M.etal.(2015)Microbialcommunityanalysisinricepaddy soilsirrigatedbyacidminedrainagecontaminatedwater.Appl. Microbiol.Biotechnol.99,2911–2922
99. Liu,Y.R.etal.(2014)Patternsofbacterialdiversityalonga long-termmercury-contaminatedgradientinthepaddysoils.Microb. Ecol.68,575–583
100.Lafontaine,D.L.etal.(1998)TheboxH+ACAsnoRNAscarry Cbf5p,theputativerRNApseudouridinesynthase.GenesDev. 12,527–537
101.Zebarjadian,Y.etal.(1999)PointmutationsinyeastCBF5can abolishinvivopseudouridylationofrRNA.Mol.Cell.Biol.19, 7461–7472
102.Becker,H.F.etal.(1997)TheyeastgeneYNL292wencodesa pseudouridinesynthase(Pus4)catalyzingtheformationofpsi55 inbothmitochondrialandcytoplasmictRNAs.NucleicAcidsRes. 25,4493–4499